If something is persisted, you can easily see it under the Storage tab in
the web UI.
Thanks
Best Regards
On Tue, Nov 18, 2014 at 7:26 PM, Aniket Bhatnagar
aniket.bhatna...@gmail.com wrote:
I am trying to figure out if sorting is persisted after applying Pair RDD
transformations and I am
Akhil, I think Aniket uses the word "persisted" differently from what you
mean, i.e. not in the RDD.persist() sense. Aniket asks if running
combineByKey on a sorted RDD will result in a sorted RDD (i.e. whether the
sorting is preserved).
I think the answer is no. combineByKey uses AppendOnlyMap,
Thanks Daniel. I can understand that the keys will not be in sorted order
but what I am trying to understand is whether the functions are passed
values in sorted order in a given partition.
For example:
sc.parallelize(1 to 8).map(i => (1, i)).sortBy(t => t._2).foldByKey(0)((a, b) => b).collect
Ah, so I misunderstood you too :).
My reading of org/apache/spark/Aggregator.scala is that your function will
always see the items in the order that they are in the input RDD. An RDD
partition is always accessed as an iterator, so it will not be read out of
order.
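To make the iterator argument concrete, here is a minimal plain-Scala sketch of
a per-key fold over a single partition. It is not Spark's Aggregator code:
foldPartitionByKey and PartitionOrderSketch are hypothetical names made up for
this illustration, and the "partition" is just an in-memory Iterator. It only
models what happens inside one partition and says nothing about the order in
which results from different partitions are merged after a shuffle.

```scala
import scala.collection.mutable

object PartitionOrderSketch {
  // Hypothetical helper (not a Spark API): folds the values of each key
  // while consuming one partition's iterator exactly once, front to back.
  def foldPartitionByKey[K, V](partition: Iterator[(K, V)], zero: V)(f: (V, V) => V): Map[K, V] = {
    val acc = mutable.LinkedHashMap.empty[K, V]
    for ((k, v) <- partition) {
      // f receives the running result and the next value in iteration order.
      acc(k) = f(acc.getOrElse(k, zero), v)
    }
    acc.toMap
  }

  def main(args: Array[String]): Unit = {
    // Same shape as the example above: one key, values sorted ascending.
    val partition = (1 to 8).map(i => (1, i)).sortBy(_._2).iterator
    // "Keep the later value": the last value seen for the key survives.
    val result = foldPartitionByKey(partition, 0)((a, b) => b)
    println(result) // prints Map(1 -> 8)
  }
}
```

Because the iterator is read once from front to back, the function is handed
the values of a key in exactly the order they appear in that partition.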
On Wed, Nov 19, 2014 at 2:28
Thanks Daniel :-). It seems to make sense and is something I was hoping for. I
will proceed with this assumption and report back if I see any anomalies.
On Wed Nov 19 2014 at 19:30:02 Daniel Darabos
daniel.dara...@lynxanalytics.com wrote:
Ah, so I misunderstood you too :).
My reading of org/
I am trying to figure out if sorting is persisted after applying Pair RDD
transformations and I am not able to decisively tell after reading the
documentation.
For example:
val numbers = .. // RDD of numbers
val pairedNumbers = numbers.map(number => (number % 100, number))
val sortedPairedNumbers