Hey Adrian,

Thanks for your fast reply. :)

Actually, the “pre-condition” is not fixed in the real application; e.g. it 
changes based on the count of previously unmatched elements.
So I need an iterator-style operator rather than flatMap-like operators…

Besides, do you have any idea how to avoid that “sort again”? It is too 
costly… :(
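
To make the question concrete, here is a rough, non-distributed sketch of the kind of scan I mean, assuming the elements fit on one node (or inside a single partition, e.g. within mapPartitions). It keeps the elements in a max-heap, so putting a reduced element back costs O(log n) instead of a full re-sort. The names DynamicSortSketch, matches, scan and maxSteps are hypothetical, and matches is only a stand-in for the real pre-condition (it receives the unmatched count but does not use it here).

```scala
import scala.collection.mutable

object DynamicSortSketch {
  // (key, (value, label)), the same shape as the example collection.
  type Elem = (Int, (Double, String))

  // Hypothetical stand-in for the real pre-condition, which may depend on
  // how many elements have been unmatched (removed) so far.
  // The zero check avoids halving 0 forever.
  def matches(value: Double, unmatchedSoFar: Int): Boolean =
    value != 0.0 && value % 2 == 0

  // Repeatedly takes the element with the largest value. Matched elements
  // are halved and put back into the heap; unmatched ones are dropped.
  // Returns the keys of the removed elements, in removal order.
  // maxSteps is an assumed safety bound on the number of iterations.
  def scan(input: Seq[Elem], maxSteps: Int): Vector[Int] = {
    val heap = mutable.PriorityQueue.empty[Elem](Ordering.by[Elem, Double](_._2._1))
    heap ++= input
    val removed = Vector.newBuilder[Int]
    var unmatched = 0
    var steps = 0
    while (heap.nonEmpty && steps < maxSteps) {
      val (key, (value, label)) = heap.dequeue()
      if (matches(value, unmatched)) {
        heap.enqueue((key, (value / 2, label)))
      } else {
        removed += key
        unmatched += 1
      }
      steps += 1
    }
    removed.result()
  }

  def main(args: Array[String]): Unit = {
    val sortedC = Seq(
      (1, (71.0, "aaa")), (2, (60.0, "bbb")), (3, (53.5, "ccc")),
      (4, (48.0, "ddd")), (5, (29.0, "eee")))
    println(scan(sortedC, maxSteps = 1000))
  }
}
```

This only orders elements within one JVM, so it does not by itself give a globally sorted scan over a large RDD; that part is still what I am unsure about.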

Anyway thank you again!

Best,
Yifan LI





> On 12 Oct 2015, at 12:19, Adrian Tanase <atan...@adobe.com> wrote:
> 
> I think you’re looking for the flatMap (or flatMapValues) operator – you can 
> do something like
> 
> sortedRdd.flatMapValues( v =>
>   // keep even values (halved); drop odd ones
>   if (v % 2 == 0) {
>     Some(v / 2)
>   } else {
>     None
>   }
> )
> 
> Then you need to sort again.
> 
> -adrian
> 
> From: Yifan LI
> Date: Monday, October 12, 2015 at 1:03 PM
> To: spark users
> Subject: "dynamically" sort a large collection?
> 
> Hey,
> 
> I need to scan a large "key-value" collection as below:
> 
> 1) sort it on an attribute of “value” 
> 2) scan it one by one, starting from the element with the largest value
> 2.1) if the current element matches a pre-defined condition, its value is 
> reduced and the element is inserted back into the collection; 
> if not, the current element is removed from the collection.
> 
> 
> In my previous program, step 1) is easily done in Spark (an RDD 
> operation), but I am not sure how to do step 2.1), especially the 
> “put/insert back” operation on a sorted RDD.
> I have tried creating a new RDD every time an element had to be inserted 
> back, but that is very costly because of the re-sorting…
> 
> 
> Does anyone have any ideas?
> 
> Thanks so much!
> 
> ******************
> an example:
> 
> the sorted result of initial collection C(on bold value), sortedC:
> (1, (71, "aaa"))
> (2, (60, "bbb"))
> (3, (53.5, "ccc"))
> (4, (48, "ddd"))
> (5, (29, "eee"))
> …
> 
> pre-condition: its_value%2 == 0
> if the pre-condition is matched, the value will be reduced by half.
> 
> Thus:
> 
> #1:
> 71 is not matched, so this element is removed.
> (1, (71, "aaa")) --> removed!
> (2, (60, "bbb"))
> (3, (53.5, "ccc"))
> (4, (48, "ddd"))
> (5, (29, "eee"))
> …
> 
> #2:
> 60 is matched! 60/2 = 30, so the collection should now be:
> (3, (53.5, "ccc"))
> (4, (48, "ddd"))
> (2, (30, "bbb")) <-- inserted back here
> (5, (29, "eee"))
> …
> 
> 
> 
> 
> 
> 
> Best,
> Yifan LI
> 
> 
> 
> 
> 
