Sounds interesting! Is it possible to make it deterministic by using a global long value and taking the element on a partition only if someFunction(partitionId, globalLong) == true? Or by using a specific partitioner that creates partition ids that can be decomposed into a node id and the number of partitions per node?
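A minimal sketch of this deterministic variant, assuming `someFunction` is a hypothetical predicate over the partition id and a broadcast long (none of these names come from the thread, and the "partitions per node" constant is an assumption):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object OnePerNodeDeterministic {
  // Hypothetical predicate: keep a partition's first element only when the
  // partition id matches the globally agreed value. Here we assume a fixed
  // layout of 4 partitions per node, so partition ids that are congruent to
  // globalLong mod 4 act as the "designated" partition on each node.
  def someFunction(partitionId: Int, globalLong: Long): Boolean =
    partitionId % 4 == (globalLong % 4).toInt

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("one-per-node").setMaster("local[4]"))
    // Broadcast so every executor sees the same value; unlike a mutable
    // atomic flag, this makes the selection deterministic across reruns.
    val globalLong = sc.broadcast(0L)

    val rdd = sc.parallelize(1 to 100, 8)
    val picked = rdd.mapPartitionsWithIndex { (pid, it) =>
      if (it.hasNext && someFunction(pid, globalLong.value)) Iterator(it.next())
      else Iterator.empty
    }
    println(picked.collect().toSeq)
    sc.stop()
  }
}
```

This only approximates "one per physical node": partition ids say nothing about placement unless the partitioner (or a custom one, as suggested above) is built to encode node locality into the id.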
From: Reynold Xin [mailto:r...@databricks.com]
Sent: Friday, September 18, 2015 4:37 PM
To: Ulanov, Alexander
Cc: Feynman Liang; dev@spark.apache.org
Subject: Re: One element per node

Use a global atomic boolean and return nothing from that partition if the boolean is true. Note that your result won't be deterministic.

On Sep 18, 2015, at 4:11 PM, Ulanov, Alexander <alexander.ula...@hpe.com> wrote:

Thank you! How can I guarantee that I have only one element per executor (per worker, or per physical node)?

From: Feynman Liang [mailto:fli...@databricks.com]
Sent: Friday, September 18, 2015 4:06 PM
To: Ulanov, Alexander
Cc: dev@spark.apache.org
Subject: Re: One element per node

rdd.mapPartitions(x => if (x.hasNext) Iterator(x.next()) else Iterator.empty)

On Fri, Sep 18, 2015 at 3:57 PM, Ulanov, Alexander <alexander.ula...@hpe.com> wrote:

Dear Spark developers,

Is it possible (and how to do it if possible) to pick one element per physical node from an RDD? Let's say the first element of any partition on that node. The result would be an RDD[element] whose count equals the number of nodes that hold partitions of the initial RDD.

Best regards, Alexander
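The atomic-boolean idea above can be sketched as follows. This is a rough, nondeterministic sketch (as noted in the thread), assuming one executor per physical node; `ExecutorFlag` is a hypothetical helper, not Spark API. A Scala `object` is instantiated once per JVM, so the flag is shared by all partitions running on the same executor:

```scala
import java.util.concurrent.atomic.AtomicBoolean

import org.apache.spark.{SparkConf, SparkContext}

// One instance per executor JVM: the first partition on that executor to
// flip the flag "wins" and emits its first element; the rest emit nothing.
object ExecutorFlag {
  val taken = new AtomicBoolean(false)
}

object OnePerExecutorSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("one-per-executor").setMaster("local[4]"))
    val rdd = sc.parallelize(1 to 100, 8)

    val picked = rdd.mapPartitions { it =>
      // compareAndSet is atomic, so exactly one partition per JVM succeeds.
      if (it.hasNext && ExecutorFlag.taken.compareAndSet(false, true))
        Iterator(it.next())
      else
        Iterator.empty
    }
    println(picked.collect().toSeq)
    sc.stop()
  }
}
```

Which partition wins depends on scheduling order, so the chosen elements can change between runs, and the flag stays set for later jobs in the same executor unless it is reset.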