Sounds interesting! Is it possible to make it deterministic by using a global long value and taking the element on a partition only if someFunction(partitionId, globalLong) == true? Or by using a specific partitioner that creates partition ids that can be decomposed into a node id and the number of partitions per node?
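A minimal sketch of this deterministic variant, assuming `someFunction` is a hypothetical predicate over the partition id and a broadcast long (none of these names come from the thread, and the "partitions per node" constant is an assumption):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object OnePerNodeDeterministic {
  // Hypothetical predicate: keep a partition's first element only when the
  // partition id matches the globally agreed value. Here we assume a fixed
  // layout of 4 partitions per node, so partition ids that are congruent to
  // globalLong mod 4 act as the "designated" partition on each node.
  def someFunction(partitionId: Int, globalLong: Long): Boolean =
    partitionId % 4 == (globalLong % 4).toInt

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("one-per-node").setMaster("local[4]"))
    // Broadcast so every executor sees the same value; unlike a mutable
    // atomic flag, this makes the selection deterministic across reruns.
    val globalLong = sc.broadcast(0L)

    val rdd = sc.parallelize(1 to 100, 8)
    val picked = rdd.mapPartitionsWithIndex { (pid, it) =>
      if (it.hasNext && someFunction(pid, globalLong.value)) Iterator(it.next())
      else Iterator.empty
    }
    println(picked.collect().toSeq)
    sc.stop()
  }
}
```

This only approximates "one per physical node": partition ids say nothing about placement unless the partitioner (or a custom one, as suggested above) is built to encode node locality into the id.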
From: Reynold Xin [mailto:r...@databricks.com]
Sent: Friday, September 18, 2015 4:37 PM
To: Ulanov, Alexander
Cc: Feynman Liang; dev@spark.apache.org
Subject: Re: One element per node

Use a global atomic boolean and return nothing from that partition if the boolean is true. Note that your result won't be deterministic.

On Sep 18, 2015, at 4:11 PM, Ulanov, Alexander <alexander.ula...@hpe.com> wrote:

Thank you! How can I guarantee that I have only one element per executor (per worker, or per physical node)?

From: Feynman Liang [mailto:fli...@databricks.com]
Sent: Friday, September 18, 2015 4:06 PM
To: Ulanov, Alexander
Cc: dev@spark.apache.org
Subject: Re: One element per node

rdd.mapPartitions(x => if (x.hasNext) Iterator(x.next()) else Iterator.empty)

On Fri, Sep 18, 2015 at 3:57 PM, Ulanov, Alexander <alexander.ula...@hpe.com> wrote:

Dear Spark developers,

Is it possible (and how to do it if possible) to pick one element per physical node from an RDD? Let's say the first element of any partition on that node. The result would be an RDD[element] whose count equals the number of nodes that hold partitions of the initial RDD.

Best regards, Alexander
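The atomic-boolean idea above can be sketched as follows. This is a rough, nondeterministic sketch (as noted in the thread), assuming one executor per physical node; `ExecutorFlag` is a hypothetical helper, not Spark API. A Scala `object` is instantiated once per JVM, so the flag is shared by all partitions running on the same executor:

```scala
import java.util.concurrent.atomic.AtomicBoolean

import org.apache.spark.{SparkConf, SparkContext}

// One instance per executor JVM: the first partition on that executor to
// flip the flag "wins" and emits its first element; the rest emit nothing.
object ExecutorFlag {
  val taken = new AtomicBoolean(false)
}

object OnePerExecutorSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("one-per-executor").setMaster("local[4]"))
    val rdd = sc.parallelize(1 to 100, 8)

    val picked = rdd.mapPartitions { it =>
      // compareAndSet is atomic, so exactly one partition per JVM succeeds.
      if (it.hasNext && ExecutorFlag.taken.compareAndSet(false, true))
        Iterator(it.next())
      else
        Iterator.empty
    }
    println(picked.collect().toSeq)
    sc.stop()
  }
}
```

Which partition wins depends on scheduling order, so the chosen elements can change between runs, and the flag stays set for later jobs in the same executor unless it is reset.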