Hi Folks,

We have a use case that involves a large computation over two large vectors. We run a flatMap operation on the left vector and apply correlation logic that compares each of its rows with all the rows of the right vector. When the flatMap function runs on an executor, it compares one row of the left vector (row 1, say) with every row of the right vector. From within this flatMap operation, we would like to start another remote map operation in which each task compares that row against only a portion of the right vector's rows. This would give us a second level of concurrency, increasing throughput and making use of other nodes. To achieve this, however, we need access to the SparkContext from within the flatMap operation.
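To make the shape of the computation concrete, here is a minimal local sketch in plain Python. It is not Spark code: the thread pool only stands in for the second level of remote work we would like to launch from inside the flatMap. The names `correlate`, `compare_chunk`, `chunks` and `flatmap_row` are hypothetical, and the modulo comparison is just a placeholder for the real correlation logic.

```python
from concurrent.futures import ThreadPoolExecutor


def correlate(left_row, right_row):
    # Placeholder for the real correlation logic (assumption):
    # two rows "match" when they are equal modulo 3.
    return left_row % 3 == right_row % 3


def compare_chunk(left_row, right_chunk):
    # Inner "map" step: compare one left row against a portion of the right vector.
    return [(left_row, r) for r in right_chunk if correlate(left_row, r)]


def chunks(rows, size):
    # Split the right vector into fixed-size portions.
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def flatmap_row(left_row, right, pool, chunk_size=2):
    # Outer "flatMap" body for a single left row: fan out one inner task per
    # right-vector chunk. In Spark, this fan-out is the step that would need
    # access to the SparkContext from inside the executor.
    futures = [pool.submit(compare_chunk, left_row, c)
               for c in chunks(right, chunk_size)]
    return [pair for f in futures for pair in f.result()]


left = [1, 2, 3]
right = [3, 4, 5, 6]

with ThreadPoolExecutor(max_workers=4) as pool:
    matches = [pair for l in left for pair in flatmap_row(l, right, pool)]

print(matches)  # [(1, 4), (2, 5), (3, 3), (3, 6)]
```

The outer loop over `left` corresponds to the flatMap we already run; the `pool.submit` calls correspond to the nested operation that we currently cannot start from within a task.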
I have attached a snapshot describing the limitation: <http://apache-spark-developers-list.1001551.n3.nabble.com/file/t3134/Concurrency_Snapshot.jpg>

In simple words, this boils down to having access to the SparkContext from within an executor, so that the next level of map or other concurrent operations can be launched on partitions on other machines. I have some experience with other in-memory compute grid technologies such as Coherence and Hazelcast. These frameworks do allow the next level of concurrent operations to be triggered from within a task that is executing on one node.

Regards,
Sandeep.