Hi Folks,

We have a use case that involves a large computation over two large
vectors. We run a flatMap operation on the left vector and apply
correlation logic that compares each of its rows with every row of the
second (right) vector. So when the flatMap runs on an executor, it
compares, say, row 1 of the left vector with all rows of the right vector.
From within this flatMap operation, we would like to start another remote
map operation, each instance of which compares against only a portion of
the right vector's rows. This would give us a second level of concurrency,
increasing throughput and making use of other nodes. To achieve this,
however, we need access to the SparkContext from within the flatMap
operation.
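
To make the shape of the computation concrete, here is a minimal sketch in
plain Python (deliberately not Spark code, so it runs anywhere). The names
correlate, chunks, and portion_size are made up for illustration, and the
multiplication merely stands in for our real correlation logic. It shows
the current single-level flatMap next to the two-level split we are after,
where every (left row, right portion) pair is an independent unit of work.

```python
from itertools import product


def correlate(left_row, right_row):
    # Hypothetical stand-in for the real correlation logic.
    return left_row * right_row


def chunks(rows, size):
    # Split the right vector into fixed-size portions.
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def flat_map_single_level(left, right):
    # Current shape: one task per left row, each scanning the whole right vector.
    return [(l, r, correlate(l, r)) for l in left for r in right]


def flat_map_two_level(left, right, portion_size):
    # Desired shape: each (left row, right portion) pair is a separate unit
    # of work that could, in principle, run on a different node.
    work_units = list(product(left, chunks(right, portion_size)))
    results = []
    for l, portion in work_units:
        results.extend((l, r, correlate(l, r)) for r in portion)
    return results, len(work_units)


left = [1, 2, 3]
right = [10, 20, 30, 40]

single = flat_map_single_level(left, right)
nested, n_units = flat_map_two_level(left, right, portion_size=2)
```

Both functions produce the same comparisons; the second one only exposes
more, smaller units of work, which is the extra level of concurrency we
would like to schedule from inside the task.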

I have attached a snapshot describing the limitation.

<http://apache-spark-developers-list.1001551.n3.nabble.com/file/t3134/Concurrency_Snapshot.jpg>
 

Put simply, this boils down to having access to a SparkContext from
within an executor, so that the next level of map or other concurrent
operations can be launched on partitions on other machines. I have some
experience with other in-memory compute grid technologies such as Coherence
and Hazelcast. These frameworks do allow the next level of concurrent
operations to be triggered from within a task executing on one node.


Regards,
Sandeep.


