Dear community,

I have an RDD with N rows and N partitions. I want to ensure that all
partitions run at the same time, by setting the number of vcores
(spark-yarn) to N. The partitions need to talk to each other through some
socket-based sync, which is why I need them to run more or less
simultaneously.

Let's assume no node will die. Will my setup guarantee that all partitions
are computed in parallel?

I know this is somewhat hackish. Is there a better way of doing this?

My goal is to replicate message passing (like OpenMPI) with Spark, where I
have very specific and final communication requirements. So I don't need
the full comm and sync functionality, just what I already have: sync and
talk.
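To make the pattern concrete, here is a minimal sketch of what I mean by
"sync and talk", with plain Python threads standing in for the N parallel
partition tasks and a shared list standing in for the sockets (the names
and the ring-style exchange are just illustrative assumptions):

```python
import threading

N = 4  # stands in for the number of RDD partitions / vcores

barrier = threading.Barrier(N)  # every "task" must reach this point
published = [None] * N          # values each task publishes (stand-in for sockets)
exchanged = [None] * N          # result of the exchange phase

def task(i):
    # Phase 1: each task does its local computation and publishes it.
    local = i * i
    published[i] = local
    # Sync point: nobody proceeds until all N tasks have published.
    barrier.wait()
    # Phase 2: "talk" - read a neighbour's value (ring-style exchange).
    exchanged[i] = (local, published[(i + 1) % N])

threads = [threading.Thread(target=task, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(exchanged)
```

The point is that the exchange phase only works if all N tasks are alive
at the same time, which is exactly the guarantee I am after in Spark.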

Thanks!
Adam
