Dear community, I have an RDD with N rows and N partitions. I want to ensure that all partitions run at the same time, by setting the number of vcores (Spark on YARN) to N. The partitions need to talk to each other over a socket-based sync, which is why they must run more or less simultaneously.
Let's assume no node will die. Will this setup guarantee that all partitions are computed in parallel? I know this is somewhat hackish. Is there a better way to do it? My goal is to replicate message passing (as in OpenMPI) with Spark, where I have very specific and fixed communication requirements. So I don't need the full comm and sync functionality, just what I already have: sync and talk. Thanks! Adam
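To make the "sync and talk" part concrete, here is a minimal sketch of the socket-based barrier I have in mind, using plain Python threads to stand in for the partitions (the thread body is what would run inside a `mapPartitions` closure; `N`, the port handling, and the `b"go"` token are all illustrative assumptions, not Spark API):

```python
import socket
import threading

N = 4  # stand-in for the number of partitions (assumption for the sketch)

# The "sync" side: one rendezvous socket that releases everyone
# only once all N peers have checked in.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))          # let the OS pick a free port
srv.listen(N)
port = srv.getsockname()[1]

def release_all():
    # Accept exactly N connections, then open the barrier for all of them.
    conns = [srv.accept()[0] for _ in range(N)]
    for c in conns:
        c.sendall(b"go")            # nobody proceeds until everyone arrived
        c.close()

def partition_body(i, results):
    # In Spark this logic would live inside a mapPartitions closure:
    # connect, block until all peers have arrived, then do the real work.
    s = socket.create_connection(("127.0.0.1", port))
    results[i] = s.recv(2)          # blocks until the barrier opens
    s.close()

results = [None] * N
server = threading.Thread(target=release_all)
server.start()
workers = [threading.Thread(target=partition_body, args=(i, results))
           for i in range(N)]
for w in workers:
    w.start()
for w in workers:
    w.join()
server.join()
srv.close()
```

This only works if all N bodies really are running at once, which is exactly the scheduling guarantee I'm asking about: if Spark runs fewer than N tasks concurrently, the late partitions never connect and the early ones block forever.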