Hey guys! I am using Spark for an NGS data application.
In my case, I have to broadcast a very big dataset to each task. However, there are several tasks (say 48) running on the CPUs (also 48 cores) of the same node. These tasks, since they run on the same node, could share a single copy of the dataset. But Spark broadcasts it 48 times, once per task (if I understand correctly). Is there a way to broadcast just one copy to each node and have it shared by all tasks running on that node? Much appreciated! Best, huanglr