Hey,

Let's say we have multiple independent jobs that each transform some data
and store the results in distinct HDFS locations. Is there a nice way to run
them in parallel? See the following pseudocode snippet:

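// inputPath/outputPath: hypothetical helpers mapping a date to its input and (distinct) output HDFS paths
dateList.foreach(date =>
  sc.textFile(inputPath(date)).map(transform).saveAsTextFile(outputPath(date)))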

Written this way, the jobs run one after another, which is unfortunate
because the executors are not all used efficiently. What's the best way to
parallelize execution of these jobs?
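A minimal sketch of one possible approach, assuming the jobs really are independent: a single SparkContext accepts jobs submitted concurrently from separate driver threads, so each per-date pipeline can be started from its own thread. The helper below (`ParallelJobs.runInParallel` is a hypothetical name, not a Spark API) runs one job per item on a bounded thread pool and blocks until all of them finish.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object ParallelJobs {
  /** Runs `job` once per item, with at most `maxConcurrent` jobs in flight.
    * Blocks until every job has finished; rethrows a failure if any job fails. */
  def runInParallel[A](items: Seq[A], maxConcurrent: Int)(job: A => Unit): Unit = {
    // Dedicated bounded pool, so these blocking driver-side calls
    // do not tie up the global ExecutionContext.
    val pool = Executors.newFixedThreadPool(maxConcurrent)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // Each Future submits one job from its own thread.
      val futures = items.map(item => Future(job(item)))
      // Wait for all jobs; Await.result rethrows the exception of a failed job.
      Await.result(Future.sequence(futures), Duration.Inf)
    } finally {
      // Stop accepting new tasks; already-running tasks are allowed to finish.
      pool.shutdown()
    }
  }
}
```

With Spark, `job` would be the per-date pipeline, e.g. `ParallelJobs.runInParallel(dateList, 4)(date => sc.textFile(inputPath(date)).map(transform).saveAsTextFile(outputPath(date)))`, where `inputPath`/`outputPath` and the limit of 4 are illustrative assumptions. Under Spark's default FIFO scheduler the earliest job gets priority on resources, and later jobs only use what it leaves free. Setting `spark.scheduler.mode` to `FAIR` shares executors between the concurrent jobs more evenly.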

Thanks,
Anders
