Queue independent jobs

Anders Arpteg Fri, 09 Jan 2015 02:49:22 -0800

Hey,

Let's say we have multiple independent jobs that each transform some data
and store the result in a distinct HDFS location. Is there a nice way to
run them in parallel? See the following pseudocode snippet:


dateList.map(date =>
  sc.hdfsFile(date).map(transform).saveAsHadoopFile(date))

It's unfortunate if they run in sequence, since then not all of the
executors are used efficiently. What's the best way to parallelize
execution of these jobs?

Thanks,
Anders
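
One possible approach to the question above, offered as a minimal sketch rather than a definitive answer: Spark's scheduler accepts jobs submitted concurrently from separate threads of the same SparkContext, so each date's job can be wrapped in a Scala Future on a bounded thread pool. The names ParallelJobs, runAll, and parallelism are hypothetical, and the job is passed in as a plain function so the sketch compiles without Spark on the classpath.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

// Hypothetical helper, not part of Spark.
object ParallelJobs {
  // Runs `job` once per date, with at most `parallelism` jobs in flight,
  // and blocks until every job has finished. If any job throws, the
  // exception is rethrown here.
  def runAll[A](dates: Seq[String], parallelism: Int)(job: String => A): Seq[A] = {
    val pool = Executors.newFixedThreadPool(parallelism)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // Each Future submits its job from its own pool thread.
      val futures = dates.map(date => Future(job(date)))
      Await.result(Future.sequence(futures), Duration.Inf)
    } finally pool.shutdown()
  }
}

// Assumed usage with Spark. textFile and saveAsTextFile stand in for the
// pseudocode's hdfsFile and saveAsHadoopFile; inputPath and outputPath
// are hypothetical helpers mapping a date to HDFS paths:
//
//   ParallelJobs.runAll(dateList, parallelism = 4) { date =>
//     sc.textFile(inputPath(date)).map(transform).saveAsTextFile(outputPath(date))
//   }
```

By default Spark schedules concurrently submitted jobs FIFO, so the first job still gets priority on resources. Setting spark.scheduler.mode to FAIR shares executors more evenly between the jobs.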
