Hey, Let's say we have multiple independent jobs that each transform some data and store the result in a distinct HDFS location. Is there a nice way to run them in parallel? See the following pseudocode snippet:
dateList.map(date => sc.hdfsFile(date).map(transform).saveAsHadoopFile(date))

It's unfortunate if they run in sequence, since the executors are then not used efficiently. What's the best way to parallelize the execution of these jobs?

Thanks,
Anders