You can parallelize on the driver side. The way to do it is almost
exactly what you have here, where you're iterating over a local Scala
collection of dates and invoking a Spark operation for each. Simply
write dateList.par.map(...) to make the local map proceed in
parallel. It should invoke the Spark jobs concurrently from multiple
threads on the driver.
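A minimal sketch of the idea, using a plain function in place of the real per-date Spark job (the `jobForDate` helper below is hypothetical; in the actual code each call would be the `sc.hdfsFile(...).map(transform).saveAsHadoopFile(...)` pipeline from the original question):

```scala
// Driver-side parallelism with Scala parallel collections.
// Calling .par on a local collection yields a ParSeq; map then runs
// on a thread pool, so blocking job submissions overlap instead of
// running one after another.
object ParDriverDemo {
  // Stand-in for a per-date Spark job; here it just returns the
  // length of the date string so the example is self-contained.
  def jobForDate(date: String): Int = date.length

  def main(args: Array[String]): Unit = {
    val dateList = List("2015-01-01", "2015-01-02", "2015-01-03")
    // .par makes the map run in parallel on the driver
    val results = dateList.par.map(jobForDate)
    println(results.sum)
  }
}
```

Note that on Scala 2.13 and later, `.par` lives in the separate scala-parallel-collections module (`import scala.collection.parallel.CollectionConverters._`); on the 2.10/2.11 versions current at the time of this thread it is available out of the box.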
Awesome, it actually seems to work. Amazing how simple it can be
sometimes...
Thanks Sean!
On Fri, Jan 9, 2015 at 12:42 PM, Sean Owen so...@cloudera.com wrote:
Hey,
Let's say we have multiple independent jobs that each transform some data
and store it in distinct HDFS locations. Is there a nice way to run them in
parallel? See the following pseudocode snippet:
dateList.map(date =>
  sc.hdfsFile(date).map(transform).saveAsHadoopFile(date))
It's unfortunate that, written this way, the jobs run one after another.