Re: Queue independent jobs

2015-01-09 Thread Sean Owen
You can parallelize on the driver side. The way to do it is almost exactly what you have here, where you're iterating over a local Scala collection of dates and invoking a Spark operation for each. Simply write dateList.par.map(...) to make the local map proceed in parallel. It should invoke the
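A minimal sketch of the driver-side approach described above, not code from the thread. It assumes the pseudocode's `sc.hdfsFile` and `saveAsHadoopFile` correspond to `sc.textFile` and `saveAsTextFile`. The `transform` function, the dates, and the `/data/in` and `/data/out` paths are hypothetical placeholders. It uses `foreach` rather than `map` because the save calls return nothing useful.

```scala
import org.apache.spark.{SparkConf, SparkContext}
// On Scala 2.13+, .par lives in the separate scala-parallel-collections
// module and additionally needs:
// import scala.collection.parallel.CollectionConverters._

object ParallelJobs {
  // Hypothetical stand-in for the poster's transform function.
  def transform(line: String): String = line.toUpperCase

  def main(args: Array[String]): Unit = {
    // Master URL is expected to come from spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("parallel-jobs"))

    // Hypothetical list of dates; each one becomes an independent job.
    val dateList = List("2015-01-07", "2015-01-08", "2015-01-09")

    // .par turns the local list into a parallel collection, so each
    // closure runs on its own driver-side thread and submits its own
    // Spark job to the shared SparkContext.
    dateList.par.foreach { date =>
      sc.textFile(s"/data/in/$date")        // hypothetical input path
        .map(transform)
        .saveAsTextFile(s"/data/out/$date") // hypothetical output path
    }

    sc.stop()
  }
}
```

Submitting jobs from several threads against one SparkContext is supported by Spark's scheduler. How many jobs actually run at the same time depends on the parallel collection's thread pool and on the resources available in the cluster.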

Re: Queue independent jobs

2015-01-09 Thread Anders Arpteg
Awesome, it actually seems to work. Amazing how simple it can be sometimes... Thanks Sean!
On Fri, Jan 9, 2015 at 12:42 PM, Sean Owen so...@cloudera.com wrote:
You can parallelize on the driver side. The way to do it is almost exactly what you have here, where you're iterating over a local

Queue independent jobs

2015-01-09 Thread Anders Arpteg
Hey, Let's say we have multiple independent jobs that each transform some data and store it in distinct HDFS locations. Is there a nice way to run them in parallel? See the following pseudo code snippet: dateList.map(date => sc.hdfsFile(date).map(transform).saveAsHadoopFile(date)) It's unfortunate