Re: PySpark concurrent jobs using single SparkContext

2015-08-21 Thread Hemant Bhanawat
It seems like you want simultaneous processing of multiple jobs but, at the same time, serial execution of a few tasks within those jobs. I don't know how to achieve that in Spark. But why would you bother about the interleaved processing when the data being aggregated in the different jobs is per…
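If the goal really is "jobs for different customers run concurrently, but each customer's own jobs run in order", Spark's fair scheduler gets close: with spark.scheduler.mode=FAIR, jobs submitted from threads that set different scheduler pools share the cluster, while jobs inside a single pool are scheduled FIFO by default. A minimal sketch of that idea follows; the pool names and the placeholder work inside the loop are hypothetical, not anything from the original thread:

```python
import threading
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("pooled-jobs").set("spark.scheduler.mode", "FAIR")
sc = SparkContext(conf=conf)

def run_pool(customer, days):
    # All jobs submitted from this thread land in one pool; within a pool the
    # default scheduling is FIFO, so this customer's days run in order while
    # other customers' pools make progress concurrently.
    sc.setLocalProperty("spark.scheduler.pool", "customer-" + customer)
    for day in days:
        rdd = sc.parallelize(range(1000))            # placeholder for the real per-day job
        rdd.map(lambda x: (customer, x % 10)).countByKey()

threads = [threading.Thread(target=run_pool, args=(c, ["day1", "day2"]))
           for c in ("acme", "globex")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```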

PySpark concurrent jobs using single SparkContext

2015-08-20 Thread Mike Sukmanowsky
Hi all, We're using Spark 1.3.0 on a small YARN cluster to do some log processing. The jobs are pretty simple: for a number of customers and a number of days, fetch some event log data, build aggregates, and store those aggregates in a data store. The way our script is written right now does…
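Since the per-customer, per-day work described above is independent, one common pattern is to submit each job from its own thread against the single shared SparkContext; Spark's scheduler is thread-safe, so the jobs can run concurrently on the cluster. A minimal sketch under those assumptions follows; the HDFS paths and aggregation logic are placeholders, not the original poster's code:

```python
import threading
from pyspark import SparkConf, SparkContext

sc = SparkContext(conf=SparkConf().setAppName("log-aggregates"))

def aggregate_day(customer, day):
    # One Spark job per (customer, day): read the raw events, count per event
    # type, and write the aggregate back out. Paths are hypothetical.
    events = sc.textFile("hdfs:///logs/%s/%s" % (customer, day))
    counts = (events
              .map(lambda line: (line.split("\t")[0], 1))
              .reduceByKey(lambda a, b: a + b))
    counts.saveAsTextFile("hdfs:///aggregates/%s/%s" % (customer, day))

jobs = [("acme", "2015-08-19"), ("acme", "2015-08-20"), ("globex", "2015-08-20")]
threads = [threading.Thread(target=aggregate_day, args=job) for job in jobs]
for t in threads:
    t.start()
for t in threads:
    t.join()
```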