Re: Parallel actions from driver
This is exactly my case as well, and it worked. Thanks, Sean.

On 26 March 2015 at 23:35, Sean Owen so...@cloudera.com wrote:
> You can do this much more simply, I think, with Scala's parallel
> collections (try .par). [...]

--
RGRDZ Harut
Re: Parallel actions from driver
You can do this much more simply, I think, with Scala's parallel collections (try .par).

There's nothing wrong with doing this, no. Here, something is getting caught in your closure, maybe unintentionally, that's not serializable. It's not directly related to the parallelism.

On Thu, Mar 26, 2015 at 3:54 PM, Aram Mkrtchyan aram.mkrtchyan...@gmail.com wrote:
> Hi. I'm trying to trigger DataFrame's save method in parallel from my
> driver. [...]
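A minimal sketch of the .par suggestion, assuming Scala 2.12 or earlier, where parallel collections ship with the standard library (on 2.13+ they moved to the separate scala-parallel-collections module). The save call is replaced by a hypothetical thread-safe stand-in, `saveStub`, so the sketch runs without Spark; in the poster's code the body of the foreach would be the filter-and-save logic instead.

```scala
import java.util.concurrent.ConcurrentLinkedQueue

// Thread-safe sink standing in for the real saveAsParquetFile call.
val saved = new ConcurrentLinkedQueue[String]()
def saveStub(path: String): Unit = saved.add(path)

// .par runs each element on a shared ForkJoinPool thread, so each
// iteration (and any Spark action inside it) executes concurrently,
// with no ExecutorService or Runnable boilerplate.
Seq(1, 2, 3).par.foreach { t =>
  saveStub(s"$t/common")
}
```

Note that .par uses a shared default pool; if the jobs are long-running and you want more concurrency than available cores, an explicit pool is still the way to go.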
Parallel actions from driver
Hi. I'm trying to trigger DataFrame's save method in parallel from my driver. For that purpose I use an ExecutorService and Futures; here's my code:

    val futures = Seq(1, 2, 3).map(t => pool.submit(new Runnable {
      override def run(): Unit = {
        val commons = events.filter(_._1 == t).map(_._2.common)
        saveAsParquetFile(sqlContext, commons, s"$t/common")
        EventTypes.all.foreach { et =>
          val eventData = events
            .filter(ev => ev._1 == t && ev._2.eventType == et)
            .map(_._2.data)
          saveAsParquetFile(sqlContext, eventData, s"$t/$et")
        }
      }
    }))
    futures.foreach(_.get)

It throws a "Task not serializable" exception. Is it legal to use threads in the driver to trigger actions?
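The "Task not serializable" error typically means a closure passed to an RDD operation drags in the enclosing driver-side object. A minimal sketch of the mechanism, with no Spark involved and hypothetical names throughout (`Driver`, `isSerializable`): Java serialization, which Spark uses for closures, is applied directly to show which closures would survive being shipped to executors.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for a driver-side class that is NOT Serializable,
// like the outer object whose fields a filter closure might touch.
class Driver {
  val tag = "common"

  // Referencing the field `tag` makes this closure capture `this` (the
  // whole Driver), so serializing it fails -- the same failure Spark
  // reports as "Task not serializable".
  val capturesOuter: Int => String = t => s"$t/$tag"

  // Copying the field into a local first means the closure captures only
  // a String, which serializes fine.
  val capturesLocal: Int => String = {
    val localTag = tag
    t => s"$t/$localTag"
  }
}

// True if Java serialization accepts obj (i.e. Spark could ship it).
def isSerializable(obj: AnyRef): Boolean =
  try {
    new ObjectOutputStream(new ByteArrayOutputStream).writeObject(obj)
    true
  } catch {
    case _: NotSerializableException => false
  }
```

So the fix is usually not about the threads at all: copy whatever the closure needs into local vals before the RDD operation, or make the enclosing class Serializable.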