This is exactly my case as well; it worked, thanks Sean.

On 26 March 2015 at 23:35, Sean Owen <so...@cloudera.com> wrote:
> You can do this much more simply, I think, with Scala's parallel
> collections (try .par). There's nothing wrong with doing this, no.
>
> Here, something is getting caught in your closure, maybe
> unintentionally, that's not serializable. It's not directly related to
> the parallelism.
>
> On Thu, Mar 26, 2015 at 3:54 PM, Aram Mkrtchyan
> <aram.mkrtchyan...@gmail.com> wrote:
> > Hi,
> >
> > I'm trying to trigger DataFrame's save method in parallel from my driver.
> > For that purpose I use an ExecutorService and Futures; here's my code:
> >
> > val futures = Seq(1, 2, 3).map(t => pool.submit(new Runnable {
> >   override def run(): Unit = {
> >     val commons = events.filter(_._1 == t).map(_._2.common)
> >     saveAsParquetFile(sqlContext, commons, s"$t/common")
> >     EventTypes.all.foreach { et =>
> >       val eventData = events
> >         .filter(ev => ev._1 == t && ev._2.eventType == et)
> >         .map(_._2.data)
> >       saveAsParquetFile(sqlContext, eventData, s"$t/$et")
> >     }
> >   }
> > }))
> > futures.foreach(_.get)
> >
> > It throws a "Task is not Serializable" exception. Is it legal to use
> > threads in the driver to trigger actions?

--
RGRDZ Harut
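Sean's `.par` suggestion can be sketched as below. This is a minimal, Spark-free illustration, not the original poster's code: the `saveAsParquetFile` call is replaced by a placeholder, and it assumes Scala 2.11/2.12, where parallel collections ship with the standard library (on 2.13+ they live in the separate scala-parallel-collections module):

```scala
object ParSaveSketch {
  def main(args: Array[String]): Unit = {
    // .par converts the Seq into a ParSeq; map and foreach then process
    // its elements concurrently on the common fork-join pool, so each
    // element's body runs in its own worker thread.
    val results = Seq(1, 2, 3).par.map { t =>
      // In the original post this is where the driver would trigger a
      // Spark action such as saveAsParquetFile(sqlContext, commons,
      // s"$t/common"); here we just record which thread handled t.
      s"partition $t handled by ${Thread.currentThread.getName}"
    }
    results.toList.foreach(println)
  }
}
```

Note that `.par` only parallelizes the driver-side loop; any closure that triggers a Spark action is still serialized and shipped to executors, so the same rule applies either way: everything the closure captures must be serializable.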