Re: Parallel actions from driver

2015-03-27 Thread Harut Martirosyan
This was exactly my case as well; it worked. Thanks, Sean.

On 26 March 2015 at 23:35, Sean Owen so...@cloudera.com wrote:

 You can do this much more simply, I think, with Scala's parallel
 collections (try .par). There's nothing wrong with doing this, no.

 Here, something is getting caught in your closure, maybe
 unintentionally, that's not serializable. It's not directly related to
 the parallelism.

 On Thu, Mar 26, 2015 at 3:54 PM, Aram Mkrtchyan
 aram.mkrtchyan...@gmail.com wrote:
  Hi.
 
  I'm trying to trigger DataFrame's save method in parallel from my driver.
  For that purposes I use ExecutorService and Futures, here's my code:
 
 
  val futures = Seq(1, 2, 3).map(t => pool.submit(new Runnable {
 
    override def run(): Unit = {
      val commons = events.filter(_._1 == t).map(_._2.common)
      saveAsParquetFile(sqlContext, commons, s"$t/common")
      EventTypes.all.foreach { et =>
        val eventData = events.filter(ev => ev._1 == t && ev._2.eventType == et)
          .map(_._2.data)
        saveAsParquetFile(sqlContext, eventData, s"$t/$et")
      }
    }
 
  }))
  futures.foreach(_.get)
 
  It throws a "Task not serializable" exception. Is it legal to use threads
  in the driver to trigger actions?

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




-- 
RGRDZ Harut


Re: Parallel actions from driver

2015-03-26 Thread Sean Owen
You can do this much more simply, I think, with Scala's parallel
collections (try .par). There's nothing wrong with doing this, no.

Here, something is getting caught in your closure, maybe
unintentionally, that's not serializable. It's not directly related to
the parallelism.
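
A minimal sketch of the driver-side parallelism being discussed (no Spark required; `fakeSave` is a stand-in for the blocking saveAsParquetFile call, not a real API). Sean's `.par` suggestion gets the same concurrency with just `Seq(1, 2, 3).par.map(...)`; the sketch below uses standard Futures instead, which behave the same way and, on Scala 2.13+, avoid needing the separate scala-parallel-collections module:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object ParallelSaves {
  // Placeholder for a blocking Spark action such as saveAsParquetFile(...);
  // it returns the path it "wrote" so the result can be inspected.
  def fakeSave(path: String): String = {
    Thread.sleep(20) // simulate I/O
    path
  }

  // Launch one save per key on the default thread pool, then wait for all,
  // mirroring pool.submit(...) plus futures.foreach(_.get) in the question.
  def saveAll(keys: Seq[Int]): Seq[String] = {
    val futures = keys.map(t => Future(fakeSave(s"$t/common")))
    futures.map(f => Await.result(f, Duration.Inf))
  }
}
```

Either way, the driver threads only parallelize job submission; Spark still schedules the actual work across the cluster.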

On Thu, Mar 26, 2015 at 3:54 PM, Aram Mkrtchyan
aram.mkrtchyan...@gmail.com wrote:
 Hi.

 I'm trying to trigger DataFrame's save method in parallel from my driver.
 For that purpose I use an ExecutorService and Futures; here's my code:


 val futures = Seq(1, 2, 3).map(t => pool.submit(new Runnable {

   override def run(): Unit = {
     val commons = events.filter(_._1 == t).map(_._2.common)
     saveAsParquetFile(sqlContext, commons, s"$t/common")
     EventTypes.all.foreach { et =>
       val eventData = events.filter(ev => ev._1 == t && ev._2.eventType == et)
         .map(_._2.data)
       saveAsParquetFile(sqlContext, eventData, s"$t/$et")
     }
   }

 }))
 futures.foreach(_.get)

 It throws a "Task not serializable" exception. Is it legal to use threads
 in the driver to trigger actions?

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Parallel actions from driver

2015-03-26 Thread Aram Mkrtchyan
Hi.

I'm trying to trigger DataFrame's save method in parallel from my driver.
For that purpose I use an ExecutorService and Futures; here's my code:


val futures = Seq(1, 2, 3).map(t => pool.submit(new Runnable {

  override def run(): Unit = {
    val commons = events.filter(_._1 == t).map(_._2.common)
    saveAsParquetFile(sqlContext, commons, s"$t/common")
    EventTypes.all.foreach { et =>
      val eventData = events.filter(ev => ev._1 == t && ev._2.eventType == et)
        .map(_._2.data)
      saveAsParquetFile(sqlContext, eventData, s"$t/$et")
    }
  }

}))
futures.foreach(_.get)

It throws a "Task not serializable" exception. Is it legal to use threads
in the driver to trigger actions?
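
For what it's worth, the "Task not serializable" failure can be reproduced without Spark: Java serialization of a closure that touches an instance field drags the enclosing object along with it. The names below (`Driver`, `badClosure`, `goodClosure`) are purely illustrative; the usual fix is to copy the needed fields into local vals before building the closure:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object ClosureCapture {
  // Stands in for driver-side state that is not serializable
  // (in this thread, the SQLContext held by the enclosing class).
  class Driver {
    val tag = "driver-state"

    // The lambda reads the field `tag`, i.e. this.tag, so it captures
    // `this` -- and Driver does not extend Serializable.
    def badClosure: Int => String = x => s"$tag/$x"

    // Copying the field into a local val first means the lambda
    // captures only a String, which serializes fine.
    def goodClosure: Int => String = {
      val t = tag
      x => s"$t/$x"
    }
  }

  // True if Java serialization of obj succeeds, false if a
  // NotSerializableException is thrown -- the same check Spark
  // effectively performs when shipping a task.
  def serializes(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    }
}
```

Spark's closure cleaner catches some of these captures automatically, but copying fields to locals (or marking driver-only fields `@transient`) is the reliable fix.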