Re: Spark driver main thread hanging after SQL insert

2015-01-02 Thread Alessandro Baretta
Patrick,

Sure. I was interested in knowing whether anyone had experienced a similar issue
and whether there was any known workaround. Anyway, I will report it on JIRA.

Alex
On Jan 2, 2015 9:13 AM, Patrick Wendell pwend...@gmail.com wrote:

 Hi Alessandro,

 Can you create a JIRA for this rather than reporting it on the dev
 list? That's where we track issues like this. Thanks!.

 - Patrick

 On Wed, Dec 31, 2014 at 8:48 PM, Alessandro Baretta
 alexbare...@gmail.com wrote:
  Here's what the console shows:
 
  15/01/01 01:12:29 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 58.0,
  whose tasks have all completed, from pool
  15/01/01 01:12:29 INFO scheduler.DAGScheduler: Stage 58 (runJob at
  ParquetTableOperations.scala:326) finished in 5493.549 s
  15/01/01 01:12:29 INFO scheduler.DAGScheduler: Job 41 finished: runJob at
  ParquetTableOperations.scala:326, took 5493.747061 s
 
  It is now 01:40:03, so the driver has been hanging for the last 28 minutes.
  The web UI, on the other hand, shows that all tasks completed successfully,
  and the output directory has been populated--although the _SUCCESS file is
  missing.
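 
  A thread dump shows exactly which frame a stuck call is blocked in. The
  helper below is a minimal sketch, not part of the original report; it
  assumes it runs inside the same driver JVM:
 
    import scala.collection.JavaConverters._
 
    // Print every live JVM thread with its current stack, so a blocked
    // insertInto call shows up together with the frame it is waiting in.
    def dumpDriverThreads(): Unit =
      Thread.getAllStackTraces.asScala.foreach { case (thread, frames) =>
        println(s"--- ${thread.getName} (${thread.getState})")
        frames.foreach(frame => println(s"    at $frame"))
      }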
 
  It is worth noting that my code started this job in its own thread. The
  actual code looks like the following snippet, modulo some simplifications.
 
  def save_to_parquet(allowExisting: Boolean = false) = {
    // Kick off one insert per table, each in its own thread.
    val threads = tables.map(table => {
      val thread = new Thread {
        override def run(): Unit = {
          table.insertInto(table.table_name)
        }
      }
      thread.start()
      thread
    })
    // Wait for every insert thread to finish.
    threads.foreach(_.join())
  }
 
  As far as I can see the insertInto call never returns. Any idea why?
 
  Alex
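
For reference, the same fan-out can be written with Futures and a bounded
wait, so a hang surfaces as a TimeoutException rather than a join that never
returns. This is only a sketch, assuming the same tables collection and the
insertInto/table_name members from the snippet above:

    import scala.concurrent.{Await, Future}
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration._

    def save_to_parquet_with_timeout() = {
      // One Future per table, mirroring the thread-per-table snippet above.
      val inserts = tables.map(table => Future {
        table.insertInto(table.table_name)
      })
      // A stuck insert now fails with a TimeoutException instead of
      // blocking forever on join().
      inserts.foreach(insert => Await.result(insert, 30.minutes))
    }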


