Here's what the console shows:

15/01/01 01:12:29 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 58.0,
whose tasks have all completed, from pool
15/01/01 01:12:29 INFO scheduler.DAGScheduler: Stage 58 (runJob at
ParquetTableOperations.scala:326) finished in 5493.549 s
15/01/01 01:12:29 INFO scheduler.DAGScheduler: Job 41 finished: runJob at
ParquetTableOperations.scala:326, took 5493.747061 s

It is now 01:40:03, so the driver has been hanging for the last 28 minutes.
The web UI on the other hand shows that all tasks completed successfully,
and the output directory has been populated--although the _SUCCESS file is
missing.

It is worth noting that my code started this job as its own thread. The
actual code looks like the following snippet, modulo some simplifications.

  def save_to_parquet(allowExisting : Boolean = false) = {
    val threads = tables.map(table => {
      val thread = new Thread {
        override def run {
          table.insertInto(t.table_name)
        }
      }
      thread.start
      thread
    })
    threads.foreach(_.join)
  }

As far as I can see the insertInto call never returns. Any idea why?

Alex

Reply via email to