[jira] [Updated] (SPARK-20415) SPARK job hangs while writing DataFrame to HDFS

2017-05-09 Thread Hyukjin Kwon (JIRA)

 [ https://issues.apache.org/jira/browse/SPARK-20415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-20415:
-
Priority: Major  (was: Blocker)

I am lowering the priority to {{Major}}, since priorities above {{Major}} are
usually reserved for committers, and this one does not appear to have been set
by a committer.

> SPARK job hangs while writing DataFrame to HDFS
> ---
>
> Key: SPARK-20415
> URL: https://issues.apache.org/jira/browse/SPARK-20415
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, YARN
>Affects Versions: 2.1.0
> Environment: EMR 5.4.0
>Reporter: P K
>
> We are in the POC phase with Spark. One of the steps reads compressed JSON
> files that come from our sources, "explodes" them into tabular format, and
> then writes them to HDFS. This worked for about three weeks until a few days
> ago, when, for one particular dataset, the writer started to hang. I logged
> in to the worker machines and saw this stack trace:
> "Executor task launch worker-0" #39 daemon prio=5 os_prio=0 
> tid=0x7f6210352800 nid=0x4542 runnable [0x7f61f52b3000]
>java.lang.Thread.State: RUNNABLE
> at org.apache.spark.unsafe.Platform.copyMemory(Platform.java:210)
> at 
> org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.writeToMemory(UnsafeArrayData.java:311)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply6_2$(Unknown
>  Source)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source)
> at 
> org.apache.spark.sql.execution.GenerateExec$$anonfun$doExecute$1$$anonfun$apply$9.apply(GenerateExec.scala:111)
> at 
> org.apache.spark.sql.execution.GenerateExec$$anonfun$doExecute$1$$anonfun$apply$9.apply(GenerateExec.scala:109)
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
> at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
> at scala.collection.Iterator$JoinIterator.hasNext(Iterator.scala:211)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
> at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
> at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:243)
> at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:190)
> at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:188)
> at 
> org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1341)
> at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:193)
> at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:129)
> at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:128)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> at org.apache.spark.scheduler.Task.run(Task.scala:99)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> The last messages printed to stderr before the hang are:
> 17/04/18 01:41:14 INFO DAGScheduler: Final stage: ResultStage 4 (save at NativeMethodAccessorImpl.java:0)
> 17/04/18 01:41:14 INFO DAGScheduler: Parents of final stage: List()
> 17/04/18 01:41:14 INFO DAGScheduler: Missing parents: List()
> 17/04/18 01:41:14 INFO DAGScheduler: Submitting ResultStage 4 (MapPartitionsRDD[31] at save at NativeMethodAccessorImpl.java:0), which has no missing parents
> 17/04/18 01:41:14 INFO MemoryStore: Block broadcast_9 stored as values in memory (estimated size 170.5 KB, free 2.2 GB)
> 17/04/18 01:41:14 INFO MemoryStore: Block broadcast_9_piece0 stored as bytes in me
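
For context, here is a minimal PySpark sketch of the kind of job the reporter
describes: read compressed JSON, "explode" nested arrays into rows, and write
the result out to HDFS. The input path, column names, and output format are
assumptions for illustration only; the report does not include the actual job.
(A thread dump like the one above can be taken on a worker with the standard
JDK tool {{jstack <pid>}}.)

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode

spark = SparkSession.builder.appName("explode-poc").getOrCreate()

# Spark infers the compression codec of JSON input from the file
# extension; the path and schema here are hypothetical.
raw = spark.read.json("hdfs:///poc/incoming/*.json.gz")

# explode() emits one output row per array element. Generator
# expressions like this are executed by GenerateExec, the operator
# that appears in the stack trace above.
flat = raw.select(col("id"), explode(col("records")).alias("record"))

# The write is the step that reportedly hangs.
flat.write.mode("overwrite").parquet("hdfs:///poc/output/records")
{code}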

[jira] [Updated] (SPARK-20415) SPARK job hangs while writing DataFrame to HDFS

2017-04-20 Thread P K (JIRA)


P K updated SPARK-20415:

Environment: EMR 5.4.0  (was: Latest EMR)


[jira] [Updated] (SPARK-20415) SPARK job hangs while writing DataFrame to HDFS

2017-04-20 Thread P K (JIRA)


P K updated SPARK-20415:

Summary: SPARK job hangs while writing DataFrame to HDFS  (was: SPARK job hangs while writing table to HDFS)
