[jira] [Updated] (SPARK-20415) SPARK job hangs while writing DataFrame to HDFS
[ https://issues.apache.org/jira/browse/SPARK-20415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-20415: - Priority: Major (was: Blocker) I am lowering the priority to {{Major}} as higher priorities over this are usually reserved for committers and I guess this was not set by a committer. > SPARK job hangs while writing DataFrame to HDFS > --- > > Key: SPARK-20415 > URL: https://issues.apache.org/jira/browse/SPARK-20415 > Project: Spark > Issue Type: Bug > Components: PySpark, YARN >Affects Versions: 2.1.0 > Environment: EMR 5.4.0 >Reporter: P K > > We are in POC phase with Spark. One of the Steps is reading compressed json > files that come from sources, "explode" them into tabular format and then > write them to HDFS. This worked for about three weeks until a few days ago, > for a particular dataset, the writer just hangs. I logged in to the worker > machines and see this stack trace: > "Executor task launch worker-0" #39 daemon prio=5 os_prio=0 > tid=0x7f6210352800 nid=0x4542 runnable [0x7f61f52b3000] >java.lang.Thread.State: RUNNABLE > at org.apache.spark.unsafe.Platform.copyMemory(Platform.java:210) > at > org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.writeToMemory(UnsafeArrayData.java:311) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply6_2$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown > Source) > at > org.apache.spark.sql.execution.GenerateExec$$anonfun$doExecute$1$$anonfun$apply$9.apply(GenerateExec.scala:111) > at > org.apache.spark.sql.execution.GenerateExec$$anonfun$doExecute$1$$anonfun$apply$9.apply(GenerateExec.scala:109) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377) > at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439) > at scala.collection.Iterator$JoinIterator.hasNext(Iterator.scala:211) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:243) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:190) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:188) > at > org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1341) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:193) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:129) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:128) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > at org.apache.spark.scheduler.Task.run(Task.scala:99) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > The last messages ever printed in stderr before the hang are: > 17/04/18 01:41:14 INFO DAGScheduler: Final stage: ResultStage 4 (save at > NativeMethodAccessorImpl.java:0) > 17/04/18 01:41:14 INFO DAGScheduler: Parents of final stage: List() > 17/04/18 01:41:14 INFO DAGScheduler: Missing parents: List() > 17/04/18 01:41:14 INFO DAGScheduler: Submitting ResultStage 4 > (MapPartitionsRDD[31] at save at NativeMethodAccessorImpl.java:0), which has > no missing parents > 17/04/18 01:41:14 INFO MemoryStore: Block broadcast_9 stored as values in > me
[jira] [Updated] (SPARK-20415) SPARK job hangs while writing DataFrame to HDFS
[ https://issues.apache.org/jira/browse/SPARK-20415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] P K updated SPARK-20415: Environment: EMR 5.4.0 (was: Latest EMR) > SPARK job hangs while writing DataFrame to HDFS > --- > > Key: SPARK-20415 > URL: https://issues.apache.org/jira/browse/SPARK-20415 > Project: Spark > Issue Type: Bug > Components: PySpark, YARN >Affects Versions: 2.1.0 > Environment: EMR 5.4.0 >Reporter: P K >Priority: Blocker > > We are in POC phase with Spark. One of the Steps is reading compressed json > files that come from sources, "explode" them into tabular format and then > write them to HDFS. This worked for about three weeks until a few days ago, > for a particular dataset, the writer just hangs. I logged in to the worker > machines and see this stack trace: > "Executor task launch worker-0" #39 daemon prio=5 os_prio=0 > tid=0x7f6210352800 nid=0x4542 runnable [0x7f61f52b3000] >java.lang.Thread.State: RUNNABLE > at org.apache.spark.unsafe.Platform.copyMemory(Platform.java:210) > at > org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.writeToMemory(UnsafeArrayData.java:311) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply6_2$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown > Source) > at > org.apache.spark.sql.execution.GenerateExec$$anonfun$doExecute$1$$anonfun$apply$9.apply(GenerateExec.scala:111) > at > org.apache.spark.sql.execution.GenerateExec$$anonfun$doExecute$1$$anonfun$apply$9.apply(GenerateExec.scala:109) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377) > at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439) > at scala.collection.Iterator$JoinIterator.hasNext(Iterator.scala:211) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:243) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:190) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:188) > at > org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1341) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:193) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:129) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:128) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > at org.apache.spark.scheduler.Task.run(Task.scala:99) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > The last messages ever printed in stderr before the hang are: > 17/04/18 01:41:14 INFO DAGScheduler: Final stage: ResultStage 4 (save at > NativeMethodAccessorImpl.java:0) > 17/04/18 01:41:14 INFO DAGScheduler: Parents of final stage: List() > 17/04/18 01:41:14 INFO DAGScheduler: Missing parents: List() > 17/04/18 01:41:14 INFO DAGScheduler: Submitting ResultStage 4 > (MapPartitionsRDD[31] at save at NativeMethodAccessorImpl.java:0), which has > no missing parents > 17/04/18 01:41:14 INFO MemoryStore: Block broadcast_9 stored as values in > memory (estimated size 170.5 KB, free 2.2 GB) > 17/04/18 01:41:14 INFO MemoryStore: Block broadcast_9_piece0 stored as bytes > in me
[jira] [Updated] (SPARK-20415) SPARK job hangs while writing DataFrame to HDFS
[ https://issues.apache.org/jira/browse/SPARK-20415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] P K updated SPARK-20415: Summary: SPARK job hangs while writing DataFrame to HDFS (was: SPARK job hangs while writing table to HDFS) > SPARK job hangs while writing DataFrame to HDFS > --- > > Key: SPARK-20415 > URL: https://issues.apache.org/jira/browse/SPARK-20415 > Project: Spark > Issue Type: Bug > Components: PySpark, YARN >Affects Versions: 2.1.0 > Environment: Latest EMR >Reporter: P K >Priority: Blocker > > We are in POC phase with Spark. One of the Steps is reading compressed json > files that come from sources, "explode" them into tabular format and then > write them to HDFS. This worked for about three weeks until a few days ago, > for a particular dataset, the writer just hangs. I logged in to the worker > machines and see this stack trace: > "Executor task launch worker-0" #39 daemon prio=5 os_prio=0 > tid=0x7f6210352800 nid=0x4542 runnable [0x7f61f52b3000] >java.lang.Thread.State: RUNNABLE > at org.apache.spark.unsafe.Platform.copyMemory(Platform.java:210) > at > org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.writeToMemory(UnsafeArrayData.java:311) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply6_2$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown > Source) > at > org.apache.spark.sql.execution.GenerateExec$$anonfun$doExecute$1$$anonfun$apply$9.apply(GenerateExec.scala:111) > at > org.apache.spark.sql.execution.GenerateExec$$anonfun$doExecute$1$$anonfun$apply$9.apply(GenerateExec.scala:109) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377) > at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439) > at scala.collection.Iterator$JoinIterator.hasNext(Iterator.scala:211) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:243) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:190) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:188) > at > org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1341) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:193) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:129) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:128) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > at org.apache.spark.scheduler.Task.run(Task.scala:99) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > The last messages ever printed in stderr before the hang are: > 17/04/18 01:41:14 INFO DAGScheduler: Final stage: ResultStage 4 (save at > NativeMethodAccessorImpl.java:0) > 17/04/18 01:41:14 INFO DAGScheduler: Parents of final stage: List() > 17/04/18 01:41:14 INFO DAGScheduler: Missing parents: List() > 17/04/18 01:41:14 INFO DAGScheduler: Submitting ResultStage 4 > (MapPartitionsRDD[31] at save at NativeMethodAccessorImpl.java:0), which has > no missing parents > 17/04/18 01:41:14 INFO MemoryStore: Block broadcast_9 stored as values in > memory (estimated size 170.5 KB, free 2.2 GB) > 17/04/18 01:41:1