[jira] [Commented] (SPARK-26116) Spark SQL - Sort when writing partitioned parquet leads to OOM errors
[ https://issues.apache.org/jira/browse/SPARK-26116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16696526#comment-16696526 ]

Pierre Lienhart commented on SPARK-26116:
------------------------------------------

I just enhanced the ticket description.

> Spark SQL - Sort when writing partitioned parquet leads to OOM errors
> ----------------------------------------------------------------------
>
>                 Key: SPARK-26116
>                 URL: https://issues.apache.org/jira/browse/SPARK-26116
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.1
>            Reporter: Pierre Lienhart
>            Priority: Major
>
> When writing partitioned parquet using {{partitionBy}}, it looks like Spark sorts each partition before writing, but this sort consumes a huge amount of memory compared to the size of the data. The executors can then go OOM and get killed by YARN. As a consequence, it also forces one to provision huge amounts of memory compared to the size of the data to be written.
> Error messages found in the Spark UI are like the following:
> {code:java}
> Spark UI description of failure : Job aborted due to stage failure: Task 169 in stage 2.0 failed 1 times, most recent failure: Lost task 169.0 in stage 2.0 (TID 98, x.xx.x.xx, executor 1): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 8.1 GB of 8 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
> {code}
>
> {code:java}
> Job aborted due to stage failure: Task 66 in stage 4.0 failed 1 times, most recent failure: Lost task 66.0 in stage 4.0 (TID 56, xxx.x.x.xx, executor 1): org.apache.spark.SparkException: Task failed while writing rows
>     at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:204)
>     at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:129)
>     at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:128)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>     at org.apache.spark.scheduler.Task.run(Task.scala:99)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.OutOfMemoryError: error while calling spill() on org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@75194804 : /app/hadoop/yarn/local/usercache/at053351/appcache/application_1537536072724_17039/blockmgr-a4ba7d59-e780-4385-99b4-a4c4fe95a1ec/25/temp_local_a542a412-5845-45d2-9302-bbf5ee4113ad (No such file or directory)
>     at org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:188)
>     at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:254)
>     at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:92)
>     at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.growPointerArrayIfNecessary(UnsafeExternalSorter.java:347)
>     at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertKVRecord(UnsafeExternalSorter.java:425)
>     at org.apache.spark.sql.execution.UnsafeKVExternalSorter.insertKV(UnsafeKVExternalSorter.java:160)
>     at org.apache.spark.sql.execution.datasources.FileFormatWriter$DynamicPartitionWriteTask.execute(FileFormatWriter.scala:364)
>     at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:190)
>     at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:188)
>     at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1353)
>     at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:193)
>     ... 8 more
> {code}
>
> In the stderr logs, we can see that a huge amount of sort data (the partition being sorted here is 250 MB when persisted into memory, deserialized) is being spilled to disk ({{INFO UnsafeExternalSorter: Thread 155 spilling sort data of 3.6 GB to disk}}). Sometimes the
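For reference, the failing jobs boil down to a write of the following shape. This is a minimal sketch: the input path, output path, and partition column are hypothetical, not taken from the report.

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("partitioned-parquet-write").getOrCreate()

// Hypothetical input; any DataFrame written with a partition column
// exercises the same code path.
val df = spark.read.parquet("/data/events/input")

// partitionBy routes the write through FileFormatWriter's dynamic-partition
// path, which sorts each task's rows by partition key (the
// UnsafeKVExternalSorter visible in the stack trace above) before writing
// one file per partition value.
df.write
  .partitionBy("event_date")  // hypothetical partition column
  .mode("overwrite")
  .parquet("/data/events/output")
{code}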
[jira] [Commented] (SPARK-26116) Spark SQL - Sort when writing partitioned parquet leads to OOM errors
[ https://issues.apache.org/jira/browse/SPARK-26116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16695526#comment-16695526 ]

Hyukjin Kwon commented on SPARK-26116:
--------------------------------------

Please describe that fact in the JIRA as well.
[jira] [Commented] (SPARK-26116) Spark SQL - Sort when writing partitioned parquet leads to OOM errors
[ https://issues.apache.org/jira/browse/SPARK-26116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16694544#comment-16694544 ]

Pierre Lienhart commented on SPARK-26116:
------------------------------------------

Ok, so I started from a situation where I get the crashes described above, then increased the off-heap memory size by setting spark.yarn.executor.memoryOverhead to 4g, 8g and 16g, with the other settings unchanged. It still crashes with the same logs: the UnsafeExternalSorter keeps spilling sort data to disk while the TaskMemoryManager unsuccessfully tries to allocate more pages, until {{ERROR CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM}}. Note that the stdout of the crashed executor mentions {{java.lang.OutOfMemoryError: Java heap space}}. Same thing with {{-XX:MaxDirectMemorySize}}. I know the error message suggests increasing spark.yarn.executor.memoryOverhead, but it does not seem to help in this case (and I forgot to mention in my first message that I had already tried it).
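For concreteness, the tuning attempts described above correspond to settings of the following shape. This is a sketch only: the application name is a placeholder, the values are the ones quoted in the comment, and in practice these would usually be passed as {{--conf}} flags to {{spark-submit}} rather than set in code.

{code:scala}
import org.apache.spark.sql.SparkSession

// Sketch of the tuning attempts reported above; per the comment, none of
// them prevented the OOM. Set before SparkContext creation so executor
// JVMs launch with these values.
val spark = SparkSession.builder()
  .appName("partitioned-parquet-write")  // placeholder name
  // Tried 4g, 8g and 16g in turn:
  .config("spark.yarn.executor.memoryOverhead", "16g")
  // Value taken from the suggestion in the next comment:
  .config("spark.executor.extraJavaOptions", "-XX:MaxDirectMemorySize=4g")
  .getOrCreate()
{code}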
[jira] [Commented] (SPARK-26116) Spark SQL - Sort when writing partitioned parquet leads to OOM errors
[ https://issues.apache.org/jira/browse/SPARK-26116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16693393#comment-16693393 ]

Yuming Wang commented on SPARK-26116:
-------------------------------------

Please try setting spark.executor.memoryOverhead=6G or spark.executor.extraJavaOptions='-XX:MaxDirectMemorySize=4g'.
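A sketch of how the suggested settings could be applied (the application name is a placeholder; these are more commonly passed as {{--conf}} flags to {{spark-submit}}, and note that on Spark 2.1.x the YARN-prefixed name {{spark.yarn.executor.memoryOverhead}} is the one that applies, since {{spark.executor.memoryOverhead}} was only introduced later):

{code:scala}
import org.apache.spark.sql.SparkSession

// Sketch of the suggested tuning. Configs must be in place before the
// SparkContext is created so that executor JVMs are launched with them.
val spark = SparkSession.builder()
  .appName("partitioned-parquet-write")  // placeholder name
  // Off-heap headroom per executor (spark.yarn.executor.memoryOverhead on 2.1.x):
  .config("spark.executor.memoryOverhead", "6g")
  // Cap direct (off-heap) buffer allocations of each executor JVM:
  .config("spark.executor.extraJavaOptions", "-XX:MaxDirectMemorySize=4g")
  .getOrCreate()
{code}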