[jira] [Updated] (SPARK-22458) OutOfDirectMemoryError with Spark 2.2
[ https://issues.apache.org/jira/browse/SPARK-22458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kaushal Prajapati updated SPARK-22458: -- Description: We have been using Spark 2.1 for the last 6 months to run multiple Spark jobs, each about 15 hours long over 50+ TB of source data, successfully with the configuration below.
{quote}
spark.master yarn
spark.driver.cores 10
spark.driver.maxResultSize 5g
spark.driver.memory 20g
spark.executor.cores 5
spark.executor.extraJavaOptions -XX:+UseG1GC *-Dio.netty.maxDirectMemory=1024* -XX:MaxGCPauseMillis=6 *-XX:MaxDirectMemorySize=2048m* -Dlog4j.configuration=file:///conf/log4j.properties -Dhdp.version=2.5.3.0-37
spark.driver.extraJavaOptions *-Dio.netty.maxDirectMemory=2048 -XX:MaxDirectMemorySize=2048m* -Dlog4j.configuration=file:///conf/log4j.properties -Dhdp.version=2.5.3.0-37
spark.executor.instances 30
spark.executor.memory 30g
*spark.kryoserializer.buffer.max 512m*
spark.network.timeout 12000s
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.shuffle.io.preferDirectBufs false
spark.sql.catalogImplementation hive
spark.sql.shuffle.partitions 5000
spark.yarn.driver.memoryOverhead 1536
spark.yarn.executor.memoryOverhead 4096
spark.core.connection.ack.wait.timeout 600s
spark.scheduler.maxRegisteredResourcesWaitingTime 15s
spark.sql.hive.filesourcePartitionFileCacheSize 524288000
spark.dynamicAllocation.executorIdleTimeout 3s
spark.dynamicAllocation.enabled true
spark.hadoop.yarn.timeline-service.enabled false
spark.shuffle.service.enabled true
spark.yarn.am.extraJavaOptions *-Dhdp.version=2.5.3.0-37 -Dio.netty.maxDirectMemory=1024 -XX:MaxDirectMemorySize=1024m*
{quote}
Recently we tried upgrading from Spark 2.1 to Spark 2.2 to pick up fixes in the latest version, but we started hitting direct-buffer OutOfMemoryError failures and "exceeding memory limits for executor memoryOverhead" errors. We have tried tweaking multiple properties to fix this, but the issue persists.
The relevant information is shared below; please let me know if any other details are required.

Snapshot of the direct-memory error stack trace:
{code:java}
10:48:26.417 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 5.0 in stage 5.3 (TID 25022, dedwdprshc070.de.xxx.com, executor 615): FetchFailed(BlockManagerId(465, dedwdprshc061.de.xxx.com, 7337, None), shuffleId=7, mapId=141, reduceId=3372, message=
org.apache.spark.shuffle.FetchFailedException: failed to allocate 65536 byte(s) of direct memory (used: 1073699840, max: 1073741824)
	at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:442)
	at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:418)
	at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:59)
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$2.hasNext(WholeStageCodegenExec.scala:414)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:166)
{code}
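The failed allocation in the stack trace hits a cap of exactly 1 GiB (max: 1073741824). One detail worth checking: Netty interprets the {{io.netty.maxDirectMemory}} system property in bytes, so a value like {{1024}} would set a far smaller cap than the 1024 MB presumably intended. As a purely illustrative sketch (example values, not a confirmed fix), the Netty cap could be expressed in bytes to match the JVM's {{-XX:MaxDirectMemorySize}}:
{noformat}
# Hypothetical spark-defaults.conf fragment -- 2 GiB expressed in bytes for the
# Netty property, and in MiB for the JVM flag. Values are examples only.
spark.executor.extraJavaOptions -XX:+UseG1GC -Dio.netty.maxDirectMemory=2147483648 -XX:MaxDirectMemorySize=2048m
spark.driver.extraJavaOptions   -Dio.netty.maxDirectMemory=2147483648 -XX:MaxDirectMemorySize=2048m
{noformat}
Also, since the FetchFailed references port 7337 (the default external shuffle service port), the shuffle service JVM running inside the YARN NodeManager may need its direct-memory limit raised separately; the executor-side settings above do not affect it.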