[jira] [Updated] (SPARK-22458) OutOfDirectMemoryError with Spark 2.2

2017-11-06 Thread Kaushal Prajapati (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaushal Prajapati updated SPARK-22458:
--
Description: 
We had been using Spark 2.1 for the last 6 months to run multiple Spark jobs, 
each running for 15 hours over 50+ TB of source data, successfully with the 
configuration below.


{quote}
spark.master                                      yarn
spark.driver.cores                                10
spark.driver.maxResultSize                        5g
spark.driver.memory                               20g
spark.executor.cores                              5
spark.executor.extraJavaOptions                   -XX:+UseG1GC *-Dio.netty.maxDirectMemory=1024* -XX:MaxGCPauseMillis=6 *-XX:MaxDirectMemorySize=2048m* -Dlog4j.configuration=file:///conf/log4j.properties -Dhdp.version=2.5.3.0-37
spark.driver.extraJavaOptions                     *-Dio.netty.maxDirectMemory=2048 -XX:MaxDirectMemorySize=2048m* -Dlog4j.configuration=file:///conf/log4j.properties -Dhdp.version=2.5.3.0-37
spark.executor.instances                          30
spark.executor.memory                             30g
*spark.kryoserializer.buffer.max                  512m*
spark.network.timeout                             12000s
spark.serializer                                  org.apache.spark.serializer.KryoSerializer
spark.shuffle.io.preferDirectBufs                 false
spark.sql.catalogImplementation                   hive
spark.sql.shuffle.partitions                      5000
spark.yarn.driver.memoryOverhead                  1536
spark.yarn.executor.memoryOverhead                4096
spark.core.connection.ack.wait.timeout            600s
spark.scheduler.maxRegisteredResourcesWaitingTime 15s
spark.sql.hive.filesourcePartitionFileCacheSize   524288000
spark.dynamicAllocation.executorIdleTimeout       3s
spark.dynamicAllocation.enabled                   true
spark.hadoop.yarn.timeline-service.enabled        false
spark.shuffle.service.enabled                     true
spark.yarn.am.extraJavaOptions                    -Dhdp.version=2.5.3.0-37 *-Dio.netty.maxDirectMemory=1024 -XX:MaxDirectMemorySize=1024m*
{quote}
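
For quick reference, the same direct-memory-related settings can also be expressed programmatically. This is a minimal sketch, not our actual submission code: the values are copied from the listing above, the app name is a placeholder, and only the settings most relevant to this issue are shown.

{code:scala}
import org.apache.spark.sql.SparkSession

// Minimal sketch of the direct-memory-relevant settings from the listing
// above, set before the session launches any executors. The app name is
// a hypothetical placeholder.
val spark = SparkSession.builder()
  .appName("direct-memory-repro")
  .config("spark.executor.memory", "30g")
  .config("spark.yarn.executor.memoryOverhead", "4096")
  .config("spark.shuffle.io.preferDirectBufs", "false")
  .config("spark.executor.extraJavaOptions",
    "-XX:+UseG1GC -Dio.netty.maxDirectMemory=1024 " +
    "-XX:MaxDirectMemorySize=2048m")
  .getOrCreate()
{code}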


Recently we tried to upgrade from Spark 2.1 to Spark 2.2 to pick up fixes in 
the latest version. But we started facing a DirectBuffer OutOfMemoryError, and 
executors being killed for exceeding their memory limits (executor 
memoryOverhead). We tried tweaking multiple properties to work around this, 
but the issue persists. The relevant information is shared below.
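
To rule out misconfiguration, the effective limits can be checked from inside the JVM. This is a diagnostic sketch only (not from our jobs), assuming JDK 8, where sun.misc.VM is an accessible JDK-internal API:

{code:scala}
// Diagnostic sketch (assumption: JDK 8). Prints the JVM's effective
// -XX:MaxDirectMemorySize and the raw io.netty.maxDirectMemory system
// property, if set, so the two limits can be compared.
object DirectMemoryProbe {
  def main(args: Array[String]): Unit = {
    val jvmMax    = sun.misc.VM.maxDirectMemory()  // in bytes
    val nettyProp = System.getProperty("io.netty.maxDirectMemory", "<unset>")
    println(s"JVM MaxDirectMemorySize : $jvmMax bytes (${jvmMax / (1 << 20)} MiB)")
    println(s"io.netty.maxDirectMemory: $nettyProp")
  }
}
{code}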

Please let me know if any other details are required.

Snapshot of the DirectMemory error stack trace:

{code:java}
10:48:26.417 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 5.0 in 
stage 5.3 (TID 25022, dedwdprshc070.de.xxx.com, executor 615): 
FetchFailed(BlockManagerId(465, dedwdprshc061.de.xxx.com, 7337, None), 
shuffleId=7, mapId=141, reduceId=3372, message=
org.apache.spark.shuffle.FetchFailedException: failed to allocate 65536 byte(s) 
of direct memory (used: 1073699840, max: 1073741824)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:442)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:418)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:59)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at 
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown
 Source)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown
 Source)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$2.hasNext(WholeStageCodegenExec.scala:414)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at 
org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:166)
{code}
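
The figures in the message are consistent with a 1 GiB direct-memory cap being exhausted. A small worked sketch of the arithmetic, using only the values from the stack trace above:

{code:scala}
// Arithmetic from the error message: the allocator requested 64 KiB while
// the direct-memory pool had only ~41 KiB of headroom under its 1 GiB cap.
val max       = 1073741824L   // "max" from the message = 1024 * 1024 * 1024 bytes = 1 GiB
val used      = 1073699840L   // "used" from the message
val requested = 65536L        // "failed to allocate 65536 byte(s)"
val headroom  = max - used    // 41984 bytes, less than the 64 KiB requested
assert(headroom < requested)  // hence the allocation failure
println(s"headroom = $headroom bytes, requested = $requested bytes")
{code}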
