[jira] [Updated] (SPARK-18289) spark.util.collection.ExternalSorter leak memory when task force spilling in-memory map to disk
[ https://issues.apache.org/jira/browse/SPARK-18289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

KaiXu updated SPARK-18289:
--------------------------
Description:

We used BigBench to test the performance of Hive on Spark 2.0 on Intel(R) Xeon(R) CPU E5-2699 v4 (1 master + 4 workers) at a 3 TB scale factor, and found that 4 tasks failed while running q30. Checking the executor's log showed a memory leak when a task force-spills its in-memory map to disk.

Notable configuration parameters:
spark.master=yarn
spark.deploy.mode=cluster
spark.driver.memory=4g
spark.executor.cores=7
spark.executor.memory=22g
spark.yarn.executor.memoryOverhead=8192
spark.executor.instances=50
spark.storage.memoryFraction=0.01
spark.memory.useLegacyMode=true
spark.executor.extraJavaOptions=-XX:+UseParallelOldGC -XX:NewRatio=1 -XX:SurvivorRatio=1 -XX:ParallelGCThreads=4
hive.exec.reducers.bytes.per.reducer=12800
hive.exec.reducers.max=10
hive.mapjoin.smalltable.filesize=10
hive.auto.convert.join.noconditionaltask.size=10

Hive-on-Spark stage progress:
2016-11-05 18:12:22,754 Stage-0_0: 878/878 Finished Stage-1_0: 5847(+329,-1)/22488 Stage-2_0: 0/12354 Stage-3_0: 0/12354 Stage-4_0: 0/6177 Stage-5_0: 0/1
2016-11-05 18:12:23,760 Stage-0_0: 878/878 Finished Stage-1_0: 5872(+329,-1)/22488 Stage-2_0: 0/12354 Stage-3_0: 0/12354 Stage-4_0: 0/6177 Stage-5_0: 0/1
2016-11-05 18:12:24,766 Stage-0_0: 878/878 Finished Stage-1_0: 5886(+329,-1)/22488 Stage-2_0: 0/12354 Stage-3_0: 0/12354 Stage-4_0: 0/6177 Stage-5_0: 0/1
2016-11-05 18:12:25,772 Stage-0_0: 878/878 Finished Stage-1_0: 5892(+329,-1)/22488 Stage-2_0: 0/12354 Stage-3_0: 0/12354 Stage-4_0: 0/6177 Stage-5_0: 0/1
2016-11-05 18:12:32,810 Stage-0_0: 878/878 Finished Stage-1_0: 5972(+329,-2)/22488 Stage-2_0: 0/12354 Stage-3_0: 0/12354 Stage-4_0: 0/6177 Stage-5_0: 0/1
2016-11-05 18:12:33,815 Stage-0_0: 878/878 Finished Stage-1_0: 6007(+329,-2)/22488 Stage-2_0: 0/12354 Stage-3_0: 0/12354 Stage-4_0: 0/6177 Stage-5_0: 0/1
2016-11-05 18:12:34,821 Stage-0_0: 878/878 Finished Stage-1_0: 6048(+329,-2)/22488 Stage-2_0: 0/12354 Stage-3_0: 0/12354 Stage-4_0: 0/6177 Stage-5_0: 0/1
2016-11-05 18:12:35,827 Stage-0_0: 878/878 Finished Stage-1_0: 6089(+329,-2)/22488 Stage-2_0: 0/12354 Stage-3_0: 0/12354 Stage-4_0: 0/6177 Stage-5_0: 0/1
2016-11-05 18:17:16,348 Stage-0_0: 878/878 Finished Stage-1_0: 11666(+329,-3)/22488 Stage-2_0: 0/12354 Stage-3_0: 0/12354 Stage-4_0: 0/6177 Stage-5_0: 0/1
2016-11-05 18:17:17,354 Stage-0_0: 878/878 Finished Stage-1_0: 11691(+329,-3)/22488 Stage-2_0: 0/12354 Stage-3_0: 0/12354 Stage-4_0: 0/6177 Stage-5_0: 0/1
2016-11-05 18:17:18,359 Stage-0_0: 878/878 Finished Stage-1_0: 11714(+329,-3)/22488 Stage-2_0: 0/12354 Stage-3_0: 0/12354 Stage-4_0: 0/6177 Stage-5_0: 0/1
2016-11-05 18:17:19,364 Stage-0_0: 878/878 Finished Stage-1_0: 11738(+329,-3)/22488 Stage-2_0: 0/12354 Stage-3_0: 0/12354 Stage-4_0: 0/6177 Stage-5_0: 0/1
2016-11-05 18:21:03,468 Stage-0_0: 878/878 Finished Stage-1_0: 15413(+329,-4)/22488 Stage-2_0: 0/12354 Stage-3_0: 0/12354 Stage-4_0: 0/6177 Stage-5_0: 0/1
2016-11-05 18:21:04,473 Stage-0_0: 878/878 Finished Stage-1_0: 15447(+329,-4)/22488 Stage-2_0: 0/12354 Stage-3_0: 0/12354 Stage-4_0: 0/6177 Stage-5_0: 0/1
2016-11-05 18:21:05,477 Stage-0_0: 878/878 Finished Stage-1_0: 15479(+329,-4)/22488 Stage-2_0: 0/12354 Stage-3_0: 0/12354 Stage-4_0: 0/6177 Stage-5_0: 0/1
2016-11-05 18:21:06,482 Stage-0_0: 878/878 Finished Stage-1_0: 15511(+329,-4)/22488 Stage-2_0: 0/12354 Stage-3_0: 0/12354 Stage-4_0: 0/6177 Stage-5_0: 0/1

Failed task log:
java.lang.NullPointerException
	at org.apache.spark.util.collection.ExternalSorter$SpillReader.cleanup(ExternalSorter.scala:625)
	at org.apache.spark.util.collection.ExternalSorter$SpillReader.nextBatchStream(ExternalSorter.scala:540)
	at org.apache.spark.util.collection.ExternalSorter$SpillReader.<init>(ExternalSorter.scala:508)
	at org.apache.spark.util.collection.ExternalSorter$SpillableIterator.spill(ExternalSorter.scala:814)
	at org.apache.spark.util.collection.ExternalSorter.forceSpill(ExternalSorter.scala:254)
	at org.apache.spark.util.collection.Spillable.spill(Spillable.scala:111)
	at org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:150)
	at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:245)
	at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:92)
	at org.apache.spark.shuffle.sort.ShuffleExternalSorter.growPointerArrayIfNecessary(ShuffleExternalSorter.java:333)
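The trace shows the NullPointerException is raised inside SpillReader.cleanup while another part of the task is asking the sorter to free memory. The following is a minimal, hypothetical Java sketch of that failure mode, not Spark's actual code (SpillBatchReader and its members are invented names): a cleanup routine that unconditionally dereferences a stream field throws NPE when it runs before the stream was ever opened, whereas a null-guarded cleanup is a safe no-op.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical reader modeled loosely on a spill-file batch reader:
// `batchStream` is only assigned once the first batch is opened, so any
// cleanup that can run earlier must tolerate a null field.
class SpillBatchReader {
    private InputStream batchStream;  // remains null until openNextBatch() runs

    void openNextBatch() {
        // Stand-in for opening the next batch of a spill file on disk.
        batchStream = new ByteArrayInputStream(new byte[] {1, 2, 3});
    }

    // Buggy variant: dereferences batchStream unconditionally.
    void cleanupUnsafe() throws IOException {
        batchStream.close();          // NPE if no batch was ever opened
    }

    // Defensive variant: null-check before closing.
    void cleanupSafe() throws IOException {
        if (batchStream != null) {
            batchStream.close();
            batchStream = null;
        }
    }
}

public class SpillCleanupDemo {
    public static void main(String[] args) throws IOException {
        SpillBatchReader neverOpened = new SpillBatchReader();
        boolean npe = false;
        try {
            neverOpened.cleanupUnsafe();  // mimics the cleanup-time NPE in the trace
        } catch (NullPointerException e) {
            npe = true;
        }
        neverOpened.cleanupSafe();        // safe no-op on the same unopened reader
        System.out.println(npe);          // prints "true"
    }
}
```

The sketch only illustrates the general pattern (cleanup invoked on a partially initialized reader during a forced spill); the actual null field and call sequence in ExternalSorter.scala may differ.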
Executor's log for the first failed task (task 5741.0 in stage 1.0, TID 6619):
16/11/05 18:12:22 INFO exec.CommonJoinOperator: table 0 has 1000 rows for join key [972725]
16/11/05 18:12:22 INFO exec.CommonJoinOperator: table 0 has 2000 rows for join key [972725]
16/11/05 18:12:22 INFO exec.CommonJoinOperator: table 0 has 4000 rows for join key [972725]
16/11/05 18:12:22 INFO exec.CommonJoinOperator: table 0 has 8000 rows for join key [972725]
16/11/05 18:12:22 INFO exec.CommonJoinOperator: table 0 has 16000 rows for join key [972725]
16/11/05 18:12:22 INFO collection.ExternalSorter: Task 6619 force spilling in-memory map to disk and it will release 357.9 MB memory
16/11/05 18:12:22 WARN memory.TaskMemoryManager: leak 357.9 MB memory from org.apache.spark.util.collection.ExternalSorter@71b91fb4
16/11/05 18:12:22 ERROR executor.Executor: Exception in task 5741.0 in stage 1.0 (TID 6619)
java.lang.NullPointerException
	at org.apache.spark.util.collection.ExternalSorter$SpillReader.cleanup(ExternalSorter.scala:625)
Executor's log for the second failed task (task 5978.0 in stage 1.0, TID 6856):
16/11/05 18:12:32 INFO exec.CommonJoinOperator: table 0 has 2000 rows for join key [973089]
16/11/05 18:12:32 INFO exec.CommonJoinOperator: table 0 has 4000 rows for join key [973089]
16/11/05 18:12:32 INFO exec.CommonJoinOperator: table 0 has 8000 rows for join key [973089]
16/11/05 18:12:32 INFO exec.CommonJoinOperator: table 0 has 16000 rows for join key [973089]
16/11/05 18:12:32 INFO exec.CommonJoinOperator: table 0 has 1000 rows for join key [972962]
16/11/05 18:12:32 INFO exec.CommonJoinOperator: table 0 has 2000 rows for join key [972962]
16/11/05 18:12:32 INFO exec.CommonJoinOperator: table 0 has 4000 rows for join key [972962]
16/11/05 18:12:32 INFO exec.CommonJoinOperator: table 0 has 8000 rows for join key [972962]
16/11/05 18:12:32 INFO exec.CommonJoinOperator: table 0 has 16000 rows for join key [972962]
16/11/05 18:12:32 INFO spark.SparkRecordHandler: processed 1051034 rows: used memory = 10801674800
16/11/05 18:12:32 INFO exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0,
16/11/05 18:12:32 INFO exec.ReduceSinkOperator: RS[9]: records written - 1050990
16/11/05 18:12:32 INFO exec.ReduceSinkOperator: RECORDS_OUT_INTERMEDIATE:1050990,
16/11/05 18:12:32 INFO collection.ExternalSorter: Task 6856 force spilling in-memory map to disk and it will release 347.0 MB memory
16/11/05 18:12:32 WARN memory.TaskMemoryManager: leak 347.0 MB memory from org.apache.spark.util.collection.ExternalSorter@1bf0a514
16/11/05 18:12:32 ERROR executor.Executor: Exception in task 5978.0 in stage 1.0 (TID 6856)
java.lang.NullPointerException
	at org.apache.spark.util.collection.ExternalSorter$SpillReader.cleanup(ExternalSorter.scala:625)
	at org.apache.spark.util.collection.ExternalSorter$SpillReader.nextBatchStream(ExternalSorter.scala:540)
	at org.apache.spark.util.collection.ExternalSorter$SpillReader.<init>(ExternalSorter.scala:508)
	at org.apache.spark.util.collection.ExternalSorter$SpillableIterator.spill(ExternalSorter.scala:814)
	at org.apache.spark.util.collection.ExternalSorter.forceSpill(ExternalSorter.scala:254)
	at org.apache.spark.util.collection.Spillable.spill(Spillable.scala:111)
	at org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:150)
	at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:245)
	at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:92)
	at org.apache.spark.shuffle.sort.ShuffleExternalSorter.growPointerArrayIfNecessary(ShuffleExternalSorter.java:333)
	at org.apache.spark.shuffle.sort.ShuffleExternalSorter.insertRecord(ShuffleExternalSorter.java:384)
	at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.insertRecordIntoSorter(UnsafeShuffleWriter.java:235)
	at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:162)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
	at org.apache.spark.scheduler.Task.run(Task.scala:85)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
16/11/05 18:12:32 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 7173
16/11/05 18:12:32 INFO executor.Executor: Running task 6294.0 in stage 1.0 (TID 7173)
16/11/05 18:12:32 INFO storage.ShuffleBlockFetcherIterator: Getting 878 non-empty blocks out of 878 blocks
16/11/05 18:12:32 INFO storage.ShuffleBlockFetcherIterator: Started 46 remote fetches in 17 ms
16/11/05 18:12:39 INFO executor.Executor: Finished task
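The "leak 347.0 MB memory" WARN pairs with the exception above: acquiring execution memory triggers a cooperative spill of another consumer, and when that spill throws, the bytes the consumer had reserved are never returned to the task's pool, so they are reported as leaked. A simplified, hypothetical Java sketch of that bookkeeping follows; ToyTaskMemoryManager and all of its members are invented names, and this is not Spark's actual TaskMemoryManager logic.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of per-task execution-memory bookkeeping: acquire() may
// force registered consumers to spill; if a spill throws, the consumer's
// reservation stays on the books and is reported as a leak at task end.
class ToyTaskMemoryManager {
    interface Consumer {
        long spill() throws Exception;  // returns bytes freed
    }

    private long free;
    private final List<Consumer> consumers = new ArrayList<>();
    private final List<Long> reserved = new ArrayList<>();  // parallel to consumers

    ToyTaskMemoryManager(long poolBytes) { this.free = poolBytes; }

    void register(Consumer c, long bytes) {
        consumers.add(c);
        reserved.add(bytes);
        free -= bytes;
    }

    long acquire(long want) {
        if (free < want) {
            for (int i = 0; i < consumers.size() && free < want; i++) {
                try {
                    long released = consumers.get(i).spill();
                    reserved.set(i, reserved.get(i) - released);
                    free += released;
                } catch (Exception e) {
                    // Spill failed (e.g. the NPE in the trace): the reservation
                    // is still recorded, so these bytes later count as leaked.
                }
            }
        }
        long granted = Math.min(want, Math.max(free, 0));
        free -= granted;
        return granted;
    }

    // At task teardown, any reservation still recorded is reported as a leak.
    long leakedBytes() {
        long leaked = 0;
        for (long r : reserved) leaked += r;
        return leaked;
    }
}
```

Under this model, a consumer whose spill throws mid-acquisition leaves its full reservation behind, matching the pattern in the log where the WARN about leaked memory immediately follows the force-spill INFO for the same sorter instance.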
[jira] [Updated] (SPARK-18289) spark.util.collection.ExternalSorter leak memory when task force spilling in-memory map to disk
[ https://issues.apache.org/jira/browse/SPARK-18289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

KaiXu updated SPARK-18289:
--------------------------
    Description: 

We used BigBench to test the performance of Hive on Spark 2.0 on Intel(R) Xeon(R) CPU E5-2699 v4 (1 master + 4 workers) at a 3TB scale factor, and 4 tasks failed when running q30. Checking the executor's log, we found a memory leak when a task force-spills its in-memory map to disk.

notable configuration parameters:
spark.master=yarn
spark.deploy.mode=cluster
spark.driver.memory=4g;
spark.executor.cores=7;
spark.executor.memory=22g;
spark.yarn.executor.memoryOverhead=8192;
spark.executor.instances=50;
spark.storage.memoryFraction=0.01;
spark.memory.useLegacyMode=true;
spark.executor.extraJavaOptions=-XX:+UseParallelOldGC -XX:NewRatio=1 -XX:SurvivorRatio=1 -XX:ParallelGCThreads=4;
hive.exec.reducers.bytes.per.reducer=12800;
hive.exec.reducers.max=10;
hive.mapjoin.smalltable.filesize=10;
hive.auto.convert.join.noconditionaltask.size=10;

executor's log:
16/11/05 18:12:32 INFO exec.CommonJoinOperator: table 0 has 2000 rows for join key [973089]
16/11/05 18:12:32 INFO exec.CommonJoinOperator: table 0 has 4000 rows for join key [973089]
16/11/05 18:12:32 INFO exec.CommonJoinOperator: table 0 has 8000 rows for join key [973089]
16/11/05 18:12:32 INFO exec.CommonJoinOperator: table 0 has 16000 rows for join key [973089]
16/11/05 18:12:32 INFO exec.CommonJoinOperator: table 0 has 1000 rows for join key [972962]
16/11/05 18:12:32 INFO exec.CommonJoinOperator: table 0 has 2000 rows for join key [972962]
16/11/05 18:12:32 INFO exec.CommonJoinOperator: table 0 has 4000 rows for join key [972962]
16/11/05 18:12:32 INFO exec.CommonJoinOperator: table 0 has 8000 rows for join key [972962]
16/11/05 18:12:32 INFO exec.CommonJoinOperator: table 0 has 16000 rows for join key [972962]
16/11/05 18:12:32 INFO spark.SparkRecordHandler: processed 1051034 rows: used memory = 10801674800
16/11/05 18:12:32 INFO exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0,
16/11/05 18:12:32 INFO exec.ReduceSinkOperator: RS[9]: records written - 1050990
16/11/05 18:12:32 INFO exec.ReduceSinkOperator: RECORDS_OUT_INTERMEDIATE:1050990,
16/11/05 18:12:32 INFO collection.ExternalSorter: Task 6856 force spilling in-memory map to disk and it will release 347.0 MB memory
16/11/05 18:12:32 WARN memory.TaskMemoryManager: leak 347.0 MB memory from org.apache.spark.util.collection.ExternalSorter@1bf0a514
16/11/05 18:12:32 ERROR executor.Executor: Exception in task 5978.0 in stage 1.0 (TID 6856)
java.lang.NullPointerException
	at org.apache.spark.util.collection.ExternalSorter$SpillReader.cleanup(ExternalSorter.scala:625)
	at org.apache.spark.util.collection.ExternalSorter$SpillReader.nextBatchStream(ExternalSorter.scala:540)
	at org.apache.spark.util.collection.ExternalSorter$SpillReader.<init>(ExternalSorter.scala:508)
	at org.apache.spark.util.collection.ExternalSorter$SpillableIterator.spill(ExternalSorter.scala:814)
	at org.apache.spark.util.collection.ExternalSorter.forceSpill(ExternalSorter.scala:254)
	at org.apache.spark.util.collection.Spillable.spill(Spillable.scala:111)
	at org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:150)
	at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:245)
	at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:92)
	at org.apache.spark.shuffle.sort.ShuffleExternalSorter.growPointerArrayIfNecessary(ShuffleExternalSorter.java:333)
	at org.apache.spark.shuffle.sort.ShuffleExternalSorter.insertRecord(ShuffleExternalSorter.java:384)
	at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.insertRecordIntoSorter(UnsafeShuffleWriter.java:235)
	at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:162)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
	at org.apache.spark.scheduler.Task.run(Task.scala:85)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
16/11/05 18:12:32 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 7173
16/11/05 18:12:32 INFO executor.Executor: Running task 6294.0 in stage 1.0 (TID 7173)
16/11/05 18:12:32 INFO storage.ShuffleBlockFetcherIterator: Getting 878 non-empty blocks out of 878 blocks
16/11/05 18:12:32 INFO storage.ShuffleBlockFetcherIterator: Started 46 remote fetches in 17 ms
16/11/05 18:12:39 INFO executor.Executor: Finished task
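The stack trace and the WARN line together show the failure pattern: acquireExecutionMemory asked the ExternalSorter to force-spill, the spill died with the NullPointerException, and the 347.0 MB the sorter had acquired was never handed back, so TaskMemoryManager reported it as leaked. The following is a minimal, hypothetical Java sketch of that bookkeeping gap, not Spark's actual TaskMemoryManager; the class and method names (TaskMemoryManagerSketch, register, leakedBytes) are invented for illustration only:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified sketch (NOT Spark's real TaskMemoryManager) of the pattern in
// the log: a force-spill that throws leaves the consumer's bytes recorded
// as held, and they surface later as a "leak" warning at task end.
class LeakSketch {
    interface SpillableConsumer { long spill() throws Exception; }

    static class TaskMemoryManagerSketch {
        private final Map<SpillableConsumer, Long> held = new LinkedHashMap<>();
        private long free;
        TaskMemoryManagerSketch(long totalBytes) { this.free = totalBytes; }

        void register(SpillableConsumer c, long bytes) {
            held.put(c, bytes);
            free -= bytes;
        }

        // Try to acquire `need` bytes, force-spilling other consumers first.
        long acquire(long need) {
            if (free < need) {
                for (Map.Entry<SpillableConsumer, Long> e : held.entrySet()) {
                    try {
                        long released = e.getKey().spill();
                        free += released;
                        e.setValue(e.getValue() - released);
                    } catch (Exception ex) {
                        // Spill failed mid-way (like the NPE in
                        // SpillReader.cleanup): the consumer's bytes stay
                        // recorded as held and are never returned.
                    }
                    if (free >= need) break;
                }
            }
            long granted = Math.min(need, free);
            free -= granted;
            return granted;
        }

        // At task end, anything still recorded as held reads as a leak.
        long leakedBytes() {
            return held.values().stream().mapToLong(Long::longValue).sum();
        }
    }

    public static void main(String[] args) {
        TaskMemoryManagerSketch mm = new TaskMemoryManagerSketch(500);
        // A sorter holding 347 units whose spill throws, mirroring the
        // ExternalSorter in the log.
        mm.register(() -> { throw new NullPointerException(); }, 347);
        long granted = mm.acquire(200); // triggers the failing force-spill
        System.out.println("granted=" + granted + " leaked=" + mm.leakedBytes());
        // prints granted=153 leaked=347
    }
}
```

The sketch only reproduces the accounting gap visible in the WARN message; releasing a consumer's bookkeeping even when its spill() throws is the kind of change that would make the warning (and the follow-on NPE path) disappear.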
[jira] [Updated] (SPARK-18289) spark.util.collection.ExternalSorter leak memory when task force spilling in-memory map to disk
[ https://issues.apache.org/jira/browse/SPARK-18289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

KaiXu updated SPARK-18289:
--------------------------
    Environment: 
CentOS7.2 kernel: 3.10.0-327.el7.x86_64
Hadoop2.7.1
Spark2.0.0 release version
Hive2.1 with patch HIVE-14029 (https://issues.apache.org/jira/browse/HIVE-14029)
Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
384GB memory

  was:
CentOS7.2 kernel: 3.10.0-327.el7.x86_64
Hadoop2.7.1
Spark2.0.0 release version
Hive2.1 with patch HIVE-14029 (https://issues.apache.org/jira/browse/HIVE-14029)
Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
384GB memory

> spark.util.collection.ExternalSorter leak memory when task force spilling
> in-memory map to disk
>
>                 Key: SPARK-18289
>                 URL: https://issues.apache.org/jira/browse/SPARK-18289
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler, Shuffle, Spark Core
>    Affects Versions: 2.0.0
>         Environment: CentOS7.2 kernel: 3.10.0-327.el7.x86_64
> Hadoop2.7.1
> Spark2.0.0 release version
> Hive2.1 with patch HIVE-14029
> (https://issues.apache.org/jira/browse/HIVE-14029)
> Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> 384GB memory
>            Reporter: KaiXu
>              Labels: hive, spark2.0.0
>
> We use BigBench to test the performance of Hive on Spark2.0 on Intel(R)
> Xeon(R) CPU E5-2699 v4 (1 master + 4 worker) with 3TB scale factor, we found 4
> tasks failed when running q30. When checking in executor's log, we found it
> has memory leak when task force spilling in-memory map to disk.