[ https://issues.apache.org/jira/browse/SPARK-29244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Viacheslav Tradunsky updated SPARK-29244:
-----------------------------------------
    Attachment: executor_oom.txt

> ArrayIndexOutOfBoundsException on TaskCompletionListener during releasing of memory blocks
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-29244
>                 URL: https://issues.apache.org/jira/browse/SPARK-29244
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0
>         Environment: Release label: emr-5.20.0
>                      Hadoop distribution: Amazon 2.8.5
>                      Applications: Livy 0.5.0, Spark 2.4.0
>            Reporter: Viacheslav Tradunsky
>            Priority: Major
>         Attachments: executor_oom.txt
>
> At the end of task completion an exception happened:
> {code:java}
> 19/09/25 09:03:58 ERROR TaskContextImpl: Error in TaskCompletionListener
> java.lang.ArrayIndexOutOfBoundsException: -3
> 	at org.apache.spark.memory.TaskMemoryManager.freePage(TaskMemoryManager.java:333)
> 	at org.apache.spark.memory.MemoryConsumer.freePage(MemoryConsumer.java:130)
> 	at org.apache.spark.memory.MemoryConsumer.freeArray(MemoryConsumer.java:108)
> 	at org.apache.spark.unsafe.map.BytesToBytesMap.free(BytesToBytesMap.java:803)
> 	at org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.free(UnsafeFixedWidthAggregationMap.java:225)
> 	at org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.lambda$new$0(UnsafeFixedWidthAggregationMap.java:111)
> 	at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:117)
> 	at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:117)
> 	at org.apache.spark.TaskContextImpl$$anonfun$invokeListeners$1.apply(TaskContextImpl.scala:130)
> 	at org.apache.spark.TaskContextImpl$$anonfun$invokeListeners$1.apply(TaskContextImpl.scala:128)
> 	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> 	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> 	at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:128)
> 	at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:116)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:131)
> 	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
> 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> {code}
>
> It is important to note that before this exception there was an OOM while allocating pages. The two appear related: after the OOM the whole flow terminates abnormally, so resources are not freed correctly.
> {code:java}
> java.lang.NullPointerException
> 	at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.getMemoryUsage(UnsafeInMemorySorter.java:208)
> 	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.getMemoryUsage(UnsafeExternalSorter.java:249)
> 	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.updatePeakMemoryUsed(UnsafeExternalSorter.java:253)
> 	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.freeMemory(UnsafeExternalSorter.java:296)
> 	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.cleanupResources(UnsafeExternalSorter.java:328)
> 	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.lambda$new$0(UnsafeExternalSorter.java:178)
> 	at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:117)
> 	at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:117)
> 	at org.apache.spark.TaskContextImpl$$anonfun$invokeListeners$1.apply(TaskContextImpl.scala:130)
> 	at org.apache.spark.TaskContextImpl$$anonfun$invokeListeners$1.apply(TaskContextImpl.scala:128)
> 	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> 	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> 	at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:128)
> 	at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:116)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:131)
> 	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
> 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> {code}
>
> This must be related to job planning, but so many overlapping exceptions do not make diagnosis easier. I would be happy to provide more details.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
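The `ArrayIndexOutOfBoundsException: -3` in the first trace is consistent with a double-free: by the time the completion listener runs, the page's number has already been replaced by a negative "already freed" sentinel, and using it to index the page table throws. The sketch below illustrates the failure mode and an idempotent guard; all class and field names are hypothetical, invented for illustration, and are not Spark's actual internals.

```java
// Hypothetical sketch of a double-free in a page table and an idempotent
// cleanup guard. Names are invented; this is NOT Spark's TaskMemoryManager.
public class PageCleanup {
    // Sentinel meaning "this page was already returned to the allocator",
    // analogous to the -3 index seen in the reported stack trace.
    static final int FREED_PAGE_NUMBER = -3;

    private final Object[] pageTable = new Object[16];
    private int pageNumber;

    public PageCleanup(int pageNumber) {
        this.pageNumber = pageNumber;
        pageTable[pageNumber] = new Object();  // simulate an allocated page
    }

    // Unsafe variant: a second call indexes pageTable with the -3 sentinel
    // and throws ArrayIndexOutOfBoundsException, like the reported error.
    public void freePageUnsafe() {
        pageTable[pageNumber] = null;          // throws if already freed
        pageNumber = FREED_PAGE_NUMBER;
    }

    // Idempotent variant: a completion listener firing after an OOM path
    // has already freed the page becomes a no-op instead of throwing.
    public boolean freePageOnce() {
        if (pageNumber == FREED_PAGE_NUMBER) {
            return false;                      // already freed; skip
        }
        pageTable[pageNumber] = null;
        pageNumber = FREED_PAGE_NUMBER;
        return true;
    }
}
```

The same defensive pattern (check for the freed/null state before touching the resource) would also avoid the `NullPointerException` in the second trace, where memory-usage accounting runs against a sorter whose backing array has already been released.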