[ https://issues.apache.org/jira/browse/SPARK-29244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viacheslav Tradunsky updated SPARK-29244:
-----------------------------------------
    Attachment: executor_oom.txt

> ArrayIndexOutOfBoundsException on TaskCompletionListener during releasing of memory blocks
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-29244
>                 URL: https://issues.apache.org/jira/browse/SPARK-29244
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0
>         Environment: Release label: emr-5.20.0
> Hadoop distribution: Amazon 2.8.5
> Applications: Livy 0.5.0, Spark 2.4.0
>            Reporter: Viacheslav Tradunsky
>            Priority: Major
>         Attachments: executor_oom.txt
>
>
> At the end of task completion, the following exception was thrown:
> {code:java}
> 19/09/25 09:03:58 ERROR TaskContextImpl: Error in TaskCompletionListener
> java.lang.ArrayIndexOutOfBoundsException: -3
>     at org.apache.spark.memory.TaskMemoryManager.freePage(TaskMemoryManager.java:333)
>     at org.apache.spark.memory.MemoryConsumer.freePage(MemoryConsumer.java:130)
>     at org.apache.spark.memory.MemoryConsumer.freeArray(MemoryConsumer.java:108)
>     at org.apache.spark.unsafe.map.BytesToBytesMap.free(BytesToBytesMap.java:803)
>     at org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.free(UnsafeFixedWidthAggregationMap.java:225)
>     at org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.lambda$new$0(UnsafeFixedWidthAggregationMap.java:111)
>     at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:117)
>     at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:117)
>     at org.apache.spark.TaskContextImpl$$anonfun$invokeListeners$1.apply(TaskContextImpl.scala:130)
>     at org.apache.spark.TaskContextImpl$$anonfun$invokeListeners$1.apply(TaskContextImpl.scala:128)
>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>     at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:128)
>     at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:116)
>     at org.apache.spark.scheduler.Task.run(Task.scala:131)
>     at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
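>
> The index -3 in the trace appears to match MemoryBlock's FREED_IN_ALLOCATOR_PAGE_NUMBER marker in Spark 2.4, i.e. the page seems to have been freed once already before the completion listener frees it again. Below is a minimal sketch of that suspected double-free mechanism; it is not Spark's actual code, and all names in it are hypothetical. With an idempotent guard, the second free would be absorbed instead of indexing the page table with a negative number:
> {code:java}
> import java.util.BitSet;
>
> // Hypothetical page table: freed pages get a negative "poison" page
> // number, so freeing the same page twice without a guard would index
> // the table with -3 and throw ArrayIndexOutOfBoundsException.
> public class PageTableSketch {
>     private static final int FREED_PAGE_NUMBER = -3; // mirrors the -3 in the trace
>     private final Object[] pageTable = new Object[8192];
>     private final BitSet allocatedPages = new BitSet(8192);
>
>     static final class Page {
>         int pageNumber;
>         Page(int pageNumber) { this.pageNumber = pageNumber; }
>     }
>
>     Page allocatePage() {
>         int num = allocatedPages.nextClearBit(0);
>         allocatedPages.set(num);
>         Page page = new Page(num);
>         pageTable[num] = page;
>         return page;
>     }
>
>     void freePage(Page page) {
>         if (page.pageNumber < 0) {
>             return; // idempotent: page was already released (e.g. during OOM cleanup)
>         }
>         pageTable[page.pageNumber] = null;   // without the guard above, a second free would throw here
>         allocatedPages.clear(page.pageNumber);
>         page.pageNumber = FREED_PAGE_NUMBER; // poison so a second free is detectable
>     }
> }
> {code}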
>  
> It is important to note that before this exception there was an OOM while allocating pages. The two look related: once the OOM is thrown, the whole flow proceeds abnormally, so resources are not freed correctly.
> {code:java}
> java.lang.NullPointerException
>     at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.getMemoryUsage(UnsafeInMemorySorter.java:208)
>     at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.getMemoryUsage(UnsafeExternalSorter.java:249)
>     at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.updatePeakMemoryUsed(UnsafeExternalSorter.java:253)
>     at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.freeMemory(UnsafeExternalSorter.java:296)
>     at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.cleanupResources(UnsafeExternalSorter.java:328)
>     at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.lambda$new$0(UnsafeExternalSorter.java:178)
>     at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:117)
>     at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:117)
>     at org.apache.spark.TaskContextImpl$$anonfun$invokeListeners$1.apply(TaskContextImpl.scala:130)
>     at org.apache.spark.TaskContextImpl$$anonfun$invokeListeners$1.apply(TaskContextImpl.scala:128)
>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>     at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:128)
>     at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:116)
>     at org.apache.spark.scheduler.Task.run(Task.scala:131)
>     at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
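>
> Here the NullPointerException suggests the completion listener calls getMemoryUsage() on a sorter whose in-memory array was already released during the earlier OOM handling. A minimal sketch of a null-guarded, idempotent cleanup that would tolerate that ordering (hypothetical names, not Spark's actual code):
> {code:java}
> // Hypothetical sorter: cleanup may run after an OOM has already torn
> // down the backing array, so every step must tolerate null state.
> public class SorterSketch {
>     private long[] array;       // stands in for the sorter's pointer array
>     private long peakMemoryUsed;
>
>     long getMemoryUsage() {
>         // Guard: after an OOM the array may already have been freed.
>         return array == null ? 0L : array.length * 8L;
>     }
>
>     void updatePeakMemoryUsed() {
>         peakMemoryUsed = Math.max(peakMemoryUsed, getMemoryUsage());
>     }
>
>     void cleanupResources() {
>         updatePeakMemoryUsed(); // would NPE here without the null guard above
>         array = null;           // idempotent: safe if the listener runs twice
>     }
> }
> {code}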
>  
> This must be something related to job planning, but having to untangle so many cascading exceptions does not make diagnosis easier. I would be happy to provide more details.


