luomh1998 opened a new issue, #11445:
URL: https://github.com/apache/incubator-gluten/issues/11445
### Backend
VL (Velox)
### Bug description
Tasks crash when dynamic off-heap memory sizing is enabled. Task log:
```
26/01/19 22:11:28 [Executor task launch worker for task 0.2 in stage 8.0 (TID 12688)] INFO UnifiedMemoryManager: Will not store test_964a8384-26c9-4b91-9ef2-4ce0c93516a9 as the required space (8388608 bytes) exceeds our memory limit (0 bytes)
26/01/19 22:11:28 [Executor task launch worker for task 0.2 in stage 8.0 (TID 12688)] ERROR GlobalOffHeapMemoryTarget: Spark off-heap memory is exhausted. Storage: 0 / 0, execution: 0 / 0
26/01/19 22:11:28 [Executor task launch worker for task 0.2 in stage 8.0 (TID 12688)] WARN ThrowOnOomMemoryTarget: Max number of sleeps 9 has reached.
26/01/19 22:11:28 [Executor task launch worker for task 0.2 in stage 8.0 (TID 12688)] INFO TaskMemoryManager: Memory used in task 12688
26/01/19 22:11:28 [Executor task launch worker for task 0.2 in stage 8.0 (TID 12688)] INFO TaskMemoryManager: Acquired by org.apache.gluten.memory.memtarget.spark.TreeMemoryConsumer@45296cac: 16.0 MiB
26/01/19 22:11:28 [Executor task launch worker for task 0.2 in stage 8.0 (TID 12688)] INFO TaskMemoryManager: 0 bytes of memory were used by task 12688 but are not associated with specific consumers
26/01/19 22:11:28 [Executor task launch worker for task 0.2 in stage 8.0 (TID 12688)] INFO TaskMemoryManager: 16777216 bytes of memory are used for execution and 4658137 bytes of memory are used for storage
26/01/19 22:11:28 [Executor task launch worker for task 0.2 in stage 8.0 (TID 12688)] ERROR TaskResources: Task 12688 failed by error:
org.apache.gluten.exception.GlutenException: The target buffer size is insufficient: 0 vs.2981
    at org.apache.gluten.vectorized.ColumnarBatchSerializerJniWrapper.serialize(Native Method)
    at org.apache.spark.sql.execution.BroadcastUtils$.$anonfun$serializeStream$3(BroadcastUtils.scala:171)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
    at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
    at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
    at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
    at scala.collection.AbstractIterator.to(Iterator.scala:1431)
    at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
    at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
    at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
    at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
    at org.apache.spark.sql.execution.BroadcastUtils$.serializeStream(BroadcastUtils.scala:176)
    at org.apache.gluten.backendsapi.velox.VeloxSparkPlanExecApi.$anonfun$createBroadcastRelation$1(VeloxSparkPlanExecApi.scala:677)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:862)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:862)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
    at org.apache.spark.scheduler.Task.run(Task.scala:140)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:562)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1555)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:565)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:842)
```
### Gluten version
main branch
### Spark version
Spark-3.4.x
### Spark configurations
```bash
${SPARK_HOME}/bin/spark-shell \
  --master yarn --deploy-mode client \
  --queue root.test1 \
  --driver-memory 8G \
  --executor-cores 4 \
  --num-executors 100 \
  --conf spark.executor.memory=20g \
  --conf spark.executor.memoryOverhead=6g \
  --conf spark.sql.catalogImplementation="in-memory" \
  --conf spark.driver.maxResultSize=2g \
  --conf spark.sql.shuffle.partitions=400 \
  --conf spark.dynamicAllocation.enabled=false \
  --conf spark.executorEnv.JAVA_HOME=/software/servers/jdk-17.0.12 \
  --conf spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager \
  --conf spark.plugins=org.apache.gluten.GlutenPlugin \
  --conf spark.gluten.memory.dynamic.offHeap.sizing.enabled=true \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=2g
```
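For narrowing the reproduction, the subset of the command above that seems directly relevant to the failure is the Gluten plugin plus the off-heap flags; the sketch below is an untested guess at a minimal trigger, using only flags that already appear in the original command:

```shell
# Hypothetical minimal repro sketch (unverified): same flags as the full
# command above, stripped to the ones that appear related to the crash.
${SPARK_HOME}/bin/spark-shell \
  --conf spark.plugins=org.apache.gluten.GlutenPlugin \
  --conf spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=2g \
  --conf spark.gluten.memory.dynamic.offHeap.sizing.enabled=true
```

If the crash still reproduces with only these flags, that would point at the dynamic off-heap sizing path rather than the cluster-specific settings (YARN queue, executor counts, JDK 17 env).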
### System information
_No response_
### Relevant logs
_No response_ (the relevant log is included in the bug description above)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]