Kontinuation opened a new issue, #884: URL: https://github.com/apache/datafusion-comet/issues/884
### Describe the bug

I've built Datafusion Comet using commit https://github.com/apache/datafusion-comet/commit/f7f0bb1ed68367b8d3e1c88010c1f943f480ea11 for Spark 3.5.1. Memory usage keeps increasing when I repeatedly run the [TPC-H benchmark script](https://github.com/apache/datafusion-benchmarks/blob/main/runners/datafusion-comet/tpcbench.py) on a set of parquet files. The parquet files were generated using https://github.com/databricks/spark-sql-perf with scale factor = 10. Memory usage can reach 20GB. Given the Spark and Comet configurations I'm using to run the benchmarks (see **Additional context**), this seems to be problematic.

![image](https://github.com/user-attachments/assets/2adfb671-d674-4753-8bcc-cbd272e15da0)

Using `jcmd VM.native_memory detail.diff | grep Unsafe -A 2`, I've noticed that the native memory allocated by `Unsafe_AllocateMemory0` keeps increasing. I have not enabled off-heap memory, so the allocation should be initiated by the Arrow `RootAllocator`.

Initially after setting the baseline:

```
[0x00000001099c98a8] Unsafe_AllocateMemory0(JNIEnv_*, _jobject*, long)+0xcc
[0x000000011a0523b4]
                             (malloc=870721KB type=Other +621478KB #6842866 +4937676)
--
[0x00000001099c98a8] Unsafe_AllocateMemory0(JNIEnv_*, _jobject*, long)+0xcc
[0x0000000119017be0]
                             (malloc=8463KB type=Other -469KB #221 -3)
```

After 10 minutes:

```
[0x00000001099c98a8] Unsafe_AllocateMemory0(JNIEnv_*, _jobject*, long)+0xcc
[0x000000011a0523b4]
                             (malloc=4349265KB type=Other +4100021KB #34671096 +32765906)
--
[0x00000001099c98a8] Unsafe_AllocateMemory0(JNIEnv_*, _jobject*, long)+0xcc
[0x0000000119017be0]
                             (malloc=8449KB type=Other -483KB #217 -7)
```

The leaked memory was allocated by the [`CometArrowAllocator`](https://github.com/apache/datafusion-comet/blob/33706125b8c7a7f347865c7fb38fede6aceb97e9/common/src/main/scala/org/apache/comet/package.scala#L35).
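
For anyone who wants to track these numbers across several snapshots, here is a minimal sketch of a parser for the `grep Unsafe -A 2` output shown above. It is not part of Comet or of the benchmark script. The names `SUMMARY`, `SAMPLE`, and `unsafe_growth` are made up for illustration, and the regular expression assumes the summary line looks exactly like the ones pasted in this report.

```python
import re

# Matches the summary line printed under each call site in the report, e.g.
#   (malloc=870721KB type=Other +621478KB #6842866 +4937676)
# The size and count deltas are treated as optional (assumption).
SUMMARY = re.compile(
    r"\(malloc=(?P<size>\d+)KB(?: type=\w+)?"
    r"(?: (?P<size_delta>[+-]\d+)KB)?"
    r" #(?P<count>\d+)(?: (?P<count_delta>[+-]\d+))?\)"
)

# The "After 10 minutes" snapshot from this report, used as sample input.
SAMPLE = """\
[0x00000001099c98a8] Unsafe_AllocateMemory0(JNIEnv_*, _jobject*, long)+0xcc
[0x000000011a0523b4]
                             (malloc=4349265KB type=Other +4100021KB #34671096 +32765906)
--
[0x00000001099c98a8] Unsafe_AllocateMemory0(JNIEnv_*, _jobject*, long)+0xcc
[0x0000000119017be0]
                             (malloc=8449KB type=Other -483KB #217 -7)
"""


def unsafe_growth(report: str) -> list[dict]:
    """Return one record per Unsafe_AllocateMemory0 call site in the report."""
    records = []
    pending = False  # True after an Unsafe frame, until its summary line is seen
    for line in report.splitlines():
        if "Unsafe_AllocateMemory0" in line:
            pending = True
            continue
        if line.strip() == "--":  # grep's separator between match groups
            pending = False
            continue
        match = SUMMARY.search(line)
        if match and pending:
            records.append({
                "malloc_kb": int(match["size"]),
                "delta_kb": int(match["size_delta"] or 0),
                "allocations": int(match["count"]),
                "allocations_delta": int(match["count_delta"] or 0),
            })
            pending = False
    return records


if __name__ == "__main__":
    for record in unsafe_growth(SAMPLE):
        print(record)
```

Summing `delta_kb` over the records gives the net growth relative to the NMT baseline for all `Unsafe_AllocateMemory0` call sites in one snapshot.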

I verified that the leaked memory belongs to `CometArrowAllocator` by attaching a debugger to the Spark process and inspecting `CometArrowAllocator.getAllocatedMemory`:

![image](https://github.com/user-attachments/assets/3d0ddeb7-d6bd-4d97-8a57-44544fc1e19f)

I've also deliberately disabled AQE coalesce partitions because I noticed this issue: https://github.com/apache/datafusion-comet/issues/381. Although that issue has been fixed, I disabled it anyway to be safe. See the **Additional context** section for more details.

### Steps to reproduce

Run the [TPC-H benchmark script](https://github.com/apache/datafusion-benchmarks/blob/main/runners/datafusion-comet/tpcbench.py) with `--iterations=100` and observe the RSS of the Apache Spark Java process.

### Expected behavior

Memory usage should not increase over time.

### Additional context

I'm simply running it locally with `master = local[4]`. Here are my test environment and Spark configurations:

**Environment**:

* Operating System: macOS 14.6.1, arch: Apple M1 Pro
* Apache Spark: 3.5.1
* Datafusion Comet: commit https://github.com/apache/datafusion-comet/commit/f7f0bb1ed68367b8d3e1c88010c1f943f480ea11
* JVM: 17.0.10 (Eclipse Adoptium)

**Spark configurations**:

```
spark.master                     local[4]
spark.driver.cores               4
spark.executor.cores             4
spark.driver.memory              4g
spark.executor.memory            4g
spark.comet.memory.overhead.factor 0.4

spark.jars                       /path/to/workspace/github/datafusion-comet/spark/target/comet-spark-spark3.5_2.12-0.3.0-SNAPSHOT.jar
spark.driver.extraClassPath      /path/to/workspace/github/datafusion-comet/spark/target/comet-spark-spark3.5_2.12-0.3.0-SNAPSHOT.jar
spark.executor.extraClassPath    /path/to/workspace/github/datafusion-comet/spark/target/comet-spark-spark3.5_2.12-0.3.0-SNAPSHOT.jar

spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.sql.extensions             org.apache.comet.CometSparkSessionExtensions

spark.comet.enabled              true
spark.comet.exec.enabled         true
spark.comet.exec.all.enabled     true
spark.comet.explainFallback.enabled false
spark.comet.exec.shuffle.enabled true
spark.comet.exec.shuffle.mode    auto
spark.shuffle.manager            org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager

# Disable AQE coalesce partitions
spark.sql.adaptive.enabled       false
spark.sql.adaptive.coalescePartitions.enabled false

# Enable debugging and native memory tracking
spark.driver.extraJavaOptions    -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005 -XX:NativeMemoryTracking=detail
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org