Kontinuation commented on issue #15028:
URL: https://github.com/apache/datafusion/issues/15028#issuecomment-2702915602
I have tried the repro. This looks more like a problem with the Parquet writer than with sorting. I made small tweaks to the repro code to expose the status of the memory consumers and the backtrace. The failure I got was:

```
Error: Resources exhausted: Additional allocation failed with top memory consumers (across reservations) as:
  ParquetSink(ArrowColumnWriter) consumed 65911731 bytes (62.8 MB),
  ExternalSorter[0] consumed 23587552 bytes (22.4 MB),
  ExternalSorterMerge[0] consumed 14261080 bytes (13.6 MB),
  ParquetSink(SerializedFileWriter) consumed 0 bytes.
Error: Failed to allocate additional 1450451 bytes for ParquetSink(ArrowColumnWriter) with 62770337 bytes already allocated for this reservation - 1097237 bytes remain available for the total pool
```

I've slightly reformatted the error message to make it more readable. The backtrace is:

```
backtrace:
   0: std::backtrace_rs::backtrace::libunwind::trace
             at /rustc/30f168ef811aec63124eac677e14699baa9395bd/library/std/src/../../backtrace/src/backtrace/libunwind.rs:117:9
   1: std::backtrace_rs::backtrace::trace_unsynchronized
             at /rustc/30f168ef811aec63124eac677e14699baa9395bd/library/std/src/../../backtrace/src/backtrace/mod.rs:66:14
   2: std::backtrace::Backtrace::create
             at /rustc/30f168ef811aec63124eac677e14699baa9395bd/library/std/src/backtrace.rs:331:13
   3: datafusion_common::error::DataFusionError::get_back_trace
             at /Users/bopeng/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-common-45.0.0/src/error.rs:410:30
   4: datafusion_execution::memory_pool::pool::insufficient_capacity_err
             at /Users/bopeng/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-execution-45.0.0/src/memory_pool/pool.rs:249:5
   5: <datafusion_execution::memory_pool::pool::FairSpillPool as datafusion_execution::memory_pool::MemoryPool>::try_grow
             at /Users/bopeng/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-execution-45.0.0/src/memory_pool/pool.rs:220:32
   6: <datafusion_execution::memory_pool::pool::TrackConsumersPool<I> as datafusion_execution::memory_pool::MemoryPool>::try_grow
             at /Users/bopeng/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-execution-45.0.0/src/memory_pool/pool.rs:362:9
   7: datafusion_execution::memory_pool::MemoryReservation::try_grow
             at /Users/bopeng/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-execution-45.0.0/src/memory_pool/mod.rs:298:9
   8: datafusion_execution::memory_pool::MemoryReservation::try_resize
             at /Users/bopeng/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-execution-45.0.0/src/memory_pool/mod.rs:281:34
   9: datafusion::datasource::file_format::parquet::column_serializer_task::{{closure}}
             at /Users/bopeng/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-45.0.0/src/datasource/file_format/parquet.rs:900:9
  10: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /Users/bopeng/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/future/future.rs:124:9
  11: tokio::runtime::task::core::Core<T,S>::poll::{{closure}}
             at /Users/bopeng/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.43.0/src/runtime/task/core.rs:331:17
  12: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
             at /Users/bopeng/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.43.0/src/loom/std/unsafe_cell.rs:16:9
  13: tokio::runtime::task::core::Core<T,S>::poll
             at /Users/bopeng/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.43.0/src/runtime/task/core.rs:320:13
  14: tokio::runtime::task::harness::poll_future::{{closure}}
             at /Users/bopeng/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.43.0/src/runtime/task/harness.rs:532:19
  15: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /Users/bopeng/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:272:9
  16: std::panicking::try::do_call
             at /Users/bopeng/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:587:40
  17: ___rust_try
...
```

The Parquet writer consumed most of the memory and triggered the allocation failure. The reserved memory is too small for the Parquet writer to hold row groups in memory before flushing them out to disk.

This issue can also be reproduced without sorting. I tried replacing `let sorted = df.sort(...)` with `let sorted = df`. The error message is:

```
Error: Resources exhausted: Additional allocation failed with top memory consumers (across reservations) as:
  ParquetSink(ArrowColumnWriter) consumed 104209999 bytes (99.4 MB),
  ParquetSink(SerializedFileWriter) consumed 0 bytes.
Error: Failed to allocate additional 1253843 bytes for ParquetSink(ArrowColumnWriter) with 99576954 bytes already allocated for this reservation - 647601 bytes remain available for the total pool
```

I tried setting a smaller `max_row_group_size` to reduce the amount of memory required by ParquetSink, and the query then finished successfully:

```rust
table_opts.global.max_row_group_size = 1000;
```
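The byte counts in the two error messages can be cross-checked with a little arithmetic. The sketch below uses a hypothetical `PoolSnapshot` helper with numbers copied from the messages: it adds the reported consumers to the bytes reported as remaining. In both runs the total comes out to exactly 104857600 bytes (100 MiB). That is consistent with a memory pool capped at 100 MiB, assuming the listed consumers account for every reserved byte (the repro's pool configuration is not shown in this comment). In both runs the failed request is also larger than what was left.

```rust
/// Bytes in one mebibyte.
const MIB: u64 = 1024 * 1024;

/// Numbers copied from one "Resources exhausted" error message.
/// `PoolSnapshot` is a hypothetical helper for this check only,
/// not a DataFusion type.
struct PoolSnapshot {
    /// Per-consumer bytes from the "top memory consumers" list.
    consumed: &'static [u64],
    /// "... bytes remain available for the total pool"
    remaining: u64,
    /// "Failed to allocate additional ... bytes"
    requested: u64,
}

impl PoolSnapshot {
    /// Pool size implied by the message, assuming the listed
    /// consumers account for every reserved byte.
    fn implied_pool_size(&self) -> u64 {
        self.consumed.iter().sum::<u64>() + self.remaining
    }

    /// True when the failed request could not fit into what was left.
    fn request_exceeds_remaining(&self) -> bool {
        self.requested > self.remaining
    }
}

/// Run with sorting: ArrowColumnWriter, ExternalSorter[0],
/// ExternalSorterMerge[0], SerializedFileWriter.
fn with_sort() -> PoolSnapshot {
    PoolSnapshot {
        consumed: &[65_911_731, 23_587_552, 14_261_080, 0],
        remaining: 1_097_237,
        requested: 1_450_451,
    }
}

/// Run without sorting: ArrowColumnWriter, SerializedFileWriter.
fn without_sort() -> PoolSnapshot {
    PoolSnapshot {
        consumed: &[104_209_999, 0],
        remaining: 647_601,
        requested: 1_253_843,
    }
}

fn main() {
    for (label, snap) in [("with sort", with_sort()), ("without sort", without_sort())] {
        let size = snap.implied_pool_size();
        println!(
            "{label}: implied pool = {size} bytes ({} MiB), request exceeds remaining = {}",
            size / MIB,
            snap.request_exceeds_remaining()
        );
    }
}
```

In the run without sorting, `ParquetSink(ArrowColumnWriter)` alone accounts for almost the whole 100 MiB, which matches the observation that the writer's buffering is enough to exhaust the pool by itself.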
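Why a smaller `max_row_group_size` helps can be illustrated with a toy model. The hypothetical `ToyRowGroupWriter` below is not the `parquet` crate's or DataFusion's writer. It assumes every row costs a fixed number of bytes and ignores encoding and compression. It buffers rows and flushes once the current row group holds `max_row_group_size` rows, so its peak buffered memory is bounded by `max_row_group_size * bytes_per_row` no matter how many rows are written in total.

```rust
/// Hypothetical, simplified writer: it only counts buffered bytes and
/// "flushes" (drops) them once the current row group is full.
/// Real Parquet column writers encode and compress data, so their
/// memory per row differs and varies per column.
struct ToyRowGroupWriter {
    max_row_group_size: usize, // rows per row group
    bytes_per_row: usize,      // assumed constant for this sketch
    buffered_rows: usize,
    peak_buffered_bytes: usize,
    row_groups_flushed: usize,
}

impl ToyRowGroupWriter {
    fn new(max_row_group_size: usize, bytes_per_row: usize) -> Self {
        assert!(max_row_group_size > 0, "a row group must hold at least one row");
        Self {
            max_row_group_size,
            bytes_per_row,
            buffered_rows: 0,
            peak_buffered_bytes: 0,
            row_groups_flushed: 0,
        }
    }

    /// Buffer `rows` rows, flushing each time the current row group fills up.
    fn write(&mut self, mut rows: usize) {
        while rows > 0 {
            let space = self.max_row_group_size - self.buffered_rows;
            let take = rows.min(space);
            self.buffered_rows += take;
            rows -= take;
            let buffered_bytes = self.buffered_rows * self.bytes_per_row;
            self.peak_buffered_bytes = self.peak_buffered_bytes.max(buffered_bytes);
            if self.buffered_rows == self.max_row_group_size {
                self.flush();
            }
        }
    }

    /// Emit the in-progress row group, if any, and release its buffer.
    fn flush(&mut self) {
        if self.buffered_rows > 0 {
            self.row_groups_flushed += 1;
            self.buffered_rows = 0;
        }
    }

    /// Flush the last partial row group; return (peak bytes, row groups).
    fn close(mut self) -> (usize, usize) {
        self.flush();
        (self.peak_buffered_bytes, self.row_groups_flushed)
    }
}

/// Write `total_rows` rows in batches of `batch_rows`.
fn simulate(max_row_group_size: usize, total_rows: usize, batch_rows: usize, bytes_per_row: usize) -> (usize, usize) {
    assert!(batch_rows > 0, "batches must contain at least one row");
    let mut writer = ToyRowGroupWriter::new(max_row_group_size, bytes_per_row);
    let mut left = total_rows;
    while left > 0 {
        let batch = left.min(batch_rows);
        writer.write(batch);
        left -= batch;
    }
    writer.close()
}

fn main() {
    // Illustrative numbers only: 1,000,000 rows of 100 bytes in batches of 8192 rows.
    for max_rows in [1_000_000, 1_000] {
        let (peak, groups) = simulate(max_rows, 1_000_000, 8192, 100);
        println!("max_row_group_size = {max_rows}: peak buffered = {peak} bytes, row groups = {groups}");
    }
}
```

With these illustrative numbers the peak drops from 100000000 bytes (one 1,000,000-row group) to 100000 bytes (1,000-row groups). The trade-off is writing 1,000 row groups instead of one, each with its own metadata.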