adriangb commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3156469785
I'm trying this out for our compaction system and can't get my sort to work without hitting memory limits. Note that I'm using `datafusion-cli`, and I'm not sure whether it has a disk manager (etc.) configured. But I figure that if I can't figure it out, it's probably not obvious how to configure `datafusion-cli`, so it's a fair question.

In `q.sql`:

```sql
-- About 6.32 GB of compressed parquet (~10x compression ratio)
-- Split into ~60 files of ~100 MB each
CREATE EXTERNAL TABLE t1
STORED AS PARQUET
LOCATION '/Users/adriangb/Downloads/data/day=2025-08-05/';

SET datafusion.execution.sort_spill_reservation_bytes = 0;

COPY (
    SELECT *
    FROM t1
    ORDER BY deployment_environment, kind, service_name, trace_id
) TO '/Users/adriangb/Downloads/out.parquet';
```

```shell
❯ ./target/release/datafusion-cli --mem-pool-type 'fair' --memory-limit '1g' -f q.sql
DataFusion CLI v49.0.0
0 row(s) fetched.
Elapsed 0.244 seconds.

0 row(s) fetched.
Elapsed 0.000 seconds.

+---------------+-------------------------------+
| plan_type     | plan                          |
+---------------+-------------------------------+
| physical_plan | ┌───────────────────────────┐ |
|               | │        DataSinkExec       │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │  SortPreservingMergeExec  │ |
|               | │    --------------------   │ |
|               | │ deployment_environment ASC│ |
|               | │    NULLS LAST, kind ASC   │ |
|               | │        NULLS LAST,        │ |
|               | │        service_name       │ |
|               | │      ASC NULLS LAST,      │ |
|               | │     trace_id ASC NULLS    │ |
|               | │            LAST           │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │          SortExec         │ |
|               | │    --------------------   │ |
|               | │ deployment_environment@35 │ |
|               | │   ASC NULLS LAST, kind@6  │ |
|               | │      ASC NULLS LAST,      │ |
|               | │      service_name@27      │ |
|               | │      ASC NULLS LAST,      │ |
|               | │       trace_id@4 ASC      │ |
|               | │         NULLS LAST        │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │       DataSourceExec      │ |
|               | │    --------------------   │ |
|               | │         files: 68         │ |
|               | │      format: parquet      │ |
|               | └───────────────────────────┘ |
|               |                               |
+---------------+-------------------------------+
1 row(s) fetched.
Elapsed 0.254 seconds.

Not enough memory to continue external sort. Consider increasing the memory limit, or decreasing sort_spill_reservation_bytes
caused by
Resources exhausted: Additional allocation failed with top memory consumers (across reservations) as:
  ExternalSorter[10]#25(can spill: true) consumed 78.2 MB,
  ExternalSorter[11]#27(can spill: true) consumed 77.2 MB,
  ExternalSorter[7]#19(can spill: true) consumed 75.7 MB.
Error: Failed to allocate additional 90.1 MB for ExternalSorter[6] with 0.0 B already allocated for this reservation - 82.2 MB remain available for the total pool
```

I can maybe share the data under some sort of NDA, but honestly it's not that interesting; it's just a lot of random data.
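For context on the last error line: as I understand DataFusion's fair pool (selected here with `--mem-pool-type 'fair'`), it caps each spillable consumer at an even share of the pool, roughly `(pool_size - unspillable) / num_spillable`. The sketch below works through that arithmetic under loudly stated assumptions: `'1g'` is read as 1 GiB, there are 12 sorters (inferred from the `ExternalSorter[11]` index, so there may be more), nothing is unspillable, and "MB" in the error means MiB. `fair_share` is a hypothetical helper, not part of DataFusion or `datafusion-cli`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DataFusion): per-consumer cap under a
# fair spill pool, assuming cap = (pool_size - unspillable) / num_spillable.
fair_share() {
  local pool_size=$1 unspillable=$2 num_spillable=$3
  # Integer division, so the cap is rounded down to whole bytes.
  echo $(( (pool_size - unspillable) / num_spillable ))
}

# Assumption: --memory-limit '1g' is parsed as 1 GiB.
pool_size=$(( 1024 * 1024 * 1024 ))
# Assumptions: 12 ExternalSorters (indices 0..11) and no unspillable memory.
share=$(fair_share "$pool_size" 0 12)
echo "per-sorter cap: ${share} bytes (~$(( share / 1024 / 1024 )) MiB)"

# The single request from the error message, assuming MB means MiB.
request=$(( 901 * 1024 * 1024 / 10 ))
if (( request > share )); then
  echo "a single 90.1 MB request exceeds the per-sorter cap"
fi
```

Under these assumptions each sorter's cap comes out around 85 MiB, in the same range as the 82.2 MB the error reports (unspillable memory elsewhere in the plan could account for the gap). Since `ExternalSorter[6]` held 0.0 B when it asked for 90.1 MB in one allocation, it appears to have exceeded its share before it had anything of its own to spill.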