jizezhang commented on issue #18782: URL: https://github.com/apache/datafusion/issues/18782#issuecomment-3574061699
Actually, the behavior of one other test, `test_preserve_order_with_spilling`, is probably also affected by this change. When reserving memory for an array, e.g. here https://github.com/apache/datafusion/blob/d24eb4a23156b7814836e765d5890186ab40682f/datafusion/physical-plan/src/sorts/stream.rs#L247-L250, the buffer size for primitive arrays is computed from `capacity` rather than from the number of rows: https://github.com/apache/arrow-rs/blob/a8a63c28d14b99d8f50b32f3184ab986bad15e50/arrow-array/src/array/primitive_array.rs#L1242. When the arrow `BatchCoalescer` coalesces batches, it copies rows from the input batches into its internal `in_progress_arrays`, and for `InProgressPrimitiveArray` this involves an `ensure_capacity` call that takes `batch_size`: https://github.com/apache/arrow-rs/blob/a67d49758b1faee7d42fe3b215e226d6d560f237/arrow-select/src/coalesce/primitive.rs#L58. So the reported size depends on `batch_size` even when a batch holds only a few rows, and with the default batch size of 8192, 64B is not enough for this test. I plan to adjust the batch size in the test, but wanted to mention it here in case this does not make sense. Thanks.
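To make the arithmetic concrete, here is a minimal sketch of why a capacity-based size grows with the batch size. It uses a plain `Vec<i32>` as a stand-in for the value buffer; `capacity_bytes` and `in_progress_values` are hypothetical helper names for illustration, not arrow-rs or DataFusion APIs, and the 4-byte element type is an assumption, not necessarily what the test uses.

```rust
use std::mem::size_of;

/// Bytes attributed to a buffer when it is sized by capacity, not by length.
/// (Hypothetical helper; stands in for a capacity-based size computation.)
fn capacity_bytes<T>(buf: &Vec<T>) -> usize {
    buf.capacity() * size_of::<T>()
}

/// Hypothetical stand-in for an in-progress primitive array: space for
/// `batch_size` values is reserved up front, then only `rows` values are
/// copied in. Assumes `rows <= batch_size`.
fn in_progress_values(batch_size: usize, rows: usize) -> Vec<i32> {
    let mut values: Vec<i32> = Vec::with_capacity(batch_size);
    values.extend((0..rows).map(|i| i as i32));
    values
}
```

Under these assumptions, `in_progress_values(8192, 10)` holds only 10 values (40 bytes of data), but `capacity_bytes` reports at least 8192 * 4 = 32768 bytes, which is far beyond a 64-byte reservation.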
