adriangb opened a new issue, #22225: URL: https://github.com/apache/datafusion/issues/22225
## Is your feature request related to a problem or challenge?

`arrays_zip` (`arrays_zip_inner_with_field` in `datafusion/functions-nested/src/arrays_zip.rs`) always assembles its output the same way. It walks every row through per-column `MutableArrayData` builders, copies each input slice one row at a time (`builder.extend(0, start, end)`), and pads shorter rows with NULLs (`builder.extend_nulls(...)`).

When the inputs form a **perfect zip**, this row-by-row copy is wasted work. A perfect zip means every input array has identical per-row element lengths and no null list row occupies a non-zero element slot, so no null padding is needed. In that case the resulting struct child columns are bit-identical to the (concatenated) input value arrays, and the list offsets are identical to the inputs' offsets.

## Describe the solution you'd like

Detect the perfect-zip case up front and skip the `MutableArrayData` path entirely:

- Build the output struct child columns directly from the original input value `ArrayRef`s (clone / concat, no per-row copy).
- Reuse an input array's offset buffer for the resulting `ListArray` instead of rebuilding it.

This keeps the existing general path as a fallback for the ragged / null-padded cases.

## Describe alternatives you've considered

Keep the current always-copy implementation. It is correct, but it does avoidable work in the common case where all zipped arrays line up.

## Additional context

Raised by @paleolimbot while reviewing #21984:

> Not here, but for the perfect zip (all value arrays the same length, no nulls with non-zero element slot lengths, no null padding needed) this should ideally be just clones of the original arrayrefs

Split out of #21984 (a metadata-propagation bugfix) since this is an orthogonal performance optimization that warrants its own benchmarks and edge-case tests.
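A minimal sketch of the perfect-zip detection and fast-path layout proposed above, written in Rust to match DataFusion. To stay self-contained it does not use arrow-rs: `SimpleList`, `is_perfect_zip`, and `perfect_zip_layout` are hypothetical names, not existing DataFusion or arrow-rs APIs, and the model only carries list offsets and list-level validity. Deriving the output row null bitmap is deliberately left out. Instead of returning an input's offset buffer as-is, the sketch rebases offsets to 0 and reports each input's contiguous value range. This assumes that sliced inputs may start at different offset bases. In arrow-rs those ranges would presumably become zero-copy slices of the value arrays.

```rust
use std::ops::Range;

/// Hypothetical, simplified stand-in for an Arrow `ListArray` (NOT the
/// arrow-rs type). `offsets` has `len + 1` entries, as in Arrow, and
/// `validity[i] == false` marks row `i` as a null list.
struct SimpleList {
    offsets: Vec<i32>,
    validity: Option<Vec<bool>>,
}

impl SimpleList {
    fn len(&self) -> usize {
        self.offsets.len().saturating_sub(1)
    }

    /// Number of element slots occupied by row `i`.
    fn row_len(&self, i: usize) -> i32 {
        self.offsets[i + 1] - self.offsets[i]
    }

    fn is_null(&self, i: usize) -> bool {
        self.validity.as_ref().map_or(false, |v| !v[i])
    }
}

/// True when zipping needs no NULL padding: all inputs have the same row
/// count, each row has the same element count in every input, and no null
/// row occupies a non-zero element slot.
fn is_perfect_zip(inputs: &[SimpleList]) -> bool {
    let Some(first) = inputs.first() else {
        return false;
    };
    let n = first.len();
    if inputs.iter().any(|a| a.len() != n) {
        return false;
    }
    (0..n).all(|i| {
        let expected = first.row_len(i);
        inputs
            .iter()
            .all(|a| a.row_len(i) == expected && !(a.is_null(i) && expected != 0))
    })
}

/// Fast-path layout: output offsets rebased to start at 0, plus the value
/// range each input contributes as a struct child (no per-row copy).
/// Assumes every input has at least one offset, as Arrow requires.
fn perfect_zip_layout(inputs: &[SimpleList]) -> Option<(Vec<i32>, Vec<Range<usize>>)> {
    if !is_perfect_zip(inputs) {
        return None; // ragged or null-padded: caller falls back to the general path
    }
    let base = inputs[0].offsets[0];
    let offsets = inputs[0].offsets.iter().map(|o| o - base).collect();
    let children = inputs
        .iter()
        .map(|a| a.offsets[0] as usize..a.offsets[a.len()] as usize)
        .collect();
    Some((offsets, children))
}

fn main() {
    let a = SimpleList { offsets: vec![0, 2, 2, 5], validity: None };
    // `b` mimics a sliced array (offsets start at 10) whose row 1 is null and empty.
    let b = SimpleList { offsets: vec![10, 12, 12, 15], validity: Some(vec![true, false, true]) };
    let (offsets, children) = perfect_zip_layout(&[a, b]).expect("perfect zip");
    println!("offsets = {offsets:?}, children = {children:?}");
}
```

Running it prints `offsets = [0, 2, 2, 5], children = [0..5, 10..15]`. Any input set that fails the check yields `None`, which mirrors the issue's plan of keeping the `MutableArrayData` path as the fallback.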
