Omega359 commented on PR #22245:
URL: https://github.com/apache/datafusion/pull/22245#issuecomment-4526310021

   There is a correctness issue: the fast path reuses the first concrete 
input’s list null bitmap as the output null bitmap at arrays_zip.rs. The slow 
path marks an output row null only when all concrete input lists are null. The 
fast path currently marks it according to the first input.
   
   That changes semantics when inputs are perfectly aligned but have different 
list-level nulls.
   
   ```sql
   select arrays_zip(a,b) as zipped
   from (values
     (null::int[], []::int[]),
     ([]::int[], null::int[]),
     (null::int[], null::int[])
   ) as t(a,b);
   ```
   upstream/main returns:
   ```
   []
   []
   NULL
   ```
   This PR returns:
   ```
   NULL
   []
   NULL
   ```
   I would suggest adding this to the arrays_zip.slt file
   ```
   # mixed NULL list and non-null empty list: result row is non-null empty
   query ?
   select arrays_zip(a, b) from (values
     (null::int[], []::int[]),
     ([]::int[], null::int[]),
     (null::int[], null::int[])
   ) as t(a, b);
   ----
   []
   []
   NULL
   ```
   
   Attached is a benchmark update to test the performance of this change.
   
[arrays_zip_benchmark_main.patch](https://github.com/user-attachments/files/28181968/arrays_zip_benchmark_main.patch)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to