paleolimbot opened a new issue, #22108:
URL: https://github.com/apache/datafusion/issues/22108

   ### Describe the bug
   
   The window functions NTH_VALUE, FIRST_VALUE, and LAST_VALUE drop metadata 
(i.e., the return field does not include the metadata passed into them).
   
   I believe LEAD and LAG work (although these are the two functions that 
triggered the issue for me...I think a few PRs touched these functions since 
DataFusion 52 and these seem to be OK now).
   
   ### To Reproduce
   
   Exuberant all-caps comments courtesy of copilot/opus 🙂 
   
   ```sql
   -- Reproducer: Window functions dropping arrow field metadata
   -- Run with: target/debug/datafusion-cli -f reproducer_window_metadata.sql
   --
   -- FINDING: LEAD and LAG correctly preserve metadata.
   -- BUG: FIRST_VALUE, LAST_VALUE, and NTH_VALUE DROP field metadata.
   
   -- Load test data with field metadata (Arrow extension type metadata)
   CREATE EXTERNAL TABLE ext_table 
   STORED AS ARROW 
   LOCATION 
'testing/data/arrow-ipc-stream/integration/1.0.0-littleendian/generated_extension.arrow_file';
   
   -- Show schema
   DESCRIBE ext_table;
   
   -- Show original field metadata (should show ARROW:extension:name and 
ARROW:extension:metadata)
   SELECT 'Original metadata on uuids column:';
   SELECT arrow_metadata(uuids) FROM ext_table LIMIT 1;
   
   -- 
============================================================================
   -- THESE WORK CORRECTLY (metadata preserved):
   -- 
============================================================================
   
   -- Test LEAD - preserves metadata ✓
   SELECT 'LEAD() - metadata PRESERVED (correct):';
   SELECT arrow_metadata(LEAD(uuids) OVER (ORDER BY dict_exts)) FROM ext_table 
LIMIT 1;
   
   -- Test LEAD with offset - preserves metadata ✓
   SELECT 'LEAD(col, 2) - metadata PRESERVED (correct):';
   SELECT arrow_metadata(LEAD(uuids, 2) OVER (ORDER BY dict_exts)) FROM 
ext_table LIMIT 1;
   
   -- Test LAG - preserves metadata ✓
   SELECT 'LAG() - metadata PRESERVED (correct):';
   SELECT arrow_metadata(LAG(uuids) OVER (ORDER BY dict_exts)) FROM ext_table 
LIMIT 1;
   
   -- Test LAG with offset - preserves metadata ✓
   SELECT 'LAG(col, 2) - metadata PRESERVED (correct):';
   SELECT arrow_metadata(LAG(uuids, 2) OVER (ORDER BY dict_exts)) FROM 
ext_table LIMIT 1;
   
   -- 
============================================================================
   -- THESE DROP METADATA (BUG):
   -- 
============================================================================
   
   -- Test FIRST_VALUE - DROPS metadata ✗
   SELECT 'FIRST_VALUE() - metadata DROPPED (BUG):';
   SELECT arrow_metadata(FIRST_VALUE(uuids) OVER (ORDER BY dict_exts)) FROM 
ext_table LIMIT 1;
   
   -- Test LAST_VALUE - DROPS metadata ✗
   SELECT 'LAST_VALUE() - metadata DROPPED (BUG):';
   SELECT arrow_metadata(LAST_VALUE(uuids) OVER (ORDER BY dict_exts)) FROM 
ext_table LIMIT 1;
   
   -- Test NTH_VALUE - DROPS metadata ✗
   SELECT 'NTH_VALUE() - metadata DROPPED (BUG):';
   SELECT arrow_metadata(NTH_VALUE(uuids, 1) OVER (ORDER BY dict_exts)) FROM 
ext_table LIMIT 1;
   
   -- 
============================================================================
   -- SUMMARY:
   -- 
============================================================================
   -- LEAD and LAG correctly preserve Arrow field metadata.
   -- FIRST_VALUE, LAST_VALUE, and NTH_VALUE incorrectly drop field metadata.
   --
   -- Expected: All window functions should preserve field metadata (including
   -- Arrow extension type metadata like ARROW:extension:name).
   
   DROP TABLE ext_table;
   ```
   
   ### Expected behavior
   
   I expected the NTH_VALUE and friends to return an identical field as its 
input.
   
   ### Additional context
   
   The query that triggered this for me was in DataFusion 52 via SedonaDB main:
   
   ```sql
   SELECT id, ST_MakeLine(geometry, next_geom) AS geometry
   FROM (
       SELECT id, geometry, LEAD(geometry) OVER (ORDER BY id) AS next_geom
       FROM pts
   )
   WHERE next_geom IS NOT NULL
   ```
   
   ...where `pts.geometry` is a `geoarrow.wkb` extension type.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to