paleolimbot opened a new issue, #22108:
URL: https://github.com/apache/datafusion/issues/22108
### Describe the bug
The window functions NTH_VALUE, FIRST_VALUE, and LAST_VALUE drop metadata
(i.e., the return field does not include the metadata passed into them).
I believe LEAD and LAG work (although these are the two functions that
triggered the issue for me...I think a few PRs touched these functions since
DataFusion 52 and these seem to be OK now).
### To Reproduce
Exuberant all-caps comments courtesy of copilot/opus 🙂
```sql
-- Reproducer: Window functions dropping arrow field metadata
-- Run with: target/debug/datafusion-cli -f reproducer_window_metadata.sql
--
-- FINDING: LEAD and LAG correctly preserve metadata.
-- BUG: FIRST_VALUE, LAST_VALUE, and NTH_VALUE DROP field metadata.
-- Load test data with field metadata (Arrow extension type metadata)
CREATE EXTERNAL TABLE ext_table
STORED AS ARROW
LOCATION
'testing/data/arrow-ipc-stream/integration/1.0.0-littleendian/generated_extension.arrow_file';
-- Show schema
DESCRIBE ext_table;
-- Show original field metadata (should show ARROW:extension:name and
ARROW:extension:metadata)
SELECT 'Original metadata on uuids column:';
SELECT arrow_metadata(uuids) FROM ext_table LIMIT 1;
--
============================================================================
-- THESE WORK CORRECTLY (metadata preserved):
--
============================================================================
-- Test LEAD - preserves metadata ✓
SELECT 'LEAD() - metadata PRESERVED (correct):';
SELECT arrow_metadata(LEAD(uuids) OVER (ORDER BY dict_exts)) FROM ext_table
LIMIT 1;
-- Test LEAD with offset - preserves metadata ✓
SELECT 'LEAD(col, 2) - metadata PRESERVED (correct):';
SELECT arrow_metadata(LEAD(uuids, 2) OVER (ORDER BY dict_exts)) FROM
ext_table LIMIT 1;
-- Test LAG - preserves metadata ✓
SELECT 'LAG() - metadata PRESERVED (correct):';
SELECT arrow_metadata(LAG(uuids) OVER (ORDER BY dict_exts)) FROM ext_table
LIMIT 1;
-- Test LAG with offset - preserves metadata ✓
SELECT 'LAG(col, 2) - metadata PRESERVED (correct):';
SELECT arrow_metadata(LAG(uuids, 2) OVER (ORDER BY dict_exts)) FROM
ext_table LIMIT 1;
--
============================================================================
-- THESE DROP METADATA (BUG):
--
============================================================================
-- Test FIRST_VALUE - DROPS metadata ✗
SELECT 'FIRST_VALUE() - metadata DROPPED (BUG):';
SELECT arrow_metadata(FIRST_VALUE(uuids) OVER (ORDER BY dict_exts)) FROM
ext_table LIMIT 1;
-- Test LAST_VALUE - DROPS metadata ✗
SELECT 'LAST_VALUE() - metadata DROPPED (BUG):';
SELECT arrow_metadata(LAST_VALUE(uuids) OVER (ORDER BY dict_exts)) FROM
ext_table LIMIT 1;
-- Test NTH_VALUE - DROPS metadata ✗
SELECT 'NTH_VALUE() - metadata DROPPED (BUG):';
SELECT arrow_metadata(NTH_VALUE(uuids, 1) OVER (ORDER BY dict_exts)) FROM
ext_table LIMIT 1;
--
============================================================================
-- SUMMARY:
--
============================================================================
-- LEAD and LAG correctly preserve Arrow field metadata.
-- FIRST_VALUE, LAST_VALUE, and NTH_VALUE incorrectly drop field metadata.
--
-- Expected: All window functions should preserve field metadata (including
-- Arrow extension type metadata like ARROW:extension:name).
DROP TABLE ext_table;
```
### Expected behavior
I expected the NTH_VALUE and friends to return an identical field as its
input.
### Additional context
The query that triggered this for me was in DataFusion 52 via SedonaDB main:
```sql
SELECT id, ST_MakeLine(geometry, next_geom) AS geometry
FROM (
SELECT id, geometry, LEAD(geometry) OVER (ORDER BY id) AS next_geom
FROM pts
)
WHERE next_geom IS NOT NULL
```
...where `pts.geometry` is a `geoarrow.wkb` extension type.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]