nathanb9 commented on PR #9697:
URL: https://github.com/apache/arrow-rs/pull/9697#issuecomment-4243147914

   Nice, @HippoBaro. One finding: this change introduces a regression in 
reverse row-group scans, which surfaced to me in a Datafusion test:
   
https://github.com/apache/datafusion/blob/9dab336ee08eb778e23cd04acdd05f93fd0196f4/datafusion/datasource-parquet/src/opener.rs#L2471-L2477
   ```rust
   // Test reverse scan
   let opener = make_opener(true);
   let stream = opener.open(file.clone()).unwrap().await.unwrap();
   let reverse_values = collect_int32_values(stream).await;
   
   // The forward scan should return data in the order written
   assert_eq!(forward_values, vec![1, 2, 3, 4, 5, 6, 7, 8, 9]);
   
   // With reverse scan, row groups are reversed, so we expect:
   // Row group 3 (7,8,9), then row group 2 (4,5,6), then row group 1 (1,2,3)
   assert_eq!(reverse_values, vec![7, 8, 9, 4, 5, 6, 1, 2, 3]);
   ```
   result:`panic...buffers.rs:193:9:
   has_range(57..110) below watermark 163`
   
   Basically, the incremental release watermark invariant conflicts with 
reverse row-group traversal. From what I understand, incremental release 
decodes a row group, marks its ending byte offset as the watermark, and then 
expects subsequent ranges to stay at or above that offset. So not able to do 
reverse traversal and safely release in this way.
   
   could potentially support a symmetric reverse version of the same monotonic 
watermark tracking you've introduced here?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to