nathanb9 commented on PR #9697: URL: https://github.com/apache/arrow-rs/pull/9697#issuecomment-4243147914
Nice, @HippoBaro. One finding: this change introduces a regression in reverse row-group scans, which surfaced to me in a Datafusion test: https://github.com/apache/datafusion/blob/9dab336ee08eb778e23cd04acdd05f93fd0196f4/datafusion/datasource-parquet/src/opener.rs#L2471-L2477 ```rust // Test reverse scan let opener = make_opener(true); let stream = opener.open(file.clone()).unwrap().await.unwrap(); let reverse_values = collect_int32_values(stream).await; // The forward scan should return data in the order written assert_eq!(forward_values, vec![1, 2, 3, 4, 5, 6, 7, 8, 9]); // With reverse scan, row groups are reversed, so we expect: // Row group 3 (7,8,9), then row group 2 (4,5,6), then row group 1 (1,2,3) assert_eq!(reverse_values, vec![7, 8, 9, 4, 5, 6, 1, 2, 3]); ``` result:`panic...buffers.rs:193:9: has_range(57..110) below watermark 163` Basically, the incremental release watermark invariant conflicts with reverse row-group traversal. From what I understand, incremental release decodes a row group, marks its ending byte offset as the watermark, and then expects subsequent ranges to stay at or above that offset. So not able to do reverse traversal and safely release in this way. could potentially support a symmetric reverse version of the same monotonic watermark tracking you've introduced here? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
