etseidl commented on code in PR #6999:
URL: https://github.com/apache/arrow-rs/pull/6999#discussion_r1922751295


##########
parquet/src/file/metadata/reader.rs:
##########
@@ -1052,5 +1060,17 @@ mod async_tests {
             .unwrap();
         assert_eq!(fetch_count.load(Ordering::SeqCst), 1);
         assert!(metadata.offset_index().is_some() && 
metadata.column_index().is_some());
+
+        // Prefetch more than enough
+        fetch_count.store(0, Ordering::SeqCst);
+        let f = MetadataFetchFn(&mut fetch);
+        let metadata = ParquetMetaDataReader::new()
+            .with_page_indexes(true)
+            .with_prefetch_hint(Some(len)) // prefetch entire file

Review Comment:
   Might as well add a test with `prefetch > 130650` but less than file size.



##########
parquet/src/file/metadata/reader.rs:
##########
@@ -409,10 +409,21 @@ impl ParquetMetaDataReader {
             None => return Ok(()),
         };
 
-        let bytes = match &remainder {
-            Some((remainder_start, remainder)) if *remainder_start <= 
range.start => {
-                let offset = range.start - *remainder_start;
-                remainder.slice(offset..range.end - *remainder_start + offset)
+        let bytes = match &fetched {
+            Some((fetched_start, fetched)) if *fetched_start <= range.start => 
{
+                // `fetched`` is an amount of data spanning from fetched_start 
to the end of the file
+                // We want to slice out the range we need from that data, but 
need to adjust the
+                // range we are looking for to be relative to fetched_start.
+                let fetched_start = *fetched_start;
+                let range = range.start - fetched_start..range.end - 
fetched_start;
+                // santity check: `fetched` should always go until the end of 
the file
+                // so if our range is beyond that, something is wrong!
+                assert!(
+                    range.end <= fetched_start + fetched.len(),
+                    "range: {range:?}, fetched: {}, fetched_start: 
{fetched_start}",
+                    fetched.len()
+                );
+                fetched.slice(range)

Review Comment:
   ```suggestion
                   let offset = range.start - *remainder_start;
                   let end = offset + range.end - range.start;
                   assert!(end <= remainder.len());
                   remainder.slice(offset..end)
   ```
   Instead of all the other changes, I think this will properly compute the end 
of the slice.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to