Ted-Jiang commented on code in PR #3616:
URL: https://github.com/apache/arrow-datafusion/pull/3616#discussion_r979535017


##########
datafusion/core/src/datasource/file_format/parquet.rs:
##########
@@ -409,14 +411,31 @@ pub async fn fetch_parquet_metadata(
         metadata.put(remaining_metadata.as_ref());
         metadata.put(&suffix[..suffix_len - 8]);
 
-        Ok(decode_metadata(metadata.as_ref())?)
+        decode_metadata(metadata.as_ref())?
     } else {
         let metadata_start = meta.size - length - 8;
 
-        Ok(decode_metadata(
-            &suffix[metadata_start - footer_start..suffix_len - 8],
-        )?)
+        decode_metadata(&suffix[metadata_start - footer_start..suffix_len - 8])?
+    };
+
+    if enable_page_index {
+        // TODO: add an async version in arrow-rs to avoid reading the whole file.

Review Comment:
   I will make this change in `arrow-rs`.
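Here is a minimal sketch of one way the TODO could avoid fetching the whole file. It assumes the page index is located through the `column_index_offset`/`column_index_length` and `offset_index_offset`/`offset_index_length` fields that the Parquet format stores on each column chunk. `ChunkIndexLocation` and `page_index_range` are hypothetical names, not `arrow-rs` APIs. The idea is to compute the single byte range that covers every column index and offset index, so that only that range needs to be fetched instead of `0..meta.size`.

```rust
use std::ops::Range;

/// Hypothetical stand-in for the page-index fields of a Parquet column chunk
/// (the Parquet format's `ColumnChunk` carries these four optional fields).
#[derive(Debug, Clone, Copy)]
struct ChunkIndexLocation {
    column_index_offset: Option<i64>,
    column_index_length: Option<i32>,
    offset_index_offset: Option<i64>,
    offset_index_length: Option<i32>,
}

/// Returns the smallest byte range that covers every column index and
/// offset index listed in `chunks`, or `None` if no chunk has a page index.
fn page_index_range(chunks: &[ChunkIndexLocation]) -> Option<Range<usize>> {
    let mut range: Option<Range<usize>> = None;
    for c in chunks {
        // Each chunk may carry a column index, an offset index, both, or neither.
        let parts = [
            (c.column_index_offset, c.column_index_length),
            (c.offset_index_offset, c.offset_index_length),
        ];
        for (offset, length) in parts {
            if let (Some(off), Some(len)) = (offset, length) {
                // Offsets and lengths are non-negative file positions per the format.
                let start = off as usize;
                let end = start + len as usize;
                // Grow the running range so it also covers this index.
                range = Some(match range {
                    Some(r) => r.start.min(start)..r.end.max(end),
                    None => start..end,
                });
            }
        }
    }
    range
}
```

With such a range, the `store.get_range(&meta.location, 0..meta.size)` call in the diff could fetch only the index bytes. Because the stored offsets are absolute file positions, they would presumably need to be rebased by `range.start` before the fetched buffer is decoded.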



##########
datafusion/core/src/datasource/file_format/parquet.rs:
##########
@@ -409,14 +411,31 @@ pub async fn fetch_parquet_metadata(
         metadata.put(remaining_metadata.as_ref());
         metadata.put(&suffix[..suffix_len - 8]);
 
-        Ok(decode_metadata(metadata.as_ref())?)
+        decode_metadata(metadata.as_ref())?
     } else {
         let metadata_start = meta.size - length - 8;
 
-        Ok(decode_metadata(
-            &suffix[metadata_start - footer_start..suffix_len - 8],
-        )?)
+        decode_metadata(&suffix[metadata_start - footer_start..suffix_len - 8])?
+    };
+
+    if enable_page_index {
+        // TODO: add an async version in arrow-rs to avoid reading the whole file.
+        let bytes = store.get_range(&meta.location, 0..meta.size).await?;
+        let mut location_vec = vec![];
+        let mut index_vec = vec![];
+        for rg in result_meta.row_groups() {
+            location_vec.push(index_reader::read_pages_locations(&bytes, rg.columns())?);

Review Comment:
   For now, this reads the page index for all columns. If that becomes a bottleneck, we could extend the `pub async fn fetch_parquet_metadata` API with an `index_projection` parameter.
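Here is a minimal sketch of the `index_projection` idea. It assumes the projection would be an optional list of leaf-column positions. `project_columns` is a hypothetical helper, not part of DataFusion or `arrow-rs`. It is generic over the element type so it can be exercised without the `parquet` crate, but in the loop above it would receive `rg.columns()`. With no projection, it keeps the current behaviour of reading every column's index.

```rust
/// Hypothetical helper: selects the column chunks whose page index should be
/// read. `None` keeps every column; `Some(proj)` keeps only the listed
/// positions, ignoring out-of-range and duplicate entries and preserving the
/// original column order.
fn project_columns<T: Clone>(columns: &[T], index_projection: Option<&[usize]>) -> Vec<T> {
    match index_projection {
        // No projection: same behaviour as reading all columns' indexes.
        None => columns.to_vec(),
        Some(proj) => {
            let mut idx: Vec<usize> = proj
                .iter()
                .copied()
                .filter(|&i| i < columns.len())
                .collect();
            // Sort and dedup so each column is read once, in file order.
            idx.sort_unstable();
            idx.dedup();
            idx.into_iter().map(|i| columns[i].clone()).collect()
        }
    }
}
```

A caller would then pass the projected slice to `index_reader::read_pages_locations` instead of `rg.columns()`, cloning the selected entries because that function takes a slice. One design consequence is that `location_vec` and `index_vec` would then hold entries only for the projected columns, so whoever consumes them has to keep the projection around to map entries back to column positions.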



-- 
This is an automated message from the Apache Git Service.