alamb opened a new issue, #6464:
URL: https://github.com/apache/arrow-rs/issues/6464

   **Describe the bug**
   If the `ParquetMetaDataReader` is asked to load page indexes from metadata
that `ParquetMetaDataWriter` wrote from a `ParquetMetaData` whose page indexes
were never loaded, you get an error like "missing required field
ColumnIndex.null_pages"
   
   
   Note: this depends on https://github.com/apache/arrow-rs/pull/6463
   
   **To Reproduce**
   The full reproducer is in https://github.com/apache/arrow-rs/pull/6463.
Here is the relevant piece:
   
   ```rust
           let parquet_bytes = create_parquet_file();
   
           // read the metadata from the file WITHOUT the page index structures
           let original_metadata = ParquetMetaDataReader::new()
               .parse_and_finish(&parquet_bytes)
               .unwrap();
   
           // read metadata back from the serialized bytes requesting to read the offsets
           let metadata_bytes = metadata_to_bytes(&original_metadata);
           let roundtrip_metadata = ParquetMetaDataReader::new()
               .with_page_indexes(true) // there are no page indexes in the metadata
               .parse_and_finish(&metadata_bytes)
               .unwrap(); // <******* This fails
   ```
   
   **Expected behavior**
   The reader should not error.
   
   I am not sure if the right fix is to:
   1. change the `ParquetMetaDataWriter` to clear the index offset fields before
writing them
   2. change the `ParquetMetaDataReader` to ignore bogus offsets
   3. something else
   
   **Additional context**
   @etseidl added the APIs in https://github.com/apache/arrow-rs/pull/6431
   

