alamb opened a new issue, #6464: URL: https://github.com/apache/arrow-rs/issues/6464
**Describe the bug**
If the `ParquetMetaDataReader` tries to read metadata that was written by `ParquetMetaDataWriter` without the page indexes having been loaded first, you get an error like "missing required field ColumnIndex.null_pages"

Note this depends on https://github.com/apache/arrow-rs/pull/6463

**To Reproduce**
The full reproducer is in https://github.com/apache/arrow-rs/pull/6463. Here is the relevant piece:

```rust
let parquet_bytes = create_parquet_file();

// read the metadata from the file WITHOUT the page index structures
let original_metadata = ParquetMetaDataReader::new()
    .parse_and_finish(&parquet_bytes)
    .unwrap();

// read metadata back from the serialized bytes requesting to read the offsets
let metadata_bytes = metadata_to_bytes(&original_metadata);
let roundtrip_metadata = ParquetMetaDataReader::new()
    .with_page_indexes(true) // there are no page indexes in the metadata
    .parse_and_finish(&metadata_bytes)
    .unwrap(); // <******* This fails
```

**Expected behavior**
The reader should not error.

I am not sure if the right fix is to
1. change the `ParquetMetaDataWriter` to clear the index offset fields before writing them
2. change the `ParquetMetaDataReader` to ignore bogus offsets
3. Something else

**Additional context**
@etseidl has added the APIs in https://github.com/apache/arrow-rs/pull/6431

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
