[I] Unable to set dictionary_page_offset when encoding_stats are missing [parquet-java]

via GitHub Thu, 18 Jul 2024 06:21:37 -0700


mothukur opened a new issue, #2962:
URL: https://github.com/apache/parquet-java/issues/2962


   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   I am hitting an issue while splitting a Parquet file into multiple files 
using the ParquetFileWriter.appendRowGroups API: the dictionary page offsets 
are not set correctly in the new files. On further investigation, I found that 
ParquetMetadataConverter.addRowGroup assumes EncodingStats is always 
available. According to the format specification, encoding_stats is optional. 
Would it be possible to remove this requirement? 
   
   
https://github.com/apache/parquet-java/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java#L559
   
   
https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L826
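
A minimal sketch of one possible fallback, for illustration only. It does not use the real parquet-java classes: `DictionaryOffsetSketch`, `ColumnChunkInfo`, `Encoding`, and `hasDictionaryPage` are hypothetical stand-ins, and the fallback rule (a dictionary encoding is listed and the recorded offset is positive) is an assumption, not the behavior of `ParquetMetadataConverter`.

```java
import java.util.Set;

// Hypothetical sketch: none of these types are parquet-java classes.
final class DictionaryOffsetSketch {

    /** Subset of Parquet encodings, enough for this illustration. */
    enum Encoding { PLAIN, RLE, PLAIN_DICTIONARY, RLE_DICTIONARY }

    /** Stand-in for the footer metadata of one column chunk. */
    static final class ColumnChunkInfo {
        /** null models a file written without encoding_stats. */
        final Boolean statsReportDictionary;
        final Set<Encoding> encodings;
        final long dictionaryPageOffset;

        ColumnChunkInfo(Boolean statsReportDictionary,
                        Set<Encoding> encodings,
                        long dictionaryPageOffset) {
            this.statsReportDictionary = statsReportDictionary;
            this.encodings = encodings;
            this.dictionaryPageOffset = dictionaryPageOffset;
        }
    }

    /**
     * Decides whether dictionary_page_offset should be carried over.
     * Uses the encoding stats when present; otherwise falls back to the
     * encodings list plus a positive offset (an assumed heuristic).
     */
    static boolean hasDictionaryPage(ColumnChunkInfo chunk) {
        if (chunk.statsReportDictionary != null) {
            return chunk.statsReportDictionary;
        }
        boolean dictionaryEncoded =
                chunk.encodings.contains(Encoding.PLAIN_DICTIONARY)
                        || chunk.encodings.contains(Encoding.RLE_DICTIONARY);
        // A Parquet file starts with a 4-byte magic, so a real page offset is > 0.
        return dictionaryEncoded && chunk.dictionaryPageOffset > 0;
    }

    private DictionaryOffsetSketch() {}
}
```

With this rule, a chunk that has no encoding stats but lists `RLE_DICTIONARY` and records a positive dictionary offset would still have its offset copied, while a chunk that lists only `PLAIN` would not.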
   
   
   ### Component(s)
   
   _No response_


