emkornfield commented on code in PR #197:
URL: https://github.com/apache/parquet-format/pull/197#discussion_r1318928626
##########
src/main/thrift/parquet.thrift:
##########
@@ -529,7 +596,15 @@ struct DataPageHeader {
   /** Encoding used for repetition levels **/
   4: required Encoding repetition_level_encoding;
-  /** Optional statistics for the data in this page**/
+  /**
+   * Optional statistics for the data in this page.
+   *
+   * For filter use-cases populating data in the page index is generally a superior
+   * solution because it allows readers to avoid IO, however not all readers make use
+   * of the page index. For best compatibility both should be populated. If the writer

Review Comment:
I think this has likely become tangential to the PR, so I'm going to revert this.