JFinis commented on code in PR #197:
URL: https://github.com/apache/parquet-format/pull/197#discussion_r1318737835

##########
src/main/thrift/parquet.thrift:
##########
@@ -529,7 +596,15 @@ struct DataPageHeader {
   /** Encoding used for repetition levels **/
   4: required Encoding repetition_level_encoding;

-  /** Optional statistics for the data in this page**/
+  /**
+   * Optional statistics for the data in this page.
+   *
+   * For filter use-cases populating data in the page index is generally a superior
+   * solution because it allows readers to avoid IO, however not all readers make use
+   * of the page index. For best compatibility both should be populated. If the writer

Review Comment:
   Agree, this seems to switch the suggestion from not writing both to writing both. Are we sure this PR warrants a 180 degree turn on this?