wgtmac commented on code in PR #1157:
URL: https://github.com/apache/parquet-mr/pull/1157#discussion_r1339538597
##########
parquet-hadoop/src/test/java/org/apache/parquet/filter2/dictionarylevel/DictionaryFilterTest.java:
##########
@@ -195,10 +200,28 @@ private static void prepareFile(WriterVersion version,
Path file) throws IOExcep
writeData(f, writer);
}
+ private static void prepareFileWithConf(WriterVersion version, Path file)
throws IOException {
+ Configuration configuration = new Configuration();
Review Comment:
It seems that these specific configurations defined in the
ParquetOutputFormat are solely used for ParquetOutputFormat to create a
RecordWriter, which actually puts all of them into a ParquetProperties.
https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetOutputFormat.java#L457-L484
IIUC, relying on settings from Hadoop configuration is discouraged, we
should use ParquetProperties to set all those things.
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetWriter.java:
##########
@@ -358,9 +358,29 @@ public abstract static class Builder<T, SELF extends
Builder<T, SELF>> {
private long rowGroupSize = DEFAULT_BLOCK_SIZE;
private int maxPaddingSize = MAX_PADDING_SIZE_DEFAULT;
private boolean enableValidation = DEFAULT_IS_VALIDATING_ENABLED;
- private ParquetProperties.Builder encodingPropsBuilder =
+ private final ParquetProperties.Builder encodingPropsBuilder =
ParquetProperties.builder();
+ private boolean isPageSizeSet = false;
Review Comment:
TBH, these would make the code less maintainable. Not sure if Optional would
make them more organized.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]