amousavigourabi commented on code in PR #1157:
URL: https://github.com/apache/parquet-mr/pull/1157#discussion_r1341182236
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetWriter.java:
##########
@@ -358,9 +358,29 @@ public abstract static class Builder<T, SELF extends
Builder<T, SELF>> {
private long rowGroupSize = DEFAULT_BLOCK_SIZE;
private int maxPaddingSize = MAX_PADDING_SIZE_DEFAULT;
private boolean enableValidation = DEFAULT_IS_VALIDATING_ENABLED;
- private ParquetProperties.Builder encodingPropsBuilder =
+ private final ParquetProperties.Builder encodingPropsBuilder =
ParquetProperties.builder();
+ private boolean isPageSizeSet = false;
Review Comment:
I agree that these booleans aren't amazing. Optional fields would not solve
the issue of making withXXX methods always override the configuration, next to
this, Optionals shouldn't be used as fields IMO, as that is [not their intended
use](https://mail.openjdk.org/pipermail/jdk8-dev/2013-September/003274.html).
Maybe an approach with a Set would work better?
##########
parquet-hadoop/src/test/java/org/apache/parquet/filter2/dictionarylevel/DictionaryFilterTest.java:
##########
@@ -195,10 +200,28 @@ private static void prepareFile(WriterVersion version,
Path file) throws IOExcep
writeData(f, writer);
}
+ private static void prepareFileWithConf(WriterVersion version, Path file)
throws IOException {
+ Configuration configuration = new Configuration();
Review Comment:
These configurations are used to build the `encodingProps` ParquetProperties
within ParquetWriter's Builder, which are used elsewhere as well. While the
usage of Configuration in this way may be discouraged, the current situation of
settings sometimes not being picked up by the ParquetWriter is inconsistent
with both ParquetReader behaviour and user expectations.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]