[
https://issues.apache.org/jira/browse/PARQUET-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770367#comment-17770367
]
ASF GitHub Bot commented on PARQUET-2351:
-----------------------------------------
amousavigourabi commented on code in PR #1157:
URL: https://github.com/apache/parquet-mr/pull/1157#discussion_r1341182236
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetWriter.java:
##########
@@ -358,9 +358,29 @@ public abstract static class Builder<T, SELF extends
Builder<T, SELF>> {
private long rowGroupSize = DEFAULT_BLOCK_SIZE;
private int maxPaddingSize = MAX_PADDING_SIZE_DEFAULT;
private boolean enableValidation = DEFAULT_IS_VALIDATING_ENABLED;
- private ParquetProperties.Builder encodingPropsBuilder =
+ private final ParquetProperties.Builder encodingPropsBuilder =
ParquetProperties.builder();
+ private boolean isPageSizeSet = false;
Review Comment:
I agree that these booleans aren't amazing. Optional fields would not solve
the issue of making withXXX methods always override the configuration, next to
this, Optionals shouldn't be used as fields IMO, as that is [not their intended
use](https://mail.openjdk.org/pipermail/jdk8-dev/2013-September/003274.html).
Maybe an approach with a Set would work better?
##########
parquet-hadoop/src/test/java/org/apache/parquet/filter2/dictionarylevel/DictionaryFilterTest.java:
##########
@@ -195,10 +200,28 @@ private static void prepareFile(WriterVersion version,
Path file) throws IOExcep
writeData(f, writer);
}
+ private static void prepareFileWithConf(WriterVersion version, Path file)
throws IOException {
+ Configuration configuration = new Configuration();
Review Comment:
These configurations are used to build the `encodingProps` ParquetProperties
within ParquetWriter's Builder, which are used elsewhere as well. While the
usage of Configuration in this way may be discouraged, the current situation of
settings sometimes not being picked up by the ParquetWriter is inconsistent
with both ParquetReader behaviour and user expectations.
> ParquetWriter/ParquetReader should parse options directly from supplied
> Configuration
> -------------------------------------------------------------------------------------
>
> Key: PARQUET-2351
> URL: https://issues.apache.org/jira/browse/PARQUET-2351
> Project: Parquet
> Issue Type: Improvement
> Reporter: Claire McGinty
> Priority: Minor
>
> As a Parquet user, my expectation is that ParquetWriter/ParquetReader will
> automatically parse any options passed in the supplied Configuration. For
> example:
>
> ```
> Configuration conf = new Configuration();
> conf.setBoolean(ParquetOutputFormat.BLOOM_FILTER_ENABLED, true);
> ParquetWriter<Car> writer = AvroParquetWriter.<Car>builder(path)
> .withSchema(Car.SCHEMA$)
> .withConf(conf)
> .build();
> ```
>
> However, the above code results in a ParquetWriter where `bloomFilterEnabled`
> is set to false-the expected way to configure this is to use
> `ParquetWriter#withBloomFilterEnabled` directly.
>
> IMO, when `ParquetWriter#withConf` is invoked, it should also delegate to
> `encodingPropsBuilder` and set any Configured properties.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)