[ 
https://issues.apache.org/jira/browse/PARQUET-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770367#comment-17770367
 ] 

ASF GitHub Bot commented on PARQUET-2351:
-----------------------------------------

amousavigourabi commented on code in PR #1157:
URL: https://github.com/apache/parquet-mr/pull/1157#discussion_r1341182236


##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetWriter.java:
##########
@@ -358,9 +358,29 @@ public abstract static class Builder<T, SELF extends 
Builder<T, SELF>> {
     private long rowGroupSize = DEFAULT_BLOCK_SIZE;
     private int maxPaddingSize = MAX_PADDING_SIZE_DEFAULT;
     private boolean enableValidation = DEFAULT_IS_VALIDATING_ENABLED;
-    private ParquetProperties.Builder encodingPropsBuilder =
+    private final ParquetProperties.Builder encodingPropsBuilder =
         ParquetProperties.builder();
 
+    private boolean isPageSizeSet = false;

Review Comment:
   I agree that these booleans aren't amazing. Optional fields would not solve 
the issue of making withXXX methods always override the configuration, next to 
this, Optionals shouldn't be used as fields IMO, as that is [not their intended 
use](https://mail.openjdk.org/pipermail/jdk8-dev/2013-September/003274.html). 
Maybe an approach with a Set would work better?



##########
parquet-hadoop/src/test/java/org/apache/parquet/filter2/dictionarylevel/DictionaryFilterTest.java:
##########
@@ -195,10 +200,28 @@ private static void prepareFile(WriterVersion version, 
Path file) throws IOExcep
     writeData(f, writer);
   }
 
+  private static void prepareFileWithConf(WriterVersion version, Path file) 
throws IOException {
+    Configuration configuration = new Configuration();

Review Comment:
   These configurations are used to build the `encodingProps` ParquetProperties 
within ParquetWriter's Builder, which are used elsewhere as well. While the 
usage of Configuration in this way may be discouraged, the current situation of 
settings sometimes not being picked up by the ParquetWriter is inconsistent 
with both ParquetReader behaviour and user expectations.





> ParquetWriter/ParquetReader should parse options directly from supplied 
> Configuration
> -------------------------------------------------------------------------------------
>
>                 Key: PARQUET-2351
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2351
>             Project: Parquet
>          Issue Type: Improvement
>            Reporter: Claire McGinty
>            Priority: Minor
>
> As a Parquet user, my expectation is that ParquetWriter/ParquetReader will 
> automatically parse any options passed in the supplied Configuration. For 
> example:
>  
> ```
> Configuration conf = new Configuration();
> conf.setBoolean(ParquetOutputFormat.BLOOM_FILTER_ENABLED, true);
> ParquetWriter<Car> writer = AvroParquetWriter.<Car>builder(path)
> .withSchema(Car.SCHEMA$)
> .withConf(conf)
> .build();
> ```
>  
> However, the above code results in a ParquetWriter where `bloomFilterEnabled` 
> is set to false-the expected way to configure this is to use 
> `ParquetWriter#withBloomFilterEnabled` directly.
>  
> IMO, when `ParquetWriter#withConf` is invoked, it should also delegate to 
> `encodingPropsBuilder` and set any Configured properties.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to