[ https://issues.apache.org/jira/browse/PARQUET-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325680#comment-17325680 ]

Gabor Szadovszky commented on PARQUET-2030:
-------------------------------------------

[~Miksu82], it seems you are right that these options (along with some others)
are not available through {{ParquetWriter}}. The difference between
{{ParquetRecordWriter}} and {{ParquetWriter}} is that the former implements
{{org.apache.hadoop.mapreduce.RecordWriter}}, so it can be used directly with
the Hadoop API, while the latter is part of the Parquet API only. The bindings
(avro, proto, thrift) extend {{ParquetWriter}}.
It seems the concept is that you adjust Parquet settings via the Hadoop conf if
you are using the Hadoop API, and via the builder pattern if you are using the
Parquet API. It also seems that a couple of options are missing from the
builder, so your PR to extend the builder with these properties seems
legitimate.
Meanwhile, I am not sure why {{ParquetWriter.Builder}} does not support setting
such properties via the Hadoop conf, or why these options are missing from the
builder in the first place.
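To illustrate the two configuration paths discussed above, here is a minimal, self-contained Java sketch. It is not the Parquet API: {{PageSizeCheckSketch}}, {{Builder}}, {{WriterSettings}}, the plain {{Map}} standing in for a Hadoop {{Configuration}}, and the default values are all hypothetical. The key strings follow the issue description (the real property names may carry a {{parquet.}} prefix), and the setter names only mirror what a builder extension like the one in the PR might look like. Each setting is resolved as: explicit builder value first, then the conf entry, then a default.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, NOT the Parquet API: a builder that falls back to a
// Hadoop-style conf map for settings not set explicitly on the builder.
public class PageSizeCheckSketch {

    // Key strings as quoted in the issue; real property names may differ.
    static final String MIN_KEY = "page.size.row.check.min";
    static final String MAX_KEY = "page.size.row.check.max";

    // Assumed defaults, for illustration only.
    static final int DEFAULT_MIN = 100;
    static final int DEFAULT_MAX = 10000;

    // Effective page-size row-check bounds after resolution.
    static final class WriterSettings {
        final int minRowCountForPageSizeCheck;
        final int maxRowCountForPageSizeCheck;

        WriterSettings(int min, int max) {
            this.minRowCountForPageSizeCheck = min;
            this.maxRowCountForPageSizeCheck = max;
        }
    }

    static final class Builder {
        private final Map<String, String> conf; // stand-in for Hadoop conf
        private Integer min; // null means "not set on the builder"
        private Integer max;

        Builder(Map<String, String> conf) {
            this.conf = conf;
        }

        // Hypothetical builder setters (the "Parquet API" path).
        Builder withMinRowCountForPageSizeCheck(int min) {
            this.min = min;
            return this;
        }

        Builder withMaxRowCountForPageSizeCheck(int max) {
            this.max = max;
            return this;
        }

        // Precedence: explicit builder value, then conf entry, then default.
        private int resolve(Integer explicit, String key, int dflt) {
            if (explicit != null) {
                return explicit;
            }
            String fromConf = conf.get(key);
            return fromConf != null ? Integer.parseInt(fromConf.trim()) : dflt;
        }

        WriterSettings build() {
            int effectiveMin = resolve(min, MIN_KEY, DEFAULT_MIN);
            int effectiveMax = resolve(max, MAX_KEY, DEFAULT_MAX);
            // This sketch's own sanity check, not a documented Parquet rule.
            if (effectiveMin <= 0 || effectiveMin > effectiveMax) {
                throw new IllegalArgumentException(
                    "need 0 < min <= max, got min=" + effectiveMin
                        + ", max=" + effectiveMax);
            }
            return new WriterSettings(effectiveMin, effectiveMax);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(MAX_KEY, "5000"); // set through the "Hadoop conf" path

        WriterSettings s = new Builder(conf)
            .withMinRowCountForPageSizeCheck(50) // set through the builder
            .build();

        System.out.println(s.minRowCountForPageSizeCheck + " "
            + s.maxRowCountForPageSizeCheck);
    }
}
```

Running it prints {{50 5000}}: the minimum comes from the builder and the maximum from the conf map. Having the builder fall back to the conf is just one possible answer to the question above, not a description of what {{ParquetWriter.Builder}} currently does.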

> Expose page size row check configurations to ParquetWriter.Builder
> ------------------------------------------------------------------
>
>                 Key: PARQUET-2030
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2030
>             Project: Parquet
>          Issue Type: Improvement
>            Reporter: Mika Ristimäki
>            Priority: Minor
>
> PARQUET-1920 makes it possible to configure "page.size.row.check.min" and 
> "page.size.row.check.max", but those configurations are not exposed to 
> "org.apache.parquet.hadoop.ParquetWriter.Builder".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
