[
https://issues.apache.org/jira/browse/PARQUET-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325680#comment-17325680
]
Gabor Szadovszky commented on PARQUET-2030:
-------------------------------------------
[~Miksu82], it seems you are right that these options (along with some others)
are not available when using {{ParquetWriter}}. The difference between
{{ParquetRecordWriter}} and {{ParquetWriter}} is that the former implements
{{org.apache.hadoop.mapreduce.RecordWriter}}, so it can be used with the Hadoop
API directly, while the latter is only part of the Parquet API. The bindings
(avro, proto, thrift) extend {{ParquetWriter}}.
The idea seems to be that you adjust Parquet settings via the Hadoop conf if
you are using the Hadoop API, and via the builder pattern if you are using the
Parquet API. It also seems that a couple of options are missing from the
builder, so your PR to extend the builder with these properties seems
legitimate.
Meanwhile, I am not sure why {{ParquetWriter.Builder}} does not support such
properties set via the Hadoop conf and/or why these options are missing from
the builder directly.
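To illustrate the open question above, here is a minimal, self-contained sketch of a builder that takes its defaults from a conf and lets explicit builder calls override them. It is not the Parquet implementation: {{PageCheckSettings}} is a hypothetical stand-in class, a plain {{Map}} stands in for the Hadoop conf, the key names are assumed from the issue description, and the default values are placeholders chosen only for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in, NOT the Parquet API. It models a builder whose
// defaults come from a conf map and whose explicit setters override them.
final class PageCheckSettings {
    // Key names assumed for illustration, based on the issue description.
    static final String MIN_KEY = "parquet.page.size.row.check.min";
    static final String MAX_KEY = "parquet.page.size.row.check.max";
    // Placeholder defaults used only by this sketch.
    static final int DEFAULT_MIN = 100;
    static final int DEFAULT_MAX = 10000;

    final int minRowCheck;
    final int maxRowCheck;

    private PageCheckSettings(int minRowCheck, int maxRowCheck) {
        this.minRowCheck = minRowCheck;
        this.maxRowCheck = maxRowCheck;
    }

    static Builder builder(Map<String, String> conf) {
        return new Builder(conf);
    }

    static final class Builder {
        private int min;
        private int max;

        // Seed the builder from the conf, falling back to the defaults.
        private Builder(Map<String, String> conf) {
            this.min = readInt(conf, MIN_KEY, DEFAULT_MIN);
            this.max = readInt(conf, MAX_KEY, DEFAULT_MAX);
        }

        // Explicit builder calls take precedence over conf values.
        Builder withMinRowCountForPageSizeCheck(int value) {
            this.min = value;
            return this;
        }

        Builder withMaxRowCountForPageSizeCheck(int value) {
            this.max = value;
            return this;
        }

        PageCheckSettings build() {
            if (min <= 0 || max < min) {
                throw new IllegalArgumentException(
                    "expected 0 < min <= max, got min=" + min + ", max=" + max);
            }
            return new PageCheckSettings(min, max);
        }

        private static int readInt(Map<String, String> conf, String key, int fallback) {
            String raw = conf.get(key);
            return raw == null ? fallback : Integer.parseInt(raw.trim());
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(MAX_KEY, "5000"); // set through the conf only

        PageCheckSettings settings = PageCheckSettings.builder(conf)
            .withMinRowCountForPageSizeCheck(10) // set through the builder only
            .build();

        // prints: 10 5000
        System.out.println(settings.minRowCheck + " " + settings.maxRowCheck);
    }
}
```

With this layering, a caller on the Parquet API side could still pick up values that were set in the conf, while anything set on the builder wins. Whether {{ParquetWriter.Builder}} should behave this way is exactly the open design question.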
> Expose page size row check configurations to ParquetWriter.Builder
> ------------------------------------------------------------------
>
> Key: PARQUET-2030
> URL: https://issues.apache.org/jira/browse/PARQUET-2030
> Project: Parquet
> Issue Type: Improvement
> Reporter: Mika Ristimäki
> Priority: Minor
>
> PARQUET-1920 makes it possible to configure "page.size.row.check.min" and
> "page.size.row.check.max". But those configurations are not exposed to
> "org.apache.parquet.hadoop.ParquetWriter.Builder".