wgtmac commented on code in PR #3396:
URL: https://github.com/apache/parquet-java/pull/3396#discussion_r3151217519
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageWriteStore.java:
##########
@@ -672,6 +677,88 @@ public ColumnChunkPageWriteStore(
}
}
+  public static Builder builder() {
+    return new Builder();
+  }
+
+  /**
+   * Builder for {@link ColumnChunkPageWriteStore}. Prefer this over the constructors when new
+   * parameters are needed so that callers do not have to be updated every time a parameter is added.
+   */
+  public static class Builder {
+    private Function<ColumnDescriptor, BytesInputCompressor> compressorProvider;
+    private MessageType schema;
+    private ByteBufferAllocator allocator;
+    private int columnIndexTruncateLength = ParquetProperties.DEFAULT_COLUMN_INDEX_TRUNCATE_LENGTH;
+    private boolean pageWriteChecksumEnabled = ParquetProperties.DEFAULT_PAGE_WRITE_CHECKSUM_ENABLED;
+    private InternalFileEncryptor fileEncryptor = null;
+    private int rowGroupOrdinal = 0;
+
+    private Builder() {}
+
+    /**
+     * Use a single compressor for every column.
+     */
+    public Builder withCompressor(BytesInputCompressor compressor) {
+      this.compressorProvider = path -> compressor;
+      return this;
+    }
+
+    /**
+     * Resolve the compressor per column from the given codec factory and properties, allowing
+     * per-column compression codecs.
+     */
+    public Builder withCodecFactory(CompressionCodecFactory codecFactory, ParquetProperties props) {
+      this.compressorProvider = path -> codecFactory.getCompressor(props.getCompressionCodec(path));
+      return this;
+    }
+
+    public Builder withSchema(MessageType schema) {
+      this.schema = schema;
+      return this;
+    }
+
+    public Builder withAllocator(ByteBufferAllocator allocator) {
+      this.allocator = allocator;
+      return this;
+    }
+
+    public Builder withColumnIndexTruncateLength(int columnIndexTruncateLength) {
+      this.columnIndexTruncateLength = columnIndexTruncateLength;
+      return this;
+    }
+
+    public Builder withPageWriteChecksumEnabled(boolean pageWriteChecksumEnabled) {
+      this.pageWriteChecksumEnabled = pageWriteChecksumEnabled;
+      return this;
+    }
+
+    public Builder withFileEncryptor(InternalFileEncryptor fileEncryptor) {
+      this.fileEncryptor = fileEncryptor;
+      return this;
+    }
+
+    public Builder withRowGroupOrdinal(int rowGroupOrdinal) {
+      this.rowGroupOrdinal = rowGroupOrdinal;
+      return this;
+    }
+
+    public ColumnChunkPageWriteStore build() {
+      if (compressorProvider == null) {
Review Comment:
It seems that `schema` and `allocator` are also required arguments, so we need to check them here as well. BTW, we may switch to using `Preconditions.checkArgument` to be consistent with the rest of the repo.
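To illustrate the suggestion, here is a minimal, self-contained sketch of validating all required builder arguments up front in `build()`. The `Schema`, `Allocator`, and `Builder` types below are hypothetical stand-ins, not the real parquet-java classes, and `Objects.requireNonNull` stands in for the repo's `Preconditions.checkArgument` convention; the shape of the checks is what matters.

```java
import java.util.Objects;

// Hypothetical stand-in types so the sketch compiles on its own;
// the real code would use MessageType, ByteBufferAllocator, etc.
class Schema {}
class Allocator {}

class Builder {
  Schema schema;
  Allocator allocator;

  Builder withSchema(Schema s) { this.schema = s; return this; }
  Builder withAllocator(Allocator a) { this.allocator = a; return this; }

  // Check every required argument before constructing the object, so a
  // missing argument fails fast with a clear message instead of causing
  // an NPE later inside the constructed object.
  String build() {
    Objects.requireNonNull(schema, "schema is required");
    Objects.requireNonNull(allocator, "allocator is required");
    return "built";
  }
}

public class Main {
  public static void main(String[] args) {
    boolean threw = false;
    try {
      new Builder().withSchema(new Schema()).build(); // allocator missing
    } catch (NullPointerException e) {
      threw = true;
    }
    System.out.println(threw); // true
    System.out.println(
        new Builder().withSchema(new Schema()).withAllocator(new Allocator()).build()); // built
  }
}
```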
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/InternalParquetRecordWriter.java:
##########
@@ -77,7 +79,9 @@ class InternalParquetRecordWriter<T> {
* @param extraMetaData extra meta data to write in the footer of the file
* @param rowGroupSize the size of a block in the file (this will be approximate)
* @param compressor the codec used to compress
+   * @deprecated Use {@link #InternalParquetRecordWriter(ParquetFileWriter, WriteSupport, MessageType, Map, long, CompressionCodecFactory, boolean, ParquetProperties)} for per-column compression support
*/
+ @Deprecated
Review Comment:
Is it better not to deprecate this constructor? `InternalParquetRecordWriter` is not public, so we can change the parameter `compressor` to be a function similar to the one used by `ColumnChunkPageWriteStore`.
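A rough sketch of what changing the internal signature might look like, using hypothetical stand-in types (`ColumnDescriptor` here is a local stub, and `Compressor` stands in for `BytesInputCompressor`): the constructor takes a per-column function, and existing single-compressor call sites adapt with a constant lambda, so nothing needs to be deprecated.

```java
import java.util.function.Function;

// Hypothetical stand-ins for the real parquet-java types.
class ColumnDescriptor {
  final String path;
  ColumnDescriptor(String path) { this.path = path; }
}
interface Compressor { String name(); }

// An internal writer whose constructor takes a per-column compressor
// function instead of one fixed compressor.
class InternalWriter {
  private final Function<ColumnDescriptor, Compressor> compressorProvider;

  InternalWriter(Function<ColumnDescriptor, Compressor> compressorProvider) {
    this.compressorProvider = compressorProvider;
  }

  String codecFor(ColumnDescriptor column) {
    return compressorProvider.apply(column).name();
  }
}

public class Main {
  public static void main(String[] args) {
    Compressor snappy = () -> "SNAPPY";
    // A caller that previously passed a single compressor adapts with a
    // constant function, so the old behavior is preserved.
    InternalWriter writer = new InternalWriter(column -> snappy);
    System.out.println(writer.codecFor(new ColumnDescriptor("a.b"))); // SNAPPY
  }
}
```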
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetRecordWriter.java:
##########
@@ -188,7 +188,10 @@ public ParquetRecordWriter(
* @param codec the compression codec used to compress the pages
* @param validating if schema validation should be turned on
* @param props parquet encoding properties
+   * @deprecated Use {@link #ParquetRecordWriter(ParquetFileWriter, WriteSupport, MessageType, Map, long, boolean, ParquetProperties, MemoryManager, Configuration)}
+   *             and set the compression codec in {@link ParquetProperties} instead
*/
+ @Deprecated
Review Comment:
Let's not add `@Deprecated` here because this constructor is not a public
one. We can directly modify its signature. We don't need to add a public ctor
since this class is only used by `ParquetOutputFormat`.
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageWriteStore.java:
##########
@@ -672,6 +677,88 @@ public ColumnChunkPageWriteStore(
}
}
+  public static Builder builder() {
+    return new Builder();
+  }
+
+  /**
+   * Builder for {@link ColumnChunkPageWriteStore}. Prefer this over the constructors when new
+   * parameters are needed so that callers do not have to be updated every time a parameter is added.
+   */
+  public static class Builder {
Review Comment:
Does this new builder look good to you? @gszadovszky @Fokko?
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageWriteStore.java:
##########
@@ -672,6 +677,88 @@ public ColumnChunkPageWriteStore(
}
}
+  public static Builder builder() {
+    return new Builder();
+  }
+
+  /**
+   * Builder for {@link ColumnChunkPageWriteStore}. Prefer this over the constructors when new
+   * parameters are needed so that callers do not have to be updated every time a parameter is added.
+   */
+  public static class Builder {
+    private Function<ColumnDescriptor, BytesInputCompressor> compressorProvider;
+    private MessageType schema;
+    private ByteBufferAllocator allocator;
+    private int columnIndexTruncateLength = ParquetProperties.DEFAULT_COLUMN_INDEX_TRUNCATE_LENGTH;
+    private boolean pageWriteChecksumEnabled = ParquetProperties.DEFAULT_PAGE_WRITE_CHECKSUM_ENABLED;
+    private InternalFileEncryptor fileEncryptor = null;
+    private int rowGroupOrdinal = 0;
+
+    private Builder() {}
+
+    /**
+     * Use a single compressor for every column.
+     */
+    public Builder withCompressor(BytesInputCompressor compressor) {
+      this.compressorProvider = path -> compressor;
+      return this;
+    }
+
+    /**
+     * Resolve the compressor per column from the given codec factory and properties, allowing
+     * per-column compression codecs.
+     */
+    public Builder withCodecFactory(CompressionCodecFactory codecFactory, ParquetProperties props) {
Review Comment:
It looks a little bit weird that `ParquetProperties` is used here. Should we
directly use `Function<ColumnDescriptor, BytesInputCompressor>` to be more
flexible?
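A sketch of the more flexible shape being suggested, again with hypothetical stand-in types (`Compressor` for `BytesInputCompressor`, a stub `ColumnDescriptor`): the builder exposes a single method taking the function directly, and both the fixed-compressor case and the codec-factory case reduce to a lambda at the call site.

```java
import java.util.function.Function;

// Hypothetical stand-ins for the real parquet-java types.
class ColumnDescriptor {
  final String path;
  ColumnDescriptor(String path) { this.path = path; }
}
interface Compressor { String name(); }

class Builder {
  private Function<ColumnDescriptor, Compressor> compressorProvider;

  // One method taking the function directly; withCompressor(c) becomes
  // withCompressorProvider(path -> c), and the codec-factory variant
  // becomes a lambda that looks up the codec per column.
  Builder withCompressorProvider(Function<ColumnDescriptor, Compressor> provider) {
    this.compressorProvider = provider;
    return this;
  }

  Compressor compressorFor(ColumnDescriptor column) {
    return compressorProvider.apply(column);
  }
}

public class Main {
  public static void main(String[] args) {
    Compressor gzip = () -> "GZIP";
    Compressor zstd = () -> "ZSTD";
    // Per-column codec choice expressed as an arbitrary function.
    Builder b = new Builder()
        .withCompressorProvider(c -> c.path.startsWith("meta") ? gzip : zstd);
    System.out.println(b.compressorFor(new ColumnDescriptor("meta.id")).name()); // GZIP
    System.out.println(b.compressorFor(new ColumnDescriptor("data.v")).name());  // ZSTD
  }
}
```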
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]