adamreeve commented on code in PR #7111:
URL: https://github.com/apache/arrow-rs/pull/7111#discussion_r2015196618
##########
parquet/src/arrow/arrow_writer/mod.rs:
##########
@@ -727,85 +819,173 @@ pub fn get_column_writers(
) -> Result<Vec<ArrowColumnWriter>> {
let mut writers = Vec::with_capacity(arrow.fields.len());
let mut leaves = parquet.columns().iter();
+ let column_factory = ArrowColumnWriterFactory::new();
for field in &arrow.fields {
- get_arrow_column_writer(field.data_type(), props, &mut leaves, &mut writers)?;
+ column_factory.get_arrow_column_writer(
+ field.data_type(),
+ props,
+ &mut leaves,
+ &mut writers,
+ )?;
}
Ok(writers)
}
-/// Gets the [`ArrowColumnWriter`] for the given `data_type`
-fn get_arrow_column_writer(
- data_type: &ArrowDataType,
+/// Returns the [`ArrowColumnWriter`] for a given schema and supports columnar encryption
+#[cfg(feature = "encryption")]
+fn get_column_writers_with_encryptor(
Review Comment:
`get_column_writers` above is a `pub fn` so that users can have low-level
control over column writing, e.g. see the docs at
https://github.com/apache/arrow-rs/blob/4e9e1570ef28c160492891539bbc9649ec069a53/parquet/src/arrow/arrow_writer/mod.rs#L606
That method won't work with encryption, though.
This new method is private because `FileEncryptor` is only `pub(crate)`, but
maybe we should make it `pub` and have its signature require only `pub` types,
so that it supports the same use case? Or should we modify `get_column_writers`
to take a `row_group_idx` and `&SerializedFileWriter`, if we can make breaking
changes?
If we go with the first approach, that could always be done later as a
non-breaking change.
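To make the second option concrete, here is a rough, self-contained sketch of what the signature could look like. Everything in it is a hypothetical stand-in written for illustration (the struct fields, `new_plaintext`, `new_encrypted`, and the `num_columns` parameter are all assumptions), not the real parquet types or the code in this PR. The point is only that the public function names nothing but `pub` types, while the `pub(crate)` encryptor is reached through the file writer:

```rust
use std::sync::Arc;

// Hypothetical stand-in: crate-private, like `FileEncryptor` in this PR.
pub(crate) struct FileEncryptor;

// Hypothetical stand-in for the public file writer, which owns the
// optional crate-private encryptor.
pub struct SerializedFileWriter {
    encryptor: Option<Arc<FileEncryptor>>,
}

impl SerializedFileWriter {
    pub fn new_plaintext() -> Self {
        Self { encryptor: None }
    }

    pub fn new_encrypted() -> Self {
        Self { encryptor: Some(Arc::new(FileEncryptor)) }
    }
}

// Hypothetical stand-in for a column writer; it only records how it
// was configured so the behaviour is observable.
pub struct ColumnWriter {
    pub row_group_idx: usize,
    pub column_idx: usize,
    pub encrypted: bool,
}

// The public signature mentions only `pub` types. The encryptor is
// looked up from the file writer inside the crate.
pub fn get_column_writers(
    num_columns: usize,
    row_group_idx: usize,
    file_writer: &SerializedFileWriter,
) -> Vec<ColumnWriter> {
    let encryptor = file_writer.encryptor.as_ref();
    (0..num_columns)
        .map(|column_idx| ColumnWriter {
            row_group_idx,
            column_idx,
            encrypted: encryptor.is_some(),
        })
        .collect()
}
```

A caller would then pass the file writer and row group index, e.g. `get_column_writers(3, 0, &file_writer)`, without ever naming `FileEncryptor`.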