(arrow-rs) branch main updated: Undeprecate `ArrowWriter::into_serialized_writer` and add docs (#8621)

alamb Fri, 17 Oct 2025 18:21:03 -0700

This is an automated email from the ASF dual-hosted git repository.

alamb pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git



The following commit(s) were added to refs/heads/main by this push:
     new 5a384f4c3c Undeprecate `ArrowWriter::into_serialized_writer` and add docs (#8621)
5a384f4c3c is described below

commit 5a384f4c3ccd397dcb8763d89e958da3fa4c666c
Author: Andrew Lamb <[email protected]>
AuthorDate: Thu Oct 16 09:11:56 2025 -0700

    Undeprecate `ArrowWriter::into_serialized_writer` and add docs (#8621)
    
    # Which issue does this PR close?
    
    - Related to https://github.com/apache/arrow-rs/issues/7835
    
    
    # Rationale for this change
    
    
    While testing the Arrow 57 upgrade in DataFusion, I found a few things
    that need to be fixed in parquet-rs.
    
    - https://github.com/apache/datafusion/pull/17888
    
    One was that the method `ArrowWriter::into_serialized_writer` was
    deprecated (which I know I suggested in
    https://github.com/apache/arrow-rs/issues/8389 🤦). However, testing
    showed that constructing a `SerializedFileWriter` for Arrow data takes
    a lot of work (like creating the Parquet schema from the Arrow schema
    and adjusting the metadata):
    
https://github.com/apache/arrow-rs/blob/c4f0fc12199df696620c73d62523c8eef5743bf2/parquet/src/arrow/arrow_writer/mod.rs#L230-L263
    
    Creating a `SerializedFileWriter` and `ArrowRowGroupWriterFactory`
    directly would therefore duplicate much of that code.
    
    # What changes are included in this PR?
    
    So let's not deprecate this method for now, and instead add some
    additional docs to guide people to the right place.
    
    
    # Are these changes tested?
    I tested this manually in DataFusion.
    
    # Are there any user-facing changes?
    
---
 parquet/src/arrow/arrow_writer/mod.rs | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/parquet/src/arrow/arrow_writer/mod.rs b/parquet/src/arrow/arrow_writer/mod.rs
index c2a7a6376f..3e3c9108d5 100644
--- a/parquet/src/arrow/arrow_writer/mod.rs
+++ b/parquet/src/arrow/arrow_writer/mod.rs
@@ -450,11 +450,11 @@ impl<W: Write + Send> ArrowWriter<W> {
     }
 
     /// Converts this writer into a lower-level [`SerializedFileWriter`] and [`ArrowRowGroupWriterFactory`].
-    /// This can be useful to provide more control over how files are written.
-    #[deprecated(
-        since = "57.0.0",
-        note = "Construct a `SerializedFileWriter` and `ArrowRowGroupWriterFactory` directly instead"
-    )]
+    ///
+    /// Flushes any outstanding data before returning.
+    ///
+    /// This can be useful to provide more control over how files are written, for example
+    /// to write columns in parallel. See the example on [`ArrowColumnWriter`].
     pub fn into_serialized_writer(
         mut self,
     ) -> Result<(SerializedFileWriter<W>, ArrowRowGroupWriterFactory)> {
@@ -872,6 +872,12 @@ impl ArrowColumnWriter {
 }
 
 /// Encodes [`RecordBatch`] to a parquet row group
+///
+/// Note: this structure is created by [`ArrowRowGroupWriterFactory`] internally used to
+/// create [`ArrowRowGroupWriter`]s, but it is not exposed publicly.
+///
+/// See the example on [`ArrowColumnWriter`] for how to encode columns in parallel
+#[derive(Debug)]
 struct ArrowRowGroupWriter {
     writers: Vec<ArrowColumnWriter>,
     schema: SchemaRef,
@@ -907,6 +913,10 @@ impl ArrowRowGroupWriter {
 }
 
 /// Factory that creates new column writers for each row group in the Parquet file.
+///
+/// You can create this structure via an [`ArrowWriter::into_serialized_writer`].
+/// See the example on [`ArrowColumnWriter`] for how to encode columns in parallel
+#[derive(Debug)]
 pub struct ArrowRowGroupWriterFactory {
     schema: SchemaDescPtr,
     arrow_schema: SchemaRef,
@@ -937,7 +947,7 @@ impl ArrowRowGroupWriterFactory {
         Ok(ArrowRowGroupWriter::new(writers, &self.arrow_schema))
     }
 
-    /// Create column writers for a new row group.
+    /// Create column writers for a new row group, with the given row group index
     pub fn create_column_writers(&self, row_group_index: usize) -> Result<Vec<ArrowColumnWriter>> {
         let mut writers = Vec::with_capacity(self.arrow_schema.fields.len());
         let mut leaves = self.schema.columns().iter();
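The docs added in this commit point readers to the `ArrowColumnWriter` example for encoding columns in parallel. The following is a dependency-free sketch of that general pattern, not code from the `parquet` crate: one worker thread per column writer, input fed to each worker over a `std::sync::mpsc` channel, and the finished chunks collected in column order. `ColumnWriter`, `EncodedChunk`, and `encode_row_group_in_parallel` are hypothetical stand-ins (roughly playing the role of `ArrowColumnWriter` and its encoded output), and the "encoding" here is just little-endian bytes. In the real workflow, the column writers would come from `ArrowRowGroupWriterFactory::create_column_writers`, using the factory returned by `ArrowWriter::into_serialized_writer`, and the results would be written through the accompanying `SerializedFileWriter`.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a per-column writer such as `ArrowColumnWriter`.
/// It "encodes" i32 values by appending their little-endian bytes.
struct ColumnWriter {
    column_index: usize,
    buffer: Vec<u8>,
    num_values: usize,
}

/// Hypothetical stand-in for one encoded column chunk.
#[derive(Debug, PartialEq)]
struct EncodedChunk {
    column_index: usize,
    num_values: usize,
    bytes: Vec<u8>,
}

impl ColumnWriter {
    fn new(column_index: usize) -> Self {
        Self { column_index, buffer: Vec::new(), num_values: 0 }
    }

    /// Appends one batch of values to this column's buffer.
    fn write(&mut self, values: &[i32]) {
        for v in values {
            self.buffer.extend_from_slice(&v.to_le_bytes());
        }
        self.num_values += values.len();
    }

    /// Consumes the writer and returns the finished chunk.
    fn close(self) -> EncodedChunk {
        EncodedChunk {
            column_index: self.column_index,
            num_values: self.num_values,
            bytes: self.buffer,
        }
    }
}

/// Hypothetical helper: encodes each column on its own thread and
/// returns the chunks in column order.
fn encode_row_group_in_parallel(columns: Vec<Vec<i32>>) -> Vec<EncodedChunk> {
    // Spawn one worker per column; each owns its writer and a receiver.
    let workers: Vec<_> = (0..columns.len())
        .map(|column_index| {
            let (send, recv) = mpsc::channel::<Vec<i32>>();
            let handle = thread::spawn(move || {
                let mut writer = ColumnWriter::new(column_index);
                // Encode batches until the sender for this channel is dropped.
                for batch in recv {
                    writer.write(&batch);
                }
                writer.close()
            });
            (handle, send)
        })
        .collect();

    // Send each column's values to the matching worker.
    for ((_, send), values) in workers.iter().zip(columns) {
        send.send(values).expect("worker hung up early");
    }

    // Drop each sender to end that worker's loop, then join in column
    // order so the chunks come back in schema order.
    workers
        .into_iter()
        .map(|(handle, send)| {
            drop(send);
            handle.join().expect("worker panicked")
        })
        .collect()
}

fn main() {
    let chunks = encode_row_group_in_parallel(vec![vec![1, 2, 3], vec![10, 20, 30]]);
    for chunk in &chunks {
        println!(
            "column {}: {} values, {} bytes",
            chunk.column_index,
            chunk.num_values,
            chunk.bytes.len()
        );
    }
}
```

Joining the workers in order keeps the chunks in the same order as the schema's columns, and dropping each sender before `join` is what lets that worker's `for batch in recv` loop finish.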
