alamb commented on code in PR #6466:
URL: https://github.com/apache/arrow-rs/pull/6466#discussion_r1777318479


##########
parquet/src/file/serialized_reader.rs:
##########
@@ -191,11 +191,13 @@ impl<R: 'static + ChunkReader> SerializedFileReader<R> {
     /// Creates file reader from a Parquet file with read options.
     /// Returns error if Parquet file does not exist or is corrupt.
     pub fn new_with_options(chunk_reader: R, options: ReadOptions) -> 
Result<Self> {
-        let metadata = 
ParquetMetaDataReader::new().parse_and_finish(&chunk_reader)?;
+        let mut metadata_builder = ParquetMetaDataReader::new()
+            .parse_and_finish(&chunk_reader)?
+            .into_builder();
         let mut predicates = options.predicates;
-        let row_groups = metadata.row_groups().to_vec();
-        let mut filtered_row_groups = Vec::<RowGroupMetaData>::new();
-        for (i, rg_meta) in row_groups.into_iter().enumerate() {
+
+        // Filter row groups based on the predicates

Review Comment:
   I think the cleanup of this code (which is modifying the ParquetMetaData) is 
the best example of why having this API makes sense -- it makes one fewer 
copies and also I think is quite a bit clearer



##########
parquet/src/file/metadata/mod.rs:
##########
@@ -290,6 +298,117 @@ impl ParquetMetaData {
     }
 }
 
+/// A builder for creating / manipulating [`ParquetMetaData`]
+///
+/// # Example creating a new [`ParquetMetaData`]
+///
+///```no_run
+/// # use parquet::file::metadata::{FileMetaData, ParquetMetaData, 
ParquetMetaDataBuilder, RowGroupMetaData, RowGroupMetaDataBuilder};
+/// # fn get_file_metadata() -> FileMetaData { unimplemented!(); }
+/// // Create a new builder given the file metadata
+/// let mut builder = ParquetMetaDataBuilder::new(get_file_metadata());
+/// // Create a row group
+/// let row_group = RowGroupMetaData::builder(file_metadata.schema_descr_ptr())
+///    .set_num_rows(100)
+///    // ... (A real row group needs more than just the number of rows)
+///    .build()
+///    .unwrap();
+/// // Create the final metadata
+/// let metadata: ParquetMetaData = builder
+///   .add_row_group(row_group)
+///   .build();
+/// ```
+///
+/// # Example modifying an existing [`ParquetMetaData`]
+/// ```no_run
+/// # use parquet::file::metadata::ParquetMetaData;
+/// # fn load_metadata() -> ParquetMetaData { unimplemented!(); }
+/// // Modify the metadata so only the last RowGroup remains
+/// let metadata: ParquetMetaData = load_metadata();
+/// let mut builder = metadata.into_builder();
+///
+/// // Take existing row groups to modify
+/// let mut row_groups = builder.take_row_groups();
+/// let last_row_group = row_groups.pop().unwrap();
+///
+/// let metadata = builder
+///   .add_row_group(last_row_group)
+///   .build();
+/// ```
+pub struct ParquetMetaDataBuilder(ParquetMetaData);
+
+impl ParquetMetaDataBuilder {
+    /// Create a new builder from a file metadata, with no row groups
+    pub fn new(file_meta_data: FileMetaData) -> Self {
+        Self(ParquetMetaData::new(file_meta_data, vec![]))
+    }
+
+    /// Create a new builder from an exising ParquetMetaData
+    pub fn new_from_metadata(metadata: ParquetMetaData) -> Self {
+        Self(metadata)
+    }
+
+    /// Adds a row group to the metadata
+    pub fn add_row_group(mut self, row_group: RowGroupMetaData) -> Self {

Review Comment:
   these methods follow the exsiting convention of `add` and `set` as the other 
Builders in this crate sucha s
   
   * 
https://docs.rs/parquet/latest/parquet/file/metadata/struct.RowGroupMetaDataBuilder.html
   * 
https://docs.rs/parquet/latest/parquet/file/metadata/struct.ColumnChunkMetaDataBuilder.html



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to