alamb commented on code in PR #9357:
URL: https://github.com/apache/arrow-rs/pull/9357#discussion_r2777513839


##########
parquet/src/file/properties.rs:
##########
@@ -45,6 +45,9 @@ pub const DEFAULT_STATISTICS_ENABLED: EnabledStatistics = 
EnabledStatistics::Pag
 pub const DEFAULT_WRITE_PAGE_HEADER_STATISTICS: bool = false;
 /// Default value for [`WriterProperties::max_row_group_size`]
 pub const DEFAULT_MAX_ROW_GROUP_SIZE: usize = 1024 * 1024;
+/// Default value for [`WriterProperties::max_row_group_bytes`] (128 MB, same 
as the official Java
+/// implementation for `parquet.block.size`)
+pub const DEFAULT_MAX_ROW_GROUP_BYTES: usize = 128 * 1024 * 1024;

Review Comment:
   Or should we set it?



##########
parquet/src/file/properties.rs:
##########
@@ -247,11 +251,25 @@ impl WriterProperties {
         self.write_batch_size
     }
 
-    /// Returns maximum number of rows in a row group.
+    /// Returns maximum number of rows in a row group, or `usize::MAX` if 
unlimited.
     ///
     /// For more details see 
[`WriterPropertiesBuilder::set_max_row_group_size`]
     pub fn max_row_group_size(&self) -> usize {

Review Comment:
   Given the introduction of max_row_group_count, what would you think about 
deprecating `max_row_group_size` and directing people to that new setting?



##########
parquet/src/file/properties.rs:
##########
@@ -468,7 +487,8 @@ impl Default for WriterPropertiesBuilder {
             data_page_size_limit: DEFAULT_PAGE_SIZE,
             data_page_row_count_limit: DEFAULT_DATA_PAGE_ROW_COUNT_LIMIT,
             write_batch_size: DEFAULT_WRITE_BATCH_SIZE,
-            max_row_group_size: DEFAULT_MAX_ROW_GROUP_SIZE,
+            max_row_group_row_count: Some(DEFAULT_MAX_ROW_GROUP_SIZE),

Review Comment:
   Could we also please align the constant name to the parameter name (eg. 
`DEFAULT_MAX_ROW_GROUP_COUNT`)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to