kszucs commented on code in PR #47090:
URL: https://github.com/apache/arrow/pull/47090#discussion_r2318883297


##########
cpp/src/parquet/properties.h:
##########
@@ -155,6 +155,7 @@ class PARQUET_EXPORT ReaderProperties {
 ReaderProperties PARQUET_EXPORT default_reader_properties();
 
 static constexpr int64_t kDefaultDataPageSize = 1024 * 1024;
+static constexpr int64_t kDefaultMaxRowsPerPage = 20'000;

Review Comment:
   I see. In theory this shouldn't hurt CDC effectiveness; on the contrary, 
smaller pages will likely improve the deduplication ratio. However, the 
default CDC options were chosen to approach the 1MB page size limit, so I 
need to reconsider those defaults.
   
   Either way, I'm checking whether this change interferes with CDC. In 
theory it shouldn't.
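
   A back-of-the-envelope sketch of why the row limit matters for the CDC 
defaults, assuming fixed-width values with no encoding or compression and 
treating both limits as hard cut-offs (a simplification). The two constants 
come from the diff above; the helper names are hypothetical and not part of 
the Parquet API:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Values taken from the diff above.
constexpr int64_t kDefaultDataPageSize = 1024 * 1024;
constexpr int64_t kDefaultMaxRowsPerPage = 20'000;

// Hypothetical helper (not part of the Parquet API): rough number of rows in
// one page for a fixed-width, unencoded, uncompressed column when both the
// byte-size limit and the row-count limit apply.
inline int64_t ApproxRowsPerPage(int64_t bytes_per_value) {
  assert(bytes_per_value > 0);
  const int64_t rows_by_size = kDefaultDataPageSize / bytes_per_value;
  return std::min(rows_by_size, kDefaultMaxRowsPerPage);
}

// Hypothetical helper: approximate page size in bytes under the same
// simplifying assumptions.
inline int64_t ApproxPageBytes(int64_t bytes_per_value) {
  return ApproxRowsPerPage(bytes_per_value) * bytes_per_value;
}
```

   Under these assumptions, 8-byte values give 20,000 rows and about 160 KB 
per page, well below the 1MB the CDC defaults were tuned for. Only values of 
roughly 53 bytes or more would hit the size limit before the row limit.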



