[ https://issues.apache.org/jira/browse/PARQUET-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17394605#comment-17394605 ]
Gabor Szadovszky commented on PARQUET-2073:
-------------------------------------------

[~JiangYang], you're right, {{rowsToFillPage}} will always be zero. This means (because of [line 256|https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/impl/ColumnWriteStoreBase.java#L256]) that we never use this estimate correctly, so the next row count check always steps by {{props.getMinRowCountForPageSizeCheck()}}. Funny that it has worked this way ever since we introduced this estimation logic. Strange that no one has ever noticed.

About fixing this issue: we can get correct results without casting:
{code:java}
rows * remainingMem / usedMem
{code}
However, this form is a bit misleading, so we need a comment explaining that we are calculating the estimated number of rows that can be written to the page, based on the average size of the rows already written.

The tricky part is how to test it. This will be a new behavior of the page writing, and we have never tested it properly. (Otherwise, we would have caught this issue.) Whether this approach works well depends heavily on the characteristics of the values. (For example, small values at the beginning and large ones later can cause this logic to overrun the maximum size of the page. However, the same can happen if wrong values are used for {{min/maxRowCountForPageSizeCheck}}.)

Sure, please create a PR. I am happy to review.

> Is there something wrong calculate usedMem in ColumnWriteStoreBase.java
> -----------------------------------------------------------------------
>
>                 Key: PARQUET-2073
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2073
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.12.0
>            Reporter: JiangYang
>            Priority: Critical
>         Attachments: image-2021-08-05-14-37-51-299.png
>
>
> !image-2021-08-05-14-37-51-299.png!
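The arithmetic behind the {{rowsToFillPage}} discussion can be illustrated with a minimal Java sketch. It assumes the original expression had the shape {{(long) ((float) rows) / usedMem * remainingMem}}, so the {{(long)}} cast binds to {{(float) rows}} alone and the following division is integer division. The class and method names, the 1 MiB threshold, and the min/max values standing in for {{min/maxRowCountForPageSizeCheck}} are hypothetical, as is the step rule (check again halfway through the estimate, clamped to the min/max range) used to mimic what happens around line 256. This is not the parquet-mr implementation.

```java
// Hypothetical, self-contained demo of the rowsToFillPage estimate.
// NOT the parquet-mr source; the "buggy" expression is an assumed shape.
class RowsToFillPageDemo {

    // Assumed shape of the original expression. The (long) cast binds to
    // ((float) rows) only, so "/ usedMem" is a long (integer) division that
    // truncates to 0 whenever rows < usedMem, i.e. whenever the buffered
    // rows average more than one byte each.
    static long buggyEstimate(long rows, long usedMem, long remainingMem) {
        return (long) ((float) rows) / usedMem * remainingMem;
    }

    // Fix proposed in the comment: multiply before dividing. rows / usedMem is
    // the average number of rows per byte written so far; scaling it by
    // remainingMem estimates how many more rows fit before the page is full.
    // The caller is assumed to handle usedMem == 0 separately.
    static long fixedEstimate(long rows, long usedMem, long remainingMem) {
        return rows * remainingMem / usedMem;
    }

    // Hypothetical step rule: check again halfway through the estimate,
    // clamped to [minCheck, maxCheck]. An estimate of 0 always yields minCheck.
    static long nextCheckStep(long estimate, long minCheck, long maxCheck) {
        return Math.min(Math.max(estimate / 2, minCheck), maxCheck);
    }

    public static void main(String[] args) {
        long threshold = 1_048_576;             // hypothetical 1 MiB page size threshold
        long rows = 100;                        // rows buffered in the current page
        long usedMem = 80_000;                  // about 800 bytes per row so far
        long remainingMem = threshold - usedMem;
        long minCheck = 100, maxCheck = 10_000; // hypothetical min/max row counts

        long buggy = buggyEstimate(rows, usedMem, remainingMem);
        long fixed = fixedEstimate(rows, usedMem, remainingMem);
        System.out.println("buggy estimate: " + buggy + ", next step: "
                + nextCheckStep(buggy, minCheck, maxCheck));
        System.out.println("fixed estimate: " + fixed + ", next step: "
                + nextCheckStep(fixed, minCheck, maxCheck));
    }
}
```

Running it prints an estimate of 0 (next step 100) for the assumed original expression and 1210 (next step 605) for the multiply-first form. Multiplying first keeps the precision without any floating-point cast; with realistic page sizes and row counts the product {{rows * remainingMem}} stays well within the {{long}} range.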
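The overrun case mentioned in the comment (small values first, large ones later) can be reproduced with a small simulation. It is a sketch under loudly stated assumptions: the page threshold is 1,000 bytes, the first 10 rows are 10 bytes each, every row written before the next size check is 50 bytes, and the next check happens halfway through the estimate, clamped to hypothetical min/max row counts of 10 and 1,000. All names and numbers are illustrative and do not come from parquet-mr.

```java
// Hypothetical simulation of the "small values first, large values later"
// overrun described in the comment. NOT the parquet-mr source.
class PageOverrunDemo {

    // Multiply-first estimate from the comment: rows that still fit, based on
    // the average size of the rows written so far.
    static long estimateRowsToFill(long rows, long usedMem, long remainingMem) {
        return rows * remainingMem / usedMem;
    }

    // Returns the buffered page size (bytes) at the next size check.
    static long bytesAtNextCheck(long threshold, long smallRows, long smallSize,
                                 long laterSize, long minCheck, long maxCheck) {
        long usedMem = smallRows * smallSize;
        long estimate = estimateRowsToFill(smallRows, usedMem, threshold - usedMem);
        // Assumed step rule: check again halfway, clamped to [minCheck, maxCheck].
        long step = Math.min(Math.max(estimate / 2, minCheck), maxCheck);
        // Every row written before the next check has size laterSize.
        return usedMem + step * laterSize;
    }

    public static void main(String[] args) {
        long threshold = 1_000; // hypothetical page size threshold in bytes
        long growing = bytesAtNextCheck(threshold, 10, 10, 50, 10, 1_000);
        long uniform = bytesAtNextCheck(threshold, 10, 10, 10, 10, 1_000);
        System.out.println("growing values: " + growing + " bytes (threshold " + threshold + ")");
        System.out.println("uniform values: " + uniform + " bytes (threshold " + threshold + ")");
    }
}
```

With these numbers the estimate is 90 rows, so the next check comes 45 rows later. Uniform 10-byte rows leave the page at 550 bytes, while 50-byte rows push it to 2,350 bytes, more than twice the threshold. A test for the fix would need value distributions like this one, not only uniform data.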