alexeykudinkin commented on PR #5129: URL: https://github.com/apache/hudi/pull/5129#issuecomment-1130488173
@nsivabalan we should not interfere with caching at the Parquet writer level (by manually flushing). Querying the ParquetWriter for its currently accumulated buffer size is the right way to interface with it (as opposed to intercepting the FileSystem writes and accounting for how many bytes were written). The issue inadvertently introduced with this approach (addressed in #5497) was that the cost of `getDataSize` was not factored in: it was assumed to be O(1), while in reality it is O(N) in the number of written blocks.
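To illustrate the cost concern raised above, here is a minimal sketch of one common way to amortize an O(N) size query: ask for the size only every few records instead of on every write. This is not necessarily what #5497 does. `BlockBufferingWriter`, `SampledSizeCheck`, `checkInterval` and `sizeCalls` are hypothetical names made up for this example; only the idea of a `getDataSize` call that scans the written blocks comes from the comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a buffering writer; NOT the real ParquetWriter API.
class BlockBufferingWriter {
    private final List<Long> blockSizes = new ArrayList<>();
    long sizeCalls = 0; // how many times the O(N) scan below has run

    void write(long recordBytes) {
        blockSizes.add(recordBytes);
    }

    // Deliberately O(N) in the number of written blocks, mirroring the
    // behaviour described in the comment.
    long getDataSize() {
        sizeCalls++;
        long total = 0;
        for (long size : blockSizes) {
            total += size;
        }
        return total;
    }
}

// Hypothetical wrapper that queries the size only every `checkInterval` records.
class SampledSizeCheck {
    private final BlockBufferingWriter writer;
    private final long maxBytes;
    private final long checkInterval;
    private long recordsWritten = 0;
    private boolean full = false;

    SampledSizeCheck(BlockBufferingWriter writer, long maxBytes, long checkInterval) {
        this.writer = writer;
        this.maxBytes = maxBytes;
        this.checkInterval = checkInterval;
    }

    // Runs the expensive size query only when the record count is a multiple
    // of checkInterval; otherwise returns the last known answer.
    boolean canWrite() {
        if (!full && recordsWritten % checkInterval == 0) {
            full = writer.getDataSize() >= maxBytes;
        }
        return !full;
    }

    void write(long recordBytes) {
        writer.write(recordBytes);
        recordsWritten++;
    }

    long recordsWritten() {
        return recordsWritten;
    }
}
```

The trade-off in this sketch is that the size limit can be overshot by up to `checkInterval - 1` records, in exchange for roughly `1 / checkInterval` as many size queries.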