alexeykudinkin commented on PR #5129:
URL: https://github.com/apache/hudi/pull/5129#issuecomment-1130488173

   @nsivabalan we should not be interfering with the caching at the ParquetWriter level (by manually flushing). Checking the ParquetWriter for its currently accumulated buffer size is the right way to interface with it, as opposed to intercepting the FileSystem writes and accounting for how many bytes were written.
   
   The issue inadvertently introduced by this approach (addressed in #5497) was that the cost of `getDataSize` was not factored in: it was assumed to be O(1), while in reality it is O(N) in the number of written blocks.
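   Because `getDataSize` is not O(1), calling it after every single record makes the total cost of writing a file grow much faster than the number of records. A common way to amortize an expensive size probe is to call it only periodically and use the observed average record size to decide when to check again. The Java sketch below shows that idea under stated assumptions: the class `AmortizedSizeCheck`, its `onRecordWritten`/`simulate` methods, the initial check interval of 100 records, and the "check again halfway to the limit" heuristic are all hypothetical. The `LongSupplier` stands in for a call such as `ParquetWriter#getDataSize()`. This is not necessarily what #5497 implemented.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch: amortizes an expensive size probe by calling it only
 * occasionally instead of after every record.
 */
final class AmortizedSizeCheck {
    private final LongSupplier dataSizeProbe; // assumed stand-in for an O(N) call like getDataSize()
    private final long maxFileSize;
    private long recordCount = 0;
    private long nextCheckAt = 100; // hypothetical initial check interval
    private int probeCalls = 0;

    AmortizedSizeCheck(LongSupplier dataSizeProbe, long maxFileSize) {
        this.dataSizeProbe = dataSizeProbe;
        this.maxFileSize = maxFileSize;
    }

    /** Call once per written record; returns false once the file should be rolled over. */
    boolean onRecordWritten() {
        recordCount++;
        if (recordCount < nextCheckAt) {
            return true; // skip the expensive probe between checkpoints
        }
        long size = dataSizeProbe.getAsLong();
        probeCalls++;
        if (size >= maxFileSize) {
            return false;
        }
        // Estimate how many more records fit, then check again roughly halfway there.
        long avgRecordSize = Math.max(1, size / recordCount);
        long remainingRecords = (maxFileSize - size) / avgRecordSize;
        nextCheckAt = recordCount + Math.max(1, remainingRecords / 2);
        return true;
    }

    int probeCalls() {
        return probeCalls;
    }

    /** Simulates fixed-size records; returns {recordsWritten, probeCalls}. */
    static long[] simulate(long recordSize, long maxFileSize) {
        long[] bytes = {0};
        AmortizedSizeCheck check = new AmortizedSizeCheck(() -> bytes[0], maxFileSize);
        long written = 0;
        while (true) {
            bytes[0] += recordSize; // each simulated record adds recordSize bytes
            written++;
            if (!check.onRecordWritten()) {
                break;
            }
        }
        return new long[] {written, check.probeCalls()};
    }
}
```

   With 100-byte records and a 1,000,000-byte limit, `simulate(100, 1_000_000)` writes 10,000 records but probes the size only a handful of times, because the check interval shrinks only as the file approaches its limit.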
