smdahmed opened a new issue #776: Incorrect averageBytesPerRecord Causes OOM
URL: https://github.com/apache/incubator-hudi/issues/776

I see a similar issue that was closed earlier: https://github.com/apache/incubator-hudi/issues/270. I am not sure what the fix for that issue was, but I hit the same problem today.

Let's say there are thousands of incoming records but none get written (which can happen in my case, as we write records selectively). The total number of records written is then 0, so the division yields Infinity and the cast to long turns avgSize into Long.MAX_VALUE:

```
scala> val l = Math.ceil( 1.0 / 0 )
l: Double = Infinity

scala> val l = Math.ceil( 1.0 / 0 ).toLong
l: Long = 9223372036854775807
```

This causes OOM. The method in question:

```
protected long averageBytesPerRecord() {
  long avgSize = 0L;
  HoodieTimeline commitTimeline = metaClient.getActiveTimeline().getCommitTimeline()
      .filterCompletedInstants();
  try {
    if (!commitTimeline.empty()) {
      HoodieInstant latestCommitTime = commitTimeline.lastInstant().get();
      HoodieCommitMetadata commitMetadata = HoodieCommitMetadata
          .fromBytes(commitTimeline.getInstantDetails(latestCommitTime).get(), HoodieCommitMetadata.class);
      avgSize = (long) Math.ceil(
          (1.0 * commitMetadata.fetchTotalBytesWritten()) / commitMetadata
              .fetchTotalRecordsWritten());
    }
  } catch (Throwable t) {
    // make this fail safe.
    logger.error("Error trying to compute average bytes/record ", t);
  }
  return avgSize <= 0L ? config.getCopyOnWriteRecordSizeEstimate() : avgSize;
}
```

I have managed to work around it for now by changing the last line of the method to:

    return (avgSize <= 0L | avgSize >= Integer.MAX_VALUE) ? config.getCopyOnWriteRecordSizeEstimate() : avgSize;

But I believe someone more knowledgeable about this code should take a look at it.
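For illustration, here is a minimal standalone sketch of an alternative guard that checks the record count before dividing, rather than bounding the result afterwards. It is not the actual Hudi fix: the class name `AvgRecordSizeSketch`, the method signature, and the `fallbackEstimate` parameter (standing in for `config.getCopyOnWriteRecordSizeEstimate()`) are assumptions made up for this example.

```java
// Hypothetical, standalone sketch; not the actual Hudi implementation.
final class AvgRecordSizeSketch {

    // Returns ceil(totalBytesWritten / totalRecordsWritten), or fallbackEstimate
    // when the last commit wrote no records or no bytes, so the division can
    // never produce Infinity or NaN.
    static long averageBytesPerRecord(long totalBytesWritten,
                                      long totalRecordsWritten,
                                      long fallbackEstimate) {
        if (totalRecordsWritten <= 0L || totalBytesWritten <= 0L) {
            return fallbackEstimate;
        }
        long avgSize = (long) Math.ceil((1.0 * totalBytesWritten) / totalRecordsWritten);
        return avgSize <= 0L ? fallbackEstimate : avgSize;
    }

    public static void main(String[] args) {
        // Assumed stand-in for config.getCopyOnWriteRecordSizeEstimate().
        long fallback = 1024L;

        // Unguarded division, as in the issue: Infinity cast to long is Long.MAX_VALUE.
        System.out.println((long) Math.ceil((1.0 * 4096L) / 0L));

        // Guarded version: zero records written falls back to the configured estimate.
        System.out.println(averageBytesPerRecord(4096L, 0L, fallback));

        // Normal case: 2500 bytes over 10 records.
        System.out.println(averageBytesPerRecord(2500L, 10L, fallback));
    }
}
```

Checking the denominator up front avoids picking an arbitrary upper bound such as `Integer.MAX_VALUE`, though whether a commit with zero records should fall back to the configured estimate or to an earlier commit's average is a design question for the maintainers.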
