smdahmed opened a new issue #776: Incorrect averageBytesPerRecord Causes OOM
URL: https://github.com/apache/incubator-hudi/issues/776
 
 
   A similar issue was reported earlier and has since been closed: 
https://github.com/apache/incubator-hudi/issues/270. 
   
   I am not sure what the fix for that issue was.
   
   I hit the issue today. Suppose a commit processes thousands of records 
but none of them get written (which can happen in my case, because we write 
records selectively). The total number of records written is then 0, so the 
division that computes avgSize yields Infinity. 
   
   ```
   scala> val l = Math.ceil( 1.0 / 0 )
   l: Double = Infinity
   
   scala> val l = Math.ceil( 1.0 / 0 ).toLong
   l: Long = 9223372036854775807
   ```
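
The same saturation can be reproduced in plain Java with the arithmetic used by averageBytesPerRecord(). This is only a standalone sketch: the class AvgSizeDemo and the method naiveAverage are hypothetical names, and the two long parameters stand in for the values returned by fetchTotalBytesWritten() and fetchTotalRecordsWritten(). It also shows that the 0 bytes / 0 records case behaves differently: the result is NaN, which casts to 0 and so would fall back to the configured estimate.

```java
// Standalone illustration; AvgSizeDemo and naiveAverage are hypothetical
// names and are not part of Hudi.
final class AvgSizeDemo {

  // Same expression as the one in averageBytesPerRecord(), with the two
  // commit metadata values passed in as plain longs.
  static long naiveAverage(long totalBytesWritten, long totalRecordsWritten) {
    return (long) Math.ceil((1.0 * totalBytesWritten) / totalRecordsWritten);
  }

  public static void main(String[] args) {
    // Positive bytes, zero records: the double division yields Infinity,
    // and casting Infinity to long saturates at Long.MAX_VALUE.
    System.out.println(naiveAverage(1024L, 0L) == Long.MAX_VALUE); // prints true

    // Zero bytes, zero records: 0.0 / 0.0 is NaN, and casting NaN to long gives 0.
    System.out.println(naiveAverage(0L, 0L)); // prints 0

    // Ordinary case: 1000 / 3 rounded up.
    System.out.println(naiveAverage(1000L, 3L)); // prints 334
  }
}
```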
   An avgSize of Long.MAX_VALUE then causes the OOM. The value is computed in 
averageBytesPerRecord():
   
   ```
       protected long averageBytesPerRecord() {
         long avgSize = 0L;
         HoodieTimeline commitTimeline = metaClient.getActiveTimeline().getCommitTimeline()
             .filterCompletedInstants();
         try {
           if (!commitTimeline.empty()) {
             HoodieInstant latestCommitTime = commitTimeline.lastInstant().get();
             HoodieCommitMetadata commitMetadata = HoodieCommitMetadata
                 .fromBytes(commitTimeline.getInstantDetails(latestCommitTime).get(),
                     HoodieCommitMetadata.class);
             avgSize = (long) Math.ceil(
                 (1.0 * commitMetadata.fetchTotalBytesWritten()) / commitMetadata
                     .fetchTotalRecordsWritten());
           }
         } catch (Throwable t) {
           // make this fail safe.
           logger.error("Error trying to compute average bytes/record ", t);
         }
         return avgSize <= 0L ? config.getCopyOnWriteRecordSizeEstimate() : avgSize;
       }
   ```
   
   I have worked around it for now by changing the last line of the method 
to: 
   
   return (avgSize <= 0L | avgSize >= Integer.MAX_VALUE) ? config.getCopyOnWriteRecordSizeEstimate() : avgSize;
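
Another option might be to check the record count before dividing, so that Infinity is never produced in the first place. The following is a minimal sketch and not a tested patch: the class SafeAvgSize and its parameters are hypothetical, with fallbackEstimate standing in for config.getCopyOnWriteRecordSizeEstimate() and the other two arguments standing in for the commit metadata totals.

```java
// Hypothetical helper, not Hudi code: guards the division instead of
// clamping the result afterwards.
final class SafeAvgSize {

  static long averageBytesPerRecord(long totalBytesWritten,
                                    long totalRecordsWritten,
                                    long fallbackEstimate) {
    // With no records (or no bytes) written there is nothing to average,
    // so use the configured estimate rather than dividing by zero.
    if (totalRecordsWritten <= 0L || totalBytesWritten <= 0L) {
      return fallbackEstimate;
    }
    // Both operands are positive here, so the result is finite and at least 1.
    return (long) Math.ceil((1.0 * totalBytesWritten) / totalRecordsWritten);
  }

  public static void main(String[] args) {
    System.out.println(averageBytesPerRecord(1024L, 0L, 1024L)); // prints 1024 (fallback)
    System.out.println(averageBytesPerRecord(1000L, 3L, 1024L)); // prints 334
  }
}
```

Compared with the Integer.MAX_VALUE check, this keeps the fallback decision tied to the actual cause (a zero record count) instead of to an arbitrary upper bound.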
   
   But I believe someone more knowledgeable about this should take a look at it.

