Hi Vinoth and other HUDI Experts,

I am stuck while processing inserts into HUDI. The process picks up CSV files
and loads them into HUDI, and it appears to hang at:
https://github.com/apache/incubator-hudi/blob/master/hoodie-client/src/main/java/com/uber/hoodie/table/HoodieCopyOnWriteTable.java#L679
The log is below:

2019-07-02 22:43:31,875 [main] INFO com.uber.hoodie.table.HoodieCopyOnWriteTable - AvgRecordSize => 9223372036854775807
2019-07-02 22:43:31,969 [main] INFO com.uber.hoodie.table.HoodieCopyOnWriteTable - For partitionPath : 2018/05/30 Small Files => [SmallFile {location=HoodieRecordLocation {commitTime=20190702161750, fileId=39cff0df-24e4-45b8-bff5-9b4f41c4096a}, sizeBytes=435362}]
2019-07-02 22:43:31,969 [main] INFO com.uber.hoodie.table.HoodieCopyOnWriteTable - After small file assignment: unassignedInserts => 8, totalInsertBuckets => 2147483647, recordsPerBucket => 0

The last line of the log shows "unassignedInserts => 8, totalInsertBuckets =>
2147483647, recordsPerBucket => 0". 2147483647 is Integer.MAX_VALUE (and the
AvgRecordSize of 9223372036854775807 is Long.MAX_VALUE), so the code below
loops for a very long time and eventually runs out of heap:

logger.info(
    "After small file assignment: unassignedInserts => " + totalUnassignedInserts
        + ", totalInsertBuckets => " + insertBuckets + ", recordsPerBucket => "
        + insertRecordsPerBucket);
// With insertBuckets == Integer.MAX_VALUE, this loop tries to create ~2^31
// BucketInfo entries and exhausts the heap.
for (int b = 0; b < insertBuckets; b++) {
  bucketNumbers.add(totalBuckets);
  recordsPerBucket.add(totalUnassignedInserts / insertBuckets);
  BucketInfo bucketInfo = new BucketInfo();
  bucketInfo.bucketType = BucketType.INSERT;
  bucketInfoMap.put(totalBuckets, bucketInfo);
  totalBuckets++;
}

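
For what it's worth, below is a small standalone sketch of the arithmetic I
suspect produces these numbers. It is NOT the Hudi source: the formulas, the
120 MB max file size, the zero-record commit stats, the 1024-byte fallback
estimate and the class/method names are all my assumptions. It shows how an
average record size of Long.MAX_VALUE would give 0 records per bucket, and how
dividing by that 0 in double arithmetic would yield Infinity, which casts to
Integer.MAX_VALUE.

```java
// Hypothetical reproduction of the numbers in the log above.
// ASSUMPTION: these formulas are a guess at how the average record size and
// the insert bucket count are derived; they are NOT copied from Hudi.
final class InsertBucketMath {

    // Assumed: avg size = ceil(bytesWritten / recordsWritten) from commit stats.
    // If bytes > 0 but records == 0, the double division gives Infinity and the
    // cast to long saturates at Long.MAX_VALUE (9223372036854775807).
    static long averageRecordSize(long bytesWritten, long recordsWritten) {
        return (long) Math.ceil((1.0 * bytesWritten) / recordsWritten);
    }

    // Hypothetical guard: fall back to a fixed estimate when the stats are unusable.
    static long guardedAverageRecordSize(long bytesWritten, long recordsWritten, long fallbackEstimate) {
        if (bytesWritten <= 0 || recordsWritten <= 0) {
            return fallbackEstimate;
        }
        return averageRecordSize(bytesWritten, recordsWritten);
    }

    // Assumed: records per bucket = max file size / avg record size.
    // Integer division by Long.MAX_VALUE truncates to 0.
    static int recordsPerBucket(long maxFileSizeBytes, long avgRecordSize) {
        return (int) (maxFileSizeBytes / avgRecordSize);
    }

    // Assumed: buckets = ceil(unassigned / recordsPerBucket).
    // With recordsPerBucket == 0 this is e.g. 8.0 / 0 = Infinity, and casting
    // Infinity to int saturates at Integer.MAX_VALUE (2147483647).
    static int insertBuckets(long unassignedInserts, int recordsPerBucket) {
        return (int) Math.ceil((1.0 * unassignedInserts) / recordsPerBucket);
    }

    public static void main(String[] args) {
        long maxFileSize = 120L * 1024 * 1024; // assumed max parquet file size
        long bytes = 435362L;                  // hypothetical: borrowed from the small file size in the log
        long records = 0L;                     // hypothetical: commit stats report zero records

        long avg = averageRecordSize(bytes, records);
        int perBucket = recordsPerBucket(maxFileSize, avg);
        System.out.println("unguarded: avg=" + avg + " perBucket=" + perBucket
                + " buckets=" + insertBuckets(8, perBucket));

        long safeAvg = guardedAverageRecordSize(bytes, records, 1024L); // 1024 is an assumed fallback
        int safePerBucket = recordsPerBucket(maxFileSize, safeAvg);
        System.out.println("guarded:   avg=" + safeAvg + " perBucket=" + safePerBucket
                + " buckets=" + insertBuckets(8, safePerBucket));
    }
}
```

Under these assumptions the unguarded path reproduces the logged values
(9223372036854775807, 0, 2147483647), while the guarded path gives a single
insert bucket for the 8 records. If this is what is happening, the real
question is why the commit stats would report bytes written but zero records;
the guard would only stop the runaway loop.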
Has anyone seen this issue before? Should I file a bug, or is this caused by a
misconfiguration on my side?

Any help is highly appreciated.

Thanks
Kabeer.
