Hi Vinoth and other HUDI experts,

I am stuck while processing inserts into HUDI. The process picks up CSV files and loads them into HUDI. It seems to be stuck at:

https://github.com/apache/incubator-hudi/blob/master/hoodie-client/src/main/java/com/uber/hoodie/table/HoodieCopyOnWriteTable.java#L679

The log is below:

2019-07-02 22:43:31,875 [main] INFO com.uber.hoodie.table.HoodieCopyOnWriteTable - AvgRecordSize => 9223372036854775807
2019-07-02 22:43:31,969 [main] INFO com.uber.hoodie.table.HoodieCopyOnWriteTable - For partitionPath : 2018/05/30 Small Files => [SmallFile {location=HoodieRecordLocation {commitTime=20190702161750, fileId=39cff0df-24e4-45b8-bff5-9b4f41c4096a}, sizeBytes=435362}]
2019-07-02 22:43:31,969 [main] INFO com.uber.hoodie.table.HoodieCopyOnWriteTable - After small file assignment: unassignedInserts => 8, totalInsertBuckets => 2147483647, recordsPerBucket => 0

Looking at the last line in the log, "unassignedInserts => 8, totalInsertBuckets => 2147483647, recordsPerBucket => 0", totalInsertBuckets is Integer.MAX_VALUE. The loop below therefore runs about 2.1 billion times, adding an entry to each collection on every pass, which takes a very long time and causes heap issues:

    logger.info("After small file assignment: unassignedInserts => " + totalUnassignedInserts
        + ", totalInsertBuckets => " + insertBuckets
        + ", recordsPerBucket => " + insertRecordsPerBucket);
    for (int b = 0; b < insertBuckets; b++) {
      bucketNumbers.add(totalBuckets);
      recordsPerBucket.add(totalUnassignedInserts / insertBuckets);
      BucketInfo bucketInfo = new BucketInfo();
      bucketInfo.bucketType = BucketType.INSERT;
      bucketInfoMap.put(totalBuckets, bucketInfo);
      totalBuckets++;
    }

Has anyone seen this issue? Do I need to file a bug, or is it a misconfiguration on my side? Any help is highly appreciated.

Thanks,
Kabeer.
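
P.S. To sanity-check my reading of the log, here is a minimal standalone sketch of arithmetic that would produce exactly these numbers. It is not copied from the Hudi source: the class and method names, the 120 MB file size limit, and the assumption that the bucket count comes from a floating-point division narrowed to int are all mine. The guard at the end is only one possible mitigation, not a proposed patch.

```java
// Hypothetical sketch: names and the 120 MB limit are assumptions, not Hudi source.
final class InsertBucketSketch {

    /** Records per bucket when the split size is derived from a file size limit. */
    static long recordsPerBucket(long maxFileSizeBytes, long avgRecordSizeBytes) {
        // Long division truncates: a realistic limit divided by Long.MAX_VALUE is 0.
        return maxFileSizeBytes / avgRecordSizeBytes;
    }

    /** Bucket count computed with floating-point division, then narrowed to int. */
    static int unguardedBuckets(long unassignedInserts, long recordsPerBucket) {
        // For a positive numerator, dividing by 0.0 yields Infinity, and casting
        // Infinity to int saturates at Integer.MAX_VALUE.
        return (int) Math.ceil((1.0 * unassignedInserts) / recordsPerBucket);
    }

    /** One possible guard: never let the divisor drop below 1. */
    static int guardedBuckets(long unassignedInserts, long recordsPerBucket) {
        long safeRecordsPerBucket = Math.max(recordsPerBucket, 1L);
        return (int) Math.ceil((1.0 * unassignedInserts) / safeRecordsPerBucket);
    }

    public static void main(String[] args) {
        long maxFileSize = 120L * 1024 * 1024;  // assumed limit, for illustration only
        long avgRecordSize = Long.MAX_VALUE;    // AvgRecordSize value from the log
        long perBucket = recordsPerBucket(maxFileSize, avgRecordSize);

        System.out.println("recordsPerBucket => " + perBucket);
        System.out.println("unguarded totalInsertBuckets => " + unguardedBuckets(8, perBucket));
        System.out.println("guarded totalInsertBuckets => " + guardedBuckets(8, perBucket));
    }
}
```

With the values from my log this prints recordsPerBucket => 0 and an unguarded bucket count of 2147483647, matching the last log line, while the guarded variant gives 8. If that reading is right, the root problem would be AvgRecordSize coming out as Long.MAX_VALUE rather than the loop itself.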