Kabir,

Could you share the contents of your commit metadata? You can list the
timeline, find the latest commit in it, cat that commit's metadata file,
and paste the results (whatever you are able to share).
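
If it helps, here is a rough, untested sketch of how you could dump the
latest commit's metadata with the plain Hadoop FS API; the /path/to/table
base path is a placeholder, and an equivalent ls/cat on the table's
.hoodie directory works just as well:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ShowLatestCommit {
  public static void main(String[] args) throws Exception {
    // Placeholder base path; point this at your table (local, HDFS, etc.).
    Path hoodieDir = new Path("/path/to/table", ".hoodie");
    FileSystem fs = hoodieDir.getFileSystem(new Configuration());

    // Completed commits are stored as <instantTime>.commit files; instant
    // times are sortable timestamps, so the lexicographically largest name
    // is the latest commit. (Assumes at least one completed commit exists.)
    FileStatus latest = null;
    for (FileStatus st : fs.listStatus(hoodieDir, p -> p.getName().endsWith(".commit"))) {
      if (latest == null
          || st.getPath().getName().compareTo(latest.getPath().getName()) > 0) {
        latest = st;
      }
    }
    System.out.println("Latest commit: " + latest.getPath());

    // The commit metadata is JSON; stream it to stdout.
    IOUtils.copyBytes(fs.open(latest.getPath()), System.out, 4096, true);
  }
}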

Thanks,
Nishith

On Tue, Jul 2, 2019 at 4:53 PM Kabeer Ahmed <kab...@linuxmail.org> wrote:

> Hi Vinoth and other HUDI Experts,
>
> I am stuck while processing inserts into HUDI. The process picks up CSV
> files and loads them into HUDI, but it appears to hang at:
> https://github.com/apache/incubator-hudi/blob/master/hoodie-client/src/main/java/com/uber/hoodie/table/HoodieCopyOnWriteTable.java#L679
> Log is below:
>
> 2019-07-02 22:43:31,875 [main] INFO com.uber.hoodie.table.HoodieCopyOnWriteTable - AvgRecordSize => 9223372036854775807
> 2019-07-02 22:43:31,969 [main] INFO com.uber.hoodie.table.HoodieCopyOnWriteTable - For partitionPath : 2018/05/30 Small Files => [SmallFile {location=HoodieRecordLocation {commitTime=20190702161750, fileId=39cff0df-24e4-45b8-bff5-9b4f41c4096a}, sizeBytes=435362}]
> 2019-07-02 22:43:31,969 [main] INFO com.uber.hoodie.table.HoodieCopyOnWriteTable - After small file assignment: unassignedInserts => 8, totalInsertBuckets => 2147483647, recordsPerBucket => 0
>
> Looking at the last line in the log ("unassignedInserts => 8,
> totalInsertBuckets => 2147483647, recordsPerBucket => 0"), the code below
> loops for an extremely long time, which causes heap issues.
>
> logger.info("After small file assignment: unassignedInserts => "
>     + totalUnassignedInserts + ", totalInsertBuckets => " + insertBuckets
>     + ", recordsPerBucket => " + insertRecordsPerBucket);
> for (int b = 0; b < insertBuckets; b++) {
>   bucketNumbers.add(totalBuckets);
>   recordsPerBucket.add(totalUnassignedInserts / insertBuckets);
>   BucketInfo bucketInfo = new BucketInfo();
>   bucketInfo.bucketType = BucketType.INSERT;
>   bucketInfoMap.put(totalBuckets, bucketInfo);
>   totalBuckets++;
> }
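>
> For what it's worth, here is a minimal standalone sketch of what I suspect
> is happening, assuming insertBuckets is derived as ceil(unassigned inserts
> / records-per-bucket) and records-per-bucket comes from dividing the max
> file size by the reported AvgRecordSize (the max file size below is just a
> placeholder):
>
> public class BucketMathSketch {
>   public static void main(String[] args) {
>     long avgRecordSize = 9223372036854775807L; // Long.MAX_VALUE, per my log
>     long maxFileSize = 120L * 1024 * 1024;     // placeholder max file size
>     long totalUnassignedInserts = 8;
>
>     long recordsPerBucket = maxFileSize / avgRecordSize; // integer division -> 0
>     int insertBuckets = (int) Math.ceil(
>         (1.0 * totalUnassignedInserts) / recordsPerBucket); // 8 / 0.0 -> Infinity
>
>     System.out.println(recordsPerBucket); // prints 0
>     // (int) Double.POSITIVE_INFINITY is Integer.MAX_VALUE in Java:
>     System.out.println(insertBuckets);    // prints 2147483647
>   }
> }
>
> If that reading is right, the loop above tries to create roughly 2^31
> insert buckets, which would explain the heap pressure.
>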
> Has anyone seen this issue? Should I file a bug, or is this likely a
> misconfiguration on my end?
>
> Any help is highly appreciated.
>
> Thanks
> Kabeer.
>
