Kabeer,
I have never seen this issue before.
2019-07-02 22:43:31,875 [main] INFO 
com.uber.hoodie.table.HoodieCopyOnWriteTable - AvgRecordSize => 
9223372036854775807

The above average record size looks wrong. It is Long.MAX_VALUE. The logic for 
calculating average record size is in

https://github.com/apache/incubator-hudi/blob/9f18a1ca80ec1d08253688d9b4d4538a86068559/hoodie-client/src/main/java/com/uber/hoodie/table/HoodieCopyOnWriteTable.java#L742


>From the code, this could happen if no records were written in previously 
>committed instant (.commit). Can you check if you see empty commit lying 
>around inĀ  .hoodie fodler.
    On Wednesday, July 3, 2019, 3:39:15 AM PDT, Kabeer Ahmed 
<kab...@linuxmail.org> wrote:  
 
 Hi Nishith,

Please find the latest commit data in the gist at: 
https://gist.github.com/smdahmed/5d811cb4833243a11ac09b9dc61e5b4d 
(https://link.getmailspring.com/link/405dc998-0e1a-4ebe-8268-b1828f93b...@getmailspring.com/0?redirect=https%3A%2F%2Fgist.github.com%2Fsmdahmed%2F5d811cb4833243a11ac09b9dc61e5b4d&recipient=ZGV2QGh1ZGkuYXBhY2hlLm9yZw%3D%3D)
For your convenience, I have also copy pasted it below. Any help is highly 
appreciated.

Thanks
Kabeer.

20190702161629.commit:
{
"partitionToWriteStats" : {
"2018/05/30" : [ {
"fileId" : "39cff0df-24e4-45b8-bff5-9b4f41c4096a",
"path" : 
"2018/05/30/39cff0df-24e4-45b8-bff5-9b4f41c4096a_0_20190702161629.parquet",
"prevCommit" : "20190702161417",
"numWrites" : 11614,
"numDeletes" : 0,
"numUpdateWrites" : 5,
"numInserts" : 3,
"totalWriteBytes" : 848480,
"totalWriteErrors" : 0,
"tempPath" : null,
"partitionPath" : "2018/05/30",
"totalLogRecords" : 0,
"totalLogFilesCompacted" : 0,
"totalLogSizeCompacted" : 0,
"totalUpdatedRecordsCompacted" : 0,
"totalLogBlocks" : 0,
"totalCorruptLogBlock" : 0,
"totalRollbackBlocks" : 0
} ],
"2018/05/31" : [ {
"fileId" : "4f5514e8-d57c-4c6e-be8f-c3448051c956",
"path" : 
"2018/05/31/4f5514e8-d57c-4c6e-be8f-c3448051c956_1_20190702161629.parquet",
"prevCommit" : "null",
"numWrites" : 10430,
"numDeletes" : 0,
"numUpdateWrites" : 0,
"numInserts" : 10430,
"totalWriteBytes" : 820723,
"totalWriteErrors" : 0,
"tempPath" : null,
"partitionPath" : "2018/05/31",
"totalLogRecords" : 0,
"totalLogFilesCompacted" : 0,
"totalLogSizeCompacted" : 0,
"totalUpdatedRecordsCompacted" : 0,
"totalLogBlocks" : 0,
"totalCorruptLogBlock" : 0,
"totalRollbackBlocks" : 0
} ]
},
"compacted" : false,
"extraMetadataMap" : {
"ROLLING_STAT" : "{\n \"partitionToRollingStats\" : {\n \"2018/05/29\" : {\n 
\"235bd794-790b-48e7-b9ea-956149db1dce\" : {\n \"fileId\" : 
\"235bd794-790b-48e7-b9ea-956149db1dce\",\n \"inserts\" : 2,\n \"upserts\" : 
0,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n 
\"totalInputWriteBytesOnDisk\" : 443797\n }\n },\n \"2018/05/30\" : {\n 
\"39cff0df-24e4-45b8-bff5-9b4f41c4096a\" : {\n \"fileId\" : 
\"39cff0df-24e4-45b8-bff5-9b4f41c4096a\",\n \"inserts\" : 23220,\n \"upserts\" 
: 5,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n 
\"totalInputWriteBytesOnDisk\" : 848282\n }\n },\n \"2018/05/31\" : {\n 
\"4f5514e8-d57c-4c6e-be8f-c3448051c956\" : {\n \"fileId\" : 
\"4f5514e8-d57c-4c6e-be8f-c3448051c956\",\n \"inserts\" : 10430,\n \"upserts\" 
: 0,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n 
\"totalInputWriteBytesOnDisk\" : 820723\n }\n }\n },\n \"actionType\" : 
\"commit\"\n}"
},
"extraMetadata" : {
"ROLLING_STAT" : "{\n \"partitionToRollingStats\" : {\n \"2018/05/29\" : {\n 
\"235bd794-790b-48e7-b9ea-956149db1dce\" : {\n \"fileId\" : 
\"235bd794-790b-48e7-b9ea-956149db1dce\",\n \"inserts\" : 2,\n \"upserts\" : 
0,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n 
\"totalInputWriteBytesOnDisk\" : 443797\n }\n },\n \"2018/05/30\" : {\n 
\"39cff0df-24e4-45b8-bff5-9b4f41c4096a\" : {\n \"fileId\" : 
\"39cff0df-24e4-45b8-bff5-9b4f41c4096a\",\n \"inserts\" : 23220,\n \"upserts\" 
: 5,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n 
\"totalInputWriteBytesOnDisk\" : 848282\n }\n },\n \"2018/05/31\" : {\n 
\"4f5514e8-d57c-4c6e-be8f-c3448051c956\" : {\n \"fileId\" : 
\"4f5514e8-d57c-4c6e-be8f-c3448051c956\",\n \"inserts\" : 10430,\n \"upserts\" 
: 0,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n 
\"totalInputWriteBytesOnDisk\" : 820723\n }\n }\n },\n \"actionType\" : 
\"commit\"\n}"
},
"totalScanTime" : 0,
"totalCreateTime" : 2439,
"totalUpsertTime" : 2450,
"totalCompactedRecordsUpdated" : 0,
"totalLogFilesCompacted" : 0,
"totalLogFilesSize" : 0,
"fileIdAndRelativePaths" : {
"4f5514e8-d57c-4c6e-be8f-c3448051c956" : 
"2018/05/31/4f5514e8-d57c-4c6e-be8f-c3448051c956_1_20190702161629.parquet",
"39cff0df-24e4-45b8-bff5-9b4f41c4096a" : 
"2018/05/30/39cff0df-24e4-45b8-bff5-9b4f41c4096a_0_20190702161629.parquet"
},
"totalRecordsDeleted" : 0,
"totalLogRecordsCompacted" : 0
}

20190702161750.clean:
Objavro.schema 
{"type":"record","name":"HoodieCleanMetadata","namespace":"com.uber.hoodie.avro.model","fields":[{"name":"startCleanTime","type":{"type":"string","avro.java.string":"String"}},{"name":"timeTakenInMillis","type":"long"},{"name":"totalFilesDeleted","type":"int"},{"name":"earliestCommitToRetain","type":{"type":"string","avro.java.string":"String"}},{"name":"partitionMetadata","type":{"type":"map","values":{"type":"record","name":"HoodieCleanPartitionMetadata","fields":[{"name":"partitionPath","type":{"type":"string","avro.java.string":"String"}},{"name":"policy","type":{"type":"string","avro.java.string":"String"}},{"name":"deletePathPatterns","type":{"type":"array","items":{"type":"string","avro.java.string":"String"}}},{"name":"successDeleteFiles","type":{"type":"array","items":{"type":"string","avro.java.string":"String"}}},{"name":"failedDeleteFiles","type":{"type":"array","items":{"type":"string","avro.java.string":"String"}}}]},"avro.java.string":"String"}}]}

20190702161847.inflight:
{
"partitionToWriteStats" : {
"2018/05/31" : [ {
"fileId" : "4f5514e8-d57c-4c6e-be8f-c3448051c956",
"path" : null,
"prevCommit" : "20190702161629",
"numWrites" : 0,
"numDeletes" : 0,
"numUpdateWrites" : 2,
"numInserts" : 0,
"totalWriteBytes" : 0,
"totalWriteErrors" : 0,
"tempPath" : null,
"partitionPath" : null,
"totalLogRecords" : 0,
"totalLogFilesCompacted" : 0,
"totalLogSizeCompacted" : 0,
"totalUpdatedRecordsCompacted" : 0,
"totalLogBlocks" : 0,
"totalCorruptLogBlock" : 0,
"totalRollbackBlocks" : 0
} ]
},
"compacted" : false,
"extraMetadataMap" : { },
"totalScanTime" : 0,
"totalCreateTime" : 0,
"totalUpsertTime" : 0,
"totalCompactedRecordsUpdated" : 0,
"totalLogFilesCompacted" : 0,
"totalLogFilesSize" : 0,
"extraMetadata" : { },
"fileIdAndRelativePaths" : {
"4f5514e8-d57c-4c6e-be8f-c3448051c956" : null
},
"totalRecordsDeleted" : 0,
"totalLogRecordsCompacted" : 0
}

20190702162055.inflight:

{
"partitionToWriteStats" : { },
"compacted" : false,
"extraMetadataMap" : { },
"totalRecordsDeleted" : 0,
"totalLogRecordsCompacted" : 0,
"totalScanTime" : 0,
"totalCreateTime" : 0,
"totalUpsertTime" : 0,
"totalCompactedRecordsUpdated" : 0,
"totalLogFilesCompacted" : 0,
"totalLogFilesSize" : 0,
"fileIdAndRelativePaths" : { },
"extraMetadata" : { }
}

On Jul 3 2019, at 8:30 am, Kabeer Ahmed <kab...@linuxmail.org> wrote:
> Hi Nishith,
>
> Thank you for the quick respnose. I shall try to send the commit metadata at 
> the earliest. I hope the commit metadata you are looking for is the one 
> within .hoodie/ directory and not the ones that is archived.
> And there are inflight and commit metadata. I am taking that you want to look 
> into the one inflight. Shall revert back with further details.
> Thanks
> Kabeer.
>
> On Jul 3 2019, at 2:19 am, nishith agarwal <n3.nas...@gmail.com> wrote:
> > Kabir,
> >
> > Could you share the content of your commit metadata ? You can list the
> > timeline, find the latest commit in the timeline, perform a cat and paste
> > the results (that you can share).
> >
> > Thanks,
> > Nishith
> >
> > On Tue, Jul 2, 2019 at 4:53 PM Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > > Hi Vinoth and other HUDI Experts,
> > > I am stuck while processing inserts into HUDI. The process picks up CSV
> > > files and loads them into HUDI. The process seems to be stuck at:
> > > https://github.com/apache/incubator-hudi/blob/master/hoodie-client/src/main/java/com/uber/hoodie/table/HoodieCopyOnWriteTable.java#L679
> > > Log is below:
> > >
> > > 2019-07-02 22:43:31,875 [main] INFO
> > > com.uber.hoodie.table.HoodieCopyOnWriteTable - AvgRecordSize =>
> > > 9223372036854775807
> > > 2019-07-02 22:43:31,969 [main] INFO
> > > com.uber.hoodie.table.HoodieCopyOnWriteTable - For partitionPath :
> > > 2018/05/30 Small Files => [SmallFile {location=HoodieRecordLocation
> > > {commitTime=20190702161750, fileId=39cff0df-24e4-45b8-bff5-9b4f41c4096a},
> > > sizeBytes=435362}]
> > > 2019-07-02 22:43:31,969 [main] INFO
> > > com.uber.hoodie.table.HoodieCopyOnWriteTable - After small file 
> > > assignment:
> > > unassignedInserts => 8, totalInsertBuckets => 2147483647, recordsPerBucket
> > > => 0
> > > Looking at the last line in the log: "unassignedInserts => 8,
> > > totalInsertBuckets => 2147483647, recordsPerBucket => 0", this causes the
> > > below code to loop for quite long causing heap issues.
> > >
> > > logger.info(
> > > "After small file assignment: unassignedInserts => " +
> > > totalUnassignedInserts
> > > + ", totalInsertBuckets => " + insertBuckets + ", recordsPerBucket => "
> > > + insertRecordsPerBucket);
> > > for (int b = 0; b < insertBuckets; b++) {
> > > bucketNumbers.add(totalBuckets);
> > > recordsPerBucket.add(totalUnassignedInserts / insertBuckets);
> > > BucketInfo bucketInfo = new BucketInfo();
> > > bucketInfo.bucketType = BucketType.INSERT;
> > > bucketInfoMap.put(totalBuckets, bucketInfo);
> > > totalBuckets++;
> > > }
> > > Has someone seen the issue? Do I need to file a bug or it is something to
> > > do with my misconfiguration?
> > >
> > > Any help is highly appreciated.
> > > Thanks
> > > Kabeer.
> >
> >
>
>

  

Reply via email to