[GitHub] [hudi] voonhous commented on issue #8892: [SUPPORT] [BUG] Duplicate fileID ??? from bucket ?? of partition found during the BucketStreamWriteFunction index bootstrap.
voonhous commented on issue #8892: URL: https://github.com/apache/hudi/issues/8892#issuecomment-1663965357 Okay, I just finished writing the test to reproduce this issue; I will submit a PR tomorrow after cleaning it up. Thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
voonhous commented on issue #8892: URL: https://github.com/apache/hudi/issues/8892#issuecomment-1663573970 @danny0405 I don't think cleaning up the files in `#finalizeWrite` is the correct way of doing things, as the current implementation of `#finalizeWrite` only handles parquet files:

```java
while (itr.hasNext()) {
  FileStatus status = itr.next();
  String pathStr = status.getPath().toString();
  if (pathStr.contains(HoodieTableMetaClient.MARKER_EXTN) && !pathStr.endsWith(IOType.APPEND.name())) {
    result.add(translateMarkerToDataPath(pathStr));
  }
}
```

Given that this is not a partial failover (when a TM fails, all TMs are "restarted"), we should actually ensure a rollback is performed.
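To make the filtering behaviour concrete, here is a minimal Python re-expression of that marker filter (a sketch, not Hudi's actual implementation: the `MARKER_EXTN` value and the path translation below are simplified assumptions). Only markers that do not end with `APPEND` survive the filter, which is why only base (parquet) files, and never log files, get reconciled:

```python
# Simplified Python stand-in for the quoted Java loop.
# MARKER_EXTN's value and the marker-to-data-path translation are assumptions.
MARKER_EXTN = ".marker"

def data_paths_to_reconcile(marker_paths):
    """Translate marker paths to data paths, skipping APPEND (log-file) markers."""
    result = []
    for path in marker_paths:
        if MARKER_EXTN in path and not path.endswith("APPEND"):
            # Hypothetical stand-in for translateMarkerToDataPath
            result.append(path.split(MARKER_EXTN)[0])
    return result
```

Since every log-file write produces an `APPEND` marker, this filter never yields log-file paths, so partially written log files are left untouched at finalize time.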
voonhous commented on issue #8892: URL: https://github.com/apache/hudi/issues/8892#issuecomment-1663266922 I think he means: check why `#finalizeWrite` is not picking up the files to be deleted upon commit.
voonhous commented on issue #8892: URL: https://github.com/apache/hudi/issues/8892#issuecomment-1663258600 Spent 2 more hours looking at this issue. What happened was that I was testing this on 0.12.1 without this PR: https://github.com/apache/hudi/pull/7208 To reproduce this error, add the snippet below into `org.apache.hudi.sink.StreamWriteFunction#flushRemaining`:

```java
if (taskID == 0) {
  // trigger a failure
  throw new HoodieException("Intentional failure on taskID 0 thrown to invoke partial failover?");
}
```

Prior to this enhancement, rollbacks would be created whenever a TM failed, to remove all the partially written files. However, after this enhancement, rollbacks will not be created unless the job is restarted or a global failover happens.
voonhous commented on issue #8892: URL: https://github.com/apache/hudi/issues/8892#issuecomment-1649313876 Yeap, we ensured that has happened. In our internal version, a rollback will be performed to remove all the files that were written before the checkpoint. After that, a write will be performed again from the last successful checkpoint. I'll check this again on the community's master version later in the week. Sorry.
voonhous commented on issue #8892: URL: https://github.com/apache/hudi/issues/8892#issuecomment-1643039551 @big-doudou Apologies for the late reply. I was trying to reproduce this issue on our end, but was unable to do so. A little context on what we did: using a datagen source, we sink the data into a Hudi table. Before a checkpoint, we kill one of the TM's tasks. Upon doing so, a rollback is triggered when all the TMs restart. I checked with a colleague of mine and they mentioned that when Hudi is performing an upsert, there's a shuffle operation. The presence of a shuffle operation will trigger a "global failover". Here's the Flink SQL that I used while attempting to reproduce your issue:

```sql
CREATE TEMPORARY TABLE buyer_info (
  id bigint,
  dec_col decimal(25, 10),
  country string,
  age INT,
  update_time STRING
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '10',
  'fields.age.min' = '0',
  'fields.age.max' = '7',
  'fields.country.length' = '1'
);

-- Hudi table to write to
CREATE TEMPORARY TABLE dim_buyer_info_test (
  id bigint,
  dec_col decimal(25, 10),
  country string,
  age INT,
  update_time STRING
) PARTITIONED BY (age) WITH (
  -- Hudi settings
  'connector' = 'hudi',
  'hoodie.datasource.write.recordkey.field' = 'id',
  'path' = '/path/to/hudi_table/duplicate_file_id_issue',
  'write.operation' = 'UPSERT',
  'table.type' = 'MERGE_ON_READ',
  'hoodie.compaction.payload.class' = 'org.apache.hudi.common.model.PartialUpdateAvroPayload',
  'hoodie.datasource.write.payload.class' = 'org.apache.hudi.common.model.PartialUpdateAvroPayload',
  'hoodie.table.keygenerator.class' = 'org.apache.hudi.keygen.ComplexAvroKeyGenerator',
  'write.precombine.field' = 'update_time',
  'index.type' = 'BUCKET',
  'hoodie.bucket.index.num.buckets' = '4',
  'write.tasks' = '8',
  'hoodie.bucket.index.hash.field' = 'id',
  'clean.retain_commits' = '5',
  -- Hive sync settings
  'hive_sync.enable' = 'false'
);

-- Insert into Hudi sink
INSERT INTO dim_buyer_info_test
SELECT id, dec_col, country, age, update_time
FROM buyer_info;
```

Might have butchered the explanation above... As such, we were unable to reproduce your issue wherein only a single TM restarts. Can you please share your job configurations and how you're doing your tests?
voonhous commented on issue #8892: URL: https://github.com/apache/hudi/issues/8892#issuecomment-163281 @big-doudou Thank you so much for the details! This looks like an issue with partial failover and recovery. Apologies, I am still trying to understand this; can you give me the rest of the week to try and reproduce it? Will let you know!
voonhous commented on issue #8892: URL: https://github.com/apache/hudi/issues/8892#issuecomment-1631975169 @big-doudou Any updates?
voonhous commented on issue #8892: URL: https://github.com/apache/hudi/issues/8892#issuecomment-1630005948 @pftn I mean @big-doudou's logs. Is he running the same version as you? Also, if this error was thrown recently on his pipeline, is it possible for him to share his JM + TM logs with me privately to assist in reproducing this issue locally? Thanks.
voonhous commented on issue #8892: URL: https://github.com/apache/hudi/issues/8892#issuecomment-1629996333 @big-doudou

> * Replace partition files using the repairedOutputPath in step 2

Can you please share your Hudi version + stack trace? Thanks.
voonhous commented on issue #8892: URL: https://github.com/apache/hudi/issues/8892#issuecomment-1583996315 @pftn Can you please help to verify if the data in these 2 parquets is the same?

1. 20220604/0007-3477-401f-982e-e5ae38ca0e23_3-20-6_20230510170043301.parquet
2. 20220604/0007-4bc1-4340-a9d8-330666a58244_5-20-6_20230511183601566.parquet

Do you still have the compaction plans that generated these 2 parquet files? It'll be extremely helpful if we can know the write tokens of the log files before compaction. Thanks!
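One hedged way to check whether the two parquet files hold the same records is an order-insensitive multiset comparison (a sketch of mine, not part of the thread; loading the rows as lists of dicts, e.g. via `pyarrow.parquet.read_table(path).to_pylist()`, is left to whatever tooling you have on hand):

```python
from collections import Counter

def same_records(rows_a, rows_b):
    """Order-insensitive comparison of two record sets (lists of dicts).

    Each row dict is canonicalised into a sorted tuple of (column, value)
    pairs, so column ordering and row ordering don't affect the result.
    """
    def canon(rows):
        return Counter(tuple(sorted(r.items())) for r in rows)
    return canon(rows_a) == canon(rows_b)
```

The multiset (Counter) comparison also catches duplicated rows, which a plain set comparison would silently ignore.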
voonhous commented on issue #8892: URL: https://github.com/apache/hudi/issues/8892#issuecomment-1582326573 I noticed 3 bucketIds being repeated, and they all have instants between **20230511183601566** and **20230510170043301**. Can you please share your timeline between these 2 instants? Thank you!
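In case it helps, the timeline slice can be pulled straight from the table's `.hoodie` directory; a small sketch (an assumption of mine, keyed on timeline file names starting with a 17-digit instant timestamp, which matches the instants above):

```python
import os
import re

def timeline_between(hoodie_dir, instant_a, instant_b):
    """List .hoodie timeline files whose instant time lies between two instants.

    The bounds are sorted first, so they may be given in either order.
    Non-timeline files (e.g. hoodie.properties) are skipped.
    """
    lo, hi = sorted([instant_a, instant_b])
    out = []
    for name in sorted(os.listdir(hoodie_dir)):
        m = re.match(r"(\d{17})\.", name)  # e.g. 20230510170043301.commit
        if m and lo <= m.group(1) <= hi:
            out.append(name)
    return out
```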
voonhous commented on issue #8892: URL: https://github.com/apache/hudi/issues/8892#issuecomment-1578128987 @hbgstc123 For visibility.
voonhous commented on issue #8892: URL: https://github.com/apache/hudi/issues/8892#issuecomment-1578118618 Can you please share the file sizes of these two files:

1. 20220604/0009-5876-4e72-9cda-656772feb7a6_17-20-6_20230511183601566.parquet
2. 20220604/0009-c3bc-4ae4-a1e0-917970420ac7_1-20-6_20230510170043301.parquet