[GitHub] [hudi] voonhous commented on issue #8892: [SUPPORT] [BUG] Duplicate fileID ??? from bucket ?? of partition found during the BucketStreamWriteFunction index bootstrap.

2023-08-03 Thread via GitHub


voonhous commented on issue #8892:
URL: https://github.com/apache/hudi/issues/8892#issuecomment-1663965357

   Okay, I just finished writing a test that reproduces this issue. I will 
submit a PR tomorrow after cleaning it up. Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] voonhous commented on issue #8892: [SUPPORT] [BUG] Duplicate fileID ??? from bucket ?? of partition found during the BucketStreamWriteFunction index bootstrap.

2023-08-03 Thread via GitHub


voonhous commented on issue #8892:
URL: https://github.com/apache/hudi/issues/8892#issuecomment-1663573970

   @danny0405 I don't think cleaning up the files in `#finalizeWrite` is the 
correct way of doing things as the current implementation of `#finalizeWrite` 
only handles parquet files.
   
   ```java
   while (itr.hasNext()) {
     FileStatus status = itr.next();
     String pathStr = status.getPath().toString();
     // APPEND markers are skipped, so only the remaining markers are translated to data paths
     if (pathStr.contains(HoodieTableMetaClient.MARKER_EXTN)
         && !pathStr.endsWith(IOType.APPEND.name())) {
       result.add(translateMarkerToDataPath(pathStr));
     }
   }
   ```
   
   Given that this is not a partial-failover (when a TM fails, all TMs are 
"restarted"), we should actually ensure a rollback is performed.
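   
   To make the effect of that filter concrete, here is a minimal, self-contained sketch that applies the same predicate to plain strings. The marker paths, the `.marker` extension value, and the local `IOType` enum are assumptions made for illustration; they stand in for `HoodieTableMetaClient.MARKER_EXTN` and Hudi's `IOType` rather than quoting them.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   
   public class MarkerFilterSketch {
       // Assumed value of HoodieTableMetaClient.MARKER_EXTN (illustration only).
       static final String MARKER_EXTN = ".marker";
   
       // Local stand-in for Hudi's IOType enum (assumption).
       enum IOType { MERGE, CREATE, APPEND }
   
       // Applies the same predicate as the quoted loop: keep marker paths
       // that do not end with the APPEND IO type.
       static List<String> markersKeptForCleanup(List<String> markerPaths) {
           List<String> kept = new ArrayList<>();
           for (String pathStr : markerPaths) {
               if (pathStr.contains(MARKER_EXTN) && !pathStr.endsWith(IOType.APPEND.name())) {
                   kept.add(pathStr);
               }
           }
           return kept;
       }
   
       public static void main(String[] args) {
           // Hypothetical marker paths, not taken from a real table.
           List<String> markers = List.of(
               "p/base-file.parquet.marker.CREATE",
               "p/base-file.parquet.marker.MERGE",
               "p/.log-file.log.1.marker.APPEND");
           // The APPEND marker is dropped, so the log file is never considered.
           System.out.println(markersKeptForCleanup(markers));
       }
   }
   ```
   
   Under these assumptions, any file written through an APPEND marker is invisible to this cleanup path, which is why a rollback looks like the safer option.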





[GitHub] [hudi] voonhous commented on issue #8892: [SUPPORT] [BUG] Duplicate fileID ??? from bucket ?? of partition found during the BucketStreamWriteFunction index bootstrap.

2023-08-02 Thread via GitHub


voonhous commented on issue #8892:
URL: https://github.com/apache/hudi/issues/8892#issuecomment-1663266922

   I think he means we should check why `#finalizeWrite` is not picking up the 
files to be deleted upon commit.





[GitHub] [hudi] voonhous commented on issue #8892: [SUPPORT] [BUG] Duplicate fileID ??? from bucket ?? of partition found during the BucketStreamWriteFunction index bootstrap.

2023-08-02 Thread via GitHub


voonhous commented on issue #8892:
URL: https://github.com/apache/hudi/issues/8892#issuecomment-1663258600

   Spent 2 more hours looking at this issue:
   
   What happened was that I was testing this on 0.12.1 without this PR: 
https://github.com/apache/hudi/pull/7208
   
   To reproduce this error:
   
   Add the snippet into 
`org.apache.hudi.sink.StreamWriteFunction#flushRemaining`:
   
   ```java
   if (taskID == 0) {
     // trigger a failure
     throw new HoodieException(
         "Intentional failure on taskID 0 thrown to invoke partial failover?");
   }
   ```
   
   
   Prior to this enhancement, a rollback was created whenever a TM failed, in 
order to remove all the partially written files. 
   
   However, after this enhancement, rollbacks are not created unless the job 
is restarted or a global failover happens.





[GitHub] [hudi] voonhous commented on issue #8892: [SUPPORT] [BUG] Duplicate fileID ??? from bucket ?? of partition found during the BucketStreamWriteFunction index bootstrap.

2023-07-25 Thread via GitHub


voonhous commented on issue #8892:
URL: https://github.com/apache/hudi/issues/8892#issuecomment-1649313876

   Yep, we ensured that this happened. In our internal version, a rollback is 
performed to remove all the files that were written before the checkpoint.
   
   After that, the write is performed again from the last successful 
checkpoint.
   
   I'll do a check on this again on the community's master version later in the 
week. Sorry.





[GitHub] [hudi] voonhous commented on issue #8892: [SUPPORT] [BUG] Duplicate fileID ??? from bucket ?? of partition found during the BucketStreamWriteFunction index bootstrap.

2023-07-19 Thread via GitHub


voonhous commented on issue #8892:
URL: https://github.com/apache/hudi/issues/8892#issuecomment-1643039551

   @big-doudou Apologies for the late reply. I was trying to reproduce this 
issue on our end, but was unable to do so. 
   
   A little context on what we did:
   
   Using a datagen source, we sink the data into a Hudi table. Before a 
checkpoint, we kill one of the TM's tasks. Upon doing so, a rollback is 
triggered when all the TMs restart. I checked with a colleague of mine, who 
mentioned that when Hudi is performing an upsert, there's a shuffle operation. 
The presence of a shuffle operation will trigger a "global failover".
   
   Here's the Flink SQL that I used while attempting to reproduce your issue.
   
   ```sql
   CREATE TEMPORARY TABLE buyer_info (
   id bigint, 
   dec_col decimal(25, 10),
   country string,
   age INT,
   update_time STRING
   ) WITH (
   'connector' = 'datagen',
   'rows-per-second' = '10',
   'fields.age.min' = '0',
   'fields.age.max' = '7',
   'fields.country.length' = '1'
   );
   
   -- Hudi table to write to
   CREATE TEMPORARY TABLE dim_buyer_info_test
   (
   id bigint,
   dec_col decimal(25, 10),
   country string,
   age INT,
   update_time STRING
   ) PARTITIONED BY (age)
   WITH
   (
   -- Hudi settings
   'connector' = 'hudi',
   'hoodie.datasource.write.recordkey.field' = 'id',
   'path' = '/path/to/hudi_table/duplicate_file_id_issue',
   'write.operation' = 'UPSERT',
   'table.type' = 'MERGE_ON_READ',
   'hoodie.compaction.payload.class' = 'org.apache.hudi.common.model.PartialUpdateAvroPayload',
   'hoodie.datasource.write.payload.class' = 'org.apache.hudi.common.model.PartialUpdateAvroPayload',
   'hoodie.table.keygenerator.class' = 'org.apache.hudi.keygen.ComplexAvroKeyGenerator',
   'write.precombine.field' = 'update_time',
   'index.type' = 'BUCKET',
   'hoodie.bucket.index.num.buckets' = '4',
   'write.tasks' = '8',
   'hoodie.bucket.index.hash.field' = 'id',
   'clean.retain_commits' = '5',
   -- Hive sync settings
   'hive_sync.enable' = 'false'
   );
   
   -- Insert into Hudi sink
   INSERT INTO dim_buyer_info_test
   SELECT id, dec_col, country, age, update_time
   FROM buyer_info;
   ```
   
   Might have butchered the explanation above...
   
   As such, we were unable to reproduce your issue, in which only a single TM 
restarts. 
   
   Can you please share your job configurations and how you're doing your tests?
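   
   For readers less familiar with the bucket index settings above, the following sketch shows the general idea of mapping a record key to one of `hoodie.bucket.index.num.buckets` buckets. The hashing expression and the 8-digit zero padding are assumptions used for illustration, not a quotation of Hudi's implementation, and the class and method names are hypothetical.
   
   ```java
   public class BucketIdSketch {
       // Simplified stand-in for bucket index hashing (assumption, not Hudi's exact code).
       static int bucketId(String recordKey, int numBuckets) {
           // Mask the sign bit so the modulo result is never negative.
           return (recordKey.hashCode() & Integer.MAX_VALUE) % numBuckets;
       }
   
       // Zero-padded bucket id, assumed here to be the leading part of a bucket file ID.
       static String bucketIdStr(int bucketId) {
           return String.format("%08d", bucketId);
       }
   
       public static void main(String[] args) {
           int numBuckets = 4; // mirrors 'hoodie.bucket.index.num.buckets' = '4'
           for (long id = 0; id < 6; id++) {
               int bucket = bucketId(Long.toString(id), numBuckets);
               System.out.println("id=" + id + " -> bucket " + bucketIdStr(bucket));
           }
       }
   }
   ```
   
   The point of the sketch is that a given key always lands in the same bucket, so each partition is expected to hold one file ID per bucket; two file IDs for the same bucket is the condition the error in this issue reports.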





[GitHub] [hudi] voonhous commented on issue #8892: [SUPPORT] [BUG] Duplicate fileID ??? from bucket ?? of partition found during the BucketStreamWriteFunction index bootstrap.

2023-07-12 Thread via GitHub


voonhous commented on issue #8892:
URL: https://github.com/apache/hudi/issues/8892#issuecomment-163281

   @big-doudou Thank you so much for the details! This looks like an issue with 
partial-failover and recovery.
   
   Apologies, I am still trying to understand this. Can you give me the rest 
of the week to try and reproduce this?
   
   Will let you know! 





[GitHub] [hudi] voonhous commented on issue #8892: [SUPPORT] [BUG] Duplicate fileID ??? from bucket ?? of partition found during the BucketStreamWriteFunction index bootstrap.

2023-07-12 Thread via GitHub


voonhous commented on issue #8892:
URL: https://github.com/apache/hudi/issues/8892#issuecomment-1631975169

   @big-doudou Any updates? 





[GitHub] [hudi] voonhous commented on issue #8892: [SUPPORT] [BUG] Duplicate fileID ??? from bucket ?? of partition found during the BucketStreamWriteFunction index bootstrap.

2023-07-10 Thread via GitHub


voonhous commented on issue #8892:
URL: https://github.com/apache/hudi/issues/8892#issuecomment-1630005948

   @pftn I mean @big-doudou's logs.
   
   Is he running the same version as you? Also, if this error was thrown 
recently on his pipeline, is it possible for him to share his JM + TM logs with 
me privately to assist in reproducing this issue locally? 
   
   Thanks.





[GitHub] [hudi] voonhous commented on issue #8892: [SUPPORT] [BUG] Duplicate fileID ??? from bucket ?? of partition found during the BucketStreamWriteFunction index bootstrap.

2023-07-10 Thread via GitHub


voonhous commented on issue #8892:
URL: https://github.com/apache/hudi/issues/8892#issuecomment-1629996333

   @big-doudou 
   
   > * Replace partition files using the repairedOutputPath in step 2
   
   Can you please share your Hudi-version + stack trace, thanks.
   
   





[GitHub] [hudi] voonhous commented on issue #8892: [SUPPORT] [BUG] Duplicate fileID ??? from bucket ?? of partition found during the BucketStreamWriteFunction index bootstrap.

2023-06-08 Thread via GitHub


voonhous commented on issue #8892:
URL: https://github.com/apache/hudi/issues/8892#issuecomment-1583996315

   @pftn can you please help to verify if the data in these 2 parquets are the 
same?
   
   1. 20220604/0007-3477-401f-982e-e5ae38ca0e23_3-20-6_20230510170043301.parquet
   2. 20220604/0007-4bc1-4340-a9d8-330666a58244_5-20-6_20230511183601566.parquet
   
   Do you still have the compaction plans that generated these 2 parquet files? 
It would be extremely helpful to know the write token of the log files 
before compaction. Thanks!
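   
   As a small aid for reading those names, here is a sketch that splits a base file name of the form `<fileId>_<writeToken>_<instantTime>.parquet` into its parts. The class and method names are hypothetical, and the file names are used exactly as they appear in this thread.
   
   ```java
   public class BaseFileNameSketch {
       // Returns {fileId, writeToken, instantTime} for a name shaped like
       // "<partition>/<fileId>_<writeToken>_<instantTime>.<ext>".
       static String[] parse(String path) {
           String fileName = path.substring(path.lastIndexOf('/') + 1);
           int dot = fileName.lastIndexOf('.');
           String stem = dot >= 0 ? fileName.substring(0, dot) : fileName;
           String[] parts = stem.split("_");
           if (parts.length != 3) {
               throw new IllegalArgumentException("Unexpected file name: " + path);
           }
           return parts;
       }
   
       public static void main(String[] args) {
           String[] names = {
               "20220604/0007-3477-401f-982e-e5ae38ca0e23_3-20-6_20230510170043301.parquet",
               "20220604/0007-4bc1-4340-a9d8-330666a58244_5-20-6_20230511183601566.parquet"
           };
           for (String name : names) {
               String[] p = parse(name);
               System.out.println("fileId=" + p[0] + " writeToken=" + p[1] + " instant=" + p[2]);
           }
       }
   }
   ```
   
   For the two files above this yields write tokens `3-20-6` and `5-20-6`, which is the part worth comparing against the log files.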
   





[GitHub] [hudi] voonhous commented on issue #8892: [SUPPORT] [BUG] Duplicate fileID ??? from bucket ?? of partition found during the BucketStreamWriteFunction index bootstrap.

2023-06-08 Thread via GitHub


voonhous commented on issue #8892:
URL: https://github.com/apache/hudi/issues/8892#issuecomment-1582326573

   I noticed 3 bucketIds being repeated and they all have instants between 
**20230511183601566** and **20230510170043301**.
   
   Can you please share your timeline between these 2 instants? Thank you!
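   
   A rough way to list the repeated buckets from a file listing is sketched below. It assumes that the leading segment of the file ID (the text before the first `-`) identifies the bucket, and it groups by partition plus that segment; the class and method names are hypothetical, and the inputs are file names quoted elsewhere in this thread.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.Map;
   import java.util.Set;
   import java.util.TreeMap;
   import java.util.TreeSet;
   
   public class DuplicateBucketSketch {
       // Returns "<partition>/<bucket>" keys that map to more than one distinct file ID.
       static List<String> duplicatedBuckets(List<String> paths) {
           Map<String, Set<String>> fileIdsByBucket = new TreeMap<>();
           for (String path : paths) {
               int slash = path.lastIndexOf('/');
               String partition = slash >= 0 ? path.substring(0, slash) : "";
               String fileName = path.substring(slash + 1);
               int underscore = fileName.indexOf('_');
               int dash = fileName.indexOf('-');
               if (underscore < 0 || dash < 0 || dash > underscore) {
                   throw new IllegalArgumentException("Unexpected file name: " + path);
               }
               String fileId = fileName.substring(0, underscore);
               String bucket = fileName.substring(0, dash); // assumed bucket prefix
               fileIdsByBucket
                   .computeIfAbsent(partition + "/" + bucket, k -> new TreeSet<>())
                   .add(fileId);
           }
           List<String> duplicated = new ArrayList<>();
           for (Map.Entry<String, Set<String>> entry : fileIdsByBucket.entrySet()) {
               if (entry.getValue().size() > 1) {
                   duplicated.add(entry.getKey());
               }
           }
           return duplicated;
       }
   
       public static void main(String[] args) {
           List<String> paths = List.of(
               "20220604/0007-3477-401f-982e-e5ae38ca0e23_3-20-6_20230510170043301.parquet",
               "20220604/0007-4bc1-4340-a9d8-330666a58244_5-20-6_20230511183601566.parquet",
               "20220604/0009-5876-4e72-9cda-656772feb7a6_17-20-6_20230511183601566.parquet",
               "20220604/0009-c3bc-4ae4-a1e0-917970420ac7_1-20-6_20230510170043301.parquet");
           System.out.println(duplicatedBuckets(paths)); // [20220604/0007, 20220604/0009]
       }
   }
   ```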





[GitHub] [hudi] voonhous commented on issue #8892: [SUPPORT] [BUG] Duplicate fileID ??? from bucket ?? of partition found during the BucketStreamWriteFunction index bootstrap.

2023-06-06 Thread via GitHub


voonhous commented on issue #8892:
URL: https://github.com/apache/hudi/issues/8892#issuecomment-1578128987

   @hbgstc123 For visibility.





[GitHub] [hudi] voonhous commented on issue #8892: [SUPPORT] [BUG] Duplicate fileID ??? from bucket ?? of partition found during the BucketStreamWriteFunction index bootstrap.

2023-06-06 Thread via GitHub


voonhous commented on issue #8892:
URL: https://github.com/apache/hudi/issues/8892#issuecomment-1578118618

   Can you please share the file sizes of these two files:
   
   
20220604/0009-5876-4e72-9cda-656772feb7a6_17-20-6_20230511183601566.parquet
   
20220604/0009-c3bc-4ae4-a1e0-917970420ac7_1-20-6_20230510170043301.parquet
   

