Re: [I] [SUPPORT] Rollback failed clustering 0.12.2 [hudi]

2024-05-08 Thread via GitHub


ad1happy2go commented on issue #10964:
URL: https://github.com/apache/hudi/issues/10964#issuecomment-2100219371

   Thanks @VitoMakarevich. We were also able to resolve the same error using 
only the two configs you suggested. 
   
   A long-term fix is being discussed as part of this JIRA: 
https://issues.apache.org/jira/browse/HUDI-1045
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Rollback failed clustering 0.12.2 [hudi]

2024-04-12 Thread via GitHub


VitoMakarevich commented on issue #10964:
URL: https://github.com/apache/hudi/issues/10964#issuecomment-2051734341

   I managed to do it with these two settings:
   
[hoodie.clustering.updates.strategy](https://hudi.apache.org/docs/configurations/#hoodieclusteringupdatesstrategy)
 -> `org.apache.hudi.client.clustering.update.strategy.SparkAllowUpdateStrategy` 
(non-default)
   
[hoodie.clustering.rollback.pending.replacecommit.on.conflict](https://hudi.apache.org/docs/configurations/#hoodieclusteringrollbackpendingreplacecommitonconflict)
 -> `true` (non-default)
   
   The precondition is that the write must affect the partitions being 
clustered; otherwise nothing happens.
   
   Unfortunately, I don't see any other way to do it (short of copy-pasting some 
Hudi internals, which looks risky for many users).
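
As a rough illustration of the two settings above, here is a minimal Python sketch that merges them into an options dictionary for a Spark datasource write. The helper name `clustering_recovery_options`, the table name, and the base options are hypothetical; only the two config keys and the strategy class name come from this thread.

```python
from typing import Dict

# Strategy class named in this thread (non-default).
ALLOW_UPDATE_STRATEGY = (
    "org.apache.hudi.client.clustering.update.strategy.SparkAllowUpdateStrategy"
)


def clustering_recovery_options(base: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of `base` with the two settings from this thread added.

    Hypothetical helper for illustration; it is not part of Hudi.
    """
    opts = dict(base)  # copy so the caller's dict is left untouched
    opts["hoodie.clustering.updates.strategy"] = ALLOW_UPDATE_STRATEGY
    opts["hoodie.clustering.rollback.pending.replacecommit.on.conflict"] = "true"
    return opts


# Hypothetical table settings; replace with your own.
base_options = {
    "hoodie.table.name": "my_table",
    "hoodie.datasource.write.operation": "upsert",
}
hudi_options = clustering_recovery_options(base_options)

# With a live SparkSession, a DataFrame `df`, and a `table_path` (none are
# created here), the options would be passed to the writer roughly like this:
# df.write.format("hudi").options(**hudi_options).mode("append").save(table_path)
```

Keeping the settings in one helper makes it easy to apply them only to the writer that is expected to recover from a failed clustering run.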





Re: [I] [SUPPORT] Rollback failed clustering 0.12.2 [hudi]

2024-04-08 Thread via GitHub


nsivabalan commented on issue #10964:
URL: https://github.com/apache/hudi/issues/10964#issuecomment-2043986341

   Hey @suryaprasanna: can you take this up and offer some suggestions?





Re: [I] [SUPPORT] Rollback failed clustering 0.12.2 [hudi]

2024-04-05 Thread via GitHub


VitoMakarevich commented on issue #10964:
URL: https://github.com/apache/hudi/issues/10964#issuecomment-2040415437

   Update: I dug into the code (`clusteringHandleUpdate`) and see the following:
   - Updates rejected: the write fails.
   - Updates accepted and 
`hoodie.clustering.rollback.pending.replacecommit.on.conflict` is `true`: the 
pending clustering instants that conflict with the updated records are rolled 
back.
   - Updates accepted and 
`hoodie.clustering.rollback.pending.replacecommit.on.conflict` is `false`: the 
pending clustering instants are left on the timeline, and the updates are 
written to the previous files.
   
   So it looks like setting these two:
   
[hoodie.clustering.updates.strategy](https://hudi.apache.org/docs/configurations/#hoodieclusteringupdatesstrategy)
 -> 
`org.apache.hudi.client.clustering.update.strategy.SparkAllowUpdateStrategy` 
(non-default)
   
[hoodie.clustering.rollback.pending.replacecommit.on.conflict](https://hudi.apache.org/docs/configurations/#hoodieclusteringrollbackpendingreplacecommitonconflict)
 -> `true` (non-default)
   
   is generally safe when all table services run inline with a single writer.
   E.g. if the job fails in the middle of clustering, the subsequent commit 
will synchronously roll back the clustering instants and write the 
updates into the old files.
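
The three cases described above can be sketched as a small decision function. This is only a model of the behavior as described in this comment, not Hudi's actual `clusteringHandleUpdate` code; the names `Outcome` and `handle_update` are made up for illustration, and it assumes the write touches file groups that are part of a pending clustering plan.

```python
from enum import Enum


class Outcome(Enum):
    WRITE_FAILS = "write fails"
    ROLLBACK_PENDING_CLUSTERING = "conflicting pending clustering instants rolled back"
    KEEP_PENDING_CLUSTERING = "pending clustering kept; updates go to previous files"


def handle_update(updates_allowed: bool, rollback_on_conflict: bool) -> Outcome:
    """Model the outcome of a write that hits file groups under pending clustering.

    updates_allowed: True if the update strategy accepts the updates
        (e.g. SparkAllowUpdateStrategy), False if it rejects them.
    rollback_on_conflict: value of
        hoodie.clustering.rollback.pending.replacecommit.on.conflict.
    """
    if not updates_allowed:
        # Rejecting strategy: the write itself fails.
        return Outcome.WRITE_FAILS
    if rollback_on_conflict:
        # Accepting strategy + rollback flag: conflicting instants are rolled back.
        return Outcome.ROLLBACK_PENDING_CLUSTERING
    # Accepting strategy without the flag: the pending instants stay on the timeline.
    return Outcome.KEEP_PENDING_CLUSTERING


# The combination proposed in this comment:
proposed = handle_update(updates_allowed=True, rollback_on_conflict=True)
```

Note that the rollback flag has no effect in this model unless the update strategy accepts the updates first, which is why both settings are changed together.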
   
   Can someone confirm? @nsivabalan 





Re: [I] [SUPPORT] Rollback failed clustering 0.12.2 [hudi]

2024-04-05 Thread via GitHub


VitoMakarevich commented on issue #10964:
URL: https://github.com/apache/hudi/issues/10964#issuecomment-2040341590

   I see this property: 
`hoodie.clustering.rollback.pending.replacecommit.on.conflict`. Is it 
generally safe to use if we have a single writer with inline services only?





Re: [I] [SUPPORT] Rollback failed clustering 0.12.2 [hudi]

2024-04-05 Thread via GitHub


VitoMakarevich commented on issue #10964:
URL: https://github.com/apache/hudi/issues/10964#issuecomment-2040330648

   Update: I managed to reproduce it. After stopping the job during clustering, 
the subsequent write fails with the exception
   `Not allowed to update the clustering file group 
HoodieFileGroupId{partitionPath='partition1=1', 
fileId='ff2d1ed7-ff77-4a9f-95c6-1b9deeccf105-0'}. For pending clustering 
operations, we are not going to support update for now.`
   





[I] [SUPPORT] Rollback failed clustering 0.12.2 [hudi]

2024-04-05 Thread via GitHub


VitoMakarevich opened a new issue, #10964:
URL: https://github.com/apache/hudi/issues/10964

   **Describe the problem you faced**
   
   Hello, this is a follow-up to https://github.com/apache/hudi/issues/10878. 
We managed to run clustering, but I'm concerned about having a recovery plan.
   The behavior I know: when `.commit.requested` and `.commit.inflight` have 
been created but `.commit` has not, the subsequent write does a rollback. This 
works for normal commits.
   However, if I start clustering and the job stops before `.inflight` is 
created, the subsequent write fails if it affects a partition present in 
`.replacecommit.requested`; this is controlled by 
[hoodie.clustering.updates.strategy](https://hudi.apache.org/docs/configurations/#hoodieclusteringupdatesstrategy).
 So here I can only either run clustering from the CLI or just delete the 
instant (can you confirm? Per the code, it looks safe if there is no `.inflight`).
   But if it fails after it starts writing files (after `.replacecommit.inflight` 
is created, but before `.replacecommit` is created), which choices do I have? 
As far as I can tell from the code, there is no automatic rollback 
for `replacecommit`, and `hudi-cli` has rollback only for finished instants.
   Given this, can you answer two questions:
   1. If clustering failed after `.replacecommit.requested` but before 
`.replacecommit.inflight`, is it safe to just delete the commit file itself? 
You recently added this PR, and it looks to be doing exactly this: 
https://github.com/apache/hudi/pull/10645/files
   2. If clustering failed after `.replacecommit.inflight` but before 
`.replacecommit`, what are the recovery steps?
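
To make the two failure points in these questions concrete, here is a minimal Python sketch that classifies unfinished `replacecommit` instants from a list of timeline file names. It assumes the `<instant>.replacecommit[.requested|.inflight]` naming described above; the function name and the sample instant times are hypothetical, and it only reads names, it does not delete or roll back anything. Hudi also uses `replacecommit` for other operations (for example `insert_overwrite`), so a pending instant found this way is not necessarily clustering.

```python
from typing import Dict, Iterable, Set


def pending_replacecommit_states(timeline_files: Iterable[str]) -> Dict[str, str]:
    """Map instant time -> 'requested' or 'inflight' for unfinished replacecommits."""
    seen: Dict[str, Set[str]] = {}
    for name in timeline_files:
        parts = name.split(".")
        # Only look at files shaped like <instant>.replacecommit[.<state>].
        if len(parts) < 2 or parts[1] != "replacecommit":
            continue
        state = parts[2] if len(parts) > 2 else "completed"
        seen.setdefault(parts[0], set()).add(state)

    pending: Dict[str, str] = {}
    for instant, states in seen.items():
        if "completed" in states:
            continue  # the final .replacecommit file exists, nothing to recover
        pending[instant] = "inflight" if "inflight" in states else "requested"
    return pending


# Hypothetical listing of a table's .hoodie directory.
files = [
    "20240405100000000.commit.requested",
    "20240405100000000.inflight",
    "20240405100000000.commit",
    "20240405101500000.replacecommit.requested",   # question 1: stopped before .inflight
    "20240405102000000.replacecommit.requested",
    "20240405102000000.replacecommit.inflight",    # question 2: stopped before completion
    "20240405103000000.replacecommit.requested",
    "20240405103000000.replacecommit.inflight",
    "20240405103000000.replacecommit",             # finished
]
states = pending_replacecommit_states(files)
```

An instant reported as `requested` corresponds to the first question, and one reported as `inflight` corresponds to the second.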
   
   **Environment Description**
   
   * Hudi version : 0.12.2
   
   * Spark version : 3.3.0
   

