[GitHub] [hudi] dongkelun commented on pull request #5406: [HUDI-3954] Don't keep the last commit before the earliest commit to retain
dongkelun commented on PR #5406: URL: https://github.com/apache/hudi/pull/5406#issuecomment-1246103721 > I do not want to touch something that existed for 3+ years w/o understanding the repurcussions. will probably wait to hear from vinoth or prasanna/balaji. we can leave the patch open until then. OK -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] dongkelun commented on pull request #5406: [HUDI-3954] Don't keep the last commit before the earliest commit to retain
dongkelun commented on PR #5406: URL: https://github.com/apache/hudi/pull/5406#issuecomment-1240374420 > hey @dongkelun : may be there is some rational behind the original intent. Its just deducting 1 commit from what user wants right. as of now, I don't feel this is giving us much or fixing any regression. can we drop the patch. However, I think it doesn't make much sense, and the logic becomes complicated and makes users feel confused, and removing this logic seems to have no bad effect. So I personally don't think it's necessary to keep it -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] dongkelun commented on pull request #5406: [HUDI-3954] Don't keep the last commit before the earliest commit to retain
dongkelun commented on PR #5406: URL: https://github.com/apache/hudi/pull/5406#issuecomment-1120376579 > Okay, you have a valid point here. I agree this policy has been a bit confusing from the very beginning because of this extra logic of retaining one more commit. My only concern was this might break things for existing users who have been using this cleaning policy and assume one extra commit will be retained. I will go over the changes by tomorrow. > > I think it will be good to get opinions from @nsivabalan @bvaradar as well on this change. If they support, we are good to go. Okay -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] dongkelun commented on pull request #5406: [HUDI-3954] Don't keep the last commit before the earliest commit to retain
dongkelun commented on PR #5406: URL: https://github.com/apache/hudi/pull/5406#issuecomment-1120365156 @pratyakshsharma I think it's enough to reserve 300 minutes for query. There's no need to reserve one more 30 minutes. If we want to reserve more, we can increase the configuration parameters. In addition, other cleaning policies do not retain one more version. I don't understand why they are not consistent.And I think it's a little redundant to retain one more version, because we have reserved enough time through configuration parameters. And it is not consistent with the configuration parameters, which is easy to confuse users. So I think it's reasonable not to keep one more version, and the code is more concise. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] dongkelun commented on pull request #5406: [HUDI-3954] Don't keep the last commit before the earliest commit to retain
dongkelun commented on PR #5406: URL: https://github.com/apache/hudi/pull/5406#issuecomment-1120339108 > @dongkelun I guess there is some confusion with respect to the documentation for the said config. I have added a review comment [#4927 (comment)](https://github.com/apache/hudi/pull/4927#discussion_r867342264) to correct this. Also regarding the existing functionality, the code is fine in its correct form and no changes are required. To quote the documentation from java docs - `We assume that the max(query execution time) == commit_batch_time * config.getCleanerCommitsRetained(). > > * This is 5 hours by default (assuming ingestion is running every 30 minutes). This is essential to leave the file > * used by the query that is running for the max time.` > > Please let me know if you still have any doubts and I can explain it further. | commit | retainTime(min)| |-|-| | 001.commit | 0| | 002.commit | 30| | 003.commit | 60| | 004.commit | 90| | 005.commit | 120| | 006.commit | 150| | 007.commit | 180| | 008.commit | 210| | 009.commit | 240| | 0010.commit | 270| | 0011.commit | 300| As you said, a file is retained for 5 hours by default, that is, 300 minutes. So when `0011`.commit is completed,` 001.commit` has been reserved for 300 minutes, so we should not keep `001.commit` when we clean. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] dongkelun commented on pull request #5406: [HUDI-3954] Don't keep the last commit before the earliest commit to retain
dongkelun commented on PR #5406: URL: https://github.com/apache/hudi/pull/5406#issuecomment-1113896848 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] dongkelun commented on pull request #5406: [HUDI-3954] Don't keep the last commit before the earliest commit to retain
dongkelun commented on PR #5406: URL: https://github.com/apache/hudi/pull/5406#issuecomment-1112799961 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] dongkelun commented on pull request #5406: [HUDI-3954] Don't keep the last commit before the earliest commit to retain
dongkelun commented on PR #5406: URL: https://github.com/apache/hudi/pull/5406#issuecomment-1112373019 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org