[GitHub] [hudi] dongkelun commented on pull request #5406: [HUDI-3954] Don't keep the last commit before the earliest commit to retain

2022-09-13 Thread GitBox


dongkelun commented on PR #5406:
URL: https://github.com/apache/hudi/pull/5406#issuecomment-1246103721

   > I do not want to touch something that existed for 3+ years w/o 
understanding the repurcussions. will probably wait to hear from vinoth or 
prasanna/balaji. we can leave the patch open until then.
   OK
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] dongkelun commented on pull request #5406: [HUDI-3954] Don't keep the last commit before the earliest commit to retain

2022-09-08 Thread GitBox


dongkelun commented on PR #5406:
URL: https://github.com/apache/hudi/pull/5406#issuecomment-1240374420

   > hey @dongkelun : may be there is some rational behind the original intent. 
Its just deducting 1 commit from what user wants right. as of now, I don't feel 
this is giving us much or fixing any regression. can we drop the patch.
   
   However, I think it doesn't make much sense, and the logic becomes 
complicated and makes users feel confused, and removing this logic seems to 
have no bad effect. So I personally don't think it's necessary to keep it


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] dongkelun commented on pull request #5406: [HUDI-3954] Don't keep the last commit before the earliest commit to retain

2022-05-08 Thread GitBox


dongkelun commented on PR #5406:
URL: https://github.com/apache/hudi/pull/5406#issuecomment-1120376579

   > Okay, you have a valid point here. I agree this policy has been a bit 
confusing from the very beginning because of this extra logic of retaining one 
more commit. My only concern was this might break things for existing users who 
have been using this cleaning policy and assume one extra commit will be 
retained. I will go over the changes by tomorrow.
   > 
   > I think it will be good to get opinions from @nsivabalan @bvaradar as well 
on this change. If they support, we are good to go.
   
   Okay
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] dongkelun commented on pull request #5406: [HUDI-3954] Don't keep the last commit before the earliest commit to retain

2022-05-08 Thread GitBox


dongkelun commented on PR #5406:
URL: https://github.com/apache/hudi/pull/5406#issuecomment-1120365156

   @pratyakshsharma  I think it's enough to reserve 300 minutes for query. 
There's no need to reserve one more 30 minutes. If we want to reserve more, we 
can increase the configuration parameters. In addition, other cleaning  
policies do not retain one more version. I don't understand why they are not 
consistent.And I think it's a little redundant to retain one more version, 
because we have reserved enough time through configuration parameters. And it 
is not consistent with the configuration parameters, which is easy to confuse 
users.
   So I think it's reasonable not to keep one more version, and the code is 
more concise.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] dongkelun commented on pull request #5406: [HUDI-3954] Don't keep the last commit before the earliest commit to retain

2022-05-07 Thread GitBox


dongkelun commented on PR #5406:
URL: https://github.com/apache/hudi/pull/5406#issuecomment-1120339108

   > @dongkelun I guess there is some confusion with respect to the 
documentation for the said config. I have added a review comment [#4927 
(comment)](https://github.com/apache/hudi/pull/4927#discussion_r867342264) to 
correct this. Also regarding the existing functionality, the code is fine in 
its correct form and no changes are required. To quote the documentation from 
java docs - `We assume that the max(query execution time) == commit_batch_time 
* config.getCleanerCommitsRetained().
   > 
   > * This is 5 hours by default (assuming ingestion is running every 30 
minutes). This is essential to leave the file
   > * used by the query that is running for the max time.`
   > 
   > Please let me know if you still have any doubts and I can explain it 
further.
   
   | commit | retainTime(min)|
   |-|-|
   | 001.commit | 0|
   | 002.commit | 30|
   | 003.commit | 60|
   | 004.commit | 90|
   | 005.commit | 120|
   | 006.commit | 150|
   | 007.commit | 180|
   | 008.commit | 210|
   | 009.commit | 240|
   | 0010.commit | 270|
   | 0011.commit | 300|
   
   As you said, a file is retained for 5 hours by default, that is, 300 
minutes. So when `0011`.commit is completed,` 001.commit` has been reserved for 
300 minutes, so we should not keep `001.commit` when we clean.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] dongkelun commented on pull request #5406: [HUDI-3954] Don't keep the last commit before the earliest commit to retain

2022-04-29 Thread GitBox


dongkelun commented on PR #5406:
URL: https://github.com/apache/hudi/pull/5406#issuecomment-1113896848

   @hudi-bot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] dongkelun commented on pull request #5406: [HUDI-3954] Don't keep the last commit before the earliest commit to retain

2022-04-28 Thread GitBox


dongkelun commented on PR #5406:
URL: https://github.com/apache/hudi/pull/5406#issuecomment-1112799961

   @hudi-bot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] dongkelun commented on pull request #5406: [HUDI-3954] Don't keep the last commit before the earliest commit to retain

2022-04-28 Thread GitBox


dongkelun commented on PR #5406:
URL: https://github.com/apache/hudi/pull/5406#issuecomment-1112373019

   @hudi-bot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org