Re: [PR] Docs: Fix inconsistency in branching and tagging scenario [iceberg]
bitsondatadev commented on code in PR #9968:
URL: https://github.com/apache/iceberg/pull/9968#discussion_r1560201975

## docs/docs/branching.md: ##
@@ -49,20 +49,21 @@ Tags can be used for retaining important historical snapshots for auditing purpo
 The above diagram demonstrates retaining important historical snapshot with the following retention policy, defined via Spark SQL.
-1. Retain 1 snapshot per week for 1 month. This can be achieved by tagging the weekly snapshot and setting the tag retention to be a month.
-snapshots will be kept, and the branch reference itself will be retained for 1 week.
+Assume snapshots are compressed to a single day before this command executes.

Review Comment:
```suggestion
Consider a table that runs a daily scheduled task to compress each snapshot to a single day at the end of each day.
```

## docs/docs/branching.md: ##
@@ -49,20 +49,21 @@ Tags can be used for retaining important historical snapshots for auditing purpo
 The above diagram demonstrates retaining important historical snapshot with the following retention policy, defined via Spark SQL.
-1. Retain 1 snapshot per week for 1 month. This can be achieved by tagging the weekly snapshot and setting the tag retention to be a month.
-snapshots will be kept, and the branch reference itself will be retained for 1 week.
+Assume snapshots are compressed to a single day before this command executes.
+
+1. Create a tag on the snapshot occurring at the end of the first week, that will expire a month after it is created. You do this by setting the tag retention to be 30 days, or an average month. Run this command for each weekend to keep a weekly Snapshot.

Review Comment:
```suggestion
1. Create a tag on the snapshot occurring at the end of the first week, that expires an average month, or 30 days, after the tag generates. This command illustrates tagging the end of the initial week with the tag of 'EOW-1' after creating the seventh daily snapshot.
```

## docs/docs/branching.md: ##
@@ -49,20 +49,21 @@ Tags can be used for retaining important historical snapshots for auditing purpo
 The above diagram demonstrates retaining important historical snapshot with the following retention policy, defined via Spark SQL.
-1. Retain 1 snapshot per week for 1 month. This can be achieved by tagging the weekly snapshot and setting the tag retention to be a month.
-snapshots will be kept, and the branch reference itself will be retained for 1 week.
+Assume snapshots are compressed to a single day before this command executes.
+
+1. Create a tag on the snapshot occurring at the end of the first week, that will expire a month after it is created. You do this by setting the tag retention to be 30 days, or an average month. Run this command for each weekend to keep a weekly Snapshot.
 ```sql
--- Create a tag for the first end of week snapshot. Retain the snapshot for a week
-ALTER TABLE prod.db.table CREATE TAG `EOW-01` AS OF VERSION 7 RETAIN 7 DAYS;
+-- Create a tag for the first end of week snapshot. Retain the snapshot for a month
+ALTER TABLE prod.db.table CREATE TAG `EOW-01` AS OF VERSION 7 RETAIN 30 DAYS;
 ```
-2. Retain 1 snapshot per month for 6 months. This can be achieved by tagging the monthly snapshot and setting the tag retention to be 6 months.
+2. Create a tag on the snapshot occurring at the end of the first month, that will expire three months after it is created. You do this by setting the tag retention to be 180 days, or an average 3 months. Run this command for each month to keep a monthly Snapshot.

Review Comment:
```suggestion
2. Create a tag on the snapshot occurring at the end of the first month, that expires three months after it is created. You do this by setting the tag retention to be 90 days, or an average 3 months. Run this command for each month to keep a monthly Snapshot.
```

-- This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
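As a rough illustration of the statements this review converges on, the sketch below renders one `CREATE TAG` statement per week end. The `TagSpec` class, the `create_tag_sql` helper, and the snapshot IDs 7, 14, 21, 28 are assumptions made for the example (they follow the thread's simplification of one snapshot per day, which real snapshot IDs need not match); only the statement form comes from the diff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagSpec:
    """One historical tag to create; all values used below are illustrative."""
    name: str
    snapshot_id: int
    retain_days: int


def create_tag_sql(table: str, spec: TagSpec) -> str:
    """Render the Spark SQL statement form quoted in the diff above."""
    return (
        f"ALTER TABLE {table} CREATE TAG `{spec.name}` "
        f"AS OF VERSION {spec.snapshot_id} RETAIN {spec.retain_days} DAYS;"
    )


# Hypothetical schedule: one snapshot per day, so snapshot 7 closes week 1,
# snapshot 14 closes week 2, and so on. Each tag is retained for 30 days.
weekly_tags = [TagSpec(f"EOW-{week:02d}", week * 7, 30) for week in range(1, 5)]
statements = [create_tag_sql("prod.db.table", spec) for spec in weekly_tags]

for statement in statements:
    print(statement)
```

Each statement still has to be run once per week end, as the suggested wording says; the loop only shows what four weeks of such statements would look like.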
Re: [PR] Docs: Fix inconsistency in branching and tagging scenario [iceberg]
lawofcycles commented on PR #9968:
URL: https://github.com/apache/iceberg/pull/9968#issuecomment-2041291407

@bitsondatadev Following your comments, I pushed a new version that improves the entire explanation for Historical Tags. I kept the following points in mind:
- The assumption that snapshots are compressed to one per day.
- The user must ensure that the target snapshot covers the intended period by choosing when to create the tag.

I hope this suggestion helps.
Re: [PR] Docs: Fix inconsistency in branching and tagging scenario [iceberg]
bitsondatadev commented on PR #9968:
URL: https://github.com/apache/iceberg/pull/9968#issuecomment-2018058241

@lawofcycles Thanks for your patience and willingness to help here! I'd like to consider an alternative explanation. I think conflating version with days is a big part of the confusion.
Re: [PR] Docs: Fix inconsistency in branching and tagging scenario [iceberg]
lawofcycles commented on PR #9968:
URL: https://github.com/apache/iceberg/pull/9968#issuecomment-2017455653

@bitsondatadev, thank you for your valuable feedback. I agree that the explanations on this page could be improved, especially around the creation and retention of weekly snapshots. While also seeking input from @Fokko, I'd be happy to work on revising the entire page based on your suggestions to make the explanations clearer. I appreciate your continued guidance. I look forward to collaborating with you to enhance this page.
Re: [PR] Docs: Fix inconsistency in branching and tagging scenario [iceberg]
bitsondatadev commented on code in PR #9968:
URL: https://github.com/apache/iceberg/pull/9968#discussion_r1537002725

## docs/docs/branching.md: ##
@@ -50,10 +50,10 @@ The above diagram demonstrates retaining important historical snapshot with the
 via Spark SQL.
 1. Retain 1 snapshot per week for 1 month. This can be achieved by tagging the weekly snapshot and setting the tag retention to be a month.
-snapshots will be kept, and the branch reference itself will be retained for 1 week.
+4 weekly snapshots will be kept, and the branch reference itself will be retained for 1 week.

Review Comment:
Delete the first bullet point and append this after the last sentence before the list:
```suggestion
Assume snapshots are compressed to a single day before this command executes.

1. Create a tag on the snapshot occurring at the end of the first week, that will expire a month after it is created. You do this by setting the tag retention to be 30 days, or an average month. Run this command for each of the following weekend tag.
```
Re: [PR] Docs: Fix inconsistency in branching and tagging scenario [iceberg]
bitsondatadev commented on PR #9968:
URL: https://github.com/apache/iceberg/pull/9968#issuecomment-2017104828

@Fokko, I believe the wording on this page is not only inconsistent, but it is rather confusing as it seems to indicate that a catalog is actually handling the creation of weekly tags.

> The above diagram demonstrates retaining important historical snapshot with the following retention policy, defined via Spark SQL.
>
> Retain 1 snapshot per week for 1 month. This can be achieved by tagging the weekly snapshot and setting the tag retention to be a month. snapshots will be kept, and the branch reference itself will be retained for 1 week.

Also the "snapshots will be kept" needs cleaning up.

@lawofcycles, great catch on the inconsistency and actually the wording here needs a bit of rework if you're willing to help us fix it. I'll add a review comment where this needs to be updated.

I do want to get Fokko's input though if I somehow don't understand how branching and tagging works, but this page almost indicates something that isn't there and I want to make sure I'm correct in my assumption that this wouldn't somehow magically start creating snapshots every 7 versions (days in this example).
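If the assumption in this comment holds (a single `CREATE TAG` does not recur on its own), some external job has to decide which snapshot closes each week. A minimal sketch of that selection follows, assuming the snapshot history is available as `(snapshot_id, committed_at)` pairs; the helper name, the IDs, and the timestamps are all hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple

# Hypothetical (snapshot_id, committed_at) pairs; IDs are deliberately not day numbers.
Snapshot = Tuple[int, datetime]


def last_snapshot_per_week(snapshots: List[Snapshot]) -> Dict[Tuple[int, int], int]:
    """Map each ISO (year, week) to the id of the latest snapshot committed in it."""
    latest: Dict[Tuple[int, int], Snapshot] = {}
    for snapshot_id, committed_at in snapshots:
        iso = committed_at.isocalendar()
        key = (iso[0], iso[1])  # (ISO year, ISO week number)
        if key not in latest or committed_at > latest[key][1]:
            latest[key] = (snapshot_id, committed_at)
    return {key: snap[0] for key, snap in latest.items()}


history = [
    (8401, datetime(2024, 4, 5, 23, 0, tzinfo=timezone.utc)),  # Friday, ISO week 14
    (9112, datetime(2024, 4, 7, 23, 0, tzinfo=timezone.utc)),  # Sunday, ISO week 14
    (9530, datetime(2024, 4, 8, 23, 0, tzinfo=timezone.utc)),  # Monday, ISO week 15
]
end_of_week = last_snapshot_per_week(history)
print(end_of_week)
```

Selecting by commit time rather than by version number keeps "version" and "day" separate. A real job would presumably tag only weeks that have already ended, which this sketch does not check.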
Re: [PR] Docs: Fix inconsistency in branching and tagging scenario [iceberg]
bitsondatadev commented on PR #9968:
URL: https://github.com/apache/iceberg/pull/9968#issuecomment-2015939651

I'm currently AFK, I'll test this tonight.
Re: [PR] Docs: Fix inconsistency in branching and tagging scenario [iceberg]
lawofcycles commented on PR #9968:
URL: https://github.com/apache/iceberg/pull/9968#issuecomment-2015937292

@Fokko @bitsondatadev I would greatly appreciate it if you could kindly review this pull request.
[PR] Docs: Fix inconsistency in branching and tagging scenario [iceberg]
lawofcycles opened a new pull request, #9968:
URL: https://github.com/apache/iceberg/pull/9968

There is an inconsistency between the explanation of the snapshot retention strategy and the SQL code in the EOW-1 scenario. The description mentions retaining 1 snapshot per week for 1 month, but the SQL code sets the retention period to only 1 week.

> Retain 1 snapshot per week for 1 month. This can be achieved by tagging the weekly snapshot and setting the tag retention to be a month. snapshots will be kept, and the branch reference itself will be retained for 1 week.

> -- Create a tag for the first end of week snapshot. Retain the snapshot for a week
> ALTER TABLE prod.db.table CREATE TAG `EOW-01` AS OF VERSION 7 RETAIN 7 DAYS;

Modify the description to match the SQL code
- Update the description to state that only the latest weekly snapshot is retained for 1 week.
- Clarify that the branch reference itself is also retained for 1 week.

Please review the proposed changes.
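The mismatch described here can be checked with simple arithmetic. The sketch below assumes one tag is created every 7 days and that a tag stops counting once its age reaches the retention period; it ignores when cleanup actually runs, and the function name is made up for the example.

```python
from typing import List


def live_weekly_tags(day: int, retain_days: int, interval_days: int = 7) -> List[int]:
    """Return the creation days of weekly tags still within retention on `day`.

    Assumes a tag is created every `interval_days`, starting on day
    `interval_days`, and is treated as expired once its age reaches `retain_days`.
    """
    created = range(interval_days, day + 1, interval_days)
    return [c for c in created if day - c < retain_days]


# Day 56 is the end of week 8, right after that week's tag is created.
seven_day = live_weekly_tags(day=56, retain_days=7)
thirty_day = live_weekly_tags(day=56, retain_days=30)
print(len(seven_day), len(thirty_day))
```

Under these assumptions a 7-day retention leaves only the newest weekly tag, while a 30-day retention leaves four or five depending on the day, which is much closer to the "1 snapshot per week for 1 month" wording.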