Re: [PR] Docs: Fix inconsistency in branching and tagging scenario [iceberg]

2024-04-10 Thread via GitHub


bitsondatadev commented on code in PR #9968:
URL: https://github.com/apache/iceberg/pull/9968#discussion_r1560201975


##
docs/docs/branching.md:
##
@@ -49,20 +49,21 @@ Tags can be used for retaining important historical 
snapshots for auditing purpo
 The above diagram demonstrates retaining important historical snapshot with 
the following retention policy, defined 
 via Spark SQL.
 
-1. Retain 1 snapshot per week for 1 month. This can be achieved by tagging the 
weekly snapshot and setting the tag retention to be a month.
-snapshots will be kept, and the branch reference itself will be retained for 1 
week. 
+Assume snapshots are compressed to a single day before this command executes.

Review Comment:
   ```suggestion
   Consider a table that runs a daily scheduled task to compress each snapshot 
to a single day at the end of each day.
   ```
   



##
docs/docs/branching.md:
##
@@ -49,20 +49,21 @@ Tags can be used for retaining important historical 
snapshots for auditing purpo
 The above diagram demonstrates retaining important historical snapshot with 
the following retention policy, defined 
 via Spark SQL.
 
-1. Retain 1 snapshot per week for 1 month. This can be achieved by tagging the 
weekly snapshot and setting the tag retention to be a month.
-snapshots will be kept, and the branch reference itself will be retained for 1 
week. 
+Assume snapshots are compressed to a single day before this command executes.
+
+1. Create a tag on the snapshot occurring at the end of the first week, that 
will expire a month after it is created. You do this by setting the tag 
retention to be 30 days, or an average month. Run this command for each weekend 
to keep a weekly Snapshot.

Review Comment:
   ```suggestion
   1. Create a tag on the snapshot occurring at the end of the first week, that 
expires an average month, or 30 days, after the tag generates. This command 
illustrates tagging the end of the initial week with the tag of 'EOW-1' after 
creating the seventh daily snapshot.
   ```
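The weekly tagging step in the suggestion above can be sketched as a small Python helper that renders the Spark SQL statement. This is only an illustration: `create_tag_sql` is a hypothetical helper, not part of Iceberg, and the snapshot ID `7` follows the docs example, which assumes one snapshot per day.

```python
def create_tag_sql(table: str, tag: str, snapshot_id: int, retain_days: int) -> str:
    """Render an Iceberg Spark SQL CREATE TAG statement (hypothetical helper)."""
    if retain_days <= 0:
        raise ValueError("retain_days must be positive")
    return (
        f"ALTER TABLE {table} CREATE TAG `{tag}` "
        f"AS OF VERSION {snapshot_id} RETAIN {retain_days} DAYS;"
    )


# Tag the seventh daily snapshot as the first end-of-week tag, kept for 30 days.
eow_01 = create_tag_sql("prod.db.table", "EOW-01", 7, 30)
print(eow_01)
```

The rendered string matches the statement in the diff, so the prose (30 days) and the SQL stay in agreement.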
   



##
docs/docs/branching.md:
##
@@ -49,20 +49,21 @@ Tags can be used for retaining important historical 
snapshots for auditing purpo
 The above diagram demonstrates retaining important historical snapshot with 
the following retention policy, defined 
 via Spark SQL.
 
-1. Retain 1 snapshot per week for 1 month. This can be achieved by tagging the 
weekly snapshot and setting the tag retention to be a month.
-snapshots will be kept, and the branch reference itself will be retained for 1 
week. 
+Assume snapshots are compressed to a single day before this command executes.
+
+1. Create a tag on the snapshot occurring at the end of the first week, that 
will expire a month after it is created. You do this by setting the tag 
retention to be 30 days, or an average month. Run this command for each weekend 
to keep a weekly Snapshot.
 ```sql
--- Create a tag for the first end of week snapshot. Retain the snapshot for a 
week
-ALTER TABLE prod.db.table CREATE TAG `EOW-01` AS OF VERSION 7 RETAIN 7 DAYS;
+-- Create a tag for the first end of week snapshot. Retain the snapshot for a 
month
+ALTER TABLE prod.db.table CREATE TAG `EOW-01` AS OF VERSION 7 RETAIN 30 DAYS;
 ```
 
-2. Retain 1 snapshot per month for 6 months. This can be achieved by tagging 
the monthly snapshot and setting the tag retention to be 6 months.
+2. Create a tag on the snapshot occurring at the end of the first month, that 
will expire three months after it is created. You do this by setting the tag 
retention to be 180 days, or an average 3 months. Run this command for each 
month to keep a monthly Snapshot.

Review Comment:
   ```suggestion
   2. Create a tag on the snapshot occurring at the end of the first month, 
that expires three months after it is created. You do this by setting the tag 
retention to be 90 days, or an average 3 months. Run this command for each 
month to keep a monthly Snapshot.
   ```
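The day counts in this thread follow from treating an average month as 30 days. A quick check of that arithmetic is sketched below. The names `AVERAGE_MONTH_DAYS`, `retention_days`, and `window_end` are introduced here purely for illustration, and the start date is arbitrary. Which timestamp Iceberg actually counts the retention from is not covered by this sketch.

```python
from datetime import date, timedelta

AVERAGE_MONTH_DAYS = 30  # assumption used in the thread: one "average month"


def retention_days(months: int) -> int:
    """Convert a retention period in average months to days."""
    return months * AVERAGE_MONTH_DAYS


def window_end(start: date, months: int) -> date:
    """End of a retention window counted from an arbitrary start date."""
    return start + timedelta(days=retention_days(months))


print(retention_days(1))  # weekly tags kept for one month
print(retention_days(3))  # three months
print(retention_days(6))  # six months
print(window_end(date(2024, 1, 31), 3))
```

Under this convention 180 days corresponds to six months and 90 days to three, which is the mismatch the suggestion corrects.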
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Docs: Fix inconsistency in branching and tagging scenario [iceberg]

2024-04-06 Thread via GitHub


lawofcycles commented on PR #9968:
URL: https://github.com/apache/iceberg/pull/9968#issuecomment-2041291407

   @bitsondatadev 
   Following your comments, I pushed a new version that improves the entire 
explanation of Historical Tags. 
   I kept the following points in mind:
   
   - The assumption that snapshots are compressed to one per day.
   - The user must ensure that the target snapshot covers the intended period 
by choosing when to create the tag.
   
   I hope this suggestion helps.





Re: [PR] Docs: Fix inconsistency in branching and tagging scenario [iceberg]

2024-03-25 Thread via GitHub


bitsondatadev commented on PR #9968:
URL: https://github.com/apache/iceberg/pull/9968#issuecomment-2018058241

   @lawofcycles Thanks for your patience and willingness to help here! I'd like 
to consider an alternative explanation. I think conflating versions with days is 
a big part of the confusion.





Re: [PR] Docs: Fix inconsistency in branching and tagging scenario [iceberg]

2024-03-25 Thread via GitHub


lawofcycles commented on PR #9968:
URL: https://github.com/apache/iceberg/pull/9968#issuecomment-2017455653

   @bitsondatadev, thank you for your valuable feedback. I agree that the 
explanations on this page could be improved, especially around the creation and 
retention of weekly snapshots.
   
   While also seeking input from @Fokko, I'd be happy to work on revising the 
entire page based on your suggestions to make the explanations clearer.
   I appreciate your continued guidance. I look forward to collaborating with 
you to enhance this page. 





Re: [PR] Docs: Fix inconsistency in branching and tagging scenario [iceberg]

2024-03-24 Thread via GitHub


bitsondatadev commented on code in PR #9968:
URL: https://github.com/apache/iceberg/pull/9968#discussion_r1537002725


##
docs/docs/branching.md:
##
@@ -50,10 +50,10 @@ The above diagram demonstrates retaining important 
historical snapshot with the
 via Spark SQL.
 
 1. Retain 1 snapshot per week for 1 month. This can be achieved by tagging the 
weekly snapshot and setting the tag retention to be a month.
-snapshots will be kept, and the branch reference itself will be retained for 1 
week. 
+4 weekly snapshots will be kept, and the branch reference itself will be 
retained for 1 week. 

Review Comment:
   Delete the first bullet point and append this after the last sentence before 
the list:
   ```suggestion
   Assume snapshots are compressed to a single day before this command executes.
   
   1. Create a tag on the snapshot occurring at the end of the first week, that 
will expire a month after it is created. You do this by setting the tag 
retention to be 30 days, or an average month. Run this command for each of the 
following weekend tags. 
   ```






Re: [PR] Docs: Fix inconsistency in branching and tagging scenario [iceberg]

2024-03-24 Thread via GitHub


bitsondatadev commented on PR #9968:
URL: https://github.com/apache/iceberg/pull/9968#issuecomment-2017104828

   @Fokko, I believe the wording on this page is not only inconsistent, but it 
is rather confusing as it seems to indicate that a catalog is actually handling 
the creation of weekly tags.
   
   > The above diagram demonstrates retaining important historical snapshot 
with the following retention policy, defined via Spark SQL.
   >
   > Retain 1 snapshot per week for 1 month. This can be achieved by tagging 
the weekly snapshot and setting the tag retention to be a month. snapshots will 
be kept, and the branch reference itself will be retained for 1 week.
   
   Also the "snapshots will be kept" needs cleaning up.
   
   @lawofcycles, great catch on the inconsistency. The wording here actually 
needs a bit of rework, if you're willing to help us fix it. I'll add a review 
comment where this needs to be updated. I do want to get Fokko's input in case 
I misunderstand how branching and tagging work, but this page almost implies 
behavior that isn't there. I want to make sure I'm correct in my assumption 
that this wouldn't somehow magically start creating snapshots every 7 versions 
(days in this example).
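
To illustrate the point above that nothing creates weekly tags automatically, here is a minimal sketch of the statements a user or scheduler would have to issue, one per week. It assumes exactly one snapshot per day and uses sequential IDs 1 to 28 as placeholders (real Iceberg snapshot IDs are not day counters). `weekly_tag_statements` is a hypothetical helper, not an Iceberg API.

```python
from typing import List


def weekly_tag_statements(
    table: str, daily_snapshot_ids: List[int], retain_days: int = 30
) -> List[str]:
    """Build one CREATE TAG statement for every seventh daily snapshot."""
    statements = []
    # Every seventh snapshot (index 6, 13, 20, ...) closes a week.
    for week, snapshot_id in enumerate(daily_snapshot_ids[6::7], start=1):
        statements.append(
            f"ALTER TABLE {table} CREATE TAG `EOW-{week:02d}` "
            f"AS OF VERSION {snapshot_id} RETAIN {retain_days} DAYS;"
        )
    return statements


# Placeholder IDs: one snapshot per day for four weeks.
statements = weekly_tag_statements("prod.db.table", list(range(1, 29)))
for statement in statements:
    print(statement)
```

Each statement is a separate, explicit command. A single `CREATE TAG` only pins the one snapshot it names.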
   
   





Re: [PR] Docs: Fix inconsistency in branching and tagging scenario [iceberg]

2024-03-22 Thread via GitHub


bitsondatadev commented on PR #9968:
URL: https://github.com/apache/iceberg/pull/9968#issuecomment-2015939651

   I'm currently AFK, I'll test this tonight.





Re: [PR] Docs: Fix inconsistency in branching and tagging scenario [iceberg]

2024-03-22 Thread via GitHub


lawofcycles commented on PR #9968:
URL: https://github.com/apache/iceberg/pull/9968#issuecomment-2015937292

   @Fokko @bitsondatadev 
   I would greatly appreciate it if you could kindly review this pull request.





[PR] Docs: Fix inconsistency in branching and tagging scenario [iceberg]

2024-03-16 Thread via GitHub


lawofcycles opened a new pull request, #9968:
URL: https://github.com/apache/iceberg/pull/9968

   There is an inconsistency between the explanation of the snapshot retention 
strategy and the SQL code in the EOW-1 scenario. 
   The description mentions retaining 1 snapshot per week for 1 month, but the 
SQL code sets the retention period to only 1 week.
   
   > Retain 1 snapshot per week for 1 month. This can be achieved by tagging 
the weekly snapshot and setting the tag retention to be a month. snapshots will 
be kept, and the branch reference itself will be retained for 1 week.
   > -- Create a tag for the first end of week snapshot. Retain the snapshot 
for a week
   > ALTER TABLE prod.db.table CREATE TAG `EOW-01` AS OF VERSION 7 RETAIN 7 DAYS;
   
   Modify the description to match the SQL code:
   
   - Update the description to state that only the latest weekly snapshot is 
retained for 1 week.
   - Clarify that the branch reference itself is also retained for 1 week.
   
   Please review the proposed changes.

