[ https://issues.apache.org/jira/browse/CASSANDRA-14092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357234#comment-16357234 ]
Paulo Motta edited comment on CASSANDRA-14092 at 2/8/18 5:15 PM:
-----------------------------------------------------------------

Thanks for the review. This took a bit longer than expected, as I found an edge case that required a change from the previous approach, described below. First, the follow-ups from the previous round of review:

{quote}In the 3.0+ branches, ExpirationDateOverflowHandling::maybeApplyExpirationDateOverflowPolicy can use Cell.NO_TTL rather than 0 in the first check.{quote}
Agreed, fixed [here|https://github.com/apache/cassandra/commit/4bcea858f62b619b83a5db83f0cd93e192fea80c].

{quote}In 3.0+ you renamed the static policy field to have a shorter name, but missed that in the 2.1 & 2.2 branches.{quote}
Good catch, fixed [here|https://github.com/apache/cassandra/commit/df59f2de8042e7ea67e520d509d675da1d53bbce].

{quote}In 2.1 I saw some (very) intermittent test failures in TTLTest. I instrumented checkTTLIsCapped to print out the (min | actual | max) TTLs to sysout and eventually managed to repro it. You can see from the output that in the first line, the min and actual are actually > the max, which caused the test to fail (this happened around 10% of the time).{quote}
Oh, that's funny! Perhaps it's related to the platform, as I cannot reproduce this locally. In any case, I updated the check [here|https://github.com/apache/cassandra/commit/18be7f140c3c2159f69d50b1bfb068927c734a13] to be more deterministic.

I also noticed that we weren't applying the expiration overflow policy in the CQLv2 interface, so I updated it [here|https://github.com/apache/cassandra/commit/4230a5b4126e972827990dc33c1a9140af07afe1]. I find it unlikely that anyone still uses it, but I wouldn't be totally surprised.

{quote}I think we should have the scrub fix in 2.1 & 2.2 as some users have not yet moved to 3.x and they should probably get the chance to repair (maybe) their data if they want/are able to.
{quote}
Agreed. As noted in NEWS.txt, 2.1/2.2 are only affected if users have assertions disabled, but given that we ship this [comment|https://github.com/apache/cassandra/blob/cassandra-2.1/conf/cassandra-env.sh#L173] on 2.1, I wouldn't be surprised if some users with assertions disabled hit this.

While writing the 2.1 scrubber (which is slightly different from the 3.0 one due to CASSANDRA-8099), I found a nasty edge case in the recovery process. The problem is that if a cell with a negative/overflowed {{localExpirationTime}} [is converted to a tombstone during compaction|https://github.com/apache/cassandra/blob/30ed83d9266a03debad98ffac5610dcb3ae30934/src/java/org/apache/cassandra/db/rows/AbstractCell.java#L95], we cannot convert the tombstone back into an expired cell: the TTL value is lost, and we cannot distinguish it from an ordinary tombstone. Furthermore, even if the original SSTable is scrubbed and the negative {{localExpirationTime}} is converted to {{MAX_DELETION_TIME}}, an already-generated tombstone will still shadow/delete the fixed expired cell, because [deleted cells have precedence over live cells when their timestamps are equal|https://github.com/apache/cassandra/blob/8b3a60b9a7dbefeecc06bace617279612ec7092d/src/java/org/apache/cassandra/db/Conflicts.java#L43]. I updated {{recover_negative_expiration_date_sstables_with_scrub}} to [add an SSTable with the expired cell converted to a tombstone|https://github.com/apache/cassandra-dtest/commit/63b0e7d51fbacf4fabbb860d81873605c3b66c92], and verified that an existing tombstone shadows the recovered data in the current version.

The idea for fixing this is that, during scrub, we increment the timestamp of cells with a negative {{localExpirationTime}} by one, in addition to setting their {{localExpirationTime}} to {{MAX_DELETION_TIME}}, so the recovered cell shadows/supersedes any tombstone that was already written.
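To make the tie-break and the proposed fix concrete, here is a minimal, self-contained sketch. All names and types here are hypothetical illustrations of my own; the real tie-break lives in {{org.apache.cassandra.db.Conflicts}} and the rewrite in the Scrubber:

```java
// Sketch of the recovery mechanics: at equal timestamps a tombstone wins over
// a live cell, so a recovered cell must be reinserted with timestamp + 1 to
// supersede any tombstone already produced from the broken cell.
final class TtlRecoverySketch {
    static final int MAX_DELETION_TIME = Integer.MAX_VALUE - 1;

    record Cell(long timestamp, int localExpirationTime, boolean isTombstone) {}

    /** Reconciliation rule: higher timestamp wins; on a tie, the deletion wins. */
    static Cell reconcile(Cell a, Cell b) {
        if (a.timestamp() != b.timestamp())
            return a.timestamp() > b.timestamp() ? a : b;
        return a.isTombstone() ? a : b; // deleted cells shadow live cells on ties
    }

    /** Scrub rewrite: cap the overflowed expiration AND bump the timestamp by one. */
    static Cell reinsertOverflowed(Cell broken) {
        return new Cell(broken.timestamp() + 1, MAX_DELETION_TIME, false);
    }
}
```

For example, a cell written at timestamp 100 whose expiration overflowed may already have been compacted into a tombstone at timestamp 100; merely capping the expiration still loses to that tombstone in {{reconcile}}, whereas the reinserted cell at timestamp 101 wins.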
Since this does not simply recreate the row but technically reinserts it with a higher timestamp, we cannot do it automatically on scrub: it could overwrite an existing row inserted just one millisecond after the original (very unlikely in practice, but it may depend on application logic). For this reason, the user must specify the {{--reinsert-overflowed-ttl}} option during scrub to perform the recovery. This approach is implemented [here|https://github.com/apache/cassandra/commit/5f9f704449ed1ef579c61953b02a9ef32f13082b] on 3.0 and ported to all other branches in a separate commit.

{quote}The last thing is about providing a route to fix up overflowed dates via scrub, I think we should definitely leave the remedial Scrubber code in trunk until we have a proper fix committed.
{quote}
Since 4.0 has not been released yet and will ship with this fix, no one can hit this on 4.0, but I don't see any problem in keeping the scrub option for the lazy ones upgrading from 3.x without fixing their SSTables beforehand. :) For this reason I kept the 3.11 SSTables in the 4.X (d)tests. To prevent scrub and the tests from throwing an {{AssertionError}} on 2.1, I removed [this assertion|https://github.com/apache/cassandra/commit/4501eee5c962547c059a4f624155c5e18d5369d3], since it is no longer necessary now that we apply the overflow policy.

Finally, I [refactored|https://github.com/apache/cassandra/commit/c3e1fcc25b3db9db212dfc805a47430ff537b101] the NEWS.txt section a bit to take [~KurtG]'s comments into account, to give more emphasis to the expiration overflow policies (since that is what most users need to care about), and to add a recovery subsection with detailed recovery instructions. I now wonder whether this notice has grown too big and is cluttering NEWS.txt - which may confuse users looking for upgrade instructions - and whether we should instead create a new WARNING/IMPORTANT file and point to it from NEWS.txt.

To facilitate review, I squashed the previous commits and created a new branch whose commits represent only the changes since the last review (except for the 2.2 patch, which is basically the same as 2.1 plus an additional commit simplifying TTLTest). The previous CI looked good; I will submit a new CI run and post the results here later.

||2.1||2.2||3.0||3.11||trunk||dtest||
|[branch|https://github.com/apache/cassandra/compare/cassandra-2.1...pauloricardomg:2.1-14092-v6]|[branch|https://github.com/apache/cassandra/compare/cassandra-2.2...pauloricardomg:2.2-14092-v6]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-14092-v6]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.11...pauloricardomg:3.11-14092-v6]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-14092-v6]|[branch|https://github.com/apache/cassandra-dtest/compare/master...pauloricardomg:14092]|

> Max ttl of 20 years will overflow localDeletionTime
> ---------------------------------------------------
>
>                 Key: CASSANDRA-14092
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14092
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Paulo Motta
>            Assignee: Paulo Motta
>            Priority: Blocker
>             Fix For: 2.1.20, 2.2.12, 3.0.16, 3.11.2
>
> CASSANDRA-4771 added a max value of 20 years for ttl to protect against the [year 2038 overflow bug|https://en.wikipedia.org/wiki/Year_2038_problem] for {{localDeletionTime}}.
> It turns out that next year the {{localDeletionTime}} will start overflowing with the maximum ttl of 20 years ({{System.currentTimeMillis() + ttl(20 years) > Integer.MAX_VALUE}}), so we should remove this limitation.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
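As a footnote, the arithmetic behind the overflow in the quoted issue description can be sketched as follows (the epoch value for "now" is my own example, chosen near the comment date; the 20-year cap of 20 * 365 days matches the limit introduced by CASSANDRA-4771):

```java
// Demonstration of the year-2038-style overflow described in the issue:
// "now" (in epoch seconds) plus the 20-year maximum TTL exceeds
// Integer.MAX_VALUE, so narrowing the sum to the int-typed
// localDeletionTime wraps it to a negative value.
public class Year2038Demo {
    public static void main(String[] args) {
        int maxTtlSeconds = 20 * 365 * 24 * 60 * 60; // 630_720_000, the 20-year cap
        int nowInSeconds = 1_518_000_000;            // ~2018-02-07 UTC, around the comment date
        long expiration = (long) nowInSeconds + maxTtlSeconds;
        System.out.println(expiration > Integer.MAX_VALUE); // the sum no longer fits in an int
        System.out.println((int) expiration);               // wraps negative when narrowed
    }
}
```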