[ https://issues.apache.org/jira/browse/CASSANDRA-14092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357234#comment-16357234 ]
Paulo Motta edited comment on CASSANDRA-14092 at 2/8/18 5:15 PM:
-----------------------------------------------------------------

Thanks for the review. This took a bit longer than expected, as I found an edge case that required a change from the previous approach, described below. First, the follow-ups from the previous round of review:

{quote}In the 3.0+ branches, ExpirationDateOverflowHandling::maybeApplyExpirationDateOverflowPolicy can use Cell.NO_TTL rather than 0 in the first check.{quote}
Agreed, fixed [here|https://github.com/apache/cassandra/commit/4bcea858f62b619b83a5db83f0cd93e192fea80c].

{quote}In 3.0+ you renamed the static policy field to have a shorter name, but missed that in the 2.1 & 2.2 branches.{quote}
Good catch, fixed [here|https://github.com/apache/cassandra/commit/df59f2de8042e7ea67e520d509d675da1d53bbce].

{quote}In 2.1 I saw some (very) intermittent test failures in TTLTest. I instrumented checkTTLIsCapped to print out the (min | actual | max) TTLs to sysout and eventually managed to repro it. You can see from the output that in the first line, the min and actual are actually > the max, which caused the test to fail (this happened around 10% of the time).{quote}
Oh, that's funny! Perhaps it's related to the platform, as I cannot reproduce this locally. In any case, I updated the check [here|https://github.com/apache/cassandra/commit/18be7f140c3c2159f69d50b1bfb068927c734a13] to be more deterministic.

I also noticed that we weren't applying the expiration overflow policy in the CQLv2 interface, so I updated it [here|https://github.com/apache/cassandra/commit/4230a5b4126e972827990dc33c1a9140af07afe1]. I find it unlikely that anyone still uses it, but I wouldn't be totally surprised.

{quote}I think we should have the scrub fix in 2.1 & 2.2 as some users have not yet moved to 3.x and they should probably get the chance to repair (maybe) their data if they want/are able to.
{quote}
Agreed. As noted in NEWS.txt, 2.1/2.2 are only affected if users have assertions disabled, but given that we ship this [comment|https://github.com/apache/cassandra/blob/cassandra-2.1/conf/cassandra-env.sh#L173] on 2.1, I wouldn't be surprised if some users with assertions disabled hit this.

While writing the 2.1 scrubber (which is slightly different from the 3.0 one due to CASSANDRA-8099), I found a nasty edge case in the recovery process. The problem is that if a cell with a negative/overflowed {{localExpirationTime}} [is converted to a tombstone during compaction|https://github.com/apache/cassandra/blob/30ed83d9266a03debad98ffac5610dcb3ae30934/src/java/org/apache/cassandra/db/rows/AbstractCell.java#L95], we cannot convert the tombstone back into an expired cell: the TTL value is lost, and we cannot distinguish it from an ordinary tombstone. Furthermore, even if the original SSTable is scrubbed and the negative {{localExpirationTime}} is converted to {{MAX_DELETION_TIME}}, an already-generated tombstone will still shadow/delete the fixed expired cell, because [deleted cells have precedence over live cells when their timestamps are equal|https://github.com/apache/cassandra/blob/8b3a60b9a7dbefeecc06bace617279612ec7092d/src/java/org/apache/cassandra/db/Conflicts.java#L43]. I updated {{recover_negative_expiration_date_sstables_with_scrub}} to [add an SSTable with the expired cell converted to a tombstone|https://github.com/apache/cassandra-dtest/commit/63b0e7d51fbacf4fabbb860d81873605c3b66c92], and verified that an existing tombstone shadows the recovered data in the current version.

The idea for fixing this is that, during scrub, we increment the timestamp of cells with a negative {{localExpirationTime}} by one, in addition to setting their {{localExpirationTime}} to {{MAX_DELETION_TIME}}, so the recovered cell shadows/supersedes any tombstone that was already written.
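To make the tie-break and the proposed fix concrete, here is a minimal, self-contained sketch. All names and types here are hypothetical illustrations of my own; the real tie-break lives in {{org.apache.cassandra.db.Conflicts}} and the rewrite in the Scrubber:

```java
// Sketch of the recovery mechanics: at equal timestamps a tombstone wins over
// a live cell, so a recovered cell must be reinserted with timestamp + 1 to
// supersede any tombstone already produced from the broken cell.
final class TtlRecoverySketch {
    static final int MAX_DELETION_TIME = Integer.MAX_VALUE - 1;

    record Cell(long timestamp, int localExpirationTime, boolean isTombstone) {}

    /** Reconciliation rule: higher timestamp wins; on a tie, the deletion wins. */
    static Cell reconcile(Cell a, Cell b) {
        if (a.timestamp() != b.timestamp())
            return a.timestamp() > b.timestamp() ? a : b;
        return a.isTombstone() ? a : b; // deleted cells shadow live cells on ties
    }

    /** Scrub rewrite: cap the overflowed expiration AND bump the timestamp by one. */
    static Cell reinsertOverflowed(Cell broken) {
        return new Cell(broken.timestamp() + 1, MAX_DELETION_TIME, false);
    }
}
```

For example, a cell written at timestamp 100 whose expiration overflowed may already have been compacted into a tombstone at timestamp 100; merely capping the expiration still loses to that tombstone in {{reconcile}}, whereas the reinserted cell at timestamp 101 wins.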
Since this does not simply recreate the row but technically reinserts it with a higher timestamp, we cannot do it automatically on scrub: it could overwrite an existing row inserted just one millisecond after the original (very unlikely in practice, but it may depend on application logic). For this reason, the user must specify the {{--reinsert-overflowed-ttl}} option during scrub to perform the recovery. This approach is implemented [here|https://github.com/apache/cassandra/commit/5f9f704449ed1ef579c61953b02a9ef32f13082b] on 3.0 and ported to all other branches in a separate commit.

{quote}The last thing is about providing a route to fix up overflowed dates via scrub, I think we should definitely leave the remedial Scrubber code in trunk until we have a proper fix committed.
{quote}
Since 4.0 has not been released yet and will ship with this fix, no one can hit this on 4.0, but I don't see any problem in keeping the scrub option for the lazy ones upgrading from 3.x without fixing their SSTables beforehand. :) For this reason I kept the 3.11 SSTables in the 4.X (d)tests. To prevent scrub and the tests from throwing an {{AssertionError}} on 2.1, I removed [this assertion|https://github.com/apache/cassandra/commit/4501eee5c962547c059a4f624155c5e18d5369d3], since it is no longer necessary now that we apply the overflow policy.

Finally, I [refactored|https://github.com/apache/cassandra/commit/c3e1fcc25b3db9db212dfc805a47430ff537b101] the NEWS.txt section a bit to take [~KurtG]'s comments into account, to give more emphasis to the expiration overflow policies (since that is what most users need to care about), and to add a recovery subsection with detailed recovery instructions. I now wonder whether this notice has grown too big and is cluttering NEWS.txt - which may confuse users looking for upgrade instructions - and whether we should instead create a new WARNING/IMPORTANT file and point to it from NEWS.txt.

To facilitate review, I squashed the previous commits and created a new branch whose commits represent only the changes since the last review (except for the 2.2 patch, which is basically the same as 2.1 plus an additional commit simplifying TTLTest). The previous CI looked good; I will submit a new CI run and post the results here later.

||2.1||2.2||3.0||3.11||trunk||dtest||
|[branch|https://github.com/apache/cassandra/compare/cassandra-2.1...pauloricardomg:2.1-14092-v6]|[branch|https://github.com/apache/cassandra/compare/cassandra-2.2...pauloricardomg:2.2-14092-v6]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-14092-v6]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.11...pauloricardomg:3.11-14092-v6]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-14092-v6]|[branch|https://github.com/apache/cassandra-dtest/compare/master...pauloricardomg:14092]|

> Max ttl of 20 years will overflow localDeletionTime
> ---------------------------------------------------
>
>                 Key: CASSANDRA-14092
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14092
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Paulo Motta
>            Assignee: Paulo Motta
>            Priority: Blocker
>             Fix For: 2.1.20, 2.2.12, 3.0.16, 3.11.2
>
> CASSANDRA-4771 added a max value of 20 years for ttl to protect against the [year 2038 overflow bug|https://en.wikipedia.org/wiki/Year_2038_problem] for {{localDeletionTime}}.
> It turns out that next year the {{localDeletionTime}} will start overflowing with the maximum ttl of 20 years ({{System.currentTimeMillis() + ttl(20 years) > Integer.MAX_VALUE}}), so we should remove this limitation.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
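As a footnote, the arithmetic behind the overflow in the quoted issue description can be sketched as follows (the epoch value for "now" is my own example, chosen near the comment date; the 20-year cap of 20 * 365 days matches the limit introduced by CASSANDRA-4771):

```java
// Demonstration of the year-2038-style overflow described in the issue:
// "now" (in epoch seconds) plus the 20-year maximum TTL exceeds
// Integer.MAX_VALUE, so narrowing the sum to the int-typed
// localDeletionTime wraps it to a negative value.
public class Year2038Demo {
    public static void main(String[] args) {
        int maxTtlSeconds = 20 * 365 * 24 * 60 * 60; // 630_720_000, the 20-year cap
        int nowInSeconds = 1_518_000_000;            // ~2018-02-07 UTC, around the comment date
        long expiration = (long) nowInSeconds + maxTtlSeconds;
        System.out.println(expiration > Integer.MAX_VALUE); // the sum no longer fits in an int
        System.out.println((int) expiration);               // wraps negative when narrowed
    }
}
```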