[ https://issues.apache.org/jira/browse/CASSANDRA-11000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Stupp resolved CASSANDRA-11000.
--------------------------------------
    Resolution: Won't Fix
 Reproduced In: 3.0.1, 2.2.4  (was: 2.2.4, 3.0.1)

I completely agree with [~spo...@gmail.com]. Writing data with a timestamp in the future is a use case that requires knowledge of how C* is designed and works (it may also occur when the system wall clocks are not in sync - that's why we recommend keeping the system wall clocks in sync).

And what should C* do if it detects such a timestamp in the future? Shall it reject the operation? But what if the system wall clock was out of sync and has since been adjusted? Is the operation still valid or not? I assume there is no single "golden" way that is viable for everybody.

I think adding such a check to LWT operations makes things even worse. Mixing LWT and non-LWT statements is also a valid use case - but when mixed on the same columns it can cause trouble. In a perfect world it might be easily solvable, but the world's not perfect: non-LWT updates can be delayed by LAN or WAN failures, node outages (hardware failures, regular maintenance operations, etc.). There is just too much that would need to be considered.

TL;DR that's why I resolved this as Won't Fix.

> Mixing LWT and non-LWT operations can result in an LWT operation being
> acknowledged but not applied
> ---------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11000
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11000
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Coordination
>         Environment: Cassandra 2.1, 2.2, and 3.0 on Linux and OS X.
>            Reporter: Sebastian Marsching
>
> When mixing light-weight transaction (LWT, a.k.a. compare-and-set,
> conditional update) operations with regular operations, it can happen that an
> LWT operation is acknowledged (applied = True) even though the update has
> not been applied and a SELECT operation still returns the old data.
> For example, consider the following table:
> {code}
> CREATE TABLE test (
>   pk text,
>   ck text,
>   v text,
>   PRIMARY KEY (pk, ck)
> );
> {code}
> We start with an empty table and insert data using a regular (non-LWT)
> operation:
> {code}
> INSERT INTO test (pk, ck, v) VALUES ('foo', 'bar', '123');
> {code}
> A following SELECT statement returns the data as expected. Now we do a
> conditional update (LWT):
> {code}
> UPDATE test SET v = '456' WHERE pk = 'foo' AND ck = 'bar' IF v = '123';
> {code}
> As expected, the update is applied and a following SELECT statement shows the
> updated value.
> Now we do the same but use a timestamp that is slightly in the future (e.g.
> a few seconds) for the INSERT statement (obviously $time$ needs to be
> replaced by a timestamp that is slightly ahead of the system clock):
> {code}
> INSERT INTO test (pk, ck, v) VALUES ('foo', 'bar', '123') USING TIMESTAMP $time$;
> {code}
> Now, running the same UPDATE statement still reports success (applied = True).
> However, a subsequent SELECT yields the old value ('123') instead of the
> updated value ('456'). Inspecting the timestamp of the value shows that
> it has not been replaced (the value from the original INSERT is still in
> place).
> This behavior is exhibited in a single-node cluster running Cassandra
> 2.1.11, 2.2.4, and 3.0.1.
> Testing this in a multi-node cluster is a bit more tricky, so I only tested
> it with Cassandra 2.2.4. Here, I made one of the nodes lag behind in time
> by a few seconds (using libfaketime). I used a replication factor of three
> for the test keyspace. In this case, the behavior can be demonstrated even
> without an explicitly specified timestamp.
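The single-node behaviour described above can be illustrated with a deliberately simplified model of Cassandra's last-write-wins cell reconciliation. This is an editorial sketch, not Cassandra's actual code: `reconcile`, the cell tuples, and the timestamp values are all invented for illustration, and real tie-breaking also considers tombstones.

```python
# Simplified model: a cell is (write_timestamp_micros, value); on conflict
# the higher timestamp wins, and on a timestamp tie the greater value wins.

def reconcile(existing, incoming):
    """Return the cell that survives read/compaction reconciliation."""
    if incoming[0] != existing[0]:
        return incoming if incoming[0] > existing[0] else existing
    return incoming if incoming[1] > existing[1] else existing

NOW = 1_000_000  # pretend "current" wall-clock timestamp in microseconds

# INSERT ... USING TIMESTAMP $time$, with $time$ a few seconds ahead:
cell = (NOW + 5_000_000, '123')

# The LWT condition "IF v = '123'" is checked against the value read back,
# and it passes, so the coordinator reports applied = True ...
assert cell[1] == '123'

# ... but the LWT's own write is stamped with the (smaller) current time,
# so reconciliation keeps the old cell and the update is invisible:
lwt_write = (NOW, '456')
survivor = reconcile(cell, lwt_write)
print(survivor)  # -> (6000000, '123'): the old value still wins
```

Under this model the "applied = True but SELECT shows the old value" outcome falls out directly: condition checking and conflict resolution use different criteria (value equality vs. timestamp ordering).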
> Running
> {code}
> INSERT INTO test (pk, ck, v) VALUES ('foo', 'bar', '123');
> {code}
> on a node with a regular clock, followed by
> {code}
> UPDATE test SET v = '456' WHERE pk = 'foo' AND ck = 'bar' IF v = '123';
> {code}
> on the node lagging behind, results in the UPDATE reporting success while
> the old value is still in place.
> Interestingly, everything works as expected when LWT operations are used
> consistently: when running
> {code}
> UPDATE test SET v = '456' WHERE pk = 'foo' AND ck = 'bar' IF v = '123';
> UPDATE test SET v = '123' WHERE pk = 'foo' AND ck = 'bar' IF v = '456';
> {code}
> in an alternating fashion on two nodes (one with a "normal" clock, one with
> the clock lagging behind), the updates are applied as expected. When checking
> the timestamps ("{{SELECT WRITETIME(v) FROM test;}}"), one can see that the
> timestamp is increased by just a single tick when the statement is executed
> on the node lagging behind.
> I think that this problem is strongly related to (or maybe even the same as)
> the one described in CASSANDRA-7801, even though CASSANDRA-7801 was mainly
> concerned with a single-node cluster. However, the fact that this problem
> still exists in current versions of Cassandra makes me suspect that either it
> is a different problem or the original problem was not fixed completely by
> the patch from CASSANDRA-7801.
> I found CASSANDRA-9655, which suggests removing the changes introduced with
> CASSANDRA-7801 because they can be problematic under certain circumstances,
> but I am not sure whether that is the right place to discuss the issue I am
> experiencing. If you feel so, feel free to close this issue and update the
> description of CASSANDRA-9655.
> In my opinion, the best way to fix this problem would be to ensure that a
> write that is part of an LWT always uses a timestamp that is at least one
> tick greater than the timestamp of the existing data.
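The proposed fix - which appears to be what the all-LWT test above already observes ("increased by just a single tick") - can be sketched in the same simplified model. The function name and values are illustrative; this is the reporter's suggestion, not Cassandra's current behaviour for non-LWT/LWT mixes.

```python
def lwt_write_timestamp(existing_ts, now):
    """Proposed rule: an LWT write is stamped no lower than one tick
    above the cell it read while checking its condition, so the write
    always wins last-write-wins reconciliation against that cell."""
    return max(now, existing_ts + 1)

NOW = 1_000_000  # pretend "current" wall-clock timestamp in microseconds

# The cell that shadowed the LWT write in the failing scenario:
existing = (NOW + 5_000_000, '123')  # written with a future timestamp

ts = lwt_write_timestamp(existing[0], NOW)
print(ts)  # -> 6000001: one tick above the shadowing cell
assert ts > existing[0]  # the LWT write now survives reconciliation
```

When the wall clock is already ahead of the existing cell, `max` simply returns the current time, so the common case is unaffected.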
> As the existing data
> has to be read to check the condition anyway, I do not think that this
> would cause additional overhead. If this is not possible, I suggest
> looking into whether we can somehow detect such a situation and at least
> report failure (applied = False) for the LWT instead of reporting success.
> The latter solution would at least fix those cases where code checks the
> success of an LWT before performing any further actions (e.g. because the
> LWT is used to take some kind of lock). Currently, the code will assume that
> the operation was successful (and thus - staying with the example - that it
> owns the lock), while other processes running in parallel will see a
> different state. It is my understanding that LWTs were designed to avoid
> exactly this situation, but at the moment the assumptions most users will
> make about LWTs do not always hold.
> Until this issue is solved, I suggest at least updating the CQL
> documentation to clearly state that LWTs / conditional updates are not safe
> if data has previously been INSERTed / UPDATEd / DELETEd using non-LWT
> operations and there is clock skew, or timestamps in the future have been
> supplied explicitly. This should at least save some users from making wrong
> assumptions about LWTs and not realizing it until their application fails
> in an unsafe way.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)