Nadav Har'El created CASSANDRA-14262:
----------------------------------------

             Summary: View update sent multiple times during range movement
                 Key: CASSANDRA-14262
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14262
             Project: Cassandra
          Issue Type: Improvement
          Components: Materialized Views
            Reporter: Nadav Har'El


This issue is about updating a base table with materialized views while 
token ranges are being moved, i.e., while a node is being added to or removed 
from the cluster (this is a long process because the data needs to be streamed 
to its new owning node).

During this process, each view mutation we want to write to a view table may 
have an additional "pending node" (or several of them): another node (or 
nodes) which will hold this view mutation, and we need to send the view 
mutations to these new nodes too. This code existed until CASSANDRA-13069, when 
it was accidentally removed, and it was restored in CASSANDRA-14251.

However, the current code, in mutateMV(), has each of the RF (e.g., 3) base 
replicas send the view mutation to the same pending node. This is of course 
redundant, and reduces write throughput while the streaming is performed.
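
As a rough illustration of the redundancy, here is a toy Python model (not 
Cassandra code). The node names, the paired_view mapping and the 
current_view_sends() helper are all hypothetical; the model only assumes that 
each base replica writes to its own paired view replica and, in addition, to 
every pending node.

```python
def current_view_sends(base_replicas, paired_view, pending_view):
    """Return (sender, receiver) pairs under the behaviour described above."""
    sends = []
    for base in base_replicas:
        # Normal write to the view replica paired with this base replica.
        sends.append((base, paired_view[base]))
        # Every base replica also targets every pending node.
        for pending in pending_view:
            sends.append((base, pending))
    return sends


base_replicas = ["b1", "b2", "b3"]                    # RF = 3
paired_view = {"b1": "v1", "b2": "v2", "b3": "v3"}    # hypothetical pairing
pending_view = ["v4"]                                 # node still receiving streamed data

sends = current_view_sends(base_replicas, paired_view, pending_view)
to_pending = [s for s in sends if s[1] == "v4"]
print(len(sends), len(to_pending))  # 6 3
```

In this model the pending node receives the same view mutation three times, 
once from each base replica.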

I suggested (based on an idea by [~shlomi_livne]) that it may be enough for 
only the single node which will be paired (when the range movement completes) 
with the pending node to send it the update. [~pauloricardomg] replied (see 
[https://lists.apache.org/thread.html/12c78582a3f709ca33a45e5fa6121148b1b1ad9c9b290d1a21e4409b@%3Cdev.cassandra.apache.org%3E]
 ) that such an optimization appears to work in the common case of single 
movements but not in rarer, more complex cases (I did not fully understand the 
details; see the above link).
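
A minimal sketch of the suggested optimization for the simple single-movement 
case, again as a toy Python model. It assumes that pairing is positional (the 
base replica at index i is paired with the view replica at index i) and that 
the future view replica list is known; future_paired_sender() and the node 
names are hypothetical. It does not attempt to handle the more complex cases 
mentioned in the reply.

```python
def future_paired_sender(base_replicas, future_view_replicas, pending):
    """Pick the one base replica that will be paired with the pending node.

    Assumes positional pairing between the two replica lists, which is an
    assumption of this sketch rather than something stated in the issue.
    """
    idx = future_view_replicas.index(pending)
    return base_replicas[idx]


base_replicas = ["b1", "b2", "b3"]
future_view_replicas = ["v1", "v4", "v3"]   # v4 is taking over v2's range

sender = future_paired_sender(base_replicas, future_view_replicas, "v4")
print(sender)  # b2
```

Under these assumptions only b2 would send the view mutation to v4, instead of 
all three base replicas.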

I believe there's another problem with the current code, one of correctness: 
If any view replica ends up with two different view rows for the same 
partition key, such a mistake cannot currently be fixed (see 
CASSANDRA-10346). But if we have different base replicas with two different 
values (an inconsistency an ordinary base repair could fix, if we ran it) and 
both of them send their update to the same pending view replica, this view 
replica will now have two rows, one of them wrong (and it cannot currently be 
repaired).
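
The scenario can be sketched with a toy Python model. It assumes a view whose 
key is built from the base column on which the two base replicas disagree; the 
view_row() helper and the data are hypothetical and only illustrate how the 
pending view replica can end up with two rows.

```python
def view_row(base_key, value):
    # Toy view keyed by the base column value, then by the base key.
    return (value, base_key)


base_a = {"k": 1}   # base replica A's value for base row "k"
base_b = {"k": 2}   # base replica B disagrees (not yet repaired)

# Both base replicas send their view update to the same pending view replica.
pending_view = set()
for base in (base_a, base_b):
    for key, value in base.items():
        pending_view.add(view_row(key, value))

rows_for_k = sorted(r for r in pending_view if r[1] == "k")
print(rows_for_k)  # [(1, 'k'), (2, 'k')]
```

A view replica paired with only one base replica would have received just one 
of these rows; the pending replica holds both, and at most one can be correct.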


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
