[ 
https://issues.apache.org/jira/browse/CASSANDRA-14812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sivukhin Nikita updated CASSANDRA-14812:
----------------------------------------
    Summary: Multiget Thrift query skips records in case of DigestMismatch  
(was: Multiget Thrift query skip records in case of DigestMismatch)

> Multiget Thrift query skips records in case of DigestMismatch
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-14812
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14812
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sivukhin Nikita
>            Priority: Major
>              Labels: bug
>         Attachments: repro_script.py, requirements.txt, small_repro_script.py
>
>
> Since Cassandra 3.0.0 there has been a subtle bug in the {{multiget}} 
> Thrift query. It appears when you read many partitions and the read causes 
> a {{DigestMismatch}} for some partition. When this happens, Cassandra cuts 
> the response stream off right at the point where the first 
> {{DigestMismatch}} error occurred.
> The bug reproduces in all versions of Cassandra since 3.0.0. The pre-release 
> version 3.0.0-rc2 works fine (and there is no such problem in Cassandra 2.x 
> versions). It looks like the big refactoring of the partition iterator 
> architecture done for CASSANDRA-9975 
> ([link to 
> commit|https://github.com/apache/cassandra/commit/609497471441273367013c09a1e0e1c990726ec7])
>  causes the wrong behavior.
> When the concatenated iterator is returned from 
> [StorageProxy.fetchRows(...)|https://github.com/apache/cassandra/blob/a05785d82c621c9cd04d8a064c38fd2012ef981c/src/java/org/apache/cassandra/service/StorageProxy.java#L1770],
>  Cassandra starts to consume this combined iterator. Because of the 
> {{DigestMismatch}}, some elements of this combined iterator carry an 
> additional {{ThriftCounter}} that was added during 
> [DataResolver.resolve(...)|https://github.com/apache/cassandra/blob/ee9e06b5a75c0be954694b191ea4170456015b98/src/java/org/apache/cassandra/service/reads/DataResolver.java#L120]
>  execution. While consuming the iterator over many partitions, Cassandra 
> calls the 
> [BaseIterator.tryGetMoreContents(...)|https://github.com/apache/cassandra/blob/a05785d82c621c9cd04d8a064c38fd2012ef981c/src/java/org/apache/cassandra/db/transform/BaseIterator.java#L115]
>  method, which must switch from one partition iterator to the next once the 
> former is exhausted. At that point, all transformations of the next 
> iterator are applied to the whole BaseIterator, which enumerates the 
> sequence of many partitions. As a result, the iterator stops enumerating 
> after it fully consumes the partition with the {{DigestMismatch}} error, 
> because that partition's additional {{ThriftCounter}} data limit was 
> applied to the whole composite iterator.
> The attached Python 2 script [^small_repro_script.py] reproduces this bug 
> on a 3-node cluster controlled by ccmlib. There is also an extended version 
> of this script, [^repro_script.py], which logs more information and can 
> test the behavior against many Cassandra versions (to run all test cases 
> from repro_script.py, call 
> {{python -m unittest2 -v repro_script.ThriftMultigetTestCase}}). All the 
> necessary dependencies are listed in [^requirements.txt].
>  
> This bug is critical in our production environment because we cannot 
> tolerate any skipped data.
> Any ideas about a patch for this issue?
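
The iterator behavior described in the report above can be illustrated with a small, simplified model. This is only a sketch and not Cassandra's code: the names `concat_correct` and `concat_buggy`, the `(rows, row_limit)` pairs, and the use of a plain row counter in place of {{ThriftCounter}} are all assumptions made for illustration. It shows how a limit that belongs to one partition ends the whole stream once it is hoisted onto the combined iterator.

```python
from itertools import islice


def concat_correct(sources):
    """Apply each partition's limit only to that partition's own rows."""
    for rows, row_limit in sources:
        if row_limit is None:
            yield from rows
        else:
            yield from islice(rows, row_limit)


def concat_buggy(sources):
    """Hypothetical model of the reported behavior: on switching to the next
    partition, its limit is attached to the whole combined stream."""
    remaining = None  # counter hoisted onto the combined iterator
    for rows, row_limit in sources:
        if row_limit is not None:
            remaining = row_limit
        for row in rows:
            if remaining is not None:
                if remaining == 0:
                    return  # the hoisted limit stops the entire stream
                remaining -= 1
            yield row


# Partition "b" stands in for the one that hit a digest mismatch and
# therefore carries an extra limit (an assumption of this model).
sources = [(["a1", "a2"], None), (["b1", "b2"], 2), (["c1", "c2"], None)]

correct = list(concat_correct(sources))
buggy = list(concat_buggy(sources))
```

In this model `correct` contains the rows of all three partitions, while `buggy` ends after partition "b", so every partition that follows the limited one is silently dropped.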



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)