[jira] [Commented] (CASSANDRA-14715) Read repairs can result in bogus timeout errors to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-14715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694981#comment-17694981 ]

Stefan Miklosovic commented on CASSANDRA-14715:
---

branch: [https://github.com/apache/cassandra/pull/1683]
ci: [https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/2304/]

I do not think that this is happening in 4.0+.

> Read repairs can result in bogus timeout errors to the client
> -
>
>                 Key: CASSANDRA-14715
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14715
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Local Write-Read Paths
>            Reporter: Cameron Zemek
>            Assignee: Stefan Miklosovic
>            Priority: Low
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> In RepairMergeListener:close() it does the following:
>
> {code:java}
> try
> {
>     FBUtilities.waitOnFutures(repairResults, DatabaseDescriptor.getWriteRpcTimeout());
> }
> catch (TimeoutException ex)
> {
>     // We got all responses, but timed out while repairing
>     int blockFor = consistency.blockFor(keyspace);
>     if (Tracing.isTracing())
>         Tracing.trace("Timed out while read-repairing after receiving all {} data and digest responses", blockFor);
>     else
>         logger.debug("Timeout while read-repairing after receiving all {} data and digest responses", blockFor);
>     throw new ReadTimeoutException(consistency, blockFor-1, blockFor, true);
> }
> {code}
> This propagates up and gets sent to the client, and we have customers who get confused because they see timeouts for CL ALL requiring ALL replicas even though they have read_repair_chance = 0 and are using a LOCAL_* CL.
> At minimum, I suggest that instead of using the consistency level of DataResolver (which is always ALL with read repairs) for the timeout, it use repairResults.size(); that is, blockFor = repairResults.size(). But saying it received _blockFor - 1_ is still bogus. Fixing that would require more changes. I was thinking maybe like so:
>
> {code:java}
> // Counts each repair response that arrives in time; on TimeoutException,
> // counter holds how many responses were received before the first timeout.
> public static void waitOnFutures(List<AsyncOneResponse> results, long ms, MutableInt counter) throws TimeoutException
> {
>     for (AsyncOneResponse result : results)
>     {
>         result.get(ms, TimeUnit.MILLISECONDS);
>         counter.increment();
>     }
> }
> {code}
>
> Likewise, in SinglePartitionReadLifecycle:maybeAwaitFullDataRead() it says _blockFor - 1_ for how many were received, which is also bogus.
>
> Steps used to reproduce: modify RepairMergeListener:close() to always throw a timeout exception, then use this schema:
> {noformat}
> CREATE KEYSPACE weather WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '3', 'dc2': '3'} AND durable_writes = true;
> CREATE TABLE weather.city (
>     cityid int PRIMARY KEY,
>     name text
> ) WITH bloom_filter_fp_chance = 0.01
>     AND dclocal_read_repair_chance = 0.0
>     AND read_repair_chance = 0.0
>     AND speculative_retry = 'NONE';
> {noformat}
> Then follow these steps:
> # ccm node1 cqlsh
> # INSERT INTO weather.city(cityid, name) VALUES (1, 'Canberra');
> # exit;
> # ccm node1 flush
> # ccm node1 stop
> # rm -rf ~/.ccm/test_repair/node1/data0/weather/city-ff2fade0b18d11e8b1cd097acbab1e3d/mc-1-big-*  # remove the sstable with the insert
> # ccm node1 start
> # ccm node1 cqlsh
> # CONSISTENCY LOCAL_QUORUM;
> # select * from weather.city where cityid = 1;
> You get this result:
> {noformat}
> ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 5 responses." info={'received_responses': 5, 'required_responses': 6, 'consistency': 'ALL'}{noformat}
> But I was expecting:
> {noformat}
> ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 1 responses." info={'received_responses': 1, 'required_responses': 2, 'consistency': 'LOCAL_QUORUM'}{noformat}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
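The numbers in the reproduction above follow from how blockFor is derived for each consistency level. The sketch below is a hypothetical stand-alone model, not Cassandra's ConsistencyLevel API. It assumes ALL waits for every replica across both datacenters and LOCAL_QUORUM waits for a majority (RF / 2 + 1) of the coordinator's own datacenter, using the weather keyspace's replication of 3 per DC. Because close() always reports blockFor - 1 as received, it reproduces both the bogus 5-of-6 values and the expected 1-of-2 values.

```java
import java.util.Map;

// Hypothetical model of the blockFor arithmetic; this is NOT Cassandra's ConsistencyLevel API.
class BlockForExample {
    enum Level { ALL, LOCAL_QUORUM }

    static int blockFor(Level level, Map<String, Integer> rfByDc, String localDc) {
        switch (level) {
            case ALL:
                // ALL waits for every replica in every datacenter.
                return rfByDc.values().stream().mapToInt(Integer::intValue).sum();
            case LOCAL_QUORUM:
                // A majority of the replicas in the coordinator's own datacenter.
                return rfByDc.get(localDc) / 2 + 1;
            default:
                throw new IllegalArgumentException("unsupported level: " + level);
        }
    }

    // Mirrors the values RepairMergeListener.close() passes to ReadTimeoutException:
    // it always reports blockFor - 1 as the number of received responses.
    static String timeoutMessage(Level level, int blockFor) {
        return "received_responses=" + (blockFor - 1)
                + ", required_responses=" + blockFor
                + ", consistency=" + level;
    }

    public static void main(String[] args) {
        // Replication from the reproduction schema: {'dc1': '3', 'dc2': '3'}
        Map<String, Integer> rf = Map.of("dc1", 3, "dc2", 3);
        for (Level level : Level.values()) {
            System.out.println(timeoutMessage(level, blockFor(level, rf, "dc1")));
        }
        // prints:
        // received_responses=5, required_responses=6, consistency=ALL
        // received_responses=1, required_responses=2, consistency=LOCAL_QUORUM
    }
}
```

Swapping the DataResolver's ALL for the query's LOCAL_QUORUM changes required_responses from 6 to 2, but received_responses is still derived rather than counted, which is why the description calls blockFor - 1 bogus either way.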
[jira] [Commented] (CASSANDRA-14715) Read repairs can result in bogus timeout errors to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-14715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693173#comment-17693173 ]

Stefan Miklosovic commented on CASSANDRA-14715:
---

[~jaid] is this happening in 4.0 too? I am trying to figure that out. The logic around this stuff was rewritten there.

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/reads/repair/BlockingReadRepair.java#L82-L106
[jira] [Commented] (CASSANDRA-14715) Read repairs can result in bogus timeout errors to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-14715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693126#comment-17693126 ]

Stefan Miklosovic commented on CASSANDRA-14715:
---

Let's give it another shot.
[jira] [Commented] (CASSANDRA-14715) Read repairs can result in bogus timeout errors to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-14715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584077#comment-17584077 ]

Stefan Miklosovic commented on CASSANDRA-14715:
---

I do not plan to work on this. I am not sure how to move forward.
[jira] [Commented] (CASSANDRA-14715) Read repairs can result in bogus timeout errors to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-14715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584057#comment-17584057 ]

Jai Bheemsen Rao Dhanwada commented on CASSANDRA-14715:
---

[~stefan.miklosovic] do you have any estimate of when this might be released? Thank you.
[jira] [Commented] (CASSANDRA-14715) Read repairs can result in bogus timeout errors to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-14715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17554144#comment-17554144 ]

Stefan Miklosovic commented on CASSANDRA-14715:
---

I added an in-jvm dtest in this branch. The problem with that test is that it is flaky, and I am not sure why. I tried to replicate the same steps as [~cam1982]. When the exception is thrown, it is either detected as part of the shutdown process, when the cluster instance is closed in the try block, or it is thrown right away. Sometimes it is not thrown at all and the test just passes. [~cam1982], if you have any ideas about where this flakiness comes from, that would be great.

https://github.com/apache/cassandra/pull/1683
[jira] [Commented] (CASSANDRA-14715) Read repairs can result in bogus timeout errors to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-14715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17551078#comment-17551078 ]

Stefan Miklosovic commented on CASSANDRA-14715:
---

The proposed approach seems good to me, but I had a hard time testing it consistently. I tried the steps in the description of this ticket; sometimes I got the same response and sometimes I did not.

As for testing, I am not sure what test to write here. The options we have:

1) jvm dtest: this approach would involve setting up a 2x3 cluster, inserting data, shutting down the node, removing that node's sstables, starting it again, executing the query and watching its logs. I think the "removing sstables" step may not be necessary because, I think, the data dir of that node is automatically removed on shutdown. I am not sure about the details and viability of this approach yet.
2) Same as 1, but done in python dtests.
3) Testing only RepairMergeListener and its close method. This would be very nice to do, but I noticed that all inner classes in DataResolver (RepairMergeListener is an inner class of DataResolver) are non-static and private, so I cannot easily test this class in isolation. I would need to rewrite them all as static classes, and this might have not-so-obvious consequences.

What I found interesting while testing this is that when I turned the node off, removed the data, turned it on again and listed the data dir of the respective table, the SSTable was there again. How is this possible? Could the commit logs have been replayed and flushed on startup, or something like that? I think we would need to remove the commit logs too.
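Option 3 becomes easier if the waiting and counting logic lives outside DataResolver's private inner classes. Below is a minimal sketch, assuming a hypothetical static helper (RepairWait and Outcome are illustrative names, not existing Cassandra classes) that combines the description's two suggestions: blockFor comes from repairResults.size(), and received counts the repair futures that actually completed. Unlike the description's waitOnFutures, which gives each future the full timeout, this variation shares one deadline across all futures. It uses plain java.util.concurrent.Future so it can be unit tested without a cluster; a caller could then pass outcome.received and outcome.blockFor to ReadTimeoutException instead of blockFor - 1.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical static helper; RepairWait and Outcome are illustrative names, not Cassandra classes.
final class RepairWait {
    private RepairWait() {}

    static final class Outcome {
        final int received;  // repair writes acknowledged before the deadline
        final int blockFor;  // repair writes waited for, i.e. repairResults.size()

        Outcome(int received, int blockFor) {
            this.received = received;
            this.blockFor = blockFor;
        }

        boolean timedOut() {
            return received < blockFor;
        }
    }

    // Waits on all repair futures against one shared deadline and counts the successful ones.
    static Outcome awaitRepairs(List<? extends Future<?>> repairResults, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        int received = 0;
        for (Future<?> result : repairResults) {
            // Once the deadline has passed, a zero timeout only picks up futures that are already done.
            long remaining = Math.max(0L, deadline - System.nanoTime());
            try {
                result.get(remaining, TimeUnit.NANOSECONDS);
                received++;
            } catch (TimeoutException | ExecutionException e) {
                // Not acknowledged in time, or failed: do not count it as received.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return new Outcome(received, repairResults.size());
    }

    public static void main(String[] args) {
        CompletableFuture<Void> acked = CompletableFuture.completedFuture(null);
        CompletableFuture<Void> stuck = new CompletableFuture<>(); // never completes
        Outcome outcome = awaitRepairs(List.of(acked, stuck), 50);
        System.out.println("received " + outcome.received + " of " + outcome.blockFor
                + ", timedOut=" + outcome.timedOut());
        // prints: received 1 of 2, timedOut=true
    }
}
```

Because every future is checked, not just the ones before the first timeout, received reflects all acknowledgements seen by the deadline. Whether the client-facing error should report these repair counts or the read's own consistency level remains the design question raised in this ticket.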
[jira] [Commented] (CASSANDRA-14715) Read repairs can result in bogus timeout errors to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-14715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17550869#comment-17550869 ] Stefan Miklosovic commented on CASSANDRA-14715: --- I am on it.
> Read repairs can result in bogus timeout errors to the client
>
> Key: CASSANDRA-14715
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14715
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Local Write-Read Paths
> Reporter: Cameron Zemek
> Assignee: Stefan Miklosovic
> Priority: Low
>
> In RepairMergeListener:close() it does the following:
>
> {code:java}
> try
> {
>     FBUtilities.waitOnFutures(repairResults, DatabaseDescriptor.getWriteRpcTimeout());
> }
> catch (TimeoutException ex)
> {
>     // We got all responses, but timed out while repairing
>     int blockFor = consistency.blockFor(keyspace);
>     if (Tracing.isTracing())
>         Tracing.trace("Timed out while read-repairing after receiving all {} data and digest responses", blockFor);
>     else
>         logger.debug("Timeout while read-repairing after receiving all {} data and digest responses", blockFor);
>     throw new ReadTimeoutException(consistency, blockFor-1, blockFor, true);
> }
> {code}
>
> This propagates up and gets sent to the client, and our customers get confused because they see timeouts for CL ALL requiring ALL replicas even though they have read_repair_chance = 0 and are using a LOCAL_* CL.
> At minimum, I suggest that the timeout use repairResults.size() instead of the consistency level of DataResolver (which is always ALL with read repairs); that is, blockFor = repairResults.size(). But saying it received _blockFor - 1_ is still bogus, and fixing that would require more changes. I was thinking maybe like so:
>
> {code:java}
> public static void waitOnFutures(List<AsyncOneResponse> results, long ms, MutableInt counter) throws TimeoutException
> {
>     for (AsyncOneResponse result : results)
>     {
>         result.get(ms, TimeUnit.MILLISECONDS);
>         counter.increment();
>     }
> }
> {code}
>
> Likewise, SinglePartitionReadLifecycle:maybeAwaitFullDataRead() reports _blockFor - 1_ as the number of responses received, which is also bogus.
>
> Steps used to reproduce: modify RepairMergeListener:close() to always throw a timeout exception, then use this schema:
> {noformat}
> CREATE KEYSPACE weather WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '3', 'dc2': '3'} AND durable_writes = true;
> CREATE TABLE weather.city (
>     cityid int PRIMARY KEY,
>     name text
> ) WITH bloom_filter_fp_chance = 0.01
>     AND dclocal_read_repair_chance = 0.0
>     AND read_repair_chance = 0.0
>     AND speculative_retry = 'NONE';
> {noformat}
> Then follow these steps:
> # ccm node1 cqlsh
> # INSERT INTO weather.city(cityid, name) VALUES (1, 'Canberra');
> # exit;
> # ccm node1 flush
> # ccm node1 stop
> # rm -rf ~/.ccm/test_repair/node1/data0/weather/city-ff2fade0b18d11e8b1cd097acbab1e3d/mc-1-big-* (removes the sstable containing the insert)
> # ccm node1 start
> # ccm node1 cqlsh
> # CONSISTENCY LOCAL_QUORUM;
> # select * from weather.city where cityid = 1;
> You get this result:
> {noformat}
> ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 5 responses." info={'received_responses': 5, 'required_responses': 6, 'consistency': 'ALL'}
> {noformat}
> But I expected:
> {noformat}
> ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 1 responses." info={'received_responses': 1, 'required_responses': 2, 'consistency': 'LOCAL_QUORUM'}
> {noformat}
--
This message was sent by Atlassian Jira (v8.20.7#820007)

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
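The counter-based waitOnFutures proposed in the quoted description might look roughly like the following self-contained sketch. It is an illustration under stated assumptions, not Cassandra code. java.util.concurrent.CompletableFuture stands in for AsyncOneResponse, and RepairResponseCounter, awaitAll and RepairTimeout are hypothetical names. As the description suggests, it reports blockFor as the number of repair futures (repairResults.size()) rather than the CL ALL block count.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; CompletableFuture stands in for Cassandra's AsyncOneResponse.
final class RepairResponseCounter
{
    // Hypothetical exception carrying the counts a client error should report.
    static final class RepairTimeout extends Exception
    {
        final int received;
        final int blockFor;

        RepairTimeout(int received, int blockFor)
        {
            super("Operation timed out - received only " + received + " of " + blockFor + " repair responses");
            this.received = received;
            this.blockFor = blockFor;
        }
    }

    // Waits for every repair future under one shared deadline and returns
    // the number of responses on success.
    static int awaitAll(List<? extends CompletableFuture<?>> results, long timeoutMs)
            throws RepairTimeout, InterruptedException, ExecutionException
    {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        for (CompletableFuture<?> result : results)
        {
            // Later futures only get whatever time is left before the deadline
            long remaining = Math.max(0L, deadline - System.nanoTime());
            try
            {
                result.get(remaining, TimeUnit.NANOSECONDS);
            }
            catch (TimeoutException e)
            {
                // Count every future that completed normally, not only those
                // that precede the first slow one in list order
                int received = (int) results.stream()
                                            .filter(f -> f.isDone() && !f.isCompletedExceptionally())
                                            .count();
                // blockFor = number of repair mutations sent (repairResults.size()),
                // not consistency.blockFor(keyspace) for CL ALL
                throw new RepairTimeout(received, results.size());
            }
        }
        return results.size();
    }
}
```

This sketch differs from the quoted loop in two ways. First, the quoted loop gives each future its own `ms` budget, so the total wait can approach `results.size() * ms`; the sketch shares one deadline across all futures. Second, a counter incremented in list order stops at the first slow response even if later responses have already arrived; counting completed futures at the moment of the timeout avoids that undercount.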
[jira] [Commented] (CASSANDRA-14715) Read repairs can result in bogus timeout errors to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-14715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17542250#comment-17542250 ] Stefan Miklosovic commented on CASSANDRA-14715: --- [~brandon.williams] Not really, honestly; I forgot about this one completely. I can take a look sometime next week.
[jira] [Commented] (CASSANDRA-14715) Read repairs can result in bogus timeout errors to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-14715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17542215#comment-17542215 ] Brandon Williams commented on CASSANDRA-14715: -- [~stefan.miklosovic] is this still on your radar?
[jira] [Commented] (CASSANDRA-14715) Read repairs can result in bogus timeout errors to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-14715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17542210#comment-17542210 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-14715: --- Are there any plans to fix this in upcoming versions, or at least in 4.0.x? The error message is quite misleading.
[jira] [Commented] (CASSANDRA-14715) Read repairs can result in bogus timeout errors to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-14715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610127#comment-16610127 ] Cameron Zemek commented on CASSANDRA-14715: --- I should also point out that this means the timeouts are not captured in the read timeout metric either. The timeout is thrown when the PartitionIterator returned by StorageProxy:read is closed, which happens outside StorageProxy:read, where timeouts are caught and counted (see readRegular).
> Read repairs can result in bogus timeout errors to the client
>
> Key: CASSANDRA-14715
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14715
> Project: Cassandra
> Issue Type: Bug
> Components: Local Write-Read Paths
> Reporter: Cameron Zemek
> Priority: Minor
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
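The control flow described in Cameron Zemek's comment can be illustrated with a small, self-contained sketch. Every name here (CloseTimeoutMetricDemo, read, execute, readTimeouts, ResultIterator) is a hypothetical stand-in for StorageProxy:read, readRegular, the returned PartitionIterator and the read timeout metric. The sketch only shows that an exception thrown from close() after read() has returned never passes through the catch block that updates the metric.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-ins; none of these types are Cassandra classes.
final class CloseTimeoutMetricDemo
{
    // Stand-in for the read timeout metric
    static final AtomicLong readTimeouts = new AtomicLong();

    // Stand-in for a timeout raised while waiting on read-repair writes
    static final class RepairTimeout extends RuntimeException
    {
        RepairTimeout() { super("timed out while read-repairing"); }
    }

    // Stand-in for the PartitionIterator returned by StorageProxy:read
    static final class ResultIterator implements AutoCloseable
    {
        @Override
        public void close()
        {
            // The repair futures are awaited here, after read() has already returned
            throw new RepairTimeout();
        }
    }

    // Stand-in for readRegular: only timeouts thrown inside this method are counted
    static ResultIterator read()
    {
        try
        {
            return new ResultIterator();
        }
        catch (RepairTimeout e)
        {
            readTimeouts.incrementAndGet();
            throw e;
        }
    }

    // Client-facing path: the timeout reaches the caller, but the metric is untouched
    static String execute()
    {
        try (ResultIterator it = read())
        {
            return "rows";
        }
        catch (RepairTimeout e)
        {
            return "client sees: " + e.getMessage();
        }
    }
}
```

Calling `execute()` returns the timeout message while `readTimeouts` stays at zero. This mirrors the mismatch described above: the client receives an error that the metric never counts.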