Cameron Zemek created CASSANDRA-14715:
-----------------------------------------
             Summary: Read repairs can result in bogus timeout errors to the client
                 Key: CASSANDRA-14715
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14715
             Project: Cassandra
          Issue Type: Bug
          Components: Local Write-Read Paths
            Reporter: Cameron Zemek

In RepairMergeListener:close() it does the following:

{code:java}
try
{
    FBUtilities.waitOnFutures(repairResults, DatabaseDescriptor.getWriteRpcTimeout());
}
catch (TimeoutException ex)
{
    // We got all responses, but timed out while repairing
    int blockFor = consistency.blockFor(keyspace);
    if (Tracing.isTracing())
        Tracing.trace("Timed out while read-repairing after receiving all {} data and digest responses", blockFor);
    else
        logger.debug("Timeout while read-repairing after receiving all {} data and digest responses", blockFor);

    throw new ReadTimeoutException(consistency, blockFor-1, blockFor, true);
}
{code}

This propagates up and gets sent to the client. Customers get confused because they see timeouts reported for CL ALL, requiring ALL replicas, even though they have read_repair_chance = 0 and are using a LOCAL_* CL.

At minimum I suggest that, instead of using the consistency level of DataResolver (which is always ALL with read repairs) for the timeout, it use repairResults.size(). That is, blockFor = repairResults.size(). But saying it received _blockFor - 1_ is still bogus. Fixing that would require more changes. I was thinking maybe like so:

{code:java}
public static void waitOnFutures(List<AsyncOneResponse> results, long ms, MutableInt counter) throws TimeoutException
{
    for (AsyncOneResponse result : results)
    {
        result.get(ms, TimeUnit.MILLISECONDS);
        // only reached if get() returned in time, so counter holds the number of responses received
        counter.increment();
    }
}
{code}

Likewise, SinglePartitionReadLifecycle:maybeAwaitFullDataRead() reports _blockFor - 1_ as the number of responses received, which is also bogus.

To reproduce, I modified RepairMergeListener:close() to always throw the timeout exception.
With schema:

{noformat}
CREATE KEYSPACE weather WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '3', 'dc2': '3'} AND durable_writes = true;

CREATE TABLE weather.city (
    cityid int PRIMARY KEY,
    name text
) WITH bloom_filter_fp_chance = 0.01
    AND dclocal_read_repair_chance = 0.0
    AND read_repair_chance = 0.0
    AND speculative_retry = 'NONE';
{noformat}

Then using the following steps:

# ccm node1 cqlsh
# INSERT INTO weather.city(cityid, name) VALUES (1, 'Canberra');
# exit;
# ccm node1 flush
# ccm node1 stop
# rm -rf ~/.ccm/test_repair/node1/data0/weather/city-ff2fade0b18d11e8b1cd097acbab1e3d/mc-1-big-* # remove the sstable with the insert
# ccm node1 start
# ccm node1 cqlsh
# CONSISTENCY LOCAL_QUORUM;
# select * from weather.city where cityid = 1;

The result is:

{noformat}
ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 5 responses." info={'received_responses': 5, 'required_responses': 6, 'consistency': 'ALL'}
{noformat}

But I was expecting:

{noformat}
ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 1 responses." info={'received_responses': 1, 'required_responses': 2, 'consistency': 'LOCAL_QUORUM'}
{noformat}
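
To illustrate the counting idea from the suggested waitOnFutures change, here is a minimal standalone sketch. It is not Cassandra code: the class name CountingWait is hypothetical, java.util.concurrent.Future stands in for AsyncOneResponse, and AtomicInteger stands in for MutableInt. It also assumes one shared deadline for the whole list (the snippet in the report gives each result the full timeout), which is my own variation rather than something the ticket asks for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the suggested helper; not part of Cassandra.
final class CountingWait {

    /**
     * Waits for each future in list order against a single shared deadline.
     * The counter is incremented only after a future has returned, so when a
     * TimeoutException escapes it holds the number of results seen so far.
     */
    static void waitOnFutures(List<? extends Future<?>> results, long timeoutMs, AtomicInteger received)
            throws TimeoutException, InterruptedException, ExecutionException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        for (Future<?> result : results) {
            long remainingNanos = Math.max(deadline - System.nanoTime(), 0L);
            result.get(remainingNanos, TimeUnit.NANOSECONDS); // throws TimeoutException if not done in time
            received.incrementAndGet();
        }
    }

    public static void main(String[] args) throws Exception {
        List<CompletableFuture<Void>> repairResults = new ArrayList<>();
        repairResults.add(CompletableFuture.<Void>completedFuture(null)); // replica that acknowledged
        repairResults.add(new CompletableFuture<Void>());                 // replica that never answers

        AtomicInteger received = new AtomicInteger();
        try {
            waitOnFutures(repairResults, 50, received);
        } catch (TimeoutException e) {
            // prints "received 1 of 2"
            System.out.println("received " + received.get() + " of " + repairResults.size());
        }
    }
}
```

With a counter like this, the exception could report received.get() and repairResults.size() instead of blockFor - 1 and blockFor. Because the futures are checked in list order, the count is a lower bound: a later result that already completed is not counted once an earlier one times out.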