[ https://issues.apache.org/jira/browse/CASSANDRA-15442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992941#comment-16992941 ]
Jordan West commented on CASSANDRA-15442: ----------------------------------------- Thanks for the patch [~yifanc]. I've actually wondered about it for a while but never thought of it as a bug (assumed it was by design). Glad to see it addressed. A few minor nits, all optional: * I agree that we should only use the repair timeout for the reason you mentioned. I find it confusing that the write timeout is involved since the client has no idea if read repair will occur when submitting the read. * Since `awaitRepairs` is only called from `awaitRepairsUntil` consider merging the two functions. It might make the code more clear and require less jumping around. * I agree with [~bdeggleston] that setting the write timeout very high in your test can show how the change ensures independence from it – regardless of any other code changes. It may also give you a more flexible way to write the assertion at the end of `readRepairTimeoutTest()`. * Additionally, re: the two `Assert.assertTrue`s at the end of `readRepairTimeoutTest`: consider adding failure messages to make them more descriptive, as well as collapsing it in to one expression (e.g `actualTimeTaken < reducedReadTimeout + magicDelayAmount && actualTimeTaken > reducedReadTimeout`). > Read repair implicitly increases read timeout value > --------------------------------------------------- > > Key: CASSANDRA-15442 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15442 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Core > Reporter: Yifan Cai > Assignee: Yifan Cai > Priority: Normal > > When read repair occurs during a read, internally, it starts several > _blocking_ operations in sequence. See > {{org.apache.cassandra.service.StorageProxy#fetchRows}}. > The timeline of the blocking operations > # Regular read, wait for full data/digest read response to complete. > {{reads[*].awaitResponses();}} > # Read repair read, wait for full data read response to complete. > {{reads[*].awaitReadRepair();}} > # Read repair write, wait for write response to complete. > {{concatAndBlockOnRepair(results, repairs);}} > Step 1 and 2 share the same timeout, and wait for the duration of read > timeout, say 5 s. > Step 3 waits for the duration of write timeout, say 2 s. > In the worse case, the actual time taken for a read could accumulate to ~7 s, > if each individual step does not exceed the timeout value. > From the client perspective, it may not expect a request taken higher than > the database configured timeout value. > Such scenario is especially bad for the clients that have set up client-side > timeout monitoring close to the configured one. The clients think the > operations timed out and abort, but they are in fact still running on server. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org