[ https://issues.apache.org/jira/browse/CASSANDRA-12256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Geoffrey Yu updated CASSANDRA-12256:
------------------------------------
    Attachment: 12256-trunk.txt

I've attached a first pass at this ticket. The majority of the changes pass the query start timestamp all the way down to the {{ReadCallback}} and {{AbstractWriteResponseHandler}}. The timestamp is recorded when the {{QueryState}} is created for a particular query.

> Properly respect the request timeouts
> -------------------------------------
>
>                 Key: CASSANDRA-12256
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12256
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Assignee: Geoffrey Yu
>             Fix For: 3.x
>
>         Attachments: 12256-trunk.txt
>
>
> We have a number of {{request_timeout_*}} options that probably every user
> expects to be an upper bound on how long the coordinator will wait before
> timing out a request, but that's actually not always the case, especially for
> read requests.
> I believe we don't respect those timeouts properly in at least the following
> cases:
> * On a digest mismatch: in that case, we reset the timeout for the data
> query, which means the overall query might take up to twice the configured
> timeout before timing out.
> * On a range query: the timeout is reset for every sub-range that is queried.
> With many nodes and vnodes, a range query could span tons of sub-ranges, and so
> a range query could take pretty much arbitrarily long before actually
> timing out for the user.
> * On short reads: we also reset the timeout for every short-read "retry".
> It's also worth noting that even outside those cases, the timeouts don't take
> most of the processing done by the coordinator (query parsing and CQL handling,
> for instance) into account.
> Now, in all fairness, the reason it works this way is that the timeouts
> currently are *not* timeouts for the full user request, but rather how long a
> coordinator should wait on any given replica for any given internal query
> before giving up.
> *However*, I'm pretty sure this is not what users intuitively expect and want,
> *especially* in the context of CASSANDRA-2848, where the goal is explicitly to
> have an upper bound on the query from the user's point of view.
> So I'm suggesting we change how those timeouts are handled to really be
> timeouts on the whole user query.
> And by that I basically just mean that we'd mark the start of each query as
> soon as possible in the processing, and use that starting time as the base in
> {{ReadCallback.await}} and {{AbstractWriteResponseHandler.get()}}. It won't
> be perfect, in the sense that we'll still only possibly time out during
> "blocking" operations, so typically if parsing a query takes more than your
> timeout, you still won't time out until that query is sent, but I think that's
> probably fine in practice because 1) if your timeouts are small enough that
> this matters, you're probably doing it wrong and 2) we can totally improve on
> that later if need be.
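
To make the proposal above concrete, here is a minimal sketch of waiting against a single deadline anchored at the query start, rather than restarting the timeout on every blocking wait. It is not taken from the attached patch or from Cassandra's code: the class {{RequestDeadline}} and its methods {{remainingNanos}} and {{await}} are hypothetical names, and the use of a {{CountDownLatch}} to stand in for pending replica responses is an assumption made only for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not Cassandra's actual API: one deadline for the whole user request.
final class RequestDeadline {
    private final long startNanos;   // captured as early as possible in query processing
    private final long timeoutNanos; // the configured request timeout

    RequestDeadline(long startNanos, long timeoutNanos) {
        this.startNanos = startNanos;
        this.timeoutNanos = timeoutNanos;
    }

    // Time left before the deadline, measured from the query start; never negative.
    long remainingNanos(long nowNanos) {
        long elapsed = nowNanos - startNanos;
        return Math.max(0L, timeoutNanos - elapsed);
    }

    // Blocks for at most the time remaining on the overall deadline.
    // Returns false if the deadline passes before all responses arrive.
    boolean await(CountDownLatch responses) throws InterruptedException {
        return responses.await(remainingNanos(System.nanoTime()), TimeUnit.NANOSECONDS);
    }
}
```

Under this sketch, a digest-mismatch retry, each sub-range of a range query, and each short-read retry would all call {{await}} on the same {{RequestDeadline}} instance, so the time already spent is subtracted from every later wait. As the description notes, this still only times out during blocking operations.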