[jira] [Commented] (CASSANDRA-7392) Abort in-progress queries that time out

Stefania (JIRA) Wed, 16 Sep 2015 02:23:48 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14747201#comment-14747201
 ]


Stefania commented on CASSANDRA-7392:
-------------------------------------

I've added the no-spam logger and moved the logging to DEBUG, increasing the 
default maximum number of queries that we report to 50. I know aggregation 
doesn't make much sense, but it won't hurt in case an app sends the same query 
multiple times. 
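
As a rough illustration of the aggregation and the reporting cap, here is a 
minimal sketch. The class {{TimedOutQueryAggregator}} and its methods are 
hypothetical names chosen for this example and are not taken from the patch; 
the sketch only assumes that timed-out queries are keyed by their CQL string 
and that at most a fixed number of distinct queries is reported per interval.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: aggregates timed-out queries by CQL string and caps the report size. */
final class TimedOutQueryAggregator
{
    private final int maxReported;
    // Insertion-ordered so the report lists queries in the order they first timed out.
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    TimedOutQueryAggregator(int maxReported)
    {
        this.maxReported = maxReported;
    }

    /** Records one timed-out query; identical CQL strings are counted together. */
    synchronized void record(String cql)
    {
        counts.merge(cql, 1, Integer::sum);
    }

    /** Returns at most maxReported lines plus a summary of what was omitted, then resets. */
    synchronized List<String> drainReport()
    {
        List<String> lines = new ArrayList<>();
        int remaining = counts.size();
        for (Map.Entry<String, Integer> e : counts.entrySet())
        {
            if (lines.size() == maxReported)
                break;
            lines.add(e.getKey() + ": timed out " + e.getValue() + " time(s)");
            remaining--;
        }
        if (remaining > 0)
            lines.add("... (" + remaining + " more)");
        counts.clear();
        return lines;
    }
}
```

Aggregation only helps when the same string is recorded more than once; if 
every reported string is unique, each entry simply has a count of one.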

I'd also prefer to re-introduce the CAS in {{MonitoringStateRef}}: if the 
worker thread does not notice that the query was aborted, it will carry on 
iterating, which defeats the purpose of aborting the queries. 
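
To make the reasoning concrete, here is a minimal sketch of a CAS-guarded 
state. It is not the actual {{MonitoringStateRef}} from the patch: the class 
name {{MonitoringStateSketch}}, the {{State}} enum, and the method names are 
assumptions made for illustration. The point is only that the abort and the 
completion race on a single atomic transition, and that the worker checks the 
state on each iteration so it stops promptly.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch of a CAS-guarded query state; not the class from the patch. */
final class MonitoringStateSketch
{
    enum State { IN_PROGRESS, ABORTED, COMPLETED }

    private final AtomicReference<State> state = new AtomicReference<>(State.IN_PROGRESS);

    /** Called by the monitoring thread on timeout; succeeds only if the query is still running. */
    boolean abort()
    {
        return state.compareAndSet(State.IN_PROGRESS, State.ABORTED);
    }

    /** Called by the worker when it finishes; fails if the query was aborted first. */
    boolean complete()
    {
        return state.compareAndSet(State.IN_PROGRESS, State.COMPLETED);
    }

    boolean isAborted()
    {
        return state.get() == State.ABORTED;
    }

    /**
     * Simulated worker loop: checks the state before each row and stops once aborted.
     * The abort is triggered inline at abortAtRow to keep the example deterministic.
     * Returns the number of rows processed.
     */
    int iterate(int totalRows, int abortAtRow)
    {
        int processed = 0;
        for (int row = 0; row < totalRows; row++)
        {
            if (row == abortAtRow)
                abort(); // stands in for the monitoring thread timing the query out
            if (isAborted())
                break;   // the worker notices the abort and stops iterating
            processed++;
        }
        complete(); // no-op if the query was already aborted
        return processed;
    }
}
```

Because both transitions start from {{IN_PROGRESS}}, exactly one of 
{{abort()}} and {{complete()}} can succeed, so a query is never reported as 
both aborted and completed.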

bq. They are usually going to be unique so there is nothing to aggregate on.

It's actually worse than that. I didn't realize that the CQL string 
reconstructed from {{ReadCommand}} is only an approximation, as we don't have 
all the information there. For example, a query without a condition on the 
primary key will be split into several queries as follows:

{code}
SELECT * FROM ks.test2 WHERE token(id) > -1976574744135038542 AND token(id) <= -1551387922747101229 LIMIT 5000: total time 10011 msec - timeout 10000 msec
SELECT * FROM ks.test2 WHERE token(id) > 1096463829018333632 AND token(id) <= 1355062136393692257 LIMIT 5000: total time 10078 msec - timeout 10000 msec
SELECT * FROM ks.test2 WHERE token(id) > 8977444122753183931 AND token(id) <= 9033798691964141178 LIMIT 5000: total time 10057 msec - timeout 10000 msec
SELECT * FROM ks.test2 WHERE token(id) > 2635684107471725435 AND token(id) <= 2755551655031657904 LIMIT 5000: total time 10078 msec - timeout 10000 msec
SELECT * FROM ks.test2 WHERE token(id) > 9080285075713538993 AND token(id) <= 9187108821678730728 LIMIT 5000: total time 10056 msec - timeout 10000 msec
SELECT * FROM ks.test2 WHERE token(id) > -8240319968209337270 AND token(id) <= -7817157413941317374 LIMIT 5000: total time 10032 msec - timeout 10000 msec
SELECT * FROM ks.test2 WHERE token(id) > 8340735344052968255 AND token(id) <= 8546322458038003371 LIMIT 5000: total time 10057 msec - timeout 10000 msec
SELECT * FROM ks.test2 WHERE token(id) > 5722969564085623706 AND token(id) <= 5806785306771146835 LIMIT 5000: total time 10073 msec - timeout 10000 msec
SELECT * FROM ks.test2 WHERE token(id) > 7726207511295901422 AND token(id) <= 7839180972141923302 LIMIT 5000: total time 10058 msec - timeout 10000 msec
SELECT * FROM ks.test2 WHERE token(id) > 3532380910529882202 AND token(id) <= 3654921169010564232 LIMIT 5000: total time 10074 msec - timeout 10000 msec
SELECT * FROM ks.test2 WHERE token(id) > 7881865912870825334 AND token(id) <= 7931494104861828509 LIMIT 5000: total time 10058 msec - timeout 10000 msec
{code}

A query with a condition only on non-primary-key columns and 
{{ALLOW FILTERING}} will not be reported as such, since the filtering is not 
done on the worker. There may be other limitations.

To fix this we would need to do one of three things: pass the original user 
query all the way to {{ReadCommand}}, which involves changing the 
serialization format and increases the memory footprint; move the reporting to 
the coordinator; or rely on a table like we do for tracing. Unless we try to 
squeeze the first option in before 3.0 hits, I think the other two options are 
best dealt with in a separate ticket where we focus on logging rather than on 
aborting queries.



> Abort in-progress queries that time out
> ---------------------------------------
>
>                 Key: CASSANDRA-7392
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7392
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Stefania
>            Priority: Critical
>             Fix For: 3.x
>
>
> Currently we drop queries that time out before we get to them (because node 
> is overloaded) but not queries that time out while being processed.  
> (Particularly common for index queries on data that shouldn't be indexed.)  
> Adding the latter and logging when we have to interrupt one gets us a poor 
> man's "slow query log" for free.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
