[ 
https://issues.apache.org/jira/browse/CASSANDRA-12403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414982#comment-15414982
 ] 

Stefania commented on CASSANDRA-12403:
--------------------------------------

Hello Hoshii-san

Thank you for the patch. The approach is correct, you understood exactly what I 
meant in my previous comment. 

I created a branch with the initial commit and some suggestions in a follow-up 
commit. I've also started the tests, the results are still pending:

|trunk|[patch|https://github.com/stef1927/cassandra/commits/12403]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12403-testall/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12403-dtest/]|

The suggestions can be summarized as follows:

* {{logging_slow_query_threshold_in_ms}} in cassandra.yaml was renamed to 
{{query_log_timeout_in_ms}} and the default value increased from 200 to 500 
milliseconds; I know this is very high but I prefer to be on the safe side, 
WDYT?
* The methods associated with the retrieval of the configuration property have 
been renamed for consistency with the new yaml property name.
* With the initial version of the patch, the same query would have been 
potentially added to the slow queries queue several times, I've changed 
{{MonitorableImpl}} so that we log a slow query only when it is completed. This 
has the advantage that we have the accurate total time for the query, and that 
we do not log queries that time out twice.
* Given the change above, it was then possible to refactor further 
{{MonitorTask}}, which I did. There was also a problem in that the slow queries 
queue was bounded but we did not report in the log message that some slow query 
logs were missing due to a full queue (via {{...}} in the message), the 
refactoring should have fixed this. I've also added some comments (unrelated to 
this patch, they should already have been there). 
* I think we can log both slow and timed out queries with the same scheduling 
task.
* The unit tests have been changed to reflect the points above.
* In {{MonitorTask}} the summary log message for slow queries was downgraded 
from WARN to INFO, and the logger wrapped with {{NoSpamLogger}}, which will log 
the summary message at most once every 5 minutes. {{NoSpamLogger}} is also used 
for the summary log message for timed-out queries, although this is still at 
WARN level. However, detailed logs will still be logged every time at DEBUG 
level in debug.log.

Finally, I've created some distributed tests 
[here|https://github.com/stef1927/cassandra-dtest/commits/12403]. You can run 
them to see what the log messages will look like. Here is a sample:

{code}
INFO  [ScheduledTasks:1] 2016-08-10 16:56:50,209 NoSpamLogger.java:91 - Some 
operations were slow, details available at debug level (debug.log)
DEBUG [ScheduledTasks:1] 2016-08-10 16:56:50,211 MonitoringTask.java:173 - 3 
operations were slow in the last 4998 msecs:
<SELECT * FROM ks.test2 LIMIT 5000>, time 3026 msec - slow timeout 30 msec
<SELECT * FROM ks.test2 WHERE id = 1 LIMIT 5000>, was slow 2 times: avg/min/max 
330/325/335 msec - slow timeout 30 msec
<SELECT * FROM ks.test2 WHERE token(id) > 0 LIMIT 5000>, time 1449 msec - slow 
timeout 30 msec
{code}

The INFO message appears in system.log and debug.log but not more than once 
every 5 minutes. The DEBUG log appears every 5 seconds (configurable) if there 
is at least a slow query.

Could you review the follow-up commit and let us know what you think? Please do 
not hesitate to express any concerns, especially regarding the logging choices, 
since they are intended for operators.

> Slow query detecting
> --------------------
>
>                 Key: CASSANDRA-12403
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12403
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Shogo Hoshii
>            Assignee: Shogo Hoshii
>         Attachments: sample.log, slow_query.patch, slow_query.patch
>
>
> Hello,
> In cassandra production environment, users sometimes build anti-pattern 
> tables and throw queries in inefficient manners.
> So I would like to suggest a feature that enables to log slow query.
> The feature can help cassandra operators to identify bad query patterns.
> Then operators can give advices about queries and data model to users who 
> don't know cassandra so much.
> This ticket is related to CASSANDRA-6226, and I focus on detecting bad query 
> patterns, not aborting them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to