[ https://issues.apache.org/jira/browse/CASSANDRA-12403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414982#comment-15414982 ]
Stefania commented on CASSANDRA-12403: -------------------------------------- Hello Hoshii-san Thank you for the patch. The approach is correct, you understood exactly what I meant in my previous comment. I created a branch with the initial commit and some suggestions in a follow-up commit. I've also started the tests, the results are still pending: |trunk|[patch|https://github.com/stef1927/cassandra/commits/12403]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12403-testall/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12403-dtest/]| The suggestions can be summarized as follows: * {{logging_slow_query_threshold_in_ms}} in cassandra.yaml was renamed to {{query_log_timeout_in_ms}} and the default value increased from 200 to 500 milliseconds; I know this is very high but I prefer to be on the safe side, WDYT? * The methods associated with the retrieval of the configuration property have been renamed for consistency with the new yaml property name. * With the initial version of the patch, the same query would have been potentially added to the slow queries queue several times, I've changed {{MonitorableImpl}} so that we log a slow query only when it is completed. This has the advantage that we have the accurate total time for the query, and that we do not log queries that time out twice. * Given the change above, it was then possible to refactor further {{MonitorTask}}, which I did. There was also a problem in that the slow queries queue was bounded but we did not report in the log message that some slow query logs were missing due to a full queue (via {{...}} in the message), the refactoring should have fixed this. I've also added some comments (unrelated to this patch, they should already have been there). * I think we can log both slow and timed out queries with the same scheduling task. * The unit tests have been changed to reflect the points above. * In {{MonitorTask}} the summary log message for slow queries was downgraded from WARN to INFO, and the logger wrapped with {{NoSpamLogger}}, which will log the summary message at most once every 5 minutes. {{NoSpamLogger}} is also used for the summary log message for timed-out queries, although this is still at WARN level. However, detailed logs will still be logged every time at DEBUG level in debug.log. Finally, I've created some distributed tests [here|https://github.com/stef1927/cassandra-dtest/commits/12403]. You can run them to see what the log messages will look like. Here is a sample: {code} INFO [ScheduledTasks:1] 2016-08-10 16:56:50,209 NoSpamLogger.java:91 - Some operations were slow, details available at debug level (debug.log) DEBUG [ScheduledTasks:1] 2016-08-10 16:56:50,211 MonitoringTask.java:173 - 3 operations were slow in the last 4998 msecs: <SELECT * FROM ks.test2 LIMIT 5000>, time 3026 msec - slow timeout 30 msec <SELECT * FROM ks.test2 WHERE id = 1 LIMIT 5000>, was slow 2 times: avg/min/max 330/325/335 msec - slow timeout 30 msec <SELECT * FROM ks.test2 WHERE token(id) > 0 LIMIT 5000>, time 1449 msec - slow timeout 30 msec {code} The INFO message appears in system.log and debug.log but not more than once every 5 minutes. The DEBUG log appears every 5 seconds (configurable) if there is at least a slow query. Could you review the follow-up commit and let us know what you think? Please do not hesitate to express any concerns, especially regarding the logging choices, since they are intended for operators. > Slow query detecting > -------------------- > > Key: CASSANDRA-12403 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12403 > Project: Cassandra > Issue Type: Improvement > Reporter: Shogo Hoshii > Assignee: Shogo Hoshii > Attachments: sample.log, slow_query.patch, slow_query.patch > > > Hello, > In cassandra production environment, users sometimes build anti-pattern > tables and throw queries in inefficient manners. > So I would like to suggest a feature that enables to log slow query. > The feature can help cassandra operators to identify bad query patterns. > Then operators can give advices about queries and data model to users who > don't know cassandra so much. > This ticket is related to CASSANDRA-6226, and I focus on detecting bad query > patterns, not aborting them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)