[ https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lyuben Todorov updated CASSANDRA-5483:
--------------------------------------
    Attachment: prerepair-vs-postbuggedrepair.diff

The new set of patches (v8.11 to v8.15) causes high CPU usage that doesn't dissipate. This happens on the node on which the repair was issued. To reproduce, apply the patches to [the branch|https://github.com/lyubent/cassandra/tree/5483] and run repair with -tr on all 3 nodes. Sometimes this happens on the first repair; sometimes it takes repairing all 3 nodes multiple times (the most it has ever taken me is 6 repairs, 2 on each node in a 3-node ccm cluster).

Looking at the threads, there are a lot of _ReadStage_ threads running after this happens (approximately 32 that are waiting and 1 that is running), so as far as I can tell the polling doesn't stop even though the repair has completed. There is also one TracingStage thread that is waiting to complete. I'm attaching prerepair-vs-postbuggedrepair.diff, which shows the extra threads once this problem occurs.

> Repair tracing
> --------------
>
>                 Key: CASSANDRA-5483
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Yuki Morishita
>            Assignee: Ben Chan
>            Priority: Minor
>              Labels: repair
>         Attachments: 5483-full-trunk.txt,
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch,
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch,
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch,
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
> 5483-v07-08-Fix-brace-style.patch,
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
> 5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch,
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch,
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch,
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch,
> 5483-v08-14-Poll-system_traces.events.patch,
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch,
> ccm-repair-test, cqlsh-left-justify-text-columns.patch,
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt,
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch,
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
> tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt,
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt,
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch,
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
> v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing
> stores traces to system keyspace. With it, you don't have to look up each log
> file to see what the status was and how the repair you invoked performed.
> Instead, you can query the repair log with session ID to see the state and
> stats of all nodes involved in that repair session.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
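
The symptom reported in the comment above (polling that keeps going after the repair has completed) is the kind of behaviour a completion check inside the backoff loop is meant to rule out. The following is only a minimal sketch of that idea and is not the code from the v08-14 or v08-15 patches. The class name BackoffPoller, the methods nextDelay and pollUntilDone, and the done flag are all hypothetical names introduced for illustration, and the sketch assumes that something else sets the flag when the repair session finishes.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration only; none of these names come from the Cassandra patches.
final class BackoffPoller {

    // Exponential backoff step: double the delay, but never exceed maxMillis.
    static long nextDelay(long currentMillis, long maxMillis) {
        return Math.min(currentMillis * 2, maxMillis);
    }

    // Runs pollOnce repeatedly until the done flag is set, sleeping with
    // exponential backoff between polls. Returns the number of polls performed.
    static int pollUntilDone(AtomicBoolean done, Runnable pollOnce,
                             long initialMillis, long maxMillis) {
        int polls = 0;
        long delay = initialMillis;
        while (!done.get()) {
            pollOnce.run();
            polls++;
            // Re-check the flag right after polling so the loop exits
            // promptly once the work being traced has completed.
            if (done.get()) {
                break;
            }
            try {
                Thread.sleep(delay);
            } catch (InterruptedException e) {
                // Preserve the interrupt status and stop polling.
                Thread.currentThread().interrupt();
                break;
            }
            delay = nextDelay(delay, maxMillis);
        }
        return polls;
    }
}
```

In this sketch the loop condition and the check after each poll are the only exits besides interruption, so a poller written this way stops as soon as the flag is set. Whether the patches under discussion are missing an equivalent check is not established here; the thread-dump diff attached to the issue is the place to confirm that.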