[ https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937976#comment-13937976 ]
Ben Chan commented on CASSANDRA-5483:
-------------------------------------

{noformat}
# tested with branch 5483 @ bce0c2c555a3; should also work following successful
#git apply 5483-full-trunk.txt
W=https://issues.apache.org/jira/secure/attachment
for url in \
  $W/12635094/5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch \
  $W/12635095/5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch \
  $W/12635096/5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch \
  $W/12635097/5483-v08-14-Poll-system_traces.events.patch \
  $W/12635098/5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch
do
  [ -e $(basename $url) ] || curl -sO $url
done &&
git apply 5483-v08-*.patch &&
ant clean && ant
./ccm-repair-test -kR &&
ccm node1 stop &&
ccm node1 clear &&
ccm node1 start &&
./ccm-repair-test -rt
{noformat}

* {{v08-11}} There was an error in one of the log formats in Differencer, which made my grep for "out of sync" in the logs fruitless.
* {{v08-12}} I ended up using the handleStreamEvent of StreamingRepairTask instead of implementing and registering my own StreamEventHandler. The new trace messages may need adjusting, especially for ProgressEvent, which is currently little more than a toString.
* {{v08-13}} This works by adding a guarded sendNotification to TraceState#trace.
* {{v08-14}} This works by starting a thread to poll {{system_traces.events}} and by adding notify functionality to TraceState. There is some jitter in the ordering between local and remote traces. An easy fix would be to have the query thread handle sendNotification for all traces, but then you have to accept extra latency in the notification of local traces in exchange for better ordering. It might even be necessary to delay every trace notification by a few seconds to make it more likely that the remote traces have arrived.
* {{v08-15}} Even more TraceState functionality, all to reduce the amount of polling without hurting latency too much.
There are only a few local traces that you would expect to be followed by a remote trace, so only wake up for those. Poll with an exponential backoff after each notification.

---

Heuristics are messy, and I expect plenty of opinions on {{v08-14}} and {{v08-15}}. I'm not especially proud of that code, but I can't think of anything better at the moment, given the (self-imposed?) constraints.

I may have reinvented the wheel with synchronization primitives. I checked {{java.util.concurrent.*}} and {{SimpleCondition}}, but not much beyond that, so I could have missed something; I don't fully understand some of the classes. What I wanted was to be woken up (with a timeout) if anything has changed since the last time I checked. Theoretically, it should work for multiple consumers (as long as no one waits through more than {{Integer.MAX_VALUE}} updates), though that isn't really necessary here, so the multi-consumer support could be dropped if that would simplify the code.

The code seems to work reasonably well for small-scale tests. I can convince myself that it won't blow up for long repairs, but I haven't done a full test yet.
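A minimal sketch of the "guarded sendNotification" idea from {{v08-13}}, assuming the notifications travel over JMX (as the name {{sendNotification}} suggests) and that the guard is simply a flag set when a client such as nodetool asks for trace output. {{TracingNotifier}}, {{setNotifyClient}} and the {{repair.trace}} type string are hypothetical and are not the patch's code; {{NotificationBroadcasterSupport}} and {{Notification}} are the standard {{javax.management}} classes.

```java
import java.util.concurrent.atomic.AtomicLong;
import javax.management.Notification;
import javax.management.NotificationBroadcasterSupport;

/**
 * Hypothetical sketch: every trace is recorded, but it is only pushed back to
 * JMX listeners when the guard flag says a client asked to follow the session.
 */
class TracingNotifier extends NotificationBroadcasterSupport {
    static final String TRACE_TYPE = "repair.trace"; // hypothetical notification type

    private final AtomicLong sequence = new AtomicLong();
    private volatile boolean notifyClient; // the guard; off unless a client is listening

    void setNotifyClient(boolean enabled) {
        notifyClient = enabled;
    }

    /** Records the trace (stubbed out here) and forwards it only if the guard is on. */
    void trace(String message) {
        // A real implementation would also write the event to system_traces.events.
        if (notifyClient) {
            sendNotification(new Notification(TRACE_TYPE, this, sequence.incrementAndGet(), message));
        }
    }
}
```

With the no-argument {{NotificationBroadcasterSupport}} constructor, listeners run on the thread that calls {{sendNotification}}, so a slow listener would slow down tracing itself; passing an {{Executor}} to the constructor is one way to decouple them.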
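The wake-on-change primitive and the exponential-backoff polling described above can be sketched with a plain monitor and a version counter. This is an illustration under assumptions, not the code in {{v08-14}}/{{v08-15}}: {{ChangeSignal}}, {{BackoffPoller}}, the one-second idle timeout, and the {{poll}} callback (standing in for a query against {{system_traces.events}}) are all hypothetical. The signed difference {{version - lastSeen}} is what produces the {{Integer.MAX_VALUE}} limit: it stays correct only while a consumer is at most that many updates behind.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical "has anything changed since I last looked?" primitive. Producers
 * bump a version counter; each consumer remembers the last version it saw and
 * waits, with a timeout, for the counter to move past it.
 */
final class ChangeSignal {
    private int version; // guarded by "this"; int arithmetic wraps on overflow

    /** Producer side: record a change and wake every waiting consumer. */
    synchronized void signal() {
        version++;
        notifyAll();
    }

    synchronized int version() {
        return version;
    }

    /**
     * Consumer side: true once the version has advanced past lastSeen, false if the
     * timeout elapses first. Correct while the consumer is at most
     * Integer.MAX_VALUE updates behind.
     */
    synchronized boolean awaitChange(int lastSeen, long timeout, TimeUnit unit)
            throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (version - lastSeen <= 0) { // the loop also absorbs spurious wakeups
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                return false;
            }
            TimeUnit.NANOSECONDS.timedWait(this, remaining);
        }
        return true;
    }
}

/**
 * Hypothetical poller: idles on the signal; after a notification it waits
 * initialMillis, 2*initialMillis, ... (polling after each wait) until the wait
 * exceeds maxMillis. A new notification during the backoff restarts it.
 */
final class BackoffPoller implements Runnable {
    private final ChangeSignal signal;
    private final Runnable poll;
    private final long initialMillis;
    private final long maxMillis;
    private final int startVersion;

    BackoffPoller(ChangeSignal signal, Runnable poll, long initialMillis, long maxMillis) {
        this.signal = signal;
        this.poll = poll;
        this.initialMillis = initialMillis;
        this.maxMillis = maxMillis;
        this.startVersion = signal.version(); // captured here so an early signal is not missed
    }

    @Override
    public void run() {
        int seen = startVersion;
        try {
            while (true) {
                // Idle phase: block until a local trace signals that remote traces may follow.
                if (!signal.awaitChange(seen, 1, TimeUnit.SECONDS)) {
                    continue;
                }
                seen = signal.version();
                // Backoff phase: wait, poll, double the wait; go idle once it exceeds maxMillis.
                long delay = initialMillis;
                while (delay <= maxMillis) {
                    boolean renotified = signal.awaitChange(seen, delay, TimeUnit.MILLISECONDS);
                    poll.run(); // runs outside the monitor, so a slow query never blocks producers
                    if (renotified) {
                        seen = signal.version();
                        delay = initialMillis;
                    } else {
                        delay *= 2;
                    }
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // interrupting the thread is the shutdown path
        }
    }
}
```

In this sketch only the few local traces expected to be followed by a remote trace would call {{signal()}}; the poller then checks after 1x, 2x, 4x ... the initial delay and goes idle once the delay passes {{maxMillis}} without a new signal. Because each consumer keeps its own last-seen version, several consumers can share one {{ChangeSignal}}.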
> Repair tracing
> --------------
>
>                 Key: CASSANDRA-5483
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Yuki Morishita
>            Assignee: Ben Chan
>            Priority: Minor
>              Labels: repair
>         Attachments: 5483-full-trunk.txt,
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch,
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch,
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch,
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
> 5483-v07-08-Fix-brace-style.patch,
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
> 5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch,
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch,
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch,
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch,
> 5483-v08-14-Poll-system_traces.events.patch,
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch,
> ccm-repair-test, cqlsh-left-justify-text-columns.patch,
> test-5483-system_traces-events.txt,
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch,
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
> tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt,
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt,
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch,
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
> v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing
> stores traces to system keyspace. With it, you don't have to lookup each log
> file to see what was the status and how it performed the repair you invoked.
> Instead, you can query the repair log with session ID to see the state and
> stats of all nodes involved in that repair session.