[jira] [Commented] (CASSANDRA-5483) Repair tracing

Ben Chan (JIRA) Thu, 10 Jul 2014 08:01:51 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057548#comment-14057548
 ]


Ben Chan commented on CASSANDRA-5483:
-------------------------------------

A few things to note about 5483-16:

Just to emphasize: these are corner-cases and deoptimizations that may not ever 
matter in practice. Even if the decision is to not fix them, I wanted to note 
their existence.

*(1)*
If remote nodes have different config values for {{tracetype_repair_ttl}}, then 
remote traces will end up with a different ttl. Having the trace for a given 
session partially disappear may be surprising to the user.

The approach 5483-15 took was to use the ttl configured on the node being 
repaired.

*(2)*
I don't know whether this will ever arise in practice, but I thought I'd note: 
If server versions are ever mixed, then an unknown trace type could be sent 
over the wire, causing an out-of-bounds array access during deserialization.

The approach 5483-15 took was almost a no-op on the remote end (the tracing 
session was created, but nothing was traced). This is still not quite right. 
Better would be to log an error and not create the session at all.

It might still possibly make sense to create the session anyway in the case of 
A->B->C tracestate propagation, but that might be stretching things.

*(3)*
Activity notification for every local repair trace starts a new exponential 
backoff polling cycle more often than necessary.

Of course, I can't say whether this extra resource usage will actually matter 
in practice (only practice can tell).

----

Okay, so what you meant was to change tracetype to be an actual Enum type (I 
can be a little dense sometimes, and code leaves far less room for ambiguity).

One thing I was thinking forward to with bitmasks was flexible subsetting of 
traces and notifications. Tracing may not ever need trace levels or flexible 
subsetting, but I thought I'd note the idea for future reference.

{noformat}
// pseudocode; please excuse the bad naming sense.
REPAIR_0 = 0b001 << MIN_SHIFT;
REPAIR_1 = 0b010 << MIN_SHIFT;
REPAIR_2 = 0b100 << MIN_SHIFT;

// read "zero and up", etc
REPAIR_0U = REPAIR_0 | REPAIR_1 | REPAIR_2;
REPAIR_1U =            REPAIR_1 | REPAIR_2;

NOTIFY_0 = 0b001;
NOTIFY_1 = 0b010;
NOTIFY_2 = 0b100;
NOTIFY_0U = ...;
NOTIFY_1U = ...;

val = config.get_val();

trace(REPAIR_0, "only traced for REPAIR_0; superseded in the more verbose 
traces");
trace(REPAIR_0U, "always traced for repair");
trace(REPAIR_1U, "traced at REPAIR_1 and up");
trace(REPAIR_2 | val, "only at REPAIR_2, or during the tracetypes specified in 
val");
trace(REPAIR_1U & ~val, "only at REPAIR_1 and up, excluding the tracetypes 
specified in val");
trace(REPAIR_2 | REPAIR_0, "for some reason, we don't want to trace this at 
REPAIR_1");

// you could even have semi-orthogonal subsetting and levels
trace(REPAIR_0U | NOTIFY_1, "notify at NOTIFY_1 only");
trace(REPAIR_1U | NOTIFY_0U, "notify at all NOTIFY levels, but only when 
tracing REPAIR_1 and up");

awaitNotification(NOTIFY_1U); // notify me for NOTIFY_1 and up.
{noformat}

I believe log4j does something similar with what they call markers.

Of course, you can do this with Enum and EnumSet too (though more verbosely).

----

Aside from lyubent's nit (and possibly the issues in section 1), are there any 
other issues that prevent this patch from being minimally useful?

Or even better, is there at least an alpha period where {{trunk}} is still 
allowed to make breaking changes to the api, protocol, etc? It's hard to forsee 
everything.


> Repair tracing
> --------------
>
>                 Key: CASSANDRA-5483
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Yuki Morishita
>            Assignee: Ben Chan
>            Priority: Minor
>              Labels: repair
>         Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, 
> 5483-v15-02-Hook-up-exponential-backoff-functionality.patch, 
> 5483-v15-03-Exact-doubling-for-exponential-backoff.patch, 
> 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch, 
> 5483-v15-05-Move-command-column-to-system_traces.sessions.patch, 
> 5483-v15.patch, ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (CASSANDRA-5483) Repair tracing

Reply via email to