[jira] [Commented] (CASSANDRA-5483) Repair tracing

2017-01-03 Thread Tomasz Grabiec (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15794949#comment-15794949
 ] 

Tomasz Grabiec commented on CASSANDRA-5483:
---

This change seems to have changed the behavior of "nodetool flush" to not flush 
the system keyspace (see CASSANDRA-11830).

{code}
commit f5866ca2bac8ce530bb4e20832ff61e622206815
Author: Ben Chan 
Date:   Wed Dec 3 13:15:10 2014 -0600

Add repair tracing

Patch by Ben Chan; reviewed by jmckenzie for CASSANDRA-5483
{code}

Relevant hunk:

{code}
--- a/src/java/org/apache/cassandra/tools/NodeTool.java
+++ b/src/java/org/apache/cassandra/tools/NodeTool.java
@@ -318,10 +318,15 @@ public class NodeTool
 
 protected List parseOptionalKeyspace(List cmdArgs, 
NodeProbe nodeProbe)
 {
+return parseOptionalKeyspace(cmdArgs, nodeProbe, false);
+}
+
+protected List parseOptionalKeyspace(List cmdArgs, 
NodeProbe nodeProbe, boolean includeSystemKS)
+{
 List keyspaces = new ArrayList<>();
 
 if (cmdArgs == null || cmdArgs.isEmpty())
-keyspaces.addAll(nodeProbe.getKeyspaces());
+keyspaces.addAll(includeSystemKS ? nodeProbe.getKeyspaces() : 
nodeProbe.getNonSystemKeyspaces());
 else
 keyspaces.add(cmdArgs.get(0));
 
{code}

I wonder what is the rationale for this.

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Fix For: 2.2.0 beta 1
>
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, 
> 5483-v15-02-Hook-up-exponential-backoff-functionality.patch, 
> 5483-v15-03-Exact-doubling-for-exponential-backoff.patch, 
> 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch, 
> 5483-v15-05-Move-command-column-to-system_traces.sessions.patch, 
> 5483-v15.patch, 5483-v17-00.patch, 5483-v17-01.patch, 5483-v17.patch, 
> ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-12-07 Thread Ben Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237144#comment-14237144
 ] 

Ben Chan commented on CASSANDRA-5483:
-

Thanks for helping get this through the final (?) set of speed bumps. As fast 
as this codebase moves, I'm surprised that last rebase went so smoothly. (NRN)


> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Fix For: 3.0
>
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, 
> 5483-v15-02-Hook-up-exponential-backoff-functionality.patch, 
> 5483-v15-03-Exact-doubling-for-exponential-backoff.patch, 
> 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch, 
> 5483-v15-05-Move-command-column-to-system_traces.sessions.patch, 
> 5483-v15.patch, 5483-v17-00.patch, 5483-v17-01.patch, 5483-v17.patch, 
> ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-12-02 Thread Ben Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14231840#comment-14231840
 ] 

Ben Chan commented on CASSANDRA-5483:
-

Sorry; limited computer access last week, zero access to dev box (network setup 
issues), turkey.



Branch 5483_squashed (currently at commit 78686c61e38e, merges cleanly with 
trunk at 06f626acd27b) looks fine to me; looks like there were only mechanical 
changes needed to resolve the merge conflicts.

Ran my standard simplistic test, and no obvious problems (for both 78686c61e38e 
and again when merged with 06f626acd27b).

Any problem is likely to be something I did, or some unforseen interaction with 
code in the updated trunk. Hopefully neither of those possibilities is true.  
I'll keep watching this thread/issue just in case.


> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Fix For: 3.0
>
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, 
> 5483-v15-02-Hook-up-exponential-backoff-functionality.patch, 
> 5483-v15-03-Exact-doubling-for-exponential-backoff.patch, 
> 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch, 
> 5483-v15-05-Move-command-column-to-system_traces.sessions.patch, 
> 5483-v15.patch, 5483-v17-00.patch, 5483-v17-01.patch, 5483-v17.patch, 
> ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-11-25 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225291#comment-14225291
 ] 

Joshua McKenzie commented on CASSANDRA-5483:


Sorry for the delay - was at a conference last week.  I went ahead and rebased 
to current trunk as of today as there were some more changes around repair 
parallelism; your most recent changes look good and I'm not too worried about 
us having some public final strings in TraceKeyspace.

If I could get a once-over reverse-review from you to make sure I didn't miss 
anything in that final rebase, I'll get this committed.

Rebased and squashed branch available 
[*Here*|https://github.com/josh-mckenzie/cassandra/compare/5483_squashed]

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Fix For: 3.0
>
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, 
> 5483-v15-02-Hook-up-exponential-backoff-functionality.patch, 
> 5483-v15-03-Exact-doubling-for-exponential-backoff.patch, 
> 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch, 
> 5483-v15-05-Move-command-column-to-system_traces.sessions.patch, 
> 5483-v15.patch, 5483-v17-00.patch, 5483-v17-01.patch, 5483-v17.patch, 
> ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-11-14 Thread Ben Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212771#comment-14212771
 ] 

Ben Chan commented on CASSANDRA-5483:
-

Updated https://github.com/usrbincc/cassandra/tree/5483-review (currently at 
commit 85d974753dc7f).

So a few changes to tracing got merged in on Nov 12, making a cut-and-dried 
manual rebase impossible.

- Use a default ttl on the system_traces.* CFs. Use that to clean up the code 
in Tracing, since you no longer have to specify ttls.
- Move a lot of row-insertion code out of Tracing into a new TraceKeyspace 
class.
- Since there is no more need to specify ttl, use CFRowAdder (which doesn't do 
ttl) for convenience.

Since the repair tracing patch requires a user-configurable ttl (or at the very 
least, a different ttl for repair tracing and query tracing), I needed to 
re-add ttl specification. Since it wasn't much more code, I decided to only use 
explicitly-expiring cells if the ttl didn't match the default ttl.  
  
This appears to save two native ints per column from a quick skim of the source 
code. Not sure if that's really enough to care about (especially since repair 
tracing, which is likely to insert more rows, has a ttl different from the 
default), but it was a simple enough change.



Merge conflicts are a real pain point with git. I attempted to make reviewing 
the merge conflict resolution changes easier by inserting an intermediate 
commit that includes the conflict markers from git, unmodified. Feel free to 
hide all the messiness with a {{git merge --squash}}.

- Fixed a race condition with repair trace sessions sometimes not being 
properly stopped (see commit 801b6fbf56771).
- Possible code no-no to note: I made two private fields (SESSIONS_TABLE and 
EVENTS_TABLE) in TraceKeyspace public. I only use EVENTS_TABLE, but made them 
both public for symmetry (good? bad? don't know). They're final fields, if 
that's any consolation.


> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Fix For: 3.0
>
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, 
> 5483-v15-02-Hook-up-exponential-backoff-functionality.patch, 
> 5483-v15-03-Exact-doubling-for-exponential-backoff.patch, 
> 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch, 
> 5483-v15-05-Move-command-column-to-system_traces.sessions.patch, 
> 5483-v15.patch, 5483-v17-00.patch, 5483-v17-01.patch, 5483-v17.patch, 
> ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the stat

[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-11-12 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208811#comment-14208811
 ] 

Joshua McKenzie commented on CASSANDRA-5483:


Linked branch looks good, renames are solid, tests out fine.  300MB of repairs 
at RF=3 generated around 300k of event trace data so I'm comfortable with the 
current level; we can revisit paring that down in the future as appropriate.

Patch doesn't apply/rebase to current trunk cleanly - if I could get you to do 
a quick rebase we'll be ready to wrap this up.  Thanks for all the hard work on 
this!

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Fix For: 3.0
>
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, 
> 5483-v15-02-Hook-up-exponential-backoff-functionality.patch, 
> 5483-v15-03-Exact-doubling-for-exponential-backoff.patch, 
> 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch, 
> 5483-v15-05-Move-command-column-to-system_traces.sessions.patch, 
> 5483-v15.patch, 5483-v17-00.patch, 5483-v17-01.patch, 5483-v17.patch, 
> ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-11-10 Thread Ben Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14205451#comment-14205451
 ] 

Ben Chan commented on CASSANDRA-5483:
-

Sorry; SMART and reallocated sectors. Which only really accounts for about 2-3 
days of the delay. But just to say that it wasn't merely a case of hangover 
from too much Halloween candy.



Updated https://github.com/usrbincc/cassandra/tree/5483-review (currently at 
commit 202a2e2e5e602).

Merges cleanly with trunk (at commit d286ac7d072fe), building and testing 
cleanly.

I decided to be a little opinionated and did some refactoring along the lines 
of my Oct 23 message.
- Used a TraceState#waitActivity function instead of TraceState#isDone 
(waitActivity gets closer to doing only what it says on the tin; makes it less 
hairy to comment).
- Moved all exponential backoff timeout code to StorageService#createQueryThread

In addition, I renamed TraceState#enableNotifications to 
TraceState#enableActivityNotification to attempt (naming is hard) to avoid 
confusion with TraceState#setNotificationHandle, which is entirely unrelated.

Note: beyond having made this opinionated edit, I'm not planning to be 
particularly opinionated about advocating for it. All of that code should 
eventually go away once there is some way to get notified about table updates 
instead of having to do all that messy polling.

Extra note: Cassandra triggers seem to be very close to what is needed, if only 
they could be specified to run on a given node (i.e. the node that is being 
repaired). The last time I checked on this, this wasn't possible.



Unfiltered traces:

- The extra traces are generic message send-receive traces that existed prior 
to this patch. They were originally there for query tracing, which benefits 
from more detailed tracing.
- These extra traces were filtered out for repair up until v16 of this patch. 
This means that any discussions of trace messages prior to that point are 
referring to the filtered traces.

But I can't say that they're doing any real harm. I mean, it's only 3x the 
traces (estimated), and not an order of magnitude or more.

It's probably fine as it is. I certainly can't unequivocally state that there's 
no use for those extra traces. Besides, extra information can always be 
filtered out at a higher level (assuming it's tagged appropriately).

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Fix For: 3.0
>
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, 
> 5483-v15-02-Hook-up-exponential-backoff-functionality.patch, 
> 5483-v15-03-Exact-doubling-for-exponential-backoff.patch, 
> 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch, 
> 5483-v15-05-Move-command-column-to-system_traces.sessions.patch, 
> 5483-v15.patch, 5483-v17-00.patch, 5483-v17-01.patch, 5483-v17.patch, 
> ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5

[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-10-29 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188766#comment-14188766
 ] 

Joshua McKenzie commented on CASSANDRA-5483:


Think that was just me not refreshing the tab I had open in Chrome w/this 
ticket in it - missed the update concerning local build problems.

Re: isDone: miscommunication in ticket.  Changing from bit-magic to named 
variables w/flow changed isDone into a self-commenting method and addressed my 
concerns with it.  I appreciate the javadoc @ the top to give just general 
overview as well.  The method is simple enough that switching to BlockingQueue 
isn't really necessary at this time imo.

Re: unfiltered traces: are these extraneous messages or just more verbosity 
surrounding the activity of the repair?  It does generate a lot of output but 
this is for tracing and I'd be comfortable with us filtering down as a future 
effort if we decide it's necessary.

Last nit: isDone should be renamed since it does more than just check if things 
are done.  We could either split out the isDone check from waitAndDouble() or 
just rename to waitWithDoubleTimeout() or something like that?

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Fix For: 3.0
>
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, 
> 5483-v15-02-Hook-up-exponential-backoff-functionality.patch, 
> 5483-v15-03-Exact-doubling-for-exponential-backoff.patch, 
> 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch, 
> 5483-v15-05-Move-command-column-to-system_traces.sessions.patch, 
> 5483-v15.patch, 5483-v17-00.patch, 5483-v17-01.patch, 5483-v17.patch, 
> ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-10-28 Thread Ben Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187618#comment-14187618
 ] 

Ben Chan commented on CASSANDRA-5483:
-

Communication seems to be off by a phase. My Oct 23 message may not have made 
it across. Either that or I wasn't writing clearly enough (But in my defense, 
natural language has a fiddly REPL and almost-useless error messages. And 
there's no source code.).

Summary:
- Forget about the removal of the addCallback + sameThreadExecutor code before. 
 It turns out I didn't need to do that. The problem (I think) was some 
corruption in my build.
- Commenting isDone results in a convoluted comment. I was considering whether 
or not to do a refactor in order to make the comment require less complication.



For now I've gone ahead and commented isDone as best I can and fixed the naming 
that resulted in {{wait(wait);}}, along with various other fixes.

See https://github.com/usrbincc/cassandra/tree/5483-review (currently at commit 
15d8ec0a9fbbf).



Closing comments:
- For future reference: If you don't care about handling multiple consumers (as 
the current isDone does not), a large chunk of isDone (and its supporting code) 
could easily be replaced with a BlockingQueue and its {{offer}} and {{poll}} 
methods.
- Can I get a confirmation that the unfiltered repair traces (see my Oct 22 
message) in the current patch is acceptable?


> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Fix For: 3.0
>
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, 
> 5483-v15-02-Hook-up-exponential-backoff-functionality.patch, 
> 5483-v15-03-Exact-doubling-for-exponential-backoff.patch, 
> 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch, 
> 5483-v15-05-Move-command-column-to-system_traces.sessions.patch, 
> 5483-v15.patch, 5483-v17-00.patch, 5483-v17-01.patch, 5483-v17.patch, 
> ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-10-27 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185252#comment-14185252
 ] 

Joshua McKenzie commented on CASSANDRA-5483:


DebuggableThreadPoolExecutor:
bq. Just to confirm (though I'm mostly sure this is what you mean), Do you mean 
that the function name should be changed to something like 
createCachedThreadpoolWithMaximumPoolSize?
Yep - just denote that it's max pool size

bq. If an older server version ever initiates a trace, then traceType could be 
null. And if it is null, that means that it's a QUERY traceType (since that's 
the only kind of trace available in previous versions). If a newer server 
version ever sends a null traceType, that should probably be an error, though 
nothing like that is being done yet.
Totally legit.  I'm fine with that.

bq. I touched on this code change earlier in the paragraph that begins "Note 
that I switched some code".
You did, and that's my bad.  There's a lot of history on this ticket that I 
needed to re-read and that was one of those things.  :)  I believe the changed 
code will behave identically to the original implementation with the caveats 
you listed, and so long as we can hand off that swallowed exception to the 
JVMStabilityInspector and document that we intend to swallow if things explode, 
I have no complaints.

bq. Futures#addCallback uses Uninterruptibles#getUninterruptibly instead of a 
plain Future#get, the main difference being that it will keep retrying 
Future#get even in the face of InterruptedException.
Not sure whether this would be a problem or not, but replicating previous 
behavior would be the safest route.

bq. In the interests of not gratuitously rearranging code, I'll merely see 
about adding better comments and making the code less magical.
Less magical is almost always a good thing.

{code} wait(wait); {code}
Reads oddly - might want to tweak naming there.  Other than that, isDone on the 
review branch comments itself.

nit: a more helpful description for trace_repair would be good.  Currently 
reads "Use -tr to trace repair"

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Fix For: 3.0
>
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, 
> 5483-v15-02-Hook-up-exponential-backoff-functionality.patch, 
> 5483-v15-03-Exact-doubling-for-exponential-backoff.patch, 
> 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch, 
> 5483-v15-05-Move-command-column-to-system_traces.sessions.patch, 
> 5483-v15.patch, 5483-v17-00.patch, 5483-v17-01.patch, 5483-v17.patch, 
> ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed

[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-10-23 Thread Ben Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181643#comment-14181643
 ] 

Ben Chan commented on CASSANDRA-5483:
-

Review branch up at:

https://github.com/usrbincc/cassandra/tree/5483-review



StorageService#createRepairTask:

So it turns out that my mistaken assumption was this: 
"Tracing.instance.stopSession() needs to run in the main repair thread." (This 
was not one of the assumptions that I listed explicitly above).

Why did I believe this? Because repair kept on hanging at the end until I made 
this change as a sanity check.

Why does it not hang currently? I don't know, but I suspect there was something 
odd in the build that got cleaned out when I did an "ant clean". Or when I 
manually cleaned out the build/lib/jars directory (there were multiple versions 
of the same library in there; "ant clean" was not taking care of this).

So I put back the original addCallback code structure.



Commenting isDone:

I started trying to do this, but it quickly devolved into just badly parroting 
the code. It would simplify commenting if I separated this function into 
multiple parts, maybe something like this:
{code:java}
// TraceState
public enum Status { IDLE, ACTIVE, STOPPED; }
private Status status;
/*  tentative doc comment 
 * Wait (with timeout) until status is ACTIVE or STOPPED. Reading an ACTIVE
 * status resets it to IDLE. Due to the status reset logic, this function only
 * works properly with a single consumer.
 * @param timeout timeout in milliseconds
 * @return activity status
 */
public synchronized Status getActivity(long timeout)
{
// Note: This is just a placeholder for the general gist.
while (status == Status.IDLE)
wait(timeout); // TODO: we need to update timeout if this actually 
loops.
Status retStatus = status;
status = status == Status.ACTIVE ? Status.IDLE : status;
return retStatus;
}

// createQueryThread's returned Runnable#run
while ((r = getActivity(timeout)) != TraceState.Status.STOPPED)
{
   if (r == TraceState.Status.IDLE) // double timeout if second time through 
with this timeout
   { ... }
   else // reset timeout to minWait
   { ... }
   // ...
}
{code}

Note that this would end up being structured like a cleaner version of 
[^5483-v14-01-squashed.patch], with further simplifications due to not trying 
to handle multiple consumers.


> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Fix For: 3.0
>
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, 
> 5483-v15-02-Hook-up-exponential-backoff-functionality.patch, 
> 5483-v15-03-Exact-doubling-for-exponential-backoff.patch, 
> 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch, 
> 5483-v15-05-Move-command-column-to-system_traces.sessions.patch, 
> 5483-v15.patch, 5483-v17-00.patch, 5483-v17-01.patch, 5483-v17.patch, 
> ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-re

[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-10-22 Thread Ben Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180613#comment-14180613
 ] 

Ben Chan commented on CASSANDRA-5483:
-

I'm going to dissapoint you and say that a lot of the reasons for how things 
are is historical, or due to code transformations intended to precisely 
maintain some aspect of the old behavior, without trying to question the logic 
behind that particular behavior.

In other words, I've just been trying to add this one particular behavior 
(repair tracing) without otherwise changing program behavior. This is probably 
most obvious around the trace calls, where (hopefully) the log messages are 
unchanged, even when they're different from the trace messages. I really need 
to go back through that with a fine-tooth comb, because I think there were some 
places where I may have changed capitalization. Or worse.



Most of your comments were straightforward. Just need some clarification on 
these:

{quote}
DebuggableThreadPoolExecutor:

Do we intend to create w/Integer.MAX_VALUE as 'size' on this method?
{quote}

Yes, in order to preserve the behavior of the previous code (see my 2014-03-08 
comment and the discussion surrounding it).

{quote}
If so, name should reflect it like other methods in file.
{quote}

Just to confirm (though I'm mostly sure this is what you mean), Do you mean 
that the function name should be changed to something like 
{{createCachedThreadpoolWithMaximumPoolSize}}?

{quote}
OutboundTcpConnection:

Why do we assume that if TraceType is null the type is QUERY?
{quote}

If an older server version ever initiates a trace, then traceType could be 
null. And if it *is* null, that means that it's a QUERY traceType (since that's 
the only kind of trace available in previous versions). If a newer server 
version ever sends a null traceType, that should probably be an error, though 
nothing like that is being done yet.

{quote}
StorageService:

Swallowing Throwable on line 2828 w/out any logging or comment as to why is 
unclear. Using exception handling as control flow is step back from the 
sameThreadExecutor error handling approach taken prior as we lose naming and 
intention information. Does this gain us something that I'm not aware of?
{quote}

Before saying anything else: yes, I probably should have commented the 
exception-swallowing.

I touched on this code change earlier in the paragraph that begins "Note that I 
switched some code". In a nutshell: preserving behavior of previous code.

Whether I succeeded or not is a different issue, but at least let me run 
through my thought process so that you can help me find the holes in it. Here 
is a redacted version of the code:

{noformat}
// Previous version:
Futures.addCallback(allSessions, new FutureCallback()
{
public void onSuccess(@Nullable Object result)
{
// onSuccess stuff ...
repairComplete();
}

public void onFailure(Throwable t)
{
// onFailure stuff (empty, except for repairComplete) ...
repairComplete();
}

private void repairComplete()
{
// repairComplete stuff ...
}
}, MoreExecutors.sameThreadExecutor());
{noformat}

Here are my assumptions:
- {{// onSuccess stuff ...}} does not throw any exceptions out of its code 
block.
- {{// repairComplete stuff ...}} works the same no matter which thread it runs 
in (assuming it's called at the proper time, of course).
- The guava library used corresponds to the source code found at 
{{build/lib/sources/guava-16.0.1-sources.jar}}.

{noformat}
// Review version:
try
{
allSessions.get();
// onSuccess stuff ...
}
catch (Throwable t)
{
// onFailure stuff (empty, except for repairComplete) ...
}
// repairComplete stuff ...
{noformat}

Since {{allSessions.get()}} is the only thing that can throw to the catch 
block, then its success or failure should (almost, see below) be the only way 
to trigger the onSuccess and onFailure "branches" that correspond to the 
original code.

Intended difference:
- {{// repairComplete stuff ...}} should now run in the main repair thread, 
instead of in the thread of the last exiting subtask in allSessions.

Unintended differences:
- Futures#addCallback uses Uninterruptibles#getUninterruptibly instead of a 
plain Future#get, the main difference being that it will keep retrying 
Future#get even in the face of InterruptedException. This may be important, so 
I could do this also.

{quote}
TraceState

Right now isDone is only called by the query thread but there's nothing in the 
method signature or comments of that method documenting the 
doubling-every-second-run algorithm it's using.
{quote}

The doubling behavior used to be contained in the query thread; "isDone" used 
to be a (more or less) simple wait with timeout.

Agree on needing comments. In the interests of not gratuitously rearranging 
code, I'll merely see about addi

[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-10-22 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180384#comment-14180384
 ] 

Joshua McKenzie commented on CASSANDRA-5483:


Initial observations / thoughts:

90 days on default repair trace TTL seems long to me, especially w/incremental 
being new default.

DebuggableThreadPoolExecutor:
- 'createWithCachedTheadpool' is misspelled -> needs an r in Thread
- Do we intend to create w/Integer.MAX_VALUE as 'size' on this method?  If so, 
name should reflect it like other methods in file.

MessageOut:
- traceParameters isn't used elsewhere so logic can be nested inside 
constructor where it's called and remove unnecessary method call indirection.

OutboundTcpConnection:
- Why do we assume that if TraceType is null the type is QUERY?  We should 
probably be explicitly setting that for all tracing types going forward if we 
plan on expanding the use of the enum

StorageService:
- nit: declaring queryThread in createRepairTask but aren't doing anything with 
it -> we should just call the method w/out declaration if we don't intend to 
use it
- usage of bitwise ops to toggle si for indexing into seen array is 
unnecessarily complex.  A distinct 0/1 and toggle is more clear - there's no 
need for inline index flipping w/bitwise ops as it's not gaining us any 
performance or code clarity.
- nit: If not intending to use Object result from allSessions.get(), just 
remove the declaration and call the method
- Swallowing Throwable on line 2828 w/out any logging or comment as to why is 
unclear.  Using exception handling as control flow is step back from the 
sameThreadExecutor error handling approach taken prior as we lose naming and 
intention information.  Does this gain us something that I'm not aware of?

TraceState
- Right now isDone is only called by the query thread but there's nothing in 
the method signature or comments of that method documenting the 
doubling-every-second-run algorithm it's using.
- the implementation of isDone looks like a premature optimization.  I would 
prefer a more clear implementation of 'double every second run' than:
{code}
else if (haveWaited)
{
wait = Math.min((wait ^ 1) << (wait & 1), maxWaitMillis);
return false;
}
else
{
haveWaited = true;
try
{
wait(wait & ~1);
}
{code}

Now that I'm more familiar with the changes here I'm going to need to re-read 
the notes on this ticket and test out / inspect output.  The above is just from 
my first read-through of the code yesterday.

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Fix For: 3.0
>
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, 
> 5483-v15-02-Hook-up-exponential-backoff-functionality.patch, 
> 5483-v15-03-Exact-doubling-for-exponential-backoff.patch, 
> 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch, 
> 5483-v15-05-Move-command-column-to-system_traces.sessions.patch, 
> 5483-v15.patch, 5483-v17-00.patch, 5483-v17-01.patch, 5483-v17.patch, 
> ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8e

[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-10-22 Thread Ben Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180257#comment-14180257
 ] 

Ben Chan commented on CASSANDRA-5483:
-

Hm. Does that mean you've already got some feedback ready?

I'll be here on and off for a little while. Maybe we can get in one full 
iteration (or n?) today.


> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Fix For: 3.0
>
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, 
> 5483-v15-02-Hook-up-exponential-backoff-functionality.patch, 
> 5483-v15-03-Exact-doubling-for-exponential-backoff.patch, 
> 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch, 
> 5483-v15-05-Move-command-column-to-system_traces.sessions.patch, 
> 5483-v15.patch, 5483-v17-00.patch, 5483-v17-01.patch, 5483-v17.patch, 
> ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-10-22 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180242#comment-14180242
 ] 

Joshua McKenzie commented on CASSANDRA-5483:


I actually started looking at this yesterday, hence my question. :)

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Fix For: 3.0
>
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, 
> 5483-v15-02-Hook-up-exponential-backoff-functionality.patch, 
> 5483-v15-03-Exact-doubling-for-exponential-backoff.patch, 
> 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch, 
> 5483-v15-05-Move-command-column-to-system_traces.sessions.patch, 
> 5483-v15.patch, 5483-v17-00.patch, 5483-v17-01.patch, 5483-v17.patch, 
> ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-10-22 Thread Ben Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180236#comment-14180236
 ] 

Ben Chan commented on CASSANDRA-5483:
-

I'm not planning on making any more changes pre-review. Feel free to go ahead.

(Though it would have been fine even if you had started prior to 5483-v17-00 
and 5483-v17-01; the differences are not substantial)


> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Fix For: 3.0
>
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, 
> 5483-v15-02-Hook-up-exponential-backoff-functionality.patch, 
> 5483-v15-03-Exact-doubling-for-exponential-backoff.patch, 
> 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch, 
> 5483-v15-05-Move-command-column-to-system_traces.sessions.patch, 
> 5483-v15.patch, 5483-v17-00.patch, 5483-v17-01.patch, 5483-v17.patch, 
> ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-10-22 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180122#comment-14180122
 ] 

Joshua McKenzie commented on CASSANDRA-5483:


I've put together a feature branch w/v17 and the 2 patches 
[here|https://github.com/josh-mckenzie/cassandra/compare/5483].  Are we stable 
and ready for review at this point [~usrbincc]?

Also, for subsequent revisions please clone this feature branch, make 
modifications, and upload to github and link the branch here in the comments.  
This ticket is already far too cluttered w/attachments and it'll help keep 
things more clear.

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Fix For: 3.0
>
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, 
> 5483-v15-02-Hook-up-exponential-backoff-functionality.patch, 
> 5483-v15-03-Exact-doubling-for-exponential-backoff.patch, 
> 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch, 
> 5483-v15-05-Move-command-column-to-system_traces.sessions.patch, 
> 5483-v15.patch, 5483-v17-00.patch, 5483-v17-01.patch, 5483-v17.patch, 
> ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-10-22 Thread Ben Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180004#comment-14180004
 ] 

Ben Chan commented on CASSANDRA-5483:
-

Thanks; "node repair -full" (in conjunction with "ccm clear") does indeed force 
streaming repair.

[^5483-v17-00.patch] fixes a few trace calls that got lost/misplaced in the 
rebase.
[^5483-v17-01.patch] reorganizes some code; functionally identical.

{noformat}
# This bash code should work out-of-the-box with a stock setup.
JMXGET='/jmx_port/{p=$2;} /binary/{split($2,a,/\047/);h=a[2];}
  END{printf("bin/nodetool -h %s -p %s\n",h,p,cmd);}'
ccm_nodetool() { local N=$1; shift; $(ccm $N show | awk -F= "$JMXGET") "$@"; }
dl_apply_maybe() { for url; do { [ -e $(basename $url) ] || curl -sO $url; } &&
  ! [ ${url%.patch} = $url ] && git apply $(basename $url); done; }

NEW_BRANCH=$(date +5483-17--%Y%m%d-%H%M%S)
W=https://issues.apache.org/jira/secure/attachment

git checkout -b $NEW_BRANCH 49833b9 &&
dl_apply_maybe \
  $W/12633156/ccm-repair-test \
  $W/12675963/5483-v17.patch \
  $W/12675963/5483-v17-00.patch \
  $W/12676340/5483-v17-01.patch &&
ant clean && ant &&
chmod +x ./ccm-repair-test && ./ccm-repair-test -kR &&
ccm node1 stop && ccm node1 clear && ccm node1 start &&
ccm_nodetool node1 repair -tr -full &&
ccm node1 showlog | grep "Performing streaming repair"
{noformat}



Note that I switched some code to a different thread in order to facilitate 
trace handling -- see the diff hunk near the end of 
StorageService#createRepairTask. I needed certain calls to happen in the parent 
repair thread, and this seemed to be the simplest way.

As far as I can tell, there shouldn't be any differences in functionality or 
concurrency level (i.e. number of tasks that are executing concurrently), but 
someone should examine that section just to make sure.



The issue with unfiltered traces (see my previous message) still remains, but 
if push comes to shove, you can consider this as just a special case of having 
a trace call where it's not needed, or missing where it *is* needed.

In other words, a cosmetic problem. Fixing this should not require any changes 
or additions to the over-wire protocol or the JMX interface.


> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Fix For: 3.0
>
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, 
> 5483-v15-02-Hook-up-exponential-backoff-functionality.patch, 
> 5483-v15-03-Exact-doubling-for-exponential-backoff.patch, 
> 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch, 
> 5483-v15-05-Move-command-column-to-system_traces.sessions.patch, 
> 5483-v15.patch, 5483-v17-00.patch, 5483-v17-01.patch, 5483-v17.patch, 
> ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> s

[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-10-20 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177945#comment-14177945
 ] 

Marcus Eriksson commented on CASSANDRA-5483:


you probably need to do "nodetool repair -full", incremental repairs are 
default in trunk

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Fix For: 3.0
>
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, 
> 5483-v15-02-Hook-up-exponential-backoff-functionality.patch, 
> 5483-v15-03-Exact-doubling-for-exponential-backoff.patch, 
> 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch, 
> 5483-v15-05-Move-command-column-to-system_traces.sessions.patch, 
> 5483-v15.patch, 5483-v17.patch, ccm-repair-test, 
> cqlsh-left-justify-text-columns.patch, prerepair-vs-postbuggedrepair.diff, 
> test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-10-20 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177816#comment-14177816
 ] 

Philip Thompson commented on CASSANDRA-5483:


No, I know of no changes to that command. I was not even aware of its existence.

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Fix For: 3.0
>
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, 
> 5483-v15-02-Hook-up-exponential-backoff-functionality.patch, 
> 5483-v15-03-Exact-doubling-for-exponential-backoff.patch, 
> 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch, 
> 5483-v15-05-Move-command-column-to-system_traces.sessions.patch, 
> 5483-v15.patch, 5483-v17.patch, ccm-repair-test, 
> cqlsh-left-justify-text-columns.patch, prerepair-vs-postbuggedrepair.diff, 
> test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-10-20 Thread Michael Shuler (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177804#comment-14177804
 ] 

Michael Shuler commented on CASSANDRA-5483:
---

bq. My old method for forcing a streaming repair (yukim's "ccm clear" 
suggestion) no longer seems to work.
I peeked at the ccm git repo and I don't see any recent changes around "clear". 
 I did not go through your whole process, but a quick test of clear works out 
for me with the latest ccm HEAD:
{noformat}
mshuler@hana:~$ ccm create test -v git:cassandra-2.1.0 -n1 -s
Fetching Cassandra updates...
Current cluster is now: test
mshuler@hana:~$ ls -la ~/.ccm/test/node1/{commitlogs,data/system}/
/home/mshuler/.ccm/test/node1/commitlogs/:
total 108
drwxr-xr-x 2 mshuler mshuler 4096 Oct 20 20:34 .
drwxr-xr-x 8 mshuler mshuler 4096 Oct 20 20:34 ..
-rw-r--r-- 1 mshuler mshuler 33554432 Oct 20 20:35 CommitLog-4-1413855299422.log
-rw-r--r-- 1 mshuler mshuler 33554432 Oct 20 20:34 CommitLog-4-1413855299423.log

/home/mshuler/.ccm/test/node1/data/system/:
total 72
drwxr-xr-x 18 mshuler mshuler 4096 Oct 20 20:34 .
drwxr-xr-x  4 mshuler mshuler 4096 Oct 20 20:35 ..
drwxr-xr-x  2 mshuler mshuler 4096 Oct 20 20:34 
IndexInfo-9f5c6374d48532299a0a5094af9ad1e3
drwxr-xr-x  2 mshuler mshuler 4096 Oct 20 20:34 
batchlog-0290003c977e397cac3efdfdc01d626b
drwxr-xr-x  2 mshuler mshuler 4096 Oct 20 20:34 
compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca
drwxr-xr-x  3 mshuler mshuler 4096 Oct 20 20:34 
compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b
drwxr-xr-x  2 mshuler mshuler 4096 Oct 20 20:34 
hints-2666e20573ef38b390fefecf96e8f0c7
drwxr-xr-x  2 mshuler mshuler 4096 Oct 20 20:35 
local-7ad54392bcdd35a684174e047860b377
drwxr-xr-x  2 mshuler mshuler 4096 Oct 20 20:34 
paxos-b7b7f0c2fd0a34108c053ef614bb7c2d
drwxr-xr-x  2 mshuler mshuler 4096 Oct 20 20:34 
peer_events-59dfeaea8db2334191ef109974d81484
drwxr-xr-x  2 mshuler mshuler 4096 Oct 20 20:34 
peers-37f71aca7dc2383ba70672528af04d4f
drwxr-xr-x  2 mshuler mshuler 4096 Oct 20 20:34 
range_xfers-55d764384e553f8b9f6e676d4af3976d
drwxr-xr-x  2 mshuler mshuler 4096 Oct 20 20:35 
schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697
drwxr-xr-x  2 mshuler mshuler 4096 Oct 20 20:35 
schema_columns-296e9c049bec3085827dc17d3df2122a
drwxr-xr-x  2 mshuler mshuler 4096 Oct 20 20:35 
schema_keyspaces-b0f2235744583cdb9631c43e59ce3676
drwxr-xr-x  2 mshuler mshuler 4096 Oct 20 20:34 
schema_triggers-0359bc7171233ee19a4ab9dfb11fc125
drwxr-xr-x  2 mshuler mshuler 4096 Oct 20 20:34 
schema_usertypes-3aa752254f82350b8d5c430fa221fa0a
drwxr-xr-x  2 mshuler mshuler 4096 Oct 20 20:34 
sstable_activity-5a1ff267ace03f128563cfae6103c65e
mshuler@hana:~$ ccm node1 clear
mshuler@hana:~$ ls -la ~/.ccm/test/node1/{commitlogs,data/system}/
ls: cannot access /home/mshuler/.ccm/test/node1/data/system/: No such file or 
directory
/home/mshuler/.ccm/test/node1/commitlogs/:
total 8
drwxr-xr-x 2 mshuler mshuler 4096 Oct 20 20:35 .
drwxr-xr-x 8 mshuler mshuler 4096 Oct 20 20:35 ..
{noformat}
[~philipthompson] do you know of any recent changes around "clear"?

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Fix For: 3.0
>
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, 
> 5483-v15-02-Hook-up-exponential-backoff-functionality.patch, 
> 5483-v15-03-Exact-doubling-for-exponential-backoff.patch, 
> 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch, 
> 5483-v15-05-Move-command

[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-10-20 Thread Ben Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177629#comment-14177629
 ] 

Ben Chan commented on CASSANDRA-5483:
-

Some nice changes to the JMX interface in the last 3 months; I no longer have 
to add lots of boilerplate function overloads.

{noformat}
read -r JMXGET  Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Fix For: 3.0
>
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, 
> 5483-v15-02-Hook-up-exponential-backoff-functionality.patch, 
> 5483-v15-03-Exact-doubling-for-exponential-backoff.patch, 
> 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch, 
> 5483-v15-05-Move-command-column-to-system_traces.sessions.patch, 
> 5483-v15.patch, 5483-v17.patch, ccm-repair-test, 
> cqlsh-left-justify-text-columns.patch, prerepair-vs-postbuggedrepair.diff, 
> test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-10-14 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171479#comment-14171479
 ] 

Jonathan Ellis commented on CASSANDRA-5483:
---

That should be it!

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Fix For: 3.0
>
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, 
> 5483-v15-02-Hook-up-exponential-backoff-functionality.patch, 
> 5483-v15-03-Exact-doubling-for-exponential-backoff.patch, 
> 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch, 
> 5483-v15-05-Move-command-column-to-system_traces.sessions.patch, 
> 5483-v15.patch, ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-10-12 Thread Ben Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168742#comment-14168742
 ] 

Ben Chan commented on CASSANDRA-5483:
-

Okay, that covers all the questions that I had.

TODO list:
- Rebase to trunk (hopefully the last 3 months haven't made this too painful).
- Miscellaneous remaining cruft from the 5483-16 cleanup. Possibly some code 
simplification (low-hanging fruit only).
- Guard out-of-bounds-array access in TraceState#deserialize. The simplest 
thing I can think of is to have a "NONE" tracetype and return that for any 
unknown tracetype.
- Fix format string "Sending completed merkle tree to %s for %s.%s" being 
traced verbatim.

Barring anything unforseen, is that about it before it's good enough for a 
merge?

It may be a few days before I actually get around to this. I had some computer 
trouble recently, so I'm going to have to set up my build environment from 
scratch.


> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Fix For: 3.0
>
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, 
> 5483-v15-02-Hook-up-exponential-backoff-functionality.patch, 
> 5483-v15-03-Exact-doubling-for-exponential-backoff.patch, 
> 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch, 
> 5483-v15-05-Move-command-column-to-system_traces.sessions.patch, 
> 5483-v15.patch, ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-10-07 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162149#comment-14162149
 ] 

Jonathan Ellis commented on CASSANDRA-5483:
---

bq. If remote nodes have different config values for tracetype_repair_ttl, then 
remote traces will end up with a different ttl

I'm fine with the limitation that weird things happen if you configure it wrong.

bq. If server versions are ever mixed, then an unknown trace type could be sent 
over the wire, causing an out-of-bounds array access during deserialization.

We definitely need to support mixed-version repairs, so it sounds like this 
needs to be fixed.

bq. Activity notification for every local repair trace starts a new exponential 
backoff polling cycle more often than necessary...  I was thinking forward to 
with bitmasks was flexible subsetting of traces and notifications.

I'm a big fan of not oversolving the problem.  Let's start with a simple 
solution and evolve from there.

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Fix For: 3.0
>
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, 
> 5483-v15-02-Hook-up-exponential-backoff-functionality.patch, 
> 5483-v15-03-Exact-doubling-for-exponential-backoff.patch, 
> 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch, 
> 5483-v15-05-Move-command-column-to-system_traces.sessions.patch, 
> 5483-v15.patch, ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-07-10 Thread Ben Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057548#comment-14057548
 ] 

Ben Chan commented on CASSANDRA-5483:
-

A few things to note about 5483-16:

Just to emphasize: these are corner-cases and deoptimizations that may not ever 
matter in practice. Even if the decision is to not fix them, I wanted to note 
their existence.

*(1)*
If remote nodes have different config values for {{tracetype_repair_ttl}}, then 
remote traces will end up with a different ttl. Having the trace for a given 
session partially disappear may be surprising to the user.

The approach 5483-15 took was to use the ttl configured on the node being 
repaired.

*(2)*
I don't know whether this will ever arise in practice, but I thought I'd note: 
If server versions are ever mixed, then an unknown trace type could be sent 
over the wire, causing an out-of-bounds array access during deserialization.

The approach 5483-15 took was almost a no-op on the remote end (the tracing 
session was created, but nothing was traced). This is still not quite right. 
Better would be to log an error and not create the session at all.

It might still possibly make sense to create the session anyway in the case of 
A->B->C tracestate propagation, but that might be stretching things.

*(3)*
Activity notification for every local repair trace starts a new exponential 
backoff polling cycle more often than necessary.

Of course, I can't say whether this extra resource usage will actually matter 
in practice (only practice can tell).



Okay, so what you meant was to change tracetype to be an actual Enum type (I 
can be a little dense sometimes, and code leaves far less room for ambiguity).

One thing I was thinking forward to with bitmasks was flexible subsetting of 
traces and notifications. Tracing may not ever need trace levels or flexible 
subsetting, but I thought I'd note the idea for future reference.

{noformat}
// pseudocode; please excuse the bad naming sense.
REPAIR_0 = 0b001 << MIN_SHIFT;
REPAIR_1 = 0b010 << MIN_SHIFT;
REPAIR_2 = 0b100 << MIN_SHIFT;

// read "zero and up", etc
REPAIR_0U = REPAIR_0 | REPAIR_1 | REPAIR_2;
REPAIR_1U =REPAIR_1 | REPAIR_2;

NOTIFY_0 = 0b001;
NOTIFY_1 = 0b010;
NOTIFY_2 = 0b100;
NOTIFY_0U = ...;
NOTIFY_1U = ...;

val = config.get_val();

trace(REPAIR_0, "only traced for REPAIR_0; superseded in the more verbose 
traces");
trace(REPAIR_0U, "always traced for repair");
trace(REPAIR_1U, "traced at REPAIR_1 and up");
trace(REPAIR_2 | val, "only at REPAIR_2, or during the tracetypes specified in 
val");
trace(REPAIR_1U & ~val, "only at REPAIR_1 and up, excluding the tracetypes 
specified in val");
trace(REPAIR_2 | REPAIR_0, "for some reason, we don't want to trace this at 
REPAIR_1");

// you could even have semi-orthogonal subsetting and levels
trace(REPAIR_0U | NOTIFY_1, "notify at NOTIFY_1 only");
trace(REPAIR_1U | NOTIFY_0U, "notify at all NOTIFY levels, but only when 
tracing REPAIR_1 and up");

awaitNotification(NOTIFY_1U); // notify me for NOTIFY_1 and up.
{noformat}

I believe log4j does something similar with what they call markers.

Of course, you can do this with Enum and EnumSet too (though more verbosely).



Aside from lyubent's nit (and possibly the issues in section 1), are there any 
other issues that prevent this patch from being minimally useful?

Or even better, is there at least an alpha period where {{trunk}} is still 
allowed to make breaking changes to the api, protocol, etc? It's hard to forsee 
everything.


> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471

[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-07-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056727#comment-14056727
 ] 

Jonathan Ellis commented on CASSANDRA-5483:
---

bq. The code actually does use an enum (of sorts; not an Enum proper), the 
traceType. The traceType is passed to Tracing#getCommandName to look up the 
String for command name.

Right.  Any reason to not just use a real enum?

Or maybe even better -- 

bq. the server isn't going to work with old versions of nodetool

I get that, but C* is popular enough now that there's a reasonable number of 
out-of-tree C* JMX users.  I'm okay with breaking compatibility for a good 
cause but here it seems pretty easy to avoid.

bq. Repair can take a long time, so 24 hours may be too short of a ttl.

Good point.

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, 
> 5483-v15-02-Hook-up-exponential-backoff-functionality.patch, 
> 5483-v15-03-Exact-doubling-for-exponential-backoff.patch, 
> 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch, 
> 5483-v15-05-Move-command-column-to-system_traces.sessions.patch, 
> 5483-v15.patch, ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-07-06 Thread Lyuben Todorov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14053222#comment-14053222
 ] 

Lyuben Todorov commented on CASSANDRA-5483:
---

+1 but with a nit:
*Sending completed merkle tree to %s for %s.%s* appears in the logs because of:
{{Tracing.traceRepair("Sending completed merkle tree to %s for %s.%s", 
initiator, desc.keyspace, desc.columnFamily);}} in {{Validator#run}}. 

A switch can be made from _%s_ to _{}_ instead of using String.format for the 
sake of minimalism. 

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, 
> 5483-v15-02-Hook-up-exponential-backoff-functionality.patch, 
> 5483-v15-03-Exact-doubling-for-exponential-backoff.patch, 
> 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch, 
> 5483-v15-05-Move-command-column-to-system_traces.sessions.patch, 
> 5483-v15.patch, ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-06-27 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046316#comment-14046316
 ] 

Jonathan Ellis commented on CASSANDRA-5483:
---

Posted v16 to https://github.com/jbellis/cassandra/commits/5483-16.

Simplifies TraceState as follows:
 - TraceType is an enum
 - always use the tracetype the state was created for
 - don't try to overgeneralize the notification handle
 - encapsulate notification fields
 - ttl is uniquely determined by trace type

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, 
> 5483-v15-02-Hook-up-exponential-backoff-functionality.patch, 
> 5483-v15-03-Exact-doubling-for-exponential-backoff.patch, 
> 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch, 
> 5483-v15-05-Move-command-column-to-system_traces.sessions.patch, 
> 5483-v15.patch, ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-06-12 Thread Ben Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030076#comment-14030076
 ] 

Ben Chan commented on CASSANDRA-5483:
-

{quote}
the exponential backoff is maintained, it's only defined more simply (as far as 
I could read you doubled the wait time every second timeout, which is what 
happens now also).
{quote}

Sorry, I was speed-reading the code; I saw {{wait();}} (i.e. with no timeout), 
so I immediately knew there couldn't be a timeout, exponential or otherwise.

I mostly ignored what wouldn't affect execution (stupid mental code optimizer).

If you have {{wait(wait);}} and reset {{wait}} (the {{long}}) to 
{{minWaitMillis}} in {{if (hasNotifications)}} then that should be about right. 
I've attached a patch that does this 
([^5483-v15-02-Hook-up-exponential-backoff-functionality.patch]).

I apologize for reading-code-while-distracted, thus causing me to go on about 
the exponential backoff. I'll try to learn from this experience.

---

I liked your exponential backoff calculation code; I took the liberty of 
tweaking it to double exactly in 
[^5483-v15-03-Exact-doubling-for-exponential-backoff.patch].

---

{quote}
As the code is defined now we explicitly create a repair session and spawn a 
specific thread to process it, so we have a guarantee there's only one thread. 
It doesn't matter if the consumer is in its method when notifyAll is called; if 
it isn't it will receive the notification as soon as it next enters the method.
{quote}

I agree with you.

When I said "only ever" before, I was talking about future code, not the code 
in its current state. Had I considered the alternate interpretation of "edge 
case in the current code", I would have clarified better, but there we are.

Just to reiterate, I'm reasonably cognizant that for the code in its current 
state, only a single thread is accessing the notification for a given 
TraceState, and how this allows for a simpler implementation.

All I was trying to say in my previous comment was that notification upon 
activity is a general capability, and in future code, should there ever be a 
use case for multiple threads accessing the same activity notification, then 
the implementation will have to be changed. (This also holds, should there ever 
be a use case that requires something other than exponential backoff.)

Our differences turn out to be philosophical only, and I'm not all that 
attached to mine.

I've already accepted [^5483-v15.patch] and have built my fixes for (some of) 
the Jun 10 comments on top of it. I need some more clarification before I can 
be sure what is required in order to resolve the remaining issues.

---

Test code, modified and simplified because of the updated system_traces.* 
schemas (I may upload an updated {{ccm-repair-test}} if we have many more patch 
iterations).

{noformat}
read -r JMXGET  Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 

[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-06-11 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028104#comment-14028104
 ] 

Benedict commented on CASSANDRA-5483:
-

Like I said, I only replicated functionality - the exponential backoff is 
maintained, it's only defined more simply (as far as I could read you doubled 
the wait time every second timeout, which is what happens now also). As the 
code is defined now we explicitly create a repair session and spawn a specific 
thread to process it, so we have a guarantee there's only one thread. It 
doesn't matter if the consumer is in its method when notifyAll is called; if it 
isn't it will receive the notification as soon as it next enters the method.

When I have some time I'll take a look and see if there's an alternative 
approach.

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, 5483-v15.patch, ccm-repair-test, 
> cqlsh-left-justify-text-columns.patch, prerepair-vs-postbuggedrepair.diff, 
> test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-06-11 Thread Ben Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027905#comment-14027905
 ] 

Ben Chan commented on CASSANDRA-5483:
-

I'm sorry I can't think this through very deeply at the moment, so please allow 
a little slack if I say something incorrect. I'm writing this during a small 
window of time (where I can't do anything else) in a crisis I'm dealing with.

bq. Why is command part of events instead of sessions?

It was a somewhat arbitrary design decision. I can move it over to sessions. 
The only real reason was historical (see the Mar 3 comment); it was a 
proof-of-concept that never got followed up upon until just now.

bq. Also: should use an enum internally. Logging as string representation is 
fine.

Just to be clear, you mean Java code should work with an enum, and the actual 
cql table column is fine as a string?

The code actually does use an enum (of sorts; not an Enum proper), the 
traceType. The traceType is passed to Tracing#getCommandName to look up the 
String for command name.

bq. It makes people grumpy when we break JMX signatures. Can we add a new 
overload instead, preserving the old? This should cut down on some of the code 
churn in StorageService as well.

I will admit that I didn't really consider the entire ecosystem of tools that 
use JMX to control Cassandra (though note that I did mention the JMX api change 
in a Mar 8 comment "... the server isn't going to work with old versions of 
nodetool. And a newer nodetool still won't work with older server versions.").

bq. It's a minor thing to get hung up on, but I'm not wild about all the work 
needed to propagate TTLs around. Is it really super important to persist repair 
traces much longer than queries? If so, what if we used a separate table and 
just allowed users to modify the default ttl? (The original trace code predates 
default TTLs, I think, or we would have made use of it there.)

I guess the question is how many different types of things (bootstrap, repair, 
query, decommission, ... anything else?) might eventually end up being traced. 
If n is small, then having n tables may be fine.

The logic was this (see Mar 1-3 discussion): Repair can take a long time, so 24 
hours may be too short of a ttl.

I recall reading about problematic repairs taking several days, which wouldn't 
mix well with a 24 hour ttl.

bq. Also have a nagging feeling that the notify/wait logic in StorageService + 
TraceState is more complex than necessary.

If there is guaranteed to only be one consumer of notifications at a time, then 
the updated v15 logic seems fine. But if there are ever two traces going on 
(either of different or the same type; are you allowed to have two simultaneous 
repairs of different keyspaces?) which require update notifications, then there 
could be dropped notifications. The problem (I believe) is that all consumers 
may not be in a wait state at the moment when notifyAll is signalled. This 
means a notification could be missed, right? I'm not experienced in Java 
concurrency, and this isn't the best time for me to slowly think things 
through, so it's quite possible I'm wrong here.

However, if you can be completely sure there will never be concurrent repair 
traces happening on the same node, or any other trace types (whatever types are 
added in the future) that require update notifications in order to implement 
on-the-fly reporting, then that issue is moot, and v15 should work fine, as far 
as my cursory inspection goes.

bq. I should note I've made no attempt to corroborate this behaviour is 
sensible; I've only simplified it.

Any feedback would be welcome. As I've said before, heuristics are messy. I 
talked about the reasoning behind my design decisions, and a possibility for an 
alternate implementation (with attendant tradeoffs) in a Mar 17 comment. I 
honestly thought I'd get more comments on it at the time, but it's possible the 
patch had already gotten into TL; DR territory even then.

---

Okay my short break from reality is over. Time to panic.


> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-nam

[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-06-10 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027108#comment-14027108
 ] 

Jonathan Ellis commented on CASSANDRA-5483:
---

Also have a nagging feeling that the notify/wait logic in StorageService + 
TraceState is more complex than necessary.  WDYT [~benedict]?

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, ccm-repair-test, 
> cqlsh-left-justify-text-columns.patch, prerepair-vs-postbuggedrepair.diff, 
> test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-06-10 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027102#comment-14027102
 ] 

Jonathan Ellis commented on CASSANDRA-5483:
---

Some comments having applied the patch:

# Why is command part of events instead of sessions?  Also: should use an enum 
internally.  Logging as string representation is fine.
# It makes people grumpy when we break JMX signatures.  Can we add a new 
overload instead, preserving the old?  This should cut down on some of the code 
churn in StorageService as well.
# It's a minor thing to get hung up on, but I'm not wild about all the work 
needed to propagate TTLs around.  Is it really super important to persist 
repair traces much longer than queries?  If so, what if we used a separate 
table and just allowed users to modify the default ttl?  (The original trace 
code predates default TTLs, I think, or we would have made use of it there.)

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, ccm-repair-test, 
> cqlsh-left-justify-text-columns.patch, prerepair-vs-postbuggedrepair.diff, 
> test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-06-06 Thread Ben Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020510#comment-14020510
 ] 

Ben Chan commented on CASSANDRA-5483:
-

{noformat}
# git checkout a61ef51
W=https://issues.apache.org/jira/secure/attachment
for url in \
  $W/12648746/5483-v14-01-squashed.patch
do
  { [ -e $(basename $url) ] || curl -sO $url; } && git apply $(basename $url)
done &&
ant clean && ant &&
./ccm-repair-test -kR &&
ccm node1 stop &&
ccm node1 clear &&
ccm node1 start &&
./ccm-repair-test -rt
{noformat}

No changes, just a (mostly) straightforward rebase onto a61ef51 (recent trunk). 
Some issues only show up under actual use, so I fully expect this will need 
some followup.

An idea for the timely tracing of exceptions: you might be able to kludge most 
of it by modifying WrappedRunnable to trace exceptions. I haven't looked too 
closely at it, though.


> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, ccm-repair-test, 
> cqlsh-left-justify-text-columns.patch, prerepair-vs-postbuggedrepair.diff, 
> test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-06-04 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017950#comment-14017950
 ] 

Jonathan Ellis commented on CASSANDRA-5483:
---

All right, I think we're at a point where this is certainly more useful than 
what we have now.  The perfect is the enemy of the good and all that.

Ben, can you post a branch or a squashed patch?

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, ccm-repair-test, 
> cqlsh-left-justify-text-columns.patch, prerepair-vs-postbuggedrepair.diff, 
> test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-05-22 Thread Ben Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005853#comment-14005853
 ] 

Ben Chan commented on CASSANDRA-5483:
-

{quote}
Is that because the traces are asynchronous? Because I think session 2 only 
starts after session 1 finishes.
{quote}

Here's my high level understanding.

* Repair command #1 and #2, etc are serial.
* Each repair session "Syncing range ..." is "technically" concurrent, since 
each is submitted to a ThreadPoolExecutor.
** However, differencing is serialized, so if there is no streaming going on, 
you won't see very much overlap between the sessions, except at the beginning 
and end (which is exactly what we see with these simple tests).
** Conversely, this means you will see much more interleaving when heavy 
streaming is going on.

So at the very least, it might be good to eventually disambiguate the streaming 
portion.

{quote}
The easiest thing would be to make them non-redundant. Can we make the tracing 
"extra detail" on top of the normal ones instead of competing with them?
{quote}

I think it may be a conceptual block on my part. I tend to think of traces as a 
kind of profiling mechanism.

* Most of the sendNotification calls in StorageService#createRepairTask consist 
of reporting any errors from the results of RepairFuture objects. So the timing 
on those is not really that useful for profiling. They're not really what I'd 
usually think of as a trace.
* Some are request validation reporting before the repair proper even starts.
* The rest are informational sendNotification messages which are redundant when 
tracing is active (this is the easy case).

In pseudocode:

{noformat}
if (some error #1 in repair request)
  sendNotification("NO #1!");
if (some error #2 in repair request)
  sendNotification("NO #2!");
for (r : ranges)
{
  f = something.submitRepairSession(new RepairSession(r));
  futures.add(f);
  try
  {
// this serializes the differencing part.
f.waitForDifferencing()
  }
  catch (SomeException)
  {
// handle, sendNotification
  }
}

try
{
  for (f : futures)
  {
r = f.get();
sendNotification("done: %s", r);
  }
}
catch (ExecutionException ee)
{
  // handle, sendNotification
}
catch (Exception e)
{
  // handle, sendNotification
}
{noformat}

The main point being that I can't be sure that every single interesting 
exception is caught and traced in the thread where it's thrown, then rethrown. 
Most likely, this is not the case, and some exceptions are only reported at the 
StorageService#createRepairTask level. I believe most \(?) cases are already 
caught and traced, though.

So after going through all that, I'm thinking that the easiest thing is to just 
accept the possibility of redundancy and delayed reporting, and just trace all 
sendNotification in StorageService#createRepairTask (unless it's demonstrably 
redundant, or already being traced through some other mechanism).


> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, ccm-repair-test, 
> cqlsh-left-justify-text-columns.patch, prerepair-vs-postbuggedrepair.diff, 
> test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt

[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-05-20 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004285#comment-14004285
 ] 

Jonathan Ellis commented on CASSANDRA-5483:
---

bq. To make the point clearer, I've marked each sub-task graphically in the 
nodetool trace output below

You're right that it's a bit tough to keep the sessions straight.  Is that 
because the traces are asynchronous?  Because I think session 2 only starts 
after session 1 finishes.  Either way I think we can live with it.

bq. not all nodetool messages are rendered redundant by trace output. Since we 
can't just suppress all non-trace sendNotification, how can we unambiguously 
tell nodetool trace output from normal sendNotification messages?

The easiest thing would be to make them non-redundant. :)  Can we make the 
tracing "extra detail" on top of the normal ones instead of competing with them?

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, ccm-repair-test, 
> cqlsh-left-justify-text-columns.patch, prerepair-vs-postbuggedrepair.diff, 
> test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-05-15 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997721#comment-13997721
 ] 

Jonathan Ellis commented on CASSANDRA-5483:
---

I think the session log messages are still confusing, especially since we use 
the same term for "repairing a subrange" and "streaming data."

I don't actually see the session uuid being used in the logs except at 
start/finish.  Can we simplify from

{noformat}
[2014-05-14 17:58:53,768] New session 442daf50-db78-11e3-bad5-c3afa2948904 will 
sync range (-3074457345618258603,3074457345618258602]

[2014-05-14 17:58:56,309] Session completed successfully
[2014-05-14 17:58:56,235] Repair session 442daf50-db78-11e3-bad5-c3afa2948904 
for range (-3074457345618258603,3074457345618258602] finished
{noformat}

to
{noformat}
[2014-05-14 17:58:53,768] Syncing range 
(-3074457345618258603,3074457345618258602]
...
[2014-05-14 17:58:56,235] Completed sync of range 
(-3074457345618258603,3074457345618258602]
{noformat}

Bikeshedding this just a little more:

{{Endpoints /127.0.0.2 and /127.0.0.1 are consistent for events}} -> {{Endpoint 
/127.0.0.2 is consistent with /127.0.0.1 for events}}

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-05-15 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997607#comment-13997607
 ] 

Jeremiah Jordan commented on CASSANDRA-5483:


bq. How does Endpoints /127.0.0.2 and /127.0.0.1 are consistent for events 
scale up to more replicas? Should we switch to using [..] notation instead?

Pretty sure we always do this comparison pair wise?  So there will always only 
be two ip's in it.

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-05-15 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995530#comment-13995530
 ] 

Jonathan Ellis commented on CASSANDRA-5483:
---

Suggestions based on the rest:

- Just skip system keyspace entirely and save the logspam (use 
Keyspace.nonSystem instead of Keyspace.all)
- Be consistent about using ks.cf notation (instead of ks/cf)
- Word "keyspace" is redundant in repair command log: {{Starting repair command 
#1, repairing 1 ranges for keyspace db.{tweet} (seq=true, full=true)}}
- Reformat {{Requesting merkle trees for sessions (to [/127.0.0.3, 
/127.0.0.1])}} to {{Requesting merkle trees for sessions from [/127.0.0.3, 
/127.0.0.1]}}
- How does {{Endpoints /127.0.0.2 and /127.0.0.1 are consistent for events}} 
scale up to more replicas?  Should we switch to using {{[..]}} notation instead?
- I'm a little lost in the commands and sessions, e.g. does {{[2014-05-08 
23:27:45,368] Session completed successfully}} refer to session 
3617e3f0-d6ef-11e3-a493-7d438369d7fc or 36a49390-d6ef-11e3-a493-7d438369d7fc?  
Is there exactly one session per command?  If so let's merge the "starting 
repair command" + "new session" output, and the "completed" + "finished".
- Why do we log {{Repair command #1 finished}} with no merkle trees requested 
for db.tweet?  Is it because all sstables are already repaired?  If so we 
should log that.

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-05-15 Thread Lyuben Todorov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993089#comment-13993089
 ] 

Lyuben Todorov commented on CASSANDRA-5483:
---

[~jbellis] Output available 
[here|https://gist.github.com/lyubent/bfc133fe92ef1afb9dd4] with vnode output 
at the top and non-vnode output at the bottom. 

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-05-15 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997706#comment-13997706
 ] 

Jonathan Ellis commented on CASSANDRA-5483:
---

bq. Pretty sure we always do this comparison pair wise? So there will always 
only be two ip's in it.

Ah, right.  Let's leave that alone then.

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-05-15 Thread Ben Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997265#comment-13997265
 ] 

Ben Chan commented on CASSANDRA-5483:
-

I seem to be having problems with JIRA email notifications. May 12 arrived 
fine, May 7 never arrived, May 8 arrived on May 13, but was dated May 10. 
Moving on...

{quote}
Just skip system keyspace entirely and save the logspam (use Keyspace.nonSystem 
instead of Keyspace.all)
{quote}

This patch has grown enough parts to be a little unwieldy. Just to be clear, 
the output from [https://gist.github.com/lyubent/bfc133fe92ef1afb9dd4] is the 
verbose output from {{nodetool}}, which means there is some extra output aside 
from the traces themselves. (TODO to self, I need to make the nodetool verbose 
output optional.). That particular message comes from 
{{src/java/org/apache/cassandra/tools/NodeProbe.java}}, in a part of the code 
untouched by this patch. I can go ahead and nuke that particular message for 
the system keyspace.

{quote}
How does {{Endpoints /127.0.0.2 and /127.0.0.1 are consistent for events}} 
scale up to more replicas? Should we switch to using {{\[..]}} notation instead?
{quote}

{{n * (n - 1)}} differences calculated for {{n}} replicas, so {{n * (n - 1)}} 
"are consistent" messages. I haven't dug deep enough into the code to be 
certain, but on the face of it, it seems like there should be some (possibly 
not-simple) way to reduce this to {{O(n * log\(n))}}. Enough speculation, 
though.

One edge case for the proposed notation would be a consistency partition:

{noformat}
A == B == C
A != D
D == E == F

=>

# We need a separate message for each partition.
Endpoints [A, B, C] are consistent for events
Endpoints [D, E, F] are consistent for events
{noformat}

Even with the edge case, it seems messy, but doable. You do lose trace timing 
information on the calculation of individual differences (the consistent ones, 
at least). On the other hand, comparing matching merkle trees should be a 
consistently fast operation, so you're probably not missing out on too much 
information.

{quote}
I'm a little lost in the commands and sessions, e.g. does {{\[2014-05-08 
23:27:45,368] Session completed successfully}} refer to session 
3617e3f0-d6ef-11e3-a493-7d438369d7fc or 36a49390-d6ef-11e3-a493-7d438369d7fc? 
Is there exactly one session per command? If so let's merge the "starting 
repair command" + "new session" output, and the "completed" + "finished".
{quote}

Each repair command seems to consist of multiple repair sessions (one per 
range). The sessions go semi-sequentially; there's a semi-random overlap 
between the end of one session and the start of another, like so (using small 
integers instead of UUIDs, and some labels on the left for readability):

{noformat}
[command #1   ] Starting repair command #1
[command #1, session 1] New session 1 will sync range ...
[command #1, session 1] Requesting merkle tree for ...
[command #1, session 1] Received merkle tree for ...
[command #1, session 2] New session 2 will sync range ...
[command #1, session 2] Requesting merkle tree for ...
[command #1, session 1] Endpoints ... consistent.
[command #1, session 1] Session 1 completed successfully
[command #1, session 2] Received merkle tree for ...
[command #1, session 2] Endpoints ... consistent.
[command #1, session 3] New session 3 will sync range ...
[command #1, session 2] Session 2 completed successfully
[command #1, session 3] Requesting merkle tree for ...
[command #1, session 3] Received merkle tree for ...
[command #1, session 3] Endpoints ... consistent.
[command #1, session 3] Session 3 completed successfully
[command #1   ] Repair command #1 finished
{noformat}

Most of the time it's obvious from context, but during that overlap, having the 
repair session UUID helps to disambiguate. I suspect the overlap is even 
greater (and more confusing) when you have heavy streaming.

{quote}
Why do we log Repair command #1 finished with no merkle trees requested for 
db.tweet? Is it because all sstables are already repaired? If so we should log 
that.
{quote}

I've never encountered a trace like that in my testing. I always seem to get 
merkle trees exchanged (see the log below), even if no streaming is needed. I'm 
hoping lyubent can provide enough information for me to be able to recreate 
this situation locally.

{quote}
Does this actually show any streaming? If so I'm missing it.
{quote}

lyubent's sample run didn't need streaming, so no streaming to trace. Here's 
how I usually test streaming (using [^ccm-repair-test] and yukim's method.):

{noformat}
./ccm-repair-test -kR &&
ccm node1 stop &&
ccm node1 clear &&
ccm node1 start &&
./ccm-repair-test -rt
{noformat}

Note that this sample run uses the codebase as of 
[^5483-v12-02-cassandra-yaml-ttl-doc.patch]; I haven't got around to doing the 
May 12 changes yet.

I should also wa

[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-05-14 Thread Lyuben Todorov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997632#comment-13997632
 ] 

Lyuben Todorov commented on CASSANDRA-5483:
---

bq. Why do we log Repair command #1 finished with no merkle trees requested for 
db.tweet? Is it because all sstables are already repaired? If so we should log 
that.

Ah, I used a Keyspace with RF 1, Here's 
[output|https://gist.github.com/lyubent/b4718a132ecb7fe68a86] from a 3 node 
cluster with the db keyspace having RF = 3 after node 1 was nuked and bought 
back. The output has a few changes:
# System ks is no longer displayed as its not part of the repair.
# Removed _keyspace_ from "Starting repair command #1, repairing 1 ranges for 
keyspace db.\{tweet\} (seq=true, full=true)\}"
# Reformated "Requesting merkle trees for sessions (to \[/127.0.0.3, 
/127.0.0.1\])"

Changes for the above are on 
[43eaaae..fa4089e|https://github.com/lyubent/cassandra/commit/fa4089e8e51b6d647a8f616d33a64fc13b578b14]

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-05-14 Thread Lyuben Todorov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997375#comment-13997375
 ] 

Lyuben Todorov commented on CASSANDRA-5483:
---

bq. How does Endpoints /127.0.0.2 and /127.0.0.1 are consistent for events 
scale up to more replicas? Should we switch to using [..] notation instead?

We can store the message in the {{Differencer}}, wait until all the 
differencers complete and then retrieve their messages and build the list of 
consistent / non-consistent replicas, but if this is with pre 5351 repairs, 
trees will still get very large and take a long time to build thus their 
comparisons might take long too so my only reason against doing this is it 
might take longer to give progress information back to users. 

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-05-13 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995518#comment-13995518
 ] 

Jonathan Ellis commented on CASSANDRA-5483:
---

Does this actually show any streaming?  If so I'm missing it.

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-05-02 Thread Ben Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987979#comment-13987979
 ] 

Ben Chan commented on CASSANDRA-5483:
-

Patch updated to work with current trunk (08d5f26).

{noformat}
# git checkout 08d5f26
W=https://issues.apache.org/jira/secure/attachment
for url in \
  $W/12643077/5483-v11-01-squashed.patch
do
  { [ -e $(basename $url) ] || curl -sO $url; } && git apply $(basename $url)
done &&
ant clean && ant &&
./ccm-repair-test -kR &&
ccm node1 stop &&
ccm node1 clear &&
ccm node1 start &&
./ccm-repair-test -rt
{noformat}


> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-04-11 Thread Ben Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966778#comment-13966778
 ] 

Ben Chan commented on CASSANDRA-5483:
-

I made some additional changes. Everything is included in 
[^5483-v10-rebased-and-squashed-471f5cc.patch], but I attached 
[^5483-v10-17-minor-bugfixes-and-changes.patch] to make it more more convenient 
to review.

Repair fails without 
[7000-2.1-v2.txt|https://issues.apache.org/jira/secure/attachment/12639633/7000-2.1-v2.txt]
 so I've included that patch in the test code.

Overview:
 * Limit exponential backoff.
 * Handle the case where traceType is negative.
 * Reimplement log2 in BitUtil style.
 * Forgot to add a trace parameter.

{noformat}
# rebased against trunk @ 471f5cc
# git checkout 471f5cc
W=https://issues.apache.org/jira/secure/attachment
for url in \
  $W/12639821/5483-v10-rebased-and-squashed-471f5cc.patch \
  $W/12639633/7000-2.1-v2.txt
do
  { [ -e $(basename $url) ] || curl -sO $url; } && git apply $(basename $url)
done &&
ant clean && ant &&
./ccm-repair-test -kR &&
ccm node1 stop &&
ccm node1 clear &&
ccm node1 start &&
./ccm-repair-test -rt
{noformat}


> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, ccm-repair-test, 
> cqlsh-left-justify-text-columns.patch, prerepair-vs-postbuggedrepair.diff, 
> test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-04-10 Thread Lyuben Todorov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965957#comment-13965957
 ] 

Lyuben Todorov commented on CASSANDRA-5483:
---

[~usrbincc] Can you post the squashed rebase? Would really appreciate it!

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, ccm-repair-test, 
> cqlsh-left-justify-text-columns.patch, prerepair-vs-postbuggedrepair.diff, 
> test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-04-05 Thread Ben Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961081#comment-13961081
 ] 

Ben Chan commented on CASSANDRA-5483:
-

Rebased, but can't test properly, because the trunk as of 64bc45849fd2 (my 
{{git rebase}} base) is throwing an error:

{noformat}
java.lang.ExceptionInInitializerError: null
at 
org.apache.cassandra.config.KSMetaData.systemKeyspace(KSMetaData.java:92) 
~[main/:na]
at 
org.apache.cassandra.config.DatabaseDescriptor.applyConfig(DatabaseDescriptor.java:545)
 ~[main/:na]
at 
org.apache.cassandra.config.DatabaseDescriptor.(DatabaseDescriptor.java:128)
 ~[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:109) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:454) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:543) 
[main/:na]
Caused by: java.lang.RuntimeException: 
org.apache.cassandra.exceptions.SyntaxException: line 1:127 mismatched 
character 'T' expecting set null
at org.apache.cassandra.config.CFMetaData.compile(CFMetaData.java:518) 
~[main/:na]
at org.apache.cassandra.config.CFMetaData.compile(CFMetaData.java:501) 
~[main/:na]
at org.apache.cassandra.config.CFMetaData.(CFMetaData.java:99) 
~[main/:na]
... 6 common frames omitted
Caused by: org.apache.cassandra.exceptions.SyntaxException: line 1:127 
mismatched character 'T' expecting set null
at 
org.apache.cassandra.cql3.CqlLexer.throwLastRecognitionError(CqlLexer.java:201) 
~[main/:na]
at 
org.apache.cassandra.cql3.QueryProcessor.parseStatement(QueryProcessor.java:348)
 ~[main/:na]
at org.apache.cassandra.config.CFMetaData.compile(CFMetaData.java:509) 
~[main/:na]
... 8 common frames omitted
{noformat}

Things are currently hectic, so I don't know when I'll have the chance to have 
a proper look at it. I'm currently doing a {{git bisect run}} so maybe 
something obvious will turn up.


> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, ccm-repair-test, 
> cqlsh-left-justify-text-columns.patch, prerepair-vs-postbuggedrepair.diff, 
> test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-04-03 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959183#comment-13959183
 ] 

Jonathan Ellis commented on CASSANDRA-5483:
---

Can you both rebase and squash?

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, ccm-repair-test, 
> cqlsh-left-justify-text-columns.patch, prerepair-vs-postbuggedrepair.diff, 
> test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-04-02 Thread Lyuben Todorov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958055#comment-13958055
 ] 

Lyuben Todorov commented on CASSANDRA-5483:
---

v9 has indeed fixed the high cpu usage. I pushed v8 and v9 to 
[d9abbfc62319935ef42d4685037f80b292e03e0f|https://github.com/lyubent/cassandra/commit/d9abbfc62319935ef42d4685037f80b292e03e0f]
 with only the minor change of giving a name to the thread that [carries out 
polling|https://github.com/lyubent/cassandra/commit/d9abbfc62319935ef42d4685037f80b292e03e0f#diff-b76a607445d53f18a98c9df14323c7ddR2649]
 for nodetool output. The thread name is currently {{RepairTracePolling}} but 
I'm open to suggestions. 

[~usrbincc] I think we should rebase to the latest trunk but the patch is 
looking good. 

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, ccm-repair-test, 
> cqlsh-left-justify-text-columns.patch, prerepair-vs-postbuggedrepair.diff, 
> test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-03-30 Thread Ben Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954746#comment-13954746
 ] 

Ben Chan commented on CASSANDRA-5483:
-

Ouch. After running a before and after test, I'm 99% sure this was the problem. 
There was some obviously wrong code in {{waitActivity}} (an older version used 
0 instead of -1 to signify "done"; I apparently forgot to update everything 
when I changed this).

Sorry about removing the previous patch. It didn't have the correct {{git diff 
-p}} parameters.

For convenience:

{noformat}
W=https://issues.apache.org/jira/secure/attachment
for url in \
  $W/12637720/5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch
do [ -e $(basename $url) ] || curl -sO $url; done &&
git apply 5483-v09-*.patch &&
ant clean && ant
{noformat}

Here's what I used to test with; I get slower and slower repairs, with a hang 
on the 5th repair with the "before" code, and consistent 10-second repairs with 
the "after" code.

{noformat}
cat > ccm-nodetool <<"E"
#!/bin/sh

# ccm doesn't let us call nodetool with options, but we still need to get the
# host and port config from it.
read -r JMXGET < Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, ccm-repair-test, 
> cqlsh-left-justify-text-columns.patch, prerepair-vs-postbuggedrepair.diff, 
> test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-03-17 Thread Ben Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937976#comment-13937976
 ] 

Ben Chan commented on CASSANDRA-5483:
-

{noformat}
# tested with branch 5483 @ bce0c2c555a3; should also work following successful
#git apply 5483-full-trunk.txt
W=https://issues.apache.org/jira/secure/attachment
for url in \
  $W/12635094/5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch   
   \
  
$W/12635095/5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch
 \
  
$W/12635096/5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch 
 \
  $W/12635097/5483-v08-14-Poll-system_traces.events.patch   
   \
  
$W/12635098/5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch
do [ -e $(basename $url) ] || curl -sO $url; done &&
git apply 5483-v08-*.patch &&
ant clean && ant

./ccm-repair-test -kR &&
ccm node1 stop &&
ccm node1 clear &&
ccm node1 start &&
./ccm-repair-test -rt
{noformat}

* {{v08-11}} There was an error in one of the log formats in Differencer, which 
made my grep for "out of sync" in the logs fruitless.
* {{v08-12}} I ended up using the handleStreamEvent of StreamingRepairTask 
instead of implementing and registering my own StreamEventHandler. The new 
trace messages may need adjusting, especially for ProgressEvent, which is 
essentially just a toString currently.
* {{v08-13}} This works by adding a guarded sendNotification to 
TraceState#trace.
* {{v08-14}} This works by starting a thread to poll {{system_traces.events}}, 
and by adding notify functionality to TraceState. There is some jitter in the 
ordering between local and remote traces. An easy fix would be to have the 
query thread handle all sendNotification of traces. You have to accept latency 
in sendNotification of local traces in order to get better ordering. It might 
be necessary to delay all trace sendNotification by a few seconds to make it 
more likely that remote traces have arrived.
* {{v08-15}} Even more added TraceState functionality. All to try to reduce the 
amount of polling without hurting latency too much. There are only a few local 
traces that you would expect to be followed by a remote trace, so only wake up 
for those. Poll with an exponential backoff after each notification.

---

Heuristics are messy, and I expect plenty of opinions on {{v08-14}} and 
{{v08-15}}. I'm not especially proud of that code, but I can't think of 
anything better at the moment, given the (self-imposed?) constraints.

I may have reinvented the wheel with synchronization primitives. I checked 
{{java.util.concurrent.*}} and {{SimpleCondition}}, but not much beyond that. I 
could have missed something; I don't fully understand some of the classes. What 
I wanted was to be woken up (with a timeout) if anything has changed since the 
last time I checked. Theoretically, it should work for multiple consumers (As 
long as no one waits for longer than {{Integer.MAX_VALUE}} updates), though 
that's not really necessary here, if that would simplify the code.

The code seems to work reasonably well for small-scale tests. I can convince 
myself that it won't blow up for long repairs, but haven't done a full test yet.


> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-vi

[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-03-15 Thread Ben Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936173#comment-13936173
 ] 

Ben Chan commented on CASSANDRA-5483:
-

About the {{system_traces}} feedback loop during repair. I don't have a way to 
consistently reproduce it. Normally these test repairs take no longer than 1-2 
minutes, but occasionally, it gets into the 10-20 minute range, with lots of 
merkle trees for {{system_traces}} getting sent ({{system_traces.events}} at 
that point is in the tens of thousands of rows.), at which point I usually just 
stop the repair, and start fresh with a new cluster. I'll pay closer attention 
next time to see if I can get a good repro case.

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-03-14 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935288#comment-13935288
 ] 

Jonathan Ellis commented on CASSANDRA-5483:
---

bq. Maybe exclude system_traces from repair if a repair trace is going on. 

I agree that something fishy is going on, but I'd like to understand the cause 
before adding a workaround.  Can you shed any light [~yukim]?

bq. Maybe add a placeholder row with a null duration for ongoing repair sessions

This is what query tracing does (Tracing.begin, then stopSession writes the 
duration).

bq. Verbose option; dump all traces to nodetool.

I'd be okay pushing this to another ticket, FWIW.

bq. Simplify trace messages

The main principle you want to be careful to observe is to 1) log everything 
the user needs to know but 2) without logging extra noise.  Redundant 
information is noise.  So logging the repairsession once in the "New session" 
line is adequate, you don't need to repeat it with every message afterwards.  
Similarly, "Requesting merkle tree" line shows what peers are involved, we 
don't need to repeat that elsewhere either.

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-03-13 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934196#comment-13934196
 ] 

Yuki Morishita commented on CASSANDRA-5483:
---

bq. Is there a simple way to create a situation where a repair requires 
streaming?

The easiest way is to nuke one node after writing data to all nodes.
So, for example, you create 3 nodes cluster, create keyspace with RF=3, fill 
data in, clear one node(ccm node1 clear), bring it up again, then repair.


> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-03-12 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932510#comment-13932510
 ] 

Jonathan Ellis commented on CASSANDRA-5483:
---

Thanks.  I'm going to bikeshed the messages a bit:

- {{Starting repair command #3, repairing 2 ranges for keyspace system_traces 
(seq=true, full=true)}} -> {{Repairing 2 ranges for system_traces.{sessions, 
events} (seq=true, full=true)}}
- {{[repair #fedc3790-a9fa-11e3-ab61-2387bbb38302] new session: will sync 
/127.0.0.1, /127.0.0.3 on range (-3074457345618258603,3074457345618258602] for 
system_traces.[sessions, events]}} -> {{New session 
#fedc3790-a9fa-11e3-ab61-2387bbb38302 will sync range 
(-3074457345618258603,3074457345618258602]}}
- All the rest: omit the session id since that's already traced, and capitalize 
the first word
- Add a message if there's nothing to stream

What does it log when there *is* something to stream?

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> ccm-repair-test, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-03-12 Thread Lyuben Todorov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931875#comment-13931875
 ] 

Lyuben Todorov commented on CASSANDRA-5483:
---

It's a good idea to add a verbose option to show tracing as it happens. 
Currently to enable logging to the system_traces.events table the {{-tr}} flag 
has to be supplied {{./nodetool repair -tr}}, and then to view the traces 
{{SELECT * FROM system_traces.events}} 

{noformat}
cqlsh> SELECT * FROM system_traces.events ;

 session_id   | event_id | 
activity

  | command | source| source_elapsed | thread
--+--+---+-+---++---
 fedbe970-a9fa-11e3-ab61-2387bbb38302 | fedbe971-a9fa-11e3-ab61-2387bbb38302 |  

 Starting repair command #3, repairing 2 ranges for keyspace system_traces 
(seq=true, full=true) |  REPAIR | 127.0.0.1 | 11 | 
Thread-16
 fedbe970-a9fa-11e3-ab61-2387bbb38302 | fedc5ea0-a9fa-11e3-ab61-2387bbb38302 | 
[repair #fedc3790-a9fa-11e3-ab61-2387bbb38302] new session: will sync 
/127.0.0.1, /127.0.0.3 on range (-3074457345618258603,3074457345618258602] for 
system_traces.[sessions, events] |  REPAIR | 127.0.0.1 |   2493 | 
AntiEntropySessions:7
 fedbe970-a9fa-11e3-ab61-2387bbb38302 | fedef6b0-a9fa-11e3-ab61-2387bbb38302 |  
   [repair 
#fedc3790-a9fa-11e3-ab61-2387bbb38302] requesting merkle trees for sessions (to 
[/127.0.0.3, /127.0.0.1]) |  REPAIR | 127.0.0.1 |  19687 |   
RepairJobTask:2
 fedbe970-a9fa-11e3-ab61-2387bbb38302 | fee167b0-a9fa-11e3-b120-2387bbb38302 |  
   [repair 
#fedc3790-a9fa-11e3-ab61-2387bbb38302] Sending completed merkle tree to 
/127.0.0.1 for system_traces/sessions |  REPAIR | 127.0.0.3 |  31937 |  
  AntiEntropyStage:1
 fedbe970-a9fa-11e3-ab61-2387bbb38302 | fee3b1a0-a9fa-11e3-ab61-2387bbb38302 |  

[repair #fedc3790-a9fa-11e3-ab61-2387bbb38302] Received merkle tree for 
sessions from /127.0.0.3 |  REPAIR | 127.0.0.1 |  50996 |
AntiEntropyStage:1
 fedbe970-a9fa-11e3-ab61-2387bbb38302 | fee7d050-a9fa-11e3-ab61-2387bbb38302 |  

[repair #fedc3790-a9fa-11e3-ab61-2387bbb38302] Received merkle tree for 
sessions from /127.0.0.1 |  REPAIR | 127.0.0.1 |  77907 |
AntiEntropyStage:1
 fedbe970-a9fa-11e3-ab61-2387bbb38302 | feea1a40-a9fa-11e3-ab61-2387bbb38302 |  
 [repair 
#fedc3790-a9fa-11e3-ab61-2387bbb38302] requesting merkle trees for events (to 
[/127.0.0.3, /127.0.0.1]) |  REPAIR | 127.0.0.1 |  92358 |   
RepairJobTask:2
 fedbe970-a9fa-11e3-ab61-2387bbb38302 | feecd960-a9fa-11e3-b120-2387bbb38302 |  
 [repair 
#fedc3790-a9fa-11e3-ab61-2387bbb38302] Sending completed merkle tree to 
/127.0.0.1 for system_traces/events |  REPAIR | 127.0.0.3 | 107062 |
AntiEntropyStage:1
 fedbe970-a9fa-11e3-ab61-2387bbb38302 | feef2350-a9fa-11e3-ab61-2387bbb38302 |  

  [repair #fedc3790-a9fa-11e3-ab61-2387bbb38302] Received merkle tree for 
events from /127.0.0.3 |  REPAIR | 127.0.0.1 | 126230 |
AntiEntropyStage:1
 fedbe970-a9fa-11e3-ab61-2387bbb38302 | fef3de40-a9fa-11e3-ab61-2387bbb38302 |  

  [repair #fedc3790-a9fa-11e3-ab61-2387bbb38302] Received merkle tree for 
events from /127.0.0.1 |  REPAIR | 127.0.0.1 | 157175 |
AntiEntropyStage:1
 fedbe970-a9fa-11e3-ab61-2387bbb38302 | fef40551-a9fa-11e3-ab61-2387bbb38302 | 
[repair #fef40550-a9fa-11e3-ab61-2387bbb38302] new session: will sync 
/127.0.0.1, /127.0.0.2 on range (3074457345618258602,-9223372036854775808] for 
system_traces.[sessions, events] |  REPAIR | 127.0.0.1 | 158225 | 
AntiEntropySessions:8
 fedbe970-a9fa-11e3-ab61-2387bbb38302 | fef4c8a0-a9fa-11e3-ab61-2387bbb38302 |  
   [repair 
#fef

[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-03-12 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931858#comment-13931858
 ] 

Jonathan Ellis commented on CASSANDRA-5483:
---

Can you post an example trace session?  

Does this add a "verbose" flag to trace to show the trace as-it-happens or do I 
have to manually check the trace tables?

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> ccm-repair-test, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-03-11 Thread Ben Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930861#comment-13930861
 ] 

Ben Chan commented on CASSANDRA-5483:
-

Okay, final set of changes (I think).

repro code:
{noformat}
#git checkout 5483
W=https://issues.apache.org/jira/secure/attachment
for url in \
  
$W/12633989/5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch
 \
  $W/12633990/5483-v07-08-Fix-brace-style.patch \
  
$W/12633991/5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch
 \
  $W/12633992/5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch
do [ -e $(basename $url) ] || curl -sO $url; done &&
git apply 5483-v07-*.patch &&
ant clean && ant

./ccm-repair-test
{noformat}

Comments:
* {{v07-07}} I get multiple {{RepairJobTask:...}} ids in the trace, so 
hopefully this resolves the issue. This is as much as I can say with any 
assurance, since I don't normally work with threading.
* {{v07-09}} Without having to worry about backwards compatibility, I decided 
to trace-enable all repair functions I could find and conveniently convert. No 
real change functionally, since nothing in the source tree calls those 
functions and/or overloadings (otherwise I would have had to change much more 
in order to get things to compile).
* {{v07-10}} Again, no functional change. I noticed the difference in parameter 
name when doing the conversion.

You can take or leave the last two; they don't do anything, really.


> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> ccm-repair-test, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-03-10 Thread Lyuben Todorov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930004#comment-13930004
 ] 

Lyuben Todorov commented on CASSANDRA-5483:
---

[~usrbincc] Can you please attach the last patch so we can complete this 
ticket? I'll apply it into my git branch :)

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> ccm-repair-test, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-03-08 Thread Lyuben Todorov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924903#comment-13924903
 ] 

Lyuben Todorov commented on CASSANDRA-5483:
---

bq. Move to a github workflow, or some sort of hybrid github/JIRA?
[~usrbincc] Whichever you prefer, attaching a patch will be simpler, I can 
merge it into the existing github branch and post the update, but if you want 
feel free to create a branch yourself. 

bq. the server isn't going to work with old versions of nodetool
I think that is ok since if someone upgrades it would kind of be expected for 
them to upgrade to the new nodetool as well. 

bq. there is a lot of repetitious parallel code with the tracing and logging 
together
Maybe worth refactoring on a separate ticket, but lets not cause regression by 
trying to minimise code. 

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> ccm-repair-test, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-03-08 Thread Ben Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924874#comment-13924874
 ] 

Ben Chan commented on CASSANDRA-5483:
-

Thanks for the catch. Hopefully the fix is as easy as:

{noformat}
// using OpenJDK's source code for Executors.newCachedThreadPool() as a 
reference
new DebuggableThreadPoolExecutor(0, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS,
 new SynchronousQueue(),
 new NamedThreadFactory("RepairJobTask"));
{noformat}

Also note that {{v02-01}} added {{CompactionExecutor extends 
DebuggableThreadPoolExecutor}}, so that is something else to watch out for.

---

Re: {
Oops (I'm normally K&R).

Re: github
Move to a github workflow, or some sort of hybrid github/JIRA?

---

I guess the last caveat here is that most (I may have left in a few) of the old 
non-{{trace}} function overloads are gone, so the server isn't going to work 
with old versions of {{nodetool}}.

And a newer {{nodetool}} still won't work with older server versions. There may 
be some tricks you can do with JMX, but I don't have deep knowledge there. 
Maybe adding function overloads from the future (I noticed a similar trick in 
{{MessagingService.java}}).

And there is a lot of repetitious parallel code with the tracing and logging 
together. But that shouldn't affect the external functionality.


> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> ccm-repair-test, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-03-07 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924086#comment-13924086
 ] 

Yuki Morishita commented on CASSANDRA-5483:
---

bq.  switching from Executors#newCachedThreadPool to 
DebuggableThreadPoolExecutor

DebuggableThreadPoolExecutor with that constructor keeps only one thread for 
tasks submitted.
I used newCachedThreadPool there to parallelize execution of short running 
tasks as much as possible.

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> ccm-repair-test, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-03-07 Thread Lyuben Todorov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924053#comment-13924053
 ] 

Lyuben Todorov commented on CASSANDRA-5483:
---

Patch is almost there, I'm mostly worried about the performance hit we would be 
taking by [switching 
from|https://github.com/lyubent/cassandra/commit/b2a2f0c6c7b1bc7bed5e10afd54fb9f95d1886b0#diff-e58f1647e24896cbfb4358bee924cf60L99]
 {{Executors#newCachedThreadPool}} to {{DebuggableThreadPoolExecutor}}.  /cc 
[~yukim]

Also there are a few codestyle nits:
# Validator#run has a 
[statement|https://github.com/lyubent/cassandra/commit/b2a2f0c6c7b1bc7bed5e10afd54fb9f95d1886b0#diff-05a4a919124ba0d2f176466182c67bb6R254]
 with { on the same line.
# Tracing#getCommandName has its { on the same line


Lastsly I created a [feature 
branch|https://github.com/lyubent/cassandra/commit/b2a2f0c6c7b1bc7bed5e10afd54fb9f95d1886b0#diff-e58f1647e24896cbfb4358bee924cf60L99]
 to simplify applying patches :)

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> ccm-repair-test, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-03-06 Thread Ben Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922721#comment-13922721
 ] 

Ben Chan commented on CASSANDRA-5483:
-

It was more involved than I thought, partly because of heisenbugs and the trace 
state mysteriously not propagating (see {{v06-05}}).

Note: changing JMX can cause mysterious errors if you don't {{ant clean && 
ant}}. I ran into the same kinds of stack traces as you did. It's not 
consistent. Sometimes I can make a JMX change and {{ant}} with no problem.

To make patches simpler, I'm posting full repro code. I also tried to simplify 
the naming. Unfortunately, all the previous patches are in jumbled order due to 
a naming convention that doesn't sort. Fortunately, JIRA seems to have an 
easter egg where you can choose the attachment name by changing the url.

{noformat}
# Uncomment to exactly reproduce state.
#git checkout -b 5483-e30d6dc e30d6dc

# Download all needed patches with consistent names, apply patches, build.
W=https://issues.apache.org/jira/secure/attachment
for url in \
  $W/12630490/5483-v02-01-Trace-filtering-and-tracestate-propagation.patch \
  $W/12630491/5483-v02-02-Put-a-few-traces-parallel-to-the-repair-logging.patch 
\
  $W/12631967/5483-v03-03-Make-repair-tracing-controllable-via-nodetool.patch \
  $W/12633153/5483-v06-04-Allow-tracing-ttl-to-be-configured.patch \
  $W/12633154/5483-v06-05-Add-a-command-column-to-system_traces.events.patch \
  $W/12633155/5483-v06-06-Fix-interruption-in-tracestate-propagation.patch \
  $W/12633156/ccm-repair-test
do [ -e $(basename $url) ] || curl -sO $url; done &&
git apply 5483-v0[236]-*.patch &&
ant clean && ant

# put on a separate line because you should at least minimally inspect
# arbitrary code before running.
chmod +x ./ccm-repair-test && ./ccm-repair-test
{noformat}

{{ccm-repair-test}} has some options for convenience:
{noformat}
-k keep (don't delete) the created cluster after successful exit.
-r repair only
-R don't repair
-t do traced repair only
-T don't do traced repair (if neither, then do both traced and untraced repair)
{noformat}

The output of a test run:

{noformat}
Current cluster is now: test-5483-QiR
[2014-03-06 10:46:13,617] Nothing to repair for keyspace 'system'
[2014-03-06 10:46:13,646] Starting repair command #1, repairing 2 ranges for 
keyspace s1 (seq=true, full=true)
[2014-03-06 10:46:16,999] Repair session 72648190-a546-11e3-a5f4-f94811c7b860 
for range (-3074457345618258603,3074457345618258602] finished
[2014-03-06 10:46:17,465] Repair session 73ee2ed0-a546-11e3-a5f4-f94811c7b860 
for range (3074457345618258602,-9223372036854775808] finished
[2014-03-06 10:46:17,465] Repair command #1 finished
[2014-03-06 10:46:17,485] Starting repair command #2, repairing 2 ranges for 
keyspace system_traces (seq=true, full=true)
[2014-03-06 10:46:18,782] Repair session 74aaef20-a546-11e3-a5f4-f94811c7b860 
for range (-3074457345618258603,3074457345618258602] finished
[2014-03-06 10:46:18,816] Repair session 74ff0290-a546-11e3-a5f4-f94811c7b860 
for range (3074457345618258602,-9223372036854775808] finished
[2014-03-06 10:46:18,816] Repair command #2 finished
0 rows exported in 0.015 seconds.
test-5483-QiR-system_traces-events.txt
ok
[2014-03-06 10:46:24,128] Nothing to repair for keyspace 'system'
[2014-03-06 10:46:24,166] Starting repair command #3, repairing 2 ranges for 
keyspace s1 (seq=true, full=true)
[2014-03-06 10:46:25,366] Repair session 78a6d4e0-a546-11e3-a5f4-f94811c7b860 
for range (-3074457345618258603,3074457345618258602] finished
[2014-03-06 10:46:25,415] Repair session 79263e10-a546-11e3-a5f4-f94811c7b860 
for range (3074457345618258602,-9223372036854775808] finished
[2014-03-06 10:46:25,415] Repair command #3 finished
[2014-03-06 10:46:25,485] Starting repair command #4, repairing 2 ranges for 
keyspace system_traces (seq=true, full=true)
[2014-03-06 10:46:27,077] Repair session 796f7c10-a546-11e3-a5f4-f94811c7b860 
for range (-3074457345618258603,3074457345618258602] finished
[2014-03-06 10:46:27,120] Repair session 79f240a0-a546-11e3-a5f4-f94811c7b860 
for range (3074457345618258602,-9223372036854775808] finished
[2014-03-06 10:46:27,120] Repair command #4 finished
48 rows exported in 0.104 seconds.
test-5483-QiR-system_traces-events-tr.txt
found source: 127.0.0.1
found thread: Thread-15
found thread: AntiEntropySessions:1
found thread: RepairJobTask:1
found source: 127.0.0.2
found thread: AntiEntropyStage:1
found source: 127.0.0.3
found thread: AntiEntropySessions:2
found thread: Thread-16
found thread: AntiEntropySessions:3
found thread: AntiEntropySessions:4
unique sources traced: 3
unique threads traced: 8
All thread categories accounted for
ok
{noformat}

---

Patch comments:

- {{v06-04}} I did something similar to {{v03-03}}, (almost) no refactoring. 
The implementation is a little messy architecturally.
- {{v06-05}} This is the suggestion you had to add

[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-03-03 Thread Lyuben Todorov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918251#comment-13918251
 ] 

Lyuben Todorov commented on CASSANDRA-5483:
---

Are the latest 3 patches supposed to be incrementally added onto 
{{trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch}}
 and 
{{trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch}}?
 As in

{noformat}
1 - apply 
trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch
2 - apply 
trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch
3 - apply one of the three latest patches (v3, v4 or v5)
{noformat}

v5 Does a lot of refactoring that I think is outside the scope of this ticket 
(but might be worth it's own ticket as the idea is good), so my vote is for v3, 
but I'm getting a NoSuchMethod exception, can you post a branch with all the 
patches added onto trunk (for v3)? 

The exception: 
{noformat}
java.lang.NoSuchMethodException: forceRepairAsync(java.lang.String, boolean, 
java.util.Collection, java.util.Collection, boolean, boolean, boolean, 
[Ljava.lang.String;)
at 
com.sun.jmx.mbeanserver.PerInterface.noSuchMethod(PerInterface.java:168)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:135)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
at 
com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
at 
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487)
at 
javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
at 
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
at 
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
at 
javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
at sun.rmi.transport.Transport$1.run(Transport.java:177)
at sun.rmi.transport.Transport$1.run(Transport.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
at 
sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
{noformat}

bq. I am thinking of calling the new table something generic like 
system_traces.trace_logs. I also assume, that like system_traces.events

I'd say events is pretty generic, the new table should show that the traces 
aren't query related like in events. If we are going to add new tables to the 
trace CF it's worth thinking about refactoring events into something more 
specific and adding new tables with names that carry meaning. Another possible 
solution is to add a "command" field to system_traces.events where it can allow 
users to retrieve data about specific events, e.g. [~jbellis] WDYT? 

{noformat}
SELECT * FROM system_traces.events;
 session_id   | ... | thread| command
--+ ... +---+-
 09d48eb0-a2f1-11e3-9f04-7d9e3709bf93 | ... | Thrift:1  | REPAIR
 29084f90-a2f3-11e3-9f04-7d9e3709bf93 | ... | Thrift:1  | QUERY

(2 rows)

SELECT * FROM system_traces.events WHERE command='REPAIR';

 session_id   | ... | thread| command
--+ ... +---+-
 09d48eb0-a2f1-11e3-9f04-7d9e3709bf93 | ... | Thrift:1  | REPAIR

(1 rows)
{noformat}


bq. the rows in this table should expire, though perhaps not as fast as 24 
hours. 

+1, repairs can take a very long time so this should be configurable with the 
default perhaps being around 30 days, but with incremental repairs (in 2.1) it 
will end up logging a lot of data, still a better choice than users doing 
regular repairs missing out on information. 

bq. One last thing I wanted to ask is about the poss

[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-02-26 Thread Lyuben Todorov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13912754#comment-13912754
 ] 

Lyuben Todorov commented on CASSANDRA-5483:
---

[~usrbincc] Join the club :D. The patch makes it so repair is traced regardless 
of whether tracing is on or not. Is this on purpose? Also as you said the table 
does fill up rather quickly and this tracing should be moved to a separate 
table as [~ob] suggested. 

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-02-15 Thread Lyuben Todorov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902431#comment-13902431
 ] 

Lyuben Todorov commented on CASSANDRA-5483:
---

[~usrbincc] Can you please rebase to trunk? Patches don't apply cleanly 
anymore.  

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: test-5483-system_traces-events.txt, 
> tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-02-12 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899275#comment-13899275
 ] 

Jonathan Ellis commented on CASSANDRA-5483:
---

Can you review, Lyuben?

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: test-5483-system_traces-events.txt, 
> tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-01-13 Thread Ben Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870131#comment-13870131
 ] 

Ben Chan commented on CASSANDRA-5483:
-

Thus far, I've been going under the assumption that it's mostly meant to be 
used as a combination of performance profiling and some sort of globally 
accessible error log (for that particular repair session), though there will 
probably easily be enough information there to be able to extract some stats.

I could use some suggestions on what things (and what level of detail) would be 
good to trace. In addition to what I already have, I can only think of a few 
places in {{Differencer}} and {{StreamingRepairTask}}, along with a more 
complete covering of the repair-related error logs.


> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: test-5483-system_traces-events.txt, 
> tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2014-01-12 Thread Oscar Bonilla (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869151#comment-13869151
 ] 

Oscar Bonilla commented on CASSANDRA-5483:
--

I was looking at getting some stats from repair and was pointed here. Looking 
at the code makes
me think that repair is more like compaction than query tracing. I.e. it would 
make more sense to
keep a system.repair_history that mimics system.compaction_history. 

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Yuki Morishita
>Assignee: Ben Chan
>Priority: Minor
>  Labels: repair
> Attachments: test-5483-system_traces-events.txt, 
> tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2013-04-22 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637887#comment-13637887
 ] 

Sylvain Lebresne commented on CASSANDRA-5483:
-

We should even probably don't stop at repair (but include 
bootstrap/decommission etc...). Though not suggesting by that that we should 
all do in one shot.

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Yuki Morishita
>Assignee: Yuki Morishita
>Priority: Minor
>  Labels: repair
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-5483) Repair tracing

2013-04-16 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633321#comment-13633321
 ] 

Jonathan Ellis commented on CASSANDRA-5483:
---

Good idea!

> Repair tracing
> --
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Yuki Morishita
>Assignee: Yuki Morishita
>Priority: Minor
>  Labels: repair
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira