[jira] [Comment Edited] (CASSANDRA-5483) Repair tracing

Lyuben Todorov (JIRA) Mon, 03 Mar 2014 09:28:26 -0800

    [ https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918251#comment-13918251 ]


Lyuben Todorov edited comment on CASSANDRA-5483 at 3/3/14 5:25 PM:
-------------------------------------------------------------------

Are the latest three patches supposed to be applied incrementally on top of
{{trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch}}
and
{{trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch}}?
As in:

{noformat}
1 - apply trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch
2 - apply trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch
3 - apply one of the three latest patches (v3, v4 or v5)
{noformat}

v5 does a lot of refactoring that I think is outside the scope of this ticket
(but might be worth its own ticket, as the idea is good), so my vote is for v3.
However, I'm getting a NoSuchMethodException with it; can you post a branch with
all the patches applied onto trunk (for v3)?

The exception: 
{noformat}
java.lang.NoSuchMethodException: forceRepairAsync(java.lang.String, boolean, java.util.Collection, java.util.Collection, boolean, boolean, boolean, [Ljava.lang.String;)
        at com.sun.jmx.mbeanserver.PerInterface.noSuchMethod(PerInterface.java:168)
        at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:135)
        at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
        at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
        at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487)
        at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
        at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
        at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
        at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
        at sun.rmi.transport.Transport$1.run(Transport.java:177)
        at sun.rmi.transport.Transport$1.run(Transport.java:174)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
        at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
{noformat}
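A guess at the cause, not verified against the patches: for a standard MBean, JMX resolves an operation by its name plus the exact list of parameter class names, so a nodetool built with the v3 patch talking to a node built without it (or the reverse) fails with exactly this exception. The self-contained sketch below reproduces that against a private MBeanServer. RepairServiceMBean, its parameter names and the ObjectName are hypothetical stand-ins, not Cassandra's actual StorageServiceMBean.

```java
import java.util.Collection;
import java.util.Collections;
import javax.management.MBeanServer;
import javax.management.MBeanServerFactory;
import javax.management.ObjectName;
import javax.management.ReflectionException;
import javax.management.StandardMBean;

public class RepairSignatureDemo {
    // Hypothetical server-side interface: only ONE boolean after the two collections.
    public interface RepairServiceMBean {
        int forceRepairAsync(String keyspace, boolean isSequential,
                             Collection<String> dataCenters, Collection<String> hosts,
                             boolean primaryRange, String... columnFamilies);
    }

    public static class RepairService implements RepairServiceMBean {
        @Override
        public int forceRepairAsync(String keyspace, boolean isSequential,
                                    Collection<String> dataCenters, Collection<String> hosts,
                                    boolean primaryRange, String... columnFamilies) {
            return 1; // pretend repair command number
        }
    }

    // Registers the MBean on a fresh private MBeanServer and invokes the operation
    // with the given JMX signature (parameter class names). Returns "ok" on success,
    // otherwise the message of the NoSuchMethodException wrapped by JMX.
    public static String invoke(Object[] params, String[] signature) {
        try {
            MBeanServer server = MBeanServerFactory.newMBeanServer();
            ObjectName name = new ObjectName("demo:type=RepairService");
            server.registerMBean(
                    new StandardMBean(new RepairService(), RepairServiceMBean.class), name);
            server.invoke(name, "forceRepairAsync", params, signature);
            return "ok";
        } catch (ReflectionException e) {
            // Standard MBeans match on operation name AND exact signature;
            // a mismatch is reported as ReflectionException(NoSuchMethodException).
            return e.getCause().getMessage();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Collection<String> none = Collections.emptyList();
        // The signature the (hypothetical) server actually exposes.
        String[] serverSig = {"java.lang.String", "boolean", "java.util.Collection",
                "java.util.Collection", "boolean", "[Ljava.lang.String;"};
        System.out.println(invoke(
                new Object[]{"ks", true, none, none, false, new String[0]}, serverSig));

        // A client built from a different patch set sends two extra booleans.
        String[] clientSig = {"java.lang.String", "boolean", "java.util.Collection",
                "java.util.Collection", "boolean", "boolean", "boolean", "[Ljava.lang.String;"};
        System.out.println(invoke(
                new Object[]{"ks", true, none, none, false, false, false, new String[0]}, clientSig));
    }
}
```

Running main should print ok, then the same forceRepairAsync(...) signature text as the first line of the trace above. If so, rebuilding nodetool and the node from the same tree should make the error go away.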

bq. I am thinking of calling the new table something generic like 
system_traces.trace_logs. I also assume, that like system_traces.events

I'd say events is already a pretty generic name, so the new table's name should
make it clear that its traces, unlike those in events, aren't query-related. If
we are going to add new tables to the system_traces keyspace, it's worth
thinking about refactoring events into something more specific and giving the
new tables names that carry meaning. Another possible solution is to add a
"command" column to system_traces.events so that users can retrieve data about
specific kinds of events ([~jbellis], WDYT?), e.g.:

{noformat}
SELECT * FROM system_traces.events;
 session_id                           | ... | thread    | command
--------------------------------------+ ... +-----------+---------
 09d48eb0-a2f1-11e3-9f04-7d9e3709bf93 | ... | Thrift:1  | REPAIR
 29084f90-a2f3-11e3-9f04-7d9e3709bf93 | ... | Thrift:1  | QUERY

(2 rows)

SELECT * FROM system_traces.events WHERE command='REPAIR';

 session_id                           | ... | thread    | command
--------------------------------------+ ... +-----------+---------
 09d48eb0-a2f1-11e3-9f04-7d9e3709bf93 | ... | Thrift:1  | REPAIR

(1 rows)
{noformat}
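To make the "command" column idea concrete, here is an in-memory sketch of the mock-up above: each event carries a hypothetical Command tag, and filtering on it returns only the repair rows. Command and TraceEvent are illustrative assumptions and nothing here touches Cassandra; on the real table, a WHERE on a non-key column like command would likely also need a secondary index, depending on the version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

public class TraceCommandFilter {
    // Hypothetical kinds of traced activity, mirroring the proposed "command" column.
    public enum Command { REPAIR, QUERY }

    // In-memory stand-in for a system_traces.events row, limited to the
    // columns shown in the mock-up (the others are elided there as "...").
    public static final class TraceEvent {
        public final UUID sessionId;
        public final String thread;
        public final Command command;

        public TraceEvent(UUID sessionId, String thread, Command command) {
            this.sessionId = sessionId;
            this.thread = thread;
            this.command = command;
        }

        @Override
        public String toString() {
            return sessionId + " | " + thread + " | " + command;
        }
    }

    // In-memory analogue of: SELECT * FROM system_traces.events WHERE command = ?
    public static List<TraceEvent> byCommand(List<TraceEvent> events, Command command) {
        List<TraceEvent> matches = new ArrayList<>();
        for (TraceEvent event : events) {
            if (event.command == command) {
                matches.add(event);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<TraceEvent> events = Arrays.asList(
                new TraceEvent(UUID.fromString("09d48eb0-a2f1-11e3-9f04-7d9e3709bf93"),
                        "Thrift:1", Command.REPAIR),
                new TraceEvent(UUID.fromString("29084f90-a2f3-11e3-9f04-7d9e3709bf93"),
                        "Thrift:1", Command.QUERY));
        for (TraceEvent event : byCommand(events, Command.REPAIR)) {
            System.out.println(event); // only the REPAIR session is printed
        }
    }
}
```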


bq. the rows in this table should expire, though perhaps not as fast as 24 
hours. 

+1. Repairs can take a very long time, so the expiry should be configurable,
with a default of perhaps around 90 days. With incremental repairs (in 2.1)
this will end up logging a lot of data, but that is still a better choice than
users who run regular repairs missing out on information.
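As a rough sketch of the configurable expiry, assuming the setting is expressed in days and converted to the per-row TTL in seconds that CQL's USING TTL takes: the 90-day default comes from the suggestion above, while the method name and the idea of a nullable operator setting are hypothetical. Query traces, for comparison, use the 24-hour expiry mentioned in the quote.

```java
public class RepairTraceTtl {
    // Query traces expire after 24 hours, per the discussion above.
    public static final int QUERY_TRACE_TTL_SECONDS = 24 * 60 * 60;
    // Suggested default for repair traces (roughly 90 days).
    public static final int DEFAULT_REPAIR_TRACE_TTL_DAYS = 90;
    // Cassandra rejects TTLs longer than 20 years.
    public static final int MAX_TTL_SECONDS = 20 * 365 * 24 * 60 * 60;

    // configuredDays is a hypothetical operator setting; null means "use the default".
    public static int repairTraceTtlSeconds(Integer configuredDays) {
        int days = (configuredDays == null) ? DEFAULT_REPAIR_TRACE_TTL_DAYS : configuredDays;
        if (days <= 0) {
            throw new IllegalArgumentException("repair trace TTL must be positive, got " + days);
        }
        long seconds = days * 86400L; // long arithmetic avoids int overflow for large inputs
        if (seconds > MAX_TTL_SECONDS) {
            throw new IllegalArgumentException("repair trace TTL exceeds 20 years: " + days + " days");
        }
        return (int) seconds;
    }

    public static void main(String[] args) {
        System.out.println(repairTraceTtlSeconds(null)); // 7776000 (90 days)
        System.out.println(repairTraceTtlSeconds(30));   // 2592000 (30 days)
        System.out.println(QUERY_TRACE_TTL_SECONDS);     // 86400 (24 hours)
    }
}
```

Keeping the setting in days keeps it readable for operators, while the conversion to seconds happens in one place; the upper-bound check only guards against values Cassandra would refuse anyway.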

bq. One last thing I wanted to ask is about the possibility of trace log 
levels. What is the minimum amount of trace log information you would find 
useful, the next amount, and so on? Should it just follow the loglevel?

Tracing is supposed to give as much useful information as possible and tends to
be used for debugging problems, e.g. slow queries or, in this case, repairs
taking too long, so it's important to include useful information without
spamming the logs with every detail. Different log levels might be useful, but
in this ticket the aim is to track the progress of repairs, so logging the
completion of each repair command should be sufficient.
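If trace levels were added later, one simple scheme (an assumption; none of the attached patches do this) is an ordered enum where each trace message declares its tier and is recorded only when the configured verbosity includes that tier. The default tier here covers just the completion of each repair command, as suggested above; the level names and messages are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class RepairTraceLevels {
    // Hypothetical verbosity tiers, ordered from least to most detailed.
    public enum Level { COMMAND, SESSION, DETAIL }

    private final Level configured;
    private final List<String> recorded = new ArrayList<>();

    public RepairTraceLevels(Level configured) {
        this.configured = configured;
    }

    // Records the message only if its tier is no more detailed than the configured one
    // (enum compareTo follows declaration order).
    public void trace(Level level, String message) {
        if (level.compareTo(configured) <= 0) {
            recorded.add(message);
        }
    }

    public List<String> recorded() {
        return recorded;
    }

    public static void main(String[] args) {
        // Default: only the completion of each repair command is traced.
        RepairTraceLevels tracer = new RepairTraceLevels(Level.COMMAND);
        tracer.trace(Level.COMMAND, "repair command #1 completed");
        tracer.trace(Level.SESSION, "repair session finished");
        tracer.trace(Level.DETAIL, "merkle tree received");
        System.out.println(tracer.recorded()); // [repair command #1 completed]
    }
}
```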


was (Author: lyubent):
Are the latest three patches supposed to be applied incrementally on top of
{{trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch}}
and
{{trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch}}?
As in:

{noformat}
1 - apply trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch
2 - apply trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch
3 - apply one of the three latest patches (v3, v4 or v5)
{noformat}

v5 does a lot of refactoring that I think is outside the scope of this ticket
(but might be worth its own ticket, as the idea is good), so my vote is for v3.
However, I'm getting a NoSuchMethodException with it; can you post a branch with
all the patches applied onto trunk (for v3)?

The exception: 
{noformat}
java.lang.NoSuchMethodException: forceRepairAsync(java.lang.String, boolean, java.util.Collection, java.util.Collection, boolean, boolean, boolean, [Ljava.lang.String;)
        at com.sun.jmx.mbeanserver.PerInterface.noSuchMethod(PerInterface.java:168)
        at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:135)
        at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
        at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
        at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487)
        at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
        at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
        at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
        at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
        at sun.rmi.transport.Transport$1.run(Transport.java:177)
        at sun.rmi.transport.Transport$1.run(Transport.java:174)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
        at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
{noformat}

bq. I am thinking of calling the new table something generic like 
system_traces.trace_logs. I also assume, that like system_traces.events

I'd say events is already a pretty generic name, so the new table's name should
make it clear that its traces, unlike those in events, aren't query-related. If
we are going to add new tables to the system_traces keyspace, it's worth
thinking about refactoring events into something more specific and giving the
new tables names that carry meaning. Another possible solution is to add a
"command" column to system_traces.events so that users can retrieve data about
specific kinds of events ([~jbellis], WDYT?), e.g.:

{noformat}
SELECT * FROM system_traces.events;
 session_id                           | ... | thread    | command
--------------------------------------+ ... +-----------+---------
 09d48eb0-a2f1-11e3-9f04-7d9e3709bf93 | ... | Thrift:1  | REPAIR
 29084f90-a2f3-11e3-9f04-7d9e3709bf93 | ... | Thrift:1  | QUERY

(2 rows)

SELECT * FROM system_traces.events WHERE command='REPAIR';

 session_id                           | ... | thread    | command
--------------------------------------+ ... +-----------+---------
 09d48eb0-a2f1-11e3-9f04-7d9e3709bf93 | ... | Thrift:1  | REPAIR

(1 rows)
{noformat}


bq. the rows in this table should expire, though perhaps not as fast as 24 
hours. 

+1. Repairs can take a very long time, so the expiry should be configurable,
with a default of perhaps around 30 days. With incremental repairs (in 2.1)
this will end up logging a lot of data, but that is still a better choice than
users who run regular repairs missing out on information.

bq. One last thing I wanted to ask is about the possibility of trace log 
levels. What is the minimum amount of trace log information you would find 
useful, the next amount, and so on? Should it just follow the loglevel?

Tracing is supposed to give as much useful information as possible and tends to
be used for debugging problems, e.g. slow queries or, in this case, repairs
taking too long, so it's important to include useful information without
spamming the logs with every detail. Different log levels might be useful, but
in this ticket the aim is to track the progress of repairs, so logging the
completion of each repair command should be sufficient.

> Repair tracing
> --------------
>
>                 Key: CASSANDRA-5483
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Yuki Morishita
>            Assignee: Ben Chan
>            Priority: Minor
>              Labels: repair
>         Attachments: test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results the way query tracing
> stores its traces in a system keyspace. With it, you don't have to look up each
> log file to see what the status was and how the repair you invoked performed.
> Instead, you can query the repair log by session ID to see the state and
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
