[ https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918251#comment-13918251 ]
Lyuben Todorov edited comment on CASSANDRA-5483 at 3/3/14 5:25 PM:
-------------------------------------------------------------------

Are the latest 3 patches supposed to be incrementally added onto {{trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch}} and {{trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch}}? As in
{noformat}
1 - apply trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch
2 - apply trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch
3 - apply one of the three latest patches (v3, v4 or v5)
{noformat}

v5 does a lot of refactoring that I think is outside the scope of this ticket (but might be worth its own ticket, as the idea is good), so my vote is for v3. However, I'm getting a NoSuchMethodException with it. Can you post a branch with all the patches added onto trunk (for v3)? The exception:
{noformat}
java.lang.NoSuchMethodException: forceRepairAsync(java.lang.String, boolean, java.util.Collection, java.util.Collection, boolean, boolean, boolean, [Ljava.lang.String;)
	at com.sun.jmx.mbeanserver.PerInterface.noSuchMethod(PerInterface.java:168)
	at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:135)
	at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
	at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
	at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487)
	at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
	at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
	at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
	at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
	at sun.rmi.transport.Transport$1.run(Transport.java:177)
	at sun.rmi.transport.Transport$1.run(Transport.java:174)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
	at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
{noformat}

bq. I am thinking of calling the new table something generic like system_traces.trace_logs. I also assume, that like system_traces.events

I'd say events is pretty generic; the new table's name should show that its traces aren't query related, unlike those in events. If we are going to add new tables to the trace keyspace, it's worth thinking about refactoring events into something more specific and adding new tables with names that carry meaning. Another possible solution is to add a "command" field to system_traces.events, which would allow users to retrieve data about specific kinds of events, as in the example below. [~jbellis] WDYT?
{noformat}
SELECT * FROM system_traces.events;

 session_id                           | ... | thread    | command
--------------------------------------+ ... +-----------+---------
 09d48eb0-a2f1-11e3-9f04-7d9e3709bf93 | ... | Thrift:1  | REPAIR
 29084f90-a2f3-11e3-9f04-7d9e3709bf93 | ... | Thrift:1  | QUERY

(2 rows)

SELECT * FROM system_traces.events WHERE command='REPAIR';

 session_id                           | ... | thread    | command
--------------------------------------+ ... +-----------+---------
 09d48eb0-a2f1-11e3-9f04-7d9e3709bf93 | ... | Thrift:1  | REPAIR

(1 rows)
{noformat}

bq. the rows in this table should expire, though perhaps not as fast as 24 hours.

+1. Repairs can take a very long time, so the expiry should be configurable, with the default perhaps being around 90 days. With incremental repairs (in 2.1) this will end up logging a lot of data, but that is still a better choice than users who run regular repairs missing out on information.

bq. One last thing I wanted to ask is about the possibility of trace log levels. What is the minimum amount of trace log information you would find useful, the next amount, and so on? Should it just follow the loglevel?

Trace is supposed to give as much info as possible and tends to be used for debugging problems, e.g. slow queries or, in this case, repairs taking too long, so it's important to include useful information without spamming the logs with every detail. Different log levels might be useful, but in this ticket the aim is to track the progress of repairs, so logging each repair command's completion should be sufficient.


> Repair tracing
> --------------
>
>                 Key: CASSANDRA-5483
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Yuki Morishita
>            Assignee: Ben Chan
>            Priority: Minor
>              Labels: repair
>         Attachments: test-5483-system_traces-events.txt,
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch,
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
> tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt,
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt,
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch,
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
> v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing
> stores traces to system keyspace. With it, you don't have to lookup each log
> file to see what was the status and how it performed the repair you invoked.
> Instead, you can query the repair log with session ID to see the state and
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
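
A note on the NoSuchMethodException quoted in the comment: the trace shows it being raised in the JMX layer ({{PerInterface.noSuchMethod}}), which is what happens when a client invokes an operation with a parameter-type list that the registered MBean does not expose. Whether that is the cause here (for example, a patched nodetool talking to a node built without the patch) is an assumption; the ticket does not confirm it. The sketch below reproduces only the mechanism, using a hypothetical {{RepairOps}} interface and a made-up object name, not Cassandra's actual MBean:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.ReflectionException;
import javax.management.StandardMBean;

public class JmxSignatureDemo {
    /** Hypothetical management interface, standing in for a server-side MBean. */
    public interface RepairOps {
        int repair(String keyspace, boolean sequential);
    }

    /** Trivial implementation; always returns command number 1. */
    public static class RepairOpsImpl implements RepairOps {
        @Override
        public int repair(String keyspace, boolean sequential) {
            return 1;
        }
    }

    /**
     * Invokes "repair" through the MBean server with the given signature.
     * Returns "ok:<result>" on success, or the simple class name of the
     * exception wrapped by the ReflectionException on failure.
     */
    public static String invoke(Object[] params, String[] signature) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("demo:type=RepairOps"); // made-up name
        if (!server.isRegistered(name)) {
            server.registerMBean(new StandardMBean(new RepairOpsImpl(), RepairOps.class), name);
        }
        try {
            return "ok:" + server.invoke(name, "repair", params, signature);
        } catch (ReflectionException e) {
            // JMX matches operations on the exact parameter-type list.
            return e.getCause().getClass().getSimpleName();
        }
    }

    public static void main(String[] args) throws Exception {
        // Signature matches the registered interface.
        System.out.println(invoke(new Object[] {"ks", true},
                new String[] {"java.lang.String", "boolean"}));
        // Client assumes an extra boolean parameter the server does not have.
        System.out.println(invoke(new Object[] {"ks", true, false},
                new String[] {"java.lang.String", "boolean", "boolean"}));
    }
}
```

If this is what is happening with the v3 patch, rebuilding and restarting the node from the same patched tree as the client would be the first thing to check.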