[ https://issues.apache.org/jira/browse/CASSANDRA-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438475#comment-13438475 ]
Jonathan Ellis commented on CASSANDRA-1123: ------------------------------------------- Dug into this pretty hard today. Thanks for the huge effort, David! The good news is I think we can simplify this a great deal. The bad news is that it's a lot of code churn. Sorry about that! We probably could have used a little more back-and-forth before fleshing it out so completely. What I saw as I tried to add trace points was that the TraceEvent/Builder api was fairly cumbersome. What I really wanted to do was instrument all the debug logging that we've accumulated, most of which has the battle scars of being added to solve a particular tricky situation. So I started banging on the API and realized that I was ending up with something that resembled a logging appender. So why not hook into our existing logging api? Like Aaron's original idea, but without any copy and pasting; we can just implement a log4j LoggingAppender and life is good. This does mean we don't have payload maps anywhere but in the session initialization, which gets special-cased, but I'm fine with that. I've pushed a sketch of this approach to https://github.com/jbellis/cassandra/tree/1123-4. I can't devote another day to finish it, so I'm going to have to throw it back to David, but I've considerately left the parts that need work in a non-compiling state so it's clear where to start. :) (The main drawback is we have to be careful about where we log on the write path to avoid an infinite loop, but this was a problem with the old approach as well. At least infinite loops will be fairly obvious and easy to fix.) Some other points: - I'm violently opposed to serializing thrift objects into the trace. This just pushes the job of making them human-readable out to each consumer. Let's solve this once at the server level instead. One approach is given (but not completed) in my tree. Note that I'm okay with being *barely* human-readable; the main use case we're concerned with is tracing queries interactively, in which case we already know the parameters and the logging is a formality. I do want it to be *possible* to reconstruct a problematic query detected by probabilistic sampling, but it doesn't have to be easy. (That said, turning hex back into cli or cqlsh is at least possible without writing a Thrift deserializer first, so it has that much of an advantage over the initial patch.) - I'm fine with giving up some structure in exchange for ease of use. A single "activity" column, with some metadata, is enough. - I'm also fine with giving up any or all of the pretty printer, summary by request type, and tests, which I've made no effort to port. Would strongly prefer getting a bare bones implementation finished, then adding more functionality later. (I do note that the pretty printer probably makes more sense to grow vertically rather than horizontally.) - enabling tracing-by-probability should be in jmx rather than thrift. unsure if max-sessions-to-trace is useful. - I think you will like the improvements around DTPE and the Stage. - Wiring up appenders at different levels (we'd want the file appender at INFO, the tracing one at DEBUG) is kind of a bitch. Apologies in advance. Possibly useful: http://stackoverflow.com/questions/2154539/log4j-log-level-per-appender-for-a-single-logger, http://stackoverflow.com/questions/751431/how-can-i-direct-log4j-output-so-that-different-log-levels-go-to-different-appen Think that about covers it, but it's late and I could have missed something. > Allow tracing query details > --------------------------- > > Key: CASSANDRA-1123 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1123 > Project: Cassandra > Issue Type: New Feature > Components: Core > Reporter: Jonathan Ellis > Assignee: David Alves > Fix For: 1.2.0 > > Attachments: 1123-3.patch.gz, 1123.patch, 1123.patch > > > In the spirit of CASSANDRA-511, it would be useful to tracing on queries to > see where latency is coming from: how long did row cache lookup take? key > search in the index? merging the data from the sstables? etc. > The main difference vs setting debug logging is that debug logging is too big > of a hammer; by turning on the flood of logging for everyone, you actually > distort the information you're looking for. This would be something you > could set per-query (or more likely per connection). > We don't need to be as sophisticated as the techniques discussed in the > following papers but they are interesting reading: > http://research.google.com/pubs/pub36356.html > http://www.usenix.org/events/osdi04/tech/full_papers/barham/barham_html/ > http://www.usenix.org/event/nsdi07/tech/fonseca.html -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira