[ 
https://issues.apache.org/jira/browse/CASSANDRA-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438475#comment-13438475
 ] 

Jonathan Ellis commented on CASSANDRA-1123:
-------------------------------------------

Dug into this pretty hard today.  Thanks for the huge effort, David!

The good news is I think we can simplify this a great deal.  The bad news is 
that it's a lot of code churn.  Sorry about that!  We probably could have used 
a little more back-and-forth before fleshing it out so completely.

What I saw as I tried to add trace points was that the TraceEvent/Builder api 
was fairly cumbersome.  What I really wanted to do was instrument all the debug 
logging that we've accumulated, most of which has the battle scars of being 
added to solve a particular tricky situation.

So I started banging on the API and realized that I was ending up with 
something that resembled a logging appender.  So why not hook into our existing 
logging api?  Like Aaron's original idea, but without any copy and pasting; we 
can just implement a log4j LoggingAppender and life is good.

This does mean we don't have payload maps anywhere but in the session 
initialization, which gets special-cased, but I'm fine with that.

I've pushed a sketch of this approach to 
https://github.com/jbellis/cassandra/tree/1123-4.  I can't devote another day 
to finish it, so I'm going to have to throw it back to David, but I've 
considerately left the parts that need work in a non-compiling state so it's 
clear where to start. :)

(The main drawback is we have to be careful about where we log on the write 
path to avoid an infinite loop, but this was a problem with the old approach as 
well.  At least infinite loops will be fairly obvious and easy to fix.)

Some other points:
- I'm violently opposed to serializing thrift objects into the trace.  This 
just pushes the job of making them human-readable out to each consumer.  Let's 
solve this once at the server level instead.  One approach is given (but not 
completed) in my tree.  Note that I'm okay with being *barely* human-readable; 
the main use case we're concerned with is tracing queries interactively, in 
which case we already know the parameters and the logging is a formality.  I do 
want it to be *possible* to reconstruct a problematic query detected by 
probabilistic sampling, but it doesn't have to be easy.  (That said, turning 
hex back into cli or cqlsh is at least possible without writing a Thrift 
deserializer first, so it has that much of an advantage over the initial patch.)
- I'm fine with giving up some structure in exchange for ease of use.  A single 
"activity" column, with some metadata, is enough.
- I'm also fine with giving up any or all of the pretty printer, summary by 
request type, and tests, which I've made no effort to port.  Would strongly 
prefer getting a bare bones implementation finished, then adding more 
functionality later.  (I do note that the pretty printer probably makes more 
sense to grow vertically rather than horizontally.)
- enabling tracing-by-probability should be in jmx rather than thrift.  unsure 
if max-sessions-to-trace is useful.
- I think you will like the improvements around DTPE and the Stage.
- Wiring up appenders at different levels (we'd want the file appender at INFO, 
the tracing one at DEBUG) is kind of a bitch. Apologies in advance. Possibly 
useful: 
http://stackoverflow.com/questions/2154539/log4j-log-level-per-appender-for-a-single-logger,
 
http://stackoverflow.com/questions/751431/how-can-i-direct-log4j-output-so-that-different-log-levels-go-to-different-appen

Think that about covers it, but it's late and I could have missed something.
                
> Allow tracing query details
> ---------------------------
>
>                 Key: CASSANDRA-1123
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1123
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2.0
>
>         Attachments: 1123-3.patch.gz, 1123.patch, 1123.patch
>
>
> In the spirit of CASSANDRA-511, it would be useful to tracing on queries to 
> see where latency is coming from: how long did row cache lookup take?  key 
> search in the index?  merging the data from the sstables?  etc.
> The main difference vs setting debug logging is that debug logging is too big 
> of a hammer; by turning on the flood of logging for everyone, you actually 
> distort the information you're looking for.  This would be something you 
> could set per-query (or more likely per connection).
> We don't need to be as sophisticated as the techniques discussed in the 
> following papers but they are interesting reading:
> http://research.google.com/pubs/pub36356.html
> http://www.usenix.org/events/osdi04/tech/full_papers/barham/barham_html/
> http://www.usenix.org/event/nsdi07/tech/fonseca.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to