[jira] [Assigned] (CASSANDRA-1123) Allow tracing query details

Jonathan Ellis (JIRA) Fri, 29 Jun 2012 18:07:47 -0700

     [ 
https://issues.apache.org/jira/browse/CASSANDRA-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jonathan Ellis reassigned CASSANDRA-1123:
-----------------------------------------

    Assignee: David Alves  (was: Aaron Morton)

So, super high level view of how I think this should work:

Store events like this:
{code}
CREATE TABLE trace_sessions (
  session int,
  coordinator inetaddr,
  request text,
  PRIMARY KEY (session, coordinator)
);

CREATE TABLE trace_events (
  session     int,
  coordinator inetaddr,
  id     uuid,
  source inetaddr,
  event  text,
  happened_at timestamp,
  duration int,
  PRIMARY KEY (session, coordinator, id)
);
{code}

The session table will log the operation being processed, and the events will 
log its progress.

Coordinator and replicas will all just throw events into this table; they don't 
have to keep anything in memory.  Then we get the permanent record, AND clients 
can query the log directly without needing to add more special-case code to the 
coordinator.

One way to tell the client about session ids would be to change Aaron's API a 
bit: {{int system_trace_next_query()}}.  Then clients can just "SELECT * from 
trace_events WHERE coordinator = ? and session = ?" if desired.

Alternatively we could retain the "enable tracing on all queries until I turn 
it off" and add {{int system_get_last_query_session()}} but I prefer the 
one-method version for simplicity and because tracing probably shouldn't be 
left on indefinitely.

No idea how best to expose to a Thrift-less CQL.  Should we create a 
special-cased function there too?

Notes:
- coordinator will generate the session id locally.  Session ids will NOT be 
globally unique; (id, coordinator) pair identifies a series of events
- session comes before coordinator since we want id to be the partition key; 
this will create narrower rows
- "source" is the machine logging the event, not necessarily the same as 
coordinator (better name needed?)
- happened_at will come from currentTimeMillis, so will not be high resolution 
enough to give us the duration of many events; instead, we should use nanotime 
on coordinator (and replicas) to compute duration
- We should TTL event data (1 day?) both to conserve space and to avoid 
problems when event ids wrap
- Tracing N% of queries randomly and providing an external session id can be 
broken out into separate tickets
- Not sure if the threadlocal approach I originally advocated is a good one.  
Certainly there are important aspects that are not covered by the "main" 
coordinator and replica threads: commitlog on the replica, and response 
processing on the coordinator; we may add more such thread pools in the future.
- Aaron's hacking of the slf4j adapator is clever, but I think we should leave 
it out; you'd still have to piece together what happened across multiple 
machines.  Logging into a columnfamily instead solves that problem and makes 
the data more accessible than log files.
- Probably worth encapsulating "log an event to the columnfamily if tracing is 
enabled, otherwise log at debug to log4j" into a utility method, though.
                
> Allow tracing query details
> ---------------------------
>
>                 Key: CASSANDRA-1123
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1123
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: 1123-3.patch.gz
>
>
> In the spirit of CASSANDRA-511, it would be useful to tracing on queries to 
> see where latency is coming from: how long did row cache lookup take?  key 
> search in the index?  merging the data from the sstables?  etc.
> The main difference vs setting debug logging is that debug logging is too big 
> of a hammer; by turning on the flood of logging for everyone, you actually 
> distort the information you're looking for.  This would be something you 
> could set per-query (or more likely per connection).
> We don't need to be as sophisticated as the techniques discussed in the 
> following papers but they are interesting reading:
> http://research.google.com/pubs/pub36356.html
> http://www.usenix.org/events/osdi04/tech/full_papers/barham/barham_html/
> http://www.usenix.org/event/nsdi07/tech/fonseca.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (CASSANDRA-1123) Allow tracing query details

Reply via email to