[ https://issues.apache.org/jira/browse/CASSANDRA-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis reassigned CASSANDRA-1123: ----------------------------------------- Assignee: David Alves (was: Aaron Morton) So, super high level view of how I think this should work: Store events like this: {code} CREATE TABLE trace_sessions ( session int, coordinator inetaddr, request text, PRIMARY KEY (session, coordinator) ); CREATE TABLE trace_events ( session int, coordinator inetaddr, id uuid, source inetaddr, event text, happened_at timestamp, duration int, PRIMARY KEY (session, coordinator, id) ); {code} The session table will log the operation being processed, and the events will log its progress. Coordinator and replicas will all just throw events into this table; they don't have to keep anything in memory. Then we get the permanent record, AND clients can query the log directly without needing to add more special-case code to the coordinator. One way to tell the client about session ids would be to change Aaron's API a bit: {{int system_trace_next_query()}}. Then clients can just "SELECT * from trace_events WHERE coordinator = ? and session = ?" if desired. Alternatively we could retain the "enable tracing on all queries until I turn it off" and add {{int system_get_last_query_session()}} but I prefer the one-method version for simplicity and because tracing probably shouldn't be left on indefinitely. No idea how best to expose to a Thrift-less CQL. Should we create a special-cased function there too? Notes: - coordinator will generate the session id locally. Session ids will NOT be globally unique; (id, coordinator) pair identifies a series of events - session comes before coordinator since we want id to be the partition key; this will create narrower rows - "source" is the machine logging the event, not necessarily the same as coordinator (better name needed?) - happened_at will come from currentTimeMillis, so will not be high resolution enough to give us the duration of many events; instead, we should use nanotime on coordinator (and replicas) to compute duration - We should TTL event data (1 day?) both to conserve space and to avoid problems when event ids wrap - Tracing N% of queries randomly and providing an external session id can be broken out into separate tickets - Not sure if the threadlocal approach I originally advocated is a good one. Certainly there are important aspects that are not covered by the "main" coordinator and replica threads: commitlog on the replica, and response processing on the coordinator; we may add more such thread pools in the future. - Aaron's hacking of the slf4j adapator is clever, but I think we should leave it out; you'd still have to piece together what happened across multiple machines. Logging into a columnfamily instead solves that problem and makes the data more accessible than log files. - Probably worth encapsulating "log an event to the columnfamily if tracing is enabled, otherwise log at debug to log4j" into a utility method, though. > Allow tracing query details > --------------------------- > > Key: CASSANDRA-1123 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1123 > Project: Cassandra > Issue Type: New Feature > Components: Core > Reporter: Jonathan Ellis > Assignee: David Alves > Fix For: 1.2 > > Attachments: 1123-3.patch.gz > > > In the spirit of CASSANDRA-511, it would be useful to tracing on queries to > see where latency is coming from: how long did row cache lookup take? key > search in the index? merging the data from the sstables? etc. > The main difference vs setting debug logging is that debug logging is too big > of a hammer; by turning on the flood of logging for everyone, you actually > distort the information you're looking for. This would be something you > could set per-query (or more likely per connection). > We don't need to be as sophisticated as the techniques discussed in the > following papers but they are interesting reading: > http://research.google.com/pubs/pub36356.html > http://www.usenix.org/events/osdi04/tech/full_papers/barham/barham_html/ > http://www.usenix.org/event/nsdi07/tech/fonseca.html -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira