In response to the apparent mass confusion about nodetool repair that became evident in the thread:
http://www.mail-archive.com/user@cassandra.apache.org/msg11755.html

I started looking around to see what is actually claimed about repair. I found that the Datastax docs:

http://www.datastax.com/docs/0.7/operations/scheduled_tasks#repair

... use phrasing that seems very wrong. They strongly seem to imply that you should not normally run nodetool repair on a cluster.

First of all, have I completely flown off the handle and utterly confused myself? Is what I say in the email thread wrong?

On the assumption that I'm not crazy, I think this is a good time to talk about documentation. I've been itching for a while about the state of documentation. There is the ad-hoc wiki, and there is the Datastax material, but neither is really complete.

What I ask myself is how we can make it possible for people who are trying to adopt Cassandra to do so, and to use it reliably, without spending extensive time following mailing lists and JIRA and trying to keep track of what on the wiki is still true and what isn't. This includes challenges like:

* How to actually phrase and structure documentation in an accessible fashion for people who just want to *use* Cassandra, without being knee-deep in the community.
* Minimizing the amount of "subtle detail" that you have to get right in order to avoid problems; the very up-to-you-to-fix and not-very-well-advertised state of 'nodetool repair' is a good example. Whatever can be done should be done to avoid there even having to *be* documentation for it, except for people who want to know extra details or are doing things like running without deletes and wanting to avoid repair.
* Keeping the documentation up-to-date.

Do people agree with the goals and with the claim that we're not there? What are good ways to achieve them?

I keep feeling that there should really be a handbook. The Datastax docs seem to have the right "format" (similar to the FreeBSD handbook, which is a good example).
But it seems we need something more agile that people can easily contribute to, while it can still be kept up-to-date. So what can be done? Is having a handbook a good idea?

The key property of what I call a handbook is that there is some section on, e.g., "Cassandra operations" that is reasonably long, and that someone can read from beginning to end to get a coherent overall view of how things work and learn the important aspects that must be taken care of in production clusters. It's fine if not every little detail and potential kink is there (like lengthy details about how to calculate memtable thresholds). But stuff like "yeah, you need to run nodetool repair at least as often as X" is important. So are operational best practices for performing operations on a cluster safely (e.g., moving nodes, seeds being sensitive, gossip delays, bootstrapping multiple nodes at once, etc.).

I'm not sure how to get there. It's not like I'm *so* motivated and have *so* much time that if people agree I'll sit down and write 500 pages of Cassandra handbook. So the question is how to achieve, incrementally, something that is still more organized than the wiki.

Thoughts?

-- 
/ Peter Schuller