In response to the apparent mass confusion about nodetool repair that
became evident in the thread:

   http://www.mail-archive.com/user@cassandra.apache.org/msg11755.html

I started looking around to see what is actually claimed about repair.
I found that the Datastax docs:

   http://www.datastax.com/docs/0.7/operations/scheduled_tasks#repair

... use phrasing which seems very wrong. It seems to strongly
imply that you should not normally run nodetool repair on a cluster.

First of all, have I gone completely off the rails and utterly
confused myself? Is what I say in the e-mail thread wrong?

On the assumption that I'm not crazy, I think this is a good time to
talk about documentation. I've been itching for a while about the
state of documentation. There is the ad-hoc wiki, and there is the
Datastax stuff, but neither is really complete.

What I ask myself is how we can achieve the goal that people who are
trying to adopt Cassandra can do so, and use it reliably, without
spending extensive time following mailing lists and JIRA, and trying to
keep track of what is and isn't still true on the wiki, etc.

This includes challenges like:

* How to actually phrase and structure documentation in an accessible
fashion for people who just want to *use* Cassandra, and not be
knee-deep in the community.

* How to minimize the amount of "subtle detail" that you have to get
right in order to not have a problem; the very up-to-you-to-fix and
not-very-well-advertised state of 'nodetool repair' is a good example.
Ideally there would not even have to *be* documentation for it, except
for people who want to know extra details or have special cases, such
as having no deletes and wanting to avoid repair.

* How to keep the documentation up-to-date.

Do people agree with the goals and the claim that we're not there?
What are good ways to achieve the goals?

I keep feeling that there really should be a handbook. The
Datastax docs seem to be the right "format" (similar to the FreeBSD
handbook, which is a good example). But it seems we need something
more agile that people can easily contribute to, and that can still be
kept up-to-date. So what can be done?

Is having a handbook a good idea? The key property of what I call a
handbook is that there is some section on e.g. "Cassandra operations"
that is reasonably long, and that someone can read through from
beginning to end and get a coherent overall view of how things work,
and know the important aspects that must be taken care of in
production clusters.

It's fine if every single little detail and potential kink isn't there
(like lengthy details about how to calculate memtable thresholds).
But stuff like 'yeah, you need to run nodetool repair at least as
often as X' is important. So are operational best practices for
performing operations on a cluster in a safe manner (e.g., moving
nodes, seeds being sensitive, gossip delays, bootstrapping multiple
nodes at once, etc).

I'm not sure how to get there. It's not like I'm *so* motivated and
have *so* much time that if people agree I'll sit down and write 500
pages of Cassandra handbook. So the question is how to achieve
something incrementally that is nonetheless more organized than the wiki.

Thoughts?

-- 
/ Peter Schuller
