Heya!

Earlier this week a user in IRC was having difficulty running Accumulo
1.5.0 because their classpath didn't include commons-configuration.

In one case, they just needed to fix their accumulo-site to include the
Hadoop 2 paths. In the other, they were using Apache Hadoop 0.20.2, which
does not ship commons-configuration.

Initially, the user thought they were running a CDH3 release. That turned
out not to be the case, but as it happens, the Hadoop in CDH3 does not
provide commons-configuration either.

This interaction highlighted two issues, and I'd like some opinions on how
to handle them before I file JIRAs and possibly patches.

1) We are not sufficiently warning people about the need for durable sync

Or maybe we're just not making clear which Hadoop versions provide durable
sync. Hadoop version numbers make little sense to most outsiders, so I think
we need to spell it out in the docs. Waiting for users to start an instance
and then check a log is not enough.

I'm thinking we need something similar to what HBase has[1].

My question is: where should I add this? The README seems like a good
place, since it already talks about enabling durable sync. What about the
user manual? Both?

2) Should we document commons-configuration similar to commons-io?

The README already has a section explaining that some older versions of
Hadoop don't include commons-io. I think the versions listed there need to
be tightened up in light of (1) above, since right now the section
implicitly refers to versions people should not be using.

The only Hadoop distribution I know of that both has proper append support
and lacks commons-configuration is CDH3. Besides being a vendor-specific
release, it is no longer supported by that vendor.

So, would it be preferable to:

  2a) add a note after the commons-io section with similar instructions
for adding commons-configuration, or

  2b) file a JIRA noting that users on CDH3 won't have commons-configuration,
document the workaround on that ticket, and close it as Won't Fix?

The idea behind the latter approach is that it would let people find the
information by searching and give us somewhere to point them, without adding
to our long-term documentation baggage. The downside is that the information
will be less accessible, and so more painful for users to find (especially
if they don't have regular internet access).


-Sean

[1]: http://hbase.apache.org/book/configuration.html#hadoop.older.versions
