Heya!

Earlier this week we had a user in IRC who was having difficulty running 1.5.0 because their classpath didn't include commons-configuration.
In one case, they just needed to fix their accumulo-site to include Hadoop 2 paths. In the other, they were using Apache Hadoop 0.20.2, which has no commons-configuration. Initially, the user thought they were running a CDH3 version. This turned out not to be the case, but it so happens that CDH3 also does not have commons-configuration provided by Hadoop.

This interaction pointed out 2 issues, and I'd like some opinions on how to handle them before I file jiras and possibly patches.

1) We are not sufficiently warning people about the need for durable sync

Or maybe we're just not getting across when durable sync is available. Hadoop versions are nonsensical for most outsiders, so I think we need to spell it out in docs. Waiting for users to start an instance and then look at a log is insufficient.

I'm thinking we need something similar to what HBase has[1]. My question is, where should I add this? The README seems like a good place, since it already talks about enabling durable sync. How about the user manual? Both?

2) Should we document commons-configuration similar to commons-io?

The README already has a section about how some older versions of Hadoop don't have commons-io. I think the versions given need to be tightened up given (1) above (since right now it implicitly refers to versions people should not be using).

The only Hadoop distro I know of that both has proper append support and does not have commons-configuration is CDH3. In addition to being a vendor-specific version, it is no longer supported by said vendor.

So would it be preferable to

2a) add a note after the commons-io section that gives similar instructions for adding commons-configuration?
2b) file a jira that points out that users on CDH3 won't have commons-configuration, document the workaround on said ticket, and close it as Won't Fix?

The idea with the latter approach is that it would give searchers a chance to find the information and give us somewhere to point people, while not adding to our long-term documentation baggage. The downside is that this won't be as accessible to users, so it will be more painful for them (esp. if they don't have regular internet access).

-Sean

[1]: http://hbase.apache.org/book/configuration.html#hadoop.older.versions
