Maybe I'm being particularly dense, but I still think that this is being made too complex by failing to enumerate the specific goals.
First case: data for which Accumulo is defined to persistently store *characters*, as opposed to bytes. I would hope that, in all such cases, we would agree that those characters should be stored in some Unicode format, never in some legacy encoding.

Second case: data for which Accumulo is defined to store bytes, but where, for convenience, an API allows the user to read and write characters. In this case, I can imagine two competing API designs. One would be to mirror Java, and in all such cases give the user the option of specifying the charset, defaulting to file.encoding. The other would be to insist on UTF-8. A third possibility - to just respect file.encoding, with no way to override it - seems to me to be retreading the errors of Java 1.x.

Third case: situations in which the user either supplies a text file for Accumulo to read, or asks Accumulo to write a text file. Having an API that can default to file.encoding here would be convenient for users, who want files in their platform's default encoding. Note that this is incompatible with the notion of *setting* file.encoding as an implementation technique for getting the String constructor and getBytes() to do UTF-8.

Finally for today, I had a hard time following the response to my writing on servlets. I'll vastly simplify my presentation: when a user of Accumulo writes Java code that calls the Accumulo API, I find it unacceptable to require that user to set file.encoding to get correct behavior from Accumulo, except as described in the second case above. When Accumulo classes are integrated into user applications, Accumulo must either respect file.encoding or ignore it, but it cannot require the user to set it to something in particular to get correct behavior.
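
To make the two API designs from the second case concrete, here is a rough sketch in plain Java. The class TextBytes and its method names are hypothetical, invented only for illustration; none of this is existing Accumulo API. The point is that neither design needs anyone to *set* file.encoding: one pins UTF-8, the other takes a charset from the caller and only falls back to the JVM default (traditionally derived from file.encoding) when none is given.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Accumulo.
final class TextBytes {
    private TextBytes() {}

    // "Insist on UTF-8" design: the result never depends on file.encoding.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // "Mirror Java" design: the caller may name the charset explicitly...
    static byte[] encode(String s, Charset charset) {
        return s.getBytes(charset);
    }

    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // ...and a convenience overload falls back to the JVM's default charset.
    // This reads the default; it never requires the user to change it.
    static byte[] encodeWithPlatformDefault(String s) {
        return encode(s, Charset.defaultCharset());
    }

    public static void main(String[] args) {
        String s = "caf\u00e9"; // escape keeps this source file pure ASCII
        System.out.println("UTF-8 bytes:      " + encode(s).length);
        System.out.println("ISO-8859-1 bytes: "
                + encode(s, StandardCharsets.ISO_8859_1).length);
        System.out.println("default charset:  " + Charset.defaultCharset());
    }
}
```

Either way, a round trip through encode and decode with the same charset gives back the original string on any platform, which is the property I care about. What I am objecting to is code that calls getBytes() or new String(bytes) with no charset and then expects the user to launch the JVM with a particular file.encoding to make the result come out as UTF-8.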
