https://issues.apache.org/jira/browse/ACCUMULO-241?focusedCommentId=13449680&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13449680
In this comment, John mentioned that all getBytes() calls should be changed to use UTF-8. There are about 1,800 getBytes() calls, and not all of them are on String objects. I am working on ways to identify the subset of these calls that needs to change. I have created https://issues.apache.org/jira/browse/ACCUMULO-836 to track this issue.

Should we create one static Charset object?

    class AccumuloDefaultCharset {
      public static final Charset UTF8 = Charset.forName("UTF8");
    }

Or should we use a static String constant?

    public static final String UTF8 = "UTF8";

I have found one instance of getBytes() in InputFormatBase:

    protected static byte[] getPassword(Configuration conf) {
      return Base64.decodeBase64(conf.get(PASSWORD, "").getBytes());
    }

Are there any reasons why I can't start specifying the charset? Is UTF-8 the right charset to use? I am not an expert in non-English charsets, so guidance would be welcome.
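
To make the Charset-object option concrete, here is a minimal sketch of what an explicit-charset round trip could look like. It is an illustration under assumptions, not the actual Accumulo change: PasswordCodec, encode() and decode() are hypothetical names, java.util.Base64 (Java 8+) stands in for the commons-codec Base64 used in InputFormatBase so that the example compiles without extra dependencies, and the holder class uses the canonical name "UTF-8" rather than the "UTF8" alias.

```java
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.Base64;

// Holder for the shared charset, following the proposal in this message.
final class AccumuloDefaultCharset {
  // "UTF-8" is the canonical name; every Java platform is required to support it.
  static final Charset UTF8 = Charset.forName("UTF-8");

  private AccumuloDefaultCharset() {}
}

// Hypothetical helper, used only to illustrate the explicit-charset calls.
class PasswordCodec {
  // Base64-encode raw password bytes and turn the result into a String.
  static String encode(byte[] password) {
    byte[] base64 = Base64.getEncoder().encode(password);
    return new String(base64, AccumuloDefaultCharset.UTF8);
  }

  // Reverse of encode(): getBytes(Charset) no longer depends on the
  // platform default encoding of the JVM that runs it.
  static byte[] decode(String stored) {
    byte[] base64 = stored.getBytes(AccumuloDefaultCharset.UTF8);
    return Base64.getDecoder().decode(base64);
  }

  public static void main(String[] args) {
    // \u00e9 is a non-ASCII character that takes two bytes in UTF-8.
    byte[] original = "s\u00e9cret".getBytes(AccumuloDefaultCharset.UTF8);
    byte[] roundTrip = decode(encode(original));
    System.out.println(Arrays.equals(original, roundTrip));
  }
}
```

One point that may help decide between the two options: String.getBytes(Charset) declares no checked exception, while String.getBytes(String charsetName) throws UnsupportedEncodingException, so a String constant would force a try/catch or a throws clause at every call site.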
