I like the idea of making the change explicit in the source code. Setting the encoding through the JVM property would be easier, but it is not as explicit. I have changed a few dozen files so far. I have free time today since Hurricane Sandy has closed the offices.
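
As a minimal sketch of what I mean by explicit (the class, field, and method names here are hypothetical and not taken from the patch), each class would hold its own charset constant and pass it to getBytes() and the String constructor, so the result no longer depends on the platform default or on `file.encoding`:

```java
import java.nio.charset.Charset;

// Hypothetical example class; not part of the Accumulo code base.
class ExplicitCharsetExample {
    // Private per-class constant, along the lines suggested in the thread below.
    // Charset.forName("UTF-8") is always available on a conforming JVM.
    private static final Charset UTF8 = Charset.forName("UTF-8");

    // Encode with an explicit charset instead of the platform default.
    static byte[] encode(String s) {
        return s.getBytes(UTF8);
    }

    // Decode with the same explicit charset so the round trip is lossless.
    static String decode(byte[] bytes) {
        return new String(bytes, UTF8);
    }

    public static void main(String[] args) {
        // "caf" followed by e-acute (U+00E9), which takes two bytes in UTF-8.
        byte[] bytes = encode("caf\u00e9");
        System.out.println(bytes.length); // prints 5
        System.out.println(decode(bytes).equals("caf\u00e9")); // prints true
    }
}
```

On Java 7 or later, java.nio.charset.StandardCharsets.UTF_8 could replace the Charset.forName call; the sketch avoids it so that it also compiles on Java 6.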

On Mon, Oct 29, 2012 at 11:39 AM, William Slacum <[email protected]> wrote:
> Isn't it easier to just set the JVM property `file.encoding`?
>
> On Sun, Oct 28, 2012 at 3:18 PM, Ed Kohlwey <[email protected]> wrote:
>
>> If you use a private static field in each class for the charset, it will
>> basically be a singleton because charsets are cached in Charset.forName.
>> IMHO this is a somewhat cleaner approach than having lots of static imports
>> to utility classes with lots of constants in them.
>> On Oct 28, 2012 5:50 PM, "David Medinets" <[email protected]>
>> wrote:
>>
>> > https://issues.apache.org/jira/browse/ACCUMULO-241?focusedCommentId=13449680&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13449680
>> >
>> > In this comment, John mentioned that all getBytes() method calls
>> > should be changed to use UTF8. There are about 1,800 getBytes() calls
>> > and not all of them involve String objects. I am working on ways to
>> > identify a subset of these calls to change.
>> >
>> > I have created https://issues.apache.org/jira/browse/ACCUMULO-836 to
>> > track this issue.
>> >
>> > Should we create one static Charset object?
>> >
>> >   class AccumuloDefaultCharset {
>> >     public static Charset UTF8 = Charset.forName("UTF8");
>> >   }
>> >
>> > Should we use a static constant?
>> >
>> >   public static String UTF8 = "UTF8";
>> >
>> > I have found one instance of getBytes() in InputFormatBase:
>> >
>> >   protected static byte[] getPassword(Configuration conf) {
>> >     return Base64.decodeBase64(conf.get(PASSWORD, "").getBytes());
>> >   }
>> >
>> > Are there any reasons why I can't start specifying the charset? Is
>> > UTF8 the right Charset to use? I am not an expert in non-English
>> > charsets, so guidance would be welcome.
>> >
