On Mon, Oct 29, 2012 at 3:18 PM, John Vines <vi...@apache.org> wrote:
> So perhaps we should have ISO-8859-1 as the standard. Mike- do you see any
> reason to use something beside ISO-8859-1 for the encodings?

I object and caution against *any* plan that involves transcoding from X
to UTF-16 and back when the data is not always going to be valid bytes of
encoding X. The only clean solution here is to have an API entirely in
terms of bytes, and either let the user call getBytes if they want to store
string data, or provide an additional API.

>
> John
>
> On Mon, Oct 29, 2012 at 3:14 PM, Michael Flester <fles...@gmail.com> wrote:
>
>> > UTF-8 should always be present (according to the JLS), and as a multi-byte
>> > format should be able to encode any character that you would need to.
>> >
>>
>> UTF-8 cannot encode arbitrary data. All data that we store in accumulo
>> is not characters. A safe encoding to use as a pass through when you
>> don't know if you are dealing with characters is ISO-8859-1 since we know
>> that we can make the round trip from bytes to string to bytes without loss.
>>
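
For anyone following along, here is a minimal sketch of the round-trip
behaviour being discussed. It is not Accumulo code; the class name
ByteRoundTrip, the helper roundTrip, and the sample bytes are made up for
illustration. It decodes a byte array that is not valid UTF-8 into a String
and encodes it back, once with UTF-8 and once with ISO-8859-1, and compares
the result with the input.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of any Accumulo API.
final class ByteRoundTrip {

    // Decode bytes to a String with the given charset, then encode back.
    static byte[] roundTrip(byte[] data, Charset charset) {
        return new String(data, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        // Arbitrary binary data: 0xFF and 0xFE never appear in valid UTF-8.
        byte[] raw = {(byte) 0xFF, (byte) 0xFE, 0x00, 0x41};

        boolean latin1Lossless =
            Arrays.equals(raw, roundTrip(raw, StandardCharsets.ISO_8859_1));
        boolean utf8Lossless =
            Arrays.equals(raw, roundTrip(raw, StandardCharsets.UTF_8));

        System.out.println("ISO-8859-1 round trip lossless: " + latin1Lossless);
        System.out.println("UTF-8 round trip lossless: " + utf8Lossless);
    }
}
```

With these bytes the UTF-8 decode substitutes U+FFFD for the malformed
input, so the bytes that come back differ from the ones that went in, while
ISO-8859-1 maps every byte value to one char and back. An API that takes and
returns byte[] avoids the question entirely.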