I'm saying that I don't know of anything in the core API which performs a getBytes() on the data itself. Accumulo itself is agnostic, dealing only in byte[]. I think we're saying the same thing.

On 10/29/2012 8:54 PM, Benson Margulies wrote:
On Mon, Oct 29, 2012 at 8:46 PM, Josh Elser <[email protected]> wrote:
+1 Mike.

1. It would be hard for me to believe Key/Value are ever handled internally
in terms of Strings, but, if such a case does exist, it would be extremely
prudent to fix.

2. FWIW, the Shell does use ISO-8859-1 as its charset which is referenced by
other commands [1,2]. It would be good to double check all of the other
commands.

I'm a bit lost. Any possible Java String can be rendered in UTF-8. So,
if you are calling String.getBytes to turn a string into some bytes
for some purpose, I think you need UTF-8.

On the other hand, as Mike pointed out, new String(somebytes, "utf-8")
will destroy data for some byte values that are not, in fact, UTF-8.
But why would Accumulo ever need to string-ify some array of bytes of
uncertain parentage?
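
A minimal sketch of the lossy-decoding point above. The class name CharsetRoundTrip, the helper roundTrip, and the sample bytes are hypothetical and not part of Accumulo. It decodes an arbitrary byte[] into a String and encodes it back, once with UTF-8 and once with ISO-8859-1. The sample bytes are deliberately not valid UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not taken from the Accumulo code base.
class CharsetRoundTrip {

    // Decode raw bytes into a String with the given charset, then encode back.
    static byte[] roundTrip(byte[] raw, Charset charset) {
        return new String(raw, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        // 0xC3 followed by 0x28 is a malformed UTF-8 sequence, and 0xFF
        // never appears in valid UTF-8.
        byte[] raw = {(byte) 0xC3, (byte) 0x28, (byte) 0xFF};

        // UTF-8 decoding substitutes U+FFFD for malformed input, so the
        // original bytes cannot be recovered.
        boolean utf8Preserved =
            Arrays.equals(raw, roundTrip(raw, StandardCharsets.UTF_8));

        // ISO-8859-1 maps every byte value to exactly one char and back.
        boolean latin1Preserved =
            Arrays.equals(raw, roundTrip(raw, StandardCharsets.ISO_8859_1));

        System.out.println("UTF-8 round trip preserved bytes: " + utf8Preserved);
        System.out.println("ISO-8859-1 round trip preserved bytes: " + latin1Preserved);
    }
}
```

The ISO-8859-1 round trip is lossless only because each byte maps to a single char. It does not make the resulting String meaningful text, which is one reason to keep keys and values as byte[] throughout.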



[1]
https://github.com/apache/accumulo/blob/trunk/core/src/main/java/org/apache/accumulo/core/util/shell/Shell.java
[2]
https://github.com/apache/accumulo/blob/trunk/core/src/main/java/org/apache/accumulo/core/util/shell/commands/InsertCommand.java


On 10/29/2012 8:27 PM, Michael Flester wrote:

I agree with Benson entirely with one caveat. It seems to me that there
might be two categories of things being discussed:

    1. User data (keys and values)
    2. Ancillary things needed for operation of Accumulo (passwords).

These could well be considered separately. Trying to do anything with
keys and values other than treating them as bytes all of the time
I find quite scary.

And if this is only being done to satisfy pmd or findbugs, those tools
can be convinced to modify their reporting about this issue.

