On 21/03/12 14:27, Andy Seaborne wrote:
On 21/03/12 14:20, Sam Tunnicliffe wrote:
It seems that NodeLib, or rather the NodecSSE it uses under the covers,
has problems round-tripping certain strings. How the data gets into
the larger system is still an issue, but somewhat orthogonal here.

String s = "Hello \uDAE0 World";         // contains a lone high surrogate
Node literal = Node.createLiteral(s);
ByteBuffer bb = NodeLib.encode(literal);
NodeLib.decode(bb);                      // fails here

blows up during the decode. Looking at the stack traces, this seems to
be what causes the problems in committing the transaction. Should
we expect NodeLib's encode() and decode() to be symmetrical?

Yes.

What range of Unicode code points does it fail on?

(The leading D means the high bits of the 16-bit value are set: 0xDAE0 lies in the surrogate range 0xD800 to 0xDFFF.)

It is part of a surrogate pair but not a complete pair.

Surrogates come in (high, low) pairs. \uDAE0 is a high surrogate, but there is no low surrogate after it. If I add a low surrogate, it seems to work for me:

String s = "\uDAE0\uDC00";

See the Unicode Standard, chapter 3, section 3.8.
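To make the pairing rule concrete, here is a minimal sketch using only java.lang.Character (not Jena). The helper name hasUnpairedSurrogate is hypothetical, not part of NodeLib or NodecSSE; it flags a high surrogate that is not immediately followed by a low one, or a low surrogate with no high one before it. A caller could use something like it to reject such strings before they reach the encoder.

```java
public class SurrogateCheck {

    /**
     * Hypothetical helper: true if s contains a high surrogate not followed
     * by a low surrogate, or a low surrogate not preceded by a high one.
     */
    static boolean hasUnpairedSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A high surrogate must be immediately followed by a low surrogate.
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the low half of a valid pair
                } else {
                    return true;
                }
            } else if (Character.isLowSurrogate(c)) {
                // A low surrogate with no preceding high surrogate.
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String broken = "Hello \uDAE0 World"; // lone high surrogate
        String paired = "\uDAE0\uDC00";       // complete (high, low) pair

        System.out.println(hasUnpairedSurrogate(broken)); // true
        System.out.println(hasUnpairedSurrogate(paired)); // false

        // A valid pair denotes one supplementary code point.
        int cp = Character.toCodePoint('\uDAE0', '\uDC00');
        System.out.printf("U+%X%n", cp); // U+C8000
    }
}
```

The pair in the example maps to U+C8000, i.e. 0x10000 + (0xDAE0 - 0xD800) * 0x400 + (0xDC00 - 0xDC00).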

I think the exception occurs because the UTF-8 decoder (from the Java library) aborts and reports "end of file".
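The asymmetry can be reproduced with the JDK's own UTF-8 charset, without Jena. This is a sketch under an assumption: it is not known from this thread whether NodecSSE goes through a lenient or a strict path, and the "end of file" message above is not reproduced here. The class and method names are hypothetical. The lenient path (String.getBytes) silently substitutes '?' for the lone surrogate, so the string does not survive a round trip; the strict path (a CharsetEncoder set to REPORT) throws instead.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {

    /** Lenient path: getBytes replaces malformed input with the charset's replacement ('?'). */
    static boolean lenientRoundTrips(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8).equals(s);
    }

    /** Strict path: an encoder set to REPORT throws on malformed input instead of substituting. */
    static boolean strictEncodes(String s) {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            enc.encode(CharBuffer.wrap(s));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String broken = "Hello \uDAE0 World";       // lone high surrogate
        String paired = "Hello \uDAE0\uDC00 World"; // complete pair
        System.out.println(lenientRoundTrips(broken)); // false: became "Hello ? World"
        System.out.println(lenientRoundTrips(paired)); // true
        System.out.println(strictEncodes(broken));     // false
        System.out.println(strictEncodes(paired));     // true
    }
}
```

Either way, a lone surrogate cannot be represented in well-formed UTF-8, so no UTF-8-based encoding can give a symmetrical encode/decode for such a string.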

        Andy



Andy


Cheers,
Sam
