On 21/03/12 14:27, Andy Seaborne wrote:
On 21/03/12 14:20, Sam Tunnicliffe wrote:
It seems that NodeLib, or rather the NodecSSE it uses under the covers,
has problems round-tripping certain strings. How the data gets into
the larger system is still an issue, but somewhat orthogonal here.
String s = "Hello \uDAE0 World";
Node literal = Node.createLiteral(s);
ByteBuffer bb = NodeLib.encode(literal);
NodeLib.decode(bb);
blows up during the decode. Looking at the stack traces, this seems to
be what causes the problems in committing the transaction. Should
we expect NodeLib's encode() and decode() to be symmetrical?
Yes.
What range of unicode codepoints does it fail on?
(D is high bit set in 16 bits).
It is part of a surrogate pair but not a complete pair.
Surrogates come in (high, low) pairs. This is a high surrogate but
there is no low surrogate after it. If I add a low surrogate, it seems
to work for me.
String s = "\uDAE0\uDC00";
Unicode: chapter 3, section 3.8.
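
To make the pairing rule above concrete, here is a minimal sketch that scans a string for a surrogate that is not part of a well-formed (high, low) pair. The class and method names (SurrogateCheck, firstUnpairedSurrogate) are hypothetical and not part of NodeLib or Jena; only the java.lang.Character calls are standard library.

```java
public class SurrogateCheck {

    /**
     * Returns the index of the first surrogate code unit that is not part of
     * a well-formed (high, low) pair, or -1 if the string has none.
     * Hypothetical helper for illustration; not part of NodeLib.
     */
    static int firstUnpairedSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // well-formed pair: skip over the low half
                } else {
                    return i; // high surrogate with no low surrogate after it
                }
            } else if (Character.isLowSurrogate(c)) {
                return i; // low surrogate with no high surrogate before it
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The string from the original report: lone high surrogate at index 6.
        System.out.println(firstUnpairedSurrogate("Hello \uDAE0 World"));
        // The complete pair that round-trips: no unpaired surrogate.
        System.out.println(firstUnpairedSurrogate("\uDAE0\uDC00"));
    }
}
```

A check like this could be run on literals before they are encoded, so that bad input is rejected at the point of entry rather than when it is decoded later.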
I think the exception occurs because the UTF-8 decoder (from the Java
library) aborts and says "end of file".
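
For comparison, here is a rough sketch of a strict UTF-8 round trip using only the standard java.nio.charset classes. It is not how NodecSSE encodes nodes, so it will not reproduce the "end of file" failure; it only shows that the Java library itself treats a lone surrogate as malformed input when the coder is set to report errors. The class and method names (StrictUtf8, encodeStrict, decodeStrict, roundTrips) are made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8 {

    /** Encodes to UTF-8, throwing instead of substituting on malformed input. */
    static ByteBuffer encodeStrict(String s) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(s));
    }

    /** Decodes UTF-8 bytes, throwing instead of substituting on malformed input. */
    static String decodeStrict(ByteBuffer bb) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(bb)
                .toString();
    }

    /** True if s survives a strict encode followed by a strict decode unchanged. */
    static boolean roundTrips(String s) {
        try {
            return s.equals(decodeStrict(encodeStrict(s)));
        } catch (CharacterCodingException e) {
            return false; // e.g. an unpaired surrogate rejected by the encoder
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrips("Hello \uDAE0 World")); // lone high surrogate
        System.out.println(roundTrips("\uDAE0\uDC00"));       // complete pair
    }
}
```

With the coder set to REPORT, the failure surfaces at encode time as a CharacterCodingException rather than later during decode, which is usually the easier place to diagnose it.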
Andy
Andy
Cheers,
Sam