Re: \n in literals and N-Triples|N-Quads|Turtle files...

Andy Seaborne Wed, 21 Mar 2012 07:43:28 -0700

On 21/03/12 14:27, Andy Seaborne wrote:

On 21/03/12 14:20, Sam Tunnicliffe wrote:

It seems that NodeLib, or rather the NodecSSE it uses under the covers,
has problems round-tripping certain strings. How the data gets into
the larger system is still an issue, but somewhat orthogonal here.


String s = "Hello \uDAE0 World";
Node literal = Node.createLiteral(s);
ByteBuffer bb = NodeLib.encode(literal);
NodeLib.decode(bb);

blows up during the decode. Looking at the stack traces, this seems to
be what causes the problems in committing the transaction. Should
we expect NodeLib's encode() and decode() to be symmetrical?


Yes.

What range of unicode codepoints does it fail on?

(A leading D means the high bit is set in the 16-bit value.)


It is part of a surrogate pair but not a complete pair.

Surrogates come in (high, low) pairs. This is a high surrogate but there is no low surrogate. If I add a low surrogate, it seems to work for me.


String s = "\uDAE0\uDC00";

See the Unicode Standard, chapter 3, section 3.8 (Surrogates).
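To find out whether a string contains a half pair like the one above, something along these lines could be used before handing the string to NodeLib. This is only a sketch using the standard java.lang.Character methods; the class and method names (SurrogateCheck, firstUnpairedSurrogate) are made up for illustration and are not part of Jena or TDB.

```java
class SurrogateCheck {
    /**
     * Returns the index of the first unpaired surrogate in s,
     * or -1 if every surrogate is part of a (high, low) pair.
     * Hypothetical helper, not a Jena API.
     */
    static int firstUnpairedSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A high surrogate must be followed immediately by a low one.
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the low half of a well-formed pair
                } else {
                    return i;
                }
            } else if (Character.isLowSurrogate(c)) {
                // A low surrogate with no preceding high surrogate.
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstUnpairedSurrogate("Hello \uDAE0 World"));
        System.out.println(firstUnpairedSurrogate("\uDAE0\uDC00"));
    }
}
```

The first string is the one from the failing test case; the second is the variant with a low surrogate added.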

I think the exception occurs because the UTF-8 decoder (from the Java library) aborts and says "end of file".
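The JDK's own UTF-8 codec can be made to report this kind of input rather than handle it quietly. The sketch below sits on the encoding side and does not reproduce the TDB decode path or its "end of file" error; it only shows that a strict java.nio.charset encoder refuses a lone surrogate. StrictUtf8 and encodesCleanly are hypothetical names, not anything in NodeLib.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictUtf8 {
    /**
     * Returns true if s can be encoded as UTF-8 without any
     * replacement characters. Hypothetical helper, not a Jena API.
     */
    static boolean encodesCleanly(String s) {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder()
            // REPORT makes the encoder throw instead of substituting '?'.
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            enc.encode(CharBuffer.wrap(s));
            return true;
        } catch (CharacterCodingException e) {
            // An unpaired surrogate is reported as malformed input.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(encodesCleanly("Hello \uDAE0 World"));
        System.out.println(encodesCleanly("\uDAE0\uDC00"));
    }
}
```

A check like this could be run on a literal's lexical form before it is encoded, so the bad string is rejected at the point it enters the system rather than at decode time.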


        Andy


Andy


Cheers,
Sam
