Re: Support for Non Blocking Parsers

Andy Seaborne Sun, 29 Jan 2012 14:25:37 -0800

On 29/01/12 21:40, Henry Story wrote:

    It would be better of course if the structure passed could be one that
    was not changeable, even better, if it could use NIO byte buffers as
    that reduces the need even to copy data, but I guess that the Jena
    parsers were not written with that in mind.


This bit, I didn't follow.

Parsing, in general, needs a char stream and, for Turtle, one-char lookahead.
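
To make "one-char lookahead" concrete, here is a minimal sketch built on the standard java.io.PushbackReader. The class name PeekChars is hypothetical and is not a Jena or RIOT type; it only illustrates looking at the next char without consuming it.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;

// Hypothetical helper for illustration only; not part of Jena or RIOT.
final class PeekChars {
    private final PushbackReader in;

    PeekChars(Reader source) {
        // A pushback buffer of size 1 is enough for one-char lookahead.
        this.in = new PushbackReader(source, 1);
    }

    /** Returns the next char without consuming it, or -1 at end of input. */
    int peek() throws IOException {
        int c = in.read();
        if (c != -1) {
            in.unread(c); // put it back so the next read() sees it again
        }
        return c;
    }

    /** Consumes and returns the next char, or -1 at end of input. */
    int read() throws IOException {
        return in.read();
    }
}
```

A tokenizer would call peek() to decide which token rule applies, then read() to consume the char.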

The parsers work from InputStreams. The RIOT parsers work from Tokenizers, which normally work from InputStreams, but that is changeable as it's Jena code.

An InputStream is just an interface and a bit of machinery (AKA a trait) - it can be implemented over NIO buffers, so a zero-copy design is quite possible.
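
As a rough sketch of that idea, an InputStream can be backed directly by a java.nio.ByteBuffer. The class name ByteBufferInputStream is an assumption made for this example, not an existing Jena or RIOT class.

```java
import java.io.InputStream;
import java.nio.ByteBuffer;

// Hypothetical adapter for illustration; the name is not a Jena/RIOT type.
final class ByteBufferInputStream extends InputStream {
    private final ByteBuffer buf;

    ByteBufferInputStream(ByteBuffer source) {
        // duplicate() shares the underlying bytes (no copy) but gives this
        // stream its own position and limit, so the caller's buffer is untouched.
        this.buf = source.duplicate();
    }

    @Override
    public int read() {
        // Mask to 0..255 as the InputStream contract requires.
        return buf.hasRemaining() ? (buf.get() & 0xFF) : -1;
    }

    @Override
    public int read(byte[] dst, int off, int len) {
        if (len == 0) {
            return 0;
        }
        if (!buf.hasRemaining()) {
            return -1;
        }
        int n = Math.min(len, buf.remaining());
        buf.get(dst, off, n); // bulk transfer into the caller's array
        return n;
    }

    @Override
    public int available() {
        return buf.remaining();
    }
}
```

The source is never copied as a whole; only the bytes a caller asks for are moved into the caller's own array.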

RIOT has PeekInputStream, which could be adapted to get bytes from an NIO buffer.

My experience is that accessing an NIO buffer byte-by-byte needs a little care - it may not be very cheap, as several checks are always done and, while the JIT is good, the per-byte cost can be significant. It might be better to read out chunks (RIOT's InputStreamBuffered). It would still be zero-copy overall - no complete copy of the source is taken.
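
A minimal sketch of the chunked approach, under the assumption that the work per byte is trivial (here, counting newlines). It does not use RIOT's InputStreamBuffered; the class and method names are hypothetical. The point is one bulk get() per chunk, followed by plain array access in the inner loop.

```java
import java.nio.ByteBuffer;

// Hypothetical example; names are illustrative and not from RIOT.
final class ChunkedScan {
    /** Counts '\n' bytes, draining the buffer chunkSize bytes at a time. */
    static long countNewlines(ByteBuffer source, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        ByteBuffer buf = source.duplicate(); // leave the caller's position alone
        byte[] chunk = new byte[chunkSize];  // small reusable scratch array
        long count = 0;
        while (buf.hasRemaining()) {
            int n = Math.min(chunk.length, buf.remaining());
            buf.get(chunk, 0, n);            // one bulk get per chunk
            for (int i = 0; i < n; i++) {    // per-byte work on a plain array
                if (chunk[i] == '\n') {
                    count++;
                }
            }
        }
        return count;
    }
}
```

Only one chunk-sized array is ever allocated, so the source as a whole is still never copied. Whether this beats byte-by-byte get() on a given JVM is something to measure rather than assume.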

Copying is not always bad - I have tried to do faster-than-std-java conversion of UTF-8 bytes to chars in pure code, no copy, but the built-in decoder (which is probably native code) is still a few % better despite the fact it introduces a copy. CharsetDecoders work on ByteBuffers. I don't think it's possible in Java to avoid a copy at the point of bytes->chars.
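
For reference, this is roughly what driving the built-in decoder from a ByteBuffer looks like. It is a sketch using the standard java.nio.charset API; the class name Utf8Decode and the charChunk parameter are assumptions for the example. The decode into the CharBuffer is the copy at the bytes->chars point.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical example class; only the java.nio calls are real API.
final class Utf8Decode {
    /** Decodes all remaining bytes as UTF-8, charChunk chars at a time. */
    static String decode(ByteBuffer bytes, int charChunk) throws CharacterCodingException {
        if (charChunk < 2) {
            // A surrogate pair needs two chars of output space.
            throw new IllegalArgumentException("charChunk must be at least 2");
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer in = bytes.duplicate();
        CharBuffer out = CharBuffer.allocate(charChunk);
        StringBuilder sb = new StringBuilder();

        CoderResult r;
        do {
            // endOfInput = true: 'in' holds the complete input.
            r = decoder.decode(in, out, true);
            if (r.isError()) {
                r.throwException(); // malformed or unmappable input
            }
            out.flip();
            sb.append(out);         // drain the chars decoded so far
            out.clear();
        } while (r.isOverflow());   // overflow means more input is pending

        do {
            r = decoder.flush(out); // emit any state held by the decoder
            out.flip();
            sb.append(out);
            out.clear();
        } while (r.isOverflow());

        return sb.toString();
    }
}
```

In a streaming parser the chars would be handed to the tokenizer chunk by chunk instead of being collected into a String.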


        Andy
