On 29/01/12 21:40, Henry Story wrote:
    It would be better of course if the structure passed could be one that was
not changeable, even better, if it could use NIO bytes buffers as that reduces
the need even to copy data, but I guess that the Jena parsers were not written
with that in mind.

This bit, I didn't follow.

Parsing, in general, needs a char stream and, for Turtle, one-char lookahead.
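To illustrate what one-char lookahead means here, a minimal sketch using the standard java.io.PushbackReader. This is not how RIOT does it (RIOT has its own peek classes); the class and method names below are made up for the example.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.UncheckedIOException;

/** Hypothetical helper for illustration only; not Jena/RIOT code. */
final class OneCharLookahead {

    /** Returns the next char without consuming it, or -1 at end of input. */
    static int peek(PushbackReader in) {
        try {
            int c = in.read();
            if (c != -1) {
                in.unread(c); // push it back so the next read() sees it again
            }
            return c;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads a run of ASCII digits, leaving the char that ends the run unread. */
    static String readDigits(PushbackReader in) {
        StringBuilder sb = new StringBuilder();
        try {
            while (true) {
                int c = peek(in);
                if (c < '0' || c > '9') {
                    break; // delimiter (or end of input) stays in the stream
                }
                sb.append((char) in.read());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

The point is that the tokenizer can decide where a token ends by looking at the next char, without taking that char away from whatever reads the following token.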

The parsers work from InputStreams. The RIOT parsers work from Tokenizers, which normally work from InputStreams, but that is changeable as it's Jena code.

An InputStream is just an interface and a bit of machinery (AKA a trait) - it can be implemented over NIO buffers, so a zero-copy design is quite possible.
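A minimal sketch of such an adapter, assuming a plain java.nio.ByteBuffer as the source. The class name ByteBufferInputStream is hypothetical; it is not a Jena class.

```java
import java.io.InputStream;
import java.nio.ByteBuffer;

/** Hypothetical adapter (not part of Jena): reads directly from a ByteBuffer. */
final class ByteBufferInputStream extends InputStream {
    private final ByteBuffer buffer;

    ByteBufferInputStream(ByteBuffer buffer) {
        // duplicate() shares the underlying bytes but has its own position and
        // limit, so the caller's buffer is not disturbed and nothing is copied.
        this.buffer = buffer.duplicate();
    }

    @Override
    public int read() {
        // Mask to 0..255 as the InputStream contract requires; -1 means end.
        return buffer.hasRemaining() ? (buffer.get() & 0xFF) : -1;
    }

    @Override
    public int read(byte[] dest, int off, int len) {
        if (len == 0) {
            return 0;
        }
        if (!buffer.hasRemaining()) {
            return -1;
        }
        int n = Math.min(len, buffer.remaining());
        buffer.get(dest, off, n); // bulk transfer into the caller's array
        return n;
    }

    @Override
    public int available() {
        return buffer.remaining();
    }
}
```

Anything that accepts an InputStream could then be handed `new ByteBufferInputStream(buf)`, where `buf` might be, for example, a memory-mapped file.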

RIOT has PeekInputStream which could be adapted to get bytes from an NIO buffer.

My experience is that accessing an NIO buffer byte-by-byte needs a little care - it may not be very cheap as several checks are always done and, while the JIT is good, the per-byte cost can be significant. It might be better to read out chunks (RIOT's InputStreamBuffered). It would still be zero-copy overall - no complete copy of the source is taken.
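A rough sketch of the chunked approach: bulk-read a small block out of the ByteBuffer, then serve single bytes from a plain array. This is an illustration only, not the code of RIOT's InputStreamBuffered; the class name and the chunk size parameter are assumptions.

```java
import java.nio.ByteBuffer;

/** Hypothetical chunked reader over a ByteBuffer (illustration, not RIOT code). */
final class ChunkedByteReader {
    private final ByteBuffer source;
    private final byte[] chunk;
    private int pos = 0; // index of the next unread byte in chunk
    private int len = 0; // number of valid bytes currently in chunk

    ChunkedByteReader(ByteBuffer source, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be >= 1");
        }
        this.source = source.duplicate(); // independent position, shared bytes
        this.chunk = new byte[chunkSize];
    }

    /** Returns the next byte as 0..255, or -1 at end of input. */
    int read() {
        if (pos >= len && !fill()) {
            return -1;
        }
        return chunk[pos++] & 0xFF; // plain array access on the hot path
    }

    /** Returns the next byte without consuming it, or -1 at end of input. */
    int peek() {
        if (pos >= len && !fill()) {
            return -1;
        }
        return chunk[pos] & 0xFF;
    }

    /** One bulk get() per chunk instead of one buffer access per byte. */
    private boolean fill() {
        if (!source.hasRemaining()) {
            return false;
        }
        len = Math.min(chunk.length, source.remaining());
        source.get(chunk, 0, len);
        pos = 0;
        return true;
    }
}
```

Only one chunk of the source is held in the array at any time, so the memory cost is the chunk size, not the size of the input.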

Copying is not always bad - I have tried to do faster-than-standard-Java conversion of UTF-8 bytes to chars in pure code, no copy, but the built-in decoder (which is probably native code) is still a few percent better despite the fact that it introduces a copy. CharsetDecoders work on ByteBuffers. I don't think it's possible in Java to avoid a copy at the point of bytes->chars.
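For reference, a minimal sketch of driving the built-in decoder straight from a ByteBuffer. The copy mentioned above is the decode() step, which writes chars into a separate CharBuffer. The class name and the small char chunk size are assumptions for the example, and replacing malformed input (rather than reporting it) is a choice made here only to keep the sketch short.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (illustration only): UTF-8 bytes to chars via CharsetDecoder. */
final class Utf8Decode {

    static String decode(ByteBuffer bytes, int charChunk) {
        if (charChunk < 2) {
            // A supplementary character needs room for two chars at once.
            throw new IllegalArgumentException("charChunk must be >= 2");
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        ByteBuffer in = bytes.duplicate();               // leave caller's position alone
        CharBuffer chars = CharBuffer.allocate(charChunk); // small reusable output buffer
        StringBuilder out = new StringBuilder();

        CoderResult result;
        do {
            // This is where the bytes->chars copy happens.
            result = decoder.decode(in, chars, true);
            drain(chars, out);
        } while (result.isOverflow()); // overflow: output buffer was full, go again
        do {
            result = decoder.flush(chars);
            drain(chars, out);
        } while (result.isOverflow());
        return out.toString();
    }

    /** Moves the decoded chars into the builder and resets the buffer for reuse. */
    private static void drain(CharBuffer chars, StringBuilder out) {
        chars.flip();
        out.append(chars);
        chars.clear();
    }
}
```

A real tokenizer would consume the CharBuffer directly rather than build a String; the StringBuilder is only there to make the result easy to inspect.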

        Andy
