On 29 Jan 2012, at 23:25, Andy Seaborne wrote:

> On 29/01/12 21:40, Henry Story wrote:
>>    It would be better of course if the structure passed could be one that
>> was not changeable, even better, if it could use NIO byte buffers as that
>> reduces the need even to copy data, but I guess that the Jena parsers were
>> not written with that in mind.
> 
> This bit, I didn't follow.

I just discovered this, which you should find very interesting
   http://akka.io/docs/akka/2.0-M3/scala/io.html


> 
> Parsing, in general, needs a char stream and, for Turtle, one-char lookahead.
> 
> The parsers work from InputStreams.  The RIOT parsers work from Tokenizers,
> which normally work from InputStreams but it's changeable as it's Jena code.
> 
> An InputStream is just an interface and a bit of machinery (AKA a trait) - it
> can be implemented over NIO buffers so a zero-copy design is quite possible.
> 
> RIOT has PeekInputStream which could be adapted to get bytes from an NIO 
> buffer.
> 
> My experience is that accessing an NIO buffer byte-by-byte needs a little
> care - it may not be very cheap as several checks are always done and, while
> the JIT is good, the per-byte cost can be significant. It might be
> better to read out chunks (RIOT's InputStreamBuffered).  It would still be
> zero-copy overall - no complete copy of the source taken.
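
A rough sketch of the chunked approach, as I understand it: one bulk get() per
chunk, then bytes are served from a plain array, with a one-byte peek for the
tokenizer. ChunkedByteSource is a made-up name and this is not how RIOT's
InputStreamBuffered is actually written; it only illustrates the idea.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not RIOT code: pulls bytes from the buffer in chunks
// so the per-byte path is a plain array access.
final class ChunkedByteSource {
    private final ByteBuffer buf;
    private final byte[] chunk;
    private int pos = 0;   // next index to serve from chunk
    private int len = 0;   // number of valid bytes in chunk

    ChunkedByteSource(ByteBuffer source, int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be > 0");
        this.buf = source.asReadOnlyBuffer();
        this.chunk = new byte[chunkSize];
    }

    // Refill the chunk with one bulk get; returns false at end of input.
    private boolean fill() {
        if (!buf.hasRemaining()) return false;
        len = Math.min(chunk.length, buf.remaining());
        buf.get(chunk, 0, len);
        pos = 0;
        return true;
    }

    // Next byte as 0..255 without consuming it, or -1 at end of input.
    int peek() {
        if (pos >= len && !fill()) return -1;
        return chunk[pos] & 0xFF;
    }

    // Next byte as 0..255, consuming it, or -1 at end of input.
    int read() {
        int b = peek();
        if (b >= 0) pos++;
        return b;
    }
}
```

Only one chunk of the source is held in the array at any time, so there is
still no complete copy of the input.
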
> 
> Copying is not always bad - I have tried to do faster-than-std-java
> conversion of UTF-8 bytes to chars in pure code, no copy, but the built-in
> decoder (which is probably native code) is still a few-% better despite the
> fact it introduces a copy.  CharsetDecoders work on ByteBuffers.  I don't
> think it's possible in Java to avoid a copy at the point of bytes->chars.
> 
>       Andy
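
For reference, a minimal sketch of the bytes->chars step using the built-in
CharsetDecoder directly on a ByteBuffer. The class and method names
(Utf8Decode, decode) and the charChunk parameter are mine, for illustration
only. The copy mentioned above is visible here: decoded chars land in a
CharBuffer, one chunk at a time.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not Jena code.
final class Utf8Decode {
    static String decode(ByteBuffer bytes, int charChunk) {
        // A surrogate pair needs room for two chars in the output buffer.
        if (charChunk < 2) throw new IllegalArgumentException("charChunk must be >= 2");
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer in = bytes.asReadOnlyBuffer();   // leave the caller's position alone
        CharBuffer chars = CharBuffer.allocate(charChunk);
        StringBuilder out = new StringBuilder();

        CoderResult r;
        do {
            // endOfInput = true: the whole input is already in the buffer.
            r = decoder.decode(in, chars, true);
            if (r.isError()) throw new IllegalArgumentException("bad UTF-8 input: " + r);
            drain(chars, out);
        } while (r.isOverflow());   // overflow means the char chunk filled up

        do {
            r = decoder.flush(chars);
            drain(chars, out);
        } while (r.isOverflow());
        return out.toString();
    }

    // Move the decoded chars out and make the chunk reusable.
    private static void drain(CharBuffer chars, StringBuilder out) {
        chars.flip();
        out.append(chars);
        chars.clear();
    }
}
```

A real tokenizer would consume each char chunk as it is produced instead of
building a String; the StringBuilder is only there to keep the sketch short.
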

Social Web Architect
http://bblfish.net/
