On 10/14/10 12:56 CDT, Denis Koroskin wrote:
> appendDelim *requires* buffering to be implemented. No OS provides
> an API for reading from a file (be it a pipe, a socket, whatever) up to
> some abstract delimiter. It *always* reads in blocks.
Clear. What may not be so clear is that read(ubyte[] buf) ALSO requires
buffering. Disk I/O comes in fixed buffer sizes (sometimes aligned at
512 bytes or whatever), so ANY protocol that allows the user to set the
maximum bytes to read will require buffering and copying. So how is
appendDelim worse than read?
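Here is a minimal C sketch of that point. It rests on two assumptions. First, the hypothetical Device type stands in for block-granular I/O that only hands out BLOCK-sized pieces (512 bytes, echoing the figure above). Second, reader_read is a hypothetical C stand-in for read(ubyte[] buf), not the proposed D API. Because the caller chooses the maximum, whatever is left of a block has to be held in an internal buffer and copied out on a later call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK 512 /* assumed device block size */

/* Hypothetical in-memory device that only hands out whole blocks,
   standing in for block-granular disk I/O. */
typedef struct {
    const unsigned char *data; /* backing bytes of the simulated file */
    size_t len;                /* total number of bytes */
    size_t pos;                /* offset of the next block */
} Device;

/* Low-level read: deliver the next block (shorter only at the end). */
static size_t device_read_block(Device *d, unsigned char *blk) {
    size_t n = d->len - d->pos;
    if (n > BLOCK) n = BLOCK;
    memcpy(blk, d->data + d->pos, n);
    d->pos += n;
    return n;
}

/* The reader keeps the current block so leftovers survive between calls. */
typedef struct {
    Device *dev;
    unsigned char buf[BLOCK]; /* internal buffer */
    size_t start, end;        /* unconsumed bytes are buf[start..end) */
} Reader;

/* Hypothetical stand-in for read(ubyte[] buf): copy up to `max` bytes
   into `out`. Returns the number copied; 0 means end of data. */
static size_t reader_read(Reader *r, unsigned char *out, size_t max) {
    size_t total = 0;
    while (total < max) {
        assert(r->start <= r->end);
        if (r->start == r->end) {            /* buffer drained: refill */
            r->start = 0;
            r->end = device_read_block(r->dev, r->buf);
            if (r->end == 0) break;          /* end of data */
        }
        size_t avail = r->end - r->start;
        size_t n = avail < max - total ? avail : max - total;
        memcpy(out + total, r->buf + r->start, n); /* buffer -> caller */
        r->start += n;
        total += n;
    }
    return total;
}
```

Asking a fresh reader for 100 bytes pulls in a whole 512-byte block and leaves 412 bytes buffered for the next call. This sketch keeps filling until `max` is reached or the data runs out, whereas a real read() may return short counts.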
> As such, if you need to read until a delimiter, you need to fetch a block
> into some internal buffer, MANUALLY search through it, and THEN copy it to
> the output string.
And there's no way for the client to efficiently do that.
> I've implemented that on top of a chunked read interface, and it was 5%
> faster than the getline()/getdelim() that GNU libc provides (despite you
> claiming it to be "many times faster"). It's not.
Please post your code.
> Buffering requires an additional level of data copying, and this is bad
> for fast I/O.
Agreed. But then you define routines that also require buffering. How
do you reconcile your own requirement with your own interface?
> If you need fast I/O, you must pull that out of the stream
> interface. Otherwise chunked read will be less efficient due to
> additional copies to and from buffers.
> On the contrary, line-based reading can be implemented on top of
> chunked read without sacrificing even a tiny bit of efficiency.
Except for extra copying.
appendDelim implementation:
1. Low-level read in internal buffers
2. Search for delimiter (assume found for simplicity)
3. Resize user buffer
4. Copy
That's one copy, with the necessary corner cases when the delimiter
isn't found yet etc. (which increase copying ONLY if the buffer is
actually moved when reallocated).
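The four steps might look like the following sketch. As before, Device is a hypothetical in-memory source that delivers fixed-size blocks. append_delim is a hypothetical C rendering of appendDelim and does not reflect its actual D signature. memchr does the search, realloc grows the user buffer, and a single memcpy per scanned run is the only copy out of the internal buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK 512 /* assumed device block size */

/* Hypothetical in-memory device that only hands out whole blocks. */
typedef struct {
    const char *data;
    size_t len, pos;
} Device;

static size_t device_read_block(Device *d, char *blk) {
    size_t n = d->len - d->pos;
    if (n > BLOCK) n = BLOCK;
    memcpy(blk, d->data + d->pos, n);
    d->pos += n;
    return n;
}

typedef struct {
    Device *dev;
    char buf[BLOCK];   /* internal buffer */
    size_t start, end; /* unconsumed bytes are buf[start..end) */
} Stream;

/* Hypothetical C rendering of appendDelim: append bytes up to and
   including `delim` to the growable buffer *line (length *len,
   capacity *cap). Returns 1 if the delimiter was found, 0 if the data
   ended first, -1 if growing the buffer failed. */
static int append_delim(Stream *s, char **line, size_t *len,
                        size_t *cap, char delim) {
    for (;;) {
        assert(s->start <= s->end);
        if (s->start == s->end) {                 /* 1. low-level read */
            s->start = 0;
            s->end = device_read_block(s->dev, s->buf);
            if (s->end == 0) return 0;
        }
        char *p = s->buf + s->start;
        size_t avail = s->end - s->start;
        char *hit = memchr(p, delim, avail);      /* 2. search */
        size_t n = hit ? (size_t)(hit - p) + 1 : avail;
        if (*len + n > *cap) {                    /* 3. resize */
            size_t ncap = *cap ? *cap : 64;
            while (ncap < *len + n) ncap *= 2;
            char *grown = realloc(*line, ncap);
            if (grown == NULL) return -1;
            *line = grown;
            *cap = ncap;
        }
        memcpy(*line + *len, p, n);               /* 4. the one copy */
        *len += n;
        s->start += n;
        if (hit) return 1;
    }
}
```

If the delimiter is not in the current block, the partial run is copied and the loop reads the next block. This is the corner case mentioned above. realloc may move the user buffer, and that move is the only place extra copying can occur.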
The implementation in your message on 10/13/2010 21:20 CDT:
1. Low-level read in internal buffers
2. Copy from internal buffers into the internal buffer provided by your
ByLine implementation
3. Copy from the internal buffer of ByLine into the user-supplied buffer
That's two copies. Agreed?
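For comparison, here is a sketch of the layered design described above. The names (Device, Reader, reader_read, ByLine, byline_next) are all hypothetical, and the CHUNK size is arbitrary; neither comes from the actual ByLine implementation. The global bytes_copied counts bytes moved after the low-level block read, which makes the two copies visible.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK 512 /* assumed device block size */
#define CHUNK 128 /* assumed ByLine chunk size */

/* Payload bytes moved by memcpy after the low-level block read. */
static size_t bytes_copied = 0;

/* Hypothetical in-memory device that only hands out whole blocks. */
typedef struct { const char *data; size_t len, pos; } Device;

static size_t device_read_block(Device *d, char *blk) {
    size_t n = d->len - d->pos;
    if (n > BLOCK) n = BLOCK;
    memcpy(blk, d->data + d->pos, n); /* step 1: not counted */
    d->pos += n;
    return n;
}

/* Chunked read(buf) over an internal block buffer. */
typedef struct { Device *dev; char buf[BLOCK]; size_t start, end; } Reader;

static size_t reader_read(Reader *r, char *out, size_t max) {
    size_t total = 0;
    while (total < max) {
        if (r->start == r->end) {
            r->start = 0;
            r->end = device_read_block(r->dev, r->buf);
            if (r->end == 0) break;
        }
        size_t avail = r->end - r->start;
        size_t n = avail < max - total ? avail : max - total;
        memcpy(out + total, r->buf + r->start, n); /* step 2: first copy */
        bytes_copied += n;
        r->start += n;
        total += n;
    }
    return total;
}

/* Hypothetical ByLine layered on reader_read with its own chunk buffer. */
typedef struct { Reader *rd; char chunk[CHUNK]; size_t start, end; } ByLine;

/* Store the next line (including '\n') in the growable *line.
   Returns its length; 0 means end of data. Aborts if out of memory. */
static size_t byline_next(ByLine *b, char **line, size_t *cap) {
    size_t len = 0;
    for (;;) {
        assert(b->start <= b->end);
        if (b->start == b->end) {
            b->start = 0;
            b->end = reader_read(b->rd, b->chunk, CHUNK);
            if (b->end == 0) return len; /* EOF: possibly a partial line */
        }
        char *p = b->chunk + b->start;
        size_t avail = b->end - b->start;
        char *hit = memchr(p, '\n', avail);
        size_t n = hit ? (size_t)(hit - p) + 1 : avail;
        if (len + n > *cap) {
            size_t ncap = *cap ? *cap : 64;
            while (ncap < len + n) ncap *= 2;
            char *grown = realloc(*line, ncap);
            if (grown == NULL) abort();
            *line = grown;
            *cap = ncap;
        }
        memcpy(*line + len, p, n);               /* step 3: second copy */
        bytes_copied += n;
        len += n;
        b->start += n;
        if (hit) return len;
    }
}
```

Once the input has been read to EOF through byline_next, bytes_copied equals twice the input size. Every byte passes through the chunk buffer and then into the line.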
Andrei