Re: some regex vs std.ascii vs handcode times

Juan Manuel Cabo Tue, 20 Mar 2012 22:10:01 -0700

On Monday, 19 March 2012 at 17:23:36 UTC, Andrei Alexandrescu wrote:


[.....]

I wanted for a long time to improve byLine by allowing it to do its own buffering. That means once you used byLine it's not possible to stop it, get back to the original File, and continue reading it. Using byLine is a commitment. This is what most uses of it do anyway.


Great!! Perhaps we don't have to choose. We may have both!!
Allow me to suggest:

      byLineBuffered(bufferSize, keepTerminator);
or    byLineOnly(bufferSize, keepTerminator);
or    byLineChunked(bufferSize, keepTerminator);
or    byLineFastAndDangerous :-) hahah :-)

Or the other way around:

      byLine(keepTerminator, underlyingBufferSize);
renaming the current one to:
      byLineUnbuffered(keepTerminator);

Other ideas (I think I read them somewhere about
this same byLine topic):
  * I think it'd be cool if 'line' could be a slice of the
underlying buffer when possible if buffering is added.
  * Another good idea would be a new argument, maxLineLength,
so that one can avoid reading and allocating the whole
file into a big line string if there are no newlines
in the file, and one knows the max length desired.
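To make the two ideas above a bit more concrete, here is a rough sketch in D. It is only an illustration under assumptions: `eachLine` and `appendCapped` are hypothetical names, not part of Phobos and not the proposed byLine change itself. It takes any range of chunks (for example what `File.byChunk` yields), hands out each line as a slice of the current chunk when the line fits inside it, and copies only the lines that straddle a chunk boundary, truncating everything to `maxLineLength`.

```d
import std.stdio : File;

/// Hypothetical helper: append `piece` to `carry`, but never let `carry`
/// grow beyond `maxLen` bytes (the excess is silently dropped).
private void appendCapped(ref char[] carry, const(char)[] piece, size_t maxLen)
{
    if (carry.length >= maxLen)
        return;
    immutable room = maxLen - carry.length;
    carry ~= piece.length > room ? piece[0 .. room] : piece;
}

/// Hypothetical sketch, not a Phobos API. Calls `sink` once per line,
/// without the '\n' terminator. The slice passed to `sink` is only valid
/// until `sink` returns, because the chunk buffer may be reused.
void eachLine(R)(R chunks, size_t maxLineLength,
                 scope void delegate(const(char)[] line) sink)
{
    char[] carry;          // partial line left over from earlier chunks
    bool pending = false;  // true while an unterminated line sits in carry

    foreach (chunk; chunks)
    {
        auto text = cast(const(char)[]) chunk;
        size_t start = 0;

        foreach (i, c; text)
        {
            if (c != '\n')
                continue;
            auto piece = text[start .. i];
            if (pending)
            {
                // The line began in an earlier chunk: finish the copy.
                appendCapped(carry, piece, maxLineLength);
                sink(carry);
                carry.length = 0;
                pending = false;
            }
            else
            {
                // The line lies entirely inside this chunk: no copy.
                sink(piece.length > maxLineLength
                     ? piece[0 .. maxLineLength] : piece);
            }
            start = i + 1;
        }

        // Keep the unterminated tail for the next chunk.
        if (start < text.length)
        {
            appendCapped(carry, text[start .. $], maxLineLength);
            pending = true;
        }
    }

    // Last line of the input had no trailing newline.
    if (pending)
        sink(carry);
}
```

With a real file this could be called as `eachLine(File("input.txt").byChunk(1024 * 1024), 4096, (const(char)[] line) { /* use line */ });`, where the file name and both sizes are placeholders. Anyone who needs to keep a line after the callback returns would have to `.dup` it.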

--jm

Ok, this was the good surprise. Reading by chunks was faster than
reading the whole file, by several ms.
What may be at work here is cache effects. Reusing the same 1MB may place it in faster cache memory, whereas reading 20MB at once may spill into slower memory.
Andrei
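
For anyone who wants to try the comparison described above, here is a minimal sketch in D of the two reading strategies. The function names are hypothetical and this is not the benchmark code used in the thread; counting newlines just stands in for whatever per-byte work is done. `countByChunks` goes through `File.byChunk`, which reuses one buffer of `chunkSize` bytes, while `countWholeFile` loads everything with `std.file.read` first.

```d
import std.file : read;
import std.stdio : File;

/// Count '\n' bytes in a block of data (stand-in for the real work).
size_t countNewlines(const(ubyte)[] data)
{
    size_t n;
    foreach (b; data)
        if (b == '\n')
            ++n;
    return n;
}

/// Chunked variant: the same buffer of `chunkSize` bytes is refilled
/// for every chunk, so the working set stays small.
size_t countByChunks(string path, size_t chunkSize = 1024 * 1024)
{
    size_t n;
    foreach (chunk; File(path).byChunk(chunkSize))
        n += countNewlines(chunk);
    return n;
}

/// Whole-file variant: one allocation as large as the file itself.
size_t countWholeFile(string path)
{
    return countNewlines(cast(const(ubyte)[]) read(path));
}
```

Both functions should return the same count for the same file; any timing difference between them will depend on the machine, the file size and the chunk size, so it is worth measuring rather than assuming.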
