Re: the large file challenge

Scott Raney Sun, 10 Nov 2002 16:03:40 -0800

On Sun, 10 Nov 2002 Richard Gaskin <[EMAIL PROTECTED]> wrote:

> My hunch is that reading for lines is slower than reading a
> specified number of chars, since with lines it needs to evaluate
> each incoming character to determine if it's a return -- Scott, am I
> right or should they be about the same?


You're right, though I wouldn't think it would make *that* much
difference.

As for my guess as to the fastest way to do this, it'd probably be a
hybrid approach, using both "read for x" and "repeat for each line".
You'd start by opening the file for binary read (faster than other
modes).  Then read for X characters, where X would be some large
number experimentally determined for each system (it'd probably be some
large percentage of the free RAM, and so probably on the order of a
few MB), and then use "repeat for each line l in it".
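
A rough illustration of that first read, written in Python rather than
MetaTalk and meant only as a sketch: open the data in binary mode, read
one fixed-size chunk, and walk its lines. The function name
first_chunk_lines and the tiny chunk size are made up for the example,
and lines are assumed to end in a linefeed.

```python
import io

def first_chunk_lines(stream, chunk_size):
    """Read one chunk from a binary stream and split it into lines.

    Returns (complete_lines, partial_tail). The tail is whatever follows
    the last linefeed in the chunk; it is empty only if the chunk happens
    to end exactly on a line boundary.
    """
    chunk = stream.read(chunk_size)
    pieces = chunk.split(b"\n")
    # Everything before the final linefeed is a complete line; the last
    # piece is a line cut off by the chunk boundary (or b"").
    return pieces[:-1], pieces[-1]

# Demo on an in-memory "file" so the sketch is self-contained.
demo = io.BytesIO(b"alpha\nbeta\ngamma\n")
lines, tail = first_chunk_lines(demo, 8)
```

With an 8-byte chunk the read stops partway through the second line, so
the tail holds a fragment rather than a whole line.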

The trick is that the last line will be incomplete in this case, so
for the second and subsequent reads you subtract the length of the
last line from X, and do "read for X at Y", where Y is a running total
of what's been read, after subtracting the partial lines of course.
Some extra bookkeeping will be required in this case (e.g., if the tag
you're looking for is in the partial last line you need to subtract 1
from the count so you don't count it twice).  Exactly how to do this
part most efficiently is left as an exercise for the reader ;-)
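
One possible way to do that bookkeeping, again sketched in Python rather
than MetaTalk and not claimed to be the fastest: keep a running offset
(the Y above), scan only the complete lines in each chunk, and start the
next read at the beginning of the partial line. Because the partial line
is never scanned until it has been read whole, nothing is counted twice
and no subtract-1 correction is needed. The name count_lines_with_tag,
the 4 MB default, and the chunk-doubling rule for over-long lines are
assumptions of this sketch, as is the use of linefeed as the line ending.

```python
import io

def count_lines_with_tag(stream, tag, chunk_size=4 * 1024 * 1024):
    """Count lines containing `tag` in a seekable binary stream."""
    count = 0
    offset = 0  # Y: bytes fully processed, partial lines excluded
    while True:
        stream.seek(offset)              # "read for X at Y"
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        if len(chunk) < chunk_size:
            # Short read means end of file, so the last line is whole.
            usable = len(chunk)
        else:
            cut = chunk.rfind(b"\n")
            if cut == -1:
                # One line is longer than the chunk: retry with more.
                chunk_size *= 2
                continue
            usable = cut + 1             # stop just after the last linefeed
        for line in chunk[:usable].split(b"\n"):
            if tag in line:
                count += 1
        offset += usable                 # next read starts at the partial line
    return count

# Demo on an in-memory "file" with a deliberately tiny chunk size.
sample = b"x TAG\nno\nTAG again\nlast TAG"
total = count_lines_with_tag(io.BytesIO(sample), b"TAG", chunk_size=7)
```

The cost of this choice is that each partial line is read twice; whether
that beats carrying the fragment over in memory would have to be measured.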
  Regards,
    Scott

> -- 
>  Richard Gaskin 
>  Fourth World Media Corporation
>  Developer of WebMerge 2.0: Publish any database on any site
>  ___________________________________________________________
>  [EMAIL PROTECTED]       http://www.FourthWorld.com
>  Tel: 323-225-3717                       AIM: FourthWorldInc

********************************************************
Scott Raney  [EMAIL PROTECTED]  http://www.metacard.com
MetaCard: You know, there's an easier way to do that...

