Re: middle line (was Re: Daily Perl FAQ...)

Michael G Schwern Thu, 29 Nov 2001 13:37:35 -0800

On Thu, Nov 29, 2001 at 12:43:43PM -0800, Chris Thorpe wrote:
> > you can do it in O(1) memory and O(n) time. just take schwern's
> > File::ReadBackwards solution with a fix to the module so it supports
> > TELL. and compare the tell values to see when you have read the middle
> > line. then you only read the entire file once (maybe with a dup read of
> > the center line) which is O(n) in time. and it uses no more than 1
> > buffer (actually 2) of data which is O(1) in memory no matter the size
> > of the file. that assumes normal size lines and not enormous ones.
> 
> My point remains that you have to read the file 1 1/2 times or keep an
> array in memory.  There is no way to use byte offsets to find the center
> line in a file because linebreaks are simply a special class of character,
> and are not distributed with any predictability.  If the TELL function
> works on lines rather than bytes, it is because it has read through the
> file and figured out where the linebreaks are and created some sort of
> array to remember them, which I thought was against the rules.


Look closer at how File::ReadBackwards does it's work.  Rather than
naively reading in the whole file to an array and then just iterating
through that backwards, it does it in small buffered chunks.  So it
reads in the last 8192 bytes, breaks that down into an array and feeds
that out.  Once it's exhausted that, the next to last 8192 bytes is
read in and broken into chunks.

This is O(1) memory.  Never more than 8192 bytes (or whatever your
buffer size happens to be).

File::ReadBackwards is a generic module and is designed to be
efficient.  Strictly speaking, it violates the "no arrays"
requirement, but that's not essential to it's working.  Read on.


> There's no way to read the entire file exactly once, keep only 1
> buffer containing only the line to save, and print out the middle
> line, all without using any arrays.

Oh ye of little faith.  Just read the file backwards one byte at a
time to find the "next" newline while reading the file forwards one
byte at a time to do the same.  Compare file positions until they
match.  Horribly inefficient, but it uses no arrays, no buffers, a
very tiny amount of memory and only reads the file once.

Just to be swank, I've eliminated the need to use perl's readline
command or any buffering of the middle line at all. ;)


use Fcntl qw(:seek);

open F, $ARGV[0] or die;
open B, $ARGV[0] or die;

my $g_idx = -1;
seek B, $g_idx, SEEK_END or die;

sub find_eol_backwards {
    my($fh) = shift;

    my $eol;

    while(1) {
        my $char;
        read $fh, $char, 1;
        $eol = tell $fh if $char eq "\n";

        $g_idx--;
        seek $fh, -2, SEEK_CUR;

        last if defined $eol;
    }

    return $eol;
}

sub find_eol_forwards {
    my($fh) = shift;

    my $eol;

    while(1) {
        read $fh, $char, 1;
        $eol = tell $fh if $char eq "\n";

        last if defined $eol;
    }

    return $eol;
}

my $b_eol = -1;
my $f_eol = -1;
while(1) {
    $f_eol = find_eol_forwards(F);

    printf "F = %d, B = %d\n", $f_eol, $b_eol;
    last if $b_eol == $f_eol;

    $b_eol = find_eol_backwards(B);

    printf "F = %d, B = %d\n", $f_eol, $b_eol;
    last if $b_eol == $f_eol;
}

print scalar <F>;


-- 

Michael G. Schwern   <[EMAIL PROTECTED]>    http://www.pobox.com/~schwern/
Perl Quality Assurance      <[EMAIL PROTECTED]>         Kwalitee Is Job One
You're smoother than a tunnel of shining sorrow.

Re: middle line (was Re: Daily Perl FAQ...)

Reply via email to