Re: Loop Through Content Using LWP::Simple

Chas Owens Wed, 12 Sep 2007 12:39:57 -0700

On 9/12/07, Dr.Ruud <[EMAIL PROTECTED]> wrote:
snip
> Broken alternative;
snip
> $ perl -wle '$t="abc\ndefgh\n\nxyz"; print "<$1>" while $t =~ /(.*)/g'
snip
> But this variant might be handy:
> $ perl -wle '$t="abc\ndefgh\n\nxyz"; while ($t =~ /(.+)/g) { print "<$1>" }'
snip
> (main advantage: it doesn't create an immediate array)
>
> But I assume that a proper HTML parser will be the final answer.
snip


If the file returned is indeed HTML, then an HTML parser is the right
answer.  But if he is just fetching a text file over HTTP and wants to
read it line by line, then either split (faster) or open on a reference
to the string, i.e. an in-memory filehandle (less memory usage), is the
way to go.  I would not recommend using a regex just to get lines;
however, if you are looking for something specific in the entire file,
then a regex may be the way to go.  In the benchmark below split is
ahead at every size, though open nearly catches up from 1,000 lines on.
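
As a rough sketch of the three approaches side by side (not part of the
benchmark): $content here is a made-up literal standing in for whatever
LWP::Simple's get() returned, so the example runs without network
access, and the "gamma: 42" line is purely illustrative.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Assumption: in real use this would be something like
#   my $content = get($url);   # from LWP::Simple
# A literal string keeps the sketch self-contained.
my $content = "alpha\nbeta\n\ngamma: 42\n";

# 1. split: simple and fast, but builds the full list of lines at once.
my @via_split = split /\n/, $content;

# 2. open on a reference to the string: an in-memory filehandle that
#    hands back one line at a time, as with a real file.
open my $fh, "<", \$content or die "cannot open in-memory file: $!";
my @via_open;
while (my $line = <$fh>) {
        chomp $line;
        push @via_open, $line;
}
close $fh;

# 3. regex: reasonable when looking for one specific thing in the
#    whole text rather than walking every line.
my ($value) = $content =~ /^gamma:\s*(\d+)$/m;

print "split: ", scalar(@via_split), " lines\n";
print "open:  ", scalar(@via_open),  " lines\n";
print "value: $value\n";
```

One difference to keep in mind: split drops trailing empty fields, so
blank lines at the very end of the content are lost with split but kept
when reading from the filehandle.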

test of subs, output should be subname: 78 78
open: 78 78
regex1: 78 78
regex2: 78 78
split: 78 78

with 10 lines
           Rate regex1   open regex2  split
regex1  53096/s     --   -16%   -45%   -57%
open    63015/s    19%     --   -35%   -49%
regex2  96376/s    82%    53%     --   -22%
split  123675/s   133%    96%    28%     --

with 100 lines
          Rate regex1 regex2   open  split
regex1  5749/s     --   -44%   -55%   -61%
regex2 10239/s    78%     --   -20%   -30%
open   12799/s   123%    25%     --   -13%
split  14638/s   155%    43%    14%     --

with 1000 lines
         Rate regex1 regex2   open  split
regex1  575/s     --   -43%   -60%   -61%
regex2 1008/s    75%     --   -30%   -32%
open   1437/s   150%    43%     --    -3%
split  1478/s   157%    47%     3%     --

with 10000 lines
         Rate regex1 regex2   open  split
regex1 57.3/s     --   -44%   -61%   -61%
regex2  103/s    79%     --   -30%   -30%
open    146/s   155%    42%     --    -1%
split   148/s   158%    44%     1%     --

#!/usr/bin/perl

use strict;
use warnings;
use Benchmark;

my $text = ("a" x 78 . "\n") x 2;

my %subs = (
        split => sub {
                my @a;
                for my $line (split /\n/, $text) {
                        push @a, length $line;
                }
                return @a;
        },
        regex2 => sub {
                my @a;
                while ($text =~ /(.+)/g) {
                        my $line = $1;
                        push @a, length $line;
                }
                return @a;
        },
        regex1 => sub {
                my @a;
                while ($text =~ /(.*)/g) {
                next unless length $1;  # skip empty matches without dropping a line that is just "0"
                        my $line = $1;
                        push @a, length $line;
                }
                return @a;
        },
        open => sub {
                my @a;
                open my $f, "<", \$text or die "cannot open in-memory file: $!";
                while (my $line = <$f>) {
                        chomp $line;
                        push @a, length $line;
                }
                return @a;
        },
);

print "test of subs, output should be subname: 78 78\n";
for my $sub (sort keys %subs) {
        print "$sub: @{[$subs{$sub}->()]}\n";
}

for my $lines (10, 100, 1_000, 10_000) {
        $text = ("a" x 78 . "\n") x $lines;
        print "\nwith $lines lines\n";
        Benchmark::cmpthese(-2, \%subs);
}

