"Paul J. Lucas" <[EMAIL PROTECTED]> writes:

> On 28 Jan 2000, Randal L. Schwartz wrote:
> 
> > Have you looked at the new XS version of HTML::Parser?
> 
>       Not previously, but I just did.
> 
> > It's a speedy little beasty.  I dare say probably faster than even
> > expat-based XML::Parser because it doesn't do quite as much.
> 
>       But still an order of magnitude slower than mine.  For a test,
>       I downloaded Yahoo!'s home page for a test HTML file and wrote
>       the following code:
> 
> ----- test code -----
> #! /usr/local/bin/perl
> 
> use Benchmark;
> use HTML::Parser;
> use HTML::Tree;
> 
> @t = timethese( 1000, {
>    'Parser' => '$p = HTML::Parser->new(); $p->parse_file( "/tmp/test.html" );',
>    'Tree'   => '$html = HTML::Tree->new( "/tmp/test.html" );',
> } );
> ---------------------
> 
>       The results are:
> 
> ----- results -----
> Benchmark: timing 1000 iterations of Parser, Tree...
>     Parser: 37 secs (36.22 usr  0.15 sys = 36.37 cpu)
>       Tree:  7 secs ( 7.40 usr  0.22 sys =  7.62 cpu)
> -------------------
> 
>       One really can't compete against mmap(2), pointer arithmetic,
>       and dereferencing.

That's because you fall back to version 2 compatibility when you don't
provide any arguments to the HTML::Parser constructor.  The parser
will then make useless method calls for everything it finds, and method
calls in Perl are not as cheap as I would wish.
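
To make those method calls visible, here is a minimal sketch, assuming
the version 2 convention in which the parser dispatches to methods named
start, end and text on the parser object.  The CallCounter subclass and
the sample HTML string are hypothetical, made up only for illustration:

```perl
use strict;
use warnings;

# Hypothetical subclass, used only to make the version 2 method calls visible.
package CallCounter;
use parent 'HTML::Parser';

our %calls;
sub start { $calls{start}++ }   # called once per start tag
sub end   { $calls{end}++ }     # called once per end tag
sub text  { $calls{text}++ }    # called once per chunk of text

package main;

# Hypothetical sample input, standing in for a real page.
my $html = '<p>Hello <b>world</b></p>';

my $p = CallCounter->new;       # no arguments: version 2 compatibility
$p->parse($html);
$p->eof;

print "$_: $CallCounter::calls{$_}\n" for sort keys %CallCounter::calls;
```

With the plain base class the same calls are still made; they just land
in methods that do nothing, which would account for the cost seen in the
'Parser' case.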

----- test code -----
use Benchmark;
use HTML::Parser;

timethese( 1000, {
   'Parser' => '$p = HTML::Parser->new(); $p->parse_file( "./index.html" );',
   'Parser3' => 'HTML::Parser->new(api_version => 3)->parse_file( "./index.html" );'
} );
---------------------

$ lwp-download http://yahoo.com
Saving to 'index.html'...
11.6 KB received in 2 seconds (5.8 KB/sec)

$ perl test.pl
Benchmark: timing 1000 iterations of Parser, Parser3...
    Parser: 30 wallclock secs (29.31 usr +  0.20 sys = 29.51 CPU)
   Parser3:  2 wallclock secs ( 1.39 usr +  0.17 sys =  1.56 CPU)

...but this is kind of a useless benchmark, as it does not do anything.
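
A benchmark that does some work could look roughly like the sketch
below, assuming the version 3 start_h handler option with a 'tagname'
argspec.  The in-memory $html string, the link-counting task and the
iteration count are illustrative assumptions, not what was measured
above:

```perl
use strict;
use warnings;
use Benchmark qw(timethese);
use HTML::Parser;

# Hypothetical stand-in for the downloaded page, so the sketch runs anywhere.
my $html = '<html><body>'
         . ( '<p>Hello <a href="/x">link</a></p>' x 50 )
         . '</body></html>';

my $links = 0;   # result of the last run, so the work is observable

timethese( 200, {
    'Parser3 count' => sub {
        $links = 0;
        my $p = HTML::Parser->new(
            api_version => 3,
            # Only start tags get a handler; nothing is registered for the rest.
            start_h => [ sub { $links++ if $_[0] eq 'a' }, 'tagname' ],
        );
        $p->parse($html);
        $p->eof;
    },
} );

print "links found: $links\n";
```

Parsing from a string rather than a file keeps file I/O out of the
timing, so the numbers would not be directly comparable with the
parse_file runs above.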

Regards,
Gisle
