"Paul J. Lucas" <[EMAIL PROTECTED]> writes:
> On 28 Jan 2000, Randal L. Schwartz wrote:
>
> > Have you looked at the new XS version of HTML::Parser?
>
> Not previously, but I just did.
>
> > It's a speedy little beasty. I dare say probably faster than even
> > expat-based XML::Parser because it doesn't do quite as much.
>
> But still an order of magnitude slower than mine. For a test,
> I downloaded Yahoo!'s home page as a sample HTML file and wrote
> the following code:
>
> ----- test code -----
> #! /usr/local/bin/perl
>
> use Benchmark;
> use HTML::Parser;
> use HTML::Tree;
>
> @t = timethese( 1000, {
> 'Parser' => '$p = HTML::Parser->new(); $p->parse_file( "/tmp/test.html" );',
> 'Tree' => '$html = HTML::Tree->new( "/tmp/test.html" );',
> } );
> ---------------------
>
> The results are:
>
> ----- results -----
> Benchmark: timing 1000 iterations of Parser, Tree...
> Parser: 37 secs (36.22 usr 0.15 sys = 36.37 cpu)
> Tree: 7 secs ( 7.40 usr 0.22 sys = 7.62 cpu)
> -------------------
>
> One really can't compete against mmap(2), pointer arithmetic,
> and dereferencing.

That's because you fall back to version 2 compatibility when you don't
provide any arguments to the HTML::Parser constructor. The parser will
then make a useless (do-nothing) method call for every token it finds,
and method calls in Perl are not as cheap as I would wish. Compare with
a parser created with api_version => 3:

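----- test code -----
use Benchmark;
use HTML::Parser;
timethese( 1000, {
# No constructor arguments: version 2 compatibility mode, where the
# parser makes a method call for every token it finds.
'Parser' => '$p = HTML::Parser->new(); $p->parse_file( "./index.html" );',
# Version 3 API with no handlers registered: nothing is called back.
'Parser3' => 'HTML::Parser->new(api_version => 3)->parse_file( "./index.html" );'
} );
---------------------
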
$ lwp-download http://yahoo.com
Saving to 'index.html'...
11.6 KB received in 2 seconds (5.8 KB/sec)
$ perl test.pl
Benchmark: timing 1000 iterations of Parser, Parser3...
Parser: 30 wallclock secs (29.31 usr + 0.20 sys = 29.51 CPU)
Parser3: 2 wallclock secs ( 1.39 usr + 0.17 sys = 1.56 CPU)

...but this is kind of a useless benchmark, as neither parser is asked
to do anything with what it finds.
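
A benchmark where both parsers do some work gives a fairer comparison.
Here is a minimal sketch, assuming the work is just counting start tags
in an in-memory document. The sample markup, the CountingParser class and
the count_tags_v2/count_tags_v3 helpers are hypothetical, made up for
illustration. The v2 variant subclasses HTML::Parser and overrides
start(), so it pays for a method call per token; the v3 variant
registers a plain code reference as the start handler and asks only for
the tag name.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use HTML::Parser ();

# Hypothetical sample document (400 start tags); substitute the contents
# of a real page for a more realistic measurement.
my $html = join '', map { "<p class='x'>item $_ <a href='/$_'>link</a></p>\n" } 1 .. 200;

# Version 3 API: a plain code reference is registered as the start
# handler and receives only the tag name, so no method dispatch happens.
sub count_tags_v3 {
    my ($doc) = @_;
    my $count = 0;
    my $p = HTML::Parser->new(
        api_version => 3,
        start_h     => [ sub { $count++ }, 'tagname' ],
    );
    $p->parse($doc);
    $p->eof;
    return $count;
}

# Version 2 style: a hypothetical subclass overrides start(), which the
# parser invokes as a method for every start tag (text, end tags and so
# on still go to the inherited do-nothing methods).
package CountingParser;
our @ISA = ('HTML::Parser');
our $count = 0;
sub start { $count++ }

package main;

sub count_tags_v2 {
    my ($doc) = @_;
    local $CountingParser::count = 0;
    my $p = CountingParser->new;    # no arguments: version 2 compatibility
    $p->parse($doc);
    $p->eof;
    return $CountingParser::count;
}

# Run each variant for at least 2 CPU seconds and print a comparison table.
cmpthese( -2, {
    v2 => sub { count_tags_v2($html) },
    v3 => sub { count_tags_v3($html) },
} );
```

Both helpers should report the same number of start tags; only the
dispatch mechanism differs. No timings are given here, since they depend
on the machine and the HTML::Parser version installed.
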
Regards,
Gisle