Gary Hawkins wrote:
> Along that line, I would like to be able to wind up with pages after retrieval
> as plain text without html tags, hopefully using a module.
Here's a really quick way to do that using HTML::Parser. It can probably
use some tweaking.
Hope this helps,
Briac
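#!/usr/bin/perl -w
use strict;
use HTML::Parser 3;
use LWP::Simple;

# get() returns undef on failure, so test for definedness rather than truth
my $html = get("http://www.mit.edu");
defined $html or die "Couldn't fetch the page";

# Print the decoded text of each text node, skipping <script> and <head>
my $parser = HTML::Parser->new(
    unbroken_text   => 1,
    ignore_elements => [qw( script head )],
    text_h          => [ sub { print shift }, 'dtext' ],
)->parse($html)->eof();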
__END__
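
As one possible tweak, here is a minimal sketch that collects the text into a
string instead of printing it, so it can be reused on pages fetched elsewhere.
The helper name html_to_text, the sample markup, the extra "style" entry in the
ignore list, and the whitespace cleanup are my own assumptions, not part of the
script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser 3;

# Hypothetical helper (not in the original script): returns the visible
# text of an HTML string rather than printing it.
sub html_to_text {
    my ($html) = @_;
    my $text = '';

    my $parser = HTML::Parser->new(
        api_version   => 3,
        unbroken_text => 1,
        # 'dtext' hands the handler text with entities already decoded
        text_h        => [ sub { $text .= shift }, 'dtext' ],
    );
    # Assumption: also skip <style>, which can appear outside <head>
    $parser->ignore_elements(qw( script style head ));
    $parser->parse($html);
    $parser->eof;

    # Collapse runs of whitespace and trim both ends
    $text =~ s/\s+/ /g;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

# Made-up sample input, only for illustration
my $sample = '<html><head><title>T</title></head>'
           . '<body><p>Hello &amp; <b>welcome</b></p>'
           . '<script>var x;</script></body></html>';
print html_to_text($sample), "\n";
```

Text from adjacent elements is simply concatenated, so block-level tags such as
<p> or <br> do not produce line breaks here. Adding a start handler that appends
a newline for those tags would be the next tweak to try.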
--
briac
A flying lark. Five
trout swim in the pond. Four foxes
under a she-oak.