On 11-05-18 09:02 AM, Mike Blezien wrote:
Hello,
Is there a Perl module available, or a regex method, that will parse an
HTML formatted file then remove ALL the HTML elements so you end up with
just the text content of the file?
Any help/suggestions appreciated.
HTML::TreeBuilder loads HTML::Element, which has a method as_text(). Use
HTML::Element::look_down() to find the body, then call as_text() on it.
http://search.cpan.org/~jfearn/HTML-Tree-4.2/lib/HTML/TreeBuilder.pm
http://search.cpan.org/~jfearn/HTML-Tree-4.2/lib/HTML/Element.pm
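
A minimal sketch of that approach, assuming HTML::Tree is installed from
CPAN. The sample markup and the helper name html_to_text() are made up
for illustration; for a file on disk you would likely call parse_file()
instead of parse().

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample input, standing in for the contents of an HTML file.
my $html = <<'HTML';
<html><head><title>Demo</title></head>
<body><h1>Hello</h1><p>Some <b>bold</b> text.</p></body></html>
HTML

# html_to_text() is an illustrative helper, not part of HTML::Tree.
sub html_to_text {
    my ($source) = @_;

    my $tree = HTML::TreeBuilder->new;
    $tree->parse($source);
    $tree->eof;                       # tell the parser the input is complete

    # look_down() returns the first matching element in scalar context.
    my $body = $tree->look_down( _tag => 'body' );
    my $text = defined $body ? $body->as_text : '';

    $tree->delete;                    # release the tree's memory
    return $text;
}

print html_to_text($html), "\n";
```

Note that as_text() simply joins the text nodes, so text from adjacent
block elements may run together; HTML::Element also has
as_trimmed_text() if you want the whitespace tidied up.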
--
Just my 0.00000002 million dollars worth,
Shawn
Confusion is the first step of understanding.
Programming is as much about organization and communication
as it is about coding.
The secret to great software: Fail early & often.
Eliminate software piracy: use only FLOSS.