> To extract just the text of an HTML file I use HTML::TokeParser and
> the get_trimmed_text("</BODY>") method call. That takes care of all
> the HTML tags in one fell swoop. It will leave strings like [IMG] where
> an image tag was but '$text=~s/\s*\[IMG\]\s*/ /g;' takes care of those.
I think the OP was more interested in optimizing for speed than in a
fully correct solution. I would imagine that loading a full parser to
process the HTML would be slower than a regexp. Does the parser also
strip JavaScript code (see my other post in the original thread)?
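
One way to check the JavaScript question is to walk the token stream and
skip text inside script and style elements explicitly. This is only a
minimal sketch, not the method from the quoted post: the function name
text_without_scripts is made up for illustration, and it assumes
HTML::TokeParser and HTML::Entities are installed. Whether
get_trimmed_text drops script bodies on its own is worth testing against
your installed version rather than assuming.

```perl
use strict;
use warnings;
use HTML::TokeParser;
use HTML::Entities qw(decode_entities);

# Hypothetical helper: return the visible text of an HTML string,
# ignoring anything between <script>...</script> and <style>...</style>.
sub text_without_scripts {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html)
        or die "cannot create parser: $!";

    my $skip = 0;    # true while inside a script or style element
    my @chunks;

    while (my $t = $p->get_token) {
        my ($type, $tag) = @$t;
        if ($type eq 'S' && $tag =~ /^(?:script|style)$/) {
            $skip = 1;
        }
        elsif ($type eq 'E' && $tag =~ /^(?:script|style)$/) {
            $skip = 0;
        }
        elsif ($type eq 'T' && !$skip) {
            my $text = $t->[1];
            # The third field is true for CDATA-like text, which must
            # not be entity-decoded.
            decode_entities($text) unless $t->[2];
            push @chunks, $text;
        }
    }

    # Joining with a space keeps words apart at tag boundaries, at the
    # cost of occasionally splitting a word that spans inline tags.
    my $out = join ' ', @chunks;
    $out =~ s/\s+/ /g;
    $out =~ s/^ | $//g;
    return $out;
}

print text_without_scripts(
    '<html><head><script>var x = 1;</script></head>'
  . '<body><p>Hello &amp; <b>bye</b></p></body></html>'
), "\n";
```

Timing this against the regexp version with the core Benchmark module
would settle the speed question for a given set of pages.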
- Ron
_______________________________________________
Perl-Win32-Users mailing list
[EMAIL PROTECTED]
http://listserv.ActiveState.com/mailman/listinfo/perl-win32-users