Hi!

There's a wonderful recipe for this, 20.5 ("Converting HTML to ASCII"), in
Chapter 20 ("Web Automation") of the "Perl Cookbook" (by Tom Christiansen and
Nathan Torkington, published by O'Reilly).

A basic way to strip the HTML tags and replace <br> and <p> tags with line
breaks might be something like:

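#!/usr/bin/perl -w

# low-precedence "or" so that die fires when open fails
open HTMLFILE, "<the_html_file's_name" or die "Can't open that: $!";
while (<HTMLFILE>)
{
  chomp;
  s/<p\b[^>]*>/\n\n/gi;   # <p> or <p ...>, but not <pre>
  s/<br\b[^>]*>/\n/gi;    # <br> or <br ...>
  s/<[^>]+>//g;           # strip all remaining tags
  print $_, " ";          # the removed line break still separates words
}
close HTMLFILE;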

- but be careful! This won't work when an HTML tag spans more than one line,
because the loop only ever sees one line at a time.
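
One possible way around the multi-line problem is to run the same
substitutions over the whole document as a single string instead of line by
line. What follows is only a rough sketch under that assumption: the helper
name html_to_text and the sample string are made up for illustration, and it
still breaks on a ">" inside an attribute value or on HTML comments.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# html_to_text() is a hypothetical helper name, not from the Cookbook recipe.
sub html_to_text {
    my ($html) = @_;
    $html =~ s/\s+/ /g;               # runs of whitespace (incl. newlines) -> one space
    $html =~ s/<p\b[^>]*>/\n\n/gi;    # paragraph start -> blank line
    $html =~ s/<br\b[^>]*>/\n/gi;     # <br> -> line break
    $html =~ s/<[^>]+>//g;            # drop every remaining tag, even multi-line ones
    return $html;
}

# Made-up sample input with an <a> tag split across two lines.
my $sample = "<p>Hello <a\n  href=\"x.html\">world</a><br>bye";
print html_to_text($sample), "\n";
```

To use it on a file, you could read the whole file into one string first, for
example with: my $html = do { local $/; <HTMLFILE> };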


Sascha

----------
>From: "Sparkle Williams" <[EMAIL PROTECTED]>
>To: [EMAIL PROTECTED]
>Subject: perl and internet files
>Date: Thu, 19 Jul 2001 15:51
>

> Good morning!
> I just wrote a Perl program that retrieves files from the internet via
> http:// and ftp://. When it retrieves the files, they come up in HTML
> syntax (head, body, text, etc.). Is there any way I can write an addition
> to my script that will cause the text to come up in its formatted form
> rather than the HTML syntax describing its format?
>
> --
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 