Here's a very naïve approach, which will probably work. Might screw up your <PRE> sections though...
perl -p0 s/ >/> /g;s/^[^>]{1,69}[>"]/join$",split' ',$&/mge

Including an end of line match might make it a little more resilient:

^[^>]{1,69}[>"]$

It puts the attributes with the entity, but that looks right to me... i.e. you get:

<META NAME="GENERATOR">

Instead of

<META NAME="GENERATOR">

Greg

-----Original Message-----
From: Phil Carmody [mailto:[EMAIL PROTECTED]
Sent: Friday, October 10, 2003 1:34 PM
To: [EMAIL PROTECTED]
Subject: HTML de-uglifier in 2 lines of perl

#!/usr/bin/perl -n
chomp;if($#p>=0&&s/^(\"?>)//){$p[-1].="$1\n";print(join($w<70?' ':"\n",@p));@p=($_);$w=0}
[EMAIL PROTECTED],$_}$w+=length;}{print(join("\n",@p))if($#p>=0);

I wrote that because docbook2html produces ugly HTML:

<<<
<HTML
><HEAD
><TITLE
>A World Wide Web Interface to CTAN</TITLE
><META NAME="GENERATOR" CONTENT="Modular DocBook HTML Stylesheet Version 1.76b+
"></HEAD
><BODY
...
>>>

and I wanted (IMHO) prettier HTML:

<<<
<HEAD>
<TITLE>
A World Wide Web Interface to CTAN</TITLE>
<META NAME="GENERATOR" CONTENT="Modular DocBook HTML Stylesheet Version 1.76b+">
</HEAD>
<BODY
...
>>>

The script also tries to join multiple attributes onto the same line, as long as the line wouldn't be too long (70 chars), as I also find that improves the readability of HTML (by reducing the noise level).

As I suck at perl, I reckon that something only half the length of that might be possible. Don't spend more than 2 minutes on it. I didn't!

Phil