On Sunday 20 July 2008 08:37, Gary Kline wrote:
> On Sun, Jul 20, 2008 at 05:03:15AM +0300, Giorgos Keramidas wrote:
> > On Sun, 20 Jul 2008 03:44:07 +0300, Giorgos Keramidas <[EMAIL PROTECTED]> wrote:
> > > Now, if you want to merely "hack something quick and dirty", a short
> > > Perl script can probably do regexp substitution similar to
> > >
> > >         #
> > >         # WARNING: THIS HAS NOT BEEN TESTED :P
> > >         #
> > >         my $foo = <STDIN>;
> > >         $foo = s:(<[^>]+>[^<]*</[^>]+>):$1\n:ge;
> > >         print "$foo";
> > >
> > > but you shouldn't trust the output of such a quick hack too much.
> >
> > As I wrote in reply to the personal email, this was untested and a bit
> > wrong in places, but now I've tried something like:
> >
> >   $ echo '<hello>world</hello><hello>next world</hello>' | \
> >   perl -e '$foo = <STDIN>; $foo =~ s:(<[^>]+>[^<]*</[^>]+>):$1\n:g; print "$foo";'
> >
> > and it does seem to sort of work.  The output is:
> >
> >   <hello>world</hello>
> >   <hello>next world</hello>
> >
> > Maybe that's good enough?  They say `the perfect is the enemy of good
> > enough', so if this works for your data set, it's probably ok to use it
> > :-)
> >
> > Have fun,
> > Giorgos
>
>       Fun?!  Well, but yes, anything that can save me from
>       hand-editing ~70 files will be a riot ;)

I haven't tried it, but if the simple approach fails, I suspect HTML::Tidy
has an option that would help. It can be installed from CPAN or from ports,
where it is textproc/p5-HTML-Tidy.

Jonathan
_______________________________________________
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
