How to clean up HTML...
Using shell with xmllint (yes, ugly shortcuts below):
export cmd="xmllint --html"
find . -name '*.html' -exec sh -c '$cmd "$1" > "$1.new"' sh \{\} \;
find . -name '*.html' -exec cp \{\}.new \{\} \;
find . -name '*.html.new' -exec rm \{\} \;
(--html is not strictly necessary if you have valid XML, e.g. you
can format XML as follows:
export cmd="xmllint --format"
find . -name '*.xml' -exec sh -c '$cmd "$1" > "$1.new"' sh \{\} \;
find . -name '*.xml' -exec cp \{\}.new \{\} \;
find . -name '*.xml.new' -exec rm \{\} \;
)
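A slightly less ugly variant of the same idea, as a rough sketch only: the function name reformat_tree is made up for illustration, and it assumes the formatter takes the file name as its last argument and writes the result to stdout (as xmllint does). The original is only overwritten when the formatter exits successfully, and file names with spaces are handled.

```shell
# Hypothetical helper (not part of any tool): run a formatter over every
# matching file and replace the original only if the formatter succeeds.
# Usage: reformat_tree DIR PATTERN COMMAND [ARGS...]
reformat_tree() {
    if [ $# -lt 3 ]; then
        echo "usage: reformat_tree DIR PATTERN COMMAND [ARGS...]" >&2
        return 2
    fi
    dir=$1
    pattern=$2
    shift 2
    # The inner shell receives the file as $1 and the formatter command after it.
    find "$dir" -type f -name "$pattern" -exec sh -c '
        file=$1
        shift
        if "$@" "$file" > "$file.new"; then
            # cp keeps the permissions of the original file
            cp "$file.new" "$file"
        else
            echo "skipped (formatter failed): $file" >&2
        fi
        rm -f "$file.new"
    ' sh {} "$@" \;
}

# Example calls, assuming xmllint is installed:
# reformat_tree . '*.html' xmllint --html
# reformat_tree . '*.xml' xmllint --format
```

Since the temporary .new file is removed straight away, no separate cleanup pass is needed.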
Using shell with tidy:
# -e (errors only) is left out: with it tidy writes no markup, so -m changes nothing
export cmd="tidy -m -i -c"
find . -name '*.html' -exec $cmd \{\} \;
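Because -m rewrites files in place, it may be worth finding out first which files tidy considers broken. The following is a sketch under assumptions: the function name tidy_errors and the TIDY override variable are invented for this example. It relies on tidy's exit status (0 clean, 1 warnings, 2 errors) and on the -q and -e options to suppress markup output.

```shell
# Hypothetical helper: list the HTML files under a directory that tidy
# reports errors for, without modifying anything.
# TIDY can name an alternative tidy binary (an assumption of this sketch).
tidy_errors() {
    find "${1:-.}" -type f -name '*.html' -exec sh -c '
        checker=$1
        shift
        for file in "$@"; do
            # -q: quiet, -e: report only, write no markup
            "$checker" -q -e "$file" >/dev/null 2>&1
            # exit status 2 or higher means errors (1 is warnings only)
            if [ $? -ge 2 ]; then
                printf "%s\n" "$file"
            fi
        done
    ' sh "${TIDY:-tidy}" {} +
}

# Example call:
# tidy_errors .
```

Files printed by this are the ones to look at by hand before letting tidy -m loose on the whole tree.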
In ant you would create a <fileset> and then do an <apply> (the
fileset-aware form of <exec>) of much the same.
Both tools have some more interesting options.
- LSD
On Wed, Feb 01, 2006 at 09:29:56AM +1100, David Crossley wrote:
> Martin Sebor wrote:
> >
> > I'm a little distressed to see the conversion process has messed
> > up the formatting of the original HTML that I manually maintained
> > for readability. Specifically, many of the terminating tags (such
> > as </p>) are not indented as they ought to be and instead are in
> > column 1. I don't suppose there is an easy way to regenerate the
> > page so as to preserve more of the original formatting, is there?
>
> I tried my best to format stuff automatically
> as part of the Forrest output process. If it
> was raw xml serialiser output then it would have
> been even worse. No we cannot retain original
> formatting.
>
> I know that it is not good enough.
>
> Someone could run all documents through something
> like HTML Tidy or Henning's CodeWrestler or perhaps
> some XSL.
>
> I would be pleased to see how they do this, because
> i want to add the ability to our future tools.
>
> On many projects i have seen messy source documents
> cause grief with svn diffs - too much clutter and
> inconsistent whitespace.
>
> -David
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]