An alternative to tidy is my fix-html program: http://dcs.nac.uci.edu/~strombrg/fix-html.html
I had a page (was it the Palm Addicts website?) that tidy couldn't handle but that fix-html sailed through. fix-html is based on the BeautifulSoup Python module, which appears to be pretty good at making sense of bad HTML.

On Thu, 2005-03-03 at 10:06 +0100, Justus Piater wrote:
> Hi,
>
> The issue of Web pages whose HTML is fouled up to the point of
> impluckability (add this to Merriam-Webster!) comes up over and over
> again.
>
> The standard solution would be to use wget with the right options to
> download all that's needed, then run tidy on the file(s) in question,
> and then pluck the local files. This is quite cumbersome, and one
> loses the original URL in the plucked PDB.
>
> How about adding an option to plucker-build for filtering each
> downloaded file through tidy?
>
> This should only be a minor hack: the tidying occurs in the right
> place in the pipeline, and it increases plucker-build's practical
> usability without placing additional burden on the user.
>
> Justus
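The per-download tidy filter proposed in the quoted message could look roughly like this. It is only a minimal sketch, not plucker-build's actual code: the function name `filter_html`, the default `["tidy", "-q"]` command, and the idea of calling it on each page right after download are assumptions. It pipes the raw page through an external cleaner via stdin/stdout and falls back to the original bytes if the cleaner is missing, times out, reports errors, or prints nothing, so a broken tidy install never costs you a page.

```python
import subprocess
from typing import List, Optional

# Hypothetical default command: tidy reads HTML on stdin and writes the
# cleaned document to stdout; "-q" suppresses its non-essential chatter.
DEFAULT_CLEANER = ["tidy", "-q"]


def filter_html(html: bytes,
                cmd: Optional[List[str]] = None,
                timeout: float = 30.0) -> bytes:
    """Pipe downloaded HTML through an external cleaner such as tidy.

    Returns the cleaned bytes, or the original bytes unchanged if the
    cleaner is missing, times out, exits with an error, or prints nothing.
    """
    cmd = DEFAULT_CLEANER if cmd is None else cmd
    try:
        result = subprocess.run(
            cmd,
            input=html,
            capture_output=True,  # keep stdout; discard tidy's stderr report
            timeout=timeout,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return html  # cleaner not installed or hung: keep the raw page
    # tidy exits 0 when clean, 1 with warnings only, 2 on errors.
    if result.returncode in (0, 1) and result.stdout:
        return result.stdout
    return html
```

Because the filter runs on the downloaded bytes before they reach the parser, the page keeps its original URL, which avoids the wget-then-pluck-local-files workaround. Swapping in a different cleaner (for example a wrapper script around fix-html) would only mean passing a different `cmd` list.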

