bruce wrote:
> hi paddy...
>
> that's exactly what i'm trying to accomplish... i've used tidy, but it seems
> to still generate warnings...
>
> initFile -> tidy -> cleanFile -> perl app (using xpath/libxml)
>
> the xpath/libxml functions in the perl app complain regarding the file. my
> thought is that tidy isn't cleaning enough, or that the perl xpath/libxml
> functions are too strict!
>
> which is why i decided to see if anyone on the python side has
> experienced/solved this problem..

FWIW here's my usual approach:

http://copia.ogbuji.net/blog/2005-07-22/Beyond_HTM

Personally, I avoid Tidy. I've too often seen it crash or hang on really
bad HTML. TagSoup seems to be built like a tank. I've also never seen
BeautifulSoup choke, but I don't use it as much as TagSoup.

-- 
Uche Ogbuji
Fourthought, Inc.
http://uche.ogbuji.net
http://fourthought.com
http://copia.ogbuji.net
http://4Suite.org
Articles: http://uche.ogbuji.net/tech/publications/
-- 
http://mail.python.org/mailman/listinfo/python-list
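
For anyone who wants to try the "clean first, then run XPath" pipeline from the Python side, here is a minimal sketch using only the standard library. It is not TagSoup or BeautifulSoup and is far less thorough than either; it only shows the general idea of forcing tag soup into a well-formed tree before querying it. The names SoupToTree and clean, and the synthetic <document> wrapper element, are made up for this illustration. It also assumes attribute names in the input are legal XML names, which really bad HTML will not always honor.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never take a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class SoupToTree(HTMLParser):
    """Hypothetical helper: build a well-formed ElementTree from sloppy HTML."""

    def __init__(self):
        super().__init__()
        self.builder = ET.TreeBuilder()
        self.open_tags = []
        # Synthetic wrapper so input with several top-level elements
        # still yields a single root.
        self.builder.start("document", {})

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. <input disabled>) arrive as None.
        self.builder.start(tag, {k: (v or "") for k, v in attrs})
        if tag in VOID:
            self.builder.end(tag)
        else:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray closing tag: ignore it
        # Close any unclosed descendants, then the tag itself.
        while self.open_tags:
            top = self.open_tags.pop()
            self.builder.end(top)
            if top == tag:
                break

    def handle_data(self, data):
        self.builder.data(data)

    def finish(self):
        # Close whatever the input left open, then the wrapper.
        while self.open_tags:
            self.builder.end(self.open_tags.pop())
        self.builder.end("document")
        return self.builder.close()


def clean(html):
    """Return the root Element of a well-formed tree for the given HTML."""
    parser = SoupToTree()
    parser.feed(html)
    parser.close()
    return parser.finish()


if __name__ == "__main__":
    messy = ("<html><body><p>One<p>Two <b>bold</p>"
             "<a href=x.html>link</a><br></div></body>")
    tree = clean(messy)
    # ElementTree supports a limited XPath subset through findall().
    print([a.get("href") for a in tree.findall(".//a")])
    # The serialized result can be handed to a stricter XML/XPath tool.
    print(ET.tostring(tree, encoding="unicode"))
```

The sketch does not apply HTML's implicit-close rules (a second <p> nests inside the first instead of closing it), so the resulting tree is well-formed but not necessarily the tree a browser would build. For real-world input a dedicated tool such as TagSoup or BeautifulSoup is the safer choice.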