On Sat, Jun 24, 2006 at 05:03:22PM +0200, Marius Westenberg wrote:
> nowadays, most of the webpages are rather complex. I wonder how do you
> handle those webpages? I have tried to download and convert them simply
> with plucker, but unfortunately the results did not always satisfy me.
[...]
> I think it is pretty complex to edit every html-document manually, in
> order to get the main content from the middle of the webpage.

For *books* I usually edit them manually, since often, as you said, one just needs to weed out several screenfuls of JS code from the top and bottom of the page.
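If the junk really is just <script> blocks, part of that manual weeding can be scripted before handing the file to Plucker. The following is only a rough sketch, assuming Python 3 and pages where a regular expression is good enough; `strip_scripts` is a hypothetical helper, not part of Plucker, and a regex is not a real HTML parser (it will not help with navigation menus, ads, or other non-script clutter).

```python
import re

# Matches a whole <script ...> ... </script> element, case-insensitively and
# across line breaks. Crude: a regex, not a real HTML parser.
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)


def strip_scripts(html: str) -> str:
    """Return html with every <script> element removed (hypothetical helper)."""
    return SCRIPT_RE.sub("", html)
```

For example, read the saved page with `open(path, encoding="utf-8").read()`, pass it through `strip_scripts`, write the result to a new file, and convert that file instead of the original.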
Unfortunately, many such books are "not really HTML": for example, the text sits inside PRE tags, or there are no P tags at all, just lots of non-breaking spaces and BRs. Such "HTML files" are mostly useless anyway.

_______________________________________________
plucker-list mailing list
[email protected]
http://lists.rubberchicken.org/mailman/listinfo/plucker-list

