On Sat, Jun 24, 2006 at 05:03:22PM +0200, Marius Westenberg wrote:

> nowadays, most of the webpages are rather complex. I wonder how do you 
> handle those webpages? I have tried to download and convert them simply 
> with plucker, but unfortunately the results did not always satisfy me.
[...]
> I think it is pretty complex to edit every html-document manually, in 
> order to get the main content from the middle of the webpage.

For *books* I usually edit them manually, since often, as you have
said, one just needs to weed out several screenfuls of JS code from
the top and bottom of the page.
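
If the manual weeding gets tedious, it can be partly scripted. What
follows is only a rough sketch, not something Plucker provides: it
assumes Python 3 and uses the standard html.parser module, and the
names ScriptStripper and strip_scripts are made up for illustration.
It re-emits the page while dropping SCRIPT and STYLE elements (and,
as a side effect, HTML comments). It does not try to find the "main
content" of a page.

```python
from html.parser import HTMLParser


class ScriptStripper(HTMLParser):
    """Re-emit HTML while dropping <script> and <style> elements.

    Hypothetical helper for illustration; comments are dropped too,
    because the default handle_comment() does nothing.
    """

    SKIP = {"script", "style"}

    def __init__(self):
        # Keep entities such as &nbsp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1
        elif self.depth == 0:
            # Raw text of the start tag, attributes included.
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>; skipped ones are dropped.
        if self.depth == 0 and tag not in self.SKIP:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.depth = max(0, self.depth - 1)
        elif self.depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth == 0:
            self.out.append(data)

    def handle_entityref(self, name):
        if self.depth == 0:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth == 0:
            self.out.append(f"&#{name};")

    def handle_decl(self, decl):
        # Preserve declarations such as <!DOCTYPE ...>.
        self.out.append(f"<!{decl}>")


def strip_scripts(html):
    """Return html with script and style elements removed."""
    parser = ScriptStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

One would read the saved page into a string, pass it through
strip_scripts(), write the result back out, and then run Plucker on
the cleaned file. Navigation bars and similar clutter that are plain
HTML rather than JS would still have to be removed by hand.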

Unfortunately, many such books are "not really HTML": for example,
text inside PRE tags, or text without P tags, laid out with nothing
but non-breaking spaces and BRs. Such "HTML files" are mostly
useless anyway.

_______________________________________________
plucker-list mailing list
[email protected]
http://lists.rubberchicken.org/mailman/listinfo/plucker-list
