Now that I've looked at the problem a little more closely, I probably need more than this. The root of what I want to retrieve is www.slackware.com/book, which is a PHP beast. What I'm looking to do is:
1. Retrieve the base page and follow all the Next links, strip out all the extra crap on each page, retain and format the text, and store the result for printing (rough sketch below).
2. I could do this with simple Python tools on a normal HTML site, but the Slackware site doesn't respond to simple HTTP requests; even the links are PHP commands. A browser, of course, wades through this with ease, but I don't want to have to save each individual page as HTML just to format it.
3. All this work because the Slack folks don't provide a printable version.
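For concreteness, here's the sort of thing I mean, a rough sketch only, standard library Python. The start URL, the User-Agent trick, and the patterns for spotting the Next link and stripping tags are all guesses at what the book's pages actually look like, so they'd need adjusting against the real site:

import re
import urllib.request
from urllib.parse import urljoin

START = "http://www.slackware.com/book/"   # guessing at the entry URL

# Some PHP sites ignore requests that don't look like a browser,
# which may be all that's wrong with "simple http requests".
HEADERS = {"User-Agent": "Mozilla/5.0"}

def fetch(url):
    req = urllib.request.Request(url, headers=HEADERS)
    return urllib.request.urlopen(req).read().decode("latin-1", "replace")

def strip_tags(html):
    # crude: drop script/style blocks, then every remaining tag
    html = re.sub(r"(?is)<(script|style).*?</\1>", "", html)
    return re.sub(r"(?s)<[^>]+>", " ", html)

def next_link(html, base):
    # guess: the navigation link looks like <a href="...">Next</a>
    m = re.search(r'(?is)<a\s+href="([^"]+)"[^>]*>\s*next', html)
    return urljoin(base, m.group(1)) if m else None

url, seen, out = START, set(), open("book.txt", "w")
while url and url not in seen:         # stop if a page loops back on itself
    seen.add(url)
    page = fetch(url)
    out.write(strip_tags(page) + "\n\f\n")   # form feed between pages
    url = next_link(page, url)
out.close()

The result in book.txt would still need cleanup before printing, but it covers the retrieve/follow/strip/store loop in one pass.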
Any thoughts?
wget with the mirror option?
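Something along these lines, maybe (untested against that site; the extra flags just make the local copy complete and browsable offline):

wget --mirror --convert-links --page-requisites \
     --no-parent http://www.slackware.com/book/

If the pages come down as *.php, adding --html-extension saves them with a .html suffix so a local browser treats them right.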
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                                    [EMAIL PROTECTED]
Linux Step-by-step & TyGeMo:                   http://netllama.ipfox.com
8:35pm up 26 days, 21:04, 3 users, load average: 0.27, 0.07, 0.02
