On 04/04/03 19:39, Collins Richey wrote:
Now that I've looked at the problem a little more closely, I probably
need more than this.  The root of what I want to retrieve is
www.slackware.com/book, which is a php beast.  What I'm looking to do is

1. Retrieve the base page and follow all Next links, strip out all the
extra crap on each page, retain and format the text, and store the
result for printing.

2. I could do this with simple python tools for a normal html site, but
the [EMAIL PROTECTED] slackware site doesn't respond to simple http requests; even
the links are php commands.  A browser, of course, can wade through this
with ease, but I don't want to have to save each individual page as html
just to format it.

3. All this work because the Slack folks don't provide a printable
version.

Any thoughts?
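[Editor's note: step 1 above — fetch a page, keep the text, and find the
"Next" link to follow — can be sketched with the standard-library
HTMLParser.  This is a minimal sketch only: the PageScraper class, the
sample markup, and the assumption that the site labels its forward link
with the literal text "Next" are all hypothetical, since the real
slackware.com/book markup isn't shown here.]

```python
from html.parser import HTMLParser


class PageScraper(HTMLParser):
    """Collect visible text and the href of a link whose label is 'Next'.

    Hypothetical helper -- adjust the link-matching rule to whatever
    the real pages actually use.
    """

    def __init__(self):
        super().__init__()
        self.text_parts = []       # stripped text, in document order
        self.next_href = None      # href of the 'Next' link, if any
        self._current_href = None  # href of the <a> we are inside, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            self._current_href = None

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.text_parts.append(text)
            # Inside an <a> whose visible label is 'Next'? Remember it.
            if self._current_href and text.lower() == "next":
                self.next_href = self._current_href


# Stand-in markup; a real run would fetch each page with urllib.request
# and feed the response body instead.
sample = ('<html><body><p>Chapter text here.</p>'
          '<a href="book.php?page=2">Next</a></body></html>')
scraper = PageScraper()
scraper.feed(sample)
print(scraper.next_href)              # link to follow for the next page
print(" ".join(scraper.text_parts))   # retained text for later formatting
```

Looping is then just: fetch, feed, append the text to the output file,
and repeat until `next_href` comes back `None`.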

wget with the mirror option?
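[Editor's note: a possible invocation, using flags documented in the
wget manual.  Untested against slackware.com; the user-agent override
is a guess at working around a server that ignores non-browser clients,
and may or may not be enough for a php-driven site.]

```shell
# Mirror the book, rewriting links for local viewing and adding .html
# extensions so php-generated pages are browsable from disk.
wget --mirror --convert-links --html-extension \
     --user-agent="Mozilla/5.0" \
     http://www.slackware.com/book/
```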


--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                                    [EMAIL PROTECTED]
Linux Step-by-step & TyGeMo:                    http://netllama.ipfox.com

8:35pm up 26 days, 21:04, 3 users, load average: 0.27, 0.07, 0.02

_______________________________________________
Linux-users mailing list
[EMAIL PROTECTED]
Unsubscribe/Suspend/Etc -> http://www.linux-sxs.org/mailman/listinfo/linux-users
