I don't know much about PHP or Python, but with Perl I'd just open a socket and grab the page I want. For the basics of Perl sockets and LWP, look at:
http://www.itworld.com/nl/perl/05312001/pf_index.html

The URL for the book is http://www.slackware.com/book/index.php

Fetching the page yourself works because your script can mask itself as a browser (e.g. by sending a browser's User-Agent header), and it doesn't care how the links are formatted, so there should be no problem following the "Next" link or putting the TOC in a hash. They are even nice enough to give you <hr> rules to pattern match on, so you can grab what's between them.
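Since Collins planned to use Python, here's a rough sketch of the same idea using only the standard library. It fetches each page with a browser-like User-Agent, keeps the text between the first two <hr> rules, and follows the link whose text is "Next". Two swaps from what I described: it uses urllib instead of a raw socket, and html.parser instead of regex pattern matching. Assumptions, not checked against the live Slackware pages: the content sits between the first and second <hr>, the navigation link text is exactly "Next", and the User-Agent string is just an illustrative value. The names fetch, PageParser, parse_page and crawl are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen

# Assumption: any browser-like User-Agent string; this value is illustrative.
BROWSER_UA = "Mozilla/5.0 (X11; Linux i686)"


def fetch(url):
    """Download a page while presenting a browser-like User-Agent."""
    req = Request(url, headers={"User-Agent": BROWSER_UA})
    with urlopen(req) as resp:
        charset = resp.headers.get_content_charset() or "latin-1"
        return resp.read().decode(charset, errors="replace")


class PageParser(HTMLParser):
    """Collect text between the first and second <hr>, plus the 'Next' link href."""

    def __init__(self):
        super().__init__()
        self.hr_count = 0
        self.chunks = []
        self.next_href = None
        self._href = None        # href of the <a> currently open, if any
        self._link_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "hr":
            self.hr_count += 1
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self._link_text = []

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Assumption: the navigation link's text is exactly "Next".
            text = "".join(self._link_text).strip().lower()
            if self.next_href is None and text == "next":
                self.next_href = self._href
            self._href = None

    def handle_data(self, data):
        if self._href is not None:
            self._link_text.append(data)
        if self.hr_count == 1:   # only keep text between the first two <hr> rules
            self.chunks.append(data)


def parse_page(html):
    """Return (plain text of the content area, href of the Next link or None)."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    lines = [line.strip() for line in "".join(parser.chunks).splitlines()]
    return "\n".join(line for line in lines if line), parser.next_href


def crawl(start_url, max_pages=500, fetch_page=fetch):
    """Follow Next links from start_url and return all page texts joined together."""
    pages, seen, url = [], set(), start_url
    while url and url not in seen and len(pages) < max_pages:
        seen.add(url)
        text, next_href = parse_page(fetch_page(url))
        pages.append(text)
        # Relative links like "index.php?..." are resolved against the current page.
        url = urljoin(url, next_href) if next_href else None
    return "\n\n".join(pages)
```

Something like print(crawl("http://www.slackware.com/book/index.php")) would then give one plain-text stream you could feed to a2ps or lpr. The seen set and max_pages just stop a broken or looping Next chain from running forever, and fetch_page lets you swap in saved pages for testing.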
On Fri, 4 Apr 2003 20:39:46 -0700 - Collins Richey <[EMAIL PROTECTED]> wrote the following Re: Re: Printing web pages
>On Fri, 04 Apr 2003 17:39:19 -0800
>"Net Llama!" <[EMAIL PROTECTED]> wrote:
>
>> On 04/04/03 17:35, Collins Richey wrote:
>> > Are there any generalized utility programs that will grab a web
>> > page, extract the text, convert to a text (or fill-in-the-blanks)
>> > file for printing?
>> >
>> > I'm getting ready to work on some python code to do that for
>> > printing the Slackware users' manual, but it would be nice to have a
>> > real tool.
>>
>> html2jpeg creates jpegs (basically screenshots) of webpages:
>> http://freshmeat.net/projects/html2jpg/
>>
>> html2ps converts html to postscript
>> http://freshmeat.net/projects/html2ps/
>>
>> html2pdf
>> http://freshmeat.net/projects/html2pdf/
>>
>
>Thanks,
>
>Now that I've looked at the problem a little more closely, I probably
>need more than this. The root of what I want to retrieve is
>www.slackware.com/book which is a php beast. What I'm looking to do is
>
>1. Retrieve the base page and follow all Next links, strip out all the
>extra crap on each page, retain and format the text, and store the
>result for printing.
>
>2. I could do this with simple python tools for a normal html site, but
>the slackware site doesn't respond to simple http requests; even
>the links are php commands. A browser, of course, can wade through this
>with ease, but I don't want to have to save each individual page as html
>just to format it.
>
>3. All this work because the Slack folks don't provide a printable
>version.
>
>Any thoughts?
>
>-- 
>Collins - Slack 9.0 EXT3
>_______________________________________________
>Linux-users mailing list
>[EMAIL PROTECTED]
>Unsubscribe/Suspend/Etc ->
>http://www.linux-sxs.org/mailman/listinfo/linux-users
