On Fri, 04 Apr 2003 17:39:19 -0800
"Net Llama!" <[EMAIL PROTECTED]> wrote:

> On 04/04/03 17:35, Collins Richey wrote:
> > Are there any generalized utility programs that will grab a web
> > page, extract the text, convert to a text (or fill-in-the-blanks)
> > file for printing?
> > 
> > I'm getting ready to work on some python code to do that for
> > printing the Slackware users' manual, but it would be nice to have a
> > real tool.
> 
> html2jpeg creates jpegs (basically screenshots) of webpages:
> http://freshmeat.net/projects/html2jpg/
> 
> html2ps converts html to postscript
> http://freshmeat.net/projects/html2ps/
> 
> html2pdf
> http://freshmeat.net/projects/html2pdf/
> 

Thanks,

Now that I've looked at the problem a little more closely, I probably
need more than this.  The root of what I want to retrieve is
www.slackware.com/book, which is a php beast.  What I'm looking to do is

1. Retrieve the base page and follow all Next links, strip out all the
extra crap on each page, retain and format the text, and store the
result for printing.

2. I could do this with simple python tools for a normal html site, but
the [EMAIL PROTECTED] slackware site doesn't respond to simple HTTP requests; even
the links are php commands.  A browser, of course, can wade through this
with ease, but I don't want to have to save each individual page as html
just to format it.

3. All this work because the Slack folks don't provide a printable
version.
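For what it's worth, steps 1 and 2 can be sketched with nothing but the
Python standard library.  This is a minimal sketch, not a finished tool:
it assumes the navigation link literally reads "Next" and that relative
php links can be resolved with urljoin — both are guesses about the
site's markup that you'd need to verify.  The fetch function is injected
so the crawl logic can be exercised without the network.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageScraper(HTMLParser):
    """Collect visible text and the href of a link whose text is 'Next'."""

    def __init__(self):
        super().__init__()
        self.text_parts = []          # visible text fragments, in order
        self.next_href = None         # href of the 'Next' link, if any
        self._current_href = None     # href of the <a> we are inside
        self._skip_depth = 0          # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            self._current_href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "a":
            self._current_href = None

    def handle_data(self, data):
        if self._skip_depth:
            return                    # drop script/style contents
        text = data.strip()
        if not text:
            return
        if self._current_href and text.lower() == "next":
            self.next_href = self._current_href
            return                    # drop the nav link's own text
        self.text_parts.append(text)


def scrape(html):
    """Return (plain_text, next_href) for one page of html."""
    p = PageScraper()
    p.feed(html)
    return "\n".join(p.text_parts), p.next_href


def crawl(start_url, fetch):
    """Follow 'Next' links from start_url; return the concatenated text.

    `fetch` is any callable url -> html string, so a real run would pass
    a urllib.request wrapper and a test can pass a dict lookup.
    """
    url, pages, seen = start_url, [], set()
    while url and url not in seen:    # `seen` guards against loops
        seen.add(url)
        text, nxt = scrape(fetch(url))
        pages.append(text)
        url = urljoin(url, nxt) if nxt else None
    return "\n\n".join(pages)
```

For the real fetch, urllib.request would do; if the site really is
refusing plain requests, one common (hypothesized) cause is that it
rejects the default Python User-Agent, which you can override via a
Request object's headers before assuming anything more exotic about the
php side.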

Any thoughts?

--
Collins - Slack 9.0 EXT3
_______________________________________________
Linux-users mailing list
[EMAIL PROTECTED]
Unsubscribe/Suspend/Etc -> http://www.linux-sxs.org/mailman/listinfo/linux-users
