On 26 Feb 2005 02:36:31 -0800, Paul Rubin <> wrote:
> Jorgen Grahn <[EMAIL PROTECTED]> writes:
>> You should probably do what some other poster suggested -- download
>> lynx or some other text-only browser and make your code execute it
>> in -dump mode to get the text-formatted html. You'll get that
>> working in an hour or so, and then you can see if you need something
>> more complicated.
>
> Lynx is pathetically slow for large files. It seems to use a
> quadratic algorithm for remembering where the links point, or
> something. I wrote a very crude but very fast renderer in C that I
> can post if someone wants it, which is what I use for this purpose.

That may be so, but it's fast enough for all the people who use it as a
general html->plaintext tool, so it's probably good enough for the OP.

w3m and links are other options. They provide better formatting than
lynx, and at least w3m has the -dump option.

I wouldn't mind if there was a reusable library for rendering HTML to
text, from various languages. I'd also like to see one (CSS-aware) for
rendering to troff or Postscript.

/Jorgen

-- 
  // Jorgen Grahn <jgrahn@       Ph'nglui mglw'nafh Cthulhu
\X/                algonet.se>   R'lyeh wgah'nagl fhtagn!
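
For the OP, a minimal sketch of the "execute it in -dump mode" approach from Python might look like the following. It assumes lynx or w3m is installed and on PATH; the names RENDERERS, pick_renderer and html_to_text are made up for this illustration, and the exact flags are the ones I believe the usual builds accept (check your local man pages). It is untested against large files, so it says nothing about the speed issue discussed above.

```python
import shutil
import subprocess

# Command lines that make each text-mode browser read HTML from stdin
# and write the rendered plain text to stdout. These flag sets are an
# assumption about common lynx/w3m builds, not something from the thread.
RENDERERS = {
    "lynx": ["lynx", "-dump", "-stdin", "-force_html"],
    "w3m": ["w3m", "-dump", "-T", "text/html"],
}


def pick_renderer(preferred=("w3m", "lynx")):
    """Return the first browser name from `preferred` found on PATH, else None."""
    for name in preferred:
        if shutil.which(name):
            return name
    return None


def html_to_text(html, renderer=None, timeout=30):
    """Render an HTML string to plain text by piping it through a browser."""
    name = renderer or pick_renderer()
    if name is None:
        raise RuntimeError("no text-mode browser (w3m or lynx) found on PATH")
    result = subprocess.run(
        RENDERERS[name],
        input=html.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
        check=True,  # raise CalledProcessError if the browser exits non-zero
    )
    return result.stdout.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(html_to_text("<h1>Title</h1><p>Some <b>bold</b> text.</p>"))
```

Preferring w3m over lynx here only reflects the formatting remark above; swap the order in `preferred` if lynx suits you better.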