That worked. Thanks! Running lynx on my local copies of the *.html files works reasonably well, although the output differs from what IE produces and is harder for me to parse.
A minor follow-up question. Currently I have to run lynx from its own
directory. Otherwise I get:

    \lynx_w32\lynx.bat foo.htm
    LINES value must be >= 2: got 1
    initscr(): LINES=1 COLS=1: too small.

Is there a way to set up lynx so I can run it from elsewhere? (A
possible wrapper sketch follows the quoted message below.)

Steve Tolkin
VP, Architecture
FESCo Architecture & Strategy Group
Fidelity Employer Services Company
400 Puritan Way M3B
Marlborough MA 01752
508-787-9006
[EMAIL PROTECTED]

-----Original Message-----
From: Chris Devers [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, May 02, 2007 1:53 PM
To: Tolkin, Steve
Cc: Boston Perl Mongers
Subject: Re: [Boston.pm] Extract text from html preserving newlines

On Wed, 2 May 2007, Tolkin, Steve wrote:

> Q1. Is there a way to automate IE or Mozilla Firefox to save 100's of
> files as text?

Probably, but might it be easier to automate using `lynx -dump` (or
better still, `links -dump`)?

If those produce output the way you want it, automating should be a
snap, even with just a simple shell script:

    $ for f in *.html; do links -dump "$f" > "${f}.txt"; done

Etc.

--
Chris Devers

DO NOT LEAVE IT IS NOT REAL
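One possible workaround for the LINES/COLS error, sketched below as a
small wrapper batch file. This is an untested guess: it assumes
lynx_w32 is installed at C:\lynx_w32 (adjust to the real path), and
that the failure comes from lynx neither finding its lynx.cfg nor
detecting a console size when launched from another directory. LYNX_CFG
is the environment variable lynx consults for its config file, and
LINES/COLUMNS may override the curses screen-size probe. The file name
lynxdump.bat is made up for illustration:

    @echo off
    rem lynxdump.bat -- hypothetical wrapper, a sketch only.
    rem Assumes lynx_w32 lives in C:\lynx_w32; adjust as needed.
    set LYNX_CFG=C:\lynx_w32\lynx.cfg
    rem Declare a screen size to sidestep "LINES=1 COLS=1: too small".
    set LINES=24
    set COLUMNS=80
    C:\lynx_w32\lynx.exe -dump %1

Put lynxdump.bat somewhere on your PATH and you should be able to run,
from any directory:

    lynxdump foo.htm > foo.txt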
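For the bulk conversion itself on Windows, a cmd.exe counterpart of the
shell loop quoted above (also a sketch; it assumes the hypothetical
lynxdump wrapper, or substitute the full path to lynx):

    rem At an interactive prompt; inside a .bat file use %%f instead.
    for %f in (*.html) do lynxdump "%f" > "%f.txt"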