That worked.  Thanks! 

Running lynx on my local copies of the *.html files works reasonably
well, although the output differs from what IE produces and is harder
for me to parse.

A minor follow-up question.  Currently I have to run lynx from its own
directory.  Otherwise I get

\lynx_w32\lynx.bat foo.htm 
LINES value must be >= 2: got 1
initscr(): LINES=1 COLS=1: too small.

Is there a way to set up lynx to let me run it from elsewhere?
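
One idea I may try is a small wrapper batch file that sets the
environment lynx seems to want before invoking it.  This is an
untested sketch; the paths below are just my guesses at the default
lynx_w32 layout, and the LINES/COLUMNS settings are a guess at what
the curses error is complaining about:

    @echo off
    rem Hypothetical wrapper -- paths assume the default \lynx_w32 layout.
    rem Point lynx at its config file so it need not run from its own
    rem directory.
    set LYNX_CFG=\lynx_w32\lynx.cfg
    rem Give curses an explicit screen size, since the error suggests
    rem it could not detect one.
    set LINES=25
    set COLUMNS=80
    \lynx_w32\lynx.exe %*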

Steve Tolkin
VP, Architecture   FESCo Architecture & Strategy Group   Fidelity
Employer Services Company
400 Puritan Way   M3B   Marlborough MA 01752   508-787-9006
[EMAIL PROTECTED]


-----Original Message-----
From: Chris Devers [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, May 02, 2007 1:53 PM
To: Tolkin, Steve
Cc: Boston Perl Mongers
Subject: Re: [Boston.pm] Extract text from html preserving newlines

On Wed, 2 May 2007, Tolkin, Steve wrote:

> Q1. Is there a way to automate IE or Mozilla Firefox to save 100's of
> files as text?

Probably, but might it be easier to automate using `lynx -dump` (or
better still, `links -dump`)?

If those produce output the way you want it, automating should be a
snap, even with just a simple shell script.

    $ for f in *.html; do links -dump "$f" > "${f}.txt"; done

Etc.
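
And since your setup sounds like Windows (lynx_w32), a rough cmd.exe
equivalent might look like the following.  This is a sketch, not
tested here: the \lynx_w32 path is an assumption, lynx.bat is assumed
to pass its arguments through, and inside a .bat file each % would
need to be doubled to %%.

    C:\> for %f in (*.html) do \lynx_w32\lynx.bat -dump "%f" > "%f.txt"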


-- 
Chris Devers
DO NOT LEAVE IT IS NOT REAL

 
