date:20070502

[Boston.pm] [job] Software Engineer, superpages.com, Waltham, MA

2007-05-02 Thread Ronald J Kimball

Superpages.com is looking for a team player to work in a dynamic group of developers. We are a brand new company (recent spin-off). Regardless, we are a large, profitable and stable company. The office is located in Waltham, MA just next to exit 28b of I-95/128. Both senior and quick-learning juni

Re: [Boston.pm] Extract text from html preserving newlines

2007-05-02 Thread Tolkin, Steve

That worked. Thanks! Running lynx on my local copies of the *.html files works reasonably well, although the output is not what IE produces, and is harder for me to parse. A minor follow-up question. Currently I have to run lynx from its own directory. Otherwise I got \lynx_w32\lynx.bat f

Re: [Boston.pm] Extract text from html preserving newlines

2007-05-02 Thread Tolkin, Steve

Thanks Jerrad, I actually tried lynx first. However, the html files are on a server that needs authentication. Even adding -auth my-user-id:my-pw to lynx was not enough. Here is the lynx output (I added the # as these are comments in the Perl program): # Looking up [my proxy] # Making HTTP

Re: [Boston.pm] Extract text from html preserving newlines

2007-05-02 Thread Jerrad Pierce

NTLM is bad, 'm-k? -- Free map of local environmental resources: http://CambridgeMA.GreenMap.org -- MOTD on Boomtime, the 49th of Discord, in the YOLD 3173: It is useless for sheep to pass resolutions in favor of vegetarianism while wolves remain of a different opinion.

Re: [Boston.pm] Extract text from html preserving newlines

2007-05-02 Thread Chris Devers

On Wed, 2 May 2007, Tolkin, Steve wrote:
> Q1. Is there a way to automate IE or Mozilla Firefox to save 100's of
> files as text?
Probably, but might it be easier to automate using `lynx -dump` (or better still, `links -dump`) ? If those produce output the way you want them, automating it shoul
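
For anyone wanting to batch the `-dump` approach suggested here, the following is a minimal sketch, not anything posted in the thread. It assumes a text browser (`lynx` by default, or `links`) is on the PATH and that the *.html files are local copies; the names `dump_command` and `convert_all` are hypothetical helpers made up for illustration.

```python
import subprocess
from pathlib import Path


def dump_command(html_path, browser="lynx"):
    """Build the command line for one file, e.g. ['lynx', '-dump', 'a.html'].

    Only the -dump flag mentioned in the thread is used; any further
    options would be an assumption about the local setup.
    """
    return [browser, "-dump", str(html_path)]


def convert_all(src_dir, browser="lynx"):
    """Run the browser on every *.html file in src_dir.

    The rendered text is written next to each source file with a .txt
    suffix. Returns the list of files written.
    """
    written = []
    for html_path in sorted(Path(src_dir).glob("*.html")):
        result = subprocess.run(
            dump_command(html_path, browser),
            capture_output=True,
            text=True,
            check=True,  # raise if the browser exits with an error
        )
        out_path = html_path.with_suffix(".txt")
        out_path.write_text(result.stdout)
        written.append(out_path)
    return written
```

Calling `convert_all("pages")` would then leave one .txt file beside each .html file in a (hypothetical) `pages` directory. Pages behind authentication, as discussed elsewhere in the thread, are not handled by this sketch.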

Re: [Boston.pm] Extract text from html preserving newlines

2007-05-02 Thread Jerrad Pierce

lynx -dump

[Boston.pm] Extract text from html preserving newlines

2007-05-02 Thread Tolkin, Steve

I want to extract the text from several hundred *.html files. Many html tags cause a newline to appear in the output, e.g. etc. In Internet Explorer if I do "Files Save As..." and change "Save as Type" to be "Text File (*.txt)" the output file preserves newlines (and other whitespace) in a reas
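
One way to approach this without a browser is sketched below, assuming Python's standard `html.parser` module is acceptable. This is an illustration only, not the method the thread settled on, and it will not match Internet Explorer's "Save as Text" output exactly. The set of tags treated as line breaks (`BLOCK_TAGS`) and the names `TextExtractor` and `html_to_text` are assumptions chosen for the example.

```python
from html.parser import HTMLParser

# Tags assumed to start or end a line in the text output (an illustrative
# choice, not an exhaustive list).
BLOCK_TAGS = {
    "p", "br", "div", "li", "ul", "ol", "tr", "table", "pre",
    "blockquote", "h1", "h2", "h3", "h4", "h5", "h6",
}
SKIPPED_TAGS = {"script", "style"}


class TextExtractor(HTMLParser):
    """Collect text content, emitting a newline at block-level tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS and tag != "br":
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the text of an HTML string with newlines at block tags."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Normalize spaces within each line, then collapse runs of blank lines.
    lines = [" ".join(line.split()) for line in text.split("\n")]
    kept = []
    for line in lines:
        if line or (kept and kept[-1]):
            kept.append(line)
    return "\n".join(kept).strip()
```

For example, `html_to_text("<p>one</p><p>two<br>three</p>")` gives three lines of text with a blank line between the two paragraphs. Whitespace inside `<pre>` is normalized like everything else in this sketch, so preformatted blocks would need extra handling.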

7 matches
