Brian Volk wrote:
Hi All,

I'm still very much learning and need a little help getting started with my
next project... :-)

I manage a web site that has links to other web sites (MSDS - material
safety data sheets).  When the web sites that I'm linking to change their
pages, my links break.  I need to create a program that will check to see if
I have any broken links....

The web site was written in JavaScript.  The files that contain the URLs for
the MSDS links are just plain Notepad .txt files.  These files also contain
a short description of the product; for example:

J:\flash_host\ecomm\descriptions\product\small\70005.txt

contains.........

Non-acid disinfectant bathroom cleaner. Ready-to-use. Kills HBV and HCV on
inanimate surfaces. EPA Reg. #5741-18 ~
http://www.spartanchemical.com/sfa/MSDSRep.nsf/99b229d7d7868537852567e0006d7a64/671a7d07c725353885256e9f0063c4cc!OpenDocument



So I guess I need to create a program that will check each file in the
J:\flash_host\ecomm\descriptions\product\small directory, search for the
http://www string (?), then check (ping?) that URL? The next step would be to
write the file name (70005.txt) of each broken link to a file?? Does
this sound like I'm on the right track..? I have my trusty Learning Perl
book, so I'm not totally lost.

Sounds right. I suggest Regexp::Common for finding the URLs in the text files:

use Regexp::Common qw( URI );

/$RE{URI}{HTTP}/    # matches an HTTP URL in $_
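
To pull the matched URL out rather than just test for it, wrap the pattern
in capturing parentheses. A quick sketch, where $line stands in for a line
read from one of your .txt files:

use Regexp::Common qw( URI );

my ($url) = $line =~ /($RE{URI}{HTTP})/;    # capture lands in $url
print "Found: $url\n" if defined $url;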

and LWP for checking the links:

use LWP::Simple;
my $content = get( $url );
die "Couldn't get it!" unless defined $content;

ping will only tell you whether the host is up, not whether its web server is
running or whether a particular page is still there.
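
Putting the pieces together, here's a rough, untested sketch of the whole
thing ($dir and the broken_links.txt report name are just my placeholders;
adjust to taste):

#!/usr/bin/perl
use strict;
use warnings;
use Regexp::Common qw( URI );
use LWP::Simple;

# Perl is happy with forward slashes in Windows paths
my $dir = 'J:/flash_host/ecomm/descriptions/product/small';

open my $report, '>', 'broken_links.txt'
    or die "Can't write report: $!";

opendir my $dh, $dir or die "Can't open $dir: $!";
for my $file ( grep { /\.txt$/ } readdir $dh ) {
    open my $fh, '<', "$dir/$file" or die "Can't read $file: $!";
    while ( my $line = <$fh> ) {
        # a line may contain more than one URL, so loop with /g
        while ( $line =~ /($RE{URI}{HTTP})/g ) {
            my $url = $1;
            print $report "$file\t$url\n" unless head( $url );
        }
    }
    close $fh;
}
closedir $dh;
close $report;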

Randy.
