Bruno,
I used CGI and LWP::Simple to get ESPN's AL East baseball standings table
and then rewrite that table into a format that I wanted. You can see it at
http://jasper.cs.yale.edu if you're curious (insert shameless plug for the Red
Sox here). Something like that might help you grab the source of an HTML
page. Here's the snippet that does the critical part; if you want my whole
script, let me know and I'll send it to you.
#!/usr/bin/perl
use strict;
use CGI;
use LWP::Simple;

# Die if the output file can't be opened.
open OUTFILE, ">/var/www/html/baseball.html"
    or die "Can't open output file: $!";

# get() returns the page body, or undef if the fetch fails.
my $content = get("http://sports.espn.go.com/mlb/standings");
die "Couldn't fetch the standings page" unless defined $content;
my @lines = split(/\n/, $content);
Obviously it may not be necessary for you to split the content up into an
array, but I found it useful for what I was doing.
As far as checking whether a page is available for browsing goes, you might
just check the contents for any one of <HTML>, <TITLE>, <HEAD>, <BODY>,
<A HREF, or <IMG (all case-insensitive, of course). Chances are pretty high
that a live page will contain at least one of those. There's got to be a
better solution, though: looking at the HTTP status code of the response.
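On your result-code question specifically: instead of LWP::Simple, use
LWP::UserAgent and check the response status. A minimal sketch (reusing the
ESPN URL from above; the 10-second timeout is just my choice, and note that
LWP follows 3xx redirects for you, so you'll mostly see 2xx, 4xx, or 5xx):

```perl
#!/usr/bin/perl
use strict;
use LWP::UserAgent;

my $ua  = LWP::UserAgent->new(timeout => 10);
my $res = $ua->get("http://sports.espn.go.com/mlb/standings");

# is_success is true for any 2xx status; 200 ("OK") is the usual one.
if ($res->is_success) {
    print "Available: ", $res->status_line, "\n";
}
else {
    # Typically 4xx (e.g. "404 Not Found") or 5xx (server trouble).
    print "Not available: ", $res->status_line, "\n";
}
```

If all you care about is "is it browsable?", the is_success test is the
whole answer; you only need the individual codes if you want to distinguish
a missing page from a broken server.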
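And on your other question, about reading the page title and meta tags: you
don't need to dig through HTTP:: for that. HTML::HeadParser (it ships with
the HTML::Parser distribution that libwww-perl uses) pulls them out of the
<head> for you. A rough sketch, using a canned page where you'd really pass
in the content you fetched with get():

```perl
use strict;
use HTML::HeadParser;

# Stand-in for content fetched with LWP::Simple's get().
my $content = <<'HTML';
<html><head>
<title>AL East Standings</title>
<meta name="description" content="Baseball standings">
</head><body>...</body></html>
HTML

my $p = HTML::HeadParser->new;
$p->parse($content);

# The title lands in the pseudo-header "Title"; a <meta name="NAME">
# tag becomes a header called "X-Meta-NAME".
print "Title: ",       $p->header('Title'), "\n";
print "Description: ", $p->header('X-Meta-Description'), "\n";
```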
Pete
Bruno Veldeman wrote:
> Hi,
>
> back on the same thing as before:
>
> See Orig msg below.
>
> As I only want to know if the page is available for browsing, what are the
> result codes I should look for.
>
> and another question:
>
> If I want to read the page title and meta tags, how do I get the content, I
> looked into HTTP:: But got lost, any ideas?
>
> Bruno
>
[ snip ]