> I have pulled the search.html file as follows:
> I went to the link http://srch.overture.com, then searched for the word "help",
> then saved the result as a file named search.html.
> Then I wrote the script below to extract and find the URLs in this
> saved web page (which is not working very well).
>
> Part II
> I now want to pipe the saved file to the script from STDIN.
> The URLs found should be printed to the command line, each on a different line.
> The script should be general enough so that it can be tested with
> a different query, e.g. "help them".

This will produce a list of URLs:

#!/usr/local/bin/perl

use strict;
use warnings;

use HTML::LinkExtor;    # the script constructs HTML::LinkExtor, so load that module
use LWP::Simple;

# Single quotes stop Perl from interpolating $sessionid and $LME... as
# (undefined) variables; the URL must also be one unbroken string.
my $URL = 'http://srch.overture.com/d/search/;$sessionid$LME54CYADCNCWCQCBGWAPUQ?type=home&tm=1&Keywords=help';

my %seen;
my @LinksToProcess;

my $parser = HTML::LinkExtor->new(undef, $URL);
$parser->parse(get($URL))->eof; # offending line if not connected
my @links = $parser->links;

foreach my $linkarray (@links) {
        my @element = @$linkarray;
        shift @element;         # drop the tag name ('a', 'img', ...)
        while (@element) {
                my ($attr_name, $attr_value) = splice(@element, 0, 2);
                push @LinksToProcess, $attr_value unless $seen{$attr_value}++;
        }
}

print "$_\n" for @LinksToProcess;

__END__
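For Part II, a sketch along the same lines that reads the saved page from STDIN
instead of fetching it, so any saved query result can be piped in (the extraction
is factored into an extract_urls sub; with no base URL passed to HTML::LinkExtor,
links are printed exactly as they appear in the page):

```perl
#!/usr/local/bin/perl
# Usage: perl extract_urls.pl < search.html
use strict;
use warnings;

use HTML::LinkExtor;

# Pull every link attribute value out of an HTML string, deduplicated.
sub extract_urls {
        my ($html) = @_;
        my (%seen, @urls);
        my $parser = HTML::LinkExtor->new;
        $parser->parse($html);
        $parser->eof;
        for my $link ($parser->links) {
                my ($tag, %attrs) = @$link;   # e.g. ('a', href => '...')
                for my $url (values %attrs) {
                        push @urls, $url unless $seen{$url}++;
                }
        }
        return @urls;
}

my $html = do { local $/; <STDIN> };    # slurp the whole saved page
$html = '' unless defined $html;
print "$_\n" for extract_urls($html);
```

Tested the same way with a page saved for "help them", it should print those
URLs one per line with no change to the script.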

Gary
-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
