If you can assume the info you want is linked from each school's main page (either directly or by proxy), then you should be able to keep a file or database of each school's URL. Then use one of the various modules (Win32::Internet, LWP, etc.) to fetch the webpage, and parse through it to find links to more pages. I would add each URL you find to an array. Once you have crawled each page listed in the array, go back to your master list and move on to the next school.
Also look at HTML::Parser and related modules.
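The steps above might look something like this. This is a minimal sketch, not a drop-in crawler: it uses LWP::UserAgent, HTML::LinkExtor (part of the HTML::Parser distribution), and URI as suggested above, and the school URL at the bottom is a hypothetical placeholder.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;
use URI;

# Extract absolute <a href> links from a page, keeping only those
# under $base so the crawl stays on the school's own site.
sub extract_links {
    my ($html, $base) = @_;
    my @links;
    my $extor = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        return unless $tag eq 'a' and defined $attr{href};
        my $abs = URI->new_abs($attr{href}, $base)->as_string;
        push @links, $abs if index($abs, $base) == 0;
    });
    $extor->parse($html);
    $extor->eof;
    return @links;
}

# Crawl one school: fetch each queued page, pull out the data you
# want, and queue any new on-site links the page contains.
sub crawl_school {
    my ($start) = @_;
    my $ua    = LWP::UserAgent->new(timeout => 30);
    my @queue = ($start);
    my %seen;
    while (my $url = shift @queue) {
        next if $seen{$url}++;          # skip pages already fetched
        my $resp = $ua->get($url);
        next unless $resp->is_success;
        # ... parse $resp->content here for the info you need ...
        push @queue, extract_links($resp->content, $start);
    }
}

# Master list -- one entry per school (hypothetical URL):
# crawl_school($_) for ('http://www.example-university.edu/');
```

The same extract_links/crawl_school pair works for every school, so you only need one script plus the master list, rather than a separate script per site.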
> -----Original Message-----
> From: bruce [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, May 27, 2004 11:42 AM
> To: [EMAIL PROTECTED]
> Subject: perl/spider/crawling question...
>
>
> hi...
>
> we're looking at creating a project/app to extract information from
> university websites. we know we can write a separate individual perl
> app/script for each school which would crawl/parse/extract the
> information we
> need. however, we'd rather not write a unique perl script for
> each school if
> there is a better/more efficient way.
>
> anybody have any good suggestions, preferably with code samples!!
>
> thanks for any help/assistance/pointers/etc...
>
> bruce
> [EMAIL PROTECTED]
>
>
> _______________________________________________
> Perl-Win32-Users mailing list
> [EMAIL PROTECTED]
> To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
>
