Hello,
We need to extract some data from a web page fetched with the LWP module. The
code is below, and the data returned in $resultdata needs to be run through a
regex to pull out the various fields, which I have marked with [ ] brackets.
See below for further explanation.
My regex skills are not very strong, and I need some help figuring out the best
way to do this.
===============================================================
#!/usr/bin/perl
BEGIN { open (STDERR, ">./mandy_error.log"); }
use CGI::Carp qw(fatalsToBrowser);
use CGI qw(:standard);
use HTTP::Request;
use LWP::UserAgent;
use strict;
my $agent = "Thunder Rain Scraper";
my $adminemail = '[EMAIL PROTECTED]';
my $urltofetch =
'http://www.mandy.com/1/jobs2.cfm?terr=usny&skill=crw&paid=no&p=';
my $resultdata = fetch_results($urltofetch);
print header();
if(defined($resultdata))
{
# process resulting data returned
$resultdata =~ s/&amp;/&/ig;
$resultdata =~ s/&nbsp;/ /ig;
LOOP:
for my $lines ( split(/\n/,$resultdata) )
{
if($lines =~ /<tr class=\"main\"/i) # THIS IS NOT WORKING.
{
# DO STUFF HERE -
}
}
}
else
{
print qq~\nNo Result Data Returned\r\n~;
}
print qq~\nProcess Completed\n~;
exit();
sub fetch_results {
my $url = shift;
# MAIN
my $ua = LWP::UserAgent->new; # create a new LWP agent
$ua->from($adminemail); # set HTTP From
$ua->agent($agent); # set Agent-Name
# retrieve the file from $url
my $request = HTTP::Request->new(GET => $url);
my $response = $ua->request($request);
# return the content on success, undef otherwise
if ($response->is_success()) { return $response->content(); }
else { return undef; }
}
__END__
===================================================================
Now, from the data returned, we need to filter out everything except what lies
between <!-- START GRABBING RESULT HERE --> and <!-- END RESULT GRABBING HERE -->.
Within that block I need to grab the data inside the [ ] brackets. I inserted
those brackets for clarification only; they are not normally there. The code
should go through each <tr class="main"> (.*?)</tr> table row up to the end
of the </table>.
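For the first step, cutting the page down to the part between the two marker
comments, one possible approach is a single non-greedy match with the /s flag
so that "." also matches newlines. This is only a rough sketch: the helper name
extract_result_block and the tiny $sample string are made up for illustration,
and it assumes the markers appear in the real page exactly as in the sample.

```perl
use strict;
use warnings;

# Hypothetical helper: return only the text between the two marker comments,
# or undef (in scalar context) if either marker is missing.
sub extract_result_block {
    my ($html) = @_;
    return unless defined $html;
    # /s lets "." match newlines; /i ignores case; \s+ tolerates line wraps
    # inside the comments. "GRABBING" is optional in the END marker.
    if ( $html =~ /<!--\s*START\s+GRABBING\s+RESULT\s+HERE\s*-->(.*?)<!--\s*END\s+RESULT\s+(?:GRABBING\s+)?HERE\s*-->/is ) {
        return $1;
    }
    return;
}

# Made-up sample standing in for the fetched page.
my $sample = join "\n",
    '<tr><td>header row</td></tr>',
    '<!-- START GRABBING RESULT HERE -->',
    '<tr class="main"><td>row one</td></tr>',
    '<!-- END RESULT GRABBING HERE -->',
    '</table>';

my $block = extract_result_block($sample);
print defined $block ? $block : "markers not found", "\n";
```

Matching against the whole string, rather than splitting on newlines first,
avoids the problem that a single table row can span several lines.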
######################################################################################
# FILTER TO RESULTS
... A BUNCH HEADER STUFF HERE ....
# START TABLE HERE
<table border="0" width="100%" cellpadding="5" cellspacing="0">
<tr class="dbluetoppedbox" bgcolor="#E6EFF8"><td valign="TOP">
<span
class="main">Vacancy</span>
</td><td valign="TOP"><span class="main">Employer</span>
</td><td valign="TOP" nowrap><span
class="main">
Where (Ad posted)</span></td>
<td valign="TOP"><span class="main">Duration</span></td>
<td valign="TOP" nowrap><span class="main">Pay</span></td>
</tr>
<!-- START GRABBING RESULT HERE -->
<tr class="main"><td valign="TOP"><a href="[jobs3.cfm?v=18327933]">
[Camera Operator/ Video Editor]</a></td><td valign="TOP">[BigbreakNy]</td>
<td valign="TOP">[Manhattan and Union ]([30 Aug ])</td>
<td valign="TOP">[ASAP / A few days of shooting]</td><td
valign="TOP">[Lo/no]</td>
</tr>
# NEXT ROW CELL
<tr class="main"><td valign="TOP"><a href="[jobs3.cfm?v=18326674]">
[Video Sub]</a></td><td valign="TOP">[Blue Man Group]</td><td valign="TOP">[New
York (30 Aug)]
</td><td valign="TOP">[ASAP / open ended]</td><td
valign="TOP">[Paid]</td></tr>
# NEXT ROW CELL
......
<!-- END RESULT GRABBING HERE -->
</table>
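To make the goal concrete, here is a rough sketch of how each row in the sample
above might be turned into a set of fields. It is not a tested solution against
the live page: the helper names clean_text and parse_rows, the hash keys, and
the assumption that every row has exactly the five cells shown above are all
mine. The sample row is copied from above with the [ ] brackets removed.

```perl
use strict;
use warnings;

# Hypothetical helper: strip tags, collapse whitespace, trim both ends.
sub clean_text {
    my ($text) = @_;
    $text =~ s/<[^>]*>//g;
    $text =~ s/\s+/ /g;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

# Hypothetical helper: turn each <tr class="main"> row into a hash reference.
sub parse_rows {
    my ($block) = @_;
    my @jobs;
    while ( $block =~ m{<tr\s+class="main"[^>]*>(.*?)</tr>}gis ) {
        my $row   = $1;
        my @cells = $row =~ m{<td[^>]*>(.*?)</td>}gis;
        next unless @cells >= 5;    # skip rows that do not look like a job
        my ($href) = $cells[0] =~ m{<a\s+href="([^"]*)"}i;
        my %job = (
            link     => $href,
            title    => clean_text( $cells[0] ),
            employer => clean_text( $cells[1] ),
            where    => clean_text( $cells[2] ),
            duration => clean_text( $cells[3] ),
            pay      => clean_text( $cells[4] ),
        );
        # Split "Place (30 Aug)" into place and posting date when present.
        if ( $job{where} =~ /^(.*?)\s*\(\s*([^)]*?)\s*\)\s*$/ ) {
            @job{qw(where posted)} = ( $1, $2 );
        }
        push @jobs, \%job;
    }
    return @jobs;
}

my $block = <<'HTML';
<tr class="main"><td valign="TOP"><a href="jobs3.cfm?v=18327933">
Camera Operator/ Video Editor</a></td><td valign="TOP">BigbreakNy</td>
<td valign="TOP">Manhattan and Union (30 Aug )</td>
<td valign="TOP">ASAP / A few days of shooting</td><td
valign="TOP">Lo/no</td>
</tr>
HTML

for my $job ( parse_rows($block) ) {
    my @fields = map { defined $job->{$_} ? $job->{$_} : '' }
        qw(link title employer where posted duration pay);
    print join( " | ", @fields ), "\n";
}
```

Regexes like these are fragile if the site changes its markup, so a real HTML
parser from CPAN (HTML::TableExtract or HTML::TreeBuilder, for example) may be
the more robust choice.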
Mike(mickalo)Blezien
===============================
Thunder Rain Internet Publishing
Providing Internet Solution that Work
http://www.thunder-rain.com
===============================