Hello,

We need to grab some data from a webpage fetched with the LWP module. The code is below. From $resultdata we need to regex out various pieces of data, indicated by the [ ] brackets (see the further explanation below). My regex skills are not very strong, and I need some help figuring out the best way to do this.
===============================================================
#!/usr/bin/perl
BEGIN { open (STDERR, ">./mandy_error.log"); }
use CGI::Carp qw(fatalsToBrowser);
use CGI qw(:standard);
use HTTP::Request;
use LWP::UserAgent;
use strict;

my $agent       = "Thunder Rain Scraper";
my $adminemail  = '[EMAIL PROTECTED]';
my $urltofetch = 'http://www.mandy.com/1/jobs2.cfm?terr=usny&skill=crw&paid=no&p=';

my $resultdata = fetch_results($urltofetch);

print header();

if(defined($resultdata))
{
 # process resulting data returned
   $resultdata =~ s/&amp;/&/ig;     # decode &amp; entities
   $resultdata =~ s/&nbsp;/ /ig;    # turn non-breaking spaces into plain spaces

   LOOP:
   for my $lines ( split(/\n/,$resultdata) )
   {
     if($lines =~ /<tr class=\"main\"/i)  # THIS IS NOT WORKING.
      {
          # DO STUFF HERE -
      }
   }
}
else
{
 print qq~\nNo Result Data Returned\r\n~;
}
print qq~\nProcess Completed\n~;
exit();

sub fetch_results {
   my $url = shift();

   # MAIN
   my $ua = LWP::UserAgent->new;  # create a new LWP agent
   $ua->from($adminemail);        # set HTTP From
   $ua->agent($agent);            # set Agent-Name

   # retrieve the file from $url
   my $request = HTTP::Request->new(GET => $url);
   my $response = $ua->request($request);

   # return content
   if ($response->is_success()) { return $response->content(); }
   else                         { return undef;                }
}

__END__
===================================================================
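One likely reason the split(/\n/, ...) loop gets nowhere is that each <tr class="main"> row in the sample spans several lines, so no single line holds a whole row: the opening tag may match, but its cells sit on the following lines. A minimal sketch that matches whole rows against the complete page instead; the extract_rows name and the test string are made up for illustration, and the tolerance for unquoted class values is an assumption about what the live page might contain.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Match whole rows against the complete page instead of one line at a time.
# /s lets "." cross newlines, /i ignores case, and /g in list context returns
# the captured body of every matching row. The attribute pattern also accepts
# unquoted or single-quoted class values (an assumption, not from the page).
sub extract_rows {
    my ($html) = @_;
    return $html =~ m{<tr\s+class\s*=\s*["']?main["']?(?:\s[^>]*)?>(.*?)</tr>}gsi;
}

# Made-up input: one row spread over several lines, plus an upper-case variant.
my $sample = qq{<tr class="main"><td valign="TOP">A</td>\n<td valign="TOP">B</td>\n</tr>\n}
           . qq{<TR CLASS=main><td>C</td></TR>\n};

my @rows = extract_rows($sample);
print scalar(@rows), " row(s) found\n";
```

In the original script, the for my $lines ( split(/\n/,$resultdata) ) loop could then become for my $row ( extract_rows($resultdata) ) { ... }, with each $row holding every cell of one table row.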

From the data returned, we need to filter out everything except what lies between <!-- START GRABBING RESULT HERE --> and <!-- END RESULT GRABBING HERE -->, and from that region grab the data within the [ ] brackets. I inserted those [ ] brackets for clarification; they're not normally there. We need to go through each <tr class="main">(.*?)</tr> table row up to the closing </table>.
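To keep only the part between the two comment markers, one option is a single non-greedy match with \Q...\E so the marker text is taken literally. A minimal sketch; between_markers is a hypothetical helper, and because the markers may have been added for illustration like the brackets, the marker strings should be replaced with text that really appears on the live page.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return only the text between two literal marker strings, or undef if
# either marker is missing. /s lets "." match across newlines, and the
# non-greedy (.*?) stops at the first end marker after the start marker.
sub between_markers {
    my ($html, $start, $end) = @_;
    return $html =~ m{\Q$start\E(.*?)\Q$end\E}s ? $1 : undef;
}

# Made-up page fragment shaped like the annotated sample.
my $page = "header\n<!-- START GRABBING RESULT HERE -->\n"
         . "<tr class=\"main\">...</tr>\n"
         . "<!-- END RESULT GRABBING HERE -->\n</table>\n";

my $block = between_markers($page,
    '<!-- START GRABBING RESULT HERE -->',
    '<!-- END RESULT GRABBING HERE -->');

print defined $block ? $block : "markers not found\n";
```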

######################################################################################
# FILTER TO RESULTS
... A BUNCH OF HEADER STUFF HERE ...

# START TABLE HERE
<table border="0" width="100%" cellpadding="5" cellspacing="0">
<tr class="dbluetoppedbox" bgcolor="#E6EFF8"><td valign="TOP">
<span class="main">Vacancy</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
&nbsp;&nbsp;&nbsp;&nbsp;</td><td valign="TOP"><span class="main">Employer</span>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</td><td valign="TOP" nowrap><span class="main">
Where&nbsp;(Ad&nbsp;posted)</span></td>
<td valign="TOP"><span class="main">Duration</span></td>
<td valign="TOP" nowrap><span class="main">Pay</span></td>
</tr>

<!-- START GRABBING RESULT HERE -->
<tr class="main"><td valign="TOP"><a href="[jobs3.cfm?v=18327933]">
[Camera Operator/ Video Editor]</a></td><td valign="TOP">[BigbreakNy]</td>
<td valign="TOP">[Manhattan and Union ]([30&nbsp;Aug ])</td>
<td valign="TOP">[ASAP&nbsp;/&nbsp;A few days of shooting]</td><td valign="TOP">[Lo/no]</td>
</tr>
# NEXT ROW CELL
<tr class="main"><td valign="TOP"><a href="[jobs3.cfm?v=18326674]">
[Video Sub]</a></td><td valign="TOP">[Blue Man Group]</td><td valign="TOP">[New York (30&nbsp;Aug)] </td><td valign="TOP">[ASAP&nbsp;/&nbsp;open ended]</td><td valign="TOP">[Paid]</td></tr>
# NEXT ROW CELL
......

<!-- END RESULT GRABBING HERE -->
</table>
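Putting the pieces together for the sample table: a minimal sketch that walks each <tr class="main"> row with a while (m//g) loop, collects its <td> cells, and pulls out the bracketed fields. It assumes the five-column layout shown above (vacancy link and title, employer, location with the posting date in parentheses, duration, pay). parse_rows, cell_text and the hash keys are hypothetical names chosen for illustration, and the test rows are the sample rows with the [ ] brackets removed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn one cell's HTML into plain text: drop tags, decode the two entities
# seen in the sample, then collapse and trim whitespace.
sub cell_text {
    my ($html) = @_;
    $html =~ s/<[^>]*>//g;
    $html =~ s/&nbsp;/ /gi;
    $html =~ s/&amp;/&/gi;
    $html =~ s/\s+/ /g;
    $html =~ s/^ //;
    $html =~ s/ $//;
    return $html;
}

# Parse every <tr class="main"> row into a hash. The five-column layout and
# the field names are assumptions taken from the annotated sample table.
sub parse_rows {
    my ($html) = @_;
    my @jobs;
    while ($html =~ m{<tr\s+class\s*=\s*["']?main["']?(?:\s[^>]*)?>(.*?)</tr>}gsi) {
        my $row   = $1;
        my @cells = $row =~ m{<td[^>]*>(.*?)</td>}gsi;
        next unless @cells >= 5;                      # skip malformed rows
        my ($link) = $cells[0] =~ m{<a\s+href="([^"]*)"}i;
        # "Where (Ad posted)": split location and date at the parentheses.
        my ($where, $posted) =
            cell_text($cells[2]) =~ /^(.*?)\s*\(\s*([^)]*?)\s*\)$/;
        push @jobs, {
            link     => $link,
            title    => cell_text($cells[0]),
            employer => cell_text($cells[1]),
            where    => defined $where ? $where : cell_text($cells[2]),
            posted   => $posted,
            duration => cell_text($cells[3]),
            pay      => cell_text($cells[4]),
        };
    }
    return @jobs;
}

my $sample = <<'HTML';
<tr class="main"><td valign="TOP"><a href="jobs3.cfm?v=18327933">
Camera Operator/ Video Editor</a></td><td valign="TOP">BigbreakNy</td>
<td valign="TOP">Manhattan and Union (30&nbsp;Aug )</td>
<td valign="TOP">ASAP&nbsp;/&nbsp;A few days of shooting</td><td valign="TOP">Lo/no</td>
</tr>
<tr class="main"><td valign="TOP"><a href="jobs3.cfm?v=18326674">
Video Sub</a></td><td valign="TOP">Blue Man Group</td><td valign="TOP">New York (30&nbsp;Aug) </td><td valign="TOP">ASAP&nbsp;/&nbsp;open ended</td><td valign="TOP">Paid</td></tr>
HTML

my @jobs = parse_rows($sample);
for my $job (@jobs) {
    print join(' | ', map { defined $job->{$_} ? $job->{$_} : '' }
                          qw(link title employer where posted duration pay)), "\n";
}
```

Relative links such as jobs3.cfm?v=18327933 can be made absolute with URI->new_abs($link, $urltofetch) from the URI module. If the live page uses more entities than &nbsp; and &amp;, decode_entities from HTML::Entities covers the full set. Regexes like these are tied to the exact markup, so they will need adjusting if the site changes its HTML.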

Mike(mickalo)Blezien
===============================
Thunder Rain Internet Publishing
Providing Internet Solutions that Work
http://www.thunder-rain.com
===============================
