RE: Need help with repeating match

Zembower, Kevin Tue, 26 Sep 2006 08:54:19 -0700

Rob, thanks so much for helping me with this perl task. I'm still going
over your solution character-by-character to fully understand it. I
really appreciate your efforts in working it out.


-Kevin

-----Original Message-----
From: Rob Dixon [mailto:[EMAIL PROTECTED] 
Sent: Monday, September 25, 2006 5:22 PM
To: beginners@perl.org
Subject: Re: Need help with repeating match

Zembower, Kevin wrote:
 >
 > I'm trying to process a file that mostly has lines like:
 > http://www.cpsp.edu.pk/jcpsp/ARCHIEVE/May2006/article5.pdf      342740
 > http://www.scielo.br/pdf/bjid/v10n1/a04v10n1.pdf        342741
 >
 > However, it sometimes has more than one URL on a line, like:
 >
 > http://db.jhuccp.org/docs/732301.pdfhttp://db.jhuccp.org/docs/732301FRE.pdfhttp://db.jhuccp.org/docs/732301SPA.pdfhttp://db.jhuccp.org/docs/732301POR.pdf        16875
 > http://db.jhuccp.org/docs/732302.pdfhttp://db.jhuccp.org/docs/732302FRE.pdfhttp://db.jhuccp.org/docs/732302POR.pdf      18024
 >
 > I want to capture the portion from the start of 'http://' to either the
 > first whitespace or to the start of the next 'http://' and loop through,
 > doing something with the portion captured, until it fails to capture any
 > more. I also need the six digit number at the end inside each loop.
 >
 > I think the way to do this is with a look-ahead assertion, but don't
 > understand this very well. I also don't understand how to write this so
 > it doesn't fail to match the lines with only one URL in them. Can anyone
 > give me a hand getting started with this task? Thanks for your help and
 > suggestions.


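This code is hopefully made a little more readable by first constructing
a regex for a single URL and then using it in the global match to say
that what we want is a URL followed by either another URL or whitespace.
The look-ahead only checks what follows without consuming it, so the
next URL is still there for the next pass of the global match.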

Hope it does the trick.

Rob



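use strict;
use warnings;

# A single URL: 'http://' plus as few non-space characters as possible.
my $url = qr#http://\S*?#;

while (<DATA>) {

   # The number at the end of the line.
   my ($n) = /(\d+)\s*$/;

   # Every URL that is followed by another URL or by whitespace.
   my @urls = m#$url(?=$url|\s)#g;

   print "$_\n" foreach @urls;
   print $n, "\n\n";
}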


__DATA__
http://www.cpsp.edu.pk/jcpsp/ARCHIEVE/May2006/article5.pdf      342740
http://www.scielo.br/pdf/bjid/v10n1/a04v10n1.pdf        342741
http://db.jhuccp.org/docs/732301.pdfhttp://db.jhuccp.org/docs/732301FRE.pdfhttp://db.jhuccp.org/docs/732301SPA.pdfhttp://db.jhuccp.org/docs/732301POR.pdf        16875
http://db.jhuccp.org/docs/732302.pdfhttp://db.jhuccp.org/docs/732302FRE.pdfhttp://db.jhuccp.org/docs/732302POR.pdf      18024


**OUTPUT**

http://www.cpsp.edu.pk/jcpsp/ARCHIEVE/May2006/article5.pdf
342740

http://www.scielo.br/pdf/bjid/v10n1/a04v10n1.pdf
342741

http://db.jhuccp.org/docs/732301.pdf
http://db.jhuccp.org/docs/732301FRE.pdf
http://db.jhuccp.org/docs/732301SPA.pdf
http://db.jhuccp.org/docs/732301POR.pdf
16875

http://db.jhuccp.org/docs/732302.pdf
http://db.jhuccp.org/docs/732302FRE.pdf
http://db.jhuccp.org/docs/732302POR.pdf
18024
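
For anyone reading the pattern character by character, here is a sketch
of the same match written with the /x modifier so that each piece can
carry its own comment. It is only an illustration: the parse_line helper
and the sample line passed to it are assumptions made for this example
and are not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns the trailing
# number followed by the list of URLs found on one input line.
sub parse_line {
    my ($line) = @_;

    # Trailing run of digits, optionally followed by whitespace.
    my ($n) = $line =~ /(\d+)\s*$/;

    # No capture groups, so in list context /g returns each whole match.
    my @urls = $line =~ m{
        http://      # literal start of a URL
        \S*?         # as few non-whitespace characters as possible
        (?=          # look ahead without consuming anything:
            http://  #   the start of the next URL
          | \s       #   or whitespace, which ends the last URL
        )
    }gx;

    return ($n, @urls);
}

# Sample line assumed for this example (two URLs run together).
my ($n, @urls) = parse_line(
    "http://db.jhuccp.org/docs/732302.pdfhttp://db.jhuccp.org/docs/732302FRE.pdf      18024\n"
);
print "$_\n" for @urls;
print "$n\n";
```

Run as is, this prints the two URLs on separate lines and then 18024.
A URL is only found when whitespace or another 'http://' follows it, so
a URL at the very end of a string with no trailing newline would be
missed; adding \z as a third alternative inside the look-ahead should
cover that case if it ever comes up.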

-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>
