Rob, thanks so much for helping me with this Perl task. I'm still going over your solution character by character to fully understand it. I really appreciate your efforts in working it out.
-Kevin

-----Original Message-----
From: Rob Dixon [mailto:[EMAIL PROTECTED]
Sent: Monday, September 25, 2006 5:22 PM
To: beginners@perl.org
Subject: Re: Need help with repeating match

Zembower, Kevin wrote:
>
> I'm trying to process a file that mostly has lines like:
> http://www.cpsp.edu.pk/jcpsp/ARCHIEVE/May2006/article5.pdf 342740
> http://www.scielo.br/pdf/bjid/v10n1/a04v10n1.pdf 342741
>
> However, it sometimes has more than one URL on a line, like:
> http://db.jhuccp.org/docs/732301.pdfhttp://db.jhuccp.org/docs/732301FRE.
> pdfhttp://db.jhuccp.org/docs/732301SPA.pdfhttp://db.jhuccp.org/docs/7323
> 01POR.pdf 16875
> http://db.jhuccp.org/docs/732302.pdfhttp://db.jhuccp.org/docs/732302FRE.
> pdfhttp://db.jhuccp.org/docs/732302POR.pdf 18024
>
> I want to capture the portion from the start of 'http://' to either the
> first whitespace or to the start of the next 'http://' and loop through,
> doing something with the portion captured, until it fails to capture any
> more. I also need the six digit number at the end inside each loop.
>
> I think the way to do this is with a look-ahead assertion, but don't
> understand this very well. I also don't understand how to write this so
> it doesn't fail to match the lines with only one URL in them. Can anyone
> give me a hand getting started with this task? Thanks for your help and
> suggestions.

This code is hopefully made a little more readable by first constructing a regex for a single URL and then using it in the global match to say that what we want is a URL followed by either another URL or whitespace.

Hope it does the trick.

Rob


use strict;
use warnings;

while (<DATA>) {
  # Trailing number at the end of the line
  my ($n) = /(\d+)\s*$/;
  # A single URL: 'http://' plus as few non-space characters as possible
  my $url = qr#http://\S*?#;
  # Every URL that is followed by another URL or by whitespace
  my @urls = m#$url(?=$url|\s)#g;
  print "$_\n" foreach @urls;
  print $n, "\n\n";
}

__DATA__
http://www.cpsp.edu.pk/jcpsp/ARCHIEVE/May2006/article5.pdf 342740
http://www.scielo.br/pdf/bjid/v10n1/a04v10n1.pdf 342741
http://db.jhuccp.org/docs/732301.pdfhttp://db.jhuccp.org/docs/732301FRE.pdfhttp://db.jhuccp.org/docs/732301SPA.pdfhttp://db.jhuccp.org/docs/732301POR.pdf 16875
http://db.jhuccp.org/docs/732302.pdfhttp://db.jhuccp.org/docs/732302FRE.pdfhttp://db.jhuccp.org/docs/732302POR.pdf 18024

**OUTPUT**

http://www.cpsp.edu.pk/jcpsp/ARCHIEVE/May2006/article5.pdf
342740

http://www.scielo.br/pdf/bjid/v10n1/a04v10n1.pdf
342741

http://db.jhuccp.org/docs/732301.pdf
http://db.jhuccp.org/docs/732301FRE.pdf
http://db.jhuccp.org/docs/732301SPA.pdf
http://db.jhuccp.org/docs/732301POR.pdf
16875

http://db.jhuccp.org/docs/732302.pdf
http://db.jhuccp.org/docs/732302FRE.pdf
http://db.jhuccp.org/docs/732302POR.pdf
18024

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>
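For anyone reading the thread and wondering why the non-greedy \S*? matters in Rob's regex, here is a minimal sketch that wraps the same two patterns in a subroutine and compares them with a greedy variant. The subroutine name extract_urls and the greedy comparison are illustrative assumptions and not part of the original post; the sample line is taken from the data in the thread.

```perl
use strict;
use warnings;

# extract_urls is a hypothetical helper name, not from the original post.
# It returns the trailing number followed by every URL found on the line.
sub extract_urls {
    my ($line) = @_;

    # One URL: 'http://' plus as few non-space characters as possible.
    my $url = qr{http://\S*?};

    # The look-ahead (?=...) checks what follows without consuming it,
    # so the next match can start right at the following 'http://'.
    my @urls = $line =~ m{$url(?=$url|\s)}g;

    # The number at the end of the line, ignoring trailing whitespace.
    my ($id) = $line =~ /(\d+)\s*$/;

    return ($id, @urls);
}

my $line = "http://db.jhuccp.org/docs/732302.pdf"
         . "http://db.jhuccp.org/docs/732302FRE.pdf"
         . "http://db.jhuccp.org/docs/732302POR.pdf 18024\n";

my ($id, @urls) = extract_urls($line);
print "$id: $_\n" for @urls;

# For contrast: a greedy \S* runs to the first whitespace, so the three
# run-together URLs come back as a single match.
my @greedy = $line =~ m{http://\S*(?=http://|\s)}g;
print scalar(@greedy), " match with greedy \\S*\n";
```

Run as is, this should print the three URLs, each prefixed with "18024: ", followed by "1 match with greedy \S*". Lines with a single URL work the same way, because the \s alternative in the look-ahead is satisfied by the whitespace before the number.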