On Wed, 15 Sep 2004, Ing. Branislav Gerzo wrote:
> but why is yout regexp so difficult ? what about this ?
>
> if ( $line =~ /href="([^")">2</ )
Typo? That bracket is unbalanced. Try this:
if ( $line =~ /href="([^"]+)">2</ )
But even that I'm not sure about -- what are you trying to match after
the end of the first tag? This will only match links like
<a href="foo">2</a>
but not anything like
<a href="foo">bar</a>
That isn't what you want, is it?
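To make that concrete, here is a minimal sketch using the two sample links above. The names @lines and @matched are made up for illustration; only the pattern comes from this thread:

```perl
use strict;
use warnings;

# The two sample links from above; @lines and @matched are
# illustrative names, not anything from the original script.
my @lines = (
    '<a href="foo">2</a>',
    '<a href="foo">bar</a>',
);

# Keep only the lines where the link text is exactly "2".
my @matched = grep { /href="([^"]+)">2</ } @lines;

print "$_\n" for @matched;    # prints only <a href="foo">2</a>
```

The second line is skipped because the pattern insists on a literal 2 right after the closing quote and bracket.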
If you're trying to extract the URL, you shouldn't have to put it in an
if statement like that. It seems like this ought to work:
@urls = ( $line =~ m/href="([^"]+)"/g );
And now, @urls should contain all the URLs that have been linked.
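As a rough, self-contained sketch of that idea (the contents of $line here are invented for the example, not taken from your data):

```perl
use strict;
use warnings;

# Hypothetical input line, made up for this example.
my $line = '<a href="a.html">1</a> <a href="b.html">2</a> <a href="a.html">3</a>';

# In list context, m//g returns the captured group from every match,
# so there is no need for an if statement or a loop.
my @urls = ( $line =~ m/href="([^"]+)"/g );

print "$_\n" for @urls;    # a.html, b.html, a.html (one per line)
```

Note that a.html shows up twice, because the match returns every link in the order it appears in the line.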
I'm not sure if I understand what is meant by
> I just want to extract the first URL which fits my condition, so I
> use:
> if($line=~m/href="((?:[^"\\]|\\.)*)">2/)
>
> But use this method the second also fit it.
>
Is the intent to only capture one of each copy of each URL, or was there
something special about the visible text that has to be accounted for?
If you only want one of each URL, the easiest approach may be to match
all of the URLs into an array -- as noted above -- and then go over that
array to remove any duplicates. Here's one way to do that:
my %seen; # container for URLs we've seen before
my @unique_urls = grep( !$seen{$_}++, @urls );
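Here is the same idea as a small runnable sketch. The list in @urls is hypothetical sample data standing in for whatever your match returned:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the matched URLs.
my @urls = ( 'a.html', 'b.html', 'a.html', 'c.html', 'b.html' );

my %seen;    # URL => number of times we have met it so far

# $seen{$_}++ is false (0) the first time a URL appears and true
# afterwards, so grep keeps only the first occurrence of each URL.
my @unique_urls = grep { !$seen{$_}++ } @urls;

print "@unique_urls\n";    # a.html b.html c.html
```

The original order is preserved, and as a side effect %seen ends up holding a count of how many times each URL appeared.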
See perldoc for details:
<http://www.perldoc.com/perl5.8.4/pod/perlfaq4.html#How-can-I-remove-duplicate-elements-from-a-list-or-array->
--
Chris Devers