Re: How to extract the exact URL

Chris Devers Wed, 15 Sep 2004 04:31:48 -0700

On Wed, 15 Sep 2004, Ing. Branislav Gerzo wrote:

> but why is yout regexp so difficult ? what about this ?
> 
> if ( $line =~ /href="([^")">2</ )


Typo? That bracket is unbalanced. Try this:

    if ( $line =~ /href="([^"]+)">2</ )

But even that I'm not sure about -- what are you trying to match after 
the end of the first tag? This will only match links like

    <a href="foo">2</a>

but not anything like

    <a href="foo">bar</a>

That isn't what you want, is it?
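
Here is a quick sketch to check that for yourself. The two sample lines are made up for illustration and the variable names are only placeholders; the pattern is the corrected one from above.

```perl
use strict;
use warnings;

# Hypothetical sample lines, made up for illustration.
my @lines = (
    '<a href="foo">2</a>',
    '<a href="foo">bar</a>',
);

my @matched;
for my $line (@lines) {
    # The trailing ">2<" means the link text must be exactly "2".
    if ( $line =~ /href="([^"]+)">2</ ) {
        push @matched, $line;
    }
}

# Prints only the lines whose link text is "2".
print "$_\n" for @matched;
```

Only the first sample line should come through, because the second has "bar" rather than "2" as its link text.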

If you're trying to extract the url, you shouldn't have to put it in an 
if statement like that. It seems like this ought to work:

    @urls = ( $line =~ m/href="([^"]+)"/g );

And now, @urls should contain all the urls that have been linked.
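
As a self-contained sketch of that, assuming a made-up input line (the example.com URLs are placeholders, not anything from your data):

```perl
use strict;
use warnings;

# Hypothetical input line with two links.
my $line = '<a href="http://example.com/a">1</a> '
         . '<a href="http://example.com/b">2</a>';

# In list context, m//g returns the captured group from every match,
# so no if statement or loop is needed to collect them.
my @urls = ( $line =~ m/href="([^"]+)"/g );

print "$_\n" for @urls;
```

This relies on every href value being wrapped in double quotes, as in the pattern above; single-quoted or unquoted attributes would not be picked up.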


I'm not sure if I understand what is meant by

> I just want to extract the first URL which fits my condition, so I 
> use:
> if($line=~m/href="((?:[^"\\]|\\.)*)">2/)
>
> But use this method the second also fit it.
>

Is the intent to only capture one of each copy of each URL, or was there 
something special about the visible text that has to be accounted for? 
If you only want one of each URL, the easiest approach may be to match 
all of the URLs into an array -- as noted above -- and then go over that 
array to remove any duplicates. Here's one way to do that:

    my %seen; # container for URLs we've seen before
    my @unique_urls = grep( !$seen{$_}++, @urls );

See perlfaq4 in the Perl documentation for details:

<http://www.perldoc.com/perl5.8.4/pod/perlfaq4.html#How-can-I-remove-duplicate-elements-from-a-list-or-array->
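
Putting the two steps together, a minimal sketch might look like this. The input line and its URLs are invented for illustration; the block form of grep used here does the same job as the expression form above.

```perl
use strict;
use warnings;

# Hypothetical input in which the same URL is linked twice.
my $line = '<a href="/page2">2</a> <a href="/page3">3</a> '
         . '<a href="/page2">next</a>';

# Step 1: collect every linked URL, duplicates included.
my @urls = ( $line =~ m/href="([^"]+)"/g );

# Step 2: keep a URL only the first time it is seen.
my %seen;    # counts how many times each URL has come up so far
my @unique_urls = grep { !$seen{$_}++ } @urls;

print "$_\n" for @unique_urls;
```

Because grep walks the list in order, the surviving URLs stay in order of first appearance, so $unique_urls[0] is still the first URL found on the line.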


-- 
Chris Devers
