I'll offer the following code to get you started on the task, and invite
critiques/improvements!
(Cribbed from various sources and 'tuned'. Note that the URL may be delimited
by either apostrophes or double quotes, or left unquoted.)

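 // Capture every href value; the quote around it may be ' or " or absent
 $bValidity      = $iFound
                 = preg_match_all( "/(href *= *['\"]?)([^'\" >]*)(['\" >])/i",
                                   $HTML, $aRegExOut );
 if ( 0          < $iFound )
 {
   $aA          = $aRegExOut[2];   // group 2 holds the bare URLs
   // DEBUG and ShowList() are application-defined aids, not PHP built-ins
   if ( DEBUG ) { ShowList( "Located", $aA ); }
   // ...process each URL in $aA here
 }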

BTW, this covers finding multiple links in one piece of HTML. For a single
link it can be dialed back to preg_match().
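For anyone who wants to try the pattern on its own, here is a self-contained sketch that wraps it in a function. The name extract_hrefs() is hypothetical, the DEBUG/ShowList() hooks are left out, and the type declarations assume PHP 7 or later:

```php
<?php
// Hypothetical wrapper around the pattern above: returns every href value
// found in $html, whether it was quoted with ' or " or left unquoted.
function extract_hrefs(string $html): array
{
    $found = preg_match_all("/(href *= *['\"]?)([^'\" >]*)(['\" >])/i", $html, $matches);
    // preg_match_all() returns false on error and 0 when nothing matches.
    if (!$found) {
        return [];
    }
    // Group 2 holds the bare URLs, without the delimiting quotes.
    return $matches[2];
}

$html = '<a href="http://www.somesite.com" target="_new">one</a> '
      . "<A HREF='page.php'>two</A> <a href=other.php>three</a>";
print_r(extract_hrefs($html));
```

Because the character class stops at a quote, a space or a >, an unquoted URL that contains a space would be cut short; that is a limitation of the pattern rather than of PHP.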

The rest I'll leave to you.

Regards,
=dn

----- Original Message -----
From: "SpamSucks86" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: 20 February 2002 21:16
Subject: RE: [PHP] regexp on user supplied link


> I absolutely hate regular expressions because I suck at writing
> them... but I can help you with the logic. I was thinking: search for a
> pattern which matches HREF=" + any number of characters + ". Your match
> would be HREF="blahblahblah". Then you could chop off the HREF=" and
> the trailing ", leaving just the URL. Then use the built-in URL parser
> function, parse_url(). If there is no host, it's obviously a relative
> link; otherwise, you can just check whether the host matches yours.
> This should work well. Good luck
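A minimal sketch of that parse_url() logic, assuming a hypothetical is_internal_link() helper and a list $myHosts holding the host names your own site uses:

```php
<?php
// Sketch of the parse_url() idea: a link is treated as internal when it has
// no host (i.e. it is relative) or when its host is one of ours.
// ASSUMPTION: $myHosts lists every host name your own site answers to.
function is_internal_link(string $url, array $myHosts): bool
{
    $parts = parse_url($url);
    if ($parts === false) {
        return false; // badly malformed URL: do not treat it as ours
    }
    if (!isset($parts['host'])) {
        return true;  // no host, so a relative link
    }
    // Host names are case-insensitive, so compare in lower case.
    return in_array(strtolower($parts['host']), $myHosts, true);
}

$myHosts = ['mysite.com', 'www.mysite.com'];
var_dump(is_internal_link('/articles/view.php?id=3', $myHosts));     // bool(true)
var_dump(is_internal_link('http://WWW.MySite.com/about', $myHosts)); // bool(true)
var_dump(is_internal_link('http://www.somesite.com', $myHosts));     // bool(false)
```

A link such as javascript:something(); also comes back from parse_url() without a host, so this rule files it under internal.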
>
> -----Original Message-----
> From: Martin Towell [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, February 19, 2002 6:59 PM
> To: '[EMAIL PROTECTED]'; php
> Subject: RE: [PHP] regexp on user supplied link
>
> reg.ex. something like (not tested):
> "<a[^>]*>"
> this would give you the entire opening anchor tag; then go from there?
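A quick way to try Martin's idea. One deliberate change: a \b is added after the a, since the bare <a[^>]*> would also catch tags such as <abbr> and <area>:

```php
<?php
// Grab each whole opening <a ...> tag so its attributes can be examined
// afterwards. /i also matches <A ...>; \b stops <abbr> from matching.
$text = 'See <A HREF="http://www.somesite.com" target="_new">this</a> and '
      . '<a href="page.php" onclick="track()">that</a>, <abbr>etc</abbr>.';

preg_match_all('/<a\b[^>]*>/i', $text, $matches);
print_r($matches[0]); // full matches only; closing </a> tags are not caught
```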
>
> Or what about using the XML parsing routines: get them to find the
> anchors and give you their attributes, then go from there?
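One way to sketch the parser route, using PHP's DOM extension (DOMDocument::loadHTML(), an HTML parser built on libxml) in place of the event-based XML routines. The function name anchor_attributes() is hypothetical:

```php
<?php
// Collect the attributes of every <a> element in a piece of HTML.
function anchor_attributes(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    // User-supplied HTML is rarely valid; keep libxml warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $anchors = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $attrs = [];
        foreach ($a->attributes as $attr) {
            $attrs[$attr->name] = $attr->value; // e.g. href, target, onclick
        }
        $anchors[] = $attrs;
    }
    return $anchors;
}

$sample = '<p>Go <a href="http://www.somesite.com" onclick="track()">here</a>'
        . ' or <a href="page.php">there</a></p>';
print_r(anchor_attributes($sample));
```

The swap is deliberate: the xml_parser_create() family stops with an error on markup that is not well-formed XML, which user-written HTML often is.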
>
> Martin
>
> -----Original Message-----
> From: Justin French [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, February 20, 2002 10:46 AM
> To: php
> Subject: [PHP] regexp on user supplied link
>
>
> Hi,
>
> I have a website which is based purely on user-added content.  The
> problem with this is that some areas allow users to use links in the
> text, and it's difficult to ensure that they all have a decent knowledge
> of attributes such as target="_new", etc etc.
>
> So, I'd like a script that...
>
> 1. looks at $text for any link tags, and for each tag, does the
> following:
>
> 2. throws out everything except the HREF, e.g.:
> <A HREF="http://www.somesite.com" target="_new">click</a> becomes
> http://www.somesite.com
> <A HREF="javascript:something();"> becomes javascript:something();
>
> 3. prefixes the URL with <A HREF="
>
> 4. establishes whether it's an internal or external link. So how do we
> establish that? Well, it'd be easy if we just said "anything beginning
> with http:// is not relative", but because this content is user-driven,
> I'd like to be a little safer and say "anything that begins with
> http://www.mysite.com OR http://mysite.com is an internal link".
>
> 5. if it's an external link, suffix the URL with " TARGET="_new">, or if
> it's internal, suffix it with ">
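Steps 1 to 5 might be sketched with preg_replace_callback() and parse_url(). This is a rough illustration under assumptions, not a tested filter: rewrite_links() is a hypothetical name, and MY_HOSTS stands in for your own host names:

```php
<?php
// ASSUMPTION: these stand in for your own site's host names.
const MY_HOSTS = ['mysite.com', 'www.mysite.com'];

function rewrite_links(string $text): string
{
    $result = preg_replace_callback(
        '/<a\b[^>]*>/i',                  // step 1: each opening <a ...> tag
        function (array $m): string {
            // step 2: keep only the href value ('-quoted, "-quoted or bare);
            // the class stops at quotes, so the value cannot break out of
            // the attribute rebuilt below
            if (!preg_match('/href\s*=\s*["\']?([^"\'\s>]*)/i', $m[0], $h)) {
                return $m[0];             // no href: leave the tag untouched
            }
            $url = $h[1];
            // step 4: external only if it names a host that is not ours
            $host = parse_url($url, PHP_URL_HOST);
            $external = is_string($host)
                && !in_array(strtolower($host), MY_HOSTS, true);
            // steps 3 and 5: rebuild the tag around the bare URL
            return '<A HREF="' . $url . '"' . ($external ? ' TARGET="_new"' : '') . '>';
        },
        $text
    );
    return $result ?? $text;              // null signals a PCRE error
}

$in = '<a href="http://www.somesite.com" target="_blank" onclick="x()">click</a> '
    . "<a href='/about.php'>about</a> <A HREF=http://mysite.com/faq>faq</A>";
echo rewrite_links($in), "\n";
```

As in step 2, everything but the HREF (onclick and friends included) is discarded, and closing </a> tags are left as they were.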
>
>
> Anyway, that'd be a great start. From there, I might like to prefix each
> external link so it goes through a program called out.php to log
> affiliate activity; I might like to retain the onmouseover, onclick,
> onmouseout, etc. attributes in the tag; and I might like to ensure a
> session ID is present in each internal link and stripped from each
> external link, and that each <A> has a matching </A>, etc. But the above
> would be a great start.
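Two of those follow-ups could be sketched as small helpers. Both names are hypothetical, and so is the interface of out.php (assumed here to take the target in a url parameter, log it, then redirect); PHPSESSID is PHP's default session name. The arrow function needs PHP 7.4+:

```php
<?php
// ASSUMPTION: out.php reads the destination from a "url" query parameter.
function via_out_php(string $url): string
{
    return 'out.php?url=' . rawurlencode($url);
}

// Remove a session-ID pair from a URL's query string, keeping any #fragment.
function strip_session_id(string $url, string $name = 'PHPSESSID'): string
{
    $fragment = '';
    $hash = strpos($url, '#');
    if ($hash !== false) {
        $fragment = substr($url, $hash);
        $url = substr($url, 0, $hash);
    }
    $q = strpos($url, '?');
    if ($q === false) {
        return $url . $fragment; // no query string, nothing to strip
    }
    // Keep every name=value pair whose name is not the session name.
    $kept = array_filter(
        explode('&', substr($url, $q + 1)),
        fn (string $pair): bool => $pair !== '' && explode('=', $pair, 2)[0] !== $name
    );
    return substr($url, 0, $q) . ($kept ? '?' . implode('&', $kept) : '') . $fragment;
}

$ext = 'http://www.somesite.com/p.php?x=1&PHPSESSID=abc123&y=2#top';
echo via_out_php(strip_session_id($ext)), "\n";
```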
>
>
> Any help, especially with steps 1, 2 & 4, would be much appreciated.
>
>
> Thanks in advance,
>
> Justin French
> http://indent.com.au
> http://soundpimps.com
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>

