Re: nofollow regex

2009-01-15 Thread Jeff Becker
Wowsers... Thanks, Peter. I looked at Adrian's code yesterday to see if I 
could modify it to handle all the complex examples.

I'll test your code today and let you know how it goes. For how gracefully 
WordPress and some forum software handle this, it sure is complex to implement.

~|
Adobe® ColdFusion® 8 software 8 is the most important and dramatic release to 
date
Get the Free Trial
http://ad.doubleclick.net/clk;207172674;29440083;f

Archive: 
http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:317994
Subscription: http://www.houseoffusion.com/groups/cf-talk/subscribe.cfm
Unsubscribe: 
http://www.houseoffusion.com/cf_lists/unsubscribe.cfm?user=11502.10531.4


nofollow regex

2009-01-14 Thread Jeff Becker
Hey folks,
Since I got no love in the RegEx forum, I'm hoping to post here to get a few 
more eyeballs on the question I'm struggling over.

I'm looking for a working rel="nofollow" regex to modify links.

For example, taking:

Goto <a href="http://google.com">Google</a> now!

and turning it into:

Goto <a href="http://google.com" rel="nofollow">Google</a> now!

The best solution I've found so far is 
http://www.sitecritic.net/articleDetail.php?id=242, but this 
is a PHP solution. Any ideas on converting this to ColdFusion?

The PHP solution covers a lot of scenarios (extra attributes, single quotes 
instead of double quotes, etc.), so that would be ideal.

Thanks! 
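For comparison, the basic transformation can be sketched in JavaScript (the language Andy uses below; it can also run server-side under Node.js). This is a minimal sketch, not a port of the linked PHP code: the function name addNofollowIfMissing is hypothetical, it assumes no > appears inside attribute values, and it leaves any tag that already has a rel attribute untouched.

```javascript
// Sketch only: add rel="nofollow" to every <a ...> opening tag that has no
// rel attribute. Assumes no ">" appears inside attribute values.
function addNofollowIfMissing(html) {
  // Group 1: "<a" (original case kept); group 2: whitespace plus attributes
  // up to the first ">". Group 2 is optional, so a bare <a> also matches.
  return html.replace(/(<a)(\s[^>]*)?>/gi, (tag, open, attrs = "") => {
    // Only "rel=" preceded by whitespace counts as an existing rel attribute.
    if (/\srel\s*=/i.test(attrs)) return tag;
    return `${open}${attrs} rel="nofollow">`;
  });
}

console.log(addNofollowIfMissing('Goto <a href="http://google.com">Google</a> now!'));
console.log(addNofollowIfMissing("<a href='http://movies.com' title='Movies'>movies</a>"));
```

Extra attributes and either quote style pass through unchanged; only the new rel is appended.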

Archive: 
http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:317920


RE: nofollow regex

2009-01-14 Thread Andy Matthews
Does it have to be a server-side solution? jQuery would make this a snap:

$(document).ready(function(){
  $('a[href^=http]').attr('rel','nofollow');
});

<a href="/somepage.html">This is an internal link</a>
<br><br>
<a href="http://google.com">And this is an external link, with no follow</a>


Archive: 
http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:317923


RE: nofollow regex

2009-01-14 Thread Adrian Lynch
Ha! That would be great, but search engines won't see that, which is really 
the point.

Using that PHP regex you pointed to:

<cfsavecontent variable="html">
Goto <a href="http://google.com">No</a> now!
and turning it into:
Goto <a href="http://google.com" rel="nofollow">Yes</a> now!
</cfsavecontent>

<cfoutput>

<pre>#HTMLEditFormat(html)#</pre>

<cfset re = "<[\s]*a[\s]*href=[\s]*[""']?([\w.-]*)[""']?[^>]*>(.*?)<\/a>">

<cfset matches = REMatch(re, html)>

<cfdump var="#matches#">

<cfloop array="#matches#" index="match">
	<p>#HTMLEditFormat(match)#</p>
	<cfif NOT FindNoCase("nofollow", match)>
		<cfset newLink = Replace(match, ">", " rel=""nofollow"">", "ONE")>
		<cfset html = ReplaceNoCase(html, match, newLink)>
	</cfif>
</cfloop>

<pre>#HTMLEditFormat(html)#</pre>

</cfoutput>

Not as nice as a one-hit regex, but it seems to get the job done :)

Adrian
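For trying the same loop outside ColdFusion, here is a rough JavaScript transcription (the name adrianStyleNofollow is hypothetical; REMatch is approximated with String.prototype.match on a global regex, and FindNoCase with a lowercase includes check). The second call illustrates a limitation of the substring check: a tag that merely contains the text nofollow, such as fakerel="nofollow", is skipped.

```javascript
// Rough transcription of the REMatch / Replace loop, for experimentation.
function adrianStyleNofollow(html) {
  // Same pattern as the ColdFusion version; case-sensitive, like REMatch.
  const re = /<[\s]*a[\s]*href=[\s]*["']?([\w.-]*)["']?[^>]*>(.*?)<\/a>/g;
  const matches = html.match(re) || [];
  for (const match of matches) {
    // Skip any link that mentions "nofollow" anywhere (like FindNoCase).
    if (match.toLowerCase().includes("nofollow")) continue;
    // Insert rel before the first ">", i.e. the end of the opening tag.
    const newLink = match.replace(">", ' rel="nofollow">');
    // A function replacement stops "$" sequences in the link being expanded.
    html = html.replace(match, () => newLink);
  }
  return html;
}

console.log(adrianStyleNofollow('Goto <a href="http://google.com">No</a> now!'));
console.log(adrianStyleNofollow('<a href="http://x.com" fakerel="nofollow">x</a>'));
```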

Archive: 
http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:317925


Re: nofollow regex

2009-01-14 Thread Jeff Becker
Yes, it does. This is for validation on a blog or forum, etc.

For the spam comments/posts that do sneak by, it would be nice to have 
server-side validation make sure rel="nofollow" gets added.

I thought about client side, but again, ideally I'm after server-side 
validation and formatting.
Archive: 
http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:317926


RE: nofollow regex

2009-01-14 Thread Andy Matthews
Ahhh... I gotcha. That does sort of put a damper on it, doesn't it?

Oh well. I had fun whipping that out.

Archive: 
http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:317928


Re: nofollow regex

2009-01-14 Thread Jeff Becker
Adrian,
That's very nice. Thanks for that.

I made one minor correction. Note the added > in:
<cfset newLink = Replace(match, ">", " rel=""nofollow"">", "ONE")>


I'm starting to run more complicated examples and have two issues.
Running:
<cfsavecontent variable="html">
Goto <a href="http://google.com">Google</a> now!<BR>
Also hit up <a href='http://movies.com' rel='junkrel'>movies</a><BR><BR>
Don't forget <a   href="http://coffee.com" title="Great Coffee"> GRRREAT COFFEE </a>
</cfsavecontent>


Two items:
Any way to remove that rel='junkrel'? Again, the concern is spammers. I think 
fakerel="nofollow" might slip past as well.

Other item: are there any issues with the movies example switching from single 
quotes to double quotes ONLY ON the rel="nofollow"? I'm thinking that might be 
OK for search engine spiders.

Thanks again! 
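One way to handle the first item is to discard any existing rel attribute and write a fresh rel="nofollow". A JavaScript sketch of that idea (hypothetical forceNofollow; assumes no > inside attribute values): only a rel preceded by whitespace is treated as the attribute, so fakerel= and fake-rel= are not mistaken for it. On the second item, HTML lets each attribute choose its own quote style, so a double-quoted rel next to single-quoted attributes is still well-formed markup; how individual spiders treat it is not something this sketch can show.

```javascript
// Sketch: force rel="nofollow" on every <a ...> opening tag, replacing any
// existing rel value such as rel='junkrel'. Assumes no ">" inside values.
function forceNofollow(html) {
  return html.replace(/(<a)(\s[^>]*)?>/gi, (tag, open, attrs = "") => {
    // Remove existing rel attributes with a double-quoted, single-quoted or
    // unquoted value. Requiring whitespace before "rel" leaves fakerel= and
    // fake-rel= alone.
    const cleaned = attrs.replace(/\s+rel\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'>]+)/gi, "");
    return `${open}${cleaned} rel="nofollow">`;
  });
}

console.log(forceNofollow("Also hit up <a href='http://movies.com' rel='junkrel'>movies</a>"));
console.log(forceNofollow('<a href="x" fakerel="nofollow">x</a>'));
```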

Archive: 
http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:317929


Re: nofollow regex

2009-01-14 Thread Peter Boughton
It's been a long day, so I may well have missed something, but here we go 
anyway...

One caveat: this assumes at least semi-valid markup, in so far as there is no > 
inside the A opening tag or its attributes.
(Not impossible to fix, but since it's not valid I can't be bothered going that 
far right now.)
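For reference, one way the > caveat could be lifted for quoted values, sketched in JavaScript rather than CFML (the pattern names are hypothetical and this is not part of Peter's code): an opening-tag pattern that consumes "..." and '...' values whole, compared with a naive [^>]* pattern, which stops at the first >.

```javascript
// Naive pattern: stops at the first ">", even one inside a quoted value.
const NAIVE_OPEN_A = /<a\s[^>]*>/gi;
// Quote-aware pattern: each attribute is a name plus an optional value, and a
// "..." or '...' value is consumed whole, so a ">" inside it does not end the tag.
const QUOTED_OPEN_A = /<a(?:\s+[^\s"'>\/=]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'>]+))?)*\s*>/gi;

function openingTags(html, pattern) {
  return html.match(pattern) || [];
}

const sample = '<a href="http://x.com" title="a > b">x</a>';
console.log(openingTags(sample, NAIVE_OPEN_A));
console.log(openingTags(sample, QUOTED_OPEN_A));
```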


<cffunction name="setHyperlinkRel" returntype="String" output="false">
	<cfargument name="LinkCode"  type="String"/>
	<cfargument name="RelValue"  type="String"/>
	<cfargument name="Append"    type="Boolean" default="true"/>
	<!--- rel holds a space-separated list of values in HTML --->
	<cfargument name="Delimiter" type="String"  default=" "/>

	<!--- Split the link at the first ">": Head is the opening tag without ">" --->
	<cfset var Head = ListFirst(Arguments.LinkCode,'>')/>
	<cfset var Tail = ListRest(Arguments.LinkCode,'>')/>

	<cfset var RelAttr = rematch( '(?ims)\brel=(?:\S+|(["'']).*?)(?=\1(?:\s|$))' , Head )/>

	<cfif ArrayLen(RelAttr)>
		<cfif NOT find(Arguments.RelValue,RelAttr[1])>
			<cfset Head = replace
				( Head
				, RelAttr[1]
				, ListAppend( RelAttr[1] , Arguments.RelValue , Arguments.Delimiter )
				)/>
		</cfif>
	<cfelse>
		<cfset Head = Head & ' rel="#Arguments.RelValue#"' />
	</cfif>

	<cfreturn Head & '>' & Tail />
</cffunction>

<cffunction name="addRelNofollow" returntype="String" output="false">
	<cfargument name="InputText" type="String"/>

	<cfset var Result = Arguments.InputText/>
	<cfset var NewHyperlink = 0/>
	<cfset var i = 0/>

	<cfset var Hyperlinks = rematch( '(?ims)<a[^>]+>.*?</a>' , Arguments.InputText )/>

	<cfloop index="i" from="1" to="#ArrayLen(Hyperlinks)#">
		<cfset NewHyperlink = setHyperlinkRel( Hyperlinks[i] , 'nofollow' ) />
		<cfset Result = replace( Result , Hyperlinks[i] , NewHyperlink )/>
	</cfloop>

	<cfreturn Result />
</cffunction>

<cfset NewContent = addRelNofollow( OldContent ) />


That works on all the various examples I've tried so far; let me know if I've 
missed anything and I'll update it.

Needs ColdFusion 8 or Railo 3 for the rematch calls (I can do a version with Java 
regex if people need it working on other engines).
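For readers who want to try the append idea outside ColdFusion, a JavaScript sketch of what it aims to do (hypothetical appendRel; assumes no > inside attribute values). It is not a line-by-line port: regex engines differ in how they treat a backreference to a group that did not take part in the match, so this version matches the double-quoted, single-quoted and unquoted rel forms explicitly. It also compares whole space-separated tokens, since HTML rel values are space-separated lists, rather than doing a substring find.

```javascript
// Sketch: append a token to an <a ...> tag's rel attribute, keeping existing
// tokens, or add rel="..." when the tag has none. Assumes no ">" in values.
function appendRel(html, value) {
  // rel after whitespace, with a "...", '...' or unquoted value (groups 2-4).
  const relRe = /(\srel\s*=\s*)(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/i;
  return html.replace(/(<a)(\s[^>]*)?>/gi, (tag, open, attrs = "") => {
    const m = attrs.match(relRe);
    // No rel attribute yet: add one.
    if (!m) return `${open}${attrs} rel="${value}">`;
    // Existing rel: split its value into space-separated tokens.
    const current = m[2] ?? m[3] ?? m[4];
    const tokens = current.split(/\s+/).filter(Boolean);
    // Token already present (case-insensitively): leave the tag alone.
    if (tokens.some((t) => t.toLowerCase() === value.toLowerCase())) return tag;
    tokens.push(value);
    // Rewrite the first rel attribute with the extended, double-quoted value.
    const newAttrs = attrs.replace(relRe, (_whole, prefix) => `${prefix}"${tokens.join(" ")}"`);
    return `${open}${newAttrs}>`;
  });
}

console.log(appendRel("<a href='http://movies.com' rel='junkrel'>movies</a>", "nofollow"));
```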



Archive: 
http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:317964


Re: nofollow regex

2009-01-14 Thread Peter Boughton
I knew I wasn't fully awake: I forgot to implement the append/overwrite 
functionality.

Below is an updated version of the first function that lets you overwrite 
rel rather than append to it.
(To actually turn this on, update the setHyperlinkRel call in the second 
function to pass false for Append.)

I'm not entirely happy with how I've done the overwriting, but it works with 
the examples I tried.


<cffunction name="setHyperlinkRel" returntype="String" output="false">
	<cfargument name="LinkCode"  type="String"/>
	<cfargument name="RelValue"  type="String"/>
	<cfargument name="Append"    type="Boolean" default="true"/>
	<!--- rel holds a space-separated list of values in HTML --->
	<cfargument name="Delimiter" type="String"  default=" "/>

	<cfset var Head = ListFirst(Arguments.LinkCode,'>')/>
	<cfset var Tail = ListRest(Arguments.LinkCode,'>')/>

	<cfset var RelAttr = rematch( '(?ims)\brel=(?:\S+|(["'']).*?)(?=\1(?:\s|$))' , Head )/>

	<cfif ArrayLen(RelAttr)>
		<cfif Arguments.Append>
			<!--- Append mode: add RelValue to the existing rel list --->
			<cfif NOT find(Arguments.RelValue,RelAttr[1])>
				<cfset Head = replace
					( Head
					, RelAttr[1]
					, ListAppend( RelAttr[1] , Arguments.RelValue , Arguments.Delimiter )
					)/>
			</cfif>
		<cfelse>
			<!--- Overwrite mode: replace the whole rel attribute, closing quote included --->
			<cfset Head = rereplace
				( Head
				, RelAttr[1] & '["'']?'
				, 'rel="#Arguments.RelValue#"'
				)/>
		</cfif>
	<cfelse>
		<cfset Head = Head & ' rel="#Arguments.RelValue#"' />
	</cfif>

	<cfreturn Head & '>' & Tail />
</cffunction>
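On the overwrite branch: rereplace treats its second argument as a regular expression, so RelAttr[1] is interpreted as a pattern rather than as literal text, and a rel value containing metacharacters such as + or ( will not match itself. That may be part of what feels off. The usual remedy is to escape the literal first; a JavaScript sketch of the idea (hypothetical escapeRegExp and overwriteRelLiteral, not a drop-in CFML fix):

```javascript
// Escape regex metacharacters so literal text can be embedded in a pattern.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace a literal rel fragment (e.g. 'rel="a+b', as captured without its
// closing quote) plus an optional closing quote with rel="<value>".
function overwriteRelLiteral(head, relFragment, value) {
  const re = new RegExp(escapeRegExp(relFragment) + "[\"']?");
  return head.replace(re, () => `rel="${value}"`);
}

const head = '<a href="x" rel="a+b"';
// Unescaped, "+" acts as a quantifier, so the fragment no longer matches itself.
console.log(new RegExp('rel="a+b' + "[\"']?").test(head));
console.log(overwriteRelLiteral(head, 'rel="a+b', "nofollow"));
```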



Archive: 
http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:317965


Re: nofollow regex

2009-01-14 Thread Peter Boughton
Spotted another problem:
Currently it avoids fakerel="nofollow" but doesn't avoid fake-rel="nofollow" 
(\b treats the - before rel as a word boundary).

Possibly changing \b to \s in the first regex will work, but that needs testing.
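A quick check of that reasoning in JavaScript, whose \b and \s should behave the same way for these plain-ASCII inputs (the sample opening-tag heads are made up for illustration):

```javascript
const REL_WORD_BOUNDARY = /\brel=/i; // \b: any word/non-word transition, "-" included
const REL_AFTER_SPACE = /\srel=/i;   // \s: rel must follow whitespace

const heads = {
  real: '<a href="x" rel="nofollow"',
  fakerel: '<a href="x" fakerel="nofollow"',
  fakeDashRel: '<a href="x" fake-rel="nofollow"',
};

for (const [name, head] of Object.entries(heads)) {
  console.log(name, REL_WORD_BOUNDARY.test(head), REL_AFTER_SPACE.test(head));
}
// real true true
// fakerel false false
// fakeDashRel true false
```

Requiring whitespace still assumes attributes are separated by whitespace, and text such as title="see rel=x" inside a value would still match, so this narrows the problem rather than removing it.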

Archive: 
http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:317966