Re: OT: How to stop bots indexing my dev sites

2008-02-13 Thread James Holmes
If you can't give them a VPN client, then I agree that a simple
password enforced by the webserver is the way to go. This way your
site doesn't have to change when you move it to prod.
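
A minimal sketch of that idea, written as Python WSGI middleware rather than Apache or IIS configuration: the password check wraps the site from the outside, so the site code stays identical between dev and prod. The function name, the credentials, and the realm are all made up for illustration.

```python
import base64
import hmac


def require_basic_auth(app, username, password, realm="dev site"):
    """Wrap a WSGI app so every request must carry matching Basic credentials."""
    expected = base64.b64encode(f"{username}:{password}".encode("utf-8"))

    def guarded(environ, start_response):
        scheme, _, token = environ.get("HTTP_AUTHORIZATION", "").partition(" ")
        supplied = token.strip().encode("utf-8")
        # compare_digest avoids leaking how much of the token matched.
        if scheme.lower() == "basic" and hmac.compare_digest(supplied, expected):
            return app(environ, start_response)
        # No or wrong credentials: ask the client to authenticate.
        start_response(
            "401 Unauthorized",
            [
                ("WWW-Authenticate", f'Basic realm="{realm}"'),
                ("Content-Type", "text/plain; charset=utf-8"),
            ],
        )
        return [b"Authentication required.\n"]

    return guarded


def site(environ, start_response):
    """Stand-in for the real site; it knows nothing about the password."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Hello from the dev site.\n"]


# Hypothetical credentials handed to the client for previewing.
dev_site = require_basic_auth(site, "client", "preview-2008")
```

A crawler that requests a page without credentials only ever sees the 401 response, so there is nothing for it to index. On prod you would deploy `site` unwrapped.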

On Feb 14, 2008 12:12 PM, Rick Faircloth <[EMAIL PROTECTED]> wrote:
> I like to let my clients look in on the site during development
> and test features along the way.  (They get training and I get
> testing that way.  They are also giving tacit approval to the
> design and development by being in on the process.)
>
> Restricting the IP wouldn't be a good solution since they may
> want to view the site at the office, at home, at a friend's place, etc.
>
> Seems like a login to the development site would be the easiest
> way to go.
>
> Thoughts?
>
> Thanks,
>
> Rick
>
>
>
> > -----Original Message-----
> > From: James Holmes [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, February 13, 2008 8:16 PM
> > To: CF-Talk
> > Subject: Re: OT: How to stop bots indexing my dev sites
> >
>
> > We restrict the dev site/server by IP address, so only local
> > connections are permitted by the webserver. VPN access is used to
> > access a dev server off-site.
> >
> > On Feb 14, 2008 12:56 AM, Rick Faircloth <[EMAIL PROTECTED]> wrote:
> > > Hi, all.
> > >
> > > I usually put up a client's site while it is in development
> > > so they can view progress, make comments, etc.
> > >
> > > I recently completed a client's site and began to perform
> > > search engine optimization on it.
> > >
> > > Once I started reviewing rankings, I found that my development
> > > site was ranking higher for the content than my client's main site!
> > > Not good!
> > >
> > > I thought about using the meta tag for robots not to follow or index
> > > the site, but realized I'd have to take that out of the file every time
> > > I uploaded the site to my client's domain.
> > >
> > > I guess I could check the cgi.server_name and output the meta tag with
> > > conditional cf code only if it's in the development domain.
> > >
> > > Other ideas?
> > >
> > > Thanks,
> > >
> > > Rick
> > >
> > >
> > >
> > >
> >
> >
>
> 

~|
Adobe® ColdFusion® 8 software 8 is the most important and dramatic release to 
date
Get the Free Trial
http://ad.doubleclick.net/clk;160198600;22374440;w

Archive: 
http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:298953
Subscription: http://www.houseoffusion.com/groups/CF-Talk/subscribe.cfm
Unsubscribe: 
http://www.houseoffusion.com/cf_lists/unsubscribe.cfm?user=11502.10531.4


RE: OT: How to stop bots indexing my dev sites

2008-02-13 Thread Rick Faircloth
I like to let my clients look in on the site during development
and test features along the way.  (They get training and I get
testing that way.  They are also giving tacit approval to the
design and development by being in on the process.)

Restricting the IP wouldn't be a good solution since they may
want to view the site at the office, at home, at a friend's place, etc.

Seems like a login to the development site would be the easiest
way to go.

Thoughts?

Thanks,

Rick



> -----Original Message-----
> From: James Holmes [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, February 13, 2008 8:16 PM
> To: CF-Talk
> Subject: Re: OT: How to stop bots indexing my dev sites
> 
> We restrict the dev site/server by IP address, so only local
> connections are permitted by the webserver. VPN access is used to
> access a dev server off-site.
> 
> On Feb 14, 2008 12:56 AM, Rick Faircloth <[EMAIL PROTECTED]> wrote:
> > Hi, all.
> >
> > I usually put up a client's site while it is in development
> > so they can view progress, make comments, etc.
> >
> > I recently completed a client's site and began to perform
> > search engine optimization on it.
> >
> > Once I started reviewing rankings, I found that my development
> > site was ranking higher for the content than my client's main site!
> > Not good!
> >
> > I thought about using the meta tag for robots not to follow or index
> > the site, but realized I'd have to take that out of the file every time
> > I uploaded the site to my client's domain.
> >
> > I guess I could check the cgi.server_name and output the meta tag with
> > conditional cf code only if it's in the development domain.
> >
> > Other ideas?
> >
> > Thanks,
> >
> > Rick
> >
> >
> >
> >
> 
> 

Archive: 
http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:298950


Re: OT: How to stop bots indexing my dev sites

2008-02-13 Thread James Holmes
We restrict the dev site/server by IP address, so only local
connections are permitted by the webserver. VPN access is used to
access a dev server off-site.
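
As a rough illustration of the check the webserver performs here, this Python sketch tests a client address against an allow-list. The networks listed are an assumption (loopback plus the RFC 1918 private ranges); a real setup would list whatever the office LAN and the VPN actually hand out.

```python
import ipaddress

# Hypothetical allow-list: loopback plus the RFC 1918 private ranges.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_allowed(remote_addr, networks=ALLOWED_NETWORKS):
    """Return True if remote_addr falls inside one of the allowed networks."""
    try:
        address = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Anything that is not a valid IP address is refused.
        return False
    return any(address in network for network in networks)


print(is_allowed("192.168.1.20"))  # a machine on the local network
print(is_allowed("203.0.113.7"))   # a public address, e.g. a crawler
```

A search engine crawler always arrives from a public address, so it is refused before the site itself is ever reached.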

On Feb 14, 2008 12:56 AM, Rick Faircloth <[EMAIL PROTECTED]> wrote:
> Hi, all.
>
> I usually put up a client's site while it is in development
> so they can view progress, make comments, etc.
>
> I recently completed a client's site and began to perform
> search engine optimization on it.
>
> Once I started reviewing rankings, I found that my development
> site was ranking higher for the content than my client's main site!
> Not good!
>
> I thought about using the meta tag for robots not to follow or index
> the site, but realized I'd have to take that out of the file every time
> I uploaded the site to my client's domain.
>
> I guess I could check the cgi.server_name and output the meta tag with
> conditional cf code only if it's in the development domain.
>
> Other ideas?
>
> Thanks,
>
> Rick
>
>
>
> 
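
The conditional meta tag idea from the quoted message could look roughly like this. It is sketched in Python rather than CFML, and the host names are placeholders: the tag is emitted only when the request's server name is a known development host, so the same template can be uploaded to the client's domain unchanged.

```python
# Hypothetical development host names; replace with your own.
DEV_HOSTS = {"dev.example.com", "staging.example.com"}


def robots_meta_tag(server_name, dev_hosts=DEV_HOSTS):
    """Return a noindex/nofollow meta tag on dev hosts, nothing elsewhere."""
    host = server_name.strip().lower().rstrip(".")
    if host in dev_hosts:
        return '<meta name="robots" content="noindex, nofollow">'
    # Production (or any unknown host): no tag, so normal indexing applies.
    return ""


print(robots_meta_tag("dev.example.com"))
print(repr(robots_meta_tag("www.example.com")))
```

In a page template the return value would simply be written into the head section; in CFML the value of cgi.server_name would play the role of `server_name`.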

Archive: 
http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:298943


RE: OT: How to stop bots indexing my dev sites

2008-02-13 Thread Dave Watts
>> Use robots.txt:
>
> This is correct to control good bots, but 
> bad bots don't read it.

This isn't relevant, since the crawlers used by Google and Yahoo do obey 
robots.txt.

> Some (very) bad bots even exploit what 
> is in the file to read what's forbidden.

Robots.txt is not intended to be a security mechanism.

Dave Watts, CTO, Fig Leaf Software

Archive: 
http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:298917


Re: OT: How to stop bots indexing my dev sites

2008-02-13 Thread Claude Schneegans
 >>If he is simply trying to prevent search engines from indexing
 >>certain folders, he should be fine?

Let's say 50% fine, since bad bots (maybe 50% of all robot traffic) do 
not even bother to read robots.txt.

Archive: 
http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:298905


RE: OT: How to stop bots indexing my dev sites

2008-02-13 Thread Russ
Put Apache/IIS authentication on your dev sites.  (Of course, this is
much easier in Apache.)

Russ

> -----Original Message-----
> From: Claude Schneegans [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, February 13, 2008 12:09 PM
> To: CF-Talk
> Subject: Re: OT: How to stop bots indexing my dev sites
> 
>  >>Use robots.txt:
> 
> This is correct to control good bots, but bad bots don't read it.
> Some (very) bad bots even exploit what is in the file to read what's
> forbidden.
> 
> 

Archive: 
http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:298904


RE: OT: How to stop bots indexing my dev sites

2008-02-13 Thread Will Swain
Sure. But it depends on what the OP's requirements are. If he is simply
trying to prevent search engines from indexing certain folders, he should be
fine?

Will

-----Original Message-----
From: Claude Schneegans [mailto:[EMAIL PROTECTED] 
Sent: 13 February 2008 17:09
To: CF-Talk
Subject: Re: OT: How to stop bots indexing my dev sites

 >>Use robots.txt:

This is correct to control good bots, but bad bots don't read it.
Some (very) bad bots even exploit what is in the file to read what's
forbidden.



Archive: 
http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:298903


Re: OT: How to stop bots indexing my dev sites

2008-02-13 Thread Claude Schneegans
 >>Use robots.txt:

This works for controlling good bots, but bad bots don't read it.
Some (very) bad bots even use what is in the file to find and read what's 
forbidden.

Archive: 
http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:298900


Re: OT: How to stop bots indexing my dev sites

2008-02-13 Thread Claude Schneegans
 >>Once I started reviewing rankings, I found that my development
 >>site was ranking higher for the content than my client's main site!
 >>Not good!

Exactly.
But robot control is not trivial.
First, there are genuine, friendly robots, like Google's. Second, 
there are bad bots: some harvest email addresses, some try to post 
spam to your site, and Chinese bots check whether your site should be 
banned because it discusses human rights, etc.

Good bots are easy to recognize: they have a web address in the user 
agent, e.g.:
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
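
A small Python sketch of that check: it pulls the first web address out of a User-Agent string. The pattern and the function name are illustrative only. Since any client can send any User-Agent it likes, a match is a hint that the agent declares itself a robot, not proof of who it is.

```python
import re

# Matches an http(s) URL, optionally preceded by "+", as in the Googlebot example.
INFO_URL = re.compile(r"\+?(https?://[^\s;)]+)")


def bot_info_url(user_agent):
    """Return the web address advertised in a User-Agent string, or None."""
    match = INFO_URL.search(user_agent)
    return match.group(1) if match else None


googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
browser = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"

print(bot_info_url(googlebot))  # http://www.google.com/bot.html
print(bot_info_url(browser))    # None
```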

Bad bots are trickier to detect: because they don't want to look like 
robots, they mimic standard browsers like MSIE, Mozilla, etc.

I've designed my own bad bot detector ("robotCop"), and it takes several 
factors into account, such as whether the agent:
- reads the robots.txt file,
- respects the instructions in the robots.txt file,
- falls into a click trap (a link not visible to a human visitor),
- pauses between pages like a human (average time spent between pages),
- reads images,
- reads JavaScript files,
- executes JavaScript,
- supports cookies,
- is listed in blacklists... etc.

Based on these factors, agents are:
- granted full access (supposedly human browsers),
- limited to text only (supposedly good robots), or
- banned (supposedly bad or unwanted bots).
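
To make the idea concrete, here is a toy Python classifier over a few of the factors listed above. It is not robotCop: the field names, the rules, and the 2-second threshold are all invented for illustration, and a real detector would weigh the evidence far more carefully.

```python
from dataclasses import dataclass

# Hypothetical threshold: faster than this between pages looks automated.
MIN_HUMAN_PAUSE_SECONDS = 2.0


@dataclass
class AgentProfile:
    """What has been observed about one visiting agent (all fields assumed)."""
    read_robots_txt: bool = False
    obeyed_robots_txt: bool = True
    hit_click_trap: bool = False
    avg_seconds_between_pages: float = 30.0
    loaded_images: bool = True
    ran_javascript: bool = True
    accepted_cookies: bool = True
    blacklisted: bool = False


def classify(profile):
    """Map an observed profile to 'full access', 'text only', or 'banned'."""
    # Strong evidence of an unwanted agent.
    if profile.blacklisted or profile.hit_click_trap or not profile.obeyed_robots_txt:
        return "banned"
    # Reads robots.txt and respects it: treat as a well-behaved robot.
    if profile.read_robots_txt:
        return "text only"
    # Behaves like a browser driven by a person.
    browser_like = (
        profile.loaded_images and profile.ran_javascript and profile.accepted_cookies
    )
    if browser_like and profile.avg_seconds_between_pages >= MIN_HUMAN_PAUSE_SECONDS:
        return "full access"
    # Claims to be a browser but does not behave like one.
    return "banned"


print(classify(AgentProfile()))                      # full access
print(classify(AgentProfile(read_robots_txt=True)))  # text only
print(classify(AgentProfile(hit_click_trap=True)))   # banned
```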
 

Archive: 
http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:298898


RE: OT: How to stop bots indexing my dev sites

2008-02-13 Thread Dave Watts
> I thought about using the meta tag for 
> robots not to follow or index the site, but 
> realized I'd have to take that out of the 
> file every time I uploaded the site to my 
> client's domain.

Use robots.txt:

http://www.robotstxt.org/
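
For illustration, this Python sketch shows how a compliant crawler reads such a file, using the standard library's urllib.robotparser. The two robots.txt bodies and the example.com host names are assumptions: a dev site that disallows everything and a production site that allows everything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the development site: keep all robots out.
DEV_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

# Hypothetical robots.txt for the production site: an empty Disallow allows all.
PROD_ROBOTS_TXT = """\
User-agent: *
Disallow:
"""


def may_fetch(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_fetch(DEV_ROBOTS_TXT, "Googlebot", "http://dev.example.com/index.cfm"))
print(may_fetch(PROD_ROBOTS_TXT, "Googlebot", "http://www.example.com/index.cfm"))
```

Because robots.txt is a separate file at the site root, the pages themselves stay identical on both domains; only this one file must differ between dev and production.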

Dave Watts, CTO, Fig Leaf Software

Archive: 
http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:298892