
Re: [Full-disclosure] Google's robot.txt handling

2012-12-11 Thread Scott Ferguson

 If I understand the OP correctly, he is not stating that listing something
 in robots.txt would make it inaccessible, but rather that Google indexes
 the robots.txt files themselves, 

snipped


Well, um, yeah - I got that.

So you are what, proposing that moving an open door back a few
centimetres solves the (non) problem?

Take your proposal to its logical extension and stop all search engines
(especially the ones that don't respect robots.txt) from indexing
robots.txt. Now what do you do about Nutch or even some Perl script that
anyone can whip up in 2 minutes?
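
To illustrate how little effort such a throwaway script takes, here is a minimal sketch in Python rather than Perl. The function names are made up for this example, the parser only looks at Disallow lines, and the sample input is invented; it is not a full robots.txt implementation.

```python
import urllib.request


def disallowed_paths(robots_txt):
    """Return the path of every non-empty Disallow line in a robots.txt body."""
    paths = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


def fetch_robots(host, timeout=10):
    """Download http://<host>/robots.txt and return it as text."""
    url = "http://%s/robots.txt" % host
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Invented sample body, so the sketch runs without touching the network.
    sample = "User-agent: *\nDisallow: /admin/\nDisallow: /backup/  # old dumps\nDisallow:\n"
    print(disallowed_paths(sample))
```

Calling disallowed_paths(fetch_robots("somesite.tld")) would do the same against a live host, which is the point being made: the file is public with or without a search engine.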

Security through obscurity is fine when coupled with actual security -
but relying on it alone is just daft.

Expecting the world to change so bad habits have no consequence is
dangerously naive.

I suspect you're looking too hard at finding fault with Google - who are
complying with the robots.txt. Read the spec. - it's about not following
the listed directories, not about not listing the robots.txt.  Next
you'll want laws against bad weather and furniture with sharp corners.

Don't put things you don't want seen in places that can be seen.



 On Mon, Dec 10, 2012 at 8:19 PM, Scott Ferguson 
 scott.ferguson.it.consulting () gmail com wrote:


 /From/: Hurgel Bumpf l0rd_lunatic () yahoo com
 /Date/: Mon, 10 Dec 2012 19:25:39 + (GMT)
 
 Hi list,

 I tried to contact Google, but as they didn't answer my email, I am
 forwarding this to FD.

 This security feature is not clearly a Google vulnerability, but it
 exposes website information that is not really intended to be public.

 Conan the bavarian

 Your point eludes me - Google is indexing something which is publicly
 available, e.g. curl http://somesite.tld/robots.txt
 So it seems the solution to the question you raise is, um, nonsensical.

 If you don't want something exposed on your web server *don't publish
 references to it*.

 The solution, which should be blindingly obvious, is don't create the
 problem in the first place. Password-protect sensitive directories
 (htpasswd) - then they don't have to be excluded from search engines
 (because listing the inaccessible in robots.txt is redundant).  You must
 have missed the first day of web school.

 Kind regards.

___
Full-Disclosure - We believe in it.
Charter: http://lists.grok.org.uk/full-disclosure-charter.html
Hosted and sponsored by Secunia - http://secunia.com/

Re: [Full-disclosure] Google's robot.txt handling

2012-12-11 Thread Mario Vilas

I think we can all agree this is not a vulnerability. Still, I have yet to
see an argument saying why what the OP is proposing is a bad idea. It may
be a good idea to stop indexing robots.txt to mitigate the faults of lazy
or incompetent admins (Google already does this for many specific search
queries) and there's not much point in indexing the robots.txt file for
legitimate uses anyway.





-- 
“There's a reason we separate military and the police: one fights the enemy
of the state, the other serves and protects the people. When the military
becomes both, then the enemies of the state tend to become the people.”

Re: [Full-disclosure] Google's robot.txt handling

2012-12-11 Thread Jeffrey Walton

On Tue, Dec 11, 2012 at 4:11 PM, Mario Vilas mvi...@gmail.com wrote:
 I think we can all agree this is not a vulnerability. Still, I have yet to
 see an argument saying why what the OP is proposing is a bad idea. It may be
 a good idea to stop indexing robots.txt to mitigate the faults of lazy or
 incompetent admins (Google already does this for many specific search
 queries) and there's not much point in indexing the robots.txt file for
 legitimate uses anyway.
I kind of agree here. The information is valuable for the
reconnaissance phase of an attack, but it's not a vulnerability per
se. But what is to stop the attacker from fetching it himself/herself,
since it's at a known location for all sites? In this case, Google
would be removing aggregated search results (which means the attacker
would have to compile it himself/herself).
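
What "compiling it himself/herself" might look like can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the host names in the usage are hypothetical, and the robots.txt bodies are assumed to have been fetched already.

```python
from collections import Counter


def disallow_counts(robots_by_host):
    """Count how many hosts disallow each path, given {host: robots.txt body}."""
    counts = Counter()
    for host, body in robots_by_host.items():
        seen = set()
        for raw in body.splitlines():
            line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
            field, sep, value = line.partition(":")
            if sep and field.strip().lower() == "disallow" and value.strip():
                seen.add(value.strip())
        counts.update(seen)  # each host counts at most once per path
    return counts


if __name__ == "__main__":
    # Hypothetical hosts and bodies, for illustration only.
    bodies = {
        "a.example": "User-agent: *\nDisallow: /admin/\n",
        "b.example": "User-agent: *\nDisallow: /admin/\nDisallow: /tmp/\n",
    }
    print(disallow_counts(bodies).most_common())
```

The aggregation itself is trivial; what a search engine saves the attacker is the crawling of many hosts, not the parsing.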

Google removed other interesting searches, such as social security
numbers and credit card numbers (or does not provide them to the
general public).

Jeff


Re: [Full-disclosure] Google's robot.txt handling

2012-12-11 Thread Hurgel Bumpf

Hi guys,

thank you for your valuable feedback.

The question was raised of what prevents somebody from building a script to
scan for robots.txt files manually. Seriously, let's just call it common
sense: the time and effort invested do not pay off very well.

This is why Google is so useful in that way: thousands of servers indexing
the files in no time, for free.


Thanks,

Coman the Intensivecarian






Re: [Full-disclosure] Google's robot.txt handling

2012-12-11 Thread Christian Sciberras

If you ask me, it's a stupid idea. :)

I prefer to know where I am with a service; and (IMHO) I would prefer to
query (occasionally) Google for my CC instead of waiting for someone to
start taking funds off it.
Hiding it only provides a false sense of security - it will last until
someone finds the service leaking out CCs.

This is especially the case with robots.txt. Can someone on the list please
define a good web crawler?
There are plenty of crawlers out there, and most are relatively unknown. How
will we know which to trust?

I think the problem here is that people are plain stupid and throw
direct entries into robots.txt, whereas they should be using wildcard
entries.
Couple that with actually protecting sensitive areas, and it's a pretty
good defence.
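
A rough sketch of how a wildcard entry can cover a directory without spelling out its name. It assumes the crawler implements the common wildcard extension ('*' matches any run of characters, a trailing '$' anchors the end of the URL); the original robots.txt convention did not define wildcards, so crawlers that ignore the extension will not honour such entries. The function names and sample paths are invented for the example.

```python
import re


def robots_pattern_to_regex(pattern):
    """Compile a robots.txt path pattern: '*' = any run of chars, trailing '$' = end of URL."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path, patterns):
    """True if the path matches any Disallow pattern, matching from the start of the path."""
    return any(robots_pattern_to_regex(p).match(path) for p in patterns)


if __name__ == "__main__":
    patterns = ["/a*/", "/*.bak$"]  # neither entry names the admin directory in full
    for path in ("/admin-panel/login.php", "/blog/post", "/files/db.bak"):
        print(path, is_disallowed(path, patterns))
```

An entry like "/a*/" still narrows the search space for anyone reading the file, so it only makes sense together with actual access control.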

On a side note, someone already said this, but I'll repeat it for effect:
don't throw anything onto the Net which you're not prepared to protect.
If a control panel should not be accessible to the general public, consider
restricting access by IP and similar measures. Even a personal certificate
is a valid layer of defence...
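
As a sketch of the "restrict access by IP" idea, using Python's standard ipaddress module. The allowed networks below are documentation address ranges chosen for illustration, and is_allowed is a hypothetical helper name; in practice this check usually lives in the web server or firewall configuration rather than in application code.

```python
import ipaddress

# Illustrative allow-list (documentation ranges, not real networks).
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_allowed(client_ip, networks=ALLOWED_NETWORKS):
    """True if client_ip falls inside one of the allowed networks."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)


if __name__ == "__main__":
    for ip in ("192.0.2.17", "198.51.100.5", "2001:db8::1"):
        print(ip, is_allowed(ip))
```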


Chris.




Re: [Full-disclosure] Google's robot.txt handling

2012-12-11 Thread Jeffrey Walton

On Tue, Dec 11, 2012 at 5:53 PM, Christian Sciberras uuf6...@gmail.com wrote:
 If you ask me, it's a stupid idea. :)

 I prefer to know where I am with a service; and (IMHO) I would prefer to
 query (occasionally) Google for my CC instead of waiting for someone to
 start taking funds off it.
 Hiding it only provides a false sense of security - it will last until
 someone finds the service leaking out CCs.
Agreed. How about search engine data by other crawlers that was not sanitized?

 This is especially the case with robots.txt. Can someone on the list please
 define a good web crawler?
Haha! Milk up the nose.

 I think the problem here is that people are plain stupid and throw in direct
 entries inside robots.txt, whereas they should be sending wildcard entries.
 Couple that with actually protecting sensitive areas, and it's a pretty good
 defence.
We now know you don't need a robots.txt for exclusion. Just ask Weev.

Jeff


Re: [Full-disclosure] Google's robot.txt handling

2012-12-11 Thread Thomas Behrend

 We found this security issue a really long time ago and used it
 ourselves to find hidden pages.
 The only thing you can do is harden the directory against crawlers
 with mod_rewrite, or in the index.(php|pl|py|asp|etc) itself, by
 checking the browser (User-Agent) string. If it doesn't contain something
 like the common browser strings, just send a 404 back, and Google and
 other crawlers will never index it.
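
The User-Agent check described above could look roughly like this as a Python WSGI application (the original mentions mod_rewrite and index.php; Python is used here only for illustration). The token lists and function names are assumptions made for the sketch and are far from exhaustive. Since the User-Agent header is chosen by the client, this only keeps well-behaved crawlers away.

```python
BROWSER_PREFIXES = ("Mozilla/", "Opera/")      # illustrative, not exhaustive
CRAWLER_TOKENS = ("bot", "crawler", "spider")  # illustrative, not exhaustive


def looks_like_browser(user_agent):
    """Heuristic: a known browser prefix and no crawler token in the string."""
    ua = user_agent or ""
    if any(token in ua.lower() for token in CRAWLER_TOKENS):
        return False
    return ua.startswith(BROWSER_PREFIXES)


def admin_app(environ, start_response):
    """WSGI app that answers 404 to anything not looking like a browser."""
    if not looks_like_browser(environ.get("HTTP_USER_AGENT", "")):
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"admin login page\n"]
```

It can be tried locally with wsgiref.simple_server.make_server("127.0.0.1", 8000, admin_app).serve_forever() from the standard library.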

 Of course, you could just rename /admin/ to /4dm1n/, or even use a
 subdomain you never link to on your web page. In that case, split the
 web content and hide / in the robots.txt, just in case the URL leaks.
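
Assuming "hide /" means a blanket "Disallow: /" on the unlinked subdomain, the effect on a compliant crawler can be checked with Python's standard urllib.robotparser. The helper name and host names below are hypothetical.

```python
from urllib import robotparser


def may_fetch(robots_txt, url, agent="*"):
    """Ask the standard-library parser whether `agent` may fetch `url`."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    hidden = "User-agent: *\nDisallow: /\n"  # nothing is named, everything is excluded
    print(may_fetch(hidden, "http://hidden.example.com/4dm1n/"))
```

Unlike a "Disallow: /4dm1n/" entry, this robots.txt reveals no path names, though it only binds crawlers that choose to obey it.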

 Another thing we saw working: just lock the directory via the htaccess
 mechanism of your web server software. Google didn't index the page
 because the crawler didn't get an HTTP 200 back; it got a 401.
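
The 401 behaviour can be sketched as a small WSGI application doing HTTP Basic authentication. This is an illustration only, not how htaccess/htpasswd is implemented: the credentials are hard-coded placeholders (a real setup stores password hashes), and the names are invented for the example.

```python
import base64
import hmac

# Placeholder credentials for illustration only.
USERNAME = "admin"
PASSWORD = "correct horse battery staple"


def credentials_ok(auth_header):
    """Validate an 'Authorization: Basic ...' header value."""
    if not auth_header or not auth_header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(auth_header[6:]).decode("utf-8")
    except ValueError:  # covers malformed base64 and undecodable bytes
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # Constant-time comparisons to avoid leaking how many characters matched.
    user_ok = hmac.compare_digest(user.encode(), USERNAME.encode())
    pass_ok = hmac.compare_digest(password.encode(), PASSWORD.encode())
    return user_ok and pass_ok


def protected_app(environ, start_response):
    """WSGI app: 401 without valid credentials, 200 with them."""
    if not credentials_ok(environ.get("HTTP_AUTHORIZATION")):
        start_response("401 Unauthorized", [
            ("WWW-Authenticate", 'Basic realm="admin"'),
            ("Content-Type", "text/plain"),
        ])
        return [b"Authentication required\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"admin interface\n"]
```

A crawler requesting this without credentials only ever sees the 401 response, so there is no page content for it to index.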

 So, that's our way of hiding our admin interfaces. It has worked so far,
 but even if someone finds it, the interface should be strong enough to
 withstand any attack. And of course, the login credentials shouldn't be
 "password", or posted at the top of the page as "Speak friend and come in" :)

 So long
 Thomas

