Re: [Full-disclosure] Google's robot.txt handling
If I understand the OP correctly, he is not stating that listing something in robots.txt would make it inaccessible, but rather that Google indexes the robots.txt files themselves, snipped Well, um, yeah - I got that. So you are what, proposing that moving an open door back a few centimetres solves the (non) problem? Take your proposal to it's logical extension and stop all search engines (especially the ones that don't respect robots.txt) from indexing robots.txt. Now what do you do about Nutch or even some perl script that anyone can whip up in 2 minutes? Security through obscurity is fine when couple with actual security - but relying on it alone is just daft. Expecting to world to change so bad habits have no consequence is dangerously naive. I suspect you're looking to hard at finding fault with Google - who are complying with the robots.txt. Read the spec. - it's about not following the listed directories, not about not listing the robots.txt. Next you'll want laws against bad weather and furniture with sharp corners. Don't put things you don't want seen to see in places that can be seen. On Mon, Dec 10, 2012 at 8:19 PM, Scott Ferguson scott.ferguson.it.consulting () gmail com wrote: /From/: Hurgel Bumpf l0rd_lunatic () yahoo com /Date/: Mon, 10 Dec 2012 19:25:39 + (GMT) Hi list, i tried to contact google, but as they didn't answer my email, i do forward this to FD. This security feature is not cleary a google vulnerability, but exposes websites informations that are not really intended to be public. Conan the bavarian Your point eludes me - Google is indexing something which is publicly available. eg.:- curl http://somesite.tld/robots.txt So it seems the solution to the question your raise is, um, nonsensical. If you don't want something exposed on your web server *don't publish references to it*. The solution, which should be blindingly obvious, is don't create the problem in the first place. Password sensitive directories (htpasswd) - then they don't have to be excluded from search engines (because listing the inaccessible in robots.txt is redundant). You must of missed the first day of web school. Kind regards. ___ Full-Disclosure - We believe in it. Charter: http://lists.grok.org.uk/full-disclosure-charter.html Hosted and sponsored by Secunia - http://secunia.com/
Re: [Full-disclosure] Google's robot.txt handling
I think we can all agree this is not a vulnerability. Still, I have yet to see an argument saying why what the OP is proposing is a bad idea. It may be a good idea to stop indexing robots.txt to mitigate the faults of lazy or incompetent admins (Google already does this for many specific search queries) and there's not much point in indexing the robots.txt file for legitimate uses anyway. On Tue, Dec 11, 2012 at 2:01 PM, Scott Ferguson scott.ferguson.it.consult...@gmail.com wrote: If I understand the OP correctly, he is not stating that listing something in robots.txt would make it inaccessible, but rather that Google indexes the robots.txt files themselves, snipped Well, um, yeah - I got that. So you are what, proposing that moving an open door back a few centimetres solves the (non) problem? Take your proposal to it's logical extension and stop all search engines (especially the ones that don't respect robots.txt) from indexing robots.txt. Now what do you do about Nutch or even some perl script that anyone can whip up in 2 minutes? Security through obscurity is fine when couple with actual security - but relying on it alone is just daft. Expecting to world to change so bad habits have no consequence is dangerously naive. I suspect you're looking to hard at finding fault with Google - who are complying with the robots.txt. Read the spec. - it's about not following the listed directories, not about not listing the robots.txt. Next you'll want laws against bad weather and furniture with sharp corners. Don't put things you don't want seen to see in places that can be seen. On Mon, Dec 10, 2012 at 8:19 PM, Scott Ferguson scott.ferguson.it.consulting () gmail com wrote: /From/: Hurgel Bumpf l0rd_lunatic () yahoo com /Date/: Mon, 10 Dec 2012 19:25:39 + (GMT) Hi list, i tried to contact google, but as they didn't answer my email, i do forward this to FD. This security feature is not cleary a google vulnerability, but exposes websites informations that are not really intended to be public. Conan the bavarian Your point eludes me - Google is indexing something which is publicly available. eg.:- curl http://somesite.tld/robots.txt So it seems the solution to the question your raise is, um, nonsensical. If you don't want something exposed on your web server *don't publish references to it*. The solution, which should be blindingly obvious, is don't create the problem in the first place. Password sensitive directories (htpasswd) - then they don't have to be excluded from search engines (because listing the inaccessible in robots.txt is redundant). You must of missed the first day of web school. Kind regards. ___ Full-Disclosure - We believe in it. Charter: http://lists.grok.org.uk/full-disclosure-charter.html Hosted and sponsored by Secunia - http://secunia.com/ -- “There's a reason we separate military and the police: one fights the enemy of the state, the other serves and protects the people. When the military becomes both, then the enemies of the state tend to become the people.” ___ Full-Disclosure - We believe in it. Charter: http://lists.grok.org.uk/full-disclosure-charter.html Hosted and sponsored by Secunia - http://secunia.com/
Re: [Full-disclosure] Google's robot.txt handling
On Tue, Dec 11, 2012 at 4:11 PM, Mario Vilas mvi...@gmail.com wrote: I think we can all agree this is not a vulnerability. Still, I have yet to see an argument saying why what the OP is proposing is a bad idea. It may be a good idea to stop indexing robots.txt to mitigate the faults of lazy or incompetent admins (Google already does this for many specific search queries) and there's not much point in indexing the robots.txt file for legitimate uses anyway. I kind of agree here. The information is valuable for the reconnaissance phase of an attack, buts its not a vulnerability per se. But what is to stop the attacker from fetching it himself/herself since its at a known location for all sites? In this case, Google would be removing aggregated search results (which means the attacker would have to compile it himself/herself). Google removed other interesting searches, such as social security numbers and credit card numbers (or does not provide them to the general public). Jeff On Tue, Dec 11, 2012 at 2:01 PM, Scott Ferguson scott.ferguson.it.consult...@gmail.com wrote: If I understand the OP correctly, he is not stating that listing something in robots.txt would make it inaccessible, but rather that Google indexes the robots.txt files themselves, snipped Well, um, yeah - I got that. So you are what, proposing that moving an open door back a few centimetres solves the (non) problem? Take your proposal to it's logical extension and stop all search engines (especially the ones that don't respect robots.txt) from indexing robots.txt. Now what do you do about Nutch or even some perl script that anyone can whip up in 2 minutes? Security through obscurity is fine when couple with actual security - but relying on it alone is just daft. Expecting to world to change so bad habits have no consequence is dangerously naive. I suspect you're looking to hard at finding fault with Google - who are complying with the robots.txt. Read the spec. - it's about not following the listed directories, not about not listing the robots.txt. Next you'll want laws against bad weather and furniture with sharp corners. Don't put things you don't want seen to see in places that can be seen. On Mon, Dec 10, 2012 at 8:19 PM, Scott Ferguson scott.ferguson.it.consulting () gmail com wrote: /From/: Hurgel Bumpf l0rd_lunatic () yahoo com /Date/: Mon, 10 Dec 2012 19:25:39 + (GMT) Hi list, i tried to contact google, but as they didn't answer my email, i do forward this to FD. This security feature is not cleary a google vulnerability, but exposes websites informations that are not really intended to be public. Conan the bavarian Your point eludes me - Google is indexing something which is publicly available. eg.:- curl http://somesite.tld/robots.txt So it seems the solution to the question your raise is, um, nonsensical. If you don't want something exposed on your web server *don't publish references to it*. The solution, which should be blindingly obvious, is don't create the problem in the first place. Password sensitive directories (htpasswd) - then they don't have to be excluded from search engines (because listing the inaccessible in robots.txt is redundant). You must of missed the first day of web school. ___ Full-Disclosure - We believe in it. Charter: http://lists.grok.org.uk/full-disclosure-charter.html Hosted and sponsored by Secunia - http://secunia.com/
Re: [Full-disclosure] Google's robot.txt handling
Hi guys, thank you for your valuable feedback. The question was raised, what prevents somebody to build a script to scan for the robots.txt manually. Seriously, let's call it just common sense. The time and effort invested does not pay off very well. This is why google is very useful in that way. Thousands of servers indexing the files in no time for free. Thanks, Coman the Intensivecarian Von: Jeffrey Walton noloa...@gmail.com An: Mario Vilas mvi...@gmail.com CC: full-disclosure@lists.grok.org.uk Gesendet: 22:38 Dienstag, 11.Dezember 2012 Betreff: Re: [Full-disclosure] Google's robot.txt handling On Tue, Dec 11, 2012 at 4:11 PM, Mario Vilas mvi...@gmail.com wrote: I think we can all agree this is not a vulnerability. Still, I have yet to see an argument saying why what the OP is proposing is a bad idea. It may be a good idea to stop indexing robots.txt to mitigate the faults of lazy or incompetent admins (Google already does this for many specific search queries) and there's not much point in indexing the robots.txt file for legitimate uses anyway. I kind of agree here. The information is valuable for the reconnaissance phase of an attack, buts its not a vulnerability per se. But what is to stop the attacker from fetching it himself/herself since its at a known location for all sites? In this case, Google would be removing aggregated search results (which means the attacker would have to compile it himself/herself). Google removed other interesting searches, such as social security numbers and credit card numbers (or does not provide them to the general public). Jeff On Tue, Dec 11, 2012 at 2:01 PM, Scott Ferguson scott.ferguson.it.consult...@gmail.com wrote: If I understand the OP correctly, he is not stating that listing something in robots.txt would make it inaccessible, but rather that Google indexes the robots.txt files themselves, snipped Well, um, yeah - I got that. So you are what, proposing that moving an open door back a few centimetres solves the (non) problem? Take your proposal to it's logical extension and stop all search engines (especially the ones that don't respect robots.txt) from indexing robots.txt. Now what do you do about Nutch or even some perl script that anyone can whip up in 2 minutes? Security through obscurity is fine when couple with actual security - but relying on it alone is just daft. Expecting to world to change so bad habits have no consequence is dangerously naive. I suspect you're looking to hard at finding fault with Google - who are complying with the robots.txt. Read the spec. - it's about not following the listed directories, not about not listing the robots.txt. Next you'll want laws against bad weather and furniture with sharp corners. Don't put things you don't want seen to see in places that can be seen. On Mon, Dec 10, 2012 at 8:19 PM, Scott Ferguson scott.ferguson.it.consulting () gmail com wrote: /From/: Hurgel Bumpf l0rd_lunatic () yahoo com /Date/: Mon, 10 Dec 2012 19:25:39 + (GMT) Hi list, i tried to contact google, but as they didn't answer my email, i do forward this to FD. This security feature is not cleary a google vulnerability, but exposes websites informations that are not really intended to be public. Conan the bavarian Your point eludes me - Google is indexing something which is publicly available. eg.:- curl http://somesite.tld/robots.txt So it seems the solution to the question your raise is, um, nonsensical. If you don't want something exposed on your web server *don't publish references to it*. The solution, which should be blindingly obvious, is don't create the problem in the first place. Password sensitive directories (htpasswd) - then they don't have to be excluded from search engines (because listing the inaccessible in robots.txt is redundant). You must of missed the first day of web school. ___ Full-Disclosure - We believe in it. Charter: http://lists.grok.org.uk/full-disclosure-charter.html Hosted and sponsored by Secunia - http://secunia.com/___ Full-Disclosure - We believe in it. Charter: http://lists.grok.org.uk/full-disclosure-charter.html Hosted and sponsored by Secunia - http://secunia.com/
Re: [Full-disclosure] Google's robot.txt handling
If you ask me, it's a stupid idea. :) I prefer to know where I am with a service; and (IMHO) I would prefer to query (occasionally) Google for my CC instead of waiting for someone to start taking funds off it. Hiding it only provides a false sense of security - it will last until someone finds the service leaking out CCs. This is especially the case with robots.txt. Can someone on the list please define a good web crawler? There's plenty of crawlers out there, most are relatively unknown how will we know which to trust? I think the problem here is that people are plain stupid and throw in direct entries inside robots.txt, whereas they should be sending wildcard entries. Couple that with actually protecting sensitive areas, and it's a pretty good defence. On a side note, someone already said this, but I'll repeat it for effect: don't thrown in anything on the Net which you're not prepared to protect. If a control panel should not be accessible to the general public, consider restricting access by IP and similar measures. Even a personal certificate is a valid layer of defence... Chris. On Tue, Dec 11, 2012 at 10:38 PM, Jeffrey Walton noloa...@gmail.com wrote: On Tue, Dec 11, 2012 at 4:11 PM, Mario Vilas mvi...@gmail.com wrote: I think we can all agree this is not a vulnerability. Still, I have yet to see an argument saying why what the OP is proposing is a bad idea. It may be a good idea to stop indexing robots.txt to mitigate the faults of lazy or incompetent admins (Google already does this for many specific search queries) and there's not much point in indexing the robots.txt file for legitimate uses anyway. I kind of agree here. The information is valuable for the reconnaissance phase of an attack, buts its not a vulnerability per se. But what is to stop the attacker from fetching it himself/herself since its at a known location for all sites? In this case, Google would be removing aggregated search results (which means the attacker would have to compile it himself/herself). Google removed other interesting searches, such as social security numbers and credit card numbers (or does not provide them to the general public). Jeff On Tue, Dec 11, 2012 at 2:01 PM, Scott Ferguson scott.ferguson.it.consult...@gmail.com wrote: If I understand the OP correctly, he is not stating that listing something in robots.txt would make it inaccessible, but rather that Google indexes the robots.txt files themselves, snipped Well, um, yeah - I got that. So you are what, proposing that moving an open door back a few centimetres solves the (non) problem? Take your proposal to it's logical extension and stop all search engines (especially the ones that don't respect robots.txt) from indexing robots.txt. Now what do you do about Nutch or even some perl script that anyone can whip up in 2 minutes? Security through obscurity is fine when couple with actual security - but relying on it alone is just daft. Expecting to world to change so bad habits have no consequence is dangerously naive. I suspect you're looking to hard at finding fault with Google - who are complying with the robots.txt. Read the spec. - it's about not following the listed directories, not about not listing the robots.txt. Next you'll want laws against bad weather and furniture with sharp corners. Don't put things you don't want seen to see in places that can be seen. On Mon, Dec 10, 2012 at 8:19 PM, Scott Ferguson scott.ferguson.it.consulting () gmail com wrote: /From/: Hurgel Bumpf l0rd_lunatic () yahoo com /Date/: Mon, 10 Dec 2012 19:25:39 + (GMT) Hi list, i tried to contact google, but as they didn't answer my email, i do forward this to FD. This security feature is not cleary a google vulnerability, but exposes websites informations that are not really intended to be public. Conan the bavarian Your point eludes me - Google is indexing something which is publicly available. eg.:- curl http://somesite.tld/robots.txt So it seems the solution to the question your raise is, um, nonsensical. If you don't want something exposed on your web server *don't publish references to it*. The solution, which should be blindingly obvious, is don't create the problem in the first place. Password sensitive directories (htpasswd) - then they don't have to be excluded from search engines (because listing the inaccessible in robots.txt is redundant). You must of missed the first day of web school. ___ Full-Disclosure - We believe in it. Charter: http://lists.grok.org.uk/full-disclosure-charter.html Hosted and sponsored by Secunia - http://secunia.com/
Re: [Full-disclosure] Google's robot.txt handling
On Tue, Dec 11, 2012 at 5:53 PM, Christian Sciberras uuf6...@gmail.com wrote: If you ask me, it's a stupid idea. :) I prefer to know where I am with a service; and (IMHO) I would prefer to query (occasionally) Google for my CC instead of waiting for someone to start taking funds off it. Hiding it only provides a false sense of security - it will last until someone finds the service leaking out CCs. Agreed. How about search engine data by other crawlers that was not sanitized? This is especially the case with robots.txt. Can someone on the list please define a good web crawler? Haha! Milk up the nose. I think the problem here is that people are plain stupid and throw in direct entries inside robots.txt, whereas they should be sending wildcard entries. Couple that with actually protecting sensitive areas, and it's a pretty good defence. We now know you don't need a robots.txt for exclusion. Just ask Weev. Jeff On Tue, Dec 11, 2012 at 10:38 PM, Jeffrey Walton noloa...@gmail.com wrote: On Tue, Dec 11, 2012 at 4:11 PM, Mario Vilas mvi...@gmail.com wrote: I think we can all agree this is not a vulnerability. Still, I have yet to see an argument saying why what the OP is proposing is a bad idea. It may be a good idea to stop indexing robots.txt to mitigate the faults of lazy or incompetent admins (Google already does this for many specific search queries) and there's not much point in indexing the robots.txt file for legitimate uses anyway. I kind of agree here. The information is valuable for the reconnaissance phase of an attack, buts its not a vulnerability per se. But what is to stop the attacker from fetching it himself/herself since its at a known location for all sites? In this case, Google would be removing aggregated search results (which means the attacker would have to compile it himself/herself). Google removed other interesting searches, such as social security numbers and credit card numbers (or does not provide them to the general public). Jeff On Tue, Dec 11, 2012 at 2:01 PM, Scott Ferguson scott.ferguson.it.consult...@gmail.com wrote: If I understand the OP correctly, he is not stating that listing something in robots.txt would make it inaccessible, but rather that Google indexes the robots.txt files themselves, snipped Well, um, yeah - I got that. So you are what, proposing that moving an open door back a few centimetres solves the (non) problem? Take your proposal to it's logical extension and stop all search engines (especially the ones that don't respect robots.txt) from indexing robots.txt. Now what do you do about Nutch or even some perl script that anyone can whip up in 2 minutes? Security through obscurity is fine when couple with actual security - but relying on it alone is just daft. Expecting to world to change so bad habits have no consequence is dangerously naive. I suspect you're looking to hard at finding fault with Google - who are complying with the robots.txt. Read the spec. - it's about not following the listed directories, not about not listing the robots.txt. Next you'll want laws against bad weather and furniture with sharp corners. Don't put things you don't want seen to see in places that can be seen. On Mon, Dec 10, 2012 at 8:19 PM, Scott Ferguson scott.ferguson.it.consulting () gmail com wrote: /From/: Hurgel Bumpf l0rd_lunatic () yahoo com /Date/: Mon, 10 Dec 2012 19:25:39 + (GMT) Hi list, i tried to contact google, but as they didn't answer my email, i do forward this to FD. This security feature is not cleary a google vulnerability, but exposes websites informations that are not really intended to be public. Conan the bavarian Your point eludes me - Google is indexing something which is publicly available. eg.:- curl http://somesite.tld/robots.txt So it seems the solution to the question your raise is, um, nonsensical. If you don't want something exposed on your web server *don't publish references to it*. The solution, which should be blindingly obvious, is don't create the problem in the first place. Password sensitive directories (htpasswd) - then they don't have to be excluded from search engines (because listing the inaccessible in robots.txt is redundant). You must of missed the first day of web school. ___ Full-Disclosure - We believe in it. Charter: http://lists.grok.org.uk/full-disclosure-charter.html Hosted and sponsored by Secunia - http://secunia.com/
Re: [Full-disclosure] Google's robot.txt handling
We found this Security Issue real long time ago and used it by ourself to find hidden pages. The only thing you could do, is to harden the directory for Crawlers with Mod_Rewrite or in the index.(php|pl|py|asp|etc) itself when you check the Browser String. If it doesn´t contain somethin like the common Browser Strings, just send an 404 back and Google and other Crawlers will never index it. Of course, you just could rename /admin/ to /4dm1n/ or even just us an Subdomain you never link on your Webpage, in that case, just split the Webcontent and hide / in the robots.txt just in case the URL leaks. Another thing we saw working: Just lock the directory via htaccess of your Webserversoftware and Google didn´t index the page because the Crawler didn´t get an HTTP Code 200 back, its getting an 401. So, thats our way to hide our Admininterfaces. Worked so far, but even in case someone finds it, the Interface should be strong enough to withstand any Attack. And of course, the Login Creditials shouldn´t be password or on top of Page Speak friend an come in :) So long Thomas On Tue, 11 Dec 2012 17:57:31 -0500, Jeffrey Walton noloa...@gmail.com wrote: On Tue, Dec 11, 2012 at 5:53 PM, Christian Sciberras uuf6...@gmail.com wrote: If you ask me, it's a stupid idea. :) I prefer to know where I am with a service; and (IMHO) I would prefer to query (occasionally) Google for my CC instead of waiting for someone to start taking funds off it. Hiding it only provides a false sense of security - it will last until someone finds the service leaking out CCs. Agreed. How about search engine data by other crawlers that was not sanitized? This is especially the case with robots.txt. Can someone on the list please define a good web crawler? Haha! Milk up the nose. I think the problem here is that people are plain stupid and throw in direct entries inside robots.txt, whereas they should be sending wildcard entries. Couple that with actually protecting sensitive areas, and it's a pretty good defence. We now know you don't need a robots.txt for exclusion. Just ask Weev. Jeff On Tue, Dec 11, 2012 at 10:38 PM, Jeffrey Walton noloa...@gmail.com wrote: On Tue, Dec 11, 2012 at 4:11 PM, Mario Vilas mvi...@gmail.com wrote: I think we can all agree this is not a vulnerability. Still, I have yet to see an argument saying why what the OP is proposing is a bad idea. It may be a good idea to stop indexing robots.txt to mitigate the faults of lazy or incompetent admins (Google already does this for many specific search queries) and there's not much point in indexing the robots.txt file for legitimate uses anyway. I kind of agree here. The information is valuable for the reconnaissance phase of an attack, buts its not a vulnerability per se. But what is to stop the attacker from fetching it himself/herself since its at a known location for all sites? In this case, Google would be removing aggregated search results (which means the attacker would have to compile it himself/herself). Google removed other interesting searches, such as social security numbers and credit card numbers (or does not provide them to the general public). Jeff On Tue, Dec 11, 2012 at 2:01 PM, Scott Ferguson scott.ferguson.it.consult...@gmail.com wrote: If I understand the OP correctly, he is not stating that listing something in robots.txt would make it inaccessible, but rather that Google indexes the robots.txt files themselves, snipped Well, um, yeah - I got that. So you are what, proposing that moving an open door back a few centimetres solves the (non) problem? Take your proposal to it's logical extension and stop all search engines (especially the ones that don't respect robots.txt) from indexing robots.txt. Now what do you do about Nutch or even some perl script that anyone can whip up in 2 minutes? Security through obscurity is fine when couple with actual security - but relying on it alone is just daft. Expecting to world to change so bad habits have no consequence is dangerously naive. I suspect you're looking to hard at finding fault with Google - who are complying with the robots.txt. Read the spec. - it's about not following the listed directories, not about not listing the robots.txt. Next you'll want laws against bad weather and furniture with sharp corners. Don't put things you don't want seen to see in places that can be seen. On Mon, Dec 10, 2012 at 8:19 PM, Scott Ferguson scott.ferguson.it.consulting () gmail com wrote: /From/: Hurgel Bumpf l0rd_lunatic () yahoo com /Date/: Mon, 10 Dec 2012 19:25:39 + (GMT) Hi list, i tried to contact google, but as they didn't answer my email, i do