Hi.

I have some pre-production servers that I don't want indexed by search engines. I tried many ways to make a global rule that will always return

    User-agent: *
    Disallow: /

for /robots.txt, but there is always some special case where a vhost can change this behaviour and end up being indexed. I suppose a lot of other people have the same problem.

So I made a module called "mod_norobot" that, once enabled, makes sure that none of the sites on your server will be indexed. It is really simple, but I was wondering if it could be useful to others, so here it is, attached to this email.

Disclaimer: I'm absolutely not a C or system developer. I'm a Java developer, and this is my first module. So maybe it could be made better ...

--
Mike Baroukh
---
Cardiweb - 29 Cite d'Antin Paris IXeme
+33 6 63 57 27 22 / +33 1 53 21 82 63
http://www.cardiweb.com/
---
/*
**  mod_norobot.c -- Apache norobot module
**  [Skeleton autogenerated via ``apxs -n norobot -g'']
**
**  Answers requests for /robots.txt with a rule that disallows
**  all robots.
**
**  To use this module, first compile it into a DSO file and
**  install it into Apache's modules directory by running:
**
**    $ apxs -c -i mod_norobot.c
**
**  Then activate it in Apache's apache2.conf file:
**
**    #   apache2.conf
**    LoadModule norobot_module modules/mod_norobot.so
**
**  No SetHandler directive is needed: the handler is registered
**  with ap_hook_handler and checks the request path itself.
**
**  Then after restarting Apache via
**
**    $ apachectl restart
**
**  you can request /robots.txt and watch for the output of this
**  module. This can be achieved for instance via:
**
**    $ lynx -mime_header http://localhost/robots.txt
**
**  The response should be served as text/plain with the body:
**
**    User-agent: *
**    Disallow: /
*/

#include "httpd.h"
#include "http_config.h"
#include "http_protocol.h"
#include "ap_config.h"

/* The content handler: answers /robots.txt, declines everything else */
static int norobot_handler(request_rec *r)
{
    /* strcasecmp returns non-zero when the path is not /robots.txt */
    if (r->parsed_uri.path == NULL
        || strcasecmp(r->parsed_uri.path, "/robots.txt") != 0) {
        return DECLINED;
    }

    ap_set_content_type(r, "text/plain");
    ap_rputs("User-agent: *\nDisallow: /", r);
    return OK;
}

static void norobot_register_hooks(apr_pool_t *p)
{
    /* APR_HOOK_FIRST so this handler runs ahead of other content handlers */
    ap_hook_handler(norobot_handler, NULL, NULL, APR_HOOK_FIRST);
}

/* Dispatch list for API hooks */
module AP_MODULE_DECLARE_DATA norobot_module = {
    STANDARD20_MODULE_STUFF,
    NULL,                   /* create per-dir    config structures */
    NULL,                   /* merge  per-dir    config structures */
    NULL,                   /* create per-server config structures */
    NULL,                   /* merge  per-server config structures */
    NULL,                   /* table of config file commands       */
    norobot_register_hooks  /* register hooks                      */
};
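
For anyone who wants to check the path matching without building against Apache, here is a minimal standalone sketch of the same decision the handler makes. The names NOROBOT_BODY, is_robots_request and norobot_body_for are hypothetical helpers made up for illustration; they are not part of mod_norobot.c or of the Apache API. It assumes a POSIX system that provides strcasecmp in <strings.h>.

```c
#include <stddef.h>
#include <strings.h>  /* strcasecmp (POSIX) */

/* Body the module sends for /robots.txt, copied from the handler. */
static const char NOROBOT_BODY[] = "User-agent: *\nDisallow: /";

/*
 * Hypothetical helper, not part of mod_norobot.c: returns 1 when the
 * request path is /robots.txt (compared case-insensitively, as the
 * handler does), 0 otherwise. A NULL path never matches.
 */
static int is_robots_request(const char *path)
{
    if (path == NULL) {
        return 0;
    }
    return strcasecmp(path, "/robots.txt") == 0;
}

/*
 * Hypothetical helper: mirrors the handler's decision. Returns the body
 * to send, or NULL in the case where the real handler returns DECLINED.
 */
static const char *norobot_body_for(const char *path)
{
    return is_robots_request(path) ? NOROBOT_BODY : NULL;
}
```

Note that only the exact path /robots.txt matches (in any letter case); a path such as /foo/robots.txt is declined, which is the same behaviour as the comparison in the module.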