Hi.

I have some pre-production servers that I don't want to be indexed by
search engines.
I tried many ways to write a global rule that would always return

User-agent: *
Disallow: /

for /robots.txt, but there are always special cases where a vhost can
override this behaviour and get indexed.

I suppose a lot of other people have the same problem.

So I wrote a module called "mod_norobot" that, once loaded, makes every
virtual host answer /robots.txt with the rule above, so none of your
servers gets indexed by crawlers that honour robots.txt.
It is really simple, but I thought it could be useful to others, so
here it is, attached to this email.

Disclaimer :
I'm absolutely not a C or systems developer.
I'm a Java developer.
And this is my first module.
So maybe it could be made better ...


-- 
Mike Baroukh
---
Cardiweb  - 29 Cite d'Antin Paris IXeme
+33 6 63 57 27 22 / +33 1 53 21 82 63 
http://www.cardiweb.com/
---

/*
**  mod_norobot.c -- Apache module that answers /robots.txt with a
**  "disallow all" rule on every virtual host
**  [Skeleton generated via ``apxs -n norobot -g'']
**
**  Compile it into a DSO file and install it into Apache's
**  modules directory by running:
**
**    $ apxs -c -i mod_norobot.c
**
**  Then load it in Apache's apache2.conf file. No <Location> or
**  SetHandler block is needed: the handler is registered for every
**  request and only acts on /robots.txt.
**
**    #   apache2.conf
**    LoadModule norobot_module modules/mod_norobot.so
**
**  Then after restarting Apache via
**
**    $ apachectl restart
**
**  you can request /robots.txt on any virtual host and check the
**  output of this module, for instance via:
**
**    $ lynx -mime_header http://localhost/robots.txt
**
**  The response should be "200 OK" with "Content-Type: text/plain"
**  and the following body:
**
**    User-agent: *
**    Disallow: /
*/

#include "httpd.h"
#include "http_config.h"
#include "http_protocol.h"
#include "ap_config.h"

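/* Content handler: serve a "disallow all" robots.txt and decline every
 * other request. It only looks at the request path, not r->handler, so
 * it applies to every virtual host without any per-vhost configuration. */
static int norobot_handler(request_rec *r)
{
    /* Leave anything that is not /robots.txt to the other handlers */
    if (r->parsed_uri.path == NULL
        || strcasecmp(r->parsed_uri.path, "/robots.txt") != 0) {
        return DECLINED;
    }

    ap_set_content_type(r, "text/plain");
    /* End the last line with a newline too */
    ap_rputs("User-agent: *\nDisallow: /\n", r);

    return OK;
}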

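static void norobot_register_hooks(apr_pool_t *p)
{
    /* APR_HOOK_FIRST: run before the other content handlers, including
     * the default file handler, so a vhost's own robots.txt file is not
     * served in place of this one */
    ap_hook_handler(norobot_handler, NULL, NULL, APR_HOOK_FIRST);
}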

/* Dispatch list for API hooks */
module AP_MODULE_DECLARE_DATA norobot_module = {
    STANDARD20_MODULE_STUFF, 
    NULL,                  /* create per-dir    config structures */
    NULL,                  /* merge  per-dir    config structures */
    NULL,                  /* create per-server config structures */
    NULL,                  /* merge  per-server config structures */
    NULL,                  /* table of config file commands       */
    norobot_register_hooks  /* register hooks                      */
};
