- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: limit index to domain, text files with specific extensions, i.e. 
.lib, .spi, etc

A page of a site is crawled only if an appropriate Server/Realm/Subnet 
command covers it and no Disallow command prevents it.

With the following command you crawl the whole target site, following every 
link on it to collect URLs:
Server hrefonly http://www.domain.com/

With the following command you index all urls that end in .lib or .spi:
Realm regex http://www.domain.com/.*\.(lib|spi)$
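
To check quickly which URLs a pattern like this would select, here is a minimal sketch in Python. It assumes the host is www.domain.com, as in the Server line, and that Python's re.search is a fair approximation of how the indexer applies a Realm regex; the function name matches_realm and the sample URLs are made up for illustration, and case handling in the real indexer may differ.

```python
import re

# Pattern taken from the Realm line, with the host spelled as in the Server line.
# Assumption: an unanchored search, as done by re.search, is close to how the
# indexer applies the regex.
REALM_PATTERN = re.compile(r"http://www.domain.com/.*\.(lib|spi)$")


def matches_realm(url: str) -> bool:
    """Return True if the URL ends in .lib or .spi under the target host."""
    return REALM_PATTERN.search(url) is not None


if __name__ == "__main__":
    # Hypothetical sample URLs, used only to exercise the pattern.
    samples = [
        "http://www.domain.com/models/opamp.lib",
        "http://www.domain.com/netlists/filter.spi",
        "http://www.domain.com/index.html",
        "http://www.domain.com/models/opamp.lib.bak",
    ]
    for url in samples:
        print(url, matches_realm(url))
```

The trailing $ is what keeps a name such as opamp.lib.bak out of the index, since the extension has to be the very end of the URL.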

Also make sure that the remote HTTP server supplies a text/plain, text/html or 
text/xml Content-Type header with these files. Otherwise you need to specify a 
parser to convert the data to one of the types listed above.
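
The Content-Type condition can be sketched the same way. This is only an illustration of the rule stated above, not code from the indexer; the helper name needs_parser is hypothetical, and it assumes the header value has already been obtained from the server's response.

```python
# The three types named above as directly indexable.
DIRECTLY_INDEXABLE = {"text/plain", "text/html", "text/xml"}


def needs_parser(content_type_header: str) -> bool:
    """Return True if the header names a type outside the three listed above."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type not in DIRECTLY_INDEXABLE


if __name__ == "__main__":
    for header in ("text/plain; charset=utf-8", "application/octet-stream"):
        print(header, needs_parser(header))
```

A server that returns application/octet-stream for .lib or .spi files would fall into the second case, where a parser has to be configured.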
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;topic_id=1226881847
