Hello,

I have a problem crawling pages from an intranet. I would like to crawl only the
pages in the intranet (*.intranet.bancolombia.com.co), but when I watch the
crawl process I see other pages that are linked from ours.
My understanding is that the final line "-." should make the crawl skip
everything else, but it is not doing that.
This is my crawl-urlfilter file. What could be the problem?

# The url filter file used by the crawl command.

# Better for intranet crawling.
# Be sure to change MY.DOMAIN.NAME to your domain name.

# Each non-comment, non-blank line contains a regular expression
# prefixed by '+' or '-'.  The first matching pattern in the file
# determines whether a URL is included or ignored.  If no pattern
# matches, the URL is ignored.

# skip file:, ftp:, & mailto: urls
-^(file|ftp|mailto):

# skip image and other suffixes we can't yet parse
-\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP)$

# skip URLs containing certain characters as probable queries, etc.
-[...@=]

# skip URLs with slash-delimited segment that repeats 3+ times, to break loops
-.*(/.+?)/.*?\1/.*?\1/

# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)intranet.bancolombia.com.co/

# skip everything else
-.
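
To check what these rules do on their own, outside Nutch, here is a minimal Python sketch of the first-match semantics described in the file's header comment. It assumes each pattern is applied as an unanchored search (like Python's `re.search`). `RULES` and `accepts` are hypothetical names. The `-[...@=]` rule is left out because, read literally, the class `[...@=]` also matches `.` and would reject every URL containing a dot.

```python
import re

# Rules copied from the crawl-urlfilter file above, in file order.
# Assumption: the "-[...@=]" rule is omitted (see lead-in).
RULES = [
    ("-", r"^(file|ftp|mailto):"),
    ("-", r"\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP)$"),
    ("-", r".*(/.+?)/.*?\1/.*?\1/"),
    ("+", r"^http://([a-z0-9]*\.)intranet.bancolombia.com.co/"),
    ("-", r"."),
]

COMPILED = [(sign, re.compile(pattern)) for sign, pattern in RULES]


def accepts(url: str) -> bool:
    """Return True if the first matching rule is '+', False otherwise.

    Assumes each rule is an unanchored search (re.search). A URL that
    matches no rule is rejected, as the file's header comment states.
    """
    for sign, pattern in COMPILED:
        if pattern.search(url):
            return sign == "+"
    return False


if __name__ == "__main__":
    for url in [
        "http://portal.intranet.bancolombia.com.co/index.html",
        "http://www.google.com/",
        "http://portal.intranet.bancolombia.com.co/logo.gif",
        "http://intranet.bancolombia.com.co/",
    ]:
        print(accepts(url), url)
```

Under these assumptions only the first URL is accepted: the catch-all `-.` does reject the external host. Note also that the group `([a-z0-9]*\.)` has no trailing `*`, so exactly one host label is required before `intranet`, and `http://intranet.bancolombia.com.co/` itself is rejected as well.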

Regards,

Jair Piedrahíta Vargas
Intern - Research and New Technologies Management
Strategy and Architecture Division
Information Technology Vice Presidency
BANCOLOMBIA S.A.
www.bancolombia.com
Tel: (++ 57) (4) 40 41 632
Fax: (++ 57) (4) 40 40 197 - (++ 57) (4) 40 40 198
E-mail: [email protected]
Cra. 48 # 26 - 85 Av. Los Industriales
Torre Norte Piso 6B -  120 (Medellín, Colombia)
____________________________________________________
Flexible hours: 7:00 - 12:00 and 1:30 - 4:30 GMT (-05:00)
