Just removing these lines should be enough.

# skip URLs containing certain characters as probable queries, etc.
[EMAIL PROTECTED]

I notice that your domain is 'realdomain'. Make sure you set that right, or
it won't match what you *do* want.
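
To see why that rule matters, here is a rough Python sketch of the first-match-wins behaviour the comments in the filter file describe. It is not Nutch's own code: `parse_rules` and `accepts` are made-up names, the host `intranet.realdomain` is only an example, and the query-character rule `-[?*!@=]` is my assumption of what the obscured line contains.

```python
import re

# Hypothetical rules modelled on the filter file in this thread.
# ASSUMPTION: "-[?*!@=]" stands in for the query-character rule, which the
# archive has obscured; it is not copied from the original message.
WITH_QUERY_SKIP = [
    "# skip URLs containing certain characters as probable queries, etc.",
    r"-[?*!@=]",
    "# accept hosts in the intranet domain",
    r"+^http://([a-z0-9]*\.)*realdomain/",
    "# skip everything else",
    r"-.",
]

# The same file with the query-character rule removed.
WITHOUT_QUERY_SKIP = [line for line in WITH_QUERY_SKIP if line != r"-[?*!@=]"]


def parse_rules(lines):
    """Turn filter-file lines into (include, compiled_regex) pairs."""
    rules = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no rule
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """The first pattern found in the URL decides; no match means ignored."""
    for include, regex in rules:
        if regex.search(url):
            return include
    return False


if __name__ == "__main__":
    url = "http://intranet.realdomain/page.asp?id=3"
    print(accepts(parse_rules(WITH_QUERY_SKIP), url))
    print(accepts(parse_rules(WITHOUT_QUERY_SKIP), url))
```

With the query rule in place, the '?' in the asp URL matches first and the URL is dropped before the domain rule is ever tried; without it, the domain rule decides.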

Thanks,

Steve Betts
[EMAIL PROTECTED]
937-477-1797


-----Original Message-----
From: Andy Morris [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 02, 2006 3:31 PM
To: [email protected]
Subject: RE: Still not processing asp files

So do I just add the + to the files I want crawled?
Here is my crawl-urlfilter file, I just want my local intranet site
crawled....

# The url filter file used by the crawl command.

# Better for intranet crawling.
# Be sure to change MY.DOMAIN.NAME to your domain name.

# Each non-comment, non-blank line contains a regular expression
# prefixed by '+' or '-'.  The first matching pattern in the file
# determines whether a URL is included or ignored.  If no pattern
# matches, the URL is ignored.

# skip file:, ftp:, & mailto: urls
+^(file|ftp|mailto):

# skip image and other suffixes we can't yet parse
+\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|png)$

# skip URLs containing certain characters as probable queries, etc.
[EMAIL PROTECTED]

# skip URLs with slash-delimited segment that repeats 3+ times, to break loops
-.*(/.+?)/.*?\1/.*?\1/

# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*realdomain/

# skip everything else
-.

Thanks,
Andy

-----Original Message-----
From: Ivan Sekulovic [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 02, 2006 10:28 AM
To: [email protected]
Subject: Re: Still not processing asp files

You should also check the same regex for '=' sign.

Best Regards,
Sekula
http://www.ifimages.com/


Steve Betts wrote:

>Does your url filter (I use regex) remove all urls with a '?' in them?
>That would remove most of your dynamic content.
>
>Thanks,
>
>Steve Betts
>[EMAIL PROTECTED]
>937-477-1797
>
>
>-----Original Message-----
>From: Andy Morris [mailto:[EMAIL PROTECTED]
>Sent: Thursday, February 02, 2006 9:54 AM
>To: [email protected]
>Subject: Still not processing asp files
>
>I have version "nutch-nightly" running from January 26.  I am still
>not able to process the asp files; the htm and html files work great.  Are
>there any options I need to set for this to work?
>
>Andy
>

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
