Hi,
I am crawling a filesystem with Nutch 0.7.2 on Windows. I have enabled
the parse plugins for text and HTML.
To my surprise, the search results include files with extensions such as
.java, .class, .jar, .dll and so on.
I could add these to the ignore list in regex-urlfilter.txt, but that is not a
real solution: there are a great many file formats, and I can't add each of them
to the ignore list.
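To illustrate the ignore-list approach, an entry might look roughly like the sketch below. The extension list is only the handful named above, and placing the rule ahead of the final catch-all line is an assumption based on the layout of the default regex-urlfilter.txt, where the first matching pattern decides.

```
# skip files with these extensions (illustrative, incomplete list)
-\.(java|class|jar|dll)$

# accept anything else
+.
```

Every new unparsable format would need another alternative inside the parentheses, which is why this does not scale.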
An alternative would be for Nutch to fetch and show results only for parsable
documents.
Can anybody help me with this?
Regards,
Arun Sharma (Tech Lead-Java/J2EE )
www.voltix.com, www.voltixindia.com
SCO 13-15, Sector 34A
Chandigarh