Disable the HTML parser (parse-html) in nutch-site.xml and keep only your own parser. You can also add this rule to your URL filter file: -(htm|html)$
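A rough sketch of what that could look like in nutch-site.xml, assuming a Nutch 1.x setup. The ids parse-kml and index-kml are hypothetical stand-ins for your own plugins, and the rest of the list is only an assumption modelled on the usual defaults, so start from the plugin.includes value in your own nutch-default.xml:

```xml
<!-- nutch-site.xml: overrides the plugin.includes value from nutch-default.xml -->
<property>
  <name>plugin.includes</name>
  <!-- parse-kml and index-kml are hypothetical ids for the custom plugins.
       parse-html is deliberately left out of the list. -->
  <value>protocol-http|urlfilter-regex|parse-kml|index-(basic|kml)|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```

If you use the regex URL filter, the -(htm|html)$ line goes in regex-urlfilter.txt and has to come before the final catch-all "+." rule, because the first matching pattern decides. One thing to check: a URL rejected by the filter is never fetched, and a page that is not parsed as HTML yields no outlinks, so the crawl may not be able to discover the kml files by following links from HTML pages.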
thx

> Date: Mon, 26 Oct 2009 17:53:11 +0300
> Subject: How to index files only with specific type
> From: dfun...@gmail.com
> To: nutch-user@lucene.apache.org
>
> Hi, I've created a parser and indexer for a specific file type (geo xml meta
> file - kml).
> I am trying to crawl a couple of sites, and index only files of this type.
> I don't want to index html or anything else.
> How can I achieve this?
> Thanks.