Re: Using Nutch for crawling and Lucene for searching (Wildcard/Fuzzy)

Andrzej Bialecki Thu, 14 May 2009 04:49:50 -0700

inghe wrote:


Hi,
I want to use Nutch for crawling and Lucene for extracting and analyzing
the contents of the index created by Nutch. I'm trying to extract the
contents of web pages from the index, but I don't know how to set up the
NutchDocumentAnalyzer in my application. If I use Lucene's
StandardAnalyzer, I can extract the "title" and "url" fields but not "content".
I'm using Nutch 1.0 and Lucene 2.4.0.

There is no content in Lucene indexes. The original content is stored in Nutch segments. You can use the command bin/nutch readseg to retrieve all (or selected) pages.
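
A rough sketch of how readseg might be invoked follows. The -list, -dump and -get modes and the -no* options are taken from the usage message of the Nutch 1.0 SegmentReader as I remember it; running bin/nutch readseg with no arguments prints the authoritative list for your version. The crawl directory, segment name, output directory and URL are hypothetical placeholders, and the run helper is only an illustration: it prints each command unless RUN=1 is set, so the script can be tried without a Nutch installation.

```shell
#!/bin/sh
# All paths and the URL below are hypothetical; substitute your own.
NUTCH_HOME="${NUTCH_HOME:-.}"
SEGMENTS_DIR="${SEGMENTS_DIR:-crawl/segments}"
SEGMENT="${SEGMENT:-$SEGMENTS_DIR/20090514000000}"
OUT_DIR="${OUT_DIR:-segment_dump}"
URL="${URL:-http://www.example.com/}"

# Illustrative helper: print the command, or execute it when RUN=1.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$@"
  fi
}

# List the segments under the crawl directory.
run "$NUTCH_HOME/bin/nutch" readseg -list -dir "$SEGMENTS_DIR"

# Dump one segment, keeping only the parsed text of each page.
# Remove -nocontent to include the raw fetched content as well.
run "$NUTCH_HOME/bin/nutch" readseg -dump "$SEGMENT" "$OUT_DIR" \
  -nocontent -nofetch -nogenerate -noparse -noparsedata

# Retrieve the records for a single URL from that segment.
run "$NUTCH_HOME/bin/nutch" readseg -get "$SEGMENT" "$URL"
```

Invoked as RUN=1 sh script.sh from the Nutch installation directory, this would execute the three commands instead of printing them. The dump is written as plain text under the output directory, so it can be processed further without going through the Lucene index.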



--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
