Re: Using Nutch for crawling and Lucene for searching (Wildcard/Fuzzy)

inghe Fri, 15 May 2009 01:03:18 -0700


Andrzej Bialecki wrote:
> 
> Page content is NOT stored in Lucene indexes that Nutch creates. It's 
> only indexed, which is not the same. Luke can show you the text in the 
> "content" field only because it reconstructs it from the index. This 
> reconstruction is incomplete because some information is missing (the 
> information discarded by NutchDocumentAnalyzer).
> 
> As I wrote before, full content is stored in Nutch segments. That's why 
> Nutch can show you the full content, but Luke cannot.
> 
>
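
To make the point above concrete, here is a toy sketch in plain Java of why
text rebuilt from an index has gaps. It is only an illustration under stated
assumptions: ToyIndexDemo, its stop-word list, and its methods are hypothetical
and are not part of the Lucene or Nutch API. The stop-word filter merely stands
in for whatever the analyzer discards.

```java
import java.util.*;

// Toy model only: this is NOT Lucene's or Nutch's API.
class ToyIndexDemo {
    // Hypothetical stop-word list standing in for terms an analyzer discards.
    static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("the", "is", "in", "a", "of"));

    // "Index" the text: map each kept term to its token positions.
    // Case and punctuation are lost, and stop words are never recorded.
    static Map<String, List<Integer>> index(String text) {
        Map<String, List<Integer>> postings = new TreeMap<>();
        int pos = 0;
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) continue;   // leading separator, not a token
            if (!STOP_WORDS.contains(token)) {
                postings.computeIfAbsent(token, k -> new ArrayList<>()).add(pos);
            }
            pos++;                           // dropped terms still take a slot
        }
        return postings;
    }

    // Rebuild text from the postings alone; "?" marks each discarded term.
    static String reconstruct(Map<String, List<Integer>> postings) {
        TreeMap<Integer, String> byPosition = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : postings.entrySet()) {
            for (int p : e.getValue()) byPosition.put(p, e.getKey());
        }
        if (byPosition.isEmpty()) return "";
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i <= byPosition.lastKey(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(byPosition.getOrDefault(i, "?"));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String page = "The content is stored in a Nutch segment.";
        // Prints: ? content ? stored ? ? nutch segment
        System.out.println(reconstruct(index(page)));
    }
}
```

The original sentence cannot be recovered from the postings, which is the
same kind of loss described above for the "content" field.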


Thanks again, but is there a way to get the "content" information through
the Lucene libraries? I would like to work with the content of the
extracted web pages.

-- 
View this message in context: 
http://www.nabble.com/Using-Nutch-for-crawling-and-Lucene-for-searching-%28Wildcard-Fuzzy%29-tp19990219p23555198.html
Sent from the Nutch - User mailing list archive at Nabble.com.
