Re: Using Nutch for crawling and Lucene for searching (Wildcard/Fuzzy)

inghe Thu, 14 May 2009 08:03:27 -0700

Thank you for the answer, but I still have a doubt!
Why can I read the field "content" in Luke when I load the index created
by Nutch?
When I load the index created by Nutch-1.0 in Luke, I can view the
fields "url", "title", "host", etc., but not all fields. If I click the Edit
button, a window opens that contains other fields, including the field
"content" with its value, but because it uses the SimpleAnalyzer the
content is not displayed correctly. I tried to change the analyzer to
NutchDocumentAnalyzer, but I do not know how to do it.


help :(


Andrzej Bialecki wrote:
> 
> inghe wrote:
>> 
>> Hi,
>> I want to use Nutch for crawling content and Lucene to extract and
>> analyze the contents of the index created by Nutch. I'm trying to extract
>> the contents of web pages from the index, but I don't know how to set the
>> NutchDocumentAnalyzer in my application. If I use Lucene's
>> StandardAnalyzer, I can extract the fields "title" and "url" but not
>> "content".
>> I'm using Nutch 1.0 and Lucene 2.4.0.
> 
> There is no content in Lucene indexes. The original content is stored in 
> Nutch segments. You can use the command bin/nutch readseg to retrieve 
> all (or selected) pages.
> 
> 
> -- 
> Best regards,
> Andrzej Bialecki     <><
>   ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Using-Nutch-for-crawling-and-Lucene-for-searching-%28Wildcard-Fuzzy%29-tp19990219p23542476.html
Sent from the Nutch - User mailing list archive at Nabble.com.
