inghe wrote:
Thank you for the answer, but I still have a doubt!
Why can I read the field "content" in Luke if I load the index created
by Nutch?
When I load the index created by Nutch-1.0 in Luke, I can view the
fields "url", "title", "host", etc., but not all fields. If I click the Edit
button, a window opens that contains other fields, including the field
"content" with its value, but because it uses the SimpleAnalyzer the
content is not displayed correctly. I tried to change the analyzer to
NutchDocumentAnalyzer, but I do not know how to do it.
Help :(
Page content is NOT stored in the Lucene indexes that Nutch creates. It is
only indexed (analyzed into searchable terms), which is not the same as
keeping the original text. Luke can show you the text in the "content"
field only because it reconstructs it from the indexed terms. This
reconstruction is incomplete because some information is missing (the
information discarded by NutchDocumentAnalyzer).
As I wrote before, full content is stored in Nutch segments. That's why
Nutch can show you the full content, but Luke cannot.
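
To make the stored vs. indexed distinction concrete, here is a minimal toy sketch in Python. It is not Lucene or Nutch code: the ToyIndex class, the analyze function, and the stop-word list are all hypothetical, and the analyzer is only a rough stand-in for what a real analyzer might discard. It shows why a field that is indexed but not stored can only be rebuilt approximately from its terms.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; a real analyzer has its own rules.
STOP_WORDS = {"the", "a", "is", "of", "in"}


def analyze(text):
    """Lowercase, split on non-alphanumerics, drop stop words.

    Returns (position, term) pairs. Case, punctuation and stop words
    are lost here and cannot be recovered from the index later.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [(pos, tok) for pos, tok in enumerate(tokens) if tok not in STOP_WORDS]


class ToyIndex:
    """Toy inverted index with optional stored fields (not the Lucene API)."""

    def __init__(self):
        # field -> term -> doc_id -> list of positions
        self.postings = defaultdict(lambda: defaultdict(dict))
        # doc_id -> field -> original text (only for stored fields)
        self.stored = defaultdict(dict)

    def add(self, doc_id, field, text, store):
        for pos, term in analyze(text):
            self.postings[field][term].setdefault(doc_id, []).append(pos)
        if store:
            self.stored[doc_id][field] = text

    def reconstruct(self, doc_id, field):
        """Rebuild a field from its indexed terms, ordered by position."""
        found = []
        for term, docs in self.postings[field].items():
            for pos in docs.get(doc_id, []):
                found.append((pos, term))
        return " ".join(term for _, term in sorted(found))


index = ToyIndex()
index.add(1, "title", "The Nutch Crawler", store=True)
index.add(1, "content", "The Content of the Page is HERE, in Nutch!", store=False)

print(index.stored[1].get("title"))     # original text is available
print(index.stored[1].get("content"))   # None: indexed but never stored
print(index.reconstruct(1, "content"))  # approximate text rebuilt from terms
```

In this sketch the "title" field comes back exactly as it was added, while "content" can only be rebuilt as a lossy sequence of terms, which is analogous to what Luke shows for the "content" field.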
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com