Thank you for the answer, but I still have a doubt! Why can I read the field "content" in Luke if I load the index created by Nutch? When I load the index created by Nutch-1.0 in Luke, I can view the fields "url", "title", "host", etc., but not all fields. If I click the Edit button, a window opens that contains other fields, including the field "content" with its value. However, that window uses the SimpleAnalyzer, so the content is not displayed correctly. I tried to change the analyzer to NutchDocumentAnalyzer, but I do not know how to do it.
Help :(

Andrzej Bialecki wrote:
>
> inghe wrote:
>>
>> Hi,
>> I want to use Nutch for crawling content and Lucene for extracting and
>> analyzing the contents of the index created by Nutch. I'm trying to
>> extract the contents of web pages from the index, but I don't know how
>> to set the NutchDocumentAnalyzer in my application. If I use the
>> StandardAnalyzer of Lucene, I can extract the fields "title" and "url"
>> but not "content".
>> I'm using Nutch 1.0 and Lucene 2.4.0.
>
> There is no content in Lucene indexes. The original content is stored in
> Nutch segments. You can use the command bin/nutch readseg to retrieve
> all (or selected) pages.
>
> --
> Best regards,
> Andrzej Bialecki
> Information Retrieval, Semantic Web
> Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com
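For reference, here is a rough sketch of how the bin/nutch readseg command mentioned in the quoted reply might be called. The segment path, output directory, and URL are placeholders I made up, and I am assuming the -dump and -get modes and the -no* filter options as printed by the readseg usage message in Nutch 1.0, so please check the usage output of your own installation first.

```shell
# Placeholder values: adjust them to your own crawl layout.
NUTCH_HOME="${NUTCH_HOME:-.}"
SEGMENT="crawl/segments/20090514123456"   # hypothetical segment directory
OUT_DIR="segment_dump"                    # hypothetical output directory
PAGE_URL="http://www.example.com/"        # hypothetical page URL

if [ -x "$NUTCH_HOME/bin/nutch" ]; then
  # Dump the segment as text into OUT_DIR, keeping only the parsed text
  # by switching off every other part of the segment.
  "$NUTCH_HOME/bin/nutch" readseg -dump "$SEGMENT" "$OUT_DIR" \
    -nocontent -nofetch -nogenerate -noparse -noparsedata

  # Print the stored records for a single URL from the same segment.
  "$NUTCH_HOME/bin/nutch" readseg -get "$SEGMENT" "$PAGE_URL"
else
  echo "bin/nutch not found under $NUTCH_HOME" >&2
fi
```

Leaving out -nocontent should also include the raw fetched content in the dump, which can be large for a full crawl.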
