Andrzej Bialecki wrote:
>
> Page content is NOT stored in Lucene indexes that Nutch creates. It's
> only indexed, which is not the same. Luke can show you the text in the
> "content" field only because it reconstructs it from the index. This
> reconstruction is incomplete because some information is missing (the
> information discarded by NutchDocumentAnalyzer).
>
> As I wrote before, full content is stored in Nutch segments. That's why
> Nutch can show you the full content, but Luke cannot.
>
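To make the indexed-versus-stored distinction above concrete, here is a minimal toy sketch in Python. It is not Lucene or Nutch code: `STOP_WORDS`, `analyze`, `build_index`, and `reconstruct` are hypothetical names, and the stop-word list is only an assumption standing in for whatever NutchDocumentAnalyzer actually discards.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; a real analyzer's list will differ.
STOP_WORDS = {"the", "is", "in", "a", "of", "and", "not"}


def analyze(text):
    """Lowercase, tokenize, and drop stop words, keeping original positions."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [(pos, tok) for pos, tok in enumerate(tokens) if tok not in STOP_WORDS]


def build_index(docs):
    """Build term -> {doc_id: [positions]}; the original text is never stored."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, tok in analyze(text):
            index[tok].setdefault(doc_id, []).append(pos)
    return index


def reconstruct(index, doc_id):
    """Rebuild whatever the index retains for one document, in position order."""
    placed = []
    for term, postings in index.items():
        for pos in postings.get(doc_id, []):
            placed.append((pos, term))
    return " ".join(term for _, term in sorted(placed))


docs = {"page1": "The content is NOT stored in the index."}
index = build_index(docs)
rebuilt = reconstruct(index, "page1")
print(rebuilt)
```

In this toy model the rebuilt text has lost case, punctuation, and every discarded word, so it cannot be turned back into the original page. That is the sense in which a reconstruction from an index alone is incomplete.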
Thanks again, but is there a way to get the "content" information through the Lucene libraries? I would like to work with the content of the extracted web pages.

--
View this message in context: http://www.nabble.com/Using-Nutch-for-crawling-and-Lucene-for-searching-%28Wildcard-Fuzzy%29-tp19990219p23555198.html
Sent from the Nutch - User mailing list archive at Nabble.com.
