Thanks, Doğacan. Thanks for the clarification concerning the content setting.
The index-basic plugin modification is okay, but is it possible to access the segment data containing content from a Lucene client? I rather like the speed Nutch provides by caching content as segment data, and if storing content in the index makes searching a big performance issue, I would rather access the segment data, if possible.

Also, I must specify that I am bound to use Java 1.4 for our client, but I guess I could rewrite/recompile the necessary Nutch code for Java 1.4 if that is needed to access segment data.

Best regards,
Ronny

-----Original Message-----
From: Doğacan Güney [mailto:[EMAIL PROTECTED]
Sent: 20 June 2007 08:14
To: [EMAIL PROTECTED]
Subject: Re: Lucene client and nutch index

On 6/20/07, Naess, Ronny <[EMAIL PROTECTED]> wrote:
> I tried your tip Brian, but the property
>
> <property>
>   <name>fetcher.store.content</name>
>   <value>true</value>
>   <description>If true, fetcher will store content.</description>
> </property>
>
> set in nutch-site.xml does not seem to work (still no content) and I
> found exactly the same setting in nutch-default.xml anyway, and it was
> also set to true.....strange!!??
>
> Does it mean what we think it does as in store into index or does it
> mean store as segment data?

If fetcher.store.content is set to true, then fetcher stores the original version of the page (its 'content') in <segment>/content directory. It has nothing to do with indexing. Note that content is not available to Indexer but parse text is.

If you want to store parse text in index, just change index-basic plugin where it adds the "content" field to Store.YES. (If there is any confusion, parse text is indexed as "content").

>
> Regards,
> Ronny
>
> -----Original Message-----
> From: Brian Whitman [mailto:[EMAIL PROTECTED]
> Sent: 19 June 2007 19:52
> To: [EMAIL PROTECTED]
> Subject: Re: Lucene client and nutch index
>
>
> On Jun 19, 2007, at 1:39 PM, Naess, Ronny wrote:
>
> > I have made a small Lucene client reading my nutch index created
> > with Nutch-0.9
> >
> > This works fine. However since 'content' is not stored only indexed
> > in the index I have to find a way to access the content to create a
> > summary (and highlighting the query terms).
> >
>
> You can simply set the content to be stored in the Lucene index then
> highlighting will work normally from any Lucene client. Search the
> mailing list (there was a post just yesterday) about how to accomplish
> this, there's a single line of code to change. Do realise that storing
> content will slow down some queries and your index size will grow very
> large.
>
> -Brian

--
Doğacan Güney

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
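
For the summary and highlighting step mentioned at the start of the thread, here is a minimal sketch in plain Java of what a client might do once it has the parse text in hand, whether that text comes from a stored "content" field or from segment data. It uses no Lucene or Nutch classes. SummarySketch, summarize and indexOfIgnoreCase are hypothetical names, and the <b> markup and the fixed character window are assumptions made for illustration. It avoids generics and other Java 5 features, since the client is bound to Java 1.4.

```java
// Hypothetical helper for illustration only; not part of Lucene or Nutch.
class SummarySketch {

    // Case-insensitive indexOf: position of the first match of term in
    // text at or after 'from', or -1 if there is none.
    static int indexOfIgnoreCase(String text, String term, int from) {
        int last = text.length() - term.length();
        for (int i = from; i <= last; i++) {
            if (text.regionMatches(true, i, term, 0, term.length())) {
                return i;
            }
        }
        return -1;
    }

    // Returns a window of up to 'radius' characters on either side of the
    // first occurrence of 'term', wrapping every occurrence that lies fully
    // inside the window in <b>...</b> (markup chosen as an assumption).
    // If the term is empty or not found, returns the first 2 * radius
    // characters of the text unmodified.
    static String summarize(String text, String term, int radius) {
        int hit = term.length() == 0 ? -1 : indexOfIgnoreCase(text, term, 0);
        if (hit < 0) {
            return text.substring(0, Math.min(text.length(), 2 * radius));
        }
        int start = Math.max(0, hit - radius);
        int end = Math.min(text.length(), hit + term.length() + radius);

        StringBuffer out = new StringBuffer();
        int pos = start;
        int next = hit;
        while (next >= 0 && next + term.length() <= end) {
            out.append(text.substring(pos, next));
            out.append("<b>");
            out.append(text.substring(next, next + term.length()));
            out.append("</b>");
            pos = next + term.length();
            next = indexOfIgnoreCase(text, term, pos);
        }
        out.append(text.substring(pos, end));
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "You can store the content in the Lucene index "
                + "and read it from any Lucene client.";
        System.out.println(summarize(text, "lucene", 20));
    }
}
```

This handles a single query term and cuts the window on character positions rather than word boundaries, so it is a starting point at best. The trade-off Brian describes still applies: the sketch only needs the text as a String, so the cost depends on where that text is read from.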
