Yes, I found out. I got a tip on how to solve it with Hadoop, but as someone else said, I would still need Java 1.5. Since I need to use 1.4, I have modified the index-basic plugin to store content. I have also disabled storing content as segment data.
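Storing page content in the index makes the index larger. Below is a minimal sketch of one way that could be limited, assuming the content is deflated before it is stored as a binary field and inflated again in the Lucene client. The class name StoredContentCodec and its methods are hypothetical and not part of Nutch or Lucene; the wiring into the index-basic plugin and into the client is left out, and only java.util.zip is used so that it stays within Java 1.4.

```java
import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, not part of Nutch or Lucene.
// Written without generics or other Java 5 features.
class StoredContentCodec {

    // Deflate page text into bytes that could be kept in a binary stored field.
    static byte[] compress(String text) throws UnsupportedEncodingException {
        byte[] input = text.getBytes("UTF-8");
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream(input.length / 2 + 16);
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inflate bytes read back from the stored field into the original text.
    static String decompress(byte[] packed)
            throws DataFormatException, UnsupportedEncodingException {
        Inflater inflater = new Inflater();
        inflater.setInput(packed);
        ByteArrayOutputStream out = new ByteArrayOutputStream(packed.length * 2 + 16);
        byte[] buffer = new byte[4096];
        while (!inflater.finished()) {
            int n = inflater.inflate(buffer);
            if (n == 0 && !inflater.finished()) {
                // No progress and no end of stream: the input is incomplete.
                inflater.end();
                throw new DataFormatException("truncated or unsupported data");
            }
            out.write(buffer, 0, n);
        }
        inflater.end();
        return new String(out.toByteArray(), "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // Repetitive sample text standing in for fetched page content.
        StringBuffer page = new StringBuffer();
        for (int i = 0; i < 200; i++) {
            page.append("Nutch stores fetched page content in segments. ");
        }
        byte[] packed = compress(page.toString());
        System.out.println("raw bytes:    " + page.toString().getBytes("UTF-8").length);
        System.out.println("packed bytes: " + packed.length);
        System.out.println("round trip ok: " + page.toString().equals(decompress(packed)));
    }
}
```

Whether this pays off depends on the content: very short or already compressed pages gain little, and every hit that displays content costs one inflate call in the client.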
It works fine now. I will do some performance testing, and I think this will do just fine. Are there other things I could do to increase performance or to make the index smaller? I am thinking about removing some of the fields that Nutch uses, but I don't know whether that would cause exceptions while crawling. I do not need host, site, cache, and probably some more fields as far as I know, but as I said, Nutch might depend on them. Any tips on this, given that I use a plain Lucene client?

Regards,
Ronny

-----Original message-----
From: Andrzej Bialecki [mailto:[EMAIL PROTECTED]
Sent: 20 June 2007 12:07
To: [EMAIL PROTECTED]
Subject: Re: SV: doubt about indexing

Naess, Ronny wrote:
>
> Andrzej, do you think it is possible, without too much work, to access
> the segment data from a Lucene client that I have made, with or without
> the use of Nutch?
>
> I asked the same question late yesterday in a new mail named 'Lucene
> client and nutch index', referring to the FetchedSegment class you are
> referring to.

In order to access segment data you have to use the Hadoop API, specifically MapFileOutputFormat.getReaders() and associated Hadoop I/O classes.

The only alternative would be to store everything you need inside Lucene indexes, as stored and/or binary fields.

--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
