Yes, I found out. I got a tip on how to solve it with Hadoop, but as
someone else said, I would still need Java 1.5.
Since I need to use 1.4, I have modified the index-basic plugin to store
the content. I have also disabled storing content as segment data.

It works fine now. I will do some testing concerning performance and I
think this will do just fine.

Are there other things I could do to increase performance or to make the
index smaller? I am thinking about removing some of the fields that
Nutch uses, but I don't know whether that will cause exceptions while
crawling. I do not need host, site, cache, and probably
some more fields, but as I said, Nutch might depend on them.
Any tips on this, given that I use a plain Lucene client?

Regards,
Ronny  

-----Original Message-----
From: Andrzej Bialecki [mailto:[EMAIL PROTECTED] 
Sent: 20 June 2007 12:07
To: [EMAIL PROTECTED]
Subject: Re: SV: doubt about indexing

Naess, Ronny wrote:
>  
> Andrzej, do you think it is possible, without too much work, to access 
> the segment data from a Lucene client that I have made, with or without
> the use of Nutch?
> 
> I asked the same question late yesterday in a new mail named 'Lucene 
> client and nutch index', referring to the FetchedSegment class you are 
> referring to.

In order to access segment data you have to use the Hadoop API, specifically
MapFileOutputFormat.getReaders() and the associated Hadoop I/O classes.

The only alternative would be to store everything you need inside Lucene
indexes, as stored and/or binary fields.


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
