Re: Inrease the performance of Indexing in Lucene

Michael McCandless Thu, 19 Jul 2007 03:24:45 -0700

This new Wiki page has a number of best practices for maximizing your
indexing speed:


    http://wiki.apache.org/lucene-java/ImproveIndexingSpeed

Also, if appropriate you can try testing with the trunk version of
Lucene which includes substantial optimizations (from LUCENE-843).
Usual caveat: this is not yet released and may have sneaky bugs,
so, use at your own risk....

With LUCENE-843, both the indexing speed and the RAM efficiency of
IndexWriter have been greatly improved such that you can fit more docs
into each MB of RAM (depending on how small the docs are).

Mike

"miztaken" <[EMAIL PROTECTED]> wrote:
> 
> Hi, Please help me.
> Its been a month since i am trying lucene.
> My requirements are huge, i have to index and search in TB of data. 
> I have question regarding three topics:
> 
> 1. Problem in Indexing
>   As i need to index TB of data, so by googling and visiting different
>   forum
> i deployed following fashion for indexing:
> 
> 1. First i created an array of RAMDirectory then i added the documents on
> it. 
>    After crossing certain threshold i dumped it into my drive as
>    tempIndex1
> 
> 2. I repeated the same process until all documents are indexed in my
> drive
> as tempindex1. tempindex2...
> 
> 3. Then finally i loaded the temp directories and merged in as a main
> full
> indexed directories. 
> 
> 4. I have used threading too for this purpose.
> 
> 5. This some what removed the optimize() overhead of IndexWriter, as i
> added
> directories together only at the end.
> 
> Am i doing this the right way or not, is there any other solution to
> boost
> the indexing process. 
> 
> 2. Problem in searching
> 
> As lucene doesnt support LSI and SVD so as to achieve conceptual search,
> i
> first search the lucene index for the user inputted text then retrieved
> the
> document and then expanded the query using LSI and SVD and then
> re-searched
> the index. 
> Now with few words in query doesnt seem to have performance problem but
> when
> i expand the query i.e. when the query contains ten words ORed together
> then
> it takes tremendously unacceptable amount of time to get Hits. Is this
> obvious or am i missing something here too.. 
> What are the ways to achieve boost in query performance when the query
> contains many terms and especially they are ORed, as for ANDed query it
> requires less time to produce Hits.
> I have used single Indexsearcher and my index is optimized as well..
> 
> 
> 3. Another Problem
> 
> As i require to dump my database table too inside lucene along with
> fulltext
> info. 
> What effect will it have on indexing and searching.?
> Also i might need to change the name of Field of Document indexed in
> lucene,
> will it be possible.?
> I know its not possible to change the value of the field but will it be
> possible to change the name of the field or we have to control
> externally..? 
> 
> Please shade me some light in these things:
> Your help is highly anticipated
> 
> -- 
> View this message in context:
> http://www.nabble.com/Inrease-the-performance-of-Indexing-in-Lucene-tf4108165.html#a11682360
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Inrease the performance of Indexing in Lucene

Reply via email to