Hi Samar,

Have you looked at top or iostat or other monitoring utilities to see if you 
are cpu bound vs I/O bound?

With 225 term queries, it's possible that you are I/O bound.

I suspect you need to think about seek time and caching. For each unique 
field:term combination lucene has to look up the postings for that term in the 
index.  Additionally for any phrase, lucene has to additionally look up the 
positions data for each term in the phrase. (In our case phrase searches are 
very expensive as our positions (*prx) index is about 8 times as large as our 
frq index) So for 225 terms including some number of phrases, that is a lot of 
disk seeks.  To the extent that the terms are close together in the index and 
various buffer caches contain adjacent terms, you might not actually have 225 
seeks, but I suspect there will still be a lot.    

Although Lucene implements a number of caches (and you should take a look at 
your cache hit ratios), Lucene depends on the OS disk cache to cache postings 
data for individual terms. Most unix/linux OS's use free memory for disk 
caching.  How much memory is available on the machine after the JVM gets it 
allocation?

Have you considered running cache warming queries of your most frequent 
terms/phrases so that the data is in the OS disk cache?  

Tom


>> When queries (without two fields mentioned above) have a lot of
>>words/phrases search time is high. E.g I took a query with around 80 unique
>>terms (not words) in 5 fields. These terms occur repeatedly and become total
>>225 terms (non-unique). This particular query took 4.2 seconds. the 15
>>indexes used for this query were of total size 5 G.
>>Are 225 terms (80 unique) is a very big number?

-----Original Message-----

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to