I'm sure that you should try building one large index and convert to NumericField wherever you can. I'm convinced that will be faster - but as ever, the proof will be in the numbers.
On repeated terms, I believe that lucene will search multiple times. If so, I'd guess it is just something that has never been optimized. 225 terms is not a very big number, but not very small either. Complex queries with lots of terms can be expected to be slower than simple queries with few terms. If you have a particular problem with repeated terms perhaps you could dedup them yourself. -- Ian. On Wed, May 11, 2011 at 9:10 AM, Samarendra Pratap <[email protected]> wrote: > Hi Tom, > the more i am getting responses in this thread the more i feel that our > application needs optimization. > > 350 GB and less than 2 seconds!!! That's much more than my expectation :-) > (in current scenario). > > *characteristics of slow queries:* > there are a few reasons for greater search time > > 1.Two of our fields contain decimal values but are not NumericField :( . > These fields are searched as a range. Whenever the ranges are larger and/or > both the fields are used in search the search time and server load goes > high. I have already started work to convert it to NumericField - but > suggestions and experiences are most welcome. > > 2. When queries (without two fields mentioned above) have a lot of > words/phrases search time is high. E.g I took a query with around 80 unique > terms (not words) in 5 fields. These terms occur repeatedly and become total > 225 terms (non-unique). This particular query took 4.2 seconds. the 15 > indexes used for this query were of total size 5 G. > Are 225 terms (80 unique) is a very big number? > > and yes, slow queries are always slow. yes but obviously high load will add > up to their slowness. > > > > > Here I have another curiosity about something I noticed. > If I have a query like following: > > > title:xyz title:xyz title:xyz title:xyz title:xyz title:xyz title:xyz > title:xyz title:xyz title:xyz title:xyz > > *Will lucene search for the term 11 times or it will reuse the results of > first term?* > > If later is true (which I think is), is there any particular reason or it > may be optimized inside lucene? > > > On Tue, May 10, 2011 at 9:46 PM, Burton-West, Tom <[email protected]>wrote: > >> Hi Samar, >> >> >>Normal queries go fine under 500 ms but when people start searching >> >>"anything" some queries take up to > 100 seconds. Don't you think >> >>distributing smaller indexes on different machines would reduce the >> average >> >>.search time. (Although I have a feeling that search time for smaller >> queries >> >>may be slightly increased) >> >> What are the characteristics of your slow queries? Can you give examples? >> Are the slow queries always slow or only under heavy load? What the >> bottleneck is and whether splitting into smaller indexes would help depends >> on just what your bottleneck is. It's not clear that your index is large >> enough that the size of the index is causing your bottleneck. >> >> We run indexes of about 350GB with average response times under 200ms and >> 99th percentile reponse times of under 2 seconds. (We have a very low qps >> rate however). >> >> >> Tom >> >> >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: [email protected] >> For additional commands, e-mail: [email protected] >> >> > > > -- > Regards, > Samar > --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
