Hi Mike, *"I think the usual approach is to create multiple mirrored copies (slaves) rather than sharding"* This is where my eyes stuck.
We do have mirrors and in-fact a good number of those. 6 servers are being used for serving regular queries (2 are for specific queries that do take time) and each of them receives around 3-3.5 K queries per hour in peak hours. The problem is that the interface being used by end users has a lot of options plus a few text boxes where they can type up to 64 words each. (and unfortunately i am not able to reduce these things as these are business requirements) Normal queries go fine under 500 ms but when people start searching "anything" some queries take up to > 100 seconds. Don't you think distributing smaller indexes on different machines would reduce the average search time. (Although I have a feeling that search time for smaller queries may be slightly increased) On Tue, May 10, 2011 at 6:32 PM, Mike Sokolov <soko...@ifactory.com> wrote: > > Down to basics, Lucene searches work by locating terms and resolving >> documents from them. For standard term queries, a term is located by a >> process akin to binary search. That means that it uses log(n) seeks to >> get the term. Let's say you have 10M terms in your corpus. If you stored >> that in a single field in a single index with a single segment, it would >> take log(10M) ~= 24 seeks to locate a term. This is of course very >> simplified. >> >> When you have 63 indexes, log(n) works against you. Even with the >> unrealistic assumption that the 10M terms are evenly distributed and >> without duplicates, the number of seeks for a search that hits all parts >> will still be 63 * log(10M/63) ~= 63 * 18 = 1134. And we haven't even >> begun to estimate the merging part. >> > This is true, but if the indexes are kept on 63 separate servers, those > seeks will be carried out in parallel. The OP did indicate his indexes > would be on different servers, I think? I still agree with your overall > point - at this scale a single server is probably best. And if there are > performance issues, I think the usual approach is to create multiple > mirrored copies (slaves) rather than sharding. Sharding is useful for very > large indexes: indexes to big to store on disk and cache in memory on one > commodity box > > -Mike > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > -- Regards, Samar