On May 10, 2011, at 9:42 AM, Samarendra Pratap wrote:

> Hi,
>  Though our total index is 30 GB, the indexes that are used in
> 75%-80% of searches add up to 5 GB, and our average search time is
> around 700 ms (yes, we have an optimized index).
>
> Could someone please shed some light on my original question: if I
> want to keep smaller indexes on different servers so that CPU and
> memory can be used efficiently, how can I aggregate the results of a
> query from each of the servers? One technique I know is RMI, which I
> studied a few years back, but it was too slow (or so I thought at
> the time). What are the other techniques?
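For what it's worth, the 2.9 line already ships the pieces for this
kind of fan-out: RemoteSearchable (in the contrib/remote jar, if I
remember the module layout right) exposes a local IndexSearcher over
RMI, and a MultiSearcher on the client side merges and re-sorts the
hits from every shard by score. A rough sketch against the 2.9-era
API -- the host names, registry port, and index paths below are
purely illustrative:

    // ---- ShardServer.java: export one local shard over RMI ----
    import java.io.File;
    import java.rmi.Naming;
    import java.rmi.registry.LocateRegistry;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.RemoteSearchable;
    import org.apache.lucene.search.Searchable;
    import org.apache.lucene.store.FSDirectory;

    public class ShardServer {
      public static void main(String[] args) throws Exception {
        // Open the shard read-only (path is an illustrative example).
        Searchable local = new IndexSearcher(
            FSDirectory.open(new File("/data/index/idx1-1")), true);
        LocateRegistry.createRegistry(1099);  // in-process RMI registry
        Naming.rebind("//localhost/shard1", new RemoteSearchable(local));
      }
    }

    // ---- FederatedSearch.java: aggregate results across shards ----
    import java.rmi.Naming;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.MultiSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Searchable;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;

    public class FederatedSearch {
      public static void main(String[] args) throws Exception {
        // One remote Searchable per shard server (hypothetical hosts).
        Searchable[] shards = {
            (Searchable) Naming.lookup("//search1/shard1"),
            (Searchable) Naming.lookup("//search2/shard2"),
        };
        MultiSearcher searcher = new MultiSearcher(shards);
        Query q = new TermQuery(new Term("contents", "lucene"));
        TopDocs top = searcher.search(q, 10);  // merged across shards
        System.out.println("total hits: " + top.totalHits);
        searcher.close();
      }
    }

One detail worth knowing: MultiSearcher combines document frequencies
across the shards when it weights the query, so the scores it merges
stay comparable between shards.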
There is also http://katta.sourceforge.net/ out there...

Johannes

> Is 1 second a bad search time for the following?
>
> - total index size: 30 GB
> - index size used in 80% of searches: 5 GB
> - number of fields: 40, most of them numeric
> - one big "contents" field with 500-1000 words
> - 3500 queries / second
> - on average a query uses 7 fields (1 big, 6 small) with 25-30
>   tokens
>
> Are there any benchmarks against which I can compare the performance
> of my application? Or any approximate formula that could guide me in
> calculating (from system parameters and index/search stats) the
> "best" expected search time?
>
> Thanks in advance
>
> On Tue, May 10, 2011 at 9:59 AM, Ganesh <[email protected]> wrote:
>
>> We are using a similar technique to yours. We keep smaller indexes
>> and use ParallelMultiSearcher to search across them. Keeping
>> smaller indexes is good, as indexing and index optimization are
>> faster. There is a small delay when searching across the indexes.
>>
>> 1. What is your search time?
>> 2. Is your index optimized?
>>
>> I have a doubt: if we keep the index size at 30 GB, then each file
>> (fdt, fdx, etc.) would be in the GBs. A small addition or deletion
>> to a file will not cause more IO, as it has to skip those bytes and
>> write at the end of the file.
>>
>> Regards
>> Ganesh
>>
>> ----- Original Message -----
>> From: "Samarendra Pratap" <[email protected]>
>> To: <[email protected]>
>> Sent: Monday, May 09, 2011 5:26 PM
>> Subject: Sharding Techniques
>>
>>> Hi list,
>>>  We have an index directory of 30 GB which is divided into 3
>>> subdirectories (idx1, idx2, idx3), which are again divided into 21
>>> sub-subdirectories (idx1-1, idx1-2, ..., idx2-1, ..., idx3-1, ...,
>>> idx3-21).
>>>
>>> We are running Java 1.6 and Lucene 2.9 (going to upgrade to 3.1
>>> very soon) on Linux (Fedora Core, kernel 2.6.17-13.1) with
>>> reiserfs.
>>>
>>> We have almost 40 fields in each index (is it bad to have so many
>>> fields?). Most of them are id-based fields.
>>> We are using 8 servers for search, each of which receives
>>> approximately 3000 queries/hour at peak, and a search time of more
>>> than 1 second is considered bad (is it really bad?) as per the
>>> business requirement.
>>>
>>> For the past few months we have been experiencing issues (load and
>>> search time) on our search servers, due to which I am looking into
>>> sharding techniques. Can someone guide me, or give me pointers to
>>> where I can read more and test?
>>>
>>> Keeping parts of the indexes on different servers, searching all
>>> of them, and then merging the results - what could be the best
>>> approach?
>>>
>>> Note that most queries use only 6-7 indexes and 4-5 fields (to
>>> search on), but some queries (searching all the data) require all
>>> the indexes and are the primary cause of the performance
>>> degradation.
>>>
>>> Any suggestions/ideas are greatly appreciated. Furthermore, will
>>> sharding (or something similar) really reduce search time? (Load
>>> is a less severe issue compared to search time.)
>>>
>>> --
>>> Regards,
>>> Samar
>
> --
> Regards,
> Samar
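As a footnote to Ganesh's reply above: the ParallelMultiSearcher
approach he describes can be sketched in a few lines against the
2.9-era API. Each sub-index gets its own read-only IndexSearcher, and
the parallel searcher fans each query out to all shards on separate
threads before merging the scored hits (the paths below are
illustrative):

    import java.io.File;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.ParallelMultiSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Searchable;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.FSDirectory;

    public class ShardedSearch {
      public static void main(String[] args) throws Exception {
        // One read-only searcher per shard directory.
        String[] paths = { "/data/index/idx1-1", "/data/index/idx1-2",
                           "/data/index/idx1-3" };
        Searchable[] shards = new Searchable[paths.length];
        for (int i = 0; i < paths.length; i++) {
          shards[i] = new IndexSearcher(
              FSDirectory.open(new File(paths[i])), true);
        }
        // Each search runs against all shards concurrently; hits are
        // merged and re-sorted by score before being returned.
        ParallelMultiSearcher searcher = new ParallelMultiSearcher(shards);
        Query q = new TermQuery(new Term("contents", "lucene"));
        TopDocs top = searcher.search(q, 10);
        System.out.println("hits: " + top.totalHits);
        searcher.close();  // also closes the per-shard searchers
      }
    }

The same Searchable[] could just as well hold the RMI-backed remote
searchables from the earlier sketch, which is how the per-shard work
gets moved onto separate machines.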
