On May 10, 2011, at 9:42 AM, Samarendra Pratap wrote:

> Hi,
>  Though our total index is 30 GB, the indexes that are used in
> 75%-80% of searches add up to 5 GB, and our average search time is
> around 700 ms (yes, we have an optimized index).
>
> Could someone please shed some light on my original question: if I
> want to keep smaller indexes on different servers so that CPU and
> memory can be used efficiently, how can I aggregate the results of a
> query from each of the servers? One technique I know is RMI, which I
> studied a few years back, but it was too slow (or so I thought at
> the time). What are the other techniques?
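For what it's worth, the 2.9 line already ships the pieces for this
kind of fan-out: RemoteSearchable (in the contrib/remote jar, if I
remember the module layout right) exposes a local IndexSearcher over
RMI, and a MultiSearcher on the client side merges and re-sorts the
hits from every shard by score. A rough sketch against the 2.9-era
API -- the host names, registry port, and index paths below are
purely illustrative:

    // ---- ShardServer.java: export one local shard over RMI ----
    import java.io.File;
    import java.rmi.Naming;
    import java.rmi.registry.LocateRegistry;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.RemoteSearchable;
    import org.apache.lucene.search.Searchable;
    import org.apache.lucene.store.FSDirectory;

    public class ShardServer {
      public static void main(String[] args) throws Exception {
        // Open the shard read-only (path is an illustrative example).
        Searchable local = new IndexSearcher(
            FSDirectory.open(new File("/data/index/idx1-1")), true);
        LocateRegistry.createRegistry(1099);  // in-process RMI registry
        Naming.rebind("//localhost/shard1", new RemoteSearchable(local));
      }
    }

    // ---- FederatedSearch.java: aggregate results across shards ----
    import java.rmi.Naming;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.MultiSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Searchable;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;

    public class FederatedSearch {
      public static void main(String[] args) throws Exception {
        // One remote Searchable per shard server (hypothetical hosts).
        Searchable[] shards = {
            (Searchable) Naming.lookup("//search1/shard1"),
            (Searchable) Naming.lookup("//search2/shard2"),
        };
        MultiSearcher searcher = new MultiSearcher(shards);
        Query q = new TermQuery(new Term("contents", "lucene"));
        TopDocs top = searcher.search(q, 10);  // merged across shards
        System.out.println("total hits: " + top.totalHits);
        searcher.close();
      }
    }

One detail worth knowing: MultiSearcher combines document frequencies
across the shards when it weights the query, so the scores it merges
stay comparable between shards.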
There is also http://katta.sourceforge.net/ out there...

Johannes

> Is 1 second a bad search time for the following?
>
> - total index size: 30 GB
> - index size used in 80% of searches: 5 GB
> - number of fields: 40, most of them numeric
> - one big "contents" field with 500-1000 words
> - 3500 queries / second
> - on average a query uses 7 fields (1 big, 6 small) with 25-30
>   tokens
>
> Are there any benchmarks against which I can compare the performance
> of my application? Or any approximate formula that could guide me in
> calculating (from system parameters and index/search stats) the
> "best" expected search time?
>
> Thanks in advance
>
> On Tue, May 10, 2011 at 9:59 AM, Ganesh <[email protected]> wrote:
>
>> We are using a similar technique to yours. We keep smaller indexes
>> and use ParallelMultiSearcher to search across them. Keeping
>> smaller indexes is good, as indexing and index optimization are
>> faster. There is a small delay when searching across the indexes.
>>
>> 1. What is your search time?
>> 2. Is your index optimized?
>>
>> I have a doubt: if we keep the index size at 30 GB, then each file
>> (fdt, fdx, etc.) would be in the GBs. A small addition or deletion
>> to a file will not cause more IO, as it has to skip those bytes and
>> write at the end of the file.
>>
>> Regards
>> Ganesh
>>
>> ----- Original Message -----
>> From: "Samarendra Pratap" <[email protected]>
>> To: <[email protected]>
>> Sent: Monday, May 09, 2011 5:26 PM
>> Subject: Sharding Techniques
>>
>>> Hi list,
>>>  We have an index directory of 30 GB which is divided into 3
>>> subdirectories (idx1, idx2, idx3), which are again divided into 21
>>> sub-subdirectories (idx1-1, idx1-2, ..., idx2-1, ..., idx3-1, ...,
>>> idx3-21).
>>>
>>> We are running Java 1.6 and Lucene 2.9 (going to upgrade to 3.1
>>> very soon) on Linux (Fedora Core, kernel 2.6.17-13.1) with
>>> reiserfs.
>>>
>>> We have almost 40 fields in each index (is it bad to have so many
>>> fields?). Most of them are id-based fields.
>>> We are using 8 servers for search, each of which receives
>>> approximately 3000 queries/hour at peak, and a search time of more
>>> than 1 second is considered bad (is it really bad?) as per the
>>> business requirement.
>>>
>>> For the past few months we have been experiencing issues (load and
>>> search time) on our search servers, due to which I am looking into
>>> sharding techniques. Can someone guide me, or give me pointers to
>>> where I can read more and test?
>>>
>>> Keeping parts of the indexes on different servers, searching all
>>> of them, and then merging the results - what could be the best
>>> approach?
>>>
>>> Note that most queries use only 6-7 indexes and 4-5 fields (to
>>> search on), but some queries (searching all the data) require all
>>> the indexes and are the primary cause of the performance
>>> degradation.
>>>
>>> Any suggestions/ideas are greatly appreciated. Furthermore, will
>>> sharding (or something similar) really reduce search time? (Load
>>> is a less severe issue compared to search time.)
>>>
>>> --
>>> Regards,
>>> Samar
>
> --
> Regards,
> Samar
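As a footnote to Ganesh's reply above: the ParallelMultiSearcher
approach he describes can be sketched in a few lines against the
2.9-era API. Each sub-index gets its own read-only IndexSearcher, and
the parallel searcher fans each query out to all shards on separate
threads before merging the scored hits (the paths below are
illustrative):

    import java.io.File;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.ParallelMultiSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Searchable;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.FSDirectory;

    public class ShardedSearch {
      public static void main(String[] args) throws Exception {
        // One read-only searcher per shard directory.
        String[] paths = { "/data/index/idx1-1", "/data/index/idx1-2",
                           "/data/index/idx1-3" };
        Searchable[] shards = new Searchable[paths.length];
        for (int i = 0; i < paths.length; i++) {
          shards[i] = new IndexSearcher(
              FSDirectory.open(new File(paths[i])), true);
        }
        // Each search runs against all shards concurrently; hits are
        // merged and re-sorted by score before being returned.
        ParallelMultiSearcher searcher = new ParallelMultiSearcher(shards);
        Query q = new TermQuery(new Term("contents", "lucene"));
        TopDocs top = searcher.search(q, 10);
        System.out.println("hits: " + top.totalHits);
        searcher.close();  // also closes the per-shard searchers
      }
    }

The same Searchable[] could just as well hold the RMI-backed remote
searchables from the earlier sketch, which is how the per-shard work
gets moved onto separate machines.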
