Ganesh
Nobody is saying that sharding is never a good idea - it just doesn't seem to be applicable in the case being discussed. On my indexes I care much more about speed of searching rather than speed of indexing. The latter typically happens in the background in the dead of night and within reason I don't really care how long it takes. Your application and requirements will be different and you may come to different conclusions. -- Ian. On Wed, May 11, 2011 at 6:09 AM, Ganesh <[email protected]> wrote: > > We also use similar kind of technique, breaking indexes in to smaller and > search using ParallelMultiSearcher. We have to do incremental indexing and > the records older than 6 months or 1 year (based on ageout setting) should be > deleted. Having multiple small indexes is really fast in terms of indexing. > > Since you guys mentioned about keeping single large index. Search time woule > be faster but the indexing and index optimization will take more time. How > you are handling it in case of incremental indexing. If we keep the indexes > size to 100+ GB then each file size (fdt, fdx etc) would in GB's. Small > addition or deletion to the file will not cause more IO as it has to skip > those bytes and write it at the end of file. > > Regards > Ganesh > > ----- Original Message ----- > From: "Burton-West, Tom" <[email protected]> > To: <[email protected]> > Sent: Tuesday, May 10, 2011 9:46 PM > Subject: RE: Sharding Techniques > > > Hi Samar, > >>>Normal queries go fine under 500 ms but when people start searching >>>"anything" some queries take up to > 100 seconds. Don't you think >>>distributing smaller indexes on different machines would reduce the average >>>.search time. (Although I have a feeling that search time for smaller queries >>>may be slightly increased) > > What are the characteristics of your slow queries? Can you give examples? > Are the slow queries always slow or only under heavy load? What the > bottleneck is and whether splitting into smaller indexes would help depends > on just what your bottleneck is. It's not clear that your index is large > enough that the size of the index is causing your bottleneck. > > We run indexes of about 350GB with average response times under 200ms and > 99th percentile reponse times of under 2 seconds. (We have a very low qps > rate however). > > > Tom > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [email protected] > For additional commands, e-mail: [email protected] > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [email protected] > For additional commands, e-mail: [email protected] > > --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
