Re: Sharding Techniques

Samarendra Pratap Mon, 09 May 2011 08:10:49 -0700

Hi Ian,
 Thanks for sharing your knowledge and to-the-point answers.

1. I have not tested my application with a single index because initially (a
few years back) we thought that the smaller the index size (7 indexes for the
default 80% of searches), the faster the search would be. Anyway, I'll give
it a try and share the experience.


2. For sharing/caching we create the index readers once, when the server
starts, and use them throughout the server's life (1 day). At search time,
the number of indexes to be read is decided by analyzing the search
parameters. IndexSearchers are created on the persistent IndexReaders and
finally a ParallelMultiSearcher is created from these IndexSearchers (I hope
this is not a problem, or is it?)
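
To make the pattern in point 2 concrete, here is a minimal sketch in plain Java of "pick the indexes, search them in parallel, merge the hits". It is not Lucene code: Hit, Shard, fixedShard and searchAll are hypothetical names invented for illustration, and the demo shards return fixed scores (assumed to be given in descending order) instead of running a real query.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-Java model of "search several indexes in parallel, then merge".
// None of these types are Lucene classes; all names are hypothetical.
class ShardFanOutSketch {

    // One scored hit, tagged with the index it came from.
    static final class Hit {
        final String index;
        final int doc;
        final double score;

        Hit(String index, int doc, double score) {
            this.index = index;
            this.doc = doc;
            this.score = score;
        }
    }

    // Stand-in for a long-lived searcher over one index.
    interface Shard {
        List<Hit> search(String query, int topN);
    }

    // Demo shard: ignores the query and returns up to topN of the given
    // scores, which are assumed to be listed in descending order.
    static Shard fixedShard(String name, double... scores) {
        return (query, topN) -> {
            List<Hit> hits = new ArrayList<>();
            for (int doc = 0; doc < scores.length && doc < topN; doc++) {
                hits.add(new Hit(name, doc, scores[doc]));
            }
            return hits;
        };
    }

    // Query every selected shard concurrently and keep the best topN overall.
    static List<Hit> searchAll(List<Shard> shards, String query, int topN,
                               ExecutorService pool) {
        List<Future<List<Hit>>> pending = new ArrayList<>();
        for (Shard shard : shards) {
            Callable<List<Hit>> task = () -> shard.search(query, topN);
            pending.add(pool.submit(task));
        }
        List<Hit> merged = new ArrayList<>();
        try {
            for (Future<List<Hit>> f : pending) {
                merged.addAll(f.get()); // blocks until that shard has answered
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("shard search failed", e);
        }
        merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // In the real application this list would be chosen from the
            // search parameters; here it is hard-coded.
            List<Shard> selected = new ArrayList<>();
            selected.add(fixedShard("idx1-1", 0.9, 0.4));
            selected.add(fixedShard("idx2-1", 0.7, 0.6));
            for (Hit h : searchAll(selected, "fld1:foo", 3, pool)) {
                System.out.println(h.index + " doc=" + h.doc + " score=" + h.score);
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

In this model the total latency is bounded by the slowest shard plus the merge, so a query that touches all 63 indexes waits for the slowest of 63 searches.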

3. I had gone through the link you provided and some of the things are
already implemented (e.g. readOnly=true, NIOFSDirectory, optimizing, etc.).
We are using filters for some of the fields and caching those filters in
memory, in a hashtable.
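
The filter caching in point 3 can be pictured roughly as below. This is a simplified plain-Java sketch, not our actual code and not Lucene's API: FilterCacheSketch, the string filter key and the loader function are hypothetical, and a BitSet stands in for the set of matching document ids. It assumes the cached bits stay valid because the readers are not reopened during the server's life.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical in-memory cache of filter results, keyed by a filter string.
class FilterCacheSketch {
    private final Map<String, BitSet> cache = new ConcurrentHashMap<>();
    private final Function<String, BitSet> loader;
    private final AtomicInteger loads = new AtomicInteger();

    // loader computes the matching doc ids for a key (the expensive step).
    FilterCacheSketch(Function<String, BitSet> loader) {
        this.loader = loader;
    }

    // Returns the cached bits, computing them at most once per key.
    // Callers should treat the returned BitSet as read-only.
    BitSet get(String filterKey) {
        return cache.computeIfAbsent(filterKey, key -> {
            loads.incrementAndGet();
            return loader.apply(key);
        });
    }

    // How many times the loader actually ran.
    int loadCount() {
        return loads.get();
    }

    // Must be called whenever the underlying readers are reopened.
    void clear() {
        cache.clear();
    }

    public static void main(String[] args) {
        // Toy loader: pretends every even doc id out of 10 matches any filter.
        FilterCacheSketch filters = new FilterCacheSketch(key -> {
            BitSet bits = new BitSet(10);
            for (int doc = 0; doc < 10; doc += 2) {
                bits.set(doc);
            }
            return bits;
        });
        filters.get("category:12");
        filters.get("category:12"); // second call is served from the cache
        System.out.println("loads = " + filters.loadCount()); // loads = 1
    }
}
```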

Will reducing the number of tokens in a particular field of the index reduce
the search time (or CPU, memory, etc.)?

E.g. I have 11 documents and the tokens in the field (fld1) are
1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9 and 2.0.

The query is - fld1:[ 1.0 TO 2.0 ]

Would it make any difference if the tokens in the documents (in the same
field) were
1, 1, 1, 1, 1, 1, 1, 1, 1, 2?
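
One way to reason about this question is with a toy model of a term dictionary. The sketch below is plain Java, not Lucene code, and rests on an assumption: that a range query on a text field walks every distinct term between the bounds (compared as strings) and unions their document lists. The class and method names are hypothetical, and it uses the ten token values listed above, one per document.

```java
import java.util.BitSet;
import java.util.TreeMap;

// Toy inverted index for one field: sorted term -> set of matching doc ids.
// This only imitates how a range query might walk the terms of a field.
class RangeTermSketch {

    // Build the term dictionary from one token per document.
    static TreeMap<String, BitSet> index(String... tokenPerDoc) {
        TreeMap<String, BitSet> terms = new TreeMap<>();
        for (int doc = 0; doc < tokenPerDoc.length; doc++) {
            terms.computeIfAbsent(tokenPerDoc[doc], t -> new BitSet()).set(doc);
        }
        return terms;
    }

    // Number of distinct terms the inclusive range [lo, hi] has to visit.
    static int termsVisited(TreeMap<String, BitSet> terms, String lo, String hi) {
        return terms.subMap(lo, true, hi, true).size();
    }

    // Union of the doc ids of every term in the inclusive range.
    static BitSet rangeQuery(TreeMap<String, BitSet> terms, String lo, String hi) {
        BitSet hits = new BitSet();
        for (BitSet postings : terms.subMap(lo, true, hi, true).values()) {
            hits.or(postings);
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeMap<String, BitSet> fine = index(
                "1.1", "1.2", "1.3", "1.4", "1.5", "1.6", "1.7", "1.8", "1.9", "2.0");
        TreeMap<String, BitSet> coarse = index(
                "1", "1", "1", "1", "1", "1", "1", "1", "1", "2");

        // prints "10 terms, 10 docs"
        System.out.println(termsVisited(fine, "1.0", "2.0") + " terms, "
                + rangeQuery(fine, "1.0", "2.0").cardinality() + " docs");
        // prints "2 terms, 10 docs"
        System.out.println(termsVisited(coarse, "1", "2") + " terms, "
                + rangeQuery(coarse, "1", "2").cardinality() + " docs");
    }
}
```

Under this model both layouts return the same ten documents, but the coarse one visits 2 terms instead of 10. With so few terms the difference would be negligible; it should only matter when a range spans a very large number of distinct terms. Note also that with string comparison the term "1" sorts before "1.0", so the query bounds would have to change to [ 1 TO 2 ] for the coarse tokens.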



On Mon, May 9, 2011 at 6:36 PM, Ian Lea <[email protected]> wrote:

> 30Gb isn't that big by lucene standards.  Have you considered or tried
> just having one large index?  If necessary you could restrict searches
> to particular "indexes", or groups thereof, by a field in the combined
> index, preferably used as a filter.  If the slow searches have to
> search across 63 separate indexes it is perhaps not surprising that
> they are slow.  What do you do about sharing or caching
> searcher/reader instances?  There are lots of useful tips on
> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed.
>
> 40 fields isn't that many - should be fine.
>
> On sharding/scaling/etc,
>
> http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr
> looks well worth a read.
>
>
> --
> Ian.
>
> On Mon, May 9, 2011 at 12:56 PM, Samarendra Pratap <[email protected]>
> wrote:
> > Hi list,
> >  We have an index directory of 30 GB which is divided into 3 subdirectories
> > (idx1, idx2, idx3) which are again divided into 21 sub-subdirectories
> > (idx1-1, idx1-2, ...., idx2-1, ...., idx3-1, ...., idx3-21).
> >
> > We are running with java 1.6, lucene 2.9 (going to upgrade to 3.1 very
> > soon), linux (fedora core - kernel 2.6.17-13.1), reiserfs.
> >
> > We have almost 40 fields in each index (is it bad to have so many
> > fields?). Most of them are id-based fields.
> > We are using 8 servers for search, each of which receives approximately
> > 3000 queries/hour at peak, and a search time of more than 1 second is
> > considered bad (is it really bad?) as per the business requirement.
> >
> > For the past few months we have been experiencing issues (load and search
> > time) on our search servers, due to which I am looking for sharding
> > techniques. Can someone guide me or give me pointers where I can read more
> > and test?
> >
> > Keeping parts of the indexes on different servers, searching on all of them
> > and then merging the results - what could be the best approach?
> >
> > Let me tell you that most queries use only 6-7 indexes and 4 - 5 fields (to
> > search for) but some queries (searching all the data) require all the
> > indexes and are the primary cause of the performance degradation.
> >
> > Any suggestions/ideas are greatly appreciated. Furthermore, will
> > sharding (or a similar thing) really reduce search time? (load is a less
> > severe issue when compared to search time)
> >
> >
> > --
> > Regards,
> > Samar
> >


-- 
Regards,
Samar
