On 8/13/07, Erick Erickson <[EMAIL PROTECTED]> wrote:
> Have you tried the very simple technique of just making an OR clause
> containing all the sources for a particular query and just letting
> it run? I was surprised at the speed...

I think the TermsFilter that I use does exactly that.
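To make the comparison concrete, here is a rough sketch of the two
variants (a sketch only: the class name, the "publication" field name
and the clause limit are stand-ins for whatever the real code uses, and
TermsFilter here is the class from contrib/queries):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TermsFilter; // contrib/queries

    public class PublicationRestriction {

        // Variant 1: Erick's suggestion, one SHOULD clause per selected
        // publication, to be ANDed with the user's query. With thousands
        // of publications this exceeds the default limit of 1024 clauses,
        // so the limit must be raised to avoid BooleanQuery.TooManyClauses.
        public static BooleanQuery asQuery(String[] publications) {
            BooleanQuery.setMaxClauseCount(10000);
            BooleanQuery q = new BooleanQuery();
            for (String pub : publications) {
                q.add(new TermQuery(new Term("publication", pub)),
                      BooleanClause.Occur.SHOULD);
            }
            return q;
        }

        // Variant 2: what I do now, a TermsFilter over the same terms,
        // which restricts the result set without scoring the terms.
        public static TermsFilter asFilter(String[] publications) {
            TermsFilter f = new TermsFilter();
            for (String pub : publications) {
                f.addTerm(new Term("publication", pub));
            }
            return f;
        }
    }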
> But before doing *any* of that, you need to find out, and tell us, what
> exactly is taking the time. Are you opening a new IndexReader for
> each query?

No.

> Are you iterating through a Hits object that has more than
> 100 (maybe it's 200 now) entries? Are you loading each document that
> satisfies the query? Etc. Etc.

Unfortunately, yes. I know this is another big source of slowness, but
due to another requirement that cannot be worked around at this stage,
I have to return all hits for a search for now. For each document I get
the docid (not Lucene's internal one), the date and the publication.
I've already used FieldCache to cache all three fields.
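To be concrete, the caching looks roughly like this (again a sketch
only: the class name and the field names are stand-ins for our actual
schema):

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.FieldCache;

    public class HitFields {
        private final String[] docIds;
        private final String[] dates;
        private final String[] publications;

        // FieldCache.DEFAULT keys its cache on the IndexReader, so each
        // array is built on first use and reused for the lifetime of the
        // reader; the arrays are indexed by Lucene's internal doc number.
        public HitFields(IndexReader reader) throws IOException {
            docIds       = FieldCache.DEFAULT.getStrings(reader, "docid");
            dates        = FieldCache.DEFAULT.getStrings(reader, "date");
            publications = FieldCache.DEFAULT.getStrings(reader, "publication");
        }

        // 'doc' is the internal document number of a hit, so fetching the
        // three fields is a plain array access instead of a call to
        // IndexReader.document(doc).
        public String docId(int doc)       { return docIds[doc]; }
        public String date(int doc)        { return dates[doc]; }
        public String publication(int doc) { return publications[doc]; }
    }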
> Put some simple timers in your code and measure exactly what's taking the
> time before tuning your code. Time the call to search. Time the call for
> parsing. Time the assembly of the responses, in, say, blocks of 100.

Time for parsing: < 0.01 sec. Time for assembling the response, sending
it over the network, etc.: ignored. The 2-3 seconds is purely the time
spent in the call to Searcher.search(query, filter, n, sort).

> You simply cannot improve your code without knowing, through
> measurement, what is taking the time. Virtually every time I've tried to
> improve speed without measuring first, I've been wrong <G>..

I'll have to confess that if I take only the first 100 hits, the search
time can be brought down to around 1 second. Since I can't do that, I've
also tried to measure performance by taking out each individual factor
(sort, filter by date, filter by publications), and I found that the
filter by publication generally takes the most time. I forget the exact
figures, but removing it improved search time by around 0.5-1 seconds.

> BTW, have you looked over the suggestions here?
>
> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed

Yes, I've looked over it a couple of times already =) Getting faster
hardware and adding RAM are both good suggestions in my case; we will
eventually spread our indexes across a number of machines. But I would
still like to eliminate any inefficiencies in our search implementation
first.

> Best
> Erick
>
> On 8/13/07, Cedric Ho <[EMAIL PROTECTED]> wrote:
> >
> > Hi all,
> >
> > My problem is as follows:
> >
> > Our documents each come from a different publication, and we
> > currently have > 5000 different publication sources.
> >
> > Our clients can choose an arbitrary subset of the publications when
> > performing a search. It is not uncommon for a search to have to
> > match hundreds or thousands of publications.
> >
> > I currently index the publication information as a field in each
> > document and use a TermsFilter when performing the search. However,
> > the performance is less than satisfactory: many simple searches take
> > more than 2-3 seconds (our goal: < 0.5 seconds).
> >
> > Using a CachingWrapperFilter is great for search speed, but I've
> > done some calculations and figured that it is basically impossible
> > to cache every combination of publications, or even some common
> > combinations.
> >
> > Is there any other, more effective way to do the filtering?
> >
> > (I know that the slowness is not purely due to the publication
> > filter; we also have some other things that slow down the search.
> > But this one definitely contributes quite a lot to the overall
> > search time.)
> >
> > Regards,
> > Cedric
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]

--
[EMAIL PROTECTED]