Hi Kuro,

How did you generate your second, larger test data set?
Did you simply copy the original data set multiple times, or did you use new
pseudo-random data (words)? If the former, you would expect a linear
increase in search time, since the number of indexed terms has not changed,
only the number of matching documents. If you used additional words (with
pseudo-random data), then the number of terms should increase (up to a point),
giving you non-linear increases in search time (until you max out on the
number of unique dictionary words).

Rgs
Joel
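To make the two cases concrete, here is a minimal Java sketch under stated assumptions: a "document" is just a list of words, no Lucene classes are involved, and the class CorpusGrowth, the vocabulary size, document length and random seeds are all hypothetical. Copying the corpus leaves the term dictionary unchanged while every postings list (document frequency) grows proportionally. Appending freshly generated documents also grows the dictionary, but only up to the size of the vocabulary.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical toy model: a "document" is a list of words; no Lucene classes are used.
class CorpusGrowth {

    // Generates docs of random words "w0".."w(vocabSize-1)"; vocabSize caps the dictionary size.
    static List<List<String>> randomCorpus(int docs, int wordsPerDoc, int vocabSize, long seed) {
        Random rnd = new Random(seed);
        List<List<String>> corpus = new ArrayList<>();
        for (int d = 0; d < docs; d++) {
            List<String> doc = new ArrayList<>();
            for (int w = 0; w < wordsPerDoc; w++) {
                doc.add("w" + rnd.nextInt(vocabSize));
            }
            corpus.add(doc);
        }
        return corpus;
    }

    // Option 1: grow the data set by copying the original documents.
    static List<List<String>> duplicate(List<List<String>> corpus, int copies) {
        List<List<String>> out = new ArrayList<>();
        for (int c = 0; c < copies; c++) {
            out.addAll(corpus);
        }
        return out;
    }

    // Option 2: grow the data set by appending newly generated pseudo-random documents.
    static List<List<String>> extend(List<List<String>> corpus, List<List<String>> extra) {
        List<List<String>> out = new ArrayList<>(corpus);
        out.addAll(extra);
        return out;
    }

    // Size of the term dictionary: the number of distinct words in the corpus.
    static int uniqueTerms(List<List<String>> corpus) {
        Set<String> terms = new HashSet<>();
        for (List<String> doc : corpus) {
            terms.addAll(doc);
        }
        return terms.size();
    }

    // Document frequency: how many documents contain the term (its postings-list length).
    static int docFreq(List<List<String>> corpus, String term) {
        int n = 0;
        for (List<String> doc : corpus) {
            if (doc.contains(term)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<List<String>> base = randomCorpus(1000, 50, 20000, 42L);
        List<List<String>> copied = duplicate(base, 4);
        List<List<String>> extended = extend(base, randomCorpus(3000, 50, 20000, 7L));
        System.out.println("base:     terms=" + uniqueTerms(base) + " df(w1)=" + docFreq(base, "w1"));
        System.out.println("copied:   terms=" + uniqueTerms(copied) + " df(w1)=" + docFreq(copied, "w1"));
        System.out.println("extended: terms=" + uniqueTerms(extended) + " df(w1)=" + docFreq(extended, "w1"));
    }
}
```

Both grown corpora have 4000 documents, but only the copied one keeps exactly the same dictionary as the original; the exact counts printed depend on the seeds.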

On Thursday 18 June 2009 23:27:29 Yonik Seeley wrote:
> On Thu, Jun 18, 2009 at 3:54 PM, Teruhiko Kurosaka <k...@basistech.com> wrote:
> > Because the number of hits was proportional to the number
> > of Documents in the index in my previous test, I came
> > to the wrong conclusion that search time is proportional
> > to index size. If only one Document matches
> > a query, the search time remains constant no
> > matter how large the index is.
>
> Right. An inverted index contains a list of documents that match each
> term, so ignoring other overhead and effects, search time is
> proportional to the number of documents matching the various clauses
> of the query.
>
> -Yonik
> http://www.lucidimagination.com
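Yonik's point can be illustrated with a toy inverted index in Java. This is a hypothetical sketch (the class ToyInvertedIndex and its lastWork counter are illustrative only; Lucene's real postings format and scoring are far more elaborate). A single-term query does one dictionary lookup and then walks only that term's postings list, and a two-clause AND merges two sorted postings lists, so the work tracks the number of matching documents rather than the total number of documents in the index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical toy inverted index; not Lucene's implementation.
class ToyInvertedIndex {
    // term -> ascending list of ids of the documents containing it
    private final Map<String, List<Integer>> postings = new HashMap<>();
    private int numDocs = 0;
    // Postings entries visited by the most recent query (a proxy for search cost).
    long lastWork = 0;

    int addDocument(String... terms) {
        int docId = numDocs++;
        // De-duplicate terms so a document appears at most once per postings list.
        for (String term : new LinkedHashSet<>(Arrays.asList(terms))) {
            postings.computeIfAbsent(term, k -> new ArrayList<>()).add(docId);
        }
        return docId;
    }

    int numDocs() {
        return numDocs;
    }

    // Single-term query: one dictionary lookup, then a walk over that term's postings only.
    List<Integer> termQuery(String term) {
        List<Integer> list = postings.getOrDefault(term, List.of());
        lastWork = list.size();
        return new ArrayList<>(list);
    }

    // Two-clause AND: merge two sorted postings lists; work is at most |a| + |b|.
    List<Integer> andQuery(String a, String b) {
        List<Integer> pa = postings.getOrDefault(a, List.of());
        List<Integer> pb = postings.getOrDefault(b, List.of());
        List<Integer> hits = new ArrayList<>();
        int i = 0, j = 0;
        long work = 0;
        while (i < pa.size() && j < pb.size()) {
            work++;
            int x = pa.get(i), y = pb.get(j);
            if (x == y) {
                hits.add(x);
                i++;
                j++;
            } else if (x < y) {
                i++;
            } else {
                j++;
            }
        }
        lastWork = work;
        return hits;
    }

    public static void main(String[] args) {
        for (int size : new int[] {1_000, 100_000}) {
            ToyInvertedIndex idx = new ToyInvertedIndex();
            for (int d = 0; d < size; d++) {
                // Every document contains "common"; only document 7 also contains "rare".
                if (d == 7) {
                    idx.addDocument("common", "rare");
                } else {
                    idx.addDocument("common");
                }
            }
            idx.termQuery("rare");
            long rareWork = idx.lastWork;
            idx.termQuery("common");
            System.out.println(size + " docs: rare work=" + rareWork + ", common work=" + idx.lastWork);
        }
    }
}
```

Running main prints `1000 docs: rare work=1, common work=1000` and then `100000 docs: rare work=1, common work=100000`: the query with one hit costs the same at both index sizes, while the query matching every document grows linearly, which matches the pattern Kuro observed.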
>
> > -kuro
> >
> >> -----Original Message-----
> >> From: Erick Erickson [mailto:erickerick...@gmail.com]
> >> Sent: Thursday, June 18, 2009 12:44 AM
> >> To: java-user@lucene.apache.org
> >> Subject: Re: Lucene performance: is search time linear to the index size?
> >>
> >> Opening a searcher and doing the first query incurs a
> >> significant amount of overhead, cache loading, etc. Inferring
> >> search times relative to index size with a program like you
> >> describe is unreliable.
> >>
> >> Try firing a few queries at the index without measuring,
> >> *then* measure the time it takes for subsequent queries and
> >> you'll get a much better picture of actual response time.
> >>
> >> The fact that a program that fires a single query at a newly
> >> opened reader has near-linear performance isn't as surprising
> >> as all that. I'd be more concerned if, say, queries 10
> >> through 100 *on the same underlying reader* displayed this behavior.
> >>
> >> See:
> >>
> >> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed?highlight=(warming)
> >>
> >> especially the questions around:
> >> *When measuring performance, disregard the first query*
> >>
> >> Best
> >> Erick
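Erick's advice can be sketched as a small Java timing harness, assuming the query is wrapped in a Runnable. The class WarmTimer, the run counts and the placeholder workload are hypothetical, and no Lucene API is shown; in a real test the Runnable would issue a query against a searcher that stays open for all runs, since reopening it brings the one-off costs back. The harness runs the query a number of times untimed, then records per-run latencies for the following runs and reports the median.

```java
import java.util.Arrays;

// Hypothetical harness: the Runnable stands in for "run one query against an already-open searcher".
class WarmTimer {

    // Sink for the demo workload so the JIT cannot discard it as dead code.
    static volatile long sink;

    // Runs the query warmupRuns times untimed, then measuredRuns times timed.
    // Returns per-run latencies in nanoseconds, sorted ascending.
    static long[] time(Runnable query, int warmupRuns, int measuredRuns) {
        if (warmupRuns < 0 || measuredRuns <= 0) {
            throw new IllegalArgumentException("need warmupRuns >= 0 and measuredRuns > 0");
        }
        for (int i = 0; i < warmupRuns; i++) {
            query.run(); // untimed: absorbs one-off costs such as cache loading and JIT compilation
        }
        long[] samples = new long[measuredRuns];
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            query.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples;
    }

    // Median of a sorted array (the upper median when the length is even).
    static long median(long[] sorted) {
        return sorted[sorted.length / 2];
    }

    public static void main(String[] args) {
        // Placeholder workload; replace with a real query against a searcher that stays open.
        Runnable fakeQuery = () -> {
            long acc = 0;
            for (int i = 0; i < 100_000; i++) {
                acc += i * 31L;
            }
            sink = acc;
        };
        long[] cold = time(fakeQuery, 0, 1);
        long[] warm = time(fakeQuery, 50, 200);
        System.out.println("first run (ns):       " + cold[0]);
        System.out.println("median warm run (ns): " + median(warm));
    }
}
```

Reporting the median rather than the mean keeps a few slow outliers (for example a garbage-collection pause) from dominating the result.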
> >> On Thu, Jun 18, 2009 at 12:49 AM, Teruhiko Kurosaka <k...@basistech.com> wrote:
> >> > I've written a test program that uses the simplest form of search,
> >> > TermQuery, and measures the time it takes to search for a term in a
> >> > field on indices of various sizes.
> >> >
> >> > The result is a very linear growth of search time vs. the index size in
> >> > terms of # of Documents, not # of unique terms in that field.
> >> >
> >> > -kuro
> >>
> >> ---------------------------------------------------------------------
> >>
> >> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >



-- 
Joel Halbert
020 3051 8637
075 2501 0825
j...@su3analytics.com
www.su3analytics.com
www.storequery.com
SU3 Analytics Ltd, The Print House, 18 Ashwin St, London E8 3DL.
