Re: lucene query complexity

2015-11-20 Thread Adrien Grand
I don't think the big-O notation is appropriate to measure the cost of
Lucene queries.

On Wed, Nov 11, 2015 at 8:31 PM, search engine  wrote:

> Hi,
>
> I've been thinking about how to use big O notation to show the complexity of
> different types of queries, like term query, prefix query, phrase query,
> wildcard query, and fuzzy query. Any ideas?
>
> thanks,
> Zong
>


Re: lucene query complexity

2015-11-20 Thread search engine
What if we make some assumptions? For example, assume that we have only
one segment and that the entire segment is in memory.

thanks,
Zong

On Fri, Nov 20, 2015 at 4:38 AM, Adrien Grand  wrote:

> I don't think the big-O notation is appropriate to measure the cost of
> Lucene queries.
>
> On Wed, Nov 11, 2015 at 8:31 PM, search engine  wrote:
>
> > Hi,
> >
> > I've been thinking about how to use big O notation to show the complexity of
> > different types of queries, like term query, prefix query, phrase query,
> > wildcard query, and fuzzy query. Any ideas?
> >
> > thanks,
> > Zong
> >
>


Re: lucene query complexity

2015-11-20 Thread Jack Krupansky
Sigh. Yeah, I agree that a simple big-O won't work for Lucene. But
nonetheless, we really should have some sort of performance
characterization. When people ask me how to characterize Lucene/Solr
performance, I always tell them that it is highly non-linear, with lots of
optimizations and options (tokenizers, stemming, case, n-grams, numeric
fields) and highly sensitive to the specifics of the data, so that
estimating performance or memory requirements up front is impractical.

Most people don't have a handle on cardinality, actual data size, actual
document term counts, or data distribution, so even if we had an accurate
performance model, most people wouldn't have accurate numbers to feed into
it, especially since a lot of use cases involve future data
that nobody has seen yet. The average manager thinks they are on top of
performance and memory requirements when they can tell you how many raw
files and how many giga/terabytes of data they have, which clearly won't
feed into any sane model of Lucene performance.

Ultimately, the best we can do is fall back on doing a proof-of-concept
implementation and actually measuring performance and memory for a
significant sample of realistic data; from that you can empirically deduce
what the big-O function is for your particular application data and data
model.
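To illustrate that empirical approach with a sketch (the index sizes and
latencies below are made-up numbers, purely for the example): once you have
measured query latency at several index sizes, you can estimate an effective
complexity exponent by fitting a line to log(time) versus log(size).

```python
import math

def complexity_exponent(sizes, times):
    """Least-squares slope of log(time) vs. log(size).

    A slope near 1 suggests roughly linear scaling over the measured
    range, near 2 quadratic, etc. This characterizes only the range you
    actually measured; it is not a proof of asymptotic big-O behavior.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Hypothetical measurements: index size in docs vs. mean query latency in ms.
sizes = [10_000, 100_000, 1_000_000]
times = [1.2, 13.0, 140.0]  # made-up numbers, roughly linear scaling
print(round(complexity_exponent(sizes, times), 2))
```

With these fabricated numbers the fitted exponent comes out close to 1,
i.e. roughly linear over the measured range; real measurements would of
course give whatever your data and queries actually produce.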

-- Jack Krupansky

On Fri, Nov 20, 2015 at 4:38 AM, Adrien Grand  wrote:

> I don't think the big-O notation is appropriate to measure the cost of
> Lucene queries.
>
> On Wed, Nov 11, 2015 at 8:31 PM, search engine  wrote:
>
> > Hi,
> >
> > I've been thinking about how to use big O notation to show the complexity of
> > different types of queries, like term query, prefix query, phrase query,
> > wildcard query, and fuzzy query. Any ideas?
> >
> > thanks,
> > Zong
> >
>


lucene query complexity

2015-11-11 Thread search engine
Hi,

I've been thinking about how to use big O notation to show the complexity of
different types of queries, like term query, prefix query, phrase query,
wildcard query, and fuzzy query. Any ideas?

thanks,
Zong
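To make the intuition behind the question concrete with a toy model (this is
an assumption-laden sketch, not Lucene's actual implementation, which uses an
FST-based term dictionary): a term query can locate its term directly in a
sorted term dictionary, a prefix query can binary-search to the start of the
matching range and scan it, while a leading-wildcard query has no usable
ordering to exploit and generally must test every term.

```python
import bisect
import fnmatch

# Toy sorted term dictionary standing in for one segment's terms.
terms = sorted(["apple", "apply", "apricot", "banana", "band",
                "bandana", "cherry"])

def term_lookup(term):
    """Term query: one binary search, O(log T) in this toy model."""
    i = bisect.bisect_left(terms, term)
    return terms[i] if i < len(terms) and terms[i] == term else None

def prefix_lookup(prefix):
    """Prefix query: binary search to the range start, then scan the
    contiguous run of matching terms."""
    i = bisect.bisect_left(terms, prefix)
    out = []
    while i < len(terms) and terms[i].startswith(prefix):
        out.append(terms[i])
        i += 1
    return out

def leading_wildcard_lookup(pattern):
    """Leading wildcard: the sort order doesn't help, so every term in
    the dictionary must be tested against the pattern."""
    return [t for t in terms if fnmatch.fnmatch(t, pattern)]

print(term_lookup("band"))              # band
print(prefix_lookup("app"))             # ['apple', 'apply']
print(leading_wildcard_lookup("*ana"))  # ['banana', 'bandana']
```

Even in this toy model the costs diverge: the term query touches O(log T)
entries, the prefix query O(log T + matches), and the leading wildcard all
T entries, which is one reason a single big-O per query type is misleading:
the matching range (and hence the cost) depends on the data, not just the
query type.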