Understanding performance characteristics of the new point types

Florian Hopf Wed, 02 Nov 2016 11:10:44 -0700

Hi,

we are indexing different types of documents in one Lucene index. They
have most fields in common but we need to filter some types for certain
queries. We are using numeric values to determine the types of documents
(1-4). Now, when querying these documents we see that the performance
degrades the more documents of a type are in the index.


Using a simple test that indexes 10 million documents, I can see the
following when filtering on everything but 100,000 documents:

* When issuing the query alone, the new PointRangeQuery
(IntPoint.newExactQuery) is a lot faster than the term and legacy
numeric queries (in my case around 2x the speed of the others).
* When issuing a bool query that combines a term query selecting 5
documents with a MUST clause on the numeric field, the points are 5x
slower than legacy numeric (LegacyNumericRangeQuery.newIntRange) and
terms (TermQuery).
* When doing the same thing with SHOULD instead of MUST for the
additional term query, the PointRangeQuery is the fastest again.
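
For illustration, the point-based variants of the three setups above
could be built roughly as follows. This is only a sketch and not the
actual test code: it assumes the Lucene 6.x API, and the class name
and the field names "type" and "tag" are made up for the example.

```java
import org.apache.lucene.document.IntPoint;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class TypeFilterQueries {
    // Hypothetical field names; the real index layout is not shown in this mail.
    static final String TYPE_FIELD = "type";
    static final String TAG_FIELD = "tag";

    // Case 1: the numeric type filter issued on its own.
    static Query typeAlone(int type) {
        return IntPoint.newExactQuery(TYPE_FIELD, type);
    }

    // Cases 2 and 3: a selective term query combined with the type filter.
    // termOccur is MUST for case 2 and SHOULD for case 3; the type clause
    // is always MUST.
    static Query typeWithTerm(int type, String tag, BooleanClause.Occur termOccur) {
        return new BooleanQuery.Builder()
            .add(new TermQuery(new Term(TAG_FIELD, tag)), termOccur)
            .add(IntPoint.newExactQuery(TYPE_FIELD, type), BooleanClause.Occur.MUST)
            .build();
    }
}
```

The term and legacy numeric variants would swap the IntPoint query for
a TermQuery or a LegacyNumericRangeQuery.newIntRange on the same field.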

I suspect this to be related to the discussion in
https://issues.apache.org/jira/browse/LUCENE-7254

Of course there could be something wrong with the way I am measuring the
performance; I'd be happy to share the code. But what I read in the
ticket above seems to hint that points are not suited for every use
case. Is it recommended to use StringField in a case like this instead?

Regards
Florian

-- 
Florian Hopf
Freelance Software Developer

http://blog.florian-hopf.de
