Thank you both for the explanation. We will switch to StringField with a TermQuery instead.
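For reference, here is a minimal plain-Java model of the cost difference Mike describes in the quoted reply. It is not Lucene code and uses no Lucene API. The class and method names (ConjunctionSketch, leapfrog, viaBitset, typeFilter) are hypothetical, and the document counts are made up, scaled down from the 10 million document test. A sorted int[] stands in for a posting list, and Arrays.binarySearch stands in for the skip-list-backed advance() of real postings. "Work" is counted as one unit per advance() or per bit set, which is a deliberate simplification.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.stream.IntStream;

// Hypothetical model, not Lucene code: doc IDs are ints, a "posting list" is a sorted int[].
public class ConjunctionSketch {

    // Term-style conjunction: the most selective clause drives iteration and the
    // other clause is only advanced to each candidate doc. Binary search stands in
    // for skip lists (an assumption of this sketch).
    static int[] leapfrog(int[] rare, int[] common, long[] work) {
        int[] out = new int[rare.length];
        int n = 0;
        int from = 0;
        for (int doc : rare) {
            work[0]++; // one advance() per candidate doc
            int idx = Arrays.binarySearch(common, from, common.length, doc);
            if (idx >= 0) {
                out[n++] = doc;
                from = idx + 1;
            } else {
                from = -idx - 1; // insertion point: resume the next search from here
            }
        }
        return Arrays.copyOf(out, n);
    }

    // Points-style conjunction: every doc matching the common clause is first copied
    // into a bitset, then the selective clause is checked against it.
    static int[] viaBitset(int[] rare, int[] common, int maxDoc, long[] work) {
        BitSet bits = new BitSet(maxDoc);
        for (int doc : common) {
            bits.set(doc);
            work[0]++; // cost grows with the number of docs of that type
        }
        int[] out = new int[rare.length];
        int n = 0;
        for (int doc : rare) {
            work[0]++;
            if (bits.get(doc)) {
                out[n++] = doc;
            }
        }
        return Arrays.copyOf(out, n);
    }

    // Hypothetical low-cardinality type filter that matches 90% of all docs:
    // every doc whose ID is not a multiple of 10.
    static int[] typeFilter(int maxDoc) {
        return IntStream.range(0, maxDoc).filter(d -> d % 10 != 0).toArray();
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;
        int[] rare = {17, 250_000, 500_003, 750_000, 999_998}; // sorted, like postings
        int[] common = typeFilter(maxDoc);

        long[] leapWork = new long[1];
        long[] bitsWork = new long[1];
        int[] a = leapfrog(rare, common, leapWork);
        int[] b = viaBitset(rare, common, maxDoc, bitsWork);

        System.out.println(Arrays.toString(a) + " work=" + leapWork[0]);
        System.out.println(Arrays.toString(b) + " work=" + bitsWork[0]);
    }
}
```

Both strategies return the same docs, but the work differs sharply. Running main prints `[17, 500003, 999998] work=5` for the leapfrog model and `[17, 500003, 999998] work=900005` for the bitset model. In the model, the bitset cost tracks how many documents carry the filtered type, which matches the degradation described in the original question.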
On 02.11.2016 20:09, Michael McCandless wrote:
> Yeah, it's best to use StringField for low-cardinality use cases.
>
> When cardinality is low (4 unique values in your case), legacy
> numerics would rewrite to a BooleanQuery, which is much more
> performant for MUST clauses, vs. dimensional points, which will always
> need to construct an up-front bitset for all documents with that
> value. Using StringField instead will ensure you always get a
> BooleanQuery...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Wed, Nov 2, 2016 at 2:43 PM, Fuad Efendi <f...@efendi.ca> wrote:
>> Hi Florian,
>>
>> If my understanding is correct, you are using IntPoint to index 4 different
>> document types, which is overkill; why not try a classic “non-tokenized”
>> keyword field (a.k.a. “legacy string”) for document types? Cardinality is
>> only four for document types.
>>
>> --
>> Fuad Efendi
>> (416) 993-2060
>> http://www.tokenizer.ca
>> Recommender Systems
>>
>>
>> On November 2, 2016 at 2:10:14 PM, Florian Hopf (
>> mailingli...@florian-hopf.de) wrote:
>>
>> Hi,
>>
>> We are indexing different types of documents in one Lucene index. They
>> have most fields in common, but we need to filter some types for certain
>> queries. We are using numeric values to determine the types of documents
>> (1-4). Now, when querying these documents, we see that the performance
>> degrades the more documents of a type are in the index.
>>
>> Using a simple test that indexes 10 million documents, I can see the
>> following when filtering on everything but 100000 documents:
>>
>> * When issuing the query alone, the new PointRangeQuery
>> (IntPoint.newExactQuery) is a lot faster than term and legacy numeric
>> (in my case around 2x the speed of the others).
>> * When issuing a boolean query that contains a term query selecting 5
>> documents together with a MUST clause that filters on the numeric field,
>> the points are 5x slower than legacy numeric
>> (LegacyNumericRangeQuery.newIntRange) and terms (TermQuery).
>> * When doing the same thing with SHOULD instead of MUST for the
>> additional term query, the PointRangeQuery is the fastest as well.
>>
>> I suspect this is related to the discussion in
>> https://issues.apache.org/jira/browse/LUCENE-7254
>>
>> Of course there could be something wrong with the way I am measuring the
>> performance; I'd be happy to share the code. But what I read in the
>> ticket above seems to hint that points are not suited for every use
>> case. Is it recommended to use StringField in a case like this instead?
>>
>> Regards
>> Florian
>>
>> --
>> Florian Hopf
>> Freelance Software Developer
>>
>> http://blog.florian-hopf.de
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
--
Florian Hopf
Freelance Software Developer

http://blog.florian-hopf.de