Hi Michael,

I have an index that contains the terms of the TermInSetQuery, but the score provided at query time, represented by the order of the terms in a List, is not known at indexing time; it depends on other calculations done at runtime. What do you mean by indexing the ordinals?
I was wondering whether I could wrap each TermQuery in a BoostQuery, boosting by the ordinals I have, and build a disjunction query over all the terms. How much slower than TermInSetQuery would that be?

Nicola

On Mon, 2018-07-02 at 06:41 -0400, Michael Sokolov wrote:
> Since you have the terms ordered, why not index their ordinals, and
> then sort by that?
>
> On Mon, Jul 2, 2018, 6:16 AM Nicola Buso <nb...@ebi.ac.uk> wrote:
> > Hi Uwe,
> >
> > as said, the sorting is calculated elsewhere upfront and the terms
> > are provided to Lucene in the order calculated (in any case as an
> > unordered Set, as the query API requires).
> >
> > I would like an API that keeps the input order; otherwise I end up
> > with the usual problem that I can't re-order afterward, because
> > accessing the results in a paginated way makes this operation
> > impossible.
> >
> >
> > Nicola
> >
> > On Mon, 2018-06-25 at 21:49 +0200, Uwe Schindler wrote:
> > > Hi Nicola,
> > >
> > > if you sort it elsewhere, why do you care about sort order then?
> > > What you see as the result is simple: as there is nothing
> > > available for scoring, a constant score query returns the results
> > > in index order. That's intended. There is no way to change this
> > > "default" order for a TermInSetQuery because it's missing
> > > information.
> > >
> > > Uwe
> > >
> > > -----
> > > Uwe Schindler
> > > Achterdiek 19, D-28357 Bremen
> > > http://www.thetaphi.de
> > > eMail: u...@thetaphi.de
> > >
> > > > -----Original Message-----
> > > > From: Nicola Buso <nb...@ebi.ac.uk>
> > > > Sent: Monday, June 25, 2018 5:09 PM
> > > > To: Uwe Schindler <u...@thetaphi.de>; java-u...@lucene.apache.org
> > > > Subject: Re: TermInSetQuery keep terms order in results
> > > >
> > > > Hi Uwe,
> > > >
> > > > thanks for the reply.
> > > > TermInSetQuery covers most of my use case:
> > > > - thousands of term values (even 100,000)
> > > > - no need for scoring, because it's calculated elsewhere
> > > > - intersection with a normal full-text query for further
> > > >   filtering
> > > >
> > > > Using TermQuery, do I risk hitting the
> > > > BooleanQuery.getMaxClauseCount() limit?
> > > >
> > > > Cheers,
> > > >
> > > >
> > > > Nicola
> > > >
> > > >
> > > > On Mon, 2018-06-25 at 16:52 +0200, Uwe Schindler wrote:
> > > > > Hi,
> > > > >
> > > > > the TermInSetQuery is a so-called Constant Score Query. It is
> > > > > meant more as a filter, so you would need some "real" fulltext
> > > > > query in parallel. Think of the term-in-set query as being
> > > > > like the SQL "IN" operator. It can be used to pass lots of
> > > > > identifiers to filter results (e.g. when you apply access
> > > > > rights or group policies for filtering users to your main
> > > > > query as a filter).
> > > > >
> > > > > As it is a "set", which is by default unordered, the order of
> > > > > terms in the set is undefined. Internally TermInSetQuery
> > > > > reorders the terms to improve processing speed.
> > > > >
> > > > > If you need scoring, use TermQuery clauses wrapped in a
> > > > > BooleanQuery. Then you can apply some boosts to some terms to
> > > > > improve the order (e.g. boost term queries coming first) and
> > > > > apply it on a field without norms.
> > > > >
> > > > > TermInSetQuery is fast because it neglects scoring and is just
> > > > > good at intersecting the terms dict with the given terms set.
> > > > >
> > > > > Uwe
> > > > >
> > > > > -----
> > > > > Uwe Schindler
> > > > > Achterdiek 19, D-28357 Bremen
> > > > > http://www.thetaphi.de
> > > > > eMail: u...@thetaphi.de
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Nicola Buso <nb...@ebi.ac.uk>
> > > > > > Sent: Monday, June 25, 2018 1:23 PM
> > > > > > To: java-user@lucene.apache.org
> > > > > > Subject: TermInSetQuery keep terms order in results
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I need to use the TermInSetQuery, but I would like to keep
> > > > > > the sorting of the results based on the term set order
> > > > > > provided. Currently the results seem to come back in
> > > > > > document insertion (index) order.
> > > > > >
> > > > > > Is this already implemented somewhere, or do I need to
> > > > > > implement a CustomScoreQuery to calculate this score?
> > > > > >
> > > > > > Cheers,
> > > > > >
> > > > > >
> > > > > > Nicola
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Nicola Buso <nb...@ebi.ac.uk>
> > > > > > EMBL-EBI
> > > > > >
> > > > > > ---------------------------------------------------------------------
> > > > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > > > For additional commands, e-mail: java-user-h...@lucene.apache.org

--
Nicola Buso <nb...@ebi.ac.uk>
EMBL-EBI
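
For reference, the boost-by-ordinal idea raised at the top of this message can be illustrated without Lucene at all. What follows is only a minimal sketch under stated assumptions, not a Lucene implementation: the class `OrdinalBoostSketch`, the methods `boostsFor` and `rank`, and the sample document ids are hypothetical; a plain map stands in for the index; each document is assumed to carry exactly one of the terms; and each matching clause is assumed to contribute a constant base score, so that the boost alone decides the order. It shows that giving the i-th of n terms the boost n - i reproduces the input order once hits are sorted by descending score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: no Lucene classes are used here.
class OrdinalBoostSketch {

    // Earlier terms get larger boosts: the first of n terms gets n, the last gets 1.
    // A float holds integers exactly up to 2^24, so 100,000 boosts stay distinct.
    static Map<String, Float> boostsFor(List<String> orderedTerms) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        int n = orderedTerms.size();
        for (int i = 0; i < n; i++) {
            // If a term is repeated, its first (highest-ranked) position wins.
            boosts.putIfAbsent(orderedTerms.get(i), (float) (n - i));
        }
        return boosts;
    }

    // Keeps only documents whose term is in the set (the filtering part),
    // then sorts them by descending boost (the scoring part).
    static List<String> rank(Map<String, String> docToTerm, Map<String, Float> boosts) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : docToTerm.entrySet()) {
            if (boosts.containsKey(e.getValue())) {
                hits.add(e.getKey());
            }
        }
        hits.sort(Comparator.comparing((String d) -> boosts.get(docToTerm.get(d))).reversed());
        return hits;
    }

    public static void main(String[] args) {
        // Term order as computed at runtime, outside the index.
        List<String> terms = List.of("P3", "P1", "P2");

        // Stand-in for the index, listed in insertion (index) order.
        Map<String, String> index = new LinkedHashMap<>();
        index.put("doc0", "P1");
        index.put("doc1", "P2");
        index.put("doc2", "P3");
        index.put("doc3", "P9"); // not in the term list, so filtered out

        System.out.println(rank(index, boostsFor(terms))); // prints [doc2, doc0, doc1]
    }
}
```

In Lucene itself the same boosts would presumably be applied by wrapping each TermQuery in a BoostQuery and adding it as a SHOULD clause of a BooleanQuery. Wrapping the TermQuery in a ConstantScoreQuery first would be one way to keep the base score constant. Whether that is fast enough for 100,000 terms, and how it interacts with the BooleanQuery.getMaxClauseCount() limit, would still need to be measured.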