Adrien, thank you for your answer and sorry for the lack of clarity.

No, the score of a document does not depend on the score of another
document; the problem lies within a single document.

There are several "only once score" fields; to simplify, I will assume there
is only one.
A document can contain this "only once score" field several times, with
different values.
A query can contain several clauses on the different values of this field,
and these clauses can be SHOULD or MUST.
For such a document, however, the score of this field should be counted only
on the first pass through my CustomScoreQuery subclass; on subsequent passes,
the custom score is 0.
To do this, the constructor of the subclass takes as an argument a map from
my document id (not the Lucene doc id!) to the field.
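
To make the rule above concrete, here is a minimal plain-Java model of it.
It is only a sketch, not the actual subclass: it uses no Lucene classes, and
the names OnlyOnceScore and customScore(String, float) are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Plain-Java model of the "only once" rule; no Lucene classes are used.
// OnlyOnceScore and its method signature are hypothetical names.
class OnlyOnceScore {
    // Application-level document ids (not Lucene doc ids) already scored.
    private final Set<String> alreadyScored = new HashSet<>();

    // Returns fieldScore on the first call for a document, 0 on later calls.
    float customScore(String documentId, float fieldScore) {
        // Set.add returns true only if the id was not present yet.
        boolean firstPass = alreadyScored.add(documentId);
        return firstPass ? fieldScore : 0f;
    }
}
```

Note that the result depends on how many times the method has already been
called for the same document, not only on the document itself.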

Then, the score of the first pass is multiplied by a date factor that
depends on the age of the document (age = maximum date of the query results
- date of the document), so that the score of a document decreases with its
age.
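
As an illustration of such a date factor, here is a minimal sketch. The
decay formula and the 30-day scale are assumptions chosen for the example
(the actual formula is not given here), and DateDecay is a hypothetical name.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Sketch of an age-based factor; the formula below is an ASSUMPTION,
// not the formula actually used in the application.
class DateDecay {
    // Hypothetical scale: a document this many days old gets factor 0.5.
    static final double SCALE_DAYS = 30.0;

    // age = newest date among the query results - date of the document.
    static double factor(LocalDate newestInResults, LocalDate documentDate) {
        long ageDays = ChronoUnit.DAYS.between(documentDate, newestInResults);
        // Clamp negative ages to 0 so the factor never exceeds 1.
        return 1.0 / (1.0 + Math.max(0, ageDays) / SCALE_DAYS);
    }
}
```

With these assumed constants, the newest document gets a factor of 1.0 and a
document 30 days older gets 0.5, so the score decreases with age.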

The total score (field + date) is calculated correctly, but the explanation
log shows that the sort score (the first element of fields[]) is not the
total score but the total score minus the "only once score"; in other words,
a total score in which the "only once score" = 0. That is why a hit with a
lower total score can be ranked before a hit with a higher total score.

The log of my CustomScoreQuery subclass shows that, even if the document
contains the "only once score" field only once,
Lucene calls the CustomScoreProvider's customScore method twice, so the
second call returns 0, and it seems to me that this is the value retained
for the sort score.
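
One possible way to make the rule insensitive to repeated calls is to
remember which clause scored the document first, instead of remembering only
that the document was scored. The sketch below is plain Java, not Lucene
code; it assumes each query clause can be given a stable id, and the names
IdempotentOnlyOnceScore and clauseId are hypothetical (they are not part of
the CustomScoreProvider API).

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of a call-count-independent "only once" rule.
// ASSUMPTION: every clause has a stable integer id (hypothetical).
class IdempotentOnlyOnceScore {
    // Application-level document id -> id of the clause that scored it first.
    private final Map<String, Integer> owningClause = new HashMap<>();

    // The owning clause always gets fieldScore for the document,
    // every other clause always gets 0, however often this is called.
    float customScore(String documentId, int clauseId, float fieldScore) {
        // putIfAbsent returns null when this clause has just become the owner.
        Integer owner = owningClause.putIfAbsent(documentId, clauseId);
        boolean ownsDocument = (owner == null) || owner == clauseId;
        return ownsDocument ? fieldScore : 0f;
    }
}
```

With this variant, a second call for the same document and clause returns
the same value as the first, so a score computed for sorting would match the
score computed for the hit. The map would have to be reset for each search.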

I have not found out why a TopFieldDocs search (with Sort =
SortField.FIELD_SCORE and date) uses the "diminished" score rather than the
total score, as the TopDocs search does.


Thanks in advance.


Claude Lepère

On 2022/03/14 12:59:45 Adrien Grand wrote:
> It's a bit hard for me to parse what you are trying to do, but it
> looks like you are making assumptions about how Lucene works
> internally that are not correct.
>
> Do I understand correctly that your scoring mechanism has dependencies
> on other documents, ie. the score of a document could depend on the
> score of other documents? This is something that Lucene doesn't
> support.
>
> On Thu, Mar 10, 2022 at 12:23 PM Claude Lepere <cl...@gmail.com> wrote:
> >
> > Hi.
> > The problem is that although sorting by score a match with a lower
> > score is ranked before a match with a greater score.
> > The origin of the problem lies in a subclass of CustomScoreQuery which
> > calculates an "only once" score for each document: on the first pass the
> > document gets its score and, if the document contains several times the
> > same field, on the subsequent passes it gets 0.
> > I wonder if it is possible for Lucene to give a score that depends on a
> > previous pass in the CustomScoreProvider customScore routine for the
> > same document.
> > I ran 2 searches with IndexSearcher: the first one returns a TopDocs
> > which is sorted by default by relevance, and the second search - with
> > the Sort array = [SortField.FIELD_SCORE, a date SortField] argument -
> > returns a TopFieldDocs.
> > The TopDocs results are sorted by the score with the first pass value of
> > the only once method while the TopFieldDocs results are sorted by the
> > score with the value (= 0) of the next pass, hence the ranking errors.
> > I did not find why does the TopFieldDocs search not use to sort the
> > score of the hit, as the TopDocs search?
> > I did not find how to tell the TopFieldDocs search to use the hit score
> > to sort.
> >
> > Claude Lepère
>
>
>
> --
> Adrien