RE: Re: Custom scores and sort

2022-03-25 Thread Claude Lepère

Hi Adrien. Thank you for your reply.

Here is a detailed example to clarify what I am trying to do:
The “only once score” field is named “onlyOnce” and its boost is 5.
There are 2 documents:

  1.  doc1 contains the onlyOnce field twice, with the values “2” and “3”,
plus some other fields
  2.  doc2 contains the onlyOnce field once, with the value “2”, plus some
other fields

The SHOULD query = custom(onlyOnce:2) custom(onlyOnce:3)

The “onlyOnce” field must be counted only once per document; to this end, I
pass my CustomScoreQuery subclass a map “doc ID to field name” as an argument
(the doc ID is my own ID, not Lucene’s doc number):

  1.  doc1:
       *   when the custom score of onlyOnce:2 is calculated, the pair doc1 ID
           to “onlyOnce” is added to the map and the returned subscore = 1
       *   when the custom score of onlyOnce:3 is calculated, the map already
           contains the key doc1 ID with the value “onlyOnce”, so the returned
           subscore = 0
  2.  doc2:
       *   when the custom score of onlyOnce:2 is calculated, the pair doc2 ID
           to “onlyOnce” is added to the map and the returned subscore = 1

Therefore doc1 and doc2 get the same final subscore for the “onlyOnce” field: 
subscore x boost = 1 x 5 = 5.
Both the TopFieldDocs search and the TopDocs search return this same, correct
final score, so the final score itself is fine.
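
For illustration, the bookkeeping in this example can be modelled in plain Java. This is only a sketch, not Lucene code: the class name, the `subscore` method and the string doc IDs are hypothetical stand-ins for the CustomScoreQuery subclass and its map.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java model of the "only once" bookkeeping; not Lucene code. */
public class OnlyOnceExample {
    public static final float BOOST = 5f;

    // Application doc ID -> field name already counted for that document.
    private final Map<String, String> counted = new HashMap<>();

    /** Returns 1 the first time (docId, field) is seen, 0 on later calls. */
    public float subscore(String docId, String field) {
        if (field.equals(counted.get(docId))) {
            return 0f;
        }
        counted.put(docId, field);
        return 1f;
    }

    public static void main(String[] args) {
        OnlyOnceExample scorer = new OnlyOnceExample();
        // doc1 matches both clauses, onlyOnce:2 and onlyOnce:3.
        float doc1 = BOOST * (scorer.subscore("doc1", "onlyOnce")
                            + scorer.subscore("doc1", "onlyOnce"));
        // doc2 matches only the clause onlyOnce:2.
        float doc2 = BOOST * scorer.subscore("doc2", "onlyOnce");
        System.out.println(doc1 + " " + doc2); // prints "5.0 5.0"
    }
}
```

Both documents end up with 1 x 5 = 5, as long as each clause is evaluated exactly once per document.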

However, there are other fields besides “onlyOnce”, and I use a TopFieldDocs
search to sort by score first, then by a date field, then by a third field.
The Lucene explanation shows that the TopFieldDocs search does not sort on the
correct final score: for both doc1 and doc2, it uses a score (fields[0]) in
which the contribution of the “onlyOnce” field is 0 instead of 5. I suspect
the reason is that, in order to sort, the search passes through the
CustomScoreQuery subclass again while the map already contains the doc1 and
doc2 pairs.
As a result, a hit with a lower total final score can be ranked before a hit
with a higher score.
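
The suspected effect can be reproduced outside Lucene. The sketch below is plain Java with hypothetical names. `statefulScore` models the map as described above and returns 0 on any second pass. `repeatableScore` is an assumed variant, not something proposed in this thread: it records which clause value first claimed the document, so that evaluating that same clause again gives the same answer. Whether such a variant removes the discrepancy in a real TopFieldDocs search has not been verified here.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java model of repeated scoring passes; not Lucene code. */
public class RepeatedPassExample {

    /** As described above: any later call for an already recorded doc returns 0. */
    public static float statefulScore(Map<String, String> seen, String docId, String field) {
        if (seen.containsKey(docId)) {
            return 0f;
        }
        seen.put(docId, field);
        return 1f;
    }

    /**
     * Hypothetical variant: remember which clause value first claimed the doc,
     * so that re-evaluating that same clause returns 1 again.
     */
    public static float repeatableScore(Map<String, String> owner, String docId, String clauseValue) {
        String first = owner.putIfAbsent(docId, clauseValue);
        return (first == null || first.equals(clauseValue)) ? 1f : 0f;
    }

    public static void main(String[] args) {
        Map<String, String> seen = new HashMap<>();
        System.out.println(statefulScore(seen, "doc2", "onlyOnce")); // 1.0 (first pass)
        System.out.println(statefulScore(seen, "doc2", "onlyOnce")); // 0.0 (second pass)

        Map<String, String> owner = new HashMap<>();
        System.out.println(repeatableScore(owner, "doc1", "2")); // 1.0
        System.out.println(repeatableScore(owner, "doc1", "3")); // 0.0 (other clause)
        System.out.println(repeatableScore(owner, "doc1", "2")); // 1.0 (same clause again)
    }
}
```

Note that the variant still depends on which clause is evaluated first for a document; it only makes repeated evaluation of the same clause give a stable result.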

The test with a TopDocs search returns the correct final score of 5, and the
default sort by relevance alone is correct.

Why is fields[0], which is used to sort the TopFieldDocs hits, not the final
score?

I agree with you: I must conclude that my CustomScoreQuery subclass breaks some
Lucene assumptions.

As for your last question, I do not know LongDistanceFeatureQuery; it is not
available in Lucene 5, the version I use.

Claude Lepère

Re: Re: Custom scores and sort

2022-03-23 Thread Adrien Grand

Sorry Claude, but I have some trouble following what you are doing
with your CustomScoreQuery. It feels like your query is doing
something that breaks some assumptions that Lucene makes.

Have you looked at existing ways that Lucene supports boosting
documents by recency, such as putting a LongDistanceFeatureQuery as a
SHOULD clause in a BooleanQuery?


-- 
Adrien

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

RE: Re: Custom scores and sort

2022-03-14 Thread Claude Lepere

Adrien, thank you for your answer and sorry for the lack of clarity.

No, the score of a document does not depend on the score of another
document, the problem lies within a document.

There are several "only once score" fields; to simplify, assume there is
only one.
A document can contain this "only once score" field several times, with
different values.
A query can contain several clauses on different values of this field,
and these clauses can be SHOULD or MUST.
For such a document, the score of this field should be counted only on
the first pass through my CustomScoreQuery subclass; on subsequent passes,
the custom score = 0.
To do this, the constructor of the subclass takes as an argument the map "my
document id (not the Lucene doc!) to the field".

 Then, the score of the first pass is multiplied by a date factor which
depends on the age of the document (age = maximum date of the query results
- date of the document):
the score of a document decreases with its age.
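
As a worked illustration of the date factor: the thread does not give the actual formula, so the decay function below, the 10-day scale and the raw scores are all assumptions chosen only to show the effect. The one grounded part is the definition of age as the maximum date of the results minus the date of the document.

```java
/** Plain-Java illustration of an age-based factor; the formula is hypothetical. */
public class DateFactorExample {

    /**
     * Hypothetical decay: 1.0 for the newest document, 0.5 when the age
     * equals scaleDays, and smaller as the document gets older.
     */
    public static double dateFactor(long maxDateDays, long docDateDays, double scaleDays) {
        long age = maxDateDays - docDateDays; // age as defined above
        return 1.0 / (1.0 + age / scaleDays);
    }

    public static void main(String[] args) {
        long newest = 19000L; // date of the newest hit, in days (arbitrary value)
        // Document A matches better (raw score 3.0) but is 30 days older.
        double scoreA = 3.0 * dateFactor(newest, newest - 30, 10.0);
        // Document B matches less well (raw score 2.0) but is the newest hit.
        double scoreB = 2.0 * dateFactor(newest, newest, 10.0);
        System.out.println(scoreA + " " + scoreB); // prints "0.75 2.0"
    }
}
```

With these assumed numbers the better-matching but older document A ends up below document B, which is the intended behaviour of the date factor.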

The total score (field + date) is calculated correctly, but the explanation
log shows that the sort score (the first element of fields[]) is not the
total score but the total score minus the "only once score"; in other words,
a total score where the "only once score" = 0. That is why
a hit with a lower total score can be ranked before a hit with a
higher total score.

The log of my CustomScoreQuery subclass shows that even if the document
contains the "only once score" field only once,
Lucene calls the CustomScoreProvider's customScore method twice, so the
second score = 0, and it seems to me that this value is the one retained for
the sort score.

I could not find out why a TopFieldDocs search (with Sort = SortField.FIELD_SCORE
and date) uses the "diminished" score and not the total score, as TopDocs
does.

Thanks in advance.

Claude Lepère

Re: Custom scores and sort

2022-03-14 Thread Adrien Grand

It's a bit hard for me to parse what you are trying to do, but it
looks like you are making assumptions about how Lucene works
internally that are not correct.

Do I understand correctly that your scoring mechanism has dependencies
on other documents, i.e. the score of a document could depend on the
score of other documents? This is something that Lucene doesn't
support.

On Thu, Mar 10, 2022 at 12:23 PM Claude Lepere  wrote:
>
> Hi.
> The problem is that, although I sort by score, a match with a lower score is
> ranked before a match with a greater score.
> The origin of the problem lies in a subclass of CustomScoreQuery which
> calculates an "only once" score for each document: on the first pass the
> document gets its score and, if the document contains the same field
> several times, on the subsequent passes it gets 0.
> I wonder whether Lucene can handle a score that depends on a
> previous pass through the CustomScoreProvider customScore routine for the same
> document.
> I ran 2 searches with IndexSearcher: the first one returns a TopDocs, which
> is sorted by relevance by default, and the second search (with the Sort
> array = [SortField.FIELD_SCORE, a date SortField] argument) returns a
> TopFieldDocs.
> The TopDocs results are sorted by the score with the first-pass value of
> the only-once method, while the TopFieldDocs results are sorted by the score
> with the value (= 0) of the next pass, hence the ranking errors.
> I could not find out why the TopFieldDocs search does not sort on the score
> of the hit, as the TopDocs search does.
> I could not find a way to tell the TopFieldDocs search to use the hit score to
> sort.
>
> Claude Lepère



-- 
Adrien


RE: Custom scores and sort

2022-02-27 Thread Claude Lepere

Hi!

I see where the problem lies but I can't find a way to solve it.

First feature: one of the fields must be scored only once: if a document
matches this field several times (with different values), the score is
counted only the first time.
A map is passed as an argument to the CustomScoreQuery; it records that
the document has already been scored once, so that all subsequent matches
result in a score of 0.

Second feature: another CustomScoreQuery multiplies each sub-score by a
factor based on the date of the document, so a document A that matches better
than document B but is older may receive a lower final score than document
B.

The calculation of the final total score (only once score field + date)
gives the expected result (the Explanation shows it), but in some
cases, because of the date correction, the ranking is wrong: a document
with a lower final total score is ranked before a document with a higher
score.

In scoreDoc.toString(), the score=... part and the fields=[score, ...] part
do not show the same score value; the one in fields=[] is
smaller, and the difference is equal to the score of the "only once score"
field multiplied by the date factor.
The fields part reflects the Sort requested from the IndexSearcher.
This difference exists for all hits, whether the document has the "only
once score" field once or several times.
Why is there this difference?
When debugging, I see that the IndexSearcher search enters the "only once
score" CustomScoreQuery at least a second time, and that the 0 score is the
one finally retained, because the map already records, for each match, that
the score has been given.

I can't figure out how to solve this problem, and I'm not sure there is a
solution, since a score depends on a previous score. I've tried the
FunctionQuery route without success, but I'm not sure that technique applies
here either.

Am I making a mistake somewhere? The only workaround I can see is to re-sort
all the hits at the end, outside Lucene.
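
A minimal sketch of that re-sorting workaround, in plain Java. The `Hit` class is a hypothetical holder for the values the application would read from each returned hit (its own ID, the final score and the date). The newest-first order on ties is an assumption, since the thread does not state the date direction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Re-sorts hits outside Lucene by final score, then by date. */
public class ResortExample {

    /** Hypothetical holder for the values read from each hit. */
    public static final class Hit {
        public final String id;
        public final float score;
        public final long date;

        public Hit(String id, float score, long date) {
            this.id = id;
            this.score = score;
            this.date = date;
        }
    }

    /** Returns a new list ordered by score descending, then date descending. */
    public static List<Hit> resort(List<Hit> hits) {
        Comparator<Hit> byScoreDesc = Comparator.comparingDouble((Hit h) -> h.score).reversed();
        Comparator<Hit> byDateDesc = Comparator.comparingLong((Hit h) -> h.date).reversed();
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(byScoreDesc.thenComparing(byDateDesc));
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("A", 1.5f, 10L),
                new Hit("B", 2.5f, 20L),
                new Hit("C", 2.5f, 30L));
        for (Hit h : resort(hits)) {
            System.out.println(h.id); // prints C, B, A on separate lines
        }
    }
}
```

This only reorders the hits that were already returned, so the search would need to request enough hits for the re-sorted top of the list to be complete.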

I would be very happy if someone could point me to a better solution.

Thanks in advance. Claude Lepère

On 2022/02/21 09:56:18 Claude Lepere wrote:
> Hi! I have a question about sorting: I don’t understand why, in a test, a hit
> with a lower score is ranked before hits with higher scores.
>
> I am using Lucene 5.2.1.
>
>
>
> Two CustomScoreQuery subqueries on two fields, subquery 1 and subquery 2,
> and two test cases:
>
> case 1: the two calculated custom scores are multiplied by the same factor
> depending on the date of the match at the end of the customScore method of
> CustomScoreProvider
>
> case 2: the two calculated custom scores are *not* multiplied by the date
> factor.
>
>
>
> All tests with the same Sort, by score then by date.
>
>
>
> Case 1: with date factor:
>
>
>
> Test 1: subquery 1 only:
>
> two hits, doc A (date A) gets the score A1, doc B (date B) gets the score
> B1: score A1 > score B1, date A < date B, and doc A is ranked before doc B
>
> Explanation:
>
> doc A score A1 shardIndex=0 fields=[score A1, date A]
>
> doc B score B1 shardIndex=0 fields=[score B1, date B]
>
>
>
> That's correct.
>
>
>
>
>
> Test 2: MUST query subquery 1, subquery 2:
>
> the two same docs match: doc A (date A) gets the score A2, doc B (date B)
> gets the score B2: score A2 *<* score B2, date A < date B, and *doc A is
> ranked before doc B*
>
> Explanation:
>
> doc A score A2 shardIndex=0 fields=[score A1, date A]
>
> doc B score B2 shardIndex=0 fields=[score B1, date B]
>
>
>
> *doc A is ranked before doc B although score A2 < score B2 and sorting
> should use scores A2 and B2, not A1 and B1.*
>
>
>
>
>
>
>
> Case 2: without date factor:
>
>
>
> Test 1: subquery 1 only:
>
> doc A (date A) gets the score A1, doc B (date B) gets the score B1: score
> A1 > score B1, date A < date B, and doc A is ranked before doc B
>
> Explanation:
>
> doc A score A1 shardIndex=0 fields=[score A1, date A]
>
> doc B score B1 shardIndex=0 fields=[score B1, date B]
>
>
>
>
>
> Test 2: MUST query subquery 1, subquery 2:
>
> the two same docs match: doc A (date A) gets the score A2, doc B (date B)
> gets the score B2: score A2 *>* score B2, date A < date B, and doc A is
> ranked before doc B
>
> Explanation:
>
> doc A score A2 shardIndex=0 fields=[score A1, date A]
>
> doc B score B2 shardIndex=0 fields=[score B1, date B]
>
>
>
> Using score A1 here works: without the date factor, all the hits of test 2
> match subquery 2 in the same way and they get the same sub-score: the
> explanation shows in this case that the score = field[0] score + the common
> sub-score of the hits, therefore the sorting is the same by current score
> as by field[0] score.
>
>
>
> But, with the date factor, this is no longer true, the sort [Score, date]
> should use the current scores of test 2 and not those of test 1.
>
>
>
>
>
> Please, could someone enlighten me? Do I make a mistake somewhere?
>
>
>
> Claude Lepère
>

Re: Custom scores and sort

2022-02-21 Thread passignat

Hello Claude,

here is what I'm doing; it seems to work, although I haven't yet written
tests for failure cases. Maybe a more expert member will have more information.

Date field inserted:
final Date parse = DATE_FORMAT.parse(DATE_FORMAT.format(o1));
new LongPoint(attributeName, parse.getTime()));


The sorter:
Sort sort = new Sort(SortField.FIELD_SCORE, new SortField(LAST_UPDATE,
SortField.Type.STRING));

The query:
TopDocs docs = searcher.search(q, maxCount, sort); 



The records are inserted with a 1-second delay (for test purposes only).

Stephane


