[jira] Commented: (LUCENE-2236) Similarity can only be set per index, but I may want to adjust scoring behaviour at a field level

Robert Muir (JIRA) Mon, 17 Jan 2011 05:44:13 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982635#action_12982635 ]


Robert Muir commented on LUCENE-2236:
-------------------------------------

bq. Is that too bad?

Well, my concern about the deprecated methods is that we get into the hairy 
backwards compat situation...
We already had issues with this with Similarity.

It might be OK to essentially fix Similarity to be the way we want for 4.0 
(break it), since it's an expert API anyway.
This patch was just a quick stab...
I definitely agree with you about the name though; I prefer Similarity.

bq. should Sim be aware of for which field it was created, so that no need to 
pass it as parameter in its methods in case this is ever important?

Well honestly I think what you are saying is really needed for the future... 
but I would prefer to actually delay that until a future patch :)

Making an optimized TermScorer is becoming more and more complicated; see the 
one in the bulkpostings branch, for example. Because of this,
it's extremely tricky to customize the scoring with good performance. I think 
the score caching etc. needs to be moved out of TermScorer;
instead, the responsibility for calculating the score should reside in 
Similarity, including any caching it needs to do (which is really impl 
dependent).
Basically, Similarity needs to be responsible for score(), but let TermScorer 
etc. deal with enumerating postings etc.

For example, we now have the stats totalTermFreq/totalCollectionFreq by field 
for a term, but you can't, e.g., take these and make a 
language-modelling-based scorer, which you should be able to do *right now*, 
except for limitations in our APIs.
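
To make the point concrete, here is a minimal sketch (not from the patch) of
what such a scorer could compute from those two statistics, using Dirichlet
smoothing as one common language-modelling choice. The class name, method
signature, and the mu parameter are hypothetical and not part of any Lucene
API; it assumes totalTermFreq is the term's total occurrence count in the
field and totalCollectionFreq is the total token count of the field.

```java
/** Hypothetical sketch of a language-modelling term score; not a Lucene API. */
public final class DirichletLmSketch {
    /** Smoothing strength; larger values trust the collection model more. */
    private final double mu;

    public DirichletLmSketch(double mu) {
        if (mu <= 0) {
            throw new IllegalArgumentException("mu must be positive");
        }
        this.mu = mu;
    }

    /**
     * Log-probability of one query term under a Dirichlet-smoothed document model:
     * log((tf + mu * p(t|C)) / (docLen + mu)),
     * where p(t|C) = totalTermFreq / totalCollectionFreq.
     */
    public double score(int tf, int docLen, long totalTermFreq, long totalCollectionFreq) {
        // Collection-level probability of the term within this field.
        double collectionProb = (double) totalTermFreq / (double) totalCollectionFreq;
        return Math.log((tf + mu * collectionProb) / (docLen + mu));
    }
}
```

Nothing in this formula depends on how postings are enumerated, which is why
it fits the "Similarity owns score()" split described above.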

So in a future issue I would like to propose a patch to do just this, so that 
TermScorer, for example, is more general. Similarity would need to be able
to 'setup' a query (e.g. things like IDF, building score caches for the query, 
whatever), and then also score an individual document.
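
A rough sketch of that two-phase shape, under stated assumptions: every type
and method name below (Sim, DocScorer, setup, scoreAll) is hypothetical and
does not exist in Lucene, and the sqrt(tf) * idf formula is used only for
illustration. The scoring side builds per-query state once, including its own
cache; the enumerating side only walks postings and delegates.

```java
/** Hypothetical sketch of the setup-then-score split; none of these types exist in Lucene. */
public final class SetupThenScoreSketch {

    /** Per-query state built once by setup(), then asked to score each document. */
    interface DocScorer {
        float score(int tf, float norm);
    }

    /** The scoring side: owns formulas and caching, knows nothing about postings. */
    interface Sim {
        DocScorer setup(String field, int docFreq, int maxDoc);
    }

    /** Illustrative tf-idf implementation with an implementation-specific score cache. */
    static final class TfIdfSim implements Sim {
        private static final int CACHE_SIZE = 32;

        @Override
        public DocScorer setup(String field, int docFreq, int maxDoc) {
            // Query-level work happens once: IDF plus a cache for small term frequencies.
            final float idf = (float) (Math.log(maxDoc / (double) (docFreq + 1)) + 1.0);
            final float[] cache = new float[CACHE_SIZE];
            for (int tf = 0; tf < CACHE_SIZE; tf++) {
                cache[tf] = (float) Math.sqrt(tf) * idf;
            }
            // Document-level work: a cache lookup (or fallback computation) times the norm.
            return (tf, norm) -> {
                float raw = tf < CACHE_SIZE ? cache[tf] : (float) Math.sqrt(tf) * idf;
                return raw * norm;
            };
        }
    }

    /** The enumerating side: walks parallel postings arrays and delegates scoring. */
    static float[] scoreAll(DocScorer scorer, int[] freqs, float[] norms) {
        float[] scores = new float[freqs.length];
        for (int i = 0; i < freqs.length; i++) {
            scores[i] = scorer.score(freqs[i], norms[i]);
        }
        return scores;
    }
}
```

With this shape, swapping in a different scoring model means writing another
Sim implementation; the loop in scoreAll stays unchanged.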

In the flexible scoring prototype this is what we did, but we went even 
further: there, a Similarity is also responsible for 'setting up' a searcher, 
too.
That means it's responsible for managing the norm byte[] (in that patch, you 
only had a byte[] norms if you made it in your Similarity yourself).
I think long term that approach is definitely really interesting, but I think 
we can go ahead and make scoring a lot more flexible in tiny steps 
like this without rewriting all of Lucene in one enormous patch... and this is 
safer, as we can benchmark performance each step of the way.


> Similarity can only be set per index, but I may want to adjust scoring 
> behaviour at a field level
> -------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2236
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2236
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Query/Scoring
>    Affects Versions: 3.0
>            Reporter: Paul taylor
>            Assignee: Robert Muir
>         Attachments: LUCENE-2236.patch
>
>
> Similarity can only be set per index, but I may want to adjust scoring 
> behaviour at a field level. To facilitate this, could we make the field name 
> available to all score methods?
> Currently it is only passed to some, such as lengthNorm(), but not others, such 
> as tf().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


