[jira] Commented: (LUCENE-2236) Similarity can only be set per index, but I may want to adjust scoring behaviour at a field level

Robert Muir (JIRA) Mon, 17 Jan 2011 13:48:10 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982888#action_12982888 ]


Robert Muir commented on LUCENE-2236:
-------------------------------------

bq. So let's keep that name (Similarity) 

OK, I'll fix the patch to rename FieldSimilarity -> Similarity.

{quote}
So Similarity is not only per field but also per query/scorer...
and Query would have an abstract method getSimilarityProvider(fieldName), which
would be implemented by each concrete query, neatly separating finding matches
from score computation and allowing more extensible scoring. Nice.
Also, what seems like an inflation of Similarity objects (per query per field)
is perhaps one more good reason to keep the field name params for now.
{quote}
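
A minimal Java sketch of the per-field idea in the quote above, assuming hypothetical names throughout: Similarity here is a stripped-down stand-in with only tf() and idf(), and PerFieldSimilarityProvider is an invented helper that resolves a Similarity by field name with a default fallback. Neither is the API in the attached patch. The formulas in SimpleTfIdfSimilarity follow the classic Lucene 3.x DefaultSimilarity (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1))).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, stripped-down Similarity: only the two factors discussed here.
abstract class Similarity {
    // Term-frequency factor for a term occurring freq times in a document.
    abstract float tf(float freq);

    // Inverse-document-frequency factor: rarer terms get a larger value.
    abstract float idf(long docFreq, long numDocs);
}

// Classic TF-IDF factors, using the same formulas as Lucene 3.x DefaultSimilarity.
class SimpleTfIdfSimilarity extends Similarity {
    @Override
    float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    @Override
    float idf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }
}

// Ignores repeated occurrences, e.g. for short fields such as a title.
class BinaryTfSimilarity extends SimpleTfIdfSimilarity {
    @Override
    float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }
}

// Hypothetical provider: picks a Similarity by field name, falling back to a default.
class PerFieldSimilarityProvider {
    private final Map<String, Similarity> byField = new HashMap<>();
    private final Similarity defaultSimilarity;

    PerFieldSimilarityProvider(Similarity defaultSimilarity) {
        this.defaultSimilarity = defaultSimilarity;
    }

    // Registers a field-specific Similarity; returns this for chaining.
    PerFieldSimilarityProvider put(String field, Similarity similarity) {
        byField.put(field, similarity);
        return this;
    }

    Similarity get(String field) {
        return byField.getOrDefault(field, defaultSimilarity);
    }
}
```

With this shape, a "title" field could ignore repeated terms while "body" keeps the default sqrt(freq), which is the kind of field-level control the issue asks for.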

Well, I'm not totally sure how we want to do it, but I definitely think we want
to separate Scorer's score calculation from finding matches, as you say, and
also separate Weight's calculations from its "resource management".

For example, TermWeight today has a PerReaderTermState, which contains all the
information you need to calculate the "setup" portion without doing any real I/O
(e.g. docFreq, totalTermFreq, totalCollectionFreq, ...). So maybe this is the
right thing to pass to Similarity's "query setup".

The Weight would then just be responsible for managing term state and creating
a Scorer.
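
A rough sketch of that split, under assumed names: TermStats stands in for the statistics a PerReaderTermState can supply, SetupSimilarity.computeWeight() is a hypothetical "query setup" hook, and SketchTermWeight / SketchScorer are simplified stand-ins for TermWeight and its Scorer. Postings are plain arrays, and queryNorm and length norms are left out, so this only illustrates where each responsibility would live.

```java
// Hypothetical stand-in for the statistics a PerReaderTermState can supply without further I/O.
final class TermStats {
    final long docFreq;       // number of documents containing the term
    final long totalTermFreq; // total occurrences of the term (unused by this TF-IDF sketch)
    final long numDocs;       // number of documents in the index

    TermStats(long docFreq, long totalTermFreq, long numDocs) {
        this.docFreq = docFreq;
        this.totalTermFreq = totalTermFreq;
        this.numDocs = numDocs;
    }
}

// Per-query state produced once by the similarity's "query setup".
// doc is unused here; a fuller model could use it to look up per-document norms.
interface SimWeight {
    float score(int doc, float freq);
}

// Hypothetical similarity whose entry point is the query setup step.
abstract class SetupSimilarity {
    abstract SimWeight computeWeight(TermStats stats, float boost);
}

class TfIdfSetupSimilarity extends SetupSimilarity {
    @Override
    SimWeight computeWeight(TermStats stats, float boost) {
        // idf is computed once per query term, not once per matching document.
        final float idf = (float) (Math.log(stats.numDocs / (double) (stats.docFreq + 1)) + 1.0);
        final float weight = idf * idf * boost; // queryNorm and length norms omitted
        return (doc, freq) -> (float) Math.sqrt(freq) * weight;
    }
}

// The Weight only holds term state and creates scorers; it does no scoring math itself.
class SketchTermWeight {
    private final SimWeight simWeight;

    SketchTermWeight(TermStats stats, SetupSimilarity similarity, float boost) {
        this.simWeight = similarity.computeWeight(stats, boost);
    }

    SketchScorer scorer(int[] docs, float[] freqs) {
        return new SketchScorer(docs, freqs, simWeight);
    }
}

// The Scorer walks the matches (here a precomputed postings list) and delegates scoring.
class SketchScorer {
    private final int[] docs;
    private final float[] freqs;
    private final SimWeight simWeight;
    private int pos = -1;

    SketchScorer(int[] docs, float[] freqs, SimWeight simWeight) {
        this.docs = docs;
        this.freqs = freqs;
        this.simWeight = simWeight;
    }

    // Advances to the next matching document; returns false when exhausted.
    boolean next() {
        return ++pos < docs.length;
    }

    int docID() {
        return docs[pos];
    }

    float score() {
        return simWeight.score(docs[pos], freqs[pos]);
    }
}
```

Swapping scoring models then only means passing a different SetupSimilarity; the Weight and Scorer code stays the same.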

I also think the Similarity needs to be fully responsible for Explanations,
but I don't think most users would have to interact with this. Instead, their
"base class" (TFIDFSimilarity or whatever it ends up being called) would
typically provide it, based on the methods and API it exposes: tf(), idf().
This would also let us have other fully fleshed-out base classes, like
BM25Similarity, that you can extend and tune using the parameters that make
sense for them.
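
A sketch of how such base classes might look, assuming invented names throughout: Expl is a simplified stand-in for an Explanation, and ScoringModel, TfIdfModel and BM25Model are hypothetical. The TF-IDF base derives its explanation from tf() and idf(), while the BM25 base is tuned through its k1 and b parameters using the Okapi BM25 term-frequency normalization and a common BM25 idf variant; the defaults k1 = 1.2 and b = 0.75 are common choices, not values taken from the patch.

```java
// Simplified, hypothetical stand-in for an Explanation: a value plus a description.
final class Expl {
    final float value;
    final String description;

    Expl(float value, String description) {
        this.value = value;
        this.description = description;
    }

    @Override
    public String toString() {
        return value + " = " + description;
    }
}

// Hypothetical base: every model both computes and explains its own score.
abstract class ScoringModel {
    abstract float score(float freq, long docFreq, long numDocs, float docLen, float avgDocLen);

    abstract Expl explain(float freq, long docFreq, long numDocs, float docLen, float avgDocLen);
}

// TF-IDF base: subclasses only override tf() and idf(); explain() comes for free.
abstract class TfIdfModel extends ScoringModel {
    abstract float tf(float freq);

    abstract float idf(long docFreq, long numDocs);

    @Override
    float score(float freq, long docFreq, long numDocs, float docLen, float avgDocLen) {
        return tf(freq) * idf(docFreq, numDocs); // length normalization omitted
    }

    @Override
    Expl explain(float freq, long docFreq, long numDocs, float docLen, float avgDocLen) {
        float tfValue = tf(freq);
        float idfValue = idf(docFreq, numDocs);
        return new Expl(tfValue * idfValue, "tf(freq=" + freq + ")=" + tfValue
                + " * idf(docFreq=" + docFreq + ", numDocs=" + numDocs + ")=" + idfValue);
    }
}

// BM25 base: tuned through its own parameters (k1, b) rather than tf()/idf() overrides.
class BM25Model extends ScoringModel {
    private final float k1; // term-frequency saturation
    private final float b;  // strength of document-length normalization

    BM25Model(float k1, float b) {
        this.k1 = k1;
        this.b = b;
    }

    BM25Model() {
        this(1.2f, 0.75f);
    }

    private float idf(long docFreq, long numDocs) {
        return (float) Math.log(1 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
    }

    private float tfNorm(float freq, float docLen, float avgDocLen) {
        return freq * (k1 + 1) / (freq + k1 * (1 - b + b * docLen / avgDocLen));
    }

    @Override
    float score(float freq, long docFreq, long numDocs, float docLen, float avgDocLen) {
        return idf(docFreq, numDocs) * tfNorm(freq, docLen, avgDocLen);
    }

    @Override
    Expl explain(float freq, long docFreq, long numDocs, float docLen, float avgDocLen) {
        float idfValue = idf(docFreq, numDocs);
        float norm = tfNorm(freq, docLen, avgDocLen);
        return new Expl(idfValue * norm, "idf(docFreq=" + docFreq + ", numDocs=" + numDocs + ")=" + idfValue
                + " * tfNorm(freq=" + freq + ", k1=" + k1 + ", b=" + b + ", docLen=" + docLen
                + ", avgDocLen=" + avgDocLen + ")=" + norm);
    }
}
```

Keeping explain() next to score() in each base class makes it harder for the two to drift apart, which is one argument for making the Similarity responsible for Explanations.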

Anyway, these are just some thoughts; first I'm going to adjust the patch to
keep our existing name, "Similarity".


> Similarity can only be set per index, but I may want to adjust scoring 
> behaviour at a field level
> -------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2236
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2236
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Query/Scoring
>    Affects Versions: 3.0
>            Reporter: Paul taylor
>            Assignee: Robert Muir
>         Attachments: LUCENE-2236.patch
>
>
> Similarity can only be set per index, but I may want to adjust scoring
> behaviour at a field level. To facilitate this, could we make the field name
> available to all score methods?
> Currently it is only passed to some, such as lengthNorm(), but not to others,
> such as tf().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

