[ https://issues.apache.org/jira/browse/LUCENE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982888#action_12982888 ]
Robert Muir commented on LUCENE-2236:
-------------------------------------
bq. So let's keep that name (Similarity)
OK, I'll fix the patch to rename FieldSimilarity -> Similarity.
{quote}
So Similarity is not only per field but also per query/scorer...
and Query would have an abstract method getSimilarityProvider(fieldName), which
would be implemented by each concrete query, neatly separating finding matches
from score computation and allowing more extensible scoring. Nice.
Also, perhaps what seems like an inflation of Similarity objects (per
query, per field) is one more good reason to keep the field name params for now.
{quote}
Well, I'm not totally sure how we want to do it, but I definitely think we want
to split the Scorer's score calculations from finding matches, as you say,
and also split the Weight's calculations from its "resource management".
For example, TermWeight today has a PerReaderTermState, which contains all the
information you need to calculate the "setup" portion
without doing any real I/O (e.g. docFreq, totalTermFreq, totalCollectionFreq,
...). So maybe this is the right thing to pass to Similarity's "query setup".
The Weight would then just be responsible for managing termstate and creating a
Scorer...
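A minimal sketch of that split, in plain Java with no Lucene dependency. Every type here (TermStats, SplitSimilarity, SketchTfIdf, SketchTermWeight, SketchScorer) is hypothetical and only stands in for TermWeight, PerReaderTermState and Scorer; the idf formula is modeled on the classic TF-IDF one and is chosen purely for illustration. The similarity's "query setup" sees only statistics, the Weight just holds state and hands out scorers, and the scorer only walks postings and delegates scoring back to the similarity.

```java
// Hypothetical stand-in for the statistics PerReaderTermState would provide.
final class TermStats {
    final int docFreq;
    final long totalTermFreq;

    TermStats(int docFreq, long totalTermFreq) {
        this.docFreq = docFreq;
        this.totalTermFreq = totalTermFreq;
    }
}

// Hypothetical similarity split into a statistics-only setup step and a per-doc step.
interface SplitSimilarity {
    /** Query setup: uses only collection statistics, so no postings I/O happens here. */
    float computeWeight(TermStats stats, int numDocs, float boost);

    /** Per-document scoring, given the weight computed during setup. */
    float score(float queryWeight, int freq);
}

// Illustrative TF-IDF choice: idf = 1 + ln(numDocs / (docFreq + 1)), tf = sqrt(freq).
final class SketchTfIdf implements SplitSimilarity {
    @Override
    public float computeWeight(TermStats stats, int numDocs, float boost) {
        double idf = 1.0 + Math.log(numDocs / (double) (stats.docFreq + 1));
        return (float) (idf * boost);
    }

    @Override
    public float score(float queryWeight, int freq) {
        return (float) Math.sqrt(freq) * queryWeight;
    }
}

// Hypothetical Weight: runs the similarity's setup once and creates scorers.
final class SketchTermWeight {
    private final SplitSimilarity sim;
    private final float queryWeight;

    SketchTermWeight(TermStats stats, int numDocs, float boost, SplitSimilarity sim) {
        this.sim = sim;
        this.queryWeight = sim.computeWeight(stats, numDocs, boost);
    }

    float queryWeight() {
        return queryWeight;
    }

    SketchScorer scorer(int[] docs, int[] freqs) {
        return new SketchScorer(docs, freqs, sim, queryWeight);
    }
}

// Hypothetical Scorer: finds matches (parallel doc/freq arrays) and delegates scoring.
final class SketchScorer {
    private final int[] docs;
    private final int[] freqs;
    private final SplitSimilarity sim;
    private final float queryWeight;
    private int upto = -1;

    SketchScorer(int[] docs, int[] freqs, SplitSimilarity sim, float queryWeight) {
        if (docs.length != freqs.length) {
            throw new IllegalArgumentException("docs and freqs must have the same length");
        }
        this.docs = docs;
        this.freqs = freqs;
        this.sim = sim;
        this.queryWeight = queryWeight;
    }

    /** Advances to the next matching doc id, or returns -1 when exhausted. */
    int nextDoc() {
        upto++;
        return upto < docs.length ? docs[upto] : -1;
    }

    /** Only valid after nextDoc() returned a doc id. */
    float score() {
        return sim.score(queryWeight, freqs[upto]);
    }
}
```

Swapping in a different scoring model would only mean passing another SplitSimilarity; neither SketchTermWeight nor SketchScorer would change.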
I also think the Similarity needs to be fully responsible for Explanations,
but I don't think most users would have to interact with this.
Instead, their "base class" (TFIDFSimilarity or whatever it
is) would typically provide this, based on the methods and API
it exposes (tf(), idf()). This would also allow us to have other
fully fleshed-out base classes like BM25Similarity that you can extend
and tune based on the parameters that make sense for it.
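A rough sketch of how such base classes might own both scoring and explanation. ExplainingSimilarity, TfIdfBase, ClassicTfIdf and Bm25Base are hypothetical names standing in for the TFIDFSimilarity/BM25Similarity idea above, and explanations are plain strings rather than Lucene's Explanation objects to keep it self-contained. The TF-IDF base class derives its explanation from the same tf()/idf() calls as the score, while the BM25 base exposes its own knobs (k1, b) using the standard BM25 term formula.

```java
import java.util.Locale;

// Hypothetical root type: a similarity both scores and explains its scores.
abstract class ExplainingSimilarity {
    abstract float score(int freq, int docLen);

    abstract String explain(int freq, int docLen);
}

// Hypothetical TF-IDF base class: subclasses supply tf() and idf(); the base class
// derives the score and its explanation from them, so the two cannot drift apart.
abstract class TfIdfBase extends ExplainingSimilarity {
    abstract float tf(int freq);

    abstract float idf();

    @Override
    final float score(int freq, int docLen) {
        return tf(freq) * idf(); // docLen is ignored in this simplified sketch
    }

    @Override
    final String explain(int freq, int docLen) {
        return String.format(Locale.ROOT, "%.4f = tf(freq=%d)=%.4f * idf=%.4f",
                score(freq, docLen), freq, tf(freq), idf());
    }
}

// One concrete TF-IDF variant: tf = sqrt(freq), with a fixed idf.
final class ClassicTfIdf extends TfIdfBase {
    private final float idf;

    ClassicTfIdf(float idf) {
        this.idf = idf;
    }

    @Override
    float tf(int freq) {
        return (float) Math.sqrt(freq);
    }

    @Override
    float idf() {
        return idf;
    }
}

// Hypothetical BM25 base class, tuned through its own parameters k1 and b.
class Bm25Base extends ExplainingSimilarity {
    protected final float k1;
    protected final float b;
    protected final float avgDocLen;
    protected final float idf;

    Bm25Base(float k1, float b, float avgDocLen, float idf) {
        this.k1 = k1;
        this.b = b;
        this.avgDocLen = avgDocLen;
        this.idf = idf;
    }

    @Override
    float score(int freq, int docLen) {
        // Length normalization: b = 0 disables it, b = 1 applies it fully.
        float norm = k1 * (1 - b + b * docLen / avgDocLen);
        return idf * freq * (k1 + 1) / (freq + norm);
    }

    @Override
    String explain(int freq, int docLen) {
        return String.format(Locale.ROOT, "%.4f = bm25(freq=%d, docLen=%d, k1=%.2f, b=%.2f, idf=%.4f)",
                score(freq, docLen), freq, docLen, k1, b, idf);
    }
}
```

For example, new ClassicTfIdf(2.0f).explain(4, 100) yields "4.0000 = tf(freq=4)=2.0000 * idf=2.0000", built from the same calls that produced the score.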
Anyway, these are just some thoughts; first I'm going to adjust the patch to
keep our existing name "Similarity".
> Similarity can only be set per index, but I may want to adjust scoring
> behaviour at a field level
> -------------------------------------------------------------------------------------------------
>
> Key: LUCENE-2236
> URL: https://issues.apache.org/jira/browse/LUCENE-2236
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Query/Scoring
> Affects Versions: 3.0
> Reporter: Paul taylor
> Assignee: Robert Muir
> Attachments: LUCENE-2236.patch
>
>
> Similarity can only be set per index, but I may want to adjust scoring
> behaviour at a field level. To facilitate this, could we make the field name
> available to all score methods?
> Currently it is only passed to some, such as lengthNorm(), but not others, such
> as tf().
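To illustrate what the request would enable, here is a small sketch in which every scoring hook, tf() included, receives the field name. FieldAwareSimilarity and KeywordFieldsSimilarity are hypothetical classes, not Lucene API, and the field names are made up for the example. The sketch flattens tf and length normalization for a chosen set of keyword-like fields and keeps the classic sqrt(freq) and 1/sqrt(numTerms) behaviour everywhere else.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical scoring API where every hook, not just lengthNorm(), gets the field name.
abstract class FieldAwareSimilarity {
    abstract float lengthNorm(String field, int numTerms);

    abstract float tf(String field, float freq);
}

// Example policy: in keyword-like fields, repeating a term or having a longer
// value should not change the score; other fields use classic TF-IDF shapes.
final class KeywordFieldsSimilarity extends FieldAwareSimilarity {
    private final Set<String> keywordFields;

    KeywordFieldsSimilarity(Set<String> keywordFields) {
        this.keywordFields = new HashSet<>(keywordFields);
    }

    @Override
    float lengthNorm(String field, int numTerms) {
        if (keywordFields.contains(field)) {
            return 1.0f; // no length penalty for keyword fields
        }
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    @Override
    float tf(String field, float freq) {
        if (keywordFields.contains(field)) {
            return freq > 0 ? 1.0f : 0.0f; // presence only
        }
        return (float) Math.sqrt(freq);
    }
}
```

Without the field argument on tf(), the only way to get this behaviour would be a separate Similarity per index, which is exactly the limitation described above.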
--
This message is automatically generated by JIRA.