[jira] Commented: (LUCENE-1360) A Similarity class which has unique length norms for numTerms <= 10

Robert Muir (JIRA) Thu, 27 Jan 2011 13:23:07 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987766#action_12987766 ]


Robert Muir commented on LUCENE-1360:
-------------------------------------

In my opinion, the best thing to do would be to open an issue
for better per-field Similarity integration into the Solr schema.

Currently you can pass a SimProvider to the 'global' SimilarityFactory for the
entire schema.
In this Java code you would have to, e.g., build a HashSet containing
"smallfield1", "smallfield2", "smallfield3",
and return SmallFloatSimilarity for those fields.
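
A rough sketch of what that hard-coded approach looks like. It does not use
the real Lucene or Solr classes: SimilaritySketch and HardCodedSimProvider are
hypothetical stand-ins, and only the field names and the SmallFloatSimilarity
name come from the comment above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for Lucene's Similarity; only a name is needed here. */
interface SimilaritySketch {
    String name();
}

/** Hypothetical provider that hard-codes which fields get the short-field similarity. */
final class HardCodedSimProvider {
    static final SimilaritySketch DEFAULT = () -> "DefaultSimilarity";
    static final SimilaritySketch SMALL_FLOAT = () -> "SmallFloatSimilarity";

    // The field list lives in Java code, not in the schema.
    private static final Set<String> SHORT_FIELDS =
        new HashSet<>(Arrays.asList("smallfield1", "smallfield2", "smallfield3"));

    /** Returns the short-field similarity for listed fields, the default otherwise. */
    static SimilaritySketch get(String field) {
        return SHORT_FIELDS.contains(field) ? SMALL_FLOAT : DEFAULT;
    }

    public static void main(String[] args) {
        System.out.println("smallfield2 -> " + get("smallfield2").name());
        System.out.println("body -> " + get("body").name());
    }
}
```

Every new short field means editing and redeploying this class, which is the
part a declarative per-FieldType setting would avoid.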

Instead, it would be better if the FieldType (not sure this is even the best
place for it) could simply have similarity=SmallFloatSimilarity or whatever,
so that the specification is more declarative.

Then Solr could have an example 'short field type' FieldType in the example
schema
(with the caveat that floatToByte52 maxes out at 1984, so don't use it
for large fields or big boosts).
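
To see where the 1984 ceiling comes from, here is a small self-contained
sketch of a 5-mantissa-bit, zero-exponent-2 byte encoding. It is written from
memory of how Lucene's SmallFloat works, so treat the details as an assumption
and check them against the real class. Byte52Sketch is a hypothetical name.

```java
/** Sketch of a "52" small-float encoding (5 mantissa bits, zero exponent 2). */
final class Byte52Sketch {
    private static final int MANTISSA_BITS = 5;
    private static final int ZERO_EXP = 2;
    // Smallest shifted bit pattern that maps to a nonzero byte.
    private static final int FZERO = (63 - ZERO_EXP) << MANTISSA_BITS;

    /** Encodes a float into one byte, clamping at both ends of the range. */
    static byte floatToByte52(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - MANTISSA_BITS);
        if (small <= FZERO) {
            // Underflow: zero and negatives become 0, tiny positives become 1.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= FZERO + 0x100) {
            // Overflow: everything too large collapses to the top byte (0xFF).
            return (byte) -1;
        }
        return (byte) (small - FZERO);
    }

    /** Decodes a byte produced by floatToByte52 back into a float. */
    static float byte52ToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - MANTISSA_BITS);
        bits += (63 - ZERO_EXP) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        // Largest representable value: the top byte decodes to 1984.0.
        System.out.println(byte52ToFloat((byte) -1));
        // Anything bigger is clamped to that same value.
        System.out.println(byte52ToFloat(floatToByte52(5000f)));
    }
}
```

Under these assumptions both lines print 1984.0, so a boost or norm above that
value is silently flattened.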

This way, people could make their metadata fields of this small type, while
their large document fields still use the ordinary text type (e.g. guys like
HathiTrust with some enormous fields). Everything in their application works;
they just get quantization that makes sense for each field.
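
A quick way to check what "quantization that makes sense" means for short
fields: count how many distinct bytes 1/sqrt(numTerms) produces for
numTerms = 1..10 under two encodings. The sketch below assumes the default
norm encoding uses 3 mantissa bits with zero exponent 15 and the short-field
one uses 5 mantissa bits with zero exponent 2; the encoder is re-implemented
here rather than taken from Lucene, and NormQuantizationSketch is a
hypothetical name.

```java
import java.util.HashSet;
import java.util.Set;

/** Counts distinct one-byte norms for short fields under a given encoding. */
final class NormQuantizationSketch {

    /** Generic small-float encoder (assumed to mirror Lucene's SmallFloat logic). */
    static byte floatToByte(float f, int mantissaBits, int zeroExp) {
        int fzero = (63 - zeroExp) << mantissaBits;
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - mantissaBits);
        if (small <= fzero) {
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= fzero + 0x100) {
            return (byte) -1;
        }
        return (byte) (small - fzero);
    }

    /** Number of distinct encoded values of 1/sqrt(n) for n = 1..maxTerms. */
    static int distinctNorms(int maxTerms, int mantissaBits, int zeroExp) {
        Set<Byte> seen = new HashSet<>();
        for (int n = 1; n <= maxTerms; n++) {
            float norm = (float) (1.0 / Math.sqrt(n));
            seen.add(floatToByte(norm, mantissaBits, zeroExp));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("3 mantissa bits, zeroExp 15: " + distinctNorms(10, 3, 15));
        System.out.println("5 mantissa bits, zeroExp 2:  " + distinctNorms(10, 5, 2));
    }
}
```

With these parameters the first encoding yields 6 distinct norms for the ten
field lengths and the second yields 10, which is the trade being made: finer
steps for short fields in exchange for the much lower ceiling.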


> A Similarity class which has unique length norms for numTerms <= 10
> -------------------------------------------------------------------
>
>                 Key: LUCENE-1360
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1360
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Query/Scoring
>            Reporter: Sean Timm
>            Assignee: Otis Gospodnetic
>            Priority: Trivial
>         Attachments: LUCENE-1380 visualization.pdf, 
> ShortFieldNormSimilarity.java
>
>
> A Similarity class which extends DefaultSimilarity and simply overrides 
> lengthNorm.  lengthNorm is implemented as a lookup for numTerms <= 10, else 
> as {{1/sqrt(numTerms)}}. This is to prevent term counts below 11 from having 
> the same lengthNorm after being stored as a single byte in the index.
> This is useful if your search is only on short fields such as titles or 
> product descriptions.
> See mailing list discussion: 
> http://www.nabble.com/How-to-boost-the-score-higher-in-case-user-query-matches-entire-field-value-than-just-some-words-within-a-field-td19079221.html
