[jira] Updated: (LUCENE-1360) A Similarity class which has unique length norms for numTerms <= 10

2008-08-21 Thread Sean Timm (JIRA)

 [ https://issues.apache.org/jira/browse/LUCENE-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Timm updated LUCENE-1360:
--

Attachment: ShortFieldNormSimilarity.java

> A Similarity class which has unique length norms for numTerms <= 10
> ---
>
> Key: LUCENE-1360
> URL: https://issues.apache.org/jira/browse/LUCENE-1360
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Sean Timm
> Priority: Trivial
> Attachments: ShortFieldNormSimilarity.java
>
>
> A Similarity class which extends DefaultSimilarity and simply overrides 
> lengthNorm.  lengthNorm is implemented as a lookup for numTerms <= 10, else 
> as {{1/sqrt(numTerms)}}. This prevents term counts below 11 from sharing 
> the same lengthNorm once it is stored as a single byte in the index.
> This is useful if your search is only on short fields such as titles or 
> product descriptions.
> See mailing list discussion: 
> http://www.nabble.com/How-to-boost-the-score-higher-in-case-user-query-matches-entire-field-value-than-just-some-words-within-a-field-td19079221.html
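
A minimal sketch of the lookup-plus-fallback logic described above, written as a plain static method so it runs without Lucene on the classpath. The class name `ShortFieldNormSketch` is hypothetical, and the table values are an assumption: they are copied from the `ARR` array in the generator posted later in this thread, not from the attached ShortFieldNormSimilarity.java, which is not reproduced here. In a Lucene 2.x `DefaultSimilarity` subclass the same body would sit in an override of `lengthNorm(String fieldName, int numTerms)`.

```java
public class ShortFieldNormSketch {

    // Assumed lookup values for numTerms 0..10 (taken from the ARR array
    // in the generator posted in this thread); index 0 is a placeholder.
    private static final float[] NORM_TABLE = {
        0.0f, 1.5f, 1.25f, 1.0f, 0.875f, 0.75f,
        0.625f, 0.5f, 0.4375f, 0.375f, 0.3125f
    };

    // Lookup for short fields, 1/sqrt(numTerms) for everything longer.
    public static float lengthNorm(int numTerms) {
        if (numTerms >= 0 && numTerms < NORM_TABLE.length) {
            return NORM_TABLE[numTerms];
        }
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // Print the norm for a few short and long term counts.
        for (int n = 1; n <= 16; n++) {
            System.out.println(n + " -> " + lengthNorm(n));
        }
    }
}
```

Each of the ten table entries is distinct, so every term count from 1 to 10 gets its own norm, provided the values also survive the single-byte encoding unchanged.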

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Updated: (LUCENE-1360) A Similarity class which has unique length norms for numTerms <= 10

2009-11-20 Thread Lance Norskog (JIRA)

 [ https://issues.apache.org/jira/browse/LUCENE-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lance Norskog updated LUCENE-1360:
--

Attachment: LUCENE-1380 visualization.pdf

This is a graph of the standard norms against the results of this patch. The 
orange/red dots at the left are the elevated values for boosting short 
documents.

Both displays show the norms after the 8-bit encode/decode process, rather than 
raw 1/x. Here is the code for the generator:

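import org.apache.lucene.util.SmallFloat;

/** Prints "numTerms,standardNorm,patchedNorm" rows for plotting. */
public class FloatEncode {
    // Lookup values applied to counts 1..10; index 0 is an unused placeholder.
    private static final float[] ARR = { 0.0f, 1.5f, 1.25f, 1.0f, 0.875f, 0.75f,
            0.625f, 0.5f, 0.4375f, 0.375f, 0.3125f };

    public static void main(String[] args) {
        for (int i = 1; i < 100; i++) {
            // The raw norm plotted here is 1/i.
            float f = 1.0f / i;
            // Round-trip through the 8-bit norm encoding used in the index.
            byte b = SmallFloat.floatToByte315(f);
            float f2 = SmallFloat.byte315ToFloat(b);
            // Short fields take the lookup value, longer ones the decoded norm.
            float ff = (i < ARR.length) ? ARR[i] : f2;
            System.out.println(i + "," + f2 + "," + ff);
        }
    }
}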
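
To see why the lookup matters without running Lucene, the sketch below re-implements the single-byte round trip locally and applies it to `1/sqrt(numTerms)`, the norm named in the issue description. The two helper methods are written from my understanding of `SmallFloat.floatToByte315` and `SmallFloat.byte315ToFloat`, so treat them as an assumption and check them against the Lucene source. The class name `NormCollisionSketch` and the `storedNorm` helper are hypothetical.

```java
public class NormCollisionSketch {

    // Assumed equivalent of SmallFloat.floatToByte315: keep the top bits
    // of the float and re-base the exponent so the result fits in a byte.
    public static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // zero/negative or underflow
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1; // overflow: clamp to the largest encodable value
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    // Assumed equivalent of SmallFloat.byte315ToFloat.
    public static float byte315ToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Norm as it would be read back from the index for 1/sqrt(numTerms).
    public static float storedNorm(int numTerms) {
        float raw = (float) (1.0 / Math.sqrt(numTerms));
        return byte315ToFloat(floatToByte315(raw));
    }

    public static void main(String[] args) {
        // Print raw and round-tripped norms side by side for short fields.
        for (int n = 1; n <= 12; n++) {
            float raw = (float) (1.0 / Math.sqrt(n));
            System.out.println(n + "," + raw + "," + storedNorm(n));
        }
    }
}
```

Under this re-implementation, fields of 3 and 4 terms decode to the same stored norm, as do 8, 9 and 10 terms. The values in `ARR` pass through the round trip unchanged, so they stay distinct in the index.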


> A Similarity class which has unique length norms for numTerms <= 10
> ---
>
> Key: LUCENE-1360
> URL: https://issues.apache.org/jira/browse/LUCENE-1360
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Sean Timm
> Assignee: Otis Gospodnetic
> Priority: Trivial
> Attachments: LUCENE-1380 visualization.pdf, ShortFieldNormSimilarity.java
>
>
> A Similarity class which extends DefaultSimilarity and simply overrides 
> lengthNorm.  lengthNorm is implemented as a lookup for numTerms <= 10, else 
> as {{1/sqrt(numTerms)}}. This prevents term counts below 11 from sharing 
> the same lengthNorm once it is stored as a single byte in the index.
> This is useful if your search is only on short fields such as titles or 
> product descriptions.
> See mailing list discussion: 
> http://www.nabble.com/How-to-boost-the-score-higher-in-case-user-query-matches-entire-field-value-than-just-some-words-within-a-field-td19079221.html
