[jira] [Updated] (LUCENE-5398) NormValueSource unable to read long field norm

Peng Cheng (JIRA) Tue, 28 Jan 2014 14:52:25 -0800

     [ 
https://issues.apache.org/jira/browse/LUCENE-5398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Peng Cheng updated LUCENE-5398:
-------------------------------

    Description: 
Earlier Lucene implementations stored the field norms of all documents in 
memory, so float values were encoded into a single byte to minimize memory 
consumption.
Recent releases no longer have this constraint (see LUCENE-5078, and the 
discussion at http://lucene.markmail.org/message/jtwit3pwu5oiqr2h). Users are 
encouraged to implement their own encodeNormValue() so that norms can be 
encoded into, and decoded from, any type including int, byte and long, to meet 
their precision requirements.
However, the legacy NormValueSource still casts any long encoding to byte, as 
seen on line 74 of the attached Java file, which makes any TFIDFSimilarity 
that uses a more accurate encoding useless.
The cast should be removed.
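
To illustrate why the byte cast defeats a higher-precision encoding, here is a 
minimal standalone sketch. It does not use Lucene: the class NormCastDemo and 
the methods encodeNorm, decodeNorm and narrowToByte are hypothetical names, and 
the "raw float bits in a long" encoding is only an assumed example of what a 
user-defined encodeNormValue() might do, not Lucene's own encoding.

```java
/**
 * Standalone illustration (no Lucene dependency). The encoding below is a
 * hypothetical stand-in for a user-defined norm encoding; it is not the
 * encoding Lucene ships with.
 */
final class NormCastDemo {

    /** Hypothetical full-precision encoding: keep the float's raw 32 bits in a long. */
    static long encodeNorm(float f) {
        return Float.floatToIntBits(f);
    }

    /** Inverse of encodeNorm: reinterpret the low 32 bits as a float. */
    static float decodeNorm(long norm) {
        return Float.intBitsToFloat((int) norm);
    }

    /** Mimics a reader that narrows the stored long to a byte before decoding. */
    static long narrowToByte(long norm) {
        return (byte) norm; // keeps only the lowest 8 bits (sign-extended)
    }

    public static void main(String[] args) {
        float[] norms = {0.5f, 0.25f, 0.3f};
        for (float f : norms) {
            long encoded = encodeNorm(f);
            long narrowed = narrowToByte(encoded);
            System.out.println(f
                    + ": encoded=" + encoded
                    + ", after byte cast=" + narrowed
                    + ", round trip=" + decodeNorm(encoded)
                    + ", decoded after cast=" + decodeNorm(narrowed));
        }
    }
}
```

With this assumed encoding, 0.5f and 0.25f differ only in their upper bits, so 
both collapse to the same value once the long is narrowed to a byte, and the 
extra precision is lost before the similarity ever sees it.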

  was:
Previous Lucene versions stored norms in memory, hence float values were 
encoded into byte to avoid memory overflow.
Recent releases no longer have this constraint (see LUCENE-5078, and the 
discussion at http://lucene.markmail.org/message/jtwit3pwu5oiqr2h); as a 
result, norm values are generally encoded to/decoded from long.
But the legacy NormValueSource still typecasts any long encoding into byte, as 
seen in line 74 in the java file, making any TFIDFSimilarity using a more 
accurate encoding useless.
It should be removed for the greater good.


> NormValueSource unable to read long field norm
> ----------------------------------------------
>
>                 Key: LUCENE-5398
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5398
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/query/scoring
>    Affects Versions: 4.6
>         Environment: Ubuntu 12.04
>            Reporter: Peng Cheng
>            Priority: Trivial
>             Fix For: 4.7
>
>         Attachments: NormValueSource.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h


