[ https://issues.apache.org/jira/browse/LUCENE-5398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peng Cheng updated LUCENE-5398:
-------------------------------
Description:
Earlier Lucene implementations stored the field norms of all documents in
memory, so float values were encoded into a byte to minimize memory
consumption.
Recent releases no longer have this constraint (see LUCENE-5078 and the
discussion at http://lucene.markmail.org/message/jtwit3pwu5oiqr2h). Users are
encouraged to implement their own encodeNormValue() to encode norms into, and
decode them from, any type including int, byte and long, to meet their
precision requirements.
But the legacy NormValueSource still casts any long encoding to byte, as seen
at line 74 of the attached NormValueSource.java, which makes any
TFIDFSimilarity that uses a more accurate encoding useless.
This cast should be removed for the greater good.
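The effect of the narrowing cast described above can be illustrated with a
small standalone sketch. It does not use Lucene and is not the code from
NormValueSource.java; the class NormCastDemo and its two helper methods are
hypothetical stand-ins for a similarity that stores the full float bit pattern
in a long.

```java
// Illustrative only: these helpers are hypothetical and are not Lucene code.
class NormCastDemo {
    // Hypothetical high-precision encoding: keep all 32 bits of the float in a long.
    static long encodeNormValue(float f) {
        return Float.floatToIntBits(f);
    }

    // Matching decoder: reinterpret the low 32 bits of the long as a float.
    static float decodeNormValue(long norm) {
        return Float.intBitsToFloat((int) norm);
    }

    public static void main(String[] args) {
        float norm = 0.3f;
        long encoded = encodeNormValue(norm);

        // Full long passed through: the original value is recovered exactly.
        System.out.println("long path: " + decodeNormValue(encoded));

        // Narrowed to byte first (the kind of cast the issue objects to): only
        // the low 8 bits survive, so the decoded value no longer matches the input.
        System.out.println("byte path: " + decodeNormValue((byte) encoded));
    }
}
```

With a byte-sized encoding the cast is harmless, which is presumably why it
went unnoticed; it only causes damage once the encoding uses more than 8 bits.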
was:
Previous Lucene versions stored norms in memory, hence float values were
encoded into byte to avoid memory overflow.
Recent releases no longer have this constraint (see LUCENE-5078 and the
discussion at http://lucene.markmail.org/message/jtwit3pwu5oiqr2h); as a
result, norm values are generally encoded to/decoded from long.
But the legacy NormValueSource still casts any long encoding to byte, as seen
at line 74 of the Java file, making any TFIDFSimilarity that uses a more
accurate encoding useless.
It should be removed for the greater good.
> NormValueSource unable to read long field norm
> ----------------------------------------------
>
> Key: LUCENE-5398
> URL: https://issues.apache.org/jira/browse/LUCENE-5398
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/query/scoring
> Affects Versions: 4.6
> Environment: Ubuntu 12.04
> Reporter: Peng Cheng
> Priority: Trivial
> Fix For: 4.7
>
> Attachments: NormValueSource.java
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> Earlier Lucene implementations stored the field norms of all documents in
> memory, so float values were encoded into a byte to minimize memory
> consumption.
> Recent releases no longer have this constraint (see LUCENE-5078 and the
> discussion at http://lucene.markmail.org/message/jtwit3pwu5oiqr2h). Users are
> encouraged to implement their own encodeNormValue() to encode norms into, and
> decode them from, any type including int, byte and long, to meet their
> precision requirements.
> But the legacy NormValueSource still casts any long encoding to byte, as seen
> at line 74 of the attached NormValueSource.java, which makes any
> TFIDFSimilarity that uses a more accurate encoding useless.
> This cast should be removed for the greater good.