[
https://issues.apache.org/jira/browse/LUCENE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491185#comment-13491185
]
Robert Muir commented on LUCENE-4540:
-------------------------------------
I tried this on that geonames database since my default indexing (just shoving
everything in as a TextField) creates a huge .nrm file today (150MB: 8M docs *
19 fields). Just as a test I tried a simple similarity implementation that uses:
{code}
@Override
public void computeNorm(FieldInvertState state, Norm norm) {
  // store the exact field length as the norm, written as a packed long
  norm.setPackedLong(state.getLength());
}
{code}
{noformat}
-rw-rw-r-- 1 rmuir rmuir 49339454 Nov 5 22:30 _7e_nrm.cfs
{noformat}
If you want to use boosts too, you would have to be careful how you encode
them, but I think this can be useful.
In this case it's 1/3 of the RAM, even though document lengths are exact rather
than lossy. (Though most fields are shortish, some are huge, like the alternate
names fields for major countries and cities, which have basically every
language imaginable shoved into the field; that's why it doesn't save more, I
think.)
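
To make the arithmetic behind those numbers explicit, here is a rough sketch in
plain Java. It is not Lucene code: the class and method names are made up for
illustration, "8M docs" is taken as exactly 8,000,000, and the 40-token field is
a hypothetical example. It only shows how the one-byte-per-field-per-doc
baseline compares to the file size listed above, and how many bits a fixed-width
packed encoding would need for a given maximum length.

```java
public final class NormSizeEstimate {

    /** Bytes used by single-byte norms: one byte per field per document. */
    static long byteNormBytes(long numDocs, int numFields) {
        return numDocs * numFields;
    }

    /** Smallest bit width that can represent every value in [0, maxValue]. */
    static int bitsRequired(long maxValue) {
        if (maxValue < 0) {
            throw new IllegalArgumentException("maxValue must be >= 0");
        }
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    /** Bytes needed for numValues values at a fixed bit width, rounded up. */
    static long packedBytes(long numValues, int bitsPerValue) {
        return (numValues * bitsPerValue + 7) / 8;
    }

    public static void main(String[] args) {
        // Assumption: "8M docs" read as exactly 8,000,000 documents.
        long baseline = byteNormBytes(8_000_000L, 19);
        long observed = 49_339_454L; // size of _7e_nrm.cfs from the listing above
        System.out.printf("baseline=%d observed=%d ratio=%.3f%n",
                baseline, observed, (double) observed / baseline);

        // Hypothetical field whose longest value has 40 tokens.
        int bits = bitsRequired(40);
        System.out.printf("bits=%d packedBytes=%d%n",
                bits, packedBytes(8_000_000L, bits));
    }
}
```

A single field with very long values forces a wide bit width for every document
in that field, which is consistent with the savings stopping at roughly 1/3
here. The real file also carries per-field and compound-file overhead that this
estimate ignores.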
> Allow packed ints norms
> -----------------------
>
> Key: LUCENE-4540
> URL: https://issues.apache.org/jira/browse/LUCENE-4540
> Project: Lucene - Core
> Issue Type: Task
> Components: core/index
> Reporter: Robert Muir
> Attachments: LUCENE-4540.patch
>
>
> I was curious what the performance would be, because it might be a useful
> option to use packed ints for norms if you have lots of fields and still want
> good scoring:
> Today the smallest norm per-field-per-doc you can use is a single byte, and
> if you have _f_ fields with norms enabled and _n_ docs, it uses _f_ * _n_
> bytes of space in RAM.
> Especially if you aren't using index-time boosting (or even if you are, but
> not with ridiculous values), this could be wasting a ton of RAM.
> But then I noticed there was no clean way to allow you to do this in your
> Similarity: it's a trivial patch.