[ https://issues.apache.org/jira/browse/LUCENE-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980171#action_12980171 ]

Michael McCandless commented on LUCENE-2846:
--------------------------------------------

bq. Couldn't we investigate a sparse data structure to be used instead?

This would be very interesting to explore.  The fact that norms are dense, on 
disk and in memory, can cause horrific problems, such as your index suddenly 
taking much more disk space and RAM when an optimize or big merge finishes.
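A rough back-of-the-envelope sketch of why density hurts, assuming the one-byte-per-document-per-field norms layout of this era of Lucene and a hypothetical sparse layout (a sorted int[] of docIDs plus a parallel byte[] of values). NormsFootprint and both methods are illustrative names, not Lucene APIs.

```java
// Hypothetical helper, not a Lucene API: rough RAM estimate for norms.
final class NormsFootprint {
    // Dense (assumed 1 byte/doc/field): cost scales with maxDoc for every
    // field that has norms, even if only one document actually has the field.
    static long denseBytes(long maxDoc, int fieldsWithNorms) {
        return maxDoc * fieldsWithNorms;
    }

    // Sparse (assumed layout): a sorted int[] of docIDs plus a parallel byte[]
    // of norm values, i.e. 4 + 1 bytes per document that really has the field.
    static long sparseBytes(long docsWithField) {
        return docsWithField * (Integer.BYTES + 1L);
    }

    public static void main(String[] args) {
        // One million docs in the merged segment, but only one has norms for "foo".
        System.out.println(denseBytes(1_000_000L, 1)); // 1000000
        System.out.println(sparseBytes(1L));           // 5
    }
}
```

The sparse layout only pays for documents that have the field, but it trades the direct array index for a search on every lookup.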

But I think that's orthogonal to the improvements here?  I.e., this issue 
removes fake norms, invalid uses of the default Sim, etc.

Also, I would worry about the lookup cost of sparse vectors in RAM: looking 
up the norm for each doc is a severe hotspot in Lucene.
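To make the lookup-cost concern concrete, here is a minimal sketch contrasting a dense byte[] lookup with a sparse lookup via binary search. DenseNorms, SparseNorms and the "missing" default value are hypothetical illustrations, not Lucene classes; a real sparse design might use a different structure with different costs.

```java
import java.util.Arrays;

// Hypothetical dense norms: one entry per docID, indexed directly.
final class DenseNorms {
    private final byte[] norms;

    DenseNorms(byte[] norms) { this.norms = norms; }

    // O(1): a single array access per scored document.
    byte get(int doc) { return norms[doc]; }
}

// Hypothetical sparse norms: only documents that have the field are stored.
final class SparseNorms {
    private final int[] docs;    // sorted docIDs that actually have a norm
    private final byte[] values; // values[i] is the norm for docs[i]
    private final byte missing;  // assumed default for docs without the field

    SparseNorms(int[] docs, byte[] values, byte missing) {
        this.docs = docs;
        this.values = values;
        this.missing = missing;
    }

    // O(log n) binary search per scored document; this extra work lands
    // directly in the per-document scoring loop.
    byte get(int doc) {
        int i = Arrays.binarySearch(docs, doc);
        return i >= 0 ? values[i] : missing;
    }
}

final class NormsLookupDemo {
    public static void main(String[] args) {
        DenseNorms dense = new DenseNorms(new byte[] {0, 0, 0, 124});
        SparseNorms sparse = new SparseNorms(new int[] {3}, new byte[] {124}, (byte) 0);
        for (int doc = 0; doc < 4; doc++) {
            System.out.println(doc + ": " + dense.get(doc) + " " + sparse.get(doc));
        }
    }
}
```

Both return the same values; the difference is that the sparse version does a search instead of a single array read for every document scored.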

> omitTF is viral, but omitNorms is anti-viral.
> ---------------------------------------------
>
>                 Key: LUCENE-2846
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2846
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>             Fix For: 4.0
>
>         Attachments: LUCENE-2846.patch, LUCENE-2846.patch, LUCENE-2846.patch, 
> LUCENE-2846.patch
>
>
> omitTF is viral. If you add document 1 with field "foo" as omitTF, and then 
> document 2 has field "foo" without omitTF, they are both treated as omitTF.
> But omitNorms is the opposite. If you have a million documents with field 
> "foo" with omitNorms, and then you add just one document without omitting 
> norms, you suddenly have a million 'real norms'.
> I think it would be good for omitNorms to be viral too, just for consistency, 
> and also to prevent huge byte[]'s.
> But another option is to make omitTF anti-viral, which is more "schemaless", 
> I guess.
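The two behaviors described in the quoted issue amount to combining per-field flags with different boolean operators: "viral" behaves like OR, "anti-viral" like AND. A minimal sketch, assuming a hypothetical FieldFlags class (not Lucene's FieldInfo) that models only the resulting semantics, not Lucene's actual merge code:

```java
// Hypothetical model of per-field flag merging; not Lucene's FieldInfo class.
final class FieldFlags {
    final boolean omitTF;
    final boolean omitNorms;

    FieldFlags(boolean omitTF, boolean omitNorms) {
        this.omitTF = omitTF;
        this.omitNorms = omitNorms;
    }

    // Current behavior as described in the issue.
    FieldFlags merge(FieldFlags other) {
        return new FieldFlags(
            omitTF || other.omitTF,        // viral: one omitTF document wins
            omitNorms && other.omitNorms); // anti-viral: one document with norms wins
    }

    // Proposed alternative: make omitNorms viral too.
    FieldFlags mergeBothViral(FieldFlags other) {
        return new FieldFlags(omitTF || other.omitTF, omitNorms || other.omitNorms);
    }
}

final class FieldFlagsDemo {
    public static void main(String[] args) {
        // Field "foo" seen so far only with omitNorms...
        FieldFlags foo = new FieldFlags(false, true);
        // ...then one document indexes "foo" without omitting norms.
        foo = foo.merge(new FieldFlags(false, false));
        System.out.println("omitNorms after merge: " + foo.omitNorms); // false
    }
}
```

The other option from the issue, making omitTF anti-viral, would correspond to combining omitTF with && instead of ||.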

-- 
This message is automatically generated by JIRA.

