[ 
https://issues.apache.org/jira/browse/LUCENE-7475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15551694#comment-15551694
 ] 

Adrien Grand commented on LUCENE-7475:
--------------------------------------

bq. We can improve how we encode it on future issues.

Yes, we will indeed need to improve the format. The current sparse format uses
a bitset to record which docs have norms, so it is still wasteful in the very
sparse case: if fewer than 1/32 of the docs have a value, even storing the full
4-byte doc ids would be more efficient. On the other hand, if the norms are
almost dense, the sparse encoding will incur a performance hit, so we might
want to keep the dense encoding above a certain threshold of documents that
have a value.
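
To make the 1/32 figure concrete, here is a rough back-of-the-envelope sketch
(not code from the patch; the class and method names are made up for
illustration). It assumes the bitset costs one bit per document in the segment
and that a plain doc id list costs 4 bytes per document that has a value, and
it ignores the norm values themselves since both encodings store those.

```java
// Hypothetical helper, not part of Lucene: compares the space needed to
// record WHICH docs have a norm under two encodings.
public final class SparseNormsCost {

    private SparseNormsCost() {}

    // Bitset: one bit per document in the segment, rounded up to whole bytes.
    public static long bitsetBytes(long maxDoc) {
        return (maxDoc + 7) / 8;
    }

    // Plain list: one 4-byte doc id per document that has a value.
    public static long docIdListBytes(long docsWithValue) {
        return 4L * docsWithValue;
    }

    // True when the doc id list is strictly smaller than the bitset,
    // i.e. when fewer than 1/32 of the documents have a value.
    public static boolean docIdListIsSmaller(long maxDoc, long docsWithValue) {
        return docIdListBytes(docsWithValue) < bitsetBytes(maxDoc);
    }

    public static void main(String[] args) {
        long maxDoc = 1_000_000L;
        // Exactly 1/32 of the docs, then one fewer, then a very sparse case.
        long[] counts = {maxDoc / 32, maxDoc / 32 - 1, 10_000L};
        for (long docsWithValue : counts) {
            System.out.println(docsWithValue + " docs with a value: bitset="
                + bitsetBytes(maxDoc) + " bytes, doc id list="
                + docIdListBytes(docsWithValue) + " bytes, list smaller="
                + docIdListIsSmaller(maxDoc, docsWithValue));
        }
    }
}
```

The two costs are equal at exactly maxDoc / 32 documents with a value, which
is where the 1/32 figure comes from. Where to switch back to the dense
encoding is a separate question that this sketch does not try to answer.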

Thanks for having a look. I'll address your comments and push.

> Sparse norms
> ------------
>
>                 Key: LUCENE-7475
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7475
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>             Fix For: master (7.0)
>
>         Attachments: LUCENE-7475.patch, LUCENE-7475.patch
>
>
> Even though norms now have an iterator API, they are still always dense in 
> practice since documents that do not have a value get assigned 0 as a norm 
> value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
