[
https://issues.apache.org/jira/browse/LUCENE-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942530#comment-13942530
]
Robert Muir commented on LUCENE-5542:
-------------------------------------
The codec can already decide how to encode the values. Making the API more
complicated doesn't seem to buy us anything. I'm open to a benchmark showing
a benefit, but I don't see one.
> Explore making DVConsumer sparse-aware
> --------------------------------------
>
> Key: LUCENE-5542
> URL: https://issues.apache.org/jira/browse/LUCENE-5542
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/codecs
> Reporter: Shai Erera
>
> Today DVConsumer API requires the caller to pass a value for every document,
> where {{null}} means "this doc has no value". The Codec can then choose how
> to encode the values, i.e. whether it encodes a 0 for a numeric field, or
> encodes the sparse docs. In practice, from what I see, we choose to encode
> the 0s.
> I wonder whether adding, e.g., an {{Iterable<Number>}} to
> DVConsumer.addXYZField() would make a better API. The caller passes only
> <doc,value> pairs, and it's up to the Codec to decide how it wants to
> encode the missing values. For example, if a user's app truly has a sparse NDV,
> IndexWriter doesn't need to "fill the gaps" artificially. That's the job of the
> Codec.
> To be clear, I don't propose to change any Codec implementation in this issue
> (i.e. whether it encodes sparsely or not), only to change the API to reflect
> that sparseness. I think that if we ever want to encode sparse values, it will
> be a more convenient API.
> Thoughts? I volunteer to do this work, but want to get others' opinions before
> I start.
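
For readers following the thread, the two API shapes described in the issue can be sketched roughly as below. This is only an illustration under stated assumptions: the class SparseDocValuesSketch, the DocValue pair type, and the methods toSparse and fillGaps are hypothetical names invented here, not Lucene classes or the actual DVConsumer signature. It shows a dense per-document list where null means "no value", the equivalent <doc,value> pairs, and a consumer that chooses to fill the gaps with 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SparseDocValuesSketch {

    /** Hypothetical <doc,value> pair; not a Lucene class. */
    static final class DocValue {
        final int doc;
        final long value;

        DocValue(int doc, long value) {
            this.doc = doc;
            this.value = value;
        }
    }

    /**
     * Converts the dense shape (one entry per document, null = "this doc
     * has no value") into the sparse shape (only docs that have a value).
     */
    static List<DocValue> toSparse(List<Long> dense) {
        List<DocValue> pairs = new ArrayList<>();
        for (int doc = 0; doc < dense.size(); doc++) {
            Long v = dense.get(doc);
            if (v != null) {
                pairs.add(new DocValue(doc, v));
            }
        }
        return pairs;
    }

    /**
     * One possible consumer-side choice: encode a 0 for every missing doc.
     * A different consumer could instead store only the pairs.
     */
    static long[] fillGaps(List<DocValue> pairs, int maxDoc) {
        long[] out = new long[maxDoc]; // Java zero-initializes the array
        for (DocValue p : pairs) {
            out[p.doc] = p.value;
        }
        return out;
    }

    public static void main(String[] args) {
        // Five documents; only docs 1 and 4 carry a value.
        List<Long> dense = Arrays.asList(null, 7L, null, null, 42L);
        List<DocValue> sparse = toSparse(dense);
        System.out.println(sparse.size());
        System.out.println(Arrays.toString(fillGaps(sparse, dense.size())));
    }
}
```

In this sketch the gap-filling decision lives entirely in fillGaps, which is the point made in the issue: with <doc,value> pairs the caller never has to materialize the missing entries. The comment above argues that the same decision is already available to the codec with the dense shape.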
--
This message was sent by Atlassian JIRA
(v6.2#6252)