Shai Erera created LUCENE-5542:
----------------------------------
Summary: Explore making DVConsumer sparse-aware
Key: LUCENE-5542
URL: https://issues.apache.org/jira/browse/LUCENE-5542
Project: Lucene - Core
Issue Type: Improvement
Components: core/codecs
Reporter: Shai Erera
Today the DVConsumer API requires the caller to pass a value for every document,
where {{null}} means "this doc has no value". The Codec can then choose how to
encode the values, i.e. whether it encodes a 0 for a numeric field, or encodes
only the sparse docs. In practice, from what I see, we choose to encode the 0s.
I wonder whether adding, e.g., an {{Iterable<Number>}} to
DVConsumer.addXYZField() would make for a better API. The caller would pass only
<doc,value> pairs, and it would be up to the Codec to decide how to encode the
missing values. For example, if a user's app truly has a sparse NDV, IndexWriter
wouldn't need to "fill the gaps" artificially. That is the job of the Codec.
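
To make the idea concrete, here is a minimal sketch of what a sparse-aware
consumer could look like. It does not use any Lucene classes: DocValue,
SparseNumericConsumer, ZeroFillingConsumer, and the field name "price" are all
hypothetical names made up for illustration, and the real signature would need
to be worked out in a patch. The caller passes only the docs that have a value,
and this particular consumer chooses to fill the gaps with 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SparseDocValuesSketch {

    /** Hypothetical <doc,value> pair; not a Lucene class. */
    static final class DocValue {
        final int docId;
        final long value;

        DocValue(int docId, long value) {
            this.docId = docId;
            this.value = value;
        }
    }

    /** Hypothetical sparse-aware API: only docs that have a value are passed in. */
    interface SparseNumericConsumer {
        void addNumericField(String field, int maxDoc, Iterable<DocValue> pairs);
    }

    /** One possible codec-side choice: encode a 0 for every doc without a value. */
    static final class ZeroFillingConsumer implements SparseNumericConsumer {
        long[] encoded;

        @Override
        public void addNumericField(String field, int maxDoc, Iterable<DocValue> pairs) {
            // Java zero-initializes the array, so docs without a value stay 0.
            encoded = new long[maxDoc];
            for (DocValue dv : pairs) {
                encoded[dv.docId] = dv.value;
            }
        }
    }

    public static void main(String[] args) {
        // Only docs 1 and 3 have a value; the caller does not fill the gaps.
        List<DocValue> pairs = new ArrayList<>();
        pairs.add(new DocValue(1, 42L));
        pairs.add(new DocValue(3, 7L));

        ZeroFillingConsumer consumer = new ZeroFillingConsumer();
        consumer.addNumericField("price", 5, pairs);
        System.out.println(Arrays.toString(consumer.encoded));
    }
}
```

A different consumer could implement the same interface and store only the
pairs it receives, without the caller changing anything.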
To be clear, I don't propose to change any Codec implementation in this issue
(i.e. whether or not it encodes sparsely), only to change the API to reflect
that sparseness. I think that if we ever want to encode sparse values, it will
be a more convenient API.
Thoughts? I volunteer to do this work, but want to get others' opinions before I
start.