Shai Erera created LUCENE-5542:
----------------------------------

             Summary: Explore making DVConsumer sparse-aware
                 Key: LUCENE-5542
                 URL: https://issues.apache.org/jira/browse/LUCENE-5542
             Project: Lucene - Core
          Issue Type: Improvement
          Components: core/codecs
            Reporter: Shai Erera


Today the DVConsumer API requires the caller to pass a value for every document, 
where {{null}} means "this doc has no value". The Codec can then choose how to 
encode the values, i.e. whether it encodes a 0 for a numeric field or encodes 
only the docs that have a value. In practice, from what I see, we choose to 
encode the 0s.
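
To make the current contract concrete, here is a minimal sketch. It does not use 
the real DVConsumer signature: {{DenseDocValuesSketch}} and {{encodeDense}} are 
hypothetical names, and the only behavior assumed is what is described above (one 
entry per document, {{null}} for a missing value, 0 written in its place).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DenseDocValuesSketch {

    // Hypothetical stand-in for a codec: the caller supplies one entry per
    // document, and a null entry (doc has no value) is encoded as 0.
    static long[] encodeDense(Iterable<Number> valuePerDoc) {
        List<Long> collected = new ArrayList<>();
        for (Number v : valuePerDoc) {
            collected.add(v == null ? 0L : v.longValue());
        }
        long[] encoded = new long[collected.size()];
        for (int i = 0; i < encoded.length; i++) {
            encoded[i] = collected.get(i);
        }
        return encoded;
    }

    public static void main(String[] args) {
        // Five docs, only docs 0 and 3 have a value; the caller still passes
        // an entry for every doc.
        List<Number> values = Arrays.<Number>asList(7L, null, null, 42L, null);
        System.out.println(Arrays.toString(encodeDense(values)));
        // prints [7, 0, 0, 42, 0]
    }
}
```

Note that once the 0s are written, a missing value and a real 0 look the same 
unless the codec tracks the docs with values separately.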

I wonder whether adding, e.g., an {{Iterable<Number>}} to DVConsumer.addXYZField() 
would make a better API. The caller would pass only <doc,value> pairs, and it 
would be up to the Codec to decide how it wants to encode the missing values. For 
example, if a user's app truly has a sparse NDV, IndexWriter wouldn't need to 
"fill the gaps" artificially. That's the job of the Codec.
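
A rough sketch of what the sparse-aware shape could look like. Everything here is 
hypothetical and only illustrates the idea: {{DocValue}}, {{encodeFillingGaps}} and 
{{docsWithValue}} are made-up names, not Lucene classes or methods. The caller 
passes only the <doc,value> pairs, and the codec decides whether to fill the gaps 
or to keep just the docs that have values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SparseDocValuesSketch {

    // Hypothetical <doc,value> pair passed by the caller.
    static final class DocValue {
        final int doc;
        final long value;

        DocValue(int doc, long value) {
            this.doc = doc;
            this.value = value;
        }
    }

    // A codec that keeps today's dense encoding: it fills the gaps with 0
    // itself, instead of relying on the caller to do so.
    static long[] encodeFillingGaps(Iterable<DocValue> pairs, int maxDoc) {
        long[] encoded = new long[maxDoc]; // Java zero-initializes the array
        for (DocValue p : pairs) {
            encoded[p.doc] = p.value;
        }
        return encoded;
    }

    // A codec that goes sparse would only need the docs that have values.
    static int[] docsWithValue(Iterable<DocValue> pairs) {
        List<Integer> docs = new ArrayList<>();
        for (DocValue p : pairs) {
            docs.add(p.doc);
        }
        int[] result = new int[docs.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = docs.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // Five docs in the segment, only docs 0 and 3 have a value.
        List<DocValue> pairs = Arrays.asList(new DocValue(0, 7L), new DocValue(3, 42L));
        System.out.println(Arrays.toString(encodeFillingGaps(pairs, 5)));
        // prints [7, 0, 0, 42, 0]
        System.out.println(Arrays.toString(docsWithValue(pairs)));
        // prints [0, 3]
    }
}
```

Either codec consumes the same input, which is the point: the choice between 
dense and sparse encoding moves out of the caller and into the Codec.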

To be clear, I don't propose to change any Codec implementation in this issue 
(w.r.t. sparse encoding - yes/no), only to change the API to reflect that 
sparseness. I think that if we ever want to encode sparse values, it will be 
a more convenient API.

Thoughts? I volunteer to do this work, but want to get others' opinions before I 
start.



--
This message was sent by Atlassian JIRA
(v6.2#6252)