[ https://issues.apache.org/jira/browse/LUCENE-3504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124925#comment-13124925 ]
Michael McCandless commented on LUCENE-3504:
--------------------------------------------
So this would mean doc values can never support the notion of a
"missing value" for a document, right?
Ie, this is more limited than FieldCache.
So it's the app's job to always index a doc value for every document,
else the behavior is hardwired at search time (0 for numerics, new
byte[0] for var-length bytes, zero bytes for fixed-length bytes).
I guess if for some reason an app really has a problem with this, it
could go and store its own "single bit docvalues field" (eg int
field with only 0 and 1 values) to indicate "missing-ness", and then
at sort time, sort first by this field and second by the "normal" sort
field(s). This would let you sort missing first or last, at least.
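A minimal sketch of that two-level sort, in plain Java rather than against the Lucene API. The class name MissingFlagSort, the method sortDocs, and the arrays hasValue and value are all hypothetical stand-ins for the app's flag field and the normal doc-values field. Documents without a value are assumed to read back the hardwired default 0.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical stand-in for per-document doc values; not a Lucene API.
final class MissingFlagSort {
    // hasValue[doc] is the app-maintained "single bit" field: 1 if the doc
    // was indexed with a real value, 0 if it was skipped.
    // value[doc] is the normal sort field; skipped docs hold the default 0.
    static Integer[] sortDocs(int[] hasValue, long[] value, boolean missingFirst) {
        Integer[] docs = new Integer[value.length];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i;
        }
        // Primary key: the missing-ness flag. Ascending puts flag 0 (missing) first.
        Comparator<Integer> byFlag = Comparator.comparingInt(d -> hasValue[d]);
        if (!missingFirst) {
            byFlag = byFlag.reversed();
        }
        // Secondary key: the normal sort field.
        Arrays.sort(docs, byFlag.thenComparingLong(d -> value[d]));
        return docs;
    }
}
```

Sorting by value alone would place the skipped documents among the real values (wherever 0 happens to fall), which is why the flag has to be the primary sort key.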
OK, I actually like this approach: it's stricter than FieldCache.
The app is not allowed to skip documents when making a doc-values
field, or if it does, it must accept the hardwired defaults we return
for such documents.
> DocValues: deref/sorted bytes types shouldn't return empty byte[] when doc
> didn't have a value
> ----------------------------------------------------------------------------------------------
>
> Key: LUCENE-3504
> URL: https://issues.apache.org/jira/browse/LUCENE-3504
> Project: Lucene - Java
> Issue Type: Bug
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 4.0
>
>
> I'm looking at making a FieldComparator that uses DV's SortedSource to
> sort by string field (ie just like TermOrdValComparator, except using
> DV instead of FieldCache). We already have comparators for int and
> float DV fields.
> But one thing I noticed is we can't detect documents that didn't have
> any value indexed vs documents that had empty byte[] indexed.
> This is easy to fix (and we used to do this): because these types are
> deref'd (ie, each doc stores an address, and the byte[] is then looked
> up separately at that address), we can reserve ord/address 0 to mean
> "doc didn't have the field". Then we should return null when you
> retrieve the BytesRef value for that field.
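
The reserved-ord idea in the description can be sketched in plain Java. This is an in-memory model under stated assumptions, not Lucene's actual DocValues code: the class DerefBytesSketch and its methods add, get, and hasValue are hypothetical names, and a byte[] stands in for BytesRef.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a deref'd bytes field: each doc stores an ord,
// and the bytes are looked up separately by that ord.
final class DerefBytesSketch {
    private final int[] docToOrd;  // Java zero-fills this, so every doc starts at ord 0
    private final List<byte[]> values = new ArrayList<>();
    private final Map<String, Integer> ordByKey = new HashMap<>();

    DerefBytesSketch(int maxDoc) {
        docToOrd = new int[maxDoc];
        values.add(null);  // ord 0 is reserved for "doc didn't have the field"
    }

    void add(int doc, byte[] bytes) {
        String key = Arrays.toString(bytes);  // simple dedup key, fine for a sketch
        Integer ord = ordByKey.get(key);
        if (ord == null) {
            ord = values.size();  // real values start at ord 1
            values.add(bytes.clone());
            ordByKey.put(key, ord);
        }
        docToOrd[doc] = ord;
    }

    // Returns null if the doc never had a value, else the (possibly empty) bytes.
    byte[] get(int doc) {
        return values.get(docToOrd[doc]);
    }

    boolean hasValue(int doc) {
        return docToOrd[doc] != 0;
    }
}
```

With this layout, a document indexed with an empty byte[] gets a non-zero ord and reads back as a zero-length array, while a document that was never indexed stays at ord 0 and reads back as null, so the two cases are distinguishable.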