[ https://issues.apache.org/jira/browse/LUCENE-3504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124925#comment-13124925 ]
Michael McCandless commented on LUCENE-3504:
--------------------------------------------

So this would mean doc values can never support the notion of a "missing value" for a document, right? I.e., this is more limited than FieldCache.

So it's the app's job to always index a doc value for every document; otherwise the behavior is hardwired at search time (0 for numerics, new byte[0] for var-length bytes, zero bytes for fixed-length bytes).

I guess if for some reason an app really has a problem with this, it could store its own "single bit docvalues field" (e.g. an int field with only 0 and 1 values) to indicate "missing-ness", and then at sort time sort first by this field and second by the "normal" sort field(s). This would at least let you sort missing values first or last.

OK, I actually like this approach: it's stricter than FieldCache. The app is not allowed to skip documents when making a doc-values field, or if it does, it must accept the hardwired defaults we return for such documents.

> DocValues: deref/sorted bytes types shouldn't return empty byte[] when doc didn't have a value
> ----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3504
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3504
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.0
>
>
> I'm looking at making a FieldComparator that uses DV's SortedSource to
> sort by string field (i.e. just like TermOrdValComparator, except using
> DV instead of FieldCache). We already have comparators for DV int and
> float fields.
>
> But one thing I noticed is we can't distinguish documents that didn't have
> any value indexed from documents that had an empty byte[] indexed.
>
> This is easy to fix (and we used to do this): because these types are
> deref'd (i.e. each doc stores an address, and then separately looks up
> the byte[] at that address), we can reserve ord/address 0 to mean "doc
> didn't have the field". Then we should return null when you retrieve
> the BytesRef value for that field.
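Below is a minimal Java sketch of the two options discussed in this thread. It is a hypothetical illustration, not Lucene's DocValues API. `MissingValueSketch`, `DerefBytes` and `sortMissingLast` are made-up names, and the comparison uses `Arrays.compareUnsigned` (Java 9+).

The first option follows the issue description. A deref'd store reserves address 0 so that `get` returns null for a doc with no value, which keeps it distinct from a doc that indexed an empty byte[].

The second option follows the comment's workaround. Missing docs get the hardwired empty default, and a separate 0/1 field serves as the primary sort key, which pushes missing docs last.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these names are Lucene's DocValues API.
public class MissingValueSketch {

    // Option 1 (issue description): a deref'd bytes store where each doc holds an
    // address into a shared value table and address 0 is reserved for "no value".
    static final class DerefBytes {
        private final List<byte[]> values = new ArrayList<>();
        private final int[] addressByDoc;

        DerefBytes(int maxDoc) {
            values.add(null);                // slot 0 is never a real value
            addressByDoc = new int[maxDoc];  // Java zero-fills, so every doc starts out missing
        }

        // Appends the value to the table (no de-duplication in this sketch).
        void set(int doc, byte[] value) {
            values.add(value.clone());
            addressByDoc[doc] = values.size() - 1;
        }

        // Returns null for a doc that never got a value, so it can be told apart
        // from a doc that explicitly indexed an empty byte[].
        byte[] get(int doc) {
            int address = addressByDoc[doc];
            return address == 0 ? null : values.get(address);
        }
    }

    // Option 2 (the comment's workaround): values hold the hardwired default for
    // missing docs, and a separate 0/1 field (1 = missing) is the primary sort key,
    // so missing docs sort last; the normal value is the secondary key.
    static Integer[] sortMissingLast(int[] missingFlag, byte[][] valueOrDefault) {
        Integer[] docs = new Integer[missingFlag.length];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i;
        }
        Comparator<Integer> byFlag = Comparator.comparingInt(d -> missingFlag[d]);
        // Unsigned byte-wise comparison, similar in spirit to comparing BytesRef values.
        Comparator<Integer> byValue =
            (a, b) -> Arrays.compareUnsigned(valueOrDefault[a], valueOrDefault[b]);
        Arrays.sort(docs, byFlag.thenComparing(byValue));
        return docs;
    }

    public static void main(String[] args) {
        byte[] b = "b".getBytes(StandardCharsets.UTF_8);

        DerefBytes dv = new DerefBytes(3);
        dv.set(0, b);
        dv.set(1, new byte[0]);                 // explicitly indexed empty value
        // doc 2 is never set
        System.out.println(dv.get(1) == null);  // false: present but empty
        System.out.println(dv.get(2) == null);  // true: missing

        int[] missing = {0, 0, 1};
        byte[][] values = {b, new byte[0], new byte[0]}; // doc 2 gets the default
        System.out.println(Arrays.toString(sortMissingLast(missing, values))); // [1, 0, 2]
    }
}
```

The two options trade off differently. Reserving address 0 costs nothing extra per document for deref'd types. The flag-field workaround needs one more field, but it works for any type with hardwired defaults, including numerics.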