[ https://issues.apache.org/jira/browse/LUCENE-5989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482902#comment-14482902 ]

Michael McCandless commented on LUCENE-5989:
--------------------------------------------

bq. If we fix this .document api to allow a StringField to have a binary value, maybe it could help with merge code.

This would be very nice ... I struggled some with it, but got stuck with StorableField.stringValue() returning String. I think we need to keep that because that's also the API apps use to retrieve their stored fields. But the default merging operates on StorableDocument/StorableField, so I'm not sure how to separate the two.

Really there are two concepts: the "schema" for this doc (did it store a binary or string value for this field), and what's used to represent a string value (byte[] vs String), and both concepts are being smooshed together into this API.

Maybe we could baby step here, and just change StoredFieldVisitor.stringField to take byte[]? I know this doesn't help all the stupid work we do during default merge to decode/encode but at least it's a start ...

> Add BinaryField, to index a single binary token
> -----------------------------------------------
>
>                 Key: LUCENE-5989
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5989
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 5.0, Trunk
>
>         Attachments: LUCENE-5989.patch
>
>
> 5 years ago (LUCENE-1458) we "enabled" fully binary terms in the
> lowest levels of Lucene (the codec APIs) yet today, actually adding an
> arbitrary byte[] binary term during indexing is far from simple: you
> must make a custom Field with a custom TokenStream and a custom
> TermToBytesRefAttribute, as far as I know.
> This is supremely expert, I wonder if anyone out there has succeeded
> in doing so?
> I think we should make indexing a single byte[] as simple as indexing
> a single String.
> This is a pre-cursor for issues like LUCENE-5596 (encoding IPv6
> address as byte[16]) and LUCENE-5879 (encoding native numeric values
> in their simple binary form).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
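
For illustration only, here is a rough sketch of the "baby step" from the comment above: a visitor whose stringField callback receives the stored UTF-8 bytes rather than a decoded String. Everything in it (StoredFieldSketch, Visitor, CopyingVisitor, DecodingVisitor, visitDocument) is a hypothetical stand-in written for this example and is not Lucene's StoredFieldVisitor or any other Lucene class. It assumes string values are stored as UTF-8. The point it tries to show is that a merge-style consumer could pass the bytes through untouched, while an application-style consumer decodes only because it actually wants a String.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for this example; none of these are Lucene classes.
class StoredFieldSketch {

    /** Visitor that receives a stored string field as raw UTF-8 bytes. */
    interface Visitor {
        void stringField(String fieldName, byte[] utf8Value);
    }

    /** Merge-style consumer: keeps the bytes as-is, without decoding them. */
    static final class CopyingVisitor implements Visitor {
        final List<byte[]> written = new ArrayList<>();

        @Override
        public void stringField(String fieldName, byte[] utf8Value) {
            written.add(utf8Value); // no byte[] -> String -> byte[] round trip
        }
    }

    /** Application-style consumer: decodes because the caller wants a String. */
    static final class DecodingVisitor implements Visitor {
        final Map<String, String> values = new LinkedHashMap<>();

        @Override
        public void stringField(String fieldName, byte[] utf8Value) {
            values.put(fieldName, new String(utf8Value, StandardCharsets.UTF_8));
        }
    }

    /** Pretend reader: hands the stored bytes of one field to the visitor. */
    static void visitDocument(String fieldName, byte[] storedBytes, Visitor visitor) {
        visitor.stringField(fieldName, storedBytes);
    }

    public static void main(String[] args) {
        byte[] stored = "LUCENE-5989".getBytes(StandardCharsets.UTF_8);

        CopyingVisitor merge = new CopyingVisitor();
        visitDocument("key", stored, merge);

        DecodingVisitor app = new DecodingVisitor();
        visitDocument("key", stored, app);

        // The merge path holds the very same array it was given.
        System.out.println(merge.written.get(0) == stored);
        // The application path sees the decoded value.
        System.out.println(app.values.get("key"));
    }
}
```

As the comment notes, this only moves the decode step to the consumer that needs it; it does not by itself remove the decode/encode work done elsewhere during default merge.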