[ https://issues.apache.org/jira/browse/LUCENE-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772133#action_12772133 ]
Steven Rowe commented on LUCENE-2019:
-------------------------------------

bq. Steven, the only reason I might disagree is that a Lucene Index is supposed to be portable across different languages other than Lucene Java.

Right, but not all Lucene indexes in-the-wild are accessed from more than one language. The vast majority of Lucene index uses, I'd venture to guess, are single-language, single-process uses.

bq. in my opinion, if you are to store process-internal codepoints as abstract characters in terms, then you should not claim that Lucene indexes are in any Unicode format, because then they violate the standard.

I strongly disagree with the assumption that interchange and serialization are synonymous.

bq. By *not* storing them in terms, then you are free to use them as delimiters, or other purposes. right now U+FFFF is used as a delimiter, but who knows, maybe someday you might need more?

I actually agree with this argument. What if Lucene needs more process-internal characters? I don't have any way of gauging the probability that it will in the future (other than the last eight years of history, during which only one was deemed necessary). But what does Mike M. say? "Design for now" or something like that?

> map unicode process-internal codepoints to replacement character
> ----------------------------------------------------------------
>
>                 Key: LUCENE-2019
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2019
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Robert Muir
>            Priority: Minor
>         Attachments: LUCENE-2019.patch
>
>
> A spinoff from LUCENE-2016.
> There are several process-internal codepoints in unicode, we should not store
> these in the index.
> Instead they should be mapped to replacement character (U+FFFD), so they can
> be used process-internally.
> An example of this is how Lucene Java currently uses U+FFFF
> process-internally, it can't be in the index or will cause problems.
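For readers following the issue description above, here is a minimal sketch of the proposed mapping. It assumes that "process-internal codepoints" means the 66 Unicode noncharacters (U+FDD0..U+FDEF, plus the last two code points of each of the 17 planes). The class and method names are hypothetical; this is not the attached LUCENE-2019.patch and uses no Lucene API.

```java
// Hypothetical helper, for illustration only; not taken from LUCENE-2019.patch.
final class NoncharacterMapper {

    // U+FFFD REPLACEMENT CHARACTER, the mapping target named in the issue.
    private static final int REPLACEMENT = 0xFFFD;

    private NoncharacterMapper() {}

    // True for the 66 Unicode noncharacters: U+FDD0..U+FDEF and the
    // code points U+nFFFE / U+nFFFF at the end of every plane.
    public static boolean isNoncharacter(int codePoint) {
        return (codePoint >= 0xFDD0 && codePoint <= 0xFDEF)
                || (codePoint & 0xFFFE) == 0xFFFE;
    }

    // Returns a copy of the input with every noncharacter replaced by U+FFFD.
    // Iterates by code point so supplementary-plane noncharacters
    // (for example U+1FFFF) are handled as a single unit.
    public static String mapNoncharacters(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            i += Character.charCount(cp);
            out.appendCodePoint(isNoncharacter(cp) ? REPLACEMENT : cp);
        }
        return out.toString();
    }
}
```

Applied to term text before indexing, a mapping like this would leave U+FFFF (and any other noncharacter) free for use as an internal delimiter, which is the trade-off discussed in the comment.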