[
https://issues.apache.org/jira/browse/LUCENE-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772133#action_12772133
]
Steven Rowe commented on LUCENE-2019:
-------------------------------------
bq. Steven, the only reason I might disagree is that a Lucene Index is supposed
to be portable across different languages other than Lucene Java.
Right, but not all Lucene indexes in-the-wild are accessed from more than one
language. The vast majority of Lucene index uses, I'd venture to guess, are
single-language, single-process uses.
bq. In my opinion, if you are to store process-internal codepoints as abstract
characters in terms, then you should not claim that Lucene indexes are in any
Unicode format, because then they violate the standard.
I strongly disagree with the assumption that interchange and serialization are
synonymous.
bq. By *not* storing them in terms, you are free to use them as
delimiters, or for other purposes. Right now U+FFFF is used as a delimiter, but who
knows, maybe someday you might need more?
I actually agree with this argument. What if Lucene needs more
process-internal characters? I don't have any way of gauging the probability
that it will in the future (other than the last eight years of history, during
which only one was deemed necessary). But what does Mike M. say? "Design for
now" or something like that?
> map unicode process-internal codepoints to replacement character
> ----------------------------------------------------------------
>
> Key: LUCENE-2019
> URL: https://issues.apache.org/jira/browse/LUCENE-2019
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Robert Muir
> Priority: Minor
> Attachments: LUCENE-2019.patch
>
>
> A spinoff from LUCENE-2016.
> There are several process-internal codepoints in Unicode; we should not store
> these in the index.
> Instead they should be mapped to replacement character (U+FFFD), so they can
> be used process-internally.
> An example of this is how Lucene Java currently uses U+FFFF
> process-internally: it can't be in the index or it will cause problems.
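
For illustration only, the mapping proposed in the description could be sketched roughly as follows. This is not the attached LUCENE-2019.patch. The class and method names (NoncharacterMapper, isNoncharacter, mapNoncharacters) are hypothetical, and the sketch assumes that "process-internal codepoints" means the 66 Unicode noncharacters (U+FDD0..U+FDEF, plus the last two code points of every plane).

```java
// Hypothetical helper, not part of Lucene: replaces Unicode noncharacters
// with U+FFFD so that they never reach the index.
final class NoncharacterMapper {
    private static final int REPLACEMENT = 0xFFFD;

    // Assumption: "process-internal" means the Unicode noncharacters, i.e.
    // U+FDD0..U+FDEF and every code point whose low 16 bits are FFFE or FFFF.
    static boolean isNoncharacter(int cp) {
        return (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE;
    }

    // Walks the string by code point (not by char) so that supplementary
    // noncharacters such as U+1FFFE are detected as well.
    static String mapNoncharacters(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out.appendCodePoint(isNoncharacter(cp) ? REPLACEMENT : cp);
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

With a mapping like this applied before terms are written, U+FFFF (and any other noncharacter) would stay available as an internal delimiter, at the cost of not round-tripping those code points through the index.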