msokolov commented on PR #15900:
URL: https://github.com/apache/lucene/pull/15900#issuecomment-4173151169

   As long as it doesn't produce invalid characters, that sounds reasonable,
but I could honestly see different purposes for this filter, so I don't think
there's a right or wrong answer. If I'm trying to limit storage, I might
care about bytes or Java chars (16-bit UTF-16 code units), or maybe I really
want to count characters in the sense of glyphs in the writing system. But
we shouldn't be concerning ourselves with combining forms and that kind of thing.
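
To make the units mentioned above concrete, here is a minimal sketch in plain Java. It is only an illustration of the distinction, not the filter from the PR: the class `LengthUnits` and its method names are hypothetical, and `truncateChars` is just one possible way to cut at a Java-char limit without leaving half of a surrogate pair (that is, without producing an invalid character).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of Lucene.
final class LengthUnits {

    /** Length in bytes of the UTF-8 encoding (relevant when limiting storage). */
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Length in Java chars, i.e. 16-bit UTF-16 code units. */
    static int utf16Units(String s) {
        return s.length();
    }

    /** Length in Unicode code points (a surrogate pair counts once). */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    /**
     * Truncates to at most maxChars Java chars. If the cut would fall between
     * a high and a low surrogate, it backs up one char so the pair is dropped
     * whole instead of being split.
     */
    static String truncateChars(String s, int maxChars) {
        if (maxChars < 0) {
            throw new IllegalArgumentException("maxChars must be >= 0");
        }
        if (s.length() <= maxChars) {
            return s;
        }
        int end = maxChars;
        if (end > 0
                && Character.isHighSurrogate(s.charAt(end - 1))
                && Character.isLowSurrogate(s.charAt(end))) {
            end--;
        }
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        // 'a', U+1F600 (encoded as a surrogate pair), 'b'
        String s = "a\uD83D\uDE00b";
        System.out.println("bytes=" + utf8Bytes(s)
                + " chars=" + utf16Units(s)
                + " codePoints=" + codePoints(s));
        System.out.println(truncateChars(s, 2));
    }
}
```

For this sample string the three counts differ (6 bytes, 4 Java chars, 3 code points), which is why the choice of unit depends on what the limit is meant to protect. Counting glyphs in the sense of user-perceived characters would need grapheme segmentation, which this sketch deliberately leaves out.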


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

