msokolov commented on PR #15900: URL: https://github.com/apache/lucene/pull/15900#issuecomment-4173151169
As long as it doesn't produce invalid characters, that sounds reasonable, but I could honestly see different purposes for this filter, so I don't think there's a right or wrong answer. If I'm trying to limit the storage I might care about bytes or Java characters (16-bit "words"), or maybe I really want to be concerned about characters in the sense of glyphs in the writing system. But we shouldn't be concerning ourselves with combining forms and that kind of thing.
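To make the units mentioned above concrete, here is a minimal Java sketch that measures one string in UTF-8 bytes, UTF-16 code units (Java `char`s), and Unicode code points, and truncates it without splitting a surrogate pair. The class and method names are hypothetical and are not taken from the PR. Treating "characters" as code points is an assumption made for illustration; combining sequences are deliberately not handled, in line with the comment.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not code from the PR: compares length units for a string.
final class LengthUnits {

  /** Storage-oriented length: number of bytes when encoded as UTF-8. */
  static int utf8Bytes(String s) {
    return s.getBytes(StandardCharsets.UTF_8).length;
  }

  /** Number of UTF-16 code units, i.e. Java chars. */
  static int utf16Units(String s) {
    return s.length();
  }

  /** Number of Unicode code points (a surrogate pair counts once). */
  static int codePoints(String s) {
    return s.codePointCount(0, s.length());
  }

  /**
   * Keeps at most maxUnits UTF-16 code units (maxUnits >= 0). If the cut would
   * fall between a high and a low surrogate, it backs up one unit so the
   * result never ends in a dangling high surrogate.
   */
  static String truncateUtf16(String s, int maxUnits) {
    if (s.length() <= maxUnits) {
      return s;
    }
    int end = maxUnits;
    if (end > 0
        && Character.isHighSurrogate(s.charAt(end - 1))
        && Character.isLowSurrogate(s.charAt(end))) {
      end--;
    }
    return s.substring(0, end);
  }

  /** Keeps at most maxCodePoints code points (maxCodePoints >= 0). */
  static String truncateCodePoints(String s, int maxCodePoints) {
    if (s.codePointCount(0, s.length()) <= maxCodePoints) {
      return s;
    }
    return s.substring(0, s.offsetByCodePoints(0, maxCodePoints));
  }

  public static void main(String[] args) {
    String s = "a\uD83D\uDE00b"; // 'a', U+1F600 (a surrogate pair), 'b'
    System.out.println(utf8Bytes(s));                      // 6
    System.out.println(utf16Units(s));                     // 4
    System.out.println(codePoints(s));                     // 3
    System.out.println(truncateUtf16(s, 2).length());      // 1
    System.out.println(truncateCodePoints(s, 2).length()); // 3
  }
}
```

With a limit of 2, the UTF-16 variant returns only `a`, because keeping two code units would split the emoji, while the code point variant returns `a` plus the whole emoji. This is the kind of difference that makes the choice of unit depend on what the limit is for.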
