GitHub user osma commented on the pull request:
https://github.com/apache/jena/pull/52#issuecomment-101017747
Great tests!
I wonder whether there is a better way to convert 3-letter ISO 639
language codes to their 2-letter equivalents. But since there is only a
relatively small number of Lucene analyzers anyway, maybe this is OK.
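
One possible alternative, as a minimal sketch only: build the lookup from the JDK's own `java.util.Locale` data instead of a hand-written table. The class name `Iso639` and the method `toTwoLetter` are hypothetical and are not part of the pull request. Note that `Locale.getISO3Language()` returns the terminology codes (for example `fra`, `deu`), so bibliographic variants such as `fre` or `ger` would still need separate handling.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.MissingResourceException;

// Hypothetical helper, not code from the pull request.
final class Iso639 {
    // Three-letter code -> two-letter code, built from the languages the JDK knows.
    private static final Map<String, String> ISO3_TO_ISO2 = new HashMap<>();

    static {
        for (String iso2 : Locale.getISOLanguages()) {
            try {
                String iso3 = Locale.forLanguageTag(iso2).getISO3Language();
                if (!iso3.isEmpty()) {
                    // Keep the first two-letter code seen for a given three-letter code.
                    ISO3_TO_ISO2.putIfAbsent(iso3, iso2);
                }
            } catch (MissingResourceException e) {
                // The JDK has no three-letter code for this language; skip it.
            }
        }
    }

    /** Returns the two-letter code for a two- or three-letter code, or null if unknown. */
    static String toTwoLetter(String code) {
        if (code == null) {
            return null;
        }
        String lower = code.toLowerCase(Locale.ROOT);
        if (lower.length() == 2) {
            return lower; // already a two-letter code
        }
        return ISO3_TO_ISO2.get(lower);
    }
}
```

With this sketch, `Iso639.toTwoLetter("eng")` yields `en`, and an unrecognized code yields `null`, so the caller can fall back to a default analyzer.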
> About the implementation, your proposal would use a StandardAnalyzer on
> indexing phase and a localized queryAnalyzer for queries?
No, that wouldn't work. You have to use the same analyzer for both indexing
and queries (in this case, the language-specific analyzer), otherwise the
tokens won't match.
But I think it should still be possible to share the same index, as long as
each document has a field that specifies its language and you make sure each
query targets only that specific language.
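
To illustrate both points, here is a toy sketch that deliberately avoids the Lucene API. Everything in it is hypothetical: `SharedIndexSketch`, its crude `analyze` function (a stand-in for a language-specific analyzer), and the `lang:token` key scheme (a stand-in for a language field). It only shows that a single index can hold several languages, and that a query matches only when it is analyzed the same way as the indexed text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of one index shared by several languages; not Lucene code.
final class SharedIndexSketch {
    // Key is "lang:token", value is the set of ids of matching documents.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Stand-in analyzer: lowercases, splits on non-letters, and strips
     *  a trailing plural "s" for "en" only. Other languages get no stemming. */
    static List<String> analyze(String lang, String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (t.isEmpty()) {
                continue;
            }
            if ("en".equals(lang) && t.length() > 3 && t.endsWith("s")) {
                t = t.substring(0, t.length() - 1);
            }
            tokens.add(t);
        }
        return tokens;
    }

    /** Indexes a document with the analyzer that belongs to its language. */
    void add(int docId, String lang, String text) {
        for (String token : analyze(lang, text)) {
            postings.computeIfAbsent(lang + ":" + token, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Returns documents in docLang containing every query token.
     *  analyzerLang selects the analyzer applied to the query text. */
    Set<Integer> search(String docLang, String analyzerLang, String query) {
        Set<Integer> result = null;
        for (String token : analyze(analyzerLang, query)) {
            Set<Integer> hits = postings.getOrDefault(docLang + ":" + token, new TreeSet<Integer>());
            if (result == null) {
                result = new TreeSet<>(hits);
            } else {
                result.retainAll(hits);
            }
        }
        return result == null ? new TreeSet<Integer>() : result;
    }
}
```

In this sketch, after indexing an English document containing "analyzers", `search("en", "en", "Analyzers")` finds it, while `search("en", "und", "Analyzers")` finds nothing, because the unstemmed query token does not match the stemmed index token.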