[GitHub] jena pull request: jena-text multilingual indexing (take 2)

osma Mon, 11 May 2015 12:01:18 -0700

Github user osma commented on the pull request:

    https://github.com/apache/jena/pull/52#issuecomment-101017747
  
    Great tests!
    
    I wonder if there isn't a better way to convert 3-letter ISO 639 
language codes to their 2-letter equivalents. But since there is only a 
relatively small number of Lucene analyzers anyway, maybe this is OK.
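    One possible alternative to a hand-written table, sketched here under assumptions: build the 3-letter to 2-letter map from the JDK's own locale data using `Locale.getISOLanguages()` and `Locale.getISO3Language()`. The class and method names (`IsoLanguageCodes`, `toTwoLetter`) are hypothetical and not part of jena-text. Note that the JDK reports ISO 639-2/T (terminology) codes such as "deu" and "fra", so bibliographic variants like "ger" or "fre" would still need extra entries.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.MissingResourceException;

/**
 * Hypothetical helper: derives a 3-letter -> 2-letter ISO 639 lookup table
 * from the JDK's locale data instead of hard-coding it.
 * Assumption: only the JDK's terminology (ISO 639-2/T) codes are covered;
 * bibliographic codes such as "ger" are NOT mapped.
 */
class IsoLanguageCodes {
    private static final Map<String, String> THREE_TO_TWO = new HashMap<>();

    static {
        // Locale.getISOLanguages() lists the 2-letter ISO 639-1 codes known to the JDK.
        for (String twoLetter : Locale.getISOLanguages()) {
            try {
                String threeLetter = Locale.forLanguageTag(twoLetter).getISO3Language();
                if (!threeLetter.isEmpty()) {
                    THREE_TO_TWO.put(threeLetter, twoLetter);
                }
            } catch (MissingResourceException e) {
                // No 3-letter code is available for this language; skip it.
            }
        }
    }

    /** Returns the 2-letter code for a 3-letter code, or null if it is unknown. */
    static String toTwoLetter(String threeLetter) {
        return THREE_TO_TWO.get(threeLetter.toLowerCase(Locale.ROOT));
    }
}
```

The table is computed once in a static initializer, so lookups are plain hash-map reads. Whether this beats a small hard-coded table for the handful of languages that have Lucene analyzers is a judgment call.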
    
    > About the implementation, your proposal would use a StandardAnalyzer in 
the indexing phase and a localized queryAnalyzer for queries?
    
    No, that wouldn't work. You have to use the same analyzer for both indexing 
and queries (in this case, the language-specific analyzer); otherwise the 
tokens won't match. 
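    To illustrate why the tokens won't match, here is a toy sketch in plain Java. The two "analyzers" are hypothetical stand-ins, not Lucene classes: `standard` only lowercases and splits on non-letters, while `english` additionally strips a trailing "s" as a crude substitute for real language-specific stemming.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

class AnalyzerMismatchDemo {
    // Hypothetical "standard" analyzer: lowercase, split on non-letters, drop empties.
    static String[] standard(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+"))
                .filter(t -> !t.isEmpty())
                .toArray(String[]::new);
    }

    // Hypothetical "English" analyzer: standard tokens plus a crude plural
    // stemmer (strip a trailing "s"), standing in for a real stemmer.
    static String[] english(String text) {
        return Arrays.stream(standard(text))
                .map(t -> t.length() > 3 && t.endsWith("s") ? t.substring(0, t.length() - 1) : t)
                .toArray(String[]::new);
    }

    // Index a document with one analyzer, then check whether every query token
    // produced by the (possibly different) query analyzer is among the indexed terms.
    static boolean matches(String doc, Function<String, String[]> indexAnalyzer,
                           String query, Function<String, String[]> queryAnalyzer) {
        Set<String> indexed = new HashSet<>(Arrays.asList(indexAnalyzer.apply(doc)));
        return indexed.containsAll(Arrays.asList(queryAnalyzer.apply(query)));
    }
}
```

Indexing "Two cats sleeping" with `english` stores the term "cat". A query for "cats" analyzed with `english` also yields "cat" and matches, whereas the same query analyzed with `standard` yields "cats" and finds nothing.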
    
    But I think it should still be possible to share the same index if you 
have a field that specifies the language and make sure to target your queries 
only at that specific language.
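    A rough sketch of the shared-index idea, again in plain Java with hypothetical names (`SharedLangIndex`, `registerAnalyzer`, `add`, `search`) rather than the actual jena-text or Lucene API. Under these assumptions, every document records a language field, indexing uses that language's analyzer, and a search both analyzes the query with the same analyzer and keeps only documents whose language matches.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

/** Hypothetical single index shared by all languages, filtered by a "lang" field. */
class SharedLangIndex {
    // Per-language analyzers (toy functions here; a real setup would use Lucene analyzers).
    private final Map<String, Function<String, String[]>> analyzers = new HashMap<>();
    // token -> ids of documents containing it, across all languages
    private final Map<String, Set<String>> postings = new HashMap<>();
    // docId -> value of the document's language field
    private final Map<String, String> langField = new HashMap<>();

    void registerAnalyzer(String lang, Function<String, String[]> analyzer) {
        analyzers.put(lang, analyzer);
    }

    private Function<String, String[]> analyzerFor(String lang) {
        Function<String, String[]> a = analyzers.get(lang);
        if (a == null) throw new IllegalArgumentException("no analyzer for " + lang);
        return a;
    }

    // Index a document with the analyzer of its own language and record that language.
    void add(String docId, String lang, String text) {
        langField.put(docId, lang);
        for (String token : analyzerFor(lang).apply(text)) {
            postings.computeIfAbsent(token, k -> new HashSet<>()).add(docId);
        }
    }

    // Analyze the query with the SAME analyzer used at index time for this language,
    // intersect the postings of all query tokens, then keep only documents in that language.
    Set<String> search(String lang, String query) {
        Set<String> hits = null;
        for (String token : analyzerFor(lang).apply(query)) {
            Set<String> docs = postings.getOrDefault(token, Collections.emptySet());
            if (hits == null) hits = new TreeSet<>(docs); else hits.retainAll(docs);
        }
        if (hits == null) return new TreeSet<>();
        hits.removeIf(id -> !lang.equals(langField.get(id)));
        return hits;
    }
}
```

For example, if an English document and a German document both contain the token "die", `search("en", "die")` returns only the English one; without the language filter, both would match.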


