[GitHub] jena pull request: jena-text multilingual indexing (take 2)

amiara514 Mon, 11 May 2015 13:20:06 -0700

Github user amiara514 commented on the pull request:

    https://github.com/apache/jena/pull/52#issuecomment-101035985
  
    > But since there is only a relatively small number of Lucene analyzers 
anyway, maybe this is OK.
    
    That's why it's done this way :-)
    
    > No, that wouldn't work. You have to use the same analyzer for both 
indexing and queries (in this case, the language-specific analyzer), otherwise 
the tokens won't match. 
    
    Exactly
    
    > But I think it should still be possible to share the same index, if you 
have a field that specifies the language and make sure to target your queries 
only to the specific language.
    
    Storing the language as an extra field is easy to do when the document is 
created (in the addEntity method). Adding an extra parameter to queries is not 
a problem either (it is already done in my solution).
    But how should the existing code be changed so that the Lucene query takes 
that extra language field into account?
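
To make the idea concrete, here is a minimal sketch of a shared index in which every document carries a language field and queries are restricted to one language. It is a toy in plain Java, not Lucene or jena-text code: the class `LanguageAwareIndex`, its `query` method, and the stop-word "analyzers" are all hypothetical stand-ins, and only the name `addEntity` is borrowed from the discussion above. The same language-specific analysis is applied at index time and at query time, so the tokens match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy shared index (hypothetical, not the Lucene API): one document list,
// with the language stored next to each document's tokens.
final class LanguageAwareIndex {

    // Stand-ins for language-specific analyzers: lowercase + stop-word removal.
    private static final Map<String, Set<String>> STOP_WORDS = new HashMap<>();
    static {
        STOP_WORDS.put("en", new HashSet<>(Arrays.asList("the", "a", "of")));
        STOP_WORDS.put("fr", new HashSet<>(Arrays.asList("le", "la", "de")));
    }

    private static final class Doc {
        final String id;
        final String lang;          // the extra "language" field
        final Set<String> tokens;   // tokens produced by that language's analyzer

        Doc(String id, String lang, Set<String> tokens) {
            this.id = id;
            this.lang = lang;
            this.tokens = tokens;
        }
    }

    private final List<Doc> docs = new ArrayList<>();

    // Analyze text with the analyzer registered for the given language.
    private static List<String> analyze(String lang, String text) {
        Set<String> stop = STOP_WORDS.get(lang);
        if (stop == null) {
            throw new IllegalArgumentException("No analyzer for language: " + lang);
        }
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty() && !stop.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Index time: record the language and analyze with the matching analyzer.
    void addEntity(String id, String lang, String text) {
        docs.add(new Doc(id, lang, new HashSet<>(analyze(lang, text))));
    }

    // Query time: analyze with the same analyzer, then keep only documents
    // whose language field matches and that contain every query term.
    List<String> query(String lang, String queryText) {
        List<String> terms = analyze(lang, queryText);
        List<String> hits = new ArrayList<>();
        if (terms.isEmpty()) {
            return hits;
        }
        for (Doc d : docs) {
            if (d.lang.equals(lang) && d.tokens.containsAll(terms)) {
                hits.add(d.id);
            }
        }
        return hits;
    }
}
```

With an English and a French document that both contain the word "chat" in the same index, `query("fr", "le chat")` returns only the French one, because the language field filters out the other. In Lucene itself the equivalent would presumably be a required clause on the language field combined with the text query, but that mapping is an assumption, not something taken from the pull request.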




