[GitHub] jena pull request: Jena-text multilingual alternative implementati...

osma Thu, 14 May 2015 12:37:00 -0700

Github user osma commented on the pull request:

    https://github.com/apache/jena/pull/64#issuecomment-102144875
  
    Hi, I see that you already changed your code - impressive work!
    
    One more suggestion - I hope it won't come too late, since you've already 
moved code from TextIndexLucene to TextIndexLuceneMultilingual like I 
suggested...
    
    I've been thinking about how I could make use of this in my application
(Skosmos). I don't need language-specific analyzers (LowerCaseKeywordAnalyzer
works better for that application, in all languages), but it would be useful to
restrict searches to a specific language. That way I could avoid some false
hits from the text index and get faster queries overall. So I wonder if it
would be possible to separate these two aspects:
    
    1. Store language tags of literals in the Lucene index and be able to 
restrict the query to a specific language with a query parameter
    2. Use different analyzers for different languages
    
    Right now your code does both, and there is no way to get 1 without 2.
Obviously 2 depends on 1, but 1 is useful on its own.
    
    How about adding a new option "langField", similar to "graphField", that
can be configured via the assembler (or passed as a constructor parameter, just
like graphField)? When it is set to the name of a field (the obvious choice
would be "lang"), the language tags would be stored in the index, and queries
could be restricted to a specific language with a query parameter. This part
would belong in the base TextIndexLucene (sorry, I know you just moved the code
away!).
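    To make aspect 1 concrete, here is a minimal sketch in plain Java of what storing the tag and filtering on it could look like. It does not use the Jena or Lucene APIs: LangAwareIndex, add and query are hypothetical names, a naive case-insensitive substring match stands in for a real analyzer, and a null langField plays the role of the option being unset, as with graphField.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a text index, only to illustrate the proposed
// "langField" option. This is not Jena or Lucene API.
class LangAwareIndex {
    // One indexed literal: subject URI, lexical form and (optional) language tag.
    private static final class Entry {
        final String uri;
        final String text;
        final String lang; // null when langField is unset or the literal has no tag

        Entry(String uri, String text, String lang) {
            this.uri = uri;
            this.text = text;
            this.lang = lang;
        }
    }

    private final String langField; // null = option disabled, like an unset graphField
    private final List<Entry> entries = new ArrayList<>();

    LangAwareIndex(String langField) {
        this.langField = langField;
    }

    // Index a literal; the language tag is stored only when langField is configured.
    void add(String uri, String lexicalForm, String langTag) {
        String storedLang = (langField != null && langTag != null && !langTag.isEmpty())
                ? langTag.toLowerCase(Locale.ROOT)
                : null;
        entries.add(new Entry(uri, lexicalForm, storedLang));
    }

    // Return URIs whose text contains the term (naive, case-insensitive substring
    // match standing in for a real analyzer). A non-null lang drops hits whose
    // stored tag differs (exact tag comparison, so "en-GB" does not match "en").
    List<String> query(String term, String lang) {
        if (lang != null && langField == null) {
            throw new IllegalStateException("language filtering requires langField to be set");
        }
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Entry e : entries) {
            boolean textMatch = e.text.toLowerCase(Locale.ROOT).contains(needle);
            boolean langMatch = lang == null || lang.equalsIgnoreCase(e.lang);
            if (textMatch && langMatch) {
                hits.add(e.uri);
            }
        }
        return hits;
    }
}
```

    For example, with an English literal "tie" (necktie) and a Finnish literal "tie" (road) both indexed, query("tie", "fi") returns only the Finnish subject, which is the kind of false hit the language restriction is meant to remove; query("tie", null) still returns both.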
    
    The MultilingualAnalyzer implemented in TextIndexLuceneMultilingual would
then require langField to be set, and would select the analyzer dynamically
based on each literal's language.
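    The layering of aspect 2 on top of aspect 1 can be sketched the same way. This is again a hypothetical illustration, not the code in the pull request: MultilingualAnalyzerSketch and its helpers are invented names, "analyzers" are plain functions from text to tokens rather than Lucene Analyzer objects, and refusing to construct without a langField is one possible way to express the dependency.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of per-language analyzer selection (aspect 2) layered on
// top of stored language tags (aspect 1). "Analyzers" here are plain functions
// from text to tokens, not Lucene Analyzer objects.
class MultilingualAnalyzerSketch {
    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> perLanguage;

    MultilingualAnalyzerSketch(String langField,
                               Function<String, List<String>> defaultAnalyzer,
                               Map<String, Function<String, List<String>>> perLanguage) {
        // Selecting analyzers by language is only possible if tags are stored.
        if (langField == null) {
            throw new IllegalArgumentException("multilingual analysis requires langField");
        }
        this.defaultAnalyzer = defaultAnalyzer;
        this.perLanguage = perLanguage;
    }

    // Choose the analyzer for the literal's language tag; untagged literals and
    // languages without a dedicated analyzer fall back to the default.
    List<String> analyze(String text, String langTag) {
        Function<String, List<String>> analyzer = (langTag == null)
                ? defaultAnalyzer
                : perLanguage.getOrDefault(langTag.toLowerCase(Locale.ROOT), defaultAnalyzer);
        return analyzer.apply(text);
    }

    // Whole string lowercased as a single token, similar in spirit to a
    // lowercase keyword analyzer.
    static Function<String, List<String>> lowerCaseKeyword() {
        return s -> List.of(s.toLowerCase(Locale.ROOT));
    }

    // Lowercase and split on whitespace; a crude stand-in for a language-specific analyzer.
    static Function<String, List<String>> lowerCaseWhitespace() {
        return s -> Arrays.asList(s.trim().toLowerCase(Locale.ROOT).split("\\s+"));
    }
}
```

    With a lowercase-keyword default and a whitespace analyzer registered only for "en", an English literal is split into words while Finnish or untagged literals keep the keyword behavior. An application like Skosmos that wants only aspect 1 would simply never use this class and keep a single analyzer.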


