GitHub user osma commented on the pull request:
https://github.com/apache/jena/pull/64#issuecomment-102144875
Hi, I see that you already changed your code - impressive work!
One more suggestion - I hope it won't come too late, since you've already
moved code from TextIndexLucene to TextIndexLuceneMultilingual like I
suggested...
I've been thinking about how I could make use of this in my application
(Skosmos). I don't have a need for language-specific analyzers
(LowerCaseKeywordAnalyzer works better for the application, in all languages),
but it could be useful to be able to target searches based on language - this
way I could avoid some false hits from the text index and thus get faster
queries overall. So I wonder if it would be possible to separate these two
aspects:
1. Store language tags of literals in the Lucene index and be able to
restrict the query to a specific language with a query parameter
2. Use different analyzers for different languages
Right now your code does both, but it's not possible to do only 1.
Obviously 2 depends on 1.
How about adding a new option "langField", similar to "graphField", that
can be configured via the assembler (or as a constructor parameter, just like
graphField)? When set to the name of a field (the obvious choice would be
"lang"), the language tags would get stored in the index, and it would be
possible to target queries for a specific language. This would already be base
functionality for TextIndexLucene (sorry - I know you just moved the code
away!).
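To make the langField idea concrete, here is a rough sketch in plain Java with no Lucene or Jena dependency. The class `LangFieldIndexSketch`, its methods and the in-memory document list are all hypothetical stand-ins, not the actual TextIndexLucene API. It only illustrates the intended behaviour: the language tag is stored when langField is configured, and a query can then be restricted to one language.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a text index; NOT the real TextIndexLucene API.
public class LangFieldIndexSketch {
    private final String langField;   // null means: do not store language tags
    private final List<Map<String, String>> docs = new ArrayList<>();

    public LangFieldIndexSketch(String langField) {
        this.langField = langField;
    }

    // Index one literal; the language tag is stored only if langField is set.
    public void add(String uri, String text, String lang) {
        Map<String, String> doc = new HashMap<>();
        doc.put("uri", uri);
        doc.put("text", text.toLowerCase(Locale.ROOT));
        if (langField != null && lang != null && !lang.isEmpty()) {
            doc.put(langField, lang.toLowerCase(Locale.ROOT));
        }
        docs.add(doc);
    }

    // Return URIs whose text contains the term; lang == null means any language.
    public List<String> query(String term, String lang) {
        if (lang != null && langField == null) {
            throw new IllegalStateException("langField is not configured");
        }
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            boolean textMatches = doc.get("text").contains(needle);
            boolean langMatches = lang == null
                    || lang.toLowerCase(Locale.ROOT).equals(doc.get(langField));
            if (textMatches && langMatches) {
                hits.add(doc.get("uri"));
            }
        }
        return hits;
    }
}
```

With langField set to "lang", a search for "gift" restricted to "de" would skip the English literal entirely, which is the kind of false hit I would like to avoid. With langField left unset, the index behaves as before and simply cannot filter by language.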
Then the MultilingualAnalyzer, implemented in TextIndexLuceneMultilingual,
would depend on having langField set and would actually cause the analyzer to
be dynamically selected based on language.
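The dynamic selection could look roughly like the following. Again this is only a sketch under assumptions: `AnalyzerSelectionSketch` and both helper methods are hypothetical, and plain `Function` objects stand in for Lucene analyzers so that the example runs without any dependencies. The point is just the lookup by language tag with a fallback for untagged or unregistered languages.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch; the Function values stand in for Lucene analyzers.
public class AnalyzerSelectionSketch {
    private final Map<String, Function<String, List<String>>> byLang = new HashMap<>();
    private final Function<String, List<String>> fallback;

    public AnalyzerSelectionSketch(Function<String, List<String>> fallback) {
        this.fallback = fallback;
    }

    // Associate a language tag (case-insensitive) with an analyzer stand-in.
    public void register(String lang, Function<String, List<String>> analyzer) {
        byLang.put(lang.toLowerCase(Locale.ROOT), analyzer);
    }

    // Pick the analyzer for the literal's language tag, else the fallback.
    public List<String> analyze(String text, String lang) {
        String key = (lang == null) ? "" : lang.toLowerCase(Locale.ROOT);
        return byLang.getOrDefault(key, fallback).apply(text);
    }

    // Keyword-style stand-in: the whole string lowercased as a single token.
    public static List<String> lowerCaseKeyword(String text) {
        return Collections.singletonList(text.toLowerCase(Locale.ROOT));
    }

    // Word-splitting stand-in: lowercased and split on whitespace.
    public static List<String> lowerCaseWords(String text) {
        return Arrays.stream(text.trim().toLowerCase(Locale.ROOT).split("\\s+"))
                .collect(Collectors.toList());
    }
}
```

An application like mine would register nothing and keep a single keyword-style fallback for every language, while still benefiting from the stored language tags.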