GitHub user osma commented on the pull request:
https://github.com/apache/jena/pull/64#issuecomment-102297487
Thanks! I think decoupling these would be a good thing for other purposes
too. For example, I have some plans to propose storing (optionally) the full
literal values in the Lucene index, so that they can be used in SPARQL queries,
and having the language tags in the index would help a bit with that.
Still a couple more proposals:
1. Your new LuceneUtil class appears to only contain (static) methods
related to the multilingual analyzer. I propose moving those methods to
TextIndexLuceneMultilingual and removing the LuceneUtil class, if there is no
other use for it.
2. The TextDatasetFactory methods for creating indexes are getting more and
more convoluted as new parameters and variations are added. (I'm also guilty of
this - when I added graphField support, I just made new versions of the methods
with more parameters, but left in the old ones too for compatibility.) I think
that it would make sense to introduce a new class TextIndexConfiguration which
could be used to set only those parameters that are relevant to the use case,
something like this:
```java
TextIndexConfiguration conf = new TextIndexConfiguration(entDef);
conf.setAnalyzer(analyzer);
conf.setGraphField("graph");
conf.setLanguageField("lang");
Dataset dsLucene = TextDatasetFactory.createLucene(dsBase, "textdir", conf);
```
Enabling multilingual indexing could perhaps be a boolean flag (since it's not
just a standard analyzer), like this:
```java
conf.setMultilingualAnalyzer(true);
```
This way, I think we could avoid blowing up the number of createLucene and
createLuceneIndex methods further. There would only be one more variant of each
method in TextDatasetFactory, taking a TextIndexConfiguration parameter. When
new features are added (such as langField and multilingual indexing), these can
be added to TextIndexConfiguration but there is no need for further
createLucene-style methods. createLuceneIndexMultilingual could be dropped.
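To make the idea a bit more concrete, here is a rough sketch of what such a
configuration class might look like. Everything in it is an assumption for
illustration: EntDefStub and AnalyzerStub are hypothetical stand-ins for
jena-text's EntityDefinition and Lucene's Analyzer (so that the snippet
compiles on its own), and the getter names and null-means-unset convention
are just one possible design, not existing code.

```java
// Hypothetical sketch only; none of this exists in jena-text as written.
// The two stub types stand in for EntityDefinition and Lucene's Analyzer
// so that the example compiles without any dependencies.
class EntDefStub {}
class AnalyzerStub {}

class TextIndexConfiguration {
    private final EntDefStub entDef;       // required, hence a constructor argument
    private AnalyzerStub analyzer;         // null: fall back to the index default
    private String graphField;             // null: graph names are not indexed
    private String languageField;          // null: language tags are not indexed
    private boolean multilingualAnalyzer;  // false unless explicitly enabled

    TextIndexConfiguration(EntDefStub entDef) {
        if (entDef == null) {
            throw new IllegalArgumentException("entDef must not be null");
        }
        this.entDef = entDef;
    }

    EntDefStub getEntDef() { return entDef; }

    AnalyzerStub getAnalyzer() { return analyzer; }
    void setAnalyzer(AnalyzerStub analyzer) { this.analyzer = analyzer; }

    String getGraphField() { return graphField; }
    void setGraphField(String graphField) { this.graphField = graphField; }

    String getLanguageField() { return languageField; }
    void setLanguageField(String languageField) { this.languageField = languageField; }

    boolean isMultilingualAnalyzer() { return multilingualAnalyzer; }
    void setMultilingualAnalyzer(boolean enabled) { this.multilingualAnalyzer = enabled; }
}
```

A createLucene variant taking this object would read the getters in one
place, so adding a new option would mean one new field and setter rather
than yet another overload.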