GitHub user osma commented on the pull request:

    https://github.com/apache/jena/pull/64#issuecomment-102297487
  
    Thanks! I think decoupling these would be a good thing for other purposes 
too. For example, I have some plans to propose storing (optionally) the full 
literal values in the Lucene index, so that they can be used in SPARQL queries, 
and having the language tags in the index would help a bit with that.
    
    A couple more proposals:
    
    1. Your new LuceneUtil class appears to only contain (static) methods 
related to the multilingual analyzer. I propose moving those methods to 
TextIndexLuceneMultilingual and removing the LuceneUtil class, if there is no 
other use for it.
    2. The TextDatasetFactory methods for creating indexes are getting more and 
more convoluted as new parameters and variations are added. (I'm also guilty of 
this: when I added graphField support, I just made new versions of the methods 
with more parameters, but left the old ones in for compatibility.) I think it 
would make sense to introduce a new class, TextIndexConfiguration, which could 
be used to set only the parameters relevant to a given use, something like this:
    
    ```java
    TextIndexConfiguration conf = new TextIndexConfiguration(entDef);
    conf.setAnalyzer(analyzer);
    conf.setGraphField("graph");
    conf.setLanguageField("lang");
    Dataset dsLucene = TextDatasetFactory.createLucene(dsBase, "textdir", conf);
    ```
    
    Enabling a multilingual index could perhaps be a boolean flag (since it is 
not just a standard analyzer that could be passed to setAnalyzer), like this:
    
    ```java
    conf.setMultilingualAnalyzer(true);
    ```
    
    This way, I think we could avoid blowing up the number of createLucene and 
createLuceneIndex methods further. There would be only one more variant of each 
method in TextDatasetFactory, taking a TextIndexConfiguration parameter. When 
new features are added (such as langField and multilingual indexing), they can 
be added to TextIndexConfiguration without any further createLucene-style 
methods, and createLuceneIndexMultilingual could then be dropped.

