[GitHub] jena pull request: jena-text multilingual indexing (take 2)

osma Tue, 28 Apr 2015 10:51:44 -0700

Github user osma commented on the pull request:

    https://github.com/apache/jena/pull/52#issuecomment-97152034
  
    Hi Alexis!
    
    > Hi Osma, it's now ok for merge.
    
    Excellent!
    
    > For the other part, I will propose soon the "triple deletion" which clean 
the related entry in the index.
    
    OK this is good. I'm actually more interested in that part :)
    
    > For the synchronization of jena-tdb and Lucene transactions, the current 
codebase seems to manage that.
    > I'm not 100% sure yet.
    
    Fine.
    
    I took a new look at the code. Some comments/questions:
    
    1. You seem to have added support for all(?) of the language-specific 
analyzers in Lucene. This is good, though even better would be to be able to 
override these in the configuration (similar to the way it is already possible 
to use a custom analyzer with jena-text).
    
    2. You are changing the method signatures for some public methods, e.g. 
TextDatasetFactory.createLuceneIndex, TextDatasetFactory.createLucene, 
TextIndexLucene constructor. I'm not sure that this is a good idea as it might 
break other people's code that call those methods. I suggest that you add 
methods that support the old parameters, which can then call the extended 
variants. This way, you shouldn't need to change the existing tests, as you 
currently do.
    
    3. This seems a bit suspicious:
    
    +            if (indexer instanceof TextIndexLuceneMultiLingual) {
    +                String lang = o.getLiteral().language();
    +                ((TextIndexLuceneMultiLingual)indexer).addEntity(entity, 
lang);
    +            }
    +            else
    +                indexer.addEntity(entity) ;
    
    Would it be possible to avoid the instanceof check and simply detect the 
language from within the entity? (possibly extending Entity instead, if it 
doesn't keep track of the language of the literal) I.e. this code would just be 
"indexer.addEntity(entity)" as it used to be and the magic to detect the 
language would be within TextIndexLuceneMultiLingual.addEntity(entity) where it 
arguably belongs...
    
    4. I see that you have touched the existing unit tests (which I think may 
be a bad idea, see above) but you have not written unit test that specifically 
test the multilingual indexing. Would it be possible for you to add some of 
those? These would also serve as examples for the usage and expected behavior.
    
    5. What about documentation?




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

[GitHub] jena pull request: jena-text multilingual indexing (take 2)

Reply via email to