Github user osma commented on the pull request:
https://github.com/apache/jena/pull/52#issuecomment-97152034
Hi Alexis!
> Hi Osma, it's now ok for merge.
Excellent!
> For the other part, I will propose soon the "triple deletion" which clean
the related entry in the index.
OK this is good. I'm actually more interested in that part :)
> For the synchronization of jena-tdb and Lucene transactions, the current
codebase seems to manage that.
> I'm not 100% sure yet.
Fine.
I took a new look at the code. Some comments/questions:
1. You seem to have added support for all(?) of the language-specific
analyzers in Lucene. This is good, though even better would be to be able to
override these in the configuration (similar to the way it is already possible
to use a custom analyzer with jena-text).
2. You are changing the method signatures for some public methods, e.g.
TextDatasetFactory.createLuceneIndex, TextDatasetFactory.createLucene,
TextIndexLucene constructor. I'm not sure that this is a good idea as it might
break other people's code that call those methods. I suggest that you add
methods that support the old parameters, which can then call the extended
variants. This way, you shouldn't need to change the existing tests, as you
currently do.
3. This seems a bit suspicious:
+ if (indexer instanceof TextIndexLuceneMultiLingual) {
+ String lang = o.getLiteral().language();
+ ((TextIndexLuceneMultiLingual)indexer).addEntity(entity,
lang);
+ }
+ else
+ indexer.addEntity(entity) ;
Would it be possible to avoid the instanceof check and simply detect the
language from within the entity? (possibly extending Entity instead, if it
doesn't keep track of the language of the literal) I.e. this code would just be
"indexer.addEntity(entity)" as it used to be and the magic to detect the
language would be within TextIndexLuceneMultiLingual.addEntity(entity) where it
arguably belongs...
4. I see that you have touched the existing unit tests (which I think may
be a bad idea, see above) but you have not written unit test that specifically
test the multilingual indexing. Would it be possible for you to add some of
those? These would also serve as examples for the usage and expected behavior.
5. What about documentation?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---