[GitHub] jena issue #436: JENA-1556 implementation

xristy Fri, 15 Jun 2018 12:15:55 -0700

Github user xristy commented on the issue:

    https://github.com/apache/jena/pull/436
  
    @kinow I think the configuration and results reflect the `text:searchFor` 
functionality; however, in the analyzer definition for the tag `jp`:
    
             text:analyzer [
               a text:GenericAnalyzer ;
               text:class "org.apache.lucene.analysis.ja.JapaneseAnalyzer" ;
               text:tokenizer <#tokenizer> ;
             ]
    
    the `text:tokenizer <#tokenizer> ;` line has no effect. Tokenizer specs work 
with `text:ConfigurableAnalyzer` and are ignored by `text:GenericAnalyzer`. 
Perhaps a warning should be logged, but that would mean checking for the 
presence of unsupported predicates?
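    
    For comparison, here is a minimal sketch of a definition in which a 
tokenizer spec is honored. It assumes the built-in `text:StandardTokenizer` and 
`text:LowerCaseFilter` names described in the jena-text documentation. It only 
illustrates the shape of a `text:ConfigurableAnalyzer` and is not a substitute 
for the Japanese-specific analysis done by `JapaneseAnalyzer`:
    
    ```turtle
    # Fragment only: it goes where the `text:analyzer [ ... ]` above sits and
    # assumes the usual prefix, text: <http://jena.apache.org/text#>.
    text:analyzer [
      a text:ConfigurableAnalyzer ;
      # The tokenizer spec is read here, unlike in text:GenericAnalyzer.
      text:tokenizer text:StandardTokenizer ;
      # Filter choice is illustrative, not taken from the PR.
      text:filters ( text:LowerCaseFilter ) ;
    ]
    ```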
    
    Re:
    > the complexity put on TextIndexLucene. A few methods are getting a 
boolean flag to change their behaviour. And when that happens too much, 
sometimes it may feel like the method has two behaviours, and writing tests or 
changing it may be challenging. Maybe it could extend it in some other way.
    
    I'm not sure how to improve this. The flag in `highlightResults` affects 
the value of the `effectiveField` in the context of a larger method, and the 
flag in `getQueryAnalyzer` conditions whether any useful work is done or not. I 
factored that as a method rather than leaving it inline in `query$` to reduce 
the clutter in that principal routine.
    
    Re:
    > it's not a batteries-included feature, if I understand correctly. You 
still need to prepare the other part of the solution, be it a tokenizer that 
gets a value such as "kinou", then searches some dictionary, and finally create 
tokens for :ex3 dc:title "昨日" and "きのう", or change the data a bit. 
Maybe this could be a separate project, or an extension of sorts.
    
    I'm not sure what you are recommending here. The `text:searchFor` and 
`text:auxIndex` functionalities are ways of configuring the _application_ of 
appropriate analyzers that have been separately defined. So yes, the features 
are not self-contained, in that analyzers do have to be supplied.
