[GitHub] jena pull request: JENA-1134: support AnalyzingQueryParser in jena...

osma Wed, 30 Mar 2016 01:36:22 -0700

GitHub user osma opened a pull request:

    https://github.com/apache/jena/pull/131


    JENA-1134: support AnalyzingQueryParser in jena-text

    This PR makes it possible to select either the standard Lucene QueryParser 
or the AnalyzingQueryParser using jena-text configuration like this:
    
    ```
    <#indexLucene> a text:TextIndexLucene ;
        text:directory <file:Lucene> ;
        text:queryParser text:AnalyzingQueryParser ;
        text:queryAnalyzer [
            a text:ConfigurableAnalyzer ;
            text:tokenizer text:KeywordTokenizer ;
            text:filters (text:ASCIIFoldingFilter text:LowerCaseFilter)
        ] ;
        text:entityMap <#entMap> .
    ```
    
    The main difference between these query parsers is that 
AnalyzingQueryParser also runs the analyzer on wildcard queries. For example, 
with ASCIIFoldingFilter configured as above, a search for `édu*` will match 
`éducation` only if you use AnalyzingQueryParser.
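
To illustrate the wildcard case, here is a minimal sketch in plain Java that does not use Lucene at all. The `fold` helper is a hypothetical, rough stand-in for ASCIIFoldingFilter plus LowerCaseFilter (it only strips combining marks and lowercases), and `prefixMatches` is an assumed simplification of how a prefix query is compared against terms that were folded at index time.

```java
import java.text.Normalizer;
import java.util.Locale;

class WildcardFoldingDemo {
    // Hypothetical stand-in for ASCIIFoldingFilter + LowerCaseFilter:
    // decompose to NFD, drop combining marks, lowercase. Not Lucene's code.
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "").toLowerCase(Locale.ROOT);
    }

    // Compare a wildcard prefix against a term that was folded at index time.
    // analyzePrefix = false models a parser that leaves the prefix unanalyzed;
    // analyzePrefix = true models a parser that analyzes it first.
    static boolean prefixMatches(String indexedTerm, String prefix, boolean analyzePrefix) {
        String p = analyzePrefix ? fold(prefix) : prefix;
        return indexedTerm.startsWith(p);
    }

    public static void main(String[] args) {
        // "\u00e9" is e with an acute accent; the index stores "education".
        String indexed = fold("\u00e9ducation");
        System.out.println(prefixMatches(indexed, "\u00e9du", false)); // prints false
        System.out.println(prefixMatches(indexed, "\u00e9du", true));  // prints true
    }
}
```

The unanalyzed prefix still carries the accent, so it cannot match the folded term in the index; analyzing the prefix first removes the mismatch.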
    
    One problem I had with the implementation is that the query parser needs to 
be constructed anew for every query, so I have to store the information about 
which query parser to use rather than storing a QueryParser/AnalyzingQueryParser 
instance directly. I solved this by simply storing the type of query parser as 
a string, i.e. either `"QueryParser"` or `"AnalyzingQueryParser"`, and then 
constructing the correct type of parser from that string at query time. I'm 
sure there are more elegant ways of doing this, e.g. creating a Factory for 
each parser type and storing the appropriate Factory, but I don't want to 
overengineer. Opinions?
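
For comparison, the two options could be sketched roughly as follows. This is not the code in the PR: `Parser`, `StandardParser`, `AnalyzingParser`, `ParserFactory`, `fromName` and `factoryFor` are all hypothetical names, and the parser classes are stand-ins for the Lucene types so that the sketch compiles without Lucene on the classpath.

```java
class ParserSelectionSketch {
    // Hypothetical stand-in for a Lucene query parser; not a real Lucene type.
    interface Parser {
        String describe();
    }

    static final class StandardParser implements Parser {
        private final String field;
        StandardParser(String field) { this.field = field; }
        public String describe() { return "QueryParser(" + field + ")"; }
    }

    static final class AnalyzingParser implements Parser {
        private final String field;
        AnalyzingParser(String field) { this.field = field; }
        public String describe() { return "AnalyzingQueryParser(" + field + ")"; }
    }

    // Option 1 (string-based, as described above): keep only the parser name
    // from the configuration and build a fresh parser for every query.
    static Parser fromName(String name, String field) {
        switch (name) {
            case "QueryParser":
                return new StandardParser(field);
            case "AnalyzingQueryParser":
                return new AnalyzingParser(field);
            default:
                throw new IllegalArgumentException("Unknown query parser: " + name);
        }
    }

    // Option 2 (factory-based): resolve the name once at configuration time
    // and store a factory that creates a fresh parser per query.
    interface ParserFactory {
        Parser create(String field);
    }

    static ParserFactory factoryFor(String name) {
        switch (name) {
            case "QueryParser":
                return StandardParser::new;
            case "AnalyzingQueryParser":
                return AnalyzingParser::new;
            default:
                throw new IllegalArgumentException("Unknown query parser: " + name);
        }
    }

    public static void main(String[] args) {
        System.out.println(fromName("AnalyzingQueryParser", "label").describe());
        ParserFactory factory = factoryFor("QueryParser");
        System.out.println(factory.create("label").describe());
    }
}
```

With the factory variant an unknown name fails when the configuration is loaded rather than at the first query, at the cost of one extra type.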
    
    This could rather easily be extended to other query parser types supported 
by Lucene, though I'm unsure how useful that would be in practice. 
ComplexPhraseQueryParser and/or PrecedenceQueryParser could perhaps be useful 
to somebody.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/osma/jena jena-text-queryparser

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/jena/pull/131.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #131
    
----
commit 547d4ac64e4331e45cba96e045345a5f3ab214a7
Author: Osma Suominen <o...@apache.org>
Date:   2016-03-29T14:23:27Z

    simplify parseQuery and preParseQuery: get rid of primaryField argument as 
it is always the same

commit 22a81f8cbc9498cbe4f1970115aa32f9c21fb239
Author: Osma Suominen <o...@apache.org>
Date:   2016-03-30T08:09:02Z

    JENA-1134: basic support for AnalyzingQueryParser

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
