GitHub user osma opened a pull request:
https://github.com/apache/jena/pull/131
JENA-1134: support AnalyzingQueryParser in jena-text
This PR makes it possible to select either the standard Lucene QueryParser
or the AnalyzingQueryParser using jena-text configuration like this:
```
<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:Lucene> ;
    text:queryParser text:AnalyzingQueryParser ;
    text:queryAnalyzer [
        a text:ConfigurableAnalyzer ;
        text:tokenizer text:KeywordTokenizer ;
        text:filters (text:ASCIIFoldingFilter text:LowerCaseFilter)
    ] ;
    text:entityMap <#entMap> .
```
The main difference between these query parsers is that
AnalyzingQueryParser also runs the analyzer on wildcard queries. For example,
with ASCIIFoldingFilter configured as above, a search for `édu*` matches
`éducation` only if you use AnalyzingQueryParser; the standard QueryParser
passes the wildcard term through without folding it.
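To make the wildcard behaviour concrete, here is a minimal Java sketch that needs only the JDK. It is an illustration, not the jena-text or Lucene code: the `FoldingDemo` class and its `fold` and `prefixMatches` helpers are hypothetical, and `fold` only approximates ASCIIFoldingFilter plus LowerCaseFilter by stripping combining marks with `java.text.Normalizer`. It assumes the index holds folded terms.

```java
// Sketch only: JDK-based approximation of ASCIIFoldingFilter + LowerCaseFilter.
class FoldingDemo {
    // Hypothetical helper: decompose accented characters (NFD), drop the
    // combining marks, then lowercase the result.
    static String fold(String term) {
        String decomposed = java.text.Normalizer.normalize(
                term, java.text.Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "")
                         .toLowerCase(java.util.Locale.ROOT);
    }

    // A trailing-wildcard query such as "edu*" amounts to a prefix test
    // against the term stored in the index.
    static boolean prefixMatches(String prefix, String indexedTerm) {
        return indexedTerm.startsWith(prefix);
    }

    public static void main(String[] args) {
        // "\u00e9" is e with an acute accent; the index stores the folded form.
        String indexed = fold("\u00e9ducation");

        // Wildcard term left unanalyzed (standard parser behaviour): no match.
        System.out.println(prefixMatches("\u00e9du", indexed));       // false

        // Wildcard term analyzed first (analyzing parser behaviour): match.
        System.out.println(prefixMatches(fold("\u00e9du"), indexed)); // true
    }
}
```

The unanalyzed prefix keeps its accent and so cannot match the folded term, which is the case the AnalyzingQueryParser option is meant to cover.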
One problem I had with the implementation is that the query parser needs to
be constructed dynamically for every query, so I need to store the information
about which query parser to use rather than storing the
QueryParser/AnalyzingQueryParser instance directly. I solved this by simply
storing the type of query parser as a string, i.e. either `"QueryParser"` or
`"AnalyzingQueryParser"`, and then dynamically constructing the correct type of
parser based on this information. I'm sure there are more elegant ways of doing
this, e.g. creating a Factory for each parser type and saving the correct kind
of Factory, but I don't want to overengineer. Opinions?
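For comparison, the two options could look roughly like the Java sketch below. This is not the code in the PR: `StubQueryParser` and `StubAnalyzingQueryParser` are hypothetical stand-ins for the Lucene classes so that the example compiles without Lucene, and `ParserSelection` and `ParserFactory` are names invented for illustration. Real Lucene constructors also take an analyzer, which is left out here.

```java
// Hypothetical stand-ins for the Lucene parser classes (not the real API).
class StubQueryParser {
    final String field;
    StubQueryParser(String field) { this.field = field; }
}

class StubAnalyzingQueryParser extends StubQueryParser {
    StubAnalyzingQueryParser(String field) { super(field); }
}

// Option 1: store only the parser type as a string and build a fresh
// parser for each query.
class ParserSelection {
    static StubQueryParser fromName(String parserName, String field) {
        switch (parserName) {
            case "QueryParser":
                return new StubQueryParser(field);
            case "AnalyzingQueryParser":
                return new StubAnalyzingQueryParser(field);
            default:
                throw new IllegalArgumentException(
                        "Unknown query parser: " + parserName);
        }
    }
}

// Option 2: store a factory chosen once at configuration time; each query
// then calls create() without inspecting a string.
interface ParserFactory {
    StubQueryParser create(String field);

    ParserFactory STANDARD = StubQueryParser::new;
    ParserFactory ANALYZING = StubAnalyzingQueryParser::new;
}
```

With the string approach, supporting another parser type means adding a `case`; with the factory approach it means adding another `ParserFactory` value. For two parser types the difference is small.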
This could rather easily be extended to other query parser types supported
by Lucene, though I'm unsure how useful that would be in practice.
ComplexPhraseQueryParser and/or PrecedenceQueryParser could perhaps be useful
to somebody.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/osma/jena jena-text-queryparser
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/jena/pull/131.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #131
----
commit 547d4ac64e4331e45cba96e045345a5f3ab214a7
Author: Osma Suominen
Date: 2016-03-29T14:23:27Z
simplify parseQuery and preParseQuery: get rid of primaryField argument as
it is always the same
commit 22a81f8cbc9498cbe4f1970115aa32f9c21fb239
Author: Osma Suominen
Date: 2016-03-30T08:09:02Z
JENA-1134: basic support for AnalyzingQueryParser
----