[
https://issues.apache.org/jira/browse/CLEREZZA-388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007902#comment-13007902
]
Daniel Spicar commented on CLEREZZA-388:
----------------------------------------
I'd like to give some feedback based on experience with a concrete use case.
The use case is that I have a web site with a search interface that allows me
to search for users on the platform. I'd like to be able to search
"intuitively". This means that when I enter "jessica", I expect to get all
users where "jessica" appears in the name string as a single word. A rough
specification is:
- exact string matching with double quotes ("phrase").
- wildcard matching (*,?)
- case-insensitive search ('jessica' and 'Jessica' should deliver the same
results)
- boolean conditions for search terms (AND, OR, NOT)
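As a rough illustration of what the wildcard and case-insensitivity points mean
for a single word, here is a minimal plain-Java sketch. It does not use Lucene;
the class and method names are hypothetical, and it only covers '*' (any run of
characters) and '?' (exactly one character).

```java
import java.util.regex.Pattern;

final class WildcardSketch {

    /** Translates a wildcard term into a case-insensitive regular expression. */
    static Pattern toPattern(String wildcardTerm) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcardTerm.toCharArray()) {
            if (c == '*') {
                regex.append(".*");          // '*' matches any run of characters
            } else if (c == '?') {
                regex.append('.');           // '?' matches exactly one character
            } else {
                // Quote everything else so it is matched literally.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    /** Returns true if the whole word matches the wildcard term, ignoring case. */
    static boolean matchesWord(String word, String wildcardTerm) {
        return toPattern(wildcardTerm).matcher(word).matches();
    }
}
```

With this sketch, both "jess*" and "jessic?" match the word "Jessica".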
Lucene provides a QueryParser that supports most of these things and even more
(fuzzy searches, range searches, etc.); see
http://lucene.apache.org/java/3_0_0/queryparsersyntax.html
Thus I implemented my own Condition that uses the QueryParser on the user input
to generate a query.
But I faced some problems which need to be resolved in CRIS:
1. CRIS indexes named resources with the Field.Index.NOT_ANALYZED attribute.
This means the index is not tokenized and it is case-sensitive.
2. CRIS is currently hard-coded to deliver the top 10 results. For this use
case this would need to be configurable though.
Concerning problem 1:
I resolved it locally by adding another field to the indexed document:
    doc.add(new Field(vProperty.stringKey, propertyValue, Field.Store.YES,
            Field.Index.ANALYZED));
Because CRIS uses the StandardAnalyzer this means that in that new field the
words are tokenized, common English stop words (like "a") are omitted, and the
index is (according to my understanding) lower-case.
This means that now there is a field with the exact value, and another field
with a lower-case, tokenized index.
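To make the difference between the two fields concrete, here is a minimal
plain-Java sketch. It does not use Lucene: the analyze helper and its stop-word
set are hypothetical and only approximate what I understand StandardAnalyzer to
do (split into words, lower-case them, drop common stop words).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class AnalyzedFieldSketch {

    // Tiny illustrative stop-word set; an assumption, not Lucene's actual list.
    private static final Set<String> STOP_WORDS =
            new HashSet<String>(Arrays.asList("a", "an", "the", "of", "and"));

    /** Rough stand-in for an analyzed field: split, lower-case, drop stop words. */
    static List<String> analyze(String value) {
        List<String> tokens = new ArrayList<String>();
        for (String raw : value.split("[^\\p{L}\\p{N}]+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** NOT_ANALYZED-style match: the whole stored value must equal the term. */
    static boolean matchesExact(String storedValue, String term) {
        return storedValue.equals(term);
    }

    /** ANALYZED-style match: the lower-cased term must equal one of the tokens. */
    static boolean matchesAnalyzed(String storedValue, String term) {
        return analyze(storedValue).contains(term.toLowerCase(Locale.ROOT));
    }
}
```

In this sketch, searching for "jessica" fails against the exact value
"Jessica Smith" but succeeds against its tokenized, lower-cased form.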
A consequence of this solution is that it would be good if the GraphIndexer
could somehow expose the Lucene Version attribute and the Analyzer that it uses
on its public interface, so that custom conditions (like mine) can use the same
Analyzer that the index was written with.
I'll attach the GenericCondition, GraphIndexer, and ResourceFinder files for
reference. It is not production-level code, though.
> Composite Resource Index Service
> --------------------------------
>
> Key: CLEREZZA-388
> URL: https://issues.apache.org/jira/browse/CLEREZZA-388
> Project: Clerezza
> Issue Type: New Feature
> Reporter: Reto Bachmann-Gmür
> Assignee: Reto Bachmann-Gmür
>
> A service shall monitor a graph for resources of a specific type and provide
> composite indexes on specified properties. It shall support searching by
> exact value, by range as well as full-text search. This service shall make it
> possible to provide fast faceted searches.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira