[jira] Commented: (CLEREZZA-388) Composite Resource Index Service

Daniel Spicar (JIRA) Thu, 17 Mar 2011 05:44:03 -0700

    [ 
https://issues.apache.org/jira/browse/CLEREZZA-388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007902#comment-13007902
 ]


Daniel Spicar commented on CLEREZZA-388:
----------------------------------------

I'd like to give some feedback based on experience with a concrete use case.

The use case is a web site with a search interface for finding users on the 
platform. I'd like the search to behave "intuitively": when I enter "jessica", 
I expect all users whose name string contains jessica as a single word. A 
rough specification is:
- exact string matching with double quotes ("phrase").
- wildcard matching (*,?)
- case-insensitive search ('jessica' and 'Jessica' should deliver the same 
results)
- boolean conditions for search terms (AND, OR, NOT)

Lucene provides a QueryParser that supports most of these things and more 
(fuzzy searches, range searches, etc.); see 
http://lucene.apache.org/java/3_0_0/queryparsersyntax.html

Thus I implemented my own Condition that uses the QueryParser on the user input 
to generate a query.

But I faced some problems which need to be resolved in CRIS:
1. CRIS indexes named resources with the Field.Index.NOT_ANALYZED attribute. 
This means each field value is indexed as a single term: it is not tokenized, 
and matching is case-sensitive.
2. CRIS is currently hard-coded to deliver the top 10 results. For this use 
case this would need to be configurable though.
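
A minimal sketch of what a configurable limit for problem 2 could look like, 
in plain Java and without Lucene. The class and method names here 
(ResultLimitSketch, topResults, DEFAULT_MAX_RESULTS) are hypothetical and are 
not part of CRIS; the only thing taken from the current behaviour is the 
default of 10.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the CRIS ResourceFinder API.
class ResultLimitSketch {
    // Keeps today's behaviour as the default.
    static final int DEFAULT_MAX_RESULTS = 10;

    // Existing callers keep getting at most 10 hits.
    static <T> List<T> topResults(List<T> rankedHits) {
        return topResults(rankedHits, DEFAULT_MAX_RESULTS);
    }

    // New callers can choose how many of the ranked hits they want.
    static <T> List<T> topResults(List<T> rankedHits, int maxResults) {
        if (maxResults < 0) {
            throw new IllegalArgumentException("maxResults must not be negative");
        }
        int end = Math.min(maxResults, rankedHits.size());
        // Copy the sublist so the result does not depend on the input list.
        return new ArrayList<>(rankedHits.subList(0, end));
    }

    public static void main(String[] args) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            hits.add("user" + i);
        }
        System.out.println(topResults(hits).size());
        System.out.println(topResults(hits, 20).size());
    }
}
```

Keeping the one-argument overload means existing callers would not have to 
change.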

Concerning problem 1:
I resolved it locally by adding another field to the indexed document:
doc.add(new Field(vProperty.stringKey, propertyValue, Field.Store.YES, 
Field.Index.ANALYZED))

Because CRIS uses the StandardAnalyzer, the values in that new field are 
tokenized, common English stop words (like "a") are omitted, and the terms 
are (according to my understanding) lower-cased.
As a result there is now one field with the exact value and another field 
with a lower-case, tokenized index.
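
To make the difference between the two fields concrete, here is a rough 
plain-Java sketch. It does not use Lucene and only imitates the behaviour 
described above: the exact field is compared as one case-sensitive term, while 
the analyzed field is split into lower-case tokens with stop words dropped. 
The class name, the method names and the short stop-word list are assumptions 
made for this illustration; the real StandardAnalyzer tokenizes differently in 
the details.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java imitation of the two index fields; no Lucene involved.
class FieldMatchSketch {
    // Hypothetical, very small stop-word list used only for this sketch.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "of"));

    // Imitates the un-analyzed field: the whole value is one case-sensitive term.
    static boolean matchesExact(String storedValue, String query) {
        return storedValue.equals(query);
    }

    // Imitates analysis: split on non-alphanumeric characters, lower-case,
    // and drop stop words.
    static List<String> analyze(String value) {
        List<String> tokens = new ArrayList<>();
        for (String raw : value.split("[^\\p{L}\\p{N}]+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Imitates a search on the analyzed field: every query token must occur
    // as a whole token of the stored value.
    static boolean matchesAnalyzed(String storedValue, String query) {
        List<String> queryTokens = analyze(query);
        return !queryTokens.isEmpty()
                && analyze(storedValue).containsAll(queryTokens);
    }

    public static void main(String[] args) {
        String name = "Jessica Smith";
        System.out.println(matchesExact(name, "jessica"));    // false
        System.out.println(matchesAnalyzed(name, "jessica")); // true
        System.out.println(matchesAnalyzed(name, "jess"));    // false
    }
}
```

The query is passed through the same analyze step as the stored value, which 
is the reason a custom condition needs access to the Analyzer used for 
indexing.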

A consequence of this solution is that it would be good if the GraphIndexer 
could expose the Lucene Version attribute and the Analyzer it uses on its 
public interface, so that custom conditions (like mine) can use the same 
Analyzer the index was written with.

I'll attach the GenericCondition, GraphIndexer and ResourceFinder files for 
reference. They are not production-level code, though.

> Composite Resource Index Service
> --------------------------------
>
>                 Key: CLEREZZA-388
>                 URL: https://issues.apache.org/jira/browse/CLEREZZA-388
>             Project: Clerezza
>          Issue Type: New Feature
>            Reporter: Reto Bachmann-Gmür
>            Assignee: Reto Bachmann-Gmür
>
> A service shall monitor a graph for resources of a specific type and provide 
> composite indexes on specified properties. It shall support searching by 
> exact value, by range as well as full-text search. This service shall make it 
> possible to provide fast faceted searches.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
