[ https://issues.apache.org/jira/browse/OAK-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thomas Mueller updated OAK-890:
-------------------------------
    Fix Version/s:     (was: 0.14)
                   0.15

> Query: advanced fulltext search conditions
> ------------------------------------------
>
>                 Key: OAK-890
>                 URL: https://issues.apache.org/jira/browse/OAK-890
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: query
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>             Fix For: 0.15
>
>
> Currently, the query engine does not use a fulltext index if there are
> multiple fulltext conditions combined with "or". Also, the QueryIndex
> interface does not support boosts, and does not support fulltext conditions
> on properties (just on nodes): Filter.getFulltextConditions is a collection
> of strings, combined with "and", but does not contain the information whether
> a condition is on a property or on all properties. Also, the popular sorting
> by score (especially descending) is not currently supported.
> [~mreutegg] and I discussed how we could support those features (including
> boost) in a way that is backward compatible with Jackrabbit 2.x, but without
> adding a lot of complexity. Example Jackrabbit 2.x queries:
> {code}
> /jcr:root/content//*[(@jcr:primaryType='page'
>   and (jcr:contains(jcr:content/@tags, 'it:blue')
>   or jcr:contains(jcr:content/@tags, '/tags/it/blue')))]
> /jcr:root/content//element(*, nt:hierarchyNode)[
>   (jcr:contains(jcr:content, 'SomeTextToSearch')
>   or jcr:contains(jcr:content/@jcr:title, 'SomeTextToSearch')
>   or jcr:contains(jcr:content/@jcr:description, 'SomeTextToSearch'))]
>   /rep:excerpt(.) order by @jcr:score descending
> {code}
> A possible solution is to extend the internal fulltext syntax to support
> those features. The internal fulltext syntax is the one used by
> Filter.getFulltextCondition (not the one used within the original XPath, SQL,
> or SQL-2 query).
> The proposed syntax (work in progress, just a rough draft so far) is:
> {code}
> FullTextSearch ::= Or
>   ['order by score' [' desc']]
> Or ::= And {' OR ' And}*
> And ::= Term {' ' Term}*
> Term ::= '(' Or ')' | ['-'] SimpleTerm
> SimpleTerm ::= [Property ':'] '"' Word {' ' Word}* '"' ['^' Boost]
> Property ::= <property name>
> Boost ::= <number>
> {code}
> The idea is that the syntax matches the syntax used by Lucene (except for the
> 'order by' part), so that the Lucene and Solr index implementations should
> get simpler (they only need minimal parsing, possibly just the 'order by'
> part). Search terms (phrases, words) are always within double quotes. That
> means the queries above would result in the following conditions:
> {code}
> jcr:content/tags:"it:blue"
>   OR jcr:content/tags:"/tags/it/blue"
> jcr:content/*:"SomeTextToSearch"
>   OR jcr:content/jcr:title:"SomeTextToSearch"
>   OR jcr:content/jcr:description:"SomeTextToSearch"
>   order by score desc
> {code}
> It would also allow switching back from
> {code}
> Collection<String> getFulltextConditions()
> {code}
> to
> {code}
> String getFulltextCondition()
> {code}
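
To make the draft grammar more concrete, here is a minimal sketch of a recursive-descent parser for it. It is not part of Oak: the class name FulltextConditionSketch, the parse method, and the and(...)/or(...)/not(...) output notation are all hypothetical, chosen only for illustration. It also assumes a single-line condition, a leading space before 'order by score', no escaped quotes inside phrases, and property names without spaces or parentheses.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a parser for the draft fulltext grammar; not Oak API. */
class FulltextConditionSketch {

    private final String in;
    private int pos;

    private FulltextConditionSketch(String in) { this.in = in; }

    /** FullTextSearch ::= Or ['order by score' [' desc']] (leading space assumed). */
    public static String parse(String condition) {
        FulltextConditionSketch p = new FulltextConditionSketch(condition);
        String result = p.or();
        if (p.eat(" order by score")) {
            result += " order by score" + (p.eat(" desc") ? " desc" : "");
        }
        if (p.pos < p.in.length()) {
            throw new IllegalArgumentException("Unexpected input at offset " + p.pos);
        }
        return result;
    }

    /** Or ::= And {' OR ' And}* */
    private String or() {
        List<String> parts = new ArrayList<>();
        parts.add(and());
        while (eat(" OR ")) parts.add(and());
        return parts.size() == 1 ? parts.get(0) : "or(" + String.join(", ", parts) + ")";
    }

    /** And ::= Term {' ' Term}*; a space starts a new term unless it begins ' OR ' or the order-by part. */
    private String and() {
        List<String> parts = new ArrayList<>();
        parts.add(term());
        while (in.startsWith(" ", pos) && !in.startsWith(" OR ", pos)
                && !in.startsWith(" order by score", pos)) {
            pos++;
            parts.add(term());
        }
        return parts.size() == 1 ? parts.get(0) : "and(" + String.join(", ", parts) + ")";
    }

    /** Term ::= '(' Or ')' | ['-'] SimpleTerm */
    private String term() {
        if (eat("(")) {
            String inner = or();
            if (!eat(")")) throw new IllegalArgumentException("Expected ')' at offset " + pos);
            return inner;
        }
        boolean negated = eat("-");
        String t = simpleTerm();
        return negated ? "not(" + t + ")" : t;
    }

    /** SimpleTerm ::= [Property ':'] '"' Word {' ' Word}* '"' ['^' Boost] */
    private String simpleTerm() {
        int open = in.indexOf('"', pos);
        int close = open < 0 ? -1 : in.indexOf('"', open + 1);
        if (close < 0) throw new IllegalArgumentException("Expected quoted term at offset " + pos);
        String prefix = "";
        if (open > pos) {
            // Property names may contain ':' themselves, so only the ':' before the quote is the separator.
            String property = in.substring(pos, open - 1);
            if (in.charAt(open - 1) != ':' || property.matches(".*[ ()].*")) {
                throw new IllegalArgumentException("Invalid property at offset " + pos);
            }
            prefix = property + ":";
        }
        String phrase = in.substring(open, close + 1);
        pos = close + 1;
        String boost = "";
        if (eat("^")) {
            int start = pos;
            while (pos < in.length() && (Character.isDigit(in.charAt(pos)) || in.charAt(pos) == '.')) pos++;
            if (pos == start) throw new IllegalArgumentException("Expected boost at offset " + pos);
            boost = "^" + in.substring(start, pos);
        }
        return prefix + phrase + boost;
    }

    /** Consumes the token if the input continues with it. */
    private boolean eat(String token) {
        if (!in.startsWith(token, pos)) return false;
        pos += token.length();
        return true;
    }

    public static void main(String[] args) {
        System.out.println(parse(
            "jcr:content/tags:\"it:blue\" OR jcr:content/tags:\"/tags/it/blue\""));
    }
}
```

A Lucene-backed index would presumably hand most of the string to Lucene's own query parser and only split off the 'order by' part, so a full parser like this would mainly be of interest to other index implementations.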