[jira] [Comment Edited] (OAK-5707) [Oak lucene indexes] Clarify aggregates, nodeScopeIndex, propertyIndex, analyzed

David Gonzalez (JIRA) Fri, 17 Feb 2017 13:18:06 -0800

    [ 
https://issues.apache.org/jira/browse/OAK-5707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872536#comment-15872536
 ]


David Gonzalez edited comment on OAK-5707 at 2/17/17 9:16 PM:
--------------------------------------------------------------

Including notes from helpful offline conversations with Vikas.

The following points still require review for correctness. They are added here 
for convenience and to help shape the discussion, and should NOT be considered 
correct until the review has been finalized.

* Aggregates instruct Oak to fulltext-index any property found under the 
provided path pattern (e.g. */*/*) (setting aside the complication of how it 
recurses through types...)
** By default all String and String[] properties are candidates for 
aggregation; however, other property types can be specified at the aggregation 
level.
* Property index definitions under indexRules are about "how to index a 
specific property".
** The affected property is identified via a) a relative property path or b) a 
regex property path match, relative to the <nodeType> the indexRule applies to.
* A property that can be reached by an aggregate rule pattern is treated the 
same as that property having 
`indexRules/<nodeType>/properties/myProp@nodeScopeIndex=true`
** The advantages of ALSO defining an indexRule for a property covered by an 
aggregate are:
*** The property can also be marked as a propertyIndex, which allows for more 
performant property-based equality matches.
*** The property can be marked for special use (e.g. useInSuggest, 
useInSpellcheck, boost, etc.), which may affect (depending on the applied 
special use properties) how it behaves in the aggregated search.
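
To make the aggregate path pattern idea above more concrete, here is a rough 
Python sketch. It is NOT Oak code and has not been reviewed: the nested dict 
only mimics the shape of an index definition, `app:Page` and the `*/*/*` 
include path are illustrative values, and `matches_aggregate` is a 
hypothetical helper that assumes each `*` stands for exactly one path segment. 
It deliberately ignores how aggregation recurses through types.

```python
# Hypothetical, simplified model of a Lucene index definition as nested dicts.
# Node and property names mirror the ones discussed above; "app:Page" and the
# include path are illustrative values only.
index_definition = {
    "type": "lucene",
    "aggregates": {
        "app:Page": {
            "include0": {"path": "*/*/*"},
        },
    },
    "indexRules": {
        "app:Page": {
            "properties": {
                "title": {
                    "name": "jcr:content/jcr:title",
                    "propertyIndex": True,
                },
            },
        },
    },
}


def matches_aggregate(pattern: str, relative_node_path: str) -> bool:
    """Return True if the node path matches the pattern segment by segment.

    Assumption: each "*" stands for exactly one path segment, so "*/*/*"
    only matches nodes exactly three levels below the indexed node.
    """
    pattern_segments = pattern.split("/")
    path_segments = relative_node_path.split("/")
    if len(pattern_segments) != len(path_segments):
        return False
    return all(p == "*" or p == s for p, s in zip(pattern_segments, path_segments))


pattern = index_definition["aggregates"]["app:Page"]["include0"]["path"]
print(matches_aggregate(pattern, "jcr:content/par/text"))  # three levels deep
print(matches_aggregate(pattern, "jcr:content"))           # one level deep
```

Under that assumption, covering several depths would need one include per 
depth (for example `*`, `*/*` and `*/*/*`).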

* `analyzed=true` in an indexRules prop def (say for `jcr:content/jcr:title`) 
allows for `...FROM [app:Page] WHERE CONTAINS([jcr:content/jcr:title], 'foo')`
** TBD: which special use props are applicable to this (if any)?
* `propertyIndex=true` is for the condition you were asking about at first: 
`WHERE [jcr:content/jcr:title]='foo'`
** If the property is a candidate for aggregates or nodeScopeIndex (which, as 
noted, are roughly equivalent), equality property conditions (`WHERE 
[jcr:content/jcr:title]='foo'`) may still appear fast and not report a 
traversal warning, as Oak is able to leverage the internal aggregate index to 
quickly isolate matches. That being said, for property equality checks, it is 
always as fast (if not faster) to define an indexRule for the property with 
`propertyIndex=true`.
** TBD: clearly describe the considerations of equality matches when only 
using the aggregate index.
* Aggregates and nodeScopeIndex are intended to roll content up into the 
index's "nodeType" index, so that the content becomes a candidate for fulltext 
searches against that node (vs. against a specific property), i.e. `WHERE 
CONTAINS(*, 'foo')`
* The `excludeFromAggregation` prop "disables" aggregate indexing of a prop 
that matches a prop def having `excludeFromAggregation`
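
The flag-to-query mapping in the bullets above can be summarized with a small 
Python sketch. This only encodes the (still unreviewed) claims from this 
comment and is not Oak's actual planner logic; `conditions_served` and its 
`covered_by_aggregate` parameter are hypothetical names introduced for 
illustration.

```python
from typing import List


def conditions_served(prop_def: dict, covered_by_aggregate: bool = False) -> List[str]:
    """List the query conditions a property definition is claimed to serve.

    Encodes the claims from the notes above, not verified Oak behavior:
    - propertyIndex=true -> equality condition on the property
    - analyzed=true      -> CONTAINS on the specific property
    - nodeScopeIndex=true, or aggregate coverage that is not switched off by
      excludeFromAggregation -> node-level CONTAINS(*, ...)
    """
    name = prop_def["name"]
    served = []
    if prop_def.get("propertyIndex"):
        served.append(f"[{name}] = 'foo'")
    if prop_def.get("analyzed"):
        served.append(f"CONTAINS([{name}], 'foo')")
    aggregated = covered_by_aggregate and not prop_def.get("excludeFromAggregation")
    if prop_def.get("nodeScopeIndex") or aggregated:
        served.append("CONTAINS(*, 'foo')")
    return served


title = {"name": "jcr:content/jcr:title", "propertyIndex": True, "analyzed": True}
for condition in conditions_served(title, covered_by_aggregate=True):
    print(condition)
```

Setting `excludeFromAggregation` on the same (hypothetical) definition would 
drop the `CONTAINS(*, 'foo')` entry in this model while leaving the 
property-specific conditions in place.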





> [Oak lucene indexes] Clarify aggregates, nodeScopeIndex, propertyIndex, 
> analyzed
> --------------------------------------------------------------------------------
>
>                 Key: OAK-5707
>                 URL: https://issues.apache.org/jira/browse/OAK-5707
>             Project: Jackrabbit Oak
>          Issue Type: Documentation
>            Reporter: David Gonzalez
>            Assignee: Vikas Saurabh
>
> Oak lucene documentation would benefit from clarifying the relationships and 
> expected behaviors around aggregates, nodeScopeIndex, propertyIndex and 
> analyzed.
> These features have some overlap in what they do and/or augment one another, 
> but to the lay-developer it is unclear how these work in concert and/or the 
> implications of using the various features.
> It's worth remembering many developers are under the mindset (shifting from 
> Jackrabbit 2 -> Oak) that Oak indexing requires explicit inclusion of content 
> into search results; thus implicit content inclusion into indexes via 
> generalized aggregations (vs named properties) is unclear/unexpected to many.


