[jira] [Comment Edited] (OAK-5707) [Oak lucene indexes] Clarify aggregates, nodeScopeIndex, propertyIndex, analyzed

2017-02-23 Thread Vikas Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15881832#comment-15881832
 ] 

Vikas Saurabh edited comment on OAK-5707 at 2/24/17 2:57 AM:
-

In the spirit of laziness, and rationalizing that I need to do this before 
planning how to document: attaching [^OAK-5707.patch], which should have been a 
main class, but test cases just have better utility methods, so it's a test. 

It prints 3 types of definitions and how the data is stored in the index. 
Current output is at \[0]. Index dump is of the form:
{noformat}
<field name>
   <term> => [<paths>]
   <term> => [<paths>]
  ...

<field name>
  ...
{noformat}

It's just 3 new files, so the patch should cleanly apply. [~empire29], you 
might want to check it out and see if this shows what is getting stored.

Some more interesting index types that should show up here: aggregates (in a 
few forms), special properties like {{evaluatePathRestrictions}}, etc.

My next step is to add queries and their plans to the output. That should make 
it a bit clearer how the index would be queried.

I hope that, with enough shuffling, I'll get to a point where the relevant 
points can be documented succinctly.

PS: Somehow the content tree dump isn't following the order in which indices 
are present in the content tree :-/. The real order of prop defs is {{foo}}, 
{{bar}}, {{allBar}}.

\[0]:
{noformat}
CONTENT---
+/test
  -foo = fox jumping
  +test1
    +testChild
      -bar = dog jumping
  +test2
    +testChild
      -barX = dog jumping
  +testChild
    -bar = dog jumping

propIdx--
Definition
--
+/oak:index/propIdx
  -includedPaths = [/test]
  -reindexCount = 1
  -compatVersion = 2
  -reindex = false
  -type = lucene
  -jcr:primaryType = oak:QueryIndexDefinition
  +indexRules
    -jcr:primaryType = nt:unstructured
    +nt:base
      -jcr:primaryType = nt:unstructured
      +properties
        -jcr:primaryType = nt:unstructured
        +allBar
          -name = testChild/ba.*
          -propertyIndex = true
          -isRegexp = true
          -jcr:primaryType = nt:unstructured
        +foo
          -name = foo
          -propertyIndex = true
          -jcr:primaryType = nt:unstructured
        +bar
          -name = testChild/bar
          -propertyIndex = true
          -jcr:primaryType = nt:unstructured
Index
-
foo
  fox jumping => [/test]
testChild/bar
  dog jumping => [/test/test1, /test]
testChild/barX
  dog jumping => [/test/test2]

analyzedIdx--
Definition
--
+/oak:index/analyzedIdx
  -includedPaths = [/test]
  -reindexCount = 1
  -compatVersion = 2
  -reindex = false
  -type = lucene
  -jcr:primaryType = oak:QueryIndexDefinition
  +indexRules
    -jcr:primaryType = nt:unstructured
    +nt:base
      -jcr:primaryType = nt:unstructured
      +properties
        -jcr:primaryType = nt:unstructured
        +allBar
          -analyzed = true
          -name = testChild/ba.*
          -isRegexp = true
          -jcr:primaryType = nt:unstructured
        +foo
          -analyzed = true
          -name = foo
          -jcr:primaryType = nt:unstructured
        +bar
          -analyzed = true
          -name = testChild/bar
          -jcr:primaryType = nt:unstructured
Index
-
:fulltext
  test => [/test]
  test1 => [/test/test1]
  test2 => [/test/test2]
full:foo
  fox => [/test]
  jumping => [/test]
full:testChild/bar
  dog => [/test/test1, /test]
  jumping => [/test/test1, /test]
full:testChild/barX
  dog => [/test/test2]
  jumping => [/test/test2]

nodeScopedIdx--
Definition
--
+/oak:index/nodeScopedIdx
  -includedPaths = [/test]
  -reindexCount = 1
  -compatVersion = 2
  -reindex = false
  -type = lucene
  -jcr:primaryType = oak:QueryIndexDefinition
  +indexRules
    -jcr:primaryType = nt:unstructured
    +nt:base
      -jcr:primaryType = nt:unstructured
      +properties
        -jcr:primaryType = nt:unstructured
        +allBar
          -nodeScopeIndex = true
          -name = testChild/ba.*
          -isRegexp = true
          -jcr:primaryType = nt:unstructured
        +foo
          -nodeScopeIndex = true
          -name = foo
          -jcr:primaryType = nt:unstructured
        +bar
          -nodeScopeIndex = true
          -name = testChild/bar
          -jcr:primaryType = nt:unstructured
Index
-
:fulltext
  dog => [/test/test1, /test/test2, /test]
  fox => [/test]
  jumping => [/test/test1, /test/test2, /test]
  test => [/test]
  test1 => [/test/test1]
  test2 => [/test/test2]
  testchild => [/test/test1/testChild, /test/test2/testChild, /test/testChild]
{noformat}
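
To make the difference between the three dumps easier to see, here is a toy model in Python. It is not Oak code and does not use Lucene; it only mimics how the property values from the dump end up as terms. The names {{CONTENT}}, {{PROP_DEFS}}, {{relative_props}} and {{build_index}} are made up for this sketch. It assumes regex definitions must match the whole relative property path, that tokenizing is a lowercase whitespace split, and it only looks one child level deep. It also ignores the node-name terms ({{test}}, {{test1}}, {{testchild}}, ...) that the real dumps show under {{:fulltext}}.

```python
import re

# Content tree from the dump: node path -> {property name: value}
CONTENT = {
    "/test": {"foo": "fox jumping"},
    "/test/test1": {},
    "/test/test1/testChild": {"bar": "dog jumping"},
    "/test/test2": {},
    "/test/test2/testChild": {"barX": "dog jumping"},
    "/test/testChild": {"bar": "dog jumping"},
}

# Property definitions in the "real" order: foo, bar, allBar
PROP_DEFS = [
    {"name": "foo"},
    {"name": "testChild/bar"},
    {"name": "testChild/ba.*", "isRegexp": True},
]


def relative_props(content, node):
    """Yield (relative name, value) for a node's own and direct children's properties."""
    for name, value in content.get(node, {}).items():
        yield name, value
    prefix = node + "/"
    for path, props in content.items():
        rest = path[len(prefix):]
        if path.startswith(prefix) and rest and "/" not in rest:
            for name, value in props.items():
                yield rest + "/" + name, value


def matches(prop_def, rel_name):
    """Assumption: regex definitions must match the whole relative path."""
    if prop_def.get("isRegexp"):
        return re.fullmatch(prop_def["name"], rel_name) is not None
    return prop_def["name"] == rel_name


def build_index(content, prop_defs, mode):
    """Return {field: {term: set of node paths}} for one of the three flags."""
    index = {}
    for node in content:
        for rel_name, value in relative_props(content, node):
            if not any(matches(d, rel_name) for d in prop_defs):
                continue
            if mode == "propertyIndex":
                # whole value stored as a single term under the property name
                entries = [(rel_name, value)]
            elif mode == "analyzed":
                # value tokenized, terms kept per property
                entries = [("full:" + rel_name, t) for t in value.lower().split()]
            elif mode == "nodeScopeIndex":
                # value tokenized, terms rolled up into one node-level field
                entries = [(":fulltext", t) for t in value.lower().split()]
            else:
                raise ValueError("unknown mode: " + mode)
            for field, term in entries:
                index.setdefault(field, {}).setdefault(term, set()).add(node)
    return index


if __name__ == "__main__":
    for mode in ("propertyIndex", "analyzed", "nodeScopeIndex"):
        print(mode)
        for field, terms in sorted(build_index(CONTENT, PROP_DEFS, mode).items()):
            print(field)
            for term, paths in sorted(terms.items()):
                print("  %s => %s" % (term, sorted(paths)))
```

The point of the sketch: {{propertyIndex}} keeps the whole value as one term, {{analyzed}} keeps tokens but still per property, and {{nodeScopeIndex}} drops the property name and keeps only tokens per node.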



[jira] [Comment Edited] (OAK-5707) [Oak lucene indexes] Clarify aggregates, nodeScopeIndex, propertyIndex, analyzed

2017-02-17 Thread David Gonzalez (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872536#comment-15872536
 ] 

David Gonzalez edited comment on OAK-5707 at 2/17/17 9:16 PM:
--

Including helpful offline conversations w/ Vikas. 

The following points require review for correctness. They are added here for 
convenience and to help shape the discussion, and should NOT be considered 
correct until the review has been finalized.

* Aggregates instruct Oak to fulltext-index any property found under the 
provided path pattern (ex. */*/*) (avoiding the complication of how it recurses 
through types... )
** By default all String and String[] properties are candidates for 
aggregation; however, other property types can be specified at the aggregation 
level.
* Specific property index definitions defined under indexRules are about "how 
to index a specific property"
** The affected property is defined via a) a relative property path or b) a 
regex property path match from the node type the indexRules applies to.
* A property that can be reached by an aggregate rule pattern is treated the 
same as that property having an 
`indexRules//properties/myProp@nodeScopeIndex=true`
** The advantages of ALSO defining an indexRule for a property covered by an 
aggregate are:
*** The property can also be marked as a propertyIndex, which allows for more 
performant property-based equality matches
*** The property can be marked for special use (ex. useInSuggest, 
useInSpellcheck, boost, etc.), which may (depending on the applied special use 
properties) affect how it behaves in the aggregated search.

* `analyzed=true` in an indexRules prop def (say for `jcr:content/jcr:title`) 
allows for `...FROM [app:Page] WHERE CONTAINS([jcr:content/jcr:title], 'foo')`
** TBD what special use props are applicable to this (if any?)
* `propertyIndex=true` is for the condition you were asking about at first: 
`WHERE [jcr:content/jcr:title]='foo'`
** If the property is a candidate for aggregates or nodeScopeIndex (which, as 
noted, are ~ equivalent), equality property conditions (WHERE 
[jcr:content/jcr:title]='foo') may still appear fast and not report a traversal 
warning, as Oak is able to leverage the internal aggregate index to quickly 
isolate matches. That being said, for property equality checks, it is always as 
fast (if not faster) to define an indexRule for the property with 
`propertyIndex=true`
** TBD: clearly describe the considerations for equality matches when only 
using the aggregate index.
* aggregate and nodeScopeIndex are intended to roll content up into the index's 
"nodeType" index, so that content will be a candidate for fulltext searches 
against that node (vs. against a specific property), i.e. `WHERE 
CONTAINS(*, 'foo')`
* The `excludeFromAggregation` prop "disables" aggregate indexing of a prop 
that matches a prop def having `excludeFromAggregation`
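
As a quick cross-check of the notes above, the flag-to-query mapping can be restated as a small lookup. This is only a sketch in Python that pattern-matches the three condition shapes mentioned in this comment; `RULES` and `flag_for` are hypothetical names, nothing here is part of Oak, and the mapping itself is subject to the same pending review as the list above.

```python
import re

# Hypothetical helper: maps the shape of a JCR-SQL2 condition to the
# property-definition flag that the notes above associate with it.
RULES = [
    # CONTAINS(*, 'foo') -> node-level fulltext
    (re.compile(r"CONTAINS\(\s*\*\s*,", re.I), "nodeScopeIndex=true (or an aggregate)"),
    # CONTAINS([some/prop], 'foo') -> property-level fulltext
    (re.compile(r"CONTAINS\(\s*\[[^\]]+\]\s*,", re.I), "analyzed=true"),
    # [some/prop]='foo' -> property equality
    (re.compile(r"\[[^\]]+\]\s*=\s*'"), "propertyIndex=true"),
]


def flag_for(condition):
    """Return the flag suggested for a condition, or None if no shape matches."""
    for pattern, flag in RULES:
        if pattern.search(condition):
            return flag
    return None


if __name__ == "__main__":
    for cond in (
        "CONTAINS(*, 'foo')",
        "CONTAINS([jcr:content/jcr:title], 'foo')",
        "[jcr:content/jcr:title]='foo'",
    ):
        print(cond, "->", flag_for(cond))
```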


