[ 
https://issues.apache.org/jira/browse/OAK-5707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15881832#comment-15881832
 ] 

Vikas Saurabh edited comment on OAK-5707 at 2/24/17 2:57 AM:
-------------------------------------------------------------

In the spirit of laziness, and rationalizing that I need to do this before 
planning how to document: attaching [^OAK-5707.patch]. It should have been a 
main class, but test cases have better utility methods, so it's a test. 

It prints 3 types of index definitions and how the data is stored in each 
index. Current output is at \[0]. The index dump is of the form:
{noformat}
<fieldName1>
  <term1> => [<list of paths>]
  <term2> => [<list of paths>]
  ...
<fieldName2>
  ....
....
{noformat}
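
For readers who want to produce a dump in the same shape from their own data, here is a minimal sketch in plain Java. It is not the code from the attached patch: the class name {{IndexDumpSketch}} and the nested-map input are assumptions made for illustration, and it does not read a real Lucene index.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of OAK-5707.patch: renders an in-memory
// field -> term -> paths structure in the dump layout described above.
class IndexDumpSketch {

    static String render(Map<String, Map<String, List<String>>> index) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Map<String, List<String>>> field : index.entrySet()) {
            // Field name on its own line, not indented.
            out.append(field.getKey()).append('\n');
            for (Map.Entry<String, List<String>> term : field.getValue().entrySet()) {
                // Each term is indented by two spaces; List.toString() already
                // yields the "[a, b]" form used in the dump.
                out.append("  ").append(term.getKey())
                   .append(" => ").append(term.getValue()).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // TreeMap keeps fields and terms sorted; that ordering is a choice
        // made for this sketch only.
        Map<String, Map<String, List<String>>> index = new TreeMap<>();
        index.computeIfAbsent("foo", k -> new TreeMap<>())
             .put("fox jumping", Arrays.asList("/test"));
        index.computeIfAbsent("testChild/bar", k -> new TreeMap<>())
             .put("dog jumping", Arrays.asList("/test/test1", "/test"));
        System.out.print(render(index));
    }
}
```
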

It's just 3 new files, so the patch should cleanly apply. [~empire29], you 
might want to check it out and see if this shows what is getting stored.

Some more interesting index types that should show up here: aggregates (in a 
few forms), special properties like {{evaluatePathRestrictions}}, etc.

My next step is to add queries and their plans to the output. That should make 
it a bit clearer how the index would be queried.

I hope that, with enough shuffling, I'll get to a point where the relevant 
points can be documented succinctly.

PS: Somehow the content tree dump isn't following the order in which the 
property definitions are present in the content tree :-/. The real order of 
prop defs is {{foo}}, {{bar}}, {{allBar}}.

\[0]:
{noformat}
----------------CONTENT-------------------
+/test
  -foo = fox jumping
  +test1
    +testChild
      -bar = dog jumping
  +test2
    +testChild
      -barX = dog jumping
  +testChild
    -bar = dog jumping

----------------propIdx--------------
Definition
----------
+/oak:index/propIdx
  -includedPaths = [/test]
  -reindexCount = 1
  -compatVersion = 2
  -reindex = false
  -type = lucene
  -jcr:primaryType = oak:QueryIndexDefinition
  +indexRules
    -jcr:primaryType = nt:unstructured
    +nt:base
      -jcr:primaryType = nt:unstructured
      +properties
        -jcr:primaryType = nt:unstructured
        +allBar
          -name = testChild/ba.*
          -propertyIndex = true
          -isRegexp = true
          -jcr:primaryType = nt:unstructured
        +foo
          -name = foo
          -propertyIndex = true
          -jcr:primaryType = nt:unstructured
        +bar
          -name = testChild/bar
          -propertyIndex = true
          -jcr:primaryType = nt:unstructured
Index
-----
foo
  fox jumping => [/test]
testChild/bar
  dog jumping => [/test/test1, /test]
testChild/barX
  dog jumping => [/test/test2]

----------------analyzedIdx--------------
Definition
----------
+/oak:index/analyzedIdx
  -includedPaths = [/test]
  -reindexCount = 1
  -compatVersion = 2
  -reindex = false
  -type = lucene
  -jcr:primaryType = oak:QueryIndexDefinition
  +indexRules
    -jcr:primaryType = nt:unstructured
    +nt:base
      -jcr:primaryType = nt:unstructured
      +properties
        -jcr:primaryType = nt:unstructured
        +allBar
          -analyzed = true
          -name = testChild/ba.*
          -isRegexp = true
          -jcr:primaryType = nt:unstructured
        +foo
          -analyzed = true
          -name = foo
          -jcr:primaryType = nt:unstructured
        +bar
          -analyzed = true
          -name = testChild/bar
          -jcr:primaryType = nt:unstructured
Index
-----
:fulltext
  test => [/test]
  test1 => [/test/test1]
  test2 => [/test/test2]
full:foo
  fox => [/test]
  jumping => [/test]
full:testChild/bar
  dog => [/test/test1, /test]
  jumping => [/test/test1, /test]
full:testChild/barX
  dog => [/test/test2]
  jumping => [/test/test2]

----------------nodeScopedIdx--------------
Definition
----------
+/oak:index/nodeScopedIdx
  -includedPaths = [/test]
  -reindexCount = 1
  -compatVersion = 2
  -reindex = false
  -type = lucene
  -jcr:primaryType = oak:QueryIndexDefinition
  +indexRules
    -jcr:primaryType = nt:unstructured
    +nt:base
      -jcr:primaryType = nt:unstructured
      +properties
        -jcr:primaryType = nt:unstructured
        +allBar
          -nodeScopeIndex = true
          -name = testChild/ba.*
          -isRegexp = true
          -jcr:primaryType = nt:unstructured
        +foo
          -nodeScopeIndex = true
          -name = foo
          -jcr:primaryType = nt:unstructured
        +bar
          -nodeScopeIndex = true
          -name = testChild/bar
          -jcr:primaryType = nt:unstructured
Index
-----
:fulltext
  dog => [/test/test1, /test/test2, /test]
  fox => [/test]
  jumping => [/test/test1, /test/test2, /test]
  test => [/test]
  test1 => [/test/test1]
  test2 => [/test/test2]
  testchild => [/test/test1/testChild, /test/test2/testChild, /test/testChild]
{noformat}
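
As a reading aid for the dump above, the following toy model in plain Java mimics what the three flags appear to do with a property value, judging only from this output. It is not Oak's implementation: the class {{IndexFlagSketch}}, the whitespace tokenizer, and the full-match regex semantics are all assumptions, and the node-name terms that show up under {{:fulltext}} are deliberately left out.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Toy model inferred from the dump; every name here is hypothetical.
class IndexFlagSketch {
    enum Flag { PROPERTY_INDEX, ANALYZED, NODE_SCOPE_INDEX }

    // field -> term -> node paths (insertion-ordered, de-duplicated)
    final Map<String, Map<String, Set<String>>> fields = new TreeMap<>();

    // Assumption: a regex rule must match the whole relative property path.
    static boolean ruleMatches(String name, boolean isRegexp, String relPath) {
        return isRegexp ? Pattern.matches(name, relPath) : name.equals(relPath);
    }

    void add(Flag flag, String relPath, String value, String nodePath) {
        switch (flag) {
            case PROPERTY_INDEX:
                // Whole value becomes a single term under the property's own field.
                put(relPath, value, nodePath);
                break;
            case ANALYZED:
                // Value is tokenized; tokens go to a per-property "full:" field.
                for (String token : tokens(value)) put("full:" + relPath, token, nodePath);
                break;
            case NODE_SCOPE_INDEX:
                // Value is tokenized; tokens go to the shared ":fulltext" field.
                for (String token : tokens(value)) put(":fulltext", token, nodePath);
                break;
        }
    }

    // Assumption: lower-casing plus whitespace splitting stands in for the analyzer.
    private static String[] tokens(String value) {
        return value.toLowerCase(Locale.ROOT).trim().split("\\s+");
    }

    private void put(String field, String term, String nodePath) {
        if (term.isEmpty()) return; // split() yields one empty token for blank input
        fields.computeIfAbsent(field, k -> new TreeMap<>())
              .computeIfAbsent(term, k -> new LinkedHashSet<>())
              .add(nodePath);
    }

    public static void main(String[] args) {
        for (Flag flag : Flag.values()) {
            IndexFlagSketch idx = new IndexFlagSketch();
            // Sample content from the dump; nodePath is the node the relative
            // property path is resolved against.
            idx.add(flag, "testChild/bar", "dog jumping", "/test/test1");
            idx.add(flag, "testChild/barX", "dog jumping", "/test/test2");
            idx.add(flag, "testChild/bar", "dog jumping", "/test");
            idx.add(flag, "foo", "fox jumping", "/test");
            System.out.println(flag + " " + idx.fields);
        }
    }
}
```

In this model {{ruleMatches("testChild/ba.*", true, "testChild/barX")}} is true, which is consistent with {{barX}} appearing in the dump only because of the {{allBar}} rule.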



> [Oak lucene indexes] Clarify aggregates, nodeScopeIndex, propertyIndex, 
> analyzed
> --------------------------------------------------------------------------------
>
>                 Key: OAK-5707
>                 URL: https://issues.apache.org/jira/browse/OAK-5707
>             Project: Jackrabbit Oak
>          Issue Type: Documentation
>            Reporter: David Gonzalez
>            Assignee: Vikas Saurabh
>         Attachments: OAK-5707.patch
>
>
> Oak lucene documentation would benefit from clarifying the relationships and 
> expected behaviors around aggregates, nodeScopeIndex, propertyIndex and 
> analyzed.
> These features have some overlap in what they do and/or augment one another, 
> but to the lay-developer it is unclear how these work in concert and/or what 
> the implications of using the various features are.
> It's worth remembering many developers are under the mindset (shifting from 
> jackrabbit 2 -> oak) that oak indexing requires explicit inclusion of content 
> into search results; thus implicit content inclusion into indexes via 
> generalized aggregations (vs named properties) is unclear/unexpected to many.



