[ https://issues.apache.org/jira/browse/OAK-5707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15881832#comment-15881832 ]
Vikas Saurabh edited comment on OAK-5707 at 2/24/17 2:57 AM:
-------------------------------------------------------------

In the spirit of laziness, and rationalizing that I need to do this before planning how to document: attaching [^OAK-5707.patch]. It should have been a main class, but test cases just have better utility methods, so it's a test. It prints 3 types of index definitions and how the data is stored in the index for each. Current output is at \[0].

The index dump is of the form:
{noformat}
<fieldName1>
    <term1> => [<list of paths>]
    <term2> => [<list of paths>]
    ...
<fieldName2>
    ....
....
{noformat}

It's just 3 new files, so the patch should apply cleanly. [~empire29], you might want to check it out and see if this shows what is getting stored.

Some more interesting index types that should show up here: aggregates (in a few forms), special properties like {{evaluatePathRestrictions}}, etc.

My next step is to add queries and their plans to the output. That should make it a bit clearer how the index would be queried. I hope that, with enough shuffling, I'd get to a point where the relevant points could be documented succinctly.

PS: Somehow the content tree dump isn't following the order in which indices are present in the content tree :-/. The real order of prop defs is {{foo}}, {{bar}}, {{allBar}}.
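To make the dump layout concrete, here is a rough, self-contained sketch of how the three flags would lead to the field/term/path entries seen in the \[0] output for a single property. This is not Oak's indexing code and not what the patch does: the class {{IndexDumpSketch}}, its method names, and the naive tokenizer are all hypothetical, and the field naming ({{<name>}}, {{full:<name>}}, {{:fulltext}}) is only inferred from the dump. Other {{:fulltext}} terms that show up in \[0] (e.g. {{test}}, {{test1}}) are not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of the dump layout; NOT Oak's implementation.
final class IndexDumpSketch {
    // field name -> term -> paths; TreeMap keeps the dump order deterministic
    private final Map<String, Map<String, List<String>>> fields = new TreeMap<>();

    private void add(String field, String term, String path) {
        fields.computeIfAbsent(field, f -> new TreeMap<>())
              .computeIfAbsent(term, t -> new ArrayList<>())
              .add(path);
    }

    // Assumed effect of propertyIndex=true: whole value is one term under the property name.
    void indexProperty(String path, String name, String value) {
        add(name, value, path);
    }

    // Assumed effect of analyzed=true: each token goes under "full:" + property name.
    void indexAnalyzed(String path, String name, String value) {
        for (String token : tokenize(value)) {
            add("full:" + name, token, path);
        }
    }

    // Assumed effect of nodeScopeIndex=true: each token goes under the shared ":fulltext" field.
    void indexNodeScope(String path, String value) {
        for (String token : tokenize(value)) {
            add(":fulltext", token, path);
        }
    }

    // Naive stand-in for a real analyzer: lowercase, split on non-alphanumerics.
    static List<String> tokenize(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Renders "<field>" lines followed by indented "<term> => [<paths>]" lines.
    String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Map<String, List<String>>> field : fields.entrySet()) {
            sb.append(field.getKey()).append('\n');
            for (Map.Entry<String, List<String>> term : field.getValue().entrySet()) {
                sb.append("    ").append(term.getKey())
                  .append(" => ").append(term.getValue()).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        IndexDumpSketch idx = new IndexDumpSketch();
        // /test has foo = "fox jumping" in the content tree of [0]
        idx.indexProperty("/test", "foo", "fox jumping");
        idx.indexAnalyzed("/test", "foo", "fox jumping");
        idx.indexNodeScope("/test", "fox jumping");
        System.out.print(idx.dump());
    }
}
```

In the real dump each index definition uses only one of the three flags; the sketch applies all of them to the same property purely to show the different field names side by side.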
\[0]:
{noformat}
----------------CONTENT-------------------
+/test
    -foo = fox jumping
    +test1
        +testChild
            -bar = dog jumping
    +test2
        +testChild
            -barX = dog jumping
    +testChild
        -bar = dog jumping
----------------propIdx--------------
Definition
----------
+/oak:index/propIdx
    -includedPaths = [/test]
    -reindexCount = 1
    -compatVersion = 2
    -reindex = false
    -type = lucene
    -jcr:primaryType = oak:QueryIndexDefinition
    +indexRules
        -jcr:primaryType = nt:unstructured
        +nt:base
            -jcr:primaryType = nt:unstructured
            +properties
                -jcr:primaryType = nt:unstructured
                +allBar
                    -name = testChild/ba.*
                    -propertyIndex = true
                    -isRegexp = true
                    -jcr:primaryType = nt:unstructured
                +foo
                    -name = foo
                    -propertyIndex = true
                    -jcr:primaryType = nt:unstructured
                +bar
                    -name = testChild/bar
                    -propertyIndex = true
                    -jcr:primaryType = nt:unstructured
Index
-----
foo
    fox jumping => [/test]
testChild/bar
    dog jumping => [/test/test1, /test]
testChild/barX
    dog jumping => [/test/test2]
----------------analyzedIdx--------------
Definition
----------
+/oak:index/analyzedIdx
    -includedPaths = [/test]
    -reindexCount = 1
    -compatVersion = 2
    -reindex = false
    -type = lucene
    -jcr:primaryType = oak:QueryIndexDefinition
    +indexRules
        -jcr:primaryType = nt:unstructured
        +nt:base
            -jcr:primaryType = nt:unstructured
            +properties
                -jcr:primaryType = nt:unstructured
                +allBar
                    -analyzed = true
                    -name = testChild/ba.*
                    -isRegexp = true
                    -jcr:primaryType = nt:unstructured
                +foo
                    -analyzed = true
                    -name = foo
                    -jcr:primaryType = nt:unstructured
                +bar
                    -analyzed = true
                    -name = testChild/bar
                    -jcr:primaryType = nt:unstructured
Index
-----
:fulltext
    test => [/test]
    test1 => [/test/test1]
    test2 => [/test/test2]
full:foo
    fox => [/test]
    jumping => [/test]
full:testChild/bar
    dog => [/test/test1, /test]
    jumping => [/test/test1, /test]
full:testChild/barX
    dog => [/test/test2]
    jumping => [/test/test2]
----------------nodeScopedIdx--------------
Definition
----------
+/oak:index/nodeScopedIdx
    -includedPaths = [/test]
    -reindexCount = 1
    -compatVersion = 2
    -reindex = false
    -type = lucene
    -jcr:primaryType = oak:QueryIndexDefinition
    +indexRules
        -jcr:primaryType = nt:unstructured
        +nt:base
            -jcr:primaryType = nt:unstructured
            +properties
                -jcr:primaryType = nt:unstructured
                +allBar
                    -nodeScopeIndex = true
                    -name = testChild/ba.*
                    -isRegexp = true
                    -jcr:primaryType = nt:unstructured
                +foo
                    -nodeScopeIndex = true
                    -name = foo
                    -jcr:primaryType = nt:unstructured
                +bar
                    -nodeScopeIndex = true
                    -name = testChild/bar
                    -jcr:primaryType = nt:unstructured
Index
-----
:fulltext
    dog => [/test/test1, /test/test2, /test]
    fox => [/test]
    jumping => [/test/test1, /test/test2, /test]
    test => [/test]
    test1 => [/test/test1]
    test2 => [/test/test2]
    testchild => [/test/test1/testChild, /test/test2/testChild, /test/testChild]
{noformat}

> [Oak lucene indexes] Clarify aggregates, nodeScopeIndex, propertyIndex, analyzed
> --------------------------------------------------------------------------------
>
>                 Key: OAK-5707
>                 URL: https://issues.apache.org/jira/browse/OAK-5707
>             Project: Jackrabbit Oak
>          Issue Type: Documentation
>            Reporter: David Gonzalez
>            Assignee: Vikas Saurabh
>         Attachments: OAK-5707.patch
>
>
> Oak lucene documentation would benefit from clarifying the relationships and
> expected behaviors around aggregates, nodeScopeIndex, propertyIndex and
> analyzed.
> These features have some overlap in what they do and/or augment one another,
> but to the lay developer it is unclear how they work in concert and/or what the
> implications of using the various features are.
> It's worth remembering that many developers are under the mindset (shifting from
> jackrabbit 2 -> oak) that oak indexing requires explicit inclusion of content
> into search results; thus implicit content inclusion into indexes via
> generalized aggregations (vs named properties) is unclear/unexpected to many.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)