[jira] [Commented] (OAK-5499) IndexUpdate can do multiple traversal of a content tree during initial index when there are sub-root indices
[ https://issues.apache.org/jira/browse/OAK-5499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901161#comment-15901161 ]

Vikas Saurabh commented on OAK-5499:

Ok, I think I now understand what's going on (and that's different from my idea of how-stuff-works when I logged this issue).

About the patch, I think the change is useful for full reindexing mode. And afaiu, Chetan's concern isn't easy to fit in because the {{before}} state for {{IndexUpdate}} is different from that required by reindexing-editors (EMPTY).

I wonder if it's ok/possible to hook into the EMPTY-vs-CURRENT diff being done for reindexing editors to collect the editors too? I mean it does process the normal diff - but if it "finds" that it needs to reindex a few in the current one, then it by-passes the usual calls being sent in until it leaves the sub-tree. In the mean-time (while it's by-passing calls), it hands over another editor for the EMPTY-vs-CURRENT diff to "find" other ones.

(I understand the para above is very hand-wavy - but that's the best I could do with my current understanding and words :-/.. also, I wonder how 3 level indices would behave then).

[~chetanm], would it make sense to get this state in and open another one to handle the non-full-reindex case? (assuming doing this completely would take a bit more time)

> IndexUpdate can do multiple traversal of a content tree during initial index when there are sub-root indices
>
> Key: OAK-5499
> URL: https://issues.apache.org/jira/browse/OAK-5499
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: core
> Reporter: Vikas Saurabh
> Assignee: Alex Parvulescu
> Priority: Minor
> Fix For: 1.8
>
> Attachments: OAK-5499.patch, OAK-5499-v2-demo.patch, OAK-5499-v2-fix.patch
>
>
> In case we've index defs such as:
> {noformat}
> /oak:index/foo1Index
> /content
>     /oak:index/foo2Index
> {noformat}
> then initial indexing process \[0] would traverse tree under {{/content}}
> twice - once while indexing for top-level indices and next when it starts to
> index newly discovered {{foo2Index}} while traversing {{/content/oak:index}}.
> What we can do is that while first diff processes {{/content}} and discovers
> a node named {{oak:index}}, it can actively go in that tree and peek into
> index defs from under it and register as required. The diff can then proceed
> under {{/content}} while the new indices would also get diffs (avoiding
> another traversal)
> \[0] first time indexing or in case {{/:async}} gets deleted or checkpoint
> for async index couldn't be retrieved

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
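The proposal in the issue description above (peek into {{oak:index}} while the first diff passes through, register what is found, keep going) can be illustrated with a toy traversal. This is only a sketch under stated assumptions: ToyNode and SinglePassIndexing are hypothetical names that model the repository as an in-memory tree, they are not Oak's NodeState, Editor or IndexUpdate, and the {{before}}-state problem raised in the comment is not modelled at all.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory node; a hypothetical stand-in, not Oak's NodeState. */
final class ToyNode {
    final Map<String, ToyNode> children = new LinkedHashMap<>();

    /** Returns the named child, creating it on first access. */
    ToyNode child(String name) {
        return children.computeIfAbsent(name, n -> new ToyNode());
    }
}

/** Hypothetical single-pass traversal that registers nested indexes as it finds them. */
final class SinglePassIndexing {
    /** Number of content nodes visited; each node is visited exactly once. */
    int visited = 0;
    /** Index definition path -> paths of the content nodes that index was fed. */
    final Map<String, List<String>> fed = new LinkedHashMap<>();

    /** Builds the layout from the issue description plus two content nodes. */
    static ToyNode exampleTree() {
        ToyNode root = new ToyNode();
        root.child("oak:index").child("foo1Index");
        ToyNode content = root.child("content");
        content.child("oak:index").child("foo2Index");
        content.child("a");
        content.child("b");
        return root;
    }

    /** Runs one traversal from the root and returns the collected statistics. */
    static SinglePassIndexing run(ToyNode root) {
        SinglePassIndexing pass = new SinglePassIndexing();
        pass.traverse("/", root, new ArrayList<>());
        return pass;
    }

    private void traverse(String path, ToyNode node, List<String> inherited) {
        visited++;
        List<String> active = new ArrayList<>(inherited);
        // Peek into oak:index before descending, so new indexes join this same pass.
        ToyNode defs = node.children.get("oak:index");
        if (defs != null) {
            for (String name : defs.children.keySet()) {
                String defPath = join(join(path, "oak:index"), name);
                fed.put(defPath, new ArrayList<>());
                active.add(defPath);
            }
        }
        // Every index registered at or above this node sees the node.
        for (String defPath : active) {
            fed.get(defPath).add(path);
        }
        for (Map.Entry<String, ToyNode> e : node.children.entrySet()) {
            // Simplification: index definition subtrees are not treated as content here.
            if (!e.getKey().equals("oak:index")) {
                traverse(join(path, e.getKey()), e.getValue(), active);
            }
        }
    }

    private static String join(String parent, String name) {
        return parent.equals("/") ? "/" + name : parent + "/" + name;
    }

    public static void main(String[] args) {
        SinglePassIndexing pass = run(exampleTree());
        System.out.println("visited=" + pass.visited);
        pass.fed.forEach((def, paths) -> System.out.println(def + " <- " + paths));
    }
}
```

In this toy model {{foo2Index}} receives {{/content}} and its children during the same pass that feeds {{foo1Index}}, so nothing under {{/content}} is visited a second time.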
[jira] [Commented] (OAK-5499) IndexUpdate can do multiple traversal of a content tree during initial index when there are sub-root indices
[ https://issues.apache.org/jira/browse/OAK-5499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901089#comment-15901089 ]

Vikas Saurabh commented on OAK-5499:

Assigning to Alex as he's actively working on this. I'd review in a bit, but Chetan's concern seems relevant.
[jira] [Commented] (OAK-5499) IndexUpdate can do multiple traversal of a content tree during initial index when there are sub-root indices
[ https://issues.apache.org/jira/browse/OAK-5499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901082#comment-15901082 ]

Chetan Mehrotra commented on OAK-5499:

The approach is neat! However, it only addresses the case of the initial index. Now consider a case where we have index definitions at
{noformat}
/oak:index
    /fooIndex
    /barIndex
/content
    /project
        /oak:index
            /foo2Index
{noformat}
And now we have to reindex the fooIndex and foo2Index. With the current approach this would result in 2 traversals. So we need a way to handle such cases also.
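To make the "2 traversals" point concrete, here is a rough sketch of how one could work out how many traversals are strictly needed for a set of definitions that have to be reindexed: map each definition to the subtree it covers and drop every subtree that already sits below another one. ReindexPlanner and its methods are hypothetical helpers invented for this illustration, not Oak API, and the sketch says nothing about how IndexUpdate would actually drive the editors.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical helper; none of these names come from Oak. */
final class ReindexPlanner {
    private static final String MARKER = "/oak:index/";

    /** Root of the subtree an index definition covers, e.g. "/content/project". */
    static String coveredRoot(String defPath) {
        int i = defPath.lastIndexOf(MARKER);
        if (i < 0) {
            throw new IllegalArgumentException("not an index definition path: " + defPath);
        }
        return i == 0 ? "/" : defPath.substring(0, i);
    }

    /** True if path equals ancestor or lies somewhere below it. */
    static boolean isSameOrBelow(String ancestor, String path) {
        return ancestor.equals("/") || path.equals(ancestor) || path.startsWith(ancestor + "/");
    }

    /** Smallest set of subtree roots whose traversal reaches every flagged index. */
    static List<String> traversalRoots(Collection<String> reindexDefPaths) {
        // Natural string order puts an ancestor path before its descendants.
        TreeSet<String> sorted = new TreeSet<>();
        for (String def : reindexDefPaths) {
            sorted.add(coveredRoot(def));
        }
        List<String> roots = new ArrayList<>();
        for (String candidate : sorted) {
            boolean covered = false;
            for (String kept : roots) {
                if (isSameOrBelow(kept, candidate)) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                roots.add(candidate);
            }
        }
        return roots;
    }

    public static void main(String[] args) {
        // fooIndex and foo2Index from the comment; barIndex is not being reindexed.
        System.out.println(traversalRoots(List.of(
                "/oak:index/fooIndex",
                "/content/project/oak:index/foo2Index")));
    }
}
```

For the layout in the comment both definitions collapse to the single root {{/}}, which is the sense in which one traversal ought to be enough for fooIndex and foo2Index together.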
[jira] [Commented] (OAK-5499) IndexUpdate can do multiple traversal of a content tree during initial index when there are sub-root indices
[ https://issues.apache.org/jira/browse/OAK-5499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15899615#comment-15899615 ]

Alex Parvulescu commented on OAK-5499:

[~chetanm], [~catholicon], [~tmueller] gentle ping, the patch needs some more eyes for review!
[jira] [Commented] (OAK-5499) IndexUpdate can do multiple traversal of a content tree during initial index when there are sub-root indices
[ https://issues.apache.org/jira/browse/OAK-5499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837817#comment-15837817 ]

Alex Parvulescu commented on OAK-5499:

bq. Collecting oak:index nodes as part of first reindex traversal would help in avoiding further traversal.

It wouldn't though, the traversal would happen anyways because that's how the diff currently works. I added a comment on OAK-5511, but I don't see it as helping much here.
[jira] [Commented] (OAK-5499) IndexUpdate can do multiple traversal of a content tree during initial index when there are sub-root indices
[ https://issues.apache.org/jira/browse/OAK-5499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837160#comment-15837160 ]

Chetan Mehrotra commented on OAK-5499:

Also opened OAK-5511 to reduce cost of such index definition "discovery"
[jira] [Commented] (OAK-5499) IndexUpdate can do multiple traversal of a content tree during initial index when there are sub-root indices
[ https://issues.apache.org/jira/browse/OAK-5499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837149#comment-15837149 ]

Chetan Mehrotra commented on OAK-5499:

Collecting oak:index nodes as part of the first reindex traversal would help in avoiding further traversal. Another option would be to use the nodetype index and look for oak:QueryIndexDefinition entries. This can then be used to determine the location of oak:index nodes in the whole repository.
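The nodetype-index option can be sketched as a lookup followed by a parent-path computation. The sketch below is an illustration only: it assumes the nodetype index can be modelled as a plain map from node type name to node paths, and IndexDefinitionLookup is a hypothetical name. In a real repository the lookup would go through the query engine (for example a JCR-SQL2 statement selecting from [oak:QueryIndexDefinition]), which is not shown here.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical lookup over a toy node type index; not Oak API. */
final class IndexDefinitionLookup {
    /**
     * Derives the oak:index node locations from the paths of all
     * oak:QueryIndexDefinition nodes listed in the toy node type index
     * (node type name -> paths of nodes of that type).
     */
    static SortedSet<String> oakIndexLocations(Map<String, List<String>> nodeTypeIndex) {
        SortedSet<String> locations = new TreeSet<>();
        List<String> defs = nodeTypeIndex.getOrDefault("oak:QueryIndexDefinition", List.of());
        for (String defPath : defs) {
            int slash = defPath.lastIndexOf('/');
            if (slash < 0) {
                continue; // not an absolute path, ignore
            }
            String parent = defPath.substring(0, slash);
            // Only definitions that sit directly under an oak:index node count.
            if (parent.endsWith("/oak:index")) {
                locations.add(parent);
            }
        }
        return locations;
    }

    public static void main(String[] args) {
        Map<String, List<String>> toyIndex = Map.of(
                "oak:QueryIndexDefinition",
                List.of("/oak:index/foo1Index", "/content/oak:index/foo2Index"),
                "nt:unstructured",
                List.of("/content"));
        System.out.println(oakIndexLocations(toyIndex));
    }
}
```

With the locations known up front, the definitions under them could be registered before the content traversal starts, rather than being discovered part-way through it.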
[jira] [Commented] (OAK-5499) IndexUpdate can do multiple traversal of a content tree during initial index when there are sub-root indices
[ https://issues.apache.org/jira/browse/OAK-5499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15833697#comment-15833697 ]

Varun Mehrotra commented on OAK-5499:

Thanks [~catholicon] for logging this issue. Can we increase the priority of this?