[jira] [Commented] (OAK-2807) Improve getSize performance for "public" content
[ https://issues.apache.org/jira/browse/OAK-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14591761#comment-14591761 ] angela commented on OAK-2807: - hi michael {quote} There is a very common special case: content (a subtree) that is readable by everyone (anonymous). {quote} unfortunately it only looks like being readable to everyone because the access-checks will filter out all 'meta' data that is not readable. it's only the 'regular' content that is readable. special things like e.g. the policy that opens up the read-access for everyone is not world-readable, nor are other special data like e.g. analytics data. {quote} If we mark an index on that subtree as "readable by everyone" on index creation then we could skip ACL check on the result set or precompute/cache certain query results. {quote} the problem is that you don't know if it is really readable by everyone for the following reasons: - any policy _above_ your target tree denying access for a user-principal will take precedence - any policy _below_ (e.g. CUG) that looks down access again will make your test for 'readable by everyone' become wrong - with OAK-1268 and having the latter covered by dedicated policies looking a given implementation of {{AccessControlPolicy}} and {{AccessControlEntry}} will no longer reflect the complete picture - with OAK-2008 the permission evaluation will become a multiplexing one and there will be simple shortcut to determine 'readable-for-everyone-for-the-whole-subtree' any more (if it ever was possible). - {{TreePermissions.canReadAll()}} is already intended to provide exact the short-cut you are proposing and the reason why this only returns {{true}} for the administrative access is, that i didn't see a scalable way to predict this! {quote} In order to avoid information leakage the index would have to be marked "invalid" as soon as one node in that sub-tree is not readable by everyone anymore. (could be checked through a commit hook) {quote} if that was _really_ feasible... why not... but so far i don't see how this would work reliably for _every_ combination of principals. as you can see in {{TreePermission}} I added this shortcut because I thought that it might be doable but so far I didn't find a solution for this except for the the trivial case. {quote} Maybe this concept could even be generalized later to work with other principals than everyone. {quote} the problem is not _everyone_ versus some other principals... if we have a concept that _really_ works, it doesn't matter on whether it's everyone or some other principal. btw: the same suggestion has been made by david ages ago, because he just made the same assumptions that this is easy to achieve. but unfortunately it's not if you look at it from a repository point of view and not from a demo-hack point of view. kind regards angela > Improve getSize performance for "public" content > > > Key: OAK-2807 > URL: https://issues.apache.org/jira/browse/OAK-2807 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: query, security >Affects Versions: 1.0.13, 1.2 >Reporter: Michael Marth > > Certain operations in the query engine like getting the size of a result set > or facets are expensive to compute due to the fact that ACLs need to be > computed on the entire result set. This issue is to discuss an idea how we > could improve this: > There is a very common special case: content (a subtree) that is readable by > everyone (anonymous). If we mark an index on that subtree as "readable by > everyone" on index creation then we could skip ACL check on the result set or > precompute/cache certain query results. > In order to avoid information leakage the index would have to be marked > "invalid" as soon as one node in that sub-tree is not readable by everyone > anymore. (could be checked through a commit hook) > Maybe this concept could even be generalized later to work with other > principals than everyone. > Just an idea - feel free to poke holes and shoot it down :) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-2807) Improve getSize performance for "public" content
[ https://issues.apache.org/jira/browse/OAK-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543465#comment-14543465 ] Davide Giannella commented on OAK-2807: --- I like the idea but as of now I don't know how to easily/cleanly solve it. Unfortunately the JCR API don't allow the {{count()}} function in query so we have to rely on the Iterator implementation. First the index should expose to the query engine the information about it can serve the counting. For example lucene can serve counts of the number of returned nodes, using the [TotalHitCountCollector|http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/TotalHitCountCollector.html]. So we should work out something along the following lines {code} diff --git a/oak-core/src/main/java/org/apache/jackrabbit/oak/spi/query/QueryIndex.java b/oak-core/src/main/java/org/apache/jackrabbit/oak/spi/query/QueryIndex.java index f86dbc0..20f7f63 100644 --- a/oak-core/src/main/java/org/apache/jackrabbit/oak/spi/query/QueryIndex.java +++ b/oak-core/src/main/java/org/apache/jackrabbit/oak/spi/query/QueryIndex.java @@ -309,6 +309,12 @@ public interface QueryIndex { Object getAttribute(String name); /** + * + * @return true if the current index could serve the count of returned nodes + */ +boolean isCountIndex(); + +/** * A builder for index plans. */ public class Builder { @@ -320,6 +326,7 @@ public interface QueryIndex { protected boolean isDelayed; protected boolean isFulltextIndex; protected boolean includesNodeData; +protected boolean countIndex; protected List sortOrder; protected NodeState definition; protected PropertyRestriction propRestriction; @@ -386,6 +393,11 @@ public interface QueryIndex { return this; } +public Builder setCountIndex(final boolean countIndex) { +this.countIndex = countIndex; +return this; +} + public IndexPlan build() { return new IndexPlan() { @@ -417,6 +429,8 @@ public interface QueryIndex { private final Map attributes = Builder.this.attributes; +private final boolean countIndex = Builder.this.countIndex; + @Override public String toString() { return String.format( @@ -430,7 +444,8 @@ public interface QueryIndex { + " sortOrder : %s," + " definition : %s," + " propertyRestriction : %s," -+ " pathPrefix : %s }", ++ " pathPrefix : %s, " ++ " countIndex : %s }", costPerExecution, costPerEntry, estimatedEntryCount, @@ -441,11 +456,17 @@ public interface QueryIndex { sortOrder, definition, propRestriction, -pathPrefix +pathPrefix, +countIndex ); } @Override +public boolean isCountIndex() { +return countIndex; +} + +@Override public double getCostPerExecution() { return costPerExecution; } {code} Then we should work on the Iterator side in oak making it aware of the executed plan and in case rely on the index itself. The iterator ({{RowIterator}} and {{NodeIterator}}) is initialised in [QueryResultImpl|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/query/QueryResultImpl.java] We'll probably need to implement an Oak version of the Iterators which takes in account the executed plan/query. Unfortunately the information about the plan and the index is lost way down the chain in the QueryImpl. So far I didn't find any clear way to pass this information along. > Improve getSize performance for "public" content > > > Key: OAK-2807 > URL: https://issues.apache.org/jira/browse/OAK-2807 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: query, security >Affects Versions: 1.0.13, 1.2 >Reporter: Michael Marth > > Certain operations in the query engine like getting the size of a result
[jira] [Commented] (OAK-2807) Improve getSize performance for "public" content
[ https://issues.apache.org/jira/browse/OAK-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511865#comment-14511865 ] Alexander Klimetschek commented on OAK-2807: Sounds great. The security folks will argue that the invalidation is extremely important, though in reality it would never occur. The common case of a public site at say /content/mysite that would always be public, especially on a published environment, should benefit greatly from that. The prerequisite of separate indexes with one for say /content/mysite in particular was done with Oak already, it's time to make use of it :) > Improve getSize performance for "public" content > > > Key: OAK-2807 > URL: https://issues.apache.org/jira/browse/OAK-2807 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: query, security >Affects Versions: 1.0.13, 1.2 >Reporter: Michael Marth > > Certain operations in the query engine like getting the size of a result set > or facets are expensive to compute due to the fact that ACLs need to be > computed on the entire result set. This issue is to discuss an idea how we > could improve this: > There is a very common special case: content (a subtree) that is readable by > everyone (anonymous). If we mark an index on that subtree as "readable by > everyone" on index creation then we could skip ACL check on the result set or > precompute/cache certain query results. > In order to avoid information leakage the index would have to be marked > "invalid" as soon as one node in that sub-tree is not readable by everyone > anymore. (could be checked through a commit hook) > Maybe this concept could even be generalized later to work with other > principals than everyone. > Just an idea - feel free to poke holes and shoot it down :) -- This message was sent by Atlassian JIRA (v6.3.4#6332)