[jira] [Commented] (OAK-2807) Improve getSize performance for public content

2015-06-18 Thread angela (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14591761#comment-14591761
 ] 

angela commented on OAK-2807:
-

hi michael

{quote}
There is a very common special case: content (a subtree) that is readable by 
everyone (anonymous). 
{quote}

unfortunately it only looks like being readable to everyone because the 
access-checks will filter out all 'meta' data that is not readable. it's only 
the 'regular' content that is readable. special things like e.g. the policy 
that opens up the read-access for everyone is not world-readable, nor are other 
special data like e.g. analytics data.

{quote}
If we mark an index on that subtree as readable by everyone on index creation 
then we could skip ACL check on the result set or precompute/cache certain 
query results.
{quote}

the problem is that you don't know if it is really readable by everyone for the 
following reasons:
- any policy _above_ your target tree denying access for a  user-principal will 
take precedence
- any policy _below_ (e.g. CUG) that looks down access again will make your 
test for 'readable by everyone' become wrong
- with OAK-1268 and having the latter covered by dedicated policies looking a 
given implementation of {{AccessControlPolicy}} and {{AccessControlEntry}} will 
no longer reflect the complete picture
- with OAK-2008 the permission evaluation will become a multiplexing one and 
there will be simple shortcut to determine 
'readable-for-everyone-for-the-whole-subtree' any more (if it ever was 
possible).
- {{TreePermissions.canReadAll()}} is already intended to provide exact the 
short-cut you are proposing and the reason why this only returns {{true}} for 
the administrative access is, that i didn't see a scalable way to predict this!

{quote}
In order to avoid information leakage the index would have to be marked 
invalid as soon as one node in that sub-tree is not readable by everyone 
anymore. (could be checked through a commit hook)
{quote}

if that was _really_ feasible... why not... but so far i don't see how this 
would work reliably for _every_ combination of principals. 
as you can see in {{TreePermission}} I added this shortcut because I thought 
that it might be doable but so far I didn't find a solution for this except 
for the the trivial case.

{quote}
Maybe this concept could even be generalized later to work with other 
principals than everyone.
{quote}

the problem is not _everyone_ versus some other principals... if we have a 
concept that _really_ works, it doesn't matter on whether it's everyone or some 
other principal.

btw: the same suggestion has been made by david ages ago, because he just made 
the same assumptions that this is easy to achieve. but unfortunately it's not 
if you look at it from a repository point of view and not from a demo-hack 
point of view.

kind regards
angela

 Improve getSize performance for public content
 

 Key: OAK-2807
 URL: https://issues.apache.org/jira/browse/OAK-2807
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: query, security
Affects Versions: 1.0.13, 1.2
Reporter: Michael Marth

 Certain operations in the query engine like getting the size of a result set 
 or facets are expensive to compute due to the fact that ACLs need to be 
 computed on the entire result set. This issue is to discuss an idea how we 
 could improve this:
 There is a very common special case: content (a subtree) that is readable by 
 everyone (anonymous). If we mark an index on that subtree as readable by 
 everyone on index creation then we could skip ACL check on the result set or 
  precompute/cache certain query results.
 In order to avoid information leakage the index would have to be marked 
 invalid as soon as one node in that sub-tree is not readable by everyone 
 anymore. (could be checked through a commit hook)
 Maybe this concept could even be generalized later to work with other 
 principals than everyone.
 Just an idea - feel free to poke holes and shoot it down :)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-2807) Improve getSize performance for public content

2015-05-14 Thread Davide Giannella (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543465#comment-14543465
 ] 

Davide Giannella commented on OAK-2807:
---

I like the idea but as of now I don't know how to easily/cleanly solve
it. Unfortunately the JCR API don't allow the {{count()}} function in
query so we have to rely on the Iterator implementation.

First the index should expose to the query engine the information
about it can serve the counting. 

For example lucene can serve counts of the number of returned nodes,
using the 
[TotalHitCountCollector|http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/TotalHitCountCollector.html].


So we should work out something along the following lines

{code}
diff --git 
a/oak-core/src/main/java/org/apache/jackrabbit/oak/spi/query/QueryIndex.java 
b/oak-core/src/main/java/org/apache/jackrabbit/oak/spi/query/QueryIndex.java
index f86dbc0..20f7f63 100644
--- a/oak-core/src/main/java/org/apache/jackrabbit/oak/spi/query/QueryIndex.java
+++ b/oak-core/src/main/java/org/apache/jackrabbit/oak/spi/query/QueryIndex.java
@@ -309,6 +309,12 @@ public interface QueryIndex {
 Object getAttribute(String name);
 
 /**
+ * 
+ * @return true if the current index could serve the count of returned 
nodes
+ */
+boolean isCountIndex();
+
+/**
  * A builder for index plans.
  */
 public class Builder {
@@ -320,6 +326,7 @@ public interface QueryIndex {
 protected boolean isDelayed;
 protected boolean isFulltextIndex;
 protected boolean includesNodeData;
+protected boolean countIndex;
 protected ListOrderEntry sortOrder;
 protected NodeState definition;
 protected PropertyRestriction propRestriction;
@@ -386,6 +393,11 @@ public interface QueryIndex {
return this;
 }
 
+public Builder setCountIndex(final boolean countIndex) {
+this.countIndex = countIndex;
+return this;
+}
+
 public IndexPlan build() {
 
 return new IndexPlan() {
@@ -417,6 +429,8 @@ public interface QueryIndex {
 private final MapString, Object attributes =
 Builder.this.attributes;
 
+private final boolean countIndex = Builder.this.countIndex;
+
 @Override
 public String toString() {
 return String.format(
@@ -430,7 +444,8 @@ public interface QueryIndex {
 +  sortOrder : %s,
 +  definition : %s,
 +  propertyRestriction : %s,
-+  pathPrefix : %s },
++  pathPrefix : %s, 
++  countIndex : %s },
 costPerExecution,
 costPerEntry,
 estimatedEntryCount,
@@ -441,11 +456,17 @@ public interface QueryIndex {
 sortOrder,
 definition,
 propRestriction,
-pathPrefix
+pathPrefix,
+countIndex
 );
 }
 
 @Override
+public boolean isCountIndex() {
+return countIndex;
+}
+
+@Override
 public double getCostPerExecution() {
 return costPerExecution;
 }

{code}

Then we should work on the Iterator side in oak making it aware of the
executed plan and in case rely on the index itself. 

The iterator ({{RowIterator}} and {{NodeIterator}}) is initialised in
[QueryResultImpl|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/query/QueryResultImpl.java]

We'll probably need to implement an Oak version of the Iterators which
takes in account the executed plan/query. Unfortunately the information
about the plan and the index is lost way down the chain in the
QueryImpl. So far I didn't find any clear way to pass this information
along.





 Improve getSize performance for public content
 

 Key: OAK-2807
 URL: https://issues.apache.org/jira/browse/OAK-2807
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: query, security
Affects Versions: 1.0.13, 1.2
Reporter: Michael Marth

 Certain operations in the query engine like getting the size of a result 

[jira] [Commented] (OAK-2807) Improve getSize performance for public content

2015-04-24 Thread Alexander Klimetschek (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511865#comment-14511865
 ] 

Alexander Klimetschek commented on OAK-2807:


Sounds great. The security folks will argue that the invalidation is extremely 
important, though in reality it would never occur. The common case of a public 
site at say /content/mysite that would always be public, especially on a 
published environment, should benefit greatly from that. The prerequisite of 
separate indexes with one for say /content/mysite in particular was done with 
Oak already, it's time to make use of it :)

 Improve getSize performance for public content
 

 Key: OAK-2807
 URL: https://issues.apache.org/jira/browse/OAK-2807
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: query, security
Affects Versions: 1.0.13, 1.2
Reporter: Michael Marth

 Certain operations in the query engine like getting the size of a result set 
 or facets are expensive to compute due to the fact that ACLs need to be 
 computed on the entire result set. This issue is to discuss an idea how we 
 could improve this:
 There is a very common special case: content (a subtree) that is readable by 
 everyone (anonymous). If we mark an index on that subtree as readable by 
 everyone on index creation then we could skip ACL check on the result set or 
  precompute/cache certain query results.
 In order to avoid information leakage the index would have to be marked 
 invalid as soon as one node in that sub-tree is not readable by everyone 
 anymore. (could be checked through a commit hook)
 Maybe this concept could even be generalized later to work with other 
 principals than everyone.
 Just an idea - feel free to poke holes and shoot it down :)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)