subject:"\[jira\] \[Commented\] \(OAK\-2807\) Improve getSize performance for public content"

[jira] [Commented] (OAK-2807) Improve getSize performance for public content

2015-06-18 Thread angela (JIRA)

[
https://issues.apache.org/jira/browse/OAK-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14591761#comment-14591761
]

angela commented on OAK-2807:
-

hi michael

{quote}
There is a very common special case: content (a subtree) that is readable by
everyone (anonymous).
{quote}

unfortunately it only looks like being readable to everyone because the
access-checks will filter out all 'meta' data that is not readable. it's only
the 'regular' content that is readable. special things like e.g. the policy
that opens up the read-access for everyone is not world-readable, nor are other
special data like e.g. analytics data.

{quote}
If we mark an index on that subtree as readable by everyone on index creation
then we could skip ACL check on the result set or precompute/cache certain
query results.
{quote}

the problem is that you don't know if it is really readable by everyone for the
following reasons:
- any policy _above_ your target tree denying access for a user-principal will
take precedence
- any policy _below_ (e.g. CUG) that looks down access again will make your
test for 'readable by everyone' become wrong
- with OAK-1268 and having the latter covered by dedicated policies looking a
given implementation of {{AccessControlPolicy}} and {{AccessControlEntry}} will
no longer reflect the complete picture
- with OAK-2008 the permission evaluation will become a multiplexing one and
there will be simple shortcut to determine
'readable-for-everyone-for-the-whole-subtree' any more (if it ever was
possible).
- {{TreePermissions.canReadAll()}} is already intended to provide exact the
short-cut you are proposing and the reason why this only returns {{true}} for
the administrative access is, that i didn't see a scalable way to predict this!

{quote}
In order to avoid information leakage the index would have to be marked
invalid as soon as one node in that sub-tree is not readable by everyone
anymore. (could be checked through a commit hook)
{quote}

if that was _really_ feasible... why not... but so far i don't see how this
would work reliably for _every_ combination of principals.
as you can see in {{TreePermission}} I added this shortcut because I thought
that it might be doable but so far I didn't find a solution for this except
for the the trivial case.

{quote}
Maybe this concept could even be generalized later to work with other
principals than everyone.
{quote}

the problem is not _everyone_ versus some other principals... if we have a
concept that _really_ works, it doesn't matter on whether it's everyone or some
other principal.

btw: the same suggestion has been made by david ages ago, because he just made
the same assumptions that this is easy to achieve. but unfortunately it's not
if you look at it from a repository point of view and not from a demo-hack
point of view.

kind regards
angela

Improve getSize performance for public content

Key: OAK-2807
URL: https://issues.apache.org/jira/browse/OAK-2807
Project: Jackrabbit Oak
Issue Type: Improvement
Components: query, security
Affects Versions: 1.0.13, 1.2
Reporter: Michael Marth

Certain operations in the query engine like getting the size of a result set
or facets are expensive to compute due to the fact that ACLs need to be
computed on the entire result set. This issue is to discuss an idea how we
could improve this:
There is a very common special case: content (a subtree) that is readable by
everyone (anonymous). If we mark an index on that subtree as readable by
everyone on index creation then we could skip ACL check on the result set or
precompute/cache certain query results.
In order to avoid information leakage the index would have to be marked
invalid as soon as one node in that sub-tree is not readable by everyone
anymore. (could be checked through a commit hook)
Maybe this concept could even be generalized later to work with other
principals than everyone.
Just an idea - feel free to poke holes and shoot it down :)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (OAK-2807) Improve getSize performance for public content

2015-05-14 Thread Davide Giannella (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543465#comment-14543465
 ] 

Davide Giannella commented on OAK-2807:
---

I like the idea but as of now I don't know how to easily/cleanly solve
it. Unfortunately the JCR API don't allow the {{count()}} function in
query so we have to rely on the Iterator implementation.

First the index should expose to the query engine the information
about it can serve the counting. 

For example lucene can serve counts of the number of returned nodes,
using the 
[TotalHitCountCollector|http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/TotalHitCountCollector.html].


So we should work out something along the following lines

{code}
diff --git 
a/oak-core/src/main/java/org/apache/jackrabbit/oak/spi/query/QueryIndex.java 
b/oak-core/src/main/java/org/apache/jackrabbit/oak/spi/query/QueryIndex.java
index f86dbc0..20f7f63 100644
--- a/oak-core/src/main/java/org/apache/jackrabbit/oak/spi/query/QueryIndex.java
+++ b/oak-core/src/main/java/org/apache/jackrabbit/oak/spi/query/QueryIndex.java
@@ -309,6 +309,12 @@ public interface QueryIndex {
 Object getAttribute(String name);
 
 /**
+ * 
+ * @return true if the current index could serve the count of returned 
nodes
+ */
+boolean isCountIndex();
+
+/**
  * A builder for index plans.
  */
 public class Builder {
@@ -320,6 +326,7 @@ public interface QueryIndex {
 protected boolean isDelayed;
 protected boolean isFulltextIndex;
 protected boolean includesNodeData;
+protected boolean countIndex;
 protected ListOrderEntry sortOrder;
 protected NodeState definition;
 protected PropertyRestriction propRestriction;
@@ -386,6 +393,11 @@ public interface QueryIndex {
return this;
 }
 
+public Builder setCountIndex(final boolean countIndex) {
+this.countIndex = countIndex;
+return this;
+}
+
 public IndexPlan build() {
 
 return new IndexPlan() {
@@ -417,6 +429,8 @@ public interface QueryIndex {
 private final MapString, Object attributes =
 Builder.this.attributes;
 
+private final boolean countIndex = Builder.this.countIndex;
+
 @Override
 public String toString() {
 return String.format(
@@ -430,7 +444,8 @@ public interface QueryIndex {
 +  sortOrder : %s,
 +  definition : %s,
 +  propertyRestriction : %s,
-+  pathPrefix : %s },
++  pathPrefix : %s, 
++  countIndex : %s },
 costPerExecution,
 costPerEntry,
 estimatedEntryCount,
@@ -441,11 +456,17 @@ public interface QueryIndex {
 sortOrder,
 definition,
 propRestriction,
-pathPrefix
+pathPrefix,
+countIndex
 );
 }
 
 @Override
+public boolean isCountIndex() {
+return countIndex;
+}
+
+@Override
 public double getCostPerExecution() {
 return costPerExecution;
 }

{code}

Then we should work on the Iterator side in oak making it aware of the
executed plan and in case rely on the index itself. 

The iterator ({{RowIterator}} and {{NodeIterator}}) is initialised in
[QueryResultImpl|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/query/QueryResultImpl.java]

We'll probably need to implement an Oak version of the Iterators which
takes in account the executed plan/query. Unfortunately the information
about the plan and the index is lost way down the chain in the
QueryImpl. So far I didn't find any clear way to pass this information
along.





 Improve getSize performance for public content
 

 Key: OAK-2807
 URL: https://issues.apache.org/jira/browse/OAK-2807
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: query, security
Affects Versions: 1.0.13, 1.2
Reporter: Michael Marth

 Certain operations in the query engine like getting the size of a result

[jira] [Commented] (OAK-2807) Improve getSize performance for public content

2015-04-24 Thread Alexander Klimetschek (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511865#comment-14511865
 ] 

Alexander Klimetschek commented on OAK-2807:


Sounds great. The security folks will argue that the invalidation is extremely 
important, though in reality it would never occur. The common case of a public 
site at say /content/mysite that would always be public, especially on a 
published environment, should benefit greatly from that. The prerequisite of 
separate indexes with one for say /content/mysite in particular was done with 
Oak already, it's time to make use of it :)

 Improve getSize performance for public content
 

 Key: OAK-2807
 URL: https://issues.apache.org/jira/browse/OAK-2807
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: query, security
Affects Versions: 1.0.13, 1.2
Reporter: Michael Marth

 Certain operations in the query engine like getting the size of a result set 
 or facets are expensive to compute due to the fact that ACLs need to be 
 computed on the entire result set. This issue is to discuss an idea how we 
 could improve this:
 There is a very common special case: content (a subtree) that is readable by 
 everyone (anonymous). If we mark an index on that subtree as readable by 
 everyone on index creation then we could skip ACL check on the result set or 
  precompute/cache certain query results.
 In order to avoid information leakage the index would have to be marked 
 invalid as soon as one node in that sub-tree is not readable by everyone 
 anymore. (could be checked through a commit hook)
 Maybe this concept could even be generalized later to work with other 
 principals than everyone.
 Just an idea - feel free to poke holes and shoot it down :)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (OAK-2807) Improve getSize performance for public content

[jira] [Commented] (OAK-2807) Improve getSize performance for public content

[jira] [Commented] (OAK-2807) Improve getSize performance for public content

3 matches

Site Navigation

Mail list logo

Footer information