[jira] [Commented] (OAK-7182) Make it possible to update Guava
[ https://issues.apache.org/jira/browse/OAK-7182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558438#comment-17558438 ]

Dawid Iwo Cokan commented on OAK-7182:
--

I am available to support the decoupling task. As I understand it, the next task would be to replace the Guava references in the other bundles (OAK-8857). But from this thread I understand that there are public APIs that depend on Guava. What is the plan for those? We still have to depend on Guava to avoid a major version bump, correct?

> Make it possible to update Guava
> --------------------------------
>
>                 Key: OAK-7182
>                 URL: https://issues.apache.org/jira/browse/OAK-7182
>             Project: Jackrabbit Oak
>          Issue Type: Wish
>            Reporter: Julian Reschke
>            Priority: Minor
>         Attachments: GuavaTests.java, OAK-7182-guava-21-3.diff, OAK-7182-guava-21-4.diff, OAK-7182-guava-21.diff, OAK-7182-guava-23.6.1.diff, guava.diff
>
> We currently rely on Guava 15, and this affects all users of Oak because they essentially need to use the same version.
> This is an overall issue to investigate what would need to be done in Oak in order to make updates possible.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
[jira] [Commented] (OAK-7182) Make it possible to update Guava
[ https://issues.apache.org/jira/browse/OAK-7182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558428#comment-17558428 ]

Dawid Iwo Cokan commented on OAK-7182:
--

{quote}the question whether we use Guava, and if so how (directly, or by shading it) becomes secondary.{quote}

Yes, but it has a serious implication. Personally I am not a fan of shading, for the following reason: suppose we shade Guava, so that no one has to care about the version used. At the same time, no one can change it. What if a new security vulnerability is discovered tomorrow? In that case anyone who uses Oak with a given version cannot get rid of it until we release a new Oak that embeds the fixed Guava. That is why I feel shading stands in contrast to the idea of Maven and other package management tools.

If you feel an upgrade to Guava 22 makes sense, I can prepare a patch. I tried it locally and it seems to be an easy adoption.
[jira] [Commented] (OAK-7182) Make it possible to update Guava
[ https://issues.apache.org/jira/browse/OAK-7182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17557617#comment-17557617 ]

Dawid Iwo Cokan commented on OAK-7182:
--

Hi [~reschke],

We use Guava 15 as mentioned, but the problem with clients using a different version is not the version itself (nor the fact that Guava code is exposed in Oak APIs) but the fact that Guava makes incompatible changes, right? According to Guava's [statement|https://github.com/google/guava], starting from 22.0 all their APIs remain compatible unless they are {{@Beta}}:

{quote}APIs without {{@Beta}} will remain binary-compatible for the indefinite future. (Previously, we sometimes removed such APIs after a deprecation period. The last release to remove non-{{@Beta}} APIs was Guava 21.0.) Even {{@Deprecated}} APIs will remain (again, unless they are {{@Beta}}). We have no plans to start removing things again, but officially, we're leaving our options open in case of surprises (like, say, a serious security problem).{quote}

So correct me if I am wrong, but clients would be free to choose the Guava version (any version from 22.0 upwards) if we:
* depend on Guava 22.0
* declare Import-Package as [22.0,)
* include the [Guava Beta Checker|https://github.com/google/guava-beta-checker] in the build chain to ensure no code depends on APIs that may not exist in future releases
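For reference, the second and third bullet points might look roughly like this in a Maven build. The wiring shown here is an assumption for illustration (maven-bundle-plugin for the OSGi import range, Error Prone's plugin mechanism for the Beta Checker), not a tested Oak configuration; versions are placeholders:

```xml
<!-- Sketch only: plugin versions and exact wiring are assumptions. -->
<plugin>
  <groupId>org.apache.felix</groupId>
  <artifactId>maven-bundle-plugin</artifactId>
  <configuration>
    <instructions>
      <!-- let any Guava from 22.0 upwards satisfy the import at runtime -->
      <Import-Package>com.google.common.*;version="[22.0,)",*</Import-Package>
    </instructions>
  </configuration>
</plugin>

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <compilerArgs>
      <!-- fail the build when @Beta Guava APIs are used -->
      <arg>-Xplugin:ErrorProne -Xep:BetaApi:ERROR</arg>
    </compilerArgs>
    <annotationProcessorPaths>
      <path>
        <groupId>com.google.errorprone</groupId>
        <artifactId>error_prone_core</artifactId>
        <version>2.18.0</version>
      </path>
      <path>
        <groupId>com.google.guava</groupId>
        <artifactId>guava-beta-checker</artifactId>
        <version>1.0</version>
      </path>
    </annotationProcessorPaths>
  </configuration>
</plugin>
```
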
[jira] [Created] (OAK-9606) NodeType index is not ignored for facets
Dawid Iwo Cokan created OAK-9606:

             Summary: NodeType index is not ignored for facets
                 Key: OAK-9606
                 URL: https://issues.apache.org/jira/browse/OAK-9606
             Project: Jackrabbit Oak
          Issue Type: Bug
          Components: search
    Affects Versions: 1.6.21
            Reporter: Dawid Iwo Cokan

*Conditions:*
* Created 300 nodes of a new _tmp:document_ type
* Added a Lucene index for the above
* Configured facets for one of the properties (_tmp:tags_) of that node type

*Steps:*
* Run the following query:
{code:java}
SELECT [rep:facet(tmp:tags)] FROM [tmp:document] AS d WHERE (isdescendantnode([d], [/docsFolder]))
{code}
* Don't set a limit on the query

*Expected:* Since there is an index configured for that node type, the query engine should pick that index and return correct results.

*Current:* The NodeType index is selected as the least-cost index instead, and an empty result is returned.

*Notes:*
* If a limit is set on the query, the Lucene index beats the other indexes and everything works correctly
* If _contains('*', d)_ is added to the query, the NodeType index is ignored and the Lucene index is picked
* If too many documents are present, the Lucene index wins. This is because the Lucene index cannot report a cost higher than 1001, while the NodeType index cost grows with the number of documents

*Possible fix:* IMO the NodeType index should be discarded whenever the query has a facet condition; as far as I could see, only the full-text search condition is currently checked. Additionally, I am not sure why the query engine takes the limit into account for a facet query; as per my understanding, the limit has nothing to do with it in this case.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
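The cost behaviour in the last note can be illustrated with a toy model. The numbers follow the description above (Lucene cost capped at 1001, NodeType cost growing with the repository size), but the formulas are assumptions for illustration, not Oak's actual planner code:

```java
// Toy cost model illustrating the index selection described above.
// The exact formulas are assumptions, not Oak's real query planner.
public class IndexCostSketch {

    // Lucene-style cost: grows with the document count but is capped.
    static double luceneCost(int documents) {
        return Math.min(documents, 1000) + 1; // never exceeds 1001
    }

    // NodeType-style cost: proportional to the number of nodes of the type.
    static double nodeTypeCost(int documents) {
        return documents;
    }

    // The planner picks the index reporting the lowest cost.
    static String cheapestIndex(int documents) {
        return nodeTypeCost(documents) < luceneCost(documents) ? "nodeType" : "lucene";
    }

    public static void main(String[] args) {
        // With 300 documents the NodeType index undercuts the capped Lucene cost.
        System.out.println(cheapestIndex(300));
        // With 5000 documents the cap makes Lucene the cheaper index.
        System.out.println(cheapestIndex(5000));
    }
}
```

This mirrors the observation that the bug only manifests in small repositories: once the node count passes the Lucene cap, the Lucene index wins on cost alone.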
[jira] [Created] (OAK-9381) Access check delegated to query execution
Dawid Iwo Cokan created OAK-9381:

             Summary: Access check delegated to query execution
                 Key: OAK-9381
                 URL: https://issues.apache.org/jira/browse/OAK-9381
             Project: Jackrabbit Oak
          Issue Type: Wish
            Reporter: Dawid Iwo Cokan

We are implementing a system to manage documents based on Jackrabbit Oak. We store thousands of them and we have access rules set individually for every document (due to business requirements). We have configured the Lucene index to support all our queries, but some users in the system have access to only a small subset of documents. When one of these users invokes a search, it takes a long time because Oak will first use the index to read all results matching the constraints and only then check whether the user has access to each of them.

We evaluated how to improve this: we simply added an additional property to our document nodes holding the list of user ids who can read the particular node. Then we extended the definition of the Lucene index to include this field. Finally we ensured that all the queries we perform add a condition on that property. Now the results coming from the Lucene index match the current user's access 100% and performance is very good.

I am filing this as a Wish as it should certainly be discussed in a wider public, especially since there are known limitations/problems:
* Lucene would not support negation of the property, so if a node had DENY set for some principal it would still have to be checked in memory
* The property would be visible when reading a node, so we would have to ensure it gets hidden
* We would have to ensure the property stays aligned with the current state of the ACL, also when parent node settings are changed
* A principal can have child principals and can be resolved dynamically, so the finite list of all principal names who can access the node might vary over time
* In the case of inherited access, the same principal would have to be set on each node in the structure
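The workaround described above can be sketched in plain Java. The names and data structures are illustrative only (plain collections stand in for the repository and the Lucene index; nothing here is an Oak API): each document carries an indexed list of principals allowed to read it, and every query adds a membership condition on it, so the index only ever returns nodes the current user may read.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of the ACL-property workaround; names are illustrative only.
public class AclPropertySketch {

    // Each "document" stores the principals allowed to read it,
    // mirroring the extra indexed property described above.
    static final Map<String, List<String>> READ_PRINCIPALS = Map.of(
            "/docs/a", List.of("alice", "editors"),
            "/docs/b", List.of("bob"),
            "/docs/c", List.of("alice", "bob"));

    // Stand-in for a query whose index condition already includes the
    // current user's principal, so no post-filtering in memory is needed.
    static List<String> query(String principal) {
        return READ_PRINCIPALS.entrySet().stream()
                .filter(e -> e.getValue().contains(principal))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Only the paths alice may read come back: [/docs/a, /docs/c]
        System.out.println(query("alice"));
    }
}
```

The limitations listed above (DENY entries, dynamic group membership, inheritance) are exactly the cases this flat-list approach cannot express on its own.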
[jira] [Commented] (OAK-9376) Optionally reject queries with huge result sets
[ https://issues.apache.org/jira/browse/OAK-9376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17299507#comment-17299507 ]

Dawid Iwo Cokan commented on OAK-9376:
--

Yes indeed, this seems to be applicable to our case, so we will go for it.

> Optionally reject queries with huge result sets
> -----------------------------------------------
>
>                 Key: OAK-9376
>                 URL: https://issues.apache.org/jira/browse/OAK-9376
>             Project: Jackrabbit Oak
>          Issue Type: Wish
>          Components: query
>            Reporter: Manfred Baedke
>            Assignee: Manfred Baedke
>            Priority: Minor
>
> In cases where processing a result of a query uses a lot of memory and/or time (e.g. where filtering or ordering of many nodes in memory is required), an option to set an upper limit to the number of processed nodes and fail the query if the limit is exceeded would be useful.
[jira] [Commented] (OAK-9376) Optionally reject queries with huge result sets
[ https://issues.apache.org/jira/browse/OAK-9376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17298662#comment-17298662 ]

Dawid Iwo Cokan commented on OAK-9376:
--

Hi,

Actually I am in touch with Manfred and I suggested this improvement based on our case; let me share more details about it.

We are implementing a system that supports managing a lot of documents. In the system we store around 100k documents and we perform a number of searches. With that amount of data we assume the system can only work when we properly use an index, so any query that does not properly use an index is considered incorrect.

Now it happened that in some corner cases our indexes were unable to produce the results directly: Oak had to read the nodes into memory and perform in-memory filtering. For instance, our index was incorrectly set up for one of the ordering conditions. When we performed a search we simply wanted to read the first 10 items in a given order. The query had one constraint that matched the index definition, so the query planner picked the index, although in the end it had to read all nodes into memory and order them. This makes the search very slow, so users don't wait for it to finish, assume it simply failed and retry, which leads to overloading the system.

It would be much more convenient to be able to set a maximum number of nodes the query iterator can read, similar to the existing traversal option.
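A minimal sketch of the requested safeguard, in plain Java (the wrapper and its names are hypothetical, not an Oak API): an iterator that fails fast once more than a configured number of nodes has been read, analogous to the existing traversal limit.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical guard: fails fast once a query reads more nodes than
// allowed, instead of silently scanning everything in memory.
public class ReadLimitIterator<T> implements Iterator<T> {

    private final Iterator<T> delegate;
    private final long maxReads;
    private long reads;

    public ReadLimitIterator(Iterator<T> delegate, long maxReads) {
        this.delegate = delegate;
        this.maxReads = maxReads;
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public T next() {
        if (++reads > maxReads) {
            throw new IllegalStateException(
                    "Query read more than " + maxReads + " nodes");
        }
        return delegate.next();
    }

    public static void main(String[] args) {
        Iterator<Integer> it =
                new ReadLimitIterator<>(List.of(1, 2, 3, 4).iterator(), 3);
        try {
            while (it.hasNext()) {
                it.next(); // the 4th read exceeds the limit
            }
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing loudly at a threshold is exactly the behaviour wished for here: the user gets an immediate, diagnosable error instead of a query that appears to hang.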