[jira] [Commented] (OAK-7182) Make it possible to update Guava

2022-06-24 Thread Dawid Iwo Cokan (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-7182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558438#comment-17558438
 ] 

Dawid Iwo Cokan commented on OAK-7182:
--

I am available to support the decoupling task. As I understand it, the next 
task would be to replace the Guava references in the other bundles (OAK-8857). 
But from this thread I understood that there are public APIs that depend on 
Guava. What is the plan for those? We would still have to depend on Guava to 
avoid a major version bump, correct? 

> Make it possible to update Guava
> 
>
> Key: OAK-7182
> URL: https://issues.apache.org/jira/browse/OAK-7182
> Project: Jackrabbit Oak
>  Issue Type: Wish
>Reporter: Julian Reschke
>Priority: Minor
> Attachments: GuavaTests.java, OAK-7182-guava-21-3.diff, 
> OAK-7182-guava-21-4.diff, OAK-7182-guava-21.diff, OAK-7182-guava-23.6.1.diff, 
> guava.diff
>
>
> We currently rely on Guava 15, and this affects all users of Oak because they 
> essentially need to use the same version.
> This is an overall issue to investigate what would need to be done in Oak in 
> order to make updates possible.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (OAK-7182) Make it possible to update Guava

2022-06-24 Thread Dawid Iwo Cokan (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-7182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558428#comment-17558428
 ] 

Dawid Iwo Cokan commented on OAK-7182:
--

{quote}the question whether we use Guava, and if so how (directly, or by 
shading it) becomes secondary.
{quote}
Yes. But it has serious implications. Personally, I am not a fan of shading, 
for the following reason. Suppose you shade Guava: nobody has to care about the 
version used, but at the same time nobody can change it. What if tomorrow a new 
security vulnerability is discovered? In that case, anyone who uses Oak with a 
given version cannot get rid of it until we release a new Oak that embeds a 
fixed Guava. That said, I feel shading runs contrary to the idea of Maven and 
other package-management tools. 

If you feel an upgrade to Guava 22 makes sense, I can prepare a patch. I tried 
it locally and it seems to be an easy adaptation.



[jira] [Commented] (OAK-7182) Make it possible to update Guava

2022-06-22 Thread Dawid Iwo Cokan (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-7182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17557617#comment-17557617
 ] 

Dawid Iwo Cokan commented on OAK-7182:
--

Hi [~reschke] 

 

We use Guava 15 as mentioned, but the problem with clients using a different 
version is not caused by the version itself (nor by the fact that Guava code is 
exposed in Oak APIs), but by Guava making incompatible changes, right? 

Based on Guava's [statement|https://github.com/google/guava], starting from 
22.0 all their APIs will remain compatible unless they are _@Beta_:
{quote}APIs without {{@Beta}} will remain binary-compatible for the indefinite 
future. (Previously, we sometimes removed such APIs after a deprecation period. 
The last release to remove non-{{@Beta}} APIs was Guava 21.0.) Even 
{{@Deprecated}} APIs will remain (again, unless they are {{@Beta}}). We have no 
plans to start removing things again, but officially, we're leaving our options 
open in case of surprises (like, say, a serious security problem).
{quote}
 

So correct me if I am wrong, but clients would be free to choose the Guava 
version (any version from 22.0 upwards) if we:
 * Depend on Guava 22.0
 * Declare Import-Package as [22.0,)
 * Include the [Guava Beta Checker|https://github.com/google/guava-beta-checker] 
in the build chain to ensure no code depends on APIs that might not exist in 
future releases
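
For illustration, the Import-Package range could be declared through the 
maven-bundle-plugin's bnd instructions. This is only a sketch; the exact 
instructions in the Oak build may differ:
{code:xml}
<plugin>
  <groupId>org.apache.felix</groupId>
  <artifactId>maven-bundle-plugin</artifactId>
  <configuration>
    <instructions>
      <!-- accept any Guava from 22.0 upwards; no upper bound,
           relying on Guava's post-22.0 compatibility promise -->
      <Import-Package>
        com.google.common.*;version="[22.0,)",
        *
      </Import-Package>
    </instructions>
  </configuration>
</plugin>
{code}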

 



[jira] [Created] (OAK-9606) NodeType index is not ignored for facets

2021-10-26 Thread Dawid Iwo Cokan (Jira)
Dawid Iwo Cokan created OAK-9606:


 Summary: NodeType index is not ignored for facets
 Key: OAK-9606
 URL: https://issues.apache.org/jira/browse/OAK-9606
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: search
Affects Versions: 1.6.21
Reporter: Dawid Iwo Cokan


*Conditions:*

Created 300 nodes of a new _tmp:document_ type.

Added a Lucene index for the above.

Configured facets on one of the properties (_tmp:tags_) for that node type.

*Steps:*
 * Prepare query to run:
{code:java}
SELECT [rep:facet(tmp:tags)] FROM [tmp:document] AS d WHERE  
(isdescendantnode([d], [/docsFolder])) {code}

 * Do not set a limit on the query

 

*Expected:*

Since there is an index configured for that node type, the query engine should 
pick that index and provide correct results.

*Current:*

The NodeType index is selected as the lowest cost instead, and an empty result 
set is returned.

 

*Note:*
 * If a limit is set on the query, the Lucene index beats the other indexes and 
everything works correctly
 * If _contains('*', d)_ is added to the query, the nodeType index is ignored 
and the Lucene index is picked
 * If too many documents are present, the Lucene index wins anyway. This is 
because the Lucene index cannot report a cost higher than 1001, while the 
nodeType index cost grows with the number of documents
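
The second workaround above can be illustrated with a variant of the query. The 
full-text clause (written here in standard JCR-SQL2 form; the exact spelling 
should be double-checked) only forces a full-text condition so that the 
nodeType index is no longer considered:
{code:java}
SELECT [rep:facet(tmp:tags)] FROM [tmp:document] AS d
WHERE isdescendantnode([d], [/docsFolder])
  AND contains(d.*, '*') {code}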

 

*Possible fix:*

IMO the nodeType index should be discarded whenever the query has a facet 
condition; as far as I could see, the full-text search condition is already 
checked for this.

Additionally, I am not sure why the query engine takes the limit into account 
for facet queries. As I understand it, the limit has nothing to do with this 
case.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (OAK-9381) Access check delegated to query execution

2021-03-11 Thread Dawid Iwo Cokan (Jira)
Dawid Iwo Cokan created OAK-9381:


 Summary: Access check delegated to query execution
 Key: OAK-9381
 URL: https://issues.apache.org/jira/browse/OAK-9381
 Project: Jackrabbit Oak
  Issue Type: Wish
Reporter: Dawid Iwo Cokan


We are implementing a system to manage documents, based on Jackrabbit Oak. We 
store thousands of them, and we have access rules set individually for every 
document (due to business requirements). We have configured the Lucene index to 
support all our queries, but some users in the system have access to only a 
small subset of documents. When one of these users runs a search, it takes a 
long time, because Oak will first use the index to read all results matching 
the constraints and only then check whether the user has access to them.

While evaluating how to improve this, we simply added an additional property to 
our document nodes and saved in it the list of user ids who can read the 
particular node. Then we extended the Lucene index definition to include this 
field. 

Next, we ensured that all the queries we perform add a condition on that 
property. Now the results coming from the Lucene index are 100% matched to the 
current user's access, and performance is very good. 

 

I am adding this as a Wish, as it should certainly be discussed more widely. In 
particular, there are known limitations / problems:
 * Lucene would not support negation of the property, so if a node had DENY set 
for some principal, it would still have to be checked in memory
 * The property would be visible when reading a node, so we would have to 
ensure it gets hidden
 * We would have to ensure the property stays aligned with the current state of 
the ACL, also when the settings of a parent node change
 * A principal can have child principals and can be resolved dynamically, so 
the finite list of all principal names who can access a node might vary over 
time
 * In case of inherited access, the same principal would have to be set on each 
node in the structure
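
To make the approach concrete, a query against such an index could look like 
the sketch below. The node type _my:document_ and the property name 
_acl:readPrincipals_ are purely hypothetical, chosen here for illustration:
{code:java}
SELECT * FROM [my:document] AS d
WHERE isdescendantnode([d], [/docsFolder])
  AND d.[acl:readPrincipals] = 'alice' {code}
With the property included in the Lucene index definition, the index returns 
only nodes the current user may read, so the in-memory access check has 
(almost) nothing left to filter out.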

 

 





[jira] [Commented] (OAK-9376) Optionally reject queries with huge result sets

2021-03-11 Thread Dawid Iwo Cokan (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-9376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17299507#comment-17299507
 ] 

Dawid Iwo Cokan commented on OAK-9376:
--

Yes indeed, this seems to be applicable to our case, so we will go for it.

> Optionally reject queries with huge result sets
> ---
>
> Key: OAK-9376
> URL: https://issues.apache.org/jira/browse/OAK-9376
> Project: Jackrabbit Oak
>  Issue Type: Wish
>  Components: query
>Reporter: Manfred Baedke
>Assignee: Manfred Baedke
>Priority: Minor
>
> In cases where processing a result of a query uses a lot of memory and/or 
> time (e.g. where filtering or ordering of many nodes in memory is required), 
> an option to set an upper limit to the number of processed nodes and fail the 
> query if the limit is exceeded would be useful.





[jira] [Commented] (OAK-9376) Optionally reject queries with huge result sets

2021-03-10 Thread Dawid Iwo Cokan (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-9376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17298662#comment-17298662
 ] 

Dawid Iwo Cokan commented on OAK-9376:
--

Hi,

Actually, I am in touch with Manfred, and I suggested this improvement based on 
our case; let me share more details about it. 

We are implementing a system that supports managing a lot of documents. We 
store around 100k documents in the system and perform a number of searches. 
With that amount of data, we assume the system can work only when we use 
indexes properly, so any query that does not properly use an index is assumed 
to be incorrect. Now, in some corner cases it happened that our indexes were 
unable to produce the results; Oak needed to read the nodes into memory and 
perform in-memory filtering. 

For instance, our index was incorrectly set up for one of the ordering 
conditions. When we performed a search, we simply wanted to read the first 10 
items in a given order. The query had one constraint that matched the index 
definition, so the query planner picked that index. At the end, however, it had 
to read all the nodes into memory and order them. This makes the search very 
slow, so users don't wait for it to finish, assume it simply failed, and retry, 
which leads to overloading the system. It would be much more convenient to be 
able to set a maximum number of nodes the query iterator can read, similar to 
what we have for the traversal option.
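
If I read the code correctly, Oak already has related knobs in 
QueryEngineSettings that could serve as a model (or, depending on the version, 
even cover this case); treat the exact property names below as something to 
verify against the version in use:
{code}
# JVM system properties evaluated by Oak's QueryEngineSettings
-Doak.queryLimitInMemory=50000   # max nodes held in memory for sorting/filtering
-Doak.queryLimitReads=100000     # max nodes read by a single query
{code}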
