[ 
https://issues.apache.org/jira/browse/OAK-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Florin Iordache updated OAK-2725:
---------------------------------
    Description: 
The {{ApproximateCounter.adjustCountSync}} public method that is used by the 
indexing engine will sometimes produce very unrealistic cost estimates. 

The problem is that it can produce an estimated cost that exceeds the estimated 
cost of the full traversal query, thus causing the index to be bypassed 
altogether, resulting in a full traversal rather than the use of the existing 
index.

Problem resides in the way the property counts are updated:
* The count property update goes through if two randoms are equal to zero: 
random(100) and random({1, 2, 4, 8, 16, ...}).
* Same static pseudo random generator for all invocations.

Even if #1 might seem improbable, it is statistically possible to reach a very 
high count with only a handful of invocations.

In practice I've found that running 100 tests with 1000 invocations if the 
adjustCountSync method will yield costs exceeding value 2000 in 4-10% of the 
tests. Attaching a patch for {{ApproximateCounterTest}} with this test case.

  was:
The {{ApproximateCounter.adjustCountSync}} public method that is used by the 
indexing engine will sometimes produce very unrealistic cost estimates. 

The problem is that it can produce an estimated cost that exceeds the estimated 
cost of the full traversal query, thus causing the index to be bypassed 
altogether, resulting in a full traversal rather than the use of the existing 
index.

Problem resides in the way the property counts are updated:
* The count property update goes through if two randoms are not zero: 
random(100) and random({1, 2, 4, 8, 16, ...}).
* Same static pseudo random generator for all invocations.

Even if #1 might seem improbable, it is statistically possible to reach a very 
high count with only a handful of invocations.

In practice I've found that running 100 tests with 1000 invocations if the 
adjustCountSync method will yield costs exceeding value 2000 in 4-10% of the 
tests. Attaching a patch for {{ApproximateCounterTest}} with this test case.


> Wrong indexed query estimates exceed more than double the actual index entries
> ------------------------------------------------------------------------------
>
>                 Key: OAK-2725
>                 URL: https://issues.apache.org/jira/browse/OAK-2725
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: query
>    Affects Versions: 1.1.8
>            Reporter: Florin Iordache
>            Priority: Critical
>             Fix For: 1.2
>
>         Attachments: OAK-2725-test.patch
>
>
> The {{ApproximateCounter.adjustCountSync}} public method that is used by the 
> indexing engine will sometimes produce very unrealistic cost estimates. 
> The problem is that it can produce an estimated cost that exceeds the 
> estimated cost of the full traversal query, thus causing the index to be 
> bypassed altogether, resulting in a full traversal rather than the use of the 
> existing index.
> Problem resides in the way the property counts are updated:
> * The count property update goes through if two randoms are equal to zero: 
> random(100) and random({1, 2, 4, 8, 16, ...}).
> * Same static pseudo random generator for all invocations.
> Even if #1 might seem improbable, it is statistically possible to reach a 
> very high count with only a handful of invocations.
> In practice I've found that running 100 tests with 1000 invocations if the 
> adjustCountSync method will yield costs exceeding value 2000 in 4-10% of the 
> tests. Attaching a patch for {{ApproximateCounterTest}} with this test case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to