[ 
https://issues.apache.org/jira/browse/JENA-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Smith updated JENA-1771:
------------------------------
    Description: 
It looks like Jena assumes that OpDistinct preserves order, but order is not 
preserved when spilling occurs. This is only a problem when the 
ARQ.spillToDiskThreshold setting has been configured.

Consider the following query:
{code:java}
PREFIX : <http://example.org/>
SELECT DISTINCT  *
WHERE
  { ?x  :p  ?v }
ORDER BY ASC(?v)
{code}
Here's the query plan for this query:
{code:java}
(distinct
  (order ((asc ?v))
    (bgp (triple ?x <http://example.org/p> ?v))))
{code}
Jena executes the ORDER BY ASC(?v) before the DISTINCT, relying on the SPARQL 
requirement:
{quote}The order of Distinct(Ψ) must preserve any ordering given by OrderBy.
{quote}
But, when spilling, QueryIterDistinct (which executes OpDistinct) creates a 
DistinctDataBag with a BindingComparator without any sort conditions. As a 
result, the DISTINCT operation sorts using "compareBindingsSyntactic()" and 
doesn't preserve the ORDER BY ASC(?v) requirement.

Note that some query plans will reorder the ORDER BY and DISTINCT, making 
things work correctly. For example, adding a LIMIT 5 clause to the query above 
results in a "(top (5 (asc ?v))" operation that doesn't suffer from the bug.

You can reproduce this by injecting the following into QueryTest.java then 
running the ARQTestRefEngine tests:
{code:java}
void runTestSelect(Query query, QueryExecution qe)
{
    qe.getContext().set(ARQ.spillToDiskThreshold, 4);   // add this line
    ...
{code}
For example, "ARQTestRefEngine -> Algebra optimizations -> 
QueryTest.opt-top-05" will fail with:
{code:java}
Query: 
PREFIX  :     <http://example/>

SELECT DISTINCT  *
WHERE
  { ?x  :p  ?v }
ORDER BY ASC(?v)
LIMIT   5

Got: 5 --------------------------------
-------------
| x    | v  |
=============
| :x1  | 1  |
| :x2  | 2  |
| :x10 | 10 |
| :x11 | 11 |
| :x12 | 12 |
-------------
Expected: 5 -----------------------------
------------
| x    | v |
============
| :x1  | 1 |
| :x2  | 2 |
| :x3  | 3 |
| :x3a | 3 |
| :x4  | 4 |
------------

junit.framework.AssertionFailedError: Results do not match
        at junit.framework.Assert.fail(Assert.java:57)
        at junit.framework.Assert.assertTrue(Assert.java:22)
        at junit.framework.TestCase.assertTrue(TestCase.java:192)
        at 
org.apache.jena.sparql.junit.QueryTest.runTestSelect(QueryTest.java:284)
        at 
org.apache.jena.sparql.junit.QueryTest.runTestForReal(QueryTest.java:201)
        at 
org.apache.jena.sparql.junit.EarlTestCase.runTest(EarlTestCase.java:88)
        at junit.framework.TestCase.runBare(TestCase.java:141)
        at junit.framework.TestResult$1.protect(TestResult.java:122)
        at junit.framework.TestResult.runProtected(TestResult.java:142)
        at junit.framework.TestResult.run(TestResult.java:125)
        at junit.framework.TestCase.run(TestCase.java:129)
{code}

  was:
It looks like Jena assumes that OpDistinct preserves order, but order is not 
preserved when spilling occurs. This is only a problem when the 
ARQ.spillToDiskThreshold setting has been configured.

Consider the following query:
{code:java}
PREFIX : <http://example.org/>
SELECT DISTINCT  *
WHERE
  { ?x  :p  ?v }
ORDER BY ASC(?v)
{code}
Here's the query plan for this query:
{code:java}
(distinct
  (order ((asc ?v))
    (bgp (triple ?x <http://example.org/p> ?v))))
{code}
Jena executes the ORDER BY ASC(?v) before the DISTINCT, relying on the SPARQL 
requirement:
{quote}The order of Distinct(Ψ) must preserve any ordering given by OrderBy.
{quote}
But, when spilling, QueryIterDistinct (which executes OpDistinct) creates a 
DistinctDataBag with a BindingComparator without any sort conditions. As a 
result, the DISTINCT operation sorts using "compareBindingsSyntactic()" and 
doesn't preserve the ORDER BY ASC(?v) requirement.

Note that some query plans will reorder the ORDER BY and DISTINCT, making 
things work correctly. For example, adding a LIMIT clause to the query above 
results in a "(top (5 (asc ?v))" operation that doesn't suffer from the bug.

You can reproduce this by injecting the following into QueryTest.java then 
running the ARQTestRefEngine tests:
{code:java}
void runTestSelect(Query query, QueryExecution qe)
{
    qe.getContext().set(ARQ.spillToDiskThreshold, 4);   // add this line
    ...
{code}
For example, "ARQTestRefEngine -> Algebra optimizations -> 
QueryTest.opt-top-05" will fail with:
{code:java}
Query: 
PREFIX  :     <http://example/>

SELECT DISTINCT  *
WHERE
  { ?x  :p  ?v }
ORDER BY ASC(?v)
LIMIT   5

Got: 5 --------------------------------
-------------
| x    | v  |
=============
| :x1  | 1  |
| :x2  | 2  |
| :x10 | 10 |
| :x11 | 11 |
| :x12 | 12 |
-------------
Expected: 5 -----------------------------
------------
| x    | v |
============
| :x1  | 1 |
| :x2  | 2 |
| :x3  | 3 |
| :x3a | 3 |
| :x4  | 4 |
------------

junit.framework.AssertionFailedError: Results do not match
        at junit.framework.Assert.fail(Assert.java:57)
        at junit.framework.Assert.assertTrue(Assert.java:22)
        at junit.framework.TestCase.assertTrue(TestCase.java:192)
        at 
org.apache.jena.sparql.junit.QueryTest.runTestSelect(QueryTest.java:284)
        at 
org.apache.jena.sparql.junit.QueryTest.runTestForReal(QueryTest.java:201)
        at 
org.apache.jena.sparql.junit.EarlTestCase.runTest(EarlTestCase.java:88)
        at junit.framework.TestCase.runBare(TestCase.java:141)
        at junit.framework.TestResult$1.protect(TestResult.java:122)
        at junit.framework.TestResult.runProtected(TestResult.java:142)
        at junit.framework.TestResult.run(TestResult.java:125)
        at junit.framework.TestCase.run(TestCase.java:129)
{code}


> Spilling combined with DISTINCT .. ORDER BY returns rows in the wrong order
> ---------------------------------------------------------------------------
>
>                 Key: JENA-1771
>                 URL: https://issues.apache.org/jira/browse/JENA-1771
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: ARQ
>    Affects Versions: Jena 3.13.1
>            Reporter: Shawn Smith
>            Priority: Major
>
> It looks like Jena assumes that OpDistinct preserves order, but order is not 
> preserved when spilling occurs. This is only a problem when the 
> ARQ.spillToDiskThreshold setting has been configured.
> Consider the following query:
> {code:java}
> PREFIX : <http://example.org/>
> SELECT DISTINCT  *
> WHERE
>   { ?x  :p  ?v }
> ORDER BY ASC(?v)
> {code}
> Here's the query plan for this query:
> {code:java}
> (distinct
>   (order ((asc ?v))
>     (bgp (triple ?x <http://example.org/p> ?v))))
> {code}
> Jena executes the ORDER BY ASC(?v) before the DISTINCT, relying on the SPARQL 
> requirement:
> {quote}The order of Distinct(Ψ) must preserve any ordering given by OrderBy.
> {quote}
> But, when spilling, QueryIterDistinct (which executes OpDistinct) creates a 
> DistinctDataBag with a BindingComparator without any sort conditions. As a 
> result, the DISTINCT operation sorts using "compareBindingsSyntactic()" and 
> doesn't preserve the ORDER BY ASC(?v) requirement.
> Note that some query plans will reorder the ORDER BY and DISTINCT, making 
> things work correctly. For example, adding a LIMIT 5 clause to the query 
> above results in a "(top (5 (asc ?v))" operation that doesn't suffer from the 
> bug.
> You can reproduce this by injecting the following into QueryTest.java then 
> running the ARQTestRefEngine tests:
> {code:java}
> void runTestSelect(Query query, QueryExecution qe)
> {
>     qe.getContext().set(ARQ.spillToDiskThreshold, 4);   // add this line
>     ...
> {code}
> For example, "ARQTestRefEngine -> Algebra optimizations -> 
> QueryTest.opt-top-05" will fail with:
> {code:java}
> Query: 
> PREFIX  :     <http://example/>
> SELECT DISTINCT  *
> WHERE
>   { ?x  :p  ?v }
> ORDER BY ASC(?v)
> LIMIT   5
> Got: 5 --------------------------------
> -------------
> | x    | v  |
> =============
> | :x1  | 1  |
> | :x2  | 2  |
> | :x10 | 10 |
> | :x11 | 11 |
> | :x12 | 12 |
> -------------
> Expected: 5 -----------------------------
> ------------
> | x    | v |
> ============
> | :x1  | 1 |
> | :x2  | 2 |
> | :x3  | 3 |
> | :x3a | 3 |
> | :x4  | 4 |
> ------------
> junit.framework.AssertionFailedError: Results do not match
>       at junit.framework.Assert.fail(Assert.java:57)
>       at junit.framework.Assert.assertTrue(Assert.java:22)
>       at junit.framework.TestCase.assertTrue(TestCase.java:192)
>       at 
> org.apache.jena.sparql.junit.QueryTest.runTestSelect(QueryTest.java:284)
>       at 
> org.apache.jena.sparql.junit.QueryTest.runTestForReal(QueryTest.java:201)
>       at 
> org.apache.jena.sparql.junit.EarlTestCase.runTest(EarlTestCase.java:88)
>       at junit.framework.TestCase.runBare(TestCase.java:141)
>       at junit.framework.TestResult$1.protect(TestResult.java:122)
>       at junit.framework.TestResult.runProtected(TestResult.java:142)
>       at junit.framework.TestResult.run(TestResult.java:125)
>       at junit.framework.TestCase.run(TestCase.java:129)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to