Shawn Smith created JENA-1771:
---------------------------------

             Summary: Spilling combined with DISTINCT .. ORDER BY returns rows in the wrong order
                 Key: JENA-1771
                 URL: https://issues.apache.org/jira/browse/JENA-1771
             Project: Apache Jena
          Issue Type: Bug
          Components: ARQ
    Affects Versions: Jena 3.13.1
            Reporter: Shawn Smith


It looks like Jena assumes that OpDistinct preserves order, but order is not 
preserved when spilling occurs.  This is only a problem when the 
ARQ.spillToDiskThreshold setting has been configured.

Consider the following query:
{code}
SELECT DISTINCT  *
WHERE
  { ?x  :p  ?v }
ORDER BY ASC(?v)
{code}
Jena executes the ORDER BY ASC(?v) before the DISTINCT, relying on the SPARQL 
requirement:

bq. The order of Distinct(Ψ) must preserve any ordering given by OrderBy.
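
The requirement can be illustrated with a small, self-contained Java sketch. This is not Jena code: the Row record and the distinct method are hypothetical, and a LinkedHashSet stands in for an in-memory DISTINCT that keeps the first occurrence of each row in arrival order. Sorting by ?v first and then applying such a DISTINCT leaves the ORDER BY ordering intact.

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;

public class OrderPreservingDistinct {
    // Hypothetical stand-in for a SPARQL solution binding ?x and ?v.
    record Row(String x, int v) {}

    // Keeps the first occurrence of each row, in input order.
    static List<Row> distinct(List<Row> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>(List.of(
            new Row(":x10", 10), new Row(":x2", 2), new Row(":x1", 1),
            new Row(":x2", 2), new Row(":x3", 3)));
        // ORDER BY ASC(?v) is applied first ...
        rows.sort(Comparator.comparingInt(Row::v));
        // ... then DISTINCT, which must not disturb that order.
        System.out.println(distinct(rows));
    }
}
{code}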

But when spilling, QueryIterDistinct (which executes OpDistinct) creates a
DistinctDataBag with a BindingComparator that has no sort conditions.  The
data bag returns its contents in the order defined by that comparator, so the
output of the DISTINCT no longer follows ORDER BY ASC(?v).
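
The failure mode can be modelled in a simplified, self-contained Java sketch. This is not Jena's implementation: Row, distinct and both comparators are hypothetical. Below the threshold, the sketch keeps first occurrences in arrival order; above it, it sorts every row with the bag's comparator and drops adjacent duplicates, roughly as a merge of sorted spill files would. The "no sort conditions" comparator (compare ?x as a string, then ?v) is an assumed stand-in, since the report does not say what ordering BindingComparator falls back to. Leading the comparator with the ORDER BY key is shown only as one possible remedy, not as the project's fix.

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;

public class SpillingDistinctSketch {
    // Hypothetical stand-in for a binding of ?x and ?v.
    record Row(String x, int v) {}

    // Simplified model: the comparator must be total (equal rows compare
    // as 0) so that duplicates end up adjacent after sorting.
    static List<Row> distinct(List<Row> input, int threshold, Comparator<Row> bagOrder) {
        if (input.size() <= threshold) {
            // No spill: first occurrences, in input order.
            return new ArrayList<>(new LinkedHashSet<>(input));
        }
        // Spill: output follows the comparator, not the input order.
        List<Row> sorted = new ArrayList<>(input);
        sorted.sort(bagOrder);
        List<Row> out = new ArrayList<>();
        for (Row r : sorted) {
            if (out.isEmpty() || !out.get(out.size() - 1).equals(r)) {
                out.add(r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Input already sorted by ORDER BY ASC(?v), with one duplicate.
        List<Row> ordered = List.of(
            new Row(":x1", 1), new Row(":x2", 2), new Row(":x2", 2),
            new Row(":x3", 3), new Row(":x10", 10));
        // Assumed comparator that ignores ORDER BY: ?x first, then ?v.
        Comparator<Row> noSortConditions =
            Comparator.comparing(Row::x).thenComparingInt(Row::v);
        // Possible remedy (assumption): lead with the ORDER BY key.
        Comparator<Row> withSortConditions =
            Comparator.comparingInt(Row::v).thenComparing(Row::x);

        System.out.println(distinct(ordered, 100, noSortConditions)); // no spill
        System.out.println(distinct(ordered, 4, noSortConditions));   // spills, ?v order lost
        System.out.println(distinct(ordered, 4, withSortConditions)); // spills, ?v order kept
    }
}
{code}

Because ":x10" sorts before ":x2" as a string, the spilled run with the assumed comparator yields ?v values 1, 10, 2, 3, which is the same kind of misordering seen in the test output below.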

You can reproduce this by adding the following line to QueryTest.java and then
running the ARQTestRefEngine tests:
{code:java}
void runTestSelect(Query query, QueryExecution qe)
{
    qe.getContext().set(ARQ.spillToDiskThreshold, 4);   // add this line
    ...
{code}

For example, "ARQTestRefEngine -> Algebra optimizations -> 
QueryTest.opt-top-05" will fail with:
{code}
Query: 
PREFIX  :     <http://example/>

SELECT DISTINCT  *
WHERE
  { ?x  :p  ?v }
ORDER BY ASC(?v)
LIMIT   5

Got: 5 --------------------------------
-------------
| x    | v  |
=============
| :x1  | 1  |
| :x2  | 2  |
| :x10 | 10 |
| :x11 | 11 |
| :x12 | 12 |
-------------
Expected: 5 -----------------------------
------------
| x    | v |
============
| :x1  | 1 |
| :x2  | 2 |
| :x3  | 3 |
| :x3a | 3 |
| :x4  | 4 |
------------

junit.framework.AssertionFailedError: Results do not match
        at junit.framework.Assert.fail(Assert.java:57)
        at junit.framework.Assert.assertTrue(Assert.java:22)
        at junit.framework.TestCase.assertTrue(TestCase.java:192)
        at org.apache.jena.sparql.junit.QueryTest.runTestSelect(QueryTest.java:284)
        at org.apache.jena.sparql.junit.QueryTest.runTestForReal(QueryTest.java:201)
        at org.apache.jena.sparql.junit.EarlTestCase.runTest(EarlTestCase.java:88)
        at junit.framework.TestCase.runBare(TestCase.java:141)
        at junit.framework.TestResult$1.protect(TestResult.java:122)
        at junit.framework.TestResult.runProtected(TestResult.java:142)
        at junit.framework.TestResult.run(TestResult.java:125)
        at junit.framework.TestCase.run(TestCase.java:129)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)