[ 
https://issues.apache.org/jira/browse/JENA-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962230#comment-16962230
 ] 

ASF subversion and git services commented on JENA-1771:
-------------------------------------------------------

Commit 8ced15f1ea6ed4f49aeae7da2c91d4a73197dbd8 in jena's branch 
refs/heads/master from Andy Seaborne
[ https://gitbox.apache.org/repos/asf?p=jena.git;h=8ced15f ]

Merge pull request #623 from afs/jena-1771_spill-distinct-order

 JENA-1771: Pass sort conditions to distinct execution for spill. 

> Spilling combined with DISTINCT .. ORDER BY returns rows in the wrong order
> ---------------------------------------------------------------------------
>
>                 Key: JENA-1771
>                 URL: https://issues.apache.org/jira/browse/JENA-1771
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: ARQ
>    Affects Versions: Jena 3.13.1
>            Reporter: Shawn Smith
>            Assignee: Andy Seaborne
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> It looks like Jena assumes that OpDistinct preserves order, but order is not 
> preserved when spilling occurs. This is only a problem when the 
> ARQ.spillToDiskThreshold setting has been configured.
> Consider the following query:
> {code:java}
> PREFIX : <http://example.org/>
> SELECT DISTINCT  *
> WHERE
>   { ?x  :p  ?v }
> ORDER BY ASC(?v)
> {code}
> Here's the query plan for this query:
> {code:java}
> (distinct
>   (order ((asc ?v))
>     (bgp (triple ?x <http://example.org/p> ?v))))
> {code}
> Jena executes the ORDER BY ASC(?v) before the DISTINCT, relying on the SPARQL 
> requirement:
> {quote}The order of Distinct(Ψ) must preserve any ordering given by OrderBy.
> {quote}
> But, when spilling, QueryIterDistinct (which executes OpDistinct) creates a 
> DistinctDataBag with a BindingComparator without any sort conditions. As a 
> result, the DISTINCT operation sorts using "compareBindingsSyntactic()" and 
> doesn't preserve the ORDER BY ASC(?v) requirement.
> Note that some query plans will reorder the ORDER BY and DISTINCT, making 
> things work correctly. For example, adding a LIMIT 5 clause to the query 
> above results in a "(top (5 (asc ?v))" operation that doesn't suffer from the 
> bug.
> You can reproduce this by injecting the following into QueryTest.java then 
> running the ARQTestRefEngine tests:
> {code:java}
> void runTestSelect(Query query, QueryExecution qe)
> {
>     qe.getContext().set(ARQ.spillToDiskThreshold, 4);   // add this line
>     ...
> {code}
> For example, "ARQTestRefEngine -> Algebra optimizations -> 
> QueryTest.opt-top-05" will fail with:
> {code:java}
> Query: 
> PREFIX  :     <http://example/>
> SELECT DISTINCT  *
> WHERE
>   { ?x  :p  ?v }
> ORDER BY ASC(?v)
> LIMIT   5
> Got: 5 --------------------------------
> -------------
> | x    | v  |
> =============
> | :x1  | 1  |
> | :x2  | 2  |
> | :x10 | 10 |
> | :x11 | 11 |
> | :x12 | 12 |
> -------------
> Expected: 5 -----------------------------
> ------------
> | x    | v |
> ============
> | :x1  | 1 |
> | :x2  | 2 |
> | :x3  | 3 |
> | :x3a | 3 |
> | :x4  | 4 |
> ------------
> junit.framework.AssertionFailedError: Results do not match
>       at junit.framework.Assert.fail(Assert.java:57)
>       at junit.framework.Assert.assertTrue(Assert.java:22)
>       at junit.framework.TestCase.assertTrue(TestCase.java:192)
>       at 
> org.apache.jena.sparql.junit.QueryTest.runTestSelect(QueryTest.java:284)
>       at 
> org.apache.jena.sparql.junit.QueryTest.runTestForReal(QueryTest.java:201)
>       at 
> org.apache.jena.sparql.junit.EarlTestCase.runTest(EarlTestCase.java:88)
>       at junit.framework.TestCase.runBare(TestCase.java:141)
>       at junit.framework.TestResult$1.protect(TestResult.java:122)
>       at junit.framework.TestResult.runProtected(TestResult.java:142)
>       at junit.framework.TestResult.run(TestResult.java:125)
>       at junit.framework.TestCase.run(TestCase.java:129)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to