[ https://issues.apache.org/jira/browse/JENA-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962226#comment-16962226 ]

ASF subversion and git services commented on JENA-1770:
-------------------------------------------------------

Commit d45a93925ad603f5c94404cb8c80892e94747739 in jena's branch 
refs/heads/master from Andy Seaborne
[ https://gitbox.apache.org/repos/asf?p=jena.git;h=d45a939 ]

JENA-1770: Handle varying variables in a split file.

> Spilling bindings with OPTIONAL leads to wrong answers
> ------------------------------------------------------
>
>                 Key: JENA-1770
>                 URL: https://issues.apache.org/jira/browse/JENA-1770
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: ARQ
>    Affects Versions: Jena 3.13.1
>            Reporter: Shawn Smith
>            Assignee: Andy Seaborne
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> A query like the following, in which some variables are optional, may 
> return wrong answers when spilling occurs: 
> {code:java}
> PREFIX  foaf: <http://xmlns.com/foaf/0.1/>
> SELECT  ?name ?mbox
> WHERE
>   { ?x  foaf:name  ?name
>     OPTIONAL
>       { ?x  foaf:mbox  ?mbox }
>   }
> ORDER BY ASC(?mbox)
> {code}
> This is only a problem when the ARQ.spillToDiskThreshold setting has been 
> configured.
> The root cause is that BindingOutputStream emits a VARS row based on the 
> first binding, but it doesn't emit a new VARS row when a subsequent binding 
> contains additional variables.  
> The BindingOutputStream.needVars() method will cause a second VARS row to be 
> emitted when a new binding is missing variables, but not when it has extras.  
> This logic may be inverted from what was intended.
> There's a TestDistinctDataBag test case below that reproduces the problem. It 
> generates a spill file like this:
> {code}
> VARS ?1 .
> "A" .
> "A" .
> {code}
> when a correct spill file would be:
> {code}
> VARS ?1 .
> "A" .
> VARS ?2 ?1 .
> "B" "A" .
> {code}
> If you run it, you may notice that it fails with a spill threshold of 2 but 
> passes with a higher threshold:
> {code:java}
> @Test public void testOptionalVariables()
> {
>     // Set up a situation where the second binding in a spill file binds more
>     // variables than the first binding
>     BindingMap binding1 = BindingFactory.create();
>     binding1.add(Var.alloc("1"), NodeFactory.createLiteral("A"));
>     BindingMap binding2 = BindingFactory.create();
>     binding2.add(Var.alloc("1"), NodeFactory.createLiteral("A"));
>     binding2.add(Var.alloc("2"), NodeFactory.createLiteral("B"));
>     List<Binding> undistinct = Arrays.asList(binding1, binding2, binding1);
>     List<Binding> control = Iter.toList(Iter.distinct(undistinct.iterator()));
>     List<Binding> distinct = new ArrayList<>();
>     DistinctDataBag<Binding> db = new DistinctDataBag<>(
>             new ThresholdPolicyCount<Binding>(2),
>             SerializationFactoryFinder.bindingSerializationFactory(),
>             new BindingComparator(new ArrayList<SortCondition>()));
>     try
>     {
>         db.addAll(undistinct);
>         Iterator<Binding> iter = db.iterator();
>         while (iter.hasNext())
>         {
>             distinct.add(iter.next());
>         }
>         Iter.close(iter);
>     }
>     finally
>     {
>         db.close();
>     }
>     assertEquals(control.size(), distinct.size());
>     assertTrue(ResultSetCompare.equalsByTest(control, distinct,
>             NodeUtils.sameTerm));
> }
> {code}
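
For illustration only, the spill-file behaviour described in the report can be sketched without any Jena classes. The class SpillSketch and its write method below are hypothetical names, and a binding is modelled as a plain Map from variable name to serialized term. The sketch assumes the simplest possible rule, namely emitting a new VARS row whenever the variable list differs from the one last written; it is not the logic of BindingOutputStream.needVars() and not necessarily what the commit above implements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a binding serializer; this is NOT Jena's
// BindingOutputStream, only a sketch of the spill-file row format.
final class SpillSketch {

    // Serialize bindings as rows, writing a VARS header whenever the
    // variable list of a binding differs from the last header written.
    static List<String> write(List<Map<String, String>> bindings) {
        List<String> out = new ArrayList<>();
        List<String> currentVars = null; // no header written yet
        for (Map<String, String> binding : bindings) {
            List<String> vars = new ArrayList<>(binding.keySet());
            // Assumption: any change (extra OR missing variables) gets a new header.
            if (!vars.equals(currentVars)) {
                StringBuilder header = new StringBuilder("VARS");
                for (String v : vars) {
                    header.append(" ?").append(v);
                }
                out.add(header.append(" .").toString());
                currentVars = vars;
            }
            // Write the values in the same order as the current header.
            StringBuilder row = new StringBuilder();
            for (String value : binding.values()) {
                row.append(value).append(' ');
            }
            out.add(row.append('.').toString());
        }
        return out;
    }
}
```

Given one binding with ?1 = "A" followed by one with ?2 = "B" and ?1 = "A" (in that insertion order, using a LinkedHashMap), this sketch writes the four rows shown as the correct spill file in the report. Emitting a header on every change is slightly more verbose than strictly necessary, but it keeps each row unambiguous for the reader of the file.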



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
