[ https://issues.apache.org/jira/browse/JENA-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962226#comment-16962226 ]
ASF subversion and git services commented on JENA-1770:
-------------------------------------------------------

Commit d45a93925ad603f5c94404cb8c80892e94747739 in jena's branch refs/heads/master from Andy Seaborne
[ https://gitbox.apache.org/repos/asf?p=jena.git;h=d45a939 ]

JENA-1770: Handle varying variables in a split file.

> Spilling bindings with OPTIONAL leads to wrong answers
> ------------------------------------------------------
>
>                 Key: JENA-1770
>                 URL: https://issues.apache.org/jira/browse/JENA-1770
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: ARQ
>    Affects Versions: Jena 3.13.1
>            Reporter: Shawn Smith
>            Assignee: Andy Seaborne
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> A query like the following where some variables are optional may lead to
> wrong answers when spilling occurs:
> {code:java}
> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
> SELECT ?name ?mbox
> WHERE
>   { ?x foaf:name ?name
>     OPTIONAL
>       { ?x foaf:mbox ?mbox }
>   }
> ORDER BY ASC(?mbox)
> {code}
> This is only a problem when the ARQ.spillToDiskThreshold setting has been
> configured.
> The root cause is that BindingOutputStream emits a VARS row based on the
> first binding, but it doesn't emit a new VARS row when a subsequent binding
> contains additional variables.
> The BindingOutputStream.needVars() method will cause a second VARS row to be
> emitted when a new binding is missing variables, but not when it has extras.
> This logic may be inverted from what was intended.
> There's a TestDistinctDataBag test case below that reproduces the problem. It
> generates a spill file like this:
> {code}
> VARS ?1 .
> "A" .
> "A" .
> {code}
> when a correct spill file would be:
> {code}
> VARS ?1 .
> "A" .
> VARS ?2 ?1 .
> "B" "A" .
> {code}
> If you run it, you may notice that it fails with a spill threshold of 2 but
> passes with a higher threshold:
> {code:java}
> @Test public void testOptionalVariables()
> {
>     // Setup a situation where the second binding in a spill file binds more
>     // variables than the first binding
>     BindingMap binding1 = BindingFactory.create();
>     binding1.add(Var.alloc("1"), NodeFactory.createLiteral("A"));
>     BindingMap binding2 = BindingFactory.create();
>     binding2.add(Var.alloc("1"), NodeFactory.createLiteral("A"));
>     binding2.add(Var.alloc("2"), NodeFactory.createLiteral("B"));
>     List<Binding> undistinct = Arrays.asList(binding1, binding2, binding1);
>     List<Binding> control = Iter.toList(Iter.distinct(undistinct.iterator()));
>     List<Binding> distinct = new ArrayList<>();
>     DistinctDataBag<Binding> db = new DistinctDataBag<>(
>             new ThresholdPolicyCount<Binding>(2),
>             SerializationFactoryFinder.bindingSerializationFactory(),
>             new BindingComparator(new ArrayList<SortCondition>()));
>     try
>     {
>         db.addAll(undistinct);
>         Iterator<Binding> iter = db.iterator();
>         while (iter.hasNext())
>         {
>             distinct.add(iter.next());
>         }
>         Iter.close(iter);
>     }
>     finally
>     {
>         db.close();
>     }
>     assertEquals(control.size(), distinct.size());
>     assertTrue(ResultSetCompare.equalsByTest(control, distinct,
>             NodeUtils.sameTerm));
> }
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
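
For readers following the root-cause description in the quoted issue, here is a minimal, Jena-free sketch of the header rule it implies: a new VARS row is written whenever a binding's variable set differs from the last declared header, whether it has extra or missing variables. The class SpillWriterSketch, its Map-based bindings, and its row format are hypothetical stand-ins for illustration only. This is not the code of BindingOutputStream, and the actual fix in commit d45a939 may work differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a spill-file writer; not Jena's BindingOutputStream. */
public class SpillWriterSketch {
    private final List<String> rows = new ArrayList<>();
    // Variables declared by the most recent VARS row (null before the first row).
    private Set<String> header = null;

    /** A new VARS row is needed when the variable set differs in either direction. */
    private boolean needVars(Map<String, String> binding) {
        return header == null || !header.equals(binding.keySet());
    }

    /** Appends the binding as a data row, preceded by a VARS row if required. */
    public void write(Map<String, String> binding) {
        if (needVars(binding)) {
            header = new LinkedHashSet<>(binding.keySet());
            StringBuilder vars = new StringBuilder("VARS");
            for (String v : header) {
                vars.append(" ?").append(v);
            }
            rows.add(vars.append(" .").toString());
        }
        StringBuilder row = new StringBuilder();
        for (String v : header) {
            row.append('"').append(binding.get(v)).append("\" ");
        }
        rows.add(row.append(".").toString());
    }

    public List<String> rows() {
        return rows;
    }

    public static void main(String[] args) {
        // Same shape as the test case: the second binding binds one more variable.
        Map<String, String> binding1 = new LinkedHashMap<>();
        binding1.put("1", "A");
        Map<String, String> binding2 = new LinkedHashMap<>();
        binding2.put("1", "A");
        binding2.put("2", "B");

        SpillWriterSketch writer = new SpillWriterSketch();
        writer.write(binding1);
        writer.write(binding2);
        writer.rows().forEach(System.out::println);
    }
}
```

With the two bindings from the test case, the sketch writes a second VARS row before the second binding. The column order follows the binding's own key order, so it differs from the "VARS ?2 ?1" ordering shown in the issue's example, but each data row is still read against the right header.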