[jira] [Commented] (JENA-1813) Join optimization transform results in incorrect query results

2020-01-13 Thread Shawn Smith (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014731#comment-17014731
 ] 

Shawn Smith commented on JENA-1813:
---

Thanks!

> Join optimization transform results in incorrect query results
> --
>
>                 Key: JENA-1813
>                 URL: https://issues.apache.org/jira/browse/JENA-1813
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: ARQ
>    Affects Versions: Jena 3.13.1
>            Reporter: Shawn Smith
>            Assignee: Andy Seaborne
>            Priority: Major
>             Fix For: Jena 3.14.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> I think I've found a query where TransformJoinStrategy incorrectly decides 
> that a query is linear such that a "join" operation can be replaced by a 
> "sequence" operation.  As a result, the query returns incorrect results.  
> Disabling optimizations with "qe.getContext().set(ARQ.optimization, false)" 
> fixes the issue.
> Here's the query: 
> {noformat}
> PREFIX  :  
> SELECT ?a
> WHERE {
>   GRAPH :graph { :s :p ?a }
>   GRAPH :graph {
>     SELECT (?b AS ?a)
>     WHERE { :t :q ?b }
>     GROUP BY ?b
>   }
> }
> {noformat}
> Here's the data to test it with (two quads, as Trig): 
> {noformat}
> @prefix :   .
> :graph {
> :s  :p  "a" .
> :t  :q  "b" .
> }
> {noformat}
> I expected the query to return zero results because the two GRAPH clauses 
> can't find compatible bindings for ?a.  But, in practice, Jena returns ?a="a" 
> and logs a warning:
> {noformat}
> [main] WARN  BindingUtils - merge: Mismatch : "a" != "b"
> {noformat}
> Note the warning is actually coming from QueryIterProjectMerge.java, not 
> BindingUtils.java.  With more complicated queries and datasets, this issue 
> can result in thousands or millions of logged warnings.
> The query plan before optimization looks like this:
> {noformat}
> (project (?a)
>   (join
>     (graph 
>       (bgp (triple   ?a)))
>     (graph 
>       (project (?a)
>         (extend ((?a ?b))
>           (group (?b)
>             (bgp (triple   ?b))))))))
> {noformat}
> Optimization replaces "join" with "sequence" which fails to detect conflicts 
> on ?a:
> {noformat}
> (project (?a)
>   (sequence
>     (graph 
>       (bgp (triple   ?a)))
>     (graph 
>       (project (?a)
>         (extend ((?a ?/b))
>           (group (?/b)
>             (bgp (triple   ?/b))))))))
> {noformat}
> For convenience, here's Java code that reproduces the bug:
> {noformat}
> import org.apache.jena.query.ARQ;
> import org.apache.jena.query.Dataset;
> import org.apache.jena.query.DatasetFactory;
> import org.apache.jena.query.QueryExecution;
> import org.apache.jena.query.QueryExecutionFactory;
> import org.apache.jena.query.ResultSet;
> import org.apache.jena.riot.Lang;
> import org.apache.jena.riot.RDFParser;
> import org.junit.Test;
> public class QueryTest {
>     @Test
>     public void testGraphQuery() {
>         String query = "" +
>                 "PREFIX  :  \n" +
>                 "SELECT ?a\n" +
>                 "WHERE {\n" +
>                 "  GRAPH :graph { :s :p ?a }\n" +
>                 "  GRAPH :graph {\n" +
>                 "SELECT (?b AS ?a)\n" +
>                 "WHERE { :t :q ?b }\n" +
>                 "GROUP BY ?b\n" +
>                 "  }\n" +
>                 "}\n";
>         String data = "" +
>                 "@prefix :   .\n" +
>                 ":graph {\n" +
>                 "  :s  :p  \"a\" .\n" +
>                 "  :t  :q  \"b\" .\n" +
>                 "}\n";
>         Dataset ds = DatasetFactory.create();
>         RDFParser.fromString(data).lang(Lang.TRIG).parse(ds);
>         try (QueryExecution qe = QueryExecutionFactory.create(query, ds)) {
>             qe.getContext().set(ARQ.optimization, true);  // flipping this to false fixes the test
>             ResultSet rs = qe.execSelect();
>             if (rs.hasNext()) {
>                 System.out.println(rs.nextBinding());
>                 throw new AssertionError("Result set should be empty");
>             }
>         }
>     }
> }
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (JENA-1813) Join optimization transform results in incorrect query results

2020-01-09 Thread Shawn Smith (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17011957#comment-17011957
 ] 

Shawn Smith edited comment on JENA-1813 at 1/9/20 3:36 PM:
---

This patch appears to fix my specific query and passes all the tests.  But it's 
not obvious to me whether it's generally correct:
{noformat}
diff --git a/jena-arq/src/main/java/org/apache/jena/sparql/engine/main/JoinClassifier.java b/jena-arq/src/main/java/org/apache/jena/sparql/engine/main/JoinClassifier.java
index 58b1773e9b..c21a872cd2 100644
--- a/jena-arq/src/main/java/org/apache/jena/sparql/engine/main/JoinClassifier.java
+++ b/jena-arq/src/main/java/org/apache/jena/sparql/engine/main/JoinClassifier.java
@@ -205,11 +205,16 @@ public class JoinClassifier
 
     /** Find the "effective op" - ie. the one that may be sensitive to linearization */
     private static Op effectiveOp(Op op) {
-        if ( op instanceof OpExt )
-            op = ((OpExt)op).effectiveOp() ;
-        while (safeModifier(op))
-            op = ((OpModifier)op).getSubOp() ;
-        return op ;
+        for (;;) {
+            if (op instanceof OpExt)
+                op = ((OpExt) op).effectiveOp();
+            else if (op instanceof OpGraph)
+                op = ((OpGraph) op).getSubOp();
+            else if (safeModifier(op))
+                op = ((OpModifier) op).getSubOp();
+            else
+                return op;
+        }
     }
 
     /** Helper - test for "safe" modifiers */
diff --git a/jena-arq/src/test/java/org/apache/jena/sparql/algebra/TestClassify.java b/jena-arq/src/test/java/org/apache/jena/sparql/algebra/TestClassify.java
index 05db3910ee..aecd07ec6f 100644
--- a/jena-arq/src/test/java/org/apache/jena/sparql/algebra/TestClassify.java
+++ b/jena-arq/src/test/java/org/apache/jena/sparql/algebra/TestClassify.java
@@ -159,6 +159,12 @@ public class TestClassify extends BaseTest
         TestClassify.classifyJ(x1, false);
     }
 
+    // JENA-1813
+    @Test public void testClassify_Join_54() {
+        String x1 = "{ ?s  ?p  ?V GRAPH ?g { SELECT (?w AS ?V) { ?t  ?q  ?w } GROUP BY ?w } }";
+        TestClassify.classifyJ(x1, false);
+    }
+
     public static void classifyJ(String pattern, boolean expected)
     {
         String qs1 = "PREFIX : \n" ;
{noformat}
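
The effect of the change can be sketched without ARQ. The following is a simplified model, not the real classes: Op, Graph, Project, Extend, Bgp and looksLinear are hypothetical stand-ins, the OpExt branch is left out, and treating only Project as a "safe" modifier and only Extend as unsafe is an assumption made purely for illustration.

```java
/** Toy model of unwrapping an operator tree to find its "effective op". */
class EffectiveOpSketch {

    /** Hypothetical operator node with at most one sub-operator. */
    abstract static class Op {
        final Op sub;
        Op(Op sub) { this.sub = sub; }
    }
    static final class Bgp extends Op { Bgp() { super(null); } }
    static final class Graph extends Op { Graph(Op sub) { super(sub); } }
    static final class Project extends Op { Project(Op sub) { super(sub); } }
    static final class Extend extends Op { Extend(Op sub) { super(sub); } }

    /** Assumption: only Project counts as a modifier that is safe to skip. */
    static boolean safeModifier(Op op) {
        return op instanceof Project;
    }

    /** Shape of the old loop: skips safe modifiers only, so it stops at a Graph wrapper. */
    static Op effectiveOpBefore(Op op) {
        while (safeModifier(op))
            op = op.sub;
        return op;
    }

    /** Shape of the patched loop: also looks through Graph wrappers. */
    static Op effectiveOpAfter(Op op) {
        for (;;) {
            if (op instanceof Graph)
                op = op.sub;
            else if (safeModifier(op))
                op = op.sub;
            else
                return op;
        }
    }

    /** Assumption: an Extend as the effective op is what rules out linearization. */
    static boolean looksLinear(Op effective) {
        return !(effective instanceof Extend);
    }

    /** Mirrors the right-hand side of the plan: graph(project(extend(bgp))). */
    static Op examplePlan() {
        return new Graph(new Project(new Extend(new Bgp())));
    }

    public static void main(String[] args) {
        Op plan = examplePlan();
        System.out.println(looksLinear(effectiveOpBefore(plan))); // prints true
        System.out.println(looksLinear(effectiveOpAfter(plan)));  // prints false
    }
}
```

In this model the old loop returns the Graph node itself, so the Extend underneath is never inspected and the join is wrongly treated as linear.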



[jira] [Commented] (JENA-1813) Join optimization transform results in incorrect query results

2020-01-09 Thread Shawn Smith (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17011957#comment-17011957
 ] 

Shawn Smith commented on JENA-1813:
---

This patch appears to fix my specific query.  But it's not obvious to me 
whether it's generally correct:
{noformat}
diff --git a/jena-arq/src/main/java/org/apache/jena/sparql/engine/main/JoinClassifier.java b/jena-arq/src/main/java/org/apache/jena/sparql/engine/main/JoinClassifier.java
index 58b1773e9b..c21a872cd2 100644
--- a/jena-arq/src/main/java/org/apache/jena/sparql/engine/main/JoinClassifier.java
+++ b/jena-arq/src/main/java/org/apache/jena/sparql/engine/main/JoinClassifier.java
@@ -205,11 +205,16 @@ public class JoinClassifier
 
     /** Find the "effective op" - ie. the one that may be sensitive to linearization */
     private static Op effectiveOp(Op op) {
-        if ( op instanceof OpExt )
-            op = ((OpExt)op).effectiveOp() ;
-        while (safeModifier(op))
-            op = ((OpModifier)op).getSubOp() ;
-        return op ;
+        for (;;) {
+            if (op instanceof OpExt)
+                op = ((OpExt) op).effectiveOp();
+            else if (op instanceof OpGraph)
+                op = ((OpGraph) op).getSubOp();
+            else if (safeModifier(op))
+                op = ((OpModifier) op).getSubOp();
+            else
+                return op;
+        }
     }
 
     /** Helper - test for "safe" modifiers */
diff --git a/jena-arq/src/test/java/org/apache/jena/sparql/algebra/TestClassify.java b/jena-arq/src/test/java/org/apache/jena/sparql/algebra/TestClassify.java
index 05db3910ee..aecd07ec6f 100644
--- a/jena-arq/src/test/java/org/apache/jena/sparql/algebra/TestClassify.java
+++ b/jena-arq/src/test/java/org/apache/jena/sparql/algebra/TestClassify.java
@@ -159,6 +159,12 @@ public class TestClassify extends BaseTest
         TestClassify.classifyJ(x1, false);
     }
 
+    // JENA-1813
+    @Test public void testClassify_Join_54() {
+        String x1 = "{ ?s  ?p  ?V GRAPH ?g { SELECT (?w AS ?V) { ?t  ?q  ?w } GROUP BY ?w } }";
+        TestClassify.classifyJ(x1, false);
+    }
+
     public static void classifyJ(String pattern, boolean expected)
     {
         String qs1 = "PREFIX : \n" ;
{noformat}


[jira] [Updated] (JENA-1813) Join optimization transform results in incorrect query results

2020-01-08 Thread Shawn Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Smith updated JENA-1813:
--
Description: 
I think I've found a query where TransformJoinStrategy incorrectly decides that 
a query is linear such that a "join" operation can be replaced by a "sequence" 
operation.  As a result, the query returns incorrect results.  Disabling 
optimizations with "qe.getContext().set(ARQ.optimization, false)" fixes the 
issue.

Here's the query: 
{noformat}
PREFIX  :  
SELECT ?a
WHERE {
  GRAPH :graph { :s :p ?a }
  GRAPH :graph {
    SELECT (?b AS ?a)
    WHERE { :t :q ?b }
    GROUP BY ?b
  }
}
{noformat}
Here's the data to test it with (two quads, as Trig): 
{noformat}
@prefix :   .
:graph {
:s  :p  "a" .
:t  :q  "b" .
}
{noformat}
I expected the query to return zero results because the two GRAPH clauses can't 
find compatible bindings for ?a.  But, in practice, Jena returns ?a="a" and 
logs a warning:
{noformat}
[main] WARN  BindingUtils - merge: Mismatch : "a" != "b"
{noformat}
Note the warning is actually coming from QueryIterProjectMerge.java, not 
BindingUtils.java.  With more complicated queries and datasets, this issue can 
result in thousands or millions of logged warnings.

The query plan before optimization looks like this:
{noformat}
(project (?a)
  (join
    (graph 
      (bgp (triple   ?a)))
    (graph 
      (project (?a)
        (extend ((?a ?b))
          (group (?b)
            (bgp (triple   ?b))))))))
{noformat}
Optimization replaces "join" with "sequence" which fails to detect conflicts on 
?a:
{noformat}
(project (?a)
  (sequence
    (graph 
      (bgp (triple   ?a)))
    (graph 
      (project (?a)
        (extend ((?a ?/b))
          (group (?/b)
            (bgp (triple   ?/b))))))))
{noformat}
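To make the difference concrete, here is a minimal sketch in plain Java, independent of Jena. It assumes a toy model in which a solution is a Map from variable name to value. The class JoinVersusSequence and its methods are hypothetical names, and naiveSequence only imitates the observed behaviour (the left-hand value wins on a conflict), not ARQ's actual iterator code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of solution mappings: variable name -> value. */
class JoinVersusSequence {

    /** Two solutions are compatible if every shared variable has the same value. */
    static boolean compatible(Map<String, String> left, Map<String, String> right) {
        for (Map.Entry<String, String> e : left.entrySet()) {
            String other = right.get(e.getKey());
            if (other != null && !other.equals(e.getValue())) {
                return false;
            }
        }
        return true;
    }

    /** Join: both sides are evaluated independently and only compatible pairs survive. */
    static List<Map<String, String>> join(List<Map<String, String>> left,
                                          List<Map<String, String>> right) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> l : left) {
            for (Map<String, String> r : right) {
                if (compatible(l, r)) {
                    Map<String, String> merged = new HashMap<>(l);
                    merged.putAll(r);
                    out.add(merged);
                }
            }
        }
        return out;
    }

    /** Naive sequence: results are merged without a compatibility check. */
    static List<Map<String, String>> naiveSequence(List<Map<String, String>> left,
                                                   List<Map<String, String>> right) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> l : left) {
            for (Map<String, String> r : right) {
                Map<String, String> merged = new HashMap<>(r);
                merged.putAll(l); // on a clash the left-hand value silently wins
                out.add(merged);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, String>> left = List.of(Map.of("a", "a"));  // :s :p ?a
        List<Map<String, String>> right = List.of(Map.of("a", "b")); // (?b AS ?a)
        System.out.println(join(left, right));          // prints []
        System.out.println(naiveSequence(left, right)); // prints [{a=a}]
    }
}
```

With the two-quad data set, join yields no rows, while naiveSequence yields ?a="a", which matches the incorrect result reported above.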
For convenience, here's Java code that reproduces the bug:
{noformat}
import org.apache.jena.query.ARQ;
import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSet;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFParser;
import org.junit.Test;

public class QueryTest {

    @Test
    public void testGraphQuery() {
        String query = "" +
                "PREFIX  :  \n" +
                "SELECT ?a\n" +
                "WHERE {\n" +
                "  GRAPH :graph { :s :p ?a }\n" +
                "  GRAPH :graph {\n" +
                "SELECT (?b AS ?a)\n" +
                "WHERE { :t :q ?b }\n" +
                "GROUP BY ?b\n" +
                "  }\n" +
                "}\n";
        String data = "" +
                "@prefix :   .\n" +
                ":graph {\n" +
                "  :s  :p  \"a\" .\n" +
                "  :t  :q  \"b\" .\n" +
                "}\n";

        Dataset ds = DatasetFactory.create();
        RDFParser.fromString(data).lang(Lang.TRIG).parse(ds);

        try (QueryExecution qe = QueryExecutionFactory.create(query, ds)) {
            qe.getContext().set(ARQ.optimization, true);  // flipping this to false fixes the test
            ResultSet rs = qe.execSelect();
            if (rs.hasNext()) {
                System.out.println(rs.nextBinding());
                throw new AssertionError("Result set should be empty");
            }
        }
    }
}
{noformat}


[jira] [Updated] (JENA-1813) Join optimization transform results in incorrect query results

2020-01-08 Thread Shawn Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Smith updated JENA-1813:
--
Description: 
I think I've found a query where TransformJoinStrategy incorrectly decides that 
a query is linear such that a "join" operation can be replaced by a "sequence" 
operation.  As a result, the query returns incorrect results.  Disabling 
optimizations with "qe.getContext().set(ARQ.optimization, false)" fixes the 
issue.

Here's the query: 
{noformat}
PREFIX  :  
SELECT ?a
WHERE {
  GRAPH :graph { :s :p ?a }
  GRAPH :graph {
    SELECT (?b AS ?a)
    WHERE { :t :q ?b }
    GROUP BY ?b
  }
}
{noformat}
Here's the data to test it with (two quads, as Trig): 
{noformat}
@prefix :   .
:graph {
:s  :p  "a" .
:t  :q  "b" .
}
{noformat}
 I expected the query to return zero results because the two GRAPH clauses 
can't find compatible bindings for ?a.  But, in practice, Jena returns ?a="a" 
and logs a warning:
{noformat}
[main] WARN  BindingUtils - merge: Mismatch : "a" != "b"
{noformat}
Note the warning is actually coming from QueryIterProjectMerge.java, not 
BindingUtils.java.  With more complicated queries and datasets, this issue can 
result in thousands or millions of logged warnings.

The query plan before optimization looks like this:
{noformat}
(project (?a)
  (join
    (graph 
      (bgp (triple   ?a)))
    (graph 
      (project (?a)
        (extend ((?a ?b))
          (group (?b)
            (bgp (triple   ?b))))))))
{noformat}
Optimization replaces "join" with "sequence" which fails to detect conflicts on 
?a:
{noformat}
(project (?a)
  (sequence
    (graph 
      (bgp (triple   ?a)))
    (graph 
      (project (?a)
        (extend ((?a ?/b))
          (group (?/b)
            (bgp (triple   ?/b))))))))
{noformat}
For convenience, here's Java code that reproduces the bug:
{noformat}
import org.apache.jena.query.ARQ;
import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSet;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFParser;
import org.junit.Test;

public class QueryTest {

    @Test
    public void testGraphQuery() {
        String query = "" +
                "PREFIX  :  \n" +
                "SELECT ?a\n" +
                "WHERE {\n" +
                "  GRAPH :graph { :s :p ?a }\n" +
                "  GRAPH :graph {\n" +
                "SELECT (?b AS ?a)\n" +
                "WHERE { :t :q ?b }\n" +
                "GROUP BY ?b\n" +
                "  }\n" +
                "}\n";
        String data = "" +
                "@prefix :   .\n" +
                ":graph {\n" +
                "  :s  :p  \"a\" .\n" +
                "  :t  :q  \"b\" .\n" +
                "}\n";

        Dataset ds = DatasetFactory.create();
        RDFParser.fromString(data).lang(Lang.TRIG).parse(ds);

        try (QueryExecution qe = QueryExecutionFactory.create(query, ds)) {
            qe.getContext().set(ARQ.optimization, true);  // flipping this to false fixes the test
            ResultSet rs = qe.execSelect();
            if (rs.hasNext()) {
                System.out.println(rs.nextBinding());
                throw new AssertionError("Result set should be empty");
            }
        }
    }
}
{noformat}


[jira] [Created] (JENA-1813) Join optimization transform results in incorrect query results

2020-01-08 Thread Shawn Smith (Jira)
Shawn Smith created JENA-1813:
-

             Summary: Join optimization transform results in incorrect query results
                 Key: JENA-1813
                 URL: https://issues.apache.org/jira/browse/JENA-1813
             Project: Apache Jena
          Issue Type: Bug
          Components: ARQ
    Affects Versions: Jena 3.13.1
            Reporter: Shawn Smith


I think I've found a query where TransformJoinStrategy incorrectly decides that 
a query is linear such that a "join" operation can be replaced by a "sequence" 
operation.  As a result, the query returns incorrect results.  Disabling 
optimizations with "qe.getContext().set(ARQ.optimization, false)" fixes the 
issue.

Here's the query: 
{noformat}
PREFIX  :  
SELECT ?a
WHERE {
  GRAPH :graph { :s :p ?a }
  GRAPH :graph {
    SELECT (?b AS ?a)
    WHERE { :t :q ?b }
    GROUP BY ?b
  }
}
{noformat}
Here's the data to test it with (two triples, as Trig): 
{noformat}
@prefix :   .
:graph {
:s  :p  "a" .
:t  :q  "b" .
}
{noformat}
 I expected the query to return zero results because the two GRAPH clauses 
can't find compatible bindings for ?a.  But, in practice, Jena returns ?a="a" 
and logs a warning:
{noformat}
[main] WARN  BindingUtils - merge: Mismatch : "a" != "b"
{noformat}
Note the warning is actually coming from QueryIterProjectMerge.java, not 
BindingUtils.java.  With more complicated queries and datasets, this issue can 
result in thousands or millions of logged warnings.

The query plan before optimization looks like this:
{noformat}
(project (?a)
  (join
    (graph 
      (bgp (triple   ?a)))
    (graph 
      (project (?a)
        (extend ((?a ?b))
          (group (?b)
            (bgp (triple   ?b))))))))
{noformat}
Optimization replaces "join" with "sequence" which fails to detect conflicts on 
?a:
{noformat}
(project (?a)
  (sequence
    (graph 
      (bgp (triple   ?a)))
    (graph 
      (project (?a)
        (extend ((?a ?/b))
          (group (?/b)
            (bgp (triple   ?/b))))))))
{noformat}
For convenience, here's Java code that reproduces the bug:
{noformat}
import org.apache.jena.query.ARQ;
import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSet;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFParser;
import org.junit.Test;

public class QueryTest {

    @Test
    public void testGraphQuery() {
        String query = "" +
                "PREFIX  :  \n" +
                "SELECT ?a\n" +
                "WHERE {\n" +
                "  GRAPH :graph { :s :p ?a }\n" +
                "  GRAPH :graph {\n" +
                "SELECT (?b AS ?a)\n" +
                "WHERE { :t :q ?b }\n" +
                "GROUP BY ?b\n" +
                "  }\n" +
                "}\n";
        String data = "" +
                "@prefix :   .\n" +
                ":graph {\n" +
                "  :s  :p  \"a\" .\n" +
                "  :t  :q  \"b\" .\n" +
                "}\n";

        Dataset ds = DatasetFactory.create();
        RDFParser.fromString(data).lang(Lang.TRIG).parse(ds);

        try (QueryExecution qe = QueryExecutionFactory.create(query, ds)) {
            qe.getContext().set(ARQ.optimization, true);  // flipping this to false fixes the test
            ResultSet rs = qe.execSelect();
            if (rs.hasNext()) {
                System.out.println(rs.nextBinding());
                throw new AssertionError("Result set should be empty");
            }
        }
    }
}
{noformat}





[jira] [Updated] (JENA-1771) Spilling combined with DISTINCT .. ORDER BY returns rows in the wrong order

2019-10-17 Thread Shawn Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Smith updated JENA-1771:
--
Description: 
It looks like Jena assumes that OpDistinct preserves order, but order is not 
preserved when spilling occurs. This is only a problem when the 
ARQ.spillToDiskThreshold setting has been configured.

Consider the following query:
{code:java}
PREFIX : 
SELECT DISTINCT  *
WHERE
  { ?x  :p  ?v }
ORDER BY ASC(?v)
{code}
Here's the query plan for this query:
{code:java}
(distinct
  (order ((asc ?v))
    (bgp (triple ?x  ?v))))
{code}
Jena executes the ORDER BY ASC(?v) before the DISTINCT, relying on the SPARQL 
requirement:
{quote}The order of Distinct(Ψ) must preserve any ordering given by OrderBy.
{quote}
But, when spilling, QueryIterDistinct (which executes OpDistinct) creates a 
DistinctDataBag with a BindingComparator without any sort conditions. As a 
result, the DISTINCT operation sorts using "compareBindingsSyntactic()" and 
doesn't preserve the ORDER BY ASC(?v) requirement.
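
The effect can be shown with a minimal sketch in plain Java that does not use Jena at all. DistinctOrderSketch and its methods are hypothetical names, and the natural String ordering of a TreeSet merely stands in for a comparator that ignores the ORDER BY condition; it is not meant to reproduce what "compareBindingsSyntactic()" actually does.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;

/** Toy illustration: de-duplicating through a sorted structure discards a prior ordering. */
class DistinctOrderSketch {

    /** Order-preserving DISTINCT: first occurrences are kept in input order. */
    static List<String> distinctPreservingOrder(List<String> rows) {
        return new ArrayList<>(new LinkedHashSet<>(rows));
    }

    /** DISTINCT via a sorted set whose comparator knows nothing about ORDER BY. */
    static List<String> distinctViaUnrelatedSort(List<String> rows) {
        return new ArrayList<>(new TreeSet<>(rows)); // natural (lexicographic) String order
    }

    public static void main(String[] args) {
        // Rows already sorted numerically, as ORDER BY ASC(?v) would deliver them.
        List<String> ordered = List.of("1", "2", "3", "3", "4", "10", "11", "12");
        System.out.println(distinctPreservingOrder(ordered));  // prints [1, 2, 3, 4, 10, 11, 12]
        System.out.println(distinctViaUnrelatedSort(ordered)); // prints [1, 10, 11, 12, 2, 3, 4]
    }
}
```

The second result has no duplicates, but the numeric order established earlier is gone, which is the same kind of failure as in the spilling case.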

Note that some query plans will reorder the ORDER BY and DISTINCT, making 
things work correctly. For example, adding a LIMIT 5 clause to the query above 
results in a "(top (5 (asc ?v))" operation that doesn't suffer from the bug.
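
As a rough sketch of why such a plan is unaffected: if duplicates are removed first and ordering is applied last, the final order always comes from the sort condition. The code below is a generic bounded-heap top-N over distinct integers. TopNSketch and topDistinct are hypothetical names, and this is not how Jena's own iterator is written.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Toy top-N: keeps the n smallest distinct values and returns them in ascending order. */
class TopNSketch {

    static List<Integer> topDistinct(List<Integer> values, int n) {
        // Max-heap holding at most n candidates; the largest candidate sits at the head.
        PriorityQueue<Integer> heap = new PriorityQueue<>(Comparator.reverseOrder());
        for (Integer v : values) {
            if (heap.contains(v)) {
                continue; // duplicate of a current candidate
            }
            heap.add(v);
            if (heap.size() > n) {
                heap.poll(); // evict the largest candidate
            }
        }
        List<Integer> out = new ArrayList<>(heap);
        out.sort(Comparator.naturalOrder()); // ordering is applied last
        return out;
    }

    public static void main(String[] args) {
        List<Integer> unsorted = List.of(12, 3, 1, 3, 10, 2, 11, 4);
        System.out.println(topDistinct(unsorted, 5)); // prints [1, 2, 3, 4, 10]
    }
}
```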

You can reproduce this by injecting the following into QueryTest.java then 
running the ARQTestRefEngine tests:
{code:java}
void runTestSelect(Query query, QueryExecution qe)
{
    qe.getContext().set(ARQ.spillToDiskThreshold, 4);   // add this line
    ...
{code}
For example, "ARQTestRefEngine -> Algebra optimizations -> 
QueryTest.opt-top-05" will fail with:
{code:java}
Query: 
PREFIX  : 

SELECT DISTINCT  *
WHERE
  { ?x  :p  ?v }
ORDER BY ASC(?v)
LIMIT   5

Got: 5
-------------
| x    | v  |
=============
| :x1  | 1  |
| :x2  | 2  |
| :x10 | 10 |
| :x11 | 11 |
| :x12 | 12 |
-------------
Expected: 5
------------
| x    | v |
============
| :x1  | 1 |
| :x2  | 2 |
| :x3  | 3 |
| :x3a | 3 |
| :x4  | 4 |
------------

junit.framework.AssertionFailedError: Results do not match
    at junit.framework.Assert.fail(Assert.java:57)
    at junit.framework.Assert.assertTrue(Assert.java:22)
    at junit.framework.TestCase.assertTrue(TestCase.java:192)
    at org.apache.jena.sparql.junit.QueryTest.runTestSelect(QueryTest.java:284)
    at org.apache.jena.sparql.junit.QueryTest.runTestForReal(QueryTest.java:201)
    at org.apache.jena.sparql.junit.EarlTestCase.runTest(EarlTestCase.java:88)
    at junit.framework.TestCase.runBare(TestCase.java:141)
    at junit.framework.TestResult$1.protect(TestResult.java:122)
    at junit.framework.TestResult.runProtected(TestResult.java:142)
    at junit.framework.TestResult.run(TestResult.java:125)
    at junit.framework.TestCase.run(TestCase.java:129)
{code}




[jira] [Updated] (JENA-1771) Spilling combined with DISTINCT .. ORDER BY returns rows in the wrong order

2019-10-17 Thread Shawn Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Smith updated JENA-1771:
--
Description: 
It looks like Jena assumes that OpDistinct preserves order, but order is not 
preserved when spilling occurs. This is only a problem when the 
ARQ.spillToDiskThreshold setting has been configured.

Consider the following query:
{code:java}
PREFIX : 
SELECT DISTINCT  *
WHERE
  { ?x  :p  ?v }
ORDER BY ASC(?v)
{code}
Here's the query plan for this query:
{code:java}
(distinct
  (order ((asc ?v))
    (bgp (triple ?x  ?v))))
{code}
Jena executes the ORDER BY ASC(?v) before the DISTINCT, relying on the SPARQL 
requirement:
{quote}The order of Distinct(Ψ) must preserve any ordering given by OrderBy.
{quote}
But, when spilling, QueryIterDistinct (which executes OpDistinct) creates a 
DistinctDataBag with a BindingComparator without any sort conditions. As a 
result, the DISTINCT operation sorts using "compareBindingsSyntactic()" and 
doesn't preserve the ORDER BY ASC(?v) requirement.

Note that some query plans will reorder the ORDER BY and DISTINCT, making 
things work correctly. For example, adding a LIMIT clause to the query above 
results in a "(top (5 (asc ?v))" operation that doesn't suffer from the bug.

You can reproduce this by injecting the following into QueryTest.java then 
running the ARQTestRefEngine tests:
{code:java}
void runTestSelect(Query query, QueryExecution qe)
{
    qe.getContext().set(ARQ.spillToDiskThreshold, 4);   // add this line
    ...
{code}
For example, "ARQTestRefEngine -> Algebra optimizations -> 
QueryTest.opt-top-05" will fail with:
{code:java}
Query: 
PREFIX  : 

SELECT DISTINCT  *
WHERE
  { ?x  :p  ?v }
ORDER BY ASC(?v)
LIMIT   5

Got: 5
-------------
| x    | v  |
=============
| :x1  | 1  |
| :x2  | 2  |
| :x10 | 10 |
| :x11 | 11 |
| :x12 | 12 |
-------------
Expected: 5
------------
| x    | v |
============
| :x1  | 1 |
| :x2  | 2 |
| :x3  | 3 |
| :x3a | 3 |
| :x4  | 4 |
------------

junit.framework.AssertionFailedError: Results do not match
    at junit.framework.Assert.fail(Assert.java:57)
    at junit.framework.Assert.assertTrue(Assert.java:22)
    at junit.framework.TestCase.assertTrue(TestCase.java:192)
    at org.apache.jena.sparql.junit.QueryTest.runTestSelect(QueryTest.java:284)
    at org.apache.jena.sparql.junit.QueryTest.runTestForReal(QueryTest.java:201)
    at org.apache.jena.sparql.junit.EarlTestCase.runTest(EarlTestCase.java:88)
    at junit.framework.TestCase.runBare(TestCase.java:141)
    at junit.framework.TestResult$1.protect(TestResult.java:122)
    at junit.framework.TestResult.runProtected(TestResult.java:142)
    at junit.framework.TestResult.run(TestResult.java:125)
    at junit.framework.TestCase.run(TestCase.java:129)
{code}

  was:
It looks like Jena assumes that OpDistinct preserves order, but order is not 
preserved when spilling occurs.  This is only a problem when the 
ARQ.spillToDiskThreshold setting has been configured.

Consider the following query:
{code:java}
SELECT DISTINCT  *
WHERE
  { ?x  :p  ?v }
ORDER BY ASC(?v)
{code}
Jena executes the ORDER BY ASC(?v) before the DISTINCT, relying on the SPARQL 
requirement:

bq. The order of Distinct(Ψ) must preserve any ordering given by OrderBy.

But, when spilling, QueryIterDistinct (which executes OpDistinct) creates a 
DistinctDataBag with a BindingComparator without any sort conditions.  As a 
result, the DISTINCT operation doesn't preserve the ORDER BY ASC(?v) 
requirement.

You can reproduce this by injecting the following into QueryTest.java then 
running the ARQTestRefEngine tests:
{code:java}
void runTestSelect(Query query, QueryExecution qe)
{
qe.getContext().set(ARQ.spillToDiskThreshold, 4);   // add this line
...
{code}

For example, "ARQTestRefEngine -> Algebra optimizations -> 
QueryTest.opt-top-05" will fail with:
{code}
Query: 
PREFIX  : 

SELECT DISTINCT  *
WHERE
  { ?x  :p  ?v }
ORDER BY ASC(?v)
LIMIT   5

Got: 5
-------------
| x    | v  |
=============
| :x1  | 1  |
| :x2  | 2  |
| :x10 | 10 |
| :x11 | 11 |
| :x12 | 12 |
-------------
Expected: 5
------------
| x    | v |
============
| :x1  | 1 |
| :x2  | 2 |
| :x3  | 3 |
| :x3a | 3 |
| :x4  | 4 |
------------

junit.framework.AssertionFailedError: Results do not match
    at junit.framework.Assert.fail(Assert.java:57)
    at junit.framework.Assert.assertTrue(Assert.java:22)
    at junit.framework.TestCase.assertTrue(TestCase.java:192)
    at org.apache.jena.sparql.junit.QueryTest.runTestSelect(QueryTest.java:284)
    at org.apache.jena.sparql.junit.QueryTest.runTestForReal(QueryTest.java:201)
at 

[jira] [Created] (JENA-1771) Spilling combined with DISTINCT .. ORDER BY returns rows in the wrong order

2019-10-17 Thread Shawn Smith (Jira)
Shawn Smith created JENA-1771:
-

 Summary: Spilling combined with DISTINCT .. ORDER BY returns rows 
in the wrong order
 Key: JENA-1771
 URL: https://issues.apache.org/jira/browse/JENA-1771
 Project: Apache Jena
  Issue Type: Bug
  Components: ARQ
Affects Versions: Jena 3.13.1
Reporter: Shawn Smith


It looks like Jena assumes that OpDistinct preserves order, but order is not 
preserved when spilling occurs.  This is only a problem when the 
ARQ.spillToDiskThreshold setting has been configured.

Consider the following query:
{code:java}
SELECT DISTINCT  *
WHERE
  { ?x  :p  ?v }
ORDER BY ASC(?v)
{code}
Jena executes the ORDER BY ASC(?v) before the DISTINCT, relying on the SPARQL 
requirement:

bq. The order of Distinct(Ψ) must preserve any ordering given by OrderBy.

But, when spilling, QueryIterDistinct (which executes OpDistinct) creates a 
DistinctDataBag with a BindingComparator without any sort conditions.  As a 
result, the DISTINCT operation doesn't preserve the ORDER BY ASC(?v) 
requirement.
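
To illustrate the difference, here is a minimal sketch in plain Java (not Jena code). It assumes rows are simple (x, v) pairs rather than Jena bindings, and the class and method names are made up for this example. An insertion-ordered set keeps the ORDER BY order, while deduplicating through a sorted structure whose comparator ignores the ORDER BY key reorders the rows:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Illustration only: rows are (x, v) pairs, not Jena Binding objects.
final class DistinctOrderSketch {

    // A comparator unrelated to the ORDER BY key: compares x first, then v.
    static final Comparator<Map.Entry<String, Integer>> BY_X_THEN_V = (a, b) -> {
        int c = a.getKey().compareTo(b.getKey());
        return c != 0 ? c : a.getValue().compareTo(b.getValue());
    };

    // Rows as they would arrive from ORDER BY ASC(?v), including one duplicate.
    static List<Map.Entry<String, Integer>> sample() {
        return List.of(Map.entry(":x1", 1), Map.entry(":x2", 2), Map.entry(":x3", 3),
                       Map.entry(":x3", 3), Map.entry(":x10", 10));
    }

    // DISTINCT that keeps the first occurrence of each row in arrival order.
    static List<Map.Entry<String, Integer>> distinctPreservingOrder(
            List<Map.Entry<String, Integer>> rows) {
        return new ArrayList<>(new LinkedHashSet<>(rows));
    }

    // DISTINCT implemented by sorting with the unrelated comparator.
    static List<Map.Entry<String, Integer>> distinctBySorting(
            List<Map.Entry<String, Integer>> rows) {
        TreeSet<Map.Entry<String, Integer>> set = new TreeSet<>(BY_X_THEN_V);
        set.addAll(rows);
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        System.out.println(distinctPreservingOrder(sample())); // [:x1=1, :x2=2, :x3=3, :x10=10]
        System.out.println(distinctBySorting(sample()));       // [:x1=1, :x10=10, :x2=2, :x3=3]
    }
}
```

The second result is still free of duplicates, but it is no longer ordered by v, which is the same kind of symptom as the failing test output.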

You can reproduce this by injecting the following into QueryTest.java then 
running the ARQTestRefEngine tests:
{code:java}
void runTestSelect(Query query, QueryExecution qe)
{
qe.getContext().set(ARQ.spillToDiskThreshold, 4);   // add this line
...
{code}

For example, "ARQTestRefEngine -> Algebra optimizations -> 
QueryTest.opt-top-05" will fail with:
{code}
Query: 
PREFIX  : 

SELECT DISTINCT  *
WHERE
  { ?x  :p  ?v }
ORDER BY ASC(?v)
LIMIT   5

Got: 5
-------------
| x    | v  |
=============
| :x1  | 1  |
| :x2  | 2  |
| :x10 | 10 |
| :x11 | 11 |
| :x12 | 12 |
-------------
Expected: 5
------------
| x    | v |
============
| :x1  | 1 |
| :x2  | 2 |
| :x3  | 3 |
| :x3a | 3 |
| :x4  | 4 |
------------

junit.framework.AssertionFailedError: Results do not match
    at junit.framework.Assert.fail(Assert.java:57)
    at junit.framework.Assert.assertTrue(Assert.java:22)
    at junit.framework.TestCase.assertTrue(TestCase.java:192)
    at org.apache.jena.sparql.junit.QueryTest.runTestSelect(QueryTest.java:284)
    at org.apache.jena.sparql.junit.QueryTest.runTestForReal(QueryTest.java:201)
    at org.apache.jena.sparql.junit.EarlTestCase.runTest(EarlTestCase.java:88)
    at junit.framework.TestCase.runBare(TestCase.java:141)
    at junit.framework.TestResult$1.protect(TestResult.java:122)
    at junit.framework.TestResult.runProtected(TestResult.java:142)
    at junit.framework.TestResult.run(TestResult.java:125)
    at junit.framework.TestCase.run(TestCase.java:129)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (JENA-1770) Spilling bindings with OPTIONAL leads to wrong answers

2019-10-17 Thread Shawn Smith (Jira)
Shawn Smith created JENA-1770:
-

 Summary: Spilling bindings with OPTIONAL leads to wrong answers
 Key: JENA-1770
 URL: https://issues.apache.org/jira/browse/JENA-1770
 Project: Apache Jena
  Issue Type: Bug
  Components: ARQ
Affects Versions: Jena 3.13.1
Reporter: Shawn Smith


A query like the following where some variables are optional may lead to wrong 
answers when spilling occurs: 
{code:java}
PREFIX  foaf: <http://xmlns.com/foaf/0.1/>
SELECT  ?name ?mbox
WHERE
  { ?x  foaf:name  ?name
    OPTIONAL
      { ?x  foaf:mbox  ?mbox }
  }
ORDER BY ASC(?mbox)
{code}
This is only a problem when the ARQ.spillToDiskThreshold setting has been 
configured.

The root cause is that BindingOutputStream emits a VARS row based on the first 
binding, but it doesn't emit a new VARS row when a subsequent binding contains 
additional variables.  

The BindingOutputStream.needVars() method will cause a second VARS row to be 
emitted when a new binding is missing variables, but not when it has extras.  
This logic may be inverted from what was intended.
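
As a rough sketch of the intended behaviour (plain Java, not the actual BindingOutputStream code), a writer could emit a fresh VARS row whenever a binding contains a variable that the current header lacks. The class name, the map-based bindings, and the use of "-" for an unbound variable are assumptions made for this illustration:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: bindings are modelled as variable-name -> literal maps.
final class VarsHeaderSketch {

    // Helper to build an insertion-ordered binding from name/value pairs.
    static Map<String, String> binding(String... kv) {
        Map<String, String> b = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) b.put(kv[i], kv[i + 1]);
        return b;
    }

    // Writes bindings as rows under a VARS header, re-emitting the header
    // when a binding has a variable that the current header does not cover.
    static String write(List<Map<String, String>> bindings) {
        StringBuilder out = new StringBuilder();
        List<String> header = null;
        for (Map<String, String> b : bindings) {
            if (header == null || !header.containsAll(b.keySet())) {
                header = new ArrayList<>(b.keySet());
                out.append("VARS");
                for (String v : header) out.append(" ?").append(v);
                out.append(" .\n");
            }
            for (String v : header) {
                String value = b.get(v);
                // "-" stands in for an unbound variable (assumed placeholder).
                out.append(value == null ? "-" : "\"" + value + "\"").append(' ');
            }
            out.append(".\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(write(List.of(binding("1", "A"), binding("1", "A", "2", "B"))));
    }
}
```

With this rule, a binding that is merely missing variables can reuse the existing header, while a binding with extra variables always triggers a new one, so no value is silently dropped.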

There's a TestDistinctDataBag test case below that reproduces the problem. It 
generates a spill file like this:
{code}
VARS ?1 .
"A" .
"A" .
{code}
when a correct spill file would be:
{code}
VARS ?1 .
"A" .
VARS ?2 ?1 .
"B" "A" .
{code}

If you run it, you may notice that it fails with a spill threshold of 2 but 
passes with a higher threshold:
{code:java}
@Test public void testOptionalVariables()
{
    // Set up a situation where the second binding in a spill file binds more
    // variables than the first binding
    BindingMap binding1 = BindingFactory.create();
    binding1.add(Var.alloc("1"), NodeFactory.createLiteral("A"));

    BindingMap binding2 = BindingFactory.create();
    binding2.add(Var.alloc("1"), NodeFactory.createLiteral("A"));
    binding2.add(Var.alloc("2"), NodeFactory.createLiteral("B"));

    List<Binding> undistinct = Arrays.asList(binding1, binding2, binding1);
    List<Binding> control = Iter.toList(Iter.distinct(undistinct.iterator()));
    List<Binding> distinct = new ArrayList<>();

    DistinctDataBag<Binding> db = new DistinctDataBag<>(
        new ThresholdPolicyCount<>(2),
        SerializationFactoryFinder.bindingSerializationFactory(),
        new BindingComparator(new ArrayList<>()));
    try
    {
        db.addAll(undistinct);
        Iterator<Binding> iter = db.iterator();
        while (iter.hasNext())
        {
            distinct.add(iter.next());
        }
        Iter.close(iter);
    }
    finally
    {
        db.close();
    }

    assertEquals(control.size(), distinct.size());
    assertTrue(ResultSetCompare.equalsByTest(control, distinct, NodeUtils.sameTerm));
}
{code}





[jira] [Commented] (JENA-1523) "VARS requires a list of variables" exception w/spilling and renamed vars

2018-04-16 Thread Shawn Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16439819#comment-16439819
 ] 

Shawn Smith commented on JENA-1523:
---

Thanks for the quick turnaround!

> "VARS requires a list of variables" exception w/spilling and renamed vars
> -
>
> Key: JENA-1523
> URL: https://issues.apache.org/jira/browse/JENA-1523
> Project: Apache Jena
>  Issue Type: Bug
>  Components: ARQ
>Affects Versions: Jena 3.7.0
>Reporter: Shawn Smith
>Assignee: Andy Seaborne
>Priority: Major
> Fix For: Jena 3.8.0
>
>
> Spilling a {{DistinctDataBag}} or {{SortedDataBag}} when executing SPARQL 
> queries that are modified by {{TransformScopeRename}} can result in the 
> following:
> {noformat}
> org.apache.jena.riot.RiotException: [line: 1, col: 7 ] VARS requires a list of variables (found '[SLASH]')
>     at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:147)
>     at org.apache.jena.riot.lang.LangEngine.raiseException(LangEngine.java:148)
>     at org.apache.jena.riot.lang.LangEngine.exceptionDirect(LangEngine.java:143)
>     at org.apache.jena.riot.lang.LangEngine.exception(LangEngine.java:137)
>     at org.apache.jena.sparql.engine.binding.BindingInputStream.access$1900(BindingInputStream.java:64)
>     at org.apache.jena.sparql.engine.binding.BindingInputStream$IteratorTuples.directiveVars(BindingInputStream.java:227)
>     at org.apache.jena.sparql.engine.binding.BindingInputStream$IteratorTuples.directives(BindingInputStream.java:140)
>     at org.apache.jena.sparql.engine.binding.BindingInputStream$IteratorTuples.<init>(BindingInputStream.java:129)
>     at org.apache.jena.sparql.engine.binding.BindingInputStream.<init>(BindingInputStream.java:99)
>     at org.apache.jena.sparql.engine.binding.BindingInputStream.<init>(BindingInputStream.java:78)
>     at org.apache.jena.sparql.engine.binding.BindingInputStream.<init>(BindingInputStream.java:73)
>     at org.apache.jena.riot.system.SerializationFactoryFinder$1.createDeserializer(SerializationFactoryFinder.java:56)
>     at org.apache.jena.atlas.data.SortedDataBag.getInputIterator(SortedDataBag.java:190)
>     at org.apache.jena.atlas.data.SortedDataBag.iterator(SortedDataBag.java:235)
>     at org.apache.jena.atlas.data.SortedDataBag.iterator(SortedDataBag.java:206)
>     at org.apache.jena.atlas.data.DistinctDataBag.iterator(DistinctDataBag.java:94)
> {noformat}
> The problem is that renaming variables prepends a "/" so that, for example, 
> the first line of the spill file might look like the following which 
> {{BindingInputStream.directiveVars()}} can't parse:
> {noformat}
> VARS ?/.1 ?/.0 ?v_2 ?v_21 ?v_1 .{noformat}
> Here's a test case that reproduces the exception:
> {noformat}
> @Test
> public void testWithRenamedVars() {
>     ExprVar expr = (ExprVar) Rename.renameVars(new ExprVar("1"), Collections.emptySet());
>     BindingMap binding = BindingFactory.create();
>     binding.add(expr.asVar(), NodeFactory.createLiteral("foo"));
>     SortedDataBag<Binding> dataBag = BagFactory.newSortedBag(
>         new ThresholdPolicyCount<>(0),
>         SerializationFactoryFinder.bindingSerializationFactory(),
>         new BindingComparator(new ArrayList<>()));
>     try {
>         dataBag.add(binding);
>         dataBag.flush();
>         // Spill file looks like the following:
>         // VARS ?/1 .
>         // "foo" .
>         Binding actual = dataBag.iterator().next();
>         assertEquals(binding, actual);
>     } finally {
>         dataBag.close();
>     }
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (JENA-1523) "VARS requires a list of variables" exception w/spilling and renamed vars

2018-04-12 Thread Shawn Smith (JIRA)
Shawn Smith created JENA-1523:
-

 Summary: "VARS requires a list of variables" exception w/spilling 
and renamed vars
 Key: JENA-1523
 URL: https://issues.apache.org/jira/browse/JENA-1523
 Project: Apache Jena
  Issue Type: Bug
  Components: ARQ
Affects Versions: Jena 3.7.0
Reporter: Shawn Smith


Spilling a {{DistinctDataBag}} or {{SortedDataBag}} when executing SPARQL 
queries that are modified by {{TransformScopeRename}} can result in the 
following:
{noformat}
org.apache.jena.riot.RiotException: [line: 1, col: 7 ] VARS requires a list of variables (found '[SLASH]')
    at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:147)
    at org.apache.jena.riot.lang.LangEngine.raiseException(LangEngine.java:148)
    at org.apache.jena.riot.lang.LangEngine.exceptionDirect(LangEngine.java:143)
    at org.apache.jena.riot.lang.LangEngine.exception(LangEngine.java:137)
    at org.apache.jena.sparql.engine.binding.BindingInputStream.access$1900(BindingInputStream.java:64)
    at org.apache.jena.sparql.engine.binding.BindingInputStream$IteratorTuples.directiveVars(BindingInputStream.java:227)
    at org.apache.jena.sparql.engine.binding.BindingInputStream$IteratorTuples.directives(BindingInputStream.java:140)
    at org.apache.jena.sparql.engine.binding.BindingInputStream$IteratorTuples.<init>(BindingInputStream.java:129)
    at org.apache.jena.sparql.engine.binding.BindingInputStream.<init>(BindingInputStream.java:99)
    at org.apache.jena.sparql.engine.binding.BindingInputStream.<init>(BindingInputStream.java:78)
    at org.apache.jena.sparql.engine.binding.BindingInputStream.<init>(BindingInputStream.java:73)
    at org.apache.jena.riot.system.SerializationFactoryFinder$1.createDeserializer(SerializationFactoryFinder.java:56)
    at org.apache.jena.atlas.data.SortedDataBag.getInputIterator(SortedDataBag.java:190)
    at org.apache.jena.atlas.data.SortedDataBag.iterator(SortedDataBag.java:235)
    at org.apache.jena.atlas.data.SortedDataBag.iterator(SortedDataBag.java:206)
    at org.apache.jena.atlas.data.DistinctDataBag.iterator(DistinctDataBag.java:94)
{noformat}
The problem is that renaming variables prepends a "/" so that, for example, the 
first line of the spill file might look like the following which 
{{BindingInputStream.directiveVars()}} can't parse:
{noformat}
VARS ?/.1 ?/.0 ?v_2 ?v_21 ?v_1 .{noformat}
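
To make the failure mode concrete, here is a small stand-alone sketch (plain Java, not Jena's tokenizer). Both patterns and all names are hypothetical. The strict pattern only accepts letters, digits and underscores in a variable name, so it rejects the renamed variables, while the lenient one also allows "/" and ".":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustration only: a toy parser for a "VARS ?a ?b ." header line.
final class VarsLineSketch {

    // Hypothetical patterns; neither is taken from Jena.
    static final Pattern STRICT = Pattern.compile("\\?[A-Za-z0-9_]+");
    static final Pattern LENIENT = Pattern.compile("\\?[A-Za-z0-9_/.]+");

    // Returns the variable names on the line, without the leading '?'.
    static List<String> parseVars(String line, Pattern varToken) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length < 2 || !tokens[0].equals("VARS")
                || !tokens[tokens.length - 1].equals(".")) {
            throw new IllegalArgumentException("Not a VARS line: " + line);
        }
        List<String> vars = new ArrayList<>();
        for (int i = 1; i < tokens.length - 1; i++) {
            if (!varToken.matcher(tokens[i]).matches()) {
                throw new IllegalArgumentException(
                    "VARS requires a list of variables (found '" + tokens[i] + "')");
            }
            vars.add(tokens[i].substring(1));
        }
        return vars;
    }

    public static void main(String[] args) {
        String line = "VARS ?/.1 ?/.0 ?v_2 ?v_21 ?v_1 .";
        System.out.println(parseVars(line, LENIENT)); // [/.1, /.0, v_2, v_21, v_1]
        try {
            parseVars(line, STRICT);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Whatever the real fix looks like, the writer and the reader have to agree on which characters may appear in a variable name, or the writer has to escape the ones the reader cannot handle.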
Here's a test case that reproduces the exception:
{noformat}
@Test
public void testWithRenamedVars() {
    ExprVar expr = (ExprVar) Rename.renameVars(new ExprVar("1"), Collections.emptySet());

    BindingMap binding = BindingFactory.create();
    binding.add(expr.asVar(), NodeFactory.createLiteral("foo"));

    SortedDataBag<Binding> dataBag = BagFactory.newSortedBag(
        new ThresholdPolicyCount<>(0),
        SerializationFactoryFinder.bindingSerializationFactory(),
        new BindingComparator(new ArrayList<>()));
    try {
        dataBag.add(binding);
        dataBag.flush();

        // Spill file looks like the following:
        // VARS ?/1 .
        // "foo" .

        Binding actual = dataBag.iterator().next();
        assertEquals(binding, actual);
    } finally {
        dataBag.close();
    }
}
{noformat}





[jira] [Commented] (JENA-1269) Spilling a data bag with boolean literals throws a parse exception

2016-12-22 Thread Shawn Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15770274#comment-15770274
 ] 

Shawn Smith commented on JENA-1269:
---

I built the latest master and tested it locally.  Everything looks good.  
Thanks for the quick turnaround!

> Spilling a data bag with boolean literals throws a parse exception
> --
>
> Key: JENA-1269
> URL: https://issues.apache.org/jira/browse/JENA-1269
> Project: Apache Jena
>  Issue Type: Bug
>  Components: ARQ
>Affects Versions: Jena 3.1.1
>Reporter: Shawn Smith
>Assignee: Andy Seaborne
> Fix For: Jena 3.2.0
>
>
> Spilling bindings with boolean literals to a DistinctDataBag or SortedDataBag 
> results in parse errors when the data bag reads the bindings back in.  This 
> occurs with:
> {noformat}
> "false"^^<http://www.w3.org/2001/XMLSchema#boolean>
> "true"^^<http://www.w3.org/2001/XMLSchema#boolean>
> {noformat}
> It looks like there's a mismatch where booleans don't round trip correctly 
> through BindingOutputStream and BindingInputStream.  BindingOutputStream 
> writes the boolean literals to the spill file as "true" or "false", then 
> BindingInputStream parses them as symbol tokens instead of node tokens and 
> fails.
> Here's a unit test that reproduces the parse error:
> {code:java}
> import org.apache.jena.atlas.data.*;
> import org.apache.jena.datatypes.xsd.XSDDatatype;
> import org.apache.jena.graph.*;
> import org.apache.jena.riot.system.SerializationFactoryFinder;
> import org.apache.jena.sparql.core.Var;
> import org.apache.jena.sparql.engine.binding.*;
> import org.junit.Assert;
> import org.junit.Test;
> public class DataBagSpillTest {
>   @Test
>   public void testSpillBooleans() {
>     Node literal = NodeFactory.createLiteral("true", XSDDatatype.XSDboolean);
>     Binding parent = BindingFactory.binding(Var.alloc("a"), NodeFactory.createLiteral("xyz"));
>     Binding binding = BindingFactory.binding(parent, Var.alloc("b"), literal);
>     //Binding binding = BindingFactory.binding(BindingFactory.noParent, Var.alloc("b"), literal);
>     SerializationFactory<Binding> serializationFactory =
>         SerializationFactoryFinder.bindingSerializationFactory();
>     SortedDataBag<Binding> dataBag = BagFactory.newSortedBag(
>         new ThresholdPolicyCount<>(0), serializationFactory, null);
>     try {
>       dataBag.add(binding);
>       dataBag.flush();
>       // Spill file looks like the following (uses Turtle syntax for literals):
>       // VARS ?b ?a .
>       // true "xyz" .
>       // On reading back the dataBag it throws:
>       //
>       //  org.apache.jena.riot.RiotException: [line: 2, col: 7 ]
>       //    Not a valid token for an RDF term: [KEYWORD:false]
>       //
>       // If the test is modified to leave out the 'parent' binding
>       // (uncomment 'noParent' line) it throws:
>       //
>       //  org.apache.jena.riot.RiotException: [line: 2, col: 6 ]
>       //    Too many items in a line.  Expected 1
>       //
>       Binding actual = dataBag.iterator().next();
>       Assert.assertEquals(binding, actual);
>     } finally {
>       dataBag.close();
>     }
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (JENA-1269) Spilling a data bag with boolean literals throws a parse exception

2016-12-21 Thread Shawn Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Smith updated JENA-1269:
--
Description: 
Spilling bindings with boolean literals to a DistinctDataBag or SortedDataBag 
results in parse errors when the data bag reads the bindings back in.  This 
occurs with:

{noformat}
"false"^^<http://www.w3.org/2001/XMLSchema#boolean>
"true"^^<http://www.w3.org/2001/XMLSchema#boolean>
{noformat}

It looks like there's a mismatch where booleans don't round trip correctly 
through BindingOutputStream and BindingInputStream.  BindingOutputStream writes 
the boolean literals to the spill file as "true" or "false", then 
BindingInputStream parses them as symbol tokens instead of node tokens and 
fails.

Here's a unit test that reproduces the parse error:

{code:java}
import org.apache.jena.atlas.data.*;
import org.apache.jena.datatypes.xsd.XSDDatatype;
import org.apache.jena.graph.*;
import org.apache.jena.riot.system.SerializationFactoryFinder;
import org.apache.jena.sparql.core.Var;
import org.apache.jena.sparql.engine.binding.*;
import org.junit.Assert;
import org.junit.Test;

public class DataBagSpillTest {
  @Test
  public void testSpillBooleans() {
    Node literal = NodeFactory.createLiteral("true", XSDDatatype.XSDboolean);

    Binding parent = BindingFactory.binding(Var.alloc("a"), NodeFactory.createLiteral("xyz"));
    Binding binding = BindingFactory.binding(parent, Var.alloc("b"), literal);
    //Binding binding = BindingFactory.binding(BindingFactory.noParent, Var.alloc("b"), literal);

    SerializationFactory<Binding> serializationFactory =
        SerializationFactoryFinder.bindingSerializationFactory();
    SortedDataBag<Binding> dataBag = BagFactory.newSortedBag(
        new ThresholdPolicyCount<>(0), serializationFactory, null);
    try {
      dataBag.add(binding);
      dataBag.flush();

      // Spill file looks like the following (uses Turtle syntax for literals):
      // VARS ?b ?a .
      // true "xyz" .

      // On reading back the dataBag it throws:
      //
      //  org.apache.jena.riot.RiotException: [line: 2, col: 7 ]
      //    Not a valid token for an RDF term: [KEYWORD:false]
      //
      // If the test is modified to leave out the 'parent' binding
      // (uncomment 'noParent' line) it throws:
      //
      //  org.apache.jena.riot.RiotException: [line: 2, col: 6 ]
      //    Too many items in a line.  Expected 1
      //

      Binding actual = dataBag.iterator().next();
      Assert.assertEquals(binding, actual);
    } finally {
      dataBag.close();
    }
  }
}
{code}


  was:
Spilling bindings with boolean literals to a DistinctDataBag or SortedDataBag 
results in parse errors when the data bag reads the bindings back in.  This 
occurs with:

{noformat}
"false"^^<http://www.w3.org/2001/XMLSchema#boolean>
"true"^^<http://www.w3.org/2001/XMLSchema#boolean>
{noformat}

It looks like there's a mismatch where booleans don't round trip correctly 
through BindingOutputStream and BindingInputStream.  BindingOutputStream writes 
the boolean literals to the spill file as "true" or "false", then 
BindingInputStream parses them as symbol tokens instead of node tokens and 
fails.

Here's a unit test that reproduces the parse error:

{code:java}
import org.apache.jena.atlas.data.*;
import org.apache.jena.datatypes.xsd.XSDDatatype;
import org.apache.jena.graph.*;
import org.apache.jena.riot.system.SerializationFactoryFinder;
import org.apache.jena.sparql.core.Var;
import org.apache.jena.sparql.engine.binding.*;
import org.junit.Assert;
import org.junit.Test;

public class JenaSparqlClientTest {
  @Test
  public void testSpillBooleans() {
    Node literal = NodeFactory.createLiteral("true", XSDDatatype.XSDboolean);

    Binding parent = BindingFactory.binding(Var.alloc("a"), NodeFactory.createLiteral("xyz"));
    Binding binding = BindingFactory.binding(parent, Var.alloc("b"), literal);
    //Binding binding = BindingFactory.binding(BindingFactory.noParent, Var.alloc("b"), literal);

    SerializationFactory<Binding> serializationFactory =
        SerializationFactoryFinder.bindingSerializationFactory();
    SortedDataBag<Binding> dataBag = BagFactory.newSortedBag(
        new ThresholdPolicyCount<>(0), serializationFactory, null);
    try {
      dataBag.add(binding);
      dataBag.flush();

      // Spill file looks like the following (uses Turtle syntax for literals):
      // VARS ?b ?a .
      // true "xyz" .

      // On reading back the dataBag it throws:
      //
      //  org.apache.jena.riot.RiotException: [line: 2, col: 7 ] Not a valid token for an RDF term: [KEYWORD:false]
      //
      // If the test is modified to leave out the 'parent' binding
      // (uncomment 'noParent' line) it throws:
      //
      //  org.apache.jena.riot.RiotException: [line: 2, col: 6 ] Too many items in a line.  Expected 1
      //

      Binding actual = dataBag.iterator().next();
      Assert.assertEquals(binding, actual);
    } finally {
  

[jira] [Created] (JENA-1269) Spilling a data bag with boolean literals throws a parse exception

2016-12-21 Thread Shawn Smith (JIRA)
Shawn Smith created JENA-1269:
-

 Summary: Spilling a data bag with boolean literals throws a parse 
exception
 Key: JENA-1269
 URL: https://issues.apache.org/jira/browse/JENA-1269
 Project: Apache Jena
  Issue Type: Bug
  Components: ARQ
Affects Versions: Jena 3.1.1
Reporter: Shawn Smith


Spilling bindings with boolean literals to a DistinctDataBag or SortedDataBag 
results in parse errors when the data bag reads the bindings back in.  This 
occurs with:

{noformat}
"false"^^<http://www.w3.org/2001/XMLSchema#boolean>
"true"^^<http://www.w3.org/2001/XMLSchema#boolean>
{noformat}

It looks like there's a mismatch where booleans don't round trip correctly 
through BindingOutputStream and BindingInputStream.  BindingOutputStream writes 
the boolean literals to the spill file as "true" or "false", then 
BindingInputStream parses them as symbol tokens instead of node tokens and 
fails.
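
The mismatch can be sketched outside Jena like this. It is plain Java with made-up names and does not reflect how BindingOutputStream or BindingInputStream are actually implemented. The writer abbreviates boolean literals to bare keywords, so a reader that only understands the quoted long form fails, while a reader that also accepts the keywords round-trips correctly:

```java
// Illustration only: literals are (lexical form, datatype IRI) pairs.
final class BooleanRoundTripSketch {

    static final String XSD_BOOLEAN = "http://www.w3.org/2001/XMLSchema#boolean";

    // Writer: Turtle-style abbreviation for boolean literals, long form otherwise.
    // Escaping of quotes inside the lexical form is deliberately left out.
    static String write(String lexical, String datatype) {
        if (XSD_BOOLEAN.equals(datatype)
                && (lexical.equals("true") || lexical.equals("false"))) {
            return lexical;
        }
        return "\"" + lexical + "\"^^<" + datatype + ">";
    }

    // Reader that only understands the long form "lexical"^^<datatype>.
    static String[] readStrict(String token) {
        int sep = token.indexOf("\"^^<");
        if (!token.startsWith("\"") || sep < 1 || !token.endsWith(">")) {
            throw new IllegalArgumentException("Not a valid token for an RDF term: " + token);
        }
        return new String[] { token.substring(1, sep),
                              token.substring(sep + 4, token.length() - 1) };
    }

    // Reader that additionally accepts the bare boolean keywords.
    static String[] readWithKeywords(String token) {
        if (token.equals("true") || token.equals("false")) {
            return new String[] { token, XSD_BOOLEAN };
        }
        return readStrict(token);
    }

    public static void main(String[] args) {
        String written = write("true", XSD_BOOLEAN);      // bare keyword: true
        String[] back = readWithKeywords(written);
        System.out.println(back[0] + " " + back[1]);
        try {
            readStrict(written);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Either side can restore the round trip: the writer can stop abbreviating, or the reader can learn the keywords.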

Here's a unit test that reproduces the parse error:

{code:java}
import org.apache.jena.atlas.data.*;
import org.apache.jena.datatypes.xsd.XSDDatatype;
import org.apache.jena.graph.*;
import org.apache.jena.riot.system.SerializationFactoryFinder;
import org.apache.jena.sparql.core.Var;
import org.apache.jena.sparql.engine.binding.*;
import org.junit.Assert;
import org.junit.Test;

public class JenaSparqlClientTest {
  @Test
  public void testSpillBooleans() {
    Node literal = NodeFactory.createLiteral("true", XSDDatatype.XSDboolean);

    Binding parent = BindingFactory.binding(Var.alloc("a"), NodeFactory.createLiteral("xyz"));
    Binding binding = BindingFactory.binding(parent, Var.alloc("b"), literal);
    //Binding binding = BindingFactory.binding(BindingFactory.noParent, Var.alloc("b"), literal);

    SerializationFactory<Binding> serializationFactory =
        SerializationFactoryFinder.bindingSerializationFactory();
    SortedDataBag<Binding> dataBag = BagFactory.newSortedBag(
        new ThresholdPolicyCount<>(0), serializationFactory, null);
    try {
      dataBag.add(binding);
      dataBag.flush();

      // Spill file looks like the following (uses Turtle syntax for literals):
      // VARS ?b ?a .
      // true "xyz" .

      // On reading back the dataBag it throws:
      //
      //  org.apache.jena.riot.RiotException: [line: 2, col: 7 ] Not a valid token for an RDF term: [KEYWORD:false]
      //
      // If the test is modified to leave out the 'parent' binding
      // (uncomment 'noParent' line) it throws:
      //
      //  org.apache.jena.riot.RiotException: [line: 2, col: 6 ] Too many items in a line.  Expected 1
      //

      Binding actual = dataBag.iterator().next();
      Assert.assertEquals(binding, actual);
    } finally {
      dataBag.close();
    }
  }
}
{code}



