[jira] [Commented] (JENA-587) SELECT DISTINCT returns duplicate results

2013-11-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820420#comment-13820420
 ] 

Hudson commented on JENA-587:
-

ABORTED: Integrated in Jena_Development_Test #1039 (See 
[https://builds.apache.org/job/Jena_Development_Test/1039/])
Couple more unit tests for JENA-587 (rvesse: rev 1541149)
* /jena/trunk/jena-arq/src/test/java/com/hp/hpl/jena/sparql/algebra/optimize/TestOptimizer.java
Re-enable TransformDistinctToReduced making it much stricter about the kinds of 
queries it will optimize.
Expands the unit tests to cover various scenarios identified in the associated 
bug (JENA-587) (rvesse: rev 1541139)
* /jena/trunk/jena-arq/src/main/java/com/hp/hpl/jena/sparql/algebra/optimize/Optimize.java
* /jena/trunk/jena-arq/src/main/java/com/hp/hpl/jena/sparql/algebra/optimize/TransformDistinctToReduced.java
* /jena/trunk/jena-arq/src/test/java/com/hp/hpl/jena/sparql/algebra/optimize/TestOptimizer.java
Make TransformDistinctToReduced off by default until it can be refactored to 
only apply when safe, also disables affected tests for now (JENA-587) (rvesse: 
rev 1541019)
* /jena/trunk/jena-arq/src/main/java/com/hp/hpl/jena/sparql/algebra/optimize/Optimize.java
* /jena/trunk/jena-arq/src/test/java/com/hp/hpl/jena/sparql/algebra/optimize/TestOptimizer.java


> SELECT DISTINCT returns duplicate results
> -
>
> Key: JENA-587
> URL: https://issues.apache.org/jira/browse/JENA-587
> Project: Apache Jena
>  Issue Type: Bug
>  Components: ARQ
>Affects Versions: Jena 2.11.0
>Reporter: Veyriere
>Assignee: Rob Vesse
> Attachments: D.ttl, Q.rq, bug Jena2.11.0.zip, jena-587.zip
>
>
> SELECT DISTINCT returns duplicate results. Attaching a small quads dump and 
> the query to reproduce with TDB
> Reproduced with Jena 2.11.0 and Jena 2.10.1 (was working with 2.7.4)



--
This message was sent by Atlassian JIRA
(v6.1#6144)


Re: svn commit: r1541118 - in /jena/trunk/jena-arq/src/main/java/org/apache/jena/riot: lang/BlankNodeAllocatorFixedSeedHash.java lang/BlankNodeAllocatorHash.java lang/LabelToNode.java tokens/Tokenizer

2013-11-12 Thread Andy Seaborne

 
jena/trunk/jena-arq/src/main/java/org/apache/jena/riot/tokens/TokenizerFactory.java





Modified: 
jena/trunk/jena-arq/src/main/java/org/apache/jena/riot/tokens/TokenizerFactory.java
URL: 
http://svn.apache.org/viewvc/jena/trunk/jena-arq/src/main/java/org/apache/jena/riot/tokens/TokenizerFactory.java?rev=1541118&r1=1541117&r2=1541118&view=diff
==
--- 
jena/trunk/jena-arq/src/main/java/org/apache/jena/riot/tokens/TokenizerFactory.java
 (original)
+++ 
jena/trunk/jena-arq/src/main/java/org/apache/jena/riot/tokens/TokenizerFactory.java
 Tue Nov 12 15:53:36 2013
@@ -42,6 +42,13 @@ public class TokenizerFactory
         Tokenizer tokenizer = new TokenizerText(peekReader) ;
         return tokenizer ;
     }
+
+    public static Tokenizer makeTokenizerUTF8(String string)
+    {
+        PeekReader peekReader = PeekReader.readString(string);
+        Tokenizer tokenizer = new TokenizerText(peekReader);
+        return tokenizer;
+    }
 
     public static Tokenizer makeTokenizerASCII(InputStream in)
     {




Rob -

There is TokenizerFactory.makeTokenizerString which is identical to 
makeTokenizerUTF8.  "String" was a better name because a string isn't 
UTF8 in Java.


Andy
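To illustrate the naming point with plain JDK code (a sketch, not Jena code): a Java String holds UTF-16 code units, and UTF-8 only exists once the string is explicitly encoded to bytes, which is why a name like makeTokenizerString describes the input more accurately than makeTokenizerUTF8. The class name StringEncodingDemo is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares a String's own length (UTF-16 code units)
// with the byte count it gets after explicit UTF-8 encoding.
class StringEncodingDemo {
    // Number of UTF-16 code units stored in the String itself.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of bytes once the String is encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String cafe = "caf\u00e9";      // 4 chars; the accented e needs 2 bytes in UTF-8
        String smile = "\uD83D\uDE00";  // one code point stored as a surrogate pair
        System.out.println(utf16Units(cafe) + " units, " + utf8Bytes(cafe) + " bytes");   // 4 units, 5 bytes
        System.out.println(utf16Units(smile) + " units, " + utf8Bytes(smile) + " bytes"); // 2 units, 4 bytes
    }
}
```

The encoding is chosen when bytes are produced or consumed (an InputStream, a Reader), not by the String itself.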



[jira] [Commented] (JENA-587) SELECT DISTINCT returns duplicate results

2013-11-12 Thread Rob Vesse (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820272#comment-13820272
 ] 

Rob Vesse commented on JENA-587:


I've committed a revised version of the optimiser which should implement the 
restrictions we discussed. Currently it doesn't attempt to handle the case of 
{{SELECT DISTINCT *}} with a total ordering, but that could be added later.

Jenkins appears to be ill, so I've pushed up SNAPSHOTs manually.



[jira] [Commented] (JENA-587) SELECT DISTINCT returns duplicate results

2013-11-12 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820250#comment-13820250
 ] 

ASF subversion and git services commented on JENA-587:
--

Commit 1541149 from [~rvesse] in branch 'jena/trunk'
[ https://svn.apache.org/r1541149 ]

Couple more unit tests for JENA-587



[jira] [Commented] (JENA-587) SELECT DISTINCT returns duplicate results

2013-11-12 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820241#comment-13820241
 ] 

ASF subversion and git services commented on JENA-587:
--

Commit 1541139 from [~rvesse] in branch 'jena/trunk'
[ https://svn.apache.org/r1541139 ]

Re-enable TransformDistinctToReduced making it much stricter about the kinds of 
queries it will optimize.
Expands the unit tests to cover various scenarios identified in the associated 
bug (JENA-587)



[jira] [Comment Edited] (JENA-189) Jena 3 / technical

2013-11-12 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820129#comment-13820129
 ] 

Andy Seaborne edited comment on JENA-189 at 11/12/13 2:48 PM:
--

There is quite a big difference between, say, needing to change import 
statements because of repackaging and needing to use IRIs everywhere. This is 
not to argue against it, but just because some changes require rework does not 
mean they all require the same amount of work.

For this to be a good idea, we'd need to understand the implications. The Jena 
IRI library performs a detailed parse of the string. Is that an acceptable cost? 
What if a loop is doing an operation where part of the loop body uses the 
same string each time? Avoiding repeated parsing may be necessary.
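One way to avoid the repeated-parsing cost is to parse each distinct string once and reuse the result. A minimal sketch, assuming java.net.URI as a stand-in for the Jena IRI library (whose parse is more detailed); IriCache is a hypothetical name. When the string is known before the loop, simply hoisting the parse out of the loop is simpler still.

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper. java.net.URI stands in for the Jena IRI library here;
// the point is only that each distinct string is parsed once.
class IriCache {
    private final Map<String, URI> cache = new ConcurrentHashMap<>();
    private final AtomicInteger parses = new AtomicInteger(); // counts real parses, for illustration

    URI get(String s) {
        return cache.computeIfAbsent(s, k -> {
            parses.incrementAndGet();
            return URI.create(k); // throws IllegalArgumentException if k is malformed
        });
    }

    int parseCount() {
        return parses.get();
    }

    public static void main(String[] args) {
        IriCache iris = new IriCache();
        String p = "http://example.org/ns#p";
        for (int i = 0; i < 1000; i++) {
            iris.get(p); // parsed on the first iteration only, then served from the map
        }
        System.out.println("parses = " + iris.parseCount()); // parses = 1
    }
}
```

The cache is unbounded, so a real API would need an eviction policy or a caller-held handle to the parsed IRI.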

Jena can support multiple APIs - a possibility is to grow this style in 
parallel with a fairly direct port of the existing API and see which gains 
traction.  It allows for a wide scope for change without forcing it on users 
just to get access to other improvements that aren't connected to the API.



> Jena 3 / technical
> --
>
> Key: JENA-189
> URL: https://issues.apache.org/jira/browse/JENA-189
> Project: Apache Jena
>  Issue Type: Brainstorming
>Reporter: Andy Seaborne
> Attachments: IteratorLockandTransactionsinJena3.pdf
>
>
> This is a JIRA to discuss and collect technical changes to Jena that would 
> warrant a "Jena3" whether an incompatible change or just sufficient changes 
> to mean bumping the major version number is best.





Fuseki UI: validation conneg

2013-11-12 Thread Ian Dickinson
Hi Andy,
Can we have the various /validate/* methods return JSON when
application/json is requested? I'm replicating the behaviour of the
current validation forms, but I'd like to do them as Ajax requests and
display the results in a CodeMirror box. So all a JSON API needs to
return is the validation output.

Thanks,
Ian
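A minimal sketch of the content negotiation being requested: pick between a plain-text and a JSON response from the Accept header. ValidateConneg and choose are hypothetical names, not Fuseki code, and text/plain as the default is an assumption. The sketch honours q-values and the full wildcard but, as a simplification, ignores partial wildcards such as application/*.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of Accept-header negotiation for a /validate/* endpoint.
// Not Fuseki code. Handles q-values and the full wildcard only.
class ValidateConneg {
    // Returns the offered type with the highest q-value, or null if none is acceptable.
    // Ties go to the Accept entry listed first, then to the earlier offered type.
    static String choose(String accept, List<String> offered) {
        if (accept == null || accept.trim().isEmpty()) {
            return offered.get(0); // no preference given: use the default (first) type
        }
        String best = null;
        double bestQ = 0.0;
        for (String part : accept.split(",")) {
            String[] fields = part.trim().split(";");
            String type = fields[0].trim().toLowerCase();
            double q = 1.0;
            for (int i = 1; i < fields.length; i++) {
                String f = fields[i].trim();
                if (f.startsWith("q=")) {
                    try {
                        q = Double.parseDouble(f.substring(2));
                    } catch (NumberFormatException e) {
                        q = 0.0; // treat a malformed q-value as "not acceptable"
                    }
                }
            }
            for (String o : offered) {
                boolean matches = type.equals(o) || type.equals("*/*");
                if (matches && q > bestQ) {
                    best = o;
                    bestQ = q;
                }
            }
        }
        return best; // null: a server would typically answer 406 Not Acceptable
    }

    public static void main(String[] args) {
        List<String> offered = Arrays.asList("text/plain", "application/json");
        System.out.println(choose("application/json", offered));     // application/json
        System.out.println(choose("text/html, */*;q=0.1", offered)); // text/plain
    }
}
```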


[jira] [Commented] (JENA-189) Jena 3 / technical

2013-11-12 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820129#comment-13820129
 ] 

Andy Seaborne commented on JENA-189:


There is quite a big difference between, say, needing to change import 
statements because of repackaging and needing to use IRIs everywhere. This is 
not to argue against it, but just because some changes require rework does not 
mean they all require the same amount of work.

For this to be a good idea, we'd need to understand the implications. The Jena 
IRI library performs a detailed parse of the string. Is that an acceptable cost? 
What if a loop is doing an operation where part of the loop body uses the 
same string each time? Avoiding repeated parsing may be necessary.

Jena can support multiple APIs - a possibility is to grow this style in 
parallel with a fairly direct port of the existing API and see which gains 
traction. It allows for a wide scope for change without forcing it on users 
just to get access to other improvements that aren't connected to the API.



[jira] [Commented] (JENA-587) SELECT DISTINCT returns duplicate results

2013-11-12 Thread Rob Vesse (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820051#comment-13820051
 ] 

Rob Vesse commented on JENA-587:


I'll take a look at implementing the restricted optimisation later today.



[jira] [Commented] (JENA-587) SELECT DISTINCT returns duplicate results

2013-11-12 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820049#comment-13820049
 ] 

ASF subversion and git services commented on JENA-587:
--

Commit 1541019 from [~rvesse] in branch 'jena/trunk'
[ https://svn.apache.org/r1541019 ]

Make TransformDistinctToReduced off by default until it can be refactored to 
only apply when safe, also disables affected tests for now (JENA-587)



[jira] [Assigned] (JENA-587) SELECT DISTINCT returns duplicate results

2013-11-12 Thread Rob Vesse (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rob Vesse reassigned JENA-587:
--

Assignee: Rob Vesse



[jira] [Commented] (JENA-587) SELECT DISTINCT returns duplicate results

2013-11-12 Thread Rob Vesse (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820050#comment-13820050
 ] 

Rob Vesse commented on JENA-587:


Agreed. I have pushed a commit which, for the time being, disables the 
optimisation unless it is explicitly enabled, and disables the affected tests.



[jira] [Commented] (JENA-587) SELECT DISTINCT returns duplicate results

2013-11-12 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820038#comment-13820038
 ] 

Andy Seaborne commented on JENA-587:


A slightly stronger condition: all the {{DISTINCT}} variables appear in 
{{ORDER BY}}, and only those variables, e.g. {{DISTINCT ?v ORDER BY ?v}} and 
{{DISTINCT ?v ?w ORDER BY ?v ?w}}.

The order in {{ORDER BY}} matters. {{DISTINCT ?v ORDER BY ?v ?w}} is OK, but 
reversing ?v and ?w ({{DISTINCT ?v ORDER BY ?w ?v}}) is not, because sorting on 
?w first scrambles the ?v adjacency that {{REDUCED}} relies on.
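To make the adjacency argument concrete, here is a minimal sketch that models {{REDUCED}} as a streaming filter dropping only adjacent duplicates (an assumption about how a cheap REDUCED behaves, not ARQ's code) and applies it to the same four (?v, ?w) rows under the two sort orders. ReducedAdjacency is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical model: each row is {v, w}. "reduced" drops only adjacent
// duplicates, an assumption about a cheap streaming REDUCED, not ARQ's code.
class ReducedAdjacency {
    // Keep a value only if it differs from the one just before it.
    static List<String> reduced(List<String> in) {
        List<String> out = new ArrayList<>();
        String prev = null;
        for (String s : in) {
            if (!s.equals(prev)) {
                out.add(s);
            }
            prev = s;
        }
        return out;
    }

    // Sort the rows (ORDER BY), project ?v (SELECT ?v), then apply REDUCED.
    static List<String> run(List<String[]> rows, Comparator<String[]> orderBy) {
        List<String[]> sorted = new ArrayList<>(rows);
        sorted.sort(orderBy);
        List<String> vs = new ArrayList<>();
        for (String[] r : sorted) {
            vs.add(r[0]);
        }
        return reduced(vs);
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"a", "1"}, new String[] {"b", "1"},
            new String[] {"a", "2"}, new String[] {"b", "2"});
        Comparator<String[]> byV = Comparator.comparing(r -> r[0]);
        Comparator<String[]> byW = Comparator.comparing(r -> r[1]);
        // ORDER BY ?v ?w: equal ?v values are adjacent, so REDUCED acts like DISTINCT.
        System.out.println(run(rows, byV.thenComparing(byW))); // [a, b]
        // ORDER BY ?w ?v: ?v values interleave and the duplicates survive.
        System.out.println(run(rows, byW.thenComparing(byV))); // [a, b, a, b]
    }
}
```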

Shall we disable the optimization in trunk for now to give space to think about 
it?  Better slow/correct than fast/incorrect.




[jira] [Commented] (JENA-587) SELECT DISTINCT returns duplicate results

2013-11-12 Thread Rob Vesse (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820032#comment-13820032
 ] 

Rob Vesse commented on JENA-587:


OK, so I think what we're getting at is that the optimisation needs to be 
applied more sparingly.

From what you've outlined, can we agree that this is valid if all the 
{{DISTINCT}} variables appear in the {{ORDER BY}}?



[jira] [Comment Edited] (JENA-587) SELECT DISTINCT returns duplicate results

2013-11-12 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820026#comment-13820026
 ] 

Andy Seaborne edited comment on JENA-587 at 11/12/13 11:37 AM:
---

We should not rely on the ARQ join strategy.  

* It may change, and it does not apply to all storage systems.
* The join order is only sufficiently predictable in some cases, like BGPs; 
even adding union default graph may scramble the order. The order coming out of 
the {{WHERE}} clause depends on the pattern (e.g. some inner SELECTs mixed with 
other things). We use hash tables for {{MINUS}}.

It is a legal optimization if the {{DISTINCT}} variables are in order due to 
the {{ORDER BY}}.

Shall we switch the optimization off for the moment while we consider things?

Legal: 
1. {{DISTINCT ?v ORDER BY ?v}}
2. {{DISTINCT ?v ORDER BY ?v ?w}}
3. {{DISTINCT ?v ?w ORDER BY ?v ?w}}
4. {{DISTINCT ?v ?w ORDER BY ?w ?v}}

{{DISTINCT * ORDER BY ...}} is possible only if the ORDER BY is a total 
ordering of the underlying pattern.

Not legal:
1. {{DISTINCT ?v ORDER BY ?w}}
2. {{DISTINCT ?v ORDER BY ?w ?v}} because not sorted by ?v first.

Maybe the first step is to handle just some simple cases, such as an 
{{ORDER BY}} on exactly the variables of the {{DISTINCT}} projection, and then 
expand the intelligence of the transformation.
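The legal and illegal cases above amount to a prefix test: the first n {{ORDER BY}} keys must be exactly the n {{DISTINCT}} variables, in any order among themselves. A minimal sketch, assuming the sort keys are plain variables (no expressions) and the projection is explicit (not {{DISTINCT *}}); isSafe is a hypothetical helper, not the actual TransformDistinctToReduced code, and it is deliberately conservative (for example, a repeated sort key just makes it return false).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not the code in TransformDistinctToReduced.
// Assumes ORDER BY keys are plain variable names and the projection is explicit.
class DistinctToReducedCheck {
    static boolean isSafe(List<String> distinctVars, List<String> orderBy) {
        Set<String> projected = new HashSet<>(distinctVars);
        int n = projected.size();
        if (n == 0 || orderBy.size() < n) {
            return false; // nothing to check, or too few sort keys to cover the projection
        }
        // The leading n sort keys must be exactly the DISTINCT variables (any order).
        Set<String> leading = new HashSet<>(orderBy.subList(0, n));
        return leading.equals(projected);
    }

    public static void main(String[] args) {
        System.out.println(isSafe(Arrays.asList("v"), Arrays.asList("v", "w")));     // true
        System.out.println(isSafe(Arrays.asList("v", "w"), Arrays.asList("w", "v"))); // true
        System.out.println(isSafe(Arrays.asList("v"), Arrays.asList("w", "v")));     // false
    }
}
```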





[jira] [Comment Edited] (JENA-587) SELECT DISTINCT returns duplicate results

2013-11-12 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820026#comment-13820026
 ] 

Andy Seaborne edited comment on JENA-587 at 11/12/13 11:31 AM:
---

We should not rely on the ARQ join strategy.

* It may change, and it does not apply to all storage systems.
* The join order is only sufficiently predictable in some cases, like BGPs; 
even adding union default graph may scramble the order. The order coming out of 
the {{WHERE}} clause depends on the pattern (e.g. some inner SELECTs mixed with 
other things). We use hash tables for {{MINUS}}.

It is a legal optimization if the {{DISTINCT}} variables are in order due to 
the {{ORDER BY}}.

Legal: 
1. {{DISTINCT ?v ORDER BY ?v}}
2. {{DISTINCT ?v ORDER BY ?v ?w}}
3. {{DISTINCT ?v ?w ORDER BY ?v ?w}}
4. {{DISTINCT ?v ?w ORDER BY ?w ?v}}

{{DISTINCT * ORDER BY ...}} is possible only if the ORDER BY is a total 
ordering of the underlying pattern.

Not legal:
1. {{DISTINCT ?v ORDER BY ?w}}
2. {{DISTINCT ?v ORDER BY ?w ?v}} because not sorted by ?v first.

Maybe the first step is to handle just some simple cases, such as an 
{{ORDER BY}} on exactly the variables of the {{DISTINCT}} projection, and then 
expand the intelligence of the transformation.





[jira] [Commented] (JENA-587) SELECT DISTINCT returns duplicate results

2013-11-12 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820026#comment-13820026
 ] 

Andy Seaborne commented on JENA-587:


We should not rely on the ARQ join strategy.

* It may change.
* The join order is only predictable for BGPs; even adding union default graph 
may scramble it.
* The order coming out of the {{WHERE}} clause depends on the pattern (e.g. sub 
SELECTs mixed with other things).

It is a legal optimization if the {{DISTINCT}} variables are in order due to 
the {{ORDER BY}}.

Legal: 
1. {{DISTINCT ?v ORDER BY ?v}}
2. {{DISTINCT ?v ORDER BY ?v ?w}}
3. {{DISTINCT ?v ?w ORDER BY ?v ?w}}
4. {{DISTINCT ?v ?w ORDER BY ?w ?v}}

Not legal:
1. {{DISTINCT ?v ORDER BY ?w}}
2. {{DISTINCT ?v ORDER BY ?w ?v}} because not sorted by ?v first.





[jira] [Commented] (JENA-587) SELECT DISTINCT returns duplicate results

2013-11-12 Thread Rob Vesse (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820021#comment-13820021
 ] 

Rob Vesse commented on JENA-587:


Not sure if those compatibility tests will help.  These are primarily around 
ensuring that applying the distinct before the ordering doesn't change the 
query semantics.

What I actually think needs to be done is to change the logic in 
{{TransformDistinctToReduced}} so that, rather than applying only when an 
{{ORDER BY}} is also present, it applies only when an {{ORDER BY}} is not 
present. Without the {{ORDER BY}}, ARQ's join strategy should ensure that 
{{REDUCED}} is equivalent to {{DISTINCT}} in the majority of cases (though I 
think there will always be some queries where this is not the case).



[jira] [Commented] (JENA-587) SELECT DISTINCT returns duplicate results

2013-11-12 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820014#comment-13820014
 ] 

Andy Seaborne commented on JENA-587:


cf. TransformOrderByDistinctApplication, which does do some compatibility 
testing.



[jira] [Comment Edited] (JENA-587) SELECT DISTINCT returns duplicate results

2013-11-12 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820008#comment-13820008
 ] 

Andy Seaborne edited comment on JENA-587 at 11/12/13 11:02 AM:
---

This is not TDB-related. See the attached files D.ttl and Q.rq.

The issue seems to be that the DISTINCT variables and the ORDER BY do not align, 
so the "reduced" assumption is invalid. Maybe it just needs to test that the 
ORDER BY covers the DISTINCT projection.

Running from the command line:

{noformat}
sparql --data D.ttl --file Q.rq
{noformat}

{noformat}
sparql --set arq:optDistinctToReduced=false  --data D.ttl --file Q.rq
{noformat}
gives different answers (the second is right, the first has duplicates).





[jira] [Comment Edited] (JENA-587) SELECT DISTINCT returns duplicate results

2013-11-12 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820008#comment-13820008
 ] 

Andy Seaborne edited comment on JENA-587 at 11/12/13 11:01 AM:
---

This is not TDB-related. See the attached files D.ttl and Q.rq.

The issue seems to be that the DISTINCT variables and the ORDER BY do not align, 
so the "reduced" assumption is invalid. Maybe it just needs to test that the 
ORDER BY covers the DISTINCT projection.

Running from the command line:

{noformat}
sparql --data D.ttl --file Q.rq
{noformat}

{noformat}
sparql --set arq:optDistinctToReduced=false  --data D.ttl --file Q.rq
{noformat}
gives different answers (the second is right, the first has duplicates).





[jira] [Updated] (JENA-587) SELECT DISTINCT returns duplicate results

2013-11-12 Thread Rob Vesse (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rob Vesse updated JENA-587:
---

Attachment: jena-587.zip

Attaching a cleaned-up version of the bug reproduction, which uses appropriate 
file extensions and modifies the query so that it does not require TDB union 
graph mode to be set.

The issue can be reproduced by running Fuseki with the --memTDB option (TDB is 
required to use named graphs in the query), loading the data from the data.nq 
file, and running the query in query.rq.



[jira] [Commented] (JENA-587) SELECT DISTINCT returns duplicate results

2013-11-12 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820008#comment-13820008
 ] 

Andy Seaborne commented on JENA-587:


This is not TDB related.   See attached files D.ttl and Q.rq

The issue seems to be that the DISTINCT variables and the ORDER BY do not align 
so the "reduced" assumption is invalid.  Maybe it just needs to test that the 
ORDER BY covers the DISTINCT projection.

Running from the command line:

{noformat}
sparql --data D.ttl --file Q.rq
{noformat}

{noformat}
sparql --set arq:optDistinctToReduced=false  --data D.ttl --file Q.rq
{noformat}





[jira] [Updated] (JENA-587) SELECT DISTINCT returns duplicate results

2013-11-12 Thread Andy Seaborne (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne updated JENA-587:
---

Attachment: Q.rq
D.ttl



[jira] [Comment Edited] (JENA-587) SELECT DISTINCT returns duplicate results

2013-11-12 Thread Rob Vesse (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819993#comment-13819993
 ] 

Rob Vesse edited comment on JENA-587 at 11/12/13 10:56 AM:
---

More recent versions of ARQ automatically optimise DISTINCT -> REDUCED which 
may leave some duplicates.

Due to the predictable way in which TDB returns scan results and ARQ executes 
joins, for most queries this is a non-issue: the two queries will be equivalent 
because REDUCED in ARQ eliminates neighbouring non-distinct solutions.
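The equivalence argument rests on REDUCED removing only neighbouring duplicates. A minimal stand-alone Java sketch of the two behaviours follows; plain strings stand in for solution rows, and the class and method names are hypothetical, not ARQ's implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;

public class ReducedVsDistinct {

    // DISTINCT semantics: drop every repeated row wherever it occurs, keeping first occurrences.
    static <T> List<T> distinct(List<T> rows) {
        return new ArrayList<>(new LinkedHashSet<>(rows));
    }

    // REDUCED as described for ARQ here: drop a row only if it repeats the row just before it.
    static <T> List<T> reduced(List<T> rows) {
        List<T> out = new ArrayList<>();
        for (T row : rows) {
            if (out.isEmpty() || !Objects.equals(out.get(out.size() - 1), row)) {
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Duplicates grouped together, as a predictable scan/join order tends to produce.
        List<String> grouped = List.of("x1", "x1", "x2", "x3", "x3");
        System.out.println(distinct(grouped)); // [x1, x2, x3]
        System.out.println(reduced(grouped));  // [x1, x2, x3]

        // The same rows, but one duplicate is separated from its twin.
        List<String> interleaved = List.of("x1", "x2", "x1", "x3", "x3");
        System.out.println(distinct(interleaved)); // [x1, x2, x3]
        System.out.println(reduced(interleaved));  // [x1, x2, x1, x3]
    }
}
```

When duplicates arrive grouped, the two operators agree, which is why the rewrite is usually invisible; once anything separates duplicates, REDUCED lets them through.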

This behaviour can be turned off like so:

{noformat}
ARQ.getContext().set(ARQ.optDistinctToReduced, false);
{noformat}


was (Author: rvesse):
More recent versions automatically optimise DISTINCT -> REDUCED, which may 
leave some duplicates.

Due to the predictable way in which TDB returns scan results and ARQ executes 
joins, for most queries this is a non-issue: the two queries will be equivalent 
because REDUCED in ARQ eliminates neighbouring non-distinct solutions.

This behaviour can be turned off like so:

{noformat}
ARQ.getContext().set(ARQ.optDistinctToReduced, false);
{noformat}

> SELECT DISTINCT returns duplicate results
> -
>
> Key: JENA-587
> URL: https://issues.apache.org/jira/browse/JENA-587
> Project: Apache Jena
>  Issue Type: Bug
>  Components: ARQ
>Affects Versions: Jena 2.11.0
>Reporter: Veyriere
> Attachments: bug Jena2.11.0.zip
>
>
> SELECT DISTINCT returns duplicate results. Attaching a small quads dump and 
> the query to reproduce with TDB
> Reproduced with Jena 2.11.0 and Jena 2.10.1 (was working with 2.7.4)





[jira] [Commented] (JENA-587) SELECT DISTINCT returns duplicate results

2013-11-12 Thread Rob Vesse (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820002#comment-13820002
 ] 

Rob Vesse commented on JENA-587:


In your case, the specific reason the {{DISTINCT}} is not equivalent to the 
{{REDUCED}} is that the {{ORDER BY}} changes the ordering of rows, so the 
non-distinct rows are no longer adjacent and {{REDUCED}} does not eliminate 
them.  Removing the {{ORDER BY}} does result in the duplicates being 
eliminated.
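The cause described above can be reproduced without Jena. In this minimal Java 16+ sketch, a hypothetical record stands in for a solution row with a projected ?item and an ORDER BY-only ?score; rows are sorted before projection, mirroring the SPARQL solution-modifier order in which ORDER BY is applied before projection and DISTINCT/REDUCED. All names are illustrative assumptions, not ARQ classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

public class OrderByBreaksReduced {

    // One solution row: ?item is projected, ?score is only used by ORDER BY.
    record Row(String item, int score) {}

    // REDUCED-style elimination: drop a value only if it equals the value just before it.
    static List<String> reduced(List<String> values) {
        List<String> out = new ArrayList<>();
        for (String v : values) {
            if (out.isEmpty() || !Objects.equals(out.get(out.size() - 1), v)) {
                out.add(v);
            }
        }
        return out;
    }

    // Sort the full rows first, then project ?item (ORDER BY before projection).
    static List<String> orderThenProject(List<Row> rows, Comparator<Row> orderBy) {
        return rows.stream().sorted(orderBy).map(Row::item).toList();
    }

    public static void main(String[] args) {
        // Scan/join order: the two ?item = "a" rows are adjacent.
        List<Row> rows = List.of(new Row("a", 1), new Row("a", 3), new Row("b", 2));

        // ORDER BY ?score (not projected) separates the two "a" rows.
        List<String> byScore = orderThenProject(rows, Comparator.comparingInt(Row::score));
        System.out.println(byScore);          // [a, b, a]
        System.out.println(reduced(byScore)); // [a, b, a], the duplicate survives

        // ORDER BY ?item ?score: the projected variable leads, so duplicates stay adjacent.
        List<String> byItem = orderThenProject(rows,
                Comparator.comparing(Row::item).thenComparingInt(Row::score));
        System.out.println(reduced(byItem));  // [a, b]

        // No ORDER BY at all: the original adjacency is preserved.
        System.out.println(reduced(rows.stream().map(Row::item).toList())); // [a, b]
    }
}
```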



[jira] [Commented] (JENA-587) SELECT DISTINCT returns duplicate results

2013-11-12 Thread Rob Vesse (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819993#comment-13819993
 ] 

Rob Vesse commented on JENA-587:


More recent versions of ARQ automatically optimise DISTINCT -> REDUCED, which 
may leave some duplicates.

Due to the predictable way in which TDB returns scan results and ARQ executes 
joins, for most queries this is a non-issue: the two queries will be equivalent 
because REDUCED in ARQ eliminates neighbouring non-distinct solutions.

This behaviour can be turned off like so:

{noformat}
ARQ.getContext().set(ARQ.optDistinctToReduced, false);
{noformat}



[jira] [Closed] (JENA-586) Fuseki 500 - Out of range: on multiple add to fuseki with in memory store

2013-11-12 Thread Andy Seaborne (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne closed JENA-586.
--


> Fuseki 500 - Out of range: on multiple add to fuseki with in memory store
> -
>
> Key: JENA-586
> URL: https://issues.apache.org/jira/browse/JENA-586
> Project: Apache Jena
>  Issue Type: Bug
>  Components: Fuseki
>Affects Versions: Fuseki 1.0.0
> Environment: Windows 8
>Reporter: Brian McBride
>Assignee: Andy Seaborne
>Priority: Minor
> Fix For: Jena 2.11.1
>
> Attachments: testMultipleAdd.zip
>
>
> I have JUnit tests of my application failing.  The tests use a Fuseki 
> configured with an in-memory TDB.
> The tests fail when they do a second DatasetAccessor.add call to the same 
> graph.
> My tests work when run against a Fuseki with a persistent TDB using the 
> filing system.  I've marked the issue as major in case it is a timing-dependent 
> issue.  If it's just an issue with the in-memory store, it is less significant.
> I will attach a minimal example once I have submitted this.





[jira] [Updated] (JENA-587) SELECT DISTINCT returns duplicate results

2013-11-12 Thread Veyriere (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Veyriere updated JENA-587:
--

Attachment: bug Jena2.11.0.zip



[jira] [Created] (JENA-587) SELECT DISTINCT returns duplicate results

2013-11-12 Thread Veyriere (JIRA)
Veyriere created JENA-587:
-

 Summary: SELECT DISTINCT returns duplicate results
 Key: JENA-587
 URL: https://issues.apache.org/jira/browse/JENA-587
 Project: Apache Jena
  Issue Type: Bug
  Components: ARQ
Affects Versions: Jena 2.11.0
Reporter: Veyriere
 Attachments: bug Jena2.11.0.zip

SELECT DISTINCT returns duplicate results. Attaching a small quads dump and the 
query to reproduce with TDB
Reproduced with Jena 2.11.0 and Jena 2.10.1 (was working with 2.7.4)


