[jira] [Commented] (JENA-587) SELECT DISTINCT returns duplicate results
[ https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820420#comment-13820420 ]

Hudson commented on JENA-587:

ABORTED: Integrated in Jena_Development_Test #1039 (See [https://builds.apache.org/job/Jena_Development_Test/1039/])

Couple more unit tests for JENA-587 (rvesse: rev 1541149)
* /jena/trunk/jena-arq/src/test/java/com/hp/hpl/jena/sparql/algebra/optimize/TestOptimizer.java

Re-enable TransformDistinctToReduced making it much stricter about the kinds of queries it will optimize. Expands the unit tests to cover various scenarios identified in the associated bug (JENA-587) (rvesse: rev 1541139)
* /jena/trunk/jena-arq/src/main/java/com/hp/hpl/jena/sparql/algebra/optimize/Optimize.java
* /jena/trunk/jena-arq/src/main/java/com/hp/hpl/jena/sparql/algebra/optimize/TransformDistinctToReduced.java
* /jena/trunk/jena-arq/src/test/java/com/hp/hpl/jena/sparql/algebra/optimize/TestOptimizer.java

Make TransformDistinctToReduced off by default until it can be refactored to only apply when safe, also disables affected tests for now (JENA-587) (rvesse: rev 1541019)
* /jena/trunk/jena-arq/src/main/java/com/hp/hpl/jena/sparql/algebra/optimize/Optimize.java
* /jena/trunk/jena-arq/src/test/java/com/hp/hpl/jena/sparql/algebra/optimize/TestOptimizer.java

> SELECT DISTINCT returns duplicate results
>
>                 Key: JENA-587
>                 URL: https://issues.apache.org/jira/browse/JENA-587
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: ARQ
>    Affects Versions: Jena 2.11.0
>            Reporter: Veyriere
>            Assignee: Rob Vesse
>         Attachments: D.ttl, Q.rq, bug Jena2.11.0.zip, jena-587.zip
>
>
> SELECT DISTINCT returns duplicate results. Attaching a small quads dump and the query to reproduce with TDB
> Reproduced with Jena 2.11.0 and Jena 2.10.1 (was working with 2.7.4)

--
This message was sent by Atlassian JIRA (v6.1#6144)
Re: svn commit: r1541118 - in /jena/trunk/jena-arq/src/main/java/org/apache/jena/riot: lang/BlankNodeAllocatorFixedSeedHash.java lang/BlankNodeAllocatorHash.java lang/LabelToNode.java tokens/Tokenizer
jena/trunk/jena-arq/src/main/java/org/apache/jena/riot/tokens/TokenizerFactory.java

Modified: jena/trunk/jena-arq/src/main/java/org/apache/jena/riot/tokens/TokenizerFactory.java
URL: http://svn.apache.org/viewvc/jena/trunk/jena-arq/src/main/java/org/apache/jena/riot/tokens/TokenizerFactory.java?rev=1541118&r1=1541117&r2=1541118&view=diff
==
--- jena/trunk/jena-arq/src/main/java/org/apache/jena/riot/tokens/TokenizerFactory.java (original)
+++ jena/trunk/jena-arq/src/main/java/org/apache/jena/riot/tokens/TokenizerFactory.java Tue Nov 12 15:53:36 2013
@@ -42,6 +42,13 @@ public class TokenizerFactory
         Tokenizer tokenizer = new TokenizerText(peekReader) ;
         return tokenizer ;
     }
+
+    public static Tokenizer makeTokenizerUTF8(String string)
+    {
+        PeekReader peekReader = PeekReader.readString(string);
+        Tokenizer tokenizer = new TokenizerText(peekReader);
+        return tokenizer;
+    }
 
     public static Tokenizer makeTokenizerASCII(InputStream in)
     {

Rob -

There is TokenizerFactory.makeTokenizerString which is identical to makeTokenizerUTF8. "String" was a better name because a string isn't UTF8 in Java.

Andy
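The naming point can be illustrated with a minimal sketch in plain Java (no Jena classes involved). A Java String holds UTF-16 code units, and an encoding such as UTF-8 only enters the picture when the string is converted to bytes. The class and method names below are hypothetical and exist only for this illustration.

```java
import java.nio.charset.StandardCharsets;

class StringEncodingDemo {
    // Number of UTF-16 code units the String itself holds.
    static int charCount(String s) {
        return s.length();
    }

    // Number of bytes produced once UTF-8 is chosen explicitly as the encoding.
    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String s = "caf\u00e9"; // four characters, the last one is non-ASCII
        System.out.println("chars: " + charCount(s));
        System.out.println("UTF-8 bytes: " + utf8ByteCount(s));
    }
}
```

The two counts differ for any string containing non-ASCII characters, which is why a factory method taking a String has no byte encoding to name.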
[jira] [Commented] (JENA-587) SELECT DISTINCT returns duplicate results
[ https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820272#comment-13820272 ]

Rob Vesse commented on JENA-587:

I've committed a revised version of the optimiser which should implement the restrictions we discussed. Currently it doesn't attempt to handle the case of {{SELECT DISTINCT *}} with a total ordering but that could be added later.

Jenkins appears to be ill so I've pushed up SNAPSHOTs manually.
[jira] [Commented] (JENA-587) SELECT DISTINCT returns duplicate results
[ https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820250#comment-13820250 ]

ASF subversion and git services commented on JENA-587:

Commit 1541149 from [~rvesse] in branch 'jena/trunk'
[ https://svn.apache.org/r1541149 ]

Couple more unit tests for JENA-587
[jira] [Commented] (JENA-587) SELECT DISTINCT returns duplicate results
[ https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820241#comment-13820241 ]

ASF subversion and git services commented on JENA-587:

Commit 1541139 from [~rvesse] in branch 'jena/trunk'
[ https://svn.apache.org/r1541139 ]

Re-enable TransformDistinctToReduced making it much stricter about the kinds of queries it will optimize. Expands the unit tests to cover various scenarios identified in the associated bug (JENA-587)
[jira] [Comment Edited] (JENA-189) Jena 3 / technical
[ https://issues.apache.org/jira/browse/JENA-189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820129#comment-13820129 ]

Andy Seaborne edited comment on JENA-189 at 11/12/13 2:48 PM:

There is quite a big difference between, say, needing to change import statements because of repacking and needing to use IRIs everywhere. This is not to argue against it, but just because some changes require rework does not mean it's all the same amount of work.

For this to be a good idea, we'd need to understand the implications. The Jena IRI library performs a detailed parsing of the string. Is that an acceptable cost? What if a loop is doing an operation where part of the loop body is using the same string each time - avoiding repeated parsing may be necessary.

Jena can support multiple APIs - a possibility is to grow this style in parallel with a fairly direct port of the existing API and see which gains traction. It allows for a wide scope for change without forcing it on users just to get access to other improvements that aren't connected to the API.

was (Author: andy.seaborne):
There is quite a big difference between, say, needing to change import statements because of repacking and needing to use IRIs everywhere. This is not to argue against it but just because some changes require rework, does not mean it's all the same amount of work.

For this to be a good idea, we'd need to understand the implications. Jena IRI library performs a detailed parsing of the string. Is that an acceptable cost? What is a loop is doing an operation where part of the loop body is using the same string each time - avoiding repeated parsing maybe necessary.

Jena can support multiple APIs - a possibility is to grow this style in parallel with a fairly direct port of the existing API and see which gains traction. It allows for a wide scope for change without forcing it it get access to other improvements that aren't connected to the API.
> Jena 3 / technical
>
>                 Key: JENA-189
>                 URL: https://issues.apache.org/jira/browse/JENA-189
>             Project: Apache Jena
>          Issue Type: Brainstorming
>            Reporter: Andy Seaborne
>         Attachments: IteratorLockandTransactionsinJena3.pdf
>
>
> This is a JIRA to discuss and collect technical changes to Jena that would warrant a "Jena3" whether an incompatible change or just sufficient changes to mean bumping the major version number is best.

--
This message was sent by Atlassian JIRA (v6.1#6144)
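The concern about re-parsing the same string inside a loop can be sketched as a small cache. This is only an illustration under stated assumptions: java.net.URI stands in for the Jena IRI library (it parses URIs, not full IRIs, and is far less detailed), and ParsedIriCache is a hypothetical name, not a Jena class.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

class ParsedIriCache {
    private final Map<String, URI> cache = new HashMap<>();
    private int parses = 0;

    // Parse the string only the first time it is seen; later calls reuse the result.
    URI get(String iriString) {
        URI parsed = cache.get(iriString);
        if (parsed == null) {
            parsed = URI.create(iriString); // stand-in for the detailed IRI parse
            parses++;
            cache.put(iriString, parsed);
        }
        return parsed;
    }

    // How many times a real parse was performed.
    int parseCount() {
        return parses;
    }

    public static void main(String[] args) {
        ParsedIriCache iris = new ParsedIriCache();
        for (int i = 0; i < 1000; i++) {
            iris.get("http://example.org/p"); // same string on every iteration
        }
        System.out.println("parses performed: " + iris.parseCount());
    }
}
```

Whether such a cache belongs in the API, in the IRI library, or in application code is exactly the kind of implication the comment asks to be understood first.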
Fuseki UI: validation conneg
Hi Andy, Can we have the various /validate/* methods return JSON when application/json is requested? I'm replicating the behaviour of the current validation forms, but I'd like to do them as Ajax requests and display the results in a codemirror box. So all a JSON API needs to return is the validation output. Thanks, Ian
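A minimal sketch of what the request amounts to, not Fuseki's actual code: check whether the Accept header lists application/json and, if so, wrap the validation output in a JSON object. The class name, the method names, and the "output" field are all hypothetical, and q-values are ignored for brevity.

```java
class ValidationConneg {
    // True when the Accept header lists application/json (media-type parameters are ignored).
    static boolean wantsJson(String acceptHeader) {
        if (acceptHeader == null) return false;
        for (String part : acceptHeader.split(",")) {
            int semi = part.indexOf(';');
            String mediaType = (semi >= 0 ? part.substring(0, semi) : part).trim();
            if (mediaType.equalsIgnoreCase("application/json")) return true;
        }
        return false;
    }

    // Wrap the validator's text output in a one-field JSON object, escaping as JSON requires.
    static String toJson(String validationOutput) {
        StringBuilder sb = new StringBuilder("{\"output\":\"");
        for (int i = 0; i < validationOutput.length(); i++) {
            char c = validationOutput.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append("\"}").toString();
    }

    public static void main(String[] args) {
        String accept = "text/html, application/json;q=0.9";
        String output = "Line 1: parse error";
        System.out.println(wantsJson(accept) ? toJson(output) : output);
    }
}
```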
[jira] [Commented] (JENA-189) Jena 3 / technical
[ https://issues.apache.org/jira/browse/JENA-189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820129#comment-13820129 ]

Andy Seaborne commented on JENA-189:

There is quite a big difference between, say, needing to change import statements because of repacking and needing to use IRIs everywhere. This is not to argue against it, but just because some changes require rework does not mean it's all the same amount of work.

For this to be a good idea, we'd need to understand the implications. The Jena IRI library performs a detailed parsing of the string. Is that an acceptable cost? What if a loop is doing an operation where part of the loop body is using the same string each time - avoiding repeated parsing may be necessary.

Jena can support multiple APIs - a possibility is to grow this style in parallel with a fairly direct port of the existing API and see which gains traction. It allows for a wide scope for change without forcing it on users just to get access to other improvements that aren't connected to the API.
[jira] [Commented] (JENA-587) SELECT DISTINCT returns duplicate results
[ https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820051#comment-13820051 ]

Rob Vesse commented on JENA-587:

I'll take a look at implementing the restricted optimisation later today.
[jira] [Commented] (JENA-587) SELECT DISTINCT returns duplicate results
[ https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820049#comment-13820049 ]

ASF subversion and git services commented on JENA-587:

Commit 1541019 from [~rvesse] in branch 'jena/trunk'
[ https://svn.apache.org/r1541019 ]

Make TransformDistinctToReduced off by default until it can be refactored to only apply when safe, also disables affected tests for now (JENA-587)
[jira] [Assigned] (JENA-587) SELECT DISTINCT returns duplicate results
[ https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rob Vesse reassigned JENA-587:

Assignee: Rob Vesse
[jira] [Commented] (JENA-587) SELECT DISTINCT returns duplicate results
[ https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820050#comment-13820050 ]

Rob Vesse commented on JENA-587:

Agreed, I have pushed a commit which disables the optimisation unless explicitly enabled for the time being and disables affected tests.
[jira] [Commented] (JENA-587) SELECT DISTINCT returns duplicate results
[ https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820038#comment-13820038 ]

Andy Seaborne commented on JENA-587:

Slightly stronger condition - if all the {{DISTINCT}} variables appear in {{ORDER BY}} and also only those variables.

{{DISTINCT ?v ORDER BY ?v}} and {{DISTINCT ?v ?w ORDER BY ?v ?w}}

The order in {{ORDER BY}} matters. {{DISTINCT ?v ORDER BY ?v ?w}} is OK, but reversing ?v and ?w, {{DISTINCT ?v ORDER BY ?w ?v}}, is not, because the sorting on ?w first scrambles the ?v adjacency needed by {{REDUCED}}.

Shall we disable the optimization in trunk for now to give space to think about it? Better slow/correct than fast/incorrect.
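The adjacency argument can be made concrete with a small simulation in plain Java. It is a sketch under assumptions, not ARQ code: rows are hypothetical (?v, ?w) pairs, and REDUCED is modelled as dropping a value only when it equals its immediate predecessor, as described elsewhere in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class OrderByAdjacencyDemo {
    // One solution row binding ?v and ?w.
    static final class Row {
        final String v;
        final int w;
        Row(String v, int w) { this.v = v; this.w = w; }
    }

    // ORDER BY ?v ?w
    static final Comparator<Row> BY_V_THEN_W = (a, b) -> {
        int c = a.v.compareTo(b.v);
        return c != 0 ? c : Integer.compare(a.w, b.w);
    };

    // ORDER BY ?w ?v
    static final Comparator<Row> BY_W_THEN_V = (a, b) -> {
        int c = Integer.compare(a.w, b.w);
        return c != 0 ? c : a.v.compareTo(b.v);
    };

    static List<Row> sampleRows() {
        return Arrays.asList(new Row("a", 1), new Row("b", 1), new Row("a", 2));
    }

    // Sort, project ?v, then drop a value only if it repeats its immediate predecessor.
    static List<String> reducedV(List<Row> rows, Comparator<Row> order) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(order);
        List<String> out = new ArrayList<>();
        for (Row r : sorted) {
            if (out.isEmpty() || !out.get(out.size() - 1).equals(r.v)) out.add(r.v);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("ORDER BY ?v ?w: " + reducedV(sampleRows(), BY_V_THEN_W));
        System.out.println("ORDER BY ?w ?v: " + reducedV(sampleRows(), BY_W_THEN_V));
    }
}
```

With ?v as the leading sort key the two "a" rows end up next to each other and collapse; with ?w leading, a "b" row sits between them and the duplicate survives.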
[jira] [Commented] (JENA-587) SELECT DISTINCT returns duplicate results
[ https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820032#comment-13820032 ]

Rob Vesse commented on JENA-587:

OK, so I think what we're getting at is that the optimisation needs to be applied more sparingly.

From what you've outlined can we agree that this is valid if all the {{DISTINCT}} variables appear in the {{ORDER BY}}?
[jira] [Comment Edited] (JENA-587) SELECT DISTINCT returns duplicate results
[ https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820026#comment-13820026 ]

Andy Seaborne edited comment on JENA-587 at 11/12/13 11:37 AM:

We should not rely on the ARQ join strategy.

* It may change and it does not apply to all storage systems.
* The join order is only sufficiently predictable for some cases like BGPs - even adding union default graph may scramble the order. The order coming out of the {{WHERE}} clause depends on the pattern (e.g. some inner SELECTs mixed with other things). We use hash tables for {{MINUS}}.

It is a legal optimization if the {{DISTINCT}} is of variables that are in order due to {{ORDER BY}}.

Shall we switch the optimization off for the moment while we consider things?

Legal:
1. {{DISTINCT ?v ORDER BY ?v}}
2. {{DISTINCT ?v ORDER BY ?v ?w}}
3. {{DISTINCT ?v ?w ORDER BY ?v ?w}}
4. {{DISTINCT ?v ?w ORDER BY ?w ?v}}

{{DISTINCT * ORDER BY ...}} is possible only if the ORDER BY is a total ordering of the underlying pattern.

Not legal:
1. {{DISTINCT ?v ORDER BY ?w}}
2. {{DISTINCT ?v ORDER BY ?w ?v}} because not sorted by ?v first.

Maybe the first step is to just do some simple cases such as {{ORDER BY}} exactly the variables of the project of the {{DISTINCT}} then expand the intelligence of the transformation.

was (Author: andy.seaborne):
We should not reply on the ARQ join strategy.

* It may change and it does not apply to all storage systems.
* The join order is only sufficiently predicable for some cases like BGPs - even adding union default graph may scramble the the order coming of the {{WHERE}} clause depends on the pattern (e.g. some inner SELECTs mixed with other things). We use hash tables for {{MINUS}}.

It is a legal optimization if the {{DISTINCT}} is of variables that are in order due to {{ORDER BY}}

Legal:
1. {{DISTINCT ?v ORDER BY ?v}}
2. {{DISTINCT ?v ORDER BY ?v ?w}}
3. {{DISTINCT ?v ?w ORDER BY ?v ?w}}
4. {{DISTINCT ?v ?w ORDER BY ?w ?v}}

{{DISTINCT * ORDER BY ...}} is possible only if the ORDER BY is a total ordering of the underlying pattern.

Not legal:
1. {{DISTINCT ?v ORDER BY ?w}}
2. {{DISTINCT ?v ORDER BY ?w ?v}} because not sorted by ?v first.

Maybe the first step is to just do some simple cases such as {{ORDER BY}} exactly the variables of the project of the {{DISTINCT}} then expand the intelligence of the transformation.
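The legal and not-legal cases listed in this comment can be captured by one check: the leading ORDER BY variables, taken as a set, must be exactly the DISTINCT variables. The sketch below is one reading of those examples, not the logic committed to TransformDistinctToReduced. It assumes ORDER BY keys are plain variables (no expressions), models variables as strings, and does not attempt the {{DISTINCT *}} case; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DistinctToReducedCheck {
    // Safe when the first N ORDER BY variables (N = number of DISTINCT variables)
    // are exactly the DISTINCT variables, in any order among themselves.
    static boolean isSafe(List<String> distinctVars, List<String> orderByVars) {
        Set<String> distinct = new HashSet<>(distinctVars);
        if (distinct.isEmpty() || orderByVars.size() < distinct.size()) return false;
        Set<String> leading = new HashSet<>(orderByVars.subList(0, distinct.size()));
        return leading.equals(distinct);
    }

    public static void main(String[] args) {
        // The "Legal" cases from the comment.
        System.out.println(isSafe(Arrays.asList("v"), Arrays.asList("v")));
        System.out.println(isSafe(Arrays.asList("v"), Arrays.asList("v", "w")));
        System.out.println(isSafe(Arrays.asList("v", "w"), Arrays.asList("v", "w")));
        System.out.println(isSafe(Arrays.asList("v", "w"), Arrays.asList("w", "v")));
        // The "Not legal" cases from the comment.
        System.out.println(isSafe(Arrays.asList("v"), Arrays.asList("w")));
        System.out.println(isSafe(Arrays.asList("v"), Arrays.asList("w", "v")));
    }
}
```

The check is deliberately conservative: a repeated variable in ORDER BY, for example, makes it answer false even where the rewrite might still be valid.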
[jira] [Comment Edited] (JENA-587) SELECT DISTINCT returns duplicate results
[ https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820026#comment-13820026 ]

Andy Seaborne edited comment on JENA-587 at 11/12/13 11:31 AM:

We should not rely on the ARQ join strategy.

* It may change and it does not apply to all storage systems.
* The join order is only sufficiently predictable for some cases like BGPs - even adding union default graph may scramble the order. The order coming out of the {{WHERE}} clause depends on the pattern (e.g. some inner SELECTs mixed with other things). We use hash tables for {{MINUS}}.

It is a legal optimization if the {{DISTINCT}} is of variables that are in order due to {{ORDER BY}}.

Legal:
1. {{DISTINCT ?v ORDER BY ?v}}
2. {{DISTINCT ?v ORDER BY ?v ?w}}
3. {{DISTINCT ?v ?w ORDER BY ?v ?w}}
4. {{DISTINCT ?v ?w ORDER BY ?w ?v}}

{{DISTINCT * ORDER BY ...}} is possible only if the ORDER BY is a total ordering of the underlying pattern.

Not legal:
1. {{DISTINCT ?v ORDER BY ?w}}
2. {{DISTINCT ?v ORDER BY ?w ?v}} because not sorted by ?v first.

Maybe the first step is to just do some simple cases such as {{ORDER BY}} exactly the variables of the project of the {{DISTINCT}} then expand the intelligence of the transformation.

was (Author: andy.seaborne):
We should not reply on the ARQ join strategy.

* It may change.
* The join order is only predicable for BGPs - even adding union default graph may
The order coming of the {{WHERE}} clause depends on the pattern (e.g. sub SELECTs mixed with other things).
* It is a legal optimization if the {{DISTINCT}} is of variables that are in order due to {{ORDER BY}}

Legal:
1. {{DISTINCT ?v ORDER BY ?v}}
2. {{DISTINCT ?v ORDER BY ?v ?w}}
3. {{DISTINCT ?v ?w ORDER BY ?v ?w}}
4. {{DISTINCT ?v ?w ORDER BY ?w ?v}}

Not legal:
1. {{DISTINCT ?v ORDER BY ?w}}
2. {{DISTINCT ?v ORDER BY ?w ?v}} because not sorted by ?v first.
[jira] [Commented] (JENA-587) SELECT DISTINCT returns duplicate results
[ https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820026#comment-13820026 ]

Andy Seaborne commented on JENA-587:

We should not rely on the ARQ join strategy.

* It may change.
* The join order is only predictable for BGPs - even adding union default graph may scramble the order. The order coming out of the {{WHERE}} clause depends on the pattern (e.g. sub SELECTs mixed with other things).
* It is a legal optimization if the {{DISTINCT}} is of variables that are in order due to {{ORDER BY}}.

Legal:
1. {{DISTINCT ?v ORDER BY ?v}}
2. {{DISTINCT ?v ORDER BY ?v ?w}}
3. {{DISTINCT ?v ?w ORDER BY ?v ?w}}
4. {{DISTINCT ?v ?w ORDER BY ?w ?v}}

Not legal:
1. {{DISTINCT ?v ORDER BY ?w}}
2. {{DISTINCT ?v ORDER BY ?w ?v}} because not sorted by ?v first.
[jira] [Commented] (JENA-587) SELECT DISTINCT returns duplicate results
[ https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820021#comment-13820021 ]

Rob Vesse commented on JENA-587:

Not sure if those compatibility tests will help. These are primarily around ensuring that applying the distinct before the ordering doesn't change the query semantics.

What I actually think needs to be done is that the logic in {{TransformDistinctToReduced}} needs to change so that, rather than applying only when an {{ORDER BY}} is also present, it applies only when an {{ORDER BY}} is not present. Without the {{ORDER BY}}, ARQ's join strategy should guarantee that {{REDUCED}} is equivalent to {{DISTINCT}} for the majority of cases (I think there will always be some queries where this is not the case).
[jira] [Commented] (JENA-587) SELECT DISTINCT returns duplicate results
[ https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820014#comment-13820014 ]

Andy Seaborne commented on JENA-587:

c.f. TransformOrderByDistinctApplication which does do some compatibility testing.
[jira] [Comment Edited] (JENA-587) SELECT DISTINCT returns duplicate results
[ https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820008#comment-13820008 ]

Andy Seaborne edited comment on JENA-587 at 11/12/13 11:02 AM:

This is not TDB related. See attached files D.ttl and Q.rq.

The issue seems to be that the DISTINCT variables and the ORDER BY do not align so the "reduced" assumption is invalid. Maybe it just needs to test that the ORDER BY covers the DISTINCT projection.

Running from the command line:

{noformat}
sparql --data D.ttl --file Q.rq
{noformat}

{noformat}
sparql --set arq:optDistinctToReduced=false --data D.ttl --file Q.rq
{noformat}

gives different answers (the second is right, the first has duplicates).

was (Author: andy.seaborne):
This is not TDB related. See attached files D.ttl and Q.rq

he issue seems to be that the DISTINCT variables and the ORDER BY do not align so the "reduced" assumption is invalid. Maybe it just needs to test that the ORDER BY covers the DISTINCT projection.

Running from the command line:

{noformat}
sparql --data D.ttl --file Q.rq
{noformat}

{noformat}
sparql --set arq:optDistinctToReduced=false --data D.ttl --file Q.rq
{noformat}

gives different answers (the second is right, the first has duplicates).
[jira] [Comment Edited] (JENA-587) SELECT DISTINCT returns duplicate results
[ https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820008#comment-13820008 ]

Andy Seaborne edited comment on JENA-587 at 11/12/13 11:01 AM:

This is not TDB related. See attached files D.ttl and Q.rq.

The issue seems to be that the DISTINCT variables and the ORDER BY do not align so the "reduced" assumption is invalid. Maybe it just needs to test that the ORDER BY covers the DISTINCT projection.

Running from the command line:

{noformat}
sparql --data D.ttl --file Q.rq
{noformat}

{noformat}
sparql --set arq:optDistinctToReduced=false --data D.ttl --file Q.rq
{noformat}

gives different answers (the second is right, the first has duplicates).

was (Author: andy.seaborne):
This is not TDB related. See attached files D.ttl and Q.rq

he issue seems to be that the DISTINCT variables and the ORDER BY do not align so the "reduced" assumption is invalid. Maybe it just needs to test that the ORDER BY covers the DISTINCT projection.

Running from the command line:

{noformat}
sparql --data D.ttl --file Q.rq
{noformat}

{noformat}
sparql --set arq:optDistinctToReduced=false --data D.ttl --file Q.rq
{noformat}
[jira] [Updated] (JENA-587) SELECT DISTINCT returns duplicate results
[ https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rob Vesse updated JENA-587:

Attachment: jena-587.zip

Attaching cleaned up version of the bug which has appropriate file extensions and modifies the query to not require TDB union graph mode to be set.

The issue can be reproduced by running with Fuseki using the --memTDB option (TDB is required to use named graphs in the query), loading the data from the data.nq file and running the query in query.rq.
[jira] [Commented] (JENA-587) SELECT DISTINCT returns duplicate results
[ https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820008#comment-13820008 ]

Andy Seaborne commented on JENA-587:

This is not TDB related. See attached files D.ttl and Q.rq.

The issue seems to be that the DISTINCT variables and the ORDER BY do not align so the "reduced" assumption is invalid. Maybe it just needs to test that the ORDER BY covers the DISTINCT projection.

Running from the command line:

{noformat}
sparql --data D.ttl --file Q.rq
{noformat}

{noformat}
sparql --set arq:optDistinctToReduced=false --data D.ttl --file Q.rq
{noformat}
[jira] [Updated] (JENA-587) SELECT DISTINCT returns duplicate results
[ https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Seaborne updated JENA-587:
---
    Attachment: Q.rq
                D.ttl
[jira] [Comment Edited] (JENA-587) SELECT DISTINCT returns duplicate results
[ https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819993#comment-13819993 ]

Rob Vesse edited comment on JENA-587 at 11/12/13 10:56 AM:
---

More recent versions of ARQ automatically optimise DISTINCT -> REDUCED, which may leave some duplicates. Because of the predictable way in which TDB returns scan results and ARQ executes joins, this is a non-issue for most queries: the two forms are equivalent because REDUCED in ARQ eliminates neighbouring non-distinct solutions.

This behaviour can be turned off like so:

{noformat}
ARQ.getContext().set(ARQ.optDistinctToReduced, false)
{noformat}

was (Author: rvesse):
    More recent of versions automatically optimise DISTINCT -> REDUCED which may leave some duplicates. Due to the predictable way in which TDB returns scan results and ARQ executes joins for most queries this is a non-issue since the two queries will be equivalent since REDUCED in ARQ eliminates neighbouring non-distinct solutions.

    This behaviour can be turned off like so:

    {noformat}
    ARQ.getContext().set(ARQ.optDistinctToReduced, false)
    {noformat}
[jira] [Commented] (JENA-587) SELECT DISTINCT returns duplicate results
[ https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820002#comment-13820002 ]

Rob Vesse commented on JENA-587:
---

In your case, {{DISTINCT}} is not equivalent to {{REDUCED}} because the {{ORDER BY}} changes the ordering of rows: the non-distinct rows are no longer adjacent, so {{REDUCED}} does not eliminate them. Removing the {{ORDER BY}} does result in the duplicates being eliminated.
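The effect described above can be reproduced without Jena. The following is a minimal sketch in plain Java, assuming (as the comments in this thread state) that REDUCED in ARQ only removes a row when it equals the row immediately before it. The class name, method names, and sample rows are hypothetical and solutions are modelled as strings; none of this is ARQ code.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Illustration only; models solutions as strings rather than ARQ bindings.
class ReducedVsDistinct {

    /** REDUCED-style pass: drops a row only if it equals the previous kept row. */
    static List<String> dropAdjacentDuplicates(List<String> rows) {
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            if (out.isEmpty() || !out.get(out.size() - 1).equals(row)) {
                out.add(row);
            }
        }
        return out;
    }

    /** DISTINCT: drops every repeated row, wherever it occurs. */
    static List<String> distinct(List<String> rows) {
        return new ArrayList<>(new LinkedHashSet<>(rows));
    }

    public static void main(String[] args) {
        // Equal rows arrive next to each other, so both approaches agree.
        List<String> grouped = List.of("a", "a", "b");
        // A sort on some other variable has moved the equal rows apart.
        List<String> reordered = List.of("a", "b", "a");

        System.out.println(dropAdjacentDuplicates(grouped));   // [a, b]
        System.out.println(dropAdjacentDuplicates(reordered)); // [a, b, a]
        System.out.println(distinct(reordered));               // [a, b]
    }
}
```

The second call shows the reported symptom: once equal rows are no longer neighbours, the adjacent-only pass lets a duplicate through, while a true DISTINCT does not.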
[jira] [Commented] (JENA-587) SELECT DISTINCT returns duplicate results
[ https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819993#comment-13819993 ]

Rob Vesse commented on JENA-587:
---

More recent versions automatically optimise DISTINCT -> REDUCED, which may leave some duplicates. Because of the predictable way in which TDB returns scan results and ARQ executes joins, this is a non-issue for most queries: the two forms are equivalent because REDUCED in ARQ eliminates neighbouring non-distinct solutions.

This behaviour can be turned off like so:

{noformat}
ARQ.getContext().set(ARQ.optDistinctToReduced, false)
{noformat}
[jira] [Closed] (JENA-586) Fuseki 500 - Out of range: on multiple add to fuseki with in memory store
[ https://issues.apache.org/jira/browse/JENA-586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Seaborne closed JENA-586.
---

> Fuseki 500 - Out of range: on multiple add to fuseki with in memory store
>
>                 Key: JENA-586
>                 URL: https://issues.apache.org/jira/browse/JENA-586
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: Fuseki
>    Affects Versions: Fuseki 1.0.0
>         Environment: Windows 8
>            Reporter: Brian McBride
>            Assignee: Andy Seaborne
>            Priority: Minor
>             Fix For: Jena 2.11.1
>         Attachments: testMultipleAdd.zip
>
> I have junit tests of my application failing. The tests use a Fuseki
> configured with an in-memory TDB.
> The tests fail when they do a second DatasetAccessor.add call to the same
> graph.
> My tests work when run against a Fuseki with a persistent TDB using the
> filing system. I've marked the issue as major in case it is a
> timing-dependent issue. If it's just an issue with the in-memory store, it
> is less significant.
> I will attach a minimal example once I have submitted this.
[jira] [Updated] (JENA-587) SELECT DISTINCT returns duplicate results
[ https://issues.apache.org/jira/browse/JENA-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Veyriere updated JENA-587:
---
    Attachment: bug Jena2.11.0.zip
[jira] [Created] (JENA-587) SELECT DISTINCT returns duplicate results
Veyriere created JENA-587:
---

             Summary: SELECT DISTINCT returns duplicate results
                 Key: JENA-587
                 URL: https://issues.apache.org/jira/browse/JENA-587
             Project: Apache Jena
          Issue Type: Bug
          Components: ARQ
    Affects Versions: Jena 2.11.0
            Reporter: Veyriere
         Attachments: bug Jena2.11.0.zip

SELECT DISTINCT returns duplicate results. Attaching a small quads dump and the query to reproduce with TDB
Reproduced with Jena 2.11.0 and Jena 2.10.1 (was working with 2.7.4)