[jira] [Commented] (JENA-766) Aggregate query returns (possibly) wrong results

Andy Seaborne (JIRA) Fri, 22 Aug 2014 01:16:47 -0700

    [ https://issues.apache.org/jira/browse/JENA-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106623#comment-14106623 ]


Andy Seaborne commented on JENA-766:
------------------------------------

The results look right to me.  Consider this query:

{noformat:title=Example 2}
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
 SELECT ?resource 
        (COUNT (DISTINCT ?c) as ?c_count) 
        (COUNT (DISTINCT ?resource) as ?r_count) 
WHERE {
        ?resource rdf:type/rdfs:subClassOf* <http://example.com/ns#CardExact3Class>.
        ?resource <http://example.com/ns#cardExact3> ?c.
} GROUP BY ?resource
HAVING ( ( count(?c)  != 3 ) && ( count(?c)  != 0 ) )
{noformat}

which gives these results (run with {{arq.sparql --data ... --query ...}}):

{noformat:title=Results 2}
-------------------------------------------------------
| resource                        | c_count | r_count |
=======================================================
| <http://example.com/ns#error3a> | 2       | 1       |
| <http://example.com/ns#error3b> | 4       | 1       |
-------------------------------------------------------
{noformat}

In your query, {{COUNT(DISTINCT ?resource)}} counts the number 
of distinct values of {{?resource}} in each group.  Because the query has 
{{GROUP BY ?resource}}, each group contains exactly one value of 
{{?resource}}, so the count is 1 for every group.  Your query therefore 
returns one row per group, each with a count of 1.
This is what {{?r_count}} shows.

{{?c_count}} is consistent with the {{HAVING}} clause: the counts 2 and 4 
are both different from 3 and from 0.

{{DISTINCT}} inside an aggregate applies to the expression of the aggregate
over the group.  {{SELECT DISTINCT}} applies to the rows of the result set.
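
As a sketch of one way to get a single total (assuming the intent is to count 
the resources that pass the {{HAVING}} filter), the grouping can be moved into 
a subquery and the count done in the outer query.  The name {{?total}} is taken 
from the original query; the subquery structure is an assumption about what was 
wanted.  With the data in the report this should give one row with 
{{?total}} = 2.

{noformat:title=Example 3 (sketch)}
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
# Outer query: no GROUP BY, so all rows from the subquery form one group.
SELECT (COUNT (DISTINCT ?resource) as ?total)
WHERE {
        # Inner query: one row per resource that passes the HAVING filter.
        { SELECT ?resource
          WHERE {
                ?resource rdf:type/rdfs:subClassOf* <http://example.com/ns#CardExact3Class>.
                ?resource <http://example.com/ns#cardExact3> ?c.
          } GROUP BY ?resource
          HAVING ( ( count(?c)  != 3 ) && ( count(?c)  != 0 ) )
        }
}
{noformat}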


> Aggregate query returns (possibly) wrong results
> ------------------------------------------------
>
>                 Key: JENA-766
>                 URL: https://issues.apache.org/jira/browse/JENA-766
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: Jena
>    Affects Versions: Jena 2.11.2
>         Environment: Ubuntu 14.04
>            Reporter: Dimitris Kontokostas
>            Priority: Minor
>
> I have the following query
> {code:title=SPARQL-Query}
> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
>  SELECT (COUNT (DISTINCT ?resource) as ?total) WHERE {
>       ?resource rdf:type/rdfs:subClassOf* <http://example.com/ns#CardExact3Class>.
>       ?resource <http://example.com/ns#cardExact3> ?c.
> } GROUP BY ?resource
> HAVING ( ( count(?c)  != 3 ) && ( count(?c)  != 0 ) )
> {code}
> run against the following data
> {code:title=TTL-Data}
> @prefix ex: <http://example.com/ns#> .
> ex:error3a a ex:CardExact3Class ; # 1 error
>       ex:cardExact3 ex:abc1 ;
>       ex:cardExact3 ex:abc2 .
> ex:error3b a ex:CardExact3Class ; # 1 error
>       ex:cardExact3 ex:abc1 ;
>       ex:cardExact3 ex:abc2 ;
>       ex:cardExact3 ex:abc3 ;
>       ex:cardExact3 ex:abc4 .
> {code}
> The query should return 2 as result but instead returns 1.
> If I change the query type to SELECT DISTINCT ?resource I get 2 results so I 
> think this should be a Jena issue (but maybe I miss something again)
> Here's some sample Java code to reproduce
> {code:title=Java code to reproduce}
>         Model model = ModelFactory.createDefaultModel();
>         String Query =
>                 "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> \n" +
>                 "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> \n" +
>                 "SELECT (COUNT (DISTINCT ?resource) as ?total) WHERE {\n" +
>                 "\t?resource rdf:type/rdfs:subClassOf* <http://example.com/ns#CardExact3Class>.\n" +
>                 "\t?resource <http://example.com/ns#cardExact3> ?c.\n" +
>                 "} GROUP BY ?resource\n" +
>                 "HAVING ( ( count(?c)  != 3 ) && ( count(?c)  != 0 ) )\n" +
>                 "\n" +
>                 "  ";
>         String data =
>                 "@prefix ex: <http://example.com/ns#> .\n\n" +
>                 "ex:error3a a ex:CardExact3Class ; # 1 error\n" +
>                 "\tex:cardExact3 ex:abc1 ;\n" +
>                 "\tex:cardExact3 ex:abc2 ;\n" +
>                 "\t.\n" +
>                 "ex:error3b a ex:CardExact3Class ; # 1 error\n" +
>                 "   \tex:cardExact3 ex:abc1 ;\n" +
>                 "   \tex:cardExact3 ex:abc2 ;\n" +
>                 "   \tex:cardExact3 ex:abc3 ;\n" +
>                 "   \tex:cardExact3 ex:abc4 ;\n" +
>                 "   \t.\n";
>         model.read(new ByteArrayInputStream( data.getBytes() ), null, "TTL");
>         QueryExecution qe = com.hp.hpl.jena.query.QueryExecutionFactory.create(Query, model);
>         ResultSet results = qe.execSelect();
>         if (results.hasNext()) {
>             QuerySolution qs = results.next();
>             int total = qs.get("total").asLiteral().getInt();
>             // Total should be 2 while it is 1
>         }
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
