On 03/03/14 22:00, David Booth wrote:
Hi Andy,

On 03/03/2014 03:01 PM, Andy Seaborne wrote:
(please forward if the mailing list does not allow non-subscribers to
send to it)

On 03/03/14 16:32, David Booth wrote:
On 02/09/2014 05:45 PM, w3.h...@gmail.com wrote:
Relevant docs:
- Working draft of W3C Note:
https://docs.google.com/document/d/1zGQJ9bO_dSc8taINTNHdnjYEzUyYkbjglrcuUPuoITw/edit#heading=h.wyc73yp7c8jz




I notice that section 6.6.1 Core statistics shows this SPARQL query for
counting the number of triples:

   SELECT (COUNT(*) AS ?no) { ?s ?p ?o  }

However, I believe the SPARQL 1.1 standard allows duplicate triples and
duplicate query solutions by default.  If so, to get an accurate count
of the number of triples, the DISTINCT keyword must be used:

   SELECT (COUNT(DISTINCT *) AS ?no) { ?s ?p ?o  }

I'm copying Andy Seaborne to see if this is correct, since I could not
easily find this information in the SPARQL 1.1 spec when I did a quick
scan. Andy, am I correct about this?

Thanks,
David

Hi,

In the case of { ?s ?p ?o }, the match is against the default graph and
an RDF graph is a set of triples - so there are no duplicates over the
?s, ?p, ?o elements of a row.

Because of the nature of the pattern, COUNT(*) and COUNT(DISTINCT *)
should be the same.
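A minimal Python sketch of that reasoning, assuming a toy in-memory model rather than any real SPARQL engine (the names `match_all`, `count_star` and `count_distinct_star` are hypothetical). Because the graph is a Python `set`, adding the same triple twice leaves one copy, so the two counts cannot differ for { ?s ?p ?o }:

```python
# Toy model (not any real triple store's API): an RDF graph is a *set* of
# triples, so matching { ?s ?p ?o } yields one solution per distinct triple.
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Solution = Dict[str, str]

def match_all(graph: Set[Triple]) -> List[Solution]:
    """Evaluate the basic graph pattern { ?s ?p ?o } against the default graph."""
    return [{"s": s, "p": p, "o": o} for (s, p, o) in graph]

def count_star(solutions: List[Solution]) -> int:
    # COUNT(*): counts every solution in the (multi)set of solutions.
    return len(solutions)

def count_distinct_star(solutions: List[Solution]) -> int:
    # COUNT(DISTINCT *): counts solutions after removing duplicates.
    return len({tuple(sorted(sol.items())) for sol in solutions})

graph: Set[Triple] = {
    ("<http://example/a>", "<http://example/p>", "1"),
    ("<http://example/a>", "<http://example/p>", "2"),
    ("<http://example/b>", "<http://example/p>", "1"),
}
# Adding an existing triple again is a no-op for a set.
graph.add(("<http://example/a>", "<http://example/p>", "1"))

sols = match_all(graph)
print(count_star(sols), count_distinct_star(sols))  # 3 3
```

A store that kept a bag of triples instead of a set would be the only way for the two counts to diverge here.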

I think section 6.6.1 Core statistics is correct as is.

What does the spec say?  That's the definitive place to look.


I'm particularly thinking of AllegroGraph, which (by default, I believe)
does not remove duplicate triples if the same triple happens to be
loaded more than once.

I don't know what AllegroGraph does. That sounds like a question for its developers.

bNodes? With all the RDF syntaxes, reading a file twice creates separate bNodes.
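To illustrate the bNode point, here is a toy Python model, assuming a hypothetical `load` helper rather than any real parser API. Blank node labels are local to one document, so each load mints new blank nodes: ground triples collapse when the data is loaded twice, but a triple containing a blank node does not. The triple count grows even though the graph is still a set with no duplicate triples:

```python
# Toy model of loading the same data twice (hypothetical helper, not a real
# parser API). Blank node labels such as "_:x" are scoped to one document, so
# each load mints fresh blank nodes, while IRIs and literals stay identical.
import itertools
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
_fresh = itertools.count()

def load(doc: List[Triple], graph: Set[Triple]) -> None:
    labels: Dict[str, str] = {}  # document-local label -> newly minted bNode

    def term(t: str) -> str:
        if t.startswith("_:"):
            if t not in labels:
                labels[t] = f"_:b{next(_fresh)}"
            return labels[t]
        return t

    for s, p, o in doc:
        graph.add((term(s), term(p), term(o)))

doc = [
    ("<http://example/a>", "<http://example/p>", '"ground"'),
    ("_:x", "<http://example/p>", '"has bnode"'),
]
g: Set[Triple] = set()
load(doc, g)
load(doc, g)
print(len(g))  # 3: the ground triple once, the bNode triple twice
```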

If AllegroGraph returns a different count for the
queries above (with or without DISTINCT), does that mean that
AllegroGraph is not SPARQL 1.1 compliant? I.e., is it a bug, or is it
a permissible implementation variation?

I had the impression that SPARQL 1.1 conformant implementations are
permitted to have duplicate solutions in the solution set unless the
word DISTINCT is used, and hence I would have thought that a solution
set that is not explicitly constrained to be DISTINCT could include
duplicates, even if that solution set is for only a { ?s ?p ?o } graph
pattern over the default graph, but maybe I'm wrong.

Do you have a pointer to text that gave you that impression?

I don't see how { ?s ?p ?o } can create duplicates - an RDF graph is a *set* of triples (that's not a SPARQL definition - it's an RDF definition) so subject/predicate/object is a unique combination within a graph.

If the graph is composed behind the scenes of other data, that's nothing to do with the RDF or SPARQL specs.

OTOH, if, when
DISTINCT is not specified, the SPARQL 1.1 standard only *sometimes*
permits duplicates, then how can I determine which circumstances permit
them and which don't?

It depends on the query pattern, but we're talking about one specific pattern: { ?s ?p ?o }.

In general, SPARQL results are multisets, so they can contain duplicates. Some algebra operations, such as projection and union, can introduce duplicates, but the cardinality of every solution they produce is defined by the spec.
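A small Python sketch of how projection and union produce duplicates with a well-defined cardinality. It is a toy model only, assuming `collections.Counter` as a stand-in for a solution multiset; `sol`, `project` and `union` are hypothetical helpers, not part of any SPARQL library:

```python
# Toy model of SPARQL solution multisets (illustrative only). collections.Counter
# stands in for a multiset: each key is a solution, each value its cardinality.
from collections import Counter
from typing import FrozenSet, Iterable, Tuple

Solution = FrozenSet[Tuple[str, str]]

def sol(**bindings: str) -> Solution:
    return frozenset(bindings.items())

def project(omega: Counter, variables: Iterable[str]) -> Counter:
    # Projection keeps the cardinality of each input solution, so solutions
    # that differ only in the dropped variables become duplicates.
    keep = set(variables)
    out: Counter = Counter()
    for mu, n in omega.items():
        out[frozenset((v, x) for v, x in mu if v in keep)] += n
    return out

def union(a: Counter, b: Counter) -> Counter:
    # Union adds cardinalities (multiset sum), not set union.
    return a + b

# Solutions of { ?s ?p ?o } over a three-triple graph: all distinct.
omega = Counter({
    sol(s="ex:a", p="ex:p", o="1"): 1,
    sol(s="ex:a", p="ex:p", o="2"): 1,
    sol(s="ex:b", p="ex:p", o="1"): 1,
})
projected = project(omega, ["s"])  # like SELECT ?s { ?s ?p ?o }
print(projected[sol(s="ex:a")], projected[sol(s="ex:b")])  # 2 1
print(sum(union(omega, omega).values()))  # 6
```

Adding DISTINCT to the projected query would collapse the cardinality of ?s = ex:a from 2 to 1; the { ?s ?p ?o } solutions themselves never need it.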

        Andy



David


