Re: Querying only the default graph from the data store

Andy Seaborne Sun, 09 Sep 2012 04:09:26 -0700


On 07/09/12 15:37, Lee Feigenbaum wrote:

[moved to public-sparql-dev]

I have a related question -- do all quad stores / named graph stores
include a default graph? If the store that you develop or use does have
a default graph, does that graph also have a name (URI)?

TDB has a real, stored default graph or can operate in union-named-graph
mode.


The storage default graph can be accessed by a URI name but:

1/ The name is the same in all stores so it is not a well-formed
2/ It does not show up in GRAPH ?g {}

The union of all named graphs also has a name, so a query accessing the
usual default graph as normal can also use GRAPH to get the union.


But:

The most common use case is storing and querying a single graph; named
graphs just confuse the issue a lot of the time :-)


        Andy


Answering for Anzo: Anzo does not have a default graph. All graphs are
named with URIs.

Lee

On 9/7/2012 7:44 AM, Barry Bishop wrote:

Hello Axel,

On 05/09/12 21:14, Polleres, Axel wrote:

Thanks Barry,

Since you confirm that the response addresses your comment, please
consider this reply informal (chair-hat off).

I feel this is a shame, as two different implementations can
produce different output from the simplest of queries, e.g.
SELECT * { ?s ?p ?o }

I personally find this quite normal... different endpoints
respond differently to such query since they refer to different
default datasets, i.e.
Naturally when I query dbpedia.org I query a different dataset than
data.semanticweb.org, etc.


Well, dbpedia.org and data.semanticweb.org sparql endpoints make
different data available, so I suppose you would naturally get
different results to the same query. However, this is not what I was
getting at. In fact, I'm not sure I have managed to get my point
across at all. Perhaps another hypothetical example:

Suppose you run a development team that builds an application that
interacts with some public sparql endpoint, say http://xyz.org/sparql
- then one day xyz.org start to have scalability problems and decide
to upgrade their RDF database to some expensive new thing. Both old
and new RDF databases are fully compliant with W3C, but after they
upgrade your application is completely broken only because the two
database implementations construct their RDF dataset differently when
no FROM clauses are given. I am sure you wouldn't find it so natural
in this case.
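
To make the scenario concrete, here is a minimal Python sketch of the two dataset-construction behaviours discussed in this thread. It is only an illustration: the `Store` class and the helper functions are hypothetical and not the API of any real RDF database, and the "merge" is a plain set union that ignores blank-node renaming.

```python
# Hypothetical in-memory model of a quad store; not the API of any real product.
from dataclasses import dataclass, field

Triple = tuple[str, str, str]


@dataclass
class Store:
    default_graph: set[Triple] = field(default_factory=set)
    named_graphs: dict[str, set[Triple]] = field(default_factory=dict)


def dataset_default_only(store: Store) -> set[Triple]:
    """Query default graph = the store's default graph only."""
    return set(store.default_graph)


def dataset_union_of_all(store: Store) -> set[Triple]:
    """Query default graph = merge of the default graph and every named graph."""
    merged = set(store.default_graph)
    for triples in store.named_graphs.values():
        merged |= triples
    return merged


def select_all(default_graph: set[Triple]) -> list[Triple]:
    """Stand-in for SELECT * { ?s ?p ?o } evaluated with no FROM clauses."""
    return sorted(default_graph)


# The same stored data, seen through the two behaviours.
store = Store(
    default_graph={("ex:a", "ex:p", "ex:b")},
    named_graphs={"http://example.com#g1": {("ex:c", "ex:p", "ex:d")}},
)

old_engine = select_all(dataset_default_only(store))
new_engine = select_all(dataset_union_of_all(store))
```

With identical stored data, `old_engine` holds one triple and `new_engine` holds two, which is the kind of silent change an application would see after such an upgrade.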

There are some workarounds as you say, but not in all cases. When you
are using someone else's database and don't get to decide how they
partition their data into separate graphs, then you can be completely
stuck. As fabulous as the query language is (and I do think it is a
tremendous achievement), this ambiguity over constructing a dataset
when there are no FROMs is a bit of a hole.


Notably, I'd like to also point you to another document within
the SPARQL 1.1 specification,
i.e. the service-description document at
http://www.w3.org/TR/sparql11-service-description/
which provides means to describe which graphs compose the default
dataset of a particular service endpoint.
Particularly, the property
http://www.w3.org/TR/sparql11-service-description/#sd-defaultDataset
is intended to provide a description of the default dataset that an
endpoint uses.
Note also that the service description vocabulary is extensible, and
what we specify now is only a core, but other vocabulary can be used
to extend this (e.g. VoID)


All well and good, if this feature is actually provided by an
endpoint. However, it requires quite a lot of programming for a client
to work all this out and re-write queries accordingly. And actually,
it still doesn't help - e.g. if the endpoint you want to use
constructs the dataset as an RDF merge of all graphs (when no FROM
clauses are given [I need to find an abbreviation for this]) and you
only want to query the default graph, then you just can't do it. There
is no way to tell such an endpoint that you only want the default
graph using the query language.

The problem is basically that the default graph is special - because
it doesn't have an identifier it can not be used in the same way as
named graphs....

... in the query language. However, in the update language the
appropriate syntax has already been created and would be the perfect
complement to the query language, e.g. if I can do this:

    CLEAR DEFAULT

why can't I do this:

    SELECT *
    FROM DEFAULT
    {...}

and specify absolutely unambiguously that I want my query to execute
*only* over the default graph in the database. No matter how an
implementation constructs its dataset when no FROM clauses are given,
this syntax should always work in the expected way.

Since I am rambling on, the related keywords from the update language
would also be very useful, e.g. one can clear all graphs like this:

    CLEAR ALL

so why not be able to do this:

    SELECT *
    FROM ALL
    {...}

This would help in the opposite case, when an implementation
constructs the dataset using only the default graph (when no FROM
clauses are given). In this situation, it is not possible to query for
the graph names (using select distinct ?g {graph ?g {?s ?p ?o}}), so
the above would say: "please merge all graphs for input to my query,
even though I don't know what their names are and have no way of
finding out (using the query language)".
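
As a rough sketch of what the suggested keywords could mean, the following Python models how a store might assemble the query's default graph from a list of FROM clauses. Neither FROM DEFAULT nor FROM ALL exists in the SPARQL 1.1 query language; the function name, its arguments, the choice that ALL includes the default graph (mirroring CLEAR ALL), and treating an unknown graph name as empty are all assumptions made for illustration.

```python
# Sketch of the *proposed* FROM DEFAULT / FROM ALL semantics.
# All names are hypothetical; this is not part of any SPARQL implementation.
Triple = tuple[str, str, str]


def build_default_graph(
    default_graph: set[Triple],
    named_graphs: dict[str, set[Triple]],
    from_clauses: list[str],
) -> set[Triple]:
    """Merge the graphs selected by the FROM clauses into one default graph."""
    merged: set[Triple] = set()
    for clause in from_clauses:
        if clause == "DEFAULT":
            # Only the store's own default graph.
            merged |= default_graph
        elif clause == "ALL":
            # Assumption: ALL covers the default graph plus every named graph.
            merged |= default_graph
            for triples in named_graphs.values():
                merged |= triples
        else:
            # A graph URI; an unknown name contributes nothing (assumption).
            merged |= named_graphs.get(clause, set())
    return merged


stored_default = {("ex:a", "ex:p", "ex:b")}
stored_named = {
    "http://example.com#g1": {("ex:c", "ex:p", "ex:d")},
    "http://example.com#g2": {("ex:e", "ex:p", "ex:f")},
}

only_default = build_default_graph(stored_default, stored_named, ["DEFAULT"])
default_plus_g1 = build_default_graph(
    stored_default, stored_named, ["DEFAULT", "http://example.com#g1"]
)
everything = build_default_graph(stored_default, stored_named, ["ALL"])
```

The point of the sketch is that the result depends only on the clauses given, not on how a particular implementation builds its dataset when no FROM clause is present.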

These things might not seem important, but they are life and death to
application programmers. Right now, to build an application that needs
to interact with a sparql endpoint that is only known at runtime is
fraught with difficulties. Not the least of which is that if your
application is required to query data only from the default graph,
then there is no way to write a query that is guaranteed to do this on
all (W3C compliant) sparql endpoints.

Which I still feel is a bit of a shame.

barry


As for the rest of your response, we seem to agree that what you're
aiming at is rather a new feature than something this working group
can address within its current charter and resources.

Best regards,
Axel

-----Original Message-----
From: Barry Bishop [mailto:[email protected]]
Sent: Wednesday, 05 September 2012 19:49
To: Polleres, Axel
Cc: [email protected]
Subject: Re: Querying only the default graph from the data store

Hello Axel,

Thanks for taking the time to reply. I realise this thread is
somewhat out of place given the status/progress of the WG.

Your reply does address my initial post. It does not resolve
it, but this is perhaps not the time. However, for the
purpose of clarity I will make further comments inline:

On 05/09/12 04:11, Polleres, Axel wrote:

Hi Barry,

This is in response to

http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2012Aug/0011.html

The working draft does not specify how the RDF dataset is constructed
when no FROM and FROM NAMED clauses are present in the SPARQL query.
Implementations are therefore able to construct the dataset
differently, e.g.
a. dataset default graph contains only the data store's default graph
b. dataset default graph contains the RDF merge of all graphs in the
data store

It is correct that how the concrete default dataset of a SPARQL
endpoint is constructed is left open to implementations. Since
different endpoints and implementations support different behaviours
in this regard (e.g. in some implementations the default graph of the
default dataset is the union of all named graphs whereas in others
this is not the case), the working group does not feel that there is
a unique standard behavior to be advocated this time around.

I feel this is a shame, as two different implementations can
produce different output from the simplest of queries, e.g.
SELECT * { ?s ?p ?o }

However, this is a separate issue.

As soon as a single FROM or FROM NAMED clause is used then the data
store's default graph is excluded from the query's dataset.

Which means that there is no portable way to define a SPARQL query so
that it executes only against the default graph in the data store -
or even against a combination of the default graph and one or more
named graphs.

Please note that a) querying the default graph in the datastore is the
standard behavior when no explicit FROM or FROM NAMED clauses are
given. b) the combination of querying named graphs and the default
graph of the endpoint's default dataset is supported via GRAPH graph
patterns.

a) This is rather inconsistent. Above you say that the
construction of the default RDF dataset (when no FROM/FROM
NAMED clauses are given) is not defined, but here you say
constructing it using the default graph only is the 'standard
behaviour'. One of the motivations for this post is that
there are good reasons not to have only the default graph in
the 'default dataset', e.g. you wouldn't be able to do this
to find out the graph names when presented with an unknown endpoint:

SELECT DISTINCT ?g WHERE { GRAPH ?g {?s ?p ?o } }

Anyway, the point here is that there is no *portable* way to
query just the default graph.

b) yes, but you can't query the RDF merge of the default
graph and a named graph in the same way as with two named
graphs, e.g. FROM ex:g1 FROM ex:g2. Instead one would need to
use a triple and graph pattern union, which for complex
queries becomes cumbersome. Put another way, any combination
of named graphs can be merged and explored with query triple
patterns, but this can't be done with any combination of
named graphs and the default graph.

See also examples below.

This is a problem that often confuses users of RDF data stores and is
likely to lead to implementations that provide their own specific
means to achieve this, e.g.
http://www.openrdf.org/issues/browse/SES-850

Inspired by the update language's use of the 'DEFAULT' keyword for
graph manipulation, I suggest an extension to the query language that
allows "FROM DEFAULT" to be used, e.g.

SELECT *
FROM DEFAULT
WHERE { ..... }

=> dataset contains a default graph made up of the data store's
default graph only

Please note that this is the standard behaviour when no FROM clause is
given, i.e. this corresponds to

SELECT *
WHERE { ..... }       <--- (no use of GRAPH keyword)

I don't think this is "standard behaviour", rather it is
common behaviour. It can not be standard when the
construction of the dataset is implementation dependent when
no FROM clause is given.

This construct can be used with any number of FROM <uri> or FROM NAMED
<uri> clauses, e.g.

SELECT *
FROM DEFAULT
FROM <http://example.com#g1>
WHERE { ..... }

=> dataset contains a default graph made up of the data store's default
graph merged with the contents of the data store's g1 graph

This would be a fairly trivial change for existing sparql processor
implementations, but would provide a big improvement in
functionality/flexibility by allowing a data store's default graph to be
used/queried/merged in the same way as any of its named graphs.

Note that similar to the example above, you can query the default graph
and named graphs within the default dataset in a data store side by
side by using GRAPH graph patterns, i.e.

   SELECT *
   WHERE
   {
     .....                              <-- (no use of GRAPH) matches the default graph
     GRAPH <http://ex.com#g1> { .... }  <-- matches named graph g1 (assuming g1 is a named graph in the default dataset)
   }

Consider an application that needs to execute queries over various
subsets of a database's contents, where the subsets are defined using
various combinations of named graphs. It would certainly be useful to
have standard queries which only required the appropriate
"FROM g1 FROM g2 etc" prepended. This is easy to do, unless one of the
graphs is the default graph.

Finally, note that it is not possible in SPARQL 1.1 to construct a *new*
dataset composed of *parts* of the default dataset of an endpoint plus
possible external graphs; such a feature is currently not foreseen in
the features addressed in this round of SPARQL, but had been suggested
before [1].

The features being worked on in this round of standardization have been
decided in a voting process at the beginning of the WG and are
documented in the following document:
http://www.w3.org/TR/sparql-features/

Additionally, a list of work items and features postponed to a future
working group are being collected by the group in a dedicated wiki
page [2] which also contains the features discussed in the beginning of
the WG which have not been considered for this round [3].

Yes, I will be more timely next time and will endeavour to progress
this topic in the proper way. My apologies for the 'noise'.

Regards,
barry

Among this list, the feature "Composite Datasets" [1] might partially
capture what you have in mind and a future WG might possibly work out
the details of such feature.

We'd kindly ask you to confirm by a reply to this list that this
addresses your comment.

Axel Polleres, on behalf of the SPARQL WG

1. http://www.w3.org/2009/sparql/wiki/Feature:CompositeDatasets
2. http://www.w3.org/2009/sparql/wiki/Future_Work_Items
3. http://www.w3.org/2009/sparql/wiki/Category:Features
