John, Paul,
We have been looking at this with a format, "application/rdf-patch"
http://afs.github.io/rdf-patch/
for use with PATCH. It is built around the idea that a dataset is a set
of quads and triples, and that a patch is a sequence of change operations.
It's N-quads + a column for "Add" or "Delete". It allows direct
reference to blank nodes in the store with <_:StoreInternalId>.
We wanted partial deletes for small-scale changes to large graphs and
the ability to add data around existing blank nodes. (It would work
as a graph PATCH format as well.)
The downside is that it is a new format and needs the target server to
support it.
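
To illustrate the "N-quads plus an Add/Delete column" idea, here is a minimal
Python sketch. It is not the rdf-patch grammar itself: the row layout (a
leading "A" or "D", whitespace-separated terms, a trailing "."), the function
name apply_changes and the in-memory set are all assumptions made for
illustration, and terms containing spaces (such as most literals) are not
handled.

```python
from typing import Iterable, Optional, Set, Tuple

# (subject, predicate, object, graph); graph is None for a plain triple
Quad = Tuple[str, str, str, Optional[str]]


def apply_changes(dataset: Set[Quad], rows: Iterable[str]) -> Set[Quad]:
    """Apply a sequence of add/delete rows to a copy of the dataset."""
    result = set(dataset)
    for row in rows:
        tokens = row.split()
        if not tokens:
            continue  # skip blank rows
        if tokens[-1] == ".":
            tokens.pop()  # drop the N-Quads style terminator
        op, terms = tokens[0], tokens[1:]
        if len(terms) == 3:
            terms.append(None)  # a triple: no graph name
        if len(terms) != 4:
            raise ValueError(f"expected 3 or 4 terms: {row!r}")
        quad = (terms[0], terms[1], terms[2], terms[3])
        if op == "A":
            result.add(quad)
        elif op == "D":
            result.discard(quad)  # deleting an absent quad is a no-op here
        else:
            raise ValueError(f"unknown operation {op!r}")
    return result
```

Because the terms are kept as opaque strings, a store-internal blank node
reference such as <_:b0> is added or deleted like any other term.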
On 23/04/14 09:03, John Walker wrote:
One could imagine similar quad semantics for HTTP GET, PUT, POST and
DELETE where:
-GET would return the entire contents of the graph store in the
requested quad format (could also support triples where context
is omitted)
-PUT would replace the entire contents of the graph store with the
RDF quad payload
-POST would insert the RDF quad payload into the graph store leaving
existing data intact
-DELETE would be equivalent to DROP ALL
I have been using these semantics for quads-on-datasets as well. The
endpoint is typically the dataset itself, rather than the SPARQL Graph
Store HTTP Protocol endpoint, and rdf-patch PATCH is on the same URI.
The minor advantage is that the base URI for the operation is the dataset.
Otherwise, sending to an extended-by-content-type SPARQL Graph Store
HTTP Protocol endpoint would work.
Andy
On 24/04/14 09:07, John Walker wrote:
Hi Paul,
Certainly it is possible to do the same using SPARQL 1.1 Update as the GSP
requests can be expressed in those terms.
We actually tried this approach by generating SPARQL Update procedures (and
TriG) as the output from XSLT processing step.
However as the original message is XML and the transformation pipeline is using
XML technology, we settled on TriX format for the output as it gave benefit of
being able to validate against the TriX XSD.
It is also worth noting that the original XML message can be up to 10MB, and
even larger following transformation to whatever RDF/SPARQL format.
We found that we ran into problems when trying to issue such large SPARQL
Update procedures against the SPARQL endpoint.
When using GSP we did not encounter this limitation.
Of course this is dependent on the graph store implementation one is using.
Regarding LDP, my understanding is that it only permits HTTP operations on
individual (RDF triple) resources. Thus updating several LDP RDF Sources in a
single HTTP request is not possible.
On the subject of PLM, it's also worth looking at OSLC
(http://open-services.net/).
Regards,
John
-----Original Message-----
From: Paul Tyson [mailto:[email protected]]
Sent: Thursday, April 24, 2014 2:44 AM
To: John Walker
Cc: [email protected]
Subject: Re: Extending SPARQL Graph Store HTTP Protocol with quad semantics
Hi John,
Interesting work, thanks for sharing it.
I'm also implementing PLM linked data capabilities. We haven't tackled the
problem of incremental updates to the RDF store yet, but that is coming. Not
even sure we'll do it over HTTP at first, but if SPARQL protocol wouldn't
handle it I'd be inclined to look at setting up a Linked Data Platform
(http://www.w3.org/TR/ldp/) to handle the extra semantics.
Regards,
--Paul
On Wed, 2014-04-23 at 08:03 +0000, John Walker wrote:
Hi All,
I’d like to share some information about what we’ve implemented and
see if there is either:
- Previous work done in this area, or
- Others that might find this useful
Perhaps in the longer term this is something that could even be
standardized.
To set the scene we’ve been working on converting a rather large
dataset to RDF.
The dataset is in the product lifecycle management domain.
The primary goal is to have a ‘virtualized’ copy of the current state
of all items that can be flexibly queried over.
For management of the data in the graph store we settled on a graph
per resource pattern [1] where each named graph contains a description
of one item plus some additional metadata about the graph itself.
This allows us to use HTTP operations (e.g. PUT) to interact with the
named graphs, which is consistent with the granularity of updates to
individual items from the source system (i.e. any change to an item
creates a new version of the item which replaces the previous
version).
However we also knew that the updates from the source system were sent
as messages which contain the description of one or more changed items
plus the description of all related items potentially impacted by the
change.
One option we considered was to deconstruct the message into several
HTTP PUT operations for each item described in a particular message.
However this would have the downside that the updates in the graph
store (state changes) do not directly correspond to the messages and
that potentially the updates in a message might be half applied should
there be some error during processing.
The solution we arrived at was to convert the message to RDF quads
and apply the update with an HTTP PATCH request to the graph store with
‘custom’ semantics.
We define HTTP PATCH using Quad data as equivalent to:
- DROP SILENT operation on each named graph in payload, followed
by
- INSERT DATA operation on each named graph in payload
In other words this is the same as an HTTP PUT request against each
named graph in the quad data.
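
As a rough sketch of that equivalence, the following Python builds the
SPARQL 1.1 Update text that a quad payload would correspond to. The function
name patch_as_sparql_update is hypothetical, and the quads are assumed to
arrive as already-serialized terms (for example "<http://example.org/g>"), so
nothing is escaped or validated here.

```python
from typing import Dict, Iterable, List, Tuple

Quad = Tuple[str, str, str, str]  # subject, predicate, object, graph name


def patch_as_sparql_update(payload: Iterable[Quad]) -> str:
    """Render a quad payload as DROP SILENT + INSERT DATA operations."""
    by_graph: Dict[str, List[Tuple[str, str, str]]] = {}
    for s, p, o, g in payload:
        by_graph.setdefault(g, []).append((s, p, o))

    # First drop every named graph mentioned in the payload ...
    operations = [f"DROP SILENT GRAPH {g}" for g in sorted(by_graph)]
    # ... then insert the new content of each of those graphs.
    for g in sorted(by_graph):
        body = " ".join(f"{s} {p} {o} ." for s, p, o in sorted(by_graph[g]))
        operations.append(f"INSERT DATA {{ GRAPH {g} {{ {body} }} }}")

    # SPARQL Update separates the operations of one request with ';'
    return " ;\n".join(operations)
```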
This allows us to apply the changes described in a message in one
atomic action.
Any named graphs already present in the graph store that are not in
the RDF quad payload are not mutated.
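
A minimal in-memory sketch of these PATCH semantics, assuming the graph store
can be modelled as a dictionary from graph name to a set of triples. The names
GraphStore and patch_quads are made up for illustration; a real store would
rely on its own transaction mechanism for atomicity.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, str]  # subject, predicate, object, graph name
GraphStore = Dict[str, Set[Triple]]


def patch_quads(store: GraphStore, payload: Iterable[Quad]) -> GraphStore:
    """Return a new store in which every graph named in the payload is replaced."""
    incoming: GraphStore = {}
    for s, p, o, g in payload:
        incoming.setdefault(g, set()).add((s, p, o))

    # Work on a copy: if anything above or below raises, the caller's store
    # is untouched, which approximates the all-or-nothing behaviour.
    updated = {g: set(triples) for g, triples in store.items()}
    for g, triples in incoming.items():
        updated[g] = triples  # DROP SILENT followed by INSERT DATA for graph g
    return updated
```

The caller swaps in the returned store only once the whole payload has been
processed, so a message is never half applied in this model.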
There is some more info in slides 14-17 of a recent presentation [2].
One could imagine similar quad semantics for HTTP GET, PUT, POST and
DELETE where:
- GET would return the entire contents of the graph store in the
requested quad format (could also support triples where context is
omitted)
- PUT would replace the entire contents of the graph store with
the RDF quad payload
- POST would insert the RDF quad payload into the graph store
leaving existing data intact
- DELETE would be equivalent to DROP ALL
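
The four operations above could be modelled roughly as follows. This is only
an illustrative Python sketch over an in-memory set of quads; the class name
QuadStore and its method names are hypothetical and do not correspond to any
existing graph store API.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Quad = Tuple[str, str, str, str]  # subject, predicate, object, graph name


class QuadStore:
    """Toy model of quad semantics for GET, PUT, POST and DELETE."""

    def __init__(self) -> None:
        self._quads: Set[Quad] = set()

    def get(self) -> FrozenSet[Quad]:
        # GET: return the entire contents of the store
        return frozenset(self._quads)

    def put(self, payload: Iterable[Quad]) -> None:
        # PUT: replace the entire contents with the payload
        self._quads = set(payload)

    def post(self, payload: Iterable[Quad]) -> None:
        # POST: add the payload, leaving existing data intact
        self._quads |= set(payload)

    def delete(self) -> None:
        # DELETE: equivalent to DROP ALL
        self._quads.clear()
```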
Here it may also be useful to have separate URIs to represent the
graph store instance and the data in that instance to remove any
ambiguity about whether a DELETE request, for example, should delete the
graph store itself or the data in the store.
Regards,
John Walker
[1] http://patterns.dataincubator.org/book/graph-per-resource.html
[2] http://www.nxp.com/documents/other/PiLOD2_20140417.pdf