Rob,

It's a balance as to whether to put in a discussion section. I'd prefer to keep the doc spec-like and just talk about what RDF Patch is. Someone wanting to use/implement RDF Patch may not be interested in the history (think of a reader a few years from now!). A separate doc would enable us to be more opinionated :-)

Or we could promote discussion by sending it to public-sparql-dev; after all, the analysis is more opinion than the rest of the doc. We'll probably get better feedback that way as well.

        Andy

PS Any news on the binary format?

On 30/07/13 16:56, Rob Vesse wrote:
Andy

I am familiar with Talis Changesets, having used them heavily in my PhD
research.

My concerns are much the same as yours in that Changesets really don't
scale well.  The other big problem is that, since they are RDF graphs,
they are unordered: one cannot rely on a serializer/parser producing the
data in the same order as was originally intended, especially if you
start crossing boundaries between different toolkits/APIs.  This makes
them effectively useless as a streaming patch format unless you send a
stream of small changesets, which adds copious overhead to a format
intended for speed and simplicity.

Perhaps more simply, you could do the following:

#METADATA
<> rp:create [ foaf:name "Andy" ; foaf:orgURL <http://jena.apache.org> ] ;
    rp:createdDate "2013-07-30"^^xsd:date ;
    rdfs:comment "A valid Turtle graph" .
#METADATA

The #METADATA is used to denote the start/end of a metadata block (which
ideally we permit only at the start of the patch).  This can then be
easily discarded by line-oriented processors: if you see #METADATA, you
just throw away all subsequent lines until you see #METADATA again (see
the sketch below).  Within the metadata block you could allow full-blown
Turtle or restrict to a simpler tuple format if preferable?
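
For illustration, the skip logic might look like this in Java (just a
sketch; the class name and the patch-row handler are assumptions, not
part of any spec):

import java.io.BufferedReader;
import java.io.IOException;

// Sketch: a line-oriented reader that discards #METADATA blocks.
// Assumes the delimiter is exactly "#METADATA" on its own line.
public class MetadataSkip {
    static void processPatch(BufferedReader in) throws IOException {
        boolean inMetadata = false;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.trim().equals("#METADATA")) {
                inMetadata = !inMetadata;  // toggle at the start/end marker
                continue;
            }
            if (inMetadata)
                continue;                  // throw away metadata lines
            handlePatchRow(line);          // hypothetical patch-row handler
        }
    }

    static void handlePatchRow(String line) {
        // ... process an ordinary patch row here ...
    }
}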

Is it worth adding a comparison to alternative approaches as an Appendix
to the RDF patch proposal?

Rob


On 7/30/13 7:49 AM, "Andy Seaborne" <a...@apache.org> wrote:

Rob, all,

Leigh Dodds expressed a preference for Talis Changesets for patches.  I
have tried to analyse their pros and cons.

For me, the scale issue alone makes changesets the wrong starting point.
They really solve a different problem: managing some remote data with
small, incremental changes.

It would be useful to add to RDF patch the ability to have metadata
about the change itself.

One way is to introduce a new marker, M, which permits, effectively,
N-Triples.  (Maybe required to be at the front of the patch.)

Not Turtle, but I see RDF Patch as machine-oriented, not human-readable.

M <> rp:create _:a .
M _:a foaf:name "Andy" .
M _:a foaf:orgURL <http://jena.apache.org/> .
M <> rp:createdDate "2013-07-30"^^xsd:date .
M <> rdfs:comment "Seems like a good idea" .

        Andy

------------------------------------------------------------------

Talis Changesets (TCS)

http://docs.api.talis.com/getting-started/changesets
http://docs.api.talis.com/getting-started/changeset-protocol
http://vocab.org/changeset/schema.html

== Brief Description

A Changeset is a set of triples to remove and a set of triples to add,
recorded as a single RDF graph.  There is a fixed "subject of change" -
a changeset is a change to a single resource.  The triples of the change
must all have the same subject and this must be the subject of change.

The triples of the change are recorded as reified statements.  This is
necessary so that triples can be grouped into removal and addition sets.
The changeset can also carry descriptive information about the change.
Because the changeset is an RDF graph, the graph can say who the creator
was, record the reason for the change, and the date the modification
was created (not executed).  This also requires that the change triples
are reified.

ChangeSets can be linked together to produce a sequence of changes.  This
is how to get changes to several resources: a list of changesets.
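
For concreteness, a small changeset might look roughly like this
(sketched from the schema linked above; the prefixes and resource names
are illustrative):

@prefix cs:   <http://purl.org/vocab/changeset/schema#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

[] a cs:ChangeSet ;
   cs:subjectOfChange <http://example/alice> ;
   cs:changeReason    "Correct the name" ;
   cs:createdDate     "2013-07-30T00:00:00Z" ;
   cs:removal  [ a rdf:Statement ;
                 rdf:subject   <http://example/alice> ;
                 rdf:predicate foaf:name ;
                 rdf:object    "Alice" ] ;
   cs:addition [ a rdf:Statement ;
                 rdf:subject   <http://example/alice> ;
                 rdf:predicate foaf:name ;
                 rdf:object    "Alice Smith" ] .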

== Pros and Cons

This approach has some advantages and some disadvantages (some of which
could be overcome by fairly obvious changes to the definition):

1/ Changes relate only to one resource.  You can't make a coordinated
set of changes, such as adding a batch of several new resources in a
single HTTP request.

2/ Blank nodes can't be handled.  There is no way to give the subject of
change if it is a blank node.  (The Talis platform didn't support blank
nodes.)

3/ A changeset is an RDF graph.

It needs the whole changeset graph to be available before any changes
are made.  The whole graph is needed to validate the changeset (e.g. that
all reified triples have the same subject), and the order of triples in a
serialization of a graph is arbitrary (esp. if produced by a generic
RDF serializer) so, for example, the "subject of change" triple could be
last, or the additions and removals can be mixed in any order.  To get
stable changes, it is necessary to have a rule such as: all removals are
done before the additions.

This is a limitation at scale.  In practice, a changeset must be parsed
into memory (standard parser), validated (changeset-specific code) and
applied (changeset-specific code).  The design can't support streaming,
nor changes which may be larger than available RAM (e.g. millions of
triples).

It does mean that a standard RDF toolkit can be used to produce the
changeset (with suitable application code to build the graph structure)
and to parse it at the receiver, together with some application code for
producing, validating and executing a changeset.

4/ Metadata per change is a useful feature.

5/ Changesets only work for a change to a resource in a single graph.

== Other

Graph literals:

Some other proposals have been made (like Delta, or variants based on
TriG) where named graphs are used instead of reified triples.  The
scaling issue remains - processing can't start until the whole change
has been seen.
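
A sketch of that style (hypothetical diff: vocabulary; blank node graph
labels as TriG allows):

@prefix diff: <http://example/diff#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<urn:change:1> diff:deletion  _:del ;
               diff:insertion _:ins .

_:del { <http://example/alice> foaf:name "Alice" . }
_:ins { <http://example/alice> foaf:name "Alice Smith" . }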

Delta:

{ ?x bank:accountNo "1234578" }
   diff:deletion  { ?x bank:balance 4000 } ;
   diff:insertion { ?x bank:balance 3575 } .

Restricted SPARQL Update:

That leaves a restricted SPARQL update: e.g. DELETE DATA, INSERT DATA,
and maybe DELETE WHERE.
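
For example, the change above, written in that restricted style (prefix
and resource names illustrative):

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

DELETE DATA { <http://example/alice> foaf:name "Alice" } ;
INSERT DATA { <http://example/alice> foaf:name "Alice Smith" }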

As soon as additional restrictions apply, then, to validate, you need a
special parser, so one advantage (reuse of existing tools) is only partial.

There is no blank node handling without requiring skolemized URIs, again
stepping away from standard, general tools.

