Andy,

I agree that scalability and stream processing are important goals, which,
as you note, the Changesets do not address.  To me, the cleanest way to
achieve that is to go beyond an RDF representation and create a new
language, as you've done.

The metadata block you mention is basically what I suggested as a header
block.  I do think it will be important to be able to talk about the patch
itself.  I'm leaning towards requiring it to appear before any data (but
after prefixes).  I'm trying to think of cases where it would need to
appear at an arbitrary point in the document (changing some kind of state
after it has processed, say, 10 million triples?), and I'm not coming up
with many examples.  If trailing metadata were needed (say, some
statistics at the end of the file about the preceding data), then maybe we
have a header (which has to appear up front) and metadata (which can
appear anywhere)?  A rough sketch of that split is below.
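
To illustrate (purely hypothetical syntax, reusing your M marker for
metadata rows and an A marker for added triples):

@prefix rp: <http://example/rdf-patch#> .        # prefixes first
M <> rp:createdDate "2013-07-30"^^xsd:date .     # header block, before any data
A <http://example/s> <http://example/p> "v1" .   # data rows
M <> rp:tripleCount "1"^^xsd:integer .           # trailing metadata / statistics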

-Stephen


On Tue, Jul 30, 2013 at 10:49 AM, Andy Seaborne <a...@apache.org> wrote:

> Rob, all,
>
> Leigh Dodds expressed a preference for Talis Changesets for patches.  I
> have tried to analyse their pros and cons.
>
> For me, the scale issue alone makes changesets the wrong starting point.
>  They really solve a different problem of managing some remote data with
> small, incremental changes.
>
> It would be useful to add to RDF patch the ability to have metadata about
> the change itself.
>
> One way is to introduce a new marker M, which permits, in effect,
> N-Triples rows.  (Maybe required to be at the front.)
>
> Not Turtle, but I see RDF Patch as machine-oriented, not human-readable.
>
> M <> rp:create _:a .
> M _:a foaf:name "Andy" .
> M _:a foaf:orgURL <http://jena.apache.org/> .
> M <> rp:createdDate "2013-07-30"^^xsd:date .
> M <> rdfs:comment "Seems like a good idea" .
>
>         Andy
>
> ------------------------------------------------------------------
>
> Talis Changesets (TCS)
>
> http://docs.api.talis.com/getting-started/changesets
> http://docs.api.talis.com/getting-started/changeset-protocol
> http://vocab.org/changeset/schema.html
>
> == Brief Description
>
> A Changeset is a set of triples to remove and a set of triples to add,
> recorded as a single RDF graph.  There is a fixed "subject of change" - a
> changeset is a change to a single resource.  The triples of the change must
> all have the same subject and this must be the subject of change.
>
> The triples of the change are recorded as reified statements.  This is
> necessary so that triples can be grouped into removal and addition sets.
> The change set can have descriptive information about the change. Because
> the changeset is an RDF graph, the graph can say who was the creator, record
> the reason for the change, and the date the modification was created (not
> executed).  This also requires that the change triples are reified.
>
> ChangeSets can be linked together to produce a sequence of changes.  This
> is how to get changes to several resources - a list of changesets.
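>
> For illustration, a minimal changeset in Turtle (property names are from
> the changeset schema linked above; the data and URIs are made up):
>
> @prefix cs:  <http://purl.org/vocab/changeset/schema#> .
> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>
> <#change1> a cs:ChangeSet ;
>     cs:subjectOfChange <http://example/alice> ;
>     cs:createdDate "2013-07-30T00:00:00Z" ;
>     cs:changeReason "Fix a typo in the name" ;
>     cs:precedingChangeSet <#change0> ;      # chains changesets into a sequence
>     cs:removal [ a rdf:Statement ;
>                  rdf:subject   <http://example/alice> ;
>                  rdf:predicate <http://xmlns.com/foaf/0.1/name> ;
>                  rdf:object    "Alcie" ] ;
>     cs:addition [ a rdf:Statement ;
>                   rdf:subject   <http://example/alice> ;
>                   rdf:predicate <http://xmlns.com/foaf/0.1/name> ;
>                   rdf:object    "Alice" ] .
>
> Every rdf:subject must equal the cs:subjectOfChange - that is the check
> the receiver has to make over the whole graph.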
>
> == Pros and Cons
>
> This approach has some advantages and some disadvantages (some of which
> could be overcome by fairly obvious changes to the definition):
>
> 1/ Changes relate only to one resource.  You can't make a coordinated set
> of changes, such as adding a batch of several new resources in a single
> HTTP request.
>
> 2/ Blank nodes can't be handled.  There is no way to give the subject of
> change if it is a blank node. (The Talis platform didn't support blank
> nodes.)
>
> 3/ A changeset is an RDF graph.
>
> It needs the whole changeset graph to be available before any changes are
> made.  The whole graph is needed to validate the changeset (e.g. all
> reified triples have the same subject), and the order of triples in a
> serialization of a graph is arbitrary (esp. if produced by a generic RDF
> serializer), so, for example, the "subject of change" triple could come
> last, and the additions and removals can be mixed in any order.  To get
> stable changes, a rule is needed, such as: all removals are done before
> the additions.  (Otherwise, a triple appearing in both sets would give
> different results depending on the order of application.)
>
> This is a limitation at scale.  In practice, a changeset must be parsed
> into memory (standard parser), validated (changeset-specific code) and
> applied (changeset-specific code).  The design can support neither
> streaming nor changes larger than available RAM (e.g. millions of
> triples).
>
> It does mean that a standard RDF toolkit can be used to produce the
> changeset (with suitable application code to build the graph structure)
> and to parse it at the receiver, together with some application code for
> producing, validating and executing a changeset.
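>
> By contrast, a row-oriented patch can be applied one row at a time.  A
> sketch, using A (add) and D (delete) markers in the same style as the M
> rows above (the data is made up):
>
> D <http://example/book1> <http://purl.org/dc/terms/title> "Old title" .
> A <http://example/book1> <http://purl.org/dc/terms/title> "New title" .
> A <http://example/book2> <http://purl.org/dc/terms/title> "Another" .
>
> Each row is self-contained, so the receiver never needs to hold more
> than one row in memory.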
>
> 4/ Metadata per change is a useful feature.
>
> 5/ Change sets only work for a change to a resource in a single graph.
>
> == Other
>
> Graph literals:
>
> Some other proposals have been made (like Delta, or variants based on
> TriG) where named graphs are used instead of reified triples.  The
> scaling issue remains - processing can't start until the whole change
> has been seen.
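>
> For example, a TriG-based variant might look like this (hypothetical
> vocabulary; the graph names and data are made up):
>
> <urn:x:del> { <http://example/acct> <http://example/balance> 4000 . }
> <urn:x:add> { <http://example/acct> <http://example/balance> 3575 . }
> <urn:x:patch> <urn:x:deletion>  <urn:x:del> ;
>               <urn:x:insertion> <urn:x:add> .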
>
> Delta:
>
> { ?x bank:accountNo "1234578" }
>   diff:deletion { ?x bank:balance 4000 };
>   diff:insertion { ?x bank:balance 3575 } .
>
> Restricted SPARQL Update:
>
> That leaves a restricted SPARQL Update: e.g. DELETE DATA, INSERT DATA,
> and maybe DELETE WHERE.
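>
> For example (standard SPARQL Update syntax; the data is made up):
>
> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
> DELETE DATA { <http://example/alice> foaf:name "Alcie" } ;
> INSERT DATA { <http://example/alice> foaf:name "Alice" }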
>
> As soon as additional restrictions apply, you need a special parser to
> validate a document, so one advantage (reuse of existing tools) is only
> partial.
>
> There is no blank node handling without requiring skolemized URIs, which
> again steps away from standard, general tools.
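>
> To make the blank-node point concrete (the genid URI is made up):
>
> _:b0 <http://xmlns.com/foaf/0.1/name> "Alice" .
>
> has no identity a later patch can refer back to; a skolemized form
>
> <http://example/.well-known/genid/b0> <http://xmlns.com/foaf/0.1/name> "Alice" .
>
> can be referred to again, at the cost of special genid handling.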
>
