Re: [Fedora-commons-developers] The REST API, The Resource Index and the Semantic Web

Asger Askov Blekinge Sat, 07 Nov 2009 02:49:01 -0800

I very much like what you are thinking here.


On Fri, 2009-11-06 at 22:36 +0100, Steve Bayliss wrote:
> Thinking over the current debates over the REST API, particularly
> manipulating relationships, and how the resource index fits in with this, I
> wonder if there is some unified approach that could be used to relate all of
> these together in a semantic web-friendly, REST-friendly, Web 2.0-friendly
> model.
> 
> Asger's work on Enhanced Content Models, and particularly the ideas around a
> "reference counting" mechanism for triples to get around some of the
> limitations with the current single-graph resource index, plus our own work
> on having arbitrary RDF datastreams propagated to the resource index (and
> the inherent problems with this) also feeds into this thinking, along with
> Carsten Friedrich's recent post expressing a desire for a relationships API
> that is not tied to needing to manipulate individual RELS-EXT, RELS-INT and
> DC datastreams.  Ben Armintor's comments on the wiki on a (sub-)
> graph-centric approach to manipulating relationships is also relevant.
> 
> This is early-stage thinking, but I thought it might be useful to get these
> ideas out there, albeit in a bit of a raw state.  And spending too long
> trying to define a vision of where you want to get to can get in the way of
> actually getting there...
> 
> What follows is pretty dependent on Fedora's Resource Index being
> enabled, and it is also Mulgara-centric, which is not exactly in line with
> current thinking.  So completely ignoring the
> "triplestore-is-only-a-cache-and-might-not-even-exist" issue...
> 

As far as I can see, you actually do assume that the triple store is only a
cache, but you also require that it exists. "Triple store is only a cache"
means roughly the same as "every triple in the triplestore should be
expressed in one of the objects".



> So:
> 
> Fundamentally two "kinds" of APIs:
> 
> 1) an API much as the current SOAP API, with a Fedora-object-centric view of
> the world, for manipulating objects, datastreams, disseminators etc
> 
> 2) a "semweb" API, with an RDF graph expression(s) of the Fedora repository,
> where resource URIs in the graph (objects, datastreams, disseminators etc)
> are resolvable, and are REST endpoints both for disseminating the contents
> of the repository (bitstreams, resource metadata, RDF graphs describing
> resources etc), and making changes to the repository, using REST semantics.
> So you could navigate the resource index to discover resources, then use the
> resource identifiers as REST endpoints.
> 
> So essentially the "semweb" API would represent a coming-together of the
> REST API and the resource index.  I think Asger's current proposal for an
> alternative REST API would fit in very well with this in terms of exposing
> the kind of REST endpoints that would be needed - and would provide the
> resolvable resource URIs for the RDF representation(s).
> 
> The Resource Index and graphs (models)
> ======================================
> Currently the Fedora Resource Index is a single graph, <#ri> (or
> <rmi://someserver/fedora#ri>).
> 
> Mulgara supports creation of multiple models (or graphs) and querying across
> multiple graphs.  (Fedora does make use of additional graphs, a datatyping
> graph, and a full text model if full text indexing is enabled).
> 
> Mulgara also supports creation of "View" models which do not hold triples,
> but are a view over multiple models, for instance the union of several
> graphs: http://docs.mulgara.org/itqloperations/views.html
> 
> It should therefore be possible to express a Fedora repository as a set of
> individual graphs whilst still presenting an overall single graph view of
> the repository; with sub-graphs being individually identifiable.
> 
> Essentially some kind of hierarchy of graphs and views, for example (please
> ignore the actual model/graph identifiers used below, I've not thought those
> through... this is just for conceptual illustration!).  (and note that these
> are not Fedora resource identifiers - they are identifiers for graphs and
> sub-graphs describing Fedora resources, with triples containing URIs that
> resolve to Fedora resources.)
> 
> <#ri> - a view containing:
>   <#some:pid> - object graph for some:pid, a view containing:
>     <#some:pid/properties> - graph containing object properties
>     <#some:pid/datastreams> - a view containing:
>       <#some:pid/datastreams/rels-ext> - graph containing triples from rels-ext
>       <#some:pid/datastreams/rels-int> - graph containing triples from rels-int
>       <#some:pid/datastreams/dc> - graph containing triples from DC
>       <#some:pid/datastreams/{rdf datastream}> - graph containing triples from some other rdf datastream
>       <#some:pid/datastreams/{dsid}/properties> - graph containing properties of datastream {dsid} (state, last modified, etc)
>   <#some:otherpid> - object graph for some:otherpid, a view containing:
>     <#some:otherpid/properties> - etc
>     <#some:otherpid/datastreams> - etc
>       ...
> 
> There's undoubtedly stuff I haven't thought about that should be included
> above (notably disseminators).  And there's probably a better design of this
> hierarchy.  But as a principle...
> 
> The top-level <#ri> graph would still look like it does today.
> 
> This top level view could be (disseminated from) a "special" Fedora object
> representing the repository itself (an idea I know has been floating
> around).
> 
> This could get around the situation where if one allowed arbitrary RDF
> datastreams to be propagated to the resource index, and two datastreams
> assert the same triple, deletion of one of the datastreams results in
> deletion of the triple in the resource index although the triple is still
> being asserted by the second datastream.
> 
> In the above example, if a triple was asserted by two different datastreams
> then the triple would be present in two different graphs (one graph for each
> datastream).  The top level <#ri> view would show a single triple, however
> deletion of the triple from one rdf datastream would result in it being
> removed from one graph whilst still leaving it in the graph for the other
> datastream, and therefore it would still be asserted in the resource index.

And you have thus beautifully solved an old Fedora problem!
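
To make the mechanism concrete, here is a minimal in-memory sketch in
Python. It is not Mulgara code: the GraphStore class, its method names and
the example predicate are all made up for illustration, and the graph
identifiers are just the conceptual ones from the hierarchy above. It only
shows why a union view keeps a triple alive as long as at least one
datastream graph still asserts it.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


class GraphStore:
    """Toy stand-in for a triplestore with named graphs and union views."""

    def __init__(self) -> None:
        self.graphs: Dict[str, Set[Triple]] = {}   # graphs that hold triples
        self.views: Dict[str, List[str]] = {}      # views that only list members

    def add(self, graph: str, triple: Triple) -> None:
        self.graphs.setdefault(graph, set()).add(triple)

    def remove(self, graph: str, triple: Triple) -> None:
        self.graphs.get(graph, set()).discard(triple)

    def define_view(self, name: str, members: List[str]) -> None:
        self.views[name] = list(members)

    def resolve(self, name: str) -> Set[Triple]:
        """Return the triples visible through a graph or (nested) view."""
        if name in self.views:
            result: Set[Triple] = set()
            for member in self.views[name]:
                result |= self.resolve(member)
            return result
        return set(self.graphs.get(name, set()))


store = GraphStore()
# Hypothetical triple asserted by two different datastreams of some:pid.
t = ("info:fedora/some:pid", "info:example/rel#relatedTo", "info:fedora/some:otherpid")
store.add("#some:pid/datastreams/rels-ext", t)
store.add("#some:pid/datastreams/custom-rdf", t)

store.define_view("#some:pid/datastreams",
                  ["#some:pid/datastreams/rels-ext",
                   "#some:pid/datastreams/custom-rdf"])
store.define_view("#some:pid", ["#some:pid/properties", "#some:pid/datastreams"])
store.define_view("#ri", ["#some:pid"])

# Deleting the triple from one datastream graph leaves it visible in <#ri>.
store.remove("#some:pid/datastreams/custom-rdf", t)
still_asserted = t in store.resolve("#ri")

# Only when the last asserting graph drops it does it leave the top-level view.
store.remove("#some:pid/datastreams/rels-ext", t)
gone = t not in store.resolve("#ri")
```

The set union plays the role of the view here, so no explicit reference
counter is needed; the count is implicit in how many graphs hold the triple.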

> 
> Resolvable RI URIs - being more Semantic Web- and Web 2.0-friendly
> ==================================================================
> The resource index uses the "fedora" namespace in the info uri scheme to
> identify objects, datastreams, disseminators etc, eg <info:fedora/some:pid>.
> 
> It could also be useful to expose resolvable URIs in the resource
> index, as an alternative view.  For instance, something akin to a
> URL-rewriting mechanism could be used to transform <info:fedora/some:pid>
> into http://server:port/fedora/objects/some:pid (using the proposed
> alternative REST API syntax).
> 
> On the way in, queries (updates, etc) would have resolvable http identifiers
> translated back to the info:fedora scheme.  (So RELS-EXT, RELS-INT etc would
> continue to use the info:fedora scheme.)
> 
> Essentially this would be an "external" view of the resource index
> containing resolvable URIs for Fedora resources that are also REST
> endpoints.
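
A rough Python sketch of such a two-way rewrite, for object URIs only. The
function names are invented for illustration, and the base URL is simply the
placeholder used above; datastream and disseminator URIs would need further
rules that depend on the final REST API syntax, which is not settled here.

```python
from typing import Tuple

INFO_PREFIX = "info:fedora/"
# Placeholder base taken from the example above, not a real endpoint.
HTTP_BASE = "http://server:port/fedora/objects/"


def to_resolvable(uri: str, base: str = HTTP_BASE) -> str:
    """Outbound: rewrite an info:fedora URI to its resolvable HTTP form."""
    if uri.startswith(INFO_PREFIX):
        return base + uri[len(INFO_PREFIX):]
    return uri  # URIs outside the info:fedora namespace pass through untouched


def to_info(uri: str, base: str = HTTP_BASE) -> str:
    """Inbound: rewrite a resolvable HTTP URI back to the info:fedora scheme."""
    if uri.startswith(base):
        return INFO_PREFIX + uri[len(base):]
    return uri


def rewrite_triple(triple: Tuple[str, str, str], rewrite) -> Tuple[str, str, str]:
    """Apply a rewrite function to subject and object, leaving the predicate alone.

    Simplification: objects are treated as URIs; a real implementation would
    have to skip literals.
    """
    s, p, o = triple
    return (rewrite(s), p, rewrite(o))


external = to_resolvable("info:fedora/some:pid")
internal = to_info(external)
```

Because the rewrite is a pure prefix swap it is trivially reversible, which
is what lets RELS-EXT and RELS-INT keep using info:fedora internally.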
> 
> It should also be possible to disseminate sub-graphs with resolvable URIs as
> (for example) OAI-ORE resource maps.
> 
> Mapping between Fedora objects and the resource index
> =====================================================
> Currently the specification of what triples get created for Fedora objects,
> datastreams and properties is embodied in imperative Java code.
> 
> It could be possible to move this to a declarative specification, perhaps as
> part of the CMA.
> 
> For instance the base content model that every object belongs to could
> specify:
> - an XSLT for generating the "system" triples for Fedora object and
> datastream properties, relationships between objects, datastreams and
> disseminators; and which graph the triples should be added to
> - an XSLT for generating triples from RELS-EXT; and which graph the triples
> should be added to
> - an XSLT for generating triples from RELS-INT; and which graph the triples
> should be added to
> 
> "User" content models could for instance specify that XML metadata
> datastream xyz should be converted using an XSLT into RDF, and the content
> model would also indicate what graph the triples should be created in.
> 
> (XSLT is just used as an example, there may be better/alternative
> approaches, such as GRDDL, and a combination of methods may be best)
I was actually thinking that this could be expressed as disseminators.
Then the content model would only have to express which disseminator to
call.
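
Whether the conversion is an XSLT or a disseminator, the declarative part
boils down to a table of "this datastream, this converter, this target
graph". A small Python sketch of that idea follows. The MappingRule type,
the rule table and the converter are hypothetical, a plain function stands
in for the XSLT or disseminator call, and the parser handles only the flat
rdf:Description form, not RDF/XML in general.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, NamedTuple, Tuple

Triple = Tuple[str, str, str]
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


class MappingRule(NamedTuple):
    convert: Callable[[str], List[Triple]]  # stand-in for an XSLT or disseminator
    graph_template: str                     # which graph the triples go to


def expand(tag: str) -> str:
    """Turn ElementTree's '{namespace}local' tag into a plain URI string."""
    if tag.startswith("{"):
        ns, _, local = tag[1:].partition("}")
        return ns + local
    return tag


def rels_to_triples(xml_text: str) -> List[Triple]:
    """Extract triples from flat rdf:Description elements (simplified)."""
    root = ET.fromstring(xml_text)
    triples: List[Triple] = []
    for desc in root.findall("{%s}Description" % RDF_NS):
        subject = desc.get("{%s}about" % RDF_NS, "")
        for prop in desc:
            obj = prop.get("{%s}resource" % RDF_NS)
            if obj is None:
                obj = prop.text or ""  # literal value; kept as a bare string here
            triples.append((subject, expand(prop.tag), obj))
    return triples


# Hypothetical rules a base content model might declare.
BASE_RULES: Dict[str, MappingRule] = {
    "RELS-EXT": MappingRule(rels_to_triples, "#{pid}/datastreams/rels-ext"),
    "RELS-INT": MappingRule(rels_to_triples, "#{pid}/datastreams/rels-int"),
}


def index_datastream(rules: Dict[str, MappingRule], pid: str, dsid: str,
                     content: str) -> Tuple[str, List[Triple]]:
    """Look up the rule for a datastream and return (target graph, triples)."""
    rule = rules[dsid]
    return rule.graph_template.format(pid=pid), rule.convert(content)


SAMPLE_RELS_EXT = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rel="info:fedora/fedora-system:def/relations-external#">
  <rdf:Description rdf:about="info:fedora/some:pid">
    <rel:isMemberOf rdf:resource="info:fedora/some:collection"/>
  </rdf:Description>
</rdf:RDF>"""

graph, triples = index_datastream(BASE_RULES, "some:pid", "RELS-EXT", SAMPLE_RELS_EXT)
```

A "user" content model would then just contribute more entries to the table,
without any change to the indexing code itself.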


> 
> Validation criteria (rdf schema, ontology, xml schema etc) could also be
> defined in a similar manner.
> 
> Unified relationships API
> =========================
> Having declarative specifications of the relationship between graphs in the
> resource index and the Fedora object model would help in implementing a
> unified relationships API - ie a method of specifying modifications to
> triples at the repository level, with the API resolving this to what it
> represents in terms of Fedora objects/datastreams and performing the
> necessary modifications on these.
> 
> Persistence is fundamental - all relationships should be stored in the
> filesystem - adding triples to Mulgara without persisting them in the Fedora
> object model should not be allowed.
And thus the triple store IS only a cache ;) But this is required.


> 
> This needs thinking about more, for instance if an arbitrary triple is to be
> added, what object should it be stored in (that is a triple that does not
> make an assertion about a Fedora object or datastream for example)?  Should
> it be possible to add a triple(s) that assert a new datastream or Fedora
> object?  (ie having a completely RDF-centric API).
I feel a useful distinction could be made between statements that create
new graphs and statements that add triples to an existing graph. Graphs
should only be created through the "traditional" API, as creating them
means creating new objects and datastreams.

As for modifying the content of, say, the DC datastream through the triple
store: for that to work we need a way to map RDF statements back into
Dublin Core XML. This could be done by having two XSLTs, one for each
direction, and marking those graphs that cannot be mapped back as
write-protected.
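
A small Python sketch of the two-direction idea, with plain functions
standing in for the two XSLTs. All function names and the registry are
invented for illustration, only flat dc:* elements with text values are
handled, and the subject of each triple is assumed to be the object itself.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"


def dc_to_triples(pid: str, xml_text: str) -> List[Triple]:
    """Forward mapping: one triple per dc:* child element."""
    root = ET.fromstring(xml_text)
    subject = "info:fedora/" + pid
    prefix = "{" + DC_NS + "}"
    return [(subject, DC_NS + el.tag[len(prefix):], el.text or "")
            for el in root if el.tag.startswith(prefix)]


def triples_to_dc(triples: List[Triple]) -> str:
    """Reverse mapping: rebuild an oai_dc document from DC triples."""
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("{%s}dc" % OAI_DC_NS)
    for _subject, predicate, value in triples:
        if not predicate.startswith(DC_NS):
            # Anything outside the DC namespace cannot be stored in this datastream.
            raise ValueError("not a Dublin Core predicate: " + predicate)
        child = ET.SubElement(root, "{%s}%s" % (DC_NS, predicate[len(DC_NS):]))
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical registry: a graph kind is writable only if a reverse mapping exists.
REVERSE_MAPPINGS: Dict[str, Callable[[List[Triple]], str]] = {"dc": triples_to_dc}


def is_write_protected(graph_kind: str) -> bool:
    return graph_kind not in REVERSE_MAPPINGS


original = [
    ("info:fedora/some:pid", DC_NS + "title", "A title"),
    ("info:fedora/some:pid", DC_NS + "creator", "Someone"),
]
dc_xml = triples_to_dc(original)
round_tripped = dc_to_triples("some:pid", dc_xml)
```

The round trip is the property to check for any pair of mappings: a graph
whose triples do not survive it would be a candidate for write protection.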

Regards  


> 
> 
> 
> Regards
> Steve
> 
> 
> ------------------------------------------------------------------------------
> Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
> trial. Simplify your report design, integration and deployment - and focus on
> what you do best, core application coding. Discover what's new with
> Crystal Reports now.  http://p.sf.net/sfu/bobj-july
> _______________________________________________
> Fedora-commons-developers mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers

