On Mon, Nov 12, 2012 at 10:40 PM, Andy Seaborne <[email protected]> wrote:
> On 12/11/12 19:42, Reto Bachmann-Gmür wrote:
>> On Mon, Nov 12, 2012 at 5:46 PM, Andy Seaborne <[email protected]> wrote:
>>> On 09/11/12 09:56, Rupert Westenthaler wrote:
>>>> RDF libs:
>>>> ====
>>>>
>>>> From the viewpoint of Apache Stanbol one needs to ask the question
>>>> whether it makes sense to maintain our own RDF API. I expect the
>>>> Semantic Web standards to evolve quite a bit in the coming years, and I
>>>> have concerns about whether the Clerezza RDF modules will be
>>>> updated/extended to provide implementations of those. One example of
>>>> such a situation is SPARQL 1.1, which has been around for quite some
>>>> time and is still not supported by Clerezza. While I do like the small
>>>> API, the flexibility to use different triple stores, and that Clerezza
>>>> comes with OSGi support, I think given the current situation we would
>>>> need to discuss all options, and those also include a switch to Apache
>>>> Jena or Sesame. Especially Sesame would be an attractive option, as its
>>>> RDF Graph API [1] is very similar to what Clerezza uses. Apache Jena's
>>>> counterparts (Model [2] and Graph [3]) are considerably different and
>>>> more complex interfaces. In addition, Jena will only change to
>>>> org.apache packages with the next major release, so a switch before
>>>> that release would mean two incompatible API changes.
>>>
>>> Jena isn't changing the packaging as such -- what we've discussed is
>>> providing a package for the current API and then a new, org.apache API.
>>> The new API may be much the same as the existing one or it may be
>>> different - that depends on contributions made!
>>
>> I didn't know about Jena planning to introduce such a common API.
>>
>>> I'd like to hear more about your experiences esp. with the Graph API, as
>>> that is supposed to be quite simple - it's targeted at storage
>>> extensions as well as supporting the richer Model API.
>>> Personally, aside from the fact that Clerezza enforces slot constraints
>>> (no literals as subjects), the Jena Graph API and Clerezza RDF core API
>>> seem reasonably aligned.
>>
>> Yes, the slot constraints come from the RDF abstract syntax. In my
>> opinion it's something one could decide to relax: by adding appropriate
>> owl:sameAs bnodes, any graph could be transformed into an
>> rdf-abstract-syntax-compliant one. So maybe have a
>> GenericTripleCollection that can be converted to an RDFTripleCollection -
>> not sure. Just sticking to the spec and waiting till this is allowed by
>> the abstract syntax might be the easiest.
>
> At the core, unconstrained slots have worked best for us.

The question is whether this shall be part of a common API. For machinery
doing inference and dealing with the meaning of RDF graphs, resources
should also be associated with a set of IRIs (that serialize into
owl:sameAs).

> Then either:
>
> 1/ have a test like:
>    Triple.isValidRDF
>
> 2/ Layer an app API to impose the constraints (but it's easy to run out
> of good names).

The Clerezza API would be such a layer.

> The Graph/Node/Triple level in Jena is an API, but its primary role is
> the other side, to storage and inference, not apps.
>
> Generality gives
> A/ Future proofing (not perfect)
> B/ Arises in inference and query naturally.
> C/ Using RDF structures for processing RDF
>
> Nodes in triples can be variables, and I would have found it useful to
> have marker nodes to be able to build structures e.g. "known to be bound
> at this point in a query". As it was, I ended up creating parallel
> structures.
>
>> Where I see advantages of the Clerezza API:
>> - Based on the collections framework, so standard tools can be used for
>>   graphs
>
> Given a core system API, a Scala and a Clojure and even different Java
> APIs for different styles are all possible.

Right.
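To make option 1/ concrete, here is a toy sketch in Python (not Clerezza or Jena code; the term classes and the `is_valid_rdf` helper are hypothetical) of a validity test that distinguishes generalised triples from triples conforming to the RDF abstract syntax slot constraints:

```python
# Toy model of RDF terms and triples (hypothetical names, for illustration
# only): the RDF abstract syntax forbids literal subjects and restricts
# predicates to IRIs, while generalised triples (as used in rule engines
# and SPARQL machinery) allow any term in any slot.
from collections import namedtuple

IRI = namedtuple("IRI", "value")
BNode = namedtuple("BNode", "label")
Literal = namedtuple("Literal", "lexical")
Triple = namedtuple("Triple", "s p o")

def is_valid_rdf(t):
    """True iff the triple conforms to the RDF abstract syntax slots:
    subject is IRI or bnode, predicate is an IRI, object is any term."""
    return (isinstance(t.s, (IRI, BNode))
            and isinstance(t.p, IRI)
            and isinstance(t.o, (IRI, BNode, Literal)))

# A generalised triple with a literal subject fails the test:
g = Triple(Literal("42"), IRI("http://example.org/p"), IRI("http://example.org/o"))
assert not is_valid_rdf(g)

# A conforming triple passes:
ok = Triple(BNode("b1"), IRI("http://example.org/p"), Literal("42"))
assert is_valid_rdf(ok)
```

Option 2/ would instead hide the check behind a constrained application-level type, which is essentially what the Clerezza layer does.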
That's why I propose having a minimum API and decorators to provide the
Scala interfacing or the resource API for Java (which corresponds more or
less to the W3C RDF API draft).

> A universal API across systems is about plugging in machinery (parsers,
> query engines, storage, inference). It's good to separate that from
> application APIs, otherwise there is a design tension.

I'm wondering whether there need to be special hooks for inference or if
this cannot just as well be done by simply wrapping the graphs.

>> - Immutable graphs follow the identity criterion of RDF semantics; this
>>   allows graph components to be added to sets and diff and patch
>>   algorithms to be implemented more straightforwardly.
>> - BNodes have no ids: apart from promoting the usage of URIs where this
>>   is appropriate, it allows behind-the-scenes leanification and saves
>>   memory where the backend doesn't have such ids.
>
> We have argued about this before.
>
> + As you have objects, there is a concept of identity (you can tell two
> bNodes apart).

No, two bnodes might be indistinguishable, as in

  _:a :knows _:b .
  _:b :knows _:a .

You cannot tell them apart even though neither of them can be leanified
away.

> + For persistence, an internal id is necessary to reconstruct
> consistently with caches.

Here we are talking about implementation details that imho should be kept
separate from the API discussion. Do you accept my toy-usecase challenge
[1]? If we leave the classical dedicated triple store usecase scenario, the
id quickly becomes something that makes things harder rather than easier.

> + Leaning isn't a core feature of RDF. In fact, IIRC, mention is going to
> be removed. It's information reduction, not data reduction.

It simply arises from bnodes being existential variables.
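The indistinguishability point can be shown mechanically. A toy sketch (hypothetical code, not from any of the libraries under discussion; the foaf:knows IRI is just a convenient predicate) that relabels the two bnodes in the symmetric graph and finds the result unchanged:

```python
# In the graph _:a :knows _:b . _:b :knows _:a the two bnodes are
# indistinguishable: swapping their labels yields the very same set of
# triples, yet neither triple subsumes the other, so nothing can be
# leaned away.
KNOWS = "http://xmlns.com/foaf/0.1/knows"

graph = frozenset({("_:a", KNOWS, "_:b"),
                   ("_:b", KNOWS, "_:a")})

def relabel(g, mapping):
    """Apply a bnode relabelling to the subject and object of each triple."""
    return frozenset((mapping.get(s, s), p, mapping.get(o, o))
                     for s, p, o in g)

swapped = relabel(graph, {"_:a": "_:b", "_:b": "_:a"})
assert swapped == graph   # the swap is undetectable
assert len(graph) == 2    # and the graph is already lean: both triples remain
```

So object identity in an API gives bnodes a distinguishability that the RDF semantics itself does not grant them.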
If they are redefined to be something else, then I have difficulties seeing
what advantages they would still offer over named nodes (maybe in some
skolem: uri scheme).

> + There will be a skolemization Note from the RDF-WG to deal with the
> practical matters of dealing with bNodes.
>
> RDF as data model for linked data.
>
> It's a data structure with good properties for combining. And it has
> links.
>
>>> (for generalised systems such as rules engines - and for SPARQL -
>>> triples can arise with extras like literals as subjects; they get
>>> removed later)
>>
>> If this shall be an API for interoperability based on the RDF standard,
>> I wonder if it shall be possible to expose such intermediate constructs.
>
> My suggestion is that the API for interoperability is designed to support
> RDF standards.
>
> The key elements are IRIs, literals, Triples, Quads, Graphs, Datasets.

Datasets are an element of the relevant SPARQL spec; I don't see Quads.

> But also storage, SPARQL (Query and Update), and web access (e.g.
> conneg).

Clerezza is very strong on conneg, but I don't think this would be part of
the RDF core API, but rather of the parts that could be part of Stanbol and
provide a Linked Data Platform Container (LDPC).

Reto

1. http://mail-archives.apache.org/mod_mbox/stanbol-dev/201211.mbox/%3CCALvhUEUfOd-mLBh-%3DXkwbLAJHBcboE963hDxv6g0jHNPj6cxPQ%40mail.gmail.com%3E
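P.S. As an aside on the skolemization Note mentioned above, here is a toy sketch (assuming the ".well-known/genid" IRI pattern from the RDF-WG skolemization work; the `skolemize` helper and the example.org authority are hypothetical) of replacing bnodes with fresh, globally unique IRIs so a backend needs no bnode ids at all:

```python
# Replace every bnode (modelled here as a string starting with "_:")
# with a fresh skolem IRI, consistently across the whole graph, so the
# result contains only named nodes.
import uuid

def skolemize(graph, authority="http://example.org"):
    fresh = {}  # bnode label -> minted skolem IRI, one per bnode

    def skolem(term):
        if isinstance(term, str) and term.startswith("_:"):
            if term not in fresh:
                fresh[term] = (authority + "/.well-known/genid/"
                               + uuid.uuid4().hex)
            return fresh[term]
        return term

    return {(skolem(s), p, skolem(o)) for s, p, o in graph}

g = {("_:x", "http://xmlns.com/foaf/0.1/knows", "_:y"),
     ("_:y", "http://xmlns.com/foaf/0.1/knows", "_:x")}
sk = skolemize(g)
assert all(not t[0].startswith("_:") and not t[2].startswith("_:")
           for t in sk)
```

The trade-off is exactly the one debated here: skolem IRIs make nodes persistently identifiable, at the cost of giving up the existential reading (and the leaning opportunities) that bnodes provide.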
