On Tue, Nov 13, 2012 at 1:13 PM, Rupert Westenthaler <
[email protected]> wrote:

> Hi all,
>
> I would like to share some thoughts/comments and suggestions from my side:
>
> ResourceFactory: Clerezza is missing a Factory for RDF resources. I
> would like to have such a Factory. The Factory should be obtainable
> via the Graph - the Collection of Triples. IMO such a Factory is
> required if all resource types (IRI, Bnode, Literal) are represented
> by interfaces.
>
Such a factory should not be tied to Graph, as any Resource object should
be usable with any Graph.
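
To illustrate (purely a sketch, not existing Clerezza API): a
graph-independent factory could look like this, with Iri, BlankNode and
Literal standing in for the resource interfaces discussed in this thread:

// Hypothetical sketch, not actual Clerezza API: a resource factory that
// is independent of any Graph, so the resources it creates can be used
// with any Graph implementation. Iri, BlankNode and Literal stand for
// the resource interfaces under discussion.
public interface ResourceFactory {

    interface Iri { String getUnicodeString(); }

    interface BlankNode { /* intentionally opaque: no id exposed */ }

    interface Literal { String getLexicalForm(); Iri getDataType(); }

    Iri createIri(String iriString);

    BlankNode createBlankNode();

    Literal createLiteral(String lexicalForm, Iri dataType);
}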


> BNodes: If Bnode is an interface than any implementation is free to
> internally use a "bnode-id". One argument pro such ids (that was not
> yet mentioned) is that such id's allow you to avoid in-memory mappings
> for bnodes when wrapping an native implementation. In Clerezza you
> currently need to have this Bidi maps.
>

For the wrapped (native) nodes no mapping is needed; a map is needed only
for the bnodes originating from another source, and only as long as these
objects are referenced from elsewhere (so the map has to contain only weak
references). This is typically a small number of bnodes, and an id wouldn't
help unless you assume id-based cross-graph identity of bnodes (or you tie
the bnodes to a graph, but if you do that you need no id either).
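
A minimal sketch of what I mean, with hypothetical types (BlankNode is the
opaque node handed to API users, the long is whatever id the native store
uses internally): the map only grows for foreign bnodes, and entries
disappear once the caller drops its references.

import java.util.Map;
import java.util.WeakHashMap;

// Sketch only: a wrapper around a native store has to remember a foreign
// bnode only while the caller still references it, so weak keys suffice
// and no public bnode id is ever needed.
public class BlankNodeMapper {

    /** Opaque blank node as seen by API users (hypothetical type). */
    public interface BlankNode { }

    // Keys are held weakly: once the caller drops its reference to a
    // foreign BlankNode, the entry becomes garbage-collectable.
    private final Map<BlankNode, Long> foreignToNative = new WeakHashMap<>();

    private long nextNativeId = 0;

    /** Returns the native id for a bnode, allocating one on first use. */
    public synchronized long toNativeId(BlankNode node) {
        return foreignToNative.computeIfAbsent(node, n -> nextNativeId++);
    }
}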


>
> Triple, Quads: While for some use cases the Triple-in-Graph based API
> (Quad := Triple t =
> TripleStore#getGraph(context).filter(subject, predicate, object)) is
> sufficient, this is no longer the case as soon as applications want to
> work with a Graph that contains Quads across several contexts. So I
> would vote for having support for Quads.


> Dataset, Graph: From a user perspective, Dataset (how the TripleStore
> looks at the Triples) and Graph (how RDF looks at the Triples) are not
> so different. Because of that I would like to have a single domain
> object fitting both. The API should focus on the Graph aspects (as
> Clerezza does) while still allowing efficient implementations that do
> not load all triples into memory (e.g. use closeable iterators).
>

I suggest you propose use cases for which implementations with the
different APIs can be proposed and the pros and cons evaluated. I think a
quad view can easily be implemented on top of a Dataset, and of course the
implementation is free to use quads internally.
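
As an illustration of that layering (stand-in types, not a proposal for
concrete signatures): a quad is just a triple tagged with the name of the
graph it lives in, so a quad view needs nothing from the Dataset beyond
access to its named graphs.

import java.util.Map;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical sketch of a quad view layered on top of a graph-centric
// Dataset. Dataset, Graph and Triple are stand-ins for the domain objects
// discussed in this thread, not actual Clerezza types.
public final class QuadView {

    public interface Triple { }
    public interface Graph extends Iterable<Triple> { }
    public interface Dataset { Map<String, Graph> namedGraphs(); }

    /** A quad is simply a triple tagged with the name of its graph. */
    public static final class Quad {
        public final String context;
        public final Triple triple;
        Quad(String context, Triple triple) {
            this.context = context;
            this.triple = triple;
        }
    }

    /** Exposes all triples of all named graphs as a stream of quads. */
    public static Stream<Quad> quads(Dataset dataset) {
        return dataset.namedGraphs().entrySet().stream()
            .flatMap(e -> StreamSupport.stream(e.getValue().spliterator(), false)
                .map(t -> new Quad(e.getKey(), t)));
    }

    private QuadView() { }
}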


>
> Immutable Graphs: I had real problems getting this right, and the
> current Clerezza API does not help with that task (resulting in things
> like read-only mutable graphs that are no Graphs, as they only provide
> a read-only view on a Graph that might still be changed by other
> means). I think read-only Graphs (like
> Collections.unmodifiableCollection(..)) should be sufficient. IMHO the
> use case of protecting a returned graph from modifications by the
> caller of the method is much more prominent than truly immutable graphs.
>

It's about the different identity criterion. The identity of a graph is
clearly defined in the RDF specs but applies only to graphs that do not
change over time (respectively, to time slices of graphs that do). As I
already wrote, a motivating use case here is to have an easy way to do
synchronization and diffs over decomposed graphs (i.e. the individual
immutable graphs are MSGs as they are used in RDFSync). Of course this is
no hindrance and is orthogonal to having a
TripleCollection.getImmutableTripleCollection(...).
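
To make the distinction concrete (sketch with stand-in types): a read-only
wrapper in the style of Collections.unmodifiableCollection still reflects
later changes to the wrapped graph, while an immutable snapshot has a
stable extension and could therefore carry an equals()/hashCode() based on
graph identity. Note that real RDF graph identity is isomorphism, which
the set-based equality below only approximates.

import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Sketch contrasting the two notions: a read-only *view* (rejects writes
// but changes when the underlying graph does) versus an immutable
// *snapshot* (stable, so a graph-identity based equals/hashCode is
// meaningful, e.g. for diffs and synchronization). Triple is a stand-in.
public final class GraphViews {

    public interface Triple { }

    /** Read-only view: writes throw, but later changes shine through. */
    public static Collection<Triple> readOnlyView(Collection<Triple> graph) {
        return Collections.unmodifiableCollection(graph);
    }

    /** Immutable snapshot: copies the triples once. A real implementation
     *  would define equals()/hashCode() by graph isomorphism (bnode labels
     *  must not matter), not by plain set equality as HashSet does. */
    public static Set<Triple> immutableSnapshot(Collection<Triple> graph) {
        return Collections.unmodifiableSet(new HashSet<Triple>(graph));
    }

    private GraphViews() { }
}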


>
> SPARQL: I would not deal with parsing SPARQL queries but rather
> forward them as-is to the underlying implementation. If doing so, the
> API would only need to bother with result sets. This would also avoid
> the need to deal with "Datasets". This is not arguing against a
> fallback (e.g. the trick Clerezza does by using the Jena SPARQL
> implementation), but in practice efficient SPARQL execution can only
> happen natively within the TripleStore. Trying to do otherwise will
> only trick users into use cases that will not scale.
>
+1 for the SPARQL fastlane; some parsing is still needed to see whether
the query is against graphs that all come from one and the same backend,
and which one this is.
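
Roughly what I have in mind (sketch only; the BackendRegistry is
hypothetical, the parser calls are Jena ARQ, which Clerezza already uses
for its fallback; this only inspects FROM / FROM NAMED, so GRAPH clauses
with variables would need more care):

import java.util.ArrayList;
import java.util.List;

import com.hp.hpl.jena.query.Query;
import com.hp.hpl.jena.query.QueryFactory;

// Sketch of the "fastlane" routing step: parse the query only to find
// the graphs it references, check that they all live in one backend and,
// if so, forward the query string to that backend unchanged.
public final class SparqlRouter {

    /** Hypothetical lookup: which backend hosts a given graph IRI. */
    public interface BackendRegistry {
        String backendFor(String graphIri);
    }

    /**
     * Returns the single backend all referenced graphs live in, or null
     * if they span backends (then a generic fallback engine is needed).
     */
    public static String singleBackend(String queryString,
            BackendRegistry registry) {
        Query query = QueryFactory.create(queryString);
        List<String> graphs = new ArrayList<String>(query.getGraphURIs());
        graphs.addAll(query.getNamedGraphURIs());   // FROM NAMED clauses
        String backend = null;
        for (String graphIri : graphs) {
            String candidate = registry.backendFor(graphIri);
            if (backend == null) {
                backend = candidate;
            } else if (!backend.equals(candidate)) {
                return null;  // graphs from different backends
            }
        }
        return backend;
    }

    private SparqlRouter() { }
}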

Cheers,
Reto

>
> best
> Rupert
>
> On Tue, Nov 13, 2012 at 9:08 AM, Reto Bachmann-Gmür <[email protected]>
> wrote:
> > On Mon, Nov 12, 2012 at 10:40 PM, Andy Seaborne <[email protected]> wrote:
> >
> >> On 12/11/12 19:42, Reto Bachmann-Gmür wrote:
> >>
> >>> On Mon, Nov 12, 2012 at 5:46 PM, Andy Seaborne <[email protected]> wrote:
> >>>
> >>>  On 09/11/12 09:56, Rupert Westenthaler wrote:
> >>>>
> >>>>  RDF libs:
> >>>>> ====
> >>>>>
> >>>>> From the viewpoint of Apache Stanbol, one needs to ask the question
> >>>>> whether it makes sense to maintain an own RDF API. I expect the
> >>>>> Semantic Web standards to evolve quite a bit in the coming years,
> >>>>> and I do have concerns about whether the Clerezza RDF modules will
> >>>>> be updated/extended to provide implementations of those. One example
> >>>>> of such a situation is SPARQL 1.1, which has been around for quite
> >>>>> some time and is still not supported by Clerezza. While I do like
> >>>>> the small API, the flexibility to use different TripleStores and
> >>>>> that Clerezza comes with OSGI support, I think given the current
> >>>>> situation we would need to discuss all options, and those also
> >>>>> include a switch to Apache Jena or Sesame. Especially Sesame would
> >>>>> be an attractive option as their RDF Graph API [1] is very similar
> >>>>> to what Clerezza uses. Apache Jena's counterparts (Model [2] and
> >>>>> Graph [3]) are considerably different and more complex interfaces.
> >>>>> In addition, Jena will only change to org.apache packages with the
> >>>>> next major release, so a switch before that release would mean two
> >>>>> incompatible API changes.
> >>>>>
> >>>>>
> >>>> Jena isn't changing the packaging as such -- what we've discussed is
> >>>> providing a package for the current API and then a new, org.apache
> >>>> API. The new API may be much the same as the existing one or it may
> >>>> be different - that depends on contributions made!
> >>>>
> >>>>
> >>> I didn't know about Jena planning to introduce such a common API.
> >>>
> >>>
> >>>> I'd like to hear more about your experiences, esp. with the Graph
> >>>> API, as that is supposed to be quite simple - it's targeted at
> >>>> storage extensions as well as supporting the richer Model API.
> >>>> Personally, aside from the fact that Clerezza enforces slot
> >>>> constraints (no literals as subjects), the Jena Graph API and the
> >>>> Clerezza RDF core API seem reasonably aligned.
> >>>>
> >>>>
> >>> Yes, the slot constraints come from the RDF abstract syntax. In my
> >>> opinion it's something one could decide to relax: by adding
> >>> appropriate owl:sameAs bnodes, any graph could be transformed into an
> >>> rdf-abstract-syntax-compliant one. So maybe have a
> >>> GenericTripleCollection that can be converted to an
> >>> RDFTripleCollection - not sure. Just sticking to the spec and waiting
> >>> till this is allowed by the abstract syntax might be the easiest.
> >>>
> >>
> >> At the core, unconstrained slots have worked best for us.
> >>
> >
> > The question is whether this shall be part of a common API. For
> > machinery doing inference and dealing with the meaning of RDF graphs,
> > resources should also be associated with a set of IRIs (that serialize
> > into owl:sameAs).
> >
> >
> >>
> >> Then either:
> >>
> >> 1/ have a test like:
> >>   Triple.isValidRDF
> >>
> >> 2/ Layer an app API to impose the constraints (but it's easy to run
> >> out of good names).
> >>
> >
> > The Clerezza API would be such a layer.
> >
> >
> >>
> >>
> >> The Graph/Node/Triple level in Jena is an API, but its primary role is
> >> the other side, to storage and inference, not apps.
> >>
> >> Generality gives
> >> A/ Future proofing (not perfect)
> >> B/ Arises in inference and query naturally.
> >> C/ Using RDF structures for processing RDF
> >>
> >> Nodes in triples can be variables, and I would have found it useful to
> >> have marker nodes to be able to build structures, e.g. "known to be
> >> bound at this point in a query". As it was, I ended up creating
> >> parallel structures.
> >>
> >>
> >>> Where I see advantages of the Clerezza API:
> >>> - Based on the collections framework, so standard tools can be used
> >>> for graphs
> >>>
> >>
> >> Given a core system API, Scala and Clojure and even different Java
> >> APIs for different styles are all possible.
> >>
> >
> > Right. That's why I propose having a minimal API and decorators to
> > provide Scala interfacing or the resource API for Java (which
> > corresponds more or less to the W3C RDF API draft).
> >
> >
> >>
> >> A universal API across systems is about plugging in machinery
> >> (parsers, query engines, storage, inference). It's good to separate
> >> that from application APIs, otherwise there is a design tension.
> >
> > I'm wondering if there need to be special hooks for inference or if
> > this cannot just as well be done by simply wrapping the graphs.
> >
> >
> >>
> >>
> >>> - Immutable graphs follow the identity criterion of RDF semantics;
> >>> this allows graph components to be added to sets and diff and patch
> >>> algorithms to be implemented more straightforwardly
> >>> - BNodes have no ids: apart from promoting the usage of URIs where
> >>> this is appropriate, it allows behind-the-scenes leanification and
> >>> saves memory where the backend doesn't have such ids.
> >>>
> >>
> >> We have argued about this before.
> >>
> >> + As you have objects, there is a concept of identity (you can tell two
> >> bNodes apart).
> >>
> > No, two bnodes might be indistinguishable, as in
> >
> > a :knows b .
> > b :knows a .
> >
> > You cannot tell them apart even though neither of them can be
> > leanified away.
> >
> >
> >> + For persistence, an internal id is necessary to reconstruct
> >> consistently with caches.
> >>
> >
> > Here we are talking about some implementation stuff that imho should
> > be separate from the API discussion. Do you accept my toy-use-case
> > challenge [1]? If we leave the classical dedicated triple store
> > scenario, the id quickly becomes something that makes things harder
> > rather than easier.
> >
> >
> >> + Leaning isn't a core feature of RDF. In fact, IIRC, mention of it
> >> is going to be removed. It's information reduction, not data
> >> reduction.
> >>
> >
> > It simply arises from bnodes being existential variables. If they are
> > redefined to be something else, then I have difficulty seeing what
> > advantages they would still offer over named nodes (maybe in some
> > skolem: URI scheme).
> >
> >
> >> + There will be a skolemization Note from the RDF-WG to deal with the
> >> practical matters of dealing with bNodes.
> >>
> >> RDF as a data model for linked data:
> >>
> >> It's a data structure with good properties for combining. And it has
> >> links.
> >>
> >>
> >>
> >>>
> >>>
> >>>
> >>>> (for generalised systems such as rules engines - and for SPARQL -
> >>>> triples can arise with extras like literals as subjects; they get
> >>>> removed later)
> >>>>
> >>>
> >>>
> >>> If this shall be an API for interoperability based on the RDF
> >>> standard, I wonder whether it shall be possible to expose such
> >>> intermediate constructs.
> >>>
> >>
> >> My suggestion is that the API for interoperability is designed to
> >> support RDF standards.
> >>
> >> The key elements are IRIs, literals, Triples, Quads, Graphs, Datasets.
> >>
> >
> > Datasets are an element of the relevant SPARQL spec; I don't see Quads.
> >
> >
> >>
> >> But also storage, SPARQL (Query and Update), and web access (e.g.
> >> conneg).
> >>
> >
> > Clerezza is very strong on conneg, but I don't think this would be
> > part of the RDF core API, but rather of the parts that could be part
> > of Stanbol and provide a Linked Data Platform Container (LDPC).
> >
> >
> > Reto
> >
> > 1.
> >
> http://mail-archives.apache.org/mod_mbox/stanbol-dev/201211.mbox/%3CCALvhUEUfOd-mLBh-%3DXkwbLAJHBcboE963hDxv6g0jHNPj6cxPQ%40mail.gmail.com%3E
>
>
>
> --
> | Rupert Westenthaler             [email protected]
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>
