(NB multiple dev@ mailing lists)

On 13/11/12 12:13, Rupert Westenthaler wrote:
Hi all,

I would like to share some thoughts/comments and suggestions from my side:

Thanks - these are interesting to hear.


ResourceFactory: Clerezza is missing a Factory for RDF resources. I
would like to have such a Factory. The Factory should be obtainable
via the Graph - the Collection of Triples. IMO such a Factory is
required if all resource types (IRI, Bnode, Literal) are represented
by interfaces.

Yes - a factory is needed if we go with interfaces.

Whether they should be interfaces or fixed classes is an interesting design point. I can see arguments both ways.

The argument for interfaces is presumably different implementations for different storage layers (e.g. with hidden internal pointers related to the storage). It is also a case of "it's the Java way".
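
For the factory itself, something along these lines would do. All the names here (RdfTermFactory, Iri, BNode, Literal) are illustrative only - a sketch of the shape, not an existing Clerezza API:

    // Hypothetical factory for RDF terms; per Rupert's suggestion it
    // would be obtainable from the Graph.
    interface Iri { }      // assumed term interfaces
    interface BNode { }
    interface Literal { }

    public interface RdfTermFactory {
        Iri createIri(String iriString);
        BNode createBNode();
        Literal createTypedLiteral(String lexicalForm, Iri datatype);
        Literal createLanguageLiteral(String lexicalForm, String languageTag);
    }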

But two RDFTerms (resources) are equal by value - two terms with the same IRI are equal - and equality is tied to putting terms in Java collections (equals/hashCode).

I think the consequence is that a specific subsystem can't assume that RDF terms passed to it must have come from that component. There's no way to enforce that - some RDF terms are created outside any one store.
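
To make that concrete, a hypothetical IRI implementation would have to define equals/hashCode against the interface, by value, so terms from different components mix safely in collections (SimpleIri and getIriString are invented names):

    // Assumed term interface.
    interface Iri { String getIriString(); }

    // equals/hashCode are defined against the interface, so an Iri
    // from any other implementation carrying the same IRI string is
    // equal to this one - required for HashSet/HashMap to behave.
    final class SimpleIri implements Iri {
        private final String iriString;
        SimpleIri(String iriString) { this.iriString = iriString; }
        @Override public String getIriString() { return iriString; }
        @Override public boolean equals(Object o) {
            return o instanceof Iri
                && ((Iri) o).getIriString().equals(iriString);
        }
        @Override public int hashCode() { return iriString.hashCode(); }
    }

The catch is that this equality contract must be specified at the interface level, so that every implementation agrees on it - otherwise equals() is not symmetric across implementations.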

[["RDF term" is the term invented in SPARQL to cover resources/bnodes/literals because there wasn't one in RDF: "resource" is used for "web resource", so it means either the thing being described (not its name), and/or a general concept not specific to RDF.]]


An interesting design point for literals is value vs lexical form/datatype. It is the value that matters (OK - should matter), whether it's written "+1"^^xsd:integer or "01"^^xsd:byte. Does anyone have a use case example where the derived datatype matters semantically?
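
A plain-Java illustration of the value vs lexical form point (nothing RDF-specific here): "+1" and "01" are different lexical forms denoting the same value:

    import java.math.BigInteger;

    public class LiteralValueDemo {
        public static void main(String[] args) {
            // "+1"^^xsd:integer and "01"^^xsd:byte: different lexical
            // forms and datatypes, but the same value.
            BigInteger fromInteger = new BigInteger("+1");
            BigInteger fromByte    = new BigInteger("01");
            System.out.println(fromInteger.equals(fromByte)); // prints: true
        }
    }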

BNodes: If Bnode is an interface then any implementation is free to internally use a "bnode-id". One argument for such ids (that was not yet mentioned) is that they allow you to avoid in-memory mappings for bnodes when wrapping a native implementation. In Clerezza you currently need to have these Bidi maps.
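
A sketch of what that buys you when wrapping a native store - the wrapper carries the store's own id directly, so no bidirectional map from API bnodes to store bnodes is needed (WrappedBNode and the long id are made up for illustration):

    interface BNode { } // assumed marker interface for blank nodes

    // With BNode as an interface, a wrapper can hold the backing
    // store's internal bnode-id - no Bidi map required.
    final class WrappedBNode implements BNode {
        private final long nativeId; // the backing store's internal id
        WrappedBNode(long nativeId) { this.nativeId = nativeId; }
        long nativeId() { return nativeId; }
        @Override public boolean equals(Object o) {
            return o instanceof WrappedBNode
                && ((WrappedBNode) o).nativeId == nativeId;
        }
        @Override public int hashCode() { return Long.hashCode(nativeId); }
    }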

Triple, Quads: While for some use cases the Triple-in-Graph based API (Quad := Triple t = TripleStore#getGraph(context).filter(subject,predicate,object)) is sufficient, this is no longer the case as soon as applications want to work with a Graph that contains Quads with several contexts. So I would vote for having support for Quads.

That is what an RDF dataset is supposed to be, but it's not completely transparent - just working with the default graph is very much like working with one graph.

The full-blown quads-in-graph would be N3-style formulae, where a graph node can itself be a graph. These are also called "graph literals".

At this point, they are not going to happen for RDF, but if building an API or component, I would at least put the hooks in to prepare for a possible future.
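
If we do put the hooks in, one possible shape for the quad support Rupert asks for is simply a term for the context alongside the triple (names hypothetical):

    // RdfTerm is the assumed common supertype of IRI/bnode/literal.
    interface RdfTerm { }

    // A Quad carries the graph name ("context") alongside the triple.
    interface Quad {
        RdfTerm getGraphName(); // e.g. null for the default graph
        RdfTerm getSubject();
        RdfTerm getPredicate();
        RdfTerm getObject();
    }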

Dataset, Graph: From a user perspective, a Dataset (how the TripleStore looks at the triples) and a Graph (how RDF looks at the triples) are not so different. Because of that I would like to have a single domain object fitting both. The API should focus on the Graph aspects (as Clerezza does) while still allowing efficient implementations that do not load all triples into memory (e.g. by using closeable iterators).
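
A minimal sketch of such a graph-centric API that still permits streaming, store-backed implementations (all names are assumptions for illustration):

    import java.util.Iterator;

    interface RdfTerm { } // assumed supertype of all terms
    interface Triple {
        RdfTerm getSubject();
        RdfTerm getPredicate();
        RdfTerm getObject();
    }

    // filter() streams matches: a store-backed Graph can walk a disk
    // index instead of materialising every triple in memory.
    interface Graph {
        // null arguments act as wildcards.
        ClosableIterator<Triple> filter(RdfTerm s, RdfTerm p, RdfTerm o);
        void add(Triple triple);
    }

    // An Iterator that must be closed to release store resources
    // (cursors, transactions, etc.).
    interface ClosableIterator<T> extends Iterator<T>, AutoCloseable {
        @Override void close(); // narrowed: no checked exception
    }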

Immutable Graphs: I had real problems getting this right, and the current Clerezza API does not help with that task (resulting in things like read-only mutable graphs that are no Graphs, as they only provide a read-only view on a Graph that might still be changed by other means). I think read-only Graphs (like Collections.unmodifiableCollection(..)) should be sufficient. IMHO the use case of protecting a returned graph from modifications by the caller of the method is much more prominent than truly immutable graphs.
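
In the Collections.unmodifiableCollection(..) style, that is just a forwarding wrapper that rejects writes - a read-only view, not an immutability guarantee. A sketch, reusing the hypothetical Graph/Triple interfaces from above:

    // Read-only *view*: writes through the view fail, but the wrapped
    // graph can still be changed by whoever holds the original reference.
    final class UnmodifiableGraph {
        static Graph unmodifiableGraph(final Graph wrapped) {
            return new Graph() {
                @Override public ClosableIterator<Triple> filter(
                        RdfTerm s, RdfTerm p, RdfTerm o) {
                    return wrapped.filter(s, p, o); // reads pass through
                }
                @Override public void add(Triple triple) {
                    throw new UnsupportedOperationException("read-only view");
                }
            };
        }
    }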

SPARQL: I would not deal with parsing SPARQL queries but rather forward them as-is to the underlying implementation. If doing so, the API would only need to bother with result sets. This would also avoid the need to deal with "Datasets". This is not arguing against a fallback (e.g. the trick Clerezza does by using the Jena SPARQL implementation), but in practice efficient SPARQL execution can only happen natively within the TripleStore. Trying to do otherwise will only trick users into use cases that will not scale.

Agreed - and memory is a precious resource at scale. It's usually better to give it to the data storage to avoid I/O. Too much overhead from higher-level APIs keeping state competes with the I/O caching.
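
The pass-through shape Rupert describes is then very small: hand the query text to the store untouched and standardise only the results. SparqlService and select() are invented names for illustration:

    import java.util.Iterator;
    import java.util.Map;

    // The API never parses the query: the string goes to the native
    // engine as-is; only the result set is represented in the API.
    interface SparqlService {
        // Each row maps a variable name to the term bound to it
        // (RdfTerm as sketched earlier).
        Iterator<Map<String, RdfTerm>> select(String sparqlQuery);
    }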

        Andy


best
Rupert
