Re: [RDF] Local Scope and BlankNode internalIdentifier (was: github Commons RDF vs. Apache Commons Sandbox RDF)

Reto Gmür Sat, 31 Jan 2015 10:21:45 -0800

Hi Stian,


> By keeping the "internalIdentifier" property, an application is able
> to talk about an existing blankNode without having to keep track of
> earlier BlankNode instances (e.g. not needing their own
> Map<internalIdentifier,BlankNode>).
>
By application I assume you mean an implementation of the API. Even without
exposing an identifier, an application can keep track of its BlankNodes;
after all, they may all be instances of MyImplBNode, which contains all the
fields required by the backend. The question is what should happen with
BlankNodes that come from other implementations. The Clerezza approach is:
they work just as well, and as long as the blank node object is reachable
(i.e. not eligible for garbage collection) it is guaranteed that the
implementation returns an equal instance to represent this node.

The newest GitHub javadoc says "two BlankNode in different Graphs MUST
differ". So how is this implemented? If I have a BNode n from G1 and I add
the triple (n,p,o) to G2, from which bnode will n now differ? It can hardly
be G1. One could argue that since G2 doesn't recognize n as one of its own,
it has to create n', so that the triple actually stored is (n',p,o). If I
query for (?,p,o), the result would not contain n but n'. But what if I go
on adding triples with n? Quite clearly the intention of the following
pseudo code is to add two triples with the same subject:

G2.add(n,p,o);
G2.add(n,q,r);

But if, after the first invocation, the implementation must make sure the
added bnode is not equal to the "alien" bnode, then the second call to add
will create a new triple with a different subject. In practice this would
mean that there is no interoperability in the general case and that one
would have to add only terms created with the right term factory.
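A minimal Java sketch of this literal reading, for illustration only. BNode, Triple and FreshCopyGraph are hypothetical names, not the Commons RDF or Clerezza API: blank nodes have no exposed identifier (plain object identity), and the graph replaces every blank node it did not create itself with a fresh n' on each add, so the two calls above end up with two different subjects.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical minimal types for illustration only; not the Commons RDF API.
final class BNode {
    // No exposed identifier: equality is plain object identity.
}

record Triple(Object subject, String predicate, String object) {}

// Models the strict reading: the graph never stores a blank node it did not
// create, so each "alien" BNode is replaced by a fresh n' on every add.
final class FreshCopyGraph {
    private final Set<BNode> own = new HashSet<>();
    private final Set<Triple> triples = new HashSet<>();

    void add(Object s, String p, String o) {
        Object subject = s;
        if (s instanceof BNode b && !own.contains(b)) {
            BNode copy = new BNode(); // a new n' on every call
            own.add(copy);
            subject = copy;
        }
        triples.add(new Triple(subject, p, o));
    }

    long distinctSubjects() {
        return triples.stream().map(Triple::subject).distinct().count();
    }
}

class FreshCopyDemo {
    public static void main(String[] args) {
        BNode n = new BNode();              // a blank node obtained from G1
        FreshCopyGraph g2 = new FreshCopyGraph();
        g2.add(n, "ex:p", "ex:o");
        g2.add(n, "ex:q", "ex:r");
        // The caller meant one subject; this graph now holds two.
        System.out.println(g2.distinctSubjects()); // prints 2
    }
}
```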

So it seems to me that we gain no real benefit from exposing the internal
id, while it causes considerable limitations and complexity.


> It also means that a streaming copy from one implementation to another
> would work - even if there would be multiple JVM objects "on the line"
> representing the same BlankNode - having the same internalIdentifier.
>

This seems hard to reconcile with the postulate that BNodes from different
graphs MUST differ. On adding the second statement with a BNode with the
same ID (be it the same instance or not), we are not adding a triple with
the same BNode that's already in the graph.

>
>
> That said, nothing is preventing
> RDFTermFactory.createBlankNode(internalIdentifier) from always
> returning the same JVM object through some kind of lookup - as long as
> that object then is able to live in multiple "local scopes" or
> Graph.add() copies it to set the scope.
>
Even if it always returns different instances, they still have to handle the
complexity of being added to different scopes and then becoming different.
This seems incredibly complex compared with the alternative (Clerezza)
approach, where a BlankNode is just an object without any exposed internals.
What use cases justify this added complexity?



>
>
> There's an open issue about what is the extent of this "local scope"
> and how this affects equivalence.
>
> https://github.com/commons-rdf/commons-rdf/issues/56
>
>
> My attempt to confuse this further:
>
> https://github.com/commons-rdf/commons-rdf/pull/48
>
>
>
> Some earlier discussions about equals:
>
> https://github.com/commons-rdf/commons-rdf/issues/45
>
>
> I think your FAQ in
> https://svn.apache.org/repos/asf/commons/sandbox/rdf/trunk/README.md
>
> is great - but this seems to include some JVM-specific decisions that
> might be easy to do in all the RDF implementations.
>

I assume you mean "NOT easy to do".

This is about providing the best possible API for Java and maybe for other
languages on the JVM. So I think the contract guaranteed by the API can
well be expressed using the whole power of this platform. At the end of the
day most implementations will store the data somewhere outside the JVM, and
that's fine. The Clerezza API (following the principles in the Readme) is
implemented against multiple backends. For backends that allow identifying
BNodes it is unproblematic and at most requires a WeakHashMap
(creating a small linear memory overhead for "alien" BNodes).
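A minimal sketch of that WeakHashMap technique, under assumptions: the backend identifies blank nodes by a long id, and BNode, BackendBNode, BackendGraph and their methods are hypothetical names, not the Clerezza API. An alien BNode is mapped to a backend id only while it is reachable; the query side (handing back an instance equal to the alien node) is left out for brevity.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.WeakHashMap;

// Hypothetical types for illustration (not the Clerezza or Commons RDF API).
interface BNode {} // no exposed identifier

// A blank node created by a BackendGraph; its backend id is not part of BNode.
final class BackendBNode implements BNode {
    final BackendGraph owner;
    final long backendId;

    BackendBNode(BackendGraph owner, long backendId) {
        this.owner = owner;
        this.backendId = backendId;
    }

    @Override public boolean equals(Object o) {
        return o instanceof BackendBNode b && b.owner == owner && b.backendId == backendId;
    }

    @Override public int hashCode() { return Long.hashCode(backendId); }
}

final class BackendGraph {
    record Row(long subjectId, String predicate, String object) {}

    private long nextId = 0;
    // "Alien" BNode -> backend id. WeakHashMap drops the entry once the alien
    // object is garbage collected, when no one can refer to it any more.
    private final Map<BNode, Long> alienIds = new WeakHashMap<>();
    private final Set<Row> rows = new HashSet<>();

    BNode createBlankNode() { return new BackendBNode(this, nextId++); }

    private long idFor(BNode n) {
        if (n instanceof BackendBNode b && b.owner == this) return b.backendId;
        return alienIds.computeIfAbsent(n, k -> nextId++);
    }

    void add(BNode s, String p, String o) { rows.add(new Row(idFor(s), p, o)); }

    long distinctSubjectIds() {
        return rows.stream().mapToLong(Row::subjectId).distinct().count();
    }
}

class BackendGraphDemo {
    public static void main(String[] args) {
        BackendGraph g2 = new BackendGraph();
        BNode n = new BNode() {}; // stands in for a blank node from another implementation
        g2.add(n, "ex:p", "ex:o");
        g2.add(n, "ex:q", "ex:r");
        System.out.println(g2.distinctSubjectIds()); // prints 1
    }
}
```

The memory overhead is one map entry per alien node that is still reachable; nodes created by the backend itself need no entry.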

It is harder to implement the API on top of a backend that does not expose
ids of BNodes (notably a SPARQL endpoint), as the BNode implementation has
to keep track of the containing subgraphs. Clearly, in such a situation with
the GitHub proposal one would have to create identifiers arbitrarily (maybe
based on a hash of the Minimum Self-Contained Graph of the blank node plus a
canonical bnode labeling within this graph).
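A rough Java 17+ sketch of such a hash-based identifier, under a strong simplifying assumption: the blank node occurs only in subject position and its Minimum Self-Contained Graph contains no other blank node, so canonical labeling reduces to always calling it _:c0. MsgHashId and PredicateObject are hypothetical names, and terms are assumed to be given already in N-Triples syntax. The triples are sorted before hashing with SHA-256, so the identifier does not depend on the order in which the backend returns them.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.List;
import java.util.stream.Collectors;

// One triple (_:c0 predicate object) of the blank node's MSG; both terms are
// assumed to be in N-Triples syntax and to contain no other blank node.
record PredicateObject(String predicate, String object) {}

final class MsgHashId {
    // Hypothetical helper: SHA-256 over the canonically labeled, sorted MSG.
    static String identifierFor(List<PredicateObject> msg) {
        String canonical = msg.stream()
                .map(po -> "_:c0 " + po.predicate() + " " + po.object() + " .")
                .sorted() // triples are unordered, so fix an order before hashing
                .collect(Collectors.joining("\n"));
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(canonical.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest); // 64 hex characters
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }
}

class MsgHashDemo {
    public static void main(String[] args) {
        PredicateObject type = new PredicateObject(
                "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>",
                "<http://xmlns.com/foaf/0.1/Person>");
        PredicateObject name = new PredicateObject(
                "<http://xmlns.com/foaf/0.1/name>", "\"Alice\"");
        // Same MSG in a different order yields the same identifier.
        System.out.println(MsgHashId.identifierFor(List.of(type, name))
                .equals(MsgHashId.identifierFor(List.of(name, type)))); // prints true
    }
}
```

Note that two blank nodes with identical MSGs get the same identifier, so this only fits where such redundant nodes may be merged.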


>
> I think we have agreed that the localIdentifier doesn't have
> anything to do with the ntriplesString, which I have reflected in the tests.
> Thus a local identifier like "not: a URI or anything" is fine - all we
> know is that two BlankNode with the same local identifier in the same
> Graph should be equal, and that their ntriplesString - whatever it is
> (I do UUID v3 if the id doesn't work) should also be equal.
>
>
> What is unclear is how this "local scope" propagates - as it's not
> exposed anywhere in the current interfaces.
>
> Perhaps blank nodes should only be possible to create from/with a Graph?
>
> When you say a scope could be narrower.. what do you mean, narrower
> than a Graph? I guess say from a SPARQL result set using the Commons
> RDF API (but not Graph), the scope would be that particular result
> set.
>

The API documentation I looked at a couple of days ago, when I wrote the
previous mail, said the scope might be the JVM. Now the API is clear that it
has to be narrower and that BNodes with the same ID must be different in
different graphs.

As you write, a consequence of this could be that it's only possible to
create them from/with a Graph. Again, what are the advantages compared
with the approach described in the SVN Readme, where BNodes have no exposed
identifier and a BNode object may, as long as it is alive, be in multiple
graphs?
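For comparison, a minimal sketch of the graph-scoped alternative, where a blank node can only be obtained from a graph and its scope is part of its identity. ScopedGraph, ScopedBlankNode, createBlankNode() and blankNode(String) are hypothetical names for illustration; record equality compares the graph by object identity and the local id by value.

```java
import java.util.UUID;

// Hypothetical sketch: the "local scope" is an explicit, final part of the
// node's identity, so the same label in two graphs yields unequal nodes.
final class ScopedGraph {
    // Fresh blank node in this graph's scope (random local id).
    ScopedBlankNode createBlankNode() {
        return new ScopedBlankNode(this, UUID.randomUUID().toString());
    }

    // Blank node for a given local label, e.g. while parsing into this graph.
    ScopedBlankNode blankNode(String localId) {
        return new ScopedBlankNode(this, localId);
    }
}

// Record equality: scope compared with ScopedGraph.equals (inherited object
// identity), localId compared by value.
record ScopedBlankNode(ScopedGraph scope, String localId) {}

class ScopeDemo {
    public static void main(String[] args) {
        ScopedGraph g1 = new ScopedGraph();
        ScopedGraph g2 = new ScopedGraph();
        System.out.println(g1.blankNode("x").equals(g1.blankNode("x"))); // prints true
        System.out.println(g1.blankNode("x").equals(g2.blankNode("x"))); // prints false
    }
}
```

Adding such a node to a different graph then necessarily means creating a new node in the target graph's scope.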


>
>
>
> Andy has said he would like the ability to copy such a BlankNode to a
> different graph, then back again to the first, and then be equal to
> the original BlankNode. (Not sure if this was meant with inserting the
> same BlankNode object into two Graphs directly, or making a single
> Triple instance that is added into two Graphs).
>

So we have G1 at time t1:

[] rdf:type foaf:Person.

At t2, we copy G1 to the previously empty G2.

We modify G2 by adding a triple with the existing BNode as subject, so at
t3 G2 looks like this:

[ rdf:type foaf:Person; foaf:name "Alice"].

We modify G1 by adding a triple with the existing BNode as subject, so at
t4 G1 looks like this:

[ rdf:type foaf:Person; foaf:name "Bob"].

I think copying G2 back to G1 (at t5) should result in there being two
persons in G1, not one person called both "Alice" and "Bob". With Clerezza
the BNodes in the different graphs would generally not be equal and would
thus result in two persons in G1 at t5. An exception is if we actually kept
the BNode instance referenceable in memory; this could be called the "BNode
zeno effect" ;)
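The t1 to t5 scenario can be replayed with a small Java model. It assumes (hypothetically) that copying a graph gives each blank node one fresh counterpart per copy operation, i.e. the original instance is not kept referenceable; BNode, Triple, Graph and copyInto are illustrative names, not a real API, and only subjects may be blank nodes here.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical minimal model, not a real API. Blank nodes have identity
// equality; only subjects may be blank nodes, to keep the sketch short.
final class BNode {}

record Triple(Object s, String p, String o) {}

final class Graph {
    final Set<Triple> triples = new HashSet<>();

    void add(Object s, String p, String o) { triples.add(new Triple(s, p, o)); }

    // Copies every triple into target. Each source blank node gets one fresh
    // counterpart per copy, so structure is preserved but identity is not.
    void copyInto(Graph target) {
        Map<BNode, BNode> fresh = new HashMap<>();
        for (Triple t : triples) {
            Object s = t.s() instanceof BNode b
                    ? fresh.computeIfAbsent(b, k -> new BNode())
                    : t.s();
            target.add(s, t.p(), t.o());
        }
    }

    long countOfType(String type) {
        return triples.stream()
                .filter(t -> t.p().equals("rdf:type") && t.o().equals(type))
                .map(Triple::s)
                .distinct()
                .count();
    }
}

class CopyBackDemo {
    public static void main(String[] args) {
        Graph g1 = new Graph(), g2 = new Graph();
        BNode person = new BNode();
        g1.add(person, "rdf:type", "foaf:Person");          // t1
        g1.copyInto(g2);                                    // t2
        Object inG2 = g2.triples.iterator().next().s();
        g2.add(inG2, "foaf:name", "\"Alice\"");             // t3
        g1.add(person, "foaf:name", "\"Bob\"");             // t4
        g2.copyInto(g1);                                    // t5
        System.out.println(g1.countOfType("foaf:Person"));  // prints 2
    }
}
```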



>
> It is unclear if that BlankNode object in graph 2 will have the same
> "local scope" as the BlankNode in graph 1. Is a Triple added to two
> graphs now in two local scopes?
>
>
>
> In the 'simple' implementation I achieved the back-and-forward
> equivalence by keeping a "local scope" as an Optional<Graph> within
> the BlankNodeImpl, and use this as part of equivalence:
>
>
> https://github.com/commons-rdf/commons-rdf/blob/master/simple/src/main/java/com/github/commonsrdf/simple/BlankNodeImpl.java#L40
>
>
> Should a "free-standing" BlankNodeImpl (not inside a Triple) claim
> to be equal to, or NOT equal to another BlankNodeImpl with same
> localIdentifier if neither are in any scope?  Currently I think my
> implementation does the first of this.
>
>
> On Graph.add(Triple) I always make a clone of TripleImpl (to not
> overwrite the localScope), which will call "inScope" to clone the
> BlankNode with the new graph as scope.
>
>
> https://github.com/commons-rdf/commons-rdf/blob/master/simple/src/main/java/com/github/commonsrdf/simple/TripleImpl.java#L63
>
>
> But I see now that with the split Graph.add(s,p,o) form I don't
> propagate the Graph localScope correctly and might even cause a NPE:
>
>
> https://github.com/commons-rdf/commons-rdf/blob/master/simple/src/main/java/com/github/commonsrdf/simple/GraphImpl.java#L43
>
> https://github.com/commons-rdf/commons-rdf/blob/master/simple/src/main/java/com/github/commonsrdf/simple/TripleImpl.java#L46
>
> .. so this is tricky to get right!
>
> <rant>
> Whoever invented Blank Nodes... why not just
> <urn:uuid:7096a534-d698-414c-87fa-4b09ca5d03f2> and be done with it.
> If something exists, it exists.. just give it a name - anything! Names
> come cheap - at least now that we got rid of LSID servers :)
>
Everybody is free to just use named nodes! The price of it is that we might
end up with tons of owl:sameAs resources. See also:
http://lists.w3.org/Archives/Public/semantic-web/2008Jan/0118.html


> </rant>
>


Cheers,
Reto




>
>
> On 27 January 2015 at 13:39, Reto Gmür <[email protected]> wrote:
> > On Fri, Jan 16, 2015 at 12:29 AM, Peter Ansell <[email protected]>
> > wrote:
> >
> >> The only sticking point then and now IMO is the purely academic
> >> distinction of opening up internal labels for blank nodes versus not
> >> opening it up at all. Reto is against having the API allow access to
> >> the identifiers on academic grounds, where other systems pragmatically
> >> allow it with heavily worded javadoc contracts about their limited
> >> usefulness, per the RDF specifications:
> >>
> >
> > Hi Peter,
> >
> > Sorry for the late reply.
> >
> > I see that the javadoc for the internalIdentifier method has now become
> > quite long.
> >
> > It says:
> >
> > * In particular, the existence of two objects of type {@link BlankNode}
> > * with the same value returned from {@link #internalIdentifier()} are not
> > * equivalent unless they are known to have been created in the same local
> > * scope.
> > It is however not so clear what such a local scope is. It says that such a
> > local scope may be for example a JVM instance. Can the scope also be
> > narrower? To allow removing redundancies (as described in
> > https://svn.apache.org/repos/asf/commons/sandbox/rdf/trunk/README.md) no
> > promise should be made that a bnode with the same ID in the same JVM will
> > denote the same node. On the other hand, how long is it guaranteed that if
> > I have a BNode object I can add triples to a graph and this object will
> > keep representing the same RDF node? Does it make a difference if I keep
> > the instance or if I create a new instance with the same internal
> > identifier?
> >
> > Similarly: can I add bnodes I get from one graph of one implementation to
> > another? If I get BNode :foo from G1, can I add the triple
> > (:foo ex:p ex:o) to G2? When I later add (:foo ex:q ex:r) to G2, will the
> > two triples have the same subject?
> >
> > I think these are important questions to allow generic interoperable
> > implementations. I'm not saying that questions like the ones I answer in
> > the Readme of my draft cannot be satisfactorily answered when having such
> > an internal identifier, but I think it might get more complicated and less
> > intuitive for the user.
> >
> > Also, you're writing about "opening up" the labels. This makes sense from
> > a triple store perspective where the BNodes have such an internal label.
> > However, I think this should not be the only use case scenario. One can
> > very well use the RDF API to expose some arbitrary Java (data) objects as
> > RDF. I've illustrated such a scenario here:
> >
> >
> http://mail-archives.apache.org/mod_mbox/stanbol-dev/201211.mbox/%3CCALvhUEUfOd-mLBh-=xkwblajhbcboe963hdxv6g0jhnpj6c...@mail.gmail.com%3E
> >
> > I'm not sure if with the GitHub API one could say "the scope is the node
> > instance" and return a fixed identifier for all BNodes. If so, the
> > identifier is obviously pointless. If, on the other hand, one would have to
> > assign identifiers to all the objects, this would make implementations
> > more complex both in terms of code and in terms of memory usage.
> >
> > Again, it seems to make things more complex while I see no clear
> > advantage compared with the toString() method the object has anyway.
> >
> > Cheers,
> > Reto
>
> --
> Stian Soiland-Reyes
> Apache Taverna (incubating)
> http://orcid.org/0000-0001-9842-9718
>
