Re: DatasetGraph, Context serialization and thrift implementation, BNode distribution/collision.

2017-03-04 Thread Andy Seaborne


Will BNodes in two DatasetGraphs ever collide?


No.

In Jena, the internal blank node identifier is globally unique.

In the parsers, blank node internal labels are based on UUIDs, so they
are universally unique to very high probability.

Internally generated blank nodes (createResource()) use Java RMI UIDs, so
they combine a 32-bit random host ID, a JVM id and a timestamp.


Looking at the JDK code, I see a couple of small potential problems that
limit the full range of the bit space but nothing too bad.
We could change that and switch to using UUIDs to generate BlankNodeIds.
UUIDs are already used on Google App Engine, where there is no working
java.rmi UID.


JENA-1304.  Use UUIDs in the last place they aren't.



Andy




Dick.



Re: DatasetGraph, Context serialization and thrift implementation, BNode distribution/collision.

2017-03-04 Thread Andy Seaborne



On 03/03/17 19:57, Dick Murray wrote:

Hi.

Question regarding the design thoughts behind Context and the callbacks.
Also merging BNodes...

I have implemented a Thrift based RPC DatasetGraph consisting of a Client
(implements DatasetGraph) which forwards calls to an IFace (generated from
a Thrift file which closely mimics the DatasetGraph interface with some
method name tweaks to handle thrift nuances such as not supporting method
overloading). The IFace wraps a DatasetGraph.

The IFace supports all of the DatasetGraph interface (including RPC lock
and transaction support) with the exception of getContext(), currently it
returns Context.emptyContext. Context and Symbol don't implement
Serializable, which in itself can be overcome. But I am stumped by the
callback part of Context. What is it? What does it do? (besides the
obvious!).

In the bigger picture I implement a distributed DatasetGraph which contains
a set of IFace endpoints. Thus when find(Quad) is called it makes a set of
RPC calls to the IFace endpoints and aggregates the results. Internally
locks are applied when needed, in particular write locks are weighted e.g.
add(x, s, p, o) will attempt to lock the IFace which has graph x (i.e. it
checks for the graph before doing the write lock and add). Basically
beginTransaction(ReadWrite) on the DatsetGraphDistributed won't actually
call beginTransaction(ReadWrite) on an IFace until it needs to. This allows
multiple IFace endpoints to be in write transactions. To support the thread
affinity of DatasetGraph the IFace endpoints use a UUID to delegate from
the Thrift thread pool thread (i.e. the one servicing the RPC call) to the
same thread which actually performs the wrapped DatasetGraph action.
Additionally the underlying DatasetGraph can be accessed as usual whilst
being wrapped by the IFace which supports RPC calls into the same
DatasetGraph.

Anyway...

What was the Context callback designed for? Is it ever used?


It is for passing through configuration information e.g. strict mode, 
time query started (used by NOW()), registries for functions.


For TDB, the default union graph mode for a dataset lives there.

Query execution normally takes the global context, adds in the dataset
context, adds "now" and the query/algebra tracking, then freezes the result.
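
As a rough illustration of that layering (global settings, then dataset settings, then "now", then freeze), here is a minimal sketch using plain Java maps. It is not Jena's Context API: the class ExecContextSketch, the method forExecution and the string keys are all hypothetical, and the query/algebra tracking step is left out.

```java
import java.time.Instant;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a per-execution context; not Jena's Context class.
final class ExecContextSketch {
    private ExecContextSketch() {}

    // Layers the settings: global first, dataset overrides, then "now", then freeze.
    static Map<String, Object> forExecution(Map<String, Object> global,
                                            Map<String, Object> dataset) {
        Map<String, Object> cxt = new HashMap<>(global); // start from the global settings
        cxt.putAll(dataset);                             // dataset settings win on conflict
        cxt.put("now", Instant.now());                   // fixed start time for this execution
        return Collections.unmodifiableMap(cxt);         // "freeze": later writes throw
    }

    public static void main(String[] args) {
        Map<String, Object> global = new HashMap<>();
        global.put("strict", false);
        global.put("unionDefaultGraph", false);

        Map<String, Object> dataset = new HashMap<>();
        dataset.put("unionDefaultGraph", true);

        Map<String, Object> cxt = forExecution(global, dataset);
        System.out.println("unionDefaultGraph = " + cxt.get("unionDefaultGraph"));
        System.out.println("strict = " + cxt.get("strict"));
    }
}
```

In a sketch like this, a central configuration pushed to the endpoints would play the role of the global map, and each endpoint would still add its own dataset-level and per-execution values before freezing.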



If I have a central Context which I push to the IFace endpoints would that
cause me any issues? Similar idea to a central config...

Will BNodes in two DatasetGraphs ever collide?


No.

In Jena, the internal blank node identifier is globally unique.

In the parsers, blank node internal labels are based on UUIDs, so they
are universally unique to very high probability.


Internally generated blank nodes (createResource()) use Java RMI UIDs, so
they combine a 32-bit random host ID, a JVM id and a timestamp.



Looking at the JDK code, I see a couple of small potential problems that 
limit the full range of the bit space but nothing too bad.
We could change that and switch to using UUIDs to generate BlankNodeIds.
UUIDs are already used on Google App Engine, where there is no working
java.rmi UID.
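
To make the UUID option concrete, here is a minimal sketch of minting blank node labels from random UUIDs using only the JDK. The class UuidBlankNodeLabels and the method newLabel are hypothetical names for illustration, not Jena's BlankNodeId code.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical helper, not part of Jena: mints blank node labels from random UUIDs.
final class UuidBlankNodeLabels {
    private UuidBlankNodeLabels() {}

    // Returns a fresh label; a type 4 UUID carries 122 random bits.
    static String newLabel() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        // Mint a batch of labels and fail loudly if any two are equal.
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < 100_000; i++) {
            if (!seen.add(newLabel())) {
                throw new IllegalStateException("label collision");
            }
        }
        System.out.println("distinct labels: " + seen.size());
    }
}
```

Because the labels do not depend on host, JVM or clock state, two independent processes can mint them without coordination, which is the property that matters when results from several DatasetGraphs are merged.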


Andy




Dick.



DatasetGraph, Context serialization and thrift implementation, BNode distribution/collision.

2017-03-03 Thread Dick Murray
Hi.

Question regarding the design thoughts behind Context and the callbacks.
Also merging BNodes...

I have implemented a Thrift based RPC DatasetGraph consisting of a Client
(implements DatasetGraph) which forwards calls to an IFace (generated from
a Thrift file which closely mimics the DatasetGraph interface with some
method name tweaks to handle thrift nuances such as not supporting method
overloading). The IFace wraps a DatasetGraph.

The IFace supports all of the DatasetGraph interface (including RPC lock
and transaction support) with the exception of getContext(), currently it
returns Context.emptyContext. Context and Symbol don't implement
Serializable, which in itself can be overcome. But I am stumped by the
callback part of Context. What is it? What does it do? (besides the
obvious!).

In the bigger picture I implement a distributed DatasetGraph which contains
a set of IFace endpoints. Thus when find(Quad) is called it makes a set of
RPC calls to the IFace endpoints and aggregates the results. Internally
locks are applied when needed, in particular write locks are weighted e.g.
add(x, s, p, o) will attempt to lock the IFace which has graph x (i.e. it
checks for the graph before doing the write lock and add). Basically
beginTransaction(ReadWrite) on the DatsetGraphDistributed won't actually
call beginTransaction(ReadWrite) on an IFace until it needs to. This allows
multiple IFace endpoints to be in write transactions. To support the thread
affinity of DatasetGraph the IFace endpoints use a UUID to delegate from
the Thrift thread pool thread (i.e. the one servicing the RPC call) to the
same thread which actually performs the wrapped DatasetGraph action.
Additionally the underlying DatasetGraph can be accessed as usual whilst
being wrapped by the IFace which supports RPC calls into the same
DatasetGraph.
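
The UUID-to-thread delegation described above might look roughly like the following sketch, which assumes one single-thread executor per session UUID. The class SessionThreads and its methods are hypothetical names and this is not the actual IFace code; a real endpoint would run the wrapped DatasetGraph calls inside the submitted action.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: pins every call made under one session UUID to one worker thread.
final class SessionThreads {
    private final Map<UUID, ExecutorService> workers = new ConcurrentHashMap<>();

    // Runs the action on the worker thread owned by this session and waits for the result.
    <T> T call(UUID session, Callable<T> action) {
        ExecutorService worker =
            workers.computeIfAbsent(session, id -> Executors.newSingleThreadExecutor());
        try {
            return worker.submit(action).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the worker", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("action failed on the worker", e.getCause());
        }
    }

    // Releases the worker thread once the session (for example, a transaction) ends.
    void close(UUID session) {
        ExecutorService worker = workers.remove(session);
        if (worker != null) {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        SessionThreads threads = new SessionThreads();
        UUID session = UUID.randomUUID();
        Thread first = threads.call(session, Thread::currentThread);
        Thread second = threads.call(session, Thread::currentThread);
        System.out.println("same worker thread: " + (first == second));
        threads.close(session);
    }
}
```

Whichever pool thread services an RPC call, the work for a given UUID always lands on the same worker, so begin, add and commit for one transaction all see the same thread.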

Anyway...

What was the Context callback designed for? Is it ever used?

If I have a central Context which I push to the IFace endpoints would that
cause me any issues? Similar idea to a central config...

Will BNodes in two DatasetGraphs ever collide?

Dick.