Re: DatasetGraph, Context serialization and thrift implementation, BNode distribution/collision.
> Will BNodes in two DatasetGraphs ever collide?

No. In Jena, the internal blank node identifier is globally unique. In the parsers, blank node internal labels are based on UUIDs, so they are universally unique to a very high probability. Internally generated blank nodes (createResource()) use Java RMI UIDs, so they are built from a 32-bit random host ID, a JVM ID and a timestamp. Looking at the JDK code, I see a couple of small potential problems that limit use of the full bit space, but nothing too bad.

We could change that and switch to using UUIDs to generate BlankNodeIds. That is what gets used on Google App Engine anyway, where there is no working java.rmi UID.

JENA-1304
Use UUIDs in the last place they aren't.

Andy
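To illustrate the UUID-based approach, here is a minimal Java sketch that mints blank node labels for two independent "datasets" from `java.util.UUID.randomUUID()` and counts any overlap. It does not use Jena's own BlankNodeId class; `BlankNodeLabels` and `newBlankNodeLabel` are hypothetical names. A version 4 UUID carries 122 random bits, so by the birthday bound the chance of any collision among n labels is roughly n^2 / 2^123, which is negligible for realistic n.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical illustration (not Jena's BlankNodeId): each "dataset" mints
// blank node labels independently from random (version 4) UUIDs.
public class BlankNodeLabels {

    // Mint a label from a random UUID; 122 of its 128 bits are random.
    public static String newBlankNodeLabel() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        Set<String> datasetA = new HashSet<>();
        Set<String> datasetB = new HashSet<>();
        for (int i = 0; i < 100_000; i++) {
            datasetA.add(newBlankNodeLabel());
            datasetB.add(newBlankNodeLabel());
        }
        // Labels present in both "datasets"; with random UUIDs this is expected to be empty.
        Set<String> shared = new HashSet<>(datasetA);
        shared.retainAll(datasetB);
        System.out.println("labels in A: " + datasetA.size());
        System.out.println("labels in B: " + datasetB.size());
        System.out.println("shared labels: " + shared.size());
    }
}
```

No coordination between the two sets is needed: uniqueness comes from the size of the random space, which is why blank nodes from separately parsed data can be merged without relabelling.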
Re: DatasetGraph, Context serialization and thrift implementation, BNode distribution/collision.
On 03/03/17 19:57, Dick Murray wrote:
> Hi.
>
> Question regarding the design thoughts behind Context and the callbacks. Also merging BNodes...
>
> I have implemented a Thrift-based RPC DatasetGraph consisting of a Client (implements DatasetGraph) which forwards calls to an IFace (generated from a Thrift file which closely mimics the DatasetGraph interface, with some method name tweaks to handle Thrift nuances such as not supporting method overloading). The IFace wraps a DatasetGraph. The IFace supports all of the DatasetGraph interface (including RPC lock and transaction support) with the exception of getContext(), which currently returns Context.emptyContext.
>
> Context and Symbol don't implement Serializable, which in itself can be overcome. But I'm stumped by the callback part of Context. What is it? What does it do? (besides the obvious!)
>
> In the bigger picture, I implement a distributed DatasetGraph which contains a set of IFace endpoints. Thus when find(Quad) is called, it makes a set of RPC calls to the IFace endpoints and aggregates the results. Internally, locks are applied when needed; in particular, write locks are weighted, e.g. add(x, s, p, o) will attempt to lock the IFace which has graph x (i.e. it checks for the graph before doing the write lock and add). Basically, beginTransaction(ReadWrite) on the DatasetGraphDistributed won't actually call beginTransaction(ReadWrite) on an IFace until it needs to. This allows multiple IFace endpoints to be in write transactions.
>
> To support the thread affinity of DatasetGraph, the IFace endpoints use a UUID to delegate from the Thrift thread pool thread (i.e. the one servicing the RPC call) to the same thread which actually performs the wrapped DatasetGraph action. Additionally, the underlying DatasetGraph can still be accessed as usual whilst it is wrapped by the IFace, which serves RPC calls into the same DatasetGraph.
>
> Anyway... What was the Context callback designed for? Is it ever used?
It is for passing through configuration information, e.g. strict mode, the time the query started (used by NOW()), and registries for functions. For TDB, the default union graph mode for a dataset lives there. The execution normally takes the global one, adds in the dataset one, adds "now" and the query/algebra tracking, then freezes it.

> If I have a central Context which I push to the IFace endpoints, would that cause me any issues? Similar idea to a central config...
>
> Will BNodes in two DatasetGraphs ever collide?

No. In Jena, the internal blank node identifier is globally unique. In the parsers, blank node internal labels are based on UUIDs, so they are universally unique to a very high probability. Internally generated blank nodes (createResource()) use Java RMI UIDs, so they are built from a 32-bit random host ID, a JVM ID and a timestamp. Looking at the JDK code, I see a couple of small potential problems that limit use of the full bit space, but nothing too bad.

We could change that and switch to using UUIDs to generate BlankNodeIds. That is what gets used on Google App Engine anyway, where there is no working java.rmi UID.

Andy
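The way an execution context is assembled (the global settings, then the dataset's, then "now", then frozen) can be sketched in plain Java with maps. This is a hypothetical sketch, not Jena's `Context` API: `ExecContextSketch`, `buildExecutionContext` and the keys `"strict"`, `"unionDefaultGraph"` and `"now"` are illustrative names only, and the query/algebra tracking step is left out.

```java
import java.time.Instant;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not Jena's Context class): layer global settings,
// then dataset settings, then per-query values, and freeze the result.
public class ExecContextSketch {

    public static Map<String, Object> buildExecutionContext(Map<String, Object> global,
                                                            Map<String, Object> dataset,
                                                            Instant queryStart) {
        Map<String, Object> merged = new HashMap<>(global); // start from the global defaults
        merged.putAll(dataset);                             // dataset settings override global ones
        merged.put("now", queryStart);                      // one fixed time for every NOW() call
        return Collections.unmodifiableMap(merged);         // "freeze" for the rest of the execution
    }

    public static void main(String[] args) {
        Map<String, Object> global = Map.of("strict", false, "unionDefaultGraph", false);
        Map<String, Object> dataset = Map.of("unionDefaultGraph", true);
        Map<String, Object> exec = buildExecutionContext(global, dataset, Instant.now());

        System.out.println(exec.get("strict"));            // false (from the global layer)
        System.out.println(exec.get("unionDefaultGraph")); // true (the dataset layer wins)
        try {
            exec.put("strict", true);
        } catch (UnsupportedOperationException e) {
            System.out.println("frozen");                  // the frozen copy rejects changes
        }
    }
}
```

Because each execution in this sketch works on its own frozen copy, later changes to the global or dataset layers do not affect a query that is already running.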
DatasetGraph, Context serialization and thrift implementation, BNode distribution/collision.
Hi.

Question regarding the design thoughts behind Context and the callbacks. Also merging BNodes...

I have implemented a Thrift-based RPC DatasetGraph consisting of a Client (implements DatasetGraph) which forwards calls to an IFace (generated from a Thrift file which closely mimics the DatasetGraph interface, with some method name tweaks to handle Thrift nuances such as not supporting method overloading). The IFace wraps a DatasetGraph. The IFace supports all of the DatasetGraph interface (including RPC lock and transaction support) with the exception of getContext(), which currently returns Context.emptyContext.

Context and Symbol don't implement Serializable, which in itself can be overcome. But I'm stumped by the callback part of Context. What is it? What does it do? (besides the obvious!)

In the bigger picture, I implement a distributed DatasetGraph which contains a set of IFace endpoints. Thus when find(Quad) is called, it makes a set of RPC calls to the IFace endpoints and aggregates the results. Internally, locks are applied when needed; in particular, write locks are weighted, e.g. add(x, s, p, o) will attempt to lock the IFace which has graph x (i.e. it checks for the graph before doing the write lock and add). Basically, beginTransaction(ReadWrite) on the DatasetGraphDistributed won't actually call beginTransaction(ReadWrite) on an IFace until it needs to. This allows multiple IFace endpoints to be in write transactions.

To support the thread affinity of DatasetGraph, the IFace endpoints use a UUID to delegate from the Thrift thread pool thread (i.e. the one servicing the RPC call) to the same thread which actually performs the wrapped DatasetGraph action. Additionally, the underlying DatasetGraph can still be accessed as usual whilst it is wrapped by the IFace, which serves RPC calls into the same DatasetGraph.

Anyway... What was the Context callback designed for? Is it ever used?
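The thread-affinity scheme for the IFace endpoints can be sketched with standard `java.util.concurrent` classes, assuming one dedicated single-thread executor per transaction id. This is a minimal sketch, not the actual implementation: `AffinityDispatcher` and its `begin`/`call`/`end` methods are hypothetical, and the fixed pool in `main` only stands in for the Thrift server's worker pool.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: each transaction id (a UUID) owns one worker thread, and any
// RPC pool thread servicing that transaction hands its work to that worker and waits.
public class AffinityDispatcher {

    private final Map<UUID, ExecutorService> workers = new ConcurrentHashMap<>();

    // "begin": allocate a dedicated single-thread worker and return its id.
    public UUID begin() {
        UUID txnId = UUID.randomUUID();
        workers.put(txnId, Executors.newSingleThreadExecutor());
        return txnId;
    }

    // Run the action on the worker bound to txnId, blocking the caller until it completes.
    public <T> T call(UUID txnId, Callable<T> action) throws InterruptedException, ExecutionException {
        ExecutorService worker = workers.get(txnId);
        if (worker == null) {
            throw new IllegalStateException("unknown transaction " + txnId);
        }
        return worker.submit(action).get();
    }

    // "end": release the worker thread for this transaction.
    public void end(UUID txnId) {
        ExecutorService worker = workers.remove(txnId);
        if (worker != null) {
            worker.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        AffinityDispatcher dispatcher = new AffinityDispatcher();
        UUID txn = dispatcher.begin();
        ExecutorService rpcPool = Executors.newFixedThreadPool(4); // stands in for the Thrift pool

        // Two RPC calls, possibly serviced by different pool threads...
        String first = rpcPool.submit(() -> dispatcher.call(txn, () -> Thread.currentThread().getName())).get();
        String second = rpcPool.submit(() -> dispatcher.call(txn, () -> Thread.currentThread().getName())).get();

        // ...both ran on the same worker thread, so this prints true.
        System.out.println(first.equals(second));
        dispatcher.end(txn);
        rpcPool.shutdown();
    }
}
```

In this sketch a caller blocks on `Future.get()` until the worker finishes, so a long action in one transaction ties up one RPC pool thread but does not block other transactions, each of which has its own worker.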
If I have a central Context which I push to the IFace endpoints, would that cause me any issues? Similar idea to a central config...

Will BNodes in two DatasetGraphs ever collide?

Dick.