Re: In-memory copies of graphs returned for Jena base TcProviders (was: PrivilegedMGraphWrapper#getGraph() create in-memory copy)

Rupert Westenthaler Wed, 21 Mar 2012 10:36:21 -0700

On 21.03.2012, at 16:27, Daniel Spicar wrote:

> Hi Rupert,
> 
> Your findings sound quite serious to me. From a quick check I can confirm
> your findings. It seems TDB backed read-only Graphs are in fact in-memory
> SimpleTripleCollections.  I didn't implement this functionality originally
> so I am not an authority on it though ;)


> About the problem of the MGraph's getGraph method: Intuitively I would
> approach the problem by creating a Wrapper (Decorator) for MGraphs that
> returns an "Immutable" Graph. This would return a Graph to the user that
> forwards read-access to the MGraph and prevents write access. However the
> backing graph will be an MGraph.

I am also in favor of the decorator pattern, but it is not quite the same as 
creating an immutable copy: with a decorator, components that hold a 
reference to the decorated MGraph can still modify it. This would, 
theoretically, introduce the need for read locks on Graphs (e.g. to protect 
iterators over the Graph from changes in the backing MGraph).
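
To illustrate the difference, here is a minimal sketch that uses plain 
java.util collections as stand-ins (a Set<String> plays the role of the 
MGraph). The class and method names are hypothetical and are not part of the 
Clerezza API; the point is only that a read-only view keeps following the 
backing collection, while a copy does not.

```java
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Stand-ins only: Set<String> plays the role of an MGraph. Nothing here is
// Clerezza API; all names are made up for illustration.
final class ViewVersusCopy {

    /** Read-only decorator: rejects writes, but still shows later changes. */
    static Set<String> readOnlyView(Set<String> backing) {
        return Collections.unmodifiableSet(backing);
    }

    /** Immutable snapshot: detached from the backing set at creation time. */
    static Set<String> immutableCopy(Set<String> backing) {
        return Collections.unmodifiableSet(new HashSet<>(backing));
    }

    /**
     * Returns true if changing the backing set invalidates an iterator that
     * was obtained from the read-only view.
     */
    static boolean iteratorBreaksOnBackingChange() {
        Set<String> mgraph = new HashSet<>();
        mgraph.add("t1");
        mgraph.add("t2");
        Iterator<String> it = readOnlyView(mgraph).iterator();
        it.next();
        mgraph.add("t3"); // someone still holds the mutable reference
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Set<String> mgraph = new HashSet<>();
        mgraph.add("t1");
        Set<String> view = readOnlyView(mgraph);
        Set<String> copy = immutableCopy(mgraph);
        mgraph.add("t2");
        System.out.println("view size: " + view.size());
        System.out.println("copy size: " + copy.size());
        System.out.println("iterator broken: " + iteratorBreaksOnBackingChange());
    }
}
```

In this sketch the view grows together with the backing set and a running 
iterator over it fails once the backing set changes, which is the situation 
read locks would have to guard against.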

Creating a real immutable copy of a graph is already possible by calling 
TcProvider.createGraph(TripleCollection tc). I do not see the necessity to 
duplicate this in the MGraph API.

> 
> In general: I don't know much about TDB's inner workings, does it offer
> read-only graphs? And if so what are the benefits of using them? (I assume
> more efficient synchronization). If there is such a thing, implementing
> native access to TDBs read-only graphs is definitely something great.
> 

AFAIK TDB itself does not provide such a feature, but the TcProvider 
implementations for TDB take care of that, as they do not allow creating an 
MGraph over a graph that was created as a Graph (that's true for both the 
"TdbTcProvider" and the "SingleTdbDatasetTcProvider"). The only way to change 
a TDB model that was created for a Graph would therefore be to access the TDB 
dataset directly (outside of Clerezza).
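
As a rough sketch of that idea (a toy stand-in, not the actual TdbTcProvider 
code): a provider that remembers which names were created as read-only 
graphs and refuses to hand out a mutable handle for them can safely return a 
view instead of a copy. All names below are hypothetical, and strings stand 
in for graph names and triples.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy provider, NOT the Clerezza TcProvider interface. Strings stand in for
// graph names and triples; the exception types are chosen for illustration.
final class ToyTcProvider {
    private final Map<String, Set<String>> store = new HashMap<>();
    private final Set<String> createdAsGraph = new HashSet<>();

    /** Creates a read-only graph from the given triples. */
    Set<String> createGraph(String name, Collection<String> triples) {
        if (store.containsKey(name)) {
            throw new IllegalStateException("already exists: " + name);
        }
        store.put(name, new HashSet<>(triples));
        createdAsGraph.add(name);
        return getGraph(name);
    }

    /** Creates an empty mutable graph. */
    Set<String> createMGraph(String name) {
        if (store.containsKey(name)) {
            throw new IllegalStateException("already exists: " + name);
        }
        Set<String> triples = new HashSet<>();
        store.put(name, triples);
        return triples;
    }

    /** Returns a read-only view; no copy is needed for read-only names. */
    Set<String> getGraph(String name) {
        if (!createdAsGraph.contains(name)) {
            throw new IllegalArgumentException("not a read-only graph: " + name);
        }
        return Collections.unmodifiableSet(store.get(name));
    }

    /** Refuses mutable access to anything that was created as a Graph. */
    Set<String> getMGraph(String name) {
        if (!store.containsKey(name) || createdAsGraph.contains(name)) {
            throw new IllegalArgumentException("not a mutable graph: " + name);
        }
        return store.get(name);
    }
}
```

Because the toy provider never exposes the backing set of a read-only name, 
the view it returns cannot change underneath the caller.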

BTW a simple workaround to avoid the creation of an in-memory copy for TDB 
graphs is to instantiate the JenaGraphAdaptor by using

                MGraph jenaAdapter = new JenaGraphAdaptor(model.getGraph()){
                    /**
                     * Ensure that no in-memory copies are created for read only
                     * Jena Graphs
                     * @return a read-only Graph backed directly by this adaptor
                     */
                    @Override
                    public Graph getGraph() {
                        return new SimpleGraph(this,true);
                    }
                };
                Graph graph = jenaAdapter.getGraph();

when get/createGraph is called on the TcProvider, as the constructor 
"SimpleGraph(TripleCollection tc, boolean tripleCollectionWillNeverChange)" 
does not create a copy of the passed TripleCollection.
For now I use this with the SingleTdbDatasetTcProvider.

best
Rupert


> Daniel
> 
> On 20 March 2012 09:49, Rupert Westenthaler
> <[email protected]> wrote:
> 
>> Hi again
>> 
>> Just noticed that the
>> "org.apache.clerezza.rdf.jena.storage.JenaGraphAdaptor" does the exact same
>> by extending "org.apache.clerezza.rdf.core.impl.AbstractMGraph".
>> 
>> This means that all Graphs returned by the Jena TDB provider
>> (org.apache.clerezza.rdf.jena.tdb.storage.TdbTcProvider) are in fact
>> in-memory copies. This would not be necessary as the TdbTcProvider already
>> ensures that a Graph can not be opened as MGraph.
>> 
>> To avoid such copies one would need to refactor the JenaGraphAdaptor so
>> that one can create both a "JenaMGraphAdaptor" and a read-only
>> "JenaGraphAdaptor". JenaMGraphAdaptor.getGraph() would still need to create
>> an in-memory copy, but the "JenaGraphAdaptor" would make it possible to
>> avoid this. TcProvider implementations that instantiate "JenaGraphAdaptor"
>> would need to ensure themselves that the underlying JenaGraph is not
>> modified.
>> 
>> This is of special importance to the SingleTdbDatasetTcProvider as I am
>> planning to add support for exposing the "urn:x-arq:UnionGraph" via the
>> TcProvider.getGraph(..) method. Creating in-memory copies of the union
>> graph over all named models within the TDB store is not feasible.
>> 
>> best
>> Rupert
>> 
>> 
>> On 20.03.2012, at 08:42, Rupert Westenthaler wrote:
>> 
>>> Hi all,
>>> 
>>> While working on the SingleTdbDatasetTcProvider I noticed that the
>>> 
>>>   PrivilegedMGraphWrapper#getGraph()
>>> 
>>> calls
>>> 
>>>      public Graph getGraph() {
>>>              return new SimpleGraph(this);
>>>      }
>>> 
>>> If I am right, this causes an in-memory copy of the wrapped MGraph to
>>> be created. Is there a special reason for that, or should that be changed?
>>> 
>>> I would rather expect a PrivilegedGraphWrapper wrapping the graph
>>> returned by the wrapped MGraph to be returned. Something like:
>>> 
>>>      public Graph getGraph() {
>>>              return AccessController.doPrivileged(
>>>                      new PrivilegedAction<Graph>() {
>>>                              @Override
>>>                              public Graph run() {
>>>                                      return new PrivilegedGraphWrapper(wrapped.getGraph());
>>>                              }
>>>                      });
>>>      }
>>> 
>>> Maybe one would even like to have only a single PrivilegedGraphWrapper
>>> that is created on the first call to getGraph().
>>> 
>>> best
>>> Rupert
>>> 
>> 
>>
