[ https://issues.apache.org/jira/browse/CLEREZZA-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982851#action_12982851 ]

Reto Bachmann-Gmür commented on CLEREZZA-395:
---------------------------------------------

Hi Rupert,

You're right that a bnode-instance can reference bnodes in different 
triple-collections; in terms of RDF these are obviously not the same bnode. 
Nevertheless, when you merge the two triple-collections, the two bnodes 
represented by one instance become the same bnode.
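To make the merge argument concrete, here is a minimal Java sketch. It does not use the Clerezza API: `BNode`, `Triple` and `MergeDemo` are hypothetical stand-ins, and a blank node's identity is assumed to be simply the identity of the Java object (no `equals` override). Merging is modelled as plain set union, so one instance that occurs in both collections ends up as a single node of the merged graph.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a blank node: no label, equals/hashCode are inherited
// from Object, so identity is the identity of the Java instance.
final class BNode {
}

// Hypothetical minimal triple; subject and object are either a BNode or a String.
record Triple(Object subject, String predicate, Object object) {
}

final class MergeDemo {
    // Merges two triple collections as a set union: a BNode instance that
    // occurs in both inputs becomes a single node of the merged graph.
    static Set<Triple> merge(Set<Triple> a, Set<Triple> b) {
        Set<Triple> merged = new HashSet<>(a);
        merged.addAll(b);
        return merged;
    }

    // Collects the distinct BNode instances in subject or object position.
    static Set<BNode> bnodes(Set<Triple> triples) {
        Set<BNode> result = new HashSet<>();
        for (Triple t : triples) {
            if (t.subject() instanceof BNode) {
                result.add((BNode) t.subject());
            }
            if (t.object() instanceof BNode) {
                result.add((BNode) t.object());
            }
        }
        return result;
    }
}
```

For example, merging a collection containing (rupert, firstName, "Rupert") with one containing (rupert, lastName, "Westenthaler") yields two triples about a single bnode, whereas using a fresh `new BNode()` in the second collection yields two distinct bnodes.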

The difference from a technical perspective is the life-span of the 
bnode-reference: as soon as the bnode-instance becomes eligible for garbage 
collection, the storage provider knows that the bnode in question no longer 
has an intrinsic identity alien to RDF, i.e. an identity beyond what the 
graph itself expresses.
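A storage provider can observe the life-span of a bnode-instance with weak references. The sketch below assumes a hypothetical `BNodeRegistry` (not part of Clerezza or Jena) that maps application-held `BNode` objects to internal ids through `java.util.WeakHashMap`. Its keys are held weakly, so an entry may disappear once the application no longer references the instance; when exactly that happens is up to the garbage collector.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical stand-in for a blank node whose identity is the Java object itself.
final class BNode {
}

// Hypothetical storage-side registry mapping live BNode instances to internal ids.
// WeakHashMap references its keys weakly: once the application drops every
// reference to a BNode, its entry becomes eligible for removal.
final class BNodeRegistry {
    private final Map<BNode, Long> ids = new WeakHashMap<>();
    private long nextId = 0;

    // Returns the internal id for this instance, allocating a new one on first use.
    synchronized long idFor(BNode node) {
        return ids.computeIfAbsent(node, n -> nextId++);
    }

    // Number of entries currently held; may shrink after garbage collection.
    synchronized int liveCount() {
        return ids.size();
    }
}
```

Note that the values (`Long`) do not reference the keys; a value that pointed back to its `BNode` would keep the entry alive. A map that holds its keys strongly, by contrast, keeps every entry until it is explicitly removed, which is consistent with the growth of `tria2JenaBNodes` reported in the issue.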

As long as we support such a triple-centric API, we need to be able to point to 
a bnode at least while "drawing" a graph; but if this pointer has no age limit, 
the storage layer would have to keep redundant information forever.

Say the following statements are executed (with an initially empty graph1 and 
two bnode-instances rupert1 and rupert2 such that !rupert1.equals(rupert2)):

graph1.add(new TripleImpl(rupert1, firstName, new PlainLiteralImpl("Rupert")));
graph1.add(new TripleImpl(rupert2, firstName, new PlainLiteralImpl("Rupert")));

After these two statements graph1 is clearly not lean, yet the implementation 
cannot remove the redundancy as long as the following statements could still 
be added:

graph1.add(new TripleImpl(rupert1, lastName, new PlainLiteralImpl("Westenthaler")));
graph1.add(new TripleImpl(rupert2, lastName, new PlainLiteralImpl("Murdoch")));

If you don't add the latter two statements, the store is free to remove the 
redundancy once no object in the application references the bnode any more. 
If bnode identity were determined by a bnode label, the storage layer would 
never know for sure that nobody will attempt to reference the node by that 
label later.
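The leanness argument can be checked mechanically. The following Java sketch uses hypothetical stand-ins (`BNode`, `Triple`, `LeanCheck`; these are not Clerezza classes) and implements only a sufficient test for non-leanness: it tries to map a single blank node onto another term of the graph and checks whether the result is contained in the original graph. Such an instance is a proper subgraph, so the graph is not lean. Failing the test does not prove leanness, since a complete check would also have to consider mappings of several blank nodes at once.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a blank node; identity is the Java instance.
final class BNode {
}

// Hypothetical minimal triple; subject and object are either a BNode or a String.
record Triple(Object subject, String predicate, Object object) {
}

final class LeanCheck {
    // Replaces every occurrence of 'from' (by identity) with 'to' in subject/object position.
    static Set<Triple> substitute(Set<Triple> graph, BNode from, Object to) {
        Set<Triple> result = new HashSet<>();
        for (Triple t : graph) {
            Object s = t.subject() == from ? to : t.subject();
            Object o = t.object() == from ? to : t.object();
            result.add(new Triple(s, t.predicate(), o));
        }
        return result;
    }

    // Sufficient (not necessary) test for non-leanness: if mapping one blank node
    // onto another subject/object term yields a subset of the graph, that instance
    // is a proper subgraph, because the mapped node no longer occurs in it.
    static boolean obviouslyNotLean(Set<Triple> graph) {
        Set<Object> terms = new HashSet<>();
        for (Triple t : graph) {
            terms.add(t.subject());
            terms.add(t.object());
        }
        for (Object x : terms) {
            if (!(x instanceof BNode)) {
                continue;
            }
            for (Object y : terms) {
                if (y != x && graph.containsAll(substitute(graph, (BNode) x, y))) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

With only the two firstName triples, mapping rupert2 onto rupert1 succeeds. Once the two lastName triples are present, no single substitution works, which is why the store must keep both nodes as long as such statements can still arrive.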

Cheers,
reto

> bnodes mapping in JenaGraphAdaptor should not keep growing with every parsing of rdf files
> ------------------------------------------------------------------------------------------
>
>                 Key: CLEREZZA-395
>                 URL: https://issues.apache.org/jira/browse/CLEREZZA-395
>             Project: Clerezza
>          Issue Type: Improvement
>            Reporter: Hasan
>            Assignee: Hasan
>
> With every parsing of RDF files, the amount of free memory decreases.
> The problem seems to lie in the JenaGraphAdaptor class.
> It has a member:
> final BidiMap<BNode, Node> tria2JenaBNodes = new BidiMapImpl<BNode, Node>();
> which grows each time a serialized graph gets parsed.
> My experiments with my test data show:
> At the end of the 1st parsing: Size of tria2JenaBNodes = 87200
> At the end of the 2nd parsing: Size of tria2JenaBNodes = 130800
> At the end of the 3rd parsing: Size of tria2JenaBNodes = 174400

-- 
This message is automatically generated by JIRA.
