Additionally, a fun thing to look at for Node would be to have a
static cache pre-populated with commonly used resources (RDF, RDFS,
OWL, etc.), similar to Java's Integer.valueOf(int) method.  That could
be useful.
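As a rough sketch of that idea (NodeLike and WellKnownNodes are hypothetical stand-ins here, not actual Jena classes):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Jena's Node; illustrative only.
final class NodeLike {
    final String uri;
    NodeLike(String uri) { this.uri = uri; }
}

final class WellKnownNodes {
    // Pre-populated at class init and never mutated afterwards, so reads
    // need no locking -- analogous to the cache behind Integer.valueOf(int).
    private static final Map<String, NodeLike> CACHE = new HashMap<>();
    static {
        for (String uri : new String[] {
                "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
                "http://www.w3.org/2000/01/rdf-schema#subClassOf",
                "http://www.w3.org/2002/07/owl#Class" }) {
            CACHE.put(uri, new NodeLike(uri));
        }
    }

    // Shared instance for well-known vocabulary URIs, fresh node otherwise.
    static NodeLike create(String uri) {
        NodeLike n = CACHE.get(uri);
        return (n != null) ? n : new NodeLike(uri);
    }
}
```

The point being that a fixed, read-only table avoids both the global lock and the unbounded growth of a general cache.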

-Stephen


On Fri, Jul 20, 2012 at 8:31 AM, Stephen Allen <sal...@apache.org> wrote:
> +1 on removal of both the node and triple caches.
>
> In addition to the reasons already discussed, there is also the fact
> that Node.create() uses a global lock, which is going to be really bad
> for concurrency!
>
> Triple.create() doesn't do any locking, which appears to work out OK
> in this specific instance because the cache never tries to remove
> anything (the worst that could happen would be for two identical
> triples to be floating around when two threads tried to insert the
> same triple at the same time).
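[For illustration, the benign race described above can be made explicit with ConcurrentHashMap.putIfAbsent; this is a sketch, not Jena's actual Triple cache, and TripleLike / LockFreeTripleCache are hypothetical names:]

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for Jena's Triple; illustrative only.
record TripleLike(String s, String p, String o) {}

final class LockFreeTripleCache {
    private final ConcurrentMap<TripleLike, TripleLike> cache = new ConcurrentHashMap<>();

    // If two threads insert the same triple at once, each may briefly hold
    // its own (equal) instance, but the cache settles on one winner and the
    // duplicate is harmless -- no lock required, matching the behaviour
    // described above for Triple.create().
    TripleLike create(String s, String p, String o) {
        TripleLike fresh = new TripleLike(s, p, o);
        TripleLike existing = cache.putIfAbsent(fresh, fresh);
        return (existing != null) ? existing : fresh;
    }
}
```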
>
> -Stephen
>
>
> On Fri, Jul 20, 2012 at 7:59 AM, Dave Reynolds
> <dave.e.reyno...@gmail.com> wrote:
>> Agreed.
>>
>> Primary value of a node cache from my POV is space saving for in-memory
>> models. But that could indeed be done by ARP (if it isn't already) and is
>> probably better done at the resource level.
>>
>> I wouldn't expect any significant effect on the rules engines from scrapping
>> these caches.
>>
>> Dave
>>
>>
>>
>> On 20/07/12 12:52, Andy Seaborne wrote:
>>>
>>> In JENA-279, the issue of whether the NodeCache serves any useful
>>> purpose these days has come up.
>>>
>>> Proposal: Remove the node cache
>>> Proposal: Remove the triple cache
>>>
>>> Node cache:
>>>
>>> There are two reasons for the cache: time saving (object creation costs)
>>> and space saving (reusing nodes).  I'm not sure either of these applies much
>>> nowadays.  Java has moved on; parsers should be doing the caching, so the
>>> cache is per-run.
>>>
>>> TDB does its own thing because it is caching the node file and the
>>> cache is NodeId to Node.
>>>
>>> RIOT, for IRIs, does its own thing because it is coupled with caching
>>> IRI parsing, which is expensive because it's picky.
>>>
>>> A quick test: parsing a file:
>>> - - - - - - - - - - - - - -
>>> With node cache:
>>> bsbm-25m.nt.gz : 183.27 sec  25,000,250 triples  136,415.85 TPS
>>>
>>> Without node cache:
>>> Node.cache(false) ;
>>> bsbm-25m.nt.gz : 179.19 sec  25,000,250 triples  139,514.99 TPS
>>> - - - - - - - - - - - - - -
>>>
>>> so I think it is better to remove the Node and Triple caches and make
>>> reuse of Nodes (space saving, if any) the responsibility of the creation
>>> code (typically a parser or a persistent-to-memory storage unit).
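[For illustration, per-run caching in the creation code might look like this sketch; ParserNode and ParseRun are hypothetical names, not ARP or RIOT classes:]

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Jena's Node; illustrative only.
final class ParserNode {
    final String lexical;
    ParserNode(String lexical) { this.lexical = lexical; }
}

// The cache lives with a single (single-threaded) parse run and is
// discarded when the run ends, so nothing is retained process-wide.
final class ParseRun {
    private final Map<String, ParserNode> seen = new HashMap<>();

    // Reuse a node if this run has already produced it.
    ParserNode node(String lexical) {
        return seen.computeIfAbsent(lexical, ParserNode::new);
    }
}
```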
>>>
>>> I will check ARP to see what it does (unless anyone already knows ...)
>>>
>>> There are other caches at the Resource level, so there is some overlap there.
>>>
>>> Triple cache:
>>>
>>> There is a Triple cache as well, although a lot of code goes directly to
>>> new Triple().
>>>
>>> But any storage layer already checks for an existing triple on insertion,
>>> so there is no space saving within one graph.  The rules engine has two
>>> graphs, so there is not much saving there either.  In fact, the cache
>>> overhead is a net cost!
>>>
>>> There is no Quad cache.
>>>
>>>      Andy
>>>
>>
