Additionally, a fun thing to look at for Node would be to have a static cache pre-populated with commonly used resources (RDF, RDFS, OWL, etc.), similar to Java's Integer.valueOf(int) method. That could be useful.
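To make that concrete, here is a rough sketch of the idea. IriNode, WELL_KNOWN and valueOf(String) are made-up names for illustration, not the existing Node API, and the list of pre-populated IRIs is only a guess at what would count as commonly used:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a URI node; this is NOT Jena's actual Node class. */
final class IriNode {
    // Filled once during class initialisation and never modified afterwards,
    // so concurrent lookups are safe without any locking.
    private static final Map<String, IriNode> WELL_KNOWN = new HashMap<>();

    static {
        String rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
        String rdfs = "http://www.w3.org/2000/01/rdf-schema#";
        String owl = "http://www.w3.org/2002/07/owl#";
        // Illustrative selection only; a real list would be chosen from usage data.
        String[] common = {
            rdf + "type", rdf + "Property", rdf + "first", rdf + "rest", rdf + "nil",
            rdfs + "label", rdfs + "comment", rdfs + "subClassOf", rdfs + "domain", rdfs + "range",
            owl + "Class", owl + "ObjectProperty", owl + "sameAs"
        };
        for (String iri : common) {
            WELL_KNOWN.put(iri, new IriNode(iri));
        }
    }

    private final String iri;

    private IriNode(String iri) {
        this.iri = iri;
    }

    /** Returns the shared instance for a well-known IRI, otherwise a fresh node. */
    static IriNode valueOf(String iri) {
        IriNode cached = WELL_KNOWN.get(iri);
        return cached != null ? cached : new IriNode(iri);
    }

    String getIri() {
        return iri;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof IriNode && ((IriNode) other).iri.equals(iri);
    }

    @Override
    public int hashCode() {
        return iri.hashCode();
    }
}
```

Since the table is read-only after class initialisation, lookups need no lock (unlike the current Node.create()), and anything outside the table is simply allocated as usual.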
-Stephen

On Fri, Jul 20, 2012 at 8:31 AM, Stephen Allen <sal...@apache.org> wrote:
> +1 on removal of both the node and triple caches.
>
> In addition to the reasons already discussed, there is also the fact
> that Node.create() uses a global lock, which is going to be really bad
> for concurrency!
>
> Triple.create() doesn't do any locking, which appears to work out OK
> in this specific instance because the cache never tries to remove
> anything (the worst that could happen would be for two identical
> triples to be floating around when two threads tried to insert the
> same triple at the same time).
>
> -Stephen
>
>
> On Fri, Jul 20, 2012 at 7:59 AM, Dave Reynolds
> <dave.e.reyno...@gmail.com> wrote:
>> Agreed.
>>
>> Primary value of a node cache from my POV is space saving for in-memory
>> models. But that could indeed be done by ARP (if it isn't already) and is
>> probably better done at the resource level.
>>
>> I wouldn't expect any significant effect on the rules engines from scrapping
>> these caches.
>>
>> Dave
>>
>>
>>
>> On 20/07/12 12:52, Andy Seaborne wrote:
>>>
>>> In JENA-279, the issue of whether the NodeCache serves any useful
>>> purpose these days has come up.
>>>
>>> Proposal: Remove the node cache
>>> Proposal: Remove the triple cache
>>>
>>> Node cache:
>>>
>>> There are two reasons for the cache: time saving (object creation costs)
>>> and space saving (reuse nodes). I'm not sure either of these apply much
>>> nowadays. Java has moved on; parsers should be doing the caching, and then
>>> the cache is per-run.
>>>
>>> TDB does its own thing because it is caching the node file and the
>>> cache is NodeId to Node.
>>>
>>> RIOT, for IRIs, does its own thing because it is coupled with caching
>>> IRI parsing which is expensive because it's picky.
>>>
>>> A quick test: parsing a file:
>>> - - - - - - - - - - - - - -
>>> With node cache:
>>> bsbm-25m.nt.gz : 183.27 sec 25,000,250 triples 136,415.85 TPS
>>>
>>> Without node cache:
>>> Node.cache(false) ;
>>> bsbm-25m.nt.gz : 179.19 sec 25,000,250 triples 139,514.99 TPS
>>> - - - - - - - - - - - - - -
>>>
>>> so I think that it is better to remove the Node cache and Triple caches
>>> and put reuse of Nodes (space saving, if any) as the responsibility of
>>> the creation code (which is a parser or persistent-to-memory storage
>>> unit typically).
>>>
>>> I will check ARP to see what it does (unless anyone knows ...)
>>>
>>> There are other caches at the Resource level so there is some overlap there.
>>>
>>> Triple cache:
>>>
>>> There is a Triple cache as well although a lot of code goes direct to
>>> new Triple()
>>>
>>> But any storage layer already does checking for a triple on insertion so
>>> there is no space saving within one graph. The rules engine has two graphs
>>> so there is not much saving there either. In fact, the cache overhead
>>> is a net cost!
>>>
>>> There is no Quad cache.
>>>
>>> Andy
>>>
>>