OK. I have several different needs: some data can be brought entirely into
memory and needs to be very fast, while other data can be accessed from the
database. I am trying to get a better understanding of the architectural
choices that are possible and the technical implications of those choices. A
document describing all of this would be really nice.
-----Original Message-----
From: Dave Reynolds [mailto:[email protected]]
Sent: Friday, September 09, 2011 9:06 AM
To: [email protected]
Subject: RE: combining models and inferencing

On Fri, 2011-09-09 at 12:54 +0000, David Jordan wrote:
> In terms of caching, I am talking about caching the entire data set in
> memory (as opposed to prior query results), with efficient lookup data
> structures as well. I was thinking more of a larger server environment
> with essentially an in-memory database, initially read from a
> disk-based database, but then all brought into memory.

Fair enough, in that case just add the data to a mem model - no need for
all the messing with subModels/datasets that I talked about :)

[I don't know how the performance of copying the entire SDB image into
memory compares to reading a compressed N-Triples file, but that might
be another option for the static parts of the data.]

Dave

> -----Original Message-----
> From: Dave Reynolds [mailto:[email protected]]
> Sent: Friday, September 09, 2011 3:53 AM
> To: [email protected]
> Subject: RE: combining models and inferencing
>
> Hi,
>
> On Thu, 2011-09-08 at 17:09 +0000, David Jordan wrote:
> > Thanks Dave for explaining the various approaches that can be used
> > and some of the implications of each.
> > I think I need option 2. Thanks for clarifying the differences that
> > result from add versus addSubModel.
> > I guess my real question was whether these sub-models need to be
> > wrapped in their own inference model before they are "added" to the
> > "parent" inference model. I am guessing the answer is no.
>
> The parent shouldn't be an inference model, just an OntModel, assuming
> you are trying to avoid running inference over your pre-inferred data.
> In which case, if you want the inference closure of one of the
> components, you'll need to make that component an InfModel.
>
> > It sounds like SDB always goes to the database when the Java API is
> > used. It sounds like there is no caching at all by the API.
>
> Correct, no caching at the RDF level. However, the database itself may
> well do some caching.
>
> > Does TDB or Joseki do any caching in memory?
>
> TDB is the same. There is no RDF-level caching, but the OS caching of
> the memory-mapped blocks can be quite effective depending on data
> sizes and access patterns.
>
> Joseki/Fuseki are not different stores; they are web server
> encapsulations of the Jena stores, i.e. TDB, SDB or memory.
>
> > Are there any plans to support caching in memory? Is there a way to
> > open a model that has its data stored in the database, but then have
> > it cached in memory in an efficient form for very fast access and
> > inferencing?
>
> Short answer is no, no active plans.
>
> You can certainly load a database-backed model into memory by doing an
> add, but the point of using the database is typically to get scaling
> beyond what can fit in memory.
>
> Caching is quite hard to do at the RDF level since there is no
> equivalent of an "object" to act as a unit of cache management. Each
> query typically does a complex set of joins across different slices of
> the data; it is relatively rare to find that one query is a neat
> subset of a previous query that you have already cached.
>
> Dave
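As a minimal sketch of the "just add the data to a mem model" advice above,
using the Jena 2.x API of the time. The assembler file name (sdb.ttl), the
model URI, and the dump file name are placeholder assumptions, not details
from the thread:

    import java.io.FileInputStream;
    import java.util.zip.GZIPInputStream;

    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;
    import com.hp.hpl.jena.sdb.SDBFactory;
    import com.hp.hpl.jena.sdb.Store;

    public class LoadIntoMemory {
        public static void main(String[] args) throws Exception {
            // Open the SDB store described by an assembler file
            // (placeholder name).
            Store store = SDBFactory.connectStore("sdb.ttl");

            // Database-backed model: reads go to the database.
            Model dbModel = SDBFactory.connectNamedModel(
                    store, "http://example.org/bigModel");

            // Copy everything into a plain memory model. After this,
            // queries against memModel never touch the database.
            Model memModel = ModelFactory.createDefaultModel();
            memModel.add(dbModel);

            // Alternative for static data: read a gzip-compressed
            // N-Triples dump instead of copying out of the database.
            Model fromFile = ModelFactory.createDefaultModel();
            fromFile.read(new GZIPInputStream(
                    new FileInputStream("static-data.nt.gz")),
                    null, "N-TRIPLE");

            store.getConnection().close();
        }
    }

The trade-off is start-up cost and heap usage proportional to the data
size, in exchange for purely in-memory query and inference speed.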
>
> > -----Original Message-----
> > From: Dave Reynolds [mailto:[email protected]]
> > Sent: Thursday, September 08, 2011 12:47 PM
> > To: [email protected]
> > Subject: Re: combining models and inferencing
> >
> > On Thu, 2011-09-08 at 15:28 +0000, David Jordan wrote:
> > > I am getting ready to write my first application that combines
> > > several models that have already been populated in SDB. This
> > > application will be doing inferencing to answer some questions.
> > >
> > > One of the models has a relatively large hierarchy, tens of
> > > thousands of classes. I created a fully inferenced version of this
> > > model and saved it into the database as a separate model. This is
> > > one of the models I will combine in the application.
> > >
> > > I then have several fairly small models that I have created and
> > > stored, all of these in SDB.
> > >
> > > Now I am going to write an application that reads a small OWL file
> > > into a new inference model, adding these predefined models stored
> > > in the database.
> >
> > What do you mean by "add" here?
> >
> > If you mean literally model.add then that will load all the data
> > from your database into the memory model. This may not be what you
> > want.
> >
> > > Assume the following models:
> > > 1. A large pre-inferenced model (stored in SDB)
> > > 2. Three models that have not been pre-inferenced (stored in SDB)
> > > 3. A newly created in-memory model with inferencing
> > >
> > > I will be combining these models together. When I access the
> > > pre-existing models in SDB, do I need to create an inference model
> > > for them, or will that happen automatically because I am combining
> > > them with the new in-memory inference model?
> >
> > As above, it depends on what you mean by "combine".
> >
> > To give a single view over such a disparate collection of models you
> > have a few options:
> >
> > 1. Load all the data into a single memory model, with inference
> > (presumably not what you mean to do).
> >
> > 2. Create an OntModel (without inference) and use addSubModel to add
> > each component model as a dynamic union; if you want the inference
> > closure of your in-memory model, wrap it as an InfModel before you
> > addSubModel it to the union.
> >
> > 3. Use SPARQL Datasets and access all your data with SPARQL rather
> > than the Java API, being explicit about the named graphs you are
> > querying.
> >
> > Note that there are some complex tradeoffs here, both in terms of
> > exactly what inferences you are trying to do and what your access
> > patterns are.
> >
> > In particular, accessing things at the API level via dynamic unions
> > may be slow for the database parts because you are asking for one
> > triple pattern at a time and doing joins in your application code,
> > whereas via SPARQL you can delegate some of the joins to the
> > database. Note that you can query the union of all your SDB models
> > by switching on the SDB.unionDefaultGraph flag.
> >
> > Dave
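A sketch of option 2 in Jena 2.x terms, assuming the models described in
this thread. The store assembler file, model URIs, and OWL file name are
placeholders:

    import com.hp.hpl.jena.ontology.OntModel;
    import com.hp.hpl.jena.ontology.OntModelSpec;
    import com.hp.hpl.jena.rdf.model.InfModel;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;
    import com.hp.hpl.jena.reasoner.ReasonerRegistry;
    import com.hp.hpl.jena.sdb.SDBFactory;
    import com.hp.hpl.jena.sdb.Store;

    public class DynamicUnion {
        public static void main(String[] args) {
            Store store = SDBFactory.connectStore("sdb.ttl");

            // Database-backed components; no inference is run over
            // these (the large one is already pre-inferenced).
            Model preInferenced = SDBFactory.connectNamedModel(
                    store, "http://example.org/preInferenced");
            Model small1 = SDBFactory.connectNamedModel(
                    store, "http://example.org/small1");

            // The new in-memory model, read from an OWL file, wrapped
            // as an InfModel so its inference closure is visible.
            Model fresh = ModelFactory.createDefaultModel();
            fresh.read("file:new-data.owl");
            InfModel freshClosure = ModelFactory.createInfModel(
                    ReasonerRegistry.getOWLReasoner(), fresh);

            // Parent is an OntModel *without* a reasoner (OWL_MEM):
            // the sub-models form a dynamic union, queried one triple
            // pattern at a time.
            OntModel union =
                    ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM);
            union.addSubModel(preInferenced);
            union.addSubModel(small1);
            union.addSubModel(freshClosure);

            // Query 'union' through the Model/OntModel API as usual.
        }
    }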
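And a sketch of option 3, querying via SPARQL so the joins run in the
database, with the union-default-graph flag switched on. Setting the flag
on the per-execution context, as here, rather than on the global context is
an assumption worth checking against the SDB documentation:

    import com.hp.hpl.jena.query.Dataset;
    import com.hp.hpl.jena.query.QueryExecution;
    import com.hp.hpl.jena.query.QueryExecutionFactory;
    import com.hp.hpl.jena.query.ResultSet;
    import com.hp.hpl.jena.query.ResultSetFormatter;
    import com.hp.hpl.jena.sdb.SDB;
    import com.hp.hpl.jena.sdb.SDBFactory;
    import com.hp.hpl.jena.sdb.Store;

    public class UnionQuery {
        public static void main(String[] args) {
            Store store = SDBFactory.connectStore("sdb.ttl");
            Dataset dataset = SDBFactory.connectDataset(store);

            String q = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10";
            QueryExecution qe = QueryExecutionFactory.create(q, dataset);

            // Make the default graph the union of all named graphs,
            // so one query spans every stored model.
            qe.getContext().setTrue(SDB.unionDefaultGraph);

            ResultSet rs = qe.execSelect();
            ResultSetFormatter.out(rs);
            qe.close();
            store.getConnection().close();
        }
    }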
