OK. I have several different needs: some data can be brought entirely into
memory and needs to be very fast, while other data can be accessed from the
database. I am trying to get a better understanding of the architectural
choices that are possible and the technical implications of those choices. A
document describing all of this would be really nice.
-----Original Message-----
From: Dave Reynolds [mailto:[email protected]]
Sent: Friday, September 09, 2011 9:06 AM
To: [email protected]
Subject: RE: combining models and inferencing

On Fri, 2011-09-09 at 12:54 +0000, David Jordan wrote:
> In terms of caching, I am talking about caching the entire data set in
> memory (as opposed to prior query results), with efficient lookup data
> structures as well. I was thinking more of a larger server environment
> with essentially an in-memory database, initially read from a
> disk-based database, but then all brought into memory.

Fair enough, in that case just add the data to a mem model - no need for
all the messing with subModels/datasets that I talked about :)

[I don't know how the performance of copying the entire SDB image into
memory compares to reading a compressed N-Triples file, but that might
be another option for the static parts of the data.]

Dave

> -----Original Message-----
> From: Dave Reynolds [mailto:[email protected]]
> Sent: Friday, September 09, 2011 3:53 AM
> To: [email protected]
> Subject: RE: combining models and inferencing
>
> Hi,
>
> On Thu, 2011-09-08 at 17:09 +0000, David Jordan wrote:
> > Thanks Dave for explaining the various approaches that can be used
> > and some of the implications of each.
> > I think I need option 2. Thanks for clarifying the differences that
> > result from add versus addSubModel.
> > I guess my real question was whether these sub-models need to be
> > wrapped in their own inference model before they are "added" to the
> > "parent" inference model. I am guessing the answer is no.
>
> The parent shouldn't be an inference model, just an OntModel, assuming
> you are trying to avoid running inference over your pre-inferred data.
> In which case, if you want the inference closure of one of the
> components, you'll need to make that component an InfModel.
>
> > It sounds like SDB always goes to the database when the Java API is
> > used. It sounds like there is no caching at all by the API.
>
> Correct, no caching at the RDF level. However, the database itself may
> well do some caching.
>
> > Does TDB or Joseki do any caching in memory?
>
> TDB is the same. There is no RDF-level caching, but the OS caching of
> the memory-mapped blocks can be quite effective depending on data
> sizes and access patterns.
>
> Joseki/Fuseki are not different stores; they are web server
> encapsulations of the Jena stores, i.e. TDB, SDB or memory.
>
> > Are there any plans to support caching in memory? Is there a way to
> > open a model that has its data stored in the database, but then have
> > it cached in memory in an efficient form for very fast access and
> > inferencing?
>
> Short answer is no, no active plans.
>
> You can certainly load a database-backed model into memory by doing an
> add, but the point of using the database is typically to get scaling
> beyond what can fit in memory.
>
> Caching is quite hard to do at the RDF level since there is no
> equivalent of an "object" to act as a unit of cache management. Each
> query typically does a complex set of joins across different slices of
> the data; it is relatively rare to find that one query is a neat
> subset of a previous query that you have already cached.
>
> Dave
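As a minimal sketch of the "just add the data to a mem model" advice above,
using the Jena 2.x API of the time. The assembler file name (sdb.ttl), the
model URI, and the dump file name are placeholder assumptions, not details
from the thread:

    import java.io.FileInputStream;
    import java.util.zip.GZIPInputStream;

    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;
    import com.hp.hpl.jena.sdb.SDBFactory;
    import com.hp.hpl.jena.sdb.Store;

    public class LoadIntoMemory {
        public static void main(String[] args) throws Exception {
            // Open the SDB store described by an assembler file
            // (placeholder name).
            Store store = SDBFactory.connectStore("sdb.ttl");

            // Database-backed model: reads go to the database.
            Model dbModel = SDBFactory.connectNamedModel(
                    store, "http://example.org/bigModel");

            // Copy everything into a plain memory model. After this,
            // queries against memModel never touch the database.
            Model memModel = ModelFactory.createDefaultModel();
            memModel.add(dbModel);

            // Alternative for static data: read a gzip-compressed
            // N-Triples dump instead of copying out of the database.
            Model fromFile = ModelFactory.createDefaultModel();
            fromFile.read(new GZIPInputStream(
                    new FileInputStream("static-data.nt.gz")),
                    null, "N-TRIPLE");

            store.getConnection().close();
        }
    }

The trade-off is start-up cost and heap usage proportional to the data
size, in exchange for purely in-memory query and inference speed.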
>
> > -----Original Message-----
> > From: Dave Reynolds [mailto:[email protected]]
> > Sent: Thursday, September 08, 2011 12:47 PM
> > To: [email protected]
> > Subject: Re: combining models and inferencing
> >
> > On Thu, 2011-09-08 at 15:28 +0000, David Jordan wrote:
> > > I am getting ready to write my first application that combines
> > > several models that have already been populated in SDB. This
> > > application will be doing inferencing to answer some questions.
> > >
> > > One of the models has a relatively large hierarchy, tens of
> > > thousands of classes. I created a fully inferenced version of this
> > > model and saved it into the database as a separate model. This is
> > > one of the models I will combine in the application.
> > >
> > > I then have several fairly small models that I have created and
> > > stored, all of these in SDB.
> > >
> > > Now I am going to write an application that reads a small OWL file
> > > into a new inference model, adding these predefined models stored
> > > in the database.
> >
> > What do you mean by "add" here?
> >
> > If you mean literally model.add then that will load all the data
> > from your database into the memory model. This may not be what you
> > want.
> >
> > > Assume the following models:
> > > 1. A large pre-inferenced model (stored in SDB)
> > > 2. Three models that have not been pre-inferenced (stored in SDB)
> > > 3. A newly created in-memory model with inferencing
> > >
> > > I will be combining these models together. When I access the
> > > pre-existing models in SDB, do I need to create an inference model
> > > for them, or will that happen automatically because I am combining
> > > them with the new in-memory inference model?
> >
> > As above, it depends on what you mean by "combine".
> >
> > To give a single view over such a disparate collection of models you
> > have a few options:
> >
> > 1. Load all the data into a single memory model, with inference
> > (presumably not what you mean to do).
> >
> > 2. Create an OntModel (without inference) and use addSubModel to add
> > each component model as a dynamic union; if you want the inference
> > closure of your in-memory model, wrap it as an InfModel before you
> > addSubModel it to the union.
> >
> > 3. Use SPARQL Datasets and access all your data with SPARQL rather
> > than the Java API, being explicit about the named graphs you are
> > querying.
> >
> > Note that there are some complex tradeoffs here, both in terms of
> > exactly what inferences you are trying to do and what your access
> > patterns are.
> >
> > In particular, accessing things at the API level via dynamic unions
> > may be slow for the database parts because you are asking for one
> > triple pattern at a time and doing joins in your application code,
> > whereas via SPARQL you can delegate some of the joins to the
> > database. Note that you can query the union of all your SDB models
> > by switching on the SDB.unionDefaultGraph flag.
> >
> > Dave
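A sketch of option 2 in Jena 2.x terms, assuming the models described in
this thread. The store assembler file, model URIs, and OWL file name are
placeholders:

    import com.hp.hpl.jena.ontology.OntModel;
    import com.hp.hpl.jena.ontology.OntModelSpec;
    import com.hp.hpl.jena.rdf.model.InfModel;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;
    import com.hp.hpl.jena.reasoner.ReasonerRegistry;
    import com.hp.hpl.jena.sdb.SDBFactory;
    import com.hp.hpl.jena.sdb.Store;

    public class DynamicUnion {
        public static void main(String[] args) {
            Store store = SDBFactory.connectStore("sdb.ttl");

            // Database-backed components; no inference is run over
            // these (the large one is already pre-inferenced).
            Model preInferenced = SDBFactory.connectNamedModel(
                    store, "http://example.org/preInferenced");
            Model small1 = SDBFactory.connectNamedModel(
                    store, "http://example.org/small1");

            // The new in-memory model, read from an OWL file, wrapped
            // as an InfModel so its inference closure is visible.
            Model fresh = ModelFactory.createDefaultModel();
            fresh.read("file:new-data.owl");
            InfModel freshClosure = ModelFactory.createInfModel(
                    ReasonerRegistry.getOWLReasoner(), fresh);

            // Parent is an OntModel *without* a reasoner (OWL_MEM):
            // the sub-models form a dynamic union, queried one triple
            // pattern at a time.
            OntModel union =
                    ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM);
            union.addSubModel(preInferenced);
            union.addSubModel(small1);
            union.addSubModel(freshClosure);

            // Query 'union' through the Model/OntModel API as usual.
        }
    }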
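And a sketch of option 3, querying via SPARQL so the joins run in the
database, with the union-default-graph flag switched on. Setting the flag
on the per-execution context, as here, rather than on the global context is
an assumption worth checking against the SDB documentation:

    import com.hp.hpl.jena.query.Dataset;
    import com.hp.hpl.jena.query.QueryExecution;
    import com.hp.hpl.jena.query.QueryExecutionFactory;
    import com.hp.hpl.jena.query.ResultSet;
    import com.hp.hpl.jena.query.ResultSetFormatter;
    import com.hp.hpl.jena.sdb.SDB;
    import com.hp.hpl.jena.sdb.SDBFactory;
    import com.hp.hpl.jena.sdb.Store;

    public class UnionQuery {
        public static void main(String[] args) {
            Store store = SDBFactory.connectStore("sdb.ttl");
            Dataset dataset = SDBFactory.connectDataset(store);

            String q = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10";
            QueryExecution qe = QueryExecutionFactory.create(q, dataset);

            // Make the default graph the union of all named graphs,
            // so one query spans every stored model.
            qe.getContext().setTrue(SDB.unionDefaultGraph);

            ResultSet rs = qe.execSelect();
            ResultSetFormatter.out(rs);
            qe.close();
            store.getConnection().close();
        }
    }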
