One question is what level of inferencing is really necessary for definitions
like the following; it is not yet clear to me which OWL constructs require
which level of inference support.

:Prostate_Cancer a owl:Class ;
        owl:unionOf (
                # These identify ICD9 codes for Prostate Cancer,
                # which exist in a large class hierarchy.
                HOM_ICD9:HOM_ICD_10566  # V10.46
                HOM_ICD9:HOM_ICD_1767   # 233.4
                HOM_ICD9:HOM_ICD_1343   # 185
        ) .

:Cohort1 a owl:Class ;
        owl:equivalentClass [
                a owl:Restriction ;
                # which patients have a diagnosis associated with prostate cancer?
                owl:onProperty patient:hasDiagnosis ;
                owl:someValuesFrom :Prostate_Cancer
        ] .

Except for taking the ICD9 class hierarchy into account, this is not really 
much more than a simple database query.
The nice aspect of doing this in OWL is that we can define these sets, like 
:Prostate_Cancer and :Cohort1, and then ask other questions of these sets.
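For example, once the reasoner has classified the data, I would expect listing
the members of :Cohort1 to look roughly like this (a sketch only; the namespace
is a placeholder and omodel is an inference-backed OntModel built as in the
code quoted below):

// Sketch: NS is a placeholder namespace, omodel uses OWL_MEM_MICRO_RULE_INF.
String NS = "http://example.org/cohorts#";
OntClass cohort = omodel.getOntClass(NS + "Cohort1");
for (ExtendedIterator<? extends OntResource> it = cohort.listInstances(); it.hasNext(); ) {
    System.out.println(it.next());   // each individual classified into Cohort1
}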

I thought that with TDB running on a 64-bit Linux box and using memory-mapped
I/O, TDB could pull everything into memory quickly and efficiently, avoiding
the many fine-grained SQL calls that SDB makes to a MySQL server.


I did use writeAll for writing the OntModel.

Regarding your suggestion of
(1) Precompute all inferences, store those, then at runtime work with plain (no
inference at all) models over that stored closure.

Would I need to do this for EVERYTHING, including the declarations above for 
Prostate_Cancer and Cohort1?
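To make sure I understand option (1), is the idea roughly the following? (A
sketch only; the TDB directory is a placeholder and the way I copy the closure
out of the inference model is a guess on my part.)

// Build the inference model once, offline.
Model base = ModelFactory.createDefaultModel();
base.add(Database.getICD9inferredModel());
base.add(Database.getPatientModel());
OntModel inf = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_MICRO_RULE_INF, base);

// Materialise the base statements plus all entailments into a plain model ...
Model closure = ModelFactory.createDefaultModel();
closure.add(inf);

// ... and store that closure in TDB, to be queried later with no reasoner attached.
Dataset stored = TDBFactory.createDataset("/data/tdb-closure");   // placeholder location
stored.getDefaultModel().add(closure);
stored.close();

I assume the rdf:type triples for members of :Cohort1 would then already be
present in the stored closure, so a plain query could find them.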


-----Original Message-----
From: Dave Reynolds [mailto:[email protected]] 
Sent: Monday, September 19, 2011 11:39 AM
To: [email protected]
Subject: Re: benchmarking

Hi,

On Mon, 2011-09-19 at 15:04 +0000, David Jordan wrote: 
> I have switched over from SDB to TDB to see if I can get better performance.
> In the following, Database is a class of mine that insulates the code from
> knowing whether it is SDB or TDB.
> 
> I do the following, which combines 2 models I have stored in TDB and then 
> reads a third small model from a file that contains some classes I want to 
> “test”. I then have some code that times how long it takes to get a 
> particular class and list its instances.
> 
> Model model1 = Database.getICD9inferredModel();
> Model model2 = Database.getPatientModel();
> OntModel omodel = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_MICRO_RULE_INF, model1);
> omodel.add(model2);

That is running a full rule reasoner over the TDB model. As I've mentioned
before, the rule inference engines store everything in memory, so that doesn't
give you any scaling advantage over simply loading the file into memory and
doing inference over that; it just goes very, very slowly!

> InputStream in = FileManager.get().open(fileName);
> omodel.read(in, baseName, "TURTLE");
> 
> OntClass oclass = omodel.getOntClass(line);   // access the class
> 
> On the first call to getOntClass, I have been seeing a VERY long wait (around 
> an hour) before I get a response.
> Then after that first call, subsequent calls are much faster.
> But I started looking at the CPU utilization. After the call to getOntClass, 
> CPU utilization is very close to 0.
> Is this to be expected?

Seems plausible; the inference engines are in effect issuing a huge number of
triple queries to TDB, which will spend most of its time waiting for the disk.

If you really need to run live inference over the entire dataset, load it into
a memory model first and then construct your inference model over that.
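Something like this (just a sketch, reusing your Database helper; adjust to
your setup):

// Copy the TDB-backed data into a plain in-memory model first ...
Model mem = ModelFactory.createDefaultModel();
mem.add(Database.getICD9inferredModel());
mem.add(Database.getPatientModel());

// ... then attach the rule reasoner to the in-memory copy, not to TDB.
OntModel omodel = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_MICRO_RULE_INF, mem);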

> Is there any form of tracing/logging that can be turned on to determine what 
> (if anything) is happening?
> 
> Is there something I am doing wrong in setting up my models?
> For the ICD9 ontology I am using, I had read in the OWL data, created an 
> OntModel with it, wrote this OntModel data out.
> Then I store the data from the OntModel into TDB, so it supposedly does not 
> have to do as much work at runtime.

As Chris says, make sure you are using writeAll, not just plain write, to store
the OntModel.
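i.e. something along these lines (a sketch; the output file name is just an
example):

// write() serialises only the base model; writeAll() serialises the whole
// union, including the imports closure and any inferred statements.
omodel.writeAll(new FileOutputStream("icd9-closure.ttl"), "TURTLE", null);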

That aside, this doesn't necessarily save you much work, because the rules have
to run anyway; they are just not discovering much that is new.

In the absence of a highly scalable inference solution for Jena (something
which can't be done without resourcing), your two good options are:

(1) Precompute all inferences, store those, then at runtime work with plain (no 
inference at all) models over that stored closure.

(2) Load all the data into memory and run inference over that.

Dave


