Thanks Andy, this was my suspicion. I ruled out the database as the cause by 
doing a first run to get all the data I wanted, stop the application, then 
immediately re-ran it. Results were the same, so the database was not the 
cause. I did not know whether the inferences were calculated in a dynamic, lazy 
evaluation fashion, or done in bulk up front. This test and your explanation 
confirms this happens on the first call.

I do plan on trying TDB for the ICD9 data, but cannot use it for all the data 
because of the need for concurrent updating for some of the other data. I'll be 
anxious to see how much TDB improves the results.

-----Original Message-----
From: Andy Seaborne [mailto:[email protected]] 
Sent: Monday, September 12, 2011 4:20 AM
To: [email protected]
Subject: Re: some surprising performance results



On 11/09/11 17:29, David Jordan wrote:
>
>
> I am getting some performance numbers that don't make sense based on
my understanding of how Jena and SDB work. It was my understanding that nothing 
is cached in memory, that each API call will result in queries performed 
against the database. But I am getting very different results, suggesting that 
on the very first call, a significant amount of processing takes place to 
produce the results, that is then used for subsequent calls.

It is true that SDB does not do client-side caching (the DB server is cachign 
liek crazy as all SQL systems do).  But if you put an inference model over a 
SDB model, then the inference layer is caching.  A lot of work can happen on 
first use, then the inferences are already calculated.

You can confimr this by looking at the DB log - if I underatand your setup, 
operations are the first shoudl arely if all cause any SQL traffic.

TDB is going to be faster to back an ontology model with inference.  A lot of 
quite fine grained data requests are done and SDB incurs JDBC overhead on each 
one.

        Andy

>
> I’ll try to explain in detail what I have. There is an ICD9 ontology,
it is strictly a hierarchy of 11874 classes, each class has a singleton, the 
hierarchy is around 5 levels deep. Since this is a read-only set of data, I 
created an OntModel with the appropriate level of reasoning, output this and 
then read it into another model. Below is a list of the number of nodes for 
that ICD9 model, in both forms, to give you an idea of how large it is.

> mysql>  select lex, count(*) from Nodes, Quads where g = hash group by 
> mysql> lex;
> +---------------------------------------------------------+----------+
> | lex                                                     | count(*) |
> +---------------------------------------------------------+----------+
> | http://purl.bioontology.org/ontology/HOM_ICD9/          |    73420 |
> | http://purl.bioontology.org/ontology/HOM_ICD9_inferred/ |   262024 |
> | http://www.sas.com/hls/hoa/patient/                     |     1282 |
> +---------------------------------------------------------+----------+
>
> Though I am reading in the patient ontology, I am not yet using it, that will 
> be my next test.
>
> For the benchmark application, I read in a file that contains the following 
> descriptions of diseases, defined via a set of ICD9 codes. I add the 
> HOM_ICD9_inferred and patient ontologies using addSubModel().
>
> @prefix :<http://www.sas.com/hls/ ex/>  .
> @prefix HOM_ICD9:<http://purl.bioontology.org/ontology/HOM_ICD9/>  .
> @prefix xsd:<http://www.w3.org/2001/XMLSchema#>  .
> @prefix rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>  .
> @prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#>  .
> @prefix owl:<http://www.w3.org/2002/07/owl#>  .
>
> :Disease rdf:type owl:Class ;
>          rdfs:label "Disease"^^xsd:string .
>
> :Colon_Cancer rdf:type owl:Class ;
>          rdfs:subClassOf :Disease ;
>          rdfs:label "Colon Cancer"^^xsd:string .
>
> :Anal_Cancer rdf:type owl:Class ;
>          rdfs:subClassOf :Disease ;
>          rdfs:label "Anal Cancer"^^xsd:string .
>
> :Lung_Cancer rdf:type owl:Class ;
>          rdfs:subClassOf :Disease ;
>          rdfs:label "Lung Cancer"^^xsd:string .
>
> :Breast_Cancer rdf:type owl:Class ;
>          rdfs:subClassOf :Disease ;
>          rdfs:label "Breast Cancer"^^xsd:string .
>
> :Prostate_Cancer rdf:type owl:Class ;
>          rdfs:subClassOf :Disease ;
>          rdfs:label "Prostate Cancer"^^xsd:string .
>
> :Breast_Cancer a owl:Class ;
>          owl:unionOf (
>                  HOM_ICD9:HOM_ICD_10558  # V10.3
>                  HOM_ICD9:HOM_ICD_1292   # 174
>                  HOM_ICD9:HOM_ICD_1297   # 174.4
>                  HOM_ICD9:HOM_ICD_1300   # 174.8
>                  HOM_ICD9:HOM_ICD_1294   # 174.1
>                  HOM_ICD9:HOM_ICD_1295   # 174.2
>                  HOM_ICD9:HOM_ICD_1299   # 174.6
>                  HOM_ICD9:HOM_ICD_1758   # 233
>                  HOM_ICD9:HOM_ICD_1296   # 174.3
>                  HOM_ICD9:HOM_ICD_1298   # 174.5
>                  HOM_ICD9:HOM_ICD_1304   # 175.9
>                  HOM_ICD9:HOM_ICD_1301   # 174.9
>                  HOM_ICD9:HOM_ICD_1302   # 175
>          ) .
>
> :Lung_Cancer a owl:Class ;
>          owl:unionOf (
>                  HOM_ICD9:HOM_ICD_1231   # 162.9
>                  HOM_ICD9:HOM_ICD_1744   # 231.2
>                  HOM_ICD9:HOM_ICD_10551  # V10.11
>                  HOM_ICD9:HOM_ICD_1228   # 162.4
>                  HOM_ICD9:HOM_ICD_1229   # 162.5
>                  HOM_ICD9:HOM_ICD_1227   # 162.3
>                  HOM_ICD9:HOM_ICD_1230   # 162.8
>                  HOM_ICD9:HOM_ICD_1544   # 209.21
>                  HOM_ICD9:HOM_ICD_1226   # 162.2
>          ) .
>
> :Colon_Cancer a owl:Class ;
>          owl:unionOf (
>                  HOM_ICD9:HOM_ICD_10546  # V10.05
>                  HOM_ICD9:HOM_ICD_1170   # 153.6
>                  HOM_ICD9:HOM_ICD_1166   # 153.2
>                  HOM_ICD9:HOM_ICD_1169   # 153.5
>                  HOM_ICD9:HOM_ICD_1163   # 153
>                  HOM_ICD9:HOM_ICD_1165   # 153.1
>                  HOM_ICD9:HOM_ICD_1734   # 230.3
>                  HOM_ICD9:HOM_ICD_1173   # 153.9
>                  HOM_ICD9:HOM_ICD_1168   # 153.4
>                  HOM_ICD9:HOM_ICD_1540   # 209.16
>                  HOM_ICD9:HOM_ICD_1539   # 209.15
>                  HOM_ICD9:HOM_ICD_1538   # 209.14
>                  HOM_ICD9:HOM_ICD_1537   # 209.13
>                  HOM_ICD9:HOM_ICD_1172   # 153.8
>                  HOM_ICD9:HOM_ICD_1536   # 209.12
>                  HOM_ICD9:HOM_ICD_1535   # 209.11
>                  HOM_ICD9:HOM_ICD_1171   # 153.7
>                  HOM_ICD9:HOM_ICD_1533   # 209.1
>                  HOM_ICD9:HOM_ICD_1167   # 153.3
>                  HOM_ICD9:HOM_ICD_1202   # 159
>          ) .
>
> :Prostate_Cancer a owl:Class ;
>          owl:unionOf (
>                  HOM_ICD9:HOM_ICD_10566  # V10.46
>                  HOM_ICD9:HOM_ICD_1767   # 233.4
>                  HOM_ICD9:HOM_ICD_1343   # 185
>          ) .
>
> :Anal_Cancer a owl:Class ;
>          owl:unionOf (
>                  HOM_ICD9:HOM_ICD_1179   # 154.8
>                  HOM_ICD9:HOM_ICD_10547  # V10.06
>                  HOM_ICD9:HOM_ICD_1178   # 154.3
>                  HOM_ICD9:HOM_ICD_8412   # 796.76
>                  HOM_ICD9:HOM_ICD_8410   # 796.74
>                  HOM_ICD9:HOM_ICD_8409   # 796.73
>                  HOM_ICD9:HOM_ICD_1177   # 154.2
>                  HOM_ICD9:HOM_ICD_8408   # 796.72
>                  HOM_ICD9:HOM_ICD_8407   # 796.71
>                  HOM_ICD9:HOM_ICD_8405   # 796.7
>                  HOM_ICD9:HOM_ICD_1737   # 230.6
>                  HOM_ICD9:HOM_ICD_1736   # 230.5
>                  HOM_ICD9:HOM_ICD_1174   # 154
>                  HOM_ICD9:HOM_ICD_1735   # 230.4
>                  HOM_ICD9:HOM_ICD_1541   # 209.17
> ) .
>
> Below are the results from running my program on a laptop. Once I do this on 
> our server, I expect to get better numbers, but the general pattern of time 
> differences will likely be the same. As you can see, the very first access of 
> the Prostate_Cancer class and its instances to 4081 seconds, for just 3 
> classes involved. Yet subsequent calls to the other disease classes took 
> substantially less time. Can this all be attributed to the time it takes the 
> database to cache things, or is something going on in Jena that explains 
> this???
>
> I’ll be running this on our Linux server and will also try TDB for the ICD9 
> ontology to see if how much difference it makes. But it would be really nice 
> to hear if there is a Jena or SDB specific explanation for these results.
>
> Performance will be an important consideration for this application. I want 
> to be sure I am doing things in a way that gets to best possible performance 
> when using Jena.
>
>
>
> Loading http://purl.bioontology.org/ontology/HOM_ICD9_inferred/ took 
> 0.359 seconds Loading http://www.sas.com/hls/patient/ took 0.0 seconds 
> http://www.sas.com/hls/ex/Prostate_Cancer
> 3 instances found.
> Access took 4080.932 seconds
>
> http://www.sas.com/hls/ex/Breast_Cancer
> 28 instances found.
> Access took 3.313 seconds
>
> http://www.sas.com/hls/ex/Lung_Cancer
> 9 instances found.
> Access took 1.062 seconds
>
> http://www.sas.com/hls/ex/Anal_Cancer
> 17 instances found.
> Access took 1.656 seconds
>
> http://www.sas.com/hls/ex/Colon_Cancer
> 25 instances found.
> Access took 2.641 seconds
>
> http://www.sas.com/hls/ex/Disease
> 81 instances found.
> Access took 4.891 seconds
>
>
> David Jordan
> Software Developer
> SAS Institute Inc.
> Health&  Life Sciences, Research&  Development Bldg R ▪ Office 4467
> 600 Research Drive ▪ Cary, NC 27513
> Tel: 919 531 1233 ▪ [email protected]<mailto:[email protected]>
> www.sas.com<http://www.sas.com>
> SAS® … THE POWER TO KNOW®
>
>
>

Reply via email to