On 11/09/11 17:29, David Jordan wrote:


I am getting some performance numbers that don't make sense based on
my understanding of how Jena and SDB work. It was my understanding that
nothing is cached in memory, that each API call will result in queries
performed against the database. But I am getting very different results,
suggesting that on the very first call, a significant amount of
processing takes place to produce the results, that is then used for
subsequent calls.

It is true that SDB does not do client-side caching (the DB server is cachign liek crazy as all SQL systems do). But if you put an inference model over a SDB model, then the inference layer is caching. A lot of work can happen on first use, then the inferences are already calculated.

You can confimr this by looking at the DB log - if I underatand your setup, operations are the first shoudl arely if all cause any SQL traffic.

TDB is going to be faster to back an ontology model with inference. A lot of quite fine grained data requests are done and SDB incurs JDBC overhead on each one.

        Andy


I’ll try to explain in detail what I have. There is an ICD9
ontology,
it is strictly a hierarchy of 11874 classes, each class has a singleton,
the hierarchy is around 5 levels deep. Since this is a read-only set of
data, I created an OntModel with the appropriate level of reasoning,
output this and then read it into another model. Below is a list of the
number of nodes for that ICD9 model, in both forms, to give you an idea
of how large it is.

mysql>  select lex, count(*) from Nodes, Quads where g = hash group by lex;
+---------------------------------------------------------+----------+
| lex                                                     | count(*) |
+---------------------------------------------------------+----------+
| http://purl.bioontology.org/ontology/HOM_ICD9/          |    73420 |
| http://purl.bioontology.org/ontology/HOM_ICD9_inferred/ |   262024 |
| http://www.sas.com/hls/hoa/patient/                     |     1282 |
+---------------------------------------------------------+----------+

Though I am reading in the patient ontology, I am not yet using it, that will 
be my next test.

For the benchmark application, I read in a file that contains the following 
descriptions of diseases, defined via a set of ICD9 codes. I add the 
HOM_ICD9_inferred and patient ontologies using addSubModel().

@prefix :<http://www.sas.com/hls/ ex/>  .
@prefix HOM_ICD9:<http://purl.bioontology.org/ontology/HOM_ICD9/>  .
@prefix xsd:<http://www.w3.org/2001/XMLSchema#>  .
@prefix rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>  .
@prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#>  .
@prefix owl:<http://www.w3.org/2002/07/owl#>  .

:Disease rdf:type owl:Class ;
         rdfs:label "Disease"^^xsd:string .

:Colon_Cancer rdf:type owl:Class ;
         rdfs:subClassOf :Disease ;
         rdfs:label "Colon Cancer"^^xsd:string .

:Anal_Cancer rdf:type owl:Class ;
         rdfs:subClassOf :Disease ;
         rdfs:label "Anal Cancer"^^xsd:string .

:Lung_Cancer rdf:type owl:Class ;
         rdfs:subClassOf :Disease ;
         rdfs:label "Lung Cancer"^^xsd:string .

:Breast_Cancer rdf:type owl:Class ;
         rdfs:subClassOf :Disease ;
         rdfs:label "Breast Cancer"^^xsd:string .

:Prostate_Cancer rdf:type owl:Class ;
         rdfs:subClassOf :Disease ;
         rdfs:label "Prostate Cancer"^^xsd:string .

:Breast_Cancer a owl:Class ;
         owl:unionOf (
                 HOM_ICD9:HOM_ICD_10558  # V10.3
                 HOM_ICD9:HOM_ICD_1292   # 174
                 HOM_ICD9:HOM_ICD_1297   # 174.4
                 HOM_ICD9:HOM_ICD_1300   # 174.8
                 HOM_ICD9:HOM_ICD_1294   # 174.1
                 HOM_ICD9:HOM_ICD_1295   # 174.2
                 HOM_ICD9:HOM_ICD_1299   # 174.6
                 HOM_ICD9:HOM_ICD_1758   # 233
                 HOM_ICD9:HOM_ICD_1296   # 174.3
                 HOM_ICD9:HOM_ICD_1298   # 174.5
                 HOM_ICD9:HOM_ICD_1304   # 175.9
                 HOM_ICD9:HOM_ICD_1301   # 174.9
                 HOM_ICD9:HOM_ICD_1302   # 175
         ) .

:Lung_Cancer a owl:Class ;
         owl:unionOf (
                 HOM_ICD9:HOM_ICD_1231   # 162.9
                 HOM_ICD9:HOM_ICD_1744   # 231.2
                 HOM_ICD9:HOM_ICD_10551  # V10.11
                 HOM_ICD9:HOM_ICD_1228   # 162.4
                 HOM_ICD9:HOM_ICD_1229   # 162.5
                 HOM_ICD9:HOM_ICD_1227   # 162.3
                 HOM_ICD9:HOM_ICD_1230   # 162.8
                 HOM_ICD9:HOM_ICD_1544   # 209.21
                 HOM_ICD9:HOM_ICD_1226   # 162.2
         ) .

:Colon_Cancer a owl:Class ;
         owl:unionOf (
                 HOM_ICD9:HOM_ICD_10546  # V10.05
                 HOM_ICD9:HOM_ICD_1170   # 153.6
                 HOM_ICD9:HOM_ICD_1166   # 153.2
                 HOM_ICD9:HOM_ICD_1169   # 153.5
                 HOM_ICD9:HOM_ICD_1163   # 153
                 HOM_ICD9:HOM_ICD_1165   # 153.1
                 HOM_ICD9:HOM_ICD_1734   # 230.3
                 HOM_ICD9:HOM_ICD_1173   # 153.9
                 HOM_ICD9:HOM_ICD_1168   # 153.4
                 HOM_ICD9:HOM_ICD_1540   # 209.16
                 HOM_ICD9:HOM_ICD_1539   # 209.15
                 HOM_ICD9:HOM_ICD_1538   # 209.14
                 HOM_ICD9:HOM_ICD_1537   # 209.13
                 HOM_ICD9:HOM_ICD_1172   # 153.8
                 HOM_ICD9:HOM_ICD_1536   # 209.12
                 HOM_ICD9:HOM_ICD_1535   # 209.11
                 HOM_ICD9:HOM_ICD_1171   # 153.7
                 HOM_ICD9:HOM_ICD_1533   # 209.1
                 HOM_ICD9:HOM_ICD_1167   # 153.3
                 HOM_ICD9:HOM_ICD_1202   # 159
         ) .

:Prostate_Cancer a owl:Class ;
         owl:unionOf (
                 HOM_ICD9:HOM_ICD_10566  # V10.46
                 HOM_ICD9:HOM_ICD_1767   # 233.4
                 HOM_ICD9:HOM_ICD_1343   # 185
         ) .

:Anal_Cancer a owl:Class ;
         owl:unionOf (
                 HOM_ICD9:HOM_ICD_1179   # 154.8
                 HOM_ICD9:HOM_ICD_10547  # V10.06
                 HOM_ICD9:HOM_ICD_1178   # 154.3
                 HOM_ICD9:HOM_ICD_8412   # 796.76
                 HOM_ICD9:HOM_ICD_8410   # 796.74
                 HOM_ICD9:HOM_ICD_8409   # 796.73
                 HOM_ICD9:HOM_ICD_1177   # 154.2
                 HOM_ICD9:HOM_ICD_8408   # 796.72
                 HOM_ICD9:HOM_ICD_8407   # 796.71
                 HOM_ICD9:HOM_ICD_8405   # 796.7
                 HOM_ICD9:HOM_ICD_1737   # 230.6
                 HOM_ICD9:HOM_ICD_1736   # 230.5
                 HOM_ICD9:HOM_ICD_1174   # 154
                 HOM_ICD9:HOM_ICD_1735   # 230.4
                 HOM_ICD9:HOM_ICD_1541   # 209.17
) .

Below are the results from running my program on a laptop. Once I do this on 
our server, I expect to get better numbers, but the general pattern of time 
differences will likely be the same. As you can see, the very first access of 
the Prostate_Cancer class and its instances to 4081 seconds, for just 3 classes 
involved. Yet subsequent calls to the other disease classes took substantially 
less time. Can this all be attributed to the time it takes the database to 
cache things, or is something going on in Jena that explains this???

I’ll be running this on our Linux server and will also try TDB for the ICD9 
ontology to see if how much difference it makes. But it would be really nice to 
hear if there is a Jena or SDB specific explanation for these results.

Performance will be an important consideration for this application. I want to 
be sure I am doing things in a way that gets to best possible performance when 
using Jena.



Loading http://purl.bioontology.org/ontology/HOM_ICD9_inferred/ took 0.359 
seconds
Loading http://www.sas.com/hls/patient/ took 0.0 seconds
http://www.sas.com/hls/ex/Prostate_Cancer
3 instances found.
Access took 4080.932 seconds

http://www.sas.com/hls/ex/Breast_Cancer
28 instances found.
Access took 3.313 seconds

http://www.sas.com/hls/ex/Lung_Cancer
9 instances found.
Access took 1.062 seconds

http://www.sas.com/hls/ex/Anal_Cancer
17 instances found.
Access took 1.656 seconds

http://www.sas.com/hls/ex/Colon_Cancer
25 instances found.
Access took 2.641 seconds

http://www.sas.com/hls/ex/Disease
81 instances found.
Access took 4.891 seconds


David Jordan
Software Developer
SAS Institute Inc.
Health&  Life Sciences, Research&  Development
Bldg R ▪ Office 4467
600 Research Drive ▪ Cary, NC 27513
Tel: 919 531 1233 ▪ [email protected]<mailto:[email protected]>
www.sas.com<http://www.sas.com>
SAS® … THE POWER TO KNOW®



Reply via email to