TDB, Fuseki, large journal file, and maxing a CPU
Hi all,

We're running Fuseki with a few TDB datasets, and it seems to be acting rather inefficiently. Here are the version numbers:

[root@opendataproduction ~]# /usr/bin/java -jar /usr/share/java/fuseki-server.jar --version
Jena:   VERSION: 2.7.5-SNAPSHOT
Jena:   BUILD_DATE: 2012-10-21T09:26:22+0100
ARQ:    VERSION: 2.9.5-SNAPSHOT
ARQ:    BUILD_DATE: 2012-10-21T09:29:20+0100
TDB:    VERSION: 0.9.5-SNAPSHOT
TDB:    BUILD_DATE: 2012-10-21T09:40:32+0100
Fuseki: VERSION: 0.2.6-SNAPSHOT
Fuseki: BUILD_DATE: 2012-10-21T09:44:10+0100

Here's top:

top - 11:29:56 up 123 days, 18:52,  1 user,  load average: 1.06, 1.20, 1.27
Tasks: 208 total,   1 running, 207 sleeping,   0 stopped,   0 zombie
Cpu(s): 100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   6132016k total,  5290072k used,   841944k free,    93864k buffers
Swap:   499704k total,   499704k used,        0k free,  1290944k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
22324 fuseki    20   0 6624m 1.2g  23m S 99.3 20.0 1172:44 java

You can see the trends at http://opendata.oucs.ox.ac.uk/oucs.ox.ac.uk/opendata.oucs.ox.ac.uk/cpu.html.

The journal files look like this:

[root@opendataproduction tdb]# ls */journal.jrnl -lh
-rw-r--r-- 1 fuseki fuseki 3.5M Jun 15 03:55 courses/journal.jrnl
-rw-r--r-- 1 fuseki fuseki  22M Jun 15 03:37 equipment/journal.jrnl
-rw-r--r-- 1 fuseki fuseki 5.2M Jun 15 02:17 itservices/journal.jrnl
-rw-r--r-- 1 fuseki fuseki 448M Jun 15 03:59 public/journal.jrnl
-rw-r--r-- 1 fuseki fuseki  18M Jun 12 14:41 seesec/journal.jrnl

Looking at the Fuseki logs, there have been various quiet periods when there shouldn't have been any read locks, and I would have thought the journals would have been cleared then (particularly as the non-public stores don't attract search engines or users).

We're also getting rather a number of Java heap space errors; java has -Xmx1g (that's right, right? :D).

The TDB DBs also seem to be growing over time, disproportionately to any increase in triples. For example, the entire TDB directory for our public store is 28GB on disk; dumping it and reloading it recently put it at 97MB. The trend can be seen at http://opendata.oucs.ox.ac.uk/oucs.ox.ac.uk/opendata.oucs.ox.ac.uk/df.html; the sudden drops on the by-year graph are me dumping and reloading. The increase in disk usage in the last few days is — I suspect — something else.

I'm thinking this could be managed by periodically shutting down Fuseki, applying the journal, reloading the store, and then setting Fuseki going again. However, I'm loath to do this without understanding why it gets the way it does. Any thoughts? Answers of "yes, we've fixed this; you need to upgrade" are perfectly reasonable ;-).

Yours,

Alex

-- 
Alexander Dutton
Linked Open Data Architect, Office of the CIO; data.ox.ac.uk, OxPoints
IT Services, University of Oxford
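For reference, the dump-and-reload step described above can be sketched in code roughly as follows. This is only an illustration: the paths are placeholders, and it assumes a 2.10.x-era Jena where org.apache.jena.riot.RDFDataMgr is available (newer than the snapshot build shown above); the tdbdump and tdbloader command-line tools do the same job.

import java.io.FileOutputStream;
import java.io.OutputStream;

import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFDataMgr;

import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.tdb.TDBFactory;

public class DumpAndReload {
    public static void main(String[] args) throws Exception {
        // Dump the old store to N-Quads (Fuseki must be stopped first).
        Dataset old = TDBFactory.createDataset("/var/lib/tdb/public");   // placeholder path
        OutputStream out = new FileOutputStream("/tmp/public.nq");       // placeholder path
        RDFDataMgr.write(out, old.asDatasetGraph(), Lang.NQUADS);
        out.close();
        old.close();

        // Reload into a fresh, empty directory and point Fuseki at it instead.
        Dataset fresh = TDBFactory.createDataset("/var/lib/tdb/public-new");  // placeholder path
        RDFDataMgr.read(fresh, "/tmp/public.nq");
        fresh.close();
    }
}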
Re: Conditional INSERT statements
> PREFIX dri:    <http://nationalarchives.gov.uk/terms/dri#>
> PREFIX rdfs:   <http://www.w3.org/2000/01/rdf-schema#>
> PREFIX status: <http://nationalarchives.gov.uk/dri/catalogue/transferAssetStatus#>
> PREFIX dct:    <http://purl.org/dc/terms/>
> PREFIX xsd:    <http://www.w3.org/2001/XMLSchema#>
> PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>
> DELETE {
>   ?transfer dri:transferAsset ?transferAsset .
> }
> INSERT {
>   ?transfer dri:transferAsset _:b0 .
>   _:b0 dct:subject ?subject .
>   _:b0 dri:transferAssetStatus status:SENT .
>   _:b0 dct:modified "2013-06-13T11:58:23.468Z"^^xsd:dateTime .
> }
> WHERE {
>   ?transfer dct:identifier "201305241200"^^xsd:string .
>   ?subject dct:identifier "dff82497-f161-4afd-8e38-f31a8b475b43"^^xsd:string
>   OPTIONAL {
>     ?transfer dri:transferAsset ?transferAsset .
>     ?transferAsset dct:subject ?subject .
>     ?transferAsset dct:modified ?transferAssetModified
>     FILTER ( ?transferAssetModified < "2013-06-13T11:58:23.468Z"^^xsd:dateTime )
>   }
> }

Rob,

(Which version of the software?)

The example data does not look like the DELETE was applied - there is still a dri:transferAsset link to the old state. I would have expected the bnode still to be there, but the triple connecting it should have gone. If so, then the OPTIONAL is not matching -- it sets ?transferAsset. In your example, the ?subject dct:identifier ... does not match either, but an INSERT does seem to have happened.

Could you delete all of ?transferAsset completely? The new state only depends on the new status, if it's a legal state transition for the status. To cope with the fact that COMPLETED can come before SENDING, test the status:

DELETE {
  ?transfer dri:transferAsset ?transferAsset .
  ?transferAsset ?p ?o .
}
INSERT {
  ?transfer dri:transferAsset _:b0 .
  _:b0 dct:subject ?subject .
  _:b0 dri:transferAssetStatus status:SENT .
  _:b0 dct:modified "2013-06-13T11:58:23.468Z"^^xsd:dateTime .
}
WHERE {
  ?transfer dct:identifier "201305241200"^^xsd:string ;
            dri:transferAssetStatus ?status ;
            dri:transferAsset ?transferAsset .
  FILTER (?status != status:COMPLETED)
  ?transferAsset ?p ?o .
}

SPARQL Updates can be several operations in one request, so it may be easier to have two operations:

DELETE { ... } WHERE { ... } ;
INSERT { ... } WHERE { ... }

	Andy
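As a rough illustration of running such a multi-operation update from Java with ARQ (this is not Rob's actual update; the dataset location, prefix, and patterns below are placeholders):

import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.tdb.TDBFactory;
import com.hp.hpl.jena.update.UpdateAction;

public class TwoOperationUpdate {
    public static void main(String[] args) {
        Dataset dataset = TDBFactory.createDataset("/path/to/tdb");   // placeholder
        String request =
            "PREFIX ex: <http://example.org/>\n" +
            // first operation: remove the old state
            "DELETE { ?s ex:status ex:Old } WHERE { ?s ex:status ex:Old } ;\n" +
            // second operation: record the new state
            "INSERT { ?s ex:status ex:New } WHERE { ?s a ex:Thing }";
        // Parses the whole request and runs the operations in order.
        UpdateAction.parseExecute(request, dataset);
        dataset.close();
    }
}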
jena-arq doesn't contain org.openjena.riot.RiotLoader in 2.10.0
Hi,

I'm working with the Jena examples provided at git://github.com/castagna/jena-examples.git. There are some examples in the dev tree which use the org.openjena.riot.RiotLoader type. It seems this type is only part of jena-arq packages before 2.10.0, and setting a smaller version raises version conflicts with other types.

What is the recommended new type for org.openjena.riot.RiotLoader, so I can run the examples from GitHub?

Thanks in advance!

Günter
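From a quick look, org.apache.jena.riot.RDFDataMgr appears to be the closest equivalent in 2.10.x; the following is only my guess at the kind of replacement call (file names are placeholders, and I haven't checked it against the examples):

import org.apache.jena.riot.RDFDataMgr;

import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.rdf.model.Model;

public class LoadWithRDFDataMgr {
    public static void main(String[] args) {
        // Load a single graph, or a whole dataset, from a file (placeholder names).
        Model model = RDFDataMgr.loadModel("data.ttl");
        Dataset dataset = RDFDataMgr.loadDataset("data.trig");
        System.out.println("Triples: " + model.size());
        dataset.close();
    }
}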
Re: TDB, Fuseki, large journal file, and maxing a CPU
> I'm thinking this could be managed by periodically shutting down Fuseki,
> applying the journal, reloading the store, and then setting Fuseki going
> again. However, I'm loath to do this without understanding why it gets
> the way it does. Any thoughts? Answers of "yes, we've fixed this; you
> need to upgrade" are perfectly reasonable ;-).

> BUILD_DATE: 2012-10-21

Hi Alex,

Upgrade :-)

That date is very close to 0.2.5. It does sound like an area where improvements have been made.

Also, stopping and starting the server should reduce the journal to zero bytes, because all outstanding commits in the journal are always applied during startup.

What is the write workload of these datasets? Is a lot of data deleted and new material added?

	Andy
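For a concrete picture of what ends up in journal.jrnl, here is a minimal sketch of a committed TDB write transaction (the path and data are placeholders, and this uses the plain Jena transaction API rather than anything Fuseki-specific):

import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.ReadWrite;
import com.hp.hpl.jena.tdb.TDBFactory;
import com.hp.hpl.jena.update.UpdateAction;

public class CommittedWrite {
    public static void main(String[] args) {
        Dataset ds = TDBFactory.createDataset("/path/to/tdb");   // placeholder
        ds.begin(ReadWrite.WRITE);
        try {
            UpdateAction.parseExecute(
                "PREFIX ex: <http://example.org/> " +
                "INSERT DATA { ex:s ex:p ex:o }", ds);
            ds.commit();   // the commit is recorded in journal.jrnl first
        } finally {
            ds.end();      // TDB writes the journal back to the main files when it can
        }
        ds.close();
    }
}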