TDB, Fuseki, large journal file, and maxing a CPU

2013-06-15 Thread Alexander Dutton
Hi all,

We're running Fuseki with a few TDB datasets, and it seems to be acting
rather inefficiently.

Here are the version numbers:

 [root@opendataproduction ~]# /usr/bin/java -jar /usr/share/java/fuseki-server.jar --version
 Jena:   VERSION: 2.7.5-SNAPSHOT
 Jena:   BUILD_DATE: 2012-10-21T09:26:22+0100
 ARQ:    VERSION: 2.9.5-SNAPSHOT
 ARQ:    BUILD_DATE: 2012-10-21T09:29:20+0100
 TDB:    VERSION: 0.9.5-SNAPSHOT
 TDB:    BUILD_DATE: 2012-10-21T09:40:32+0100
 Fuseki: VERSION: 0.2.6-SNAPSHOT
 Fuseki: BUILD_DATE: 2012-10-21T09:44:10+0100

Here's top:

 top - 11:29:56 up 123 days, 18:52,  1 user,  load average: 1.06, 1.20, 1.27
 Tasks: 208 total,   1 running, 207 sleeping,   0 stopped,   0 zombie
 Cpu(s): 100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
 Mem:   6132016k total,  5290072k used,   841944k free,    93864k buffers
 Swap:   499704k total,   499704k used,        0k free,  1290944k cached

   PID USER   PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 22324 fuseki 20   0 6624m 1.2g  23m S 99.3 20.0  1172:44  java

You can see the trends at
http://opendata.oucs.ox.ac.uk/oucs.ox.ac.uk/opendata.oucs.ox.ac.uk/cpu.html.

The journal files look like this:

 [root@opendataproduction tdb]# ls */journal.jrnl -lh
 -rw-r--r-- 1 fuseki fuseki 3.5M Jun 15 03:55 courses/journal.jrnl
 -rw-r--r-- 1 fuseki fuseki  22M Jun 15 03:37 equipment/journal.jrnl
 -rw-r--r-- 1 fuseki fuseki 5.2M Jun 15 02:17 itservices/journal.jrnl
 -rw-r--r-- 1 fuseki fuseki 448M Jun 15 03:59 public/journal.jrnl
 -rw-r--r-- 1 fuseki fuseki  18M Jun 12 14:41 seesec/journal.jrnl

Looking at the Fuseki logs, there have been various quiet periods during
which there shouldn't have been any read locks, and I would have thought
the journals would have been flushed then (particularly as the
non-public stores don't attract search engines or users).

We're also getting quite a number of "Java heap space" errors; the java
process is running with -Xmx1g (that's right, right? :D).

The TDB databases also seem to be growing over time disproportionately
to any increase in triples. For example, the entire TDB directory for our
public store is 28GB on disk; dumping it and reloading it recently put
it at 97MB. The trend can be seen at
http://opendata.oucs.ox.ac.uk/oucs.ox.ac.uk/opendata.oucs.ox.ac.uk/df.html;
the sudden drops on the by-year graph are me dumping and reloading. The
increase in disk usage in the last few days is, I suspect, something
else.

I'm thinking this could be managed by periodically shutting down Fuseki,
applying the journal, reloading the store, and then setting Fuseki going
again. However, I'm loath to do this without understanding why it gets
the way it does.

Any thoughts? Answers of "yes, we've fixed this; you need to upgrade"
are perfectly reasonable ;-).

Yours,

Alex

-- 
Alexander Dutton
Linked Open Data Architect, Office of the CIO; data.ox.ac.uk, OxPoints
IT Services, University of Oxford






Re: Conditional INSERT statements

2013-06-15 Thread Andy Seaborne

PREFIX  dri:    <http://nationalarchives.gov.uk/terms/dri#>
PREFIX  rdfs:   <http://www.w3.org/2000/01/rdf-schema#>
PREFIX  status: <http://nationalarchives.gov.uk/dri/catalogue/transferAssetStatus#>
PREFIX  dct:    <http://purl.org/dc/terms/>
PREFIX  xsd:    <http://www.w3.org/2001/XMLSchema#>
PREFIX  rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

DELETE {
  ?transfer dri:transferAsset ?transferAsset .
}
INSERT {
  ?transfer dri:transferAsset _:b0 .
  _:b0 dct:subject ?subject .
  _:b0 dri:transferAssetStatus status:SENT .
  _:b0 dct:modified "2013-06-13T11:58:23.468Z"^^xsd:dateTime .
}
WHERE
  { ?transfer dct:identifier "201305241200"^^xsd:string .
    ?subject dct:identifier "dff82497-f161-4afd-8e38-f31a8b475b43"^^xsd:string
    OPTIONAL
      { ?transfer dri:transferAsset ?transferAsset .
        ?transferAsset dct:subject ?subject .
        ?transferAsset dct:modified ?transferAssetModified
        FILTER ( ?transferAssetModified < "2013-06-13T11:58:23.468Z"^^xsd:dateTime )
      }
  }



Rob,

(which version of the software?)

The example data does not look like the DELETE was applied - there is 
still a dri:transferAsset link to the old state.  I would have expected 
the bnode still to be there but the triple connecting it should have gone.


If so, then the OPTIONAL is not matching -- it sets ?transferAsset.

In your example, the

?subject dct:identifier ...

does not match either but an INSERT does seem to have happened.
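One way to see which patterns actually match is to probe each one
independently. This is a hypothetical diagnostic query (not from the
original thread), reusing the identifiers from the example above; each
pattern is wrapped in its own OPTIONAL so a non-matching one simply
leaves its variable unbound:

```sparql
# Sketch: probe each identifier pattern separately; an unbound
# ?transfer or ?subject in the results pinpoints the failing pattern.
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?transfer ?subject
WHERE {
  OPTIONAL { ?transfer dct:identifier "201305241200"^^xsd:string }
  OPTIONAL { ?subject  dct:identifier "dff82497-f161-4afd-8e38-f31a8b475b43"^^xsd:string }
}
```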

Could you delete the old ?transferAsset completely?  The new state then
depends only on the new status, provided it's a legal state transition.


To cope with the fact that COMPLETED can come before SENDING, test the 
status.



DELETE {
  ?transfer dri:transferAsset ?transferAsset .
  ?transferAsset ?p ?o .
}
INSERT {
  ?transfer dri:transferAsset _:b0 .
  _:b0 dct:subject ?subject .
  _:b0 dri:transferAssetStatus status:SENT .
  _:b0 dct:modified "2013-06-13T11:58:23.468Z"^^xsd:dateTime .
}
WHERE {
  ?transfer dct:identifier "201305241200"^^xsd:string ;
    dri:transferAssetStatus ?status ;
    dri:transferAsset ?transferAsset .
  FILTER (?status != status:COMPLETED)
  ?transferAsset ?p ?o .
} ;


SPARQL Updates can be several operations in one request.  It may be 
easier to have two operations


DELETE { ... } WHERE { ... } ;
INSERT { ... } WHERE { ... }

Andy


jena-arq doesn't contain org.openjena.riot.RiotLoader in 2.10.0

2013-06-15 Thread Günter Hipler
Hi,

I'm working with the Jena examples provided on
git://github.com/castagna/jena-examples.git

There are some examples in the dev tree which contain the
org.openjena.riot.RiotLoader type.

It seems this type is only part of the  jena-arq package  2.10.1

Setting a smaller version raises version conflicts with other types.

What is the recommended replacement type for org.openjena.riot.RiotLoader
so I can run the examples from GitHub?

Thanks in advance!

Günter


Re: TDB, Fuseki, large journal file, and maxing a CPU

2013-06-15 Thread Andy Seaborne

> I'm thinking this could be managed by periodically shutting down Fuseki,
> applying the journal, reloading the store, and then setting Fuseki going
> again. However, I'm loath to do this without understanding why it gets
> the way it does.
>
> Any thoughts? Answers of "yes, we've fixed this; you need to upgrade"
> are perfectly reasonable.




> BUILD_DATE: 2012-10-21

Hi Alex,

Upgrade :-)  That date is very close to 0.2.5.  It does sound like an
area where improvements have been made.


Also, stopping and starting the server should reduce the journal to zero 
bytes because all outstanding commits in the journal are always applied 
during startup.


What is the write workload of these datasets?  Is a lot of data deleted 
and new material added?


Andy

On 15/06/13 11:57, Alexander Dutton wrote:

> [original message quoted in full; trimmed — see above]