Hi Bill,
Sorry for the delay in replying. It was too sunny.
On 01/04/12 22:41, Bill Roberts wrote:
Hi all
I've been trying to find a simple repeatable example that might reproduce the
database corruption problems I mentioned last week. Not sure this is exactly
the same thing, but I suspect it is related.
The essence of it is that running a SELECT query while a PUT to /data is in
progress seems to cause problems.
Unless you think I have missed something obvious, I'll add a ticket to the
issue tracker.
Please add it to JIRA.
There's nothing I can see that
(how much RAM?)
big.ttl is the imd-2010-imd-score.ttl file?
Steps to reproduce (details of my setup are at the bottom of the email)
1. Create an empty dir for TDB and start Fuseki with that directory as the
tdb:location
2. PUT a small file to the graph protocol endpoint
curl -v -H "Content-Type: text/turtle" --upload-file small.ttl \
  http://localhost:3030/crashtest/data?graph=http://test1
3. Run a count of all triples
select (count(*) as ?c) where {?s ?p ?o}
Answer in my case is 25 (as expected)
Does it have to be 2 queries at stage 3? Does one have the same effect?
None?
4. PUT a big file
curl -v -H "Content-Type: text/turtle" --upload-file big.ttl \
  http://localhost:3030/crashtest/data?graph=http://test2
(big enough that it takes at least several seconds to load, so you have time to
run some other stuff. My example was about 200,000 triples)
5. Before it finishes, run the count query another 2 or 3 times.
It comes back with 25 each time. So far so good.
6. After the big file load is finished (check for 201 Created in log), run the
count again.
This is where the problem is evident: the count still shows 25, when it should
show 200,000 or so.
(Probably not significant, but my small test file has a few blank nodes in it.
The big file does not).
ls -l of the TDB dir shows lots of data still in nodes.dat-jrnl. The log
includes the line:
"WARN TDB :: Transaction not active: 5"
(full copy of the log below)
If I go through the same procedure without running the COUNTs mentioned in
stage 5, everything goes smoothly.
I'd be interested to hear if anyone else can reproduce this - and of course to
hear what you think might be wrong!
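For anyone who wants to try this without typing the curl commands by hand, the steps above could be scripted roughly as follows. This is only a sketch under assumptions: it uses the host, port, service name, and graph URIs from the steps, the file names small.ttl and big.ttl, and a hypothetical reproduce() driver with one-second pacing that is not part of the original procedure. Whether the queries actually land while the big load is running depends on the file size and the machine.

```python
import threading
import time
import urllib.parse
import urllib.request

# Assumed base URL: host, port and service name as used in the steps above.
BASE = "http://localhost:3030/crashtest"
COUNT_QUERY = "select (count(*) as ?c) where {?s ?p ?o}"


def put_request(path, graph_uri, base=BASE):
    """Build a PUT of a Turtle file to one named graph (steps 2 and 4)."""
    with open(path, "rb") as f:
        body = f.read()
    # urlencode percent-encodes the graph URI; the curl commands send it as-is.
    url = base + "/data?" + urllib.parse.urlencode({"graph": graph_uri})
    return urllib.request.Request(
        url, data=body, method="PUT", headers={"Content-Type": "text/turtle"}
    )


def count_request(base=BASE):
    """Build the GET for the count query (steps 3, 5 and 6)."""
    params = {"query": COUNT_QUERY, "output": "text"}
    return urllib.request.Request(base + "/query?" + urllib.parse.urlencode(params))


def send(request):
    """Send a request and return (HTTP status, response body as text)."""
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8")


def reproduce(small="small.ttl", big="big.ttl", queries_during_load=3):
    """Hypothetical driver: needs a running server and both files on disk."""
    print("small PUT:", send(put_request(small, "http://test1"))[0])
    print("count before:", send(count_request())[1])
    big_put = put_request(big, "http://test2")
    loader = threading.Thread(target=lambda: print("big PUT:", send(big_put)[0]))
    loader.start()
    for _ in range(queries_during_load):
        time.sleep(1)  # crude pacing so the queries land while the load runs
        print("count during load:", send(count_request())[1])
    loader.join()
    print("count after:", send(count_request())[1])
```

Calling reproduce() against a freshly started server with an empty TDB directory should walk through steps 2 to 6; the last printed count is the one to compare against the expected total.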
Many thanks
Bill
Thanks
Andy
Details:
OS: Mac OS X 10.6.8
fuseki-server --version
------------------------------
Jena: VERSION: 2.7.0-incubating
Jena: BUILD_DATE: 2011-12-14T14:54:09+0000
ARQ: VERSION: 2.9.0-incubating
ARQ: BUILD_DATE: 2011-12-14T15:04:27+0000
TDB: VERSION: 0.9.0-incubating
TDB: BUILD_DATE: 2012-02-29T19:39:52+0000
Fuseki: VERSION: 0.2.2-incubating-SNAPSHOT
Fuseki: BUILD_DATE: 20120330-0505
java -version
------------------
java version "1.6.0_29"
Java(TM) SE Runtime Environment (build 1.6.0_29-b11-402-10M3527)
Java HotSpot(TM) 64-Bit Server VM (build 20.4-b02-402, mixed mode)
Config file:
---------------
@prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .
[] rdf:type fuseki:Server ;
# Services available. Only explicitly listed services are configured.
# If there is a service description not linked from this list, it is ignored.
fuseki:services (
<#service1>
) .
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
tdb:GraphTDB rdfs:subClassOf ja:Model .
<#service1> rdf:type fuseki:Service ;
fuseki:name "crashtest" ; # http://host:port/blah
fuseki:serviceQuery "query" ; # SPARQL query service
fuseki:serviceUpdate "update" ; # SPARQL update service
fuseki:serviceReadWriteGraphStore "data" ; # SPARQL Graph store protocol (read and write)
fuseki:dataset <#dataset-blah> ;
.
<#dataset-blah> rdf:type tdb:DatasetTDB ;
tdb:location "/Users/bill/tdb/crashtest" ;
# Query timeout on this dataset (10s, 10000 milliseconds)
ja:context [ ja:cxtName "arq:queryTimeout" ; ja:cxtValue "10000" ] ;
tdb:unionDefaultGraph true ;
.
fuseki log:
--------------
20:23:51 INFO Config :: Configuration file: test.ttl
20:23:51 INFO Config :: Service:<file:///Users/bill/code/fuseki-0.2.2/test.ttl#service1>
20:23:51 INFO Config :: name = crashtest
20:23:51 INFO Config :: query = /crashtest/query
20:23:51 INFO Config :: update = /crashtest/update
20:23:51 INFO Config :: graphStore(RW) = /crashtest/data
20:23:52 INFO Server :: Dataset path = /crashtest
20:23:52 INFO Server :: Fuseki 0.2.2-incubating-SNAPSHOT 20120330-0505
20:23:52 INFO Server :: Jetty 7.x.y-SNAPSHOT
20:23:52 INFO Server :: Started 2012/04/01 20:23:52 BST on port 3030
20:24:15 INFO Fuseki :: [1] PUT http://localhost:3030/crashtest/data?graph=http://test1
20:24:16 INFO Fuseki :: [1] 201 Created
20:24:43 INFO Fuseki :: [2] GET http://localhost:3030/crashtest/query?query=select+%28count%28*%29+as+%3Fc%29+where+%7B%3Fs+%3Fp+%3Fo%7D+&output=text&stylesheet=%2Fxml-to-html.xsl
20:24:43 INFO Fuseki :: [2] Query = select (count(*) as ?c) where {?s ?p ?o}
20:24:43 INFO Fuseki :: [2] OK/select
20:24:43 INFO Fuseki :: [2] 200 OK
20:29:12 INFO Fuseki :: [3] PUT http://localhost:3030/crashtest/data?graph=http://test2
20:29:14 INFO Fuseki :: [4] GET http://localhost:3030/crashtest/query?query=select+%28count%28*%29+as+%3Fc%29+where+%7B%3Fs+%3Fp+%3Fo%7D+&output=text&stylesheet=%2Fxml-to-html.xsl
20:29:14 INFO Fuseki :: [4] Query = select (count(*) as ?c) where {?s ?p ?o}
20:29:14 INFO Fuseki :: [4] OK/select
20:29:14 INFO Fuseki :: [4] 200 OK
20:29:18 INFO Fuseki :: [5] GET http://localhost:3030/crashtest/query?query=select+%28count%28*%29+as+%3Fc%29+where+%7B%3Fs+%3Fp+%3Fo%7D+&output=text&stylesheet=%2Fxml-to-html.xsl
20:29:18 INFO Fuseki :: [5] Query = select (count(*) as ?c) where {?s ?p ?o}
20:29:18 INFO Fuseki :: [5] OK/select
20:29:18 INFO Fuseki :: [5] 200 OK
20:29:28 WARN TDB :: Transaction not active: 5
20:29:28 INFO Fuseki :: [3] 201 Created
20:29:28 INFO Fuseki :: [6] GET http://localhost:3030/crashtest/query?query=select+%28count%28*%29+as+%3Fc%29+where+%7B%3Fs+%3Fp+%3Fo%7D+&output=text&stylesheet=%2Fxml-to-html.xsl
20:29:28 INFO Fuseki :: [6] Query = select (count(*) as ?c) where {?s ?p ?o}
20:29:28 INFO Fuseki :: [6] OK/select
20:29:28 INFO Fuseki :: [6] 200 OK
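To make the overlap in this log easier to see, here is a small sketch that pairs each request's start line with its status line and reports which GETs began while a PUT was still open. It assumes only the line layout visible in this log ("HH:MM:SS INFO Fuseki :: [id] VERB ..." followed later by "[id] <status>"); the function names are made up for illustration. It orders events by line position rather than timestamp, because the timestamps have one-second resolution.

```python
import re

# Matches request start lines ("[3] PUT ...") and status lines ("[3] 201 Created").
EVENT = re.compile(
    r"^\d\d:\d\d:\d\d INFO\s+Fuseki\s+::\s+\[(\d+)\] (PUT|GET|POST|\d{3})\b"
)


def request_spans(log_lines):
    """Map request id -> verb plus the line positions where it starts and ends."""
    spans = {}
    for position, line in enumerate(log_lines):
        match = EVENT.match(line.strip())
        if not match:
            continue  # wrapped URLs, "Query =" lines, WARN lines, etc.
        request_id, what = int(match.group(1)), match.group(2)
        if what.isdigit():
            if request_id in spans:
                spans[request_id]["end"] = position
        else:
            spans[request_id] = {"verb": what, "start": position, "end": None}
    return spans


def reads_during_writes(spans):
    """Return sorted (PUT id, GET id) pairs where the GET began mid-PUT."""
    pairs = []
    for write_id, write in spans.items():
        if write["verb"] != "PUT" or write["end"] is None:
            continue
        for read_id, read in spans.items():
            if read["verb"] == "GET" and write["start"] < read["start"] < write["end"]:
                pairs.append((write_id, read_id))
    return sorted(pairs)
```

Fed the lines of the log above, this should report requests [4] and [5] as starting inside PUT [3], with [6] starting after it completed.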
ls -l crashtest
------------------
drwxr-xr-x 31 bill bill 1054 1 Apr 20:24 ./
drwxr-xr-x 15 bill bill 510 1 Apr 20:23 ../
-rw-r--r-- 1 bill bill 16777216 1 Apr 20:29 GOSP.dat
-rw-r--r-- 1 bill bill 8388608 1 Apr 20:23 GOSP.idn
-rw-r--r-- 1 bill bill 16777216 1 Apr 20:29 GPOS.dat
-rw-r--r-- 1 bill bill 8388608 1 Apr 20:23 GPOS.idn
-rw-r--r-- 1 bill bill 16777216 1 Apr 20:29 GSPO.dat
-rw-r--r-- 1 bill bill 8388608 1 Apr 20:23 GSPO.idn
-rw-r--r-- 1 bill bill 8388608 1 Apr 20:23 OSP.dat
-rw-r--r-- 1 bill bill 8388608 1 Apr 20:23 OSP.idn
-rw-r--r-- 1 bill bill 16777216 1 Apr 20:29 OSPG.dat
-rw-r--r-- 1 bill bill 8388608 1 Apr 20:23 OSPG.idn
-rw-r--r-- 1 bill bill 8388608 1 Apr 20:23 POS.dat
-rw-r--r-- 1 bill bill 8388608 1 Apr 20:23 POS.idn
-rw-r--r-- 1 bill bill 16777216 1 Apr 20:29 POSG.dat
-rw-r--r-- 1 bill bill 8388608 1 Apr 20:23 POSG.idn
-rw-r--r-- 1 bill bill 8388608 1 Apr 20:23 SPO.dat
-rw-r--r-- 1 bill bill 8388608 1 Apr 20:23 SPO.idn
-rw-r--r-- 1 bill bill 16777216 1 Apr 20:29 SPOG.dat
-rw-r--r-- 1 bill bill 8388608 1 Apr 20:23 SPOG.idn
-rw-r--r-- 1 bill bill 0 1 Apr 20:24 journal.jrnl
-rw-r--r-- 1 bill bill 8388608 1 Apr 20:24 node2id.dat
-rw-r--r-- 1 bill bill 8388608 1 Apr 20:24 node2id.idn
-rw-r--r-- 1 bill bill 2485 1 Apr 20:24 nodes.dat
-rw-r--r-- 1 bill bill 5596812 1 Apr 20:29 nodes.dat-jrnl
-rw-r--r-- 1 bill bill 8388608 1 Apr 20:23 prefix2id.dat
-rw-r--r-- 1 bill bill 8388608 1 Apr 20:23 prefix2id.idn
-rw-r--r-- 1 bill bill 8388608 1 Apr 20:23 prefixIdx.dat
-rw-r--r-- 1 bill bill 8388608 1 Apr 20:23 prefixIdx.idn
-rw-r--r-- 1 bill bill 0 1 Apr 20:23 prefixes.dat
-rw-r--r-- 1 bill bill 0 1 Apr 20:24 prefixes.dat-jrnl
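The leftover journal data in this listing can also be checked programmatically. The sketch below lists journal files that are not empty. It assumes, based only on the file names in this listing, that journal files are the ones ending in "-jrnl" or ".jrnl"; pending_journals is a hypothetical helper name. A non-empty journal is only a hint that something has not been written back to the main files, not proof of a problem.

```python
from pathlib import Path


def pending_journals(tdb_dir):
    """Return {file name: size in bytes} for journal files that hold data."""
    pending = {}
    for path in Path(tdb_dir).iterdir():
        # Naming convention inferred from the listing above, not from TDB docs.
        is_journal = path.name.endswith("-jrnl") or path.suffix == ".jrnl"
        if is_journal and path.is_file():
            size = path.stat().st_size
            if size > 0:
                pending[path.name] = size
    return pending
```

For a directory matching the listing above, this would be expected to return only the nodes.dat-jrnl entry, since journal.jrnl and prefixes.dat-jrnl are empty.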