Hi Andy,
I am sharing a few more details, but nothing really useful (I am sorry).

Andy Seaborne wrote:
2/ How much data is there in a store?

Not big.

In triples?

~ 5M triples.

I did check that the TDB indexes (which we saved before rebuilding that
dataset) were not corrupted. I am able to run tdbdump on them and
I also checked with a tool which does a scan over all the indexes.
Everything seems fine as far as I can see.

This does not scale to large indexes and it is just a quick hack:
https://github.com/castagna/tdbloader3/blob/master/src/test/java/dev/TDBVerifier.java
... however, a command-line tool which users can run to check the
integrity of TDB indexes could be useful.
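
Something along these lines is what I mean by a scan (a minimal sketch,
not the TDBVerifier above: it only iterates all the quads via the dataset
API, so it really exercises a single index; the class name is made up, and
the package names assume a recent Apache Jena, older releases used
com.hp.hpl.jena.tdb):

    import java.util.Iterator;

    import org.apache.jena.query.Dataset;
    import org.apache.jena.sparql.core.Quad;
    import org.apache.jena.tdb.TDBFactory;

    public class QuadScan {
        public static void main(String[] args) {
            // args[0] is the directory of the TDB store to check
            Dataset dataset = TDBFactory.createDataset(args[0]);
            long count = 0;
            // Iterate over every quad; corruption which breaks iteration
            // shows up here as an exception rather than a clean count.
            Iterator<Quad> it = dataset.asDatasetGraph().find();
            while (it.hasNext()) {
                it.next();
                count++;
            }
            System.out.println("Quads scanned: " + count);
            dataset.close();
        }
    }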

3/ How big and how frequent are the updates?
Ditto reads.

The update we were submitting when we saw the exception wasn't big
(though not tiny either): 13492 triples.

That store had performed 442 write transactions previously without any
problems. It failed when we submitted the 443rd write transaction.

At that point in time we were submitting many updates sequentially, one
after the other, and continuously (i.e. we were replaying old updates
from a key-value store).

A couple of other nodes, running exactly the same code, did not
experience any problems. The difference might be in the reads: there
might have been reads going on during the updates.


4/ How are the updates being done?

We still serialize writes and we run write transactions via the usual
begin, try { ... commit } catch { abort } pattern.

So API, not SPARQL Update?

Which calls?

A snippet of what we do:

    SyncableDataset dataset = getDataset();
    try {
        [...]
        commit ( dataset );
    } catch (RdfStoreException e) {
        abort( dataset );
        throw e;
    } catch (InconsistentRemovalSetException e) {
        abort( dataset );
        throw e;
    } finally {
        if ( dataset instanceof TxTdbDataset ) {
            [...]
            dataset.close();
        }
    }

SyncableDataset implements Dataset.

getDataset() ultimately does:

    sConn = StoreConnection.make(location);
    datasetGraph = sConn.begin(ReadWrite.WRITE);

commit() and abort() ultimately do:

    DatasetGraphTxn dsg = ...
    dsg.commit();

and:

    dsg.abort();
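
For comparison, the same write cycle expressed against the public Dataset
transaction API rather than our wrappers looks roughly like this (a sketch
only, with the actual update work elided, and the class and method names
made up for the example):

    import org.apache.jena.query.Dataset;
    import org.apache.jena.query.ReadWrite;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.tdb.TDBFactory;

    public class WriteTxnSketch {
        public static void applyUpdate(String location) {
            Dataset dataset = TDBFactory.createDataset(location);
            dataset.begin(ReadWrite.WRITE);
            try {
                Model model = dataset.getDefaultModel();
                // ... add / remove the statements of the update here ...
                dataset.commit();
            } catch (RuntimeException e) {
                dataset.abort();
                throw e;
            } finally {
                // Release the transaction whether it committed or aborted.
                dataset.end();
            }
        }
    }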


Last but not least, I double-checked, and in the last 20 or so days that
was the only TDBException we had.

From the logs I have, it does not seem that there were any reads going on
at the time of the update (and exception).

Paolo


