On 17/06/11 17:08, jp wrote:
Hey Simon
The only code I am running is:
DatasetGraphTDB datasetGraph = TDBFactory.createDatasetGraph(tdbDir);
InputStream inputStream = new FileInputStream(dbpediaData);
BulkLoader bulkLoader = new BulkLoader();
bulkLoader.loadDataset(datasetGraph, inputStream, true);
jp,
How does this fit with running:
>> datasetGraph.getDefaultGraph().add(new
>> Triple(Node.createURI("urn:hello"), RDF.type.asNode(),
>> Node.createURI("urn:house")));
>> datasetGraph.sync();
Is the preload of one triple done in a separate JVM, or in the same JVM as the
BulkLoader call? Could you provide a single, complete minimal example?
In attempting to reconstruct this, I don't want to hide the problem by
guessing at how things are wired together.
Also: exactly which dbpedia file are you loading (a URL would help)? I
doubt the exact data is the cause here, though.
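For illustration, here is a sketch of the single-JVM wiring being asked about. It is assembled only from the snippets quoted in this thread: the class name SimpleDatasetLoader appears in the stack trace, but its real contents are unknown, the argument handling and comments are added here for illustration, and whether both steps really run in one JVM is exactly the open question.

import java.io.FileInputStream;
import java.io.InputStream;

import com.hp.hpl.jena.graph.Node;
import com.hp.hpl.jena.graph.Triple;
import com.hp.hpl.jena.tdb.TDBFactory;
import com.hp.hpl.jena.tdb.store.DatasetGraphTDB;
import com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader;
import com.hp.hpl.jena.vocabulary.RDF;

public class SimpleDatasetLoader {
    public static void main(String[] args) throws Exception {
        String tdbDir = args[0];       // empty (or one-triple) TDB directory
        String dbpediaData = args[1];  // path to the dbpedia dump being loaded

        DatasetGraphTDB datasetGraph = TDBFactory.createDatasetGraph(tdbDir);

        // Optional preload of one triple, then sync (the second failure case below).
        datasetGraph.getDefaultGraph().add(new Triple(
                Node.createURI("urn:hello"), RDF.type.asNode(), Node.createURI("urn:house")));
        datasetGraph.sync();

        // Bulk load the dump into the same dataset, in the same JVM.
        // The boolean flag is kept exactly as in jp's snippet.
        InputStream inputStream = new FileInputStream(dbpediaData);
        BulkLoader bulkLoader = new BulkLoader();
        bulkLoader.loadDataset(datasetGraph, inputStream, true);
    }
}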
No other processes or threads are running, and the application has exclusive
access to the TDB directory. Because of this I suspect a timing issue within
TDB's code, perhaps somewhere in RecordBuffer or in the BPTree itself. I have
noticed I can only reproduce the issue on fast hard drives such as an SSD.
Let's hope it's not an issue with sync() on SSDs.
Andy
Thanks
-jp
On Fri, Jun 17, 2011 at 11:52 AM, Simon Helsen <[email protected]> wrote:
TDB is not thread-safe. You have to protect read and write operations
yourself (multiple concurrent reads are fine, but writes must be exclusive:
no reads while a write is in progress).
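A minimal sketch of that multiple-reader/single-writer rule, using plain java.util.concurrent and nothing TDB-specific; the wrapper class and its method names are invented for illustration only:

import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical guard: many concurrent readers, one exclusive writer.
public class StoreGuard {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Run a read-only operation; several of these may run at the same time.
    public void read(Runnable readOp) {
        lock.readLock().lock();
        try { readOp.run(); }
        finally { lock.readLock().unlock(); }
    }

    // Run a write operation; excludes all readers and other writers.
    public void write(Runnable writeOp) {
        lock.writeLock().lock();
        try { writeOp.run(); }
        finally { lock.writeLock().unlock(); }
    }
}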
Simon
Simon Helsen, Ph.D.
Advisory Software Engineer - Jazz Foundation Server
Phone: 1-416-225-5717 | Mobile: 1-647-966-8280
E-mail: [email protected]
From: jp <[email protected]>
To: [email protected]
Date: 06/17/2011 11:39 AM
Subject: BulkLoader error with large data and fast harddrive
I recently updated my computer hardware and am receiving exceptions
while loading a dbpedia dataset of ~19 million triples. I have been
able to produce the error below using the following code. I believe this
might be a concurrency issue, as the same data loads with the same code
on a similar machine with a standard hard drive.
DatasetGraphTDB datasetGraph = TDBFactory.createDatasetGraph(tdbDir);
InputStream inputStream = new FileInputStream(dbpediaData);
BulkLoader bulkLoader = new BulkLoader();
bulkLoader.loadDataset(datasetGraph, inputStream, true);
My current specs are:
2.3 GHz quad-core i5 processor
4 GB RAM
128 GB SSD
Tested on both:
java version "1.6.0_22"
OpenJDK Runtime Environment (IcedTea6 1.10.1) (6b22-1.10.1-0ubuntu1)
OpenJDK 64-Bit Server VM (build 20.0-b11, mixed mode)
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
Jena versions are as follows:
arq-2.8.8
jena-2.6.4
tdb-0.8.10
Error while loading into an empty directory:
java.lang.IllegalArgumentException
    at java.nio.Buffer.position(Buffer.java:235)
    at com.hp.hpl.jena.tdb.base.record.RecordFactory.buildFrom(RecordFactory.java:94)
    at com.hp.hpl.jena.tdb.base.buffer.RecordBuffer._get(RecordBuffer.java:95)
    at com.hp.hpl.jena.tdb.base.buffer.RecordBuffer.get(RecordBuffer.java:41)
    at com.hp.hpl.jena.tdb.index.bplustree.BPTreeRecords.getSplitKey(BPTreeRecords.java:141)
    at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.split(BPTreeNode.java:435)
    at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:387)
    at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:399)
    at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:399)
    at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.insert(BPTreeNode.java:167)
    at com.hp.hpl.jena.tdb.index.bplustree.BPlusTree.addAndReturnOld(BPlusTree.java:297)
    at com.hp.hpl.jena.tdb.index.bplustree.BPlusTree.add(BPlusTree.java:289)
    at com.hp.hpl.jena.tdb.index.TupleIndexRecord.performAdd(TupleIndexRecord.java:48)
    at com.hp.hpl.jena.tdb.index.TupleIndexBase.add(TupleIndexBase.java:49)
    at com.hp.hpl.jena.tdb.index.TupleTable.add(TupleTable.java:54)
    at com.hp.hpl.jena.tdb.nodetable.NodeTupleTableConcrete.addRow(NodeTupleTableConcrete.java:77)
    at com.hp.hpl.jena.tdb.store.bulkloader.LoaderNodeTupleTable.load(LoaderNodeTupleTable.java:112)
    at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader$2.send(BulkLoader.java:268)
    at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader$2.send(BulkLoader.java:244)
    at org.openjena.riot.lang.LangNTuple.runParser(LangNTuple.java:60)
    at org.openjena.riot.lang.LangBase.parse(LangBase.java:71)
    at org.openjena.riot.RiotReader.parseQuads(RiotReader.java:122)
    at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadQuads$(BulkLoader.java:159)
    at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadDataset(BulkLoader.java:117)
    at com.nimblegraph.data.bin.SimpleDatasetLoader.main(SimpleDatasetLoader.java:24)
Error when loading into a directory containing one triple. The following is
run before the bulk loader:
datasetGraph.getDefaultGraph().add(new
Triple(Node.createURI("urn:hello"), RDF.type.asNode(),
Node.createURI("urn:house")));
datasetGraph.sync();
java.lang.IllegalArgumentException: Out of bounds: idx=0, size=-866953722
    at com.hp.hpl.jena.tdb.base.buffer.RecordBuffer.checkBounds(RecordBuffer.java:228)
    at com.hp.hpl.jena.tdb.base.buffer.RecordBuffer.add(RecordBuffer.java:66)
    at com.hp.hpl.jena.tdb.index.bplustree.BPTreeRecords.internalInsert(BPTreeRecords.java:112)
    at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:399)
    at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:399)
    at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:399)
    at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.insert(BPTreeNode.java:167)
    at com.hp.hpl.jena.tdb.index.bplustree.BPlusTree.addAndReturnOld(BPlusTree.java:297)
    at com.hp.hpl.jena.tdb.index.bplustree.BPlusTree.add(BPlusTree.java:289)
    at com.hp.hpl.jena.tdb.index.TupleIndexRecord.performAdd(TupleIndexRecord.java:48)
    at com.hp.hpl.jena.tdb.index.TupleIndexBase.add(TupleIndexBase.java:49)
    at com.hp.hpl.jena.tdb.index.TupleTable.add(TupleTable.java:54)
    at com.hp.hpl.jena.tdb.nodetable.NodeTupleTableConcrete.addRow(NodeTupleTableConcrete.java:77)
    at com.hp.hpl.jena.tdb.store.bulkloader.LoaderNodeTupleTable.load(LoaderNodeTupleTable.java:112)
    at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader$2.send(BulkLoader.java:268)
    at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader$2.send(BulkLoader.java:244)
    at org.openjena.riot.lang.LangNTuple.runParser(LangNTuple.java:60)
    at org.openjena.riot.lang.LangBase.parse(LangBase.java:71)
    at org.openjena.riot.RiotReader.parseQuads(RiotReader.java:122)
    at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadQuads$(BulkLoader.java:159)
    at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadDataset(BulkLoader.java:117)
    at com.nimblegraph.data.bin.SimpleDatasetLoader.main(SimpleDatasetLoader.java:24)
Any help tracking down the issue would be greatly appreciated.
Thanks for the great software
-jp
[email protected]