Dear all,

I have been working on some ideas for improving the import/write
performance of the triple store. With the new implementation, which is
now in Git (devel branch), I was able to reduce the import time for
larger datasets by more than 50%. For some databases (PostgreSQL and
MySQL), the new implementation uses batched insertion: when storing
nodes and triples, they are not written to the JDBC connection
immediately but cached in memory. When either the batch size is reached
(currently 1000 triples or nodes) or the connection commits, the cached
triples and nodes are written to the database using JDBC/SQL batch
operations.
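
To illustrate the idea, here is a minimal sketch of the cache-and-flush
cycle using plain JDBC batch operations. The class, method, table, and
column names are made up for illustration; the actual code in the devel
branch is more involved:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.ArrayList;
    import java.util.List;

    class BatchedTripleWriter {
        // flush threshold, matching the current batch size of 1000
        private static final int BATCH_SIZE = 1000;

        private final Connection connection;
        // triples cached in memory as (subject, predicate, object) node ids
        private final List<long[]> pending = new ArrayList<>();

        BatchedTripleWriter(Connection connection) {
            this.connection = connection;
        }

        // Cache a triple in memory; flush once the batch size is reached.
        void addTriple(long subject, long predicate, long object)
                throws SQLException {
            pending.add(new long[] { subject, predicate, object });
            if (pending.size() >= BATCH_SIZE) {
                flush();
            }
        }

        // Write all cached triples in a single JDBC batch operation;
        // this must also be called before the connection commits.
        void flush() throws SQLException {
            if (pending.isEmpty()) return;
            try (PreparedStatement stmt = connection.prepareStatement(
                    "INSERT INTO triples (subject, predicate, object) VALUES (?, ?, ?)")) {
                for (long[] t : pending) {
                    stmt.setLong(1, t[0]);
                    stmt.setLong(2, t[1]);
                    stmt.setLong(3, t[2]);
                    stmt.addBatch();
                }
                stmt.executeBatch();
            }
            pending.clear();
        }
    }

Nodes would be cached and flushed in the same way with their own batch.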

Here are the figures (PostgreSQL, my machine, about 30k triples):

batch disabled:                                     83033 ms
batch enabled (triples only):                       70966 ms
batch enabled (triples & nodes):                    45308 ms
batch enabled (triples & nodes, shared connection): 39495 ms

Of course, the batched implementation is more complex, so I would like
to ask you to try it out in different scenarios. In particular, I'd
like to ask Raffaele whether it improves the performance benchmarks he
has been running. :-)

Greetings,

Sebastian
