Dear all,

I have been working on some ideas for improving import/write performance in the triple store. With the new implementation, now in Git (devel branch), I was able to reduce the import time for larger datasets by more than 50%.

The new implementation uses batched insertion for some databases (PostgreSQL and MySQL). This means that when nodes and triples are stored, they are not immediately sent over the JDBC connection but cached in memory instead. When either the batch size is reached (currently 1000 triples or nodes) or the connection commits, the cached triples and nodes are written to the database using JDBC/SQL batch operations.
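To illustrate, the caching-and-flushing pattern described above could be sketched roughly like this (a hypothetical, simplified example, not the actual code from the devel branch; the class and parameter names are made up, and in the real implementation the flush would issue JDBC batch operations via PreparedStatement.addBatch/executeBatch):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: buffer items in memory and flush either when the
// batch size is reached or when the surrounding connection commits.
class Batcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> flushAction; // e.g. execute a JDBC batch
    private final List<T> buffer = new ArrayList<>();

    Batcher(int batchSize, Consumer<List<T>> flushAction) {
        this.batchSize = batchSize;
        this.flushAction = flushAction;
    }

    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush(); // batch size reached: write out immediately
        }
    }

    // To be called when the connection commits, so no items are lost.
    void commit() {
        flush();
    }

    private void flush() {
        if (!buffer.isEmpty()) {
            flushAction.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }
}
```

With a batch size of 1000, as in the current implementation, a commit after 2500 added triples would result in three batch writes (1000, 1000, 500) instead of 2500 individual inserts.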
Here are the figures (PostgreSQL, my machine, about 30k triples):

batch disabled:                               83033 ms
batch enabled (triples only):                 70966 ms
batch enabled (triples & nodes):              45308 ms
batch enabled (triples & nodes, shared conn): 39495 ms

Of course, batching makes the implementation more complex, so I would like to ask you to try it out in different scenarios. In particular, I'd like to ask Raffaele whether it improves the performance benchmarks he has been running. :-)

Greetings,

Sebastian
