Hi Raffaele,

first of all: I have made a number of database improvements. Could you check how they affect performance in your example (check out the latest source release in the development branch)?

Most of the refactoring for this change has already been done. The main things still needed are:

- allow SesameService to inject different implementations of storage backends, in the same style it already handles SailProviders
- create separate modules for the different backends (e.g. marmotta-backend-kiwi, marmotta-backend-native, marmotta-backend-bigdata, marmotta-backend-tdb, marmotta-backend-sdb)
- check which other modules are affected by this change. For example, the current versioning and reasoning will only work with the KiWi backend, but other backends also support different kinds of reasoning, so maybe marmotta-reasoner can be made more generic to support different styles of reasoning, or there could be different reasoning modules depending on the backend, like marmotta-reasoner-kiwi, marmotta-reasoner-bigdata, ...

What I still need to think about is how to ensure that only one backend is used even if several are found on the classpath, because unfortunately Maven does not provide a way to define mutual exclusion of dependencies (or does it?). Maybe the user would even want to be able to select the backend at runtime, but then this has consequences for which other modules are available.

Since I probably know the architecture best, I will try to provide the necessary infrastructure in the coming weeks. In the meantime, if you want to work on a Sesame API wrapper for Jena SDB and TDB, this would be a major step towards this goal. ;-)

Greetings,

Sebastian


2013/5/27 Raffaele Palmieri <[email protected]>

> On Friday, 24 May 2013, Sergio Fernández wrote:
>
>> Hi,
>>
>> IMHO just switching to Jena SDB is not the right idea, but porting some
>> ideas to the KiWi triple store would be nice.
>
> Yes, I agree with you, also in light of the issues already created
> (MARMOTTA-85/89); Jena SDB would be only one possible option, not the only
> backend.
>
>> In parallel, when we switched to a pure Sesame backend, the idea for the
>> mid-to-long term was to be able to run Marmotta on top of other triple
>> stores. I particularly would like to be able to use Jena TDB. But there are
>> still many parts of Marmotta that would have to be refactored to allow this.
>
> So, how do we want to approach this refactoring? Can we identify the parts
> that need refactoring, or is it premature, so that we should focus on other
> issues?
>
>> Cheers,
>
> Regards,
> Raffaele.
>
>> On 23/05/13 15:03, Sebastian Schaffert wrote:
>>
>> Hi Raffaele,
>>
>> the idea was anyway to allow different backends besides KiWi, because each
>> has its advantages and disadvantages (KiWi's advantages are the versioning
>> and the reasoner). The issue is documented under
>>
>> https://issues.apache.org/jira/browse/MARMOTTA-85
>>
>> and the individual backends have subsequent numbers. See e.g.
>>
>> https://issues.apache.org/jira/browse/MARMOTTA-89
>>
>> for the SDB backend implementation.
>>
>> Changing backends is currently not possible, but it is foreseen in the
>> architecture and it would take me about one day of work to change the
>> platform so that other backends can be used. The main change will be in
>> SesameServiceImpl, which sets up the underlying triple store. The
>> initialisation method for this service stacks together different sails
>> depending on the configuration and is already very modular. The only thing
>> that is currently hardcoded there is the initialisation of a new KiWiStore,
>> but in principle it could be any Sesame Sail.
>>
>> But there are some consequences and dependencies, e.g. the
>> marmotta-versioning and marmotta-reasoner modules cannot be used if the
>> backend is not KiWi, and I need to find a clean way to model these
>> dependencies (Maven is unfortunately probably not enough, because several
>> backends could be on the classpath and only one backend selected - on the
>> other hand we could simply create different backend configurations in Maven
>> that only include one backend to be used - we will see).
>>
>> If you want to try with SDB and TDB, the first step would be to implement a
>> clean wrapper that allows accessing Jena through the Sesame SAIL API. Peter
>> Ansell has already worked on such adapters:
>>
>> https://github.com/ansell/JenaSesame
>>
>> Maybe this would be a good starting point. I will in parallel try to work
>> on modularizing the backends. Not sure when I will be able to finish this,
>> because other things are currently on my priority list...
>>
>> Greetings,
>>
>> Sebastian
>>
>>
>> 2013/5/23 Raffaele Palmieri <[email protected]>
>>
>> Hi Sebastian, below are some considerations that lead me to think that
>> Jena SDB (or TDB) could be a better solution, but I understand that it has
>> a big impact on the codebase, so I would proceed cautiously.
>>
>> On 23 May 2013 12:20, Sebastian Schaffert <[email protected]> wrote:
>>
>> Hi Raffaele,
>>
>> 2013/5/22 Raffaele Palmieri <[email protected]>
>>
>> >> On 22 May 2013 15:04, Andy Seaborne <[email protected]> wrote:
>> >>
>> >> What is the current loading rate?
>> >>
>> >> Tried a test with a graph of 661 nodes and 957 triples: it took about
>> >> 18 sec. So, counting triples, the average rate is 18.8 ms per triple;
>> >> tested on Tomcat with maximum size of 1.5 GB.
>>
>> This is a bit too small for a real test, because you will have a high
>> influence of side effects (like cache initialisation). I have done some
>> performance comparisons with importing about 10% of GeoNames (about 15
>> million triples, 1.5 million resources). The test uses a specialised
>> parallel importer that was configured to run 8 importing threads in
>> parallel. Here are some figures on different hardware:
>>
>> - VMWare, 4CPU, 6GB RAM, HDD: 4:20h (avg per 100 resources: 10-13 seconds,
>>   8 in parallel). In case of VMWare, the CPU is waiting most of the time for
>>
>> --
>> Sergio Fernández
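
The open question from the reply at the top of this thread, how to make sure only one backend is active when several are found on the classpath, could be handled by an explicit selection step at startup. What follows is a minimal, dependency-free Java sketch under stated assumptions: StorageBackendProvider, SimpleBackend and BackendSelector are hypothetical names, not existing Marmotta classes, and the capability flags only illustrate how modules such as marmotta-versioning or marmotta-reasoner might check whether the chosen backend supports them. In a real setup the provider list would probably come from dependency injection or java.util.ServiceLoader, and each provider would create a Sesame Sail; both are left out here to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical SPI (not an existing Marmotta interface): each backend module,
// e.g. marmotta-backend-kiwi or marmotta-backend-tdb, would contribute one provider.
interface StorageBackendProvider {
    String getName();
    boolean supportsVersioning();
    boolean supportsReasoning();
}

// Trivial provider used only for this illustration.
final class SimpleBackend implements StorageBackendProvider {
    private final String name;
    private final boolean versioning;
    private final boolean reasoning;

    SimpleBackend(String name, boolean versioning, boolean reasoning) {
        this.name = name;
        this.versioning = versioning;
        this.reasoning = reasoning;
    }

    @Override public String getName() { return name; }
    @Override public boolean supportsVersioning() { return versioning; }
    @Override public boolean supportsReasoning() { return reasoning; }
}

final class BackendSelector {
    private BackendSelector() {}

    // Returns exactly one backend, or fails loudly instead of silently picking one.
    static StorageBackendProvider select(List<StorageBackendProvider> found,
                                         Optional<String> configured) {
        if (found.isEmpty()) {
            throw new IllegalStateException("no storage backend found");
        }
        if (configured.isPresent()) {
            String wanted = configured.get();
            for (StorageBackendProvider provider : found) {
                if (provider.getName().equals(wanted)) {
                    return provider;
                }
            }
            throw new IllegalStateException("configured backend not available: " + wanted);
        }
        if (found.size() > 1) {
            throw new IllegalStateException("several backends found; configure exactly one");
        }
        return found.get(0);
    }
}

class BackendSelectionDemo {
    public static void main(String[] args) {
        // Capability flags below are illustrative assumptions, not measured facts.
        List<StorageBackendProvider> found = new ArrayList<>();
        found.add(new SimpleBackend("kiwi", true, true));
        found.add(new SimpleBackend("tdb", false, false));

        StorageBackendProvider active = BackendSelector.select(found, Optional.of("tdb"));
        System.out.println("active backend: " + active.getName());
        System.out.println("versioning available: " + active.supportsVersioning());
    }
}
```

Failing when several backends are present and none is configured keeps the choice explicit, which also covers the runtime-selection case: the configured name decides, and dependent modules can query the capability flags of the selected backend before enabling themselves.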
