I'm going to give it a go sometime soon and report back on my non-scientific findings. Your point about the small number of columns is well made, but the research paper cited earlier also mentions this, and reports that, thanks to column-store optimisations, they still saw a good improvement even when they vertically partitioned their data rather than using a property-table approach. However, again, I'm no column-store expert, so perhaps I'm missing some point here :-). Anyway, time to "suck it and see", all in the name of progress of course.
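For readers unfamiliar with the layouts being compared, here is a minimal in-memory Java sketch contrasting a single thin triple table (the SDB style Andy describes further down the thread) with vertical partitioning, i.e. one two-column (subject, object) table per predicate. All class and method names are hypothetical; this is not SDB or MonetDB code, it does not show property tables, and it ignores the storage-level optimisations a real column store applies. It only illustrates why a pattern with a bound predicate touches less data once the data is vertically partitioned.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of two RDF table layouts; not SDB or MonetDB code.
public class VerticalPartitioningSketch {

    /** One row of a single "thin, long" triple table. */
    public static final class Triple {
        final String s, p, o;

        public Triple(String s, String p, String o) {
            this.s = s;
            this.p = p;
            this.o = o;
        }
    }

    /** Triple-table layout: a pattern (s, p, ?o) scans every row. */
    public static List<String> objectsFromTripleTable(List<Triple> table, String s, String p) {
        List<String> result = new ArrayList<>();
        for (Triple t : table) {
            if (t.s.equals(s) && t.p.equals(p)) {
                result.add(t.o);
            }
        }
        return result;
    }

    /** Vertical partitioning: one two-column (subject, object) table per predicate. */
    public static Map<String, List<String[]>> partitionByPredicate(List<Triple> table) {
        Map<String, List<String[]>> parts = new HashMap<>();
        for (Triple t : table) {
            parts.computeIfAbsent(t.p, k -> new ArrayList<>()).add(new String[] {t.s, t.o});
        }
        return parts;
    }

    /** With a bound predicate, only that predicate's table is scanned. */
    public static List<String> objectsFromPartition(Map<String, List<String[]>> parts, String s, String p) {
        List<String> result = new ArrayList<>();
        for (String[] row : parts.getOrDefault(p, List.of())) {
            if (row[0].equals(s)) {
                result.add(row[1]);
            }
        }
        return result;
    }
}
```

A pattern with an unbound predicate, which is common in SPARQL, would instead have to visit every partition, which is one reason the better layout depends on the query workload.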
On 03/09/11 16:29, David Jordan wrote:
I have not used a column-oriented database, but I am somewhat familiar with them. My understanding is that the storage is partitioned on a column basis, so that there is no physical clustering together of all the columns for a given row. An advantage of this would be the case where you have tables with many columns but a particular application only needs a small subset of them. With the SDB representation of triples (3 columns) and quads (4 columns), and access typically based on having a specific value for one or two of the columns, I am not so sure that a column-based approach would offer any advantage. But again, I am no expert on these types of databases.

These discussions about alternative datastore representations of RDF/OWL data are very useful for gaining a better understanding of which data architectures yield the best implementation approach for high performance.

P.S. If Monet provides support for JDBC, I would not think much effort is needed to support it with SDB.

-----Original Message-----
From: nat lu [mailto:[email protected]]
Sent: Saturday, September 03, 2011 6:59 AM
To: [email protected]
Subject: Re: Adding support for MonetDB

1) Has anyone tried using MonetDB just as a JDBC source for SDB? I suppose the DDL Jena uses to create the normalized schema may need adjusting to suit MonetDB's SQL flavour, but it should get far enough to try it out and do some gap analysis, right?

2) It also seems reasonable that a SPARQL front end (at the top of MonetDB's three-layer stack) could be created in MAL to augment the existing SQL and XQuery modules. I've also seen some talk in newsgroups of an RDF module that is under development/experimental at this stage.

I'm interested to see how an SDB column store will perform with my small dataset compared to an RDBMS-backed SDB instance.

On 02/09/11 10:11, Paolo Castagna wrote:
Andy Seaborne wrote:
It seems to use a non-normalized table design (as did the CStore paper) and rely on indexing.
It would be interesting to see how that compares with a normalized design, which is what most RDF stores use. When normalized, the joins for patterns are on fixed-size numbers, not string comparisons, and don't use secondary indexes (and at least one uses bitmap indexes).

SDB is built around a normalized design and is portable across various SQL databases. The main triple table (or quad table) is 3 columns of 4 or 8 bytes (depending on hash vs index allocation of internal ids). That means the main table is a thin, long table.

If you want a deep integration of SPARQL and MonetDB, the first thing to consider, if pure speed is the objective, is what schema design is best for MonetDB.

The reported tests are on a dataset of 50 million triples, which is not particularly big; it means RAM caching is going to play a significant role nowadays (they only used a one-CPU, 4GB machine).

Andy

On 01/09/11 19:49, nat lu wrote:
Speed, I think! At least in some use cases. (But I'm no MonetDB expert.) Here's an interesting article that tries a few of them out with MonetDB and CStore: http://oai.cwi.nl/oai/asset/13806/13806B.pdf

What's missing, I believe, is the SPARQL endpoint integration.

On 01/09/11 08:36, Andy Seaborne wrote:
Tobias,

To turn the question round: what unique features of MonetDB could be exposed through SDB to yield a better RDF store? The current design covers the current set of supported databases, but it's not fixed; maybe there is something especially useful in MonetDB, and maybe it needs an extension to the design.

The design is based on templating via all those classes (the instance for each database is a small class). The SQL generator usually needs some per-DB work because SQL databases aren't exactly very "standard".

Andy

On 25/08/11 21:28, Paolo Castagna wrote:
Hi Tobias, first of all, welcome to the jena-users mailing list.

Tobias Willig wrote:
Hi everyone,

I'd like to add support for MonetDB in SDB and I have two questions concerning this project:

1.
How much effort does it take to add support for a new database type?

It requires some effort. You need to add new Java classes and the necessary tests. Have you checked out the SDB sources yet? If not:

  svn co https://svn.apache.org/repos/asf/incubator/jena/Jena2/SDB/trunk/ SDB
  cd SDB
  mvn test

You can use Eclipse and search for "Derby", which is one of the DBMSs supported by SDB. This way you'll find the list of Java classes in SDB that support Derby. Then you can read and study those classes. While you do that, you'll learn the design of SDB and get an idea of what is required to add MonetDB.

2. Are there predefined extension points that allow adding a new database type easily?

Yes, there are. Look at the superclasses and interfaces of the classes found above (i.e. by searching for "Derby"). There isn't a "how to add a new database to SDB" guide, but make sure you read the general SDB documentation (it does not hurt). Also, if you want to contribute a "how to add a new database to SDB" guide, you are more than welcome to do so.

If so, could you give me the names of some classes and config files which are important to accomplish that task?

You can start from:

  GenerateSQLDerby -- extends --> GenerateSQL
  FormatterSimpleDerby -- extends --> FormatterSimple
  StoreSimpleDerby -- extends --> StoreBase1
  FmtLayout2HashDerby -- extends --> FmtLayout2
  StoreTriplesNodesHashDerby -- extends --> StoreBaseHash
  TupleLoaderHashDerby -- extends --> TupleLoaderHashBase
  FmtLayout2IndexDerby -- extends --> FmtLayout2HashDerby
  StoreTriplesNodesIndexDerby -- extends --> StoreBaseIndex
  TupleLoaderIndexDerby -- extends --> TupleLoaderIndexBase

Also look at:

  - JDBC.java
  - DatabaseType.java
  - StoreFactory.java
  - SDB.java

and the existing tests.

May I ask what motivates you to add MonetDB? I've never used it myself and, indeed, I use TDB instead of SDB. Transactions are coming, hopefully in the 0.9.0 release of TDB.

Last but not least, it's not only about the code.
You should be willing to support the users of your code too. Once you add support for MonetDB, people will start using it and, as they use your code, they'll find bugs and, eventually, ask you for more features. You should be willing to put some effort into fixing the bugs, at least... and you can always say "no" politely to new features, until someone else who really needs the new feature, and is willing to put some effort in, takes over and pushes the software a step further.

Once you start, you might have more specific questions on the SDB design. You can post your questions here or on the [email protected] mailing list. The more you demonstrate that you have put in some effort, the more likely you are to receive helpful answers from the SDB developers. If you don't put in enough effort and expect others to do it for you, I am afraid you risk not getting much back.

To conclude, if you are motivated and you think you'll have fun doing it (and you need it for your work): go ahead. It's not a terribly huge task, and it could be a nice contribution (in particular for all the MonetDB users out there).

Paolo

Thanks in advance!

Best Regards,
Tobias Willig
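The per-database templating Andy describes, and the base-class/Derby-subclass pairs Paolo lists, follow a template-method style. The Java sketch below is a hypothetical illustration of that idea, not SDB's actual API: a base class builds the DDL for a normalized three-column triple table, choosing 8-byte (BIGINT) ids for hash allocation and 4-byte (INTEGER) ids for index allocation as Andy describes, while each small per-database subclass overrides only a dialect hook, here identifier quoting. The class, table and column names, and the assumption that a MonetDB subclass would use standard double-quoted identifiers, are illustrative only.

```java
// Hypothetical sketch of per-database templating; none of these names are real SDB classes.
public class DialectTemplateSketch {

    /** How internal node ids are allocated in the normalized layout. */
    public enum IdAllocation { HASH, INDEX }

    /** Shared logic lives here; each database supplies only its dialect hooks. */
    public abstract static class TripleTableFormatter {
        private final IdAllocation allocation;

        protected TripleTableFormatter(IdAllocation allocation) {
            this.allocation = allocation;
        }

        /** Hash ids take 8 bytes (BIGINT), index ids 4 bytes (INTEGER). */
        protected String idColumnType() {
            return allocation == IdAllocation.HASH ? "BIGINT" : "INTEGER";
        }

        /** Dialect hook: how this database quotes identifiers. */
        protected abstract String quote(String identifier);

        /** Template method: DDL for a thin, long three-column triple table. */
        public final String createTripleTable() {
            String type = idColumnType();
            return "CREATE TABLE " + quote("Triples") + " ("
                    + quote("s") + " " + type + " NOT NULL, "
                    + quote("p") + " " + type + " NOT NULL, "
                    + quote("o") + " " + type + " NOT NULL, "
                    + "PRIMARY KEY (" + quote("s") + ", " + quote("p") + ", " + quote("o") + "))";
        }
    }

    /** Assumed MonetDB flavour: standard SQL double-quoted identifiers. */
    public static final class MonetDBFormatterSketch extends TripleTableFormatter {
        public MonetDBFormatterSketch(IdAllocation allocation) {
            super(allocation);
        }

        @Override
        protected String quote(String identifier) {
            return "\"" + identifier + "\"";
        }
    }

    /** MySQL-style flavour: backtick-quoted identifiers. */
    public static final class MySQLFormatterSketch extends TripleTableFormatter {
        public MySQLFormatterSketch(IdAllocation allocation) {
            super(allocation);
        }

        @Override
        protected String quote(String identifier) {
            return "`" + identifier + "`";
        }
    }
}
```

In this arrangement, adjusting the DDL to a new SQL flavour, as nat lu anticipates for MonetDB, means writing one more small subclass rather than touching the shared logic.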
