Hi Everyone,

I've seen a lot of talk on the forums over the past year about the need for database support in the D Standard Library and I completely agree. At the end of the day the purpose of any programming language and its attendant libraries is to allow the developer to solve their problems quickly and efficiently; and a large subset of those solutions require some form of structured data store. To my mind, this makes some form of interface(s) to a data-store an essential component of the D Standard Library. And since this is something that my particular problem spaces also need, I thought it would be useful to attempt to do something about it.

First, I've seen a couple of promising projects, the most complete, and recent of which, dstddb (Github: https://github.com/cruisercoder/dstddb) hasn't seen a commit since June. An additional setback came when I tried to use it and was greeted with a litany of compiler errors.

This is *not* a problem, it's the natural course of a volunteer community such as ours; and I want to thank Erik Smith profusely for his work. Priorities and circumstances change and that means that valuable projects are inexplicably dropped.

But we still lack a critical component, and to get the conversation started, I'd like to break down the issues I've seen brought up in past threads on this subject and encourage you to bring your own. I may have ideas, but I can't possibly know the entire problem space.

1. Isn't this an enormous amount of work?

My answer: Absolutely, depending on your preferred scope of work.

In general, I've seen two distinct camps on this issue. One says that we should implement everything in D from the ground up, including re-implementation of the database drivers themselves in D. If this is your preferred scope of work then you will invariably become disheartened at the truly stupendous amount of work you face and give up.

The other camp says that we should make use of existing drivers and include them in the D Standard Library. This is difficult path to follow as the vanilla build of the D Standard Library now requires a significant number of foreign libraries, all with differing licenses, be built and distributed to everyone; regardless of whether or not they use them in their project. This is more-or-less than path the dstddb is/was on.

My idea: Focus on defining the interface, not the individual driver implementations.

If instead we focused on defining an interface that a "conforming implementation" had to follow, we would allow developers to only pull in the library they need or build a from-scratch library if they so desire.

Indeed this is the model that both Java (JDO) and .NET (ADO.NET) follow and I think we would be well advised to follow their lead here. Not only is the methodology battle-proven, it is also well understood by a significant portion of D's potential user-base. By way of example, Npgsql is the ADO.NET implementation of a driver for PostgreSQL.

2. There are so many different types of data storage systems, how do you design a system generic enough for all of them?

My answer: You don't. Nobody else has bothered trying, and I believe that our worry over that question is a large part of why we don't have anything substantive today.

My idea: Split the data storage systems out by category of data-store.
For example:
        - SQL: std.database.sql (PostgreSQL, MySQL, MSSQL, etc.)
        - Document: std.database.document (Mongo, CouchDB, etc.)
        - Key-Value: std.database.keyvalue (Redis, etcd2, etc.)

If you want something that doesn't fit into a category above, you're own your own, but you were also on your own in other languages.

3. We need to provide a single interface for all data-stores in the SQL/Document/Key-Value category.

My answer: Are you sure? The problem is that each underlying data-store has it's own dialect. For example, PostgreSQL and MSSQL are both ostensibly ANSI-SQL, except where they aren't. Re-targeting data-stores, even in the same category, is never going to be as simple as changing a connection string. And additionally, you will have to implement a super-set of features in the interface to support all the variations and throw exceptions where the chosen implementation does not support a specific feature.

My idea: Each data store has it's own implementation with it's own naming convention. For example (ADO.NET):
        - SqlConnection (MSSQL)
        - NpgsqlConnection (Npgsql)

Yes, this means that you have to change names in your code if you switch data-stores, but since you are already changing your queries, which is a much more difficult change, this isn't a significant additional cost. Also, the code becomes clearer to those who take over maintenance duties from the original author, especially when you are mixing data-stores. But in all honest, most developers will pick on technology and stick with it for the entirety of the software's lifespan.

4. We should hide querying from the developer because they are bad at it, security flaws, etc.

My answer: While agree in principal, especially with the security concerns, in reality what you are asking for is an ORM. In my opinion, that is a separate concern from a database interface, and is typically implemented as layer over the DB interface.

My idea: Don't do it. Save it for a different project.

5. D has so many useful features for data access, we should use as many as possible!

My answer: D absolutely has many useful features for data access and manipulation. But that doesn't mean that a good interface has to use any of them. The first job of a Database Interface, and indeed any library, is to get the job done with a minimum of overhead. Let's worry about that before going crazy adding in all the D goodness. Ranges have been a particular target for abuse here, and while I love ranges, I think the mechanics of data-store manipulation don't lend themselves well to working with ranges. I'd love to hear your ideas on this though.

My idea: Focus on a more conservative implementation in the style of JDO or ADO.NET. This will allow us to ship something that works in a reasonable time frame. I'm not saying that we can't use any of D's unique talents, but using those talents should be subordinate to designing an interface that works efficiently.

That is all I have for now. I am looking forward to hear your thoughts on this topic! Until then, I am going to go close out 2016 (PST) with family and friends and I wish you all a Happy New Year!

--
Adam Wilson
IRC: LightBender
import quiet.dlang.dev;

Reply via email to