Hi Everyone,
I've seen a lot of talk on the forums over the past year about the need
for database support in the D Standard Library and I completely agree.
At the end of the day the purpose of any programming language and its
attendant libraries is to allow the developer to solve their problems
quickly and efficiently; and a large subset of those solutions require
some form of structured data store. To my mind, this makes some form of
interface(s) to a data-store an essential component of the D Standard
Library. And since this is something that my particular problem spaces
also need, I thought it would be useful to attempt to do something about it.
First, I've seen a couple of promising projects, the most complete, and
recent of which, dstddb (Github: https://github.com/cruisercoder/dstddb)
hasn't seen a commit since June. An additional setback came when I tried
to use it and was greeted with a litany of compiler errors.
This is *not* a problem, it's the natural course of a volunteer
community such as ours; and I want to thank Erik Smith profusely for his
work. Priorities and circumstances change and that means that valuable
projects are inexplicably dropped.
But we still lack a critical component, and to get the conversation
started, I'd like to break down the issues I've seen brought up in past
threads on this subject and encourage you to bring your own. I may have
ideas, but I can't possibly know the entire problem space.
1. Isn't this an enormous amount of work?
My answer: Absolutely, depending on your preferred scope of work.
In general, I've seen two distinct camps on this issue. One says that we
should implement everything in D from the ground up, including
re-implementation of the database drivers themselves in D. If this is
your preferred scope of work then you will invariably become
disheartened at the truly stupendous amount of work you face and give up.
The other camp says that we should make use of existing drivers and
include them in the D Standard Library. This is difficult path to follow
as the vanilla build of the D Standard Library now requires a
significant number of foreign libraries, all with differing licenses, be
built and distributed to everyone; regardless of whether or not they use
them in their project. This is more-or-less than path the dstddb is/was on.
My idea: Focus on defining the interface, not the individual driver
implementations.
If instead we focused on defining an interface that a "conforming
implementation" had to follow, we would allow developers to only pull in
the library they need or build a from-scratch library if they so desire.
Indeed this is the model that both Java (JDO) and .NET (ADO.NET) follow
and I think we would be well advised to follow their lead here. Not only
is the methodology battle-proven, it is also well understood by a
significant portion of D's potential user-base. By way of example,
Npgsql is the ADO.NET implementation of a driver for PostgreSQL.
2. There are so many different types of data storage systems, how do you
design a system generic enough for all of them?
My answer: You don't. Nobody else has bothered trying, and I believe
that our worry over that question is a large part of why we don't have
anything substantive today.
My idea: Split the data storage systems out by category of data-store.
For example:
- SQL: std.database.sql (PostgreSQL, MySQL, MSSQL, etc.)
- Document: std.database.document (Mongo, CouchDB, etc.)
- Key-Value: std.database.keyvalue (Redis, etcd2, etc.)
If you want something that doesn't fit into a category above, you're own
your own, but you were also on your own in other languages.
3. We need to provide a single interface for all data-stores in the
SQL/Document/Key-Value category.
My answer: Are you sure? The problem is that each underlying data-store
has it's own dialect. For example, PostgreSQL and MSSQL are both
ostensibly ANSI-SQL, except where they aren't. Re-targeting data-stores,
even in the same category, is never going to be as simple as changing a
connection string. And additionally, you will have to implement a
super-set of features in the interface to support all the variations and
throw exceptions where the chosen implementation does not support a
specific feature.
My idea: Each data store has it's own implementation with it's own
naming convention. For example (ADO.NET):
- SqlConnection (MSSQL)
- NpgsqlConnection (Npgsql)
Yes, this means that you have to change names in your code if you switch
data-stores, but since you are already changing your queries, which is a
much more difficult change, this isn't a significant additional cost.
Also, the code becomes clearer to those who take over maintenance duties
from the original author, especially when you are mixing data-stores.
But in all honest, most developers will pick on technology and stick
with it for the entirety of the software's lifespan.
4. We should hide querying from the developer because they are bad at
it, security flaws, etc.
My answer: While agree in principal, especially with the security
concerns, in reality what you are asking for is an ORM. In my opinion,
that is a separate concern from a database interface, and is typically
implemented as layer over the DB interface.
My idea: Don't do it. Save it for a different project.
5. D has so many useful features for data access, we should use as many
as possible!
My answer: D absolutely has many useful features for data access and
manipulation. But that doesn't mean that a good interface has to use any
of them. The first job of a Database Interface, and indeed any library,
is to get the job done with a minimum of overhead. Let's worry about
that before going crazy adding in all the D goodness. Ranges have been a
particular target for abuse here, and while I love ranges, I think the
mechanics of data-store manipulation don't lend themselves well to
working with ranges. I'd love to hear your ideas on this though.
My idea: Focus on a more conservative implementation in the style of JDO
or ADO.NET. This will allow us to ship something that works in a
reasonable time frame. I'm not saying that we can't use any of D's
unique talents, but using those talents should be subordinate to
designing an interface that works efficiently.
That is all I have for now. I am looking forward to hear your thoughts
on this topic! Until then, I am going to go close out 2016 (PST) with
family and friends and I wish you all a Happy New Year!
--
Adam Wilson
IRC: LightBender
import quiet.dlang.dev;