Re: Databases and the D Standard Library

2017-01-04 Thread Dejan Lekic via Digitalmars-d

On Sunday, 1 January 2017 at 03:24:31 UTC, Adam Wilson wrote:

Hi Everyone,

I've seen a lot of talk on the forums over the past year about 
the need for database support in the D Standard Library and I 
completely agree. At the end of the day the purpose of any 
programming language and its attendant libraries is to allow 
the developer to solve their problems quickly and efficiently; 
and a large subset of those solutions require some form of 
structured data store. To my mind, this makes some form of 
interface(s) to a data-store an essential component of the D 
Standard Library. And since this is something that my 
particular problem spaces also need, I thought it would be 
useful to attempt to do something about it.



The only thing I want, database related, in the standard library 
is the API! - Nothing else! There should be a standard 
implementation of that API (libd-db.so for an example), but it 
should be separated from Phobos. In general, Phobos should only 
contain the APIs in my humble opinion.
We should handle XML processing the same way (API in Phobos, 
libd-xml.so for the reference implementation), Image processing 
the same way, GUI, etc...


Why? Phobos is enormous already!


Re: Databases and the D Standard Library

2017-01-03 Thread Jacob Carlborg via Digitalmars-d

On 2017-01-03 18:13, Chris Wright wrote:


The returned row data is mandatory, and its size can be much larger than
the stack limit. (A MySQL MEDIUMBLOB field will likely break your stack
limit.)

I suppose you could have a streaming API for row data, one that has a
stack-allocated buffer and returns slices of that:

  string fieldName;
  ubyte[] data;
  ubyte[][string] fields;
  db.query("SELECT * FROM USERS")
// have to revisit this if a db allows large names
.onFieldStart((fieldName) => field = fieldName)
.onFieldData((fragment) => data ~= fragment)
.onFieldEnd(() { fields[field] = data; data = null; })
.onRowEnd(() => process(fields))
.onResultsEnd!(() => writeln("done"))
.exec();

This looks pretty terrible, to be honest. I get this sort of thing from
nodejs because it doesn't want to potentially block and also doesn't want
to delay letting me process things, but the worst I get there is usually
two callbacks.

This would also result in more GC use for the majority of people who use
the GC.


Look, I didn't say that using the GC should be completely forbidden. I 
just said we should try to avoid it. For example, I've been using the 
ddb Postgres driver [1]. It uses classes for most of its types, even if 
it might not be necessary. Here's one example [2], unless there some 
intention to have some form of higher level, DB independent, API on top 
of this, I don't see a reason why that type needs to be a class.



2. Buffers say nothing how they're allocated. With classes on the other
hand, you're basically forced to allocate with the GC


You haven't looked at std.experimental.allocator, have you?


I know it's possible to allocate a class without the GC, hence the 
"basically". I'm not sure how other write their code but at least I make 
the assumption that all objects are allocated with the GC.


[1] https://github.com/pszturmaj/ddb
[2] https://github.com/pszturmaj/ddb/blob/master/source/ddb/postgres.d#L904

--
/Jacob Carlborg


Re: Databases and the D Standard Library

2017-01-03 Thread Chris Wright via Digitalmars-d
On Tue, 03 Jan 2017 13:23:55 +0100, Jacob Carlborg wrote:

> On 2017-01-03 09:38, Chris Wright wrote:
> 
>> You are unable to interact with two different databases in the same
>> executable using the same library. For instance, if you're using
>> hibernated, either you compiled it to connect to mysql, or you compiled
>> it to connect to oracle.
> 
> That's true. And that's why I said it's difficult to design an API
> without trying it in code :)

I didn't try it in code.

>> In exchange, you get...slightly less GC usage. It's not *no* GC usage
>> --
>> you'll see a bunch of buffers allocated to hold incoming and outgoing
>> messages. You'll just peel back one layer of it.
> 
> 1. I hope there won't be that many buffers in the API, at least not in
> the user facing API

The returned row data is mandatory, and its size can be much larger than 
the stack limit. (A MySQL MEDIUMBLOB field will likely break your stack 
limit.)

I suppose you could have a streaming API for row data, one that has a 
stack-allocated buffer and returns slices of that:

  string fieldName;
  ubyte[] data;
  ubyte[][string] fields;
  db.query("SELECT * FROM USERS")
// have to revisit this if a db allows large names
.onFieldStart((fieldName) => field = fieldName)
.onFieldData((fragment) => data ~= fragment)
.onFieldEnd(() { fields[field] = data; data = null; })
.onRowEnd(() => process(fields))
.onResultsEnd!(() => writeln("done"))
.exec();

This looks pretty terrible, to be honest. I get this sort of thing from 
nodejs because it doesn't want to potentially block and also doesn't want 
to delay letting me process things, but the worst I get there is usually 
two callbacks.

This would also result in more GC use for the majority of people who use 
the GC.

> 2. Buffers say nothing how they're allocated. With classes on the other
> hand, you're basically forced to allocate with the GC

You haven't looked at std.experimental.allocator, have you?

http://dpldocs.info/experimental-docs/std.conv.emplace.3.html
http://dpldocs.info/experimental-docs/std.experimental.allocator.make.html
http://dpldocs.info/experimental-docs/
std.experimental.allocator.dispose.2.html


Re: Databases and the D Standard Library

2017-01-03 Thread Jacob Carlborg via Digitalmars-d

On 2017-01-03 09:38, Chris Wright wrote:


You are unable to interact with two different databases in the same
executable using the same library. For instance, if you're using
hibernated, either you compiled it to connect to mysql, or you compiled
it to connect to oracle.


That's true. And that's why I said it's difficult to design an API 
without trying it in code :)



In exchange, you get...slightly less GC usage. It's not *no* GC usage --
you'll see a bunch of buffers allocated to hold incoming and outgoing
messages. You'll just peel back one layer of it.


1. I hope there won't be that many buffers in the API, at least not in 
the user facing API


2. Buffers say nothing how they're allocated. With classes on the other 
hand, you're basically forced to allocate with the GC



You'd be much better off asking that we encourage the use of
std.experimental.allocator in the driver interface.


Then I'll ask for that as well :)

--
/Jacob Carlborg


Re: Databases and the D Standard Library

2017-01-03 Thread Nicholas Wilson via Digitalmars-d

On Tuesday, 3 January 2017 at 08:09:54 UTC, Chris Wright wrote:

On Mon, 02 Jan 2017 21:25:42 -0800, Adam Wilson wrote:
As far as I am aware, the only way to meet those requirements 
is to use a base-class model. Is there something I am missing?


Templates. Templates everywhere.

Every method in your application that might possibly touch a 
database, or touch anything that touches a database, and so on, 
needs to be templated according to what type of database might 
be used.


That limits you to one DB per compilation or craploads of 
template bloat.


There are a number variables here: the number of DB backends you 
wish to support (b), the number of DB backends you actually use 
at runtime (r), the number of symbols (not quite the word I'm 
looking for but, oh well) you need to represent an abstract 
backend API (s),the number of class types you use to abstract the 
backend (c) and the number of template you use to abstract the 
back end (t).


b is ideally fixed at "all the backends"
r is variable and dependent on the application (e.g. I may only 
care for Postgres, but someone else may wish to support many SQL 
DBs). If r == 1 then a template approach is acceptable.


s is a function of the dissimilarity of the backends you wish to 
support. Breaking the problem up into SQL like, graph-like and 
KV-store is a tradeoff somewhere between having "one DB 
(interface) to rule them all" and one interface for each backend.


c + t = s

What this represents is a tradeoff between compile time dispatch 
and runtime dispatch. As s moves from being all classes to more 
templates + structs (from the "bottom up"), the last layer of 
dynamic dispatch before the static dispatch of the templates 
becomes an algebraic type selection (i.e. check the tag, choose 
the type, and then static dispatch).


I believe the sweet spot for this lies at the point where the 
dissimilarity of similar backends becomes apparent after the 
start of a logical operation. Or put another way the point where 
I know the result that I want and no longer care about any 
implementation details.


As an example using a compute API (sorry I don't know much about 
DBs): launching a kernel represents a single logical operation 
but is in fact many driver calls. If one wishes to abstract the 
compute API then this point becomes the point I would choose.


Finding those points will probably not be easy and may be 
different for different people, but it is worth considering.





Re: Databases and the D Standard Library

2017-01-03 Thread Chris Wright via Digitalmars-d
On Tue, 03 Jan 2017 08:25:55 +0100, Jacob Carlborg wrote:

> Structs and functions, with or without templates. Could something like
> this work:
> 
> module db_interface;
> 
> version (Postgres)
>  public import pg.db_interface;
> else version (MySQL)
>  public import mysql.db_interface;

You are unable to interact with two different databases in the same 
executable using the same library. For instance, if you're using 
hibernated, either you compiled it to connect to mysql, or you compiled 
it to connect to oracle.

This means you can't, for instance, use mysql for the CI server (because 
it's open source and doesn't have licensing fees), then use oracle for 
production (because it's faster for your workflow), because then you're 
testing with a different binary. You can't have some data in postgres and 
some in SQL Server because you're in the middle of a migration.

You can still use both if you are using the database interface directly. 
But if you're connecting via a library, you're SOL.

You have to recompile everything whenever you switch databases. That's a 
barrier to proprietary libraries that interact with databases. They're 
not impossible, but they have to release separate binaries for every 
database the maintainer thinks you might want to connect to.

Every library that lets you access a database must maintain a list of db 
drivers that it supports. If you have a new or private driver you want to 
use, you need to modify any library you use that talks to a database.

In exchange, you get...slightly less GC usage. It's not *no* GC usage -- 
you'll see a bunch of buffers allocated to hold incoming and outgoing 
messages. You'll just peel back one layer of it.

You'd be much better off asking that we encourage the use of 
std.experimental.allocator in the driver interface.


Re: Databases and the D Standard Library

2017-01-03 Thread Chris Wright via Digitalmars-d
On Mon, 02 Jan 2017 21:25:42 -0800, Adam Wilson wrote:
> As far as I am aware, the only way to meet those requirements is to use
> a base-class model. Is there something I am missing?

Templates. Templates everywhere.

Every method in your application that might possibly touch a database, or 
touch anything that touches a database, and so on, needs to be templated 
according to what type of database might be used.


Re: Databases and the D Standard Library

2017-01-02 Thread Jacob Carlborg via Digitalmars-d

On 2017-01-03 06:25, Adam Wilson wrote:


Ok. How would you design a database API for D?


I don't know. I think it's difficult to design something upfront without 
trying out different API's to see what's possible to implement in code.


Structs and functions, with or without templates. Could something like 
this work:


module db_interface;

version (Postgres)
public import pg.db_interface;
else version (MySQL)
public import mysql.db_interface;

static assert(isInterfaceImplemented, "The DB interface is not 
implemented");



--
/Jacob Carlborg


Re: Databases and the D Standard Library

2017-01-02 Thread Adam Wilson via Digitalmars-d

On 1/2/17 12:09 AM, Jacob Carlborg wrote:

On 2017-01-02 02:34, Adam Wilson wrote:


That was my intention, the knee-jerk reaction that class and interfaces
get here sometimes strikes me as a bit histrionic sometimes. They are a
tool with a use case. :)


I think that the design should try to avoid classes as much as possible
for things that will be frequently created. It's always possible to wrap
a struct in a class, the other way around is a bit more difficult. But
when it comes to the connection object I think it's fine to use classes
since it will most likely only be created once per thread.



Ok. How would you design a database API for D?

The requirements I am operating under are:

1. Individual data-store driver implementations are not included in the 
D Standard Library. Driver licensing and implementation details vary. 
For example libpq5 uses it's own mix of licenses that is not Boost 
compatible.


2. The D Standard Library provides a common API and implementation of 
shared components, but leaves the data-store specific implementation up 
to the implementer. We don't care how the implementation is constructed 
or licensed, only that the API is followed.


3. In order to support higher level abstractions like ORM's we need a 
base class model that can be extended by implementers but still consumed 
by the ORM without knowing implementation specific details.


As far as I am aware, the only way to meet those requirements is to use 
a base-class model. Is there something I am missing?


--
Adam Wilson
IRC: LightBender
import quiet.dlang.dev;


Re: Databases and the D Standard Library

2017-01-02 Thread Adam Wilson via Digitalmars-d

On 1/2/17 12:05 AM, Jacob Carlborg wrote:

On 2017-01-01 17:50, Chris Wright wrote:


Those both limit your ability to use the underlying database to its full
potential. They offer a chance for queries that seem simple and efficient
to become horribly inefficient.


I'm perfectly aware of the limitations and capabilities of ORM's. I'm
just saying that making the interface/names different just to make it
different is not a good idea. It should be up to the user to choose if
an ORM is used or not and this interface should try to, as much as
possible, to make it possible to use an ORM just as well as not using an
ORM.



Is there a assumption here that there are no classes? Because and ORM 
could quite easily work with base classes, and indeed both NHibernate 
and EntityFramework function exactly this way.



This whole idea seems fail even before it's barely stared. If this idea
is going to work then all the layers need to be designed correctly and
the lower layers should not know anything about the higher layers.



I absolutely agree, which, ironically, is why I am having this conversation.

--
Adam Wilson
IRC: LightBender
import quiet.dlang.dev;


Re: Databases and the D Standard Library

2017-01-02 Thread Adam Wilson via Digitalmars-d

On 1/2/17 8:33 AM, Chris Wright wrote:

On Sun, 01 Jan 2017 17:55:01 -0800, Adam Wilson wrote:

On that I beg to differ. The C libraries are not @safe, they have wildly
different API's, and they have high-complexity, which is a large
risk-factor for bugs and/or security flaws.


If we have the database interface defined, there's no reason we couldn't
provide, for instance, a postgres-c-wrapper driver and a postgres-pure-d
driver.



Precisely, I would love to enable this!

--
Adam Wilson
IRC: LightBender
import quiet.dlang.dev;


Re: Databases and the D Standard Library

2017-01-02 Thread Chris Wright via Digitalmars-d
On Sun, 01 Jan 2017 17:55:01 -0800, Adam Wilson wrote:
> On that I beg to differ. The C libraries are not @safe, they have wildly
> different API's, and they have high-complexity, which is a large
> risk-factor for bugs and/or security flaws.

If we have the database interface defined, there's no reason we couldn't 
provide, for instance, a postgres-c-wrapper driver and a postgres-pure-d 
driver.


Re: Databases and the D Standard Library

2017-01-02 Thread Jacob Carlborg via Digitalmars-d

On 2017-01-01 17:50, Chris Wright wrote:


Those both limit your ability to use the underlying database to its full
potential. They offer a chance for queries that seem simple and efficient
to become horribly inefficient.


I'm perfectly aware of the limitations and capabilities of ORM's. I'm 
just saying that making the interface/names different just to make it 
different is not a good idea. It should be up to the user to choose if 
an ORM is used or not and this interface should try to, as much as 
possible, to make it possible to use an ORM just as well as not using an 
ORM.


This whole idea seems fail even before it's barely stared. If this idea 
is going to work then all the layers need to be designed correctly and 
the lower layers should not know anything about the higher layers.


--
/Jacob Carlborg


Re: Databases and the D Standard Library

2017-01-02 Thread Jacob Carlborg via Digitalmars-d

On 2017-01-02 02:34, Adam Wilson wrote:


That was my intention, the knee-jerk reaction that class and interfaces
get here sometimes strikes me as a bit histrionic sometimes. They are a
tool with a use case. :)


I think that the design should try to avoid classes as much as possible 
for things that will be frequently created. It's always possible to wrap 
a struct in a class, the other way around is a bit more difficult. But 
when it comes to the connection object I think it's fine to use classes 
since it will most likely only be created once per thread.


--
/Jacob Carlborg


Re: Databases and the D Standard Library

2017-01-01 Thread rikki cattermole via Digitalmars-d

On 02/01/2017 4:44 PM, Adam Wilson wrote:

rikki cattermole wrote:

On 02/01/2017 3:03 PM, Adam Wilson wrote:

rikki cattermole wrote:

On 01/01/2017 5:19 PM, Adam D. Ruppe wrote:

On Sunday, 1 January 2017 at 03:51:52 UTC, rikki cattermole wrote:

Which is fine if all you use is c's sockets or only that database
connection for a thread.


The C drivers typically offer handles of some sort (Windows HANDLE,
*nix
file descriptor, that kind of thing) that you can integrate into other
event loops.


That's fine and all, but you've still got to deal with it on D's
side so
you can mix and match libraries that require access to the same event
loop (such as Windows).


Vibe.d is working on a native D event loop. We would probably want to
integrate with that.

EventCore: https://code.dlang.org/packages/eventcore



No, it isn't generic enough.
Nor can it handle windowing without a good bit of modifications.

Mine in SPEW[0] is however ready for this task.

[0]
https://github.com/Devisualization/spew/tree/master/src/base/cf/spew/event_loop




How much effort to make the changes? Their plans seem to indicate that
they want to support UI integration. I ask because I need to use the
library that is going to get the most support over time and that is
vibe.d right now.


Read and compare the code.
To add anything into the Vibe.d one requires direct modification which 
is unacceptable for Phobos IMO.


My aim for SPEW was to be as close as glib's[0] as reasonably possible. 
A well tested set of features.


I'm sorry but this is just not acceptable[1] for an event loop:

interface EventDriver {
@safe: /*@nogc:*/ nothrow:
@property EventDriverCore core();
@property EventDriverTimers timers();
@property EventDriverEvents events();
@property EventDriverSignals signals();
@property EventDriverSockets sockets();
@property EventDriverDNS dns();
@property EventDriverFiles files();
@property EventDriverWatchers watchers();

/// Releases all resources associated with the driver
void dispose();
}

Its fine for a web framework, aka a specific task but not when its generic.

When we're discussing a generic event loop it shouldn't care about the 
different usage of it. All it knows is that there are events that come 
from sources and then mapped to a consumer. It isn't the most performant 
but that is ok. You would use a different implementation depending on 
your use case e.g. 1:1 is easy enough to do without a event loop manager.


Just so you're aware, windowing is extremely hard to get right. Sockets, 
DNS, signals and timers ext. are easy to implement compared.


[0] https://developer.gnome.org/glib/stable/glib-The-Main-Event-Loop.html
[1] 
https://github.com/vibe-d/eventcore/blob/master/source/eventcore/driver.d


Re: Databases and the D Standard Library

2017-01-01 Thread Adam Wilson via Digitalmars-d

rikki cattermole wrote:

On 02/01/2017 3:03 PM, Adam Wilson wrote:

rikki cattermole wrote:

On 01/01/2017 5:19 PM, Adam D. Ruppe wrote:

On Sunday, 1 January 2017 at 03:51:52 UTC, rikki cattermole wrote:

Which is fine if all you use is c's sockets or only that database
connection for a thread.


The C drivers typically offer handles of some sort (Windows HANDLE,
*nix
file descriptor, that kind of thing) that you can integrate into other
event loops.


That's fine and all, but you've still got to deal with it on D's side so
you can mix and match libraries that require access to the same event
loop (such as Windows).


Vibe.d is working on a native D event loop. We would probably want to
integrate with that.

EventCore: https://code.dlang.org/packages/eventcore



No, it isn't generic enough.
Nor can it handle windowing without a good bit of modifications.

Mine in SPEW[0] is however ready for this task.

[0]
https://github.com/Devisualization/spew/tree/master/src/base/cf/spew/event_loop



How much effort to make the changes? Their plans seem to indicate that 
they want to support UI integration. I ask because I need to use the 
library that is going to get the most support over time and that is 
vibe.d right now.


--
Adam Wilson
IRC: LightBender
import quiet.dlang.dev;


Re: Databases and the D Standard Library

2017-01-01 Thread rikki cattermole via Digitalmars-d

On 02/01/2017 3:03 PM, Adam Wilson wrote:

rikki cattermole wrote:

On 01/01/2017 5:19 PM, Adam D. Ruppe wrote:

On Sunday, 1 January 2017 at 03:51:52 UTC, rikki cattermole wrote:

Which is fine if all you use is c's sockets or only that database
connection for a thread.


The C drivers typically offer handles of some sort (Windows HANDLE, *nix
file descriptor, that kind of thing) that you can integrate into other
event loops.


That's fine and all, but you've still got to deal with it on D's side so
you can mix and match libraries that require access to the same event
loop (such as Windows).


Vibe.d is working on a native D event loop. We would probably want to
integrate with that.

EventCore: https://code.dlang.org/packages/eventcore



No, it isn't generic enough.
Nor can it handle windowing without a good bit of modifications.

Mine in SPEW[0] is however ready for this task.

[0] 
https://github.com/Devisualization/spew/tree/master/src/base/cf/spew/event_loop


Re: Databases and the D Standard Library

2017-01-01 Thread Adam Wilson via Digitalmars-d

rikki cattermole wrote:

On 01/01/2017 5:19 PM, Adam D. Ruppe wrote:

On Sunday, 1 January 2017 at 03:51:52 UTC, rikki cattermole wrote:

Which is fine if all you use is c's sockets or only that database
connection for a thread.


The C drivers typically offer handles of some sort (Windows HANDLE, *nix
file descriptor, that kind of thing) that you can integrate into other
event loops.


That's fine and all, but you've still got to deal with it on D's side so
you can mix and match libraries that require access to the same event
loop (such as Windows).


Vibe.d is working on a native D event loop. We would probably want to 
integrate with that.


EventCore: https://code.dlang.org/packages/eventcore

--
Adam Wilson
IRC: LightBender
import quiet.dlang.dev;


Re: Databases and the D Standard Library

2017-01-01 Thread Adam Wilson via Digitalmars-d

Adam D. Ruppe wrote:

On Sunday, 1 January 2017 at 03:24:31 UTC, Adam Wilson wrote:

interface(s) to a data-store an essential component of the D Standard
Library.


Eh, I count it as would-be-nice just because it isn't that hard to just
use the C ones, or another third party lib; it doesn't have to be Phobos
itself.



On that I beg to differ. The C libraries are not @safe, they have wildly 
different API's, and they have high-complexity, which is a large 
risk-factor for bugs and/or security flaws. Any place where we can



That said though, a basic db interface is quite simple and would be a
nice batteries included bit - it is one of the things IMO that PHP did
quite successfully (even if its interface sucked, it still just worked)


I'll agree that it isn't hard, and I think including it would help boost 
D's usability in the web service space.


--
Adam Wilson
IRC: LightBender
import quiet.dlang.dev;


Re: Databases and the D Standard Library

2017-01-01 Thread Adam Wilson via Digitalmars-d

Chris Wright wrote:

On Sat, 31 Dec 2016 19:24:31 -0800, Adam Wilson wrote:

My idea: Split the data storage systems out by category of data-store.
For example:
- SQL: std.database.sql (PostgreSQL, MySQL, MSSQL, etc.)


This is doable; SQL is an ANSI and ISO standard, and it strongly
constrains what you can do with your data.


- Document: std.database.document (Mongo, CouchDB, etc.)
- Key-Value: std.database.keyvalue (Redis, etcd2, etc.)


I'm not so certain about this. CouchDB has a rather different approach to
things than MongoDB -- possibly not as divergent as Mongo from MySQL, but
far more than Postgres from MySQL.

Likewise, there are many key/value stores in existence, and they support
many different operations. For instance, it looks like etcd2 has a notion
of directories, where you can list items in a directory. Redis just lets
you list keys with a given prefix. Redis lets you modify values in-place;
etcd2 doesn't.

We could define a common subset of operations for document databases and
key/value stores, but most people probably wouldn't be satisfied with it.

There's also a question of where you'd put Cassandra in that, since it's
decidedly not a SQL database but tries to pretend it is.

Given that the Cassandra folks wrote an ADO.NET provider for it, I would 
suggest that it is easiest to treat it as a SQL database from an 
interface standpoint.



4. We should hide querying from the developer because they are bad at
it, security flaws, etc.

My answer: While agree in principal, especially with the security
concerns, in reality what you are asking for is an ORM. In my opinion,
that is a separate concern from a database interface, and is typically
implemented as layer over the DB interface.


We can encourage people to use prepared queries with documentation and
naming.



Precisely, my focus would be an making the API as easy as possible to 
use with Parameterized Queries, and if that makes it harder to write 
non-parameterized queries, oh well. :D


--
Adam Wilson
IRC: LightBender
import quiet.dlang.dev;


Re: Databases and the D Standard Library

2017-01-01 Thread Adam Wilson via Digitalmars-d

Chris Wright wrote:

On Sun, 01 Jan 2017 10:29:28 +0100, Jacob Carlborg wrote:


On 2017-01-01 04:24, Adam Wilson wrote:


My idea: Each data store has it's own implementation with it's own
naming convention. For example (ADO.NET):
  - SqlConnection (MSSQL)
  - NpgsqlConnection (Npgsql)

Yes, this means that you have to change names in your code if you
switch data-stores, but since you are already changing your queries,
which is a much more difficult change, this isn't a significant
additional cost.


I don't think we should try to make implementations different just
because. If you have an SQL builder or an ORM on top of the interface
that abstract the differences in the SQL syntax, it's possible to switch
driver, within reason.


Those both limit your ability to use the underlying database to its full
potential. They offer a chance for queries that seem simple and efficient
to become horribly inefficient.



I cannot state my agreement with this paragraph enough. Every ORM I've 
worked with generates some inexplicably horrific SQL in seemingly simple 
situations.



I ran across a problem in NHibernate about a decade ago. We had a
straightforward HQL query involving joins. It took over a minute to run.
We wrote the simple equivalent in SQL and it completed in milliseconds.
Fortunately, NHibernate had the ability to run raw SQL queries.



I've also seen Entity Framework 6/7 do the same thing.

--
Adam Wilson
IRC: LightBender
import quiet.dlang.dev;


Re: Databases and the D Standard Library

2017-01-01 Thread Adam Wilson via Digitalmars-d

Chris Wright wrote:

On Sat, 31 Dec 2016 19:24:31 -0800, Adam Wilson wrote:

My idea: Each data store has it's own implementation with it's own
naming convention. For example (ADO.NET):
- SqlConnection (MSSQL)
- NpgsqlConnection (Npgsql)

Yes, this means that you have to change names in your code if you switch
data-stores


You *can* use classes and interfaces and type hierarchies. They do use
the GC by default, unlike structs, but they're kind of handy, especially
here.



That was my intention, the knee-jerk reaction that class and interfaces 
get here sometimes strikes me as a bit histrionic sometimes. They are a 
tool with a use case. :)



Then you have a SqlConnection interface that most people use all the time
and all people use most of the time. If you explicitly need some
connection properties that are specific to Postgres, you cast to a
PostgresConnection.



That is pretty much how it works in ADO.NET and JDO. And I think it 
works well.


--
Adam Wilson
IRC: LightBender
import quiet.dlang.dev;


Re: Databases and the D Standard Library

2017-01-01 Thread Adam Wilson via Digitalmars-d

Mark wrote:

On Sunday, 1 January 2017 at 03:24:31 UTC, Adam Wilson wrote:

2. There are so many different types of data storage systems, how do
you design a system generic enough for all of them?

My answer: You don't. Nobody else has bothered trying, and I believe
that our worry over that question is a large part of why we don't have
anything substantive today.

My idea: Split the data storage systems out by category of data-store.
For example:
- SQL: std.database.sql (PostgreSQL, MySQL, MSSQL, etc.)
- Document: std.database.document (Mongo, CouchDB, etc.)
- Key-Value: std.database.keyvalue (Redis, etcd2, etc.)


I think that these can all be seen as special cases of a hypegraph
database. So on the face of it you probably can build some common
interface (apparently the fellows in hypergraphdb.org are trying to do
something in this spirit). You can then have specialized interfaces
inheriting from it. But given how general hypergraph dbs are,  I'm not
sure if this is a worthwhile abstraction.



My experience with graph DB's is that the reality has never been 
anywhere close to the hype. I don't think it's a worthwhile abstraction. 
But that is my opinion. And if it is, we should be able to add it over 
the top of this layer if we really want too.



By the way, what about XML? The documentaion on std.xml says that the
module will be replaced at some point in the future. I wonder when and
with what...


I have no idea. It's a great question! That said it's a bit outside the 
scope of this topic. :)


--
Adam Wilson
IRC: LightBender
//quiet.dlang.dev


Re: Databases and the D Standard Library

2017-01-01 Thread Mark via Digitalmars-d

On Sunday, 1 January 2017 at 03:24:31 UTC, Adam Wilson wrote:
2. There are so many different types of data storage systems, 
how do you design a system generic enough for all of them?


My answer: You don't. Nobody else has bothered trying, and I 
believe that our worry over that question is a large part of 
why we don't have anything substantive today.


My idea: Split the data storage systems out by category of 
data-store.

For example:
- SQL: std.database.sql (PostgreSQL, MySQL, MSSQL, etc.)
- Document: std.database.document (Mongo, CouchDB, etc.)
- Key-Value: std.database.keyvalue (Redis, etcd2, etc.)


I think that these can all be seen as special cases of a 
hypegraph database. So on the face of it you probably can build 
some common interface (apparently the fellows in hypergraphdb.org 
are trying to do something in this spirit). You can then have 
specialized interfaces inheriting from it. But given how general 
hypergraph dbs are,  I'm not sure if this is a worthwhile 
abstraction.


By the way, what about XML? The documentaion on std.xml says that 
the module will be replaced at some point in the future. I wonder 
when and with what...


Re: Databases and the D Standard Library

2017-01-01 Thread Chris Wright via Digitalmars-d
On Sat, 31 Dec 2016 19:24:31 -0800, Adam Wilson wrote:
> My idea: Each data store has it's own implementation with it's own
> naming convention. For example (ADO.NET):
>   - SqlConnection (MSSQL)
>   - NpgsqlConnection (Npgsql)
> 
> Yes, this means that you have to change names in your code if you switch
> data-stores

You *can* use classes and interfaces and type hierarchies. They do use 
the GC by default, unlike structs, but they're kind of handy, especially 
here.

Then you have a SqlConnection interface that most people use all the time 
and all people use most of the time. If you explicitly need some 
connection properties that are specific to Postgres, you cast to a 
PostgresConnection.


Re: Databases and the D Standard Library

2017-01-01 Thread Chris Wright via Digitalmars-d
On Sun, 01 Jan 2017 10:29:28 +0100, Jacob Carlborg wrote:

> On 2017-01-01 04:24, Adam Wilson wrote:
> 
>> My idea: Each data store has it's own implementation with it's own
>> naming convention. For example (ADO.NET):
>>  - SqlConnection (MSSQL)
>>  - NpgsqlConnection (Npgsql)
>>
>> Yes, this means that you have to change names in your code if you
>> switch data-stores, but since you are already changing your queries,
>> which is a much more difficult change, this isn't a significant
>> additional cost.
> 
> I don't think we should try to make implementations different just
> because. If you have an SQL builder or an ORM on top of the interface
> that abstract the differences in the SQL syntax, it's possible to switch
> driver, within reason.

Those both limit your ability to use the underlying database to its full 
potential. They offer a chance for queries that seem simple and efficient 
to become horribly inefficient.

I ran across a problem in NHibernate about a decade ago. We had a 
straightforward HQL query involving joins. It took over a minute to run. 
We wrote the simple equivalent in SQL and it completed in milliseconds. 
Fortunately, NHibernate had the ability to run raw SQL queries.


Re: Databases and the D Standard Library

2017-01-01 Thread Jacob Carlborg via Digitalmars-d

On 2017-01-01 04:24, Adam Wilson wrote:


My idea: Each data store has it's own implementation with it's own
naming convention. For example (ADO.NET):
 - SqlConnection (MSSQL)
 - NpgsqlConnection (Npgsql)

Yes, this means that you have to change names in your code if you switch
data-stores, but since you are already changing your queries, which is a
much more difficult change, this isn't a significant additional cost.


I don't think we should try to make implementations different just 
because. If you have an SQL builder or an ORM on top of the interface 
that abstract the differences in the SQL syntax, it's possible to switch 
driver, within reason.


--
/Jacob Carlborg


Re: Databases and the D Standard Library

2016-12-31 Thread Chris Wright via Digitalmars-d
On Sat, 31 Dec 2016 19:24:31 -0800, Adam Wilson wrote:
> My idea: Split the data storage systems out by category of data-store.
> For example:
>   - SQL: std.database.sql (PostgreSQL, MySQL, MSSQL, etc.)

This is doable; SQL is an ANSI and ISO standard, and it strongly 
constrains what you can do with your data.

>   - Document: std.database.document (Mongo, CouchDB, etc.)
>   - Key-Value: std.database.keyvalue (Redis, etcd2, etc.)

I'm not so certain about this. CouchDB has a rather different approach to 
things than MongoDB -- possibly not as divergent as Mongo from MySQL, but 
far more than Postgres from MySQL.

Likewise, there are many key/value stores in existence, and they support 
many different operations. For instance, it looks like etcd2 has a notion 
of directories, where you can list items in a directory. Redis just lets 
you list keys with a given prefix. Redis lets you modify values in-place; 
etcd2 doesn't.

We could define a common subset of operations for document databases and 
key/value stores, but most people probably wouldn't be satisfied with it.

There's also a question of where you'd put Cassandra in that, since it's 
decidedly not a SQL database but tries to pretend it is.

> 4. We should hide querying from the developer because they are bad at
> it, security flaws, etc.
> 
> My answer: While agree in principal, especially with the security
> concerns, in reality what you are asking for is an ORM. In my opinion,
> that is a separate concern from a database interface, and is typically
> implemented as layer over the DB interface.

We can encourage people to use prepared queries with documentation and 
naming.


Re: Databases and the D Standard Library

2016-12-31 Thread rikki cattermole via Digitalmars-d

On 01/01/2017 5:19 PM, Adam D. Ruppe wrote:

On Sunday, 1 January 2017 at 03:51:52 UTC, rikki cattermole wrote:

Which is fine if all you use is c's sockets or only that database
connection for a thread.


The C drivers typically offer handles of some sort (Windows HANDLE, *nix
file descriptor, that kind of thing) that you can integrate into other
event loops.


That's fine and all, but you've still got to deal with it on D's side so 
you can mix and match libraries that require access to the same event 
loop (such as Windows).


Re: Databases and the D Standard Library

2016-12-31 Thread Adam D. Ruppe via Digitalmars-d

On Sunday, 1 January 2017 at 03:24:31 UTC, Adam Wilson wrote:
interface(s) to a data-store an essential component of the D 
Standard Library.


Eh, I count it as would-be-nice just because it isn't that hard 
to just use the C ones, or another third party lib; it doesn't 
have to be Phobos itself.


That said though, a basic db interface is quite simple and would 
be a nice batteries included bit - it is one of the things IMO 
that PHP did quite successfully (even if its interface sucked, it 
still just worked)


Re: Databases and the D Standard Library

2016-12-31 Thread Adam D. Ruppe via Digitalmars-d

On Sunday, 1 January 2017 at 03:51:52 UTC, rikki cattermole wrote:
Which is fine if all you use is c's sockets or only that 
database connection for a thread.


The C drivers typically offer handles of some sort (Windows 
HANDLE, *nix file descriptor, that kind of thing) that you can 
integrate into other event loops.


Re: Databases and the D Standard Library

2016-12-31 Thread rikki cattermole via Digitalmars-d

On 01/01/2017 4:46 PM, Adam Wilson wrote:

On 12/31/16 7:31 PM, rikki cattermole wrote:

We do indeed need a good database abstraction.
But a core requirement for any implementation has yet to be met.

There has to be a standard way for asynchronous sockets. To implement
this we need to take into consideration the event loop that it uses and
more importantly allow it to be integrated for e.g. windowing.

So here is a dependency before we get a database abstraction into
Phobos, a nice fast event loop manager that is generic.


Or, alternatively, use existing drivers that have this capability built in?


Which is fine if all you use is c's sockets or only that database 
connection for a thread.


Which is not very realistic for game or web development.


Re: Databases and the D Standard Library

2016-12-31 Thread Adam Wilson via Digitalmars-d

On 12/31/16 7:31 PM, rikki cattermole wrote:

We do indeed need a good database abstraction.
But a core requirement for any implementation has yet to be met.

There has to be a standard way for asynchronous sockets. To implement
this we need to take into consideration the event loop that it uses and
more importantly allow it to be integrated for e.g. windowing.

So here is a dependency before we get a database abstraction into
Phobos, a nice fast event loop manager that is generic.


Or, alternatively, use existing drivers that have this capability built in?

--
Adam Wilson
IRC: LightBender
import quiet.dlang.dev;


Re: Databases and the D Standard Library

2016-12-31 Thread rikki cattermole via Digitalmars-d

We do indeed need a good database abstraction.
But a core requirement for any implementation has yet to be met.

There has to be a standard way for asynchronous sockets. To implement 
this we need to take into consideration the event loop that it uses and 
more importantly allow it to be integrated for e.g. windowing.


So here is a dependency before we get a database abstraction into 
Phobos, a nice fast event loop manager that is generic.


Databases and the D Standard Library

2016-12-31 Thread Adam Wilson via Digitalmars-d

Hi Everyone,

I've seen a lot of talk on the forums over the past year about the need 
for database support in the D Standard Library and I completely agree. 
At the end of the day the purpose of any programming language and its 
attendant libraries is to allow the developer to solve their problems 
quickly and efficiently; and a large subset of those solutions require 
some form of structured data store. To my mind, this makes some form of 
interface(s) to a data-store an essential component of the D Standard 
Library. And since this is something that my particular problem spaces 
also need, I thought it would be useful to attempt to do something about it.


First, I've seen a couple of promising projects, the most complete, and 
recent of which, dstddb (Github: https://github.com/cruisercoder/dstddb) 
hasn't seen a commit since June. An additional setback came when I tried 
to use it and was greeted with a litany of compiler errors.


This is *not* a problem, it's the natural course of a volunteer 
community such as ours; and I want to thank Erik Smith profusely for his 
work. Priorities and circumstances change and that means that valuable 
projects are inexplicably dropped.


But we still lack a critical component, and to get the conversation 
started, I'd like to break down the issues I've seen brought up in past 
threads on this subject and encourage you to bring your own. I may have 
ideas, but I can't possibly know the entire problem space.


1. Isn't this an enormous amount of work?

My answer: Absolutely, depending on your preferred scope of work.

In general, I've seen two distinct camps on this issue. One says that we 
should implement everything in D from the ground up, including 
re-implementation of the database drivers themselves in D. If this is 
your preferred scope of work then you will invariably become 
disheartened at the truly stupendous amount of work you face and give up.


The other camp says that we should make use of existing drivers and 
include them in the D Standard Library. This is difficult path to follow 
as the vanilla build of the D Standard Library now requires a 
significant number of foreign libraries, all with differing licenses, be 
built and distributed to everyone; regardless of whether or not they use 
them in their project. This is more-or-less than path the dstddb is/was on.


My idea: Focus on defining the interface, not the individual driver 
implementations.


If instead we focused on defining an interface that a "conforming 
implementation" had to follow, we would allow developers to only pull in 
the library they need or build a from-scratch library if they so desire.


Indeed this is the model that both Java (JDO) and .NET (ADO.NET) follow 
and I think we would be well advised to follow their lead here. Not only 
is the methodology battle-proven, it is also well understood by a 
significant portion of D's potential user-base. By way of example, 
Npgsql is the ADO.NET implementation of a driver for PostgreSQL.


2. There are so many different types of data storage systems, how do you 
design a system generic enough for all of them?


My answer: You don't. Nobody else has bothered trying, and I believe 
that our worry over that question is a large part of why we don't have 
anything substantive today.


My idea: Split the data storage systems out by category of data-store.
For example:
- SQL: std.database.sql (PostgreSQL, MySQL, MSSQL, etc.)
- Document: std.database.document (Mongo, CouchDB, etc.)
- Key-Value: std.database.keyvalue (Redis, etcd2, etc.)

If you want something that doesn't fit into a category above, you're own 
your own, but you were also on your own in other languages.


3. We need to provide a single interface for all data-stores in the 
SQL/Document/Key-Value category.


My answer: Are you sure? The problem is that each underlying data-store 
has it's own dialect. For example, PostgreSQL and MSSQL are both 
ostensibly ANSI-SQL, except where they aren't. Re-targeting data-stores, 
even in the same category, is never going to be as simple as changing a 
connection string. And additionally, you will have to implement a 
super-set of features in the interface to support all the variations and 
throw exceptions where the chosen implementation does not support a 
specific feature.


My idea: Each data store has it's own implementation with it's own 
naming convention. For example (ADO.NET):

- SqlConnection (MSSQL)
- NpgsqlConnection (Npgsql)

Yes, this means that you have to change names in your code if you switch 
data-stores, but since you are already changing your queries, which is a 
much more difficult change, this isn't a significant additional cost. 
Also, the code becomes clearer to those who take over maintenance duties 
from the original author, especially when you are mixing data-stores. 
But in all honest, most developers will pick on technology and stick 
with it for the entirety of the soft