Re: std.database

Piotrek via Digitalmars-d Thu, 03 Mar 2016 22:23:05 -0800

On Thursday, 3 March 2016 at 18:48:08 UTC, Chris Wright wrote:

You were a bit vague before. I interpreted you as saying "just offer a range and an array-like API, and then you can use it with std.algorithm". But if you meant to offer an API that is similar to std.algorithm and also array-like, that's more feasible.

I agree I could have described the concept better, but I only sketched the idea.

You're still left with the task of transpiling D to SQL.

If someone wants to use SQL in its *full power*, no D API, nor any other language, will suffice, mainly because it will always be a translation layer. The only thing we can do is provide aids like those suggested by Andrei (SQL parser, value binding, etc.).

This model does not work with CouchDB.

I don't know CouchDB so I can't comment.

You must avoid using std.algorithm and std.range functions assiduously because they would offer terrible performance.

For big data in a db, plain vanilla std.algorithm won't be sufficient. I agree.

 * No support for complex queries.
Not sure what you mean by complex queries. Also I think the API allows arbitrarily complex queries.
Aggregates, especially with joins. Computed fields.

Regarding computed fields and other database-vendor-specific features, you are right. But on the other hand, aggregations and joins can be represented as objects and proxies of objects.

 * No support for joins.
Can be done by @attributes or other linking functionality between DbCollections.
With attributes, you need users to define aggregate types instead of just using Row and the like. That's ORM territory.

I don't like ORM with respect to SQL. But a quasi object database, which can look similar to ORM, is not a problem for me.

At a previous job I maintained an internal BI site that exposed 50-100 different queries, each with their own set of result fields. We didn't want to use ORM there; it would have been cumbersome and inappropriate.

I can see your point. But the problem can be solved by not using SQL.

Also, that assumes that you will always want a join when querying a table. I maintained an application once, using ORM, in which we sometimes wanted an eager join and sometimes wanted a lazy one. This posed a nontrivial performance impact.


Something like DbProxy would handle lazy "joins".
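
A minimal sketch of what "lazy" could mean here, not the actual design: the linked item is fetched only on first access and cached afterwards. The `loader` delegate, the `get` method and the whole layout of this `DbProxy` are assumptions made for illustration.

```d
// Hypothetical lazy proxy: holds a way to fetch the linked item,
// but does not fetch it until get() is called for the first time.
struct DbProxy(T)
{
    private T delegate() loader; // assumed: performs the actual lookup
    private T cached;            // value kept after the first fetch
    private bool loaded;         // true once cached is valid

    this(T delegate() loader)
    {
        this.loader = loader;
    }

    // Fetches on first use, then returns the cached value.
    T get()
    {
        if (!loaded)
        {
            cached = loader();
            loaded = true;
        }
        return cached;
    }
}
```

An eager join would simply call `get()` right after the parent item is read, so both behaviours can sit behind the same type.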

I'm not sure ORM would be a candidate for phobos.

As I don't plan to use a (traditional) ORM, I'm not involved. However, if other people find it worthwhile, I don't object.

 * No support for projections.
You mean something like referring to part of the item's fields? I see no problem here.
Let me point you to the existence of the TEXT and BLOB datatypes. They can each hold 2**32 bytes of data in MySQL.


This is something a DbProxy would handle. Alternatively:

struct OriginalObject
{
  int id;
  string bigString;
}

struct StrippedObject
{
  int id;
}

then
auto collA = db.collection!OriginalObject("Big");
auto collB = db.collection!StrippedObject("Big");

In the second line the string is not fetched.
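
For a backend that does speak SQL, the same projection could be derived from the struct at compile time. This is only a sketch under that assumption: `selectFor` is a hypothetical helper, not part of any existing API, and the two structs repeat the ones above so that the example stands alone.

```d
import std.array : join;
import std.traits : FieldNameTuple;

struct OriginalObject
{
    int id;
    string bigString;
}

struct StrippedObject
{
    int id;
}

// Hypothetical helper: lists only the fields of T as columns,
// so a field that T does not declare is never requested.
string selectFor(T)(string table)
{
    return "SELECT " ~ [FieldNameTuple!T].join(", ") ~ " FROM " ~ table;
}
```

With these definitions, `selectFor!StrippedObject("Big")` names only the `id` column, so the big string never leaves the database.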

I'm not splitting those off into a separate table to port my legacy database to your API. I'm not dragging in multiple megabytes of data in every query.
If you're going full ORM, you can add lazy fields. That adds complexity. It's also inefficient when I know in advance that I need those fields.
 * In your implementation, updates must bring every affected
row over the wire, then send back the modified row.
In my implementation there is no wire (that's why I call it embedded). However, I thought we were talking about the API and not a particular implementation. I don't see how this API excludes RPC. Query strings (e.g. SQL) can be provided in the old-fashioned way.
I'm running a website and decide that, with the latest changes, existing users need to get the new user email. So I write:
  UPDATE users SET sent_join_email = FALSE;
  -- ok; 1,377,212 rows affected
Or I'm using your database system. If it uses std.algorithm, I have to iterate through the users list, pulling each row into my process's memory from the database server, and then I have to write everything back to the database server.
Depending on the implementation, it's using a database cursor or issuing a new query for every K results. If it's using a database cursor, those might not be valid across transaction boundaries. I'm not sure. If they aren't, you get a large transaction, which causes problems.
If your database system instead offers a string-based API similar to std.algorithm, you might be able to turn this into a single query, but it's going to be a lot of work for you.

For a client-server approach I agree with the above. For an embedded design (as in my project) this is not the case.

 * Updates affect an entire row. If one process updates one
field in a row and another one updates a different field, one of those
writes gets clobbered.
I think this is just a "must have" for any db engine. I don't see how it applies to the proposed API other than that any implementation of a db engine has to handle it properly.
Without transactions, MySQL supports writing to two different columns in two different queries without those writes clobbering each other.
That's handling it properly.

Designing a good locking mechanism will be a challenging task, that much I'm sure of :)

When I say DbCollection should behave similarly to an ordinary array I don't mean it should be an ordinary array.
* The API assumes a total ordering for each DbCollection. This
is not valid.
I don't know what you mean here. Example would be good.
opIndex(size_t offset) assumes the database supports a one-to-one mapping between offsets and rows.

[...]

This is a terrible usage pattern, but by offering opIndex and length operations, you are recommending it.

I don't recommend it. I just added it for evaluation. I'm aware it only works when the collection is not mutated. I think the same goes for all shared collections (also those in memory).


Finally, IMO any DB API will be biased toward one solution.

Cheers
Piotrek
