Re: Database Design Advice

2013-11-18 Thread Smylers
Dagfinn Ilmari Mannsåker writes:

 Smylers smyl...@stripey.com writes:
 
  Here it is in Postgres's own function language, which goes by the
  awkwardly written name PL/pgSQL:
 
CREATE OR REPLACE FUNCTION set_document_version_num() RETURNS TRIGGER AS
$FN$
  BEGIN
NEW.version := 1 +
MAX(version)
FROM document_version
WHERE document_id = NEW.document_id;
RETURN NEW;
  END;
$FN$ LANGUAGE plpgsql;
 
CREATE TRIGGER insert_document_version_num
BEFORE INSERT ON document_version FOR EACH ROW
EXECUTE PROCEDURE set_document_version_num();
 
 This trigger works fine, until you get concurrent saves of the same
 document, in which case one of the transactions will get a duplicate key
 violation.

Ah, thank you for pointing that out.

In this case I think having a collision, and hence an error, may
actually be desirable: the application can detect the error and warn the
user that another change has already been made.

 If you want both to succed, with the last one winning, you can do it
 by keeping the current version in the document table,

I had been thinking that doing that would be bad, because it's
redundant, even though having it looks like it'd be convenient in a few
places.

 and making the trigger update it:

But that sounds quite nice, the duplication seems liveable-with when
it's managed with a trigger.

   ALTER TABLE document
 ADD COLUMN current_version INTEGER NOT NULL DEFAULT 0;
 
   REPLACE FUNCTION set_document_version_num() RETURNS TRIGGER AS
   $FN$
 BEGIN
   UPDATE document
 SET current_version = current_version + 1 
 WHERE id = NEW.document_id
 RETURNING current_version
 INTO NEW.version;
   RETURN NEW;
 END
   $FN$ LANGUAGE plpgsql;

I do like Postgres's RETURNING clauses — they seem so much more flexible
and elegant than other ways of achieving the same ends.

Cheers

Smylers
-- 
The UK gov's proposed gagging law will suppress protest and campaigning by
charities and political groups. Read more: http://civilsocietycommission.info/
Please sign this petition: http://38d.gs/1cJ9ViA
Oxfam • BHA • Quakers * Countryside Alliance • League Against Cruel Sports • CND



Re: Database Design Advice

2013-11-18 Thread Philip Skinner

Auto-increment a double column primary key, something like:

uid int(11) not null auto_increment, revision int(11) not null 
auto_increment, primary key(uid, revision)


; works in mysql. You won't have your concurrent query issues there, for 
the most part anyway. Maybe marking a records as tainted.


On 11/18/2013 10:25 AM, Smylers wrote:

Dagfinn Ilmari Mannsåker writes:


Smylers smyl...@stripey.com writes:


Here it is in Postgres's own function language, which goes by the
awkwardly written name PL/pgSQL:

   CREATE OR REPLACE FUNCTION set_document_version_num() RETURNS TRIGGER AS
   $FN$
 BEGIN
   NEW.version := 1 +
   MAX(version)
   FROM document_version
   WHERE document_id = NEW.document_id;
   RETURN NEW;
 END;
   $FN$ LANGUAGE plpgsql;

   CREATE TRIGGER insert_document_version_num
   BEFORE INSERT ON document_version FOR EACH ROW
   EXECUTE PROCEDURE set_document_version_num();

This trigger works fine, until you get concurrent saves of the same
document, in which case one of the transactions will get a duplicate key
violation.

Ah, thank you for pointing that out.

In this case I think having a collision, and hence an error, may
actually be desirable: the application can detect the error and warn the
user that another change has already been made.


If you want both to succed, with the last one winning, you can do it
by keeping the current version in the document table,

I had been thinking that doing that would be bad, because it's
redundant, even though having it looks like it'd be convenient in a few
places.


and making the trigger update it:

But that sounds quite nice, the duplication seems liveable-with when
it's managed with a trigger.


   ALTER TABLE document
 ADD COLUMN current_version INTEGER NOT NULL DEFAULT 0;

   REPLACE FUNCTION set_document_version_num() RETURNS TRIGGER AS
   $FN$
 BEGIN
   UPDATE document
 SET current_version = current_version + 1
 WHERE id = NEW.document_id
 RETURNING current_version
 INTO NEW.version;
   RETURN NEW;
 END
   $FN$ LANGUAGE plpgsql;

I do like Postgres's RETURNING clauses — they seem so much more flexible
and elegant than other ways of achieving the same ends.

Cheers

Smylers




Re: Database Design Advice

2013-11-18 Thread Matt Lawrence

On 18/11/13 11:18, Philip Skinner wrote:

Auto-increment a double column primary key, something like:

uid int(11) not null auto_increment, revision int(11) not null 
auto_increment, primary key(uid, revision)


; works in mysql. You won't have your concurrent query issues there, 
for the most part anyway. Maybe marking a records as tainted.


As far as I know that only works with MyISAM in mysql. It certainly 
doesn't work like that in Postgres.


Matt


Re: Database Design Advice

2013-11-09 Thread Peter Corlett
On 8 Nov 2013, at 19:33, David Cantrell da...@cantrell.org.uk wrote:
[...]
 Because you might need to know which of two events at 2013-11-08Z19:31:04 
 happened first. Sure you could use microseconds or whatever to get better 
 resolution, but all that does is make the problem less likely, it doesn't 
 make it go away.  You also normally want sort order to be consistent. If you 
 have two records where the sort field is the same, the order they come out is 
 going to be unpredictable.


That's just a specific example of the general problem where one desires a 
stable sort by a non-unique column. The simple solution is to just add more 
columns to the ORDER BY until the tuples *are* unique. The primary key is an 
obvious choice of tie-breaker if you don't care about the order so long as it's 
consistent.





Re: Database Design Advice

2013-11-09 Thread Kieren Diment
On 10/11/2013, at 4:24, Peter Corlett ab...@cabal.org.uk wrote:

 On 8 Nov 2013, at 19:33, David Cantrell da...@cantrell.org.uk wrote:
 [...]
 Because you might need to know which of two events at 2013-11-08Z19:31:04 
 happened first. Sure you could use microseconds or whatever to get better 
 resolution, but all that does is make the problem less likely, it doesn't 
 make it go away.  You also normally want sort order to be consistent. If you 
 have two records where the sort field is the same, the order they come out 
 is going to be unpredictable.
 
 
 That's just a specific example of the general problem where one desires a 
 stable sort by a non-unique column. The simple solution is to just add more 
 columns to the ORDER BY until the tuples *are* unique. The primary key is an 
 obvious choice of tie-breaker if you don't care about the order so long as 
 it's consistent.

Also, the time I tried to use a date field for ordering, it wasn't susceptible 
to this problem for reasons related to the specific use case. It was still a 
bad idea though, for scope-creep related reasons. 


Re: Database Design Advice

2013-11-08 Thread James Laver


Smylers smyl...@stripey.com wrote:
William Blunn writes:

 
 Instead of storing a version ID (e.g. 1, 2, 3, 4, 5, 6), you could
 store a SERIAL.
 
 So for one document ID, you might store versions 10, 11, 12, 50, 75,
87.

This was my immediate thought as well, but I'd probably cheat and declarr the 
document version numbers to be oureky decorative and thus the realm of 
userspace to turn them into 1,2,3,4,5 etc. -- one can, after all fix this with 
a single line of code.

Performance (and complexity) would be much better than triggers

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: Database Design Advice

2013-11-08 Thread Mark Stringer

On 11/08/2013 08:17 AM, James Laver wrote:


Smylers smyl...@stripey.com wrote:

William Blunn writes:


Instead of storing a version ID (e.g. 1, 2, 3, 4, 5, 6), you could
store a SERIAL.

So for one document ID, you might store versions 10, 11, 12, 50, 75,

87.

This was my immediate thought as well, but I'd probably cheat and declarr the 
document version numbers to be oureky decorative and thus the realm of 
userspace to turn them into 1,2,3,4,5 etc. -- one can, after all fix this with 
a single line of code.

Performance (and complexity) would be much better than triggers

From the OP: But version should start at 1 for each document and be 
consecutive.


I'd guess that the date_created is being stored in the table, so could 
be used to order the records without the need for a serial, if ordering 
is the only concern.


Re: Database Design Advice

2013-11-08 Thread James Laver


Mark Stringer m...@repixl.com wrote:

 From the OP: But version should start at 1 for each document and be 
consecutive.

I'll grant that I've assumed that was to be more friendly to the end user, but 
in that case, it really is just cunning use of the .. operator.

I'd guess that the date_created is being stored in the table, so could 
be used to order the records without the need for a serial, if ordering

is the only concern.

Personally I would opt for timestamps by preference, but that brings in a 
longer requirements chat. How does backdating revisions work? Do you need a 
separate key to order in that case? Etc.

James

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: Database Design Advice

2013-11-08 Thread Kieren Diment
On 08/11/2013, at 19:38, Mark Stringer m...@repixl.com wrote:

 On 11/08/2013 08:17 AM, James Laver wrote:
 
 Smylers smyl...@stripey.com wrote:
 William Blunn writes:
 
 Instead of storing a version ID (e.g. 1, 2, 3, 4, 5, 6), you could
 store a SERIAL.
 
 So for one document ID, you might store versions 10, 11, 12, 50, 75,
 87.
 This was my immediate thought as well, but I'd probably cheat and declarr 
 the document version numbers to be oureky decorative and thus the realm of 
 userspace to turn them into 1,2,3,4,5 etc. -- one can, after all fix this 
 with a single line of code.
 
 Performance (and complexity) would be much better than triggers
 From the OP: But version should start at 1 for each document and be 
 consecutive.
 
 I'd guess that the date_created is being stored in the table, so could be 
 used to order the records without the need for a serial, if ordering is the 
 only concern.

I've tried to get away with that in the past. I was told that I'm not the 
messiah, I'm a very naughty boy. It's fine in one respect, but a total pain to 
fix if the business logic shifts on you at a later stage. 


Re: Database Design Advice

2013-11-08 Thread Smylers
James Laver writes:

 Smylers smyl...@stripey.com wrote:
 
  William Blunn writes:
 
   Instead of storing a version ID (e.g. 1, 2, 3, 4, 5, 6), you could
   store a SERIAL.
   
   So for one document ID, you might store versions 10, 11, 12, 50,
   75, 87.
 
 This was my immediate thought as well, but I'd probably cheat and
 declarr the document version numbers to be oureky decorative
 ^^
“purely”, I presume?

 and thus the realm of userspace to turn them into 1,2,3,4,5 etc. --
 one can, after all fix this with a single line of code.

True. I'm all for cheating.

But reporting will be done by Crystal Reports connecting directly to the
database (and there's a distinct possibility that there will end up
being more than one code base (in different programming languages) using
the database too), so I'd rather any serial-number-to-per-document-
version-number conversion was handled in the database, so it's in just
one place.

That conversion could be abstracted by a view, so the complexity is
hidden for somebody just doing a SELECT — but my attempts with the
Rank() window function seem like too complex complexity to be worth it.

 Performance (and complexity) would be much better than triggers

Worth bearing in mind. Thanks.

Smylers
-- 
The UK gov's proposed gagging law will suppress protest and campaigning by
charities and political groups. Read more: http://civilsocietycommission.info/
Please sign this petition: http://38d.gs/1cJ9ViA
Oxfam • BHA • Quakers * Countryside Alliance • League Against Cruel Sports • CND



Re: Database Design Advice

2013-11-08 Thread James Laver
On Fri, Nov 8, 2013 at 10:06 AM, Smylers smyl...@stripey.com wrote:
 James Laver writes:

 This was my immediate thought as well, but I'd probably cheat and
 declarr the document version numbers to be oureky decorative
  ^^
 “purely”, I presume?

Yes. Phone keyboard, sorry.

 and thus the realm of userspace to turn them into 1,2,3,4,5 etc. --
 one can, after all fix this with a single line of code.

 True. I'm all for cheating.

 But reporting will be done by Crystal Reports connecting directly to the
 database (and there's a distinct possibility that there will end up
 being more than one code base (in different programming languages) using
 the database too), so I'd rather any serial-number-to-per-document-
 version-number conversion was handled in the database, so it's in just
 one place.

 That conversion could be abstracted by a view, so the complexity is
 hidden for somebody just doing a SELECT — but my attempts with the
 Rank() window function seem like too complex complexity to be worth it.

In this case, a view seems like the best solution. But if you're going
to materialise it for extra performance, then you're back in the world
of triggers (assuming eager materialisation), and back comes the
complexity :(

James



Re: Database Design Advice

2013-11-08 Thread Smylers
I wrote:

 Hello. I'm designing a database schema, and am interested in any
 wisdom folk can share over a few aspects of it:

Thanks for all the advice so far. One more question I forgot to ask in
the initial mail:

A discount can either be a percentage or a value in euros. I can think
of several suboptimal ways of representing this:

• Separate fields for discount_percentage and discount_amount. This has
  the disadvantage of needing to ensure that a record doesn't have both
  fields set.
  
  If the business concocts another discount type in future, this will
  require a field for each discount type.

• A discount_type field which indicates either 'percent' or 'euro' (or
  NULL for no discount), then a numeric field which stores either the
  percentage of the euro amount. This seems really icky, having a field
  which can mean one of two different things, just that they both happen
  to be numeric. 

  This approach always uses 2 fields regardless of the number of
  discount types. But it forces all of them to be numeric.

• Have separate euro_discount and percentage_discount tables, then when
  a discount applies create a record in the appropriate table. This
  avoids any NULLs in the DB (something I've seen advocated as good
  database design), but it still requires ensuring that both discount
  types don't get used at once.
  
  It also makes answering the question ‘is there any discount?’ more
  work, and something which will get worse if the business concocts
  another discount type in future.

All suggestions gratefully received.

Thanks

Smylers
-- 
The UK gov's proposed gagging law will suppress protest and campaigning by
charities and political groups. Read more: http://civilsocietycommission.info/
Please sign this petition: http://38d.gs/1cJ9ViA
Oxfam • BHA • Quakers * Countryside Alliance • League Against Cruel Sports • CND



Re: Database Design Advice

2013-11-08 Thread Kieren Diment
I must say I quite like playing with triggers.  A well encapsulated trigger 
shouldn't take too much developer time, and so long as people read the schema 
definitions, and the trigger is correctly documented/linked to  in the right 
places (which may be tricky to anticipate in advance, where the right places 
are), then that's what I'd do.  Although I'd make a view instead if someone on 
the team felt strongly about it.

On 08/11/2013, at 10:41 PM, James Laver wrote:

 On Fri, Nov 8, 2013 at 10:06 AM, Smylers smyl...@stripey.com wrote:
 James Laver writes:
 
 This was my immediate thought as well, but I'd probably cheat and
 declarr the document version numbers to be oureky decorative
 ^^
 “purely”, I presume?
 
 Yes. Phone keyboard, sorry.
 
 and thus the realm of userspace to turn them into 1,2,3,4,5 etc. --
 one can, after all fix this with a single line of code.
 
 True. I'm all for cheating.
 
 But reporting will be done by Crystal Reports connecting directly to the
 database (and there's a distinct possibility that there will end up
 being more than one code base (in different programming languages) using
 the database too), so I'd rather any serial-number-to-per-document-
 version-number conversion was handled in the database, so it's in just
 one place.
 
 That conversion could be abstracted by a view, so the complexity is
 hidden for somebody just doing a SELECT — but my attempts with the
 Rank() window function seem like too complex complexity to be worth it.
 
 In this case, a view seems like the best solution. But if you're going
 to materialise it for extra performance, then you're back in the world
 of triggers (assuming eager materialisation), and back comes the
 complexity :(
 
 James
 




Re: Database Design Advice

2013-11-08 Thread James Laver
On Fri, Nov 8, 2013 at 11:43 AM, Smylers smyl...@stripey.com wrote:

 • Separate fields for discount_percentage and discount_amount. This has
   the disadvantage of needing to ensure that a record doesn't have both
   fields set.

   If the business concocts another discount type in future, this will
   require a field for each discount type.

I'd probably actually go for this and a trigger. The particular case
of discounts means it's unlikely they'll add another, and this seems
like the most straightforward way to deal with it.

Relational databases aren't always great for modelling things, so when
it's unclear what's the best solution, it's usually best to go for
simple.

James



Re: Database Design Advice

2013-11-08 Thread Adam Witney



On 8. 11. 2013 10:06, Smylers wrote:

James Laver writes:


Smylers smyl...@stripey.com wrote:


William Blunn writes:


Instead of storing a version ID (e.g. 1, 2, 3, 4, 5, 6), you could
store a SERIAL.

So for one document ID, you might store versions 10, 11, 12, 50,
75, 87.


This was my immediate thought as well, but I'd probably cheat and
declarr the document version numbers to be oureky decorative

  ^^
“purely”, I presume?


and thus the realm of userspace to turn them into 1,2,3,4,5 etc. --
one can, after all fix this with a single line of code.


True. I'm all for cheating.

But reporting will be done by Crystal Reports connecting directly to the
database (and there's a distinct possibility that there will end up
being more than one code base (in different programming languages) using
the database too), so I'd rather any serial-number-to-per-document-
version-number conversion was handled in the database, so it's in just
one place.

That conversion could be abstracted by a view, so the complexity is
hidden for somebody just doing a SELECT — but my attempts with the
Rank() window function seem like too complex complexity to be worth it.


I would probably do this with a view, rule and a pl/pgsql function to 
encode the logic, if that helps.


Adam




Re: Database Design Advice

2013-11-08 Thread Dirk Koopman

On 08/11/13 11:43, Smylers wrote:

• Have separate euro_discount and percentage_discount tables, then when
   a discount applies create a record in the appropriate table. This
   avoids any NULLs in the DB (something I've seen advocated as good
   database design), but it still requires ensuring that both discount
   types don't get used at once.

   It also makes answering the question ‘is there any discount?’ more
   work, and something which will get worse if the business concocts
   another discount type in future.


It all depends on the application and your view as to where the business 
might go in the future. A case that I had to deal with many years ago 
was related to the building trade and their suppliers (Lloyds Insurance 
Broking would also qualify [in a less complex way, and for other things]).


In essence suppliers would supply using discount rate cards which 
would state that for products of discount band a1-z9 (to give a rough 
scale of number of bands) would have chain discounts on book price of 
the form +135% [...] -30% -10% [...] and customers might offered 
anything from cost + 5% (say on a lorry load organised by phone) - 5% 
off list price (all retail customers [everyone likes a discount]), via 
-30% trade -10% (for plumbing) -5% ('cos you're a good customer).


And then there would be fixed amounts off individual items (say in 
sales) as well as normalising units of measure (timber is easier now, 
but in those days there were about 6 units of measure - depending on 
where the timber came from. Now there are only about 3 with m3 being 
predominant).


What I am trying to say is: do something that doesn't back you into a 
corner that you hadn't anticipated.


Oh and normalising tables an all in databases is all very fine and gives 
one interlecktuwal warm feelings, but it doesn't necessarily give the 
*business* an optimal result.


Re: Database Design Advice

2013-11-08 Thread Smylers
Abigail writes:

 On Thu, Nov 07, 2013 at 01:03:00PM +, Smylers wrote:
 
version should start at 1 for each document and be
 
 I've used triggers to enforce business rules like this in the past

Mark Stringer writes:

 I've used BEFORE INSERT triggers to select the max existing version
 ID, then increment that by one for the new version of the doc -

Jérôme Étévé writes:

 On Postgres and triggers. You can have them to run at any time you want:
 http://www.postgresql.org/docs/9.1/static/sql-createtrigger.html

Thank you all three of you. A trigger to set the document version number
when inserting the row should at least mean consistent data is inserted
in the first place, which is probably good enough.

Here it is in Postgres's own function language, which goes by the
awkwardly written name PL/pgSQL:

  CREATE OR REPLACE FUNCTION set_document_version_num() RETURNS TRIGGER AS
  $FN$
BEGIN
  NEW.version := 1 +
  MAX(version)
  FROM document_version
  WHERE document_id = NEW.document_id;
  RETURN NEW;
END;
  $FN$ LANGUAGE plpgsql;

  CREATE TRIGGER insert_document_version_num
  BEFORE INSERT ON document_version FOR EACH ROW
  EXECUTE PROCEDURE set_document_version_num();

Postgres also supports embedded Perl, so the function can alternatively
be written:

  CREATE EXTENSION plperl;

  CREATE OR REPLACE FUNCTION set_document_version_num() RETURNS TRIGGER AS
  $FN$
$result = spi_exec_query
(qq[
  SELECT MAX(version)
  FROM document_version
  WHERE document_id = $_TD-{new}{document_id}
]);
$_TD-{new}{version} = 1 + $result-{rows}[0]{max};
'MODIFY';
  $FN$ LANGUAGE plperl;

There's no reason to do this, other than this mailing list has “perl” in
its name.

Theoretically I'd rather write this stuff in a language I already know
(Perl) than one I don't (Postgres's procedural extensions to SQL).

But where the body of the function is mostly an SQL query, the Perl
version seems more awkward. And the overhead of learning the
Postgres–Perl glue ($_TD, and spi_exec_query() and its return value data
structure) was greater than learning the little bit of PL/pgSQL I needed
(mainly to omit the word SELECT from the beginning of the query).

Thank you everybody.

Smylers
-- 
The UK gov's proposed gagging law will suppress protest and campaigning by
charities and political groups. Read more: http://civilsocietycommission.info/
Please sign this petition: http://38d.gs/1cJ9ViA
Oxfam • BHA • Quakers * Countryside Alliance • League Against Cruel Sports • CND



Re: Database Design Advice

2013-11-08 Thread Smylers
Dirk Koopman writes:

 It all depends on the application and your view as to where the
 business might go in the future.

Database design should only be performed by a qualified soothsayer?

James Laver writes:

 On Fri, Nov 8, 2013 at 11:43 AM, Smylers smyl...@stripey.com wrote:
 
  • Separate fields for discount_percentage and discount_amount. This
has the disadvantage of needing to ensure that a record doesn't
have both fields set.
 
If the business concocts another discount type in future, this
will require a field for each discount type.
 
 I'd probably actually go for this and a trigger.

Unless somebody else points out a compelling reason to pick a different
option, I'll do this one then.

Though a CHECK constraint should suffice for for enforcing that at least
one of the fields be NULL, so I'll probably go with that rather than a
trigger.

 The particular case of discounts means it's unlikely they'll add
 another,

You sound more confident in your soothsaying skills than I am in mine!
In the field of business this is for, I could just about see “days” not
being a completely insane way of describing a discount, but whether
that's likely to happen is a different matter. (I'm certainly not about
to suggest it.)

Dirk Koopman writes:

 What I am trying to say is: do something that doesn't back you into a
 corner that you hadn't anticipated.

Indeed — that's basically my reason for asking these questions. Having
worked on so many systems where I've been backed into a corner by
database design that predates my involvement, I'm trying to avoid doing
that on this project where I'm in the rare position of creating the
initial database from scratch.

 Oh and normalising tables an all in databases is all very fine and
 gives one interlecktuwal warm feelings, but it doesn't necessarily
 give the *business* an optimal result.

James Laver writes:

 Relational databases aren't always great for modelling things, so when
 it's unclear what's the best solution, it's usually best to go for
 simple.

I think most of the times I've been scuppered by an irritating database
design, it's been through too little normalization rather than too much.
Hence my concern with these questions that I was going to do something
non-normal and regret it later. So I'm pleased with the reassurance that
doing this is reasonable.

Thanks, both. Hope you don't mind my intermingling your messages like
that.

Smylers
-- 
The UK gov's proposed gagging law will suppress protest and campaigning by
charities and political groups. Read more: http://civilsocietycommission.info/
Please sign this petition: http://38d.gs/1cJ9ViA
Oxfam • BHA • Quakers * Countryside Alliance • League Against Cruel Sports • CND



Re: Database Design Advice

2013-11-08 Thread Abigail
On Fri, Nov 08, 2013 at 11:43:08AM +, Smylers wrote:
 I wrote:
 
  Hello. I'm designing a database schema, and am interested in any
  wisdom folk can share over a few aspects of it:
 
 Thanks for all the advice so far. One more question I forgot to ask in
 the initial mail:
 
 A discount can either be a percentage or a value in euros. I can think
 of several suboptimal ways of representing this:
 
 • Separate fields for discount_percentage and discount_amount. This has
   the disadvantage of needing to ensure that a record doesn't have both
   fields set.
   
   If the business concocts another discount type in future, this will
   require a field for each discount type.
 
 • A discount_type field which indicates either 'percent' or 'euro' (or
   NULL for no discount), then a numeric field which stores either the
   percentage of the euro amount. This seems really icky, having a field
   which can mean one of two different things, just that they both happen
   to be numeric. 
 
   This approach always uses 2 fields regardless of the number of
   discount types. But it forces all of them to be numeric.
 
 • Have separate euro_discount and percentage_discount tables, then when
   a discount applies create a record in the appropriate table. This
   avoids any NULLs in the DB (something I've seen advocated as good
   database design), but it still requires ensuring that both discount
   types don't get used at once.
   
   It also makes answering the question ‘is there any discount?’ more
   work, and something which will get worse if the business concocts
   another discount type in future.
 
 All suggestions gratefully received.


Been there, done that.


I have had the same businesss requirement in the past, except that it was
to generally derive a price from another price -- which meant not just 
discounts, but also premiums; but that's only a minor detail.

I opted for two columns, additional and percentage, the first
defaulting to 0, the other to 1. Prices can then be easily calculated
as original_price * percentage + additional. No conditionals needed.
And the calculation is currency agnostic. For discounts, the percentage
is less than 1, or the additional is negative.

This solution proved to be really great -- two years after my implementation
I got the request Can we have discounts from discounts? We'd like to be
able to give both a percentage and a set discount [1]. I told them,
If you just fill in both discounts, it will do what you want.

Now, other discounts have been implemented, and required more columns,
but those have been orthogonal to the percentage/fixed amount. For instance,
we have later implemented an option to round prices to units after applying
a percentage. But this more to be seen as a attribute of a discount, than
a different discount type. The possibility that later on you may have to
implement a different discount type is a question that should be considered,
but ask yourself (and the rest of your business) how likely it is that such
a thing is going to happen, and how hard is it actually add a column. And it
may very well be that a different discount type requires new columns anyway
(because it requires a piece of data that isn't numeric, or it may require
 more than one piece of data).

Do note that your second option doesn't require a NULL. You could use
a third option none next to percentage and euro. Or you could
put the discount_type and the amount in the second table, using the
same primary key as the original table. Not having a discount means no
corresponding row in the second table.


[1] We're in the hotel business. Think of a hotel giving a EUR 10 discount
if you're staying alone in a room, and a 10% discount if you select
a non-refundable rate.



Abigail


Re: Database Design Advice

2013-11-08 Thread Smylers
Abigail writes:

 On Fri, Nov 08, 2013 at 11:43:08AM +, Smylers wrote:
 
  A discount can either be a percentage or a value in euros. I can
  think of several suboptimal ways of representing this:
 
 I opted for two columns, additional and percentage, the first
 defaulting to 0, the other to 1. Prices can then be easily calculated
 as original_price * percentage + additional. No conditionals needed.

Good idea. Ta.

Smylers
-- 
The UK gov's proposed gagging law will suppress protest and campaigning by
charities and political groups. Read more: http://civilsocietycommission.info/
Please sign this petition: http://38d.gs/1cJ9ViA
Oxfam • BHA • Quakers * Countryside Alliance • League Against Cruel Sports • CND



Re: Database Design Advice

2013-11-08 Thread Peter Sergeant
On Fri, Nov 8, 2013 at 3:02 PM, Abigail abig...@abigail.be wrote:

 I opted for two columns, additional and percentage, the first
 defaulting to 0, the other to 1. Prices can then be easily calculated
 as original_price * percentage + additional. No conditionals needed.
 And the calculation is currency agnostic. For discounts, the percentage
 is less than 1, or the additional is negative.

 This solution proved to be really great -- two years after my
 implementation
 I got the request Can we have discounts from discounts? We'd like to be
 able to give both a percentage and a set discount [1]. I told them,
 If you just fill in both discounts, it will do what you want.


Unless what you want is the percentage discount applied to the price after
the flat discount, which doesn't seem unreasonable.

-P


Re: Database Design Advice

2013-11-08 Thread Yitzchak Scott-Thoennes
On Fri, Nov 8, 2013 at 12:38 AM, Mark Stringer m...@repixl.com wrote:
 I'd guess that the date_created is being stored in the table, so could be
 used to order the records without the need for a serial, if ordering is the
 only concern.

Never use date/times as unique identifiers/sort keys.


Re: Database Design Advice

2013-11-08 Thread James Laver
On Fri, Nov 8, 2013 at 5:22 PM, Yitzchak Scott-Thoennes
sthoe...@gmail.com wrote:
 On Fri, Nov 8, 2013 at 12:38 AM, Mark Stringer m...@repixl.com wrote:
 I'd guess that the date_created is being stored in the table, so could be
 used to order the records without the need for a serial, if ordering is the
 only concern.

 Never use date/times as unique identifiers/sort keys.

I'll agree with the former, but what's the reasoning behind the latter?

James


Re: Database Design Advice

2013-11-08 Thread David Cantrell

On 08/11/2013 18:08, James Laver wrote:

On Fri, Nov 8, 2013 at 5:22 PM, Yitzchak Scott-Thoennes
sthoe...@gmail.com wrote:

On Fri, Nov 8, 2013 at 12:38 AM, Mark Stringer m...@repixl.com wrote:

I'd guess that the date_created is being stored in the table, so could be
used to order the records without the need for a serial, if ordering is the
only concern.

Never use date/times as unique identifiers/sort keys.

I'll agree with the former, but what's the reasoning behind the latter?


Because you might need to know which of two events at 
2013-11-08Z19:31:04 happened first. Sure you could use microseconds or 
whatever to get better resolution, but all that does is make the problem 
less likely, it doesn't make it go away.  You also normally want sort 
order to be consistent. If you have two records where the sort field is 
the same, the order they come out is going to be unpredictable.


--
David Cantrell | Bourgeois reactionary pig

fdisk format reinstall, doo-dah, doo-dah;
fdisk format reinstall, it's the Windows way


Database Design Advice

2013-11-07 Thread Smylers
Hello. I'm designing a database schema, and am interested in any wisdom
folk can share over a few aspects of it:

• A document can have versions. There's a document table and a
  document_version_contents table, where each document_version_contents
  record references document.id and has a version number.

  The pair {document_id, version} are unique, and that can be enforced
  in the DB.

  But version should start at 1 for each document and be consecutive.
  That is, if there is a record for document_id = 3846, version = 6
  then there should also be records for version 1 to 5 of that document
  ID. Is there a way of enforcing that at the DB level? We're using
  Postgres, if that makes a difference.

• The primary key of document_version_contents could be {document_id,
  version} (with tables linking to it specifying both of those), or I could
  create document_version_contents.id field, an arbitrary auto-increment
  number, which other tables could use as a foreign key.

  The advantage of the arbitrary id field is that it means other tables
  only need to link to one field.

  The advantage of using the pair of unique fields that exist anyway is
  that the values in them are meaningful: when working on the DB, or
  writing a report pulling data from it, it's easy to see which records
  correspond with which document; in programs using the DB, look-ups use
  values it has handy anyway.

  Anybody think I should pick one or t'other? Or has experience of
  going with one then regretting it?

• A purchase consists of ordering one product from a supplier. Each
  product is only available from a single supplier. So a record in the
  purchase table just needs to store a product ID, and by linking
  through the product table that defines the supplier too.

  A purchase will also be handled by a particular contact at the
  supplier. Again, the purchase table can link to the supplier_contact
  table, which in turn links to the supplier.

  Except there's now two (indirect) links from the purchase table to the
  supplier table. Can the database enforce that they both go to the same
  supplier? (That is, that a purchase handled by a particular contact
  must be for a product sold by that contact's company?)

Thank you for any advice.

Smylers
-- 
The UK gov's proposed gagging law will suppress protest and campaigning by
charities and political groups. Read more: http://civilsocietycommission.info/
Please sign this petition: http://38d.gs/1cJ9ViA
Oxfam • BHA • Quakers * Countryside Alliance • League Against Cruel Sports • CND



Re: Database Design Advice

2013-11-07 Thread Abigail
On Thu, Nov 07, 2013 at 01:03:00PM +, Smylers wrote:
 Hello. I'm designing a database schema, and am interested in any wisdom
 folk can share over a few aspects of it:
 
 • A document can have versions. There's a document table and a
   document_version_contents table, where each document_version_contents
   record references document.id and has a version number.
 
   The pair {document_id, version} are unique, and that can be enforced
   in the DB.
 
   But version should start at 1 for each document and be consecutive.
   That is, if there is a record for document_id = 3846, version = 6
   then there should also be records for version 1 to 5 of that document
   ID. Is there a way of enforcing that at the DB level? We're using
   Postgres, if that makes a difference.


I'm not familiar with Postgres, but I've used triggers to enforce 
business rules like this in the past (using Sybase). In Sybase,
triggers are run before a transaction is committed, and you can
rollback a transaction frow within a trigger.

On an insert, you'd check whether the number of documents with the
same document_id, and version less than the to be inserted version is
1 less than the version. You'd disallow updates of the version number,
and on deletions, you check if no higher version exists with the same
document_id.


 • The primary key of document_version_contents could be {document_id,
   version} (with tables linking to it specifying both of those), or I could
   create document_version_contents.id field, an arbitrary auto-increment
   number, which other tables could use as a foreign key.
 
   The advantage of the arbitrary id field is that it means other tables
   only need to link to one field.
 
   The advantage of using the pair of unique fields that exist anyway is
   that the values in them are meaningful: when working on the DB, or
   writing a report pulling data from it, it's easy to see which records
   correspond with which document; in programs using the DB, look-ups use
   values it has handy anyway.
 
   Anybody think I should pick one or t'other? Or has experience of
   going with one then regretting it?

It will depend on what queries you actually are going to perform. 
Both have their advantages and disadvantages, but only when you know
your queries, you know which solution has the bigger advantage.

 
 • A purchase consists of ordering one product from a supplier. Each
   product is only available from a single supplier. So a record in the
   purchase table just needs to store a product ID, and by linking
   through the product table that defines the supplier too.
 
   A purchase will also be handled by a particular contact at the
   supplier. Again, the purchase table can link to the supplier_contact
   table, which in turn links to the supplier.
 
   Except there's now two (indirect) links from the purchase table to the
   supplier table. Can the database enforce that they both go to the same
   supplier? (That is, that a purchase handled by a particular contact
   must be for a product sold by that contact's company?)
 


Triggers ;-)


Abigail


Re: Database Design Advice

2013-11-07 Thread Mark Stringer

On 11/07/2013 01:28 PM, Abigail wrote:


I'm not familiar with Postgres, but I've used triggers to enforce
business rules like this in the past (using Sybase). In Sybase,
triggers are run before a transaction is committed, and you can
rollback a transaction frow within a trigger.

On an insert, you'd check whether the number of documents with the
same document_id, and version less than the to be inserted version is
1 less than the version. You'd disallow updates of the version number,
and on deletions, you check if no higher version exists with the same
document_id.

Another RDBMS, so may be different again, but when I've used triggers in 
the past (MS SQL), I've used BEFORE INSERT triggers to select the max 
existing version ID, then increment that by one for the new version of 
the doc - preventing the need to know the current max version and supply 
it manually for the insert. It also prevents any possibility of 
application layer issues causing version number woes.


Re: Database Design Advice

2013-11-07 Thread Jérôme Étévé
On Postgres and triggers. You can have them to run at any time you want:
http://www.postgresql.org/docs/9.1/static/sql-createtrigger.html


Re: Database Design Advice

2013-11-07 Thread William Blunn

On 07/11/2013 13:03, Smylers wrote:

That is, if there is a record for document_id = 3846, version = 6
   then there should also be records for version 1 to 5 of that document
   ID. Is there a way of enforcing that at the DB level? We're using
   Postgres, if that makes a difference.


Instead of storing a version ID (e.g. 1, 2, 3, 4, 5, 6), you could store 
a SERIAL.


So for one document ID, you might store versions 10, 11, 12, 50, 75, 87.

If you then later want to know the version number as a consecutive 
counting number (e.g. 1, 2, 3, 4, 5, 6), then you could compute it using 
a PostgreSQL window function.


This should guarantee that you always get consecutive counting numbers.

Regards,

Bill


Re: Database Design Advice

2013-11-07 Thread William Blunn

On 07/11/2013 13:03, Smylers wrote:

A purchase consists of ordering one product from a supplier. Each
   product is only available from a single supplier. So a record in the
   purchase table just needs to store a product ID, and by linking
   through the product table that defines the supplier too.

   A purchase will also be handled by a particular contact at the
   supplier. Again, the purchase table can link to the supplier_contact
   table, which in turn links to the supplier.

   Except there's now two (indirect) links from the purchase table to the
   supplier table. Can the database enforce that they both go to the same
   supplier? (That is, that a purchase handled by a particular contact
   must be for a product sold by that contact's company?)


I assume your purchase table has foreign key constraints

product_id - product (id)
contact_id - contact(id)

And you want to ensure that for a given purchase row that 
product(supplier_id) = contact(supplier_id)?


You could add supplier_id to the purchase table.

Now this will create a denormalisation. To ensure that your data does 
not become inconsistent, you need to extend your foreign key constraints 
on the purchase table to be:


(product_id, supplier_id) - product(id, supplier_id)
(contact_id, supplier_id) - contact(id, supplier_id)

This will guarantee that you have no non-existent (product_id, 
supplier_id) or (contact_id, supplier_id) tuples in the purchase table.


So the data integrity ill-effects of denormalisation are resolved.

Having done this, any row in the purchase table will now point to a 
product and a contact relating to the (one) supplier mentioned in the 
purchase row itself, which means that the product and contact will 
always relate to the same supplier.


I used this approach in anger at a former employer and it worked well.

Regards,

Bill


Re: Database Design Advice

2013-11-07 Thread Smylers
William Blunn writes:

 On 07/11/2013 13:03, Smylers wrote:
 
A purchase consists of ordering one product from a supplier. Each
product is only available from a single supplier. So a record in
the purchase table just needs to store a product ID, and by
linking through the product table that defines the supplier too.
 
A purchase will also be handled by a particular contact at the
supplier. Again, the purchase table can link to the
supplier_contact table, which in turn links to the supplier.
 
Except there's now two (indirect) links from the purchase table to
the supplier table. Can the database enforce that they both go to
the same supplier? (That is, that a purchase handled by a
particular contact must be for a product sold by that contact's
company?)
 
 I assume your purchase table has foreign key constraints

Yeah, I'm assuming that too! (Currently this database only exists on
paper.)

 product_id - product (id)
 contact_id - contact(id)
 
 And you want to ensure that for a given purchase row that
 product(supplier_id) = contact(supplier_id)?
 
 You could add supplier_id to the purchase table.
 
 Now this will create a denormalisation. To ensure that your data
 does not become inconsistent, you need to extend your foreign key
 constraints on the purchase table to be:
 
 (product_id, supplier_id) - product(id, supplier_id)
 (contact_id, supplier_id) - contact(id, supplier_id)

Makes sense — thank you for the suggestion.

 I used this approach in anger at a former employer and it worked well.

Good to know.

I'm now reading up on window functions, prompted by your other mail.

Smylers
-- 
The UK gov's proposed gagging law will suppress protest and campaigning by
charities and political groups. Read more: http://civilsocietycommission.info/
Please sign this petition: http://38d.gs/1cJ9ViA
Oxfam • BHA • Quakers * Countryside Alliance • League Against Cruel Sports • CND



Re: Database Design Advice

2013-11-07 Thread Smylers
William Blunn writes:

 On 07/11/2013 13:03, Smylers wrote:
 
That is, if there is a record for document_id = 3846, version = 6
then there should also be records for version 1 to 5 of that
document ID. Is there a way of enforcing that at the DB level?
We're using Postgres, if that makes a difference.
 
 Instead of storing a version ID (e.g. 1, 2, 3, 4, 5, 6), you could
 store a SERIAL.
 
 So for one document ID, you might store versions 10, 11, 12, 50, 75, 87.
 
 If you then later want to know the version number as a consecutive
 counting number (e.g. 1, 2, 3, 4, 5, 6),

Yes, it's a requirement to display the document's version number.

 then you could compute it using a PostgreSQL window function.
 
 This should guarantee that you always get consecutive counting
 numbers.

Thank you for the suggestion; I hadn't used window functions before, so
have just tried this out.

A window function only operates over the rows that are being selected in
the query, not all matching rows in the table. Which means that when
selecting a single record, the Rank() window function always returns 1.

So, using your numbers above, selecting the version with primary key 75
and calculating its version number as being 5, seems to involve
something along the lines of:

  SELECT *
  FROM
  (
SELECT *, Rank() OVER (PARTITION BY document_id ORDER BY id)
FROM document_version
WHERE document_id =
(
  SELECT id
  FROM document_version
  WHERE id = 75
)
  ) AS _
  WHERE id = 75;

The middle SELECT finds the records for all versions of the document,
and calculates a rank for each one. The outer SELECT then discards all
the rows except for the one we're interested in. The inner SELECT is to
find the document ID for the rows which will be selected by the middle
query and discarded by the outer one — which therefore involves passing
the primary key 75 to the query twice.

So it's certainly possible. But I'm thinking the above complexity is
worse than the original issue I was trying avoid with it.

However I feel richer for having learnt about window functions, which
I'm pleased to now have in my toolbox and expect will come in handy
sooner or later.

Thanks.

Smylers
-- 
The UK gov's proposed gagging law will suppress protest and campaigning by
charities and political groups. Read more: http://civilsocietycommission.info/
Please sign this petition: http://38d.gs/1cJ9ViA
Oxfam • BHA • Quakers * Countryside Alliance • League Against Cruel Sports • CND