Re: Database Design Advice
Dagfinn Ilmari Mannsåker writes: Smylers smyl...@stripey.com writes: Here it is in Postgres's own function language, which goes by the awkwardly written name PL/pgSQL: CREATE OR REPLACE FUNCTION set_document_version_num() RETURNS TRIGGER AS $FN$ BEGIN NEW.version := 1 + MAX(version) FROM document_version WHERE document_id = NEW.document_id; RETURN NEW; END; $FN$ LANGUAGE plpgsql; CREATE TRIGGER insert_document_version_num BEFORE INSERT ON document_version FOR EACH ROW EXECUTE PROCEDURE set_document_version_num(); This trigger works fine, until you get concurrent saves of the same document, in which case one of the transactions will get a duplicate key violation. Ah, thank you for pointing that out. In this case I think having a collision, and hence an error, may actually be desirable: the application can detect the error and warn the user that another change has already been made. If you want both to succed, with the last one winning, you can do it by keeping the current version in the document table, I had been thinking that doing that would be bad, because it's redundant, even though having it looks like it'd be convenient in a few places. and making the trigger update it: But that sounds quite nice, the duplication seems liveable-with when it's managed with a trigger. ALTER TABLE document ADD COLUMN current_version INTEGER NOT NULL DEFAULT 0; REPLACE FUNCTION set_document_version_num() RETURNS TRIGGER AS $FN$ BEGIN UPDATE document SET current_version = current_version + 1 WHERE id = NEW.document_id RETURNING current_version INTO NEW.version; RETURN NEW; END $FN$ LANGUAGE plpgsql; I do like Postgres's RETURNING clauses — they seem so much more flexible and elegant than other ways of achieving the same ends. Cheers Smylers -- The UK gov's proposed gagging law will suppress protest and campaigning by charities and political groups. Read more: http://civilsocietycommission.info/ Please sign this petition: http://38d.gs/1cJ9ViA Oxfam • BHA • Quakers * Countryside Alliance • League Against Cruel Sports • CND
Re: Database Design Advice
Auto-increment a double column primary key, something like: uid int(11) not null auto_increment, revision int(11) not null auto_increment, primary key(uid, revision) ; works in mysql. You won't have your concurrent query issues there, for the most part anyway. Maybe marking a records as tainted. On 11/18/2013 10:25 AM, Smylers wrote: Dagfinn Ilmari Mannsåker writes: Smylers smyl...@stripey.com writes: Here it is in Postgres's own function language, which goes by the awkwardly written name PL/pgSQL: CREATE OR REPLACE FUNCTION set_document_version_num() RETURNS TRIGGER AS $FN$ BEGIN NEW.version := 1 + MAX(version) FROM document_version WHERE document_id = NEW.document_id; RETURN NEW; END; $FN$ LANGUAGE plpgsql; CREATE TRIGGER insert_document_version_num BEFORE INSERT ON document_version FOR EACH ROW EXECUTE PROCEDURE set_document_version_num(); This trigger works fine, until you get concurrent saves of the same document, in which case one of the transactions will get a duplicate key violation. Ah, thank you for pointing that out. In this case I think having a collision, and hence an error, may actually be desirable: the application can detect the error and warn the user that another change has already been made. If you want both to succed, with the last one winning, you can do it by keeping the current version in the document table, I had been thinking that doing that would be bad, because it's redundant, even though having it looks like it'd be convenient in a few places. and making the trigger update it: But that sounds quite nice, the duplication seems liveable-with when it's managed with a trigger. ALTER TABLE document ADD COLUMN current_version INTEGER NOT NULL DEFAULT 0; REPLACE FUNCTION set_document_version_num() RETURNS TRIGGER AS $FN$ BEGIN UPDATE document SET current_version = current_version + 1 WHERE id = NEW.document_id RETURNING current_version INTO NEW.version; RETURN NEW; END $FN$ LANGUAGE plpgsql; I do like Postgres's RETURNING clauses — they seem so much more flexible and elegant than other ways of achieving the same ends. Cheers Smylers
Re: Database Design Advice
On 18/11/13 11:18, Philip Skinner wrote: Auto-increment a double column primary key, something like: uid int(11) not null auto_increment, revision int(11) not null auto_increment, primary key(uid, revision) ; works in mysql. You won't have your concurrent query issues there, for the most part anyway. Maybe marking a records as tainted. As far as I know that only works with MyISAM in mysql. It certainly doesn't work like that in Postgres. Matt
Re: Database Design Advice
On 8 Nov 2013, at 19:33, David Cantrell da...@cantrell.org.uk wrote: [...] Because you might need to know which of two events at 2013-11-08Z19:31:04 happened first. Sure you could use microseconds or whatever to get better resolution, but all that does is make the problem less likely, it doesn't make it go away. You also normally want sort order to be consistent. If you have two records where the sort field is the same, the order they come out is going to be unpredictable. That's just a specific example of the general problem where one desires a stable sort by a non-unique column. The simple solution is to just add more columns to the ORDER BY until the tuples *are* unique. The primary key is an obvious choice of tie-breaker if you don't care about the order so long as it's consistent.
Re: Database Design Advice
On 10/11/2013, at 4:24, Peter Corlett ab...@cabal.org.uk wrote: On 8 Nov 2013, at 19:33, David Cantrell da...@cantrell.org.uk wrote: [...] Because you might need to know which of two events at 2013-11-08Z19:31:04 happened first. Sure you could use microseconds or whatever to get better resolution, but all that does is make the problem less likely, it doesn't make it go away. You also normally want sort order to be consistent. If you have two records where the sort field is the same, the order they come out is going to be unpredictable. That's just a specific example of the general problem where one desires a stable sort by a non-unique column. The simple solution is to just add more columns to the ORDER BY until the tuples *are* unique. The primary key is an obvious choice of tie-breaker if you don't care about the order so long as it's consistent. Also, the time I tried to use a date field for ordering, it wasn't susceptible to this problem for reasons related to the specific use case. It was still a bad idea though, for scope-creep related reasons.
Re: Database Design Advice
Smylers smyl...@stripey.com wrote: William Blunn writes: Instead of storing a version ID (e.g. 1, 2, 3, 4, 5, 6), you could store a SERIAL. So for one document ID, you might store versions 10, 11, 12, 50, 75, 87. This was my immediate thought as well, but I'd probably cheat and declarr the document version numbers to be oureky decorative and thus the realm of userspace to turn them into 1,2,3,4,5 etc. -- one can, after all fix this with a single line of code. Performance (and complexity) would be much better than triggers -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: Database Design Advice
On 11/08/2013 08:17 AM, James Laver wrote: Smylers smyl...@stripey.com wrote: William Blunn writes: Instead of storing a version ID (e.g. 1, 2, 3, 4, 5, 6), you could store a SERIAL. So for one document ID, you might store versions 10, 11, 12, 50, 75, 87. This was my immediate thought as well, but I'd probably cheat and declarr the document version numbers to be oureky decorative and thus the realm of userspace to turn them into 1,2,3,4,5 etc. -- one can, after all fix this with a single line of code. Performance (and complexity) would be much better than triggers From the OP: But version should start at 1 for each document and be consecutive. I'd guess that the date_created is being stored in the table, so could be used to order the records without the need for a serial, if ordering is the only concern.
Re: Database Design Advice
Mark Stringer m...@repixl.com wrote: From the OP: But version should start at 1 for each document and be consecutive. I'll grant that I've assumed that was to be more friendly to the end user, but in that case, it really is just cunning use of the .. operator. I'd guess that the date_created is being stored in the table, so could be used to order the records without the need for a serial, if ordering is the only concern. Personally I would opt for timestamps by preference, but that brings in a longer requirements chat. How does backdating revisions work? Do you need a separate key to order in that case? Etc. James -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: Database Design Advice
On 08/11/2013, at 19:38, Mark Stringer m...@repixl.com wrote: On 11/08/2013 08:17 AM, James Laver wrote: Smylers smyl...@stripey.com wrote: William Blunn writes: Instead of storing a version ID (e.g. 1, 2, 3, 4, 5, 6), you could store a SERIAL. So for one document ID, you might store versions 10, 11, 12, 50, 75, 87. This was my immediate thought as well, but I'd probably cheat and declarr the document version numbers to be oureky decorative and thus the realm of userspace to turn them into 1,2,3,4,5 etc. -- one can, after all fix this with a single line of code. Performance (and complexity) would be much better than triggers From the OP: But version should start at 1 for each document and be consecutive. I'd guess that the date_created is being stored in the table, so could be used to order the records without the need for a serial, if ordering is the only concern. I've tried to get away with that in the past. I was told that I'm not the messiah, I'm a very naughty boy. It's fine in one respect, but a total pain to fix if the business logic shifts on you at a later stage.
Re: Database Design Advice
James Laver writes: Smylers smyl...@stripey.com wrote: William Blunn writes: Instead of storing a version ID (e.g. 1, 2, 3, 4, 5, 6), you could store a SERIAL. So for one document ID, you might store versions 10, 11, 12, 50, 75, 87. This was my immediate thought as well, but I'd probably cheat and declarr the document version numbers to be oureky decorative ^^ “purely”, I presume? and thus the realm of userspace to turn them into 1,2,3,4,5 etc. -- one can, after all fix this with a single line of code. True. I'm all for cheating. But reporting will be done by Crystal Reports connecting directly to the database (and there's a distinct possibility that there will end up being more than one code base (in different programming languages) using the database too), so I'd rather any serial-number-to-per-document- version-number conversion was handled in the database, so it's in just one place. That conversion could be abstracted by a view, so the complexity is hidden for somebody just doing a SELECT — but my attempts with the Rank() window function seem like too complex complexity to be worth it. Performance (and complexity) would be much better than triggers Worth bearing in mind. Thanks. Smylers -- The UK gov's proposed gagging law will suppress protest and campaigning by charities and political groups. Read more: http://civilsocietycommission.info/ Please sign this petition: http://38d.gs/1cJ9ViA Oxfam • BHA • Quakers * Countryside Alliance • League Against Cruel Sports • CND
Re: Database Design Advice
On Fri, Nov 8, 2013 at 10:06 AM, Smylers smyl...@stripey.com wrote: James Laver writes: This was my immediate thought as well, but I'd probably cheat and declarr the document version numbers to be oureky decorative ^^ “purely”, I presume? Yes. Phone keyboard, sorry. and thus the realm of userspace to turn them into 1,2,3,4,5 etc. -- one can, after all fix this with a single line of code. True. I'm all for cheating. But reporting will be done by Crystal Reports connecting directly to the database (and there's a distinct possibility that there will end up being more than one code base (in different programming languages) using the database too), so I'd rather any serial-number-to-per-document- version-number conversion was handled in the database, so it's in just one place. That conversion could be abstracted by a view, so the complexity is hidden for somebody just doing a SELECT — but my attempts with the Rank() window function seem like too complex complexity to be worth it. In this case, a view seems like the best solution. But if you're going to materialise it for extra performance, then you're back in the world of triggers (assuming eager materialisation), and back comes the complexity :( James
Re: Database Design Advice
I wrote: Hello. I'm designing a database schema, and am interested in any wisdom folk can share over a few aspects of it: Thanks for all the advice so far. One more question I forgot to ask in the initial mail: A discount can either be a percentage or a value in euros. I can think of several suboptimal ways of representing this: • Separate fields for discount_percentage and discount_amount. This has the disadvantage of needing to ensure that a record doesn't have both fields set. If the business concocts another discount type in future, this will require a field for each discount type. • A discount_type field which indicates either 'percent' or 'euro' (or NULL for no discount), then a numeric field which stores either the percentage of the euro amount. This seems really icky, having a field which can mean one of two different things, just that they both happen to be numeric. This approach always uses 2 fields regardless of the number of discount types. But it forces all of them to be numeric. • Have separate euro_discount and percentage_discount tables, then when a discount applies create a record in the appropriate table. This avoids any NULLs in the DB (something I've seen advocated as good database design), but it still requires ensuring that both discount types don't get used at once. It also makes answering the question ‘is there any discount?’ more work, and something which will get worse if the business concocts another discount type in future. All suggestions gratefully received. Thanks Smylers -- The UK gov's proposed gagging law will suppress protest and campaigning by charities and political groups. Read more: http://civilsocietycommission.info/ Please sign this petition: http://38d.gs/1cJ9ViA Oxfam • BHA • Quakers * Countryside Alliance • League Against Cruel Sports • CND
Re: Database Design Advice
I must say I quite like playing with triggers. A well encapsulated trigger shouldn't take too much developer time, and so long as people read the schema definitions, and the trigger is correctly documented/linked to in the right places (which may be tricky to anticipate in advance, where the right places are), then that's what I'd do. Although I'd make a view instead if someone on the team felt strongly about it. On 08/11/2013, at 10:41 PM, James Laver wrote: On Fri, Nov 8, 2013 at 10:06 AM, Smylers smyl...@stripey.com wrote: James Laver writes: This was my immediate thought as well, but I'd probably cheat and declarr the document version numbers to be oureky decorative ^^ “purely”, I presume? Yes. Phone keyboard, sorry. and thus the realm of userspace to turn them into 1,2,3,4,5 etc. -- one can, after all fix this with a single line of code. True. I'm all for cheating. But reporting will be done by Crystal Reports connecting directly to the database (and there's a distinct possibility that there will end up being more than one code base (in different programming languages) using the database too), so I'd rather any serial-number-to-per-document- version-number conversion was handled in the database, so it's in just one place. That conversion could be abstracted by a view, so the complexity is hidden for somebody just doing a SELECT — but my attempts with the Rank() window function seem like too complex complexity to be worth it. In this case, a view seems like the best solution. But if you're going to materialise it for extra performance, then you're back in the world of triggers (assuming eager materialisation), and back comes the complexity :( James
Re: Database Design Advice
On Fri, Nov 8, 2013 at 11:43 AM, Smylers smyl...@stripey.com wrote: • Separate fields for discount_percentage and discount_amount. This has the disadvantage of needing to ensure that a record doesn't have both fields set. If the business concocts another discount type in future, this will require a field for each discount type. I'd probably actually go for this and a trigger. The particular case of discounts means it's unlikely they'll add another, and this seems like the most straightforward way to deal with it. Relational databases aren't always great for modelling things, so when it's unclear what's the best solution, it's usually best to go for simple. James
Re: Database Design Advice
On 8. 11. 2013 10:06, Smylers wrote: James Laver writes: Smylers smyl...@stripey.com wrote: William Blunn writes: Instead of storing a version ID (e.g. 1, 2, 3, 4, 5, 6), you could store a SERIAL. So for one document ID, you might store versions 10, 11, 12, 50, 75, 87. This was my immediate thought as well, but I'd probably cheat and declarr the document version numbers to be oureky decorative ^^ “purely”, I presume? and thus the realm of userspace to turn them into 1,2,3,4,5 etc. -- one can, after all fix this with a single line of code. True. I'm all for cheating. But reporting will be done by Crystal Reports connecting directly to the database (and there's a distinct possibility that there will end up being more than one code base (in different programming languages) using the database too), so I'd rather any serial-number-to-per-document- version-number conversion was handled in the database, so it's in just one place. That conversion could be abstracted by a view, so the complexity is hidden for somebody just doing a SELECT — but my attempts with the Rank() window function seem like too complex complexity to be worth it. I would probably do this with a view, rule and a pl/pgsql function to encode the logic, if that helps. Adam
Re: Database Design Advice
On 08/11/13 11:43, Smylers wrote: • Have separate euro_discount and percentage_discount tables, then when a discount applies create a record in the appropriate table. This avoids any NULLs in the DB (something I've seen advocated as good database design), but it still requires ensuring that both discount types don't get used at once. It also makes answering the question ‘is there any discount?’ more work, and something which will get worse if the business concocts another discount type in future. It all depends on the application and your view as to where the business might go in the future. A case that I had to deal with many years ago was related to the building trade and their suppliers (Lloyds Insurance Broking would also qualify [in a less complex way, and for other things]). In essence suppliers would supply using discount rate cards which would state that for products of discount band a1-z9 (to give a rough scale of number of bands) would have chain discounts on book price of the form +135% [...] -30% -10% [...] and customers might offered anything from cost + 5% (say on a lorry load organised by phone) - 5% off list price (all retail customers [everyone likes a discount]), via -30% trade -10% (for plumbing) -5% ('cos you're a good customer). And then there would be fixed amounts off individual items (say in sales) as well as normalising units of measure (timber is easier now, but in those days there were about 6 units of measure - depending on where the timber came from. Now there are only about 3 with m3 being predominant). What I am trying to say is: do something that doesn't back you into a corner that you hadn't anticipated. Oh and normalising tables an all in databases is all very fine and gives one interlecktuwal warm feelings, but it doesn't necessarily give the *business* an optimal result.
Re: Database Design Advice
Abigail writes: On Thu, Nov 07, 2013 at 01:03:00PM +, Smylers wrote: version should start at 1 for each document and be I've used triggers to enforce business rules like this in the past Mark Stringer writes: I've used BEFORE INSERT triggers to select the max existing version ID, then increment that by one for the new version of the doc - Jérôme Étévé writes: On Postgres and triggers. You can have them to run at any time you want: http://www.postgresql.org/docs/9.1/static/sql-createtrigger.html Thank you all three of you. A trigger to set the document version number when inserting the row should at least mean consistent data is inserted in the first place, which is probably good enough. Here it is in Postgres's own function language, which goes by the awkwardly written name PL/pgSQL: CREATE OR REPLACE FUNCTION set_document_version_num() RETURNS TRIGGER AS $FN$ BEGIN NEW.version := 1 + MAX(version) FROM document_version WHERE document_id = NEW.document_id; RETURN NEW; END; $FN$ LANGUAGE plpgsql; CREATE TRIGGER insert_document_version_num BEFORE INSERT ON document_version FOR EACH ROW EXECUTE PROCEDURE set_document_version_num(); Postgres also supports embedded Perl, so the function can alternatively be written: CREATE EXTENSION plperl; CREATE OR REPLACE FUNCTION set_document_version_num() RETURNS TRIGGER AS $FN$ $result = spi_exec_query (qq[ SELECT MAX(version) FROM document_version WHERE document_id = $_TD-{new}{document_id} ]); $_TD-{new}{version} = 1 + $result-{rows}[0]{max}; 'MODIFY'; $FN$ LANGUAGE plperl; There's no reason to do this, other than this mailing list has “perl” in its name. Theoretically I'd rather write this stuff in a language I already know (Perl) than one I don't (Postgres's procedural extensions to SQL). But where the body of the function is mostly an SQL query, the Perl version seems more awkward. And the overhead of learning the Postgres–Perl glue ($_TD, and spi_exec_query() and its return value data structure) was greater than learning the little bit of PL/pgSQL I needed (mainly to omit the word SELECT from the beginning of the query). Thank you everybody. Smylers -- The UK gov's proposed gagging law will suppress protest and campaigning by charities and political groups. Read more: http://civilsocietycommission.info/ Please sign this petition: http://38d.gs/1cJ9ViA Oxfam • BHA • Quakers * Countryside Alliance • League Against Cruel Sports • CND
Re: Database Design Advice
Dirk Koopman writes: It all depends on the application and your view as to where the business might go in the future. Database design should only be performed by a qualified soothsayer? James Laver writes: On Fri, Nov 8, 2013 at 11:43 AM, Smylers smyl...@stripey.com wrote: • Separate fields for discount_percentage and discount_amount. This has the disadvantage of needing to ensure that a record doesn't have both fields set. If the business concocts another discount type in future, this will require a field for each discount type. I'd probably actually go for this and a trigger. Unless somebody else points out a compelling reason to pick a different option, I'll do this one then. Though a CHECK constraint should suffice for for enforcing that at least one of the fields be NULL, so I'll probably go with that rather than a trigger. The particular case of discounts means it's unlikely they'll add another, You sound more confident in your soothsaying skills than I am in mine! In the field of business this is for, I could just about see “days” not being a completely insane way of describing a discount, but whether that's likely to happen is a different matter. (I'm certainly not about to suggest it.) Dirk Koopman writes: What I am trying to say is: do something that doesn't back you into a corner that you hadn't anticipated. Indeed — that's basically my reason for asking these questions. Having worked on so many systems where I've been backed into a corner by database design that predates my involvement, I'm trying to avoid doing that on this project where I'm in the rare position of creating the initial database from scratch. Oh and normalising tables an all in databases is all very fine and gives one interlecktuwal warm feelings, but it doesn't necessarily give the *business* an optimal result. James Laver writes: Relational databases aren't always great for modelling things, so when it's unclear what's the best solution, it's usually best to go for simple. I think most of the times I've been scuppered by an irritating database design, it's been through too little normalization rather than too much. Hence my concern with these questions that I was going to do something non-normal and regret it later. So I'm pleased with the reassurance that doing this is reasonable. Thanks, both. Hope you don't mind my intermingling your messages like that. Smylers -- The UK gov's proposed gagging law will suppress protest and campaigning by charities and political groups. Read more: http://civilsocietycommission.info/ Please sign this petition: http://38d.gs/1cJ9ViA Oxfam • BHA • Quakers * Countryside Alliance • League Against Cruel Sports • CND
Re: Database Design Advice
On Fri, Nov 08, 2013 at 11:43:08AM +, Smylers wrote: I wrote: Hello. I'm designing a database schema, and am interested in any wisdom folk can share over a few aspects of it: Thanks for all the advice so far. One more question I forgot to ask in the initial mail: A discount can either be a percentage or a value in euros. I can think of several suboptimal ways of representing this: • Separate fields for discount_percentage and discount_amount. This has the disadvantage of needing to ensure that a record doesn't have both fields set. If the business concocts another discount type in future, this will require a field for each discount type. • A discount_type field which indicates either 'percent' or 'euro' (or NULL for no discount), then a numeric field which stores either the percentage of the euro amount. This seems really icky, having a field which can mean one of two different things, just that they both happen to be numeric. This approach always uses 2 fields regardless of the number of discount types. But it forces all of them to be numeric. • Have separate euro_discount and percentage_discount tables, then when a discount applies create a record in the appropriate table. This avoids any NULLs in the DB (something I've seen advocated as good database design), but it still requires ensuring that both discount types don't get used at once. It also makes answering the question ‘is there any discount?’ more work, and something which will get worse if the business concocts another discount type in future. All suggestions gratefully received. Been there, done that. I have had the same businesss requirement in the past, except that it was to generally derive a price from another price -- which meant not just discounts, but also premiums; but that's only a minor detail. I opted for two columns, additional and percentage, the first defaulting to 0, the other to 1. Prices can then be easily calculated as original_price * percentage + additional. No conditionals needed. And the calculation is currency agnostic. For discounts, the percentage is less than 1, or the additional is negative. This solution proved to be really great -- two years after my implementation I got the request Can we have discounts from discounts? We'd like to be able to give both a percentage and a set discount [1]. I told them, If you just fill in both discounts, it will do what you want. Now, other discounts have been implemented, and required more columns, but those have been orthogonal to the percentage/fixed amount. For instance, we have later implemented an option to round prices to units after applying a percentage. But this more to be seen as a attribute of a discount, than a different discount type. The possibility that later on you may have to implement a different discount type is a question that should be considered, but ask yourself (and the rest of your business) how likely it is that such a thing is going to happen, and how hard is it actually add a column. And it may very well be that a different discount type requires new columns anyway (because it requires a piece of data that isn't numeric, or it may require more than one piece of data). Do note that your second option doesn't require a NULL. You could use a third option none next to percentage and euro. Or you could put the discount_type and the amount in the second table, using the same primary key as the original table. Not having a discount means no corresponding row in the second table. [1] We're in the hotel business. Think of a hotel giving a EUR 10 discount if you're staying alone in a room, and a 10% discount if you select a non-refundable rate. Abigail
Re: Database Design Advice
Abigail writes: On Fri, Nov 08, 2013 at 11:43:08AM +, Smylers wrote: A discount can either be a percentage or a value in euros. I can think of several suboptimal ways of representing this: I opted for two columns, additional and percentage, the first defaulting to 0, the other to 1. Prices can then be easily calculated as original_price * percentage + additional. No conditionals needed. Good idea. Ta. Smylers -- The UK gov's proposed gagging law will suppress protest and campaigning by charities and political groups. Read more: http://civilsocietycommission.info/ Please sign this petition: http://38d.gs/1cJ9ViA Oxfam • BHA • Quakers * Countryside Alliance • League Against Cruel Sports • CND
Re: Database Design Advice
On Fri, Nov 8, 2013 at 3:02 PM, Abigail abig...@abigail.be wrote: I opted for two columns, additional and percentage, the first defaulting to 0, the other to 1. Prices can then be easily calculated as original_price * percentage + additional. No conditionals needed. And the calculation is currency agnostic. For discounts, the percentage is less than 1, or the additional is negative. This solution proved to be really great -- two years after my implementation I got the request Can we have discounts from discounts? We'd like to be able to give both a percentage and a set discount [1]. I told them, If you just fill in both discounts, it will do what you want. Unless what you want is the percentage discount applied to the price after the flat discount, which doesn't seem unreasonable. -P
Re: Database Design Advice
On Fri, Nov 8, 2013 at 12:38 AM, Mark Stringer m...@repixl.com wrote: I'd guess that the date_created is being stored in the table, so could be used to order the records without the need for a serial, if ordering is the only concern. Never use date/times as unique identifiers/sort keys.
Re: Database Design Advice
On Fri, Nov 8, 2013 at 5:22 PM, Yitzchak Scott-Thoennes sthoe...@gmail.com wrote: On Fri, Nov 8, 2013 at 12:38 AM, Mark Stringer m...@repixl.com wrote: I'd guess that the date_created is being stored in the table, so could be used to order the records without the need for a serial, if ordering is the only concern. Never use date/times as unique identifiers/sort keys. I'll agree with the former, but what's the reasoning behind the latter? James
Re: Database Design Advice
On 08/11/2013 18:08, James Laver wrote: On Fri, Nov 8, 2013 at 5:22 PM, Yitzchak Scott-Thoennes sthoe...@gmail.com wrote: On Fri, Nov 8, 2013 at 12:38 AM, Mark Stringer m...@repixl.com wrote: I'd guess that the date_created is being stored in the table, so could be used to order the records without the need for a serial, if ordering is the only concern. Never use date/times as unique identifiers/sort keys. I'll agree with the former, but what's the reasoning behind the latter? Because you might need to know which of two events at 2013-11-08Z19:31:04 happened first. Sure you could use microseconds or whatever to get better resolution, but all that does is make the problem less likely, it doesn't make it go away. You also normally want sort order to be consistent. If you have two records where the sort field is the same, the order they come out is going to be unpredictable. -- David Cantrell | Bourgeois reactionary pig fdisk format reinstall, doo-dah, doo-dah; fdisk format reinstall, it's the Windows way
Database Design Advice
Hello. I'm designing a database schema, and am interested in any wisdom folk can share over a few aspects of it: • A document can have versions. There's a document table and a document_version_contents table, where each document_version_contents record references document.id and has a version number. The pair {document_id, version} are unique, and that can be enforced in the DB. But version should start at 1 for each document and be consecutive. That is, if there is a record for document_id = 3846, version = 6 then there should also be records for version 1 to 5 of that document ID. Is there a way of enforcing that at the DB level? We're using Postgres, if that makes a difference. • The primary key of document_version_contents could be {document_id, version} (with tables linking to it specifying both of those), or I could create document_version_contents.id field, an arbitrary auto-increment number, which other tables could use as a foreign key. The advantage of the arbitrary id field is that it means other tables only need to link to one field. The advantage of using the pair of unique fields that exist anyway is that the values in them are meaningful: when working on the DB, or writing a report pulling data from it, it's easy to see which records correspond with which document; in programs using the DB, look-ups use values it has handy anyway. Anybody think I should pick one or t'other? Or has experience of going with one then regretting it? • A purchase consists of ordering one product from a supplier. Each product is only available from a single supplier. So a record in the purchase table just needs to store a product ID, and by linking through the product table that defines the supplier too. A purchase will also be handled by a particular contact at the supplier. Again, the purchase table can link to the supplier_contact table, which in turn links to the supplier. Except there's now two (indirect) links from the purchase table to the supplier table. Can the database enforce that they both go to the same supplier? (That is, that a purchase handled by a particular contact must be for a product sold by that contact's company?) Thank you for any advice. Smylers -- The UK gov's proposed gagging law will suppress protest and campaigning by charities and political groups. Read more: http://civilsocietycommission.info/ Please sign this petition: http://38d.gs/1cJ9ViA Oxfam • BHA • Quakers * Countryside Alliance • League Against Cruel Sports • CND
Re: Database Design Advice
On Thu, Nov 07, 2013 at 01:03:00PM +, Smylers wrote: Hello. I'm designing a database schema, and am interested in any wisdom folk can share over a few aspects of it: • A document can have versions. There's a document table and a document_version_contents table, where each document_version_contents record references document.id and has a version number. The pair {document_id, version} are unique, and that can be enforced in the DB. But version should start at 1 for each document and be consecutive. That is, if there is a record for document_id = 3846, version = 6 then there should also be records for version 1 to 5 of that document ID. Is there a way of enforcing that at the DB level? We're using Postgres, if that makes a difference. I'm not familiar with Postgres, but I've used triggers to enforce business rules like this in the past (using Sybase). In Sybase, triggers are run before a transaction is committed, and you can rollback a transaction frow within a trigger. On an insert, you'd check whether the number of documents with the same document_id, and version less than the to be inserted version is 1 less than the version. You'd disallow updates of the version number, and on deletions, you check if no higher version exists with the same document_id. • The primary key of document_version_contents could be {document_id, version} (with tables linking to it specifying both of those), or I could create document_version_contents.id field, an arbitrary auto-increment number, which other tables could use as a foreign key. The advantage of the arbitrary id field is that it means other tables only need to link to one field. The advantage of using the pair of unique fields that exist anyway is that the values in them are meaningful: when working on the DB, or writing a report pulling data from it, it's easy to see which records correspond with which document; in programs using the DB, look-ups use values it has handy anyway. Anybody think I should pick one or t'other? Or has experience of going with one then regretting it? It will depend on what queries you actually are going to perform. Both have their advantages and disadvantages, but only when you know your queries, you know which solution has the bigger advantage. • A purchase consists of ordering one product from a supplier. Each product is only available from a single supplier. So a record in the purchase table just needs to store a product ID, and by linking through the product table that defines the supplier too. A purchase will also be handled by a particular contact at the supplier. Again, the purchase table can link to the supplier_contact table, which in turn links to the supplier. Except there's now two (indirect) links from the purchase table to the supplier table. Can the database enforce that they both go to the same supplier? (That is, that a purchase handled by a particular contact must be for a product sold by that contact's company?) Triggers ;-) Abigail
Re: Database Design Advice
On 11/07/2013 01:28 PM, Abigail wrote: I'm not familiar with Postgres, but I've used triggers to enforce business rules like this in the past (using Sybase). In Sybase, triggers are run before a transaction is committed, and you can rollback a transaction frow within a trigger. On an insert, you'd check whether the number of documents with the same document_id, and version less than the to be inserted version is 1 less than the version. You'd disallow updates of the version number, and on deletions, you check if no higher version exists with the same document_id. Another RDBMS, so may be different again, but when I've used triggers in the past (MS SQL), I've used BEFORE INSERT triggers to select the max existing version ID, then increment that by one for the new version of the doc - preventing the need to know the current max version and supply it manually for the insert. It also prevents any possibility of application layer issues causing version number woes.
Re: Database Design Advice
On Postgres and triggers. You can have them to run at any time you want: http://www.postgresql.org/docs/9.1/static/sql-createtrigger.html
Re: Database Design Advice
On 07/11/2013 13:03, Smylers wrote: That is, if there is a record for document_id = 3846, version = 6 then there should also be records for version 1 to 5 of that document ID. Is there a way of enforcing that at the DB level? We're using Postgres, if that makes a difference. Instead of storing a version ID (e.g. 1, 2, 3, 4, 5, 6), you could store a SERIAL. So for one document ID, you might store versions 10, 11, 12, 50, 75, 87. If you then later want to know the version number as a consecutive counting number (e.g. 1, 2, 3, 4, 5, 6), then you could compute it using a PostgreSQL window function. This should guarantee that you always get consecutive counting numbers. Regards, Bill
Re: Database Design Advice
On 07/11/2013 13:03, Smylers wrote: A purchase consists of ordering one product from a supplier. Each product is only available from a single supplier. So a record in the purchase table just needs to store a product ID, and by linking through the product table that defines the supplier too. A purchase will also be handled by a particular contact at the supplier. Again, the purchase table can link to the supplier_contact table, which in turn links to the supplier. Except there's now two (indirect) links from the purchase table to the supplier table. Can the database enforce that they both go to the same supplier? (That is, that a purchase handled by a particular contact must be for a product sold by that contact's company?) I assume your purchase table has foreign key constraints product_id - product (id) contact_id - contact(id) And you want to ensure that for a given purchase row that product(supplier_id) = contact(supplier_id)? You could add supplier_id to the purchase table. Now this will create a denormalisation. To ensure that your data does not become inconsistent, you need to extend your foreign key constraints on the purchase table to be: (product_id, supplier_id) - product(id, supplier_id) (contact_id, supplier_id) - contact(id, supplier_id) This will guarantee that you have no non-existent (product_id, supplier_id) or (contact_id, supplier_id) tuples in the purchase table. So the data integrity ill-effects of denormalisation are resolved. Having done this, any row in the purchase table will now point to a product and a contact relating to the (one) supplier mentioned in the purchase row itself, which means that the product and contact will always relate to the same supplier. I used this approach in anger at a former employer and it worked well. Regards, Bill
Re: Database Design Advice
William Blunn writes: On 07/11/2013 13:03, Smylers wrote: A purchase consists of ordering one product from a supplier. Each product is only available from a single supplier. So a record in the purchase table just needs to store a product ID, and by linking through the product table that defines the supplier too. A purchase will also be handled by a particular contact at the supplier. Again, the purchase table can link to the supplier_contact table, which in turn links to the supplier. Except there's now two (indirect) links from the purchase table to the supplier table. Can the database enforce that they both go to the same supplier? (That is, that a purchase handled by a particular contact must be for a product sold by that contact's company?) I assume your purchase table has foreign key constraints Yeah, I'm assuming that too! (Currently this database only exists on paper.) product_id - product (id) contact_id - contact(id) And you want to ensure that for a given purchase row that product(supplier_id) = contact(supplier_id)? You could add supplier_id to the purchase table. Now this will create a denormalisation. To ensure that your data does not become inconsistent, you need to extend your foreign key constraints on the purchase table to be: (product_id, supplier_id) - product(id, supplier_id) (contact_id, supplier_id) - contact(id, supplier_id) Makes sense — thank you for the suggestion. I used this approach in anger at a former employer and it worked well. Good to know. I'm now reading up on window functions, prompted by your other mail. Smylers -- The UK gov's proposed gagging law will suppress protest and campaigning by charities and political groups. Read more: http://civilsocietycommission.info/ Please sign this petition: http://38d.gs/1cJ9ViA Oxfam • BHA • Quakers * Countryside Alliance • League Against Cruel Sports • CND
Re: Database Design Advice
William Blunn writes: On 07/11/2013 13:03, Smylers wrote: That is, if there is a record for document_id = 3846, version = 6 then there should also be records for version 1 to 5 of that document ID. Is there a way of enforcing that at the DB level? We're using Postgres, if that makes a difference. Instead of storing a version ID (e.g. 1, 2, 3, 4, 5, 6), you could store a SERIAL. So for one document ID, you might store versions 10, 11, 12, 50, 75, 87. If you then later want to know the version number as a consecutive counting number (e.g. 1, 2, 3, 4, 5, 6), Yes, it's a requirement to display the document's version number. then you could compute it using a PostgreSQL window function. This should guarantee that you always get consecutive counting numbers. Thank you for the suggestion; I hadn't used window functions before, so have just tried this out. A window function only operates over the rows that are being selected in the query, not all matching rows in the table. Which means that when selecting a single record, the Rank() window function always returns 1. So, using your numbers above, selecting the version with primary key 75 and calculating its version number as being 5, seems to involve something along the lines of: SELECT * FROM ( SELECT *, Rank() OVER (PARTITION BY document_id ORDER BY id) FROM document_version WHERE document_id = ( SELECT id FROM document_version WHERE id = 75 ) ) AS _ WHERE id = 75; The middle SELECT finds the records for all versions of the document, and calculates a rank for each one. The outer SELECT then discards all the rows except for the one we're interested in. The inner SELECT is to find the document ID for the rows which will be selected by the middle query and discarded by the outer one — which therefore involves passing the primary key 75 to the query twice. So it's certainly possible. But I'm thinking the above complexity is worse than the original issue I was trying avoid with it. However I feel richer for having learnt about window functions, which I'm pleased to now have in my toolbox and expect will come in handy sooner or later. Thanks. Smylers -- The UK gov's proposed gagging law will suppress protest and campaigning by charities and political groups. Read more: http://civilsocietycommission.info/ Please sign this petition: http://38d.gs/1cJ9ViA Oxfam • BHA • Quakers * Countryside Alliance • League Against Cruel Sports • CND