Hello Tim,
Pardon the questions, but I haven't had the need to use denormalization
yet, so:
Tim Wintle wrote:
> /* Table A */
> CREATE TABLE TableA (
> project_id BIGINT NOT NULL,
> cost INT,
> date DATETIME,
> PRIMARY KEY (project_id, date)
> );
> /* Table projects */
> CREATE TABLE projects (
> client_id BIGINT NOT NULL,
> project_id BIGINT NOT NULL,
> INDEX(client_id)
> );
> .... now the index on TableA has been optimised for queries against date
> ranges on specific project ids which should more or less be sequential
> (under a load of other assumptions) - but that reduces the efficiency of
> the query under a join with the table "projects".
> If you denormalise the table, and update the first index to be on
> (client_id, project_id, date) it can end up running far more quickly -
IOW you basically merged the tables as follows?
CREATE TABLE projects (
client_id BIGINT NOT NULL,
project_id BIGINT NOT NULL,
cost INT,
date DATETIME,
INDEX(client_id, project_id, date)
);
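Just to check that I understand what that index buys you, I imagine the
kind of query it is meant to serve looks roughly like the pair below (the
client id and the dates are made up, and this is only my guess at the
shape of your queries):

/* with the original two tables (projects being just the
   client_id -> project_id mapping) this needs a join: */
SELECT SUM(a.cost)
FROM TableA a
JOIN projects p ON p.project_id = a.project_id
WHERE p.client_id = 42
  AND a.date >= '2010-01-01' AND a.date < '2010-02-01';

/* with the merged projects table above there is no join at all; the
   client_id prefix of the (client_id, project_id, date) index narrows
   the scan to one client's rows: */
SELECT SUM(cost)
FROM projects
WHERE client_id = 42
  AND date >= '2010-01-01' AND date < '2010-02-01';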
From what you write further on in the mail, I conclude that you have not
eliminated the first table, just made the projects table look like I
wrote above, right? (And used stored procedures to make sure that the
client_id and project_id data stays consistent between the two tables?)
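Purely to make sure I'm picturing the right thing, I imagine such a
procedure might look roughly like this (the name, the parameters and the
plain MySQL procedure syntax are all my guesses, not anything you have
described):

DELIMITER //
CREATE PROCEDURE add_cost(IN p_client_id BIGINT,
                          IN p_project_id BIGINT,
                          IN p_cost INT,
                          IN p_date DATETIME)
BEGIN
    /* write the normalized row ... */
    INSERT INTO TableA (project_id, cost, date)
    VALUES (p_project_id, p_cost, p_date);
    /* ... and the denormalized copy, so both tables stay consistent */
    INSERT INTO projects (client_id, project_id, cost, date)
    VALUES (p_client_id, p_project_id, p_cost, p_date);
END//
DELIMITER ;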
Have you had some other joins on denormalized keys? E.g. how does a join
of a hypothetical TableB with projects on projects.client_id behave with
such big tables? (Because I assume you obviously can't denormalize
absolutely everything, so this implies the need to do some joins on
denormalized columns like client_id.)
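For example, something along these lines, where TableB and its columns
are entirely hypothetical and only meant to illustrate what I mean by a
join on a denormalized column:

/* hypothetical per-client table */
CREATE TABLE TableB (
  client_id BIGINT NOT NULL,
  amount INT,
  INDEX(client_id)
);

/* a join on the denormalized client_id column */
SELECT p.project_id, p.cost, b.amount
FROM projects p
JOIN TableB b ON b.client_id = p.client_id;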
> assuming you can access the first mapping anyway -
I'm not clear on what you mean here.
> so you're still
> storing the first table, with stored procedures to ensure you still have
> correct data in all tables.
Regards,
mk