Hello Tim,

Pardon the questions, but I haven't had the need to use denormalization yet, so:

Tim Wintle wrote:

> /* Table A */
> CREATE TABLE TableA (
>     project_id BIGINT NOT NULL,
>     cost INT,
>     date DATETIME,
>     PRIMARY KEY (project_id, date)
> );
>
> /* Table projects */
> CREATE TABLE projects (
>     client_id BIGINT NOT NULL,
>     project_id BIGINT NOT NULL,
>     INDEX(client_id)
> );


> .... now the index on TableA has been optimised for queries against date
> ranges on specific project ids which should more or less be sequential
> (under a load of other assumptions) - but that reduces the efficiency of
> the query under a join with the table "projects".
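
(Interjecting for my own understanding: I take the slow query here to be
something like the one below; the literal values are made up by me.)

/* one (project_id, date) range scan per matching project,
   instead of a single scan */
SELECT a.cost, a.date
FROM projects p
JOIN TableA a ON a.project_id = p.project_id
WHERE p.client_id = 42
    AND a.date BETWEEN '2010-01-01' AND '2010-02-01';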

> If you denormalise the table, and update the first index to be on
> (client_id, project_id, date) it can end up running far more quickly -

IOW you basically merged the tables as follows?

CREATE TABLE projects (
    client_id BIGINT NOT NULL,
    project_id BIGINT NOT NULL,
    cost INT,
    date DATETIME,
    INDEX(client_id, project_id, date)
);
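
If so, I guess the point is that a lookup like the one below (values
made up by me) is then served entirely by that one composite index,
with no join at all:

/* single range scan of INDEX(client_id, project_id, date) */
SELECT cost, date
FROM projects
WHERE client_id = 42
    AND project_id = 1001
    AND date BETWEEN '2010-01-01' AND '2010-02-01';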

From what you write further down in the mail I conclude that you have not eliminated the first table, just made the projects table look like the above, right? (And used stored procedures to make sure that the client_id and project_id columns stay consistent across both tables.)
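
Something like this minimal sketch is what I imagine (MySQL syntax
assumed; the procedure name and parameters are my own invention):

DELIMITER //
CREATE PROCEDURE add_cost_row(IN p_client_id BIGINT,
                              IN p_project_id BIGINT,
                              IN p_cost INT,
                              IN p_date DATETIME)
BEGIN
    /* write both copies in one transaction so they can't diverge */
    START TRANSACTION;
    INSERT INTO TableA (project_id, cost, date)
        VALUES (p_project_id, p_cost, p_date);
    INSERT INTO projects (client_id, project_id, cost, date)
        VALUES (p_client_id, p_project_id, p_cost, p_date);
    COMMIT;
END //
DELIMITER ;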

Have you had some other joins on denormalized keys? I.e., for example, how does a join of a hypothetical TableB with projects on projects.client_id behave with such big tables? (Because I assume you obviously can't denormalize absolutely everything, which implies the need to do some joins on denormalized columns like client_id.)
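
To make the question concrete, I mean something like this (TableB and
its columns are entirely hypothetical):

CREATE TABLE TableB (
    client_id BIGINT NOT NULL,
    region VARCHAR(32),
    INDEX(client_id)
);

/* a join on the denormalized client_id column */
SELECT p.project_id, p.cost, b.region
FROM TableB b
JOIN projects p ON p.client_id = b.client_id;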

> assuming you can access the first mapping anyway -

I'm not clear on what you mean here.

> so you're still
> storing the first table, with stored procedures to ensure you still have
> correct data in all tables.

Regards,
mk

--
http://mail.python.org/mailman/listinfo/python-list
