Yasir Assam <[EMAIL PROTECTED]> wrote on 04/03/2006 11:09:01 PM:
> Hello,
>
> I need to be able to sum over distinct values but I can't seem to do it
> unless I use sub-selects (which I want to avoid doing).
>
> To see what I mean, I've constructed a toy DB:
>
> DROP TABLE IF EXISTS spell;
> CREATE TABLE spell (
> id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
> power FLOAT,
> time_casted DATETIME
> );
>
> DROP TABLE IF EXISTS wizard;
> CREATE TABLE wizard (
> id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
> age INT UNSIGNED,
> name VARCHAR(255)
> );
>
> DROP TABLE IF EXISTS spellcast;
> CREATE TABLE spellcast (
> id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
> type ENUM ('fire', 'air', 'water', 'earth'),
> spell_id INT UNSIGNED,
> wizard_id INT UNSIGNED
> );
>
> INSERT INTO wizard (name, age) VALUES
> ('alan', 20),
> ('bill', 23),
> ('chris', 21);
>
> INSERT INTO spell (power, time_casted) VALUES
> (400, '2006-02-02 12:00'),
> (432, '2006-02-04 12:00'),
> (123, '2006-02-03 12:00'),
> (543, '2006-02-08 12:00'),
> (320, '2006-02-01 12:00'),
> (102, '2006-02-12 12:00'),
> (732, '2006-02-14 12:00'),
> (948, '2006-02-18 12:00'),
> (932, '2006-02-21 12:00'),
> (842, '2006-02-26 12:00');
>
> INSERT INTO spellcast (type, spell_id, wizard_id) VALUES
> ('fire', 1, 1),
> ('air', 1, 1),
> ('water', 1, 1),
> ('earth', 2, 1),
> ('water', 2, 1),
> ('fire', 3, 1),
> ('water', 3, 1),
> ('water', 4, 1),
> ('fire', 4, 1),
> ('air', 5, 1),
> ('fire', 6, 1),
> ('water', 7, 1),
> ('water', 1, 2),
> ('fire', 1, 2),
> ('air', 2, 2),
> ('earth', 3, 2),
> ('water', 3, 2),
> ('earth', 4, 2),
> ('fire', 4, 2),
> ('air', 4, 2),
> ('water', 1, 3),
> ('earth', 1, 3),
> ('air', 1, 3),
> ('water', 5, 3),
> ('fire', 5, 3),
> ('earth', 5, 3),
> ('water', 6, 3),
> ('air', 7, 3);
>
> A spell is an individual spell that's been cast. A spellcast is the
> action of casting the spell by a particular wizard (or a group of
> wizards). When casting a spell, a wizard can contribute various essenses
> (fire, earth, air, water). So for example, Alan cast a spell (id=1) and
> contributed three essences (fire, air & water) - this means that there
> are 3 spellcast rows for this contribution to this spell.
>
> Let's say I want to find the total power of all the spells cast by each
> wizard that involve fire & air. At first I thought the following might
work:
>
> SELECT wizard.name,
> SUM(spell.power)
> FROM spell,
> spellcast,
> wizard
> WHERE wizard.id = spellcast.wizard_id AND
> spellcast.spell_id = spell.id AND
> spellcast.type IN ('fire', 'air')
> GROUP BY wizard.id;
>
> But this is wrong. The above query will count some spells more than
> once, so the resulting sum is greater than it should be.
>
> The only way I can think of doing this correctly is to use sub-queries:
>
> SELECT DISTINCT wizard.name,
> sub.s
> FROM
> (SELECT SUM(inner_sub.power) AS s,
> inner_sub.wiz_id
> FROM
> (SELECT DISTINCT
> spell.id,
> spell.power,
> wizard.id AS wiz_id
> FROM spell,
> spellcast,
> wizard
> WHERE wizard.id = spellcast.wizard_id AND
> spellcast.spell_id = spell.id AND
> spellcast.type IN ('fire', 'air')
> ) AS inner_sub
> GROUP BY inner_sub.wiz_id
> ) AS sub,
> wizard,
> spellcast
> WHERE wizard.id = sub.wiz_id AND
> spellcast.wizard_id = wizard.id AND
> spellcast.type IN ('fire', 'air');
>
> This works but I was wondering whether there was a simpler way to do it.
> All my queries are generated dynamically, and I want to avoid generating
> complex subqueries.
>
> Anyone know of a way to do the above a lot more simply? I can change the
> DB schema if needs be.
>
> Thanks,
> Yasir
>
>
Based on database theory, your schema is correct (so long as each spell
only consumes up to 1 unit of essence). As you have discovered, you are
trying to take a second-level summary within a single statement. While the
COUNT() aggregate function has a DISTINCT modifier none of the others do.
That is why you posted.
One technique you could try is to create an intermediate pivot table. That
way you can know how much of each essence has been used in each spell.
CREATE TEMPORARY TABLE tmpSpellSummary (PRIMARY KEY (wizard_id,spell_id))
SELECT
wizard_id,
spell_id,
SUM(if(type='AIR',1,0)) air,
SUM(if(type='EARTH',1,0)) earth,
SUM(if(type='FIRE',1,0)) fire,
SUM(if(type='WATER',1,0)) water
FROM spellcast
GROUP BY wizard_id, spell_id
Now you can join this pivot table to your other query and you won't have
the duplication. You can also modify this by adding (after the GROUP BY
clause):
HAVING air>0 and water>0
to pick out just those spell_id's that used both air and water.
Another technique you could try (still assuming that using an essence only
consumes 1 unit of it) is to add an "essence" bitmap value to your spell
table (along with a wizard_id) and eliminate the spellcast table entirely.
Assuming that your lowest four bits represent air,earth,fire, water (in
that order) you could match an (air+fire) spell against the value 10 or an
(earth + water) spell against the value 5 or an earth-only spell against
the value 4. A column of type tinyint unsigned (one byte) allows you to
flag on up to 8 values at once (allowing for 4 more essences or any other
standard spell components). Increasing the column size gives you 16 or 32
bits to play with. However adding up how many fire spells or earth spells
cast by an individual wizard becomes interesting again as you have to
peform bit-level functions to get at that information.
Another optimization you could try (especially if you start allowing
spells to use different quantities of each essense) is to assign each
essence it's own value column in the spell table (denormalize your
spell-spellcast tables). It would make finding combination spells simpler
and wouldn't take up too much room compared to your existing structure. It
makes totaling up the total qty of all essences use in each spell a little
more difficult (because they are all in their own columns) but how often
do you really do that? This is one of those situations where speed is
probably more important than theory and breaking normalization at this one
level may provide enough performance boost to really help you out.
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine
PS. Another thing to try is combining the bitmap idea with the pivot table
idea:
CREATE TEMPORARY TABLE tmpSpellSummary (PRIMARY KEY (wizard_id,spell_id))
SELECT
wizard_id,
spell_id,
SUM(if(type='AIR',1,if(type='EARTH',2,if(type='FIRE',4,if(type='WATER',8,0)))))
bitmap
FROM spellcast
GROUP BY wizard_id, spell_id
HAVING bitmap = 9
(AIR + WATER = 1 + 8 = 9)
As you can tell, there are lots of ways to work through this. :-)