Yasir Assam <[EMAIL PROTECTED]> wrote on 04/03/2006 11:09:01 PM:

> Hello,
> 
> I need to be able to sum over distinct values but I can't seem to do it 
> unless I use sub-selects (which I want to avoid doing).
> 
> To see what I mean, I've constructed a toy DB:
> 
> DROP TABLE IF EXISTS spell;
> CREATE TABLE spell (
>   id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
>   power FLOAT,
>   time_casted DATETIME
> );
> 
> DROP TABLE IF EXISTS wizard;
> CREATE TABLE wizard (
>   id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
>   age INT UNSIGNED,
>   name VARCHAR(255)
> );
> 
> DROP TABLE IF EXISTS spellcast;
> CREATE TABLE spellcast (
>   id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
>   type ENUM ('fire', 'air', 'water', 'earth'),
>   spell_id INT UNSIGNED,
>   wizard_id INT UNSIGNED
> );
> 
> INSERT INTO wizard (name, age) VALUES
>    ('alan', 20),
>    ('bill', 23),
>    ('chris', 21);
> 
> INSERT INTO spell (power, time_casted) VALUES
>    (400, '2006-02-02 12:00'),
>    (432, '2006-02-04 12:00'),
>    (123, '2006-02-03 12:00'),
>    (543, '2006-02-08 12:00'),
>    (320, '2006-02-01 12:00'),
>    (102, '2006-02-12 12:00'),
>    (732, '2006-02-14 12:00'),
>    (948, '2006-02-18 12:00'),
>    (932, '2006-02-21 12:00'),
>    (842, '2006-02-26 12:00');
> 
> INSERT INTO spellcast (type, spell_id, wizard_id) VALUES
>    ('fire', 1, 1),
>    ('air', 1, 1),
>    ('water', 1, 1),
>    ('earth', 2, 1),
>    ('water', 2, 1),
>    ('fire', 3, 1),
>    ('water', 3, 1),
>    ('water', 4, 1),
>    ('fire', 4, 1),
>    ('air', 5, 1),
>    ('fire', 6, 1),
>    ('water', 7, 1),
>    ('water', 1, 2),
>    ('fire', 1, 2),
>    ('air', 2, 2),
>    ('earth', 3, 2),
>    ('water', 3, 2),
>    ('earth', 4, 2),
>    ('fire', 4, 2),
>    ('air', 4, 2),
>    ('water', 1, 3),
>    ('earth', 1, 3),
>    ('air', 1, 3),
>    ('water', 5, 3),
>    ('fire', 5, 3),
>    ('earth', 5, 3),
>    ('water', 6, 3),
>    ('air', 7, 3);
> 
> A spell is an individual spell that's been cast. A spellcast is the 
> action of casting the spell by a particular wizard (or a group of 
> wizards). When casting a spell, a wizard can contribute various essenses 

> (fire, earth, air, water). So for example, Alan cast a spell (id=1) and 
> contributed three essences (fire, air & water) - this means that there 
> are 3 spellcast rows for this contribution to this spell.
> 
> Let's say I want to find the total power of all the spells cast by each 
> wizard that involve fire & air. At first I thought the following might 
work:
> 
> SELECT wizard.name,
>        SUM(spell.power)
> FROM spell,
>      spellcast,
>      wizard
> WHERE wizard.id = spellcast.wizard_id AND
>       spellcast.spell_id = spell.id AND
>       spellcast.type IN ('fire', 'air')
> GROUP BY wizard.id;
> 
> But this is wrong. The above query will count some spells more than 
> once, so the resulting sum is greater than it should be.
> 
> The only way I can think of doing this correctly is to use sub-queries:
> 
> SELECT DISTINCT wizard.name,
>        sub.s
> FROM
>   (SELECT SUM(inner_sub.power) AS s,
>           inner_sub.wiz_id
>    FROM
>      (SELECT DISTINCT
>              spell.id,
>              spell.power,
>              wizard.id AS wiz_id
>       FROM spell,
>            spellcast,
>            wizard
>       WHERE wizard.id = spellcast.wizard_id AND
>             spellcast.spell_id = spell.id AND
>             spellcast.type IN ('fire', 'air')
>       ) AS inner_sub
>    GROUP BY inner_sub.wiz_id
>    ) AS sub,
>    wizard,
>    spellcast
> WHERE wizard.id = sub.wiz_id AND
>       spellcast.wizard_id = wizard.id AND
>       spellcast.type IN ('fire', 'air');
> 
> This works but I was wondering whether there was a simpler way to do it. 

> All my queries are generated dynamically, and I want to avoid generating 

> complex subqueries.
> 
> Anyone know of a way to do the above a lot more simply? I can change the 

> DB schema if needs be.
> 
> Thanks,
> Yasir
> 
> 


Based on database theory, your schema is correct (so long as each spell 
only consumes up to 1 unit of essence). As you have discovered, you are 
trying to take a second-level summary within a single statement. While the 
COUNT() aggregate function has a DISTINCT modifier none of the others do. 
That is why you posted. 

One technique you could try is to create an intermediate pivot table. That 
way you can know how much of each essence has been used in each spell.

CREATE TEMPORARY TABLE tmpSpellSummary (PRIMARY KEY (wizard_id,spell_id)) 
SELECT 
  wizard_id,
  spell_id,
  SUM(if(type='AIR',1,0)) air,
  SUM(if(type='EARTH',1,0)) earth,
  SUM(if(type='FIRE',1,0)) fire,
  SUM(if(type='WATER',1,0)) water
FROM spellcast
GROUP BY wizard_id, spell_id

Now you can join this pivot table to your other query and you won't have 
the duplication. You can also modify this by adding (after the GROUP BY 
clause):

HAVING air>0 and water>0

to pick out just those spell_id's that used both air and water.

Another technique you could try (still assuming that using an essence only 
consumes 1 unit of it) is to add an "essence" bitmap value to your spell 
table (along with a wizard_id) and eliminate the spellcast table entirely. 
Assuming that your lowest four bits represent air,earth,fire, water (in 
that order) you could match an (air+fire) spell against the value 10 or an 
(earth + water) spell against the value 5 or an earth-only spell against 
the value 4. A column of type tinyint unsigned (one byte) allows you to 
flag on up to 8 values at once (allowing for 4 more essences or any other 
standard spell components). Increasing the column size gives you 16 or 32 
bits to play with. However adding up how many fire spells or earth spells 
cast by an individual wizard becomes interesting again as you have to 
peform bit-level functions to get at that information.

Another optimization you could try (especially if you start allowing 
spells to use different quantities of each essense) is to assign each 
essence it's own value column in the spell table (denormalize your 
spell-spellcast tables).  It would make finding combination spells simpler 
and wouldn't take up too much room compared to your existing structure. It 
makes totaling up the total qty of all essences use in each spell a little 
more difficult (because they are all in their own columns) but how often 
do you really do that? This is one of those situations where speed is 
probably more important than theory and breaking normalization at this one 
level may provide enough performance boost to really help you out.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

PS. Another thing to try is combining the bitmap idea with the pivot table 
idea:

CREATE TEMPORARY TABLE tmpSpellSummary (PRIMARY KEY (wizard_id,spell_id)) 
SELECT 
  wizard_id,
  spell_id,
 
SUM(if(type='AIR',1,if(type='EARTH',2,if(type='FIRE',4,if(type='WATER',8,0))))) 
bitmap
FROM spellcast
GROUP BY wizard_id, spell_id
HAVING bitmap = 9 

(AIR + WATER = 1 + 8 = 9)

As you can tell, there are lots of ways to work through this.  :-)

Reply via email to