Many thanks for your reply Shawn - I have some comments below.


> Hello,
>
> I need to be able to sum over distinct values but I can't seem to do it
> unless I use sub-selects (which I want to avoid doing).
>
> To see what I mean, I've constructed a toy DB:
>
> DROP TABLE IF EXISTS spell;
> CREATE TABLE spell (
>   id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
>   power FLOAT,
>   time_casted DATETIME
> );
>
> DROP TABLE IF EXISTS wizard;
> CREATE TABLE wizard (
>   id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
>   age INT UNSIGNED,
>   name VARCHAR(255)
> );
>
> DROP TABLE IF EXISTS spellcast;
> CREATE TABLE spellcast (
>   id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
>   type ENUM ('fire', 'air', 'water', 'earth'),
>   spell_id INT UNSIGNED,
>   wizard_id INT UNSIGNED
> );
>
> INSERT INTO wizard (name, age) VALUES
>    ('alan', 20),
>    ('bill', 23),
>    ('chris', 21);
>
> INSERT INTO spell (power, time_casted) VALUES
>    (400, '2006-02-02 12:00'),
>    (432, '2006-02-04 12:00'),
>    (123, '2006-02-03 12:00'),
>    (543, '2006-02-08 12:00'),
>    (320, '2006-02-01 12:00'),
>    (102, '2006-02-12 12:00'),
>    (732, '2006-02-14 12:00'),
>    (948, '2006-02-18 12:00'),
>    (932, '2006-02-21 12:00'),
>    (842, '2006-02-26 12:00');
>
> INSERT INTO spellcast (type, spell_id, wizard_id) VALUES
>    ('fire', 1, 1),
>    ('air', 1, 1),
>    ('water', 1, 1),
>    ('earth', 2, 1),
>    ('water', 2, 1),
>    ('fire', 3, 1),
>    ('water', 3, 1),
>    ('water', 4, 1),
>    ('fire', 4, 1),
>    ('air', 5, 1),
>    ('fire', 6, 1),
>    ('water', 7, 1),
>    ('water', 1, 2),
>    ('fire', 1, 2),
>    ('air', 2, 2),
>    ('earth', 3, 2),
>    ('water', 3, 2),
>    ('earth', 4, 2),
>    ('fire', 4, 2),
>    ('air', 4, 2),
>    ('water', 1, 3),
>    ('earth', 1, 3),
>    ('air', 1, 3),
>    ('water', 5, 3),
>    ('fire', 5, 3),
>    ('earth', 5, 3),
>    ('water', 6, 3),
>    ('air', 7, 3);
>
> A spell is an individual spell that's been cast. A spellcast is the
> action of casting the spell by a particular wizard (or a group of
> wizards). When casting a spell, a wizard can contribute various essenses
> (fire, earth, air, water). So for example, Alan cast a spell (id=1) and
> contributed three essences (fire, air & water) - this means that there
> are 3 spellcast rows for this contribution to this spell.
>
> Let's say I want to find the total power of all the spells cast by each
> wizard that involve fire & air. At first I thought the following might work:
>
> SELECT wizard.name,
>        SUM(spell.power)
> FROM spell,
>      spellcast,
>      wizard
> WHERE wizard.id = spellcast.wizard_id AND
>       spellcast.spell_id = spell.id AND
>       spellcast.type IN ('fire', 'air')
> GROUP BY wizard.id;
>
> But this is wrong. The above query will count some spells more than
> once, so the resulting sum is greater than it should be.
>
> The only way I can think of doing this correctly is to use sub-queries:
>
> SELECT DISTINCT wizard.name,
>        sub.s
> FROM
>   (SELECT SUM(inner_sub.power) AS s,
>           inner_sub.wiz_id
>    FROM
>      (SELECT DISTINCT
>              spell.id,
>              spell.power,
>              wizard.id AS wiz_id
>       FROM spell,
>            spellcast,
>            wizard
>       WHERE wizard.id = spellcast.wizard_id AND
>             spellcast.spell_id = spell.id AND
>             spellcast.type IN ('fire', 'air')
>       ) AS inner_sub
>    GROUP BY inner_sub.wiz_id
>    ) AS sub,
>    wizard,
>    spellcast
> WHERE wizard.id = sub.wiz_id AND
>       spellcast.wizard_id = wizard.id AND
>       spellcast.type IN ('fire', 'air');
>
> This works but I was wondering whether there was a simpler way to do it. > All my queries are generated dynamically, and I want to avoid generating
> complex subqueries.
>
> Anyone know of a way to do the above a lot more simply? I can change the
> DB schema if needs be.
>
> Thanks,
> Yasir
>
>


Based on database theory, your schema is correct (so long as each spell only consumes up to 1 unit of essence). As you have discovered, you are trying to take a second-level summary within a single statement. While the COUNT() aggregate function has a DISTINCT modifier none of the others do. That is why you posted.

One technique you could try is to create an intermediate pivot table. That way you can know how much of each essence has been used in each spell.

CREATE TEMPORARY TABLE tmpSpellSummary (PRIMARY KEY (wizard_id,spell_id)) SELECT
  wizard_id,
  spell_id,
  SUM(if(type='AIR',1,0)) air,
  SUM(if(type='EARTH',1,0)) earth,
  SUM(if(type='FIRE',1,0)) fire,
  SUM(if(type='WATER',1,0)) water
FROM spellcast
GROUP BY wizard_id, spell_id

Now you can join this pivot table to your other query and you won't have the duplication. You can also modify this by adding (after the GROUP BY clause):

HAVING air>0 and water>0

to pick out just those spell_id's that used both air and water.

MySQL won't let me use HAVING here (because HAVING only works on columns that are in the SELECT list) so this is what I did:

SELECT wizard.name,
      SUM(spell.power)
FROM spell,
    wizard,
    tmpSpellSummary
WHERE wizard.id = tmpSpellSummary.wizard_id AND
     tmpSpellSummary.spell_id = spell.id AND
     (tmpSpellSummary.air > 0 OR tmpSpellSummary.fire > 0)
GROUP BY wizard.id;

and it works.


Another technique you could try (still assuming that using an essence only consumes 1 unit of it) is to add an "essence" bitmap value to your spell table (along with a wizard_id) and eliminate the spellcast table entirely. Assuming that your lowest four bits represent air,earth,fire, water (in that order) you could match an (air+fire) spell against the value 10 or an (earth + water) spell against the value 5 or an earth-only spell against the value 4. A column of type tinyint unsigned (one byte) allows you to flag on up to 8 values at once (allowing for 4 more essences or any other standard spell components). Increasing the column size gives you 16 or 32 bits to play with. However adding up how many fire spells or earth spells cast by an individual wizard becomes interesting again as you have to peform bit-level functions to get at that information.

Also, I could use the SET column type (which is probably implemented using bit fields anyway).

Another optimization you could try (especially if you start allowing spells to use different quantities of each essense) is to assign each essence it's own value column in the spell table (denormalize your spell-spellcast tables). It would make finding combination spells simpler and wouldn't take up too much room compared to your existing structure. It makes totaling up the total qty of all essences use in each spell a little more difficult (because they are all in their own columns) but how often do you really do that? This is one of those situations where speed is probably more important than theory and breaking normalization at this one level may provide enough performance boost to really help you out.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

PS. Another thing to try is combining the bitmap idea with the pivot table idea:

CREATE TEMPORARY TABLE tmpSpellSummary (PRIMARY KEY (wizard_id,spell_id)) SELECT
  wizard_id,
  spell_id,
SUM(if(type='AIR',1,if(type='EARTH',2,if(type='FIRE',4,if(type='WATER',8,0))))) bitmap
FROM spellcast
GROUP BY wizard_id, spell_id
HAVING bitmap = 9

(AIR + WATER = 1 + 8 = 9)

This sounds like the best solution to me, though I think I'll use a SET rather than a bitmap.

Thanks again for all your help - I really appreciate it.

Yasir


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to