I would first try refactoring your SQL to use INNER JOIN statements 
instead of the comma-separated list of tables you are currently using. I 
would also avoid the subqueries entirely. Test this and see if it works 
for you:

SELECT SUM(li.quantity) as qtysoldytd 
FROM LineItem li
INNER JOIN Sales sa 
        on li.saletranID=sa.saletranID 
        and YEAR(sa.solddate)=2005
INNER JOIN Inventory inv
        on inv.invID = li.invID
INNER JOIN Category cat
        on cat.categoryid = inv.categoryid
        AND cat.vendcode='AA';


The linkages work like this:
1) LineItem links into Sales through saletranID (filtered to YEAR(solddate)=2005)
2) LineItem links into Inventory through invID
3) Inventory links to Category through categoryid (filtered to vendcode='AA')

Because I used INNER JOINs, each link in the chain must exist across all 
tables or the row cannot be added to the final results. 
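
If the rewritten query is still slow, it may be worth checking what the 
optimizer is doing with it before anything else. A minimal sketch, using 
the same tables as above:

```sql
-- Prefix the query with EXPLAIN to see the join order and index usage
-- for each table. A row showing key = NULL usually points to a missing
-- or unusable index on that table's join column.
EXPLAIN
SELECT SUM(li.quantity) as qtysoldytd
FROM LineItem li
INNER JOIN Sales sa
        on li.saletranID=sa.saletranID
        and YEAR(sa.solddate)=2005
INNER JOIN Inventory inv
        on inv.invID = li.invID
INNER JOIN Category cat
        on cat.categoryid = inv.categoryid
        AND cat.vendcode='AA';
```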

Because this query contains several joins and your table sizes are not 
insignificant, it becomes a candidate for what I call "piecewise 
evaluation". Piecewise evaluation is where you take the full query and 
build your desired results in stages. One stage that jumps out at me is 
the conversion of vendcode to a list of invID values. Another stage could 
be isolating just those line items for 2005. I suggest this because 
JOINing two tables (either by explicit declaration as I do or by the 
comma-separated list as you did) is a multiplicatively expensive 
operation: its cost grows with the product of how many rows are 
participating from each table. If we start with two tables M and N and 
they each have m and n rows in them, a JOIN operation takes on the order 
of m*n comparisons to compute. If we can somehow shrink each table 
participating in the JOIN (by pre-selecting certain rows) so that we now 
have m/4 and n/2 rows to JOIN, that reduces your overall cost to 
(m * n)/8. When we are discussing products of m*n on the order of 100 
million rows or so, reducing production time by a factor of 8 is 
noticeable. The effect is even more apparent if you add more tables.

Consider if you had tables A, B, and C with a, b, and c rows in them. If 
you had to JOIN those three tables to build a query, it would take on the 
order of a*b*c units of time to complete. If we were only able to reduce 
each table by 10%, that cuts the overall computation to 
(.9*a)*(.9*b)*(.9*c) = .729*(a*b*c)

If:
a =  50,000
b = 500,000
c = 800,000 records

The original execution cost is proportional to:
        (50000 * 500000 * 800000) = 20000000000000000 (2.0e16)
after 10% reductions through precomputations:
         2.0e16 * .729 = 1.458e16
---------------------------------------------------------------------------
# of row combinations NOT fed through the CPU 
to be evaluated for membership in the result:
        2.0e16 - 1.458e16 = 5.42e+15 = 5420000000000000

How long do you think it takes even a modern computer to perform 
5420000000000000 tests? Avoiding them can make a serious difference.
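
You can sanity-check that arithmetic with a quick SELECT (plain constant 
math, no tables involved):

```sql
-- Full cost, reduced cost after the 10% trims, and the row
-- combinations avoided (2.0e16 - 1.458e16 = 5.42e15).
SELECT 50000 * 500000 * 800000         AS full_cost,      -- 2.0e16
       50000 * 500000 * 800000 * 0.729 AS reduced_cost,   -- 1.458e16
       50000 * 500000 * 800000 * 0.271 AS combos_avoided; -- 5.42e15
```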

Piecewise evaluation works VERY WELL in stored procedures (if you are on 
v5.0 or newer) because you can parameterize your queries quite easily and 
you are assured of executing the same query pattern every time you need 
it.
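
As a sketch (the procedure and parameter names below are mine, not part 
of your schema), the stages that follow could be wrapped up like this on 
5.0+:

```sql
-- Hypothetical wrapper for the staged query. The DELIMITER change is
-- needed in the mysql client because the procedure body contains
-- semicolons of its own.
DELIMITER //
CREATE PROCEDURE QtySoldYTD(IN p_year INT, IN p_vendcode CHAR(3))
BEGIN
  -- stage 1: line items for the requested year
  CREATE TEMPORARY TABLE tmpLI (KEY(invID))
  SELECT li.invID, li.quantity
  FROM LineItem li
  INNER JOIN Sales sa
        ON li.saletranID = sa.saletranID
        AND YEAR(sa.solddate) = p_year;

  -- stage 2: inventory items for the requested vendor
  CREATE TEMPORARY TABLE tmpInv (KEY(invID))
  SELECT DISTINCT inv.invID
  FROM Inventory inv
  INNER JOIN Category cat
        ON cat.categoryid = inv.categoryid
        AND cat.vendcode = p_vendcode;

  -- stage 3: the final sum
  SELECT SUM(li.quantity) AS qtysoldytd
  FROM tmpLI li
  INNER JOIN tmpInv inv ON inv.invID = li.invID;

  -- stage 4: clean up
  DROP TEMPORARY TABLE tmpLI;
  DROP TEMPORARY TABLE tmpInv;
END //
DELIMITER ;

-- CALL QtySoldYTD(2005, 'AA');
```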

## stage 1 - identifying line items from 2005

CREATE TEMPORARY TABLE tmpLI (
        KEY(invID)
) SELECT li.invID, li.quantity
FROM LineItem li
INNER JOIN Sales sa
        on li.saletranID=sa.saletranID
        and YEAR(sa.solddate)=2005;

## stage 2 - identifying inventory items for a certain category
CREATE TEMPORARY TABLE tmpInv (
        KEY(invID)
) SELECT DISTINCT inv.invID
FROM Inventory inv
INNER JOIN Category cat
        on cat.categoryid = inv.categoryid
        AND cat.vendcode='AA';

## stage 3 - compute your desired results
SELECT SUM(li.quantity) as qtysoldytd
FROM tmpLI li
INNER JOIN tmpInv inv
        ON inv.invID = li.invID;

## stage 4 - the database is not your momma. Clean up after yourself...

DROP TEMPORARY TABLE tmpLI;
DROP TEMPORARY TABLE tmpInv;

## end query

I hope that helps (HTH),

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine








[EMAIL PROTECTED] wrote on 11/04/2005 12:28:50 AM:

> I cannot figure this one out. I have a Category table with 50,000
> records, an Inventory table with over 2 million records. A Sales table
> with 500,000 records. And a LineItem table with 800,000 records pairing
> the Inventory ID with the Sales Transaction ID and Quantity. I need to
> generate a Quantity sold year to date for a certain vendor. The vendor
> code can be found in the Category table which has a relationship with
> Inventory. I am trying a SQL statement like this:
> 
> select sum(li.quantity) as qtysoldytd from LineItem li, Sales sa where
> (li.saletranID=sa.saletranID and YEAR(sa.solddate)=2005) and li.InvID
> IN (select invID from Inventory where categoryid IN (select categoryid
> from Category where vendcode='AA')) 
> 
> this yields null when I know there are sales for that vendor in 2005.
> Simplified schemas for the tables are as follows:
> Category:
> 
> +----------------+------------------+------+-----+---------+----------------+
> | Field          | Type             | Null | Key | Default | Extra          |
> +----------------+------------------+------+-----+---------+----------------+
> | vendcode       | char(3)          | YES  | MUL | NULL    |                |
> | categoryID     | int(10) unsigned |      | PRI | NULL    | auto_increment |
> +----------------+------------------+------+-----+---------+----------------+
> 
> Inventory:
> +--------------+---------------+------+-----+---------+-------+
> | Field        | Type          | Null | Key | Default | Extra |
> +--------------+---------------+------+-----+---------+-------+
> | categoryID   | int(11)       | YES  | MUL | NULL    |       |
> | invID        | int(10)       |      | PRI | 0       |       |
> | itemnum      | int(11)       | YES  | MUL | NULL    |       |
> +--------------+---------------+------+-----+---------+-------+
> 
> Sales:
> 
> +--------------+------------------+------+-----+---------+----------------+
> | Field        | Type             | Null | Key | Default | Extra          |
> +--------------+------------------+------+-----+---------+----------------+
> | saletranID   | int(10) unsigned |      | PRI | NULL    | auto_increment |
> | solddate     | datetime         | YES  |     | NULL    |                |
> +--------------+------------------+------+-----+---------+----------------+
> 
> LineItem:
> +------------+---------+------+-----+---------+-------+
> | Field      | Type    | Null | Key | Default | Extra |
> +------------+---------+------+-----+---------+-------+
> | invID      | int(10) | YES  | MUL | NULL    |       |
> | quantity   | int(10) | YES  |     | NULL    |       |
> | saletranID | int(10) | YES  | MUL | NULL    |       |
> +------------+---------+------+-----+---------+-------+
> 
> Can anybody shed some light on this and if this is even possible. I have
> indexes in place and the query is still slow to pull.
> Thanks a million,
> Nathan
> 
> 
> -- 
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
> 
