Re: SQL help for qty Sold YTD...

2005-11-04 Thread Rhino
I spent several minutes looking at your question and your data model and
nothing jumped out at me that precluded you from determining the quantity of
the this vendor's items sold via this data model. I might have missed
something though.

I'm a little concerned that your LineItem table appeared to have no primary
key. In my opinion, the primary key of a line item table should be an order
number (saleTranID?) and then a sequence number (1 for the first item on the
order, 2 for the second, etc.) but you (apparently) have no primary key
defined at all and don't have a sequence number either. However, that
shouldn't keep this particular query from running or returning appropriate
rows.

I am also assuming that invID is an inventory ID - my brain kept reading it
as invoice ID but I learned to ignore it ;-) - where an inventory ID
uniquely identifies one product that you sell, e.g. invID 1 might be power
supplies for Sony camcorders while invID 2 might be Palm Tungsten E PDAs.
This is something I would normally call a product ID if I were doing the
naming :-) If, in fact, invID *is* an invoice ID, i.e. something that
uniquely identifies a particular sales transaction then there is something
wrong which might explain why you're not getting any data.

So, assuming I haven't misunderstood anything or simply missed something, I
would be inclined to break the query down into chunks. Execute each chunk on
its own andmake sure that each chunk delivers what you think it should. If
it doesn't, either the query is wrong or the data isn't what you think it
is. Verify that the data you expect is there by doing SELECTs against the
relevant tables; if the data is there, it's got to be your query that is
wrong. Inspect each chunk until you find the culprit(s) in either the SQL or
the data.

Also, for what it's worth, I would strongly suggest that you set up a test
environment with a SMALL quantity of data in each table - 50 rows or less
should be plenty for most situations - and try your queries against that
test environment. That makes the testing process a lot less painful - why
wait for many seconds or even minutes for the query to give you the wrong
answer? - and let's you solve the problem faster. It might sound like a lot
of work but it shouldn't be; just clone the real tables and then copy a
small but representative sample of data from the real tables into the
clones.

You also asked about performance but there is no way anyone can comment on
that without knowing a lot more about what indexes you have and, perhaps,
which engine you are using. But, in my opinion, your first effort should be
directed toward getting the query running correctly, THEN worry about making
it go faster.

Rhino

- Original Message - 
From: [EMAIL PROTECTED]
To: mysql@lists.mysql.com
Sent: Friday, November 04, 2005 12:28 AM
Subject: SQL help for qty Sold YTD...


 I cannot figure this one out. I have a Category table with 50,000
 records, an Inventory table with over 2 million records. A Sales table
 with 500,000 records. And a LineItem table with 800,000 records pairing
 the Inventory ID with the Sales Transaction ID and Quantity. I need to
 generate a Quantity sold year to date for a certain vendor. The vendor
 code can be found in the Category table which has a relationship with
 Inventory. I am trying a SQL statement like this:

 select sum(li.quantity) as qtysoldytd from LineItem li, Sales sa where
 (li.saletranID=sa.saletranID and YEAR(sa.solddate)=2005) and li.InvID
 IN (select invID from Inventory where categoryid IN (select categoryid
 from Category where vendcode='AA'))

 this yields null when I know there are sales for that vendor in 2005.
 Simplified schemas for the tables are as follows:
 Category:

++--+--+-+-+
+
 | Field  | Type | Null | Key | Default | Extra
|

++--+--+-+-+
+
 | vendcode   | char(3)  | YES  | MUL | NULL|
|
 | categoryID | int(10) unsigned |  | PRI | NULL|
 auto_increment |

++--+--+-+-+
+

 Inventory:
 +--+---+--+-+-+---+
 | Field| Type  | Null | Key | Default | Extra |
 +--+---+--+-+-+---+
 | categoryID   | int(11)   | YES  | MUL | NULL|   |
 | invID| int(10)   |  | PRI | 0   |   |
 | itemnum  | int(11)   | YES  | MUL | NULL|   |
 +--+---+--+-+-+---+

 Sales:

+--+--+--+-+-++
 | Field| Type | Null | Key | Default | Extra
  |

+--+--+--+-+-++
 | saletranID   | int(10) unsigned |  | PRI | NULL|
 auto_increment

Re: SQL help for qty Sold YTD...

2005-11-04 Thread SGreen
I would first try refactoring your SQL to use INNER JOIN statements 
instead of the comma separated lists you are currently using. I would also 
not use any subqueries. Test this and see if it works for you:

SELECT SUM(li.quantity) as qtysoldytd 
FROM LineItem li
INNER JOIN Sales sa 
on li.saletranID=sa.saletranID 
and YEAR(sa.solddate)=2005
INNER JOIN Inventory inv
on inv.invID = li.invID
INNER JOIN Category cat
on cat.categoryid = inv.categoryid
AND cat.vendcode='AA';


The linkages work like this:
1) LineItem links into Sales through saletranID and YEAR(solddate)
2) Sales links into Inventory through the invID
3) Inventory links to Category through categoryid and vendcode

Because I used INNER JOINs, each link in the chain must exist across all 
tables or the row cannot be added to the final results. 

Because this query contains several joins and your table sizes are not 
insignificant it becomes a candidate for what I call piecewize 
evaluation. Piecewize evaluation is where you take the full query and 
build your desired results in stages. One stage that jumps out at me is 
the conversion of vendcode to a list of invID values. Another stage could 
be isolating just those line items for 2005. I suggest this because 
JOINing two tables (either by explicit declaration as I do or by 
comma-separated lists as you did) is a geometrically expensive operation 
(it's cost to compute grows by multiplying how many rows are participating 
from each table). If we start with two tables M and N and they each have m 
and n rows in them, a JOIN operation takes on the order of m*n cycles to 
compute.  If we can somehow shrink each table participating in the JOIN 
(by pre-selecting certain rows) so that we now have m/4 and n/2 rows to 
JOIN that reduces your overall cost to (m * n)/8. When we are discussing 
products of m*n on the order of 100 million rows or so, reducing 
production time by a factor of 8 is noticable. The situation is even more 
apparent if you add more tables.

Consider if you had tables A, B, and C and they had a,b, and c rows in 
them. If you had to JOIN those three tables to build a query it would take 
a*b*c units of time to complete. If we were only able to reduce each table 
by 10%, that reduces the overall computation to (.9*a)*(.9*b)*(.9*c) = 
.729(abc)

If:
a =  50,000
b = 500,000
c = 800,000 records

The original execution cost is proportional to:
(5 * 50 * 80) = 2 (2.0e16)
after 10% reductions through precomputations:
 2.0e16 * .729 = 1.458e16
---
# of rows combinations NOT fed through the CPU 
to be evaluated as being in the result or not:
2.0e16 - 1.458e16 = 5.42e+15 = 5420

How long do you think it takes even a modern computer to do 
5420 tests? It can make a serious difference.

Piecewize evaluation works VERY WELL in stored procedures (if you are on 
v5.0 or higher) because you can parameterize your queries quite easily and 
you are assured of executing the same query pattern every time you need 
it.

## stage 1 - identifying Line items from 2005

CREATE TEMPORARY TABLE tmpLI (
KEY(invID)
) SELECT li.invID, li.quantity
FROM LineItem li 
INNER JOIN Sales sa 
on li.saletranID=sa.saletranID 
and YEAR(sa.solddate)=2005

## stage 2 - identifying Inventory Items for a certain category
CREATE TEMPORARY TABLE tmpInv (
KEY(invID)
) SELECT DISTINCT invID
FROM Inventory inv
on inv.invID = li.invID
INNER JOIN Category cat
on cat.categoryid = inv.categoryid
AND cat.vendcode='AA';

## stage 3 - compute your desired results
SELECT SUM(li.quantity)
FROM tmpLI li
INNER JOIN tmpInf inv
ON inv.invID = li.invID;

## stage 4 - the database is not your momma. Clean up after yourself...

DROP TEMPORARY TABLE tmpLi;
DROP TEMPORARY TABLE tmpInv;

## end query

I hope that helps (HTH),

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine








[EMAIL PROTECTED] wrote on 11/04/2005 12:28:50 AM:

 I cannot figure this one out. I have a Category table with 50,000
 records, an Inventory table with over 2 million records. A Sales table
 with 500,000 records. And a LineItem table with 800,000 records pairing
 the Inventory ID with the Sales Transaction ID and Quantity. I need to
 generate a Quantity sold year to date for a certain vendor. The vendor
 code can be found in the Category table which has a relationship with
 Inventory. I am trying a SQL statement like this:
 
 select sum(li.quantity) as qtysoldytd from LineItem li, Sales sa where
 (li.saletranID=sa.saletranID and YEAR(sa.solddate)=2005) and li.InvID
 IN (select invID from Inventory where categoryid IN (select categoryid
 from Category where vendcode='AA')) 
 
 this yields null when I know there are sales for that vendor in 2005.
 Simplified schemas for the tables are as follows:
 Category:
 

SQL help for qty Sold YTD...

2005-11-03 Thread wodev
I cannot figure this one out. I have a Category table with 50,000
records, an Inventory table with over 2 million records. A Sales table
with 500,000 records. And a LineItem table with 800,000 records pairing
the Inventory ID with the Sales Transaction ID and Quantity. I need to
generate a Quantity sold year to date for a certain vendor. The vendor
code can be found in the Category table which has a relationship with
Inventory. I am trying a SQL statement like this:

select sum(li.quantity) as qtysoldytd from LineItem li, Sales sa where
(li.saletranID=sa.saletranID and YEAR(sa.solddate)=2005) and li.InvID
IN (select invID from Inventory where categoryid IN (select categoryid
from Category where vendcode='AA')) 

this yields null when I know there are sales for that vendor in 2005.
Simplified schemas for the tables are as follows:
Category:
++--+--+-+-++
| Field  | Type | Null | Key | Default | Extra  
   |
++--+--+-+-++
| vendcode   | char(3)  | YES  | MUL | NULL|
   |
| categoryID | int(10) unsigned |  | PRI | NULL|
auto_increment |
++--+--+-+-++

Inventory:
+--+---+--+-+-+---+
| Field| Type  | Null | Key | Default | Extra |
+--+---+--+-+-+---+
| categoryID   | int(11)   | YES  | MUL | NULL|   |
| invID| int(10)   |  | PRI | 0   |   |
| itemnum  | int(11)   | YES  | MUL | NULL|   |
+--+---+--+-+-+---+

Sales:
+--+--+--+-+-++
| Field| Type | Null | Key | Default | Extra
 |
+--+--+--+-+-++
| saletranID   | int(10) unsigned |  | PRI | NULL|
auto_increment |
| solddate | datetime | YES  | | NULL|  
 |
+--+--+--+-+-++

LineItem:
++-+--+-+-+---+
| Field  | Type| Null | Key | Default | Extra |
++-+--+-+-+---+
| invID  | int(10) | YES  | MUL | NULL|   |
| quantity   | int(10) | YES  | | NULL|   |
| saletranID | int(10) | YES  | MUL | NULL|   |
++-+--+-+-+---+

Can anybody shed some light on this and if this is even possible. I have
indexes in place and the query is still slow to pull.
Thanks a million,
Nathan


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]