Re: SQL help for qty Sold YTD...
I spent several minutes looking at your question and your data model and nothing jumped out at me that precluded you from determining the quantity of the this vendor's items sold via this data model. I might have missed something though. I'm a little concerned that your LineItem table appeared to have no primary key. In my opinion, the primary key of a line item table should be an order number (saleTranID?) and then a sequence number (1 for the first item on the order, 2 for the second, etc.) but you (apparently) have no primary key defined at all and don't have a sequence number either. However, that shouldn't keep this particular query from running or returning appropriate rows. I am also assuming that invID is an inventory ID - my brain kept reading it as invoice ID but I learned to ignore it ;-) - where an inventory ID uniquely identifies one product that you sell, e.g. invID 1 might be power supplies for Sony camcorders while invID 2 might be Palm Tungsten E PDAs. This is something I would normally call a product ID if I were doing the naming :-) If, in fact, invID *is* an invoice ID, i.e. something that uniquely identifies a particular sales transaction then there is something wrong which might explain why you're not getting any data. So, assuming I haven't misunderstood anything or simply missed something, I would be inclined to break the query down into chunks. Execute each chunk on its own andmake sure that each chunk delivers what you think it should. If it doesn't, either the query is wrong or the data isn't what you think it is. Verify that the data you expect is there by doing SELECTs against the relevant tables; if the data is there, it's got to be your query that is wrong. Inspect each chunk until you find the culprit(s) in either the SQL or the data. Also, for what it's worth, I would strongly suggest that you set up a test environment with a SMALL quantity of data in each table - 50 rows or less should be plenty for most situations - and try your queries against that test environment. That makes the testing process a lot less painful - why wait for many seconds or even minutes for the query to give you the wrong answer? - and let's you solve the problem faster. It might sound like a lot of work but it shouldn't be; just clone the real tables and then copy a small but representative sample of data from the real tables into the clones. You also asked about performance but there is no way anyone can comment on that without knowing a lot more about what indexes you have and, perhaps, which engine you are using. But, in my opinion, your first effort should be directed toward getting the query running correctly, THEN worry about making it go faster. Rhino - Original Message - From: [EMAIL PROTECTED] To: mysql@lists.mysql.com Sent: Friday, November 04, 2005 12:28 AM Subject: SQL help for qty Sold YTD... I cannot figure this one out. I have a Category table with 50,000 records, an Inventory table with over 2 million records. A Sales table with 500,000 records. And a LineItem table with 800,000 records pairing the Inventory ID with the Sales Transaction ID and Quantity. I need to generate a Quantity sold year to date for a certain vendor. The vendor code can be found in the Category table which has a relationship with Inventory. I am trying a SQL statement like this: select sum(li.quantity) as qtysoldytd from LineItem li, Sales sa where (li.saletranID=sa.saletranID and YEAR(sa.solddate)=2005) and li.InvID IN (select invID from Inventory where categoryid IN (select categoryid from Category where vendcode='AA')) this yields null when I know there are sales for that vendor in 2005. Simplified schemas for the tables are as follows: Category: ++--+--+-+-+ + | Field | Type | Null | Key | Default | Extra | ++--+--+-+-+ + | vendcode | char(3) | YES | MUL | NULL| | | categoryID | int(10) unsigned | | PRI | NULL| auto_increment | ++--+--+-+-+ + Inventory: +--+---+--+-+-+---+ | Field| Type | Null | Key | Default | Extra | +--+---+--+-+-+---+ | categoryID | int(11) | YES | MUL | NULL| | | invID| int(10) | | PRI | 0 | | | itemnum | int(11) | YES | MUL | NULL| | +--+---+--+-+-+---+ Sales: +--+--+--+-+-++ | Field| Type | Null | Key | Default | Extra | +--+--+--+-+-++ | saletranID | int(10) unsigned | | PRI | NULL| auto_increment
Re: SQL help for qty Sold YTD...
I would first try refactoring your SQL to use INNER JOIN statements instead of the comma separated lists you are currently using. I would also not use any subqueries. Test this and see if it works for you: SELECT SUM(li.quantity) as qtysoldytd FROM LineItem li INNER JOIN Sales sa on li.saletranID=sa.saletranID and YEAR(sa.solddate)=2005 INNER JOIN Inventory inv on inv.invID = li.invID INNER JOIN Category cat on cat.categoryid = inv.categoryid AND cat.vendcode='AA'; The linkages work like this: 1) LineItem links into Sales through saletranID and YEAR(solddate) 2) Sales links into Inventory through the invID 3) Inventory links to Category through categoryid and vendcode Because I used INNER JOINs, each link in the chain must exist across all tables or the row cannot be added to the final results. Because this query contains several joins and your table sizes are not insignificant it becomes a candidate for what I call piecewize evaluation. Piecewize evaluation is where you take the full query and build your desired results in stages. One stage that jumps out at me is the conversion of vendcode to a list of invID values. Another stage could be isolating just those line items for 2005. I suggest this because JOINing two tables (either by explicit declaration as I do or by comma-separated lists as you did) is a geometrically expensive operation (it's cost to compute grows by multiplying how many rows are participating from each table). If we start with two tables M and N and they each have m and n rows in them, a JOIN operation takes on the order of m*n cycles to compute. If we can somehow shrink each table participating in the JOIN (by pre-selecting certain rows) so that we now have m/4 and n/2 rows to JOIN that reduces your overall cost to (m * n)/8. When we are discussing products of m*n on the order of 100 million rows or so, reducing production time by a factor of 8 is noticable. The situation is even more apparent if you add more tables. Consider if you had tables A, B, and C and they had a,b, and c rows in them. If you had to JOIN those three tables to build a query it would take a*b*c units of time to complete. If we were only able to reduce each table by 10%, that reduces the overall computation to (.9*a)*(.9*b)*(.9*c) = .729(abc) If: a = 50,000 b = 500,000 c = 800,000 records The original execution cost is proportional to: (5 * 50 * 80) = 2 (2.0e16) after 10% reductions through precomputations: 2.0e16 * .729 = 1.458e16 --- # of rows combinations NOT fed through the CPU to be evaluated as being in the result or not: 2.0e16 - 1.458e16 = 5.42e+15 = 5420 How long do you think it takes even a modern computer to do 5420 tests? It can make a serious difference. Piecewize evaluation works VERY WELL in stored procedures (if you are on v5.0 or higher) because you can parameterize your queries quite easily and you are assured of executing the same query pattern every time you need it. ## stage 1 - identifying Line items from 2005 CREATE TEMPORARY TABLE tmpLI ( KEY(invID) ) SELECT li.invID, li.quantity FROM LineItem li INNER JOIN Sales sa on li.saletranID=sa.saletranID and YEAR(sa.solddate)=2005 ## stage 2 - identifying Inventory Items for a certain category CREATE TEMPORARY TABLE tmpInv ( KEY(invID) ) SELECT DISTINCT invID FROM Inventory inv on inv.invID = li.invID INNER JOIN Category cat on cat.categoryid = inv.categoryid AND cat.vendcode='AA'; ## stage 3 - compute your desired results SELECT SUM(li.quantity) FROM tmpLI li INNER JOIN tmpInf inv ON inv.invID = li.invID; ## stage 4 - the database is not your momma. Clean up after yourself... DROP TEMPORARY TABLE tmpLi; DROP TEMPORARY TABLE tmpInv; ## end query I hope that helps (HTH), Shawn Green Database Administrator Unimin Corporation - Spruce Pine [EMAIL PROTECTED] wrote on 11/04/2005 12:28:50 AM: I cannot figure this one out. I have a Category table with 50,000 records, an Inventory table with over 2 million records. A Sales table with 500,000 records. And a LineItem table with 800,000 records pairing the Inventory ID with the Sales Transaction ID and Quantity. I need to generate a Quantity sold year to date for a certain vendor. The vendor code can be found in the Category table which has a relationship with Inventory. I am trying a SQL statement like this: select sum(li.quantity) as qtysoldytd from LineItem li, Sales sa where (li.saletranID=sa.saletranID and YEAR(sa.solddate)=2005) and li.InvID IN (select invID from Inventory where categoryid IN (select categoryid from Category where vendcode='AA')) this yields null when I know there are sales for that vendor in 2005. Simplified schemas for the tables are as follows: Category:
SQL help for qty Sold YTD...
I cannot figure this one out. I have a Category table with 50,000 records, an Inventory table with over 2 million records. A Sales table with 500,000 records. And a LineItem table with 800,000 records pairing the Inventory ID with the Sales Transaction ID and Quantity. I need to generate a Quantity sold year to date for a certain vendor. The vendor code can be found in the Category table which has a relationship with Inventory. I am trying a SQL statement like this: select sum(li.quantity) as qtysoldytd from LineItem li, Sales sa where (li.saletranID=sa.saletranID and YEAR(sa.solddate)=2005) and li.InvID IN (select invID from Inventory where categoryid IN (select categoryid from Category where vendcode='AA')) this yields null when I know there are sales for that vendor in 2005. Simplified schemas for the tables are as follows: Category: ++--+--+-+-++ | Field | Type | Null | Key | Default | Extra | ++--+--+-+-++ | vendcode | char(3) | YES | MUL | NULL| | | categoryID | int(10) unsigned | | PRI | NULL| auto_increment | ++--+--+-+-++ Inventory: +--+---+--+-+-+---+ | Field| Type | Null | Key | Default | Extra | +--+---+--+-+-+---+ | categoryID | int(11) | YES | MUL | NULL| | | invID| int(10) | | PRI | 0 | | | itemnum | int(11) | YES | MUL | NULL| | +--+---+--+-+-+---+ Sales: +--+--+--+-+-++ | Field| Type | Null | Key | Default | Extra | +--+--+--+-+-++ | saletranID | int(10) unsigned | | PRI | NULL| auto_increment | | solddate | datetime | YES | | NULL| | +--+--+--+-+-++ LineItem: ++-+--+-+-+---+ | Field | Type| Null | Key | Default | Extra | ++-+--+-+-+---+ | invID | int(10) | YES | MUL | NULL| | | quantity | int(10) | YES | | NULL| | | saletranID | int(10) | YES | MUL | NULL| | ++-+--+-+-+---+ Can anybody shed some light on this and if this is even possible. I have indexes in place and the query is still slow to pull. Thanks a million, Nathan -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]