James Nobis <[EMAIL PROTECTED]> wrote on 04/21/2005 10:44:07 AM: > The problem is something fairly simple but yet MySQL seems to make this > complicated. Essentially, find a list of customers who have not > bought product > X ever. (Customers have orders, orders have order line items). All > 3 coworkers > independently arrived at the same sql which failed to work. Then, we wrote it > as a subquery which has performance issue and finally rewrote it with a temp > table and a join. However, it seems like what we had should have worked. > > Borrowing from http://builder.com.com/5100-6388_14-5532304.html about midway > down the page I set out to create an identical schema and query in MySQL. > > CREATE TABLE `Customer` ( > `id` int(11) NOT NULL default '0', > `name` varchar(255) NOT NULL default '' > ) ENGINE=MyISAM DEFAULT CHARSET=latin1; > > INSERT INTO `Customer` VALUES (1, 'bob'); > INSERT INTO `Customer` VALUES (2, 'nathan'); > > CREATE TABLE `Order` ( > `id` int(11) NOT NULL auto_increment, > `customer_id` int(11) NOT NULL default '0', > `order_date` datetime NOT NULL default '0000-00-00 00:00:00', > PRIMARY KEY (`id`) > ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=3 ; > > INSERT INTO `Order` VALUES (1, 1, '0000-00-00 00:00:00'); > INSERT INTO `Order` VALUES (2, 2, '0000-00-00 00:00:00'); > > CREATE TABLE `OrderLines` ( > `order_id` int(11) NOT NULL default '0', > `product_id` int(11) NOT NULL default '0', > `quantity` int(11) NOT NULL default '0' > ) ENGINE=MyISAM DEFAULT CHARSET=latin1; > > INSERT INTO `OrderLines` VALUES (1, 5, 1); > INSERT INTO `OrderLines` VALUES (1, 9, 1); > INSERT INTO `OrderLines` VALUES (2, 15, 1); > INSERT INTO `OrderLines` VALUES (2, 25, 1); > > Then, I run the following query: > SELECT DISTINCT Customer.id, Customer.name > FROM Customer > LEFT JOIN `Order` ON Customer.id = Order.customer_id > INNER JOIN OrderLines ON Order.id = OrderLines.order_id > AND OrderLines.product_id =9 > WHERE Order.customer_id IS NULL > > I would expect this to return a single row with Customer.id 2. > > Is there something obvious my coworkers and I are missing? > > James Nobis > Web Developer > Academic Superstore > 223 W. Anderson Ln. Suite A110, Austin, TX 78752 > Voice: (512) 450-1199 x453 Fax: (512) 450-0263 > http://www.academicsuperstore.com > It's hard to remember where I picked this up but I once read that it's generally bad form to start with an outer join (LEFT or RIGHT JOIN) and move into an INNER JOIN like you are doing. Because if the rows from the Order table are optional to the results of the query, the rows from the OrderLines are transitively optional as well (if an Order row doesn't exist then there can't be any OrderLine rows either). So an equivalent form of your query could have been:
SELECT DISTINCT Customer.id, Customer.name FROM Customer LEFT JOIN `Order` ON Customer.id = Order.customer_id LEFT JOIN OrderLines ON Order.id = OrderLines.order_id AND OrderLines.product_id =9 WHERE Order.customer_id IS NULL; But this won't help you to determine if a Customer had NEVER ordered that product because you are including Order rows regardless of whether that order had a product #9 on it or not. I then tried a nested JOIN using parentheses like this and got no names: SELECT DISTINCT Customer.id, Customer.name FROM Customer LEFT JOIN (`Order` INNER JOIN OrderLines ON Order.id = OrderLines.order_id AND OrderLines.product_id =9 ) ON Customer.id = Order.customer_id WHERE Order.customer_id IS NULL; The unfiltered results of that join look like this(sorry if it wraps): SELECT * FROM Customer LEFT JOIN ( `Order` INNER JOIN OrderLines ON Order.id = OrderLines.order_id AND OrderLines.product_id =9 ) ON Customer.id = Order.customer_id; +----+--------+----+-------------+---------------------+----------+------------+----------+ | id | name | id | customer_id | order_date | order_id | product_id | quantity | +----+--------+----+-------------+---------------------+----------+------------+----------+ | 1 | bob | 1 | 1 | 0000-00-00 00:00:00 | 1 | 9 | 1 | | 2 | nathan | 1 | 1 | 0000-00-00 00:00:00 | NULL | NULL | NULL | | 1 | bob | 2 | 2 | 0000-00-00 00:00:00 | NULL | NULL | NULL | | 2 | nathan | 2 | 2 | 0000-00-00 00:00:00 | NULL | NULL | NULL | +----+--------+----+-------------+---------------------+----------+------------+----------+ 4 rows in set (0.00 sec) Each customer has at least one order so the nested JOIN didn't work to find your answer either (BTW- nested joins are not documented as a valid syntax so I wasn't sure if it was going to work or not). However, I thought, why not do exactly what the original question stated: count how many times product 9 appears as a line item on an order and return the names of the customers where that count is 0. SELECT Customer.id , Customer.name , COUNT(orderlines.product_id) as LineItemCount FROM Customer LEFT JOIN `Order` ON Customer.id = Order.customer_id LEFT JOIN OrderLines ON Order.id = OrderLines.order_id AND OrderLines.product_id =9 GROUP BY 1,2; +----+--------+---------------+ | id | name | LineItemCount | +----+--------+---------------+ | 1 | bob | 1 | | 2 | nathan | 0 | +----+--------+---------------+ 2 rows in set (0.01 sec) All we need now is a HAVING clause to pick out those who have never ordered #9: SELECT Customer.id , Customer.name , COUNT(orderlines.product_id) as LineItemCount FROM Customer LEFT JOIN `Order` ON Customer.id = Order.customer_id LEFT JOIN OrderLines ON Order.id = OrderLines.order_id AND OrderLines.product_id =9 GROUP BY 1,2 HAVING LineItemCount=0; +----+--------+---------------+ | id | name | LineItemCount | +----+--------+---------------+ | 2 | nathan | 0 | +----+--------+---------------+ 1 row in set (0.00 sec) Which is the results you wanted, right? Why didn't your original query work? I can't say for sure but I am sure it has something to do with the fact that your INNER join was subordinate to your LEFT join. There are several bugs about similar situations (mixing LEFT, RIGHT, and INNER in the same query, mixing LEFT and RIGHT) and I think the development team are still trying to work out the correct algorithms to use to apply the correct logic algebra to this class of query. Who knows, maybe your query will be the one that helps the light go off in their head so they can get this all straightened out. Until they do, try to keep your INNER joins superior to your OUTER joins and you should stay out of trouble or refactor your query to precompute your subordinate INNER join into a temp table and work with it from there. With that advice in mind, this may be a faster solution CREATE TEMPORARY TABLE tmpOrders(KEY(customer_id)) SELECT DISTINCT o.customer_id FROM Order o INNER JOIN OrderLines ol ON ol.order_id = o.id SELECT DISTINCT Customer.id, Customer.name FROM Customer LEFT JOIN tmpOrders ON Customer.id = tmpOrders.customer_id WHERE tmpOrders.customer_id is null; DROP TABLE tmpOrders; FWIW.... Shawn Green Database Administrator Unimin Corporation - Spruce Pine