James Nobis <[EMAIL PROTECTED]> wrote on 04/21/2005 
10:44:07 AM:

> The problem is something fairly simple but yet MySQL seems to make this
> complicated.  Essentially, find a list of customers who have not 
> bought product
> X ever.  (Customers have orders, orders have order line items).  All
> 3 coworkers
> independently arrived at the same sql which failed to work.  Then, we 
wrote it
> as a subquery which has performance issue and finally rewrote it with a 
temp
> table and a join.  However, it seems like what we had should have 
worked.
> 
> Borrowing from http://builder.com.com/5100-6388_14-5532304.html about 
midway
> down the page I set out to create an identical schema and query in 
MySQL.
> 
> CREATE TABLE `Customer` (
>   `id` int(11) NOT NULL default '0',
>   `name` varchar(255) NOT NULL default ''
> ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
> 
> INSERT INTO `Customer` VALUES (1, 'bob');
> INSERT INTO `Customer` VALUES (2, 'nathan');
> 
> CREATE TABLE `Order` (
>   `id` int(11) NOT NULL auto_increment,
>   `customer_id` int(11) NOT NULL default '0',
>   `order_date` datetime NOT NULL default '0000-00-00 00:00:00',
>   PRIMARY KEY  (`id`)
> ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=3 ;
> 
> INSERT INTO `Order` VALUES (1, 1, '0000-00-00 00:00:00');
> INSERT INTO `Order` VALUES (2, 2, '0000-00-00 00:00:00');
> 
> CREATE TABLE `OrderLines` (
>   `order_id` int(11) NOT NULL default '0',
>   `product_id` int(11) NOT NULL default '0',
>   `quantity` int(11) NOT NULL default '0'
> ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
> 
> INSERT INTO `OrderLines` VALUES (1, 5, 1);
> INSERT INTO `OrderLines` VALUES (1, 9, 1);
> INSERT INTO `OrderLines` VALUES (2, 15, 1);
> INSERT INTO `OrderLines` VALUES (2, 25, 1);
> 
> Then, I run the following query:
> SELECT DISTINCT Customer.id, Customer.name
> FROM Customer
> LEFT JOIN `Order` ON Customer.id = Order.customer_id
> INNER JOIN OrderLines ON Order.id = OrderLines.order_id
> AND OrderLines.product_id =9
> WHERE Order.customer_id IS NULL
> 
> I would expect this to return a single row with Customer.id 2.
> 
> Is there something obvious my coworkers and I are missing?
> 
> James Nobis
> Web Developer
> Academic Superstore
> 223 W. Anderson Ln. Suite A110, Austin, TX 78752
> Voice: (512) 450-1199 x453 Fax: (512) 450-0263
> http://www.academicsuperstore.com
> 
It's hard to remember where I picked this up but I once read that it's 
generally bad form to start with an outer join (LEFT or RIGHT JOIN) and 
move into an INNER JOIN like you are doing. Because if the rows from the 
Order table are optional to the results of the query, the rows from the 
OrderLines are transitively optional as well (if an Order row doesn't 
exist then there can't be any OrderLine rows either). So an equivalent 
form of your query could have been:

SELECT DISTINCT Customer.id, Customer.name
FROM Customer
LEFT JOIN `Order` 
        ON Customer.id = Order.customer_id
LEFT JOIN OrderLines 
        ON Order.id = OrderLines.order_id
        AND OrderLines.product_id =9
WHERE Order.customer_id IS NULL;

But this won't help you to determine if a Customer had NEVER ordered that 
product because you are including Order rows regardless of whether that 
order had a product #9 on it or not. I then tried a nested JOIN using 
parentheses like this and got no names:

SELECT DISTINCT Customer.id, Customer.name
FROM Customer
LEFT JOIN (`Order` 
INNER JOIN OrderLines 
        ON Order.id = OrderLines.order_id
        AND OrderLines.product_id =9
) ON Customer.id = Order.customer_id
WHERE Order.customer_id IS NULL;

The unfiltered results of that join look like this(sorry if it wraps):

SELECT *
FROM Customer
LEFT JOIN (
        `Order` INNER JOIN OrderLines
                ON Order.id = OrderLines.order_id
                AND OrderLines.product_id =9
) ON Customer.id = Order.customer_id;
+----+--------+----+-------------+---------------------+----------+------------+----------+
| id | name   | id | customer_id | order_date          | order_id | 
product_id | quantity |
+----+--------+----+-------------+---------------------+----------+------------+----------+
|  1 | bob    |  1 |           1 | 0000-00-00 00:00:00 |        1 |   9 |  
     1 |
|  2 | nathan |  1 |           1 | 0000-00-00 00:00:00 |     NULL | NULL | 
    NULL |
|  1 | bob    |  2 |           2 | 0000-00-00 00:00:00 |     NULL | NULL | 
    NULL |
|  2 | nathan |  2 |           2 | 0000-00-00 00:00:00 |     NULL | NULL | 
    NULL |
+----+--------+----+-------------+---------------------+----------+------------+----------+
4 rows in set (0.00 sec)

Each customer has at least one order so the nested JOIN didn't work to 
find your answer either (BTW- nested joins are not documented as a valid 
syntax so I wasn't sure if it was going to work or not). 

However, I thought, why not do exactly what the original question stated: 
count how many times product 9 appears as a line item on an order and 
return the names of the customers where that count is 0.

SELECT Customer.id
        , Customer.name
        , COUNT(orderlines.product_id) as LineItemCount
FROM Customer
LEFT JOIN `Order` 
        ON Customer.id = Order.customer_id
LEFT JOIN OrderLines 
        ON Order.id = OrderLines.order_id
        AND OrderLines.product_id =9
GROUP BY 1,2;
+----+--------+---------------+
| id | name   | LineItemCount |
+----+--------+---------------+
|  1 | bob    |             1 |
|  2 | nathan |             0 |
+----+--------+---------------+
2 rows in set (0.01 sec)

All we need now is a HAVING clause to pick out those who have never 
ordered #9:

SELECT Customer.id
        , Customer.name
        , COUNT(orderlines.product_id) as LineItemCount
FROM Customer
LEFT JOIN `Order` 
        ON Customer.id = Order.customer_id
LEFT JOIN OrderLines 
        ON Order.id = OrderLines.order_id
        AND OrderLines.product_id =9
GROUP BY 1,2
HAVING LineItemCount=0;
+----+--------+---------------+
| id | name   | LineItemCount |
+----+--------+---------------+
|  2 | nathan |             0 |
+----+--------+---------------+
1 row in set (0.00 sec)

Which is the results you wanted, right?  Why didn't your original query 
work? I can't say for sure but I am sure it has something to do with the 
fact that your INNER join was subordinate to your LEFT join. There are 
several bugs about similar situations (mixing LEFT, RIGHT, and INNER in 
the same query, mixing LEFT and RIGHT) and I think the development team 
are still trying to work out the correct algorithms to use to apply the 
correct logic algebra to this class of query. Who knows, maybe your query 
will be the one that helps the light go off in their head so they can get 
this all straightened out.  Until they do, try to keep your INNER joins 
superior to your OUTER joins and you should stay out of trouble or 
refactor your query to precompute your subordinate INNER join into a temp 
table and work with it from there.

With that advice in mind, this may be a faster solution

CREATE TEMPORARY TABLE tmpOrders(KEY(customer_id))
SELECT DISTINCT o.customer_id
FROM Order o
INNER JOIN OrderLines ol
        ON ol.order_id = o.id

SELECT DISTINCT Customer.id, Customer.name
FROM Customer
LEFT JOIN tmpOrders
        ON Customer.id = tmpOrders.customer_id
WHERE tmpOrders.customer_id is null;

DROP TABLE tmpOrders;

FWIW....
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

Reply via email to