Thanks everyone for such quick and thorough responses!

Quoting [EMAIL PROTECTED]:

James Nobis <[EMAIL PROTECTED]> wrote on 04/21/2005 10:44:07 AM:

The problem is fairly simple, yet MySQL seems to make it complicated.
Essentially: find a list of customers who have never bought product X.
(Customers have orders, orders have order line items.)  All 3 coworkers
independently arrived at the same SQL, which failed to work.  Then we
wrote it as a subquery, which has performance issues, and finally rewrote
it with a temp table and a join.  However, it seems like what we had
should have worked.

Borrowing from http://builder.com.com/5100-6388_14-5532304.html (about
midway down the page), I set out to create an identical schema and query
in MySQL.

CREATE TABLE `Customer` (
  `id` int(11) NOT NULL default '0',
  `name` varchar(255) NOT NULL default ''
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO `Customer` VALUES (1, 'bob');
INSERT INTO `Customer` VALUES (2, 'nathan');

CREATE TABLE `Order` (
  `id` int(11) NOT NULL auto_increment,
  `customer_id` int(11) NOT NULL default '0',
  `order_date` datetime NOT NULL default '0000-00-00 00:00:00',
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=3 ;

INSERT INTO `Order` VALUES (1, 1, '0000-00-00 00:00:00');
INSERT INTO `Order` VALUES (2, 2, '0000-00-00 00:00:00');

CREATE TABLE `OrderLines` (
  `order_id` int(11) NOT NULL default '0',
  `product_id` int(11) NOT NULL default '0',
  `quantity` int(11) NOT NULL default '0'
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO `OrderLines` VALUES (1, 5, 1);
INSERT INTO `OrderLines` VALUES (1, 9, 1);
INSERT INTO `OrderLines` VALUES (2, 15, 1);
INSERT INTO `OrderLines` VALUES (2, 25, 1);

Then, I run the following query:
SELECT DISTINCT Customer.id, Customer.name
FROM Customer
LEFT JOIN `Order` ON Customer.id = Order.customer_id
INNER JOIN OrderLines ON Order.id = OrderLines.order_id
AND OrderLines.product_id =9
WHERE Order.customer_id IS NULL

I would expect this to return a single row with Customer.id 2.

Is there something obvious my coworkers and I are missing?

James Nobis
Web Developer
Academic Superstore
223 W. Anderson Ln. Suite A110, Austin, TX 78752
Voice: (512) 450-1199 x453 Fax: (512) 450-0263
http://www.academicsuperstore.com

It's hard to remember where I picked this up, but I once read that it's
generally bad form to start with an outer join (LEFT or RIGHT JOIN) and
move into an INNER JOIN like you are doing. If the rows from the Order
table are optional to the results of the query, the rows from OrderLines
are transitively optional as well (if an Order row doesn't exist, there
can't be any OrderLines rows either). So a form closer to what you
intended could have been:

SELECT DISTINCT Customer.id, Customer.name
FROM Customer
LEFT JOIN `Order`
       ON Customer.id = Order.customer_id
LEFT JOIN OrderLines
       ON Order.id = OrderLines.order_id
       AND OrderLines.product_id =9
WHERE Order.customer_id IS NULL;

But this won't help you to determine if a Customer had NEVER ordered that
product because you are including Order rows regardless of whether that
order had a product #9 on it or not. I then tried a nested JOIN using
parentheses like this and got no names:

SELECT DISTINCT Customer.id, Customer.name
FROM Customer
LEFT JOIN (`Order`
INNER JOIN OrderLines
       ON Order.id = OrderLines.order_id
       AND OrderLines.product_id =9
) ON Customer.id = Order.customer_id
WHERE Order.customer_id IS NULL;

The unfiltered results of that join look like this:

SELECT *
FROM Customer
LEFT JOIN (
       `Order` INNER JOIN OrderLines
               ON Order.id = OrderLines.order_id
               AND OrderLines.product_id =9
) ON Customer.id = Order.customer_id;
+----+--------+----+-------------+---------------------+----------+------------+----------+
| id | name   | id | customer_id | order_date          | order_id | product_id | quantity |
+----+--------+----+-------------+---------------------+----------+------------+----------+
|  1 | bob    |  1 |           1 | 0000-00-00 00:00:00 |        1 |          9 |        1 |
|  2 | nathan |  1 |           1 | 0000-00-00 00:00:00 |     NULL |       NULL |     NULL |
|  1 | bob    |  2 |           2 | 0000-00-00 00:00:00 |     NULL |       NULL |     NULL |
|  2 | nathan |  2 |           2 | 0000-00-00 00:00:00 |     NULL |       NULL |     NULL |
+----+--------+----+-------------+---------------------+----------+------------+----------+
4 rows in set (0.00 sec)

Each customer has at least one order, so the nested JOIN didn't work to
find your answer either (BTW, nested joins are not documented as valid
syntax, so I wasn't sure whether it was going to work or not).

However, I thought, why not do exactly what the original question stated:
count how many times product 9 appears as a line item on an order and
return the names of the customers where that count is 0.

SELECT Customer.id
       , Customer.name
       , COUNT(OrderLines.product_id) as LineItemCount
FROM Customer
LEFT JOIN `Order`
       ON Customer.id = Order.customer_id
LEFT JOIN OrderLines
       ON Order.id = OrderLines.order_id
       AND OrderLines.product_id =9
GROUP BY 1,2;
+----+--------+---------------+
| id | name   | LineItemCount |
+----+--------+---------------+
|  1 | bob    |             1 |
|  2 | nathan |             0 |
+----+--------+---------------+
2 rows in set (0.01 sec)

All we need now is a HAVING clause to pick out those who have never
ordered #9:

SELECT Customer.id
       , Customer.name
       , COUNT(OrderLines.product_id) as LineItemCount
FROM Customer
LEFT JOIN `Order`
       ON Customer.id = Order.customer_id
LEFT JOIN OrderLines
       ON Order.id = OrderLines.order_id
       AND OrderLines.product_id =9
GROUP BY 1,2
HAVING LineItemCount=0;
+----+--------+---------------+
| id | name   | LineItemCount |
+----+--------+---------------+
|  2 | nathan |             0 |
+----+--------+---------------+
1 row in set (0.00 sec)

Which are the results you wanted, right?  Why didn't your original query
work? I can't say for sure, but I suspect it has something to do with the
fact that your INNER join was subordinate to your LEFT join. There are
several bugs about similar situations (mixing LEFT, RIGHT, and INNER in
the same query, mixing LEFT and RIGHT) and I think the development team
is still trying to work out the correct algorithms to apply the correct
join logic to this class of query. Who knows, maybe your query will be
the one that helps the light go on in their heads so they can get this
all straightened out.  Until they do, try to keep your INNER joins
superior to your OUTER joins and you should stay out of trouble, or
refactor your query to precompute your subordinate INNER join into a temp
table and work with it from there.
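
For what it's worth, the empty result can also be read without assuming
a bug. Under standard SQL, joins without parentheses associate left to
right, so the INNER JOIN is applied to the output of the LEFT JOIN and
throws away exactly the NULL-extended rows that the WHERE clause is
looking for. A rough trace of that reading against the sample data above
(the row comments are worked out by hand from the INSERT statements, not
captured from a server, and the order_id alias is only for illustration):

```sql
-- Step 1: the LEFT JOIN alone keeps every customer.
SELECT Customer.id, Customer.name, `Order`.id AS order_id
FROM Customer
LEFT JOIN `Order` ON Customer.id = `Order`.customer_id;
-- bob    -> order 1
-- nathan -> order 2

-- Step 2: the INNER JOIN is applied to that result and keeps only rows
-- that have a matching OrderLines row for product 9.
SELECT Customer.id, Customer.name, `Order`.customer_id
FROM Customer
LEFT JOIN `Order` ON Customer.id = `Order`.customer_id
INNER JOIN OrderLines ON `Order`.id = OrderLines.order_id
       AND OrderLines.product_id = 9;
-- bob    -> order 1, product 9
-- (nathan's row is dropped: order 2 has no line for product 9)

-- Step 3: adding WHERE Order.customer_id IS NULL then matches nothing,
-- because every row that survived step 2 came from a real Order row.
```

On this reading, the WHERE test could only ever match a customer with no
orders at all, and the INNER JOIN removes those rows before the WHERE
clause sees them.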

With that advice in mind, this may be a faster solution:

CREATE TEMPORARY TABLE tmpOrders (KEY(customer_id))
SELECT DISTINCT o.customer_id
FROM `Order` o
INNER JOIN OrderLines ol
       ON ol.order_id = o.id
       AND ol.product_id = 9;

SELECT DISTINCT Customer.id, Customer.name
FROM Customer
LEFT JOIN tmpOrders
       ON Customer.id = tmpOrders.customer_id
WHERE tmpOrders.customer_id is null;

DROP TABLE tmpOrders;
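
If the server supports subqueries (MySQL 4.1 or later), the same
"customers with no order containing product 9" test can probably be
written as a single statement with NOT EXISTS. This is only a sketch
against the sample schema above; the aliases c, o and ol are mine, and it
makes no claim about speed. The original post mentions that a subquery
version performed poorly, so it is worth comparing with EXPLAIN on real
data before choosing between this and the temp table.

```sql
-- Keep a customer only if no order of theirs has a line for product 9.
SELECT c.id, c.name
FROM Customer c
WHERE NOT EXISTS (
    SELECT 1
    FROM `Order` o
    INNER JOIN OrderLines ol
           ON ol.order_id = o.id
           AND ol.product_id = 9
    WHERE o.customer_id = c.id
);
```

Customers with no orders at all also pass the NOT EXISTS test, which
matches the "never bought product X" wording of the question.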

FWIW....
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine




