Re: Limiting results from joins

Michael Stassen Mon, 12 Jun 2006 12:49:06 -0700

Kim Christensen wrote:
> Hey list;
>
> Consider this statement:
>
> SELECT * FROM products p, manufactors m, items i WHERE p.manufactor_id
> = m.manufactor_id && p.product_id = i.product_id;
>
> Now, each unique product_id from "products" can have more than one
> entry in the "items" table, but I only want to fetch the one which
> fullfills a certain criteria. In this case, I want the statement only
> to JOIN the row if the column "item_updated" from the "items" table
> equals "1".


Adding

  WHERE i.item_updated = 1

is the simplest solution.

> Is this the proper way to solve this:
>
> SELECT * FROM products p, manufactors m, items i WHERE p.manufactor_id
> = m.manufactor_id && p.product_id = i.product_id && i.item_id =
> (SELECT item_id FROM items i2 WHERE i2.item_updated = 1);

Definitely not, as you've seen.  Your subquery returns a *lot* of unwanted 
item_ids.

> I find the above solution VERY slow, almost as if I have missed out on
> a very fundamental part of the logic - but it does get the work done.
> How could I speed this up, or solve it in another statement?

[EMAIL PROTECTED] wrote:
> How about:
>
> select * from products p
> inner join manufactors m on p.manufactor_id=m.manufactor_id
> inner join items i on p.product_id=i.product_id and i.item_updated=1

I'd have suggested

  SELECT *
  FROM products p
  JOIN manufactors m ON p.manufactor_id = m.manufactor_id
  JOIN items i ON p.product_id = i.product_id
  WHERE i.item_updated=1

Although they appear different in theory, the optimizer will almost certainlytreat them the same. (Check each with EXPLAIN to see.) In either case, theefficient way to execute the query will depend largely on what indexes are inthe items table. With an index on items.item_updated, the optimizer may chooseto use that index to select the appropriate rows from items, then join to theother two tables. On the other hand, if items has a multi-column index on(product_id, item_updated), the optimizer would be able to join table itemslast, using the index to match the one matching row for each row of tableproducts. Without the multi-column index, it couldn't do that.


Kim Christensen wrote:
> That worked great, really cut the time on my queries! However, how
> would one filter out the INNER JOIN statement if it returns more than
> one row to JOIN? I have reworked my previous table setup and query so
> that I need the row that has the lowest value of "item_updated", not
> particularly 1.

Well, that's different. This is an example of why you shouldn't try to simplifyyour query for the list. When your question doesn't include your real query,you are quite likely to get answers that don't apply to your real query.


> Here's what it looks like right now:
>
> SELECT * FROM products p
> INNER JOIN manufactors m ON m.manufactor_id = p.manufactor_id
> INNER JOIN items i ON i.product_id = p.product_id
>
> The problem is, that each entry in "products" may occur more than once
> in "items", and they are identified by "product_id" in both tables.
> How do I filter out the results from the last INNER JOIN by certain
> criterias? I want the INNER JOIN to only return the row from items
> which has the lowest value in the column "item_updated".

This is a FAQ, with solutions given in the manual,<http://dev.mysql.com/doc/refman/4.1/en/example-maximum-column-group-row.html>.(While the subquery solution looks more elegant, the temporary table solutionis frequently more efficient.)


> As it is right now, MySQL returns a row for each time the product
> occurs in the "items" table, which is not what I want :-)

Peter Lauri wrote:
> Just add:
>
> ORDER BY i.item_updated GROUP BY i.product_id
>
> (assuming that item_updated and product_id are in table i)

That definitely will NOT work. You cannot select columns not included in thegrouped columns and expect to get meaningful results. See the manual for why,<http://dev.mysql.com/doc/refman/4.1/en/group-by-hidden-fields.html>.


Kim Christensen wrote:
> That gives me the result that I'm after, the only problem seems to be
> that the sorting is
> made before the matching - on all the rows (huge execution time) - how
> can I prevent this?

I doubt that really gives the result you want, unless you were very lucky, orstill haven't told us the real query. In any case, this query will not workreliably.


Peter Lauri wrote:
> Take away the GROUP BY thing. And after that you just check if the rows are
> in the order that you want. The upper row would be the one that GROUP BY
> will take. Are you sure that you want the "lowest" value in the
> "item_update"? I would like to have the highest value.
>
> If you want the latest date, then you do ORDER BY i.item_updated DESC GROUP
> BY i.product_id

Again, no. ORDER BY does not affect which rows are "chosen" by GROUP BY, whichis what you are hoping for with this query. In fact, GROUP BY does not evenselect rows. It returns group identifiers and aggregate functions. If you tryto select a column which is not included in the GROUP BY clause, you get arandom chosen value from the rows in each group. You cannot change that withORDER BY.


Michael

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: Limiting results from joins

Reply via email to