Re: sql join statement that I do not understand

SGreen Mon, 28 Jun 2004 10:22:24 -0700

Joseph,

You actually had 3 joined statements. Here are your queries isolated from
the rest of your posting (and slightly reformatted):


1) an INNER JOIN (aka an 'equijoin') using the "comma" format

select cs_fld_cs_tbl_l.cs_type
      ,field_name
      ,name
      ,type
      ,type_sql
      ,rl_table
      ,cs_tbl
from cs_fld,  cs_fld_cs_tbl_l
where
      cs_fld_cs_tbl_l.cs_fld_id = cs_fld.cs_rid
      and  cs_fld_cs_tbl_l.cs_tbl_id = '23'
      AND cs_fld_cs_tbl_l.cs_type = 'basic'

2) LEFT JOIN #1

select cs_fld_cs_tbl_l.cs_type
      ,field_name
      ,name
      ,type
      ,type_sql
      ,rl_table
      ,cs_tbl
from cs_fld
left join cs_fld_cs_tbl_l
      ON cs_fld_cs_tbl_l.cs_fld_id = cs_fld.cs_rid
WHERE cs_fld_cs_tbl_l.cs_tbl_id = '23'
      AND cs_fld_cs_tbl_l.cs_type = 'basic'

3) LEFT JOIN #2

select cs_fld_cs_tbl_l.cs_type
      ,field_name
      ,name
      ,type
      ,type_sql
      ,rl_table
      ,cs_tbl
from cs_fld_cs_tbl_l
left join cs_fld
      ON cs_fld_cs_tbl_l.cs_fld_id = cs_fld.cs_rid
WHERE cs_fld_cs_tbl_l.cs_tbl_id = '23'
      AND cs_fld_cs_tbl_l.cs_type = 'basic';

With all JOINS the query engine must construct an internal table that
contains every possible combination of the rows from one table with the
rows from the other table. Fortunately we are allowed to specify (with the
ON clause) exactly which rows out of all those combinations we really want
to deal with. If you had 1000 rows in tablea and 500 rows in tableb and
said :

FROM tablea INNER JOIN tableb

and you didn't specify an ON clause, you would be forcing the engine to
compile (500 x 1000 = 500000) rows of data into a temporary table. It's
that temporary table that the WHERE clause must operate against in order to
complete your query. If you specify an ON clause that restricted your
temporary table to just a few dozen rows, the WHERE clause would execute
much faster.

In query 1 you are only going to get a record from "cs_fld" if a record
exists in "cs_fld_cs_tbl_1" so that "cs_fld_cs_tbl_l.cs_fld_id =
cs_fld.cs_rid" is true.  Your temporary table is composed of just the
matching rows of the two tables which is probably only a few rows. You
WHERE restrictions take practically no time at all to apply to such a small
set.

In query 2 and 3 you are doing LEFT JOINS which means that the results will
consist of __all__ rows from the table "in the direction" of the JOIN and
only those rows from the table "in the opposite direction" of the JOIN that
satisfy your ON clause.

In Query 2 you build an internal dataset that consists of ALL of the rows
of cs_fld (the table on the LEFT of the statement). The WHERE clause must
apply its restrictions to all of those rows in order to complete your
query.

In Query 3, you reversed the tables. Now cs_fld_cs_tbl_l is the "driving"
table in the query. I would have to guess that it is much smaller than
cs_fld so the temporary table is much smaller so your WHERE clause takes
less time to apply. Or, there was an index that the engine could apply to
cs_fld_cs_tbl_l that it wasn't able to use in Query 2.

You can play with these kinds of queries (queries with JOINed tables) by
moving terms from the WHERE clause into the ON clause in attempts to
minimize the size of the temporary table that is the result of your JOINS.
In fact, this is one way to rewrite query 3:

select cs_fld_cs_tbl_l.cs_type
      ,field_name
      ,name
      ,type
      ,type_sql
      ,rl_table
      ,cs_tbl
from cs_fld_cs_tbl_l
left join cs_fld
      ON cs_fld_cs_tbl_l.cs_fld_id = cs_fld.cs_rid
      AND cs_fld_cs_tbl_l.cs_tbl_id = '23'
      AND cs_fld_cs_tbl_l.cs_type = 'basic';


I eliminated the WHERE clause entirely yet I still have a valid query. This
technique becomes more useful if you have multiple JOINS in a single query.
Moving a restriction into the ON clause does not guarantee better
performance, you must test, test, and re-test in order to determine the
best response for your particular data and index structures.

Yours,
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine



                                                                                       
                                
                      "Joseph Norris"                                                  
                                
                      <[EMAIL PROTECTED]        To:       <[EMAIL PROTECTED]>          
                             
                      >                        cc:                                     
                                
                                               Fax to:                                 
                                
                      06/28/2004 12:32         Subject:  sql join statement that I do 
not understand                   
                      PM                                                               
                                
                                                                                       
                                
                                                                                       
                                






Group,

First of all thanks to anyone who can respond to this - I am really
stumped.


I have been trying to figure this one out and maybe someone out there with
a
little deep understanding of joins in sql can give me a hand.

I am working on a system that creates these sql statements on the fly and
so
the table names and fields are really perl variables.

This the sql without the join:


select
cs_fld_cs_tbl_l.cs_type,field_name,name,type,type_sql,rl_table,cs_tbl from
cs_fld,  cs_fld_cs_tbl_l
 where cs_fld_cs_tbl_l.cs_fld_id = cs_fld.cs_rid
 and  cs_fld_cs_tbl_l.cs_tbl_id = '23'  AND cs_fld_cs_tbl_l.cs_type =
'basic'



+---------+------------+------------------+------------+-------------+------

----+--------+
| cs_type | field_name | name             | type       | type_sql    |
rl_table | cs_tbl |
+---------+------------+------------------+------------+-------------+------

----+--------+
| basic   | status     | Status           | recordid   | int         |
status   | [23]   |
| basic   | body       | Main Body        | textarea   | text        |
| [23]   |
| basic   | section    | Section          | recordid   | int         |
demsect  | [23]   |
| basic   | title      | Title            | text       | varchar(50) |
| [23]   |
| basic   | assignu    | Assign to User:  | recordid   | int         | u
| [23]   |
| basic   | subsect    | Subsection       | recordlist | list        |
subsect  | [23]   |
| basic   | assigngr   | Assign to Group: | recordid   | int         |
cs_com   | [23]   |
| basic   | sorder     | Relative Order   | numeric    | int         |
| [23]   |
| basic   | con_type   | Content Type:    | recordid   | int         |
con_type | [23]   |
+---------+------------+------------------+------------+-------------+------

----+--------+
9 rows in set (0.01 sec)





This is the join:

select
cs_fld_cs_tbl_l.cs_type,field_name,name,type,type_sql,rl_table,cs_tbl from
cs_fld left join cs_fld_cs_tbl_l
 ON cs_fld_cs_tbl_l.cs_fld_id = cs_fld.cs_rid
 WHERE cs_fld_cs_tbl_l.cs_tbl_id = '23'  AND cs_fld_cs_tbl_l.cs_type =
'basic'


+---------+------------+------------------+------------+-------------+------

----+--------+
| cs_type | field_name | name             | type       | type_sql    |
rl_table | cs_tbl |
+---------+------------+------------------+------------+-------------+------

----+--------+
| basic   | status     | Status           | recordid   | int         |
status   | [23]   |
| basic   | body       | Main Body        | textarea   | text        |
| [23]   |
| basic   | title      | Title            | text       | varchar(50) |
| [23]   |
| basic   | subsect    | Subsection       | recordlist | list        |
subsect  | [23]   |
| basic   | assigngr   | Assign to Group: | recordid   | int         |
cs_com   | [23]   |
| basic   | section    | Section          | recordid   | int         |
demsect  | [23]   |
| basic   | sorder     | Relative Order   | numeric    | int         |
| [23]   |
| basic   | assignu    | Assign to User:  | recordid   | int         | u
| [23]   |
| basic   | con_type   | Content Type:    | recordid   | int         |
con_type | [23]   |
+---------+------------+------------------+------------+-------------+------

----+--------+
9 rows in set (4.44 sec)



Notice the time difference?  I thought that the join statement was supposed
to be more efficient.


I did some dinking with the joins and I go this:

It appears that optimization of joins depends upon what table you are
joining to what:


This is the new sql:

mysql> select
    ->
cs_fld_cs_tbl_l.cs_type,field_name,name,type,type_sql,rl_table,cs_tbl
from
    -> cs_fld_cs_tbl_l left join cs_fld
    ->  ON cs_fld_cs_tbl_l.cs_fld_id = cs_fld.cs_rid
    ->  WHERE cs_fld_cs_tbl_l.cs_tbl_id = '23'  AND cs_fld_cs_tbl_l.cs_type
= 'basic';
+---------+------------+------------------+------------+-------------+------

----+--------+
| cs_type | field_name | name             | type       | type_sql    |
rl_table | cs_tbl |
+---------+------------+------------------+------------+-------------+------

----+--------+
| basic   | status     | Status           | recordid   | int         |
status   | [23]   |
| basic   | body       | Main Body        | textarea   | text        |
| [23]   |
| basic   | section    | Section          | recordid   | int         |
demsect  | [23]   |
| basic   | title      | Title            | text       | varchar(50) |
| [23]   |
| basic   | assignu    | Assign to User:  | recordid   | int         | u
| [23]   |
| basic   | subsect    | Subsection       | recordlist | list        |
subsect  | [23]   |
| basic   | assigngr   | Assign to Group: | recordid   | int         |
cs_com   | [23]   |
| basic   | sorder     | Relative Order   | numeric    | int         |
| [23]   |
| basic   | con_type   | Content Type:    | recordid   | int         |
con_type | [23]   |
+---------+------------+------------------+------------+-------------+------

----+--------+
9 rows in set (0.01 sec)
All I did was transpose the cs_fld table with the cs_fld_cs_tbl_l and bam!
I
get the more efficient time.

My question is:   How do I determine which table should be on which side of
the join statement?

Thanks to all who respond - I really am stumped on this one.


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]







-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: sql join statement that I do not understand

Reply via email to