Re: MySQL LEFT JOIN Optimization Using LIMIT CLAUSE

2006-02-03 Thread SGreen
Scott Klarenbach [EMAIL PROTECTED] wrote on 02/02/2006 02:01:11 PM:

 I have a table `requirement` which is left joining to a table `inventory`
 based on a matching `partNumber` column.  The inventory table has millions
 of records; the requirement table has tens of thousands.  I'm noticing that
 the left join between requirement and inventory doesn't take advantage of a
 LIMIT clause.  So whether I select all records from requirement or limit it
 to 50, the LEFT JOIN operation still seems to be calculating for ALL
 requirement records against ALL inventory records.  (The query takes the
 exact same amount of time whether I pull 50 requirement records or 10,000.)

 How can I force MySQL to join the inventory table for only those 50
 records brought back by the LIMIT clause?

 What I would do in a more powerful DB like SQL Server is build a temporary
 table with my 50 requirement rows, and then perform the inventory join on
 the temp table.  But due to MySQL SPROC limitations (i.e., LIMIT clauses
 must have integer constants, not parameters) and view limitations (i.e., no
 indexing of views), I'd have to build this temporary table and the rest of
 the query in PHP first, which is really ugly.

 I'm hoping there is a nice SQL trick I can use with MySQL to restrict the
 join to only those records that would come back from the limit set.

 Thanks,
 Scott Klarenbach

Yes, and no.  You cannot apply a LIMIT specifically to a JOIN clause 
unless you break your query into separate pieces and put limits on each of 
them.  What happens during the normal execution of a query is that after 
parsing and planning the engine begins collecting and combining the source 
data. Which records are combined and matched against which others is 
defined in the FROM clause and all of the JOIN clauses. 

The equivalent of a large virtual table (similar to saying SELECT * FROM 
all involved tables) is created in memory. The only restrictions on 
which rows of data make it into this first processing stage come from the 
ON clauses (and any WHERE clauses the optimizer _may_ choose to include) 
defined between the JOINed tables. Next comes WHERE clause processing, 
then GROUP BY processing, HAVING processing, ORDER BY processing, and 
finally LIMIT processing. 
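
To make that order concrete, here is a sketch (the tables and columns are 
purely hypothetical; the numbers mark the logical evaluation order, not 
necessarily the optimizer's physical plan):

SELECT c.name, COUNT(*) AS cnt               -- (5) project the final columns
FROM orders o
JOIN customers c ON c.id = o.customer_id     -- (1) combine and match source rows
WHERE o.created > '2006-01-01'               -- (2) filter the combined rows
GROUP BY c.name                              -- (3) aggregate
HAVING COUNT(*) > 10                         -- (4) filter the groups
ORDER BY cnt DESC                            -- (6) sort
LIMIT 50;                                    -- (7) trim what is finally returned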

As you can see from the flow of query execution, LIMIT clauses are really 
only useful for restricting how much data is finally sent to the user. In 
order to minimize how much processing your CPU has to do to compute a 
particular query, you have several tools at your disposal: indexes, 
temporary tables, and stepwise result construction.

JOINing tables is a geometrically expensive action: the number of 
potential row matches grows as the product of the number of rows in 
each table involved in the join. If you can preselect certain target rows 
from your really large tables into smaller temporary tables and build your 
final result set from them, the query processor will only need to compute 
a small fraction of the row comparisons it would have needed to JOIN your 
original tables. Take this rough math as an example:

TABLE A: 10,000 rows
TABLE B: 10,000 rows

SELECT * FROM A INNER JOIN B ON A.id = B.A_id;

There are potentially 10,000 x 10,000 = 100,000,000 (1.0e+08) row 
combinations to be checked. If, instead of joining A to B, we create two 
derivative tables called C and D (assuming we don't change the column 
names):

TABLE A -> TABLE C: 5,000 rows
TABLE B -> TABLE D: 1,000 rows

SELECT * FROM C INNER JOIN D ON C.id = D.A_id;

That means there are now 5,000 x 1,000 = 5,000,000 (5.0e+06) comparisons 
to run, or 1/20th as many. Computing tables C and D should take linear or 
logarithmic time (because you should have good index coverage), so there 
will usually be a net gain in performance. This is the secret of stepwise 
result construction.
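
In MySQL the pattern might look something like the sketch below. The WHERE 
conditions are invented placeholders for whatever preselects your target 
rows, and the index columns assume the id/A_id naming used above:

CREATE TEMPORARY TABLE C
    SELECT * FROM A WHERE A.status = 'active';       -- hypothetical filter, ~5,000 rows
CREATE TEMPORARY TABLE D
    SELECT * FROM B WHERE B.created > '2006-01-01';  -- hypothetical filter, ~1,000 rows

ALTER TABLE C ADD INDEX (id);      -- index the temp tables so the final
ALTER TABLE D ADD INDEX (A_id);    -- join can do index lookups

SELECT * FROM C INNER JOIN D ON C.id = D.A_id;

DROP TEMPORARY TABLE C;
DROP TEMPORARY TABLE D;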

To help you optimize your particular query, I would need to see it and 
the table definitions it is working against (SHOW CREATE TABLE works best 
for me).

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine



Re: MySQL LEFT JOIN Optimization Using LIMIT CLAUSE

2006-02-03 Thread Scott Klarenbach
Thanks a lot Shawn.  As always, your advice has been very helpful.





MySQL LEFT JOIN Optimization Using LIMIT CLAUSE

2006-02-02 Thread Scott Klarenbach
I have a table `requirement` which is left joining to a table `inventory`
based on a matching `partNumber` column.  The inventory table has millions
of records; the requirement table has tens of thousands.  I'm noticing that
the left join between requirement and inventory doesn't take advantage of a
LIMIT clause.  So whether I select all records from requirement or limit it
to 50, the LEFT JOIN operation still seems to be calculating for ALL
requirement records against ALL inventory records.  (The query takes the
exact same amount of time whether I pull 50 requirement records or 10,000.)

How can I force MySQL to join the inventory table for only those 50
records brought back by the LIMIT clause?

What I would do in a more powerful DB like SQL Server is build a temporary
table with my 50 requirement rows, and then perform the inventory join on
the temp table.  But due to MySQL SPROC limitations (i.e., LIMIT clauses
must have integer constants, not parameters) and view limitations (i.e., no
indexing of views), I'd have to build this temporary table and the rest of
the query in PHP first, which is really ugly.

I'm hoping there is a nice SQL trick I can use with MySQL to restrict the
join to only those records that would come back from the limit set.

Thanks,
Scott Klarenbach


Re: MySQL LEFT JOIN Optimization Using LIMIT CLAUSE

2006-02-02 Thread Augusto Bott
Try this:

[EMAIL PROTECTED]:ule select * from a;
+----+------+
| id | data |
+----+------+
|  1 | a    |
|  2 | b    |
|  3 | c    |
|  4 | d    |
|  5 | e    |
+----+------+
5 rows in set (0.00 sec)

[EMAIL PROTECTED]:ule select * from b;
+----+------+
| id | data |
+----+------+
|  1 | aa   |
|  3 | bb   |
|  4 | cc   |
|  3 | bb   |
+----+------+
4 rows in set (0.00 sec)

[EMAIL PROTECTED]:ule select A, a.data, b.id as B, b.data FROM (select
a.id as A, a.data from a limit 3) a LEFT JOIN b on A=b.id;
+---+------+------+------+
| A | data | B    | data |
+---+------+------+------+
| 1 | a    |    1 | aa   |
| 2 | b    | NULL | NULL |
| 3 | c    |    3 | bb   |
| 3 | c    |    3 | bb   |
+---+------+------+------+
4 rows in set (0.00 sec)
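
Applied to your tables, the same derived-table trick would look something
like this (an untested sketch; I'm assuming requirement has a primary key
id to ORDER BY, so the LIMIT picks a deterministic set of 50 rows):

select r.*, i.*
from (select * from requirement order by id limit 50) r
left join inventory i on i.partNumber = r.partNumber;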

--
Augusto Bott
augusto.bott (at) gmail.com
