Scott Klarenbach <[EMAIL PROTECTED]> wrote on 02/02/2006 02:01:11 
PM:

> I have a table `requirement` which is left joining to a table 
`inventory`
> based on a matching `partNumber` column.  The inventory table has 
millions
> of records, the requirement table has tens of thousands.  I'm noticing 
that
> the left join between requirement and inventory doesn't take advantage 
of a
> LIMIT clause.  So whether I select all records from requirement or limit 
it
> to 50, the LEFT JOIN operation still seems to be calculating for ALL
> requirement records against ALL inventory records.  (The query takes the
> exact same amount of time, whether I pull 50 requirement records or 
10,000).
> 
> How can I force mysql to only join the inventory table for the those 50
> records brought back by the LIMIT clause?
> 
> What I would do in a more powerful DB like SQL Server, is build a 
temporary
> table with my 50 requirement rows, and then perform the inventory join 
on
> the temp table.  But due to MySQL SPROC limitations (ie, LIMIT clauses 
must
> have integer constants, not parameters) and View limititations (ie, no
> indexing of views), I'd have to build this temporary table and the rest 
of
> query in PHP first, which is really ugly.
> 
> I'm hoping there is a nice SQL trick I can use with MySQL to restrict 
the
> join to only those records that would come back from the limit set.
> 
> Thanks,
> Scott Klarenbach

Yes, and no.  You cannot apply a LIMIT specifically to a JOIN clause 
unless you break your query into separate pieces and put limits on each of 
them.  What happens during the normal execution of a query is that after 
parsing and planning the engine begins collecting and combining the source 
data. Which records are combined and matched against which others is 
defined in the FROM clause and all of the JOIN clauses. 

The equivalent to a large virtual table (similar to saying "SELECT * FROM 
<all involved tables>") is created in memory. The only restrictions to 
which rows of data make it into this first processing stage come from the 
ON clauses (and any WHERE clauses the optimizer _may_ choose to include) 
defined between the JOINed tables. Next comes WHERE clause processing, 
then GROUP BY processing, HAVING processing, ORDER BY processing, and 
finally LIMIT processing. 

As you can see by the flow of query execution, LIMIT clauses are really 
only useful for restricting how much data is finally sent to the user. In 
order to minimize how much processing your CPU has to do to compute a 
particular query you have several tools at your disposal: indexes, 
temporary tables, and stepwize result construction.

JOINing tables is a geometrically expensive action. The number of 
potential row matches increase by the product of the number of rows in 
each table involved in the join. If you can preselect certain target rows 
from your really large tables into smaller temporary tables and build your 
final result set from them, the query processor will only need to compute 
a small fraction of the row comparisons it would have had to perform 
compared to the number of row comparisons necessary to JOIN your original 
tables. Take this rough math as an example:

TABLE A: 10000 rows
TABLE B: 10000 rows

SELECT * from A INNER JOIN B ON A.id = B.A_ic;

There are potentially 10000 x 10000 = 100000000 (1.0e+08) row combinations 
to be checked. If instead of joining A to B, we create two derivative 
tables called C and D (assuming we don't change the column names)

TABLE A -> TABLE C: 5000 rows
TABLE B -> TABLE D: 1000 rows

SELECT * from C INNER JOIN D ON C.id = D.A_ic;

That means there are now 5000 x 1000 = 5000000 (5.0e+06) or 1/20th the 
number of comparisons to run. Computing tables C and D should be in linear 
or logarithmic time (because you should have good index coverage) so there 
will usually be a net gain in performance. This is the secret to stepwize 
result construction.

To help you to optimize your particular query, I would need to see it and 
the table definitions it is working against (SHOW CREATE TABLE works best 
for me).

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

Reply via email to