Scott Klarenbach <[EMAIL PROTECTED]> wrote on 02/02/2006 02:01:11 PM:
> I have a table `requirement` which is left joining to a table `inventory` > based on a matching `partNumber` column. The inventory table has millions > of records, the requirement table has tens of thousands. I'm noticing that > the left join between requirement and inventory doesn't take advantage of a > LIMIT clause. So whether I select all records from requirement or limit it > to 50, the LEFT JOIN operation still seems to be calculating for ALL > requirement records against ALL inventory records. (The query takes the > exact same amount of time, whether I pull 50 requirement records or 10,000). > > How can I force mysql to only join the inventory table for the those 50 > records brought back by the LIMIT clause? > > What I would do in a more powerful DB like SQL Server, is build a temporary > table with my 50 requirement rows, and then perform the inventory join on > the temp table. But due to MySQL SPROC limitations (ie, LIMIT clauses must > have integer constants, not parameters) and View limititations (ie, no > indexing of views), I'd have to build this temporary table and the rest of > query in PHP first, which is really ugly. > > I'm hoping there is a nice SQL trick I can use with MySQL to restrict the > join to only those records that would come back from the limit set. > > Thanks, > Scott Klarenbach Yes, and no. You cannot apply a LIMIT specifically to a JOIN clause unless you break your query into separate pieces and put limits on each of them. What happens during the normal execution of a query is that after parsing and planning the engine begins collecting and combining the source data. Which records are combined and matched against which others is defined in the FROM clause and all of the JOIN clauses. The equivalent to a large virtual table (similar to saying "SELECT * FROM <all involved tables>") is created in memory. The only restrictions to which rows of data make it into this first processing stage come from the ON clauses (and any WHERE clauses the optimizer _may_ choose to include) defined between the JOINed tables. Next comes WHERE clause processing, then GROUP BY processing, HAVING processing, ORDER BY processing, and finally LIMIT processing. As you can see by the flow of query execution, LIMIT clauses are really only useful for restricting how much data is finally sent to the user. In order to minimize how much processing your CPU has to do to compute a particular query you have several tools at your disposal: indexes, temporary tables, and stepwize result construction. JOINing tables is a geometrically expensive action. The number of potential row matches increase by the product of the number of rows in each table involved in the join. If you can preselect certain target rows from your really large tables into smaller temporary tables and build your final result set from them, the query processor will only need to compute a small fraction of the row comparisons it would have had to perform compared to the number of row comparisons necessary to JOIN your original tables. Take this rough math as an example: TABLE A: 10000 rows TABLE B: 10000 rows SELECT * from A INNER JOIN B ON A.id = B.A_ic; There are potentially 10000 x 10000 = 100000000 (1.0e+08) row combinations to be checked. If instead of joining A to B, we create two derivative tables called C and D (assuming we don't change the column names) TABLE A -> TABLE C: 5000 rows TABLE B -> TABLE D: 1000 rows SELECT * from C INNER JOIN D ON C.id = D.A_ic; That means there are now 5000 x 1000 = 5000000 (5.0e+06) or 1/20th the number of comparisons to run. Computing tables C and D should be in linear or logarithmic time (because you should have good index coverage) so there will usually be a net gain in performance. This is the secret to stepwize result construction. To help you to optimize your particular query, I would need to see it and the table definitions it is working against (SHOW CREATE TABLE works best for me). Shawn Green Database Administrator Unimin Corporation - Spruce Pine