On Wed, Feb 4, 2015 at 2:56 PM, shawn l.green <shawn.l.gr...@oracle.com> wrote: > Hi Larry, > > > On 2/1/2015 4:49 PM, Larry Martell wrote: >> >> I have 2 queries. One takes 4 hours to run and returns 21 rows, and >> the other, which has 1 additional where clause, takes 3 minutes and >> returns 20 rows. The main table being selected from is largish >> (37,247,884 rows with 282 columns). Caching is off for my testing, so >> it's not related to that. To short circuit anyone asking, these >> queries are generated by python code, which is why there's an IN >> clause with 1 value, as oppose to an =. >> >> Here are the queries and their explains. The significant difference is >> that the faster query has "Using >> intersect(data_cst_bbccbce0,data_cst_fba12377)" in the query plan - >> those 2 indexes are on the 2 columns in the where clause, so that's >> why the second one is faster. But I am wondering what I can do to make >> the first one faster. >> >> >> 4 hour query: >> >> SELECT MIN(data_tool.name) as tool, >> MIN(data_cst.date_time) "start", >> MAX(data_cst.date_time) "end", >> MIN(data_target.name) as target, >> MIN(data_lot.name) as lot, >> MIN(data_wafer.name) as wafer, >> MIN(measname) as measname, >> MIN(data_recipe.name) as recipe >> FROM data_cst >> INNER JOIN data_tool ON data_tool.id = data_cst.tool_id >> INNER JOIN data_target ON data_target.id = data_cst.target_name_id >> INNER JOIN data_lot ON data_lot.id = data_cst.lot_id >> INNER JOIN data_wafer ON data_wafer.id = data_cst.wafer_id >> INNER JOIN data_measparams ON data_measparams.id = >> data_cst.meas_params_name_id >> INNER JOIN data_recipe ON data_recipe.id = data_cst.recipe_id >> WHERE data_target.id IN (172) AND >> data_cst.date_time BETWEEN '2015-01-26 00:00:00' AND '2015-01-26 >> 23:59:59' >> GROUP BY wafer_id, data_cst.lot_id, target_name_id >> > > ... snipped ... > >> >> >> Faster query: >> >> SELECT MIN(data_tool.name) as tool, >> MIN(data_cst.date_time) "start", >> MAX(data_cst.date_time) "end", >> MIN(data_target.name) as target, >> MIN(data_lot.name) as lot, >> MIN(data_wafer.name) as wafer, >> MIN(measname) as measname, >> MIN(data_recipe.name) as recipe >> FROM data_cst >> INNER JOIN data_tool ON data_tool.id = data_cst.tool_id >> INNER JOIN data_target ON data_target.id = data_cst.target_name_id >> INNER JOIN data_lot ON data_lot.id = data_cst.lot_id >> INNER JOIN data_wafer ON data_wafer.id = data_cst.wafer_id >> INNER JOIN data_measparams ON data_measparams.id = >> data_cst.meas_params_name_id >> INNER JOIN data_recipe ON data_recipe.id = data_cst.recipe_id >> WHERE data_target.id IN (172) AND >> data_recipe.id IN (148) AND >> data_cst.date_time BETWEEN '2015-01-26 00:00:00' AND '2015-01-26 >> 23:59:59' >> GROUP BY wafer_id, data_cst.lot_id, target_name_id >> > ... snip ... >> >> >> Thanks for taking the time to read this, and for any help or pointers >> you can give me. >> > > The biggest difference is the added selectivity generated by the WHERE term > against the data_recipe table. > > Compare the two EXPLAINS, in the faster query you see that data_recipe is > listed second. This allows the additional term a chance to reduce the number > of row combinations for the entire query. > > To really get at the logic behind how the Optimizer chooses its execution > plan, get an optimizer trace. Look at the "cost" estimates for each phase > being considered. > http://dev.mysql.com/doc/refman/5.6/en/optimizer-trace-table.html > http://dev.mysql.com/doc/internals/en/optimizer-tracing.html
Thanks very much Shawn for the reply and the links. I will check those out and I'm sure I will find them very useful. Meanwhile I changed the query to select from data_cst using the where clause into a temp table and then I join the temp table with the other tables. That has improved the slow query from 4 hours to 10 seconds (!) -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql