Re: Help improving query performance
Hi Larry,

On 2/1/2015 4:49 PM, Larry Martell wrote:
> I have 2 queries. One takes 4 hours to run and returns 21 rows; the other, which has 1 additional WHERE clause, takes 3 minutes and returns 20 rows. The main table being selected from is largish (37,247,884 rows with 282 columns). Caching is off for my testing, so it's not related to that. To short-circuit anyone asking: these queries are generated by Python code, which is why there's an IN clause with 1 value, as opposed to an =.
>
> Here are the queries and their EXPLAINs. The significant difference is that the faster query has Using intersect(data_cst_bbccbce0,data_cst_fba12377) in the query plan - those 2 indexes are on the 2 columns in the WHERE clause, so that's why the second one is faster. But I am wondering what I can do to make the first one faster.
>
> 4 hour query:
>
> SELECT MIN(data_tool.name) as tool, MIN(data_cst.date_time) start,
>        MAX(data_cst.date_time) end, MIN(data_target.name) as target,
>        MIN(data_lot.name) as lot, MIN(data_wafer.name) as wafer,
>        MIN(measname) as measname, MIN(data_recipe.name) as recipe
> FROM data_cst
> INNER JOIN data_tool ON data_tool.id = data_cst.tool_id
> INNER JOIN data_target ON data_target.id = data_cst.target_name_id
> INNER JOIN data_lot ON data_lot.id = data_cst.lot_id
> INNER JOIN data_wafer ON data_wafer.id = data_cst.wafer_id
> INNER JOIN data_measparams ON data_measparams.id = data_cst.meas_params_name_id
> INNER JOIN data_recipe ON data_recipe.id = data_cst.recipe_id
> WHERE data_target.id IN (172)
>   AND data_cst.date_time BETWEEN '2015-01-26 00:00:00' AND '2015-01-26 23:59:59'
> GROUP BY wafer_id, data_cst.lot_id, target_name_id
> ... snipped ...
> Faster query (same SELECT list and JOINs as above, with one extra WHERE term):
>
> WHERE data_target.id IN (172)
>   AND data_recipe.id IN (148)
>   AND data_cst.date_time BETWEEN '2015-01-26 00:00:00' AND '2015-01-26 23:59:59'
> GROUP BY wafer_id, data_cst.lot_id, target_name_id
> ... snip ...
>
> Thanks for taking the time to read this, and for any help or pointers you can give me.

The biggest difference is the added selectivity generated by the WHERE term against the data_recipe table. Compare the two EXPLAINs: in the faster query you see that data_recipe is listed second. This gives the additional term a chance to reduce the number of row combinations for the entire query.

To really get at the logic behind how the Optimizer chooses its execution plan, get an optimizer trace. Look at the cost estimates for each phase being considered.
http://dev.mysql.com/doc/refman/5.6/en/optimizer-trace-table.html
http://dev.mysql.com/doc/internals/en/optimizer-tracing.html

Yours,
-- 
Shawn Green
MySQL Senior Principal Technical Support Engineer
Oracle USA, Inc. - Hardware and Software, Engineered to Work Together.
Office: Blountville, TN

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql
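Shawn's selectivity point can be observed in any engine's plan output. The sketch below is a toy illustration, not the thread's actual schema: it uses Python's sqlite3 module as a stand-in for MySQL, with invented mini versions of data_cst and data_recipe, to show how an extra equality predicate on the joined table turns the join key into a constant and lets the planner probe an index instead of scanning the big table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Tiny stand-ins for data_cst and data_recipe (hypothetical mini-schema).
cur.execute("CREATE TABLE data_cst (id INTEGER PRIMARY KEY, recipe_id INT, date_time TEXT)")
cur.execute("CREATE TABLE data_recipe (id INTEGER PRIMARY KEY, name TEXT)")
cur.executemany("INSERT INTO data_recipe VALUES (?, ?)", [(i, "r%d" % i) for i in range(1, 6)])
cur.executemany(
    "INSERT INTO data_cst (recipe_id, date_time) VALUES (?, ?)",
    [(1 + i % 5, "2015-01-26 12:00:00") for i in range(1000)],
)
cur.execute("CREATE INDEX data_cst_recipe ON data_cst (recipe_id)")

query = (
    "SELECT COUNT(*) FROM data_cst "
    "JOIN data_recipe ON data_recipe.id = data_cst.recipe_id "
    "WHERE data_recipe.id = 1"
)

# With the extra `data_recipe.id = 1` term, the join key is a constant, so
# the plan can search the data_cst_recipe index rather than scan data_cst.
plan = cur.execute("EXPLAIN QUERY PLAN " + query).fetchall()
plan_text = " ".join(row[-1] for row in plan)

(count,) = cur.execute(query).fetchone()
print(plan_text)
print(count)
```

The 1000 invented rows are spread evenly over 5 recipes, so the filtered join touches only the 200 rows reachable through the index.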
Re: Help improving query performance
Hi Larry,

On 2/4/2015 3:18 PM, Larry Martell wrote:
> On Wed, Feb 4, 2015 at 2:56 PM, shawn l.green shawn.l.gr...@oracle.com wrote:
>> [snip: original queries, EXPLAINs, and the suggestion to get an optimizer trace]
>
> Thanks very much Shawn for the reply and the links. I will check those out and I'm sure I will find them very useful. Meanwhile I changed the query to select from data_cst using the where clause into a temp table, and then I join the temp table with the other tables. That has improved the slow query from 4 hours to 10 seconds (!)

Did you also add an index to the temporary table for the JOIN condition?
It might make it even faster.

Yours,
-- 
Shawn Green
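Shawn's two-step suggestion - materialize the filtered rows into a temp table, then index the temp table's join column - can be sketched as follows. This uses sqlite3 as a stand-in for MySQL, with invented mini versions of data_cst and data_tool; the table and column names are borrowed from the thread but the data is hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Stand-ins for the real tables (hypothetical columns, invented data).
cur.execute("CREATE TABLE data_cst (id INTEGER PRIMARY KEY, tool_id INT, date_time TEXT)")
cur.execute("CREATE TABLE data_tool (id INTEGER PRIMARY KEY, name TEXT)")
cur.executemany("INSERT INTO data_tool VALUES (?, ?)", [(i, "tool%d" % i) for i in range(10)])
cur.executemany(
    "INSERT INTO data_cst (tool_id, date_time) VALUES (?, ?)",
    [(i % 10, "2015-01-26" if i % 100 == 0 else "2015-02-01") for i in range(1000)],
)

# Step 1: materialize only the rows that survive the WHERE clause.
cur.execute(
    "CREATE TEMPORARY TABLE cst_tmp AS "
    "SELECT id, tool_id FROM data_cst WHERE date_time = '2015-01-26'"
)
# Step 2 (Shawn's follow-up): index the temp table's join column, so the
# join below probes an index instead of re-scanning cst_tmp per tool row.
cur.execute("CREATE INDEX cst_tmp_tool ON cst_tmp (tool_id)")

rows = cur.execute(
    "SELECT data_tool.name, COUNT(*) FROM cst_tmp "
    "JOIN data_tool ON data_tool.id = cst_tmp.tool_id "
    "GROUP BY data_tool.name ORDER BY data_tool.name"
).fetchall()
print(rows)
```

The win comes from the temp table being tiny relative to the base table, so all later joins and the GROUP BY operate on the small filtered set.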
Re: Help improving query performance
On Wed, Feb 4, 2015 at 3:25 PM, shawn l.green shawn.l.gr...@oracle.com wrote:
> [snip: quoted thread history]
>
> Did you also add an index to the temporary table for the JOIN condition? It might make it even faster.

No, I didn't.
I (and the users) were so shocked and happy with the massive improvement that I moved on to make similar changes to other queries. This is a Django app, and it's a one-shot deal - i.e. there's just the one query run, the response is sent back to the browser, and that's the end of the session and the temp table. So I'm thinking it's probably not worth it.

As an aside, this change has messed up all my unit tests - they send multiple requests, but they're all in the same session, so only the first succeeds and the next one fails because the temp table already exists. I haven't figured out how to get it to run each request in its own session. I guess I'm going to have to drop the temp table after I join with it, before I send the response back.
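The collision Larry describes follows from temporary tables being scoped to the database connection, which suggests two workarounds: give each request its own connection, or guard the CREATE defensively. A sketch with sqlite3 as a stand-in for MySQL (where the analogous guards are DROP TEMPORARY TABLE IF EXISTS before the CREATE, or CREATE TEMPORARY TABLE IF NOT EXISTS); the table name is invented.

```python
import sqlite3

# Two connections: each has its own private temp-table namespace, so
# "requests" served on separate connections never collide.
con_a = sqlite3.connect(":memory:")
con_b = sqlite3.connect(":memory:")
con_a.execute("CREATE TEMPORARY TABLE tmp_results (id INT)")
con_b.execute("CREATE TEMPORARY TABLE tmp_results (id INT)")  # no conflict

# Same connection twice (the unit-test situation): the second CREATE fails...
collided = False
try:
    con_a.execute("CREATE TEMPORARY TABLE tmp_results (id INT)")
except sqlite3.OperationalError:
    collided = True

# ...unless the request drops the leftover table defensively first.
con_a.execute("DROP TABLE IF EXISTS tmp_results")
con_a.execute("CREATE TEMPORARY TABLE tmp_results (id INT)")
print(collided)
```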
Re: Help improving query performance
On Wed, Feb 4, 2015 at 2:56 PM, shawn l.green shawn.l.gr...@oracle.com wrote:
> [snip: quoted original queries and Shawn's reply about selectivity and the optimizer trace]

Thanks very much Shawn for the reply and the links. I will check those out and I'm sure I will find them very useful. Meanwhile I changed the query to select from data_cst using the where clause into a temp table, and then I join the temp table with the other tables. That has improved the slow query from 4 hours to 10 seconds (!)
Re: Help improving query performance
Hello Larry,

On 2/4/2015 3:37 PM, Larry Martell wrote:
> On Wed, Feb 4, 2015 at 3:25 PM, shawn l.green shawn.l.gr...@oracle.com wrote:
>> [snip: quoted thread history, ending with "Did you also add an index to the temporary table for the JOIN condition?"]
>
> No, I didn't. I (and the users) were so shocked and happy with the massive improvement that I moved on to make similar changes to other queries. This is a Django app, and it's a one-shot deal - there's just the one query run, the response is sent back to the browser, and that's the end of the session and the temp table.
>
> As an aside, this change has messed up all my unit tests - they send multiple requests, but they're all in the same session, so only the first succeeds and the next one fails because the temp table already exists. I guess I'm going to have to drop the temp table after I join with it, before I send the response back.

If...
* it's a MEMORY temp table
* it's always the same table design
then you can use DELETE to clear the content (it's faster than DROP or
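Shawn's clear-and-reuse pattern can be sketched as below. This uses sqlite3 as a stand-in (MySQL's MEMORY engine has no direct sqlite analogue; the point here is the flow, not the storage engine), with an invented table name: keep one temp table per session and DELETE its contents between requests instead of dropping and re-creating it.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TEMPORARY TABLE tmp_results (id INT)")

def run_request(ids):
    # Reuse the existing temp table: clear it, then refill for this request.
    cur.execute("DELETE FROM tmp_results")
    cur.executemany("INSERT INTO tmp_results VALUES (?)", [(i,) for i in ids])
    return [r[0] for r in cur.execute("SELECT id FROM tmp_results ORDER BY id")]

first = run_request([3, 1, 2])
second = run_request([9, 8])   # same session, same table, no collision
print(first, second)
```

Because the table is created exactly once per session, the "already exists" failure from the unit tests cannot occur on later requests.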
Help improving query performance
I have 2 queries. One takes 4 hours to run and returns 21 rows; the other, which has 1 additional WHERE clause, takes 3 minutes and returns 20 rows. The main table being selected from is largish (37,247,884 rows with 282 columns). Caching is off for my testing, so it's not related to that. To short-circuit anyone asking: these queries are generated by Python code, which is why there's an IN clause with 1 value, as opposed to an =.

Here are the queries and their EXPLAINs. The significant difference is that the faster query has Using intersect(data_cst_bbccbce0,data_cst_fba12377) in the query plan - those 2 indexes are on the 2 columns in the WHERE clause, so that's why the second one is faster. But I am wondering what I can do to make the first one faster.

4 hour query:

SELECT MIN(data_tool.name) as tool, MIN(data_cst.date_time) start,
       MAX(data_cst.date_time) end, MIN(data_target.name) as target,
       MIN(data_lot.name) as lot, MIN(data_wafer.name) as wafer,
       MIN(measname) as measname, MIN(data_recipe.name) as recipe
FROM data_cst
INNER JOIN data_tool ON data_tool.id = data_cst.tool_id
INNER JOIN data_target ON data_target.id = data_cst.target_name_id
INNER JOIN data_lot ON data_lot.id = data_cst.lot_id
INNER JOIN data_wafer ON data_wafer.id = data_cst.wafer_id
INNER JOIN data_measparams ON data_measparams.id = data_cst.meas_params_name_id
INNER JOIN data_recipe ON data_recipe.id = data_cst.recipe_id
WHERE data_target.id IN (172)
  AND data_cst.date_time BETWEEN '2015-01-26 00:00:00' AND '2015-01-26 23:59:59'
GROUP BY wafer_id, data_cst.lot_id, target_name_id

Explain (columns condensed to fit; possible_keys for data_cst were
data_cst_4262c4f4, data_cst_01213b5c, data_cst_bbccbce0, data_cst_b0229011,
data_cst_65c0, data_cst_fba12377, data_cst_634020d0; for the other tables,
possible_keys is the same as key):

id  select_type  table            type    key                key_len  ref                          rows  Extra
 1  SIMPLE       data_target      const   PRIMARY            4        const                           1  Using temporary; Using filesort
 1  SIMPLE       data_measparams  index   PRIMARY            4        NULL                          767  Using index
 1  SIMPLE       data_cst         ref     data_cst_634020d0  5        motor_gf.data_measparams.id    48  Using where
 1  SIMPLE       data_tool        eq_ref  PRIMARY            4        motor_gf.data_cst.tool_id       1  NULL
 1  SIMPLE       data_recipe      eq_ref  PRIMARY            4        motor_gf.data_cst.recipe_id     1  NULL
 1  SIMPLE       data_lot         eq_ref  PRIMARY            4        motor_gf.data_cst.lot_id        1  NULL
 1  SIMPLE       data_wafer       eq_ref  PRIMARY            4        motor_gf.data_cst.wafer_id      1  NULL

Faster query:

SELECT MIN(data_tool.name) as tool, MIN(data_cst.date_time) start,
       MAX(data_cst.date_time) end, MIN(data_target.name) as target,
       MIN(data_lot.name) as lot, MIN(data_wafer.name) as wafer,
       MIN(measname) as measname, MIN(data_recipe.name) as recipe
FROM data_cst
INNER JOIN data_tool ON data_tool.id = data_cst.tool_id
INNER JOIN data_target ON data_target.id = data_cst.target_name_id
INNER JOIN data_lot ON data_lot.id = data_cst.lot_id
INNER JOIN data_wafer ON data_wafer.id = data_cst.wafer_id
INNER JOIN data_measparams ON data_measparams.id = data_cst.meas_params_name_id
INNER JOIN data_recipe ON data_recipe.id = data_cst.recipe_id
WHERE data_target.id IN (172)
  AND data_recipe.id IN (148)
  AND data_cst.date_time BETWEEN '2015-01-26 00:00:00' AND '2015-01-26 23:59:59'
GROUP BY wafer_id, data_cst.lot_id,
Does the order of tuples in a bulk insert impact query performance?
Does the order of tuples in a bulk insert impact later query performance? E.g., will sorting the rows before a bulk insert cause queries to perform better, for indexed or non-indexed fields?

Thanks,
Anthony
Re: Does the order of tuples in a bulk insert impact query performance?
On 2/7/10 7:28 AM, Anthony Urso antho...@cs.ucla.edu wrote:
> Does the order of tuples in a bulk insert impact later query performance? E.g., will sorting the rows before a bulk insert cause queries to perform better, for indexed or non-indexed fields?

When I load a large body of data (using LOAD DATA INFILE), I remove the indexes before the load and add them back after. I work on the assumption that if the data is sorted already, then the recreation of the indexes in the last step is faster. But then, if you have orthogonal indexes, you have to choose one to sort the rows by.

As for performance when accessing the table later in normal use, I guess it depends on the usage patterns. If there are a lot of clustered or sequential table reads, then caching might be more effective if the rows are sorted.
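The drop-load-recreate pattern from the reply above can be sketched as follows, using sqlite3 as a stand-in for MySQL (with MyISAM you would more likely wrap LOAD DATA INFILE in ALTER TABLE ... DISABLE KEYS / ENABLE KEYS instead of literally dropping indexes); the table, column, and data here are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE t (k INT, v TEXT)")
cur.execute("CREATE INDEX t_k ON t (k)")

rows = [(i % 100, "v%d" % i) for i in range(10_000)]

# 1. Drop the index so the bulk load doesn't maintain it row by row.
cur.execute("DROP INDEX t_k")
# 2. Load the data pre-sorted on the to-be-indexed column, so the index
#    build in step 3 reads its keys in order.
cur.executemany("INSERT INTO t VALUES (?, ?)", sorted(rows))
# 3. Rebuild the index in one pass over the loaded data.
cur.execute("CREATE INDEX t_k ON t (k)")

(n,) = cur.execute("SELECT COUNT(*) FROM t WHERE k = 7").fetchone()
print(n)
```

As the reply notes, with several orthogonal indexes only one of them can benefit from the pre-sort; the others are rebuilt from effectively unsorted keys.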
RE: Erratic query performance
> Any ideas of what could be causing the varied response time on a simple query when everything on the server appears to be identical from one run to another? Are there settings that can be made on the server to tweak response time for a database/query like this?

[JS] Is it possible that there are locking conflicts? They can produce the kind of behavior you are describing.

> [snip: table stats and the temporary-table query, repeated in full in the original post]
>
> Thanks for any insight you can offer.
>
> - Leo Siefert
Erratic query performance
I have a moderate sized database set up and a program that allows users to create ad-hoc queries into the data based on entries in a form, so that I, as the programmer, have control over the actual construction of the queries and can do what is needed to optimize them. I also keep a log of all queries that are run, so that I can easily see the exact query that may have caused a problem. For the most part, unless a query is quite complex, there is no problem with the response time - from a few seconds up to a minute or two for more complex queries or ones returning very large result sets.

Recently a seemingly very simple query has resulted in unacceptably long processing times. After playing around with the query in phpMyAdmin I am totally perplexed as to what could be causing the problem. Sometimes the query will execute in less than 30 seconds, but other times it takes from 4 to 10 or more minutes. It never seems to complete in between 30 seconds and 4 minutes.

To try to isolate the problem, today I did a lot of testing on an isolated server - nothing on it but MySQL and this database, and no one but me has access to it. I tried rearranging the joins and eliminating one of the joins, as well as everything else I could think of, to figure out what could be causing the issue. Through all of the testing I got consistent results in the four-minute range for all of the variations I tried - repeated attempts with the same query varied by only a second or two.

Then I went back to my program and ran the original query on the public database - the same place where the problem had originally been found - and instead of timing out the gateway (five minute limit) as it had done consistently over the past few days, it ran successfully in about 20 seconds. I was able to repeat this many times, both using the program and by entering the query into phpMyAdmin. It still takes 4 minutes on the private server, though.
A couple of hours later - shortly before starting this message - I tried again on the public server, and again the response time was under 30 seconds. Trying again now, and it's on its way to timing out again. I checked, and there are no other processes running on the server - volume is usually low, as there are fewer than 100 users total.

Any ideas of what could be causing the varied response time on a simple query when everything on the server appears to be identical from one run to another? Are there settings that can be made on the server to tweak response time for a database/query like this?

Here are stats on the files involved in the query, and the actual query I am trying to run. Note that the number of receipts with amount = 1 is very small compared to the total number of records.

Main: 900,000 records, 500 MB (886,361 where recordtype = INDIVIDUAL)
  Primary key: id (int)
Receipt: 4,500,000 records, 700 MB (6,817 where amount = 1)
  Primary key: id (int)
  Indexed on: mainid (int)
Committee: 4,500 records, 600 KB (1,476 where party = D)
  Primary key: id (int)
  Indexed on: Comm_id (varchar(6))

create temporary table tmp type = heap
select distinct 3 filterid, m.id, GroupLevel, 0 GroupCum
from main m
left join receipt r on m.id = r.mainid
left join campaccommon.committee c on r.comm_id = c.Comm_id
where recordtype = INDIVIDUAL
  and c.party = D
  and r.amount = 1

Returns 294 records.

Thanks for any insight you can offer.

 - Leo Siefert
Re: Erratic query performance
In the last episode (Aug 13), Leo Siefert said:
> [snip: description of the erratic timings - under 30 seconds on some runs, 4 to 10 minutes on others, and a consistent ~4 minutes on an isolated private server]

What is the disk and CPU activity during this time? Does a second identical query return quickly? Could be a caching effect. If the tables and indexes aren't cached, the query will run slow; once they are cached, it runs fast. If someone else runs a query that pushes the rows you're interested in out of cache, it goes slow again.

Do you have enough memory (and is key_buffer_size set high enough) to cache all the indexes you are using? SHOW STATUS LIKE 'key%' before and after the query might be useful. Do you have enough memory to cache all the tables as well?

An EXPLAIN of the query would be useful, just to verify that it is using the indexes you expect.
You can't explain a create table query, so just explain the select part. If you're not selecting any fields from receipt (i.e. it's just being used to join main and campaccommon.committee), an index on (mainid, comm_id) could let you bypass a lot of random table accesses.

A couple of hours later - shortly before starting this message - I tried again on the public server and again the response time was under 30 seconds. Trying again now and it's on its way to timing out again. Checked, and there are no other processes running on the server - volume is usually low, as there are fewer than 100 users total. Any ideas what could be causing the varied response time on a simple query when everything on the server appears to be identical from one run to another? Are there settings that can be made on the server to tweak response time for a database/query like this? Here are stats on the files involved in the query and the actual query I am trying to run. Note that the number of receipts with amount = 1 is very small compared to the total number of records.

Main: 900,000 records, 500 Mb (886,361 where recordtype = INDIVIDUAL)
  Primary key: id (int)
Receipt: 4,500,000 records, 700 Mb (6,817 where amount = 1)
  Primary key: id (int)
  Indexed on: mainid (int)
Committee: 4,500 records, 600 Kb (1,476 where party = D)
  Primary key: id (int)
  Indexed on: Comm_id (varchar(6))

create temporary table tmp type = heap
select distinct 3 filterid, m.id, GroupLevel, 0 GroupCum
from main m
left join receipt r on m.id = r.mainid
left join campaccommon.committee c on r.comm_id = c.Comm_id
where recordtype = INDIVIDUAL and c.party = D and r.amount = 1

Returns 294 records.

-- Dan Nelson dnel...@allantgroup.com

-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org
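Dan's two suggestions can be sketched as follows. The index name is made up for illustration; the SELECT is the one from Leo's CREATE TEMPORARY TABLE, with the string literals quoted as MySQL expects:

```sql
-- Covering index so the receipt rows never have to be fetched
-- (the name ix_mainid_comm is a placeholder):
ALTER TABLE receipt ADD INDEX ix_mainid_comm (mainid, comm_id);

-- EXPLAIN only works on the SELECT, not on CREATE TABLE ... SELECT:
EXPLAIN
SELECT DISTINCT 3 AS filterid, m.id, GroupLevel, 0 AS GroupCum
FROM main m
LEFT JOIN receipt r ON m.id = r.mainid
LEFT JOIN campaccommon.committee c ON r.comm_id = c.Comm_id
WHERE m.recordtype = 'INDIVIDUAL'
  AND c.party = 'D'
  AND r.amount = 1;
```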
RE: Erratic query performance
Hi Leo, Also include the EXPLAIN SELECT ...; output, and the SHOW CREATE TABLE table\G output for each table used in the query. Have you considered that your query's execution time depends on other activity on the server? Have you tried skipping the cache ( SELECT SQL_NO_CACHE ... ) to see if you get consistent times? What about running this directly through the mysql CLI? Regards, Gavin Towey

-----Original Message----- From: Leo Siefert [mailto:lsief...@sbcglobal.net] Sent: Thursday, August 13, 2009 1:10 PM To: mysql@lists.mysql.com Subject: Erratic query performance

I have a moderate sized database set up and a program that allows users to create ad-hoc queries into the data based on entries in a form, so that I, as the programmer, have control over the actual construction of the queries and can do what is needed to optimize them. I also keep a log of all queries that are run so that I can easily see the exact query that may have caused a problem. For the most part, unless a query is quite complex, there is no problem with the response time - from a few seconds up to a minute or two for more complex queries or ones returning very large result sets. Recently a seemingly very simple query has resulted in unacceptably long processing times. After playing around with the query in PhpMyAdmin I am totally perplexed as to what could be causing the problem. Sometimes the query will execute in less than 30 seconds, but other times it takes from 4 to 10 or more minutes. It never seems to complete in between 30 seconds and 4 minutes. To try to isolate the problem today I did a lot of testing on an isolated server - nothing on it but MySQL and this database, and no one but me has access to it. Tried rearranging the joins and eliminating one of the joins, as well as everything else I could think of to figure out what could be causing the issue.
Through all of the testing I got consistent results in the four-minute range for all of the variations I tried - repeated attempts with the same query varied by only a second or two. Then I went back to my program and ran the original query on the public database - the same place where the problem had originally been found - and instead of timing out the gateway (five-minute limit) as it had done consistently over the past few days, it ran successfully in about 20 seconds. I was able to repeat this many times, both using the program and by entering the query into PhpMyAdmin. Still takes 4 minutes on the private server, though. A couple of hours later - shortly before starting this message - I tried again on the public server and again the response time was under 30 seconds. Trying again now and it's on its way to timing out again. Checked, and there are no other processes running on the server - volume is usually low, as there are fewer than 100 users total. Any ideas what could be causing the varied response time on a simple query when everything on the server appears to be identical from one run to another? Are there settings that can be made on the server to tweak response time for a database/query like this? Here are stats on the files involved in the query and the actual query I am trying to run. Note that the number of receipts with amount = 1 is very small compared to the total number of records.
Main: 900,000 records, 500 Mb (886,361 where recordtype = INDIVIDUAL)
  Primary key: id (int)
Receipt: 4,500,000 records, 700 Mb (6,817 where amount = 1)
  Primary key: id (int)
  Indexed on: mainid (int)
Committee: 4,500 records, 600 Kb (1,476 where party = D)
  Primary key: id (int)
  Indexed on: Comm_id (varchar(6))

create temporary table tmp type = heap
select distinct 3 filterid, m.id, GroupLevel, 0 GroupCum
from main m
left join receipt r on m.id = r.mainid
left join campaccommon.committee c on r.comm_id = c.Comm_id
where recordtype = INDIVIDUAL and c.party = D and r.amount = 1

Returns 294 records. Thanks for any insight you can offer. - Leo Siefert
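Gavin's two checks, spelled out as statements (receipt is used as the example table; the SELECT is taken from Leo's CREATE TEMPORARY TABLE):

```sql
-- Full definition of each table involved, indexes included:
SHOW CREATE TABLE receipt\G

-- Bypass the query cache so repeated timings are comparable:
SELECT SQL_NO_CACHE DISTINCT 3 AS filterid, m.id, GroupLevel, 0 AS GroupCum
FROM main m
LEFT JOIN receipt r ON m.id = r.mainid
LEFT JOIN campaccommon.committee c ON r.comm_id = c.Comm_id
WHERE m.recordtype = 'INDIVIDUAL'
  AND c.party = 'D'
  AND r.amount = 1;
```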
Re: Slow query Performance
On Wed, 15 Jul 2009 23:53:05 -0400 Darryle Steplight said: Can you show us the output of DESCRIBE score and SHOW INDEX FROM score? On Wed, Jul 15, 2009 at 6:44 PM, Tachu® tachu1+my...@gmail.com wrote: I'm having random query slowness that i can only reproduce once. My main question is that the query runs faster the second time around but i dont have ... snipage

-- A: It reverses the normal flow of conversation. Q: What's wrong with top-posting? A: Top-posting. Q: What's the biggest scourge on plain text email discussions? Anybody else see the irony here?

-- Don Read don_r...@att.net
It's always darkest before the dawn. So if you are going to steal the neighbor's newspaper, that's the time to do it.
RE: Slow query Performance
When my.ini has the query-cache-type = 1 setting, query results are placed in the cache on the first read; second and subsequent reads reference the result set from the cache. http://www.databasejournal.com/features/mysql/article.php/3110171/MySQLs-Query-Cache.htm

Martin Gainty
Confucius say: Big Dog in door prevents newspaper from being stolen.

Date: Thu, 16 Jul 2009 10:08:03 -0400 From: don_r...@att.net To: dstepli...@gmail.com CC: mysql@lists.mysql.com Subject: Re: Slow query Performance

On Wed, 15 Jul 2009 23:53:05 -0400 Darryle Steplight said: Can you show us the output of DESCRIBE score and SHOW INDEX FROM score? On Wed, Jul 15, 2009 at 6:44 PM, Tachu® tachu1+my...@gmail.com wrote: I'm having random query slowness that i can only reproduce once. My main question is that the query runs faster the second time around but i dont have ... snipage -- A: It reverses the normal flow of conversation. Q: What's wrong with top-posting? A: Top-posting. Q: What's the biggest scourge on plain text email discussions?
Anybody else see the irony here? -- Don Read don_r...@att.net It's always darkest before the dawn. So if you are going to steal the neighbor's newspaper, that's the time to do it.
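A way to confirm whether the query cache (rather than the buffer/key caches) explains the second-run speedup; these are standard server variables and counters:

```sql
SHOW VARIABLES LIKE 'query_cache%';   -- is the cache on, and how big?
SHOW STATUS LIKE 'Qcache%';           -- compare Qcache_hits across the two runs
-- If Qcache_hits does not increase between the first and second run,
-- the speedup is coming from the buffer pool / OS page cache, not the
-- query cache.
RESET QUERY CACHE;                    -- flush cached result sets for a clean re-test
```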
Re: Slow query Performance
In the last episode (Jul 15), Tachu(R) said: I'm having random query slowness that i can only reproduce once. My main question is that the query runs faster the second time around but i dont have query cache enabled. Here is some info from the mysql profiler; the time is spent mostly on the sending data step. First time around: 63 rows in set (0.51 sec). Second time around: 63 rows in set (0.00 sec). Any ideas how i can improve the performance of the query? The explain:

explain select user_id, result_id from score where quiz_id = '495536'
and user_id in (594939703, 641833475, 648583496, 663932271, 791002140,
844089643, 1014189359, 1014996058, 1021011357, 1035297313, 1043753292,
1103136802, 1107070131, 1114628916, 1129457032, 1133091309, 1188705251,
1211995704, 1219452575, 1219631303, 1239604246, 1241474238, 1266412488,
1266549868, 1288719892, 1289732597, 1317205736, 1346089661, 1350738033,
1354967647, 1356046070, 1369391720, 1374076904, 1406156780, 1407302487,
1414151928, 1425275210, 1457839666, 1466635900, 1484315366, 1493410149,
1536834812, 1544094394, 1572354290, 1575139632, 1578136049, 1587129534,
1592996678, 1594617334, 1615538051, 1615906710, 1621733854, 1622940529,
1646693120, 1674002418, 1684166314, 1684535294, 1701864533, 1703227082,
1711823847, 1713353427, 1732903860, 1752595138, 1758240924, 1813245914,
1815724221, 1839942291, 10015406640);

+----+-------------+-------+-------+-----------------+---------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys   | key     | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+-----------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | score | range | user_id,quiz_id | user_id | 12      | NULL |   68 | Using where |
+----+-------------+-------+-------+-----------------+---------+---------+------+------+-------------+
1 row in set (0.02 sec)

UNIQUE KEY `user_id` (`user_id`, `quiz_id`),

Try swapping those fields in the compound index. The way you have it, mysql has to jump to each of the 68 user_id values in the index and see if one of the quiz_ids is 495536.
If you have an index on (quiz_id, user_id), mysql only has to jump to the 495536 quiz_id section, and all the user_ids are right there. That should cut your query time by 50% (since you still have to do 68 seeks to the table rows to fetch result_id). If you also add result_id to your compound index, then mysql will be able to get all its information from the index without having to go to the table at all. -- Dan Nelson dnel...@allantgroup.com
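Dan's advice as DDL - a sketch only; the index names are made up, and the uniqueness constraint on the (user_id, quiz_id) pair is preserved by keeping exactly those two columns in the unique key:

```sql
-- Reorder the unique key so the quiz_id = ... AND user_id IN (...)
-- lookup scans one quiz_id section instead of 68 user_id sections:
ALTER TABLE score
    DROP INDEX user_id,
    ADD UNIQUE KEY quiz_user (quiz_id, user_id);

-- Optionally make the lookup covering, so result_id is read from the
-- index too and the 68 row fetches disappear:
ALTER TABLE score
    ADD INDEX quiz_user_result (quiz_id, user_id, result_id);
```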
Slow query Performance
I'm having random query slowness that i can only reproduce once. My main question is that the query runs faster the second time around but i dont have query cache enabled. Here is some info from the mysql profiler; the time is spent mostly on the sending data step. First time around: 63 rows in set (0.51 sec)

show profile all; (the Messages_sent, Messages_received, Page_faults_major, Page_faults_minor, and Swaps columns were all zero and are omitted below; the output is truncated at the end, as in the original)

+--------------------+----------+----------+------------+-------------------+---------------------+--------------+---------------+-----------------------+---------------+-------------+
| Status             | Duration | CPU_user | CPU_system | Context_voluntary | Context_involuntary | Block_ops_in | Block_ops_out | Source_function       | Source_file   | Source_line |
+--------------------+----------+----------+------------+-------------------+---------------------+--------------+---------------+-----------------------+---------------+-------------+
| starting           | 0.000165 | 0.00     | 0.00       |                 0 |                   0 |            0 |             0 | NULL                  | NULL          |        NULL |
| Opening tables     | 0.33     | 0.00     | 0.00       |                 0 |                   0 |            0 |             0 | open_tables           | sql_base.cc   |        4450 |
| System lock        | 0.20     | 0.00     | 0.00       |                 0 |                   0 |            0 |             0 | mysql_lock_tables     | lock.cc       |         258 |
| Table lock         | 0.28     | 0.00     | 0.00       |                 0 |                   0 |            0 |             0 | mysql_lock_tables     | lock.cc       |         269 |
| init               | 0.52     | 0.00     | 0.00       |                 0 |                   0 |            0 |             0 | mysql_select          | sql_select.cc |        2337 |
| optimizing         | 0.36     | 0.00     | 0.00       |                 0 |                   0 |            0 |             0 | optimize              | sql_select.cc |         762 |
| statistics         | 0.000233 | 0.001000 | 0.00       |                 0 |                   0 |            0 |             0 | optimize              | sql_select.cc |         944 |
| preparing          | 0.31     | 0.00     | 0.00       |                 0 |                   0 |            0 |             0 | optimize              | sql_select.cc |         954 |
| executing          | 0.17     | 0.00     | 0.00       |                 0 |                   0 |            0 |             0 | exec                  | sql_select.cc |        1638 |
| Sending data       | 0.504797 | 0.129980 | 0.012998   |               429 |                  38 |         2456 |            64 | exec                  | sql_select.cc |        2177 |
| end                | 0.54     | 0.00     | 0.00       |                 0 |                   0 |            0 |             0 | mysql_select          | sql_select.cc |        2382 |
| query end          | 0.23     | 0.00     | 0.00       |                 0 |                   0 |            0 |             0 | mysql_execute_command | sql_parse.cc  |        4799 |
| freeing items      | 0.63     | 0.00     | 0.000999   |                 0 |                   0 |            0 |             0 | mysql_parse           | sql_parse.cc  |        5805 |
| logging slow query | 0.18     | 0.00     | 0.00       |                 0 |                   0 |            0 |             0 | log_slow_statement    | sql_parse.cc  |        1608 |
| cleaning up        | 0.20     | 0.00     | 0.00       |                 0 |                   0 |
Re: Slow query Performance
Can you show us the output of DESCRIBE score and SHOW INDEX FROM score? On Wed, Jul 15, 2009 at 6:44 PM, Tachu® tachu1+my...@gmail.com wrote: I'm having random query slowness that i can only reproduce once. My main question is that the query runs faster the second time around but i dont have query cache enabled. Here is some info from the mysql profiler; the time is spent mostly on the sending data step. First time around: 63 rows in set (0.51 sec) show profile all; ... profiler output identical to the original post, snipped ...
Re: Query performance plain/text versus AES_DECRYPT(): LIKE %..%
Ouch, thanks for pointing that out - what a terrible mistake. I am aware of the performance issue, and so is the customer. But with a table that's only going to hold at most 60,000 records in 10 years, I'm not afraid it'll cause significant problems. If it gets out of hand we'll have to think of a better solution. Once again, thanks! John -- / Humanique / Web strategy and development / http://www.humanique.com/ - Humanique is looking for an experienced web developer (PHP). See the vacancy at http://www.humanique.com/ - Jeremy Cole wrote: Hi John, OK, no conspiracy here. Here is your problem: 25 $qry = sprintf("SELECT id, line FROM `encryptietest` WHERE AES_DECRYPT(`field`, '%') LIKE '%%%s%%'", $enckey, $word); You are missing the s in %s for your first string argument, which causes the query to be syntactically incorrect and fail. So your AES test is only testing how quickly you can query with a syntax error. :) After adding the s, the results I get are: ([EMAIL PROTECTED]) [~/datisstom/bench]$ php -q bench.php Control test (plain/text LIKE %..%): 1.383749s Decrypt test (AES_DECRYPT() LIKE %..%): 1.441944s done Nonetheless, I'd still argue that this entire concept is doomed to terrible performance anyway. Regards, Jeremy John Kraal wrote: I put it here: http://pro.datisstom.nl/tests/bench.tar.bz2 The encryption isn't really a *real* security measure, except for when somebody is stupid enough to install phpMyAdmin or anything equivalent and try to get personal data. The problem is the password needs to be somewhere on the application server, and if you're in, you're in. But it's a request and I'm happy to oblige. Even if it only stops them for 1 minute (which could be enough). Regards, John -- / Humanique / Web strategy and development / http://www.humanique.com/ - Humanique is looking for an experienced web developer (PHP). See the vacancy at http://www.humanique.com/ - Jeremy Cole wrote: Hi John, Your attachment for the php code got stripped somewhere.
Can you post it somewhere (http preferable)? In either case it's going to result in a full table scan, so they are actually both a bad strategy long term, but they should in theory perform as you would expect, with encryption being slightly slower. Have you tried with longer strings? What is your customer's fear with having the data in plain text? Presumably in order to use this in your application, you will have the AES password stored in your application, and it will end up in logs (such as the slow query log) quite frequently. I would think your data can be safer and your security more effective by setting some policies which are less intrusive into the actual workings of the data, such as encrypting backups and setting system-level policies. Regards, Jeremy John Kraal wrote: Dear you, I've been working on encrypting some data for a customer. They want their personal/sensitive information encrypted in the database, but they want to be able to search it too, through the application. So we've been thinking a bit, and just started trying and benchmarking some solutions we thought up. The next one really got my attention. I created a table with 4 fields: 1. id (primary/auto_increment, not really interesting) 2. field, with encrypted data 3. md5sum (it has no special use, we benched it though.) 4. line, always containing three words (the same three as encrypted) When we started querying the table for random words (from lipsum.com), it seems that searching in the encrypted fields was _lots_ faster. Results below: 1.000 queries per field: ~$ php -q searchtest.php Control test (plain/text LIKE %..%): 1.409699s Decrypt test (AES_DECRYPT() LIKE %..%): 1.226069s done 1.000.000 queries per field: ~$ php -q searchtest.php Control test (plain/text LIKE %..%): 155.059671s Decrypt test (AES_DECRYPT() LIKE %..%): 137.003216s done Actually, the only thing I could think of to say was: Well, at least it's consistent. I've attached all the files I used for this test.
Edit db.inc.php (add some more lipsum if you want), execute fill.php, and then have fun with bench.php. Does any of you know why this is, how come, etc? I'm just very curious. Regards, John Kraal
Re: Query performance plain/text versus AES_DECRYPT(): LIKE %..%
I put it here: http://pro.datisstom.nl/tests/bench.tar.bz2 The encryption isn't really a *real* security measure, except for when somebody is stupid enough to install phpMyAdmin or anything equivalent and try to get personal data. The problem is the password needs to be somewhere on the application server, and if you're in, you're in. But it's a request and I'm happy to oblige. Even if it only stops them for 1 minute (which could be enough). Regards, John -- / Humanique / Web strategy and development / http://www.humanique.com/ - Humanique is looking for an experienced web developer (PHP). See the vacancy at http://www.humanique.com/ - Jeremy Cole wrote: Hi John, Your attachment for the php code got stripped somewhere. Can you post it somewhere (http preferable)? In either case it's going to result in a full table scan, so they are actually both a bad strategy long term, but they should in theory perform as you would expect, with encryption being slightly slower. Have you tried with longer strings? What is your customer's fear with having the data in plain text? Presumably in order to use this in your application, you will have the AES password stored in your application, and it will end up in logs (such as the slow query log) quite frequently. I would think your data can be safer and your security more effective by setting some policies which are less intrusive into the actual workings of the data, such as encrypting backups and setting system-level policies. Regards, Jeremy John Kraal wrote: Dear you, I've been working on encrypting some data for a customer. They want their personal/sensitive information encrypted in the database, but they want to be able to search it too, through the application. So we've been thinking a bit, and just started trying and benchmarking some solutions we thought up. The next one really got my attention. I created a table with 4 fields: 1. id (primary/auto_increment, not really interesting) 2. field, with encrypted data 3.
md5sum (it has no special use, we benched it though.) 4. line, always containing three words (the same three as encrypted) When we started querying the table for random words (from lipsum.com), it seems that searching in the encrypted fields was _lots_ faster. Results below: 1.000 queries per field: ~$ php -q searchtest.php Control test (plain/text LIKE %..%): 1.409699s Decrypt test (AES_DECRYPT() LIKE %..%): 1.226069s done 1.000.000 queries per field: ~$ php -q searchtest.php Control test (plain/text LIKE %..%): 155.059671s Decrypt test (AES_DECRYPT() LIKE %..%): 137.003216s done Actually, the only thing I could think of to say was: Well, at least it's consistent. I've attached all the files I used for this test. Edit db.inc.php (add some more lipsum if you want), execute fill.php, and then have fun with bench.php. Does any of you know why this is, how come, etc? I'm just very curious. Regards, John Kraal
Re: Query performance plain/text versus AES_DECRYPT(): LIKE %..%
Hi John, OK, no conspiracy here. Here is your problem: 25 $qry = sprintf("SELECT id, line FROM `encryptietest` WHERE AES_DECRYPT(`field`, '%') LIKE '%%%s%%'", $enckey, $word); You are missing the s in %s for your first string argument, which causes the query to be syntactically incorrect and fail. So your AES test is only testing how quickly you can query with a syntax error. :) After adding the s, the results I get are: ([EMAIL PROTECTED]) [~/datisstom/bench]$ php -q bench.php Control test (plain/text LIKE %..%): 1.383749s Decrypt test (AES_DECRYPT() LIKE %..%): 1.441944s done Nonetheless, I'd still argue that this entire concept is doomed to terrible performance anyway. Regards, Jeremy John Kraal wrote: I put it here: http://pro.datisstom.nl/tests/bench.tar.bz2 The encryption isn't really a *real* security measure, except for when somebody is stupid enough to install phpMyAdmin or anything equivalent and try to get personal data. The problem is the password needs to be somewhere on the application server, and if you're in, you're in. But it's a request and I'm happy to oblige. Even if it only stops them for 1 minute (which could be enough). Regards, John -- / Humanique / Web strategy and development / http://www.humanique.com/ - Humanique is looking for an experienced web developer (PHP). See the vacancy at http://www.humanique.com/ - Jeremy Cole wrote: Hi John, Your attachment for the php code got stripped somewhere. Can you post it somewhere (http preferable)? In either case it's going to result in a full table scan, so they are actually both a bad strategy long term, but they should in theory perform as you would expect, with encryption being slightly slower. Have you tried with longer strings? What is your customer's fear with having the data in plain text? Presumably in order to use this in your application, you will have the AES password stored in your application, and it will end up in logs (such as the slow query log) quite frequently.
I would think your data can be safer and your security more effective by setting some policies which are less intrusive into the actual workings of the data, such as encrypting backups and setting system-level policies. Regards, Jeremy John Kraal wrote: Dear you, I've been working on encrypting some data for a customer. They want their personal/sensitive information encrypted in the database, but they want to be able to search it too, through the application. So we've been thinking a bit, and just started trying and benchmarking some solutions we thought up. The next one really got my attention. I created a table with 4 fields: 1. id (primary/auto_increment, not really interesting) 2. field, with encrypted data 3. md5sum (it has no special use, we benched it though.) 4. line, always containing three words (the same three as encrypted) When we started querying the table for random words (from lipsum.com), it seems that searching in the encrypted fields was _lots_ faster. Results below: 1.000 queries per field: ~$ php -q searchtest.php Control test (plain/text LIKE %..%): 1.409699s Decrypt test (AES_DECRYPT() LIKE %..%): 1.226069s done 1.000.000 queries per field: ~$ php -q searchtest.php Control test (plain/text LIKE %..%): 155.059671s Decrypt test (AES_DECRYPT() LIKE %..%): 137.003216s done Actually, the only thing I could think of to say was: Well, at least it's consistent. I've attached all the files I used for this test. Edit db.inc.php (add some more lipsum if you want), execute fill.php, and then have fun with bench.php. Does any of you know why this is, how come, etc? I'm just very curious. Regards, John Kraal -- high performance mysql consulting www.provenscaling.com
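For reference, this is the shape of query the benchmark should have been issuing once the missing s is restored; 'secret' and 'ipsum' are placeholder values standing in for $enckey and $word:

```sql
SELECT id, line
FROM `encryptietest`
WHERE AES_DECRYPT(`field`, 'secret') LIKE '%ipsum%';
-- Note: AES_DECRYPT() on every row plus a leading-wildcard LIKE forces
-- a full table scan, which is why this approach stays slow at any size.
```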
Query performance plain/text versus AES_DECRYPT(): LIKE %..%
Dear you, I've been working on encrypting some data for a customer. They want their personal/sensitive information encrypted in the database, but they want to be able to search it too, through the application. So we've been thinking a bit, and just started trying and benchmarking some solutions we thought up. The next one really got my attention. I created a table with 4 fields: 1. id (primary/auto_increment, not really interesting) 2. field, with encrypted data 3. md5sum (it has no special use, we benched it though.) 4. line, always containing three words (the same three as encrypted) When we started querying the table for random words (from lipsum.com), it seems that searching in the encrypted fields was _lots_ faster. Results below:

1.000 queries per field:
~$ php -q searchtest.php
Control test (plain/text LIKE %..%):    1.409699s
Decrypt test (AES_DECRYPT() LIKE %..%): 1.226069s
done

1.000.000 queries per field:
~$ php -q searchtest.php
Control test (plain/text LIKE %..%):    155.059671s
Decrypt test (AES_DECRYPT() LIKE %..%): 137.003216s
done

Actually, the only thing I could think of to say was: Well, at least it's consistent. I've attached all the files I used for this test. Edit db.inc.php (add some more lipsum if you want), execute fill.php, and then have fun with bench.php. Does any of you know why this is, how come, etc? I'm just very curious. Regards, John Kraal -- / Humanique / Web strategy and development / http://www.humanique.com/ - Humanique is looking for an experienced web developer (PHP). See the vacancy at http://www.humanique.com/ -

-- MySQL dump 10.10
--
-- Host: localhost    Database: hmnq_test
-- ------------------------------------------------------
-- Server version       5.0.30-Debian_3-log

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;
/*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */;
/*!40103 SET TIME_ZONE='+00:00' */;
/*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;
/*!40111 SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0 */;

--
-- Table structure for table `encryptie`
--

DROP TABLE IF EXISTS `encryptie`;
CREATE TABLE `encryptie` (
  `id` int(11) NOT NULL auto_increment,
  `field` blob NOT NULL,
  `md5sum` varchar(32) NOT NULL,
  `line` varchar(255) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `md5sum` (`md5sum`),
  KEY `line` (`line`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

--
-- Dumping data for table `encryptie`
--

/*!40103 SET TIME_ZONE=@OLD_TIME_ZONE */;
/*!40101 SET SQL_MODE=@OLD_SQL_MODE */;
/*!40014 SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS */;
/*!40014 SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS */;
/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
/*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
/*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;
/*!40111 SET SQL_NOTES=@OLD_SQL_NOTES */;

-- Dump completed on 2007-10-23 12:17:47
RE: Query performance plain/text versus AES_DECRYPT(): LIKE %..%
Have you tried reversing the order of your tests, to see if there is some influence from caching? Regards, Jerry Schwartz The Infoshop by Global Information Incorporated 195 Farmington Ave. Farmington, CT 06032 860.674.8796 / FAX: 860.674.8341 www.the-infoshop.com www.giiexpress.com www.etudes-marche.com -----Original Message----- From: John Kraal [mailto:[EMAIL PROTECTED]] Sent: Tuesday, October 23, 2007 8:51 AM To: mysql@lists.mysql.com Subject: Query performance plain/text versus AES_DECRYPT(): LIKE %..% Dear you, I've been working on encrypting some data for a customer. They want their personal/sensitive information encrypted in the database, but they want to be able to search it too, through the application. So we've been thinking a bit, and just started trying and benchmarking some solutions we thought up. The next one really got my attention. I created a table with 4 fields: 1. id (primary/auto_increment, not really interesting) 2. field, with encrypted data 3. md5sum (it has no special use, we benched it though.) 4. line, always containing three words (the same three as encrypted) When we started querying the table for random words (from lipsum.com), it seems that searching in the encrypted fields was _lots_ faster. Results below: 1.000 queries per field: ~$ php -q searchtest.php Control test (plain/text LIKE %..%): 1.409699s Decrypt test (AES_DECRYPT() LIKE %..%): 1.226069s done 1.000.000 queries per field: ~$ php -q searchtest.php Control test (plain/text LIKE %..%): 155.059671s Decrypt test (AES_DECRYPT() LIKE %..%): 137.003216s done Actually, the only thing I could think of to say was: Well, at least it's consistent. I've attached all the files I used for this test. Edit db.inc.php (add some more lipsum if you want), execute fill.php, and then have fun with bench.php. Does any of you know why this is, how come, etc? I'm just very curious.
Regards,

John Kraal
--
/ Humanique / Web strategy and development / http://www.humanique.com/
Humanique is looking for an experienced web developer (PHP). See the vacancy at http://www.humanique.com/
Re: Query performance plain/text versus AES_DECRYPT(): LIKE %..%
Yes, I did, and I also shut the server down between the tests and between the reversed runs.

Jerry Schwartz wrote:
Have you tried reversing the order of your tests, to see if there is some influence from caching?
... snipped ...
Re: Query performance plain/text versus AES_DECRYPT(): LIKE %..%
Hi John,

Your attachment with the PHP code got stripped somewhere. Can you post it somewhere (HTTP preferable)?

In either case it's going to result in a full table scan, so both are actually a bad strategy long term, but in theory they should perform as you would expect, with encryption being slightly slower. Have you tried it with longer strings?

What is your customer's fear with having the data in plain text? Presumably, in order to use this in your application, you will have the AES password stored in the application, and it will end up in logs (such as the slow query log) quite frequently. I would think your data can be safer and your security more effective by setting some policies which are less intrusive into the actual workings of the data, such as encrypting backups and setting system-level policies.

Regards,

Jeremy

John Kraal wrote:
Dear you, I've been working on encrypting some data for a customer. They want their personal/sensitive information encrypted in the database, but they want to be able to search it too, through the application. ... snipped ...

--
high performance mysql consulting
www.provenscaling.com
Re: Query performance.
Thanks a lot!! :D You were right. There was a bug. Upgrading to MySQL 4.1.20 solved my problem.

Daniel da Veiga wrote:
Check http://bugs.mysql.com/bug.php?id=12915
Query performance.
Hi, List!

I'm a little bit confused by (IMHO) poor query performance.

I have a table with 1,000,000 records. The table consists of 2 service fields and a number of data fields. The service fields are status and processor_id (added for concurrent queue processing). The question is why updates are so slow. A query like:

UPDATE queue SET status=1 WHERE status=0 LIMIT 1;

takes about 5 seconds, while this:

SELECT * FROM queue WHERE status=0 LIMIT 1;

takes 0.01-0.02 seconds.

As I can see in the process list, most of the time the query is in "Searching rows for update", which is very strange. I thought UPDATE searches rows the same way SELECT does. Doesn't it? Actually, it seems like it does, because if I remove all fields except id and status, both queries (SELECT and UPDATE) work quite fast.

So, why is my update query so slow? What can I do to make it work faster? Can I somehow find out what the bottleneck is here? Maybe I should increase some buffers or something else? I copied the my-huge my.cnf sample from the MySQL distribution.

I'm looking forward to any help because I'm stuck with this and don't know what to do. Thanks in advance to all!
P.S.: Some table info:

mysql> show table status like 'queue'\G
           Name: queue
         Engine: InnoDB
        Version: 9
     Row_format: Dynamic
           Rows: 726423
 Avg_row_length: 159
    Data_length: 116031488
Max_data_length: NULL
   Index_length: 32555008
      Data_free: 0
 Auto_increment: 101
    Create_time: 2006-06-06 22:01:21
    Update_time: NULL
     Check_time: NULL
      Collation: koi8r_general_ci
       Checksum: NULL
 Create_options:
        Comment: InnoDB free: 68608 kB

mysql> show indexes from queue;
+-------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name     | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| queue |          0 | PRIMARY      |            1 | id           | A         |     1170633 |     NULL | NULL   |      | BTREE      |         |
| queue |          1 | status       |            1 | status       | A         |          18 |     NULL | NULL   |      | BTREE      |         |
| queue |          1 | processor_id |            1 | processor_id | A         |          18 |     NULL | NULL   | YES  | BTREE      |         |
+-------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
3 rows in set (0.01 sec)

/etc/my.cnf:
...
[mysqld]
port = 3306
socket = /tmp/mysql.sock
skip-locking
key_buffer = 384M
max_allowed_packet = 1M
table_cache = 512
sort_buffer_size = 2M
read_buffer_size = 2M
read_rnd_buffer_size = 8M
myisam_sort_buffer_size = 64M
thread_cache_size = 8
query_cache_size = 32M
# Try number of CPU's*2 for thread_concurrency
thread_concurrency = 4
...

--
BR, Eugene Kosov
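[Editor's aside: a common way to sidestep a slow "Searching rows for update" on a queue table like this one is to locate the row first and then update it by primary key. This is only a hedged sketch, assuming InnoDB and transactions as in the table above; the id value 42 is hypothetical.]

-- Sketch only: claim one queue row in two steps instead of
-- UPDATE ... WHERE status=0 LIMIT 1.
START TRANSACTION;

-- Lock one unclaimed row; the status index makes this lookup cheap.
SELECT id FROM queue WHERE status = 0 LIMIT 1 FOR UPDATE;

-- Suppose the SELECT returned id = 42 (hypothetical value);
-- the UPDATE now hits exactly one row via the primary key.
UPDATE queue SET status = 1 WHERE id = 42;

COMMIT;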
Re: Query performance.
On 6/6/06, Eugene Kosov <[EMAIL PROTECTED]> wrote:
Hi, List! I'm a little bit confused with (IMHO) poor query performance. The question is why are updates so slow? ... snipped ...

Check http://bugs.mysql.com/bug.php?id=12915

--
Daniel da Veiga
Computer Operator - RS - Brazil
Re: Slow Query Performance
Hi Green,

Scrubbing the data is a great suggestion; I will definitely try that out. I did try the other option, using REGEXP instead of matching individual conditions. It definitely cleaned up the implementation, but did not really improve the performance.

-Harini

[EMAIL PROTECTED] wrote:
... snipped ...
Until then, you can use RLIKE or REGEXP to minimize how many search terms you are evaluating. For example, all of the LIKE terms for 'V.P.' and 'VP' can be simplified to just:

emp.title REGEXP '[:space:]*V.*P[ ,.]*'

Would that help?

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine
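[Editor's aside, a hedged note on the pattern above: in a REGEXP an unescaped dot matches any character, so 'V.*P' also matches unrelated titles. MySQL's pre-8.0 regex library supports the [[:<:]] and [[:>:]] word-boundary markers, which allow whole-word matching without enumerating the surrounding punctuation. A sketch only, using the table and column names from the original query:]

-- Sketch: whole-word match for VP / V.P. / Vice President / Vice-President.
-- [[:<:]] and [[:>:]] are word boundaries; \\. is a literal dot.
SELECT COUNT(emp.id)
FROM employment AS emp
WHERE emp.active = 1
  AND (emp.title REGEXP '[[:<:]]V\\.?P[[:>:]]'
    OR emp.title REGEXP '[[:<:]]Vice[ -]President[[:>:]]');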
Re: Slow Query Performance
Hi Brent,

Using REGEXP did not really help with the performance. I need to do whole-word matching, so I would prefer not to use LIKE '%Vice President%', as it may return some false matches.

I separated some of the text-based columns into a different table using the MyISAM storage engine. Using FULLTEXT with IN BOOLEAN MODE was a bit slow, but without IN BOOLEAN MODE it seems fast. However, I think the phrase searches are not working properly. For example, the query below returned records where the title was 'Vice Chairman':

select emp.title from employment_title emp where MATCH(emp.title) AGAINST('Vice President')

I have verified the syntax of the phrase query, and it seems to be correct. Any idea why this is happening? Also, if I have multiple phrases, is the following query syntax correct?

select emp.title from employment_title emp where MATCH(emp.title) AGAINST('V.P. VP Vice President Vice-President')

Thanks,
Harini

Brent Baisley wrote:
Egads! That's a lot of OR's. You probably want to use REGEXP instead of all those OR's. REGEXP can be slow, but you'll be doing far fewer comparisons than what you have in your current query, so it may be faster than what you have. Something like this:

SELECT ... WHERE ... emp.title REGEXP 'V[.]?P[.]?' OR emp.title REGEXP 'Vice[-]? President'

I think that will match everything you have. At the very least you don't have to check for periods, commas, spaces and everything else before and after what you are searching on. Searching on emp.title LIKE '%Vice President%' will find 'Vice President' anywhere in the text, regardless of what comes before or after it.

One thing you should think about is breaking out the fields you need to full-text search into a separate table that you can make MyISAM. You'll be adding a JOIN to the queries that need the full-text search, but it should be a lot quicker and your queries much simpler. You'll then have a mix of InnoDB and MyISAM tables, which is perfectly legal.

On Oct 4, 2005, at 11:17 AM, Harini Raghavan wrote:
Hi, I am using MYSQL 4.1 database in my J2ee application. ... snipped ...
Re: Slow Query Performance
You're still doing a full table scan with REGEXP, so you'll never get it really fast. I was thinking it would be slightly faster because of fewer comparisons. It's the full table scan and no use of indexes that you want to get away from. Without doing that, the only way to get things faster is with faster disks and more RAM for caching.

I assume you created a full-text index on the title field? Sorry, but I have to ask. You can perform a full-text search without a full-text index, but it's not going to be fast, since no index is used.

Try adding a plus sign in front of the words or phrases you want to search on when using IN BOOLEAN MODE:

MATCH(emp.title) AGAINST('+Vice President' IN BOOLEAN MODE)

Although I don't know why it would be matching 'Vice Chairman'; that's just not right. Are you patched to the latest version of MySQL? I remember getting some really weird results with full-text searches with an older version of 4.1.

Regardless, you will have to adjust the full-text settings and reindex to find words like VP, since that is shorter than the default 4-character minimum word length. I had to do this to search on things like VB (Visual Basic). MySQL still won't index V.P., since periods aren't valid characters for words, and you wouldn't want MySQL indexing single-letter words anyway. I have a similar problem trying to search on C, the programming language. I've figured out a workaround for C++, but not C.

I think you're stuck with doing at least a little data scrubbing, plus cleanup of new data before saving.

On Oct 5, 2005, at 7:05 AM, Harini Raghavan wrote:
Hi Brent, Using REGEXP did not really help with the performance. ... snipped ...
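[Editor's aside on the phrase problem discussed above: in natural-language mode, AGAINST('Vice President') matches rows containing either word, which is why 'Vice Chairman' comes back (it shares the word 'Vice'). In boolean mode, double quotes force a phrase match. A hedged sketch against the employment_title table from the thread; note that VP and V.P. still fall below the default ft_min_word_len of 4 and would need a reindex with a lower setting:]

-- Sketch: boolean-mode phrase search. The "..." operator groups words
-- into a phrase, so a row matching only 'Vice' (e.g. 'Vice Chairman')
-- is not returned.
SELECT emp.title
FROM employment_title AS emp
WHERE MATCH(emp.title)
      AGAINST('"Vice President" "Vice-President"' IN BOOLEAN MODE);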
Slow Query Performance
Hi,

I am using a MySQL 4.1 database in my J2EE application. I am facing performance issues with some queries that are being run on text fields. Since the MyISAM storage engine does not support transactions (and my application requires the database tables to support transactions), I have not been able to use FULLTEXT searches. I need to perform token matching against the text fields and so use a lot of LIKE expressions in the query. Below is one such query, which is taking almost 2 minutes to execute:

select count(emp.id)
from executive as exec1, company comp, target_company targetComp, employment as emp
where emp.executive_id = exec1.id
and emp.company_id = comp.id
and comp.id = targetComp.company_id
and ((emp.title like 'V.P.' OR emp.title like 'V.P. %' OR emp.title like 'V.P., %' OR emp.title like 'V.P.. %' OR emp.title like '% V.P. %' OR emp.title like '% V.P., %' OR emp.title like '% V.P.. %' OR emp.title like '% V.P.' OR emp.title like '% V.P..')
OR (emp.title like 'VP' OR emp.title like 'VP %' OR emp.title like 'VP, %' OR emp.title like 'VP. %' OR emp.title like '% VP %' OR emp.title like '% VP, %' OR emp.title like '% VP. %' OR emp.title like '% VP' OR emp.title like '% VP.')
OR (emp.title like 'Vice President' OR emp.title like 'Vice President %' OR emp.title like 'Vice President, %' OR emp.title like 'Vice President. %' OR emp.title like '% Vice President %' OR emp.title like '% Vice President, %' OR emp.title like '% Vice President. %' OR emp.title like '% Vice President' OR emp.title like '% Vice President.')
OR (emp.title like 'Vice-President' OR emp.title like 'Vice-President %' OR emp.title like 'Vice-President, %' OR emp.title like 'Vice-President. %' OR emp.title like '% Vice-President %' OR emp.title like '% Vice-President, %' OR emp.title like '% Vice-President. %' OR emp.title like '% Vice-President' OR emp.title like '% Vice-President.'))
and emp.active = 1

Does MySQL provide any other option to perform text-based searches? Can someone suggest any tips for performance tuning the database in this scenario?

Thanks,
Harini
Re: Slow Query Performance
Harini Raghavan <[EMAIL PROTECTED]> wrote on 10/04/2005 11:17:48 AM:
Hi, I am using MYSQL 4.1 database in my J2ee application. I am facing performance issues with some queries that are being run on text fields. ... snipped ...

It seems to me that you are trying to search on unscrubbed data. I would guess that you are collecting this data from a variety of sources, and that those sources don't always use the same abbreviation or punctuation. However, it is better to deal with this kind of issue as the data arrives (before it enters your database) than to deal with it during retrieval (as you are trying to do). You need to scrub your data and standardize on certain names and abbreviations. Decide that V.P. is going to be your standard for Vice President (and any of its variants) and update all of your data to match.

Until then, you can use RLIKE or REGEXP to minimize how many search terms you are evaluating:

http://dev.mysql.com/doc/mysql/en/string-comparison-functions.html
http://dev.mysql.com/doc/mysql/en/regexp.html

For example, all of the LIKE terms for 'V.P.' and 'VP' can be simplified to just:

emp.title REGEXP '[:space:]*V.*P[ ,.]*'

Would that help?

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine
Re: Slow Query Performance
Hello. Does MySQL provide any other option to perform text-based searches? Can someone suggest any tips for performance tuning the database in this scenario?

Use the same queries linked with UNION instead of a lot of ORs in the WHERE clause. For example, this query can't use an index (at least in the 4.1 branch):

SELECT A FROM B WHERE A LIKE 'h%' OR A LIKE 'l%';

But this equivalent one can:

SELECT A FROM B WHERE A LIKE 'h%' UNION SELECT A FROM B WHERE A LIKE 'l%';

Check that you have an index on emp.title. MySQL 5.0 has the so-called 'Index Merge Optimization' which might be helpful in your case. See:
http://dev.mysql.com/doc/mysql/en/index-merge-optimization.html

Harini Raghavan wrote:

Hi, I am using the MySQL 4.1 database in my J2EE application. I am facing performance issues with some queries that are being run on text fields. Since the MyISAM storage engine does not support transactions (and my application requires the database tables to support transactions), I have not been able to use FULLTEXT searches. I need to perform token matching against the text fields and so use a lot of LIKE expressions in the query. Below is one such query, which is taking almost 2 mins to execute.

select count(emp.id) from executive as exec1, company comp, target_company targetComp, employment as emp where emp.executive_id = exec1.id and emp.company_id = comp.id and comp.id = targetComp.company_id and ((emp.title like 'V.P.' OR emp.title like 'V.P. %' OR emp.title like 'V.P., %' OR emp.title like 'V.P.. %' OR emp.title like '% V.P. %' OR emp.title like '% V.P., %' OR emp.title like '% V.P.. %' OR emp.title like '% V.P.' OR emp.title like '% V.P..') OR (emp.title like 'VP' OR emp.title like 'VP %' OR emp.title like 'VP, %' OR emp.title like 'VP. %' OR emp.title like '% VP %' OR emp.title like '% VP, %' OR emp.title like '% VP. %' OR emp.title like '% VP' OR emp.title like '% VP.') OR (emp.title like 'Vice President' OR emp.title like 'Vice President %' OR emp.title like 'Vice President, %' OR emp.title like 'Vice President. %' OR emp.title like '% Vice President %' OR emp.title like '% Vice President, %' OR emp.title like '% Vice President. %' OR emp.title like '% Vice President' OR emp.title like '% Vice President.') OR (emp.title like 'Vice-President' OR emp.title like 'Vice-President %' OR emp.title like 'Vice-President, %' OR emp.title like 'Vice-President. %' OR emp.title like '% Vice-President %' OR emp.title like '% Vice-President, %' OR emp.title like '% Vice-President. %' OR emp.title like '% Vice-President' OR emp.title like '% Vice-President.')) and emp.active = 1

Does MySQL provide any other option to perform text-based searches? Can someone suggest any tips for performance tuning the database in this scenario? Thanks, Harini

-- 
For technical support contracts, go to https://order.mysql.com/?ref=ensita
This email is sponsored by Ensita.NET http://www.ensita.net/

Gleb Paharenko
[EMAIL PROTECTED]
MySQL AB / Ensita.NET
www.mysql.com

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
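Gleb's OR-to-UNION rewrite above is easy to verify: both forms must return the same result set. The sketch below demonstrates the equivalence with Python's built-in `sqlite3` (a stand-in for MySQL 4.1; the table and data are hypothetical, and SQLite's optimizer differs, so this only shows the rewrite is result-preserving, not its index behavior).

```python
import sqlite3

# In-memory stand-in table matching Gleb's example names B and A.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE B (A TEXT)")
conn.executemany("INSERT INTO B VALUES (?)",
                 [("hat",), ("lamp",), ("mat",), ("hill",)])

# OR form: one pass, two LIKE tests per row.
or_rows = conn.execute(
    "SELECT A FROM B WHERE A LIKE 'h%' OR A LIKE 'l%' ORDER BY A").fetchall()

# UNION form: two prefix scans that an index can serve, with
# duplicates removed by UNION.
union_rows = conn.execute(
    "SELECT A FROM B WHERE A LIKE 'h%' "
    "UNION SELECT A FROM B WHERE A LIKE 'l%' ORDER BY A").fetchall()

assert or_rows == union_rows  # the rewrite changes the plan, not the answer
```

In MySQL 4.1 the UNION form lets each branch use a prefix range on an `emp.title` index, which is exactly what the OR form prevented.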
Re: Query performance...two table design options
James Tu wrote: Hi: Let's say I want to store the following information. Unique ID - INT(10) autoincrement, First Name - VARCHAR(25), Last Name - VARCHAR(25), Age - INT(3)

In general 'age' is a bad column, because you need to know what year the data was entered to calculate the current age. It is often better to store year of birth or date of birth. This may not be relevant to your application, I just wanted to mention it.

Date - DATETIME, Activity - VARCHAR(100), Data - TEXT. I would be basing my queries on all columns _except_ the Data column. I.e. I would be using WHEREs with all except the Data column.

You are not telling us how much data you are planning to maintain. How big will the Data column be, on average, and how many rows/persons are we talking about? Hundreds, thousands or millions?

My question is... which design would perform better?

(Design A) Put all in one table... index all the columns that I will use WHERE with.

-TABLE_ALL-
Unique ID - INT(10) autoincrement
First Name - VARCHAR(25)
Last Name - VARCHAR(25)
Age - INT(3)
Date - DATETIME
Activity - VARCHAR(100)
Data - TEXT
Indices - Unique ID, First Name, Last Name, Age, Date, Activity

You will probably not need to index all columns. If you have few rows, you don't need indexes at all, except for the primary key on the unique ID. A primary key automatically works as an index. I would probably start with only the primary key, and add indexes only when I find that some queries are too slow.

SELECT First_Name, Last_Name, Data FROM TABLE_ALL WHERE Activity = 'draw' AND Age > 24;

(Design B) Put the Data in its own separate table.

-TABLE_A-
Unique ID - INT(10) autoincrement
First Name - VARCHAR(25)
Last Name - VARCHAR(25)
Age - INT(3)
Date - DATETIME
Activity - VARCHAR(100)
Data_ID - INT(10)
Indices - Unique ID, First Name, Last Name, Age, Date, Activity

-TABLE_B-
Data_ID - INT(10)
Data - TEXT
Index - Data_ID

This will be faster if your Data column is relatively big (several K on average, I don't know; it depends on your HW, of course). I would suggest using the unique ID from TABLE_A as the primary key in TABLE_B, and dropping Data_ID from TABLE_A.

If there are millions of rows I would normalize these tables to the extreme, something like this:

Person: P_Id, Born
FName: FN_Id, FirstName
LName: LN_Id, LastName
FN_P: FN_Id, P_Id
LN_P: LN_Id, P_Id
Activity: A_Id, Activity
Act_P: A_Id, P_Id
Data: P_Id, Data

FN_P and LN_P are so-called link tables, linking names to persons in a many-to-many relation. Even further normalization could be achieved with an additional counter column. It would be used in these tables to maintain the order of the names when a person has multiple first names or last names, so that you would have one FName row for each unique name; Mary Jane would be split into Mary and Jane.

You could query this schema like this:

SELECT FirstName, LastName, Data FROM Person, FName, LName, Data, Activity, FN_P, LN_P, Act_P WHERE Person.P_Id = Data.P_Id AND Person.P_Id = FN_P.P_Id AND Person.P_Id = LN_P.P_Id AND Person.P_Id = Act_P.P_Id AND FName.FN_Id = FN_P.FN_Id AND LName.LN_Id = LN_P.LN_Id AND Activity.A_Id = Act_P.A_Id AND Activity = 'draw' AND Born < year(now()) - 24

...or with more explicitly formulated joins, like this:

SELECT FirstName, LastName, Data FROM Person NATURAL JOIN Act_P NATURAL JOIN Activity INNER JOIN FN_P ON FN_P.P_Id = Person.P_Id NATURAL JOIN FName INNER JOIN LN_P ON LN_P.P_Id = Person.P_Id NATURAL JOIN LName LEFT JOIN Data ON Data.P_Id = Person.P_Id WHERE Activity = 'draw' AND Born < year(now()) - 24

The NATURAL JOINs are joins based on columns with the same name in the two joined tables, see the manual. The LEFT JOIN is used in this case because some Persons may not have a corresponding row in the Data table; in that case the Data column of the result table will contain NULL. If you used an INNER join in place of the LEFT join here, Persons without a Data record would be omitted from the result.
SELECT TABLE_A.First_Name, TABLE_A.Last_Name, TABLE_B.Data FROM TABLE_A, TABLE_B WHERE Activity = 'draw' AND Age > 24 AND TABLE_A.Data_ID = TABLE_B.Data_ID;

(Aside: Would this query give me the same results as the above query?)

Yes, I think so, if all rows in TABLE_A have a corresponding row in TABLE_B.

-- 
Roger
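Roger's aside can be checked directly: Design B's two-table join returns the same rows as Design A's single table, provided every TABLE_A row has a matching TABLE_B row. A minimal sketch with Python's `sqlite3` (schema and sample data are hypothetical stand-ins for the thread's tables):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TABLE_ALL (id INTEGER PRIMARY KEY, First_Name TEXT,
    Last_Name TEXT, Age INTEGER, Activity TEXT, Data TEXT);
CREATE TABLE TABLE_A (id INTEGER PRIMARY KEY, First_Name TEXT,
    Last_Name TEXT, Age INTEGER, Activity TEXT, Data_ID INTEGER);
CREATE TABLE TABLE_B (Data_ID INTEGER PRIMARY KEY, Data TEXT);
INSERT INTO TABLE_ALL VALUES (1,'Ann','Lee',30,'draw','sketch1'),
                             (2,'Bob','Ray',20,'draw','sketch2');
INSERT INTO TABLE_A VALUES (1,'Ann','Lee',30,'draw',10),
                           (2,'Bob','Ray',20,'draw',11);
INSERT INTO TABLE_B VALUES (10,'sketch1'),(11,'sketch2');
""")

# Design A: everything in one wide table.
design_a = conn.execute(
    "SELECT First_Name, Last_Name, Data FROM TABLE_ALL "
    "WHERE Activity = 'draw' AND Age > 24").fetchall()

# Design B: TEXT payload split out, rejoined via Data_ID.
design_b = conn.execute(
    "SELECT TABLE_A.First_Name, TABLE_A.Last_Name, TABLE_B.Data "
    "FROM TABLE_A, TABLE_B "
    "WHERE Activity = 'draw' AND Age > 24 "
    "AND TABLE_A.Data_ID = TABLE_B.Data_ID").fetchall()

assert design_a == design_b  # equivalent while the join never misses
```

If a TABLE_A row lacked its TABLE_B partner, the inner join would silently drop it, which is exactly the caveat in the reply.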
Query performance...two table design options
Hi: Let's say I want to store the following information.

Unique ID - INT(10) autoincrement
First Name - VARCHAR(25)
Last Name - VARCHAR(25)
Age - INT(3)
Date - DATETIME
Activity - VARCHAR(100)
Data - TEXT

I would be basing my queries on all columns _except_ the Data column. I.e. I would be using WHEREs with all except the Data column.

My question is... which design would perform better?

(Design A) Put all in one table... index all the columns that I will use WHERE with.

-TABLE_ALL-
Unique ID - INT(10) autoincrement
First Name - VARCHAR(25)
Last Name - VARCHAR(25)
Age - INT(3)
Date - DATETIME
Activity - VARCHAR(100)
Data - TEXT
Indices - Unique ID, First Name, Last Name, Age, Date, Activity

SELECT First_Name, Last_Name, Data FROM TABLE_ALL WHERE Activity = 'draw' AND Age > 24;

(Design B) Put the Data in its own separate table.

-TABLE_A-
Unique ID - INT(10) autoincrement
First Name - VARCHAR(25)
Last Name - VARCHAR(25)
Age - INT(3)
Date - DATETIME
Activity - VARCHAR(100)
Data_ID - INT(10)
Indices - Unique ID, First Name, Last Name, Age, Date, Activity

-TABLE_B-
Data_ID - INT(10)
Data - TEXT
Index - Data_ID

SELECT TABLE_A.First_Name, TABLE_A.Last_Name, TABLE_B.Data FROM TABLE_A, TABLE_B WHERE Activity = 'draw' AND Age > 24 AND TABLE_A.Data_ID = TABLE_B.Data_ID;

(Aside: Would this query give me the same results as the above query?)

-James
Query Performance
Hi Folks! Here, I have the following table:

CREATE TABLE `accesses` (
  `time` varchar(15) NOT NULL default '',
  `duration` int(10) default NULL,
  `user` varchar(25) NOT NULL default '',
  `ipaddr` varchar(15) NOT NULL default '',
  `result` varchar(30) default NULL,
  `bytes` int(10) default NULL,
  `reqmethod` varchar(10) default NULL,
  `urlparent` varchar(100) NOT NULL default '',
  KEY `usuario` (`usuario`),
  KEY `time_result` (`time`, `result`)
);

If my table has a great number of rows (something like 5 million), the query takes far too long:

select user, count(distinct concat(date_format(from_unixtime(time), '%d/%m/%Y'), ' - ', time_format(from_unixtime(time), '%H:%i')), ipaddr, urlparent) as qtd, sec_to_time(sum(duration)/1000) as duration, sum(bytes) as bytes from acessos where time >= 1109646000 and time <= 1112324399 and result <> 'TCP_DENIED/403' group by user order by user;

PS: explaining this select, the time_result key is a possible_key, but in the key field I have the NULL value (the NULL persists even if I force it with USE INDEX()). Can anybody help me with what I can do to make this query faster (indexes, tuning, or changing the table structure or the query)? Thank you!

-- 
Fernando Henrique Giorgetti
[EMAIL PROTECTED]
Departamento de Tecnologia
http://www.gruponet.com.br
RE: Query Performance
From: Fernando Henrique Giorgetti [mailto:[EMAIL PROTECTED]]

Hi Folks! Here, I have the following table:

CREATE TABLE `accesses` (
  `time` varchar(15) NOT NULL default '',
  `duration` int(10) default NULL,
  `user` varchar(25) NOT NULL default '',
  `ipaddr` varchar(15) NOT NULL default '',
  `result` varchar(30) default NULL,
  `bytes` int(10) default NULL,
  `reqmethod` varchar(10) default NULL,
  `urlparent` varchar(100) NOT NULL default '',
  KEY `usuario` (`usuario`),
  KEY `time_result` (`time`, `result`)
);

If my table has a great number of rows (something like 5 million), the query takes far too long:

select user, count(distinct concat(date_format(from_unixtime(time), '%d/%m/%Y'), ' - ', time_format(from_unixtime(time), '%H:%i')), ipaddr, urlparent) as qtd, sec_to_time(sum(duration)/1000) as duration, sum(bytes) as bytes from acessos where time >= 1109646000 and time <= 1112324399 and result <> 'TCP_DENIED/403' group by user order by user;

PS: explaining this select, the time_result key is a possible_key, but in the key field I have the NULL value (the NULL persists even if I force it with USE INDEX()). Can anybody help me with what I can do to make this query faster (indexes, tuning, or changing the table structure or the query)? Thank you!

The first thing I'd do is index the `result` field, as you're checking against it in the WHERE clause. However, if I remember indexing behavior correctly, that won't help if you only have a few unique values in that column. Give it a shot, though; it may well help.

If that doesn't drastically improve it, I'd also look into a way around performing the date and time functions in the query. I don't know if that's possible, but depending on what this is feeding to (most likely PHP or Perl), it may be quicker to do those calculations in the wrapping script (if there is one, that is).

HTH!

-- 
Mike Johnson | Smarter Living, Inc.
Web Developer | www.smartertravel.com
[EMAIL PROTECTED] | (617) 886-5539
RE: Query Performance
From: Fernando Henrique Giorgetti [mailto:[EMAIL PROTECTED]]

Hi Folks! Here, I have the following table:

CREATE TABLE `accesses` (
  `time` varchar(15) NOT NULL default '',
  `duration` int(10) default NULL,
  `user` varchar(25) NOT NULL default '',
  `ipaddr` varchar(15) NOT NULL default '',
  `result` varchar(30) default NULL,
  `bytes` int(10) default NULL,
  `reqmethod` varchar(10) default NULL,
  `urlparent` varchar(100) NOT NULL default '',
  KEY `usuario` (`usuario`),
  KEY `time_result` (`time`, `result`)
);

If my table has a great number of rows (something like 5 million), the query takes far too long:

select user, count(distinct concat(date_format(from_unixtime(time), '%d/%m/%Y'), ' - ', time_format(from_unixtime(time), '%H:%i')), ipaddr, urlparent) as qtd, sec_to_time(sum(duration)/1000) as duration, sum(bytes) as bytes from acessos where time >= 1109646000 and time <= 1112324399 and result <> 'TCP_DENIED/403' group by user order by user;

PS: explaining this select, the time_result key is a possible_key, but in the key field I have the NULL value (the NULL persists even if I force it with USE INDEX()). Can anybody help me with what I can do to make this query faster (indexes, tuning, or changing the table structure or the query)? Thank you!

Oh, I'm sorry. I read your CREATE statement too quickly the first time and didn't notice that the `time_result` index was across both `time` and `result`. In that case, indexing `result` separately may not help at all. Might be worth a shot, though, if you have the disk space and time to play around with it.

-- 
Mike Johnson | Smarter Living, Inc.
Web Developer | www.smartertravel.com
[EMAIL PROTECTED] | (617) 886-5539
Re: Query Performance
Fernando Henrique Giorgetti [EMAIL PROTECTED] wrote on 04/14/2005 02:34:30 PM:

Hi Folks! Here, I have the following table:

CREATE TABLE `accesses` (
  `time` varchar(15) NOT NULL default '',
  `duration` int(10) default NULL,
  `user` varchar(25) NOT NULL default '',
  `ipaddr` varchar(15) NOT NULL default '',
  `result` varchar(30) default NULL,
  `bytes` int(10) default NULL,
  `reqmethod` varchar(10) default NULL,
  `urlparent` varchar(100) NOT NULL default '',
  KEY `usuario` (`usuario`),
  KEY `time_result` (`time`, `result`)
);

If my table has a great number of rows (something like 5 million), the query takes far too long:

select user, count(distinct concat(date_format(from_unixtime(time), '%d/%m/%Y'), ' - ', time_format(from_unixtime(time), '%H:%i')), ipaddr, urlparent) as qtd, sec_to_time(sum(duration)/1000) as duration, sum(bytes) as bytes from acessos where time >= 1109646000 and time <= 1112324399 and result <> 'TCP_DENIED/403' group by user order by user;

PS: explaining this select, the time_result key is a possible_key, but in the key field I have the NULL value (the NULL persists even if I force it with USE INDEX()). Can anybody help me with what I can do to make this query faster (indexes, tuning, or changing the table structure or the query)? Thank you!

-- 
Fernando Henrique Giorgetti
[EMAIL PROTECTED]
Departamento de Tecnologia
http://www.gruponet.com.br

If I unfold and indent your query I get:

select user
     , count(distinct concat( date_format(from_unixtime(time), '%d/%m/%Y')
                            , ' - '
                            , time_format(from_unixtime(time), '%H:%i')
                            )
                    , ipaddr
                    , urlparent
            ) as qtd
     , sec_to_time(sum(duration)/1000) as duration
     , sum(bytes) as bytes
from acessos
where time >= 1109646000
  and time <= 1112324399
  and result <> 'TCP_DENIED/403'
group by user
order by user;

Your COUNT() operator seems to be trying to execute a COUNT(DISTINCT (concatenated date to the nearest minute), ipaddr, urlparent).
I may have unfolded it incorrectly but that's how it seems to me. I think you meant to put the ipaddr and urlparent fields INTO the CONCAT(), but I am just working from what I got.

There is a faster way to compute time to the nearest minute than what you are doing with the string conversions. Just do an integer division of your TIME value by 60 and throw away the remainder, like this: time DIV 60, or like this: FLOOR(time/60).
(http://dev.mysql.com/doc/mysql/en/arithmetic-functions.html)

If I understand your COUNT(DISTINCT ...) statement correctly, you want to know in how many different minutes the user used either a unique ipaddr or a different urlparent. Am I close?

You can also rewrite the inequality as the OR of two ranges (which may end up using the index, or it may not...). So this could be a valid revision of your original query:

select user
     , count(distinct concat( FLOOR(time/60)
                            , ipaddr
                            , urlparent
                            )
            ) as qtd
     , sec_to_time(sum(duration)/1000) as duration
     , sum(bytes) as bytes
from acessos
where time >= 1109646000
  and time <= 1112324399
  and ( result < 'TCP_DENIED/403' OR result > 'TCP_DENIED/403' )
group by user;

Note: GROUP BY includes a free ORDER BY unless you specify otherwise.

HTH,

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine
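Shawn's FLOOR(time/60) trick works because two Unix timestamps fall in the same calendar minute exactly when their integer quotients by 60 are equal. A small Python sketch (timestamps are arbitrary examples; UTC is assumed for the formatting side):

```python
from datetime import datetime, timezone

# Two timestamps in one minute, two in the next.
timestamps = [1109646000, 1109646030, 1109646060, 1109646075]

# The original approach: format each timestamp to a per-minute string
# (the MySQL query built '%d/%m/%Y - %H:%i' via concat).
string_buckets = {
    datetime.fromtimestamp(t, tz=timezone.utc).strftime("%d/%m/%Y - %H:%M")
    for t in timestamps
}

# Shawn's approach: pure integer arithmetic, no string conversion.
# Equivalent to `time DIV 60` / `FLOOR(time/60)` in MySQL.
integer_buckets = {t // 60 for t in timestamps}

# Both approaches distinguish exactly the same minutes.
assert len(string_buckets) == len(integer_buckets) == 2
```

Inside a COUNT(DISTINCT ...) over millions of rows, skipping two date-formatting calls per row is a meaningful saving even though the distinct counts are identical.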
Re: Query Performance
You could probably save a bit of processing time by changing:

concat(date_format(from_unixtime(time), '%d/%m/%Y'), ' - ', time_format(from_unixtime(time), '%H:%i'))

to:

date_format(from_unixtime(time), '%d/%m/%Y - %H:%i')

This would mean half the date conversions would be executed. Separating out the 'time' and 'result' indices will probably help too.

Cheers,
Andrew

On 14/4/05 6:34 pm, Fernando Henrique Giorgetti [EMAIL PROTECTED] wrote:

Hi Folks! Here, I have the following table:

CREATE TABLE `accesses` (
  `time` varchar(15) NOT NULL default '',
  `duration` int(10) default NULL,
  `user` varchar(25) NOT NULL default '',
  `ipaddr` varchar(15) NOT NULL default '',
  `result` varchar(30) default NULL,
  `bytes` int(10) default NULL,
  `reqmethod` varchar(10) default NULL,
  `urlparent` varchar(100) NOT NULL default '',
  KEY `usuario` (`usuario`),
  KEY `time_result` (`time`, `result`)
);

If my table has a great number of rows (something like 5 million), the query takes far too long:

select user, count(distinct concat(date_format(from_unixtime(time), '%d/%m/%Y'), ' - ', time_format(from_unixtime(time), '%H:%i')), ipaddr, urlparent) as qtd, sec_to_time(sum(duration)/1000) as duration, sum(bytes) as bytes from acessos where time >= 1109646000 and time <= 1112324399 and result <> 'TCP_DENIED/403' group by user order by user;

PS: explaining this select, the time_result key is a possible_key, but in the key field I have the NULL value (the NULL persists even if I force it with USE INDEX()). Can anybody help me with what I can do to make this query faster (indexes, tuning, or changing the table structure or the query)? Thank you!
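Andrew's consolidation rests on a simple identity: one format call with the combined pattern produces the same string as two calls glued together. The sketch below shows it in Python terms (UTC assumed; `%H:%M` is Python's spelling of MySQL's `%H:%i`):

```python
from datetime import datetime, timezone

t = 1109646000  # sample Unix timestamp from the thread's WHERE clause
dt = datetime.fromtimestamp(t, tz=timezone.utc)

# Original shape: two conversions plus a concatenation.
two_calls = dt.strftime("%d/%m/%Y") + " - " + dt.strftime("%H:%M")

# Andrew's shape: one conversion with the combined format string.
one_call = dt.strftime("%d/%m/%Y - %H:%M")

assert two_calls == one_call  # identical output, half the conversions
```

Per row the saving is tiny, but the original query runs this expression once per row over roughly 5 million rows, so halving the conversions is worthwhile.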
Re: query performance
If you do, I suggest you also include relevant table definitions and possibly a little sample data (plus an indication of total table sizes) and expected output; this will greatly assist anyone who may be able to help. Oh yes, and don't forget to state the version of MySQL you are running.

Coz

On Wed, 16 Feb 2005 18:22:11 -0700, Ryan McCullough [EMAIL PROTECTED] wrote: Can I post a query to this list and ask for help optimizing it? -- Ryan McCullough mailto:[EMAIL PROTECTED]

-- 
CozWeb Solutions Ltd
http://www.cozweb.net
query performance
Can I post a query to this list and ask for help optimizing it?

-- 
Ryan McCullough
mailto:[EMAIL PROTECTED]
RE: Help with query performance anomaly
Thanks for the advice Steven, I'll bear it in mind and do some reading.

Graham

-----Original Message-----
From: Steven Roussey [mailto:[EMAIL PROTECTED]]
Sent: 13 November 2004 02:52
To: 'Graham Cossey'
Cc: [EMAIL PROTECTED]
Subject: RE: Help with query performance anomaly

For production systems, I would never let the mysql optimizer guess a query plan when there are joins of big tables and you know exactly how it should behave. Once you think a query is finished, you should optimize it yourself. Use STRAIGHT_JOIN and USE INDEX as found here in the manual:

http://dev.mysql.com/doc/mysql/en/JOIN.html
"STRAIGHT_JOIN is identical to JOIN, except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer puts the tables in the wrong order."

http://dev.mysql.com/doc/mysql/en/SELECT.html
"The use of USE INDEX, IGNORE INDEX, FORCE INDEX to give the optimizer hints about how to choose indexes is described in section 14.1.7.1 JOIN Syntax."

-steve--
RE: Help with query performance anomaly (SOLVED)
It turns out that it appears to be a data discrepancy that caused the query optimiser to, well, not optimise. I thought the main table (r) with 3 million records would be the problem, but it was table p, with 3100 records on the live server and 3082 records on my dev PC, that caused the problem. Although the results of SHOW CREATE TABLE etc. were identical on both machines, uploading the data from dev to live has solved the problem.

Thanks to all that offered advice.

Graham

<snip size=big/>
RE: Help with query performance anomaly
For production systems, I would never let the mysql optimizer guess a query plan when there are joins of big tables and you know exactly how it should behave. Once you think a query is finished, you should optimize it yourself. Use STRAIGHT_JOIN and USE INDEX as found here in the manual:

http://dev.mysql.com/doc/mysql/en/JOIN.html
"STRAIGHT_JOIN is identical to JOIN, except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer puts the tables in the wrong order."

http://dev.mysql.com/doc/mysql/en/SELECT.html
"The use of USE INDEX, IGNORE INDEX, FORCE INDEX to give the optimizer hints about how to choose indexes is described in section 14.1.7.1 JOIN Syntax."

-steve--
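Steven's point is that an index hint turns the optimizer's guess into your decision. MySQL's USE INDEX / FORCE INDEX can't be demonstrated with Python's standard library, but SQLite's INDEXED BY clause expresses the same idea, so the sketch below uses `sqlite3` as a stand-in; the table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, cust INTEGER, period INTEGER);
CREATE INDEX idx_cust   ON orders(cust);
CREATE INDEX idx_period ON orders(period);
INSERT INTO orders VALUES (1, 7, 200410), (2, 7, 200411), (3, 8, 200410);
""")

# INDEXED BY pins the lookup to idx_cust, much as MySQL's
# USE INDEX / FORCE INDEX steer its optimizer. SQLite treats the
# hint as a hard constraint and errors out if the index is unusable.
rows = conn.execute(
    "SELECT id FROM orders INDEXED BY idx_cust "
    "WHERE cust = 7 ORDER BY id").fetchall()

# The plan confirms the chosen index rather than leaving it to chance.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM orders INDEXED BY idx_cust "
    "WHERE cust = 7").fetchall()
```

The trade-off Steven implies holds in both engines: a pinned plan is predictable, but it also stops the optimizer from adapting if the data distribution later changes.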
Help with query performance anomaly
Hi

Can someone offer any advice on a strange problem I have at present...

If I run a certain query (see below) on my local development PC using mysqlcc it returns in 3.7s. If I run the exact same query on my live webserver (again using mysqlcc) I have yet to get a result!!

Both databases have the same table definitions (the live db was originally created from a mysqldump of the dev PC) and have exactly the same [number of] records in each table. Both machines are running MySQL 3.23.58. The dev PC is Fedora Core2, live is RedHat 9.

Other than this one query all else appears normal, any suggestions? Let me know if you need more info and I'll attempt to supply it...

Many thanks

Graham

Query:
SELECT code, sum(qty) as total FROM table1 as d, db2.table2 as r, table3 as p WHERE year=2004 AND month=10 AND r.col1=p.col1 and d.code=p.code and from_period <= 200410 and to_period >= 200410 and d.col3!='6' GROUP BY code
Re: Help with query performance anomaly
What does EXPLAIN show for the query on both systems? (I am wondering if you may have an index on your development system that you do not have on your production server.)

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

Graham Cossey [EMAIL PROTECTED] wrote on 11/11/2004 11:19:08 AM:

Hi

Can someone offer any advice on a strange problem I have at present...

If I run a certain query (see below) on my local development PC using mysqlcc it returns in 3.7s. If I run the exact same query on my live webserver (again using mysqlcc) I have yet to get a result!!

Both databases have the same table definitions (the live db was originally created from a mysqldump of the dev PC) and have exactly the same [number of] records in each table. Both machines are running MySQL 3.23.58. The dev PC is Fedora Core2, live is RedHat 9.

Other than this one query all else appears normal, any suggestions? Let me know if you need more info and I'll attempt to supply it...

Many thanks

Graham

Query:
SELECT code, sum(qty) as total FROM table1 as d, db2.table2 as r, table3 as p WHERE year=2004 AND month=10 AND r.col1=p.col1 and d.code=p.code and from_period <= 200410 and to_period >= 200410 and d.col3!='6' GROUP BY code
RE: Help with query performance anomaly
Thanks Shawn

EXPLAIN shows the same 'possible_keys' for each table, but the 'key' and 'key_len' columns are different, as are the 'rows' as well of course. I guess this points to a probable difference in key definitions? Can 2 installations with the same table definitions produce different results like this? Maybe something in the configs?

Thanks

Graham

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: 11 November 2004 16:28
To: Graham Cossey
Cc: [EMAIL PROTECTED]
Subject: Re: Help with query performance anomaly

What does EXPLAIN show for the query on both systems? (I am wondering if you may have an index on your development system that you do not have on your production server.)

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

Graham Cossey [EMAIL PROTECTED] wrote on 11/11/2004 11:19:08 AM:

Hi

Can someone offer any advice on a strange problem I have at present...

If I run a certain query (see below) on my local development PC using mysqlcc it returns in 3.7s. If I run the exact same query on my live webserver (again using mysqlcc) I have yet to get a result!!

Both databases have the same table definitions (the live db was originally created from a mysqldump of the dev PC) and have exactly the same [number of] records in each table. Both machines are running MySQL 3.23.58. The dev PC is Fedora Core2, live is RedHat 9.

Other than this one query all else appears normal, any suggestions? Let me know if you need more info and I'll attempt to supply it...
Many thanks

Graham

Query:
SELECT code, sum(qty) as total FROM table1 as d, db2.table2 as r, table3 as p WHERE year=2004 AND month=10 AND r.col1=p.col1 and d.code=p.code and from_period <= 200410 and to_period >= 200410 and d.col3!='6' GROUP BY code
RE: Help with query performance anomaly
Check the results of SHOW CREATE TABLE for the three tables you use and compare between production and development. You should be able to spot any differences in your key definitions.

If they are the same on both machines then you should probably run ANALYZE TABLE against the three tables on your production machine. That will update the query optimizer's statistics for those tables. If the optimizer has bad stats it can make poor choices about which index to use.

If that doesn't help, try using the OPTIMIZE TABLE command on your three tables. Heavy fragmentation can slow down data retrieval, too.

Let me know how things turn out.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

Graham Cossey [EMAIL PROTECTED] wrote on 11/11/2004 11:48:13 AM:

Thanks Shawn

EXPLAIN shows the same 'possible_keys' for each table, but the 'key' and 'key_len' columns are different, as are the 'rows' as well of course. I guess this points to a probable difference in key definitions? Can 2 installations with the same table definitions produce different results like this? Maybe something in the configs?

Thanks

Graham

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: 11 November 2004 16:28
To: Graham Cossey
Cc: [EMAIL PROTECTED]
Subject: Re: Help with query performance anomaly

What does EXPLAIN show for the query on both systems? (I am wondering if you may have an index on your development system that you do not have on your production server.)

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

Graham Cossey [EMAIL PROTECTED] wrote on 11/11/2004 11:19:08 AM:

Hi

Can someone offer any advice on a strange problem I have at present...

If I run a certain query (see below) on my local development PC using mysqlcc it returns in 3.7s. If I run the exact same query on my live webserver (again using mysqlcc) I have yet to get a result!!
Both databases have the same table definitions (the live db was originally created from a mysqldump of the dev PC) and have exactly the same [number of] records in each table. Both machines are running MySQL 3.23.58. The dev PC is Fedora Core2, live is RedHat 9.

Other than this one query all else appears normal, any suggestions? Let me know if you need more info and I'll attempt to supply it...

Many thanks

Graham

Query:
SELECT code, sum(qty) as total FROM table1 as d, db2.table2 as r, table3 as p WHERE year=2004 AND month=10 AND r.col1=p.col1 and d.code=p.code and from_period <= 200410 and to_period >= 200410 and d.col3!='6' GROUP BY code
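Shawn's ANALYZE TABLE advice has a direct analogue in other engines, which makes it easy to illustrate. The sketch below uses Python's `sqlite3`, where a plain ANALYZE refreshes the planner statistics into the `sqlite_stat1` table; the schema and index name are hypothetical stand-ins for the thread's tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE r (pcode INTEGER, year INTEGER, month INTEGER);
CREATE INDEX yr_mn_pc ON r(year, month, pcode);
INSERT INTO r VALUES (1, 2004, 10), (2, 2004, 10), (3, 2004, 11);
""")

# Like MySQL's ANALYZE TABLE, this rebuilds the statistics the
# planner consults when ranking candidate indexes.
conn.execute("ANALYZE")

# The gathered stats land in sqlite_stat1: one row per analyzed index.
stats = conn.execute(
    "SELECT tbl, idx FROM sqlite_stat1 WHERE idx = 'yr_mn_pc'").fetchall()
```

The underlying principle is the one Shawn states: the optimizer ranks indexes using stored statistics, so stale stats (for example, after a bulk load) can make it prefer the wrong index until they are refreshed.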
RE: Help with query performance anomaly
Thanks Shawn

EXPLAIN shows the same 'possible_keys' for each table, but the 'key' and 'key_len' columns are different, as are the 'rows' as well of course. I guess this points to a probable difference in key definitions? Can 2 installations with the same table definitions produce different results like this? Maybe something in the configs?

[snip]

What does EXPLAIN show for the query on both systems? (I am wondering if you may have an index on your development system that you do not have on your production server.)

[snip]

Can someone offer any advice on a strange problem I have at present...

If I run a certain query (see below) on my local development PC using mysqlcc it returns in 3.7s. If I run the exact same query on my live webserver (again using mysqlcc) I have yet to get a result!!

Both databases have the same table definitions (the live db was originally created from a mysqldump of the dev PC) and have exactly the same [number of] records in each table. Both machines are running MySQL 3.23.58. The dev PC is Fedora Core2, live is RedHat 9.

Other than this one query all else appears normal, any suggestions? Let me know if you need more info and I'll attempt to supply it...

Many thanks

Graham

[snip]

I've done mysqldumps of the tables involved on both machines and the create table definitions and key definitions are identical. The results of my EXPLAINs are pasted below.
Thanks

Graham

DEV BOX:

EXPLAIN SELECT d.dcode, sum(qty) as total FROM table1 as d, db2.table2 as r, table3 as p WHERE year=2004 AND month=10 AND r.pcode=p.pcode and d.dcode=p.dcode and from_period <= 200410 and to_period >= 200410 and d.region!='6' GROUP BY dcode

| table | type  | possible_keys                                                 | key      | key_len | ref                 | rows | Extra                                        |
| d     | ALL   | [NULL]                                                        | [NULL]   | [NULL]  | [NULL]              | 322  | Using where; Using temporary; Using filesort |
| p     | index | PRIMARY                                                       | PRIMARY  | 19      | [NULL]              | 6082 | Using where; Using index                     |
| r     | ref   | PRIMARY,yr_mn_mk_rng_dr,yr_mn_st,yr_mn_pc,yr_mn_f,yr_mn_mk_st | yr_mn_pc | 13      | const,const,p.pcode | 41   | Using where                                  |

LIVE SERVER:

EXPLAIN SELECT d.dcode, sum(qty) as total FROM table1 as d, db2.table2 as r, table3 as p WHERE year=2004 AND month=10 AND r.pcode=p.pcode and d.dcode=p.dcode and from_period <= 200410 and to_period >= 200410 and d.region!='6' GROUP BY dcode

| table | type | possible_keys                                                 | key     | key_len | ref         | rows  | Extra                                        |
| d     | ALL  | [NULL]                                                        | [NULL]  | [NULL]  | [NULL]      | 322   | Using where; Using temporary; Using filesort |
| r     | ref  | PRIMARY,yr_mn_mk_rng_dr,yr_mn_st,yr_mn_pc,yr_mn_f,yr_mn_mk_st | PRIMARY | 8       | const,const | 89618 | Using where                                  |
| p     | ref  | PRIMARY                                                       | PRIMARY | 4       | r.pcode     | 2     | Using where; Using index                     |
RE: Help with query performance anomaly
Response at end.

Graham Cossey [EMAIL PROTECTED] wrote on 11/11/2004 12:19:17 PM:

Thanks Shawn

EXPLAIN shows the same 'possible_keys' for each table, but the 'key' and 'key_len' columns are different, as are the 'rows' as well of course. I guess this points to a probable difference in key definitions? Can 2 installations with the same table definitions produce different results like this? Maybe something in the configs?

[snip]

What does EXPLAIN show for the query on both systems? (I am wondering if you may have an index on your development system that you do not have on your production server.)

[snip]

Can someone offer any advice on a strange problem I have at present...

If I run a certain query (see below) on my local development PC using mysqlcc it returns in 3.7s. If I run the exact same query on my live webserver (again using mysqlcc) I have yet to get a result!!

Both databases have the same table definitions (the live db was originally created from a mysqldump of the dev PC) and have exactly the same [number of] records in each table. Both machines are running MySQL 3.23.58. The dev PC is Fedora Core2, live is RedHat 9.

Other than this one query all else appears normal, any suggestions? Let me know if you need more info and I'll attempt to supply it...

Many thanks

Graham

[snip]

I've done mysqldumps of the tables involved on both machines and the create table definitions and key definitions are identical. The results of my EXPLAINs are pasted below.
Thanks
Graham

DEV BOX:

EXPLAIN SELECT d.dcode, sum(qty) as total
FROM table1 as d, db2.table2 as r, table3 as p
WHERE year=2004 AND month=10
  AND r.pcode=p.pcode AND d.dcode=p.dcode
  AND from_period = 200410 AND to_period 200410
  AND d.region!='6'
GROUP BY dcode

+-------+-------+----------------------------------------------------------------+----------+---------+---------------------+------+----------------------------------------------+
| table | type  | possible_keys                                                  | key      | key_len | ref                 | rows | Extra                                        |
+-------+-------+----------------------------------------------------------------+----------+---------+---------------------+------+----------------------------------------------+
| d     | ALL   | [NULL]                                                         | [NULL]   | [NULL]  | [NULL]              |  322 | Using where; Using temporary; Using filesort |
| p     | index | PRIMARY                                                        | PRIMARY  | 19      | [NULL]              | 6082 | Using where; Using index                     |
| r     | ref   | PRIMARY,yr_mn_mk_rng_dr,yr_mn_st,yr_mn_pc,yr_mn_f,yr_mn_mk_st  | yr_mn_pc | 13      | const,const,p.pcode |   41 | Using where                                  |
+-------+-------+----------------------------------------------------------------+----------+---------+---------------------+------+----------------------------------------------+

LIVE SERVER:

EXPLAIN SELECT d.dcode, sum(qty) as total
FROM table1 as d, db2.table2 as r, table3 as p
WHERE year=2004 AND month=10
  AND r.pcode=p.pcode AND d.dcode=p.dcode
  AND from_period = 200410 AND to_period 200410
  AND d.region!='6'
GROUP BY dcode

+-------+------+----------------------------------------------------------------+---------+---------+-------------+-------+----------------------------------------------+
| table | type | possible_keys                                                  | key     | key_len | ref         | rows  | Extra                                        |
+-------+------+----------------------------------------------------------------+---------+---------+-------------+-------+----------------------------------------------+
| d     | ALL  | [NULL]                                                         | [NULL]  | [NULL]  | [NULL]      |   322 | Using where; Using temporary; Using filesort |
| r     | ref  | PRIMARY,yr_mn_mk_rng_dr,yr_mn_st,yr_mn_pc,yr_mn_f,yr_mn_mk_st  | PRIMARY | 8       | const,const | 89618 | Using where                                  |
| p     | ref  | PRIMARY                                                        | PRIMARY | 4       | r.pcode     |     2 | Using where; Using index                     |
+-------+------+----------------------------------------------------------------+---------+---------+-------------+-------+----------------------------------------------+

These are two different plans. Your development machine is using the index yr_mn_pc on the r table and is joining that table last. On your production server, the r table is joined second and is joined by the index PRIMARY. Let me know how the ANALYZE TABLE I suggested in a previous message works out; it should help the statistics.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine
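One quick way to test Shawn's diagnosis is to force the development box's plan on the live server. This is only a sketch against the aliases quoted above; STRAIGHT_JOIN and USE INDEX both exist in MySQL 3.23, and the to_period comparison from the original query (whose operator was lost in the archive) is left out here:

```sql
-- Force the dev box's join order (d, p, r) and its yr_mn_pc index choice.
SELECT STRAIGHT_JOIN d.dcode, SUM(qty) AS total
FROM table1 AS d,
     table3 AS p,
     db2.table2 AS r USE INDEX (yr_mn_pc)
WHERE year = 2004 AND month = 10
  AND r.pcode = p.pcode
  AND d.dcode = p.dcode
  AND from_period = 200410
  AND d.region != '6'
GROUP BY dcode;
```

If this runs on the live server in roughly the dev box's 3.7s, the problem is purely the optimizer's plan choice (typically stale or missing index statistics), not the data itself.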
RE: Help with query performance anomaly
[big snip] These are two different plans. Your development machine is using the index yr_mn_pc on the r table and is joining that table last. On your production server, the r table is joined second and is joined by the index PRIMARY. Let me know how the ANALYZE TABLE I suggested in a previous message works out to help the statistics. I have run ANALYZE on all tables on the live server and the result of EXPLAIN is the same as before. Any further suggestions? I'm off to double check the create table stuff once more... Thanks for your help (and patience!!) Graham -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Help with query performance anomaly
How do the OS statistics look on both boxes? Do top, sar, vmstat or iostat show any CPU, memory or I/O performance issues? Does anything odd appear in the /var/log/messages file?

-Jamie

On Thu, 11 Nov 2004 18:42:48 -, Graham Cossey [EMAIL PROTECTED] wrote:
[big snip]
RE: Help with query performance anomaly
Everything seems fine except for this one query. I'm not quite sure what the results of top will say, or what they mean, given it's a Virtual Private Server environment. Nothing untoward in /var/log/messages, /var/log/httpd/error_log or the virtual host httpd logs.

Have just run 'top' on the live server... Before running the query I get:

13:56:09 up 45 days, 11:47, 1 user, load average: 0.00, 0.28, 0.44
24 processes: 23 sleeping, 1 running, 0 zombie, 0 stopped
CPU0 states: 0.0% user 0.0% system 0.0% nice 0.0% iowait 100.0% idle
CPU1 states: 0.0% user 0.0% system 0.0% nice 0.0% iowait 100.0% idle
CPU2 states: 0.0% user 0.1% system 0.0% nice 0.0% iowait 99.4% idle
CPU3 states: 0.0% user 0.0% system 0.0% nice 0.0% iowait 100.0% idle
Mem: 6203744k av, 6194148k used, 9596k free, 0k shrd, 304848k buff
     1948476k active, 3601304k inactive
Swap: 4192956k av, 1876604k used, 2316352k free, 4081216k cached

  PID USER    PRI NI  SIZE  RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
 7622          15  0   904  904   748 R     0.1  0.0 0:00   2 top
    1 root      9  0   468  440   420 S     0.0  0.0 2:58   1 init
 1733 root      9  0   548  516   480 S     0.0  0.0 0:35   1 syslogd
 2242 root      8  0   808  736   684 S     0.0  0.0 0:11   0 xinetd
 3393 root      8  0   576  552   512 S     0.0  0.0 0:21   1 crond
15329 root      9  0  1284 1156  1088 S     0.0  0.0 0:27   2 sshd
 3264 root      8  0  3676 1548  1488 S     0.0  0.0 0:37   2 httpd
15296 apache    9  0  9904 8872  4752 S     0.0  0.1 0:00   2 httpd
 4576 apache    9  0  9876 8804  4344 S     0.0  0.1 0:01   1 httpd
 8992 root      9  0  1000 1000   848 S     0.0  0.0 0:00   1 mysqld_safe
10433 mysql     9  0 17036  16M  2012 S     0.0  0.2 0:00   2 mysqld
11360 mysql     8  0 17036  16M  2012 S     0.0  0.2 0:00   2 mysqld
11395 mysql     9  0 17036  16M  2012 S     0.0  0.2 0:00   0 mysqld
11425 mysql     9  0 17036  16M  2012 S     0.0  0.2 0:00   2 mysqld
11456 mysql     9  0 17036  16M  2012 S     0.0  0.2 0:00   3 mysqld
11491 mysql     9  0 17036  16M  2012 S     0.0  0.2 0:00   2 mysqld
12128 mysql     9  0 17036  16M  2012 S     0.0  0.2 0:00   2 mysqld
12162 mysql     9  0 17036  16M  2012 S     0.0  0.2 0:00   0 mysqld
12193 mysql     9  0 17036  16M  2012 S     0.0  0.2 0:00   2 mysqld
12224 mysql     9  0 17036  16M  2012 S     0.0  0.2 0:00   1 mysqld
32418 mysql     9  0 17036  16M  2012 S     0.0  0.2 0:00   2 mysqld
 5284           9  0  2288 2244  2068 S     0.0  0.0 0:00   3 sshd
 5538           9  0  1292 1292   984 S     0.0  0.0 0:00   1 bash

Now, I don't know if it's coincidental or not, but after setting the query running and re-issuing the top command I get:

13:59:49 up 45 days, 11:51, 1 user, load average: 0.98, 0.61, 0.53
26 processes: 24 sleeping, 2 running, 0 zombie, 0 stopped
CPU0 states: 75.0% user 25.0% system 0.0% nice 0.0% iowait 0.0% idle
CPU1 states: 76.0% user 24.0% system 0.0% nice 0.0% iowait 0.0% idle
CPU2 states: 63.0% user 36.0% system 0.0% nice 0.0% iowait 0.0% idle
Floating point exception

Does not look good to me!! Comments? Advice?

Thanks
Graham

-----Original Message-----
From: Jamie Kinney [mailto:[EMAIL PROTECTED]]
Sent: 11 November 2004 19:25
To: Graham Cossey
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: Help with query performance anomaly

How do the OS statistics look on both boxes? Do top, sar, vmstat or iostat show any CPU, memory or I/O performance issues? Does anything odd appear in the /var/log/messages file?

-Jamie

[snip]
RE: Help with query performance anomaly
[snip quoted 'top' output from my previous message]

Now, I don't know if it's coincidental or not, but after setting the query running and re-issuing the top command I get:

13:59:49 up 45 days, 11:51, 1 user, load average: 0.98, 0.61, 0.53
26 processes: 24 sleeping, 2 running, 0 zombie, 0 stopped
CPU0 states: 75.0% user 25.0% system 0.0% nice 0.0% iowait 0.0% idle
CPU1 states: 76.0% user 24.0% system 0.0% nice 0.0% iowait 0.0% idle
CPU2 states: 63.0% user 36.0% system 0.0% nice 0.0% iowait 0.0% idle
Floating point exception

Does not look good to me!!

[snip]

I have now managed to get a top while the query is running:

14:29:52 up 45 days, 12:21, 1 user, load average: 0.69, 0.28, 0.39
25 processes: 23 sleeping, 2 running, 0 zombie, 0 stopped
CPU0 states: 71.1% user 28.0% system 0.0% nice 0.0% iowait 0.0% idle
CPU1 states: 68.0% user 31.0% system 0.0% nice 0.0% iowait 0.0% idle
CPU2 states: 71.0% user 28.0% system 0.0% nice 0.0% iowait 0.0% idle
CPU3 states: 80.0% user 19.0% system 0.0% nice 0.0% iowait 0.0% idle
Mem: 6203744k av, 5764148k used, 439596k free, 0k shrd, 257900k buff
     1839520k active, 3282316k inactive
Swap: 4192956k av, 1881496k used, 2311460k free, 3687672k cached

  PID USER    PRI NI  SIZE  RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
19462 mysql    14  0 19968  19M  2016 R    95.4  0.3 0:38   0 mysqld
25248          10  0  1004 1004   748 R     0.3  0.0 0:00   3 top
    1 root      9  0   468  440   420 S     0.0  0.0 2:58   1 init
 1733 root      9  0   548  516   480 S     0.0  0.0 0:35   3 syslogd
 2242 root      8  0   808  736   684 S     0.0  0.0 0:11   3 xinetd
 3393 root      9  0   576  552   512 S     0.0  0.0 0:21   2 crond
15329 root      9  0  1284 1156  1088 S     0.0  0.0 0:27   0 sshd
 3264 root      9  0  3676 1548  1500 S     0.0  0.0 0:37   1 httpd
15296 apache    9  0 10632 9608  4768 S     0.0  0.1 0:01   0 httpd
 4576 apache    9  0 10036 8964  4344 S     0.0  0.1 0:01   3 httpd
 8992 root      9  0  1000 1000   848 S     0.0  0.0 0:00   1 mysqld_safe
10433 mysql     9  0 19968  19M  2016 S     0.0  0.3 0:00   1 mysqld
11360 mysql     9  0 19968  19M  2016 S     0.0  0.3 0:00   2 mysqld
11395 mysql     9  0 19968  19M  2016 S     0.0  0.3 0:00   0 mysqld
11425 mysql     9  0 19968  19M  2016 S     0.0  0.3 0:00   2 mysqld
11456 mysql
Strange query performance problem
MySQL 4.1.3
Windows XP SP1
All tables are InnoDB

The query (1):

select Product.id, LongAnswer.value, count(*)
from LongAnswer
inner join Answer on LongAnswer.answer=Answer.id
inner join QuestionDefinition on Answer.question=QuestionDefinition.id
inner join Survey on Answer.survey = Survey.id
inner join DueReport on Survey.dueReport = DueReport.id
inner join Product on Answer.product = Product.id
where ( Product.id = 117 )
  and ( QuestionDefinition.id = 2 )
  and ( DueReport.id = 64 )
group by Product.id, LongAnswer.value;

produces these results (see running time):

+-----+-------+----------+
| id  | value | count(*) |
+-----+-------+----------+
| 117 |     3 |       47 |
| 117 |     4 |      153 |
+-----+-------+----------+
2 rows in set (0.92 sec)

If I drop the ( Product.id = 117 ) clause the larger rowset gets generated, but it's lightning fast (2):

select Product.id, LongAnswer.value, count(*)
from LongAnswer
inner join Answer on LongAnswer.answer=Answer.id
inner join QuestionDefinition on Answer.question=QuestionDefinition.id
inner join Survey on Answer.survey = Survey.id
inner join DueReport on Survey.dueReport = DueReport.id
inner join Product on Answer.product = Product.id
where ( QuestionDefinition.id = 2 )
  and ( DueReport.id = 64 )
group by Product.id, LongAnswer.value;

+-----+-------+----------+
| id  | value | count(*) |
+-----+-------+----------+
|  64 |     3 |        4 |
|  64 |     4 |      196 |
|  65 |     3 |        3 |
|  65 |     4 |      197 |
|  66 |     3 |        6 |
|  66 |     4 |      194 |
|  67 |     3 |       44 |
|  67 |     4 |      156 |
|  68 |     3 |       21 |
|  68 |     4 |      179 |
|  69 |     3 |       20 |
|  69 |     4 |      180 |
|  70 |     3 |       26 |
|  70 |     4 |      174 |
|  71 |     3 |       11 |
|  71 |     4 |      189 |
|  72 |     3 |      102 |
|  72 |     4 |       98 |
|  73 |     3 |       31 |
|  73 |     4 |      169 |
|  74 |     3 |       19 |
|  74 |     4 |      181 |
|  75 |     3 |       13 |
|  75 |     4 |      187 |
|  76 |     3 |       22 |
|  76 |     4 |      178 |
|  77 |     3 |       39 |
|  77 |     4 |      161 |
|  78 |     3 |       16 |
|  78 |     4 |      184 |
|  79 |     3 |       56 |
|  79 |     4 |      144 |
|  80 |     3 |       66 |
|  80 |     4 |      134 |
|  81 |     3 |       36 |
|  81 |     4 |      164 |
|  82 |     3 |       68 |
|  82 |     4 |      132 |
|  83 |     3 |       73 |
|  83 |     4 |      127 |
|  84 |     3 |       49 |
|  84 |     4 |      151 |
|  85 |     3 |       54 |
|  85 |     4 |      146 |
|  86 |     3 |       50 |
|  86 |     4 |      150 |
|  87 |     3 |       48 |
|  87 |     4 |      152 |
|  88 |     3 |       35 |
|  88 |     4 |      165 |
|  89 |     3 |        9 |
|  89 |     4 |      191 |
|  90 |     3 |        9 |
|  90 |     4 |      191 |
|  91 |     3 |       10 |
|  91 |     4 |      190 |
|  92 |     3 |       24 |
|  92 |     4 |      176 |
|  93 |     3 |       38 |
|  93 |     4 |      162 |
|  94 |     3 |       52 |
|  94 |     4 |      148 |
|  95 |     3 |       18 |
|  95 |     4 |      182 |
|  96 |     3 |       17 |
|  96 |     4 |      183 |
|  97 |     3 |       10 |
|  97 |     4 |      190 |
|  98 |     3 |       85 |
|  98 |     4 |      115 |
|  99 |     3 |        3 |
|  99 |     4 |      197 |
| 100 |     3 |        5 |
| 100 |     4 |      195 |
| 101 |     3 |        7 |
| 101 |     4 |      193 |
| 102 |     3 |       22 |
| 102 |     4 |      178 |
| 103 |     3 |       23 |
| 103 |     4 |      177 |
| 104 |     3 |       22 |
| 104 |     4 |      178 |
| 105 |     3 |        2 |
| 105 |     4 |      198 |
| 106 |     3 |        3 |
| 106 |     4 |      197 |
| 107 |     3 |        8 |
| 107 |     4 |      192 |
| 108 |     3 |        9 |
| 108 |     4 |      191 |
| 109 |     3 |       21 |
| 109 |     4 |      179 |
| 110 |     3 |       25 |
| 110 |     4 |      175 |
| 111 |     3 |       10 |
| 111 |     4 |      190 |
| 112 |     3 |      113 |
| 112 |     4 |       87 |
| 113 |     3 |       54 |
| 113 |     4 |      146 |
| 114 |     3 |       69 |
| 114 |     4 |      131 |
| 115 |     3 |       68 |
| 115 |     4 |      132 |
| 116 |     3 |        6 |
| 116 |     4 |      194 |
| 117 |     3 |       47 |
| 117 |     4 |      153 |
+-----+-------+----------+
108 rows in set (0.08 sec)

explain select shows subtle differences, but I am not experienced enough to interpret them properly.

(1)
+----+-------------+-------+------+---------------+-----+---------+-----+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
Re: MySQL query performance test tool
Haitao Jiang [EMAIL PROTECTED] wrote: We want to test our MySQL (4.1.4g) server's query performance, and I'm just wondering if there is a tool that enables us to send a list of queries over HTTP or JDBC repeatedly and gather/display the statistics?

Honestly, it's almost always better to write your own for your own application. It's not so complicated. See the Perl module WWW::Mechanize on CPAN (http://search.cpan.org); it's a gift for HTTP test/benchmark development. :)

Egor Egorov
[EMAIL PROTECTED]
MySQL AB / Ensita.net
www.mysql.com
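If the goal is just to time expressions or compare function costs rather than exercise a full HTTP/JDBC stack, MySQL's built-in BENCHMARK() function is a zero-setup alternative. Note its limits: it evaluates a single expression repeatedly inside the server, so it does not measure table I/O or result-set transfer.

```sql
-- Run the expression 1,000,000 times; the number that matters is the
-- elapsed time the client reports, not the function's return value (0).
SELECT BENCHMARK(1000000, MD5('some test string'));
```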
Re: MySQL query performance test tool
Hi Haitao. I'm in the process of developing one of these for the company I work for. Feel free to drop me a line and we'll see how we can get it going for you. Regards Ian

Haitao Jiang wrote: Hi, We want to test our MySQL (4.1.4g) server's query performance, and I'm just wondering if there is a tool that enables us to send a list of queries over HTTP or JDBC repeatedly and gather/display the statistics? Thanks HT
MySQL query performance test tool
Hi, We want to test our MySQL (4.1.4g) server's query performance, and I'm just wondering if there is a tool that enables us to send a list of queries over HTTP or JDBC repeatedly and gather/display the statistics? Thanks HT
Estimating Query Performance
G'day all, I was hoping to leech from your amalgamated knowledge: I've been asked to estimate the query performance of several SQL queries that power our Reporting system. At the moment we're preparing to scale up enormously the amount of data we're using in our system, and therefore I'm trying to estimate the length of time these queries will take. We are using Red Hat 7.2 / MySQL 3.23.49a I believe (I know, I know - don't ask why).

Referring to the manual, section 7.2.2 Estimating Performance (http://dev.mysql.com/doc/mysql/en/Estimating_performance.html), I have an issue with the values for the equation given:

log(row_count) / log(index_block_length / 3 * 2 / (index_length + data_pointer_length)) + 1

seeks to find a row. Let's take one of my example tables:

row_count - 1,024,306 (will soon be ~23,250,000) rows
index_block_length - ?
index_length - ?
data_pointer_length - ?

* index_block_length / data_pointer_length :- I know the manual states an index block is usually 1024 bytes and the data pointer is usually 4 bytes. My issue is, what is meant by *usually*? How can I check? Should I just use these?

* index_length :- For this table, it has the following indexes:
PRIMARY KEY (`ID`), - ID is INT(11)
KEY `LogTimeIdx` (`LogTime`), - LogTime is TIMESTAMP(14)
KEY `signid` (`SignID`) - SignID is INT(11)
So what would be my index length?

* SHOW TABLE STATUS tells me:
rows: 1,024,306
avg_row_length: 51
data_length: 52,543,348
index_length: 32,238,592

Any help with the values I should be using, or any guidance on estimating a query's performance, would be unimaginably appreciated.

Regards, Matt.
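For what it's worth, the formula can be evaluated directly in MySQL with the manual's "usual" values plugged in. This assumes index_block_length = 1024, data_pointer_length = 4, and index_length = 4 (the INT(11) primary key); the logarithm base cancels in the division, so MySQL's natural-log LOG() works fine:

```sql
-- Estimated index seeks to find one row by primary key:
-- log(1024306) / log(1024 / 3 * 2 / (4 + 4)) + 1, roughly 4.1
SELECT LOG(1024306) / LOG(1024 / 3 * 2 / (4 + 4)) + 1 AS est_seeks;
```

At the projected ~23,250,000 rows the same arithmetic gives roughly 4.8 seeks, illustrating the point of the manual's formula: lookup cost grows only logarithmically with row count.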
Re: Estimating Query Performance
On Fri, 10 Sep 2004 05:54:42 +1000, Matthew Boulter [EMAIL PROTECTED] wrote: [snip] Any help with the values I should be using or any guidance on estimating a Queries Performance would be unimaginably appreciated. This is an area in which I felt better armed when I used Oracle. I'm curious to see whether anyone has come up with some practical ideas here, too.
query performance
I have a db that had some 20,000 records or so in it. I have a query to find out how many jobs have been input during the current day. To count them I ran the following query...

select count(idnumber) from maintable where inputdatetime >= '$date 00:00:00' and client='smith'

$date is the current date in CCYY-MM-DD fashion, and the query runs. However it seems fairly slow. I have now added some 100,000+ records from a merge I performed and now it takes a really long time. Is there a better way to query this that will take a load off the machine? The only key in the table is the idnumber. I don't really know anything about how keys help or when to/not to use them, other than their being a necessity for an auto_increment field. TIA Larry
Re: query performance
At 07:10 PM 1/23/2004, Larry Brown wrote: I have a db that had some 20,000 records or so in it. I have a query to find out how many jobs have been input during the current day. To count them I ran the following query... select count(idnumber) from maintable where inputdatetime >= '$date 00:00:00' and client='smith' ... Is there a better way to query this that will take a load off the machine? The only key in the table is the idnumber. TIA Larry

Larry, Add two indexes, one for InputDateTime and another for Client. You should read up on MySQL. Try Paul DuBois' book "MySQL, 2nd Edition", because it starts off really easy with stuff like this, and by the time you're done, you're an expert. Mike
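Mike's suggestion can be sketched as follows. The index names are made up, and the >= comparison is assumed from Larry's description of counting jobs "during the current day"; a single composite index matching both predicates is often better still than two separate ones:

```sql
-- Two separate indexes, as suggested:
ALTER TABLE maintable ADD INDEX idx_inputdatetime (inputdatetime),
                      ADD INDEX idx_client (client);

-- Often better for this exact query: one composite index covering both predicates.
ALTER TABLE maintable ADD INDEX idx_client_input (client, inputdatetime);

-- Verify the optimizer actually uses it:
EXPLAIN SELECT COUNT(idnumber) FROM maintable
WHERE client = 'smith' AND inputdatetime >= '2004-01-23 00:00:00';
```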
Query performance
Hi, I have a table with 18 million rows. The table structure is:

describe user_att;
+---------+--------------+------+-----+---------+-------+
| Field   | Type         | Null | Key | Default | Extra |
+---------+--------------+------+-----+---------+-------+
| user_id | int(11)      |      | PRI | 0       |       |
| att_id  | int(11)      |      | PRI | 0       |       |
| value   | varchar(200) |      |     |         |       |
| date    | datetime     | YES  |     | NULL    |       |
+---------+--------------+------+-----+---------+-------+

There are 2 indexes on this table:
- one unique index on user_id and att_id (pk)
- one index on att_id and user_id.

I need to have the following query:

select value from user_att where att_id = ? and value like '?' (no wildcard)

My questions are:

1. When I do an explain, this query uses the second index. But if I change my second index to att_id and value, will the performance improve?

2. What is the difference if I change the query to select value from user_att where att_id = ? and lower(value) = lower('?')? Will this query be slower?

3. When comparing strings, is MySQL case-sensitive? It seems that it is case-insensitive. If case-insensitive, will the following query be faster? select value from user_att where att_id = ? and value = '?'

Thanks Hsiu-Hui
RE: Query performance
There are 2 indexes on this table: one unique index on user_id and att_id (pk), and one index on att_id and user_id. I need to have the following query: select value from user_att where att_id = ? and value like '?' (no wildcard)

1. When I do an explain, this query uses the second index. But if I change my second index to att_id and value, will the performance improve?

You could add it as a third index and see which works better, but on 18 million rows that's probably going to take quite a bit of time. From looking at your query it seems like it would be a better index than the current one.

2. What is the difference if I change the query to select value from user_att where att_id = ? and lower(value) = lower('?')? Will this query be slower?

I could be wrong, but I believe the query won't use the index if you use lower(). Run an explain on this query and see.

3. When comparing strings, is MySQL case-sensitive? It seems that it is case-insensitive. If case-insensitive, will the following query be faster? select value from user_att where att_id = ? and value = '?'

MySQL is only case-sensitive on binary and blob fields.
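The candidate third index discussed above can be sketched like this. The index name and the literal values standing in for the '?' parameters are hypothetical:

```sql
-- Composite index on (att_id, value): the equality lookup can then be
-- resolved from the index alone, without touching the row.
ALTER TABLE user_att ADD INDEX idx_att_value (att_id, value);

-- Should use idx_att_value; value is a plain VARCHAR, so the comparison
-- is case-insensitive by default and no LOWER() is needed:
EXPLAIN SELECT value FROM user_att WHERE att_id = 42 AND value = 'foo';

-- Wrapping the indexed column in a function defeats index use on value:
EXPLAIN SELECT value FROM user_att WHERE att_id = 42 AND LOWER(value) = 'foo';
```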
load data query performance
Hi, I read on this list about LOAD DATA query performance. Someone told me that loading a large text file into a table which has a primary key or indexes may be slow, and that the way to make LOAD DATA faster is to drop and re-create the table without indexes, load the data, then create the indexes again. But I found the opposite result. I have a delimited text file of about 5 MB (for a table with 126 fields, 1 PK, 2 indexes, 7500 rows). This is the response time:

- load data with drop table / create table / add PK and indexes: takes 50 seconds
- load data into the empty table (still having PK and indexes): takes 26 seconds

What's wrong with my environment or my LOAD DATA concept? Sommai

--
Please be informed that all e-mail which are addressing to thaithanakit.co.th will need to be changed to BTsecurities.com by March 1, 2002 Thank you. :-)

--
Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
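For reference, the usual drop-the-indexes recipe can be approximated without dropping the table at all. Hedged sketch: ALTER TABLE ... DISABLE KEYS is MyISAM-only and was introduced in MySQL 4.0, so it may not be available on the server in this thread; the table and file names are placeholders:

```sql
ALTER TABLE mytable DISABLE KEYS;      -- defer non-unique index maintenance

LOAD DATA INFILE '/tmp/data.txt'
INTO TABLE mytable
FIELDS TERMINATED BY '\t';

ALTER TABLE mytable ENABLE KEYS;       -- rebuild the deferred indexes in one pass
```

Sommai's result is also not surprising for a file of only 7,500 rows: dropping and re-creating the table pays for a full table creation plus index build on top of the load, which easily exceeds the modest cost of maintaining indexes during a small load. The drop-indexes trick only starts to win on much larger loads.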
Bizarre query performance
I have been testing an application that uses mysql on SCO OpenServer and I have discovered some strange query performance. To investigate the matter further, I have written a client program that uses the mysql C API directly so that I can time things exactly. Environment: Server: SCO OpenServer V3.2 R5.0.5, AMD K6-2 350Mhz CPU, 128Mb RAM mySQL: 3.23.39, compiled by me to avoid use of libraries, using latest available pthreads Clients: Win32 machines (more detail later). There are 2 times I am interested in, the time to execute a query, and the time required to fetch the results across the network (100Mbps LAN, 3 isolated workstations in test setup). Basically I have been timing the mysql_query() and the mysql_store_result() calls on the client. I have been getting some very bizarre results, that are 100% reproducible: QUERY 1: SELECT * FROM X WHERE ID=100 ID is defined as the PRIMARY KEY for this table, therefore this query returns exactly one row, and should be very quick. This table only has around 20 columns, mostly integers, no long text fields. Client 1 (Windows 2000 SP2, AMD Athlon 900Mhz processor, 512Mb RAM) mysql_query takes less than 10ms to execute. This is fine. mysql_store_result takes around 130ms-200ms to execute. This is not fine! The performance of this query is the same when run against a number of tables (all of my tables have a column called ID defined as a PRIMARY KEY). Client 2 (Windows 98SE, Intel Pentium II 400Mhz, 256Mb RAM) mysql_query takes around 180ms to execute. mysql_store_result takes less than 10ms to execute. Analysis: both clients take about 200ms to execute the query and fetch data across the network to the client. However, where the time is being spent is the opposite for the 2 clients. Additionally, selecting a single unique row using a primary key should not take 200ms. 
Investigating this further, I tried the following query on the same table: QUERY 2: SELECT * FROM X WHERE SCHEDULEDSTART BETWEEN '2001-10-06' AND '2001-10-07' This query returns 33 rows, i.e. a lot more data than the previous query! SCHEDULEDSTART is an indexed DATETIME field. mysql_query takes around 10ms to execute on both clients. mysql_store_result takes no measurable time to execute. These results are completely reproducible and make no sense! Fetching a single row using a primary key takes around 200ms, while fetching 33 rows using another indexed field takes around 10ms! As I say, I can reproduce this entirely. Can anyone give me some assistance with this bizarre behaviour?
Re: Bizarre query performance
Have you tried explaining the two selects to see where all the time is being spent and how the queries are optimized?

--Russell

----- Original Message -----
From: Philip Brown [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, October 05, 2001 1:18 PM
Subject: Bizarre query performance

[snip]
RE: Bizarre query performance
Have you tried explaining the two selects to see where all the time is being spent and how the queries are optimized?

Sorry, I should have included that in my detail.

+-------+-------+---------------+---------+---------+-------+------+-------+
| table | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+-------+-------+---------------+---------+---------+-------+------+-------+
| X     | const | PRIMARY,p1    | PRIMARY | 4       | const |    1 |       |
+-------+-------+---------------+---------+---------+-------+------+-------+

This is the query that takes 200ms. It performs the same regardless of the particular table involved (all have a similar primary key), or the record fetched.

+-------+-------+----------------+----------------+---------+------+------+------------+
| table | type  | possible_keys  | key            | key_len | ref  | rows | Extra      |
+-------+-------+----------------+----------------+---------+------+------+------------+
| X     | range | ScheduledStart | ScheduledStart | 8       | NULL |   25 | where used |
+-------+-------+----------------+----------------+---------+------+------+------------+

This is the query that takes less than 10ms. Performance should be worse than that above, but it is not.
Re: Bizarre query performance
On Fri, 5 Oct 2001, Philip Brown wrote:

> Environment: Server: SCO OpenServer V3.2 R5.0.5, AMD K6-2 350MHz CPU, 128MB RAM
> mySQL: 3.23.39, compiled by me to avoid use of libraries, using latest available pthreads
... much deleted ...
> Can anyone give me some assistance with this bizarre behaviour?

How is your DNS, WINS, etc. set up? SCO/Caldera UNIX can use DNS when you do not think it will. Most SCO/Caldera TCP stacks will do a forward and a reverse DNS look-up. Adding entries to the hosts file on the MS machines (the file's location is MS OS/install dependent) can change the timings. Performance can also vary depending on whether VisionFS is running and how it is configured.

What we need is more detail on exactly how things are configured; too many variables are unknown. The details of how your MS machines are configured for networking are also very important. I have also noticed that, depending on which MS patches are installed, performance can vary from machine to machine when talking to the SCO boxes.

Good Luck,

-- Boyd Gerber [EMAIL PROTECTED] ZENEZ 3748 Valley Forge Road, Magna Utah 84044 Office 801-250-0795 FAX 801-250-7975
Re: Bizarre query performance
It doesn't look like you are using an index. Have you tried creating one and seeing what the effect on execution time is?

--Russell

----- Original Message -----
From: Philip Brown [EMAIL PROTECTED]
To: Russell Miller [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Friday, October 05, 2001 1:34 PM
Subject: RE: Bizarre query performance

... snipped ...
Re: Bizarre query performance
In the last episode (Oct 05), Philip Brown said:

> Server: SCO OpenServer V3.2 R5.0.5, AMD K6-2 350MHz CPU, 128MB RAM
> mySQL: 3.23.39, compiled by me to avoid use of libraries, using latest available pthreads
> Clients: Win32 machines (more detail later). There are 2 times I am
> interested in: the time to execute a query, and the time required to
> fetch the results across the network (100Mbps LAN, 3 isolated
> workstations in test setup). Basically I have been timing the
> mysql_query() and the mysql_store_result() calls on the client. I have
> been getting some very bizarre results, that are 100% reproducible:

I suppose your test program connects, and loops the same query multiple times in the same session? (Just to rule out connect/disconnect overhead.) What are your timings if you run your client on the SCO box? If all your queries are instantaneous, I'd say start dumping packets on the network. 200ms sounds a lot like Nagle's Algorithm kicking in (which shouldn't happen, assuming the mysql libs are written right).

-- Dan Nelson [EMAIL PROTECTED]
RE: Bizarre query performance
> How is your DNS, WINS, etc. set up? SCO/Caldera UNIX can use DNS when
> you do not think it will. Most SCO/Caldera TCP stacks will do a forward
> and a reverse DNS look-up. Adding entries to the hosts file on the MS
> machines can change the timings.

All machines have hard-wired IP addresses (only 3 on the test network) and all machine names are entered in /etc/hosts.

> Performance can also vary depending on whether VisionFS is running and
> how it is configured.

VisionFS is running, default configuration.

> What we need is more detail on exactly how things are configured; too
> many variables are unknown. The details of how your MS machines are
> configured for networking are also very important. I have also noticed
> that, depending on which MS patches are installed, performance can vary
> from machine to machine when talking to the SCO boxes.

All of these networking issues would affect general query performance. However, they do not explain the erratic (yet reproducible) way in which different queries perform badly, or well, depending.
RE: Bizarre query performance
> It doesn't look like you are using an index. Have you tried creating
> one and seeing what the effect on execution time is?

What makes you think the index isn't being used? EXPLAIN SELECT * FROM X WHERE (PRIMARY KEY) = N looks like it is using the primary key. However, just to check, I created a specific unique index on that value, but performance was the same.

The second query is using the index on ScheduledStart, which is correct and fine. That query works fine. It looks like the queries on the primary key are the ones that have odd behaviour.

----- Original Message -----
From: Philip Brown [EMAIL PROTECTED]
To: Russell Miller [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Friday, October 05, 2001 1:34 PM
Subject: RE: Bizarre query performance

... snipped ...
RE: Bizarre query performance
> I suppose your test program connects, and loops the same query multiple
> times in the same session? (Just to rule out connect/disconnect
> overhead.)

Of course. I also run the same query multiple times, to eliminate caching issues. Performance on successive iterations is the same as on the first. It is bizarre how the query that asks for 33 rows on an indexed field works fine, yet the query that asks for 1 row on the primary key performs so badly.

> What are your timings if you run your client on the SCO box?

mysql simply reports a query time of 10ms or less (0.01s). Of course, this doesn't have any network overhead.

> If all your queries are instantaneous

They're not - one Windows box (the 98SE one) reports a query time of 180ms and a data fetch time of 10ms, while the Win2000 box reports the opposite (query 10ms, fetch 180ms).

> I'd say start dumping packets on the network.

I'd agree, but I'm confused as to why a different query (one that requests more data; 33 rows vs 1) can reliably execute and fetch in 10ms on all machines. The behaviour is completely reproducible: SELECT (1 record) ON PRIMARY KEY = slow (200ms), SELECT (lots of records on indexed field) = fast (10ms).

> 200ms sounds a lot like Nagle's Algorithm kicking in (which shouldn't
> happen, assuming the mysql libs are written right).

Indeed, I wouldn't have thought they'd have included that! Isn't Nagle restricted to telnet? But anyway, not all queries perform equally badly.
RE: Bizarre query performance
On Fri, 5 Oct 2001, Philip Brown wrote:

> > How is your DNS, WINS, etc. set up? SCO/Caldera UNIX can use DNS when
> > you do not think it will. Most SCO/Caldera TCP stacks will do a
> > forward and a reverse DNS look-up. Adding entries to the hosts file
> > on the MS machines can change the timings.
>
> All machines have hard-wired IP addresses (only 3 on the test network)
> and all machine names are entered in /etc/hosts.

I ran a few quick tests from a few of my machines and looked at what was being transferred. For example, on NT 4.0: C:\WINNT\system32\drivers\etc\hosts, DNS, WINS, ODBC, no ODBC. I guess what I am trying to say is that I could get results that did not make sense from one MS OS to another, depending on the OS, ODBC, configuration, patches, and so on. Even when I had everything set the same, the results did not seem to be consistent. Sniffing to verify that what was sent was really the same was the only way to know.

To make things more uniform I set VisionFS on my SCO/Caldera Unix box to be the primary WINS server, and I have a local DNS for my test setup. Make sure every machine has its own hosts file that is exactly the same, all MS machines of the same type are configured and patched the same, the mysql clients and servers are configured the same, and the databases all have good indexes. Then and only then can I get comparable results from each MS OS, and even then they vary some between them.

Also check your network drivers and settings on the SCO box. I have seen some strange behaviour because of hardware, configuration, routing, and so on. Not all MS OSes are the same; I have seen some pretty strange things with them. I think you need to get out your sniffer and look at what is really going on. It may be just how MS does things from one OS to another. Then again it may not. Until you know that things are exactly the same, you never really know.

> All of these networking issues would affect general query performance.
> However, they do not explain the erratic (yet reproducible) way in
> which different queries perform badly, or well, depending.

That is what I thought until I got out a sniffer and looked at what was really happening. I was surprised to see the differences. When the machines did the same thing, I got similar responses. What do you see from your programs on the server? Do you see the same differences?

Good Luck,

-- Boyd Gerber [EMAIL PROTECTED] ZENEZ 3748 Valley Forge Road, Magna Utah 84044 Office 801-250-0795 FAX 801-250-7975
Re: Bizarre query performance
In the last episode (Oct 05), Philip Brown said:

> > What are your timings if you run your client on the SCO box?
>
> mysql simply reports a query time of 10ms or less (0.01s). Of course,
> this doesn't have any network overhead.

This rules out mysql as the cause for the delay.

> > I'd say start dumping packets on the network.
>
> I'd agree, but I'm confused as to why a different query (one that
> requests more data; 33 rows vs 1) can reliably execute and fetch in
> 10ms on all machines. The behaviour is completely reproducible:
> SELECT (1 record) ON PRIMARY KEY = slow (200ms), SELECT (lots of
> records on indexed field) = fast (10ms)

The fact that your two queries take different times to process has nothing to do with indices, and everything to do with the byte count of the query and the response.

> > 200ms sounds a lot like Nagle's Algorithm kicking in (which shouldn't
> > happen, assuming the mysql libs are written right).
>
> Indeed, I wouldn't have thought they'd have included that! Isn't Nagle
> restricted to telnet? But anyway, not all queries perform equally badly.

Nagle's algorithm applies to all TCP sessions unless explicitly disabled. It buffers outgoing data smaller than your MSS for up to 200ms if there is unacknowledged data already on the wire. This is usually triggered by inefficient code on the sending end that does multiple write()s; the first write() gets sent immediately, and any subsequent write()s get buffered up by Nagle until 200ms pass or the ACK for the first block of data arrives from the receiving machine. The standard fix is to rewrite the sending code to send all its data in a single write(), but the simple fix (which ends up wasting bandwidth by sending many small packets) is:

int var = 1;
setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &var, sizeof(var));

Chances are your two Windows machines have different MyODBC versions, or different TCP settings in the registry, that make Nagle kick in at different times.

For more reading: http://www.faqs.org/rfcs/rfc896.html http://www.openldap.org/lists/openldap-devel/199907/msg00082.html

-- Dan Nelson [EMAIL PROTECTED]
Re: Bizarre query performance
On Fri, 5 Oct 2001, Dan Nelson wrote:

> This rules out mysql as the cause for the delay.

I agree.

> Nagle's algorithm applies to all TCP sessions unless explicitly
> disabled.

... snipped ...

> Chances are your two Windows machines have different MyODBC versions,
> or different TCP settings in the registry, that make Nagle kick in at
> different times.

Great. That is exactly what I was trying to point out in my last post, but done really poorly.

-- Boyd Gerber [EMAIL PROTECTED] ZENEZ 3748 Valley Forge Road, Magna Utah 84044 Office 801-250-0795 FAX 801-250-7975
how to improve query performance
Hi, I am pretty new to MySQL. (In fact, I just built a database and started working on it.) I want to do keyword searches on a column (CLOB). The table may contain millions of records. How do I write the query so that it is efficient? Thanks in advance for your time, Puru.