Subquery performance slow to non-existent...
Hey all... I am having a problem with sub-queries that I cannot trouble shoot. I run query a: select symbol from names where market like 'NYMEX' and name like 'natural gas {%'; and get 168 names that I manually insert into query b: select * from endOfDayData where endOfDayData.market like 'NYMEX' and endOfDayData.symbol IN (NGF09,NGF10,NGF11,NGF12,NGF13,NGF14,NGF15,NGF16,NGF17,NGF18,NGF19,NGF20, NGF21,NGF22,NGG09,NGG10,NGG11,NGG12,NGG13,NGG14,NGG15,NGG16,NGG17,NGG18,NGG19,NGG20,NGG21,NGG22,NGH09,NGH10,NGH11,NGH12,NGH13,NGH14,NGH15,NGH16, NGH17,NGH18,NGH19,NGH20,NGH21,NGH22,NGJ09,NGJ10,NGJ11,NGJ12,NGJ13,NGJ14,NGJ15,NGJ16,NGJ17,NGJ18,NGJ19,NGJ20,NGJ21,NGJ22,NGK09,NGK10,NGK11,NGK12, NGK13,NGK14,NGK15,NGK16,NGK17,NGK18,NGK19,NGK20,NGK21,NGK22,NGM09,NGM10,NGM11,NGM12,NGM13,NGM14,NGM15,NGM16,NGM17,NGM18,NGM19,NGM20,NGM21,NGM22, NGN09,NGN10,NGN11,NGN12,NGN13,NGN14,NGN15,NGN16,NGN17,NGN18,NGN19,NGN20,NGN21,NGN22,NGQ09,NGQ10,NGQ11,NGQ12,NGQ13,NGQ14,NGQ15,NGQ16,NGQ17,NGQ18, NGQ19,NGQ20,NGQ21,NGQ22,NGU09,NGU10,NGU11,NGU12,NGU13,NGU14,NGU15,NGU16,NGU17,NGU18,NGU19,NGU20,NGU21,NGU22,NGV09,NGV10,NGV11,NGV12,NGV13,NGV14, NGV15,NGV16,NGV17,NGV18,NGV19,NGV20,NGV21,NGV22,NGX09,NGX10,NGX11,NGX12,NGX13,NGX14,NGX15,NGX16,NGX17,NGX18,NGX19,NGX20,NGX21,NGX22,NGZ09,NGZ10, NGZ11,NGZ12,NGZ13,NGZ14,NGZ15,NGZ16,NGZ17,NGZ18,NGZ19,NGZ20,NGZ21,NGZ22); Running query b gives me a result set as follows: | 2010-01-15 | NYMEX | NGZ22 | 8.9620 | 8.9680 | 8.9620 | 8.9680 | 0 | ++++-+-+-+-++ 86765 rows in set (4.46 sec) I then because I want to generalize query b I continue by creating query c as follows: mysql select * from endOfDayData where endOfDayData.market like 'NYMEX' and endOfDayData.symbol IN (select names.symbol from names where names.market like 'NYMEX' and names.name like 'natural gas {%'); Query c seems to have good syntax as neither the command line mysql interface nor the gui spit it back but it literally takes forever to run; I've waited at least twenty minutes and not got anything back. I'm running Ubuntu 9.10 on an intel core i7 with 4GB RAM and 12GB swap... the process monitor doesn't even flinch so I'm not thinking hardware here... why is the sub-query running so slow? Thanks for you help!! Nick
Re: Subquery performance slow to non-existent...
In the last episode (Jan 19), Nick Torenvliet said: Hey all... I am having a problem with sub-queries that I cannot trouble shoot. I run query a: select symbol from names where market like 'NYMEX' and name like 'natural gas {%'; and get 168 names that I manually insert into query b: select * from endOfDayData where endOfDayData.market like 'NYMEX' and endOfDayData.symbol IN (NGF09,NGF10,NGF11,NGF12,NGF13,NGF14,NGF15,NGF16,NGF17,NGF18,NGF19,NGF20, NGF21,NGF22,NGG09,NGG10,NGG11,NGG12,NGG13,NGG14,NGG15,NGG16,NGG17,NGG18,NGG19,NGG20,NGG21,NGG22,NGH09,NGH10,NGH11,NGH12,NGH13,NGH14,NGH15,NGH16, NGH17,NGH18,NGH19,NGH20,NGH21,NGH22,NGJ09,NGJ10,NGJ11,NGJ12,NGJ13,NGJ14,NGJ15,NGJ16,NGJ17,NGJ18,NGJ19,NGJ20,NGJ21,NGJ22,NGK09,NGK10,NGK11,NGK12, NGK13,NGK14,NGK15,NGK16,NGK17,NGK18,NGK19,NGK20,NGK21,NGK22,NGM09,NGM10,NGM11,NGM12,NGM13,NGM14,NGM15,NGM16,NGM17,NGM18,NGM19,NGM20,NGM21,NGM22, NGN09,NGN10,NGN11,NGN12,NGN13,NGN14,NGN15,NGN16,NGN17,NGN18,NGN19,NGN20,NGN21,NGN22,NGQ09,NGQ10,NGQ11,NGQ12,NGQ13,NGQ14,NGQ15,NGQ16,NGQ17,NGQ18, NGQ19,NGQ20,NGQ21,NGQ22,NGU09,NGU10,NGU11,NGU12,NGU13,NGU14,NGU15,NGU16,NGU17,NGU18,NGU19,NGU20,NGU21,NGU22,NGV09,NGV10,NGV11,NGV12,NGV13,NGV14, NGV15,NGV16,NGV17,NGV18,NGV19,NGV20,NGV21,NGV22,NGX09,NGX10,NGX11,NGX12,NGX13,NGX14,NGX15,NGX16,NGX17,NGX18,NGX19,NGX20,NGX21,NGX22,NGZ09,NGZ10, NGZ11,NGZ12,NGZ13,NGZ14,NGZ15,NGZ16,NGZ17,NGZ18,NGZ19,NGZ20,NGZ21,NGZ22); Running query b gives me a result set as follows: | 2010-01-15 | NYMEX | NGZ22 | 8.9620 | 8.9680 | 8.9620 | 8.9680 | 0 | ++++-+-+-+-++ 86765 rows in set (4.46 sec) I then because I want to generalize query b I continue by creating query c as follows: mysql select * from endOfDayData where endOfDayData.market like 'NYMEX' and endOfDayData.symbol IN (select names.symbol from names where names.market like 'NYMEX' and names.name like 'natural gas {%'); Query c seems to have good syntax as neither the command line mysql interface nor the gui spit it back but it literally takes forever to run; I've waited at least twenty minutes and not got anything back. I'm running Ubuntu 9.10 on an intel core i7 with 4GB RAM and 12GB swap... the process monitor doesn't even flinch so I'm not thinking hardware here... why is the sub-query running so slow? MySQL's subquery optimizer is pretty bad. I bet if you explain that query, mysql thinks the subquery is dependant. That means that it will run the subquery for each row of the outer query, even though it's obviously not going to change from row to row. Your best bet for now is to do what you're currently doing with queries A and B. -- Dan Nelson dnel...@allantgroup.com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org
Re: Subquery performance slow to non-existent...
Nick Torenvliet wrote: Hey all... I am having a problem with sub-queries that I cannot trouble shoot. I run query a: select symbol from names where market like 'NYMEX' and name like 'natural gas {%'; and get 168 names that I manually insert into query b: select * from endOfDayData where endOfDayData.market like 'NYMEX' and endOfDayData.symbol IN (... big list clipped ...); Running query b gives me a result set as follows: | 2010-01-15 | NYMEX | NGZ22 | 8.9620 | 8.9680 | 8.9620 | 8.9680 | 0 | ++++-+-+-+-++ 86765 rows in set (4.46 sec) I then because I want to generalize query b I continue by creating query c as follows: mysql select * from endOfDayData where endOfDayData.market like 'NYMEX' and endOfDayData.symbol IN (select names.symbol from names where names.market like 'NYMEX' and names.name like 'natural gas {%'); Query c seems to have good syntax as neither the command line mysql interface nor the gui spit it back but it literally takes forever to run; I've waited at least twenty minutes and not got anything back. I'm running Ubuntu 9.10 on an intel core i7 with 4GB RAM and 12GB swap... the process monitor doesn't even flinch so I'm not thinking hardware here... why is the sub-query running so slow? Thanks for you help!! Subqueries are also not indexed. Even if this is an independent subquery, the optimizer will still need to scan the results of each subquery for every line of the outer query. Try rewriting this as a joinl SELECT eod.* from endOfDayData eod INNER JOIN names n on n.symbol = eod.symbol and n.market like 'NYMEX' and names.name like 'natural gas {%'; Try that and see what a difference it makes. -- Shawn Green, MySQL Senior Support Engineer Sun Microsystems, Inc. Office: Blountville, TN -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org
Re: Subquery performance slow to non-existent...
Nick, select * from endOfDayData where endOfDayData.market like 'NYMEX' and endOfDayData.symbol IN (select names.symbol from names where names.market like 'NYMEX' and names.name like 'natural gas {%'); Query c seems to have good syntax as neither the command line mysql interface nor the gui spit it back but it literally takes forever to run; IN(SELECT...) is slow in MySQL; for alternatives see The unbearable slowness of IN() at http://www.artfulsoftware.com/infotree/queries.php. Try something like this? select * from endOfDayData d join names n using (symbol) where d.market = 'NYMEX' and 'natural gas {' = substr( n.name,1,13); PB - Nick Torenvliet wrote: Hey all... I am having a problem with sub-queries that I cannot trouble shoot. I run query a: select symbol from names where market like 'NYMEX' and name like 'natural gas {%'; and get 168 names that I manually insert into query b: select * from endOfDayData where endOfDayData.market like 'NYMEX' and endOfDayData.symbol IN (NGF09,NGF10,NGF11,NGF12,NGF13,NGF14,NGF15,NGF16,NGF17,NGF18,NGF19,NGF20, NGF21,NGF22,NGG09,NGG10,NGG11,NGG12,NGG13,NGG14,NGG15,NGG16,NGG17,NGG18,NGG19,NGG20,NGG21,NGG22,NGH09,NGH10,NGH11,NGH12,NGH13,NGH14,NGH15,NGH16, NGH17,NGH18,NGH19,NGH20,NGH21,NGH22,NGJ09,NGJ10,NGJ11,NGJ12,NGJ13,NGJ14,NGJ15,NGJ16,NGJ17,NGJ18,NGJ19,NGJ20,NGJ21,NGJ22,NGK09,NGK10,NGK11,NGK12, NGK13,NGK14,NGK15,NGK16,NGK17,NGK18,NGK19,NGK20,NGK21,NGK22,NGM09,NGM10,NGM11,NGM12,NGM13,NGM14,NGM15,NGM16,NGM17,NGM18,NGM19,NGM20,NGM21,NGM22, NGN09,NGN10,NGN11,NGN12,NGN13,NGN14,NGN15,NGN16,NGN17,NGN18,NGN19,NGN20,NGN21,NGN22,NGQ09,NGQ10,NGQ11,NGQ12,NGQ13,NGQ14,NGQ15,NGQ16,NGQ17,NGQ18, NGQ19,NGQ20,NGQ21,NGQ22,NGU09,NGU10,NGU11,NGU12,NGU13,NGU14,NGU15,NGU16,NGU17,NGU18,NGU19,NGU20,NGU21,NGU22,NGV09,NGV10,NGV11,NGV12,NGV13,NGV14, NGV15,NGV16,NGV17,NGV18,NGV19,NGV20,NGV21,NGV22,NGX09,NGX10,NGX11,NGX12,NGX13,NGX14,NGX15,NGX16,NGX17,NGX18,NGX19,NGX20,NGX21,NGX22,NGZ09,NGZ10, NGZ11,NGZ12,NGZ13,NGZ14,NGZ15,NGZ16,NGZ17,NGZ18,NGZ19,NGZ20,NGZ21,NGZ22); Running query b gives me a result set as follows: | 2010-01-15 | NYMEX | NGZ22 | 8.9620 | 8.9680 | 8.9620 | 8.9680 | 0 | ++++-+-+-+-++ 86765 rows in set (4.46 sec) I then because I want to generalize query b I continue by creating query c as follows: mysql select * from endOfDayData where endOfDayData.market like 'NYMEX' and endOfDayData.symbol IN (select names.symbol from names where names.market like 'NYMEX' and names.name like 'natural gas {%'); Query c seems to have good syntax as neither the command line mysql interface nor the gui spit it back but it literally takes forever to run; I've waited at least twenty minutes and not got anything back. I'm running Ubuntu 9.10 on an intel core i7 with 4GB RAM and 12GB swap... the process monitor doesn't even flinch so I'm not thinking hardware here... why is the sub-query running so slow? Thanks for you help!! Nick No virus found in this incoming message. Checked by AVG - www.avg.com Version: 8.5.432 / Virus Database: 270.14.150/2632 - Release Date: 01/19/10 07:34:00
subquery performance
I have a query with a subquery which does not throw an error, but does not return either. I've been testing the query using mysql Query Browser and the poor dolphin jumps only once a minute or so ;) I use MYSQL's excellent error documentation heavily so if the query doesn't throw an error, I'm at a disadvantage. Is there an optimization or just better syntax to use? THE DATA I have a table that contains an abbreviated identifier and a full identifier. The real data is a bit messy so here's a sanitized example: Abbreviated Column contents: TR123, RG456 Full Identifier Column contents: TR 123 abc, RG 456 def THE QUERY My intent is to: 1. select some of the Abbreviated Column and convert that to a selection for the Full Identifier Column by: - extracting the first 2 characters - inserting a space - selecting the last 3 characters - appending % so I can match any of the last 3 characters in the Full Identifier 2. select rows from Full Identifier Column based on #1 PROBLEMS I think I have two problems: 1. in... % syntax is not present in any examples I've seen. They are all like... % so in may not work. 2. Here's query that runs, but does not return: select name, address from testTable where FullIdentifier in ( select concat ( substring ( AbbreviatedIdentifier,1,2) , , substring(AbbreviatedIdentifier from 3) , % ) from testTable where name like 'Daisy')); My left join attempt complained because the data is all in one table. Is there a better solution than my FullIdentifier in(select... ? I am not an SQL expert so I'd appreciate any ideas on how to correct this query. Thanks
Re: subquery performance
Jeff Drew wrote: I have a query with a subquery which does not throw an error, but does not return either. I've been testing the query using mysql Query Browser and the poor dolphin jumps only once a minute or so ;) I use MYSQL's excellent error documentation heavily so if the query doesn't throw an error, I'm at a disadvantage. Is there an optimization or just better syntax to use? THE DATA I have a table that contains an abbreviated identifier and a full identifier. The real data is a bit messy so here's a sanitized example: Please don't do that. When you don't show us the real problem, you increase the odds of getting the wrong solution. Abbreviated Column contents: TR123, RG456 Full Identifier Column contents: TR 123 abc, RG 456 def Do I understand correctly that your table looks something like AbbreviatedIdentifier FullIdentifier Name Address - -- --- TR123 TR 123 abc a name an address RG456 RG 456 def another name another address with FullIdentifier as the primary key? If so, that's a poor table design, which is almost certainly causing your problems. See below. THE QUERY My intent is to: 1. select some of the Abbreviated Column and convert that to a selection for the Full Identifier Column by: - extracting the first 2 characters - inserting a space - selecting the last 3 characters - appending % so I can match any of the last 3 characters in the Full Identifier ...in the subquery. The % character is the wildcard for LIKE matches. There is no wildcard for IN. IN is followed by a list of values, one of which must match exactly for the row to be selected, so this won't work as intended. 2. select rows from Full Identifier Column based on #1 That's very convoluted. How do you expect this to help? I'm guessing that FullIdentifier is your primary key. Because it's a long, messy string, you are finding it slow to use it to select rows, particularly when you need to find rows with a particular substring buried in the key. You hoped that a shorter string might match more quickly, enabling you to narrow down the number of rows where the full id has to be examined. That will never work as you've described it. The subquery might run faster, but then you must do the full id comparison anyway to find rows which match the subquery list (FullIdentifier IN ...). You've actually added overhead. This *might* work in some situations if you had an integer primary key to use to do the matching between inner and outer query, but there would probably still be a better way. In any case, the reality is that MySQL doesn't optimize subqueries all that well, so they are seldom the best way to speed up a query. It may be the case that your subquery is being treated as DEPENDENT, in which case it is being run once for each row found in the outer query. That is, once for every row in your table. Since your query never seems to return, I'd bet that's the case here. PROBLEMS I think I have two problems: 1. in... % syntax is not present in any examples I've seen. They are all like... % so in may not work. Right, it won't. 2. Here's query that runs, but does not return: reformatted SELECT name, address FROM testTable WHERE FullIdentifier IN ( SELECT CONCAT(SUBSTRING(AbbreviatedIdentifier,1,2), , SUBSTRING(AbbreviatedIdentifier FROM 3) , % ) FROM testTable WHERE name LIKE 'Daisy')); Is this really your query? As I've already mentioned, the % won't work for IN. Also, neither FullIdentifier nor AbbreviatedIdentifier is mentioned in the selection criteria, so the problem you tried to fix with the subquery is not present in this query. Finally, there is no wildcard in the string that name is supposed to match, so there is no need for LIKE. The intent of this query is equivalent to SELECT name, address FROM testTable WHERE name = 'Daisy'; which should be as fast as possible if there is an index on name. My left join attempt complained because the data is all in one table. Is there a better solution than my FullIdentifier in(select... ? I am not an SQL expert so I'd appreciate any ideas on how to correct this query. When I see a string primary key that looks like TR 123 abc, I find it is usually a safe bet that the TR means something (a code, a manufacturer, ...), the 123 means something (a production run, a part number, a plant number, ...) and the abc means something (some detail, a team code, ...). In other words, you've crammed the answers to 3 questions into one column. That's almost never a good idea. I would (for example): #Create an auto_inc id column and 3 separate columns for each of the id parts ALTER TABLE testTable DROP PRIMARY KEY, ADD id INTEGER UNSIGNED NOT
Re: subquery performance
Thanks for the detailed feedback. I do not create the data so I can't modify it. Your illustration of the table is correct. I'll try to implement some of your feedback. Thanks On 9/25/06, Michael Stassen [EMAIL PROTECTED] wrote: Jeff Drew wrote: I have a query with a subquery which does not throw an error, but does not return either. I've been testing the query using mysql Query Browser and the poor dolphin jumps only once a minute or so ;) I use MYSQL's excellent error documentation heavily so if the query doesn't throw an error, I'm at a disadvantage. Is there an optimization or just better syntax to use? THE DATA I have a table that contains an abbreviated identifier and a full identifier. The real data is a bit messy so here's a sanitized example: Please don't do that. When you don't show us the real problem, you increase the odds of getting the wrong solution. Abbreviated Column contents: TR123, RG456 Full Identifier Column contents: TR 123 abc, RG 456 def Do I understand correctly that your table looks something like AbbreviatedIdentifier FullIdentifier Name Address - -- --- TR123 TR 123 abc a name an address RG456 RG 456 def another name another address with FullIdentifier as the primary key? If so, that's a poor table design, which is almost certainly causing your problems. See below. THE QUERY My intent is to: 1. select some of the Abbreviated Column and convert that to a selection for the Full Identifier Column by: - extracting the first 2 characters - inserting a space - selecting the last 3 characters - appending % so I can match any of the last 3 characters in the Full Identifier ...in the subquery. The % character is the wildcard for LIKE matches. There is no wildcard for IN. IN is followed by a list of values, one of which must match exactly for the row to be selected, so this won't work as intended. 2. select rows from Full Identifier Column based on #1 That's very convoluted. How do you expect this to help? I'm guessing that FullIdentifier is your primary key. Because it's a long, messy string, you are finding it slow to use it to select rows, particularly when you need to find rows with a particular substring buried in the key. You hoped that a shorter string might match more quickly, enabling you to narrow down the number of rows where the full id has to be examined. That will never work as you've described it. The subquery might run faster, but then you must do the full id comparison anyway to find rows which match the subquery list (FullIdentifier IN ...). You've actually added overhead. This *might* work in some situations if you had an integer primary key to use to do the matching between inner and outer query, but there would probably still be a better way. In any case, the reality is that MySQL doesn't optimize subqueries all that well, so they are seldom the best way to speed up a query. It may be the case that your subquery is being treated as DEPENDENT, in which case it is being run once for each row found in the outer query. That is, once for every row in your table. Since your query never seems to return, I'd bet that's the case here. PROBLEMS I think I have two problems: 1. in... % syntax is not present in any examples I've seen. They are all like... % so in may not work. Right, it won't. 2. Here's query that runs, but does not return: reformatted SELECT name, address FROM testTable WHERE FullIdentifier IN ( SELECT CONCAT(SUBSTRING(AbbreviatedIdentifier,1,2), , SUBSTRING(AbbreviatedIdentifier FROM 3) , % ) FROM testTable WHERE name LIKE 'Daisy')); Is this really your query? As I've already mentioned, the % won't work for IN. Also, neither FullIdentifier nor AbbreviatedIdentifier is mentioned in the selection criteria, so the problem you tried to fix with the subquery is not present in this query. Finally, there is no wildcard in the string that name is supposed to match, so there is no need for LIKE. The intent of this query is equivalent to SELECT name, address FROM testTable WHERE name = 'Daisy'; which should be as fast as possible if there is an index on name. My left join attempt complained because the data is all in one table. Is there a better solution than my FullIdentifier in(select... ? I am not an SQL expert so I'd appreciate any ideas on how to correct this query. When I see a string primary key that looks like TR 123 abc, I find it is usually a safe bet that the TR means something (a code, a manufacturer, ...), the 123 means something (a production run, a part number, a plant number, ...) and the abc means something (some detail, a team code, ...). In other words, you've crammed the answers to 3 questions into one column.