Subquery performance slow to non-existent...

2010-01-19 Thread Nick Torenvliet
Hey all...

I am having a problem with sub-queries that I cannot trouble shoot.

I run query a:
select symbol from names where market like 'NYMEX' and name like 'natural
gas {%';

and get  168 names that I manually insert into query b:
 select * from endOfDayData where endOfDayData.market like 'NYMEX' and
endOfDayData.symbol IN
(NGF09,NGF10,NGF11,NGF12,NGF13,NGF14,NGF15,NGF16,NGF17,NGF18,NGF19,NGF20,
NGF21,NGF22,NGG09,NGG10,NGG11,NGG12,NGG13,NGG14,NGG15,NGG16,NGG17,NGG18,NGG19,NGG20,NGG21,NGG22,NGH09,NGH10,NGH11,NGH12,NGH13,NGH14,NGH15,NGH16,
NGH17,NGH18,NGH19,NGH20,NGH21,NGH22,NGJ09,NGJ10,NGJ11,NGJ12,NGJ13,NGJ14,NGJ15,NGJ16,NGJ17,NGJ18,NGJ19,NGJ20,NGJ21,NGJ22,NGK09,NGK10,NGK11,NGK12,
NGK13,NGK14,NGK15,NGK16,NGK17,NGK18,NGK19,NGK20,NGK21,NGK22,NGM09,NGM10,NGM11,NGM12,NGM13,NGM14,NGM15,NGM16,NGM17,NGM18,NGM19,NGM20,NGM21,NGM22,
NGN09,NGN10,NGN11,NGN12,NGN13,NGN14,NGN15,NGN16,NGN17,NGN18,NGN19,NGN20,NGN21,NGN22,NGQ09,NGQ10,NGQ11,NGQ12,NGQ13,NGQ14,NGQ15,NGQ16,NGQ17,NGQ18,
NGQ19,NGQ20,NGQ21,NGQ22,NGU09,NGU10,NGU11,NGU12,NGU13,NGU14,NGU15,NGU16,NGU17,NGU18,NGU19,NGU20,NGU21,NGU22,NGV09,NGV10,NGV11,NGV12,NGV13,NGV14,
NGV15,NGV16,NGV17,NGV18,NGV19,NGV20,NGV21,NGV22,NGX09,NGX10,NGX11,NGX12,NGX13,NGX14,NGX15,NGX16,NGX17,NGX18,NGX19,NGX20,NGX21,NGX22,NGZ09,NGZ10,
NGZ11,NGZ12,NGZ13,NGZ14,NGZ15,NGZ16,NGZ17,NGZ18,NGZ19,NGZ20,NGZ21,NGZ22);

Running query b gives me a result set as follows:

| 2010-01-15 | NYMEX  | NGZ22  |  8.9620 |  8.9680 |  8.9620 |  8.9680
|  0 |
++++-+-+-+-++
86765 rows in set (4.46 sec)

I then because I want to generalize query b I continue by creating query c
as follows:
mysql select * from endOfDayData where endOfDayData.market like 'NYMEX'
and  endOfDayData.symbol IN (select names.symbol from names where
names.market like  'NYMEX' and names.name like 'natural gas {%');

Query c seems to have good syntax as neither the command line mysql
interface nor the gui spit it back but it literally takes forever to run;
I've waited at least twenty minutes and not got anything back.  I'm running
Ubuntu 9.10 on an intel core i7 with 4GB RAM and 12GB swap... the process
monitor doesn't even flinch so I'm not thinking hardware here... why is the
sub-query running so slow?

Thanks for you help!!

Nick


Re: Subquery performance slow to non-existent...

2010-01-19 Thread Dan Nelson
In the last episode (Jan 19), Nick Torenvliet said:
 Hey all...
 
 I am having a problem with sub-queries that I cannot trouble shoot.
 
 I run query a:
 select symbol from names where market like 'NYMEX' and name like 'natural
 gas {%';
 
 and get  168 names that I manually insert into query b:
  select * from endOfDayData where endOfDayData.market like 'NYMEX' and
 endOfDayData.symbol IN
 (NGF09,NGF10,NGF11,NGF12,NGF13,NGF14,NGF15,NGF16,NGF17,NGF18,NGF19,NGF20,
 NGF21,NGF22,NGG09,NGG10,NGG11,NGG12,NGG13,NGG14,NGG15,NGG16,NGG17,NGG18,NGG19,NGG20,NGG21,NGG22,NGH09,NGH10,NGH11,NGH12,NGH13,NGH14,NGH15,NGH16,
 NGH17,NGH18,NGH19,NGH20,NGH21,NGH22,NGJ09,NGJ10,NGJ11,NGJ12,NGJ13,NGJ14,NGJ15,NGJ16,NGJ17,NGJ18,NGJ19,NGJ20,NGJ21,NGJ22,NGK09,NGK10,NGK11,NGK12,
 NGK13,NGK14,NGK15,NGK16,NGK17,NGK18,NGK19,NGK20,NGK21,NGK22,NGM09,NGM10,NGM11,NGM12,NGM13,NGM14,NGM15,NGM16,NGM17,NGM18,NGM19,NGM20,NGM21,NGM22,
 NGN09,NGN10,NGN11,NGN12,NGN13,NGN14,NGN15,NGN16,NGN17,NGN18,NGN19,NGN20,NGN21,NGN22,NGQ09,NGQ10,NGQ11,NGQ12,NGQ13,NGQ14,NGQ15,NGQ16,NGQ17,NGQ18,
 NGQ19,NGQ20,NGQ21,NGQ22,NGU09,NGU10,NGU11,NGU12,NGU13,NGU14,NGU15,NGU16,NGU17,NGU18,NGU19,NGU20,NGU21,NGU22,NGV09,NGV10,NGV11,NGV12,NGV13,NGV14,
 NGV15,NGV16,NGV17,NGV18,NGV19,NGV20,NGV21,NGV22,NGX09,NGX10,NGX11,NGX12,NGX13,NGX14,NGX15,NGX16,NGX17,NGX18,NGX19,NGX20,NGX21,NGX22,NGZ09,NGZ10,
 NGZ11,NGZ12,NGZ13,NGZ14,NGZ15,NGZ16,NGZ17,NGZ18,NGZ19,NGZ20,NGZ21,NGZ22);
 
 Running query b gives me a result set as follows:
 
 | 2010-01-15 | NYMEX  | NGZ22  |  8.9620 |  8.9680 |  8.9620 |  8.9680
 |  0 |
 ++++-+-+-+-++
 86765 rows in set (4.46 sec)
 
 I then because I want to generalize query b I continue by creating query c
 as follows:
 mysql select * from endOfDayData where endOfDayData.market like 'NYMEX'
 and  endOfDayData.symbol IN (select names.symbol from names where
 names.market like  'NYMEX' and names.name like 'natural gas {%');
 
 Query c seems to have good syntax as neither the command line mysql
 interface nor the gui spit it back but it literally takes forever to run;
 I've waited at least twenty minutes and not got anything back.  I'm running
 Ubuntu 9.10 on an intel core i7 with 4GB RAM and 12GB swap... the process
 monitor doesn't even flinch so I'm not thinking hardware here... why is the
 sub-query running so slow?

MySQL's subquery optimizer is pretty bad.  I bet if you explain that query,
mysql thinks the subquery is dependant.  That means that it will run the
subquery for each row of the outer query, even though it's obviously not
going to change from row to row.  Your best bet for now is to do what you're
currently doing with queries A and B.

-- 
Dan Nelson
dnel...@allantgroup.com

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org



Re: Subquery performance slow to non-existent...

2010-01-19 Thread Shawn Green

Nick Torenvliet wrote:

Hey all...

I am having a problem with sub-queries that I cannot trouble shoot.

I run query a:
select symbol from names where market like 'NYMEX' and name like 'natural
gas {%';

and get  168 names that I manually insert into query b:
 select * from endOfDayData where endOfDayData.market like 'NYMEX' and
endOfDayData.symbol IN
(... big list clipped ...);

Running query b gives me a result set as follows:

| 2010-01-15 | NYMEX  | NGZ22  |  8.9620 |  8.9680 |  8.9620 |  8.9680
|  0 |
++++-+-+-+-++
86765 rows in set (4.46 sec)

I then because I want to generalize query b I continue by creating query c
as follows:
mysql select * from endOfDayData where endOfDayData.market like 'NYMEX'
and  endOfDayData.symbol IN (select names.symbol from names where
names.market like  'NYMEX' and names.name like 'natural gas {%');

Query c seems to have good syntax as neither the command line mysql
interface nor the gui spit it back but it literally takes forever to run;
I've waited at least twenty minutes and not got anything back.  I'm running
Ubuntu 9.10 on an intel core i7 with 4GB RAM and 12GB swap... the process
monitor doesn't even flinch so I'm not thinking hardware here... why is the
sub-query running so slow?

Thanks for you help!!


Subqueries are also not indexed. Even if this is an independent 
subquery, the optimizer will still need to scan the results of each 
subquery for every line of the outer query.


Try rewriting this as a joinl

SELECT eod.*
from endOfDayData eod
INNER JOIN names n
  on n.symbol = eod.symbol
  and n.market like 'NYMEX'
  and names.name like 'natural gas {%';

Try that and see what a difference it makes.

--
Shawn Green, MySQL Senior Support Engineer
Sun Microsystems, Inc.
Office: Blountville, TN



--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org



Re: Subquery performance slow to non-existent...

2010-01-19 Thread Peter Brawley

Nick,


select * from endOfDayData where endOfDayData.market like 'NYMEX'
and  endOfDayData.symbol IN (select names.symbol from names where
names.market like  'NYMEX' and names.name like 'natural gas {%');

Query c seems to have good syntax as neither the command line mysql
interface nor the gui spit it back but it literally takes forever to run;


IN(SELECT...) is slow in MySQL; for alternatives see The unbearable slowness of IN() at http://www.artfulsoftware.com/infotree/queries.php. 


Try something like this?

select * 
from endOfDayData d

join names n using (symbol)
where d.market = 'NYMEX' and 'natural gas {' = substr( n.name,1,13);

PB

-

Nick Torenvliet wrote:

Hey all...

I am having a problem with sub-queries that I cannot trouble shoot.

I run query a:
select symbol from names where market like 'NYMEX' and name like 'natural
gas {%';

and get  168 names that I manually insert into query b:
 select * from endOfDayData where endOfDayData.market like 'NYMEX' and
endOfDayData.symbol IN
(NGF09,NGF10,NGF11,NGF12,NGF13,NGF14,NGF15,NGF16,NGF17,NGF18,NGF19,NGF20,
NGF21,NGF22,NGG09,NGG10,NGG11,NGG12,NGG13,NGG14,NGG15,NGG16,NGG17,NGG18,NGG19,NGG20,NGG21,NGG22,NGH09,NGH10,NGH11,NGH12,NGH13,NGH14,NGH15,NGH16,
NGH17,NGH18,NGH19,NGH20,NGH21,NGH22,NGJ09,NGJ10,NGJ11,NGJ12,NGJ13,NGJ14,NGJ15,NGJ16,NGJ17,NGJ18,NGJ19,NGJ20,NGJ21,NGJ22,NGK09,NGK10,NGK11,NGK12,
NGK13,NGK14,NGK15,NGK16,NGK17,NGK18,NGK19,NGK20,NGK21,NGK22,NGM09,NGM10,NGM11,NGM12,NGM13,NGM14,NGM15,NGM16,NGM17,NGM18,NGM19,NGM20,NGM21,NGM22,
NGN09,NGN10,NGN11,NGN12,NGN13,NGN14,NGN15,NGN16,NGN17,NGN18,NGN19,NGN20,NGN21,NGN22,NGQ09,NGQ10,NGQ11,NGQ12,NGQ13,NGQ14,NGQ15,NGQ16,NGQ17,NGQ18,
NGQ19,NGQ20,NGQ21,NGQ22,NGU09,NGU10,NGU11,NGU12,NGU13,NGU14,NGU15,NGU16,NGU17,NGU18,NGU19,NGU20,NGU21,NGU22,NGV09,NGV10,NGV11,NGV12,NGV13,NGV14,
NGV15,NGV16,NGV17,NGV18,NGV19,NGV20,NGV21,NGV22,NGX09,NGX10,NGX11,NGX12,NGX13,NGX14,NGX15,NGX16,NGX17,NGX18,NGX19,NGX20,NGX21,NGX22,NGZ09,NGZ10,
NGZ11,NGZ12,NGZ13,NGZ14,NGZ15,NGZ16,NGZ17,NGZ18,NGZ19,NGZ20,NGZ21,NGZ22);

Running query b gives me a result set as follows:

| 2010-01-15 | NYMEX  | NGZ22  |  8.9620 |  8.9680 |  8.9620 |  8.9680
|  0 |
++++-+-+-+-++
86765 rows in set (4.46 sec)

I then because I want to generalize query b I continue by creating query c
as follows:
mysql select * from endOfDayData where endOfDayData.market like 'NYMEX'
and  endOfDayData.symbol IN (select names.symbol from names where
names.market like  'NYMEX' and names.name like 'natural gas {%');

Query c seems to have good syntax as neither the command line mysql
interface nor the gui spit it back but it literally takes forever to run;
I've waited at least twenty minutes and not got anything back.  I'm running
Ubuntu 9.10 on an intel core i7 with 4GB RAM and 12GB swap... the process
monitor doesn't even flinch so I'm not thinking hardware here... why is the
sub-query running so slow?

Thanks for you help!!

Nick

  




No virus found in this incoming message.
Checked by AVG - www.avg.com 
Version: 8.5.432 / Virus Database: 270.14.150/2632 - Release Date: 01/19/10 07:34:00


  


subquery performance

2006-09-25 Thread Jeff Drew

I have a query with a subquery which does not throw an error, but does not
return either.   I've been testing the query using mysql Query Browser and
the poor dolphin jumps only once a minute or so ;)   I use MYSQL's excellent
error documentation heavily so if the query doesn't throw an error, I'm at a
disadvantage.  Is there an optimization or just better syntax to use?

THE DATA
I have a table that contains an abbreviated identifier and a full
identifier.  The real data is a bit messy so here's a sanitized example:

Abbreviated Column contents:   TR123, RG456
Full Identifier Column contents: TR 123 abc, RG 456 def

THE QUERY
My intent is to:
1. select some of the Abbreviated Column and convert that to a selection for
the Full Identifier Column by:
   - extracting the first 2 characters
   - inserting a space
   - selecting the last 3 characters
   - appending % so I can match any of the last 3 characters in the
Full Identifier
2. select rows from Full Identifier Column based on #1

PROBLEMS
I think I have two problems:
1. in... % syntax  is not present in any examples I've seen.  They are all
like... %  so in may not work.
2. Here's query that runs, but does not return:

select name, address from testTable where FullIdentifier in ( select concat
( substring ( AbbreviatedIdentifier,1,2) ,   ,
substring(AbbreviatedIdentifier from 3) , % ) from testTable where name
like 'Daisy'));

My left join attempt complained  because the data is all in one table. Is
there a better solution than my  FullIdentifier in(select...  ?

I am not an SQL expert so I'd appreciate any ideas on how to correct this
query.

Thanks


Re: subquery performance

2006-09-25 Thread Michael Stassen

Jeff Drew wrote:
 I have a query with a subquery which does not throw an error, but does not
 return either.   I've been testing the query using mysql Query Browser and
 the poor dolphin jumps only once a minute or so ;)   I use MYSQL's
 excellent error documentation heavily so if the query doesn't throw an
 error, I'm at a disadvantage.  Is there an optimization or just better
 syntax to use?

 THE DATA
 I have a table that contains an abbreviated identifier and a full
 identifier.  The real data is a bit messy so here's a sanitized example:

Please don't do that.  When you don't show us the real problem, you increase the 
odds of getting the wrong solution.


 Abbreviated Column contents:   TR123, RG456
 Full Identifier Column contents: TR 123 abc, RG 456 def

Do I understand correctly that your table looks something like

  AbbreviatedIdentifier  FullIdentifier Name   Address
  -  --    ---
  TR123  TR 123 abc a name an address
  RG456  RG 456 def another name   another address

with FullIdentifier as the primary key?  If so, that's a poor table design, 
which is almost certainly causing your problems.  See below.


 THE QUERY
 My intent is to:
 1. select some of the Abbreviated Column and convert that to a selection
 for the Full Identifier Column by:
- extracting the first 2 characters
- inserting a space
- selecting the last 3 characters
- appending % so I can match any of the last 3 characters in the
 Full Identifier

...in the subquery.  The % character is the wildcard for LIKE matches.  There is 
no wildcard for IN.  IN is followed by a list of values, one of which must match 
exactly for the row to be selected, so this won't work as intended.


 2. select rows from Full Identifier Column based on #1

That's very convoluted.  How do you expect this to help?

I'm guessing that FullIdentifier is your primary key.  Because it's a long, 
messy string, you are finding it slow to use it to select rows, particularly 
when you need to find rows with a particular substring buried in the key.  You 
hoped that a shorter string might match more quickly, enabling you to narrow 
down the number of rows where the full id has to be examined.  That will never 
work as you've described it.  The subquery might run faster, but then you must 
do the full id comparison anyway to find rows which match the subquery list 
(FullIdentifier IN ...).  You've actually added overhead.  This *might* work in 
some situations if you had an integer primary key to use to do the matching 
between inner and outer query, but there would probably still be a better way.


In any case, the reality is that MySQL doesn't optimize subqueries all that 
well, so they are seldom the best way to speed up a query.  It may be the case 
that your subquery is being treated as DEPENDENT, in which case it is being run 
once for each row found in the outer query.  That is, once for every row in your 
table.  Since your query never seems to return, I'd bet that's the case here.


 PROBLEMS
 I think I have two problems:
 1. in... % syntax  is not present in any examples I've seen.  They are
 all like... %  so in may not work.

Right, it won't.

 2. Here's query that runs, but does not return:
reformatted

 SELECT name, address
 FROM testTable
 WHERE FullIdentifier
   IN ( SELECT CONCAT(SUBSTRING(AbbreviatedIdentifier,1,2),   ,
  SUBSTRING(AbbreviatedIdentifier FROM 3) , % )
FROM testTable
WHERE name LIKE 'Daisy'));

Is this really your query?  As I've already mentioned, the % won't work for IN. 
 Also, neither FullIdentifier nor AbbreviatedIdentifier is mentioned in the 
selection criteria, so the problem you tried to fix with the subquery is not 
present in this query.  Finally, there is no wildcard in the string that name is 
supposed to match, so there is no need for LIKE.  The intent of this query is 
equivalent to


  SELECT name, address
  FROM testTable
  WHERE name = 'Daisy';

which should be as fast as possible if there is an index on name.

 My left join attempt complained  because the data is all in one table. Is
 there a better solution than my  FullIdentifier in(select...  ?

 I am not an SQL expert so I'd appreciate any ideas on how to correct this
 query.

When I see a string primary key that looks like TR 123 abc, I find it is 
usually a safe bet that the TR means something (a code, a manufacturer, ...), 
the 123 means something (a production run, a part number, a plant number, ...) 
and the abc means something (some detail, a team code, ...).  In other words, 
you've crammed the answers to 3 questions into one column.  That's almost never 
a good idea.


I would (for example):

#Create an auto_inc id column and 3 separate columns for each of the id parts
  ALTER TABLE testTable
  DROP PRIMARY KEY,
  ADD id INTEGER UNSIGNED NOT 

Re: subquery performance

2006-09-25 Thread Jeff Drew

Thanks for the detailed feedback.  I do not create the data so I can't
modify it.  Your illustration of the table is correct.  I'll try to
implement some of your feedback.

Thanks

On 9/25/06, Michael Stassen [EMAIL PROTECTED] wrote:


Jeff Drew wrote:
 I have a query with a subquery which does not throw an error, but does
not
 return either.   I've been testing the query using mysql Query Browser
and
 the poor dolphin jumps only once a minute or so ;)   I use MYSQL's
 excellent error documentation heavily so if the query doesn't throw an
 error, I'm at a disadvantage.  Is there an optimization or just better
 syntax to use?

 THE DATA
 I have a table that contains an abbreviated identifier and a full
 identifier.  The real data is a bit messy so here's a sanitized example:

Please don't do that.  When you don't show us the real problem, you
increase the
odds of getting the wrong solution.

 Abbreviated Column contents:   TR123, RG456
 Full Identifier Column contents: TR 123 abc, RG 456 def

Do I understand correctly that your table looks something like

   AbbreviatedIdentifier  FullIdentifier Name   Address
   -  -- 
---
   TR123  TR 123 abc a name an address
   RG456  RG 456 def another name   another
address

with FullIdentifier as the primary key?  If so, that's a poor table
design,
which is almost certainly causing your problems.  See below.

 THE QUERY
 My intent is to:
 1. select some of the Abbreviated Column and convert that to a selection
 for the Full Identifier Column by:
- extracting the first 2 characters
- inserting a space
- selecting the last 3 characters
- appending % so I can match any of the last 3 characters in
the
 Full Identifier

...in the subquery.  The % character is the wildcard for LIKE
matches.  There is
no wildcard for IN.  IN is followed by a list of values, one of which must
match
exactly for the row to be selected, so this won't work as intended.

 2. select rows from Full Identifier Column based on #1

That's very convoluted.  How do you expect this to help?

I'm guessing that FullIdentifier is your primary key.  Because it's a
long,
messy string, you are finding it slow to use it to select rows,
particularly
when you need to find rows with a particular substring buried in the
key.  You
hoped that a shorter string might match more quickly, enabling you to
narrow
down the number of rows where the full id has to be examined.  That will
never
work as you've described it.  The subquery might run faster, but then you
must
do the full id comparison anyway to find rows which match the subquery
list
(FullIdentifier IN ...).  You've actually added overhead.  This *might*
work in
some situations if you had an integer primary key to use to do the
matching
between inner and outer query, but there would probably still be a better
way.

In any case, the reality is that MySQL doesn't optimize subqueries all
that
well, so they are seldom the best way to speed up a query.  It may be the
case
that your subquery is being treated as DEPENDENT, in which case it is
being run
once for each row found in the outer query.  That is, once for every row
in your
table.  Since your query never seems to return, I'd bet that's the case
here.

 PROBLEMS
 I think I have two problems:
 1. in... % syntax  is not present in any examples I've seen.  They are
 all like... %  so in may not work.

Right, it won't.

 2. Here's query that runs, but does not return:
reformatted

 SELECT name, address
 FROM testTable
 WHERE FullIdentifier
   IN ( SELECT CONCAT(SUBSTRING(AbbreviatedIdentifier,1,2),   ,
  SUBSTRING(AbbreviatedIdentifier FROM 3) , % )
FROM testTable
WHERE name LIKE 'Daisy'));

Is this really your query?  As I've already mentioned, the % won't work
for IN.
  Also, neither FullIdentifier nor AbbreviatedIdentifier is mentioned in
the
selection criteria, so the problem you tried to fix with the subquery is
not
present in this query.  Finally, there is no wildcard in the string that
name is
supposed to match, so there is no need for LIKE.  The intent of this query
is
equivalent to

   SELECT name, address
   FROM testTable
   WHERE name = 'Daisy';

which should be as fast as possible if there is an index on name.

 My left join attempt complained  because the data is all in one table.
Is
 there a better solution than my  FullIdentifier in(select...  ?

 I am not an SQL expert so I'd appreciate any ideas on how to correct
this
 query.

When I see a string primary key that looks like TR 123 abc, I find it is
usually a safe bet that the TR means something (a code, a manufacturer,
...),
the 123 means something (a production run, a part number, a plant
number, ...)
and the abc means something (some detail, a team code, ...).  In other
words,
you've crammed the answers to 3 questions into one column.