Re: Possible bug in self-join order optimization
On Sat, 20 Oct 2001 12:48:36 -0500 Eric <[EMAIL PROTECTED]> wrote: I wouldn't be opposed to implementing this as a part of the join optimizer in MySQL, in fact, I've been reading through it for a few days now... However, it seems like it would be a large project as the join optimizer does not take WHERE conditions on the joins into account at all when estimating number of rows coming from a table. In addition, I would probably need to start storing some more metadata in order to facillitate the kind of optimization I need... has anyone thought We will definitely take a look at this optimizer glitch. -- Regards, -- For technical support contracts, go to https://order.mysql.com/ __ ___ ___ __ / |/ /_ __/ __/ __ \/ /Mr. Sinisa Milivojevic <[EMAIL PROTECTED]> / /|_/ / // /\ \/ /_/ / /__ MySQL AB, FullTime Developer /_/ /_/\_, /___/\___\_\___/ Larnaca, Cyprus <___/ www.mysql.com - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail <[EMAIL PROTECTED]> To unsubscribe, e-mail <[EMAIL PROTECTED]> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: Possible bug in self-join order optimization
Well, I would definitely have to do the count for each query; not because my table sizes are changing (although they are at a fairly rapid rate), but because the number of rows I want to select is vastly different between queries. This is actually a self-join (refer to first emails from me to this list a few days ago), so I can't just count how many rows are in each table, because there is only one. The problem is that each self-join in my query has widely varying number of rows it returns based on what range I specify for the nvalue column. So I'm not really sure what to do...Wouldn't doing a COUNT on each self-join involved table take almost as much time as running the query on each table? I wouldn't be opposed to implementing this as a part of the join optimizer in MySQL, in fact, I've been reading through it for a few days now... However, it seems like it would be a large project as the join optimizer does not take WHERE conditions on the joins into account at all when estimating number of rows coming from a table. In addition, I would probably need to start storing some more metadata in order to facillitate the kind of optimization I need... has anyone thought about doing this? eric. On Sat, Oct 20, 2001 at 07:52:28PM +0300, Sinisa Milivojevic wrote: > Eric writes: > > I have no problem using STRAIGHT_JOIN, etc. My problem is really just > > figuring out the optimal join order. Is doing a "SELECT COUNT" on > > each of the tables I'm going to join the way to do it? Isn't there > > potential for the count to take as long as the full query processing > > would take (especially since the attribute I'm doing a range on is not > > indexed)? > > > > eric. > > > > -- > > _ _ > > | |(_) http://ir.iit.edu/~ej > > | _| | | Page me via ICQ at > > | |___ | | http://wwp.mirabilis.com/19022931 > > |__/ | or by mailing [EMAIL PROTECTED] > > |__/ > > > > Well , first of all this is recommandation only until we further > improve the optimiser. > > In most applications relative sizes of tables do not change > drastically. It is extremely rare that a tabla A that has 1000 times > less rows then table B would in short time have 1000 times more rows > then table B. > > > -- > Regards, >__ ___ ___ __ > / |/ /_ __/ __/ __ \/ /Mr. Sinisa Milivojevic <[EMAIL PROTECTED]> > / /|_/ / // /\ \/ /_/ / /__ MySQL AB, FullTime Developer > /_/ /_/\_, /___/\___\_\___/ Larnaca, Cyprus ><___/ www.mysql.com > -- _ _ | |(_) http://ir.iit.edu/~ej | _| | | Page me via ICQ at | |___ | | http://wwp.mirabilis.com/19022931 |__/ | or by mailing [EMAIL PROTECTED] |__/ - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail <[EMAIL PROTECTED]> To unsubscribe, e-mail <[EMAIL PROTECTED]> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: Possible bug in self-join order optimization
Eric writes: > I have no problem using STRAIGHT_JOIN, etc. My problem is really just > figuring out the optimal join order. Is doing a "SELECT COUNT" on > each of the tables I'm going to join the way to do it? Isn't there > potential for the count to take as long as the full query processing > would take (especially since the attribute I'm doing a range on is not > indexed)? > > eric. > > -- > _ _ > | |(_) http://ir.iit.edu/~ej > | _| | | Page me via ICQ at > | |___ | | http://wwp.mirabilis.com/19022931 > |__/ | or by mailing [EMAIL PROTECTED] > |__/ > Well , first of all this is recommandation only until we further improve the optimiser. In most applications relative sizes of tables do not change drastically. It is extremely rare that a tabla A that has 1000 times less rows then table B would in short time have 1000 times more rows then table B. -- Regards, __ ___ ___ __ / |/ /_ __/ __/ __ \/ /Mr. Sinisa Milivojevic <[EMAIL PROTECTED]> / /|_/ / // /\ \/ /_/ / /__ MySQL AB, FullTime Developer /_/ /_/\_, /___/\___\_\___/ Larnaca, Cyprus <___/ www.mysql.com - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail <[EMAIL PROTECTED]> To unsubscribe, e-mail <[EMAIL PROTECTED]> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: Possible bug in self-join order optimization
The information I require is the number of rows that will come from a SELECT which places a certain range restriction on an unindexed attribute...and I need this to not take much time relative to actually executing the query (constant time would be best). eric. On Sat, Oct 20, 2001 at 03:06:58PM +0300, Sinisa Milivojevic wrote: > Eric writes: > > Well, answering my own email, what I thought was a bug is not one at > > all. I was mistaken in thinking that MySQL paid any attention to the > > WHERE conditions when optimizing the join order beyond determining > > which keys are used for the join, correct? > > > > This is really terrible for queries like mine where the query could be > > sped up by orders of magnitude if the join optimizer would just > > determine which table in the join to scan and which to do the key > > lookup on based on a more intelligent estimation of the number of rows > > from each table. It would have to go beyond looking at what keys are > > used in the join (since each of the tables in my query can be looked > > up by the same key) and account for the WHERE conditions placed on the > > tables in the join. > > > > Is there sufficient metadata to estimate rows coming from a table > > based on conditions placed on the attributes of that table? Where is > > it? Has anyone ever thought of coding this? Can anyone give me a > > place to start? > > > > eric. > > > > > > -- > > _ _ > > | |(_) http://ir.iit.edu/~ej > > | _| | | Page me via ICQ at > > | |___ | | http://wwp.mirabilis.com/19022931 > > |__/ | or by mailing [EMAIL PROTECTED] > > |__/ > > > > > What information do you precisely require ?? > > The answer also depends on the API you are using and a method of > retrieval. > > For example, you can know how many rows you get if you use _store_ > instead of _use_ method, but that method is not applicable in the case > of larger result sets. > > -- > Regards, >__ ___ ___ __ > / |/ /_ __/ __/ __ \/ /Mr. Sinisa Milivojevic <[EMAIL PROTECTED]> > / /|_/ / // /\ \/ /_/ / /__ MySQL AB, FullTime Developer > /_/ /_/\_, /___/\___\_\___/ Larnaca, Cyprus ><___/ www.mysql.com > > > - > Before posting, please check: >http://www.mysql.com/manual.php (the manual) >http://lists.mysql.com/ (the list archive) > > To request this thread, e-mail [EMAIL PROTECTED] > To unsubscribe, e-mail <[EMAIL PROTECTED]> > -- _ _ | |(_) http://ir.iit.edu/~ej | _| | | Page me via ICQ at | |___ | | http://wwp.mirabilis.com/19022931 |__/ | or by mailing [EMAIL PROTECTED] |__/ - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail <[EMAIL PROTECTED]> To unsubscribe, e-mail <[EMAIL PROTECTED]> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: Possible bug in self-join order optimization
I have no problem using STRAIGHT_JOIN, etc. My problem is really just figuring out the optimal join order. Is doing a "SELECT COUNT" on each of the tables I'm going to join the way to do it? Isn't there potential for the count to take as long as the full query processing would take (especially since the attribute I'm doing a range on is not indexed)? eric. On Sat, Oct 20, 2001 at 02:30:13PM +0300, Sinisa Milivojevic wrote: > On Fri, 19 Oct 2001 13:03:02 -0500 > Eric <[EMAIL PROTECTED]> wrote: > > > > Well, answering my own email, what I thought was a bug is not one at > > all. I was mistaken in thinking that MySQL paid any attention to the > > WHERE conditions when optimizing the join order beyond determining > > which keys are used for the join, correct? > > > > This is really terrible for queries like mine where the query could be > > sped up by orders of magnitude if the join optimizer would just > > determine which table in the join to scan and which to do the key > > lookup on based on a more intelligent estimation of the number of rows > > from each table. It would have to go beyond looking at what keys are > > used in the join (since each of the tables in my query can be looked > > up by the same key) and account for the WHERE conditions placed on the > > tables in the join. > > > > Is there sufficient metadata to estimate rows coming from a table > > based on conditions placed on the attributes of that table? Where is > > it? Has anyone ever thought of coding this? Can anyone give me a > > place to start? > > > > eric. > > HI! > > At least, with LEFT JOIN you can always specify precisely which table is to be >scanned and which to be searched by key. > > In other joins, you can use STRAIGHT_JOIN with correct order of tables in order to >achieve the same. > > Yes, the above is little bit harder to be done with dynamically created queries, but >it is still possible, as you can always get number of rows. > > -- > > Regards, > > -- > For technical support contracts, go to https://order.mysql.com/ >__ ___ ___ __ > / |/ /_ __/ __/ __ \/ /Mr. Sinisa Milivojevic <[EMAIL PROTECTED]> > / /|_/ / // /\ \/ /_/ / /__ MySQL AB, FullTime Developer > /_/ /_/\_, /___/\___\_\___/ Larnaca, Cyprus ><___/ www.mysql.com -- _ _ | |(_) http://ir.iit.edu/~ej | _| | | Page me via ICQ at | |___ | | http://wwp.mirabilis.com/19022931 |__/ | or by mailing [EMAIL PROTECTED] |__/ - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail <[EMAIL PROTECTED]> To unsubscribe, e-mail <[EMAIL PROTECTED]> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: Possible bug in self-join order optimization
Eric writes: > Well, answering my own email, what I thought was a bug is not one at > all. I was mistaken in thinking that MySQL paid any attention to the > WHERE conditions when optimizing the join order beyond determining > which keys are used for the join, correct? > > This is really terrible for queries like mine where the query could be > sped up by orders of magnitude if the join optimizer would just > determine which table in the join to scan and which to do the key > lookup on based on a more intelligent estimation of the number of rows > from each table. It would have to go beyond looking at what keys are > used in the join (since each of the tables in my query can be looked > up by the same key) and account for the WHERE conditions placed on the > tables in the join. > > Is there sufficient metadata to estimate rows coming from a table > based on conditions placed on the attributes of that table? Where is > it? Has anyone ever thought of coding this? Can anyone give me a > place to start? > > eric. > > > -- > _ _ > | |(_) http://ir.iit.edu/~ej > | _| | | Page me via ICQ at > | |___ | | http://wwp.mirabilis.com/19022931 > |__/ | or by mailing [EMAIL PROTECTED] > |__/ > What information do you precisely require ?? The answer also depends on the API you are using and a method of retrieval. For example, you can know how many rows you get if you use _store_ instead of _use_ method, but that method is not applicable in the case of larger result sets. -- Regards, __ ___ ___ __ / |/ /_ __/ __/ __ \/ /Mr. Sinisa Milivojevic <[EMAIL PROTECTED]> / /|_/ / // /\ \/ /_/ / /__ MySQL AB, FullTime Developer /_/ /_/\_, /___/\___\_\___/ Larnaca, Cyprus <___/ www.mysql.com - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail <[EMAIL PROTECTED]> To unsubscribe, e-mail <[EMAIL PROTECTED]> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: Possible bug in self-join order optimization
On Fri, 19 Oct 2001 13:03:02 -0500 Eric <[EMAIL PROTECTED]> wrote: > Well, answering my own email, what I thought was a bug is not one at > all. I was mistaken in thinking that MySQL paid any attention to the > WHERE conditions when optimizing the join order beyond determining > which keys are used for the join, correct? > > This is really terrible for queries like mine where the query could be > sped up by orders of magnitude if the join optimizer would just > determine which table in the join to scan and which to do the key > lookup on based on a more intelligent estimation of the number of rows > from each table. It would have to go beyond looking at what keys are > used in the join (since each of the tables in my query can be looked > up by the same key) and account for the WHERE conditions placed on the > tables in the join. > > Is there sufficient metadata to estimate rows coming from a table > based on conditions placed on the attributes of that table? Where is > it? Has anyone ever thought of coding this? Can anyone give me a > place to start? > > eric. HI! At least, with LEFT JOIN you can always specify precisely which table is to be scanned and which to be searched by key. In other joins, you can use STRAIGHT_JOIN with correct order of tables in order to achieve the same. Yes, the above is little bit harder to be done with dynamically created queries, but it is still possible, as you can always get number of rows. -- Regards, -- For technical support contracts, go to https://order.mysql.com/ __ ___ ___ __ / |/ /_ __/ __/ __ \/ /Mr. Sinisa Milivojevic <[EMAIL PROTECTED]> / /|_/ / // /\ \/ /_/ / /__ MySQL AB, FullTime Developer /_/ /_/\_, /___/\___\_\___/ Larnaca, Cyprus <___/ www.mysql.com - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail <[EMAIL PROTECTED]> To unsubscribe, e-mail <[EMAIL PROTECTED]> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: Possible bug in self-join order optimization
Well, answering my own email, what I thought was a bug is not one at all. I was mistaken in thinking that MySQL paid any attention to the WHERE conditions when optimizing the join order beyond determining which keys are used for the join, correct? This is really terrible for queries like mine where the query could be sped up by orders of magnitude if the join optimizer would just determine which table in the join to scan and which to do the key lookup on based on a more intelligent estimation of the number of rows from each table. It would have to go beyond looking at what keys are used in the join (since each of the tables in my query can be looked up by the same key) and account for the WHERE conditions placed on the tables in the join. Is there sufficient metadata to estimate rows coming from a table based on conditions placed on the attributes of that table? Where is it? Has anyone ever thought of coding this? Can anyone give me a place to start? eric. On Thu, Oct 18, 2001 at 10:40:43AM -0500, Eric wrote: > I am sending this again as I am desperate for some help and believe > this to be a signifigant bug if it actually is one...which it seems to > be. See below for examples. > > What is quite puzzling is MySQL's estimation of the number of rows > from each of the self-joins. The conditions on alias queryTable0 > actually refer to 1582 rows, and the conditions on alias queryTable1 > refer to 39 rows. Notice in the EXPLAIN below that when I flip around > the join order, MySQL thinks that 1152 (which is its estimation for > 1582) rows are coming from queryTable1, whereas with the original join > order, it thought 1152 rows were coming from queryTable0...this seems > like a bug to me since the conditions on those two aliases are the > same between the two queries. Only the "FROM index queryTable0, index > queryTable1" is flipped to "FROM index queryTable1, index queryTable0". > > SELECT DISTINCT queryTable0.num, queryTable0.value, queryTable1.value > FROM > index queryTable0, index queryTable1 WHERE > queryTable0.path=24 AND queryTable0.type="E" AND > queryTable1.path=27 AND queryTable1.type="E" AND > queryTable0.num=queryTable1.num AND > queryTable0.nvalue > 0.0 AND queryTable0.nvalue <= 90.0 AND > queryTable1.nvalue > 140.0 AND queryTable1.nvalue <= 200.0; > > >+-+--+--++-+++--+-+ > | table | type | possible_keys| key| key_len | > ref| rows | Extra | > >+-+--+--++-+++--+-+ > | queryTable0 | ref | pathndx,numndx | pathndx | 4 | > const | 1152 | where used; Using temporary | > | queryTable1 | ref | pathndx,numndx | numndx | 4 | > queryTable0.num | 53 | where used | > >+-+--+--++-+++--+-+ > 2 rows in set (0.01 sec) > > On Wed, Oct 17, 2001 at 04:04:21PM +0300, Michael Widenius wrote: > > We have done some modifications to optimizer in 4.0, but nothing that > > should affect this. > > > > What is the output from EXPLAIN if you swap the tables ? > > EXPLAIN of query with "FROM index queryTable1, index queryTable0": > > >+-+--+--++-++--+-+ > | table | type | possible_keys| key| key_len | > ref| rows | Extra | > >+-+--+--++-++--+-+ > | queryTable1 | ref | pathndx,numndx | pathndx | 4 | > const | 1152 | where used; Using temporary | > | queryTable0 | ref | pathndx,numndx | numndx | 4 | > queryTable1.num | 53 | where used | > >+-+--+--++-++--+-+ > 2 rows in set (0.01 sec) > > > What is the output from "show create table 'index'" > > CREATE TABLE is: > > CREATE TABLE `index` ( > `indexnum` int(10) unsigned NOT NULL auto_increment, > `parent` int(10) unsigned NOT NULL default '0', > `path` int(10) unsigned NOT NULL default '0', > `type` char(1) NOT NULL default '', > `tagname` int(10) unsigned NOT NULL default '0', > `atrname` int(10) unsigned NOT NULL default '0', > `num` int(10) unsigned NOT NULL default '0', > `nvalue` double default NULL, > `value` mediumtext, > PRIMARY KEY (`indexnum`), > KEY `parentndx`(`parent`), > KEY `pathndx`(`path`), > KEY `tagnamendx`(`tagname`), > KEY `atrnamendx`(`atrname`), > KEY `numndx`(`num`), > ) TYPE=MyISAM MAX_ROWS=315360 PACK_KEYS=1 > > -- > _ _ > | ___