Re: Possible bug in self-join order optimization

2001-10-22 Thread Sinisa Milivojevic

On Sat, 20 Oct 2001 12:48:36 -0500
Eric <[EMAIL PROTECTED]> wrote:


I wouldn't be opposed to implementing this as a part of the join
optimizer in MySQL, in fact, I've been reading through it for a few
days now...  However, it seems like it would be a large project as the
join optimizer does not take WHERE conditions on the joins into
account at all when estimating number of rows coming from a table.  In
addition, I would probably need to start storing some more metadata in
order to facillitate the kind of optimization I need...  has anyone
thought 

We will definitely take a look at this optimizer glitch.

--

Regards,

--
For technical support contracts, go to https://order.mysql.com/
   __  ___ ___   __
  /  |/  /_ __/ __/ __ \/ /Mr. Sinisa Milivojevic <[EMAIL PROTECTED]>
 / /|_/ / // /\ \/ /_/ / /__   MySQL AB, FullTime Developer
/_/  /_/\_, /___/\___\_\___/   Larnaca, Cyprus
   <___/   www.mysql.com

-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: Possible bug in self-join order optimization

2001-10-20 Thread Eric

Well, I would definitely have to do the count for each query; not
because my table sizes are changing (although they are at a fairly
rapid rate), but because the number of rows I want to select is vastly
different between queries.  This is actually a self-join (refer to
first emails from me to this list a few days ago), so I can't just
count how many rows are in each table, because there is only one.  The
problem is that each self-join in my query has widely varying number
of rows it returns based on what range I specify for the nvalue
column.

So I'm not really sure what to do...Wouldn't doing a COUNT on each
self-join involved table take almost as much time as running the query
on each table?

I wouldn't be opposed to implementing this as a part of the join
optimizer in MySQL, in fact, I've been reading through it for a few
days now...  However, it seems like it would be a large project as the
join optimizer does not take WHERE conditions on the joins into
account at all when estimating number of rows coming from a table.  In
addition, I would probably need to start storing some more metadata in
order to facillitate the kind of optimization I need...  has anyone
thought about doing this?

eric.

On Sat, Oct 20, 2001 at 07:52:28PM +0300, Sinisa Milivojevic wrote:
> Eric writes:
> > I have no problem using STRAIGHT_JOIN, etc.  My problem is really just
> > figuring out the optimal join order.  Is doing a "SELECT COUNT" on
> > each of the tables I'm going to join the way to do it?  Isn't there
> > potential for the count to take as long as the full query processing
> > would take (especially since the attribute I'm doing a range on is not
> > indexed)?
> > 
> > eric.
> > 
> > -- 
> >  _  _ 
> > | |(_) http://ir.iit.edu/~ej
> > |  _|  | | Page me via ICQ at
> > | |___ | | http://wwp.mirabilis.com/19022931
> > |__/ | or by mailing [EMAIL PROTECTED]
> >  |__/
> > 
> 
> Well , first of all this is recommandation only until we further
> improve the optimiser.
> 
> In most applications relative sizes of tables do not change
> drastically. It is extremely rare that a tabla A that has 1000 times
> less rows then table B would in short time have 1000 times more rows
> then table B.
> 
> 
> -- 
> Regards,
>__  ___ ___   __
>   /  |/  /_ __/ __/ __ \/ /Mr. Sinisa Milivojevic <[EMAIL PROTECTED]>
>  / /|_/ / // /\ \/ /_/ / /__   MySQL AB, FullTime Developer
> /_/  /_/\_, /___/\___\_\___/   Larnaca, Cyprus
><___/   www.mysql.com
> 

-- 
 _  _ 
| |(_) http://ir.iit.edu/~ej
|  _|  | | Page me via ICQ at
| |___ | | http://wwp.mirabilis.com/19022931
|__/ | or by mailing [EMAIL PROTECTED]
 |__/

-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: Possible bug in self-join order optimization

2001-10-20 Thread Sinisa Milivojevic

Eric writes:
> I have no problem using STRAIGHT_JOIN, etc.  My problem is really just
> figuring out the optimal join order.  Is doing a "SELECT COUNT" on
> each of the tables I'm going to join the way to do it?  Isn't there
> potential for the count to take as long as the full query processing
> would take (especially since the attribute I'm doing a range on is not
> indexed)?
> 
> eric.
> 
> -- 
>  _  _ 
> | |(_) http://ir.iit.edu/~ej
> |  _|  | | Page me via ICQ at
> | |___ | | http://wwp.mirabilis.com/19022931
> |__/ | or by mailing [EMAIL PROTECTED]
>  |__/
> 

Well , first of all this is recommandation only until we further
improve the optimiser.

In most applications relative sizes of tables do not change
drastically. It is extremely rare that a tabla A that has 1000 times
less rows then table B would in short time have 1000 times more rows
then table B.


-- 
Regards,
   __  ___ ___   __
  /  |/  /_ __/ __/ __ \/ /Mr. Sinisa Milivojevic <[EMAIL PROTECTED]>
 / /|_/ / // /\ \/ /_/ / /__   MySQL AB, FullTime Developer
/_/  /_/\_, /___/\___\_\___/   Larnaca, Cyprus
   <___/   www.mysql.com


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: Possible bug in self-join order optimization

2001-10-20 Thread Eric

The information I require is the number of rows that will come from a
SELECT which places a certain range restriction on an unindexed
attribute...and I need this to not take much time relative to actually
executing the query (constant time would be best).

eric.

On Sat, Oct 20, 2001 at 03:06:58PM +0300, Sinisa Milivojevic wrote:
> Eric writes:
> > Well, answering my own email, what I thought was a bug is not one at
> > all.  I was mistaken in thinking that MySQL paid any attention to the
> > WHERE conditions when optimizing the join order beyond determining
> > which keys are used for the join, correct?  
> > 
> > This is really terrible for queries like mine where the query could be
> > sped up by orders of magnitude if the join optimizer would just
> > determine which table in the join to scan and which to do the key
> > lookup on based on a more intelligent estimation of the number of rows
> > from each table.  It would have to go beyond looking at what keys are
> > used in the join (since each of the tables in my query can be looked
> > up by the same key) and account for the WHERE conditions placed on the
> > tables in the join.
> > 
> > Is there sufficient metadata to estimate rows coming from a table
> > based on conditions placed on the attributes of that table?  Where is
> > it?  Has anyone ever thought of coding this?  Can anyone give me a
> > place to start?
> > 
> > eric.
> > 
> > 
> > -- 
> >  _  _ 
> > | |(_) http://ir.iit.edu/~ej
> > |  _|  | | Page me via ICQ at
> > | |___ | | http://wwp.mirabilis.com/19022931
> > |__/ | or by mailing [EMAIL PROTECTED]
> >  |__/
> > 
> 
> 
> What information do you precisely require ??
> 
> The answer also depends on the API you are using and a method of
> retrieval. 
> 
> For example, you can know how many rows you get if you use _store_
> instead of _use_ method, but that method is not applicable in the case
> of larger result sets.
> 
> -- 
> Regards,
>__  ___ ___   __
>   /  |/  /_ __/ __/ __ \/ /Mr. Sinisa Milivojevic <[EMAIL PROTECTED]>
>  / /|_/ / // /\ \/ /_/ / /__   MySQL AB, FullTime Developer
> /_/  /_/\_, /___/\___\_\___/   Larnaca, Cyprus
><___/   www.mysql.com
> 
> 
> -
> Before posting, please check:
>http://www.mysql.com/manual.php   (the manual)
>http://lists.mysql.com/   (the list archive)
> 
> To request this thread, e-mail [EMAIL PROTECTED]
> To unsubscribe, e-mail <[EMAIL PROTECTED]>
> 

-- 
 _  _ 
| |(_) http://ir.iit.edu/~ej
|  _|  | | Page me via ICQ at
| |___ | | http://wwp.mirabilis.com/19022931
|__/ | or by mailing [EMAIL PROTECTED]
 |__/

-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: Possible bug in self-join order optimization

2001-10-20 Thread Eric

I have no problem using STRAIGHT_JOIN, etc.  My problem is really just
figuring out the optimal join order.  Is doing a "SELECT COUNT" on
each of the tables I'm going to join the way to do it?  Isn't there
potential for the count to take as long as the full query processing
would take (especially since the attribute I'm doing a range on is not
indexed)?

eric.

On Sat, Oct 20, 2001 at 02:30:13PM +0300, Sinisa Milivojevic wrote:
> On Fri, 19 Oct 2001 13:03:02 -0500
> Eric <[EMAIL PROTECTED]> wrote:
> 
> 
> > Well, answering my own email, what I thought was a bug is not one at
> > all.  I was mistaken in thinking that MySQL paid any attention to the
> > WHERE conditions when optimizing the join order beyond determining
> > which keys are used for the join, correct?  
> > 
> > This is really terrible for queries like mine where the query could be
> > sped up by orders of magnitude if the join optimizer would just
> > determine which table in the join to scan and which to do the key
> > lookup on based on a more intelligent estimation of the number of rows
> > from each table.  It would have to go beyond looking at what keys are
> > used in the join (since each of the tables in my query can be looked
> > up by the same key) and account for the WHERE conditions placed on the
> > tables in the join.
> > 
> > Is there sufficient metadata to estimate rows coming from a table
> > based on conditions placed on the attributes of that table?  Where is
> > it?  Has anyone ever thought of coding this?  Can anyone give me a
> > place to start?
> >
> > eric.
> 
> HI!
> 
> At least, with LEFT JOIN you can always specify precisely which table is to be 
>scanned and which to be searched by key.
> 
> In other joins, you can use STRAIGHT_JOIN with correct order of tables in order to 
>achieve the same. 
> 
> Yes, the above is little bit harder to be done with dynamically created queries, but 
>it is still possible, as you can always get number of rows.
> 
> --
> 
> Regards,
> 
> --
> For technical support contracts, go to https://order.mysql.com/
>__  ___ ___   __
>   /  |/  /_ __/ __/ __ \/ /Mr. Sinisa Milivojevic <[EMAIL PROTECTED]>
>  / /|_/ / // /\ \/ /_/ / /__   MySQL AB, FullTime Developer
> /_/  /_/\_, /___/\___\_\___/   Larnaca, Cyprus
><___/   www.mysql.com

-- 
 _  _ 
| |(_) http://ir.iit.edu/~ej
|  _|  | | Page me via ICQ at
| |___ | | http://wwp.mirabilis.com/19022931
|__/ | or by mailing [EMAIL PROTECTED]
 |__/

-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: Possible bug in self-join order optimization

2001-10-20 Thread Sinisa Milivojevic

Eric writes:
> Well, answering my own email, what I thought was a bug is not one at
> all.  I was mistaken in thinking that MySQL paid any attention to the
> WHERE conditions when optimizing the join order beyond determining
> which keys are used for the join, correct?  
> 
> This is really terrible for queries like mine where the query could be
> sped up by orders of magnitude if the join optimizer would just
> determine which table in the join to scan and which to do the key
> lookup on based on a more intelligent estimation of the number of rows
> from each table.  It would have to go beyond looking at what keys are
> used in the join (since each of the tables in my query can be looked
> up by the same key) and account for the WHERE conditions placed on the
> tables in the join.
> 
> Is there sufficient metadata to estimate rows coming from a table
> based on conditions placed on the attributes of that table?  Where is
> it?  Has anyone ever thought of coding this?  Can anyone give me a
> place to start?
> 
> eric.
> 
> 
> -- 
>  _  _ 
> | |(_) http://ir.iit.edu/~ej
> |  _|  | | Page me via ICQ at
> | |___ | | http://wwp.mirabilis.com/19022931
> |__/ | or by mailing [EMAIL PROTECTED]
>  |__/
> 


What information do you precisely require ??

The answer also depends on the API you are using and a method of
retrieval. 

For example, you can know how many rows you get if you use _store_
instead of _use_ method, but that method is not applicable in the case
of larger result sets.

-- 
Regards,
   __  ___ ___   __
  /  |/  /_ __/ __/ __ \/ /Mr. Sinisa Milivojevic <[EMAIL PROTECTED]>
 / /|_/ / // /\ \/ /_/ / /__   MySQL AB, FullTime Developer
/_/  /_/\_, /___/\___\_\___/   Larnaca, Cyprus
   <___/   www.mysql.com


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: Possible bug in self-join order optimization

2001-10-20 Thread Sinisa Milivojevic

On Fri, 19 Oct 2001 13:03:02 -0500
Eric <[EMAIL PROTECTED]> wrote:


> Well, answering my own email, what I thought was a bug is not one at
> all.  I was mistaken in thinking that MySQL paid any attention to the
> WHERE conditions when optimizing the join order beyond determining
> which keys are used for the join, correct?  
> 
> This is really terrible for queries like mine where the query could be
> sped up by orders of magnitude if the join optimizer would just
> determine which table in the join to scan and which to do the key
> lookup on based on a more intelligent estimation of the number of rows
> from each table.  It would have to go beyond looking at what keys are
> used in the join (since each of the tables in my query can be looked
> up by the same key) and account for the WHERE conditions placed on the
> tables in the join.
> 
> Is there sufficient metadata to estimate rows coming from a table
> based on conditions placed on the attributes of that table?  Where is
> it?  Has anyone ever thought of coding this?  Can anyone give me a
> place to start?
>
> eric.

HI!

At least, with LEFT JOIN you can always specify precisely which table is to be scanned 
and which to be searched by key.

In other joins, you can use STRAIGHT_JOIN with correct order of tables in order to 
achieve the same. 

Yes, the above is little bit harder to be done with dynamically created queries, but 
it is still possible, as you can always get number of rows.

--

Regards,

--
For technical support contracts, go to https://order.mysql.com/
   __  ___ ___   __
  /  |/  /_ __/ __/ __ \/ /Mr. Sinisa Milivojevic <[EMAIL PROTECTED]>
 / /|_/ / // /\ \/ /_/ / /__   MySQL AB, FullTime Developer
/_/  /_/\_, /___/\___\_\___/   Larnaca, Cyprus
   <___/   www.mysql.com

-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: Possible bug in self-join order optimization

2001-10-19 Thread Eric

Well, answering my own email, what I thought was a bug is not one at
all.  I was mistaken in thinking that MySQL paid any attention to the
WHERE conditions when optimizing the join order beyond determining
which keys are used for the join, correct?  

This is really terrible for queries like mine where the query could be
sped up by orders of magnitude if the join optimizer would just
determine which table in the join to scan and which to do the key
lookup on based on a more intelligent estimation of the number of rows
from each table.  It would have to go beyond looking at what keys are
used in the join (since each of the tables in my query can be looked
up by the same key) and account for the WHERE conditions placed on the
tables in the join.

Is there sufficient metadata to estimate rows coming from a table
based on conditions placed on the attributes of that table?  Where is
it?  Has anyone ever thought of coding this?  Can anyone give me a
place to start?

eric.

On Thu, Oct 18, 2001 at 10:40:43AM -0500, Eric wrote:
> I am sending this again as I am desperate for some help and believe
> this to be a signifigant bug if it actually is one...which it seems to
> be.  See below for examples.
> 
> What is quite puzzling is MySQL's estimation of the number of rows
> from each of the self-joins.  The conditions on alias queryTable0
> actually refer to 1582 rows, and the conditions on alias queryTable1
> refer to 39 rows.  Notice in the EXPLAIN below that when I flip around
> the join order, MySQL thinks that 1152 (which is its estimation for
> 1582) rows are coming from queryTable1, whereas with the original join
> order, it thought 1152 rows were coming from queryTable0...this seems
> like a bug to me since the conditions on those two aliases are the
> same between the two queries.  Only the "FROM index queryTable0, index
> queryTable1" is flipped to "FROM index queryTable1, index queryTable0".
> 
> SELECT DISTINCT queryTable0.num, queryTable0.value, queryTable1.value
> FROM
> index queryTable0, index queryTable1 WHERE
> queryTable0.path=24 AND queryTable0.type="E" AND
> queryTable1.path=27 AND queryTable1.type="E" AND
> queryTable0.num=queryTable1.num AND
> queryTable0.nvalue > 0.0 AND  queryTable0.nvalue <= 90.0 AND
> queryTable1.nvalue > 140.0 AND queryTable1.nvalue <= 200.0;
> 
> 
>+-+--+--++-+++--+-+
> | table   | type | possible_keys| key| key_len |
> ref| rows | Extra   |
> 
>+-+--+--++-+++--+-+
> | queryTable0 | ref  | pathndx,numndx | pathndx |   4 |
> const  | 1152 | where used; Using temporary |
> | queryTable1 | ref  | pathndx,numndx | numndx  |   4 |
> queryTable0.num |   53 | where used  |
> 
>+-+--+--++-+++--+-+
> 2 rows in set (0.01 sec)
> 
> On Wed, Oct 17, 2001 at 04:04:21PM +0300, Michael Widenius wrote:
> > We have done some modifications to optimizer in 4.0, but nothing that
> > should affect this.
> > 
> > What is the output from EXPLAIN if you swap the tables ?
> 
> EXPLAIN of query with "FROM index queryTable1, index queryTable0":
> 
> 
>+-+--+--++-++--+-+
> | table   | type | possible_keys| key| key_len |
> ref| rows | Extra   |
> 
>+-+--+--++-++--+-+
> | queryTable1 | ref  | pathndx,numndx | pathndx |   4 |
> const  | 1152 | where used; Using temporary |
> | queryTable0 | ref  | pathndx,numndx | numndx  |   4 |
> queryTable1.num |   53 | where used  |
> 
>+-+--+--++-++--+-+
> 2 rows in set (0.01 sec)
> 
> > What is the output from "show create table 'index'"
> 
> CREATE TABLE is:
> 
> CREATE TABLE `index` (
>   `indexnum` int(10) unsigned NOT NULL auto_increment,
>   `parent` int(10) unsigned NOT NULL default '0',
>   `path` int(10) unsigned NOT NULL default '0',
>   `type` char(1) NOT NULL default '',
>   `tagname` int(10) unsigned NOT NULL default '0',
>   `atrname` int(10) unsigned NOT NULL default '0',
>   `num` int(10) unsigned NOT NULL default '0',
>   `nvalue` double default NULL,
>   `value` mediumtext,
>   PRIMARY KEY (`indexnum`),
>   KEY `parentndx`(`parent`),
>   KEY `pathndx`(`path`),
>   KEY `tagnamendx`(`tagname`),
>   KEY `atrnamendx`(`atrname`),
>   KEY `numndx`(`num`),
> ) TYPE=MyISAM MAX_ROWS=315360 PACK_KEYS=1
> 
> -- 
>  _  _ 
> | ___