RE: Composite Index Usage in Joins

2012-07-13 Thread Rick James
  trade_time IN ('2012-07-01', '2012-07-02');
represents two distinct points in a continuum of a DATETIME.  Perhaps you 
should change to DATE.  Even then, it would probably be treated as IN, not 
RANGE.  Perhaps you really want BETWEEN (a range) instead of IN (a set).

What version are you using?  Older version have essentially no optimizations 
for IN().

Shrinking the table size may help -- Do you really need BIGINT (8 bytes) 
instead of INT UNSIGNED (4 bytes)?  DATETIME (8) vs DATE(3).  Etc.

If these are stock prices, how can `price` be INT?




 -Original Message-
 From: Jeffrey Grollo [mailto:grol...@gmail.com]
 Sent: Tuesday, July 10, 2012 2:50 PM
 To: mysql@lists.mysql.com
 Subject: Composite Index Usage in Joins
 
 Hi,
 
 I'm attempting to optimize a join and am having a difficult time using
 multiple columns from a composite index. The second column of the
 composite key is being used when tested for equality, but not for IN or
 BETWEEN criteria.
 
 As an example, say that I'm searching two tables: portfolio and trades.
 The portfolio table contains a list of security IDs. The trades table
 keeps tracks of the price and time when I've traded securities in my
 portfolio.
 Tables are:
 
 CREATE TABLE portfolio (
   sec_id bigint(20) NOT NULL AUTO_INCREMENT,
   name char(10) NOT NULL,
   PRIMARY KEY (sec_id)
 ) ENGINE=InnoDB ;
 
 CREATE TABLE trades (
   tx_id bigint(20) NOT NULL AUTO_INCREMENT,
   sec_id bigint(20) NOT NULL,
   trade_time datetime NOT NULL,
   price int NOT NULL,
   PRIMARY KEY (tx_id),
   KEY sec_time (sec_id, trade_time)
 ) ENGINE=InnoDB ;
 
 If I query the trades table directly both columns of the composite
 index sec_time will be used when I'm using a range criteria on the
 trade_time
 column:
 
 
 mysql explain select price from trades force index(sec_time)
 - where sec_id IN (1, 2) and trade_time BETWEEN '2012-07-01' AND
 '2012-07-04';
 ++-++---+---+--+---
 --+--+--+-+
 | id | select_type | table  | type  | possible_keys | key  |
 key_len |
 ref  | rows | Extra   |
 ++-++---+---+--+---
 --+--+--+-+
 |  1 | SIMPLE  | trades | range | sec_time  | sec_time | 16
 |
 NULL |2 | Using where |
 ++-++---+---+--+---
 --+--+--+-+
 
 
 If I introduce a join to retrieve all trades for my portfolio, the
 entire index will continue to be used if I make trade_time a constant:
 
 mysql explain select price from portfolio p inner join trades t force
 index(sec_time) on p.sec_id = t.sec_id
 - where trade_time = '2012-07-01';
 ++-+---+---+---+--+
 -++--+-+
 | id | select_type | table | type  | possible_keys | key  | key_len
 |
 ref| rows | Extra   |
 ++-+---+---+---+--+
 -++--+-+
 |  1 | SIMPLE  | p | index | PRIMARY   | PRIMARY  | 8
 |
 NULL   |1 | Using index |
 |  1 | SIMPLE  | t | ref   | sec_time  | sec_time | 16
 |
 vantage.p.sec_id,const |1 | |
 ++-+---+---+---+--+
 -++--+-+
 
 
 However, if I expand the trade_time search (either using IN or
 BETWEEN), only the sec_id column of the composite query is used:
 
 
 mysql explain select price from portfolio p inner join trades t force
 index(sec_time) on p.sec_id = t.sec_id
 - where trade_time IN ('2012-07-01', '2012-07-02');
 ++-+---+---+---+--+
 -+--+--+-+
 | id | select_type | table | type  | possible_keys | key  | key_len
 |
 ref  | rows | Extra   |
 ++-+---+---+---+--+
 -+--+--+-+
 |  1 | SIMPLE  | p | index | PRIMARY   | PRIMARY  | 8
 |
 NULL |1 | Using index |
 |  1 | SIMPLE  | t | ref   | sec_time  | sec_time | 8
 |
 vantage.p.sec_id |1 | Using where |
 ++-+---+---+---+--+
 -+--+--+-+
 
 
 My expectation is that MySQL would be able to use both columns of the
 sec_time index, but I've been unable to find either confirmation of
 refutation of that assumption. If MySQL cannot optimize a join in this
 case, is there another approach to optimizing this query that I should
 pursue? Essentially, my trades table may contain many historical
 records and pulling the entire history of trades for each security
 would produce a much larger result set than would be retrieved if the
 trade_time criteria was applied in the index 

Re: Composite Index Usage in Joins

2012-07-12 Thread Jeffrey Grollo
Thanks for the guidance and references, Shawn.

On Wed, Jul 11, 2012 at 3:37 PM, Shawn Green shawn.l.gr...@oracle.com wrote:

 Ranged scans only happen for the last portion of an index being used. OR
 queries (or those using IN) can also only be applied to the last part of an
 index search. This means that if you are using IN (or OR) against the first
 part of an index, that's where the usage of the index stops.

Basically, this is what I was expecting. However, in the second table
in the join, the secondary column wasn't being used for the range or
IN. I was expecting, given the nested-join loop algorithm (as I don't
believe that BNL would be used
http://dev.mysql.com/doc/refman/5.5/en/nested-loop-joins.html), that
the first column in the key would be treated as const allowing the
second column to be used in a range or IN criteria.


 Also, trying to force index usage may be creating more work for your disks
 than necessary.

I should have indicated in the original post that I was forcing the
index simply to make the example work without data.

Regards,
Jeff

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Composite Index Usage in Joins

2012-07-12 Thread Jeffrey Grollo
On Wed, Jul 11, 2012 at 5:30 PM, Sergei Petrunia pser...@askmonty.org wrote:
 I can provide a refutation. Ability to make a combined index access of

 1. Equality with a non-constant: t.sec_id= p.sec_id
 2. non-equality comparison with constants, trade_time IN ('2012-07-01', 
 '2012-07-02')

 has been discussed a number of times by query optimizer developer, but as far
 as public knowlege goes, nobody has ever implemented it, either at Oracle, or
 at MariaDB, or elsewhere.

As mentioned in my reply to Shawn, given the nested-loop join
processing, I had expected #1 to be treated like a constant for
purposes of index access, but sounds like this is either a
misunderstanding or just a limitation of how the optimizer builds an
execution plan.

My takeaway then is that composite indexes built for supporting join
and where criteria in a query should be constructed to speed join key
lookup first then to satisfy equality conditions in additional
criteria.

 MariaDB 5.3+ and MySQL 5.6+ have Index Condition Pushdown feature

A very nice feature indeed.

 Basically, ref access can be constructed from equality conditions. If you have
 data for many years, and the range you're scanning fits within one particular
 year, this could help:

 alter table trades add trade_year int, add index(sec_id, trade_year)
 update trades set trade_year=year(trade_date); -- will need to be done for 
 new data, too

Appreciate the suggestion. Given the variability in date ranges this
search is over, finding the right granularity for supporting an
equality condition is an interesting problem. Presumably, the same
technique could be applied monthly, for example, with a second
composite index to support:

alter table trades add trade_month int, add index(sec_id, trade_month)

It seems that partition pruning might be another way to deal with
reducing the data scanned. With yearly partitions, I could then rely
on the storage engine only scanning relevant partitions and then use
equality at the month level.

Thank you for the insight,
Jeff

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Composite Index Usage in Joins

2012-07-11 Thread Shawn Green

On 7/10/2012 5:50 PM, Jeffrey Grollo wrote:

Hi,

I’m attempting to optimize a join and am having a difficult time using
multiple columns from a composite index. The second column of the composite
key is being used when tested for equality, but not for IN or BETWEEN
criteria.

As an example, say that I’m searching two tables: portfolio and trades. The
portfolio table contains a list of security IDs. The trades table keeps
tracks of the price and time when I’ve traded securities in my portfolio.
Tables are:

CREATE TABLE portfolio (
   sec_id bigint(20) NOT NULL AUTO_INCREMENT,
   name char(10) NOT NULL,
   PRIMARY KEY (sec_id)
) ENGINE=InnoDB ;

CREATE TABLE trades (
   tx_id bigint(20) NOT NULL AUTO_INCREMENT,
   sec_id bigint(20) NOT NULL,
   trade_time datetime NOT NULL,
   price int NOT NULL,
   PRIMARY KEY (tx_id),
   KEY sec_time (sec_id, trade_time)
) ENGINE=InnoDB ;

If I query the trades table directly both columns of the composite index
sec_time will be used when I'm using a range criteria on the trade_time
column:


mysql explain select price from trades force index(sec_time)
 - where sec_id IN (1, 2) and trade_time BETWEEN '2012-07-01' AND
'2012-07-04';
++-++---+---+--+-+--+--+-+
| id | select_type | table  | type  | possible_keys | key  | key_len |
ref  | rows | Extra   |
++-++---+---+--+-+--+--+-+
|  1 | SIMPLE  | trades | range | sec_time  | sec_time | 16  |
NULL |2 | Using where |
++-++---+---+--+-+--+--+-+


If I introduce a join to retrieve all trades for my portfolio, the entire
index will continue to be used if I make trade_time a constant:

mysql explain select price from portfolio p inner join trades t force
index(sec_time) on p.sec_id = t.sec_id
 - where trade_time = '2012-07-01';
++-+---+---+---+--+-++--+-+
| id | select_type | table | type  | possible_keys | key  | key_len |
ref| rows | Extra   |
++-+---+---+---+--+-++--+-+
|  1 | SIMPLE  | p | index | PRIMARY   | PRIMARY  | 8   |
NULL   |1 | Using index |
|  1 | SIMPLE  | t | ref   | sec_time  | sec_time | 16  |
vantage.p.sec_id,const |1 | |
++-+---+---+---+--+-++--+-+


However, if I expand the trade_time search (either using IN or BETWEEN),
only the sec_id column of the composite query is used:


mysql explain select price from portfolio p inner join trades t force
index(sec_time) on p.sec_id = t.sec_id
 - where trade_time IN ('2012-07-01', '2012-07-02');
++-+---+---+---+--+-+--+--+-+
| id | select_type | table | type  | possible_keys | key  | key_len |
ref  | rows | Extra   |
++-+---+---+---+--+-+--+--+-+
|  1 | SIMPLE  | p | index | PRIMARY   | PRIMARY  | 8   |
NULL |1 | Using index |
|  1 | SIMPLE  | t | ref   | sec_time  | sec_time | 8   |
vantage.p.sec_id |1 | Using where |
++-+---+---+---+--+-+--+--+-+


My expectation is that MySQL would be able to use both columns of the
sec_time index, but I've been unable to find either confirmation of
refutation of that assumption. If MySQL cannot optimize a join in this
case, is there another approach to optimizing this query that I should
pursue? Essentially, my trades table may contain many historical records
and pulling the entire history of trades for each security would produce a
much larger result set than would be retrieved if the trade_time criteria
was applied in the index reference.

I'm using MySQL 5.5.11.

Thanks for any guidance,
Jeff



Indexes are stored as b-TREE structures. For InnoDB tables, the leaf 
nodes of the tree will either contain the PRIMARY KEY of the table or a 
6-byte hidden value that acts as a row identifier.


The tree is structured so that the key values are parsed together and a 
balanced binary tree is built that represents the range of values on the 
table. For example, the index entry for one of your rows could be 
1_2012-07-01. At each level above the leaves, you have a node that 
lists the first and last elements of the range of leaves beneath it. For 
example an intermediate node may have 1_2009-01-01 and 1_2009-10-15 
to represent that those are the values within that branch of the tree.


Ranged 

Re: Composite Index Usage in Joins

2012-07-11 Thread Sergei Petrunia
On Tue, Jul 10, 2012 at 05:50:07PM -0400, Jeffrey Grollo wrote:
 Hi,
 
 I’m attempting to optimize a join and am having a difficult time using
 multiple columns from a composite index. The second column of the composite
 key is being used when tested for equality, but not for IN or BETWEEN
 criteria.
 
 As an example, say that I’m searching two tables: portfolio and trades. The
 portfolio table contains a list of security IDs. The trades table keeps
 tracks of the price and time when I’ve traded securities in my portfolio.
 Tables are:
 
 CREATE TABLE portfolio (
   sec_id bigint(20) NOT NULL AUTO_INCREMENT,
   name char(10) NOT NULL,
   PRIMARY KEY (sec_id)
 ) ENGINE=InnoDB ;
 
 CREATE TABLE trades (
   tx_id bigint(20) NOT NULL AUTO_INCREMENT,
   sec_id bigint(20) NOT NULL,
   trade_time datetime NOT NULL,
   price int NOT NULL,
   PRIMARY KEY (tx_id),
   KEY sec_time (sec_id, trade_time)
 ) ENGINE=InnoDB ;
 
 If I query the trades table directly both columns of the composite index
 sec_time will be used when I'm using a range criteria on the trade_time
 column:
 
 
 mysql explain select price from trades force index(sec_time)
 - where sec_id IN (1, 2) and trade_time BETWEEN '2012-07-01' AND
 '2012-07-04';
 ++-++---+---+--+-+--+--+-+
 | id | select_type | table  | type  | possible_keys | key  | key_len |
 ref  | rows | Extra   |
 ++-++---+---+--+-+--+--+-+
 |  1 | SIMPLE  | trades | range | sec_time  | sec_time | 16  |
 NULL |2 | Using where |
 ++-++---+---+--+-+--+--+-+
 
 
 If I introduce a join to retrieve all trades for my portfolio, the entire
 index will continue to be used if I make trade_time a constant:
 
 mysql explain select price from portfolio p inner join trades t force
 index(sec_time) on p.sec_id = t.sec_id
 - where trade_time = '2012-07-01';
 ++-+---+---+---+--+-++--+-+
 | id | select_type | table | type  | possible_keys | key  | key_len |
 ref| rows | Extra   |
 ++-+---+---+---+--+-++--+-+
 |  1 | SIMPLE  | p | index | PRIMARY   | PRIMARY  | 8   |
 NULL   |1 | Using index |
 |  1 | SIMPLE  | t | ref   | sec_time  | sec_time | 16  |
 vantage.p.sec_id,const |1 | |
 ++-+---+---+---+--+-++--+-+
 
 
 However, if I expand the trade_time search (either using IN or BETWEEN),
 only the sec_id column of the composite query is used:
 
 
 mysql explain select price from portfolio p inner join trades t force
 index(sec_time) on p.sec_id = t.sec_id
 - where trade_time IN ('2012-07-01', '2012-07-02');
 ++-+---+---+---+--+-+--+--+-+
 | id | select_type | table | type  | possible_keys | key  | key_len |
 ref  | rows | Extra   |
 ++-+---+---+---+--+-+--+--+-+
 |  1 | SIMPLE  | p | index | PRIMARY   | PRIMARY  | 8   |
 NULL |1 | Using index |
 |  1 | SIMPLE  | t | ref   | sec_time  | sec_time | 8   |
 vantage.p.sec_id |1 | Using where |
 ++-+---+---+---+--+-+--+--+-+
 
 
 My expectation is that MySQL would be able to use both columns of the
 sec_time index, but I've been unable to find either confirmation of
 refutation of that assumption. 
I can provide a refutation. Ability to make a combined index access of

1. Equality with a non-constant: t.sec_id= p.sec_id 
2. non-equality comparison with constants, trade_time IN ('2012-07-01', 
'2012-07-02')

has been discussed a number of times by query optimizer developer, but as far 
as public knowlege goes, nobody has ever implemented it, either at Oracle, or 
at MariaDB, or elsewhere.

MariaDB 5.3+ and MySQL 5.6+ have Index Condition Pushdown feature which will
have the optimizer to check the condition

  trade_time IN ('2012-07-01', '2012-07-02')

before reading the fill tables.

 If MySQL cannot optimize a join in this
 case, is there another approach to optimizing this query that I should
 pursue? Essentially, my trades table may contain many historical records
 and pulling the entire history of trades for each security would produce a
 much larger result set than would be retrieved if the trade_time criteria
 was applied in the index reference.

Basically, ref access can be constructed from equality conditions. If