On Tue, Jul 10, 2012 at 05:50:07PM -0400, Jeffrey Grollo wrote:
> Hi,
> 
> I’m attempting to optimize a join and am having a difficult time using
> multiple columns from a composite index. The second column of the composite
> key is being used when tested for equality, but not for IN or BETWEEN
> criteria.
> 
> As an example, say that I’m searching two tables: portfolio and trades. The
> portfolio table contains a list of security IDs. The trades table keeps
> tracks of the price and time when I’ve traded securities in my portfolio.
> Tables are:
> 
> CREATE TABLE portfolio (
>   sec_id bigint(20) NOT NULL AUTO_INCREMENT,
>   name char(10) NOT NULL,
>   PRIMARY KEY (sec_id)
> ) ENGINE=InnoDB ;
> 
> CREATE TABLE trades (
>   tx_id bigint(20) NOT NULL AUTO_INCREMENT,
>   sec_id bigint(20) NOT NULL,
>   trade_time datetime NOT NULL,
>   price int NOT NULL,
>   PRIMARY KEY (tx_id),
>   KEY sec_time (sec_id, trade_time)
> ) ENGINE=InnoDB ;
> 
> If I query the trades table directly both columns of the composite index
> "sec_time" will be used when I'm using a range criteria on the trade_time
> column:
> 
> 
> mysql> explain select price from trades force index(sec_time)
>     -> where sec_id IN (1, 2) and trade_time BETWEEN '2012-07-01' AND
> '2012-07-04';
> +----+-------------+--------+-------+---------------+----------+---------+------+------+-------------+
> | id | select_type | table  | type  | possible_keys | key      | key_len |
> ref  | rows | Extra       |
> +----+-------------+--------+-------+---------------+----------+---------+------+------+-------------+
> |  1 | SIMPLE      | trades | range | sec_time      | sec_time | 16      |
> NULL |    2 | Using where |
> +----+-------------+--------+-------+---------------+----------+---------+------+------+-------------+
> 
> 
> If I introduce a join to retrieve all trades for my portfolio, the entire
> index will continue to be used if I make trade_time a constant:
> 
> mysql> explain select price from portfolio p inner join trades t force
> index(sec_time) on p.sec_id = t.sec_id
>     -> where trade_time = '2012-07-01';
> +----+-------------+-------+-------+---------------+----------+---------+------------------------+------+-------------+
> | id | select_type | table | type  | possible_keys | key      | key_len |
> ref                    | rows | Extra       |
> +----+-------------+-------+-------+---------------+----------+---------+------------------------+------+-------------+
> |  1 | SIMPLE      | p     | index | PRIMARY       | PRIMARY  | 8       |
> NULL                   |    1 | Using index |
> |  1 | SIMPLE      | t     | ref   | sec_time      | sec_time | 16      |
> vantage.p.sec_id,const |    1 |             |
> +----+-------------+-------+-------+---------------+----------+---------+------------------------+------+-------------+
> 
> 
> However, if I expand the trade_time search (either using IN or BETWEEN),
> only the sec_id column of the composite query is used:
> 
> 
> mysql> explain select price from portfolio p inner join trades t force
> index(sec_time) on p.sec_id = t.sec_id
>     -> where trade_time IN ('2012-07-01', '2012-07-02');
> +----+-------------+-------+-------+---------------+----------+---------+------------------+------+-------------+
> | id | select_type | table | type  | possible_keys | key      | key_len |
> ref              | rows | Extra       |
> +----+-------------+-------+-------+---------------+----------+---------+------------------+------+-------------+
> |  1 | SIMPLE      | p     | index | PRIMARY       | PRIMARY  | 8       |
> NULL             |    1 | Using index |
> |  1 | SIMPLE      | t     | ref   | sec_time      | sec_time | 8       |
> vantage.p.sec_id |    1 | Using where |
> +----+-------------+-------+-------+---------------+----------+---------+------------------+------+-------------+
> 
> 
> My expectation is that MySQL would be able to use both columns of the
> sec_time index, but I've been unable to find either confirmation of
> refutation of that assumption. 
I can provide a refutation. Ability to make a combined index access of

1. Equality with a non-constant: t.sec_id= p.sec_id 
2. non-equality comparison with constants, trade_time IN ('2012-07-01', 
'2012-07-02')

has been discussed a number of times by query optimizer developer, but as far 
as public knowlege goes, nobody has ever implemented it, either at Oracle, or 
at MariaDB, or elsewhere.

MariaDB 5.3+ and MySQL 5.6+ have Index Condition Pushdown feature which will
have the optimizer to check the condition

  trade_time IN ('2012-07-01', '2012-07-02')

before reading the fill tables.

> If MySQL cannot optimize a join in this
> case, is there another approach to optimizing this query that I should
> pursue? Essentially, my trades table may contain many historical records
> and pulling the entire history of trades for each security would produce a
> much larger result set than would be retrieved if the trade_time criteria
> was applied in the index reference.

Basically, ref access can be constructed from equality conditions. If you have
data for many years, and the range you're scanning fits within one particular
year, this could help:

alter table trades add trade_year int, add index(sec_id, trade_year)
update trades set trade_year=year(trade_date); -- will need to be done for new 
data, too

select price from portfolio p inner join trades t on p.sec_id = t.sec_id
where trade_time IN ('2012-07-01', '2012-07-02') AND trade_year=2012;

(year can be changed to quarter or day)

BR
 Sergei
-- 
Sergei Petrunia, Software Developer
Monty Program AB, http://askmonty.org
Blog: http://s.petrunia.net/blog

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql

Reply via email to