RE: Composite Index Usage in Joins
trade_time IN ('2012-07-01', '2012-07-02'); represents two distinct points in a continuum of a DATETIME. Perhaps you should change to DATE. Even then, it would probably be treated as IN, not RANGE. Perhaps you really want BETWEEN (a range) instead of IN (a set). What version are you using? Older version have essentially no optimizations for IN(). Shrinking the table size may help -- Do you really need BIGINT (8 bytes) instead of INT UNSIGNED (4 bytes)? DATETIME (8) vs DATE(3). Etc. If these are stock prices, how can `price` be INT? -Original Message- From: Jeffrey Grollo [mailto:grol...@gmail.com] Sent: Tuesday, July 10, 2012 2:50 PM To: mysql@lists.mysql.com Subject: Composite Index Usage in Joins Hi, I'm attempting to optimize a join and am having a difficult time using multiple columns from a composite index. The second column of the composite key is being used when tested for equality, but not for IN or BETWEEN criteria. As an example, say that I'm searching two tables: portfolio and trades. The portfolio table contains a list of security IDs. The trades table keeps tracks of the price and time when I've traded securities in my portfolio. Tables are: CREATE TABLE portfolio ( sec_id bigint(20) NOT NULL AUTO_INCREMENT, name char(10) NOT NULL, PRIMARY KEY (sec_id) ) ENGINE=InnoDB ; CREATE TABLE trades ( tx_id bigint(20) NOT NULL AUTO_INCREMENT, sec_id bigint(20) NOT NULL, trade_time datetime NOT NULL, price int NOT NULL, PRIMARY KEY (tx_id), KEY sec_time (sec_id, trade_time) ) ENGINE=InnoDB ; If I query the trades table directly both columns of the composite index sec_time will be used when I'm using a range criteria on the trade_time column: mysql explain select price from trades force index(sec_time) - where sec_id IN (1, 2) and trade_time BETWEEN '2012-07-01' AND '2012-07-04'; ++-++---+---+--+--- --+--+--+-+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | ++-++---+---+--+--- --+--+--+-+ | 1 | SIMPLE | trades | range | sec_time | sec_time | 16 | NULL |2 | Using where | ++-++---+---+--+--- --+--+--+-+ If I introduce a join to retrieve all trades for my portfolio, the entire index will continue to be used if I make trade_time a constant: mysql explain select price from portfolio p inner join trades t force index(sec_time) on p.sec_id = t.sec_id - where trade_time = '2012-07-01'; ++-+---+---+---+--+ -++--+-+ | id | select_type | table | type | possible_keys | key | key_len | ref| rows | Extra | ++-+---+---+---+--+ -++--+-+ | 1 | SIMPLE | p | index | PRIMARY | PRIMARY | 8 | NULL |1 | Using index | | 1 | SIMPLE | t | ref | sec_time | sec_time | 16 | vantage.p.sec_id,const |1 | | ++-+---+---+---+--+ -++--+-+ However, if I expand the trade_time search (either using IN or BETWEEN), only the sec_id column of the composite query is used: mysql explain select price from portfolio p inner join trades t force index(sec_time) on p.sec_id = t.sec_id - where trade_time IN ('2012-07-01', '2012-07-02'); ++-+---+---+---+--+ -+--+--+-+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | ++-+---+---+---+--+ -+--+--+-+ | 1 | SIMPLE | p | index | PRIMARY | PRIMARY | 8 | NULL |1 | Using index | | 1 | SIMPLE | t | ref | sec_time | sec_time | 8 | vantage.p.sec_id |1 | Using where | ++-+---+---+---+--+ -+--+--+-+ My expectation is that MySQL would be able to use both columns of the sec_time index, but I've been unable to find either confirmation of refutation of that assumption. If MySQL cannot optimize a join in this case, is there another approach to optimizing this query that I should pursue? Essentially, my trades table may contain many historical records and pulling the entire history of trades for each security would produce a much larger result set than would be retrieved if the trade_time criteria was applied in the index
Re: Composite Index Usage in Joins
Thanks for the guidance and references, Shawn. On Wed, Jul 11, 2012 at 3:37 PM, Shawn Green shawn.l.gr...@oracle.com wrote: Ranged scans only happen for the last portion of an index being used. OR queries (or those using IN) can also only be applied to the last part of an index search. This means that if you are using IN (or OR) against the first part of an index, that's where the usage of the index stops. Basically, this is what I was expecting. However, in the second table in the join, the secondary column wasn't being used for the range or IN. I was expecting, given the nested-join loop algorithm (as I don't believe that BNL would be used http://dev.mysql.com/doc/refman/5.5/en/nested-loop-joins.html), that the first column in the key would be treated as const allowing the second column to be used in a range or IN criteria. Also, trying to force index usage may be creating more work for your disks than necessary. I should have indicated in the original post that I was forcing the index simply to make the example work without data. Regards, Jeff -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql
Re: Composite Index Usage in Joins
On Wed, Jul 11, 2012 at 5:30 PM, Sergei Petrunia pser...@askmonty.org wrote: I can provide a refutation. Ability to make a combined index access of 1. Equality with a non-constant: t.sec_id= p.sec_id 2. non-equality comparison with constants, trade_time IN ('2012-07-01', '2012-07-02') has been discussed a number of times by query optimizer developer, but as far as public knowlege goes, nobody has ever implemented it, either at Oracle, or at MariaDB, or elsewhere. As mentioned in my reply to Shawn, given the nested-loop join processing, I had expected #1 to be treated like a constant for purposes of index access, but sounds like this is either a misunderstanding or just a limitation of how the optimizer builds an execution plan. My takeaway then is that composite indexes built for supporting join and where criteria in a query should be constructed to speed join key lookup first then to satisfy equality conditions in additional criteria. MariaDB 5.3+ and MySQL 5.6+ have Index Condition Pushdown feature A very nice feature indeed. Basically, ref access can be constructed from equality conditions. If you have data for many years, and the range you're scanning fits within one particular year, this could help: alter table trades add trade_year int, add index(sec_id, trade_year) update trades set trade_year=year(trade_date); -- will need to be done for new data, too Appreciate the suggestion. Given the variability in date ranges this search is over, finding the right granularity for supporting an equality condition is an interesting problem. Presumably, the same technique could be applied monthly, for example, with a second composite index to support: alter table trades add trade_month int, add index(sec_id, trade_month) It seems that partition pruning might be another way to deal with reducing the data scanned. With yearly partitions, I could then rely on the storage engine only scanning relevant partitions and then use equality at the month level. Thank you for the insight, Jeff -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql
Re: Composite Index Usage in Joins
On 7/10/2012 5:50 PM, Jeffrey Grollo wrote: Hi, I’m attempting to optimize a join and am having a difficult time using multiple columns from a composite index. The second column of the composite key is being used when tested for equality, but not for IN or BETWEEN criteria. As an example, say that I’m searching two tables: portfolio and trades. The portfolio table contains a list of security IDs. The trades table keeps tracks of the price and time when I’ve traded securities in my portfolio. Tables are: CREATE TABLE portfolio ( sec_id bigint(20) NOT NULL AUTO_INCREMENT, name char(10) NOT NULL, PRIMARY KEY (sec_id) ) ENGINE=InnoDB ; CREATE TABLE trades ( tx_id bigint(20) NOT NULL AUTO_INCREMENT, sec_id bigint(20) NOT NULL, trade_time datetime NOT NULL, price int NOT NULL, PRIMARY KEY (tx_id), KEY sec_time (sec_id, trade_time) ) ENGINE=InnoDB ; If I query the trades table directly both columns of the composite index sec_time will be used when I'm using a range criteria on the trade_time column: mysql explain select price from trades force index(sec_time) - where sec_id IN (1, 2) and trade_time BETWEEN '2012-07-01' AND '2012-07-04'; ++-++---+---+--+-+--+--+-+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | ++-++---+---+--+-+--+--+-+ | 1 | SIMPLE | trades | range | sec_time | sec_time | 16 | NULL |2 | Using where | ++-++---+---+--+-+--+--+-+ If I introduce a join to retrieve all trades for my portfolio, the entire index will continue to be used if I make trade_time a constant: mysql explain select price from portfolio p inner join trades t force index(sec_time) on p.sec_id = t.sec_id - where trade_time = '2012-07-01'; ++-+---+---+---+--+-++--+-+ | id | select_type | table | type | possible_keys | key | key_len | ref| rows | Extra | ++-+---+---+---+--+-++--+-+ | 1 | SIMPLE | p | index | PRIMARY | PRIMARY | 8 | NULL |1 | Using index | | 1 | SIMPLE | t | ref | sec_time | sec_time | 16 | vantage.p.sec_id,const |1 | | ++-+---+---+---+--+-++--+-+ However, if I expand the trade_time search (either using IN or BETWEEN), only the sec_id column of the composite query is used: mysql explain select price from portfolio p inner join trades t force index(sec_time) on p.sec_id = t.sec_id - where trade_time IN ('2012-07-01', '2012-07-02'); ++-+---+---+---+--+-+--+--+-+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | ++-+---+---+---+--+-+--+--+-+ | 1 | SIMPLE | p | index | PRIMARY | PRIMARY | 8 | NULL |1 | Using index | | 1 | SIMPLE | t | ref | sec_time | sec_time | 8 | vantage.p.sec_id |1 | Using where | ++-+---+---+---+--+-+--+--+-+ My expectation is that MySQL would be able to use both columns of the sec_time index, but I've been unable to find either confirmation of refutation of that assumption. If MySQL cannot optimize a join in this case, is there another approach to optimizing this query that I should pursue? Essentially, my trades table may contain many historical records and pulling the entire history of trades for each security would produce a much larger result set than would be retrieved if the trade_time criteria was applied in the index reference. I'm using MySQL 5.5.11. Thanks for any guidance, Jeff Indexes are stored as b-TREE structures. For InnoDB tables, the leaf nodes of the tree will either contain the PRIMARY KEY of the table or a 6-byte hidden value that acts as a row identifier. The tree is structured so that the key values are parsed together and a balanced binary tree is built that represents the range of values on the table. For example, the index entry for one of your rows could be 1_2012-07-01. At each level above the leaves, you have a node that lists the first and last elements of the range of leaves beneath it. For example an intermediate node may have 1_2009-01-01 and 1_2009-10-15 to represent that those are the values within that branch of the tree. Ranged
Re: Composite Index Usage in Joins
On Tue, Jul 10, 2012 at 05:50:07PM -0400, Jeffrey Grollo wrote: Hi, I’m attempting to optimize a join and am having a difficult time using multiple columns from a composite index. The second column of the composite key is being used when tested for equality, but not for IN or BETWEEN criteria. As an example, say that I’m searching two tables: portfolio and trades. The portfolio table contains a list of security IDs. The trades table keeps tracks of the price and time when I’ve traded securities in my portfolio. Tables are: CREATE TABLE portfolio ( sec_id bigint(20) NOT NULL AUTO_INCREMENT, name char(10) NOT NULL, PRIMARY KEY (sec_id) ) ENGINE=InnoDB ; CREATE TABLE trades ( tx_id bigint(20) NOT NULL AUTO_INCREMENT, sec_id bigint(20) NOT NULL, trade_time datetime NOT NULL, price int NOT NULL, PRIMARY KEY (tx_id), KEY sec_time (sec_id, trade_time) ) ENGINE=InnoDB ; If I query the trades table directly both columns of the composite index sec_time will be used when I'm using a range criteria on the trade_time column: mysql explain select price from trades force index(sec_time) - where sec_id IN (1, 2) and trade_time BETWEEN '2012-07-01' AND '2012-07-04'; ++-++---+---+--+-+--+--+-+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | ++-++---+---+--+-+--+--+-+ | 1 | SIMPLE | trades | range | sec_time | sec_time | 16 | NULL |2 | Using where | ++-++---+---+--+-+--+--+-+ If I introduce a join to retrieve all trades for my portfolio, the entire index will continue to be used if I make trade_time a constant: mysql explain select price from portfolio p inner join trades t force index(sec_time) on p.sec_id = t.sec_id - where trade_time = '2012-07-01'; ++-+---+---+---+--+-++--+-+ | id | select_type | table | type | possible_keys | key | key_len | ref| rows | Extra | ++-+---+---+---+--+-++--+-+ | 1 | SIMPLE | p | index | PRIMARY | PRIMARY | 8 | NULL |1 | Using index | | 1 | SIMPLE | t | ref | sec_time | sec_time | 16 | vantage.p.sec_id,const |1 | | ++-+---+---+---+--+-++--+-+ However, if I expand the trade_time search (either using IN or BETWEEN), only the sec_id column of the composite query is used: mysql explain select price from portfolio p inner join trades t force index(sec_time) on p.sec_id = t.sec_id - where trade_time IN ('2012-07-01', '2012-07-02'); ++-+---+---+---+--+-+--+--+-+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | ++-+---+---+---+--+-+--+--+-+ | 1 | SIMPLE | p | index | PRIMARY | PRIMARY | 8 | NULL |1 | Using index | | 1 | SIMPLE | t | ref | sec_time | sec_time | 8 | vantage.p.sec_id |1 | Using where | ++-+---+---+---+--+-+--+--+-+ My expectation is that MySQL would be able to use both columns of the sec_time index, but I've been unable to find either confirmation of refutation of that assumption. I can provide a refutation. Ability to make a combined index access of 1. Equality with a non-constant: t.sec_id= p.sec_id 2. non-equality comparison with constants, trade_time IN ('2012-07-01', '2012-07-02') has been discussed a number of times by query optimizer developer, but as far as public knowlege goes, nobody has ever implemented it, either at Oracle, or at MariaDB, or elsewhere. MariaDB 5.3+ and MySQL 5.6+ have Index Condition Pushdown feature which will have the optimizer to check the condition trade_time IN ('2012-07-01', '2012-07-02') before reading the fill tables. If MySQL cannot optimize a join in this case, is there another approach to optimizing this query that I should pursue? Essentially, my trades table may contain many historical records and pulling the entire history of trades for each security would produce a much larger result set than would be retrieved if the trade_time criteria was applied in the index reference. Basically, ref access can be constructed from equality conditions. If