Re: Query Optimization Question

Michael Stassen Mon, 13 Mar 2006 22:15:49 -0800

Robert DiFalco wrote:
> In a previous database engine I was using an IN was more optimal than a
> <>. So, for example:
>
>     SELECT * FROM table WHERE table.type IN (1,2,3);
>
> Where the possible values of type are 0-3, was appreciably faster than:
>
>     SELECT * FROM table WHERE table.type <> 0;
>
> I've been playing with the Query Browser and checking out the
> optimization documents and haven't been able to make a clear call on
> whether or not this is also the case with MySQL/InnoDB.


Mladen Adamovic wrote:
> IN should be faster implemented with both hash tables and BTREE's so
> nowadays it should also be faster than <> as all MySQL implementation
> AFAIK use those well known data structures for indexes.

[EMAIL PROTECTED] wrote:
> YES, YES, YES! This is definitely an optimization.
>
> When you say IN or =, you are asking for "matching values". Matches can
> come from indexes. When you say <> or NOT IN, you are asking for
> everything BUT matches.  In order to evaluate a negative, the database
> engine (and this is usually true regardless of database server) almost
> always performs a full table scan to test every row to make sure it is
> either <> or NOT IN. At the very best, they have to perform a full index
> scan which is still less efficient than  ranged or values-based lookups.
>
> It's when you get into the situation where you are matching against dozens
> of IN-clause items that you may run into slowdowns again. Until you reach
> 2 or 3 dozen terms (depending on your hardware) you should be faster with
> an IN comparison than a <> or a NOT IN comparison. An optimization to
> search for BUT a term or two is to create a temporary table of all of your
> terms and delete the exact ones you want to exclude. Put an index on your
> temp table then JOIN that back into your query again (replacing the huge
> IN clause).  The database will match index to index and things will get
> fast again. This technique can scale up to some really big queries.
>
> Always try to code for the affirmative tests. Your users will thank you.

Implicit in Mladen and Shawn's answers, but never actually mentioned in theoriginal post, is the presence of an index on the type column. This is probablyobvious to all concerned, but I mention it for completeness: without an index ontype, there is no difference between "type IN (1,2,3)" and "type != 0". Thatis, the question is not whether IN is better than !=, but rather which willallow the optimizer to make good use of the index on type.

I find mysql's optimizer is pretty good with well-written queries, as long assubqueries aren't involved, so my initial reaction was to expect no difference.After all, as the optimizer considers the WHERE conditions and the availableindexes, it is certainly possible, at least theoretically, for it to notice that"type IN (1,2,3)" and "type != 0" are identical conditions. That is, a cleveroptimizer could treat them identically. Shawn's and Mladen's answers gave mepause, however, and aroused my curiosity, so I decided to test:


  SELECT VERSION();
  +-----------+
  | VERSION() |
  +-----------+
  | 4.1.15    |
  +-----------+

  SELECT cat, COUNT(*) FROM inits GROUP BY cat;
  +------+----------+
  | type | COUNT(*) |
  +------+----------+
  |    0 |    44224 |
  |    1 |     1919 |
  |    2 |     1931 |
  |    3 |     1926 |
  +------+----------+

  mysql> EXPLAIN SELECT * FROM inits WHERE cat IN (1,2,3) \G
  *************************** 1. row ***************************
             id: 1
    select_type: SIMPLE
          table: inits
           type: range
  possible_keys: cat_idx
            key: cat_idx
        key_len: 5
            ref: NULL
           rows: 8117
          Extra: Using where

  mysql> EXPLAIN SELECT * FROM inits WHERE cat != 0 \G
  *************************** 1. row ***************************
             id: 1
    select_type: SIMPLE
          table: inits
           type: range
  possible_keys: cat_idx
            key: cat_idx
        key_len: 5
            ref: NULL
           rows: 8120
          Extra: Using where

As you can see, the optimizer plans to use the index in both cases, examining8117 rows in the IN case and 8120 rows in the != case, to get 5776 matching rowsout of 50,000 (about 11.55%).

On the other hand, it makes a difference how many rows will match. What is thedistribution of values of type? If the number of matching rows is more thanabout 30% of the table, the optimizer won't use an available index in any case.For example,


mysql> EXPLAIN SELECT * FROM inits WHERE cat IN (0,1,2) \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: inits
         type: ALL
possible_keys: cat_idx
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 50000
        Extra: Using where

mysql> EXPLAIN SELECT * FROM inits WHERE cat !=3 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: inits
         type: ALL
possible_keys: cat_idx
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 50000
        Extra: Using where

Here I've gone to the other extreme, selecting about 96.15% of the rows in thetable, so the optimizer prefers not to use the index in either case.

Of course, I've just done one simple test with arbitrary data on an unloaded devmachine. My advice is to try your query both ways with EXPLAIN in front so youcan see what choices the optimizer makes with your actual query and data, ratherthan relying on theory or one test case. If there's a difference, pickwhichever is better.


Michael

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: Query Optimization Question

Reply via email to