[29 Dec 6:39] Laurynas Biveinis
Description:
If something changes the index tree between the two dives in 
btr_estimate_n_rows_in_range, this goes undetected, making it return bogus row 
count for the range, resulting in bogus query plans.

How to repeat:
Apply

--- storage/innobase/btr/btr0cur.cc.orig        2016-12-29 07:16:06.000000000 
+0200
+++ storage/innobase/btr/btr0cur.cc     2016-12-29 07:16:08.000000000 +0200
@@ -3708,6 +3708,9 @@
 
        mtr_commit(&mtr);
 
+       if (!strcmp(index->name, "iC"))
+               DEBUG_SYNC_C("btr_estimate_n_rows_in_range_between_dives");
+
        mtr_start(&mtr);
 
        cursor.path_arr = path2;

Then use the following MTR testcase. It uses purge as the concurrent
index tree writer.

--source include/have_debug.inc
--source include/have_debug_sync.inc
--source include/have_innodb.inc

create table t3
(
  key1 int not null,
  key2 int not null,
  INDEX i1(key1),
  INDEX i2(key2)
) engine=InnoDB;

--disable_query_log
insert into t3 values (1,1),(2,2);

let $1=9;
set @d=2;
while ($1)
{
  eval insert into t3 select key1+@d, key2+@d from t3;
  eval set @d=@d*2;
  dec $1;
}
--enable_query_log

analyze table t3;

SET @@GLOBAL.innodb_purge_stop_now=TRUE;

alter table t3 add keyC int not null, add index iC(keyC);
update t3 set keyC=key1;

analyze table t3;

--connect(con1,localhost,root,,)

explain select * from t3 where
  key1=1 or key2=2 or keyC=12;

SET DEBUG_SYNC = "btr_estimate_n_rows_in_range_between_dives SIGNAL
estimate_ready WAIT_FOR estimate_finish";

send explain select * from t3 where
  key1=1 or key2=2 or keyC=12;

connection default;
SET DEBUG_SYNC = "now WAIT_FOR estimate_ready";

SET @@GLOBAL.innodb_purge_run_now=TRUE;
--source include/wait_innodb_all_purged.inc

SET DEBUG_SYNC = "now SIGNAL estimate_finish";

connection con1;
reap;

explain select * from t3 where
  key1=1 or key2=2 or keyC=12;

disconnect con1;
connection default;

drop table t3;

The testcase above executes the same EXPLAIN SELECT three times: with
the iC index tree not purged at all, with it being purged between the
two dives, and with the fully clean tree. The first and the last
execution use index_merge, and the middle one is a table scan instead.

This is reduced from an intermittent main.index_merge_innodb failure:

main.index_merge_innodb                  [ fail ]
        Test ended at 2016-12-09 07:08:51

CURRENT_TEST: main.index_merge_innodb
--- 
/mnt/workspace/percona-server-5.6-param/BUILD_TYPE/release/Host/centos6-64/mysql-test/r/index_merge_innodb.result
   2016-12-09 14:08:06.818000030 +0300
+++ 
/mnt/workspace/percona-server-5.6-param/BUILD_TYPE/release/Host/centos6-64/build/mysql-test/var/log/index_merge_innodb.reject
       2016-12-09 15:08:51.296000042 +0300
@@ -299,7 +299,7 @@
 key5=5 or key6=6 or key7=7 or key8=8 or
 key9=9 or keyA=10 or keyB=11 or keyC=12;
 id     select_type     table   type    possible_keys   key     key_len ref     
rows    Extra
-1      SIMPLE  t3      index_merge     i1,i2,i3,i4,i5,i6,i7,i8,i9,iA,iB,iC     
i1,i2,i3,i4,i5,i6,i7,i8,i9,iA,iB,iC     4,4,4,4,4,4,4,4,4,4,4,4 NULL    12      
Using union(i1,i2,i3,i4,i5,i6,i7,i8,i9,iA,iB,iC); Using where
+1      SIMPLE  t3      ALL     i1,i2,i3,i4,i5,i6,i7,i8,i9,iA,iB,iC     NULL    
NULL    NULL    1024    Using where
 select * from t3 where
 key1=1 or key2=2 or key3=3 or key4=4 or
 key5=5 or key6=6 or key7=7 or key8=8 or

mysqltest: Result content mismatch

Suggested fix:
The bug results in index dive paths with path1[0].nth_rec > path2[0].nth_rec, 
returning hundreds of rows in the range instead of expected one. Even without 
concurrent modifications aside, such path crossing is detected in 
btr_estimate_n_rows_in_range:

                                /* It is possible that
                                slot1->nth_rec >= slot2->nth_rec
                                if, for example, we have a single page
                                tree which contains (inf, 5, 6, supr)
                                and we select where x > 20 and x < 30;
                                in this case slot1->nth_rec will point
                                to the supr record and slot2->nth_rec
                                will point to 6 */
                                n_rows = 0;

But instead of returning zero right there, path tracing continues.
Shouldn't it stop and return zero here instead?

Then, 5.7 actually catches some of the tree modifications in this
function by [1]. It does not catch this particular tree change because
both paths still point to the same page. A possible fix is extending [1]
with a  condition that the same page id node in the two paths should
contain the same number of records too - but this might be too
aggressive for the trees with heavy write volume, resulting in needless
dive restarts and eventual returning arbitrary 10 rows in range fallback
constant. Perhaps adding a check that the same page in the two paths
should have the same number of records - but only if the paths have
crossed - would be reasonable.

In any case such fix would be partial only as it's possible to modify
the tree and still have the same number of records before and after.

[1]:

commit 9ac1425053312b98fc9e6a999087efb09e60daec
Author: Vasil Dimov <[email protected]>
Date:   Fri Apr 17 15:04:58 2015 +0300

    Fix Bug#20618309 ASSERT SLOT1->PAGE_LEVEL == SLOT2->PAGE_LEVEL, 
BTR_ESTIMATE_N_ROWS_IN_RANGE()
    
    Relax a too strict assert. If the tree is changed between both dives for
    the left boundary and the right boundary, then our markers (slot1 and
    slot2) could end up on different levels in the tree.
    
    If we detect that this has happened - then retry a few times and if
    still unsuccessful then return an arbitrary number for an estimate.
    
    Reviewed-by:    Marko M<C3><A4>kel<C3><A4> <[email protected]>
    Reviewed-by:    Jimmy Yang <[email protected]>
    RB:             8570

** Summary changed:

- Test main.index_merge_innodb is unstable
+ InnoDB index dives do not detect concurrent tree changes, return bogus 
estimates

** Changed in: percona-server/5.5
       Status: Invalid => New

** Changed in: percona-server/5.5
   Importance: Undecided => Medium

** Tags added: ci innodb upstream

** Bug watch added: MySQL Bug System #84366
   http://bugs.mysql.com/bug.php?id=84366

** Also affects: mysql-server via
   http://bugs.mysql.com/bug.php?id=84366
   Importance: Unknown
       Status: Unknown

-- 
You received this bug notification because you are a member of Ubuntu
Server/Client Support Team, which is subscribed to MySQL.
Matching subscriptions: Ubuntu Server/Client Support Team
https://bugs.launchpad.net/bugs/1625151

Title:
  InnoDB index dives do not detect concurrent tree changes, return bogus
  estimates

To manage notifications about this bug go to:
https://bugs.launchpad.net/mysql-server/+bug/1625151/+subscriptions

-- 
Mailing list: https://launchpad.net/~enterprise-support
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~enterprise-support
More help   : https://help.launchpad.net/ListHelp

Reply via email to