Hello All,

I have been running sysbench OLTP against a MariaDB 10.1 master-slave topology. I have not seen any replication errors when slave_parallel_mode is conservative.
However, when I set slave_parallel_mode to optimistic and slave_parallel_threads to 2, I get a lock wait timeout replication error with TokuDB. Just before the error fires (which requires a TokuDB lock timeout to occur), I see one of the replication threads waiting for a lock held by the other replication thread. gdb shows the first thread waiting on a lock inside TokuDB. The other thread is stalled while committing its transaction, with this call stack:

  wait_for_prior_commit_2 <- wait_for_prior_commit <- THD::wait_for_prior_commit <- TC_LOG_MMAP::log_and_order <- ha_commit_trans

Is TokuDB supposed to call the thd_report_wait_for() API just before a thread starts waiting on a TokuDB lock?

On Sun, Aug 7, 2016 at 7:50 PM, jocelyn fournier <jocelyn.fourn...@gmail.com> wrote:
> Hi Kristian,
>
> Just FYI I confirm the "Lock wait timeout exceeded; try restarting
> transaction" behaviour you described.
>
> I've duplicated & modified the rpl_parallel_optimistic.test and run it
> into storage/tokudb/mysql-test/tokudb_rpl/t/rpl_parallel_optimistic.test :
>
> ./mtr --suite=tokudb_rpl <1:33:48
> Logging: ./mtr --suite=tokudb_rpl
> vardir: /home/joce/mariadb-10.1.16/mysql-test/var
> Checking leftover processes...
> Removing old var directory...
> Creating var directory '/home/joce/mariadb-10.1.16/mysql-test/var'...
> Checking supported features...
> MariaDB Version 10.1.16-MariaDB-debug
> - SSL connections supported
> - binaries are debug compiled
> Using suites: tokudb_rpl
> Collecting tests...
> Installing system database...
>
> ==============================================================================
>
> TEST RESULT TIME (ms) or COMMENT
> --------------------------------------------------------------------------
>
> worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
> worker[1] mysql-test-run: WARNING: running this script as _root_ will
> cause some tests to be skipped
> tokudb_rpl.rpl_parallel_optimistic 'innodb_plugin,mix' [ fail ]
> Test ended at 2016-08-08 01:26:34
>
> CURRENT_TEST: tokudb_rpl.rpl_parallel_optimistic
> mysqltest: In included file "./include/sync_with_master_gtid.inc":
> included from /home/joce/mariadb-10.1.16/storage/tokudb/mysql-test/tokudb_rpl/t/rpl_parallel_optimistic.test at line 59:
> At line 50: Failed to sync with master
>
> The result from queries just before the failure was:
> < snip >
> DELETE FROM t1 WHERE a=2;
> INSERT INTO t1 VALUES (2,5);
> DELETE FROM t1 WHERE a=3;
> INSERT INTO t1 VALUES(3,2);
> DELETE FROM t1 WHERE a=1;
> INSERT INTO t1 VALUES(1,2);
> DELETE FROM t1 WHERE a=3;
> INSERT INTO t1 VALUES(3,3);
> DELETE FROM t1 WHERE a=2;
> INSERT INTO t1 VALUES (2,6);
> include/save_master_gtid.inc
> SELECT * FROM t1 ORDER BY a;
> a b
> 1 2
> 2 6
> 3 3
> include/start_slave.inc
> include/sync_with_master_gtid.inc
> Timeout in master_gtid_wait('0-1-20', 120), current slave GTID position
> is: 0-1-3.
> Slave state : Waiting for master to send event 127.0.0.1 root 16000
> 1 master-bin.000001 3468 slave-relay-bin.000002 796
> master-bin.000001 Yes No 1205 Lock wait
> timeout exceeded; try restarting transaction 0 772 3790 None
> 0 No No 0 1205 Lock wait
> timeout exceeded; try restarting transaction 1 Slave_Pos 0-1-20
> optimistic
>
> I've no explanation so far for the DUPLICATE KEY error I've seen.
>
> Jocelyn
>
> On 15/07/2016 at 17:09, Kristian Nielsen wrote:
>
>> jocelyn fournier <jocelyn.fourn...@gmail.com> writes:
>>
>>> Thanks for the quick answer! I wonder if it would be possible to
>>> automatically disable the optimistic parallel replication for an
>>> engine if it does not implement it?
>>>
>> That would probably be good - though it would be better to just implement
>> the necessary API, it's a very small change (basically TokuDB just needs
>> to inform the upper layer of any lock waits that take place inside).
>>
>> However, looking more at your description, you got a "key not found"
>> error. Not implementing the thd_report_wait_for() could lead to deadlocks,
>> but it shouldn't cause key not found. In fact, in optimistic mode, all
>> errors are treated as "deadlock" errors, the query is rolled back, and
>> run again, this time not in parallel.
>>
>> So I'm wondering if there is something else going on. If transactions T1
>> and T2 run in parallel, it's possible that they have a row conflict. But if
>> T2 deleted a row expected by T1, I would expect T1 to wait on a row lock
>> held by T2, not get a duplicate key error. And if T1 has not yet inserted a
>> row expected by T2, then T2 would be rolled back and retried after T1 has
>> committed. The first can cause deadlock, but neither case seems to cause
>> duplicate error.
>>
>> Maybe TokuDB is doing something special with locks around replication, or
>> something else goes wrong. I guess TokuDB just hasn't been tested much
>> with parallel replication.
>>
>> Does it work ok when running in conservative parallel mode?
>>
>> - Kristian.
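The stall seen in gdb (one worker blocked on a TokuDB row lock, the other blocked in wait_for_prior_commit) is a cycle that the engine's lock manager cannot see on its own. It can be modelled in a few lines of C++. This is a minimal sketch under stated assumptions, not MariaDB server code: ReplTrx, commit_pos, on_engine_lock_wait() and may_start_retry() are hypothetical names, and the sketch does not reproduce the real signature or internals of thd_report_wait_for(). It assumes each replicated transaction knows its GTID domain and a contiguous position in the master's commit order, and it shows what the upper layer could do once an engine reports that one worker is about to wait on a lock held by another.

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical model of one transaction being applied by a parallel
// replication worker. None of these names come from the MariaDB source.
struct ReplTrx {
  std::string name;
  std::uint64_t commit_pos;  // position in the master's commit order (lower commits first)
  std::uint32_t domain_id;   // GTID replication domain; ordering applies within a domain
  bool killed;               // set when the upper layer asks this trx to roll back

  ReplTrx(std::string n, std::uint64_t pos, std::uint32_t dom)
      : name(std::move(n)), commit_pos(pos), domain_id(dom), killed(false) {}
};

// Hypothetically called by the storage engine right before `waiter` blocks
// on a row lock held by `holder`. If the holder is later in commit order in
// the same domain, it will eventually block in wait_for_prior_commit() until
// the waiter commits, which the waiter cannot do while blocked on the lock.
// The engine only sees the lock edge, so the upper layer breaks the cycle by
// killing the holder. Returns the trx to kill, or nullptr when an ordinary
// lock wait is safe.
ReplTrx* on_engine_lock_wait(const ReplTrx& waiter, ReplTrx& holder) {
  if (waiter.domain_id != holder.domain_id) {
    return nullptr;  // no commit-order constraint between different domains
  }
  if (holder.commit_pos > waiter.commit_pos) {
    holder.killed = true;  // holder rolls back and releases its locks
    return &holder;
  }
  return nullptr;  // holder commits first and will release the lock
}

// In optimistic mode, per the thread, the killed trx is treated like a
// deadlock victim: it is rolled back and applied again, this time not in
// parallel. Assuming contiguous commit positions, it may start the retry
// once every earlier transaction in its domain has committed.
bool may_start_retry(const ReplTrx& trx, std::uint64_t last_committed_pos) {
  return trx.killed && trx.commit_pos == last_committed_pos + 1;
}
```

For example, with T1 at position 10 and T2 at position 11 in the same domain, a lock wait of T1 on T2 marks T2 as killed, and T2 may start its retry only after position 10 has committed. If the engine never makes such a report, both workers stay blocked until the TokuDB lock wait timeout fires, which is consistent with the 1205 "Lock wait timeout exceeded" errors in the test output.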
_______________________________________________
Mailing list: https://launchpad.net/~maria-discuss
Post to     : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp