Hi, Andrei!
On Apr 27, Andrei Elkin wrote:
> >> +--let $case = B: one engine has committed its transaction branch
> >> +# Hold off after one engine has committed.
> >> +--let $shutdown_timeout=0
> >> +--let $debug_sync_action = "commit_after_run_commit_ordered SIGNAL
> >> con1_ready WAIT_FOR signal_no_signal"
> >> +# Both debug_sync and debug-dbug are required to make sure Engines
> >> remember the commit state
> >> +# debug_sync alone will not help.
> >> +--let $restart_parameters = --rpl-semi-sync-slave-enabled=1
> >> --debug-dbug=d,binlog_truncate_partial_commit
> >
> > in the first review I wrote
> >
> > this seems to be a rather crude way of faking a partially committed
> > transaction. better to crash after the first engine has committed,
> > that'd be much more natural.
> >
> > and you replied
> >
> > This simulation aimed at (allows for) more complicated recovery time
> > event sequences.
> > In this case, indeed, crashing by demand is about of the same efforts.
> > I can convert to that.
> >
> > [x]
>
> I had a rather sobering followup, slipped communicating to you though, sorry.
>
> It's actually turns out not to be easy. A sequence of execution events
>
> E1.prepare(trx), E2.prepare(trx), E1.commit(trx), *crash*
>
> does not make E1 such as Innodb committed persistently in file. At
> recovery in E1 trx may be found in prepared state, unless before the
> crash I'd do something like Binlog CheckPoint (BCP) notification request
> and wait of it.
I don't understand, what a binlog checkpoint has to do with it?
> In this just a selected engine would be requested, unlike in BCP event
> case that is generated when both (in this case) have persistently committed.
> (That is we even can't involve BCP event to wait for, and the involvement
> would require some simulation as well).
>
> So I propose to keep the current method leaving the "honest" crash to
> RQG that Alice has been training the patch with.
Alice was loading the server with transactions and doing `kill -9` at
some point. Many interesting cases that we discuss (for example, the one
above) happen in a very small time window, so I don't know whether
random killing could provide an adequate test coverage.
> >> diff --git a/mysql-test/suite/rpl/t/rpl_semi_sync_fail_over.test
> >> b/mysql-test/suite/rpl/t/rpl_semi_sync_fail_over.test
> >> --- /dev/null
> >> +++ b/mysql-test/suite/rpl/t/rpl_semi_sync_fail_over.test
> >> @@ -0,0 +1,143 @@
> >> +
> >> +--source include/have_innodb.inc
> >> +--source include/have_debug_sync.inc
> >> +--source include/have_binlog_format_row.inc
> >> +--let $rpl_topology=1->2
> >> +--source include/rpl_init.inc
> >
> > why not to source master-slave.inc if you're using a standard master-slave
> > topology anyway?
>
> Using numbers it's easier to manipulate with failover operatations.
> Besides it's truly uncomfortable both read and write
> when `master` would be serving the slave role as it happens in the test :-).
I believe I already replied to that earlier.
master-slave.inc creates server_1 and server_2 connections all right.
> >> diff --git a/sql/handler.h b/sql/handler.h
> >> --- a/sql/handler.h
> >> +++ b/sql/handler.h
> >> @@ -873,6 +874,15 @@ typedef struct xid_t XID;
> >> /* The 'buf' has to have space for at least SQL_XIDSIZE bytes. */
> >> uint get_sql_xid(XID *xid, char *buf);
> >>
> >> +/* struct for semisync slave binlog truncate recovery */
> >> +struct xid_recovery_member
> >> +{
> >> + my_xid xid;
> >> + uint in_engine_prepare; // number of engines that have xid prepared
> >> + bool decided_to_commit;
> >> + std::pair binlog_coord; // semisync recovery binlog
> >> offset
> >
> > wouldn't it be clearer to have a struct with named members?
>
> I am not sure if there's a need for naming. But we do need
> comparison that std::pair provides for granted.
>
> I could consider (a) new typedef(s) to represent the binlog id and
> offset in the file to make it more readable.
>
> You say?
>
> > in fact, I'm somewhat surprised there's no such struct for binlog coords
> > already.
>
> I had started that back at HB implementation actually, but that struct
> did not attact much of attention.
I'd still suggest a structure, like
struct binlog_coords_t {
uint binlog_num;
my_off_t offset;
};
> >
> >> +};
> >> +
> >> /* for recover() handlerton call */
> >> #define MIN_XID_LIST_SIZE 128
> >> #define MAX_XID_LIST_SIZE (1024*128)
> >> @@ -4820,7 +4830,8 @@ int ha_commit_one_phase(THD *thd, bool all);
> >> int ha_commit_trans(THD *thd, bool all);
> >> int ha_rollback_trans(THD *thd, bool all);
> >> int ha_prepare(THD *thd);
> >> -int ha_recover(HASH *commit_list);
> >> +int ha_recover(HASH *commit_list, MEM_ROOT *mem_root= NULL);
> >> +uint ha_recover_complete(HASH *commit_list, std::pair
> >> *coord= NULL);
> >
> > is coord a truncation position?
>
> So NULL is for the normal recovery.
> Non-NULL is an estimate of the 1st to rollback.
> Trx with lesser offsets