Re: [Maria-developers] 880e92a48ba: MDEV-27760 event may non stop replicate in circular semisync setup

2022-03-07 Thread Sergei Golubchik
Hi, Andrei,

I don't see a point in caching the value in memory that is very cheap to
calculate per event. I've pushed f8ec4dd38d0 to 10.4-serg three hours
ago and Jenkins said it's fine.

So I'd suggest we go with it, because we really have to close this
blocker and release ES now. And MDEV-27837 will be fixed as a separate
bug.

queue_event checks global_system_variables.server_id in multiple places,
moving only do_accept_own_server_id into Master_info will not make it
any better or worse against MDEV-27837.

> > As far as I can see, you can calculate it for every event just the same.
> 
> The per event computation is extraneous and not consistent with the
> nature of the flag that belongs to the group of events.
> 
> Also notice (or remember) a "FR"
>   MDEV-27837 disallow `set @@session.server_id` within transaction
> so until this bug is fixed it's not 'just the same'.
> 
> I am recomming the patch, assuming that you'll be fine with the Gtid time
> only flag computation.
> 
Regards,
Sergei
VP of MariaDB Server Engineering
and secur...@mariadb.org

___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] 880e92a48ba: MDEV-27760 event may non stop replicate in circular semisync setup

2022-03-06 Thread Sergei Golubchik
Hi, Andrei,

On Mar 01, Andrei Elkin wrote:
> > On Feb 28, Andrei wrote:
> >> revision-id: 880e92a48ba (mariadb-10.6.6-16-g880e92a48ba)
> >> parent(s): 4030a9fb2eb
> >> author: Andrei
> >> committer: Andrei
> >> timestamp: 2022-02-21 14:58:13 +0200
> >> message:
> >> 
> >> MDEV-27760 event may non stop replicate in circular semisync setup
> >> 
> >> diff --git a/sql/slave.cc b/sql/slave.cc
> >> index c0eef02ca7a..a1f4f4f0081 100644
> >> --- a/sql/slave.cc
> >> +++ b/sql/slave.cc
> >> @@ -6186,13 +6186,13 @@ static int queue_event(Master_info* mi, const 
> >> uchar *buf, ulong event_len)
> >>bool is_rows_event= false;
> >>/*
> >>  The flag has replicate_same_server_id semantics and is raised to 
> >> accept
> >> -a same-server-id event on the semisync slave, for both the gtid and 
> >> legacy
> >> -connection modes.
> >> -Such events can appear as result of this server recovery so the event
> >> -was created there and replicated elsewhere right before the crash. At 
> >> recovery
> >> -it could be evicted from the server's binlog.
> >> -  */
> >> -  bool do_accept_own_server_id= false;
> >> +a same-server-id event group by the gtid strict mode semisync slave.
> >> +Own server-id events can appear as result of this server 
> >> crash-recovery:
> >> +the transaction was created on this server then being master, got 
> >> replicated
> >> +elsewhere right before the crash before commit;
> >> +finally at recovery the transaction gets evicted from the server's 
> >> binlog.
> >> + */
> >> +  static bool do_accept_own_server_id= false;
> >
> > No, sorry, it cannot be static. If you need to preserve it between
> > events, you can keep it, for example, in Master_info.
> 
> The var is set by Gtid events to serve as a marker of the whole current 
> Gtid-headed
> group of event.
> Only next Gtid event may change the current value.
> 
> Why can't it?

Why it cannot be static? Because it's not a property of the server, it's
a property of a specific replication stream. Technically one can have a
multi-source and do_accept_own_server_id could have different values for
different masters.

> > So, the question now is, why do you need to preserve the value of
> > do_accept_own_server_id between queue_event invocations?
> 
> The flag as said above applies to the whole group of events, so it's
> preserved at least through so many 
> queue_event() calls as many event in the group (I count Gtid itself in).

As far as I can see, you can calculate it for every event just the same.

> >> @@ -6281,6 +6281,8 @@ static int queue_event(Master_info* mi, const uchar 
> >> *buf, ulong event_len)
> >>  dbug_rows_event_count = 0;
> >>};);
> >>  #endif
> >> +  s_id= uint4korr(buf + SERVER_ID_OFFSET);
> >> +
> >>mysql_mutex_lock(>data_lock);
> >>  
> >>switch (buf[EVENT_TYPE_OFFSET]) {
> >> @@ -6722,6 +6724,11 @@ static int queue_event(Master_info* mi, const uchar 
> >> *buf, ulong event_len)
> >>  
> >>  ++mi->events_queued_since_last_gtid;
> >>  inc_pos= event_len;
> >> +
> >> +do_accept_own_server_id=
> >> +  (s_id == global_system_variables.server_id && 
> >> rpl_semi_sync_slave_enabled
> >> +   && mi->using_gtid != Master_info::USE_GTID_NO && 
> >> opt_gtid_strict_mode) ?
> >> +  true : false;

Regards,
Sergei
VP of MariaDB Server Engineering
and secur...@mariadb.org

___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] 880e92a48ba: MDEV-27760 event may non stop replicate in circular semisync setup

2022-03-01 Thread Sergei Golubchik
Hi, Andrei,

On Feb 28, Andrei wrote:
> revision-id: 880e92a48ba (mariadb-10.6.6-16-g880e92a48ba)
> parent(s): 4030a9fb2eb
> author: Andrei
> committer: Andrei
> timestamp: 2022-02-21 14:58:13 +0200
> message:
> 
> MDEV-27760 event may non stop replicate in circular semisync setup
> 
> diff --git a/sql/slave.cc b/sql/slave.cc
> index c0eef02ca7a..a1f4f4f0081 100644
> --- a/sql/slave.cc
> +++ b/sql/slave.cc
> @@ -6186,13 +6186,13 @@ static int queue_event(Master_info* mi, const uchar 
> *buf, ulong event_len)
>bool is_rows_event= false;
>/*
>  The flag has replicate_same_server_id semantics and is raised to accept
> -a same-server-id event on the semisync slave, for both the gtid and 
> legacy
> -connection modes.
> -Such events can appear as result of this server recovery so the event
> -was created there and replicated elsewhere right before the crash. At 
> recovery
> -it could be evicted from the server's binlog.
> -  */
> -  bool do_accept_own_server_id= false;
> +a same-server-id event group by the gtid strict mode semisync slave.
> +Own server-id events can appear as result of this server crash-recovery:
> +the transaction was created on this server then being master, got 
> replicated
> +elsewhere right before the crash before commit;
> +finally at recovery the transaction gets evicted from the server's 
> binlog.
> + */
> +  static bool do_accept_own_server_id= false;

No, sorry, it cannot be static. If you need to preserve it between
events, you can keep it, for example, in Master_info.

yes, I remember, that's where it was and I asked why it's in
Master_info. You could've answered, that you need to preserve the value
between queue_event invocations - I didn't realize that at the time.

So, the question now is, why do you need to preserve the value of
do_accept_own_server_id between queue_event invocations?

>/*
>  FD_q must have been prepared for the first R_a event
>  inside get_master_version_and_clock()
> @@ -6281,6 +6281,8 @@ static int queue_event(Master_info* mi, const uchar 
> *buf, ulong event_len)
>  dbug_rows_event_count = 0;
>};);
>  #endif
> +  s_id= uint4korr(buf + SERVER_ID_OFFSET);
> +
>mysql_mutex_lock(>data_lock);
>  
>switch (buf[EVENT_TYPE_OFFSET]) {
> @@ -6722,6 +6724,11 @@ static int queue_event(Master_info* mi, const uchar 
> *buf, ulong event_len)
>  
>  ++mi->events_queued_since_last_gtid;
>  inc_pos= event_len;
> +
> +do_accept_own_server_id=
> +  (s_id == global_system_variables.server_id && 
> rpl_semi_sync_slave_enabled
> +   && mi->using_gtid != Master_info::USE_GTID_NO && 
> opt_gtid_strict_mode) ?
> +  true : false;

this is a ridiculous pattern

   bool var = (bool expr) ? true : false;

don't do that.

>}
>break;
>/*
> @@ -6909,7 +6916,6 @@ static int queue_event(Master_info* mi, const uchar 
> *buf, ulong event_len)
>*/
>  
>mysql_mutex_lock(log_lock);
> -  s_id= uint4korr(buf + SERVER_ID_OFFSET);
>/*
>  Write the event to the relay log, unless we reconnected in the middle
>  of an event group and now need to skip the initial part of the group that
> @@ -6955,7 +6961,7 @@ static int queue_event(Master_info* mi, const uchar 
> *buf, ulong event_len)
>else
>if ((s_id == global_system_variables.server_id &&
> !(mi->rli.replicate_same_server_id ||
> - (do_accept_own_server_id= rpl_semi_sync_slave_enabled))) ||
> + do_accept_own_server_id)) ||
>event_that_should_be_ignored(buf) ||
>/*
>  the following conjunction deals with IGNORE_SERVER_IDS, if set
> 
Regards,
Sergei
VP of MariaDB Server Engineering
and secur...@mariadb.org

___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp