On Tue, Nov 5, 2013 at 3:30 PM, Apollon Oikonomopoulos
<[email protected]> wrote:
> DrbdAttachNet supports both normal primary/secondary node operation and
> (during live migration) dual-primary operation. When resources are newly
> attached, we poll until we find all of them in connected or syncing operation.
>
> Although aggressive, this is enough for primary/secondary operation, because
> the primary/secondary role is not changed from within DrbdAttachNet. However,
> in the dual-primary (“multimaster”) case, both peers are subsequently upgraded
> to the primary role.  If - for unspecified reasons - both disks are not
> UpToDate, then a resync may be triggered after both peers have switched to
> primary, causing the resource to disconnect:
>
>   kernel: [1465514.164009] block drbd2: I shall become SyncTarget, but I am primary!
>   kernel: [1465514.171562] block drbd2: ASSERT( os.conn == C_WF_REPORT_PARAMS ) in /build/linux-rrsxby/linux-3.2.51/drivers/block/drbd/drbd_receiver.c:3245

This line is too long and cannot be committed as-is (I know it's a log
line, and it would be nice to keep it intact, but we have automated
line-length checks in place), so it should be split.

>
> This seems to be extremely racy and is possibly triggered by some underlying
> network issues (e.g. high latency), but it has been observed in the wild. By
> logging the DRBD resource state in the old secondary, we managed to see a
> resource getting promoted to primary while it was:
>
>   WFSyncUUID Secondary/Primary Outdated/UpToDate
>
> We fix this by explicitly waiting for “Connected” cstate and
> “UpToDate/UpToDate” disks, as advised in [1]:
>
>   “For this purpose and scenario,
>    you only want to promote once you are Connected UpToDate/UpToDate.”
>
> [1] http://lists.linbit.com/pipermail/drbd-user/2013-July/020173.html
>
> Signed-off-by: Apollon Oikonomopoulos <[email protected]>
> ---
>  lib/backend.py |   16 ++++++++++++++--
>  lib/bdev.py    |    1 +
>  2 files changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/lib/backend.py b/lib/backend.py
> index a75432b..9e12639 100644
> --- a/lib/backend.py
> +++ b/lib/backend.py
> @@ -3622,8 +3622,20 @@ def DrbdAttachNet(nodes_ip, disks, instance_name, multimaster):
>      for rd in bdevs:
>        stats = rd.GetProcStatus()
>
> -      all_connected = (all_connected and
> -                       (stats.is_connected or stats.is_in_resync))
> +      if multimaster:
> +        # In the multimaster case we have to wait explicitly until
> +        # the resource is Connected and UpToDate/UpToDate, because
> +        # we promote *both nodes* to primary directly afterwards.
> +        # Being in resync is not enough, since there is a race during which we
> +        # may promote a node with an Outdated disk to primary, effectively
> +        # tearing down the connection.
> +        all_connected = (all_connected and
> +                         stats.is_connected and
> +                         stats.is_disk_uptodate and
> +                         stats.peer_disk_uptodate)
> +      else:
> +        all_connected = (all_connected and
> +                         (stats.is_connected or stats.is_in_resync))
>
>        if stats.is_standalone:
>          # peer had different config info and this node became
> diff --git a/lib/bdev.py b/lib/bdev.py
> index 7623869..acc18ec 100644
> --- a/lib/bdev.py
> +++ b/lib/bdev.py
> @@ -1135,6 +1135,7 @@ class DRBD8Status(object):
>
>      self.is_diskless = self.ldisk == self.DS_DISKLESS
>      self.is_disk_uptodate = self.ldisk == self.DS_UPTODATE
> +    self.peer_disk_uptodate = self.rdisk == self.DS_UPTODATE

I'd change peer_disk_uptodate to is_peer_disk_uptodate, to be
consistent with the style of all the predicates defined here.

>
>      self.is_in_resync = self.cstatus in self.CSET_SYNC
>      self.is_in_use = self.cstatus != self.CS_UNCONFIGURED
> --
> 1.7.10.4
>

Rest LGTM.

Thanks a lot for the patch, Apollon, and thanks George for confirming it works.

Michele

-- 
Google Germany GmbH
Dienerstr. 12
80331 München

Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Graham Law, Christine Elizabeth Flores
