On Mon, Mar 3, 2014 at 4:36 PM, Jose A. Lopes <[email protected]> wrote:
> On Mar 03 16:30, Michele Tartara wrote:
>> On Mon, Mar 3, 2014 at 4:05 PM, Jose A. Lopes <[email protected]> wrote:
>> > This patch changes the watcher to check whether an instance that is
>> > down is also locked by some LU before attempting to restart the
>> > instance.  Without checking the lock status, the watcher could think
>> > that an instance that is being failed over is actually down, for
>> > example.
>> >
>> > This problem occurs because there is a significant time window between
>> > 'xm stop' and 'xm destroy' during which an instance is reported as
>> > being down, but the watcher should not act during this period.
>> >
>> > This fixes issue 734.
>> > This introduces issue 743.
>> >
>> > Unfortunately, this fix introduces a race condition, given that at
>> > the moment it is not possible to query the instance status and the
>> > lock status simultaneously.  It won't be possible to fix this race
>> > condition until the locks have been migrated completely to Haskell
>> > and the cluster configuration can be functionally updated in
>> > Haskell, which will also allow the simultaneous queries.
>>
>> How bad is this? Just a theoretical possibility or a race condition
>> that could actually show up more or less frequently?
>
> So the race condition occurs in the following situation:
>
>   1. the lock is free
>   2. the watcher queries the lock and sees it is free
>   3. some LU starts and acquires the lock
>   4. the LU stops the instance
>   5. the watcher queries the instance status and sees it is down
>   6. the watcher sends an opcode to clean up the instance (because the
>      lock is free and the instance is down)
>
> This is very unlikely because two queries (lock + status) are much
> faster than acquiring the lock and shutting down an instance.

Ok, and given that it cannot really be solved until we complete the
migration, and that you already filed a bug for it, we can deal with
it for now.
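For the archives, the interleaving Jose describes is a classic check-then-act race: the watcher's two queries are not atomic, so an LU can slip in between them. A minimal simulation (illustrative only, the names and the `Cluster` class are not actual Ganeti code) makes the window concrete:

```python
# Minimal simulation of the check-then-act race described above.  The
# watcher queries the lock first and the instance status second; an LU
# running between the two queries makes the watcher act on stale data.
# All names here are illustrative, not the real watcher code.

class Cluster:
    def __init__(self):
        self.lock_held = False
        self.instance_up = True

def watcher_decides_cleanup(cluster, lu_runs_between_queries):
    # steps 1-2: the watcher queries the lock and sees it is free
    saw_lock_free = not cluster.lock_held
    # steps 3-4: an LU acquires the lock and stops the instance
    if lu_runs_between_queries:
        cluster.lock_held = True
        cluster.instance_up = False
    # step 5: the watcher queries the instance status and sees it is down
    saw_instance_down = not cluster.instance_up
    # step 6: cleanup iff the lock looked free and the instance looked down
    return saw_lock_free and saw_instance_down

# Without the interleaving there is no spurious cleanup:
print(watcher_decides_cleanup(Cluster(), False))  # False
# With the LU running between the two queries, the watcher acts wrongly:
print(watcher_decides_cleanup(Cluster(), True))   # True
```

As Jose notes, the window is narrow in practice because two read-only queries complete much faster than a lock acquisition plus an instance shutdown.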

>
>> >
>> > Signed-off-by: Jose A. Lopes <[email protected]>
>> > ---
>> >  lib/watcher/__init__.py | 33 +++++++++++++++++++++++++++------
>> >  1 file changed, 27 insertions(+), 6 deletions(-)
>> >
>> > diff --git a/lib/watcher/__init__.py b/lib/watcher/__init__.py
>> > index a56ee4e..5abd282 100644
>> > --- a/lib/watcher/__init__.py
>> > +++ b/lib/watcher/__init__.py
>> > @@ -173,9 +173,20 @@ class Node:
>> >      self.secondaries = secondaries
>> >
>> >
>> > -def _CleanupInstance(cl, notepad, inst):
>> > +def _CleanupInstance(cl, notepad, inst, locks):
>> >    n = notepad.NumberOfCleanupAttempts(inst.name)
>> >
>> > +  lock = None
>> > +  for (name, lock_status) in locks:
>> > +    if name == inst.name:
>> > +      lock = lock_status
>> > +      break
>>
>> I'm not entirely convinced by this. The number of instances can be
>> big, and each of them might hold a few locks, so the list of
>> (name, lock_status) tuples can be quite long, and iterating over it
>> once per instance seems suboptimal.
>>
>> Wouldn't it be better to have the locks structure as a dictionary
>> where the key is the name of the instance?
>>
>> Also, if I'm not mistaken, we don't really care about which locks are
>> taken, but only about whether the instance is locked at all, so the
>> required data structure could be even simpler and smaller: {name:
>> bool_is_locked, ...}
>>
>> What do you think?
>
> Sure! We can use a set instead.
>
> Thanks,
> Jose
>
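The set-based variant being agreed on here could look roughly like the sketch below. It folds the prefix stripping from `_GetGroupData` and the `lock is not None` check from `_CleanupInstance` into one step; the row layout (`[[status, name], [status, mode]]`, with `mode` being `None` when the lock is free) is inferred from the quoted patch, and `locked_instances` is an illustrative name, not the actual code:

```python
# Sketch of the set-based lock lookup discussed above: build a set of
# instance names whose lock is currently held, so the per-instance check
# becomes an O(1) membership test instead of a scan over a list of tuples.

PREFIX = "instance/"

def locked_instances(lock_rows):
    """Return the set of instance names whose lock is currently held.

    Each row is assumed to look like [[status, name], [status, mode]],
    with mode being None when the lock is free.
    """
    return set(
        name[len(PREFIX):]
        for [[_, name], [_, mode]] in lock_rows
        if name.startswith(PREFIX) and mode is not None
    )

# Illustrative query result:
rows = [
    [[0, "instance/inst1"], [0, "exclusive"]],  # lock held
    [[0, "instance/inst2"], [0, None]],         # lock free
    [[0, "node/node1"], [0, "shared"]],         # not an instance lock
]
locked = locked_instances(rows)
print("inst1" in locked)  # True
print("inst2" in locked)  # False
```

With this, `_CleanupInstance` would only need `if inst.name in locked: return` instead of looping over the tuple list.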
>> > +
>> > +  if lock is not None:
>> > +    logging.info("Not cleaning up instance '%s', instance is locked",
>> > +                 inst.name)
>> > +    return
>> > +
>> >    if n > MAXTRIES:
>> >      logging.warning("Not cleaning up instance '%s', retries exhausted",
>> >                      inst.name)
>> > @@ -194,7 +205,7 @@ def _CleanupInstance(cl, notepad, inst):
>> >      notepad.RecordCleanupAttempt(inst.name)
>> >
>> >
>> > -def _CheckInstances(cl, notepad, instances):
>> > +def _CheckInstances(cl, notepad, instances, locks):
>> >    """Make a pass over the list of instances, restarting downed ones.
>> >
>> >    """
>> > @@ -204,7 +215,7 @@ def _CheckInstances(cl, notepad, instances):
>> >
>> >    for inst in instances.values():
>> >      if inst.status == constants.INSTST_USERDOWN:
>> > -      _CleanupInstance(cl, notepad, inst)
>> > +      _CleanupInstance(cl, notepad, inst, locks)
>> >      elif inst.status in BAD_STATES:
>> >        n = notepad.NumberOfRestartAttempts(inst.name)
>> >
>> > @@ -648,6 +659,15 @@ def _GetGroupData(qcl, uuid):
>> >    """Retrieves instances and nodes per node group.
>> >
>> >    """
>> > +  locks = qcl.Query(constants.QR_LOCK, ["name", "mode"], None)
>> > +
>> > +  prefix = "instance/"
>> > +  prefix_len = len(prefix)
>> > +
>> > +  locks = [(name[prefix_len:], lock)
>> > +           for [[_, name], [_, lock]] in locks.data
>> > +           if name.startswith(prefix)]
>> > +
>> >    queries = [
>> >        (constants.QR_INSTANCE,
>> >         ["name", "status", "disks_active", "snodes",
>> > @@ -693,7 +713,8 @@ def _GetGroupData(qcl, uuid):
>> >             for (name, bootid, offline) in raw_nodes]
>> >
>> >    return (dict((node.name, node) for node in nodes),
>> > -          dict((inst.name, inst) for inst in instances))
>> > +          dict((inst.name, inst) for inst in instances),
>> > +          locks)
>> >
>> >
>> >  def _LoadKnownGroups():
>> > @@ -751,7 +772,7 @@ def _GroupWatcher(opts):
>> >
>> >      _CheckMaster(client)
>> >
>> > -    (nodes, instances) = _GetGroupData(query_client, group_uuid)
>> > +    (nodes, instances, locks) = _GetGroupData(query_client, group_uuid)
>> >
>> >      # Update per-group instance status file
>> >      _UpdateInstanceStatus(inst_status_path, instances.values())
>> > @@ -760,7 +781,7 @@ def _GroupWatcher(opts):
>> >                           pathutils.WATCHER_GROUP_INSTANCE_STATUS_FILE,
>> >                           known_groups)
>> >
>> > -    started = _CheckInstances(client, notepad, instances)
>> > +    started = _CheckInstances(client, notepad, instances, locks)
>> >      _CheckDisks(client, notepad, nodes, instances, started)
>> >      _VerifyDisks(client, group_uuid, nodes, instances)
>> >    except Exception, err:
>> > --
>> > 1.9.0.279.gdc9e3eb
>> >
>>
>> Thanks,
>> Michele
>>
>> --
>> Google Germany GmbH
>> Dienerstr. 12
>> 80331 München
>>
>> Registergericht und -nummer: Hamburg, HRB 86891
>> Sitz der Gesellschaft: Hamburg
>> Geschäftsführer: Graham Law, Christine Elizabeth Flores
>
> --
> Jose Antonio Lopes
> Ganeti Engineering
> Google Germany GmbH
> Dienerstr. 12, 80331, München
>
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg
> Geschäftsführer: Graham Law, Christine Elizabeth Flores
> Steuernummer: 48/725/00206
> Umsatzsteueridentifikationsnummer: DE813741370


Thanks,
Michele

