On Thu, 7 Nov 2019 at 20:05, Scott Moser <ssmoser2+ubu...@gmail.com> wrote:
>
> > > So that means we have this sequence of events:
> > >  a.) growpart change partition table
> > >  b.) growpart call partx
> > >  c.) udev created and events being processed
>
> > That is not true. whilst sfdisk is deleting, creating, finishing
> > partition table (a) and partx is called (b), udev events are already fired
> > and running in parallel and may complete against deleted, partially new,
> > completely new partition table, with or without partx completed.
>
> You're correct... I left out some 'events created and handled' after 'a'.
> But that doesn't change anything.  The problem we're seeing here is *not*
> that 'b' had any issue.
>
> >
> > No amount of settling for events will fix the fact that events were run
> > against racy state of the partition table _during_ sfdisk and partx calls.
>
> complete non-sense.  I dont care about any racy state *during* anything. I
> call 'udevadm settle'.  That means "block until stuff is done."  I think
> you're saying that I cannot:
>  1.) do something that causes udev events
>  2.) wait until all udev events caused by that something are finished
>
> if that is the case, then nothing ever can fix this, and we might as well
> go find jobs on a farm.
>

Both of those things happen, but udev starts processing events while
the partition table changes have not yet completed. This is
documented in the sfdisk manpage as a known bug that nobody has yet
managed to figure out and derace.
This means that even if the udev events happened, and one waits for
their processing to finish, there is no guarantee that they were
processed against a consistent disk state.

This is why sfdisk recommends taking flock. And this is why udev also
tries to take an flock.
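
As a rough illustration of the locking idea (a minimal sketch, not what
growpart or cloud-init actually does), a caller could hold an exclusive
flock on the whole-disk device node while it rewrites the partition
table. The helper name exclusive_device_lock is hypothetical, and the
sketch assumes the partitioning step does not itself try to take the
same lock from another descriptor, which would block.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def exclusive_device_lock(path):
    """Hold an exclusive BSD flock on `path` for the duration of the block.

    `path` is assumed to be the whole-disk node (e.g. /dev/sda), since
    that is the node the message above says sfdisk and udev lock.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # Blocks until no other open file description holds a
        # conflicting flock on this file.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield fd
    finally:
        # Closing the only descriptor for this open file description
        # also releases the lock.
        os.close(fd)


# Hypothetical usage: the partition table change happens while the
# lock is held, and any wait for udev happens after it is released.
#
# with exclusive_device_lock("/dev/sda"):
#     ...  # change the partition table here
```

Whether this closes the race in this bug is not established here; it
only shows the mechanism being recommended.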

In the past, IBM demonstrated a race similar to this one in
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1571707
(April 2016, on top of Xenial). They tried to partition 256 devices
rapidly and in parallel; only 89 of them showed partitions after the
limit test was executed, and the partitions appeared fully only after
a reboot.

-- 
Regards,

Dimitri.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/1834875

Title:
  cloud-init growpart race with udev

Status in cloud-init:
  Incomplete
Status in cloud-utils:
  New
Status in linux-azure package in Ubuntu:
  New
Status in systemd package in Ubuntu:
  Incomplete

Bug description:
  On Azure, it happens regularly (20-30%), that cloud-init's growpart
  module fails to extend the partition to full size.

  Such as in this example:

  ========================================

  2019-06-28 12:24:18,666 - util.py[DEBUG]: Running command ['growpart', '--dry-run', '/dev/sda', '1'] with allowed return codes [0] (shell=False, capture=True)
  2019-06-28 12:24:19,157 - util.py[DEBUG]: Running command ['growpart', '/dev/sda', '1'] with allowed return codes [0] (shell=False, capture=True)
  2019-06-28 12:24:19,726 - util.py[DEBUG]: resize_devices took 1.075 seconds
  2019-06-28 12:24:19,726 - handlers.py[DEBUG]: finish: init-network/config-growpart: FAIL: running config-growpart with frequency always
  2019-06-28 12:24:19,727 - util.py[WARNING]: Running module growpart (<module 'cloudinit.config.cc_growpart' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py'>) failed
  2019-06-28 12:24:19,727 - util.py[DEBUG]: Running module growpart (<module 'cloudinit.config.cc_growpart' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py'>) failed
  Traceback (most recent call last):
    File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 812, in _run_modules
      freq=freq)
    File "/usr/lib/python3/dist-packages/cloudinit/cloud.py", line 54, in run
      return self._runners.run(name, functor, args, freq, clear_on_fail)
    File "/usr/lib/python3/dist-packages/cloudinit/helpers.py", line 187, in run
      results = functor(*args)
    File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 351, in handle
      func=resize_devices, args=(resizer, devices))
    File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2521, in log_time
      ret = func(*args, **kwargs)
    File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 298, in resize_devices
      (old, new) = resizer.resize(disk, ptnum, blockdev)
    File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 159, in resize
      return (before, get_size(partdev))
    File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 198, in get_size
      fd = os.open(filename, os.O_RDONLY)
  FileNotFoundError: [Errno 2] No such file or directory: '/dev/disk/by-partuuid/a5f2b49f-abd6-427f-bbc4-ba5559235cf3'

  ========================================

  @rcj suggested this is a race with udev. This seems to only happen on
  Cosmic and later.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1834875/+subscriptions

