Re: [Openstack-operators] [nova][cinder] Disabling nova volume-update (aka swap volume; aka cinder live migration)

2018-08-21 Thread Lee Yarwood
On 20-08-18 16:29:52, Matthew Booth wrote:
> For those who aren't familiar with it, nova's volume-update (also
> called swap volume by nova devs) is the nova part of the
> implementation of cinder's live migration (also called retype).
> Volume-update is essentially an internal cinder<->nova api, but as
> that's not a thing it's also unfortunately exposed to users. Some
> users have found it and are using it, but because it's essentially an
> internal cinder<->nova api it breaks pretty easily if you don't treat
> it like a special snowflake. It looks like we've finally found a way
> it's broken for non-cinder callers that we can't fix, even with a
> dirty hack.
> 
> volume-update essentially does a live copy of the data from the old
> volume to the new volume, then seamlessly swaps the attachment on the
> instance from the old volume to the new one. The guest OS on the
> instance will not notice anything at all as the hypervisor swaps the
> storage backing an attached volume underneath it.
> 
> When called by cinder, as intended, cinder does some post-operation
> cleanup such that the old volume is deleted and the new volume
> inherits the old volume's volume_id; that is, the new volume
> effectively becomes the old volume. When called any
> other way, however, this cleanup doesn't happen, which breaks a bunch
> of assumptions. One of these is that a disk's serial number is the
> same as the attached volume_id. Disk serial number, in KVM at least,
> is immutable, so can't be updated during volume-update. This is fine
> if we were called via cinder, because the cinder cleanup means the
> volume_id stays the same. If called any other way, however, they no
> longer match, at least until a hard reboot when it will be reset to
> the new volume_id. It turns out this breaks live migration, but
> probably other things too. We can't think of a workaround.
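
[The failure mode above can be sketched with a toy model. This is plain
Python, all names hypothetical, and it only illustrates the invariant
being discussed, it is not nova code.]

```python
# Toy model of the swap-volume serial mismatch (hypothetical names; an
# illustration of the invariant, not nova code).

class Attachment:
    def __init__(self, volume_id):
        self.volume_id = volume_id
        # KVM exposes the volume_id as the disk serial at attach time,
        # and the serial is immutable for the life of the domain.
        self.disk_serial = volume_id

    def swap_volume(self, new_volume_id):
        # nova's volume-update repoints the attachment ...
        self.volume_id = new_volume_id
        # ... but cannot touch the guest-visible disk serial.

    def cinder_cleanup(self, old_volume_id):
        # When cinder drives the swap, it deletes the old volume and the
        # new volume inherits the old volume_id, restoring the invariant.
        self.volume_id = old_volume_id


def serial_matches(att):
    # The assumption that, when violated, breaks live migration.
    return att.disk_serial == att.volume_id


att = Attachment('vol-old')
att.swap_volume('vol-new')
assert not serial_matches(att)   # direct user call: invariant broken

att.cinder_cleanup('vol-old')
assert serial_matches(att)       # cinder-driven retype: invariant holds
```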
> 
> I wondered why users would want to do this anyway. It turns out that
> sometimes cinder won't let you migrate a volume, but nova
> volume-update doesn't do those checks (as they're specific to cinder
> internals, none of nova's business, and duplicating them would be
> fragile, so we're not adding them!). Specifically we know that cinder
> won't let you migrate a volume with snapshots. There may be other
> reasons. If cinder won't let you migrate your volume, you can still
> move your data by using nova's volume-update, even though you'll end
> up with a new volume on the destination, and a slightly broken
> instance. Apparently the former is a trade-off worth making, but the
> latter has been reported as a bug.
> 
> I'd like to make it very clear that nova's volume-update isn't
> expected to work correctly except when called by cinder. Specifically
> there was a proposal that we disable volume-update from non-cinder
> callers in some way, possibly by asserting volume state that can only
> be set by cinder. However, I'm also very aware that users are calling
> volume-update because it fills a need, and we don't want to trap data
> that wasn't previously trapped.
> 
> Firstly, is anybody aware of any other reasons to use nova's
> volume-update directly?
> 
> Secondly, is there any reason why we shouldn't just document that you
> have to delete snapshots before doing a volume migration? Hopefully
> some cinder folks or operators can chime in to let me know how to back
> them up or somehow make them independent before doing this, at which
> point the volume itself should be migratable?
> 
> If we can establish that there's an acceptable alternative to calling
> volume-update directly for all use-cases we're aware of, I'm going to
> propose heading off this class of bug by disabling it for non-cinder
> callers.

I'm definitely in favor of hiding this from users eventually but
wouldn't this require some form of deprecation cycle?

Warnings within the API documentation would also be useful and even
something we could backport to stable to highlight just how fragile this
API is ahead of any policy change.
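
For reference, one stopgap an operator could deploy today is policy:
restricting the update-volume-attachment API to admin. A sketch for
nova's policy.json follows; whether this exact policy target gates swap
volume in your release is an assumption you'd need to verify.

```json
{
    "os_compute_api:os-volumes-attachments:update": "rule:admin_api"
}
```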

Cheers,

-- 
Lee Yarwood A5D1 9385 88CB 7E5F BE64  6618 BCA6 6E33 F672 2D76


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [openstack-dev] Reminder to add "nova-status upgrade check" to deployment tooling

2018-06-15 Thread Lee Yarwood
On 13-06-18 10:14:32, Matt Riedemann wrote:
> I was going through some recently reported nova bugs and came across [1]
> which I opened at the Summit during one of the FFU sessions where I realized
> the nova upgrade docs don't mention the nova-status upgrade check CLI [2]
> (added in Ocata).
> 
> As a result, I was wondering how many deployment tools out there support
> upgrades and from those, which are actually integrating that upgrade status
> check command.

TripleO doesn't at present, but as with OSA it looks trivial to add:

https://github.com/openstack/tripleo-heat-templates/blob/master/docker/services/nova-api.yaml

I've created the following bug to track this:

https://bugs.launchpad.net/tripleo/+bug/1777060
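
For deployment tooling, the useful contract is the command's exit code.
A minimal sketch of how an upgrade task might gate on it; the code/
meaning mapping below is taken from the nova-status docs and should be
re-checked against your release:

```python
# Interpret the exit code of `nova-status upgrade check` so a deployment
# tool can decide whether to proceed with an upgrade.

RESULTS = {
    0: ('success', True),    # all checks passed
    1: ('warning', True),    # passed with warnings; safe to proceed
    2: ('failure', False),   # at least one check failed; stop here
    255: ('error', False),   # unexpected error running the checks
}


def should_proceed(returncode):
    """Map a nova-status return code to a (label, proceed?) pair."""
    return RESULTS.get(returncode, ('unknown', False))


# e.g. rc = subprocess.run(['nova-status', 'upgrade', 'check']).returncode
print(should_proceed(0))   # ('success', True)
print(should_proceed(2))   # ('failure', False)
```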

> I'm not really familiar with most of them, but I've dabbled in OSA enough to
> know where the code lived for nova upgrades, so I posted a patch [3].
> 
> I'm hoping this can serve as a template for other deployment projects to
> integrate similar checks into their upgrade (and install verification)
> flows.
> 
> [1] https://bugs.launchpad.net/nova/+bug/1772973
> [2] https://docs.openstack.org/nova/latest/cli/nova-status.html
> [3] https://review.openstack.org/#/c/575125/

Cheers,

-- 
Lee Yarwood A5D1 9385 88CB 7E5F BE64  6618 BCA6 6E33 F672 2D76



[Openstack-operators] [ffu][upgrades] Dublin PTG room and agenda

2018-02-19 Thread Lee Yarwood
Hello all,

A very late mail to highlight that there will once again be a 1 day
track/room dedicated to talking about Fast-forward upgrades at the
upcoming PTG in Dublin. The etherpad for which is listed below:

https://etherpad.openstack.org/p/ffu-ptg-rocky

Please feel free to add items to the pad; I'd really like to see some
concrete action items finally come from these discussions ahead of R.

Thanks in advance and see you in Dublin!

-- 
Lee Yarwood A5D1 9385 88CB 7E5F BE64  6618 BCA6 6E33 F672 2D76




Re: [Openstack-operators] [openstack-dev] [skip-level-upgrades][fast-forward-upgrades] PTG summary

2017-09-29 Thread Lee Yarwood
On 29-09-17 11:40:21, Saverio Proto wrote:
> Hello,
> 
> sorry I could not make it to the PTG.
> 
> I have an idea that I want to share with the community. I hope this is a
> good place to start the discussion.
> 
> After years of OpenStack operations, upgrading through releases from
> Icehouse to Newton, my feeling is that the control plane upgrade is
> doable.
> 
> But it is also a lot of pain to upgrade all the compute nodes. This
> really causes downtime to the VMs that are running.
> I can't always use live migration; sometimes the VMs are just too big
> or too busy.
> 
> It would be nice to guarantee the ability to run an updated control
> plane with compute nodes up to N-3 releases behind.
> 
> This way even if we have to upgrade the control plane every 6 months, we
> can keep a longer lifetime for compute nodes. Basically we would never
> have to upgrade them until we decommission the hardware.
> 
> If there are new features that require updated compute nodes, we can
> always organize our datacenter into availability zones and avoid
> scheduling new VMs to the older compute nodes.
> 
> To my understanding this means having compatibility at least for the
> nova-compute agent and the neutron-agents running on the compute node.
> 
> Is it a very bad idea?
> 
> Do other people feel, as I do, that upgrading all the compute nodes is
> also a big part of the upgrade burden?

Yeah, I don't think the Nova community would ever be able or willing to
verify and maintain that level of backward compatibility. Ultimately
there's nothing stopping you from upgrading Nova on the computes while
also keeping instances running.
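
What nova does support is one release of skew during a rolling upgrade,
by pinning compute RPC to the older version until every node is
upgraded. In nova.conf that is the [upgrade_levels] section; the sketch
below uses 'auto', which derives the pin from the service versions
recorded in the database:

```ini
[upgrade_levels]
# Pin compute RPC so upgraded controllers keep speaking the older
# version while computes are still on release N-1. 'auto' derives the
# pin from the lowest service version found in the database.
compute = auto
```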

You only run into issues when kernel, OVS, QEMU (for n-cpu with
libvirt), etc. upgrades require reboots or require instances to be
restarted (either hard or via live migration). If you're unable or just
unwilling to take downtime for instances that can't be moved when these
components require an update then you have bigger problems IMHO.

Regards,

Lee
-- 
Lee Yarwood A5D1 9385 88CB 7E5F BE64  6618 BCA6 6E33 F672 2D76




Re: [Openstack-operators] [openstack-dev] [skip-level-upgrades][fast-forward-upgrades] PTG summary

2017-09-28 Thread Lee Yarwood
On 20-09-17 14:56:20, arkady.kanev...@dell.com wrote:
> Lee,
> I can chair meeting in Sydney.
> Thanks,
> Arkady

Thanks Arkady!

FYI I see that emccormickva has created the following Forum session to
discuss FF upgrades:

http://forumtopics.openstack.org/cfp/details/19

You might want to reach out to him to help craft the agenda for the
session based on our discussions in Denver.

Thanks again,

Lee
-- 
Lee Yarwood A5D1 9385 88CB 7E5F BE64  6618 BCA6 6E33 F672 2D76



[Openstack-operators] [skip-level-upgrades][upgrades] Denver PTG room & etherpad

2017-08-21 Thread Lee Yarwood
Hello all,

This is a brief announcement to highlight that there will be a skip
level upgrades room again at the PTG in Denver. I'll be chairing the
room and have seeded the etherpad below with a few goal and topic ideas.
I'd really welcome additional input from others, especially if you were
present at the previous discussions in Boston!

https://etherpad.openstack.org/p/queens-PTG-skip-level-upgrades

Thanks in advance and see you in Denver!

Lee
-- 
Lee Yarwood A5D1 9385 88CB 7E5F BE64  6618 BCA6 6E33 F672 2D76



Re: [Openstack-operators] [nova] Do we have users of CryptsetupEncryptor and if so why?

2016-11-08 Thread Lee Yarwood
On 07-11-16 17:42:02, Lee Yarwood wrote:
> Hello all,
> 
> The following bug was recently discovered where encrypted volumes
> created prior to Newton use a slightly mangled passphrase:
> 
> The passphrase used to encrypt or decrypt volumes was mangled prior to Newton
> https://launchpad.net/bugs/1633518
> 
> This is currently being resolved for LUKS-based volumes in the
> following change, with the incorrect passphrase being removed and
> replaced:
> 
> encryptors: Workaround mangled passphrases
> https://review.openstack.org/#/c/386670/
> 
> Unfortunately we can't do the same for volumes using the plain format
> provided by the CryptsetupEncryptor class. While the above change does
> include a workaround it would be better if we could deprecate this
> format and encryptor for new volumes ASAP and move everyone to LUKS etc.
> 
> Before deprecating CryptsetupEncryptor I wanted to ask this list if we
> have any active users of this encryptor and if so why is it being used?
> Is there a specific use case where plain is better than LUKS and thus
> needs to stay around?
> 
> Thanks in advance,
> 
> Lee

CC'ing openstack-dev for some additional feedback.

-- 
Lee Yarwood
Senior Software Engineer
Red Hat

PGP : A5D1 9385 88CB 7E5F BE64  6618 BCA6 6E33 F672 2D76



[Openstack-operators] [nova] Do we have users of CryptsetupEncryptor and if so why?

2016-11-07 Thread Lee Yarwood
Hello all,

The following bug was recently discovered where encrypted volumes
created prior to Newton use a slightly mangled passphrase:

The passphrase used to encrypt or decrypt volumes was mangled prior to Newton
https://launchpad.net/bugs/1633518

This is currently being resolved for LUKS-based volumes in the following
change, with the incorrect passphrase being removed and replaced:

encryptors: Workaround mangled passphrases
https://review.openstack.org/#/c/386670/
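
The shape of the mangling can be illustrated with a toy conversion
(hypothetical, for illustration only; see the linked bug and review for
the actual code): converting key bytes to hex without zero-padding
silently drops the leading zero of some bytes, yielding a different
passphrase than the padded conversion.

```python
import binascii


def padded_hex(key: bytes) -> str:
    # Correct conversion: every byte yields exactly two hex characters.
    return binascii.hexlify(key).decode()


def unpadded_hex(key: bytes) -> str:
    # Hypothetical illustration of the mangling: '%x' drops the leading
    # zero of any byte below 0x10, so the result can be shorter and
    # will not decrypt a volume keyed with the padded form.
    return ''.join('%x' % b for b in key)


key = bytes([0x0f, 0xab])
print(padded_hex(key))    # 0fab
print(unpadded_hex(key))  # fab
```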

Unfortunately we can't do the same for volumes using the plain format
provided by the CryptsetupEncryptor class. While the above change does
include a workaround it would be better if we could deprecate this
format and encryptor for new volumes ASAP and move everyone to LUKS etc.

Before deprecating CryptsetupEncryptor I wanted to ask this list if we
have any active users of this encryptor and if so why is it being used?
Is there a specific use case where plain is better than LUKS and thus
needs to stay around?

Thanks in advance,

Lee
-- 
Lee Yarwood
Senior Software Engineer
Red Hat

PGP : A5D1 9385 88CB 7E5F BE64  6618 BCA6 6E33 F672 2D76
