[openstack-dev] [Nova] Virtual device role tagging

2015-07-14 Thread Artom Lifshitz
Hello,

I'd like to get the conversation started around a spec that my colleague Dan
Berrange has proposed to the backlog.

The spec [1] solves the problem of passing information about virtual devices
into an instance.

For example, in an instance with multiple network interfaces, each connected to
profoundly different networks, software running inside the instance needs to
know each NIC's role. Similarly, in an instance with multiple disks, each
intended for a different usage, software inside the instance needs to know each
disk's role.

I feel like a lot of discussion will happen around this spec before it can
be merged - hopefully in the M cycle - so I'm requesting comments and
suggestions very early ;)

Thanks all!

[1] https://review.openstack.org/#/c/195662/1

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Nova] Add config option for real deletes instead of soft-deletes

2015-04-21 Thread Artom Lifshitz
Hello,

I'd like to gauge acceptance of introducing a feature that would give operators
a config option to perform real database deletes instead of soft deletes.

There's definitely a need for *something* that cleans up the database. There
have been a few attempts at a DB purge engine [1][2][3][4][5], and archiving to
shadow tables has been merged [6] (though that currently has some issues [7]).

DB archiving notwithstanding, the general response to operators when they
mention the database becoming too big seems to be "DIY cleanup."

I would like to propose a different approach: add a config option that turns
soft-deletes into real deletes, and start telling operators "if you turn this
on, it's DIY backups."

Would something like that be acceptable and feasible? I'm ready to put in the
work to implement this, however searching the mailing list indicates that it
would be somewhere between non trivial and impossible [8]. Before I start, I
would like some confidence that it's closer to the former than the latter :)

Cheers!

[1] https://blueprints.launchpad.net/nova/+spec/db-purge-engine
[2] https://blueprints.launchpad.net/nova/+spec/db-purge2
[3] https://blueprints.launchpad.net/nova/+spec/remove-db-archiving
[4] https://blueprints.launchpad.net/nova/+spec/database-purge
[5] https://blueprints.launchpad.net/nova/+spec/db-archiving
[6] https://review.openstack.org/#/c/18493/
[7] https://review.openstack.org/#/c/109201/
[8] 
http://lists.openstack.org/pipermail/openstack-operators/2014-November/005591.html

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Nova] Virt device role tagging for other virt drivers

2016-06-15 Thread Artom Lifshitz
Hey folks,

For a while now we've been working on virt device role tagging.

The full spec is here [1], but the quick gist of it is that device
tagging is a way for the user to assign arbitrary string tags to
either vNICs or block devices. Those tags then get exposed by the
metadata API to the guest, along with other device metadata such as
bus and address, for example PCI 0000:00:02.0.
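
To give a concrete idea of what that looks like, a tagged NIC entry in the
meta_data.json served by the metadata API ends up looking roughly like this
(values invented for illustration):

    {
        "devices": [
            {
                "type": "nic",
                "bus": "pci",
                "address": "0000:00:02.0",
                "mac": "fa:16:3e:00:00:01",
                "tags": ["nfv-net"]
            }
        ]
    }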

This work is being done for the libvirt driver, and we would obviously
love it if other drivers implemented the functionality. This email is
meant to get this cooperation started.

A good starting point for developers of other drivers is our own
libvirt implementation [2]. The basic idea is that we use new objects
from [3] to build the metadata hierarchy. The hierarchy is then saved
in the database in the instance_extra table, of which you can see the
details here [4]. This is pretty much the only functionality that
other virt drivers would need to implement. Everything else (API,
metadata API) is being handled by us, though of course we welcome your
feedback.
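
As a very rough sketch of what that driver-side code looks like, using the
object names from [3] (exact fields and constructors may differ - see [2]
for the real thing):

    # Rough sketch only; see the libvirt implementation in [2] for details.
    from nova import objects

    def _build_device_metadata(instance):
        nic = objects.NetworkInterfaceMetadata(
            mac='fa:16:3e:00:00:01',                        # from the vNIC
            bus=objects.PCIDeviceBus(address='0000:00:02.0'),
            tags=['nfv-net'])
        disk = objects.DiskMetadata(
            bus=objects.SCSIDeviceBus(address='0:0:0:1'),
            serial='volume-serial',
            tags=['database'])
        return objects.InstanceDeviceMetadata(devices=[nic, disk])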

I hope I've been concise yet complete. If you have any questions don't
hesitate to ask either vladikr or artom on IRC.

Cheers!

[1] 
http://specs.openstack.org/openstack/nova-specs/specs/newton/approved/virt-device-role-tagging.html
[2] https://review.openstack.org/#/c/264016/42/nova/virt/libvirt/driver.py
[3] 
https://github.com/openstack/nova/blob/master/nova/objects/virt_device_metadata.py
[4] https://review.openstack.org/#/c/327920/

--
Artom Lifshitz

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Non-priority feature freeze and FFEs

2016-07-05 Thread Artom Lifshitz
> The Hyper-V implementation of the bp virt-device-role-tagging is mergeable 
> [1]. The patch is quite simple, it got some reviews, and the tempest test 
> test_device_tagging [2] passed. [3]
>
> [1] https://review.openstack.org/#/c/331889/
> [2] https://review.openstack.org/#/c/305120/
> [3] http://64.119.130.115/debug/nova/331889/8/04-07-2016_19-43/results.html.gz

For what it's worth, the implementation for libvirt and all the
plumbing in the API, metadata API, compute manager, etc, has merged,
so this can be thought of as a continuation of that same patch series.

There's the XenAPI implementation [4] as well, but that's not
mergeable in its current state.

[4] https://review.openstack.org/#/c/333781/

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Device tag in the API breaks in the old microversion

2017-01-24 Thread Artom Lifshitz
> So the current API behavior is as below:
>
> 2.32: BDM tag and network device tag added.
> 2.33 - 2.36: 'tag' in the BDM disappeared. The network device tag still
> works.
> 2.37: The network device tag disappeared also.

Thanks for the summary. For the visually minded like me, I made some
ASCII art of the above:

http://paste.openstack.org/raw/596225/

> There are few questions we should think about:
>
> 1. Should we fix that by Microversion?
> Thanks to Chris Dent point that out in the review. I also think we need
> to bump Microversion, which follow the rule of Microversion.

I don't think we have a choice - we'd be adding new API parameters
that didn't exist in, for example, 2.39.

> 2. If we need Microversion, is that something we can do before release?
> We are very close to the feature freeze. And in normal, we need spec for
> microversion. Maybe we only can do that in Pike. For now we can update the
> API-ref, and microversion history to notice that, maybe a reno also.

I think it's too late before FF to do any functional fixes. I vote we
document our screw up in the api-ref at least, and during Pike we can
merge a new microversion that fixes this mess.

> 2. How can we prevent that happened again?
>Both of those patches were reviewed multiple cycles. But we still miss
> that. It is worth to think about how to prevent that happened again.
>
>Talk with Sean. He suggests stop passing plain string version to the
> schema extension point. We should always pass APIVersionRequest object
> instead of plain string. Due to "version == APIVersionRequest('2.32')" is
> always wrong, we should remove the '__eq__'. The developer should always use
> the 'APIVersionRequest.matches' [3] method.

This looks like a smart way to make sure all API version comparisons
are of the less than/greater than kind.
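
As a rough illustration of the pattern being suggested (using the matches()
method from nova/api/openstack/api_version_request.py):

    # Sketch: range/threshold checks instead of exact equality.
    from nova.api.openstack import api_version_request as avr

    def _at_least(req_version, min_ver):
        # "req_version == avr.APIVersionRequest('2.32')" silently excludes
        # 2.33 and later; matches() with a null max version means
        # "min_ver or later".
        return req_version.matches(avr.APIVersionRequest(min_ver),
                                   avr.APIVersionRequest())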

>That can prevent the first mistake we made. But nothing help for second
> mistake. Currently we only run the test on the specific Microversion for the
> specific interesting point. In the before, the new tests always inherits
> from the previous microversion tests, just like [4]. That can test the old
> API behavior won't be changed in the new microversion. But now, we said that
> is waste, we didn't do that again just like [5]. Should we change that back?

An idea would be to run all functional tests against 2.latest. This
doesn't cover all microversions, but as time progresses and 2.latest
increases, every previous microversion will have been covered at some
point, which gives us some confidence that we didn't break anything.
This doesn't work for patches that removed an API parameter, for
example, so those kinds of changes will have to be an exception.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Device tag in the API breaks in the old microversion

2017-01-26 Thread Artom Lifshitz
Since the consensus is to fix this with a new microversion, I've
submitted some patches:

* https://review.openstack.org/#/c/426030/
  A spec for the new microversion in case folks want one.

* https://review.openstack.org/#/c/424759/
  The new microversion itself. I've already had feedback from Alex and
Ghanshyam (thanks guys!), and I've tried to address it.

* https://review.openstack.org/#/c/425876/
  A patch to - as Alex and Sean suggested - stop passing plain string
version to the schema extension point.

On Tue, Jan 24, 2017 at 10:38 PM, Matt Riedemann  wrote:
> On 1/24/2017 8:16 PM, Alex Xu wrote:
>>
>>
>>
>> One other thing: we're going to need to also fix this in
>> python-novaclient, which we might want to do first, or work
>> concurrently, since that's going to give us the client side
>> perspective on how gross it will be to deal with this issue.
>>
>>
>
> This is Andrey's patch to at least document the limitation:
>
> https://review.openstack.org/#/c/424745/
>
> We'll have to fix the client to use the new microversion in Pike (or at
> least release the fix in Pike) since the client release freeze is Thursday.
>
>
> --
>
> Thanks,
>
> Matt Riedemann
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



-- 
--
Artom Lifshitz

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Device tag in the API breaks in the old microversion

2017-01-31 Thread Artom Lifshitz
The more urgent stuff has indeed merged - many thanks to Matt and
other cores for getting this in quickly before rc1. The fixes to tests
do indeed need more attention, which I will provide :)

On Mon, Jan 30, 2017 at 8:49 PM, Matt Riedemann  wrote:
> On 1/26/2017 8:32 PM, Artom Lifshitz wrote:
>>
>> Since the consensus is to fix this with a new microversion, I've
>> submitted some patches:
>>
>> * https://review.openstack.org/#/c/426030/
>>   A spec for the new microversion in case folks want one.
>
>
> Merged.
>
>>
>> * https://review.openstack.org/#/c/424759/
>>   The new microversion itself. I've already had feedback from Alex and
>> Ghanshyam (thanks guys!), and I've tried to address it.
>
>
> +2 from me, +1 from gmann. The Tempest patch for the 2.42 microversion is
> here:
>
> https://review.openstack.org/#/c/426991/1
>
>>
>> * https://review.openstack.org/#/c/425876/
>>   A patch to - as Alex and Sean suggested - stop passing plain string
>> version to the schema extension point.
>>
>
> Needs some work.
>
>
> --
>
> Thanks,
>
> Matt Riedemann
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



-- 
--
Artom Lifshitz

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] Device tagging: rebuild config drive upon instance reboot to refresh metadata on it

2017-02-17 Thread Artom Lifshitz
Early on in the inception of device role tagging, it was decided that
it's acceptable that the device metadata on the config drive lags
behind the metadata API, as long as it eventually catches up, for
example when the instance is rebooted and we get a chance to
regenerate the config drive.

So far this hasn't really been a problem because devices could only be
tagged at instance boot time, and the tags never changed. So the
config drive was pretty much always up to date.

In Pike the tagged device attachment series of patches [1] will
hopefully merge, and we'll be in a situation where device tags can
change during instance uptime, which makes it that much more important
to regenerate the config drive whenever we get a chance.

However, when the config drive is first generated, some of the
information stored in there is only available at instance boot time
and is not persisted anywhere, as far as I can tell. Specifically, the
injected_files and admin_pass parameters [2] are passed from the API
and are not stored anywhere.

This creates a problem when we want to regenerate the config drive,
because the information that we're supposed to put in it is no longer
available to us.

We could start persisting this information in instance_extra, for
example, and pulling it up when the config drive is regenerated. We
could even conceivably hack something to read the metadata files from
the "old" config drive before refreshing them with new information.
However, is that really worth it? I feel like saying "the config drive
is static, deal with it - if you want up-to-date metadata, use the
API" is an equally, if not more, valid option.

Thoughts? I know y'all are flying out to the PTG, so I'm unlikely to
get responses, but I've at least put my thoughts into writing, and
will be able to refer to them later on :)

[1] 
https://review.openstack.org/#/q/status:open+topic:bp/virt-device-tagged-attach-detach
[2] 
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L2667-L2672

--
Artom Lifshitz

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Device tagging: rebuild config drive upon instance reboot to refresh metadata on it

2017-02-18 Thread Artom Lifshitz
In reply to Michael:

> We have had this discussion several times in the past for other reasons. The
> reality is that some people will never deploy the metadata API, so I feel
> like we need a better solution than what we have now.

Aha, that's definitely a good reason to continue making the config
drive a first-class citizen.

> However, I would consider it probably unsafe for the hypervisor to read the
> current config drive to get values

Yeah, I was using the word "hack" very generously ;)

> and persisting things like the instance
> root password in the Nova DB sounds like a bad idea too.

I hadn't even thought of the security implication. That's a very good
point, there's no way to persist admin_pass securely. We'll have to
read it at some point, so no amount of encryption will change
anything. We can argue that since we already store admin_pass on the
config drive, storing it in the database as well is OK (it's probably
immediately changed anyways), but there's a difference between having
it in a file on a single compute node, and in the database accessible
by the entire deployment.

In reply to Clint:

> Agreed. What if we simply have a second config drive that is for "things
> that change" and only rebuild that one on reboot?

We've already set the precedent that there's a single config drive
with the device tagging metadata on it, I don't think we can go back
on that promise.


So while we shouldn't read from the config drive to get current values
in order to afterwards monolithically regenerate a new one, we could
try just writing to the files we want changed. I'm thinking of a
system where code that needs to change information on the config drive
would have a way of telling it "here are the new values for
device_metadata", and whenever we next get a chance, for example when
the instance is rebooted, those values are saved on the config drive.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Device tagging: rebuild config drive upon instance reboot to refresh metadata on it

2017-02-18 Thread Artom Lifshitz
A few good points were made:

* the config drive could be VFAT, in which case we can't trust what's
on it because the guest has write access
* if the config drive is ISO9660, we can't selectively write to it, we
need to regenerate the whole thing - but in this case it's actually
safe to read from (right?)
* the point about the precedent being set that the config drive
doesn't change... I'm not sure I 100% agree. There's definitely a
precedent that information on the config drive will remain present for
the entire instance lifetime (so the admin_pass won't disappear after
a reboot, even if using that "feature" in a workflow seems ludicrous),
but we've made no promises that the information itself will remain
constant. For example, nothing says the device metadata must remain
unchanged after a reboot.

Based on that here's what I propose:

If the config drive is vfat, we can just update the information on it
that we need to update. In the device metadata case, we write a new
JSON file, overwriting the old one.

If the config drive is ISO9660, we can safely read from it to fill in
what information isn't persisted anywhere else, then update it with
the new stuff we want to change. Then write out the new image.
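
A very rough sketch of that flow, with all the helper names invented purely
for illustration:

    # Pseudocode sketch of the proposal above; not actual Nova code.
    def refresh_config_drive(instance, new_device_metadata):
        if config_drive_format(instance) == 'vfat':
            # VFAT is writable in place: just overwrite the JSON file
            # holding the device metadata.
            overwrite_metadata_file(instance, new_device_metadata)
        else:
            # ISO9660 is read-only: read back whatever isn't persisted
            # anywhere else (injected files, etc.), then regenerate the
            # whole image with the updated device metadata.
            preserved = read_back_existing_contents(instance)
            rebuild_iso9660(instance, preserved, new_device_metadata)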



On Sat, Feb 18, 2017 at 12:36 PM, Dean Troyer  wrote:
> On Sat, Feb 18, 2017 at 10:23 AM, Clint Byrum  wrote:
>> But I believe Michael is not saying "it's unsafe to read the json
>> files" but rather "it's unsafe to read the whole config drive". It's
>> an ISO filesystem, so you can't write to it. You have to read the whole
>> contents back into a directory and regenerate it. I'm guessing Michael
>> is concerned that there is some danger in doing this, though I can't
>> imagine what it is.
>
> Nova can be configured for config drive to be a VFAT filesystem, which
> can not be trusted.  Unfortunately this is (was??) required for
> libvirt live migration to work so is likely to not be an edge case in
> deployments.
>
> The safest read-back approach would be to generate both ISO9660 and
> VFAT (if configured) and only read back from the ISO version.  But
> yuck, two config drive images...still better than passwords in the
> database.
>
> dt
>
> --
>
> Dean Troyer
> dtro...@gmail.com
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



-- 
--
Artom Lifshitz

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Device tagging: rebuild config drive upon instance reboot to refresh metadata on it

2017-02-20 Thread Artom Lifshitz
Config drive over read-only NFS anyone?


A shared filesystem so that both Nova and the guest can do IO on it at the
same time is indeed the proper way to solve this. But I'm afraid of the
ramifications in terms of live migrations and all other operations we can
do on VMs...


Michael

On Sun, Feb 19, 2017 at 6:12 AM, Steve Gordon  wrote:

> - Original Message -
> > From: "Artom Lifshitz" 
> > To: "OpenStack Development Mailing List (not for usage questions)" <
> openstack-dev@lists.openstack.org>
> > Sent: Saturday, February 18, 2017 8:11:10 AM
> > Subject: Re: [openstack-dev] [nova] Device tagging: rebuild config drive
> upon instance reboot to refresh metadata on
> > it
> >
> > In reply to Michael:
> >
> > > We have had this discussion several times in the past for other
> reasons.
> > > The
> > > reality is that some people will never deploy the metadata API, so I
> feel
> > > like we need a better solution than what we have now.
> >
> > Aha, that's definitely a good reason to continue making the config
> > drive a first-class citizen.
>
> The other reason is that the metadata API as it stands isn't an option for
> folks trying to do IPV6-only IIRC.
>
> -Steve
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>



-- 
Rackspace Australia

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Device tagging: rebuild config drive upon instance reboot to refresh metadata on it

2017-02-20 Thread Artom Lifshitz
I don't think we're trying to re-invent configuration management in
Nova. We have this problem where we want to communicate to the guest,
from the host, a bunch of dynamic metadata that can change throughout
the guest's lifetime. We currently have two possible avenues for this
already in place, and both have problems:

1. The metadata service isn't universally deployed by operators for
security and other reasons.
2. The config drive was never designed for dynamic metadata.

So far in this thread we've mostly been discussing ways to shoehorn a
solution into the config drive avenue, but that's going to be ugly no
matter what because it was never designed for what we're trying to do
in the first place.

Some folks are saying that we should admit that the config drive is only for
static information and metadata that is known at boot time, and work
on a third way to communicate dynamic metadata to the guest. I can get
behind that 100%. I like the virtio-vsock option, but that's only
supported by libvirt IIUC. We've got device tagging support in hyper-v
as well, and xenapi hopefully on the way soon [1], so we need
something a bit more universal. How about fixing up the metadata
service to be more deployable, both in terms of security, and IPv6
support?

[1] https://review.openstack.org/#/c/333781/

On Mon, Feb 20, 2017 at 10:35 AM, Clint Byrum  wrote:
> Excerpts from Jay Pipes's message of 2017-02-20 10:00:06 -0500:
>> On 02/17/2017 02:28 PM, Artom Lifshitz wrote:
>> > Early on in the inception of device role tagging, it was decided that
>> > it's acceptable that the device metadata on the config drive lags
>> > behind the metadata API, as long as it eventually catches up, for
>> > example when the instance is rebooted and we get a chance to
>> > regenerate the config drive.
>> >
>> > So far this hasn't really been a problem because devices could only be
>> > tagged at instance boot time, and the tags never changed. So the
>> > config drive was pretty always much up to date.
>> >
>> > In Pike the tagged device attachment series of patches [1] will
>> > hopefully merge, and we'll be in a situation where device tags can
>> > change during instance uptime, which makes it that much more important
>> > to regenerate the config drive whenever we get a chance.
>> >
>> > However, when the config drive is first generated, some of the
>> > information stored in there is only available at instance boot time
>> > and is not persisted anywhere, as far as I can tell. Specifically, the
>> > injected_files and admin_pass parameters [2] are passed from the API
>> > and are not stored anywhere.
>> >
>> > This creates a problem when we want to regenerated the config drive,
>> > because the information that we're supposed to put in it is no longer
>> > available to us.
>> >
>> > We could start persisting this information in instance_extra, for
>> > example, and pulling it up when the config drive is regenerated. We
>> > could even conceivably hack something to read the metadata files from
>> > the "old" config drive before refreshing them with new information.
>> > However, is that really worth it? I feel like saying "the config drive
>> > is static, deal with it - if you want to up to date metadata, use the
>> > API" is an equally, if not more, valid option.
>>
>> Yeah, config drive should, IMHO, be static, readonly. If you want to
>> change device tags or other configuration data after boot, use a
>> configuration management system or something like etcd watches. I don't
>> think Nova should be responsible for this.
>
> I tend to agree with you, and I personally wouldn't write apps that need
> this. However, in the interest of understanding the desire to change this,
> I think the scenario is this:
>
> 1) Servers are booted with {n_tagged_devices} and come up, actions happen
> using automated thing that reads device tags and reacts accordingly.
>
> 2) A new device is added to the general configuration.
>
> 3) New servers configure themselves with the new devices automatically. But
> existing servers do not have those device tags in their config drive. In
> order to configure these, one would now have to write a fair amount of
> orchestration to duplicate what already exists for new servers.
>
> While I'm a big fan of the cattle approach (just delete those old
> servers!) I don't think OpenStack is constrained enough to say that
> this is always going to be efficient. And writing two paths for server
> configuration feels like repeating yoursel

Re: [openstack-dev] [nova] Device tagging: rebuild config drive upon instance reboot to refresh metadata on it

2017-02-20 Thread Artom Lifshitz
> But before doing that though, I think it'd be worth understanding whether
> metadata-over-vsock support would be acceptable to people who refuse
> to deploy metadata-over-TCPIP today.

Sure, although I'm still concerned that it'll effectively make tagged
hotplug libvirt-only.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Device tagging: rebuild config drive upon instance reboot to refresh metadata on it

2017-02-20 Thread Artom Lifshitz
>> But before doing that though, I think it'd be worth understanding whether
>> metadata-over-vsock support would be acceptable to people who refuse
>> to deploy metadata-over-TCPIP today.
>
> Sure, although I'm still concerned that it'll effectively make tagged
> hotplug libvirt-only.

Upon rethink, that's not strictly true: there's still the existing
metadata service that works across all hypervisor drivers. I know
we're far from feature parity across all virt drivers, but would
metadata-over-vsock be acceptable? That's not even lack of feature
parity, that's a specific feature being exposed in a different (and
arguably worse) way depending on the virt driver.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Device tagging: rebuild config drive upon instance reboot to refresh metadata on it

2017-02-20 Thread Artom Lifshitz
> But before doing that though, I think it'd be worth understanding whether
> metadata-over-vsock support would be acceptable to people who refuse
> to deploy metadata-over-TCPIP today.

I wrote a thing [1], let's see what happens.

[1] 
http://lists.openstack.org/pipermail/openstack-operators/2017-February/012724.html

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Device tagging: rebuild config drive upon instance reboot to refresh metadata on it

2017-02-27 Thread Artom Lifshitz
>  - virtio-vsock - think of this as UNIX domain sockets between the host and
>    guest. This is to deal with the valid use case of people wanting to use
>    a network protocol, but not wanting a real NIC exposed to the guest/host
>    for security concerns. As such I think it'd be useful to run the metadata
>    service over virtio-vsock as an option. It'd likely address at least some
>    people's security concerns wrt the metadata service. It would also fix the
>    ability to use the metadata service in IPv6-only environments, as we would
>    not be using IP at all :-)

Is this currently exposed by libvirt? I had a look at [1] and couldn't
find any mention of 'vsock' or anything that resembles what you've
described.

[1] https://libvirt.org/formatdomain.html

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] EC2 cleanup ?

2018-03-26 Thread Artom Lifshitz
> That is easier said than done. There have been a couple of related attempts
> in the past:
>
> https://review.openstack.org/#/c/266425/
>
> https://review.openstack.org/#/c/282872/
>
> I don't remember exactly where those fell down, but it's worth looking at
> this first before trying to do this again.

Interesting. [1] exists, and I'm pretty sure that we ship it as part
of Red Hat OpenStack (but I'm not a PM and this is not an official Red
Hat stance, just me and my memory), so it works well enough. If we
have things that depend on our in-tree ec2 api, maybe we need to get
them moved over to [1]?

[1] https://github.com/openstack/ec2-api

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] Default scheduler filters survey

2018-04-18 Thread Artom Lifshitz
Hi all,

A CI issue [1] caused by tempest thinking some filters are enabled
when they're really not, and a proposed patch [2] to add
(Same|Different)HostFilter to the default filters as a workaround, has
led to a discussion about what filters should be enabled by default in
nova.

The default filters should make sense for a majority of real world
deployments. Adding some filters to the defaults because CI needs them
is faulty logic, because the needs of CI are different to the needs of
operators/users, and the latter takes priority (though it's my
understanding that a good chunk of operators run tempest on their
clouds post-deployment as a way to validate that the cloud is working
properly, so maybe CI's and users' needs aren't that different after
all).

To that end, we'd like to know what filters operators are enabling in
their deployment. If you can, please reply to this email with your
[filter_scheduler]/enabled_filters (or
[DEFAULT]/scheduler_default_filters if you're using an older version)
option from nova.conf. Any other comments are welcome as well :)
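
For example, the relevant nova.conf snippet would look something like this
(the filter list here is just an illustration, not a recommendation):

    [filter_scheduler]
    enabled_filters = RetryFilter,AvailabilityZoneFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter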

Cheers!

[1] https://bugs.launchpad.net/tempest/+bug/1628443
[2] https://review.openstack.org/#/c/561651/

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Default scheduler filters survey

2018-04-29 Thread Artom Lifshitz
Thanks everyone for your input!

I wrote a small Python script [1] to present all your responses in an
understandable format. Here's the output:

Filters common to all deployments: {'ComputeFilter',
'ServerGroupAntiAffinityFilter'}

Filter counts (out of 9 deployments):
ServerGroupAntiAffinityFilter9
ComputeFilter9
AvailabilityZoneFilter   8
ServerGroupAffinityFilter8
AggregateInstanceExtraSpecsFilter8
ImagePropertiesFilter8
RetryFilter  7
ComputeCapabilitiesFilter5
AggregateCoreFilter  4
RamFilter4
PciPassthroughFilter 3
AggregateRamFilter   3
CoreFilter   2
DiskFilter   2
AggregateImagePropertiesIsolation2
SameHostFilter   2
AggregateMultiTenancyIsolation   1
NUMATopologyFilter   1
AggregateDiskFilter  1
DifferentHostFilter  1

Based on that, we can definitely say that SameHostFilter and
DifferentHostFilter do *not* belong in the defaults. In fact, we got
our defaults pretty spot on, based on this admittedly very limited
dataset. The only frequently occurring filter that's not in our
defaults is AggregateInstanceExtraSpecsFilter.

[1] https://gist.github.com/notartom/0819df7c3cb9d02315bfabe5630385c9
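
The gist of [1] is essentially the following (simplified sketch, not the
actual script; deployment data abbreviated):

    from collections import Counter
    from functools import reduce

    deployments = [
        {'ComputeFilter', 'ServerGroupAntiAffinityFilter', 'RetryFilter'},
        {'ComputeFilter', 'ServerGroupAntiAffinityFilter', 'RamFilter'},
        # ... one set of enabled filters per survey response
    ]

    common = reduce(set.intersection, deployments)
    counts = Counter(f for d in deployments for f in d)

    print('Filters common to all deployments:', common)
    print('Filter counts (out of %d deployments):' % len(deployments))
    for name, count in counts.most_common():
        print('%-35s %d' % (name, count))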

On Fri, Apr 27, 2018 at 8:10 PM, Lingxian Kong  wrote:
> At Catalyst Cloud:
>
> RetryFilter
> AvailabilityZoneFilter
> RamFilter
> ComputeFilter
> AggregateCoreFilter
> DiskFilter
> AggregateInstanceExtraSpecsFilter
> ImagePropertiesFilter
> ServerGroupAntiAffinityFilter
> SameHostFilter
>
> Cheers,
> Lingxian Kong
>
>
> On Sat, Apr 28, 2018 at 3:04 AM Jim Rollenhagen 
> wrote:
>>
>> On Wed, Apr 18, 2018 at 11:17 AM, Artom Lifshitz 
>> wrote:
>>>
>>> Hi all,
>>>
>>> A CI issue [1] caused by tempest thinking some filters are enabled
>>> when they're really not, and a proposed patch [2] to add
>>> (Same|Different)HostFilter to the default filters as a workaround, has
>>> led to a discussion about what filters should be enabled by default in
>>> nova.
>>>
>>> The default filters should make sense for a majority of real world
>>> deployments. Adding some filters to the defaults because CI needs them
>>> is faulty logic, because the needs of CI are different to the needs of
>>> operators/users, and the latter takes priority (though it's my
>>> understanding that a good chunk of operators run tempest on their
>>> clouds post-deployment as a way to validate that the cloud is working
>>> properly, so maybe CI's and users' needs aren't that different after
>>> all).
>>>
>>> To that end, we'd like to know what filters operators are enabling in
>>> their deployment. If you can, please reply to this email with your
>>> [filter_scheduler]/enabled_filters (or
>>> [DEFAULT]/scheduler_default_filters if you're using an older version)
>>> option from nova.conf. Any other comments are welcome as well :)
>>
>>
>> At Oath:
>>
>> AggregateImagePropertiesIsolation
>> ComputeFilter
>> CoreFilter
>> DifferentHostFilter
>> SameHostFilter
>> ServerGroupAntiAffinityFilter
>> ServerGroupAffinityFilter
>> AvailabilityZoneFilter
>> AggregateInstanceExtraSpecsFilter
>>
>> // jim
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>



-- 
--
Artom Lifshitz
Software Engineer, OpenStack Compute DFG

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tc][all] A culture change (nitpicking)

2018-05-29 Thread Artom Lifshitz
I dunno, there's a fine line to be drawn between getting a finished
product that looks unprofessional (because of typos, English mistakes,
etc), and nitpicking to the point of smothering and being
counter-productive. One idea would be that, once the meat of the patch
has passed multiple rounds of reviews and looks good, and what remains
is only nits, the reviewer themselves take on the responsibility of
pushing a new patch that fixes the nits that they found.

On Tue, May 29, 2018 at 9:55 AM, Julia Kreger
 wrote:
> During the Forum, the topic of review culture came up in session after
> session. During these discussions, the subject of our use of nitpicks
> were often raised as a point of contention and frustration, especially
> by community members that have left the community and that were
> attempting to re-engage the community. Contributors raised the point
> of review feedback requiring for extremely precise English, or
> compliance to a particular core reviewer's style preferences, which
> may not be the same as another core reviewer.
>
> These things are not just frustrating, but also very inhibiting for
> part time contributors such as students who may also be time limited.
> Or an operator who noticed something that was clearly a bug and that
> put forth a very minor fix and doesn't have the time to revise it over
> and over.
>
> While nitpicks do help guide and teach, the consensus seemed to be
> that we do need to shift the culture a little bit. As such, I've
> proposed a change to our principles[1] in governance that attempts to
> capture the essence and spirit of the nitpicking topic as a first
> step.
>
> -Julia
> -
> [1]: https://review.openstack.org/570940
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



-- 
--
Artom Lifshitz
Software Engineer, OpenStack Compute DFG

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tc][all] A culture change (nitpicking)

2018-05-29 Thread Artom Lifshitz
> On Tue, May 29, 2018 at 10:52:04AM -0400, Mohammed Naser wrote:
>
> :On Tue, May 29, 2018 at 10:43 AM, Artom Lifshitz  wrote:
> :>  One idea would be that, once the meat of the patch
> :> has passed multiple rounds of reviews and looks good, and what remains
> :> is only nits, the reviewer themselves take on the responsibility of
> :> pushing a new patch that fixes the nits that they found.
>
> Doesn't the above suggestion sufficiently address the concern below?
>
> :I'd just like to point out that what you perceive as a 'finished
> :product that looks unprofessional' might be already hard enough for a
> :contributor to achieve.  We have a lot of new contributors coming from
> :all over the world and it is very discouraging for them to have their
> :technical knowledge and work be categorized as 'unprofessional'
> :because of the language barrier.
> :
> :git-nit and a few minutes of your time will go a long way, IMHO.
>
> As very intermittent contributor and native english speaker with
> relatively poor spelling and typing I'd be much happier with a
> reviewer pushing a patch that fixes nits rather than having a ton of
> inline comments that point them out.
>
> maybe we're all saying the same thing here?

Yeah, I feel like we're all essentially in agreement that nits (of the
English mistake or typo type) do need to get fixed, but sometimes
(often?) putting the burden of fixing them on the original patch
contributor is neither fair nor constructive.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][cinder] Update (swap) of multiattach volume should not be allowed

2018-06-06 Thread Artom Lifshitz
 volume.
>
> Yes, that is the commit message in its entirety. Of course, the commit had
> no documentation at all in it, so there's no ability to understand what the
> original use case really was here.
>
> https://review.openstack.org/#/c/28995/
>
> If the use case was really "that a user needs to move volume data for
> attached volumes", why not just pause the VM, detach the volume, do a
> openstack volume migrate to the new destination, reattach the volume and
> start the VM? That would mean no libvirt/QEMU-specific implementation
> behaviour leaking out of the public HTTP API and allow the volume service
> (Cinder) to do its job properly.
>
>
>> With single attach that's exactly what they get: the end
>> user should never notice. With multi-attach they don't get that. We're
>> basically forking the shared volume at a point in time, with the
>> instance which did the swap writing to the new location while all
>> others continue writing to the old location. Except that even the fork
>> is broken, because they'll get a corrupt, inconsistent copy rather
>> than point in time. I can't think of a use case for this behaviour,
>> and it certainly doesn't meet the original design intent.
>>
>> What they really want is for the multi-attached volume to be copied
>> from location a to location b and for all attachments to be updated.
>> Unfortunately I don't think we're going to be in a position to do that
>> any time soon, but I also think users will be unhappy if they're no
>> longer able to move data at all because it's multi-attach. We can
>> compromise, though, if we allow a multiattach volume to be moved as
>> long as it only has a single attachment. This means the operator can't
>> move this data without disruption to users, but at least it's not
>> fundamentally immovable.
>>
>> This would require some cooperation with cinder to achieve, as we need
>> to be able to temporarily prevent cinder from allowing new
>> attachments. A natural way to achieve this would be to allow a
>> multi-attach volume with only a single attachment to be redesignated
>> not multiattach, but there might be others. The flow would then be:
>>
>> Detach volume from server 2
>> Set multiattach=False on volume
>> Migrate volume on server 1
>> Set multiattach=True on volume
>> Attach volume to server 2
>>
>> Combined with a patch to nova to disallow swap_volume on any
>> multiattach volume, this would then be possible if inconvenient.
>>
>> Regardless of any other changes, though, I think it's urgent that we
>> disable the ability to swap_volume a multiattach volume because we
>> don't want users to start using this relatively new, but broken,
>> feature.
>
>
> Or we could deprecate the swap_volume Compute API operation and use Cinder
> for all of this.
>
> But sure, we could also add more cruft to the Compute API and add more
> conditional "it works but only when X" docs to the API reference.
>
> Just my two cents,
> -jay
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



-- 
--
Artom Lifshitz
Software Engineer, OpenStack Compute DFG

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][libvirt] Lets make libvirt's domain XML canonical

2016-09-27 Thread Artom Lifshitz
l of 'management':
>> it's the minutia of the hypervisor. In order to persist all of these things
>> in Nova we'd have to implement them explicitly, and when libvirt/kvm grows
>> more stuff we'll have to do that too. We'll need to mirror the
>> functionality of libvirt in Nova, feature for feature. This is a red flag
>> for me, and I think it means we should switch to libvirt being canonical.
>>
>> I think we should be able to create a domain, but once created we should
>> never redefine a domain. We can do adding and removing devices dynamically
>> using libvirt's apis, secure in the knowledge that libvirt will persist
>> this for us. When we upgrade the host, libvirt can ensure we don't break
>> guests which are on it. Evacuate should be pretty much the only reason to
>> start again.
>
> And in fact we do persist the guest XML with libvirt already. We sadly
> never use that info though - we just blindly overwrite it every time
> with newly generated XML.
>
> Fixing this should not be technically difficult for the most part.
>
>> I raised this in the live migration sub-team meeting, and the immediate
>> response was understandably conservative. I think this solves more problems
>> than it creates, though, and it would result in Nova's libvirt driver
>> getting a bit smaller and a bit simpler. That's a big win in my book.
>
> I don't think it'll get significantly smaller/simpler, but it will
> definitely be more intelligent and user friendly to do this IMHO.
> As mentioned above, I think the windows license reactivation issue
> alone is enough of a reason todo this.
>
> Regards,
> Daniel
> --
> |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org  -o- http://virt-manager.org :|
> |: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



-- 
--
Artom Lifshitz

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] NUMA-aware live migration: easy but incomplete vs complete but hard

2018-06-18 Thread Artom Lifshitz
Hey all,

For Rocky I'm trying to get live migration to work properly for
instances that have a NUMA topology [1].

A question that came up on one of patches [2] is how to handle
resources claims on the destination, or indeed whether to handle that
at all.

The previous attempt's approach [3] (call it A) was to use the
resource tracker. This is race-free and the "correct" way to do it,
but the code is pretty opaque and not easily reviewable, as evidenced
by [3] sitting in review purgatory for literally years.

A simpler approach (call it B) is to ignore resource claims entirely
for now and wait for NUMA in placement to land in order to handle it
that way. This is obviously race-prone and not the "correct" way of
doing it, but the code would be relatively easy to review.

For the longest time, live migration did not keep track of resources
(until it started updating placement allocations). The message to
operators was essentially "we're giving you this massive hammer, don't
break your fingers." Continuing to ignore resource claims for now is
just maintaining the status quo. In addition, there is value in
improving NUMA live migration *now*, even if the improvement is
incomplete because it's missing resource claims. "Best is the enemy of
good" and all that. Finally, making use of the resource tracker is
just work that we know will get thrown out once we start using
placement for NUMA resources.

For all those reasons, I would favor approach B, but I wanted to ask
the community for their thoughts.

Thanks!

[1] 
https://review.openstack.org/#/q/topic:bp/numa-aware-live-migration+(status:open+OR+status:merged)
[2] https://review.openstack.org/#/c/567242/
[3] https://review.openstack.org/#/c/244489/

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] NUMA-aware live migration: easy but incomplete vs complete but hard

2018-06-18 Thread Artom Lifshitz
> For what it's worth, I think the previous patch languished for a number of
> reasons other than the complexity of the code...the original author left,
> the coding style was a bit odd, there was an attempt to make it work even if
> the source was an earlier version, etc.  I think a fresh implementation
> would be less complicated to review.

I'm afraid of unknowns in the resource tracker and claims mechanism.
For snips and giggles, I submitted a quick patch that attempts to use
a claim [1] when live migrating instances. Assuming it somehow passes
CI, I have no idea if I've just opened a rabbit hole of people telling
me "oh but you need to do this other thing in this other place." Who
knows the claims code well, anyways?

[1] https://review.openstack.org/576222

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] NUMA-aware live migration: easy but incomplete vs complete but hard

2018-06-19 Thread Artom Lifshitz
> Adding
> claims support later on wouldn't change any on-the-wire messaging, it would
> just make things work more robustly.

I'm not even sure about that. Assuming [1] has at least the right
idea, it looks like it's an either-or kind of thing: either we use
resource tracker claims and get the new instance NUMA topology that
way, or do what was in the spec and have the dest send it to the
source.

That being said, I still think I'm still in favor of choosing the
"easy" way out. For instance, [2] should fail because we can't access
the api db from the compute node. So unless there's a simpler way,
using RT claims would involve changing the RPC to add parameters to
check_can_live_migrate_destination, which, while not necessarily
bad, seems like useless complexity for a thing we know will get ripped
out.

[1] https://review.openstack.org/#/c/576222/
[2] https://review.openstack.org/#/c/576222/3/nova/compute/manager.py@5897

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] NUMA-aware live migration: easy but incomplete vs complete but hard

2018-06-21 Thread Artom Lifshitz
>
> As I understand it, Artom is proposing to have a larger race window,
> essentially
> from when the scheduler selects a node until the resource audit runs on
> that node.
>

Exactly. When writing the spec I thought we could just call the resource
tracker to claim the resources when the migration was done. However, when I
started looking at the code in reaction to Sahid's feedback, I noticed that
there's no way to do it without the MoveClaim context (right?)

Keep in mind, we're not making any race windows worse - I'm proposing
keeping the status quo and fixing it later with NUMA in placement (or the
resource tracker if we can swing it).

The resource tracker stuff is just so... opaque. For instance, the original
patch [1] uses a mutated_migration_context around the pre_live_migration
call to the libvirt driver. Would I still need to do that? Why or why not?

At this point we need to commit to something and roll with it, so I'm
sticking to the "easy way". If it gets shut down in code review, at least
we'll have certainty on how to approach this next cycle.

[1] https://review.openstack.org/#/c/244489/

>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] NUMA-aware live migration: easy but incomplete vs complete but hard

2018-06-21 Thread Artom Lifshitz
> Side question... does either approach touch PCI device management during
> live migration?

Nope. I'd need to do some research to see what, if anything, is needed
at the lower levels (kernel, libvirt) to enable this.

> I ask because the only workloads I've ever seen that pin guest vCPU threads
> to specific host processors -- or make use of huge pages consumed from a
> specific host NUMA node -- have also made use of SR-IOV and/or PCI
> passthrough. [1]
>
> If workloads that use PCI passthrough or SR-IOV VFs cannot be live migrated
> (due to existing complications in the lower-level virt layers) I don't see
> much of a point spending lots of developer resources trying to "fix" this
> situation when in the real world, only a mythical workload that uses CPU
> pinning or huge pages but *doesn't* use PCI passthrough or SR-IOV VFs would
> be helped by it.

It's definitely a pain point for at least some of our customers - I
don't know their use cases exactly, but live migration with CPU
pinning but no other "high performance" features has come up a few
times in our downstream bug tracker. In any case, incremental progress
is better than no progress at all, so if we can improve how NUMA live
migration works, we'll be in a better position to make it work with
PCI devices down the road.

> [Mooney, Sean K] I would generally agree, but with the extension of including
> DPDK-based vswitches like OVS-DPDK or VPP. CPU-pinned or hugepage-backed guests
> generally also have some kind of high-performance networking solution, or use a
> hardware accelerator like a GPU, to justify the performance assertion that pinning
> of cores or RAM is required.
> A DPDK networking stack would, however, not require the PCI remapping to be
> addressed, though I believe that is planned to be added in Stein.

I think Stephen Finucane's NUMA-aware vswitches work depends on mine
to work with live migration - ie, it'll work just fine on its own, but
to live migrate an instance with a NUMA vswitch (I know I'm abusing
language here, apologies) this spec will need to be implemented first.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova][Cinder][Tempest] Help with tempest.api.compute.servers.test_device_tagging.TaggedAttachmentsTest.test_tagged_attachment needed

2018-07-19 Thread Artom Lifshitz
Because we're waiting for the volume to become available before we
continue with the test [1], its tag still being present means Nova's
not cleaning up the device tags on volume detach. This is most likely
a bug. I'll look into it.

[1] 
https://github.com/openstack/tempest/blob/master/tempest/api/compute/servers/test_device_tagging.py#L378

On Thu, Jul 19, 2018 at 7:09 AM, Slawomir Kaplonski  wrote:
> Hi,
>
> Since some time we see that test 
> tempest.api.compute.servers.test_device_tagging.TaggedAttachmentsTest.test_tagged_attachment
>  is failing sometimes.
> Bug about that is reported for Tempest currently [1] but after small patch 
> [2] was merged I was today able to check what cause this issue.
>
> Test which is failing is in [3] and it looks that everything is going fine 
> with it up to last line of test. So volume and port are created, attached, 
> tags are set properly, both devices are detached properly also and at the end 
> test is failing as in http://169.254.169.254/openstack/latest/meta_data.json 
> still has some device inside.
> And it looks now from [4] that it is volume which isn’t removed from this 
> meta_data.json.
> So I think that it would be good if people from Nova and Cinder teams could 
> look at it and try to figure out what is going on there and how it can be 
> fixed.
>
> Thanks in advance for help.
>
> [1] https://bugs.launchpad.net/tempest/+bug/1775947
> [2] https://review.openstack.org/#/c/578765/
> [3] 
> https://github.com/openstack/tempest/blob/master/tempest/api/compute/servers/test_device_tagging.py#L330
> [4] 
> http://logs.openstack.org/69/567369/15/check/tempest-full/528bc75/job-output.txt.gz#_2018-07-19_10_06_09_273919
>
> —
> Slawek Kaplonski
> Senior software engineer
> Red Hat
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



-- 
--
Artom Lifshitz
Software Engineer, OpenStack Compute DFG

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova][Cinder][Tempest] Help with tempest.api.compute.servers.test_device_tagging.TaggedAttachmentsTest.test_tagged_attachment needed

2018-07-19 Thread Artom Lifshitz
I've proposed [1] to add extra logging on the Nova side. Let's see if
that helps us catch the root cause of this.

[1] https://review.openstack.org/584032
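
To give an idea of what I mean by extra logging (this is not the literal
patch, and the exact placement in Nova is hypothetical; see [1] for the real
thing), it's essentially debug lines around the device metadata that the
metadata API serves, along the lines of:

    from oslo_log import log as logging

    LOG = logging.getLogger(__name__)

    # hypothetical spot: wherever Nova (re)builds the device metadata that
    # ends up in meta_data.json, log what it believes is still attached
    LOG.debug('Building device metadata for instance %(uuid)s: %(devices)s',
              {'uuid': instance.uuid, 'devices': devices})

That should help us tell whether the stale volume entry comes from Nova still
thinking the device is attached, or from the metadata simply not being
refreshed on detach.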

On Thu, Jul 19, 2018 at 12:50 PM, Artom Lifshitz  wrote:
> Because we're waiting for the volume to become available before we
> continue with the test [1], its tag still being present means Nova's
> not cleaning up the device tags on volume detach. This is most likely
> a bug. I'll look into it.
>
> [1] 
> https://github.com/openstack/tempest/blob/master/tempest/api/compute/servers/test_device_tagging.py#L378
>
> On Thu, Jul 19, 2018 at 7:09 AM, Slawomir Kaplonski  
> wrote:
>> Hi,
>>
>> For some time now we have been seeing that the test 
>> tempest.api.compute.servers.test_device_tagging.TaggedAttachmentsTest.test_tagged_attachment
>> fails intermittently.
>> A bug about this is currently open against Tempest [1], but after a small 
>> patch [2] was merged I was able to check today what causes the issue.
>>
>> The failing test is in [3], and everything appears to go fine up to the 
>> last line of the test. The volume and port are created and attached, tags 
>> are set properly, and both devices are detached properly as well, but at 
>> the end the test fails because 
>> http://169.254.169.254/openstack/latest/meta_data.json still lists a 
>> device.
>> From [4] it now looks like it is the volume that isn't removed from 
>> meta_data.json.
>> So I think it would be good if people from the Nova and Cinder teams could 
>> look at it and try to figure out what is going on there and how it can be 
>> fixed.
>>
>> Thanks in advance for help.
>>
>> [1] https://bugs.launchpad.net/tempest/+bug/1775947
>> [2] https://review.openstack.org/#/c/578765/
>> [3] 
>> https://github.com/openstack/tempest/blob/master/tempest/api/compute/servers/test_device_tagging.py#L330
>> [4] 
>> http://logs.openstack.org/69/567369/15/check/tempest-full/528bc75/job-output.txt.gz#_2018-07-19_10_06_09_273919
>>
>> —
>> Slawek Kaplonski
>> Senior software engineer
>> Red Hat
>>
>>
>> ______
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
>
> --
> --
> Artom Lifshitz
> Software Engineer, OpenStack Compute DFG



-- 
--
Artom Lifshitz
Software Engineer, OpenStack Compute DFG

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [infra][nova] Running NFV tests in CI

2018-07-24 Thread Artom Lifshitz
Hey all,

tl;dr Humbly requesting a handful of nodes to run NFV tests in CI

Intel has their NFV tests tempest plugin [1] and manages a third party
CI for Nova. Two of the cores on that project (Stephen Finucane and
Sean Mooney) have now moved to Red Hat, but the point still stands
that there's a need and a use case for testing things like NUMA
topologies, CPU pinning and hugepages.
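
For context, the guest configurations these tests exercise are mostly driven
by flavor extra specs, for example (illustrative values only, not anything a
particular plugin mandates):

    # example flavor extra specs for an NFV-style guest
    extra_specs = {
        'hw:numa_nodes': '2',          # spread the guest across two NUMA nodes
        'hw:cpu_policy': 'dedicated',  # pin each vCPU to a dedicated host pCPU
        'hw:mem_page_size': 'large',   # back guest RAM with host hugepages
    }
    flavor.set_keys(extra_specs)       # python-novaclient style

The "whitebox" part is then verifying on the host side (libvirt XML, host NUMA
and hugepage state) that the guest actually got what it asked for.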

At Red Hat, we also have a similar tempest plugin project [2] that we
use for downstream whitebox testing. The scope is a bit bigger than
just NFV, but the main use case is still testing NFV code in an
automated way.

Given that there's a clear need for this sort of whitebox testing, I
would like to humbly request a handful of nodes (in the 3 to 5 range)
from infra to run an "official" Nova NFV CI. The code doing the
testing would initially be the current Intel plugin, but we could have
a separate discussion about keeping "Intel" in the name or forking
and/or renaming it to something more vendor-neutral.

I won't be at PTG (conflict with personal travel), so I'm kindly
asking Stephen and Sean to represent this idea in Denver.

Cheers!

[1] https://github.com/openstack/intel-nfv-ci-tests
[2] 
https://review.rdoproject.org/r/#/admin/projects/openstack/whitebox-tempest-plugin

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [infra][nova] Running NFV tests in CI

2018-07-24 Thread Artom Lifshitz
On Tue, Jul 24, 2018 at 12:30 PM, Clark Boylan  wrote:
> On Tue, Jul 24, 2018, at 9:23 AM, Artom Lifshitz wrote:
>> Hey all,
>>
>> tl;dr Humbly requesting a handful of nodes to run NFV tests in CI
>>
>> Intel has their NFV tests tempest plugin [1] and manages a third party
>> CI for Nova. Two of the cores on that project (Stephen Finucane and
>> Sean Mooney) have now moved to Red Hat, but the point still stands
>> that there's a need and a use case for testing things like NUMA
>> topologies, CPU pinning and hugepages.
>>
>> At Red Hat, we also have a similar tempest plugin project [2] that we
>> use for downstream whitebox testing. The scope is a bit bigger than
>> just NFV, but the main use case is still testing NFV code in an
>> automated way.
>>
>> Given that there's a clear need for this sort of whitebox testing, I
>> would like to humbly request a handful of nodes (in the 3 to 5 range)
>> from infra to run an "official" Nova NFV CI. The code doing the
>> testing would initially be the current Intel plugin, but we could have
>> a separate discussion about keeping "Intel" in the name or forking
>> and/or renaming it to something more vendor-neutral.
>
> The way you request nodes from Infra is through your Zuul configuration. Add 
> jobs to a project to run tests on the node labels that you want.

Aha, thanks, I'll look into that. I was coming from a place of
complete ignorance about infra.
>
> I'm guessing this process doesn't work for NFV tests because you have 
> specific hardware requirements that are not met by our current VM resources?
> If that is the case it would probably be best to start by documenting what is 
> required and where the existing VM resources fall
> short.

Well, it should be possible to do most of what we'd like with nested
virt and virtual NUMA topologies, though things like hugepages will
need host configuration, specifically the kernel boot command [1]. Is
that possible with the nodes we have?
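
For reference, the host-side piece for hugepages [1] boils down to kernel boot
parameters, something like (sizes and counts purely illustrative):

    default_hugepagesz=1G hugepagesz=1G hugepages=8

which is exactly the kind of host configuration we can't do on the standard
node labels today, as far as I know.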

> In general though we operate on top of donated cloud resources, and if those 
> do not work we will have to identify a source of resources that would work.

Right, as always it comes down to resources and money. I believe
historically Red Hat has been opposed to running an upstream third
party CI (this is by no means an official Red Hat position, just
remembering what I think I heard), but I can always see what I can do.

[1] 
https://docs.openstack.org/nova/latest/admin/huge-pages.html#enabling-huge-pages-on-the-host

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [placement] The "intended purpose" of traits

2018-10-01 Thread Artom Lifshitz
> So from a code perspective _placement_ is completely agnostic to
> whether a trait is "PCI_ADDRESS_01_AB_23_CD", "STORAGE_DISK_SSD", or
> "JAY_LIKES_CRUNCHIE_BARS".

Right, but words have meanings, and everyone is better off if that
meaning is common amongst everyone doing the talking. So if placement
understands traits as "a unitary piece of information that is either
true or false" (ex: HAS_SSD), but nova understands it as "multiple
pieces of information, all of which are either true or false" (ex:
HAS_PCI_DE_AD_BE_EF), then that's asking for trouble. Can it work out?
Probably, but it'll be more by accident than by design, sort of like
French and Spanish sharing certain words, but then having some similar
sounding words mean something completely different.
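
To make that concrete (the first trait below is a real standard one from
os-traits; the second is made up purely for illustration):

    # a trait as placement understands it: a single yes/no fact
    has_ssd = 'STORAGE_DISK_SSD'

    # a trait being used to smuggle a value inside its name: the consumer
    # now has to parse the string to recover the PCI address, and placement
    # itself can't reason about it at all
    pci_addr = 'CUSTOM_PCI_ADDRESS_0000_00_02_0'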

> However, things which are using traits (e.g., nova, ironic) need to
> make their own decisions about how the value of traits are
> interpreted.

Well... if placement is saying "here's the primitives I can work with
and can expose to my users", but nova is saying "well, we like this
one primitive, but what we really need is this other primitive, and
you don't have it, but we can totally hack this first primitive that
you do have to do what we want"... That's ugly. From what I
understand, Nova needs *resources* (not resource providers) to have
*quantities* of things, and this is not something that placement can
currently work with, which is why we're having this flamewar ;)

> I don't have a strong position on that except to say
> that _if_ we end up in a position of there being lots of traits
> willy nilly, people who have chosen to do that need to know that the
> contract presented by traits right now (present or not present, no
> value comprehension) is fixed.
>
> > I *do* see a problem with it, based on my experience in Nova where this kind
> > of thing leads to ugly, unmaintainable, and incomprehensible code as I have
> > pointed to in previous responses.
>
> I think there are many factors that have led to nova being
> incomprehensible and indeed bad representations is one of them, but
> I think reasonable people can disagree on which factors are the most
> important and with sufficient discussion come to some reasonable
> compromises. I personally feel that while the bad representations
> (encoding stuff in strings or json blobs) thing is a big deal,
> another major factor is a predilection to make new apis, new
> abstractions, and new representations rather than working with and
> adhering to the constraints of the existing ones. This leads to a
> lot of code that encodes business logic in itself (e.g., several
> different ways and layers of indirection to think about allocation
> ratios) rather than working within strong and constraining
> contracts.
>
> From my standpoint there isn't much to talk about here from a
> placement code standpoint. We should clearly document the functional
> contract (and stick to it) and we should come up with exemplars
> for how to make the best use of traits.
>
> I think this conversation could allow us to find those examples.
>
> I don't, however, want placement to be a traffic officer for how
> people do things. In the context of the orchestration between nova
> and ironic and how that interaction happens, nova has every right to
> set some guidelines if it needs to.
>
> --
> Chris Dent   ٩◔̯◔۶   https://anticdent.org/
> freenode: cdent tw: @anticdent
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



-- 
--
Artom Lifshitz
Software Engineer, OpenStack Compute DFG

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] State of NUMA live migration

2018-10-03 Thread Artom Lifshitz
Yes, this is still happening. Mea culpa for not carrying the ball and
maintaining visibility. There's work in nova to actually get it
working, and in intel-nfv-ci to lay down the groundwork for eventual
CI.

In nova, the spec has been re-proposed for Stein [1]. There are some
differences from the Rocky version, but based on what I've heard was
discussed at Denver, it shouldn't be too controversial. There's a
couple of nova patches up as well [2], but that's still pretty WIP
given the changes in the spec. A bunch of patches from Rocky were
abandoned because they're no longer applicable.

In intel-nfv-ci, there's a whole stack of changes [3] that are mostly
about technical debt and laying the groundwork to support multinode
test environments, but there's also a WIP patch in there [4] that'll
eventually actually test live migration.

For now we have no upstream/public environment to run that on, so
anyone who's involved will need their own env if they want to run the
tests and/or play with the feature. Longer-term, I would like to have
some form of upstream CI testing this, be it in the vanilla nodepool
with nested virt and "fake" NUMA topologies, or a 3rd party CI with
resources provided by an interested stakeholder.

[1] https://review.openstack.org/#/c/599587/
[2] 
https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/numa-aware-live-migration
[3] https://review.openstack.org/#/c/576602/
[4] https://review.openstack.org/#/c/574871/6

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] State of NUMA live migration

2018-10-03 Thread Artom Lifshitz
> Yes, this is still happening. Mea culpa for not carrying the ball and
> maintaining visibility. There's work in nova to actually get it
> working, and in intel-nfv-ci to lay down the groundwork for eventual
> CI.
>
> In nova, the spec has been re-proposed for Stein [1]. There are some
> differences from the Rocky version, but based on what I've heard was
> discussed at Denver, it shouldn't be too controversial. There's a
> couple of nova patches up as well [2], but that's still pretty WIP
> given the changes in the spec. A bunch of patches from Rocky were
> abandoned because they're no longer applicable.
>
> In intel-nfv-ci, there's a whole stack of changes [3] that are mostly
> about technical debt and laying the groundwork to support multinode
> test environments, but there's also a WIP patch in there [4] that'll
> eventually actually test live migration.
>
> For now we have no upstream/public environment to run that on, so
> anyone who's involved will need their own env if they want to run the
> tests and/or play with the feature. Longer-term, I would like to have
> some form of upstream CI testing this, be it in the vanilla nodepool
> with nested virt and "fake" NUMA topologies, or a 3rd party CI with
> resources provided by an interested stakeholder.

Forgot the nova tag :(

> [1] https://review.openstack.org/#/c/599587/
> [2] 
> https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/numa-aware-live-migration
> [3] https://review.openstack.org/#/c/576602/
> [4] https://review.openstack.org/#/c/574871/6



-- 
--
Artom Lifshitz
Software Engineer, OpenStack Compute DFG

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev