Re: [openstack-dev] [nova] Migration progress

2015-11-24 Thread
Hi Paul,
Comments inline:

2015-11-23 16:36 GMT+08:00 Paul Carlton :

> John
>
> At the live migration sub team meeting I undertook to look at the issue
> of progress reporting.
>
> The use cases I'm envisaging are...
>
> As a user I want to know how much longer my instance will be migrating
> for.
>
> As an operator I want to identify any migration that are making slow
>  progress so I can expedite their progress or abort them.
>
> The current implementation reports on the instance's migration with
> respect to memory transfer, using the total memory and memory remaining
> fields from libvirt to report the percentage of memory still to be
> transferred.  Due to the instance writing to pages already transferred
> this percentage can go up as well as down.  Daniel has done a good job
> of generating regular log records to report progress and highlight lack
> of progress but from the API all a user/operator can see is the current
> percentage complete.  By observing this periodically they can identify
> instance migrations that are struggling to migrate memory pages fast
> enough to keep pace with the instance's memory updates.
>
> The problem is that at present we have only one field, the instance
> progress, to record progress.  With a live migration there are measures
>

[Shaohe]:

From the OpenStack API reference:
http://developer.openstack.org/api-ref-compute-v2.1.html#listDetailServers
It describes the instance progress as "A percentage value of the build progress."
For the libvirt driver, however, it is actually the migration progress during a
live migration, while for other drivers it is only ever the build progress.
There is a spec proposing to change this:
https://review.openstack.org/#/c/249086/
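
For reference, the memory-based percentage Paul describes above boils down to
something like the sketch below (libvirt-python's jobStats() call is real; the
connection URI, domain name and exact field availability are assumptions that
vary by libvirt version):

import libvirt

def memory_progress_percent(dom):
    # jobStats() returns the VIR_DOMAIN_JOB_* statistics as a dict.
    stats = dom.jobStats()
    total = stats.get('memory_total', 0)
    remaining = stats.get('memory_remaining', 0)
    if not total:
        return None
    # Dirtied pages are added back to "remaining", so this value can
    # go down as well as up during a migration.
    return 100.0 * (total - remaining) / total

conn = libvirt.open('qemu:///system')           # URI is just an example
dom = conn.lookupByName('instance-00000002')    # name is just an example
print(memory_progress_percent(dom))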



> of progress, how much of the ephemeral disks (not needed for shared
> disk setups) have been copied and how much of the memory has been
> copied. Both can go up and down as the instance writes to pages already
> copied causing those pages to need to be copied again.  As Daniel says
> in his comments in the code, the disk size could dwarf the memory so
> reporting both in single percentage number is problematic.
>
> We could add an additional progress item to the instance object, i.e.
> disk progress and memory progress but that seems odd to have an
> additional progress field only for this operation so this is probably
> a non starter!
>
> For operations staff with access to log files we could report disk
> progress as well as memory in the log file, however that does not
> address the needs of users and whilst log files are the right place for
> support staff to look when investigating issues operational tooling
> is much better served by notification messages.
>
> Thus I'd recommend generating periodic notifications during a migration
> to report both memory and disk progress.  Cloud
> operators are likely to manage their instance migration activity using
> some orchestration tooling which could consume these notifications and
> deduce what challenges the instance migration is encountering and thus
> determine how to address any issues.
>
> The use cases are only partially addressed by the current
> implementation, they can repeatedly get the server details and look at
> the progress percentage to see how quickly (or even if) it is
> increasing and determine how long the instance is likely to be
> migrating for.  However for an instance that has a large disk and/or
> is doing a high rate of disk i/o they may see the percentage complete
> (i.e. memory) repeatedly showing 90%+ but the instance migration does
> not complete.
>
> The nova spec https://review.openstack.org/#/c/248472/ suggests making
> detailed information available via the os-migrations object.  This is
> not a bad idea but I have some issues with the implementation that I
> will share on that spec.
>

[Shaohe]:

About this spec: Daniel has given some comments on it, and we have updated it.
Maybe we can work together on it to improve it further.

I have worked on multi-thread compressed migration for libvirt and have looked
into some live migration performance optimizations.

That work led to a few ideas:
1. Let nova expose more live migration details, such as the RAM statistics,
the xbzrle-cache status and, in the future, multi-thread compression
information.
2. Let nova enable auto-converge and tune the xbzrle cache and multi-thread
compression dynamically.
3. Other projects can then build a good strategy for tuning the live migration
based on those details.


For example:
Cache size is the key to xbzrle performance; ideally the cache would be as
large as the guest's total RAM, but that much memory is not always available
on the host.
A higher multi-thread compression level compresses better, but it consumes
more CPU.
Auto-converge slows down the guest's CPUs.
So these knobs involve trade-offs and are not always as helpful as one might
expect.
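
As a rough illustration of the kind of details item 1 above could expose (a
sketch only: the dict layout is made up for this mail, and the jobStats() keys
depend on the libvirt/QEMU version):

def migration_details(dom):
    s = dom.jobStats()
    return {
        'ram': {
            'total': s.get('memory_total'),
            'remaining': s.get('memory_remaining'),
            'transferred': s.get('memory_processed'),
            'dirty_rate_pps': s.get('memory_dirty_rate'),
        },
        'xbzrle_cache': {
            'size': s.get('compression_cache'),
            'compressed_bytes': s.get('compression_bytes'),
            'compressed_pages': s.get('compression_pages'),
            'cache_misses': s.get('compression_cache_misses'),
            'overflows': s.get('compression_overflow'),
        },
        'time_elapsed_ms': s.get('time_elapsed'),
    }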

We also submitted a talk about this idea to the summit, but it was not accepted.
Topic: 
Link: 
https://www.openstack.org/summit/tokyo-2015/vote-for-speakers/presentation/4971


We are looking into this.

[openstack-dev] how do we get the migration status details info from nova

2015-11-26 Thread
Now we agree that exposing more migration status details is useful.

But how do we get them?
By REST API or by notification?


If by API, is a "time_elapsed" field needed?

There is already a "created_at" field,
but IMO that is based on the clock of the conductor server,
whereas time_elapsed can be obtained from libvirt, i.e. from the hypervisor.
Usually there is an NTP server in the cloud, so we could derive time_elapsed
from "created_at".
However, could there be a case where the clocks of the hypervisor and the
conductor server host are out of sync?
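
If both timestamps are taken on the conductor/API side, the skew problem goes
away. A minimal sketch, assuming created_at is stored as naive UTC (as nova's
DB layer conventionally does):

from datetime import datetime

def time_elapsed(migration_created_at):
    # Compare two UTC timestamps taken on the same side, so any clock
    # skew between hypervisor and conductor host does not matter.
    return (datetime.utcnow() - migration_created_at).total_seconds()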
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] how do we get the migration status details info from nova

2015-11-26 Thread
Useful information.
Thank you Gibi.

BR
Shaohe Feng

2015-11-26 21:29 GMT+08:00 Balázs Gibizer :

> > -Original Message-
> > From: Paul Carlton [mailto:paul.carlt...@hpe.com]
> > Sent: November 26, 2015 12:11
> > On 26/11/15 10:48, 少合冯 wrote:
> >
> >
> >   Now, we are agree on getting more migration status details
> > info are useful.
> >
> >   But How do we get them?
> >   By REST API or Notification?
> >
> >
> >   IF by API, does the  "time_elapsed" is needed?
> >
> >   For there is a "created_at" field.
> >
> >   But IMO, it is base on the time of the conductor server?
> >   The time_elapsed can get from libvirt, which from the
> > hypervisor.
> >   Usually, there are ntp-server in the cloud. and we can get the
> > time_elapsed by "created_at".
> >   but not sure there will be the case:
> >   the time of hypervisor and conductor server host are out of
> > sync?
> >
> > Why not both.  Just update the _monitor_live_migration method in the
> > libvirt  driver (and any similar functions in other drivers if they
> exist) so it
> > updates  the migration object and also sends notification events.  These
> > don't have  to be at 5 second intervals, although I think that is about
> right for
> > the migration object update.  Notification messages could be once event
> 30
> > seconds or so.
> >
> > Operators can monitor the progress via the API and orchestration
> utilities  to
> > consume the notification messages (and/or use API).
> > This will enable them to identify migration operations that are not
> making
> > good progress and take actions to address the issue.
> >
> > The created_at and updated_at fields of the migration object should be
> > sufficient to allow the caller to work out how long the migration has
> been
> > running for (or how long it took in the case of a completed migration).
> >
> > Notification payload can include the created_at field or not.  I'd say
> not.
> > There will be a notification message generated when a migration starts so
> > subsequent progress messages don't need it, if the consumer wants the
> > complete picture they can call the API.
>
>
> As a side note if you are planning to add a new notification please
> consider
> aligning with the ongoing effort to make the notification payloads
> versioned. [1]
> Cheers,
> Gibi
>
> [1] https://blueprints.launchpad.net/nova/+spec/versioned-notification-api
> >
> >
> >
> > --
> > Paul Carlton
> > Software Engineer
> > Cloud Services
> > Hewlett Packard
> > BUK03:T242
> > Longdown Avenue
> > Stoke Gifford
> > Bristol BS34 8QZ
> >
> > Mobile:+44 (0)7768 994283
> > Email:mailto:paul.carlt...@hpe.com
> > Hewlett-Packard Limited registered Office: Cain Road, Bracknell, Berks
> RG12
> > 1HN Registered No: 690597 England.
> > The contents of this message and any attachments to it are confidential
> and
> > may be legally privileged. If you have received this message in error,
> you
> > should delete it from your system immediately and advise the sender. To
> any
> > recipient of this message within HP, unless otherwise stated you should
> > consider this message and attachments as "HP CONFIDENTIAL".
>
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] how do we get the migration status details info from nova

2015-11-26 Thread
Agreed.  Why not both.  I will use created_at to work out how long the
migration has been running.


Paul, thank you very much for the suggestion.

BR.
Shaohe Feng



2015-11-26 19:10 GMT+08:00 Paul Carlton :

> On 26/11/15 10:48, 少合冯 wrote:
>
> Now, we are agree on getting more migration status details info are
> useful.
>
> But How do we get them?
> By REST API or Notification?
>
>
> IF by API, does the  "time_elapsed" is needed?
> For there is a "created_at" field.
> But IMO, it is base on the time of the conductor server?
> The time_elapsed can get from libvirt, which from the hypervisor.
> Usually, there are ntp-server in the cloud. and we can get the
> time_elapsed by "created_at".
> but not sure there will be the case:
> the time of hypervisor and conductor server host are out of sync?
>
> Why not both.  Just update the _monitor_live_migration method in the
> libvirt
>  driver (and any similar functions in other drivers if they exist) so it
> updates
>  the migration object and also sends notification events.  These don't have
>  to be at 5 second intervals, although I think that is about right for the
> migration object update.  Notification messages could be once every 30
>  seconds or so.
>
> Operators can monitor the progress via the API and orchestration utilities
>  to consume the notification messages (and/or use API).
> This will enable them to identify migration operations that are not making
>  good progress and take actions to address the issue.
>
> The created_at and updated_at fields of the migration object should be
> sufficient to allow the caller to work out how long the migration has been
> running for (or how long it took in the case of a completed migration).
>
> Notification payload can include the created_at field or not.  I'd say not.
> There will be a notification message generated when a migration starts
> so subsequent progress messages don't need it, if the consumer wants
> the complete picture they can call the API.
>
>
> --
> Paul Carlton
> Software Engineer
> Cloud Services
> Hewlett Packard
> BUK03:T242
> Longdown Avenue
> Stoke Gifford
> Bristol BS34 8QZ
>
> Mobile:+44 (0)7768 994283
> Email:mailto:paul.carlt...@hpe.com 
> Hewlett-Packard Limited registered Office: Cain Road, Bracknell, Berks RG12 
> 1HN Registered No: 690597 England.
> The contents of this message and any attachments to it are confidential and 
> may be legally privileged. If you have received this message in error, you 
> should delete it from your system immediately and advise the sender. To any 
> recipient of this message within HP, unless otherwise stated you should 
> consider this message and attachments as "HP CONFIDENTIAL".
>
>
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] [RFC] how to enable xbzrle compress for live migration

2015-11-26 Thread
Hi all,
We want to support xbzrle compression for live migration.

There are currently 3 options:
1. Add an enable flag in nova.conf,
such as a dedicated 'live_migration_compression=on|off' parameter,
and have nova simply turn compression on.
This does not seem good enough.
2. Add a parameter to the live migration API.

A new optional "compress" array would be added; the json-schema is as below:

  {
'type': 'object',
'properties': {
  'os-migrateLive': {
'type': 'object',
'properties': {
  'block_migration': parameter_types.boolean,
  'disk_over_commit': parameter_types.boolean,
  'compress': {
'type': 'array',
'items': ["xbzrle"],
  },
  'host': host
},
'additionalProperties': False,
  },
},
'required': ['os-migrateLive'],
'additionalProperties': False,
  }
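
For example, a request under this option might look like the following
(hedged: "compress" is only the parameter proposed here, not an accepted API):

  POST /v2.1/{tenant_id}/servers/{server_id}/action

  {
      "os-migrateLive": {
          "host": "target-host",
          "block_migration": false,
          "disk_over_commit": false,
          "compress": ["xbzrle"]
      }
  }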


3.  Dynamically choose when to activate xbzrle compression for a live migration.
 This is the best option.
 xbzrle really wants to be used when the network is not able to keep up
 with the dirtying rate of the guest RAM.
 But how do I check whether an upcoming migration fits this situation?
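
One possible heuristic is sketched below. It assumes libvirt exposes
memory_dirty_rate in jobStats() (newer libvirt only) and that the driver knows
the usable migration bandwidth; neither assumption always holds:

PAGE_SIZE = 4096  # bytes; the guest page size is an assumption here

def network_cannot_keep_up(dom, migration_bandwidth_bytes_per_sec):
    stats = dom.jobStats()
    dirty_pps = stats.get('memory_dirty_rate')   # pages per second
    if dirty_pps is None:
        return False                             # no data, don't guess
    return dirty_pps * PAGE_SIZE > migration_bandwidth_bytes_per_sec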


REF:
https://review.openstack.org/#/c/248465/


BR
Shaohe Feng
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [RFC] how to enable xbzrle compress for live migration

2015-11-27 Thread
2015-11-27 2:19 GMT+08:00 Daniel P. Berrange :

> On Thu, Nov 26, 2015 at 05:39:04PM +, Daniel P. Berrange wrote:
> > On Thu, Nov 26, 2015 at 11:55:31PM +0800, 少合冯 wrote:
> > > 3.  dynamically choose when to activate xbzrle compress for live
> migration.
> > >  This is the best.
> > >  xbzrle really wants to be used if the network is not able to keep
> up
> > > with the dirtying rate of the guest RAM.
> > >  But how do I check the coming migration fit this situation?
> >
> > FWIW, if we decide we want compression support in Nova, I think that
> > having the Nova libvirt driver dynamically decide when to use it is
> > the only viable approach. Unfortunately the way the QEMU support
> > is implemented makes it very hard to use, as QEMU forces you to decide
> > to use it upfront, at a time when you don't have any useful information
> > on which to make the decision :-(  To be useful IMHO, we really need
> > the ability to turn on compression on the fly for an existing active
> > migration process. ie, we'd start migration off and let it run and
> > only enable compression if we encounter problems with completion.
> > Sadly we can't do this with QEMU as it stands today :-(
> >
>
[Shaohe Feng]
Adding more people who work on the kernel/hypervisor to the loop.
I wonder whether there will be any good solution to improve this in QEMU in
the future.


> > Oh and of course we still need to address the issue of RAM usage and
> > communicating that need with the scheduler in order to avoid OOM
> > scenarios due to large compression cache.
> >
> > I tend to feel that the QEMU compression code is currently broken by
> > design and needs rework in QEMU before it can be pratically used in
> > an autonomous fashion :-(
>
> Actually thinking about it, there's not really any significant
> difference between Option 1 and Option 3. In both cases we want
> a nova.conf setting live_migration_compression=on|off to control
> whether we want to *permit* use  of compression.
>
> The only real difference between 1 & 3 is whether migration has
> compression enabled always, or whether we turn it on part way
> though migration.
>
> So although option 3 is our desired approach (which we can't
> actually implement due to QEMU limitations), option 1 could
> be made fairly similar if we start off with a very small
> compression cache size which would have the effect of more or
> less disabling compression initially.
>
> We already have logic in the code for dynamically increasing
> the max downtime value, which we could mirror here
>
> eg something like
>
>  live_migration_compression=on|off
>
>   - Whether to enable use of compression
>
>  live_migration_compression_cache_ratio=0.8
>
>   - The maximum size of the compression cache relative to
> the guest RAM size. Must be less than 1.0
>
>  live_migration_compression_cache_steps=10
>
>   - The number of steps to take to get from initial cache
> size to the maximum cache size
>
>  live_migration_compression_cache_delay=75
>
>   - The time delay in seconds between increases in cache
> size
>
>
> In the same way that we do with migration downtime, instead of
> increasing cache size linearly, we'd increase it in ever larger
> steps until we hit the maximum. So we'd start off fairly small
> a few MB, and monitoring the cache hit rates, we'd increase it
> periodically.  If the number of steps configured and time delay
> between steps are reasonably large, that would have the effect
> that most migrations would have a fairly small cache and would
> complete without needing much compression overhead.
>
> Doing this though, we still need a solution to the host OOM scenario
> problem. We can't simply check free RAM at start of migration and
> see if there's enough to spare for compression cache, as the schedular
> can spawn a new guest on the compute host at any time, pushing us into
> OOM. We really need some way to indicate that there is a (potentially
> very large) extra RAM overhead for the guest during migration.
>
> ie if live_migration_compression_cache_ratio is 0.8 and we have a
> 4 GB guest, we need to make sure the schedular knows that we are
> potentially going to be using 7.2 GB of memory during migration
>
>
[Shaohe Feng]
These suggestions sound good.
Thank you, Daniel.

Do we also need to consider this factor:
  XBZRLE compression only kicks in after the bulk stage. During the bulk
  stage we could measure the transfer rate, and if the rate falls below a
  certain threshold we could start with a bigger cache size.
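
A sketch of the stepped cache growth Daniel describes above, reusing his
proposed (hypothetical, not merged) option values; migrateSetCompressionCache()
is a real libvirt-python call:

import time

def step_compression_cache(dom, guest_ram_bytes,
                           cache_ratio=0.8, steps=10, delay=75):
    # Start with a small cache and grow it in ever larger steps up to
    # cache_ratio * guest RAM, mirroring the existing max-downtime stepping.
    # A real implementation would also watch compression_cache_misses from
    # jobStats() before deciding to grow.
    max_cache = int(guest_ram_bytes * cache_ratio)
    size = max(max_cache >> steps, 8 * 1024 * 1024)
    while size < max_cache:
        dom.migrateSetCompressionCache(size)
        time.sleep(delay)
        size *= 2
    dom.migrateSetCompressionCache(max_cache)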




> Regards,
> Daniel
> --
> |: http://berrange.com  

Re: [openstack-dev] [nova] [RFC] how to enable xbzrle compress for live migration

2015-11-27 Thread
2015-11-27 19:49 GMT+08:00 Daniel P. Berrange :

> On Fri, Nov 27, 2015 at 07:37:50PM +0800, 少合冯 wrote:
> > 2015-11-27 2:19 GMT+08:00 Daniel P. Berrange :
> >
> > > On Thu, Nov 26, 2015 at 05:39:04PM +, Daniel P. Berrange wrote:
> > > > On Thu, Nov 26, 2015 at 11:55:31PM +0800, 少合冯 wrote:
> > > > > 3.  dynamically choose when to activate xbzrle compress for live
> > > migration.
> > > > >  This is the best.
> > > > >  xbzrle really wants to be used if the network is not able to
> keep
> > > up
> > > > > with the dirtying rate of the guest RAM.
> > > > >  But how do I check the coming migration fit this situation?
> > > >
> > > > FWIW, if we decide we want compression support in Nova, I think that
> > > > having the Nova libvirt driver dynamically decide when to use it is
> > > > the only viable approach. Unfortunately the way the QEMU support
> > > > is implemented makes it very hard to use, as QEMU forces you to
> decide
> > > > to use it upfront, at a time when you don't have any useful
> information
> > > > on which to make the decision :-(  To be useful IMHO, we really need
> > > > the ability to turn on compression on the fly for an existing active
> > > > migration process. ie, we'd start migration off and let it run and
> > > > only enable compression if we encounter problems with completion.
> > > > Sadly we can't do this with QEMU as it stands today :-(
> > > >
> > >
> > [Shaohe Feng]
> > Add more guys working on kernel/hypervisor in our loop.
> > Wonder whether there will be any good solutions to improve it in QEMU in
> > future.
> >
>
IMHO, it should be possible to enable XBZRLE on the fly for an existing active
migration process.
That would need an improvement in QEMU.



> >
> > > > Oh and of course we still need to address the issue of RAM usage and
> > > > communicating that need with the scheduler in order to avoid OOM
> > > > scenarios due to large compression cache.
> > > >
> > > > I tend to feel that the QEMU compression code is currently broken by
> > > > design and needs rework in QEMU before it can be pratically used in
> > > > an autonomous fashion :-(
> > >
> > > Actually thinking about it, there's not really any significant
> > > difference between Option 1 and Option 3. In both cases we want
> > > a nova.conf setting live_migration_compression=on|off to control
> > > whether we want to *permit* use  of compression.
> > >
> > > The only real difference between 1 & 3 is whether migration has
> > > compression enabled always, or whether we turn it on part way
> > > though migration.
> > >
> > > So although option 3 is our desired approach (which we can't
> > > actually implement due to QEMU limitations), option 1 could
> > > be made fairly similar if we start off with a very small
> > > compression cache size which would have the effect of more or
> > > less disabling compression initially.
> > >
> > > We already have logic in the code for dynamically increasing
> > > the max downtime value, which we could mirror here
> > >
> > > eg something like
> > >
> > >  live_migration_compression=on|off
> > >
> > >   - Whether to enable use of compression
> > >
> > >  live_migration_compression_cache_ratio=0.8
> > >
> > >   - The maximum size of the compression cache relative to
> > > the guest RAM size. Must be less than 1.0
> > >
> > >  live_migration_compression_cache_steps=10
> > >
> > >   - The number of steps to take to get from initial cache
> > > size to the maximum cache size
> > >
> > >  live_migration_compression_cache_delay=75
> > >
> > >   - The time delay in seconds between increases in cache
> > > size
> > >
> > >
> > > In the same way that we do with migration downtime, instead of
> > > increasing cache size linearly, we'd increase it in ever larger
> > > steps until we hit the maximum. So we'd start off fairly small
> > > a few MB, and monitoring the cache hit rates, we'd increase it
> > > periodically.  If the number of steps configured and time delay
> > > between steps are reasonably large, that would have the effect
> > > that most migrations would have a fairly small cache 

Re: [openstack-dev] [nova] [RFC] how to enable xbzrle compress for live migration

2015-11-29 Thread
2015-11-30 14:45 GMT+08:00 Koniszewski, Pawel :

> > -Original Message-
> > From: Murray, Paul (HP Cloud) [mailto:pmur...@hpe.com]
> > Sent: Friday, November 27, 2015 4:29 PM
> > To: Daniel P. Berrange; Carlton, Paul (Cloud Services)
> > Cc: 少合冯; OpenStack Development Mailing List (not for usage questions);
> > John Garbutt; Koniszewski, Pawel; Jin, Yuntong; Feng, Shaohe; Qiao,
> Liyong
> > Subject: RE: [nova] [RFC] how to enable xbzrle compress for live
> migration
> >
> >
> >
> > > -Original Message-
> > > From: Daniel P. Berrange [mailto:berra...@redhat.com]
> > > Sent: 26 November 2015 17:58
> > > To: Carlton, Paul (Cloud Services)
> > > Cc: 少合冯; OpenStack Development Mailing List (not for usage
> > questions);
> > > John Garbutt; pawel.koniszew...@intel.com; yuntong@intel.com;
> > > shaohe.f...@intel.com; Murray, Paul (HP Cloud); liyong.q...@intel.com
> > > Subject: Re: [nova] [RFC] how to enable xbzrle compress for live
> > > migration
> > >
> > > On Thu, Nov 26, 2015 at 05:49:50PM +, Paul Carlton wrote:
> > > > Seems to me the prevailing view is that we should get live migration
> > > > to figure out the best setting for itself where possible.  There was
> > > > discussion of being able have a default policy setting that will
> > > > allow the operator to define balance between speed of migration and
> > > > impact on the instance.  This could be a global default for the
> > > > cloud with overriding defaults per aggregate, image, tenant and
> > > > instance as well as the ability to vary the setting during the
> migration
> > operation.
> > > >
> > > > Seems to me that items like compression should be set in
> > > > configuration files based on what works best given the cloud
> operator's
> > environment?
> > >
> > > Merely turning on use of compression is the "easy" bit - there needs
> > > to be a way to deal with compression cache size allocation, which
> > > needs to have some smarts in Nova, as there's no usable "one size fits
> > > all" value for the compression cache size. If we did want to hardcode
> > > a compression cache size, you'd have to pick set it as a scaling
> factor against
> > the guest RAM size.
> > > This is going to be very heavy on memory usage, so there needs careful
> > > design work to solve the problem of migration compression triggering
> > > host OOM scenarios, particularly since we can have multiple concurrent
> > > migrations.
> > >
> >
> >
> > Use cases for live migration generally fall into two types:
> >
> > 1. I need to empty the host (host maintenance/reboot)
> >
> > 2. I generally want to balance load on the cloud
> >
> > The first case is by far the most common need right now and in that case
> the
> > node gets progressively more empty as VMs are moved off. So the resources
> > available for caching etc. grow as the process goes on.
>
> I'd rather say that these resources might shrink. You need to turn off one
> compute node, stack more VMs on remaining compute nodes and you need to
> allocate cache on both sides, source and destination.
>

Why do we need a cache on the destination?

>
> > The second case is less likely to be urgent from the operators point of
> view,
> > so doing things more slowly may not be a problem.
> >
> > So looking at how much resource is available at the start of a migration
> and
> > deciding then what to do on a per VM basis is probably not a bad idea.
> > Especially if we can differentiate between the two cases.
> >
> >
> > > Regards,
> > > Daniel
> > > --
> > > |: http://berrange.com  -o-
> http://www.flickr.com/photos/dberrange/
> > :|
> > > |: http://libvirt.org  -o-
> http://virt-manager.org :|
> > > |: http://autobuild.org   -o-
> http://search.cpan.org/~danberr/ :|
> > > |: http://entangle-photo.org   -o-
> http://live.gnome.org/gtk-vnc :|
>
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [RFC] how to enable xbzrle compress for live migration

2015-11-30 Thread
2015-11-30 16:19 GMT+08:00 Koniszewski, Pawel :

> 2015-11-30 14:45 GMT+08:00 Koniszewski, Pawel  >:
> > -Original Message-
> > From: Murray, Paul (HP Cloud) [mailto:pmur...@hpe.com]
> > Sent: Friday, November 27, 2015 4:29 PM
> > To: Daniel P. Berrange; Carlton, Paul (Cloud Services)
> > Cc: 少合冯; OpenStack Development Mailing List (not for usage questions);
> > John Garbutt; Koniszewski, Pawel; Jin, Yuntong; Feng, Shaohe; Qiao,
> Liyong
> > Subject: RE: [nova] [RFC] how to enable xbzrle compress for live
> migration
> >
> >
> >
> > > -Original Message-
> > > From: Daniel P. Berrange [mailto:berra...@redhat.com]
> > > Sent: 26 November 2015 17:58
> > > To: Carlton, Paul (Cloud Services)
> > > Cc: 少合冯; OpenStack Development Mailing List (not for usage
> > questions);
> > > John Garbutt; pawel.koniszew...@intel.com; yuntong@intel.com;
> > > shaohe.f...@intel.com; Murray, Paul (HP Cloud); liyong.q...@intel.com
> > > Subject: Re: [nova] [RFC] how to enable xbzrle compress for live
> > > migration
> > >
> > > On Thu, Nov 26, 2015 at 05:49:50PM +, Paul Carlton wrote:
> > > > Seems to me the prevailing view is that we should get live migration
> > > > to figure out the best setting for itself where possible.  There was
> > > > discussion of being able have a default policy setting that will
> > > > allow the operator to define balance between speed of migration and
> > > > impact on the instance.  This could be a global default for the
> > > > cloud with overriding defaults per aggregate, image, tenant and
> > > > instance as well as the ability to vary the setting during the
> migration
> > operation.
> > > >
> > > > Seems to me that items like compression should be set in
> > > > configuration files based on what works best given the cloud
> operator's
> > environment?
> > >
> > > Merely turning on use of compression is the "easy" bit - there needs
> > > to be a way to deal with compression cache size allocation, which
> > > needs to have some smarts in Nova, as there's no usable "one size fits
> > > all" value for the compression cache size. If we did want to hardcode
> > > a compression cache size, you'd have to pick set it as a scaling factor
> against
> > the guest RAM size.
> > > This is going to be very heavy on memory usage, so there needs careful
> > > design work to solve the problem of migration compression triggering
> > > host OOM scenarios, particularly since we can have multiple concurrent
> > > migrations.
> > >
> >
> >
> > Use cases for live migration generally fall into two types:
> >
> > 1. I need to empty the host (host maintenance/reboot)
> >
> > 2. I generally want to balance load on the cloud
> >
> > The first case is by far the most common need right now and in that case
> the
> > node gets progressively more empty as VMs are moved off. So the resources
> > available for caching etc. grow as the process goes on.
> >I'd rather say that these resources might shrink. You need to turn off one
> compute node, stack more VMs on remaining compute nodes and you need to
> allocate cache on both sides, source and destination.
>
> >why do we need on destination?
>
> XBZRLE sends only a delta over network and it works in two phases:
> compressing and decompressing. During compression the original page and
> updated page are XORed together and resulting information is passed over to
> the RLE algorithm - the output is the delta page which is sent over network
> to destination host. During decompression run length decodes each pair of
> symbol-counter and the original page is XORed with the result from the run
> length decoding - the output is the updated page. It means that it needs to
> allocate cache on source and destination node.
>

But I think the page in RAM on the destination already is the original page,
and decompression just applies the delta to it.

So the destination does not need an extra cache.
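
For what it's worth, here is a toy illustration of that point (a
simplification; real XBZRLE also run-length-encodes the XOR result, but the
asymmetry is the same: only the source needs a cached copy of the previously
sent page):

def make_delta(cached_old_page, new_page):      # source: needs the cache
    return bytes(a ^ b for a, b in zip(cached_old_page, new_page))

def apply_delta(current_page, delta):           # destination: no extra cache
    return bytes(a ^ b for a, b in zip(current_page, delta))

old = bytes(4096)                       # page content as previously sent
new = bytearray(old)
new[0] = 0xff                           # guest dirtied one byte
delta = make_delta(old, bytes(new))     # source XORs against its cache
assert apply_delta(old, delta) == bytes(new)   # dest XORs against its own RAM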


> > The second case is less likely to be urgent from the operators point of
> view,
> > so doing things more slowly may not be a problem.
> >
> > So looking at how much resource is available at the start of a migration
> and
> > deciding then what to do on a per VM basis is probably not a bad idea.
> > Especially if we can differentiate between the two cases.
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova][RFC] delete instance device

2015-11-30 Thread
Hi all,

I'd like to talk about how nova detaches devices from instances.

Here is the libvirt doc string describing the underlying function,
detachDeviceFlags:
http://paste.openstack.org/show/480330/

It says:

    detaching a device from a running domain may be asynchronous.

and it suggests:

    To check whether the device was successfully removed, either recheck domain
    configuration using virDomainGetXMLDesc() or add handler for
    VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED


Also Daniel elaborated it, and gave us more some scenarios about it.

it is not guaranteed to succeed. What happens is that the hypervisor
injects an ACPI request to unplug the device. The guest OS must co-operate
by releasing the device, before the hypervisor will complete the action of
physically removing it. So you require a guest OS that supports ACPI unplug
of course, and if the guest is crashed or being malicious there is no
guarantee the unplug will succeed. Libvirt will wait a short while for
success, but you must monitor for libvirt events to see if/when it finally
completes. This delayed release has implications for when Nova can mark the
PCI device as unused and available for other guests to assign.


Now I have checked the code; both detach volume and detach interface call
guest.detach_device():

    nova/virt/libvirt/driver.py:1220
        guest.detach_device(conf, persistent=True, live=live)
    nova/virt/libvirt/driver.py:1280
        guest.detach_device(cfg, persistent=True, live=live)
    nova/virt/libvirt/driver.py:3016  (_detach_pci_devices)
        guest.detach_device(self._get_guest_pci_device(dev), live=True)
    nova/virt/libvirt/driver.py:3105  (_detach_sriov_ports)
        guest.detach_device(cfg, live=True)

And for detach_interface in nova/compute/manager.py:

@wrap_exception()
@wrap_instance_fault
def detach_interface(self, context, instance, port_id):
    """Detach an network adapter from an instance."""
    network_info = instance.info_cache.network_info
    condemned = None
    for vif in network_info:
        if vif['id'] == port_id:
            condemned = vif
            break
    if condemned is None:
        raise exception.PortNotFound(_("Port %s is not "
                                       "attached") % port_id)
    try:
        self.driver.detach_interface(instance, condemned)
    except exception.NovaException as ex:
        LOG.warning(_LW("Detach interface failed, port_id=%(port_id)s,"
                        " reason: %(msg)s"),
                    {'port_id': port_id, 'msg': ex}, instance=instance)
        raise exception.InterfaceDetachFailed(instance_uuid=instance.uuid)
    else:
        try:
            self.network_api.deallocate_port_for_instance(
                context, instance, port_id)
        except Exception as ex:
            with excutils.save_and_reraise_exception():
                # Since this is a cast operation, log the failure for
                # triage.
                LOG.warning(_LW('Failed to deallocate port %(port_id)s '
                                'for instance. Error: %(error)s'),
                            {'port_id': port_id, 'error': ex},
                            instance=instance)



It just calls detach_interface and never double checks that the device was
actually detached.

Now I am going to add support for detaching SR-IOV devices:
https://review.openstack.org/#/c/139910/
I'm not sure whether I need to double check that the device is finally detached.

If yes, what should I implement?

3 options:
1. Just ignore it and keep the nova code as it is.

2. Sync check,
as libvirt suggests: use virDomainGetXMLDesc().

    def detach_interface(self, context, instance, port_id):
        self.driver.detach_interface(instance, condemned)

        # just pseudo-code: poll the domain XML until the device is gone
        # (device_still_in_xml is a placeholder for parsing the XML)
        for i in range(50):
            if device_still_in_xml(virDomainGetXMLDesc()):
                time.sleep(1)
            else:
                break
        else:
            raise exception.InterfaceDetachFailed(
                instance_uuid=instance.uuid)

        self.network_api.deallocate_port_for_instance(
            context, instance, port_id)



3. Async notification,
as libvirt suggests:

Add an event handler for VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED.

Call network_api.deallocate_port_for_instance in a backend task.
The backend receives the event result from the event handler over AMQP and
filters on it, checking that the removed device is the expected interface
device rather than a volume device.

The backend then calls network_api.deallocate_port_for_instance to deallocate
the port.
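
A sketch of the libvirt side of option 3 (the event API calls are real
libvirt-python; the AMQP hand-off, the interface-vs-volume filtering and
running the event loop in its own thread are all omitted):

import libvirt

def device_removed_cb(conn, dom, dev_alias, opaque):
    # dev_alias identifies the detached device; a real handler would check
    # it matches the expected interface before deallocating the port.
    print('device %s removed from %s' % (dev_alias, dom.name()))

libvirt.virEventRegisterDefaultImpl()
conn = libvirt.open('qemu:///system')
conn.domainEventRegisterAny(None,
                            libvirt.VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED,
                            device_removed_cb, None)
while True:
    libvirt.virEventRunDefaultImpl()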


I have not checked volume detach; I am not sure whether it has the same issues.

Besides this issue, note that the libvirt doc string also says:

hypervisors may prevent this operation if there is a current block copy
operation on the device being detached;
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsub

[openstack-dev] [nova][RFC] A base patch for live migration(list/show/cancel/pause)

2015-12-21 Thread
I have noticed that:

https://review.openstack.org/#/c/258771/  (list/show migration)
https://review.openstack.org/#/c/245921/   (pause migration)

both introduce a new controller: ServerMigrationsController.
As we all know, the upcoming cancel patch will also need it.

The above patches are not small, so it will take some time for them to be
reviewed well.

To avoid duplicated code, and to avoid long dependency chains between patches,
I suggest splitting the ServerMigrationsController out as a separate, simple
base patch that defines list/show/cancel/pause as HTTPNotImplemented, as in
this paste:
http://paste.openstack.org/show/482387/
It is a good way to synchronize our work.
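
The skeleton in that paste is along these lines (a sketch only; the route and
extension plumbing and the exact decorators are approximate, not merged code):

import webob.exc

from nova.api.openstack import wsgi


class ServerMigrationsController(wsgi.Controller):
    """The server migrations API controller for the OpenStack API."""

    def index(self, req, server_id):
        raise webob.exc.HTTPNotImplemented()

    def show(self, req, server_id, id):
        raise webob.exc.HTTPNotImplemented()

    def delete(self, req, server_id, id):        # cancel a live migration
        raise webob.exc.HTTPNotImplemented()

    @wsgi.action('force_complete')               # pause/force-complete
    def _force_complete(self, req, server_id, id, body):
        raise webob.exc.HTTPNotImplemented()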

Note: both the list and the pause migration patches define:
ALIAS = 'os-server-migrations'
and the policy:
"os_compute_api:os-server-migrations:force_complete": "rule:admin_or_owner",


Neither is right.

The policy should be 'os_compute_api:servers:migrations:show', and the default
permission should be admin only.

Please see this spec:
https://review.openstack.org/#/c/255122/8/specs/mitaka/approved/live-migration-progress-report.rst

Keep in mind that migrations are a sub-collection of servers,
and only admins care about migrations;
an owner should not care which host their VM is on.

BR
Shaohe Feng
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] request spec freeze exception for Attach/Detach SR-IOV interface

2015-01-12 Thread
Hello,

I'd like to request an exception for Attach/Detach SR-IOV interface
feature. [1]
This is an important feature that aims to provide better performance than a
normal network interface in guests, and it is not too hard to implement.

Thanks,
Shao He, Feng

[1] https://review.openstack.org/#/c/139910/

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



Re: [openstack-dev] [nova] request spec freeze exception for Attach/Detach SR-IOV interface

2015-01-12 Thread
2015-01-13 13:57 GMT+08:00 少合冯 :

> Hello,
>
> I'd like to request an exception for Attach/Detach SR-IOV interface
> feature. [1]
> This is an important feature that aims to improve better performance than
> normal
> network interface in guests and not too hard to implement.
>
> Thanks,
> Shao He, Feng
>
> [1] https://review.openstack.org/#/c/139910/
> <https://review.openstack.org/#/c/128825>
>


Sorry, the link above is wrong.

This is the right one:
[1] https://review.openstack.org/#/c/139910/
Thanks.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova][cell] Can nova do live migration cross cells?

2016-01-08 Thread
Hi, all

Now I'm working on the migrations list/show API.
The new API defines migrations as a sub-collection of an instance:
GET   v2.1/tenant_id/servers/server-id/migrations/migration-id

I need to support a new cell API to get a migration of a specified instance:
get_migration_by_instance_and_id


I have read the nova cells doc:
http://docs.openstack.org/developer/nova/cells.html

It tells us that the migrations table will be API-level,
while the instances table is cell-level.
So does that mean that we can do live migration across cells?


This will also affect my code.
If the migrations table is API-level, then I do not need a cell name to get
the migration info from the DB.
Otherwise I need to get the cell name from the specified instance and then
fetch the migration info via _TargetedMessage.


BR
ShaoHe Feng
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][cell] Can nova do live migration cross cells?

2016-01-08 Thread
2016-01-08 21:53 GMT+08:00 Andrew Laski :

> On 01/08/16 at 12:33pm, John Garbutt wrote:
>
>> On 8 January 2016 at 08:17, 少合冯  wrote:
>>
>>> Hi, all
>>>
>>> Now I'm working on the migrations list/show API.
>>> The new api defines  migrations is a sub-collection of an instance.
>>> GET   v2.1/tenant_id/servers/server-id/migrations/migration-id
>>>
>>> I need to support  the new cell API to get a migration of a specified
>>> instance.
>>> get_migration_by_instance_and_id
>>>
>>> I read up the nova cell doc.
>>> http://docs.openstack.org/developer/nova/cells.html
>>>
>>> It tells us that, the migrations table will be API-level.
>>> The instances table is Cell-level.
>>> So does that means that we can do live migration cross cells?
>>>
>>
>> In sort, you can't currently live-migrate between cells.
>>
>> So there is the current cells v1. When using cells v1 you will never
>> be able to move between cells. In addition, when using nova-network,
>> network assumptions generally mean IPs can't move between cells.
>>
>
> Assuming a global network the limitation is just within Nova.  There is no
> mechanism for moving instance data from one cell database to another.  And
> it's not something that will be added since v1 is in a freeze.
>
>
>> In cells v2, we were talking about ways that could happen. Its still
>> hard, because you would need to copy the instance record between one
>> cell database to another, with the API understanding how the move is
>> going. Complexity best avoided, if possible.
>>
>
> Cells v2 makes some architecture choices that will make it easier to
> accomplish this.  But it's not necessary to have for cells v2 so it's not
> likely to be in place initially.
>
>
>> And also this will affect my code.
>>> If the migrations is API-level, so I do not need a cell name to get the
>>> migrations info from DB.
>>>
>>> Or I need to get the cell name by the specified instance, and then get
>>> the
>>> migrations info by _TargetedMessage.
>>>
>>
>> Afraid I don't have enough context from the rest of your email to
>> answer these questions.
>>
>
> I'm not entirely clear on the question either, but if the migration is in
> the api level database then you do not need a cell name to get the
> migration from the db.  The cell name is only used to route a request to a
> cell database.
>
Thanks very much.
Sorry that my question was not clear,
but you have given me the answer.

>
>> Thanks,
>> John
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][cell] Can nova do live migration cross cells?

2016-01-08 Thread
Thank you very much.

Andrew Laski has already answered my second question.

2016-01-08 20:33 GMT+08:00 John Garbutt :

> On 8 January 2016 at 08:17, 少合冯  wrote:
> > Hi, all
> >
> > Now I'm working on the migrations list/show API.
> > The new api defines  migrations is a sub-collection of an instance.
> > GET   v2.1/tenant_id/servers/server-id/migrations/migration-id
> >
> > I need to support  the new cell API to get a migration of a specified
> > instance.
> > get_migration_by_instance_and_id
> >
> > I read up the nova cell doc.
> > http://docs.openstack.org/developer/nova/cells.html
> >
> > It tells us that, the migrations table will be API-level.
> > The instances table is Cell-level.
> > So does that means that we can do live migration cross cells?
>
> In sort, you can't currently live-migrate between cells.
>
> So there is the current cells v1. When using cells v1 you will never
> be able to move between cells. In addition, when using nova-network,
> network assumptions generally mean IPs can't move between cells.
>
> In cells v2, we were talking about ways that could happen. Its still
> hard, because you would need to copy the instance record between one
> cell database to another, with the API understanding how the move is
> going. Complexity best avoided, if possible.
>
> > And also this will affect my code.
> > If the migrations is API-level, so I do not need a cell name to get the
> > migrations info from DB.
> >
> > Or I need to get the cell name by the specified instance, and then get
> the
> > migrations info by _TargetedMessage.
>
> Afraid I don't have enough context from the rest of your email to
> answer these questions.
>
> Thanks,
> John
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][cell] Can nova do live migration cross cells?

2016-01-08 Thread
I still have a question inline.
Thanks.

2016-01-08 21:53 GMT+08:00 Andrew Laski :

> On 01/08/16 at 12:33pm, John Garbutt wrote:
>
>> On 8 January 2016 at 08:17, 少合冯  wrote:
>>
>>> Hi, all
>>>
>>> Now I'm working on the migrations list/show API.
>>> The new api defines  migrations is a sub-collection of an instance.
>>> GET   v2.1/tenant_id/servers/server-id/migrations/migration-id
>>>
>>> I need to support  the new cell API to get a migration of a specified
>>> instance.
>>> get_migration_by_instance_and_id
>>>
>>> I read up the nova cell doc.
>>> http://docs.openstack.org/developer/nova/cells.html
>>>
>>> It tells us that, the migrations table will be API-level.
>>> The instances table is Cell-level.
>>> So does that means that we can do live migration cross cells?
>>>
>>
>> In sort, you can't currently live-migrate between cells.
>>
>> So there is the current cells v1. When using cells v1 you will never
>> be able to move between cells. In addition, when using nova-network,
>> network assumptions generally mean IPs can't move between cells.
>>
>
> Assuming a global network the limitation is just within Nova.  There is no
> mechanism for moving instance data from one cell database to another.  And
> it's not something that will be added since v1 is in a freeze.
>
>
>> In cells v2, we were talking about ways that could happen. Its still
>> hard, because you would need to copy the instance record between one
>> cell database to another, with the API understanding how the move is
>> going. Complexity best avoided, if possible.
>>
>
> Cells v2 makes some architecture choices that will make it easier to
> accomplish this.  But it's not necessary to have for cells v2 so it's not
> likely to be in place initially.
>
>
>> And also this will affect my code.
>>> If the migrations is API-level, so I do not need a cell name to get the
>>> migrations info from DB.
>>>
>>> Or I need to get the cell name by the specified instance, and then get
>>> the
>>> migrations info by _TargetedMessage.
>>>
>>
>> Afraid I don't have enough context from the rest of your email to
>> answer these questions.
>>
>
> I'm not entirely clear on the question either, but if the migration is in
> the api level database then you do not need a cell name to get the
> migration from the db.  The cell name is only used to route a request to a
> cell database.
>

I am still puzzled by the current nova code:
https://github.com/openstack/nova/blob/master/nova/cells/messaging.py#L1668
https://github.com/openstack/nova/blob/master/nova/cells/messaging.py#L1330

Why does get_migrations in the links above still need to route a request to a
cell database, or broadcast to all cell databases?


>
>> Thanks,
>> John
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Nova][API] Does the nova API allow the server_id param as a DB index?

2016-02-15 Thread
I guess others may ask the same questions.

I have read the nova API doc, for example this API:
http://developer.openstack.org/api-ref-compute-v2.1.html#showServer

GET /v2.1/{tenant_id}/servers/{server_id}
Show server details


Request parameters:
  Parameter  Style  Type        Description
  tenant_id  URI    csapi:UUID  The UUID of the tenant in a multi-tenancy cloud.
  server_id  URI    csapi:UUID  The UUID of the server.

But I can get the server by DB index:

curl -s -H X-Auth-Token:6b8968eb38df47c6a09ac9aee81ea0c6
http://192.168.2.103:8774/v2.1/f5a8829cc14c4825a2728b273aa91aa1/servers/2
{
"server": {
"OS-DCF:diskConfig": "MANUAL",
"OS-EXT-AZ:availability_zone": "nova",
"OS-EXT-SRV-ATTR:host": "shaohe1",
"OS-EXT-SRV-ATTR:hypervisor_hostname": "shaohe1",
"OS-EXT-SRV-ATTR:instance_name": "instance-0002",
"OS-EXT-STS:power_state": 1,
"OS-EXT-STS:task_state": "migrating",
"OS-EXT-STS:vm_state": "error",
"OS-SRV-USG:launched_at": "2015-12-18T07:41:00.00",
"OS-SRV-USG:terminated_at": null,
..
}
}

and the code really does allow using a DB index:
https://github.com/openstack/nova/blob/master/nova/compute/api.py#L1939
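
The behaviour in question boils down to roughly this (a simplified sketch of
the lookup the linked code performs, not a copy of it):

from oslo_utils import uuidutils

from nova import objects

def get_instance(context, server_id):
    if uuidutils.is_uuid_like(server_id):
        return objects.Instance.get_by_uuid(context, server_id)
    # fallback illustrating the reported behaviour: an integer-looking
    # server_id is treated as the instances table primary key
    return objects.Instance.get_by_id(context, int(server_id))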
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova][API] Does the nova API allow the server_id param as a DB index?

2016-02-15 Thread
I have filed a bug:
https://bugs.launchpad.net/openstack-api-site/+bug/1545922


Anne, Alex, and Ghanshyam Mann,
can this be raised in the API meeting for discussion?

I particularly care about my patch:
https://review.openstack.org/#/c/258771/12/nova/api/openstack/compute/server_migrations.py

Should it accept the server_id as a DB index?

BR
Shaohe Feng.


2016-02-16 10:46 GMT+08:00 GHANSHYAM MANN :

> Yes, currently Nova support that for show/update/delete server APIs etc
> (both v2 and v2.1) and python-novaclient too. But I think that was old
> behaviour and for ec2 API mainly?
>
> I searched on ec2 repo [1] and they get the instance from nova using UUID,
> i did not find any place they are fetching using id. But not sure if
> external interface directly fetch that on nova by 'id'.
>
> But apart from that, may be some users using 'id' instead of 'uuid' but
> that was not recommended or documented anywhere So in that case can we
> remove this old behaviour without version bump?
>
>
> [1].. https://github.com/openstack/ec2-api
>
> Regards
> Ghanshyam Mann
>
> On Tue, Feb 16, 2016 at 11:24 AM, Anne Gentle <
> annegen...@justwriteclick.com> wrote:
>
>>
>>
>> On Mon, Feb 15, 2016 at 6:03 PM, 少合冯  wrote:
>>
>>> I guess others may ask the same questions.
>>>
>>> I read the nova API doc:
>>> such as this API:
>>> http://developer.openstack.org/api-ref-compute-v2.1.html#showServer
>>>
>>> GET /v2.1/​{tenant_id}​/servers/​{server_id}​
>>> *Show server details*
>>>
>>>
>>> *Request parameters*
>>> ParameterStyleTypeDescription
>>> tenant_id URI csapi:UUID
>>>
>>> The UUID of the tenant in a multi-tenancy cloud.
>>> server_id URI csapi:UUID
>>>
>>> The UUID of the server.
>>>
>>> But I can get the server by DB index:
>>>
>>> curl -s -H X-Auth-Token:6b8968eb38df47c6a09ac9aee81ea0c6
>>> http://192.168.2.103:8774/v2.1/f5a8829cc14c4825a2728b273aa91aa1/servers/2
>>> {
>>> "server": {
>>> "OS-DCF:diskConfig": "MANUAL",
>>> "OS-EXT-AZ:availability_zone": "nova",
>>> "OS-EXT-SRV-ATTR:host": "shaohe1",
>>> "OS-EXT-SRV-ATTR:hypervisor_hostname": "shaohe1",
>>> "OS-EXT-SRV-ATTR:instance_name": "instance-0002",
>>> "OS-EXT-STS:power_state": 1,
>>> "OS-EXT-STS:task_state": "migrating",
>>> "OS-EXT-STS:vm_state": "error",
>>> "OS-SRV-USG:launched_at": "2015-12-18T07:41:00.00",
>>> "OS-SRV-USG:terminated_at": null,
>>> ..
>>> }
>>> }
>>>
>>> and the code really allow it use  DB index
>>> https://github.com/openstack/nova/blob/master/nova/compute/api.py#L1939
>>>
>>>
>> Nice find. Can you log this as an API bug and we'll triage it -- can even
>> help you fix it on the site if you like.
>>
>> https://bugs.launchpad.net/openstack-api-site/+filebug
>>
>> Basically, click that link, write a short summary, then copy and paste in
>> this email's contents, it has lots of good info.
>>
>> Let me know if you'd also like to fix the bug on the site.
>>
>> And hey nova team, if you think it's actually an API bug, we'll move it
>> over to you.
>>
>> Thanks for reporting it!
>> Anne
>>
>>
>>
>>>
>>> __
>>> OpenStack Development Mailing List (not for usage questions)
>>> Unsubscribe:
>>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>
>>>
>>
>>
>> --
>> Anne Gentle
>> Rackspace
>> Principal Engineer
>> www.justwriteclick.com
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Nova][Migration][RFC]: What are in-progress migrations?

2016-02-25 Thread
The current nova code defines it as follows:
https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L4535-L4546

That means: besides ['accepted', 'confirmed', 'reverted', 'error', 'failed'],
all other statuses are treated as in progress.
Note that this makes 'finished' count as in progress.
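
In other words, roughly this filter (a paraphrase of the linked query, shown
only to make the status list concrete):

NOT_IN_PROGRESS = ['accepted', 'confirmed', 'reverted', 'error', 'failed']

def in_progress(migrations):
    # anything whose status is not in the list above counts as in progress,
    # including 'finished'
    return [m for m in migrations if m['status'] not in NOT_IN_PROGRESS]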

John Garbutt has raised the same question in the code review.
https://review.openstack.org/#/c/258771/29/nova/db/sqlalchemy/api.py


There are two problems I want to discuss.

1. Should "finished" count as in progress?
Going by the literal meaning it should not,
so we should add it to the non-in-progress list.

Then we would not return "finished" migrations when users call
migration-index to fetch in-progress migrations.

But is that reasonable?
A user who performs a migration would get no information about it from
migration-index once it has finished.


2. I wonder what the difference is among "done", "completed" and "finished".
Using this command:
$ git grep "migration.*status"
I found the following migration statuses besides the non-in-progress ones:
done, post-migrating, preparing, queued, completed, accepted, finished,
running.

The current migration.status definitions are hard to read, so I have filed a
bug: https://bugs.launchpad.net/nova/+bug/1549558
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] [Cyborg] Tracking multiple functions

2018-03-20 Thread
2018-03-07 10:36 GMT+08:00 Alex Xu :

>
>
> 2018-03-07 10:21 GMT+08:00 Alex Xu :
>
>>
>>
>> 2018-03-06 22:45 GMT+08:00 Mooney, Sean K :
>>
>>>
>>>
>>>
>>>
>>> *From:* Matthew Booth [mailto:mbo...@redhat.com]
>>> *Sent:* Saturday, March 3, 2018 4:15 PM
>>> *To:* OpenStack Development Mailing List (not for usage questions) <
>>> openstack-dev@lists.openstack.org>
>>> *Subject:* Re: [openstack-dev] [Nova] [Cyborg] Tracking multiple
>>> functions
>>>
>>>
>>>
>>> On 2 March 2018 at 14:31, Jay Pipes  wrote:
>>>
>>> On 03/02/2018 02:00 PM, Nadathur, Sundar wrote:
>>>
>>> Hello Nova team,
>>>
>>>  During the Cyborg discussion at Rocky PTG, we proposed a flow for
>>> FPGAs wherein the request spec asks for a device type as a resource class,
>>> and optionally a function (such as encryption) in the extra specs. This
>>> does not seem to work well for the usage model that I’ll describe below.
>>>
>>> An FPGA device may implement more than one function. For example, it may
>>> implement both compression and encryption. Say a cluster has 10 devices of
>>> device type X, and each of them is programmed to offer 2 instances of
>>> function A and 4 instances of function B. More specifically, the device may
>>> implement 6 PCI functions, with 2 of them tied to function A, and the other
>>> 4 tied to function B. So, we could have 6 separate instances accessing
>>> functions on the same device.
>>>
>>>
>>>
>>> Does this imply that Cyborg can't reprogram the FPGA at all?
>>>
>>> *[Mooney, Sean K] Cyborg is intended to support fixed-function
>>> accelerators too, so it will not always be able to program the accelerator.
>>> In the case where an FPGA is preprogrammed with a multi-function bitstream
>>> that is statically provisioned, Cyborg will not be able to reprogram the
>>> slot if any of the functions from that slot are already allocated to an
>>> instance. In that case it will have to treat it like a fixed-function
>>> device and simply allocate an unused VF of the correct type, if available.*
>>>
>>>
>>>
>>>
>>>
>>> In the current flow, the device type X is modeled as a resource class,
>>> so Placement will count how many of them are in use. A flavor for ‘RC
>>> device-type-X + function A’ will consume one instance of the RC
>>> device-type-X.  But this is not right because this precludes other
>>> functions on the same device instance from getting used.
>>>
>>> One way to solve this is to declare functions A and B as resource
>>> classes themselves and have the flavor request the function RC. Placement
>>> will then correctly count the function instances. However, there is still a
>>> problem: if the requested function A is not available, Placement will
>>> return an empty list of RPs, but we need some way to reprogram some device
>>> to create an instance of function A.
>>>
>>>
>>> Clearly, nova is not going to be reprogramming devices with an instance
>>> of a particular function.
>>>
>>> Cyborg might need to have a separate agent that listens to the nova
>>> notifications queue and upon seeing an event that indicates a failed build
>>> due to lack of resources, then Cyborg can try and reprogram a device and
>>> then try rebuilding the original request.
>>>
>>>
>>>
>>> It was my understanding from that discussion that we intend to insert
>>> Cyborg into the spawn workflow for device configuration in the same way
>>> that we currently insert resources provided by Cinder and Neutron. So while
>>> Nova won't be reprogramming a device, it will be calling out to Cyborg to
>>> reprogram a device, and waiting while that happens.
>>>
>>> My understanding is (and I concede some areas are a little hazy):
>>>
>>> * The flavors says device type X with function Y
>>>
>>> * Placement tells us everywhere with device type X
>>>
>>> * A weigher orders these by devices which already have an available
>>> function Y (where is this metadata stored?)
>>>
>>> * Nova schedules to host Z
>>>
>>> * Nova host Z asks cyborg for a local function Y and blocks
>>>
>>>   * Cyborg hopefully returns function Y which is already available
>>>
>>>   * If not, Cyborg reprograms a function Y, then returns it
>>>
>>> Can anybody correct me/fill in the gaps?
>>>
>>> *[Mooney, Sean K] That correlates closely with my recollection as well. As
>>> for the metadata, I think the weigher may need to call out to Cyborg to
>>> retrieve this, as it will not be available in the host state object.*
>>>
>> Is this the Nova scheduler weigher, or do we want to support weighing in
>> Placement? A function is a trait, as I understand it, so could we have
>> preferred_traits? I remember we talked about that parameter in the past, but
>> we didn't have a good use case at the time. This is a good use case.
>>
>
> If we call Cyborg from the Nova scheduler weigher, that will also slow down
> scheduling a lot.
>

I'm not sure how large the performance loss would be,
but having the Nova scheduler weigher call the Cyborg API once (to get all the
accelerator info) seems acceptable.

I'm not sure how many Placement API calls are made during 

[openstack-dev] [Nova] [Cyborg] Why can Cyborg not support an accelerator info list for more than one host?

2018-03-21 Thread
Following today's IRC discussion, there is a question about the weigher:
can Cyborg support a list API that returns the accelerator info for more than
one host at weighing time?

Sorry, I did not attend the PTG.
Was the conclusion there that the scheduler weigher will call the Cyborg REST
API once per host instead of making one REST call for all hosts?
If so, what is the reason?



INFO:
http://eavesdrop.openstack.org/meetings/openstack_cyborg/2018/openstack_cyborg.2018-03-21-14.00.log.html


BR
Shaohe Feng
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] [Cyborg] Why can Cyborg not support an accelerator info list for more than one host?

2018-03-21 Thread
2018-03-22 0:11 GMT+08:00 Ed Leafe :

> On Mar 21, 2018, at 10:56 AM, 少合冯  wrote:
> >
> > Sorry, I did not attend the PTG.
> > Is it said there is a conclusion:
> > Scheduler weigher will call into Cyborg REST API for each host instead
> of one REST API for all hosts.
> > Is there some reason?
>
> By default, hosts are weighed one by one. You can subclass the BaseWeigher
> (in nova/weights.py) to weigh all objects at once.
>
>
Does that mean it requires calling Cyborg for each host's accelerators one by
one? Pseudo code as follows:
for host in hosts:
    accelerator = cyborg.http_get_accelerator(host)
    do_weight_by_accelerator(accelerator)

Instead of calling Cyborg once for all accelerators, pseudo code as follows:
accelerators = cyborg.http_get_accelerator(hosts)
for acc in accelerators:
    do_weight_by_accelerator(acc)
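
For illustration, a hedged sketch of the batched variant as an out-of-tree
weigher; the Cyborg call is a hypothetical placeholder, and the exact
BaseHostWeigher/HostState attribute names should be checked against
nova/weights.py and nova/scheduler/weights:

    # Sketch only: a custom weigher that makes one batched call for all hosts
    # instead of one call per host. _get_accelerators_by_host() stands in for
    # a Cyborg client call that does not exist yet.
    from nova.scheduler import weights

    def _get_accelerators_by_host(hosts):
        # Placeholder for e.g. GET /cyborg/v1/accelerators?hosts=...
        return {host: [] for host in hosts}

    class AcceleratorWeigher(weights.BaseHostWeigher):
        def weigh_objects(self, weighed_obj_list, weight_properties):
            hosts = [w.obj.host for w in weighed_obj_list]
            accel_info = _get_accelerators_by_host(hosts)  # one call, all hosts
            # More matching accelerators -> higher weight for that host.
            return [float(len(accel_info.get(w.obj.host, [])))
                    for w in weighed_obj_list]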

-- Ed Leafe
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] [Cyborg] Why can Cyborg not support an accelerator info list for more than one host?

2018-03-21 Thread
got it, thanks.


2018-03-22 0:50 GMT+08:00 Ed Leafe :

> On Mar 21, 2018, at 11:35 AM, 少合冯  wrote:
> >
> >> By default, hosts are weighed one by one. You can subclass the
> BaseWeigher (in nova/weights.py) to weigh all objects at once.
> >
> > Does that mean it requires calling Cyborg for each host's accelerators one
> by one? Pseudo code as follows:
> > for host in hosts:
> >    accelerator = cyborg.http_get_accelerator(host)
> >    do_weight_by_accelerator(accelerator)
> >
> > Instead of calling Cyborg once for all accelerators, pseudo code as follows:
> > accelerators = cyborg.http_get_accelerator(hosts)
> > for acc in accelerators:
> >    do_weight_by_accelerator(acc)
>
> What it means is that if you override the weigh_objects() method of the
> BaseWeigher class, you can make a single call to Cyborg with a list of all
> the hosts. That call could then create a list of weights for all the hosts
> and return that. So if you have 100 hosts, you don’t need to make 100 calls
> to Cyborg; only 1.
>
> -- Ed Leafe
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [cyborg] Race condition in the Cyborg/Nova flow

2018-03-27 Thread
As I understand it, Placement and the Nova scheduler are dedicated to filtering
and weighing, and they are responsible for avoiding races.

Nested resource providers plus traits should cover most scenarios.

If there is any special case, please let the Nova and Cyborg developers know so
we can work together on a solution.



I am re-pasting the design (for a PoC) that I sent before; hopefully it is
helpful.
We do not let Cyborg do any scheduling work (including filtering and weighing).
It is only responsible for binding FPGA devices to VM instances (call it FPGA
device assignment).

===
hi all

IMHO, we can consider upstreaming the image management and resource provider
management work, and even the scheduler weighing.

1.  Image management
For image management, I missed one thing in the meeting.

We discussed it before, and Li Liu suggested adding a Cyborg wrapper to upload
the FPGA image.
This is a good idea.
For example:
PUT /cyborg/v1/images/{image_id}/file

It would call the Glance upload API to upload the image.
This helps us normalize the image tags and properties.

To Dutch, Li Liu, Dolpher, Sundar and other FPGA experts:
How about agreeing on a standardization of the Glance image metadata,
especially the tags and properties?

For the tags:
IMHO, the "FPGA" tag is necessary, since Glance may manage many images, not
only FPGA images but also VM images; this tag lets us filter for FPGA images
only.
Is the vendor name needed as a tag, such as "INTEL" or "XILINX"?
Is the product model needed as a tag, such as "STRATIX10"?
Should anything else be in the image tags?
For the properties:
They should include the function name (that is, the accelerator type).
Should they also include the stream ID and vendor name?
For example: --property vendor=xilinx --property type=crypto,transcoding
Should anything else be in the image properties?
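
Purely as an illustration of the convention being discussed (the keys and
values below are the open questions from this thread, not an agreed standard):

    # Illustrative only: a possible metadata layout for an FPGA bitstream
    # image in Glance, following the questions above.
    fpga_image_metadata = {
        "tags": ["FPGA", "INTEL", "STRATIX10"],
        "properties": {
            "vendor": "intel",
            "type": "crypto,transcoding",  # accelerator function(s)
            # "stream_id": "...",          # still an open question above
        },
    }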

Li Liu is working on the spec.


2.  Resource provider management
For resource classes, nested resource providers may be supported.
We could define them as follows:
a level-1 provider resource class is CUSTOM_FPGA_<type>, level 2 is
CUSTOM_FPGA_<vendor>_<type>, and level 3 is
CUSTOM_FPGA_<vendor>_<model>_<type>, for example:
  { "CUSTOM_FPGA_VF": { "num": 3 },
    "CUSTOM_FPGA_XILINX_VF": { "num": 1 },
    "CUSTOM_FPGA_INTEL_VF": {
        "CUSTOM_FPGA_INTEL_STRATIX10_VF": { "num": 1 },
        "CUSTOM_FPGA_INTEL_STRATIX11_VF": { "num": 1 }
    }
  }
  Not sure I understand this correctly.

  And the traits should include: CUSTOM_<domain>_FUNCTION_<function>
  <domain> means which project consumes these traits; is CYBORG or ACCELERATOR
better? Here it means Cyborg cares about these traits, while Nova, Neutron and
Cinder can ignore them.
  <function> can be CRYPTO, TRANSCODING, and so on.

To Jay Pipes, Dutch, Li Liu, Dolpher, Sundar and other FPGA/Placement experts:
   Any suggestions on this?
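
For illustration only, a flavor could then request a device and function with
the standard resources/trait extra specs; the class and trait names below
follow the proposal above and are not an agreed standard:

    # Sketch: flavor extra specs asking for one Stratix10 VF with the crypto
    # function required as a trait (names follow the naming proposal above).
    flavor_extra_specs = {
        "resources:CUSTOM_FPGA_INTEL_STRATIX10_VF": "1",
        "trait:CUSTOM_CYBORG_FUNCTION_CRYPTO": "required",
    }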

3.  Scheduler weighing
I think this is not a high priority for Cyborg at present.
Zhipeng, Li Liu, Zhuli, Dolpher and I have discussed it before for the
deployable model implementation.
We need to add stream or image information to the deployable.
In Li Liu's and Zhuli's design they do add extra info to the deployable, so it
can be used for the stream or image information.

And the Cyborg API had better support filters for scheduler weighing.
For example:
GET /cyborg/v1/accelerators?hosts=cyborg-1,cyborg-2,cyborg-3&function=crypto,transcoding
This queries the hosts cyborg-1, cyborg-2 and cyborg-3 for all accelerators
that support the crypto and transcoding functions.
The Cyborg API calls the conductor to get the accelerator information matching
these filters, and the scheduler can leverage that information for weighing.
Maybe the Cyborg API could also help do the weighing, but I think that is
not a good idea.
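
A hedged sketch of how a weigher or external tool might consume that proposed
endpoint; the URL, port and response shape are assumptions, since this API does
not exist yet:

    # Sketch only: querying the *proposed* batched accelerator listing.
    import requests

    resp = requests.get(
        "http://cyborg-api.example.com:6666/cyborg/v1/accelerators",
        params={
            "hosts": "cyborg-1,cyborg-2,cyborg-3",
            "function": "crypto,transcoding",
        },
        timeout=10,
    )
    accelerators_by_host = resp.json()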

To Sundar:
I know you are interested in scheduler weighing and have some other weighting
solutions.
Hopefully this is useful for you.
REF: https://etherpad.openstack.org/p/cyborg-nova-poc



2018-03-23 12:27 GMT+08:00 Nadathur, Sundar :

> Hi all,
> There seems to be a possibility of a race condition in the Cyborg/Nova
> flow. Apologies for missing this earlier. (You can refer to the proposed
> Cyborg/Nova spec
> 
> for details.)
>
> Consider the scenario where the flavor specifies a resource class for a
> device type, and also specifies a function (e.g. encrypt) in the extra
> specs. The Nova scheduler would only track the device type as a resource,
> and Cyborg needs to track the availability of functions. Further, to keep
> it simple, say all the functions exist all the time (no reprogramming
> involved).
>
> To recap, here is the scheduler flow for this case:
>
>- A request spec with a flavor comes to Nova conductor/scheduler. The
>flavor has a device type as a resource class, and a function in the extra
>specs.
>- Placement API returns the list 

Re: [openstack-dev] [nova] [cyborg] Race condition in the Cyborg/Nova flow

2018-03-28 Thread
I have summarized some scenarios for FPGA device requests:
https://etherpad.openstack.org/p/cyborg-fpga-request-scenarios

Please add more scenarios so we can find the cases where Placement cannot
satisfy the filtering and weighing.

IMHO, I prefer Placement to do the filtering and weighing. If we have to let
Cyborg filter and weigh, the Nova scheduler only needs to call Cyborg once for
the weighing of all hosts, even though we weigh them one by one.


2018-03-23 12:27 GMT+08:00 Nadathur, Sundar :

> Hi all,
> There seems to be a possibility of a race condition in the Cyborg/Nova
> flow. Apologies for missing this earlier. (You can refer to the proposed
> Cyborg/Nova spec
> 
> for details.)
>
> Consider the scenario where the flavor specifies a resource class for a
> device type, and also specifies a function (e.g. encrypt) in the extra
> specs. The Nova scheduler would only track the device type as a resource,
> and Cyborg needs to track the availability of functions. Further, to keep
> it simple, say all the functions exist all the time (no reprogramming
> involved).
>
> To recap, here is the scheduler flow for this case:
>
>- A request spec with a flavor comes to Nova conductor/scheduler. The
>flavor has a device type as a resource class, and a function in the extra
>specs.
>- Placement API returns the list of RPs (compute nodes) which contain
>the requested device types (but not necessarily the function).
>- Cyborg will provide a custom filter which queries Cyborg DB. This
>needs to check which hosts contain the needed function, and filter out the
>rest.
>- The scheduler selects one node from the filtered list, and the
>request goes to the compute node.
>
> For the filter to work, the Cyborg DB needs to maintain a table with
> triples of (host, function type, #free units). The filter checks if a given
> host has one or more free units of the requested function type. But, to
> keep the # free units up to date, Cyborg on the selected compute node needs
> to notify the Cyborg API to decrement the #free units when an instance is
> spawned, and to increment them when resources are released.
>
> Therein lies the catch: this loop from the compute node to controller is
> susceptible to race conditions. For example, if two simultaneous requests
> each ask for function A, and there is only one unit of that available, the
> Cyborg filter will approve both, both may land on the same host, and one
> will fail. This is because Cyborg on the controller does not decrement
> resource usage due to one request before processing the next request.
>
> This is similar to this previous Nova scheduling issue
> .
> That was solved by having the scheduler claim a resource in Placement for
> the selected node. I don't see an analog for Cyborg, since it would not
> know which node is selected.
>
> Thanks in advance for suggestions and solutions.
>
> Regards,
> Sundar
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cyborg] [nova] Cyborg quotas

2018-05-19 Thread
2018-05-18 19:58 GMT+08:00 Nadathur, Sundar :

> Hi Matt,
> On 5/17/2018 3:18 PM, Matt Riedemann wrote:
>
> On 5/17/2018 3:36 PM, Nadathur, Sundar wrote:
>
> This applies only to the resources that Nova handles, IIUC, which does not
> handle accelerators. The generic method that Alex talks about is obviously
> preferable but, if that is not available in Rocky, is the filter an option?
>
>
> If nova isn't creating accelerator resources managed by cyborg, I have no
> idea why nova would be doing quota checks on those types of resources. And
> no, I don't think adding a scheduler filter to nova for checking
> accelerator quota is something we'd add either. I'm not sure that would
> even make sense - the quota for the resource is per tenant, not per host is
> it? The scheduler filters work on a per-host basis.
>
> Can we not extend BaseFilter.filter_all() to get all the hosts in a
> filter?
>   https://github.com/openstack/nova/blob/master/nova/filters.py#L36
>
> I should have made it clearer that this putative filter will be
> out-of-tree, and needed only till better solutions become available.
>
>
> Like any other resource in openstack, the project that manages that
> resource should be in charge of enforcing quota limits for it.
>
> Agreed. Not sure how other projects handle it, but here's the situation
> for Cyborg. A request may get scheduled on a compute node with no
> intervention by Cyborg. So, the earliest check that can be made today is in
> the selected compute node. A simple approach can result in quota violations
> as in this example.
>
> Say there are 5 devices in a cluster. A tenant has a quota of 4 and is
> currently using 3. That leaves 2 unused devices, of which the tenant is
> permitted to use only one. But he may submit two concurrent requests, and
> they may land on two different compute nodes. The Cyborg agent in each node
> will see the current tenant usage as 3 and let the request go through,
> resulting in quota violation.
>
That's a bad design, if the Cyborg agent on each node lets the request go
through.
The current Cyborg quota design does not have this issue.

> To prevent this, we need some kind of atomic update , like SQLAlchemy's
> with_lockmode():
>  https://wiki.openstack.org/wiki/OpenStack_and_SQLAlchemy#
> Pessimistic_Locking_-_SELECT_FOR_UPDATE
> That seems to have issues, as documented in the link above. Also, since
> every compute node does that, it would also serialize the bringup of all
> instances with accelerators, across the cluster.
>
> If there is a better solution, I'll be happy to hear it.
>
> Thanks,
> Sundar
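
On the atomic-update point above, a hedged sketch of what a centralized
reservation with SELECT ... FOR UPDATE could look like in SQLAlchemy; the
QuotaUsage table, its columns and the limit handling are assumptions, not the
Cyborg schema:

    # Sketch only: atomic per-tenant accelerator reservation with a row lock.
    from sqlalchemy import Column, Integer, String
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class QuotaUsage(Base):  # illustrative table, not Cyborg's actual schema
        __tablename__ = 'quota_usages'
        project_id = Column(String(64), primary_key=True)
        in_use = Column(Integer, nullable=False, default=0)

    class OverQuota(Exception):
        pass

    def reserve_accelerators(session, project_id, requested, limit):
        """Atomically reserve accelerators, failing if the quota is exceeded."""
        with session.begin():
            usage = (session.query(QuotaUsage)
                     .filter_by(project_id=project_id)
                     .with_for_update()  # row lock held until commit
                     .one())
            if usage.in_use + requested > limit:
                raise OverQuota(project_id)
            usage.in_use += requested
        # Exiting the block commits and releases the row lock.

As the quoted text notes, SELECT ... FOR UPDATE has its own documented issues
and serialization costs, so this is only one option.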
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev