Re: [Openstack-operators] How to tune scheduling for "Insufficient compute resources" (race conditions ?)

2016-12-01 Thread Massimo Sgaravatto
Thanks a lot, George.

It looks like this does indeed help!

Cheers, Massimo

2016-11-30 16:04 GMT+01:00 George Mihaiescu :

> Try changing the following in nova.conf and restart the nova-scheduler:
>
> scheduler_host_subset_size = 10
> scheduler_max_attempts = 10
>
> Cheers,
> George

Re: [Openstack-operators] How to tune scheduling for "Insufficient compute resources" (race conditions ?)

2016-11-30 Thread Belmiro Moreira
How many nova-schedulers are you running?
You can hit this issue when multiple nova-schedulers select the same
compute node for different instances.

Belmiro
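
The collision described above can be illustrated with a toy model. This is only a sketch, not nova code: the `Host` class and the `pick_host` and `claim` helpers are hypothetical names, and it assumes that hosts are ranked by free RAM and that both scheduling decisions are taken from the same stale view of the hosts, before either instance has been claimed on the compute node.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    free_ram_mb: int


def pick_host(snapshot, requested_mb):
    """Scheduler side: choose the host with the most free RAM that fits."""
    candidates = [h for h in snapshot if h.free_ram_mb >= requested_mb]
    if not candidates:
        return None
    return max(candidates, key=lambda h: h.free_ram_mb).name


def claim(hosts, name, requested_mb):
    """Compute side: check the request against the real free RAM."""
    host = hosts[name]
    if host.free_ram_mb < requested_mb:
        return False  # this is where the build would have to be rescheduled
    host.free_ram_mb -= requested_mb
    return True


# Real state of the (hypothetical) compute nodes.
hosts = {"node1": Host("node1", 11000), "node2": Host("node2", 9000)}

# Both decisions use the same snapshot, taken before any claim happened.
snapshot = [Host(h.name, h.free_ram_mb) for h in hosts.values()]
choice_a = pick_host(snapshot, 8192)
choice_b = pick_host(snapshot, 8192)

ok_a = claim(hosts, choice_a, 8192)
ok_b = claim(hosts, choice_b, 8192)

print(choice_a, ok_a)
print(choice_b, ok_b)
```

In this model the second request fails on node1 although node2 could have served it. Each such failure leads to a reschedule, and the number of reschedules is bounded by scheduler_max_attempts.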


Re: [Openstack-operators] How to tune scheduling for "Insufficient compute resources" (race conditions ?)

2016-11-30 Thread Massimo Sgaravatto
Hi Belmiro

We are indeed running 2 nova-schedulers, to have some HA

Thanks, Massimo

2016-11-30 16:18 GMT+01:00 Belmiro Moreira <
moreira.belmiro.email.li...@gmail.com>:

> How many nova-schedulers are you running?
> You can hit this issue when multiple nova-schedulers select the same
> compute node for different instances.
>
> Belmiro

Re: [Openstack-operators] How to tune scheduling for "Insufficient compute resources" (race conditions ?)

2016-11-30 Thread Alvise Dorigo



On 11/30/2016 04:18 PM, Belmiro Moreira wrote:

> How many nova-schedulers are you running?
> You can hit this issue when multiple nova-schedulers select the same
> compute node for different instances.

We're running 2 nova-scheduler processes. Could you explain in more
detail, please?


many thanks,

Alvise
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] How to tune scheduling for "Insufficient compute resources" (race conditions ?)

2016-11-30 Thread George Mihaiescu
Try changing the following in nova.conf and restart the nova-scheduler:

scheduler_host_subset_size = 10
scheduler_max_attempts = 10

Cheers,
George
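
A rough way to see why a larger scheduler_host_subset_size can help: the sketch below assumes that the scheduler picks at random among the N best-weighed hosts instead of always taking the single best one, so two concurrent requests are less likely to land on the same host. It is a hypothetical simulation, not nova code; `collision_rate`, the 50 host names and the trial count are made up for illustration.

```python
import random


def collision_rate(subset_size, trials=10000, seed=0):
    """Fraction of trials in which two independent picks hit the same host."""
    rng = random.Random(seed)
    ranked_hosts = [f"node{i}" for i in range(50)]  # best host first
    collisions = 0
    for _ in range(trials):
        top = ranked_hosts[:subset_size]
        first = rng.choice(top)   # pick made for the first request
        second = rng.choice(top)  # pick made for a concurrent request
        if first == second:
            collisions += 1
    return collisions / trials


for size in (1, 10):
    print(size, collision_rate(size))
```

With a subset size of 1 both picks always collide in this model; with 10 they collide in roughly one trial out of ten. The retries allowed by scheduler_max_attempts then only have to absorb the remaining collisions.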


[Openstack-operators] How to tune scheduling for "Insufficient compute resources" (race conditions ?)

2016-11-30 Thread Massimo Sgaravatto
Hi all

I have a problem with scheduling in our Mitaka cloud.
When there are a lot of requests for new instances, some of them fail
with "Failed to compute_task_build_instances: Exceeded maximum number of
retries". The underlying failures are "Insufficient compute resources:
Free memory 2879.50 MB < requested 8192 MB" [*]

But there are compute nodes with enough memory that could serve such
requests.

In the conductor log I also see messages reporting that "Function
'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted
interval by xxx sec" [**]


My understanding is that:

- VM a is scheduled to a certain compute node
- the scheduler chooses the same compute node for VM b before the info for
that compute node is updated (so the 'size' of VM a is not taken into
account)

Does this make sense, or am I totally wrong?

Any hints about how to cope with such scenarios, besides increasing
scheduler_max_attempts?

scheduler_default_filters is set to:

scheduler_default_filters = AggregateInstanceExtraSpecsFilter,AggregateMultiTenancyIsolation,RetryFilter,AvailabilityZoneFilter,RamFilter,CoreFilter,AggregateRamFilter,AggregateCoreFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter


Thanks a lot, Massimo

[*]

2016-11-30 15:10:20.233 25140 WARNING nova.scheduler.utils [req-ec8c0bdc-b413-4cab-b925-eb8f11212049 840c96b6fb1e4972beaa3d30ade10cc7 d27fe2becea94a3e980fb9f66e2f291a - - -] Failed to compute_task_build_instances: Exceeded maximum number of retries. Exceeded max scheduling attempts 5 for instance 314eccd0-fc73-446f-8138-7d8d3c8644f7. Last exception: Insufficient compute resources: Free memory 2879.50 MB < requested 8192 MB.
2016-11-30 15:10:20.233 25140 WARNING nova.scheduler.utils [req-ec8c0bdc-b413-4cab-b925-eb8f11212049 840c96b6fb1e4972beaa3d30ade10cc7 d27fe2becea94a3e980fb9f66e2f291a - - -] [instance: 314eccd0-fc73-446f-8138-7d8d3c8644f7] Setting instance to ERROR state.


[**]

2016-11-30 15:10:48.873 25128 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 9.08 sec
2016-11-30 15:10:54.372 25142 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 9.33 sec
2016-11-30 15:10:54.375 25140 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 9.32 sec
2016-11-30 15:10:54.376 25129 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 9.30 sec
2016-11-30 15:10:54.381 25138 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 9.24 sec
2016-11-30 15:10:54.381 25139 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 9.28 sec
2016-11-30 15:10:54.382 25143 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 9.24 sec
2016-11-30 15:10:54.385 25141 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 9.11 sec
2016-11-30 15:11:01.964 25128 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 3.09 sec
2016-11-30 15:11:05.503 25142 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 1.13 sec
2016-11-30 15:11:05.506 25138 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 1.12 sec
2016-11-30 15:11:05.509 25139 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 1.13 sec
2016-11-30 15:11:05.512 25141 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 1.13 sec
2016-11-30 15:11:05.525 25143 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 1.14 sec
2016-11-30 15:11:05.526 25140 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 1.15 sec
2016-11-30 15:11:05.529 25129 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 1.15 sec
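
To get a quick per-process summary of the loopingcall warnings above, something like the following could be used. It is a minimal sketch: `max_overrun_by_pid` is a hypothetical helper, the regular expression is written only against the log format shown in this message, and any line wrapping introduced by the mail client is undone by collapsing whitespace first.

```python
import re

# Matches one loopingcall warning once all whitespace runs are single spaces.
PATTERN = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) (\d+) WARNING "
    r"oslo\.service\.loopingcall \[-\] Function '([^']+)' run "
    r"outlasted interval by (\d+\.\d+) sec"
)


def max_overrun_by_pid(log_text):
    """Return {pid: largest overrun in seconds} for the warnings in log_text."""
    normalized = " ".join(log_text.split())  # undo wrapping across lines
    worst = {}
    for _timestamp, pid, _function, seconds in PATTERN.findall(normalized):
        worst[pid] = max(worst.get(pid, 0.0), float(seconds))
    return worst


# Three of the entries quoted above, wrapped the way a mail client wraps them.
sample = """
2016-11-30 15:10:48.873 25128 WARNING oslo.service.loopingcall [-] Function
'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted
interval by 9.08 sec
2016-11-30 15:10:54.372 25142 WARNING oslo.service.loopingcall [-] Function
'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted
interval by 9.33 sec
2016-11-30 15:11:01.964 25128 WARNING oslo.service.loopingcall [-] Function
'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted
interval by 3.09 sec
"""

print(max_overrun_by_pid(sample))
```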