Re: [openstack-dev] [nova][placement] Scheduler VM distribution

2018-04-19 Thread Jay Pipes

Hello, Andrey! Comments inline...

On 04/19/2018 10:27 AM, Andrey Volkov wrote:

Hello,

 From my understanding, we have a race between the scheduling
process and host weight update.

I made a simple experiment. In an environment with 50 fake hosts,
I asked it to boot 40 VMs, which should have been placed one per host.
The hosts are identical in terms of inventory.

img=6fedf6a1-5a55-4149-b774-b0b4dccd2ed1
flavor=1
for i in {1..40}; do
nova boot --flavor $flavor --image $img --nic none vm-$i;
sleep 1;
done

The resulting distribution was:

mysql> select resource_provider_id, count(*) from allocations where 
resource_class_id = 0 group by 1;


+----------------------+----------+
| resource_provider_id | count(*) |
+----------------------+----------+
|                    1 |        2 |
|                   18 |        2 |
|                   19 |        3 |
|                   20 |        3 |
|                   26 |        2 |
|                   29 |        2 |
|                   33 |        3 |
|                   36 |        2 |
|                   41 |        1 |
|                   49 |        3 |
|                   51 |        2 |
|                   52 |        3 |
|                   55 |        2 |
|                   60 |        3 |
|                   61 |        2 |
|                   63 |        2 |
|                   67 |        3 |
+----------------------+----------+
17 rows in set (0.00 sec)

And the question is:
If we have atomic resource allocation, what is the reason
to use compute_nodes.* for weight calculation?


The resource allocation is only atomic in the placement service, since 
the placement service prevents clients from modifying records that have 
changed since the client last read them (it uses a "generation" field in 
the resource_providers table records to provide this protection).
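
For illustration only, here is a minimal sketch of that compare-and-swap
idea. The schema and SQL below are simplified stand-ins, not the actual
placement implementation:

import sqlite3

# Toy schema: just an id and the "generation" counter described above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resource_providers "
             "(id INTEGER PRIMARY KEY, generation INTEGER NOT NULL)")
conn.execute("INSERT INTO resource_providers (id, generation) VALUES (1, 0)")

def write_allocations(provider_id):
    # 1. Read the provider and remember the generation we saw.
    (generation,) = conn.execute(
        "SELECT generation FROM resource_providers WHERE id = ?",
        (provider_id,)).fetchone()

    # ... build the allocation records against this view of the provider ...

    # 2. Bump the generation only if nobody changed it since we read it.
    cur = conn.execute(
        "UPDATE resource_providers SET generation = generation + 1 "
        "WHERE id = ? AND generation = ?",
        (provider_id, generation))
    if cur.rowcount == 0:
        # A concurrent writer got there first; the caller must re-read
        # the provider and retry, which is what keeps allocation atomic.
        raise RuntimeError("generation conflict, retry")
    conn.commit()

write_allocations(1)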


What seems to be happening is that a scheduler thread's view of the set 
of HostState objects used in weighing is stale at some point in the 
weighing process. I'm going to guess and say you have 3 scheduler 
processes, right?


In other words, what is happening is something like this:

(Tx indicates a period in sequential time)

T0: thread A gets a list of filtered hosts and weighs them.
T1: thread B gets a list of filtered hosts and weighs them.
T2: thread A picks the first host in its weighed list
T3: thread B picks the first host in its weighed list (this is the same 
host as thread A picked)
T4: thread B increments the num_instances attribute of its HostState 
object for the chosen host (done in the 
HostState._consume_from_request() method)
T5: thread A increments the num_instances attribute of its HostState 
object for the same chosen host.


So, both thread A and B choose the same host because at the time they 
read the HostState objects, the num_instances attribute was 0 and the 
weight for that host was the same (2.0 in the logs).
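
To make the T0-T5 sequence concrete, here is a minimal sketch; HostState,
the weigher and _consume_from_request() below are simplified stand-ins,
not nova's real classes:

import copy

class HostState:
    """Toy host state: only the bits the sketch needs."""
    def __init__(self, name, num_instances=0):
        self.name = name
        self.num_instances = num_instances

    def consume(self):
        # Roughly what HostState._consume_from_request() does to the
        # scheduler thread's local view of the chosen host.
        self.num_instances += 1

def pick_best(host_states):
    # Toy weigher: fewer instances wins, ties broken by name.
    return min(host_states, key=lambda h: (h.num_instances, h.name))

cluster = [HostState("host-a"), HostState("host-b")]

# T0 / T1: threads A and B each take their own (identical) view of the hosts.
view_a = copy.deepcopy(cluster)
view_b = copy.deepcopy(cluster)

# T2 / T3: both weigh the same stale data, so both pick the same host.
choice_a = pick_best(view_a)
choice_b = pick_best(view_b)
print(choice_a.name, choice_b.name)  # host-a host-a

# T4 / T5: each thread bumps num_instances only in its own view, after
# the choice has already been made, so nothing prevents the double-pick.
choice_b.consume()
choice_a.consume()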


I'm not aware of any effort to fix this behaviour in the scheduler.

Best,
-jay


There is a custom log of the behavior I described: http://ix.io/18cw

--
Thanks,

Andrey Volkov,
Software Engineer, Mirantis, Inc.


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

