On 5 Jun 2014, at 10:38 am, Patrick Hemmer <pacema...@feystorm.net> wrote:

> From: Andrew Beekhof <and...@beekhof.net>
> Sent: 2014-06-04 20:15:22 EDT
> To: The Pacemaker cluster resource manager <pacemaker@oss.clusterlabs.org>
> Subject: Re: [Pacemaker] resources not rebalancing
> 
>> On 5 Jun 2014, at 12:57 am, Patrick Hemmer <pacema...@feystorm.net>
>>  wrote:
>> 
>> 
>>> From: Andrew Beekhof <and...@beekhof.net>
>>> 
>>> Sent: 2014-06-04 04:15:48 EDT
>>> To: The Pacemaker cluster resource manager 
>>> <pacemaker@oss.clusterlabs.org>
>>> 
>>> Subject: Re: [Pacemaker] resources not rebalancing
>>> 
>>> 
>>>> On 4 Jun 2014, at 4:22 pm, Patrick Hemmer <pacema...@feystorm.net>
>>>> 
>>>>  wrote:
>>>> 
>>>> 
>>>> 
>>>>> I'm testing some different scenarios, and after bringing a node back online, 
>>>>> none of the resources move to it unless they are restarted. However, 
>>>>> default-resource-stickiness is set to 0, so they should be able to move 
>>>>> around freely.
>>>>> 
>>>>> # pcs status
>>>>> Cluster name: docker
>>>>> Last updated: Wed Jun  4 06:09:26 2014
>>>>> Last change: Wed Jun  4 06:08:40 2014 via cibadmin on i-093f1f55
>>>>> Stack: corosync
>>>>> Current DC: i-083f1f54 (3) - partition with quorum
>>>>> Version: 1.1.11-1.fc20-9d39a6b
>>>>> 3 Nodes configured
>>>>> 8 Resources configured
>>>>> 
>>>>> 
>>>>> Online: [ i-053f1f59 i-083f1f54 i-093f1f55 ]
>>>>> 
>>>>> Full list of resources:
>>>>> 
>>>>>  dummy2    (ocf::pacemaker:Dummy):    Started i-083f1f54 
>>>>>  Clone Set: dummy1-clone [dummy1] (unique)
>>>>>      dummy1:0    (ocf::pacemaker:Dummy):    Started i-083f1f54 
>>>>>      dummy1:1    (ocf::pacemaker:Dummy):    Started i-093f1f55 
>>>>>      dummy1:2    (ocf::pacemaker:Dummy):    Started i-093f1f55 
>>>>>      dummy1:3    (ocf::pacemaker:Dummy):    Started i-083f1f54 
>>>>>      dummy1:4    (ocf::pacemaker:Dummy):    Started i-093f1f55 
>>>>> 
>>>>> # pcs resource show --all 
>>>>>  Resource: dummy2 (class=ocf provider=pacemaker type=Dummy)
>>>>>  Clone: dummy1-clone
>>>>>   Meta Attrs: clone-max=5 clone-node-max=5 globally-unique=true 
>>>>>   Resource: dummy1 (class=ocf provider=pacemaker type=Dummy)
>>>>> 
>>>>> # pcs property show --all | grep default-resource-stickiness
>>>>>  default-resource-stickiness: 0
>>>>> 
>>>>> Notice how i-053f1f59 isn't running anything. I feel like I'm missing 
>>>>> something obvious, but it escapes me.
>>>>> 
>>>>> 
>>>> Clones are ever so slightly sticky by default; try setting 
>>>> resource-stickiness=0 for the clone resource
>>>> (and unsetting it once everything has moved back)
>>>> 
>>>> 
>>>> 
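For reference, a minimal way to do this with pcs, assuming the clone is named
dummy1-clone as in the status output above (the exact syntax may vary between
pcs versions):

# pcs resource meta dummy1-clone resource-stickiness=0
(then, once everything has moved back)
# pcs resource meta dummy1-clone resource-stickiness=

The second command sets the meta attribute to an empty value, which recent pcs
releases treat as removing it.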
>>> Thanks, that did indeed fix it. But how come dummy2 didn't move? It's not a 
>>> clone, yet it stayed put as well.
>>> 
>> Do you have a location constraint that says it should prefer i-053f1f59?
> No location constraint.
> 
>>> And now a separate follow-up question: the resources didn't balance as they 
>>> should. I've got several utilization attributes set, and the resources 
>>> aren't balanced according to the placement-strategy.
>>> 
>>> # pcs property show placement-strategy
>>> Cluster Properties:
>>>  placement-strategy: balanced
>>> 
>>> # crm_simulate -URL
>>> 
>>> Current cluster status:
>>> Online: [ i-053f1f59 i-083f1f54 i-093f1f55 ]
>>> 
>>>  dummy2    (ocf::pacemaker:Dummy):    Started i-053f1f59 
>>>  Clone Set: dummy1-clone [dummy1] (unique)
>>>      dummy1:0    (ocf::pacemaker:Dummy):    Started i-053f1f59 
>>>      dummy1:1    (ocf::pacemaker:Dummy):    Started i-093f1f55 
>>>      dummy1:2    (ocf::pacemaker:Dummy):    Started i-083f1f54 
>>>      dummy1:3    (ocf::pacemaker:Dummy):    Started i-083f1f54 
>>>      dummy1:4    (ocf::pacemaker:Dummy):    Started i-093f1f55 
>>> 
>>> Utilization information:
>>> Original: i-053f1f59 capacity: cpu=5000000 mem=3840332000
>>> Original: i-083f1f54 capacity: cpu=5000000 mem=3840332000
>>> Original: i-093f1f55 capacity: cpu=5000000 mem=3840332000
>>> calculate_utilization: dummy2 utilization on i-053f1f59: cpu=10000
>>> calculate_utilization: dummy1:2 utilization on i-083f1f54: cpu=1000
>>> calculate_utilization: dummy1:1 utilization on i-093f1f55: cpu=1000
>>> calculate_utilization: dummy1:0 utilization on i-053f1f59: cpu=1000
>>> calculate_utilization: dummy1:3 utilization on i-083f1f54: cpu=1000
>>> calculate_utilization: dummy1:4 utilization on i-093f1f55: cpu=1000
>>> Remaining: i-053f1f59 capacity: cpu=4989000 mem=3840332000
>>> Remaining: i-083f1f54 capacity: cpu=4998000 mem=3840332000
>>> Remaining: i-093f1f55 capacity: cpu=4998000 mem=3840332000
>>> 
>>> 
>>> 
>>> The "balanced" strategy is defined as: "the node that has more free 
>>> capacity gets consumed first".
>>> Notice that dummy2 consumes cpu=10000, while dummy1 is only 1000 (10x 
>>> less). After dummy2 was placed on i-053f1f59, that should have consumed 
>>> enough "cpu" resource to keep dummy1 off it and on the other 2 nodes, but 
>>> dummy1:0 got placed on the node.
>>> 
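For readers following along: utilization figures like the ones above are
normally configured as utilization attributes on the nodes and resources. With
a pcs release that has the utilization subcommands, that would look roughly
like this, using the names and numbers from the output above:

# pcs node utilization i-053f1f59 cpu=5000000 mem=3840332000
# pcs resource utilization dummy2 cpu=10000
# pcs resource utilization dummy1 cpu=1000

Older tools can achieve the same thing by editing the <utilization> sections
of the CIB directly.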
>> But i-053f1f59 still has orders of magnitude more cpu capacity left than any 
>> of these resources needs.
> 
> I don't follow. They're all equal in terms of total "cpu" capacity.

Right. But each node still has 4998000+ units with which to accommodate 
something that only requires 10000.
That's about 0.2% of the remaining capacity, so wherever it starts, it's hardly 
making a dent.
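Spelled out: 10000 / 4998000 ≈ 0.002, i.e. the dummy2 load amounts to roughly
0.2% of what each node still has free, so the difference between the nodes is
negligible as far as placement is concerned.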

>  And at the bottom of the simulate output, the "Remaining" even shows 
> i-053f1f59 has less remaining than the other nodes.
> 
> However, after playing with it some more, this appears to be an issue with 
> clones. When I created 5 separate resources instead, it works as expected: 
> the dummy2 resource gets put on a node by itself, and the other resources 
> get distributed among the remaining nodes (at least until the "cpu" used 
> balances out).
> 
> Since this smells like a bug, I can enter it on the bug tracker you mention 
> below.

It's probably a result of clone stickiness (clones have a default of 1) and the 
hoops we have to jump through to avoid them needlessly shuffling around.
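If you want to see that in the numbers, the allocation scores (including the
stickiness contribution for the node a resource is currently on) can be dumped
from the live cluster with something like:

# crm_simulate -sL | grep -i dummy

The exact output format varies between versions.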

> 
>> 
>>> Also how difficult is it to add a strategy?
>>> 
>> It might be challenging; the policy engine is deep voodoo :)
>> Can you create an entry at bugs.clusterlabs.org and include the result of 
>> 'cibadmin -Q' when the cluster is in the state you describe above?
>> 
>> It won't make it into 1.1.12, but we can look at it for .13
>> 
> 
> Will ponder possible scenarios and then enter it. Another thought occurred to 
> me: you might also want to balance based on the percentage of capacity used. 
> So now you've got balancing based on the amount of capacity used, balancing 
> based on the amount of capacity free, and balancing based on the percentage 
> of capacity used. All three are probably similar enough in logic that the 
> same algorithm could take care of them; it would just need a way to tune that 
> algorithm (this is my guess anyway, no clue what the code looks like).
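As a rough sketch of those three (my reading of the idea above, not how the
policy engine actually implements placement), for each candidate node:

  used(node)    = sum of the utilization of the resources placed on it
  free(node)    = capacity(node) - used(node)
  percent(node) = used(node) / capacity(node)

"balanced" prefers the node with the largest free(node); the two proposed
variants would instead prefer the smallest used(node) or the smallest
percent(node).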
> 
> 
>> 
>>> I'd be interested in a strategy which places a resource on the node with 
>>> the least amount of capacity used, kind of the inverse of "balanced". The 
>>> docs say "balanced" looks at how much capacity is free. The two strategies 
>>> would be equivalent if all nodes have the same capacity, but if one node 
>>> has 10x the capacity of the others, I want the resources to be distributed 
>>> evenly (based on the capacity each uses) rather than over-utilizing that 
>>> one node.
>>> 
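A quick numeric illustration of that last point, with made-up numbers rather
than the cluster above: give node A a capacity of cpu=10000 and nodes B and C
cpu=1000 each, then place nine resources that use cpu=500 apiece. "balanced"
(most free capacity first) puts all nine on A, because A's free capacity never
drops below B's or C's; a least-used strategy would deal them out three per
node, regardless of how large each node is.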
>>> Thanks
>>> 
>>> -Patrick
>>> 
> 


_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
