Thanks for your help.  I think I have it solved.

The trick is that the crm tools also need to know what the Pacemaker 
IPC buffer size is.  I have set:

In /etc/sysconfig/pacemaker:
#export LRMD_MAX_CHILDREN="8"

# Force use of a particular class of IPC connection
# PCMK_ipc_type=shared-mem|socket|posix|sysv
export PCMK_ipc_type=shared-mem

# Specify an IPC buffer size in bytes
# Useful when connecting to really big clusters that exceed the default 20k buffer
# PCMK_ipc_buffer=20480
export PCMK_ipc_buffer=20480000

and

In ~/.bashrc:
export PCMK_ipc_type=shared-mem
export PCMK_ipc_buffer=20480000

And now everything seems to play nicely together.
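
For anyone wanting to confirm that the shell running the crm tools sees the same buffer size as the daemons, here is a rough sketch. It assumes the file sets the value with a plain, uncommented PCMK_ipc_buffer=N line (with or without "export"); the function names ipc_buffer_from_file and check_ipc_buffer are made up for illustration and are not part of Pacemaker.

```shell
#!/bin/sh
# Hypothetical helper: print the last uncommented PCMK_ipc_buffer value in a file.
ipc_buffer_from_file() {
    grep -E '^[[:space:]]*(export[[:space:]]+)?PCMK_ipc_buffer=' "$1" \
        | tail -n 1 \
        | sed -e 's/^.*PCMK_ipc_buffer=//' -e 's/^"//' -e 's/[^0-9].*$//'
}

# Hypothetical helper: compare the file's value with the current environment.
# Returns 0 on a match and 1 on a mismatch.
check_ipc_buffer() {
    file_value=$(ipc_buffer_from_file "$1")
    file_value=${file_value:-unset}
    env_value=${PCMK_ipc_buffer:-unset}
    if [ "$file_value" = "$env_value" ]; then
        echo "match: $env_value"
    else
        echo "mismatch: file=$file_value env=$env_value"
        return 1
    fi
}
```

Calling check_ipc_buffer /etc/sysconfig/pacemaker from the shell where crm is run should then report whether the two settings agree.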

A 20MB buffer seems huge but I have a TON of virtual machines on this 
cluster.
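
For a sense of scale, the arithmetic can be sketched in shell as below. The two byte counts are the default from the sysconfig comment and the value I set; the helper name bytes_to_kib is only an illustration.

```shell
#!/bin/sh
default_bytes=20480      # default quoted in the sysconfig comment
buffer_bytes=20480000    # value exported in the settings above

# Hypothetical helper: integer conversion from bytes to KiB.
bytes_to_kib() {
    echo $(( $1 / 1024 ))
}

echo "default: $(bytes_to_kib "$default_bytes") KiB"    # prints "default: 20 KiB"
echo "current: $(bytes_to_kib "$buffer_bytes") KiB"     # prints "current: 20000 KiB"
echo "ratio: $(( buffer_bytes / default_bytes ))x"      # prints "ratio: 1000x"
```

So the new setting is 1000 times the default, or roughly 19.5 MiB.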

On Fri 30 Aug 2013 01:00:36 AM EDT, Andrew Beekhof wrote:
> You'd have to ask suse.
> They'd know what the old and new are and therefore the differences between the 
> two.
>
> On 30/08/2013, at 2:21 PM, Tom Parker <tpar...@cbnco.com> wrote:
>
>> Do you know if this has changed significantly from the older versions?
>> This cluster was working fine before the upgrade.
>>
>> On Fri 30 Aug 2013 12:16:35 AM EDT, Andrew Beekhof wrote:
>>>
>>> On 30/08/2013, at 1:42 PM, Tom Parker <tpar...@cbnco.com> wrote:
>>>
>>>> My pacemaker config contains the following settings:
>>>>
>>>> LRMD_MAX_CHILDREN="8"
>>>> export PCMK_ipc_buffer=3172882
>>>
>>> perhaps go higher
>>>
>>>>
>>>> This is what I had today to get to 127 Resources defined.  I am not sure 
>>>> what I should choose for the PCMK_ipc_type.  Do you have any suggestions 
>>>> for large clusters?
>>>
>>> shm is the new upstream default, but it may not have propagated to suse yet.
>>>
>>>>
>>>> Thanks
>>>>
>>>> Tom
>>>>
>>>> On 08/29/2013 11:19 PM, Andrew Beekhof wrote:
>>>>> On 30/08/2013, at 5:49 AM, Tom Parker <tpar...@cbnco.com> wrote:
>>>>>
>>>>>
>>>>>> Hello.  Last night I updated my SLES 11 servers to HAE-SP3 which contains
>>>>>> the following versions of software:
>>>>>>
>>>>>> cluster-glue-1.0.11-0.15.28
>>>>>> libcorosync4-1.4.5-0.18.15
>>>>>> corosync-1.4.5-0.18.15
>>>>>> pacemaker-mgmt-2.1.2-0.7.40
>>>>>> pacemaker-mgmt-client-2.1.2-0.7.40
>>>>>> pacemaker-1.1.9-0.19.102
>>>>>>
>>>>>> With the previous versions of openais/corosync I could run over 200
>>>>>> resources with no problems and with very little lag with the management
>>>>>> commands (crm_mon, crm configure, etc)
>>>>>>
>>>>>> Today I am unable to configure more than 127 resources.  When I commit
>>>>>> my 128th resource all the crm commands start to fail (crm_mon just
>>>>>> hangs) or time out (ERROR: running cibadmin -Ql: Call cib_query failed
>>>>>> (-62): Timer expired)
>>>>>>
>>>>>> I have attached my original crm config with 201 primitives to this 
>>>>>> e-mail.
>>>>>>
>>>>>> If anyone has any ideas as to what may have changed between pacemaker
>>>>>> versions that would cause this please let me know.  If I can't get this
>>>>>> solved this week I will have to downgrade to SP2 again.
>>>>>>
>>>>>> Thanks for any information.
>>>>>>
>>>>> I suspect you've hit an IPC buffer limit.
>>>>>
>>>>> Depending on exactly what went into the SUSE builds, you should have the 
>>>>> following environment variables (documentation from 
>>>>> /etc/sysconfig/pacemaker on RHEL) to play with:
>>>>>
>>>>> # Force use of a particular class of IPC connection
>>>>> # PCMK_ipc_type=shared-mem|socket|posix|sysv
>>>>>
>>>>> # Specify an IPC buffer size in bytes
>>>>> # Useful when connecting to really big clusters that exceed the default 20k buffer
>>>>> # PCMK_ipc_buffer=20480
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Linux-HA mailing list
>>>>>
>>>>> Linux-HA@lists.linux-ha.org
>>>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>>>>
>>>>> See also:
>>>>> http://linux-ha.org/ReportingProblems
>>>>
>>>
>
