On 24/09/2013, at 2:09 AM, Халезов Иван <i.khale...@rts.ru> wrote:
> Hi all,
>
> I use pacemaker 1.1.9 with corosync 2.3, both built from source.
> My OS is CentOS 6.4 x86_64.
>
> I have about 30 resources of one type managed by my own resource agent.
> The resource agent needs to know the utilization parameter of the
> configured resource. I query this parameter with the crm_resource utility
> in the start function of the RA. After I implemented this feature, I got
> a lot of errors in my logs:
>
> Sep 23 19:19:47 iblade5 lrmd[7492]: notice: operation_finished: RESOURCE_start_0:8445:stderr [ Could not establish cib_rw connection: Resource temporarily unavailable (11) ]
> Sep 23 19:19:47 iblade5 lrmd[7492]: notice: operation_finished: RESOURCE_start_0:8445:stderr [ Error signing on to the CIB service: Transport endpoint is not connected ]
>
> So only a few resources (about 4 or 5, different ones every time) start
> correctly (crm_resource returns the needed value during the start action).
> All the other resources fail to start.
>
> I think the problem occurs when many (20-30) resources start at the same
> time, so there are 20-30 simultaneous queries to the CIB from the
> resource agents.
>
> How can I correct this?

I recall talking to NTT about this recently, but I forget what they did to make progress.
Perhaps you could check for $?=11 (the "Resource temporarily unavailable" code in your logs) and try again.
I _think_ there might have been a patch for libqb that resolved it.
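To make the "look for 11 and try again" idea concrete, here is a minimal sketch of the retry logic. It is written in Python only to show the control flow; a shell resource agent would do the same thing with a loop around the existing call. The names run_with_retry, attempts and delay are made up for illustration, and the sketch assumes the query command really does exit with status 11 when the CIB connection is refused, which is worth verifying on your build.

```python
import subprocess
import time

EAGAIN_RC = 11  # exit status suggested in the reply; assumed, not verified


def run_with_retry(cmd, attempts=5, delay=0.5, retry_rc=EAGAIN_RC,
                   sleep=time.sleep):
    """Run cmd (a list of arguments), retrying while it exits with retry_rc.

    Returns (returncode, stdout) of the last attempt. Any other exit
    status, including success, is returned immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE,
                              universal_newlines=True)
        if proc.returncode != retry_rc or attempt == attempts:
            return proc.returncode, proc.stdout.strip()
        # Wait a little longer after each failure so that 20-30 agents
        # starting together do not all hit the CIB again at once.
        sleep(delay * attempt)


# Hypothetical use inside the start action. Substitute the exact
# crm_resource query your agent already runs; "my_rsc" and "cpu" are
# placeholders:
#   rc, value = run_with_retry(["crm_resource", "--resource", "my_rsc",
#                               "--get-parameter", "cpu", "--utilization"])
```

The growing delay is a design choice, not something from the thread: retrying immediately would likely just reproduce the same burst of connections. Adding a small random jitter to the delay would spread the callers out further.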
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org