Ok, I’ll look into it. Thanks for retesting.

> On 5 Aug 2015, at 4:00 pm, renayama19661...@ybb.ne.jp wrote:
> 
> Hi Andrew,
> 
>>> Do you know if this behaviour still exists?
>>> A LOT of work went into the remote node logic in the last couple of
>>> months, it's possible this was fixed as a side-effect.
>> 
>> I have not confirmed it with the latest version yet.
>> I will confirm it.
> 
> I confirmed it with the latest
> Pacemaker (pacemaker-eefdc909a41b571dc2e155f7b14b5ef0368f2de7).
> 
> The phenomenon still occurs.
> 
> On the first clean-up, Pacemaker fails to connect to pacemaker_remote.
> The second succeeds.
> 
> The problem does not seem to be settled yet.
> 
> I added my debug log to the latest source again.
> 
> -------
> (snip)
> static size_t
> crm_remote_recv_once(crm_remote_t * remote)
> {
>     int rc = 0;
>     size_t read_len = sizeof(struct crm_remote_header_v0);
>     struct crm_remote_header_v0 *header = crm_remote_header(remote);
> 
>     if (header) {
>         /* Stop at the end of the current message */
>         read_len = header->size_total;
>     }
> 
>     /* automatically grow the buffer when needed */
>     if (remote->buffer_size < read_len) {
>         remote->buffer_size = 2 * read_len;
>         crm_trace("Expanding buffer to %u bytes", remote->buffer_size);
> 
>         remote->buffer = realloc_safe(remote->buffer, remote->buffer_size + 1);
>         CRM_ASSERT(remote->buffer != NULL);
>     }
> 
> #ifdef HAVE_GNUTLS_GNUTLS_H
>     if (remote->tls_session) {
>         if (remote->buffer == NULL) {
>             crm_info("### YAMAUCHI buffer is NULL [buffer_zie[%d] readlen[%d]",
>                      remote->buffer_size, read_len);
>         }
>         rc = gnutls_record_recv(*(remote->tls_session),
>                                 remote->buffer + remote->buffer_offset,
>                                 remote->buffer_size - remote->buffer_offset);
> (snip)
> -------
> 
> When Pacemaker first fails to connect to the remote node, my log message
> is printed. It is not printed on the second connection.
> 
> [root@sl7-01 ~]# tail -f /var/log/messages | grep YAMA
> Aug 5 14:46:25 sl7-01 crmd[21306]:     info: ### YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
> Aug 5 14:46:26 sl7-01 crmd[21306]:     info: ### YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
> Aug 5 14:46:28 sl7-01 crmd[21306]:     info: ### YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
> Aug 5 14:46:30 sl7-01 crmd[21306]:     info: ### YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
> Aug 5 14:46:31 sl7-01 crmd[21306]:     info: ### YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
> (snip)
> 
> Best Regards,
> Hideo Yamauchi.
> 
> 
> ----- Original Message -----
>> From: "renayama19661...@ybb.ne.jp" <renayama19661...@ybb.ne.jp>
>> To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
>> Cc:
>> Date: 2015/8/4, Tue 18:40
>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: [Question] About movement of pacemaker_remote.
>> 
>> Hi Andrew,
>> 
>>> Do you know if this behaviour still exists?
>>> A LOT of work went into the remote node logic in the last couple of
>>> months, it's possible this was fixed as a side-effect.
>> 
>> I have not confirmed it with the latest version yet.
>> I will confirm it.
>> 
>> Many Thanks!
>> Hideo Yamauchi.
>> 
>> 
>> ----- Original Message -----
>>> From: Andrew Beekhof <and...@beekhof.net>
>>> To: renayama19661...@ybb.ne.jp; Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
>>> Cc:
>>> Date: 2015/8/4, Tue 13:16
>>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: [Question] About movement of pacemaker_remote.
>>> 
>>>> On 12 May 2015, at 12:12 pm, renayama19661...@ybb.ne.jp wrote:
>>>> 
>>>> Hi All,
>>>> 
>>>> The problem seems to be that the buffer becomes NULL somehow after
>>>> running crm_resource -C, once the remote node has been rebooted.
>>>> 
>>>> I added a log message to the source code and confirmed it.
>>>> 
>>>> ------------------------------------------------
>>>> crm_remote_recv_once(crm_remote_t * remote)
>>>> {
>>>> (snip)
>>>>     /* automatically grow the buffer when needed */
>>>>     if (remote->buffer_size < read_len) {
>>>>         remote->buffer_size = 2 * read_len;
>>>>         crm_trace("Expanding buffer to %u bytes", remote->buffer_size);
>>>> 
>>>>         remote->buffer = realloc_safe(remote->buffer, remote->buffer_size + 1);
>>>>         CRM_ASSERT(remote->buffer != NULL);
>>>>     }
>>>> 
>>>> #ifdef HAVE_GNUTLS_GNUTLS_H
>>>>     if (remote->tls_session) {
>>>>         if (remote->buffer == NULL) {
>>>>             crm_info("### YAMAUCHI buffer is NULL [buffer_zie[%d] readlen[%d]",
>>>>                      remote->buffer_size, read_len);
>>>>         }
>>>>         rc = gnutls_record_recv(*(remote->tls_session),
>>>>                                 remote->buffer + remote->buffer_offset,
>>>>                                 remote->buffer_size - remote->buffer_offset);
>>>> (snip)
>>>> ------------------------------------------------
>>>> 
>>>> May 12 10:54:01 sl7-01 crmd[30447]:     info: crm_remote_recv_once: ### YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
>>>> May 12 10:54:02 sl7-01 crmd[30447]:     info: crm_remote_recv_once: ### YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
>>>> May 12 10:54:04 sl7-01 crmd[30447]:     info: crm_remote_recv_once: ### YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
>>> 
>>> Do you know if this behaviour still exists?
>>> A LOT of work went into the remote node logic in the last couple of
>>> months, it's possible this was fixed as a side-effect.
>>> 
>>>> 
>>>> ------------------------------------------------
>>>> 
>>>> gnutls_record_recv is handed the empty buffer and returns an error:
>>>> 
>>>> ------------------------------------------------
>>>> (snip)
>>>> ssize_t
>>>> _gnutls_recv_int(gnutls_session_t session, content_type_t type,
>>>>                  gnutls_handshake_description_t htype,
>>>>                  gnutls_packet_t *packet,
>>>>                  uint8_t * data, size_t data_size, void *seq,
>>>>                  unsigned int ms)
>>>> {
>>>>     int ret;
>>>> 
>>>>     if (packet == NULL && (type != GNUTLS_ALERT && type != GNUTLS_HEARTBEAT)
>>>>         && (data_size == 0 || data == NULL))
>>>>         return gnutls_assert_val(GNUTLS_E_INVALID_REQUEST);
>>>> 
>>>> (snip)
>>>> ssize_t
>>>> gnutls_record_recv(gnutls_session_t session, void *data, size_t data_size)
>>>> {
>>>>     return _gnutls_recv_int(session, GNUTLS_APPLICATION_DATA, -1, NULL,
>>>>                             data, data_size, NULL,
>>>>                             session->internals.record_timeout_ms);
>>>> }
>>>> (snip)
>>>> ------------------------------------------------
>>>> 
>>>> Best Regards,
>>>> Hideo Yamauchi.
>>>> 
>>>> 
>>>> ----- Original Message -----
>>>>> From: "renayama19661...@ybb.ne.jp" <renayama19661...@ybb.ne.jp>
>>>>> To: "users@clusterlabs.org" <users@clusterlabs.org>
>>>>> Cc:
>>>>> Date: 2015/5/11, Mon 16:45
>>>>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: [Question] About movement of pacemaker_remote.
>>>>> 
>>>>> Hi Ulrich,
>>>>> 
>>>>> Thank you for your comments.
>>>>> 
>>>>>> So your host and your resource are both named "snmp1"? I also don't
>>>>>> have much experience with cleaning up resources for a node that is
>>>>>> offline. What change should it make (while the node is offline)?
>>>>> 
>>>>> The remote resource and the remote node have the same name, "snmp1".
>>>>> 
>>>>> (snip)
>>>>> primitive snmp1 ocf:pacemaker:remote \
>>>>>         params \
>>>>>                 server="snmp1" \
>>>>>         op start interval="0s" timeout="60s" on-fail="ignore" \
>>>>>         op monitor interval="3s" timeout="15s" \
>>>>>         op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>> 
>>>>> primitive Host-rsc1 ocf:heartbeat:Dummy \
>>>>>         op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>         op monitor interval="10s" timeout="60s" on-fail="restart" \
>>>>>         op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>> 
>>>>> primitive Remote-rsc1 ocf:heartbeat:Dummy \
>>>>>         op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>         op monitor interval="10s" timeout="60s" on-fail="restart" \
>>>>>         op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>> 
>>>>> location loc1 Remote-rsc1 \
>>>>>         rule 200: #uname eq snmp1
>>>>> location loc3 Host-rsc1 \
>>>>>         rule 200: #uname eq bl460g8n1
>>>>> (snip)
>>>>> 
>>>>> pacemaker_remoted on the snmp1 node is stopped with SIGTERM.
>>>>> Afterwards I restart pacemaker_remoted on the snmp1 node.
>>>>> Then I execute the crm_resource command, but the snmp1 node remains offline.
>>>>> 
>>>>> After the crm_resource command has been executed, I think the correct
>>>>> behaviour would be for the snmp1 remote node to come back online.
>>>>> 
>>>>> Best Regards,
>>>>> Hideo Yamauchi.
>>>>> 
>>>>> 
>>>>> ----- Original Message -----
>>>>>> From: Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de>
>>>>>> To: users@clusterlabs.org; renayama19661...@ybb.ne.jp
>>>>>> Cc:
>>>>>> Date: 2015/5/11, Mon 15:39
>>>>>> Subject: Antw: Re: [ClusterLabs] Antw: Re: [Question] About movement of pacemaker_remote.
>>>>>> 
>>>>>>>>> <renayama19661...@ybb.ne.jp> wrote on 11.05.2015 at 06:22 in message
>>>>>> <361916.15877...@web200006.mail.kks.yahoo.co.jp>:
>>>>>>> Hi All,
>>>>>>> 
>>>>>>> I matched the OS version of the remote node with the host once again
>>>>>>> and confirmed it with Pacemaker 1.1.13-rc2.
>>>>>>> 
>>>>>>> It was the same even with the host on RHEL7.1 (bl460g8n1).
>>>>>>> I made the remote host RHEL7.1 (snmp1).
>>>>>>> 
>>>>>>> The first crm_resource -C fails.
>>>>>>> --------------------------------
>>>>>>> [root@bl460g8n1 ~]# crm_resource -C -r snmp1
>>>>>>> Cleaning up snmp1 on bl460g8n1
>>>>>>> Waiting for 1 replies from the CRMd. OK
>>>>>>> 
>>>>>>> [root@bl460g8n1 ~]# crm_mon -1 -Af
>>>>>>> Last updated: Mon May 11 12:44:31 2015
>>>>>>> Last change: Mon May 11 12:43:30 2015
>>>>>>> Stack: corosync
>>>>>>> Current DC: bl460g8n1 - partition WITHOUT quorum
>>>>>>> Version: 1.1.12-7a2e3ae
>>>>>>> 2 Nodes configured
>>>>>>> 3 Resources configured
>>>>>>> 
>>>>>>> 
>>>>>>> Online: [ bl460g8n1 ]
>>>>>>> RemoteOFFLINE: [ snmp1 ]
>>>>>> 
>>>>>> So your host and your resource are both named "snmp1"? I also don't
>>>>>> have much experience with cleaning up resources for a node that is
>>>>>> offline. What change should it make (while the node is offline)?
>>>>>> 
>>>>>>> Host-rsc1     (ocf::heartbeat:Dummy): Started bl460g8n1
>>>>>>> Remote-rsc1   (ocf::heartbeat:Dummy): Started bl460g8n1 (failure ignored)
>>>>>>> 
>>>>>>> Node Attributes:
>>>>>>> * Node bl460g8n1:
>>>>>>>     + ringnumber_0 : 192.168.101.21 is UP
>>>>>>>     + ringnumber_1 : 192.168.102.21 is UP
>>>>>>> 
>>>>>>> Migration summary:
>>>>>>> * Node bl460g8n1:
>>>>>>>    snmp1: migration-threshold=1 fail-count=1000000 last-failure='Mon May 11 12:44:28 2015'
>>>>>>> 
>>>>>>> Failed actions:
>>>>>>>     snmp1_start_0 on bl460g8n1 'unknown error' (1): call=5, status=Timed Out,
>>>>>>> exit-reason='none', last-rc-change='Mon May 11 12:43:31 2015', queued=0ms, exec=0ms
>>>>>>> --------------------------------
>>>>>>> 
>>>>>>> The second crm_resource -C succeeded, and the remote host was connected.
>>>>>> 
>>>>>> Then the node was online, it seems.
>>>>>> 
>>>>>> Regards,
>>>>>> Ulrich
>>>>>> 
>>>>>>> --------------------------------
>>>>>>> [root@bl460g8n1 ~]# crm_mon -1 -Af
>>>>>>> Last updated: Mon May 11 12:44:54 2015
>>>>>>> Last change: Mon May 11 12:44:48 2015
>>>>>>> Stack: corosync
>>>>>>> Current DC: bl460g8n1 - partition WITHOUT quorum
>>>>>>> Version: 1.1.12-7a2e3ae
>>>>>>> 2 Nodes configured
>>>>>>> 3 Resources configured
>>>>>>> 
>>>>>>> 
>>>>>>> Online: [ bl460g8n1 ]
>>>>>>> RemoteOnline: [ snmp1 ]
>>>>>>> 
>>>>>>> Host-rsc1     (ocf::heartbeat:Dummy): Started bl460g8n1
>>>>>>> Remote-rsc1   (ocf::heartbeat:Dummy): Started snmp1
>>>>>>> snmp1         (ocf::pacemaker:remote): Started bl460g8n1
>>>>>>> 
>>>>>>> Node Attributes:
>>>>>>> * Node bl460g8n1:
>>>>>>>     + ringnumber_0 : 192.168.101.21 is UP
>>>>>>>     + ringnumber_1 : 192.168.102.21 is UP
>>>>>>> * Node snmp1:
>>>>>>> 
>>>>>>> Migration summary:
>>>>>>> * Node bl460g8n1:
>>>>>>> * Node snmp1:
>>>>>>> --------------------------------
>>>>>>> 
>>>>>>> gnutls on the host and the remote node is the following version:
>>>>>>> 
>>>>>>> gnutls-devel-3.3.8-12.el7.x86_64
>>>>>>> gnutls-dane-3.3.8-12.el7.x86_64
>>>>>>> gnutls-c++-3.3.8-12.el7.x86_64
>>>>>>> gnutls-3.3.8-12.el7.x86_64
>>>>>>> gnutls-utils-3.3.8-12.el7.x86_64
>>>>>>> 
>>>>>>> 
>>>>>>> Best Regards,
>>>>>>> Hideo Yamauchi.
>>>>>>> 
>>>>>>> 
>>>>>>> ----- Original Message -----
>>>>>>>> From: "renayama19661...@ybb.ne.jp" <renayama19661...@ybb.ne.jp>
>>>>>>>> To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
>>>>>>>> Cc:
>>>>>>>> Date: 2015/4/28, Tue 14:06
>>>>>>>> Subject: Re: [ClusterLabs] Antw: Re: [Question] About movement of pacemaker_remote.
>>>>>>>> 
>>>>>>>> Hi David,
>>>>>>>> 
>>>>>>>> Even after changing the remote node to RHEL7.1, the result was the same.
>>>>>>>> 
>>>>>>>> This time I will try it with the Pacemaker host node on RHEL7.1 as well.
>>>>>>>> 
>>>>>>>> I noticed an interesting phenomenon.
>>>>>>>> The remote node fails to reconnect on the first crm_resource.
>>>>>>>> However, it succeeds in reconnecting on the second crm_resource.
>>>>>>>> 
>>>>>>>> I think there is some problem at the point where the connection
>>>>>>>> with the remote node is first cut.
>>>>>>>> 
>>>>>>>> Best Regards,
>>>>>>>> Hideo Yamauchi.
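(Editor's note: the logs quoted in this thread show buffer_size 1326 with a NULL buffer while only read_len 40 is needed, so the "remote->buffer_size < read_len" test in crm_remote_recv_once never triggers a reallocation, and the NULL pointer reaches gnutls_record_recv on the first reconnect. The minimal sketch below illustrates the kind of guard that would cover that state; the simplified struct and the ensure_buffer() helper are hypothetical stand-ins, not Pacemaker's actual fix.)

#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-in for Pacemaker's crm_remote_t. */
typedef struct {
    char *buffer;
    size_t buffer_size;
    size_t buffer_offset;
} fake_remote_t;

/* Grow (or create) the receive buffer so it can hold read_len bytes.
 * Unlike the quoted code, this also covers the case where buffer is
 * NULL but buffer_size is still non-zero from a previous connection. */
static int
ensure_buffer(fake_remote_t *remote, size_t read_len)
{
    if (remote->buffer == NULL || remote->buffer_size < read_len) {
        size_t new_size = 2 * read_len;

        if (new_size < remote->buffer_size) {
            new_size = remote->buffer_size;   /* never shrink */
        }
        char *new_buf = realloc(remote->buffer, new_size + 1);

        if (new_buf == NULL) {
            return -1;   /* allocation failed; caller must not recv */
        }
        remote->buffer = new_buf;
        remote->buffer_size = new_size;
    }
    return 0;
}

int
main(void)
{
    /* Mimic the state in the logs: size says 1326, pointer is NULL. */
    fake_remote_t remote = { NULL, 1326, 0 };

    if (ensure_buffer(&remote, 40) == 0) {
        printf("buffer ready: %zu bytes\n", remote.buffer_size);
    }
    free(remote.buffer);
    return 0;
}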
>>>>>>>> 
>>>>>>>> 
>>>>>>>> ----- Original Message -----
>>>>>>>>> From: "renayama19661...@ybb.ne.jp" <renayama19661...@ybb.ne.jp>
>>>>>>>>> To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
>>>>>>>>> Cc:
>>>>>>>>> Date: 2015/4/28, Tue 11:52
>>>>>>>>> Subject: Re: [ClusterLabs] Antw: Re: [Question] About movement of pacemaker_remote.
>>>>>>>>> 
>>>>>>>>> Hi David,
>>>>>>>>> Thank you for your comments.
>>>>>>>>> 
>>>>>>>>>> At first glance this looks gnutls related. GNUTLS is returning -50 during receive
>>>>>>>>>> on the client side (pacemaker's side). -50 maps to 'invalid request'.
>>>>>>>>>>> debug: crm_remote_recv_once: TLS receive failed: The request is invalid.
>>>>>>>>>> We treat this error as fatal and destroy the connection. I've never encountered
>>>>>>>>>> this error and I don't know what causes it. It's possible there's a bug in
>>>>>>>>>> our gnutls usage... it's also possible there's a bug in the version of gnutls
>>>>>>>>>> that is in use as well.
>>>>>>>>> 
>>>>>>>>> We built the remote node on RHEL6.5.
>>>>>>>>> Because this may be a gnutls problem, I will confirm it on RHEL7.1.
>>>>>>>>> 
>>>>>>>>> Best Regards,
>>>>>>>>> Hideo Yamauchi.
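(Editor's note: the -50 that David mentions is GNUTLS_E_INVALID_REQUEST, and the _gnutls_recv_int source quoted earlier in the thread shows the NULL-data check running before any record processing, so a NULL receive buffer fails immediately with no network I/O involved. The following minimal sketch, assuming only that the gnutls development headers are installed, demonstrates this; build with gcc demo.c $(pkg-config --cflags --libs gnutls).)

#include <stdio.h>
#include <gnutls/gnutls.h>

int
main(void)
{
    gnutls_session_t session;
    ssize_t rc;

    gnutls_global_init();
    gnutls_init(&session, GNUTLS_CLIENT);

    /* No transport and no handshake are set up: per the quoted
     * _gnutls_recv_int source, the NULL-buffer check fires first. */
    rc = gnutls_record_recv(session, NULL, 1326);

    /* Expected: rc=-50, and gnutls_strerror() yields the same text as
     * the logged "TLS receive failed: The request is invalid." */
    printf("rc=%zd (%s)\n", rc, gnutls_strerror((int) rc));

    gnutls_deinit(session);
    gnutls_global_deinit();
    return 0;
}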
_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org