The patch was definitely included in 3.12.3. Do you have the glusterd and brick logs from when this happened?
On Sun, Jan 21, 2018 at 10:21 PM, Alan Orth <alan.o...@gmail.com> wrote:

> For what it's worth, I just updated some CentOS 7 servers from GlusterFS
> 3.12.1 to 3.12.4 and hit this bug. Did the patch make it into 3.12.4? I had
> to use Mike Hulsman's script to check the daemon port against the port in
> the volume's brick info, update the port, and restart glusterd on each
> node. Luckily I only have four servers! Hoping I don't have to do this
> every time I reboot!
>
> Regards,
>
> On Sat, Dec 2, 2017 at 5:23 PM Atin Mukherjee <amukh...@redhat.com> wrote:
>
>> On Sat, 2 Dec 2017 at 19:29, Jo Goossens <jo.gooss...@hosted-power.com>
>> wrote:
>>
>>> Hello Atin,
>>>
>>> Could you confirm this should have been fixed in 3.10.8? If so we'll
>>> test it for sure!
>>
>> Fix should be part of 3.10.8 which is awaiting release announcement.
>>
>>> Regards
>>>
>>> Jo
>>>
>>> -----Original message-----
>>> *From:* Atin Mukherjee <amukh...@redhat.com>
>>> *Sent:* Mon 30-10-2017 17:40
>>> *Subject:* Re: [Gluster-users] BUG: After stop and start wrong port is
>>> advertised
>>> *To:* Jo Goossens <jo.gooss...@hosted-power.com>;
>>> *CC:* gluster-users@gluster.org;
>>>
>>> On Sat, 28 Oct 2017 at 02:36, Jo Goossens <jo.gooss...@hosted-power.com>
>>> wrote:
>>>
>>> Hello Atin,
>>>
>>> I just read it and very happy you found the issue. We really hope this
>>> will be fixed in the next 3.10.7 version!
>>>
>>> 3.10.7 - no I guess as the patch is still in review and 3.10.7 is
>>> getting tagged today. You’ll get this fix in 3.10.8.
>>>
>>> PS: Wow nice all that c code and those "goto out" statements (not always
>>> considered clean but the best way often I think). Can remember the days I
>>> wrote kernel drivers myself in c :)
>>>
>>> Regards
>>>
>>> Jo Goossens
>>>
>>> -----Original message-----
>>> *From:* Atin Mukherjee <amukh...@redhat.com>
>>> *Sent:* Fri 27-10-2017 21:01
>>> *Subject:* Re: [Gluster-users] BUG: After stop and start wrong port is
>>> advertised
>>> *To:* Jo Goossens <jo.gooss...@hosted-power.com>;
>>> *CC:* gluster-users@gluster.org;
>>>
>>> We (finally) figured out the root cause, Jo!
>>>
>>> Patch https://review.gluster.org/#/c/18579 posted upstream for review.
>>>
>>> On Thu, Sep 21, 2017 at 2:08 PM, Jo Goossens <
>>> jo.gooss...@hosted-power.com> wrote:
>>>
>>> Hi,
>>>
>>> We use glusterfs 3.10.5 on Debian 9.
>>>
>>> When we stop or restart the service, e.g.: service glusterfs-server
>>> restart
>>>
>>> We see that the wrong port get's advertised afterwards. For example:
>>>
>>> Before restart:
>>>
>>> Status of volume: public
>>> Gluster process                        TCP Port  RDMA Port  Online  Pid
>>> ------------------------------------------------------------------------------
>>> Brick 192.168.140.41:/gluster/public   49153     0          Y       6364
>>> Brick 192.168.140.42:/gluster/public   49152     0          Y       1483
>>> Brick 192.168.140.43:/gluster/public   49152     0          Y       5913
>>> Self-heal Daemon on localhost          N/A       N/A        Y       5932
>>> Self-heal Daemon on 192.168.140.42     N/A       N/A        Y       13084
>>> Self-heal Daemon on 192.168.140.41     N/A       N/A        Y       15499
>>>
>>> Task Status of Volume public
>>> ------------------------------------------------------------------------------
>>> There are no active volume tasks
>>>
>>> After restart of the service on one of the nodes (192.168.140.43) the
>>> port seems to have changed (but it didn't):
>>>
>>> root@app3:/var/log/glusterfs# gluster volume status
>>> Status of volume: public
>>> Gluster process                        TCP Port  RDMA Port  Online  Pid
>>> ------------------------------------------------------------------------------
>>> Brick 192.168.140.41:/gluster/public   49153     0          Y       6364
>>> Brick 192.168.140.42:/gluster/public   49152     0          Y       1483
>>> Brick 192.168.140.43:/gluster/public   49154     0          Y       5913
>>> Self-heal Daemon on localhost          N/A       N/A        Y       4628
>>> Self-heal Daemon on 192.168.140.42     N/A       N/A        Y       3077
>>> Self-heal Daemon on 192.168.140.41     N/A       N/A        Y       28777
>>>
>>> Task Status of Volume public
>>> ------------------------------------------------------------------------------
>>> There are no active volume tasks
>>>
>>> However the active process is STILL the same pid AND still listening on
>>> the old port
>>>
>>> root@192.168.140.43:/var/log/glusterfs# netstat -tapn | grep gluster
>>> tcp        0      0 0.0.0.0:49152      0.0.0.0:*      LISTEN      5913/glusterfsd
>>>
>>> The other nodes logs fill up with errors because they can't reach the
>>> daemon anymore. They try to reach it on the "new" port instead of the old
>>> one:
>>>
>>> [2017-09-21 08:33:25.225006] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
>>> [2017-09-21 08:33:29.226633] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
>>> [2017-09-21 08:33:29.227490] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
>>> [2017-09-21 08:33:33.225849] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
>>> [2017-09-21 08:33:33.236395] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
>>> [2017-09-21 08:33:37.225095] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
>>> [2017-09-21 08:33:37.225628] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
>>> [2017-09-21 08:33:41.225805] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
>>> [2017-09-21 08:33:41.226440] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
>>>
>>> So they now try 49154 instead of the old 49152
>>>
>>> Is this also by design? We had a lot of issues because of this recently.
>>> We don't understand why it starts advertising a completely wrong port after
>>> stop/start.
>>>
>>> Regards
>>>
>>> Jo Goossens
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users@gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>> --
>>> - Atin (atinm)
>>
>> --
>> - Atin (atinm)
>
> --
>
> Alan Orth
> alan.o...@gmail.com
> https://picturingjordan.com
> https://englishbulgaria.net
> https://mjanja.ch
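
For anyone who wants to confirm the mismatch described in the thread above before touching anything, here is a rough Python sketch that compares the port advertised by `gluster volume status` with the port a brick process is really listening on according to `netstat -tapn`. This is not Mike Hulsman's script, and the function names (`parse_advertised_ports`, `parse_listening_ports`, `find_mismatches`) are made up for illustration. It assumes each brick row sits on a single line in the column order shown in the thread (name, TCP port, RDMA port, online flag, pid) and that the netstat text was captured on the node being checked.

```python
def parse_advertised_ports(status_text):
    """Map brick PID -> (brick name, advertised TCP port).

    Assumes rows of the form:
    Brick <host:/path> <tcp port> <rdma port> <online> <pid>
    """
    advertised = {}
    for line in status_text.splitlines():
        fields = line.split()
        if len(fields) != 6 or fields[0] != "Brick":
            continue
        _, brick, tcp_port, _rdma, _online, pid = fields
        # Offline bricks show "N/A" here, so skip non-numeric values.
        if tcp_port.isdigit() and pid.isdigit():
            advertised[int(pid)] = (brick, int(tcp_port))
    return advertised


def parse_listening_ports(netstat_text):
    """Map PID -> set of TCP ports that process listens on.

    Assumes `netstat -tapn` style rows:
    proto recv-q send-q local-addr foreign-addr state pid/program
    """
    listening = {}
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 7 or fields[5] != "LISTEN":
            continue
        port = fields[3].rsplit(":", 1)[-1]
        pid = fields[6].split("/", 1)[0]
        if port.isdigit() and pid.isdigit():
            listening.setdefault(int(pid), set()).add(int(port))
    return listening


def find_mismatches(status_text, netstat_text):
    """Return (brick, pid, advertised port, actual ports) for each
    brick whose PID is listening locally on a different port."""
    listening = parse_listening_ports(netstat_text)
    mismatches = []
    for pid, (brick, port) in sorted(parse_advertised_ports(status_text).items()):
        actual = listening.get(pid)
        # Bricks on other nodes have no local listener; skip them.
        if actual is not None and port not in actual:
            mismatches.append((brick, pid, port, sorted(actual)))
    return mismatches


if __name__ == "__main__":
    import subprocess

    status = subprocess.run(["gluster", "volume", "status"],
                            capture_output=True, text=True).stdout
    netstat = subprocess.run(["netstat", "-tapn"],
                             capture_output=True, text=True).stdout
    for brick, pid, port, actual in find_mismatches(status, netstat):
        print(f"{brick} (pid {pid}): advertised {port}, listening on {actual}")
```

The sketch only reports mismatches; it does not update the brick info or restart glusterd. Since the status output is cluster-wide, a remote brick's PID could in principle coincide with a local one, so treat any hit as a hint to check by hand.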