On Thu, 4 Apr 2019 at 22:10, Darrell Budic <bu...@onholyground.com> wrote:
> Just the glusterd.log from each node, right?

Yes.

> On Apr 4, 2019, at 11:25 AM, Atin Mukherjee <amukh...@redhat.com> wrote:
>
> Darrell,
>
> I fully understand that you can't reproduce it and you don't have the
> bandwidth to test it again, but would you be able to send us the glusterd
> logs from all the nodes from when this happened? We would like to go
> through the logs and get back to you. I would particularly like to see
> whether something has gone wrong with the transport.socket.listen-port
> option, but without the log files we can't find out anything. Hope you
> understand.
>
> On Thu, Apr 4, 2019 at 9:27 PM Darrell Budic <bu...@onholyground.com> wrote:
>
>> I didn’t follow any specific documents, just a generic rolling upgrade,
>> one node at a time. Once the first node didn’t reconnect, I tried to
>> follow the workaround in the bug during the upgrade. The basic procedure
>> was:
>>
>> - take 3 nodes that were initially installed with 3.12.x (forget which,
>>   but a low number) and had been upgraded directly to 5.5 from 3.12.15
>> - op-version was 50400
>> - on node A:
>>   - yum install centos-release-gluster6
>>   - yum upgrade (was some oVirt cockpit components, gluster, and a lib
>>     or two this time), hit yes
>> - discover glusterd was dead
>> - systemctl restart glusterd
>> - no peer connections; tried iptables -F; systemctl restart glusterd,
>>   no change
>> - following the workaround in the bug, tried iptables -F and restarted
>>   glusterd on the other 2 nodes, no effect
>> - nodes B & C were still connected to each other and all bricks were
>>   fine at this point
>> - tried upgrading the other 2 nodes and restarting glusterd, no effect
>>   (iptables still empty)
>> - lost quorum here, so all bricks went offline
>> - read logs, not finding much, but looked at glusterd.vol and compared
>>   it to the newer version’s
>> - updated glusterd.vol on A and restarted glusterd
>> - A doesn’t show any connected peers, but both other nodes show A as
>>   connected
>> - updated glusterd.vol on B & C, restarted glusterd
>> - all nodes show connected and volumes are active and healing
>>
>> The only odd thing in my process was that node A did not have any active
>> bricks on it at the time of the upgrade. It doesn’t seem like this
>> mattered, since B & C showed the same symptoms between themselves while
>> being upgraded, but I don’t know. The only log entry that referenced
>> anything about peer connections is already included below.
>>
>> It looks like it was related to my glusterd settings, since that’s what
>> fixed it for me. Unfortunately, I don’t have the bandwidth or the
>> systems to test different versions of that specifically, but maybe you
>> guys can on some test resources? Otherwise, I’ve got another cluster (my
>> production one!) that’s midway through the upgrade from 3.12.15 -> 5.5.
>> I paused when I started getting multiple brick processes on the two
>> nodes that had gone to 5.5 already. I think I’m going to jump the last
>> node right to 6 to try and avoid that mess; it has the same glusterd.vol
>> settings. I’ll try to capture its logs during the upgrade and see if
>> there’s any new info, or if it has the same issues as this group did.
>>
>> -Darrell
>>
>> On Apr 4, 2019, at 2:54 AM, Sanju Rakonde <srako...@redhat.com> wrote:
>>
>> We didn't hit https://bugzilla.redhat.com/show_bug.cgi?id=1694010 while
>> upgrading to glusterfs-6. We tested it in different setups and concluded
>> that this issue is seen because of some issue in the setup.
>>
>> Regarding the issue you have faced, can you please let us know which
>> documentation you followed for the upgrade? During our testing we didn't
>> hit any such issue, and we would like to understand what went wrong.
>>
>> On Thu, Apr 4, 2019 at 2:08 AM Darrell Budic <bu...@onholyground.com> wrote:
>>
>>> Hari-
>>>
>>> I was upgrading my test cluster from 5.5 to 6 and I hit this bug
>>> (https://bugzilla.redhat.com/show_bug.cgi?id=1694010) or something
>>> similar.
>>> In my case, the workaround did not work, and I was left with a gluster
>>> that had gone into no-quorum mode and stopped all the bricks. There
>>> wasn’t much in the logs either, but I noticed my
>>> /etc/glusterfs/glusterd.vol files were not the same as the newer
>>> version’s, so I updated them, restarted glusterd, and suddenly the
>>> updated node showed as peer-in-cluster again. Once I updated the other
>>> nodes the same way, things started working again. Maybe a place to look?
>>>
>>> My old config (all nodes):
>>>
>>> volume management
>>>     type mgmt/glusterd
>>>     option working-directory /var/lib/glusterd
>>>     option transport-type socket
>>>     option transport.socket.keepalive-time 10
>>>     option transport.socket.keepalive-interval 2
>>>     option transport.socket.read-fail-log off
>>>     option ping-timeout 10
>>>     option event-threads 1
>>>     option rpc-auth-allow-insecure on
>>> #   option transport.address-family inet6
>>> #   option base-port 49152
>>> end-volume
>>>
>>> changed to:
>>>
>>> volume management
>>>     type mgmt/glusterd
>>>     option working-directory /var/lib/glusterd
>>>     option transport-type socket,rdma
>>>     option transport.socket.keepalive-time 10
>>>     option transport.socket.keepalive-interval 2
>>>     option transport.socket.read-fail-log off
>>>     option transport.socket.listen-port 24007
>>>     option transport.rdma.listen-port 24008
>>>     option ping-timeout 0
>>>     option event-threads 1
>>>     option rpc-auth-allow-insecure on
>>> #   option lock-timer 180
>>> #   option transport.address-family inet6
>>> #   option base-port 49152
>>>     option max-port 60999
>>> end-volume
>>>
>>> The only thing I found in the glusterd logs that looks relevant was this
>>> (repeated for both of the other nodes in this cluster), so no clue why
>>> it happened:
>>>
>>> [2019-04-03 20:19:16.802638] I [MSGID: 106004]
>>> [glusterd-handler.c:6427:__glusterd_peer_rpc_notify] 0-management: Peer
>>> <ossuary-san> (<0ecbf953-681b-448f-9746-d1c1fe7a0978>), in state <Peer
>>> in Cluster>, has disconnected from glusterd.
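To make the change easier to scan, here is the same edit in diff form, derived directly from the two config blocks quoted above (indentation added for readability only; glusterd does not require it):

```
--- /etc/glusterfs/glusterd.vol (old)
+++ /etc/glusterfs/glusterd.vol (new)
@@
 volume management
     type mgmt/glusterd
     option working-directory /var/lib/glusterd
-    option transport-type socket
+    option transport-type socket,rdma
     option transport.socket.keepalive-time 10
     option transport.socket.keepalive-interval 2
     option transport.socket.read-fail-log off
-    option ping-timeout 10
+    option transport.socket.listen-port 24007
+    option transport.rdma.listen-port 24008
+    option ping-timeout 0
     option event-threads 1
     option rpc-auth-allow-insecure on
+#   option lock-timer 180
 #   option transport.address-family inet6
 #   option base-port 49152
+    option max-port 60999
 end-volume
```

The explicit transport.socket.listen-port 24007 line is the option Atin asks about later in the thread; it is the stock value in the 6.x template, not a custom port.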
>>>
>>> On Apr 2, 2019, at 4:53 AM, Atin Mukherjee <atin.mukherje...@gmail.com> wrote:
>>>
>>> On Mon, 1 Apr 2019 at 10:28, Hari Gowtham <hgowt...@redhat.com> wrote:
>>>
>>>> Comments inline.
>>>>
>>>> On Mon, Apr 1, 2019 at 5:55 AM Sankarshan Mukhopadhyay
>>>> <sankarshan.mukhopadh...@gmail.com> wrote:
>>>> >
>>>> > Quite a considerable amount of detail here. Thank you!
>>>> >
>>>> > On Fri, Mar 29, 2019 at 11:42 AM Hari Gowtham <hgowt...@redhat.com> wrote:
>>>> > >
>>>> > > Hello Gluster users,
>>>> > >
>>>> > > As you are all aware, glusterfs-6 is out. We would like to inform
>>>> > > you that we have spent a significant amount of time testing
>>>> > > glusterfs-6 in upgrade scenarios. We have done upgrade testing to
>>>> > > glusterfs-6 from various releases like 3.12, 4.1 and 5.3.
>>>> > >
>>>> > > As glusterfs-6 includes a lot of changes, we wanted to test those
>>>> > > portions. There were xlators (and respective options to
>>>> > > enable/disable them) added and deprecated in glusterfs-6 from
>>>> > > various versions [1].
>>>> > >
>>>> > > We had to check the following upgrade scenarios for all such
>>>> > > options identified in [1]:
>>>> > > 1) option never enabled, then upgraded
>>>> > > 2) option enabled, then upgraded
>>>> > > 3) option enabled, then disabled, then upgraded
>>>> > >
>>>> > > We weren't able to manually check all the combinations for all the
>>>> > > options, so the options involving enabling and disabling xlators
>>>> > > were prioritized. Below are the results of the ones tested.
>>>> > >
>>>> > > Never enabled and upgraded:
>>>> > > Checked from 3.12, 4.1 and 5.3 to 6; the upgrade works.
>>>> > >
>>>> > > Enabled and upgraded:
>>>> > > Tested for tier, which is deprecated. This is not a recommended
>>>> > > upgrade. As expected, the volume won't be consumable and will have
>>>> > > a few more issues as well. Tested with 3.12, 4.1 and 5.3 to 6
>>>> > > upgrade.
>>>> > >
>>>> > > Enabled, then disabled before upgrade:
>>>> > > Tested for tier with 3.12, and the upgrade went fine.
>>>> > >
>>>> > > There is one common issue to note in every upgrade: the node being
>>>> > > upgraded goes into a disconnected state. You have to flush the
>>>> > > iptables rules and then restart glusterd on all nodes to fix this.
>>>> >
>>>> > Is this something that is written in the upgrade notes? I do not
>>>> > seem to recall; if not, I'll send a PR.
>>>>
>>>> No, this wasn't mentioned in the release notes. PRs are welcome.
>>>>
>>>> >
>>>> > > The testing for enabling new options is still pending. The new
>>>> > > options won't cause as many issues as the deprecated ones, so this
>>>> > > was put at the end of the priority list. It would be nice to get
>>>> > > contributions for this.
>>>> >
>>>> > Did the range of tests lead to any new issues?
>>>>
>>>> Yes. In the first round of testing we found an issue and had to
>>>> postpone the release of 6 until the fix was made available:
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1684029
>>>>
>>>> We tested again after that patch was made available, and came across
>>>> this: https://bugzilla.redhat.com/show_bug.cgi?id=1694010
>>>
>>> This isn’t a bug, as we found that the upgrade worked seamlessly in two
>>> different setups. So we have no issues in the upgrade path to the
>>> glusterfs-6 release.
>>>
>>>> I have mentioned in the second mail how to get over this situation for
>>>> now, until the fix is available.
>>>>
>>>> >
>>>> > > For the disable testing, tier was used as it covers most of the
>>>> > > xlators that were removed. And all of these tests were done on a
>>>> > > replica 3 volume.
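The "flush the iptables and then restart glusterd on all nodes" workaround mentioned earlier in this mail can be sketched as a small script. This is only a sketch of what was described, not a tested procedure: the node names node-a/node-b/node-c are hypothetical, and DRY_RUN=1 (the default here) prints each command instead of executing it.

```shell
# Workaround sketch: flush firewall rules and restart glusterd on every
# node, then check that the peers reconnect. Run from any admin host with
# root SSH access to the cluster nodes.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"   # dry run: only show what would be executed
    else
        "$@"
    fi
}

for node in node-a node-b node-c; do
    run ssh "root@$node" "iptables -F && systemctl restart glusterd"
done

# Afterwards every peer should be listed as "Peer in Cluster (Connected)".
run gluster peer status
```

Note that iptables -F drops all firewall rules, not just the ones blocking glusterd's ports, so on a production node you would want to re-apply your normal ruleset afterwards.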
>>>> >
>>>> > I'm not sure if the Glusto team is reading this, but it would be
>>>> > pertinent to understand whether the approach you have taken can be
>>>> > converted into a form of automated pre-release testing.
>>>>
>>>> I don't have an answer for this; I have CCed Vijay. He might have an
>>>> idea.
>>>>
>>>> >
>>>> > > Note: This is only for upgrade testing of the newly added and
>>>> > > removed xlators. It does not involve the normal tests for those
>>>> > > xlators.
>>>> > >
>>>> > > If you have any questions, please feel free to reach us.
>>>> > >
>>>> > > [1] https://docs.google.com/spreadsheets/d/1nh7T5AXaV6kc5KgILOy2pEqjzC3t_R47f1XUXSVFetI/edit?usp=sharing
>>>> > >
>>>> > > Regards,
>>>> > > Hari and Sanju.
>>>>
>>>> --
>>>> Regards,
>>>> Hari Gowtham.
>>>
>>> --
>>> --Atin
>>
>> --
>> Thanks,
>> Sanju

--
- Atin (atinm)
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users