Just the glusterd.log from each node, right?
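
I’m planning to grab them with something like this on each node (assuming the default log location; adjust if yours differs):

  # hostname in the archive name so the three files don’t collide
  tar czf glusterd-$(hostname -s).tar.gz /var/log/glusterfs/glusterd.log*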

> On Apr 4, 2019, at 11:25 AM, Atin Mukherjee <amukh...@redhat.com> wrote:
> 
> Darrell,
> 
> I fully understand that you can't reproduce it and you don't have the bandwidth 
> to test it again, but would you be able to send us the glusterd logs from all 
> the nodes from when this happened? We would like to go through the logs and get 
> back to you. I would particularly like to see whether something has gone wrong 
> with the transport.socket.listen-port option. But without the log files we 
> can't find out anything. Hope you understand.
> 
> On Thu, Apr 4, 2019 at 9:27 PM Darrell Budic <bu...@onholyground.com> wrote:
> I didn’t follow any specific documents, just a generic rolling upgrade, one 
> node at a time. Once the first node didn’t reconnect, I tried to follow the 
> workaround in the bug during the upgrade. Basic procedure was (rough command 
> sketch after the list):
> 
> - take 3 nodes that were initially installed with 3.12.x (forget which, but 
> low number) and had been upgraded directly to 5.5 from 3.12.15
>   - op-version was 50400
> - on node A:
>   - yum install centos-release-gluster6
>   - yum upgrade (this time it was some oVirt cockpit components, gluster, and 
> a lib or two), hit yes
>   - discover glusterd was dead
>   - systemctl restart glusterd
>   - no peer connections, try iptables -F; systemctl restart glusterd, no 
> change
> - following the workaround in the bug, try iptables -F & restart glusterd on 
> other 2 nodes, no effect
>   - nodes B & C were still connected to each other and all bricks were fine 
> at this point
> - try upgrading other 2 nodes and restarting gluster, no effect (iptables 
> still empty)
>   - lost quorum here, so all bricks went offline
> - read logs, not finding much, but looked at glusterd.vol and compared to new 
> versions
> - updated glusterd.vol on A and restarted glusterd
>   - A doesn’t show any connected peers, but both other nodes show A as 
> connected
> - update glusterd.vol on B & C, restart glusterd
>   - all nodes show connected and volumes are active and healing
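> 
> Roughly, the commands per node looked like this (a sketch from memory of the 
> steps above, not a verified runbook):
> 
>   # on the node being upgraded
>   yum install centos-release-gluster6
>   yum upgrade
>   systemctl restart glusterd          # glusterd was dead after the upgrade
> 
>   # workaround attempt from the bug, tried on every node
>   iptables -F
>   systemctl restart glusterd
> 
>   # what finally worked: bring /etc/glusterfs/glusterd.vol in line with the
>   # newer packaged version, then restart glusterd on each node in turn
>   vi /etc/glusterfs/glusterd.vol
>   systemctl restart glusterd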
> 
> The only odd thing in my process was that node A did not have any active 
> bricks on it at the time of the upgrade. It doesn’t seem like this mattered 
> since B & C showed the same symptoms between themselves while being upgraded, 
> but I don’t know. The only log entry that referenced anything about peer 
> connections is included below already.
> 
> Looks like it was related to my glusterd settings, since that’s what fixed it 
> for me. Unfortunately, I don’t have the bandwidth or the systems to test 
> different versions of that specifically, but maybe you guys can try it on some 
> test resources? Otherwise, I’ve got another cluster (my production one!) that’s 
> midway through the upgrade from 3.12.15 -> 5.5. I paused when I started 
> getting multiple brick processes on the two nodes that had gone to 5.5 
> already. I think I’m going to jump the last node right to 6 to try and avoid 
> that mess, and it has the same glusterd.vol settings. I’ll try and capture 
> its logs during the upgrade and see if there’s any new info, or if it has 
> the same issues as this group did.
> 
>   -Darrell
> 
>> On Apr 4, 2019, at 2:54 AM, Sanju Rakonde <srako...@redhat.com> wrote:
>> 
>> We don't hit https://bugzilla.redhat.com/show_bug.cgi?id=1694010 while 
>> upgrading to glusterfs-6. We tested it in different setups and understood that 
>> this issue is seen because of some issue in the setup itself.
>> 
>> Regarding the issue you have faced, can you please let us know which 
>> documentation you followed for the upgrade? During our testing, we didn't hit 
>> any such issue. We would like to understand what went wrong.
>> 
>> On Thu, Apr 4, 2019 at 2:08 AM Darrell Budic <bu...@onholyground.com> wrote:
>> Hari-
>> 
>> I was upgrading my test cluster from 5.5 to 6 and I hit this bug 
>> (https://bugzilla.redhat.com/show_bug.cgi?id=1694010) or something similar. 
>> In my case, the workaround did not work, and I was left with a cluster that 
>> had gone into no-quorum mode and stopped all the bricks. There wasn’t much in 
>> the logs either, but I noticed my /etc/glusterfs/glusterd.vol files were not 
>> the same as the newer versions, so I updated them, restarted glusterd, and 
>> suddenly the updated node showed as peer-in-cluster again. Once I updated the 
>> other nodes the same way, things started working again. Maybe a place to 
>> look?
>> 
>> My old config (all nodes):
>> volume management
>>     type mgmt/glusterd
>>     option working-directory /var/lib/glusterd
>>     option transport-type socket
>>     option transport.socket.keepalive-time 10
>>     option transport.socket.keepalive-interval 2
>>     option transport.socket.read-fail-log off
>>     option ping-timeout 10
>>     option event-threads 1
>>     option rpc-auth-allow-insecure on
>> #   option transport.address-family inet6
>> #   option base-port 49152
>> end-volume
>> 
>> changed to:
>> volume management
>>     type mgmt/glusterd
>>     option working-directory /var/lib/glusterd
>>     option transport-type socket,rdma
>>     option transport.socket.keepalive-time 10
>>     option transport.socket.keepalive-interval 2
>>     option transport.socket.read-fail-log off
>>     option transport.socket.listen-port 24007
>>     option transport.rdma.listen-port 24008
>>     option ping-timeout 0
>>     option event-threads 1
>>     option rpc-auth-allow-insecure on
>> #   option lock-timer 180
>> #   option transport.address-family inet6
>> #   option base-port 49152
>>     option max-port  60999
>> end-volume
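>> 
>> In case it helps anyone else checking their own config, something like this 
>> compares the edited file with the packaged default, assuming the upgrade left 
>> a glusterd.vol.rpmnew next to it:
>> 
>>   diff -u /etc/glusterfs/glusterd.vol /etc/glusterfs/glusterd.vol.rpmnew
>>   # merge in the new defaults, then:
>>   systemctl restart glusterd
>>   gluster peer status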
>> 
>> The only thing I found in the glusterd logs that looks relevant was the 
>> following (repeated for both of the other nodes in this cluster), so no clue 
>> why it happened:
>> [2019-04-03 20:19:16.802638] I [MSGID: 106004] 
>> [glusterd-handler.c:6427:__glusterd_peer_rpc_notify] 0-management: Peer 
>> <ossuary-san> (<0ecbf953-681b-448f-9746-d1c1fe7a0978>), in state <Peer in 
>> Cluster>, has disconnected from glusterd.
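>> 
>> For anyone digging through their own logs, something along these lines should 
>> pull the relevant entries (assuming the default log location):
>> 
>>   grep -E 'peer_rpc_notify|disconnected|listen-port' /var/log/glusterfs/glusterd.log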
>> 
>> 
>>> On Apr 2, 2019, at 4:53 AM, Atin Mukherjee <atin.mukherje...@gmail.com> wrote:
>>> 
>>> 
>>> 
>>> On Mon, 1 Apr 2019 at 10:28, Hari Gowtham <hgowt...@redhat.com> wrote:
>>> Comments inline.
>>> 
>>> On Mon, Apr 1, 2019 at 5:55 AM Sankarshan Mukhopadhyay
>>> <sankarshan.mukhopadh...@gmail.com> wrote:
>>> >
>>> > Quite a considerable amount of detail here. Thank you!
>>> >
>>> > On Fri, Mar 29, 2019 at 11:42 AM Hari Gowtham <hgowt...@redhat.com> wrote:
>>> > >
>>> > > Hello Gluster users,
>>> > >
>>> > > As you are all aware, glusterfs-6 is out. We would like to inform you
>>> > > that we have spent a significant amount of time testing
>>> > > glusterfs-6 in upgrade scenarios. We have done upgrade testing to
>>> > > glusterfs-6 from various releases such as 3.12, 4.1 and 5.3.
>>> > >
>>> > > As glusterfs-6 brings in a lot of changes, we wanted to test those 
>>> > > portions.
>>> > > There were xlators (and the respective options to enable/disable them)
>>> > > added and deprecated in glusterfs-6 relative to various versions [1].
>>> > >
>>> > > We had to check the following upgrade scenarios for all such options
>>> > > identified in [1]:
>>> > > 1) option never enabled and upgraded
>>> > > 2) option enabled and then upgraded
>>> > > 3) option enabled, then disabled, and then upgraded
>>> > >
>>> > > We weren't able to check all the combinations for all the options 
>>> > > manually,
>>> > > so the options involving enabling and disabling xlators were 
>>> > > prioritized.
>>> > > Below are the results of the ones tested.
>>> > >
>>> > > Never enabled and upgraded:
>>> > > checked from 3.12, 4.1, 5.3 to 6 the upgrade works.
>>> > >
>>> > > Enabled and upgraded:
>>> > > Tested for tier, which is deprecated; this is not a recommended upgrade.
>>> > > As expected, the volume won't be consumable and will have a few more
>>> > > issues as well.
>>> > > Tested with 3.12, 4.1 and 5.3 to 6 upgrades.
>>> > >
>>> > > Enabled, then disabled before upgrade:
>>> > > Tested for tier with 3.12 and the upgrade went fine.
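>>> > >
>>> > > For reference, detaching the tier before the upgrade looks roughly like
>>> > > this (VOLNAME is just a placeholder for your volume name):
>>> > >
>>> > >   # detach the hot tier and wait for the data to be demoted
>>> > >   gluster volume tier VOLNAME detach start
>>> > >   gluster volume tier VOLNAME detach status   # repeat until complete
>>> > >   gluster volume tier VOLNAME detach commit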
>>> > >
>>> > > There is one common issue to note in every upgrade: the node being
>>> > > upgraded goes into a disconnected state. You have to flush the iptables
>>> > > rules and then restart glusterd on all nodes to fix this (rough commands
>>> > > below).
>>> > >
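>>> > > On each node, that amounts to something like this (a rough sketch,
>>> > > assuming systemd):
>>> > >
>>> > >   iptables -F
>>> > >   systemctl restart glusterd
>>> > >   gluster peer status   # peers should show as connected again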
>>> >
>>> > Is this something that is written in the upgrade notes? I do not seem
>>> > to recall; if not, I'll send a PR.
>>> 
>>> No, this wasn't mentioned in the release notes. PRs are welcome.
>>> 
>>> >
>>> > > The testing for enabling new options is still pending. The new options
>>> > > won't cause as many issues as the deprecated ones, so this was put at
>>> > > the end of the priority list. It would be nice to get contributions
>>> > > for this.
>>> > >
>>> >
>>> > Did the range of tests lead to any new issues?
>>> 
>>> Yes. In the first round of testing we found an issue and had to postpone the
>>> release of 6 until the fix was made available:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1684029
>>> 
>>> And then we tested it again after this patch was made available,
>>> and came across this:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1694010
>>> 
>>> This isn’t a bug, as we found that the upgrade worked seamlessly in two 
>>> different setups. So we have no issues in the upgrade path to the 
>>> glusterfs-6 release.
>>> 
>>> I have mentioned in the second mail how to get past this situation
>>> for now, until the fix is available.
>>> 
>>> >
>>> > > For the disable testing, tier was used as it covers most of the xlators
>>> > > that were removed. All of these tests were done on a replica 3 
>>> > > volume.
>>> > >
>>> >
>>> > I'm not sure if the Glusto team is reading this, but it would be
>>> > pertinent to understand if the approach you have taken can be
>>> > converted into a form of automated testing pre-release.
>>> 
>>> I don't have an answer for this; I have CCed Vijay.
>>> He might have an idea.
>>> 
>>> >
>>> > > Note: This is only for upgrade testing of the newly added and removed
>>> > > xlators. It does not involve the normal functional tests for the xlators.
>>> > >
>>> > > If you have any questions, please feel free to reach us.
>>> > >
>>> > > [1] 
>>> > > https://docs.google.com/spreadsheets/d/1nh7T5AXaV6kc5KgILOy2pEqjzC3t_R47f1XUXSVFetI/edit?usp=sharing
>>> > >
>>> > > Regards,
>>> > > Hari and Sanju.
>>> 
>>> 
>>> 
>>> -- 
>>> Regards,
>>> Hari Gowtham.
>>> -- 
>>> --Atin
>> 
>> -- 
>> Thanks,
>> Sanju
> 
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
