Re: [Gluster-users] [Gluster-devel] Upgrade testing to gluster 6

2019-04-04 Thread Atin Mukherjee
On Thu, 4 Apr 2019 at 22:10, Darrell Budic  wrote:

> Just the glusterd.log from each node, right?
>

Yes.
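
Something like this would do to gather them (a rough sketch only; it assumes
the default log path /var/log/glusterfs/glusterd.log and uses placeholder
node names):

# run from any host that can ssh to the three peers
for n in nodeA nodeB nodeC; do
    scp "$n":/var/log/glusterfs/glusterd.log "glusterd-$n.log"
done
tar czf glusterd-logs.tar.gz glusterd-node*.log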


>
> On Apr 4, 2019, at 11:25 AM, Atin Mukherjee  wrote:
>
> Darrell,
>
> I fully understand that you can't reproduce it and you don't have the
> bandwidth to test it again, but would you be able to send us the glusterd
> log from all the nodes from when this happened? We would like to go through
> the logs and get back to you. I would particularly like to see if something
> has gone wrong with the transport.socket.listen-port option, but without the
> log files we can't find out anything. Hope you understand.
>
> On Thu, Apr 4, 2019 at 9:27 PM Darrell Budic 
> wrote:
>
>> I didn’t follow any specific documents, just a generic rolling upgrade
>> one node at a time. Once the first node didn’t reconnect, I tried to follow
>> the workaround in the bug during the upgrade. Basic procedure was:
>>
>> - take 3 nodes that were initially installed with 3.12.x (forget which,
>> but low number) and had been upgraded directly to 5.5 from 3.12.15
>>   - op-version was 50400
>> - on node A:
>>   - yum install centos-release-gluster6
>>   - yum upgrade (was some ovirt cockpit components, gluster, and a lib or
>> two this time), hit yes
>>   - discover glusterd was dead
>>   - systemctl restart glusterd
>>   - no peer connections, try iptables -F; systemctl restart glusterd, no
>> change
>> - following the workaround in the bug, try iptables -F & restart glusterd
>> on other 2 nodes, no effect
>>   - nodes B & C were still connected to each other and all bricks were
>> fine at this point
>> - try upgrading other 2 nodes and restarting gluster, no effect (iptables
>> still empty)
>>   - lost quorum here, so all bricks went offline
>> - read logs, not finding much, but looked at glusterd.vol and compared to
>> new versions
>> - updated glusterd.vol on A and restarted glusterd
>>   - A doesn’t show any connected peers, but both other nodes show A as
>> connected
>> - update glusterd.vol on B & C, restart glusterd
>>   - all nodes show connected and volumes are active and healing
>>
>> The only odd thing in my process was that node A did not have any active
>> bricks on it at the time of the upgrade. It doesn’t seem like this mattered
>> since B & C showed the same symptoms between themselves while being
>> upgraded, but I don’t know. The only log entry that referenced anything
>> about peer connections is included below already.
>>
>> Looks like it was related to my glusterd settings, since that’s what
>> fixed it for me. Unfortunately, I don’t have the bandwidth or the systems
>> to test different versions of that specifically, but maybe you guys can on
>> some test resources? Otherwise, I’ve got another cluster (my production
>> one!) that’s midway through the upgrade from 3.12.15 -> 5.5. I paused when
>> I started getting multiple brick processes on the two nodes that had gone
>> to 5.5 already. I think I’m going to jump the last node right to 6 to try
>> and avoid that mess, and it has the same glusterd.vol settings. I’ll try
>> and capture its logs during the upgrade and see if there’s any new info,
>> or if it has the same issues as this group did.
>>
>>   -Darrell
>>
>> On Apr 4, 2019, at 2:54 AM, Sanju Rakonde  wrote:
>>
>> We don't hit https://bugzilla.redhat.com/show_bug.cgi?id=1694010 while
>> upgrading to glusterfs-6. We tested it in different setups and concluded
>> that this issue is seen because of something specific to the setup.
>>
>> Regarding the issue you have faced, can you please let us know which
>> documentation you followed for the upgrade? During our testing, we
>> didn't hit any such issue, and we would like to understand what went wrong.
>>
>> On Thu, Apr 4, 2019 at 2:08 AM Darrell Budic 
>> wrote:
>>
>>> Hari-
>>>
>>> I was upgrading my test cluster from 5.5 to 6 and I hit this bug (
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1694010) or something
>>> similar. In my case, the workaround did not work, and I was left with a
>>> gluster that had gone into no-quorum mode and stopped all the bricks.
>>> Wasn’t much in the logs either, but I noticed my
>>> /etc/glusterfs/glusterd.vol files were not the same as the newer versions,
>>> so I updated them, restarted glusterd, and suddenly the updated node showed
>>> as peer-in-cluster again. Once I updated other nodes the same way, things
>>> started working again. Maybe a place to look?
>>>
>>> My old config (all nodes):
>>> volume management
>>> type mgmt/glusterd
>>> option working-directory /var/lib/glusterd
>>> option transport-type socket
>>> option transport.socket.keepalive-time 10
>>> option transport.socket.keepalive-interval 2
>>> option transport.socket.read-fail-log off
>>> option ping-timeout 10
>>> option event-threads 1
>>> option rpc-auth-allow-insecure on
>>> #   option transport.address-family inet6
>>> #   option base-port 49152
>>> end-volume
>>>
>>> changed to:
>>> volume management
>>> type mgmt/glusterd
>>>  

Re: [Gluster-users] [Gluster-devel] Upgrade testing to gluster 6

2019-04-04 Thread Darrell Budic
Just the glusterd.log from each node, right?

> On Apr 4, 2019, at 11:25 AM, Atin Mukherjee  wrote:
> 
> Darrell,
> 
> I fully understand that you can't reproduce it and you don't have the
> bandwidth to test it again, but would you be able to send us the glusterd
> log from all the nodes from when this happened? We would like to go through
> the logs and get back to you. I would particularly like to see if something
> has gone wrong with the transport.socket.listen-port option, but without the
> log files we can't find out anything. Hope you understand.
> 
> On Thu, Apr 4, 2019 at 9:27 PM Darrell Budic  > wrote:
> I didn’t follow any specific documents, just a generic rolling upgrade one 
> node at a time. Once the first node didn’t reconnect, I tried to follow the 
> workaround in the bug during the upgrade. Basic procedure was:
> 
> - take 3 nodes that were initially installed with 3.12.x (forget which, but 
> low number) and had been upgraded directly to 5.5 from 3.12.15
>   - op-version was 50400
> - on node A:
>   - yum install centos-release-gluster6
>   - yum upgrade (was some ovirt cockpit components, gluster, and a lib or two 
> this time), hit yes
>   - discover glusterd was dead
>   - systemctl restart glusterd
>   - no peer connections, try iptables -F; systemctl restart glusterd, no 
> change
> - following the workaround in the bug, try iptables -F & restart glusterd on 
> other 2 nodes, no effect
>   - nodes B & C were still connected to each other and all bricks were fine 
> at this point
> - try upgrading other 2 nodes and restarting gluster, no effect (iptables 
> still empty)
>   - lost quorum here, so all bricks went offline
> - read logs, not finding much, but looked at glusterd.vol and compared to new 
> versions
> - updated glusterd.vol on A and restarted glusterd
>   - A doesn’t show any connected peers, but both other nodes show A as 
> connected
> - update glusterd.vol on B & C, restart glusterd
>   - all nodes show connected and volumes are active and healing
> 
> The only odd thing in my process was that node A did not have any active 
> bricks on it at the time of the upgrade. It doesn’t seem like this mattered 
> since B & C showed the same symptoms between themselves while being upgraded, 
> but I don’t know. The only log entry that referenced anything about peer 
> connections is included below already.
> 
> Looks like it was related to my glusterd settings, since that’s what fixed it 
> for me. Unfortunately, I don’t have the bandwidth or the systems to test 
> different versions of that specifically, but maybe you guys can on some test 
> resources? Otherwise, I’ve got another cluster (my production one!) that’s 
> midway through the upgrade from 3.12.15 -> 5.5. I paused when I started 
> getting multiple brick processes on the two nodes that had gone to 5.5 
> already. I think I’m going to jump the last node right to 6 to try and avoid 
> that mess, and it has the same glusterd.vol settings. I’ll try and capture 
> its logs during the upgrade and see if there’s any new info, or if it has 
> the same issues as this group did.
> 
>   -Darrell
> 
>> On Apr 4, 2019, at 2:54 AM, Sanju Rakonde > > wrote:
>> 
>> We don't hit https://bugzilla.redhat.com/show_bug.cgi?id=1694010 while
>> upgrading to glusterfs-6. We tested it in different setups and concluded
>> that this issue is seen because of something specific to the setup.
>> 
>> Regarding the issue you have faced, can you please let us know which 
>> documentation you followed for the upgrade? During our testing, we 
>> didn't hit any such issue, and we would like to understand what went wrong.
>> 
>> On Thu, Apr 4, 2019 at 2:08 AM Darrell Budic > > wrote:
>> Hari-
>> 
>> I was upgrading my test cluster from 5.5 to 6 and I hit this bug 
>> (https://bugzilla.redhat.com/show_bug.cgi?id=1694010 
>> ) or something similar. 
>> In my case, the workaround did not work, and I was left with a gluster that 
>> had gone into no-quorum mode and stopped all the bricks. Wasn’t much in the 
>> logs either, but I noticed my /etc/glusterfs/glusterd.vol files were not the 
>> same as the newer versions, so I updated them, restarted glusterd, and 
>> suddenly the updated node showed as peer-in-cluster again. Once I updated 
>> other nodes the same way, things started working again. Maybe a place to 
>> look?
>> 
>> My old config (all nodes):
>> volume management
>> type mgmt/glusterd
>> option working-directory /var/lib/glusterd
>> option transport-type socket
>> option transport.socket.keepalive-time 10
>> option transport.socket.keepalive-interval 2
>> option transport.socket.read-fail-log off
>> option ping-timeout 10
>> option event-threads 1
>> option rpc-auth-allow-insecure on
>> #   option transport.address-family inet6
>>

Re: [Gluster-users] [Gluster-devel] Upgrade testing to gluster 6

2019-04-04 Thread Atin Mukherjee
Darrell,

I fully understand that you can't reproduce it and you don't have the
bandwidth to test it again, but would you be able to send us the glusterd
log from all the nodes from when this happened? We would like to go through
the logs and get back to you. I would particularly like to see if something
has gone wrong with the transport.socket.listen-port option, but without the
log files we can't find out anything. Hope you understand.
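
In the meantime, a quick check on each node would show whether glusterd is
actually listening where the peers expect it (a sketch; 24007 is the standard
glusterd management port, adjust if listen-port is overridden in glusterd.vol):

# confirm glusterd is running and bound to the management port
systemctl status glusterd --no-pager
ss -ltnp | grep 24007
# and see what the local config actually requests
grep listen-port /etc/glusterfs/glusterd.vol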

On Thu, Apr 4, 2019 at 9:27 PM Darrell Budic  wrote:

> I didn’t follow any specific documents, just a generic rolling upgrade one
> node at a time. Once the first node didn’t reconnect, I tried to follow the
> workaround in the bug during the upgrade. Basic procedure was:
>
> - take 3 nodes that were initially installed with 3.12.x (forget which,
> but low number) and had been upgraded directly to 5.5 from 3.12.15
>   - op-version was 50400
> - on node A:
>   - yum install centos-release-gluster6
>   - yum upgrade (was some ovirt cockpit components, gluster, and a lib or
> two this time), hit yes
>   - discover glusterd was dead
>   - systemctl restart glusterd
>   - no peer connections, try iptables -F; systemctl restart glusterd, no
> change
> - following the workaround in the bug, try iptables -F & restart glusterd
> on other 2 nodes, no effect
>   - nodes B & C were still connected to each other and all bricks were
> fine at this point
> - try upgrading other 2 nodes and restarting gluster, no effect (iptables
> still empty)
>   - lost quorum here, so all bricks went offline
> - read logs, not finding much, but looked at glusterd.vol and compared to
> new versions
> - updated glusterd.vol on A and restarted glusterd
>   - A doesn’t show any connected peers, but both other nodes show A as
> connected
> - update glusterd.vol on B & C, restart glusterd
>   - all nodes show connected and volumes are active and healing
>
> The only odd thing in my process was that node A did not have any active
> bricks on it at the time of the upgrade. It doesn’t seem like this mattered
> since B & C showed the same symptoms between themselves while being
> upgraded, but I don’t know. The only log entry that referenced anything
> about peer connections is included below already.
>
> Looks like it was related to my glusterd settings, since that’s what fixed
> it for me. Unfortunately, I don’t have the bandwidth or the systems to test
> different versions of that specifically, but maybe you guys can on some
> test resources? Otherwise, I’ve got another cluster (my production one!)
> that’s midway through the upgrade from 3.12.15 -> 5.5. I paused when I
> started getting multiple brick processes on the two nodes that had gone to
> 5.5 already. I think I’m going to jump the last node right to 6 to try and
> avoid that mess, and it has the same glusterd.vol settings. I’ll try and
> capture its logs during the upgrade and see if there’s any new info, or if
> it has the same issues as this group did.
>
>   -Darrell
>
> On Apr 4, 2019, at 2:54 AM, Sanju Rakonde  wrote:
>
> We don't hit https://bugzilla.redhat.com/show_bug.cgi?id=1694010 while
> upgrading to glusterfs-6. We tested it in different setups and concluded
> that this issue is seen because of something specific to the setup.
>
> Regarding the issue you have faced, can you please let us know which
> documentation you followed for the upgrade? During our testing, we
> didn't hit any such issue, and we would like to understand what went wrong.
>
> On Thu, Apr 4, 2019 at 2:08 AM Darrell Budic 
> wrote:
>
>> Hari-
>>
>> I was upgrading my test cluster from 5.5 to 6 and I hit this bug (
>> https://bugzilla.redhat.com/show_bug.cgi?id=1694010) or something
>> similar. In my case, the workaround did not work, and I was left with a
>> gluster that had gone into no-quorum mode and stopped all the bricks.
>> Wasn’t much in the logs either, but I noticed my
>> /etc/glusterfs/glusterd.vol files were not the same as the newer versions,
>> so I updated them, restarted glusterd, and suddenly the updated node showed
>> as peer-in-cluster again. Once I updated other nodes the same way, things
>> started working again. Maybe a place to look?
>>
>> My old config (all nodes):
>> volume management
>> type mgmt/glusterd
>> option working-directory /var/lib/glusterd
>> option transport-type socket
>> option transport.socket.keepalive-time 10
>> option transport.socket.keepalive-interval 2
>> option transport.socket.read-fail-log off
>> option ping-timeout 10
>> option event-threads 1
>> option rpc-auth-allow-insecure on
>> #   option transport.address-family inet6
>> #   option base-port 49152
>> end-volume
>>
>> changed to:
>> volume management
>> type mgmt/glusterd
>> option working-directory /var/lib/glusterd
>> option transport-type socket,rdma
>> option transport.socket.keepalive-time 10
>> option transport.socket.keepalive-interval 2
>> option transport.socket.read-fail-log off
>> option transport.socket.listen-port 24007

Re: [Gluster-users] [Gluster-devel] Upgrade testing to gluster 6

2019-04-04 Thread Darrell Budic
I didn’t follow any specific documents, just a generic rolling upgrade one node 
at a time. Once the first node didn’t reconnect, I tried to follow the 
workaround in the bug during the upgrade. Basic procedure was:

- take 3 nodes that were initially installed with 3.12.x (forget which, but low 
number) and had been upgraded directly to 5.5 from 3.12.15
  - op-version was 50400
- on node A:
  - yum install centos-release-gluster6
  - yum upgrade (was some ovirt cockpit components, gluster, and a lib or two 
this time), hit yes
  - discover glusterd was dead
  - systemctl restart glusterd
  - no peer connections, try iptables -F; systemctl restart glusterd, no change
- following the workaround in the bug, try iptables -F & restart glusterd on 
other 2 nodes, no effect
  - nodes B & C were still connected to each other and all bricks were fine at 
this point
- try upgrading other 2 nodes and restarting gluster, no effect (iptables still 
empty)
  - lost quorum here, so all bricks went offline
- read logs, not finding much, but looked at glusterd.vol and compared to new 
versions
- updated glusterd.vol on A and restarted glusterd
  - A doesn’t show any connected peers, but both other nodes show A as connected
- update glusterd.vol on B & C, restart glusterd
  - all nodes show connected and volumes are active and healing
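
Condensed into commands, the per-node sequence was roughly the following
(reconstructed from the notes above, not an exact transcript; volume and
peer names omitted):

gluster volume get all cluster.op-version   # was 50400 before the upgrade
yum install centos-release-gluster6
yum upgrade                                 # glusterd was dead afterwards
systemctl restart glusterd
iptables -F && systemctl restart glusterd   # workaround from the bug, no change
gluster peer status                         # peers stayed disconnected at this point
# after updating /etc/glusterfs/glusterd.vol:
systemctl restart glusterd
gluster peer status                         # back to State: Peer in Cluster (Connected)
gluster volume status                       # volumes active and healing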

The only odd thing in my process was that node A did not have any active bricks 
on it at the time of the upgrade. It doesn’t seem like this mattered since B & 
C showed the same symptoms between themselves while being upgraded, but I don’t 
know. The only log entry that referenced anything about peer connections is 
included below already.

Looks like it was related to my glusterd settings, since that’s what fixed it 
for me. Unfortunately, I don’t have the bandwidth or the systems to test 
different versions of that specifically, but maybe you guys can on some test 
resources? Otherwise, I’ve got another cluster (my production one!) that’s 
midway through the upgrade from 3.12.15 -> 5.5. I paused when I started getting 
multiple brick processes on the two nodes that had gone to 5.5 already. I think 
I’m going to jump the last node right to 6 to try and avoid that mess, and it 
has the same glusterd.vol settings. I’ll try and capture its logs during the 
upgrade and see if there’s any new info, or if it has the same issues as this 
group did.
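
For anyone wanting to spot that multiple-brick-process symptom, comparing what
gluster reports with what is actually running shows it quickly (a sketch;
VOLNAME is a placeholder):

gluster volume status VOLNAME      # the brick PIDs gluster believes are serving
pgrep -af glusterfsd | sort        # the glusterfsd processes actually running here
# more than one glusterfsd per brick path is the symptom I was seeing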

  -Darrell

> On Apr 4, 2019, at 2:54 AM, Sanju Rakonde  wrote:
> 
> We don't hit https://bugzilla.redhat.com/show_bug.cgi?id=1694010 while
> upgrading to glusterfs-6. We tested it in different setups and concluded
> that this issue is seen because of something specific to the setup.
> 
> Regarding the issue you have faced, can you please let us know which 
> documentation you followed for the upgrade? During our testing, we 
> didn't hit any such issue, and we would like to understand what went wrong.
> 
> On Thu, Apr 4, 2019 at 2:08 AM Darrell Budic  > wrote:
> Hari-
> 
> I was upgrading my test cluster from 5.5 to 6 and I hit this bug 
> (https://bugzilla.redhat.com/show_bug.cgi?id=1694010 
> ) or something similar. 
> In my case, the workaround did not work, and I was left with a gluster that 
> had gone into no-quorum mode and stopped all the bricks. Wasn’t much in the 
> logs either, but I noticed my /etc/glusterfs/glusterd.vol files were not the 
> same as the newer versions, so I updated them, restarted glusterd, and 
> suddenly the updated node showed as peer-in-cluster again. Once I updated 
> other nodes the same way, things started working again. Maybe a place to look?
> 
> My old config (all nodes):
> volume management
> type mgmt/glusterd
> option working-directory /var/lib/glusterd
> option transport-type socket
> option transport.socket.keepalive-time 10
> option transport.socket.keepalive-interval 2
> option transport.socket.read-fail-log off
> option ping-timeout 10
> option event-threads 1
> option rpc-auth-allow-insecure on
> #   option transport.address-family inet6
> #   option base-port 49152
> end-volume
> 
> changed to:
> volume management
> type mgmt/glusterd
> option working-directory /var/lib/glusterd
> option transport-type socket,rdma
> option transport.socket.keepalive-time 10
> option transport.socket.keepalive-interval 2
> option transport.socket.read-fail-log off
> option transport.socket.listen-port 24007
> option transport.rdma.listen-port 24008
> option ping-timeout 0
> option event-threads 1
> option rpc-auth-allow-insecure on
> #   option lock-timer 180
> #   option transport.address-family inet6
> #   option base-port 49152
> option max-port  60999
> end-volume
> 
> the only thing I found in the glusterd logs that looks relevant was (repeated 
> for both of the other nodes in this cluster), so no cl

Re: [Gluster-users] [Gluster-devel] Upgrade testing to gluster 6

2019-04-04 Thread Sanju Rakonde
We don't hit https://bugzilla.redhat.com/show_bug.cgi?id=1694010 while
upgrading to glusterfs-6. We tested it in different setups and concluded
that this issue is seen because of something specific to the setup.

Regarding the issue you have faced, can you please let us know which
documentation you followed for the upgrade? During our testing, we
didn't hit any such issue, and we would like to understand what went wrong.

On Thu, Apr 4, 2019 at 2:08 AM Darrell Budic  wrote:

> Hari-
>
> I was upgrading my test cluster from 5.5 to 6 and I hit this bug (
> https://bugzilla.redhat.com/show_bug.cgi?id=1694010) or something
> similar. In my case, the workaround did not work, and I was left with a
> gluster that had gone into no-quorum mode and stopped all the bricks.
> Wasn’t much in the logs either, but I noticed my
> /etc/glusterfs/glusterd.vol files were not the same as the newer versions,
> so I updated them, restarted glusterd, and suddenly the updated node showed
> as peer-in-cluster again. Once I updated other nodes the same way, things
> started working again. Maybe a place to look?
>
> My old config (all nodes):
> volume management
> type mgmt/glusterd
> option working-directory /var/lib/glusterd
> option transport-type socket
> option transport.socket.keepalive-time 10
> option transport.socket.keepalive-interval 2
> option transport.socket.read-fail-log off
> option ping-timeout 10
> option event-threads 1
> option rpc-auth-allow-insecure on
> #   option transport.address-family inet6
> #   option base-port 49152
> end-volume
>
> changed to:
> volume management
> type mgmt/glusterd
> option working-directory /var/lib/glusterd
> option transport-type socket,rdma
> option transport.socket.keepalive-time 10
> option transport.socket.keepalive-interval 2
> option transport.socket.read-fail-log off
> option transport.socket.listen-port 24007
> option transport.rdma.listen-port 24008
> option ping-timeout 0
> option event-threads 1
> option rpc-auth-allow-insecure on
> #   option lock-timer 180
> #   option transport.address-family inet6
> #   option base-port 49152
> option max-port  60999
> end-volume
>
> the only thing I found in the glusterd logs that looks relevant was
> (repeated for both of the other nodes in this cluster), so no clue why it
> happened:
> [2019-04-03 20:19:16.802638] I [MSGID: 106004]
> [glusterd-handler.c:6427:__glusterd_peer_rpc_notify] 0-management: Peer
> <hostname> (<0ecbf953-681b-448f-9746-d1c1fe7a0978>), in state <Peer in
> Cluster>, has disconnected from glusterd.
>
>
> On Apr 2, 2019, at 4:53 AM, Atin Mukherjee 
> wrote:
>
>
>
> On Mon, 1 Apr 2019 at 10:28, Hari Gowtham  wrote:
>
>> Comments inline.
>>
>> On Mon, Apr 1, 2019 at 5:55 AM Sankarshan Mukhopadhyay
>>  wrote:
>> >
>> > Quite a considerable amount of detail here. Thank you!
>> >
>> > On Fri, Mar 29, 2019 at 11:42 AM Hari Gowtham 
>> wrote:
>> > >
>> > > Hello Gluster users,
>> > >
>> > > As you are all aware, glusterfs-6 is out. We would like to inform you
>> > > that we have spent a significant amount of time testing
>> > > glusterfs-6 in upgrade scenarios. We have done upgrade testing to
>> > > glusterfs-6 from various releases like 3.12, 4.1 and 5.3.
>> > >
>> > > As glusterfs-6 has a lot of changes, we wanted to test those portions.
>> > > There were xlators (and respective options to enable/disable them)
>> > > added and deprecated in glusterfs-6 from various versions [1].
>> > >
>> > > We had to check the following upgrade scenarios for all such options
>> > > Identified in [1]:
>> > > 1) option never enabled and upgraded
>> > > 2) option enabled and then upgraded
>> > > 3) option enabled and then disabled and then upgraded
>> > >
>> > > We weren't able to manually check all the combinations for all the
>> > > options, so the options involving enabling and disabling xlators were
>> > > prioritized. Below are the results of the ones tested.
>> > >
>> > > Never enabled and upgraded:
>> > > checked from 3.12, 4.1, 5.3 to 6 the upgrade works.
>> > >
>> > > Enabled and upgraded:
>> > > Tested for tier, which is deprecated; it is not a recommended upgrade.
>> > > As expected the volume won't be consumable and will have a few more
>> > > issues as well.
>> > > Tested with 3.12, 4.1 and 5.3 to 6 upgrade.
>> > >
>> > > Enabled, disabled before upgrade.
>> > > Tested for tier with 3.12 and the upgrade went fine.
>> > >
>> > > There is one common issue to note in every upgrade: the node being
>> > > upgraded goes into disconnected state. You have to flush the iptables
>> > > and then restart glusterd on all nodes to fix this.
>> > >
>> >
>> > Is this something that is written in the upgrade notes? I do not seem
>> > to recall, if not, I'll send a PR
>>
>> No this wasn't mentioned in the release notes. PRs are welcome.
>>
>> >
>> > > The testing for enabling new options is still pending. The new options
>> > > won't cause as much issues as

Re: [Gluster-users] [Gluster-devel] Upgrade testing to gluster 6

2019-04-03 Thread Darrell Budic
Hari-

I was upgrading my test cluster from 5.5 to 6 and I hit this bug 
(https://bugzilla.redhat.com/show_bug.cgi?id=1694010 
) or something similar. In 
my case, the workaround did not work, and I was left with a gluster that had 
gone into no-quorum mode and stopped all the bricks. Wasn’t much in the logs 
either, but I noticed my /etc/glusterfs/glusterd.vol files were not the same as 
the newer versions, so I updated them, restarted glusterd, and suddenly the 
updated node showed as peer-in-cluster again. Once I updated other nodes the 
same way, things started working again. Maybe a place to look?

My old config (all nodes):
volume management
type mgmt/glusterd
option working-directory /var/lib/glusterd
option transport-type socket
option transport.socket.keepalive-time 10
option transport.socket.keepalive-interval 2
option transport.socket.read-fail-log off
option ping-timeout 10
option event-threads 1
option rpc-auth-allow-insecure on
#   option transport.address-family inet6
#   option base-port 49152
end-volume

changed to:
volume management
type mgmt/glusterd
option working-directory /var/lib/glusterd
option transport-type socket,rdma
option transport.socket.keepalive-time 10
option transport.socket.keepalive-interval 2
option transport.socket.read-fail-log off
option transport.socket.listen-port 24007
option transport.rdma.listen-port 24008
option ping-timeout 0
option event-threads 1
option rpc-auth-allow-insecure on
#   option lock-timer 180
#   option transport.address-family inet6
#   option base-port 49152
option max-port  60999
end-volume
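
An easy way to catch this kind of drift is to compare the live file with what
the package ships (a sketch; it assumes the upgrade leaves the packaged copy
alongside as glusterd.vol.rpmnew when the file has local edits, which is what
rpm does for config-noreplace files):

rpm -qf /etc/glusterfs/glusterd.vol        # which package owns the file
ls -l /etc/glusterfs/glusterd.vol*         # look for a .rpmnew from the upgrade
diff -u /etc/glusterfs/glusterd.vol /etc/glusterfs/glusterd.vol.rpmnew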

the only thing I found in the glusterd logs that looks relevant was (repeated 
for both of the other nodes in this cluster), so no clue why it happened:
[2019-04-03 20:19:16.802638] I [MSGID: 106004]
[glusterd-handler.c:6427:__glusterd_peer_rpc_notify] 0-management: Peer
<hostname> (<0ecbf953-681b-448f-9746-d1c1fe7a0978>), in state <Peer in
Cluster>, has disconnected from glusterd.
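
If it helps anyone else dig, the peer disconnects are easy to pull out of the
logs and cross-check against peer state (default log path assumed):

grep __glusterd_peer_rpc_notify /var/log/glusterfs/glusterd.log | tail -n 20
gluster peer status    # compare against State: Peer in Cluster (Connected)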


> On Apr 2, 2019, at 4:53 AM, Atin Mukherjee  wrote:
> 
> 
> 
> On Mon, 1 Apr 2019 at 10:28, Hari Gowtham  > wrote:
> Comments inline.
> 
> On Mon, Apr 1, 2019 at 5:55 AM Sankarshan Mukhopadhyay
>  > wrote:
> >
> > Quite a considerable amount of detail here. Thank you!
> >
> > On Fri, Mar 29, 2019 at 11:42 AM Hari Gowtham  > > wrote:
> > >
> > > Hello Gluster users,
> > >
> > > As you are all aware, glusterfs-6 is out. We would like to inform you
> > > that we have spent a significant amount of time testing
> > > glusterfs-6 in upgrade scenarios. We have done upgrade testing to
> > > glusterfs-6 from various releases like 3.12, 4.1 and 5.3.
> > >
> > > As glusterfs-6 has a lot of changes, we wanted to test those portions.
> > > There were xlators (and respective options to enable/disable them)
> > > added and deprecated in glusterfs-6 from various versions [1].
> > >
> > > We had to check the following upgrade scenarios for all such options
> > > Identified in [1]:
> > > 1) option never enabled and upgraded
> > > 2) option enabled and then upgraded
> > > 3) option enabled and then disabled and then upgraded
> > >
> > > We weren't able to manually check all the combinations for all the
> > > options, so the options involving enabling and disabling xlators were
> > > prioritized. Below are the results of the ones tested.
> > >
> > > Never enabled and upgraded:
> > > checked from 3.12, 4.1, 5.3 to 6 the upgrade works.
> > >
> > > Enabled and upgraded:
> > > Tested for tier, which is deprecated; it is not a recommended upgrade.
> > > As expected the volume won't be consumable and will have a few more
> > > issues as well.
> > > Tested with 3.12, 4.1 and 5.3 to 6 upgrade.
> > >
> > > Enabled, disabled before upgrade.
> > > Tested for tier with 3.12 and the upgrade went fine.
> > >
> > > There is one common issue to note in every upgrade: the node being
> > > upgraded goes into disconnected state. You have to flush the iptables
> > > and then restart glusterd on all nodes to fix this.
> > >
> >
> > Is this something that is written in the upgrade notes? I do not seem
> > to recall, if not, I'll send a PR
> 
> No this wasn't mentioned in the release notes. PRs are welcome.
> 
> >
> > > The testing for enabling new options is still pending. The new options
> > > won't cause as many issues as the deprecated ones, so this was put at
> > > the end of the priority list. It would be nice to get contributions
> > > for this.
> > >
> >
> > Did the range of tests lead to any new issues?
> 
> Yes. In the first round of testing we found an issue and had to postpone the
> release of 6 until the fix was made available.
> https://bugzilla.redhat.com/show_bug.cgi?id=1684029 
> 
> 
> And then we tested it again

Re: [Gluster-users] [Gluster-devel] Upgrade testing to gluster 6

2019-04-02 Thread Atin Mukherjee
On Mon, 1 Apr 2019 at 10:28, Hari Gowtham  wrote:

> Comments inline.
>
> On Mon, Apr 1, 2019 at 5:55 AM Sankarshan Mukhopadhyay
>  wrote:
> >
> > Quite a considerable amount of detail here. Thank you!
> >
> > On Fri, Mar 29, 2019 at 11:42 AM Hari Gowtham 
> wrote:
> > >
> > > Hello Gluster users,
> > >
> > > As you are all aware, glusterfs-6 is out. We would like to inform you
> > > that we have spent a significant amount of time testing
> > > glusterfs-6 in upgrade scenarios. We have done upgrade testing to
> > > glusterfs-6 from various releases like 3.12, 4.1 and 5.3.
> > >
> > > As glusterfs-6 has a lot of changes, we wanted to test those portions.
> > > There were xlators (and respective options to enable/disable them)
> > > added and deprecated in glusterfs-6 from various versions [1].
> > >
> > > We had to check the following upgrade scenarios for all such options
> > > Identified in [1]:
> > > 1) option never enabled and upgraded
> > > 2) option enabled and then upgraded
> > > 3) option enabled and then disabled and then upgraded
> > >
> > > We weren't able to manually check all the combinations for all the
> > > options, so the options involving enabling and disabling xlators were
> > > prioritized. Below are the results of the ones tested.
> > >
> > > Never enabled and upgraded:
> > > checked from 3.12, 4.1, 5.3 to 6 the upgrade works.
> > >
> > > Enabled and upgraded:
> > > Tested for tier, which is deprecated; it is not a recommended upgrade.
> > > As expected the volume won't be consumable and will have a few more
> > > issues as well.
> > > Tested with 3.12, 4.1 and 5.3 to 6 upgrade.
> > >
> > > Enabled, disabled before upgrade.
> > > Tested for tier with 3.12 and the upgrade went fine.
> > >
> > > There is one common issue to note in every upgrade: the node being
> > > upgraded goes into disconnected state. You have to flush the iptables
> > > and then restart glusterd on all nodes to fix this.
> > >
> >
> > Is this something that is written in the upgrade notes? I do not seem
> > to recall, if not, I'll send a PR
>
> No this wasn't mentioned in the release notes. PRs are welcome.
>
> >
> > > The testing for enabling new options is still pending. The new options
> > > won't cause as many issues as the deprecated ones, so this was put at
> > > the end of the priority list. It would be nice to get contributions
> > > for this.
> > >
> >
> > Did the range of tests lead to any new issues?
>
> Yes. In the first round of testing we found an issue and had to postpone
> the
> release of 6 until the fix was made available.
> https://bugzilla.redhat.com/show_bug.cgi?id=1684029
>
> And then we tested it again after this patch was made available,
> and came across this:
> https://bugzilla.redhat.com/show_bug.cgi?id=1694010


This isn’t a bug, as we found that the upgrade worked seamlessly in two
different setups. So we have no issues in the upgrade path for the
glusterfs-6 release.


>
> I have mentioned in the second mail how to get over this situation
> for now until the fix is available.
>
> >
> > > For the disable testing, tier was used as it covers most of the xlators
> > > that were removed. And all of these tests were done on a replica 3 volume.
> > >
> >
> > I'm not sure if the Glusto team is reading this, but it would be
> > pertinent to understand if the approach you have taken can be
> > converted into a form of automated testing pre-release.
>
> I don't have an answer for this, have CCed Vijay.
> He might have an idea.
>
> >
> > > Note: This is only for upgrade testing of the newly added and removed
> > > xlators. Does not involve the normal tests for the xlator.
> > >
> > > If you have any questions, please feel free to reach us.
> > >
> > > [1]
> https://docs.google.com/spreadsheets/d/1nh7T5AXaV6kc5KgILOy2pEqjzC3t_R47f1XUXSVFetI/edit?usp=sharing
> > >
> > > Regards,
> > > Hari and Sanju.
> > ___
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>
> --
> Regards,
> Hari Gowtham.
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>
-- 
--Atin
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [Gluster-devel] Upgrade testing to gluster 6

2019-03-31 Thread Hari Gowtham
Comments inline.

On Mon, Apr 1, 2019 at 5:55 AM Sankarshan Mukhopadhyay
 wrote:
>
> Quite a considerable amount of detail here. Thank you!
>
> On Fri, Mar 29, 2019 at 11:42 AM Hari Gowtham  wrote:
> >
> > Hello Gluster users,
> >
> > As you are all aware, glusterfs-6 is out. We would like to inform you
> > that we have spent a significant amount of time testing
> > glusterfs-6 in upgrade scenarios. We have done upgrade testing to
> > glusterfs-6 from various releases like 3.12, 4.1 and 5.3.
> >
> > As glusterfs-6 has a lot of changes, we wanted to test those portions.
> > There were xlators (and respective options to enable/disable them)
> > added and deprecated in glusterfs-6 from various versions [1].
> >
> > We had to check the following upgrade scenarios for all such options
> > Identified in [1]:
> > 1) option never enabled and upgraded
> > 2) option enabled and then upgraded
> > 3) option enabled and then disabled and then upgraded
> >
> > We weren't able to manually check all the combinations for all the options,
> > so the options involving enabling and disabling xlators were prioritized.
> > Below are the results of the ones tested.
> >
> > Never enabled and upgraded:
> > checked from 3.12, 4.1, 5.3 to 6 the upgrade works.
> >
> > Enabled and upgraded:
> > Tested for tier, which is deprecated; it is not a recommended upgrade.
> > As expected the volume won't be consumable and will have a few more
> > issues as well.
> > Tested with 3.12, 4.1 and 5.3 to 6 upgrade.
> >
> > Enabled, disabled before upgrade.
> > Tested for tier with 3.12 and the upgrade went fine.
> >
> > There is one common issue to note in every upgrade: the node being
> > upgraded goes into disconnected state. You have to flush the iptables
> > and then restart glusterd on all nodes to fix this.
> >
>
> Is this something that is written in the upgrade notes? I do not seem
> to recall, if not, I'll send a PR

No this wasn't mentioned in the release notes. PRs are welcome.

>
> > The testing for enabling new options is still pending. The new options
> > won't cause as many issues as the deprecated ones, so this was put at
> > the end of the priority list. It would be nice to get contributions
> > for this.
> >
>
> Did the range of tests lead to any new issues?

Yes. In the first round of testing we found an issue and had to postpone the
release of 6 until the fix was made available.
https://bugzilla.redhat.com/show_bug.cgi?id=1684029

And then we tested it again after this patch was made available,
and came across this:
https://bugzilla.redhat.com/show_bug.cgi?id=1694010

I have mentioned in the second mail how to get over this situation
for now until the fix is available.

>
> > For the disable testing, tier was used as it covers most of the xlators
> > that were removed. And all of these tests were done on a replica 3 volume.
> >
>
> I'm not sure if the Glusto team is reading this, but it would be
> pertinent to understand if the approach you have taken can be
> converted into a form of automated testing pre-release.

I don't have an answer for this, have CCed Vijay.
He might have an idea.

>
> > Note: This is only for upgrade testing of the newly added and removed
> > xlators. Does not involve the normal tests for the xlator.
> >
> > If you have any questions, please feel free to reach us.
> >
> > [1] 
> > https://docs.google.com/spreadsheets/d/1nh7T5AXaV6kc5KgILOy2pEqjzC3t_R47f1XUXSVFetI/edit?usp=sharing
> >
> > Regards,
> > Hari and Sanju.
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users



-- 
Regards,
Hari Gowtham.
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] [Gluster-devel] Upgrade testing to gluster 6

2019-03-31 Thread Sankarshan Mukhopadhyay
Quite a considerable amount of detail here. Thank you!

On Fri, Mar 29, 2019 at 11:42 AM Hari Gowtham  wrote:
>
> Hello Gluster users,
>
> As you are all aware, glusterfs-6 is out. We would like to inform you
> that we have spent a significant amount of time testing
> glusterfs-6 in upgrade scenarios. We have done upgrade testing to
> glusterfs-6 from various releases like 3.12, 4.1 and 5.3.
>
> As glusterfs-6 has a lot of changes, we wanted to test those portions.
> There were xlators (and respective options to enable/disable them)
> added and deprecated in glusterfs-6 from various versions [1].
>
> We had to check the following upgrade scenarios for all such options
> Identified in [1]:
> 1) option never enabled and upgraded
> 2) option enabled and then upgraded
> 3) option enabled and then disabled and then upgraded
>
> We weren't able to manually check all the combinations for all the options,
> so the options involving enabling and disabling xlators were prioritized.
> Below are the results of the ones tested.
>
> Never enabled and upgraded:
> checked from 3.12, 4.1, 5.3 to 6 the upgrade works.
>
> Enabled and upgraded:
> Tested for tier, which is deprecated; it is not a recommended upgrade.
> As expected the volume won't be consumable and will have a few more
> issues as well.
> Tested with 3.12, 4.1 and 5.3 to 6 upgrade.
>
> Enabled, disabled before upgrade.
> Tested for tier with 3.12 and the upgrade went fine.
>
> There is one common issue to note in every upgrade: the node being
> upgraded goes into disconnected state. You have to flush the iptables
> and then restart glusterd on all nodes to fix this.
>

Is this something that is written in the upgrade notes? I do not seem
to recall, if not, I'll send a PR

> The testing for enabling new options is still pending. The new options
> won't cause as many issues as the deprecated ones, so this was put at
> the end of the priority list. It would be nice to get contributions
> for this.
>

Did the range of tests lead to any new issues?

> For the disable testing, tier was used as it covers most of the xlators
> that were removed. And all of these tests were done on a replica 3 volume.
>

I'm not sure if the Glusto team is reading this, but it would be
pertinent to understand if the approach you have taken can be
converted into a form of automated testing pre-release.

> Note: This is only for upgrade testing of the newly added and removed
> xlators. Does not involve the normal tests for the xlator.
>
> If you have any questions, please feel free to reach us.
>
> [1] 
> https://docs.google.com/spreadsheets/d/1nh7T5AXaV6kc5KgILOy2pEqjzC3t_R47f1XUXSVFetI/edit?usp=sharing
>
> Regards,
> Hari and Sanju.
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users