Hi again,
I went ahead and upgraded the last two nodes in the cluster. This is what I noted:

I upgraded the arbiter first, and in /var/lib/glusterd/vols/gds-common/info
the parameter "nfs.disable=on" was added by the upgrade, which made the checksum fail.
I removed "nfs.disable=on" and all three nodes connected fine.

I upgraded one of the other nodes and no changes were made to the
/var/lib/glusterd/vols/gds-common/info file, so the arbiter node and the
recently upgraded node had contact.

I upgraded the last node, and on this node the parameter "nfs.disable=on" was
again added to /var/lib/glusterd/vols/gds-common/info. I removed
"nfs.disable=on" and restarted glusterd, and the entire cluster is up and
running the way it should.
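
For reference, the fix on each node where the checksum failed was roughly the
following (a sketch, assuming glusterd runs under systemd; keep a backup of the
file before editing it):

    systemctl stop glusterd
    cp /var/lib/glusterd/vols/gds-common/info /var/lib/glusterd/vols/gds-common/info.bak
    sed -i '/^nfs.disable=on$/d' /var/lib/glusterd/vols/gds-common/info
    systemctl start glusterd
    gluster peer status

The last command is just to confirm that the peers show up as connected again
instead of rejected.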
The command:
gluster volume get all cluster.max-op-version
still says:
Option                                  Value
------                                  -----
cluster.max-op-version                  100000

I hope that this info helps!
Please let me know if I can help out in any other way!

Regards
Marcus

On Tue, Feb 21, 2023 at 01:19:58PM +0100, Marcus Pedersén wrote:
> Hi Xavi,
> Copying the same info file worked well and the gluster 11 arbiter
> is now up and running and all the nodes are communicating
> the way they should.
>
> Just another note on something I discovered on my virt machines.
> All three nodes have been upgraded to 11.0 and are working.
> If I run:
> gluster volume get all cluster.op-version
> I get:
> Option                                  Value
> ------                                  -----
> cluster.op-version                      100000
>
> Which is correct, as I have not updated the op-version,
> but if I run:
> gluster volume get all cluster.max-op-version
> I get:
> Option                                  Value
> ------                                  -----
> cluster.max-op-version                  100000
>
> I expected the max-op-version to be 110000.
> Isn't it supposed to be 110000?
> And after the upgrade you should raise the op-version
> to 110000?
>
> Many thanks for all your help!
> Regards
> Marcus
>
>
> On Tue, Feb 21, 2023 at 09:29:28AM +0100, Xavi Hernandez wrote:
> > Hi Marcus,
> >
> > On Mon, Feb 20, 2023 at 2:53 PM Marcus Pedersén <[email protected]> wrote:
> > Hi again Xavi,
> >
> > I did some more testing on my virt machines with the same setup:
> > Number of Bricks: 1 x (2 + 1) = 3
> >
> > If I do it the same way, upgrading the arbiter first,
> > I get the same behaviour: the bricks do not start
> > and the other nodes do not "see" the upgraded node.
> > If I upgrade one of the other nodes (non-arbiter) and restart
> > glusterd on both the arbiter and the other node, the arbiter starts
> > the bricks and connects with the other upgraded node as expected.
> > If I upgrade the last node (non-arbiter) it will fail to start
> > the bricks, same behaviour as the arbiter at first.
> > If I then copy /var/lib/glusterd/vols/<myvol> from the
> > upgraded (non-arbiter) node to the node that does not start the bricks,
> > replace its /var/lib/glusterd/vols/<myvol> with the copied directory
> > and restart glusterd, it works nicely after that.
> > Everything then works the way it should.
> >
> > So the question is whether the arbiter is treated in some other way
> > compared to the other nodes?
> >
> > It seems so, but at this point I'm not sure what could be the difference.
> >
> > Some type of config is happening at the start of glusterd that
> > makes the node fail?
> >
> > Gluster requires that all glusterd share the same configuration. In this
> > case it seems that the "info" file in the volume definition has different
> > contents on the servers. One of the servers has the value "nfs.disable=on"
> > but the others do not. This can be the difference that causes the checksum
> > error.
> >
> > You can try to copy the "info" file from one node to the one that doesn't
> > start and try restarting glusterd.
> >
> > Do I dare to continue to upgrade my real cluster in the way described
> > above?
> >
> > Thanks!
> > Regards
> > Marcus
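> >
> > (Xavi's suggestion of copying just the "info" file, as a sketch on the
> > rejected node, with urd-gds-031 standing in for one of the healthy peers;
> > keep a backup of the local file first:
> >   scp urd-gds-031:/var/lib/glusterd/vols/gds-common/info /var/lib/glusterd/vols/gds-common/
> >   systemctl restart glusterd
> > )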
> >
> > On Mon, Feb 20, 2023 at 01:42:47PM +0100, Marcus Pedersén wrote:
> > > I made a recursive diff on the upgraded arbiter.
> > >
> > > /var/lib/glusterd/vols/gds-common is the upgraded arbiter
> > > /home/marcus/gds-common is one of the other nodes still on gluster 10
> > >
> > > diff -r /var/lib/glusterd/vols/gds-common/bricks/urd-gds-030:-urd-gds-gds-common /home/marcus/gds-common/bricks/urd-gds-030:-urd-gds-gds-common
> > > 5c5
> > > < listen-port=60419
> > > ---
> > > > listen-port=0
> > > 11c11
> > > < brick-fsid=14764358630653534655
> > > ---
> > > > brick-fsid=0
> > > diff -r /var/lib/glusterd/vols/gds-common/bricks/urd-gds-031:-urd-gds-gds-common /home/marcus/gds-common/bricks/urd-gds-031:-urd-gds-gds-common
> > > 5c5
> > > < listen-port=0
> > > ---
> > > > listen-port=60891
> > > 11c11
> > > < brick-fsid=0
> > > ---
> > > > brick-fsid=1088380223149770683
> > > diff -r /var/lib/glusterd/vols/gds-common/cksum /home/marcus/gds-common/cksum
> > > 1c1
> > > < info=3948700922
> > > ---
> > > > info=458813151
> > > diff -r /var/lib/glusterd/vols/gds-common/gds-common.urd-gds-030.urd-gds-gds-common.vol /home/marcus/gds-common/gds-common.urd-gds-030.urd-gds-gds-common.vol
> > > 3c3
> > > < option shared-brick-count 1
> > > ---
> > > > option shared-brick-count 0
> > > diff -r /var/lib/glusterd/vols/gds-common/gds-common.urd-gds-031.urd-gds-gds-common.vol /home/marcus/gds-common/gds-common.urd-gds-031.urd-gds-gds-common.vol
> > > 3c3
> > > < option shared-brick-count 0
> > > ---
> > > > option shared-brick-count 1
> > > diff -r /var/lib/glusterd/vols/gds-common/info /home/marcus/gds-common/info
> > > 23a24
> > > > nfs.disable=on
> > >
> > >
> > > I set up 3 virt machines and configured them with gluster 10 (arbiter 1).
> > > After that I upgraded to 11 and the first 2 nodes were fine, but on the third
> > > node I got the same behaviour: the brick never started.
> > >
> > > Thanks for the help!
> > >
> > > Regards
> > > Marcus
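> > >
> > > (For completeness, the same comparison can be scripted across all three
> > > nodes; a sketch, assuming passwordless ssh between them:
> > >   for h in urd-gds-030 urd-gds-031 urd-gds-032; do
> > >     echo "== $h =="
> > >     ssh $h "md5sum /var/lib/glusterd/vols/gds-common/info; cat /var/lib/glusterd/vols/gds-common/cksum"
> > >   done
> > > A node whose info file differs should stand out with a different checksum,
> > > matching the "Version of Cksums ... differ" errors in glusterd.log.)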
> > >
> > > On Mon, Feb 20, 2023 at 12:30:37PM +0100, Xavi Hernandez wrote:
> > > > Hi Marcus,
> > > >
> > > > On Mon, Feb 20, 2023 at 8:50 AM Marcus Pedersén <[email protected]> wrote:
> > > > Hi Xavi,
> > > > I stopped glusterd, ran killall glusterd glusterfs glusterfsd
> > > > and started glusterd again.
> > > >
> > > > The only log that is not empty is glusterd.log, I attach the log
> > > > from the restart time. The brick log, glustershd.log and
> > > > glfsheal-gds-common.log are empty.
> > > >
> > > > These are the errors in the log:
> > > > [2023-02-20 07:23:46.235263 +0000] E [MSGID: 106061] [glusterd.c:597:glusterd_crt_georep_folders] 0-glusterd: Dict get failed [{Key=log-group}, {errno=2}, {error=No such file or directory}]
> > > > [2023-02-20 07:23:47.359917 +0000] E [MSGID: 106010] [glusterd-utils.c:3542:glusterd_compare_friend_volume] 0-management: Version of Cksums gds-common differ. local cksum = 3017846959, remote cksum = 2065453698 on peer urd-gds-031
> > > > [2023-02-20 07:23:47.438052 +0000] E [MSGID: 106010] [glusterd-utils.c:3542:glusterd_compare_friend_volume] 0-management: Version of Cksums gds-common differ. local cksum = 3017846959, remote cksum = 2065453698 on peer urd-gds-032
> > > >
> > > > Geo-replication is not set up, so I guess there is nothing strange about
> > > > there being an error regarding georep.
> > > > The checksum error seems natural to be there, as the other nodes are
> > > > still on version 10.
> > > >
> > > > No. The configurations should be identical.
> > > >
> > > > Can you try to compare volume definitions in
> > > > /var/lib/glusterd/vols/gds-common between the upgraded server and one
> > > > of the old ones?
> > > >
> > > > Regards,
> > > >
> > > > Xavi
> > > >
> > > > My previous experience with upgrades is that the local bricks start and
> > > > gluster is up and running, with no connection to the other nodes until
> > > > they are upgraded as well.
> > > >
> > > > gluster peer status gives the output:
> > > > Number of Peers: 2
> > > >
> > > > Hostname: urd-gds-032
> > > > Uuid: e6f96ad2-0fea-4d80-bd42-8236dd0f8439
> > > > State: Peer Rejected (Connected)
> > > >
> > > > Hostname: urd-gds-031
> > > > Uuid: 2d7c0ad7-dfcf-4eaf-9210-f879c7b406bf
> > > > State: Peer Rejected (Connected)
> > > >
> > > > I suppose and guess that this is because the arbiter is version 11
> > > > and the other 2 nodes are version 10.
> > > >
> > > > Please let me know if I can provide any other information
> > > > to try to solve this issue.
> > > >
> > > > Many thanks!
> > > > Marcus
> > > >
> > > >
> > > > On Mon, Feb 20, 2023 at 07:29:20AM +0100, Xavi Hernandez wrote:
> > > > > Hi Marcus,
> > > > >
> > > > > these errors shouldn't prevent the bricks from starting. Isn't there
> > > > > any other error or warning?
> > > > >
> > > > > Regards,
> > > > >
> > > > > Xavi
> > > > >
> > > > > On Fri, Feb 17, 2023 at 3:06 PM Marcus Pedersén <[email protected]> wrote:
> > > > > Hi all,
> > > > > I started an upgrade to gluster 11.0 from 10.3 on one of my clusters.
> > > > > OS: Debian bullseye
> > > > >
> > > > > Volume Name: gds-common
> > > > > Type: Replicate
> > > > > Volume ID: 42c9fa00-2d57-4a58-b5ae-c98c349cfcb6
> > > > > Status: Started
> > > > > Snapshot Count: 0
> > > > > Number of Bricks: 1 x (2 + 1) = 3
> > > > > Transport-type: tcp
> > > > > Bricks:
> > > > > Brick1: urd-gds-031:/urd-gds/gds-common
> > > > > Brick2: urd-gds-032:/urd-gds/gds-common
> > > > > Brick3: urd-gds-030:/urd-gds/gds-common (arbiter)
> > > > > Options Reconfigured:
> > > > > cluster.granular-entry-heal: on
> > > > > storage.fips-mode-rchecksum: on
> > > > > transport.address-family: inet
> > > > > nfs.disable: on
> > > > > performance.client-io-threads: off
> > > > >
> > > > > I started with the arbiter node, stopped all of gluster,
> > > > > upgraded to 11.0 and all went fine.
> > > > > After the upgrade I was able to see the other nodes and
> > > > > all nodes were connected.
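> > > > >
> > > > > ("Stopped all of gluster, upgraded to 11.0" amounts to something like
> > > > > the following per node; a sketch, assuming the gluster 11 apt
> > > > > repository is already configured on Debian bullseye:
> > > > >   systemctl stop glusterd
> > > > >   killall glusterd glusterfs glusterfsd
> > > > >   apt update && apt install glusterfs-server
> > > > >   systemctl start glusterd
> > > > > )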
> > > > > After a reboot of the arbiter nothing works the way it should.
> > > > > Both brick1 and brick2 have connection, but there is no connection
> > > > > with the arbiter.
> > > > > On the arbiter glusterd has started and is listening on port 24007;
> > > > > the problem seems to be glusterfsd, it never starts!
> > > > >
> > > > > If I run: gluster volume status
> > > > >
> > > > > Status of volume: gds-common
> > > > > Gluster process                             TCP Port  RDMA Port  Online  Pid
> > > > > ------------------------------------------------------------------------------
> > > > > Brick urd-gds-030:/urd-gds/gds-common       N/A       N/A        N       N/A
> > > > > Self-heal Daemon on localhost               N/A       N/A        N       N/A
> > > > >
> > > > > Task Status of Volume gds-common
> > > > > ------------------------------------------------------------------------------
> > > > > There are no active volume tasks
> > > > >
> > > > > In glusterd.log I find the following errors (arbiter node):
> > > > > [2023-02-17 12:30:40.519585 +0000] E [gf-io-uring.c:404:gf_io_uring_setup] 0-io: [MSGID:101240] Function call failed <{function=io_uring_setup()}, {error=12 (Cannot allocate memory)}>
> > > > > [2023-02-17 12:30:40.678031 +0000] E [MSGID: 106061] [glusterd.c:597:glusterd_crt_georep_folders] 0-glusterd: Dict get failed [{Key=log-group}, {errno=2}, {error=No such file or directory}]
> > > > >
> > > > > In brick/urd-gds-gds-common.log I find the following error:
> > > > > [2023-02-17 12:30:43.550753 +0000] E [gf-io-uring.c:404:gf_io_uring_setup] 0-io: [MSGID:101240] Function call failed <{function=io_uring_setup()}, {error=12 (Cannot allocate memory)}>
> > > > >
> > > > > I enclose both logfiles.
> > > > >
> > > > > How do I resolve this issue??
> > > > >
> > > > > Many thanks in advance!!
> > > > >
> > > > > Marcus
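> > > > >
> > > > > (When a brick shows N/A like this, the usual first checks are, as a
> > > > > sketch:
> > > > >   gluster volume status gds-common
> > > > >   less /var/log/glusterfs/bricks/urd-gds-gds-common.log
> > > > >   gluster volume start gds-common force
> > > > > where "start ... force" only respawns brick processes that are not
> > > > > running on an already started volume.)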
________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-users
