Hi again,
I went ahead and upgraded the last two nodes in the cluster. This is what I noted:

I upgraded the arbiter first, and in /var/lib/glusterd/vols/gds-common/info
the parameter "nfs.disable=on" was added by the upgrade, which made the checksum fail.
I removed "nfs.disable=on" and all three nodes connected fine.

I upgraded one of the other nodes and no changes were made to the
/var/lib/glusterd/vols/gds-common/info file, so the arbiter node and the
recently upgraded node had contact.

I upgraded the last node, and on this node the parameter "nfs.disable=on" was
again added to /var/lib/glusterd/vols/gds-common/info. I removed
"nfs.disable=on" and restarted glusterd, and the entire cluster is up and
running the way it should.
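
For reference, the fix on each node where the checksum failed was roughly the
following (a sketch, assuming glusterd runs under systemd; keep a backup of the
file before editing it):

    systemctl stop glusterd
    cp /var/lib/glusterd/vols/gds-common/info /var/lib/glusterd/vols/gds-common/info.bak
    sed -i '/^nfs.disable=on$/d' /var/lib/glusterd/vols/gds-common/info
    systemctl start glusterd
    gluster peer status

The last command is just to confirm that the peers show up as connected again
instead of rejected.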
The command:
gluster volume get all cluster.max-op-version
still says:
Option                                  Value
------                                  -----
cluster.max-op-version                  100000

I hope that this info helps!
Please let me know if I can help out in any other way!

Regards
Marcus

On Tue, Feb 21, 2023 at 01:19:58PM +0100, Marcus Pedersén wrote:
> Hi Xavi,
> Copying the same info file worked well and the gluster 11 arbiter
> is now up and running and all the nodes are communicating
> the way they should.
>
> Just another note on something I discovered on my virt machines.
> All three nodes have been upgraded to 11.0 and are working.
> If I run:
> gluster volume get all cluster.op-version
> I get:
> Option                                  Value
> ------                                  -----
> cluster.op-version                      100000
>
> Which is correct, as I have not updated the op-version,
> but if I run:
> gluster volume get all cluster.max-op-version
> I get:
> Option                                  Value
> ------                                  -----
> cluster.max-op-version                  100000
>
> I expected the max-op-version to be 110000.
> Isn't it supposed to be 110000?
> And after the upgrade you should raise the op-version
> to 110000?
>
> Many thanks for all your help!
> Regards
> Marcus
>
>
> On Tue, Feb 21, 2023 at 09:29:28AM +0100, Xavi Hernandez wrote:
> > Hi Marcus,
> >
> > On Mon, Feb 20, 2023 at 2:53 PM Marcus Pedersén <[email protected]> wrote:
> > Hi again Xavi,
> >
> > I did some more testing on my virt machines with the same setup:
> > Number of Bricks: 1 x (2 + 1) = 3
> >
> > If I do it the same way, upgrading the arbiter first,
> > I get the same behaviour: the bricks do not start
> > and the other nodes do not "see" the upgraded node.
> > If I upgrade one of the other nodes (non-arbiter) and restart
> > glusterd on both the arbiter and the other node, the arbiter starts
> > the bricks and connects with the other upgraded node as expected.
> > If I upgrade the last node (non-arbiter) it will fail to start
> > the bricks, same behaviour as the arbiter at first.
> > If I then copy /var/lib/glusterd/vols/<myvol> from the
> > upgraded (non-arbiter) node to the node that does not start the bricks,
> > replace its /var/lib/glusterd/vols/<myvol> with the copied directory
> > and restart glusterd, it works nicely after that.
> > Everything then works the way it should.
> >
> > So the question is whether the arbiter is treated in some other way
> > compared to the other nodes?
> >
> > It seems so, but at this point I'm not sure what could be the difference.
> >
> > Some type of config is happening at the start of glusterd that
> > makes the node fail?
> >
> > Gluster requires that all glusterd share the same configuration. In this
> > case it seems that the "info" file in the volume definition has different
> > contents on the servers. One of the servers has the value "nfs.disable=on"
> > but the others do not. This can be the difference that causes the checksum
> > error.
> >
> > You can try to copy the "info" file from one node to the one that doesn't
> > start and try restarting glusterd.
> >
> > Do I dare to continue to upgrade my real cluster in the way described
> > above?
> >
> > Thanks!
> > Regards
> > Marcus
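> >
> > (Xavi's suggestion of copying just the "info" file, as a sketch on the
> > rejected node, with urd-gds-031 standing in for one of the healthy peers;
> > keep a backup of the local file first:
> >   scp urd-gds-031:/var/lib/glusterd/vols/gds-common/info /var/lib/glusterd/vols/gds-common/
> >   systemctl restart glusterd
> > )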
> >
> > On Mon, Feb 20, 2023 at 01:42:47PM +0100, Marcus Pedersén wrote:
> > > I made a recursive diff on the upgraded arbiter.
> > >
> > > /var/lib/glusterd/vols/gds-common is the upgraded arbiter
> > > /home/marcus/gds-common is one of the other nodes still on gluster 10
> > >
> > > diff -r /var/lib/glusterd/vols/gds-common/bricks/urd-gds-030:-urd-gds-gds-common /home/marcus/gds-common/bricks/urd-gds-030:-urd-gds-gds-common
> > > 5c5
> > > < listen-port=60419
> > > ---
> > > > listen-port=0
> > > 11c11
> > > < brick-fsid=14764358630653534655
> > > ---
> > > > brick-fsid=0
> > > diff -r /var/lib/glusterd/vols/gds-common/bricks/urd-gds-031:-urd-gds-gds-common /home/marcus/gds-common/bricks/urd-gds-031:-urd-gds-gds-common
> > > 5c5
> > > < listen-port=0
> > > ---
> > > > listen-port=60891
> > > 11c11
> > > < brick-fsid=0
> > > ---
> > > > brick-fsid=1088380223149770683
> > > diff -r /var/lib/glusterd/vols/gds-common/cksum /home/marcus/gds-common/cksum
> > > 1c1
> > > < info=3948700922
> > > ---
> > > > info=458813151
> > > diff -r /var/lib/glusterd/vols/gds-common/gds-common.urd-gds-030.urd-gds-gds-common.vol /home/marcus/gds-common/gds-common.urd-gds-030.urd-gds-gds-common.vol
> > > 3c3
> > > < option shared-brick-count 1
> > > ---
> > > > option shared-brick-count 0
> > > diff -r /var/lib/glusterd/vols/gds-common/gds-common.urd-gds-031.urd-gds-gds-common.vol /home/marcus/gds-common/gds-common.urd-gds-031.urd-gds-gds-common.vol
> > > 3c3
> > > < option shared-brick-count 0
> > > ---
> > > > option shared-brick-count 1
> > > diff -r /var/lib/glusterd/vols/gds-common/info /home/marcus/gds-common/info
> > > 23a24
> > > > nfs.disable=on
> > >
> > >
> > > I set up 3 virt machines and configured them with gluster 10 (arbiter 1).
> > > After that I upgraded to 11 and the first 2 nodes were fine, but on the third
> > > node I got the same behaviour: the brick never started.
> > >
> > > Thanks for the help!
> > >
> > > Regards
> > > Marcus
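> > >
> > > (For completeness, the same comparison can be scripted across all three
> > > nodes; a sketch, assuming passwordless ssh between them:
> > >   for h in urd-gds-030 urd-gds-031 urd-gds-032; do
> > >     echo "== $h =="
> > >     ssh $h "md5sum /var/lib/glusterd/vols/gds-common/info; cat /var/lib/glusterd/vols/gds-common/cksum"
> > >   done
> > > A node whose info file differs should stand out with a different checksum,
> > > matching the "Version of Cksums ... differ" errors in glusterd.log.)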
> > >
> > > On Mon, Feb 20, 2023 at 12:30:37PM +0100, Xavi Hernandez wrote:
> > > > Hi Marcus,
> > > >
> > > > On Mon, Feb 20, 2023 at 8:50 AM Marcus Pedersén <[email protected]> wrote:
> > > > Hi Xavi,
> > > > I stopped glusterd, ran killall glusterd glusterfs glusterfsd
> > > > and started glusterd again.
> > > >
> > > > The only log that is not empty is glusterd.log, I attach the log
> > > > from the restart time. The brick log, glustershd.log and
> > > > glfsheal-gds-common.log are empty.
> > > >
> > > > These are the errors in the log:
> > > > [2023-02-20 07:23:46.235263 +0000] E [MSGID: 106061] [glusterd.c:597:glusterd_crt_georep_folders] 0-glusterd: Dict get failed [{Key=log-group}, {errno=2}, {error=No such file or directory}]
> > > > [2023-02-20 07:23:47.359917 +0000] E [MSGID: 106010] [glusterd-utils.c:3542:glusterd_compare_friend_volume] 0-management: Version of Cksums gds-common differ. local cksum = 3017846959, remote cksum = 2065453698 on peer urd-gds-031
> > > > [2023-02-20 07:23:47.438052 +0000] E [MSGID: 106010] [glusterd-utils.c:3542:glusterd_compare_friend_volume] 0-management: Version of Cksums gds-common differ. local cksum = 3017846959, remote cksum = 2065453698 on peer urd-gds-032
> > > >
> > > > Geo-replication is not set up, so I guess there is nothing strange about
> > > > there being an error regarding georep.
> > > > The checksum error seems natural to be there, as the other nodes are
> > > > still on version 10.
> > > >
> > > > No. The configurations should be identical.
> > > >
> > > > Can you try to compare volume definitions in
> > > > /var/lib/glusterd/vols/gds-common between the upgraded server and one
> > > > of the old ones?
> > > >
> > > > Regards,
> > > >
> > > > Xavi
> > > >
> > > > My previous experience with upgrades is that the local bricks start and
> > > > gluster is up and running, with no connection to the other nodes until
> > > > they are upgraded as well.
> > > >
> > > > gluster peer status gives the output:
> > > > Number of Peers: 2
> > > >
> > > > Hostname: urd-gds-032
> > > > Uuid: e6f96ad2-0fea-4d80-bd42-8236dd0f8439
> > > > State: Peer Rejected (Connected)
> > > >
> > > > Hostname: urd-gds-031
> > > > Uuid: 2d7c0ad7-dfcf-4eaf-9210-f879c7b406bf
> > > > State: Peer Rejected (Connected)
> > > >
> > > > I suppose and guess that this is because the arbiter is version 11
> > > > and the other 2 nodes are version 10.
> > > >
> > > > Please let me know if I can provide any other information
> > > > to try to solve this issue.
> > > >
> > > > Many thanks!
> > > > Marcus
> > > >
> > > >
> > > > On Mon, Feb 20, 2023 at 07:29:20AM +0100, Xavi Hernandez wrote:
> > > > > Hi Marcus,
> > > > >
> > > > > these errors shouldn't prevent the bricks from starting. Isn't there
> > > > > any other error or warning?
> > > > >
> > > > > Regards,
> > > > >
> > > > > Xavi
> > > > >
> > > > > On Fri, Feb 17, 2023 at 3:06 PM Marcus Pedersén <[email protected]> wrote:
> > > > > Hi all,
> > > > > I started an upgrade to gluster 11.0 from 10.3 on one of my clusters.
> > > > > OS: Debian bullseye
> > > > >
> > > > > Volume Name: gds-common
> > > > > Type: Replicate
> > > > > Volume ID: 42c9fa00-2d57-4a58-b5ae-c98c349cfcb6
> > > > > Status: Started
> > > > > Snapshot Count: 0
> > > > > Number of Bricks: 1 x (2 + 1) = 3
> > > > > Transport-type: tcp
> > > > > Bricks:
> > > > > Brick1: urd-gds-031:/urd-gds/gds-common
> > > > > Brick2: urd-gds-032:/urd-gds/gds-common
> > > > > Brick3: urd-gds-030:/urd-gds/gds-common (arbiter)
> > > > > Options Reconfigured:
> > > > > cluster.granular-entry-heal: on
> > > > > storage.fips-mode-rchecksum: on
> > > > > transport.address-family: inet
> > > > > nfs.disable: on
> > > > > performance.client-io-threads: off
> > > > >
> > > > > I started with the arbiter node, stopped all of gluster,
> > > > > upgraded to 11.0 and all went fine.
> > > > > After the upgrade I was able to see the other nodes and
> > > > > all nodes were connected.
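> > > > >
> > > > > ("Stopped all of gluster, upgraded to 11.0" amounts to something like
> > > > > the following per node; a sketch, assuming the gluster 11 apt
> > > > > repository is already configured on Debian bullseye:
> > > > >   systemctl stop glusterd
> > > > >   killall glusterd glusterfs glusterfsd
> > > > >   apt update && apt install glusterfs-server
> > > > >   systemctl start glusterd
> > > > > )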
> > > > > After a reboot of the arbiter nothing works the way it should.
> > > > > Both brick1 and brick2 have connection, but there is no connection
> > > > > with the arbiter.
> > > > > On the arbiter glusterd has started and is listening on port 24007;
> > > > > the problem seems to be glusterfsd, it never starts!
> > > > >
> > > > > If I run: gluster volume status
> > > > >
> > > > > Status of volume: gds-common
> > > > > Gluster process                             TCP Port  RDMA Port  Online  Pid
> > > > > ------------------------------------------------------------------------------
> > > > > Brick urd-gds-030:/urd-gds/gds-common       N/A       N/A        N       N/A
> > > > > Self-heal Daemon on localhost               N/A       N/A        N       N/A
> > > > >
> > > > > Task Status of Volume gds-common
> > > > > ------------------------------------------------------------------------------
> > > > > There are no active volume tasks
> > > > >
> > > > > In glusterd.log I find the following errors (arbiter node):
> > > > > [2023-02-17 12:30:40.519585 +0000] E [gf-io-uring.c:404:gf_io_uring_setup] 0-io: [MSGID:101240] Function call failed <{function=io_uring_setup()}, {error=12 (Cannot allocate memory)}>
> > > > > [2023-02-17 12:30:40.678031 +0000] E [MSGID: 106061] [glusterd.c:597:glusterd_crt_georep_folders] 0-glusterd: Dict get failed [{Key=log-group}, {errno=2}, {error=No such file or directory}]
> > > > >
> > > > > In brick/urd-gds-gds-common.log I find the following error:
> > > > > [2023-02-17 12:30:43.550753 +0000] E [gf-io-uring.c:404:gf_io_uring_setup] 0-io: [MSGID:101240] Function call failed <{function=io_uring_setup()}, {error=12 (Cannot allocate memory)}>
> > > > >
> > > > > I enclose both logfiles.
> > > > >
> > > > > How do I resolve this issue??
> > > > >
> > > > > Many thanks in advance!!
> > > > >
> > > > > Marcus
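> > > > >
> > > > > (When a brick shows N/A like this, the usual first checks are, as a
> > > > > sketch:
> > > > >   gluster volume status gds-common
> > > > >   less /var/log/glusterfs/bricks/urd-gds-gds-common.log
> > > > >   gluster volume start gds-common force
> > > > > where "start ... force" only respawns brick processes that are not
> > > > > running on an already started volume.)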
________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-users
