Re: [Gluster-users] Geo-replication status Faulty
Usually there is only one "master", but when you power off one of the two nodes, geo-replication should handle that and the second node should take over the job. How long did you wait after gluster1 was rebooted?

Best Regards,
Strahil Nikolov

On Monday, 26 October 2020 at 22:46:21 GMT+2, Gilberto Nunes wrote:

I was able to solve the issue by restarting all servers. Now I have another issue!

I just powered off the gluster01 server and the geo-replication entered Faulty status. I tried to stop and start the geo-replication like this:

gluster volume geo-replication DATA root@gluster03::DATA-SLAVE resume
Peer gluster01.home.local, which is a part of DATA volume, is down. Please bring up the peer and retry.
geo-replication command failed

How can I have geo-replication with 2 masters and 1 slave? Thanks

---
Gilberto Nunes Ferreira

On Mon, 26 Oct 2020 at 17:23, Gilberto Nunes wrote:

> Hi there...
>
> I'd created a 2-node gluster volume and another gluster server acting as a backup server, using geo-replication.
> So on gluster01 I issued:
>
> gluster peer probe gluster02; gluster peer probe gluster03
> gluster vol create DATA replica 2 gluster01:/DATA/master01-data gluster02:/DATA/master01-data/
>
> Then on the gluster03 server:
>
> gluster vol create DATA-SLAVE gluster03:/DATA/slave-data/
>
> I'd set up passwordless SSH sessions between these 3 servers.
>
> Then I used this script
>
> https://github.com/gilbertoferreira/georepsetup
>
> like this
>
> georepsetup
> /usr/local/lib/python2.7/dist-packages/paramiko-2.7.2-py2.7.egg/paramiko/transport.py:33: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in a future release.
>   from cryptography.hazmat.backends import default_backend
> usage: georepsetup [-h] [--force] [--no-color] MASTERVOL SLAVE SLAVEVOL
> georepsetup: error: too few arguments
>
> gluster01:~# georepsetup DATA gluster03 DATA-SLAVE
> /usr/local/lib/python2.7/dist-packages/paramiko-2.7.2-py2.7.egg/paramiko/transport.py:33: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in a future release.
>   from cryptography.hazmat.backends import default_backend
> Geo-replication session will be established between DATA and gluster03::DATA-SLAVE
> Root password of gluster03 is required to complete the setup. NOTE: Password will not be stored.
> root@gluster03's password:
> [OK] gluster03 is Reachable(Port 22)
> [OK] SSH Connection established root@gluster03
> [OK] Master Volume and Slave Volume are compatible (Version: 8.2)
> [OK] Common secret pub file present at /var/lib/glusterd/geo-replication/common_secret.pem.pub
> [OK] common_secret.pem.pub file copied to gluster03
> [OK] Master SSH Keys copied to all Up Slave nodes
> [OK] Updated Master SSH Keys to all Up Slave nodes authorized_keys file
> [OK] Geo-replication Session Established
>
> Then I rebooted the 3 servers...
> After a while everything works OK, but after a few minutes I get Faulty status on gluster01.
>
> Here's the log:
>
> [2020-10-26 20:16:41.362584] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
> [2020-10-26 20:16:41.362937] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/DATA/master01-data}, {slave_node=gluster03}]
> [2020-10-26 20:16:41.508884] I [resource(worker /DATA/master01-data):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
> [2020-10-26 20:16:42.996678] I [resource(worker /DATA/master01-data):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.4873}]
> [2020-10-26 20:16:42.997121] I [resource(worker /DATA/master01-data):1116:connect] GLUSTER: Mounting gluster volume locally...
> [2020-10-26 20:16:44.170661] E [syncdutils(worker /DATA/master01-data):110:gf_mount_ready] : failed to get the xattr value
> [2020-10-26 20:16:44.171281] I [resource(worker /DATA/master01-data):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.1739}]
> [2020-10-26 20:16:44.171772] I [subcmds(worker /DATA/master01-data):84:subcmd_worker] : Worker spawn successful. Acknowledging back to monitor
> [2020-10-26 20:16:46.200603] I [master(worker /DATA/master01-data):1645:register] _GMaster: Working dir [{path=/var/lib/misc/gluster/gsyncd/DATA_gluster03_DATA-SLAVE/DATA-master01-data}]
> [2020-10-26 20:16:46.201798] I [resource(worker /DATA/master01-data):1292:service_loop] GLUSTER: Register time [{time=1603743406}]
> [2020-10-26 20:16:46.226415] I [gsyncdstatus(worker /DATA/master01-data):281:set_active] GeorepStatus: Worker Status Change [{status=Active}]
> [2020-10-2
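A quick way to spot this state without waiting for a resume to fail is to poll the session status and count Faulty workers. A minimal sketch; the captured output below is illustrative sample data (not from a live cluster) standing in for the real command:

```shell
# Sample lines standing in for the tabular output of:
#   gluster volume geo-replication DATA gluster03::DATA-SLAVE status
# (header stripped; columns: MASTER NODE, MASTER VOL, MASTER BRICK,
#  SLAVE USER, SLAVE, SLAVE NODE, STATUS, CRAWL STATUS, LAST_SYNCED)
status_output='gluster01 DATA /DATA/master01-data root gluster03::DATA-SLAVE N/A Faulty N/A N/A
gluster02 DATA /DATA/master01-data root gluster03::DATA-SLAVE gluster03 Passive N/A N/A'

# Column 7 is the per-worker STATUS; count the Faulty ones.
faulty=$(printf '%s\n' "$status_output" | awk '$7 == "Faulty" { n++ } END { print n + 0 }')
echo "faulty workers: $faulty"
```

Feed the real status output into the same awk filter from cron or a monitoring agent, and alert when the count is non-zero.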
Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)
You need to fix that "rejected" issue before trying anything else. Have you tried to "detach" the arbiter and then "probe" it again? I have no idea what you did to reach that state - can you provide the details?

Best Regards,
Strahil Nikolov

On Monday, 26 October 2020 at 20:38:38 GMT+2, mabi wrote:

Ok, I see, I won't go down that path of disabling quota. I could now remove the arbiter brick of my volume which has the quota issue, so it is now a simple 2-node replica with 1 brick per node. Now I would like to add the brick back, but I get the following error:

volume add-brick: failed: Host arbiternode.domain.tld is not in 'Peer in Cluster' state

In fact I checked and the arbiter node is still rejected, as you can see here:

State: Peer Rejected (Connected)

In the arbiter node's glusterd.log file I see the following errors:

[2020-10-26 18:35:05.605124] E [MSGID: 106012] [glusterd-utils.c:3682:glusterd_compare_friend_volume] 0-management: Cksums of quota configuration of volume woelkli-private differ. local cksum = 0, remote cksum = 66908910 on peer node1.domain.tld
[2020-10-26 18:35:05.617009] E [MSGID: 106012] [glusterd-utils.c:3682:glusterd_compare_friend_volume] 0-management: Cksums of quota configuration of volume myvol-private differ. local cksum = 0, remote cksum = 66908910 on peer node2.domain.tld

So although I have removed the arbiter brick from my volume, it still complains about the checksum of the quota configuration. I also tried to restart glusterd on my arbiter node but it does not help. The peer is still rejected. What should I do at this stage?

‐‐‐ Original Message ‐‐‐
On Monday, October 26, 2020 6:06 PM, Strahil Nikolov wrote:

> Detaching the arbiter is pointless...
> Quota is an extended file attribute, and thus disabling and re-enabling quota on a volume with millions of files will take a lot of time and lots of IOPS. I would leave it as a last resort.
>
> Also, it was mentioned on the list that the following script might help you:
> https://github.com/gluster/glusterfs/blob/devel/extras/quota/quota_fsck.py
>
> You can take a look in the mailing list for usage and more details.
>
> Best Regards,
> Strahil Nikolov
>
> On Monday, 26 October 2020 at 16:40:06 GMT+2, Diego Zuccato diego.zucc...@unibo.it wrote:
>
> On 26/10/20 at 15:09, mabi wrote:
>
> > Right, seen like that this sounds reasonable. Do you actually remember the exact command you ran in order to remove the brick? I was thinking this should be it:
> > gluster volume remove-brick force
> > but should I use "force" or "start"?
>
> Memory does not serve me well (there are 28 disks, not 26!), but bash history does :)
>
> gluster volume remove-brick BigVol replica 2 str957-biostq:/srv/arbiters/{00..27}/BigVol force
> gluster peer detach str957-biostq
> gluster peer probe str957-biostq
> gluster volume add-brick BigVol replica 3 arbiter 1 str957-biostq:/srv/arbiters/{00..27}/BigVol
>
> You obviously have to wait for remove-brick to complete before detaching the arbiter.
>
> > > IIRC it took about 3 days, but the arbiters are on a VM (8 CPUs, 8GB RAM) that uses an iSCSI disk. More than 80% continuous load on both CPU and RAM.
> > That's quite long I must say, and I am in the same case as you: my arbiter is a VM.
>
> Give all the CPU and RAM you can. Less than 8GB RAM is asking for troubles (in my case).
>
> --
> Diego Zuccato
> DIFA - Dip. di Fisica e Astronomia
> Servizi Informatici
> Alma Mater Studiorum - Università di Bologna
> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> tel.: +39 051 20 95786
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://bluejeans.com/441850968
>
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
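For the "Peer Rejected" state itself, the Gluster troubleshooting docs describe a standard recovery performed on the rejected node: keep glusterd.info, wipe the rest of /var/lib/glusterd, restart glusterd, then re-probe from a healthy peer. A dry-run sketch of that sequence (the hostname is the one from this thread; `RUN=echo` makes every step only print the command, clear it to actually execute):

```shell
# Dry-run sketch of the documented "Peer Rejected" recovery, to be run ON
# the rejected (arbiter) node. RUN=echo turns each step into a printout;
# set RUN= (empty) to really execute. Back up /var/lib/glusterd first.
RUN=echo

# 1. Remove everything under /var/lib/glusterd EXCEPT glusterd.info
$RUN find /var/lib/glusterd -mindepth 1 -maxdepth 1 ! -name glusterd.info -exec rm -rf {} +
# 2. Restart the management daemon so it re-syncs volume configs from peers
$RUN systemctl restart glusterd
# 3. From a HEALTHY peer, re-probe the node and verify its state
$RUN gluster peer probe arbiternode.domain.tld
$RUN gluster peer status
```

Since the rejection here is caused by a quota-configuration checksum mismatch, this resync approach lets the arbiter pull a fresh copy of the volume config instead of fighting the stale cksum.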
[Gluster-users] Gluster monitoring
Hello,

How do you keep track of the health status of your Gluster volumes - bricks going down (crash, failure, shutdown), node failures, peering issues, ongoing heals? Gluster Tendrl is complex and sometimes broken, the Prometheus exporter is still lacking, and gstatus is basic. Currently, to monitor a Gluster volume, you need a custom script that gathers whatever info is required, or a combination of the tools mentioned.

Could Gluster have something similar to Ceph and display the health of the entire cluster? I know Ceph uses its "Monitors" to keep track of everything going on inside the cluster, but Gluster should also have a way to keep track of the cluster's health.

How is the community's experience with Gluster monitoring? How are you managing and tracking alerts and issues? Any recommendations?

Thank you.

--
Respectfully
Mahdi
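Short of a Ceph-style single health command, a small wrapper around the existing CLI covers the basics: bricks down via `gluster volume status`, pending heals via `gluster volume heal <vol> info`. A minimal sketch of the heal-count side; the sample text stands in for live command output and the volume/brick names are illustrative:

```shell
# Sample output standing in for: gluster volume heal DATA info
heal_info='Brick gluster01:/DATA/master01-data
Status: Connected
Number of entries: 0

Brick gluster02:/DATA/master01-data
Status: Connected
Number of entries: 3'

# Sum the per-brick "Number of entries" counters and flag anything non-zero.
pending=$(printf '%s\n' "$heal_info" | awk -F': ' '/^Number of entries:/ { n += $2 } END { print n + 0 }')
if [ "$pending" -gt 0 ]; then
    echo "WARN: $pending entries pending heal"
else
    echo "OK: no pending heals"
fi
```

The same pattern (run CLI, parse, emit OK/WARN) plugs straight into Nagios/Icinga checks or a textfile collector for Prometheus node_exporter.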
Re: [Gluster-users] Geo-replication status Faulty
Well, I did not reboot the host - I shut down the host. Then after 15 min I gave up. I don't know why that happened. I will try it later.

---
Gilberto Nunes Ferreira

On Mon, 26 Oct 2020 at 21:31, Strahil Nikolov wrote:

> Usually there is only one "master", but when you power off one of the two nodes, geo-replication should handle that and the second node should take over the job.
>
> How long did you wait after gluster1 was rebooted?
>
> Best Regards,
> Strahil Nikolov
>
> On Monday, 26 October 2020 at 22:46:21 GMT+2, Gilberto Nunes <gilberto.nune...@gmail.com> wrote:
>
> I was able to solve the issue by restarting all servers.
>
> Now I have another issue!
>
> I just powered off the gluster01 server and the geo-replication entered Faulty status.
> I tried to stop and start the geo-replication like this:
>
> gluster volume geo-replication DATA root@gluster03::DATA-SLAVE resume
> Peer gluster01.home.local, which is a part of DATA volume, is down. Please bring up the peer and retry. geo-replication command failed
>
> How can I have geo-replication with 2 masters and 1 slave?
>
> Thanks
>
> ---
> Gilberto Nunes Ferreira
>
> On Mon, 26 Oct 2020 at 17:23, Gilberto Nunes <gilberto.nune...@gmail.com> wrote:
>
> > Hi there...
> >
> > I'd created a 2-node gluster volume and another gluster server acting as a backup server, using geo-replication.
> > So on gluster01 I issued:
> >
> > gluster peer probe gluster02; gluster peer probe gluster03
> > gluster vol create DATA replica 2 gluster01:/DATA/master01-data gluster02:/DATA/master01-data/
> >
> > Then on the gluster03 server:
> >
> > gluster vol create DATA-SLAVE gluster03:/DATA/slave-data/
> >
> > I'd set up passwordless SSH sessions between these 3 servers.
> >
> > Then I used this script
> >
> > https://github.com/gilbertoferreira/georepsetup
> >
> > like this
> >
> > georepsetup
> > /usr/local/lib/python2.7/dist-packages/paramiko-2.7.2-py2.7.egg/paramiko/transport.py:33: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in a future release.
> >   from cryptography.hazmat.backends import default_backend
> > usage: georepsetup [-h] [--force] [--no-color] MASTERVOL SLAVE SLAVEVOL
> > georepsetup: error: too few arguments
> >
> > gluster01:~# georepsetup DATA gluster03 DATA-SLAVE
> > /usr/local/lib/python2.7/dist-packages/paramiko-2.7.2-py2.7.egg/paramiko/transport.py:33: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in a future release.
> >   from cryptography.hazmat.backends import default_backend
> > Geo-replication session will be established between DATA and gluster03::DATA-SLAVE
> > Root password of gluster03 is required to complete the setup. NOTE: Password will not be stored.
> > root@gluster03's password:
> > [OK] gluster03 is Reachable(Port 22)
> > [OK] SSH Connection established root@gluster03
> > [OK] Master Volume and Slave Volume are compatible (Version: 8.2)
> > [OK] Common secret pub file present at /var/lib/glusterd/geo-replication/common_secret.pem.pub
> > [OK] common_secret.pem.pub file copied to gluster03
> > [OK] Master SSH Keys copied to all Up Slave nodes
> > [OK] Updated Master SSH Keys to all Up Slave nodes authorized_keys file
> > [OK] Geo-replication Session Established
> >
> > Then I rebooted the 3 servers...
> > After a while everything works OK, but after a few minutes I get Faulty status on gluster01.
> >
> > Here's the log:
> >
> > [2020-10-26 20:16:41.362584] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
> > [2020-10-26 20:16:41.362937] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/DATA/master01-data}, {slave_node=gluster03}]
> > [2020-10-26 20:16:41.508884] I [resource(worker /DATA/master01-data):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
> > [2020-10-26 20:16:42.996678] I [resource(worker /DATA/master01-data):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.4873}]
> > [2020-10-26 20:16:42.997121] I [resource(worker /DATA/master01-data):1116:connect] GLUSTER: Mounting gluster volume locally...
> > [2020-10-26 20:16:44.170661] E [syncdutils(worker /DATA/master01-data):110:gf_mount_ready] : failed to get the xattr value
> > [2020-10-26 20:16:44.171281] I [resource(worker /DATA/master01-data):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.1739}]
> > [2020-10-26 20:16:44.171772] I [subcmds(worker /DATA/master01-data):84:subcmd_worker] : Worker spawn successful. Acknowledging back to monitor
> > [2020-10-26 20:16:46.200603] I [master(worker /DATA/master01-data):1645:register] _GMaster: Working dir [{path=/var/lib/
Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)
Ok, I see, I won't go down that path of disabling quota. I could now remove the arbiter brick of my volume which has the quota issue, so it is now a simple 2-node replica with 1 brick per node. Now I would like to add the brick back, but I get the following error:

volume add-brick: failed: Host arbiternode.domain.tld is not in 'Peer in Cluster' state

In fact I checked and the arbiter node is still rejected, as you can see here:

State: Peer Rejected (Connected)

In the arbiter node's glusterd.log file I see the following errors:

[2020-10-26 18:35:05.605124] E [MSGID: 106012] [glusterd-utils.c:3682:glusterd_compare_friend_volume] 0-management: Cksums of quota configuration of volume woelkli-private differ. local cksum = 0, remote cksum = 66908910 on peer node1.domain.tld
[2020-10-26 18:35:05.617009] E [MSGID: 106012] [glusterd-utils.c:3682:glusterd_compare_friend_volume] 0-management: Cksums of quota configuration of volume myvol-private differ. local cksum = 0, remote cksum = 66908910 on peer node2.domain.tld

So although I have removed the arbiter brick from my volume, it still complains about the checksum of the quota configuration. I also tried to restart glusterd on my arbiter node but it does not help. The peer is still rejected. What should I do at this stage?

‐‐‐ Original Message ‐‐‐
On Monday, October 26, 2020 6:06 PM, Strahil Nikolov wrote:

> Detaching the arbiter is pointless...
> Quota is an extended file attribute, and thus disabling and re-enabling quota on a volume with millions of files will take a lot of time and lots of IOPS. I would leave it as a last resort.
>
> Also, it was mentioned on the list that the following script might help you:
> https://github.com/gluster/glusterfs/blob/devel/extras/quota/quota_fsck.py
>
> You can take a look in the mailing list for usage and more details.
>
> Best Regards,
> Strahil Nikolov
>
> On Monday, 26 October 2020 at 16:40:06 GMT+2, Diego Zuccato diego.zucc...@unibo.it wrote:
>
> On 26/10/20 at 15:09, mabi wrote:
>
> > Right, seen like that this sounds reasonable. Do you actually remember the exact command you ran in order to remove the brick? I was thinking this should be it:
> > gluster volume remove-brick force
> > but should I use "force" or "start"?
>
> Memory does not serve me well (there are 28 disks, not 26!), but bash history does :)
>
> gluster volume remove-brick BigVol replica 2 str957-biostq:/srv/arbiters/{00..27}/BigVol force
> gluster peer detach str957-biostq
> gluster peer probe str957-biostq
> gluster volume add-brick BigVol replica 3 arbiter 1 str957-biostq:/srv/arbiters/{00..27}/BigVol
>
> You obviously have to wait for remove-brick to complete before detaching the arbiter.
>
> > > IIRC it took about 3 days, but the arbiters are on a VM (8 CPUs, 8GB RAM) that uses an iSCSI disk. More than 80% continuous load on both CPU and RAM.
> > That's quite long I must say, and I am in the same case as you: my arbiter is a VM.
>
> Give all the CPU and RAM you can. Less than 8GB RAM is asking for troubles (in my case).
>
> --
> Diego Zuccato
> DIFA - Dip. di Fisica e Astronomia
> Servizi Informatici
> Alma Mater Studiorum - Università di Bologna
> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> tel.: +39 051 20 95786
Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)
Detaching the arbiter is pointless...
Quota is an extended file attribute, and thus disabling and re-enabling quota on a volume with millions of files will take a lot of time and lots of IOPS. I would leave it as a last resort.

Also, it was mentioned on the list that the following script might help you:
https://github.com/gluster/glusterfs/blob/devel/extras/quota/quota_fsck.py

You can take a look in the mailing list for usage and more details.

Best Regards,
Strahil Nikolov

On Monday, 26 October 2020 at 16:40:06 GMT+2, Diego Zuccato wrote:

On 26/10/20 at 15:09, mabi wrote:
> Right, seen like that this sounds reasonable. Do you actually remember the exact command you ran in order to remove the brick? I was thinking this should be it:
> gluster volume remove-brick force
> but should I use "force" or "start"?

Memory does not serve me well (there are 28 disks, not 26!), but bash history does :)

# gluster volume remove-brick BigVol replica 2 str957-biostq:/srv/arbiters/{00..27}/BigVol force
# gluster peer detach str957-biostq
# gluster peer probe str957-biostq
# gluster volume add-brick BigVol replica 3 arbiter 1 str957-biostq:/srv/arbiters/{00..27}/BigVol

You obviously have to wait for remove-brick to complete before detaching the arbiter.

>> IIRC it took about 3 days, but the arbiters are on a VM (8 CPUs, 8GB RAM) that uses an iSCSI disk. More than 80% continuous load on both CPU and RAM.
> That's quite long I must say, and I am in the same case as you: my arbiter is a VM.

Give all the CPU and RAM you can. Less than 8GB RAM is asking for troubles (in my case).

--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
Re: [Gluster-users] Geo-replication status Faulty
I was able to solve the issue by restarting all servers.

Now I have another issue! I just powered off the gluster01 server and the geo-replication entered Faulty status. I tried to stop and start the geo-replication like this:

gluster volume geo-replication DATA root@gluster03::DATA-SLAVE resume
Peer gluster01.home.local, which is a part of DATA volume, is down. Please bring up the peer and retry. geo-replication command failed

How can I have geo-replication with 2 masters and 1 slave?

Thanks

---
Gilberto Nunes Ferreira

On Mon, 26 Oct 2020 at 17:23, Gilberto Nunes <gilberto.nune...@gmail.com> wrote:

> Hi there...
>
> I'd created a 2-node gluster volume and another gluster server acting as a backup server, using geo-replication.
> So on gluster01 I issued:
>
> gluster peer probe gluster02; gluster peer probe gluster03
> gluster vol create DATA replica 2 gluster01:/DATA/master01-data gluster02:/DATA/master01-data/
>
> Then on the gluster03 server:
>
> gluster vol create DATA-SLAVE gluster03:/DATA/slave-data/
>
> I'd set up passwordless SSH sessions between these 3 servers.
>
> Then I used this script
>
> https://github.com/gilbertoferreira/georepsetup
>
> like this
>
> georepsetup
> /usr/local/lib/python2.7/dist-packages/paramiko-2.7.2-py2.7.egg/paramiko/transport.py:33: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in a future release.
>   from cryptography.hazmat.backends import default_backend
> usage: georepsetup [-h] [--force] [--no-color] MASTERVOL SLAVE SLAVEVOL
> georepsetup: error: too few arguments
>
> gluster01:~# georepsetup DATA gluster03 DATA-SLAVE
> /usr/local/lib/python2.7/dist-packages/paramiko-2.7.2-py2.7.egg/paramiko/transport.py:33: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in a future release.
>   from cryptography.hazmat.backends import default_backend
> Geo-replication session will be established between DATA and gluster03::DATA-SLAVE
> Root password of gluster03 is required to complete the setup. NOTE: Password will not be stored.
>
> root@gluster03's password:
> [OK] gluster03 is Reachable(Port 22)
> [OK] SSH Connection established root@gluster03
> [OK] Master Volume and Slave Volume are compatible (Version: 8.2)
> [OK] Common secret pub file present at /var/lib/glusterd/geo-replication/common_secret.pem.pub
> [OK] common_secret.pem.pub file copied to gluster03
> [OK] Master SSH Keys copied to all Up Slave nodes
> [OK] Updated Master SSH Keys to all Up Slave nodes authorized_keys file
> [OK] Geo-replication Session Established
>
> Then I rebooted the 3 servers...
> After a while everything works OK, but after a few minutes I get Faulty status on gluster01.
>
> Here's the log:
>
> [2020-10-26 20:16:41.362584] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
> [2020-10-26 20:16:41.362937] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/DATA/master01-data}, {slave_node=gluster03}]
> [2020-10-26 20:16:41.508884] I [resource(worker /DATA/master01-data):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
> [2020-10-26 20:16:42.996678] I [resource(worker /DATA/master01-data):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.4873}]
> [2020-10-26 20:16:42.997121] I [resource(worker /DATA/master01-data):1116:connect] GLUSTER: Mounting gluster volume locally...
> [2020-10-26 20:16:44.170661] E [syncdutils(worker /DATA/master01-data):110:gf_mount_ready] : failed to get the xattr value
> [2020-10-26 20:16:44.171281] I [resource(worker /DATA/master01-data):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.1739}]
> [2020-10-26 20:16:44.171772] I [subcmds(worker /DATA/master01-data):84:subcmd_worker] : Worker spawn successful. Acknowledging back to monitor
> [2020-10-26 20:16:46.200603] I [master(worker /DATA/master01-data):1645:register] _GMaster: Working dir [{path=/var/lib/misc/gluster/gsyncd/DATA_gluster03_DATA-SLAVE/DATA-master01-data}]
> [2020-10-26 20:16:46.201798] I [resource(worker /DATA/master01-data):1292:service_loop] GLUSTER: Register time [{time=1603743406}]
> [2020-10-26 20:16:46.226415] I [gsyncdstatus(worker /DATA/master01-data):281:set_active] GeorepStatus: Worker Status Change [{status=Active}]
> [2020-10-26 20:16:46.395112] I [gsyncdstatus(worker /DATA/master01-data):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change [{status=History Crawl}]
> [2020-10-26 20:16:46.396491] I [master(worker /DATA/master01-data):1559:crawl] _GMaster: starting history crawl [{turns=1}, {stime=(1603742506, 0)}, {etime=1603743406}, {ent
[Gluster-users] Geo-replication status Faulty
Hi there...

I'd created a 2-node gluster volume and another gluster server acting as a backup server, using geo-replication. So on gluster01 I issued:

gluster peer probe gluster02; gluster peer probe gluster03
gluster vol create DATA replica 2 gluster01:/DATA/master01-data gluster02:/DATA/master01-data/

Then on the gluster03 server:

gluster vol create DATA-SLAVE gluster03:/DATA/slave-data/

I'd set up passwordless SSH sessions between these 3 servers.

Then I used this script

https://github.com/gilbertoferreira/georepsetup

like this

georepsetup
/usr/local/lib/python2.7/dist-packages/paramiko-2.7.2-py2.7.egg/paramiko/transport.py:33: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in a future release.
  from cryptography.hazmat.backends import default_backend
usage: georepsetup [-h] [--force] [--no-color] MASTERVOL SLAVE SLAVEVOL
georepsetup: error: too few arguments

gluster01:~# georepsetup DATA gluster03 DATA-SLAVE
/usr/local/lib/python2.7/dist-packages/paramiko-2.7.2-py2.7.egg/paramiko/transport.py:33: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in a future release.
  from cryptography.hazmat.backends import default_backend
Geo-replication session will be established between DATA and gluster03::DATA-SLAVE
Root password of gluster03 is required to complete the setup. NOTE: Password will not be stored.

root@gluster03's password:
[OK] gluster03 is Reachable(Port 22)
[OK] SSH Connection established root@gluster03
[OK] Master Volume and Slave Volume are compatible (Version: 8.2)
[OK] Common secret pub file present at /var/lib/glusterd/geo-replication/common_secret.pem.pub
[OK] common_secret.pem.pub file copied to gluster03
[OK] Master SSH Keys copied to all Up Slave nodes
[OK] Updated Master SSH Keys to all Up Slave nodes authorized_keys file
[OK] Geo-replication Session Established

Then I rebooted the 3 servers...
After a while everything works OK, but after a few minutes I get Faulty status on gluster01.

Here's the log:

[2020-10-26 20:16:41.362584] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
[2020-10-26 20:16:41.362937] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/DATA/master01-data}, {slave_node=gluster03}]
[2020-10-26 20:16:41.508884] I [resource(worker /DATA/master01-data):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
[2020-10-26 20:16:42.996678] I [resource(worker /DATA/master01-data):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.4873}]
[2020-10-26 20:16:42.997121] I [resource(worker /DATA/master01-data):1116:connect] GLUSTER: Mounting gluster volume locally...
[2020-10-26 20:16:44.170661] E [syncdutils(worker /DATA/master01-data):110:gf_mount_ready] : failed to get the xattr value
[2020-10-26 20:16:44.171281] I [resource(worker /DATA/master01-data):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.1739}]
[2020-10-26 20:16:44.171772] I [subcmds(worker /DATA/master01-data):84:subcmd_worker] : Worker spawn successful. Acknowledging back to monitor
[2020-10-26 20:16:46.200603] I [master(worker /DATA/master01-data):1645:register] _GMaster: Working dir [{path=/var/lib/misc/gluster/gsyncd/DATA_gluster03_DATA-SLAVE/DATA-master01-data}]
[2020-10-26 20:16:46.201798] I [resource(worker /DATA/master01-data):1292:service_loop] GLUSTER: Register time [{time=1603743406}]
[2020-10-26 20:16:46.226415] I [gsyncdstatus(worker /DATA/master01-data):281:set_active] GeorepStatus: Worker Status Change [{status=Active}]
[2020-10-26 20:16:46.395112] I [gsyncdstatus(worker /DATA/master01-data):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change [{status=History Crawl}]
[2020-10-26 20:16:46.396491] I [master(worker /DATA/master01-data):1559:crawl] _GMaster: starting history crawl [{turns=1}, {stime=(1603742506, 0)}, {etime=1603743406}, {entry_stime=(1603743226, 0)}]
[2020-10-26 20:16:46.399292] E [resource(worker /DATA/master01-data):1312:service_loop] GLUSTER: Changelog History Crawl failed [{error=[Errno 0] Sucesso}]
[2020-10-26 20:16:47.177205] I [monitor(monitor):228:monitor] Monitor: worker died in startup phase [{brick=/DATA/master01-data}]
[2020-10-26 20:16:47.184525] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}]

Any advice will be welcome.

Thanks

---
Gilberto Nunes Ferreira
Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)
‐‐‐ Original Message ‐‐‐
On Monday, October 26, 2020 3:39 PM, Diego Zuccato wrote:

> Memory does not serve me well (there are 28 disks, not 26!), but bash history does :)

Yes, I also rely on history too often ;)

> gluster volume remove-brick BigVol replica 2 str957-biostq:/srv/arbiters/{00..27}/BigVol force

Thanks for the info - it looks like I was missing the "replica 2" part of the command.

> gluster peer detach str957-biostq
> gluster peer probe str957-biostq

Do I really need to detach and re-probe the arbiter node? I would like to avoid that because I have two other volumes with even more files... that would mean I have to remove the arbiter brick of the two other volumes too...

> Give all the CPU and RAM you can. Less than 8GB RAM is asking for troubles (in my case).

I have added an extra 4 GB of RAM just in case.
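Before deciding on a detach/probe, it can help to confirm from the CLI which peers are actually rejected; `gluster peer status` reports a state per peer. A small sketch that counts rejected peers - the sample text stands in for live output, and the hostnames/UUIDs are illustrative:

```shell
# Sample output standing in for: gluster peer status
peer_status='Number of Peers: 2

Hostname: arbiternode.domain.tld
Uuid: 11111111-2222-3333-4444-555555555555
State: Peer Rejected (Connected)

Hostname: node2.domain.tld
Uuid: 66666666-7777-8888-9999-000000000000
State: Peer in Cluster (Connected)'

# Count peers whose state line reports "Peer Rejected".
rejected=$(printf '%s\n' "$peer_status" | grep -c 'State: Peer Rejected')
echo "rejected peers: $rejected"
```

If only the arbiter shows "Peer Rejected", the fix can be limited to that one node; healthy "Peer in Cluster" members need no detach.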
Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)
‐‐‐ Original Message ‐‐‐ On Monday, October 26, 2020 2:56 PM, Diego Zuccato wrote:

> The volume is built by 26 10TB disks w/ genetic data. I currently don't
> have exact numbers, but it's still at the beginning, so there are a bit
> less than 10TB actually used.
> But you're only removing the arbiters, you always have two copies of
> your files. The worst that can happen is a split brain condition
> (avoidable by requiring a 2-nodes quorum, in that case the worst is that
> the volume goes readonly).

Right, seen like that this sounds reasonable. Do you actually remember the exact command you ran in order to remove the brick? I was thinking it should be:

gluster volume remove-brick force

but should I use "force" or "start"?

> IIRC it took about 3 days, but the arbiters are on a VM (8CPU, 8GB RAM)
> that uses an iSCSI disk. More than 80% continuous load on both CPUs and RAM.

That's quite long I must say, and I am in the same case as you: my arbiter is a VM.
Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)
On Monday, October 26, 2020 11:34 AM, Diego Zuccato wrote:

> IIRC it's the same issue I had some time ago.
> I solved it by "degrading" the volume to replica 2, then cleared the
> arbiter bricks and upgraded again to replica 3 arbiter 1.

Thanks Diego for pointing out this workaround. How much data do you have on that volume in terms of TB and files? I have around 3TB of data in 10 million files, so I am a bit worried about taking such drastic measures.

How bad was the load on your volume when re-adding the arbiter brick? And how long did it take to sync/heal?

Would another workaround, such as turning off quotas on that problematic volume, work? That sounds much less scary, but I don't know if it would work...
Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)
On 26/10/20 15:09, mabi wrote:
> Right, seen like that this sounds reasonable. Do you actually remember the
> exact command you ran in order to remove the brick? I was thinking it
> should be:
> gluster volume remove-brick force
> but should I use "force" or "start"?
Memory does not serve me well (there are 28 disks, not 26!), but bash history does :)

# gluster volume remove-brick BigVol replica 2 str957-biostq:/srv/arbiters/{00..27}/BigVol force
# gluster peer detach str957-biostq
# gluster peer probe str957-biostq
# gluster volume add-brick BigVol replica 3 arbiter 1 str957-biostq:/srv/arbiters/{00..27}/BigVol

You obviously have to wait for remove-brick to complete before detaching the arbiter.

>> IIRC it took about 3 days, but the arbiters are on a VM (8CPU, 8GB RAM)
>> that uses an iSCSI disk. More than 80% continuous load on both CPUs and RAM.
> That's quite long I must say and I am in the same case as you, my arbiter is
> a VM.
Give all the CPU and RAM you can. Less than 8GB RAM is asking for troubles (in my case).

--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
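The `{00..27}` pattern in Diego's commands is plain bash brace expansion, so `remove-brick` and `add-brick` receive all 28 arbiter brick paths as separate arguments. A quick bash sketch (host and path copied from this thread) to preview exactly what the gluster command line would be given:

```shell
#!/usr/bin/env bash
# Expand the arbiter brick list the same way the gluster commands above do.
bricks=( str957-biostq:/srv/arbiters/{00..27}/BigVol )
echo "${#bricks[@]}"   # number of bricks: 28
echo "${bricks[0]}"    # first brick: str957-biostq:/srv/arbiters/00/BigVol
echo "${bricks[27]}"   # last brick:  str957-biostq:/srv/arbiters/27/BigVol
```

Previewing the expansion with `echo` before pasting it into a destructive command like `remove-brick ... force` is a cheap safety check.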
Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)
On 26/10/20 14:46, mabi wrote:
>> I solved it by "degrading" the volume to replica 2, then cleared the
>> arbiter bricks and upgraded again to replica 3 arbiter 1.
> Thanks Diego for pointing out this workaround. How much data do you have on
> that volume in terms of TB and files? Because I have around 3TB of data in 10
> million files. So I am a bit worried of taking such drastic measures.
The volume is built by 26 10TB disks w/ genetic data. I currently don't have exact numbers, but it's still at the beginning, so there are a bit less than 10TB actually used.
But you're only removing the arbiters, you always have two copies of your files. The worst that can happen is a split brain condition (avoidable by requiring a 2-nodes quorum, in that case the worst is that the volume goes readonly).

> How bad was the load after on your volume when re-adding the arbiter brick?
> and how long did it take to sync/heal?
IIRC it took about 3 days, but the arbiters are on a VM (8CPU, 8GB RAM) that uses an iSCSI disk. More than 80% continuous load on both CPUs and RAM.

> Would another workaround such as turning off quotas on that problematic
> volume work? That sounds much less scary but I don't know if that would
> work...
I don't know, sorry.

--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
Re: [Gluster-users] Glusterfs as databse store
Hi Strahil,

Thanks for your feedback. I had already received it, and it seems very useful: you had pointed at the /var/lib/glusterd/groups/db-workload profile, which includes recommended gluster volume settings for such workloads (including direct IO). I will be testing this setup, though I expect no issues apart from slower performance than a native setup.

On Sun, Oct 25, 2020 at 9:45 PM Strahil Nikolov wrote:
> Hey Alex,
>
> sorry for the late reply - seems you went to the SPAM dir.
>
> I think that a DB with direct I/O won't have any issues with Gluster. As a
> second thought, DBs know their data file names, so even 1 file per table
> will work quite OK.
>
> But you will need a lot of testing before putting something into
> production.
>
> Best Regards,
> Strahil Nikolov
>
> On Monday, 12 October 2020, 21:10:03 GMT+3, Alex K <rightkickt...@gmail.com> wrote:
>
> On Mon, Oct 12, 2020, 19:24 Strahil Nikolov wrote:
>> Hi Alex,
>>
>> I can share that oVirt is using Gluster as a HCI solution and many
>> people are hosting DBs in their Virtual Machines. Yet, oVirt bypasses any
>> file system caches and uses Direct I/O in order to ensure consistency.
>>
>> As you will be using pacemaker, drbd is a viable solution that can be
>> controlled easily.
> Thank you Strahil. I have been using oVirt with glusterfs successfully for
> the last 5 years and I'm very happy with it. Though the VM gluster volume
> has sharding enabled by default, and I suspect this is different if you run
> a DB directly on top of glusterfs. I assume there are optimizations one
> could apply at the gluster volume level (use direct IO?, small-file workload
> optimizations, etc.), and I was hoping there were success stories of DBs on
> top of glusterfs. I might go with drbd, as the latest version is much more
> scalable and simplified.
> > Best Regards,
> > Strahil Nikolov
> >
> > On Monday, 12 October 2020, 12:12:18 GMT+3, Alex K <rightkickt...@gmail.com> wrote:
> >
> > On Mon, Oct 12, 2020 at 9:47 AM Diego Zuccato wrote:
> >> Il 10/10/20 16:53, Alex K ha scritto:
> >>> Reading from the docs i see that this is not recommended?
> >> IIUC the risk of having partially-unsynced data is too high.
> >> DB replication is not easy to configure because it's hard to do well,
> >> even active/passive.
> >> But I can tell you that a 3-node mariadb (galera) cluster is not hard to
> >> set up. Just follow one of the tutorials. It's nearly as easy as setting
> >> up a replica3 gluster volume :)
> >> And "guarantees" consistency in the DB data.
> > I see. Since I will not have only mariadb, I would have to set up the same
> > replication for postgresql and later influxdb, which adds to the
> > complexity. For cluster management I will be using pacemaker/corosync.
> >
> > Thanks for your feedback.
> >
> >> --
> >> Diego Zuccato
> >> DIFA - Dip. di Fisica e Astronomia
> >> Servizi Informatici
> >> Alma Mater Studiorum - Università di Bologna
> >> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> >> tel.: +39 051 20 95786
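The db-workload settings group mentioned in this thread ships with recent Gluster releases and can be applied in a single command. A hedged sketch follows; the volume name `dbvol` is illustrative, it requires a live cluster, and the group file contents vary by Gluster version, so inspect it before applying:

```shell
# Sketch: inspect and apply the db-workload settings group ("dbvol" is a made-up name).
# The group file ships on the Gluster servers; applying it needs a live cluster.
cat /var/lib/glusterd/groups/db-workload
gluster volume set dbvol group db-workload
gluster volume info dbvol
```

Applying a group sets every option listed in the group file in one transaction, which keeps the tuning reproducible across volumes.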
Re: [Gluster-users] missing files on FUSE mount
Hi Strahil,

thanks for your reply. I had one node with 13 clients, the rest with 14. I've just restarted the services on that node; now I have 14, let's see what happens. Regarding the samba repos, I wasn't aware of that, I was using the CentOS main repo. I'll check them out.

Best Regards,
Martin

On Tue, Oct 20, 2020 at 3:19 PM Strahil Nikolov wrote:
> Do you have the same amount of clients connected to each brick?
>
> I guess something like this can show it:
>
> gluster volume status VOL clients
> gluster volume status VOL client-list
>
> Best Regards,
> Strahil Nikolov
>
> On Tuesday, 20 October 2020, 15:41:45 GMT+3, Martín Lorenzo <mlore...@gmail.com> wrote:
>
> Hi, I have the following problem: I have a distributed replicated cluster
> set up with samba and CTDB, over FUSE mount points.
> I am having inconsistencies across the FUSE mounts; users report that files
> are disappearing after being copied/moved. I take a look at the mount points
> on each node, and they don't display the same data.
>
> Faulty mount point:
> [root@gluster6 ARRIBA GENTE martes 20 de octubre]# ll
> ls: cannot access PANEO VUELTA A CLASES CON TAPABOCAS.mpg: No such file or directory
> ls: cannot access PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg: No such file or directory
> total 633723
> drwxr-xr-x. 5 arribagente PN        4096 Oct 19 10:52 COMERCIAL AG martes 20 de octubre
> -rw-r--r--. 1 arribagente PN   648927236 Jun  3 07:16 PANEO FACHADA PALACIO LEGISLATIVO DRONE DIA Y NOCHE.mpg
> -?????????? ? ?           ?            ?            ? PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg
> -?????????? ? ?           ?            ?            ? PANEO VUELTA A CLASES CON TAPABOCAS.mpg
>
> ### healthy mount point ###
> [root@gluster7 ARRIBA GENTE martes 20 de octubre]# ll
> total 3435596
> drwxr-xr-x. 5 arribagente PN        4096 Oct 19 10:52 COMERCIAL AG martes 20 de octubre
> -rw-r--r--. 1 arribagente PN   648927236 Jun  3 07:16 PANEO FACHADA PALACIO LEGISLATIVO DRONE DIA Y NOCHE.mpg
> -rw-r--r--. 1 arribagente PN  2084415492 Aug 18 09:14 PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg
> -rw-r--r--. 1 arribagente PN   784701444 Sep  4 07:23 PANEO VUELTA A CLASES CON TAPABOCAS.mpg
>
> So far the only way to solve this is to create a directory in the healthy
> mount point, on the same path:
> [root@gluster7 ARRIBA GENTE martes 20 de octubre]# mkdir hola
>
> When you refresh the other mount point, the issue is resolved:
> [root@gluster6 ARRIBA GENTE martes 20 de octubre]# ll
> total 3435600
> drwxr-xr-x. 5 arribagente PN        4096 Oct 19 10:52 COMERCIAL AG martes 20 de octubre
> drwxr-xr-x. 2 root        root      4096 Oct 20 08:45 hola
> -rw-r--r--. 1 arribagente PN   648927236 Jun  3 07:16 PANEO FACHADA PALACIO LEGISLATIVO DRONE DIA Y NOCHE.mpg
> -rw-r--r--. 1 arribagente PN  2084415492 Aug 18 09:14 PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg
> -rw-r--r--. 1 arribagente PN   784701444 Sep  4 07:23 PANEO VUELTA A CLASES CON TAPABOCAS.mpg
>
> Interestingly, the error occurs on the mount point where the files were
> copied. They don't show up as pending heal entries. I have around 15 people
> using them over samba, and I get this issue reported about every two days.
>
> I have an older cluster with similar issues, on a different gluster version
> but a very similar topology (4 bricks, initially two bricks then expanded).
> Please note the bricks aren't the same size (but their replicas are), so my
> other suspicion is that rebalancing has something to do with it.
>
> I'm trying to reproduce it over a small virtualized cluster, so far no results.
>
> Here are the cluster details:
> four nodes, replica 2, plus one arbiter hosting 2 bricks.
> I have 2 bricks with ~20 TB capacity and the other pair is ~48 TB.
>
> Volume Name: tapeless
> Type: Distributed-Replicate
> Volume ID: 53bfa86d-b390-496b-bbd7-c4bba625c956
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 2 x (2 + 1) = 6
> Transport-type: tcp
> Bricks:
> Brick1: gluster6.glustersaeta.net:/data/glusterfs/tapeless/brick_6/brick
> Brick2: gluster7.glustersaeta.net:/data/glusterfs/tapeless/brick_7/brick
> Brick3: kitchen-store.glustersaeta.net:/data/glusterfs/tapeless/brick_1a/brick (arbiter)
> Brick4: gluster12.glustersaeta.net:/data/glusterfs/tapeless/brick_12/brick
> Brick5: gluster13.glustersaeta.net:/data/glusterfs/tapeless/brick_13/brick
> Brick6: kitchen-store.glustersaeta.net:/data/glusterfs/tapeless/brick_2a/brick (arbiter)
> Options Reconfigured:
> features.quota-deem-statfs: on
> performance.client-io-threads: on
> nfs.disable: on
> transport.address-family: inet
> features.quota: on
> features.inode-quota: on
> features.cache-invalidation: on
> features.cache-invalidation-timeout: 600
> performance.cache-samba-metadata: on
> performance.stat-prefetch: on
> performance.cache-invalidation: on
> performance.md-cache-timeout: 600
> network.inode-lru-limit: 20
> performance.nl-cache: on
> performance.nl-cache-timeout: 600
> p
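The `mkdir` workaround above most likely works because creating an entry forces a fresh named lookup on the directory, making clients re-resolve its entries across subvolumes. A less intrusive sketch that triggers the same kind of lookup from a healthy mount; the mount path `/mnt/tapeless` is hypothetical, while the directory, file, and volume names are the ones from this thread:

```shell
# Sketch: force a fresh lookup instead of creating a dummy directory.
# /mnt/tapeless is a hypothetical mount path; adjust to your own setup.
cd "/mnt/tapeless/ARRIBA GENTE martes 20 de octubre"
ls -la . > /dev/null                             # named lookups on all entries
stat "PANEO VUELTA A CLASES CON TAPABOCAS.mpg"   # explicit lookup on one file
# Check whether the volume reports anything pending:
gluster volume heal tapeless info
```

Whether a plain lookup is always enough to repair the listing is an assumption to test; the reporter's `mkdir` is the only method confirmed in the thread.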
Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)
Dear all,

Thanks to this fix I could successfully upgrade from GlusterFS 6.9 to 7.8, but now, 1 week after the upgrade, I have rebooted my third node (arbiter node) and unfortunately the bricks do not want to come up on that node. I get the same following error message:

[2020-10-26 06:21:59.726705] E [MSGID: 106012] [glusterd-utils.c:3682:glusterd_compare_friend_volume] 0-management: Cksums of quota configuration of volume myvol-private differ. local cksum = 0, remote cksum = 66908910 on peer node2.domain
[2020-10-26 06:21:59.726871] I [MSGID: 106493] [glusterd-handler.c:3715:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to node2.domain (0), ret: 0, op_ret: -1
[2020-10-26 06:21:59.728164] I [MSGID: 106490] [glusterd-handler.c:2434:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 5f4ccbf4-33f6-4298-8b31-213553223349
[2020-10-26 06:21:59.728969] E [MSGID: 106012] [glusterd-utils.c:3682:glusterd_compare_friend_volume] 0-management: Cksums of quota configuration of volume myvol-private differ. local cksum = 0, remote cksum = 66908910 on peer node1.domain
[2020-10-26 06:21:59.729099] I [MSGID: 106493] [glusterd-handler.c:3715:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to node1.domain (0), ret: 0, op_ret: -1

Can someone please advise what I need to do in order to have my arbiter node up and running again as soon as possible?

Thank you very much in advance for your help.

Best regards,
Mabi

‐‐‐ Original Message ‐‐‐ On Monday, September 7, 2020 5:49 AM, Sanju Rakonde wrote:

> Hi,
>
> issue https://github.com/gluster/glusterfs/issues/1332 is fixed now with
> https://github.com/gluster/glusterfs/commit/865cca1190e233381f975ff36118f46e29477dcf.
>
> It will be backported to release-7 and release-8 branches soon.
>
> On Mon, Sep 7, 2020 at 1:14 AM Strahil Nikolov wrote:
>
>> Your e-mail got in the spam...
>> >> If you haven't fixed the issue, check Hari's topic about quota issues (based >> on the error message you provided) : >> https://medium.com/@harigowtham/glusterfs-quota-fix-accounting-840df33fcd3a >> >> Most probably there is a quota issue and you need to fix it. >> >> Best Regards, >> Strahil Nikolov >> >> В неделя, 23 август 2020 г., 11:05:27 Гринуич+3, mabi >> написа: >> >> Hello, >> >> So to be precise I am exactly having the following issue: >> >> https://github.com/gluster/glusterfs/issues/1332 >> >> I could not wait any longer to find some workarounds or quick fixes so I >> decided to downgrade my rejected from 7.7 back to 6.9 which worked. >> >> I would be really glad if someone could fix this issue or provide me a >> workaround which works because version 6 of GlusterFS is not supported >> anymore so I would really like to move on to the stable version 7. >> >> Thank you very much in advance. >> >> Best regards, >> Mabi >> >> ‐‐‐ Original Message ‐‐‐ >> >> On Saturday, August 22, 2020 7:53 PM, mabi wrote: >> >>> Hello, >>> >>> I just started an upgrade of my 3 nodes replica (incl arbiter) of GlusterFS >>> from 6.9 to 7.7 but unfortunately after upgrading the first node, that node >>> gets rejected due to the following error: >>> >>> [2020-08-22 17:43:00.240990] E [MSGID: 106012] >>> [glusterd-utils.c:3537:glusterd_compare_friend_volume] 0-management: Cksums >>> of quota configuration of volume myvolume differ. local cksum = 3013120651, >>> remote cksum = 0 on peer myfirstnode.domain.tld >>> >>> So glusterd process is running but not glusterfsd. >>> >>> I am exactly in the same issue as described here: >>> >>> https://www.gitmemory.com/Adam2Marsh >>> >>> But I do not see any solutions or workaround. So now I am stuck with a >>> degraded GlusterFS cluster. >>> >>> Could someone please advise me as soon as possible on what I should do? Is >>> there maybe any workarounds? >>> >>> Thank you very much in advance for your response. 
>>>
>>> Best regards,
>>> Mabi

> --
> Thanks,
> Sanju
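For the "Cksums of quota configuration ... differ. local cksum = 0" symptom in this thread, the quota-fix article Strahil linked works on the quota files under /var/lib/glusterd/vols/<volume>/ on each node. A hedged diagnostic sketch follows; the volume name is the one from the thread and `node2.domain` stands in for a healthy peer. A local cksum of 0 usually suggests an empty or missing quota.conf on the rejected node, and copying quota.conf plus quota.cksum from a healthy peer and restarting glusterd is a commonly reported workaround, but verify against the linked article before changing anything:

```shell
# Sketch: compare quota config between the rejected node and a healthy peer.
# Volume name from the thread; "node2.domain" is the healthy peer.
ls -l /var/lib/glusterd/vols/myvol-private/quota.conf \
      /var/lib/glusterd/vols/myvol-private/quota.cksum
# Fetch the same files from a healthy peer for comparison:
scp node2.domain:/var/lib/glusterd/vols/myvol-private/quota.conf /tmp/quota.conf.peer
diff /var/lib/glusterd/vols/myvol-private/quota.conf /tmp/quota.conf.peer
```

If the files differ, take a backup of the local copies before replacing them.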
Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)
On 26/10/20 07:40, mabi wrote:
> Thanks to this fix I could successfully upgrade from GlusterFS 6.9 to
> 7.8 but now, 1 week later after the upgrade, I have rebooted my third
> node (arbiter node) and unfortunately the bricks do not want to come up
> on that node. I get the same following error message:
IIRC it's the same issue I had some time ago.
I solved it by "degrading" the volume to replica 2, then cleared the arbiter bricks and upgraded again to replica 3 arbiter 1.

--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786