Re: [Gluster-users] Confusion supreme
Hi Zenon,

First step would be to ensure that all clients are connected to all bricks - this will reduce the chance of new problems. For some reason there are problems with the broken node. Did you reduce the replica to 2 before reinstalling the broken node and re-adding it to the TSP? Try to get the attributes and the blames of a few files. The following article (check all 3 parts) could help you understand the logic and give you hints on where to look: https://ravispeaks.wordpress.com/2019/04/05/glusterfs-afr-the-complete-guide/

Best Regards,
Strahil Nikolov

On Wed, Jun 26, 2024 at 20:46, Zenon Panoussis wrote:

I should add that in /var/lib/glusterd/vols/gv0/gv0-shd.vol and in all other configs in /var/lib/glusterd/ on all three machines the nodes are consistently named

client-2: zephyrosaurus
client-3: alvarezsaurus
client-4: nanosaurus

This is normal. It was the second time that a brick was removed, so client-0 and client-1 are gone. So the problem is the file attributes themselves. And there I see things like

trusted.afr.gv0-client-0=0x
trusted.afr.gv0-client-1=0x0ab0
trusted.afr.gv0-client-3=0x
trusted.afr.gv0-client-4=0x

and

trusted.afr.gv0-client-3=0x
trusted.afr.gv0-client-4=0x

and other such, where the only thing that is consistent is inconsistency. When a brick is removed, shouldn't all files on the remaining bricks be re-fattr'ed to remove the pointers to the non-existent brick? I guess I can do this manually, but it will still leave me with those files where the value of all trusted.afr.gv0-client(s) is zero. How does healing deal with those?

Cheers, Z

--
Слава Україні!
Путлер хуйло!
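As the article linked above explains, each trusted.afr.<volume>-client-N value packs three big-endian 32-bit counters: pending data, metadata and entry operations that this brick blames on client-N. A minimal bash sketch for decoding a full 24-hex-digit value (the truncated values quoted above cannot be decoded as pasted; the example value below is hypothetical):

```shell
#!/bin/bash
# Decode a trusted.afr.* value into its three 32-bit big-endian counters:
# pending data, metadata and entry operations blamed on that client/brick.
decode_afr() {
    local hex=${1#0x}
    printf 'data=%d metadata=%d entry=%d\n' \
        $((16#${hex:0:8})) $((16#${hex:8:8})) $((16#${hex:16:8}))
}

# Hypothetical value: 2 pending data ops and 1 pending metadata op.
decode_afr 0x000000020000000100000000
```

An all-zero value means the brick blames nothing on that client; self-heal picks sources and sinks by comparing these blame counters across bricks.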
Community Meeting Calendar:
Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Glusterfs community status
Hi Ilias,

Usually when healing problems occur, the first step I do is to take a look if all clients are connected to all bricks using:

gluster volume status all client-list
gluster volume status all clients

Can you check if you have clients connected only to some of the bricks instead of all?

Best Regards,
Strahil Nikolov

On Tuesday, 9 July 2024 at 08:59:38 GMT+3, Ilias Chasapakis forumZFD wrote:

Hi,

we at forumZFD are currently experiencing problems similar to those mentioned here on the mailing list, especially in the latest messages. Our gluster just doesn't heal all entries and "manual" healing is long and tedious. Entries accumulate over time and we have to do regular cleanups that take long and are risky. Despite changing available options with different combinations of values, the problem persists.

So we thought, "let's go to the community meeting" if not much is happening here on the list. We are at the end of our knowledge and can therefore no longer contribute much to the list. Unfortunately, nobody was at the community meeting. Somehow we have the feeling that there is no one left in the community or in the project who is interested in fixing the basics of Gluster (namely the healing). Is that the case and is gluster really end of life?

We appreciate a lot the contributions in the last few years and all the work done, as well as the honest efforts to give a hand. But it would be good to have an orientation on the status of the project itself. Many thanks in advance for any replies.

Ilias

--
forumZFD
Entschieden für Frieden | Committed to Peace

Ilias Chasapakis
Referent IT | IT Referent

Forum Ziviler Friedensdienst e.V.
| Forum Civil Peace Service
Am Kölner Brett 8 | 50825 Köln | Germany
Tel 0221 91273243 | Fax 0221 91273299 | http://www.forumZFD.de
Vorstand nach § 26 BGB, einzelvertretungsberechtigt | Executive Board: Alexander Mauz, Sonja Wiekenberg-Mlalandle, Jens von Bargen
VR 17651 Amtsgericht Köln
Spenden|Donations: IBAN DE90 4306 0967 4103 7264 00 BIC GENODEM1GLS
Re: [Gluster-users] Adding storage capacity to a production disperse volume
Hi Ted,

What do you mean by one unit?

Best Regards,
Strahil Nikolov

On Fri, Mar 29, 2024 at 4:33, Theodore Buchwald wrote:
Re: [Gluster-users] Adding storage capacity to a production disperse volume
Hi,

This is a dispersed volume - the number of bricks added must be a multiple of the disperse count, so in your case 5 bricks. By the way, check this thread on the topic: https://lists.gluster.org/pipermail/gluster-users/2018-July/034491.html

Best Regards,
Strahil Nikolov

On Sun, Mar 17, 2024 at 20:38, Theodore Buchwald wrote:

Hi,

This is the first time I have tried to expand the storage of a live gluster volume. I was able to get another supermicro storage unit for a gluster cluster that I built. The current clustered storage configuration contains five supermicro units. And the cluster volume is setup with the following configuration:

node-6[/var/log/glusterfs]# gluster volume info

Volume Name: researchdata
Type: Disperse
Volume ID: 93d4-482a-8933-2d81298d5b3b
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (4 + 1) = 5
Transport-type: tcp
Bricks:
Brick1: node-1:/mnt/data/researchdata-1
Brick2: node-2:/mnt/data/researchdata-2
Brick3: node-3:/mnt/data/researchdata-3
Brick4: node-4:/mnt/data/researchdata-4
Brick5: node-5:/mnt/data/researchdata-5
Options Reconfigured:
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
locks.mandatory-locking: optimal

Adding the node to the cluster was no problem. But adding a brick using 'add-brick' to the volume resulted in "volume add-brick: failed: Incorrect number of bricks supplied 1 with count 5". So my question is: what would be the correct amount of bricks needed to expand the storage on the current configuration of 'Number of Bricks: 1 x (4 + 1) = 5', without reconfiguring the volume altogether? Thanks in advance for any pointers on how to expand this volume's storage capabilities.
Thanks, Tbuck
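Concretely, a 1 x (4 + 1) disperse volume can only grow by whole disperse sets of 5 bricks, which then gives a 2 x (4 + 1) distributed-disperse volume. A sketch of the commands, with hypothetical brick paths carved out of the new node's storage (all five bricks on one server means that server's failure takes the whole new set offline, so gluster will likely warn and require 'force'):

```shell
# Hypothetical paths: the new node's storage split into five bricks so
# the disperse count (4 + 1 = 5) is matched.
gluster volume add-brick researchdata \
    node-6:/mnt/data/researchdata-6a \
    node-6:/mnt/data/researchdata-6b \
    node-6:/mnt/data/researchdata-6c \
    node-6:/mnt/data/researchdata-6d \
    node-6:/mnt/data/researchdata-6e

# Spread existing data onto the new disperse set afterwards.
gluster volume rebalance researchdata start
```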
Re: [Gluster-users] Adding storage capacity to a production disperse volume
Hi Theodore,

Is this a dispersed volume? Can you share the whole volume info?

Best Regards,
Strahil Nikolov

On Thu, Mar 14, 2024 at 14:38, Thomas Pries wrote:

Hi,

On 14.03.2024 01:39, Theodore Buchwald wrote:
... So my question is. What would be the correct amount of bricks needed to expand the storage on the current configuration of 'Number of Bricks: 1 x (4 + 1) = 5'? ...

I tried something similar and ended up with a similar error. As far as I understand the documentation, the answer in your case is "5". See: https://docs.gluster.org/en/v3/Administrator%20Guide/Managing%20Volumes/#expanding-volumes

"... When expanding distributed replicated and distributed dispersed volumes, you need to add a number of bricks that is a multiple of the replica or disperse count. ..."

One suboptimal idea could be: divide the new device into 5 partitions and add these 5 partitions as new bricks, and when you get the next device, move one of these bricks to the new device, and so on ... until you have all 5 additional devices. I'm curious for other ideas.

Kind regards
Thomas
Re: [Gluster-users] Gluster Rebalance question
Hi Patrick,

I don't think you need rebalance, just ensure that all bricks have the same size. Take a look at the RH documentation [1] and this old article [2] for more details.

Best Regards,
Strahil Nikolov

[1] https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.5/html/administration_guide/sect-rebalancing_volumes
[2] https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Features/rebalance/

On Tue, Mar 12, 2024 at 15:43, Patrick Dijkgraaf wrote:

Hi all,

I'm using glusterfs for a few years now, and generally very happy with it. Saved my data multiple times already! :-)

However, I do have a few questions for which I hope someone is able to answer them. I have a distributed, replicated glusterfs setup. I am in the process of replacing 4TB bricks with 8TB bricks, which is working nicely. However, what I am seeing now is that the space usage of the replaced bricks is way lower than the rest. 45% vs. 90%, which makes sense because the disk is now twice as large. As I understand it, this won't balance out automatically and I need to run a rebalance process to distribute the data evenly across the bricks.

- What does the "fix-layout" option do exactly? Does it only correct/adjust gluster metadata, meaning it should be finished quite quickly?
- Would the "fix-layout" option be required in this scenario? I know it is required when adding/removing bricks. But in my scenario the amount of bricks has stayed the same, only the size has changed.
- Will the rebalance read/write all data, or only the data that is causing the imbalance (only moving the excess data on the fuller bricks to the lesser-full bricks)?
- What does the "force" option do exactly? I read something about link files and performance impact if you do not use the "force" option.
- Should I use the "force" option in my scenario, or not?

Thanks in advance!
--
Groet / Cheers,
Patrick Dijkgraaf
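For reference, the operations discussed in this thread map to distinct commands (volume name is a placeholder; semantics as described in the rebalance documentation linked above):

```shell
# Recalculate directory hash layouts only - a metadata operation,
# no file data is moved, so it finishes comparatively quickly.
gluster volume rebalance <VOLNAME> fix-layout start

# Full rebalance: fixes the layout and also migrates files to match it.
gluster volume rebalance <VOLNAME> start

# Without 'force', a file is skipped (a link file is left) when moving it
# would put it on a brick with less free space than the source; 'force'
# migrates it anyway.
gluster volume rebalance <VOLNAME> start force

# Watch progress:
gluster volume rebalance <VOLNAME> status
```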
Re: [Gluster-users] Graceful shutdown doesn't stop all Gluster processes
Basically the LVM operations on the iSCSI target are online, and then on the client you rescan the LUN:

iscsiadm -m node --targetname target_name -R

and then just extend the FS. Of course, test the procedure before doing it in production.

Why do you use GlusterFS on iSCSI? You can have a shared file system on the same LUN, mounted on multiple nodes. Gluster is supposed to be used with local disks in order to improve resilience and scale to massive sizes.

Best Regards,
Strahil Nikolov

On Mon, Feb 26, 2024 at 11:37, Anant Saraswat wrote:

Hi Strahil,

In our setup, the Gluster brick comes from an iSCSI SAN storage and is then used as a brick on the Gluster server. To extend the brick, we stop the Gluster server, extend the logical volume (LV) on the SAN server, resize it on the host, mount the brick with the extended size, and finally start the Gluster server. Please let me know if this process can be optimized, I will be happy to do so.

Many thanks,
Anant

From: Strahil Nikolov
Sent: 24 February 2024 12:33 PM
To: Anant Saraswat
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Graceful shutdown doesn't stop all Gluster processes

EXTERNAL: Do not click links or open attachments if you do not recognize the sender.

Hi Anant,

why would you need to shutdown a brick to expand it? This is an online operation.

Best Regards,
Strahil Nikolov

DISCLAIMER: This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify the sender. This message contains confidential information and is intended only for the individual named. If you are not the named addressee, you should not disseminate, distribute or copy this email. Please notify the sender immediately by email if you have received this email by mistake and delete this email from your system.
If you are not the intended recipient, you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited. Thanks for your cooperation.
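The online procedure Strahil describes can be sketched end to end; device names, IQN and mount point below are hypothetical, and the brick filesystem is assumed to be XFS:

```shell
# 1. On the SAN: grow the LUN / its backing volume (vendor-specific).

# 2. On the Gluster host: rescan the iSCSI session so the kernel
#    sees the new LUN size.
iscsiadm -m node --targetname iqn.2024-01.example:target0 -R

# 3. If LVM sits on the LUN, grow the PV and then the brick LV.
pvresize /dev/sdX
lvextend -l +100%FREE /dev/vg_bricks/lv_brick

# 4. Grow the mounted filesystem online (XFS example); the brick
#    process keeps running throughout.
xfs_growfs /opt/tier1data2019/brick
```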
Re: [Gluster-users] Graceful shutdown doesn't stop all Gluster processes
Hi Anant,

why would you need to shutdown a brick to expand it? This is an online operation.

Best Regards,
Strahil Nikolov
Re: [Gluster-users] Graceful shutdown doesn't stop all Gluster processes
Well, you prepare the host for shutdown, right? So why don't you setup systemd to start the container and shut it down before the bricks?

Best Regards,
Strahil Nikolov

On Friday, 16 February 2024 at 18:48:36 GMT+2, Anant Saraswat wrote:

Hi Strahil,

Yes, we mount the fuse to the physical host and then use bind mount to provide access to the container. The same physical host also runs the gluster server. Therefore, when we stop gluster using 'stop-all-gluster-processes.sh' on the physical host, it kills the fuse mount and impacts containers accessing this volume via bind.

Thanks,
Anant

From: Strahil Nikolov
Sent: 16 February 2024 3:51 PM
To: Anant Saraswat ; Aravinda
Cc: ronny.adse...@amazinginternet.com ; gluster-users@gluster.org
Subject: Re: [Gluster-users] Graceful shutdown doesn't stop all Gluster processes

EXTERNAL: Do not click links or open attachments if you do not recognize the sender.

Hi Anant,

Do you use the fuse client in the container? Wouldn't it be more reasonable to mount the fuse and then use bind mount to provide access to the container?

Best Regards,
Strahil Nikolov

> On Fri, Feb 16, 2024 at 15:02, Anant Saraswat wrote:
>
> Okay, I understand. Yes, it would be beneficial to include an option for skipping the client processes. This way, we could utilize the 'stop-all-gluster-processes.sh' script with that option to stop the gluster server process while retaining the fuse mounts.
>
> From: Aravinda
> Sent: 16 February 2024 12:36 PM
> To: Anant Saraswat
> Cc: ronny.adse...@amazinginternet.com ; gluster-users@gluster.org ; Strahil Nikolov
> Subject: Re: [Gluster-users] Graceful shutdown doesn't stop all Gluster processes
>
> EXTERNAL: Do not click links or open attachments if you do not recognize the sender.
>
> No. If the script is used to update the GlusterFS packages in the node, then we need to stop the client processes as well (Fuse client is `glusterfs` process. `ps ax | grep glusterfs`).
> The default behaviour can't be changed, but the script can be enhanced by adding a new option `--skip-clients` so that it can skip stopping the client processes.
>
> --
> Aravinda
> Kadalu Technologies
>
> On Fri, 16 Feb 2024 16:15:22 +0530 Anant Saraswat wrote ---
>
> Hello Everyone,
>
> We are mounting this external Gluster volume (dc.local:/docker_config) for docker configuration on one of the Gluster servers. When I ran the stop-all-gluster-processes.sh script, I wanted to stop all gluster server-related processes on the server, but not to unmount the external gluster volume mounted on the server. However, running stop-all-gluster-processes.sh unmounted the dc.local:/docker_config volume from the server.
>
> /dev/mapper/tier1data    6.1T  4.7T  1.4T  78%  /opt/tier1data/brick
> dc.local:/docker_config  100G   81G   19G  82%  /opt/docker_config
>
> Do you think stop-all-gluster-processes.sh should unmount the fuse mount?
>
> Thanks,
> Anant
>
> From: Gluster-users on behalf of Strahil Nikolov
> Sent: 09 February 2024 5:23 AM
> To: ronny.adse...@amazinginternet.com ; gluster-users@gluster.org
> Subject: Re: [Gluster-users] Graceful shutdown doesn't stop all Gluster processes
>
> EXTERNAL: Do not click links or open attachments if you do not recognize the sender.
> I think the service that shuts down the bricks on EL systems is something like this - right now I don't have access to my systems to check, but you can extract the rpms and see it:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1022542#c4
>
> Best Regards,
> Strahil Nikolov
>
>> On Wed, Feb 7, 2024 at 19:51, Ronny Adsetts wrote:
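Strahil's systemd suggestion could look like the drop-in below. The unit name docker-app.service is hypothetical, and it assumes the fuse mount at /opt/docker_config comes from fstab so systemd generates an opt-docker_config.mount unit; ordering the container after the mount and glusterd means systemd stops the container first on shutdown, before the mount and the bricks go away:

```ini
# /etc/systemd/system/docker-app.service.d/gluster-order.conf
# Hypothetical container unit. systemd stops units in reverse start
# order, so After= here also means "stop before" on shutdown.
[Unit]
After=glusterd.service opt-docker_config.mount
Requires=opt-docker_config.mount
```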
Re: [Gluster-users] Graceful shutdown doesn't stop all Gluster processes
Hi Anant,

Do you use the fuse client in the container? Wouldn't it be more reasonable to mount the fuse and then use bind mount to provide access to the container?

Best Regards,
Strahil Nikolov

On Fri, Feb 16, 2024 at 15:02, Anant Saraswat wrote:

Okay, I understand. Yes, it would be beneficial to include an option for skipping the client processes. This way, we could utilize the 'stop-all-gluster-processes.sh' script with that option to stop the gluster server process while retaining the fuse mounts.

From: Aravinda
Sent: 16 February 2024 12:36 PM
To: Anant Saraswat
Cc: ronny.adse...@amazinginternet.com ; gluster-users@gluster.org ; Strahil Nikolov
Subject: Re: [Gluster-users] Graceful shutdown doesn't stop all Gluster processes

EXTERNAL: Do not click links or open attachments if you do not recognize the sender.

No. If the script is used to update the GlusterFS packages in the node, then we need to stop the client processes as well (Fuse client is `glusterfs` process. `ps ax | grep glusterfs`).

The default behaviour can't be changed, but the script can be enhanced by adding a new option `--skip-clients` so that it can skip stopping the client processes.

--
Aravinda
Kadalu Technologies

On Fri, 16 Feb 2024 16:15:22 +0530 Anant Saraswat wrote ---

Hello Everyone,

We are mounting this external Gluster volume (dc.local:/docker_config) for docker configuration on one of the Gluster servers. When I ran the stop-all-gluster-processes.sh script, I wanted to stop all gluster server-related processes on the server, but not to unmount the external gluster volume mounted on the server. However, running stop-all-gluster-processes.sh unmounted the dc.local:/docker_config volume from the server.

/dev/mapper/tier1data    6.1T  4.7T  1.4T  78%  /opt/tier1data/brick
dc.local:/docker_config  100G   81G   19G  82%  /opt/docker_config

Do you think stop-all-gluster-processes.sh should unmount the fuse mount?
Thanks,
Anant

From: Gluster-users on behalf of Strahil Nikolov
Sent: 09 February 2024 5:23 AM
To: ronny.adse...@amazinginternet.com ; gluster-users@gluster.org
Subject: Re: [Gluster-users] Graceful shutdown doesn't stop all Gluster processes

EXTERNAL: Do not click links or open attachments if you do not recognize the sender.

I think the service that shuts down the bricks on EL systems is something like this - right now I don't have access to my systems to check, but you can extract the rpms and see it:

https://bugzilla.redhat.com/show_bug.cgi?id=1022542#c4

Best Regards,
Strahil Nikolov

On Wed, Feb 7, 2024 at 19:51, Ronny Adsetts wrote:
Re: [Gluster-users] __Geo-replication status is getting Faulty after few seconds
The other option (if it is indeed the only copy, inside .glusterfs) is to hard-link it somewhere in the real filesystem structure and then sync it to the other bricks. If georep works again - just remove it from the fuse mount and check if it will get deleted or not.

Best Regards,
Strahil Nikolov

On Fri, Feb 9, 2024 at 7:25, Strahil Nikolov wrote:

It's a hard link, so use find's '-samefile' option to see if it's the last one or not. If you really want to delete it, have a backup and then delete both the gfid and any other hard links.

Best Regards,
Strahil Nikolov
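The '-samefile' check is easy to rehearse on ordinary files before running it against the brick; a sketch using a temporary directory (paths are illustrative, not the real brick or gfid):

```shell
#!/bin/bash
# Demo of the hard-link check. Against a real brick you would point
# stat/find at the .glusterfs gfid path and search from the brick root.
tmp=$(mktemp -d)
echo data > "$tmp/app_file.doc"
ln "$tmp/app_file.doc" "$tmp/gfid-entry"   # hard link, like a .glusterfs entry

# Link count > 1 means some other path still references the same inode.
stat -c 'links=%h' "$tmp/gfid-entry"

# find -samefile lists every path sharing the inode within the search root.
find "$tmp" -samefile "$tmp/gfid-entry"

rm -rf "$tmp"
```

If find returns only the .glusterfs path itself (link count 1), the gfid entry is the last reference and no regular file points at that inode any more.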
Re: [Gluster-users] __Geo-replication status is getting Faulty after few seconds
It's a hard link, so use find's '-samefile' option to see if it's the last one or not. If you really want to delete it, have a backup and then delete both the gfid and any other hard links.

Best Regards,
Strahil Nikolov

On Thu, Feb 8, 2024 at 22:43, Anant Saraswat wrote:

Can anyone please suggest if it's safe to delete '/opt/tier1data2019/brick/.glusterfs/d5/3f/d53fad8f-84e9-4b24-9eb0-ccbcbdc4baa8'? This file is only present on the primary master node (master1) and doesn't exist on master2 and master3 nodes. When I resume the geo-replication, I get the following error. Also, how can I remove this file from the changelogs so that when I start the geo-replication again, this file won't be checked?

[2024-02-07 22:37:36.911439] D [master(worker /opt/tier1data2019/brick):1344:process] _GMaster: processing change [{changelog=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick/.history/.processing/CHANGELOG.1705936007}]
[2024-02-07 22:37:36.915193] E [syncdutils(worker /opt/tier1data2019/brick):346:log_raise_exception] : Gluster Mount process exited [{error=ENOTCONN}]
[2024-02-07 22:37:36.915252] E [syncdutils(worker /opt/tier1data2019/brick):363:log_raise_exception] : FULL EXCEPTION TRACE:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 317, in main
    func(args)
  File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 86, in subcmd_worker
    local.service_loop(remote)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1298, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 604, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1614, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1510, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1345, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1071, in process_change
    st = lstat(pt)
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 589, in lstat
    return errno_wrap(os.lstat, [e], [ENOENT], [ESTALE, EBUSY])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 571, in errno_wrap
    return call(*arg)
OSError: [Errno 107] Transport endpoint is not connected: '.gfid/d53fad8f-84e9-4b24-9eb0-ccbcbdc4baa8'
[2024-02-07 22:37:37.344426] I [monitor(monitor):228:monitor] Monitor: worker died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-02-07 22:37:37.346601] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}]

Thanks,
Anant

From: Anant Saraswat
Sent: 08 February 2024 2:00 PM
To: Diego Zuccato ; gluster-users@gluster.org ; Strahil Nikolov ; Aravinda Vishwanathapura
Subject: Re: [Gluster-users] __Geo-replication status is getting Faulty after few seconds

Thanks @Diego Zuccato, I'm just thinking, if we delete the suspected file, won't it create an issue since this ID is present in the `CHANGELOG.1705936007` file?

[root@master1 ~]# grep -i "d53fad8f-84e9-4b24-9eb0-ccbcbdc4baa8" /var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick/.history/.processing/CHANGELOG.1705936007
E d53fad8f-84e9-4b24-9eb0-ccbcbdc4baa8 CREATE 33188 0 0 e8aff729-a310-4d21-a64b-d8cc7cb1a828/app_docmerge12monthsfixedCUSTODIAL_2024_1_22_15_3_24_648.doc
D d53fad8f-84e9-4b24-9eb0-ccbcbdc4baa8
E d53fad8f-84e9-4b24-9eb0-ccbcbdc4baa8 UNLINK e8aff729-a310-4d21-a64b-d8cc7cb1a828/app_docmerge12monthsfixedCUSTODIAL_2024_1_22_15_3_24_648.doc

From: Gluster-users on behalf of Diego Zuccato
Sent: 08 February 2024 1:37 PM
To: gluster-users@gluster.org
Subject: Re: [Gluster-users] __Geo-replication status is getting Faulty after few seconds

EXTERNAL: Do not click links or open attachments if you do not recognize the sender.
That '1' means there's no corresponding file in the regular file structure (outside .glusterfs). IIUC it shouldn't happen, but it does (quite often). *Probably* it's safe to just delete it, but wait for advice from more competent users.

Diego

On 08/02/2024 13:42, Anant Saraswat wrote:
> Hi Everyone,
>
> As I was getting "OSError: [Errno 107] Transport endpoint is not connected: '.gfid/d53fad8f-84e9-4b24-9eb0-ccbcbdc4baa8'" error in the primary master node gsyncd log, I started searching for this file's details and I found this file in the brick, under the .glusterfs folder on the master1 node.
>
> Path on master1 - /opt/tier1data2019/brick/.glusterfs/d5/3f/d53fad8f-84e9-4b24-9eb0-ccbcbdc4baa8
>
> [root@master1 ~]# ls -lrt /opt/tier1
Re: [Gluster-users] Graceful shutdown doesn't stop all Gluster processes
I think the service that shuts down the bricks on EL systems is something like this - right now I don't have access to my systems to check, but you can extract the rpms and see it:

https://bugzilla.redhat.com/show_bug.cgi?id=1022542#c4

Best Regards,
Strahil Nikolov

On Wed, Feb 7, 2024 at 19:51, Ronny Adsetts wrote:
Re: [Gluster-users] __Geo-replication status is getting Faulty after few seconds
Have you tried setting up gluster georep with a dedicated non-root user?

Best Regards,
Strahil Nikolov

On Tue, Feb 6, 2024 at 16:38, Anant Saraswat wrote:
Re: [Gluster-users] Challenges with Replicated Gluster volume after stopping Gluster on any node.
That's not true for EL-based systems as they have 'glusterfsd.service' which on shutdown kills all gluster brick processes.

Best Regards,
Strahil Nikolov

On Tue, Feb 6, 2024 at 9:21, Diego Zuccato wrote:

Just a notice: brick processes not being terminated prevents a simple "reboot" (or "shutdown -r now", or an automated shutdown initiated by UPS) from completing: you're left with most daemons (including sshd) terminated and a machine that's neither shutting down nor returning to normal operations, and the only option is an unclean shutdown.

Diego

On 05/02/2024 19:22, Anant Saraswat wrote:
> Hi Hu,
>
> Yes, I have used the "stop-all-gluster-processes.sh" after systemctl stop.
>
> I consider "stop-all-gluster-processes.sh" as the last resort to kill all the remaining gluster processes. Do you use it primarily to stop the gluster?
>
> Thanks,
> Anant
>
> *From:* Hu Bert
> *Sent:* Monday, February 5, 2024 6:15 PM
> *To:* Anant Saraswat
> *Cc:* gluster-users@gluster.org
> *Subject:* Re: [Gluster-users] Challenges with Replicated Gluster volume after stopping Gluster on any node.
>
> *EXTERNAL: Do not click links or open attachments if you do not recognize the sender.*
>
> Hi,
> normally, when we shut down or reboot one of the (server) nodes, we call the "stop-all-gluster-processes.sh" script. But I think you did that, right?
>
> Best regards,
> Hubert
>
> On Mon, 5 Feb 2024 at 13:35, Anant Saraswat wrote:
>
> Hello Everyone,
>
> We have a replicated Gluster volume with three nodes, and we face a strange issue whenever we need to restart one of the nodes in this cluster.
>
> As per my understanding, if we shut down one node, the Gluster mount should smoothly connect to another remaining Gluster server and shouldn't create any issues.
> > In our setup, when we stop Gluster on any of the nodes, we mostly > get the error 'Transport endpoint is not connected' on the clients. > When we run the commands to check the connected clients on the > Gluster server, we get the following error: > > gluster volume status tier1data clients > FAILED: Commit failed on master1. Error: Unable to decode brick op > response. > > Could anyone recommend a potential solution for this? > > Thanks, > Anant
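A quick pre-reboot check along the lines of Diego's point above - a hedged sketch that only assumes the standard gluster daemon process names; an empty report means nothing is left for stop-all-gluster-processes.sh to clean up:

```shell
# Report any gluster daemon that survived 'systemctl stop glusterd'.
# pgrep -x matches the exact process name, so this never matches itself.
for p in glusterd glusterfsd glusterfs; do
  if pgrep -x "$p" >/dev/null; then
    echo "still running: $p"
  fi
done
echo "check complete"
```

If anything is still listed, the node is in the state described above: shutdown will hang on the surviving brick processes.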
Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds
Hi Anant, What version of Gluster are you using? Best Regards, Strahil Nikolov
Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems
Hi, This is a simplified description; see the links below for a more detailed one. When a client makes a change to a file, it commits that change to all bricks simultaneously, and if the change passes on a quorate number of bricks (in your case 2 out of 3 is enough) it is treated as successful. During that phase the 2 bricks that successfully completed the task will mark the 3rd brick as 'dirty', and you will see that in the heal report. Only when the heal daemon syncs the file to the final brick will that heal be cleaned from the remaining bricks. If a client has only 2 out of 3 bricks connected, it will constantly create new files for healing (as it can't save them on all 3), and this can even get worse as the number of clients that fail to connect to the 3rd brick increases. Check that all clients' IPs are connected to all bricks, and remount the volume on those that are not. After remounting the behavior should not persist; if it does, check with your network/firewall team to troubleshoot the problem. You can use 'gluster volume status all client-list' and 'gluster volume status all clients' (where 'all' can be replaced by the volume name) to find more details on that side. You can find a more detailed explanation of the whole process in this blog: https://ravispeaks.wordpress.com/2019/04/05/glusterfs-afr-the-complete-guide/ https://ravispeaks.wordpress.com/2019/04/15/gluster-afr-the-complete-guide-part-2/ https://ravispeaks.wordpress.com/2019/05/14/gluster-afr-the-complete-guide-part-3/ Best Regards, Strahil Nikolov On Tue, Jan 30, 2024 at 15:26, Hu Bert wrote: Hi Strahil, hm, not sure what the clients have to do with the situation. "gluster volume status workdata clients" - lists all clients with their IP addresses. "gluster peer status" and "gluster volume status" are ok, the latter one says that all bricks are online, have a port etc. The network is okay, ping works etc.
Well, made a check on one client: umount gluster volume, remount, now the client appears in the list. Yeah... but why now? Will try a few more... not that easy as most of these systems are in production... I had enabled the 3 self-heal values, but that didn't have any effect back then. And, honestly, i won't do it now, because: if the heal started now that would probably slow down the live system (with the clients). I'll try it when the cluster isn't used anymore. Interesting - new messages incoming on the "bad" server: [2024-01-30 14:15:11,820] INFO [utils - 67:log_event] - {'nodeid': '8ea1e6b4-9c77-4390-96a7-8724c3f9dc0f', 'ts': 1706620511, 'event': 'AFR_SPLIT_BRAIN', 'message': {'client-pid': '-6', 'subvol': 'workdata-replicate-2', 'type': 'gfid', 'file': '/756>', 'count': '2', 'child-2': 'workdata-client-8', 'gfid-2': '39807be6-b7de-4a82-8a22-cf61b1415208', 'child-0': 'workdata-client-6', 'gfid-0': 'bb4a12ec-f9b7-46bc-9fb3-c57730f1fc49'}} [2024-01-30 14:15:17,028] INFO [utils - 67:log_event] - {'nodeid': '8ea1e6b4-9c77-4390-96a7-8724c3f9dc0f', 'ts': 1706620517, 'event': 'AFR_SPLIT_BRAIN', 'message': {'client-pid': '-6', 'subvol': 'workdata-replicate-4', 'type': 'gfid', 'file': '/94259611>', 'count': '2', 'child-2': 'workdata-client-14', 'gfid-2': '01234675-17b9-4523-a598-5e331a72c4fa', 'child-0': 'workdata-client-12', 'gfid-0': 'b11140bd-355b-4583-9a85-5d0608589f97'}} They didn't appear in the beginning. Looks like a funny state that this volume is in :D Thx & best regards, Hubert On Tue, Jan 30, 2024 at 07:14, Strahil Nikolov wrote: > > This is your problem: the bad server has only 3 clients. > > I remember there is another gluster volume command to list the IPs of the > clients. Find it and run it to find which clients are actually OK (those 3) > and the remaining 17 are not. > > Then try to remount those 17 clients and if the situation persists - work > with your Network Team to identify why the 17 clients can't reach the brick.
> > Do you have selfheal enabled? > > cluster.data-self-heal > cluster.entry-self-heal > cluster.metadata-self-heal > > > Best Regards, > > Strahil Nikolov > > On Mon, Jan 29, 2024 at 10:26, Hu Bert > wrote: > Hi, > not sure what you mean with "clients" - do you mean the clients that > mount the volume? > > gluster volume status workdata clients > -- > Brick : glusterpub2:/gluster/md3/workdata > Clients connected : 20 > Hostname BytesRead > BytesWritten OpVersion > - > - > 192.168.0.222:49140 43698212 > 41152108 11 > [...shortened...] > 192.168.0.126:49123 8362352021 > 16445401205 11 &g
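The trusted.afr.* values discussed in this thread pack three big-endian 32-bit counters: pending data, metadata and entry operations blamed on a brick. A minimal sketch of decoding the 0x-prefixed form printed by `getfattr -d -e hex`; the helper name is made up:

```shell
# Split a trusted.afr.* changelog value into its three 32-bit counters.
decode_afr() {
  v=${1#0x}                           # strip the 0x prefix
  d=$(printf '%s' "$v" | cut -c1-8)   # pending data operations
  m=$(printf '%s' "$v" | cut -c9-16)  # pending metadata operations
  e=$(printf '%s' "$v" | cut -c17-24) # pending entry operations
  printf 'data=%d metadata=%d entry=%d\n' "0x$d" "0x$m" "0x$e"
}

decode_afr 0x000000020000000100000000   # -> data=2 metadata=1 entry=0
```

A non-zero counter means the brick named in the xattr key still owes that many operations and will keep showing up in the heal report.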
Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems
This is your problem: the bad server has only 3 clients. I remember there is another gluster volume command to list the IPs of the clients. Find it and run it to find which clients are actually OK (those 3) and the remaining 17 are not. Then try to remount those 17 clients, and if the situation persists - work with your Network Team to identify why the 17 clients can't reach the brick. Do you have selfheal enabled? cluster.data-self-heal cluster.entry-self-heal cluster.metadata-self-heal Best Regards, Strahil Nikolov On Mon, Jan 29, 2024 at 10:26, Hu Bert wrote: Hi, not sure what you mean with "clients" - do you mean the clients that mount the volume? gluster volume status workdata clients
--
Brick : glusterpub2:/gluster/md3/workdata
Clients connected : 20
Hostname BytesRead BytesWritten OpVersion
192.168.0.222:49140 43698212 41152108 11
[...shortened...]
192.168.0.126:49123 8362352021 16445401205 11
--
Brick : glusterpub3:/gluster/md3/workdata
Clients connected : 3
Hostname BytesRead BytesWritten OpVersion
192.168.0.44:49150 5855740279 63649538575 11
192.168.0.44:49137 308958200 319216608 11
192.168.0.126:49120 7524915770 15489813449 11
192.168.0.44 (glusterpub3) is the "bad" server. Not sure what you mean by "old" - probably not the age of the server, but rather the gluster version. op-version is 11 on all servers+clients, upgraded from 10.4 -> 11.1 "Have you checked if a client is not allowed to update all 3 copies ?" -> are there special log messages for that? "If it's only 1 system, you can remove the brick, reinitialize it and then bring it back for a full sync." -> https://docs.gluster.org/en/v3/Administrator%20Guide/Managing%20Volumes/#replace-brick -> Replacing bricks in Replicate/Distributed Replicate volumes - this part, right? Well, can't do this right now, as there are ~33TB of data (many small files) to copy, that would slow down the servers / the volume. But if the replacement is running i could do it afterwards, just to see what happens.
Hubert Am Mo., 29. Jan. 2024 um 08:21 Uhr schrieb Strahil Nikolov : > > 2800 is too much. Most probably you are affected by a bug. How old are the > clients ? Is only 1 server affected ? > Have you checked if a client is not allowed to update all 3 copies ? > > If it's only 1 system, you can remove the brick, reinitialize it and then > bring it back for a full sync. > > Best Regards, > Strahil Nikolov > > On Mon, Jan 29, 2024 at 8:44, Hu Bert > wrote: > Morning, > a few bad apples - but which ones? Checked glustershd.log on the "bad" > server and counted todays "gfid mismatch" entries (2800 in total): > > 44 /212>, > 44 /174>, > 44 /94037803>, > 44 /94066216>, > 44 /249771609>, > 44 /64235523>, > 44 /185>, > > etc. But as i said, these are pretty new and didn't appear when the > volume/servers started missbehaving. Are there scripts/snippets > available how one could handle this? > > Healing would be very painful for the running system (still connected, > but not very long anymore), as there surely are 4-5 million entries to > be healed. I can't do this now - maybe, when the replacement is in > productive state, one could give it a try. > > Thx, > Hubert > > Am So., 28. Jan. 2024 um 23:12 Uhr schrieb Strahil Nikolov > : > > > > From gfid mismatch a manual effort is needed but you can script it. > > I think that a few bad "apples" can break the healing and if you fix them > > the healing might be recovered. > > > > Also, check why the client is not updating all copies. Most probably you > > have a client that is not able to connect to a brick. > > > > gluster volume status VOLUME_NAME clients > > > > Best Regards, > > Strahil Nikolov > > > > On Sun, Jan 28, 2024 at 20:55, Hu Bert > > wrote: > > Hi Strahil, > > there's no arbiter: 3 servers with 5 bricks each. > > > > Volume Name: workdata > > Type: Distributed-Replicate > > Volume ID: 7d1e23e5-0308-4443-a832-d36f85ff7959 > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 5 x 3 = 15 >
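To find which mounts never reached the bad brick, the per-brick client lists from 'gluster volume status workdata clients' can be diffed once reduced to their IP column. A sketch with hard-coded sample IPs standing in for the parsed output:

```shell
# Client IPs seen by a healthy brick vs. the misbehaving one (sample data).
tmp=$(mktemp -d)
printf '%s\n' 192.168.0.222 192.168.0.126 192.168.0.44 | sort > "$tmp/good"
printf '%s\n' 192.168.0.44 192.168.0.126 | sort > "$tmp/bad"

# Clients connected to the good brick but absent from the bad one -
# these are the mounts that need remounting.
comm -23 "$tmp/good" "$tmp/bad"
```

In this sample only 192.168.0.222 is missing from the bad brick; on the real cluster the list would be the 17 clients to remount.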
Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems
2800 is too much. Most probably you are affected by a bug. How old are the clients? Is only 1 server affected? Have you checked if a client is not allowed to update all 3 copies? If it's only 1 system, you can remove the brick, reinitialize it and then bring it back for a full sync. Best Regards, Strahil Nikolov On Mon, Jan 29, 2024 at 8:44, Hu Bert wrote: Morning, a few bad apples - but which ones? Checked glustershd.log on the "bad" server and counted today's "gfid mismatch" entries (2800 in total): 44 /212>, 44 /174>, 44 /94037803>, 44 /94066216>, 44 /249771609>, 44 /64235523>, 44 /185>, etc. But as i said, these are pretty new and didn't appear when the volume/servers started misbehaving. Are there scripts/snippets available for how one could handle this? Healing would be very painful for the running system (still connected, but not very long anymore), as there surely are 4-5 million entries to be healed. I can't do this now - maybe, when the replacement is in productive state, one could give it a try. Thx, Hubert On Sun, Jan 28, 2024 at 23:12, Strahil Nikolov wrote: > > From gfid mismatch a manual effort is needed but you can script it. > I think that a few bad "apples" can break the healing and if you fix them the > healing might be recovered. > > Also, check why the client is not updating all copies. Most probably you have > a client that is not able to connect to a brick. > > gluster volume status VOLUME_NAME clients > > Best Regards, > Strahil Nikolov > > On Sun, Jan 28, 2024 at 20:55, Hu Bert > wrote: > Hi Strahil, > there's no arbiter: 3 servers with 5 bricks each. > > Volume Name: workdata > Type: Distributed-Replicate > Volume ID: 7d1e23e5-0308-4443-a832-d36f85ff7959 > Status: Started > Snapshot Count: 0 > Number of Bricks: 5 x 3 = 15 > > The "problem" is: the number of files/entries to-be-healed has > continuously grown since the beginning, and now we're talking about > way too many files to do this manually.
Last time i checked: 700K per > brick, should be >900K at the moment. The command 'gluster volume heal > workdata statistics heal-count' is unable to finish. Doesn't look that > good :D > > Interesting, the glustershd.log on the "bad" server now shows errors like > these: > > [2024-01-28 18:48:33.734053 +] E [MSGID: 108008] > [afr-self-heal-common.c:399:afr_gfid_split_brain_source] > 0-workdata-replicate-3: Gfid mismatch detected for > /803620716>, > 82d7939a-8919-40ea- > 9459-7b8af23d3b72 on workdata-client-11 and > bb9399a3-0a5c-4cd1-b2b1-3ee787ec835a on workdata-client-9 > > Shouldn't the heals happen on the 2 "good" servers? > > Anyway... we're currently preparing a different solution for our data > and we'll throw away this gluster volume - no critical data will be > lost, as these are derived from source data (on a different volume on > different servers). Will be a hard time (calculating tons of data), > but the chosen solution should have a way better performance. > > Well... thx to all for your efforts, really appreciate that :-) > > > Hubert > > Am So., 28. Jan. 2024 um 08:35 Uhr schrieb Strahil Nikolov > : > > > > What about the arbiter node ? > > Actually, check on all nodes and script it - you might need it in the > > future. > > > > Simplest way to resolve is to make the file didappear (rename to something > > else and then rename it back). Another easy trick is to read thr whole > > file: dd if=file of=/dev/null status=progress > > > > Best Regards, > > Strahil Nikolov > > > > On Sat, Jan 27, 2024 at 8:24, Hu Bert > > wrote: > > Morning, > > > > gfid1: > > getfattr -d -e hex -m. 
> > /gluster/md{3,4,5,6,7}/workdata/.glusterfs/fa/f5/faf59566-10f5-4ddd-8b0c-a87bc6a334fb > > > > glusterpub1 (good one): > > getfattr: Removing leading '/' from absolute path names > > # file: > > gluster/md6/workdata/.glusterfs/fa/f5/faf59566-10f5-4ddd-8b0c-a87bc6a334fb > > trusted.afr.dirty=0x > > trusted.afr.workdata-client-11=0x00020001 > > trusted.gfid=0xfaf5956610f54ddd8b0ca87bc6a334fb > > trusted.gfid2path.c2845024cc9b402e=0x38633139626234612d396236382d343532652d623434652d3664616331666434616465652f31323878313238732e6a7067 > > trusted.glusterfs.mdata=0x0165aaecff2695ebb765aaecff2695ebb765aaecff2533f110 > > > > glusterpub3 (bad one): > > getfattr: > > /gluster/md6/workdata/.glusterfs/fa/f5/faf59566-10f5-4ddd-8b0c-a87bc6a334fb: > > No such file or directory > > >
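A gfid always maps to a fixed location under the brick: the first two hex pairs become two nested directories below .glusterfs. A small helper (hypothetical name) that builds the paths being checked with getfattr in this thread:

```shell
# Build <brick>/.glusterfs/<aa>/<bb>/<gfid> from a brick path and a gfid.
gfid_path() {
  brick=$1
  gfid=$2
  aa=$(printf '%s' "$gfid" | cut -c1-2)
  bb=$(printf '%s' "$gfid" | cut -c3-4)
  printf '%s/.glusterfs/%s/%s/%s\n' "$brick" "$aa" "$bb" "$gfid"
}

gfid_path /gluster/md6/workdata faf59566-10f5-4ddd-8b0c-a87bc6a334fb
# -> /gluster/md6/workdata/.glusterfs/fa/f5/faf59566-10f5-4ddd-8b0c-a87bc6a334fb
```

Running this per brick and testing the result with `ls` makes it easy to script the "present on glusterpub1+2, missing on glusterpub3" check across many gfids.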
Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds
Gluster doesn't use the ssh key in /root/.ssh, thus you need to exchange the public key that corresponds to /var/lib/glusterd/geo-replication/secret.pem. If you don't know the pub key, google how to obtain it from the private key. Ensure that all hosts can ssh to the secondary before proceeding with the troubleshooting. Best Regards, Strahil Nikolov On Sun, Jan 28, 2024 at 15:58, Anant Saraswat wrote: Hi All, I have now copied /var/lib/glusterd/geo-replication/secret.pem.pub (public key) from master3 to drtier1data /root/.ssh/authorized_keys, and now I can ssh from master node3 to drtier1data using the georep key (/var/lib/glusterd/geo-replication/secret.pem). But I am still getting the same error, and geo-replication is getting faulty again and again. [2024-01-28 13:46:38.897683] I [resource(worker /opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time [{time=1706449598}][2024-01-28 13:46:38.922491] I [gsyncdstatus(worker /opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change [{status=Active}][2024-01-28 13:46:38.923127] I [gsyncdstatus(worker /opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change [{status=History Crawl}][2024-01-28 13:46:38.923313] I [master(worker /opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl [{turns=1}, {stime=(1705935991, 0)}, {etime=1706449598}, {entry_stime=(1705935991, 0)}][2024-01-28 13:46:39.973584] I [master(worker /opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time [{stime=(1705935991, 0)}][2024-01-28 13:46:40.98970] E [syncdutils(worker /opt/tier1data2019/brick):346:log_raise_exception] : Gluster Mount process exited [{error=ENOTCONN}][2024-01-28 13:46:40.757691] I [monitor(monitor):228:monitor] Monitor: worker died in startup phase [{brick=/opt/tier1data2019/brick}][2024-01-28 13:46:40.766860] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}][2024-01-28 13:46:50.793311] I
[gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}][2024-01-28 13:46:50.793469] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}][2024-01-28 13:46:50.874474] I [resource(worker /opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection between master and slave...[2024-01-28 13:46:52.659114] I [resource(worker /opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.7844}][2024-01-28 13:46:52.659461] I [resource(worker /opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume locally...[2024-01-28 13:46:53.698769] I [resource(worker /opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.0392}][2024-01-28 13:46:53.698984] I [subcmds(worker /opt/tier1data2019/brick):84:subcmd_worker] : Worker spawn successful. Acknowledging back to monitor[2024-01-28 13:46:55.831999] I [master(worker /opt/tier1data2019/brick):1662:register] _GMaster: Working dir [{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}][2024-01-28 13:46:55.832354] I [resource(worker /opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time [{time=1706449615}][2024-01-28 13:46:55.854684] I [gsyncdstatus(worker /opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change [{status=Active}][2024-01-28 13:46:55.855251] I [gsyncdstatus(worker /opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change [{status=History Crawl}][2024-01-28 13:46:55.855419] I [master(worker /opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl [{turns=1}, {stime=(1705935991, 0)}, {etime=1706449615}, {entry_stime=(1705935991, 0)}][2024-01-28 13:46:56.905496] I [master(worker /opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time [{stime=(1705935991, 
0)}][2024-01-28 13:46:57.38262] E [syncdutils(worker /opt/tier1data2019/brick):346:log_raise_exception] : Gluster Mount process exited [{error=ENOTCONN}][2024-01-28 13:46:57.704128] I [monitor(monitor):228:monitor] Monitor: worker died in startup phase [{brick=/opt/tier1data2019/brick}][2024-01-28 13:46:57.706743] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}][2024-01-28 13:47:07.741438] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}][2024-01-28 13:47:07.741582] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}][2024-01-28 13:47:07.821284] I [resource(worker /opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection between master and slave...[2024-01-28 13:47:09.573661] I [resource(worker /opt/tier1data2019/brick):1436
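The "obtain the pub key from the private key" step mentioned earlier in this thread is ssh-keygen -y. Sketched on a throwaway key; for geo-replication the private key would be /var/lib/glusterd/geo-replication/secret.pem:

```shell
# Stand-in for secret.pem: a fresh throwaway key pair.
tmp=$(mktemp -d)
ssh-keygen -q -t ed25519 -N '' -f "$tmp/secret.pem"

# Recover the public half from the private key alone - this is the line
# to append to the secondary's ~/.ssh/authorized_keys.
ssh-keygen -y -f "$tmp/secret.pem"
```

This avoids guessing whether the .pub file on disk still matches the private key actually used by gsyncd.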
Re: [Gluster-users] Gluster communication via TLS client problem
You didn't specify the IP correctly in the SANs, but I'm not sure if that's the root cause. In the SANs section, specify all hosts + their IPs:
IP.1 = 1.2.3.4
IP.2 = 2.3.4.5
DNS.1 = c01.gluster
DNS.2 = c02.gluster
What is the output from the client: openssl s_client -showcerts -connect c02.gluster:24007 There is a very good article on the topic: https://www.redhat.com/en/blog/hardening-gluster-installations-tls Can you check it for a missed step? Can you share the volume settings? Best Regards, Strahil Nikolov On Sun, Jan 28, 2024 at 11:38, Stefan Kania wrote: Hi Strahil, ok, that's what I did now to create the certificate: - openssl req -x509 -sha256 -key glusterfs.key -out "glusterfs.pem" -days 365 -subj "/C=de/ST=SH/L=St. Michel/O=stka/OU=gluster-nodes/CN=c01.gluster" -addext "subjectAltName = DNS:192.168.56.41" still the same. The communication between the gluster-nodes is working with TLS, but the client can't mount the volume anymore. I now try to mount the volume with log-level=trace mount -t glusterfs -o log-level=trace c02.gluster:/gv1 /mnt and got the following: --- [2024-01-28 09:22:38.348905 +] I [MSGID: 100030] [glusterfsd.c:2767:main] 0-/usr/sbin/glusterfs: Started running version [{arg=/usr/sbin/glusterfs}, {version=10.5}, {cmdlinestr=/usr/sbin/glusterfs --log-level=TRACE --process-name fuse --volfile-server=c02.gluster --volfile-id=/gv1 /mnt}] [2024-01-28 09:22:38.349095 +] T [MSGID: 0] [xlator.c:388:xlator_dynload] 0-xlator: attempt to load file /usr/lib/x86_64-linux-gnu/glusterfs/10.5/xlator/mount/fuse.so [2024-01-28 09:22:38.349650 +] T [MSGID: 0] [xlator.c:301:xlator_dynload_apis] 0-xlator: fuse: method missing (reconfigure) [2024-01-28 09:22:38.349728 +] T [MSGID: 0] [xlator.c:319:xlator_dynload_apis] 0-xlator: fuse: method missing (dump_metrics) [2024-01-28 09:22:38.349854 +] T [MSGID: 0] [xlator.c:325:xlator_dynload_apis] 0-xlator: fuse: method missing (pass_through_fops), falling back to default [2024-01-28 09:22:38.349979 +] D [MSGID: 0]
[glusterfsd.c:421:set_fuse_mount_options] 0-glusterfsd: fopen-keep-cache mode 2 [2024-01-28 09:22:38.350111 +] D [MSGID: 0] [glusterfsd.c:465:set_fuse_mount_options] 0-glusterfsd: fuse direct io type 2 [2024-01-28 09:22:38.350222 +] D [MSGID: 0] [glusterfsd.c:478:set_fuse_mount_options] 0-glusterfsd: fuse no-root-squash mode 0 [2024-01-28 09:22:38.350347 +] D [MSGID: 0] [glusterfsd.c:519:set_fuse_mount_options] 0-glusterfsd: kernel-writeback-cache mode 2 [2024-01-28 09:22:38.350458 +] D [MSGID: 0] [glusterfsd.c:537:set_fuse_mount_options] 0-glusterfsd: fuse-flush-handle-interrupt mode 2 [2024-01-28 09:22:38.350674 +] T [MSGID: 0] [options.c:1239:xlator_option_init_double] 0-fuse: option attribute-timeout using default value 1.0 [2024-01-28 09:22:38.350792 +] T [MSGID: 0] [options.c:513:xlator_option_validate_double] 0-fuse: no range check required for 'option attribute-timeout 1.0' [2024-01-28 09:22:38.350925 +] T [MSGID: 0] [options.c:1230:xlator_option_init_uint32] 0-fuse: option reader-thread-count using default value 1 [2024-01-28 09:22:38.351133 +] D [dict.c:2503:dict_get_str] (-->/usr/lib/x86_64-linux-gnu/glusterfs/10.5/xlator/mount/fuse.so(+0x1ee10) [0x7ff51324ce10] -->/lib/x86_64-linux-gnu/libglusterfs.so.0(xlator_option_init_bool+0x60) [0x7ff513e88bf0] -->/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get_str+0xdf) [0x7ff513e358df] ) 0-dict: key auto-invalidation, string type asked, has unsigned integer type [Das Argument ist ungültig] [2024-01-28 09:22:38.351262 +] D [MSGID: 0] [options.c:1236:xlator_option_init_bool] 0-fuse: option auto-invalidation using set value 0 [2024-01-28 09:22:38.351514 +] T [MSGID: 0] [options.c:1239:xlator_option_init_double] 0-fuse: option entry-timeout using default value 1.0 [2024-01-28 09:22:38.351661 +] T [MSGID: 0] [options.c:513:xlator_option_validate_double] 0-fuse: no range check required for 'option entry-timeout 1.0' [2024-01-28 09:22:38.351894 +] D [dict.c:2503:dict_get_str] 
(-->/usr/lib/x86_64-linux-gnu/glusterfs/10.5/xlator/mount/fuse.so(+0x1ee6e) [0x7ff51324ce6e] -->/lib/x86_64-linux-gnu/libglusterfs.so.0(xlator_option_init_double+0x60) [0x7ff513e89080] -->/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get_str+0xdf) [0x7ff513e358df] ) 0-dict: key negative-timeout, string type asked, has float type [Das Argument ist ungültig] [2024-01-28 09:22:38.351970 +] D [MSGID: 0] [options.c:1239:xlator_option_init_double] 0-fuse: option negative-timeout using set value 0.00 [2024-01-28 09:22:38.352092 +] T [MSGID: 0] [options.c:513:xlator_option_validate_double] 0-fuse: no range check required for 'option negative-timeout 0.00' [2024-01-28 09:22:38.352283 +] T [MSGID: 0] [options.c:1231:xlator_option_init_int32] 0-fuse: option client-pid not set [2024-01-28
Re: [Gluster-users] Gluster communication via TLS client problem
Usually with certificates it's always a pain. I would ask you to regenerate the certificates, but adding the FQDN of the system and the IP used by the clients to reach the brick in the 'SANs' section of the cert. Also, set the validity to 365 days for the test. Best Regards, Strahil Nikolov On Fri, Jan 26, 2024 at 21:37, Stefan Kania wrote: Hi Aravinda, On 26.01.24 at 17:01, Aravinda wrote: > Does the combined glusterfs.ca includes client nodes pem? Also this file > need to be placed in Client node as well. Yes, I put all the Gluster-node certificates AND the client certificate into the glusterfs.ca file. And I put the file on all gluster-nodes and clients. I did it twice (deleted all certificates and started all over); the result was always the same. Stefan
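Regenerating a cert with every name and IP the clients use in the SAN, as advised in both replies above, can be done in one openssl call (a sketch; needs OpenSSL 1.1.1+ for -addext, and the host names and IP are the placeholders from this thread):

```shell
tmp=$(mktemp -d)
# Self-signed cert whose SAN lists both gluster nodes and the client-facing IP.
openssl req -x509 -newkey rsa:2048 -nodes -sha256 -days 365 \
  -keyout "$tmp/glusterfs.key" -out "$tmp/glusterfs.pem" \
  -subj "/CN=c01.gluster" \
  -addext "subjectAltName=DNS:c01.gluster,DNS:c02.gluster,IP:192.168.56.41" \
  2>/dev/null

# Confirm what actually ended up in the SAN extension.
openssl x509 -in "$tmp/glusterfs.pem" -noout -ext subjectAltName
```

Note that an IP must go in as an IP: entry, not DNS: - the command quoted earlier in the thread put 192.168.56.41 in as a DNS name, which a client connecting by IP will not match.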
Re: [Gluster-users] Heal failure
Can you find the gfid on the other nodes? If yes, then find the actual file (not the one in .glusterfs) that is linked (hard link) to the gfid. Then check if the file (not the gfid) exists on the same brick where the heal daemon is complaining the gfid is missing. Use the following on all bricks: getfattr -d -e hex -m. '/urd-gds/gds-admin/.glusterfs/cd/b6/cdb62af8-ef52-4b8f-aa27-480405769877' Maybe only the hard link is missing on the node. Best Regards, Strahil Nikolov On Fri, Jan 19, 2024 at 11:48, Marcus Pedersén wrote: Hi all, I have a really strange problem with my cluster. Running gluster 10.4, replicated with an arbiter: Number of Bricks: 1 x (2 + 1) = 3 All the files in the system seem fine and I have not found any broken files, even though I have 4 files that need healing in heal-count. Heal fails for all the files over and over again. If I use heal info I just get a long list of gfids, and trying gfids with the script resolve-gfid.sh the only reply I get is: File: ls: cannot access '/urd-gds/gds-admin//.glusterfs/cd/b6/cdb62af8-ef52-4b8f-aa27-480405769877': No such file or directory Have not tried them all, but quite many. How can I get rid of these "failed" files, that are not files? Many thanks in advance!! Best regards Marcus --- E-mailing SLU will result in SLU processing your personal data.
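The gfid entry under .glusterfs is normally a hard link to the real file on the brick, so the surviving name can be located with find -samefile. Demonstrated on throwaway files; on the brick the argument would be the .glusterfs/cd/b6/... path from the heal output, and an empty result would mean only the gfid link exists:

```shell
# Mimic a brick: a data file hard-linked into .glusterfs/<aa>/<bb>/.
brick=$(mktemp -d)
mkdir -p "$brick/.glusterfs/cd/b6" "$brick/gds-admin"
echo demo > "$brick/gds-admin/report.txt"
ln "$brick/gds-admin/report.txt" \
   "$brick/.glusterfs/cd/b6/cdb62af8-ef52-4b8f-aa27-480405769877"

# Locate the real path behind the gfid, skipping the .glusterfs entry itself.
find "$brick" -samefile \
  "$brick/.glusterfs/cd/b6/cdb62af8-ef52-4b8f-aa27-480405769877" \
  ! -path '*/.glusterfs/*'
```

Run against a healthy brick, this prints the user-visible path of the file; comparing results across bricks shows which node lost which half of the link pair.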
Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems
What about the arbiter node? Actually, check on all nodes and script it - you might need it in the future. The simplest way to resolve it is to make the file disappear (rename it to something else and then rename it back). Another easy trick is to read the whole file: dd if=file of=/dev/null status=progress Best Regards, Strahil Nikolov On Sat, Jan 27, 2024 at 8:24, Hu Bert wrote: Morning, gfid1: getfattr -d -e hex -m. /gluster/md{3,4,5,6,7}/workdata/.glusterfs/fa/f5/faf59566-10f5-4ddd-8b0c-a87bc6a334fb
glusterpub1 (good one):
getfattr: Removing leading '/' from absolute path names
# file: gluster/md6/workdata/.glusterfs/fa/f5/faf59566-10f5-4ddd-8b0c-a87bc6a334fb
trusted.afr.dirty=0x
trusted.afr.workdata-client-11=0x00020001
trusted.gfid=0xfaf5956610f54ddd8b0ca87bc6a334fb
trusted.gfid2path.c2845024cc9b402e=0x38633139626234612d396236382d343532652d623434652d3664616331666434616465652f31323878313238732e6a7067
trusted.glusterfs.mdata=0x0165aaecff2695ebb765aaecff2695ebb765aaecff2533f110
glusterpub3 (bad one):
getfattr: /gluster/md6/workdata/.glusterfs/fa/f5/faf59566-10f5-4ddd-8b0c-a87bc6a334fb: No such file or directory
gfid 2: getfattr -d -e hex -m. /gluster/md{3,4,5,6,7}/workdata/.glusterfs/60/46/60465723-5dc0-4ebe-aced-9f2c12e52642
glusterpub1 (good one):
getfattr: Removing leading '/' from absolute path names
# file: gluster/md5/workdata/.glusterfs/60/46/60465723-5dc0-4ebe-aced-9f2c12e52642
trusted.afr.dirty=0x
trusted.afr.workdata-client-8=0x00020001
trusted.gfid=0x604657235dc04ebeaced9f2c12e52642
trusted.gfid2path.ac4669e3c4faf926=0x33366463366137392d666135642d343238652d613738642d6234376230616662316562642f31323878313238732e6a7067
trusted.glusterfs.mdata=0x0165aaecfe0c5403bd65aaecfe0c5403bd65aaecfe0ad61ee4
glusterpub3 (bad one):
getfattr: /gluster/md5/workdata/.glusterfs/60/46/60465723-5dc0-4ebe-aced-9f2c12e52642: No such file or directory
thx, Hubert On Sat, Jan 27, 2024 at 06:13, Strahil Nikolov wrote: > > You don't need to mount it.
> Like this : > # getfattr -d -e hex -m. > /path/to/brick/.glusterfs/00/46/00462be8-3e61-4931-8bda-dae1645c639e > # file: 00/46/00462be8-3e61-4931-8bda-dae1645c639e > trusted.gfid=0x00462be83e6149318bdadae1645c639e > trusted.gfid2path.05fcbdafdeea18ab=0x3032673930632d386637622d346436652d393464362d3936393132313930643131312f66696c656c6f636b696e672e7079 > trusted.glusterfs.mdata=0x016170340c25b6a7456170340c20efb5776170340c20d42b07 > trusted.glusterfs.shard.block-size=0x0400 > trusted.glusterfs.shard.file-size=0x00cd000100000000 > > > Best Regards, > Strahil Nikolov > > > > В четвъртък, 25 януари 2024 г. в 09:42:46 ч. Гринуич+2, Hu Bert > написа: > > > > > > Good morning, > > hope i got it right... using: > https://access.redhat.com/documentation/de-de/red_hat_gluster_storage/3.1/html/administration_guide/ch27s02 > > mount -t glusterfs -o aux-gfid-mount glusterpub1:/workdata /mnt/workdata > > gfid 1: > getfattr -n trusted.glusterfs.pathinfo -e text > /mnt/workdata/.gfid/faf59566-10f5-4ddd-8b0c-a87bc6a334fb > getfattr: Removing leading '/' from absolute path names > # file: mnt/workdata/.gfid/faf59566-10f5-4ddd-8b0c-a87bc6a334fb > trusted.glusterfs.pathinfo="( > ( > > uster/md6/workdata/images/133/283/13328349/128x128s.jpg>))" > > gfid 2: > getfattr -n trusted.glusterfs.pathinfo -e text > /mnt/workdata/.gfid/60465723-5dc0-4ebe-aced-9f2c12e52642 > getfattr: Removing leading '/' from absolute path names > # file: mnt/workdata/.gfid/60465723-5dc0-4ebe-aced-9f2c12e52642 > trusted.glusterfs.pathinfo="( > ( > > ):glusterpub1:/gluster/md5/workdata/.glusterfs/60/46/60465723-5dc0-4ebe-aced-9f2c12e52642>))" > > glusterpub1 + glusterpub2 are the good ones, glusterpub3 is the > misbehaving (not healing) one. > > The file with gfid 1 is available under > /gluster/md6/workdata/images/133/283/13328349/ on glusterpub1+2 > bricks, but missing on glusterpub3 brick. 
> > gfid 2: > /gluster/md5/workdata/.glusterfs/60/46/60465723-5dc0-4ebe-aced-9f2c12e52642 > is present on glusterpub1+2, but not on glusterpub3. > > > Thx, > Hubert > > Am Mi., 24. Jan. 2024 um 17:36 Uhr schrieb Strahil Nikolov > : > > > > > Hi, > > > > Can you find and check the files with gfids: > > 60465723-5dc0-4ebe-aced-9f2c12e52642 > > faf59566-10f5-4ddd-8b0c-a87bc6a334fb > > > > Use 'getfattr -d -e hex -m. ' command from > > https://docs.gluster.org/en/main/Troubleshooting/resolving-sp
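Strahil's advice above ("check on all nodes and script it") can be sketched as a small loop that dumps the xattrs of one gfid from every local brick. This is a sketch only: the gfid is taken from the thread, but the brick list mirrors Hubert's layout and must be adjusted to your own setup.

```shell
#!/bin/sh
# Dump the xattrs of a given gfid from each local brick (run on every node).
GFID="faf59566-10f5-4ddd-8b0c-a87bc6a334fb"
# .glusterfs shards files by the first two byte-pairs of the gfid: fa/f5/...
PREFIX="$(printf '%s' "$GFID" | cut -c1-2)/$(printf '%s' "$GFID" | cut -c3-4)"
for BRICK in /gluster/md3/workdata /gluster/md4/workdata /gluster/md5/workdata \
             /gluster/md6/workdata /gluster/md7/workdata; do
    FILE="$BRICK/.glusterfs/$PREFIX/$GFID"
    echo "== $FILE"
    # '|| true' so a missing file on a bad brick doesn't stop the loop.
    getfattr -d -e hex -m . "$FILE" 2>&1 || true
done
```

Running this on all three nodes and diffing the trusted.afr.* values side by side shows immediately which brick is blamed and which copy is missing.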
Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds
Don't forget to test with the georep key. I think it was /var/lib/glusterd/geo-replication/secret.pem Best Regards, Strahil Nikolov On Saturday, 27 January 2024 at 07:24:07 GMT+2, Strahil Nikolov wrote: Hi Anant, i would first start checking if you can do ssh from all masters to the slave node. If you haven't set up a dedicated user for the session, then gluster is using root. Best Regards, Strahil Nikolov On Friday, 26 January 2024 at 18:07:59 GMT+2, Anant Saraswat wrote: Hi All, I have run the following commands on master3, and that has added master3 to geo-replication. gluster system:: execute gsec_create gluster volume geo-replication tier1data drtier1data::drtier1data create push-pem force gluster volume geo-replication tier1data drtier1data::drtier1data stop gluster volume geo-replication tier1data drtier1data::drtier1data start Now I am able to start the geo-replication, but I am getting the same error. [2024-01-24 19:51:24.80892] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}] [2024-01-24 19:51:24.81020] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}] [2024-01-24 19:51:24.158021] I [resource(worker /opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection between master and slave... [2024-01-24 19:51:25.951998] I [resource(worker /opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.7938}] [2024-01-24 19:51:25.952292] I [resource(worker /opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume locally... [2024-01-24 19:51:26.986974] I [resource(worker /opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.0346}] [2024-01-24 19:51:26.987137] I [subcmds(worker /opt/tier1data2019/brick):84:subcmd_worker] : Worker spawn successful.
Acknowledging back to monitor [2024-01-24 19:51:29.139131] I [master(worker /opt/tier1data2019/brick):1662:register] _GMaster: Working dir [{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}] [2024-01-24 19:51:29.139531] I [resource(worker /opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time [{time=1706125889}] [2024-01-24 19:51:29.173877] I [gsyncdstatus(worker /opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change [{status=Active}] [2024-01-24 19:51:29.174407] I [gsyncdstatus(worker /opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change [{status=History Crawl}] [2024-01-24 19:51:29.174558] I [master(worker /opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl [{turns=1}, {stime=(1705935991, 0)}, {etime=1706125889}, {entry_stime=(1705935991, 0)}] [2024-01-24 19:51:30.251965] I [master(worker /opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time [{stime=(1705935991, 0)}] [2024-01-24 19:51:30.376715] E [syncdutils(worker /opt/tier1data2019/brick):346:log_raise_exception] : Gluster Mount process exited [{error=ENOTCONN}] [2024-01-24 19:51:30.991856] I [monitor(monitor):228:monitor] Monitor: worker died in startup phase [{brick=/opt/tier1data2019/brick}] [2024-01-24 19:51:30.993608] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}] Any idea why it's stuck in this loop? Thanks, Anant From: Gluster-users on behalf of Anant Saraswat Sent: 22 January 2024 9:00 PM To: gluster-users@gluster.org Subject: [Gluster-users] Geo-replication status is getting Faulty after few seconds EXTERNAL: Do not click links or open attachments if you do not recognize the sender. Hi There, We have a Gluster setup with three master nodes in replicated mode and one slave node with geo-replication. 
# gluster volume info Volume Name: tier1data Type: Replicate Volume ID: 93c45c14-f700-4d50-962b-7653be471e27 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: master1:/opt/tier1data2019/brick Brick2: master2:/opt/tier1data2019/brick Brick3: master3:/opt/tier1data2019/brick master1 |master2 | --geo-replication- | drtier1datamaster3 | We added the master3 node a few months back, the initial setup consisted of 2 master nodes and one geo-replicated slave(drtier1data). Our geo-replication was functioning well with the initial two master nodes (master1 and master2), where master1 was active and master2 was in passive mode. However, today, we started experiencing issues where geo-replication suddenly stopped and became stuck in a loop of Initializing..., Active.. Faulty on master1, while master2 remained in passive mode. Upon checking the gsyncd.log on the master1
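The key check Strahil mentions can be scripted from each master. The sketch below only assembles the command as a dry run, so you can run it by hand against the slave; the secret.pem path and the root user are the defaults for a root-owned session (an assumption - verify both on your installation).

```shell
# Build the ssh command that geo-replication effectively uses as transport.
KEY=/var/lib/glusterd/geo-replication/secret.pem
SLAVE=drtier1data          # slave host from the session above
CMD="ssh -i $KEY root@$SLAVE gluster --version"
echo "Run this from every master: $CMD"
# A password prompt or connection failure here means the pem distribution
# (gsec_create + create push-pem force) must be redone before anything else.
```

A worker that connects over SSH but then dies with ENOTCONN, as in the log above, usually points at the local gluster mount rather than the key, but ruling out the transport first is cheap.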
Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds
Hi Anant, i would first start checking if you can do ssh from all masters to the slave node.If you haven't setup a dedicated user for the session, then gluster is using root. Best Regards, Strahil Nikolov В петък, 26 януари 2024 г. в 18:07:59 ч. Гринуич+2, Anant Saraswat написа: Hi All, I have run the following commands on master3, and that has added master3 to geo-replication. gluster system:: execute gsec_create gluster volume geo-replication tier1data drtier1data::drtier1data create push-pem force gluster volume geo-replication tier1data drtier1data::drtier1data stop gluster volume geo-replication tier1data drtier1data::drtier1data start Now I am able to start the geo-replication, but I am getting the same error. [2024-01-24 19:51:24.80892] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}] [2024-01-24 19:51:24.81020] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}] [2024-01-24 19:51:24.158021] I [resource(worker /opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection between master and slave... [2024-01-24 19:51:25.951998] I [resource(worker /opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.7938}] [2024-01-24 19:51:25.952292] I [resource(worker /opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume locally... [2024-01-24 19:51:26.986974] I [resource(worker /opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.0346}] [2024-01-24 19:51:26.987137] I [subcmds(worker /opt/tier1data2019/brick):84:subcmd_worker] : Worker spawn successful. 
Acknowledging back to monitor [2024-01-24 19:51:29.139131] I [master(worker /opt/tier1data2019/brick):1662:register] _GMaster: Working dir [{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}] [2024-01-24 19:51:29.139531] I [resource(worker /opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time [{time=1706125889}] [2024-01-24 19:51:29.173877] I [gsyncdstatus(worker /opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change [{status=Active}] [2024-01-24 19:51:29.174407] I [gsyncdstatus(worker /opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change [{status=History Crawl}] [2024-01-24 19:51:29.174558] I [master(worker /opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl [{turns=1}, {stime=(1705935991, 0)}, {etime=1706125889}, {entry_stime=(1705935991, 0)}] [2024-01-24 19:51:30.251965] I [master(worker /opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time [{stime=(1705935991, 0)}] [2024-01-24 19:51:30.376715] E [syncdutils(worker /opt/tier1data2019/brick):346:log_raise_exception] : Gluster Mount process exited [{error=ENOTCONN}] [2024-01-24 19:51:30.991856] I [monitor(monitor):228:monitor] Monitor: worker died in startup phase [{brick=/opt/tier1data2019/brick}] [2024-01-24 19:51:30.993608] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}] Any idea why it's stuck in this loop? Thanks, Anant From: Gluster-users on behalf of Anant Saraswat Sent: 22 January 2024 9:00 PM To: gluster-users@gluster.org Subject: [Gluster-users] Geo-replication status is getting Faulty after few seconds EXTERNAL: Do not click links or open attachments if you do not recognize the sender. Hi There, We have a Gluster setup with three master nodes in replicated mode and one slave node with geo-replication. 
# gluster volume info Volume Name: tier1data Type: Replicate Volume ID: 93c45c14-f700-4d50-962b-7653be471e27 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: master1:/opt/tier1data2019/brick Brick2: master2:/opt/tier1data2019/brick Brick3: master3:/opt/tier1data2019/brick master1 |master2 | --geo-replication- | drtier1datamaster3 | We added the master3 node a few months back, the initial setup consisted of 2 master nodes and one geo-replicated slave(drtier1data). Our geo-replication was functioning well with the initial two master nodes (master1 and master2), where master1 was active and master2 was in passive mode. However, today, we started experiencing issues where geo-replication suddenly stopped and became stuck in a loop of Initializing..., Active.. Faulty on master1, while master2 remained in passive mode. Upon checking the gsyncd.log on the master1 node, we observed the following error (please refer to the attached logs for more details): E [syncdutils(worker /opt/tier1data2019/brick):346:log_raise_exception] : Gluster Mount process exited [{error=ENOTCONN
Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems
You don't need to mount it. Like this : # getfattr -d -e hex -m. /path/to/brick/.glusterfs/00/46/00462be8-3e61-4931-8bda-dae1645c639e # file: 00/46/00462be8-3e61-4931-8bda-dae1645c639e trusted.gfid=0x00462be83e6149318bdadae1645c639e trusted.gfid2path.05fcbdafdeea18ab=0x3032673930632d386637622d346436652d393464362d3936393132313930643131312f66696c656c6f636b696e672e7079 trusted.glusterfs.mdata=0x016170340c25b6a7456170340c20efb5776170340c20d42b07 trusted.glusterfs.shard.block-size=0x0400 trusted.glusterfs.shard.file-size=0x00cd0001 Best Regards, Strahil Nikolov В четвъртък, 25 януари 2024 г. в 09:42:46 ч. Гринуич+2, Hu Bert написа: Good morning, hope i got it right... using: https://access.redhat.com/documentation/de-de/red_hat_gluster_storage/3.1/html/administration_guide/ch27s02 mount -t glusterfs -o aux-gfid-mount glusterpub1:/workdata /mnt/workdata gfid 1: getfattr -n trusted.glusterfs.pathinfo -e text /mnt/workdata/.gfid/faf59566-10f5-4ddd-8b0c-a87bc6a334fb getfattr: Removing leading '/' from absolute path names # file: mnt/workdata/.gfid/faf59566-10f5-4ddd-8b0c-a87bc6a334fb trusted.glusterfs.pathinfo="( ( ))" gfid 2: getfattr -n trusted.glusterfs.pathinfo -e text /mnt/workdata/.gfid/60465723-5dc0-4ebe-aced-9f2c12e52642 getfattr: Removing leading '/' from absolute path names # file: mnt/workdata/.gfid/60465723-5dc0-4ebe-aced-9f2c12e52642 trusted.glusterfs.pathinfo="( ( ))" glusterpub1 + glusterpub2 are the good ones, glusterpub3 is the misbehaving (not healing) one. The file with gfid 1 is available under /gluster/md6/workdata/images/133/283/13328349/ on glusterpub1+2 bricks, but missing on glusterpub3 brick. gfid 2: /gluster/md5/workdata/.glusterfs/60/46/60465723-5dc0-4ebe-aced-9f2c12e52642 is present on glusterpub1+2, but not on glusterpub3. Thx, Hubert Am Mi., 24. Jan. 
2024 um 17:36 Uhr schrieb Strahil Nikolov : > > Hi, > > Can you find and check the files with gfids: > 60465723-5dc0-4ebe-aced-9f2c12e52642 > faf59566-10f5-4ddd-8b0c-a87bc6a334fb > > Use 'getfattr -d -e hex -m. ' command from > https://docs.gluster.org/en/main/Troubleshooting/resolving-splitbrain/#analysis-of-the-output > . > > Best Regards, > Strahil Nikolov > > On Sat, Jan 20, 2024 at 9:44, Hu Bert > wrote: > Good morning, > > thx Gilberto, did the first three (set to WARNING), but the last one > doesn't work. Anyway, with setting these three some new messages > appear: > > [2024-01-20 07:23:58.561106 +] W [MSGID: 114061] > [client-common.c:796:client_pre_lk_v2] 0-workdata-client-11: remote_fd > is -1. EBADFD [{gfid=faf59566-10f5-4ddd-8b0c-a87bc6a334fb}, > {errno=77}, {error=File descriptor in bad state}] > [2024-01-20 07:23:58.561177 +] E [MSGID: 108028] > [afr-open.c:361:afr_is_reopen_allowed_cbk] 0-workdata-replicate-3: > Failed getlk for faf59566-10f5-4ddd-8b0c-a87bc6a334fb [File descriptor > in bad state] > [2024-01-20 07:23:58.562151 +] W [MSGID: 114031] > [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-11: > remote operation failed. > [{path=}, > {gfid=faf59566-10f5-4ddd-8b0c-a87b > c6a334fb}, {errno=2}, {error=No such file or directory}] > [2024-01-20 07:23:58.562296 +] W [MSGID: 114061] > [client-common.c:530:client_pre_flush_v2] 0-workdata-client-11: > remote_fd is -1. EBADFD [{gfid=faf59566-10f5-4ddd-8b0c-a87bc6a334fb}, > {errno=77}, {error=File descriptor in bad state}] > [2024-01-20 07:23:58.860552 +] W [MSGID: 114061] > [client-common.c:796:client_pre_lk_v2] 0-workdata-client-8: remote_fd > is -1. 
EBADFD [{gfid=60465723-5dc0-4ebe-aced-9f2c12e52642}, > {errno=77}, {error=File descriptor in bad state}] > [2024-01-20 07:23:58.860608 +] E [MSGID: 108028] > [afr-open.c:361:afr_is_reopen_allowed_cbk] 0-workdata-replicate-2: > Failed getlk for 60465723-5dc0-4ebe-aced-9f2c12e52642 [File descriptor > in bad state] > [2024-01-20 07:23:58.861520 +] W [MSGID: 114031] > [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-8: > remote operation failed. > [{path=}, > {gfid=60465723-5dc0-4ebe-aced-9f2c1 > 2e52642}, {errno=2}, {error=No such file or directory}] > [2024-01-20 07:23:58.861640 +] W [MSGID: 114061] > [client-common.c:530:client_pre_flush_v2] 0-workdata-client-8: > remote_fd is -1. EBADFD [{gfid=60465723-5dc0-4ebe-aced-9f2c12e52642}, > {errno=77}, {error=File descriptor in bad state}] > > Not many log entries appear, only a few. Has someone seen error > messages like these? Setting diagnostics.brick-sys-log-level to DEBUG > shows way more log entries, uploaded it to: > https://file.io/spLhlcbMCzr8 - not sure if that helps. > > > Thx, > Hubert > > Am Fr., 19.
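The trusted.gfid2path.* values in the getfattr dumps earlier in this digest are simply hex-encoded "<parent-dir-gfid>/<basename>" strings, which is handy for locating a file from its gfid without an aux-gfid mount. A pure-shell decoder sketch; the sample value is copied from Hubert's dump of gfid faf59566... above:

```shell
# Decode a trusted.gfid2path hex payload into <parent-gfid>/<basename>.
HEX=38633139626234612d396236382d343532652d623434652d3664616331666434616465652f31323878313238732e6a7067
DECODED=""
rest="$HEX"
while [ -n "$rest" ]; do
    byte="${rest%"${rest#??}"}"     # first two hex digits
    rest="${rest#??}"
    # Convert hex byte -> decimal -> octal escape -> character.
    DECODED="$DECODED$(printf "\\$(printf '%03o' "$(printf '%d' "0x$byte")")")"
done
echo "$DECODED"
```

For the value above this prints the parent directory gfid and filename, matching the 128x128s.jpg path Hubert resolved via trusted.glusterfs.pathinfo.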
Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems
Hi, Can you find and check the files with gfids: 60465723-5dc0-4ebe-aced-9f2c12e52642 faf59566-10f5-4ddd-8b0c-a87bc6a334fb Use 'getfattr -d -e hex -m. ' command from https://docs.gluster.org/en/main/Troubleshooting/resolving-splitbrain/#analysis-of-the-output . Best Regards, Strahil Nikolov On Sat, Jan 20, 2024 at 9:44, Hu Bert wrote: Good morning, thx Gilberto, did the first three (set to WARNING), but the last one doesn't work. Anyway, with setting these three some new messages appear: [2024-01-20 07:23:58.561106 +] W [MSGID: 114061] [client-common.c:796:client_pre_lk_v2] 0-workdata-client-11: remote_fd is -1. EBADFD [{gfid=faf59566-10f5-4ddd-8b0c-a87bc6a334fb}, {errno=77}, {error=File descriptor in bad state}] [2024-01-20 07:23:58.561177 +] E [MSGID: 108028] [afr-open.c:361:afr_is_reopen_allowed_cbk] 0-workdata-replicate-3: Failed getlk for faf59566-10f5-4ddd-8b0c-a87bc6a334fb [File descriptor in bad state] [2024-01-20 07:23:58.562151 +] W [MSGID: 114031] [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-11: remote operation failed. [{path=}, {gfid=faf59566-10f5-4ddd-8b0c-a87bc6a334fb}, {errno=2}, {error=No such file or directory}] [2024-01-20 07:23:58.562296 +] W [MSGID: 114061] [client-common.c:530:client_pre_flush_v2] 0-workdata-client-11: remote_fd is -1. EBADFD [{gfid=faf59566-10f5-4ddd-8b0c-a87bc6a334fb}, {errno=77}, {error=File descriptor in bad state}] [2024-01-20 07:23:58.860552 +] W [MSGID: 114061] [client-common.c:796:client_pre_lk_v2] 0-workdata-client-8: remote_fd is -1. EBADFD [{gfid=60465723-5dc0-4ebe-aced-9f2c12e52642}, {errno=77}, {error=File descriptor in bad state}] [2024-01-20 07:23:58.860608 +] E [MSGID: 108028] [afr-open.c:361:afr_is_reopen_allowed_cbk] 0-workdata-replicate-2: Failed getlk for 60465723-5dc0-4ebe-aced-9f2c12e52642 [File descriptor in bad state] [2024-01-20 07:23:58.861520 +] W [MSGID: 114031] [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-8: remote operation failed.
[{path=}, {gfid=60465723-5dc0-4ebe-aced-9f2c1 2e52642}, {errno=2}, {error=No such file or directory}] [2024-01-20 07:23:58.861640 +] W [MSGID: 114061] [client-common.c:530:client_pre_flush_v2] 0-workdata-client-8: remote_fd is -1. EBADFD [{gfid=60465723-5dc0-4ebe-aced-9f2c12e52642}, {errno=77}, {error=File descriptor in bad state}] Not many log entries appear, only a few. Has someone seen error messages like these? Setting diagnostics.brick-sys-log-level to DEBUG shows way more log entries, uploaded it to: https://file.io/spLhlcbMCzr8 - not sure if that helps. Thx, Hubert Am Fr., 19. Jan. 2024 um 16:24 Uhr schrieb Gilberto Ferreira : > > gluster volume set testvol diagnostics.brick-log-level WARNING > gluster volume set testvol diagnostics.brick-sys-log-level WARNING > gluster volume set testvol diagnostics.client-log-level ERROR > gluster --log-level=ERROR volume status > > --- > Gilberto Nunes Ferreira > > > > > > > Em sex., 19 de jan. de 2024 às 05:49, Hu Bert > escreveu: >> >> Hi Strahil, >> hm, don't get me wrong, it may sound a bit stupid, but... where do i >> set the log level? Using debian... >> >> https://access.redhat.com/documentation/de-de/red_hat_gluster_storage/3/html/administration_guide/configuring_the_log_level >> >> ls /etc/glusterfs/ >> eventsconfig.json glusterfs-georep-logrotate >> gluster-rsyslog-5.8.conf group-db-workload group-gluster-block >> group-nl-cache group-virt.example logger.conf.example >> glusterd.vol glusterfs-logrotate >> gluster-rsyslog-7.2.conf group-distributed-virt group-metadata-cache >> group-samba gsyncd.conf thin-arbiter.vol >> >> checked: /etc/glusterfs/logger.conf.example >> >> # To enable enhanced logging capabilities, >> # >> # 1. rename this file to /etc/glusterfs/logger.conf >> # >> # 2. rename /etc/rsyslog.d/gluster.conf.example to >> # /etc/rsyslog.d/gluster.conf >> # >> # This change requires restart of all gluster services/volumes and >> # rsyslog. 
>> >> tried (to test): /etc/glusterfs/logger.conf with " LOG_LEVEL='WARNING' " >> >> restart glusterd on that node, but this doesn't work, log-level stays >> on INFO. /etc/rsyslog.d/gluster.conf.example does not exist. Probably >> /etc/rsyslog.conf on debian. But first it would be better to know >> where to set the log-level for glusterd. >> >> Depending on how much the DEBUG log-level talks ;-) i could assign up >> to 100G to /var >> >> >> Thx & best regards, >> Hubert >> >> >> Am Do., 18. Jan. 2024 um 22:58 Uhr schrieb Strahil Nikolov >> : >> > >> > Are you able to set the logs to debug level ? >> > It might provide a clue what it is going o
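The splitbrain analysis document Strahil links explains that each trusted.afr.<volume>-client-N value packs three big-endian 32-bit counters: pending data, metadata and entry operations blamed on that brick. A quick decoder sketch - the sample value here is hypothetical, since the hex values in the quoted mails arrived truncated:

```shell
# Split a 24-hex-digit trusted.afr value into its three pending counters.
VAL=0x000000020000000100000000   # hypothetical sample value
HEX=${VAL#0x}
DATA=$(printf '%d' "0x$(printf '%s' "$HEX" | cut -c1-8)")
META=$(printf '%d' "0x$(printf '%s' "$HEX" | cut -c9-16)")
ENTRY=$(printf '%d' "0x$(printf '%s' "$HEX" | cut -c17-24)")
echo "data=$DATA metadata=$META entry=$ENTRY"
```

A non-zero counter on a good brick "blames" the peer that missed those updates; all-zero values everywhere mean nothing is pending for that file.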
Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems
Are you able to set the logs to debug level? It might provide a clue what is going on. Best Regards, Strahil Nikolov On Thu, Jan 18, 2024 at 13:08, Diego Zuccato wrote: That's the same kind of errors I keep seeing on my 2 clusters, regenerated some months ago. Seems a pseudo-split-brain that should be impossible on a replica 3 cluster but keeps happening. Sadly going to ditch Gluster ASAP. Diego On 18/01/2024 07:11, Hu Bert wrote: > Good morning, > heal still not running. Pending heals now sum up to 60K per brick. > Heal was starting instantly e.g. after server reboot with version > 10.4, but doesn't with version 11. What could be wrong? > > I only see these errors on one of the "good" servers in glustershd.log: > > [2024-01-18 06:08:57.328480 +] W [MSGID: 114031] > [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-0: > remote operation failed. > [{path=}, > {gfid=cb39a1e4-2a4c-4727-861d-3ed9ef00681b}, {errno=2}, {error=No such file or directory}] > [2024-01-18 06:08:57.594051 +] W [MSGID: 114031] > [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-1: > remote operation failed. > [{path=}, > {gfid=3e9b178c-ae1f-4d85-ae47-fc539d94dd11}, {errno=2}, {error=No such file or directory}] > > About 7K today. Any ideas? Someone? > > > Best regards, > Hubert > > On Wed., 17 Jan. 2024 at 11:24, Hu Bert wrote: >> >> ok, finally managed to get all servers, volumes etc running, but took >> a couple of restarts, cksum checks etc. >> >> One problem: a volume doesn't heal automatically or doesn't heal at all.
>> >> gluster volume status >> Status of volume: workdata >> Gluster process TCP Port RDMA Port Online Pid >> -- >> Brick glusterpub1:/gluster/md3/workdata 58832 0 Y 3436 >> Brick glusterpub2:/gluster/md3/workdata 59315 0 Y 1526 >> Brick glusterpub3:/gluster/md3/workdata 56917 0 Y 1952 >> Brick glusterpub1:/gluster/md4/workdata 59688 0 Y 3755 >> Brick glusterpub2:/gluster/md4/workdata 60271 0 Y 2271 >> Brick glusterpub3:/gluster/md4/workdata 49461 0 Y 2399 >> Brick glusterpub1:/gluster/md5/workdata 54651 0 Y 4208 >> Brick glusterpub2:/gluster/md5/workdata 49685 0 Y 2751 >> Brick glusterpub3:/gluster/md5/workdata 59202 0 Y 2803 >> Brick glusterpub1:/gluster/md6/workdata 55829 0 Y 4583 >> Brick glusterpub2:/gluster/md6/workdata 50455 0 Y 3296 >> Brick glusterpub3:/gluster/md6/workdata 50262 0 Y 3237 >> Brick glusterpub1:/gluster/md7/workdata 52238 0 Y 5014 >> Brick glusterpub2:/gluster/md7/workdata 52474 0 Y 3673 >> Brick glusterpub3:/gluster/md7/workdata 57966 0 Y 3653 >> Self-heal Daemon on localhost N/A N/A Y 4141 >> Self-heal Daemon on glusterpub1 N/A N/A Y 5570 >> Self-heal Daemon on glusterpub2 N/A N/A Y 4139 >> >> "gluster volume heal workdata info" lists a lot of files per brick. >> "gluster volume heal workdata statistics heal-count" shows thousands >> of files per brick. >> "gluster volume heal workdata enable" has no effect. >> >> gluster volume heal workdata full >> Launching heal operation to perform full self heal on volume workdata >> has been successful >> Use heal info commands to check status. >> >> -> not doing anything at all. And nothing happening on the 2 "good" >> servers in e.g. glustershd.log. Heal was working as expected on >> version 10.4, but here... silence. Someone has an idea? >> >> >> Best regards, >> Hubert >> >> Am Di., 16. Jan. 2024 um 13:44 Uhr schrieb Gilberto Ferreira >> : >>> >>> Ah! Indeed! You need to perform an upgrade in the clients as well. >>> >>> >>> >>> >>> >>> >>> >>> >>> Em ter., 16 de jan. 
de 2024 às 03:12, Hu Bert >>> escreveu: >>>> >>>> morning to those still reading :-) >>>> >>>> i found this: >>>> https://docs.gluster.org/en/main/Troubleshooting/troubleshooting-glusterd/#common-issues-and-how-to-resolve-them >>>> >>>> there's a paragraph about "peer rejected" with the same error me
Re: [Gluster-users] Replacing Failed Server Failing
Hi, I think you might be mixing the approaches. Basically you have 2 options: 1) Create a brand new system, use a different hostname and then add it to the TSP (Trusted Storage Pool). Then you need to remove the bricks (server + directory combination) owned by the previous system and then add the new bricks. 2) Use the same hostname as the old system and restore the gluster directories from backup (both the one in '/etc' and the one in '/var/lib'). If your gluster storage was also affected, you will need to recover the bricks from backup or remove the old ones from the volume and recreate them. Can you describe what you have done so far (logically)? Best Regards, Strahil Nikolov On Mon, Jan 1, 2024 at 6:59, duluxoz wrote: Hi All (and Happy New Year), We had to replace one of our Gluster Servers in our Trusted Pool this week (node1). The new server is now built, with empty folders for the bricks, peered to the old Nodes (node2 & node3). We basically followed this guide: https://docs.rackspace.com/docs/recover-from-a-failed-server-in-a-glusterfs-array We are using the same/old IP address. So when we try to do a `gluster volume sync node2 all` we get a `volume sync node2 all : FAILED : Staging failed on node2. Please check log file for details.` The logs all *seem* to be complaining that there are no volumes on node1 - which makes sense (I think) because there *are* no volumes on node1. If we try to create a volume on node1 the system complains that the volume already exists (on nodes 2 & 3) - again, yes, this is correct. So, what are we doing wrong?
Thanks in advance Dulux-Oz Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
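Strahil's second option (keep the old hostname and restore the gluster state) usually hinges on giving the rebuilt node its old UUID, which the surviving peers still remember under /var/lib/glusterd/peers/ - the rackspace guide linked above follows the same idea. The sketch below simulates the edit in a scratch directory; the node names and UUIDs are made up, and on the real node glusterd must be stopped before touching glusterd.info.

```shell
# Simulate restoring the old node UUID into glusterd.info (scratch copy).
WORKDIR=$(mktemp -d)
cat > "$WORKDIR/glusterd.info" <<'EOF'
UUID=aaaaaaaa-0000-0000-0000-000000000000
operating-version=100000
EOF
# On a healthy peer, the old UUID is the name of the file under
# /var/lib/glusterd/peers/ that lists node1's hostname; hardcoded here.
OLD_UUID=bbbbbbbb-1111-1111-1111-111111111111
sed "s/^UUID=.*/UUID=$OLD_UUID/" "$WORKDIR/glusterd.info" > "$WORKDIR/glusterd.info.new"
mv "$WORKDIR/glusterd.info.new" "$WORKDIR/glusterd.info"
grep '^UUID' "$WORKDIR/glusterd.info"
# Real sequence: stop glusterd; edit /var/lib/glusterd/glusterd.info;
# start glusterd; then 'gluster volume sync node2 all' can pull the volumes.
```

Without the old UUID, the peers see the reinstalled node as a stranger that claims an existing hostname, which matches the staging failures described above.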
Re: [Gluster-users] Gluster 11 OP version
Hi Marcus, If you see heals happening, you have to check if all clients are connected to all bricks of the volume. I have seen sporadic healing happening when a client was disconnected from a brick. You can’t lower the OP version and if you are indeed having issues, try to set up NFS Ganesha as a solution. Best Regards, Strahil Nikolov On Tuesday, December 19, 2023, 2:42 PM, Marcus Pedersén wrote: Hi all, We upgraded to gluster 11.1 and the OP version was fixed in this version, so I changed the OP version to 11. Now we have some obscure, vague problem. Our users usually run 100+ processes with GNU parallel and now the execution time has increased to nearly double. I can see that there are a couple of heals happening every now and then but this does not seem strange to me. Just to make sure that it was not on the client side, I downgraded glusterfs-client to 10 but we still have this slowdown. I tried to lower the OP version back to 10 again but this is apparently not possible: volume set: failed: Required op-version (10) should not be equal or lower than current cluster op-version (11). Before the change to OP version 11 everything worked fine. Is there a way to "manually" change the OP version back? Or any other ideas on how to fix this? Many thanks in advance!! Best regards Marcus --- When you send e-mail to SLU, this means that SLU processes your personal data. To read more about how this is done, click here <https://www.slu.se/om-slu/kontakta-slu/personuppgifter/> E-mailing SLU will result in SLU processing your personal data.
For more information on how this is done, click here <https://www.slu.se/en/about-slu/contact-slu/personal-data/> Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Gluster Performance - 12 Gbps SSDs and 10 Gbps NIC
Hi Gilberto, Have you checked https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.5/html/administration_guide/chap-configuring_red_hat_storage_for_enhancing_performance ? I think that you will need to test the virt profile as the settings will prevent some bad situations - especially VM live migration. You should also consider sharding, which can reduce healing time but also makes your life more difficult if you need to access the disks of the VMs. I think that client.event-thread, server.event-thread and performance.io-thread-count can be tuned in your case. Consider setting up a VM using the gluster volume as backing store and run the tests inside the VM to simulate real workload (best is to run a DB, webserver, etc inside a VM). Best Regards, Strahil Nikolov On Wednesday, December 13, 2023, 2:34 PM, Gilberto Ferreira wrote: Hi all, Aravinda, usually I set this in a two server env and never get split brain: gluster vol set VMS cluster.heal-timeout 5 gluster vol heal VMS enable gluster vol set VMS cluster.quorum-reads false gluster vol set VMS cluster.quorum-count 1 gluster vol set VMS network.ping-timeout 2 gluster vol set VMS cluster.favorite-child-policy mtime gluster vol heal VMS granular-entry-heal enable gluster vol set VMS cluster.data-self-heal-algorithm full gluster vol set VMS features.shard on Strahil, in general, I get 0,06ms with 1G dedicated NIC. My env is very simple, using Proxmox + QEMU/KVM, with 3 or 5 VM. --- Gilberto Nunes Ferreira (47) 99676-7530 - Whatsapp / Telegram On Wed., 13 Dec. 2023 at 06:08, Strahil Nikolov wrote: Hi Aravinda, Based on the output it’s a ‘replica 3 arbiter 1’ type. Gilberto, What’s the latency between the nodes? Best Regards, Strahil Nikolov On Wednesday, December 13, 2023, 7:36 AM, Aravinda wrote: Only Replica 2 or Distributed Gluster volumes can be created with two servers. High chance of split brain with Replica 2 compared to Replica 3 volume.
For NFS Ganesha, no issue exporting the volume even if only one server is available. Run NFS Ganesha servers in Gluster server nodes and NFS clients from the network can connect to any NFS Ganesha server. You can use Haproxy + Keepalived (or any other load balancer) if high availability required for the NFS Ganesha connections (Ex: If a server node goes down, then nfs client can connect to other NFS ganesha server node). --Aravinda Kadalu Technologies On Wed, 13 Dec 2023 01:42:11 +0530 Gilberto Ferreira wrote --- Ah that's nice.Somebody knows this can be achieved with two servers? --- Gilberto Nunes Ferreira (47) 99676-7530 - Whatsapp / Telegram Em ter., 12 de dez. de 2023 às 17:08, Danny escreveu: Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users Wow, HUGE improvement with NFS-Ganesha! sudo dnf -y install glusterfs-ganesha sudo vim /etc/ganesha/ganesha.conf NFS_CORE_PARAM { mount_path_pseudo = true; Protocols = 3,4; } EXPORT_DEFAULTS { Access_Type = RW; } LOG { Default_Log_Level = WARN; } EXPORT{ Export_Id = 1 ; # Export ID unique to each export Path = "/data"; # Path of the volume to be exported FSAL { name = GLUSTER; hostname = "localhost"; # IP of one of the nodes in the trusted pool volume = "data"; # Volume name. 
Eg: "test_volume" } Access_type = RW; # Access permissions Squash = No_root_squash; # To enable/disable root squashing Disable_ACL = TRUE; # To enable/disable ACL Pseudo = "/data"; # NFSv4 pseudo path for this export Protocols = "3","4" ; # NFS protocols supported Transports = "UDP","TCP" ; # Transport protocols supported SecType = "sys"; # Security flavors supported } sudo systemctl enable --now nfs-ganesha sudo vim /etc/fstab localhost:/data /data nfs defaults,_netdev 0 0 sudo systemctl daemon-reload sudo mount -a fio --name=test --filename=/data/wow --size=1G --readwrite=write Run status group 0 (all jobs): WRITE: bw=2246MiB/s (2355MB/s), 2246MiB/s-2246MiB/s (2355MB/s-2355MB/s), io=1024MiB (1074MB), run=456-456msec Yeah 2355MB/s is much better than the original 115MB/s So in the end, I guess FUSE isn't the best choice. On Tue, Dec 12, 2023 at 3:00 PM Gilberto Ferreira wrote: Fuse there some overhead.Take a look at libgfapi: https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Features/libgfapi/ I know this doc somehow is out of date, but could be a hint --- Gilberto Nunes Ferreira (47) 99676-7530 - Whatsapp / Telegram Em ter., 12 de dez. de 2023 às 1
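One sanity check worth keeping in mind with the benchmark numbers in this thread: a 10 Gbps NIC tops out around 1.25 GB/s (decimal units, before protocol overhead), so a result well above that - such as the 2355 MB/s measured here against a localhost NFS mount - is being served by the local brick and page cache rather than crossing the network. The arithmetic:

```shell
# Line-rate ceiling of a NIC in MB/s: gigabits/s * 1000 / 8, decimal units.
NIC_GBPS=10
MAX_MBPS=$((NIC_GBPS * 1000 / 8))   # 1250 MB/s
echo "$NIC_GBPS Gbps ~= $MAX_MBPS MB/s max"
```

To measure what remote clients will actually see, run the same fio test from a machine that mounts the volume over the 10 Gbps link instead of localhost.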
Re: [Gluster-users] Gluster Performance - 12 Gbps SSDs and 10 Gbps NIC
Hi Aravinda,

Based on the output it's a 'replica 3 arbiter 1' type.

Gilberto, what's the latency between the nodes?

Best Regards,
Strahil Nikolov

On Wednesday, December 13, 2023, 7:36 AM, Aravinda wrote:

Only Replica 2 or Distributed Gluster volumes can be created with two servers. There is a higher chance of split brain with Replica 2 compared to a Replica 3 volume. For NFS Ganesha, there is no issue exporting the volume even if only one server is available. Run the NFS Ganesha servers on the Gluster server nodes, and NFS clients from the network can connect to any NFS Ganesha server. You can use HAProxy + Keepalived (or any other load balancer) if high availability is required for the NFS Ganesha connections (e.g. if a server node goes down, the NFS client can connect to another NFS Ganesha server node).

--
Aravinda
Kadalu Technologies

On Wed, 13 Dec 2023 01:42:11 +0530 Gilberto Ferreira wrote:

Ah, that's nice. Does somebody know whether this can be achieved with two servers?
---
Gilberto Nunes Ferreira
(47) 99676-7530 - WhatsApp / Telegram

On Tue, Dec 12, 2023 at 17:08, Danny wrote:

Community Meeting Calendar:
Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Wow, HUGE improvement with NFS-Ganesha!

sudo dnf -y install glusterfs-ganesha
sudo vim /etc/ganesha/ganesha.conf

NFS_CORE_PARAM {
    mount_path_pseudo = true;
    Protocols = 3,4;
}
EXPORT_DEFAULTS {
    Access_Type = RW;
}
LOG {
    Default_Log_Level = WARN;
}
EXPORT {
    Export_Id = 1;             # Export ID unique to each export
    Path = "/data";            # Path of the volume to be exported
    FSAL {
        name = GLUSTER;
        hostname = "localhost";  # IP of one of the nodes in the trusted pool
        volume = "data";         # Volume name. Eg: "test_volume"
    }
    Access_type = RW;          # Access permissions
    Squash = No_root_squash;   # To enable/disable root squashing
    Disable_ACL = TRUE;        # To enable/disable ACL
    Pseudo = "/data";          # NFSv4 pseudo path for this export
    Protocols = "3","4";       # NFS protocols supported
    Transports = "UDP","TCP";  # Transport protocols supported
    SecType = "sys";           # Security flavors supported
}

sudo systemctl enable --now nfs-ganesha
sudo vim /etc/fstab

localhost:/data /data nfs defaults,_netdev 0 0

sudo systemctl daemon-reload
sudo mount -a
fio --name=test --filename=/data/wow --size=1G --readwrite=write

Run status group 0 (all jobs):
WRITE: bw=2246MiB/s (2355MB/s), 2246MiB/s-2246MiB/s (2355MB/s-2355MB/s), io=1024MiB (1074MB), run=456-456msec

Yeah, 2355MB/s is much better than the original 115MB/s. So in the end, I guess FUSE isn't the best choice.

On Tue, Dec 12, 2023 at 3:00 PM Gilberto Ferreira wrote:

FUSE has some overhead. Take a look at libgfapi: https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Features/libgfapi/ I know this doc is somewhat out of date, but it could be a hint.
---
Gilberto Nunes Ferreira
(47) 99676-7530 - WhatsApp / Telegram

On Tue, Dec 12, 2023 at 16:29, Danny wrote:

Nope, not a caching thing. I've tried multiple different types of fio tests; all produce the same results: Gbps when hitting the disks locally, slow MB/s when hitting the Gluster FUSE mount. I've been reading up on gluster-ganesha, and will give that a try.

On Tue, Dec 12, 2023 at 1:58 PM Ramon Selga wrote:

Dismiss my first question: you have SAS 12Gbps SSDs. Sorry!

On 12/12/23 at 19:52, Ramon Selga wrote:

May I ask which kind of disks you have in this setup? Rotational, SSD SAS/SATA, NVMe? Is there a RAID controller with writeback caching? It seems to me your fio test on the local brick has an unclear result due to some caching.
Try something like this (consider increasing the test file size depending on your caching memory):

fio --size=16G --name=test --filename=/gluster/data/brick/wow --bs=1M --nrfiles=1 --direct=1 --sync=0 --randrepeat=0 --rw=write --refill_buffers --end_fsync=1 --iodepth=200 --ioengine=libaio

Also remember that a replica 3 arbiter 1 volume writes synchronously to two data bricks, halving the throughput of your network backend. Try a similar fio on the gluster mount, but I hardly ever see more than 300MB/s writing sequentially on a single FUSE mount, even with an NVMe backend. On the other hand, with 4 to 6 clients you can easily reach 1.5GB/s of aggregate throughput.

To start, I think it is better to try with the default parameters for your replica volume.

Best regards!
Ramon

On 12/12/23 at 19:10, Danny wrote:

Sorry, I noticed that too after I posted, so I instantly upgraded to 10. Issue remains. On Tue, Dec 12, 2023
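Ramon's point that a replica 3 arbiter 1 volume writes synchronously to two data bricks sets a hard ceiling on what a single client can see through a FUSE mount. A back-of-the-envelope sketch (illustrative numbers, not measurements; the helper name is invented for this example):

```python
# On a replica 3 arbiter 1 volume the client sends every byte to both data
# bricks over its own NIC (the arbiter receives metadata only), so the
# usable write bandwidth is roughly the NIC bandwidth divided by two.

def replica_write_ceiling_mb_s(nic_gbps: float, data_copies: int) -> float:
    """Rough sequential-write ceiling in decimal MB/s for one client."""
    nic_mb_s = nic_gbps * 1000 / 8
    return nic_mb_s / data_copies

print(replica_write_ceiling_mb_s(10, 2))  # 625.0 MB/s on a 10 Gbps NIC
```

So even before FUSE overhead, the ~2182 MB/s local-brick figure was never reachable over a 10 Gbps network; roughly 625 MB/s is the realistic upper bound for a single writer.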
Re: [Gluster-users] Gluster Performance - 12 Gbps SSDs and 10 Gbps NIC
Hi,

Let's try the simple things first. Check whether you can use MTU 9000 and, if possible, set it on the bond slaves and the bond devices:

ping GLUSTER_PEER -c 10 -M do -s 8972

Then try to follow the recommendations from https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.5/html/administration_guide/chap-configuring_red_hat_storage_for_enhancing_performance

Best Regards,
Strahil Nikolov

On Monday, December 11, 2023, 3:32 PM, Danny wrote:

Hello list, I'm hoping someone can let me know what setting I missed.

Hardware: Dell R650 servers, dual 24-core Xeon 2.8 GHz, 1 TB RAM
8x SSDs, negotiated speed 12 Gbps
PERC H755 controller - RAID 6
Created a virtual "data" disk from the above 8 SSD drives, for a ~20 TB /dev/sdb

OS: CentOS Stream
kernel-4.18.0-526.el8.x86_64
glusterfs-7.9-1.el8.x86_64

iperf test between nodes:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 11.5 GBytes 9.90 Gbits/sec 0 sender
[ 5] 0.00-10.04 sec 11.5 GBytes 9.86 Gbits/sec receiver

All good there. ~10 Gbps, as expected.
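The -s 8972 payload in the ping test above is not arbitrary: it is the largest ICMP payload that fits in a 9000-byte frame without fragmentation (-M do forbids fragmenting). A quick sketch of the arithmetic:

```python
# An unfragmented IPv4 ping carries the ICMP payload plus a 20-byte IPv4
# header and an 8-byte ICMP header, so the payload for a given MTU is:
IPV4_HEADER = 20
ICMP_HEADER = 8

def ping_payload_for_mtu(mtu: int) -> int:
    return mtu - IPV4_HEADER - ICMP_HEADER

print(ping_payload_for_mtu(9000))  # 8972 -> matches "-s 8972"
print(ping_payload_for_mtu(1500))  # 1472 -> same test for a standard MTU
```

If the 8972-byte ping fails while 1472 works, jumbo frames are not enabled end to end (check the switches as well as the bond members).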
LVM install:

export DISK="/dev/sdb"
sudo parted --script $DISK "mklabel gpt"
sudo parted --script $DISK "mkpart primary 0% 100%"
sudo parted --script $DISK "set 1 lvm on"
sudo pvcreate --dataalignment 128K /dev/sdb1
sudo vgcreate --physicalextentsize 128K gfs_vg /dev/sdb1
sudo lvcreate -L 16G -n gfs_pool_meta gfs_vg
sudo lvcreate -l 95%FREE -n gfs_pool gfs_vg
sudo lvconvert --chunksize 1280K --thinpool gfs_vg/gfs_pool --poolmetadata gfs_vg/gfs_pool_meta
sudo lvchange --zero n gfs_vg/gfs_pool
sudo lvcreate -V 19.5TiB --thinpool gfs_vg/gfs_pool -n gfs_lv
sudo mkfs.xfs -f -i size=512 -n size=8192 -d su=128k,sw=10 /dev/mapper/gfs_vg-gfs_lv

sudo vim /etc/fstab

/dev/mapper/gfs_vg-gfs_lv /gluster/data/brick xfs rw,inode64,noatime,nouuid 0 0

sudo systemctl daemon-reload && sudo mount -a
fio --name=test --filename=/gluster/data/brick/wow --size=1G --readwrite=write

Run status group 0 (all jobs):
WRITE: bw=2081MiB/s (2182MB/s), 2081MiB/s-2081MiB/s (2182MB/s-2182MB/s), io=1024MiB (1074MB), run=492-492msec

All good there. 2182MB/s =~ 17.5 Gbps. Nice!
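The --chunksize 1280K in the lvconvert step lines up with the XFS stripe geometry (su=128k,sw=10): making the thin-pool chunk equal to one full RAID stripe avoids read-modify-write cycles on the parity set. The arithmetic, as a sketch using the values from the commands above:

```python
# A full RAID stripe is stripe_unit * stripe_width; aligning the thin-pool
# chunk to it keeps thin allocations stripe-aligned on the RAID 6 set.
STRIPE_UNIT_K = 128   # su=128k, matching --dataalignment 128K
STRIPE_WIDTH = 10     # sw=10, as configured in the mkfs.xfs line

full_stripe_k = STRIPE_UNIT_K * STRIPE_WIDTH
print(f"--chunksize {full_stripe_k}K")  # 1280K, matching the lvconvert flag
```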
Gluster install:

export NODE1='10.54.95.123'
export NODE2='10.54.95.124'
export NODE3='10.54.95.125'
sudo gluster peer probe $NODE2
sudo gluster peer probe $NODE3
sudo gluster volume create data replica 3 arbiter 1 $NODE1:/gluster/data/brick $NODE2:/gluster/data/brick $NODE3:/gluster/data/brick force
sudo gluster volume set data network.ping-timeout 5
sudo gluster volume set data performance.client-io-threads on
sudo gluster volume set data group metadata-cache
sudo gluster volume start data
sudo gluster volume info all

Volume Name: data
Type: Replicate
Volume ID: b52b5212-82c8-4b1a-8db3-52468bc0226e
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 10.54.95.123:/gluster/data/brick
Brick2: 10.54.95.124:/gluster/data/brick
Brick3: 10.54.95.125:/gluster/data/brick (arbiter)
Options Reconfigured:
network.inode-lru-limit: 20
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
network.ping-timeout: 5
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: on

sudo vim /etc/fstab

localhost:/data /data glusterfs defaults,_netdev 0 0

sudo systemctl daemon-reload && sudo mount -a
fio --name=test --filename=/data/wow --size=1G --readwrite=write

Run status group 0 (all jobs):
WRITE: bw=109MiB/s (115MB/s), 109MiB/s-109MiB/s (115MB/s-115MB/s), io=1024MiB (1074MB), run=9366-9366msec

Oh no, what's wrong? From 2182MB/s down to only 115MB/s? What am I missing? I'm not expecting the above ~17 Gbps, but I'm thinking it should at least be close(r) to ~10 Gbps. Any suggestions?
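As a quick sanity check on the reported numbers (nothing Gluster-specific here, just bytes over time):

```python
# fio's bandwidth figure is bytes moved divided by elapsed time; the FUSE
# run above moved 1024 MiB in about 9.366 s.
size_mib = 1024
runtime_s = 9.366
bw_mib_s = size_mib / runtime_s
print(f"{bw_mib_s:.0f} MiB/s")  # 109 MiB/s, matching fio's report
```

Note also that a 1 GiB test file is small enough to be skewed by caching on the local-brick run; replies elsewhere in this thread suggest repeating with --direct=1 and a much larger file.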
Community Meeting Calendar:
Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Announcing Gluster release 11.1
Great news!

Best Regards,
Strahil Nikolov

On Fri, Nov 24, 2023 at 3:32, Shwetha Acharya wrote:

The Gluster community is pleased to announce the release of Gluster 11.1. Packages are available at [1]. Release notes for the release can be found at [2].

Highlights of Release:
- Fix upgrade issue by reverting posix change related to storage.reserve value
- Fix possible data loss during rebalance if there is any link file on the system
- Fix maximum op-version for release 11

Thanks,
Shwetha

References:
[1] Packages for 11.1: https://download.gluster.org/pub/gluster/glusterfs/11/11.1/
[2] Release notes for 11.1: https://docs.gluster.org/en/latest/release-notes/11.1/

Community Meeting Calendar:
Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Verify limit-objects from clients in Gluster9 ?
What do you mean by dir? Usually the inode max value is per file system.

Best Regards,
Strahil Nikolov

On Mon, Nov 6, 2023 at 12:58, difa.csi wrote:

Hello all. Is there a way to check the inode limit from clients? "df -i /path/to/dir" seems to report values for the whole volume, not just the dir. For space it works as expected:

# gluster v quota cluster_data list
Path    Hard-limit  Soft-limit   Used    Available  Soft-limit exceeded?  Hard-limit exceeded?
---
/astro  20.0TB      80%(16.0TB)  18.8TB  1.2TB      Yes                   No

# df /mnt/scratch/astro
Filesystem              1K-blocks    Used         Available   Use%  Mounted on
clustor00:cluster_data  21474836480  20169918036  1304918444  94%   /mnt/scratch

For inodes, instead:

# gluster v quota cluster_data list-objects
Path    Hard-limit  Soft-limit  Files  Dirs  Available  Soft-limit exceeded?  Hard-limit exceeded?
---
/astro  10          80%(8)      99897  103   0          Yes                   Yes

# df -i /mnt/scratch/astro
Filesystem              Inodes      IUsed   IFree       IUse%  Mounted on
clustor00:cluster_data  4687500480  122689  4687377791  1%     /mnt/scratch

It should report 100% use for "hard quota exceeded", IMO. That's on Gluster 9.6.
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786

Community Meeting Calendar:
Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
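On the arithmetic behind Diego's complaint: df -i reports the volume-wide inode counters, while the object quota for /astro is already fully consumed. A sketch of what a quota-aware view would compute (the configured hard limit is truncated in the listing above, so the limit here is implied from the Available column):

```python
# Reconstruct the object-quota usage for /astro from the list-objects row:
# 99897 files + 103 dirs with 0 available means the quota is exhausted,
# so a quota-aware "df -i" for that directory should show 100%, not 1%.
files, dirs, available = 99897, 103, 0
used = files + dirs
implied_limit = used + available
print(f"{used}/{implied_limit} objects = {100 * used // implied_limit}% used")
```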
Re: [Gluster-users] strange link files
Hi,

Usually the servers have backward compatibility controlled by the op-version. Most probably it's a bug or an incompatibility between clients and servers. What happens when you downgrade the client?

Best Regards,
Strahil Nikolov

On Mon, Nov 6, 2023 at 12:20, Stefan Solbrig wrote:

Dear all, I recently upgraded my clients to 10.4, while I left the servers (distributed only) on glusterfs 9. I'm seeing a strange effect when I do a "mv filename1 filename2": filename2 is duplicated, one copy with zero size and the sticky bit set. In general, I know that glusterfs creates link files (size zero, sticky bit set) when the new filename hashes to a different brick. However, these link files are normally only visible to glusterfsd, not to the users. I also noticed that the "link files" now have an additional attribute (glusterfs.mdata), whereas previously link files stored the "real" brick name in "glusterfs.dht.linkto". Therefore, I have two questions: 1) Is this a bug, or is it intentional that glusterfs 10 clients don't work with glusterfs 9 servers? 2) If it's not a bug, how does the attribute "glusterfs.mdata" store the "real" brick name?

best wishes,
Stefan

Community Meeting Calendar:
Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
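The rename behaviour Stefan describes comes from DHT placing files by a hash of the file name. A toy model of the mechanism (Gluster really uses a 32-bit Davies-Meyer hash mapped onto per-brick hash ranges, not md5 modulo the brick count, and the brick names here are invented):

```python
import hashlib

BRICKS = ["brick-0", "brick-1", "brick-2"]

def brick_for(name: str) -> str:
    """Toy stand-in for DHT's name-hash placement."""
    h = int(hashlib.md5(name.encode()).hexdigest(), 16)
    return BRICKS[h % len(BRICKS)]

# After "mv filename1 filename2" the data stays on the brick chosen for
# the old name; if the new name hashes elsewhere, a zero-byte sticky-bit
# "linkto" entry appears on the new name's brick, pointing back at the data.
for name in ("filename1", "filename2"):
    print(name, "->", brick_for(name))
```

Normally only the server side resolves these pointer entries transparently, so clients seeing them at all points to a client/server mismatch, as suggested above.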
Re: [Gluster-users] State of the gluster project
Well, after the IBM acquisition, RH discontinued their support in many projects, including GlusterFS (certification exams were removed, the paid product went EOL, etc). The only way to get it back on track is a sponsor company that has the capability to drive it. Kadalu is relying on GlusterFS, but they are not as big as Red Hat and, based on one of the previous e-mails, they will need sponsorship to dedicate resources.

Best Regards,
Strahil Nikolov

On Saturday, October 28, 2023, 9:57 AM, Marcus Pedersén wrote:

Hi all, I just have a general thought about the gluster project. I have got the feeling that things have slowed down in the gluster project. I have had a look at github and to me the project seems to be slowing down: for gluster version 11 there have been no minor releases; we are still on 11.0 and I have not found any references to 11.1. There is a milestone called 12, but it seems to be stale. I have hit the issue https://github.com/gluster/glusterfs/issues/4085 which seems to have no solution. I noticed when version 11 was released that you could not bump the op-version to 11 and reported this, but this is still not available. I am just wondering if I am missing something here? We have been using gluster for many years in production and I think that gluster is great!! It has served us well over the years and we have seen some great improvements in stability and speed. So is there something going on, or have I got the wrong impression (and feeling)?

Best regards
Marcus
---
E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here <https://www.slu.se/en/about-slu/contact-slu/personal-data/>

Community Meeting Calendar:
Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Replace faulty host
Hi Markus,

It looks quite well documented, but please use https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.5/html/administration_guide/sect-replacing_hosts as 3.5 is the latest version of RHGS. If the OS disks are failing, I would have tried moving the data disks to the new machine and transferring the gluster files in /etc and /var/lib to the new node. Any reason to reuse the FQDN? For me it was always much simpler to remove the brick, remove the node from the TSP, add the new node, and then add the brick and trigger a full heal.

Best Regards,
Strahil Nikolov

On Wednesday, October 25, 2023, 1:30 PM, Marcus Pedersén wrote:

Hi all, I have a problem with one of our gluster clusters. This is the setup:

Volume Name: gds-common
Type: Distributed-Replicate
Volume ID: 42c9fa00-2d57-4a58-b5ae-c98c349cfcb6
Status: Started
Snapshot Count: 26
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: urd-gds-031:/urd-gds/gds-common
Brick2: urd-gds-032:/urd-gds/gds-common
Brick3: urd-gds-030:/urd-gds/gds-common (arbiter)
Options Reconfigured:
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
performance.client-io-threads: off
features.barrier: disable

The arbiter node has a faulty root disk, but it is still up and glusterd is still running. I have a spare server equal to the arbiter node, so my plan is to replace the arbiter host; then I can calmly reinstall the OS and fix the rest of the configuration on the faulty host so it can be used in another cluster. I want to use the same hostname on the new host. What are the correct commands and the correct way to replace the arbiter node? I searched online and found this: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3/html/administration_guide/sect-replacing_hosts Can I use this guide to replace the host? Please give me advice on this. Many thanks in advance!!

Best regards
Marcus
---
E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here <https://www.slu.se/en/about-slu/contact-slu/personal-data/>

Community Meeting Calendar:
Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] orphaned snapshots
As I mentioned, I've never had such a case. Give it a try in a test environment and, if it works, go ahead.

Best Regards,
Strahil Nikolov

On Wednesday, August 16, 2023, 1:21 PM, Sebastian Neustein wrote:

Strahil Nikolov: "I've never had such situation and I don't recall someone sharing something similar."

That's strange; it is really easy to reproduce. This is from a fresh test environment.

Summary:
- There is one snapshot present.
- On one node glusterd is stopped.
- While it is stopped, the snapshot is deleted.
- The node is brought up again.
- On that node there is now an orphaned snapshot.

Detailed version:

# on node 1:
root@gl1:~# cat /etc/debian_version
11.7
root@gl1:~# gluster --version
glusterfs 10.4
root@gl1:~# gluster volume info
Volume Name: glvol_samba
Type: Replicate
Volume ID: 91cb059e-10e4-4439-92ea-001065652749
Status: Started
Snapshot Count: 1
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gl1:/data/glusterfs/glvol_samba/brick0/brick
Brick2: gl2:/data/glusterfs/glvol_samba/brick0/brick
Brick3: gl3:/data/glusterfs/glvol_samba/brick0/brick
Options Reconfigured:
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
features.barrier: disable
root@gl1:~# gluster snapshot list
snaps_GMT-2023.08.15-13.05.28

# on node 3:
root@gl3:~# systemctl stop glusterd.service

# on node 1:
root@gl1:~# gluster snapshot deactivate snaps_GMT-2023.08.15-13.05.28
Deactivating snap will make its data inaccessible. Do you want to continue? (y/n) y
Snapshot deactivate: snaps_GMT-2023.08.15-13.05.28: Snap deactivated successfully
root@gl1:~# gluster snapshot delete snaps_GMT-2023.08.15-13.05.28
Deleting snap will erase all the information about the snap. Do you still want to continue? (y/n) y
snapshot delete: snaps_GMT-2023.08.15-13.05.28: snap removed successfully
root@gl1:~# gluster snapshot list
No snapshots present

# on node 3:
root@gl3:~# systemctl start glusterd.service
root@gl3:~# gluster snapshot list
snaps_GMT-2023.08.15-13.05.28
root@gl3:~# gluster snapshot deactivate snaps_GMT-2023.08.15-13.05.28
Deactivating snap will make its data inaccessible. Do you want to continue? (y/n) y
snapshot deactivate: failed: Pre Validation failed on gl1.ad.arc.de. Snapshot (snaps_GMT-2023.08.15-13.05.28) does not exist. Pre Validation failed on gl2. Snapshot (snaps_GMT-2023.08.15-13.05.28) does not exist. Snapshot command failed
root@gl3:~# lvs -a
LV                                 VG        Attr       LSize  Pool      Origin    Data%  Meta%  Move  Log  Cpy%Sync  Convert
669cbc14fa7542acafb2995666284583_0 vg_brick0 Vwi-aotz-- 15,00g tp_brick0 lv_brick0 0,08
lv_brick0                          vg_brick0 Vwi-aotz-- 15,00g tp_brick0           0,08
[lvol0_pmspare]                    vg_brick0 ewi---     20,00m
tp_brick0                          vg_brick0 twi-aotz-- 18,00g                     0,12   10,57
[tp_brick0_tdata]                  vg_brick0 Twi-ao     18,00g
[tp_brick0_tmeta]                  vg_brick0 ewi-ao     20,00m

Would it be dangerous to just delete the following items on node 3 while gluster is down:
- the orphaned directories in /var/lib/glusterd/snaps/
- the orphaned LV, here 669cbc14fa7542acafb2995666284583_0
Or is there a self-heal command?

Regards
Sebastian

On 10.08.2023 at 20:33, Strahil Nikolov wrote:

I've never had such a situation and I don't recall someone sharing something similar. Most probably it's easier to remove the node from the TSP and re-add it. Of course, test the case in VMs just to validate that it's possible to add a node to a cluster with snapshots. I have a vague feeling that you will need to delete all snapshots.

Best Regards,
Strahil Nikolov

On Thursday, August 10, 2023, 4:36 AM, Sebastian Neustein wrote:

Hi, due to an outage of one node, after bringing it up again the node has some orphaned snapshots, which are already deleted on the other nodes. How can I delete these orphaned snapshots?
Trying the normal way produces these errors:

[2023-08-08 19:34:03.667109 +] E [MSGID: 106115] [glusterd-mgmt.c:118:gd_mgmt_v3_collate_errors] 0-management: Pre Validation failed on B742. Please check log file for details.
[2023-08-08 19:34:03.667184 +] E [MSGID: 106115] [glusterd-mgmt.c:118:gd_mgmt_v3_collate_errors] 0-management: Pre Validation failed on B741. Please check log file for details.
[2023-08-08 19:34:03.667210 +] E [MSGID: 106121] [glusterd-mgmt.c:1083:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed on peers
[2023-08-08 19:34:03.667236 +] E [MSGID: 106121] [glusterd-mgmt.c:2875:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Pre Validation Failed

Even worse: I followed the Red Hat gluster snapshot trouble guide and deleted one of those
Re: [Gluster-users] Gluster client version vs gluster server
Hi,

In Gluster the servers can run a newer version in a backward-compatibility mode, a.k.a. op-version. Check this article and ensure that the client op-version is not smaller than the cluster one: https://docs.gluster.org/en/v3/Upgrade-Guide/op_version/ In the best scenario, just download the packages from Gluster's repo and ensure all clients and servers have the same version. Also, you can build your own RPMs by following https://docs.gluster.org/en/main/Developer-guide/Building-GlusterFS/ if you don't want the precompiled binaries: https://download.gluster.org/pub/gluster/glusterfs/LATEST/

Best Regards,
Strahil Nikolov

On Monday, August 14, 2023, 8:31 PM, Roy Sigurd Karlsbakk wrote:

Hi all,

I have a RHEL machine with gluster 7.9 installed, which is the one from EPEL. Also, I have a set of Debian machines running a glusterfs server cluster with version 9.3. Is it likely to work well with this combination, or should everything be the same version? That might be a bit hard across distros. Also, RHEL just sells gluster: since it's such a nice feature, they find it hard not to charge us USD 4500 per year per node for it, plus the price difference between an edu license and a full license, per node. Well, we can probably use that money for something else, but we're not quite ready to leave RHEL yet (not my fault). So, would these different versions be compatible, or what would the potential problems be of mixing them as described?

roy

Community Meeting Calendar:
Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
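The op-version comparison can be sketched numerically. The encoding below follows the usual Gluster scheme (major*10000 + minor*100 + patch, so 3.7.0 -> 30700 and 9.0 -> 90000); treat it as an approximation and verify the real values with `gluster volume get all cluster.op-version` on the cluster:

```python
def op_version(release: str) -> int:
    """Approximate op-version encoding for a Gluster release string."""
    parts = [int(p) for p in release.split(".")]
    major = parts[0]
    minor = parts[1] if len(parts) > 1 else 0
    patch = parts[2] if len(parts) > 2 else 0
    return major * 10000 + minor * 100 + patch

client_max = op_version("7.9")  # the EPEL client in the question
servers = op_version("9.3")     # the Debian servers
print(client_max, servers)      # 70900 90300
# A 7.9 client cannot speak an op-version above its own maximum, so the
# cluster.op-version must stay at a level the client understands - or,
# as suggested above, all machines should simply run the same release.
```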
Re: [Gluster-users] Rebuilding a failed cluster
If you preserved the gluster structure in /etc and /var/lib, you should be able to run the cluster again. First install the same gluster version on all nodes, then overwrite the structure in /etc and in /var/lib. Once you mount the bricks, start glusterd and check the situation.

The other option is to set up a new cluster and volume, then mount the volume via FUSE and copy the data from one of the bricks.

Best Regards,
Strahil Nikolov

On Saturday, August 12, 2023, 7:46 AM, Richard Betel wrote:

I had a small cluster with a disperse 3 volume. Two nodes had hardware failures and no longer boot, and I don't have replacement hardware for them (it's an old board called a PC-duino). However, I do have their intact root filesystems and the disks the bricks are on, so I need to rebuild the cluster on all-new host hardware. Does anyone have any suggestions on how to go about doing this? I've built 3 VMs to be a new test cluster, but if I copy over a file from the 3 nodes and try to read it, I can't, and I get errors in /var/log/glusterfs/foo.log:

[2023-08-12 03:50:47.638134 +] W [MSGID: 114031] [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-gv-client-0: remote operation failed. [{path=/helmetpart.scad}, {gfid=----}, {errno=61}, {error=No data available}]
[2023-08-12 03:50:49.834859 +] E [MSGID: 122066] [ec-common.c:1301:ec_prepare_update_cbk] 0-gv-disperse-0: Unable to get config xattr. FOP : 'FXATTROP' failed on gfid 076a511d-3721-4231-ba3b-5c4cbdbd7f5d. Parent FOP: READ [No data available]
[2023-08-12 03:50:49.834930 +] W [fuse-bridge.c:2994:fuse_readv_cbk] 0-glusterfs-fuse: 39: READ => -1 gfid=076a511d-3721-4231-ba3b-5c4cbdbd7f5d fd=0x7fbc9c001a98 (No data available)

So obviously I need to copy over more stuff from the original cluster. If I force the 3 nodes and the volume to have the same UUIDs, will that be enough?
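On forcing the nodes "to have the same uuids": each node's identity lives in /var/lib/glusterd/glusterd.info, a small key=value file, and restoring the saved /var/lib/glusterd trees carries those UUIDs over. A sketch of parsing that format (the sample values are invented, and exact keys may vary between releases):

```python
# Parse glusterd.info-style key=value content to recover the node UUID.
# Sample content only - not read from a real system.
sample = """UUID=6b42b368-c442-4cf9-a9a2-9a3b1f2e8a01
operating-version=90000"""

info = dict(line.split("=", 1) for line in sample.splitlines())
print(info["UUID"])              # the identity the other peers expect
print(info["operating-version"])
```

The peer definitions under /var/lib/glusterd/peers/ reference these UUIDs, so glusterd.info and the peers/ files need to be restored as a consistent set across all three replacement hosts.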
Community Meeting Calendar:
Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] orphaned snapshots
I've never had such a situation and I don't recall someone sharing something similar. Most probably it's easier to remove the node from the TSP and re-add it. Of course, test the case in VMs just to validate that it's possible to add a node to a cluster with snapshots. I have a vague feeling that you will need to delete all snapshots.

Best Regards,
Strahil Nikolov

On Thursday, August 10, 2023, 4:36 AM, Sebastian Neustein wrote:

Hi, due to an outage of one node, after bringing it up again the node has some orphaned snapshots, which are already deleted on the other nodes. How can I delete these orphaned snapshots? Trying the normal way produces these errors:

[2023-08-08 19:34:03.667109 +] E [MSGID: 106115] [glusterd-mgmt.c:118:gd_mgmt_v3_collate_errors] 0-management: Pre Validation failed on B742. Please check log file for details.
[2023-08-08 19:34:03.667184 +] E [MSGID: 106115] [glusterd-mgmt.c:118:gd_mgmt_v3_collate_errors] 0-management: Pre Validation failed on B741. Please check log file for details.
[2023-08-08 19:34:03.667210 +] E [MSGID: 106121] [glusterd-mgmt.c:1083:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed on peers
[2023-08-08 19:34:03.667236 +] E [MSGID: 106121] [glusterd-mgmt.c:2875:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Pre Validation Failed

Even worse: I followed the Red Hat gluster snapshot trouble guide and deleted one of those directories defining a snapshot.
Now I receive this on the cli:

run-gluster-snaps-e4dcd4166538414c849fa91b0b3934d7-brick6-brick[297342]: [2023-08-09 08:59:41.107243 +] M [MSGID: 113075] [posix-helpers.c:2161:posix_health_check_thread_proc] 0-e4dcd4166538414c849fa91b0b3934d7-posix: health-check failed, going down
run-gluster-snaps-e4dcd4166538414c849fa91b0b3934d7-brick6-brick[297342]: [2023-08-09 08:59:41.107243 +] M [MSGID: 113075] [posix-helpers.c:2161:posix_health_check_thread_proc] 0-e4dcd4166538414c849fa91b0b3934d7-posix: health-check failed, going down
run-gluster-snaps-e4dcd4166538414c849fa91b0b3934d7-brick6-brick[297342]: [2023-08-09 08:59:41.107292 +] M [MSGID: 113075] [posix-helpers.c:2179:posix_health_check_thread_proc] 0-e4dcd4166538414c849fa91b0b3934d7-posix: still alive! -> SIGTERM
run-gluster-snaps-e4dcd4166538414c849fa91b0b3934d7-brick6-brick[297342]: [2023-08-09 08:59:41.107292 +] M [MSGID: 113075] [posix-helpers.c:2179:posix_health_check_thread_proc] 0-e4dcd4166538414c849fa91b0b3934d7-posix: still alive! -> SIGTERM

What are my options?
- Is there an easy way to remove all those snapshots?
- Or would it be easier to remove and rejoin the node to the gluster cluster?

Thank you for any help!
Seb
--
Sebastian Neustein
Airport Research Center GmbH
Bismarckstraße 61
52066 Aachen
Germany
Phone: +49 241 16843-23
Fax: +49 241 16843-19
e-mail: sebastian.neust...@arc-aachen.de
Website: http://www.airport-consultants.com
Register Court: Amtsgericht Aachen HRB 7313
Ust-Id-No.: DE196450052
Managing Director: Dipl.-Ing.
Tom Alexander Heuer

Community Meeting Calendar:
Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] log file spewing on one node, but not the others
What is the uptime of the affected node? There is a similar error reported in https://access.redhat.com/solutions/5518661 which could indicate a possible problem in a memory area named 'lru'. Have you noticed any ECC errors in dmesg/IPMI of the system? At the least, I would reboot the node and run hardware diagnostics to check that everything is fine.

Best Regards,
Strahil Nikolov
Sent from Yahoo Mail for iPhone

On Tuesday, July 25, 2023, 4:31 AM, W Kern wrote:

We have an older 2+1 arbiter gluster cluster running 6.10 on Ubuntu 18 LTS. It has run beautifully for years, only occasionally needing attention as drives have died, etc. Each peer has two volumes, G1 and G2, with a shared 'gluster' network. Since July 1st, one of the peers for one volume has been spewing the following errors into the logfile /var-lib-G1.log. The volume (G2) is not showing this, nor are the other peer and the arbiter for the G1 volume. So it's one machine with one volume that has the problem. There have been NO issues with the volumes themselves; it is simply a matter of the logfiles generating GBs of entries every hour (which is how we noticed it, when we started running out of log space). According to google there are mentions of this error, but it was fixed in the 6.x series; I can find no other mentions. I have tried restarting glusterd with no change, and there don't seem to be any hardware issues. I am wondering if perhaps this is an XFS corruption issue, and whether, if I were to unmount the gluster, run xfs_repair and bring it back, that would solve the issue. Any other suggestions?
[2023-07-21 18:51:38.260507] W [inode.c:1638:inode_table_prune] (-->/usr/lib/x86_64-linux-gnu/glusterfs/6.10/xlator/features/shard.so(+0x21b47) [0x7fb261c13b47] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(inode_unref+0x36) [0x7fb26947f416] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x3337a) [0x7fb26947f37a] ) 0-GLB1image-shard: Empty inode lru list found but with (-2) lru_size
[2023-07-21 18:51:38.261231] W [inode.c:1638:inode_table_prune] (-->/usr/lib/x86_64-linux-gnu/glusterfs/6.10/xlator/mount/fuse.so(+0xba51) [0x7fb266cdca51] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(inode_unref+0x36) [0x7fb26947f416] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x3337a) [0x7fb26947f37a] ) 0-fuse: Empty inode lru list found but with (-2) lru_size
[2023-07-21 18:51:38.261377] W [inode.c:1638:inode_table_prune] (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(loc_wipe+0x12) [0x7fb26946bd72] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(inode_unref+0x36) [0x7fb26947f416] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x3337a) [0x7fb26947f37a] ) 0-GLB1image-shard: Empty inode lru list found but with (-2) lru_size
[2023-07-21 18:51:38.261806] W [inode.c:1638:inode_table_prune] (-->/usr/lib/x86_64-linux-gnu/glusterfs/6.10/xlator/cluster/replicate.so(+0x5ca57) [0x7fb26213ba57] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(inode_unref+0x36) [0x7fb26947f416] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x3337a) [0x7fb26947f37a] ) 0-GLB1image-replicate-0: Empty inode lru list found but with (-2) lru_size
[2023-07-21 18:51:38.261933] W [inode.c:1638:inode_table_prune] (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(fd_unref+0x1ef) [0x7fb269495eaf] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(inode_unref+0x36) [0x7fb26947f416] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x3337a) [0x7fb26947f37a] ) 0-GLB1image-client-1: Empty inode lru list found but with (-2) lru_size
[2023-07-21 18:51:38.262684] W [inode.c:1638:inode_table_prune]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/6.10/xlator/cluster/replicate.so(+0x5ca57) [0x7fb26213ba57] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(inode_unref+0x36) [0x7fb26947f416] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x3337a) [0x7fb26947f37a] ) 0-GLB1image-replicate-0: Empty inode lru list found but with (-2) lru_size

-wk
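All of the warnings above come from the same inode_table_prune() call site but surface through different translators (shard, fuse, replicate, client). Before digging further, it can help to tally which translator dominates. A minimal sketch, assuming a log under /var/log/glusterfs/ (the path and the sample lines below are illustrative):

```shell
# Tally "Empty inode lru list" warnings per translator in a glusterfs log.
# Usage: count_lru_warnings /var/log/glusterfs/var-lib-G1.log  (path illustrative)
count_lru_warnings() {
  grep 'Empty inode lru list' "$1" | awk '{
    # the translator name is the field that starts with "0-", e.g. "0-fuse:"
    for (i = 1; i <= NF; i++)
      if ($i ~ /^0-/) { sub(/:$/, "", $i); tally[$i]++ }
  } END { for (t in tally) print tally[t], t }' | sort -rn
}
```

Running it against a few GB of the spewed log shows at a glance whether one translator (for example the shard xlator) accounts for nearly all entries.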
Re: [Gluster-users] remove_me files building up
Thanks for the clarification. That behaviour is quite weird, as arbiter bricks should hold only metadata. What does the following show on host uk3-prod-gfs-arb-01:

du -h -x -d 1 /data/glusterfs/gv1/brick1/brick
du -h -x -d 1 /data/glusterfs/gv1/brick3/brick
du -h -x -d 1 /data/glusterfs/gv1/brick2/brick

If the shards are indeed taking space, that is a really strange situation. From which version did you upgrade, and which one is installed now? I assume all gluster TSP members (the servers) have the same version, but it's nice to double check. Does the archival job actually delete the original files after they are processed, or does the workload keep overwriting the existing files?

Best Regards,
Strahil Nikolov

Sent from Yahoo Mail for iPhone

On Tuesday, July 4, 2023, 6:50 PM, Liam Smith wrote:

Hi Strahil,

We're using gluster to act as a share for an application to temporarily process and store files, before they're then archived off over night. The issue we're seeing isn't with the inodes running out of space, but the actual disk space on the arb server running low.

This is the df -h output for the bricks on the arb server:
/dev/sdd1   15G   12G  3.3G  79% /data/glusterfs/gv1/brick3
/dev/sdc1   15G  2.8G   13G  19% /data/glusterfs/gv1/brick1
/dev/sde1   15G   14G  1.6G  90% /data/glusterfs/gv1/brick2

And this is the df -hi output for the bricks on the arb server:
/dev/sdd1  7.5M  2.7M  4.9M  35% /data/glusterfs/gv1/brick3
/dev/sdc1  7.5M  643K  6.9M   9% /data/glusterfs/gv1/brick1
/dev/sde1  6.1M  3.0M  3.1M  49% /data/glusterfs/gv1/brick2

So the inode usage appears to be fine, but we're seeing that the actual disk usage keeps increasing on the bricks despite it being the arbiter. The actual issue appears to be that files under /data/glusterfs/gv1/brick3/brick/.shard/.remove_me/ and /data/glusterfs/gv1/brick2/brick/.shard/.remove_me/ are being retained, even when the original files are deleted from the data nodes.
For reference, I've attached disk usage graphs for brick 3 over the past two weeks; one is a graph from a data node, the other from the arb. As you can see, the disk usage of the data node builds throughout the day, but then an archival job clears space down. However, on the arb, we see the disk space increasing in the same sort of trend, but it's never cleared down like on the data node. Hopefully this clarifies the issue; we're a bit confused as to why this is occurring and whether this is actually intended behaviour or potentially a bug, so any advice is greatly appreciated.

Thanks,
Liam Smith
Linux Systems Support Engineer, Scholar

From: Strahil Nikolov
Sent: 04 July 2023 15:51
To: Liam Smith; gluster-users@gluster.org
Subject: Re: [Gluster-users] remove_me files building up

Hi Liam,

I saw that your XFS uses 'imaxpct=25', which for an arbiter brick is a little bit low. If you have free space on the bricks, increase the maxpct to a bigger value, like: xfs_growfs -m 80 /path/to/brick. That will set aside 80% of the filesystem for inodes, which you can verify with df -i /brick/path (compare before and after). This way you won't run out of inodes in the future. Of course, always test that on non-prod first. Are you using the volume as a VM disk storage domain? What is your main workload?
Best Regards,
Strahil Nikolov

On Tuesday, July 4, 2023, 2:12 PM, Liam Smith wrote:

Hi,

Thanks for your response, please find the xfs_info for each brick on the arbiter below:

root@uk3-prod-gfs-arb-01:~# xfs_info /data/glusterfs/gv1/brick1
meta-data=/dev/sdc1              isize=512    agcount=31, agsize=131007 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=3931899, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

root@uk3-prod-gfs-arb-01:~# xfs_info /data/glusterfs/gv1/brick2
meta-data=/dev/sde1              isize=512    agcount=13, agsize=327616 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
Re: [Gluster-users] remove_me files building up
Hi Liam,

I saw that your XFS uses 'imaxpct=25', which for an arbiter brick is a little bit low. If you have free space on the bricks, increase the maxpct to a bigger value, like: xfs_growfs -m 80 /path/to/brick. That will set aside 80% of the filesystem for inodes, which you can verify with df -i /brick/path (compare before and after). This way you won't run out of inodes in the future. Of course, always test that on non-prod first. Are you using the volume as a VM disk storage domain? What is your main workload?

Best Regards,
Strahil Nikolov

On Tuesday, July 4, 2023, 2:12 PM, Liam Smith wrote:

Hi,

Thanks for your response, please find the xfs_info for each brick on the arbiter below:

root@uk3-prod-gfs-arb-01:~# xfs_info /data/glusterfs/gv1/brick1
meta-data=/dev/sdc1              isize=512    agcount=31, agsize=131007 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=3931899, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

root@uk3-prod-gfs-arb-01:~# xfs_info /data/glusterfs/gv1/brick2
meta-data=/dev/sde1              isize=512    agcount=13, agsize=327616 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=3931899, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

root@uk3-prod-gfs-arb-01:~# xfs_info /data/glusterfs/gv1/brick3
meta-data=/dev/sdd1              isize=512    agcount=13, agsize=327616 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=3931899, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

I've also copied below some df output from the arb server:

root@uk3-prod-gfs-arb-01:~# df -hi
Filesystem            Inodes IUsed IFree IUse% Mounted on
udev                    992K   473  991K    1% /dev
tmpfs                   995K   788  994K    1% /run
/dev/sda1               768K  105K  664K   14% /
tmpfs                   995K     3  995K    1% /dev/shm
tmpfs                   995K     4  995K    1% /run/lock
tmpfs                   995K    18  995K    1% /sys/fs/cgroup
/dev/sdb1               128K   113  128K    1% /var/lib/glusterd
/dev/sdd1               7.5M  2.6M  5.0M   35% /data/glusterfs/gv1/brick3
/dev/sdc1               7.5M  600K  7.0M    8% /data/glusterfs/gv1/brick1
/dev/sde1               6.4M  2.9M  3.5M   46% /data/glusterfs/gv1/brick2
uk1-prod-gfs-01:/gv1    150M  6.5M  144M    5% /mnt/gfs
tmpfs                   995K    21  995K    1% /run/user/1004

root@uk3-prod-gfs-arb-01:~# df -h
Filesystem            Size  Used Avail Use% Mounted on
udev                  3.9G     0  3.9G   0% /dev
tmpfs                 796M  916K  795M   1% /run
/dev/sda1              12G  3.9G  7.3G  35% /
tmpfs                 3.9G  8.0K  3.9G   1% /dev/shm
tmpfs                 5.0M     0  5.0M   0% /run/lock
tmpfs                 3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/sdb1             2.0G  456K  1.9G   1% /var/lib/glusterd
/dev/sdd1              15G   12G  3.5G  78% /data/glusterfs/gv1/brick3
/dev/sdc1              15G  2.6G   13G  18% /data/glusterfs/gv1/brick1
/dev/sde1              15G   14G  1.8G  89% /data/glusterfs/gv1/brick2
uk1-prod-gfs-01:/gv1  300G  139G  162G  47% /mnt/gfs
tmpfs                 796M     0  796M   0% /run/user/1004

Something I forgot to mention in my initial message is that the op-version was upgraded from 70200 to 10, which seems as though it could have been a trigger for the issue as well.

Thanks,
Liam Smith
Linux Systems Support Engineer, Scholar

From: Strahil Nikolov
Sent: 03 July 2023 18:28
To: Liam Smith; gluster-users
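As a sanity check on the imaxpct advice above, the inode ceiling can be computed directly from the xfs_info values quoted in this thread (blocks × bsize × imaxpct% ÷ isize). A rough sketch using the brick1 numbers:

```shell
# Max XFS inode count implied by imaxpct, from xfs_info values.
# Arguments: blocks bsize imaxpct isize  (brick1 values from this thread below)
xfs_max_inodes() {
  awk -v b="$1" -v bs="$2" -v pct="$3" -v is="$4" \
    'BEGIN { printf "%d\n", b * bs * pct / 100 / is }'
}
xfs_max_inodes 3931899 4096 25 512   # matches the ~7.5Mi "Inodes" column of df -hi
xfs_max_inodes 3931899 4096 80 512   # the ceiling after xfs_growfs -m 80
```

The first call reproduces the 7.5Mi total that df -hi reports for these bricks, which confirms the arbiter is nowhere near its inode limit; the shortage here is plain disk space, not inodes.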
Re: [Gluster-users] remove_me files building up
Hi,

You mentioned that the arbiter bricks run out of inodes. Are you using XFS? Can you provide the xfs_info of each brick?

Best Regards,
Strahil Nikolov

On Sat, Jul 1, 2023 at 19:41, Liam Smith wrote:

Hi,

We're running a cluster with two data nodes and one arbiter, and have sharding enabled. We had an issue a while back where one of the servers crashed; we got the server back up and running, ensured that all healing entries cleared, and also increased the server spec (CPU/mem) as this seemed to be the potential cause. Since then, however, we've seen some strange behaviour, whereby a lot of 'remove_me' files are building up under `/data/glusterfs/gv1/brick2/brick/.shard/.remove_me/` and `/data/glusterfs/gv1/brick3/brick/.shard/.remove_me/`. This is causing the arbiter to run out of space on brick2 and brick3, as the remove_me files are constantly increasing. brick1 appears to be fine: its disk usage increases throughout the day and drops down in line with the trend of the brick on the data nodes. We see the disk usage increase and drop throughout the day on the data nodes for brick2 and brick3 as well, but while the arbiter follows the same trend of the disk usage increasing, it doesn't drop at any point.
This is the output of some gluster commands; occasional heal entries come and go:

root@uk3-prod-gfs-arb-01:~# gluster volume info gv1

Volume Name: gv1
Type: Distributed-Replicate
Volume ID: d3d1fdec-7df9-4f71-b9fc-660d12c2a046
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: uk1-prod-gfs-01:/data/glusterfs/gv1/brick1/brick
Brick2: uk2-prod-gfs-01:/data/glusterfs/gv1/brick1/brick
Brick3: uk3-prod-gfs-arb-01:/data/glusterfs/gv1/brick1/brick (arbiter)
Brick4: uk1-prod-gfs-01:/data/glusterfs/gv1/brick3/brick
Brick5: uk2-prod-gfs-01:/data/glusterfs/gv1/brick3/brick
Brick6: uk3-prod-gfs-arb-01:/data/glusterfs/gv1/brick3/brick (arbiter)
Brick7: uk1-prod-gfs-01:/data/glusterfs/gv1/brick2/brick
Brick8: uk2-prod-gfs-01:/data/glusterfs/gv1/brick2/brick
Brick9: uk3-prod-gfs-arb-01:/data/glusterfs/gv1/brick2/brick (arbiter)
Options Reconfigured:
cluster.entry-self-heal: on
cluster.metadata-self-heal: on
cluster.data-self-heal: on
performance.client-io-threads: off
storage.fips-mode-rchecksum: on
transport.address-family: inet
cluster.lookup-optimize: off
performance.readdir-ahead: off
cluster.readdir-optimize: off
cluster.self-heal-daemon: enable
features.shard: enable
features.shard-block-size: 512MB
cluster.min-free-disk: 10%
cluster.use-anonymous-inode: yes

root@uk3-prod-gfs-arb-01:~# gluster peer status
Number of Peers: 2

Hostname: uk2-prod-gfs-01
Uuid: 2fdfa4a2-195d-4cc5-937c-f48466e76149
State: Peer in Cluster (Connected)

Hostname: uk1-prod-gfs-01
Uuid: 43ec93d1-ad83-4103-aea3-80ded0903d88
State: Peer in Cluster (Connected)

root@uk3-prod-gfs-arb-01:~# gluster volume heal gv1 info
Brick uk1-prod-gfs-01:/data/glusterfs/gv1/brick1/brick
Status: Connected
Number of entries: 1

Brick uk2-prod-gfs-01:/data/glusterfs/gv1/brick1/brick
Status: Connected
Number of entries: 0

Brick uk3-prod-gfs-arb-01:/data/glusterfs/gv1/brick1/brick
Status: Connected
Number of entries: 0

Brick uk1-prod-gfs-01:/data/glusterfs/gv1/brick3/brick
Status: Connected
Number of entries: 0

Brick uk2-prod-gfs-01:/data/glusterfs/gv1/brick3/brick
Status: Connected
Number of entries: 0

Brick uk3-prod-gfs-arb-01:/data/glusterfs/gv1/brick3/brick
Status: Connected
Number of entries: 0

Brick uk1-prod-gfs-01:/data/glusterfs/gv1/brick2/brick
Status: Connected
Number of entries: 0

Brick uk2-prod-gfs-01:/data/glusterfs/gv1/brick2/brick/.shard/.remove_me
Status: Connected
Number of entries: 3

Brick uk3-prod-gfs-arb-01:/data/glusterfs/gv1/brick2/brick/.shard/.remove_me
Status: Connected
Number of entries: 3

root@uk3-prod-gfs-arb-01:~# gluster volume get all cluster.op-version
Option                 Value
------                 -----
cluster.op-version     10

We're not sure if this is a potential bug or if something's corrupted that we don't have visibility of, so any pointers/suggestions about how to approach this would be appreciated.

Thanks,
Liam
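Since the growth described in this thread is confined to .shard/.remove_me, a small helper makes the buildup easy to track over time. A sketch, using the brick paths from this thread (adjust to your layout):

```shell
# Report entry count and on-disk size of .shard/.remove_me for each brick path given.
remove_me_usage() {
  for brick in "$@"; do
    d="$brick/.shard/.remove_me"
    [ -d "$d" ] || continue    # skip bricks without a .remove_me directory
    printf '%s: %s entries, %s\n' "$d" \
      "$(find "$d" -type f | wc -l)" "$(du -sh "$d" | cut -f1)"
  done
}
# remove_me_usage /data/glusterfs/gv1/brick2/brick /data/glusterfs/gv1/brick3/brick
```

Running it from cron and diffing the numbers against the data nodes would show whether the arbiter's .remove_me entries ever shrink at all, or only ever grow.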
Re: [Gluster-users] remove snapshot when one peer is dead
Next time try the delete command with 'force'.

Best Regards,
Strahil Nikolov

On Sunday, June 25, 2023, 8:30 AM, Stefan Kania wrote:

Fixed: after setting both quorum settings to "none" and restarting glusterd, I could remove the snapshots and the peer.

Am 19.06.23 um 13:01 schrieb Stefan Kania:
> Hi to all,
>
> I have a volume where one peer is dead and I would now like to remove
> the dead peer, but there are still snapshots on the volume. When I try to
> remove the peer I get:
>
> root@glfs1:/run/gluster/snaps# gluster peer detach c-03.heartbeat.net
>
> All clients mounted through the peer which is getting detached need to
> be remounted using one of the other active peers in the trusted storage
> pool to ensure client gets notification on any changes
> done on the gluster configuration and if the same has been done do you
> want to proceed? (y/n) y
> peer detach: failed: c-03.heartbeat.net is part of existing snapshot.
> Remove those snapshots before proceeding
>
> When I try to remove all snapshots I get:
> --
> root@glfs1:/run/gluster/snaps# gluster snapshot delete all
>
> System contains 6 snapshot(s).
>
> Do you still want to continue and delete them? (y/n) y
>
> snapshot delete: failed: glusterds are not in quorum
>
> Snapshot command failed
> --
>
> How can I remove all snapshots without quorum?
>
> Stefan
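Stefan's fix, written out as a reviewable sequence. This is a dry-run sketch only (the run wrapper just echoes); which two quorum options he actually toggled is my assumption, and the volume name is a placeholder:

```shell
# Dry-run of the "quorum to none, restart glusterd, delete snapshots, detach peer"
# sequence. Replace the run() body with "$@" to actually execute the commands.
run() { echo "+ $*"; }
VOLNAME=myvol   # placeholder

run gluster volume set all cluster.server-quorum-type none
run gluster volume set "$VOLNAME" cluster.quorum-type none   # assumed second quorum setting
run systemctl restart glusterd
run gluster snapshot delete all
run gluster peer detach c-03.heartbeat.net
```

Reviewing the echoed plan first is worthwhile here, since disabling quorum on a degraded pool removes the very guard that was blocking the delete.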
Re: [Gluster-users] How to find out data alignment for LVM thin volume brick
Yes, disk alignment is only needed because the LVM stack cannot get the details of the HW RAID. In your case, use the default 256k as per the documentation.

Best Regards,
Strahil Nikolov

Sent from Yahoo Mail for iPhone

On Wednesday, June 7, 2023, 9:21 PM, mabi wrote:

Dear Strahil,

Thank you very much for pointing me to the RedHat documentation. I wasn't aware of it and it is much more detailed. I will have to read it carefully. Now as I have a single disk (no RAID), based on that documentation I understand that I should use a data alignment value of 256kB.

Best regards,
Mabi

--- Original Message ---
On Wednesday, June 7th, 2023 at 6:56 AM, Strahil Nikolov wrote:

Have you checked this page: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.5/html/administration_guide/brick_configuration ? The alignment depends on the HW raid stripe unit size.

Best Regards,
Strahil Nikolov

On Tue, Jun 6, 2023 at 2:35, mabi wrote:

Hello,

I am preparing a brick as an LVM thin volume for a test slave node using this documentation: https://docs.gluster.org/en/main/Administrator-Guide/formatting-and-mounting-bricks/ but I am confused regarding the right "--dataalignment" option to be used for pvcreate. The documentation mentions the following under point 1:

"Create a physical volume(PV) by using the pvcreate command. For example: pvcreate --dataalignment 128K /dev/sdb Here, /dev/sdb is a storage device. Use the correct dataalignment option based on your device. Note: The device name and the alignment value will vary based on the device you are using."

As the test disk for this brick I have an external USB 500GB SSD from Samsung, a PSSD T7 (https://semiconductor.samsung.com/consumer-storage/portable-ssd/t7/), but my question is where do I find the information on which alignment value I need to use for this specific disk?
Best regards,
Mabi
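The Red Hat guide referenced above boils the choice down to: dataalignment = RAID stripe unit × number of data disks, and a flat 256K for JBOD or a single disk like the T7 here. A tiny helper for the multiplication (the RAID6 figures below are the guide's worked example, reproduced as an illustration):

```shell
# Full-stripe width in KiB: stripe unit (KiB) x number of data disks.
full_stripe_k() { awk -v su="$1" -v nd="$2" 'BEGIN { print su * nd "K" }'; }

full_stripe_k 128 10   # e.g. RAID6 of 12 disks (10 data) with a 128KiB stripe unit
# Single disk / JBOD: no stripe to align to, so the documented default applies:
#   pvcreate --dataalignment 256K /dev/sdX
```

So for the single Samsung T7 there is nothing to derive from the disk itself; the 256K JBOD default is the right answer, as Strahil says.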
Re: [Gluster-users] Geo replication procedure for DR
To be honest, I have never reached that point, but I think that if the original volume is too outdated it makes sense to set up a new volume on the primary site, run a replication from the DR site to the primary site, and then schedule a cut-over (make the DR volume read-only, remove the replication, point all clients to the main site). You will need to test the whole scenario on a separate cluster until the procedure is well established.

Best Regards,
Strahil Nikolov

Sent from Yahoo Mail for iPhone

On Wednesday, June 7, 2023, 9:13 PM, mabi wrote:

Dear Strahil,

Thank you for the detailed command. So once you want to switch all traffic to the DR site in case of disaster, one should first disable the read-only setting on the secondary volume on the slave site. What happens when the master site is back online? What's the procedure there? I had the following question in my previous mail in this regard: "And once the primary site is back online how do you copy back or sync all data changes done on the secondary volume on the secondary site back to the primary volume on the primary site?"

Best regards,
Mabi

--- Original Message ---
On Wednesday, June 7th, 2023 at 6:52 AM, Strahil Nikolov wrote:

It's just a setting on the target volume:

gluster volume set <volname> read-only off

Best Regards,
Strahil Nikolov

On Mon, Jun 5, 2023 at 22:30, mabi wrote:

Hello,

I was reading the geo-replication documentation here: https://docs.gluster.org/en/main/Administrator-Guide/Geo-Replication/ and I was wondering how it works in case of disaster recovery, when the primary cluster is down and the secondary site with the volume needs to be used. What is the procedure to make the secondary volume on the secondary site available for read/write? And once the primary site is back online, how do you copy back or sync all data changes done on the secondary volume on the secondary site back to the primary volume on the primary site?
Best regards,
Mabi
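The failover/failback flow discussed in this thread can be sketched end to end as a dry-run (echo only; volume and host names are placeholders, and the geo-replication command shapes follow the upstream docs rather than anything tested here):

```shell
# Dry-run of failover to DR and failback to a rebuilt primary.
run() { echo "+ $*"; }        # replace body with "$@" to actually execute
DRVOL=gv0 PRIMARY=primary-host   # placeholders

# Failover: make the DR copy writable for clients
run gluster volume set "$DRVOL" read-only off

# Failback, once the primary is rebuilt: replicate DR -> primary
run gluster volume geo-replication "$DRVOL" "$PRIMARY::$DRVOL" create push-pem force
run gluster volume geo-replication "$DRVOL" "$PRIMARY::$DRVOL" start

# After the sync completes: freeze DR, tear down the reverse session,
# and repoint clients to the primary
run gluster volume set "$DRVOL" read-only on
run gluster volume geo-replication "$DRVOL" "$PRIMARY::$DRVOL" stop
```

As Strahil notes, this whole sequence should be rehearsed on a separate test cluster before it is ever needed in anger.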
Re: [Gluster-users] Using glusterfs for virtual machines with qcow2 images
Just check an existing mount unit and use that as a reference. It's not very reliable to use a service to mount your mount points.

P.S.: Every entry in /etc/fstab gets a dynamically generated mount unit. If an fstab entry fails, the system fails to boot, yet a missing mount unit doesn't have that effect.

Best Regards,
Strahil Nikolov

Sent from Yahoo Mail for iPhone

On Wednesday, June 7, 2023, 3:48 PM, Gilberto Ferreira wrote:

Hi everybody

Regarding the issue with mount, usually I am using this systemd service to bring up the mount points: /etc/systemd/system/glusterfsmounts.service

[Unit]
Description=Glustermounting
Requires=glusterd.service
Wants=glusterd.service
After=network.target network-online.target glusterd.service

[Service]
Type=simple
RemainAfterExit=true
ExecStartPre=/usr/sbin/gluster volume list
ExecStart=/bin/mount -a -t glusterfs
TimeoutSec=600
SuccessExitStatus=15
Restart=on-failure
RestartSec=60
StartLimitBurst=6
StartLimitInterval=3600

[Install]
WantedBy=multi-user.target

After creating it, remember to reload the systemd daemon and enable it:

systemctl daemon-reload
systemctl enable glusterfsmounts.service

Also, I am using /etc/fstab to mount the glusterfs mount point properly, since the Proxmox GUI seems to me a little broken in this regard:

gluster1:VMS1 /vms1 glusterfs defaults,_netdev,x-systemd.automount,backupvolfile-server=gluster2 0 0

---
Gilberto Nunes Ferreira
(47) 99676-7530 - Whatsapp / Telegram

Em qua., 7 de jun. de 2023 às 01:51, Strahil Nikolov escreveu:

Hi Chris,

here is a link to the settings needed for VM storage: https://github.com/gluster/glusterfs/blob/03592930239c3b43cbbdce17607c099ae075fd6d/extras/group-virt.example#L4

You can also ask in ovirt-users for real-world settings. Test well before changing production!!!

IMPORTANT: ONCE SHARDING IS ENABLED, IT CANNOT BE DISABLED !!!

Best Regards,
Strahil Nikolov

On Mon, Jun 5, 2023 at 13:55, Christian Schoepplein wrote:

Hi,

we'd like to use glusterfs for Proxmox and virtual machines with qcow2 disk images.
We have a three node glusterfs setup with one volume; Proxmox is attached and VMs are created, but after some time, and I think after much I/O is going on for a VM, the data inside the virtual machine gets corrupted. When I copy files from or to our glusterfs directly everything is OK; I've checked the files with md5sum. So in general our glusterfs setup seems to be OK I think..., but with the VMs and the self-growing qcow2 images there are problems. If I use raw images for the VMs, tests look better, but I need to do more testing to be sure; the problem is a bit hard to reproduce :-(.

I've also asked on a Proxmox mailing list, but got no helpful response so far :-(. So maybe you have a helping hint as to what might be wrong with our setup, and what needs to be configured to use glusterfs as a storage backend for virtual machines with self-growing disk images. Any helpful tip would be great, because I am absolutely no glusterfs expert and also not an expert in virtualization and what has to be done to let all components play well together... Thanks for your support!

Here are some infos about our glusterfs setup, please let me know if you need more infos. We are using Ubuntu 22.04 as the operating system:

root@gluster1:~# gluster --version
glusterfs 10.1
Repository revision: git://git.gluster.org/glusterfs.git
Copyright (c) 2006-2016 Red Hat, Inc. <https://www.gluster.org/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser General Public License, version 3 or any later version (LGPLv3 or later), or the GNU General Public License, version 2 (GPLv2), in all cases as published by the Free Software Foundation.
root@gluster1:~# gluster v status gfs_vms

Status of volume: gfs_vms
Gluster process                                      TCP Port  RDMA Port  Online  Pid
--------------------------------------------------------------------------------
Brick gluster1.linova.de:/glusterfs/sde1enc/brick    58448     0          Y       1062218
Brick gluster2.linova.de:/glusterfs/sdc1enc/brick    50254     0          Y       20596
Brick gluster3.linova.de:/glusterfs/sdc1enc/brick    52840     0          Y       1627513
Brick gluster1.linova.de:/glusterfs/sdf1enc/brick    49832     0          Y       1062227
Brick gluster2.linova.de:/glusterfs/sdd1enc/brick    56095     0          Y       20612
Brick gluster3.linova.de:/glusterfs/sdd1enc/brick    51252     0          Y       1627521
Brick gluster1.linova.de:/glusterfs/sdg1enc/brick    54991     0          Y       1062230
Brick gluster2.linova.de:/glusterfs/sde1enc/brick    60812     0          Y       20628
Brick gluster3.linova.de:/glu
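For reference, Strahil's suggestion in this thread of a native mount unit instead of a mount -a service could look roughly like this, using the volume and mount point from Gilberto's fstab line. This is an illustrative sketch, not a tested unit; note the unit file name must match the mount point (systemd-escape -p --suffix=mount /vms1 gives vms1.mount):

```ini
# /etc/systemd/system/vms1.mount -- sketch of a native mount unit
[Unit]
Description=GlusterFS mount for VM images
After=network-online.target glusterd.service
Wants=network-online.target

[Mount]
What=gluster1:VMS1
Where=/vms1
Type=glusterfs
Options=defaults,_netdev,backupvolfile-server=gluster2

[Install]
WantedBy=multi-user.target
```

After placing the file: systemctl daemon-reload && systemctl enable --now vms1.mount. Unlike the fstab route, a failing mount unit degrades the boot target instead of blocking boot, which is the trade-off Strahil describes.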
Re: [Gluster-users] How to find out data alignment for LVM thin volume brick
Have you checked this page: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.5/html/administration_guide/brick_configuration ? The alignment depends on the HW raid stripe unit size.

Best Regards,
Strahil Nikolov

On Tue, Jun 6, 2023 at 2:35, mabi wrote:

Hello,

I am preparing a brick as an LVM thin volume for a test slave node using this documentation: https://docs.gluster.org/en/main/Administrator-Guide/formatting-and-mounting-bricks/ but I am confused regarding the right "--dataalignment" option to be used for pvcreate. The documentation mentions the following under point 1:

"Create a physical volume(PV) by using the pvcreate command. For example: pvcreate --dataalignment 128K /dev/sdb Here, /dev/sdb is a storage device. Use the correct dataalignment option based on your device. Note: The device name and the alignment value will vary based on the device you are using."

As the test disk for this brick I have an external USB 500GB SSD from Samsung, a PSSD T7 (https://semiconductor.samsung.com/consumer-storage/portable-ssd/t7/), but my question is where do I find the information on which alignment value I need to use for this specific disk?

Best regards,
Mabi
Re: [Gluster-users] Questionmark in permission and Owner
Usually when you see '?' for user, group, and date, it's a split brain situation (could be a gfid split brain) and Gluster can't decide which copy is bad.

Best Regards,
Strahil Nikolov

On Mon, Jun 5, 2023 at 23:30, Diego Zuccato wrote:

Seen something similar when the FUSE client died, but it marked the whole mountpoint, not just some files. Might be a desync or communication loss between the nodes?

Diego

Il 05/06/2023 11:23, Stefan Kania ha scritto:
> Hello,
>
> I have a strange problem on a gluster volume.
>
> If I do an "ls -l" in a directory inside a mounted gluster volume I
> see, only for some files, question marks for the permissions, the owner,
> the size and the date.
> Looking at the same directory on the brick itself, everything is ok.
> After rebooting the nodes everything is back to normal.
>
> System is Debian 11 and Gluster is version 9. The filesystem is LVM2
> thin provisioned and formatted with XFS.
>
> But as I said, the brick is ok, only the mounted volume is having the
> problem.
>
> Any hint what it could be?
>
> Thanks
>
> Stefan

--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
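When entries show up like this, a first diagnostic pass along the lines Strahil suggests usually means listing suspected split-brain entries and comparing the AFR changelog xattrs on each brick. A dry-run sketch (echo only; the volume name and file path are placeholders):

```shell
# Dry-run sketch: commands typically used to confirm a (gfid) split brain.
run() { echo "+ $*"; }       # replace body with "$@" to actually execute
VOLNAME=myvol                # placeholder
FILE=/path/on/brick/file     # placeholder: run this on each brick's copy

run gluster volume heal "$VOLNAME" info split-brain
# On each brick, dump the trusted.afr.* changelog xattrs of the affected file
# and compare them between bricks:
run getfattr -d -m . -e hex "$FILE"
```

If the trusted.afr counters blame each other across bricks, it is a genuine split brain; if they are all zero while the mount still shows '?', a client-side desync (as Diego describes) is the more likely explanation, and remounting the client is worth trying before anything invasive.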
Re: [Gluster-users] Geo replication procedure for DR
It's just a setting on the target volume:

gluster volume set <volname> read-only off

Best Regards,
Strahil Nikolov

On Mon, Jun 5, 2023 at 22:30, mabi wrote:

Hello,

I was reading the geo-replication documentation here: https://docs.gluster.org/en/main/Administrator-Guide/Geo-Replication/ and I was wondering how it works in case of disaster recovery, when the primary cluster is down and the secondary site with the volume needs to be used. What is the procedure to make the secondary volume on the secondary site available for read/write? And once the primary site is back online, how do you copy back or sync all data changes done on the secondary volume on the secondary site back to the primary volume on the primary site?

Best regards,
Mabi
Re: [Gluster-users] Using glusterfs for virtual machines with qcow2 images
Hi Chris,

here is a link to the settings needed for VM storage:
https://github.com/gluster/glusterfs/blob/03592930239c3b43cbbdce17607c099ae075fd6d/extras/group-virt.example#L4

You can also ask in ovirt-users for real-world settings. Test well before changing production!!!
IMPORTANT: ONCE SHARDING IS ENABLED, IT CANNOT BE DISABLED !!!

Best Regards,
Strahil Nikolov

On Mon, Jun 5, 2023 at 13:55, Christian Schoepplein wrote:

Hi,

we'd like to use glusterfs for Proxmox and virtual machines with qcow2 disk images. We have a three node glusterfs setup with one volume; Proxmox is attached and VMs are created, but after some time, and I think after much I/O is going on in a VM, the data inside the virtual machine gets corrupted. When I copy files from or to our glusterfs directly everything is OK; I've checked the files with md5sum. So in general our glusterfs setup seems to be OK, I think..., but with the VMs and the self-growing qcow2 images there are problems. If I use raw images for the VMs tests look better, but I need to do more testing to be sure; the problem is a bit hard to reproduce :-(.

I've also asked on a Proxmox mailing list, but got no helpful response so far :-(. So maybe you have a helpful hint what might be wrong with our setup, or what needs to be configured to use glusterfs as a storage backend for virtual machines with self-growing disk images. Any helpful tip would be great, because I am absolutely no glusterfs expert and also not an expert for virtualization and what has to be done to let all components play well together... Thanks for your support!

Here is some info about our glusterfs setup, please let me know if you need more. We are using Ubuntu 22.04 as operating system:

root@gluster1:~# gluster --version
glusterfs 10.1
Repository revision: git://git.gluster.org/glusterfs.git
Copyright (c) 2006-2016 Red Hat, Inc. <https://www.gluster.org/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser General Public License, version 3 or any later version (LGPLv3 or later), or the GNU General Public License, version 2 (GPLv2), in all cases as published by the Free Software Foundation.
root@gluster1:~#

root@gluster1:~# gluster v status gfs_vms
Status of volume: gfs_vms
Gluster process                                     TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster1.linova.de:/glusterfs/sde1enc/brick   58448     0          Y       1062218
Brick gluster2.linova.de:/glusterfs/sdc1enc/brick   50254     0          Y       20596
Brick gluster3.linova.de:/glusterfs/sdc1enc/brick   52840     0          Y       1627513
Brick gluster1.linova.de:/glusterfs/sdf1enc/brick   49832     0          Y       1062227
Brick gluster2.linova.de:/glusterfs/sdd1enc/brick   56095     0          Y       20612
Brick gluster3.linova.de:/glusterfs/sdd1enc/brick   51252     0          Y       1627521
Brick gluster1.linova.de:/glusterfs/sdg1enc/brick   54991     0          Y       1062230
Brick gluster2.linova.de:/glusterfs/sde1enc/brick   60812     0          Y       20628
Brick gluster3.linova.de:/glusterfs/sde1enc/brick   59254     0          Y       1627522
Self-heal Daemon on localhost                       N/A       N/A        Y       1062249
Bitrot Daemon on localhost                          N/A       N/A        Y       3591335
Scrubber Daemon on localhost                        N/A       N/A        Y       3591346
Self-heal Daemon on gluster2.linova.de              N/A       N/A        Y       20645
Bitrot Daemon on gluster2.linova.de                 N/A       N/A        Y       987517
Scrubber Daemon on gluster2.linova.de               N/A       N/A        Y       987588
Self-heal Daemon on gluster3.linova.de              N/A       N/A        Y       1627568
Bitrot Daemon on gluster3.linova.de                 N/A       N/A        Y       1627543
Scrubber Daemon on gluster3.linova.de               N/A       N/A        Y       1627554

Task Status of Volume gfs_vms
------------------------------------------------------------------------------
There are no active volume tasks

root@gluster1:~#

root@gluster1:~# gluster v status gfs_vms detail
Status of volume: gfs_vms
------------------------------------------------------------------------------
Brick                : Brick gluster1.linova.de:/glusterfs/sde1enc/brick
TCP Port             : 58448
RDMA Port            : 0
Online               : Y
Pid                  : 1062218
File System          : xfs
Device               : /dev/mapper/sde1enc
Mount Options        : rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota
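The linked group-virt.example file ships with glusterfs as the `virt` settings group, so it can be applied in one command. A printed sketch using this thread's volume name (verify the group's contents under /var/lib/glusterd/groups/ on your version first; it enables sharding, hence the warning above):

```shell
# Print the commands to apply the virt profile; run them on a gluster node.
steps=$(cat <<'EOF'
gluster volume set gfs_vms group virt   # applies the virt group (enables sharding!)
gluster volume info gfs_vms             # review the resulting options
EOF
)
echo "$steps"
```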
Re: [Gluster-users] Error in gluster v11
Looks similar to https://github.com/gluster/glusterfs/issues/4104

I don't see any progress there. Maybe asking in gluster-devel (in CC) could help.

Best Regards,
Strahil Nikolov

On Sunday, May 14, 2023, 5:28 PM, Gilberto Ferreira wrote:

Has anybody else seen this error?

May 14 07:05:39 srv01 vms[9404]: [2023-05-14 10:05:39.618424 +] C [gf-io-uring.c:612:gf_io_uring_cq_process_some] (-->/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x849ae) [0x7fb4ebace9ae] -->/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x8a2e5) [0x7fb4ebad42e5] -->/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x8a1a5) [0x7fb4ebad41a5] ) 0-: Assertion failed:
May 14 07:05:39 srv01 vms[9404]: patchset: git://git.gluster.org/glusterfs.git
May 14 07:05:39 srv01 vms[9404]: package-string: glusterfs 11.0

---
Gilberto Nunes Ferreira
(47) 99676-7530 - Whatsapp / Telegram

Community Meeting Calendar:
Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] writing to fuse device yielded ENOENT
Hey David,

Did you get any reply on this thread? Did you manage to identify the problem?

Best Regards,
Strahil Nikolov

On Wednesday, April 19, 2023, 4:35 AM, David Cunningham wrote:

Hello,

I tried reporting this in issue #4097 but got no response. Following on from issues #1741 and #3498, we were experiencing very slow response times accessing files on a GlusterFS 9.6 system on Ubuntu 18.04 server. The server in question is both a GlusterFS node and a client. Listing directory contents via the FUSE mount typically took 2-10 seconds, whereas a different client was fast.

In mnt-glusterfs.log we saw lots of warnings like this:
[2023-04-03 20:16:14.789588 +] W [fuse-bridge.c:310:check_and_dump_fuse_W] 0-glusterfs-fuse: writing to fuse device yielded ENOENT 256 times

After running "echo 3 > /proc/sys/vm/drop_caches" as suggested in issue #1471 the response improved dramatically, to around 0.009s, the same as the other client. Can you please advise how we should tune GlusterFS to avoid this problem? I see mention of the --lru-limit and --invalidate-limit options in that issue, but to be honest I don't understand how to use the warning messages to decide on a suitable value for those options. Thanks in advance.
Here are the GlusterFS details:

root@br:~# gluster volume info
Volume Name: gvol0
Type: Replicate
Volume ID: 2d2c1552-bc93-4c91-b8ca-73553f00fdcd
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: br:/nodirectwritedata/gluster/gvol0
Brick2: sg:/nodirectwritedata/gluster/gvol0
Options Reconfigured:
cluster.min-free-disk: 20%
network.ping-timeout: 10
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
storage.health-check-interval: 0
cluster.server-quorum-ratio: 50
root@br:~#

root@br:~# gluster volume status
Status of volume: gvol0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick br:/nodirectwritedata/gluster/gvol0   49152     0          Y       4761
Brick sg:/nodirectwritedata/gluster/gvol0   49152     0          Y       2329
Self-heal Daemon on localhost               N/A       N/A        Y       5304
Self-heal Daemon on sg                      N/A       N/A        Y       2629

Task Status of Volume gvol0
------------------------------------------------------------------------------
There are no active volume tasks

root@br:~#

root@br:~# gluster volume heal gvol0 info summary
Brick br:/nodirectwritedata/gluster/gvol0
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick sg:/nodirectwritedata/gluster/gvol0
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Thank you,

--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782

Community Meeting Calendar:
Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
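The `--lru-limit` and `--invalidate-limit` options David mentions can also be given as FUSE mount options. A printed sketch with illustrative (not tuned) values - lru-limit bounds the number of cached inodes, invalidate-limit caps outstanding invalidation requests to the kernel:

```shell
# Print an example mount line; the numeric values are illustrative guesses,
# not recommendations, and br:/gvol0 is this thread's volume.
steps=$(cat <<'EOF'
mount -t glusterfs -o lru-limit=65536,invalidate-limit=16 br:/gvol0 /mnt/glusterfs
# or add the same options to the volume's /etc/fstab entry
EOF
)
echo "$steps"
```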
Re: [Gluster-users] gluster, arbiter, thin-arbiter questions
As nobody chimed in, let me reply inline (answers marked with "->").

Best Regards,
Strahil Nikolov

On Sunday, April 23, 2023, 2:35 AM, Peter P wrote:

Good afternoon,

I am looking for additional information about glusterfs, arbiters and thin-arbiters. The current stable release is gluster 11, so I am considering that version for deployment. My planned setup is: 4 storage servers in distributed + replicated mode.

Server1, server2: replica 2, arbiter 1
Server3, server4: replica 2, arbiter 1

Since having replica 2 is not recommended due to split-brains I will have an arbiter.

Generic questions:

- Is arbiter or thin-arbiter recommended in a production, large volume storage environment?

-> Both were introduced a long time ago. Most users prefer the full arbiter, as healing is far more optimal (only changed files will be healed).

- Is thin arbiter code stable and deployment ready?

-> I know that it's in use, but the full arbiter was introduced earlier and has a wider adoption.

- Arbiter is file based and stores metadata for each file. In this scenario I would at least double the default inode count on the arbiter server. Thin-arbiter on the other hand is brick based, but I have not found enough information on whether its inode usage is as inode hungry as the arbiter configuration. I am thinking it should not be, as it is brick based. So do I need to increase the inode count when using thin-arbiters? If yes, what is recommended?

-> The full arbiter is sensitive to network latency and disk speed (a lot of small I/O for those inodes). Increase maxpct (XFS) on arbiter bricks and prefer using an SSD/NVME. As the full arbiter doesn't store any data, you can set maxpct to around 75%.
-> The thin arbiter doesn't have a brick; when you create it, you just specify the replica id file (see https://docs.gluster.org/en/main/Administrator-Guide/Thin-Arbiter-Volumes/ ).

- I've read old bug reports reporting that thin arbiter was not ready to serve multiple trusted pools. Is this still the case? I may configure multiple trusted pools in the future.

-> I saw Kadalu uses their own thin arbiter and I never saw issues. I doubt I was the only one using it, so it should be fine.

- I have many linux boxes running different linux distributions and releases. Ideally the assortment of boxes would mount the same gluster pool/volume. I looked for information about older versions of gluster clients running on a range of older distributions mounting the most recent gluster 11 pool/volume. Does that work? Can gluster clients (version 10, 9, 8, 7, etc.) mount a gluster 11 volume and run without significant issues? I understand that older versions of the client will not have the most recent features. Most recent features aside, is such a configuration supported/stable?

-> For that purpose gluster has 2 settings:
cluster.max-op-version -> the max compatibility version you can set for your cluster, based on the oldest client's version
cluster.op-version -> the cluster's compatibility version
As long as you keep cluster.op-version compatible with your clients, it should work.

Thin-arbiter approach: If I go with the thin-arbiter configuration I will use a 5th server, as this server can be outside of the trusted pool and can be shared among multiple trusted pools.

Server1, server2: replica 2, thin-arbiter server5
Server3, server4: replica 2, thin-arbiter server5

Old arbiter approach: If I go with the older arbiter configuration, I am considering using 2 of the storage servers to act as both replica and arbiter. Is that configuration possible/supported and reasonable?

Server1, server2: replica 2, arbiter server3
Server3, server4: replica 2, arbiter server1

-> Yes, as long as you have a dedicated brick (in this example server3 should have a data brick and an arbiter brick).

In this configuration, I am considering using server3 to be arbiter for server{1,2} replica 2, and using server1 to be arbiter for server{3,4} replica 2.

Questions:

- Is this a reasonable/recommended configuration?

-> It's used quite often.

- Should the arbiter metadata folder be inside or outside of the volume? In detail: say the server{1,2} replica has 1 brick each, /gluster/brick1, with /gfs1vol1 as the volume. Should the arbiter metadata folder location be /gluster/arbiter/gfs1vol1 (outside of the volume path) or /gfs1vol1/arbiter1/ (inside the volume path)?

-> Always keep bricks as separate mount points. For example:
/dev/vg/lv mounted on /bricks/databricks with directory vol1/brick1
/dev/vg/lv2 mounted on /bricks/arbiterbricks with directory vol1/arbiterbrick1
The idea is that if the device is not mounted, the brick directory will be missing and the mess will be far less.

Thank you for your thoughts,
Peter

Community Meeting Calendar:
Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
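As a rough sketch, the two layouts Peter describes would be created along these lines (printed only; hosts and brick paths are placeholders, and note the CLI expresses "replica 2 + full arbiter" as `replica 3 arbiter 1`):

```shell
# Print create-command sketches for both layouts; names are placeholders.
steps=$(cat <<'EOF'
# full arbiter, distributed (every 3rd brick in the list is the arbiter):
gluster volume create gfs1vol1 replica 3 arbiter 1 \
  server1:/bricks/databricks/vol1/brick1 \
  server2:/bricks/databricks/vol1/brick1 \
  server3:/bricks/arbiterbricks/vol1/arbiterbrick1 \
  server3:/bricks/databricks/vol1/brick2 \
  server4:/bricks/databricks/vol1/brick2 \
  server1:/bricks/arbiterbricks/vol1/arbiterbrick2
# thin arbiter, shared 5th server (a replica id file, not a data brick):
gluster volume create gfs2vol1 replica 2 thin-arbiter 1 \
  server1:/bricks/b1 server2:/bricks/b1 server5:/bricks/thin-arbiter/gfs2vol1
EOF
)
echo "$steps"
```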
Re: [Gluster-users] 'error=No space left on device' but, there is plenty of space all nodes
Hi,

Have you checked inode usage (df -i /lvbackups/brick)?

Best Regards,
Strahil Nikolov

On Tuesday, May 2, 2023, 3:05 AM, bran...@thinkhuge.net wrote:

Hi Gluster users,

We are seeing a 'No space left on device' issue and hoping someone might advise. We have been using a 12 node glusterfs v10.4 distributed vsftpd backup cluster for years (not new) and 2 weeks ago upgraded from v9 to v10.4. I do not know if the upgrade is related to this new issue.

We are seeing the new 'error=No space left on device' error (below) on multiple gluster v10.4 nodes in the logs. At the moment we see it in the logs of about half (5 out of 12) of the nodes. The issue goes away if we reboot all the glusterfs nodes, but backups take a little over 2 days to complete each weekend, and the issue returns after about 1 day of backups running, before the backup cycle is complete. It has happened the last 2 weekends we have run backups to these nodes.

#example log msg from /var/log/glusterfs/home-volbackups.log
[2023-05-01 21:43:15.450502 +] W [MSGID: 114031] [client-rpc-fops_v2.c:670:client4_0_writev_cbk] 0-volbackups-client-18: remote operation failed. [{errno=28}, {error=No space left on device}]

Each glusterfs node has a single brick, mounts the single distributed volume as a glusterfs client locally, and receives backup files to the volume each weekend. We distribute the ftp upload load between the servers through a combination of /etc/hosts entries and AWS weighted DNS.

We have 91 TB available on the volume, and each of the 12 nodes has 4-11 TB free, so we are nowhere near out of space on any node.

We have already tried changing 'cluster.min-free-disk: 1%' to 'cluster.min-free-disk: 1GB' (an idea mentioned in https://access.redhat.com/solutions/276483) and rebooted all the gluster nodes to refresh them, and it happened again.

Does anyone know what we might check next?
glusterfs-server-10.4-1.el8s.x86_64
glusterfs-fuse-10.4-1.el8s.x86_64

Here is the info (hostnames changed) below.

[root@nybaknode1 ~]# gluster volume status volbackups detail
Status of volume: volbackups
------------------------------------------------------------------------------
Brick                : Brick nybaknode9.example.net:/lvbackups/brick
TCP Port             : 60039
RDMA Port            : 0
Online               : Y
Pid                  : 1664
File System          : xfs
Device               : /dev/mapper/vgbackups-lvbackups
Mount Options        : rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=512,swidth=512,noquota
Inode Size           : 512
Disk Space Free      : 6.1TB
Total Disk Space     : 29.0TB
Inode Count          : 3108974976
Free Inodes          : 3108881513
------------------------------------------------------------------------------
Brick                : Brick nybaknode11.example.net:/lvbackups/brick
TCP Port             : 52682
RDMA Port            : 0
Online               : Y
Pid                  : 2076
File System          : xfs
Device               : /dev/mapper/vgbackups-lvbackups
Mount Options        : rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=512,swidth=512,noquota
Inode Size           : 512
Disk Space Free      : 10.1TB
Total Disk Space     : 43.5TB
Inode Count          : 4672138432
Free Inodes          : 4672039743
------------------------------------------------------------------------------
Brick                : Brick nybaknode2.example.net:/lvbackups/brick
TCP Port             : 56722
RDMA Port            : 0
Online               : Y
Pid                  : 1761
File System          : xfs
Device               : /dev/mapper/vgbackups-lvbackups
Mount Options        : rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=512,swidth=512,noquota
Inode Size           : 512
Disk Space Free      : 6.6TB
Total Disk Space     : 29.0TB
Inode Count          : 3108921344
Free Inodes          : 3108827241
------------------------------------------------------------------------------
Brick                : Brick nybaknode3.example.net:/lvbackups/brick
TCP Port             : 53098
RDMA Port            : 0
Online               : Y
Pid                  : 1601
File System          : xfs
Device               : /dev/mapper/vgbackups-lvbackups
Mount Options        : rw,relatime,attr2,inode64,logbufs=8,logbsize=256k,sunit=512,swidth=512,noquota
Inode Size           : 512
Disk Space Free      : 6.4TB
Total Disk Space     : 29.0TB
Inode Count          : 3108921344
Free Inodes          : 3108827312
------------------------------------------------------------------------------
Brick                : Brick nybaknode4.example.net:/lvbackups/brick
TCP Port             : 51476
RDMA Port            : 0
Online               : Y
Pid                  : 1633
File System          : xfs
Device               : /dev/mapper/vgbackups-lvbackups
Mount Options        : rw,relatime,attr2,inode64,logbufs=8,logbsize=256k,sunit=512,swidth=512,noquota
Inode Size           : 512
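Strahil's `df -i` check can be scripted; a minimal sketch (the path defaults to `/` purely so it runs anywhere - on the affected nodes you would pass `/lvbackups/brick`). Note that errno=28 (ENOSPC) is raised for exhausted inodes as well as exhausted blocks, which is why the check is worth running even with terabytes free:

```shell
# Report inode usage for a brick filesystem; run on every node.
brick="${1:-/}"   # substitute your brick path, e.g. /lvbackups/brick
used_pct=$(df -i "$brick" | awk 'NR==2 {print $5}')
echo "inode usage on $brick: $used_pct"
```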
Re: [Gluster-users] Rename volume?
Most probably you can stop glusterd on all nodes (by default that doesn't stop the bricks), edit all volfiles and then start glusterd again. I guess for some reason it's no longer considered safe to rename a volume. You can ask in the gluster-devel mailing list if interested.

Best Regards,
Strahil Nikolov

On Wed, Apr 12, 2023 at 18:30, Ruediger Kupper wrote:

I noticed that there once was a rename command but it was removed. Do you know why? And is there a way to do it manually? Thanks!

--
OStR Dr. R. Kupper
Kepler-Gymnasium Freudenstadt

Am Mittwoch, April 12, 2023 17:11 CEST, schrieb Gilberto Ferreira:

> I think gluster volume rename is not available anymore since version 6.5.
>
> ---
> Gilberto Nunes Ferreira
> (47) 99676-7530 - Whatsapp / Telegram
>
> Em qua., 12 de abr. de 2023 às 11:51, Ruediger Kupper escreveu:
>
> > Hi!
> > Is it possible to rename a gluster volume? If so, what is to be done?
> > (Context: I'm trying to recover from a misconfiguration by copying all
> > contents of a volume to a new one. After that the old volume will be
> > removed and the new one needs to be renamed to the old name.)
> >
> > Thanks for your help!
> > Rüdiger
> >
> > --
> > OStR Dr. R. Kupper
> > Kepler-Gymnasium Freudenstadt

Community Meeting Calendar:
Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
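Strahil's manual approach boils down to renaming the volume's directory, file names, and every occurrence of the old name under /var/lib/glusterd/vols on every node. A sketch that rehearses the rename on a scratch stand-in first (the single file below is a simplified stand-in, not a real volfile tree; on a real cluster, stop glusterd everywhere and keep a backup before touching /var/lib/glusterd):

```shell
# Rehearse a volume rename on a scratch copy of the volfile layout.
old=oldvol; new=newvol
scratch=$(mktemp -d)
mkdir -p "$scratch/vols/$old"
printf 'volume %s-client-0\n' "$old" > "$scratch/vols/$old/$old.tcp-fuse.vol"

mv "$scratch/vols/$old" "$scratch/vols/$new"      # rename the volume directory
for f in "$scratch/vols/$new/"*; do               # rename the files
    nf=$(printf '%s' "$f" | sed "s/$old/$new/g")
    [ "$f" = "$nf" ] || mv "$f" "$nf"
done
sed -i "s/$old/$new/g" "$scratch/vols/$new/"*     # rewrite the contents

grep 'client-0' "$scratch/vols/$new/$new.tcp-fuse.vol"
# prints: volume newvol-client-0
```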
Re: [Gluster-users] bind glusterd to specified interface
One workaround is to use firewalld and add the interface to a zone where you don't allow gluster communication.

Best Regards,
Strahil Nikolov

On Mon, Apr 3, 2023 at 12:59, Gregor Burck wrote:

Hi,

after a bit of time I'm playing around with glusterfs again. For now I want to bind glusterd to a specified interface/IP address: I want to have a management net, where the service isn't reachable, and a cluster net where the service is working. I read something about defining it in the /etc/glusterfs/glusterfsd.vol file, but found no valid description of it, neither on https://docs.gluster.org nor in a man page...

Bye,

Gregor

Community Meeting Calendar:
Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
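Besides the firewalld workaround, glusterd's own volfile accepts a bind address. A hedged sketch of /etc/glusterfs/glusterd.vol (the `transport.socket.bind-address` option is the relevant knob; 10.0.0.11 is a placeholder for the cluster-net address, and the other options should be kept as shipped by your distribution):

```
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket
    option transport.socket.bind-address 10.0.0.11
end-volume
```

Note the original question mentions glusterfsd.vol, but the management daemon reads glusterd.vol; brick processes fetch their volfiles from glusterd.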
Re: [Gluster-users] hardware issues and new server advice
Hi Hubert,

I think it will be better to open a separate thread for your case. If you have HW RAID1 presented as disks, then you can easily use striped LVM or md RAID (level 0) to stripe the disks. One advantage is that you won't have to worry about gluster rebalance or an overloaded brick (multiple file access requests to the same brick), but of course it has disadvantages. Keep in mind that negative lookups (searches for non-existing/deleted objects) have the highest penalty.

Best Regards,
Strahil Nikolov

В неделя, 26 март 2023 г., 08:52:18 ч. Гринуич+3, Hu Bert написа:

Hi,

sorry if I hijack this, but maybe it's helpful for other gluster users...

> a pure NVMe-based volume will be a waste of money. Gluster excels when you have
> more servers and clients to consume that data.
> I would choose LVM cache (NVMes) + HW RAID10 of SAS 15K disks to cope with
> the load. At least if you decide to go with more disks for the raids, use
> several (not the built-in ones) controllers.

Well, we have to take what our provider (Hetzner) offers - SATA HDDs or SATA|NVMe SSDs.

Volume Name: workdata
Type: Distributed-Replicate
Number of Bricks: 5 x 3 = 15
Bricks:
Brick1: gls1:/gluster/md3/workdata
Brick2: gls2:/gluster/md3/workdata
Brick3: gls3:/gluster/md3/workdata
Brick4: gls1:/gluster/md4/workdata
Brick5: gls2:/gluster/md4/workdata
Brick6: gls3:/gluster/md4/workdata
etc.

Below are the volume settings. Each brick is a sw RAID1 (made out of 10TB HDDs). File access to the backends is pretty slow, even with low system load (which reaches >100 on the servers on high-traffic days); even a simple 'ls' on a directory with ~1000 sub-directories will take a couple of seconds. Some images:

https://abload.de/img/gls-diskutilfti5d.png
https://abload.de/img/gls-io6cfgp.png
https://abload.de/img/gls-throughput3oicf.png

As you mentioned it: is a RAID10 better than x*RAID1? Anything misconfigured?

Thx a lot & best regards,

Hubert

Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
performance.read-ahead: off
performance.io-cache: off
performance.quick-read: on
cluster.self-heal-window-size: 16
cluster.heal-wait-queue-length: 1
cluster.data-self-heal-algorithm: full
cluster.background-self-heal-count: 256
network.inode-lru-limit: 20
cluster.shd-max-threads: 8
server.outstanding-rpc-limit: 128
transport.listen-backlog: 100
performance.least-prio-threads: 8
performance.cache-size: 6GB
cluster.min-free-disk: 1%
performance.io-thread-count: 32
performance.write-behind-window-size: 16MB
performance.cache-max-file-size: 128MB
client.event-threads: 8
server.event-threads: 8
performance.parallel-readdir: on
performance.cache-refresh-timeout: 4
cluster.readdir-optimize: off
performance.md-cache-timeout: 600
performance.nl-cache: off
cluster.lookup-unhashed: on
cluster.shd-wait-qlength: 1
performance.readdir-ahead: on
storage.build-pgfid: off

Community Meeting Calendar:
Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] hardware issues and new server advice
Based on my observation, multiple small systems deal better than one large server. If you have a caching layer, then LVM cache is overkill.

Why don't you mount the old system's volume on one of the new gluster servers and 'cp' from the first FUSE mount point to the new FUSE mount point?

Best Regards,
Strahil Nikolov

On Sat, Mar 25, 2023 at 3:31, Martin Bähr wrote:

Excerpts from Strahil Nikolov's message of 2023-03-24 21:11:28 +:

> Gluster excels when you have more servers and clients to consume that data.

you mean multiple smaller servers are better than one large server?

> LVM cache (NVMes)

we only have a few clients. gluster is for us effectively only a scalable large-file storage for one application. new files are written once and then access to files is rather random (users accessing their albums), so i don't see a benefit in using a cache. (we also have a webcache which covers most of the repeated access from clients)

> @Martin,
> in order to get a more reliable setup, you will have to either get
> more servers and switch to distributed-replicated volume(s) or

that is the plan. we are not considering dispersed volumes; with the small file sizes that doesn't seem worth it. besides, with regular volumes the files remain accessible even if gluster itself fails (which is the case now: as healing causes our raid to fail, we decided to turn off gluster on the old servers and simply copy the raw files from the gluster storage to the new gluster once that is set up).

greetings, martin.

Community Meeting Calendar:
Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] hardware issues and new server advice
Actually, a pure NVMe-based volume will be a waste of money. Gluster excels when you have more servers and clients to consume that data. I would choose LVM cache (NVMes) + HW RAID10 of SAS 15K disks to cope with the load. At least if you decide to go with more disks for the raids, use several (not the built-in ones) controllers.

@Martin, in order to get a more reliable setup, you will have to either get more servers and switch to distributed-replicated volume(s) or consider getting server hardware. Dispersed volumes require a lot of CPU computations and the Ryzens won't cope with the load.

Best Regards,
Strahil Nikolov

On Thu, Mar 23, 2023 at 12:16, Hu Bert wrote:

Hi,

Am Di., 21. März 2023 um 23:36 Uhr schrieb Martin Bähr:

> the primary data is photos. we get an average of 5 new files per
> day, with a peak of 7 to 8 times as much during christmas.
>
> gluster has always been able to keep up with that, only when raid resync
> or checks happen the server load sometimes increases to cause issues.

Interesting, we have a similar workload: hundreds of millions of images, small files, and especially on weekends with high traffic the load+iowait is really heavy. Or if a hdd fails, or during a raid check.

Our hardware: 10x 10TB HDDs -> 5x RAID1, each RAID1 is a brick, replica 3 setup. About 40TB of data. Well, the bricks are bigger than recommended... Sooner or later we will have to migrate that stuff and use NVMe for it, either 3.5TB or bigger ones. Those should be faster... *fingerscrossed*

regards, Hubert

Community Meeting Calendar:
Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] How to configure?
Can you check your volume file contents? Maybe it really can't find (or access) a specific volfile?

Best Regards,
Strahil Nikolov

On Fri, Mar 24, 2023 at 8:07, Diego Zuccato wrote:

In glfsheal-Connection.log I see many lines like:
[2023-03-13 23:04:40.241481 +] E [MSGID: 104021] [glfs-mgmt.c:586:glfs_mgmt_getspec_cbk] 0-gfapi: failed to get the volume file [{from server}, {errno=2}, {error=File o directory non esistente}]
(the error string is Italian for "No such file or directory")

And *lots* of gfid-mismatch errors in glustershd.log. Couldn't find anything that would prevent heal from starting. :(

Diego

Il 21/03/2023 20:39, Strahil Nikolov ha scritto:
> I have no clue. Have you checked for errors in the logs ? Maybe you
> might find something useful.
>
> Best Regards,
> Strahil Nikolov
>
> On Tue, Mar 21, 2023 at 9:56, Diego Zuccato wrote:
> Killed glfsheal, after a day there were 218 processes, then they got
> killed by OOM during the weekend. Now there are no processes active.
> Trying to run "heal info" reports lots of files quite quickly but does
> not spawn any glfsheal process. And neither does restarting glusterd.
> Is there some way to selectively run glfsheal to fix one brick at a
> time?
>
> Diego
>
> Il 21/03/2023 01:21, Strahil Nikolov ha scritto:
> > Theoretically it might help.
> > If possible, try to resolve any pending heals.
> >
> > Best Regards,
> > Strahil Nikolov
> >
> > On Thu, Mar 16, 2023 at 15:29, Diego Zuccato wrote:
> > In Debian stopping glusterd does not stop brick processes: to stop
> > everything (and free the memory) I have to
> > systemctl stop glusterd
> > killall glusterfs{,d}
> > killall glfsheal
> > systemctl start glusterd
> > [this behaviour hangs a simple reboot of a machine running glusterd...
> > not nice]
> >
> > For now I just restarted glusterd w/o killing the bricks:
> >
> > root@str957-clustor00:~# ps aux|grep glfsheal|wc -l ; systemctl restart
> > glusterd ; ps aux|grep glfsheal|wc -l
> > 618
> > 618
> >
> > No change neither in glfsheal processes nor in free memory :(
> > Should I "killall glfsheal" before OOM kicks in?
> >
> > Diego
> >
> > Il 16/03/2023 12:37, Strahil Nikolov ha scritto:
> > > Can you restart glusterd service (first check that it was not modified
> > > to kill the bricks)?
> > >
> > > Best Regards,
> > > Strahil Nikolov
> > >
> > > On Thu, Mar 16, 2023 at 8:26, Diego Zuccato wrote:
> > > OOM is just a matter of time.
> > >
> > > Today mem use is up to 177G/187 and:
> > > # ps aux|grep glfsheal|wc -l
> > > 551
> > >
> > > (well, one is actually the grep process, so "only" 550 glfsheal
> > > processes.)
> > >
> > > I'll take the last 5:
> > > root 3266352 0.5 0.0 600292 93044 ? Sl 06:55 0:07
> > > /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
> > > root 3267220 0.7 0.0 600292 91964 ? Sl 07:00 0:07
> > > /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
> > > root 3268076 1.0 0.0 600160 88216 ? Sl 07:05 0:08
> > > /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
> > > root 3269492 1.6 0.0 600292 91248 ? Sl 07:10 0:07
> > > /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
> > > root 3270354 4.4 0.0 600292 93260 ? Sl 07:15 0:07
> > > /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
> > >
> > > -8<--
> > > root@str957-clustor00:~# ps -o ppid= 3266352
> > > 3266345
> > > root@str957-clustor00:~# ps -o ppid= 3267220
> > > 3267213
> > > root@str957-clu
Re: [Gluster-users] can't set up geo-replication: can't fetch slave details
Usually geo-rep creation fails if the ports are blocked or ssh is not set up. Have you set up password-less ssh from one of the source hosts to one of the destination hosts?

Best Regards,
Strahil Nikolov

On Tue, Mar 21, 2023 at 15:35, Kingsley Tart wrote:

Community Meeting Calendar:
Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
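For reference, the usual prerequisites before session creation succeeds, as a printed checklist (host and volume names are placeholders; check the geo-replication admin guide for your version):

```shell
# Print the typical geo-rep prerequisites; names are placeholders.
steps=$(cat <<'EOF'
ssh-keygen                             # on one primary node, as root
ssh-copy-id root@secondary-host        # password-less ssh to the secondary
gluster system:: execute gsec_create   # collect pem keys on the primary
gluster volume geo-replication primaryvol secondary-host::secondaryvol create push-pem
EOF
)
echo "$steps"
```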
Re: [Gluster-users] How to configure?
I have no clue. Have you checked for errors in the logs? Maybe you might find something useful. Best Regards, Strahil Nikolov On Tue, Mar 21, 2023 at 9:56, Diego Zuccato wrote: Killed glfsheal, after a day there were 218 processes, then they got killed by OOM during the weekend. Now there are no processes active. Trying to run "heal info" reports lots of files quite quickly but does not spawn any glfsheal process. And neither does restarting glusterd. Is there some way to selectively run glfsheal to fix one brick at a time? Diego On 21/03/2023 01:21, Strahil Nikolov wrote: > Theoretically it might help. > If possible, try to resolve any pending heals. > > Best Regards, > Strahil Nikolov > > On Thu, Mar 16, 2023 at 15:29, Diego Zuccato > wrote: > In Debian stopping glusterd does not stop brick processes: to stop > everything (and free the memory) I have to > systemctl stop glusterd > killall glusterfs{,d} > killall glfsheal > systemctl start glusterd > [this behaviour hangs a simple reboot of a machine running glusterd... > not nice] > > For now I just restarted glusterd w/o killing the bricks: > > root@str957-clustor00:~# ps aux|grep glfsheal|wc -l ; systemctl restart > glusterd ; ps aux|grep glfsheal|wc -l > 618 > 618 > > No change neither in glfsheal processes nor in free memory :( > Should I "killall glfsheal" before OOM kicks in? > > Diego > > On 16/03/2023 12:37, Strahil Nikolov wrote: > > Can you restart glusterd service (first check that it was not > modified > > to kill the bricks)? > > > > Best Regards, > > Strahil Nikolov > > > > On Thu, Mar 16, 2023 at 8:26, Diego Zuccato > > mailto:diego.zucc...@unibo.it>> wrote: > > OOM is just a matter of time. > > > > Today mem use is up to 177G/187 and: > > # ps aux|grep glfsheal|wc -l > > 551 > > > > (well, one is actually the grep process, so "only" 550 glfsheal > > processes.) > > > > I'll take the last 5: > > root 3266352 0.5 0.0 600292 93044 ? 
Sl 06:55 0:07 > > /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml > > root 3267220 0.7 0.0 600292 91964 ? Sl 07:00 0:07 > > /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml > > root 3268076 1.0 0.0 600160 88216 ? Sl 07:05 0:08 > > /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml > > root 3269492 1.6 0.0 600292 91248 ? Sl 07:10 0:07 > > /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml > > root 3270354 4.4 0.0 600292 93260 ? Sl 07:15 0:07 > > /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml > > > > -8<-- > > root@str957-clustor00:~# ps -o ppid= 3266352 > > 3266345 > > root@str957-clustor00:~# ps -o ppid= 3267220 > > 3267213 > > root@str957-clustor00:~# ps -o ppid= 3268076 > > 3268069 > > root@str957-clustor00:~# ps -o ppid= 3269492 > > 3269485 > > root@str957-clustor00:~# ps -o ppid= 3270354 > > 3270347 > > root@str957-clustor00:~# ps aux|grep 3266345 > > root 3266345 0.0 0.0 430536 10764 ? Sl 06:55 0:00 > > gluster volume heal cluster_data info summary --xml > > root 3271532 0.0 0.0 6260 2500 pts/1 S+ 07:21 0:00 > grep > > 3266345 > > root@str957-clustor00:~# ps aux|grep 3267213 > > root 3267213 0.0 0.0 430536 10644 ? Sl 07:00 0:00 > > gluster volume heal cluster_data info summary --xml > > root 3271599 0.0 0.0 6260 2480 pts/1 S+ 07:22 0:00 > grep > > 3267213 > > root@str957-clustor00:~# ps aux|grep 3268069 > > root 3268069 0.0 0.0 430536 10704 ? Sl 07:05 0:00 > > gluster volume heal cluster_data info summary --xml > > root 3271626 0.0 0.0 6260 2516 pts/1 S+ 07:22 0:00 > grep > > 3268069 > > root@str957-clustor00:~# ps aux|grep 3269485 > > root 3269485 0.0 0.0 430536 10756 ? Sl 07:10 0:00 > > gluster volume heal cluster_data info summary --xml > > root 3271647 0.0 0.0 6260 2480 pts/1 S+ 07:22 0:00 > grep > > 326
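The pile-up Diego describes — hundreds of lingering glfsheal helpers, one spawned per "gluster volume heal <vol> info" invocation — can be counted and reaped by hand. A minimal sketch, assuming nothing else on the node matches the glfsheal pattern; the threshold of 100 is an arbitrary placeholder:

```shell
# Count glfsheal helper processes (pgrep -c prints 0 and exits non-zero
# when nothing matches, hence the || true).
count=$(pgrep -c -f glfsheal || true)
echo "glfsheal helper processes: ${count:-0}"

# If they have piled up, reap them before the OOM killer does it for you.
if [ "${count:-0}" -gt 100 ]; then
    pkill -f glfsheal
fi
```

glfsheal only gathers heal status for the CLI; the healing itself is done by the self-heal daemon, so reaping stragglers here is essentially what Diego did with "killall glfsheal".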
Re: [Gluster-users] hardware issues and new server advice
Generally, the recommended approach is to have 4TB disks and no more than 10-12 per HW RAID. Of course, it's not always possible, but a resync of a failed 14 TB drive will take eons. I'm not sure if the Ryzens can support ECC memory, but if they do - go for it. In both scenarios, always align the upper layers (LVM, FS) with the stripe width and stripe size. What kind of workload do you have? Best Regards, Strahil Nikolov On Sat, Mar 18, 2023 at 14:36, Martin Bähr wrote: hi, our current servers are suffering from a weird hardware issue that forces us to start over. in short we have two servers with 15 disks at 6TB each, divided into three raid5 arrays for three bricks per server at 22TB per brick. each brick on one server is replicated to a brick on the second server. the hardware issue is that somewhere in the backplane random I/O errors happen when the system is under load. these cause the raid to fail disks, although the disks themselves are perfectly fine. reintegration of the disks causes more load and is therefore difficult. we have been running these servers for at least four years, and the problem only started appearing about three months ago. our hosting provider acknowledged the issue but does not support moving the disks to different servers. (they replaced the hardware but that didn't help) so we need to start over. my first intuition was that we should have smaller servers with fewer disks to avoid repeating the above scenario. we also previously had issues with the load created by raid resync so we are considering skipping raid altogether and relying on gluster replication instead. 
(by compensating with three replicas per brick instead of two) our options are: 6 of these: AMD Ryzen 5 Pro 3600 - 6c/12t - 3.6GHz/4.2GHz 32GB - 128GB RAM 4 or 6 × 6TB HDD SATA 6Gbit/s or three of these: AMD Ryzen 7 Pro 3700 - 8c/16t - 3.6GHz/4.4GHz 32GB - 128GB RAM 6× 14TB HDD SAS 6Gbit/s i would configure 5 bricks on each server (leaving one disk as a hot spare). the engineers prefer the second option due to the architecture and SAS disks. it is also cheaper. i am concerned that 14TB disks will take too long to heal if one ever has to be replaced and would favor the smaller disks. the other question is, is skipping raid a good idea? greetings, martin. Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
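Strahil's point about aligning LVM and the filesystem with the RAID geometry comes down to a small calculation. A sketch with placeholder numbers — read the real chunk size and data-disk count from your controller; the device paths are hypothetical:

```shell
# Hypothetical geometry: a 6-disk RAID6 with 256 KiB chunks has 4 data disks.
CHUNK_KB=256
DATA_DISKS=4
STRIPE_KB=$((CHUNK_KB * DATA_DISKS))   # full stripe = chunk * data disks

echo "su=${CHUNK_KB}k sw=${DATA_DISKS} full_stripe=${STRIPE_KB}k"

# Feed the same numbers to every layer above the array (real devices only!):
#   pvcreate --dataalignment ${STRIPE_KB}k /dev/sdX
#   mkfs.xfs -d su=${CHUNK_KB}k,sw=${DATA_DISKS} /dev/vg_brick/lv_brick
```

Misaligned upper layers turn every full-stripe write into a read-modify-write, which is part of why resync load hurts so much on parity RAID.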
Re: [Gluster-users] How to configure?
Theoretically it might help. If possible, try to resolve any pending heals. Best Regards, Strahil Nikolov On Thu, Mar 16, 2023 at 15:29, Diego Zuccato wrote: In Debian stopping glusterd does not stop brick processes: to stop everything (and free the memory) I have to systemctl stop glusterd killall glusterfs{,d} killall glfsheal systemctl start glusterd [this behaviour hangs a simple reboot of a machine running glusterd... not nice] For now I just restarted glusterd w/o killing the bricks: root@str957-clustor00:~# ps aux|grep glfsheal|wc -l ; systemctl restart glusterd ; ps aux|grep glfsheal|wc -l 618 618 No change either in glfsheal processes or in free memory :( Should I "killall glfsheal" before OOM kicks in? Diego On 16/03/2023 12:37, Strahil Nikolov wrote: > Can you restart glusterd service (first check that it was not modified > to kill the bricks)? > > Best Regards, > Strahil Nikolov > > On Thu, Mar 16, 2023 at 8:26, Diego Zuccato > wrote: > OOM is just a matter of time. > > Today mem use is up to 177G/187 and: > # ps aux|grep glfsheal|wc -l > 551 > > (well, one is actually the grep process, so "only" 550 glfsheal > processes.) > > I'll take the last 5: > root 3266352 0.5 0.0 600292 93044 ? Sl 06:55 0:07 > /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml > root 3267220 0.7 0.0 600292 91964 ? Sl 07:00 0:07 > /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml > root 3268076 1.0 0.0 600160 88216 ? Sl 07:05 0:08 > /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml > root 3269492 1.6 0.0 600292 91248 ? Sl 07:10 0:07 > /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml > root 3270354 4.4 0.0 600292 93260 ? 
Sl 07:15 0:07 > /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml > > -8<-- > root@str957-clustor00:~# ps -o ppid= 3266352 > 3266345 > root@str957-clustor00:~# ps -o ppid= 3267220 > 3267213 > root@str957-clustor00:~# ps -o ppid= 3268076 > 3268069 > root@str957-clustor00:~# ps -o ppid= 3269492 > 3269485 > root@str957-clustor00:~# ps -o ppid= 3270354 > 3270347 > root@str957-clustor00:~# ps aux|grep 3266345 > root 3266345 0.0 0.0 430536 10764 ? Sl 06:55 0:00 > gluster volume heal cluster_data info summary --xml > root 3271532 0.0 0.0 6260 2500 pts/1 S+ 07:21 0:00 grep > 3266345 > root@str957-clustor00:~# ps aux|grep 3267213 > root 3267213 0.0 0.0 430536 10644 ? Sl 07:00 0:00 > gluster volume heal cluster_data info summary --xml > root 3271599 0.0 0.0 6260 2480 pts/1 S+ 07:22 0:00 grep > 3267213 > root@str957-clustor00:~# ps aux|grep 3268069 > root 3268069 0.0 0.0 430536 10704 ? Sl 07:05 0:00 > gluster volume heal cluster_data info summary --xml > root 3271626 0.0 0.0 6260 2516 pts/1 S+ 07:22 0:00 grep > 3268069 > root@str957-clustor00:~# ps aux|grep 3269485 > root 3269485 0.0 0.0 430536 10756 ? Sl 07:10 0:00 > gluster volume heal cluster_data info summary --xml > root 3271647 0.0 0.0 6260 2480 pts/1 S+ 07:22 0:00 grep > 3269485 > root@str957-clustor00:~# ps aux|grep 3270347 > root 3270347 0.0 0.0 430536 10672 ? Sl 07:15 0:00 > gluster volume heal cluster_data info summary --xml > root 3271666 0.0 0.0 6260 2568 pts/1 S+ 07:22 0:00 grep > 3270347 > -8<-- > > Seems glfsheal is spawning more processes. > I can't rule out a metadata corruption (or at least a desync), but it > shouldn't happen... > > Diego > > Il 15/03/2023 20:11, Strahil Nikolov ha scritto: > > If you don't experience any OOM , you can focus on the heals. > > > > 284 processes of glfsheal seems odd. > > > > Can you check the ppid for 2-3 randomly picked ? 
> > ps -o ppid= > > > > Best Regards, > > Strahil Nikolov > > > > On Wed, Mar 15, 2023 at 9:54, Diego Zuccato > > mailto:diego.zucc...@unibo.it>> wrote: > > I enabled it yesterday and that greatly reduced memory pressure. > > Current volume info: > > -8<-- > > Volume Name: cluster_data > > Type: Distributed-Replicate > > Volume ID: a8caaa90-d161-45bb-a68c-278263a8531a > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 45 x (2 + 1) = 135 > > Transport
Re: [Gluster-users] How to configure?
Can you restart glusterd service (first check that it was not modified to kill the bricks)? Best Regards, Strahil Nikolov On Thu, Mar 16, 2023 at 8:26, Diego Zuccato wrote: OOM is just a matter of time. Today mem use is up to 177G/187 and: # ps aux|grep glfsheal|wc -l 551 (well, one is actually the grep process, so "only" 550 glfsheal processes.) I'll take the last 5: root 3266352 0.5 0.0 600292 93044 ? Sl 06:55 0:07 /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml root 3267220 0.7 0.0 600292 91964 ? Sl 07:00 0:07 /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml root 3268076 1.0 0.0 600160 88216 ? Sl 07:05 0:08 /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml root 3269492 1.6 0.0 600292 91248 ? Sl 07:10 0:07 /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml root 3270354 4.4 0.0 600292 93260 ? Sl 07:15 0:07 /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml -8<-- root@str957-clustor00:~# ps -o ppid= 3266352 3266345 root@str957-clustor00:~# ps -o ppid= 3267220 3267213 root@str957-clustor00:~# ps -o ppid= 3268076 3268069 root@str957-clustor00:~# ps -o ppid= 3269492 3269485 root@str957-clustor00:~# ps -o ppid= 3270354 3270347 root@str957-clustor00:~# ps aux|grep 3266345 root 3266345 0.0 0.0 430536 10764 ? Sl 06:55 0:00 gluster volume heal cluster_data info summary --xml root 3271532 0.0 0.0 6260 2500 pts/1 S+ 07:21 0:00 grep 3266345 root@str957-clustor00:~# ps aux|grep 3267213 root 3267213 0.0 0.0 430536 10644 ? Sl 07:00 0:00 gluster volume heal cluster_data info summary --xml root 3271599 0.0 0.0 6260 2480 pts/1 S+ 07:22 0:00 grep 3267213 root@str957-clustor00:~# ps aux|grep 3268069 root 3268069 0.0 0.0 430536 10704 ? Sl 07:05 0:00 gluster volume heal cluster_data info summary --xml root 3271626 0.0 0.0 6260 2516 pts/1 S+ 07:22 0:00 grep 3268069 root@str957-clustor00:~# ps aux|grep 3269485 root 3269485 0.0 0.0 430536 10756 ? 
Sl 07:10 0:00 gluster volume heal cluster_data info summary --xml root 3271647 0.0 0.0 6260 2480 pts/1 S+ 07:22 0:00 grep 3269485 root@str957-clustor00:~# ps aux|grep 3270347 root 3270347 0.0 0.0 430536 10672 ? Sl 07:15 0:00 gluster volume heal cluster_data info summary --xml root 3271666 0.0 0.0 6260 2568 pts/1 S+ 07:22 0:00 grep 3270347 -8<-- Seems glfsheal is spawning more processes. I can't rule out a metadata corruption (or at least a desync), but it shouldn't happen... Diego Il 15/03/2023 20:11, Strahil Nikolov ha scritto: > If you don't experience any OOM , you can focus on the heals. > > 284 processes of glfsheal seems odd. > > Can you check the ppid for 2-3 randomly picked ? > ps -o ppid= > > Best Regards, > Strahil Nikolov > > On Wed, Mar 15, 2023 at 9:54, Diego Zuccato > wrote: > I enabled it yesterday and that greatly reduced memory pressure. > Current volume info: > -8<-- > Volume Name: cluster_data > Type: Distributed-Replicate > Volume ID: a8caaa90-d161-45bb-a68c-278263a8531a > Status: Started > Snapshot Count: 0 > Number of Bricks: 45 x (2 + 1) = 135 > Transport-type: tcp > Bricks: > Brick1: clustor00:/srv/bricks/00/d > Brick2: clustor01:/srv/bricks/00/d > Brick3: clustor02:/srv/bricks/00/q (arbiter) > [...] 
> Brick133: clustor01:/srv/bricks/29/d > Brick134: clustor02:/srv/bricks/29/d > Brick135: clustor00:/srv/bricks/14/q (arbiter) > Options Reconfigured: > performance.quick-read: off > cluster.entry-self-heal: on > cluster.data-self-heal-algorithm: full > cluster.metadata-self-heal: on > cluster.shd-max-threads: 2 > network.inode-lru-limit: 50 > performance.md-cache-timeout: 600 > performance.cache-invalidation: on > features.cache-invalidation-timeout: 600 > features.cache-invalidation: on > features.quota-deem-statfs: on > performance.readdir-ahead: on > cluster.granular-entry-heal: enable > features.scrub: Active > features.bitrot: on > cluster.lookup-optimize: on > performance.stat-prefetch: on > performance.cache-refresh-timeout: 60 > performance.parallel-readdir: on > performance.write-behind-window-size: 128MB > cluster.self-heal-daemon: enable > features.inode-quota: on > features.quota: on > transport.address-family: inet > nfs.disable: on > performance.client-io-threads: off > client.event-threads: 1 > features.scrub-throttle: normal > diagnostics.brick-log-level: ERROR > diagnostics.client-log-level: ERROR > config.brick-threads: 0 > cluster.lookup-unhashed: on > config.client-thr
Re: [Gluster-users] How to configure?
If you don't experience any OOM, you can focus on the heals. 284 processes of glfsheal seems odd. Can you check the ppid for 2-3 randomly picked? ps -o ppid= Best Regards, Strahil Nikolov On Wed, Mar 15, 2023 at 9:54, Diego Zuccato wrote: I enabled it yesterday and that greatly reduced memory pressure. Current volume info: -8<-- Volume Name: cluster_data Type: Distributed-Replicate Volume ID: a8caaa90-d161-45bb-a68c-278263a8531a Status: Started Snapshot Count: 0 Number of Bricks: 45 x (2 + 1) = 135 Transport-type: tcp Bricks: Brick1: clustor00:/srv/bricks/00/d Brick2: clustor01:/srv/bricks/00/d Brick3: clustor02:/srv/bricks/00/q (arbiter) [...] Brick133: clustor01:/srv/bricks/29/d Brick134: clustor02:/srv/bricks/29/d Brick135: clustor00:/srv/bricks/14/q (arbiter) Options Reconfigured: performance.quick-read: off cluster.entry-self-heal: on cluster.data-self-heal-algorithm: full cluster.metadata-self-heal: on cluster.shd-max-threads: 2 network.inode-lru-limit: 50 performance.md-cache-timeout: 600 performance.cache-invalidation: on features.cache-invalidation-timeout: 600 features.cache-invalidation: on features.quota-deem-statfs: on performance.readdir-ahead: on cluster.granular-entry-heal: enable features.scrub: Active features.bitrot: on cluster.lookup-optimize: on performance.stat-prefetch: on performance.cache-refresh-timeout: 60 performance.parallel-readdir: on performance.write-behind-window-size: 128MB cluster.self-heal-daemon: enable features.inode-quota: on features.quota: on transport.address-family: inet nfs.disable: on performance.client-io-threads: off client.event-threads: 1 features.scrub-throttle: normal diagnostics.brick-log-level: ERROR diagnostics.client-log-level: ERROR config.brick-threads: 0 cluster.lookup-unhashed: on config.client-threads: 1 cluster.use-anonymous-inode: off diagnostics.brick-sys-log-level: CRITICAL features.scrub-freq: monthly cluster.data-self-heal: on cluster.brick-multiplex: on cluster.daemon-log-level: ERROR -8<-- htop 
reports that memory usage is up to 143G, there are 602 tasks and 5232 threads (~20 running) on clustor00, 117G/49 tasks/1565 threads on clustor01 and 126G/45 tasks/1574 threads on clustor02. I see quite a lot (284!) of glfsheal processes running on clustor00 (a "gluster v heal cluster_data info summary" is running on clustor02 since yesterday, still no output). Shouldn't be just one per brick? Diego Il 15/03/2023 08:30, Strahil Nikolov ha scritto: > Do you use brick multiplexing ? > > Best Regards, > Strahil Nikolov > > On Tue, Mar 14, 2023 at 16:44, Diego Zuccato > wrote: > Hello all. > > Our Gluster 9.6 cluster is showing increasing problems. > Currently it's composed of 3 servers (2x Intel Xeon 4210 [20 cores dual > thread, total 40 threads], 192GB RAM, 30x HGST HUH721212AL5200 [12TB]), > configured in replica 3 arbiter 1. Using Debian packages from Gluster > 9.x latest repository. > > Seems 192G RAM are not enough to handle 30 data bricks + 15 arbiters > and > I often had to reload glusterfsd because glusterfs processed got killed > for OOM. > On top of that, performance have been quite bad, especially when we > reached about 20M files. On top of that, one of the servers have had > mobo issues that resulted in memory errors that corrupted some > bricks fs > (XFS, it required "xfs_reparir -L" to fix). > Now I'm getting lots of "stale file handle" errors and other errors > (like directories that seem empty from the client but still containing > files in some bricks) and auto healing seems unable to complete. > > Since I can't keep up continuing to manually fix all the issues, I'm > thinking about backup+destroy+recreate strategy. > > I think that if I reduce the number of bricks per server to just 5 > (RAID1 of 6x12TB disks) I might resolve RAM issues - at the cost of > longer heal times in case a disk fails. Am I right or it's useless? > Other recommendations? > Servers have space for another 6 disks. Maybe those could be used for > some SSDs to speed up access? 
> > TIA. > > -- > Diego Zuccato > DIFA - Dip. di Fisica e Astronomia > Servizi Informatici > Alma Mater Studiorum - Università di Bologna > V.le Berti-Pichat 6/2 - 40127 Bologna - Italy > tel.: +39 051 20 95786 > > > > > Community Meeting Calendar: > > Schedule - > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > Bridge: https://meet.google.com/cpu-eiue-hvk > <https://meet.google.com/cpu-eiue-hvk> > Gluster-users mailing list > Gluster-users@gluster.org <mailto:Gluster-users@gluster.org> > https://lists.gluster.org/mailman/listinfo/gluster-users > <https://lists.
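The ps -o ppid= check Strahil asks for can be wrapped in a tiny helper. A sketch that demonstrates the technique on the current shell (the glfsheal PIDs only exist on the affected node); the commented pgrep loop is the intended real use:

```shell
# parent_of PID -> prints the parent PID of the given process
parent_of() {
    ps -o ppid= -p "$1" | tr -d ' '
}

ppid=$(parent_of $$)   # demo on the current shell
echo "parent of $$ is $ppid: $(ps -o args= -p "$ppid" 2>/dev/null)"

# On the affected node, sample a few glfsheal helpers instead:
#   for pid in $(pgrep -f glfsheal | head -3); do
#       echo "$pid <- parent $(parent_of "$pid")"
#   done
```

In the thread each parent turned out to be a separate "gluster volume heal ... info summary --xml" invocation, which is what pointed at repeated status polling rather than the self-heal daemon itself.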
Re: [Gluster-users] How to configure?
Do you use brick multiplexing? Best Regards, Strahil Nikolov On Tue, Mar 14, 2023 at 16:44, Diego Zuccato wrote: Hello all. Our Gluster 9.6 cluster is showing increasing problems. Currently it's composed of 3 servers (2x Intel Xeon 4210 [20 cores dual thread, total 40 threads], 192GB RAM, 30x HGST HUH721212AL5200 [12TB]), configured in replica 3 arbiter 1. Using Debian packages from Gluster 9.x latest repository. Seems 192G RAM is not enough to handle 30 data bricks + 15 arbiters, and I often had to reload glusterfsd because glusterfs processes got killed by OOM. On top of that, performance has been quite bad, especially when we reached about 20M files. Also, one of the servers has had mobo issues that resulted in memory errors that corrupted some bricks' fs (XFS, it required "xfs_repair -L" to fix). Now I'm getting lots of "stale file handle" errors and other errors (like directories that seem empty from the client but still contain files on some bricks) and auto healing seems unable to complete. Since I can't keep up with manually fixing all the issues, I'm thinking about a backup+destroy+recreate strategy. I think that if I reduce the number of bricks per server to just 5 (RAID1 of 6x12TB disks) I might resolve the RAM issues - at the cost of longer heal times in case a disk fails. Am I right, or is it useless? Other recommendations? Servers have space for another 6 disks. Maybe those could be used for some SSDs to speed up access? TIA. -- Diego Zuccato DIFA - Dip. 
di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Gluster 11 upgrade, glusterd crash
Start with the removal of tier-enabled from all volume files (of course back them up first) and restart glusterd. Check whether the errors are reduced. Best Regards, Strahil Nikolov On Mon, Mar 6, 2023 at 17:16, Marcus Pedersén wrote: Hi again, As I got the error: [2023-03-06 15:09:14.594977 +] E [MSGID: 106204] [glusterd-store.c:2622:glusterd_store_retrieve_bricks] 0-management: Unknown key: device_path multiple times, I tried to remove all device_path entries in the config in /var/lib/glusterd/snaps/... But that did not make any difference; glusterd still crashes with the same log output, except that these error log lines no longer appear. I do not know how to continue to figure this problem out! Best regards Marcus On Mon, Mar 06, 2023 at 09:13:05AM +0100, Marcus Pedersén wrote: > CAUTION: This email originated from outside of the organization. Do not click > links or open attachments unless you recognize the sender and know the > content is safe. > > > Hi Strahil, > > Volume info says: > > Volume Name: gds-home > Type: Replicate > Volume ID: 3d9d7182-47a8-43ac-8cd1-6a090bb4b8b9 > Status: Started > Snapshot Count: 10 > Number of Bricks: 1 x (2 + 1) = 3 > Transport-type: tcp > Bricks: > Brick1: urd-gds-022:/urd-gds/gds-home > Brick2: urd-gds-021:/urd-gds/gds-home > Brick3: urd-gds-020:/urd-gds/gds-home (arbiter) > Options Reconfigured: > features.barrier: disable > storage.fips-mode-rchecksum: on > transport.address-family: inet > nfs.disable: on > performance.client-io-threads: off > > If I look in /var/lib/glusterd/vols/gds-home/info on all three nodes it says: > tier-enabled=0 > > The snapshot config also says: > tier-enabled=0 > > As far as I can tell there are no deprecated features enabled. 
> This is the /var/lib/glusterd/vols/gds-home/info file from the arbiter: > > type=2 > count=3 > status=1 > sub_count=3 > replica_count=3 > arbiter_count=1 > disperse_count=0 > redundancy_count=0 > version=1588 > transport-type=0 > volume-id=3d9d7182-47a8-43ac-8cd1-6a090bb4b8b9 > username= > password= > op-version=4 > client-op-version=2 > quota-version=0 > tier-enabled=0 > parent_volname=N/A > restored_from_snap=---- > snap-max-hard-limit=256 > features.barrier=disable > storage.fips-mode-rchecksum=on > transport.address-family=inet > nfs.disable=on > performance.client-io-threads=off > brick-0=urd-gds-022:-urd-gds-gds-home > brick-1=urd-gds-021:-urd-gds-gds-home > brick-2=urd-gds-020:-urd-gds-gds-home > > > No, you will not see the arbiter in the status report as glusterd > does not run at all. > > Thanks for your support Strahil! > > Best regards > Marcus > > > > On Mon, Mar 06, 2023 at 06:06:00AM +, Strahil Nikolov wrote: > > CAUTION: This email originated from outside of the organization. Do not > > click links or open attachments unless you recognize the sender and know > > the content is safe. > > > > > > Somewhere tiering is enabled. > > Check the deprecated options in > > https://docs.gluster.org/en/main/Upgrade-Guide/upgrade-to-11/#the-following-options-are-removed-from-the-code-base-and-require-to-be-unset. > > > > The simplest way would be to downgrade the arbiter, ensure it works (or > > readd it back to the TSP), and remove any deprecated options before > > upgrading . > > > > Best Regards, > > Strahil Nikolov > > > > > > On Mon, Mar 6, 2023 at 8:02, Strahil Nikolov > > wrote: > > I don't see the arbiter in the status report. > > Maybe the volfiles on host1 and host2 were changed ? > > > > What is the volume info ? > > > > Best Regards, > > Strahil Nikolov > > > > On Fri, Mar 3, 2023 at 17:30, Marcus Pedersén > > wrote: > > Hi again, > > I turned up the logging level so here is a more detailed start of glusterd. > > File is enclosed. 
> > > > Thanks alot for help! > > > > Regards > > Marcus > > > > > > On Fri, Mar 03, 2023 at 03:00:46PM +0100, Marcus Pedersén wrote: > > > CAUTION: This email originated from outside of the organization. Do not > > > click links or open attachments unless you recognize the sender and know > > > the content is safe. > > > > > > > > > Hi all, > > > > > > I just started to upgrade from gluster 10.3 to gluster 11. > > > I started with my arbiter node and upgraded. > > > After upgrade gluster started after change in info file. > > > Rebooted machine and after that glusterd crashes. > > > > > > I have double checked config in /var/lib/glusterd > &
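Strahil's first step — back up the volume files, strip the deprecated tier-enabled key, restart glusterd — can be sketched as below. The block deliberately works on a scratch copy so it is safe to run anywhere; pointing it at /var/lib/glusterd (both vols/*/info and the snapshot copies under snaps/, per the thread) would do it for real. Note that glusterd keeps a checksum over these files, which is why Marcus also mentions fixing the checksum after editing info by hand.

```shell
# Demonstration on a scratch copy of the glusterd layout.
workdir=$(mktemp -d)
mkdir -p "$workdir/vols/gv0"
cat > "$workdir/vols/gv0/info" <<'EOF'
type=2
count=3
tier-enabled=0
nfs.disable=on
EOF

for f in "$workdir"/vols/*/info; do
    cp "$f" "$f.bak"                   # back up each file first
    sed -i '/^tier-enabled=/d' "$f"    # drop the deprecated key
done

grep '^tier-enabled=' "$workdir/vols/gv0/info" || echo "tier-enabled removed"
# For real: stop glusterd, repeat over /var/lib/glusterd/{vols,snaps},
# then start glusterd again and watch the log for remaining "Unknown key" lines.
```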
Re: [Gluster-users] Gluster 11 upgrade, glusterd crash
Somewhere tiering is enabled. Check the deprecated options in https://docs.gluster.org/en/main/Upgrade-Guide/upgrade-to-11/#the-following-options-are-removed-from-the-code-base-and-require-to-be-unset. The simplest way would be to downgrade the arbiter, ensure it works (or re-add it to the TSP), and remove any deprecated options before upgrading. Best Regards, Strahil Nikolov On Mon, Mar 6, 2023 at 8:02, Strahil Nikolov wrote: I don't see the arbiter in the status report. Maybe the volfiles on host1 and host2 were changed? What is the volume info? Best Regards, Strahil Nikolov On Fri, Mar 3, 2023 at 17:30, Marcus Pedersén wrote: Hi again, I turned up the logging level so here is a more detailed start of glusterd. File is enclosed. Thanks a lot for the help! Regards Marcus On Fri, Mar 03, 2023 at 03:00:46PM +0100, Marcus Pedersén wrote: > Hi all, > > I just started to upgrade from gluster 10.3 to gluster 11. > I started with my arbiter node and upgraded. > After the upgrade gluster started after a change in the info file. > Rebooted the machine and after that glusterd crashes. > > I have double-checked the config in /var/lib/glusterd > and updated /var/lib/glusterd/vols/gv0/info > so the checksum is correct. > > OS: debian bullseye (11) > > gluster volume status (from one of the other nodes that is not upgraded) > > Status of volume: gv0 > Gluster process TCP Port RDMA Port Online Pid > -- > Brick host2:/urd-gds/gv0 52902 0 Y 113172 > Brick host1:/urd-gds/gv0 61235 0 Y 5487 > Self-heal Daemon on localhost N/A N/A Y 5550 > Self-heal Daemon on host2 N/A N/A Y 113236 > > Task Status of Volume gds-home > -- > There are no active volume tasks > > I do not know where to start looking, > enclosed is a boot part from glusterd.log > > Thanks a lot for your help!! 
> > Regards > Marcus > > > > -- > ** > * Marcus Pedersén * > * System administrator * > ** > * Interbull Centre * > * * > * Department of Animal Breeding & Genetics — SLU * > * Box 7023, SE-750 07 * > * Uppsala, Sweden * > ** > * Visiting address: * > * Room 55614, Ulls väg 26, Ultuna * > * Uppsala * > * Sweden * > * * > * Tel: +46-(0)18-67 1962 * > * * > ** > * ISO 9001 Bureau Veritas No SE004561-1 * > ** > --- > När du skickar e-post till SLU så innebär detta att SLU behandlar dina > personuppgifter. För att läsa mer om hur detta går till, klicka här > <https://www.slu.se/om-slu/kontakta-slu/personuppgifter/> > E-mailing SLU will result in SLU processing your personal data. For more > information on how this is done, click here > <https://www.slu.se/en/about-slu/contact-slu/personal-data/> > [2023-03-03 13:46:01.248805 +] I [MSGID: 100030] [glusterfsd.c:2872:main] > 0-/usr/sbin/glusterd: Started running version [{arg=/usr/sbin/glusterd}, > {version=11.0}, {cmdlinestr=/usr/sbin/glusterd -p /var/run/glusterd.pid > --log-level INFO}] > [2023-03-03 13:46:01.249174 +] I [glusterfsd.c:2562:daemonize] > 0-glusterfs: Pid of current running process is 4256 > [2023-03-03 13:46:01.250935 +] I [MSGID: 0] > [glusterfsd.c:1597:volfile_init] 0-glusterfsd-mgmt: volume not found, > continuing with init > [2023-03-03 13:46:01.282327 +] I [MSGID: 106479] [glusterd.c:1660:init] > 0-management: Using /var/lib/glusterd as working directory > [2023-03-03 13:46:01.282371 +] I [MSGID: 106479] [glusterd.c:1664:init] > 0-management: Using /var/run/gluster as pid file working directory > [2023-03-03 13:46:01.288793 +] I [socket.c:973:__socket_server_bind] > 0-socket.management: process started listening on port (24007) &
Re: [Gluster-users] Gluster 11 upgrade, glusterd crash
I don't see the arbiter in the status report. Maybe the volfiles on host1 and host2 were changed? What is the volume info? Best Regards, Strahil Nikolov On Fri, Mar 3, 2023 at 17:30, Marcus Pedersén wrote: Hi again, I turned up the logging level so here is a more detailed start of glusterd. File is enclosed. Thanks a lot for the help! Regards Marcus On Fri, Mar 03, 2023 at 03:00:46PM +0100, Marcus Pedersén wrote: > Hi all, > > I just started to upgrade from gluster 10.3 to gluster 11. > I started with my arbiter node and upgraded. > After the upgrade gluster started after a change in the info file. > Rebooted the machine and after that glusterd crashes. > > I have double-checked the config in /var/lib/glusterd > and updated /var/lib/glusterd/vols/gv0/info > so the checksum is correct. > > OS: debian bullseye (11) > > gluster volume status (from one of the other nodes that is not upgraded) > > Status of volume: gv0 > Gluster process TCP Port RDMA Port Online Pid > -- > Brick host2:/urd-gds/gv0 52902 0 Y 113172 > Brick host1:/urd-gds/gv0 61235 0 Y 5487 > Self-heal Daemon on localhost N/A N/A Y 5550 > Self-heal Daemon on host2 N/A N/A Y 113236 > > Task Status of Volume gds-home > -- > There are no active volume tasks > > I do not know where to start looking, > enclosed is a boot part from glusterd.log > > Thanks a lot for your help!! 
> > Regards > Marcus > > > > -- > ** > * Marcus Pedersén * > * System administrator * > ** > * Interbull Centre * > * * > * Department of Animal Breeding & Genetics — SLU * > * Box 7023, SE-750 07 * > * Uppsala, Sweden * > ** > * Visiting address: * > * Room 55614, Ulls väg 26, Ultuna * > * Uppsala * > * Sweden * > * * > * Tel: +46-(0)18-67 1962 * > * * > ** > * ISO 9001 Bureau Veritas No SE004561-1 * > ** > --- > När du skickar e-post till SLU så innebär detta att SLU behandlar dina > personuppgifter. För att läsa mer om hur detta går till, klicka här > <https://www.slu.se/om-slu/kontakta-slu/personuppgifter/> > E-mailing SLU will result in SLU processing your personal data. For more > information on how this is done, click here > <https://www.slu.se/en/about-slu/contact-slu/personal-data/> > [2023-03-03 13:46:01.248805 +] I [MSGID: 100030] [glusterfsd.c:2872:main] > 0-/usr/sbin/glusterd: Started running version [{arg=/usr/sbin/glusterd}, > {version=11.0}, {cmdlinestr=/usr/sbin/glusterd -p /var/run/glusterd.pid > --log-level INFO}] > [2023-03-03 13:46:01.249174 +] I [glusterfsd.c:2562:daemonize] > 0-glusterfs: Pid of current running process is 4256 > [2023-03-03 13:46:01.250935 +] I [MSGID: 0] > [glusterfsd.c:1597:volfile_init] 0-glusterfsd-mgmt: volume not found, > continuing with init > [2023-03-03 13:46:01.282327 +] I [MSGID: 106479] [glusterd.c:1660:init] > 0-management: Using /var/lib/glusterd as working directory > [2023-03-03 13:46:01.282371 +] I [MSGID: 106479] [glusterd.c:1664:init] > 0-management: Using /var/run/gluster as pid file working directory > [2023-03-03 13:46:01.288793 +] I [socket.c:973:__socket_server_bind] > 0-socket.management: process started listening on port (24007) > [2023-03-03 13:46:01.291749 +] I [socket.c:916:__socket_server_bind] > 0-socket.management: closing (AF_UNIX) reuse check socket 13 > [2023-03-03 13:46:01.292549 +] I [MSGID: 106059] [glusterd.c:1923:init] > 0-management: max-port override: 60999 > [2023-03-03 
13:46:01.338404 +] E [MSGID: 106061] > [glusterd.c:597:glusterd_crt_georep_folders] 0-glusterd: Dict get failed > [{Key=
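Strahil's suggestion to compare the volfiles between nodes can be sketched as a small helper. This is a hypothetical aid, not an official tool: it assumes you have copied `/var/lib/glusterd/vols/<volume>` from each node into two local directories, and it reports any per-file checksum differences.

```shell
# Hedged sketch (not an official tool): checksum every file under two local
# copies of a node's volume directory and show any differences.
compare_voldirs() {
    dir_a=$1; dir_b=$2
    ( cd "$dir_a" && find . -type f -exec md5sum {} + | sort -k2 ) > /tmp/voldir_a.md5
    ( cd "$dir_b" && find . -type f -exec md5sum {} + | sort -k2 ) > /tmp/voldir_b.md5
    # Empty output means the two directories match file-for-file.
    diff /tmp/voldir_a.md5 /tmp/voldir_b.md5
}
```

Any line that shows up in the diff (for example the `info` file or a `*-shd.vol` file) is a candidate for the checksum mismatch glusterd complains about.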
Re: [Gluster-users] File\Directory not healing
Move away the file located on the arbiter brick as it has a different gfid, and touch it (only if the software that consumes it is NOT sensitive to atime modification). Best Regards, Strahil Nikolov On Wed, Feb 22, 2023 at 13:09, David Dolan wrote: Hi Strahil, The output in my previous email showed the directory the file is located in with a different GFID on the Arbiter node compared with the bricks on the other nodes. Based on that, do you know what my next step should be? Thanks, David On Wed, 15 Feb 2023 at 09:21, David Dolan wrote: sorry I didn't receive the previous email. I've run the command on all 3 nodes (bricks). See below. The directory only has one file. On the Arbiter, the file doesn't exist and the directory the file should be in has a different GFID than the bricks on the other nodes.
Node 1 Brick
getfattr -d -m . -e hex /path_on_brick/subdir1/subdir2/file
trusted.gfid=0x7b1aa40dd1e64b7b8aac7fc6bcbc9e9b
getfattr -d -m . -e hex /path_on_brick/subdir1/subdir2
trusted.gfid=0xdc99ac0db85d4b1c8a6af57a71bbe22c
getfattr -d -m . -e hex /path_on_brick/subdir1
trusted.gfid=0x2aa1fe9e65094e6188fc91a6d16dd2c4
Node 2 Brick
getfattr -d -m . -e hex /path_on_brick/subdir1/subdir2/file
trusted.gfid=0x7b1aa40dd1e64b7b8aac7fc6bcbc9e9b
getfattr -d -m . -e hex /path_on_brick/subdir1/subdir2
trusted.gfid=0xdc99ac0db85d4b1c8a6af57a71bbe22c
getfattr -d -m . -e hex /path_on_brick/subdir1
trusted.gfid=0x2aa1fe9e65094e6188fc91a6d16dd2c4
Node 3 Brick (Arbiter)
Path to file doesn't exist
getfattr -d -m . -e hex /path_on_brick/subdir1/subdir2
trusted.gfid=0x51cca97ac2974ceb9322fe21e6f8ea91
getfattr -d -m . -e hex /path_on_brick/subdir1
trusted.gfid=0x2aa1fe9e65094e6188fc91a6d16dd2c4
Thanks, David On Tue, 14 Feb 2023 at 20:38, Strahil Nikolov wrote: I guess you didn't receive my last e-mail. Use getfattr and identify if the gfids mismatch. If yes, move away the mismatched one. In order for a dir to heal, you have to fix all files inside it before it can be healed. 
Best Regards, Strahil Nikolov On Tuesday, 14 February 2023 at 14:04:31 GMT+2, David Dolan wrote: I've touched the directory one level above the directory with the I/O issue, as the one above that is the one showing as dirty. It hasn't healed. Should the self-heal daemon automatically kick in here? Is there anything else I can do? Thanks, David On Tue, 14 Feb 2023 at 07:03, Strahil Nikolov wrote: You can always mount it locally on any of the gluster nodes. Best Regards, Strahil Nikolov On Mon, Feb 13, 2023 at 18:13, David Dolan wrote: Hi Strahil, Thanks for that. It's the first time I've been in this position, so I'm learning as I go along. Unfortunately I can't go into the directory on the client side as I get an input/output error:
Input/output error d? ? ? ? ? ? 01
Thanks, David On Sun, 12 Feb 2023 at 20:29, Strahil Nikolov wrote: Setting blame on client-1 and client-2 will make a bigger mess. Can't you touch the affected file from the FUSE mount point? Best Regards, Strahil Nikolov On Tue, Feb 7, 2023 at 14:42, David Dolan wrote: Hi All. Hoping you can help me with a healing problem. I have one file which didn't self-heal. It looks to be a problem with a directory in the path, as one node says it's dirty. I have a replica volume with arbiter. This is what the 3 nodes say. One brick on each:
Node1
getfattr -d -m . -e hex /path/to/dir | grep afr
getfattr: Removing leading '/' from absolute path names
trusted.afr.volume-client-2=0x0001
trusted.afr.dirty=0x
Node2
getfattr -d -m . -e hex /path/to/dir | grep afr
getfattr: Removing leading '/' from absolute path names
trusted.afr.volume-client-2=0x0001
trusted.afr.dirty=0x
Node3 (Arbiter)
getfattr -d -m . -e hex /path/to/dir | grep afr
getfattr: Removing leading '/' from absolute path names
trusted.afr.dirty=0x0001
Since Node3 (the arbiter) sees it as dirty and it looks like Node 1 and Node 2 have good copies, I was thinking of running the following on Node1, which I believe would tell Node 2 and Node 3 to sync from Node 1. I'd then kick off a heal on the volume:
setfattr -n trusted.afr.volume-client-1 -v 0x0001 /path/to/dir
setfattr -n trusted.afr.volume-client-2 -v 0x0001 /path/to/dir
client-0 is node 1, client-1 is node 2 and client-2 is node 3. I've verified the hard links with gfid are in the xattrop directory. Is this the correct way to heal and resolve the issue? Thanks, David Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster
Re: [Gluster-users] File\Directory not healing
I guess you didn't receive my last e-mail. Use getfattr and identify if the gfids mismatch. If yes, move away the mismatched one. In order for a dir to heal, you have to fix all files inside it before it can be healed. Best Regards, Strahil Nikolov On Tuesday, 14 February 2023 at 14:04:31 GMT+2, David Dolan wrote: I've touched the directory one level above the directory with the I/O issue, as the one above that is the one showing as dirty. It hasn't healed. Should the self-heal daemon automatically kick in here? Is there anything else I can do? Thanks, David On Tue, 14 Feb 2023 at 07:03, Strahil Nikolov wrote: You can always mount it locally on any of the gluster nodes. Best Regards, Strahil Nikolov On Mon, Feb 13, 2023 at 18:13, David Dolan wrote: Hi Strahil, Thanks for that. It's the first time I've been in this position, so I'm learning as I go along. Unfortunately I can't go into the directory on the client side as I get an input/output error:
Input/output error d? ? ? ? ? ? 01
Thanks, David On Sun, 12 Feb 2023 at 20:29, Strahil Nikolov wrote: Setting blame on client-1 and client-2 will make a bigger mess. Can't you touch the affected file from the FUSE mount point? Best Regards, Strahil Nikolov On Tue, Feb 7, 2023 at 14:42, David Dolan wrote: Hi All. Hoping you can help me with a healing problem. I have one file which didn't self-heal. It looks to be a problem with a directory in the path, as one node says it's dirty. I have a replica volume with arbiter. This is what the 3 nodes say. One brick on each:
Node1
getfattr -d -m . -e hex /path/to/dir | grep afr
getfattr: Removing leading '/' from absolute path names
trusted.afr.volume-client-2=0x0001
trusted.afr.dirty=0x
Node2
getfattr -d -m . -e hex /path/to/dir | grep afr
getfattr: Removing leading '/' from absolute path names
trusted.afr.volume-client-2=0x0001
trusted.afr.dirty=0x
Node3 (Arbiter)
getfattr -d -m . -e hex /path/to/dir | grep afr
getfattr: Removing leading '/' from absolute path names
trusted.afr.dirty=0x0001
Since Node3 (the arbiter) sees it as dirty and it looks like Node 1 and Node 2 have good copies, I was thinking of running the following on Node1, which I believe would tell Node 2 and Node 3 to sync from Node 1. I'd then kick off a heal on the volume:
setfattr -n trusted.afr.volume-client-1 -v 0x0001 /path/to/dir
setfattr -n trusted.afr.volume-client-2 -v 0x0001 /path/to/dir
client-0 is node 1, client-1 is node 2 and client-2 is node 3. I've verified the hard links with gfid are in the xattrop directory. Is this the correct way to heal and resolve the issue? Thanks, David Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
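The advice above (find the entry whose trusted.gfid differs, then move it away) involves translating the hex gfid reported by getfattr into the brick's `.glusterfs` hard-link path, so both the named entry and its gfid link can be moved out of the way before re-triggering a heal. A hedged sketch of that translation — the function name is invented; the mapping is the standard `.glusterfs/<aa>/<bb>/<uuid>` layout:

```shell
# Hedged sketch: turn the hex trusted.gfid value printed by `getfattr -e hex`
# into the corresponding .glusterfs path (relative to the brick root).
gfid_to_path() {
    hex=${1#0x}                      # strip the 0x prefix getfattr prints
    # Re-insert the UUID dashes: 8-4-4-4-12 hex digits.
    u="${hex:0:8}-${hex:8:4}-${hex:12:4}-${hex:16:4}-${hex:20:12}"
    printf '.glusterfs/%s/%s/%s\n' "${u:0:2}" "${u:2:2}" "$u"
}
```

For example, the arbiter's mismatched `trusted.gfid=0x51cca97ac2974ceb9322fe21e6f8ea91` maps to `.glusterfs/51/cc/51cca97a-c297-4ceb-9322-fe21e6f8ea91` under the brick root; moving that alongside the directory itself is what "move away the mismatched one" amounts to.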
Re: [Gluster-users] Quick way to fix stale gfids?
That is not normal. Which version are you using? Can you provide the output from all bricks (including the arbiter): getfattr -d -m . -e hex /BRICK/PATH/TO/output_21 Troubleshooting and restoring the files should be your secondary task; you should focus on stabilizing the cluster first. Enable debug logging for the bricks if you have the space (see https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.5/html/administration_guide/configuring_the_log_level ) to troubleshoot the dying bricks. Best Regards, Strahil Nikolov On Mon, Feb 13, 2023 at 13:21, Diego Zuccato wrote: My volume is replica 3 arbiter 1, maybe that makes a difference? Brick processes tend to die quite often (I have to restart glusterd at least once a day because "gluster v info | grep ' N '" reports at least one missing brick; sometimes even if all bricks are reported up I have to kill all glusterfs[d] processes and restart glusterd). The 3 servers have 192GB RAM (that should be way more than enough!), 30 data bricks and 15 arbiters (the arbiters share a single SSD). And I noticed that some "stale file handle" errors are not reported by heal info. root@str957-cluster:/# ls -l /scratch/extra/m**/PNG/PNGQuijote/ModGrav/fNL40/ ls: cannot access '/scratch/extra/m**/PNG/PNGQuijote/ModGrav/fNL40/output_21': Stale file handle total 40 d? ? ? ? ? ? output_21 ... but "gluster v heal cluster_data info |grep output_21" returns nothing. :( Seems the other stale handles either got corrected by subsequent 'stat's or became I/O errors. Diego. On 12/02/2023 21:34, Strahil Nikolov wrote: > The 2-nd error indicates conflicts between the nodes. The only way that > could happen on replica 3 is gfid conflict (file/dir was renamed or > recreated). > > Are you sure that all bricks are online? Usually 'Transport endpoint is > not connected' indicates a brick down situation. > > First start with all stale file handles: > check md5sum on all bricks. 
If it differs somewhere, delete the gfid and > move the file away from the brick and check in FUSE. If it's fine , > touch it and the FUSE client will "heal" it. > > Best Regards, > Strahil Nikolov > > > > On Tue, Feb 7, 2023 at 16:33, Diego Zuccato > wrote: > The contents do not match exactly, but the only difference is the > "option shared-brick-count" line that sometimes is 0 and sometimes 1. > > The command you gave could be useful for the files that still needs > healing with the source still present, but the files related to the > stale gfids have been deleted, so "find -samefile" won't find anything. > > For the other files reported by heal info, I saved the output to > 'healinfo', then: > for T in $(grep '^/' healinfo |sort|uniq); do stat /mnt/scratch$T > > /dev/null; done > > but I still see a lot of 'Transport endpoint is not connected' and > 'Stale file handle' errors :( And many 'No such file or directory'... > > I don't understand the first two errors, since /mnt/scratch have been > freshly mounted after enabling client healing, and gluster v info does > not highlight unconnected/down bricks. > > Diego > > Il 06/02/2023 22:46, Strahil Nikolov ha scritto: > > I'm not sure if the md5sum has to match , but at least the content > > should do. > > In modern versions of GlusterFS the client side healing is > disabled , > > but it's worth trying. > > You will need to enable cluster.metadata-self-heal, > > cluster.data-self-heal and cluster.entry-self-heal and then create a > > small one-liner that identifies the names of the files/dirs from the > > volume heal ,so you can stat them through the FUSE. 
> > > > Something like this: > > > > > > for i in $(gluster volume heal info | awk -F '<gfid:|>' '/gfid:/ > > {print $2}'); do find /PATH/TO/BRICK/ -samefile > > /PATH/TO/BRICK/.glusterfs/${i:0:2}/${i:2:2}/$i | awk '!/.glusterfs/ > > {gsub("/PATH/TO/BRICK", "stat /MY/FUSE/MOUNTPOINT", $0); print > $0}' ; done > > > > Then just copy-paste the output and you will trigger the client-side > > heal only on the affected gfids. > > > > Best Regards, > > Strahil Nikolov > > On Monday, 6 February 2023 at 10:19:02 GMT+2, Diego Zuccato > > wrote: > > > > > > Ops... Reincluding the list that got excluded in my previous > answer :( > > > > I generated md5sums of all files in
Re: [Gluster-users] Quick way to fix stale gfids?
The 2-nd error indicates conflicts between the nodes. The only way that could happen on replica 3 is gfid conflict (file/dir was renamed or recreated). Are you sure that all bricks are online? Usually 'Transport endpoint is not connected' indicates a brick down situation. First start with all stale file handles: check the md5sum on all bricks. If it differs somewhere, delete the gfid and move the file away from the brick and check in FUSE. If it's fine, touch it and the FUSE client will "heal" it. Best Regards, Strahil Nikolov On Tue, Feb 7, 2023 at 16:33, Diego Zuccato wrote: The contents do not match exactly, but the only difference is the "option shared-brick-count" line that sometimes is 0 and sometimes 1. The command you gave could be useful for the files that still need healing with the source still present, but the files related to the stale gfids have been deleted, so "find -samefile" won't find anything. For the other files reported by heal info, I saved the output to 'healinfo', then: for T in $(grep '^/' healinfo |sort|uniq); do stat /mnt/scratch$T > /dev/null; done but I still see a lot of 'Transport endpoint is not connected' and 'Stale file handle' errors :( And many 'No such file or directory'... I don't understand the first two errors, since /mnt/scratch has been freshly mounted after enabling client healing, and gluster v info does not highlight unconnected/down bricks. Diego On 06/02/2023 22:46, Strahil Nikolov wrote: > I'm not sure if the md5sum has to match, but at least the content > should do. > In modern versions of GlusterFS the client-side healing is disabled, > but it's worth trying. > You will need to enable cluster.metadata-self-heal, > cluster.data-self-heal and cluster.entry-self-heal and then create a > small one-liner that identifies the names of the files/dirs from the > volume heal, so you can stat them through the FUSE. 
> > Something like this: > > > for i in $(gluster volume heal info | awk -F '<gfid:|>' '/gfid:/ > {print $2}'); do find /PATH/TO/BRICK/ -samefile > /PATH/TO/BRICK/.glusterfs/${i:0:2}/${i:2:2}/$i | awk '!/.glusterfs/ > {gsub("/PATH/TO/BRICK", "stat /MY/FUSE/MOUNTPOINT", $0); print $0}' ; done > > Then just copy-paste the output and you will trigger the client-side > heal only on the affected gfids. > > Best Regards, > Strahil Nikolov > On Monday, 6 February 2023 at 10:19:02 GMT+2, Diego Zuccato > wrote: > > > Ops... Reincluding the list that got excluded in my previous answer :( > > I generated md5sums of all files in vols/ on clustor02 and compared to > the other nodes (clustor00 and clustor01). > There are differences in volfiles (shouldn't it always be 1, since every > data brick is on its own fs? quorum bricks, OTOH, share a single > partition on SSD and should always be 15, but in both cases sometimes > it's 0). > > I nearly got a stroke when I saw diff output for 'info' files, but once > I sorted 'em their contents matched. Pfhew! > > Diego > > On 03/02/2023 19:01, Strahil Nikolov wrote: > > This one doesn't look good: > > > > > > [2023-02-03 07:45:46.896924 +] E [MSGID: 114079] > > [client-handshake.c:1253:client_query_portmap] 0-cluster_data-client-48: > > remote-subvolume not set in volfile [] > > > > > > Can you compare all vol files in /var/lib/glusterd/vols/ between the > nodes ? > > I have the suspicion that there is a vol file mismatch (maybe > > /var/lib/glusterd/vols//*-shd.vol). > > > > Best Regards, > > Strahil Nikolov > > > > On Fri, Feb 3, 2023 at 12:20, Diego Zuccato > > wrote: > > Can't see anything relevant in glfsheal log, just messages related to > > the crash of one of the nodes (the one that had the mobo replaced... I > > fear some on-disk structures could have been silently damaged by RAM > > errors and that makes gluster processes crash, or it's just an issue > > with enabling brick-multiplex). 
> > -8<-- > > [2023-02-03 07:45:46.896924 +] E [MSGID: 114079] > > [client-handshake.c:1253:client_query_portmap] > > 0-cluster_data-client-48: > > remote-subvolume not set in volfile [] > > [2023-02-03 07:45:46.897282 +] E > > [rpc-clnt.c:331:saved_frames_unwind] (--> > > > /lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x195)[0x7fce0c867b95] > > (--> /lib/x86_64-linux-gnu/libgfrpc.so.0(+0x72fc)[0x7fce0c0ca2fc] (--> > > > /lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x109)[0x7fce0c0d2419] > > (--> /lib
Re: [Gluster-users] File\Directory not healing
Setting blame on client-1 and client-2 will make a bigger mess. Can't you touch the affected file from the FUSE mount point? Best Regards, Strahil Nikolov On Tue, Feb 7, 2023 at 14:42, David Dolan wrote: Hi All. Hoping you can help me with a healing problem. I have one file which didn't self-heal. It looks to be a problem with a directory in the path, as one node says it's dirty. I have a replica volume with arbiter. This is what the 3 nodes say. One brick on each:
Node1
getfattr -d -m . -e hex /path/to/dir | grep afr
getfattr: Removing leading '/' from absolute path names
trusted.afr.volume-client-2=0x0001
trusted.afr.dirty=0x
Node2
getfattr -d -m . -e hex /path/to/dir | grep afr
getfattr: Removing leading '/' from absolute path names
trusted.afr.volume-client-2=0x0001
trusted.afr.dirty=0x
Node3 (Arbiter)
getfattr -d -m . -e hex /path/to/dir | grep afr
getfattr: Removing leading '/' from absolute path names
trusted.afr.dirty=0x0001
Since Node3 (the arbiter) sees it as dirty and it looks like Node 1 and Node 2 have good copies, I was thinking of running the following on Node1, which I believe would tell Node 2 and Node 3 to sync from Node 1. I'd then kick off a heal on the volume:
setfattr -n trusted.afr.volume-client-1 -v 0x0001 /path/to/dir
setfattr -n trusted.afr.volume-client-2 -v 0x0001 /path/to/dir
client-0 is node 1, client-1 is node 2 and client-2 is node 3. I've verified the hard links with gfid are in the xattrop directory. Is this the correct way to heal and resolve the issue? 
Thanks, David Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
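For reference when reading outputs like the above: each `trusted.afr.*` value packs three big-endian 32-bit counters — pending data, metadata and entry operations. A hedged decoding sketch (the values quoted in this thread are truncated in the archive, so the example input below is made up, and the function name is invented):

```shell
# Hedged sketch: split a full 24-hex-digit trusted.afr.* value (as printed
# by `getfattr -e hex`) into its three big-endian pending-op counters.
decode_afr() {
    hex=${1#0x}                      # drop the 0x prefix
    printf 'data=%d metadata=%d entry=%d\n' \
        "0x${hex:0:8}" "0x${hex:8:8}" "0x${hex:16:8}"
}
```

A non-zero data counter in a brick's `trusted.afr.<vol>-client-N` xattr means that brick is blaming client-N for pending data operations — which is why hand-setting these values, as Strahil warns, can easily make the blame matrix inconsistent.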
Re: [Gluster-users] Quick way to fix stale gfids?
I'm not sure if the md5sum has to match, but at least the content should do. In modern versions of GlusterFS the client-side healing is disabled, but it's worth trying. You will need to enable cluster.metadata-self-heal, cluster.data-self-heal and cluster.entry-self-heal and then create a small one-liner that identifies the names of the files/dirs from the volume heal, so you can stat them through the FUSE. Something like this: for i in $(gluster volume heal info | awk -F '<gfid:|>' '/gfid:/ {print $2}'); do find /PATH/TO/BRICK/ -samefile /PATH/TO/BRICK/.glusterfs/${i:0:2}/${i:2:2}/$i | awk '!/.glusterfs/ {gsub("/PATH/TO/BRICK", "stat /MY/FUSE/MOUNTPOINT", $0); print $0}' ; done Then just copy-paste the output and you will trigger the client-side heal only on the affected gfids. Best Regards, Strahil Nikolov On Monday, 6 February 2023 at 10:19:02 GMT+2, Diego Zuccato wrote: Ops... Reincluding the list that got excluded in my previous answer :( I generated md5sums of all files in vols/ on clustor02 and compared to the other nodes (clustor00 and clustor01). There are differences in volfiles (shouldn't it always be 1, since every data brick is on its own fs? quorum bricks, OTOH, share a single partition on SSD and should always be 15, but in both cases sometimes it's 0). I nearly got a stroke when I saw diff output for 'info' files, but once I sorted 'em their contents matched. Pfhew! Diego On 03/02/2023 19:01, Strahil Nikolov wrote: > This one doesn't look good: > > > [2023-02-03 07:45:46.896924 +] E [MSGID: 114079] > [client-handshake.c:1253:client_query_portmap] 0-cluster_data-client-48: > remote-subvolume not set in volfile [] > > > Can you compare all vol files in /var/lib/glusterd/vols/ between the nodes ? > I have the suspicion that there is a vol file mismatch (maybe > /var/lib/glusterd/vols//*-shd.vol). 
> > Best Regards, > Strahil Nikolov > > On Fri, Feb 3, 2023 at 12:20, Diego Zuccato > wrote: > Can't see anything relevant in glfsheal log, just messages related to > the crash of one of the nodes (the one that had the mobo replaced... I > fear some on-disk structures could have been silently damaged by RAM > errors and that makes gluster processes crash, or it's just an issue > with enabling brick-multiplex). > -8<-- > [2023-02-03 07:45:46.896924 +] E [MSGID: 114079] > [client-handshake.c:1253:client_query_portmap] > 0-cluster_data-client-48: > remote-subvolume not set in volfile [] > [2023-02-03 07:45:46.897282 +] E > [rpc-clnt.c:331:saved_frames_unwind] (--> > >/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x195)[0x7fce0c867b95] > (--> /lib/x86_64-linux-gnu/libgfrpc.so.0(+0x72fc)[0x7fce0c0ca2fc] (--> > >/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x109)[0x7fce0c0d2419] > (--> /lib/x86_64-linux-gnu/libgfrpc.so.0(+0x10308)[0x7fce0c0d3308] (--> > >/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x26)[0x7fce0c0ce7e6] > ) 0-cluster_data-client-48: forced unwinding frame type(GF-DUMP) > op(NULL(2)) called at 2023-02-03 07:45:46.891054 + (xid=0x13) > -8<-- > > Well, actually I *KNOW* the files outside .glusterfs have been deleted > (by me :) ). That's why I call those 'stale' gfids. > Affected entries under .glusterfs usually have link count = 1 => > nothing > 'find' can find. > Since I already recovered those files (before deleting from bricks), > can > .glusterfs entries be deleted too or should I check something else? > Maybe I should create a script that finds all files/dirs (not symlinks, > IIUC) in .glusterfs on all bricks/arbiters and moves 'em to a temp dir? > > Diego > > On 02/02/2023 23:35, Strahil Nikolov wrote: > > Any issues reported in /var/log/glusterfs/glfsheal-*.log ? 
> > > > The easiest way to identify the affected entries is to run: > > find /FULL/PATH/TO/BRICK/ -samefile > > > /FULL/PATH/TO/BRICK/.glusterfs/57/e4/57e428c7-6bed-4eb3-b9bd-02ca4c46657a > > > > > > Best Regards, > > Strahil Nikolov > > > > > > On Tuesday, 31 January 2023 at 11:58:24 GMT+2, Diego Zuccato > > wrote: > > > > > > Hello all. > > > > I've had one of the 3 nodes serving a "replica 3 arbiter 1" down for > > some days (apparently RAM issues, but actually a failing mobo). > > The other nodes have had some issues (RAM exhaustion, old problem > > already ticketed but still no solution) and some brick pr
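The heal-trigger one-liner discussed in this thread hinges on pulling the gfid strings out of `gluster volume heal <volname> info` output; that parsing step can be isolated and checked on its own. A hedged sketch — the helper name is invented and the sample input in the test is fabricated:

```shell
# Hedged sketch: print just the gfid of every "<gfid:...>" line on stdin,
# in the format `gluster volume heal <volname> info` uses for entries that
# could not be resolved to a path.
extract_gfids() {
    # The field separator is an ERE: split on "<gfid:" or ">", keep field 2.
    awk -F '<gfid:|>' '/<gfid:/ { print $2 }'
}
```

Each printed gfid can then be mapped to `.glusterfs/<first2>/<next2>/<gfid>` on a brick, which is exactly what the `find -samefile` step consumes.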
Re: [Gluster-users] [Gluster-devel] Regarding Glusterfs file locking
As far as I remember there are only 2 types of locking in Linux:
- Advisory
- Mandatory
In order to use mandatory locking, you need to pass the "mand" mount option to the FUSE client (mount -o mand, ...) and chmod g+s,g-x // Best Regards, Strahil Nikolov On Wednesday, 1 February 2023 at 13:22:59 GMT+2, Maaz Sheikh wrote: Team, please let us know if u have any feedback. From: Maaz Sheikh Sent: Wednesday, January 25, 2023 4:51 PM To: gluster-de...@gluster.org ; gluster-users@gluster.org Subject: Regarding Glusterfs file locking Hi, Greetings of the day, Our configuration is like: We have installed both the glusterFS server and the GlusterFS client on node1 as well as node2. We have mounted the node1 volume on both nodes. Our use case is: From glusterFS node 1, we have to take an exclusive lock and open a file (which is a shared file between both the nodes) and we should write/read in that file. From glusterFS node 2, we should not be able to read/write that file. Now the problem we are facing is: From node1, we are able to take an exclusive lock and the program has started writing in that shared file. From node2, we are able to read and write on that file, which should not happen because node1 has already acquired the lock on that file. Therefore, requesting you to please provide us a solution asap. Thanks, Maaz Sheikh, Associate Software Engineer, Impetus Technologies India NOTE: This message may contain information that is confidential, proprietary, privileged or otherwise protected by law. The message is intended solely for the named addressee. If received in error, please destroy and notify the sender. Any use of this email is prohibited when received in error. Impetus does not represent, warrant and/or guarantee, that the integrity of this communication has been maintained nor that the communication is free of errors, virus, interception or interference. 
--- Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-devel mailing list gluster-de...@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-devel Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
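To illustrate the advisory/mandatory distinction Strahil describes: with plain advisory locking, only processes that also call the locking primitive are excluded — which is why node2's non-cooperating open still succeeds. A hedged shell sketch using flock(1) (the lock-file path and function name are arbitrary):

```shell
# Advisory locking demo: while one open file description holds an exclusive
# flock, a second cooperating locker is refused, but a plain read of the
# file is NOT blocked - that would require mandatory locking.
advisory_demo() {
    lockfile=${1:-/tmp/demo.lock}
    (
        flock -n 9 || { echo "could not lock"; exit 1; }
        # A new open of the same file cannot take the lock while we hold it:
        ( flock -n 8 && echo "second lock ok" || echo "second lock refused" ) 8<"$lockfile"
        # ...yet an ordinary, non-locking read goes straight through:
        cat "$lockfile" >/dev/null && echo "plain read ok"
    ) 9>"$lockfile"
}
```

Note this only demonstrates single-host semantics; whether a given lock type is propagated between gluster clients is a separate question for the POSIX-locks translator, so treat this as background rather than a fix for the cross-node case above.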
Re: [Gluster-users] Quick way to fix stale gfids?
Any issues reported in /var/log/glusterfs/glfsheal-*.log ? The easiest way to identify the affected entries is to run: find /FULL/PATH/TO/BRICK/ -samefile /FULL/PATH/TO/BRICK/.glusterfs/57/e4/57e428c7-6bed-4eb3-b9bd-02ca4c46657a Best Regards, Strahil Nikolov On Tuesday, 31 January 2023 at 11:58:24 GMT+2, Diego Zuccato wrote: Hello all. I've had one of the 3 nodes serving a "replica 3 arbiter 1" down for some days (apparently RAM issues, but actually a failing mobo). The other nodes have had some issues (RAM exhaustion, old problem already ticketed but still no solution) and some brick processes coredumped. Restarting the processes allowed the cluster to continue working. Mostly. After the third server got fixed I started a heal, but files didn't get healed and the count (by "ls -l /srv/bricks/*/d/.glusterfs/indices/xattrop/|grep ^-|wc -l") did not decrease over 2 days. So, to recover, I copied files from bricks to temp storage (keeping both copies of conflicting files with different contents), removed the files on bricks and arbiters, and finally copied back from temp storage to the volume. Now the files are accessible but I still see lots of entries like IIUC that's due to a mismatch between .glusterfs/ contents and the normal hierarchy. Is there some tool to speed up the cleanup? Tks. -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
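The `-samefile` trick above relies on a brick keeping the user-visible file and its `.glusterfs` gfid entry as hard links to the same inode. A hedged, self-contained illustration on a throwaway directory (the directory layout, file names and sample gfid are all made up for the demo):

```shell
# Hedged demo of the `find -samefile` approach: build a miniature brick-like
# tree where the gfid entry is a hard link to the real file, then map the
# gfid entry back to the user-visible path.
demo_samefile() {
    b=$(mktemp -d)                          # stand-in for a brick root
    mkdir -p "$b/.glusterfs/57/e4" "$b/dir"
    echo data > "$b/dir/file"
    ln "$b/dir/file" "$b/.glusterfs/57/e4/57e428c7-6bed-4eb3-b9bd-02ca4c46657a"
    # Same inode => find reports both names; filter out the gfid link itself.
    find "$b" -samefile "$b/.glusterfs/57/e4/57e428c7-6bed-4eb3-b9bd-02ca4c46657a" |
        grep -v '\.glusterfs'
}
```

This also shows why Diego's "stale" entries defeat the trick: once the named copy is deleted, the `.glusterfs` entry has link count 1 and `-samefile` finds nothing but the gfid link itself.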
Re: [Gluster-users] change OS on one node
I've done something similar when upgrading from EL7 to EL8. Backup /var/lib/glusterd/ & /etc/glusterfs/ (if needed, upgrade to a gluster version that exists in both OSes) and, once you install the system, install the same gluster package version and restore the 2 dirs from backup. Of course, a proper backup is always a good idea. Best Regards, Strahil Nikolov On Mon, Jan 30, 2023 at 11:33, Stefan Kania wrote: Hi to all, I would like to replace the operating system on one gluster brick. Is it possible to keep the data on the brick? If yes, how can I connect the data partition back to the new brick? I will remove the brick from the volume and remove the peer from the pool first. Then set up the new OS, then I would like to configure the new brick and bind the existing partition to the new brick. Or is it better to format the brick and re-replicate the data? We only have 500GB of data. Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
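The backup/restore step Strahil describes can be sketched like this — the helper names are invented, and on a real node you would pass /var/lib/glusterd and /etc/glusterfs (with glusterd stopped, and the same gluster package version installed on the new OS before restoring):

```shell
# Hedged sketch: archive a set of directories and unpack them elsewhere.
backup_dirs()  { archive=$1; shift; tar czf "$archive" "$@"; }
restore_dirs() { tar xzf "$1" -C "$2"; }

# Intended (untested) use on the node being reinstalled:
#   backup_dirs /root/gluster-cfg.tgz /var/lib/glusterd /etc/glusterfs
#   ... reinstall OS, install the SAME gluster version, stop glusterd ...
#   restore_dirs /root/gluster-cfg.tgz /
```

tar strips the leading `/` from member names, so restoring with `-C /` puts both trees back in place; verify ownership and SELinux contexts afterwards before starting glusterd.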
Re: [Gluster-users] really large number of skipped files after a scrub
Great to hear that. You can also set up some logic to track the scrub status (for example an ELK stack to ingest the logs). Best Regards, Strahil Nikolov On Thursday, 19 January 2023 at 15:19:27 GMT+2, cYuSeDfZfb cYuSeDfZfb wrote: Hi, Just to follow up my first observation from this email from December: automatic scheduled scrubs that did not happen. We have now upgraded glusterfs from 7.4 to 10.1, and now see that the automated scrubs ARE running. Not sure why they didn't in 7.4, but issue solved. :-) MJ On Mon, 12 Dec 2022 at 13:38, cYuSeDfZfb cYuSeDfZfb wrote: Hi, I am running a PoC with a cluster and, as one does, I am trying to break and heal it. One of the things I am testing is scrubbing / healing. My cluster is created on ubuntu 20.04 with stock glusterfs 7.2, and my test volume info:
Volume Name: gv0
Type: Replicate
Volume ID: 7c09100b-8095-4062-971f-2cea9fa8c2bc
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gluster1:/data/brick1/gv0
Brick2: gluster2:/data/brick1/gv0
Brick3: gluster3:/data/brick1/gv0
Options Reconfigured:
features.scrub-freq: daily
auth.allow: x.y.z.q
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: off
features.bitrot: on
features.scrub: Active
features.scrub-throttle: aggressive
storage.build-pgfid: on
I have two issues: 1) scrubs are configured to run daily (see above) but they don't automatically happen. Do I need to configure something to actually get daily automatic scrubs? 2) A "scrub status" reports *many* skipped files, and only very few files that have actually been scrubbed. Why are so many files skipped? 
See:
gluster volume bitrot gv0 scrub status
Volume name : gv0
State of scrub: Active (Idle)
Scrub impact: aggressive
Scrub frequency: daily
Bitrot error log location: /var/log/glusterfs/bitd.log
Scrubber error log location: /var/log/glusterfs/scrub.log
=
Node: localhost
Number of Scrubbed files: 8112
Number of Skipped files: 51209
Last completed scrub time: 2022-12-10 04:36:55
Duration of last scrub (D:M:H:M:S): 0:16:58:53
Error count: 0
=
Node: gluster3
Number of Scrubbed files: 42
Number of Skipped files: 59282
Last completed scrub time: 2022-12-10 02:24:42
Duration of last scrub (D:M:H:M:S): 0:16:58:15
Error count: 0
=
Node: gluster2
Number of Scrubbed files: 42
Number of Skipped files: 59282
Last completed scrub time: 2022-12-10 02:24:29
Duration of last scrub (D:M:H:M:S): 0:16:58:2
Error count: 0
=
Thanks! MJ Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
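Strahil's suggestion to track scrub status over time can start with a small parser. A hedged helper (the function name is invented; the field labels match the `scrub status` text quoted above) that extracts the per-node skipped-file counts, so successive runs can be compared or shipped to a log pipeline:

```shell
# Hedged sketch: print "<node> <skipped-count>" for each node section of a
# `gluster volume bitrot <vol> scrub status` report read from stdin.
skipped_per_node() {
    awk '/^Node:/                   { node = $2 }
         /Number of Skipped files:/ { print node, $NF }'
}
```

Piping `gluster volume bitrot gv0 scrub status | skipped_per_node` into a timestamped log would make it easy to see whether the skipped count is shrinking between scrub runs.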
Re: [Gluster-users] Arbiter node in slow network
As Alan mentioned, latency is more important. Also consider using an SSD for all arbiter bricks and set maxpct (man 8 mkfs.xfs) to a high level (I prefer to use '90'). Best Regards, Strahil Nikolov

On Saturday, 31 December 2022 at 10:43:50 GMT+2, Alan Orth wrote:

Hi Filipe, I think it would probably be fine. The Red Hat Storage docs list the important thing being 5ms latency, not link speed: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.5/html/administration_guide/creating_arbitrated_replicated_volumes I haven't used an arbiter configuration yet (still stuck on distribute + replicate, not sure how to migrate). Let us know how it goes. Regards,

On Fri, Dec 2, 2022 at 6:59 PM Filipe Alvarez wrote:

Hi glusters, I'm close to deploying my first GlusterFS replica 3 arbiter 1 volume. Below I will describe my hardware / plans:

Node1: two bricks, 2 x raid0 arrays, 40gbe network
Node2: two bricks, 2 x raid0 arrays, 40gbe network
Node3: arbiter, 1gbe network

Between Node1 and Node2 I have a 40gbe network, but the arbiter has a 1gbe network. The question is: can the arbiter run on a slow network? Will it affect the general performance of the volume?
Thank you

Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -- Alan Orth alan.o...@gmail.com https://picturingjordan.com https://englishbulgaria.net https://mjanja.ch
Re: [Gluster-users] poor performance
Hi Jaco, Have you tested the performance of GlusterFS + NFS Ganesha (gluster sets up a whole corosync/pacemaker cluster for you)? I know that Ganesha uses libgfapi, and you can use NFS caching on the web servers. Best Regards, Strahil Nikolov

On Mon, Dec 19, 2022 at 9:16, Jaco Kroon wrote:

Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] GlusterFS and sysctl tweaks.
oVirt (upstream of RHV, which is also KVM-based) uses sharding, which reduces sync times as only the changed shards are synced. Check the virt group's gluster tunables at /var/lib/glusterd/groups/virt. Also in the source: https://github.com/gluster/glusterfs/blob/devel/extras/group-virt.example WARNING: ONCE SHARDING IS ENABLED, NEVER EVER DISABLE IT! Best Regards, Strahil Nikolov

On Sun, Dec 18, 2022 at 6:51, Gilberto Ferreira wrote:

Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
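[Editor's note] Applying the whole virt group is a single CLI call. A minimal sketch, where the volume name gv0 is a placeholder and the command only runs where the gluster CLI exists:

```shell
#!/bin/sh
# Placeholder volume name; substitute your own.
VOL="gv0"

# Applies every option listed in /var/lib/glusterd/groups/virt at once,
# including features.shard=on -- remember: never disable sharding afterwards.
APPLY_CMD="gluster volume set $VOL group virt"

if command -v gluster >/dev/null 2>&1; then
    $APPLY_CMD || true
else
    echo "gluster CLI not available; would run: $APPLY_CMD"
fi
```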
Re: [Gluster-users] GlusterFS and sysctl tweaks.
Gluster's tuned profile 'rhgs-random-io' has the following:

[main]
include=throughput-performance

[sysctl]
vm.dirty_ratio = 5
vm.dirty_background_ratio = 2

What kind of workload do you have (sequential IO or not)? Best Regards, Strahil Nikolov

On Fri, Dec 16, 2022 at 21:31, Gilberto Ferreira wrote: Hello! Is there any sysctl tuning to improve glusterfs regarding network configuration? Thanks --- Gilberto Nunes Ferreira (47) 99676-7530 - Whatsapp / Telegram

Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
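[Editor's note] The profile above can be materialized as a custom tuned profile. Custom profiles normally live in /etc/tuned/&lt;name&gt;/tuned.conf; this sketch writes to a scratch directory so it is safe to run anywhere, then shows the activation command as a comment:

```shell
#!/bin/sh
# Write the rhgs-random-io profile to a scratch directory (copy the
# resulting directory into /etc/tuned/ on a real system).
TUNED_ROOT="$(mktemp -d)"
PROFILE_DIR="$TUNED_ROOT/rhgs-random-io"
mkdir -p "$PROFILE_DIR"

cat > "$PROFILE_DIR/tuned.conf" <<'EOF'
[main]
include=throughput-performance

[sysctl]
vm.dirty_ratio = 5
vm.dirty_background_ratio = 2
EOF

# After copying the directory to /etc/tuned/, activate it with:
#   tuned-adm profile rhgs-random-io
cat "$PROFILE_DIR/tuned.conf"
```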
Re: [Gluster-users] poor performance
You can try qemu on top of GlusterFS (libgfapi enabled) to host the VMs with the PHP code. Best Regards, Strahil Nikolov

On Wed, Dec 14, 2022 at 17:44, Joe Julian wrote:

Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Fwd: really large number of skipped files after a scrub
By the way, what is the output of 'ps aux | grep bitd'? Best Regards, Strahil Nikolov

On Tue, Dec 13, 2022 at 15:45, Strahil Nikolov wrote:

Based on https://bugzilla.redhat.com/show_bug.cgi?id=1299737#c12 , the previous name was 'number of unsigned files'. Signing seems to be a very complex process (see http://goo.gl/Mjy4mD ) and, as far as I understand, those 'skipped' files were too new to be signed. If you do have RAID5/6, I think that bitrot detection is unnecessary. Best Regards, Strahil Nikolov

On Tue, Dec 13, 2022 at 12:33, cYuSeDfZfb cYuSeDfZfb wrote:

Hi, I am running a PoC with Gluster and, as one does, I am trying to break and heal it. One of the things I am testing is scrubbing / healing. My cluster is created on Ubuntu 20.04 with stock glusterfs 7.2, and my test volume info: Volume Name: gv0 Type: Replicate Volume ID: 7c09100b-8095-4062-971f-2cea9fa8c2bc Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: gluster1:/data/brick1/gv0 Brick2: gluster2:/data/brick1/gv0 Brick3: gluster3:/data/brick1/gv0 Options Reconfigured: features.scrub-freq: daily auth.allow: x.y.z.q transport.address-family: inet storage.fips-mode-rchecksum: on nfs.disable: on performance.client-io-threads: off features.bitrot: on features.scrub: Active features.scrub-throttle: aggressive storage.build-pgfid: on I have two issues: 1) Scrubs are configured to run daily (see above) but they don't automatically happen. Do I need to configure something to actually get daily automatic scrubs? 2) A "scrub status" reports *many* skipped files, and only very few files that have actually been scrubbed. Why are so many files skipped?
See: gluster volume bitrot gv0 scrub status Volume name : gv0 State of scrub: Active (Idle) Scrub impact: aggressive Scrub frequency: daily Bitrot error log location: /var/log/glusterfs/bitd.log Scrubber error log location: /var/log/glusterfs/scrub.log = Node: localhost Number of Scrubbed files: 8112 Number of Skipped files: 51209 Last completed scrub time: 2022-12-10 04:36:55 Duration of last scrub (D:M:H:M:S): 0:16:58:53 Error count: 0 = Node: gluster3 Number of Scrubbed files: 42 Number of Skipped files: 59282 Last completed scrub time: 2022-12-10 02:24:42 Duration of last scrub (D:M:H:M:S): 0:16:58:15 Error count: 0 = Node: gluster2 Number of Scrubbed files: 42 Number of Skipped files: 59282 Last completed scrub time: 2022-12-10 02:24:29 Duration of last scrub (D:M:H:M:S): 0:16:58:2 Error count: 0 = Thanks! MJ

Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
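[Editor's note] Files that have been signed by the bitrot daemon carry a trusted.bit-rot.signature extended attribute on the brick, which is one way to see whether a given file is still "unsigned/skipped". A sketch; the brick-side path is a placeholder, and the command only runs if getfattr is installed and the file exists:

```shell
#!/bin/sh
# Placeholder brick-side path of the file to inspect.
BRICK_FILE="/data/brick1/gv0/some/file"

# Signed files carry this xattr on the brick; freshly written
# ("skipped") files typically don't yet.
CHECK_CMD="getfattr -n trusted.bit-rot.signature -e hex $BRICK_FILE"

if command -v getfattr >/dev/null 2>&1 && [ -e "$BRICK_FILE" ]; then
    $CHECK_CMD || true
else
    echo "would run: $CHECK_CMD"
fi
```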
Re: [Gluster-users] Fwd: really large number of skipped files after a scrub
Based on https://bugzilla.redhat.com/show_bug.cgi?id=1299737#c12 , the previous name was 'number of unsigned files'. Signing seems to be a very complex process (see http://goo.gl/Mjy4mD ) and, as far as I understand, those 'skipped' files were too new to be signed. If you do have RAID5/6, I think that bitrot detection is unnecessary. Best Regards, Strahil Nikolov

On Tue, Dec 13, 2022 at 12:33, cYuSeDfZfb cYuSeDfZfb wrote:

Hi, I am running a PoC with Gluster and, as one does, I am trying to break and heal it. One of the things I am testing is scrubbing / healing. My cluster is created on Ubuntu 20.04 with stock glusterfs 7.2, and my test volume info: Volume Name: gv0 Type: Replicate Volume ID: 7c09100b-8095-4062-971f-2cea9fa8c2bc Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: gluster1:/data/brick1/gv0 Brick2: gluster2:/data/brick1/gv0 Brick3: gluster3:/data/brick1/gv0 Options Reconfigured: features.scrub-freq: daily auth.allow: x.y.z.q transport.address-family: inet storage.fips-mode-rchecksum: on nfs.disable: on performance.client-io-threads: off features.bitrot: on features.scrub: Active features.scrub-throttle: aggressive storage.build-pgfid: on I have two issues: 1) Scrubs are configured to run daily (see above) but they don't automatically happen. Do I need to configure something to actually get daily automatic scrubs? 2) A "scrub status" reports *many* skipped files, and only very few files that have actually been scrubbed. Why are so many files skipped?
See: gluster volume bitrot gv0 scrub status Volume name : gv0 State of scrub: Active (Idle) Scrub impact: aggressive Scrub frequency: daily Bitrot error log location: /var/log/glusterfs/bitd.log Scrubber error log location: /var/log/glusterfs/scrub.log = Node: localhost Number of Scrubbed files: 8112 Number of Skipped files: 51209 Last completed scrub time: 2022-12-10 04:36:55 Duration of last scrub (D:M:H:M:S): 0:16:58:53 Error count: 0 = Node: gluster3 Number of Scrubbed files: 42 Number of Skipped files: 59282 Last completed scrub time: 2022-12-10 02:24:42 Duration of last scrub (D:M:H:M:S): 0:16:58:15 Error count: 0 = Node: gluster2 Number of Scrubbed files: 42 Number of Skipped files: 59282 Last completed scrub time: 2022-12-10 02:24:29 Duration of last scrub (D:M:H:M:S): 0:16:58:2 Error count: 0 = Thanks! MJ

Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Arbiter node in slow network
As the arbiter doesn't receive or provide any data to the clients, just metadata, bandwidth is not critical but latency is. Ensure that latency is the same or lower for the arbiter node, and you can use an SSD/NVMe to ensure that storage latency won't be a bottleneck. Also, don't forget to specify isize=512 and bump 'maxpct' to a bigger number; usually I set it to a minimum of 80%. Best Regards, Strahil Nikolov

On Fri, Dec 2, 2022 at 18:59, Filipe Alvarez wrote:

Hi glusters, I'm close to deploying my first GlusterFS replica 3 arbiter 1 volume. Below I will describe my hardware / plans:

Node1: two bricks, 2 x raid0 arrays, 40gbe network
Node2: two bricks, 2 x raid0 arrays, 40gbe network
Node3: arbiter, 1gbe network

Between Node1 and Node2 I have a 40gbe network, but the arbiter has a 1gbe network. The question is: can the arbiter run on a slow network? Will it affect the general performance of the volume? Thank you

Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
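[Editor's note] Both mkfs.xfs settings mentioned above (inode size 512 and a high maxpct) are -i suboptions set at mkfs time. A minimal sketch with a placeholder device, deliberately guarded so nothing is ever formatted when run as-is:

```shell
#!/bin/sh
# Placeholder block device for the arbiter brick -- never point this
# at a device that holds data.
DEV="/dev/sdX"

# size=512 leaves room for Gluster's xattrs inside the inode;
# maxpct raises the ceiling on inode space, useful for a metadata-only
# arbiter brick that stores many tiny (empty) files.
MKFS_CMD="mkfs.xfs -i size=512,maxpct=80 $DEV"

if [ -b "$DEV" ]; then
    echo "refusing to format automatically; review and run: $MKFS_CMD"
else
    echo "would run: $MKFS_CMD"
fi
```

Note that maxpct can also be raised later on a mounted filesystem with 'xfs_growfs -m', whereas the inode size is fixed at mkfs time.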
Re: [Gluster-users] Right way to use community Gluster on genuine RHEL?
Keep in mind that even if they do work, once Stream 9 is dead - there will be no more packages from the CentOS Storage SIG. Best Regards,Strahil Nikolov On Mon, Nov 28, 2022 at 13:30, Ville-Pekka Vainio wrote: > On 23. Nov 2022, at 1.03, Strahil Nikolov wrote: > > To be honest, I have no idea how to answer that. > Rocky guys do not want to duplicate the work, as we got GlusterFS in CentOS > Stream (see https://forums.rockylinux.org/t/glusterfs-repos-location/6630/7 ) > while the CentOS guys confirmed my suspicion -> once Stream is dead, the > whole infra will be deleted (see > https://www.spinics.net/lists/centos-devel/msg21628.html ) and no more > packages will be available. Thanks for attempting to answer! The discussions seem to have been mainly around EL8 and we are interested in EL9. It seems we will need to test CentOS Storage SIG packages against EL9 stable although they have been built against CentOS 9 Stream. If there are any compatibility issues, we will need to rebuild Gluster from the source RPMs. Best regards, Ville-Pekka Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Right way to use community Gluster on genuine RHEL?
To be honest, I have no idea how to answer that. The Rocky guys do not want to duplicate the work, as we got GlusterFS in CentOS Stream (see https://forums.rockylinux.org/t/glusterfs-repos-location/6630/7 ), while the CentOS guys confirmed my suspicion -> once Stream is dead, the whole infra will be deleted (see https://www.spinics.net/lists/centos-devel/msg21628.html ) and no more packages will be available. As IBM's Red Hat is now abandoning GlusterFS, I see a lot of challenges for the project. Best Regards, Strahil Nikolov

On Mon, Nov 21, 2022 at 13:55, Ville-Pekka Vainio wrote:

Hi! I’m reviving an old thread, because my questions are related to the original topic. Are there any community packages available in any repository which are built against the stable Alma or Rocky 9? Is there a risk that Gluster packages built against CentOS 9 Stream would at some point be incompatible with the stable Alma/Rocky 9? RH promises ABI compatibility for some libraries, but if I understand correctly, liburcu (the RPM named userspace-rcu) is not one of those and there’s a chance it could change between EL9 minor releases. I’m able to rebuild the srpm, if needed, but it’d be convenient if there was a repository. Best regards, Ville-Pekka

Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Doubts re: remove-brick
Have you checked https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Features/rebalance/ ? I know it's old, but it might provide some clarity. Files are migrated from the source subvolume to the new subvolume and then removed from the source. Removed bricks do not get any writes, because during the preparation a rebalance is issued which notifies the clients to use the new DHT subvolume. Best Regards, Strahil Nikolov

On Fri, Nov 18, 2022 at 15:52, Diego Zuccato wrote:

Hello all. I need to reorganize the bricks (making RAID1 on the backing devices to reduce memory used by Gluster processes) and I have a couple of doubts:
- do moved (rebalanced) files get removed from source bricks, so at the end I only have the files that received writes?
- do bricks being removed continue getting writes for new files?
Tks.
-- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786

Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
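[Editor's note] The remove-brick sequence the answer describes (start the migration, wait, then commit) can be sketched like this; the volume and brick names are placeholders, and the commands are only echoed unless the gluster CLI is present:

```shell
#!/bin/sh
# Placeholder volume and brick; substitute your own.
VOL="gv0"
BRICK="server1:/data/brick1/gv0"

# start:  begins migrating files off the brick; clients stop placing
#         new files on it once the layout is fixed
# status: wait until it reports 'completed' before committing
# commit: actually drops the brick from the volume
START_CMD="gluster volume remove-brick $VOL $BRICK start"
STATUS_CMD="gluster volume remove-brick $VOL $BRICK status"
COMMIT_CMD="gluster volume remove-brick $VOL $BRICK commit"

if command -v gluster >/dev/null 2>&1; then
    $START_CMD || true
    $STATUS_CMD || true
    # run $COMMIT_CMD only after status shows 'completed'
else
    printf '%s\n' "$START_CMD" "$STATUS_CMD" "$COMMIT_CMD"
fi
```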
Re: [Gluster-users] replica performance and brick size best practice
According to RH, the most optimal would be to have:
- Disk size: 3-4TB (faster resync after failure)
- Disk count: 10-12
- HW raid: as you can also see in the picture, the optimal one for writes is RAID10 https://community.hpe.com/t5/servers-systems-the-right/what-are-raid-levels-and-which-are-best-for-you/ba-p/7041151

The full stripe size should be between 1MB and 2MB (prefer staying closer to 1MB). I'm not sure of the HW raid controller's capabilities, but I would also switch the I/O scheduler to 'none' (first-in first-out while merging the requests). Ensure that you have a battery-backed cache and that the cache ratio of the controller leans towards writes (something like 25% read, 75% write).

Jumbo frames are recommended but not mandatory. Still, they will reduce the number of packets processed by your infrastructure, which is always beneficial.

Tuned profile: You can find the tuned profiles that were usually shipped with Red Hat's Gluster Storage at https://ftp.redhat.com/redhat/linux/enterprise/7Server/en/RHS/SRPMS/redhat-storage-server-3.5.0.0-8.el7rhgs.src.rpm I will type the contents of the random-io profile here, so please double check it for typos.

# /etc/tuned/rhgs-random-io/tuned.conf:
[main]
include=throughput-performance

[sysctl]
vm.dirty_ratio = 5
vm.dirty_background_ratio = 2

Don't forget to install tuned before that.

For small files, follow the guidelines from https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.5/html/administration_guide/small_file_performance_enhancements Note: Do not use Gluster v9, and update your version to the latest minor one (for example, if you use v10 -> update to 10.3). In Gluster v10 a major improvement was done for small files, and v9 is out of support now.

For XFS: mount the bricks with 'noatime'.
If you use SELinux, use the following: noatime,context="system_u:object_r:glusterd_brick_t:s0" Also, consider setting gluster's option 'cluster.min-free-disk' to something that makes sense for you (for details run 'gluster volume set help'). Of course, do benchmarking with the application itself, both before and after you make a change. Best Regards, Strahil Nikolov

On Mon, Nov 14, 2022 at 13:33, beer Ll wrote:

Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
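[Editor's note] Put together, a brick's fstab entry with the mount options above might look like the fragment below. The device and mount point are illustrative; the SELinux context value is the one quoted in the mail:

```
# /etc/fstab -- hypothetical XFS brick mounted with noatime and the
# SELinux context from the thread (all options on one line):
/dev/mapper/vg_bricks-brick1  /data/brick1  xfs  noatime,inode64,context="system_u:object_r:glusterd_brick_t:s0"  0 0
```

The min-free-disk suggestion is a plain volume option, e.g. 'gluster volume set gv0 cluster.min-free-disk 10%' (volume name and percentage are placeholders).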
Re: [Gluster-users] replica performance and brick size best practice
Hi, First you need to identify what kind of workload you will have. Some optimizations for one workload can prevent better performance in another type. If you plan to use the volume for small files, this document is a good start: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.5/html/administration_guide/small_file_performance_enhancements Replica 2 volumes are prone to split-brain, and it's always recommended to have a replica 3 or an arbiter. As a best practice, always test the volume with the application that will use it, as synthetic benchmarks are just that: synthetic. I always start a performance review from the bottom (infrastructure) and end at the application level. You can start with https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.5/html/administration_guide/chap-configuring_red_hat_storage_for_enhancing_performance as the storage is one of the most important parts. What kind of HW raid, how many disks, and what stripe size and stripe width did you use in LVM? Do you use thin LVM? How did you create your XFS (isize is critical)? Have you used gluster's tuned profile? Jumbo frames? Then you will need to optimize the volume for small files (see the link above). Does your app allow you to use libgfapi? Based on my observations in the oVirt list, libgfapi used to provide some performance benefits compared to FUSE. Also, if you work with very small files, it would make sense to combine them in some container (like in VM disks). Keep in mind that GlusterFS performance scales with the size of the cluster and the number of clients. For ultra high performance for a few clients -> there are other options.
Best Regards, Strahil Nikolov

On Wed, Nov 9, 2022 at 12:05, beer Ll wrote:

Hi, I have 2 gluster servers (backup1, backup2) connected with a 10Gbit link, glusterfs version 10.3-1 server and client. Each server has a 44T raid 6 disk array with 1 partition used for the brick:

/dev/VG_BT_BRICK1/LV_BT_BRICK1 [ 43.66 TiB]
/dev/mapper/VG_BT_BRICK1-LV_BT_BRICK1 on /mnt/glusterfs type xfs (rw,noatime,nouuid,attr2,inode64,sunit=256,swidth=1536,noquota)

I created a replica volume named backup:

Volume Name: backup
Type: Replicate
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: backup-1:/mnt/glusterfs/brick1
Brick2: backup-2:/mnt/glusterfs/brick1
Options Reconfigured:
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.enable-shared-storage: enable

The volume backup is mounted with the gluster client on /mnt/share:

backup-1:/backup on /mnt/share type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

Test with the smallfile utility:

XFS (filesystem xfs on /mnt/glusterfs)
total threads = 8
total files = 800
100.00% of requested files processed, warning threshold is 70.00%
elapsed time = 0.009
files/sec = 120927.642211

GLUSTERFS CLIENT (glusterfs on /mnt/share)
total threads = 8
total files = 800
100.00% of requested files processed, warning threshold is 70.00%
elapsed time = 3.014
files/sec = 284.975861

How is it possible to increase the performance of the glusterfs volume? What is best practice for brick size and replica management? Is 1 big brick per server better, or more small bricks distributed?
Many Thanks

Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
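[Editor's note] A few volume options commonly suggested for small-file workloads (drawn from the Red Hat small-file guide linked earlier in the thread) could be applied as below. The volume name comes from the post, the specific values are illustrative, gains vary by workload, and the commands only run where the gluster CLI exists:

```shell
#!/bin/sh
# Volume name from the thread.
VOL="backup"

# Options often suggested for metadata-heavy / small-file workloads;
# benchmark with the real application before and after, as advised above.
set -- \
  "performance.parallel-readdir on" \
  "network.inode-lru-limit 200000" \
  "performance.md-cache-timeout 600" \
  "features.cache-invalidation on"

for opt in "$@"; do
    CMD="gluster volume set $VOL $opt"
    if command -v gluster >/dev/null 2>&1; then
        $CMD || true
    else
        echo "would run: $CMD"
    fi
done
```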
Re: [Gluster-users] Question on stale shards with distribute-replicate volume
I skimmed over it, so take everything I say with a grain of salt. Based on the logs, the gfid for one of the cases is clear -> b42dc8f9-755e-46be-8418-4882a9f765e1 and shard 5613. As there is a linkto, most probably the shard's location was on another subvolume, and in such a case I would just "walk" over all bricks and get the extended file attributes of the real ones. I can't imagine why it happened, but I do suspect a gfid split-brain. If I were in your shoes, I would check the gfids and assume that those with the same gfid value are the good ones (usually the one that differs has an older timestamp), and I would remove the copy from the last brick and check if it fixes things for me. Best Regards, Strahil Nikolov

On Thu, Nov 3, 2022 at 17:24, Ronny Adsetts wrote:

Hi, We have a 4 x (2 + 1) distribute-replicate volume with sharding enabled. We use the volume for storing backing files for iscsi devices. The iscsi devices are provided to our file server using tgtd, using the glfs backing store type via libgfapi. So we had a problem the other day where one of the filesystems wouldn't re-mount following a rolling tgtd restart (we have 4 servers providing tgtd). I think this rolling restart was done too quickly, which meant there was a disconnect at the file server end (speculating). After some investigation, and manually trying to copy the fs image file to a temporary location, I found 0 byte shards. Because I mounted the file directly, I got errors in the gluster logs (/var/log/glusterfs/srv-iscsi.log) for the volume. I get no errors in gluster logs when this happens via libgfapi, though I did see tgtd errors. The tgtd errors look like this:

tgtd[24080]: tgtd: bs_glfs_request(279) Error on read 1000
tgtd: bs_glfs_request(370) io error 0x55da8d9820b0 2 28 -1 4096 376698519552, Stale file handle

Not sure how to figure out which shard is the issue out of that log entry.
:-) The gluster logs look like this:

[2022-11-01 16:51:28.496911] E [MSGID: 133010] [shard.c:2342:shard_common_lookup_shards_cbk] 0-iscsi-shard: Lookup on shard 5613 failed. Base file gfid = b42dc8f9-755e-46be-8418-4882a9f765e1 [Stale file handle]
[2022-11-01 19:17:09.060376] E [MSGID: 133010] [shard.c:2342:shard_common_lookup_shards_cbk] 0-iscsi-shard: Lookup on shard 5418 failed. Base file gfid = b42dc8f9-755e-46be-8418-4882a9f765e1 [Stale file handle]

So there were the two shards showing up as problematic. Checking the shard files showed that they were 0 byte with a trusted.glusterfs.dht.linkto value in the file attributes. There were other shard files of the same name with the correct size, so I guess the shard had been moved at some point, resulting in the 0 byte linkto copies. Anyway, moving the offending .shard and associated .glusterfs files out of the way resulted in me being able to first copy the file without error, and then run an "xfs_repair -L" on the filesystem and get it remounted. There was some data loss, but minor as far as I can tell. So the two shards I removed (replica 2 + arbiter) look like so:

ronny@cogline:~$ ls -al /tmp/publichomes-backup-stale-shards/.shard/
total 0
drwxr-xr-x 2 root root 104 Nov 2 00:13 .
drwxr-xr-x 4 root root 38 Nov 2 00:05 ..
---------T 1 root root 0 Aug 14 04:26 b42dc8f9-755e-46be-8418-4882a9f765e1.5418
---------T 1 root root 0 Oct 25 10:49 b42dc8f9-755e-46be-8418-4882a9f765e1.5613

ronny@keratrix:~$ ls -al /tmp/publichomes-backup-stale-shards/.shard/
total 0
drwxr-xr-x 2 root root 104 Nov 2 00:13 .
drwxr-xr-x 4 root root 38 Nov 2 00:07 ..
---------T 1 root root 0 Aug 14 04:26 b42dc8f9-755e-46be-8418-4882a9f765e1.5418
---------T 1 root root 0 Oct 25 10:49 b42dc8f9-755e-46be-8418-4882a9f765e1.5613

ronny@bellizen:~$ ls -al /tmp/publichomes-backup-stale-shards/.shard/
total 0
drwxr-xr-x 2 root root 55 Nov 2 00:07 .
drwxr-xr-x 4 root root 38 Nov 2 00:07 ..
---------T 1 root root 0 Oct 25 10:49 b42dc8f9-755e-46be-8418-4882a9f765e1.5613

ronny@risca:~$ ls -al /tmp/publichomes-backup-stale-shards/.shard/
total 0
drwxr-xr-x 2 root root 55 Nov 2 00:13 .
drwxr-xr-x 4 root root 38 Nov 2 00:13 ..
---------T 1 root root 0 Aug 14 04:26 b42dc8f9-755e-46be-8418-4882a9f765e1.5418

So the first question is: did I do the right thing to get this resolved? The other, and more important, question now relates to "Stale file handle" errors we are now seeing on a different filesystem. I only have tgtd log entries for this and wondered if anyone could help with taking a log entry and somehow figuring out which shard is the problematic one:

tgtd[3052]: tgtd: bs_glfs_request(370) io error 0x56404e0dc510 2 2a -1 1310720 428680884224, Stale file handle

Thanks for any help anyone can provide. Ronny -- Ronny Adsetts Technical Director Amazing Internet Ltd, London t: +44 20 8977 8943 w: www.amazinginternet.com Registered office: 85 Waldegrave Park, Twickenham, TW1 4TJ Registered in England. Company No. 40429
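[Editor's note] The listings above show the on-brick layout this thread relies on: shard n of a file lives under .shard/ as &lt;gfid&gt;.&lt;n&gt;, and the base file's gfid hardlink lives under .glusterfs/&lt;first two hex chars&gt;/&lt;next two&gt;/&lt;gfid&gt;. A small sketch that maps a log entry's gfid and shard number onto those paths; the brick root is a placeholder:

```shell
#!/bin/sh
# gfid and shard number taken from the log lines in the thread.
GFID="b42dc8f9-755e-46be-8418-4882a9f765e1"
SHARD=5613

# Placeholder brick root; repeat for every brick of every subvolume.
BRICK="/data/brick1/iscsi"

# Shards > 0 live in the hidden .shard directory as <gfid>.<n>.
SHARD_PATH="$BRICK/.shard/$GFID.$SHARD"

# The base file's hardlink lives under .glusterfs/<aa>/<bb>/<gfid>.
GFID_PATH="$BRICK/.glusterfs/$(echo "$GFID" | cut -c1-2)/$(echo "$GFID" | cut -c3-4)/$GFID"

echo "$SHARD_PATH"
echo "$GFID_PATH"
# On a real brick, inspect the DHT/AFR xattrs with something like:
#   getfattr -d -m . -e hex "$SHARD_PATH"
```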
Re: [Gluster-users] Gluster 5.10 rebalance stuck
I would check the details in /var/lib/glusterd/vols//node_state.info Best Regards, Strahil Nikolov

On Wed, Nov 2, 2022 at 9:06, Shreyansh Shah wrote:

Hi, I would really appreciate it if someone were able to help with the above issue. We are stuck, as we cannot run rebalance due to this and thus are not able to extract peak performance from the setup due to unbalanced data. Adding gluster info (without the bricks) below. Please let me know if any other details/logs are needed.

Volume Name: data
Type: Distribute
Volume ID: 75410231-bb25-4f14-bcde-caf18fce1d31
Status: Started
Snapshot Count: 0
Number of Bricks: 41
Transport-type: tcp
Options Reconfigured:
server.event-threads: 4
network.ping-timeout: 90
client.keepalive-time: 60
server.keepalive-time: 60
storage.health-check-interval: 60
performance.client-io-threads: on
nfs.disable: on
transport.address-family: inet
performance.cache-size: 8GB
performance.cache-refresh-timeout: 60
cluster.min-free-disk: 3%
client.event-threads: 4
performance.io-thread-count: 16

On Fri, Oct 28, 2022 at 11:40 AM Shreyansh Shah wrote:

Hi, We are running a glusterfs 5.10 server volume. Recently we added a few new bricks and started a rebalance operation. After a couple of days the rebalance operation was just stuck, with one of the peers showing In-Progress with no file being read/transferred and the rest showing Failed/Completed, so we stopped it using "gluster volume rebalance data stop". Now when we are trying to start it again, we get the below error. Any assistance would be appreciated.

root@gluster-11:~# gluster volume rebalance data status
volume rebalance: data: failed: Rebalance not started for volume data.
root@gluster-11:~# gluster volume rebalance data start
volume rebalance: data: failed: Rebalance on data is already started
root@gluster-11:~# gluster volume rebalance data stop
volume rebalance: data: failed: Rebalance not started for volume data.

-- Regards, Shreyansh Shah AlphaGrep Securities Pvt. Ltd.
-- Regards, Shreyansh Shah AlphaGrep Securities Pvt. Ltd. Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
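[Editor's note] Strahil's pointer can be checked on each node like this; the volume name 'data' comes from the thread (the original mail's double slash elides it), and the file is only read if it exists and is readable:

```shell
#!/bin/sh
# Volume name from the thread.
VOL="data"

# Per-node rebalance bookkeeping kept by glusterd; if it still records a
# rebalance in progress, that would explain the 'already started' error.
STATE_FILE="/var/lib/glusterd/vols/$VOL/node_state.info"

if [ -r "$STATE_FILE" ]; then
    cat "$STATE_FILE"
else
    echo "not readable here; on each node inspect: $STATE_FILE"
fi
```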