Re: [Gluster-users] gluster volume tier option missing
It was deprecated quite a long time ago, like RDMA support :(

Diego

On 21/02/2024 18:43, garcetto wrote:
> Good evening, I am new to GlusterFS and read about the tiering option, but I cannot find it in the current v10 packages on Ubuntu 22 LTS. My fault? Thank you.
> https://www.gluster.org/automated-tiering-in-gluster-2/

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
Re: [Gluster-users] Geo-replication status is getting Faulty after a few seconds
Have you tried setting up gluster geo-replication with a dedicated non-root user?

Best Regards,
Strahil Nikolov

On Tue, Feb 6, 2024 at 16:38, Anant Saraswat <anant.saras...@techblue.co.uk> wrote:
> [...]
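For reference, a condensed sketch of the documented non-root ("mountbroker") geo-replication setup that the question above refers to. The geoaccount/geogroup names, volume names and hosts are placeholders; the geo-replication admin guide has the full procedure.

# On the secondary (slave) nodes:
groupadd geogroup
useradd -G geogroup geoaccount
gluster-mountbroker setup /var/mountbroker-root geogroup
gluster-mountbroker add slavevol geoaccount
systemctl restart glusterd

# On the primary (master) node:
gluster system:: execute gsec_create
gluster volume geo-replication mastervol geoaccount@slavehost::slavevol create push-pem
# (the guide's remaining step, run once on the secondary, distributes the pem keys:
#  /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh geoaccount mastervol slavevol)
gluster volume geo-replication mastervol geoaccount@slavehost::slavevol start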
Re: [Gluster-users] Challenges with Replicated Gluster volume after stopping Gluster on any node.
Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems
I don't want to hijack the thread. And in my case setting logs to debug would fill my /var partitions in no time. Maybe the OP can.

Diego

On 18/01/2024 22:58, Strahil Nikolov wrote:
Are you able to set the logs to debug level? It might provide a clue what is going on.

Best Regards,
Strahil Nikolov

On Thu, Jan 18, 2024 at 13:08, Diego Zuccato wrote:
That's the same kind of errors I keep seeing on my 2 clusters, regenerated some months ago. Seems a pseudo-split-brain that should be impossible on a replica 3 cluster, but it keeps happening. Sadly going to ditch Gluster ASAP.

Diego

On 18/01/2024 07:11, Hu Bert wrote:
> Good morning,
> heal still not running. Pending heals now sum up to 60K per brick.
> Heal was starting instantly e.g. after a server reboot with version
> 10.4, but doesn't with version 11. What could be wrong?
>
> I only see these errors on one of the "good" servers in glustershd.log:
>
> [2024-01-18 06:08:57.328480 +0000] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-0:
> remote operation failed. [{path=}, {gfid=cb39a1e4-2a4c-4727-861d-3ed9ef00681b}, {errno=2}, {error=No such file or directory}]
> [2024-01-18 06:08:57.594051 +0000] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-1:
> remote operation failed. [{path=}, {gfid=3e9b178c-ae1f-4d85-ae47-fc539d94dd11}, {errno=2}, {error=No such file or directory}]
>
> About 7K today. Any ideas? Someone?
>
> Best regards,
> Hubert
>
> On Wed, 17 Jan 2024 at 11:24, Hu Bert <revi...@googlemail.com> wrote:
>>
>> OK, finally managed to get all servers, volumes etc. running, but it took
>> a couple of restarts, cksum checks etc.
>>
>> One problem: a volume doesn't heal automatically, or doesn't heal at all.
>>
>> gluster volume status
>> Status of volume: workdata
>> Gluster process                            TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick glusterpub1:/gluster/md3/workdata    58832     0          Y       3436
>> Brick glusterpub2:/gluster/md3/workdata    59315     0          Y       1526
>> Brick glusterpub3:/gluster/md3/workdata    56917     0          Y       1952
>> Brick glusterpub1:/gluster/md4/workdata    59688     0          Y       3755
>> Brick glusterpub2:/gluster/md4/workdata    60271     0          Y       2271
>> Brick glusterpub3:/gluster/md4/workdata    49461     0          Y       2399
>> Brick glusterpub1:/gluster/md5/workdata    54651     0          Y       4208
>> Brick glusterpub2:/gluster/md5/workdata    49685     0          Y       2751
>> Brick glusterpub3:/gluster/md5/workdata    59202     0          Y       2803
>> Brick glusterpub1:/gluster/md6/workdata    55829     0          Y       4583
>> Brick glusterpub2:/gluster/md6/workdata    50455     0          Y       3296
>> Brick glusterpub3:/gluster/md6/workdata    50262     0          Y       3237
>> Brick glusterpub1:/gluster/md7/workdata    52238     0          Y       5014
>> Brick glusterpub2:/gluster/md7/workdata    52474     0          Y       3673
>> Brick glusterpub3:/gluster/md7/workdata    57966     0          Y       3653
>> Self-heal Daemon on localhost              N/A       N/A        Y       4141
>> Self-heal Daemon on glusterpub1            N/A       N/A        Y       5570
>> Self-heal Daemon on glusterpub2            N/A       N/A        Y       4139
>>
>> "gluster volume heal workdata info" lists a lot of files per brick.
>> "gluster volume heal workdata statistics heal-count" shows thousands
>> of files per brick.
>> "gluster volume heal workdata enable" has no effect.
>>
>> gluster volume heal workdata full
>> Launching heal operation to perform full self heal on volume workdata
>> has been successful
>> Use heal info commands to check status.
>>
>> -> not doing anything at all. And nothing happening on the 2 "good"
>> servers in e.g. glustershd.log. Heal was working as expected on
>> version 10.4, but here... silence. Someone has an idea?
>>
>> Best regards,
>> Hubert
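A sketch of what raising the log level for the workdata volume could look like (the option names match the ones shown elsewhere in this thread); remember to lower it again, since DEBUG output can fill /var quickly:

# raise verbosity only while reproducing the stuck heal
gluster volume set workdata diagnostics.client-log-level DEBUG
gluster volume set workdata diagnostics.brick-log-level DEBUG
# ...watch glustershd.log and the brick logs, then revert
gluster volume set workdata diagnostics.client-log-level INFO
gluster volume set workdata diagnostics.brick-log-level INFO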
Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems
Since glusterd does not consider it a split brain, you can't solve it with the standard split-brain tools. I've found no way to resolve it except by manually handling one file at a time: completely unmanageable with thousands of files, and having to juggle between the actual path on the brick and the metadata files!

Previously I "fixed" it by:
1) moving all the data from the volume to a temp space
2) recovering from the bricks what was inaccessible from the mountpoint, keeping different file revisions for the conflicting ones (see the sketch below)
3) destroying and recreating the volume
4) copying back the data from the backup

When gluster gets used because you need lots of space (we had more than 400TB on 3 nodes with 30x12TB SAS disks in "replica 3 arbiter 1"), where do you park the data? Is the official solution "just have a second cluster idle for when you need to fix errors"? It took more than a month of downtime this summer, and after less than 6 months I'd have to repeat it? Users are rightly quite upset...

Diego

On 18/01/2024 09:17, Hu Bert wrote:
> Were you able to solve the problem? Can it be treated like a "normal"
> split brain? 'gluster peer status' and 'gluster volume status' are ok,
> so it kinda looks like "pseudo"...
>
> Hubert
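A minimal sketch of step 2 above (pulling what is still readable straight off a brick), with placeholder paths; syncing each brick into its own subdirectory keeps conflicting revisions of the same file apart so they can be merged by hand later:

# skip Gluster's internal metadata directory on the brick
rsync -aHX --exclude='.glusterfs' /srv/bricks/00/d/ /mnt/tempspace/clustor00-brick00/
# repeat for every brick, each into its own destination directory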
Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems
[glusterd-handler.c:2546:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: b71401c3-512a-47cb-ac18-473c4ba7776e
[2024-01-15 08:02:23.608349 +0000] E [MSGID: 106010] [glusterd-utils.c:3824:glusterd_compare_friend_volume] 0-management: Version of Cksums sourceimages differ. local cksum = 2204642525, remote cksum = 1931483801 on peer gluster190
[2024-01-15 08:02:23.608584 +0000] I [MSGID: 106493] [glusterd-handler.c:3819:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to gluster190 (0), ret: 0, op_ret: -1
[2024-01-15 08:02:23.613553 +0000] I [MSGID: 106493] [glusterd-rpc-ops.c:467:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: b71401c3-512a-47cb-ac18-473c4ba7776e, host: gluster190, port: 0

peer status from the rebooted node:

root@gluster190 ~ # gluster peer status
Number of Peers: 2

Hostname: gluster189
Uuid: 50dc8288-aa49-4ea8-9c6c-9a9a926c67a7
State: Peer Rejected (Connected)

Hostname: gluster188
Uuid: e15a33fe-e2f7-47cf-ac53-a3b34136555d
State: Peer Rejected (Connected)

So the rebooted gluster190 is not accepted anymore, and thus does not appear in "gluster volume status". I then followed this guide:

https://gluster-documentations.readthedocs.io/en/latest/Administrator%20Guide/Resolving%20Peer%20Rejected/

Remove everything under /var/lib/glusterd/ (except glusterd.info), restart the glusterd service etc. Data gets copied from the other nodes and 'gluster peer status' is ok again - but the volume info is missing: /var/lib/glusterd/vols is empty. After syncing that dir from another node, the volume is available again, heals start etc.

Well, and just to be sure that everything's working as it should, I rebooted that node again - and the rebooted node is kicked out again, and you have to go through bringing it back again.

Sorry, but did I miss anything? Has someone experienced similar problems? I'll probably downgrade to 10.4 again, that version was working...

Thx,
Hubert
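For reference, a sketch of the procedure described above, run on the rejected node (gluster190 here); check the linked guide before using it, since it wipes glusterd's local state:

systemctl stop glusterd
cd /var/lib/glusterd
find . -mindepth 1 -maxdepth 1 ! -name glusterd.info -exec rm -rf {} +
systemctl start glusterd
gluster peer probe gluster189      # re-probe a healthy peer
systemctl restart glusterd
# if /var/lib/glusterd/vols stays empty afterwards (as reported above),
# copy the volume definitions from a healthy node and restart once more:
rsync -a gluster189:/var/lib/glusterd/vols/ /var/lib/glusterd/vols/
systemctl restart glusterd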
Re: [Gluster-users] Gluster -> Ceph
On 17/12/2023 14:52, Joe Julian wrote:
>> From what I've been told (by experts) it's really hard to make it happen. More so if proper redundancy of MON and MDS daemons is implemented on quality HW.
> LSI isn't exactly crap hardware. But when a flaw causes it to drop drives under heavy load, the rebalance from dropped drives can cause that heavy load, causing a cascading failure. When the journal is never idle long enough to checkpoint, it fills the partition and ends up corrupted and unrecoverable.

Good to know. Better to add a monitoring service that stops everything when the log is too full. That also applies to Gluster, BTW, even if with less severe consequences: sometimes "peer files" got lost due to /var filling up, and glusterd wouldn't come up after a reboot.

>> Neither Gluster nor Ceph is a "backup solution", so if the data is not easily replaceable it's better to have it elsewhere. Better if offline.
> It's a nice idea, but when you're dealing in petabytes of data, streaming in as fast as your storage will allow, it's just not physically possible.

Well, it will have to stop sometime, or you'd need infinite storage, no? :) Usually data from experiments comes in bursts, with (often large) intervals when you can process/archive it.

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
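A minimal sketch of such a watchdog, run e.g. from cron every few minutes; the 90% threshold and the "stop glusterd" action are assumptions to adapt, not a recommendation from the thread:

#!/bin/sh
usage=$(df --output=pcent /var | tail -n1 | tr -dc '0-9')
if [ "$usage" -ge 90 ]; then
    logger -t var-watchdog "/var at ${usage}%, stopping glusterd before its state gets corrupted"
    systemctl stop glusterd
fi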
Re: [Gluster-users] Gluster -> Ceph
On 14/12/2023 16:08, Joe Julian wrote:
> With Ceph, if the placement database is corrupted, all your data is lost (happened to my employer, once, losing 5PB of customer data).

From what I've been told (by experts) it's really hard to make that happen. More so if proper redundancy of MON and MDS daemons is implemented on quality HW.

> With Gluster, it's just files on disks, easily recovered.

I've already had to do it twice in a year, and the coming third time will be the "definitive migration". The first time there were too many little files; the second time it seemed 192GB of RAM were not enough to handle 30 bricks per server; and now that I've reduced to just 6 bricks per server (creating RAIDs) and created a brand new volume in August, I already find lots of FUSE-inaccessible files that don't heal. That should be impossible, since I'm using "replica 3 arbiter 1" over IPoIB with the three servers talking directly via the switch. But it keeps happening. I really trusted Gluster's promises, but currently what I (and, worse, the users) see is 60-70% availability.

Neither Gluster nor Ceph is a "backup solution", so if the data is not easily replaceable it's better to have it elsewhere. Better if offline.

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
Re: [Gluster-users] Gluster 9.3 not distributing data evenly across bricks
/dev/bcache3  9.9T  8.7T  1.2T  89% /data3

Node 8:
/dev/bcache0  9.7T  8.7T  994G  90% /data
/dev/bcache1  3.9T  3.3T  645G  84% /data1
/dev/bcache2  3.9T  3.4T  519G  87% /data2
/dev/bcache3  9.9T  9.0T  868G  92% /data3

Node 9:
/dev/bcache0  10T   8.6T  1.4T  87% /data
/dev/bcache1  8.0T  6.7T  1.4T  84% /data1
/dev/bcache2  8.0T  6.8T  1.3T  85% /data2

Node 10:
/dev/bcache0  10T   8.8T  1.3T  88% /data
/dev/bcache1  8.0T  6.6T  1.4T  83% /data1
/dev/bcache2  8.0T  7.0T  990G  88% /data2

Node 11:
/dev/bcache0  10T   8.1T  1.9T  82% /data
/dev/bcache1  10T   8.5T  1.5T  86% /data1
/dev/bcache2  10T   8.4T  1.6T  85% /data2

Node 12:
/dev/bcache0  10T   8.4T  1.6T  85% /data
/dev/bcache1  10T   8.4T  1.6T  85% /data1
/dev/bcache2  10T   8.2T  1.8T  83% /data2

Node 13:
/dev/bcache1  10T   8.7T  1.3T  88% /data1
/dev/bcache2  10T   8.8T  1.2T  88% /data2
/dev/bcache0  10T   8.6T  1.5T  86% /data

-- 
Regards,
Shreyansh Shah
AlphaGrep Securities Pvt. Ltd.
Re: [Gluster-users] Rebuilding a failed cluster
Much depends on the original volume layout. For replica volumes you'll find multiple copies of the same file on different bricks. And sometimes 0-byte files that are placeholders for renamed files: do not overwrite a good file with its empty version!

If the old volume is still online, it's better if you copy from its FUSE mount point to the new one. But since it's a temporary "backup", there's no need to use another Gluster volume as the destination: just use a USB drive directly connected to the old nodes (one at a time), or to a machine that can still FUSE-mount the old volume. Once you have a backup, write-protect it and experiment freely :)

Diego

On 29/11/2023 19:17, Richard Betel wrote:
> Ok, it's been a while, but I'm getting back to this "project". I was unable to get gluster for the platform: the machines are ARM-based, and there are no ARM binaries in the gluster package repo. I tried building it instead, but the version of gluster I was running was quite old, and I couldn't get all the right package versions to do a successful build.
>
> As a result, it sounds like my best option is to follow your alternate suggestion: "The other option is to setup a new cluster and volume and then mount the volume via FUSE and copy the data from one of the bricks."
>
> I want to be sure I understand what you're saying, though. Here's my plan:
> - create 3 VMs on amd64 processors (*)
> - give each a 100G brick
> - set up the 3 bricks as disperse
> - mount the new gluster volume on my workstation
> - copy directories from one of the old bricks to the mounted new GFS volume
> - copy the fully restored data from the new GFS volume to my workstation or whatever permanent setup I go with
>
> Is that right? Or do I want the GFS system to be offline while I copy the contents of the old brick to the new brick?
>
> (*) I'm not planning to keep my GFS on VMs in the cloud, I just want something temporary to work with so I don't blow up anything else.
>
> On Sat, 12 Aug 2023 at 09:20, Strahil Nikolov <hunter86...@yahoo.com> wrote:
>> If you preserved the gluster structure in /etc/ and /var/lib, you should be able to run the cluster again. First install the same gluster version on all nodes and then overwrite the structure in /etc and in /var/lib. Once you mount the bricks, start glusterd and check the situation.
>>
>> The other option is to setup a new cluster and volume and then mount the volume via FUSE and copy the data from one of the bricks.
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On Saturday, August 12, 2023, 7:46 AM, Richard Betel <emte...@gmail.com> wrote:
>>> I had a small cluster with a disperse 3 volume. 2 nodes had hardware failures and no longer boot, and I don't have replacement hardware for them (it's an old board called a PC-duino). However, I do have their intact root filesystems and the disks the bricks are on. So I need to rebuild the cluster on all new host hardware. Does anyone have any suggestions on how to go about doing this?
>>>
>>> I've built 3 VMs to be a new test cluster, but if I copy over a file from the 3 nodes and try to read it, I can't, and I get errors in /var/log/glusterfs/foo.log:
>>>
>>> [2023-08-12 03:50:47.638134 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-gv-client-0: remote operation failed. [{path=/helmetpart.scad}, {gfid=----}, {errno=61}, {error=No data available}]
>>> [2023-08-12 03:50:49.834859 +0000] E [MSGID: 122066] [ec-common.c:1301:ec_prepare_update_cbk] 0-gv-disperse-0: Unable to get config xattr. FOP : 'FXATTROP' failed on gfid 076a511d-3721-4231-ba3b-5c4cbdbd7f5d.
>>> Parent FOP: READ [No data available]
>>> [2023-08-12 03:50:49.834930 +0000] W [fuse-bridge.c:2994:fuse_readv_cbk] 0-glusterfs-fuse: 39: READ => -1 gfid=076a511d-3721-4231-ba3b-5c4cbdbd7f5d fd=0x7fbc9c001a98 (No data available)
>>>
>>> So obviously I need to copy over more stuff from the original cluster. If I force the 3 nodes and the volume to have the same uuids, will that be enough?
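A sketch of the "copy via FUSE" approach discussed above, assuming a replica/distribute layout where bricks hold whole files (on a disperse volume each brick only holds fragments, so copying from a single brick would not reassemble anything); all host names and paths are made up:

# mount the new volume and, if it still works, the old one too
mount -t glusterfs newnode1:/newvol /mnt/newvol
rsync -aHX /mnt/oldvol/ /mnt/newvol/
# fallback: copy from a readable brick, skipping Gluster's internal metadata;
# check any 0-byte sticky-bit files by hand so they don't overwrite good copies
rsync -aHX --exclude='.glusterfs' /srv/old-brick/ /mnt/newvol/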
Re: [Gluster-users] Verify limit-objects from clients in Gluster9 ?
Since Gluster allows setting "project quotas" on directories both for size (limit) and for inodes (limit-objects), and it allows checking the size quota from clients (via df), I expected it to do the same for inodes (tentatively via "df -i"). But it seems that's not the case, so I was asking if I'm missing something...

Diego

On 21/11/2023 19:06, Strahil Nikolov wrote:
> What do you mean by dir? Usually the inode max value is per file system.
>
> Best Regards,
> Strahil Nikolov
>
> On Mon, Nov 6, 2023 at 12:58, difa.csi wrote:
>> Hello all.
>> Is there a way to check the inode limit from clients?
>> df -i /path/to/dir seems to report values for the whole volume, not just the dir.
>>
>> For space it works as expected:
>>
>> # gluster v quota cluster_data list
>>     Path      Hard-limit   Soft-limit      Used    Available  Soft-limit exceeded?  Hard-limit exceeded?
>> ----------------------------------------------------------------------------------------------------------
>> /astro        20.0TB       80%(16.0TB)     18.8TB  1.2TB      Yes                   No
>>
>> # df /mnt/scratch/astro
>> Filesystem               1K-blocks        Used    Available Use% Mounted on
>> clustor00:cluster_data 21474836480 20169918036   1304918444  94% /mnt/scratch
>>
>> For inodes, instead:
>>
>> # gluster v quota cluster_data list-objects
>>     Path      Hard-limit   Soft-limit      Files   Dirs  Available  Soft-limit exceeded?  Hard-limit exceeded?
>> ----------------------------------------------------------------------------------------------------------------
>> /astro        100000       80%(80000)      99897   103   0          Yes                   Yes
>>
>> # df -i /mnt/scratch/astro
>> Filesystem                 Inodes  IUsed      IFree IUse% Mounted on
>> clustor00:cluster_data 4687500480 122689 4687377791    1% /mnt/scratch
>>
>> It should report 100% use for "hard quota exceeded", IMO.
>> That's on Gluster 9.6.
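For completeness, a sketch of how the object quota in the listing above would have been set and how it can be inspected, using the limit shown there; note that only the size quota is mapped into client-side df (via quota-deem-statfs), while df -i keeps reporting volume-wide inode counts:

gluster volume quota cluster_data limit-objects /astro 100000
gluster volume quota cluster_data list-objects /astro
df -i /mnt/scratch/astro    # still shows the whole volume, not the directory's object quota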
Re: [Gluster-users] State of the gluster project
Maybe a bit OT...

I'm no expert on either, but the concepts are quite similar. Both require "extra" nodes (metadata and monitor), but those can be virtual machines, or you can host the services on the OSD machines. We don't use snapshots, so I can't comment on that. My experience with Ceph is limited to having it working on Proxmox. No experience yet with CephFS.

BeeGFS is more like a "freemium" FS: the base functionality is free, but if you need "enterprise" features (quota, replication...) you have to pay (quite a lot... probably so as not to undercut lucrative GPFS licensing).

We also saw more than 30 minutes for an ls on a Gluster directory containing about 50 files when we had many millions of files on the fs (with one disk per brick, which also led to many memory issues). After the last rebuild I created 5-disk RAID5 bricks (about 44TB each) and memory pressure went down drastically, but desyncs still happen even though the nodes are connected via IPoIB links that are really rock-solid (and in the worst case they could fall back to 1Gbps Ethernet connectivity).

Diego

On 27/10/2023 10:30, Marcus Pedersén wrote:
> Hi Diego,
> I have had a look at BeeGFS and it seems more similar to Ceph than to Gluster. It requires extra management nodes similar to Ceph, right? Second, there are no snapshots in BeeGFS, as I understand it. I know Ceph has snapshots, so for us it seems a better alternative. What is your experience with Ceph?
>
> I am sorry to hear about your problems with Gluster. From my experience we had quite some issues with Gluster when it was "young"; I think the first version we installed was 3.5 or so. It was also extremely slow, an ls took forever. But later versions have been "kind" to us and worked quite well, and file access has become really comfortable.
>
> Best regards
> Marcus
Re: [Gluster-users] State of the gluster project
Hi.

I'm also migrating to BeeGFS and CephFS (depending on usage).

What I liked most about Gluster was that files were easily recoverable from bricks even in case of disaster, and that it said it supported RDMA. But I soon found that RDMA was being phased out, and I keep finding entries that are not healing after a couple of months of (not really heavy) use, directories that can't be removed because not all files have been deleted from all the bricks, and files or directories that become inaccessible for no apparent reason.

Given that I currently have 3 nodes with 30 12TB disks each in replica 3 arbiter 1, it's become a major showstopper: I can't stop production, back up everything and restart from scratch every 3-4 months. And there are no tools helping, just log digging :( Even at version 9.6 it seems it's not really "production ready"... More like v0.9.6 IMVHO. And now it being EOLed makes it way worse.

Diego

On 27/10/2023 09:40, Zakhar Kirpichenko wrote:
> Hi,
>
> Red Hat Gluster Storage is EOL, Red Hat moved Gluster devs to other projects, so Gluster doesn't get much attention. From my experience, it has deteriorated since about version 9.0, and we're migrating to alternatives.
>
> /Z
>
> On Fri, 27 Oct 2023 at 10:29, Marcus Pedersén <marcus.peder...@slu.se> wrote:
>> Hi all,
>> I just have a general thought about the gluster project. I have got the feeling that things have slowed down in the gluster project. I have had a look at github, and to me the project seems to be slowing down: for gluster version 11 there have been no minor releases, we are still on 11.0, and I have not found any references to 11.1. There is a milestone called 12 but it seems to be stale.
>>
>> I have hit the issue https://github.com/gluster/glusterfs/issues/4085 that seems to have no solution. I noticed when version 11 was released that you could not bump the op-version to 11 and reported this, but this is still not available.
>>
>> I am just wondering if I am missing something here? We have been using gluster for many years in production and I think that gluster is great!! It has served us well over the years, and we have seen some great improvements in stability and speed. So is there something going on, or have I got the wrong impression (and feeling)?
>>
>> Best regards
>> Marcus

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
Re: [Gluster-users] Questionmark in permission and Owner
Seen something similar when the FUSE client died, but it marked the whole mountpoint, not just some files. Might it be a desync or a communication loss between the nodes?

Diego

On 05/06/2023 11:23, Stefan Kania wrote:
> Hello,
>
> I have a strange problem on a gluster volume. If I do an "ls -l" in a directory inside a mounted gluster volume I see, only for some files, question marks for the permissions, the owner, the size and the date. Looking at the same directory on the brick itself, everything is ok. After rebooting the nodes everything is back to normal.
>
> System is Debian 11 and Gluster is version 9. The filesystem is LVM2 thin-provisioned and formatted with XFS. But as I said, the brick is ok, only the mounted volume is having the problem.
>
> Any hint what it could be?
>
> Thanks
>
> Stefan
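A few hedged checks for the next time it happens, before rebooting anything (volume name and paths are placeholders): compare what a brick and the FUSE mount report, and look for pending heals on the affected entries.

stat /srv/brick1/somedir/somefile        # directly on a brick: reported OK
stat /mnt/volume/somedir/somefile        # via FUSE: the entry showing "?" in ls -l
gluster volume heal myvol info           # are the same entries pending heal?
# if only one client's view is broken, remounting that client may be enough
umount /mnt/volume && mount -t glusterfs node1:/myvol /mnt/volume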
Re: [Gluster-users] How to configure?
After a lot of tests and unsuccessful searching, I decided to start from scratch: I'm going to ditch the old volume and create a new one.

I have 3 servers with 30 12TB disks each. Since I'm going to start a new volume, could it be better to group the disks into 10 3-disk (or 6 5-disk) RAID-0 volumes to reduce the number of bricks? Redundancy would be given by replica 2 (still undecided about arbiter vs thin-arbiter...).

Current configuration is:

root@str957-clustor00:~# gluster v info cluster_data
Volume Name: cluster_data
Type: Distributed-Replicate
Volume ID: a8caaa90-d161-45bb-a68c-278263a8531a
Status: Started
Snapshot Count: 0
Number of Bricks: 45 x (2 + 1) = 135
Transport-type: tcp
Bricks:
Brick1: clustor00:/srv/bricks/00/d
Brick2: clustor01:/srv/bricks/00/d
Brick3: clustor02:/srv/bricks/00/q (arbiter)
...
Brick133: clustor01:/srv/bricks/29/d
Brick134: clustor02:/srv/bricks/29/d
Brick135: clustor00:/srv/bricks/14/q (arbiter)
Options Reconfigured:
cluster.background-self-heal-count: 256
cluster.heal-wait-queue-length: 1
performance.quick-read: off
cluster.entry-self-heal: on
cluster.data-self-heal-algorithm: full
cluster.metadata-self-heal: on
cluster.shd-max-threads: 2
network.inode-lru-limit: 50
performance.md-cache-timeout: 600
performance.cache-invalidation: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
features.quota-deem-statfs: on
performance.readdir-ahead: on
cluster.granular-entry-heal: enable
features.scrub: Active
features.bitrot: on
cluster.lookup-optimize: on
performance.stat-prefetch: on
performance.cache-refresh-timeout: 60
performance.parallel-readdir: on
performance.write-behind-window-size: 128MB
cluster.self-heal-daemon: enable
features.inode-quota: on
features.quota: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
client.event-threads: 1
features.scrub-throttle: normal
diagnostics.brick-log-level: ERROR
diagnostics.client-log-level: ERROR
config.brick-threads: 0
cluster.lookup-unhashed: on
config.client-threads: 1
cluster.use-anonymous-inode: off
diagnostics.brick-sys-log-level: CRITICAL
features.scrub-freq: monthly
cluster.data-self-heal: on
cluster.brick-multiplex: on
cluster.daemon-log-level: ERROR

Each node is a dual-Xeon 4210 (for a total of 20 cores, 40 threads) equipped with 192GB RAM (which got exhausted quite often before enabling brick-multiplex).

Diego

On 24/03/2023 19:21, Strahil Nikolov wrote:
> Try finding if any of them is missing on one of the systems.
>
> Best Regards,
> Strahil Nikolov
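A rough sketch of the two layouts being weighed above, with only the first brick set shown and placeholder brick paths on the same three servers (a real create would list every set):

# full arbiter: one small arbiter brick per replica set
gluster volume create cluster_data replica 3 arbiter 1 \
  clustor00:/srv/raid00/brick clustor01:/srv/raid00/brick clustor02:/srv/arb00/brick

# thin arbiter: a single lightweight arbiter brick for the whole volume
gluster volume create cluster_data replica 2 thin-arbiter 1 \
  clustor00:/srv/raid00/brick clustor01:/srv/raid00/brick clustor02:/srv/thin-arbiter/brick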
Re: [Gluster-users] Performance: lots of small files, hdd, nvme etc.
Well, you have *way* more files than we do... :)

On 30/03/2023 11:26, Hu Bert wrote:
> Just an observation: is there a performance difference between a sw raid10 (10 disks -> one brick) or 5x raid1 (each raid1 a brick)

Err... RAID10 is not 10 disks unless you stripe 5 mirrors of 2 disks.

> with the same disks (10TB hdd)? The heal processes in the 5x-raid1 scenario seem faster. Just out of curiosity...

It should be, since the bricks are smaller. But given that you're using replica 3, I don't understand why you're also using RAID1: for each 10TB of user-facing capacity you're keeping 60TB of data on disks. I'd ditch the local RAIDs to double the available space, unless you desperately need the extra read performance.

> Options Reconfigured: [...]

I'll have a look at the options you use. Maybe something can be useful in our case. Tks :)

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
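Spelling out the arithmetic behind that 60TB figure: replica 3 stores three copies of every file, and each copy lands on a RAID1 pair, so 10TB of user data costs 10TB x 3 x 2 = 60TB of raw disk. Dropping the local RAID1 leaves 10TB x 3 = 30TB, i.e. twice the usable space from the same drives.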
Re: [Gluster-users] How to configure?
There are 285 files in /var/lib/glusterd/vols/cluster_data ... including many files with names related to quorum bricks already moved to a different path (like cluster_data.client.clustor02.srv-quorum-00-d.vol, which should already have been replaced by cluster_data.clustor02.srv-bricks-00-q.vol -- and both vol files exist). Is there something I should check inside the volfiles?

Diego

On 24/03/2023 13:05, Strahil Nikolov wrote:
> Can you check your volume file contents?
> Maybe it really can't find (or access) a specific volfile?
>
> Best Regards,
> Strahil Nikolov
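A quick sketch for spotting a volfile that is missing or different on one node (host names as in the volume above): compare the list of .vol files and glusterd's own checksum file across the peers.

for h in clustor00 clustor01 clustor02; do
    echo "== $h"
    ssh "$h" 'cd /var/lib/glusterd/vols/cluster_data && ls *.vol | sort | md5sum && cat cksum'
done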
Re: [Gluster-users] How to configure?
In glfsheal-Connection.log I see many lines like:

[2023-03-13 23:04:40.241481 +0000] E [MSGID: 104021] [glfs-mgmt.c:586:glfs_mgmt_getspec_cbk] 0-gfapi: failed to get the volume file [{from server}, {errno=2}, {error=File o directory non esistente}]

And *lots* of gfid-mismatch errors in glustershd.log.

Couldn't find anything that would prevent heal from starting. :(

Diego

On 21/03/2023 20:39, Strahil Nikolov wrote:
> I have no clue. Have you checked for errors in the logs? Maybe you might find something useful.
>
> Best Regards,
> Strahil Nikolov
Re: [Gluster-users] How to configure?
Killed glfsheal; after a day there were 218 processes, then they got killed by the OOM killer during the weekend. Now there are no processes active.

Trying to run "heal info" reports lots of files quite quickly but does not spawn any glfsheal process. And neither does restarting glusterd. Is there some way to selectively run glfsheal to fix one brick at a time?

Diego

On 21/03/2023 01:21, Strahil Nikolov wrote:
> Theoretically it might help. If possible, try to resolve any pending heals.
>
> Best Regards,
> Strahil Nikolov
Re: [Gluster-users] How to configure?
In Debian, stopping glusterd does not stop the brick processes: to stop everything (and free the memory) I have to

systemctl stop glusterd
killall glusterfs{,d}
killall glfsheal
systemctl start glusterd

[this behaviour hangs a simple reboot of a machine running glusterd... not nice]

For now I just restarted glusterd without killing the bricks:

root@str957-clustor00:~# ps aux|grep glfsheal|wc -l ; systemctl restart glusterd ; ps aux|grep glfsheal|wc -l
618
618

No change, neither in glfsheal processes nor in free memory :(
Should I "killall glfsheal" before OOM kicks in?

Diego

On 16/03/2023 12:37, Strahil Nikolov wrote:
> Can you restart the glusterd service (first check that it was not modified to kill the bricks)?
>
> Best Regards,
> Strahil Nikolov
> Current volume info: > -8<-- > Volume Name: cluster_data > Type: Distributed-Replicate > Volume ID: a8caaa90-d161-45bb-a68c-278263a8531a > Status: Started > Snapshot Count: 0 > Number of Bricks: 45 x (2 + 1) = 135 > Transport-type: tcp > Bricks: > Brick1: clustor00:/srv/bricks/00/d > Brick2: clustor01:/srv/bricks/00/d > Brick3: clustor02:/srv/bricks/00/q (arbiter) > [...] > Brick133: clustor01:/srv/bricks/29/d > Brick134: clustor02:/srv/bricks/29/d > Brick135: clustor00:/srv/bricks/14/q (arbiter) > Options Reconfigured: > performance.quick-read: off > cluster.entry-self-heal: on > cluster.data-self-heal-algorithm: full > cluster.metadata-self-heal: on >
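A minimal sketch of the "full stop" sequence described at the top of this message (assuming Debian with systemd; the brick and glfsheal processes are killed explicitly because stopping glusterd alone leaves them running):
-8<--
#!/bin/sh
# Full stop of Gluster on one node (Debian): stopping glusterd alone
# leaves brick, client and glfsheal processes running, so kill them too.
systemctl stop glusterd
killall glusterfs glusterfsd    # same as killall glusterfs{,d}
killall glfsheal                # leftover heal-info helpers, if any
# ...maintenance / reboot here...
systemctl start glusterd        # glusterd respawns the brick processes
-8<--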
Re: [Gluster-users] How to configure?
OOM is just a matter of time. Today mem use is up to 177G/187 and: # ps aux|grep glfsheal|wc -l 551 (well, one is actually the grep process, so "only" 550 glfsheal processes). I'll take the last 5: root 3266352 0.5 0.0 600292 93044 ? Sl 06:55 0:07 /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml root 3267220 0.7 0.0 600292 91964 ? Sl 07:00 0:07 /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml root 3268076 1.0 0.0 600160 88216 ? Sl 07:05 0:08 /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml root 3269492 1.6 0.0 600292 91248 ? Sl 07:10 0:07 /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml root 3270354 4.4 0.0 600292 93260 ? Sl 07:15 0:07 /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml -8<-- root@str957-clustor00:~# ps -o ppid= 3266352 3266345 root@str957-clustor00:~# ps -o ppid= 3267220 3267213 root@str957-clustor00:~# ps -o ppid= 3268076 3268069 root@str957-clustor00:~# ps -o ppid= 3269492 3269485 root@str957-clustor00:~# ps -o ppid= 3270354 3270347 root@str957-clustor00:~# ps aux|grep 3266345 root 3266345 0.0 0.0 430536 10764 ? Sl 06:55 0:00 gluster volume heal cluster_data info summary --xml root 3271532 0.0 0.0 6260 2500 pts/1 S+ 07:21 0:00 grep 3266345 root@str957-clustor00:~# ps aux|grep 3267213 root 3267213 0.0 0.0 430536 10644 ? Sl 07:00 0:00 gluster volume heal cluster_data info summary --xml root 3271599 0.0 0.0 6260 2480 pts/1 S+ 07:22 0:00 grep 3267213 root@str957-clustor00:~# ps aux|grep 3268069 root 3268069 0.0 0.0 430536 10704 ? Sl 07:05 0:00 gluster volume heal cluster_data info summary --xml root 3271626 0.0 0.0 6260 2516 pts/1 S+ 07:22 0:00 grep 3268069 root@str957-clustor00:~# ps aux|grep 3269485 root 3269485 0.0 0.0 430536 10756 ? Sl 07:10 0:00 gluster volume heal cluster_data info summary --xml root 3271647 0.0 0.0 6260 2480 pts/1 S+ 07:22 0:00 grep 3269485 root@str957-clustor00:~# ps aux|grep 3270347 root 3270347 0.0 0.0 430536 10672 ? Sl 07:15 0:00 gluster volume heal cluster_data info summary --xml root 3271666 0.0 0.0 6260 2568 pts/1 S+ 07:22 0:00 grep 3270347 -8<-- Seems glfsheal is spawning more processes. I can't rule out a metadata corruption (or at least a desync), but it shouldn't happen... Diego Il 15/03/2023 20:11, Strahil Nikolov ha scritto: If you don't experience any OOM, you can focus on the heals. 284 processes of glfsheal seems odd. Can you check the ppid for 2-3 randomly picked? ps -o ppid= Best Regards, Strahil Nikolov On Wed, Mar 15, 2023 at 9:54, Diego Zuccato wrote: I enabled it yesterday and that greatly reduced memory pressure. Current volume info: -8<-- Volume Name: cluster_data Type: Distributed-Replicate Volume ID: a8caaa90-d161-45bb-a68c-278263a8531a Status: Started Snapshot Count: 0 Number of Bricks: 45 x (2 + 1) = 135 Transport-type: tcp Bricks: Brick1: clustor00:/srv/bricks/00/d Brick2: clustor01:/srv/bricks/00/d Brick3: clustor02:/srv/bricks/00/q (arbiter) [...]
Brick133: clustor01:/srv/bricks/29/d Brick134: clustor02:/srv/bricks/29/d Brick135: clustor00:/srv/bricks/14/q (arbiter) Options Reconfigured: performance.quick-read: off cluster.entry-self-heal: on cluster.data-self-heal-algorithm: full cluster.metadata-self-heal: on cluster.shd-max-threads: 2 network.inode-lru-limit: 50 performance.md-cache-timeout: 600 performance.cache-invalidation: on features.cache-invalidation-timeout: 600 features.cache-invalidation: on features.quota-deem-statfs: on performance.readdir-ahead: on cluster.granular-entry-heal: enable features.scrub: Active features.bitrot: on cluster.lookup-optimize: on performance.stat-prefetch: on performance.cache-refresh-timeout: 60 performance.parallel-readdir: on performance.write-behind-window-size: 128MB cluster.self-heal-daemon: enable features.inode-quota: on features.quota: on transport.address-family: inet nfs.disable: on performance.client-io-threads: off client.event-threads: 1 features.scrub-throttle: normal diagnostics.brick-log-level: ERROR diagnostics.client-log-level: ERROR config.brick-threads: 0 cluster.lookup-unhashed: on config.client-threads: 1 cluster.use-anonymous-inode: off diagnostics.brick-sys-log-level: CRITICAL features.scrub-freq: monthly cluster.data-self-heal: on cluster.brick-multiplex: on cluster.daemon-log-level: ERROR -8<-- htop reports that memory usage is up to 143G, there are 602 tasks and 5232 threads (~20 running) on clustor00, 117G/49 tasks/1565 threads on
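A compact form of the checks above, counting the glfsheal helpers and showing which process spawned a few of them (just a sketch; the PIDs are whatever ps reports on the node):
-8<--
# Count the glfsheal helpers (the [g] trick keeps grep from matching itself):
ps aux | grep '[g]lfsheal' | wc -l

# For a few of them, show which process spawned them:
for pid in $(pgrep -f glfsheal | tail -n 3); do
    ppid=$(ps -o ppid= -p "$pid" | tr -d ' ')
    echo "glfsheal $pid spawned by:"
    ps -o pid=,args= -p "$ppid"
done
-8<--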
Re: [Gluster-users] How to configure?
I enabled it yesterday and that greatly reduced memory pressure. Current volume info: -8<-- Volume Name: cluster_data Type: Distributed-Replicate Volume ID: a8caaa90-d161-45bb-a68c-278263a8531a Status: Started Snapshot Count: 0 Number of Bricks: 45 x (2 + 1) = 135 Transport-type: tcp Bricks: Brick1: clustor00:/srv/bricks/00/d Brick2: clustor01:/srv/bricks/00/d Brick3: clustor02:/srv/bricks/00/q (arbiter) [...] Brick133: clustor01:/srv/bricks/29/d Brick134: clustor02:/srv/bricks/29/d Brick135: clustor00:/srv/bricks/14/q (arbiter) Options Reconfigured: performance.quick-read: off cluster.entry-self-heal: on cluster.data-self-heal-algorithm: full cluster.metadata-self-heal: on cluster.shd-max-threads: 2 network.inode-lru-limit: 50 performance.md-cache-timeout: 600 performance.cache-invalidation: on features.cache-invalidation-timeout: 600 features.cache-invalidation: on features.quota-deem-statfs: on performance.readdir-ahead: on cluster.granular-entry-heal: enable features.scrub: Active features.bitrot: on cluster.lookup-optimize: on performance.stat-prefetch: on performance.cache-refresh-timeout: 60 performance.parallel-readdir: on performance.write-behind-window-size: 128MB cluster.self-heal-daemon: enable features.inode-quota: on features.quota: on transport.address-family: inet nfs.disable: on performance.client-io-threads: off client.event-threads: 1 features.scrub-throttle: normal diagnostics.brick-log-level: ERROR diagnostics.client-log-level: ERROR config.brick-threads: 0 cluster.lookup-unhashed: on config.client-threads: 1 cluster.use-anonymous-inode: off diagnostics.brick-sys-log-level: CRITICAL features.scrub-freq: monthly cluster.data-self-heal: on cluster.brick-multiplex: on cluster.daemon-log-level: ERROR -8<-- htop reports that memory usage is up to 143G, there are 602 tasks and 5232 threads (~20 running) on clustor00, 117G/49 tasks/1565 threads on clustor01 and 126G/45 tasks/1574 threads on clustor02. I see quite a lot (284!) of glfsheal processes running on clustor00 (a "gluster v heal cluster_data info summary" is running on clustor02 since yesterday, still no output). Shouldn't be just one per brick? Diego Il 15/03/2023 08:30, Strahil Nikolov ha scritto: Do you use brick multiplexing ? Best Regards, Strahil Nikolov On Tue, Mar 14, 2023 at 16:44, Diego Zuccato wrote: Hello all. Our Gluster 9.6 cluster is showing increasing problems. Currently it's composed of 3 servers (2x Intel Xeon 4210 [20 cores dual thread, total 40 threads], 192GB RAM, 30x HGST HUH721212AL5200 [12TB]), configured in replica 3 arbiter 1. Using Debian packages from Gluster 9.x latest repository. Seems 192G RAM are not enough to handle 30 data bricks + 15 arbiters and I often had to reload glusterfsd because glusterfs processed got killed for OOM. On top of that, performance have been quite bad, especially when we reached about 20M files. On top of that, one of the servers have had mobo issues that resulted in memory errors that corrupted some bricks fs (XFS, it required "xfs_reparir -L" to fix). Now I'm getting lots of "stale file handle" errors and other errors (like directories that seem empty from the client but still containing files in some bricks) and auto healing seems unable to complete. Since I can't keep up continuing to manually fix all the issues, I'm thinking about backup+destroy+recreate strategy. I think that if I reduce the number of bricks per server to just 5 (RAID1 of 6x12TB disks) I might resolve RAM issues - at the cost of longer heal times in case a disk fails. 
Am I right or it's useless? Other recommendations? Servers have space for another 6 disks. Maybe those could be used for some SSDs to speed up access? TIA. -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk <https://meet.google.com/cpu-eiue-hvk> Gluster-users mailing list Gluster-users@gluster.org <mailto:Gluster-users@gluster.org> https://lists.gluster.org/mailman/listinfo/gluster-users <https://lists.gluster.org/mailman/listinfo/gluster-users> -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
[Gluster-users] How to configure?
Hello all. Our Gluster 9.6 cluster is showing increasing problems. Currently it's composed of 3 servers (2x Intel Xeon 4210 [20 cores dual thread, total 40 threads], 192GB RAM, 30x HGST HUH721212AL5200 [12TB]), configured in replica 3 arbiter 1. Using Debian packages from Gluster 9.x latest repository. Seems 192GB RAM is not enough to handle 30 data bricks + 15 arbiters, and I often had to reload glusterfsd because glusterfs processes got killed for OOM. On top of that, performance has been quite bad, especially when we reached about 20M files. Moreover, one of the servers has had mobo issues that resulted in memory errors that corrupted some bricks' filesystems (XFS, it required "xfs_repair -L" to fix). Now I'm getting lots of "stale file handle" errors and other errors (like directories that seem empty from the client but still contain files on some bricks) and auto healing seems unable to complete. Since I can't keep up with manually fixing all the issues, I'm thinking about a backup+destroy+recreate strategy. I think that if I reduce the number of bricks per server to just 5 (RAID1 of 6x12TB disks) I might resolve the RAM issues - at the cost of longer heal times in case a disk fails. Am I right, or is it useless? Other recommendations? Servers have space for another 6 disks. Maybe those could be used for some SSDs to speed up access? TIA. -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Quick way to fix stale gfids?
My volume is replica 3 arbiter 1, maybe that makes a difference? Bricks processes tend to die quite often (I have to restart glusterd at least once a day because "gluster v info | grep ' N '" reports at least one missing brick; sometimes even if all bricks are reported up I have to kill all glusterfs[d] processes and restart glusterd). The 3 servers have 192GB RAM (that should be way more than enough!), 30 data bricks and 15 arbiters (the arbiters share a single SSD). And I noticed that some "stale file handle" are not reported by heal info. root@str957-cluster:/# ls -l /scratch/extra/m**/PNG/PNGQuijote/ModGrav/fNL40/ ls: cannot access '/scratch/extra/m**/PNG/PNGQuijote/ModGrav/fNL40/output_21': Stale file handle total 40 d? ? ?? ?? output_21 ... but "gluster v heal cluster_data info |grep output_21" returns nothing. :( Seems the other stale handles either got corrected by subsequent 'stat's or became I/O errors. Diego. Il 12/02/2023 21:34, Strahil Nikolov ha scritto: The 2-nd error indicates conflicts between the nodes. The only way that could happen on replica 3 is gfid conflict (file/dir was renamed or recreated). Are you sure that all bricks are online? Usually 'Transport endpoint is not connected' indicates a brick down situation. First start with all stale file handles: check md5sum on all bricks. If it differs somewhere, delete the gfid and move the file away from the brick and check in FUSE. If it's fine , touch it and the FUSE client will "heal" it. Best Regards, Strahil Nikolov On Tue, Feb 7, 2023 at 16:33, Diego Zuccato wrote: The contents do not match exactly, but the only difference is the "option shared-brick-count" line that sometimes is 0 and sometimes 1. The command you gave could be useful for the files that still needs healing with the source still present, but the files related to the stale gfids have been deleted, so "find -samefile" won't find anything. For the other files reported by heal info, I saved the output to 'healinfo', then: for T in $(grep '^/' healinfo |sort|uniq); do stat /mnt/scratch$T > /dev/null; done but I still see a lot of 'Transport endpoint is not connected' and 'Stale file handle' errors :( And many 'No such file or directory'... I don't understand the first two errors, since /mnt/scratch have been freshly mounted after enabling client healing, and gluster v info does not highlight unconnected/down bricks. Diego Il 06/02/2023 22:46, Strahil Nikolov ha scritto: > I'm not sure if the md5sum has to match , but at least the content > should do. > In modern versions of GlusterFS the client side healing is disabled , > but it's worth trying. > You will need to enable cluster.metadata-self-heal, > cluster.data-self-heal and cluster.entry-self-heal and then create a > small one-liner that identifies the names of the files/dirs from the > volume heal ,so you can stat them through the FUSE. > > Something like this: > > > for i in $(gluster volume heal info | awk -F '' '/gfid:/ > {print $2}'); do find /PATH/TO/BRICK/ -samefile > /PATH/TO/BRICK/.glusterfs/${i:0:2}/${i:2:2}/$i | awk '!/.glusterfs/ > {gsub("/PATH/TO/BRICK", "stat /MY/FUSE/MOUNTPOINT", $0); print $0}' ; done > > Then Just copy paste the output and you will trigger the client side > heal only on the affected gfids. > > Best Regards, > Strahil Nikolov > В понеделник, 6 февруари 2023 г., 10:19:02 ч. Гринуич+2, Diego Zuccato > mailto:diego.zucc...@unibo.it>> написа: > > > Ops... 
Reincluding the list that got excluded in my previous answer :( > > I generated md5sums of all files in vols/ on clustor02 and compared to > the other nodes (clustor00 and clustor01). > There are differences in volfiles (shouldn't it always be 1, since every > data brick is on its own fs? quorum bricks, OTOH, share a single > partition on SSD and should always be 15, but in both cases sometimes > it's 0). > > I nearly got a stroke when I saw diff output for 'info' files, but once > I sorted 'em their contents matched. Pfhew! > > Diego > > Il 03/02/2023 19:01, Strahil Nikolov ha scritto: > > This one doesn't look good: > > > > > > [2023-02-03 07:45:46.896924 +] E [MSGID: 114079] > > [client-handshake.c:1253:client_query_portmap] 0-cluster_data-client-48: > > remote-subvolume not set in volfile [] > > > > > >
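A whitespace-safe variant of the stat loop quoted above; a sketch assuming the volume is mounted at /mnt/scratch and the heal info output was saved to a file named healinfo (it only touches entries reported by path, not bare gfid entries):
-8<--
# Re-stat every path reported by "gluster v heal cluster_data info" through
# the FUSE mount, handling blanks in file names; errors end up in a log.
grep '^/' healinfo | sort -u | while IFS= read -r p; do
    stat "/mnt/scratch$p" > /dev/null 2>> stat-errors.log
done
wc -l stat-errors.log   # how many entries still fail (ESTALE, ENOENT, ...)
-8<--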
Re: [Gluster-users] Quick way to fix stale gfids?
The contents do not match exactly, but the only difference is the "option shared-brick-count" line that sometimes is 0 and sometimes 1. The command you gave could be useful for the files that still needs healing with the source still present, but the files related to the stale gfids have been deleted, so "find -samefile" won't find anything. For the other files reported by heal info, I saved the output to 'healinfo', then: for T in $(grep '^/' healinfo |sort|uniq); do stat /mnt/scratch$T > /dev/null; done but I still see a lot of 'Transport endpoint is not connected' and 'Stale file handle' errors :( And many 'No such file or directory'... I don't understand the first two errors, since /mnt/scratch have been freshly mounted after enabling client healing, and gluster v info does not highlight unconnected/down bricks. Diego Il 06/02/2023 22:46, Strahil Nikolov ha scritto: I'm not sure if the md5sum has to match , but at least the content should do. In modern versions of GlusterFS the client side healing is disabled , but it's worth trying. You will need to enable cluster.metadata-self-heal, cluster.data-self-heal and cluster.entry-self-heal and then create a small one-liner that identifies the names of the files/dirs from the volume heal ,so you can stat them through the FUSE. Something like this: for i in $(gluster volume heal info | awk -F '' '/gfid:/ {print $2}'); do find /PATH/TO/BRICK/ -samefile /PATH/TO/BRICK/.glusterfs/${i:0:2}/${i:2:2}/$i | awk '!/.glusterfs/ {gsub("/PATH/TO/BRICK", "stat /MY/FUSE/MOUNTPOINT", $0); print $0}' ; done Then Just copy paste the output and you will trigger the client side heal only on the affected gfids. Best Regards, Strahil Nikolov В понеделник, 6 февруари 2023 г., 10:19:02 ч. Гринуич+2, Diego Zuccato написа: Ops... Reincluding the list that got excluded in my previous answer :( I generated md5sums of all files in vols/ on clustor02 and compared to the other nodes (clustor00 and clustor01). There are differences in volfiles (shouldn't it always be 1, since every data brick is on its own fs? quorum bricks, OTOH, share a single partition on SSD and should always be 15, but in both cases sometimes it's 0). I nearly got a stroke when I saw diff output for 'info' files, but once I sorted 'em their contents matched. Pfhew! Diego Il 03/02/2023 19:01, Strahil Nikolov ha scritto: > This one doesn't look good: > > > [2023-02-03 07:45:46.896924 +] E [MSGID: 114079] > [client-handshake.c:1253:client_query_portmap] 0-cluster_data-client-48: > remote-subvolume not set in volfile [] > > > Can you compare all vol files in /var/lib/glusterd/vols/ between the nodes ? > I have the suspicioun that there is a vol file mismatch (maybe > /var/lib/glusterd/vols//*-shd.vol). > > Best Regards, > Strahil Nikolov > > On Fri, Feb 3, 2023 at 12:20, Diego Zuccato > mailto:diego.zucc...@unibo.it>> wrote: > Can't see anything relevant in glfsheal log, just messages related to > the crash of one of the nodes (the one that had the mobo replaced... I > fear some on-disk structures could have been silently damaged by RAM > errors and that makes gluster processes crash, or it's just an issue > with enabling brick-multiplex). 
> -8<-- > [2023-02-03 07:45:46.896924 +] E [MSGID: 114079] > [client-handshake.c:1253:client_query_portmap] > 0-cluster_data-client-48: > remote-subvolume not set in volfile [] > [2023-02-03 07:45:46.897282 +] E > [rpc-clnt.c:331:saved_frames_unwind] (--> > /lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x195)[0x7fce0c867b95] > (--> /lib/x86_64-linux-gnu/libgfrpc.so.0(+0x72fc)[0x7fce0c0ca2fc] (--> > /lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x109)[0x7fce0c0d2419] > (--> /lib/x86_64-linux-gnu/libgfrpc.so.0(+0x10308)[0x7fce0c0d3308] (--> > /lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x26)[0x7fce0c0ce7e6] > ) 0-cluster_data-client-48: forced unwinding frame type(GF-DUMP) > op(NULL(2)) called at 2023-02-03 07:45:46.891054 + (xid=0x13) > -8<-- > > Well, actually I *KNOW* the files outside .glusterfs have been deleted > (by me :) ). That's why I call those 'stale' gfids. > Affected entries under .glusterfs have usually link count = 1 => > nothing > 'find' can find. > Since I already recovered those files (before deleting from bricks), > can > .glusterfs entries be deleted too or should I check something else? > Maybe I should create a script that finds all files/dirs (not symlinks, > IIUC) in .glusterfs on all bricks/arbiters and moves 'em to a temp dir? > > Diego > > Il 02/02/2023
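A sketch of the cleanup script mentioned above (brick path and destination are placeholders; it only selects regular files whose link count dropped to 1, and it is worth dry-running with ls before switching to mv):
-8<--
BRICK=/srv/bricks/00/d                    # placeholder: repeat per brick/arbiter
DEST=/root/stale-gfids/$(hostname)-00     # placeholder destination
mkdir -p "$DEST"
# Regular gfid entries normally have link count >= 2 (gfid path + named path);
# link count 1 means the named file was removed directly from the brick.
# Dry-run first by replacing "mv -t ..." with "ls -l".
find "$BRICK/.glusterfs" -regextype posix-extended \
     -regex '.*/[0-9a-f]{2}/[0-9a-f]{2}/[0-9a-f-]{36}' \
     -type f -links 1 -print0 | xargs -0 -r mv -t "$DEST"
-8<--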
Re: [Gluster-users] Quick way to fix stale gfids?
Ops... Reincluding the list that got excluded in my previous answer :( I generated md5sums of all files in vols/ on clustor02 and compared to the other nodes (clustor00 and clustor01). There are differences in volfiles (shouldn't it always be 1, since every data brick is on its own fs? quorum bricks, OTOH, share a single partition on SSD and should always be 15, but in both cases sometimes it's 0). I nearly got a stroke when I saw diff output for 'info' files, but once I sorted 'em their contents matched. Pfhew! Diego Il 03/02/2023 19:01, Strahil Nikolov ha scritto: This one doesn't look good: [2023-02-03 07:45:46.896924 +] E [MSGID: 114079] [client-handshake.c:1253:client_query_portmap] 0-cluster_data-client-48: remote-subvolume not set in volfile [] Can you compare all vol files in /var/lib/glusterd/vols/ between the nodes ? I have the suspicioun that there is a vol file mismatch (maybe /var/lib/glusterd/vols//*-shd.vol). Best Regards, Strahil Nikolov On Fri, Feb 3, 2023 at 12:20, Diego Zuccato wrote: Can't see anything relevant in glfsheal log, just messages related to the crash of one of the nodes (the one that had the mobo replaced... I fear some on-disk structures could have been silently damaged by RAM errors and that makes gluster processes crash, or it's just an issue with enabling brick-multiplex). -8<-- [2023-02-03 07:45:46.896924 +] E [MSGID: 114079] [client-handshake.c:1253:client_query_portmap] 0-cluster_data-client-48: remote-subvolume not set in volfile [] [2023-02-03 07:45:46.897282 +] E [rpc-clnt.c:331:saved_frames_unwind] (--> /lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x195)[0x7fce0c867b95] (--> /lib/x86_64-linux-gnu/libgfrpc.so.0(+0x72fc)[0x7fce0c0ca2fc] (--> /lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x109)[0x7fce0c0d2419] (--> /lib/x86_64-linux-gnu/libgfrpc.so.0(+0x10308)[0x7fce0c0d3308] (--> /lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x26)[0x7fce0c0ce7e6] ) 0-cluster_data-client-48: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2023-02-03 07:45:46.891054 + (xid=0x13) -8<-- Well, actually I *KNOW* the files outside .glusterfs have been deleted (by me :) ). That's why I call those 'stale' gfids. Affected entries under .glusterfs have usually link count = 1 => nothing 'find' can find. Since I already recovered those files (before deleting from bricks), can .glusterfs entries be deleted too or should I check something else? Maybe I should create a script that finds all files/dirs (not symlinks, IIUC) in .glusterfs on all bricks/arbiters and moves 'em to a temp dir? Diego Il 02/02/2023 23:35, Strahil Nikolov ha scritto: > Any issues reported in /var/log/glusterfs/glfsheal-*.log ? > > The easiest way to identify the affected entries is to run: > find /FULL/PATH/TO/BRICK/ -samefile > /FULL/PATH/TO/BRICK/.glusterfs/57/e4/57e428c7-6bed-4eb3-b9bd-02ca4c46657a > > > Best Regards, > Strahil Nikolov > > > В вторник, 31 януари 2023 г., 11:58:24 ч. Гринуич+2, Diego Zuccato > mailto:diego.zucc...@unibo.it>> написа: > > > Hello all. > > I've had one of the 3 nodes serving a "replica 3 arbiter 1" down for > some days (apparently RAM issues, but actually failing mobo). > The other nodes have had some issues (RAM exhaustion, old problem > already ticketed but still no solution) and some brick processes > coredumped. Restarting the processes allowed the cluster to continue > working. Mostly. 
> > After the third server got fixed I started a heal, but files didn't get > healed and count (by "ls -l > /srv/bricks/*/d/.glusterfs/indices/xattrop/|grep ^-|wc -l") did not > decrease over 2 days. So, to recover I copied files from bricks to temp > storage (keeping both copies of conflicting files with different > contents), removed files on bricks and arbiters, and finally copied back > from temp storage to the volume. > > Now the files are accessible but I still see lots of entries like > > > IIUC that's due to a mismatch between .glusterfs/ contents and normal > hierarchy. Is there some tool to speed up the cleanup? > > Tks. > > -- > Diego Zuccato > DIFA - Dip. di Fisica e Astronomia > Servizi Informatici > Alma Mater Studiorum - Università di Bologna > V.le Berti-Pichat 6/2 - 40127 Bologna - Italy > tel.: +39 051 20 95786 > > > > > Community Meeting Calendar: > &g
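The volfile comparison described above can be scripted roughly like this (hostnames as used in this thread; assumes root SSH between the nodes):
-8<--
VOLDIR=/var/lib/glusterd/vols/cluster_data
cd "$VOLDIR" || exit 1
find . -type f -exec md5sum {} + | sort -k2 > /tmp/vols.local
for h in clustor00 clustor01; do
    ssh "$h" "cd $VOLDIR && find . -type f -exec md5sum {} + | sort -k2" > /tmp/vols.$h
    echo "=== differences vs $h ==="
    diff /tmp/vols.local "/tmp/vols.$h"
done
-8<--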
Re: [Gluster-users] [Gluster-devel] Regarding Glusterfs file locking
w21v6MqByaqQXxNXfIu_8nDGQD8EEStnhIl-Z9rpRbcbOmmg9ZOkU1ATnFJWyzPFNRdREsAw2g-BW2quWfglxYjdcUYrf63ntrYgrg8ZEDOgMzp8pV0psisEjmHR57IuTgPjs7iZWes9nG_yBsP6yBmLPtWSKfIGj4Diu01fwJfIG3EKXlE4xtia9TqEAj7nTcAMx1_dqKyjCgDU7ZhN-S8XQ9RWlp7OVKQ0GEPM-CSJozOXukVWlM00zAGfmPVfQAI_DmCap5bB6BXhAiIB9LXqWWDi8nrR5/https%3A%2F%2Flists.gluster.org%2Fmailman%2Flistinfo%2Fgluster-devel> NOTE: This message may contain information that is confidential, proprietary, privileged or otherwise protected by law. The message is intended solely for the named addressee. If received in error, please destroy and notify the sender. Any use of this email is prohibited when received in error. Impetus does not represent, warrant and/or guarantee, that the integrity of this communication has been maintained nor that the communication is free of errors, virus, interception or interference. Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
[Gluster-users] Quick way to fix stale gfids?
Hello all. I've had one of the 3 nodes serving a "replica 3 arbiter 1" down for some days (apparently RAM issues, but actually failing mobo). The other nodes have had some issues (RAM exhaustion, old problem already ticketed but still no solution) and some brick processes coredumped. Restarting the processes allowed the cluster to continue working. Mostly. After the third server got fixed I started a heal, but files didn't get healed and count (by "ls -l /srv/bricks/*/d/.glusterfs/indices/xattrop/|grep ^-|wc -l") did not decrease over 2 days. So, to recover I copied files from bricks to temp storage (keeping both copies of conflicting files with different contents), removed files on bricks and arbiters, and finally copied back from temp storage to the volume. Now the files are accessible but I still see lots of entries like IIUC that's due to a mismatch between .glusterfs/ contents and normal hierarchy. Is there some tool to speed up the cleanup? Tks. -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
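The same backlog count can be printed per brick, which makes it easier to see whether heals are progressing at all (a sketch, brick layout as in this thread):
-8<--
# Per-brick heal backlog: regular files under indices/xattrop are (roughly)
# the entries still waiting to be healed on that brick.
for d in /srv/bricks/*/d; do
    n=$(ls -l "$d/.glusterfs/indices/xattrop/" 2>/dev/null | grep -c '^-')
    printf '%-25s %s\n' "$d" "$n"
done
-8<--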
Re: [Gluster-users] Doubts re: remove-brick
Il 20/11/2022 09:39, Strahil Nikolov ha scritto: Have you checked https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Features/rebalance/ ? I know it's old but it might provide some clarity. The files are removed from the source subvolume to the new subvolume. Ok, Tks! RH's numbering is really confusing. Why can't they simply use the official release number? :( And having lots of docs paywalled doesn't help either. Removed bricks do not get any writes, as during the preparation - a rebalance is issued which notifies the clients to use the new DHT subvolume. IIUC there's a mismatch between what you say and the warning I get when starting a remove-brick operation: "It is recommended that remove-brick be run with cluster.force-migration option disabled to prevent possible data corruption. Doing so will ensure that files that receive writes during migration will not be migrated and will need to be manually copied after the remove-brick commit operation. Please check the value of the option and update accordingly. Do you want to continue with your current cluster.force-migration settings? (y/n)" If files being migrated don't receive writes (I assume "on the original brick"), then why is that note needed? Most probably I'm missing some vital piece of information. [BTW my cluster.force-migration is already off... that warning is a long standing issue that seems is not easily fixable] -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
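For completeness, a sketch of the remove-brick sequence being discussed (volume and brick names are placeholders; on a replicated volume whole replica sets must be removed together, and the force-migration check mirrors the warning quoted above):
-8<--
VOL=cluster_data                                        # placeholder
BRICKS="srv1:/srv/bricks/29/d srv2:/srv/bricks/29/d"    # placeholders (whole replica set)
# Keep force-migration off, as the warning suggests:
gluster volume get $VOL cluster.force-migration
gluster volume set $VOL cluster.force-migration off
# Drain the bricks, watch progress, commit only when status says completed:
gluster volume remove-brick $VOL $BRICKS start
gluster volume remove-brick $VOL $BRICKS status
gluster volume remove-brick $VOL $BRICKS commit
-8<--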
[Gluster-users] Doubts re: remove-brick
Hello all. I need to reorganize the bricks (making RAID1 on the backing devices to reduce memory used by Gluster processes) and I have a couple of doubts: - do moved (rebalanced) files get removed from source bricks so at the end I only have the files that received writes? - do bricks being removed continue getting writes for new files? Tks. -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Gluster 5.10 rebalance stuck
I think I've been in a similar situation. "Solved" by creating a new volume on a new set of bricks on the same disks and moving data to new volume. Then just deleted old volume and relative bricks. Quite sure there's a better way, but that was nearly-static data and the move was a faster fix. Diego Il 02/11/2022 08:05, Shreyansh Shah ha scritto: Hi, I Would really appreciate it if someone would be able to help on the above issue. We are stuck as we cannot run rebalance due to this and thus are not able to extract peak performance from the setup due to unbalanced data. Adding gluster info (without the bricks) below. Please let me know if any other details/logs are needed. Volume Name: data Type: Distribute Volume ID: 75410231-bb25-4f14-bcde-caf18fce1d31 Status: Started Snapshot Count: 0 Number of Bricks: 41 Transport-type: tcp Options Reconfigured: server.event-threads: 4 network.ping-timeout: 90 client.keepalive-time: 60 server.keepalive-time: 60 storage.health-check-interval: 60 performance.client-io-threads: on nfs.disable: on transport.address-family: inet performance.cache-size: 8GB performance.cache-refresh-timeout: 60 cluster.min-free-disk: 3% client.event-threads: 4 performance.io-thread-count: 16 On Fri, Oct 28, 2022 at 11:40 AM Shreyansh Shah mailto:shreyansh.s...@alpha-grep.com>> wrote: Hi, We are running glusterfs 5.10 server volume. Recently we added a few new bricks and started a rebalance operation. After a couple of days the rebalance operation was just stuck, with one of the peers showing In-Progress with no file being read/transferred and the rest showing Failed/Completed, so we stopped it using "gluster volume rebalance data stop". Now when we are trying to start it again, we get the below error. Any assistance would be appreciated root@gluster-11:~# gluster volume rebalance data status volume rebalance: data: failed: Rebalance not started for volume data. root@gluster-11:~# gluster volume rebalance data start volume rebalance: data: failed: Rebalance on data is already started root@gluster-11:~# gluster volume rebalance data stop volume rebalance: data: failed: Rebalance not started for volume data. -- Regards, Shreyansh Shah AlphaGrep* Securities Pvt. Ltd.* -- Regards, Shreyansh Shah AlphaGrep* Securities Pvt. Ltd.* Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] bitd.log and quotad.log flooding /var
Seems it's accumulating again. ATM it's like this: root 2134553 2.1 11.2 23071940 22091644 ? Ssl set23 1059:58 /usr/sbin/glusterfs -s localhost --volfile-id gluster/quotad -p /var/run/gluster/quotad/quotad.pid -l /var/log/glusterfs/quotad.log -S /var/run/gluster/321cad6822171c64.socket --process-name quotad Uptime is 77d. The other 2 nodes are in the same situation. Gluster is 9.5-1 amd64. Is it latest enough or should I plan a migration to 10? Hints? Diego Il 12/08/2022 22:18, Strahil Nikolov ha scritto: 75GB -> that's definately a memory leak. What version do you use ? If latest - open a github issue. Best Regards, Strahil Nikolov On Thu, Aug 11, 2022 at 10:06, Diego Zuccato wrote: Yup. Seems the /etc/sysconfig/glusterd setting got finally applied and I now have a process like this: root 4107315 0.0 0.0 529244 40124 ? Ssl ago08 2:44 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level ERROR but bitd still spits out (some) 'I' lines [2022-08-11 07:02:21.072943 +] I [MSGID: 118016] [bit-rot.c:1052:bitd_oneshot_crawl] 0-cluster_data-bit-rot-0: Triggering signing [{path=/extra/some/other/dirs/file.dat}, {gfid=3e35b158-35a6-4e63-adbd-41075a11022e}, {Brick-path=/srv/bricks/00/d}] Moreover I've had to disable quota, since quota processes were eating more than *75GB* RAM on each storage node! :( Il 11/08/2022 07:12, Strahil Nikolov ha scritto: > Have you decreased glusterd log level via: > glusterd --log-level WARNING|ERROR > > It seems that bitrot doesn't have it's own log level. > > As a workaround, you can configure syslog to send the logs only remotely > and thus preventing the overfill of the /var . > > > Best Regards, > Strahil Nikolov > > On Wed, Aug 10, 2022 at 7:52, Diego Zuccato > mailto:diego.zucc...@unibo.it>> wrote: > Hi Strahil. > > Sure. Luckily I didn't delete 'em all :) > > From bitd.log: > -8<-- > [2022-08-09 05:58:12.075999 +] I [MSGID: 118016] > [bit-rot.c:1052:bitd_oneshot_crawl] 0-cluster_data-bit-rot-0: > Triggering > signing [{path=/astro/...omisis.../file.dat}, > {gfid=5956af24-5efc-496c-8d7e-ea6656f298de}, > {Brick-path=/srv/bricks/10/d}] > [2022-08-09 05:58:12.082264 +] I [MSGID: 118016] > [bit-rot.c:1052:bitd_oneshot_crawl] 0-cluster_data-bit-rot-0: > Triggering > signing [{path=/astro/...omisis.../file.txt}, > {gfid=afb75c03-0d29-414e-917a-ff718982c849}, > {Brick-path=/srv/bricks/13/d}] > [2022-08-09 05:58:12.082267 +] I [MSGID: 118016] > [bit-rot.c:1052:bitd_oneshot_crawl] 0-cluster_data-bit-rot-0: > Triggering > signing [{path=/astro/...omisis.../file.dat}, > {gfid=982bc7a8-d4ba-45d7-9104-044e5d446802}, > {Brick-path=/srv/bricks/06/d}] > [2022-08-09 05:58:12.084960 +] I [MSGID: 118016] > [bit-rot.c:1052:bitd_oneshot_crawl] 0-cluster_data-bit-rot-0: > Triggering > signing [{path=/atmos/...omisis.../file}, > {gfid=17e4dfb0-1f64-47a3-9aa8-b3fa05b7cd4e}, > {Brick-path=/srv/bricks/15/d}] > [2022-08-09 05:58:12.089357 +] I [MSGID: 118016] > [bit-rot.c:1052:bitd_oneshot_crawl] 0-cluster_data-bit-rot-0: > Triggering > signing [{path=/astro/...omisis.../file.txt}, > {gfid=e70bf289-5aeb-43c2-aadd-d18979cf62b5}, > {Brick-path=/srv/bricks/00/d}] > [2022-08-09 05:58:12.094440 +] I [MSGID: 100011] > [glusterfsd.c:1511:reincarnate] 0-glusterfsd: Fetching the volume file > from server... 
[] > [2022-08-09 05:58:12.096299 +] I > [glusterfsd-mgmt.c:2170:mgmt_getspec_cbk] 0-glusterfs: Received list of > available volfile servers: clustor00:24007 clustor02:24007 > [2022-08-09 05:58:12.096653 +] I [MSGID: 101221] > [common-utils.c:3851:gf_set_volfile_server_common] 0-gluster: duplicate > entry for volfile-server [{errno=17}, {error=File già esistente}] > [2022-08-09 05:58:12.096853 +] I > [glusterfsd-mgmt.c:2203:mgmt_getspec_cbk] 0-glusterfs: No change in > volfile,continuing > [2022-08-09 05:58:12.096702 +] I [MSGID: 101221] > [common-utils.c:3851:gf_set_volfile_server_common] 0-gluster: duplicate > entry for volfile-server [{errno=17}, {error=File già esistente}] > [2022-08-09 05:58:12.102176 +] I [MSGID: 118016] > [bit-rot.c:1052:bitd_oneshot_crawl] 0-cluster_data-bit-rot-0: > Triggering > signin
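To put numbers on the leak, something like this can log the resident memory of the gluster daemons over time (a sketch; interval and log path are arbitrary):
-8<--
# Log a timestamped RSS (KiB) snapshot of glusterd, quotad and bitd every 5 minutes.
while sleep 300; do
    {
        date '+%F %T'
        ps -C glusterd -o pid=,rss=,comm=
        ps -eo pid=,rss=,args= | grep -E -- '--process-name (quotad|bitd)'
    } >> /var/log/gluster-rss.log
done
-8<--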
Re: [Gluster-users] bitd.log and quotad.log flooding /var
Yup. Seems the /etc/sysconfig/glusterd setting got finally applied and I now have a process like this: root 4107315 0.0 0.0 529244 40124 ?Ssl ago08 2:44 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level ERROR but bitd still spits out (some) 'I' lines [2022-08-11 07:02:21.072943 +] I [MSGID: 118016] [bit-rot.c:1052:bitd_oneshot_crawl] 0-cluster_data-bit-rot-0: Triggering signing [{path=/extra/some/other/dirs/file.dat}, {gfid=3e35b158-35a6-4e63-adbd-41075a11022e}, {Brick-path=/srv/bricks/00/d}] Moreover I've had to disable quota, since quota processes were eating more than *75GB* RAM on each storage node! :( Il 11/08/2022 07:12, Strahil Nikolov ha scritto: Have you decreased glusterd log level via: glusterd --log-level WARNING|ERROR It seems that bitrot doesn't have it's own log level. As a workaround, you can configure syslog to send the logs only remotely and thus preventing the overfill of the /var . Best Regards, Strahil Nikolov On Wed, Aug 10, 2022 at 7:52, Diego Zuccato wrote: Hi Strahil. Sure. Luckily I didn't delete 'em all :) From bitd.log: -8<-- [2022-08-09 05:58:12.075999 +] I [MSGID: 118016] [bit-rot.c:1052:bitd_oneshot_crawl] 0-cluster_data-bit-rot-0: Triggering signing [{path=/astro/...omisis.../file.dat}, {gfid=5956af24-5efc-496c-8d7e-ea6656f298de}, {Brick-path=/srv/bricks/10/d}] [2022-08-09 05:58:12.082264 +] I [MSGID: 118016] [bit-rot.c:1052:bitd_oneshot_crawl] 0-cluster_data-bit-rot-0: Triggering signing [{path=/astro/...omisis.../file.txt}, {gfid=afb75c03-0d29-414e-917a-ff718982c849}, {Brick-path=/srv/bricks/13/d}] [2022-08-09 05:58:12.082267 +] I [MSGID: 118016] [bit-rot.c:1052:bitd_oneshot_crawl] 0-cluster_data-bit-rot-0: Triggering signing [{path=/astro/...omisis.../file.dat}, {gfid=982bc7a8-d4ba-45d7-9104-044e5d446802}, {Brick-path=/srv/bricks/06/d}] [2022-08-09 05:58:12.084960 +] I [MSGID: 118016] [bit-rot.c:1052:bitd_oneshot_crawl] 0-cluster_data-bit-rot-0: Triggering signing [{path=/atmos/...omisis.../file}, {gfid=17e4dfb0-1f64-47a3-9aa8-b3fa05b7cd4e}, {Brick-path=/srv/bricks/15/d}] [2022-08-09 05:58:12.089357 +] I [MSGID: 118016] [bit-rot.c:1052:bitd_oneshot_crawl] 0-cluster_data-bit-rot-0: Triggering signing [{path=/astro/...omisis.../file.txt}, {gfid=e70bf289-5aeb-43c2-aadd-d18979cf62b5}, {Brick-path=/srv/bricks/00/d}] [2022-08-09 05:58:12.094440 +] I [MSGID: 100011] [glusterfsd.c:1511:reincarnate] 0-glusterfsd: Fetching the volume file from server... 
[] [2022-08-09 05:58:12.096299 +] I [glusterfsd-mgmt.c:2170:mgmt_getspec_cbk] 0-glusterfs: Received list of available volfile servers: clustor00:24007 clustor02:24007 [2022-08-09 05:58:12.096653 +] I [MSGID: 101221] [common-utils.c:3851:gf_set_volfile_server_common] 0-gluster: duplicate entry for volfile-server [{errno=17}, {error=File già esistente}] [2022-08-09 05:58:12.096853 +] I [glusterfsd-mgmt.c:2203:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing [2022-08-09 05:58:12.096702 +] I [MSGID: 101221] [common-utils.c:3851:gf_set_volfile_server_common] 0-gluster: duplicate entry for volfile-server [{errno=17}, {error=File già esistente}] [2022-08-09 05:58:12.102176 +] I [MSGID: 118016] [bit-rot.c:1052:bitd_oneshot_crawl] 0-cluster_data-bit-rot-0: Triggering signing [{path=/astro/...omisis.../file.dat}, {gfid=45f59e3f-eef4-4ccf-baac-bc8bf10c5ced}, {Brick-path=/srv/bricks/09/d}] [2022-08-09 05:58:12.106120 +] I [MSGID: 118016] [bit-rot.c:1052:bitd_oneshot_crawl] 0-cluster_data-bit-rot-0: Triggering signing [{path=/astro/...omisis.../file.txt}, {gfid=216832dd-0a1c-4593-8a9e-f54d70efc637}, {Brick-path=/srv/bricks/13/d}] -8<-- And from quotad.log: -<-- [2022-08-09 05:58:12.291030 +] I [glusterfsd-mgmt.c:2170:mgmt_getspec_cbk] 0-glusterfs: Received list of available volfile servers: clustor00:24007 clustor02:24007 [2022-08-09 05:58:12.291143 +] I [MSGID: 101221] [common-utils.c:3851:gf_set_volfile_server_common] 0-gluster: duplicate entry for volfile-server [{errno=17}, {error=File già esistente}] [2022-08-09 05:58:12.291653 +] I [glusterfsd-mgmt.c:2203:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing [2022-08-09 05:58:12.292990 +] I [glusterfsd-mgmt.c:2170:mgmt_getspec_cbk] 0-glusterfs: Received list of available volfile servers: clustor00:24007 clustor02:24007 [2022-08-09 05:58:12.293204 +] I [glusterfsd-mgmt.c:2170:mgmt_getspec_cbk] 0-glusterfs: Received list of available volfile servers: clustor00:24007 clustor02:24007 [2022-08-09 05:58:12.293500 +] I [glusterfsd-mgmt.c:2203:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing [2
Re: [Gluster-users] bitd.log and quotad.log flooding /var
] 0-gluster: duplicate entry for volfile-server [{errno=17}, {error=File già esistente}] [2022-08-09 22:00:07.364719 +] I [MSGID: 100011] [glusterfsd.c:1511:reincarnate] 0-glusterfsd: Fetching the volume file from server... [] [2022-08-09 22:00:07.374040 +] I [glusterfsd-mgmt.c:2170:mgmt_getspec_cbk] 0-glusterfs: Received list of available volfile servers: clustor00:24007 clustor02:24007 [2022-08-09 22:00:07.374099 +] I [MSGID: 101221] [common-utils.c:3851:gf_set_volfile_server_common] 0-gluster: duplicate entry for volfile-server [{errno=17}, {error=File già esistente}] [2022-08-09 22:00:07.374569 +] I [glusterfsd-mgmt.c:2203:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing [2022-08-09 22:00:07.385610 +] I [glusterfsd-mgmt.c:2170:mgmt_getspec_cbk] 0-glusterfs: Received list of available volfile servers: clustor00:24007 clustor02:24007 [2022-08-09 22:00:07.386119 +] I [glusterfsd-mgmt.c:2203:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing -8<-- I've now used gluster v set cluster_data diagnostics.brick-sys-log-level CRITICAL and rate of filling decreased, but I still see many 'I' lines :( Using Gluster 9.5 packages from deb [arch=amd64] https://download.gluster.org/pub/gluster/glusterfs/9/LATEST/Debian/bullseye/amd64/apt bullseye main Tks, Diego Il 09/08/2022 22:08, Strahil Nikolov ha scritto: Hey Diego, can you show a sample of such Info entries ? Best Regards, Strahil Nikolov On Mon, Aug 8, 2022 at 15:59, Diego Zuccato wrote: Hello all. Lately, I noticed some hickups in our Gluster volume. It's a "replica 3 arbiter 1" with many bricks (currently 90 data bricks over 3 servers). I tried to reduce log level by setting diagnostics.brick-log-level: ERROR diagnostics.client-log-level: ERROR and creating /etc/default/glusterd containing "LOG_LEVEL=ERROR". But I still see a lot of 'I' lines in the logs and have to manually run logrotate way too often or /var gets too full. Any hints? What did I forget? Tks. -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk <https://meet.google.com/cpu-eiue-hvk> Gluster-users mailing list Gluster-users@gluster.org <mailto:Gluster-users@gluster.org> https://lists.gluster.org/mailman/listinfo/gluster-users <https://lists.gluster.org/mailman/listinfo/gluster-users> -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
[Gluster-users] bitd.log and quotad.log flooding /var
Hello all. Lately, I noticed some hiccups in our Gluster volume. It's a "replica 3 arbiter 1" with many bricks (currently 90 data bricks over 3 servers). I tried to reduce log level by setting diagnostics.brick-log-level: ERROR diagnostics.client-log-level: ERROR and creating /etc/default/glusterd containing "LOG_LEVEL=ERROR". But I still see a lot of 'I' lines in the logs and have to manually run logrotate way too often or /var gets too full. Any hints? What did I forget? Tks. -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
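Collecting in one place the knobs that came up in this thread (a sketch; volume name as used above, and the exact logrotate config file name depends on the installed packages):
-8<--
VOL=cluster_data
gluster volume set $VOL diagnostics.brick-log-level ERROR
gluster volume set $VOL diagnostics.client-log-level ERROR
gluster volume set $VOL diagnostics.brick-sys-log-level CRITICAL
# glusterd itself reads LOG_LEVEL from its defaults file on Debian:
echo 'LOG_LEVEL=ERROR' > /etc/default/glusterd
systemctl restart glusterd
# If /var still fills up, rotate the gluster logs by hand (check the actual
# config file name shipped by your packages):
logrotate --force /etc/logrotate.d/glusterfs-common
-8<--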
[Gluster-users] Debian repository instructions outdated
Hello all. The instructions given f.e. at [1] do not follow the Debian instructions for 3rdparty repositories [2] . Mostly it boils down to changing the first step to: mkdir /etc/apt/keyrings curl https://download.gluster.org/pub/gluster/glusterfs/9/rsa.pub | gpg --dearmor > /etc/apt/keyrings/gluster-archive-keyring.gpg and then add 'signed-by=/etc/apt/keyrings/gluster-archive-keyring.gpg' between '[' and 'arch=amd64'. HIH, Diego [1] https://download.gluster.org/pub/gluster/glusterfs/9/9.4/Debian/ [2] https://wiki.debian.org/DebianRepository/UseThirdParty -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
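Putting [1] and [2] together, the whole setup would look roughly like this (the sources.list.d file name is arbitrary; the repository URL is the bullseye/amd64 one already used elsewhere on this list):
-8<--
mkdir -p /etc/apt/keyrings
curl -s https://download.gluster.org/pub/gluster/glusterfs/9/rsa.pub \
    | gpg --dearmor > /etc/apt/keyrings/gluster-archive-keyring.gpg
cat > /etc/apt/sources.list.d/gluster.list <<'EOF'
deb [signed-by=/etc/apt/keyrings/gluster-archive-keyring.gpg arch=amd64] https://download.gluster.org/pub/gluster/glusterfs/9/LATEST/Debian/bullseye/amd64/apt bullseye main
EOF
apt update
-8<--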
[Gluster-users] Flood of "SSL support for MGMT is ENABLED"
Hello all. I have a Gluster 9.2 volume in "replica 3 arbiter 1": -8<-- Volume Name: cluster_data Type: Distributed-Replicate Volume ID: a8caaa90-d161-45bb-a68c-278263a8531a Status: Started Snapshot Count: 0 Number of Bricks: 45 x (2 + 1) = 135 Transport-type: tcp Bricks: Brick1: clustor00:/srv/bricks/00/d Brick2: clustor01:/srv/bricks/00/d Brick3: clustor02:/srv/quorum/00/d (arbiter) [...] Brick133: clustor01:/srv/bricks/29/d Brick134: clustor02:/srv/bricks/29/d Brick135: clustor00:/srv/quorum/14/d (arbiter) Options Reconfigured: cluster.granular-entry-heal: disable features.scrub-throttle: normal performance.parallel-readdir: on performance.write-behind-window-size: 128MB cluster.self-heal-daemon: enable features.default-soft-limit: 90 features.quota-deem-statfs: on features.inode-quota: on features.quota: on transport.address-family: inet nfs.disable: on performance.client-io-threads: on client.event-threads: 8 performance.cache-refresh-timeout: 60 performance.stat-prefetch: on cluster.lookup-optimize: on features.bitrot: on features.scrub: Active diagnostics.brick-log-level: WARNING diagnostics.client-log-level: WARNING config.brick-threads: 0 cluster.lookup-unhashed: on config.client-threads: 36 -8<-- I've had to reboot clustor02 and now that I'm trying to restart it I get the log flooded by lines like: [2022-06-06 10:18:01.639007 +] I [socket.c:4279:ssl_setup_connection_params] 0-socket.management: SSL support for MGMT is ENABLED IO path is ENABLED certificate depth is 1 for peer 192.168.253.79:48962 [2022-06-06 10:18:01.641246 +] I [socket.c:4279:ssl_setup_connection_params] 0-socket.management: SSL support for MGMT is ENABLED IO path is ENABLED certificate depth is 1 for peer 192.168.253.73:48951 To have a working node I've had to create /etc/sysconfig/glusterd file containing LOG_LEVEL=WARNING But that just hides the messages... Is that normal behaviour? -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Usage not updating in quotas
Hello Amar. Maybe I missed something, but does that mean that if I upgrade from 9.5 to 10 I lose the quota? Seems too strange to be true... Tks, Diego Il 28/04/2022 07:12, Amar Tumballi ha scritto: Hi Alan, Strahil, On Thu, Apr 28, 2022 at 3:50 AM Strahil Nikolov <mailto:hunter86...@yahoo.com>> wrote: @Amar, did the quota feature reach Gluster v10 ? It got merged in the development branch (ie, confirmed on v11). As per our release policy, it wouldn't make it to v10 as its a feature. Anyone wanting to test it, experiment with it should build from nightly or development branch for now. Regards, Amar On Tue, Apr 26, 2022 at 12:09, Alan Orth mailto:alan.o...@gmail.com>> wrote: Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
[Gluster-users] Outdated docs?
Hello all. I think there's something wrong in the docs: the page https://docs.gluster.org/en/main/Administrator-Guide/Handling-of-users-with-many-groups/ says "The FUSE client gets the groups of the process that does the I/O by reading the information from /proc/$pid/status. This file only contains up to 32 groups." I checked on my system and status files report way more than 32 groups (when the user does have 'em, obv). It could probably just be outdated info: I think it got 'fixed' 9y ago by this patch: https://linux-kernel.vger.kernel.narkive.com/KDWSnAMn/patch-proc-pid-status-show-all-supplementary-groups -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
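The current behaviour is easy to verify on a running system (a sketch; 'someuser' is a placeholder for an account with more than 32 groups):
-8<--
# Groups listed in /proc for the current shell (subtract 1 for the label):
grep '^Groups:' /proc/self/status | wc -w
# Supplementary groups of a given account:
id -G someuser | wc -w          # 'someuser' is a placeholder
-8<--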
Re: [Gluster-users] Indexing/Importing existing files on disk
Il 03/04/2022 16:36, Strahil Nikolov ha scritto: Using relatime/noatime mount option reduces the I/O to the brick device. IMVHO this sentence could cause misunderstandings. :) It could be read like "noatime slows down your brick" while, IIUC, it really means it *improves* the brick's performance by reducing the number of "housekeeping" IOs. -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
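For a brick filesystem that translates into something like this (a sketch; device and mount point are placeholders):
-8<--
# Remount an existing XFS brick with noatime (placeholder mount point):
mount -o remount,noatime /srv/bricks/00
# ...and make it persistent in /etc/fstab:
# /dev/sdb1  /srv/bricks/00  xfs  defaults,noatime  0  0
-8<--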
Re: [Gluster-users] Interpreting gluster volume top read/write output
Hi Hubert. Didn't notice the 'top' command, so I cannot answer your doubts. But I tried and noticed that read reports some entries like 2880 2761 Hope that's not symptom of a problem... Il 22/02/2022 09:41, Hu Bert ha scritto: Hello @ll, we're just doing some "research" on our own replica 3 volume (hostnames gserver1-3, gluster 9.4), and there are a few questions regarding the output of 'gluster volume top $volname read/write'. 1) gluster volume top workdata write Brick: gserver2:/gluster/md7/workdata Count filename === 203 /images/504/013/50401355/de.mp4 195 /images/396/910/39691058/de.mp4 167 /themes/oad-high-scardus-trail/media/220202Hstimageslider1.mp4 does this mean that these files have been written 203/195/... times? lifetime writes? 2) gluster volume top workdata read Brick: gserver1:/gluster/md3/workdata Count filename === 1794/images/441/297/44129755/de.mp4 275 /images/275/806/27580686/default.jpg 258 /images/269/844/26984442/default.jpg 256 /images/269/845/26984597/default.jpg gserver1 was rebooted yesterday; does this mean these files have been read 1794/275/... times? lifetime reads or since reboot? We're just a bit ... curious if these are real read/write stats, lifetime, since reboot. thx, Hubert Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
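FWIW, the top counters can also be narrowed to a single brick and reset; IIRC they live in the brick process, so they start from zero whenever the brick (re)starts. A sketch, assuming the syntax of recent releases and using the volume/brick names from Hubert's mail:
-8<--
VOL=workdata
BRICK=gserver1:/gluster/md3/workdata
gluster volume top $VOL read  brick $BRICK list-cnt 10
gluster volume top $VOL write brick $BRICK list-cnt 10
# Reset the counters so the next run clearly shows "reads/writes since now":
gluster volume top $VOL clear brick $BRICK
-8<--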
Re: [Gluster-users] Experimenting with thin-arbiter
Not there. It's not one of the defined services :( Maybe Debian does not support it? Il 16/02/2022 13:26, Strahil Nikolov ha scritto: My bad, it should be /gluster-ta-volume.service/ On Wed, Feb 16, 2022 at 7:45, Diego Zuccato wrote: No such process is defined. Just the standard glusterd.service and glustereventsd.service. Using Debian stable. Il 15/02/2022 15:41, Strahil Nikolov ha scritto: > Any errors in gluster-ta.service on the arbiter node ? > Best Regards, > Strahil Nikolov
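A few harmless checks to see whether the thin-arbiter bits are present at all on a Debian host; the package name glusterfs-server is an assumption, and a glusterd crash would normally leave a "signal received" trace in its log:
systemctl list-unit-files 'gluster*'
dpkg -L glusterfs-server | grep -i -e thin -e 'ta-volume'
grep -iE 'signal received|crash' /var/log/glusterfs/glusterd.log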
Re: [Gluster-users] Experimenting with thin-arbiter
No such process is defined. Just the standard glusterd.service and glustereventsd.service. Using Debian stable. Il 15/02/2022 15:41, Strahil Nikolov ha scritto: Any errors in gluster-ta.service on the arbiter node ? Best Regards, Strahil Nikolov -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786
[Gluster-users] Experimenting with thin-arbiter
Hello all. I'm experimenting with thin-arbiter and getting disappointing results.
I have 3 hosts in the trusted pool:
root@nas1:~# gluster --version
glusterfs 9.2
[...]
root@nas1:~# gluster pool list
UUID                                  Hostname   State
d4791fed-3e6d-4f8f-bdb6-4e0043610ead  nas3       Connected
bff398f0-9d1d-4bd0-8a47-0bf481d1d593  nas2       Connected
4607034c-919d-4675-b5fc-14e1cad90214  localhost  Connected
When I try to create a new volume, the first initialization succeeds:
root@nas1:~# gluster v create Bck replica 2 thin-arbiter 1 nas{1,3}:/bricks/00/Bck nas2:/bricks/arbiter/Bck
volume create: Bck: success: please start the volume to access data
But adding a second brick segfaults the daemon:
root@nas1:~# gluster v add-brick Bck nas{1,3}:/bricks/01/Bck
Connection failed. Please check if gluster daemon is operational.
After erroring out, systemctl status glusterd reports the daemon in "restarting" state and it eventually restarts. But the new brick is not added to the volume, even if trying to re-add it yields a "brick is already part of a volume" error. Seems glusterd crashes between marking the brick dir as used and recording its data in the config.
If I try to add all the bricks during the creation, glusterd does not die but the volume doesn't get created:
root@nas1:~# rm -rf /bricks/{00..07}/Bck && mkdir /bricks/{00..07}/Bck
root@nas1:~# gluster v create Bck replica 2 thin-arbiter 1 nas{1,3}:/bricks/00/Bck nas{1,3}:/bricks/01/Bck nas{1,3}:/bricks/02/Bck nas{1,3}:/bricks/03/Bck nas{1,3}:/bricks/04/Bck nas{1,3}:/bricks/05/Bck nas{1,3}:/bricks/06/Bck nas{1,3}:/bricks/07/Bck nas2:/bricks/arbiter/Bck
volume create: Bck: failed: Commit failed on localhost. Please check the log file for more details.
Couldn't find anything useful in the logs :(
If I create a "replica 3 arbiter 1" over the same brick directories (just adding some directories to keep arbiters separated), it succeeds:
root@nas1:~# gluster v create Bck replica 3 arbiter 1 nas{1,3}:/bricks/00/Bck nas2:/bricks/arbiter/Bck/00
volume create: Bck: success: please start the volume to access data
root@nas1:~# for T in {01..07}; do gluster v add-brick Bck nas{1,3}:/bricks/$T/Bck nas2:/bricks/arbiter/Bck/$T ; done
volume add-brick: success
volume add-brick: success
volume add-brick: success
volume add-brick: success
volume add-brick: success
volume add-brick: success
volume add-brick: success
root@nas1:~# gluster v start Bck
volume start: Bck: success
root@nas1:~# gluster v info Bck
Volume Name: Bck
Type: Distributed-Replicate
Volume ID: 4786e747-8203-42bf-abe8-107a50b238ee
Status: Started
Snapshot Count: 0
Number of Bricks: 8 x (2 + 1) = 24
Transport-type: tcp
Bricks:
Brick1: nas1:/bricks/00/Bck
Brick2: nas3:/bricks/00/Bck
Brick3: nas2:/bricks/arbiter/Bck/00 (arbiter)
Brick4: nas1:/bricks/01/Bck
Brick5: nas3:/bricks/01/Bck
Brick6: nas2:/bricks/arbiter/Bck/01 (arbiter)
Brick7: nas1:/bricks/02/Bck
Brick8: nas3:/bricks/02/Bck
Brick9: nas2:/bricks/arbiter/Bck/02 (arbiter)
Brick10: nas1:/bricks/03/Bck
Brick11: nas3:/bricks/03/Bck
Brick12: nas2:/bricks/arbiter/Bck/03 (arbiter)
Brick13: nas1:/bricks/04/Bck
Brick14: nas3:/bricks/04/Bck
Brick15: nas2:/bricks/arbiter/Bck/04 (arbiter)
Brick16: nas1:/bricks/05/Bck
Brick17: nas3:/bricks/05/Bck
Brick18: nas2:/bricks/arbiter/Bck/05 (arbiter)
Brick19: nas1:/bricks/06/Bck
Brick20: nas3:/bricks/06/Bck
Brick21: nas2:/bricks/arbiter/Bck/06 (arbiter)
Brick22: nas1:/bricks/07/Bck
Brick23: nas3:/bricks/07/Bck
Brick24: nas2:/bricks/arbiter/Bck/07 (arbiter)
Options Reconfigured:
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet nfs.disable: on performance.client-io-threads: off Does thin arbiter support just one replica of bricks? -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Arbiter
Il 08/02/2022 12:17, Karthik Subrahmanya ha scritto: Since there are 4 nodes available here, and based on the configuration of the available volumes (requested volume info for the same) I was thinking whether the arbiter brick can be hosted on one of those nodes itself, or a new node is required.
We're using replica 3 arbiter 1, with quorum balanced between the 3 servers. No need for an extra server. When we add a 4th server, there'll be a lot of brick juggling (luckily they're connected by IB100 :) ). The simplest thing you can do to balance load across 4 servers is laying down data as:
S1  S2  S3  S4
0a  0b  0q  1a
1b  1q  2a  2b
2q  3a  3b  3q
... and so on: it requires adding 8 disks at a time, 2 per server -- as long as you have enough blocks *and inodes* available on an SSD for the metadata (quorum) bricks. Hope the layout is clear: Xa and Xb are the replicated bricks, Xq is the quorum brick for bricks Xa and Xb. For a 3-server setup the layout we're using is:
S1  S2  S3
0a  0b  0q
1a  1q  1b
2q  2a  2b
HIH. -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
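For illustration, this is roughly how the 3-server rotating layout above would translate into a volume create command; hostnames, volume name and brick paths are only placeholders, and the arbiter brick is always the third one of each triplet:
gluster volume create demo replica 3 arbiter 1 \
  s1:/bricks/0a/demo s2:/bricks/0b/demo s3:/bricks/0q/demo \
  s1:/bricks/1a/demo s3:/bricks/1b/demo s2:/bricks/1q/demo \
  s2:/bricks/2a/demo s3:/bricks/2b/demo s1:/bricks/2q/demo
gluster volume start demo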
Re: [Gluster-users] Arbiter
IIUC it always requires 3 servers. Lightweight arbiter is just to avoid split brain (a client needs to reach two servers out of three to be able to write data). "Full" arbiter is a third replica of metadata while there are only two copies of the data. Il 08/02/2022 11:58, Gilberto Ferreira ha scritto: Forgive me if I am wrong, but AFAIK, arbiter is for a two-node configuration, isn't it? --- Gilberto Nunes Ferreira (47) 99676-7530 - Whatsapp / Telegram Em ter., 8 de fev. de 2022 às 07:17, Karthik Subrahmanya mailto:ksubr...@redhat.com>> escreveu: Hi Andre, Striped volumes are deprecated long back, see [1] & [2]. Seems like you are using a very old version. May I know which version of gluster you are running and the gluster volume info please? Release schedule and the maintained branches can be found at [3]. [1] https://docs.gluster.org/en/latest/release-notes/6.0/ <https://docs.gluster.org/en/latest/release-notes/6.0/> [2] https://lists.gluster.org/pipermail/gluster-users/2018-July/034400.html <https://lists.gluster.org/pipermail/gluster-users/2018-July/034400.html> [3] https://www.gluster.org/release-schedule/ <https://www.gluster.org/release-schedule/> Regards, Karthik On Mon, Feb 7, 2022 at 9:43 PM Andre Probst mailto:andrefpro...@gmail.com>> wrote: I have a striped and replicated volume with 4 nodes. How do I add an arbiter to this volume? -- André Probst Consultor de Tecnologia 43 99617 8765 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk <https://meet.google.com/cpu-eiue-hvk> Gluster-users mailing list Gluster-users@gluster.org <mailto:Gluster-users@gluster.org> https://lists.gluster.org/mailman/listinfo/gluster-users <https://lists.gluster.org/mailman/listinfo/gluster-users> Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk <https://meet.google.com/cpu-eiue-hvk> Gluster-users mailing list Gluster-users@gluster.org <mailto:Gluster-users@gluster.org> https://lists.gluster.org/mailman/listinfo/gluster-users <https://lists.gluster.org/mailman/listinfo/gluster-users> Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
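To make the "full" arbiter case concrete: a plain replica 2 volume (not striped) can usually be converted by adding one arbiter brick per replica pair; a 1x2 volume needs exactly one arbiter brick, a distributed N x 2 volume needs N of them in the same command. Names below are placeholders:
gluster volume add-brick VOLNAME replica 3 arbiter 1 arbiterhost:/bricks/arbiter/VOLNAME
gluster volume heal VOLNAME info summary   # the arbiter brick is then populated (metadata only) by self-heal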
Re: [Gluster-users] Swapping brick mounts/nodes
Il 01/02/2022 20:08, Fox ha scritto: Basically I'm asking if the bricks are mountpoint and node agnostic.
Nope. They aren't :( (unless something changed in the latest releases). Some days ago I asked basically the same question (how to move a volume to a new server). -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
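If the goal is simply to retire a brick path or host on a replicated volume and let Gluster repopulate the data itself, the usual tool is replace-brick; hostnames, volume name and paths below are only placeholders:
gluster volume replace-brick VOLNAME oldhost:/bricks/b1/VOLNAME newhost:/bricks/b1/VOLNAME commit force
gluster volume heal VOLNAME info summary   # the new (empty) brick is then filled by self-heal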
Re: [Gluster-users] Move a volume to a new server
Il 18/01/2022 18:27, Strahil Nikolov ha scritto:
> If you manage to get that server and can setup a replica -> then the migration can be transparent for the clients.
But I'll be beaten by the network team :) OK as a last resort and for safety.
> Another option is to move both OS+Gluster disks, rebuild the initramfs and thus you will change only the decommissioned hardware.
That's more problematic... The same server serves another volume, that must stay there. -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Move a volume to a new server
Tks. I'll do it. Worst case: I'll have to create a new volume and move files from the old bricks to it. The safety is what I like most in Gluster. The 'funny' thing is that a replica server is planned but not yet operative :(
Il 17/01/2022 18:24, Strahil Nikolov ha scritto: Verify that you can install the same version of gluster. If not, plan to update to a version that is available to both old and new servers' OS. Once you migrate (check all release notes in advance) to a common version, you can do something like this:
- Install the gluster software on the new host
- Setup the firewall to match the old server
- Stop the gluster volumes and any geo-rep sessions
- Shutdown glusterd service
- Umount the bricks
- Disable LVM LVs/VGs that host your bricks (if you share the same VG with other software, you will have to use vgsplit)
- Remove the multipath devices (multipath -f)
- Remove the block devices that are part of those multipath devices
- Backup /etc/glusterfs
- Backup /var/lib/glusterd
- Unmap the LUNs
- Present the LUNs on the new host
- Verify that the multipath devices are there
- Rescan the LVM stack (pvscan --cache, vgscan, lvscan)
- Activate the VGs/LVs
- Mount the bricks and ensure mounting on boot (autofs, systemd's '.mount/.automount' units, fstab)
- Restore /etc/glusterfs & /var/lib/glusterd
- Start the glusterd service
- Start the volumes
- Mount via FUSE to verify the situation
- Start the geo-replications (if any)
Note, if you use VDO - disable the volume on the old system and backup the config (/etc/vdoconf.yml) -> restore on the new host. Check your tuned profile and if needed transfer the configuration file to the new system and activate it. I might have missed something (like custom entries in /etc/hosts), so do a short test on a test system in advance. Edit: You didn't mention your FS type, so I assume XFS. Best Regards, Strahil Nikolov
-- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
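A rough sketch of the configuration backup/restore steps from the checklist above, assuming the new machine keeps the old server's hostname/IP and runs the same gluster version (peers and clients identify it by those, and glusterd by the UUID stored in /var/lib/glusterd/glusterd.info); the volume name is a placeholder:
# on the old server, once the volumes are stopped
systemctl stop glusterd
tar czf /root/gluster-config.tgz /etc/glusterfs /var/lib/glusterd
# on the new server, after the LUNs are visible and the bricks are mounted at the same paths
tar xzf /root/gluster-config.tgz -C /
systemctl start glusterd
gluster volume start VOLNAME
gluster volume status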
[Gluster-users] Move a volume to a new server
Hello all. I have a Gluster volume that I'd need to move to a different server. The volume is 4x10TB bricks accessed via FC (different LUNs) on an old CX3-80. I have no extra space to create a copy of all the data, so I'd need to hide the LUNs from the old server and make 'em visible to the new ("move the disks"), w/o copying data. Can I just do something like this? - stop volume - umount bricks - copy volume state files to new server (which ones?) - map LUNs to new server - mount bricks on new server (maintaining the same path they had on old server) - start glusterd on new server - start volume Tks! -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] force realignement after downing a node?
Done just that :) Today I upgraded the second node, with a cleaner shutdown. Strangely, at reboot it worked for about half an hour with all the cores at 100% (but low mem use) and "gluster v heal cluster_data info" apparently hanging. Had lunch and now all cores are back to normal (20-60%), memory use is higher and gluster is responding again. Still no files in heal pending. I'll skip tomorrow, then upgrade the last server. Hope it all goes smoothly again. Tks. Il 30/11/2021 13:17, Strahil Nikolov ha scritto: Than, take a beer/tee/coffee/ and enjoy the rest of the day ;) Best Regards, Strahil Nikolov -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786
Re: [Gluster-users] force realignement after downing a node?
Here it is. Seems gluster thinks there's nothing to be done... -8<-- root@str957-clustor00:~# gluster v heal cluster_data info summary Brick clustor00:/srv/bricks/00/d Status: Connected Total Number of entries: 0 Number of entries in heal pending: 0 Number of entries in split-brain: 0 Number of entries possibly healing: 0 Brick clustor01:/srv/bricks/00/d Status: Connected Total Number of entries: 0 Number of entries in heal pending: 0 Number of entries in split-brain: 0 Number of entries possibly healing: 0 Brick clustor02:/srv/quorum/00/d Status: Connected Total Number of entries: 0 Number of entries in heal pending: 0 Number of entries in split-brain: 0 Number of entries possibly healing: 0 Brick clustor02:/srv/bricks/00/d Status: Connected Total Number of entries: 0 Number of entries in heal pending: 0 Number of entries in split-brain: 0 Number of entries possibly healing: 0 Brick clustor00:/srv/bricks/01/d Status: Connected Total Number of entries: 0 Number of entries in heal pending: 0 Number of entries in split-brain: 0 Number of entries possibly healing: 0 Brick clustor01:/srv/quorum/00/d Status: Connected Total Number of entries: 0 Number of entries in heal pending: 0 Number of entries in split-brain: 0 Number of entries possibly healing: 0 [... snip: everything reports 0...] Brick clustor01:/srv/bricks/29/d Status: Connected Total Number of entries: 0 Number of entries in heal pending: 0 Number of entries in split-brain: 0 Number of entries possibly healing: 0 Brick clustor02:/srv/bricks/29/d Status: Connected Total Number of entries: 0 Number of entries in heal pending: 0 Number of entries in split-brain: 0 Number of entries possibly healing: 0 Brick clustor00:/srv/quorum/14/d Status: Connected Total Number of entries: 0 Number of entries in heal pending: 0 Number of entries in split-brain: 0 Number of entries possibly healing: 0 -8<-- Il 29/11/2021 12:02, Strahil Nikolov ha scritto: What is the output of 'gluster volume heal VOLUME info summary' ? Best Regards, Strahil Nikolov On Mon, Nov 29, 2021 at 10:33, Diego Zuccato wrote: Hello all. I just brought offline a node (in a replica 3 arbiter 1 volume) to install more RAM. The other two nodes kept being used, so I expected to see some resync at power on. But I saw nothing unusual: seems it's just serving files as usual. Is it normal or should I force a resync? If so, how? Regards. -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk <https://meet.google.com/cpu-eiue-hvk> Gluster-users mailing list Gluster-users@gluster.org <mailto:Gluster-users@gluster.org> https://lists.gluster.org/mailman/listinfo/gluster-users <https://lists.gluster.org/mailman/listinfo/gluster-users> -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
[Gluster-users] force realignement after downing a node?
Hello all. I just brought offline a node (in a replica 3 arbiter 1 volume) to install more RAM. The other two nodes kept being used, so I expected to see some resync at power on. But I saw nothing unusual: seems it's just serving files as usual. Is it normal or should I force a resync? If so, how? Regards. -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
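For reference, a resync can also be triggered by hand; a minimal sketch (the volume name is a placeholder):
gluster volume heal VOLNAME              # index heal: process the entries already queued on the bricks
gluster volume heal VOLNAME full         # full heal: crawl the bricks and heal anything that differs
gluster volume heal VOLNAME info summary # check that the pending counters go back to zero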
Re: [Gluster-users] Re-Add Distributed Volume Volume
Hi. If you can still move files from the broken *volume*, you don't have to touch the .glusterfs folder: it's managed by gluster itself and that's the preferred way to recover (more like a transfer between two volumes that only accidentally share the filesystem on the bricks). But if the vol was really broken (wouldn't start at all), the only way to recover would be to read the files *from the bricks*. Those are quite different scenarios that require different recovery methods. Recovering from the bricks is slightly more complicated, since you have to manually handle duplicate files, checking if they're actually identical or if there are differences. Il 16/11/2021 11:14, Taste-Of-IT ha scritto: Hi Diego, I noticed that when I move files from the broken volume to the newly mounted GlusterFS volume, the folders and files are also deleted from the .glusterfs directory. That's ok, right? Because those are the hard links of the files, right? You wrote that you deleted the .glusterfs folder first and then moved. I didn't try it because I am afraid of losing all files, if these are the hard links. I also noticed that if I moved files and the folders and files were also deleted in .glusterfs, the disk size didn't change. => I read about hard links and it seems that there is a remaining part of them, that's why the free space rises, right? So I have to delete the .glusterfs directory, right, and no "real" files are deleted. That's what I understand now by reading about hard links. What do you think? thx -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Re-Add Distributed Volume Volume
Il 15/11/2021 06:45, Strahil Nikolov ha scritto:
> Gluster uses hard links (2 entries pointing to the same inode) and until the hard links are deleted, the data will still be there. [...] The hard links are in the .glusterfs directory and after a successful move you can delete them.
When I've had to move from a "broken" volume to a newly created one, I first deleted the .glusterfs folders from the roots of the old bricks (the volume was already broken, after all) and then moved the other folders to their new home (the new volume, mounted on every node). I just avoided overwriting existing files. This way the bricks didn't overflow. -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
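For reference, a quick way to see the hard-link relationship directly on a brick; paths here are only examples. For regular files gluster keeps a hard link under .glusterfs/<first two hex digits of the gfid>/<next two>/<full gfid>:
stat -c '%h  %n' /srv/bricks/00/vol/some/dir/file.bin      # a link count of 2+ means the .glusterfs twin still exists
getfattr -n trusted.gfid -e hex /srv/bricks/00/vol/some/dir/file.bin
# e.g. a gfid of 0xd2e7... means the hard link lives at .glusterfs/d2/e7/d2e7...-... on the same brick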
Re: [Gluster-users] Gluster extra large file on brick
Il 06/07/2021 18:28, Dan Thomson ha scritto: Hi. Maybe you're hitting the "reserved space for root" (usually 5%): when you try to write from the server directly to the brick, you're mos probably doing it from root and you use the reserved space. When you try writing from a client you're likely using a normal user and get the "no space left". Another possible issue to watch out for, is exhaustion of inodes (I've been bitten by it for arbiter bricks partition). HIH, Diego Hi gluster users, I'm having an issue that I'm hoping to get some help with on a dispersed volume (EC: 2x(4+2)) that's causing me some headaches. This is on a cluster running Gluster 6.9 on CentOS 7. At some point in the last week, writes to one of my bricks have started failing due to an "No Space Left on Device" error: [2021-07-06 16:08:57.261307] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-gluster-01-server: 1853436561: WRITEV -2 (f2d6f2f8-4fd7-4692-bd60-23124897be54), client: CTX_ID:648a7383-46c8-4ed7-a921-acafc90bec1a-GRAPH_ID:4-PID:19471-HOST:rhevh08.mgmt.triumf.ca-PC_NAME:gluster-01-client-5-RECON_NO:-5, error-xlator: gluster-01-posix [No space left on device] The disk is quite full (listed as 100% on the server), but does have some writable room left: /dev/mapper/vg--brick1-brick1 11T 11T 97G 100% /data/glusterfs/gluster-01/brick1 however, I'm not sure if the amount of disk space used on the physical drive is the true cause of the "No Space Left on Device" errors anyway. I can still manually write to this brick outside of Gluster, so it seems like the operating system isn't preventing the writes from happening. During my investigation, I noticed that one .glusterfs paths on the problem server is using up much more space than it is on the other servers. I can't quite figure out why that might be, or how that happened. I'm wondering if there's any advice on what the cause might've been. I had done some package updates on this server with the issue and not on the other servers. This included the kernel version, but didn't include the Gluster packages. So possibly this, or the reboot to load the new kernel may have caused a problem. I have scripts on my gluster machines to nicely kill all of the brick processes before rebooting, so I'm not leaning towards an abrupt shutdown being the cause, but it's a possibility. I'm also looking for advice on how to safely remove the problem file and rebuild it from the other Gluster peers. I've seen some documentation on this, but I'm a little nervous about corrupting the volume if I misunderstand the process. I'm not free to take the volume or cluster down and do maintenance at this point, but that might be something I'll have to consider if it's my only option. For reference, here's the comparison of the same path that seems to be taking up extra space on one of the hosts: 1: 26G /data/gluster-01/brick1/vol/.glusterfs/99/56 2: 26G /data/gluster-01/brick1/vol/.glusterfs/99/56 3: 26G /data/gluster-01/brick1/vol/.glusterfs/99/56 4: 26G /data/gluster-01/brick1/vol/.glusterfs/99/56 5: 26G /data/gluster-01/brick1/vol/.glusterfs/99/56 6: 3.0T /data/gluster-01/brick1/vol/.glusterfs/99/56 Any and all advice is appreciated. Thanks! 
-- Daniel Thomson DevOps Engineer t +1 604 222 7428 dthom...@triumf.ca TRIUMF Canada's particle accelerator centre www.triumf.ca @TRIUMFLab 4004 Wesbrook Mall Vancouver BC V6T 2A3 Canada Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
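A few quick checks related to the reserved-space and inode hypotheses above; the device and volume names are taken from the error messages and may not match the real setup, tune2fs only applies if the brick is ext4 (XFS has no root reserve), and storage.reserve is gluster's own reserved percentage, if the running version supports it:
df -i /data/glusterfs/gluster-01/brick1
tune2fs -l /dev/mapper/vg--brick1-brick1 | grep -i 'reserved block count'
gluster volume get gluster-01 storage.reserve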
Re: [Gluster-users] File not found errors
Il 24/06/21 03:45, cfel...@rocketmail.com ha scritto: Probably related to the problem I reported some time ago (apr, 22nd), thread "quantum directories?". Got no answer :( Hi, I've got an interesting issue with files not being found when accessed directly. When accessing a file like so: #ls /mnt/path/to/some/file/file.json ls: cannot access '/mnt/path/to/some/file/file.json': No such file or directory I can wait, try again, and same result. I could put this in a loop with a delay between read attempts, same result. However, if I do an "ls" of the directory first then everything works. That is: # ls /mnt/path/to/some/file/file.json ls: cannot access '/mnt/path/to/some/file/file.json': No such file or directory # ls /mnt/path/to/some/file/ dir1 dir2 file1.txt file2.txt file3.txt file.json At this point I can successfully "ls" the file: # ls /mnt/path/to/some/file/file.json /mnt/path/to/some/file/file.json It is curious, but seems to be isolated to certain directories/mount points. Looking at the mount logs there is absolutely nothing printed to the mnt log on the client with regards to the failure or the eventual success. The configuration: Distributed-Replicate 4 x 2 = 8 Gluster version 9.2 Clients are using native FUSE mounts (also version 9.2) I did recently add the 4th brick, and the system is currently undergoing a rebalance (the fix-layout rebalance already completed). I am throttling the full rebalance as 'lazy'. I looked at the bricks on the server and the files still exists on the older bricks, so I don't think the files in question got rebalanced (yet). The system load isn't too high, and this doesn't seem to be a consistent error as I don't have this issue in other directories, but I consistently have the issue on this directory as well as a few others. Adding some additional background/history: Cluster was setup a couple of years ago on gluster 6.x. Did an expansion (two replica pairs to three replica pairs) and rebalance while on 6.x, no issues. Upgraded to gluster 7.x -> 8.x -> 9.x Ops version has been upgraded to 9 Doing another expansion (three replica pairs to four replica pairs) and rebalance now, and seeing some interesting issues, including this one. There is 0 entries under "heal info" or "info split-brain". I'm just wondering if anyone on this list has seen anything similar, or has any suggestions. Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
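Since a rebalance is in progress in the report above, one thing worth ruling out is a DHT layout/lookup issue; a couple of harmless checks (the volume name is a placeholder), and lookup-optimize can be switched off temporarily while the rebalance completes if in doubt:
gluster volume rebalance VOLNAME status
gluster volume get VOLNAME cluster.lookup-optimize
gluster volume set VOLNAME cluster.lookup-optimize off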
Re: [Gluster-users] Replica bricks fungible?
Il 05/06/2021 14:36, Zenon Panoussis ha scritto:
> What I'm really asking is: can I physically move a brick from one server to another such as
> I can now answer my own question: yes, replica bricks are identical and can be physically moved or copied from one server to another. I have now done it a few times without any problems, though I made sure no healing was pending before the moves.
Well, if it's officially supported, that could be a really interesting option to quickly scale big storage systems. I'm thinking about our scenario: 3 servers, 36 12TB disks each. When adding a new server (or another pair of servers, to keep an odd number) it will require quite a lot of time to rebalance, with heavy implications both on the IB network and on latency for the users. If we could simply swap around some disks it could be a lot faster. Have you documented the procedure you followed? -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Replica bricks fungible?
Il 23/04/21 13:30, Zenon Panoussis ha scritto: > Are all replica (non-arbiter) bricks identical to each > other? If not, what do they differ in? No. At least meta-metadata is different, IIUC. > What I'm really asking is: can I physically move a brick > from one server to another such as [...] > and then remove node2 from the volume, add node4 to > it and be back up and running without the need of any > synchronisation?I'm no expert, but I think you can't. It might be an > interesting feature, tho. Could be very useful to quickly scale a cluster w/o moving terabytes of data via network: move some (carefully-chosen) bricks from old nodes to the new one, replace 'em with empty disks and expand. Something like MD-RAID metadata. -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
[Gluster-users] Quantum directories?
Hello all. I just noticed a really inexplicable (for me) behaviour: root@str957-cluster:/scratch/.resv2# mkdir test root@str957-cluster:/scratch/.resv2# ls -la total 4 drwxr-xr-x 16 root root 4096 Apr 22 11:40 .. root@str957-cluster:/scratch/.resv2# mkdir test mkdir: cannot create directory 'test': File exists root@str957-cluster:/scratch/.resv2# ls -la total 4 drwxr-xr-x 16 root root 4096 Apr 22 11:40 .. root@str957-cluster:/scratch/.resv2# cd test root@str957-cluster:/scratch/.resv2/test# pwd /scratch/.resv2/test The directory both exists and doesn't exist at the same time??? The volume is: Volume Name: cluster_data Type: Distributed-Replicate Volume ID: a8caaa90-d161-45bb-a68c-278263a8531a Status: Started Snapshot Count: 0 Number of Bricks: 21 x (2 + 1) = 63 Transport-type: tcp Bricks: Brick1: clustor00:/srv/bricks/00/d Brick2: clustor01:/srv/bricks/00/d Brick3: clustor02:/srv/quorum/00/d (arbiter) [...snip...] Brick61: clustor01:/srv/bricks/13/d Brick62: clustor02:/srv/bricks/13/d Brick63: clustor00:/srv/quorum/06/d (arbiter) Options Reconfigured: features.scrub: Active features.bitrot: on cluster.lookup-optimize: on performance.stat-prefetch: on performance.cache-refresh-timeout: 60 client.event-threads: 8 performance.client-io-threads: on nfs.disable: on transport.address-family: inet features.quota: on features.inode-quota: on features.quota-deem-statfs: on features.default-soft-limit: 90 cluster.self-heal-daemon: enable performance.write-behind-window-size: 128MB performance.parallel-readdir: on Disabling performance.parallel-readdir seems to "fix" it, but IIUC it shouldn't happen even with parallel-readdir turned on, right? PS: sometimes, access to the volume becomes quite slow (3-4s for a ls of a dozen files). Any hints about options I could enable or change? The 3 servers currently have only 96GB RAM (already asked to double it), and should host up to 36 bricks + 18 quorums). There are about 50 clients. Tks. -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
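Regarding options to try for the slow listings, a few commonly suggested knobs, using the volume name from the post; treat these as things to test, not a guaranteed fix, and 'group metadata-cache' only works if the bundled group file is shipped under /var/lib/glusterd/groups/:
gluster volume set cluster_data performance.parallel-readdir off
gluster volume set cluster_data group metadata-cache      # bundled md-cache settings
gluster volume set cluster_data network.inode-lru-limit 200000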
Re: [Gluster-users] OOM kills gluster process
Il 21/04/21 12:20, Strahil Nikolov ha scritto: Tks for answering. > You will need ro create a script to identify the pids and then protect them. > Then , you can add that script in the gluster's service file as > ExecStartPost=myscript Well, using pidof or the runfile contents it should be doable... Probably the runfile is the best option. I'll try it. > Another approach is to use cgroups and limit everything in the userspace. Tried that, but have had to revert the change: SLURM is propagating ulimits to the nodes... Going to ask in the SLURM list ... -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
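A minimal sketch of the ExecStartPost-style protection discussed above, assuming a client-only node where every glusterfs process is a FUSE mount helper; an oom_score_adj of -1000 exempts a process from the OOM killer:
#!/bin/sh
# protect-gluster-mounts.sh: lower the OOM score of all gluster FUSE clients (run as root)
for pid in $(pidof glusterfs); do
    echo -1000 > /proc/"$pid"/oom_score_adj
done
The same lines could run from a cron @reboot entry or be hooked as ExecStartPost= in whatever unit performs the mount.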
Re: [Gluster-users] [EXT] OOM kills gluster process
Il 21/04/21 13:04, Stefan Solbrig ha scritto: Tks for answering. > You could also consider disabling overcommitting memory: > /etc/sysctl.d/: > vm.overcommit_memory = 2 > vm.overcommit_ratio = 100 > (See https://www.kernel.org/doc/Documentation/vm/overcommit-accounting) Interesting idea, but a bit of swapping is not too bad. > This way, If users allocate too much memory, they get an error upon > allocation. > This should limit the cases where the oom killer needs to take action. > however, it has other side effects, like killing user programs that overcommit > by default. (Or user programs that fork() a lot.) Actually the fork()-intensive programs are the ones that most likely are behaving badly... I'll have to dig deeper. -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
[Gluster-users] OOM kills gluster process
Hello all. I have a somewhat undersized cluster frontend node where users (way too often) use too much RAM. Too bad that the first process selected for killing is the one handling the gluster mount! Is there a way to permanently make it "unkillable"? I already tried altering oom_adj, but the PID changes at every boot... -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Gluster usage scenarios in HPC cluster management
Il 22/03/21 16:54, Erik Jacobson ha scritto: > So if you had 24 leaders like HLRS, there would be 8 replica-3 at the > bottom layer, and then distributed across. (replicated/distributed > volumes) I still have to grasp the "leader node" concept. Weren't gluster nodes "peers"? Or by "leader" you mean that it's mentioned in the fstab entry like /l1,l2,l3:gv0 /mnt/gv0 glusterfs defaults 0 0 while the peer list includes l1,l2,l3 and a bunch of other nodes? > So we would have 24 leader nodes, each leader would have a disk serving > 4 bricks (one of which is simply a lock FS for CTDB, one is sharded, > one is for logs, and one is heavily optimized for non-object expanded > tree NFS). The term "disk" is loose. That's a system way bigger than ours (3 nodes, replica3arbiter1, up to 36 bricks per node). > Specs of a leader node at a customer site: > * 256G RAM Glip! 256G for 4 bricks... No wonder I have had troubles running 26 bricks in 64GB RAM... :) -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
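On the fstab question: as far as the gluster native client is concerned, peers are equal; the servers named at mount time are only used to fetch the volfile, after which the client talks to all bricks directly. A typical entry looks like this (hostnames and volume name are placeholders):
l1:/gv0  /mnt/gv0  glusterfs  defaults,_netdev,backup-volfile-servers=l2:l3  0 0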
Re: [Gluster-users] Gluster usage scenarios in HPC cluster management
Il 22/03/21 14:45, Erik Jacobson ha scritto: > The stuff I work on doesn't use containers much (unlike a different > system also at HPE). By "pods" I meant "glusterd instance", a server hosting a collection of bricks. > I don't have a recipe, they've just always been beefy enough for > gluster. Sorry I don't have a more scientific answer. Seems that 64GB RAM are not enough for a pod with 26 glusterfsd instances and no other services (except sshd for management). What do you mean by "beefy enough"? 128GB RAM or 1TB? -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Gluster usage scenarios in HPC cluster management
Il 19/03/2021 16:03, Erik Jacobson ha scritto: A while back I was asked to make a blog or something similar to discuss the use cases the team I work on (HPCM cluster management) at HPE. Tks for the article. I just miss a bit of information: how are you sizing CPU/RAM for pods? -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Volume not healing
Il 19/03/21 18:06, Strahil Nikolov ha scritto: > Are you running it against the fuse mountpoint ? Yup. > You are not supposed to see 'no such file or directory' ... Maybe > something more serious is going on. Between that and the duplicated files, that's for sure. But I don't know where to look to at least diagnose (if not fix) this :( As I said, probably part of the issue is due to the multiple failures caused by OOM and the multiple tries to remove a brick. I'm currently emptying the volume, then I'll recreate it from scratch, hoping for the best. -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Volume not healing
Il 20/03/21 15:21, Zenon Panoussis ha scritto: > When you have 0 files that need healing, > gluster volume heal BigVol granular-entry-heal enable > I have tested with and without granular and, empirically, > without any hard statistics, I find granular considerably > faster. Tks for the hint, but it's already set. I usually do it as soon as I create the volume :) I don't understand why it's not the default :) -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Volume not healing
Il 19/03/21 13:17, Strahil Nikolov ha scritto: > find /FUSE/mountpoint -exec stat {} \; Running it now (redirecting stdout to /dev/null). It's finding quite a lot of "no such file or directory" errors. -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Volume not healing
Il 19/03/21 11:06, Diego Zuccato ha scritto: > I tried to run "gluster v heal BigVol info summary" and got quite a high > count of entries to be healed on some bricks: > # gluster v heal BigVol info summary|grep pending|grep -v ' 0$' > Number of entries in heal pending: 41 > Number of entries in heal pending: 2971 > Number of entries in heal pending: 20 > Number of entries in heal pending: 2393 > > Too bad that those numbers aren't decreasing with time. Slight correction. Seems the numbers are *slowly* decreasing. After one hour I see: # gluster v heal BigVol info summary|grep pending|grep -v ' 0$' Number of entries in heal pending: 41 Number of entries in heal pending: 2955 Number of entries in heal pending: 20 Number of entries in heal pending: 2384 Is it possible to speed it up? Nodes are nearly idle... -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
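On speeding up self-heal, the knobs usually mentioned are the self-heal daemon thread count and queue length (higher values trade client latency for heal throughput); the volume name is taken from the thread:
gluster volume set BigVol cluster.shd-max-threads 4
gluster volume set BigVol cluster.shd-wait-qlength 10000
gluster volume heal BigVol        # kick another index heal pass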
[Gluster-users] Volume not healing
Hello all. I have a "problematic" volume. It was Rep3a1 with a dedicated VM for the arbiters. Too bad I underestimated RAM needs and the arbiters VM crashed frequently due to OOM (it had just 8GB allocated). The other two nodes sometimes crashed too, during a remove-brick operation (other thread). So I've had to stop & re-run the remove-brick multiple times, even rebooting the nodes, but it never completed. Now, I decided to move all the files to a temporary storage to rebuild the volume from scratch, but I find directories with duplicated files (two identical files, same name, size and contents), probably the two replicas. I tried to run "gluster v heal BigVol info summary" and got quite a high count of entries to be healed on some bricks:
# gluster v heal BigVol info summary|grep pending|grep -v ' 0$'
Number of entries in heal pending: 41
Number of entries in heal pending: 2971
Number of entries in heal pending: 20
Number of entries in heal pending: 2393
Too bad that those numbers aren't decreasing with time. Seems no entries are considered in split-brain condition (all counts for "gluster v heal BigVol info split-brain" are 0). Is there something I can do to convince Gluster to heal those entries w/o going entry-by-entry manually? Thanks. -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
[Gluster-users] Suggested setup for VM images
Hello all. What server config would you suggest for hosting live VM images? I'd have to replace a Dell MD3200 and a Dell MD3800i that are getting too old and I'd like to have a distributed architecture to avoid SPOF. What are the recommended RAM/CPU/#of disks per server? We're currently using 3 servers configured with 2 * Intel(R) Xeon(R) Silver 4210, 96GB RAM (6x16G -- probably it could be better to increase it), up to 36 12TB spinning disks. Volume is Distributed-Replicate with 2 data copies + 1 arbiter. But they're serving normal (small-medium) files, not big images (and sometimes an ls takes 3s... uhm...) so the workload is quite different... -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
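On the VM-image side, independent of the hardware sizing, the usual starting point is the bundled 'virt' option group, which on recent releases turns on sharding plus the eager-lock/remote-dio settings used by oVirt; the volume name is a placeholder, and the resulting values can be verified with 'gluster volume get':
gluster volume set VMVOL group virt
gluster volume get VMVOL features.shard
gluster volume get VMVOL features.shard-block-size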
Re: [Gluster-users] Proper procedure to reduce an active volume
Il 04/02/21 19:28, Nag Pavan Chilakam ha scritto: > What is the proper procedure to reduce a "replica 3 arbiter 1" volume? > Can you kindly elaborate the volume configuration. Is this a plain > arbiter volume or is it a distributed arbiter volume? > Please share the volume info so that we can help you better Sure. Here it is. Shortened a bit :) -8<-- # gluster v info Volume Name: BigVol Type: Distributed-Replicate Volume ID: c51926bd-6715-46b2-8bb3-8c915ec47e28 Status: Started Snapshot Count: 0 Number of Bricks: 28 x (2 + 1) = 84 Transport-type: tcp Bricks: Brick1: str957-biostor2:/srv/bricks/00/BigVol Brick2: str957-biostor:/srv/bricks/00/BigVol Brick3: str957-biostq:/srv/arbiters/00/BigVol (arbiter) Brick4: str957-biostor2:/srv/bricks/01/BigVol Brick5: str957-biostor:/srv/bricks/01/BigVol Brick6: str957-biostq:/srv/arbiters/01/BigVol (arbiter) [...] Brick79: str957-biostor:/srv/bricks/26/BigVol Brick80: str957-biostor2:/srv/bricks/26/BigVol Brick81: str957-biostq:/srv/arbiters/26/BigVol (arbiter) Brick82: str957-biostor:/srv/bricks/27/BigVol Brick83: str957-biostor2:/srv/bricks/27/BigVol Brick84: str957-biostq:/srv/arbiters/27/BigVol (arbiter) Options Reconfigured: features.scrub-throttle: aggressive server.manage-gids: on features.quota-deem-statfs: on features.inode-quota: on features.quota: on cluster.self-heal-daemon: enable ssl.certificate-depth: 1 auth.ssl-allow: str957-bio* features.scrub-freq: biweekly features.scrub: Active features.bitrot: on transport.address-family: inet performance.readdir-ahead: on nfs.disable: on client.ssl: on server.ssl: on server.event-threads: 8 client.event-threads: 8 cluster.granular-entry-heal: enable -8<-- > The procedure I've found is: > 1) # gluster volume remove-brick VOLNAME BRICK start > (repeat for each brick to be removed, but being a r3a1 should I remove > both bricks and the arbiter in a single command or multiple ones?) > No , you can mention bricks of a distributed subvolume in one command. > If you are having a 1x(2+1a) volume , then you should mention only one > brick. Start by removing the arbiter brick Ok. > 2) # gluster volume remove-brick VOLNAME BRICK status > (to monitor migration) > 3) # gluster volume remove-brick VOLNAME BRICK commit > (to finalize the removal) > 4) umount and reformat the freed (now unused) bricks > Is this safe? > What is the actual need to remove bricks? I need to move a couple of disks to a new server, to keep it all well balanced and increase the available space. > If you feel this volume is not needed anymore , then just delete the > volume, instead of going through each brick deletion Nono, the volume is needed and is currently hosting data I cannot lose... But I haven't space to copy it elsewhere... -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
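For reference, the start/status/commit flow for one whole replica set of this volume would look roughly like this (a sketch; subvolume 27 is used as the example, brick names are taken from the gluster v info above, and the replica count stays 3 because the entire subvolume is removed):
-8<--
gluster volume remove-brick BigVol str957-biostor:/srv/bricks/27/BigVol str957-biostor2:/srv/bricks/27/BigVol str957-biostq:/srv/arbiters/27/BigVol start
gluster volume remove-brick BigVol str957-biostor:/srv/bricks/27/BigVol str957-biostor2:/srv/bricks/27/BigVol str957-biostq:/srv/arbiters/27/BigVol status
# only when status reports "completed" for all bricks:
gluster volume remove-brick BigVol str957-biostor:/srv/bricks/27/BigVol str957-biostor2:/srv/bricks/27/BigVol str957-biostq:/srv/arbiters/27/BigVol commit
-8<--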
Re: [Gluster-users] Proper procedure to reduce an active volume
On 03/02/21 18:15, Strahil Nikolov wrote: Tks for the fast answer. > Replica volumes do not require the 'start + commit' - it's needed only > for distributed replicated volumes and other types of volumes. > Yet, I'm not sure if removing a data brick (and keeping the arbiter) > makes any sense. Usually, I just remove 1 data copy + the arbiter to > reshape the volume. Well, actually I need to remove both data bricks and the arbiters w/o losing the data. Probably that wasn't clear, sorry. The current pods have 28x10TB disks and all the arbiters are on a VM. The new pod has only 26 disks. What I want to do is remove one disk from each of the current pods, move one of the freed disks to the new pod (this way each pod will have 27 disks and I'll have a cold spare to quickly replace a failed disk) and distribute the arbiters between the three pods to dismiss the VM. If possible, I'd prefer to keep redundancy (hence not going to replica 1 in an intermediate step). > Keep in mind that as you remove a brick you need to specify the new > replica count. > For example you have 'replica 3 arbiter 1' and you want to remove the > second copy and the arbiter: > gluster volume remove-brick replica 1 server2:/path/to/brick > arbiter:/path/to/brick force That's what I want to avoid :) I need to migrate data out of s1:/bricks/27, s2:/bricks/27 and s3:/arbiters/27, redistributing it to the remaining bricks. BTW, isn't replica count an attribute of the whole volume? > If you wish to reuse block devices, don't forget to rebuild the FS (as > it's fastest way to cleanup)! Yup. Already been bitten by EAs (extended attributes) :) > When you increase the count (add second data brick and maybe arbiter), > you should run: > gluster volume add-brick replica 3 arbiter 1 > server4:/path/to/brick arbiter2:/path/to/brick > gluster volume heal full That will be useful when more disks are added. After removing the last bricks (isn't there a term for "all the components of a replica set"? slice?) I thought I could move the remaining bricks with replace-brick and keep a "rotating" distribution:
slice | s1  | s2  | s3
  00  | b00 | b00 | a00  (vm.a00 -> s2.a00)
  01  | a00 | b01 | b00  (s1.b01 -> s3.b00, vm.a01 -> s1.a00)
  02  | b01 | a00 | b01  (s1.b02 -> s1.b01, s2.b02 -> s3.b01, vm.a02 -> s2.a00)
[and so on]
That will take quite a long time (IIUC I cannot move data to a brick that is itself being moved to another... or at least it doesn't seem wise :) ). It's probably faster to first move arbiters and then the data. -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
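Each of those per-slice moves would be a replace-brick followed by a heal, something like the sketch below (hostnames follow the shorthand of the table above; on replicated volumes only "commit force" is supported, and self-heal then repopulates the new brick):
-8<--
# move slice 00's arbiter from the VM to s2
gluster volume replace-brick BigVol vm:/srv/arbiters/00/BigVol s2:/srv/arbiters/00/BigVol commit force
# wait until nothing is pending before starting the next move
gluster volume heal BigVol info summary
-8<--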
[Gluster-users] Proper procedure to reduce an active volume
Hello all. What is the proper procedure to reduce a "replica 3 arbiter 1" volume? The procedure I've found is: 1) # gluster volume remove-brick VOLNAME BRICK start (repeat for each brick to be removed, but being a r3a1 should I remove both bricks and the arbiter in a single command or multiple ones?) 2) # gluster volume remove-brick VOLNAME BRICK status (to monitor migration) 3) # gluster volume remove-brick VOLNAME BRICK commit (to finalize the removal) 4) umount and reformat the freed (now unused) bricks Is this safe? And once the bricks are removed I'll have to distribute arbiters across the current two data servers and a new one (currently I'm using a dedicated VM just for the arbiters). But that's another pie :) -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
[Gluster-users] Very slow 'ls' ?
Hello all. I have a volume configured as: -8<-- root@str957-clustor00:~# gluster v info cluster_data Volume Name: cluster_data Type: Distributed-Replicate Volume ID: a8caaa90-d161-45bb-a68c-278263a8531a Status: Started Snapshot Count: 0 Number of Bricks: 21 x (2 + 1) = 63 Transport-type: tcp Bricks: Brick1: clustor00:/srv/bricks/00/d Brick2: clustor01:/srv/bricks/00/d Brick3: clustor02:/srv/quorum/00/d (arbiter) [...] Brick61: clustor01:/srv/bricks/13/d Brick62: clustor02:/srv/bricks/13/d Brick63: clustor00:/srv/quorum/06/d (arbiter) Options Reconfigured: client.event-threads: 2 performance.client-io-threads: off nfs.disable: on transport.address-family: inet features.quota: on features.inode-quota: on features.quota-deem-statfs: on features.default-soft-limit: 90 cluster.self-heal-daemon: enable -8<-- Connection between client and server is via InfiniBand (40G from the client, 100G between storage nodes), using ipoib (IIUC RDMA is deprecated and unmaintained). A simple "ls -ln" ('n' to avoid delays due to lookups) for a folder with just 7 entries takes more than 4s on the first run, ~1s on the next one and a reasonable 0.1s on the third (if I'm fast enough). I tried enabling client-io-threads, but seems it didn't change anything. Any hints? TIA! -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
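Directory listings on wide distributed-replicate volumes usually benefit from parallel readdir and client-side metadata caching. A sketch with the options most often suggested for this (option names as in the Gluster 7.x docs; values are illustrative, and parallel-readdir needs readdir-ahead enabled):
-8<--
gluster volume set cluster_data performance.readdir-ahead on
gluster volume set cluster_data performance.parallel-readdir on
# cache metadata/xattrs on the client so repeated ls calls stay local
gluster volume set cluster_data features.cache-invalidation on
gluster volume set cluster_data features.cache-invalidation-timeout 600
gluster volume set cluster_data performance.stat-prefetch on
gluster volume set cluster_data performance.md-cache-timeout 600
# a couple more event threads can also help on fast (IPoIB) links
gluster volume set cluster_data client.event-threads 4
-8<--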
Re: [Gluster-users] Replication logic
On 28/12/20 22:14, Zenon Panoussis wrote: > Is that so, or am imagining impossible acrobatics? Given the slow link, probably snail mail is faster. Configure a new node near the fast ones, add it to the pool, replace thin arbiters with full replicas on the new node, let it rebuild (fast, since it's "local"), then put it offline and send it to the final location. Once you turn it on again it will have to sync only the latest changes. Should take less than 3 weeks :) -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Replica 3 volume with forced quorum 1 fault tolerance and recovery
Il 01/12/20 15:23, Dmitry Antipov ha scritto: > At least I can imagine the volume option to specify "let's assume that > the only live brick contains the > most recent (and so hopefully valid) data, so newly (re)started ones are > pleased to heal from it" behavior. Too dangerous and prone to byzantine desync. Say only node 1 survives, and a file gets written to it. Then, while node 2 returns to activity, node 1 dies before being able to tell node2 what changed. Another client writes to the "same" file a different content. Now node 1 returns active and you have split-brain: no version of the file is "better" than the other. A returning node 3 can't know (in an automated way) which copy of the file should be replicated. That's why you should always have a quorum of N/2+1 when data integrity is important. -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
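In volume-option terms, that N/2+1 rule is the combination of client-side and server-side quorum shown below (a sketch; VOLNAME is a placeholder, and 'auto' means writes need a majority of each replica set, i.e. 2 of 3 for replica 3):
-8<--
# client-side quorum: writes are refused without a majority of bricks
gluster volume set VOLNAME cluster.quorum-type auto
# server-side quorum: bricks are stopped when the trusted pool loses majority
gluster volume set VOLNAME cluster.server-quorum-type server
gluster volume set all cluster.server-quorum-ratio 51%
-8<--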
Re: [Gluster-users] Geo-replication status Faulty
On 27/10/20 13:15, Gilberto Nunes wrote: > I have applied this parameters to the 2-node gluster: > gluster vol set VMS cluster.heal-timeout 10 > gluster volume heal VMS enable > gluster vol set VMS cluster.quorum-reads false > gluster vol set VMS cluster.quorum-count 1 Urgh! IIUC you're begging for split-brain ... I think you should leave quorum-count=2 for safe writes. If a node is down, obviously the volume becomes readonly. But if you planned the downtime you can reduce quorum-count just before shutting it down. You'll have to bring it back to 2 before re-enabling the downed server, then wait for heal to complete before being able to down the second server. > Then I mount the gluster volume putting this line in the fstab file: > In gluster01 > gluster01:VMS /vms glusterfs > defaults,_netdev,x-systemd.automount,backupvolfile-server=gluster02 0 0 > In gluster02 > gluster02:VMS /vms glusterfs > defaults,_netdev,x-systemd.automount,backupvolfile-server=gluster01 0 0 Isn't it preferable to use the 'hostlist' syntax? gluster01,gluster02:VMS /vms glusterfs defaults,_netdev 0 0 A / at the beginning is optional, but can be useful if you're trying to use the diamond freespace collector (w/o the initial slash, it ignores glusterfs mountpoints). -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://bluejeans.com/441850968 Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
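The "planned downtime" sequence described above maps to something like this (a sketch; VMS is the volume from this thread, and cluster.quorum-count only takes effect together with cluster.quorum-type fixed):
-8<--
# before taking one node down for maintenance:
gluster volume set VMS cluster.quorum-type fixed
gluster volume set VMS cluster.quorum-count 1
# ... maintenance, node comes back ...
gluster volume set VMS cluster.quorum-count 2
# wait for heal to finish before touching the other node
gluster volume heal VMS info summary
-8<--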
Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)
Il 27/10/20 07:40, mabi ha scritto: > First to answer your question how this first happened, I reached that issue > first by simply rebooting my arbiter node yesterday morning in order to due > some maintenance which I do on a regular basis and was never a problem before > GlusterFS 7.8. In my case the problem originated from the daemon being reaped by OOM killer, but the result was the same. You're in the same rat hole I've been into... IIRC you have to probe *a working node from the detached node* . I followed these instructions: https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Administrator%20Guide/Resolving%20Peer%20Rejected/ Yes, they're for an ancient version, but it worked... -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://bluejeans.com/441850968 Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
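The linked procedure boils down to roughly the steps below, run on the rejected peer (a sketch from those docs, not re-verified on 7.x; the hostname is a placeholder, /var/lib/glusterd is the usual location, and it's worth backing that directory up first):
-8<--
systemctl stop glusterd
cd /var/lib/glusterd
# keep the node's own UUID, drop the out-of-sync volume/peer metadata
find . -mindepth 1 -maxdepth 1 ! -name glusterd.info -exec rm -rf {} +
systemctl start glusterd
# probe a healthy node *from* the rejected one, then restart once more
gluster peer probe good-node
systemctl restart glusterd
gluster peer status
-8<--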
Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)
Il 26/10/20 15:09, mabi ha scritto: > Right, seen liked that this sounds reasonable. Do you actually remember the > exact command you ran in order to remove the brick? I was thinking this > should be it: > gluster volume remove-brick force > but should I use "force" or "start"? Memory does not serve me well (there are 28 disks, not 26!), but bash history does :) # gluster volume remove-brick BigVol replica 2 str957-biostq:/srv/arbiters/{00..27}/BigVol force # gluster peer detach str957-biostq # gluster peer probe str957-biostq # gluster volume add-brick BigVol replica 3 arbiter 1 str957-biostq:/srv/arbiters/{00..27}/BigVol You obviously have to wait for remove-brick to complete before detaching arbiter. >> IIRC it took about 3 days, but the arbiters are on a VM (8CPU, 8GB RAM) >> that uses an iSCSI disk. More than 80% continuous load on both CPUs and RAM. > That's quite long I must say and I am in the same case as you, my arbiter is > a VM. Give all the CPU and RAM you can. Less than 8GB RAM is asking for troubles (in my case). -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://bluejeans.com/441850968 Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)
On 26/10/20 14:46, mabi wrote: >> I solved it by "degrading" the volume to replica 2, then cleared the >> arbiter bricks and upgraded again to replica 3 arbiter 1. > Thanks Diego for pointing out this workaround. How much data do you have on > that volume in terms of TB and files? Because I have around 3TB of data in 10 > million files. So I am a bit worried of taking such drastic measures. The volume is built from 26 10TB disks w/ genetic data. I currently don't have exact numbers, but it's still at the beginning, so there is a bit less than 10TB actually used. But you're only removing the arbiters; you always have two copies of your files. The worst that can happen is a split-brain condition (avoidable by requiring a two-node quorum; in that case the worst is that the volume goes readonly). > How bad was the load after on your volume when re-adding the arbiter brick? > and how long did it take to sync/heal? IIRC it took about 3 days, but the arbiters are on a VM (8CPU, 8GB RAM) that uses an iSCSI disk. More than 80% continuous load on both CPUs and RAM. > Would another workaround such as turning off quotas on that problematic > volume work? That sounds much less scary but I don't know if that would > work... I don't know, sorry. -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://bluejeans.com/441850968 Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)
Il 26/10/20 07:40, mabi ha scritto: > Thanks to this fix I could successfully upgrade from GlusterFS 6.9 to > 7.8 but now, 1 week later after the upgrade, I have rebooted my third > node (arbiter node) and unfortunately the bricks do not want to come up > on that node. I get the same following error message: IIRC it's the same issue I had some time ago. I solved it by "degrading" the volume to replica 2, then cleared the arbiter bricks and upgraded again to replica 3 arbiter 1. -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://bluejeans.com/441850968 Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] gluster replica 3 with third less powerful machine
On 20/10/20 15:53, Gilberto Nunes wrote: > I have 3 servers but the third one is a very low machine compared to the > 2 others servers. How much RAM does it have? > How could I create a replica 3 in order to prevent split-brain, but tell > the gluster to not use the third node too much??? You could have it host just arbiters in a "replica 3 arbiter 1" volume. I currently use a VM in this role, but it needs at least 8GB RAM to avoid OOM (it handles 26 arbiters, so you can probably get away with less if you have fewer bricks). My VM also has 8 CPUs to reduce the time needed for resync. Remember that backing filesystems for arbiters should be tweaked to allow a lot of inodes. I formatted my XFS volumes with mkfs.xfs -i size=512,maxpct=90 /dev/sdXn to allow up to 90% for inodes (instead of the usual 5%) => a single fs can handle multiple arbiter bricks. -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://bluejeans.com/441850968 Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
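Creating a volume laid out that way, with the light node holding only arbiter bricks, would look roughly like this (a sketch; host and path names are placeholders, and every third brick in the list becomes the arbiter of its replica set):
-8<--
gluster volume create myvol replica 3 arbiter 1 big1:/srv/bricks/00/myvol big2:/srv/bricks/00/myvol small:/srv/arbiters/00/myvol big1:/srv/bricks/01/myvol big2:/srv/bricks/01/myvol small:/srv/arbiters/01/myvol
gluster volume start myvol
-8<--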
Re: [Gluster-users] Low cost, extendable, failure tolerant home cloud storage question
On 04/10/20 21:29, Strahil Nikolov wrote: > In order to be safe you need 'replica 3' or a disperse volumes. At work I'm using "replica 3 arbiter 1" to balance storage overhead and data security. > In both cases extending by 1 brick (brick is not equal to node) is not > possible in most cases. For example in 'replica 3' you need to add 3 more > bricks (brick is a combination of 'server + directory' and it is recommended > to be on separate systems or it's a potential single point of failure). > Dispersed volumes also need to be extended in numbers of , so > if you have 4+2 (4 bricks , 2 are the maximum you can loose without dataloss > ) - you need to add another 6 bricks to extend. To extend a replica 3 arbiter 1 you only have to add two disks. And have enough inodes available on the third server. Don't underestimate inode use, especially if you're using a single partition for all the arbiters! >> - cheap nodes (with 1-2GB of RAM) able to handle the task (like Rpi, >Odroid >> XU4 or even HP T610) > You need a little bit more ram for daily usage and most probably more cores > as healing of data in replica is demanding (dispersed volumes are like raid's > parity and require some cpu). From my experience 8GB is the minimum during healing. Less than that and you'll get OOM kills and many problems. I'd recommend not less than 16G for an "arbiter only" server, and 32G for a replica server. These figures are for a volume with 26 10TB disks (two physical servers w/ 26 disks each plus the arbiter-only in a VM). > The idea with the ITX boards is not so bad. You can get 2 small systems and > create your erasure coding. Isn't EC a tad overkill with only 2 systems? BTW I noticed that too small systems are not practical: you have a lot of (nearly) fixed costs (motherboard, enclosure, power supply) that only manage 2-3 disks. Then, much depends on how much you think you'll expand your storage. An old theorem is that "the time required to fill a disk is constant" :) > Yet, I would prefer the 'replica 3 arbiter 1' approach as it doesn't take so > much space and extending will require only 2 data disks . And you won't have split-brain issues that are a mess to fix! -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://bluejeans.com/441850968 Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
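In command terms, that "only two data disks plus an arbiter directory" extension of a distributed replica-3-arbiter-1 volume is a single add-brick (a sketch; names are placeholders, and the replica/arbiter keywords are not needed because the counts don't change):
-8<--
gluster volume add-brick myvol node1:/srv/bricks/01/myvol node2:/srv/bricks/01/myvol arb:/srv/arbiters/01/myvol
gluster volume rebalance myvol start
gluster volume rebalance myvol status
-8<--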
Re: [Gluster-users] Fwd: New GlusterFS deployment, doubts on 1 brick per host vs 1 brick per drive.
Il 09/09/20 15:30, Miguel Mascarenhas Filipe ha scritto: I'm a noob, but IIUC this is the option giving the best performance: > 2. 1 brick per drive, Gluster "distributed replicated" volumes, no > internal redundancy Clients can write to both servers in parallel and read scattered (read performance using multiple files ~ 16x vs 2x with a single disk per host). Moreover it's easier to extend. But why ZFS instead of XFS ? In my experience it's heavier. PS: add a third host ASAP, at least for arbiter volumes (replica 3 arbiter 1). Split brain can be a real pain to fix! -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://bluejeans.com/441850968 Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
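Once a third host is available, an existing "1 brick per drive" replica 2 layout can be converted to "replica 3 arbiter 1" by adding one arbiter brick per replica pair (a sketch; hostnames/paths are placeholders, and the arbiter bricks must be listed in the same order as the existing subvolumes):
-8<--
# two subvolumes shown; repeat one arbiter path per existing replica pair
gluster volume add-brick myvol replica 3 arbiter 1 host3:/srv/arbiters/00/myvol host3:/srv/arbiters/01/myvol
gluster volume heal myvol info summary   # the arbiters populate via self-heal
-8<--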
Re: [Gluster-users] How to fix I/O error ? (resend)
On 28/08/20 10:31, Felix Kölzow wrote: > I faced a directory were a simple ls leads to input/output error. I saw something similar, but the directory was OK, except some files that reported "??" (IIRC in the size field). That got healed automatically. > I cd into the corresponding directory on the brick and I did a ls > command and it works. Well, you have to check all the bricks of a replica to be sure to get all the files. > # while read item > # do > # rm -rf $item > # done < /tmp/mylist Before this I'd have saved the files outside of the bricks :) > Thats it. Afterwards, I copied the deleted files back from our backup. Ah, you had a backup! :) > Please give me a hint if this procedure also works for you. Different situation, but it could probably work. Except for the fact we don't have a backup of those files :( Our volume is mostly used for archiving, so writes are rare. I know really well redundancy is no substitute for a backup (with redundancy only, if a file gets deleted, it's lost -- for this, a WORM translator could be useful :) ). BTW, in my case I noticed that having the two replicas online and bringing down the arbiters brought the files back online, so I completely removed the arbiter bricks (degrading to replica 2) and I'm now slowly re-adding 'em to have "replica 3 arbiter 1" again (see "node sizing" thread). -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://bluejeans.com/441850968 Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
[Gluster-users] Node sizing
Hello all. I just noticed that rebuilding arbiter bricks is using lots of CPU and RAM. I thought it was quite a lightweight op so I installed the arbiter node in a VM, but 8CPUs and 16GB RAM are maxed out (and a bit of swap gets used, too). The volume is 28*(2+1) 10TB bricks. Gluster v 5.5 . Is there some rule of thumb for sizing nodes? I couldn't find anything... TIA. -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://bluejeans.com/441850968 Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] How to fix I/O error ? (resend)
Il 25/08/20 15:27, Amar Tumballi ha scritto: > I am not aware of any data layout changes we did between current latest > (7.7) and 3.8.8. But due to some issues, 'online' migration is not > possible, even the clients needs to be updated, so you have to umount > the volume once. Tks for the info. Actually the issue is less bad than I thought: I checked on a client that (somehow) still used Debian oldstable. Current stable uses 5.5, still old but not prehistoric :) Too bad the original issue still persists, even after removing the file and its hardlink from .gluster dir :( Maybe the upgrade can fix it? Or I risk breaking it even more? -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://bluejeans.com/441850968 Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] How to fix I/O error ? (resend)
Il 24/08/20 15:23, Diego Zuccato ha scritto: > I'm now completely out of ideas :( Actually I have one last idea. My nodes are installed from standard Debian "stable" repos. That means they're version 3.8.8 ! I understand it's an ancient version. What's the recommended upgrade path to a current version? Possibly keeping the data safe: I have nowhere to move all those TBs to... -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://bluejeans.com/441850968 Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] How to fix I/O error ? (resend)
Il 21/08/20 13:56, Diego Zuccato ha scritto: Hello again. I also tried disabling bitrot (and re-enabling it afterwards) and the procedure for recovery from split-brain[*] removing the file and its link from one of the nodes, but no luck. I'm now completely out of ideas :( How can I resync those gfids ? Tks! Diego [*] even if "gluster volume heal BigVol info split-brain" reports 0 for every brick. > Hello all. > > I have a volume setup as: > -8<-- > root@str957-biostor:~# gluster v info BigVol > > Volume Name: BigVol > Type: Distributed-Replicate > Volume ID: c51926bd-6715-46b2-8bb3-8c915ec47e28 > Status: Started > Snapshot Count: 0 > Number of Bricks: 28 x (2 + 1) = 84 > Transport-type: tcp > Bricks: > Brick1: str957-biostor2:/srv/bricks/00/BigVol > Brick2: str957-biostor:/srv/bricks/00/BigVol > Brick3: str957-biostq:/srv/arbiters/00/BigVol (arbiter) > [...] > Options Reconfigured: > cluster.granular-entry-heal: enable > client.event-threads: 8 > server.event-threads: 8 > server.ssl: on > client.ssl: on > nfs.disable: on > performance.readdir-ahead: on > transport.address-family: inet > features.bitrot: on > features.scrub: Active > features.scrub-freq: biweekly > auth.ssl-allow: str957-bio* > ssl.certificate-depth: 1 > cluster.self-heal-daemon: enable > features.quota: on > features.inode-quota: on > features.quota-deem-statfs: on > server.manage-gids: on > features.scrub-throttle: aggressive > -8<-- > > After a couple failures (a disk on biostor2 went "missing", and glusterd > on biostq got killed by OOM) I noticed that some files can't be accessed > from the clients: > -8<-- > $ ls -lh 1_germline_CGTACTAG_L005_R* > -rwxr-xr-x 1 e.f domain^users 2,0G apr 24 2015 > 1_germline_CGTACTAG_L005_R1_001.fastq.gz > -rwxr-xr-x 1 e.f domain^users 2,0G apr 24 2015 > 1_germline_CGTACTAG_L005_R2_001.fastq.gz > $ ls -lh 1_germline_CGTACTAG_L005_R1_001.fastq.gz > ls: cannot access '1_germline_CGTACTAG_L005_R1_001.fastq.gz': > Input/output error > -8<-- > (note that if I request ls for more files, it works...). > > The files have exactly the same contents (verified via md5sum). The only > difference is in getfattr: trusted.bit-rot.version is > 0x17005f3f9e670002ad5b on a node and > 0x12005f3ce7af000dccad on the other. > > On the client, the log reports: > -8<- > [2020-08-21 11:32:52.208809] W [MSGID: 108008] > [afr-self-heal-name.c:354:afr_selfheal_name_gfid_mismatch_check] > 4-BigVol-replicate-13: GFID mismatch for > /1_germline_CGTACTAG_L005_R1_001.fastq.gz > d70a4a6d-05fc-4988-8041-5e7f62155fe5 on BigVol-client-55 and > f249f88a-909f-489d-8d1d-d428e842ee96 on BigVol-client-34 > [2020-08-21 11:32:52.209768] W [fuse-bridge.c:471:fuse_entry_cbk] > 0-glusterfs-fuse: 233606: LOOKUP() > /[...]/1_germline_CGTACTAG_L005_R1_001.fastq.gz => -1 (Errore di > input/output) > -8<-- > > As suggested on IRC, I tested the RAM, but the only thing I got have > been a "Peer rejected" status due to another OOM kill. No problem, I've > been able to resolve it, but the original problem still remains. > > What else can I do? > > TIA! > > -- > Diego Zuccato > DIFA - Dip. di Fisica e Astronomia > Servizi Informatici > Alma Mater Studiorum - Università di Bologna > V.le Berti-Pichat 6/2 - 40127 Bologna - Italy > tel.: +39 051 20 95786 > > > > > Community Meeting Calendar: > > Schedule - > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > Bridge: https://bluejeans.com/441850968 > > Gluster-users mailing list > Gluster-users@gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -- Diego Zuccato DIFA - Dip. 
di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://bluejeans.com/441850968 Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
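When the mismatch is on the data bricks, the usual manual fix is to pick the copy to discard on one brick and remove both the file and its hard link under .glusterfs, then let self-heal recreate it from the good copy. A sketch only; the gfid comes from the log above, while the brick path and file path are placeholders that must be verified on the actual bricks first (and it's wise to save a copy of the file elsewhere beforehand):
-8<--
# on the brick whose copy is to be discarded (say the one carrying gfid
# f249f88a-909f-489d-8d1d-d428e842ee96):
BRICK=/srv/bricks/NN/BigVol
F="$BRICK/path/to/1_germline_CGTACTAG_L005_R1_001.fastq.gz"
getfattr -n trusted.gfid -e hex "$F"   # confirm which gfid this copy carries
rm "$F"
rm "$BRICK/.glusterfs/f2/49/f249f88a-909f-489d-8d1d-d428e842ee96"
# then stat the file from a client mount and run:
gluster volume heal BigVol
-8<--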
[Gluster-users] How to fix I/O error ? (resend)
Hello all. I have a volume setup as: -8<-- root@str957-biostor:~# gluster v info BigVol Volume Name: BigVol Type: Distributed-Replicate Volume ID: c51926bd-6715-46b2-8bb3-8c915ec47e28 Status: Started Snapshot Count: 0 Number of Bricks: 28 x (2 + 1) = 84 Transport-type: tcp Bricks: Brick1: str957-biostor2:/srv/bricks/00/BigVol Brick2: str957-biostor:/srv/bricks/00/BigVol Brick3: str957-biostq:/srv/arbiters/00/BigVol (arbiter) [...] Options Reconfigured: cluster.granular-entry-heal: enable client.event-threads: 8 server.event-threads: 8 server.ssl: on client.ssl: on nfs.disable: on performance.readdir-ahead: on transport.address-family: inet features.bitrot: on features.scrub: Active features.scrub-freq: biweekly auth.ssl-allow: str957-bio* ssl.certificate-depth: 1 cluster.self-heal-daemon: enable features.quota: on features.inode-quota: on features.quota-deem-statfs: on server.manage-gids: on features.scrub-throttle: aggressive -8<-- After a couple failures (a disk on biostor2 went "missing", and glusterd on biostq got killed by OOM) I noticed that some files can't be accessed from the clients: -8<-- $ ls -lh 1_germline_CGTACTAG_L005_R* -rwxr-xr-x 1 e.f domain^users 2,0G apr 24 2015 1_germline_CGTACTAG_L005_R1_001.fastq.gz -rwxr-xr-x 1 e.f domain^users 2,0G apr 24 2015 1_germline_CGTACTAG_L005_R2_001.fastq.gz $ ls -lh 1_germline_CGTACTAG_L005_R1_001.fastq.gz ls: cannot access '1_germline_CGTACTAG_L005_R1_001.fastq.gz': Input/output error -8<-- (note that if I request ls for more files, it works...). The files have exactly the same contents (verified via md5sum). The only difference is in getfattr: trusted.bit-rot.version is 0x17005f3f9e670002ad5b on a node and 0x12005f3ce7af000dccad on the other. On the client, the log reports: -8<- [2020-08-21 11:32:52.208809] W [MSGID: 108008] [afr-self-heal-name.c:354:afr_selfheal_name_gfid_mismatch_check] 4-BigVol-replicate-13: GFID mismatch for /1_germline_CGTACTAG_L005_R1_001.fastq.gz d70a4a6d-05fc-4988-8041-5e7f62155fe5 on BigVol-client-55 and f249f88a-909f-489d-8d1d-d428e842ee96 on BigVol-client-34 [2020-08-21 11:32:52.209768] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 233606: LOOKUP() /[...]/1_germline_CGTACTAG_L005_R1_001.fastq.gz => -1 (Errore di input/output) -8<-- As suggested on IRC, I tested the RAM, but the only thing I got have been a "Peer rejected" status due to another OOM kill. No problem, I've been able to resolve it, but the original problem still remains. What else can I do? TIA! -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://bluejeans.com/441850968 Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
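Since bitrot is enabled on this volume, the scrubber's view of those files may also be worth a look; as far as I can tell, a differing trusted.bit-rot.version xattr is per-brick signing metadata and not, by itself, a sign of corruption. A sketch (the scrub status subcommand exists in recent releases; not sure it is available in 3.8.x):
-8<--
gluster volume bitrot BigVol scrub status
# the per-node output lists any corrupted objects (by gfid) found by the scrubber
-8<--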
Re: [Gluster-users] Replicate over WAN?
On 05/05/2010 20:18, Vikas Gorur wrote: 2) readdir (ls) is always sent to the first subvolume. This is necessary to ensure consistent inode numbers. Uhm... Couldn't the same result be achieved by storing a virtual inode number in an attribute? So that it gets replicated with the rest of the data and makes it possible to have the first subvolume always local... I understand that it could lead to possible problems (like how do I generate an inode number if the master node is missing), but it could open the door to the "replicate writes, local reads" behaviour that many people are requesting... The scenario to think about is a firm w/ a remote office connected via a VPN -- if you can cut nearly all the read traffic from the VPN, then you see a great boost in performance. Or maybe I missed something... -- Diego Zuccato Servizi Informatici Dip. di Astronomia - Università di Bologna Via Ranzani, 1 - 40126 Bologna - Italy tel.: +39 051 20 95786 mail: diego.zucc...@unibo.it ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users