Re: [Gluster-users] Replica 3 scale out and ZFS bricks
On 9/17/20 4:47 PM, Strahil Nikolov wrote:
> I guess I misunderstood you - if I decode the diagram correctly it should be OK, you will always have at least 2 bricks available after a node gets down. It would be way simpler if you add a 5th node (VM probably) as an arbiter and switch to 'replica 3 arbiter 1'.

Yep, I would add an arbiter node in this case. What I wanted to make sure was that my understanding of the way GlusterFS is able to scale is correct - specifically, expanding a volume by adding one storage node to the current setup.

Thanks, Strahil.

Best regards,
--
alexander iliev

Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://bluejeans.com/441850968 Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
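For reference, the 'replica 3 arbiter 1' layout mentioned above lists bricks in groups of three, with every third brick becoming the metadata-only arbiter. A minimal sketch of creating such a volume, assuming a small fifth host carries the arbiter bricks (hostnames and brick paths are placeholders, not from the thread):

# gluster volume create store1 replica 3 arbiter 1 node1:/gfs/1/brick node2:/gfs/1/brick arbiter1:/gfs/arb1/brick node3:/gfs/2/brick node4:/gfs/2/brick arbiter1:/gfs/arb2/brick

Because arbiter bricks store only metadata, the fifth node can be a modest VM, as suggested above.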
Re: [Gluster-users] Replica 3 scale out and ZFS bricks
On 9/17/20 3:37 AM, Stephan von Krawczynski wrote: Nevertheless you will break performance anyway by deploying user-space crawling-slow glusterfs... outcome of 10 wasted years of development in the wrong direction. Genuinely asking - what would you recommend instead of GlusterFS for a highly available, horizontally scalable storage system? Best regards, -- alexander iliev Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://bluejeans.com/441850968 Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Replica 3 scale out and ZFS bricks
On 9/16/20 9:53 PM, Strahil Nikolov wrote:
> On Wednesday, September 16, 2020, 11:54:57 GMT+3, Alexander Iliev wrote:
>> From what I understood, in order to be able to scale it one node at a time, I need to set up the initial nodes with a number of bricks that is a multiple of 3 (e.g., 3, 6, 9, etc. bricks). The initial cluster will be able to export a volume as large as the storage of a single node and adding one more node will grow the volume by 1/3 (assuming homogeneous nodes.)
> You can't add 1 node to a replica 3, so no - you won't get 1/3 with that extra node.

OK, then I guess I was totally confused on this point. I'd imagined something like this would work:

  node1         node2         node3
+---------+   +---------+   +---------+
| brick 1 |   | brick 1 |   | brick 1 |
| brick 2 |   | brick 2 |   | brick 2 |
| brick 3 |   | brick 3 |   | brick 3 |
+---------+   +---------+   +---------+
                   |
                   v
  node1         node2         node3         node4
+---------+   +---------+   +---------+   +---------+
| brick 1 |   | brick 1 |   | brick 4 |   | brick 1 |
| brick 2 |   | brick 4 |   | brick 2 |   | brick 2 |
| brick 3 |   | brick 3 |   | brick 3 |   | brick 4 |
+---------+   +---------+   +---------+   +---------+

any# gluster peer probe node4
any# gluster volume replace-brick volume1 node2:/gfs/2/brick node4:/gfs/2/brick commit force
any# gluster volume replace-brick volume1 node3:/gfs/1/brick node4:/gfs/1/brick commit force
node2# umount /gfs/2 && mkfs /dev/... && mv /gfs/2 /gfs/4 && mount /dev/... /gfs/4  # or clean up the replaced brick by other means
node3# umount /gfs/1 && mkfs /dev/... && mv /gfs/1 /gfs/4 && mount /dev/... /gfs/4  # or clean up the replaced brick by other means
any# gluster volume add-brick volume1 node2:/gfs/4/brick node3:/gfs/4/brick node4:/gfs/4/brick

(Note: /etc/fstab or whatever mounting mechanism is used also needs to be updated after renaming the mount-points on node2 and node3.)

I played around with this in a VM setup and it seems to work, but maybe I'm missing something. Even if this is supposed to work, maybe it has other implications I'm not aware of, so I would be happy to be educated on this.

>> My plan is to use ZFS as the underlying system for the bricks. Now I'm wondering - if I join the disks on each node in a, say, RAIDZ2 pool and then create a dataset within the pool for each brick, the GlusterFS volume would report the volume size 3x$brick_size, because each brick shares the same pool and the size/free space is reported according to the ZFS pool size/free space.
> I'm not sure about ZFS (never played with it on Linux), but in my systems I setup a Thinpool consisting of all HDDs in a striped way (when no Hardware Raid Controller is available) and then you setup thin LVs for each brick. In thin LVM you can define Virtual Size and this size is reported as the volume size (assuming that all bricks are the same in size). If you have 1 RAIDZ2 pool per Gluster TSP node, then that pool's size is the maximum size of your volume. If you plan to use snapshots, then you should set quota on the volume to control the usage.
>> How should I go about this? Should I create a ZFS pool per brick (this seems to have a negative impact on performance)? Should I set a quota for each dataset?
> I would go with 1 RAIDZ2 pool with 1 dataset of type 'filesystem' per Gluster node. Quota is always good to have.
> P.S.: Any reason to use ZFS? It uses a lot of memory.

Two main reasons for ZFS - node-level redundancy and compression. I want to enable some node-level fault tolerance in order to avoid healing a failed node from scratch. From my experience so far, healing (at least in our environment) is quite slow and painful. Hardware RAID is not an option in our setup.

With LVM mirroring we would be utilizing 50% of the physical space. We could go with mdadm+LVM, but it feels messier and AFAIK mdadm RAID6 is prone to the "write hole" problem (but maybe I'm outdated on this one).

Best regards,
--
alexander iliev

Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://bluejeans.com/441850968 Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
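For comparison with the ZFS approach, a rough sketch of the thin-LVM layout described above could look like this (device names, LV/VG names, the stripe count and the 10T virtual size are placeholders, not taken from the thread):

# pvcreate /dev/sdb /dev/sdc /dev/sdd
# vgcreate vg_bricks /dev/sdb /dev/sdc /dev/sdd
# lvcreate -l 90%FREE --stripes 3 --thinpool tp_bricks vg_bricks   # one thin pool striped over all disks, leaving room for pool metadata
# lvcreate -V 10T --thin -n brick1 vg_bricks/tp_bricks             # the virtual size is what the brick filesystem (and Gluster) will report
# mkfs.xfs -i size=512 /dev/vg_bricks/brick1
# mount /dev/vg_bricks/brick1 /gfs/1

Since the thin LVs are over-provisioned against the pool, actual usage still has to be monitored on the pool itself.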
[Gluster-users] Replica 3 scale out and ZFS bricks
Hi list, I am in the process of planning a 3-node replica 3 setup and I have a question about scaling it out. From what I understood, in order to be able to scale it one node at a time, I need to set up the initial nodes with a number of bricks that is a multiple of 3 (e.g., 3, 6, 9, etc. bricks). The initial cluster will be able to export a volume as large as the storage of a single node and adding one more node will grow the volume by 1/3 (assuming homogeneous nodes.) Please let me know if my understanding is correct. My plan is to use ZFS as the underlying system for the bricks. Now I'm wondering - if I join the disks on each node in a, say, RAIDZ2 pool and then create a dataset within the pool for each brick, the GlusterFS volume would report the volume size 3x$brick_size, because each brick shares the same pool and the size/free space is reported according to the ZFS pool size/free space. How should I go about this? Should I create a ZFS pool per brick (this seems to have a negative impact on performance)? Should I set a quota for each dataset? Does my plan even make sense? Thank you! Best regards, -- alexander iliev Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://bluejeans.com/441850968 Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
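A rough sketch of the layout described above - one RAIDZ2 pool per node, one dataset per brick, and a quota so each brick reports a fixed size (disk names, dataset names and the 10T figure are placeholders):

# zpool create tank raidz2 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
# zfs create -o compression=lz4 -o xattr=sa tank/brick1   # xattr=sa is commonly recommended for Gluster bricks on ZFS
# zfs set quota=10T tank/brick1
# mkdir -p /tank/brick1/brick

With a quota set on each dataset, the free/total space reported for that dataset (and therefore for the brick) should be capped at the quota rather than at the whole pool, which is the behaviour the question above is about.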
Re: [Gluster-users] GlusterFS geo-replication progress question
Thanks, Sunny.

alexander iliev

On 4/7/20 12:25 AM, Sunny Kumar wrote:

Hi Alexander,

Answers inline below:

On Thu, Apr 2, 2020 at 1:08 AM Alexander Iliev wrote:

Hi all, I have a running geo-replication session between two clusters and I'm trying to figure out what is the current progress of the replication and possibly how much longer it will take. It has been running for quite a while now (> 1 month), but the thing is that both the hardware of the nodes and the link between the two clusters aren't that great (e.g., the volumes are backed by rotating disks) and the volume is somewhat sizeable (30-ish TB) and given these details I'm not really sure how long it is supposed to take normally.

I have several bricks in the volume (same brick size and physical layout in both clusters) that are now showing up with a Changelog Crawl status and with a recent LAST_SYNCED date in the `gluster volume geo-replication status detail` command output, which seems to be the desired state for all bricks. The rest of the bricks though are in Hybrid Crawl state and have been in that state forever. So I suppose my questions are - how can I tell if the replication session is somehow broken and, if it's not, then is there a way for me to find out the progress and the ETA of the replication?

Please go through this section[1] which talks about this. In Hybrid crawl at present we do not have any accounting information like how much time it will take to sync data.

In /var/log/glusterfs/geo-replication/$session_dir/gsyncd.log there are some errors like:

[2020-03-31 11:48:47.81269] E [syncdutils(worker /data/gfs/store1/8/brick):822:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-6aDWmc/206c4b2c3eb782ea2cf49ab5142bd68b.sock x.x.x.x /nonexistent/gsyncd slave x.x.x.x:: --master-node x.x.x.x --master-node-id 9476b8bb-d7ee-489a-b083-875805343e67 --master-brick --local-node x.x.x.x 2 --local-node-id 426b564d-35d9-4291-980e-795903e9a386 --slave-timeout 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/sbin error=1
[2020-03-31 11:48:47.81617] E [syncdutils(worker ):826:logerr] Popen: ssh> failed with ValueError.
[2020-03-31 11:48:47.390397] I [repce(agent ):97:service_loop] RepceServer: terminating on reaching EOF.

If you are seeing this error at a regular interval then please check your ssh connection, it might have broken. If possible please share the full traceback from both master and slave to debug the issue.

In the brick logs I see stuff like:

[2020-03-29 07:49:05.338947] E [fuse-bridge.c:4167:fuse_xattr_cbk] 0-glusterfs-fuse: extended attribute not supported by the backend storage

I don't know if these are critical, from the rest of the logs it looks like data is traveling between the clusters. Any help will be greatly appreciated. Thank you in advance!

Best regards,
--
alexander iliev

Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://bluejeans.com/441850968 Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users

[1].
https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/#status /sunny Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://bluejeans.com/441850968 Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
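If the ssh> errors quoted earlier in this thread keep recurring, one low-level sanity check (an assumption on my part, not from the docs) is to retry the same SSH path gsyncd uses, with the key and port taken from that log line, from the master node towards the slave node:

# ssh -p 22 -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem x.x.x.x

A timeout or a "Permission denied" here would point at the broken SSH connection mentioned above, independently of Gluster itself.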
Re: [Gluster-users] gluster v6.8: systemd units disabled after install
Hi Hubert, I think this would vary from distribution to distribution and it is up to the package maintainers of the particular distribution to decide what the default should be. I am using Gluster 6.6 on CentOS and the Gluster-specific services there were also disabled (although not exactly as in your original post - the vendor preset was also disabled for me, while it is enabled for you). This is only a speculation for this particular case, but I think the idea in general is to have the system administrator explicitly enable the services he wants running on reboot. I would argue that this is the safer approach as opposed to enabling a service automatically after its installation. An example scenario would be - you install a service, the system is rebooted, e.g. due to a power outage, mistyped command, etc., the service is started automatically even though it hasn't been properly configured yet. I guess, to really know the reasoning, the respective package maintainers would need to jump in and share their idea behind this decision. Best regards, -- alexander iliev On 4/11/20 7:40 AM, Hu Bert wrote: Hi, so no one has seen the problem of disabled systemd units before? Regards, Hubert Am Mo., 6. Apr. 2020 um 12:30 Uhr schrieb Hu Bert : Hello, after a server reboot (with a fresh gluster 6.8 install) i noticed that the gluster services weren't running. systemctl status glusterd.service ● glusterd.service - GlusterFS, a clustered file-system server Loaded: loaded (/lib/systemd/system/glusterd.service; disabled; vendor preset: enabled) Active: inactive (dead) Docs: man:glusterd(8) Apr 06 11:34:18 glfsserver1 systemd[1]: /lib/systemd/system/glusterd.service:9: PIDFile= references path below legacy directory /var/run/, updating /var/run/glusterd.pid → /run/glusterd.pid; please update the unit file accordingly. systemctl status glustereventsd.service ● glustereventsd.service - Gluster Events Notifier Loaded: loaded (/lib/systemd/system/glustereventsd.service; disabled; vendor preset: enabled) Active: inactive (dead) Docs: man:glustereventsd(8) Apr 06 11:34:27 glfsserver1 systemd[1]: /lib/systemd/system/glustereventsd.service:11: PIDFile= references path below legacy directory /var/run/, updating /var/run/glustereventsd.pid → /run/glustereventsd.pid; please update the unit file accordingly. You have to enable them manually: systemctl enable glusterd.service Created symlink /etc/systemd/system/multi-user.target.wants/glusterd.service → /lib/systemd/system/glusterd.service. systemctl enable glustereventsd.service Created symlink /etc/systemd/system/multi-user.target.wants/glustereventsd.service → /lib/systemd/system/glustereventsd.service. Is this a bug? If so: already known? Regards, Hubert Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://bluejeans.com/441850968 Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://bluejeans.com/441850968 Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
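To see what a given distribution actually ships as the default, the unit state and any vendor preset files can be inspected directly, for example:

# systemctl is-enabled glusterd.service glustereventsd.service
# grep -rs gluster /lib/systemd/system-preset /etc/systemd/system-preset

(The preset directories differ between distributions; on some systems they live under /usr/lib/systemd/system-preset instead.)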
[Gluster-users] GlusterFS geo-replication progress question
Hi all, I have a running geo-replication session between two clusters and I'm trying to figure out what is the current progress of the replication and possibly how much longer it will take. It has been running for quite a while now (> 1 month), but the thing is that both the hardware of the nodes and the link between the two clusters aren't that great (e.g., the volumes are backed by rotating disks) and the volume is somewhat sizeable (30-ish TB) and given these details I'm not really sure how long it is supposed to take normally. I have several bricks in the volume (same brick size and physical layout in both clusters) that are now showing up with a Changelog Crawl status and with a recent LAST_SYNCED date in the `gluster colume geo-replication status detail` command output which seems to be the desired state for all bricks. The rest of the bricks though are in Hybrid Crawl state and have been in that state forever. So I suppose my questions are - how can I tell if the replication session is somehow broken and if it's not, then is there are way for me to find out the progress and the ETA of the replication? In /var/log/glusterfs/geo-replication/$session_dir/gsyncd.log there are some errors like: [2020-03-31 11:48:47.81269] E [syncdutils(worker /data/gfs/store1/8/brick):822:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsync d-aux-ssh-6aDWmc/206c4b2c3eb782ea2cf49ab5142bd68b.sock x.x.x.x /nonexistent/gsyncd slave x.x.x.x:: --master-node x.x.x.x --master-node-id 9476b8bb-d7ee-489a-b083-875805343e67 --master-brick --local-node x.x.x.x 2 --local-node-id 426b564d-35d9-4291-980e-795903e9a386 --slave-timeout 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/sbinerror=1 [2020-03-31 11:48:47.81617] E [syncdutils(worker ):826:logerr] Popen: ssh> failed with ValueError. [2020-03-31 11:48:47.390397] I [repce(agent ):97:service_loop] RepceServer: terminating on reaching EOF. In the brick logs I see stuff like: [2020-03-29 07:49:05.338947] E [fuse-bridge.c:4167:fuse_xattr_cbk] 0-glusterfs-fuse: extended attribute not supported by the backend storage I don't know if these are critical, from the rest of the logs it looks like data is traveling between the clusters. Any help will be greatly appreciated. Thank you in advance! Best regards, -- alexander iliev Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://bluejeans.com/441850968 Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Is rebalance in progress or not?
On 3/15/20 5:17 PM, Strahil Nikolov wrote: On March 15, 2020 12:16:51 PM GMT+02:00, Alexander Iliev wrote: On 3/15/20 11:07 AM, Strahil Nikolov wrote: On March 15, 2020 11:50:32 AM GMT+02:00, Alexander Iliev wrote: Hi list, I was having some issues with one of my Gluster nodes so I ended up re-installing it. Now I want to re-add the bricks for my main volume and I'm having the following issue - when I try to add the bricks I get: # gluster volume add-brick store1 replica 3 volume add-brick: failed: Pre Validation failed on 172.31.35.132. Volume name store1 rebalance is in progress. Please retry after completion But then if I get the rebalance status I get: # gluster volume rebalance store1 status volume rebalance: store1: failed: Rebalance not started for volume store1. And if I try to start the rebalancing I get: # gluster volume rebalance store1 start volume rebalance: store1: failed: Rebalance on store1 is already started Looking at the logs of the first node, when I try to start the rebalance operation I see this: [2020-03-15 09:41:31.883651] E [MSGID: 106276] [glusterd-rpc-ops.c:1200:__glusterd_stage_op_cbk] 0-management: Received stage RJT from uuid: 9476b8bb-d7ee-489a-b083-875805343e67 On the second node the logs are showing stuff that indicates that a rebalance operation is indeed in progress: [2020-03-15 09:47:34.190042] I [MSGID: 109081] [dht-common.c:5868:dht_setxattr] 0-store1-dht: fixing the layout of /redacted [2020-03-15 09:47:34.775691] I [dht-rebalance.c:3285:gf_defrag_process_dir] 0-store1-dht: migrate data called on /redacted [2020-03-15 09:47:36.019403] I [dht-rebalance.c:3480:gf_defrag_process_dir] 0-store1-dht: Migration operation on dir /redacted took 1.24 secs Some background on what led to this situation: The volume was originally a replica 3 distributed replicated volume on three nodes. In order to detach the faulty node I lowered the replica count to 2 and removed the bricks from that node from the volume. I cleaned up the storage (formatted the bricks and cleaned the trusted.gfid and trusted.glusterfs.volume-id extended attributes) and purged the gluster packages from the system, then I re-installed the gluster packages and did a `gluster peer probe` from another node. I'm running Gluster 6.6 on CentOS 7.7 on all nodes. I feel stuck at this point, so any guidance will be greatly appreciated. Thanks! Best regards, Hey Alex, Did you try to go the second node (the one tgat thinks balance is running) and stop tge balance ? gluster volume rebalance VOLNAME stop Then add the new brick (and increase the replica count) and after the heal is over - rebalance again. Hey Strahil, Thanks for the suggestion, I just tried it, but unfortunately the result is pretty much the same - when I try to stop the rebalance on the second node it reports that no rebalance is in progress: # gluster volume rebalance store1 stop volume rebalance: store1: failed: Rebalance not started for volume store1. Best Regards, Strahil Nikolov Best regards, -- alexander iliev Hey Alex, I'm not sure if the command has a 'force' flag, but of it does - it is worth trying. gluster volume rebalance store1 stop force Hey Strahil, Thank again for your suggestions! According to the `gluster volume rebalance help` output only the `start` subcommand supports a force flag. 
I tried that already, unfortunately it doesn't help:

# gluster volume rebalance store1 start force
volume rebalance: store1: failed: Rebalance on store1 is already started
# gluster volume rebalance store1 stop
volume rebalance: store1: failed: Rebalance not started for volume store1.

> Sadly, as the second node thinks balance is running - I'm not sure if a 'start force' (to convince both nodes that balance is running) and then 'stop' will have the expected effect.

The rebalance is indeed running on the second node judging from the contents of /var/log/glusterfs/store1-rebalance.log.

> Sadly, this situation is hard to reproduce. In any way, a bug report should be opened.

The thing is I'm not sure if I can provide meaningful steps to reproduce at this point. I didn't keep proper track of all the things I attempted, so I'm not sure if the bug report I can file would be of much value. :(

> Keep in mind that I do not have a distributed volume, so everything above is pure speculation. Based on my experience - a gluster upgrade can fix odd situations like that, but also it could make things worse. So for now avoid any upgrades, until a dev confirms it is safe to do.

Yeah, I'd rather wait for the rebalance to finish before I make any further attempts at it. Sadly the storage is backed by rather slow (spinning) drives, so it might take a while, but even so I prefer being safe rather than sorry. :)

Best Regards, Strahil Nikolov

Best regards,
--
alexander iliev

Community Meeting Calendar: Schedule - Every Tuesday
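If it helps with a future bug report: glusterd also records its own view of the rebalance state per volume on each node, so comparing that file between the two nodes should show exactly where they disagree (the path below is the usual glusterd working directory; adjust if your installation differs):

# cat /var/lib/glusterd/vols/store1/node_state.info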
Re: [Gluster-users] Is rebalance in progress or not?
On 3/15/20 11:07 AM, Strahil Nikolov wrote:
> On March 15, 2020 11:50:32 AM GMT+02:00, Alexander Iliev wrote:
>> Hi list, I was having some issues with one of my Gluster nodes so I ended up re-installing it. Now I want to re-add the bricks for my main volume and I'm having the following issue - when I try to add the bricks I get:
>>
>> # gluster volume add-brick store1 replica 3
>> volume add-brick: failed: Pre Validation failed on 172.31.35.132. Volume name store1 rebalance is in progress. Please retry after completion
>>
>> But then if I get the rebalance status I get:
>>
>> # gluster volume rebalance store1 status
>> volume rebalance: store1: failed: Rebalance not started for volume store1.
>>
>> And if I try to start the rebalancing I get:
>>
>> # gluster volume rebalance store1 start
>> volume rebalance: store1: failed: Rebalance on store1 is already started
>>
>> Looking at the logs of the first node, when I try to start the rebalance operation I see this:
>>
>> [2020-03-15 09:41:31.883651] E [MSGID: 106276] [glusterd-rpc-ops.c:1200:__glusterd_stage_op_cbk] 0-management: Received stage RJT from uuid: 9476b8bb-d7ee-489a-b083-875805343e67
>>
>> On the second node the logs are showing stuff that indicates that a rebalance operation is indeed in progress:
>>
>> [2020-03-15 09:47:34.190042] I [MSGID: 109081] [dht-common.c:5868:dht_setxattr] 0-store1-dht: fixing the layout of /redacted
>> [2020-03-15 09:47:34.775691] I [dht-rebalance.c:3285:gf_defrag_process_dir] 0-store1-dht: migrate data called on /redacted
>> [2020-03-15 09:47:36.019403] I [dht-rebalance.c:3480:gf_defrag_process_dir] 0-store1-dht: Migration operation on dir /redacted took 1.24 secs
>>
>> Some background on what led to this situation: The volume was originally a replica 3 distributed replicated volume on three nodes. In order to detach the faulty node I lowered the replica count to 2 and removed the bricks from that node from the volume. I cleaned up the storage (formatted the bricks and cleaned the trusted.gfid and trusted.glusterfs.volume-id extended attributes) and purged the gluster packages from the system, then I re-installed the gluster packages and did a `gluster peer probe` from another node.
>>
>> I'm running Gluster 6.6 on CentOS 7.7 on all nodes. I feel stuck at this point, so any guidance will be greatly appreciated. Thanks!
>>
>> Best regards,
>
> Hey Alex,
> Did you try to go to the second node (the one that thinks balance is running) and stop the balance?
> gluster volume rebalance VOLNAME stop
> Then add the new brick (and increase the replica count) and after the heal is over - rebalance again.

Hey Strahil,

Thanks for the suggestion, I just tried it, but unfortunately the result is pretty much the same - when I try to stop the rebalance on the second node it reports that no rebalance is in progress:

> # gluster volume rebalance store1 stop
> volume rebalance: store1: failed: Rebalance not started for volume store1.

> Best Regards, Strahil Nikolov

Best regards,
--
alexander iliev

Community Meeting Calendar: Schedule - Every Tuesday at 14:30 IST / 09:00 UTC Bridge: https://bluejeans.com/441850968 Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
[Gluster-users] Is rebalance in progress or not?
Hi list, I was having some issues with one of my Gluster nodes so I ended up re-installing it. Now I want to re-add the bricks for my main volume and I'm having the following issue - when I try to add the bricks I get: > # gluster volume add-brick store1 replica 3 > volume add-brick: failed: Pre Validation failed on 172.31.35.132. Volume name store1 rebalance is in progress. Please retry after completion But then if I get the rebalance status I get: > # gluster volume rebalance store1 status > volume rebalance: store1: failed: Rebalance not started for volume store1. And if I try to start the rebalancing I get: > # gluster volume rebalance store1 start > volume rebalance: store1: failed: Rebalance on store1 is already started Looking at the logs of the first node, when I try to start the rebalance operation I see this: > [2020-03-15 09:41:31.883651] E [MSGID: 106276] [glusterd-rpc-ops.c:1200:__glusterd_stage_op_cbk] 0-management: Received stage RJT from uuid: 9476b8bb-d7ee-489a-b083-875805343e67 On the second node the logs are showing stuff that indicates that a rebalance operation is indeed in progress: > [2020-03-15 09:47:34.190042] I [MSGID: 109081] [dht-common.c:5868:dht_setxattr] 0-store1-dht: fixing the layout of /redacted > [2020-03-15 09:47:34.775691] I [dht-rebalance.c:3285:gf_defrag_process_dir] 0-store1-dht: migrate data called on /redacted > [2020-03-15 09:47:36.019403] I [dht-rebalance.c:3480:gf_defrag_process_dir] 0-store1-dht: Migration operation on dir /redacted took 1.24 secs Some background on what led to this situation: The volume was originally a replica 3 distributed replicated volume on three nodes. In order to detach the faulty node I lowered the replica count to 2 and removed the bricks from that node from the volume. I cleaned up the storage (formatted the bricks and cleaned the trusted.gfid and trusted.glusterfs.volume-id extended attributes) and purged the gluster packages from the system, then I re-installed the gluster packages and did a `gluster peer probe` from another node. I'm running Gluster 6.6 on CentOS 7.7 on all nodes. I feel stuck at this point, so any guidance will be greatly appreciated. Thanks! Best regards, -- alexander iliev Community Meeting Calendar: Schedule - Every Tuesday at 14:30 IST / 09:00 UTC Bridge: https://bluejeans.com/441850968 Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
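Another way to compare what each node believes is the task section of the volume status output, where a running rebalance shows up as a task with an ID and a status; running this on both nodes and comparing the two outputs might narrow down where the stale state lives:

# gluster volume status store1
(check the "Task Status of Volume store1" section at the end of the output on each node)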
[Gluster-users] Geo-replication /var/lib space question
Hello list,

I have been running a geo-replication session for some time now, but at some point I noticed that the /var/lib/misc/gluster directory is eating up the storage on my root partition. I moved the folder away to another partition, but I don't seem to remember reading any specific space requirement for /var/lib and geo-replication. Did I miss it in the documentation?

Also, does the space used in /var/lib/misc/gluster depend on the geo-replicated volume size? What exactly is stored there? (I'm guessing that's where gsyncd keeps track of the replication progress.)

(I'm running gluster 6.6 on CentOS 7.7.)

Thanks!
--
alexander iliev

Community Meeting Calendar: APAC Schedule - Every 2nd and 4th Tuesday at 11:30 AM IST Bridge: https://bluejeans.com/441850968 NA/EMEA Schedule - Every 1st and 3rd Tuesday at 01:00 PM EDT Bridge: https://bluejeans.com/441850968 Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
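In case it helps someone else hitting the same thing, relocating the directory can be done roughly like this (the target path is arbitrary and the symlink approach is my own assumption rather than something from the documentation; stop the geo-replication session and glusterd first):

# gluster volume geo-replication store1 <slave-host>::store1 stop
# systemctl stop glusterd
# mv /var/lib/misc/gluster /data/gluster-misc
# ln -s /data/gluster-misc /var/lib/misc/gluster
# systemctl start glusterd
# gluster volume geo-replication store1 <slave-host>::store1 start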
Re: [Gluster-users] Issues with Geo-replication (GlusterFS 6.3 on Ubuntu 18.04)
On 10/17/19 5:32 PM, Aravinda Vishwanathapura Krishna Murthy wrote: On Thu, Oct 17, 2019 at 12:54 PM Alexander Iliev mailto:ailiev%2bglus...@mamul.org>> wrote: Thanks, Aravinda. Does this mean that my scenario is currently unsupported? Please try by providing external IP while creating Geo-rep session. We will work on the enhancement if it didn't work. This is what I've been doing all along. It didn't work for me. It seems that I need to make sure that the nodes in the two clusters can see each-other (some kind of VPN would work I guess). Is this be documented somewhere? I think I've read the geo-replication documentation several times now, but somehow it wasn't obvious to me that you need access to the slave nodes from the master ones (apart from the SSH access). Thanks! Best regards, -- alexander iliev On 10/17/19 5:25 AM, Aravinda Vishwanathapura Krishna Murthy wrote: > Got it. > > Geo-replication uses slave nodes IP in the following cases, > > - Verification during Session creation - It tries to mount the Slave > volume using the hostname/IP provided in Geo-rep create command. Try > Geo-rep create by specifying the external IP which is accessible from > the master node. > - Once Geo-replication is started, it gets the list of Slave nodes > IP/hostname from Slave volume info and connects to those IPs. But in > this case, those are internal IP addresses that are not accessible from > Master nodes. - We need to enhance Geo-replication to accept external IP > and internal IP map details so that for all connections it can use > external IP. > > On Wed, Oct 16, 2019 at 10:29 PM Alexander Iliev > mailto:ailiev%2bglus...@mamul.org> <mailto:ailiev%2bglus...@mamul.org <mailto:ailiev%252bglus...@mamul.org>>> wrote: > > Hi Aravinda, > > All volume brick on the slave volume are up and the volume seems > functional. > > Your suggestion about trying to mount the slave volume on a master node > brings up my question about network connectivity again - the GlusterFS > documentation[1] says: > > > The server specified in the mount command is only used to fetch the > gluster configuration volfile describing the volume name. Subsequently, > the client will communicate directly with the servers mentioned in the > volfile (which might not even include the one used for mount). > > To me this means that the masternode from your example is expected to > have connectivity to the network where the slave volume runs, i.e. to > have network access to the slave nodes. In my geo-replication scenario > this is definitely not the case. The two cluster are running in two > completely different networks that are not interconnected. > > So my question is - how is the slave volume mount expected to happen if > the client host cannot access the GlusterFS nodes? Or is the > connectivity a requirement even for geo-replication? > > I'm not sure if I'm missing something, but any help will be highly > appreciated! > > Thanks! > > Links: > [1] > https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Setting%20Up%20Clients/ > -- > alexander iliev > > On 10/16/19 6:03 AM, Aravinda Vishwanathapura Krishna Murthy wrote: > > Hi Alexander, > > > > Please check the status of Volume. Looks like the Slave volume > mount is > > failing because bricks are down or not reachable. If Volume > status shows > > all bricks are up then try mounting the slave volume using mount > command. 
> > > > ``` > > masternode$ mkdir /mnt/vol > > masternode$ mount -t glusterfs : /mnt/vol > > ``` > > > > On Fri, Oct 11, 2019 at 4:03 AM Alexander Iliev > > mailto:ailiev%2bglus...@mamul.org> <mailto:ailiev%2bglus...@mamul.org <mailto:ailiev%252bglus...@mamul.org>> > <mailto:ailiev%2bglus...@mamul.org <mailto:ailiev%252bglus...@mamul.org> > <mailto:ailiev%252bglus...@mamul.org <mailto:ailiev%25252bglus...@mamul.org>>>> wrote: > > > > Hi all, > > > > I ended up rein
Re: [Gluster-users] Issues with Geo-replication (GlusterFS 6.3 on Ubuntu 18.04)
Thanks, Aravinda. Does this mean that my scenario is currently unsupported? It seems that I need to make sure that the nodes in the two clusters can see each-other (some kind of VPN would work I guess). Is this be documented somewhere? I think I've read the geo-replication documentation several times now, but somehow it wasn't obvious to me that you need access to the slave nodes from the master ones (apart from the SSH access). Thanks! Best regards, -- alexander iliev On 10/17/19 5:25 AM, Aravinda Vishwanathapura Krishna Murthy wrote: Got it. Geo-replication uses slave nodes IP in the following cases, - Verification during Session creation - It tries to mount the Slave volume using the hostname/IP provided in Geo-rep create command. Try Geo-rep create by specifying the external IP which is accessible from the master node. - Once Geo-replication is started, it gets the list of Slave nodes IP/hostname from Slave volume info and connects to those IPs. But in this case, those are internal IP addresses that are not accessible from Master nodes. - We need to enhance Geo-replication to accept external IP and internal IP map details so that for all connections it can use external IP. On Wed, Oct 16, 2019 at 10:29 PM Alexander Iliev mailto:ailiev%2bglus...@mamul.org>> wrote: Hi Aravinda, All volume brick on the slave volume are up and the volume seems functional. Your suggestion about trying to mount the slave volume on a master node brings up my question about network connectivity again - the GlusterFS documentation[1] says: > The server specified in the mount command is only used to fetch the gluster configuration volfile describing the volume name. Subsequently, the client will communicate directly with the servers mentioned in the volfile (which might not even include the one used for mount). To me this means that the masternode from your example is expected to have connectivity to the network where the slave volume runs, i.e. to have network access to the slave nodes. In my geo-replication scenario this is definitely not the case. The two cluster are running in two completely different networks that are not interconnected. So my question is - how is the slave volume mount expected to happen if the client host cannot access the GlusterFS nodes? Or is the connectivity a requirement even for geo-replication? I'm not sure if I'm missing something, but any help will be highly appreciated! Thanks! Links: [1] https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Setting%20Up%20Clients/ -- alexander iliev On 10/16/19 6:03 AM, Aravinda Vishwanathapura Krishna Murthy wrote: > Hi Alexander, > > Please check the status of Volume. Looks like the Slave volume mount is > failing because bricks are down or not reachable. If Volume status shows > all bricks are up then try mounting the slave volume using mount command. > > ``` > masternode$ mkdir /mnt/vol > masternode$ mount -t glusterfs : /mnt/vol > ``` > > On Fri, Oct 11, 2019 at 4:03 AM Alexander Iliev > mailto:ailiev%2bglus...@mamul.org> <mailto:ailiev%2bglus...@mamul.org <mailto:ailiev%252bglus...@mamul.org>>> wrote: > > Hi all, > > I ended up reinstalling the nodes with CentOS 7.5 and GlusterFS 6.5 > (installed from the SIG.) > > Now when I try to create a replication session I get the following: > > > # gluster volume geo-replication store1 ::store2 create > push-pem > > Unable to mount and fetch slave volume details. 
Please check the > log: > /var/log/glusterfs/geo-replication/gverify-slavemnt.log > > geo-replication command failed > > You can find the contents of gverify-slavemnt.log below, but the > initial > error seems to be: > > > [2019-10-10 22:07:51.578519] E > [fuse-bridge.c:5211:fuse_first_lookup] > 0-fuse: first lookup on root failed (Transport endpoint is not > connected) > > I only found > [this](https://bugzilla.redhat.com/show_bug.cgi?id=1659824) > bug report which doesn't seem to help. The reported issue is failure to > mount a volume on a GlusterFS client, but in my case I need > geo-replication which implies the client (geo-replication master) being > on a different network. > > Any help will be appreciated. > > Thanks! > > gverify-slavemnt.log: > > > [2019-10-10 22:07:40.571256] I [MSGID:
Re: [Gluster-users] Issues with Geo-replication (GlusterFS 6.3 on Ubuntu 18.04)
Hi Aravinda, All volume brick on the slave volume are up and the volume seems functional. Your suggestion about trying to mount the slave volume on a master node brings up my question about network connectivity again - the GlusterFS documentation[1] says: > The server specified in the mount command is only used to fetch the gluster configuration volfile describing the volume name. Subsequently, the client will communicate directly with the servers mentioned in the volfile (which might not even include the one used for mount). To me this means that the masternode from your example is expected to have connectivity to the network where the slave volume runs, i.e. to have network access to the slave nodes. In my geo-replication scenario this is definitely not the case. The two cluster are running in two completely different networks that are not interconnected. So my question is - how is the slave volume mount expected to happen if the client host cannot access the GlusterFS nodes? Or is the connectivity a requirement even for geo-replication? I'm not sure if I'm missing something, but any help will be highly appreciated! Thanks! Links: [1] https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Setting%20Up%20Clients/ -- alexander iliev On 10/16/19 6:03 AM, Aravinda Vishwanathapura Krishna Murthy wrote: Hi Alexander, Please check the status of Volume. Looks like the Slave volume mount is failing because bricks are down or not reachable. If Volume status shows all bricks are up then try mounting the slave volume using mount command. ``` masternode$ mkdir /mnt/vol masternode$ mount -t glusterfs : /mnt/vol ``` On Fri, Oct 11, 2019 at 4:03 AM Alexander Iliev mailto:ailiev%2bglus...@mamul.org>> wrote: Hi all, I ended up reinstalling the nodes with CentOS 7.5 and GlusterFS 6.5 (installed from the SIG.) Now when I try to create a replication session I get the following: > # gluster volume geo-replication store1 ::store2 create push-pem > Unable to mount and fetch slave volume details. Please check the log: /var/log/glusterfs/geo-replication/gverify-slavemnt.log > geo-replication command failed You can find the contents of gverify-slavemnt.log below, but the initial error seems to be: > [2019-10-10 22:07:51.578519] E [fuse-bridge.c:5211:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected) I only found [this](https://bugzilla.redhat.com/show_bug.cgi?id=1659824) bug report which doesn't seem to help. The reported issue is failure to mount a volume on a GlusterFS client, but in my case I need geo-replication which implies the client (geo-replication master) being on a different network. Any help will be appreciated. Thanks! 
gverify-slavemnt.log: > [2019-10-10 22:07:40.571256] I [MSGID: 100030] [glusterfsd.c:2847:main] 0-glusterfs: Started running glusterfs version 6.5 (args: glusterfs --xlator-option=*dht.lookup-unhashed=off --volfile-server --volfile-id store2 -l /var/log/glusterfs/geo-replication/gverify-slavemnt.log /tmp/gverify.sh.5nFlRh) > [2019-10-10 22:07:40.575438] I [glusterfsd.c:2556:daemonize] 0-glusterfs: Pid of current running process is 6021 > [2019-10-10 22:07:40.584282] I [MSGID: 101190] [event-epoll.c:680:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0 > [2019-10-10 22:07:40.584299] I [MSGID: 101190] [event-epoll.c:680:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 > [2019-10-10 22:07:40.928094] I [MSGID: 114020] [client.c:2393:notify] 0-store2-client-0: parent translators are ready, attempting connect on transport > [2019-10-10 22:07:40.931121] I [MSGID: 114020] [client.c:2393:notify] 0-store2-client-1: parent translators are ready, attempting connect on transport > [2019-10-10 22:07:40.933976] I [MSGID: 114020] [client.c:2393:notify] 0-store2-client-2: parent translators are ready, attempting connect on transport > Final graph: > +--+ > 1: volume store2-client-0 > 2: type protocol/client > 3: option ping-timeout 42 > 4: option remote-host 172.31.36.11 > 5: option remote-subvolume /data/gfs/store1/1/brick-store2 > 6: option transport-type socket > 7: option transport.address-family inet > 8: option transport.socket.ssl-enabled off > 9: option transport.tcp-user-timeout 0 > 10: option transport.socket.keepalive-time 20 > 11: option transport.socket.keepalive-interval 2 > 12: option transport.socket.keepalive-count 9 > 13:
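Coming back to the connectivity question above: a quick way to confirm the requirement is to probe the Gluster ports on a slave node from a master node - glusterd listens on TCP 24007 and the brick processes on ports from 49152 upwards (the exact brick ports are shown by `gluster volume status` on the slave side):

# nc -zv <slave-node-ip> 24007
# nc -zv <slave-node-ip> 49152

If these time out while SSH on port 22 works, that matches the scenario described above where only SSH is reachable between the two networks.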
Re: [Gluster-users] Issues with Geo-replication (GlusterFS 6.3 on Ubuntu 18.04)
nd-volume > 65: > 66: volume store2-read-ahead > 67: type performance/read-ahead > 68: subvolumes store2-write-behind > 69: end-volume > 70: > 71: volume store2-readdir-ahead > 72: type performance/readdir-ahead > 73: option parallel-readdir off > 74: option rda-request-size 131072 > 75: option rda-cache-limit 10MB > 76: subvolumes store2-read-ahead > 77: end-volume > 78: > 79: volume store2-io-cache > 80: type performance/io-cache > 81: subvolumes store2-readdir-ahead > 82: end-volume > 83: > 84: volume store2-open-behind > 85: type performance/open-behind > 86: subvolumes store2-io-cache > 87: end-volume > 88: > 89: volume store2-quick-read > 90: type performance/quick-read > 91: subvolumes store2-open-behind > 92: end-volume > 93: > 94: volume store2-md-cache > 95: type performance/md-cache > 96: subvolumes store2-quick-read > 97: end-volume > 98: > 99: volume store2 > 100: type debug/io-stats > 101: option log-level INFO > 102: option latency-measurement off > 103: option count-fop-hits off > 104: subvolumes store2-md-cache > 105: end-volume > 106: > 107: volume meta-autoload > 108: type meta > 109: subvolumes store2 > 110: end-volume > 111: > +--+ > [2019-10-10 22:07:51.578287] I [fuse-bridge.c:5142:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22 > [2019-10-10 22:07:51.578356] I [fuse-bridge.c:5753:fuse_graph_sync] 0-fuse: switched to graph 0 > [2019-10-10 22:07:51.578467] I [MSGID: 108006] [afr-common.c:5666:afr_local_init] 0-store2-replicate-0: no subvolumes up > [2019-10-10 22:07:51.578519] E [fuse-bridge.c:5211:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected) > [2019-10-10 22:07:51.578709] W [fuse-bridge.c:1266:fuse_attr_cbk] 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not connected) > [2019-10-10 22:07:51.578687] I [MSGID: 108006] [afr-common.c:5666:afr_local_init] 0-store2-replicate-0: no subvolumes up > [2019-10-10 22:09:48.222459] E [MSGID: 108006] [afr-common.c:5318:__afr_handle_child_down_event] 0-store2-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up. > The message "E [MSGID: 108006] [afr-common.c:5318:__afr_handle_child_down_event] 0-store2-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up." repeated 2 times between [2019-10-10 22:09:48.222459] and [2019-10-10 22:09:48.222891] > alexander iliev On 9/8/19 4:50 PM, Alexander Iliev wrote: Hi all, Sunny, thank you for the update. I have applied the patch locally on my slave system and now the mountbroker setup is successful. I am facing another issue though - when I try to create a replication session between the two sites I am getting: # gluster volume geo-replication store1 glustergeorep@::store1 create push-pem Error : Request timed out geo-replication command failed It is still unclear to me if my setup is expected to work at all. Reading the geo-replication documentation at [1] I see this paragraph: > A password-less SSH connection is also required for gsyncd between every node in the master to every node in the slave. The gluster system:: execute gsec_create command creates secret-pem files on all the nodes in the master, and is used to implement the password-less SSH connection. The push-pem option in the geo-replication create command pushes these keys to all the nodes in the slave. It is not clear to me whether connectivity from each master node to each slave node is a requirement in terms of networking. 
In my setup the slave nodes form the Gluster pool over a private network which is not reachable from the master site. Any ideas how to proceed from here will be greatly appreciated. Thanks! Links: [1] https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3/html/administration_guide/sect-preparing_to_deploy_geo-replication Best regards, -- alexander iliev On 9/3/19 2:50 PM, Sunny Kumar wrote: Thank you for the explanation Kaleb. Alexander, This fix will be available with next release for all supported versions. /sunny On Mon, Sep 2, 2019 at 6:47 PM Kaleb Keithley wrote: Fixes on master (before or after the release-7 branch was taken) almost certainly warrant a backport IMO to at least release-6, and probably release-5 as well. We used to have a "tracker" BZ for each minor release (e.g. 6.6) to keep track of backports by cloning the original BZ and changing the Version, and adding that BZ to the tracker. I'm not sure what happened to that practice. T
[Gluster-users] Reboot Issue with 6.5 on Ubuntu 18.04
Hi all, I am running a GlusterFS server 6.3 on three Ubuntu 18.04 nodes installed from the https://launchpad.net/~gluster PPA. I tried upgrading to 6.5 today and ran into an issue with the first (and only) node that has been upgraded so far. When I rebooted the node the underlying brick filesystems failed to mount because of a `pvscan` process timing out on boot. I did some experimenting and the issue seems to be that on reboot the glusterfsd processes (that expose the bricks as far as I understand) are not being shut down which leads to the underlying filesystems show up as busy and not getting properly unmounted. Then I found out that `systemctl stop glusterd.service` doesn't stop the brick processes by design and it also seems that for Fedora/RHEL this has been worked around by having a separate `glusterfsd.service` unit that only acts on shutdown. This however does not seem to be the case on Ubuntu and I can't figure out what is the expected flow there. So I guess my question is - is this normal/expected behaviour on Ubuntu? How is one supposed to set things up so that bricks get properly unmounted on reboot and properly mounted at startup? I am also considering migrating from Ubuntu to CentOS now as the upstream support seems much better there. If I decide to switch can I re-use the existing bricks or do I need to spin up a clean node, join the cluster and get the data synced to it? Thanks! Best regards, -- alexander iliev ___ Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
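For what it's worth, a rough approximation of what the Fedora/RHEL packaging does - a stop-only unit whose ExecStop kills the brick processes so the brick filesystems can be unmounted cleanly - could probably be recreated on Ubuntu with something like the sketch below; the unit name and ordering are my own guesses, not the packaged unit:

# cat > /etc/systemd/system/glusterfsd.service <<'EOF'
[Unit]
Description=GlusterFS brick processes (stop-only helper)
After=network.target glusterd.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
# killing the brick processes on stop releases the brick mounts before unmount
ExecStop=/bin/sh -c "/usr/bin/killall --wait glusterfsd || /bin/true"

[Install]
WantedBy=multi-user.target
EOF
# systemctl daemon-reload
# systemctl enable glusterfsd.service

I have not verified this on 18.04, so treat it as a starting point rather than a known fix.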
Re: [Gluster-users] Issues with Geo-replication (GlusterFS 6.3 on Ubuntu 18.04)
Hi all, Sunny, thank you for the update. I have applied the patch locally on my slave system and now the mountbroker setup is successful. I am facing another issue though - when I try to create a replication session between the two sites I am getting: # gluster volume geo-replication store1 glustergeorep@::store1 create push-pem Error : Request timed out geo-replication command failed It is still unclear to me if my setup is expected to work at all. Reading the geo-replication documentation at [1] I see this paragraph: > A password-less SSH connection is also required for gsyncd between every node in the master to every node in the slave. The gluster system:: execute gsec_create command creates secret-pem files on all the nodes in the master, and is used to implement the password-less SSH connection. The push-pem option in the geo-replication create command pushes these keys to all the nodes in the slave. It is not clear to me whether connectivity from each master node to each slave node is a requirement in terms of networking. In my setup the slave nodes form the Gluster pool over a private network which is not reachable from the master site. Any ideas how to proceed from here will be greatly appreciated. Thanks! Links: [1] https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3/html/administration_guide/sect-preparing_to_deploy_geo-replication Best regards, -- alexander iliev On 9/3/19 2:50 PM, Sunny Kumar wrote: Thank you for the explanation Kaleb. Alexander, This fix will be available with next release for all supported versions. /sunny On Mon, Sep 2, 2019 at 6:47 PM Kaleb Keithley wrote: Fixes on master (before or after the release-7 branch was taken) almost certainly warrant a backport IMO to at least release-6, and probably release-5 as well. We used to have a "tracker" BZ for each minor release (e.g. 6.6) to keep track of backports by cloning the original BZ and changing the Version, and adding that BZ to the tracker. I'm not sure what happened to that practice. The last ones I can find are for 6.3 and 5.7; https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-6.3 and https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-5.7 It isn't enough to just backport recent fixes on master to release-7. We are supposedly continuing to maintain release-6 and release-5 after release-7 GAs. If that has changed, I haven't seen an announcement to that effect. I don't know why our developers don't automatically backport to all the actively maintained releases. Even if there isn't a tracker BZ, you can always create a backport BZ by cloning the original BZ and change the release to 6. That'd be a good place to start. On Sun, Sep 1, 2019 at 8:45 AM Alexander Iliev wrote: Hi Strahil, Yes, this might be right, but I would still expect fixes like this to be released for all supported major versions (which should include 6.) At least that's how I understand https://www.gluster.org/release-schedule/. Anyway, let's wait for Sunny to clarify. Best regards, alexander iliev On 9/1/19 2:07 PM, Strahil Nikolov wrote: Hi Alex, I'm not very deep into bugzilla stuff, but for me NEXTRELEASE means v7. Sunny, Am I understanding it correctly ? Best Regards, Strahil Nikolov В неделя, 1 септември 2019 г., 14:27:32 ч. Гринуич+3, Alexander Iliev написа: Hi Sunny, Thank you for the quick response. It's not clear to me however if the fix has been already released or not. 
The bug status is CLOSED NEXTRELEASE and according to [1] the NEXTRELEASE resolution means that the fix will be included in the next supported release. The bug is logged against the mainline version though, so I'm not sure what this means exactly. From the 6.4[2] and 6.5[3] release notes it seems it hasn't been released yet. Ideally I would not like to patch my systems locally, so if you have an ETA on when this will be out officially I would really appreciate it. Links: [1] https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_status [2] https://docs.gluster.org/en/latest/release-notes/6.4/ [3] https://docs.gluster.org/en/latest/release-notes/6.5/ Thank you! Best regards, alexander iliev On 8/30/19 9:22 AM, Sunny Kumar wrote: > Hi Alexander, > > Thanks for pointing that out! > > But this issue is fixed now you can see below link for bz-link and patch. > > BZ - https://bugzilla.redhat.com/show_bug.cgi?id=1709248 > > Patch - https://review.gluster.org/#/c/glusterfs/+/22716/ > > Hope this helps. > > /sunny > > On Fri, Aug 30, 2019 at 2:30 AM Alexander Iliev > mailto:glus...@mamul.org>> wrote: >> >> Hello dear GlusterFS users list, >> >> I have been trying to set up geo-replication between two clusters for >> some time now. The desired state is (Cluster #1) being replicated to >> (Cluster #2). >>
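Regarding the password-less SSH requirement quoted from the documentation above, one basic check before retrying the create step is to confirm that every master node can actually open an SSH session to every slave node as the user the session will run under (hosts and the unprivileged user below are placeholders for my setup):

# for h in <slave1-ip> <slave2-ip> <slave3-ip>; do ssh -p 22 glustergeorep@$h true && echo "$h ok"; done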
Re: [Gluster-users] Issues with Geo-replication (GlusterFS 6.3 on Ubuntu 18.04)
Hi Strahil, Yes, this might be right, but I would still expect fixes like this to be released for all supported major versions (which should include 6.) At least that's how I understand https://www.gluster.org/release-schedule/. Anyway, let's wait for Sunny to clarify. Best regards, alexander iliev On 9/1/19 2:07 PM, Strahil Nikolov wrote: Hi Alex, I'm not very deep into bugzilla stuff, but for me NEXTRELEASE means v7. Sunny, Am I understanding it correctly ? Best Regards, Strahil Nikolov В неделя, 1 септември 2019 г., 14:27:32 ч. Гринуич+3, Alexander Iliev написа: Hi Sunny, Thank you for the quick response. It's not clear to me however if the fix has been already released or not. The bug status is CLOSED NEXTRELEASE and according to [1] the NEXTRELEASE resolution means that the fix will be included in the next supported release. The bug is logged against the mainline version though, so I'm not sure what this means exactly. From the 6.4[2] and 6.5[3] release notes it seems it hasn't been released yet. Ideally I would not like to patch my systems locally, so if you have an ETA on when this will be out officially I would really appreciate it. Links: [1] https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_status [2] https://docs.gluster.org/en/latest/release-notes/6.4/ [3] https://docs.gluster.org/en/latest/release-notes/6.5/ Thank you! Best regards, alexander iliev On 8/30/19 9:22 AM, Sunny Kumar wrote: > Hi Alexander, > > Thanks for pointing that out! > > But this issue is fixed now you can see below link for bz-link and patch. > > BZ - https://bugzilla.redhat.com/show_bug.cgi?id=1709248 > > Patch - https://review.gluster.org/#/c/glusterfs/+/22716/ > > Hope this helps. > > /sunny > > On Fri, Aug 30, 2019 at 2:30 AM Alexander Iliev > mailto:glus...@mamul.org>> wrote: >> >> Hello dear GlusterFS users list, >> >> I have been trying to set up geo-replication between two clusters for >> some time now. The desired state is (Cluster #1) being replicated to >> (Cluster #2). >> >> Here are some details about the setup: >> >> Cluster #1: three nodes connected via a local network (172.31.35.0/24), >> one replicated (3 replica) volume. >> >> Cluster #2: three nodes connected via a local network (172.31.36.0/24), >> one replicated (3 replica) volume. >> >> The two clusters are connected to the Internet via separate network >> adapters. >> >> Only SSH (port 22) is open on cluster #2 nodes' adapters connected to >> the Internet. >> >> All nodes are running Ubuntu 18.04 and GlusterFS 6.3 installed from [1]. >> >> The first time I followed the guide[2] everything went fine up until I >> reached the "Create the session" step. That was like a month ago, then I >> had to temporarily stop working in this and now I am coming back to it. >> >> Currently, if I try to see the mountbroker status I get the following: >> >>> # gluster-mountbroker status >>> Traceback (most recent call last): >>> File "/usr/sbin/gluster-mountbroker", line 396, in >>> runcli() >>> File "/usr/lib/python3/dist-packages/gluster/cliutils/cliutils.py", line 225, in runcli >>> cls.run(args) >>> File "/usr/sbin/gluster-mountbroker", line 275, in run >>> out = execute_in_peers("node-status") >>> File "/usr/lib/python3/dist-packages/gluster/cliutils/cliutils.py", >> line 127, in execute_in_peers >>> raise GlusterCmdException((rc, out, err, " ".join(cmd))) >>> gluster.cliutils.cliutils.GlusterCmdException: (1, '', 'Unable to >> end. 
Error : Success\n', 'gluster system:: execute mountbroker.py >> node-status') >> >> And in /var/log/gluster/glusterd.log I have: >> >>> [2019-08-10 15:24:21.418834] E [MSGID: 106336] >> [glusterd-geo-rep.c:5413:glusterd_op_sys_exec] 0-management: Unable to >> end. Error : Success >>> [2019-08-10 15:24:21.418908] E [MSGID: 106122] >> [glusterd-syncop.c:1445:gd_commit_op_phase] 0-management: Commit of >> operation 'Volume Execute system commands' failed on localhost : Unable >> to end. Error : Success >> >> So, I have two questions right now: >> >> 1) Is there anything wrong with my setup (networking, open ports, etc.)? >> Is it expected to work with this setup or should I redo it in a >> different way? >> 2) How can I troubleshoot the current status of my setup? Can I find out >> what's missing/wrong and continue from there or should I just start from >> scra
Re: [Gluster-users] Issues with Geo-replication (GlusterFS 6.3 on Ubuntu 18.04)
Hi Sunny, Thank you for the quick response. It's not clear to me however if the fix has been already released or not. The bug status is CLOSED NEXTRELEASE and according to [1] the NEXTRELEASE resolution means that the fix will be included in the next supported release. The bug is logged against the mainline version though, so I'm not sure what this means exactly. From the 6.4[2] and 6.5[3] release notes it seems it hasn't been released yet. Ideally I would not like to patch my systems locally, so if you have an ETA on when this will be out officially I would really appreciate it. Links: [1] https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_status [2] https://docs.gluster.org/en/latest/release-notes/6.4/ [3] https://docs.gluster.org/en/latest/release-notes/6.5/ Thank you! Best regards, alexander iliev On 8/30/19 9:22 AM, Sunny Kumar wrote: Hi Alexander, Thanks for pointing that out! But this issue is fixed now you can see below link for bz-link and patch. BZ - https://bugzilla.redhat.com/show_bug.cgi?id=1709248 Patch - https://review.gluster.org/#/c/glusterfs/+/22716/ Hope this helps. /sunny On Fri, Aug 30, 2019 at 2:30 AM Alexander Iliev wrote: Hello dear GlusterFS users list, I have been trying to set up geo-replication between two clusters for some time now. The desired state is (Cluster #1) being replicated to (Cluster #2). Here are some details about the setup: Cluster #1: three nodes connected via a local network (172.31.35.0/24), one replicated (3 replica) volume. Cluster #2: three nodes connected via a local network (172.31.36.0/24), one replicated (3 replica) volume. The two clusters are connected to the Internet via separate network adapters. Only SSH (port 22) is open on cluster #2 nodes' adapters connected to the Internet. All nodes are running Ubuntu 18.04 and GlusterFS 6.3 installed from [1]. The first time I followed the guide[2] everything went fine up until I reached the "Create the session" step. That was like a month ago, then I had to temporarily stop working in this and now I am coming back to it. Currently, if I try to see the mountbroker status I get the following: # gluster-mountbroker status Traceback (most recent call last): File "/usr/sbin/gluster-mountbroker", line 396, in runcli() File "/usr/lib/python3/dist-packages/gluster/cliutils/cliutils.py", line 225, in runcli cls.run(args) File "/usr/sbin/gluster-mountbroker", line 275, in run out = execute_in_peers("node-status") File "/usr/lib/python3/dist-packages/gluster/cliutils/cliutils.py", line 127, in execute_in_peers raise GlusterCmdException((rc, out, err, " ".join(cmd))) gluster.cliutils.cliutils.GlusterCmdException: (1, '', 'Unable to end. Error : Success\n', 'gluster system:: execute mountbroker.py node-status') And in /var/log/gluster/glusterd.log I have: [2019-08-10 15:24:21.418834] E [MSGID: 106336] [glusterd-geo-rep.c:5413:glusterd_op_sys_exec] 0-management: Unable to end. Error : Success [2019-08-10 15:24:21.418908] E [MSGID: 106122] [glusterd-syncop.c:1445:gd_commit_op_phase] 0-management: Commit of operation 'Volume Execute system commands' failed on localhost : Unable to end. Error : Success So, I have two questions right now: 1) Is there anything wrong with my setup (networking, open ports, etc.)? Is it expected to work with this setup or should I redo it in a different way? 2) How can I troubleshoot the current status of my setup? Can I find out what's missing/wrong and continue from there or should I just start from scratch? 
Links: [1] http://ppa.launchpad.net/gluster/glusterfs-6/ubuntu [2] https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/ Thank you! Best regards, -- alexander iliev ___ Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
[Gluster-users] Issues with Geo-replication (GlusterFS 6.3 on Ubuntu 18.04)
Hello dear GlusterFS users list, I have been trying to set up geo-replication between two clusters for some time now. The desired state is (Cluster #1) being replicated to (Cluster #2). Here are some details about the setup: Cluster #1: three nodes connected via a local network (172.31.35.0/24), one replicated (3 replica) volume. Cluster #2: three nodes connected via a local network (172.31.36.0/24), one replicated (3 replica) volume. The two clusters are connected to the Internet via separate network adapters. Only SSH (port 22) is open on cluster #2 nodes' adapters connected to the Internet. All nodes are running Ubuntu 18.04 and GlusterFS 6.3 installed from [1]. The first time I followed the guide[2] everything went fine up until I reached the "Create the session" step. That was like a month ago, then I had to temporarily stop working in this and now I am coming back to it. Currently, if I try to see the mountbroker status I get the following: # gluster-mountbroker status Traceback (most recent call last): File "/usr/sbin/gluster-mountbroker", line 396, in runcli() File "/usr/lib/python3/dist-packages/gluster/cliutils/cliutils.py", line 225, in runcli cls.run(args) File "/usr/sbin/gluster-mountbroker", line 275, in run out = execute_in_peers("node-status") File "/usr/lib/python3/dist-packages/gluster/cliutils/cliutils.py", line 127, in execute_in_peers raise GlusterCmdException((rc, out, err, " ".join(cmd))) gluster.cliutils.cliutils.GlusterCmdException: (1, '', 'Unable to end. Error : Success\n', 'gluster system:: execute mountbroker.py node-status') And in /var/log/gluster/glusterd.log I have: [2019-08-10 15:24:21.418834] E [MSGID: 106336] [glusterd-geo-rep.c:5413:glusterd_op_sys_exec] 0-management: Unable to end. Error : Success [2019-08-10 15:24:21.418908] E [MSGID: 106122] [glusterd-syncop.c:1445:gd_commit_op_phase] 0-management: Commit of operation 'Volume Execute system commands' failed on localhost : Unable to end. Error : Success So, I have two questions right now: 1) Is there anything wrong with my setup (networking, open ports, etc.)? Is it expected to work with this setup or should I redo it in a different way? 2) How can I troubleshoot the current status of my setup? Can I find out what's missing/wrong and continue from there or should I just start from scratch? Links: [1] http://ppa.launchpad.net/gluster/glusterfs-6/ubuntu [2] https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/ Thank you! Best regards, -- alexander iliev ___ Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
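For context, the mountbroker part of the guide boils down to roughly these commands on the slave cluster (the mount-root, group, user and volume names below are the ones from my setup, i.e. placeholders); the failing gluster-mountbroker status call above is only supposed to report on the result of these steps:

# gluster-mountbroker setup /var/mountbroker-root geogroup
# gluster-mountbroker add store1 glustergeorep
# gluster-mountbroker status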