Re: [Gluster-users] Replica 3 scale out and ZFS bricks

2020-09-18 Thread Alexander Iliev

On 9/17/20 4:47 PM, Strahil Nikolov wrote:

  I guess I misunderstood you - if I decode the diagram correctly it should be 
OK, as you will always have at least two bricks available after a node goes down.

It would be way simpler if you add a 5th node (VM probably) as an arbiter and 
switch to 'replica 3 arbiter 1'.


Yep, I would add an arbiter node in this case.

What I wanted to confirm was that my understanding of the way GlusterFS 
scales is correct - specifically, expanding a volume by adding a single 
storage node to the current setup.


Thanks, Strahil.

Best regards,
--
alexander iliev




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Replica 3 scale out and ZFS bricks

2020-09-17 Thread Alexander Iliev

On 9/17/20 3:37 AM, Stephan von Krawczynski wrote:

Nevertheless you will break performance anyway by deploying user-space
crawling-slow glusterfs... outcome of 10 wasted years of development in the
wrong direction.


Genuinely asking - what would you recommend instead of GlusterFS for a 
highly available, horizontally scalable storage system?


Best regards,
--
alexander iliev






Re: [Gluster-users] Replica 3 scale out and ZFS bricks

2020-09-17 Thread Alexander Iliev

On 9/16/20 9:53 PM, Strahil Nikolov wrote:

В сряда, 16 септември 2020 г., 11:54:57 Гринуич+3, Alexander Iliev 
 написа:

 From what I understood, in order to be able to scale it one node at a
time, I need to set up the initial nodes with a number of bricks that is
a multiple of 3 (e.g., 3, 6, 9, etc. bricks). The initial cluster will
be able to export a volume as large as the storage of a single node and
adding one more node will grow the volume by 1/3 (assuming homogeneous
nodes.)

     You can't add 1 node to a replica 3, so no - you won't get 1/3 with that 
extra node.


OK, then I guess I was totally confused on this point.

I'd imagined something like this would work:

  node1        node2        node3
+---------+  +---------+  +---------+
| brick 1 |  | brick 1 |  | brick 1 |
| brick 2 |  | brick 2 |  | brick 2 |
| brick 3 |  | brick 3 |  | brick 3 |
+---------+  +---------+  +---------+
                 |
                 v
  node1        node2        node3        node4
+---------+  +---------+  +---------+  +---------+
| brick 1 |  | brick 1 |  | brick 4 |  | brick 1 |
| brick 2 |  | brick 4 |  | brick 2 |  | brick 2 |
| brick 3 |  | brick 3 |  | brick 3 |  | brick 4 |
+---------+  +---------+  +---------+  +---------+

any# gluster peer probe node4
any# gluster volume replace-brick volume1 node2:/gfs/2/brick 
node4:/gfs/2/brick commit force
any# gluster volume replace-brick volume1 node3:/gfs/1/brick 
node4:/gfs/1/brick commit force
node2# umount /gfs/2 && mkfs /dev/... && mv /gfs/2 /gfs/4 && mount 
/dev/... /gfs/4 # or clean up the replaced brick by other means
node3# umount /gfs/1 && mkfs /dev/... && mv /gfs/1 /gfs/4 && mount 
/dev/... /gfs/4 # or clean up the replaced brick by other means
any# gluster volume add-brick volume1 node2:/gfs/4/brick 
node3:/gfs/4/brick node4:/gfs/4/brick


(Note: /etc/fstab or whatever mounting mechanism is used also needs to 
be updated after renaming the mount-points on node2 and node3.)


I played around with this in a VM setup and it seems to work, but maybe 
I'm missing something.


Even if this is supposed to work maybe it has other implications I'm not 
aware of, so I would be happy to be educated on this.




My plan is to use ZFS as the underlying system for the bricks. Now I'm
wondering - if I join the disks on each node in a, say, RAIDZ2 pool and
then create a dataset within the pool for each brick, the GlusterFS
volume would report the volume size 3x$brick_size, because each brick
shares the same pool and the size/free space is reported according to
the ZFS pool size/free space.

I'm not sure about ZFS (never played with it on Linux), but on my systems I 
set up a thin pool consisting of all HDDs in a striped way (when no hardware 
RAID controller is available) and then create thin LVs for each brick.
In thin LVM you can define a virtual size, and this size is reported as the volume 
size (assuming that all bricks are the same size). If you have 1 RAIDZ2 pool 
per Gluster TSP node, then that pool's size is the maximum size of your volume. 
If you plan to use snapshots, then you should set a quota on the volume to 
control the usage.

How should I go about this? Should I create a ZFS pool per brick (this
seems to have a negative impact on performance)? Should I set a quota
for each dataset?

I would go with 1 RAIDZ2 pool with 1 dataset of type 'filesystem' per Gluster 
node. A quota is always good to have.
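For the ZFS route, the pool/dataset/quota layout described above might look roughly like this (a sketch with hypothetical device, pool, and mount-point names - adjust disks, sizes, and paths to your environment):

```shell
# One RAIDZ2 pool per node, built from that node's data disks
# (device names below are placeholders).
zpool create tank raidz2 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg

# One 'filesystem' dataset per brick, all sharing the pool.
zfs create -o mountpoint=/gfs/brick1 tank/brick1

# Optional: compression, plus a quota so one brick cannot consume the
# whole pool (particularly useful when snapshots are in play).
zfs set compression=lz4 tank/brick1
zfs set quota=4T tank/brick1
```

Note that with a per-brick quota, Gluster will report free space based on the quota rather than the shared pool size.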


P.S.: Any reason to use ZFS? It uses a lot of memory.


Two main reasons for ZFS - node-level redundancy and compression.

I want to enable some node-level fault tolerance in order to avoid 
healing a failed node from scratch. From my experience so far healing 
(at least in our environment) is quite slow and painful.


Hardware RAID is not an option in our setup. With LVM mirroring we would 
be utilizing only 50% of the physical space. We could go with mdadm+LVM, but 
it feels messier, and AFAIK mdadm RAID6 is prone to the "write hole" 
problem (but maybe I'm outdated on this one).


Best regards,
--
alexander iliev






[Gluster-users] Replica 3 scale out and ZFS bricks

2020-09-16 Thread Alexander Iliev

Hi list,

I am in the process of planning a 3-node replica 3 setup and I have a 
question about scaling it out.


From what I understood, in order to be able to scale it one node at a 
time, I need to set up the initial nodes with a number of bricks that is 
a multiple of 3 (e.g., 3, 6, 9, etc. bricks). The initial cluster will 
be able to export a volume as large as the storage of a single node and 
adding one more node will grow the volume by 1/3 (assuming homogeneous 
nodes.)
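That capacity arithmetic can be sanity-checked with a small script (a sketch assuming homogeneous nodes and the hypothetical brick sizes below):

```python
def usable_capacity(nodes, bricks_per_node, brick_size, replica=3):
    """Usable size of a distributed-replicated volume.

    Bricks are grouped into replica sets; each set of `replica` bricks
    holds one copy of the data, so usable capacity is total raw brick
    space divided by the replica count.
    """
    total_bricks = nodes * bricks_per_node
    if total_bricks % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return total_bricks * brick_size // replica

# 3 nodes x 3 bricks of 10 TB: usable size equals one node's storage (30 TB).
print(usable_capacity(3, 3, 10))   # 30
# Adding a 4th identical node grows the volume by 1/3 (to 40 TB).
print(usable_capacity(4, 3, 10))   # 40
```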


Please let me know if my understanding is correct.

My plan is to use ZFS as the underlying system for the bricks. Now I'm 
wondering - if I join the disks on each node in a, say, RAIDZ2 pool and 
then create a dataset within the pool for each brick, the GlusterFS 
volume would report the volume size 3x$brick_size, because each brick 
shares the same pool and the size/free space is reported according to 
the ZFS pool size/free space.


How should I go about this? Should I create a ZFS pool per brick (this 
seems to have a negative impact on performance)? Should I set a quota 
for each dataset?


Does my plan even make sense?

Thank you!

Best regards,
--
alexander iliev






Re: [Gluster-users] GlusterFS geo-replication progress question

2020-04-19 Thread Alexander Iliev

Thanks, Sunny.

alexander iliev

On 4/7/20 12:25 AM, Sunny Kumar wrote:

Hi Alexander,

Answers inline below:

On Thu, Apr 2, 2020 at 1:08 AM Alexander Iliev  wrote:


Hi all,

I have a running geo-replication session between two clusters and I'm
trying to figure out what is the current progress of the replication and
possibly how much longer it will take.

It has been running for quite a while now (> 1 month), but the thing is
that both the hardware of the nodes and the link between the two
clusters aren't that great (e.g., the volumes are backed by rotating
disks) and the volume is somewhat sizeable (30-ish TB) and given these
details I'm not really sure how long it is supposed to take normally.

I have several bricks in the volume (same brick size and physical layout
in both clusters) that are now showing up with a Changelog Crawl status
and with a recent LAST_SYNCED date in the `gluster volume
geo-replication status detail` command output which seems to be the
desired state for all bricks. The rest of the bricks though are in
Hybrid Crawl state and have been in that state forever.

So I suppose my questions are - how can I tell if the replication
session is somehow broken, and if it's not, is there a way for me
to find out the progress and the ETA of the replication?


Please go through this section[1] which talks about this.
In Hybrid crawl at present we do not have any accounting information
like how much time it will take to sync data.
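Absent built-in accounting, a rough ETA can be derived by sampling how much data has arrived on the slave over time (e.g. with `du` on the slave volume). This is only a sketch - it assumes you can read a synced byte count and that the sync rate stays roughly constant:

```python
def estimate_eta(total_bytes, samples):
    """Estimate remaining sync time in seconds.

    `samples` is a list of (timestamp_seconds, synced_bytes) pairs,
    ordered oldest to newest.
    """
    (t0, b0), (t1, b1) = samples[0], samples[-1]
    rate = (b1 - b0) / (t1 - t0)      # observed bytes per second
    if rate <= 0:
        return None                   # no measurable progress
    return (total_bytes - b1) / rate

# 30 TB volume; 10.0 TB synced an hour ago, 10.9 TB synced now:
eta = estimate_eta(30e12, [(0, 10.0e12), (3600, 10.9e12)])
print(round(eta / 3600, 1))  # ~21.2 hours remaining
```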


In /var/log/glusterfs/geo-replication/$session_dir/gsyncd.log there are
some errors like:

[2020-03-31 11:48:47.81269] E [syncdutils(worker
/data/gfs/store1/8/brick):822:errlog] Popen: command returned error
cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
/var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto
-S /tmp/gsync
d-aux-ssh-6aDWmc/206c4b2c3eb782ea2cf49ab5142bd68b.sock x.x.x.x
/nonexistent/gsyncd slave  x.x.x.x:: --master-node x.x.x.x
--master-node-id 9476b8bb-d7ee-489a-b083-875805343e67 --master-brick
 --local-node x.x.x.x
2 --local-node-id 426b564d-35d9-4291-980e-795903e9a386 --slave-timeout
120 --slave-log-level INFO --slave-gluster-log-level INFO
--slave-gluster-command-dir /usr/sbinerror=1
[2020-03-31 11:48:47.81617] E [syncdutils(worker
):826:logerr] Popen: ssh> failed with ValueError.
[2020-03-31 11:48:47.390397] I [repce(agent
):97:service_loop] RepceServer: terminating on reaching EOF.



If you are seeing this error at a regular interval then please check
your ssh connection, it might have broken.
If possible please share full traceback form both master and slave to
debug the issue.


In the brick logs I see stuff like:

[2020-03-29 07:49:05.338947] E [fuse-bridge.c:4167:fuse_xattr_cbk]
0-glusterfs-fuse: extended attribute not supported by the backend storage

I don't know if these are critical, from the rest of the logs it looks
like data is traveling between the clusters.

Any help will be greatly appreciated. Thank you in advance!

Best regards,
--
alexander iliev






[1]. 
https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/#status

/sunny








Re: [Gluster-users] gluster v6.8: systemd units disabled after install

2020-04-11 Thread Alexander Iliev

Hi Hubert,

I think this would vary from distribution to distribution and it is up 
to the package maintainers of the particular distribution to decide what 
the default should be.


I am using Gluster 6.6 on CentOS and the Gluster-specific services there 
were also disabled (although not exactly as in your original post - the 
vendor preset was also disabled for me, while it is enabled for you).


This is only speculation for this particular case, but I think the 
idea in general is to have the system administrator explicitly enable 
the services they want running on reboot.


I would argue that this is the safer approach as opposed to enabling a 
service automatically after its installation. An example scenario would 
be - you install a service, the system is rebooted, e.g. due to a power 
outage, mistyped command, etc., the service is started automatically 
even though it hasn't been properly configured yet.


I guess, to really know the reasoning, the respective package 
maintainers would need to jump in and share the thinking behind this decision.
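One way to see what the distribution's packaging intended is to inspect the unit's enablement state and the vendor preset files (a sketch; preset file names and locations vary by distribution):

```shell
# Current enablement state plus vendor preset in one shot:
systemctl is-enabled glusterd.service
systemctl list-unit-files 'gluster*'

# The vendor preset comes from these drop-in files; a 'disable *'
# catch-all rule is why freshly installed units end up disabled.
grep -r gluster /usr/lib/systemd/system-preset/ /etc/systemd/system-preset/ 2>/dev/null

# Explicitly enable (and start) the services once configured:
systemctl enable --now glusterd.service glustereventsd.service
```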


Best regards,
--
alexander iliev

On 4/11/20 7:40 AM, Hu Bert wrote:

Hi,

so no one has seen the problem of disabled systemd units before?

Regards,
Hubert

Am Mo., 6. Apr. 2020 um 12:30 Uhr schrieb Hu Bert :


Hello,

after a server reboot (with a fresh gluster 6.8 install) i noticed
that the gluster services weren't running.

systemctl status glusterd.service
● glusterd.service - GlusterFS, a clustered file-system server
Loaded: loaded (/lib/systemd/system/glusterd.service; disabled;
vendor preset: enabled)
Active: inactive (dead)
  Docs: man:glusterd(8)

Apr 06 11:34:18 glfsserver1 systemd[1]:
/lib/systemd/system/glusterd.service:9: PIDFile= references path below
legacy directory /var/run/, updating /var/run/glusterd.pid →
/run/glusterd.pid; please update the unit file accordingly.

systemctl status glustereventsd.service
● glustereventsd.service - Gluster Events Notifier
Loaded: loaded (/lib/systemd/system/glustereventsd.service;
disabled; vendor preset: enabled)
Active: inactive (dead)
  Docs: man:glustereventsd(8)

Apr 06 11:34:27 glfsserver1 systemd[1]:
/lib/systemd/system/glustereventsd.service:11: PIDFile= references
path below legacy directory /var/run/, updating
/var/run/glustereventsd.pid → /run/glustereventsd.pid; please update
the unit file accordingly.

You have to enable them manually:

systemctl enable glusterd.service
Created symlink
/etc/systemd/system/multi-user.target.wants/glusterd.service →
/lib/systemd/system/glusterd.service.
systemctl enable glustereventsd.service
Created symlink
/etc/systemd/system/multi-user.target.wants/glustereventsd.service →
/lib/systemd/system/glustereventsd.service.

Is this a bug? If so: already known?


Regards,
Hubert













[Gluster-users] GlusterFS geo-replication progress question

2020-04-01 Thread Alexander Iliev

Hi all,

I have a running geo-replication session between two clusters and I'm 
trying to figure out what is the current progress of the replication and 
possibly how much longer it will take.


It has been running for quite a while now (> 1 month), but the thing is 
that both the hardware of the nodes and the link between the two 
clusters aren't that great (e.g., the volumes are backed by rotating 
disks) and the volume is somewhat sizeable (30-ish TB) and given these 
details I'm not really sure how long it is supposed to take normally.


I have several bricks in the volume (same brick size and physical layout 
in both clusters) that are now showing up with a Changelog Crawl status 
and with a recent LAST_SYNCED date in the `gluster volume 
geo-replication status detail` command output which seems to be the 
desired state for all bricks. The rest of the bricks though are in 
Hybrid Crawl state and have been in that state forever.


So I suppose my questions are - how can I tell if the replication 
session is somehow broken, and if it's not, is there a way for me 
to find out the progress and the ETA of the replication?


In /var/log/glusterfs/geo-replication/$session_dir/gsyncd.log there are 
some errors like:


[2020-03-31 11:48:47.81269] E [syncdutils(worker 
/data/gfs/store1/8/brick):822:errlog] Popen: command returned error 
cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i 
/var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto 
-S /tmp/gsync
d-aux-ssh-6aDWmc/206c4b2c3eb782ea2cf49ab5142bd68b.sock x.x.x.x 
/nonexistent/gsyncd slave  x.x.x.x:: --master-node x.x.x.x 
--master-node-id 9476b8bb-d7ee-489a-b083-875805343e67 --master-brick 
 --local-node x.x.x.x
2 --local-node-id 426b564d-35d9-4291-980e-795903e9a386 --slave-timeout 
120 --slave-log-level INFO --slave-gluster-log-level INFO 
--slave-gluster-command-dir /usr/sbinerror=1
[2020-03-31 11:48:47.81617] E [syncdutils(worker 
):826:logerr] Popen: ssh> failed with ValueError.
[2020-03-31 11:48:47.390397] I [repce(agent 
):97:service_loop] RepceServer: terminating on reaching EOF.


In the brick logs I see stuff like:

[2020-03-29 07:49:05.338947] E [fuse-bridge.c:4167:fuse_xattr_cbk] 
0-glusterfs-fuse: extended attribute not supported by the backend storage


I don't know if these are critical, from the rest of the logs it looks 
like data is traveling between the clusters.


Any help will be greatly appreciated. Thank you in advance!

Best regards,
--
alexander iliev






Re: [Gluster-users] Is rebalance in progress or not?

2020-03-15 Thread Alexander Iliev

On 3/15/20 5:17 PM, Strahil Nikolov wrote:

On March 15, 2020 12:16:51 PM GMT+02:00, Alexander Iliev 
 wrote:

On 3/15/20 11:07 AM, Strahil Nikolov wrote:

On March 15, 2020 11:50:32 AM GMT+02:00, Alexander Iliev

 wrote:

Hi list,

I was having some issues with one of my Gluster nodes so I ended up
re-installing it. Now I want to re-add the bricks for my main volume
and
I'm having the following issue - when I try to add the bricks I get:


# gluster volume add-brick store1 replica 3 
volume add-brick: failed: Pre Validation failed on 172.31.35.132.

Volume name store1 rebalance is in progress. Please retry after
completion

But then if I get the rebalance status I get:


# gluster volume rebalance store1 status
volume rebalance: store1: failed: Rebalance not started for volume

store1.

And if I try to start the rebalancing I get:


# gluster volume rebalance store1 start
volume rebalance: store1: failed: Rebalance on store1 is already

started

Looking at the logs of the first node, when I try to start the
rebalance
operation I see this:


[2020-03-15 09:41:31.883651] E [MSGID: 106276]

[glusterd-rpc-ops.c:1200:__glusterd_stage_op_cbk] 0-management:
Received
stage RJT from uuid: 9476b8bb-d7ee-489a-b083-875805343e67

On the second node the logs are showing stuff that indicates that a
rebalance operation is indeed in progress:


[2020-03-15 09:47:34.190042] I [MSGID: 109081]

[dht-common.c:5868:dht_setxattr] 0-store1-dht: fixing the layout of
/redacted

[2020-03-15 09:47:34.775691] I

[dht-rebalance.c:3285:gf_defrag_process_dir] 0-store1-dht: migrate

data


called on /redacted

[2020-03-15 09:47:36.019403] I

[dht-rebalance.c:3480:gf_defrag_process_dir] 0-store1-dht: Migration
operation on dir /redacted took 1.24 secs


Some background on what led to this situation:

The volume was originally a replica 3 distributed replicated volume

on

three nodes. In order to detach the faulty node I lowered the

replica

count to 2 and removed the bricks from that node from the volume. I
cleaned up the storage (formatted the bricks and cleaned the
trusted.gfid and trusted.glusterfs.volume-id extended attributes)

and

purged the gluster packages from the system, then I re-installed the
gluster packages and did a `gluster peer probe` from another node.

I'm running Gluster 6.6 on CentOS 7.7 on all nodes.

I feel stuck at this point, so any guidance will be greatly
appreciated.

Thanks!

Best regards,


Hey  Alex,

Did you try to go to the second node (the one that thinks a rebalance
is running) and stop the rebalance?
is running)  and stop tge balance ?


gluster volume rebalance VOLNAME stop

Then add the new brick (and increase the replica count) and after
the heal is over - rebalance again.

Hey Strahil,

Thanks for the suggestion, I just tried it, but unfortunately the
result
is pretty much the same - when I try to stop the rebalance on the
second
node it reports that no rebalance is in progress:


# gluster volume rebalance store1 stop
volume rebalance: store1: failed: Rebalance not started for volume

store1.



Best Regards,
Strahil Nikolov



Best regards,
--
alexander iliev


Hey Alex,

I'm not sure if the command has a 'force' flag, but if it does - it is 
worth trying.

gluster volume rebalance store1 stop force


Hey Strahil,

Thank again for your suggestions!

According to the `gluster volume rebalance help` output only the `start` 
subcommand supports a force flag. I tried that already, unfortunately it 
doesn't help:


# gluster volume rebalance store1 start force
volume rebalance: store1: failed: Rebalance on store1 is already started
# gluster volume rebalance store1 stop
volume rebalance: store1: failed: Rebalance not started for volume store1.


Sadly, as the second node thinks a rebalance is running - I'm not sure if a 
'start force' (to convince both nodes that a rebalance is running) and then 
'stop' will have the expected effect.


The rebalance is indeed running on the second node judging from the 
contents of /var/log/glusterfs/store1-rebalance.log.



Sadly, this situation is hard to reproduce.


In any way , a bug report  should be opened .


The thing is I'm not sure if I can provide meaningful steps to reproduce 
at this point. I didn't keep proper track of all the things I attempted, 
so I'm not sure if the bug report I can file would be of much value. :(



Keep in mind that I do not have a distributed volume, so everything above 
is pure speculation.


Based on my experience - a gluster upgrade can fix odd situations like that, 
but it could also make things worse. So for now avoid any upgrades, until a 
dev confirms it is safe to do so.


Yeah, I'd rather wait for the rebalance to finish before I make any 
further attempts at it. Sadly the storage is backed by rather slow 
(spinning) drives, so it might take a while, but even so I'd rather be 
safe than sorry. :)





Best Regards,
Strahil Nikolov



Best regards,
--
alexander iliev





Re: [Gluster-users] Is rebalance in progress or not?

2020-03-15 Thread Alexander Iliev

On 3/15/20 11:07 AM, Strahil Nikolov wrote:

On March 15, 2020 11:50:32 AM GMT+02:00, Alexander Iliev 
 wrote:

Hi list,

I was having some issues with one of my Gluster nodes so I ended up
re-installing it. Now I want to re-add the bricks for my main volume
and
I'm having the following issue - when I try to add the bricks I get:


# gluster volume add-brick store1 replica 3 
volume add-brick: failed: Pre Validation failed on 172.31.35.132.

Volume name store1 rebalance is in progress. Please retry after
completion

But then if I get the rebalance status I get:


# gluster volume rebalance store1 status
volume rebalance: store1: failed: Rebalance not started for volume

store1.

And if I try to start the rebalancing I get:


# gluster volume rebalance store1 start
volume rebalance: store1: failed: Rebalance on store1 is already

started

Looking at the logs of the first node, when I try to start the
rebalance
operation I see this:


[2020-03-15 09:41:31.883651] E [MSGID: 106276]

[glusterd-rpc-ops.c:1200:__glusterd_stage_op_cbk] 0-management:
Received
stage RJT from uuid: 9476b8bb-d7ee-489a-b083-875805343e67

On the second node the logs are showing stuff that indicates that a
rebalance operation is indeed in progress:


[2020-03-15 09:47:34.190042] I [MSGID: 109081]

[dht-common.c:5868:dht_setxattr] 0-store1-dht: fixing the layout of
/redacted

[2020-03-15 09:47:34.775691] I

[dht-rebalance.c:3285:gf_defrag_process_dir] 0-store1-dht: migrate data

called on /redacted

[2020-03-15 09:47:36.019403] I

[dht-rebalance.c:3480:gf_defrag_process_dir] 0-store1-dht: Migration
operation on dir /redacted took 1.24 secs


Some background on what led to this situation:

The volume was originally a replica 3 distributed replicated volume on
three nodes. In order to detach the faulty node I lowered the replica
count to 2 and removed the bricks from that node from the volume. I
cleaned up the storage (formatted the bricks and cleaned the
trusted.gfid and trusted.glusterfs.volume-id extended attributes) and
purged the gluster packages from the system, then I re-installed the
gluster packages and did a `gluster peer probe` from another node.

I'm running Gluster 6.6 on CentOS 7.7 on all nodes.

I feel stuck at this point, so any guidance will be greatly
appreciated.

Thanks!

Best regards,


Hey  Alex,

Did you try to go to the second node (the one that thinks a rebalance is 
running) and stop the rebalance?

gluster volume rebalance VOLNAME stop

Then add the new brick (and increase the replica count) and after the 
heal is over - rebalance again.


Hey Strahil,

Thanks for the suggestion, I just tried it, but unfortunately the result 
is pretty much the same - when I try to stop the rebalance on the second 
node it reports that no rebalance is in progress:


> # gluster volume rebalance store1 stop
> volume rebalance: store1: failed: Rebalance not started for volume 
store1.




Best Regards,
Strahil Nikolov



Best regards,
--
alexander iliev






[Gluster-users] Is rebalance in progress or not?

2020-03-15 Thread Alexander Iliev

Hi list,

I was having some issues with one of my Gluster nodes so I ended up 
re-installing it. Now I want to re-add the bricks for my main volume and 
I'm having the following issue - when I try to add the bricks I get:


> # gluster volume add-brick store1 replica 3 
> volume add-brick: failed: Pre Validation failed on 172.31.35.132. 
Volume name store1 rebalance is in progress. Please retry after completion


But then if I get the rebalance status I get:

> # gluster volume rebalance store1 status
> volume rebalance: store1: failed: Rebalance not started for volume 
store1.


And if I try to start the rebalancing I get:

> # gluster volume rebalance store1 start
> volume rebalance: store1: failed: Rebalance on store1 is already started

Looking at the logs of the first node, when I try to start the rebalance 
operation I see this:


> [2020-03-15 09:41:31.883651] E [MSGID: 106276] 
[glusterd-rpc-ops.c:1200:__glusterd_stage_op_cbk] 0-management: Received 
stage RJT from uuid: 9476b8bb-d7ee-489a-b083-875805343e67


On the second node the logs are showing stuff that indicates that a 
rebalance operation is indeed in progress:


> [2020-03-15 09:47:34.190042] I [MSGID: 109081] 
[dht-common.c:5868:dht_setxattr] 0-store1-dht: fixing the layout of 
/redacted
> [2020-03-15 09:47:34.775691] I 
[dht-rebalance.c:3285:gf_defrag_process_dir] 0-store1-dht: migrate data 
called on /redacted
> [2020-03-15 09:47:36.019403] I 
[dht-rebalance.c:3480:gf_defrag_process_dir] 0-store1-dht: Migration 
operation on dir /redacted took 1.24 secs



Some background on what led to this situation:

The volume was originally a replica 3 distributed replicated volume on 
three nodes. In order to detach the faulty node I lowered the replica 
count to 2 and removed the bricks from that node from the volume. I 
cleaned up the storage (formatted the bricks and cleaned the 
trusted.gfid and trusted.glusterfs.volume-id extended attributes) and 
purged the gluster packages from the system, then I re-installed the 
gluster packages and did a `gluster peer probe` from another node.
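For reference, the brick cleanup step mentioned above usually amounts to something like the following (a sketch with a hypothetical brick path, run as root; double-check paths before removing anything):

```shell
BRICK=/data/gfs/store1/1/brick   # hypothetical brick path

# Remove the GlusterFS identity attributes so the directory can be
# reused as a brick for a new or rebuilt volume.
setfattr -x trusted.gfid "$BRICK"
setfattr -x trusted.glusterfs.volume-id "$BRICK"

# Drop the internal hardlink namespace left behind by the old volume.
rm -rf "$BRICK/.glusterfs"

# Verify that no trusted.* attributes remain:
getfattr -m - -d "$BRICK"
```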


I'm running Gluster 6.6 on CentOS 7.7 on all nodes.

I feel stuck at this point, so any guidance will be greatly appreciated.

Thanks!

Best regards,
--
alexander iliev






[Gluster-users] Geo-replication /var/lib space question

2020-02-10 Thread Alexander Iliev

Hello list,

I have been running a geo-replication session for some time now, but at 
some point I noticed that the /var/lib/misc/gluster directory is eating up the 
storage on my root partition.


I moved the folder away to another partition, but I don't seem to 
remember reading any specific space requirement for /var/lib and 
geo-replication. Did I miss it in the documentation?


Also, does the space used in /var/lib/misc/gluster depend on the 
geo-replicated volume size? What exactly is stored there? (I'm guessing 
that's where gsyncd keeps track of the replication progress.)


(I'm running gluster 6.6 on CentOS 7.7.)

Thanks!
--
alexander iliev




Re: [Gluster-users] Issues with Geo-replication (GlusterFS 6.3 on Ubuntu 18.04)

2019-10-17 Thread Alexander Iliev

On 10/17/19 5:32 PM, Aravinda Vishwanathapura Krishna Murthy wrote:



On Thu, Oct 17, 2019 at 12:54 PM Alexander Iliev  wrote:


Thanks, Aravinda.

Does this mean that my scenario is currently unsupported?


Please try by providing external IP while creating Geo-rep session. We 
will work on the enhancement if it didn't work.


This is what I've been doing all along. It didn't work for me.




It seems that I need to make sure that the nodes in the two clusters
can
see each-other (some kind of VPN would work I guess).

Is this documented somewhere? I think I've read the geo-replication
documentation several times now, but somehow it wasn't obvious to me
that you need access to the slave nodes from the master ones (apart
from
the SSH access).

Thanks!

Best regards,
--
alexander iliev

On 10/17/19 5:25 AM, Aravinda Vishwanathapura Krishna Murthy wrote:
 > Got it.
 >
 > Geo-replication uses slave nodes IP in the following cases,
 >
 > - Verification during Session creation - It tries to mount the Slave
 > volume using the hostname/IP provided in Geo-rep create command. Try
 > Geo-rep create by specifying the external IP which is accessible
from
 > the master node.
 > - Once Geo-replication is started, it gets the list of Slave nodes
 > IP/hostname from Slave volume info and connects to those IPs. But in
 > this case, those are internal IP addresses that are not
accessible from
 > Master nodes. - We need to enhance Geo-replication to accept
external IP
 > and internal IP map details so that for all connections it can use
 > external IP.
 >
 > On Wed, Oct 16, 2019 at 10:29 PM Alexander Iliev  wrote:
 >
 >     Hi Aravinda,
 >
 >     All volume brick on the slave volume are up and the volume seems
 >     functional.
 >
 >     Your suggestion about trying to mount the slave volume on a
master node
 >     brings up my question about network connectivity again - the
GlusterFS
 >     documentation[1] says:
 >
 >       > The server specified in the mount command is only used to
fetch the
 >     gluster configuration volfile describing the volume name.
Subsequently,
 >     the client will communicate directly with the servers
mentioned in the
 >     volfile (which might not even include the one used for mount).
 >
 >     To me this means that the masternode from your example is
expected to
 >     have connectivity to the network where the slave volume runs,
i.e. to
 >     have network access to the slave nodes. In my geo-replication
scenario
 >     this is definitely not the case. The two clusters are running
in two
 >     completely different networks that are not interconnected.
 >
 >     So my question is - how is the slave volume mount expected to
happen if
 >     the client host cannot access the GlusterFS nodes? Or is the
 >     connectivity a requirement even for geo-replication?
 >
 >     I'm not sure if I'm missing something, but any help will be
highly
 >     appreciated!
 >
 >     Thanks!
 >
 >     Links:
 >     [1]
 >

https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Setting%20Up%20Clients/
 >     --
 >     alexander iliev
 >
 >     On 10/16/19 6:03 AM, Aravinda Vishwanathapura Krishna Murthy
wrote:
 >      > Hi Alexander,
 >      >
 >      > Please check the status of Volume. Looks like the Slave volume
 >     mount is
 >      > failing because bricks are down or not reachable. If Volume
 >     status shows
 >      > all bricks are up then try mounting the slave volume using
mount
     >     command.
 >      >
 >      > ```
 >      > masternode$ mkdir /mnt/vol
 >      > masternode$ mount -t glusterfs : /mnt/vol
 >      > ```
 >      >
 >      > On Fri, Oct 11, 2019 at 4:03 AM Alexander Iliev
 >      > <ailiev%2bglus...@mamul.org> wrote:
 >      >
 >      >     Hi all,
 >      >
 >      >     I ended up rein

Re: [Gluster-users] Issues with Geo-replication (GlusterFS 6.3 on Ubuntu 18.04)

2019-10-17 Thread Alexander Iliev

Thanks, Aravinda.

Does this mean that my scenario is currently unsupported?

It seems that I need to make sure that the nodes in the two clusters can 
see each other (some kind of VPN would work, I guess).


Is this documented somewhere? I think I've read the geo-replication 
documentation several times now, but somehow it wasn't obvious to me 
that you need access to the slave nodes from the master ones (apart from 
the SSH access).


Thanks!

Best regards,
--
alexander iliev

On 10/17/19 5:25 AM, Aravinda Vishwanathapura Krishna Murthy wrote:

Got it.

Geo-replication uses slave nodes IP in the following cases,

- Verification during Session creation - It tries to mount the Slave 
volume using the hostname/IP provided in Geo-rep create command. Try 
Geo-rep create by specifying the external IP which is accessible from 
the master node.
- Once Geo-replication is started, it gets the list of Slave nodes 
IP/hostname from Slave volume info and connects to those IPs. But in 
this case, those are internal IP addresses that are not accessible from 
Master nodes. - We need to enhance Geo-replication to accept external IP 
and internal IP map details so that for all connections it can use 
external IP.
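Until geo-replication accepts such an external/internal IP map, one interim workaround on the master side is to override name resolution so that slave hostnames resolve to externally reachable addresses. This is a hypothetical workaround, not a gluster feature, and it only applies when the slave bricks are addressed by hostname rather than raw IP; a minimal sketch, with all hostnames and addresses illustrative:

```python
# Sketch: build /etc/hosts override lines for the master nodes so that
# slave hostnames resolve to externally reachable IPs. Hypothetical
# workaround; hostnames and IPs are illustrative, and it only helps
# when the slave volume's bricks are defined by hostname.
def hosts_overrides(ip_map):
    """ip_map: {slave_hostname: external_ip} -> sorted /etc/hosts lines."""
    return [f"{ext_ip} {host}" for host, ext_ip in sorted(ip_map.items())]

if __name__ == "__main__":
    lines = hosts_overrides({
        "slave-node1": "203.0.113.11",
        "slave-node2": "203.0.113.12",
        "slave-node3": "203.0.113.13",
    })
    # Append these lines to /etc/hosts on each master node.
    print("\n".join(lines))
```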


On Wed, Oct 16, 2019 at 10:29 PM Alexander Iliev 
<ailiev%2bglus...@mamul.org> wrote:


Hi Aravinda,

All volume bricks on the slave volume are up and the volume seems
functional.

Your suggestion about trying to mount the slave volume on a master node
brings up my question about network connectivity again - the GlusterFS
documentation[1] says:

  > The server specified in the mount command is only used to fetch the
gluster configuration volfile describing the volume name. Subsequently,
the client will communicate directly with the servers mentioned in the
volfile (which might not even include the one used for mount).

To me this means that the masternode from your example is expected to
have connectivity to the network where the slave volume runs, i.e. to
have network access to the slave nodes. In my geo-replication scenario
this is definitely not the case. The two clusters are running in two
completely different networks that are not interconnected.

So my question is - how is the slave volume mount expected to happen if
the client host cannot access the GlusterFS nodes? Or is the
connectivity a requirement even for geo-replication?

I'm not sure if I'm missing something, but any help will be highly
appreciated!

Thanks!

Links:
[1]

https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Setting%20Up%20Clients/
    --
alexander iliev

On 10/16/19 6:03 AM, Aravinda Vishwanathapura Krishna Murthy wrote:
 > Hi Alexander,
 >
 > Please check the status of Volume. Looks like the Slave volume
mount is
 > failing because bricks are down or not reachable. If Volume
status shows
 > all bricks are up then try mounting the slave volume using mount
command.
 >
 > ```
 > masternode$ mkdir /mnt/vol
 > masternode$ mount -t glusterfs : /mnt/vol
 > ```
 >
 > On Fri, Oct 11, 2019 at 4:03 AM Alexander Iliev
 > <ailiev%2bglus...@mamul.org> wrote:
 >
 >     Hi all,
 >
 >     I ended up reinstalling the nodes with CentOS 7.5 and
GlusterFS 6.5
 >     (installed from the SIG.)
 >
 >     Now when I try to create a replication session I get the
following:
 >
 >       > # gluster volume geo-replication store1
::store2 create
 >     push-pem
 >       > Unable to mount and fetch slave volume details. Please
check the
 >     log:
 >     /var/log/glusterfs/geo-replication/gverify-slavemnt.log
 >       > geo-replication command failed
 >
 >     You can find the contents of gverify-slavemnt.log below, but the
 >     initial
 >     error seems to be:
 >
 >       > [2019-10-10 22:07:51.578519] E
 >     [fuse-bridge.c:5211:fuse_first_lookup]
 >     0-fuse: first lookup on root failed (Transport endpoint is not
 >     connected)
 >
 >     I only found
 >     [this](https://bugzilla.redhat.com/show_bug.cgi?id=1659824)
 >     bug report which doesn't seem to help. The reported issue is
failure to
 >     mount a volume on a GlusterFS client, but in my case I need
 >     geo-replication which implies the client (geo-replication
master) being
 >     on a different network.
 >
 >     Any help will be appreciated.
 >
 >     Thanks!
 >
 >     gverify-slavemnt.log:
 >
 >       > [2019-10-10 22:07:40.571256] I [MSGID: 

Re: [Gluster-users] Issues with Geo-replication (GlusterFS 6.3 on Ubuntu 18.04)

2019-10-16 Thread Alexander Iliev

Hi Aravinda,

All volume bricks on the slave volume are up and the volume seems functional.

Your suggestion about trying to mount the slave volume on a master node 
brings up my question about network connectivity again - the GlusterFS 
documentation[1] says:


> The server specified in the mount command is only used to fetch the 
gluster configuration volfile describing the volume name. Subsequently, 
the client will communicate directly with the servers mentioned in the 
volfile (which might not even include the one used for mount).


To me this means that the masternode from your example is expected to 
have connectivity to the network where the slave volume runs, i.e. to 
have network access to the slave nodes. In my geo-replication scenario 
this is definitely not the case. The two clusters are running in two
completely different networks that are not interconnected.


So my question is - how is the slave volume mount expected to happen if 
the client host cannot access the GlusterFS nodes? Or is the 
connectivity a requirement even for geo-replication?
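The reachability question above can be tested mechanically from a master node by probing each slave node's glusterd management port (24007 by default). A sketch; any host names passed to this helper are illustrative, and brick ports (typically allocated from 49152 upward) would need the same check:

```python
import socket

# Sketch: from a master node, probe whether a slave node's glusterd
# management port (24007 by default) is reachable over TCP. Brick ports
# (typically 49152 and up) would need the same check.
def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name resolution failed
        return False
```

Running `can_reach("slave-node1", 24007)` from each master node against each slave node answers the question directly; if it is False everywhere, the volfile-driven client connections cannot work without an interconnecting network or VPN.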


I'm not sure if I'm missing something, but any help will be highly 
appreciated!


Thanks!

Links:
[1] 
https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Setting%20Up%20Clients/

--
alexander iliev

On 10/16/19 6:03 AM, Aravinda Vishwanathapura Krishna Murthy wrote:

Hi Alexander,

Please check the status of Volume. Looks like the Slave volume mount is 
failing because bricks are down or not reachable. If Volume status shows 
all bricks are up then try mounting the slave volume using mount command.


```
masternode$ mkdir /mnt/vol
masternode$ mount -t glusterfs : /mnt/vol
```

On Fri, Oct 11, 2019 at 4:03 AM Alexander Iliev 
<ailiev%2bglus...@mamul.org> wrote:


Hi all,

I ended up reinstalling the nodes with CentOS 7.5 and GlusterFS 6.5
(installed from the SIG.)

Now when I try to create a replication session I get the following:

  > # gluster volume geo-replication store1 ::store2 create
push-pem
  > Unable to mount and fetch slave volume details. Please check the
log:
/var/log/glusterfs/geo-replication/gverify-slavemnt.log
  > geo-replication command failed

You can find the contents of gverify-slavemnt.log below, but the
initial
error seems to be:

  > [2019-10-10 22:07:51.578519] E
[fuse-bridge.c:5211:fuse_first_lookup]
0-fuse: first lookup on root failed (Transport endpoint is not
connected)

I only found
[this](https://bugzilla.redhat.com/show_bug.cgi?id=1659824)
bug report which doesn't seem to help. The reported issue is failure to
mount a volume on a GlusterFS client, but in my case I need
geo-replication which implies the client (geo-replication master) being
on a different network.

Any help will be appreciated.

Thanks!

gverify-slavemnt.log:

  > [2019-10-10 22:07:40.571256] I [MSGID: 100030]
[glusterfsd.c:2847:main] 0-glusterfs: Started running glusterfs version
6.5 (args: glusterfs --xlator-option=*dht.lookup-unhashed=off
--volfile-server  --volfile-id store2 -l
/var/log/glusterfs/geo-replication/gverify-slavemnt.log
/tmp/gverify.sh.5nFlRh)
  > [2019-10-10 22:07:40.575438] I [glusterfsd.c:2556:daemonize]
0-glusterfs: Pid of current running process is 6021
  > [2019-10-10 22:07:40.584282] I [MSGID: 101190]
[event-epoll.c:680:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 0
  > [2019-10-10 22:07:40.584299] I [MSGID: 101190]
[event-epoll.c:680:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 1
  > [2019-10-10 22:07:40.928094] I [MSGID: 114020]
[client.c:2393:notify]
0-store2-client-0: parent translators are ready, attempting connect on
transport
  > [2019-10-10 22:07:40.931121] I [MSGID: 114020]
[client.c:2393:notify]
0-store2-client-1: parent translators are ready, attempting connect on
transport
  > [2019-10-10 22:07:40.933976] I [MSGID: 114020]
[client.c:2393:notify]
0-store2-client-2: parent translators are ready, attempting connect on
transport
  > Final graph:
  >

+--+
  >   1: volume store2-client-0
  >   2:     type protocol/client
  >   3:     option ping-timeout 42
  >   4:     option remote-host 172.31.36.11
  >   5:     option remote-subvolume /data/gfs/store1/1/brick-store2
  >   6:     option transport-type socket
  >   7:     option transport.address-family inet
  >   8:     option transport.socket.ssl-enabled off
  >   9:     option transport.tcp-user-timeout 0
  >  10:     option transport.socket.keepalive-time 20
  >  11:     option transport.socket.keepalive-interval 2
  >  12:     option transport.socket.keepalive-count 9
  >  13:   

Re: [Gluster-users] Issues with Geo-replication (GlusterFS 6.3 on Ubuntu 18.04)

2019-10-10 Thread Alexander Iliev
nd-volume
>  65:
>  66: volume store2-read-ahead
>  67: type performance/read-ahead
>  68: subvolumes store2-write-behind
>  69: end-volume
>  70:
>  71: volume store2-readdir-ahead
>  72: type performance/readdir-ahead
>  73: option parallel-readdir off
>  74: option rda-request-size 131072
>  75: option rda-cache-limit 10MB
>  76: subvolumes store2-read-ahead
>  77: end-volume
>  78:
>  79: volume store2-io-cache
>  80: type performance/io-cache
>  81: subvolumes store2-readdir-ahead
>  82: end-volume
>  83:
>  84: volume store2-open-behind
>  85: type performance/open-behind
>  86: subvolumes store2-io-cache
>  87: end-volume
>  88:
>  89: volume store2-quick-read
>  90: type performance/quick-read
>  91: subvolumes store2-open-behind
>  92: end-volume
>  93:
>  94: volume store2-md-cache
>  95: type performance/md-cache
>  96: subvolumes store2-quick-read
>  97: end-volume
>  98:
>  99: volume store2
> 100: type debug/io-stats
> 101: option log-level INFO
> 102: option latency-measurement off
> 103: option count-fop-hits off
> 104: subvolumes store2-md-cache
> 105: end-volume
> 106:
> 107: volume meta-autoload
> 108: type meta
> 109: subvolumes store2
> 110: end-volume
> 111:
> 
+--+
> [2019-10-10 22:07:51.578287] I [fuse-bridge.c:5142:fuse_init] 
0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 
kernel 7.22
> [2019-10-10 22:07:51.578356] I [fuse-bridge.c:5753:fuse_graph_sync] 
0-fuse: switched to graph 0
> [2019-10-10 22:07:51.578467] I [MSGID: 108006] 
[afr-common.c:5666:afr_local_init] 0-store2-replicate-0: no subvolumes up
> [2019-10-10 22:07:51.578519] E [fuse-bridge.c:5211:fuse_first_lookup] 
0-fuse: first lookup on root failed (Transport endpoint is not connected)
> [2019-10-10 22:07:51.578709] W [fuse-bridge.c:1266:fuse_attr_cbk] 
0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not connected)
> [2019-10-10 22:07:51.578687] I [MSGID: 108006] 
[afr-common.c:5666:afr_local_init] 0-store2-replicate-0: no subvolumes up
> [2019-10-10 22:09:48.222459] E [MSGID: 108006] 
[afr-common.c:5318:__afr_handle_child_down_event] 0-store2-replicate-0: 
All subvolumes are down. Going offline until at least one of them comes 
back up.
> The message "E [MSGID: 108006] 
[afr-common.c:5318:__afr_handle_child_down_event] 0-store2-replicate-0: 
All subvolumes are down. Going offline until at least one of them comes 
back up." repeated 2 times between [2019-10-10 22:09:48.222459] and 
[2019-10-10 22:09:48.222891]

>

alexander iliev

On 9/8/19 4:50 PM, Alexander Iliev wrote:

Hi all,

Sunny, thank you for the update.

I have applied the patch locally on my slave system and now the 
mountbroker setup is successful.


I am facing another issue though - when I try to create a replication 
session between the two sites I am getting:


     # gluster volume geo-replication store1 
glustergeorep@::store1 create push-pem

     Error : Request timed out
     geo-replication command failed

It is still unclear to me if my setup is expected to work at all.

Reading the geo-replication documentation at [1] I see this paragraph:

 > A password-less SSH connection is also required for gsyncd between 
every node in the master to every node in the slave. The gluster 
system:: execute gsec_create command creates secret-pem files on all the 
nodes in the master, and is used to implement the password-less SSH 
connection. The push-pem option in the geo-replication create command 
pushes these keys to all the nodes in the slave.


It is not clear to me whether connectivity from each master node to each 
slave node is a requirement in terms of networking. In my setup the 
slave nodes form the Gluster pool over a private network which is not 
reachable from the master site.


Any ideas how to proceed from here will be greatly appreciated.

Thanks!

Links:
[1] 
https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3/html/administration_guide/sect-preparing_to_deploy_geo-replication 



Best regards,
--
alexander iliev

On 9/3/19 2:50 PM, Sunny Kumar wrote:

Thank you for the explanation Kaleb.

Alexander,

This fix will be available with the next release for all supported versions.

/sunny

On Mon, Sep 2, 2019 at 6:47 PM Kaleb Keithley  
wrote:


Fixes on master (before or after the release-7 branch was taken) 
almost certainly warrant a backport IMO to at least release-6, and 
probably release-5 as well.


We used to have a "tracker" BZ for each minor release (e.g. 6.6) to 
keep track of backports by cloning the original BZ and changing the 
Version, and adding that BZ to the tracker. I'm not sure what 
happened to that practice. T

[Gluster-users] Reboot Issue with 6.5 on Ubuntu 18.04

2019-09-08 Thread Alexander Iliev

Hi all,

I am running a GlusterFS server 6.3 on three Ubuntu 18.04 nodes 
installed from the https://launchpad.net/~gluster PPA.


I tried upgrading to 6.5 today and ran into an issue with the first (and 
only) node that has been upgraded so far. When I rebooted the node the 
underlying brick filesystems failed to mount because of a `pvscan` 
process timing out on boot.


I did some experimenting and the issue seems to be that on reboot the 
glusterfsd processes (that expose the bricks as far as I understand) are 
not being shut down, which causes the underlying filesystems to show up as 
busy and not get properly unmounted.


Then I found out that `systemctl stop glusterd.service` doesn't stop the 
brick processes by design and it also seems that for Fedora/RHEL this 
has been worked around by having a separate `glusterfsd.service` unit 
that only acts on shutdown.
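The Fedora/RHEL workaround described above can be approximated on Ubuntu with a shutdown-only systemd unit. This is only a sketch in the same spirit, not the packaged glusterfsd.service verbatim; the unit name, the killall-based ExecStop, and the timeout value are assumptions:

```ini
# Sketch of a shutdown-only unit: it does nothing at start, but on
# shutdown it stops the brick processes so the brick filesystems can be
# unmounted cleanly. Unit name and the killall approach are assumptions,
# not the packaged Fedora/RHEL unit verbatim.
[Unit]
Description=GlusterFS brick processes (shutdown helper)
After=network.target glusterd.service

[Service]
Type=oneshot
ExecStart=/bin/true
RemainAfterExit=yes
# Stop brick processes on the way down; --wait blocks until they exit.
ExecStop=/usr/bin/killall --wait glusterfsd
TimeoutStopSec=300

[Install]
WantedBy=multi-user.target
```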


This, however, does not seem to be the case on Ubuntu, and I can't figure 
out what the expected flow is there.


So I guess my question is - is this normal/expected behaviour on Ubuntu? 
How is one supposed to set things up so that bricks get properly 
unmounted on reboot and properly mounted at startup?


I am also considering migrating from Ubuntu to CentOS now as the 
upstream support seems much better there. If I decide to switch can I 
re-use the existing bricks or do I need to spin up a clean node, join 
the cluster and get the data synced to it?


Thanks!

Best regards,
--
alexander iliev
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Issues with Geo-replication (GlusterFS 6.3 on Ubuntu 18.04)

2019-09-08 Thread Alexander Iliev

Hi all,

Sunny, thank you for the update.

I have applied the patch locally on my slave system and now the 
mountbroker setup is successful.


I am facing another issue though - when I try to create a replication 
session between the two sites I am getting:


# gluster volume geo-replication store1 
glustergeorep@::store1 create push-pem

Error : Request timed out
geo-replication command failed

It is still unclear to me if my setup is expected to work at all.

Reading the geo-replication documentation at [1] I see this paragraph:

> A password-less SSH connection is also required for gsyncd between 
every node in the master to every node in the slave. The gluster 
system:: execute gsec_create command creates secret-pem files on all the 
nodes in the master, and is used to implement the password-less SSH 
connection. The push-pem option in the geo-replication create command 
pushes these keys to all the nodes in the slave.


It is not clear to me whether connectivity from each master node to each 
slave node is a requirement in terms of networking. In my setup the 
slave nodes form the Gluster pool over a private network which is not 
reachable from the master site.
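The every-master-to-every-slave SSH requirement above can be verified mechanically before attempting the session create. A sketch, assuming a dedicated geo-replication user; the user name and node names are illustrative:

```python
import subprocess

# Sketch: verify password-less SSH from this master node to every slave
# node, as the docs require for gsyncd. The remote user and node names
# are illustrative assumptions. BatchMode=yes makes ssh fail immediately
# instead of prompting for a password.
def ssh_check_cmd(user, host):
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
            f"{user}@{host}", "true"]

def unreachable(user, hosts, run=subprocess.run):
    """Return the hosts that are NOT reachable password-lessly."""
    return [h for h in hosts if run(ssh_check_cmd(user, h)).returncode != 0]
```

An empty result from `unreachable("glustergeorep", slave_nodes)` on every master node means the SSH prerequisite is satisfied; any listed host needs its key pushed or its firewall opened first.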


Any ideas how to proceed from here will be greatly appreciated.

Thanks!

Links:
[1] 
https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3/html/administration_guide/sect-preparing_to_deploy_geo-replication


Best regards,
--
alexander iliev

On 9/3/19 2:50 PM, Sunny Kumar wrote:

Thank you for the explanation Kaleb.

Alexander,

This fix will be available with the next release for all supported versions.

/sunny

On Mon, Sep 2, 2019 at 6:47 PM Kaleb Keithley  wrote:


Fixes on master (before or after the release-7 branch was taken) almost 
certainly warrant a backport IMO to at least release-6, and probably release-5 
as well.

We used to have a "tracker" BZ for each minor release (e.g. 6.6) to keep track 
of backports by cloning the original BZ and changing the Version, and adding that BZ to 
the tracker. I'm not sure what happened to that practice. The last ones I can find are 
for 6.3 and 5.7;  https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-6.3 and 
https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-5.7

It isn't enough to just backport recent fixes on master to release-7. We are 
supposedly continuing to maintain release-6 and release-5 after release-7 GAs. 
If that has changed, I haven't seen an announcement to that effect. I don't 
know why our developers don't automatically backport to all the actively 
maintained releases.

Even if there isn't a tracker BZ, you can always create a backport BZ by 
cloning the original BZ and change the release to 6. That'd be a good place to 
start.

On Sun, Sep 1, 2019 at 8:45 AM Alexander Iliev  wrote:


Hi Strahil,

Yes, this might be right, but I would still expect fixes like this to be
released for all supported major versions (which should include 6.) At
least that's how I understand https://www.gluster.org/release-schedule/.

Anyway, let's wait for Sunny to clarify.

Best regards,
alexander iliev

On 9/1/19 2:07 PM, Strahil Nikolov wrote:

Hi Alex,

I'm not very deep into bugzilla stuff, but for me NEXTRELEASE means v7.

Sunny,
Am I understanding it correctly?

Best Regards,
Strahil Nikolov

On Sunday, September 1, 2019, 14:27:32 GMT+3, Alexander Iliev
 wrote:


Hi Sunny,

Thank you for the quick response.

It's not clear to me however if the fix has been already released or not.

The bug status is CLOSED NEXTRELEASE and according to [1] the
NEXTRELEASE resolution means that the fix will be included in the next
supported release. The bug is logged against the mainline version
though, so I'm not sure what this means exactly.

  From the 6.4[2] and 6.5[3] release notes it seems it hasn't been
released yet.

Ideally I would not like to patch my systems locally, so if you have an
ETA on when this will be out officially I would really appreciate it.

Links:
[1] https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_status
[2] https://docs.gluster.org/en/latest/release-notes/6.4/
[3] https://docs.gluster.org/en/latest/release-notes/6.5/

Thank you!

Best regards,

alexander iliev

On 8/30/19 9:22 AM, Sunny Kumar wrote:
  > Hi Alexander,
  >
  > Thanks for pointing that out!
  >
  > But this issue is fixed now you can see below link for bz-link and patch.
  >
  > BZ - https://bugzilla.redhat.com/show_bug.cgi?id=1709248
  >
  > Patch - https://review.gluster.org/#/c/glusterfs/+/22716/
  >
  > Hope this helps.
  >
  > /sunny
  >
  > On Fri, Aug 30, 2019 at 2:30 AM Alexander Iliev
 > <glus...@mamul.org> wrote:
  >>
  >> Hello dear GlusterFS users list,
  >>
  >> I have been trying to set up geo-replication between two clusters for
  >> some time now. The desired state is (Cluster #1) being replicated to
  >> (Cluster #2).
  >>

Re: [Gluster-users] Issues with Geo-replication (GlusterFS 6.3 on Ubuntu 18.04)

2019-09-01 Thread Alexander Iliev

Hi Strahil,

Yes, this might be right, but I would still expect fixes like this to be 
released for all supported major versions (which should include 6.) At 
least that's how I understand https://www.gluster.org/release-schedule/.


Anyway, let's wait for Sunny to clarify.

Best regards,
alexander iliev

On 9/1/19 2:07 PM, Strahil Nikolov wrote:

Hi Alex,

I'm not very deep into bugzilla stuff, but for me NEXTRELEASE means v7.

Sunny,
Am I understanding it correctly?

Best Regards,
Strahil Nikolov

On Sunday, September 1, 2019, 14:27:32 GMT+3, Alexander Iliev 
 wrote:



Hi Sunny,

Thank you for the quick response.

It's not clear to me however if the fix has been already released or not.

The bug status is CLOSED NEXTRELEASE and according to [1] the
NEXTRELEASE resolution means that the fix will be included in the next
supported release. The bug is logged against the mainline version
though, so I'm not sure what this means exactly.

 From the 6.4[2] and 6.5[3] release notes it seems it hasn't been
released yet.

Ideally I would not like to patch my systems locally, so if you have an
ETA on when this will be out officially I would really appreciate it.

Links:
[1] https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_status
[2] https://docs.gluster.org/en/latest/release-notes/6.4/
[3] https://docs.gluster.org/en/latest/release-notes/6.5/

Thank you!

Best regards,

alexander iliev

On 8/30/19 9:22 AM, Sunny Kumar wrote:
 > Hi Alexander,
 >
 > Thanks for pointing that out!
 >
 > But this issue is fixed now you can see below link for bz-link and patch.
 >
 > BZ - https://bugzilla.redhat.com/show_bug.cgi?id=1709248
 >
 > Patch - https://review.gluster.org/#/c/glusterfs/+/22716/
 >
 > Hope this helps.
 >
 > /sunny
 >
 > On Fri, Aug 30, 2019 at 2:30 AM Alexander Iliev
 > <glus...@mamul.org> wrote:
 >>
 >> Hello dear GlusterFS users list,
 >>
 >> I have been trying to set up geo-replication between two clusters for
 >> some time now. The desired state is (Cluster #1) being replicated to
 >> (Cluster #2).
 >>
 >> Here are some details about the setup:
 >>
 >> Cluster #1: three nodes connected via a local network (172.31.35.0/24),
 >> one replicated (3 replica) volume.
 >>
 >> Cluster #2: three nodes connected via a local network (172.31.36.0/24),
 >> one replicated (3 replica) volume.
 >>
 >> The two clusters are connected to the Internet via separate network
 >> adapters.
 >>
 >> Only SSH (port 22) is open on cluster #2 nodes' adapters connected to
 >> the Internet.
 >>
 >> All nodes are running Ubuntu 18.04 and GlusterFS 6.3 installed from [1].
 >>
 >> The first time I followed the guide[2] everything went fine up until I
 >> reached the "Create the session" step. That was like a month ago, then I
had to temporarily stop working on this and now I am coming back to it.
 >>
 >> Currently, if I try to see the mountbroker status I get the following:
 >>
 >>> # gluster-mountbroker status
 >>> Traceback (most recent call last):
 >>>    File "/usr/sbin/gluster-mountbroker", line 396, in 
 >>>      runcli()
 >>>    File 
"/usr/lib/python3/dist-packages/gluster/cliutils/cliutils.py", line 225, 
in runcli

 >>>      cls.run(args)
 >>>    File "/usr/sbin/gluster-mountbroker", line 275, in run
 >>>      out = execute_in_peers("node-status")
 >>>    File "/usr/lib/python3/dist-packages/gluster/cliutils/cliutils.py",
 >> line 127, in execute_in_peers
 >>>      raise GlusterCmdException((rc, out, err, " ".join(cmd)))
 >>> gluster.cliutils.cliutils.GlusterCmdException: (1, '', 'Unable to
 >> end. Error : Success\n', 'gluster system:: execute mountbroker.py
 >> node-status')
 >>
 >> And in /var/log/gluster/glusterd.log I have:
 >>
 >>> [2019-08-10 15:24:21.418834] E [MSGID: 106336]
 >> [glusterd-geo-rep.c:5413:glusterd_op_sys_exec] 0-management: Unable to
 >> end. Error : Success
 >>> [2019-08-10 15:24:21.418908] E [MSGID: 106122]
 >> [glusterd-syncop.c:1445:gd_commit_op_phase] 0-management: Commit of
 >> operation 'Volume Execute system commands' failed on localhost : Unable
 >> to end. Error : Success
 >>
 >> So, I have two questions right now:
 >>
 >> 1) Is there anything wrong with my setup (networking, open ports, etc.)?
 >> Is it expected to work with this setup or should I redo it in a
 >> different way?
 >> 2) How can I troubleshoot the current status of my setup? Can I find out
 >> what's missing/wrong and continue from there or should I just start from
 >> scra

Re: [Gluster-users] Issues with Geo-replication (GlusterFS 6.3 on Ubuntu 18.04)

2019-09-01 Thread Alexander Iliev

Hi Sunny,

Thank you for the quick response.

It's not clear to me however if the fix has been already released or not.

The bug status is CLOSED NEXTRELEASE and according to [1] the 
NEXTRELEASE resolution means that the fix will be included in the next 
supported release. The bug is logged against the mainline version 
though, so I'm not sure what this means exactly.


From the 6.4[2] and 6.5[3] release notes it seems it hasn't been 
released yet.


Ideally I would not like to patch my systems locally, so if you have an 
ETA on when this will be out officially I would really appreciate it.


Links:
[1] https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_status
[2] https://docs.gluster.org/en/latest/release-notes/6.4/
[3] https://docs.gluster.org/en/latest/release-notes/6.5/

Thank you!

Best regards,

alexander iliev

On 8/30/19 9:22 AM, Sunny Kumar wrote:

Hi Alexander,

Thanks for pointing that out!

But this issue is fixed now you can see below link for bz-link and patch.

BZ - https://bugzilla.redhat.com/show_bug.cgi?id=1709248

Patch - https://review.gluster.org/#/c/glusterfs/+/22716/

Hope this helps.

/sunny

On Fri, Aug 30, 2019 at 2:30 AM Alexander Iliev
 wrote:


Hello dear GlusterFS users list,

I have been trying to set up geo-replication between two clusters for
some time now. The desired state is (Cluster #1) being replicated to
(Cluster #2).

Here are some details about the setup:

Cluster #1: three nodes connected via a local network (172.31.35.0/24),
one replicated (3 replica) volume.

Cluster #2: three nodes connected via a local network (172.31.36.0/24),
one replicated (3 replica) volume.

The two clusters are connected to the Internet via separate network
adapters.

Only SSH (port 22) is open on cluster #2 nodes' adapters connected to
the Internet.

All nodes are running Ubuntu 18.04 and GlusterFS 6.3 installed from [1].

The first time I followed the guide[2] everything went fine up until I
reached the "Create the session" step. That was like a month ago, then I
had to temporarily stop working on this and now I am coming back to it.

Currently, if I try to see the mountbroker status I get the following:


# gluster-mountbroker status
Traceback (most recent call last):
  File "/usr/sbin/gluster-mountbroker", line 396, in 
    runcli()
  File "/usr/lib/python3/dist-packages/gluster/cliutils/cliutils.py", line 225, in runcli
    cls.run(args)
  File "/usr/sbin/gluster-mountbroker", line 275, in run
    out = execute_in_peers("node-status")
  File "/usr/lib/python3/dist-packages/gluster/cliutils/cliutils.py", line 127, in execute_in_peers
    raise GlusterCmdException((rc, out, err, " ".join(cmd)))
gluster.cliutils.cliutils.GlusterCmdException: (1, '', 'Unable to end. Error : Success\n', 'gluster system:: execute mountbroker.py node-status')

And in /var/log/gluster/glusterd.log I have:


[2019-08-10 15:24:21.418834] E [MSGID: 106336] [glusterd-geo-rep.c:5413:glusterd_op_sys_exec] 0-management: Unable to end. Error : Success
[2019-08-10 15:24:21.418908] E [MSGID: 106122] [glusterd-syncop.c:1445:gd_commit_op_phase] 0-management: Commit of operation 'Volume Execute system commands' failed on localhost : Unable to end. Error : Success

So, I have two questions right now:

1) Is there anything wrong with my setup (networking, open ports, etc.)?
Is it expected to work with this setup or should I redo it in a
different way?
2) How can I troubleshoot the current status of my setup? Can I find out
what's missing/wrong and continue from there or should I just start from
scratch?

Links:
[1] http://ppa.launchpad.net/gluster/glusterfs-6/ubuntu
[2]
https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/

Thank you!

Best regards,
--
alexander iliev
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users



[Gluster-users] Issues with Geo-replication (GlusterFS 6.3 on Ubuntu 18.04)

2019-08-29 Thread Alexander Iliev

Hello dear GlusterFS users list,

I have been trying to set up geo-replication between two clusters for 
some time now. The desired state is (Cluster #1) being replicated to 
(Cluster #2).


Here are some details about the setup:

Cluster #1: three nodes connected via a local network (172.31.35.0/24), 
one replicated (3 replica) volume.


Cluster #2: three nodes connected via a local network (172.31.36.0/24), 
one replicated (3 replica) volume.


The two clusters are connected to the Internet via separate network 
adapters.


Only SSH (port 22) is open on cluster #2 nodes' adapters connected to 
the Internet.


All nodes are running Ubuntu 18.04 and GlusterFS 6.3 installed from [1].

The first time I followed the guide[2] everything went fine up until I 
reached the "Create the session" step. That was like a month ago, then I 
had to temporarily stop working on this and now I am coming back to it.


Currently, if I try to see the mountbroker status I get the following:


# gluster-mountbroker status
Traceback (most recent call last):
  File "/usr/sbin/gluster-mountbroker", line 396, in 
    runcli()
  File "/usr/lib/python3/dist-packages/gluster/cliutils/cliutils.py", line 225, in runcli
    cls.run(args)
  File "/usr/sbin/gluster-mountbroker", line 275, in run
    out = execute_in_peers("node-status")
  File "/usr/lib/python3/dist-packages/gluster/cliutils/cliutils.py", line 127, in execute_in_peers
    raise GlusterCmdException((rc, out, err, " ".join(cmd)))
gluster.cliutils.cliutils.GlusterCmdException: (1, '', 'Unable to end. Error : Success\n', 'gluster system:: execute mountbroker.py node-status')


And in /var/log/gluster/glusterd.log I have:

[2019-08-10 15:24:21.418834] E [MSGID: 106336] [glusterd-geo-rep.c:5413:glusterd_op_sys_exec] 0-management: Unable to end. Error : Success
[2019-08-10 15:24:21.418908] E [MSGID: 106122] [glusterd-syncop.c:1445:gd_commit_op_phase] 0-management: Commit of operation 'Volume Execute system commands' failed on localhost : Unable to end. Error : Success
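When chasing these "Unable to end. Error : Success" messages across several nodes, it can help to split glusterd.log lines into fields for grepping and aggregation. A sketch; the pattern is inferred from the lines quoted above rather than from a formal log-format specification:

```python
import re

# Sketch: split a glusterd.log line (like the ones quoted above) into
# timestamp, level, MSGID and message. The pattern is inferred from the
# quoted lines, not from a formal spec.
LOG_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[MSGID:\s*(?P<msgid>\d+)\]\s*(?P<rest>.*)")

def parse_glusterd_line(line):
    """Return a dict of fields, or None if the line doesn't match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None
```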


So, I have two questions right now:

1) Is there anything wrong with my setup (networking, open ports, etc.)? 
Is it expected to work with this setup or should I redo it in a 
different way?
2) How can I troubleshoot the current status of my setup? Can I find out 
what's missing/wrong and continue from there or should I just start from 
scratch?


Links:
[1] http://ppa.launchpad.net/gluster/glusterfs-6/ubuntu
[2] 
https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/


Thank you!

Best regards,
--
alexander iliev
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users