Re: [Gluster-users] Volume parameters listed more than once

2020-04-21 Thread Kotresh Hiremath Ravishankar
On Mon, Apr 20, 2020 at 9:37 PM Yaniv Kaul  wrote:

>
>
> On Mon, Apr 20, 2020 at 5:38 PM Dmitry Antipov 
> wrote:
>
>> # gluster volume info
>>
>> Volume Name: TEST0
>> Type: Distributed-Replicate
>> Volume ID: ca63095f-58dd-4ba8-82d6-7149a58c1423
>> Status: Created
>> Snapshot Count: 0
>> Number of Bricks: 3 x 3 = 9
>> Transport-type: tcp
>> Bricks:
>> Brick1: HOST-001:/mnt/SSD-0003
>> Brick2: HOST-001:/mnt/SSD-0004
>> Brick3: HOST-002:/mnt/SSD-0003
>> Brick4: HOST-002:/mnt/SSD-0004
>> Brick5: HOST-002:/mnt/SSD-0005
>> Brick6: HOST-003:/mnt/SSD-0002
>> Brick7: HOST-003:/mnt/SSD-0003
>> Brick8: HOST-003:/mnt/SSD-0004
>> Brick9: HOST-004:/mnt/SSD-0002
>> Options Reconfigured:
>> storage.fips-mode-rchecksum: on
>> transport.address-family: inet
>> nfs.disable: on
>> performance.client-io-threads: off
>>
>> # gluster volume get TEST0 all | grep performance.cache-size
>> performance.cache-size  32MB
>> performance.cache-size  128MB
>>
>
> I suspect these are for different translators and regretfully have the
> same name...
> performance/io-cache and performance/quick-read.
>
>
>>
>> ???
>>
>> # gluster volume get TEST0 all | grep features.ctime
>> features.ctime  on
>> features.ctime  on
>>
>
> Same - storage/posix and features/utime translators.
>
Yes, that's correct. These two xlators are interdependent. Initially there
were two different options, one to enable each xlator for the ctime feature,
so the user had to run two separate commands to turn it on. It was therefore
decided to expose the single key "features.ctime", which enables both xlators
internally and hence shows up as two entries in volume info.
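If you want to double-check which translators pick up a duplicated key, one
rough way (a sketch; it assumes the default volfile location under
/var/lib/glusterd on a server node, that the client volfile is named
trusted-TEST0.tcp-fuse.vol, and that explicitly set options get written into
the generated volfile) is:

  # gluster volume set TEST0 performance.cache-size 64MB
  # awk '/^volume /{xl=$2} /option cache-size/{print xl": "$0}' \
      /var/lib/glusterd/vols/TEST0/trusted-TEST0.tcp-fuse.vol

If the single key really feeds both translators, both the io-cache and the
quick-read subvolumes should show the new value.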

>
> Worth filing an issue about it, as it is indeed somewhat confusing.
> Y.
>
>
>> ???
>>
>> Dmitry


-- 
Thanks and Regards,
Kotresh H R




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] geo-replication sync issue

2020-03-18 Thread Kotresh Hiremath Ravishankar
Could you try disabling xattr syncing and check?

gluster vol geo-rep <mastervol> <slavehost>::<slavevol> config sync-xattrs false
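
You can confirm the change took effect by listing the session configuration
afterwards (substitute your own volume and slave names):

  # gluster vol geo-rep <mastervol> <slavehost>::<slavevol> config | grep -i xattr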

On Fri, Mar 13, 2020 at 1:42 AM Strahil Nikolov 
wrote:

> On March 12, 2020 9:41:45 AM GMT+02:00, "Etem Bayoğlu" <
> etembayo...@gmail.com> wrote:
> >Hello again,
> >
> >These are gsyncd.log from master on DEBUG level. It tells entering
> >directory, synced files , and gfid information
> >
> >[2020-03-12 07:18:16.702286] D [master(worker
> >/srv/media-storage):324:regjob] _GMaster: synced
> >file=.gfid/358fe62c-c7e8-449a-90dd-1cc1a3b7a346
> >[2020-03-12 07:18:16.702420] D [master(worker
> >/srv/media-storage):324:regjob] _GMaster: synced
> >file=.gfid/04eb63e3-7fcb-45d2-9f29-6292a5072adb
> >[2020-03-12 07:18:16.702574] D [master(worker
> >/srv/media-storage):324:regjob] _GMaster: synced
> >file=.gfid/4363e521-d81a-4a0f-bfa4-5ee6b92da2b4
> >[2020-03-12 07:18:16.702704] D [master(worker
> >/srv/media-storage):324:regjob] _GMaster: synced
> >file=.gfid/bed30509-2c5f-4c77-b2f9-81916a99abd9
> >[2020-03-12 07:18:16.702828] D [master(worker
> >/srv/media-storage):324:regjob] _GMaster: synced
> >file=.gfid/d86f44cc-3001-4bdf-8bae-6bed2a9c8381
> >[2020-03-12 07:18:16.702950] D [master(worker
> >/srv/media-storage):324:regjob] _GMaster: synced
> >file=.gfid/da40d429-d89e-4dc9-9dda-07922d87b3c8
> >[2020-03-12 07:18:16.703075] D [master(worker
> >/srv/media-storage):324:regjob] _GMaster: synced
> >file=.gfid/befc5e03-b7a1-43dc-b6c2-0a186019b6d5
> >[2020-03-12 07:18:16.703198] D [master(worker
> >/srv/media-storage):324:regjob] _GMaster: synced
> >file=.gfid/4e66035f-99f9-4802-b876-2e01686d18f2
> >[2020-03-12 07:18:16.703378] D [master(worker
> >/srv/media-storage):324:regjob] _GMaster: synced
> >file=.gfid/d1295b51-e461-4766-b504-8e9a941a056f
> >[2020-03-12 07:18:16.719875] D [master(worker
> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
> >./api/media/listing/2018/06-02/1557813
> >[2020-03-12 07:18:17.72679] D [master(worker
> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
> >./api/media/listing/2018/06-02/1557205
> >[2020-03-12 07:18:17.297362] D [master(worker
> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
> >./api/media/listing/2018/06-02/1556880
> >[2020-03-12 07:18:17.488224] D [master(worker
> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
> >./api/media/listing/2018/06-02/1557769
> >[2020-03-12 07:18:17.730181] D [master(worker
> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
> >./api/media/listing/2018/06-02/1557028
> >[2020-03-12 07:18:17.869410] I [gsyncd(config-get):318:main] :
> >Using
> >session config file
>
> >path=/var/lib/glusterd/geo-replication/media-storage_slave-node_dr-media/gsyncd.conf
> >[2020-03-12 07:18:18.65431] D [master(worker
> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
> >./api/media/listing/2018/06-02/1558442
> >[2020-03-12 07:18:18.352381] D [master(worker
> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
> >./api/media/listing/2018/06-02/1557391
> >[2020-03-12 07:18:18.374876] I [gsyncd(config-get):318:main] :
> >Using
> >session config file
>
> >path=/var/lib/glusterd/geo-replication/media-storage_slave-node_dr-media/gsyncd.conf
> >[2020-03-12 07:18:18.482299] I [gsyncd(config-set):318:main] :
> >Using
> >session config file
>
> >path=/var/lib/glusterd/geo-replication/media-storage_slave-nodem_dr-media/gsyncd.conf
> >[2020-03-12 07:18:18.507585] D [master(worker
> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
> >./api/media/listing/2018/06-02/1558577
> >[2020-03-12 07:18:18.576061] I [gsyncd(config-get):318:main] :
> >Using
> >session config file
>
> >path=/var/lib/glusterd/geo-replication/media-storage_slave-node_dr-media/gsyncd.conf
> >[2020-03-12 07:18:18.582772] D [master(worker
> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
> >./api/media/listing/2018/06-02/1556831
> >[2020-03-12 07:18:18.684170] I [gsyncd(config-get):318:main] :
> >Using
> >session config file
>
> >path=/var/lib/glusterd/geo-replication/media-storage_slave-node_dr-media/gsyncd.conf
> >[2020-03-12 07:18:18.691845] E [syncdutils(worker
> >/srv/media-storage):312:log_raise_exception] : connection to peer
> >is
> >broken
> >[2020-03-12 07:18:18.692106] E [syncdutils(worker
> >/srv/media-storage):312:log_raise_exception] : connection to peer
> >is
> >broken
> >[2020-03-12 07:18:18.694910] E [syncdutils(worker
> >/srv/media-storage):822:errlog] Popen: command returned error cmd=ssh
> >-oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
> >/var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto
> >-S
> >/tmp/gsyncd-aux-ssh-WaMqpG/241afba5343394352fc3f9c251909232.sock
> >slave-node
> >/nonexistent/gsyncd slave media-storage slave-node::dr-media
> >--master-node
> >master-node --master-node-id 023cdb20-2737-4278-93c2-0927917ee314
> >--master-brick /srv/media-storage --local-node slave-node
> >--local-node-id
> >cf34fc96-a08a-49c2-b8eb-a3df5a05f757 --slave-timeout 120
> >--slave-log-level
> >DEBUG --slave-gluster-log-level INFO 

Re: [Gluster-users] Geo-replication /var/lib space question

2020-02-13 Thread Kotresh Hiremath Ravishankar
All '.processed' directories (under working_dir and working_dir/.history)
contain already-processed changelogs, which geo-replication no longer needs
except for debugging purposes. Those directories can be cleaned up if they
are consuming too much space.
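
For example, something along these lines (a rough sketch; the paths follow
the default layout under /var/lib/misc/gluster/gsyncd, so please verify them
on your setup, and keep the files if you might still need them for debugging):

  # cd /var/lib/misc/gluster/gsyncd/<mastervol>_<slavehost>_<slavevol>/<brick-dir>
  # du -sh .processed .history/.processed
  # rm -f .processed/CHANGELOG.* .history/.processed/CHANGELOG.*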

On Wed, Feb 12, 2020 at 11:36 PM Sunny Kumar  wrote:

> Hi Alexander,
>
> Yes, that is the geo-replication working directory, and you can run the
> command below to get its location:
>   gluster vol geo-rep <mastervol> <slavehost>::<slavevol> config working_dir
>
> This directory contains parsed changelogs from the backend brick which are
> ready to be processed. After a batch is processed it is automatically
> cleaned up before the next batch is processed.
> Its size does not depend on the volume size but on the config value
> "changelog-batch-size", the maximum total size of changelogs to process per
> batch.
>
> /sunny
>
> On Mon, Feb 10, 2020 at 11:07 PM Alexander Iliev
>  wrote:
> >
> > Hello list,
> >
> > I have been running a geo-replication session for some time now, but at
> > some point I noticed that the /var/lib/misc/gluster is eating up the
> > storage on my root partition.
> >
> > I moved the folder away to another partition, but I don't seem to
> > remember reading any specific space requirement for /var/lib and
> > geo-replication. Did I miss it in the documentation?
> >
> > Also, does the space used in /var/lib/misc/gluster depend on the
> > geo-replicated volume size? What exactly is stored there? (I'm guessing
> > that's where gsyncd keeps track of the replicatation progress.)
> >
> > (I'm running gluster 6.6 on CentOS 7.7.)
> >
> > Thanks!
> > --
> > alexander iliev

-- 
Thanks and Regards,
Kotresh H R


Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/441850968

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Unable to setup geo replication

2019-12-01 Thread Kotresh Hiremath Ravishankar
Hi,

Please try disabling xattr sync and see whether geo-rep works fine:

gluster vol geo-rep <mastervol> <slavehost>::<slavevol> config sync_xattrs false
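
As far as I remember, sync_xattrs only controls whether gsyncd passes
--xattrs to rsync (ACLs have their own sync-acls option), so you can also
mimic the effect by repeating your manual test without --xattrs. A sketch
based on your earlier command, assuming the root user as in your output:

  # rsync -aR0 --inplace --super --stats --numeric-ids --no-implied-dirs \
      --existing --acls --ignore-missing-args file1 \
      -e 'ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -p 22 -oControlMaster=auto -i /var/lib/glusterd/geo-replication/secret.pem' \
      root@pgsotc10.png.intel.com:/mnt/

If that completes cleanly, the security.selinux xattr was the only thing
tripping rsync.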


On Thu, Nov 28, 2019 at 1:29 PM Tan, Jian Chern 
wrote:

> Alright, so it seems to work with some errors, and this is the output I'm
> getting.
>
> [root@jfsotc22 mnt]# rsync -aR0 --inplace --super --stats --numeric-ids
> --no-implied-dirs --existing --xattrs --acls --ignore-missing-args file1 -e
> 'ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no  -p 22
> -oControlMaster=auto -i /var/lib/glusterd/geo-replication/secret.pem'
> r...@pgsotc10.png.intel.com:/mnt/
>
> rsync: rsync_xal_set: lsetxattr("/mnt/file1","security.selinux") failed:
> Operation not supported (95)
>
>
>
> Number of files: 1 (reg: 1)
>
> Number of created files: 0
>
> Number of deleted files: 0
>
> Number of regular files transferred: 1
>
> Total file size: 9 bytes
>
> Total transferred file size: 9 bytes
>
> Literal data: 9 bytes
>
> Matched data: 0 bytes
>
> File list size: 0
>
> File list generation time: 0.003 seconds
>
> File list transfer time: 0.000 seconds
>
> Total bytes sent: 152
>
> Total bytes received: 141
>
>
>
> sent 152 bytes  received 141 bytes  65.11 bytes/sec
>
> total size is 9  speedup is 0.03
>
> rsync error: some files/attrs were not transferred (see previous errors)
> (code 23) at main.c(1189) [sender=3.1.3]
>
>
>
> The data is synced over to the other machine when I view the file there
>
> [root@pgsotc10 mnt]# cat file1
>
> testdata
>
> [root@pgsotc10 mnt]#
>
>
>
> *From:* Kotresh Hiremath Ravishankar 
> *Sent:* Wednesday, November 27, 2019 5:25 PM
> *To:* Tan, Jian Chern 
> *Cc:* gluster-users@gluster.org
> *Subject:* Re: [Gluster-users] Unable to setup geo replication
>
>
>
> Oh, forgot about that. Just set up passwordless ssh to that particular node
> and try with the default ssh key, removing
> -i /var/lib/glusterd/geo-replication/secret.pem from the command line.
>
>
>
> On Wed, Nov 27, 2019 at 12:43 PM Tan, Jian Chern 
> wrote:
>
> I’m getting this when I run that command so something’s wrong somewhere I
> guess.
>
>
>
> [root@jfsotc22 mnt]# rsync -aR0 --inplace --super --stats --numeric-ids
> --no-implied-dirs --existing --xattrs --acls --ignore-missing-args file1 -e
> 'ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no  -p 22
> -oControlMaster=auto -i /var/lib/glusterd/geo-replication/secret.pem'
> r...@pgsotc11.png.intel.com:/mnt/
>
> gsyncd sibling not found
>
> disallowed rsync invocation
>
> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
>
> rsync error: error in rsync protocol data stream (code 12) at io.c(226)
> [sender=3.1.3]
>
> [root@jfsotc22 mnt]#
>
>
>
> *From:* Kotresh Hiremath Ravishankar 
> *Sent:* Tuesday, November 26, 2019 7:22 PM
> *To:* Tan, Jian Chern 
> *Cc:* gluster-users@gluster.org
> *Subject:* Re: [Gluster-users] Unable to setup geo replication
>
>
>
> OK, then it should work.
> Could you confirm that rsync runs successfully when executed manually as below?
>
>
>
> 1. On master node,
>  a) # mkdir /mastermnt
>  b) Mount master volume on /mastermnt
>  c) # echo "test data" > /mastermnt/file1
>
> 2. On slave node
>  a) # mkdir /slavemnt
>  b) # Mount slave on /slavemnt
>
>  c) # touch /slavemnt/file1
>
> 3. On master node
>  a) # cd /mastermnt
>
>  b) # rsync -aR0 --inplace --super --stats --numeric-ids
> --no-implied-dirs --existing --xattrs --acls --ignore-missing-args file1 -e
> 'ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no  -p 22
> -oControlMaster=auto -i /var/lib/glusterd/geo-replication/secret.pem'
> r...@pgsotc11.png.intel.com:/slavemnt/
>
> 4. Check for content sync
>
>  a) cat /slavemnt/file1
>
>
>
> On Tue, Nov 26, 2019 at 1:19 PM Tan, Jian Chern 
> wrote:
>
> Rsync on both the slave and master are rsync  version 3.1.3  protocol
> version 31, so both are up to date as far as I know.
>
> Gluster version on both machines are glusterfs 5.10
>
> OS on both machines are Fedora 29 Server Edition
>
>
>
> *From:* Kotresh Hiremath Ravishankar 
> *Sent:* Tuesday, November 26, 2019 3:04 PM
> *To:* Tan, Jian Chern 
> *Cc:* gluster-users@gluster.org
> *Subject:* Re: [Gluster-users] Unable to setup geo replication
>
>
>
> Error code 14 is an IPC error in rsync, raised when a pipe/fork fails.
> Please upgrade rsync if not already done, and check that the rsync versions
> on the master and slave are the same.
>
> Which version of gluster are you using?
> What's the host OS?
>
>

Re: [Gluster-users] Unable to setup geo replication

2019-11-27 Thread Kotresh Hiremath Ravishankar
Oh, forgot about that. Just set up passwordless ssh to that particular node
and try with the default ssh key, removing
-i /var/lib/glusterd/geo-replication/secret.pem from the command line.
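
Roughly like this (a sketch; it assumes the root account on the slave, as in
your command, and that root's default key is what ssh-copy-id installs):

  # ssh-copy-id root@pgsotc11.png.intel.com
  # rsync -aR0 --inplace --super --stats --numeric-ids --no-implied-dirs \
      --existing --xattrs --acls --ignore-missing-args file1 \
      -e 'ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -p 22 -oControlMaster=auto' \
      root@pgsotc11.png.intel.com:/mnt/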

On Wed, Nov 27, 2019 at 12:43 PM Tan, Jian Chern 
wrote:

> I’m getting this when I run that command so something’s wrong somewhere I
> guess.
>
>
>
> [root@jfsotc22 mnt]# rsync -aR0 --inplace --super --stats --numeric-ids
> --no-implied-dirs --existing --xattrs --acls --ignore-missing-args file1 -e
> 'ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no  -p 22
> -oControlMaster=auto -i /var/lib/glusterd/geo-replication/secret.pem'
> r...@pgsotc11.png.intel.com:/mnt/
>
> gsyncd sibling not found
>
> disallowed rsync invocation
>
> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
>
> rsync error: error in rsync protocol data stream (code 12) at io.c(226)
> [sender=3.1.3]
>
> [root@jfsotc22 mnt]#
>
>
>
> *From:* Kotresh Hiremath Ravishankar 
> *Sent:* Tuesday, November 26, 2019 7:22 PM
> *To:* Tan, Jian Chern 
> *Cc:* gluster-users@gluster.org
> *Subject:* Re: [Gluster-users] Unable to setup geo replication
>
>
>
> OK, then it should work.
> Could you confirm that rsync runs successfully when executed manually as below?
>
>
>
> 1. On master node,
>  a) # mkdir /mastermnt
>  b) Mount master volume on /mastermnt
>  c) # echo "test data" > /mastermnt/file1
>
> 2. On slave node
>  a) # mkdir /slavemnt
>  b) # Mount slave on /slavemnt
>
>  c) # touch /slavemnt/file1
>
> 3. On master node
>  a) # cd /mastermnt
>
>  b) # rsync -aR0 --inplace --super --stats --numeric-ids
> --no-implied-dirs --existing --xattrs --acls --ignore-missing-args file1 -e
> 'ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no  -p 22
> -oControlMaster=auto -i /var/lib/glusterd/geo-replication/secret.pem'
> r...@pgsotc11.png.intel.com:/slavemnt/
>
> 4. Check for content sync
>
>  a) cat /slavemnt/file1
>
>
>
> On Tue, Nov 26, 2019 at 1:19 PM Tan, Jian Chern 
> wrote:
>
> Rsync on both the slave and master are rsync  version 3.1.3  protocol
> version 31, so both are up to date as far as I know.
>
> Gluster version on both machines are glusterfs 5.10
>
> OS on both machines are Fedora 29 Server Edition
>
>
>
> *From:* Kotresh Hiremath Ravishankar 
> *Sent:* Tuesday, November 26, 2019 3:04 PM
> *To:* Tan, Jian Chern 
> *Cc:* gluster-users@gluster.org
> *Subject:* Re: [Gluster-users] Unable to setup geo replication
>
>
>
> Error code 14 is an IPC error in rsync, raised when a pipe/fork fails.
> Please upgrade rsync if not already done, and check that the rsync versions
> on the master and slave are the same.
>
> Which version of gluster are you using?
> What's the host OS?
>
> What's the rsync version ?
>
>
>
> On Tue, Nov 26, 2019 at 11:34 AM Tan, Jian Chern 
> wrote:
>
> I’m new to GlusterFS and trying to setup geo-replication with a master
> volume being mirrored to a slave volume on another machine. However I just
> can’t seem to get it to work after starting the geo replication volume with
> the logs showing it failing rsync with error code 14. I can’t seem to find
> any info about this online so any help would be much appreciated.
>
>
>
> [2019-11-26 05:46:31.24706] I
> [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status
> Change  status=Initializing...
>
> [2019-11-26 05:46:31.24891] I [monitor(monitor):157:monitor] Monitor:
> starting gsyncd workerbrick=/data/glusterimagebrick/jfsotc22-gv0
> slave_node=pgsotc11.png.intel.com
>
> [2019-11-26 05:46:31.90935] I [gsyncd(agent
> /data/glusterimagebrick/jfsotc22-gv0):308:main] : Using session config
> file
> path=/var/lib/glusterd/geo-replication/jfsotc22-gv0_pgsotc11.png.intel.com_pgsotc11-gv0/gsyncd.conf
>
> [2019-11-26 05:46:31.92105] I [changelogagent(agent
> /data/glusterimagebrick/jfsotc22-gv0):72:__init__] ChangelogAgent: Agent
> listining...
>
> [2019-11-26 05:46:31.93148] I [gsyncd(worker
> /data/glusterimagebrick/jfsotc22-gv0):308:main] : Using session config
> file
> path=/var/lib/glusterd/geo-replication/jfsotc22-gv0_pgsotc11.png.intel.com_pgsotc11-gv0/gsyncd.conf
>
> [2019-11-26 05:46:31.102422] I [resource(worker
> /data/glusterimagebrick/jfsotc22-gv0):1366:connect_remote] SSH:
> Initializing SSH connection between master and slave...
>
> [2019-11-26 05:46:50.355233] I [resource(worker
> /data/glusterimagebrick/jfsotc22-gv0):1413:connect_remote] SSH: SSH
> connection between master and slave established.duration=19.2526
>
> [2019-11-26 05:46:50.355583] I [resource(worker
> /data/glusterimagebrick/jf

Re: [Gluster-users] Unable to setup geo replication

2019-11-26 Thread Kotresh Hiremath Ravishankar
OK, then it should work.
Could you confirm that rsync runs successfully when executed manually as below?

1. On master node,
 a) # mkdir /mastermnt
 b) Mount master volume on /mastermnt
 c) # echo "test data" > /mastermnt/file1
2. On slave node
 a) # mkdir /slavemnt
 b) # Mount slave on /slavemnt
 c) # touch /slavemnt/file1
3. On master node
 a) # cd /mastermnt
 b) # rsync -aR0 --inplace --super --stats --numeric-ids
--no-implied-dirs --existing --xattrs --acls --ignore-missing-args file1 -e
'ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no  -p 22
-oControlMaster=auto -i /var/lib/glusterd/geo-replication/secret.pem'
r...@pgsotc11.png.intel.com:/slavemnt/
4. Check for content sync
 a) cat /slavemnt/file1

On Tue, Nov 26, 2019 at 1:19 PM Tan, Jian Chern 
wrote:

> Rsync on both the slave and master are rsync  version 3.1.3  protocol
> version 31, so both are up to date as far as I know.
>
> Gluster version on both machines are glusterfs 5.10
>
> OS on both machines are Fedora 29 Server Edition
>
>
>
> *From:* Kotresh Hiremath Ravishankar 
> *Sent:* Tuesday, November 26, 2019 3:04 PM
> *To:* Tan, Jian Chern 
> *Cc:* gluster-users@gluster.org
> *Subject:* Re: [Gluster-users] Unable to setup geo replication
>
>
>
> Error code 14 is an IPC error in rsync, raised when a pipe/fork fails.
> Please upgrade rsync if not already done, and check that the rsync versions
> on the master and slave are the same.
>
> Which version of gluster are you using?
> What's the host OS?
>
> What's the rsync version ?
>
>
>
> On Tue, Nov 26, 2019 at 11:34 AM Tan, Jian Chern 
> wrote:
>
> I’m new to GlusterFS and trying to setup geo-replication with a master
> volume being mirrored to a slave volume on another machine. However I just
> can’t seem to get it to work after starting the geo replication volume with
> the logs showing it failing rsync with error code 14. I can’t seem to find
> any info about this online so any help would be much appreciated.
>
>
>
> [2019-11-26 05:46:31.24706] I
> [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status
> Change  status=Initializing...
>
> [2019-11-26 05:46:31.24891] I [monitor(monitor):157:monitor] Monitor:
> starting gsyncd workerbrick=/data/glusterimagebrick/jfsotc22-gv0
> slave_node=pgsotc11.png.intel.com
>
> [2019-11-26 05:46:31.90935] I [gsyncd(agent
> /data/glusterimagebrick/jfsotc22-gv0):308:main] : Using session config
> file
> path=/var/lib/glusterd/geo-replication/jfsotc22-gv0_pgsotc11.png.intel.com_pgsotc11-gv0/gsyncd.conf
>
> [2019-11-26 05:46:31.92105] I [changelogagent(agent
> /data/glusterimagebrick/jfsotc22-gv0):72:__init__] ChangelogAgent: Agent
> listining...
>
> [2019-11-26 05:46:31.93148] I [gsyncd(worker
> /data/glusterimagebrick/jfsotc22-gv0):308:main] : Using session config
> file
> path=/var/lib/glusterd/geo-replication/jfsotc22-gv0_pgsotc11.png.intel.com_pgsotc11-gv0/gsyncd.conf
>
> [2019-11-26 05:46:31.102422] I [resource(worker
> /data/glusterimagebrick/jfsotc22-gv0):1366:connect_remote] SSH:
> Initializing SSH connection between master and slave...
>
> [2019-11-26 05:46:50.355233] I [resource(worker
> /data/glusterimagebrick/jfsotc22-gv0):1413:connect_remote] SSH: SSH
> connection between master and slave established.duration=19.2526
>
> [2019-11-26 05:46:50.355583] I [resource(worker
> /data/glusterimagebrick/jfsotc22-gv0):1085:connect] GLUSTER: Mounting
> gluster volume locally...
>
> [2019-11-26 05:46:51.404998] I [resource(worker
> /data/glusterimagebrick/jfsotc22-gv0):1108:connect] GLUSTER: Mounted
> gluster volume duration=1.0492
>
> [2019-11-26 05:46:51.405363] I [subcmds(worker
> /data/glusterimagebrick/jfsotc22-gv0):80:subcmd_worker] : Worker spawn
> successful. Acknowledging back to monitor
>
> [2019-11-26 05:46:53.431502] I [master(worker
> /data/glusterimagebrick/jfsotc22-gv0):1603:register] _GMaster: Working
> dir
> path=/var/lib/misc/gluster/gsyncd/jfsotc22-gv0_pgsotc11.png.intel.com_pgsotc11-gv0/data-glusterimagebrick-jfsotc22-gv0
>
> [2019-11-26 05:46:53.431846] I [resource(worker
> /data/glusterimagebrick/jfsotc22-gv0):1271:service_loop] GLUSTER: Register
> time time=1574747213
>
> [2019-11-26 05:46:53.445589] I [gsyncdstatus(worker
> /data/glusterimagebrick/jfsotc22-gv0):281:set_active] GeorepStatus: Worker
> Status Changestatus=Active
>
> [2019-11-26 05:46:53.446184] I [gsyncdstatus(worker
> /data/glusterimagebrick/jfsotc22-gv0):253:set_worker_crawl_status]
> GeorepStatus: Crawl Status Changestatus=History Crawl
>
> [2019-11-26 05:46:53.446367] I [master(worker
> /data/glusterimagebrick/jfsotc22-gv0):1517:crawl] _GMaster: starting
> history crawlturns=1 sti

Re: [Gluster-users] Unable to setup geo replication

2019-11-25 Thread Kotresh Hiremath Ravishankar
Error code 14 is an IPC error in rsync, raised when a pipe/fork fails.
Please upgrade rsync if you haven't already, and check that the rsync
versions on the master and slave are the same.
Which version of gluster are you using?
What's the host OS?
What's the rsync version?
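
For example, a quick way to compare the two nodes (run on both the master
and the slave):

  # rsync --version | head -1
  # gluster --version | head -1
  # head -2 /etc/os-release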

On Tue, Nov 26, 2019 at 11:34 AM Tan, Jian Chern 
wrote:

> I’m new to GlusterFS and trying to setup geo-replication with a master
> volume being mirrored to a slave volume on another machine. However I just
> can’t seem to get it to work after starting the geo replication volume with
> the logs showing it failing rsync with error code 14. I can’t seem to find
> any info about this online so any help would be much appreciated.
>
>
>
> [2019-11-26 05:46:31.24706] I
> [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status
> Change  status=Initializing...
>
> [2019-11-26 05:46:31.24891] I [monitor(monitor):157:monitor] Monitor:
> starting gsyncd workerbrick=/data/glusterimagebrick/jfsotc22-gv0
> slave_node=pgsotc11.png.intel.com
>
> [2019-11-26 05:46:31.90935] I [gsyncd(agent
> /data/glusterimagebrick/jfsotc22-gv0):308:main] : Using session config
> file
> path=/var/lib/glusterd/geo-replication/jfsotc22-gv0_pgsotc11.png.intel.com_pgsotc11-gv0/gsyncd.conf
>
> [2019-11-26 05:46:31.92105] I [changelogagent(agent
> /data/glusterimagebrick/jfsotc22-gv0):72:__init__] ChangelogAgent: Agent
> listining...
>
> [2019-11-26 05:46:31.93148] I [gsyncd(worker
> /data/glusterimagebrick/jfsotc22-gv0):308:main] : Using session config
> file
> path=/var/lib/glusterd/geo-replication/jfsotc22-gv0_pgsotc11.png.intel.com_pgsotc11-gv0/gsyncd.conf
>
> [2019-11-26 05:46:31.102422] I [resource(worker
> /data/glusterimagebrick/jfsotc22-gv0):1366:connect_remote] SSH:
> Initializing SSH connection between master and slave...
>
> [2019-11-26 05:46:50.355233] I [resource(worker
> /data/glusterimagebrick/jfsotc22-gv0):1413:connect_remote] SSH: SSH
> connection between master and slave established.duration=19.2526
>
> [2019-11-26 05:46:50.355583] I [resource(worker
> /data/glusterimagebrick/jfsotc22-gv0):1085:connect] GLUSTER: Mounting
> gluster volume locally...
>
> [2019-11-26 05:46:51.404998] I [resource(worker
> /data/glusterimagebrick/jfsotc22-gv0):1108:connect] GLUSTER: Mounted
> gluster volume duration=1.0492
>
> [2019-11-26 05:46:51.405363] I [subcmds(worker
> /data/glusterimagebrick/jfsotc22-gv0):80:subcmd_worker] : Worker spawn
> successful. Acknowledging back to monitor
>
> [2019-11-26 05:46:53.431502] I [master(worker
> /data/glusterimagebrick/jfsotc22-gv0):1603:register] _GMaster: Working
> dir
> path=/var/lib/misc/gluster/gsyncd/jfsotc22-gv0_pgsotc11.png.intel.com_pgsotc11-gv0/data-glusterimagebrick-jfsotc22-gv0
>
> [2019-11-26 05:46:53.431846] I [resource(worker
> /data/glusterimagebrick/jfsotc22-gv0):1271:service_loop] GLUSTER: Register
> time time=1574747213
>
> [2019-11-26 05:46:53.445589] I [gsyncdstatus(worker
> /data/glusterimagebrick/jfsotc22-gv0):281:set_active] GeorepStatus: Worker
> Status Changestatus=Active
>
> [2019-11-26 05:46:53.446184] I [gsyncdstatus(worker
> /data/glusterimagebrick/jfsotc22-gv0):253:set_worker_crawl_status]
> GeorepStatus: Crawl Status Changestatus=History Crawl
>
> [2019-11-26 05:46:53.446367] I [master(worker
> /data/glusterimagebrick/jfsotc22-gv0):1517:crawl] _GMaster: starting
> history crawlturns=1 stime=(1574669325, 0)
> etime=1574747213entry_stime=None
>
> [2019-11-26 05:46:54.448994] I [master(worker
> /data/glusterimagebrick/jfsotc22-gv0):1546:crawl] _GMaster: slave's time
> stime=(1574669325, 0)
>
> [2019-11-26 05:46:54.928395] I [master(worker
> /data/glusterimagebrick/jfsotc22-gv0):1954:syncjob] Syncer: Sync Time
> Taken   job=1   num_files=1 return_code=14  duration=0.0162
>
> [2019-11-26 05:46:54.928607] E [syncdutils(worker
> /data/glusterimagebrick/jfsotc22-gv0):809:errlog] Popen: command returned
> error   cmd=rsync -aR0 --inplace --files-from=- --super --stats
> --numeric-ids --no-implied-dirs --existing --xattrs --acls
> --ignore-missing-args . -e ssh -oPasswordAuthentication=no
> -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem
> -p 22 -oControlMaster=auto -S
> /tmp/gsyncd-aux-ssh-rgpu74f3/de0855b3336b4c3233934fcbeeb3674c.sock
> pgsotc11.png.intel.com:/proc/29549/cwd  error=14
>
> [2019-11-26 05:46:54.935529] I [repce(agent
> /data/glusterimagebrick/jfsotc22-gv0):97:service_loop] RepceServer:
> terminating on reaching EOF.
>
> [2019-11-26 05:46:55.410444] I [monitor(monitor):278:monitor] Monitor:
> worker died in startup phase brick=/data/glusterimagebrick/jfsotc22-gv0
>
> [2019-11-26 05:46:55.412591] I
> [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status
> Change status=Faulty
>
> [2019-11-26 05:47:05.631944] I
> [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status
> Change status=Initializing...
>
> ….
>
>
>
> Thanks!
>
> Jian Chern
>

Re: [Gluster-users] Geo_replication to Faulty

2019-11-18 Thread Kotresh Hiremath Ravishankar
Hi,

Those issues are fixed in gluster v6.6. Please try 6.6. Non-root geo-rep is
stable in v6.6.

On Tue, Nov 19, 2019 at 11:24 AM deepu srinivasan 
wrote:

> Hi
> We are using Gluster 5.6 now.
> We tested 6.2 earlier but it had the gluster-mountbroker issue.(
> https://bugzilla.redhat.com/show_bug.cgi?id=1709248).
>
> On Tue, Nov 19, 2019 at 11:22 AM Kotresh Hiremath Ravishankar <
> khire...@redhat.com> wrote:
>
>> Which version of gluster are you using?
>>
>> On Tue, Nov 19, 2019 at 11:00 AM deepu srinivasan 
>> wrote:
>>
>>> Hi kotresh
>>> Is there a stable release in 6.x series?
>>>
>>>
>>> On Tue, Nov 19, 2019, 10:44 AM Kotresh Hiremath Ravishankar <
>>> khire...@redhat.com> wrote:
>>>
>>>> This issue has been recently fixed with the following patch and should
>>>> be available in latest gluster-6.x
>>>>
>>>> https://review.gluster.org/#/c/glusterfs/+/23570/
>>>>
>>>> On Tue, Nov 19, 2019 at 10:26 AM deepu srinivasan 
>>>> wrote:
>>>>
>>>>>
>>>>> Hi Aravinda
>>>>> *The below logs are from master end:*
>>>>>
>>>>> [2019-11-16 17:29:43.536881] I [gsyncdstatus(worker
>>>>> /home/sas/gluster/data/code-misc6):281:set_active] GeorepStatus: Worker
>>>>> Status Change   status=Active
>>>>> [2019-11-16 17:29:43.629620] I [gsyncdstatus(worker
>>>>> /home/sas/gluster/data/code-misc6):253:set_worker_crawl_status]
>>>>> GeorepStatus: Crawl Status Change   status=History Crawl
>>>>> [2019-11-16 17:29:43.630328] I [master(worker
>>>>> /home/sas/gluster/data/code-misc6):1517:crawl] _GMaster: starting history
>>>>> crawl   turns=1 stime=(1573924576, 0)   entry_stime=(1573924576, 0)
>>>>> etime=1573925383
>>>>> [2019-11-16 17:29:44.636725] I [master(worker
>>>>> /home/sas/gluster/data/code-misc6):1546:crawl] _GMaster: slave's time
>>>>> stime=(1573924576, 0)
>>>>> [2019-11-16 17:29:44.778966] I [master(worker
>>>>> /home/sas/gluster/data/code-misc6):898:fix_possible_entry_failures]
>>>>> _GMaster: Fixing ENOENT error in slave. Parent does not exist on master.
>>>>> Safe to ignore, take out entry   retry_count=1   entry=({'uid': 0,
>>>>> 'gfid': 'c02519e0-0ead-4fe8-902b-dcae72ef83a3', 'gid': 0, 'mode': 33188,
>>>>> 'entry': '.gfid/d60aa0d5-4fdf-4721-97dc-9e3e50995dab/368307802', 'op':
>>>>> 'CREATE'}, 2, {'slave_isdir': False, 'gfid_mismatch': False, 'slave_name':
>>>>> None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})
>>>>> [2019-11-16 17:29:44.779306] I [master(worker
>>>>> /home/sas/gluster/data/code-misc6):942:handle_entry_failures] _GMaster:
>>>>> Sucessfully fixed entry ops with gfid mismatchretry_count=1
>>>>> [2019-11-16 17:29:44.779516] I [master(worker
>>>>> /home/sas/gluster/data/code-misc6):1194:process_change] _GMaster: Retry
>>>>> original entries. count = 1
>>>>> [2019-11-16 17:29:44.879321] E [repce(worker
>>>>> /home/sas/gluster/data/code-misc6):214:__call__] RepceClient: call failed
>>>>>  call=151945:140353273153344:1573925384.78   method=entry_ops
>>>>>  error=OSError
>>>>> [2019-11-16 17:29:44.879750] E [syncdutils(worker
>>>>> /home/sas/gluster/data/code-misc6):338:log_raise_exception] : FAIL:
>>>>> Traceback (most recent call last):
>>>>>   File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 322,
>>>>> in main
>>>>> func(args)
>>>>>   File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 82,
>>>>> in subcmd_worker
>>>>> local.service_loop(remote)
>>>>>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line
>>>>> 1277, in service_loop
>>>>> g3.crawlwrap(oneshot=True)
>>>>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 599,
>>>>> in crawlwrap
>>>>> self.crawl()
>>>>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line
>>>>> 1555, in crawl
>>>>> self.changelogs_batch_process(changes)
>>>>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line
>>>>> 1455, in changelogs_batch_proce

Re: [Gluster-users] Geo_replication to Faulty

2019-11-18 Thread Kotresh Hiremath Ravishankar
Which version of gluster are you using?

On Tue, Nov 19, 2019 at 11:00 AM deepu srinivasan 
wrote:

> Hi kotresh
> Is there a stable release in 6.x series?
>
>
> On Tue, Nov 19, 2019, 10:44 AM Kotresh Hiremath Ravishankar <
> khire...@redhat.com> wrote:
>
>> This issue has been recently fixed with the following patch and should be
>> available in latest gluster-6.x
>>
>> https://review.gluster.org/#/c/glusterfs/+/23570/
>>
>> On Tue, Nov 19, 2019 at 10:26 AM deepu srinivasan 
>> wrote:
>>
>>>
>>> Hi Aravinda
>>> *The below logs are from master end:*
>>>
>>> [2019-11-16 17:29:43.536881] I [gsyncdstatus(worker
>>> /home/sas/gluster/data/code-misc6):281:set_active] GeorepStatus: Worker
>>> Status Change   status=Active
>>> [2019-11-16 17:29:43.629620] I [gsyncdstatus(worker
>>> /home/sas/gluster/data/code-misc6):253:set_worker_crawl_status]
>>> GeorepStatus: Crawl Status Change   status=History Crawl
>>> [2019-11-16 17:29:43.630328] I [master(worker
>>> /home/sas/gluster/data/code-misc6):1517:crawl] _GMaster: starting history
>>> crawl   turns=1 stime=(1573924576, 0)   entry_stime=(1573924576, 0)
>>> etime=1573925383
>>> [2019-11-16 17:29:44.636725] I [master(worker
>>> /home/sas/gluster/data/code-misc6):1546:crawl] _GMaster: slave's time
>>> stime=(1573924576, 0)
>>> [2019-11-16 17:29:44.778966] I [master(worker
>>> /home/sas/gluster/data/code-misc6):898:fix_possible_entry_failures]
>>> _GMaster: Fixing ENOENT error in slave. Parent does not exist on master.
>>> Safe to ignore, take out entry   retry_count=1   entry=({'uid': 0,
>>> 'gfid': 'c02519e0-0ead-4fe8-902b-dcae72ef83a3', 'gid': 0, 'mode': 33188,
>>> 'entry': '.gfid/d60aa0d5-4fdf-4721-97dc-9e3e50995dab/368307802', 'op':
>>> 'CREATE'}, 2, {'slave_isdir': False, 'gfid_mismatch': False, 'slave_name':
>>> None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})
>>> [2019-11-16 17:29:44.779306] I [master(worker
>>> /home/sas/gluster/data/code-misc6):942:handle_entry_failures] _GMaster:
>>> Sucessfully fixed entry ops with gfid mismatchretry_count=1
>>> [2019-11-16 17:29:44.779516] I [master(worker
>>> /home/sas/gluster/data/code-misc6):1194:process_change] _GMaster: Retry
>>> original entries. count = 1
>>> [2019-11-16 17:29:44.879321] E [repce(worker
>>> /home/sas/gluster/data/code-misc6):214:__call__] RepceClient: call failed
>>>  call=151945:140353273153344:1573925384.78   method=entry_ops
>>>  error=OSError
>>> [2019-11-16 17:29:44.879750] E [syncdutils(worker
>>> /home/sas/gluster/data/code-misc6):338:log_raise_exception] : FAIL:
>>> Traceback (most recent call last):
>>>   File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 322,
>>> in main
>>> func(args)
>>>   File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 82,
>>> in subcmd_worker
>>> local.service_loop(remote)
>>>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line
>>> 1277, in service_loop
>>> g3.crawlwrap(oneshot=True)
>>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 599,
>>> in crawlwrap
>>> self.crawl()
>>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1555,
>>> in crawl
>>> self.changelogs_batch_process(changes)
>>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1455,
>>> in changelogs_batch_process
>>> self.process(batch)
>>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1290,
>>> in process
>>> self.process_change(change, done, retry)
>>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1195,
>>> in process_change
>>> failures = self.slave.server.entry_ops(entries)
>>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 233, in
>>> __call__
>>> return self.ins(self.meth, *a)
>>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 215, in
>>> __call__
>>> raise res
>>> OSError: [Errno 13] Permission denied:
>>> '/home/sas/gluster/data/code-misc6/.glusterfs/6a/90/6a9008b1-a4aa-4c30-9ae7-92a33e05d0bb'
>>> [2019-11-16 17:29:44.911767] I [repce(agent
>>> /home/sas/gluster/data/code-misc6):97:service_loop] RepceServer:
>>> terminating on reaching EOF

Re: [Gluster-users] Geo_replication to Faulty

2019-11-18 Thread Kotresh Hiremath Ravishankar
This issue has recently been fixed with the following patch and should be
available in the latest gluster-6.x:

https://review.gluster.org/#/c/glusterfs/+/23570/

On Tue, Nov 19, 2019 at 10:26 AM deepu srinivasan 
wrote:

>
> Hi Aravinda
> *The below logs are from master end:*
>
> [2019-11-16 17:29:43.536881] I [gsyncdstatus(worker
> /home/sas/gluster/data/code-misc6):281:set_active] GeorepStatus: Worker
> Status Change   status=Active
> [2019-11-16 17:29:43.629620] I [gsyncdstatus(worker
> /home/sas/gluster/data/code-misc6):253:set_worker_crawl_status]
> GeorepStatus: Crawl Status Change   status=History Crawl
> [2019-11-16 17:29:43.630328] I [master(worker
> /home/sas/gluster/data/code-misc6):1517:crawl] _GMaster: starting history
> crawl   turns=1 stime=(1573924576, 0)   entry_stime=(1573924576, 0)
> etime=1573925383
> [2019-11-16 17:29:44.636725] I [master(worker
> /home/sas/gluster/data/code-misc6):1546:crawl] _GMaster: slave's time
> stime=(1573924576, 0)
> [2019-11-16 17:29:44.778966] I [master(worker
> /home/sas/gluster/data/code-misc6):898:fix_possible_entry_failures]
> _GMaster: Fixing ENOENT error in slave. Parent does not exist on master.
> Safe to ignore, take out entry   retry_count=1   entry=({'uid': 0,
> 'gfid': 'c02519e0-0ead-4fe8-902b-dcae72ef83a3', 'gid': 0, 'mode': 33188,
> 'entry': '.gfid/d60aa0d5-4fdf-4721-97dc-9e3e50995dab/368307802', 'op':
> 'CREATE'}, 2, {'slave_isdir': False, 'gfid_mismatch': False, 'slave_name':
> None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})
> [2019-11-16 17:29:44.779306] I [master(worker
> /home/sas/gluster/data/code-misc6):942:handle_entry_failures] _GMaster:
> Sucessfully fixed entry ops with gfid mismatchretry_count=1
> [2019-11-16 17:29:44.779516] I [master(worker
> /home/sas/gluster/data/code-misc6):1194:process_change] _GMaster: Retry
> original entries. count = 1
> [2019-11-16 17:29:44.879321] E [repce(worker
> /home/sas/gluster/data/code-misc6):214:__call__] RepceClient: call failed
>  call=151945:140353273153344:1573925384.78   method=entry_ops
>  error=OSError
> [2019-11-16 17:29:44.879750] E [syncdutils(worker
> /home/sas/gluster/data/code-misc6):338:log_raise_exception] : FAIL:
> Traceback (most recent call last):
>   File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 322, in
> main
> func(args)
>   File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 82, in
> subcmd_worker
> local.service_loop(remote)
>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1277,
> in service_loop
> g3.crawlwrap(oneshot=True)
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 599, in
> crawlwrap
> self.crawl()
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1555, in
> crawl
> self.changelogs_batch_process(changes)
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1455, in
> changelogs_batch_process
> self.process(batch)
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1290, in
> process
> self.process_change(change, done, retry)
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1195, in
> process_change
> failures = self.slave.server.entry_ops(entries)
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 233, in
> __call__
> return self.ins(self.meth, *a)
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 215, in
> __call__
> raise res
> OSError: [Errno 13] Permission denied:
> '/home/sas/gluster/data/code-misc6/.glusterfs/6a/90/6a9008b1-a4aa-4c30-9ae7-92a33e05d0bb'
> [2019-11-16 17:29:44.911767] I [repce(agent
> /home/sas/gluster/data/code-misc6):97:service_loop] RepceServer:
> terminating on reaching EOF.
> [2019-11-16 17:29:45.509344] I [monitor(monitor):278:monitor] Monitor:
> worker died in startup phase brick=/home/sas/gluster/data/code-misc6
> [2019-11-16 17:29:45.511806] I
> [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status
> Change status=Faulty
>
>
>
> *The below logs are from the slave end.*
>
> [2019-11-16 17:24:42.281599] I [resource(slave
> 192.168.185.106/home/sas/gluster/data/code-misc6):580:entry_ops
> ]
> : Special case: rename on mkdir
>  gfid=6a9008b1-a4aa-4c30-9ae7-92a33e05d0bb
> entry='.gfid/a8921d78-a078-46d3-aca5-8b078eb62cac/8878061b-d5b3-47a6-b01c-8310fee39b20'
> [2019-11-16 17:24:42.370582] E [repce(slave
> 192.168.185.106/home/sas/gluster/data/code-misc6):122:worker
> ]
> : call failed:
> Traceback (most recent call last):
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in
> worker
> res = getattr(self.obj, rmeth)(*in_data[2:])
>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 581,
> in entry_ops
> src_entry = get_slv_dir_path(slv_host, slv_volume, gfid)
>   File 

Re: [Gluster-users] Geo-replication does not send filesystem changes

2019-07-05 Thread Kotresh Hiremath Ravishankar
The session has moved from "history crawl" to "changelog crawl". After that
point, there are no changelogs to be synced as per the logs.
Please check in the ".processing" directory whether there are any pending
changelogs to be synced, at
"/var/lib/misc/gluster/gsyncd/<session>/<brick-dir>/.processing".
If there are no pending changelogs, then please check if the brick is up.
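
For example (a sketch; on your setup the session directory should be
something like cicd_gfs-alfa2_cicd, with one sub-directory per brick):

  # ls /var/lib/misc/gluster/gsyncd/cicd_gfs-alfa2_cicd/*/.processing/ | wc -l
  # gluster volume status cicd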

On Fri, Jul 5, 2019 at 5:29 PM Pasechny Alexey 
wrote:

> Hi everyone,
>
> I have a problem with native geo-replication setup. It successfully
> starts, makes initial sync but does not send any filesystem data changes
> afterward.
> I'm using CentOS 7.6.1810 with official glusterfs-6.3-1.el7 build on top
> of ZFS on Linux.
> It is a single Master node with single brick and the same Slave node.
>
> # "gluster vol geo-rep status" command gives the following output
>  MASTER NODE = gfs-alfa1
>  MASTER VOL = cicd
>  MASTER BRICK = /zdata/cicd/brick
>  SLAVE USER = root
>  SLAVE = gfs-alfa2::cicd
>  SLAVE NODE = gfs-alfa2
>  STATUS = Active
>  CRAWL STATUS = Changelog Crawl
>  LAST_SYNCED = 2019-07-05 12:08:17
>  ENTRY = 0
>  DATA = 0
>  META = 0
>  FAILURES = 0
>  CHECKPOINT TIME = 2019-07-05 12:13:46
>  CHECKPOINT COMPLETED = No
>
> I enabled DEBUG level log for gsyncd.log but did not get any error
> messages from it. Full log is available here:
> https://pastebin.com/pXL4dBhZ
> On both brick I disabled ctime feature because it is incompatible with old
> versions of gfs clients, enabling this feature does not help too.
>
> # gluster volume info
>  Volume Name: cicd
>  Type: Distribute
>  Volume ID: 8f959a35-c7ab-4484-a1e8-9fa8e3a713b4
>  Status: Started
>  Snapshot Count: 0
>  Number of Bricks: 1
>  Transport-type: tcp
>  Bricks:
>  Brick1: gfs-alfa1:/zdata/cicd/brick
>  Options Reconfigured:
>  nfs.disable: on
>  transport.address-family: inet
>  features.ctime: off
>  geo-replication.indexing: on
>  geo-replication.ignore-pid-check: on
>  changelog.changelog: on
>
> # gluster volume get cicd rollover-time
> Option  Value
> --  -
> changelog.rollover-time 15
>
> # gluster volume get cicd fsync-interval
> Option  Value
> --  -
> changelog.fsync-interval5
>
> Could someone help me with debug of this geo-rep setup?
> Thank you!
>
> BR, Alexey
>
>
> 
>
> This e-mail and any attachments may contain confidential and/or privileged
> information and is intended solely for the addressee. If you are not the
> intended recipient (or have received this e-mail in error) please notify
> the sender immediately and destroy this e-mail. Any unauthorised use,
> review, retransmissions, dissemination, copying or other use of this
> information by persons or entities other than the intended recipient is
> strictly prohibited.



-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Exception in Geo-Replication

2019-07-02 Thread Kotresh Hiremath Ravishankar
You should be looking into the other log file (changes-<brick-path>.log)
for the actual failure.
In your case that is "changes-home-sas-gluster-data-code-misc.log".
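
It should sit next to gsyncd.log under the geo-replication log directory (a
sketch; the session directory name is a guess built from your master volume,
slave host and slave volume):

  # ls /var/log/glusterfs/geo-replication/<mastervol>_<slavehost>_<slavevol>/
  # tail -f /var/log/glusterfs/geo-replication/<mastervol>_<slavehost>_<slavevol>/changes-home-sas-gluster-data-code-misc.log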

On Tue, Jul 2, 2019 at 12:33 PM deepu srinivasan  wrote:

> Any Update on this issue ?
>
> On Mon, Jul 1, 2019 at 4:19 PM deepu srinivasan 
> wrote:
>
>> Hi
>> I am getting this exception while starting geo-replication. Please help.
>>
>>  [2019-07-01 10:48:02.445475] E [repce(agent
>> /home/sas/gluster/data/code-misc):122:worker] : call failed:
>>
>>  Traceback (most recent call last):
>>
>>File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 118,
>> in worker
>>
>>  res = getattr(self.obj, rmeth)(*in_data[2:])
>>
>>File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py",
>> line 41, in register
>>
>>  return Changes.cl_register(cl_brick, cl_dir, cl_log, cl_level,
>> retries)
>>
>>File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py",
>> line 45, in cl_register
>>
>>  cls.raise_changelog_err()
>>
>>File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py",
>> line 29, in raise_changelog_err
>>
>>  raise ChangelogException(errn, os.strerror(errn))
>>
>>  ChangelogException: [Errno 21] Is a directory
>>
>>  [2019-07-01 10:48:02.446341] E [repce(worker
>> /home/sas/gluster/data/code-misc):214:__call__] RepceClient: call failed
>> call=31023:140523296659264:1561978082.44   method=register
>> error=ChangelogException
>>
>>  [2019-07-01 10:48:02.446654] E [resource(worker
>> /home/sas/gluster/data/code-misc):1268:service_loop] GLUSTER: Changelog
>> register failed error=[Errno 21] Is a  directory
>>
>

-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Geo Replication Stop even after migrating to 5.6

2019-06-14 Thread Kotresh Hiremath Ravishankar
It's about a complete re-sync. The idea is to set the stime xattr, which
marks the sync time, to 0 on all the bricks.
If a lot of the data is not yet synced to the slave, this is not very useful.
You can just as well delete the geo-rep session with the 'reset-sync-time'
option and set it up again. I prefer the second way.
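
Roughly (a sketch; substitute your volume and slave names, and note that the
create step assumes the session was originally set up with push-pem):

  # gluster vol geo-rep <mastervol> <slavehost>::<slavevol> stop
  # gluster vol geo-rep <mastervol> <slavehost>::<slavevol> delete reset-sync-time
  # gluster vol geo-rep <mastervol> <slavehost>::<slavevol> create push-pem force
  # gluster vol geo-rep <mastervol> <slavehost>::<slavevol> start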


Thanks,
Kotresh HR

On Fri, Jun 14, 2019 at 12:48 PM deepu srinivasan 
wrote:

> Hi Guys
> Yes, I will try the root geo-rep setup and update you back.
> Meanwhile is there any procedure for the below-quoted info in the docs?
>
>> Synchronization is not complete
>>
>> *Description*: GlusterFS geo-replication did not synchronize the data
>> completely but the geo-replication status displayed is OK.
>>
>> *Solution*: You can enforce a full sync of the data by erasing the index
>> and restarting GlusterFS geo-replication. After restarting, GlusterFS
>> geo-replication begins synchronizing all the data. All files are compared
>> using checksum, which can be a lengthy and high resource utilization
>> operation on large data sets.
>>
>>
> On Fri, Jun 14, 2019 at 12:30 PM Kotresh Hiremath Ravishankar <
> khire...@redhat.com> wrote:
>
>> Could you please try root geo-rep setup and update back?
>>
>> On Fri, Jun 14, 2019 at 12:28 PM deepu srinivasan 
>> wrote:
>>
>>> Hi Any updates on this
>>>
>>>
>>> On Thu, Jun 13, 2019 at 5:43 PM deepu srinivasan 
>>> wrote:
>>>
>>>> Hi Guys
>>>> Hope you remember the issue I reported for geo replication hang status
>>>> on History Crawl.
>>>> So you advised me to update the gluster version. previously I was using
>>>> 4.1 now I upgraded to 5.6/Still after deleting the previous geo-rep session
>>>> and creating a new one the geo-rep session hangs. Is there any other way
>>>> that I could solve the issue.
>>>> I heard that I could redo the whole geo-replication again. How could I
>>>> do that?
>>>> Please help.
>>>>
>>>
>>
>> --
>> Thanks and Regards,
>> Kotresh H R
>>
>

-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Geo Replication stops replicating

2019-06-05 Thread Kotresh Hiremath Ravishankar
Hi,

I think the steps to set up non-root geo-rep were not followed properly. The
following log entry indicates that a required option is missing from the
glusterd vol file:

The message "E [MSGID: 106061]
[glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option
mountbroker-root' missing in glusterd vol file" repeated 33 times between
[2019-06-05 08:50:46.361384] and [2019-06-05 08:52:34.019757]

Could you please follow the steps from the link below?

https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/html-single/administration_guide/index#Setting_Up_the_Environment_for_a_Secure_Geo-replication_Slave

And let us know if you still face the issue.
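
In short, on the slave side the mountbroker has to be configured, roughly
like this (a sketch following that guide; it assumes 'sas' is both the
unprivileged user and its group and uses the usual mount-root location, so
adjust to your environment, then restart glusterd on the slave nodes):

  # gluster-mountbroker setup /var/mountbroker-root sas
  # gluster-mountbroker add code-misc sas
  # systemctl restart glusterd
  # gluster-mountbroker status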




On Thu, Jun 6, 2019 at 10:24 AM deepu srinivasan  wrote:

> Hi Kotresh, Sunny
> I Have mailed the logs I found in one of the slave machines. Is there
> anything to do with permission? Please help.
>
> On Wed, Jun 5, 2019 at 2:28 PM deepu srinivasan 
> wrote:
>
>> Hi Kotresh, Sunny
>> Found this log in the slave machine.
>>
>>> [2019-06-05 08:49:10.632583] I [MSGID: 106488]
>>> [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management:
>>> Received get vol req
>>>
>>> The message "I [MSGID: 106488]
>>> [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management:
>>> Received get vol req" repeated 2 times between [2019-06-05 08:49:10.632583]
>>> and [2019-06-05 08:49:10.670863]
>>>
>>> The message "I [MSGID: 106496]
>>> [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received
>>> mount req" repeated 34 times between [2019-06-05 08:48:41.005398] and
>>> [2019-06-05 08:50:37.254063]
>>>
>>> The message "E [MSGID: 106061]
>>> [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option
>>> mountbroker-root' missing in glusterd vol file" repeated 34 times between
>>> [2019-06-05 08:48:41.005434] and [2019-06-05 08:50:37.254079]
>>>
>>> The message "W [MSGID: 106176]
>>> [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful
>>> mount request [No such file or directory]" repeated 34 times between
>>> [2019-06-05 08:48:41.005444] and [2019-06-05 08:50:37.254080]
>>>
>>> [2019-06-05 08:50:46.361347] I [MSGID: 106496]
>>> [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received
>>> mount req
>>>
>>> [2019-06-05 08:50:46.361384] E [MSGID: 106061]
>>> [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option
>>> mountbroker-root' missing in glusterd vol file
>>>
>>> [2019-06-05 08:50:46.361419] W [MSGID: 106176]
>>> [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful
>>> mount request [No such file or directory]
>>>
>>> The message "I [MSGID: 106496]
>>> [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received
>>> mount req" repeated 33 times between [2019-06-05 08:50:46.361347] and
>>> [2019-06-05 08:52:34.019741]
>>>
>>> The message "E [MSGID: 106061]
>>> [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option
>>> mountbroker-root' missing in glusterd vol file" repeated 33 times between
>>> [2019-06-05 08:50:46.361384] and [2019-06-05 08:52:34.019757]
>>>
>>> The message "W [MSGID: 106176]
>>> [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful
>>> mount request [No such file or directory]" repeated 33 times between
>>> [2019-06-05 08:50:46.361419] and [2019-06-05 08:52:34.019758]
>>>
>>> [2019-06-05 08:52:44.426839] I [MSGID: 106496]
>>> [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received
>>> mount req
>>>
>>> [2019-06-05 08:52:44.426886] E [MSGID: 106061]
>>> [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option
>>> mountbroker-root' missing in glusterd vol file
>>>
>>> [2019-06-05 08:52:44.426896] W [MSGID: 106176]
>>> [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful
>>> mount request [No such file or directory]
>>>
>>
>> On Wed, Jun 5, 2019 at 1:06 AM deepu srinivasan 
>> wrote:
>>
>>> Thankyou Kotresh
>>>
>>> On Tue, Jun 4, 2019, 11:20 PM Kotresh Hiremath Ravishankar <
>>> khire...@redhat.com> wrote:
>>>
>>>> Ccing Sunny, who was investigating a similar issue.
>>>>
>>>> On Tue, Jun 4, 2019 at 5:46 PM deepu srinivasan 
>>>> wrote:
>>>>
>>>>>

Re: [Gluster-users] Geo Replication stops replicating

2019-06-04 Thread Kotresh Hiremath Ravishankar
Ccing Sunny, who was investigating a similar issue.

On Tue, Jun 4, 2019 at 5:46 PM deepu srinivasan  wrote:

> Have already added the path in bashrc . Still in faulty state
>
> On Tue, Jun 4, 2019, 5:27 PM Kotresh Hiremath Ravishankar <
> khire...@redhat.com> wrote:
>
>> could you please try adding /usr/sbin to $PATH for user 'sas'? If it's
>> bash, add 'export PATH=/usr/sbin:$PATH' in
>> /home/sas/.bashrc
>>
>> On Tue, Jun 4, 2019 at 5:24 PM deepu srinivasan 
>> wrote:
>>
>>> Hi Kortesh
>>> Please find the logs of the above error
>>> *Master log snippet*
>>>
>>>> [2019-06-04 11:52:09.254731] I [resource(worker
>>>> /home/sas/gluster/data/code-misc):1379:connect_remote] SSH: Initializing
>>>> SSH connection between master and slave...
>>>>  [2019-06-04 11:52:09.308923] D [repce(worker
>>>> /home/sas/gluster/data/code-misc):196:push] RepceClient: call
>>>> 89724:139652759443264:1559649129.31 __repce_version__() ...
>>>>  [2019-06-04 11:52:09.602792] E [syncdutils(worker
>>>> /home/sas/gluster/data/code-misc):311:log_raise_exception] :
>>>> connection to peer is broken
>>>>  [2019-06-04 11:52:09.603312] E [syncdutils(worker
>>>> /home/sas/gluster/data/code-misc):805:errlog] Popen: command returned error
>>>>   cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
>>>> /var/lib/ glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S
>>>> /tmp/gsyncd-aux-ssh-4aL2tc/d893f66e0addc32f7d0080bb503f5185.sock
>>>> sas@192.168.185.107 /usr/libexec/glusterfs/gsyncd slave code-misc sas@
>>>>   192.168.185.107::code-misc --master-node 192.168.185.106
>>>> --master-node-id 851b64d0-d885-4ae9-9b38-ab5b15db0fec --master-brick
>>>> /home/sas/gluster/data/code-misc --local-node 192.168.185.122 --local-node-
>>>>   id bcaa7af6-c3a1-4411-8e99-4ebecb32eb6a --slave-timeout 120
>>>> --slave-log-level DEBUG --slave-gluster-log-level INFO
>>>> --slave-gluster-command-dir /usr/sbin   error=1
>>>>  [2019-06-04 11:52:09.614996] I [repce(agent
>>>> /home/sas/gluster/data/code-misc):97:service_loop] RepceServer: terminating
>>>> on reaching EOF.
>>>>  [2019-06-04 11:52:09.615545] D [monitor(monitor):271:monitor] Monitor:
>>>> worker(/home/sas/gluster/data/code-misc) connected
>>>>  [2019-06-04 11:52:09.616528] I [monitor(monitor):278:monitor] Monitor:
>>>> worker died in startup phase brick=/home/sas/gluster/data/code-misc
>>>>  [2019-06-04 11:52:09.619391] I
>>>> [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status
>>>> Change status=Faulty
>>>>
>>>
>>> *Slave log snippet*
>>>
>>>> [2019-06-04 11:50:09.782668] E [syncdutils(slave
>>>> 192.168.185.106/home/sas/gluster/data/code-misc):809:logerr] Popen:
>>>> /usr/sbin/gluster> 2 : failed with this errno (No such file or directory)
>>>> [2019-06-04 11:50:11.188167] W [gsyncd(slave
>>>> 192.168.185.125/home/sas/gluster/data/code-misc):305:main] :
>>>> Session config file not exists, using the default config
>>>> path=/var/lib/glusterd/geo-replication/code-misc_192.168.185.107_code-misc/gsyncd.conf
>>>> [2019-06-04 11:50:11.201070] I [resource(slave
>>>> 192.168.185.125/home/sas/gluster/data/code-misc):1098:connect]
>>>> GLUSTER: Mounting gluster volume locally...
>>>> [2019-06-04 11:50:11.271231] E [resource(slave
>>>> 192.168.185.125/home/sas/gluster/data/code-misc):1006:handle_mounter]
>>>> MountbrokerMounter: glusterd answered mnt=
>>>> [2019-06-04 11:50:11.271998] E [syncdutils(slave
>>>> 192.168.185.125/home/sas/gluster/data/code-misc):805:errlog] Popen:
>>>> command returned error cmd=/usr/sbin/gluster --remote-host=localhost
>>>> system:: mount sas user-map-root=sas aux-gfid-mount acl log-level=INFO
>>>> log-file=/var/log/glusterfs/geo-replication-slaves/code-misc_192.168.185.107_code-misc/mnt-192.168.185.125-home-sas-gluster-data-code-misc.log
>>>> volfile-server=localhost volfile-id=code-misc client-pid=-1 error=1
>>>> [2019-06-04 11:50:11.272113] E [syncdutils(slave
>>>> 192.168.185.125/home/sas/gluster/data/code-misc):809:logerr] Popen:
>>>> /usr/sbin/gluster> 2 : failed with this errno (No such file or directory)
>>>
>>>
>>> On Tue, Jun 4, 2019 at 5:10 PM deepu srinivasan 
>>> wrote:
>>>
>>>> Hi
>>>&

Re: [Gluster-users] Geo Replication stops replicating

2019-06-04 Thread Kotresh Hiremath Ravishankar
could you please try adding /usr/sbin to $PATH for user 'sas'? If it's
bash, add 'export PATH=/usr/sbin:$PATH' in
/home/sas/.bashrc
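
A rough sketch of that (the 'sas' user, home directory and slave host are taken from this thread; the host name is a placeholder):

# on each slave node
echo 'export PATH=/usr/sbin:$PATH' >> /home/sas/.bashrc
# verify that a non-interactive ssh command run as 'sas' can now find the gluster binary
ssh sas@<slave-host> 'which gluster'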

On Tue, Jun 4, 2019 at 5:24 PM deepu srinivasan  wrote:

> Hi Kortesh
> Please find the logs of the above error
> *Master log snippet*
>
>> [2019-06-04 11:52:09.254731] I [resource(worker
>> /home/sas/gluster/data/code-misc):1379:connect_remote] SSH: Initializing
>> SSH connection between master and slave...
>>  [2019-06-04 11:52:09.308923] D [repce(worker
>> /home/sas/gluster/data/code-misc):196:push] RepceClient: call
>> 89724:139652759443264:1559649129.31 __repce_version__() ...
>>  [2019-06-04 11:52:09.602792] E [syncdutils(worker
>> /home/sas/gluster/data/code-misc):311:log_raise_exception] :
>> connection to peer is broken
>>  [2019-06-04 11:52:09.603312] E [syncdutils(worker
>> /home/sas/gluster/data/code-misc):805:errlog] Popen: command returned error
>>   cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
>> /var/lib/ glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S
>> /tmp/gsyncd-aux-ssh-4aL2tc/d893f66e0addc32f7d0080bb503f5185.sock
>> sas@192.168.185.107 /usr/libexec/glusterfs/gsyncd slave code-misc sas@
>> 192.168.185.107::code-misc --master-node 192.168.185.106
>> --master-node-id 851b64d0-d885-4ae9-9b38-ab5b15db0fec --master-brick
>> /home/sas/gluster/data/code-misc --local-node 192.168.185.122 --local-node-
>>   id bcaa7af6-c3a1-4411-8e99-4ebecb32eb6a --slave-timeout 120
>> --slave-log-level DEBUG --slave-gluster-log-level INFO
>> --slave-gluster-command-dir /usr/sbin   error=1
>>  [2019-06-04 11:52:09.614996] I [repce(agent
>> /home/sas/gluster/data/code-misc):97:service_loop] RepceServer: terminating
>> on reaching EOF.
>>  [2019-06-04 11:52:09.615545] D [monitor(monitor):271:monitor] Monitor:
>> worker(/home/sas/gluster/data/code-misc) connected
>>  [2019-06-04 11:52:09.616528] I [monitor(monitor):278:monitor] Monitor:
>> worker died in startup phase brick=/home/sas/gluster/data/code-misc
>>  [2019-06-04 11:52:09.619391] I
>> [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status
>> Change status=Faulty
>>
>
> *Slave log snippet*
>
>> [2019-06-04 11:50:09.782668] E [syncdutils(slave
>> 192.168.185.106/home/sas/gluster/data/code-misc):809:logerr] Popen:
>> /usr/sbin/gluster> 2 : failed with this errno (No such file or directory)
>> [2019-06-04 11:50:11.188167] W [gsyncd(slave
>> 192.168.185.125/home/sas/gluster/data/code-misc):305:main] :
>> Session config file not exists, using the default config
>> path=/var/lib/glusterd/geo-replication/code-misc_192.168.185.107_code-misc/gsyncd.conf
>> [2019-06-04 11:50:11.201070] I [resource(slave
>> 192.168.185.125/home/sas/gluster/data/code-misc):1098:connect] GLUSTER:
>> Mounting gluster volume locally...
>> [2019-06-04 11:50:11.271231] E [resource(slave
>> 192.168.185.125/home/sas/gluster/data/code-misc):1006:handle_mounter]
>> MountbrokerMounter: glusterd answered mnt=
>> [2019-06-04 11:50:11.271998] E [syncdutils(slave
>> 192.168.185.125/home/sas/gluster/data/code-misc):805:errlog] Popen:
>> command returned error cmd=/usr/sbin/gluster --remote-host=localhost
>> system:: mount sas user-map-root=sas aux-gfid-mount acl log-level=INFO
>> log-file=/var/log/glusterfs/geo-replication-slaves/code-misc_192.168.185.107_code-misc/mnt-192.168.185.125-home-sas-gluster-data-code-misc.log
>> volfile-server=localhost volfile-id=code-misc client-pid=-1 error=1
>> [2019-06-04 11:50:11.272113] E [syncdutils(slave
>> 192.168.185.125/home/sas/gluster/data/code-misc):809:logerr] Popen:
>> /usr/sbin/gluster> 2 : failed with this errno (No such file or directory)
>
>
> On Tue, Jun 4, 2019 at 5:10 PM deepu srinivasan 
> wrote:
>
>> Hi
>> As discussed I have upgraded gluster from 4.1 to 6.2 version. But the Geo
>> replication failed to start.
>> Stays in faulty state
>>
>> On Fri, May 31, 2019, 5:32 PM deepu srinivasan 
>> wrote:
>>
>>> Checked the data. It remains in 2708. No progress.
>>>
>>> On Fri, May 31, 2019 at 4:36 PM Kotresh Hiremath Ravishankar <
>>> khire...@redhat.com> wrote:
>>>
>>>> That means it could be working and the defunct process might be some
>>>> old zombie one. Could you check, that data progress ?
>>>>
>>>> On Fri, May 31, 2019 at 4:29 PM deepu srinivasan 
>>>> wrote:
>>>>
>>>>> Hi
>>>>> When i change the rsync option the rsync process doesnt seem to start
>>>>> . Only a defunt process is listed

Re: [Gluster-users] Geo Replication stops replicating

2019-05-31 Thread Kotresh Hiremath Ravishankar
That means it could be working and the defunct process might be some old
zombie. Could you check whether the data sync is progressing?
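
A couple of quick ways to check (volume/host names and mount points are placeholders):

# per-worker sync status and counters for the session
gluster volume geo-replication <mastervol> <slavehost>::<slavevol> status detail
# or simply compare used space:
#   on a master client:  du -sh <master-mount>
#   on a slave client:   du -sh <slave-mount>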

On Fri, May 31, 2019 at 4:29 PM deepu srinivasan  wrote:

> Hi
> When i change the rsync option the rsync process doesnt seem to start .
> Only a defunt process is listed in ps aux. Only when i set rsync option to
> " " and restart all the process the rsync process is listed in ps aux.
>
>
> On Fri, May 31, 2019 at 4:23 PM Kotresh Hiremath Ravishankar <
> khire...@redhat.com> wrote:
>
>> Yes, rsync config option should have fixed this issue.
>>
>> Could you share the output of the following?
>>
>> 1. gluster volume geo-replication  ::
>> config rsync-options
>> 2. ps -ef | grep rsync
>>
>> On Fri, May 31, 2019 at 4:11 PM deepu srinivasan 
>> wrote:
>>
>>> Done.
>>> We got the following result .
>>>
>>>> 1559298781.338234 write(2, "rsync: link_stat
>>>> \"/tmp/gsyncd-aux-mount-EEJ_sY/.gfid/3fa6aed8-802e-4efe-9903-8bc171176d88\"
>>>> failed: No such file or directory (2)", 128
>>>
>>> seems like a file is missing ?
>>>
>>> On Fri, May 31, 2019 at 3:25 PM Kotresh Hiremath Ravishankar <
>>> khire...@redhat.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Could you take the strace with with more string size? The argument
>>>> strings are truncated.
>>>>
>>>> strace -s 500 -ttt -T -p 
>>>>
>>>> On Fri, May 31, 2019 at 3:17 PM deepu srinivasan 
>>>> wrote:
>>>>
>>>>> Hi Kotresh
>>>>> The above-mentioned work around did not work properly.
>>>>>
>>>>> On Fri, May 31, 2019 at 3:16 PM deepu srinivasan 
>>>>> wrote:
>>>>>
>>>>>> Hi Kotresh
>>>>>> We have tried the above-mentioned rsync option and we are planning to
>>>>>> have the version upgrade to 6.0.
>>>>>>
>>>>>> On Fri, May 31, 2019 at 11:04 AM Kotresh Hiremath Ravishankar <
>>>>>> khire...@redhat.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> This looks like the hang because stderr buffer filled up with errors
>>>>>>> messages and no one reading it.
>>>>>>> I think this issue is fixed in latest releases. As a workaround, you
>>>>>>> can do following and check if it works.
>>>>>>>
>>>>>>> Prerequisite:
>>>>>>>  rsync version should be > 3.1.0
>>>>>>>
>>>>>>> Workaround:
>>>>>>> gluster volume geo-replication  ::
>>>>>>> config rsync-options "--ignore-missing-args"
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Kotresh HR
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, May 30, 2019 at 5:39 PM deepu srinivasan 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi
>>>>>>>> We were evaluating Gluster geo Replication between two DCs one is
>>>>>>>> in US west and one is in US east. We took multiple trials for different
>>>>>>>> file size.
>>>>>>>> The Geo Replication tends to stop replicating but while checking
>>>>>>>> the status it appears to be in Active state. But the slave volume did 
>>>>>>>> not
>>>>>>>> increase in size.
>>>>>>>> So we have restarted the geo-replication session and checked the
>>>>>>>> status. The status was in an active state and it was in History Crawl 
>>>>>>>> for a
>>>>>>>> long time. We have enabled the DEBUG mode in logging and checked for 
>>>>>>>> any
>>>>>>>> error.
>>>>>>>> There was around 2000 file appeared for syncing candidate. The
>>>>>>>> Rsync process starts but the rsync did not happen in the slave volume.
>>>>>>>> Every time the rsync process appears in the "ps auxxx" list but the
>>>>>>>> replication did not happen in the slave end. What would be the cause of
>>>>>>>> this problem? Is there anyway to debug it?
>>>>>>>>
>>>>>>>> We have also checked the strace of the rync program.
>>>>>>>> it displays something like this
>>>>>>>>
>>>>>>>> "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128"
>>>>>>>>
>>>>>>>>
>>>>>>>> We are using the below specs
>>>>>>>>
>>>>>>>> Gluster version - 4.1.7
>>>>>>>> Sync mode - rsync
>>>>>>>> Volume - 1x3 in each end (master and slave)
>>>>>>>> Intranet Bandwidth - 10 Gig
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Thanks and Regards,
>>>>>>> Kotresh H R
>>>>>>>
>>>>>>
>>>>
>>>> --
>>>> Thanks and Regards,
>>>> Kotresh H R
>>>>
>>>
>>
>> --
>> Thanks and Regards,
>> Kotresh H R
>>
>

-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Geo Replication stops replicating

2019-05-31 Thread Kotresh Hiremath Ravishankar
Yes, the rsync config option should have fixed this issue.

Could you share the output of the following?

1. gluster volume geo-replication <mastervol> <slavehost>::<slavevol>
config rsync-options
2. ps -ef | grep rsync

On Fri, May 31, 2019 at 4:11 PM deepu srinivasan  wrote:

> Done.
> We got the following result .
>
>> 1559298781.338234 write(2, "rsync: link_stat
>> \"/tmp/gsyncd-aux-mount-EEJ_sY/.gfid/3fa6aed8-802e-4efe-9903-8bc171176d88\"
>> failed: No such file or directory (2)", 128
>
> seems like a file is missing ?
>
> On Fri, May 31, 2019 at 3:25 PM Kotresh Hiremath Ravishankar <
> khire...@redhat.com> wrote:
>
>> Hi,
>>
>> Could you take the strace with with more string size? The argument
>> strings are truncated.
>>
>> strace -s 500 -ttt -T -p 
>>
>> On Fri, May 31, 2019 at 3:17 PM deepu srinivasan 
>> wrote:
>>
>>> Hi Kotresh
>>> The above-mentioned work around did not work properly.
>>>
>>> On Fri, May 31, 2019 at 3:16 PM deepu srinivasan 
>>> wrote:
>>>
>>>> Hi Kotresh
>>>> We have tried the above-mentioned rsync option and we are planning to
>>>> have the version upgrade to 6.0.
>>>>
>>>> On Fri, May 31, 2019 at 11:04 AM Kotresh Hiremath Ravishankar <
>>>> khire...@redhat.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> This looks like the hang because stderr buffer filled up with errors
>>>>> messages and no one reading it.
>>>>> I think this issue is fixed in latest releases. As a workaround, you
>>>>> can do following and check if it works.
>>>>>
>>>>> Prerequisite:
>>>>>  rsync version should be > 3.1.0
>>>>>
>>>>> Workaround:
>>>>> gluster volume geo-replication  ::
>>>>> config rsync-options "--ignore-missing-args"
>>>>>
>>>>> Thanks,
>>>>> Kotresh HR
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, May 30, 2019 at 5:39 PM deepu srinivasan 
>>>>> wrote:
>>>>>
>>>>>> Hi
>>>>>> We were evaluating Gluster geo Replication between two DCs one is in
>>>>>> US west and one is in US east. We took multiple trials for different file
>>>>>> size.
>>>>>> The Geo Replication tends to stop replicating but while checking the
>>>>>> status it appears to be in Active state. But the slave volume did not
>>>>>> increase in size.
>>>>>> So we have restarted the geo-replication session and checked the
>>>>>> status. The status was in an active state and it was in History Crawl 
>>>>>> for a
>>>>>> long time. We have enabled the DEBUG mode in logging and checked for any
>>>>>> error.
>>>>>> There was around 2000 file appeared for syncing candidate. The Rsync
>>>>>> process starts but the rsync did not happen in the slave volume. Every 
>>>>>> time
>>>>>> the rsync process appears in the "ps auxxx" list but the replication did
>>>>>> not happen in the slave end. What would be the cause of this problem? Is
>>>>>> there anyway to debug it?
>>>>>>
>>>>>> We have also checked the strace of the rync program.
>>>>>> it displays something like this
>>>>>>
>>>>>> "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128"
>>>>>>
>>>>>>
>>>>>> We are using the below specs
>>>>>>
>>>>>> Gluster version - 4.1.7
>>>>>> Sync mode - rsync
>>>>>> Volume - 1x3 in each end (master and slave)
>>>>>> Intranet Bandwidth - 10 Gig
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Thanks and Regards,
>>>>> Kotresh H R
>>>>>
>>>>
>>
>> --
>> Thanks and Regards,
>> Kotresh H R
>>
>

-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Geo Replication stops replicating

2019-05-31 Thread Kotresh Hiremath Ravishankar
Hi,

Could you take the strace with a larger string size? The argument strings
are truncated.

strace -s 500 -ttt -T -p <pid>
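
For example (a sketch; it assumes the geo-rep rsync worker is currently running, and the pid is a placeholder):

# find the rsync pid spawned by the worker (pick the right one if several show up)
ps -ef | grep [r]sync
# attach with a larger string limit; -o writes the trace to a file
strace -s 500 -ttt -T -p <rsync-pid> -o /tmp/rsync-strace.out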

On Fri, May 31, 2019 at 3:17 PM deepu srinivasan  wrote:

> Hi Kotresh
> The above-mentioned work around did not work properly.
>
> On Fri, May 31, 2019 at 3:16 PM deepu srinivasan 
> wrote:
>
>> Hi Kotresh
>> We have tried the above-mentioned rsync option and we are planning to
>> have the version upgrade to 6.0.
>>
>> On Fri, May 31, 2019 at 11:04 AM Kotresh Hiremath Ravishankar <
>> khire...@redhat.com> wrote:
>>
>>> Hi,
>>>
>>> This looks like the hang because stderr buffer filled up with errors
>>> messages and no one reading it.
>>> I think this issue is fixed in latest releases. As a workaround, you can
>>> do following and check if it works.
>>>
>>> Prerequisite:
>>>  rsync version should be > 3.1.0
>>>
>>> Workaround:
>>> gluster volume geo-replication  ::
>>> config rsync-options "--ignore-missing-args"
>>>
>>> Thanks,
>>> Kotresh HR
>>>
>>>
>>>
>>>
>>> On Thu, May 30, 2019 at 5:39 PM deepu srinivasan 
>>> wrote:
>>>
>>>> Hi
>>>> We were evaluating Gluster geo Replication between two DCs one is in US
>>>> west and one is in US east. We took multiple trials for different file
>>>> size.
>>>> The Geo Replication tends to stop replicating but while checking the
>>>> status it appears to be in Active state. But the slave volume did not
>>>> increase in size.
>>>> So we have restarted the geo-replication session and checked the
>>>> status. The status was in an active state and it was in History Crawl for a
>>>> long time. We have enabled the DEBUG mode in logging and checked for any
>>>> error.
>>>> There was around 2000 file appeared for syncing candidate. The Rsync
>>>> process starts but the rsync did not happen in the slave volume. Every time
>>>> the rsync process appears in the "ps auxxx" list but the replication did
>>>> not happen in the slave end. What would be the cause of this problem? Is
>>>> there anyway to debug it?
>>>>
>>>> We have also checked the strace of the rync program.
>>>> it displays something like this
>>>>
>>>> "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128"
>>>>
>>>>
>>>> We are using the below specs
>>>>
>>>> Gluster version - 4.1.7
>>>> Sync mode - rsync
>>>> Volume - 1x3 in each end (master and slave)
>>>> Intranet Bandwidth - 10 Gig
>>>>
>>>
>>>
>>> --
>>> Thanks and Regards,
>>> Kotresh H R
>>>
>>

-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Geo Replication stops replicating

2019-05-30 Thread Kotresh Hiremath Ravishankar
Hi,

This looks like a hang because the stderr buffer filled up with error
messages and nothing was reading it.
I think this issue is fixed in the latest releases. As a workaround, you can
do the following and check if it works.

Prerequisite:
 rsync version should be > 3.1.0

Workaround:
gluster volume geo-replication <mastervol> <slavehost>::<slavevol> config
rsync-options "--ignore-missing-args"
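
Put together, a rough end-to-end sequence would be (volume and host names are placeholders for your session; restarting is just a safe way to make sure the workers pick up the change):

# check rsync on master and slave nodes (must be > 3.1.0 for --ignore-missing-args)
rsync --version | head -n 1
# set the option on the geo-rep session
gluster volume geo-replication <mastervol> <slavehost>::<slavevol> config rsync-options "--ignore-missing-args"
# restart the session so the new option is used
gluster volume geo-replication <mastervol> <slavehost>::<slavevol> stop
gluster volume geo-replication <mastervol> <slavehost>::<slavevol> start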

Thanks,
Kotresh HR




On Thu, May 30, 2019 at 5:39 PM deepu srinivasan  wrote:

> Hi
> We were evaluating Gluster geo Replication between two DCs one is in US
> west and one is in US east. We took multiple trials for different file
> size.
> The Geo Replication tends to stop replicating but while checking the
> status it appears to be in Active state. But the slave volume did not
> increase in size.
> So we have restarted the geo-replication session and checked the status.
> The status was in an active state and it was in History Crawl for a long
> time. We have enabled the DEBUG mode in logging and checked for any error.
> There was around 2000 file appeared for syncing candidate. The Rsync
> process starts but the rsync did not happen in the slave volume. Every time
> the rsync process appears in the "ps auxxx" list but the replication did
> not happen in the slave end. What would be the cause of this problem? Is
> there anyway to debug it?
>
> We have also checked the strace of the rync program.
> it displays something like this
>
> "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128"
>
>
> We are using the below specs
>
> Gluster version - 4.1.7
> Sync mode - rsync
> Volume - 1x3 in each end (master and slave)
> Intranet Bandwidth - 10 Gig
>


-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied

2019-03-20 Thread Kotresh Hiremath Ravishankar
):293:main] : Session config file not
>> exists, using the default config
>> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>>
>> [2018-09-25 13:49:36.629415] I [resource(slave
>> master/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume
>> locally...
>>
>> [2018-09-25 13:49:37.701642] I [resource(slave
>> master/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster
>> volume duration=1.0718
>>
>> [2018-09-25 13:49:37.704282] I [resource(slave
>> master/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening
>>
>> [2018-09-25 14:10:27.70952] I [repce(slave
>> master/bricks/brick1/brick):80:service_loop] RepceServer: terminating on
>> reaching EOF.
>>
>> [2018-09-25 14:10:39.632124] W [gsyncd(slave
>> master/bricks/brick1/brick):293:main] : Session config file not
>> exists, using the default config
>> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>>
>> [2018-09-25 14:10:39.650958] I [resource(slave
>> master/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume
>> locally...
>>
>> [2018-09-25 14:10:40.729355] I [resource(slave
>> master/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster
>> volume duration=1.0781
>>
>> [2018-09-25 14:10:40.730650] I [resource(slave
>> master/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening
>>
>> [2018-09-25 14:31:10.291064] I [repce(slave
>> master/bricks/brick1/brick):80:service_loop] RepceServer: terminating on
>> reaching EOF.
>>
>> [2018-09-25 14:31:22.802237] W [gsyncd(slave
>> master/bricks/brick1/brick):293:main] : Session config file not
>> exists, using the default config
>> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>>
>> [2018-09-25 14:31:22.828418] I [resource(slave
>> master/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume
>> locally...
>>
>> [2018-09-25 14:31:23.910206] I [resource(slave
>> master/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster
>> volume duration=1.0813
>>
>> [2018-09-25 14:31:23.913369] I [resource(slave
>> master/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening
>>
>>
>>
>> Any ideas how to resolve this without re-creating everything again? Can I
>> reset the changelog history?
>>
>>
>>
>> Regards,
>>
>> Christian
>>
>>
>>
>> *From: * on behalf of "Kotte,
>> Christian (Ext)" 
>> *Date: *Monday, 24. September 2018 at 17:20
>> *To: *Kotresh Hiremath Ravishankar 
>> *Cc: *Gluster Users 
>> *Subject: *Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log
>> OSError: [Errno 13] Permission denied
>>
>>
>>
>> I don’t configure the permissions of /bricks/brick1/brick/.glusterfs. I
>> don’t even see it on the local GlusterFS mount.
>>
>>
>>
>> Not sure why the permissions are configured with S and the AD group…
>>
>>
>>
>> Regards,
>>
>> Christian
>>
>>
>>
>> *From: * on behalf of "Kotte,
>> Christian (Ext)" 
>> *Date: *Monday, 24. September 2018 at 17:10
>> *To: *Kotresh Hiremath Ravishankar 
>> *Cc: *Gluster Users 
>> *Subject: *Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log
>> OSError: [Errno 13] Permission denied
>>
>>
>>
>> Yeah right. I get permission denied.
>>
>>
>>
>> [geoaccount@slave ~]$ ll
>> /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
>>
>> ls: cannot access
>> /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e:
>> Permission denied
>>
>> [geoaccount@slave ~]$ ll /bricks/brick1/brick/.glusterfs/29/d1/
>>
>> ls: cannot access /bricks/brick1/brick/.glusterfs/29/d1/: Permission
>> denied
>>
>> [geoaccount@slave ~]$ ll /bricks/brick1/brick/.glusterfs/29/
>>
>> ls: cannot access /bricks/brick1/brick/.glusterfs/29/: Permission denied
>>
>> [geoaccount@slave ~]$ ll /bricks/brick1/brick/.glusterfs/
>>
>> ls: cannot open directory /bricks/brick1/brick/.glusterfs/: Permission
>> denied
>>
>>
>>
>> [root@slave ~]# ll /bricks/brick1/brick/.glusterfs/29
>>
>> total 0
>>
>> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 16
>>
>> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 33
>>
>> drwx--S---+ 2 root AD+gro

Re: [Gluster-users] [Gluster-devel] Bitrot: Time of signing depending on the file size???

2019-03-05 Thread Kotresh Hiremath Ravishankar
Hi David,

Thanks for raising the bug. But from the above validation, it's clear that
bitrot is not directly involved. Bitrot waits for the last fd to be closed. We
will have to investigate why the fd is not being closed for large files.
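
A rough way to check whether the brick process still holds an fd on the file (volume name, brick pid and file name are placeholders; run this on the brick node):

# find the brick process PID
gluster volume status <volname>
# look for the file among that brick process's open descriptors
ls -l /proc/<brick-pid>/fd | grep <file-name>
# (lsof -p <brick-pid> works too; the fd may show up under the
#  .glusterfs/<gfid> path rather than the file's real name)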

Thanks,
Kotresh HR

On Mon, Mar 4, 2019 at 3:13 PM David Spisla  wrote:

> Hello Kotresh,
>
> Yes, the fd was still open for larger files. I could verify this with a
> 500MiB file and some smaller files. After a specific time only the fd for
> the 500MiB was up and the file still had no signature, for the smaller
> files there were no fds and they already had a signature. I don't know the
> reason for this. Maybe the client still keep th fd open? I opened a bug for
> this:
> https://bugzilla.redhat.com/show_bug.cgi?id=1685023
>
> Regards
> David
>
> Am Fr., 1. März 2019 um 18:29 Uhr schrieb Kotresh Hiremath Ravishankar <
> khire...@redhat.com>:
>
>> Interesting observation! But as discussed in the thread bitrot signing
>> processes depends 2 min timeout (by default) after last fd closes. It
>> doesn't have any co-relation with the size of the file.
>> Did you happen to verify that the fd was still open for large files for
>> some reason?
>>
>>
>>
>> On Fri, Mar 1, 2019 at 1:19 PM David Spisla  wrote:
>>
>>> Hello folks,
>>>
>>> I did some observations concerning the bitrot daemon. It seems to be
>>> that the bitrot signer is signing files depending on file size. I copied
>>> files with different sizes into a volume and I was wonderung because the
>>> files get their signature not the same time (I keep the expiry time default
>>> with 120). Here are some examples:
>>>
>>> 300 KB file ~2-3 m
>>> 70 MB file ~ 40 m
>>> 115 MB file ~ 1 Sh
>>> 800 MB file ~ 4,5 h
>>>
>>> What is the expected behaviour here?
>>> Why does it take so long to sign a 800MB file?
>>> What about 500GB or 1TB?
>>> Is there a way to speed up the sign process?
>>>
>>> My ambition is to understand this observation
>>>
>>> Regards
>>> David Spisla
>>> ___
>>> Gluster-devel mailing list
>>> gluster-de...@gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>>
>>
>> --
>> Thanks and Regards,
>> Kotresh H R
>>
>

-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [Gluster-devel] Bitrot: Time of signing depending on the file size???

2019-03-01 Thread Kotresh Hiremath Ravishankar
Interesting observation! But as discussed in the thread, the bitrot signing
process depends on a 2 min timeout (by default) after the last fd closes. It
doesn't have any correlation with the size of the file.
Did you happen to verify whether the fd was still open for the large files
for some reason?
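
If it helps while checking, the signing can also be observed directly on the brick: once the signer has processed a file it gets a signature xattr. A rough check (brick path and file name are placeholders; run on a brick node):

getfattr -d -m . -e hex /path/to/brick/<file> | grep -i bit-rot
# trusted.bit-rot.signature should appear only after the file has been signed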



On Fri, Mar 1, 2019 at 1:19 PM David Spisla  wrote:

> Hello folks,
>
> I did some observations concerning the bitrot daemon. It seems to be that
> the bitrot signer is signing files depending on file size. I copied files
> with different sizes into a volume and I was wonderung because the files
> get their signature not the same time (I keep the expiry time default with
> 120). Here are some examples:
>
> 300 KB file ~2-3 m
> 70 MB file ~ 40 m
> 115 MB file ~ 1 Sh
> 800 MB file ~ 4,5 h
>
> What is the expected behaviour here?
> Why does it take so long to sign a 800MB file?
> What about 500GB or 1TB?
> Is there a way to speed up the sign process?
>
> My ambition is to understand this observation
>
> Regards
> David Spisla
> ___
> Gluster-devel mailing list
> gluster-de...@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel



-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Elasticsearch on gluster

2018-11-13 Thread Kotresh Hiremath Ravishankar
Hi,

I tried setting up Elasticsearch on gluster. Here is a note on it. Hope
it helps someone trying to set up an ELK stack on gluster.

https://hrkscribbles.blogspot.com/2018/11/elastic-search-on-gluster.html

-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied

2018-09-24 Thread Kotresh Hiremath Ravishankar
ting to connect to
> host:(null), port:0
>
> [2018-09-24 13:51:10.643489] W [dict.c:923:str_to_data]
> (-->/usr/lib64/glusterfs/4.1.3/xlator/protocol/client.so(+0x4131a)
> [0x7fafb023831a] -->/lib64/libglusterfs.so.0(dict_set_str+0x16)
> [0x7fafbdb83266] -->/lib64/libglusterfs.so.0(str_to_data+0x91)
> [0x7fafbdb7fea1] ) 0-dict: value is NULL [Invalid argument]
>
> [2018-09-24 13:51:10.643507] I [MSGID: 114006]
> [client-handshake.c:1308:client_setvolume] 0-glustervol1-client-0: failed
> to set process-name in handshake msg
>
> [2018-09-24 13:51:10.643541] W [rpc-clnt.c:1753:rpc_clnt_submit]
> 0-glustervol1-client-0: error returned while attempting to connect to
> host:(null), port:0
>
> [2018-09-24 13:51:10.671460] I [MSGID: 114046]
> [client-handshake.c:1176:client_setvolume_cbk] 0-glustervol1-client-0:
> Connected to glustervol1-client-0, attached to remote volume
> '/bricks/brick1/brick'.
>
> [2018-09-24 13:51:10.672694] I [fuse-bridge.c:4294:fuse_init]
> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel
> 7.22
>
> [2018-09-24 13:51:10.672715] I [fuse-bridge.c:4927:fuse_graph_sync]
> 0-fuse: switched to graph 0
>
> [2018-09-24 13:51:10.673329] I [MSGID: 109005]
> [dht-selfheal.c:2342:dht_selfheal_directory] 0-glustervol1-dht: Directory
> selfheal failed: Unable to form layout for directory /
>
> [2018-09-24 13:51:16.116458] I [fuse-bridge.c:5199:fuse_thread_proc]
> 0-fuse: initating unmount of
> /var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E
>
> [2018-09-24 13:51:16.116595] W [glusterfsd.c:1514:cleanup_and_exit]
> (-->/lib64/libpthread.so.0(+0x7e25) [0x7fafbc9eee25]
> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55d5dac5dd65]
> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55d5dac5db8b] ) 0-:
> received signum (15), shutting down
>
> [2018-09-24 13:51:16.116616] I [fuse-bridge.c:5981:fini] 0-fuse:
> Unmounting '/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E'.
>
> [2018-09-24 13:51:16.116625] I [fuse-bridge.c:5986:fini] 0-fuse: Closing
> fuse connection to '/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E'.
>
>
>
> Regards,
>
> Christian
>
>
>
> *From: *Kotresh Hiremath Ravishankar 
> *Date: *Saturday, 22. September 2018 at 06:52
> *To: *"Kotte, Christian (Ext)" 
> *Cc: *Gluster Users 
> *Subject: *Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log
> OSError: [Errno 13] Permission denied
>
>
>
> The problem occured on slave side whose error is propagated to master.
> Mostly any traceback with repce involved is related to problem in slave.
> Just check few lines above in the log to find the slave node, the crashed
> worker is connected to and get geo replication logs to further debug.
>
>
>
>
>
>
>
>
>
>
>
> On Fri, 21 Sep 2018, 20:10 Kotte, Christian (Ext), <
> christian.ko...@novartis.com> wrote:
>
> Hi,
>
>
>
> Any idea how to troubleshoot this?
>
>
>
> New folders and files were created on the master and the replication went
> faulty. They were created via Samba.
>
>
>
> Version: GlusterFS 4.1.3
>
>
>
> [root@master]# gluster volume geo-replication status
>
>
>
> MASTER NODE MASTER VOL MASTER BRICK
> SLAVE USER
> SLAVE SLAVE
> NODESTATUSCRAWL STATUSLAST_SYNCED
>
>
> -
>
> masterglustervol1/bricks/brick1/brickgeoaccount
> ssh://geoaccount@slave_1::glustervol1   N/A   Faulty
> N/A N/A
>
> masterglustervol1/bricks/brick1/brickgeoaccount
> ssh://geoaccount@slave_2::glustervol1   N/A   Faulty
> N/A N/A
>
> masterglustervol1/bricks/brick1/brickgeoaccount
> ssh://geoaccount@interimmaster::glustervol1   N/A   Faulty
> N/A N/A
>
>
>
> The following error is repeatedly logged in the gsyncd.logs:
>
> [2018-09-21 14:26:38.611479] I [repce(agent
> /bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching
> EOF.
>
> [2018-09-21 14:26:39.211527] I [monitor(monitor):279:monitor] Monitor:
> worker died in startup phase brick=/bricks/brick1/brick
>
> [2018-09-21 14:26:39.214322] I
> [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status
> Change status=Faulty
>
> [2018-09-21 14:26:49.318953] I [monitor(monitor):158:monitor] Monitor:
> starting gsyncd worker   brick=/

Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied

2018-09-21 Thread Kotresh Hiremath Ravishankar
The problem occurred on the slave side, and its error is propagated to the master.
Mostly, any traceback with repce involved points to a problem on the slave.
Just check a few lines above in the log to find the slave node the crashed
worker is connected to, and get the geo-replication logs from that node to debug further.
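
As a concrete sketch of that flow (log locations are the defaults; the session directory name depends on your master/slave volume names):

# on the master node: the worker/monitor log records which slave node it connects to
grep slave_node /var/log/glusterfs/geo-replication/<session-dir>/gsyncd.log | tail
# on that slave node: check the slave-side gsyncd log for the actual error
tail -n 100 /var/log/glusterfs/geo-replication-slaves/<session-dir>/gsyncd.log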





On Fri, 21 Sep 2018, 20:10 Kotte, Christian (Ext), <
christian.ko...@novartis.com> wrote:

> Hi,
>
>
>
> Any idea how to troubleshoot this?
>
>
>
> New folders and files were created on the master and the replication went
> faulty. They were created via Samba.
>
>
>
> Version: GlusterFS 4.1.3
>
>
>
> [root@master]# gluster volume geo-replication status
>
>
>
> MASTER NODE MASTER VOL MASTER BRICK
> SLAVE USER
> SLAVE SLAVE
> NODESTATUSCRAWL STATUSLAST_SYNCED
>
>
> -
>
> masterglustervol1/bricks/brick1/brickgeoaccount
> ssh://geoaccount@slave_1::glustervol1   N/A   Faulty
> N/A N/A
>
> masterglustervol1/bricks/brick1/brickgeoaccount
> ssh://geoaccount@slave_2::glustervol1   N/A   Faulty
> N/A N/A
>
> masterglustervol1/bricks/brick1/brickgeoaccount
> ssh://geoaccount@interimmaster::glustervol1   N/A   Faulty
> N/A N/A
>
>
>
> The following error is repeatedly logged in the gsyncd.logs:
>
> [2018-09-21 14:26:38.611479] I [repce(agent
> /bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching
> EOF.
>
> [2018-09-21 14:26:39.211527] I [monitor(monitor):279:monitor] Monitor:
> worker died in startup phase brick=/bricks/brick1/brick
>
> [2018-09-21 14:26:39.214322] I
> [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status
> Change status=Faulty
>
> [2018-09-21 14:26:49.318953] I [monitor(monitor):158:monitor] Monitor:
> starting gsyncd worker   brick=/bricks/brick1/brick  slave_node=
> nrchbs-slp2020.nibr.novartis.net
>
> [2018-09-21 14:26:49.471532] I [gsyncd(agent
> /bricks/brick1/brick):297:main] : Using session config file
> path=/var/lib/glusterd/geo-replication/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/gsyncd.conf
>
> [2018-09-21 14:26:49.473917] I [changelogagent(agent
> /bricks/brick1/brick):72:__init__] ChangelogAgent: Agent listining...
>
> [2018-09-21 14:26:49.491359] I [gsyncd(worker
> /bricks/brick1/brick):297:main] : Using session config file
> path=/var/lib/glusterd/geo-replication/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/gsyncd.conf
>
> [2018-09-21 14:26:49.538049] I [resource(worker
> /bricks/brick1/brick):1377:connect_remote] SSH: Initializing SSH connection
> between master and slave...
>
> [2018-09-21 14:26:53.5017] I [resource(worker
> /bricks/brick1/brick):1424:connect_remote] SSH: SSH connection between
> master and slave established.  duration=3.4665
>
> [2018-09-21 14:26:53.5419] I [resource(worker
> /bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume
> locally...
>
> [2018-09-21 14:26:54.120374] I [resource(worker
> /bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume
> duration=1.1146
>
> [2018-09-21 14:26:54.121012] I [subcmds(worker
> /bricks/brick1/brick):70:subcmd_worker] : Worker spawn successful.
> Acknowledging back to monitor
>
> [2018-09-21 14:26:56.144460] I [master(worker
> /bricks/brick1/brick):1593:register] _GMaster: Working dir
> path=/var/lib/misc/gluster/gsyncd/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/bricks-brick1-brick
>
> [2018-09-21 14:26:56.145145] I [resource(worker
> /bricks/brick1/brick):1282:service_loop] GLUSTER: Register time
> time=1537540016
>
> [2018-09-21 14:26:56.160064] I [gsyncdstatus(worker
> /bricks/brick1/brick):277:set_active] GeorepStatus: Worker Status Change
> status=Active
>
> [2018-09-21 14:26:56.161175] I [gsyncdstatus(worker
> /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl
> Status Changestatus=History Crawl
>
> [2018-09-21 14:26:56.161536] I [master(worker
> /bricks/brick1/brick):1507:crawl] _GMaster: starting history crawl
> turns=1 stime=(1537522637, 0)   entry_stime=(1537537141, 0)
> etime=1537540016
>
> [2018-09-21 14:26:56.164277] I [master(worker
> /bricks/brick1/brick):1536:crawl] _GMaster: slave's time
> stime=(1537522637, 0)
>
> [2018-09-21 14:26:56.197065] I [master(worker
> /bricks/brick1/brick):1360:process] _GMaster: Skipping already processed
> entry opsto_changelog=1537522638 num_changelogs=1
> from_changelog=1537522638
>
> [2018-09-21 14:26:56.197402] I [master(worker
> /bricks/brick1/brick):1374:process] _GMaster: Entry Time TakenMKD=0
> MKN=0   LIN=0   SYM=0   REN=0   RMD=0   CRE=0   duration=0. UNL=1
>
> [2018-09-21 14:26:56.197623] I [master(worker
> 

Re: [Gluster-users] posix set mdata failed, No ctime

2018-09-21 Thread Kotresh Hiremath Ravishankar
You can ignore this error. It is fixed and should be available in the next
4.1.x release.



On Sat, 22 Sep 2018, 07:07 Pedro Costa,  wrote:

> Forgot to mention, I’m running all VM’s with 16.04.1-Ubuntu,  Kernel
> 4.15.0-1023-azure #24
>
>
>
>
>
> *From:* Pedro Costa
> *Sent:* 21 September 2018 10:16
> *To:* 'gluster-users@gluster.org' 
> *Subject:* posix set mdata failed, No ctime
>
>
>
> Hi,
>
>
>
> I have a replicate x3 volume with the following config:
>
>
>
> ```
>
> Volume Name: gvol1
>
> Type: Replicate
>
> Volume ID: 384acec2-5b5f-40da-bf0e-5c53d12b3ae2
>
> Status: Started
>
> Snapshot Count: 0
>
> Number of Bricks: 1 x 3 = 3
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: vm0:/srv/brick1/gvol1
>
> Brick2: vm1:/srv/brick1/gvol1
>
> Brick3: vm2:/srv/brick1/gvol1
>
> Options Reconfigured:
>
> storage.ctime: on
>
> features.utime: on
>
> storage.fips-mode-rchecksum: on
>
> performance.client-io-threads: off
>
> nfs.disable: on
>
> transport.address-family: inet
>
> ```
>
>
>
> This volume was actually created on v3.8, but as since been upgraded
> (version by version) to v4.1.4 and it’s working fine (for the most part):
>
>
>
> ```
>
> Client connections for volume gvol1
>
> --
>
> Brick : vm0:/srv/brick1/gvol1
>
> Clients connected : 6
>
> Hostname   BytesRead
> BytesWritten   OpVersion
>
>    -
>    -
>
> 10.X.0.5:49143  2096520
> 2480212   40100
>
> 10.X.0.6:4914114000
> 12812   40100
>
> 10.X.0.4:49134   258324
> 333456   40100
>
> 10.X.0.4:49141565372566
> 1643447105   40100
>
> 10.X.0.5:49145491262003
> 291782440   40100
>
> 10.X.0.6:49139482629418
> 32822   40100
>
> --
>
> Brick : vm1:/srv/brick1/gvol1
>
> Clients connected : 6
>
> Hostname   BytesRead
> BytesWritten   OpVersion
>
>    -
>    -
>
> 10.X.0.6:49146   658516
> 508904   40100
>
> 10.X.0.5:49133  4142848
> 7139858   40100
>
> 10.X.0.4:49138 4088
> 3696   40100
>
> 10.X.0.4:49140471405874
> 284488736   40100
>
> 10.X.0.5:49140585193563
> 1670630439   40100
>
> 10.X.0.6:49138482407454
> 330274812   40100
>
> --
>
> Brick : vm2:/srv/brick1/gvol1
>
> Clients connected : 6
>
> Hostname   BytesRead
> BytesWritten   OpVersion
>
>    -
>    -
>
> 10.X.0.6:49133  1789624
> 4340938   40100
>
> 10.X.0.5:49137  3010064
> 3005184   40100
>
> 10.X.0.4:49143 4268
> 3744   40100
>
> 10.X.0.4:49139471328402
> 283798376   40100
>
> 10.X.0.5:49139491404443
> 293342568   40100
>
> 10.X.0.6:49140561683906
> 830511730   40100
>
> --
>
> ```
>
>
>
> I’m now getting a lot of errors on the brick log file, like:
>
>
>
> `The message "W [MSGID: 113117] [posix-metadata.c:627:posix_set_ctime]
> 0-gvol1-posix: posix set mdata failed, No ctime :
> /srv/brick1/gvol1/.glusterfs/18/d0/18d04927-1ec0-4779-8c5b-7ebb82e4a614
> gfid:18d04927-1ec0-4779-8c5b-7ebb82e4a614 [Function not implemented]"
> repeated 2 times between [2018-09-21 08:21:52.480797] and [2018-09-21
> 08:22:07.529625]`
>
>
>
> For different files but the most common is a file that the Node.js
> application that runs on top of the gluster via a fuse client (glusterfs)
> stats every 5s for changes,
> https://nodejs.org/api/fs.html#fs_fs_stat_path_options_callback
>
>
>
> I think this is also related to another issue, when reading the file it
> returns an empty result (not always), as the app reports:
>
>
>
> `2018-09-21 08:22:00 | [vm0] [nobody] sync hosts: invalid
> applications.json, response was empty.`
>
>
>
> Doing `gluster volume heal gvol1 info` yields 0 for all bricks.
>
>
>
> Should I be concerned about the warning, is this a known issue? If not,
> what could be causing the 

Re: [Gluster-users] 4.1.x geo-replication "changelogs could not be processed completely" issue

2018-09-11 Thread Kotresh Hiremath Ravishankar
Answer inline.

On Tue, Sep 11, 2018 at 4:19 PM, Kotte, Christian (Ext) <
christian.ko...@novartis.com> wrote:

> Hi all,
>
>
>
> I use glusterfs 4.1.3 non-root user geo-replication in a cascading setup.
> The gsyncd.log on the master is fine, but I have some strange changelog
> warnings and errors on the interimmaster:
>
>
>
> gsyncd.log
>
> …
>
> [2018-09-11 10:38:35.575464] I [master(worker
> /bricks/brick1/brick):1460:crawl] _GMaster: slave's time
> stime=(1536662250, 0)
>
> [2018-09-11 10:38:37.126749] I [master(worker
> /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken
> duration=1.4698 num_files=1 job=1   return_code=23
>
> [2018-09-11 10:38:37.128668] W [master(worker
> /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying
> changelogsfiles=['CHANGELOG.1536662311']
>
> [2018-09-11 10:38:39.353209] I [master(worker
> /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken
> duration=1.4057 num_files=1 job=2   return_code=23
>
> [2018-09-11 10:38:39.354737] W [master(worker
> /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying
> changelogsfiles=['CHANGELOG.1536662311']
>
> [2018-09-11 10:38:41.501187] I [master(worker
> /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken
> duration=1.4781 num_files=1 job=3   return_code=23
>
> [2018-09-11 10:38:41.503048] W [master(worker
> /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying
> changelogsfiles=['CHANGELOG.1536662311']
>
> [2018-09-11 10:38:43.575047] I [master(worker
> /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken
> duration=1.4385 num_files=1 job=1   return_code=23
>
> [2018-09-11 10:38:43.576597] W [master(worker
> /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying
> changelogsfiles=['CHANGELOG.1536662311']
>
> [2018-09-11 10:38:45.838089] I [master(worker
> /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken
> duration=1.4765 num_files=1 job=2   return_code=23
>
> [2018-09-11 10:38:45.840205] W [master(worker
> /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying
> changelogsfiles=['CHANGELOG.1536662311']
>
> [2018-09-11 10:38:47.969033] I [master(worker
> /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken
> duration=1.4602 num_files=1 job=3   return_code=23
>
> [2018-09-11 10:38:47.970118] W [master(worker
> /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying
> changelogsfiles=['CHANGELOG.1536662311']
>
> [2018-09-11 10:38:50.54420] I [master(worker 
> /bricks/brick1/brick):1944:syncjob]
> Syncer: Sync Time Takenduration=1.4717 num_files=1 job=1
> return_code=23
>
> [2018-09-11 10:38:50.56072] W [master(worker 
> /bricks/brick1/brick):1346:process]
> _GMaster: incomplete sync, retrying changelogs
> files=['CHANGELOG.1536662311']
>
> [2018-09-11 10:38:52.317955] I [master(worker
> /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken
> duration=1.4711 num_files=1 job=2   return_code=23
>
> [2018-09-11 10:38:52.319642] W [master(worker
> /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying
> changelogsfiles=['CHANGELOG.1536662311']
>
> [2018-09-11 10:38:54.448926] I [master(worker
> /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken
> duration=1.4715 num_files=1 job=3   return_code=23
>
> [2018-09-11 10:38:54.451127] W [master(worker
> /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying
> changelogsfiles=['CHANGELOG.1536662311']
>
> [2018-09-11 10:38:56.538007] I [master(worker
> /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken
> duration=1.4759 num_files=1 job=1   return_code=23
>
> [2018-09-11 10:38:56.538914] E [master(worker
> /bricks/brick1/brick):1325:process] _GMaster: changelogs could not be
> processed completely - moving on... files=['CHANGELOG.1536662311']
>
> [2018-09-11 10:38:56.544816] I [master(worker
> /bricks/brick1/brick):1374:process] _GMaster: Entry Time TakenMKD=0
> MKN=0   LIN=0   SYM=0   REN=0   RMD=0   CRE=0   duration=0. UNL=0
>
> [2018-09-11 10:38:56.545031] I [master(worker
> /bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken
> SETA=0  SETX=0  meta_duration=0.data_duration=1536662336.5450
> DATA=0  XATT=0
>
> [2018-09-11 10:38:56.545356] I [master(worker
> /bricks/brick1/brick):1394:process] _GMaster: Batch Completed
> changelog_end=1536662311entry_stime=None
> changelog_start=1536662311  stime=(1536662310, 0)
> duration=20.9674num_changelogs=1mode=live_changelog
>


There seems to be a bug; please raise one. For now, as a workaround, add
the following line at the end of the session configuration file on all the
master nodes with any editor. After adding it on all master nodes, stop and
start geo-rep.

rsync-options = --ignore-missing-args

configuration file:
/var/lib/glusterd/geo-replication/<mastervol>_<slavehost>_<slavevol>/gsyncd.conf
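
For example, something along these lines on every master node (session directory, volume and host names are placeholders for your setup):

echo 'rsync-options = --ignore-missing-args' >> /var/lib/glusterd/geo-replication/<mastervol>_<slavehost>_<slavevol>/gsyncd.conf
# then restart the session so the workers pick it up
gluster volume geo-replication <mastervol> <slavehost>::<slavevol> stop
gluster volume geo-replication <mastervol> <slavehost>::<slavevol> start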







>
> I had those 

Re: [Gluster-users] GlusterFS 4.1.3, Geo replication unable to setup

2018-09-06 Thread Kotresh Hiremath Ravishankar
Hi Nico,

glusterd has crashed on this node. Please raise a bug with the core file attached.

If you are finding the geo-rep setup steps difficult, bring glusterd back up
and use the following tool [1] to set up geo-rep, and let us know if it
still crashes.

[1] http://aravindavk.in/blog/introducing-georepsetup/



On Thu, Sep 6, 2018 at 2:54 PM, Nico van Royen  wrote:

> Hello,
>
> On our dev environment we want to test GeoReplication with GlusterFS 4.1
> and every attempt so far fails.
> For now, we don't care (yet) about running it as a non-root user (not
> using the mountbroker etc).
>
> Installed packages, both on master and slaves:
> [root@clrv110367 geo-replication]# rpm -qa | grep gluster
> glusterfs-client-xlators-4.1.3-1.el7.x86_64
> glusterfs-events-4.1.3-1.el7.x86_64
> glusterfs-geo-replication-4.1.3-1.el7.x86_64
> glusterfs-4.1.3-1.el7.x86_64
> glusterfs-api-4.1.3-1.el7.x86_64
> glusterfs-fuse-4.1.3-1.el7.x86_64
> glusterfs-server-4.1.3-1.el7.x86_64
> glusterfs-rdma-4.1.3-1.el7.x86_64
> glusterfs-extra-xlators-4.1.3-1.el7.x86_64
> glusterfs-libs-4.1.3-1.el7.x86_64
> glusterfs-cli-4.1.3-1.el7.x86_64
> python2-gluster-4.1.3-1.el7.x86_64
> glusterfs-coreutils-0.2.0-1.el7.x86_64
>
> Master volume setup:
> # gluster v create VOLUME2 replica 3 arbiter 1 transport tcp
> clrv110367:/gluster/VOLUME2/export clrv110371:/gluster/VOLUME2/export
> clrv110389:/gluster/VOLUME2/export
> # gluster v start VOLUME2
> # gluster volume set all cluster.enable-shared-storage enable
>
> Slave volume setup
> # gluster v create VOLUME2 replica 3 arbiter 1 transport tcp
> clrv110605:/gluster/VOLUME2/export clrv110608:/gluster/VOLUME2/export
> clrv110606:/gluster/VOLUME2/export
> # gluster v start VOLUME2
> # gluster volume set all cluster.enable-shared-storage enable
>
> On master server:
> # ssh-keygen   (accepting all defaults)
> # ssh-copy-id  clrv110605(one of the slave servers)
> # gluster-georep-sshkey generate
> # gluster volume geo-replication VOLUME2 clrv110605.ic.ing.net::VOLUME2
> create push-pem
>
> Several seconds later, all of the glusterd instances on the master side
> crash, with /var/log/glusterfs/glusterd.log such as:
>
> [2018-09-06 08:50:20.663584] W [MSGID: 106028] 
> [glusterd-geo-rep.c:2568:glusterd_get_statefile_name]
> 0-management: Config file (/var/lib/glusterd/geo-replication/VOLUME2_
> clrv110605_VOLUME2/gsyncd.conf) missing. Looking for template config
> file (/var/lib/glusterd/geo-replication/gsyncd_template.conf) [No such
> file or directory]
> [2018-09-06 08:50:20.663724] I [MSGID: 106294] 
> [glusterd-geo-rep.c:2577:glusterd_get_statefile_name]
> 0-management: Using default config template(/var/lib/glusterd/
> geo-replication/gsyncd_template.conf).
> [2018-09-06 08:50:24.072321] I [MSGID: 106494] [glusterd-handler.c:3024:__
> glusterd_handle_cli_profile_volume] 0-management: Received volume profile
> req for volume VOLUME1
> [2018-09-06 08:50:24.074876] I [MSGID: 106487] [glusterd-handler.c:1486:__
> glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
> [2018-09-06 08:50:24.744276] I [MSGID: 106131] 
> [glusterd-proc-mgmt.c:83:glusterd_proc_stop]
> 0-management: nfs already stopped
> [2018-09-06 08:50:24.73] I [MSGID: 106568] 
> [glusterd-svc-mgmt.c:235:glusterd_svc_stop]
> 0-management: nfs service is stopped
> [2018-09-06 08:50:24.744497] I [MSGID: 106599] 
> [glusterd-nfs-svc.c:82:glusterd_nfssvc_manager]
> 0-management: nfs/server.so xlator is not installed
> [2018-09-06 08:50:24.749139] I [MSGID: 106568] 
> [glusterd-proc-mgmt.c:87:glusterd_proc_stop]
> 0-management: Stopping glustershd daemon running in pid: 40886
> [2018-09-06 08:50:25.749748] I [MSGID: 106568] 
> [glusterd-svc-mgmt.c:235:glusterd_svc_stop]
> 0-management: glustershd service is stopped
> [2018-09-06 08:50:25.750047] I [MSGID: 106567] 
> [glusterd-svc-mgmt.c:203:glusterd_svc_start]
> 0-management: Starting glustershd service
> [2018-09-06 08:50:25.757036] I [MSGID: 106131] 
> [glusterd-proc-mgmt.c:83:glusterd_proc_stop]
> 0-management: bitd already stopped
> [2018-09-06 08:50:25.757100] I [MSGID: 106568] 
> [glusterd-svc-mgmt.c:235:glusterd_svc_stop]
> 0-management: bitd service is stopped
> [2018-09-06 08:50:25.757288] I [MSGID: 106131] 
> [glusterd-proc-mgmt.c:83:glusterd_proc_stop]
> 0-management: scrub already stopped
> [2018-09-06 08:50:25.757330] I [MSGID: 106568] 
> [glusterd-svc-mgmt.c:235:glusterd_svc_stop]
> 0-management: scrub service is stopped
> [2018-09-06 08:50:28.391332] I [run.c:241:runner_log]
> (-->/usr/lib64/glusterfs/4.1.3/xlator/mgmt/glusterd.so(+0xe2b1a)
> [0x7fae33053b1a] 
> -->/usr/lib64/glusterfs/4.1.3/xlator/mgmt/glusterd.so(+0xe25e5)
> [0x7fae330535e5] -->/lib64/libglusterfs.so.0(runner_log+0x115)
> [0x7fae3e55f0c5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/
> gsync-create/post/S56glusterd-geo-rep-create-post.sh --volname=VOLUME2
> is_push_pem=1,pub_file=/var/lib/glusterd/geo-replication/
> 

Re: [Gluster-users] Geo-Replication issue

2018-09-06 Thread Kotresh Hiremath Ravishankar
Hi Krishna,

The logs indicate that it has received gluster-poc-sj::gluster as the slave
volume, whereas the actual slave volume is gluster-poc-sj::glusterdist.

What's your IRC nick in #gluster?
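
So any geo-rep command for this session has to name the slave volume that actually exists, e.g. (using the names from this thread):

gluster volume geo-replication glusterdist gluster-poc-sj::glusterdist status
# not gluster-poc-sj::gluster, which is what the failing commands were using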

On Thu, Sep 6, 2018 at 1:45 PM, Krishna Verma  wrote:

> Hi Kotresh,
>
>
>
> [root@gluster-poc-noida repvol]# tailf /var/log/glusterfs/glusterd.log
>
> [2018-09-06 07:57:03.443256] W [MSGID: 106028] 
> [glusterd-geo-rep.c:2568:glusterd_get_statefile_name]
> 0-management: Config file (/var/lib/glusterd/geo-replication/glusterdist_
> gluster-poc-sj_gluster/gsyncd.conf) missing. Looking for template config
> file (/var/lib/glusterd/geo-replication/gsyncd_template.conf) [No such
> file or directory]
>
> [2018-09-06 07:57:03.443339] I [MSGID: 106294] 
> [glusterd-geo-rep.c:2577:glusterd_get_statefile_name]
> 0-management: Using default config template(/var/lib/glusterd/
> geo-replication/gsyncd_template.conf).
>
> [2018-09-06 07:57:03.512014] E [MSGID: 106028] 
> [glusterd-geo-rep.c:3577:glusterd_op_stage_gsync_set]
> 0-management: Geo-replication session between glusterdist and
> gluster-poc-sj::gluster does not exist.. statefile = /var/lib/glusterd/geo-
> replication/glusterdist_gluster-poc-sj_gluster/monitor.status [No such
> file or directory]
>
> [2018-09-06 07:57:03.512049] E [MSGID: 106322] 
> [glusterd-geo-rep.c:3778:glusterd_op_stage_gsync_set]
> 0-management: Geo-replication session between glusterdist and
> gluster-poc-sj::gluster does not exist.
>
> [2018-09-06 07:57:03.512063] E [MSGID: 106301] 
> [glusterd-syncop.c:1352:gd_stage_op_phase]
> 0-management: Staging of operation 'Volume Geo-replication' failed on
> localhost : Geo-replication session between glusterdist and
> gluster-poc-sj::gluster does not exist.
>
> [2018-09-06 07:57:24.869113] E [MSGID: 106316] 
> [glusterd-geo-rep.c:2761:glusterd_verify_slave]
> 0-management: Not a valid slave
>
> [2018-09-06 07:57:24.869289] E [MSGID: 106316] [glusterd-geo-rep.c:3152:
> glusterd_op_stage_gsync_create] 0-management: gluster-poc-sj::gluster is
> not a valid slave volume. Error: Unable to mount and fetch slave volume
> details. Please check the log: /var/log/glusterfs/geo-
> replication/gverify-slavemnt.log
>
> [2018-09-06 07:57:24.869313] E [MSGID: 106301] 
> [glusterd-syncop.c:1352:gd_stage_op_phase]
> 0-management: Staging of operation 'Volume Geo-replication Create' failed
> on localhost : Unable to mount and fetch slave volume details. Please check
> the log: /var/log/glusterfs/geo-replication/gverify-slavemnt.log
>
> [2018-09-06 07:56:38.421045] I [MSGID: 106308] [glusterd-geo-rep.c:4881:
> glusterd_get_gsync_status_mst_slv] 0-management: geo-replication status
> glusterdist gluster-poc-sj::gluster : session is not active
>
> [2018-09-06 07:56:38.486229] I [MSGID: 106028] [glusterd-geo-rep.c:4903:
> glusterd_get_gsync_status_mst_slv] 0-management: /var/lib/glusterd/geo-
> replication/glusterdist_gluster-poc-sj_gluster/monitor.status statefile
> not present. [No such file or directory]
>
>
>
> /Krishna
>
>
>
> *From:* Kotresh Hiremath Ravishankar 
> *Sent:* Thursday, September 6, 2018 1:20 PM
> *To:* Krishna Verma 
> *Cc:* Gluster Users 
> *Subject:* Re: [Gluster-users] Geo-Replication issue
>
>
>
> EXTERNAL MAIL
>
> Hi Krishna,
>
> glusterd log file would help here
>
> Thanks,
>
> Kotresh HR
>
>
>
> On Thu, Sep 6, 2018 at 1:02 PM, Krishna Verma  wrote:
>
> Hi All,
>
>
>
> I am getting issue in geo-replication distributed gluster volume. In a
> session status it shows only peer node instead of 2. And I am also not able
> to delete/start/stop or anything on this session.
>
>
>
> geo-replication distributed gluster volume “glusterdist” status
>
> [root@gluster-poc-noida ~]# gluster volume status glusterdist
>
> Status of volume: glusterdist
>
> Gluster process TCP Port  RDMA Port  Online
> Pid
>
> 
> --
>
> Brick gluster-poc-noida:/data/gluster-dist/
>
> distvol 49154 0  Y
> 23138
>
> Brick noi-poc-gluster:/data/gluster-dist/di
>
> stvol   49154 0  Y
> 14637
>
>
>
> Task Status of Volume glusterdist
>
> 
> --
>
> There are no active volume tasks
>
>
>
> Geo-replication session status
>
> [root@gluster-poc-noida ~]# gluster volume geo-replication glusterdist
> gluster-poc-sj::glusterdist status
>
>
>
> MASTER NODEMASTER VOL MASTER BRICK  SLAVE

Re: [Gluster-users] Geo-Replication issue

2018-09-06 Thread Kotresh Hiremath Ravishankar
Hi Krishna,

Could you come online in the #gluster channel on Freenode? That would be faster.

On Thu, Sep 6, 2018 at 1:45 PM, Krishna Verma  wrote:

> Hi Kotresh,
>
>
>
> [root@gluster-poc-noida repvol]# tailf /var/log/glusterfs/glusterd.log
>
> [2018-09-06 07:57:03.443256] W [MSGID: 106028] 
> [glusterd-geo-rep.c:2568:glusterd_get_statefile_name]
> 0-management: Config file (/var/lib/glusterd/geo-replication/glusterdist_
> gluster-poc-sj_gluster/gsyncd.conf) missing. Looking for template config
> file (/var/lib/glusterd/geo-replication/gsyncd_template.conf) [No such
> file or directory]
>
> [2018-09-06 07:57:03.443339] I [MSGID: 106294] 
> [glusterd-geo-rep.c:2577:glusterd_get_statefile_name]
> 0-management: Using default config template(/var/lib/glusterd/
> geo-replication/gsyncd_template.conf).
>
> [2018-09-06 07:57:03.512014] E [MSGID: 106028] 
> [glusterd-geo-rep.c:3577:glusterd_op_stage_gsync_set]
> 0-management: Geo-replication session between glusterdist and
> gluster-poc-sj::gluster does not exist.. statefile = /var/lib/glusterd/geo-
> replication/glusterdist_gluster-poc-sj_gluster/monitor.status [No such
> file or directory]
>
> [2018-09-06 07:57:03.512049] E [MSGID: 106322] 
> [glusterd-geo-rep.c:3778:glusterd_op_stage_gsync_set]
> 0-management: Geo-replication session between glusterdist and
> gluster-poc-sj::gluster does not exist.
>
> [2018-09-06 07:57:03.512063] E [MSGID: 106301] 
> [glusterd-syncop.c:1352:gd_stage_op_phase]
> 0-management: Staging of operation 'Volume Geo-replication' failed on
> localhost : Geo-replication session between glusterdist and
> gluster-poc-sj::gluster does not exist.
>
> [2018-09-06 07:57:24.869113] E [MSGID: 106316] 
> [glusterd-geo-rep.c:2761:glusterd_verify_slave]
> 0-management: Not a valid slave
>
> [2018-09-06 07:57:24.869289] E [MSGID: 106316] [glusterd-geo-rep.c:3152:
> glusterd_op_stage_gsync_create] 0-management: gluster-poc-sj::gluster is
> not a valid slave volume. Error: Unable to mount and fetch slave volume
> details. Please check the log: /var/log/glusterfs/geo-
> replication/gverify-slavemnt.log
>
> [2018-09-06 07:57:24.869313] E [MSGID: 106301] 
> [glusterd-syncop.c:1352:gd_stage_op_phase]
> 0-management: Staging of operation 'Volume Geo-replication Create' failed
> on localhost : Unable to mount and fetch slave volume details. Please check
> the log: /var/log/glusterfs/geo-replication/gverify-slavemnt.log
>
> [2018-09-06 07:56:38.421045] I [MSGID: 106308] [glusterd-geo-rep.c:4881:
> glusterd_get_gsync_status_mst_slv] 0-management: geo-replication status
> glusterdist gluster-poc-sj::gluster : session is not active
>
> [2018-09-06 07:56:38.486229] I [MSGID: 106028] [glusterd-geo-rep.c:4903:
> glusterd_get_gsync_status_mst_slv] 0-management: /var/lib/glusterd/geo-
> replication/glusterdist_gluster-poc-sj_gluster/monitor.status statefile
> not present. [No such file or directory]
>
>
>
> /Krishna
>
>
>
> *From:* Kotresh Hiremath Ravishankar 
> *Sent:* Thursday, September 6, 2018 1:20 PM
> *To:* Krishna Verma 
> *Cc:* Gluster Users 
> *Subject:* Re: [Gluster-users] Geo-Replication issue
>
>
>
> EXTERNAL MAIL
>
> Hi Krishna,
>
> glusterd log file would help here
>
> Thanks,
>
> Kotresh HR
>
>
>
> On Thu, Sep 6, 2018 at 1:02 PM, Krishna Verma  wrote:
>
> Hi All,
>
>
>
> I am getting issue in geo-replication distributed gluster volume. In a
> session status it shows only peer node instead of 2. And I am also not able
> to delete/start/stop or anything on this session.
>
>
>
> geo-replication distributed gluster volume “glusterdist” status
>
> [root@gluster-poc-noida ~]# gluster volume status glusterdist
>
> Status of volume: glusterdist
>
> Gluster process TCP Port  RDMA Port  Online
> Pid
>
> 
> --
>
> Brick gluster-poc-noida:/data/gluster-dist/
>
> distvol 49154 0  Y
> 23138
>
> Brick noi-poc-gluster:/data/gluster-dist/di
>
> stvol   49154 0  Y
> 14637
>
>
>
> Task Status of Volume glusterdist
>
> 
> --
>
> There are no active volume tasks
>
>
>
> Geo-replication session status
>
> [root@gluster-poc-noida ~]# gluster volume geo-replication glusterdist
> gluster-poc-sj::glusterdist status
>
>
>
> MASTER NODEMASTER VOL MASTER BRICK  SLAVE
> USERSLAVE  SLAVE NODESTATUS CRAWL
> STATUSLAST_SYNCED

Re: [Gluster-users] Upgrade to 4.1.2 geo-replication does not work

2018-09-06 Thread Kotresh Hiremath Ravishankar
Could you append something to this file and check whether it gets synced
now?
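For example (a minimal sketch, reusing the file, mount point and session names
from the mails below; adjust to your setup). On the master client, where the
volume is mounted at /repvol:

# echo "sync-test $(date)" >> /repvol/rflowTestInt18.08-b001.t.Z
# gluster volume geo-replication glusterep gluster-poc-sj::glusterep status

LAST_SYNCED should move past the time of the append; then on the slave client:

# du -sh /repvol/rflowTestInt18.08-b001.t.Z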


On Thu, Sep 6, 2018 at 9:08 AM, Krishna Verma  wrote:

> Hi Kotresh,
>
>
>
> Did you get a chance to look into this?
>
>
>
> For replicated gluster volume, Still Master is not getting sync with slave.
>
>
>
> At Master :
>
> [root@gluster-poc-noida ~]# du -sh /repvol/rflowTestInt18.08-b001.t.Z
>
> 1.2G/repvol/rflowTestInt18.08-b001.t.Z
>
> [root@gluster-poc-noida ~]#
>
>
>
> At Slave:
>
> [root@gluster-poc-sj ~]# du -sh /repvol/rflowTestInt18.08-b001.t.Z
>
> du: cannot access ‘/repvol/rflowTestInt18.08-b001.t.Z’: No such file or
> directory
>
> [root@gluster-poc-sj ~]#
>
>
>
> File not reached at slave.
>
>
>
> /Krishna
>
>
>
> *From:* Krishna Verma
> *Sent:* Monday, September 3, 2018 4:41 PM
> *To:* 'Kotresh Hiremath Ravishankar' 
> *Cc:* Sunny Kumar ; Gluster Users <
> gluster-users@gluster.org>
> *Subject:* RE: [Gluster-users] Upgrade to 4.1.2 geo-replication does not
> work
>
>
>
> Hi Kotesh:
>
>
>
> Gluster Master Site Servers : gluster-poc-noida and noi-poc-gluster
>
> Gluster Slave site servers: gluster-poc-sj and gluster-poc-sj2
>
>
>
> Master Client : noi-foreman02
>
> Slave Client: sj-kverma
>
>
>
> Step1: Create a LVM partition of 10 GB on all 4 Gluster nodes  (2 Master)
> *  (2 slave) and format that in ext4 filesystem and mount that on server.
>
>
>
> [root@gluster-poc-noida distvol]# df -hT /data/gluster-dist
>
> FilesystemType  Size  Used Avail Use% Mounted
> on
>
> /dev/mapper/centos-gluster--vol--dist ext4  9.8G  847M  8.4G   9%
> /data/gluster-dist
>
> [root@gluster-poc-noida distvol]#
>
>
>
> Step 2:  Created a Trusted storage pool as below:
>
>
>
> At Master:
>
> [root@gluster-poc-noida distvol]# gluster peer status
>
> Number of Peers: 1
>
>
>
> Hostname: noi-poc-gluster
>
> Uuid: 01316459-b5c8-461d-ad25-acc17a82e78f
>
> State: Peer in Cluster (Connected)
>
> [root@gluster-poc-noida distvol]#
>
>
>
> At Slave:
>
> [root@gluster-poc-sj ~]# gluster peer status
>
> Number of Peers: 1
>
>
>
> Hostname: gluster-poc-sj2
>
> Uuid: 6ba85bfe-cd74-4a76-a623-db687f7136fa
>
> State: Peer in Cluster (Connected)
>
> [root@gluster-poc-sj ~]#
>
>
>
> Step 3: Created distributed volume as below:
>
>
>
> At Master:  “gluster volume create glusterdist 
> gluster-poc-noida:/data/gluster-dist/distvol
> noi-poc-gluster:/data/gluster-dist/distvol”
>
>
>
> [root@gluster-poc-noida distvol]# gluster volume info glusterdist
>
>
>
> Volume Name: glusterdist
>
> Type: Distribute
>
> Volume ID: af5b2915-7170-4b5e-aee8-7e68757b9bf1
>
> Status: Started
>
> Snapshot Count: 0
>
> Number of Bricks: 2
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: gluster-poc-noida:/data/gluster-dist/distvol
>
> Brick2: noi-poc-gluster:/data/gluster-dist/distvol
>
> Options Reconfigured:
>
> changelog.changelog: on
>
> geo-replication.ignore-pid-check: on
>
> geo-replication.indexing: on
>
> transport.address-family: inet
>
> nfs.disable: on
>
> [root@gluster-poc-noida distvol]#
>
>
>
> At Slave “ gluster volume create glusterdist 
> gluster-poc-sj:/data/gluster-dist/distvol
> gluster-poc-sj2:/data/gluster-dist/distvol”
>
>
>
> Volume Name: glusterdist
>
> Type: Distribute
>
> Volume ID: a982da53-a3d7-4b5a-be77-df85f584610d
>
> Status: Started
>
> Snapshot Count: 0
>
> Number of Bricks: 2
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: gluster-poc-sj:/data/gluster-dist/distvol
>
> Brick2: gluster-poc-sj2:/data/gluster-dist/distvol
>
> Options Reconfigured:
>
> transport.address-family: inet
>
> nfs.disable: on
>
>
>
> Step 4 : Gluster Geo Replication configuration
>
>
>
> On  all Gluster node: “yum install glusterfs-geo-replication.x86_64”
>
> On master node where I created session:
>
> *ssh-keygen*
>
> *ssh-copy-id root@gluster-poc-sj*
>
> *cp /root/.ssh/id_rsa.pub /var/lib/glusterd/geo-replication/secret.pem.pub*
>
> *scp /var/lib/glusterd/geo-replication/secret.pem*
> root@gluster-poc-sj:/var/lib/glusterd/geo-replication/   *
>
>
>
> *On Slave Node: *
>
>
>
>  *ln -s /usr/libexec/glusterfs/gsyncd
> /nonexistent/gsyncd   *
>
>
>
>  On Master Node:
>
>
>
> gluster volume geo-replication glusterdist gluster-poc-sj::glusterdist
> create push-pem force
&

Re: [Gluster-users] Geo-Replication issue

2018-09-06 Thread Kotresh Hiremath Ravishankar
Hi Krishna,

glusterd log file would help here

Thanks,
Kotresh HR

On Thu, Sep 6, 2018 at 1:02 PM, Krishna Verma  wrote:

> Hi All,
>
>
>
> I am getting issue in geo-replication distributed gluster volume. In a
> session status it shows only peer node instead of 2. And I am also not able
> to delete/start/stop or anything on this session.
>
>
>
> geo-replication distributed gluster volume “glusterdist” status
>
> [root@gluster-poc-noida ~]# gluster volume status glusterdist
>
> Status of volume: glusterdist
>
> Gluster process TCP Port  RDMA Port  Online
> Pid
>
> 
> --
>
> Brick gluster-poc-noida:/data/gluster-dist/
>
> distvol 49154 0  Y
> 23138
>
> Brick noi-poc-gluster:/data/gluster-dist/di
>
> stvol   49154 0  Y
> 14637
>
>
>
> Task Status of Volume glusterdist
>
> 
> --
>
> There are no active volume tasks
>
>
>
> Geo-replication session status
>
> [root@gluster-poc-noida ~]# gluster volume geo-replication glusterdist
> gluster-poc-sj::glusterdist status
>
>
>
> MASTER NODEMASTER VOL MASTER BRICK  SLAVE
> USERSLAVE  SLAVE NODESTATUS CRAWL
> STATUSLAST_SYNCED
>
> 
> 
> 
>
> noi-poc-glusterglusterdist/data/gluster-dist/distvol
> root  gluster-poc-sj::glusterdistN/A   Stopped
> N/A N/A
>
> [root@gluster-poc-noida ~]#
>
>
>
> Can’t stop/start/delete the session:
>
> [root@gluster-poc-noida ~]# gluster volume geo-replication glusterdist
> gluster-poc-sj::glusterdist stop
>
> Staging failed on localhost. Please check the log file for more details.
>
> geo-replication command failed
>
> [root@gluster-poc-noida ~]# gluster volume geo-replication glusterdist
> gluster-poc-sj::glusterdist stop force
>
> pid-file entry mising in config file and template config file.
>
> geo-replication command failed
>
> [root@gluster-poc-noida ~]# gluster volume geo-replication glusterdist
> gluster-poc-sj::glusterdist delete
>
> Staging failed on localhost. Please check the log file for more details.
>
> geo-replication command failed
>
> [root@gluster-poc-noida ~]# gluster volume geo-replication glusterdist
> gluster-poc-sj::glusterdist start
>
> Staging failed on localhost. Please check the log file for more details.
>
> geo-replication command failed
>
> [root@gluster-poc-noida ~]#
>
>
>
> gsyncd.log errors
>
> [2018-09-06 06:17:21.757195] I [monitor(monitor):269:monitor] Monitor: worker
> died before establishing connection   brick=/data/gluster-dist/distvol
>
> [2018-09-06 06:17:32.312093] I [monitor(monitor):158:monitor] Monitor:
> starting gsyncd worker   brick=/data/gluster-dist/distvol
> slave_node=gluster-poc-sj
>
> [2018-09-06 06:17:32.441817] I [monitor(monitor):261:monitor] Monitor: 
> Changelog
> Agent died, Aborting Workerbrick=/data/gluster-dist/distvol
>
> [2018-09-06 06:17:32.442193] I [monitor(monitor):279:monitor] Monitor:
> worker died in startup phase brick=/data/gluster-dist/distvol
>
> [2018-09-06 06:17:43.1177] I [monitor(monitor):158:monitor] Monitor:
> starting gsyncd worker brick=/data/gluster-dist/distvol
> slave_node=gluster-poc-sj
>
> [2018-09-06 06:17:43.137794] I [monitor(monitor):261:monitor] Monitor:
> Changelog Agent died, Aborting Workerbrick=/data/gluster-dist/distvol
>
> [2018-09-06 06:17:43.138214] I [monitor(monitor):279:monitor] Monitor:
> worker died in startup phase brick=/data/gluster-dist/distvol
>
> [2018-09-06 06:17:53.144072] I [monitor(monitor):158:monitor] Monitor:
> starting gsyncd worker   brick=/data/gluster-dist/distvol
> slave_node=gluster-poc-sj
>
> [2018-09-06 06:17:53.276853] I [monitor(monitor):261:monitor] Monitor:
> Changelog Agent died, Aborting Workerbrick=/data/gluster-dist/distvol
>
> [2018-09-06 06:17:53.277327] I [monitor(monitor):279:monitor] Monitor:
> worker died in startup phase brick=/data/gluster-dist/distvol
>
>
>
> Could anyone please help?
>
>
>
> /Krishna
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>



-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Upgrade to 4.1.2 geo-replication does not work

2018-09-03 Thread Kotresh Hiremath Ravishankar
06] I [subcmds(worker 
> /data/gluster-dist/distvol):70:subcmd_worker]
> : Worker spawn successful. Acknowledging back to monitor
>
> [2018-09-03 07:27:48.401196] I [master(worker 
> /data/gluster-dist/distvol):1593:register]
> _GMaster: Working dir  path=/var/lib/misc/gluster/
> gsyncd/glusterdist_gluster-poc-sj_glusterdist/data-gluster-dist-distvol
>
> [2018-09-03 07:27:48.401477] I [resource(worker
> /data/gluster-dist/distvol):1282:service_loop] GLUSTER: Register time
> time=1535959668
>
> [2018-09-03 07:27:49.176095] I [gsyncdstatus(worker
> /data/gluster-dist/distvol):277:set_active] GeorepStatus: Worker Status
> Change  status=Active
>
> [2018-09-03 07:27:49.177079] I [gsyncdstatus(worker
> /data/gluster-dist/distvol):249:set_worker_crawl_status] GeorepStatus:
> Crawl Status Change  status=History Crawl
>
> [2018-09-03 07:27:49.177339] I [master(worker 
> /data/gluster-dist/distvol):1507:crawl]
> _GMaster: starting history crawl  turns=1 stime=(1535701378, 0)
> entry_stime=(1535701378, 0) etime=1535959669
>
> [2018-09-03 07:27:50.179210] I [master(worker 
> /data/gluster-dist/distvol):1536:crawl]
> _GMaster: slave's timestime=(1535701378, 0)
>
> [2018-09-03 07:27:51.300096] I [gsyncd(config-get):297:main] : Using
> session config file   path=/var/lib/glusterd/geo-replication/glusterdist_
> gluster-poc-sj_glusterdist/gsyncd.conf
>
> [2018-09-03 07:27:51.399027] I [gsyncd(status):297:main] : Using
> session config file   path=/var/lib/glusterd/geo-
> replication/glusterdist_gluster-poc-sj_glusterdist/gsyncd.conf
>
> [2018-09-03 07:27:52.510271] I [master(worker 
> /data/gluster-dist/distvol):1944:syncjob]
> Syncer: Sync Time Taken duration=1.6146 num_files=1 job=2
> return_code=0
>
> [2018-09-03 07:27:52.514487] I [master(worker 
> /data/gluster-dist/distvol):1374:process]
> _GMaster: Entry Time Taken  MKD=0   MKN=0   LIN=0   SYM=0REN=1
> RMD=0   CRE=0   duration=0.2745 UNL=0
>
> [2018-09-03 07:27:52.514615] I [master(worker 
> /data/gluster-dist/distvol):1384:process]
> _GMaster: Data/Metadata Time Taken  SETA=1  SETX=0
> meta_duration=0.2691 data_duration=1.7883DATA=1  XATT=0
>
> [2018-09-03 07:27:52.514844] I [master(worker 
> /data/gluster-dist/distvol):1394:process]
> _GMaster: Batch Completed   
> changelog_end=1535701379entry_stime=(1535701378,
> 0)  changelog_start=1535701379  stime=(1535701378, 0)
> duration=2.3353 num_changelogs=1mode=history_changelog
>
> [2018-09-03 07:27:52.515224] I [master(worker 
> /data/gluster-dist/distvol):1552:crawl]
> _GMaster: finished history crawl  endtime=1535959662
> stime=(1535701378, 0)entry_stime=(1535701378, 0)
>
> [2018-09-03 07:28:01.706876] I [gsyncd(config-get):297:main] : Using
> session config file   path=/var/lib/glusterd/geo-replication/glusterdist_
> gluster-poc-sj_glusterdist/gsyncd.conf
>
> [2018-09-03 07:28:01.803858] I [gsyncd(status):297:main] : Using
> session config file   path=/var/lib/glusterd/geo-
> replication/glusterdist_gluster-poc-sj_glusterdist/gsyncd.conf
>
> [2018-09-03 07:28:03.521949] I [master(worker 
> /data/gluster-dist/distvol):1507:crawl]
> _GMaster: starting history crawl  turns=2 stime=(1535701378, 0)
> entry_stime=(1535701378, 0) etime=1535959683
>
> [2018-09-03 07:28:03.523086] I [master(worker 
> /data/gluster-dist/distvol):1552:crawl]
> _GMaster: finished history crawl  endtime=1535959677
> stime=(1535701378, 0)entry_stime=(1535701378, 0)
>
> [2018-09-03 07:28:04.62274] I [gsyncdstatus(worker
> /data/gluster-dist/distvol):249:set_worker_crawl_status] GeorepStatus:
> Crawl Status Change   status=Changelog Crawl
>
> [root@gluster-poc-noida distvol]#
>
>
>
> *From:* Kotresh Hiremath Ravishankar 
> *Sent:* Monday, September 3, 2018 12:44 PM
>
> *To:* Krishna Verma 
> *Cc:* Sunny Kumar ; Gluster Users <
> gluster-users@gluster.org>
> *Subject:* Re: [Gluster-users] Upgrade to 4.1.2 geo-replication does not
> work
>
>
>
> EXTERNAL MAIL
>
> Hi Krishna,
>
> The log is not complete. If you are re-trying, could you please try it out
> on 4.1.3 and share the logs.
>
> Thanks,
>
> Kotresh HR
>
>
>
> On Mon, Sep 3, 2018 at 12:42 PM, Krishna Verma  wrote:
>
> Hi Kotresh,
>
>
>
> Please find the log files attached.
>
>
>
> Request you to please have a look.
>
>
>
> /Krishna
>
>
>
>
>
>
>
> *From:* Kotresh Hiremath Ravishankar 
> *Sent:* Monday, September 3, 2018 10:19 AM
>
>
> *To:* Krishna Verma 
> *Cc:* Sunny Kumar ; Gluster Users <
> gluster-users@gluster.org>
> *Subject:* Re: [Gluster-users] Upgrade to 4.1.2 geo-replicati

Re: [Gluster-users] Upgrade to 4.1.2 geo-replication does not work

2018-09-03 Thread Kotresh Hiremath Ravishankar
Hi Krishna,

The log is not complete. If you are re-trying, could you please try it out
on 4.1.3 and share the logs.

Thanks,
Kotresh HR

On Mon, Sep 3, 2018 at 12:42 PM, Krishna Verma  wrote:

> Hi Kotresh,
>
>
>
> Please find the log files attached.
>
>
>
> Request you to please have a look.
>
>
>
> /Krishna
>
>
>
>
>
>
>
> *From:* Kotresh Hiremath Ravishankar 
> *Sent:* Monday, September 3, 2018 10:19 AM
>
> *To:* Krishna Verma 
> *Cc:* Sunny Kumar ; Gluster Users <
> gluster-users@gluster.org>
> *Subject:* Re: [Gluster-users] Upgrade to 4.1.2 geo-replication does not
> work
>
>
>
> EXTERNAL MAIL
>
> Hi Krishna,
>
> Indexing is the feature used by Hybrid crawl which only makes crawl
> faster. It has nothing to do with missing data sync.
>
> Could you please share the complete log file of the session where the
> issue is encountered ?
>
> Thanks,
>
> Kotresh HR
>
>
>
> On Mon, Sep 3, 2018 at 9:33 AM, Krishna Verma  wrote:
>
> Hi Kotresh/Support,
>
>
>
> Request your help to get it fix. My slave is not getting sync with master.
> When I restart the session after doing the indexing off then only it shows
> the file at slave but that is also blank with zero size.
>
>
>
> At master: file size is 5.8 GB.
>
>
>
> [root@gluster-poc-noida distvol]# du -sh 17.10.v001.20171023-201021_
> 17020_GPLV3.tar.gz
>
> 5.8G17.10.v001.20171023-201021_17020_GPLV3.tar.gz
>
> [root@gluster-poc-noida distvol]#
>
>
>
> But at slave, after doing the “indexing off” and restart the session and
> then wait for 2 days. It shows only 4.9 GB copied.
>
>
>
> [root@gluster-poc-sj distvol]# du -sh 17.10.v001.20171023-201021_
> 17020_GPLV3.tar.gz
>
> 4.9G17.10.v001.20171023-201021_17020_GPLV3.tar.gz
>
> [root@gluster-poc-sj distvol]#
>
>
>
> Similarly, I tested for small file of size 1.2 GB only that is still
> showing “0” size at slave  after days waiting time.
>
>
>
> At Master:
>
>
>
> [root@gluster-poc-noida distvol]# du -sh rflowTestInt18.08-b001.t.Z
>
> 1.2GrflowTestInt18.08-b001.t.Z
>
> [root@gluster-poc-noida distvol]#
>
>
>
> At Slave:
>
>
>
> [root@gluster-poc-sj distvol]# du -sh rflowTestInt18.08-b001.t.Z
>
> 0   rflowTestInt18.08-b001.t.Z
>
> [root@gluster-poc-sj distvol]#
>
>
>
> Below is my distributed volume info :
>
>
>
> [root@gluster-poc-noida distvol]# gluster volume info glusterdist
>
>
>
> Volume Name: glusterdist
>
> Type: Distribute
>
> Volume ID: af5b2915-7170-4b5e-aee8-7e68757b9bf1
>
> Status: Started
>
> Snapshot Count: 0
>
> Number of Bricks: 2
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: gluster-poc-noida:/data/gluster-dist/distvol
>
> Brick2: noi-poc-gluster:/data/gluster-dist/distvol
>
> Options Reconfigured:
>
> changelog.changelog: on
>
> geo-replication.ignore-pid-check: on
>
> geo-replication.indexing: on
>
> transport.address-family: inet
>
> nfs.disable: on
>
> [root@gluster-poc-noida distvol]#
>
>
>
> Please help to fix, I believe its not a normal behavior of gluster rsync.
>
>
>
> /Krishna
>
> *From:* Krishna Verma
> *Sent:* Friday, August 31, 2018 12:42 PM
> *To:* 'Kotresh Hiremath Ravishankar' 
> *Cc:* Sunny Kumar ; Gluster Users <
> gluster-users@gluster.org>
> *Subject:* RE: [Gluster-users] Upgrade to 4.1.2 geo-replication does not
> work
>
>
>
> Hi Kotresh,
>
>
>
> I have tested the geo replication over distributed volumes with 2*2
> gluster setup.
>
>
>
> [root@gluster-poc-noida ~]# gluster volume geo-replication glusterdist
> gluster-poc-sj::glusterdist status
>
>
>
> MASTER NODE  MASTER VOL MASTER BRICK  SLAVE
> USERSLAVE  SLAVE NODE STATUSCRAWL
> STATUS   LAST_SYNCED
>
> 
> 
> -
>
> gluster-poc-noidaglusterdist/data/gluster-dist/distvol
> root  gluster-poc-sj::glusterdistgluster-poc-sj Active
> Changelog Crawl2018-08-31 10:28:19
>
> noi-poc-gluster  glusterdist/data/gluster-dist/distvol
> root  gluster-poc-sj::glusterdistgluster-poc-sj2Active
> History Crawl      N/A
>
> [root@gluster-poc-noida ~]#
>
>
>
> Not at client I copied a 848MB file from local disk to master mounted
> volume and it took only 1 minute and 15 seconds. Its great….
>
>
&

Re: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does not work Now: Upgraded to 4.1.3 geo node Faulty

2018-09-02 Thread Kotresh Hiremath Ravishankar
Hi Marcus,

Geo-rep had a few important fixes in 4.1.3. Is it possible to upgrade and
check whether the issue is still seen?
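A minimal sketch of the upgrade on CentOS 7 with the Storage SIG packages
(assuming the SIG repository you installed from is still enabled; do one node
at a time):

# yum clean metadata
# yum update 'glusterfs*'
# systemctl restart glusterd
# gluster --version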

Thanks,
Kotresh HR

On Sat, Sep 1, 2018 at 5:08 PM, Marcus Pedersén 
wrote:

> Hi again,
>
> I found another problem on the other master node.
>
> The node toggles Active/Faulty and it is the same error over and over
> again.
>
>
> [2018-09-01 11:23:02.94080] E [repce(worker /urd-gds/gluster):197:__call__]
> RepceClient: call failedcall=1226:139955262510912:1535800981.24
> method=entry_opserror=GsyncdError
> [2018-09-01 11:23:02.94214] E [syncdutils(worker 
> /urd-gds/gluster):300:log_raise_exception]
> : execution of "gluster" failed with ENOENT (No such file or directory)
> [2018-09-01 11:23:02.106194] I [repce(agent /urd-gds/gluster):80:service_loop]
> RepceServer: terminating on reaching EOF.
> [2018-09-01 11:23:02.12] I [gsyncdstatus(monitor):244:set_worker_status]
> GeorepStatus: Worker Status Change status=Faulty
>
>
> I have also found a python error as well, I have only seen this once
> though.
>
>
> [2018-09-01 11:16:45.907660] I [master(worker
> /urd-gds/gluster):1536:crawl] _GMaster: slave's time
> stime=(1524101534, 0)
> [2018-09-01 11:16:47.364109] E [syncdutils(worker
> /urd-gds/gluster):332:log_raise_exception] : FAIL:
> Traceback (most recent call last):
>   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line
> 362, in twrap
> tf(*aargs)
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1939,
> in syncjob
> po = self.sync_engine(pb, self.log_err)
>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1442,
> in rsync
> rconf.ssh_ctl_args + \
> AttributeError: 'NoneType' object has no attribute 'split'
> [2018-09-01 11:16:47.384531] I [repce(agent /urd-gds/gluster):80:service_loop]
> RepceServer: terminating on reaching EOF.
> [2018-09-01 11:16:48.362987] I [monitor(monitor):279:monitor] Monitor:
> worker died in startup phase brick=/urd-gds/gluster
> [2018-09-01 11:16:48.370701] I [gsyncdstatus(monitor):244:set_worker_status]
> GeorepStatus: Worker Status Change status=Faulty
> [2018-09-01 11:16:58.390548] I [monitor(monitor):158:monitor] Monitor:
> starting gsyncd worker   brick=/urd-gds/gluster  slave_node=urd-gds-geo-000
>
>
> I attach the logs as well.
>
>
> Many thanks!
>
>
> Best regards
>
> Marcus Pedersén
>
>
>
>
> --
> *Från:* gluster-users-boun...@gluster.org  gluster.org> för Marcus Pedersén 
> *Skickat:* den 31 augusti 2018 16:09
> *Till:* khire...@redhat.com
>
> *Kopia:* gluster-users@gluster.org
> *Ämne:* Re: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does
> not work Now: Upgraded to 4.1.3 geo node Faulty
>
>
> I realy appologize, third try to make mail smaller.
>
>
> /Marcus
>
>
> --
> *Från:* Marcus Pedersén
> *Skickat:* den 31 augusti 2018 16:03
> *Till:* Kotresh Hiremath Ravishankar
> *Kopia:* gluster-users@gluster.org
> *Ämne:* SV: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does
> not work Now: Upgraded to 4.1.3 geo node Faulty
>
>
> Sorry, resend due to too large mail.
>
>
> /Marcus
> --
> *Från:* Marcus Pedersén
> *Skickat:* den 31 augusti 2018 15:19
> *Till:* Kotresh Hiremath Ravishankar
> *Kopia:* gluster-users@gluster.org
> *Ämne:* SV: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does
> not work Now: Upgraded to 4.1.3 geo node Faulty
>
>
> Hi Kotresh,
>
> Please find attached logs, only logs from today.
>
> The python error was repeated over and over again until I disabled selinux.
>
> After that the node bacame active again.
>
> The return code 23 seems to be repeated over and over again.
>
>
> rsync version 3.1.2
>
>
> Thanks a lot!
>
>
> Best regards
>
> Marcus
>
>
> --
> *Från:* Kotresh Hiremath Ravishankar 
> *Skickat:* den 31 augusti 2018 11:09
> *Till:* Marcus Pedersén
> *Kopia:* gluster-users@gluster.org
> *Ämne:* Re: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does
> not work Now: Upgraded to 4.1.3 geo node Faulty
>
> Hi Marcus,
>
> Could you attach full logs? Is the same trace back happening repeatedly?
> It will be helpful you attach the corresponding mount log as well.
> What's the rsync version, you are using?
>
> Thanks,
> Kotresh HR
>
> On Fri, Aug 31, 2018 at 12:16 PM, Marcus Pedersén 
> wrote:
>
>> Hi all,
>>
>> I had problems with stopping sync after upgrade to 4.1.2.
>>
>> I upgraded to 4.1.3 and it 

Re: [Gluster-users] Upgrade to 4.1.2 geo-replication does not work

2018-09-02 Thread Kotresh Hiremath Ravishankar
Hi Krishna,

Indexing is the feature used by the Hybrid crawl, which only makes the crawl
faster. It has nothing to do with the missing data sync.
Could you please share the complete log file of the session where the issue
is encountered?
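The worker log normally lives under /var/log/glusterfs/geo-replication/ on each
master node, in a directory named after the session (a sketch, assuming the
default log location; the directory name will match your session):

# ls /var/log/glusterfs/geo-replication/
# tail -n 200 /var/log/glusterfs/geo-replication/glusterdist_gluster-poc-sj_glusterdist/gsyncd.log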

Thanks,
Kotresh HR

On Mon, Sep 3, 2018 at 9:33 AM, Krishna Verma  wrote:

> Hi Kotresh/Support,
>
>
>
> Request your help to get it fix. My slave is not getting sync with master.
> When I restart the session after doing the indexing off then only it shows
> the file at slave but that is also blank with zero size.
>
>
>
> At master: file size is 5.8 GB.
>
>
>
> [root@gluster-poc-noida distvol]# du -sh 17.10.v001.20171023-201021_
> 17020_GPLV3.tar.gz
>
> 5.8G17.10.v001.20171023-201021_17020_GPLV3.tar.gz
>
> [root@gluster-poc-noida distvol]#
>
>
>
> But at slave, after doing the “indexing off” and restart the session and
> then wait for 2 days. It shows only 4.9 GB copied.
>
>
>
> [root@gluster-poc-sj distvol]# du -sh 17.10.v001.20171023-201021_
> 17020_GPLV3.tar.gz
>
> 4.9G17.10.v001.20171023-201021_17020_GPLV3.tar.gz
>
> [root@gluster-poc-sj distvol]#
>
>
>
> Similarly, I tested for small file of size 1.2 GB only that is still
> showing “0” size at slave  after days waiting time.
>
>
>
> At Master:
>
>
>
> [root@gluster-poc-noida distvol]# du -sh rflowTestInt18.08-b001.t.Z
>
> 1.2GrflowTestInt18.08-b001.t.Z
>
> [root@gluster-poc-noida distvol]#
>
>
>
> At Slave:
>
>
>
> [root@gluster-poc-sj distvol]# du -sh rflowTestInt18.08-b001.t.Z
>
> 0   rflowTestInt18.08-b001.t.Z
>
> [root@gluster-poc-sj distvol]#
>
>
>
> Below is my distributed volume info :
>
>
>
> [root@gluster-poc-noida distvol]# gluster volume info glusterdist
>
>
>
> Volume Name: glusterdist
>
> Type: Distribute
>
> Volume ID: af5b2915-7170-4b5e-aee8-7e68757b9bf1
>
> Status: Started
>
> Snapshot Count: 0
>
> Number of Bricks: 2
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: gluster-poc-noida:/data/gluster-dist/distvol
>
> Brick2: noi-poc-gluster:/data/gluster-dist/distvol
>
> Options Reconfigured:
>
> changelog.changelog: on
>
> geo-replication.ignore-pid-check: on
>
> geo-replication.indexing: on
>
> transport.address-family: inet
>
> nfs.disable: on
>
> [root@gluster-poc-noida distvol]#
>
>
>
> Please help to fix, I believe its not a normal behavior of gluster rsync.
>
>
>
> /Krishna
>
> *From:* Krishna Verma
> *Sent:* Friday, August 31, 2018 12:42 PM
> *To:* 'Kotresh Hiremath Ravishankar' 
> *Cc:* Sunny Kumar ; Gluster Users <
> gluster-users@gluster.org>
> *Subject:* RE: [Gluster-users] Upgrade to 4.1.2 geo-replication does not
> work
>
>
>
> Hi Kotresh,
>
>
>
> I have tested the geo replication over distributed volumes with 2*2
> gluster setup.
>
>
>
> [root@gluster-poc-noida ~]# gluster volume geo-replication glusterdist
> gluster-poc-sj::glusterdist status
>
>
>
> MASTER NODE  MASTER VOL MASTER BRICK  SLAVE
> USERSLAVE  SLAVE NODE STATUSCRAWL
> STATUS   LAST_SYNCED
>
> 
> 
> -
>
> gluster-poc-noidaglusterdist/data/gluster-dist/distvol
> root  gluster-poc-sj::glusterdistgluster-poc-sj Active
> Changelog Crawl2018-08-31 10:28:19
>
> noi-poc-gluster  glusterdist/data/gluster-dist/distvol
> root  gluster-poc-sj::glusterdistgluster-poc-sj2Active
> History Crawl  N/A
>
> [root@gluster-poc-noida ~]#
>
>
>
> Not at client I copied a 848MB file from local disk to master mounted
> volume and it took only 1 minute and 15 seconds. Its great….
>
>
>
> But even after waited for 2 hrs I was unable to see that file at slave
> site. Then I again erased the indexing by doing “gluster volume set
> glusterdist  indexing off” and restart the session. Magically I received
> the file instantly at slave after doing this.
>
>
>
> Why I need to do “indexing off” every time to reflect data at slave site?
> Is there any fix/workaround of it?
>
>
>
> /Krishna
>
>
>
>
>
> *From:* Kotresh Hiremath Ravishankar 
> *Sent:* Friday, August 31, 2018 10:10 AM
> *To:* Krishna Verma 
> *Cc:* Sunny Kumar ; Gluster Users <
> gluster-users@gluster.org>
> *Subject:* Re: [Gluster-users] Upgrade to 4.1.2 geo-replication does not
> work
>
>
>
> EXTERNAL MAI

Re: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does not work Now: Upgraded to 4.1.3 geo node Faulty

2018-08-31 Thread Kotresh Hiremath Ravishankar
Hi Marcus,

Could you attach the full logs? Is the same traceback happening repeatedly? It
would be helpful if you attach the corresponding mount log as well.
What rsync version are you using?
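For reference, these can be collected with something like the following
(default log locations assumed; the find also picks up the mount logs and, on
the slave, the geo-replication-slaves logs):

# rsync --version | head -1
# find /var/log/glusterfs/geo-replication* -name '*.log' -mmin -120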

Thanks,
Kotresh HR

On Fri, Aug 31, 2018 at 12:16 PM, Marcus Pedersén 
wrote:

> Hi all,
>
> I had problems with stopping sync after upgrade to 4.1.2.
>
> I upgraded to 4.1.3 and it ran fine for one day, but now one of the master
> nodes shows faulty.
>
> Most of the sync jobs have return code 23, how do I resolve this?
>
> I see messages like:
>
> _GMaster: Sucessfully fixed all entry ops with gfid mismatch
>
> Will this resolve error code 23?
>
> There is also a python error.
>
> The python error was a selinux problem, turning off selinux made node go
> to active again.
>
> See log below.
>
>
> CentOS 7, installed through SIG Gluster (OS updated to latest at the same
> time)
>
> Master cluster: 2 x (2 + 1) distributed, replicated
>
> Client cluster: 1 x (2 + 1) replicated
>
>
> Many thanks in advance!
>
>
> Best regards
>
> Marcus Pedersén
>
>
>
> gsyncd.log from Faulty node:
>
> [2018-08-31 06:25:51.375267] I [master(worker /urd-gds/gluster):1944:syncjob]
> Syncer: Sync Time Taken   duration=0.8099 num_files=57job=3
> return_code=23
> [2018-08-31 06:25:51.465895] I [master(worker /urd-gds/gluster):1944:syncjob]
> Syncer: Sync Time Taken   duration=0.0904 num_files=3 job=3
> return_code=23
> [2018-08-31 06:25:52.562107] E [repce(worker /urd-gds/gluster):197:__call__]
> RepceClient: call failed   call=30069:139655665837888:1535696752.35
> method=entry_opserror=OSError
> [2018-08-31 06:25:52.562346] E [syncdutils(worker
> /urd-gds/gluster):332:log_raise_exception] : FAIL:
> Traceback (most recent call last):
>   File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in
> main
> func(args)
>   File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in
> subcmd_worker
> local.service_loop(remote)
>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1288,
> in service_loop
> g3.crawlwrap(oneshot=True)
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 615, in
> crawlwrap
> self.crawl()
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1545,
> in crawl
> self.changelogs_batch_process(changes)
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1445,
> in changelogs_batch_process
> self.process(batch)
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1280,
> in process
> self.process_change(change, done, retry)
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1179,
> in process_change
> failures = self.slave.server.entry_ops(entries)
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 216, in
> __call__
> return self.ins(self.meth, *a)
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 198, in
> __call__
> raise res
> OSError: [Errno 13] Permission denied
> [2018-08-31 06:25:52.578367] I [repce(agent /urd-gds/gluster):80:service_loop]
> RepceServer: terminating on reaching EOF.
> [2018-08-31 06:25:53.558765] I [monitor(monitor):279:monitor] Monitor:
> worker died in startup phase brick=/urd-gds/gluster
> [2018-08-31 06:25:53.569777] I [gsyncdstatus(monitor):244:set_worker_status]
> GeorepStatus: Worker Status Change status=Faulty
> [2018-08-31 06:26:03.593161] I [monitor(monitor):158:monitor] Monitor:
> starting gsyncd worker   brick=/urd-gds/gluster  slave_node=urd-gds-geo-000
> [2018-08-31 06:26:03.636452] I [gsyncd(agent /urd-gds/gluster):297:main]
> : Using session config file   path=/var/lib/glusterd/geo-
> replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-31 06:26:03.636810] I [gsyncd(worker /urd-gds/gluster):297:main]
> : Using session config file  path=/var/lib/glusterd/geo-
> replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-31 06:26:03.637486] I [changelogagent(agent
> /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> [2018-08-31 06:26:03.650330] I [resource(worker 
> /urd-gds/gluster):1377:connect_remote]
> SSH: Initializing SSH connection between master and slave...
> [2018-08-31 06:26:05.296473] I [resource(worker 
> /urd-gds/gluster):1424:connect_remote]
> SSH: SSH connection between master and slave established.
> duration=1.6457
> [2018-08-31 06:26:05.297904] I [resource(worker 
> /urd-gds/gluster):1096:connect]
> GLUSTER: Mounting gluster volume locally...
> [2018-08-31 06:26:06.396939] I [resource(worker 
> /urd-gds/gluster):1119:connect]
> GLUSTER: Mounted gluster volume duration=1.0985
> [2018-08-31 06:26:06.397691] I [subcmds(worker 
> /urd-gds/gluster):70:subcmd_worker]
> : Worker spawn successful. Acknowledging back to monitor
> [2018-08-31 06:26:16.815566] I [master(worker /urd-gds/gluster):1593:register]
> _GMaster: Working dirpath=/var/lib/misc/gluster/
> 

Re: [Gluster-users] Upgrade to 4.1.2 geo-replication does not work

2018-08-30 Thread Kotresh Hiremath Ravishankar
On Thu, Aug 30, 2018 at 4:55 PM, Krishna Verma  wrote:

> Hi Kotresh,
>
>
>
> 1. Time to write 1GB to master:   27 minutes and 29 seconds
>
> 2. Time for geo-rep to transfer 1GB to slave.   8 minutes
>
This is hard to believe, considering there is no
distribution and there is only one brick participating in syncing.
Could you retest and confirm?
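One way to separate the two numbers (a sketch using the session and paths from
this thread; run the cp on the client where the master volume is mounted):

# gluster volume geo-replication glusterep gluster-poc-sj::glusterep stop
# time cp /tmp/gentoo_root.img /glusterfs/
# gluster volume geo-replication glusterep gluster-poc-sj::glusterep start
# watch -n 30 'gluster volume geo-replication glusterep gluster-poc-sj::glusterep status'

The `time` output is the pure write cost; the interval until LAST_SYNCED passes
the time of the write is the geo-rep cost.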

>
>
> /Krishna
>
>
>
>
>
> *From:* Kotresh Hiremath Ravishankar 
> *Sent:* Thursday, August 30, 2018 3:20 PM
>
> *To:* Krishna Verma 
> *Cc:* Sunny Kumar ; Gluster Users <
> gluster-users@gluster.org>
> *Subject:* Re: [Gluster-users] Upgrade to 4.1.2 geo-replication does not
> work
>
>
>
> EXTERNAL MAIL
>
>
>
>
>
> On Thu, Aug 30, 2018 at 1:52 PM, Krishna Verma  wrote:
>
> Hi Kotresh,
>
>
>
> After fix the library link on node "noi-poc-gluster ", the status of one
> mater node is “Active” and another is “Passive”. Can I setup both the
> master as “Active” ?
>
>
>
> Nope, since it's replica, it's redundant to sync same files from two
> nodes. Both replicas can't be Active.
>
>
>
>
>
> Also, when I copy a 1GB size of file from gluster client to master gluster
> volume which is replicated with the slave volume, it tooks 35 minutes and
> 49 seconds. Is there any way to reduce its time taken to rsync data.
>
>
>
> How did you measure this time? Does this include the time take for you to
> write 1GB file to master?
>
> There are two aspects to consider while measuring this.
>
>
>
> 1. Time to write 1GB to master
>
> 2. Time for geo-rep to transfer 1GB to slave.
>
>
>
> In your case, since the setup is 1*2 and only one geo-rep worker is
> Active, Step2 above equals to time for step1 + network transfer time.
>
>
>
> You can measure time in two scenarios
>
> 1. If geo-rep is started while the data is still being written to master.
> It's one way.
>
> 2. Or stop geo-rep until the 1GB file is written to master and then start
> geo-rep to get actual geo-rep time.
>
>
>
> To improve replicating speed,
>
> 1. You can play around with rsync options depending on the kind of I/O
>
> and configure the same for geo-rep as it also uses rsync internally.
>
> 2. It's better if gluster volume has more distribute count like  3*3 or 4*3
>
> It will help in two ways.
>
>1. The files gets distributed on master to multiple bricks
>
>2. So above will help geo-rep as files on multiple bricks are
> synced in parallel (multiple Actives)
>
>
>
> NOTE: Gluster master server and one client is in Noida, India Location.
>
>  Gluster Slave server and one client is in USA.
>
>
>
> Our approach is to transfer data from Noida gluster client will reach to
> the USA gluster client in a minimum time. Please suggest the best approach
> to achieve it.
>
>
>
> [root@noi-dcops ~]# date ; rsync -avh --progress /tmp/gentoo_root.img
> /glusterfs/ ; date
>
> Thu Aug 30 12:26:26 IST 2018
>
> sending incremental file list
>
> gentoo_root.img
>
>   1.07G 100%  490.70kB/s0:35:36 (xfr#1, to-chk=0/1)
>
>
>
> Is this I/O time to write to master volume?
>
>
>
> sent 1.07G bytes  received 35 bytes  499.65K bytes/sec
>
> total size is 1.07G  speedup is 1.00
>
> Thu Aug 30 13:02:15 IST 2018
>
> [root@noi-dcops ~]#
>
>
>
>
>
>
>
> [root@gluster-poc-noida gluster]#  gluster volume geo-replication status
>
>
>
> MASTER NODE  MASTER VOLMASTER BRICK SLAVE USER
> SLAVE  SLAVE NODESTATUS CRAWL
> STATUS   LAST_SYNCED
>
> 
> 
> -----------
>
> gluster-poc-noidaglusterep /data/gluster/gv0root
> ssh://gluster-poc-sj::glusterepgluster-poc-sjActive Changelog
> Crawl2018-08-30 13:42:18
>
> noi-poc-gluster  glusterep /data/gluster/gv0root
> ssh://gluster-poc-sj::glusterepgluster-poc-sjPassive
> N/AN/A
>
> [root@gluster-poc-noida gluster]#
>
>
>
> Thanks in advance for your all time support.
>
>
>
> /Krishna
>
>
>
> *From:* Kotresh Hiremath Ravishankar 
> *Sent:* Thursday, August 30, 2018 10:51 AM
>
>
> *To:* Krishna Verma 
> *Cc:* Sunny Kumar ; Gluster Users <
> gluster-users@gluster.org>
> *Subject:* Re: [Gluster-users] Upgrade to 4.1.2 geo-replication does not
> work
>
>
>
> EXTERNAL MAIL
>
&

Re: [Gluster-users] Upgrade to 4.1.2 geo-replication does not work

2018-08-30 Thread Kotresh Hiremath Ravishankar
On Thu, Aug 30, 2018 at 3:51 PM, Krishna Verma  wrote:

> Hi Kotresh,
>
>
>
> Yes, this include the time take  to write 1GB file to master. geo-rep was
> not stopped while the data was copying to master.
>

This way, you can't really measure how much time geo-rep took.


>
> But now I am trouble, My putty session was timed out while copying data to
> master and geo replication was active. After I restart putty session My
> Master data is not syncing with slave. Its Last_synced time is  1hrs behind
> the current time.
>
>
>
> I restart the geo rep and also delete and again create the session but its
>  “LAST_SYNCED” time is same.
>

Unless geo-rep is Faulty, it would be processing/syncing. You should check
the logs for any errors.
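For example (a sketch assuming the default log location; adjust the path if
your session logs live elsewhere):

# grep -E '\] E \[' /var/log/glusterfs/geo-replication/*/gsyncd.log | tail -n 50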


>
> Please help in this.
>
>
>
> …. It's better if gluster volume has more distribute count like  3*3 or
> 4*3 :- Are you refereeing to create a distributed volume with 3 master
> node and 3 slave node?
>

Yes, that's correct. Please do the test with this. I recommend running
the actual workload for which you are planning to use gluster instead of
copying a 1GB file and testing.
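For illustration, a 3 x 3 distributed-replicate layout would look something
like this (host names and brick paths are placeholders; a plain distribute
across three or more bricks gives geo-rep the same parallelism without the
replication):

# gluster volume create glusterdist replica 3 \
    host1:/data/brick1/distvol host2:/data/brick1/distvol host3:/data/brick1/distvol \
    host1:/data/brick2/distvol host2:/data/brick2/distvol host3:/data/brick2/distvol \
    host1:/data/brick3/distvol host2:/data/brick3/distvol host3:/data/brick3/distvol
# gluster volume start glusterdist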


>
>
>
> /krishna
>
>
>
> *From:* Kotresh Hiremath Ravishankar 
> *Sent:* Thursday, August 30, 2018 3:20 PM
>
> *To:* Krishna Verma 
> *Cc:* Sunny Kumar ; Gluster Users <
> gluster-users@gluster.org>
> *Subject:* Re: [Gluster-users] Upgrade to 4.1.2 geo-replication does not
> work
>
>
>
> EXTERNAL MAIL
>
>
>
>
>
> On Thu, Aug 30, 2018 at 1:52 PM, Krishna Verma  wrote:
>
> Hi Kotresh,
>
>
>
> After fix the library link on node "noi-poc-gluster ", the status of one
> mater node is “Active” and another is “Passive”. Can I setup both the
> master as “Active” ?
>
>
>
> Nope, since it's replica, it's redundant to sync same files from two
> nodes. Both replicas can't be Active.
>
>
>
>
>
> Also, when I copy a 1GB size of file from gluster client to master gluster
> volume which is replicated with the slave volume, it tooks 35 minutes and
> 49 seconds. Is there any way to reduce its time taken to rsync data.
>
>
>
> How did you measure this time? Does this include the time take for you to
> write 1GB file to master?
>
> There are two aspects to consider while measuring this.
>
>
>
> 1. Time to write 1GB to master
>
> 2. Time for geo-rep to transfer 1GB to slave.
>
>
>
> In your case, since the setup is 1*2 and only one geo-rep worker is
> Active, Step2 above equals to time for step1 + network transfer time.
>
>
>
> You can measure time in two scenarios
>
> 1. If geo-rep is started while the data is still being written to master.
> It's one way.
>
> 2. Or stop geo-rep until the 1GB file is written to master and then start
> geo-rep to get actual geo-rep time.
>
>
>
> To improve replicating speed,
>
> 1. You can play around with rsync options depending on the kind of I/O
>
> and configure the same for geo-rep as it also uses rsync internally.
>
> 2. It's better if gluster volume has more distribute count like  3*3 or 4*3
>
> It will help in two ways.
>
>1. The files gets distributed on master to multiple bricks
>
>2. So above will help geo-rep as files on multiple bricks are
> synced in parallel (multiple Actives)
>
>
>
> NOTE: Gluster master server and one client is in Noida, India Location.
>
>  Gluster Slave server and one client is in USA.
>
>
>
> Our approach is to transfer data from Noida gluster client will reach to
> the USA gluster client in a minimum time. Please suggest the best approach
> to achieve it.
>
>
>
> [root@noi-dcops ~]# date ; rsync -avh --progress /tmp/gentoo_root.img
> /glusterfs/ ; date
>
> Thu Aug 30 12:26:26 IST 2018
>
> sending incremental file list
>
> gentoo_root.img
>
>   1.07G 100%  490.70kB/s0:35:36 (xfr#1, to-chk=0/1)
>
>
>
> Is this I/O time to write to master volume?
>
>
>
> sent 1.07G bytes  received 35 bytes  499.65K bytes/sec
>
> total size is 1.07G  speedup is 1.00
>
> Thu Aug 30 13:02:15 IST 2018
>
> [root@noi-dcops ~]#
>
>
>
>
>
>
>
> [root@gluster-poc-noida gluster]#  gluster volume geo-replication status
>
>
>
> MASTER NODE  MASTER VOLMASTER BRICK SLAVE USER
> SLAVE  SLAVE NODESTATUS CRAWL
> STATUS   LAST_SYNCED
>
> 
> ------------
> ---
>
> gluster-poc-noida  

Re: [Gluster-users] Upgrade to 4.1.2 geo-replication does not work

2018-08-30 Thread Kotresh Hiremath Ravishankar
On Thu, Aug 30, 2018 at 1:52 PM, Krishna Verma  wrote:

> Hi Kotresh,
>
>
>
> After fix the library link on node "noi-poc-gluster ", the status of one
> mater node is “Active” and another is “Passive”. Can I setup both the
> master as “Active” ?
>

Nope, since it's a replica, it's redundant to sync the same files from two nodes.
Both replicas can't be Active.


>
> Also, when I copy a 1GB size of file from gluster client to master gluster
> volume which is replicated with the slave volume, it tooks 35 minutes and
> 49 seconds. Is there any way to reduce its time taken to rsync data.
>

How did you measure this time? Does this include the time taken for you to
write the 1GB file to master?
There are two aspects to consider while measuring this:

1. Time to write 1GB to master
2. Time for geo-rep to transfer 1GB to slave.

In your case, since the setup is 1*2 and only one geo-rep worker is Active,
step 2 above equals the time for step 1 plus the network transfer time.

You can measure the time in two scenarios:
1. Start geo-rep while the data is still being written to master. That's one way.
2. Or stop geo-rep until the 1GB file is written to master, and then start
geo-rep to get the actual geo-rep time.

To improve replication speed:
1. You can play around with rsync options, depending on the kind of I/O,
and configure the same for geo-rep as it also uses rsync internally (see the
config sketch after this list).
2. It's better if the gluster volume has a higher distribute count, like 3*3 or 4*3.
It will help in two ways:
   1. The files get distributed on the master across multiple bricks.
   2. So the above helps geo-rep, as files on multiple bricks are synced
in parallel (multiple Actives).
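As an illustration of point 1, rsync flags can be passed through the geo-rep
config interface, along these lines (a sketch; which flags help depends on your
I/O pattern, and the rsync-options key is assumed to be available in this
release):

# gluster volume geo-replication glusterep gluster-poc-sj::glusterep \
    config rsync-options "--whole-file --compress-level=0"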

>
>
> NOTE: Gluster master server and one client is in Noida, India Location.
>
>  Gluster Slave server and one client is in USA.
>
>
>
> Our approach is to transfer data from Noida gluster client will reach to
> the USA gluster client in a minimum time. Please suggest the best approach
> to achieve it.
>
>
>
> [root@noi-dcops ~]# date ; rsync -avh --progress /tmp/gentoo_root.img
> /glusterfs/ ; date
>
> Thu Aug 30 12:26:26 IST 2018
>
> sending incremental file list
>
> gentoo_root.img
>
>   1.07G 100%  490.70kB/s0:35:36 (xfr#1, to-chk=0/1)
>

Is this I/O time to write to master volume?

>
>
> sent 1.07G bytes  received 35 bytes  499.65K bytes/sec
>
> total size is 1.07G  speedup is 1.00
>
> Thu Aug 30 13:02:15 IST 2018
>
> [root@noi-dcops ~]#
>


>
>
>
>
> [root@gluster-poc-noida gluster]#  gluster volume geo-replication status
>
>
>
> MASTER NODE  MASTER VOLMASTER BRICK SLAVE USER
> SLAVE  SLAVE NODESTATUS CRAWL
> STATUS   LAST_SYNCED
>
> 
> 
> ---
>
> gluster-poc-noidaglusterep /data/gluster/gv0root
> ssh://gluster-poc-sj::glusterepgluster-poc-sjActive Changelog
> Crawl2018-08-30 13:42:18
>
> noi-poc-gluster  glusterep /data/gluster/gv0root
> ssh://gluster-poc-sj::glusterepgluster-poc-sjPassive
> N/AN/A
>
> [root@gluster-poc-noida gluster]#
>
>
>
> Thanks in advance for your all time support.
>
>
>
> /Krishna
>
>
>
> *From:* Kotresh Hiremath Ravishankar 
> *Sent:* Thursday, August 30, 2018 10:51 AM
>
> *To:* Krishna Verma 
> *Cc:* Sunny Kumar ; Gluster Users <
> gluster-users@gluster.org>
> *Subject:* Re: [Gluster-users] Upgrade to 4.1.2 geo-replication does not
> work
>
>
>
> EXTERNAL MAIL
>
> Did you fix the library link on node "noi-poc-gluster " as well?
>
> If not please fix it. Please share the geo-rep log this node if it's
>
> as different issue.
>
> -Kotresh HR
>
>
>
> On Thu, Aug 30, 2018 at 12:17 AM, Krishna Verma 
> wrote:
>
> Hi Kotresh,
>
>
>
> Thank you so much for you input. Geo-replication is now showing “Active”
> atleast for 1 master node. But its still at faulty state for the 2nd
> master server.
>
>
>
> Below is the detail.
>
>
>
> [root@gluster-poc-noida glusterfs]# gluster volume geo-replication
> glusterep gluster-poc-sj::glusterep status
>
>
>
> MASTER NODE  MASTER VOLMASTER BRICK SLAVE USER
> SLAVESLAVE NODESTATUSCRAWL STATUS
> LAST_SYNCED
>
> 
> 
> 
>
> gluster-poc-noidaglusterep 

Re: [Gluster-users] Upgrade to 4.1.2 geo-replication does not work

2018-08-29 Thread Kotresh Hiremath Ravishankar
Did you fix the library link on node "noi-poc-gluster" as well?
If not, please fix it. Please share the geo-rep log from this node if it's
a different issue.

-Kotresh HR

On Thu, Aug 30, 2018 at 12:17 AM, Krishna Verma  wrote:

> Hi Kotresh,
>
>
>
> Thank you so much for you input. Geo-replication is now showing “Active”
> atleast for 1 master node. But its still at faulty state for the 2nd
> master server.
>
>
>
> Below is the detail.
>
>
>
> [root@gluster-poc-noida glusterfs]# gluster volume geo-replication
> glusterep gluster-poc-sj::glusterep status
>
>
>
> MASTER NODE  MASTER VOLMASTER BRICK SLAVE USER
> SLAVESLAVE NODESTATUSCRAWL STATUS
> LAST_SYNCED
>
> 
> 
> 
>
> gluster-poc-noidaglusterep /data/gluster/gv0root
> gluster-poc-sj::glusterepgluster-poc-sjActiveChangelog Crawl
> 2018-08-29 23:56:06
>
> noi-poc-gluster  glusterep /data/gluster/gv0root
> gluster-poc-sj::glusterepN/A   FaultyN/A
> N/A
>
>
>
>
>
> [root@gluster-poc-noida glusterfs]# gluster volume status
>
> Status of volume: glusterep
>
> Gluster process TCP Port  RDMA Port  Online
> Pid
>
> 
> --
>
> Brick gluster-poc-noida:/data/gluster/gv0   49152 0  Y
> 22463
>
> Brick noi-poc-gluster:/data/gluster/gv0 49152 0  Y
> 19471
>
> Self-heal Daemon on localhost   N/A   N/AY
> 32087
>
> Self-heal Daemon on noi-poc-gluster N/A   N/AY
> 6272
>
>
>
> Task Status of Volume glusterep
>
> 
> --
>
> There are no active volume tasks
>
>
>
>
>
>
>
> [root@gluster-poc-noida glusterfs]# gluster volume info
>
>
>
> Volume Name: glusterep
>
> Type: Replicate
>
> Volume ID: 4a71bc94-14ce-4b2c-abc4-e6a9a9765161
>
> Status: Started
>
> Snapshot Count: 0
>
> Number of Bricks: 1 x 2 = 2
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: gluster-poc-noida:/data/gluster/gv0
>
> Brick2: noi-poc-gluster:/data/gluster/gv0
>
> Options Reconfigured:
>
> transport.address-family: inet
>
> nfs.disable: on
>
> performance.client-io-threads: off
>
> geo-replication.indexing: on
>
> geo-replication.ignore-pid-check: on
>
> changelog.changelog: on
>
> [root@gluster-poc-noida glusterfs]#
>
>
>
> Could you please help me in that also please?
>
>
>
> It would be really a great help from your side.
>
>
>
> /Krishna
>
> *From:* Kotresh Hiremath Ravishankar 
> *Sent:* Wednesday, August 29, 2018 10:47 AM
>
> *To:* Krishna Verma 
> *Cc:* Sunny Kumar ; Gluster Users <
> gluster-users@gluster.org>
> *Subject:* Re: [Gluster-users] Upgrade to 4.1.2 geo-replication does not
> work
>
>
>
> EXTERNAL MAIL
>
> Answer inline
>
>
>
> On Tue, Aug 28, 2018 at 4:28 PM, Krishna Verma  wrote:
>
> Hi Kotresh,
>
>
>
> I created the links before. Below is the detail.
>
>
>
> [root@gluster-poc-noida ~]# ls -l /usr/lib64 | grep libgfch
>
> lrwxrwxrwx   1 root root  30 Aug 28 14:59 libgfchangelog.so ->
> /usr/lib64/libgfchangelog.so.1
>
>
>
> The link created is pointing to wrong library. Please fix this
>
>
>
> #cd /usr/lib64
>
> #rm libgfchangelog.so
>
> #ln -s "libgfchangelog.so.0.0.1" libgfchangelog.so
>
>
>
> lrwxrwxrwx   1 root root  23 Aug 23 23:35 libgfchangelog.so.0 ->
> libgfchangelog.so.0.0.1
>
> -rwxr-xr-x   1 root root   63384 Jul 24 19:11 libgfchangelog.so.0.0.1
>
> [root@gluster-poc-noida ~]# locate libgfchangelog.so
>
> /usr/lib64/libgfchangelog.so.0
>
> /usr/lib64/libgfchangelog.so.0.0.1
>
> [root@gluster-poc-noida ~]#
>
>
>
> Is it looks good what we exactly need or di I need to create any more link
> or How to get “libgfchangelog.so” file if missing.
>
>
>
> /Krishna
>
>
>
> *From:* Kotresh Hiremath Ravishankar 
> *Sent:* Tuesday, August 28, 2018 4:22 PM
> *To:* Krishna Verma 
> *Cc:* Sunny Kumar ; Gluster Users <
> gluster-users@gluster.org>
>
>
> *Subject:* Re: [Gluster-users] Upgrade to 4.1.2 geo-replication does not
> work
>
>
>
> EXTERNAL MAIL
>
> Hi Krishna,
>
> As per the

Re: [Gluster-users] Upgrade to 4.1.2 geo-replication does not work

2018-08-28 Thread Kotresh Hiremath Ravishankar
Answer inline

On Tue, Aug 28, 2018 at 4:28 PM, Krishna Verma  wrote:

> Hi Kotresh,
>
>
>
> I created the links before. Below is the detail.
>
>
>
> [root@gluster-poc-noida ~]# ls -l /usr/lib64 | grep libgfch
>
> lrwxrwxrwx   1 root root  30 Aug 28 14:59 libgfchangelog.so ->
> /usr/lib64/libgfchangelog.so.1
>

The link created is pointing to the wrong library. Please fix this:

#cd /usr/lib64
#rm libgfchangelog.so
#ln -s "libgfchangelog.so.0.0.1" libgfchangelog.so

lrwxrwxrwx   1 root root  23 Aug 23 23:35 libgfchangelog.so.0 ->
> libgfchangelog.so.0.0.1
>
> -rwxr-xr-x   1 root root   63384 Jul 24 19:11 libgfchangelog.so.0.0.1
>
> [root@gluster-poc-noida ~]# locate libgfchangelog.so
>
> /usr/lib64/libgfchangelog.so.0
>
> /usr/lib64/libgfchangelog.so.0.0.1
>
> [root@gluster-poc-noida ~]#
>
>
>
> Is it looks good what we exactly need or di I need to create any more link
> or How to get “libgfchangelog.so” file if missing.
>
>
>
> /Krishna
>
>
>
> *From:* Kotresh Hiremath Ravishankar 
> *Sent:* Tuesday, August 28, 2018 4:22 PM
> *To:* Krishna Verma 
> *Cc:* Sunny Kumar ; Gluster Users <
> gluster-users@gluster.org>
>
> *Subject:* Re: [Gluster-users] Upgrade to 4.1.2 geo-replication does not
> work
>
>
>
> EXTERNAL MAIL
>
> Hi Krishna,
>
> As per the output shared, I don't see the file "libgfchangelog.so" which
> is what is required.
>
> I only see "libgfchangelog.so.0". Please confirm "libgfchangelog.so" is
> present in "/usr/lib64/".
>
> If not create a symlink similar to "libgfchangelog.so.0"
>
>
>
> It should be something like below.
>
>
>
> #ls -l /usr/lib64 | grep libgfch
> -rwxr-xr-x. 1 root root1078 Aug 28 05:56 libgfchangelog.la
> <https://urldefense.proofpoint.com/v2/url?u=http-3A__libgfchangelog.la=DwMFaQ=aUq983L2pue2FqKFoP6PGHMJQyoJ7kl3s3GZ-_haXqY=0E5nRoxLsT2ZXgCpJM_6ZItAWQ2jH8rVLG6tiXhoLFE=77GIqpHy9HY8RQd6lKzSJ-Z1PCuIhZJ3I3IvIuDX-xo=kIFnrBaSFV_DdqZezd6PXcDnD8Iy_gVN69ETZYtykEE=>
> lrwxrwxrwx. 1 root root  23 Aug 28 05:56 libgfchangelog.so ->
> libgfchangelog.so.0.0.1
> lrwxrwxrwx. 1 root root  23 Aug 28 05:56 libgfchangelog.so.0 ->
> libgfchangelog.so.0.0.1
> -rwxr-xr-x. 1 root root  336888 Aug 28 05:56 libgfchangelog.so.0.0.1
>
>
>
> On Tue, Aug 28, 2018 at 4:04 PM, Krishna Verma  wrote:
>
> Hi Kotresh,
>
>
>
> Thanks for the response, I did that also but nothing changed.
>
>
>
> [root@gluster-poc-noida ~]# ldconfig /usr/lib64
>
> [root@gluster-poc-noida ~]# ldconfig -p | grep libgfchangelog
>
> libgfchangelog.so.0 (libc6,x86-64) =>
> /usr/lib64/libgfchangelog.so.0
>
> [root@gluster-poc-noida ~]#
>
>
>
> [root@gluster-poc-noida ~]# gluster volume geo-replication glusterep
> gluster-poc-sj::glusterep stop
>
> Stopping geo-replication session between glusterep &
> gluster-poc-sj::glusterep has been successful
>
> [root@gluster-poc-noida ~]# gluster volume geo-replication glusterep
> gluster-poc-sj::glusterep start
>
> Starting geo-replication session between glusterep &
> gluster-poc-sj::glusterep has been successful
>
>
>
> [root@gluster-poc-noida ~]# gluster volume geo-replication glusterep
> gluster-poc-sj::glusterep status
>
>
>
> MASTER NODE  MASTER VOLMASTER BRICK SLAVE USER
> SLAVESLAVE NODESTATUSCRAWL STATUS
> LAST_SYNCED
>
> --------
> ----
> -
>
> gluster-poc-noidaglusterep /data/gluster/gv0root
> gluster-poc-sj::glusterepN/A   FaultyN/A N/A
>
> noi-poc-gluster  glusterep /data/gluster/gv0root
>gluster-poc-sj::glusterepN/A   Faulty
> N/A N/A
>
> [root@gluster-poc-noida ~]#
>
>
>
> /Krishna
>
>
>
> *From:* Kotresh Hiremath Ravishankar 
> *Sent:* Tuesday, August 28, 2018 4:00 PM
> *To:* Sunny Kumar 
> *Cc:* Krishna Verma ; Gluster Users <
> gluster-users@gluster.org>
>
>
> *Subject:* Re: [Gluster-users] Upgrade to 4.1.2 geo-replication does not
> work
>
>
>
> EXTERNAL MAIL
>
> Hi Krishna,
>
> Since your libraries are in /usr/lib64, you should be doing
>
> #ldconfig /usr/lib64
>
> Confirm that below command lists the library
>
> #ldconfig -p | grep libgfchangelog
>
>
>
>
>
> On Tue, Aug 28, 2018 at 3:52 PM, Sunny Kumar  wrote:
>
> can you do ldconfig /usr/local/lib and share the output o

Re: [Gluster-users] Upgrade to 4.1.2 geo-replication does not work

2018-08-28 Thread Kotresh Hiremath Ravishankar
Hi Krishna,

As per the output shared, I don't see the file "libgfchangelog.so" which is
what is required.
I only see "libgfchangelog.so.0". Please confirm "libgfchangelog.so" is
present in "/usr/lib64/".
If not, create a symlink similar to "libgfchangelog.so.0".

It should be something like below.

#ls -l /usr/lib64 | grep libgfch
-rwxr-xr-x. 1 root root1078 Aug 28 05:56 libgfchangelog.la
lrwxrwxrwx. 1 root root  23 Aug 28 05:56 libgfchangelog.so ->
libgfchangelog.so.0.0.1
lrwxrwxrwx. 1 root root  23 Aug 28 05:56 libgfchangelog.so.0 ->
libgfchangelog.so.0.0.1
-rwxr-xr-x. 1 root root  336888 Aug 28 05:56 libgfchangelog.so.0.0.1


On Tue, Aug 28, 2018 at 4:04 PM, Krishna Verma  wrote:

> Hi Kotresh,
>
>
>
> Thanks for the response, I did that also but nothing changed.
>
>
>
> [root@gluster-poc-noida ~]# ldconfig /usr/lib64
>
> [root@gluster-poc-noida ~]# ldconfig -p | grep libgfchangelog
>
> libgfchangelog.so.0 (libc6,x86-64) =>
> /usr/lib64/libgfchangelog.so.0
>
> [root@gluster-poc-noida ~]#
>
>
>
> [root@gluster-poc-noida ~]# gluster volume geo-replication glusterep
> gluster-poc-sj::glusterep stop
>
> Stopping geo-replication session between glusterep &
> gluster-poc-sj::glusterep has been successful
>
> [root@gluster-poc-noida ~]# gluster volume geo-replication glusterep
> gluster-poc-sj::glusterep start
>
> Starting geo-replication session between glusterep &
> gluster-poc-sj::glusterep has been successful
>
>
>
> [root@gluster-poc-noida ~]# gluster volume geo-replication glusterep
> gluster-poc-sj::glusterep status
>
>
>
> MASTER NODE  MASTER VOLMASTER BRICK SLAVE USER
> SLAVESLAVE NODESTATUSCRAWL STATUS
> LAST_SYNCED
>
> 
> 
> -
>
> gluster-poc-noidaglusterep /data/gluster/gv0root
> gluster-poc-sj::glusterepN/A   FaultyN/A N/A
>
> noi-poc-gluster      glusterep /data/gluster/gv0root
>gluster-poc-sj::glusterepN/A   Faulty
> N/A N/A
>
> [root@gluster-poc-noida ~]#
>
>
>
> /Krishna
>
>
>
> *From:* Kotresh Hiremath Ravishankar 
> *Sent:* Tuesday, August 28, 2018 4:00 PM
> *To:* Sunny Kumar 
> *Cc:* Krishna Verma ; Gluster Users <
> gluster-users@gluster.org>
>
> *Subject:* Re: [Gluster-users] Upgrade to 4.1.2 geo-replication does not
> work
>
>
>
> EXTERNAL MAIL
>
> Hi Krishna,
>
> Since your libraries are in /usr/lib64, you should be doing
>
> #ldconfig /usr/lib64
>
> Confirm that below command lists the library
>
> #ldconfig -p | grep libgfchangelog
>
>
>
>
>
> On Tue, Aug 28, 2018 at 3:52 PM, Sunny Kumar  wrote:
>
> can you do ldconfig /usr/local/lib and share the output of ldconfig -p
> /usr/local/lib | grep libgf
>
> On Tue, Aug 28, 2018 at 3:45 PM Krishna Verma  wrote:
> >
> > Hi Sunny,
> >
> > I did the mentioned changes given in patch and restart the session for
> geo-replication. But again same errors in the logs.
> >
> > I have attaching the config files and logs here.
> >
> >
> > [root@gluster-poc-noida ~]# gluster volume geo-replication glusterep
> gluster-poc-sj::glusterep stop
> > Stopping geo-replication session between glusterep &
> gluster-poc-sj::glusterep has been successful
> > [root@gluster-poc-noida ~]# gluster volume geo-replication glusterep
> gluster-poc-sj::glusterep delete
> > Deleting geo-replication session between glusterep &
> gluster-poc-sj::glusterep has been successful
> > [root@gluster-poc-noida ~]# gluster volume geo-replication glusterep
> gluster-poc-sj::glusterep create push-pem force
> > Creating geo-replication session between glusterep &
> gluster-poc-sj::glusterep has been successful
> > [root@gluster-poc-noida ~]# gluster volume geo-replication glusterep
> gluster-poc-sj::glusterep start
> > geo-replication start failed for glusterep gluster-poc-sj::glusterep
> > geo-replication command failed
> > [root@gluster-poc-noida ~]# gluster volume geo-replication glusterep
> gluster-poc-sj::glusterep start
> > geo-replication start failed for glusterep gluster-poc-sj::glusterep
> > geo-replication command failed
> > [root@gluster-poc-noida ~]# vim /usr/libexec/glusterfs/python/
> syncdaemon/repce.py
> > [r

Re: [Gluster-users] Upgrade to 4.1.2 geo-replication does not work

2018-08-28 Thread Kotresh Hiremath Ravishankar
Hi Krishna,

Since your libraries are in /usr/lib64, you should be doing

#ldconfig /usr/lib64

Confirm that the below command lists the library:

#ldconfig -p | grep libgfchangelog



On Tue, Aug 28, 2018 at 3:52 PM, Sunny Kumar  wrote:

> can you do ldconfig /usr/local/lib and share the output of ldconfig -p
> /usr/local/lib | grep libgf
> On Tue, Aug 28, 2018 at 3:45 PM Krishna Verma  wrote:
> >
> > Hi Sunny,
> >
> > I did the mentioned changes given in patch and restart the session for
> geo-replication. But again same errors in the logs.
> >
> > I have attaching the config files and logs here.
> >
> >
> > [root@gluster-poc-noida ~]# gluster volume geo-replication glusterep
> gluster-poc-sj::glusterep stop
> > Stopping geo-replication session between glusterep &
> gluster-poc-sj::glusterep has been successful
> > [root@gluster-poc-noida ~]# gluster volume geo-replication glusterep
> gluster-poc-sj::glusterep delete
> > Deleting geo-replication session between glusterep &
> gluster-poc-sj::glusterep has been successful
> > [root@gluster-poc-noida ~]# gluster volume geo-replication glusterep
> gluster-poc-sj::glusterep create push-pem force
> > Creating geo-replication session between glusterep &
> gluster-poc-sj::glusterep has been successful
> > [root@gluster-poc-noida ~]# gluster volume geo-replication glusterep
> gluster-poc-sj::glusterep start
> > geo-replication start failed for glusterep gluster-poc-sj::glusterep
> > geo-replication command failed
> > [root@gluster-poc-noida ~]# gluster volume geo-replication glusterep
> gluster-poc-sj::glusterep start
> > geo-replication start failed for glusterep gluster-poc-sj::glusterep
> > geo-replication command failed
> > [root@gluster-poc-noida ~]# vim /usr/libexec/glusterfs/python/
> syncdaemon/repce.py
> > [root@gluster-poc-noida ~]# systemctl restart glusterd
> > [root@gluster-poc-noida ~]# gluster volume geo-replication glusterep
> gluster-poc-sj::glusterep start
> > Starting geo-replication session between glusterep &
> gluster-poc-sj::glusterep has been successful
> > [root@gluster-poc-noida ~]# gluster volume geo-replication glusterep
> gluster-poc-sj::glusterep status
> >
> > MASTER NODE  MASTER VOLMASTER BRICK SLAVE USER
> SLAVESLAVE NODESTATUSCRAWL STATUS
> LAST_SYNCED
> > 
> 
> -
> > gluster-poc-noidaglusterep /data/gluster/gv0root
> gluster-poc-sj::glusterepN/A   FaultyN/A N/A
> > noi-poc-gluster  glusterep /data/gluster/gv0root
> gluster-poc-sj::glusterepN/A   FaultyN/A N/A
> > [root@gluster-poc-noida ~]#
> >
> >
> > /Krishna.
> >
> > -Original Message-
> > From: Sunny Kumar 
> > Sent: Tuesday, August 28, 2018 3:17 PM
> > To: Krishna Verma 
> > Cc: gluster-users@gluster.org
> > Subject: Re: [Gluster-users] Upgrade to 4.1.2 geo-replication does not
> work
> >
> > EXTERNAL MAIL
> >
> >
> > With same log message ?
> >
> > Can you please verify that
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__review.
> gluster.org_-23_c_glusterfs_-2B_20207_=DwIBaQ=
> aUq983L2pue2FqKFoP6PGHMJQyoJ7kl3s3GZ-_haXqY=0E5nRoxLsT2ZXgCpJM_
> 6ZItAWQ2jH8rVLG6tiXhoLFE=F0ExtFUfa_YCktOGvy82x3IAxvi2GrbPR72jZ8beuYk=
> fGtkmezHJj5YoLN3dUeVUCcYFnREHyOSk36mRjbTTEQ= patch is present if not
> can you please apply that.
> > and try with symlinking ln -s /usr/lib64/libgfchangelog.so.0
> /usr/lib64/libgfchangelog.so.
> >
> > Please share the log also.
> >
> > Regards,
> > Sunny
> > On Tue, Aug 28, 2018 at 3:02 PM Krishna Verma 
> wrote:
> > >
> > > Hi Sunny,
> > >
> > > Thanks for your response, I tried both, but still I am getting the
> same error.
> > >
> > >
> > > [root@noi-poc-gluster ~]# ldconfig /usr/lib [root@noi-poc-gluster ~]#
> > >
> > > [root@noi-poc-gluster ~]# ln -s /usr/lib64/libgfchangelog.so.1
> > > /usr/lib64/libgfchangelog.so [root@noi-poc-gluster ~]# ls -l
> > > /usr/lib64/libgfchangelog.so lrwxrwxrwx. 1 root root 30 Aug 28 14:59
> > > /usr/lib64/libgfchangelog.so -> /usr/lib64/libgfchangelog.so.1
> > >
> > > /Krishna
> > >
> > > -Original Message-
> > > From: Sunny Kumar 
> > > Sent: Tuesday, August 28, 2018 2:55 PM
> > > To: Krishna Verma 
> > > Cc: gluster-users@gluster.org
> > > Subject: Re: [Gluster-users] Upgrade to 4.1.2 geo-replication does not
> > > work
> > >
> > > EXTERNAL MAIL
> > >
> > >
> > > Hi Krish,
> > >
> > > You can run -
> > > #ldconfig /usr/lib
> > >
> > > If that still does not solves your problem you can do manual symlink
> > > like - ln -s /usr/lib64/libgfchangelog.so.1
> > > /usr/lib64/libgfchangelog.so
> > >
> > > Thanks,
> > > Sunny Kumar
> > > On Tue, Aug 28, 2018 at 1:47 PM Krishna Verma 
> wrote:
> > > >
> > > > Hi
> > > >
> > > >
> > > >
> > > > I am getting below error in gsyncd.log
> > > >
> > > >
> > > >
> > > > OSError: 

Re: [Gluster-users] Question to utime feature for release 4.1.0

2018-08-16 Thread Kotresh Hiremath Ravishankar
Hi David,

With this feature enabled, consistent time attributes (mtime, ctime, atime)
are maintained in an xattr on the file. Gluster will then no longer use the
time attributes from the backend; they are served from the xattr of that
file, which is consistent across the replica set.
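
If you want to double-check this on the backend, you can look at the xattr
directly on a brick. This is only an illustrative check - it assumes the
ctime feature stores its data in the trusted.glusterfs.mdata xattr, and the
brick path and file name below are placeholders:

# getfattr -d -m . -e hex /gluster/brick1/glusterbrick/<some-file> | grep mdata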

Thanks,
Kotresh

On Thu, Aug 16, 2018 at 12:28 PM, David Spisla  wrote:

> Hello Kotresh,
> it's no problem for me that the atime will be updated, important is a
> consistent mtime and ctime on the bricks of my replica set.
> I have turned on both options you mentioned. After that I created a file
> on my FUSE mount (mounted with noatime). But on all my bricks of the
> replica set
> the mtime and ctime is not consistent. What about the brick mount? Is
> there a special mount option?
> I have a four node cluster and on each node there is only one brick. See
> below all my volume options:
>
> Volume Name: test1
> Type: Replicate
> Volume ID: e6576010-d9e3-4a98-bcfd-d4a452e92198
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 4 = 4
> Transport-type: tcp
> Bricks:
> Brick1: davids-c1-n1:/gluster/brick1/glusterbrick
> Brick2: davids-c1-n2:/gluster/brick1/glusterbrick
> Brick3: davids-c1-n3:/gluster/brick1/glusterbrick
> Brick4: davids-c1-n4:/gluster/brick1/glusterbrick
> Options Reconfigured:
> storage.ctime: on
> features.utime: on
> performance.client-io-threads: off
> nfs.disable: on
> transport.address-family: inet
> user.smb: disable
> features.read-only: off
> features.worm: off
> features.worm-file-level: on
> features.retention-mode: relax
> network.ping-timeout: 10
> features.cache-invalidation: on
> features.cache-invalidation-timeout: 600
> performance.nl-cache: on
> performance.nl-cache-timeout: 600
> client.event-threads: 32
> server.event-threads: 32
> cluster.lookup-optimize: on
> performance.stat-prefetch: on
> performance.cache-invalidation: on
> performance.md-cache-timeout: 600
> performance.cache-samba-metadata: on
> performance.cache-ima-xattrs: on
> performance.io-thread-count: 64
> cluster.use-compound-fops: on
> performance.cache-size: 512MB
> performance.cache-refresh-timeout: 10
> performance.read-ahead: off
> performance.write-behind-window-size: 4MB
> performance.write-behind: on
> storage.build-pgfid: on
> auth.ssl-allow: *
> client.ssl: off
> server.ssl: off
> changelog.changelog: on
> features.bitrot: on
> features.scrub: Active
> features.scrub-freq: daily
> cluster.enable-shared-storage: enable
>
> Regards
> David Spisla
>
> Am Mi., 15. Aug. 2018 um 20:15 Uhr schrieb Kotresh Hiremath Ravishankar <
> khire...@redhat.com>:
>
>> Hi David,
>>
>> The feature provides consistent time attributes (atime, ctime, mtime)
>> across the replica set.
>> It is enabled with the following two options:
>>
>> gluster vol set <volname> utime on
>> gluster vol set <volname> ctime on
>>
>> The feature currently does not honour mount options related to time
>> attributes, such as 'noatime'.
>> So even though the volume is mounted with noatime, it will still update
>> atime with this feature enabled.
>>
>> Thanks,
>> Kotresh HR
>>
>> On Wed, Aug 15, 2018 at 3:51 PM, David Spisla  wrote:
>>
>>> Dear Gluster Community,
>>> in the Chapter "Standalone" point 3 of the release notes for 4.1.0
>>> https://docs.gluster.org/en/latest/release-notes/4.1.0/
>>>
>>> there is an introduction to the new utime feature. What kind of options
>>> are not allowed if I want to mount a volume? There is "noatime,realatime"
>>> mentioned. Does the second mean "relatime". I never heard of "realatime".
>>> Is there any recommendation for the mount options?
>>>
>>> Regards
>>> David Spisla
>>>
>>> ___
>>> Gluster-users mailing list
>>> Gluster-users@gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>
>>
>>
>> --
>> Thanks and Regards,
>> Kotresh H R
>>
>


-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Question to utime feature for release 4.1.0

2018-08-15 Thread Kotresh Hiremath Ravishankar
Hi David,

The feature provides consistent time attributes (atime, ctime, mtime)
across the replica set.
It is enabled with the following two options:

gluster vol set <volname> utime on
gluster vol set <volname> ctime on

The feature currently does not honour mount options related to time
attributes, such as 'noatime'.
So even though the volume is mounted with noatime, it will still update
atime with this feature enabled.
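
A simple way to see this behaviour (purely illustrative - the mount point and
file name are placeholders) is to read a file through the FUSE mount and
compare the access time before and after:

# stat -c 'atime: %x' /mnt/<volname>/<some-file>
# cat /mnt/<volname>/<some-file> > /dev/null
# stat -c 'atime: %x' /mnt/<volname>/<some-file>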

Thanks,
Kotresh HR

On Wed, Aug 15, 2018 at 3:51 PM, David Spisla  wrote:

> Dear Gluster Community,
> in the Chapter "Standalone" point 3 of the release notes for 4.1.0
> https://docs.gluster.org/en/latest/release-notes/4.1.0/
>
> there is an introduction to the new utime feature. What kind of options
> are not allowed if I want to mount a volume? There is "noatime,realatime"
> mentioned. Does the second mean "relatime". I never heard of "realatime".
> Is there any recommendation for the mount options?
>
> Regards
> David Spisla
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>



-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Geo-replication stops after 4-5 hours

2018-08-02 Thread Kotresh Hiremath Ravishankar
Cool, just check whether they are hung by any chance with the following command.

#strace -f -p 5921
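
If it looks stuck, the process state is another quick thing to check (5921
being the rsync PID from your listing); a process stuck in uninterruptible
sleep will show state D:

# grep State /proc/5921/status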

On Thu, Aug 2, 2018 at 12:25 PM, Marcus Pedersén 
wrote:

> On both active master nodes there is an rsync process. As in:
>
> root  5921  0.0  0.0 115424  1176 ?SAug01   0:00 rsync
> -aR0 --inplace --files-from=- --super --stats --numeric-ids
> --no-implied-dirs --xattrs --acls . -e ssh -oPasswordAuthentication=no
> -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem
> -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-stuphs/
> bf60c68f1a195dad59573a8dbaa309f2.sock geouser@urd-gds-geo-001:/proc/
> 13077/cwd
>
> There is also ssh tunnels to slave nodes and  gsyncd.py processes.
>
> Regards
> Marcus
>
> 
> Marcus Pedersén
> Systemadministrator
> Interbull Centre
> 
> Sent from my phone
> ####
>
> Den 2 aug. 2018 08:07 skrev Kotresh Hiremath Ravishankar <
> khire...@redhat.com>:
> Could you look for any rsync processes hung on the master or slave?
>
> On Thu, Aug 2, 2018 at 11:18 AM, Marcus Pedersén 
> wrote:
>
>> Hi Kortesh,
>> rsync  version 3.1.2  protocol version 31
>> All nodes run CentOS 7, updated the last couple of days.
>>
>> Thanks
>> Marcus
>>
>> 
>> Marcus Pedersén
>> Systemadministrator
>> Interbull Centre
>> 
>> Sent from my phone
>> 
>>
>>
>> Den 2 aug. 2018 06:13 skrev Kotresh Hiremath Ravishankar <
>> khire...@redhat.com>:
>>
>> Hi Marcus,
>>
>> What's the rsync version being used?
>>
>> Thanks,
>> Kotresh HR
>>
>> On Thu, Aug 2, 2018 at 1:48 AM, Marcus Pedersén 
>> wrote:
>>
>> Hi all!
>>
>> I upgraded from 3.12.9 to 4.1.1 and had problems with geo-replication.
>>
>> With help from the list with some sym links and so on (handled in another
>> thread)
>>
>> I got the geo-replication running.
>>
>> It ran for 4-5 hours and then stopped, I stopped and started
>> geo-replication and it ran for another 4-5 hours.
>>
>> 4.1.2 was released and I updated, hoping this would solve the problem.
>>
>> I still have the same problem, at start it runs for 4-5 hours and then it
>> stops.
>>
>> After that nothing happens, I have waited for days but still
>> nothing happens.
>>
>>
>> I have looked through logs but can not find anything obvious.
>>
>>
>> Status for geo-replication is active for the two same nodes all the time:
>>
>>
>> MASTER NODEMASTER VOLMASTER BRICK SLAVE USER
>> SLAVE  SLAVE NODE STATUS
>> CRAWL STATUS LAST_SYNCEDENTRYDATA META
>> FAILURESCHECKPOINT TIMECHECKPOINT COMPLETEDCHECKPOINT
>> COMPLETION TIME
>> 
>> 
>> 
>> ---
>> urd-gds-001urd-gds-volume/urd-gds/gluster geouser
>> geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-000Active
>> History Crawl2018-04-16 20:32:090142050
>> 0   2018-07-27 21:12:44No
>> N/A
>> urd-gds-002urd-gds-volume/urd-gds/gluster geouser
>> geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-002Passive
>> N/A  N/AN/A  N/A  N/A
>> N/A N/AN/A
>> N/A
>> urd-gds-004urd-gds-volume/urd-gds/gluster geouser
>> geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-002Passive
>> N/A  N/AN/A  N/A  N/A
>> N/A N/AN/A
>> N/A
>> urd-gds-003urd-gds-volume/urd-gds/gluster geouser
>> geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-000Active
>> History Crawl2018-05-01 20:58:14285  4552 0
>> 0   2018-07-27 21:12:44No
>> N/A
>> urd-gds-000urd-gds-volume/urd-gds/gluster1geouser
>> geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-001Passive
>> N/A  N/AN/A  N/A  N/A
>> N/A N/AN/A
>> N/A
>> urd-gds-000urd-gds-volume/urd-gds/gluster2geouser
>> geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-001Passive
>> N/A  N/A

Re: [Gluster-users] Geo-replication stops after 4-5 hours

2018-08-02 Thread Kotresh Hiremath Ravishankar
Could you look for any rsync processes hung on the master or slave?
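
Something along these lines on each node should show any long-running rsync
or gsyncd workers and how long they have been alive (just a rough check):

# ps -eo pid,etime,stat,cmd | grep -E '[r]sync|[g]syncd'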

On Thu, Aug 2, 2018 at 11:18 AM, Marcus Pedersén 
wrote:

> Hi Kortesh,
> rsync  version 3.1.2  protocol version 31
> All nodes run CentOS 7, updated the last couple of days.
>
> Thanks
> Marcus
>
> 
> Marcus Pedersén
> Systemadministrator
> Interbull Centre
> 
> Sent from my phone
> ####
>
>
> Den 2 aug. 2018 06:13 skrev Kotresh Hiremath Ravishankar <
> khire...@redhat.com>:
>
> Hi Marcus,
>
> What's the rsync version being used?
>
> Thanks,
> Kotresh HR
>
> On Thu, Aug 2, 2018 at 1:48 AM, Marcus Pedersén 
> wrote:
>
> Hi all!
>
> I upgraded from 3.12.9 to 4.1.1 and had problems with geo-replication.
>
> With help from the list with some sym links and so on (handled in another
> thread)
>
> I got the geo-replication running.
>
> It ran for 4-5 hours and then stopped, I stopped and started
> geo-replication and it ran for another 4-5 hours.
>
> 4.1.2 was released and I updated, hoping this would solve the problem.
>
> I still have the same problem, at start it runs for 4-5 hours and then it
> stops.
>
> After that nothing happens, I have waited for days but still
> nothing happens.
>
>
> I have looked through logs but can not find anything obvious.
>
>
> Status for geo-replication is active for the two same nodes all the time:
>
>
> MASTER NODEMASTER VOLMASTER BRICK SLAVE USER
> SLAVE  SLAVE NODE STATUS
> CRAWL STATUS LAST_SYNCEDENTRYDATA META
> FAILURESCHECKPOINT TIMECHECKPOINT COMPLETEDCHECKPOINT
> COMPLETION TIME
> 
> 
> 
> ---
> urd-gds-001urd-gds-volume/urd-gds/gluster geouser
> geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-000Active
> History Crawl2018-04-16 20:32:090142050
> 0   2018-07-27 21:12:44No
> N/A
> urd-gds-002urd-gds-volume/urd-gds/gluster geouser
> geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-002Passive
> N/A  N/AN/A  N/A  N/A
> N/A N/AN/A
> N/A
> urd-gds-004urd-gds-volume/urd-gds/gluster geouser
> geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-002Passive
> N/A  N/AN/A  N/A  N/A
> N/A N/AN/A
> N/A
> urd-gds-003urd-gds-volume/urd-gds/gluster geouser
> geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-000Active
> History Crawl2018-05-01 20:58:14285  4552 0
> 0   2018-07-27 21:12:44No
> N/A
> urd-gds-000urd-gds-volume/urd-gds/gluster1geouser
> geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-001Passive
> N/A  N/AN/A  N/A  N/A
> N/A N/AN/A
> N/A
> urd-gds-000urd-gds-volume/urd-gds/gluster2geouser
> geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-001Passive
> N/A  N/AN/A  N/A  N/A
> N/A N/AN/A N/A
>
>
> Master cluster is Distribute-Replicate
>
> 2 x (2 + 1)
>
> Used space 30TB
>
>
> Slave cluster is Replicate
>
> 1 x (2 + 1)
>
> Used space 9TB
>
>
> Parts from gsyncd.logs are enclosed.
>
>
> Thanks a lot!
>
>
> Best regards
>
> Marcus Pedersén
>
>
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>
>
> --
> Thanks and Regards,
> Kotresh H R
>
>
>



-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Geo-replication stops after 4-5 hours

2018-08-01 Thread Kotresh Hiremath Ravishankar
Hi Marcus,

What's the rsync version being used?

Thanks,
Kotresh HR

On Thu, Aug 2, 2018 at 1:48 AM, Marcus Pedersén 
wrote:

> Hi all!
>
> I upgraded from 3.12.9 to 4.1.1 and had problems with geo-replication.
>
> With help from the list with some sym links and so on (handled in another
> thread)
>
> I got the geo-replication running.
>
> It ran for 4-5 hours and then stopped, I stopped and started
> geo-replication and it ran for another 4-5 hours.
>
> 4.1.2 was released and I updated, hoping this would solve the problem.
>
> I still have the same problem, at start it runs for 4-5 hours and then it
> stops.
>
> After that nothing happens, I have waited for days but still
> nothing happens.
>
>
> I have looked through logs but can not find anything obvious.
>
>
> Status for geo-replication is active for the two same nodes all the time:
>
>
> MASTER NODEMASTER VOLMASTER BRICK SLAVE USER
> SLAVE  SLAVE NODE STATUS
> CRAWL STATUS LAST_SYNCEDENTRYDATA META
> FAILURESCHECKPOINT TIMECHECKPOINT COMPLETEDCHECKPOINT
> COMPLETION TIME
> 
> 
> 
> ---
> urd-gds-001urd-gds-volume/urd-gds/gluster geouser
> geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-000Active
> History Crawl2018-04-16 20:32:090142050
> 0   2018-07-27 21:12:44No
> N/A
> urd-gds-002urd-gds-volume/urd-gds/gluster geouser
> geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-002Passive
> N/A  N/AN/A  N/A  N/A
> N/A N/AN/A
> N/A
> urd-gds-004urd-gds-volume/urd-gds/gluster geouser
> geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-002Passive
> N/A  N/AN/A  N/A  N/A
> N/A N/AN/A
> N/A
> urd-gds-003urd-gds-volume/urd-gds/gluster geouser
> geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-000Active
> History Crawl2018-05-01 20:58:14285  4552 0
> 0   2018-07-27 21:12:44No
> N/A
> urd-gds-000urd-gds-volume/urd-gds/gluster1geouser
> geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-001Passive
> N/A  N/AN/A  N/A  N/A
> N/A N/AN/A
> N/A
> urd-gds-000urd-gds-volume/urd-gds/gluster2geouser
> geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-001Passive
> N/A  N/AN/A  N/A  N/A
> N/A N/AN/A N/A
>
>
> Master cluster is Distribute-Replicate
>
> 2 x (2 + 1)
>
> Used space 30TB
>
>
> Slave cluster is Replicate
>
> 1 x (2 + 1)
>
> Used space 9TB
>
>
> Parts from gsyncd.logs are enclosed.
>
>
> Thanks a lot!
>
>
> Best regards
>
> Marcus Pedersén
>
>
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>



-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Gluster 3.12.11 geo-replication connection to peer is broken

2018-07-23 Thread Kotresh Hiremath Ravishankar
Hi Pablo,

The geo-rep status should go to Faulty if the connection to the peer is broken.
Are the node log files failing with the same error? Are these logs repeating?
Does stopping and starting geo-rep give the same error?
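
For reference, a stop/start cycle for this session would look roughly like the
following (volume and slave names taken from your status output):

# gluster volume geo-replication vol_replicated geoaccount1@10.20.220.12::georep_1 stop
# gluster volume geo-replication vol_replicated geoaccount1@10.20.220.12::georep_1 start
# gluster volume geo-replication vol_replicated geoaccount1@10.20.220.12::georep_1 status detail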

Thanks,
Kotresh HR

On Tue, Jul 24, 2018 at 1:47 AM, Pablo J Rebollo Sosa  wrote:

> Hi,
>
> I’m having problem with Gluster 3.12.11 geo-replication in CentOS 7.5.
> The process starts the geo-replication but after few minutes the log shows
> “connection to peer is broken”.
>
> The “status detail” looks ok but no files are replicated.
>
> [root@gluster1 vol_replicated]#  gluster volume geo-replication
> vol_replicated geoaccount1@10.20.220.12::georep_1 status detail | sort
>
> 
> 
> 
> 
> -
> MASTER NODEMASTER VOLMASTER BRICK SLAVE
> USER SLAVE   SLAVE NODE  STATUS
> CRAWL STATUSLAST_SYNCEDENTRYDATAMETAFAILURES
>  CHECKPOINT TIMECHECKPOINT COMPLETEDCHECKPOINT COMPLETION TIME
> gluster1 vol_replicated/export/brick1/vol_replicated
>  geoaccount1geoaccount1@10.20.220.12::georep_110.20.220.12
>  Active Hybrid CrawlN/A8191 65500   0
> N/AN/A N/A
> gluster2 vol_replicated/export/brick1/vol_replicated
>  geoaccount1geoaccount1@10.20.220.12::georep_110.20.220.13
>  PassiveN/A N/AN/A  N/A N/A N/A
> N/AN/A N/A
> gluster3 vol_replicated/export/brick1/vol_replicated
>  geoaccount1geoaccount1@10.20.220.12::georep_110.20.220.12
>  PassiveN/A N/AN/A  N/A N/A N/A
> N/AN/A N/A
> gluster4 vol_replicated/export/brick1/vol_replicated
>  geoaccount1geoaccount1@10.20.220.12::georep_110.20.220.13
>  Active Hybrid CrawlN/A8191 65320   0
> N/AN/A N/A
>
> These are the messages on the log file.
>
> [2018-07-23 19:35:50.18026] I [gsyncdstatus(/export/brick1/
> vol_replicated):276:set_active] GeorepStatus: Worker Status Change
> status=Active
> [2018-07-23 19:35:50.19126] I [gsyncdstatus(/export/brick1/
> vol_replicated):248:set_worker_crawl_status] GeorepStatus: Crawl Status
> Change   status=History Crawl
> [2018-07-23 19:35:50.19480] I 
> [master(/export/brick1/vol_replicated):1432:crawl]
> _GMaster: starting history crawl   turns=1 stime=(0, 0)
>  entry_stime=Noneetime=1532374550
> [2018-07-23 19:35:50.20056] E 
> [repce(/export/brick1/vol_replicated):117:worker]
> : call failed:
> Traceback (most recent call last):
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in
> worker
> res = getattr(self.obj, rmeth)(*in_data[2:])
>   File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line
> 54, in history
> num_parallel)
>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line
> 103, in cl_history_changelog
> raise ChangelogHistoryNotAvailable()
> ChangelogHistoryNotAvailable
> [2018-07-23 19:35:50.20999] E 
> [repce(/export/brick1/vol_replicated):209:__call__]
> RepceClient: call failed on peer  call=39755:140602890745664:1532374550.02
>method=history  error=ChangelogHistoryNotAvailable
> [2018-07-23 19:35:50.21156] I 
> [resource(/export/brick1/vol_replicated):1675:service_loop]
> GLUSTER: Changelog history not available, using xsync
> [2018-07-23 19:35:50.28688] I 
> [master(/export/brick1/vol_replicated):1543:crawl]
> _GMaster: starting hybrid crawlstime=(0, 0)
> [2018-07-23 19:35:50.30505] I [gsyncdstatus(/export/brick1/
> vol_replicated):248:set_worker_crawl_status] GeorepStatus: Crawl Status
> Change   status=Hybrid Crawl
> [2018-07-23 19:35:54.35396] I 
> [master(/export/brick1/vol_replicated):1554:crawl]
> _GMaster: processing xsync changelog   path=/var/lib/misc/glusterfsd/
> vol_replicated/ssh%3A%2F%2Fgeoaccount1%4010.20.220.12%
> 3Agluster%3A%2F%2F127.0.0.1%3Ageorep_1/a68ebfef8cdf86c3c6e9a0d85969cd
> 3f/xsync/XSYNC-CHANGELOG.1532374550
> [2018-07-23 19:36:11.590595] E [syncdutils(/export/brick1/
> vol_replicated):304:log_raise_exception] : connection to peer is
> broken
>
> Anyone have some clues to what might be wrong?
>
> Best regards,
>
> Pablo J. Rebollo-Sosa
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>



-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list
Gluster-users@gluster.org

Re: [Gluster-users] georeplication woes

2018-07-23 Thread Kotresh Hiremath Ravishankar
Looks like gsyncd on the slave is failing for some reason.

Please run the below cmd on the master.

#ssh -i /var/lib/glusterd/geo-replication/secret.pem georep@gluster-4.glstr

It should run gsyncd on the slave. If there is an error, it should be fixed.
Please share the output of the above cmd.

Regards,
Kotresh

On Mon, Jul 23, 2018 at 9:28 PM, Maarten van Baarsel <
mrten_glusterus...@ii.nl> wrote:

>
>
> On 23/07/18 15:28, Sunny Kumar wrote:
> > Hi,
> > If you run this below command on master
> >
> > gluster vol geo-rep   config
> > slave-gluster-command-dir 
> >
> > on slave run "which gluster" to know gluster-binary-location on slave
>
> Done that, repeatedly, no change :(
>
> > It will make the same entry in gsyncd.conf file please recheck and
>
> (what gsyncd.conf? the one in /etc or someplace else?)
>
>
>
> > confirm both entries are same and also can you confirm that both
> > master and slave have same gluster version.
>
> slave:
>
> root@gluster-4:~$ /usr/sbin/gluster --version
> glusterfs 4.0.2
>
> master:
>
> root@gluster-3:/home/mrten# /usr/sbin/gluster --version
> glusterfs 4.0.2
>
>
> Looking at the slaves' cli.log:
>
> [2018-07-23 15:53:26.187547] I [cli.c:767:main] 0-cli: Started running
> /usr/sbin/gluster with version 4.0.2
> [2018-07-23 15:53:26.187611] I [cli.c:646:cli_rpc_init] 0-cli: Connecting
> to remote glusterd at localhost
> [2018-07-23 15:53:26.229756] I [MSGID: 101190] 
> [event-epoll.c:609:event_dispatch_epoll_worker]
> 0-epoll: Started thread with index 1
> [2018-07-23 15:53:26.229871] W [rpc-clnt.c:1739:rpc_clnt_submit]
> 0-glusterfs: error returned while attempting to connect to host:(null),
> port:0
> [2018-07-23 15:53:26.229963] I [socket.c:2625:socket_event_handler]
> 0-transport: EPOLLERR - disconnecting now
> [2018-07-23 15:53:26.230640] I [cli-rpc-ops.c:8785:gf_cli_mount_cbk]
> 0-cli: Received resp to mount
> [2018-07-23 15:53:26.230825] I [input.c:31:cli_batch] 0-: Exiting with: -1
>
> there's a weird warning there with host:(null), port:0
>
> M.
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>



-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work

2018-07-17 Thread Kotresh Hiremath Ravishankar
Hi Marcus,

Well, there is nothing wrong with setting up a symlink for the gluster binary
location, but there is a geo-rep command to set it so that gsyncd will search
there.

To set on master
#gluster vol geo-rep <mastervol> <slavehost>::<slavevol> config gluster-command-dir <gluster-binary-location>

To set on slave
#gluster vol geo-rep <mastervol> <slavehost>::<slavevol> config slave-gluster-command-dir <gluster-binary-location>
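
For instance, with the names used in this thread it would look something like
the following (assuming the gluster binary on the slave really lives in
/usr/sbin/):

# gluster vol geo-rep urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume config gluster-command-dir /usr/sbin/
# gluster vol geo-rep urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume config slave-gluster-command-dir /usr/sbin/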

Thanks,
Kotresh HR


On Wed, Jul 18, 2018 at 9:28 AM, Kotresh Hiremath Ravishankar <
khire...@redhat.com> wrote:

> Hi Marcus,
>
> I am testing out 4.1 myself and I will have some update today.
> For this particular traceback, gsyncd is not able to find the library.
> Is it the rpm install? If so, gluster libraries would be in /usr/lib.
> Please run the cmd below.
>
> #ldconfig /usr/lib
> #ldconfig -p /usr/lib | grep libgf  (This should list libgfchangelog.so)
>
> Geo-rep should be fixed automatically.
>
> Thanks,
> Kotresh HR
>
> On Wed, Jul 18, 2018 at 1:27 AM, Marcus Pedersén 
> wrote:
>
>> Hi again,
>>
>> I continue to do some testing, but now I have come to a stage where I
>> need help.
>>
>>
>> gsyncd.log was complaining about that /usr/local/sbin/gluster was missing
>> so I made a link.
>>
>> After that /usr/local/sbin/glusterfs was missing so I made a link there
>> as well.
>>
>> Both links were done on all slave nodes.
>>
>>
>> Now I have a new error that I can not resolve myself.
>>
>> It can not open libgfchangelog.so
>>
>>
>> Many thanks!
>>
>> Regards
>>
>> Marcus Pedersén
>>
>>
>> Part of gsyncd.log:
>>
>> OSError: libgfchangelog.so: cannot open shared object file: No such file
>> or directory
>> [2018-07-17 19:32:06.517106] I [repce(agent 
>> /urd-gds/gluster):89:service_loop]
>> RepceServer: terminating on reaching EOF.
>> [2018-07-17 19:32:07.479553] I [monitor(monitor):272:monitor] Monitor:
>> worker died in startup phase brick=/urd-gds/gluster
>> [2018-07-17 19:32:17.500709] I [monitor(monitor):158:monitor] Monitor:
>> starting gsyncd worker   brick=/urd-gds/gluster  slave_node=urd-gds-geo-000
>> [2018-07-17 19:32:17.541547] I [gsyncd(agent /urd-gds/gluster):297:main]
>> : Using session config file   path=/var/lib/glusterd/geo-rep
>> lication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
>> [2018-07-17 19:32:17.541959] I [gsyncd(worker /urd-gds/gluster):297:main]
>> : Using session config file  path=/var/lib/glusterd/geo-rep
>> lication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
>> [2018-07-17 19:32:17.542363] I [changelogagent(agent
>> /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
>> [2018-07-17 19:32:17.550894] I [resource(worker
>> /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection
>> between master and slave...
>> [2018-07-17 19:32:19.166246] I [resource(worker
>> /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between
>> master and slave established.duration=1.6151
>> [2018-07-17 19:32:19.166806] I [resource(worker
>> /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume
>> locally...
>> [2018-07-17 19:32:20.257344] I [resource(worker
>> /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume
>> duration=1.0901
>> [2018-07-17 19:32:20.257921] I [subcmds(worker
>> /urd-gds/gluster):70:subcmd_worker] : Worker spawn successful.
>> Acknowledging back to monitor
>> [2018-07-17 19:32:20.274647] E [repce(agent /urd-gds/gluster):114:worker]
>> : call failed:
>> Traceback (most recent call last):
>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in
>> worker
>> res = getattr(self.obj, rmeth)(*in_data[2:])
>>   File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py",
>> line 37, in init
>> return Changes.cl_init()
>>   File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py",
>> line 21, in __getattr__
>> from libgfchangelog import Changes as LChanges
>>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py",
>> line 17, in 
>> class Changes(object):
>>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py",
>> line 19, in Changes
>> use_errno=True)
>>   File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
>> self._handle = _dlopen(self._name, mode)
>> OSError: libgfchangelog.so: cannot open shared object file: No such file
>> or directory
>> [2018-07-17 19:32:20.275093] E [repce(worker
>> /urd-gds/gluster):206:__call__] RepceClient: call fai

Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work

2018-07-17 Thread Kotresh Hiremath Ravishankar
worker /urd-gds/gluster):297:main]
> : Using session config file   path=/var/lib/glusterd/geo-
> replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-07-16 19:35:06.99561] I [gsyncd(agent /urd-gds/gluster):297:main]
> : Using session config filepath=/var/lib/glusterd/geo-
> replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-07-16 19:35:06.100481] I [changelogagent(agent
> /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> [2018-07-16 19:35:06.108834] I [resource(worker 
> /urd-gds/gluster):1348:connect_remote]
> SSH: Initializing SSH connection between master and slave...
> [2018-07-16 19:35:06.762320] E [syncdutils(worker
> /urd-gds/gluster):303:log_raise_exception] : connection to peer is
> broken
> [2018-07-16 19:35:06.763103] E [syncdutils(worker
> /urd-gds/gluster):749:errlog] Popen: command returned error   cmd=ssh
> -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
> /var/lib/glusterd/geo-replicatio\
> n/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-K9mB6Q/
> bf60c68f1a195dad59573a8dbaa309f2.sock geouser@urd-gds-geo-001
> /nonexistent/gsyncd slave urd-gds-volume geouser@urd-gds-geo-001::urd-
> gds-volu\
> me --master-node urd-gds-001 --master-node-id 
> 912bebfd-1a7f-44dc-b0b7-f001a20d58cd
> --master-brick /urd-gds/gluster --local-node urd-gds-geo-000
> --local-node-id 03075698-2bbf-43e4-a99a-65fe82f61794 --slave-timeo\
> ut 120 --slave-log-level INFO --slave-gluster-log-level INFO
> --slave-gluster-command-dir /usr/local/sbin/ error=1
> [2018-07-16 19:35:06.763398] E [syncdutils(worker
> /urd-gds/gluster):753:logerr] Popen: ssh> failure: execution of
> "/usr/local/sbin/gluster" failed with ENOENT (No such file or directory)
> [2018-07-16 19:35:06.771905] I [repce(agent /urd-gds/gluster):89:service_loop]
> RepceServer: terminating on reaching EOF.
> [2018-07-16 19:35:06.772272] I [monitor(monitor):262:monitor] Monitor:
> worker died before establishing connection   brick=/urd-gds/gluster
> [2018-07-16 19:35:16.786387] I [monitor(monitor):158:monitor] Monitor:
> starting gsyncd worker   brick=/urd-gds/gluster  slave_node=urd-gds-geo-000
> [2018-07-16 19:35:16.828056] I [gsyncd(worker /urd-gds/gluster):297:main]
> : Using session config file  path=/var/lib/glusterd/geo-
> replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-07-16 19:35:16.828066] I [gsyncd(agent /urd-gds/gluster):297:main]
> : Using session config file   path=/var/lib/glusterd/geo-
> replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-07-16 19:35:16.828912] I [changelogagent(agent
> /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> [2018-07-16 19:35:16.837100] I [resource(worker 
> /urd-gds/gluster):1348:connect_remote]
> SSH: Initializing SSH connection between master and slave...
> [2018-07-16 19:35:17.260257] E [syncdutils(worker
> /urd-gds/gluster):303:log_raise_exception] : connection to peer is
> broken
>
> --
> *Från:* gluster-users-boun...@gluster.org  gluster.org> för Marcus Pedersén 
> *Skickat:* den 13 juli 2018 14:50
> *Till:* Kotresh Hiremath Ravishankar
> *Kopia:* gluster-users@gluster.org
> *Ämne:* Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
>
> Hi Kotresh,
> Yes, all nodes have the same version 4.1.1 both master and slave.
> All glusterd are crashing on the master side.
> Will send logs tonight.
>
> Thanks,
> Marcus
>
> 
> Marcus Pedersén
> Systemadministrator
> Interbull Centre
> 
> Sent from my phone
> 
>
> Den 13 juli 2018 11:28 skrev Kotresh Hiremath Ravishankar <
> khire...@redhat.com>:
>
> Hi Marcus,
>
> Is the gluster geo-rep version the same on both the master and the slave?
>
> Thanks,
> Kotresh HR
>
> On Fri, Jul 13, 2018 at 1:26 AM, Marcus Pedersén 
> wrote:
>
> Hi Kotresh,
>
> i have replaced both files (gsyncdconfig.py
> <https://review.gluster.org/#/c/20207/1/geo-replication/syncdaemon/gsyncdconfig.py>
> and repce.py
> <https://review.gluster.org/#/c/20207/1/geo-replication/syncdaemon/repce.py>)
> in all nodes both master and slave.
>
> I rebooted all servers but geo-replication status is still Stopped.
>
> I tried to start geo-replication with response Successful but status still
> show Stopped on all nodes.
>
> Nothing has been written to geo-replication logs since I sent the tail of
> the log.
>
> So I do not know what info to provide?
>
>
> Please, help me to find a way to solve this.
>
>
> Thanks!
>
>
> Regards
>
> Marcus
>
>
> ---

Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work

2018-07-13 Thread Kotresh Hiremath Ravishankar
Hi Marcus,

Is the gluster geo-rep version the same on both the master and the slave?

Thanks,
Kotresh HR

On Fri, Jul 13, 2018 at 1:26 AM, Marcus Pedersén 
wrote:

> Hi Kotresh,
>
> i have replaced both files (gsyncdconfig.py
> <https://review.gluster.org/#/c/20207/1/geo-replication/syncdaemon/gsyncdconfig.py>
> and repce.py
> <https://review.gluster.org/#/c/20207/1/geo-replication/syncdaemon/repce.py>)
> in all nodes both master and slave.
>
> I rebooted all servers but geo-replication status is still Stopped.
>
> I tried to start geo-replication with response Successful but status still
> show Stopped on all nodes.
>
> Nothing has been written to geo-replication logs since I sent the tail of
> the log.
>
> So I do not know what info to provide?
>
>
> Please, help me to find a way to solve this.
>
>
> Thanks!
>
>
> Regards
>
> Marcus
>
>
> --
> *Från:* gluster-users-boun...@gluster.org  gluster.org> för Marcus Pedersén 
> *Skickat:* den 12 juli 2018 08:51
> *Till:* Kotresh Hiremath Ravishankar
> *Kopia:* gluster-users@gluster.org
> *Ämne:* Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
>
> Thanks Kotresh,
> I installed through the official centos channel, centos-release-gluster41.
> Isn't this fix included in centos install?
> I will have a look, test it tonight and come back to you!
>
> Thanks a lot!
>
> Regards
> Marcus
>
> ####
> Marcus Pedersén
> Systemadministrator
> Interbull Centre
> 
> Sent from my phone
> 
>
> Den 12 juli 2018 07:41 skrev Kotresh Hiremath Ravishankar <
> khire...@redhat.com>:
>
> Hi Marcus,
>
> I think the fix [1] is needed in 4.1.
> Could you please try this out and let us know if that works for you?
>
> [1] https://review.gluster.org/#/c/20207/
>
> Thanks,
> Kotresh HR
>
> On Thu, Jul 12, 2018 at 1:49 AM, Marcus Pedersén 
> wrote:
>
> Hi all,
>
> I have upgraded from 3.12.9 to 4.1.1 and been following upgrade
> instructions for offline upgrade.
>
> I upgraded geo-replication side first 1 x (2+1) and the master side after
> that 2 x (2+1).
>
> Both clusters works the way they should on their own.
>
> After upgrade on master side status for all geo-replication nodes
> is Stopped.
>
> I tried to start the geo-replication from master node and response back
> was started successfully.
>
> Status again  Stopped
>
> Tried to start again and get response started successfully, after that all
> glusterd crashed on all master nodes.
>
> After a restart of all glusterd the master cluster was up again.
>
> Status for geo-replication is still Stopped and every try to start it
> after this gives the response successful but still status Stopped.
>
>
> Please help me get the geo-replication up and running again.
>
>
> Best regards
>
> Marcus Pedersén
>
>
> Part of geo-replication log from master node:
>
> [2018-07-11 18:42:48.941760] I [changelogagent(/urd-gds/gluster):73:__init__]
> ChangelogAgent: Agent listining...
> [2018-07-11 18:42:48.947567] I 
> [resource(/urd-gds/gluster):1780:connect_remote]
> SSH: Initializing SSH connection between master and slave...
> [2018-07-11 18:42:49.363514] E 
> [syncdutils(/urd-gds/gluster):304:log_raise_exception]
> : connection to peer is broken
> [2018-07-11 18:42:49.364279] E [resource(/urd-gds/gluster):210:errlog]
> Popen: command returned errorcmd=ssh -oPasswordAuthentication=no
> -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
> .pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-hjRhBo/7e5
> 534547f3675a710a107722317484f.sock geouser@urd-gds-geo-000
> /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee
> --local-id .%\
> 2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120
> gluster://localhost:urd-gds-volume   error=2
> [2018-07-11 18:42:49.364586] E [resource(/urd-gds/gluster):214:logerr]
> Popen: ssh> usage: gsyncd.py [-h]
> [2018-07-11 18:42:49.364799] E [resource(/urd-gds/gluster):214:logerr]
> Popen: ssh>
> [2018-07-11 18:42:49.364989] E [resource(/urd-gds/gluster):214:logerr]
> Popen: ssh>  {monitor-status,monitor,worker
> ,agent,slave,status,config-check,config-get,config-set,
> config-reset,voluuidget,d\
> elete}
> [2018-07-11 18:42:49.365210] E [resource(/urd-gds/gluster):214:logerr]
> Popen: ssh>  ...
> [2018-07-11 18:42:49.365408] E [resource(/urd-gds/gluster):214:logerr]
> Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice:
> '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status',
>

Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work

2018-07-11 Thread Kotresh Hiremath Ravishankar
Hi Marcus,

I think the fix [1] is needed in 4.1.
Could you please try this out and let us know if that works for you?

[1] https://review.gluster.org/#/c/20207/

Thanks,
Kotresh HR

On Thu, Jul 12, 2018 at 1:49 AM, Marcus Pedersén 
wrote:

> Hi all,
>
> I have upgraded from 3.12.9 to 4.1.1 and been following upgrade
> instructions for offline upgrade.
>
> I upgraded geo-replication side first 1 x (2+1) and the master side after
> that 2 x (2+1).
>
> Both clusters works the way they should on their own.
>
> After upgrade on master side status for all geo-replication nodes
> is Stopped.
>
> I tried to start the geo-replication from master node and response back
> was started successfully.
>
> Status again  Stopped
>
> Tried to start again and get response started successfully, after that all
> glusterd crashed on all master nodes.
>
> After a restart of all glusterd the master cluster was up again.
>
> Status for geo-replication is still Stopped and every try to start it
> after this gives the response successful but still status Stopped.
>
>
> Please help me get the geo-replication up and running again.
>
>
> Best regards
>
> Marcus Pedersén
>
>
> Part of geo-replication log from master node:
>
> [2018-07-11 18:42:48.941760] I [changelogagent(/urd-gds/gluster):73:__init__]
> ChangelogAgent: Agent listining...
> [2018-07-11 18:42:48.947567] I 
> [resource(/urd-gds/gluster):1780:connect_remote]
> SSH: Initializing SSH connection between master and slave...
> [2018-07-11 18:42:49.363514] E 
> [syncdutils(/urd-gds/gluster):304:log_raise_exception]
> : connection to peer is broken
> [2018-07-11 18:42:49.364279] E [resource(/urd-gds/gluster):210:errlog]
> Popen: command returned errorcmd=ssh -oPasswordAuthentication=no
> -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
> .pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-hjRhBo/
> 7e5534547f3675a710a107722317484f.sock geouser@urd-gds-geo-000
> /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee
> --local-id .%\
> 2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120
> gluster://localhost:urd-gds-volume   error=2
> [2018-07-11 18:42:49.364586] E [resource(/urd-gds/gluster):214:logerr]
> Popen: ssh> usage: gsyncd.py [-h]
> [2018-07-11 18:42:49.364799] E [resource(/urd-gds/gluster):214:logerr]
> Popen: ssh>
> [2018-07-11 18:42:49.364989] E [resource(/urd-gds/gluster):214:logerr]
> Popen: ssh>  {monitor-status,monitor,
> worker,agent,slave,status,config-check,config-get,config-set,config-reset,
> voluuidget,d\
> elete}
> [2018-07-11 18:42:49.365210] E [resource(/urd-gds/gluster):214:logerr]
> Popen: ssh>  ...
> [2018-07-11 18:42:49.365408] E [resource(/urd-gds/gluster):214:logerr]
> Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice:
> '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status',
> 'monit\
> or', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get',
> 'config-set', 'config-reset', 'voluuidget', 'delete')
> [2018-07-11 18:42:49.365919] I [syncdutils(/urd-gds/gluster):271:finalize]
> : exiting.
> [2018-07-11 18:42:49.369316] I [repce(/urd-gds/gluster):92:service_loop]
> RepceServer: terminating on reaching EOF.
> [2018-07-11 18:42:49.369921] I [syncdutils(/urd-gds/gluster):271:finalize]
> : exiting.
> [2018-07-11 18:42:49.369694] I [monitor(monitor):353:monitor] Monitor:
> worker died before establishing connection   brick=/urd-gds/gluster
> [2018-07-11 18:42:59.492762] I [monitor(monitor):280:monitor] Monitor:
> starting gsyncd worker   brick=/urd-gds/gluster
> slave_node=ssh://geouser@urd-gds-geo-000:gluster://
> localhost:urd-gds-volume
> [2018-07-11 18:42:59.558491] I 
> [resource(/urd-gds/gluster):1780:connect_remote]
> SSH: Initializing SSH connection between master and slave...
> [2018-07-11 18:42:59.559056] I [changelogagent(/urd-gds/gluster):73:__init__]
> ChangelogAgent: Agent listining...
> [2018-07-11 18:42:59.945693] E 
> [syncdutils(/urd-gds/gluster):304:log_raise_exception]
> : connection to peer is broken
> [2018-07-11 18:42:59.946439] E [resource(/urd-gds/gluster):210:errlog]
> Popen: command returned errorcmd=ssh -oPasswordAuthentication=no
> -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
> .pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-992bk7/
> 7e5534547f3675a710a107722317484f.sock geouser@urd-gds-geo-000
> /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee
> --local-id .%\
> 2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120
> gluster://localhost:urd-gds-volume   error=2
> [2018-07-11 18:42:59.946748] E [resource(/urd-gds/gluster):214:logerr]
> Popen: ssh> usage: gsyncd.py [-h]
> [2018-07-11 18:42:59.946962] E [resource(/urd-gds/gluster):214:logerr]
> Popen: ssh>
> [2018-07-11 18:42:59.947150] E [resource(/urd-gds/gluster):214:logerr]
> Popen: ssh>  {monitor-status,monitor,
> 

Re: [Gluster-users] Old georep files in /var/lib/misc/glusterfsd

2018-06-25 Thread Kotresh Hiremath Ravishankar
Hi Mabi,

You can safely delete old files under /var/lib/misc/glusterfsd.
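
For example (only do this if you are sure no geo-rep session is configured on
these volumes any more):

# ls /var/lib/misc/glusterfsd/
# rm -rf /var/lib/misc/glusterfsd/*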

Thanks,
Kotresh

On Mon, Jun 25, 2018 at 7:30 PM, mabi  wrote:

> Hi,
>
> In the past I was using geo-replication but unconfigured it on my two
> volumes by using:
>
> gluster volume geo-replication ... stop
> gluster volume geo-replication ... delete
>
> ​​Now I found out that I still have some old files in
> /var/lib/misc/glusterfsd belonging to my two volumes which were
> geo-replicated. Can I safely delete everything under
> /var/lib/misc/glusterfsd on all of my nodes? I have a replica 2 with an
> arbiter and I am using GlusterFS 3.12.9.
>
> Thanks,
> M.
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users




-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Need Help to get GEO Replication working - Error: Please check gsync config file. Unable to get statefile's name

2018-06-20 Thread Kotresh Hiremath Ravishankar
Hi Axel,

It's the latest. Ok, please share the geo-replication master and slave logs.

master location: /var/log/glusterfs/geo-replication
slave location: /var/log/glusterfs/geo-replication-slaves
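
For example, something like this should capture the most recent entries from
both sides (the exact session sub-directory names vary, hence the wildcards):

# tail -n 200 /var/log/glusterfs/geo-replication/*/*.log
# tail -n 200 /var/log/glusterfs/geo-replication-slaves/*/*.log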

Thanks,
Kotresh HR

On Tue, Jun 19, 2018 at 2:54 PM, Axel Gruber  wrote:

> Hello
>
> im using in 2 Debian Machines (Virtual) with Gluster from the repro
>
> root@glusters1:/# gluster --version
> glusterfs 4.1.0
> Repository revision: git://git.gluster.org/glusterfs.git
> Copyright (c) 2006-2016 Red Hat, Inc. <https://www.gluster.org/>
>
>
>
>
>
>
>
> Am Mo., 18. Juni 2018 um 11:30 Uhr schrieb Kotresh Hiremath Ravishankar <
> khire...@redhat.com>:
>
>> Hi Alex,
>>
>> Sorry, I lost the context.
>>
>> Which gluster version are you using?
>>
>> Thanks,
>> Kotresh HR
>>
>> On Sat, Jun 16, 2018 at 2:57 PM, Axel Gruber  wrote:
>>
>>> Hello
>>>
>>> i think its better to open a new Thread:
>>>
>>>
>>> I tryed to install Geo Replication again - setup SSH Key - prepared
>>> session Broker and so on (shown in the Manual)
>>>
>>> But i get this error:
>>>
>>> root@glusters1:~# gluster volume geo-replication gpool geo.com::geovol
>>> create force
>>> Please check gsync config file. Unable to get statefile's name
>>> geo-replication command failed
>>>
>>> I use "force" command because Slave Gluster is to small - but its empty
>>> - so whtiout Force i get:
>>>
>>> root@glusters1:~# gluster volume geo-replication gpool geo.com::geovol
>>> create
>>> Total disk size of master is greater than disk size of slave.
>>> Total available size of master is greater than available size of slave
>>> geo-replication command failed
>>>
>>> I also tryed to adjust Sice of GEO Volume  - so now GEO Volume is bigger
>>> then Master Volume - but still same Error.
>>>
>>>
>>> Can anyone help me to understand whats going wrong here ?
>>>
>>>
>>>
>>>
>>>
>>> ___
>>> Gluster-users mailing list
>>> Gluster-users@gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>
>>
>>
>> --
>> Thanks and Regards,
>> Kotresh H R
>>
>


-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Geo-Replication memory leak on slave node

2018-06-20 Thread Kotresh Hiremath Ravishankar
Hi Mark,

Sorry, I was busy and could not take a serious look at the logs. I can
update you on Monday.

Thanks,
Kotresh HR

On Wed, Jun 20, 2018 at 12:32 PM, Mark Betham <
mark.bet...@performancehorizon.com> wrote:

> Hi Kotresh,
>
> I was wondering if you had made any progress with regards to the issue I
> am currently experiencing with geo-replication.
>
> For info the fault remains and effectively requires a restart of the
> geo-replication service on a daily basis to reclaim the used memory on the
> slave node.
>
> If you require any further information then please do not hesitate to ask.
>
> Many thanks,
>
> Mark Betham
>
>
> On Mon, 11 Jun 2018 at 08:24, Mark Betham  performancehorizon.com> wrote:
>
>> Hi Kotresh,
>>
>> Many thanks.  I will shortly setup a share on my GDrive and send the link
>> directly to yourself.
>>
>> For Info;
>> The Geo-Rep slave failed again over the weekend but it did not recover
>> this time.  It looks to have become unresponsive at around 14:40 UTC on 9th
>> June.  I have attached an image showing the mem usage and you can see from
>> this when the system failed.  The system was totally unresponsive and
>> required a cold power off and then power on in order to recover the server.
>>
>> Many thanks for your help.
>>
>> Mark Betham.
>>
>> On 11 June 2018 at 05:53, Kotresh Hiremath Ravishankar <
>> khire...@redhat.com> wrote:
>>
>>> Hi Mark,
>>>
>>> Google drive works for me.
>>>
>>> Thanks,
>>> Kotresh HR
>>>
>>> On Fri, Jun 8, 2018 at 3:00 PM, Mark Betham >> performancehorizon.com> wrote:
>>>
>>>> Hi Kotresh,
>>>>
>>>> The memory issue re-occurred again.  This is indicating it will occur
>>>> around once a day.
>>>>
>>>> Again no traceback listed in the log, the only update in the log was as
>>>> follows;
>>>> [2018-06-08 08:26:43.404261] I [resource(slave):1020:service_loop]
>>>> GLUSTER: connection inactive, stopping timeout=120
>>>> [2018-06-08 08:29:19.357615] I [syncdutils(slave):271:finalize] :
>>>> exiting.
>>>> [2018-06-08 08:31:02.432002] I [resource(slave):1502:connect] GLUSTER:
>>>> Mounting gluster volume locally...
>>>> [2018-06-08 08:31:03.716967] I [resource(slave):1515:connect] GLUSTER:
>>>> Mounted gluster volume duration=1.2729
>>>> [2018-06-08 08:31:03.717411] I [resource(slave):1012:service_loop]
>>>> GLUSTER: slave listening
>>>>
>>>> I have attached an image showing the latest memory usage pattern.
>>>>
>>>> Can you please advise how I can pass the log data across to you?  As
>>>> soon as I know this I will get the data uploaded for your review.
>>>>
>>>> Thanks,
>>>>
>>>> Mark Betham
>>>>
>>>>
>>>>
>>>>
>>>> On 7 June 2018 at 08:19, Mark Betham >>> performancehorizon.com> wrote:
>>>>
>>>>> Hi Kotresh,
>>>>>
>>>>> Many thanks for your prompt response.
>>>>>
>>>>> Below are my responses to your questions;
>>>>>
>>>>> 1. Is this trace back consistently hit? I just wanted to confirm
>>>>> whether it's transient which occurs once in a while and gets back to 
>>>>> normal?
>>>>> It appears not.  As soon as the geo-rep recovered yesterday from the
>>>>> high memory usage it immediately began rising again until it consumed all
>>>>> of the available ram.  But this time nothing was committed to the log 
>>>>> file.
>>>>> I would like to add here that this current instance of geo-rep was
>>>>> only brought online at the start of this week due to the issues with glibc
>>>>> on CentOS 7.5.  This is the first time I have had geo-rep running with
>>>>> Gluster ver 3.12.9, both storage clusters at each physical site were only
>>>>> rebuilt approx. 4 weeks ago, due to the previous version in use going EOL.
>>>>> Prior to this I had been running 3.13.2 (3.13.X now EOL) at each of the
>>>>> sites and it is worth noting that the same behaviour was also seen on this
>>>>> version of Gluster, unfortunately I do not have any of the log data from
>>>>> then but I do not recall seeing any instances of the trace back message
>>>>> mentioned.
>>>>>
>>>

Re: [Gluster-users] Need Help to get GEO Replication working - Error: Please check gsync config file. Unable to get statefile's name

2018-06-18 Thread Kotresh Hiremath Ravishankar
Hi Alex,

Sorry, I lost the context.

Which gluster version are you using?

Thanks,
Kotresh HR

On Sat, Jun 16, 2018 at 2:57 PM, Axel Gruber  wrote:

> Hello
>
> i think its better to open a new Thread:
>
>
> I tryed to install Geo Replication again - setup SSH Key - prepared
> session Broker and so on (shown in the Manual)
>
> But i get this error:
>
> root@glusters1:~# gluster volume geo-replication gpool geo.com::geovol
> create force
> Please check gsync config file. Unable to get statefile's name
> geo-replication command failed
>
> I use "force" command because Slave Gluster is to small - but its empty -
> so whtiout Force i get:
>
> root@glusters1:~# gluster volume geo-replication gpool geo.com::geovol
> create
> Total disk size of master is greater than disk size of slave.
> Total available size of master is greater than available size of slave
> geo-replication command failed
>
> I also tryed to adjust Sice of GEO Volume  - so now GEO Volume is bigger
> then Master Volume - but still same Error.
>
>
> Can anyone help me to understand whats going wrong here ?
>
>
>
>
> 
> Kontaktieren Sie mich direkt per Live Chat - einfach hier Klicken:
> 
> Mit freundlichen Grüßen
> Autohaus A. Gruber OHG
> Axel Gruber / Geschäftsführer
>
> Tel: 0807193200
> Fax: 0807193202
> E-Mail: a...@agm.de
> Internet: www.autohaus-gruber.net
>
> Your strong MAZDA and HYUNDAI partner at four locations in the region - one of
> them near you.
>
> Autohaus A. Gruber OHG, Gewerbepark Kaserne 10, 83278 Traunstein.
>
> HRA 8216 Amtsgericht Traunstein, Ust.-Id.-Nr. DE813812187
>
> Managing Directors: Axel Gruber, Anton Gruber
>
> Tax number: 141/151/51801
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>



-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Understand Geo Replication of a big Gluster

2018-06-14 Thread Kotresh Hiremath Ravishankar
Hi Axel,

No, geo-replication can't be used without SSH; it's not configurable.
Geo-rep master nodes connect to the slave and transfer data over SSH.

I assume you have created the geo-rep session before starting it.
In the command above, the syntax is incorrect: it should use "::" and not
":/".

gluster volume geo-replication gpool glusergeo::geovol start

Thanks,
Kotresh HR

On Fri, Jun 15, 2018 at 1:32 AM, Axel Gruber  wrote:

> Hello
>
> First of all Thank you for Information
>
> As a first try I want to set up geo-replication in a test system with
> virtual machines.
>
> So for the Test i have a distributed Gluster with 3 Nodes:
>
> Status of volume: gpool
> Gluster process TCP Port  RDMA Port  Online
> Pid
> 
> --
> Brick glusers1:/mnt/glust/gpool 49152 0  Y
>  568
> Brick glusers2:/mnt/glust/gpool 49152 0  Y
>  574
> Brick glusers3:/mnt/glust/gpool 49152 0  Y
>  665
> NFS Server on localhost 2049  0  Y
>  1127
> NFS Server on glusers2  2049  0  Y
>  926
> NFS Server on glusers3  2049  0  Y
>  695
>
> On the slave side I have also created a volume:
>
> Status of volume: geovol
> Gluster process TCP Port  RDMA Port  Online
> Pid
> 
> --
> Brick glusergeo:/mnt/glust/geo  49152 0  Y
>  629
>
> The master side can "see" the hostname and ping the IP of the slave:
> root@glusters1:~# ping glusergeo
> PING glusergeo (172.20.230.86) 56(84) bytes of data.
> 64 bytes from glusergeo (172.20.230.86): icmp_seq=1 ttl=64 time=3.85 ms
>
> So for the test I did not want to use SSH and I tried:
>
>
> root@glusters1:~# gluster volume geo-replication gpool glusergeo:/geovol
> start
> Invalid Url: glusergeo:/geovol
>
> Is it not possible to use geo-replication without SSH?
>
> Best Regards
>
>
> Am Do., 14. Juni 2018 um 16:27 Uhr schrieb Milind Changire <
> mchan...@redhat.com>:
>
>> Hello Axel,
>> A warm welcome to you and happy to see you board the Gluster ship.
>>
>> geo-replication requires two clusters: one master cluster which is the
>> source cluster and the second cluster called the slave cluster which is the
>> destination cluster
>>
>> Just as you have created a primary/master cluster, you need to create
>> another cluster out of the backup servers in whatever configuration (volume
>> type) you want to create. The volume type of the slave cluster need not be
>> the same as the master cluster.
>>
>> Once you've created the slave cluster, you can then use geo-replication
>> commands to create a communication channel between the two clusters so that
>> the sources from the master cluster are replicated to the slave cluster and
>> to keep pushing any updates to the slave cluster.
>>
>> Hope this helps.
>>
>>
>>
>> On Thu, Jun 14, 2018 at 6:23 PM, Axel Gruber  wrote:
>>
>>> Hello
>>>
>>> I'm new to GlusterFS - so a warm hello to everyone here...
>>>
>>> I have been testing GlusterFS for some weeks in different configurations for a
>>> big media storage.
>>>
>>> For a start we currently plan a distributed/replicated gluster with four
>>> nodes (4x70TB).
>>> I tried this in a test area on different virtual machines - it works
>>> fine.
>>>
>>> But for security reasons we also plan geo-replication of the whole
>>> gluster. I have read the docs - but there is one thing I don't understand:
>>>
>>> For example, our initial gluster has 4 nodes (4x70TB replicated and
>>> distributed), so I have 140TB capacity.
>>>
>>> So when I want to geo-replicate this volume I don't have a single server
>>> which is able to store 140TB (or later more) - but I can have several backup
>>> servers with a total size of 140TB.
>>>
>>> So my question:
>>>
>>> How can I tell GlusterFS in geo-replication to use all of these backup
>>> servers to geo-replicate the whole storage, using all backup servers?
>>>
>>> Best Regards
>>>
>>>
>>> ___
>>> Gluster-users mailing list
>>> Gluster-users@gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>
>>
>>
>> --
>> Milind
>>
>>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>



-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Understand Geo Replication of a big Gluster

2018-06-14 Thread Kotresh Hiremath Ravishankar
Hi Axel,

You don't need a single server with 140 TB capacity for replication. The
slave (backup) is also a gluster volume, similar to the master volume.
So create the slave (backup) gluster volume with 4 or more nodes to meet
the capacity of the master and set up geo-rep between these two volumes.
Geo-replication is between two gluster volumes, not between a gluster
volume and a single node.
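
As a rough sketch (host, brick and volume names below are made up), the slave
side could be a 2x2 distributed-replicate volume built from 4 backup nodes,
with the geo-rep session pointed at it:

  # gluster volume create backupvol replica 2 bkp1:/bricks/b1 bkp2:/bricks/b1 bkp3:/bricks/b1 bkp4:/bricks/b1
  # gluster volume start backupvol
  # gluster volume geo-replication mastervol bkp1::backupvol create push-pem
  # gluster volume geo-replication mastervol bkp1::backupvol start

The first two commands run on the slave cluster, the last two on one of the
master nodes.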


Thanks,
Kotresh HR

On Thu, Jun 14, 2018 at 6:23 PM, Axel Gruber  wrote:

> Hello
>
> I'm new to GlusterFS - so a warm hello to everyone here...
>
> I have been testing GlusterFS for some weeks in different configurations for a
> big media storage.
>
> For a start we currently plan a distributed/replicated gluster with four
> nodes (4x70TB).
> I tried this in a test area on different virtual machines - it works
> fine.
>
> But for security reasons we also plan geo-replication of the whole gluster.
> I have read the docs - but there is one thing I don't understand:
>
> For example, our initial gluster has 4 nodes (4x70TB replicated and
> distributed), so I have 140TB capacity.
>
> So when I want to geo-replicate this volume I don't have a single server
> which is able to store 140TB (or later more) - but I can have several backup
> servers with a total size of 140TB.
>
> So my question:
>
> How can I tell GlusterFS in geo-replication to use all of these backup
> servers to geo-replicate the whole storage, using all backup servers?
>
> Best Regards
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>



-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Geo-Replication memory leak on slave node

2018-06-10 Thread Kotresh Hiremath Ravishankar
Hi Mark,

Google drive works for me.

Thanks,
Kotresh HR

On Fri, Jun 8, 2018 at 3:00 PM, Mark Betham <
mark.bet...@performancehorizon.com> wrote:

> Hi Kotresh,
>
> The memory issue re-occurred again.  This is indicating it will occur
> around once a day.
>
> Again no traceback listed in the log, the only update in the log was as
> follows;
> [2018-06-08 08:26:43.404261] I [resource(slave):1020:service_loop]
> GLUSTER: connection inactive, stopping timeout=120
> [2018-06-08 08:29:19.357615] I [syncdutils(slave):271:finalize] :
> exiting.
> [2018-06-08 08:31:02.432002] I [resource(slave):1502:connect] GLUSTER:
> Mounting gluster volume locally...
> [2018-06-08 08:31:03.716967] I [resource(slave):1515:connect] GLUSTER:
> Mounted gluster volume duration=1.2729
> [2018-06-08 08:31:03.717411] I [resource(slave):1012:service_loop]
> GLUSTER: slave listening
>
> I have attached an image showing the latest memory usage pattern.
>
> Can you please advise how I can pass the log data across to you?  As soon
> as I know this I will get the data uploaded for your review.
>
> Thanks,
>
> Mark Betham
>
>
>
>
> On 7 June 2018 at 08:19, Mark Betham 
> wrote:
>
>> Hi Kotresh,
>>
>> Many thanks for your prompt response.
>>
>> Below are my responses to your questions;
>>
>> 1. Is this trace back consistently hit? I just wanted to confirm whether
>> it's transient which occurs once in a while and gets back to normal?
>> It appears not.  As soon as the geo-rep recovered yesterday from the high
>> memory usage it immediately began rising again until it consumed all of the
>> available ram.  But this time nothing was committed to the log file.
>> I would like to add here that this current instance of geo-rep was only
>> brought online at the start of this week due to the issues with glibc on
>> CentOS 7.5.  This is the first time I have had geo-rep running with Gluster
>> ver 3.12.9, both storage clusters at each physical site were only rebuilt
>> approx. 4 weeks ago, due to the previous version in use going EOL.  Prior
>> to this I had been running 3.13.2 (3.13.X now EOL) at each of the sites and
>> it is worth noting that the same behaviour was also seen on this version of
>> Gluster, unfortunately I do not have any of the log data from then but I do
>> not recall seeing any instances of the trace back message mentioned.
>>
>> 2. Please upload the complete geo-rep logs from both master and slave.
>> I have the log files, just checking to make sure there is no confidential
>> info inside.  The logfiles are too big to send via email, even when
>> compressed.  Do you have a preferred method to allow me to share this data
>> with you or would a share from my Google drive be sufficient?
>>
>> 3. Are the gluster versions same across master and slave?
>> Yes, all gluster versions are the same across the two sites for all
>> storage nodes.  See below for version info taken from the current geo-rep
>> master.
>>
>> glusterfs 3.12.9
>> Repository revision: git://git.gluster.org/glusterfs.git
>> Copyright (c) 2006-2016 Red Hat, Inc. <https://www.gluster.org/>
>> GlusterFS comes with ABSOLUTELY NO WARRANTY.
>> It is licensed to you under your choice of the GNU Lesser
>> General Public License, version 3 or any later version (LGPLv3
>> or later), or the GNU General Public License, version 2 (GPLv2),
>> in all cases as published by the Free Software Foundation.
>>
>> glusterfs-geo-replication-3.12.9-1.el7.x86_64
>> glusterfs-gnfs-3.12.9-1.el7.x86_64
>> glusterfs-libs-3.12.9-1.el7.x86_64
>> glusterfs-server-3.12.9-1.el7.x86_64
>> glusterfs-3.12.9-1.el7.x86_64
>> glusterfs-api-3.12.9-1.el7.x86_64
>> glusterfs-events-3.12.9-1.el7.x86_64
>> centos-release-gluster312-1.0-1.el7.centos.noarch
>> glusterfs-client-xlators-3.12.9-1.el7.x86_64
>> glusterfs-cli-3.12.9-1.el7.x86_64
>> python2-gluster-3.12.9-1.el7.x86_64
>> glusterfs-rdma-3.12.9-1.el7.x86_64
>> glusterfs-fuse-3.12.9-1.el7.x86_64
>>
>> I have also attached another screenshot showing the memory usage from the
>> Gluster slave for the last 48 hours.  This shows memory saturation from
>> yesterday, which correlates with the trace back sent yesterday, and the
>> subsequent memory saturation which occurred over the last 24 hours.  For
>> info, all times are in UTC.
>>
>> Please advise the preferred method to get the log data across to you and
>> also if you require any further information.
>>
>> Many thanks,
>>
>> Mark Betham
>>
>>
>> On 7 June 2018 at 04:42, Kotres

Re: [Gluster-users] Geo-Replication memory leak on slave node

2018-06-06 Thread Kotresh Hiremath Ravishankar
Hi Mark,

Few questions.

1. Is this trace back consistently hit? I just wanted to confirm whether
it's transient which occurs once in a while and gets back to normal?
2. Please upload the complete geo-rep logs from both master and slave.

Thanks,
Kotresh HR

On Wed, Jun 6, 2018 at 7:10 PM, Mark Betham <
mark.bet...@performancehorizon.com> wrote:

> Dear Gluster-Users,
>
> I have geo-replication setup and configured between 2 Gluster pools
> located at different sites.  What I am seeing is an error being reported
> within the geo-replication slave log as follows;
>
> *[2018-06-05 12:05:26.767615] E
> [syncdutils(slave):331:log_raise_exception] : FAIL: *
> *Traceback (most recent call last):*
> *  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line
> 361, in twrap*
> *tf(*aa)*
> *  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1009,
> in *
> *t = syncdutils.Thread(target=lambda: (repce.service_loop(),*
> *  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 90, in
> service_loop*
> *self.q.put(recv(self.inf))*
> *  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 61, in
> recv*
> *return pickle.load(inf)*
> *ImportError: No module named
> h_2013-04-26-04:02:49-2013-04-26_11:02:53.gz.15WBuUh*
> *[2018-06-05 12:05:26.768085] E [repce(slave):117:worker] : call
> failed: *
> *Traceback (most recent call last):*
> *  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in
> worker*
> *res = getattr(self.obj, rmeth)(*in_data[2:])*
> *TypeError: getattr(): attribute name must be string*
>
> From this point in time the slave server begins to consume all of its
> available RAM until it becomes non-responsive.  Eventually the gluster
> service seems to kill off the offending process and the memory is returned
> to the system.  Once the memory has been returned to the remote slave
> system the geo-replication often recovers and data transfer resumes.
>
> I have attached the full geo-replication slave log containing the error
> shown above.  I have also attached an image file showing the memory usage
> of the affected storage server.
>
> We are currently running Gluster version 3.12.9 on top of CentOS 7.5
> x86_64.  The system has been fully patched and is running the latest
> software, excluding glibc which had to be downgraded to get geo-replication
> working.
>
> The Gluster volume runs on a dedicated partition using the XFS filesystem
> which in turn is running on a LVM thin volume.  The physical storage is
> presented as a single drive due to the underlying disks being part of a
> raid 10 array.
>
> The Master volume which is being replicated has a total of 2.2 TB of data
> to be replicated.  The total size of the volume fluctuates very little as
> data being removed equals the new data coming in.  This data is made up of
> many thousands of files across many separated directories.  Data file sizes
> vary from the very small (>1K) to the large (>1Gb).  The Gluster service
> itself is running with a single volume in a replicated configuration across
> 3 bricks at each of the sites.  The delta changes being replicated are on
> average about 100GB per day, where this includes file creation / deletion /
> modification.
>
> The config for the geo-replication session is as follows, taken from the
> current source server;
>
> *special_sync_mode: partial*
> *gluster_log_file:
> /var/log/glusterfs/geo-replication/glustervol0/ssh%3A%2F%2Froot%40storage-server.local%3Agluster%3A%2F%2F127.0.0.1%3Aglustervol1.gluster.log*
> *ssh_command: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no
> -i /var/lib/glusterd/geo-replication/secret.pem*
> *change_detector: changelog*
> *session_owner: 40e9e77a-034c-44a2-896e-59eec47e8a84*
> *state_file:
> /var/lib/glusterd/geo-replication/glustervol0_storage-server.local_glustervol1/monitor.status*
> *gluster_params: aux-gfid-mount acl*
> *log_rsync_performance: true*
> *remote_gsyncd: /nonexistent/gsyncd*
> *working_dir:
> /var/lib/misc/glusterfsd/glustervol0/ssh%3A%2F%2Froot%40storage-server.local%3Agluster%3A%2F%2F127.0.0.1%3Aglustervol1*
> *state_detail_file:
> /var/lib/glusterd/geo-replication/glustervol0_storage-server.local_glustervol1/ssh%3A%2F%2Froot%40storage-server.local%3Agluster%3A%2F%2F127.0.0.1%3Aglustervol1-detail.status*
> *gluster_command_dir: /usr/sbin/*
> *pid_file:
> /var/lib/glusterd/geo-replication/glustervol0_storage-server.local_glustervol1/monitor.pid*
> *georep_session_working_dir:
> /var/lib/glusterd/geo-replication/glustervol0_storage-server.local_glustervol1/*
> *ssh_command_tar: ssh -oPasswordAuthentication=no
> -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem*
> *master.stime_xattr_name:
> trusted.glusterfs.40e9e77a-034c-44a2-896e-59eec47e8a84.ccfaed9b-ff4b-4a55-acfa-03f092cdf460.stime*
> *changelog_log_file:
> 

Re: [Gluster-users] Geo-Replication memory leak on slave node

2018-06-06 Thread Kotresh Hiremath Ravishankar
Hi Mark,

Few questions.

1. Is this trace back consistently hit? I just wanted to confirm whether
it's transient which occurs once in a while and gets back to normal?
2. Please upload the complete geo-rep logs from both master and slave.
3. Are the gluster versions same across master and slave?

Thanks,
Kotresh HR

On Wed, Jun 6, 2018 at 7:10 PM, Mark Betham <
mark.bet...@performancehorizon.com> wrote:

> Dear Gluster-Users,
>
> I have geo-replication setup and configured between 2 Gluster pools
> located at different sites.  What I am seeing is an error being reported
> within the geo-replication slave log as follows;
>
> *[2018-06-05 12:05:26.767615] E
> [syncdutils(slave):331:log_raise_exception] : FAIL: *
> *Traceback (most recent call last):*
> *  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line
> 361, in twrap*
> *tf(*aa)*
> *  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1009,
> in *
> *t = syncdutils.Thread(target=lambda: (repce.service_loop(),*
> *  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 90, in
> service_loop*
> *self.q.put(recv(self.inf))*
> *  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 61, in
> recv*
> *return pickle.load(inf)*
> *ImportError: No module named
> h_2013-04-26-04:02:49-2013-04-26_11:02:53.gz.15WBuUh*
> *[2018-06-05 12:05:26.768085] E [repce(slave):117:worker] : call
> failed: *
> *Traceback (most recent call last):*
> *  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in
> worker*
> *res = getattr(self.obj, rmeth)(*in_data[2:])*
> *TypeError: getattr(): attribute name must be string*
>
> From this point in time the slave server begins to consume all of its
> available RAM until it becomes non-responsive.  Eventually the gluster
> service seems to kill off the offending process and the memory is returned
> to the system.  Once the memory has been returned to the remote slave
> system the geo-replication often recovers and data transfer resumes.
>
> I have attached the full geo-replication slave log containing the error
> shown above.  I have also attached an image file showing the memory usage
> of the affected storage server.
>
> We are currently running Gluster version 3.12.9 on top of CentOS 7.5
> x86_64.  The system has been fully patched and is running the latest
> software, excluding glibc which had to be downgraded to get geo-replication
> working.
>
> The Gluster volume runs on a dedicated partition using the XFS filesystem
> which in turn is running on a LVM thin volume.  The physical storage is
> presented as a single drive due to the underlying disks being part of a
> raid 10 array.
>
> The Master volume which is being replicated has a total of 2.2 TB of data
> to be replicated.  The total size of the volume fluctuates very little as
> data being removed equals the new data coming in.  This data is made up of
> many thousands of files across many separated directories.  Data file sizes
> vary from the very small (>1K) to the large (>1Gb).  The Gluster service
> itself is running with a single volume in a replicated configuration across
> 3 bricks at each of the sites.  The delta changes being replicated are on
> average about 100GB per day, where this includes file creation / deletion /
> modification.
>
> The config for the geo-replication session is as follows, taken from the
> current source server;
>
> *special_sync_mode: partial*
> *gluster_log_file:
> /var/log/glusterfs/geo-replication/glustervol0/ssh%3A%2F%2Froot%40storage-server.local%3Agluster%3A%2F%2F127.0.0.1%3Aglustervol1.gluster.log*
> *ssh_command: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no
> -i /var/lib/glusterd/geo-replication/secret.pem*
> *change_detector: changelog*
> *session_owner: 40e9e77a-034c-44a2-896e-59eec47e8a84*
> *state_file:
> /var/lib/glusterd/geo-replication/glustervol0_storage-server.local_glustervol1/monitor.status*
> *gluster_params: aux-gfid-mount acl*
> *log_rsync_performance: true*
> *remote_gsyncd: /nonexistent/gsyncd*
> *working_dir:
> /var/lib/misc/glusterfsd/glustervol0/ssh%3A%2F%2Froot%40storage-server.local%3Agluster%3A%2F%2F127.0.0.1%3Aglustervol1*
> *state_detail_file:
> /var/lib/glusterd/geo-replication/glustervol0_storage-server.local_glustervol1/ssh%3A%2F%2Froot%40storage-server.local%3Agluster%3A%2F%2F127.0.0.1%3Aglustervol1-detail.status*
> *gluster_command_dir: /usr/sbin/*
> *pid_file:
> /var/lib/glusterd/geo-replication/glustervol0_storage-server.local_glustervol1/monitor.pid*
> *georep_session_working_dir:
> /var/lib/glusterd/geo-replication/glustervol0_storage-server.local_glustervol1/*
> *ssh_command_tar: ssh -oPasswordAuthentication=no
> -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem*
> *master.stime_xattr_name:
> trusted.glusterfs.40e9e77a-034c-44a2-896e-59eec47e8a84.ccfaed9b-ff4b-4a55-acfa-03f092cdf460.stime*
> *changelog_log_file:
> 

Re: [Gluster-users] New Style Replication in Version 4

2018-05-01 Thread Kotresh Hiremath Ravishankar
Hi John Hearns,

Thanks for considering gluster. The feature you are requesting is
Active-Active, which is not available with geo-replication in 4.0.
So the use case can't be achieved using a single gluster volume. But your
use case can be achieved if we keep two volumes, one for analysis files and
the other for result files, provided there is a clear distinction between
analysis files and result files.

Home Site/Europe                           Remote Site/USA
Analysis Files Volume (Geo-rep Slave)      Results Files Volume (Geo-rep Slave)
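
In other words, you would run two geo-rep sessions in opposite directions,
roughly like this (volume and host names are only placeholders):

  # gluster volume geo-replication analysisvol eu-node1::analysisvol create push-pem
  # gluster volume geo-replication analysisvol eu-node1::analysisvol start
  # gluster volume geo-replication resultsvol us-node1::resultsvol create push-pem
  # gluster volume geo-replication resultsvol us-node1::resultsvol start

The first pair is run on the USA cluster (where analysis files are written),
the second pair on the Europe cluster (where result files are written).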

Thanks,
Kotresh HR

On Mon, Apr 30, 2018 at 4:56 PM, John Hearns  wrote:

> I am landing here again following a spell on the list when I worked at XMA
> in the UK. Hello again.
>
>
> I have a use case of having a remote office, which should be able to have
> a common storage area with a main office.
>
> At the last company I worked with, we used GPFS with AFM to achieve this
> (not really relevant to this list).
>
>
> I at first thought geo-replication would be ideal for this use case,
> however further reading says that it is suitable for making a slave-only
> copy for disaster recovery, i.e. there is no 'two way' syncing of files.
>
> To make it a bit clearer, the concept is that the remote site might be in
> the USA, and the home site in Europe.
>
> Scientists at the remote site copy files for analysis to a local storage
> server.
>
> At the home site there should be a replica of the analysis files. At the
> home site there are HPC resources to run simulations or process the data
> somehow. Results files are written to a storage server and then should be
> replicated back to the remote site.
>
>
>
> Would anyone be able to comment on the New Style Geo Replication which has
> come along in Gluster version 4?
>
> What is the status please, and is it a suitable method for enabling a
> share at a remote office which syncs either way?
>
>
> thankyou
>
> John Hearns
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>



-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] trashcan on dist. repl. volume with geo-replication

2018-03-12 Thread Kotresh Hiremath Ravishankar
Hi Dietmar,

I am trying to understand the problem and have few questions.

1. Is trashcan enabled only on the master volume?
2. Is the 'rm -rf' done on the master volume synced to the slave?
3. If trashcan is disabled, does the issue go away?

The geo-rep error just says that it failed to create the directory
"Oracle_VM_VirtualBox_Extension" on the slave.
Usually this would be because of a gfid mismatch, but I don't see that in your
case. So I am a little more interested
in the present state of the geo-rep. Is it still throwing the same errors and
the same failure to sync the same directory? If
so, does the parent 'test1/b1' exist on the slave?
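
If you want to compare the gfids yourself, a quick check like the following
should show whether master and slave agree (run as root directly against the
brick paths; the paths below are placeholders):

  # getfattr -n trusted.gfid -e hex <master-brick>/test1/b1
  # getfattr -n trusted.gfid -e hex <slave-brick>/test1/b1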

And doing ls on the trashcan should not affect geo-rep. Is there an easy
reproducer for this?


Thanks,
Kotresh HR

On Mon, Mar 12, 2018 at 10:13 PM, Dietmar Putz 
wrote:

> Hello,
>
> in regard to
> https://bugzilla.redhat.com/show_bug.cgi?id=1434066
> I have been faced with another issue when using the trashcan feature on a
> dist. repl. volume running geo-replication (gfs 3.12.6 on Ubuntu 16.04.4),
> e.g. when removing an entire directory with subfolders:
> tron@gl-node1:/myvol-1/test1/b1$ rm -rf *
>
> afterwards listing files in the trashcan :
> tron@gl-node1:/myvol-1/test1$ ls -la /myvol-1/.trashcan/test1/b1/
>
> leads to an outage of the geo-replication.
> error on master-01 and master-02 :
>
> [2018-03-12 13:37:14.827204] I [master(/brick1/mvol1):1385:crawl]
> _GMaster: slave's time stime=(1520861818, 0)
> [2018-03-12 13:37:14.835535] E [master(/brick1/mvol1):784:log_failures]
> _GMaster: ENTRY FAILEDdata=({'uid': 0, 'gfid':
> 'c38f75e3-194a-4d22-9094-50ac8f8756e7', 'gid': 0, 'mode': 16877, 'entry':
> '.gfid/5531bd64-ac50-462b-943e-c0bf1c52f52c/Oracle_VM_VirtualBox_Extension',
> 'op': 'MKDIR'}, 2, {'gfid_mismatch': False, 'dst': False})
> [2018-03-12 13:37:14.835911] E 
> [syncdutils(/brick1/mvol1):299:log_raise_exception]
> : The above directory failed to sync. Please fix it to proceed further.
>
>
> both gfid's of the directories as shown in the log :
> brick1/mvol1/.trashcan/test1/b1 0x5531bd64ac50462b943ec0bf1c52f52c
> brick1/mvol1/.trashcan/test1/b1/Oracle_VM_VirtualBox_Extension
> 0xc38f75e3194a4d22909450ac8f8756e7
>
> the shown directory contains just one file which is stored on gl-node3 and
> gl-node4 while node1 and 2 are in geo replication error.
> Since the filesize limitation of the trashcan is obsolete I'm really
> interested in using the trashcan feature, but I'm concerned it will interrupt
> the geo-replication entirely.
> Has anybody else been faced with this situation... any hints,
> workarounds...?
>
> best regards
> Dietmar Putz
>
>
> root@gl-node1:~/tmp# gluster volume info mvol1
>
> Volume Name: mvol1
> Type: Distributed-Replicate
> Volume ID: a1c74931-568c-4f40-8573-dd344553e557
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 2 x 2 = 4
> Transport-type: tcp
> Bricks:
> Brick1: gl-node1-int:/brick1/mvol1
> Brick2: gl-node2-int:/brick1/mvol1
> Brick3: gl-node3-int:/brick1/mvol1
> Brick4: gl-node4-int:/brick1/mvol1
> Options Reconfigured:
> changelog.changelog: on
> geo-replication.ignore-pid-check: on
> geo-replication.indexing: on
> features.trash-max-filesize: 2GB
> features.trash: on
> transport.address-family: inet
> nfs.disable: on
> performance.client-io-threads: off
>
> root@gl-node1:/myvol-1/test1# gluster volume geo-replication mvol1
> gl-node5-int::mvol1 config
> special_sync_mode: partial
> gluster_log_file: /var/log/glusterfs/geo-replica
> tion/mvol1/ssh%3A%2F%2Froot%40192.168.178.65%3Agluster%3A%
> 2F%2F127.0.0.1%3Amvol1.gluster.log
> ssh_command: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
> /var/lib/glusterd/geo-replication/secret.pem
> change_detector: changelog
> use_meta_volume: true
> session_owner: a1c74931-568c-4f40-8573-dd344553e557
> state_file: /var/lib/glusterd/geo-replication/mvol1_gl-node5-int_mvol1/
> monitor.status
> gluster_params: aux-gfid-mount acl
> remote_gsyncd: /nonexistent/gsyncd
> working_dir: /var/lib/misc/glusterfsd/mvol1/ssh%3A%2F%2Froot%40192.168.
> 178.65%3Agluster%3A%2F%2F127.0.0.1%3Amvol1
> state_detail_file: /var/lib/glusterd/geo-replicat
> ion/mvol1_gl-node5-int_mvol1/ssh%3A%2F%2Froot%40192.168.
> 178.65%3Agluster%3A%2F%2F127.0.0.1%3Amvol1-detail.status
> gluster_command_dir: /usr/sbin/
> pid_file: /var/lib/glusterd/geo-replication/mvol1_gl-node5-int_mvol1/
> monitor.pid
> georep_session_working_dir: /var/lib/glusterd/geo-replicat
> ion/mvol1_gl-node5-int_mvol1/
> ssh_command_tar: ssh -oPasswordAuthentication=no
> -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replicat
> ion/tar_ssh.pem
> master.stime_xattr_name: trusted.glusterfs.a1c74931-568
> c-4f40-8573-dd344553e557.d62bda3a-1396-492a-ad99-7c6238d93c6a.stime
> changelog_log_file: /var/log/glusterfs/geo-replica
> tion/mvol1/ssh%3A%2F%2Froot%40192.168.178.65%3Agluster%3A%
> 2F%2F127.0.0.1%3Amvol1-changes.log
> socketdir: /var/run/gluster
> volume_id: a1c74931-568c-4f40-8573-dd344553e557
> ignore_deletes: false
> 

Re: [Gluster-users] geo replication

2018-03-06 Thread Kotresh Hiremath Ravishankar
Hi,

It is failing to get the virtual xattr value of
"trusted.glusterfs.volume-mark" at the master volume root.
Could you share the geo-replication logs under
/var/log/glusterfs/geo-replication/*.gluster.log ?
If these are just transient errors, stopping geo-rep and restarting the
master volume should fix it.
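
That sequence would look roughly like this, using the volume and slave names
from your setup below (note that stopping the master volume is disruptive for
its clients):

  # gluster volume geo-replication testtomcat stogfstest11::testtomcat stop
  # gluster volume stop testtomcat
  # gluster volume start testtomcat
  # gluster volume geo-replication testtomcat stogfstest11::testtomcat start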

Thanks,
Kotresh HR

On Tue, Mar 6, 2018 at 2:28 PM, Curt Lestrup  wrote:

> Hi,
>
>
>
> Have problems with geo replication on glusterfs 3.12.6 / Ubuntu 16.04.
>
> I can see a “master volinfo unavailable” in master logfile.
>
> Any ideas?
>
>
>
>
>
> Master:
>
> Status of volume: testtomcat
>
> Gluster process TCP Port  RDMA Port  Online
> Pid
>
> 
> --
>
> Brick gfstest07:/gfs/testtomcat/mount   49153 0  Y
> 326
>
> Brick gfstest05:/gfs/testtomcat/mount   49153 0  Y
> 326
>
> Brick gfstest01:/gfs/testtomcat/mount   49153 0  Y
> 335
>
> Self-heal Daemon on localhost   N/A   N/AY
> 1134
>
> Self-heal Daemon on gfstest07   N/A   N/AY
> 564
>
> Self-heal Daemon on gfstest05   N/A   N/AY
> 1038
>
>
>
>
>
>
>
> Slave:
>
> Status of volume: testtomcat
>
> Gluster process TCP Port  RDMA Port  Online
> Pid
>
> 
> --
>
> Brick stogfstest11:/gfs/testtomcat/mount49152 0  Y
> 294
>
>
>
>
>
> Created & started the session with:
>
> gluster volume geo-replication testtomcat stogfstest11::testtomcat create
> no-verify
>
> gluster volume geo-replication testtomcat stogfstest11::testtomcat start
>
>
>
> getting the following logs:
>
> master:
>
> [2018-03-06 08:32:46.767544] I [gsyncdstatus(monitor):242:set_worker_status]
> GeorepStatus: Worker Status Changestatus=Initializing...
>
> [2018-03-06 08:32:46.872857] I [monitor(monitor):280:monitor] Monitor:
> starting gsyncd workerbrick=/gfs/testtomcat/mount
> slave_node=ssh://root@stogfstest11:gluster://localhost:testtomcat
>
> [2018-03-06 08:32:46.961122] I 
> [changelogagent(/gfs/testtomcat/mount):73:__init__]
> ChangelogAgent: Agent listining...
>
> [2018-03-06 08:32:46.962470] I 
> [resource(/gfs/testtomcat/mount):1771:connect_remote]
> SSH: Initializing SSH connection between master and slave...
>
> [2018-03-06 08:32:48.515974] I 
> [resource(/gfs/testtomcat/mount):1778:connect_remote]
> SSH: SSH connection between master and slave established.
> duration=1.5530
>
> [2018-03-06 08:32:48.516247] I [resource(/gfs/testtomcat/mount):1493:connect]
> GLUSTER: Mounting gluster volume locally...
>
> [2018-03-06 08:32:49.739631] I [resource(/gfs/testtomcat/mount):1506:connect]
> GLUSTER: Mounted gluster volumeduration=1.2232
>
> [2018-03-06 08:32:49.739870] I [gsyncd(/gfs/testtomcat/mount):799:main_i]
> : Closing feedback fd, waking up the monitor
>
> [2018-03-06 08:32:51.872872] I [master(/gfs/testtomcat/mount):1518:register]
> _GMaster: Working dirpath=/var/lib/misc/glusterfsd/
> testtomcat/ssh%3A%2F%2Froot%40172.16.81.101%3Agluster%3A%
> 2F%2F127.0.0.1%3Atesttomcat/b6a7905143e15d9b079b804c0a8ebf42
>
> [2018-03-06 08:32:51.873176] I 
> [resource(/gfs/testtomcat/mount):1653:service_loop]
> GLUSTER: Register timetime=1520325171
>
> [2018-03-06 08:32:51.926801] E [syncdutils(/gfs/testtomcat/
> mount):299:log_raise_exception] : master volinfo unavailable
>
> [2018-03-06 08:32:51.936203] I 
> [syncdutils(/gfs/testtomcat/mount):271:finalize]
> : exiting.
>
> [2018-03-06 08:32:51.938469] I [repce(/gfs/testtomcat/mount):92:service_loop]
> RepceServer: terminating on reaching EOF.
>
> [2018-03-06 08:32:51.938776] I 
> [syncdutils(/gfs/testtomcat/mount):271:finalize]
> : exiting.
>
> [2018-03-06 08:32:52.743696] I [monitor(monitor):363:monitor] Monitor:
> worker died in startup phasebrick=/gfs/testtomcat/mount
>
> [2018-03-06 08:32:52.763276] I [gsyncdstatus(monitor):242:set_worker_status]
> GeorepStatus: Worker Status Changestatus=Faulty
>
>
>
>
>
> slave:
>
> [2018-03-06 08:32:47.434591] I [resource(slave):1502:connect] GLUSTER:
> Mounting gluster volume locally...
>
> [2018-03-06 08:32:48.490775] I [resource(slave):1515:connect] GLUSTER:
> Mounted gluster volume   duration=1.0557
>
> [2018-03-06 08:32:48.493134] I [resource(slave):1012:service_loop]
> GLUSTER: slave listening
>
> [2018-03-06 08:32:51.942531] I [repce(slave):92:service_loop] RepceServer:
> terminating on reaching EOF.
>
> [2018-03-06 08:32:51.955379] I [syncdutils(slave):271:finalize] :
> exiting.
>
>
>
>
>
> /Curt
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>



-- 
Thanks and Regards,
Kotresh H R

Re: [Gluster-users] Geo replication snapshot error

2018-02-21 Thread Kotresh Hiremath Ravishankar
Hi,

Thanks for reporting the issue. This seems to be a bug.
Could you please raise a bug at https://bugzilla.redhat.com/ under
community/glusterfs ?
We will take a look at it and fix it.

Thanks,
Kotresh HR

On Wed, Feb 21, 2018 at 2:01 PM, Marcus Pedersén 
wrote:

> Hi all,
> I use gluster 3.12 on centos 7.
> I am writing a snapshot program for my geo-replicated cluster.
> Now when I started to run tests with my application I have found
> a very strange behavior regarding geo-replication in gluster.
>
> I have setup my geo-replication according to the docs:
> http://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/
>
> Both master and slave clusters are replicated with just two
> machines (VM) and no arbiter.
>
> I have setup a geo-user (called geouser) and do not use
> root as the geo user, as specified in the docs.
>
> Both my master and slave volumes are named: vol
>
> If I pause the geo-replication with:
> gluster volume geo-replication vol geouser@ggluster1-geo::vol pause
> Pausing geo-replication session between vol & geouser@ggluster1-geo::vol
> has been successful
>
> Create a snapshot:
> gluster snapshot create my_snap_no_1000 vol
> snapshot create: success: Snap my_snap_no_1000-2018.02.21-07.45.32
> created successfully
>
> Resume geo-replication:
> gluster volume geo-replication vol geouser@ggluster1-geo::vol resume
> Resuming geo-replication session between vol & geouser@ggluster1-geo::vol
> has been successful
>
>
> Everything works fine!
>
> But here comes the problem:
> If I by accident spell my slave user wrong, or don't use
> the user at all (as if I was using root),
> then no matter what user I write, pause/resume does NOT report
> any errors. The answer is always that pausing/resuming was successful.
> The problem comes after a successful pause when I try to
> create a snapshot. It fails with:
> snapshot create: failed: geo-replication session is running for the volume
> vol. Session needs to be stopped before taking a snapshot.
>
> gluster volume geo-replication status
> MASTER NODEMASTER VOLMASTER BRICKSLAVE USERSLAVE
>  SLAVE NODESTATUSCRAWL STATUSLAST_SYNCED
> 
> 
> -
> ggluster1  vol   /glustergeouser
>  ssh://geouser@ggluster1-geo::volN/A   PausedN/A
>N/A
> ggluster2  vol   /glustergeouser
>  ssh://geouser@ggluster1-geo::volN/A   PausedN/A
>N/A
>
>
> After this, snapshots fail all the time!
> If I use the correct user again and pause, there is no error (paused), but the
> snapshot fails.
> If I resume with the correct user, there are no errors (active).
> Geo-replication still works fine, but somehow something has
> gone wrong so that snapshots fail.
> After a restart of glusterd on all machines it starts to work again.
>
>
> Here is complete run through:
>
> gluster volume geo-replication status
>
> MASTER NODEMASTER VOLMASTER BRICKSLAVE USERSLAVE
>  SLAVE NODE   STATUS CRAWL STATUS
>  LAST_SYNCED
> 
> 
> 
> ggluster1  vol   /glustergeouser
>  ssh://geouser@ggluster1-geo::volggluster1-geoActive
>  Changelog Crawl2018-02-12 15:49:57
> ggluster2  vol   /glustergeouser
>  ssh://geouser@ggluster1-geo::volggluster2-geoPassiveN/A
>   N/A
>
> # Using wrong user: abc
> gluster volume geo-replication vol abc@ggluster1-geo::vol pause
> Pausing geo-replication session between vol & abc@ggluster1-geo::vol has
> been successful
>
>
> gluster volume geo-replication status
>
> MASTER NODEMASTER VOLMASTER BRICKSLAVE USERSLAVE
>  SLAVE NODESTATUSCRAWL STATUSLAST_SYNCED
> 
> 
> -
> ggluster1  vol   /glustergeouser
>  ssh://geouser@ggluster1-geo::volN/A   PausedN/A
>N/A
> ggluster2  vol   /glustergeouser
>  ssh://geouser@ggluster1-geo::volN/A   PausedN/A
>N/A
>
>
> gluster snapshot create snap_vol_1000 vol
> snapshot create: failed: geo-replication session is running for the volume
> vol. Session needs to be stopped before taking a snapshot.
>
> # Using wrong user: abc
> gluster volume geo-replication vol abc@ggluster1-geo::vol resume
> Resuming geo-replication session between vol & ggluster1-geo::vol has been
> successful
>
>
> gluster volume geo-replication status
>
> MASTER NODEMASTER VOLMASTER BRICKSLAVE USERSLAVE
>  SLAVE NODE   STATUS CRAWL STATUS
>  

Re: [Gluster-users] georeplication over ssh.

2018-02-07 Thread Kotresh Hiremath Ravishankar
Ccing glusterd team for information

On Thu, Feb 8, 2018 at 10:02 AM, Alvin Starr <al...@netvel.net> wrote:

> That makes for an interesting problem.
>
> I cannot open port 24007 to allow RPC access.
>
> On 02/07/2018 11:29 PM, Kotresh Hiremath Ravishankar wrote:
>
> Hi Alvin,
>
> Yes, geo-replication sync happens via SSH. The server port 24007 belongs to
> glusterd.
> glusterd will be listening on this port, and all volume management
> communication
> happens via RPC.
>
> Thanks,
> Kotresh HR
>
> On Wed, Feb 7, 2018 at 8:29 PM, Alvin Starr <al...@netvel.net> wrote:
>
>> I am running gluster 3.8.9 and trying to setup a geo-replicated volume
>> over ssh,
>>
>> It looks like the volume create command is trying to directly access the
>> server over port 24007.
>>
>> The docs imply that all communications are over ssh.
>>
>> What am I missing?
>>
>> --
>> Alvin Starr   ||   land:  (905)513-7688
>> Netvel Inc.   ||   Cell:  (416)806-0133al...@netvel.net  
>> ||
>>
>>
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
>
>
> --
> Thanks and Regards,
> Kotresh H R
>
>
> --
> Alvin Starr   ||   land:  (905)513-7688
> Netvel Inc.   ||   Cell:  (416)806-0133al...@netvel.net   
>||
>
>
>


-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] georeplication over ssh.

2018-02-07 Thread Kotresh Hiremath Ravishankar
Hi Alvin,

Yes, geo-replication sync happens via SSH. The server port 24007 belongs to
glusterd.
glusterd will be listening on this port, and all volume management
communication
happens via RPC.
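
If you want to verify quickly from a master node whether glusterd on the
slave is reachable, something like this should do (the hostname is a
placeholder):

  # nc -zv <slave-host> 24007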

Thanks,
Kotresh HR

On Wed, Feb 7, 2018 at 8:29 PM, Alvin Starr  wrote:

> I am running gluster 3.8.9 and trying to setup a geo-replicated volume
> over ssh,
>
> It looks like the volume create command is trying to directly access the
> server over port 24007.
>
> The docs imply that all communications are over ssh.
>
> What am I missing?
>
> --
> Alvin Starr   ||   land:  (905)513-7688
> Netvel Inc.   ||   Cell:  (416)806-0133al...@netvel.net   
>||
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>



-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] geo-replication

2018-02-07 Thread Kotresh Hiremath Ravishankar
Answers inline

On Wed, Feb 7, 2018 at 8:44 PM, Marcus Pedersén <marcus.peder...@slu.se>
wrote:

> Thank you for your help!
> Just to make things clear to me (and get a better understanding of
> gluster):
> So, if I make the slave cluster just distributed and node 1 goes down,
> data (say file.txt) that belongs to node 1 will not be synced.
> When node 1 comes back up does the master not realize that file.txt has not
> been synced and makes sure that it is synced when it has contact with node
> 1 again?
> So file.txt will not exist on node 1 at all?
>

Geo-replication syncs changes based on the changelog journal, which records
all the file operations.
It syncs every file in two steps:
1. File creation with the same attributes as on the master, via RPC (CREATE is
recorded in the changelog).
2. Data sync via rsync (DATA is recorded in the changelog; any further appends
will only record DATA).

The changelog processing will not halt on encountering ENOENT (it treats that
as a safe error), so it's not
straightforward. When I said the file won't be synced, I meant: the file is
created on node1, and when
you append data, the data would not sync because it gets ENOENT while
node1 is down. But if the
'CREATE' of the file is not synced to node1, then it is a persistent failure
(ENOTCONN) and geo-rep waits till node1 comes back.

>
> I did a small test on my testing machines.
> Turned one of the geo machines off and created 1 files containing one
> short string in the master nodes.
> Nothing became synced with the geo slaves.
> When I turned on the geo machine again all 1 files were synced to the
> geo slaves.
> Of course, divided between the two machines.
> Is this the right/expected behavior of geo-replication with a distributed
> cluster?
>

Yes, it's correct. As I said earlier, CREATE itself would have failed with
ENOTCONN, and geo-rep waited till the slave came back.
If you bring a slave node down and then append data to files which fall under
the node which is down, you won't see the appended data.
So it's always recommended to use replica/ec/arbiter for the slave volume.

>
> Many thanks in advance!
>
> Regards
> Marcus
>
>
> On Wed, Feb 07, 2018 at 06:39:20PM +0530, Kotresh Hiremath Ravishankar
> wrote:
> > We are happy to help you out. Please find the answers inline.
> >
> > On Tue, Feb 6, 2018 at 4:39 PM, Marcus Pedersén <marcus.peder...@slu.se>
> > wrote:
> >
> > > Hi all,
> > >
> > > I am planning my new gluster system and tested things out in
> > > a bunch of virtual machines.
> > > I need a bit of help to understand how geo-replication behaves.
> > >
> > > I have a master gluster cluster replica 2
> > > (in production I will use an arbiter and replicatied/distributed)
> > > and the geo cluster is distributed with 2 machines.
> > > (in production I will have the geo cluster distributed)
> > >
> >
> > It's recommended to use slave also to be distribute
> replicate/aribiter/ec.
> > Choosing only distribute will cause issues
> > when of the slave node is down and a file is being synced which belongs
> to
> > that node. It would not sync
> > later.
> >
> >
> > > Everything is up and running and creating files from client both
> > > replicates and is distributed in the geo cluster.
> > >
> > > The thing I am wondering about is:
> > > When I run: gluster volume geo-replication status
> > > I see both slave nodes one is active and the other is passive.
> > >
> > > MASTER NODEMASTER VOL MASTER BRICKSLAVE USERSLAVE
> > > SLAVE NODE  STATUS CRAWL STATUS
> > >LAST_SYNCED
> > > 
> > > 
> > > ---
> > > gluster1   interbullfs/interbullfsgeouser
> > >  ssh://geouser@gluster-geo1::interbullfs-geogluster-geo2Active
> > >  Changelog Crawl2018-02-06 11:46:08
> > > gluster2   interbullfs/interbullfsgeouser
> > >  ssh://geouser@gluster-geo1::interbullfs-geogluster-geo1
> Passive
> > >   N/AN/A
> > >
> > >
> > > If I shutdown the active slave the status changes to faulty
> > > and the other one continues to be passive.
> > >
> >
> > > MASTER NODEMASTER VOL MASTER BRICKSLAVE USERSLAVE
> > > SLAVE NODE  STATUS CRAWL STATUS
> > > LAST_SYNCED
> > > 
> > > ---

Re: [Gluster-users] add geo-replication "passive" node after node replacement

2018-02-07 Thread Kotresh Hiremath Ravishankar
Hi,

When S3 is added to the master volume as a new node, the following commands
should be run to generate and distribute the ssh keys:

1. Generate ssh keys from new node

   #gluster system:: execute gsec_create

2. Push those ssh keys of new node to slave

  #gluster vol geo-rep  :: create push-pem force

3. Stop and start geo-rep


But note that while removing a brick and adding a brick, you should make sure
the data from the brick being removed has been synced
to the slave.
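
One rough way to check that is geo-rep's detailed status, for example (the
master volume, slave host and slave volume names are placeholders):

  # gluster volume geo-replication <mastervol> <slavehost>::<slavevol> status detail

and making sure the worker for the brick you are about to remove shows no
failures or pending entries before you proceed.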

Thanks,
Kotresh HR





On Wed, Feb 7, 2018 at 4:21 PM, Stefano Bagnara  wrote:

> Hi all,
>
> i had a replica 2 gluster 3.12 between S1 and S2 (1 brick per node)
> geo-replicated to S5 where both S1 and S2 were visible in the
> geo-replication status and S2 "active" while S1 "passive".
>
> I had to replace S1 with S3, so I did an
> "add-brick  replica 3 S3"
> and then
> "remove-brick replica 2 S1".
>
> Now I have again a replica 2 gluster between S3 and S2 but the
> geo-replica only show S2 as active and no other peer involved. So it
> seems S3 does not know about the geo-replica and it is not ready to
> geo-replicate in case S2 goes down.
>
> Here was the original geo-rep status
>
> # gluster volume geo-replication status
>
> MASTER NODEMASTER VOLMASTER BRICK   SLAVE USERSLAVE
>SLAVE NODESTATUS CRAWL STATUS
> LAST_SYNCED
> 
> 
> 
> S2 sharedvol /home/sharedvolroot
> ssh://S5::sharedvolslaveS5PassiveN/AN/A
> S1 sharedvol /home/sharedvolroot
> ssh://S5::sharedvolslaveS5Active Changelog Crawl
> 2018-02-07 10:18:57
>
> Here is the new geo-replication stauts
>
> # gluster volume geo-replication status
>
> MASTER NODEMASTER VOLMASTER BRICK   SLAVE USERSLAVE
>SLAVE NODESTATUSCRAWL STATUS
> LAST_SYNCED
> 
> 
> ---
> S2 sharedvol /home/sharedvolroot
> ssh://S5::sharedvolslaveS5ActiveChangelog Crawl
> 2018-02-07 11:48:31
>
>
> How can I add S3 as a passive node in the geo-replica to S5 ??
>
> Thank you,
> Stefano
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>



-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] geo-replication

2018-02-07 Thread Kotresh Hiremath Ravishankar
Answers in line.

On Tue, Feb 6, 2018 at 6:24 PM, Marcus Pedersén 
wrote:

> Hi again,
> I made some more tests and the behavior I get is that if any of
> the slaves are down the geo-replication stops working.
> It this the way distributed volumes work, if one server goes down
> the entire system stops to work?
> The servers that are online do not continue to work?
>

   If one of the slave nodes is down, the corresponding master node connects
to another slave node which is up.
But if the primary slave node (the one used to create the geo-rep session) is
down, that connection remains Faulty until
it is brought back up.

>
> Sorry for asking stupid questions.
>
> Best regards
> Marcus
>
>
> On Tue, Feb 06, 2018 at 12:09:40PM +0100, Marcus Pedersén wrote:
> > Hi all,
> >
> > I am planning my new gluster system and tested things out in
> > a bunch of virtual machines.
> > I need a bit of help to understand how geo-replication behaves.
> >
> > I have a master gluster cluster replica 2
> > (in production I will use an arbiter and replicatied/distributed)
> > and the geo cluster is distributed with 2 machines.
> > (in production I will have the geo cluster distributed)
> >
> > Everything is up and running and creating files from client both
> > replicates and is distributed in the geo cluster.
> >
> > The thing I am wondering about is:
> > When I run: gluster volume geo-replication status
> > I see both slave nodes one is active and the other is passive.
> >
> > MASTER NODEMASTER VOL MASTER BRICKSLAVE USERSLAVE
>   SLAVE NODE  STATUS CRAWL STATUS
>  LAST_SYNCED
> > 
> 
> ---
> > gluster1   interbullfs/interbullfsgeouser
>  ssh://geouser@gluster-geo1::interbullfs-geogluster-geo2Active
>  Changelog Crawl2018-02-06 11:46:08
> > gluster2   interbullfs/interbullfsgeouser
>  ssh://geouser@gluster-geo1::interbullfs-geogluster-geo1Passive
>   N/AN/A
> >
> >
> > If I shutdown the active slave the status changes to faulty
> > and the other one continues to be passive.
> >
> > MASTER NODEMASTER VOL MASTER BRICKSLAVE USERSLAVE
>   SLAVE NODE  STATUS CRAWL STATUS
>   LAST_SYNCED
> > 
> 
> 
> > gluster1   interbullfs/interbullfsgeouser
>  ssh://geouser@gluster-geo1::interbullfs-geoN/A Faulty
>  N/A N/A
> > gluster2   interbullfs/interbullfsgeouser
>  ssh://geouser@gluster-geo1::interbullfs-geogluster-geo1Passive
>   N/A N/A
> >
> >
> > In my understanding I thought that if the active slave stopped
> > working the passive slave should become active and should
> > continue to replicate from master.
> >
> > Am I wrong? Is there just one active slave if it is setup as
> > a distributed system?
> >
> > What I use:
> > Centos 7, gluster 3.12
> > I have followed the geo instructions:
> > http://docs.gluster.org/en/latest/Administrator%20Guide/
> Geo%20Replication/
> >
> > Many thanks in advance!
> >
> > Bets regards
> > Marcus
> >
> > --
> > **
> > * Marcus Pedersén*
> > * System administrator   *
> > **
> > * Interbull Centre   *
> > *    *
> > * Department of Animal Breeding & Genetics — SLU *
> > * Box 7023, SE-750 07*
> > * Uppsala, Sweden*
> > **
> > * Visiting address:  *
> > * Room 55614, Ulls väg 26, Ultuna*
> > * Uppsala*
> > * Sweden *
> > **
> > * Tel: +46-(0)18-67 1962 *
> > **
> > **
> > * ISO 9001 Bureau Veritas No SE004561-1  *
> > **
> > ___
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > http://lists.gluster.org/mailman/listinfo/gluster-users
>
> --
> **
> * Marcus Pedersén*
> * System administrator   *
> **
> * Interbull Centre  

Re: [Gluster-users] geo-replication

2018-02-07 Thread Kotresh Hiremath Ravishankar
We are happy to help you out. Please find the answers inline.

On Tue, Feb 6, 2018 at 4:39 PM, Marcus Pedersén 
wrote:

> Hi all,
>
> I am planning my new gluster system and tested things out in
> a bunch of virtual machines.
> I need a bit of help to understand how geo-replication behaves.
>
> I have a master gluster cluster replica 2
> (in production I will use an arbiter and replicatied/distributed)
> and the geo cluster is distributed with 2 machines.
> (in production I will have the geo cluster distributed)
>

It's recommended that the slave also be distributed-replicate/arbiter/ec.
Choosing plain distribute will cause issues
when one of the slave nodes is down and a file being synced belongs to
that node: it would not sync
later.
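
A minimal sketch of such a slave volume with an arbiter (host and brick names
are made up):

  # gluster volume create slavevol replica 3 arbiter 1 geo1:/bricks/b1 geo2:/bricks/b1 geo3:/bricks/arb1
  # gluster volume start slavevol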


> Everything is up and running and creating files from client both
> replicates and is distributed in the geo cluster.
>
> The thing I am wondering about is:
> When I run: gluster volume geo-replication status
> I see both slave nodes one is active and the other is passive.
>
> MASTER NODEMASTER VOL MASTER BRICKSLAVE USERSLAVE
> SLAVE NODE  STATUS CRAWL STATUS
>LAST_SYNCED
> 
> 
> ---
> gluster1   interbullfs/interbullfsgeouser
>  ssh://geouser@gluster-geo1::interbullfs-geogluster-geo2Active
>  Changelog Crawl2018-02-06 11:46:08
> gluster2   interbullfs/interbullfsgeouser
>  ssh://geouser@gluster-geo1::interbullfs-geogluster-geo1Passive
>   N/AN/A
>
>
> If I shutdown the active slave the status changes to faulty
> and the other one continues to be passive.
>

> MASTER NODEMASTER VOL MASTER BRICKSLAVE USERSLAVE
> SLAVE NODE  STATUS CRAWL STATUS
> LAST_SYNCED
> 
> 
> 
> gluster1   interbullfs/interbullfsgeouser
>  ssh://geouser@gluster-geo1::interbullfs-geoN/A Faulty
>  N/A N/A
> gluster2   interbullfs/interbullfsgeouser
>  ssh://geouser@gluster-geo1::interbullfs-geogluster-geo1Passive
>   N/A N/A
>
>
> In my understanding I thought that if the active slave stopped
> working the passive slave should become active and should
> continue to replicate from master.
>
> Am I wrong? Is there just one active slave if it is setup as
> a distributed system?
>

The Active/Passive notion is for the master nodes. If the gluster1 master node
is down, the gluster2 master node will become Active.
It's not for the slave nodes.



>
> What I use:
> Centos 7, gluster 3.12
> I have followed the geo instructions:
> http://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/
>
> Many thanks in advance!
>
> Bets regards
> Marcus
>
> --
> **
> * Marcus Pedersén*
> * System administrator   *
> **
> * Interbull Centre   *
> *    *
> * Department of Animal Breeding & Genetics — SLU *
> * Box 7023, SE-750 07*
> * Uppsala, Sweden*
> **
> * Visiting address:  *
> * Room 55614, Ulls väg 26, Ultuna*
> * Uppsala*
> * Sweden *
> **
> * Tel: +46-(0)18-67 1962 *
> **
> **
> * ISO 9001 Bureau Veritas No SE004561-1  *
> **
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users




-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] geo-replication command rsync returned with 3

2018-02-06 Thread Kotresh Hiremath Ravishankar
Hi,

As a quick workaround to get geo-replication working, please configure the
following option:

gluster vol geo-replication  :: config
access_mount true

With the above option geo-replication will not do the lazy umount, and as a
result all the master and slave volume mounts
maintained by geo-replication can be accessed by others; they are also visible
in the df output.
There might be cases where the mount points do not get cleaned up when a worker
goes faulty and comes back.
These need manual cleaning.
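
A rough way to spot and clean such leftovers - the gsyncd-aux-mount prefix is
what geo-rep normally uses for its auxiliary mounts, but verify with df on your
own nodes first:

  # df -h | grep gsyncd-aux-mount
  # umount <stale-mount-point>    (only for mounts that no running worker is using)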

Thanks,
Kotresh HR

On Tue, Feb 6, 2018 at 12:37 AM, Florian Weimer  wrote:

> On 02/05/2018 01:33 PM, Florian Weimer wrote:
>
> Do you have strace output going further back, at least to the proceeding
>> getcwd call?  It would be interesting to see which path the kernel reports,
>> and if it starts with "(unreachable)".
>>
>
> I got the strace output now, but it very difficult to read (chdir in a
> multi-threaded process …).
>
> My current inclination is to blame rsync because it does an unconditional
> getcwd during startup, which now fails if the current directory is
> unreachable.
>
> Further references:
>
> https://sourceware.org/ml/libc-alpha/2018-02/msg00152.html
> https://bugzilla.redhat.com/show_bug.cgi?id=1542180
>
> Andreas Schwab agrees that rsync is buggy:
>
> https://sourceware.org/ml/libc-alpha/2018-02/msg00153.html
>
>
> Thanks,
> Florian
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>



-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] geo-replication initial setup with existing data

2018-01-30 Thread Kotresh Hiremath Ravishankar
Hi,

Geo-replication expects the gfid (a unique identifier similar to an inode
number in backend file systems) of a file to be the same
on both the master and the slave gluster volume. If the data was copied
directly, by means other than geo-replication,
the gfids will be different. The crashes you are seeing are because of that.

If the data is not huge, I would recommend syncing the data from the master
volume using geo-replication. The other way
is to copy directly to the slave, set the gfid at the backend for each file
to be the same as on the master volume, and then set up
geo-replication. To do the latter, follow the steps below. (Note that this is
not tested extensively.)


   1. Run the following commands on any one of the master nodes:

   # cd /usr/share/glusterfs/scripts/
   # sh generate-gfid-file.sh localhost:${master-vol} $PWD/get-gfid.sh /tmp/tmp.atyEmKyCjo/upgrade-gfid-values.txt
   # scp /tmp/tmp.atyEmKyCjo/upgrade-gfid-values.txt root@${slavehost}:/tmp/

   2. Run the following commands on a slave node:

   # cd /usr/share/glusterfs/scripts/
   # sh slave-upgrade.sh localhost:${slave-vol} /tmp/tmp.atyEmKyCjo/upgrade-gfid-values.txt $PWD/gsync-sync-gfid

   3. Setup geo-rep and start
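
   For step 3, that is the usual session setup, roughly (the volume and host
   names are placeholders):

   # gluster volume geo-replication <master-vol> <slavehost>::<slave-vol> create push-pem
   # gluster volume geo-replication <master-vol> <slavehost>::<slave-vol> start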


Thanks,
Kotresh HR

On Mon, Jan 22, 2018 at 4:55 PM,  wrote:

> Hello everyone,
>
> I was searching for a replacement for my rsync duplication of data to a
> second server and ended up with gluster geo-replication. But after reading
> the documentation and lots of websites I'm still unsure how to set up
> geo-replication without retransferring all data.
> I succeeded in converting my existing data folder to a gluster volume by
> creating a volume and using the "find /media/brick -noleaf -print0 | xargs
> --null stat" inside the mounted gluster volume folder on the master.
> But how do I have to prepare the slave? I tried to do it in the same way as
> with the master, but this only results in error messages like
> [2018-01-22 11:17:05.209027] E [repce(/media/brick):209:__call__]
> RepceClient: call failed on peer call=27401:140641732232960:1516619825.12
>method=entry_opserror=OSError
> [2018-01-22 11:17:05.209497] E 
> [syncdutils(/media/brick):331:log_raise_exception]
> : FAIL:
> Traceback (most recent call last):
>   File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py",
> line 210, in main
> main_i()
>   File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py",
> line 801, in main_i
> local.service_loop(*[r for r in [remote] if r])
>   File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/resource.py",
> line 1670, in service_loop
> g1.crawlwrap(oneshot=True, register_time=register_time)
>   File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py",
> line 597, in crawlwrap
> self.crawl()
>   File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py",
> line 1555, in crawl
> self.process([item[1]], 0)
>   File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py",
> line 1204, in process
> self.process_change(change, done, retry)
>   File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py",
> line 1114, in process_change
> failures = self.slave.server.entry_ops(entries)
>   File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/repce.py",
> line 228, in __call__
> return self.ins(self.meth, *a)
>   File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/repce.py",
> line 210, in __call__
> raise res
> OSError: [Errno 42] .gfid/----0001
> I removed now all extended attributes on the slave and deleted the
> .glusterfs folder in the brick, so the system is hopefully in the inital
> state again.
> Is there any way to setup a geo-replication session without resyncing all
> data by gluster? Because this will take month with my poor connection over
> here. I'm using gluster 3.13.1 on two Ubuntu 16.04.3 LTS hosts.
>
> I hope someone can help me with some hints, thanks and best regards
> Tino
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>



-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] geo-replication command rsync returned with 3

2018-01-24 Thread Kotresh Hiremath Ravishankar
It is clear that rsync is failing. Are the rsync versions on all master
and slave nodes the same?
I have seen that cause problems sometimes.
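For example, a quick check across the nodes could look like this (host names
are placeholders):

# print the installed rsync version on every master and slave node
for h in master1 master2 slave1 slave2; do
    echo -n "$h: "; ssh "$h" rsync --version | head -n1
done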

-Kotresh HR

On Wed, Jan 24, 2018 at 10:29 PM, Dietmar Putz 
wrote:

> Hi all,
> i have made some tests on the latest Ubuntu 16.04.3 server image. Upgrades
> were disabled...
> the configuration was always the same... a distributed replicated volume on
> 4 VMs with geo-replication to a dist. repl. volume on 4 VMs.
> i started with 3.7.20, upgrade to 3.8.15, to 3.10.9 to 3.12.5. After each
> upgrade i have tested the geo-replication which worked well anytime.
> then i have made an update / upgrade on the first master node. directly
> after upgrade the below shown error appeared on that node.
> after upgrade on the second master node the error appeared there also...
> geo replication is faulty.
>
> this error affects gfs 3.7.20, 3.8.15, 3.10.9 and 3.12.5 on Ubuntu 16.04.3
> in one test i have updated rsync from 3.1.1 to 3.1.2 but with no effect.
>
> does anyone else experienced this behavior...any idea ?
>
> best regards
> Dietmar
>
>
> gfs 3.12.5 geo-rep log on master :
>
> [2018-01-24 15:50:35.347959] I [master(/brick1/mvol1):1385:crawl]
> _GMaster: slave's timestime=(1516808792, 0)
> [2018-01-24 15:50:35.604094] I [master(/brick1/mvol1):1863:syncjob]
> Syncer: Sync Time Takenduration=0.0294num_files=1job=2
> return_code=3
> [2018-01-24 15:50:35.605490] E [resource(/brick1/mvol1):210:errlog]
> Popen: command returned errorcmd=rsync -aR0 --inplace --files-from=-
> --super --stats --numeric-ids --no-implied-dirs --existing --xattrs --acls
> --ignore-missing-args . -e ssh -oPasswordAuthentication=no
> -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem
> -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-MZwEp2/
> cbad1c5f88978ecd713bdb1478fbabbe.sock --compress root@gl-node5-int
> :/proc/2013/cwderror=3
> [2018-01-24 15:50:35.628978] I [syncdutils(/brick1/mvol1):271:finalize]
> : exiting.
>
>
>
> after this upgrade one server fails :
> Start-Date: 2018-01-18  04:33:52
> Commandline: /usr/bin/unattended-upgrade
> Upgrade:
> libdns-export162:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8,
> 1:9.10.3.dfsg.P4-8ubuntu1.10),
> libisccfg140:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8,
> 1:9.10.3.dfsg.P4-8ubuntu1.10),
> bind9-host:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8,
> 1:9.10.3.dfsg.P4-8ubuntu1.10),
> dnsutils:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8,
> 1:9.10.3.dfsg.P4-8ubuntu1.10),
> libc6:amd64 (2.23-0ubuntu9, 2.23-0ubuntu10),
> libisc160:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8,
> 1:9.10.3.dfsg.P4-8ubuntu1.10),
> locales:amd64 (2.23-0ubuntu9, 2.23-0ubuntu10),
> libisc-export160:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8,
> 1:9.10.3.dfsg.P4-8ubuntu1.10),
> libc-bin:amd64 (2.23-0ubuntu9, 2.23-0ubuntu10),
> liblwres141:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8,
> 1:9.10.3.dfsg.P4-8ubuntu1.10),
> libdns162:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8,
> 1:9.10.3.dfsg.P4-8ubuntu1.10),
> multiarch-support:amd64 (2.23-0ubuntu9, 2.23-0ubuntu10),
> libisccc140:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8,
> 1:9.10.3.dfsg.P4-8ubuntu1.10),
> libbind9-140:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8,
> 1:9.10.3.dfsg.P4-8ubuntu1.10)
> End-Date: 2018-01-18  04:34:32
>
>
>
> strace rsync :
>
> 30743 23:34:47 newfstatat(3, "6737", {st_mode=S_IFDIR|0755, st_size=4096,
> ...}, AT_SYMLINK_NOFOLLOW) = 0
> 30743 23:34:47 newfstatat(3, "6741", {st_mode=S_IFDIR|0755, st_size=4096,
> ...}, AT_SYMLINK_NOFOLLOW) = 0
> 30743 23:34:47 getdents(3, /* 0 entries */, 131072) = 0
> 30743 23:34:47 munmap(0x7fa4feae7000, 135168) = 0
> 30743 23:34:47 close(3) = 0
> 30743 23:34:47 write(2, "rsync: getcwd(): No such file or directory (2)",
> 46) = 46
> 30743 23:34:47 write(2, "\n", 1)= 1
> 30743 23:34:47 rt_sigaction(SIGUSR1, {SIG_IGN, [], SA_RESTORER,
> 0x7fa4fdf404b0}, NULL, 8) = 0
> 30743 23:34:47 rt_sigaction(SIGUSR2, {SIG_IGN, [], SA_RESTORER,
> 0x7fa4fdf404b0}, NULL, 8) = 0
> 30743 23:34:47 write(2, "rsync error: errors selecting input/output files,
> dirs (code 3) at util.c(1056) [Receiver=3.1.1]", 96) = 96
> 30743 23:34:47 write(2, "\n", 1)= 1
> 30743 23:34:47 exit_group(3)= ?
> 30743 23:34:47 +++ exited with 3 +++
>
>
>
>
> Am 19.01.2018 um 17:27 schrieb Joe Julian:
>
> ubuntu 16.04
>
>
> --
> Dietmar Putz
> 3Q GmbH
> Kurfürstendamm 102
> D-10711 Berlin
>
> Mobile:   +49 171 / 90 160 39
> Mail: dietmar.p...@3qsdn.com
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>



-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Deploying geo-replication to local peer

2018-01-16 Thread Kotresh Hiremath Ravishankar
Hi Viktor,

Answers inline

On Wed, Jan 17, 2018 at 3:46 AM, Viktor Nosov  wrote:

> Hi,
>
> I'm looking for glusterfs feature that can be used to transform data
> between
> volumes of different types provisioned on the same nodes.
> It could be, for example, transformation from disperse to distributed
> volume.
> The possible option is to invoke geo-replication between volumes. It seems
> is works properly.
> But I'm concern about  requirement from Administration Guide for Red Hat
> Gluster Storage 3.3 (10.3.3. Prerequisites):
>
> "Slave node must not be a peer of the any of the nodes of the Master
> trusted
> storage pool."
>
This doesn't limit the geo-rep feature in any way. It's a recommendation. You
can go ahead and use it.
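
For example, moving data from a disperse volume to a distributed one on the
same nodes could look roughly like this (volume and host names are
illustrative; since the slave volume lives in the same pool, the create step
may need the force option):

# one-time ssh/pem setup for geo-replication
gluster system:: execute gsec_create

# create and start a session from the disperse volume to the distributed one
gluster volume geo-replication disp-vol node1::dist-vol create push-pem force
gluster volume geo-replication disp-vol node1::dist-vol start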

> Is this restriction set to limit usage of geo-replication to disaster
> recovery scenarios only or there is a problem with data synchronization
> between
> master and slave volumes?
>
> Anybody has experience with this issue?
>
> Thanks for any information!
>
> Viktor Nosov
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>



-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] active-active georeplication?

2017-10-24 Thread Kotresh Hiremath Ravishankar
Hi,

No, gluster doesn't support active-active geo-replication. It's not planned
in the near future. We will let you know when it's planned.

Thanks,
Kotresh HR

On Tue, Oct 24, 2017 at 11:19 AM, atris adam  wrote:

> hi everybody,
>
> Have glusterfs released a feature named active-active georeplication? If
> yes, in which version it is released? If no, is it planned to have this
> feature?
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>



-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] how to verify bitrot signed file manually?

2017-09-29 Thread Kotresh Hiremath Ravishankar
Hi Amudhan,

Sorry for the late response; I was busy with other things. You are right,
bitrot uses sha256 for the checksum.
If file-1 and file-2 are marked bad, the I/O should be errored out with EIO.
If that is not happening, we need to look further into it. But what are the
file contents of file-1 and file-2 on the replica bricks? Do they match?
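
In case it helps, a rough way to re-check one brick copy by hand (the brick
path is illustrative, and this assumes the signature value is a short header
followed by the sha256 digest, as your listings suggest):

# sha256 of the file as it sits on the brick
sum=$(sha256sum /media/disk16/brick16/file-1 | awk '{print $1}')

# stored bitrot signature, e.g. 0x010200<digest>
sig=$(getfattr -n trusted.bit-rot.signature -e hex /media/disk16/brick16/file-1 | grep '=' | cut -d= -f2)

# the signature should end with the computed digest
case "$sig" in
  *"$sum") echo "signature matches computed sha256" ;;
  *)       echo "signature does NOT match computed sha256" ;;
esac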

Thanks and Regards,
Kotresh HR

On Mon, Sep 25, 2017 at 4:19 PM, Amudhan P  wrote:

> resending mail.
>
>
> On Fri, Sep 22, 2017 at 5:30 PM, Amudhan P  wrote:
>
>> ok, from bitrot code I figured out gluster using sha256 hashing algo.
>>
>>
>> Now coming to the problem, during scrub run in my cluster some of my
>> files were marked as bad in few set of nodes.
>> I just wanted to confirm bad file. so, I have used "sha256sum" tool in
>> Linux to manually get file hash.
>>
>> here is the result.
>>
>> file-1, file-2 marked as bad by scrub and file-3 is healthy.
>>
>> file-1 sha256 and bitrot signature value matches but still it's been
>> marked as bad.
>>
>> file-2 sha256 and bitrot signature value don't match, could be a victim
>> of bitrot or bitflip.file is still readable without any issue and no errors
>> found in the drive.
>>
>> file-3 sha256 and bitrot signature matches and healthy.
>>
>>
>> file-1 output from
>>
>> "sha256sum" = "71eada9352b1352aaef0f806d3d56
>> 1768ce2df905ded1668f665e06eca2d0bd4"
>>
>>
>> "getfattr -m. -e hex -d "
>> # file: file-1
>> trusted.bit-rot.bad-file=0x3100
>> trusted.bit-rot.signature=0x01020071eada9352b135
>> 2aaef0f806d3d561768ce2df905ded1668f665e06eca2d0bd4
>> trusted.bit-rot.version=0x020058e4f3b40006793d
>> trusted.ec.config=0x080a02000200
>> trusted.ec.dirty=0x
>> trusted.ec.size=0x000718996701
>> trusted.ec.version=0x00038c4c00038c4d
>> trusted.gfid=0xf078a24134fe4f9bb953eca8c28dea9a
>>
>> output scrub log:
>> [2017-09-02 13:02:20.311160] A [MSGID: 118023]
>> [bit-rot-scrub.c:244:bitd_compare_ckum] 0-qubevaultdr-bit-rot-0:
>> CORRUPTION DETECTED: Object /file-1 {Brick: /media/disk16/brick16 | GFID:
>> f078a241-34fe-4f9b-b953-eca8c28dea9a}
>> [2017-09-02 13:02:20.311579] A [MSGID: 118024]
>> [bit-rot-scrub.c:264:bitd_compare_ckum] 0-qubevaultdr-bit-rot-0: Marking
>> /file-1 [GFID: f078a241-34fe-4f9b-b953-eca8c28dea9a | Brick:
>> /media/disk16/brick16] as corrupted..
>>
>> file-2 output from
>>
>> "sha256sum" = "c41ef9c81faed4f3e6010ea67984c
>> 3cfefd842f98ee342939151f9250972dcda"
>>
>>
>> "getfattr -m. -e hex -d "
>> # file: file-2
>> trusted.bit-rot.bad-file=0x3100
>> trusted.bit-rot.signature=0x0102009162cb17d4f0be
>> e676fcb7830c5286d05b8e8940d14f3d117cb90b7b1defc129
>> trusted.bit-rot.version=0x020058e4f3b400019bb2
>> trusted.ec.config=0x080a02000200
>> trusted.ec.dirty=0x
>> trusted.ec.size=0x403433f6
>> trusted.ec.version=0x201a201b
>> trusted.gfid=0xa50012b0a632477c99232313928d239a
>>
>> output scrub log:
>> [2017-09-02 05:18:14.003156] A [MSGID: 118023]
>> [bit-rot-scrub.c:244:bitd_compare_ckum] 0-qubevaultdr-bit-rot-0:
>> CORRUPTION DETECTED: Object /file-2 {Brick: /media/disk13/brick13 | GFID:
>> a50012b0-a632-477c-9923-2313928d239a}
>> [2017-09-02 05:18:14.006629] A [MSGID: 118024]
>> [bit-rot-scrub.c:264:bitd_compare_ckum] 0-qubevaultdr-bit-rot-0: Marking
>> /file-2 [GFID: a50012b0-a632-477c-9923-2313928d239a | Brick:
>> /media/disk13/brick13] as corrupted..
>>
>>
>> file-3 output from
>>
>> "sha256sum" = "a590735b3c8936cc7ca9835128a19
>> c38a3f79c8fd53fddc031a9349b7e273f27"
>>
>>
>> "getfattr -m. -e hex -d "
>> # file: file-3
>> trusted.bit-rot.signature=0x010200a590735b3c8936
>> cc7ca9835128a19c38a3f79c8fd53fddc031a9349b7e273f27
>> trusted.bit-rot.version=0x020058e4f3b400019bb2
>> trusted.ec.config=0x080a02000200
>> trusted.ec.dirty=0x
>> trusted.ec.size=0x3530fc96
>> trusted.ec.version=0x1a981a99
>> trusted.gfid=0x10d8920e42cd42cf9448b8bf3941c192
>>
>>
>>
>> most of the bitrot bad files are in the set of new nodes and data were
>> uploaded using gluster 3.10.1. no drive issues are any kind of error msgs
>> in logs.
>>
>> what could be gone wrong?
>>
>> regards
>> Amudhan
>>
>> On Thu, Sep 21, 2017 at 1:23 PM, Amudhan P  wrote:
>>
>>> Hi,
>>>
>>> I have a file in my brick which was signed by bitrot and latter when
>>> running scrub it was marked as bad.
>>>
>>> Now, I want to verify file again manually. just to clarify my doubt
>>>
>>> how can I do this?
>>>
>>>
>>> regards
>>> Amudhan
>>>
>>
>>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>



-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list

Re: [Gluster-users] Advice needed for geo replication

2017-05-05 Thread Kotresh Hiremath Ravishankar
Hi Felipe,

All the observations you have made are correct. AFR is synchronous
replication where the client replicates the data, so throughput is limited
by the speed of the slowest node (in your case the HDD node). AFR
replicates each brick and is part of a single volume. At the end, you will
have a single volume in which each brick's data is replicated, and that
volume is highly available.

Geo-replication, by contrast, is a disaster recovery solution that
replicates the whole volume; it is asynchronous in nature and eventually
consistent. At the end you will have two volumes: the master volume (in
your case the SSD one) and the slave volume (in your case the HDD one).
If performance is the main concern and eventual consistency is acceptable,
you should go for geo-replication. And yes, failover is manual.

Thanks and Regards,
Kotresh H R

- Original Message -
> From: "Felipe Arturo Polanco" 
> To: gluster-users@gluster.org
> Sent: Thursday, May 4, 2017 6:49:35 PM
> Subject: [Gluster-users] Advice needed for geo replication
> 
> Hi,
> 
> I would like some advice on setting up a replicated or geo replicated setup
> in Gluster.
> 
> Right now the setup consists of 1 storage server with no replica serving
> gluster volumes to clients.
> 
> We need to have some sort of replication of it by adding a second server but
> the catch is this second server will have spinning disks while the current
> one has SSD disks.
> 
> I have read that in Gluster, the clients are the one running the replication
> of data by sending the same bytes to the gluster servers, so by using one
> gluster server with spinning disks the performance of clients will be as
> fast as a spinning disk speed even when the other server has SSD.
> For budget reasons we can't have SSDs in second storage server.
> 
> I kept reading and found the geo replication feature which makes the server
> do the replication of data instead of the clients, which is more likely my
> case but looks like there is no automatic failover mechanism of it and the
> administrator need to intervene to make the slave server a master one
> according to this document:
> https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.0/html/Administration_Guide/ch11s05.html
> 
> Given this scenario, I really need a piece of advice from the gluster users
> on how would be the best approach to have a replicated setup with SSD+HDD
> storage servers.
> 
> Thanks,
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] [Gluster-devel] Announcing release 3.11 : Scope, schedule and feature tracking

2017-04-25 Thread Kotresh Hiremath Ravishankar
Hi Serkan,

Even when bitrot is not enabled, versioning was being done.
As part of it, on every fresh lookup, getxattr calls were
made to find out whether the object is bad and to get its current
version and signature. So a find on the gluster mount would
sometimes cause high CPU utilization.

Since this is an RFE, it will be available from 3.11 and will not
be backported to 3.10.x.
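
If you want to confirm what a brick process is doing on your current version,
one rough way (the PID is a placeholder) is to trace it while running find on
the mount:

# watch for bit-rot versioning xattr calls on a brick process
strace -f -e trace=getxattr,lgetxattr,setxattr,lsetxattr -p <brick-pid> 2>&1 | grep bit-rot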


Thanks and Regards,
Kotresh H R

- Original Message -
> From: "Serkan Çoban" <cobanser...@gmail.com>
> To: "Kotresh Hiremath Ravishankar" <khire...@redhat.com>
> Cc: "Shyam" <srang...@redhat.com>, "Gluster Users" 
> <gluster-users@gluster.org>, "Gluster Devel"
> <gluster-de...@gluster.org>
> Sent: Tuesday, April 25, 2017 1:25:39 PM
> Subject: Re: [Gluster-users] [Gluster-devel] Announcing release 3.11 : Scope, 
> schedule and feature tracking
> 
> How does this affect CPU usage? Does it read the whole file and calculate a
> hash after it is written?
> Will this patch land in 3.10.x?
> 
> On Tue, Apr 25, 2017 at 10:32 AM, Kotresh Hiremath Ravishankar
> <khire...@redhat.com> wrote:
> > Hi
> >
> > https://github.com/gluster/glusterfs/issues/188 is merged in master
> > and needs to go in 3.11
> >
> > Thanks and Regards,
> > Kotresh H R
> >
> > - Original Message -
> >> From: "Kaushal M" <kshlms...@gmail.com>
> >> To: "Shyam" <srang...@redhat.com>
> >> Cc: gluster-users@gluster.org, "Gluster Devel" <gluster-de...@gluster.org>
> >> Sent: Thursday, April 20, 2017 12:16:39 PM
> >> Subject: Re: [Gluster-devel] Announcing release 3.11 : Scope, schedule and
> >> feature tracking
> >>
> >> On Thu, Apr 13, 2017 at 8:17 PM, Shyam <srang...@redhat.com> wrote:
> >> > On 02/28/2017 10:17 AM, Shyam wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> With release 3.10 shipped [1], it is time to set the dates for release
> >> >> 3.11 (and subsequently 4.0).
> >> >>
> >> >> This mail has the following sections, so please read or revisit as
> >> >> needed,
> >> >>   - Release 3.11 dates (the schedule)
> >> >>   - 3.11 focus areas
> >> >
> >> >
> >> > Pinging the list on the above 2 items.
> >> >
> >> >> *Release 3.11 dates:*
> >> >> Based on our release schedule [2], 3.11 would be 3 months from the 3.10
> >> >> release and would be a Short Term Maintenance (STM) release.
> >> >>
> >> >> This puts 3.11 schedule as (working from the release date backwards):
> >> >> - Release: May 30th, 2017
> >> >> - Branching: April 27th, 2017
> >> >
> >> >
> >> > Branching is about 2 weeks away, other than the initial set of overflow
> >> > features from 3.10 nothing else has been raised on the lists and in
> >> > github
> >> > as requests for 3.11.
> >> >
> >> > So, a reminder to folks who are working on features, to raise the
> >> > relevant
> >> > github issue for the same, and post it to devel list for consideration
> >> > in
> >> > 3.11 (also this helps tracking and ensuring we are waiting for the right
> >> > things at the time of branching).
> >> >
> >> >>
> >> >> *3.11 focus areas:*
> >> >> As maintainers of gluster, we want to harden testing around the various
> >> >> gluster features in this release. Towards this the focus area for this
> >> >> release are,
> >> >>
> >> >> 1) Testing improvements in Gluster
> >> >>   - Primary focus would be to get automated test cases to determine
> >> >> release health, rather than repeating a manual exercise every 3 months
> >> >>   - Further, we would also attempt to focus on maturing Glusto[7] for
> >> >> this, and other needs (as much as possible)
> >> >>
> >> >> 2) Merge all (or as much as possible) Facebook patches into master, and
> >> >> hence into release 3.11
> >> >>   - Facebook has (as announced earlier [3]) started posting their
> >> >> patches mainline, and this needs some attention to make it into master
> >> >>
> >> >
> >> > Further to the above, we are also considering the following features for
> >> > this release, re

Re: [Gluster-users] [Gluster-devel] Announcing release 3.11 : Scope, schedule and feature tracking

2017-04-25 Thread Kotresh Hiremath Ravishankar
Hi

https://github.com/gluster/glusterfs/issues/188 is merged in master
and needs to go in 3.11

Thanks and Regards,
Kotresh H R

- Original Message -
> From: "Kaushal M" 
> To: "Shyam" 
> Cc: gluster-users@gluster.org, "Gluster Devel" 
> Sent: Thursday, April 20, 2017 12:16:39 PM
> Subject: Re: [Gluster-devel] Announcing release 3.11 : Scope, schedule and 
> feature tracking
> 
> On Thu, Apr 13, 2017 at 8:17 PM, Shyam  wrote:
> > On 02/28/2017 10:17 AM, Shyam wrote:
> >>
> >> Hi,
> >>
> >> With release 3.10 shipped [1], it is time to set the dates for release
> >> 3.11 (and subsequently 4.0).
> >>
> >> This mail has the following sections, so please read or revisit as needed,
> >>   - Release 3.11 dates (the schedule)
> >>   - 3.11 focus areas
> >
> >
> > Pinging the list on the above 2 items.
> >
> >> *Release 3.11 dates:*
> >> Based on our release schedule [2], 3.11 would be 3 months from the 3.10
> >> release and would be a Short Term Maintenance (STM) release.
> >>
> >> This puts 3.11 schedule as (working from the release date backwards):
> >> - Release: May 30th, 2017
> >> - Branching: April 27th, 2017
> >
> >
> > Branching is about 2 weeks away, other than the initial set of overflow
> > features from 3.10 nothing else has been raised on the lists and in github
> > as requests for 3.11.
> >
> > So, a reminder to folks who are working on features, to raise the relevant
> > github issue for the same, and post it to devel list for consideration in
> > 3.11 (also this helps tracking and ensuring we are waiting for the right
> > things at the time of branching).
> >
> >>
> >> *3.11 focus areas:*
> >> As maintainers of gluster, we want to harden testing around the various
> >> gluster features in this release. Towards this the focus area for this
> >> release are,
> >>
> >> 1) Testing improvements in Gluster
> >>   - Primary focus would be to get automated test cases to determine
> >> release health, rather than repeating a manual exercise every 3 months
> >>   - Further, we would also attempt to focus on maturing Glusto[7] for
> >> this, and other needs (as much as possible)
> >>
> >> 2) Merge all (or as much as possible) Facebook patches into master, and
> >> hence into release 3.11
> >>   - Facebook has (as announced earlier [3]) started posting their
> >> patches mainline, and this needs some attention to make it into master
> >>
> >
> > Further to the above, we are also considering the following features for
> > this release, request feature owners to let us know if these are actively
> > being worked on and if these will make the branching dates. (calling out
> > folks that I think are the current feature owners for the same)
> >
> > 1) Halo - Initial Cut (@pranith)
> > 2) IPv6 support (@kaushal)
> 
> This is under review at https://review.gluster.org/16228 . The patch
> mostly looks fine.
> 
> The only issue is that it currently depends and links with an internal
> FB fork of tirpc (mainly for some helper functions and utilities).
> This makes it hard for the community to make actual use of  and test,
> the IPv6 features/fixes introduced by the change.
> 
> If the change were refactored to use publicly available versions of
> tirpc or ntirpc, I'm OK with it being merged. I did try it out myself.
> While I was able to build it against available versions of tirpc, I
> wasn't able to get it working correctly.
> 
> > 3) Negative lookup (@poornima)
> > 4) Parallel Readdirp - More changes to default settings. (@poornima, @du)
> >
> >
> >> [1] 3.10 release announcement:
> >> http://lists.gluster.org/pipermail/gluster-devel/2017-February/052188.html
> >>
> >> [2] Gluster release schedule:
> >> https://www.gluster.org/community/release-schedule/
> >>
> >> [3] Mail regarding facebook patches:
> >> http://lists.gluster.org/pipermail/gluster-devel/2016-December/051784.html
> >>
> >> [4] Release scope: https://github.com/gluster/glusterfs/projects/1
> >>
> >> [5] glusterfs github issues: https://github.com/gluster/glusterfs/issues
> >>
> >> [6] github issues for features and major fixes:
> >> https://hackmd.io/s/BkgH8sdtg#
> >>
> >> [7] Glusto tests: https://github.com/gluster/glusto-tests
> >> ___
> >> Gluster-devel mailing list
> >> gluster-de...@gluster.org
> >> http://lists.gluster.org/mailman/listinfo/gluster-devel
> >
> > ___
> > Gluster-devel mailing list
> > gluster-de...@gluster.org
> > http://lists.gluster.org/mailman/listinfo/gluster-devel
> ___
> Gluster-devel mailing list
> gluster-de...@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] High load on glusterfsd process

2017-04-25 Thread Kotresh Hiremath Ravishankar
Hi Abhishek,

As this is an enhancement, it won't be backported to 3.7/3.8/3.10.
It will only be available from the upcoming 3.11 release.

But I did try applying it to 3.7.6; it has a lot of conflicts.
If it's important for you, you can upgrade to the latest version
available and backport it. If it's impossible to upgrade to the
latest version, at least 3.7.20 would do. It has minimal
conflicts. I can help you out with that.
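
If you do go the backport route, the usual Gerrit workflow is roughly as below
(the patchset number at the end of the ref is illustrative; pick the latest one
shown on the review page):

# fetch the change from review.gluster.org and cherry-pick it onto the branch
git checkout release-3.7
git fetch https://review.gluster.org/glusterfs refs/changes/42/14442/1
git cherry-pick FETCH_HEAD    # resolve the conflicts mentioned above by hand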

Thanks and Regards,
Kotresh H R

- Original Message -
> From: "ABHISHEK PALIWAL" <abhishpali...@gmail.com>
> To: "Kotresh Hiremath Ravishankar" <khire...@redhat.com>
> Cc: "Pranith Kumar Karampuri" <pkara...@redhat.com>, "Gluster Devel" 
> <gluster-de...@gluster.org>, "gluster-users"
> <gluster-users@gluster.org>
> Sent: Tuesday, April 25, 2017 10:58:41 AM
> Subject: Re: [Gluster-users] High load on glusterfsd process
> 
> Hi Kotresh,
> 
> Could you please update whether it is possible to get the patch or bakport
> this patch on Gluster 3.7.6 version.
> 
> Regards,
> Abhishek
> 
> On Mon, Apr 24, 2017 at 6:14 PM, ABHISHEK PALIWAL <abhishpali...@gmail.com>
> wrote:
> 
> > What is the way to take this patch on Gluster 3.7.6 or only way to upgrade
> > the version?
> >
> > On Mon, Apr 24, 2017 at 3:22 PM, ABHISHEK PALIWAL <abhishpali...@gmail.com
> > > wrote:
> >
> >> Hi Kotresh,
> >>
> >> I have seen the patch available on the link which you shared. It seems we
> >> don't have some files in gluser 3.7.6 which you modified in the patch.
> >>
> >> Is there any possibility to provide the patch for Gluster 3.7.6?
> >>
> >> Regards,
> >> Abhishek
> >>
> >> On Mon, Apr 24, 2017 at 3:07 PM, Kotresh Hiremath Ravishankar <
> >> khire...@redhat.com> wrote:
> >>
> >>> Hi Abhishek,
> >>>
> >>> Bitrot requires versioning of files to be down on writes.
> >>> This was being done irrespective of whether bitrot is
> >>> enabled or not. This takes considerable CPU. With the
> >>> fix https://review.gluster.org/#/c/14442/, it is made
> >>> optional and is enabled only with bitrot. If bitrot
> >>> is not enabled, then you won't see any setxattr/getxattrs
> >>> related to bitrot.
> >>>
> >>> The fix would be available in 3.11.
> >>>
> >>>
> >>> Thanks and Regards,
> >>> Kotresh H R
> >>>
> >>> - Original Message -
> >>> > From: "ABHISHEK PALIWAL" <abhishpali...@gmail.com>
> >>> > To: "Pranith Kumar Karampuri" <pkara...@redhat.com>
> >>> > Cc: "Gluster Devel" <gluster-de...@gluster.org>, "gluster-users" <
> >>> gluster-users@gluster.org>, "Kotresh Hiremath
> >>> > Ravishankar" <khire...@redhat.com>
> >>> > Sent: Monday, April 24, 2017 11:30:57 AM
> >>> > Subject: Re: [Gluster-users] High load on glusterfsd process
> >>> >
> >>> > Hi Kotresh,
> >>> >
> >>> > Could you please update me on this?
> >>> >
> >>> > Regards,
> >>> > Abhishek
> >>> >
> >>> > On Sat, Apr 22, 2017 at 12:31 PM, Pranith Kumar Karampuri <
> >>> > pkara...@redhat.com> wrote:
> >>> >
> >>> > > +Kotresh who seems to have worked on the bug you mentioned.
> >>> > >
> >>> > > On Fri, Apr 21, 2017 at 12:21 PM, ABHISHEK PALIWAL <
> >>> > > abhishpali...@gmail.com> wrote:
> >>> > >
> >>> > >>
> >>> > >> If the patch provided in that case will resolve my bug as well then
> >>> > >> please provide the patch so that I will backport it on 3.7.6
> >>> > >>
> >>> > >> On Fri, Apr 21, 2017 at 11:30 AM, ABHISHEK PALIWAL <
> >>> > >> abhishpali...@gmail.com> wrote:
> >>> > >>
> >>> > >>> Hi Team,
> >>> > >>>
> >>> > >>> I have noticed that there are so many glusterfsd threads are
> >>> running in
> >>> > >>> my system and we observed some of those thread consuming more cpu.
> >>> I
> >>> > >>> did “strace” on two such threads (before the problem disappeared by
> >>> > >>> itself)

Re: [Gluster-users] High load on glusterfsd process

2017-04-24 Thread Kotresh Hiremath Ravishankar
Hi Abhishek,

Bitrot requires versioning of files to be done on writes.
This was being done irrespective of whether bitrot is
enabled or not, and it takes considerable CPU. With the
fix https://review.gluster.org/#/c/14442/, it is made
optional and is enabled only with bitrot. If bitrot
is not enabled, then you won't see any setxattr/getxattr calls
related to bitrot.

The fix would be available in 3.11. 
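
With the fix in place and bitrot disabled, freshly created files should no
longer carry these xattrs; that can be spot-checked on a brick directly (the
path below is illustrative):

# list any bit-rot related xattrs on a file as stored on the brick
getfattr -d -m 'trusted.bit-rot' -e hex /opt/lvmdir/c2/brick/path/to/file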


Thanks and Regards,
Kotresh H R

- Original Message -
> From: "ABHISHEK PALIWAL" <abhishpali...@gmail.com>
> To: "Pranith Kumar Karampuri" <pkara...@redhat.com>
> Cc: "Gluster Devel" <gluster-de...@gluster.org>, "gluster-users" 
> <gluster-users@gluster.org>, "Kotresh Hiremath
> Ravishankar" <khire...@redhat.com>
> Sent: Monday, April 24, 2017 11:30:57 AM
> Subject: Re: [Gluster-users] High load on glusterfsd process
> 
> Hi Kotresh,
> 
> Could you please update me on this?
> 
> Regards,
> Abhishek
> 
> On Sat, Apr 22, 2017 at 12:31 PM, Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
> 
> > +Kotresh who seems to have worked on the bug you mentioned.
> >
> > On Fri, Apr 21, 2017 at 12:21 PM, ABHISHEK PALIWAL <
> > abhishpali...@gmail.com> wrote:
> >
> >>
> >> If the patch provided in that case will resolve my bug as well then
> >> please provide the patch so that I will backport it on 3.7.6
> >>
> >> On Fri, Apr 21, 2017 at 11:30 AM, ABHISHEK PALIWAL <
> >> abhishpali...@gmail.com> wrote:
> >>
> >>> Hi Team,
> >>>
> >>> I have noticed that there are so many glusterfsd threads are running in
> >>> my system and we observed some of those thread consuming more cpu. I
> >>> did “strace” on two such threads (before the problem disappeared by
> >>> itself)
> >>> and found that there is a continuous activity like below:
> >>>
> >>> lstat("/opt/lvmdir/c2/brick/.glusterfs/e7/7d/e77d12b3-92f8-4
> >>> dfe-9a7f-246e901cbdf1/002700/firewall_-J208482-425_20170126T113552+.log.gz",
> >>> {st_mode=S_IFREG|0670, st_size=1995, ...}) = 0
> >>> lgetxattr("/opt/lvmdir/c2/brick/.glusterfs/e7/7d/e77d12b3-92
> >>> f8-4dfe-9a7f-246e901cbdf1/002700/firewall_-J208482-425_20170126T113552+.log.gz",
> >>> "trusted.bit-rot.bad-file", 0x3fff81f58550, 255) = -1 ENODATA (No data
> >>> available)
> >>> lgetxattr("/opt/lvmdir/c2/brick/.glusterfs/e7/7d/e77d12b3-92
> >>> f8-4dfe-9a7f-246e901cbdf1/002700/firewall_-J208482-425_20170126T113552+.log.gz",
> >>> "trusted.bit-rot.signature", 0x3fff81f58550, 255) = -1 ENODATA (No data
> >>> available)
> >>> lstat("/opt/lvmdir/c2/brick/.glusterfs/e7/7d/e77d12b3-92f8-4
> >>> dfe-9a7f-246e901cbdf1/002700/tcli_-J208482-425_20170123T180550+.log.gz",
> >>> {st_mode=S_IFREG|0670, st_size=169, ...}) = 0
> >>> lgetxattr("/opt/lvmdir/c2/brick/.glusterfs/e7/7d/e77d12b3-92
> >>> f8-4dfe-9a7f-246e901cbdf1/002700/tcli_-J208482-425_20170123T180550+.log.gz",
> >>> "trusted.bit-rot.bad-file", 0x3fff81f58550, 255) = -1 ENODATA (No data
> >>> available)
> >>> lgetxattr("/opt/lvmdir/c2/brick/.glusterfs/e7/7d/e77d12b3-92
> >>> f8-4dfe-9a7f-246e901cbdf1/002700/tcli_-J208482-425_20170123T180550+.log.gz",
> >>> "trusted.bit-rot.signature", 0x3fff81f58550, 255) = -1 ENODATA (No data
> >>> available)
> >>>
> >>> I have found the below existing issue which is very similar to my
> >>> scenario.
> >>>
> >>> https://bugzilla.redhat.com/show_bug.cgi?id=1298258
> >>>
> >>> We are using the gluster-3.7.6 and it seems that the issue is fixed in
> >>> 3.8.4 version.
> >>>
> >>> Could you please let me know why it showing the number of above logs and
> >>> reason behind it as it is not explained in the above bug.
> >>>
> >>> Regards,
> >>> Abhishek
> >>>
> >>> --
> >>>
> >>>
> >>>
> >>>
> >>> Regards
> >>> Abhishek Paliwal
> >>>
> >>
> >>
> >>
> >> --
> >>
> >>
> >>
> >>
> >> Regards
> >> Abhishek Paliwal
> >>
> >> ___
> >> Gluster-users mailing list
> >> Gluster-users@gluster.org
> >> http://lists.gluster.org/mailman/listinfo/gluster-users
> >>
> >
> >
> >
> > --
> > Pranith
> >
> 
> 
> 
> --
> 
> 
> 
> 
> Regards
> Abhishek Paliwal
> 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Geo replication stuck (rsync: link_stat "(unreachable)")

2017-04-16 Thread Kotresh Hiremath Ravishankar
Answers inline.

Thanks and Regards,
Kotresh H R

- Original Message -
> From: "mabi" <m...@protonmail.ch>
> To: "Kotresh Hiremath Ravishankar" <khire...@redhat.com>
> Cc: "Gluster Users" <gluster-users@gluster.org>
> Sent: Thursday, April 13, 2017 8:51:29 PM
> Subject: Re: [Gluster-users] Geo replication stuck (rsync: link_stat 
> "(unreachable)")
> 
> Hi Kotresh,
> 
> Thanks for your feedback.
> 
> So do you mean I can simply login into the geo-replication slave node, mount
> the volume with fuse, and delete the problematic directory, and finally
> restart geo-replcation?
> 
   Trying to delete the problematic directory on the slave might still result in
  the same ENOTEMPTY error. Try that out; if it does not work, the directory
  needs to be deleted directly from the backend bricks on all the nodes.
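  If it comes to that, stale DHT linkto entries under the directory can usually
  be spotted on each brick before removing them (the brick path is illustrative;
  double-check before deleting anything):

  # stale linkto files are zero-byte, sticky-bit-only entries with the dht.linkto xattr
  find /data/private-geo/brick/path/to/dir -type f -perm 1000 -size 0 \
      -exec getfattr -n trusted.glusterfs.dht.linkto -e text {} \;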

> I am planning to migrate to 3.8 as soon as I have a backup (geo-replication).
> Is this issue with DHT fixed in the latest 3.8.x release?
>
   Most of the issues are addressed.

> Regards,
> M.
> 
>  Original Message 
> Subject: Re: [Gluster-users] Geo replication stuck (rsync: link_stat
> "(unreachable)")
> Local Time: April 13, 2017 7:57 AM
> UTC Time: April 13, 2017 5:57 AM
> From: khire...@redhat.com
> To: mabi <m...@protonmail.ch>
> Gluster Users <gluster-users@gluster.org>
> 
> Hi,
> 
> I think the directory Workhours_2017 is deleted on master and on
> slave it's failing to delete because there might be stale linkto files
> at the back end. These issues are fixed in DHT with latest versions.
> Upgrading to latest version would solve these issues.
> 
> To workaround the issue, you might need to cleanup the problematic
> directory on slave from the backend.
> 
> Thanks and Regards,
> Kotresh H R
> 
> - Original Message -
> > From: "mabi" <m...@protonmail.ch>
> > To: "Kotresh Hiremath Ravishankar" <khire...@redhat.com>
> > Cc: "Gluster Users" <gluster-users@gluster.org>
> > Sent: Thursday, April 13, 2017 12:28:50 AM
> > Subject: Re: [Gluster-users] Geo replication stuck (rsync: link_stat
> > "(unreachable)")
> >
> > Hi Kotresh,
> >
> > Thanks for your hint, adding the "--ignore-missing-args" option to rsync
> > and
> > restarting geo-replication worked but it only managed to sync approximately
> > 1/3 of the data until it put the geo replication in status "Failed" this
> > time. Now I have a different type of error as you can see below from the
> > log
> > extract on my geo replication slave node:
> >
> > [2017-04-12 18:01:55.268923] I [MSGID: 109066]
> > [dht-rename.c:1574:dht_rename]
> > 0-myvol-private-geo-dht: renaming
> > /.gfid/1678ff37-f708-4197-bed0-3ecd87ae1314/Workhours_2017
> > empty.xls.ocTransferId2118183895.part
> > (hash=myvol-private-geo-client-0/cache=myvol-private-geo-client-0) =>
> > /.gfid/1678ff37-f708-4197-bed0-3ecd87ae1314/Workhours_2017 empty.xls
> > (hash=myvol-private-geo-client-0/cache=myvol-private-geo-client-0)
> > [2017-04-12 18:01:55.269842] W [fuse-bridge.c:1787:fuse_rename_cbk]
> > 0-glusterfs-fuse: 4786:
> > /.gfid/1678ff37-f708-4197-bed0-3ecd87ae1314/Workhours_2017
> > empty.xls.ocTransferId2118183895.part ->
> > /.gfid/1678ff37-f708-4197-bed0-3ecd87ae1314/Workhours_2017 empty.xls => -1
> > (Directory not empty)
> > [2017-04-12 18:01:55.314062] I [fuse-bridge.c:5016:fuse_thread_proc]
> > 0-fuse:
> > unmounting /tmp/gsyncd-aux-mount-PNSR8s
> > [2017-04-12 18:01:55.314311] W [glusterfsd.c:1251:cleanup_and_exit]
> > (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x8064) [0x7f97d3129064]
> > -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7f97d438a725]
> > -->/usr/sbin/glusterfs(cleanup_and_exit+0x57) [0x7f97d438a5a7] ) 0-:
> > received signum (15), shutting down
> > [2017-04-12 18:01:55.314335] I [fuse-bridge.c:5720:fini] 0-fuse: Unmounting
> > '/tmp/gsyncd-aux-mount-PNSR8s'.
> >
> > How can I fix now this issue and have geo-replication continue
> > synchronising
> > again?
> >
> > Best regards,
> > M.
> >
> >  Original Message ----
> > Subject: Re: [Gluster-users] Geo replication stuck (rsync: link_stat
> > "(unreachable)")
> > Local Time: April 11, 2017 9:18 AM
> > UTC Time: April 11, 2017 7:18 AM
> > From: khire...@redhat.com
> > To: mabi <m...@protonmail.ch>
> > Gluster Users <gluster-users@gluster.org>
> >
> > Hi,
> >
> >

Re: [Gluster-users] Geo replication stuck (rsync: link_stat "(unreachable)")

2017-04-12 Thread Kotresh Hiremath Ravishankar
Hi,

I think the directory Workhours_2017 was deleted on the master, and on the
slave the delete is failing because there might be stale linkto files
at the back end. These issues are fixed in DHT in the latest versions.
Upgrading to the latest version would solve them.

To work around the issue, you might need to clean up the problematic
directory on the slave from the backend.

Thanks and Regards,
Kotresh H R

- Original Message -
> From: "mabi" <m...@protonmail.ch>
> To: "Kotresh Hiremath Ravishankar" <khire...@redhat.com>
> Cc: "Gluster Users" <gluster-users@gluster.org>
> Sent: Thursday, April 13, 2017 12:28:50 AM
> Subject: Re: [Gluster-users] Geo replication stuck (rsync: link_stat 
> "(unreachable)")
> 
> Hi Kotresh,
> 
> Thanks for your hint, adding the "--ignore-missing-args" option to rsync and
> restarting geo-replication worked but it only managed to sync approximately
> 1/3 of the data until it put the geo replication in status "Failed" this
> time. Now I have a different type of error as you can see below from the log
> extract on my geo replication slave node:
> 
> [2017-04-12 18:01:55.268923] I [MSGID: 109066] [dht-rename.c:1574:dht_rename]
> 0-myvol-private-geo-dht: renaming
> /.gfid/1678ff37-f708-4197-bed0-3ecd87ae1314/Workhours_2017
> empty.xls.ocTransferId2118183895.part
> (hash=myvol-private-geo-client-0/cache=myvol-private-geo-client-0) =>
> /.gfid/1678ff37-f708-4197-bed0-3ecd87ae1314/Workhours_2017 empty.xls
> (hash=myvol-private-geo-client-0/cache=myvol-private-geo-client-0)
> [2017-04-12 18:01:55.269842] W [fuse-bridge.c:1787:fuse_rename_cbk]
> 0-glusterfs-fuse: 4786:
> /.gfid/1678ff37-f708-4197-bed0-3ecd87ae1314/Workhours_2017
> empty.xls.ocTransferId2118183895.part ->
> /.gfid/1678ff37-f708-4197-bed0-3ecd87ae1314/Workhours_2017 empty.xls => -1
> (Directory not empty)
> [2017-04-12 18:01:55.314062] I [fuse-bridge.c:5016:fuse_thread_proc] 0-fuse:
> unmounting /tmp/gsyncd-aux-mount-PNSR8s
> [2017-04-12 18:01:55.314311] W [glusterfsd.c:1251:cleanup_and_exit]
> (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x8064) [0x7f97d3129064]
> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7f97d438a725]
> -->/usr/sbin/glusterfs(cleanup_and_exit+0x57) [0x7f97d438a5a7] ) 0-:
> received signum (15), shutting down
> [2017-04-12 18:01:55.314335] I [fuse-bridge.c:5720:fini] 0-fuse: Unmounting
> '/tmp/gsyncd-aux-mount-PNSR8s'.
> 
> How can I fix now this issue and have geo-replication continue synchronising
> again?
> 
> Best regards,
> M.
> 
>  Original Message 
> Subject: Re: [Gluster-users] Geo replication stuck (rsync: link_stat
> "(unreachable)")
> Local Time: April 11, 2017 9:18 AM
> UTC Time: April 11, 2017 7:18 AM
> From: khire...@redhat.com
> To: mabi <m...@protonmail.ch>
> Gluster Users <gluster-users@gluster.org>
> 
> Hi,
> 
> Then please use set the following rsync config and let us know if it helps.
> 
> gluster vol geo-rep  :: config rsync-options
> "--ignore-missing-args"
> 
> Thanks and Regards,
> Kotresh H R
> 
> - Original Message -
> > From: "mabi" <m...@protonmail.ch>
> > To: "Kotresh Hiremath Ravishankar" <khire...@redhat.com>
> > Cc: "Gluster Users" <gluster-users@gluster.org>
> > Sent: Tuesday, April 11, 2017 2:15:54 AM
> > Subject: Re: [Gluster-users] Geo replication stuck (rsync: link_stat
> > "(unreachable)")
> >
> > Hi Kotresh,
> >
> > I am using the official Debian 8 (jessie) package which has rsync version
> > 3.1.1.
> >
> > Regards,
> > M.
> >
> >  Original Message 
> > Subject: Re: [Gluster-users] Geo replication stuck (rsync: link_stat
> > "(unreachable)")
> > Local Time: April 10, 2017 6:33 AM
> > UTC Time: April 10, 2017 4:33 AM
> > From: khire...@redhat.com
> > To: mabi <m...@protonmail.ch>
> > Gluster Users <gluster-users@gluster.org>
> >
> > Hi Mabi,
> >
> > What's the rsync version being used?
> >
> > Thanks and Regards,
> > Kotresh H R
> >
> > - Original Message -
> > > From: "mabi" <m...@protonmail.ch>
> > > To: "Gluster Users" <gluster-users@gluster.org>
> > > Sent: Saturday, April 8, 2017 4:20:25 PM
> > > Subject: [Gluster-users] Geo replication stuck (rsync: link_stat
> > > "(unreachable)")
> > >
> > > Hello,
> > >
> > > I am using distributed geo replication with two of my GlusterFS 3.7.20
> > > r

Re: [Gluster-users] Geo replication stuck (rsync: link_stat "(unreachable)")

2017-04-11 Thread Kotresh Hiremath Ravishankar
Hi,

Then please set the following rsync config and let us know if it helps.

gluster vol geo-rep <master-vol> <slave-host>::<slave-vol> config rsync-options
"--ignore-missing-args"

Thanks and Regards,
Kotresh H R

- Original Message -
> From: "mabi" <m...@protonmail.ch>
> To: "Kotresh Hiremath Ravishankar" <khire...@redhat.com>
> Cc: "Gluster Users" <gluster-users@gluster.org>
> Sent: Tuesday, April 11, 2017 2:15:54 AM
> Subject: Re: [Gluster-users] Geo replication stuck (rsync: link_stat 
> "(unreachable)")
> 
> Hi Kotresh,
> 
> I am using the official Debian 8 (jessie) package which has rsync version
> 3.1.1.
> 
> Regards,
> M.
> 
>  Original Message 
> Subject: Re: [Gluster-users] Geo replication stuck (rsync: link_stat
> "(unreachable)")
> Local Time: April 10, 2017 6:33 AM
> UTC Time: April 10, 2017 4:33 AM
> From: khire...@redhat.com
> To: mabi <m...@protonmail.ch>
> Gluster Users <gluster-users@gluster.org>
> 
> Hi Mabi,
> 
> What's the rsync version being used?
> 
> Thanks and Regards,
> Kotresh H R
> 
> - Original Message -
> > From: "mabi" <m...@protonmail.ch>
> > To: "Gluster Users" <gluster-users@gluster.org>
> > Sent: Saturday, April 8, 2017 4:20:25 PM
> > Subject: [Gluster-users] Geo replication stuck (rsync: link_stat
> > "(unreachable)")
> >
> > Hello,
> >
> > I am using distributed geo replication with two of my GlusterFS 3.7.20
> > replicated volumes and just noticed that the geo replication for one volume
> > is not working anymore. It is stuck since the 2017-02-23 22:39 and I tried
> > to stop and restart geo replication but still it stays stuck at that
> > specific date and time under the DATA field of the geo replication "status
> > detail" command I can see 3879 and that it has "Active" as STATUS but still
> > nothing happens. I noticed that the rsync process is running but does not
> > do
> > anything, then I did a strace on the PID of rsync and saw the following:
> >
> > write(2, "rsync: link_stat \"(unreachable)/"..., 114
> >
> > It looks like rsync can't read or find a file and stays stuck on that. In
> > the
> > geo-replication log files of GlusterFS master I can't find any error
> > messages just informational message. For example when I restart the geo
> > replication I see the following log entries:
> >
> > [2017-04-07 21:43:05.664541] I [monitor(monitor):443:distribute] :
> > slave
> > bricks: [{'host': 'gfs1geo.domain', 'dir': '/data/private-geo/brick'}]
> > [2017-04-07 21:43:05.666435] I [monitor(monitor):468:distribute] :
> > worker specs: [('/data/private/brick', 'ssh:// root@gfs1geo.domain
> > :gluster://localhost:private-geo', '1', False)]
> > [2017-04-07 21:43:05.823931] I [monitor(monitor):267:monitor] Monitor:
> > 
> > [2017-04-07 21:43:05.824204] I [monitor(monitor):268:monitor] Monitor:
> > starting gsyncd worker
> > [2017-04-07 21:43:05.930124] I [gsyncd(/data/private/brick):733:main_i]
> > : syncing: gluster://localhost:private -> ssh:// root@gfs1geo.domain
> > :gluster://localhost:private-geo
> > [2017-04-07 21:43:05.931169] I [changelogagent(agent):73:__init__]
> > ChangelogAgent: Agent listining...
> > [2017-04-07 21:43:08.558648] I
> > [master(/data/private/brick):83:gmaster_builder] : setting up xsync
> > change detection mode
> > [2017-04-07 21:43:08.559071] I [master(/data/private/brick):367:__init__]
> > _GMaster: using 'rsync' as the sync engine
> > [2017-04-07 21:43:08.560163] I
> > [master(/data/private/brick):83:gmaster_builder] : setting up
> > changelog
> > change detection mode
> > [2017-04-07 21:43:08.560431] I [master(/data/private/brick):367:__init__]
> > _GMaster: using 'rsync' as the sync engine
> > [2017-04-07 21:43:08.561105] I
> > [master(/data/private/brick):83:gmaster_builder] : setting up
> > changeloghistory change detection mode
> > [2017-04-07 21:43:08.561391] I [master(/data/private/brick):367:__init__]
> > _GMaster: using 'rsync' as the sync engine
> > [2017-04-07 21:43:11.354417] I [master(/data/private/brick):1249:register]
> > _GMaster: xsync temp directory:
> > /var/lib/misc/glusterfsd/private/ssh%3A%2F%2Froot%40192.168.20.107%3Agluster%3A%2F%2F127.0.0.1%3Aprivate-geo/616931ac8f39da5dc5834f9d47fc7b1a/xsync
> > [2017-04-07 21:43:11.354751] I
> > [resource(/data/private/brick):1528:service_loop] GLUSTER: Register time:
> > 149160

Re: [Gluster-users] Geo replication stuck (rsync: link_stat "(unreachable)")

2017-04-09 Thread Kotresh Hiremath Ravishankar
Hi Mabi,

What's the rsync version being used?

Thanks and Regards,
Kotresh H R

- Original Message -
> From: "mabi" 
> To: "Gluster Users" 
> Sent: Saturday, April 8, 2017 4:20:25 PM
> Subject: [Gluster-users] Geo replication stuck (rsync: link_stat  
> "(unreachable)")
> 
> Hello,
> 
> I am using distributed geo replication with two of my GlusterFS 3.7.20
> replicated volumes and just noticed that the geo replication for one volume
> is not working anymore. It is stuck since the 2017-02-23 22:39 and I tried
> to stop and restart geo replication but still it stays stuck at that
> specific date and time under the DATA field of the geo replication "status
> detail" command I can see 3879 and that it has "Active" as STATUS but still
> nothing happens. I noticed that the rsync process is running but does not do
> anything, then I did a strace on the PID of rsync and saw the following:
> 
> write(2, "rsync: link_stat \"(unreachable)/"..., 114
> 
> It looks like rsync can't read or find a file and stays stuck on that. In the
> geo-replication log files of GlusterFS master I can't find any error
> messages just informational message. For example when I restart the geo
> replication I see the following log entries:
> 
> [2017-04-07 21:43:05.664541] I [monitor(monitor):443:distribute] : slave
> bricks: [{'host': 'gfs1geo.domain', 'dir': '/data/private-geo/brick'}]
> [2017-04-07 21:43:05.666435] I [monitor(monitor):468:distribute] :
> worker specs: [('/data/private/brick', 'ssh:// root@gfs1geo.domain
> :gluster://localhost:private-geo', '1', False)]
> [2017-04-07 21:43:05.823931] I [monitor(monitor):267:monitor] Monitor:
> 
> [2017-04-07 21:43:05.824204] I [monitor(monitor):268:monitor] Monitor:
> starting gsyncd worker
> [2017-04-07 21:43:05.930124] I [gsyncd(/data/private/brick):733:main_i]
> : syncing: gluster://localhost:private -> ssh:// root@gfs1geo.domain
> :gluster://localhost:private-geo
> [2017-04-07 21:43:05.931169] I [changelogagent(agent):73:__init__]
> ChangelogAgent: Agent listining...
> [2017-04-07 21:43:08.558648] I
> [master(/data/private/brick):83:gmaster_builder] : setting up xsync
> change detection mode
> [2017-04-07 21:43:08.559071] I [master(/data/private/brick):367:__init__]
> _GMaster: using 'rsync' as the sync engine
> [2017-04-07 21:43:08.560163] I
> [master(/data/private/brick):83:gmaster_builder] : setting up changelog
> change detection mode
> [2017-04-07 21:43:08.560431] I [master(/data/private/brick):367:__init__]
> _GMaster: using 'rsync' as the sync engine
> [2017-04-07 21:43:08.561105] I
> [master(/data/private/brick):83:gmaster_builder] : setting up
> changeloghistory change detection mode
> [2017-04-07 21:43:08.561391] I [master(/data/private/brick):367:__init__]
> _GMaster: using 'rsync' as the sync engine
> [2017-04-07 21:43:11.354417] I [master(/data/private/brick):1249:register]
> _GMaster: xsync temp directory:
> /var/lib/misc/glusterfsd/private/ssh%3A%2F%2Froot%40192.168.20.107%3Agluster%3A%2F%2F127.0.0.1%3Aprivate-geo/616931ac8f39da5dc5834f9d47fc7b1a/xsync
> [2017-04-07 21:43:11.354751] I
> [resource(/data/private/brick):1528:service_loop] GLUSTER: Register time:
> 1491601391
> [2017-04-07 21:43:11.357630] I [master(/data/private/brick):510:crawlwrap]
> _GMaster: primary master with volume id e7a40a1b-45c9-4d3c-bb19-0c59b4eceec5
> ...
> [2017-04-07 21:43:11.489355] I [master(/data/private/brick):519:crawlwrap]
> _GMaster: crawl interval: 1 seconds
> [2017-04-07 21:43:11.516710] I [master(/data/private/brick):1163:crawl]
> _GMaster: starting history crawl... turns: 1, stime: (1487885974, 0), etime:
> 1491601391
> [2017-04-07 21:43:12.607836] I [master(/data/private/brick):1192:crawl]
> _GMaster: slave's time: (1487885974, 0)
> 
> Does anyone know how I can find out the root cause of this problem and make
> geo replication work again from the time point it got stuck?
> 
> Many thanks in advance for your help.
> 
> Best regards,
> Mabi
> 
> 
> 
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Geo-Replication not detecting changes

2017-03-30 Thread Kotresh Hiremath Ravishankar
Hi Jeremiah,

I believe the bug ID is #1437244 and not #1327244.
From the geo-rep logs, the master volume mount is failing with "Transport
Endpoint Not Connected"
...
[2017-03-30 07:40:57.150348] E [resource(/gv0/foo):234:errlog] Popen: command 
"/usr/sbin/glusterfs --aux-gfid-mount --acl 
--log-file=/var/log/glusterfs/geo-replication/foo/ssh%3A%2F%2Froot%4054.165.144.9%3Agluster%3A%2F%2F127.0.0.1%3Afoo.%2Fgv0%2Ffoo.gluster.log
 --volfile-server=localhost --volfile-id=foo --client-pid=-1 
/tmp/gsyncd-aux-mount-K1j3ZD" returned with 107
..


Could you try flushing iptables on both master and slave nodes and check again?
#iptables -F


Thanks and Regards,
Kotresh H R

- Original Message -
> From: "Jeremiah Rothschild" <jerem...@franz.com>
> To: "Kotresh Hiremath Ravishankar" <khire...@redhat.com>
> Cc: gluster-users@gluster.org
> Sent: Thursday, March 30, 2017 1:16:03 PM
> Subject: Re: [Gluster-users] Geo-Replication not detecting changes
> 
> On Thu, Mar 30, 2017 at 12:51:23AM -0400, Kotresh Hiremath Ravishankar wrote:
> > Hi Jeremiah,
> 
> Hi Kotresh! Thanks for the follow-up!
> 
> > That's really strange. Please enable DEBUG logs for geo-replication as
> > below and send
> > us the logs under "/var/log/glusterfs/geo-replication//*.log"
> > from master node
> > 
> > gluster vol geo-rep  :: config log-level
> > DEBUG
> 
> Ok.
> 
> I started from scratch & enabled debug level logging. The logs have been
> attached to Bugzilla #1327244.
> 
> > Geo-rep has two ways to detect changes.
> > 
> > 1. changelog (Changelog Crawl)
> > 2. xsync (Hybrid Crawl):
> >This is good for initial sync. It has the limitation of not
> >detecting unlinks and renames.
> >So the slave would end up having unlinked files and renamed src file if
> >it is used after initial sync.
> 
> FYI I did try changing the changelog_detector to xsync but it made no
> difference. Note that I also detailed this in the "Additional Info" section
> of the Bugzilla bug.
> 
> > Thanks and Regards,
> 
> Thanks again!
> 
> j
> 
> > Kotresh H R
> > 
> > - Original Message -
> > > From: "Jeremiah Rothschild" <jerem...@franz.com>
> > > To: gluster-users@gluster.org
> > > Sent: Wednesday, March 29, 2017 12:39:11 AM
> > > Subject: Re: [Gluster-users] Geo-Replication not detecting changes
> > > 
> > > Following up on my own thread...
> > > 
> > > I have spent hours and hours setting up, re-setting up, screwing with
> > > undocumented variables, upgrading from LTS to non-LTS, etc etc.
> > > 
> > > Nothing seems to give.
> > > 
> > > This is very much an out-of-the-box setup and core functionality just
> > > isn't
> > > working.
> > > 
> > > Can anyone throw me a bone here? Please? Do I file a bug for such an
> > > open-ended issue? Is everyone assuming I've just screwed a step up? I
> > > must
> > > say the documentation is pretty clear & simple. Do you want more logs?
> > > 
> > > If this is going to be a dead end then so be it but I at least need to
> > > make
> > > sure I've tried my hardest to get a working deployment.
> > > 
> > > Thanks for your time and understanding!
> > > 
> > > j
> > > 
> > > On Thu, Mar 23, 2017 at 11:47:03AM -0700, Jeremiah Rothschild wrote:
> > > > Hey all,
> > > > 
> > > > I have a vanilla geo-replication setup running. It is comprised of two
> > > > servers, both CentOS 7 and GlusterFS 3.8.10:
> > > > 
> > > > * server1: Local server. Master volume named "foo".
> > > > * server2: Remote server. Slave volume named "foo".
> > > > 
> > > > Everything went fine including the initial sync. However, no new
> > > > changes
> > > > are
> > > > being seen or synced.
> > > > 
> > > > Geo-rep status looks clean:
> > > > 
> > > > # gluster volume geo-replication foo server2.franz.com::foo status
> > > > MASTER NODE: server1.x.com
> > > > MASTER VOL: foo
> > > > MASTER BRICK: /gv0/foo
> > > > SLAVE USER: root
> > > > SLAVE NODE: server2.x.com::foo
> > > > STATUS: Active
> > > > CRAWL STATUS: Changelog Crawl
> > > > LAST_SYNCED: 2017-03-23 10:12:57
> > > > 
> > > > In

Re: [Gluster-users] Geo-Replication not detecting changes

2017-03-29 Thread Kotresh Hiremath Ravishankar
Hi Jeremiah,

That's really strange. Please enable DEBUG logs for geo-replication as below
and send us the logs under "/var/log/glusterfs/geo-replication/<mastervol>/*.log"
from the master node:

gluster vol geo-rep <master-vol> <slave-host>::<slave-vol> config log-level DEBUG

Geo-rep has two ways to detect changes.

1. changelog (Changelog Crawl)
2. xsync (Hybrid Crawl):
   This is good for the initial sync. It has the limitation of not detecting
   unlinks and renames, so the slave would end up with unlinked files and
   renamed source files if it is used after the initial sync.

Thanks and Regards,
Kotresh H R

- Original Message -
> From: "Jeremiah Rothschild" 
> To: gluster-users@gluster.org
> Sent: Wednesday, March 29, 2017 12:39:11 AM
> Subject: Re: [Gluster-users] Geo-Replication not detecting changes
> 
> Following up on my own thread...
> 
> I have spent hours and hours setting up, re-setting up, screwing with
> undocumented variables, upgrading from LTS to non-LTS, etc etc.
> 
> Nothing seems to give.
> 
> This is very much an out-of-the-box setup and core functionality just isn't
> working.
> 
> Can anyone throw me a bone here? Please? Do I file a bug for such an
> open-ended issue? Is everyone assuming I've just screwed a step up? I must
> say the documentation is pretty clear & simple. Do you want more logs?
> 
> If this is going to be a dead end then so be it but I at least need to make
> sure I've tried my hardest to get a working deployment.
> 
> Thanks for your time and understanding!
> 
> j
> 
> On Thu, Mar 23, 2017 at 11:47:03AM -0700, Jeremiah Rothschild wrote:
> > Hey all,
> > 
> > I have a vanilla geo-replication setup running. It is comprised of two
> > servers, both CentOS 7 and GlusterFS 3.8.10:
> > 
> > * server1: Local server. Master volume named "foo".
> > * server2: Remote server. Slave volume named "foo".
> > 
> > Everything went fine including the initial sync. However, no new changes
> > are
> > being seen or synced.
> > 
> > Geo-rep status looks clean:
> > 
> > # gluster volume geo-replication foo server2.franz.com::foo status
> > MASTER NODE: server1.x.com
> > MASTER VOL: foo
> > MASTER BRICK: /gv0/foo
> > SLAVE USER: root
> > SLAVE NODE: server2.x.com::foo
> > STATUS: Active
> > CRAWL STATUS: Changelog Crawl
> > LAST_SYNCED: 2017-03-23 10:12:57
> > 
> > In the geo-rep master log, I see these being triggered:
> > 
> > # tail -n3
> > foo/ssh%3A%2F%2Froot%401.2.3.4%3Agluster%3A%2F%2F127.0.0.1%3Afoo.log
> > [2017-03-23 18:33:34.697525] I [master(/gv0/foo):534:crawlwrap] _GMaster:
> > 20
> > crawls, 0 turns
> > [2017-03-23 18:34:37.441982] I [master(/gv0/foo):534:crawlwrap] _GMaster:
> > 20
> > crawls, 0 turns
> > [2017-03-23 18:35:40.242851] I [master(/gv0/foo):534:crawlwrap] _GMaster:
> > 20
> > crawls, 0 turns
> > 
> > I don't see any errors in any of the other logs.
> > 
> > Not sure what else to poke at here. What are the possible values for the
> > "change_detector" config variable? Would it be worthwhile to test with a
> > method other than "changelog"? Other thoughts/ideas?
> > 
> > Thanks in advance!
> > 
> > j
> > ___
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > http://lists.gluster.org/mailman/listinfo/gluster-users
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
> 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] why is geo-rep so bloody impossible?

2017-02-20 Thread Kotresh Hiremath Ravishankar
This could happen if two copies of the same SSH public key, one with the
"command=..." prefix and one without, were distributed to ~/.ssh/authorized_keys
on the slave. Please check and remove the entry without "command=..."; it should
work after that. For the passwordless SSH connection itself, a separate SSH key
pair should be created.
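
A quick way to check this on the slave node (a sketch, assuming the keys were
pushed to root's default ~/.ssh/authorized_keys):

    # grep -c '^command=' ~/.ssh/authorized_keys   # entries pushed by geo-rep create push-pem
    # grep -v '^command=' ~/.ssh/authorized_keys   # plain entries; remove any duplicate of the geo-rep key here

Only the duplicate of the geo-rep key needs to go; a separate key pair used purely
for your own passwordless SSH login can stay.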

Thanks and Regards,
Kotresh H R

- Original Message -
> From: "lejeczek" 
> To: gluster-users@gluster.org
> Sent: Friday, February 17, 2017 8:39:48 PM
> Subject: [Gluster-users] why is geo-rep so bloody impossible?
> 
> hi everyone,
> 
> I've been browsing list's messages and it seems to me that users struggle, I
> do.
> I do what I thought was simple, I follow official docs.
> I, as root always do..
> 
> ]$ gluster system:: execute gsec_create
> 
> ]$ gluster volume geo-replication WORK 10.5.6.32::WORK-Replica create
> push-pem force
> ]$ gluster volume geo-replication WORK 10.5.6.32::WORK-Replica start
> 
> and I see:
> 256:log_raise_exception] : getting "No such file or directory"errors is
> most likely due to MISCONFIGURATION, please remove all the public keys added
> by geo-replication from authorized_keys file in slave nodes and run
> Geo-replication create command again.
> 
> 263:log_raise_exception] : If `gsec_create container` was used, then run
> `gluster volume geo-replication 
> [@]:: config remote-gsyncd 
> (Example GSYNCD_PATH: `/usr/libexec/glusterfs/gsyncd`)
> 
> so I remove all command="tar.. from ~/.ssh/authorized_keys on the geo-repl
> slave, then recreate session on master, but.. naturally, unfortunately it
> was not that.
> So I tried config gsyncd only to see:
> ...
> ..Popen: command "ssh -oPasswordAuthentication=no.. returned with 1, saying:
> 0-cli: Started running /usr/sbin/gluster with version 3.8.8
> 0-cli: Connecting to remote glusterd at localhost
> [event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with
> index 1
> [cli-cmd.c:130:cli_cmd_process] 0-: Exiting with: 110
> gsyncd initializaion failed
> 
> and no idea where how to troubleshoot it further.
> for any help many thanks,
> L.
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] should geo repl pick up changes to a vol?

2017-02-08 Thread Kotresh Hiremath Ravishankar
Hi lejeczek,

Try stop force.

gluster vol geo-rep  :: stop force
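
With the volume names from your status output below, that would be (illustrative):

    # gluster volume geo-replication GROUP-WORK 10.5.6.32::GROUP-WORK-Replica stop force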

Thanks and Regards,
Kotresh H R


- Original Message -
> From: "lejeczek" <pelj...@yahoo.co.uk>
> To: "Kotresh Hiremath Ravishankar" <khire...@redhat.com>
> Cc: gluster-users@gluster.org
> Sent: Tuesday, February 7, 2017 3:42:18 PM
> Subject: [Gluster-users] should geo repl pick up changes to a vol?
> 
> 
> 
> On 03/02/17 07:25, Kotresh Hiremath Ravishankar wrote:
> > Hi,
> >
> > The following steps need to be followed when a brick is added from a new
> > node on the master.
> >
> > 1. Stop geo-rep
> geo repl which master volume had a brick removed:
> 
> ~]$ gluster volume geo-replication GROUP-WORK
> 10.5.6.32::GROUP-WORK-Replica status
> 
> MASTER NODEMASTER VOLMASTER
> BRICK  SLAVE USER
> SLAVESLAVE NODESTATUS
> CRAWL STATUS LAST_SYNCED
> -
> 10.5.6.100 GROUP-WORK
> /__.aLocalStorages/3/0-GLUSTERs/GROUP-WORKroot
> 10.5.6.32::GROUP-WORK-Replica10.5.6.32 Active
> History Crawl2017-02-01 15:24:05
> 
> ~]$ gluster volume geo-replication GROUP-WORK
> 10.5.6.32::GROUP-WORK-Replica stop
> Staging failed on 10.5.6.49. Error: Geo-replication session
> between GROUP-WORK and 10.5.6.32::GROUP-WORK-Replica does
> not exist.
> geo-replication command failed
> 
> 10.5.6.49 is the brick which was added, now part of the
> master vol.
> 
> 
> >
> > 2. Run the following command on the master node where passwordless SSH
> >connection is configured, in order to create a common pem pub file.
> >
> >  # gluster system:: execute gsec_create
> >
> > 3. Create the geo-replication session using the following command.
> >The push-pem and force options are required to perform the necessary
> >pem-file setup on the slave nodes.
> >
> >  # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL
> >  create push-pem force
> >
> > 4. Start geo-rep
> >
> > Thanks and Regards,
> > Kotresh H R
> >
> > - Original Message -
> >> From: "lejeczek" <pelj...@yahoo.co.uk>
> >> To: gluster-users@gluster.org
> >> Sent: Thursday, February 2, 2017 1:14:07 AM
> >> Subject: [Gluster-users] should geo repl pick up changes to a vol?
> >>
> >> dear all
> >>
> >> should gluster update geo repl when a volume changes?
> >> eg. bricks are added, taken away.
> >>
> >> reason I'm asking is because it does not seem like gluster is
> >> doing it on my systems?
> >> Well, I see gluster removed a node from geo-repl, a brick that
> >> I removed.
> >> But I added a brick to a vol and it's not there in geo-repl.
> >>
> >> bw.
> >> L.
> >> ___
> >> Gluster-users mailing list
> >> Gluster-users@gluster.org
> >> http://lists.gluster.org/mailman/listinfo/gluster-users
> >>
> 
> 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] should geo repl pick up changes to a vol?

2017-02-02 Thread Kotresh Hiremath Ravishankar
Hi,

The following steps need to be followed when a brick is added from a new node on
the master.

1. Stop geo-rep

2. Run the following command on the master node where passwordless SSH
  connection is configured, in order to create a common pem pub file.

# gluster system:: execute gsec_create

3. Create the geo-replication session using the following command.
  The push-pem and force options are required to perform the necessary
  pem-file setup on the slave nodes.

# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL create 
push-pem force

4. Start geo-rep
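
Putting it together with the same placeholders as step 3 (a sketch; step 1 may need
the force option if the newly added node does not yet know about the session):

    # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL stop
    # gluster system:: execute gsec_create
    # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL create push-pem force
    # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL start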

Thanks and Regards,
Kotresh H R

- Original Message -
> From: "lejeczek" 
> To: gluster-users@gluster.org
> Sent: Thursday, February 2, 2017 1:14:07 AM
> Subject: [Gluster-users] should geo repl pick up changes to a vol?
> 
> dear all
> 
> should gluster update geo repl when a volume changes?
> eg. bricks are added, taken away.
> 
> reason I'm asking is because it does not seem like gluster is
> doing it on my systems?
> Well, I see gluster removed a node from geo-repl, a brick that
> I removed.
> But I added a brick to a vol and it's not there in geo-repl.
> 
> bw.
> L.
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
> 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] geo repl status: faulty & errors

2017-02-02 Thread Kotresh Hiremath Ravishankar
Answers inline

Thanks and Regards,
Kotresh H R

- Original Message -
> From: "lejeczek" 
> To: gluster-users@gluster.org
> Sent: Wednesday, February 1, 2017 5:48:55 PM
> Subject: [Gluster-users] geo repl status: faulty & errors
> 
> hi everone,
> 
> trying geo-repl first, I've followed that official howto and the process
> claimed "success" up until I went for status: "Faulty"
> Errors I see:
> ...
> [2017-02-01 12:11:38.103259] I [monitor(monitor):268:monitor] Monitor:
> starting gsyncd worker
> [2017-02-01 12:11:38.342930] I [changelogagent(agent):73:__init__]
> ChangelogAgent: Agent listining...
> [2017-02-01 12:11:38.354500] I
> [gsyncd(/__.aLocalStorages/1/0-GLUSTERs/1GLUSTER-QEMU-VMs):736:main_i]
> : syncing: gluster://localhost:QEMU-VMs ->
> ssh://root@10.5.6.32:gluster://localhost:QEMU-VMs-Replica
> [2017-02-01 12:11:38.581310] E
> [syncdutils(/__.aLocalStorages/1/0-GLUSTERs/1GLUSTER-QEMU-VMs):252:log_raise_exception]
> : connection to peer is broken
> [2017-02-01 12:11:38.581964] E
> [resource(/__.aLocalStorages/1/0-GLUSTERs/1GLUSTER-QEMU-VMs):234:errlog]
> Popen: command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no
> -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto
> -S /tmp/gsyncd-aux-ssh-VLX7ff/2bad8986ecbd9ad471c368528e0770f6.sock
> root@10.5.6.32 /nonexistent/gsyncd --session-owner
> 8709782a-daa5-4434-a816-c4e0aef8fef2 -N --listen --timeout 120
> gluster://localhost:QEMU-VMs-Replica" returned with 255, saying:
> [2017-02-01 12:11:38.582236] E
> [resource(/__.aLocalStorages/1/0-GLUSTERs/1GLUSTER-QEMU-VMs):238:logerr]
> Popen: ssh> Permission denied
> (publickey,gssapi-keyex,gssapi-with-mic,password).
> [2017-02-01 12:11:38.582945] I
> [syncdutils(/__.aLocalStorages/1/0-GLUSTERs/1GLUSTER-QEMU-VMs):220:finalize]
> : exiting.
> [2017-02-01 12:11:38.586689] I [repce(agent):92:service_loop] RepceServer:
> terminating on reaching EOF.
> [2017-02-01 12:11:38.587055] I [syncdutils(agent):220:finalize] :
> exiting.
> [2017-02-01 12:11:38.586905] I [monitor(monitor):334:monitor] Monitor:
> worker(/__.aLocalStorages/1/0-GLUSTERs/1GLUSTER-QEMU-VMs) died before
> establishing connection
> 
> It's a bit puzzling as password-less ssh works, I had it before gluster so I
> also tried "create no-verify" just in case.
> This - (/__.aLocalStorages/1/0-GLUSTERs/1GLUSTER-QEMU-VMs) - is master volume
> so I understand it is the slave failing here, right?
There is no problem with the passwordless SSH itself. Geo-rep uses this
  passwordless SSH connection to distribute the geo-rep specific SSH keys to all
  the slave nodes using a hook script during setup. You can check the glusterd
  logs to see whether the hook script failed for some reason.
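
To check that, something along these lines on the master node should do (the hook
directory below is the usual default install path, so treat it as an assumption):

    # grep -iE 'gsync|hook' /var/log/glusterfs/*glusterd*.log | tail -n 50
    # ls /var/lib/glusterd/hooks/1/gsync-create/post/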

> This is just one peer of two-peer volume, I'd guess process does not even go
> to the second for the first one fails, thus not in the logs, correct?
  This log file is w.r.t. this node only; the bricks are not processed one after
  the other. Please check the other master peer node for the errors related to
  that node.
> 

  How to fix:
1. Please run "gluster vol geo-rep  :: create push-pem force".
   This should fix the issue. If you still cannot root-cause it, use the tool
   below for the setup, as it distributes the keys synchronously and errors are
   easier to catch:

   http://aravindavk.in/blog/introducing-georepsetup/

> many thanks for all the help,
> L.
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Geo-replication failed to delete from slave file partially written to master volume.

2016-12-26 Thread Kotresh Hiremath Ravishankar
Hi Viktor,

I was caught up with something else and apologies for the late reply.

I went through the logs and here is the reason geo-rep failed to sync in the
ENOSPC scenario. The changelog xlator records the operations on the server side,
and geo-replication replays them to sync to the slave. The changelogs are present
in //.glusterfs/changelogs/. The file named 'CHANGELOG' gets created and
waits for I/O to record. By default, every 15 sec, if the 'CHANGELOG' file has
some operation recorded, it is renamed to 'CHANGELOG.' and a new 'CHANGELOG' file
is created. In this use case, creation of the new 'CHANGELOG' file failed because
of ENOSPC, so the delete operation was not recorded in the changelog, and because
of that geo-rep didn't sync it. 15 sec later it could create the 'CHANGELOG' file
again because space had been reclaimed, after which everything syncs. So geo-rep
might miss syncing a few files in an ENOSPC scenario.
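
If you want to confirm this on your setup, you can list the changelogs recorded on
the brick and check the rollover interval (the brick path is taken from your
earlier mail; changelog.rollover-time is the standard option name, 15 seconds by
default):

    # ls /exports/nas-segment-0012/master-for-183-0003/.glusterfs/changelogs/ | tail
    # gluster volume get master-for-183-0003 changelog.rollover-time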

Thanks and Regards,
Kotresh H R

- Original Message -
> From: "Viktor Nosov" <vno...@stonefly.com>
> To: "Kotresh Hiremath Ravishankar" <khire...@redhat.com>
> Cc: gluster-users@gluster.org
> Sent: Saturday, December 10, 2016 2:43:35 AM
> Subject: [Gluster-users] Geo-replication failed to delete from slave file 
> partially written to master volume.
> 
> Hi Kotresh,
> 
> Logs from geo-replication test that I used to report the problem were
> overwritten.
> So I run the test again. Each time the problem is reproduced with 100%
> probability.
> This time I took all glusterfs  logs from Master and Slave systems. Logs are
> attached to this message.
> 
> All steps to duplicate the problem are reported at the text file
> "geo-rep-test.txt". This file is attached too.
> Records at the file "geo-rep-test.txt" are in chronological order.
> Names of volumes, bricks, nodes are the same as before.
> 
> The test starts when geo-replication synchronization is working properly.
> After that attempt is made to write file "big.file.1" to the master that has
> not enough space to handle the whole file.
> Linux writes the file partially and geo-replication syncs this partial file
> "big.file.1" with the slave volume.
> 
> The next write of new file "test.file" to the master fails but it creates
> file handler "test.file" on Master for file size zero. This new file
> "test.file" is not replicated to the slave
> volume.
> 
> The next step is to delete partially written file "big.file.1" from the
> master to make free space on the master. The delete is successful,
> but it never is sync to the slave. This is the problem. File "big.file.1" is
> still on the slave volume.
> 
> The  next step is to repeat write for file "test.file". We let system to
> overwrite file on the master volume.
> Now file "test.file" has some contents. But this change did not sync to the
> slave. This is other flavor of the same problem.
> 
> Finally new file "test.file.1" was written to the master. The file was
> successfully replicated to the slave.
> 
> Best regards,
> 
> Viktor Nosov
> 
> 
> 
> -Original Message-
> From: Kotresh Hiremath Ravishankar [mailto:khire...@redhat.com]
> Sent: Friday, December 09, 2016 3:03 AM
> To: Viktor Nosov
> Cc: gluster-users@gluster.org
> Subject: Re: [Gluster-users] Geo-replication failed to delete from slave file
> partially written to master volume.
> 
> Hi Viktor,
> 
> I went through the slave logs and there are no errors w.r.t deletion of the
> files.
> I suspect the changelog has missed recording deletion of file because of
> which the delete is not processed.
> 
> Please share the following logs from master volume to root cause the issue.
> 
> 1. Master geo-replication logs:
> /var/log/glusterfs/geo-replication//*.log
> 2. Master brick logs: /var/log/glusterfs/bricks/*.log 3. Also changelogs from
> Master volume:
> /exports/nas-segment-0012/master-for-183-0003/.glusterfs/changelogs/*
> 
> Thanks and Regards,
> Kotresh H R
> 
> - Original Message -
> > From: "Viktor Nosov" <vno...@stonefly.com>
> > To: "Kotresh Hiremath Ravishankar" <khire...@redhat.com>
> > Cc: gluster-users@gluster.org
> > Sent: Wednesday, December 7, 2016 10:48:52 PM
> > Subject: RE: [Gluster-users] Geo-replication failed to delete from slave
> > filepartially written to master volume.
> > 
> > Hi Kotresh,
> > 
> > Thanks for looking into this issue!
> > I'm attaching log files from the slave node from
> > /var/log/glusterfs/geo-replication-slaves/
> > 
> > [root@SC-183

Re: [Gluster-users] Geo-replication failed to delete from slave file partially written to master volume.

2016-12-09 Thread Kotresh Hiremath Ravishankar
Hi Viktor,

I went through the slave logs and there are no errors w.r.t. deletion of the
files. I suspect the changelog missed recording the deletion of the file, which
is why the delete was not processed.

Please share the following logs from master volume to root cause the issue.

1. Master geo-replication logs: 
/var/log/glusterfs/geo-replication//*.log
2. Master brick logs: /var/log/glusterfs/bricks/*.log
3. Also changelogs from Master volume: 
/exports/nas-segment-0012/master-for-183-0003/.glusterfs/changelogs/*

Thanks and Regards,
Kotresh H R

- Original Message -
> From: "Viktor Nosov" <vno...@stonefly.com>
> To: "Kotresh Hiremath Ravishankar" <khire...@redhat.com>
> Cc: gluster-users@gluster.org
> Sent: Wednesday, December 7, 2016 10:48:52 PM
> Subject: RE: [Gluster-users] Geo-replication failed to delete from slave file 
> partially written to master volume.
> 
> Hi Kotresh,
> 
> Thanks for looking into this issue!
> I'm attaching log files from the slave node from
> /var/log/glusterfs/geo-replication-slaves/
> 
> [root@SC-183 log]# cp
> /var/log/glusterfs/geo-replication-slaves/84501a83-b07c-4768-bfaa-418b038e1a9e\:gluster%3A%2F%2F127.0.0.1%3Arem-volume-0001.gluster.log
> /home/vnosov/
> [root@SC-183 log]# cp /var/log/glusterfs/geo-replication-slaves/slave.log
> /home/vnosov/
> [root@SC-183 log]# cp
> /var/log/glusterfs/geo-replication-slaves/mbr/84501a83-b07c-4768-bfaa-418b038e1a9e\:gluster%3A%2F%2F127.0.0.1%3Arem-volume-0001.log
> /home/vnosov/
> 
> Best regards,
> 
> Viktor Nosov
> 
> 
> -Original Message-
> From: Kotresh Hiremath Ravishankar [mailto:khire...@redhat.com]
> Sent: Tuesday, December 06, 2016 9:25 PM
> To: Viktor Nosov
> Cc: gluster-users@gluster.org
> Subject: Re: [Gluster-users] Geo-replication failed to delete from slave file
> partially written to master volume.
> 
> Hi Viktor,
> 
> Please share geo-replication-slave mount logs from slave nodes.
> 
> Thanks and Regards,
> Kotresh H R
> 
> - Original Message -
> > From: "Viktor Nosov" <vno...@stonefly.com>
> > To: gluster-users@gluster.org
> > Cc: vno...@stonefly.com
> > Sent: Tuesday, December 6, 2016 7:13:22 AM
> > Subject: [Gluster-users] Geo-replication failed to delete from slave file
> > partially written to master volume.
> > 
> > Hi,
> > 
> > I hit problem while testing geo-replication. Anybody knows how to fix
> > it except deleting and recreating geo-replication?
> > 
> > Geo-replication failed to delete from slave file partially written to
> > master volume.
> > 
> > Have geo-replication between two nodes that are running glusterfs
> > 3.7.16
> > 
> > with master volume:
> > 
> > [root@SC-182 log]# gluster volume info master-for-183-0003
> > 
> > Volume Name: master-for-183-0003
> > Type: Distribute
> > Volume ID: 84501a83-b07c-4768-bfaa-418b038e1a9e
> > Status: Started
> > Number of Bricks: 1
> > Transport-type: tcp
> > Bricks:
> > Brick1: 10.10.60.182:/exports/nas-segment-0012/master-for-183-0003
> > Options Reconfigured:
> > changelog.changelog: on
> > geo-replication.ignore-pid-check: on
> > geo-replication.indexing: on
> > server.allow-insecure: on
> > performance.quick-read: off
> > performance.stat-prefetch: off
> > nfs.disable: on
> > nfs.addr-namelookup: off
> > performance.readdir-ahead: on
> > cluster.enable-shared-storage: enable
> > snap-activate-on-create: enable
> > 
> > and slave volume:
> > 
> > [root@SC-183 log]# gluster volume info rem-volume-0001
> > 
> > Volume Name: rem-volume-0001
> > Type: Distribute
> > Volume ID: 7680de7a-d0e2-42f2-96a9-4da29adba73c
> > Status: Started
> > Number of Bricks: 1
> > Transport-type: tcp
> > Bricks:
> > Brick1: 10.10.60.183:/exports/nas183-segment-0001/rem-volume-0001
> > Options Reconfigured:
> > performance.readdir-ahead: on
> > nfs.addr-namelookup: off
> > nfs.disable: on
> > performance.stat-prefetch: off
> > performance.quick-read: off
> > server.allow-insecure: on
> > snap-activate-on-create: enable
> > 
> > Master volume mounted on node:
> > 
> > [root@SC-182 log]# mount
> > 127.0.0.1:/master-for-183-0003 on /samba/master-for-183-0003 type
> > fuse.glusterfs (rw,allow_other,max_read=131072)
> > 
> > Let's fill up space on master volume:
> > 
> > [root@SC-182 log]# mkdir /samba/master-for-183-0003/cifs_share/dir3
> > [root@SC-182 log]# cp big.file
> > /samba/master-for-183-0003/cifs_share/dir3/
> > [root@SC

Re: [Gluster-users] Geo-replication failed to delete from slave file partially written to master volume.

2016-12-06 Thread Kotresh Hiremath Ravishankar
Hi Viktor,

Please share geo-replication-slave mount logs from slave nodes.

Thanks and Regards,
Kotresh H R

- Original Message -
> From: "Viktor Nosov" 
> To: gluster-users@gluster.org
> Cc: vno...@stonefly.com
> Sent: Tuesday, December 6, 2016 7:13:22 AM
> Subject: [Gluster-users] Geo-replication failed to delete from slave file 
> partially written to master volume.
> 
> Hi,
> 
> I hit problem while testing geo-replication. Anybody knows how to fix it
> except deleting and recreating geo-replication?
> 
> Geo-replication failed to delete from slave file partially written to master
> volume.
> 
> Have geo-replication between two nodes that are running glusterfs 3.7.16
> 
> with master volume:
> 
> [root@SC-182 log]# gluster volume info master-for-183-0003
> 
> Volume Name: master-for-183-0003
> Type: Distribute
> Volume ID: 84501a83-b07c-4768-bfaa-418b038e1a9e
> Status: Started
> Number of Bricks: 1
> Transport-type: tcp
> Bricks:
> Brick1: 10.10.60.182:/exports/nas-segment-0012/master-for-183-0003
> Options Reconfigured:
> changelog.changelog: on
> geo-replication.ignore-pid-check: on
> geo-replication.indexing: on
> server.allow-insecure: on
> performance.quick-read: off
> performance.stat-prefetch: off
> nfs.disable: on
> nfs.addr-namelookup: off
> performance.readdir-ahead: on
> cluster.enable-shared-storage: enable
> snap-activate-on-create: enable
> 
> and slave volume:
> 
> [root@SC-183 log]# gluster volume info rem-volume-0001
> 
> Volume Name: rem-volume-0001
> Type: Distribute
> Volume ID: 7680de7a-d0e2-42f2-96a9-4da29adba73c
> Status: Started
> Number of Bricks: 1
> Transport-type: tcp
> Bricks:
> Brick1: 10.10.60.183:/exports/nas183-segment-0001/rem-volume-0001
> Options Reconfigured:
> performance.readdir-ahead: on
> nfs.addr-namelookup: off
> nfs.disable: on
> performance.stat-prefetch: off
> performance.quick-read: off
> server.allow-insecure: on
> snap-activate-on-create: enable
> 
> Master volume mounted on node:
> 
> [root@SC-182 log]# mount
> 127.0.0.1:/master-for-183-0003 on /samba/master-for-183-0003 type
> fuse.glusterfs (rw,allow_other,max_read=131072)
> 
> Let's fill up space on master volume:
> 
> [root@SC-182 log]# mkdir /samba/master-for-183-0003/cifs_share/dir3
> [root@SC-182 log]# cp big.file /samba/master-for-183-0003/cifs_share/dir3/
> [root@SC-182 log]# cp big.file
> /samba/master-for-183-0003/cifs_share/dir3/big.file.1
> cp: writing `/samba/master-for-183-0003/cifs_share/dir3/big.file.1': No
> space left on device
> cp: closing `/samba/master-for-183-0003/cifs_share/dir3/big.file.1': No
> space left on device
> 
> File " big.file.1" represent part of the original file:
> [root@SC-182 log]# ls -l /samba/master-for-183-0003/cifs_share/dir3/*
> -rwx-- 1 root root 78930370 Dec  5 16:49
> /samba/master-for-183-0003/cifs_share/dir3/big.file
> -rwx-- 1 root root 22155264 Dec  5 16:49
> /samba/master-for-183-0003/cifs_share/dir3/big.file.1
> 
> Both new files are geo-replicated to the Slave volume successfully:
> 
> [root@SC-183 log]# ls -l
> /exports/nas183-segment-0001/rem-volume-0001/cifs_share/dir3/
> total 98720
> -rwx-- 2 root root 78930370 Dec  5 16:49 big.file
> -rwx-- 2 root root 22155264 Dec  5 16:49 big.file.1
> 
> [root@SC-182 log]# /usr/sbin/gluster volume geo-replication
> master-for-183-0003 nasgorep@10.10.60.183::rem-volume-0001 status detail
> 
> MASTER NODE MASTER VOL MASTER BRICK
> SLAVE USERSLAVE SLAVE NODE
> STATUS
> CRAWL STATUS   LAST_SYNCEDENTRYDATAMETAFAILURES
> CHECKPOINT TIMECHECKPOINT COMPLETEDCHECKPOINT COMPLETION TIME
> 
> 
> 
> 
> --
> 10.10.60.182master-for-183-0003
> /exports/nas-segment-0012/master-for-183-0003nasgorep
> nasgorep@10.10.60.183::rem-volume-000110.10.60.183Active
> Changelog Crawl2016-12-05 16:49:4800   0   0
> N/AN/A N/A
> 
> Let's delete partially written file from the master mount:
> 
> [root@SC-182 log]# rm /samba/master-for-183-0003/cifs_share/dir3/big.file.1
> rm: remove regular file
> `/samba/master-for-183-0003/cifs_share/dir3/big.file.1'? y
> 
> [root@SC-182 log]# ls -l /samba/master-for-183-0003/cifs_share/dir3/*
> -rwx-- 1 root root 78930370 Dec  5 16:49
> /samba/master-for-183-0003/cifs_share/dir3/big.file
> 
> Set checkpoint:
> 
> 32643 12/05/2016 16:57:46.540390536 1480985866 command: /usr/sbin/gluster
> volume geo-replication master-for-183-0003
> nasgorep@10.10.60.183::rem-volume-0001 config checkpoint now 2>&1
> 32643 12/05/2016 16:57:48.770820909 1480985868 status=0 /usr/sbin/gluster
> volume 

Re: [Gluster-users] How to force remove geo session?

2016-11-20 Thread Kotresh Hiremath Ravishankar
Hi,

Glad you could get it rectified. But having the same slave volume for two
different geo-rep sessions is never recommended; the two sessions end up writing
to the same slave node. If required, it is always a configuration of one master
volume to many different slave volumes. If the SSH keys are deleted on the slave
for some reason, running the geo-rep create command with the force option
redistributes the keys:

'gluster vol geo-rep  :: create push-pem force'

And yes, a root-user session and a non-root-user session are considered two
different sessions.
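
So any cleanup has to address the session with the same user it was created with;
for the volumes in this thread that would be along the lines of (illustrative):

    # gluster volume geo-replication gv0 geoaccount@192.168.0.124::gv0 stop
    # gluster volume geo-replication gv0 geoaccount@192.168.0.124::gv0 delete
    # gluster volume geo-replication gv0 root@192.168.0.124::gv0 create push-pem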

Thanks and Regards,
Kotresh H R

- Original Message -
> From: "Alexandr Porunov" 
> To: "gluster-users@gluster.org List" 
> Sent: Saturday, November 19, 2016 9:41:05 PM
> Subject: Re: [Gluster-users] How to force remove geo session?
> 
> OK, I have figured out what it was.
> 
> I had a session not with 'root' user but with 'geoaccount' user.
> It seems that we can't have 2 sessions to the one node (even if users are
> different). After deleting a session with 'geoaccount' user I was able to
> create a session with 'root' user.
> 
> On Sat, Nov 19, 2016 at 3:51 PM, Alexandr Porunov <
> alexandr.poru...@gmail.com > wrote:
> 
> 
> 
> Hello,
> I had a geo replication between master nodes and slave nodes. I have removed
> ssh keys for authorization from slave nodes. Now I can neither create
> session for slave nodes nor remove the old useless session. Is it possible
> to manually remove a sessions from all the nodes?
> 
> Here is the problem:
> 
> # gluster volume geo-replication gv0 root@192.168.0.124::gv0 delete
> reset-sync-time
> Geo-replication session between gv0 and 192.168.0.124::gv0 does not exist.
> geo-replication command failed
> 
> # gluster volume geo-replication gv0 root@192.168.0.124::gv0 create ssh-port
> 22 push-pem
> Session between gv0 and 192.168.0.124::gv0 is already created.
> geo-replication command failed
> 
> Sincerely,
> Alexandr
> 
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

