I retried many times and found that when I set the slave volume's bricks or nodes
below 6, the geo-replication volume status is OK.
I am not sure whether this is a bug.
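
For context, the slave volume's brick count can be checked on the slave cluster
with something like the following (volume name taken from this thread):

    gluster volume info filews_slave | grep -c '^Brick[0-9]'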


Whether the nodes are normal or faulty, the test result is the same.

[root@SVR8049HW2285 ~]#  bash -x /usr/libexec/glusterfs/gverify.sh filews root 
glusterfs02.sh3.ctripcorp.com filews_slave  "/tmp/gverify.log"
+ BUFFER_SIZE=104857600
++ gluster --print-logdir
+ slave_log_file=/var/log/glusterfs/geo-replication-slaves/slave.log
+ main filews root glusterfs02.sh3.ctripcorp.com filews_slave /tmp/gverify.log
+ log_file=/tmp/gverify.log
+ SSH_PORT=22
+ ping_host glusterfs02.sh3.ctripcorp.com 22
+ '[' 0 -ne 0 ']'
+ ssh -oNumberOfPasswordPrompts=0 r...@glusterfs02.sh3.ctripcorp.com 'echo 
Testing_Passwordless_SSH'
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
+ '[' 255 -ne 0 ']'
+ echo 'FORCE_BLOCKER|Passwordless ssh login has not been setup with 
glusterfs02.sh3.ctripcorp.com for user root.'
+ exit 1
[root@SVR8049HW2285 ~]#
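
For reference, the check that gverify.sh fails on can be reproduced by hand from
the master node (the hostname is taken from the trace above; this is only a
sketch of the ssh test, not the full script):

    # should print "Testing_Passwordless_SSH" without prompting for a password
    ssh -oNumberOfPasswordPrompts=0 root@glusterfs02.sh3.ctripcorp.com 'echo Testing_Passwordless_SSH'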



Best Regards 
杨雨阳 Yuyang Yang
OPS 
Ctrip Infrastructure Service (CIS)
Ctrip Computer Technology (Shanghai) Co., Ltd  
Phone: + 86 21 34064880-15554 | Fax: + 86 21 52514588-13389 
Web: www.Ctrip.com


-----Original Message-----
From: Kotresh Hiremath Ravishankar [mailto:khire...@redhat.com]
Sent: Wednesday, May 25, 2016 4:58 PM
To: vyyy杨雨阳 <yuyangy...@ctrip.com>
Cc: Saravanakumar Arumugam <sarum...@redhat.com>; Gluster-users@gluster.org; 
Aravinda Vishwanathapura Krishna Murthy <avish...@redhat.com>
Subject: Re: [Gluster-users] : geo-replication status partial faulty

Answers inline

Thanks and Regards,
Kotresh H R

----- Original Message -----
> From: "vyyy杨雨阳" <yuyangy...@ctrip.com>
> To: "Kotresh Hiremath Ravishankar" <khire...@redhat.com>
> Cc: "Saravanakumar Arumugam" <sarum...@redhat.com>, 
> Gluster-users@gluster.org, "Aravinda Vishwanathapura Krishna Murthy" 
> <avish...@redhat.com>
> Sent: Wednesday, May 25, 2016 12:34:12 PM
> Subject: [Gluster-users] : geo-replication status partial faulty
> 
> Hi,
> 
> 
> Verify below before proceeding further.
> 
> 1. There is only one session directory in all master nodes.
>    
>     ls -l /var/lib/glusterd/geo-replication/
> 
> 2. I can find a "*.status" file on the nodes where the geo-replication 
>    status shows Active or Passive, but there is no "*.status" file when 
>    the node status is faulty.
> 
> Per your instruction to clean up the ssh keys and do a fresh setup, step 3
> failed:
> 
> 3. Create georep ssh keys again and do create force.
>    gluster system:: exec gsec_create
>    gluster vol geo-rep <master-vol> <slave-host>::<slave-vol1> create
>    push-pem force
> 
> [root@SVR8048HW2285 glusterfs]# gluster volume geo-replication filews 
> glusterfs02.sh3.ctripcorp.com::filews_slave create push-pem force 
> Unable to fetch slave volume details. Please check the slave cluster 
> and slave volume.
> geo-replication command failed

       Then please check that the slave cluster is running fine and that glusterd
  is running on all the slave nodes. After fixing any issues in the slave cluster,
  please check whether the below script runs fine.

  bash -x /usr/libexec/glusterfs/gverify.sh <master_vol_name> root 
glusterfs02.sh3.ctripcorp.com <slave_vol> "/tmp/gverify.log"
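
  For the slave-side checks mentioned above, a minimal sketch, run on each
  slave node (the volume name is the one used in this thread):

   # glusterd should be running and the slave volume should be in Started state
   service glusterd status
   gluster peer status
   gluster volume info filews_slave
   gluster volume status filews_slave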
  

> [root@SVR8048HW2285 glusterfs]#
> [root@SVR8048HW2285 glusterfs]# ssh -i 
> /var/lib/glusterd/geo-replication/secret.pem
> r...@glusterfs02.sh3.ctripcorp.com
> Last login: Wed May 25 14:33:15 2016 from 10.8.231.11 This is a 
> private network server, in monitoring state.
> It is strictly prohibited to unauthorized access and used.
> [root@SVR6520HW2285 ~]#
> 
> etc-glusterfs-glusterd.vol.log logged the following messages:
> 
> [2016-05-25 06:47:47.698364] E
> [glusterd-geo-rep.c:2012:glusterd_verify_slave] 0-: Not a valid slave
> [2016-05-25 06:47:47.698433] E
> [glusterd-geo-rep.c:2240:glusterd_op_stage_gsync_create] 0-:
> glusterfs02.sh3.ctripcorp.com::filews_slave is not a valid slave volume.
> Error: Unable to fetch slave volume details. Please check the slave 
> cluster and slave volume.
> [2016-05-25 06:47:47.698451] E 
> [glusterd-syncop.c:1201:gd_stage_op_phase]
> 0-management: Staging of operation 'Volume Geo-replication Create' 
> failed on localhost : Unable to fetch slave volume details. Please 
> check the slave cluster and slave volume.
> 
> 
> 
> 
> 
> 
> Best Regards
> 杨雨阳 Yuyang Yang
> 
> 
> -----Original Message-----
> From: Kotresh Hiremath Ravishankar [mailto:khire...@redhat.com]
> Sent: Wednesday, May 25, 2016 2:06 PM
> To: vyyy杨雨阳 <yuyangy...@ctrip.com>
> Cc: Saravanakumar Arumugam <sarum...@redhat.com>; 
> Gluster-users@gluster.org; Aravinda Vishwanathapura Krishna Murthy 
> <avish...@redhat.com>
> Subject: Re: geo-replication status partial faulty
> 
> Hi,
> 
> Verify below before proceeding further.
> 
> 1. Run the following command on all the master nodes. You should find
>    only one directory (the session directory); the rest are files. If you
>    find two directories, a clean-up is needed so that all master nodes
>    have the same single session directory.
>    
>     ls -l /var/lib/glusterd/geo-replication/
> 
> 2. Run the following command on all the master nodes; you should find
>    a "*.status" file in all of them (a scripted check is sketched below).
> 
>     ls -l /var/lib/glusterd/geo-replication/<session_directory>
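> 
>    (A sketch of how both checks could be scripted; the hostnames and the
>    session directory name below are taken from the outputs and glusterd
>    logs in this thread:)
> 
>     # run from a node that can ssh to the masters; add the remaining masters to the list
>     SESSION_DIR=filews_glusterfs01.sh3.ctripcorp.com_filews_slave
>     for h in SVR8048HW2285 SH02SVR5954 SVR6995HW2285; do
>         echo "== $h"
>         ssh root@$h ls -l /var/lib/glusterd/geo-replication/ /var/lib/glusterd/geo-replication/$SESSION_DIR/
>     done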
> 
> 
> Follow the below steps to clean up the ssh keys and do a fresh setup.
> 
> On all the slave nodes, clean up the ssh keys prefixed with 
> command=...gsyncd and command=tar.. in /root/.ssh/authorized_keys. 
> Also clean up id_rsa.pub if you had copied it from secret.pem, and set up 
> the usual passwordless ssh connection using ssh-copy-id.
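> 
>    (A minimal sketch of that clean-up, assuming the default authorized_keys
>    path; keep a backup and review it before overwriting:)
> 
>    cp /root/.ssh/authorized_keys /root/.ssh/authorized_keys.bak
>    # drop the geo-rep entries (the ones prefixed with command=...gsyncd / command=tar)
>    grep -vE 'command=.*(gsyncd|tar)' /root/.ssh/authorized_keys.bak > /root/.ssh/authorized_keys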
> 
> 1. Establish passwordless SSH between one of the master nodes and one of 
>    the slave nodes.
>    (it is not required to copy secret.pem; use the usual ssh-copy-id way)
>    Remember to run all geo-rep commands on the same master node and to use
>    the same slave node in the geo-rep commands.
> 
> 2. Stop and Delete geo-rep session as follows.
>    gluster vol geo-rep <master-vol> <slave-host1>::<slave-vol> stop
>    gluster vol geo-rep <master-vol> <slave-host1>::<slave-vol> delete
> 
> 3. Create georep ssh keys again and do create force.
>    gluster system:: exec gsec_create
>    gluster vol geo-rep <master-vol> <slave-host>::<slave-vol1> create
>    push-pem force
> 
> 4. Verify that the keys have been distributed properly. The below command 
>    should automatically run gsyncd.py, without asking for a password, from
>    any master node to any slave host.
> 
>    ssh -i /var/lib/glusterd/geo-replication/secret.pem root@<slave-host>
> 
> 5. Start geo-rep
>    gluster vol geo-rep <master-vol> <slave-host>::<slave-vol1> start
> 
> Let me know if you still face issues.
> 
> 
> Thanks and Regards,
> Kotresh H R
> 
> 
> ----- Original Message -----
> > From: "vyyy杨雨阳" <yuyangy...@ctrip.com>
> > To: "Kotresh Hiremath Ravishankar" <khire...@redhat.com>
> > Cc: "Saravanakumar Arumugam" <sarum...@redhat.com>, 
> > Gluster-users@gluster.org, "Aravinda Vishwanathapura Krishna Murthy"
> > <avish...@redhat.com>
> > Sent: Wednesday, May 25, 2016 7:11:08 AM
> > Subject: Re: Re: Re: Re: Re: [Gluster-users] Re: geo-replication 
> > status partial faulty
> > 
> > The commands' output is as follows. Thanks.
> > 
> > [root@SVR8048HW2285 ~]# gluster volume geo-replication filews 
> > glusterfs01.sh3.ctripcorp.com::filews_slave status
> >  
> > MASTER NODE      MASTER VOL    MASTER BRICK          SLAVE                                          STATUS     CHECKPOINT STATUS    CRAWL STATUS
> > -----------------------------------------------------------------------------------------------------------------------------------------------
> > SVR8048HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > SH02SVR5954      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > SH02SVR5951      filews        /export/sdb/brick1    glusterfs06.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
> > SVR8050HW2285    filews        /export/sdb/filews    glusterfs03.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
> > SVR8049HW2285    filews        /export/sdb/filews    glusterfs05.sh3.ctripcorp.com::filews_slave    Active     N/A                  Hybrid Crawl
> > SVR8047HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    Active     N/A                  Hybrid Crawl
> > SVR6995HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > SVR6993HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > SH02SVR5953      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > SH02SVR5952      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > SVR6996HW2285    filews        /export/sdb/filews    glusterfs04.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
> > SVR6994HW2285    filews        /export/sdb/filews    glusterfs02.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
> >            
> > [root@SVR8048HW2285 ~]# ls -l /var/lib/glusterd/geo-replication/
> > total 40
> > -rw------- 1 root root 14140 May 20 16:00 common_secret.pem.pub 
> > drwxr-xr-x 2 root root  4096 May 25 09:35 
> > filews_glusterfs01.sh3.ctripcorp.com_filews_slave
> > -rwxr-xr-x 1 root root  1845 May 17 15:04 gsyncd_template.conf
> > -rw------- 1 root root  1675 May 20 11:03 secret.pem
> > -rw-r--r-- 1 root root   400 May 20 11:03 secret.pem.pub
> > -rw------- 1 root root  1675 May 20 16:00 tar_ssh.pem
> > -rw-r--r-- 1 root root   400 May 20 16:00 tar_ssh.pem.pub
> > [root@SVR8048HW2285 ~]#
> > 
> > 
> > 
> > Best Regards
> > 杨雨阳 Yuyang Yang
> > OPS
> > Ctrip Infrastructure Service (CIS)
> > Ctrip Computer Technology (Shanghai) Co., Ltd
> > Phone: + 86 21 34064880-15554 | Fax: + 86 21 52514588-13389
> > Web: www.Ctrip.com
> > 
> > 
> > -----Original Message-----
> > From: Kotresh Hiremath Ravishankar [mailto:khire...@redhat.com]
> > Sent: Tuesday, May 24, 2016 6:41 PM
> > To: vyyy杨雨阳 <yuyangy...@ctrip.com>
> > Cc: Saravanakumar Arumugam <sarum...@redhat.com>; 
> > Gluster-users@gluster.org; Aravinda Vishwanathapura Krishna Murthy 
> > <avish...@redhat.com>
> > Subject: Re: Re: Re: Re: Re: [Gluster-users] Re: geo-replication status 
> > partial faulty
> > 
> > Ok, it looks like there is a problem with ssh key distribution.
> > 
> > Before I suggest cleaning those up and doing the setup again, could you 
> > share the output of the following commands?
> > 
> > 1. gluster vol geo-rep <master_vol> <slave_host>::<slave_vol> status
> > 2. ls -l /var/lib/glusterd/geo-replication/
> > 
> > Are there multiple geo-rep sessions from this master volume, or only one?
> > 
> > Thanks and Regards,
> > Kotresh H R
> > 
> > ----- Original Message -----
> > > From: "vyyy杨雨阳" <yuyangy...@ctrip.com>
> > > To: "Kotresh Hiremath Ravishankar" <khire...@redhat.com>
> > > Cc: "Saravanakumar Arumugam" <sarum...@redhat.com>, 
> > > Gluster-users@gluster.org, "Aravinda Vishwanathapura Krishna Murthy"
> > > <avish...@redhat.com>
> > > Sent: Tuesday, May 24, 2016 3:19:55 PM
> > > Subject: Re: Re: Re: Re: [Gluster-users] Re: geo-replication 
> > > status partial faulty
> > > 
> > > We can establish passwordless ssh directly with the 'ssh' command, 
> > > but when we run create push-pem, it shows 'Passwordless ssh login has 
> > > not been setup' unless we copy secret.pem to *id_rsa.pub.
> > > 
> > > [root@SVR8048HW2285 ~]#  ssh -i
> > > /var/lib/glusterd/geo-replication/secret.pem
> > > r...@glusterfs01.sh3.ctripcorp.com
> > > Last login: Tue May 24 17:23:53 2016 from 10.8.230.213 This is a 
> > > private network server, in monitoring state.
> > > It is strictly prohibited to unauthorized access and used.
> > > [root@SVR6519HW2285 ~]#
> > > 
> > > 
> > > [root@SVR8048HW2285 filews]# gluster volume geo-replication filews 
> > > glusterfs01.sh3.ctripcorp.com::filews_slave create push-pem force 
> > > Passwordless ssh login has not been setup with 
> > > glusterfs01.sh3.ctripcorp.com for user root.
> > > geo-replication command failed
> > > [root@SVR8048HW2285 filews]#
> > > 
> > > 
> > > 
> > > Best Regards
> > > 杨雨阳 Yuyang Yang
> > > 
> > > 
> > > -----Original Message-----
> > > From: Kotresh Hiremath Ravishankar [mailto:khire...@redhat.com]
> > > Sent: Tuesday, May 24, 2016 3:22 PM
> > > To: vyyy杨雨阳 <yuyangy...@ctrip.com>
> > > Cc: Saravanakumar Arumugam <sarum...@redhat.com>; 
> > > Gluster-users@gluster.org; Aravinda Vishwanathapura Krishna Murthy 
> > > <avish...@redhat.com>
> > > Subject: Re: Re: Re: Re: [Gluster-users] Re: geo-replication status 
> > > partial faulty
> > > 
> > > Hi
> > > 
> > > Could you try the following command from the corresponding masters to 
> > > the faulty slave nodes and share the output?
> > > The below command should not ask for a password and should run gsyncd.py.
> > > 
> > > ssh -i /var/lib/glusterd/geo-replication/secret.pem root@<faulty
> > > hosts>
> > > 
> > > To establish passwordless ssh, it is not necessary to copy 
> > > secret.pem to *id_rsa.pub.
> > > 
> > > If the geo-rep session is already established, passwordless ssh 
> > > would already be there.
> > > My suspicion is that when I asked you to do 'create force' you did 
> > > it using another slave where passwordless ssh was not set up. This 
> > > would create another session directory in 
> > > '/var/lib/glusterd/geo-replication', i.e.
> > > <master_vol>_<slave_host>_<slave_vol>.
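> > > 
> > > (For illustration, listing only the directories makes a stale second 
> > > session directory easy to spot; the name below is the one that appears 
> > > in the glusterd logs in this thread:)
> > > 
> > >     ls -ld /var/lib/glusterd/geo-replication/*/
> > >     # expected: a single entry such as
> > >     # /var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/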
> > > 
> > > Please check and let us know.
> > > 
> > > Thanks and Regards,
> > > Kotresh H R
> > > 
> > > ----- Original Message -----
> > > > From: "vyyy杨雨阳" <yuyangy...@ctrip.com>
> > > > To: "Kotresh Hiremath Ravishankar" <khire...@redhat.com>
> > > > Cc: "Saravanakumar Arumugam" <sarum...@redhat.com>, 
> > > > Gluster-users@gluster.org, "Aravinda Vishwanathapura Krishna Murthy"
> > > > <avish...@redhat.com>
> > > > Sent: Friday, May 20, 2016 12:35:58 PM
> > > > Subject: Re: Re: Re: [Gluster-users] Re: geo-replication status 
> > > > partial faulty
> > > > 
> > > > Hello, Kotresh
> > > > 
> > > > I did 'create force', but still some nodes work and some nodes are faulty.
> > > > 
> > > > On the faulty nodes, etc-glusterfs-glusterd.vol.log shows:
> > > > [2016-05-20 06:27:03.260870] I
> > > > [glusterd-geo-rep.c:3516:glusterd_read_status_file] 0-: Using 
> > > > passed config 
> > > > template(/var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/gsyncd.conf).
> > > > [2016-05-20 06:27:03.404544] E
> > > > [glusterd-geo-rep.c:3200:glusterd_gsync_read_frm_status] 0-:
> > > > Unable to read gsyncd status file
> > > > [2016-05-20 06:27:03.404583] E
> > > > [glusterd-geo-rep.c:3603:glusterd_read_status_file] 0-: Unable 
> > > > to read the statusfile for /export/sdb/brick1 brick for 
> > > > filews(master),
> > > > glusterfs01.sh3.ctripcorp.com::filews_slave(slave) session
> > > > 
> > > > 
> > > > /var/log/glusterfs/geo-replication/filews/ssh%3A%2F%2Froot%4010.15.65.66%3Agluster%3A%2F%2F127.0.0.1%3Afilews_slave.log
> > > > shows:
> > > > [2016-05-20 15:04:01.858340] I [monitor(monitor):215:monitor] Monitor:
> > > > ------------------------------------------------------------
> > > > [2016-05-20 15:04:01.858688] I [monitor(monitor):216:monitor] Monitor:
> > > > starting gsyncd worker
> > > > [2016-05-20 15:04:01.986754] D [gsyncd(agent):627:main_i] <top>:
> > > > rpc_fd:
> > > > '7,11,10,9'
> > > > [2016-05-20 15:04:01.987505] I 
> > > > [changelogagent(agent):72:__init__]
> > > > ChangelogAgent: Agent listining...
> > > > [2016-05-20 15:04:01.988079] I [repce(agent):92:service_loop]
> > > > RepceServer:
> > > > terminating on reaching EOF.
> > > > [2016-05-20 15:04:01.988238] I [syncdutils(agent):214:finalize] <top>:
> > > > exiting.
> > > > [2016-05-20 15:04:01.988250] I [monitor(monitor):267:monitor] Monitor:
> > > > worker(/export/sdb/brick1) died before establishing connection
> > > > 
> > > > Can you help me!
> > > > 
> > > > 
> > > > Best Regards
> > > > 杨雨阳 Yuyang Yang
> > > > 
> > > > 
> > > > 
> > > > -----Original Message-----
> > > > From: vyyy杨雨阳
> > > > Sent: Thursday, May 19, 2016 7:45 PM
> > > > To: 'Kotresh Hiremath Ravishankar' <khire...@redhat.com>
> > > > Cc: Saravanakumar Arumugam <sarum...@redhat.com>; 
> > > > Gluster-users@gluster.org; Aravinda Vishwanathapura Krishna 
> > > > Murthy <avish...@redhat.com>
> > > > Subject: Re: Re: Re: [Gluster-users] Re: geo-replication status 
> > > > partial faulty
> > > > 
> > > > It still does not work.
> > > > 
> > > > I need to copy /var/lib/glusterd/geo-replication/secret.* to 
> > > > /root/.ssh/id_rsa to make passwordless ssh work.
> > > > 
> > > > I generated the /var/lib/glusterd/geo-replication/secret.pem file on 
> > > > every master node.
> > > > 
> > > > I am not sure whether this is right.
> > > > 
> > > > 
> > > > [root@sh02svr5956 ~]# gluster volume geo-replication filews 
> > > > glusterfs01.sh3.ctripcorp.com::filews_slave create push-pem 
> > > > force Passwordless ssh login has not been setup with 
> > > > glusterfs01.sh3.ctripcorp.com for user root.
> > > > geo-replication command failed
> > > > 
> > > > [root@sh02svr5956 .ssh]# cp
> > > > /var/lib/glusterd/geo-replication/secret.pem
> > > > ./id_rsa
> > > > cp: overwrite `./id_rsa'? y
> > > > [root@sh02svr5956 .ssh]# cp
> > > > /var/lib/glusterd/geo-replication/secret.pem.pub
> > > > ./id_rsa.pub
> > > > cp: overwrite `./id_rsa.pub'?
> > > > 
> > > >  [root@sh02svr5956 ~]# gluster volume geo-replication filews 
> > > > glusterfs01.sh3.ctripcorp.com::filews_slave create push-pem 
> > > > force Creating  geo-replication session between filews & 
> > > > glusterfs01.sh3.ctripcorp.com::filews_slave has been successful
> > > > [root@sh02svr5956 ~]#
> > > > 
> > > > 
> > > > 
> > > > 
> > > > Best Regards
> > > > 杨雨阳 Yuyang Yang
> > > > OPS
> > > > Ctrip Infrastructure Service (CIS) Ctrip Computer Technology
> > > > (Shanghai) Co., Ltd
> > > > Phone: + 86 21 34064880-15554 | Fax: + 86 21 52514588-13389
> > > > Web: www.Ctrip.com
> > > > 
> > > > 
> > > > -----Original Message-----
> > > > From: Kotresh Hiremath Ravishankar [mailto:khire...@redhat.com]
> > > > Sent: Thursday, May 19, 2016 5:07 PM
> > > > To: vyyy杨雨阳 <yuyangy...@ctrip.com>
> > > > Cc: Saravanakumar Arumugam <sarum...@redhat.com>; 
> > > > Gluster-users@gluster.org; Aravinda Vishwanathapura Krishna 
> > > > Murthy <avish...@redhat.com>
> > > > Subject: Re: Re: Re: [Gluster-users] Re: geo-replication status 
> > > > partial faulty
> > > > 
> > > > Hi,
> > > > 
> > > > Could you just try 'create force' once to fix those status file errors?
> > > > 
> > > > e.g., gluster volume geo-rep <master vol> <slave host>::<slave vol> 
> > > > create push-pem force
> > > > 
> > > > Thanks and Regards,
> > > > Kotresh H R
> > > > 
> > > > ----- Original Message -----
> > > > > From: "vyyy杨雨阳" <yuyangy...@ctrip.com>
> > > > > To: "Saravanakumar Arumugam" <sarum...@redhat.com>, 
> > > > > Gluster-users@gluster.org, "Aravinda Vishwanathapura Krishna Murthy"
> > > > > <avish...@redhat.com>, "Kotresh Hiremath Ravishankar"
> > > > > <khire...@redhat.com>
> > > > > Sent: Thursday, May 19, 2016 2:15:34 PM
> > > > > Subject: Re: Re: [Gluster-users] Re: geo-replication status 
> > > > > partial faulty
> > > > > 
> > > > > I have checked all the nodes, both masters and slaves; the 
> > > > > software is the same.
> > > > > 
> > > > > I am puzzled why half of the masters work and half are faulty.
> > > > > 
> > > > > 
> > > > > [admin@SVR6996HW2285 ~]$ rpm -qa |grep gluster
> > > > > glusterfs-api-3.6.3-1.el6.x86_64
> > > > > glusterfs-fuse-3.6.3-1.el6.x86_64
> > > > > glusterfs-geo-replication-3.6.3-1.el6.x86_64
> > > > > glusterfs-3.6.3-1.el6.x86_64
> > > > > glusterfs-cli-3.6.3-1.el6.x86_64
> > > > > glusterfs-server-3.6.3-1.el6.x86_64
> > > > > glusterfs-libs-3.6.3-1.el6.x86_64
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > Best Regards
> > > > > 杨雨阳 Yuyang Yang
> > > > > 
> > > > > OPS
> > > > > Ctrip Infrastructure Service (CIS) Ctrip Computer Technology
> > > > > (Shanghai) Co., Ltd
> > > > > Phone: + 86 21 34064880-15554 | Fax: + 86 21 52514588-13389
> > > > > Web: www.Ctrip.com
> > > > > 
> > > > > 
> > > > > 
> > > > > From: Saravanakumar Arumugam [mailto:sarum...@redhat.com]
> > > > > Sent: Thursday, May 19, 2016 4:33 PM
> > > > > To: vyyy杨雨阳 <yuyangy...@ctrip.com>; 
> > > > > Gluster-users@gluster.org; Aravinda Vishwanathapura Krishna 
> > > > > Murthy <avish...@redhat.com>; Kotresh Hiremath Ravishankar 
> > > > > <khire...@redhat.com>
> > > > > Subject: Re: Re: [Gluster-users] Re: geo-replication status partial 
> > > > > faulty
> > > > > 
> > > > > Hi,
> > > > > +geo-rep team.
> > > > > 
> > > > > Can you get the gluster version you are using?
> > > > > 
> > > > > # For example:
> > > > > rpm -qa | grep gluster
> > > > > 
> > > > > I hope you have the same gluster version installed everywhere.
> > > > > Please double-check and share it.
> > > > > 
> > > > > Thanks,
> > > > > Saravana
> > > > > On 05/19/2016 01:37 PM, vyyy杨雨阳 wrote:
> > > > > Hi, Saravana
> > > > > 
> > > > > I have changed the log level to DEBUG and then started geo-replication 
> > > > > with the log-file option; the file is attached.
> > > > > 
> > > > > gluster volume geo-replication filews 
> > > > > glusterfs01.sh3.ctripcorp.com::filews_slave start 
> > > > > --log-file=geo.log
> > > > > 
> > > > > I have checked /root/.ssh/authorized_keys on 
> > > > > glusterfs01.sh3.ctripcorp.com. It has the entries from 
> > > > > /var/lib/glusterd/geo-replication/common_secret.pem.pub,
> > > > > and I have removed the lines not starting with "command=".
> > > > > 
> > > > > With ssh -i /var/lib/glusterd/geo-replication/secret.pem 
> > > > > root@glusterfs01.sh3.ctripcorp.com I can see gsyncd messages and no 
> > > > > ssh error.
> > > > > 
> > > > > 
> > > > > Attached is etc-glusterfs-glusterd.vol.log from a faulty node; it shows:
> > > > > 
> > > > > [2016-05-19 06:39:23.405974] I 
> > > > > [glusterd-geo-rep.c:3516:glusterd_read_status_file] 0-: Using 
> > > > > passed config 
> > > > > template(/var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/gsyncd.conf).
> > > > > [2016-05-19 06:39:23.541169] E 
> > > > > [glusterd-geo-rep.c:3200:glusterd_gsync_read_frm_status] 0-:
> > > > > Unable to read gsyncd status file
> > > > > [2016-05-19 06:39:23.541210] E 
> > > > > [glusterd-geo-rep.c:3603:glusterd_read_status_file] 0-: Unable 
> > > > > to read the statusfile for /export/sdb/filews brick for 
> > > > > filews(master),
> > > > > glusterfs01.sh3.ctripcorp.com::filews_slave(slave) session
> > > > > [2016-05-19 06:39:29.472047] I 
> > > > > [glusterd-geo-rep.c:1835:glusterd_get_statefile_name] 0-: 
> > > > > Using passed config 
> > > > > template(/var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/gsyncd.conf).
> > > > > [2016-05-19 06:39:34.939709] I 
> > > > > [glusterd-geo-rep.c:3516:glusterd_read_status_file] 0-: Using 
> > > > > passed config 
> > > > > template(/var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/gsyncd.conf).
> > > > > [2016-05-19 06:39:35.058520] E 
> > > > > [glusterd-geo-rep.c:3200:glusterd_gsync_read_frm_status] 0-:
> > > > > Unable to read gsyncd status file
> > > > > 
> > > > > 
> > > > > /var/log/glusterfs/geo-replication/filews/ssh%3A%2F%2Froot%4010.15.65.66%3Agluster%3A%2F%2F127.0.0.1%3Afilews_slave.log
> > > > > shows the following:
> > > > > 
> > > > > [2016-05-19 15:11:37.307755] I [monitor(monitor):215:monitor]
> > > > > Monitor:
> > > > > ------------------------------------------------------------
> > > > > [2016-05-19 15:11:37.308059] I [monitor(monitor):216:monitor]
> > > > > Monitor:
> > > > > starting gsyncd worker
> > > > > [2016-05-19 15:11:37.423320] D [gsyncd(agent):627:main_i] <top>:
> > > > > rpc_fd:
> > > > > '7,11,10,9'
> > > > > [2016-05-19 15:11:37.423882] I 
> > > > > [changelogagent(agent):72:__init__]
> > > > > ChangelogAgent: Agent listining...
> > > > > [2016-05-19 15:11:37.423906] I [monitor(monitor):267:monitor]
> > > > > Monitor:
> > > > > worker(/export/sdb/filews) died before establishing connection
> > > > > [2016-05-19 15:11:37.424151] I [repce(agent):92:service_loop]
> > > > > RepceServer:
> > > > > terminating on reaching EOF.
> > > > > [2016-05-19 15:11:37.424335] I 
> > > > > [syncdutils(agent):214:finalize]
> > > > > <top>:
> > > > > exiting.
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > Best Regards
> > > > > Yuyang Yang
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > From: Saravanakumar Arumugam [mailto:sarum...@redhat.com]
> > > > > Sent: Thursday, May 19, 2016 1:59 PM
> > > > > To: vyyy杨雨阳 <yuyangy...@ctrip.com>; Gluster-users@gluster.org
> > > > > Subject: Re: [Gluster-users] Re: geo-replication status partial 
> > > > > faulty
> > > > > 
> > > > > Hi,
> > > > > 
> > > > > There seems to be some issue on the glusterfs01.sh3.ctripcorp.com 
> > > > > slave node.
> > > > > Can you share the complete logs?
> > > > > 
> > > > > You can increase verbosity of debug messages like this:
> > > > > gluster volume geo-replication <master volume> <slave
> > > > > host>::<slave
> > > > > volume> config log-level DEBUG
> > > > > 
> > > > > 
> > > > > Also, check /root/.ssh/authorized_keys on 
> > > > > glusterfs01.sh3.ctripcorp.com. It should have the entries from 
> > > > > /var/lib/glusterd/geo-replication/common_secret.pem.pub 
> > > > > (present on the master node).
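> > > > > 
> > > > > (A rough cross-check, assuming the default paths; the counts give a 
> > > > > quick idea of whether the push-pem step populated the slave:)
> > > > > 
> > > > >     # on the master node: number of keys gathered by gsec_create
> > > > >     wc -l < /var/lib/glusterd/geo-replication/common_secret.pem.pub
> > > > >     # on the slave node: number of geo-rep (command=...) entries
> > > > >     grep -c 'command=' /root/.ssh/authorized_keys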
> > > > > 
> > > > > Have a look at this one for example:
> > > > > https://www.gluster.org/pipermail/gluster-users/2015-August/023174.html
> > > > > 
> > > > > Thanks,
> > > > > Saravana
> > > > > On 05/19/2016 07:53 AM, vyyy杨雨阳 wrote:
> > > > > Hello,
> > > > > 
> > > > > I have tried to configure a geo-replication volume; all the 
> > > > > master nodes' configuration is the same. When I start this 
> > > > > volume, the status shows some nodes as faulty, as follows:
> > > > > 
> > > > > gluster volume geo-replication filews 
> > > > > glusterfs01.sh3.ctripcorp.com::filews_slave status
> > > > > 
> > > > > MASTER NODE      MASTER VOL    MASTER BRICK          SLAVE                                          STATUS     CHECKPOINT STATUS    CRAWL STATUS
> > > > > -----------------------------------------------------------------------------------------------------------------------------------------------
> > > > > SVR8048HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > > > > SVR8050HW2285    filews        /export/sdb/filews    glusterfs03.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
> > > > > SVR8047HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    Active     N/A                  Hybrid Crawl
> > > > > SVR8049HW2285    filews        /export/sdb/filews    glusterfs05.sh3.ctripcorp.com::filews_slave    Active     N/A                  Hybrid Crawl
> > > > > SH02SVR5951      filews        /export/sdb/brick1    glusterfs06.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
> > > > > SH02SVR5953      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > > > > SVR6995HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > > > > SH02SVR5954      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > > > > SVR6994HW2285    filews        /export/sdb/filews    glusterfs02.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
> > > > > SVR6993HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > > > > SH02SVR5952      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > > > > SVR6996HW2285    filews        /export/sdb/filews    glusterfs04.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
> > > > > 
> > > > > On the faulty node, the log file under 
> > > > > /var/log/glusterfs/geo-replication/filews shows 
> > > > > "worker(/export/sdb/filews) died before establishing connection":
> > > > > 
> > > > > [2016-05-18 16:55:46.402622] I [monitor(monitor):215:monitor]
> > > > > Monitor:
> > > > > ------------------------------------------------------------
> > > > > [2016-05-18 16:55:46.402930] I [monitor(monitor):216:monitor]
> > > > > Monitor:
> > > > > starting gsyncd worker
> > > > > [2016-05-18 16:55:46.517460] I 
> > > > > [changelogagent(agent):72:__init__]
> > > > > ChangelogAgent: Agent listining...
> > > > > [2016-05-18 16:55:46.518066] I [repce(agent):92:service_loop]
> > > > > RepceServer:
> > > > > terminating on reaching EOF.
> > > > > [2016-05-18 16:55:46.518279] I 
> > > > > [syncdutils(agent):214:finalize]
> > > > > <top>:
> > > > > exiting.
> > > > > [2016-05-18 16:55:46.518194] I [monitor(monitor):267:monitor]
> > > > > Monitor:
> > > > > worker(/export/sdb/filews) died before establishing connection
> > > > > [2016-05-18 16:55:56.697036] I [monitor(monitor):215:monitor]
> > > > > Monitor:
> > > > > ------------------------------------------------------------
> > > > > 
> > > > > Any advice and suggestions will be greatly appreciated.
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > Best Regards
> > > > >        Yuyang Yang
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > _______________________________________________
> > > > > 
> > > > > Gluster-users mailing list
> > > > > 
> > > > > Gluster-users@gluster.org
> > > > > 
> > > > > http://www.gluster.org/mailman/listinfo/gluster-users
> > > > > 
> > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
