Re: [ceph-users] Ceph Replication not working

2019-04-05 Thread Jason Dillaman
What is the version of the rbd-mirror daemon and your OSDs? It looks like
it found two replicated images and got stuck on the "wait_for_deletion"
step. Since I suspect those images haven't been deleted, it should
have immediately proceeded to the next step of the image replay state
machine. Are there any additional log messages after 2019-04-05
12:07:29.981203?
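
For reference, assuming Luminous-era tooling, something like the following
should report the versions on each side (the rbd-mirror daemon registers
itself in the mgr service map, as your log shows):

  ceph versions                        # production-site summary
  ceph --cluster cephdr versions       # DR-site summary
  ceph --cluster cephdr service dump   # per-daemon detail, incl. rbd-mirror
  dpkg -s rbd-mirror | grep Version    # installed daemon package on Ubuntu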

On Fri, Apr 5, 2019 at 1:56 PM Vikas Rana  wrote:
>
> Hi there,
>
> We are trying to set up rbd-mirror replication, and after the setup
> everything looks good, but images are not replicating.
>
>
>
> Can someone please help?
>
>
>
> Thanks,
>
> -Vikas
>
>
>
> root@remote:/var/log/ceph# rbd --cluster cephdr mirror pool info nfs
>
> Mode: pool
>
> Peers:
>
>   UUID NAME CLIENT
>
>   bcd54bc5-cd08-435f-a79a-357bce55011d ceph client.mirrorprod
>
>
>
> root@local:/etc/ceph# rbd  mirror pool info nfs
>
> Mode: pool
>
> Peers:
>
>   UUID NAME   CLIENT
>
>   612151cf-f70d-49d0-94e2-a7b850a53e4f cephdr client.mirrordr
>
>
>
>
>
> root@local:/etc/ceph# rbd info nfs/test01
>
> rbd image 'test01':
>
> size 102400 kB in 25 objects
>
> order 22 (4096 kB objects)
>
> block_name_prefix: rbd_data.11cd3c238e1f29
>
> format: 2
>
> features: layering, exclusive-lock, object-map, fast-diff, 
> deep-flatten, journaling
>
> flags:
>
> journal: 11cd3c238e1f29
>
> mirroring state: enabled
>
> mirroring global id: 06fbfe68-b7e4-4d3a-93b2-cd18c569f7f7
>
> mirroring primary: true
>
>
>
>
>
> root@remote:/var/log/ceph# rbd --cluster cephdr mirror pool status nfs 
> --verbose
>
> health: OK
>
> images: 0 total
>
>
>
> root@remote:/var/log/ceph# rbd info nfs/test01
>
> rbd: error opening image test01: (2) No such file or directory
>
>
>
>
>
> root@remote:/var/log/ceph# ceph -s --cluster cephdr
>
>   cluster:
>
> id: ade49174-1f84-4c3c-a93c-b293c3655c93
>
> health: HEALTH_WARN
>
> noout,nodeep-scrub flag(s) set
>
>
>
>   services:
>
> mon:3 daemons, quorum nidcdvtier1a,nidcdvtier2a,nidcdvtier3a
>
> mgr:nidcdvtier1a(active), standbys: nidcdvtier2a
>
> osd:12 osds: 12 up, 12 in
>
> flags noout,nodeep-scrub
>
> rbd-mirror: 1 daemon active
>
>
>
>   data:
>
> pools:   5 pools, 640 pgs
>
> objects: 1.32M objects, 5.03TiB
>
> usage:   10.1TiB used, 262TiB / 272TiB avail
>
> pgs: 640 active+clean
>
>
>
>   io:
>
> client:   170B/s rd, 0B/s wr, 0op/s rd, 0op/s wr
>
>
>
>
>
> 2019-04-05 12:07:29.720742 7f0fa5e284c0  0 ceph version 12.2.11 
> (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable), process 
> rbd-mirror, pid 3921391
>
> 2019-04-05 12:07:29.721752 7f0fa5e284c0  0 pidfile_write: ignore empty 
> --pid-file
>
> 2019-04-05 12:07:29.726580 7f0fa5e284c0 20 rbd::mirror::ServiceDaemon: 
> 0x560200d29bb0 ServiceDaemon:
>
> 2019-04-05 12:07:29.732654 7f0fa5e284c0 20 rbd::mirror::ServiceDaemon: 
> 0x560200d29bb0 init:
>
> 2019-04-05 12:07:29.734920 7f0fa5e284c0  1 mgrc service_daemon_register 
> rbd-mirror.admin metadata {arch=x86_64,ceph_version=ceph version 12.2.11 
> (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable),cpu=Intel(R) 
> Xeon(R) CPU E5-2690 v2 @ 3.00GHz,distro=ubuntu,distro_description=Ubuntu 
> 14.04.5 
> LTS,distro_version=14.04,hostname=nidcdvtier3a,instance_id=464360,kernel_description=#93
>  SMP Sat Jun 17 04:01:23 EDT 
> 2017,kernel_version=3.19.0-85-vtier,mem_swap_kb=67105788,mem_total_kb=131999112,os=Linux}
>
> 2019-04-05 12:07:29.735779 7f0fa5e284c0 20 rbd::mirror::Mirror: 
> 0x560200d27f90 run: enter
>
> 2019-04-05 12:07:29.735793 7f0fa5e284c0 20 
> rbd::mirror::ClusterWatcher:0x560200dcd930 refresh_pools: enter
>
> 2019-04-05 12:07:29.735809 7f0f77fff700 20 rbd::mirror::ImageDeleter: 
> 0x560200dcd9c0 run: enter
>
> 2019-04-05 12:07:29.735819 7f0f77fff700 20 rbd::mirror::ImageDeleter: 
> 0x560200dcd9c0 run: waiting for delete requests
>
> 2019-04-05 12:07:29.739019 7f0fa5e284c0 10 
> rbd::mirror::ClusterWatcher:0x560200dcd930 read_pool_peers: mirroring is 
> disabled for pool docnfs
>
> 2019-04-05 12:07:29.741090 7f0fa5e284c0 10 
> rbd::mirror::ClusterWatcher:0x560200dcd930 read_pool_peers: mirroring is 
> disabled for pool doccifs
>
> 2019-04-05 12:07:29.742620 7f0fa5e284c0 10 
> rbd::mirror::ClusterWatcher:0x560200dcd930 read_pool_peers: mirroring is 
> disabled for pool fcp-dr
>
> 2019-04-05 12:07:29.76 7f0fa5e284c0 10 
> rbd::mirror::ClusterWatcher:0x560200dcd930 read_pool_peers: mirroring is 
> disabled for pool cifs
>
> 2019-04-05 12:07:29.746958 7f0fa5e284c0 20 rbd::mirror::ServiceDaemon: 
> 0x560200d29bb0 add_pool: pool_id=8, pool_name=nfs
>
> 2019-04-05 12:07:29.748181 7f0fa5e284c0 20 rbd::mirror::Mirror: 
> 0x560200d27f90 update_pool_replayers: enter
>
> 2019-04-05 12:07:29.748212 7f0fa5e284c0 20 rbd::mirror::Mirror: 
> 0x560200d27f90 update_pool_replayers: starting pool replayer for uuid: 

Re: [ceph-users] Ceph Replication not working

2019-04-08 Thread Jason Dillaman
The log appears to be missing all the librbd log messages. The process
seems to stop at attempting to open the image from the remote cluster:

2019-04-05 12:07:29.992323 7f0f3bfff700 20
rbd::mirror::image_replayer::OpenImageRequest: 0x7f0f28018a20
send_open_image

Assuming you are using the default log file naming settings, the log
should be located at "/var/log/ceph/ceph-client.mirrorprod.log". Of
course, your cluster naming makes me wonder: since, from the DR site's
point of view, the primary cluster is named "ceph", have you changed
your "/etc/default/ceph" file to rename the local cluster from "ceph"
to "cephdr" so that the "rbd-mirror" daemon connects to the correct
local cluster?
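
On Debian/Ubuntu the packaged service scripts typically read the cluster
name from that file, so the DR-site rbd-mirror host would need something
like:

  # /etc/default/ceph on the DR-site rbd-mirror host
  CLUSTER=cephdr

With that in place the packaged service should pick up
/etc/ceph/cephdr.conf; running the daemon by hand with an explicit
"--cluster cephdr" achieves the same thing.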


On Fri, Apr 5, 2019 at 3:28 PM Vikas Rana  wrote:
>
> Hi Jason,
>
> 12.2.11 is the version.
>
> Attached is the complete log file.
>
> We removed the pool to make sure there's no image left on DR site and 
> recreated an empty pool.
>
> Thanks,
> -Vikas
>
> -Original Message-
> From: Jason Dillaman 
> Sent: Friday, April 5, 2019 2:24 PM
> To: Vikas Rana 
> Cc: ceph-users 
> Subject: Re: [ceph-users] Ceph Replication not working
>
> What is the version of the rbd-mirror daemon and your OSDs? It looks like it
> found two replicated images and got stuck on the "wait_for_deletion"
> step. Since I suspect those images haven't been deleted, it should have
> immediately proceeded to the next step of the image replay state machine. Are
> there any additional log messages after 2019-04-05 12:07:29.981203?
>
> On Fri, Apr 5, 2019 at 1:56 PM Vikas Rana  wrote:
> >
> > Hi there,
> >
> > We are trying to set up rbd-mirror replication, and after the setup
> > everything looks good, but images are not replicating.
> >
> >
> >
> > Can someone please help?
> >
> >
> >
> > Thanks,
> >
> > -Vikas
> >
> >
> >
> > root@remote:/var/log/ceph# rbd --cluster cephdr mirror pool info nfs
> >
> > Mode: pool
> >
> > Peers:
> >
> >   UUID NAME CLIENT
> >
> >   bcd54bc5-cd08-435f-a79a-357bce55011d ceph client.mirrorprod
> >
> >
> >
> > root@local:/etc/ceph# rbd  mirror pool info nfs
> >
> > Mode: pool
> >
> > Peers:
> >
> >   UUID NAME   CLIENT
> >
> >   612151cf-f70d-49d0-94e2-a7b850a53e4f cephdr client.mirrordr
> >
> >
> >
> >
> >
> > root@local:/etc/ceph# rbd info nfs/test01
> >
> > rbd image 'test01':
> >
> > size 102400 kB in 25 objects
> >
> > order 22 (4096 kB objects)
> >
> > block_name_prefix: rbd_data.11cd3c238e1f29
> >
> > format: 2
> >
> > features: layering, exclusive-lock, object-map, fast-diff,
> > deep-flatten, journaling
> >
> > flags:
> >
> > journal: 11cd3c238e1f29
> >
> > mirroring state: enabled
> >
> > mirroring global id: 06fbfe68-b7e4-4d3a-93b2-cd18c569f7f7
> >
> > mirroring primary: true
> >
> >
> >
> >
> >
> > root@remote:/var/log/ceph# rbd --cluster cephdr mirror pool status nfs
> > --verbose
> >
> > health: OK
> >
> > images: 0 total
> >
> >
> >
> > root@remote:/var/log/ceph# rbd info nfs/test01
> >
> > rbd: error opening image test01: (2) No such file or directory
> >
> >
> >
> >
> >
> > root@remote:/var/log/ceph# ceph -s --cluster cephdr
> >
> >   cluster:
> >
> > id: ade49174-1f84-4c3c-a93c-b293c3655c93
> >
> > health: HEALTH_WARN
> >
> > noout,nodeep-scrub flag(s) set
> >
> >
> >
> >   services:
> >
> > mon:3 daemons, quorum nidcdvtier1a,nidcdvtier2a,nidcdvtier3a
> >
> > mgr:nidcdvtier1a(active), standbys: nidcdvtier2a
> >
> > osd:12 osds: 12 up, 12 in
> >
> > flags noout,nodeep-scrub
> >
> > rbd-mirror: 1 daemon active
> >
> >
> >
> >   data:
> >
> > pools:   5 pools, 640 pgs
> >
> > objects: 1.32M objects, 5.03TiB
> >
> > usage:   10.1TiB used, 262TiB / 272TiB avail
> >
> > pgs: 640 active+clean
> >
> >
> >
> >   io:
> >
> > client:   170B/s rd, 0B/s wr, 0op/s rd, 0op/s wr
> >
> >
> >
> >
> >
>

Re: [ceph-users] Ceph Replication not working

2019-04-08 Thread Vikas Rana
Hi Jason,

On the Prod side we have the cluster named ceph, and on the DR side we
renamed it to cephdr.

Accordingly, we renamed ceph.conf to cephdr.conf on the DR side.

This setup used to work, but one day we tried promoting the DR side to verify
the replication, and since then it's been a nightmare. The resync didn't
work, so we eventually gave up and deleted the pool on the DR side to start
afresh.

We deleted and recreated the peer relationship also.

Is there any debugging we can do on the Prod or DR side to see where it's
stopping or waiting during "send_open_image"?

Rbd-mirror is running as "rbd-mirror --cluster=cephdr"


Thanks,
-Vikas

-Original Message-
From: Jason Dillaman  
Sent: Monday, April 8, 2019 9:30 AM
To: Vikas Rana 
Cc: ceph-users 
Subject: Re: [ceph-users] Ceph Replication not working

The log appears to be missing all the librbd log messages. The process seems to 
stop at attempting to open the image from the remote cluster:

2019-04-05 12:07:29.992323 7f0f3bfff700 20
rbd::mirror::image_replayer::OpenImageRequest: 0x7f0f28018a20 send_open_image

Assuming you are using the default log file naming settings, the log should be
located at "/var/log/ceph/ceph-client.mirrorprod.log". Of course, your cluster
naming makes me wonder: since, from the DR site's point of view, the primary
cluster is named "ceph", have you changed your "/etc/default/ceph" file to
rename the local cluster from "ceph" to "cephdr" so that the "rbd-mirror"
daemon connects to the correct local cluster?


On Fri, Apr 5, 2019 at 3:28 PM Vikas Rana  wrote:
>
> Hi Jason,
>
> 12.2.11 is the version.
>
> Attached is the complete log file.
>
> We removed the pool to make sure there's no image left on DR site and 
> recreated an empty pool.
>
> Thanks,
> -Vikas
>
> -Original Message-
> From: Jason Dillaman 
> Sent: Friday, April 5, 2019 2:24 PM
> To: Vikas Rana 
> Cc: ceph-users 
> Subject: Re: [ceph-users] Ceph Replication not working
>
> What is the version of the rbd-mirror daemon and your OSDs? It looks like it
> found two replicated images and got stuck on the "wait_for_deletion"
> step. Since I suspect those images haven't been deleted, it should have
> immediately proceeded to the next step of the image replay state machine. Are
> there any additional log messages after 2019-04-05 12:07:29.981203?
>
> On Fri, Apr 5, 2019 at 1:56 PM Vikas Rana  wrote:
> >
> > Hi there,
> >
> > We are trying to set up rbd-mirror replication, and after the setup
> > everything looks good, but images are not replicating.
> >
> >
> >
> > Can someone please help?
> >
> >
> >
> > Thanks,
> >
> > -Vikas
> >
> >
> >
> > root@remote:/var/log/ceph# rbd --cluster cephdr mirror pool info nfs
> >
> > Mode: pool
> >
> > Peers:
> >
> >   UUID NAME CLIENT
> >
> >   bcd54bc5-cd08-435f-a79a-357bce55011d ceph client.mirrorprod
> >
> >
> >
> > root@local:/etc/ceph# rbd  mirror pool info nfs
> >
> > Mode: pool
> >
> > Peers:
> >
> >   UUID NAME   CLIENT
> >
> >   612151cf-f70d-49d0-94e2-a7b850a53e4f cephdr client.mirrordr
> >
> >
> >
> >
> >
> > root@local:/etc/ceph# rbd info nfs/test01
> >
> > rbd image 'test01':
> >
> > size 102400 kB in 25 objects
> >
> > order 22 (4096 kB objects)
> >
> > block_name_prefix: rbd_data.11cd3c238e1f29
> >
> > format: 2
> >
> > features: layering, exclusive-lock, object-map, fast-diff, 
> > deep-flatten, journaling
> >
> > flags:
> >
> > journal: 11cd3c238e1f29
> >
> > mirroring state: enabled
> >
> > mirroring global id: 06fbfe68-b7e4-4d3a-93b2-cd18c569f7f7
> >
> > mirroring primary: true
> >
> >
> >
> >
> >
> > root@remote:/var/log/ceph# rbd --cluster cephdr mirror pool status 
> > nfs --verbose
> >
> > health: OK
> >
> > images: 0 total
> >
> >
> >
> > root@remote:/var/log/ceph# rbd info nfs/test01
> >
> > rbd: error opening image test01: (2) No such file or directory
> >
> >
> >
> >
> >
> > root@remote:/var/log/ceph# ceph -s --cluster cephdr
> >
> >   cluster:
> >
> > id: ade49174-1f84-4c3c-a93c-b293c3655c93
> >
> > health: HEALTH_WARN
> >
> > noout,nodeep-scrub flag(s) set
> >
>

Re: [ceph-users] Ceph Replication not working

2019-04-08 Thread Jason Dillaman
On Mon, Apr 8, 2019 at 9:47 AM Vikas Rana  wrote:
>
> Hi Jason,
>
> On the Prod side we have the cluster named ceph, and on the DR side we
> renamed it to cephdr.
>
> Accordingly, we renamed ceph.conf to cephdr.conf on the DR side.
>
> This setup used to work, but one day we tried promoting the DR side to verify
> the replication, and since then it's been a nightmare. The resync didn't
> work, so we eventually gave up and deleted the pool on the DR side to start
> afresh.
>
> We deleted and recreated the peer relationship also.
>
> Is there any debugging we can do on the Prod or DR side to see where it's
> stopping or waiting during "send_open_image"?

You need to add "debug rbd = 20" to both your ceph.conf and cephdr.conf
(if you haven't already), and you would need to provide the log
associated with the production cluster connection (see below). Also,
please use pastebin or a similar service to avoid mailing the logs to
the list.
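
A minimal sketch of what that stanza might look like on the rbd-mirror
host (the same block goes into both /etc/ceph/ceph.conf and
/etc/ceph/cephdr.conf, followed by a restart of the daemon; the "log
file" line just spells out the default location, and "debug rbd_mirror"
is optional extra detail on top of what was asked for):

  [client]
      debug rbd = 20
      debug rbd_mirror = 20
      log file = /var/log/ceph/$cluster-$name.log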

> Rbd-mirror is running as "rbd-mirror --cluster=cephdr"
>
>
> Thanks,
> -Vikas
>
> -Original Message-
> From: Jason Dillaman 
> Sent: Monday, April 8, 2019 9:30 AM
> To: Vikas Rana 
> Cc: ceph-users 
> Subject: Re: [ceph-users] Ceph Replication not working
>
> The log appears to be missing all the librbd log messages. The process seems 
> to stop at attempting to open the image from the remote cluster:
>
> 2019-04-05 12:07:29.992323 7f0f3bfff700 20
> rbd::mirror::image_replayer::OpenImageRequest: 0x7f0f28018a20 send_open_image
>
> Assuming you are using the default log file naming settings, the log should
> be located at "/var/log/ceph/ceph-client.mirrorprod.log". Of course, your
> cluster naming makes me wonder: since, from the DR site's point of view, the
> primary cluster is named "ceph", have you changed your "/etc/default/ceph"
> file to rename the local cluster from "ceph" to "cephdr" so that the
> "rbd-mirror" daemon connects to the correct local cluster?
>
>
> On Fri, Apr 5, 2019 at 3:28 PM Vikas Rana  wrote:
> >
> > Hi Jason,
> >
> > 12.2.11 is the version.
> >
> > Attached is the complete log file.
> >
> > We removed the pool to make sure there's no image left on DR site and 
> > recreated an empty pool.
> >
> > Thanks,
> > -Vikas
> >
> > -Original Message-
> > From: Jason Dillaman 
> > Sent: Friday, April 5, 2019 2:24 PM
> > To: Vikas Rana 
> > Cc: ceph-users 
> > Subject: Re: [ceph-users] Ceph Replication not working
> >
> > What is the version of the rbd-mirror daemon and your OSDs? It looks like
> > it found two replicated images and got stuck on the "wait_for_deletion"
> > step. Since I suspect those images haven't been deleted, it should have
> > immediately proceeded to the next step of the image replay state machine.
> > Are there any additional log messages after 2019-04-05 12:07:29.981203?
> >
> > On Fri, Apr 5, 2019 at 1:56 PM Vikas Rana  wrote:
> > >
> > > Hi there,
> > >
> > > We are trying to set up rbd-mirror replication, and after the setup
> > > everything looks good, but images are not replicating.
> > >
> > >
> > >
> > > Can someone please help?
> > >
> > >
> > >
> > > Thanks,
> > >
> > > -Vikas
> > >
> > >
> > >
> > > root@remote:/var/log/ceph# rbd --cluster cephdr mirror pool info nfs
> > >
> > > Mode: pool
> > >
> > > Peers:
> > >
> > >   UUID NAME CLIENT
> > >
> > >   bcd54bc5-cd08-435f-a79a-357bce55011d ceph client.mirrorprod
> > >
> > >
> > >
> > > root@local:/etc/ceph# rbd  mirror pool info nfs
> > >
> > > Mode: pool
> > >
> > > Peers:
> > >
> > >   UUID NAME   CLIENT
> > >
> > >   612151cf-f70d-49d0-94e2-a7b850a53e4f cephdr client.mirrordr
> > >
> > >
> > >
> > >
> > >
> > > root@local:/etc/ceph# rbd info nfs/test01
> > >
> > > rbd image 'test01':
> > >
> > > size 102400 kB in 25 objects
> > >
> > > order 22 (4096 kB objects)
> > >
> > > block_name_prefix: rbd_data.11cd3c238e1f29
> > >
> > > format: 2
> > >
> > > features: layering, exclusive-lock, object-map, fast-diff,
> > > deep-flatten, journaling
> > >