Hi list,

we have a production Hammer cluster for our OpenStack cloud, and a colleague recently added a cache tier consisting of 2 SSDs with a pool size of 2. We're still experimenting with this topic.

Now we have some hardware maintenance to do and need to shut down nodes, one at a time of course. To prevent data loss, we tried to flush/evict the cache pool and disable it, and we also set the cache-mode to "forward". Most of the objects were evicted successfully, but 39 objects are still left and cannot be evicted. I'm not sure how to verify that we can simply delete the cache pool without data loss; we want to set up the cache pool from scratch.

# rados -p images-cache ls
rbd_header.210f542ae8944a
volume-ce17068e-a36d-4d9b-9779-3af473aba033.rbd
rbd_header.50ec372eb141f2
931f9a1e-2022-4571-909e-6c3f5f8c3ae8_disk.rbd
rbd_header.59dd32ae8944a
...

There are only 3 types of objects in the cache-pool:
  - rbd_header
  - volume-XXX.rbd (obviously cinder related)
  - XXX_disk.rbd (nova disks)

All rbd_header objects have a size of 0 when I run a "stat" command on them; the rest have a size of 112. If I compare each object with its counterpart in the cold storage pool, the stat output is identical:

Object rbd_header.1128db1b5d2111:
images-cache/rbd_header.1128db1b5d2111 mtime 2017-08-21 15:55:26.000000, size 0
images/rbd_header.1128db1b5d2111 mtime 2017-08-21 15:55:26.000000, size 0

Object volume-fd07dd66-8a82-431c-99cf-9bfc3076af30.rbd:
images-cache/volume-fd07dd66-8a82-431c-99cf-9bfc3076af30.rbd mtime 2017-08-21 15:55:26.000000, size 112
images/volume-fd07dd66-8a82-431c-99cf-9bfc3076af30.rbd mtime 2017-08-21 15:55:26.000000, size 112

Object 2dcb9d7d-3a4f-49a4-8792-b4b74f5b60e5_disk.rbd:
images-cache/2dcb9d7d-3a4f-49a4-8792-b4b74f5b60e5_disk.rbd mtime 2017-08-21 15:55:25.000000, size 112
images/2dcb9d7d-3a4f-49a4-8792-b4b74f5b60e5_disk.rbd mtime 2017-08-21 15:55:25.000000, size 112
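
The comparison above could be scripted for all remaining objects roughly as follows. This is only a minimal sketch: the function name compare_pools is made up for illustration, and it relies solely on the "rados ls" and "rados stat" output format shown in this mail. Matching stat output only shows that size and mtime agree, not that the contents are identical.

```shell
#!/bin/sh
# Sketch: compare the "rados stat" output of every object left in the cache
# pool with the object of the same name in the backing pool.
# compare_pools is a hypothetical helper name, not part of any Ceph tool.

compare_pools() {
    cache_pool=$1
    base_pool=$2
    rados -p "$cache_pool" ls | while IFS= read -r obj; do
        # stat prints "<pool>/<object> mtime <date> <time>, size <n>".
        # </dev/null keeps rados from consuming the object list on stdin.
        cache_stat=$(rados -p "$cache_pool" stat "$obj" </dev/null 2>&1)
        base_stat=$(rados -p "$base_pool" stat "$obj" </dev/null 2>&1)
        # Strip the leading "<pool>/" so both lines become comparable.
        cache_stat=${cache_stat#"$cache_pool/"}
        base_stat=${base_stat#"$base_pool/"}
        if [ "$cache_stat" = "$base_stat" ]; then
            echo "SAME $obj"
        else
            echo "DIFF $obj"
        fi
    done
}

# Usage (assumes the pool names from this mail):
#   compare_pools images-cache images
```

Any line starting with DIFF would point at an object whose size or mtime differs between the two pools.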

Some of them have an rbd_lock, some have a watcher, and some have neither but still can't be evicted:

# rados -p images-cache lock list rbd_header.2207c92ae8944a
{"objname":"rbd_header.2207c92ae8944a","locks":[]}
# rados -p images-cache listwatchers rbd_header.2207c92ae8944a
#
# rados -p images-cache cache-evict rbd_header.2207c92ae8944a
error from cache-evict rbd_header.2207c92ae8944a: (16) Device or resource busy
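
To get an overview for all 39 objects at once, the two checks above can be wrapped in a loop. Again this is just a sketch under assumptions: report_blockers is a made-up name, and the lock check assumes that an object without locks prints "locks":[] exactly as in the output above.

```shell
#!/bin/sh
# Sketch: for each object in a pool, report whether "lock list" shows any
# locks and whether "listwatchers" prints any watcher.
# report_blockers is a hypothetical helper name, not part of any Ceph tool.

report_blockers() {
    pool=$1
    rados -p "$pool" ls | while IFS= read -r obj; do
        # </dev/null keeps rados from consuming the object list on stdin.
        locks=$(rados -p "$pool" lock list "$obj" </dev/null 2>&1)
        watchers=$(rados -p "$pool" listwatchers "$obj" </dev/null 2>&1)
        # An empty lock list looks like {"objname":"...","locks":[]}.
        # Anything else, including an error message, is reported as "yes".
        case "$locks" in
            *'"locks":[]'*) has_locks=no ;;
            *) has_locks=yes ;;
        esac
        # listwatchers prints nothing when there is no watcher.
        if [ -n "$watchers" ]; then
            has_watchers=yes
        else
            has_watchers=no
        fi
        echo "$obj locks=$has_locks watchers=$has_watchers"
    done
}

# Usage (assumes the pool name from this mail):
#   report_blockers images-cache
```

Objects reported with locks=no and watchers=no are the ones that fail to evict without an obvious holder.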

Then I also tried to shut down an instance that uses some of the volumes listed in the cache pool, but the objects didn't change at all, and the total number stayed at 39. For the rbd_header objects I don't even know how to identify their "owner". Is there a way?

Does anyone have a hint as to what else I could check? Or is it reasonable to assume that the objects really are the same and that no data would be lost if we deleted the pool?
We appreciate any help!

Regards,
Eugen

--
Eugen Block                             voice   : +49-40-559 51 75
NDE Netzdesign und -entwicklung AG      fax     : +49-40-559 51 77
Postfach 61 03 15
D-22423 Hamburg                         e-mail  : ebl...@nde.ag

        Vorsitzende des Aufsichtsrates: Angelika Mozdzen
          Sitz und Registergericht: Hamburg, HRB 90934
                  Vorstand: Jens-U. Mozdzen
                   USt-IdNr. DE 814 013 983

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
