Hi,

rbd_id.vm-100-disk-1 is only a "meta object"; IIRC, its contents will get
you a "prefix", which then points you to rbd_header.<prefix>.
rbd_header.<prefix> contains block size, striping, etc. The actual
data-bearing objects will be named something like rbd_data.<prefix>.%016x.
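
If you want to check for yourself, something along these lines should work
(assuming the image lives in the pool named 'rbd', as in your commands below):

# rbd info rbd/vm-100-disk-1 | grep block_name_prefix
# rados -p rbd ls | grep 86ce2ae8944a

The first command prints the rbd_data prefix for the image, the second lists
the header and data objects carrying that prefix.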

Example - vm-100-disk-1 has the prefix 86ce2ae8944a, so the first <block size>
of that image will be named rbd_data.86ce2ae8944a.0000000000000000, the second
<block size> will be rbd_data.86ce2ae8944a.0000000000000001, and so on. Chances
are that one of these objects is mapped to a pg which has both host3 and host4
among its replicas.
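
You can map each of those objects to its pg and acting OSDs to see which hosts
hold it (again assuming the pool is named 'rbd'):

# rados -p rbd ls | grep '^rbd_data.86ce2ae8944a.' | while read obj; do ceph osd map rbd "$obj"; done

Any object whose up/acting set contains OSDs from both host3 (osd.8-11) and
host4 (osd.12-13) is a candidate for blocking your VM's I/O when both of those
hosts are down.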

An rbd image will end up scattered across most/all OSDs of the pool it's in,
so with host3 and host4 both down, some of those pgs will likely be left with
only one replica up. Assuming min_size is still at its default of 2 for a
size-3 pool, those pgs stop serving I/O until a second replica comes back,
which is why the VM hangs.
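
To confirm that's what's happening, you could check min_size and look at the pg
states while host3 and host4 are down:

# ceph osd pool get rbd min_size
# ceph health detail
# ceph pg dump_stuck inactive

pgs reported as undersized/inactive (i.e. fewer than min_size replicas up) are
the ones blocking I/O.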

Cheers,
-KJ

On Fri, Mar 17, 2017 at 12:30 PM, Adam Carheden <carhe...@ucar.edu> wrote:

> I have a 4 node cluster shown by `ceph osd tree` below. Monitors are
> running on hosts 1, 2 and 3. It has a single replicated pool of size
> 3. I have a VM with its hard drive replicated to OSDs 11(host3),
> 5(host1) and 3(host2).
>
> I can 'fail' any one host by disabling the SAN network interface and
> the VM keeps running with a simple slowdown in I/O performance just as
> expected. However, if I 'fail' both hosts 3 and 4, I/O hangs on the VM.
> (i.e. `df` never completes, etc.) The monitors on hosts 1 and 2 still
> have quorum, so that shouldn't be an issue. The placement group still
> has 2 of its 3 replicas online.
>
> Why does I/O hang even though host4 isn't running a monitor and
> doesn't have anything to do with my VM's hard drive?
>
>
> Size?
> # ceph osd pool get rbd size
> size: 3
>
> Where's rbd_id.vm-100-disk-1?
> # ceph osd getmap -o /tmp/map && osdmaptool --pool 0 --test-map-object
> rbd_id.vm-100-disk-1 /tmp/map
> got osdmap epoch 1043
> osdmaptool: osdmap file '/tmp/map'
>  object 'rbd_id.vm-100-disk-1' -> 0.1ea -> [11,5,3]
>
> # ceph osd tree
> ID WEIGHT  TYPE NAME          UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 8.06160 root default
> -7 5.50308     room A
> -3 1.88754         host host1
>  4 0.40369             osd.4       up  1.00000          1.00000
>  5 0.40369             osd.5       up  1.00000          1.00000
>  6 0.54008             osd.6       up  1.00000          1.00000
>  7 0.54008             osd.7       up  1.00000          1.00000
> -2 3.61554         host host2
>  0 0.90388             osd.0       up  1.00000          1.00000
>  1 0.90388             osd.1       up  1.00000          1.00000
>  2 0.90388             osd.2       up  1.00000          1.00000
>  3 0.90388             osd.3       up  1.00000          1.00000
> -6 2.55852     room B
> -4 1.75114         host host3
>  8 0.40369             osd.8       up  1.00000          1.00000
>  9 0.40369             osd.9       up  1.00000          1.00000
> 10 0.40369             osd.10      up  1.00000          1.00000
> 11 0.54008             osd.11      up  1.00000          1.00000
> -5 0.80737         host host4
> 12 0.40369             osd.12      up  1.00000          1.00000
> 13 0.40369             osd.13      up  1.00000          1.00000
>
>
> --
> Adam Carheden
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Kjetil Joergensen <kje...@medallia.com>
SRE, Medallia Inc
Phone: +1 (650) 739-6580
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
