Re: [ceph-users] ceph-backed VM drive became corrupted after unexpected VM termination

2017-11-07 Thread Дробышевский, Владимир
2017-11-07 19:06 GMT+05:00 Jason Dillaman :

> On Tue, Nov 7, 2017 at 8:55 AM, Дробышевский, Владимир 
> wrote:
> >
> > Oh, sorry, I forgot to mention that all OSDs use bluestore, so the xfs
> mount options don't have any influence.
> >
> > VMs have cache="none" by default; I've also tried "writethrough". No
> difference.
> >
> > And aren't these rbd cache options enabled by default?
>
> Yes, they are enabled by default. Note, however, that the QEMU cache
> options for the drive will override the Ceph configuration defaults.
>
> What specifically are you seeing in the guest OS when you state
> "corruption"?

The guest OS just can't mount its partitions and gets stuck at the initramfs
prompt. But I found the reason: the image stays locked forever after a
hypervisor crash. I hadn't connected the two.
I had already noticed that images stay locked while investigating the problem;
I don't know why I didn't try to unlock them before writing here :(
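
For anyone who runs into the same thing - a rough sketch of how a stale lock
can be inspected and removed with the rbd CLI (the image spec, lock id and
locker below are placeholders, not values from my cluster):

  # list locks currently held on the image
  rbd lock list one/vm-42-disk-0
  # remove the stale lock left behind by the crashed hypervisor
  rbd lock remove one/vm-42-disk-0 "auto 139643345791728" client.84135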

It was a permissions problem; I've changed them and now everything works
as it should.

Thanks a lot for your help!

And Nico, thank you for pointing out your thread: I found the correct
permissions there (in Jason's message). I'm going to open a PR with a fix
for the OpenNebula docs.
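
For reference, a minimal sketch of what Luminous-era client caps for RBD look
like (the client name and pool are placeholders, not the exact values from
Jason's message); the mon "profile rbd" part is what allows the client to
blacklist a dead peer and break its stale lock:

  ceph auth caps client.oneadmin \
    mon 'profile rbd' \
    osd 'profile rbd pool=one'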


> Assuming you haven't disabled barriers in your guest OS
> mount options and are using a journaled filesystem like ext4 or XFS,
> it should be sending proper flush requests to QEMU / librbd to ensure
> that it remains crash consistent. However, if you disable barriers or
> set a QEMU "cache=unsafe" option, these flush requests will not be
> sent and your data will most likely be corrupt after a hard failure.
>
> > 2017-11-07 18:45 GMT+05:00 Peter Maloney:
> >>
> >> I see nobarrier in there... Try without that. (unless that's just the
> bluestore xfs...then it probably won't change anything). And are the osds
> using bluestore?
> >>
> >> And what cache options did you set in the VM config? It's dangerous to
> set writeback without also setting this in the client-side ceph.conf:
> >>
> >> rbd cache writethrough until flush = true
> >> rbd_cache = true
> >>
> >>
> >>
> >>
> >> On 11/07/17 14:36, Дробышевский, Владимир wrote:
> >>
> >> Hello!
> >>
> >>   I've got a weird situation with rbd drive image reliability. I found
> that after a hard reset, a VM with a ceph rbd drive from my new cluster
> becomes corrupted. I accidentally found it during HA tests of my new cloud
> cluster: after a host reset the VM was not able to boot again because of
> virtual drive errors. The same thing happens if you just kill the qemu
> process (as would happen when the host crashes).
> >>
> >>   First of all I thought it was a guest OS problem. But then I tried
> RouterOS (Linux based), Linux, and FreeBSD - they all show the same behavior.
> >>   Then I blamed the OpenNebula installation. For the test's sake I
> installed the latest Proxmox (5.1-36) on another server. The first subtest:
> I created a VM in OpenNebula from a predefined image, shut it down, then
> created a Proxmox VM and pointed it to the image that OpenNebula had created.
> >> The second subtest: I made a clean install from ISO via the Proxmox
> console, having previously created the VM and drive image from Proxmox
> (of course, on the same ceph pool).
> >>   Both results: unbootable VMs.
> >>
> >>   Finally I made a clean install to a fresh VM with a local LVM-backed
> drive image. And - guess what? - it survived the qemu process kill.
> >>
> >>   This is the first situation of this kind in my practice, so I would
> like to ask for guidance. I believe it is a cache problem of some kind,
> but I haven't faced it with earlier releases.
> >>
> >>   Some cluster details:
> >>
> >>   It's a small test cluster with 4 nodes, each has:
> >>
> >>   2x CPU E5-2665,
> >>   128GB RAM
> >>   1 OSD with Samsung sm863 1.92TB drive
> >>   IB connection with IPoIB on QDR IB network
> >>
> >>   OS: Ubuntu 16.04 with 4.10 kernel
> >>   ceph: luminous 12.2.1
> >>
> >>   Client (kvm host) OSes:
> >>   1. Ubuntu 16.04 (the same hosts as ceph cluster)
> >>   2. Debian 9.1 in case of Proxmox
> >>
> >>
> >> ceph.conf:
> >>
> >> [global]
> >> fsid = 6a8ffc55-fa2e-48dc-a71c-647e1fff749b
> >>
> >> public_network = 10.103.0.0/16
> >> cluster_network = 10.104.0.0/16
> >>
> >> mon_initial_members = e001n01, e001n02, e001n03
> >> mon_host = 10.103.0.1,10.103.0.2,10.103.0.3
> >>
> >> rbd default format = 2
> >>
> >> auth_cluster_required = cephx
> >> auth_service_required = cephx
> >> auth_client_required = cephx
> >>
> >> osd mount options = rw,noexec,nodev,noatime,nodiratime,nobarrier
> >> osd mount options xfs = rw,noexec,nodev,noatime,nodiratime,nobarrier
> >> osd_mkfs_type = xfs
> >>
> >> bluestore fsck on mount = true
> >>
> >> debug_lockdep = 0/0
> >> debug_context = 0/0
> >> debug_crush = 0/0
> >> debug_buffer = 0/0
> >> debug_timer = 0/0
> >> debug_filer = 0/0
> >> debug_objecter = 0/0
> >> debug_rados = 0/0
> >> debug_rbd = 

Re: [ceph-users] ceph-backed VM drive became corrupted after unexpected VM termination

2017-11-07 Thread Jason Dillaman
On Tue, Nov 7, 2017 at 8:55 AM, Дробышевский, Владимир  wrote:
>
> Oh, sorry, I forgot to mention that all OSDs use bluestore, so the xfs
> mount options don't have any influence.
>
> VMs have cache="none" by default; I've also tried "writethrough". No
> difference.
>
> And aren't these rbd cache options enabled by default?

Yes, they are enabled by default. Note, however, that the QEMU cache
options for the drive will override the Ceph configuration defaults.
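
As an illustration (the image spec and cephx user are placeholders), the
cache= parameter on the QEMU drive definition is the one that takes effect:

  qemu-system-x86_64 ... \
    -drive format=raw,if=virtio,cache=none,file=rbd:one/vm-42-disk-0:id=oneadmin:conf=/etc/ceph/ceph.conf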

What specifically are you seeing in the guest OS when you state
"corruption"? Assuming you haven't disabled barriers in your guest OS
mount options and are using a journaled filesystem like ext4 or XFS,
it should be sending proper flush requests to QEMU / librbd to ensure
that it remains crash consistent. However, if you disable barriers or
set a QEMU "cache=unsafe" option, these flush requests will not be
sent and your data will most likely be corrupt after a hard failure.
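
To illustrate the guest-side half of that (device and filesystem are just
examples), barriers are controlled by the mount options inside the guest:

  # /etc/fstab inside the guest - write barriers are on by default for ext4/XFS
  /dev/vda1  /  ext4  defaults             0  1
  # disabling them trades crash consistency for speed - avoid on VM images
  # /dev/vda1  /  ext4  defaults,nobarrier  0  1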

> 2017-11-07 18:45 GMT+05:00 Peter Maloney :
>>
>> I see nobarrier in there... Try without that. (unless that's just the 
>> bluestore xfs...then it probably won't change anything). And are the osds 
>> using bluestore?
>>
>> And what cache options did you set in the VM config? It's dangerous to set
>> writeback without also setting this in the client-side ceph.conf:
>>
>> rbd cache writethrough until flush = true
>> rbd_cache = true
>>
>>
>>
>>
>> On 11/07/17 14:36, Дробышевский, Владимир wrote:
>>
>> Hello!
>>
>>   I've got a weird situation with rbd drive image reliability. I found that
>> after a hard reset, a VM with a ceph rbd drive from my new cluster becomes
>> corrupted. I accidentally found it during HA tests of my new cloud cluster:
>> after a host reset the VM was not able to boot again because of virtual drive
>> errors. The same thing happens if you just kill the qemu process (as would
>> happen when the host crashes).
>>
>>   First of all I thought it was a guest OS problem. But then I tried RouterOS
>> (Linux based), Linux, and FreeBSD - they all show the same behavior.
>>   Then I blamed the OpenNebula installation. For the test's sake I installed
>> the latest Proxmox (5.1-36) on another server. The first subtest: I created
>> a VM in OpenNebula from a predefined image, shut it down, then created a
>> Proxmox VM and pointed it to the image that OpenNebula had created.
>> The second subtest: I made a clean install from ISO via the Proxmox
>> console, having previously created the VM and drive image from Proxmox (of
>> course, on the same ceph pool).
>>   Both results: unbootable VMs.
>>
>>   Finally I made a clean install to a fresh VM with a local LVM-backed
>> drive image. And - guess what? - it survived the qemu process kill.
>>
>>   This is the first situation of this kind in my practice, so I would like to
>> ask for guidance. I believe it is a cache problem of some kind, but I
>> haven't faced it with earlier releases.
>>
>>   Some cluster details:
>>
>>   It's a small test cluster with 4 nodes, each has:
>>
>>   2x CPU E5-2665,
>>   128GB RAM
>>   1 OSD with Samsung sm863 1.92TB drive
>>   IB connection with IPoIB on QDR IB network
>>
>>   OS: Ubuntu 16.04 with 4.10 kernel
>>   ceph: luminous 12.2.1
>>
>>   Client (kvm host) OSes:
>>   1. Ubuntu 16.04 (the same hosts as ceph cluster)
>>   2. Debian 9.1 in case of Proxmox
>>
>>
>> ceph.conf:
>>
>> [global]
>> fsid = 6a8ffc55-fa2e-48dc-a71c-647e1fff749b
>>
>> public_network = 10.103.0.0/16
>> cluster_network = 10.104.0.0/16
>>
>> mon_initial_members = e001n01, e001n02, e001n03
>> mon_host = 10.103.0.1,10.103.0.2,10.103.0.3
>>
>> rbd default format = 2
>>
>> auth_cluster_required = cephx
>> auth_service_required = cephx
>> auth_client_required = cephx
>>
>> osd mount options = rw,noexec,nodev,noatime,nodiratime,nobarrier
>> osd mount options xfs = rw,noexec,nodev,noatime,nodiratime,nobarrier
>> osd_mkfs_type = xfs
>>
>> bluestore fsck on mount = true
>>
>> debug_lockdep = 0/0
>> debug_context = 0/0
>> debug_crush = 0/0
>> debug_buffer = 0/0
>> debug_timer = 0/0
>> debug_filer = 0/0
>> debug_objecter = 0/0
>> debug_rados = 0/0
>> debug_rbd = 0/0
>> debug_journaler = 0/0
>> debug_objectcatcher = 0/0
>> debug_client = 0/0
>> debug_osd = 0/0
>> debug_optracker = 0/0
>> debug_objclass = 0/0
>> debug_filestore = 0/0
>> debug_journal = 0/0
>> debug_ms = 0/0
>> debug_monc = 0/0
>> debug_tp = 0/0
>> debug_auth = 0/0
>> debug_finisher = 0/0
>> debug_heartbeatmap = 0/0
>> debug_perfcounter = 0/0
>> debug_asok = 0/0
>> debug_throttle = 0/0
>> debug_mon = 0/0
>> debug_paxos = 0/0
>> debug_rgw = 0/0
>>
>> [osd]
>> osd op threads = 4
>> osd disk threads = 2
>> osd max backfills = 1
>> osd recovery threads = 1
>> osd recovery max active = 1
>>
>> --
>>
>> Best regards,
>> Vladimir
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>> --
>>
>> 

Re: [ceph-users] ceph-backed VM drive became corrupted after unexpected VM termination

2017-11-07 Thread Дробышевский, Владимир
Oh, sorry, I forgot to mention that all OSDs use bluestore, so the xfs
mount options don't have any influence.

VMs have cache="none" by default; I've also tried "writethrough". No
difference.

And aren't these rbd cache options enabled by default?
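
In case it helps: one way to see which values librbd actually ends up using
is to give clients an admin socket and query it while the VM is running (the
socket path below is only an example):

  # client-side ceph.conf on the KVM host
  [client]
  admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok

  # then query the socket belonging to the qemu process in question
  ceph --admin-daemon /var/run/ceph/ceph-client.admin.12345.94027706046512.asok \
    config show | grep rbd_cache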




2017-11-07 18:45 GMT+05:00 Peter Maloney :

> I see nobarrier in there... Try without that. (unless that's just the
> bluestore xfs...then it probably won't change anything). And are the osds
> using bluestore?
>
> And what cache options did you set in the VM config? It's dangerous to set
> writeback without also setting this in the client-side ceph.conf:
>
> rbd cache writethrough until flush = true
> rbd_cache = true
>
>
>
>
> On 11/07/17 14:36, Дробышевский, Владимир wrote:
>
> Hello!
>
>   I've got a weird situation with rbd drive image reliability. I found
> that after a hard reset, a VM with a ceph rbd drive from my new cluster
> becomes corrupted. I accidentally found it during HA tests of my new cloud
> cluster: after a host reset the VM was not able to boot again because of
> virtual drive errors. The same thing happens if you just kill the qemu
> process (as would happen when the host crashes).
>
>   First of all I thought it was a guest OS problem. But then I tried
> RouterOS (Linux based), Linux, and FreeBSD - they all show the same
> behavior.
>   Then I blamed the OpenNebula installation. For the test's sake I installed
> the latest Proxmox (5.1-36) on another server. The first subtest: I created
> a VM in OpenNebula from a predefined image, shut it down, then created a
> Proxmox VM and pointed it to the image that OpenNebula had created.
> The second subtest: I made a clean install from ISO via the Proxmox
> console, having previously created the VM and drive image from Proxmox (of
> course, on the same ceph pool).
>   Both results: unbootable VMs.
>
>   Finally I made a clean install to a fresh VM with a local LVM-backed
> drive image. And - guess what? - it survived the qemu process kill.
>
>   This is the first situation of this kind in my practice, so I would like
> to ask for guidance. I believe it is a cache problem of some kind, but
> I haven't faced it with earlier releases.
>
>   Some cluster details:
>
>   It's a small test cluster with 4 nodes, each has:
>
>   2x CPU E5-2665,
>   128GB RAM
>   1 OSD with Samsung sm863 1.92TB drive
>   IB connection with IPoIB on QDR IB network
>
>   OS: Ubuntu 16.04 with 4.10 kernel
>   ceph: luminous 12.2.1
>
>   Client (kvm host) OSes:
>   1. Ubuntu 16.04 (the same hosts as ceph cluster)
>   2. Debian 9.1 in case of Proxmox
>
>
> *ceph.conf:*
>
> [global]
> fsid = 6a8ffc55-fa2e-48dc-a71c-647e1fff749b
>
> public_network = 10.103.0.0/16
> cluster_network = 10.104.0.0/16
>
> mon_initial_members = e001n01, e001n02, e001n03
> mon_host = 10.103.0.1,10.103.0.2,10.103.0.3
>
> rbd default format = 2
>
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
>
> osd mount options = rw,noexec,nodev,noatime,nodiratime,nobarrier
> osd mount options xfs = rw,noexec,nodev,noatime,nodiratime,nobarrier
> osd_mkfs_type = xfs
>
> bluestore fsck on mount = true
>
> debug_lockdep = 0/0
> debug_context = 0/0
> debug_crush = 0/0
> debug_buffer = 0/0
> debug_timer = 0/0
> debug_filer = 0/0
> debug_objecter = 0/0
> debug_rados = 0/0
> debug_rbd = 0/0
> debug_journaler = 0/0
> debug_objectcatcher = 0/0
> debug_client = 0/0
> debug_osd = 0/0
> debug_optracker = 0/0
> debug_objclass = 0/0
> debug_filestore = 0/0
> debug_journal = 0/0
> debug_ms = 0/0
> debug_monc = 0/0
> debug_tp = 0/0
> debug_auth = 0/0
> debug_finisher = 0/0
> debug_heartbeatmap = 0/0
> debug_perfcounter = 0/0
> debug_asok = 0/0
> debug_throttle = 0/0
> debug_mon = 0/0
> debug_paxos = 0/0
> debug_rgw = 0/0
>
> [osd]
> osd op threads = 4
> osd disk threads = 2
> osd max backfills = 1
> osd recovery threads = 1
> osd recovery max active = 1
>
> --
>
> Best regards,
> Vladimir
>
>
> ___
> ceph-users mailing list
> ceph-us...@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> --
>
> 
> Peter Maloney
> Brockmann Consult
> Max-Planck-Str. 2
> 21502 Geesthacht
> Germany
> Tel: +49 4152 889 300
> Fax: +49 4152 889 333
> E-mail: peter.malo...@brockmann-consult.de
> Internet: http://www.brockmann-consult.de
> 
>
>


-- 

Best regards,
Vladimir
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-backed VM drive became corrupted after unexpected VM termination

2017-11-07 Thread Peter Maloney
I see nobarrier in there... Try without that. (unless that's just the
bluestore xfs...then it probably won't change anything). And are the
osds using bluestore?

And what cache options did you set in the VM config? It's dangerous to
set writeback without also setting this in the client-side ceph.conf:

rbd cache writethrough until flush = true
rbd_cache = true
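
In case it isn't obvious where those two lines go - a minimal sketch of the
client-side config (the path is the usual default):

  # /etc/ceph/ceph.conf on the hypervisor
  [client]
  rbd cache = true
  rbd cache writethrough until flush = true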



On 11/07/17 14:36, Дробышевский, Владимир wrote:
> Hello!
>
>   I've got a weird situation with rbd drive image reliability. I found
> that after a hard reset, a VM with a ceph rbd drive from my new cluster
> becomes corrupted. I accidentally found it during HA tests of my new cloud
> cluster: after a host reset the VM was not able to boot again because of
> virtual drive errors. The same thing happens if you just kill the qemu
> process (as would happen when the host crashes).
>
>   First of all I thought it was a guest OS problem. But then I tried
> RouterOS (Linux based), Linux, and FreeBSD - they all show the same
> behavior.
>   Then I blamed the OpenNebula installation. For the test's sake I
> installed the latest Proxmox (5.1-36) on another server. The first
> subtest: I created a VM in OpenNebula from a predefined image, shut
> it down, then created a Proxmox VM and pointed it to the image that
> OpenNebula had created.
> The second subtest: I made a clean install from ISO via the Proxmox
> console, having previously created the VM and drive image from Proxmox
> (of course, on the same ceph pool).
>   Both results: unbootable VMs.
>
>   Finally I made a clean install to a fresh VM with a local LVM-backed
> drive image. And - guess what? - it survived the qemu process kill.
>
>   This is the first situation of this kind in my practice, so I would
> like to ask for guidance. I believe it is a cache problem of some kind,
> but I haven't faced it with earlier releases.
>
>   Some cluster details:
>
>   It's a small test cluster with 4 nodes, each has:
>
>   2x CPU E5-2665,
>   128GB RAM
>   1 OSD with Samsung sm863 1.92TB drive
>   IB connection with IPoIB on QDR IB network
>
>   OS: Ubuntu 16.04 with 4.10 kernel
>   ceph: luminous 12.2.1
>
>   Client (kvm host) OSes: 
>   1. Ubuntu 16.04 (the same hosts as ceph cluster)
>   2. Debian 9.1 in case of Proxmox
>
>
> *ceph.conf:*
>
> [global]
> fsid = 6a8ffc55-fa2e-48dc-a71c-647e1fff749b
>
> public_network = 10.103.0.0/16 
> cluster_network = 10.104.0.0/16 
>
> mon_initial_members = e001n01, e001n02, e001n03
> mon_host = 10.103.0.1,10.103.0.2,10.103.0.3
>
> rbd default format = 2
>
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
>
> osd mount options = rw,noexec,nodev,noatime,nodiratime,nobarrier
> osd mount options xfs = rw,noexec,nodev,noatime,nodiratime,nobarrier
> osd_mkfs_type = xfs
>
> bluestore fsck on mount = true
>   
> debug_lockdep = 0/0
> debug_context = 0/0
> debug_crush = 0/0
> debug_buffer = 0/0
> debug_timer = 0/0
> debug_filer = 0/0
> debug_objecter = 0/0
> debug_rados = 0/0
> debug_rbd = 0/0
> debug_journaler = 0/0
> debug_objectcatcher = 0/0
> debug_client = 0/0
> debug_osd = 0/0
> debug_optracker = 0/0
> debug_objclass = 0/0
> debug_filestore = 0/0
> debug_journal = 0/0
> debug_ms = 0/0
> debug_monc = 0/0
> debug_tp = 0/0
> debug_auth = 0/0
> debug_finisher = 0/0
> debug_heartbeatmap = 0/0
> debug_perfcounter = 0/0
> debug_asok = 0/0
> debug_throttle = 0/0
> debug_mon = 0/0
> debug_paxos = 0/0
> debug_rgw = 0/0
>
> [osd]
> osd op threads = 4
> osd disk threads = 2
> osd max backfills = 1
> osd recovery threads = 1
> osd recovery max active = 1
>
> -- 
>
> Best regards,
> Vladimir
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


-- 


Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.malo...@brockmann-consult.de
Internet: http://www.brockmann-consult.de


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-backed VM drive became corrupted after unexpected VM termination

2017-11-07 Thread Дробышевский, Владимир
Hello!

  I've got a weird situation with rbd drive image reliability. I found that
after a hard reset, a VM with a ceph rbd drive from my new cluster becomes
corrupted. I accidentally found it during HA tests of my new cloud cluster:
after a host reset the VM was not able to boot again because of virtual drive
errors. The same thing happens if you just kill the qemu process (as would
happen when the host crashes).

  First of all I thought it was a guest OS problem. But then I tried
RouterOS (Linux based), Linux, and FreeBSD - they all show the same
behavior.
  Then I blamed the OpenNebula installation. For the test's sake I installed
the latest Proxmox (5.1-36) on another server. The first subtest: I created
a VM in OpenNebula from a predefined image, shut it down, then created a
Proxmox VM and pointed it to the image that OpenNebula had created.
The second subtest: I made a clean install from ISO via the Proxmox
console, having previously created the VM and drive image from Proxmox (of
course, on the same ceph pool).
  Both results: unbootable VMs.

  Finally I made a clean install to a fresh VM with a local LVM-backed
drive image. And - guess what? - it survived the qemu process kill.
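
The "kill test" above was essentially this, run on the hypervisor (the
process name pattern is just an example):

  pgrep -af qemu-system    # note the PID of the test VM's qemu process
  kill -9 <PID>            # SIGKILL, so qemu gets no chance to flush anything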

  This is the first situation of this kind in my practice, so I would like
to ask for guidance. I believe it is a cache problem of some kind, but
I haven't faced it with earlier releases.

  Some cluster details:

  It's a small test cluster with 4 nodes, each has:

  2x CPU E5-2665,
  128GB RAM
  1 OSD with Samsung sm863 1.92TB drive
  IB connection with IPoIB on QDR IB network

  OS: Ubuntu 16.04 with 4.10 kernel
  ceph: luminous 12.2.1

  Client (kvm host) OSes:
  1. Ubuntu 16.04 (the same hosts as ceph cluster)
  2. Debian 9.1 in case of Proxmox


*ceph.conf:*

[global]
fsid = 6a8ffc55-fa2e-48dc-a71c-647e1fff749b

public_network = 10.103.0.0/16
cluster_network = 10.104.0.0/16

mon_initial_members = e001n01, e001n02, e001n03
mon_host = 10.103.0.1,10.103.0.2,10.103.0.3

rbd default format = 2

auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

osd mount options = rw,noexec,nodev,noatime,nodiratime,nobarrier
osd mount options xfs = rw,noexec,nodev,noatime,nodiratime,nobarrier
osd_mkfs_type = xfs

bluestore fsck on mount = true

debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_filer = 0/0
debug_objecter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_journaler = 0/0
debug_objectcatcher = 0/0
debug_client = 0/0
debug_osd = 0/0
debug_optracker = 0/0
debug_objclass = 0/0
debug_filestore = 0/0
debug_journal = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0
debug_mon = 0/0
debug_paxos = 0/0
debug_rgw = 0/0

[osd]
osd op threads = 4
osd disk threads = 2
osd max backfills = 1
osd recovery threads = 1
osd recovery max active = 1

-- 

Best regards,
Vladimir
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com