On 29.12.21 at 13:51, Сергей Цаболов wrote:
> Hi, Uwe
>
> On 29.12.2021 at 14:16, Uwe Sauter wrote:
>> Just a feeling, but I'd say that the imbalance in OSDs (one host having
>> many more disks than the rest) is your problem.
>
> Yes, the last node in the cluster has more disks than the rest, but
>
> one disk is 12TB and the other 9 HDDs are 1TB each.
>
>>
>> Assuming that your configuration keeps 3 copies of each VM image, the
>> imbalance probably means that 2 of these 3 copies reside on pve-3111.
>> If this host is unavailable, all VM images with 2 copies on that host
>> become unresponsive, too.
>
> In the Proxmox web UI I set the Ceph pool to Size: 2, Min. Size: 2.

This means you want 2 copies in the regular case (size) and also require 2
copies in the failure case (min size). Since min size is the number of copies
that must be available before Ceph serves I/O at all, a pool with size=2 and
min size=2 stops serving I/O as soon as a single copy is offline, which is
why your VMs become unreachable. You might solve your problem by decreasing
min size to 1 (dangerous!!) or by increasing size to 3, which means that in
the regular case you will have 3 copies, but if only 2 are available it will
still work and re-sync the 3rd copy once the host comes online again.
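If you go for size=3, it can be done from any node's shell, roughly like
this (a sketch, untested against your cluster; vm.pool is the pool name
from your output):

    # check the current replication settings of the pool
    ceph osd pool get vm.pool size
    ceph osd pool get vm.pool min_size

    # keep 3 copies, but keep serving I/O while only 2 are reachable
    ceph osd pool set vm.pool size 3
    ceph osd pool set vm.pool min_size 2

Be aware that size=3 needs about 50% more raw capacity for that pool, and
the cluster will backfill the third copies right after the change.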
> With ceph osd map vm.pool <object-name> (the VM ID) I can see that for
> some VM objects one copy is on osd.12, for example:
>
> osdmap e14321 pool 'vm.pool' (2) object '114' -> pg 2.10486407 (2.7) ->
> up ([12,8], p12) acting ([12,8], p12)
>
> But this example:
>
> osdmap e14321 pool 'vm.pool' (2) object '113' -> pg 2.8bd09f6d (2.36d) ->
> up ([10,7], p10) acting ([10,7], p10)
>
> is on osd.10 and osd.7.
>
>>
>> Check your failure domain for Ceph and possibly change it from OSD to
>> host. This should prevent one host from holding multiple copies of a
>> VM image.
>
> I didn't quite understand what to check.
>
> Can you explain it to me with an example?
>

I don't have a ready-made example, but you can read about the concept at:
https://docs.ceph.com/en/latest/rados/operations/crush-map/#crush-maps
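In rough strokes it would look something like this (an untested sketch;
replicated_host is just a placeholder name for the new rule):

    # see which CRUSH rule the pool uses and inspect its failure domain
    ceph osd pool get vm.pool crush_rule
    ceph osd crush rule dump

    # create a replicated rule whose failure domain is the host ...
    ceph osd crush rule create-replicated replicated_host default host

    # ... and switch the pool over (this will trigger data movement)
    ceph osd pool set vm.pool crush_rule replicated_host

In the rule dump, look at the chooseleaf step: "type": "osd" allows two
copies of an object to land on the same host, while "type": "host" forces
them onto different hosts.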
Regards,

Uwe

>
>>
>> Regards,
>>
>> Uwe
>>
>> On 29.12.21 at 09:36, Сергей Цаболов wrote:
>>> Hello to all.
>>>
>>> In my case I have a 7-node Proxmox cluster and a working Ceph
>>> (ceph version 15.2.15 octopus (stable)).
>>>
>>> Ceph is HEALTH_OK:
>>>
>>> ceph -s
>>>   cluster:
>>>     id:     9662e3fa-4ce6-41df-8d74-5deaa41a8dde
>>>     health: HEALTH_OK
>>>
>>>   services:
>>>     mon: 7 daemons, quorum
>>>          pve-3105,pve-3107,pve-3108,pve-3103,pve-3101,pve-3111,pve-3109 (age 17h)
>>>     mgr: pve-3107(active, since 41h), standbys: pve-3109, pve-3103,
>>>          pve-3105, pve-3101, pve-3111, pve-3108
>>>     mds: cephfs:1 {0=pve-3105=up:active} 6 up:standby
>>>     osd: 22 osds: 22 up (since 17h), 22 in (since 17h)
>>>
>>>   task status:
>>>
>>>   data:
>>>     pools:   4 pools, 1089 pgs
>>>     objects: 1.09M objects, 4.1 TiB
>>>     usage:   7.7 TiB used, 99 TiB / 106 TiB avail
>>>     pgs:     1089 active+clean
>>>
>>> ---------------------------------------------------------------
>>>
>>> ceph osd tree
>>>
>>> ID   CLASS  WEIGHT     TYPE NAME          STATUS  REWEIGHT  PRI-AFF
>>>  -1         106.43005  root default
>>> -13          14.55478      host pve-3101
>>>  10    hdd    7.27739          osd.10         up   1.00000  1.00000
>>>  11    hdd    7.27739          osd.11         up   1.00000  1.00000
>>> -11          14.55478      host pve-3103
>>>   8    hdd    7.27739          osd.8          up   1.00000  1.00000
>>>   9    hdd    7.27739          osd.9          up   1.00000  1.00000
>>>  -3          14.55478      host pve-3105
>>>   0    hdd    7.27739          osd.0          up   1.00000  1.00000
>>>   1    hdd    7.27739          osd.1          up   1.00000  1.00000
>>>  -5          14.55478      host pve-3107
>>>   2    hdd    7.27739          osd.2          up   1.00000  1.00000
>>>   3    hdd    7.27739          osd.3          up   1.00000  1.00000
>>>  -9          14.55478      host pve-3108
>>>   6    hdd    7.27739          osd.6          up   1.00000  1.00000
>>>   7    hdd    7.27739          osd.7          up   1.00000  1.00000
>>>  -7          14.55478      host pve-3109
>>>   4    hdd    7.27739          osd.4          up   1.00000  1.00000
>>>   5    hdd    7.27739          osd.5          up   1.00000  1.00000
>>> -15          19.10138      host pve-3111
>>>  12    hdd   10.91409          osd.12         up   1.00000  1.00000
>>>  13    hdd    0.90970          osd.13         up   1.00000  1.00000
>>>  14    hdd    0.90970          osd.14         up   1.00000  1.00000
>>>  15    hdd    0.90970          osd.15         up   1.00000  1.00000
>>>  16    hdd    0.90970          osd.16         up   1.00000  1.00000
>>>  17    hdd    0.90970          osd.17         up   1.00000  1.00000
>>>  18    hdd    0.90970          osd.18         up   1.00000  1.00000
>>>  19    hdd    0.90970          osd.19         up   1.00000  1.00000
>>>  20    hdd    0.90970          osd.20         up   1.00000  1.00000
>>>  21    hdd    0.90970          osd.21         up   1.00000  1.00000
>>>
>>> ---------------------------------------------------------------
>>>
>>> POOL     ID  PGS   STORED   OBJECTS  USED     %USED  MAX AVAIL
>>> vm.pool   2  1024  3.0 TiB  863.31k  6.0 TiB   6.38     44 TiB
>>> (this pool holds all the VM disks)
>>>
>>> ---------------------------------------------------------------
>>>
>>> ceph osd map vm.pool vm.pool.object
>>> osdmap e14319 pool 'vm.pool' (2) object 'vm.pool.object' ->
>>> pg 2.196f68d5 (2.d5) -> up ([2,4], p2) acting ([2,4], p2)
>>>
>>> ---------------------------------------------------------------
>>>
>>> pveversion -v
>>> proxmox-ve: 6.4-1 (running kernel: 5.4.143-1-pve)
>>> pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
>>> pve-kernel-helper: 6.4-8
>>> pve-kernel-5.4: 6.4-7
>>> pve-kernel-5.4.143-1-pve: 5.4.143-1
>>> pve-kernel-5.4.106-1-pve: 5.4.106-1
>>> ceph: 15.2.15-pve1~bpo10
>>> ceph-fuse: 15.2.15-pve1~bpo10
>>> corosync: 3.1.2-pve1
>>> criu: 3.11-3
>>> glusterfs-client: 5.5-3
>>> ifupdown: residual config
>>> ifupdown2: 3.0.0-1+pve4~bpo10
>>> ksm-control-daemon: 1.3-1
>>> libjs-extjs: 6.0.1-10
>>> libknet1: 1.22-pve1~bpo10+1
>>> libproxmox-acme-perl: 1.1.0
>>> libproxmox-backup-qemu0: 1.1.0-1
>>> libpve-access-control: 6.4-3
>>> libpve-apiclient-perl: 3.1-3
>>> libpve-common-perl: 6.4-4
>>> libpve-guest-common-perl: 3.1-5
>>> libpve-http-server-perl: 3.2-3
>>> libpve-storage-perl: 6.4-1
>>> libqb0: 1.0.5-1
>>> libspice-server1: 0.14.2-4~pve6+1
>>> lvm2: 2.03.02-pve4
>>> lxc-pve: 4.0.6-2
>>> lxcfs: 4.0.6-pve1
>>> novnc-pve: 1.1.0-1
>>> proxmox-backup-client: 1.1.13-2
>>> proxmox-mini-journalreader: 1.1-1
>>> proxmox-widget-toolkit: 2.6-1
>>> pve-cluster: 6.4-1
>>> pve-container: 3.3-6
>>> pve-docs: 6.4-2
>>> pve-edk2-firmware: 2.20200531-1
>>> pve-firewall: 4.1-4
>>> pve-firmware: 3.3-2
>>> pve-ha-manager: 3.1-1
>>> pve-i18n: 2.3-1
>>> pve-qemu-kvm: 5.2.0-6
>>> pve-xtermjs: 4.7.0-3
>>> qemu-server: 6.4-2
>>> smartmontools: 7.2-pve2
>>> spiceterm: 3.1-1
>>> vncterm: 1.6-2
>>> zfsutils-linux: 2.0.6-pve1~bpo10+1
>>>
>>> ---------------------------------------------------------------
>>>
>>> And now my problem:
>>>
>>> For all VMs I have this single pool for the VM disks.
>>>
>>> When node/host pve-3111 is shut down, VMs on many of the other
>>> nodes/hosts (pve-3107, pve-3105) do not shut down but become
>>> unreachable on the network.
>>>
>>> After the node/host is back up, Ceph returns to HEALTH_OK and all VMs
>>> become reachable again on the network (without a reboot).
>>>
>>> Can someone suggest what I should check in Ceph?
>>>
>>> Thanks.
