[ceph-users] All shards of PG missing object and inconsistent
Hi all,

I recently performed a few tasks, namely purging several buckets from our RGWs and adding additional hosts to Ceph, which caused some data movement for a rebalance. Now that this is almost complete, I kicked off some deep scrubs, and one PG is returning the following:

2018-09-21 23:17:59.717286 7f2f16796700 -1 log_channel(cluster) log [ERR] : 14.1b18 shard 313 missing 14:18daa344:::default.162489536.28__shadow_24TB%2f24TB%2fDESIGNTEAM%2fPROJECTS%2fDLA Piper%2f_DLA_4033_ Global Thought Leadership%2fFilm%2f04 Assets%2fFootage %ef%80%a2 Audio Sync%2fDLA_Thought_Leadership_NYC%2fCam_1%2fA13I1483.MOV.2~5v3nJDNLONBYszy54ZXZZQgos1D4Ywp.359_6:head
2018-09-21 23:17:59.717292 7f2f16796700 -1 log_channel(cluster) log [ERR] : 14.1b18 shard 665 missing 14:18daa344:::default.162489536.28__shadow_24TB%2f24TB%2fDESIGNTEAM%2fPROJECTS%2fDLA Piper%2f_DLA_4033_ Global Thought Leadership%2fFilm%2f04 Assets%2fFootage %ef%80%a2 Audio Sync%2fDLA_Thought_Leadership_NYC%2fCam_1%2fA13I1483.MOV.2~5v3nJDNLONBYszy54ZXZZQgos1D4Ywp.359_6:head
2018-09-21 23:17:59.885884 7f2f16796700 -1 log_channel(cluster) log [ERR] : 14.1b18 shard 385 missing 14:18daa344:::default.162489536.28__shadow_24TB%2f24TB%2fDESIGNTEAM%2fPROJECTS%2fDLA Piper%2f_DLA_4033_ Global Thought Leadership%2fFilm%2f04 Assets%2fFootage %ef%80%a2 Audio Sync%2fDLA_Thought_Leadership_NYC%2fCam_1%2fA13I1483.MOV.2~5v3nJDNLONBYszy54ZXZZQgos1D4Ywp.359_6:head
2018-09-21 23:20:24.954402 7f2f16796700 -1 log_channel(cluster) log [ERR] : 14.1b18 scrub stat mismatch, got 44026/44025 objects, 0/0 clones, 44026/44025 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 45423386817/45419192513 bytes, 0/0 hit_set_archive bytes.
2018-09-21 23:20:24.954418 7f2f16796700 -1 log_channel(cluster) log [ERR] : 14.1b18 scrub 1 missing, 0 inconsistent objects
2018-09-21 23:20:24.954421 7f2f16796700 -1 log_channel(cluster) log [ERR] : 14.1b18 scrub 4 errors

I recognise the object by name as belonging to a bucket purged earlier in the day, so it is meant to be deleted. What would be the best way to resolve this inconsistency when the object is supposed to be absent?

Kind Regards,
Thomas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
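The usual Luminous-era sequence for a scrub inconsistency like this is to inspect what deep scrub recorded and then ask the primary OSD to repair the PG. A sketch (PG id taken from the log above; since every listed shard is missing the object, repair should reconcile the stats rather than resurrect the deleted object, but verify with a fresh deep scrub afterwards):

```shell
# Show what deep scrub recorded for the inconsistent PG.
rados list-inconsistent-obj 14.1b18 --format=json-pretty

# Ask the primary OSD to repair the PG.
ceph pg repair 14.1b18

# Confirm the PG comes back clean.
ceph pg deep-scrub 14.1b18
ceph health detail
```

Which copy Ceph treats as authoritative decides the outcome, so re-check `ceph health detail` once the repair and the follow-up scrub finish.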
[ceph-users] Bluestore DB showing as ssd
Hi all. Quick question about OSD metadata information.

I have several OSDs set up with the data dir on HDD and the DB going to a partition on SSD. But when I look at the metadata for all the OSDs, it shows the DB as "hdd". Does this affect anything? And is there any way to change it?

$ sudo ceph osd metadata 1
{
    "id": 1,
    "arch": "x86_64",
    "back_addr": ":6805/2053608",
    "back_iface": "eth0",
    "bluefs": "1",
    "bluefs_db_access_mode": "blk",
    "bluefs_db_block_size": "4096",
    "bluefs_db_dev": "8:80",
    "bluefs_db_dev_node": "sdf",
    "bluefs_db_driver": "KernelDevice",
    "bluefs_db_model": "PERC H730 Mini ",
    "bluefs_db_partition_path": "/dev/sdf2",
    "bluefs_db_rotational": "1",
    "bluefs_db_size": "266287972352",
    *"bluefs_db_type": "hdd",*
    "bluefs_single_shared_device": "0",
    "bluefs_slow_access_mode": "blk",
    "bluefs_slow_block_size": "4096",
    "bluefs_slow_dev": "253:1",
    "bluefs_slow_dev_node": "dm-1",
    "bluefs_slow_driver": "KernelDevice",
    "bluefs_slow_model": "",
    "bluefs_slow_partition_path": "/dev/dm-1",
    "bluefs_slow_rotational": "1",
    "bluefs_slow_size": "6000601989120",
    "bluefs_slow_type": "hdd",
    "bluestore_bdev_access_mode": "blk",
    "bluestore_bdev_block_size": "4096",
    "bluestore_bdev_dev": "253:1",
    "bluestore_bdev_dev_node": "dm-1",
    "bluestore_bdev_driver": "KernelDevice",
    "bluestore_bdev_model": "",
    "bluestore_bdev_partition_path": "/dev/dm-1",
    "bluestore_bdev_rotational": "1",
    "bluestore_bdev_size": "6000601989120",
    "bluestore_bdev_type": "hdd",
    "ceph_version": "ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)",
    "cpu": "Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz",
    "default_device_class": "hdd",
    "distro": "centos",
    "distro_description": "CentOS Linux 7 (Core)",
    "distro_version": "7",
    "front_addr": ":6804/2053608",
    "front_iface": "eth0",
    "hb_back_addr": ".78:6806/2053608",
    "hb_front_addr": ".78:6807/2053608",
    "hostname": "ceph0rdi-osd2-1-xrd.eng.sfdc.net",
    "journal_rotational": "1",
    "kernel_description": "#1 SMP Tue Jun 26 16:32:21 UTC 2018",
    "kernel_version": "3.10.0-862.6.3.el7.x86_64",
    "mem_swap_kb": "0",
    "mem_total_kb": "131743604",
    "os": "Linux",
    "osd_data": "/var/lib/ceph/osd/ceph-1",
    "osd_objectstore": "bluestore",
    "rotational": "1"
}
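For what it's worth, these "*_rotational"/"*_type" fields are read from the kernel's rotational flag when the OSD starts, and SSDs sitting behind a RAID controller (like the PERC H730 above) are commonly misreported as rotational. A sketch of checking and overriding the flag (device names taken from the metadata above; the udev rule is an assumption to adapt to your environment):

```shell
# What the kernel reports for the DB device (1 = rotational/HDD):
cat /sys/block/sdf/queue/rotational

# Temporary override (lost on reboot); restart the OSD afterwards so it
# re-reads the flag into its metadata:
echo 0 > /sys/block/sdf/queue/rotational
systemctl restart ceph-osd@1

# Persistent variant via a udev rule (file name and match are assumptions):
cat > /etc/udev/rules.d/99-ssd-rotational.rules <<'EOF'
ACTION=="add|change", KERNEL=="sdf", ATTR{queue/rotational}="0"
EOF
```

As far as I know the db fields are mostly informational in 12.2.x, but fixing the flag keeps device-class assignment and reporting accurate.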
Re: [ceph-users] crush map reclassifier
I've used a crush location hook script to handle this before device classes existed. It checked the device type on startup and assigned the CRUSH position based on that.

I don't have that crush map any longer, but the basic version of it looked like this: two roots, "hdd" and "ssd". The hdd root had servers with their hostname in it, and the ssd root had buckets of type host with "-ssd" appended to the hostname (for a reason I don't remember).

At some point someone consolidated the two roots under yet another root, because some tool (I think it might have been Proxmox?) couldn't handle separate roots, especially if none of them was named "default". We then had two roots within another bucket of type root, which (surprisingly) worked and is probably a weird edge case.

Probably not too helpful because I don't have any IDs or anything left from that era...

Paul

Am Sa., 22. Sep. 2018 um 00:39 Uhr schrieb Sage Weil :
>
> Hi everyone,
>
> In luminous we added crush device classes that automagically
> categorize your OSDs as hdd, ssd, etc., and allow you to write CRUSH rules
> that target a subset of devices. Prior to this it was necessary to make
> custom edits to your CRUSH map with parallel hierarchies for each
> OSD type, and (similarly) to disable the osd_crush_update_on_start option.
>
> As Dan has noted previously, transitioning from a legacy map to a modern
> one using classes in the naive way will reshuffle all of your data. He
> worked out a procedure to do this manually, but it is delicate and error
> prone. I'm working on a tool to do it in a robust/safe way now.
>
> However... I want to make sure that the tool is sufficiently general.
> Can anyone/everyone who has a customized CRUSH map to deal with different
> OSD device types please send me a copy (e.g., ceph osd getcrushmap -o
> mycrushmap) so I can test the tool against your map?
>
> Thanks!
> sage

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
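A minimal sketch of the kind of hook Paul describes (the location strings, rotational flag and hostname inputs are all assumptions for illustration; a real hook is configured via `crush location hook` in ceph.conf, is invoked by Ceph with `--cluster`/`--id`/`--type` arguments, and must print a CRUSH location string on stdout):

```shell
# crush_location: print a CRUSH location for an OSD, mimicking the
# pre-luminous parallel-roots layout described above.
#   $1 = rotational flag (0 = SSD, 1 = HDD), $2 = hostname
crush_location() {
  if [ "$1" = "0" ]; then
    echo "root=ssd host=${2}-ssd"   # ssd root, host bucket with -ssd suffix
  else
    echo "root=hdd host=${2}"       # hdd root, plain hostname bucket
  fi
}

crush_location 0 node1   # -> root=ssd host=node1-ssd
crush_location 1 node1   # -> root=hdd host=node1
```

A production hook would read `/sys/block/<dev>/queue/rotational` for the OSD's data device instead of taking the flag as an argument.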
[ceph-users] crush map reclassifier
Hi everyone,

In luminous we added crush device classes that automagically categorize your OSDs as hdd, ssd, etc., and allow you to write CRUSH rules that target a subset of devices. Prior to this it was necessary to make custom edits to your CRUSH map with parallel hierarchies for each OSD type, and (similarly) to disable the osd_crush_update_on_start option.

As Dan has noted previously, transitioning from a legacy map to a modern one using classes in the naive way will reshuffle all of your data. He worked out a procedure to do this manually, but it is delicate and error prone. I'm working on a tool to do it in a robust/safe way now.

However... I want to make sure that the tool is sufficiently general. Can anyone/everyone who has a customized CRUSH map to deal with different OSD device types please send me a copy (e.g., ceph osd getcrushmap -o mycrushmap) so I can test the tool against your map?

Thanks!
sage
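For anyone unfamiliar, this is the export Sage asks for, plus the optional decompile/recompile round-trip if you want to review your custom hierarchies first (`crushtool` ships with Ceph; the file names are arbitrary):

```shell
# Export the binary CRUSH map from the cluster.
ceph osd getcrushmap -o mycrushmap

# Optional: decompile to text to inspect roots, rules and buckets,
# and recompile after any edits.
crushtool -d mycrushmap -o mycrushmap.txt
crushtool -c mycrushmap.txt -o mycrushmap.new
```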
[ceph-users] radosgw rest API to retrive rgw log entries
I am looking for an API equivalent of 'radosgw-admin log list' and 'radosgw-admin log show'. The existing /usage API only reports bucket-level numbers, like 'radosgw-admin usage show' does. Does anyone know if this is possible via the REST API?

Thanks.
Jin.
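I'm not aware of a documented admin-ops REST endpoint that mirrors `log list`/`log show`; as far as I know those subcommands simply read the log objects out of the RGW log pool, so at the rados level a rough equivalent is (the pool name varies by version and zone; `default.rgw.log` is the usual Luminous default and an assumption here):

```shell
# Objects that 'radosgw-admin log list' enumerates:
rados -p default.rgw.log ls

# Fetch one raw log object (the data that 'log show' decodes):
rados -p default.rgw.log get <log-object-name> /tmp/rgw-log.bin
```

`<log-object-name>` is a placeholder for one of the names returned by the first command; the object content is binary-encoded, so this is only a substitute if you can live without the decoded view.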
Re: [ceph-users] PG stuck incomplete
Le vendredi 21 septembre 2018 à 19:45 +0200, Paul Emmerich a écrit :
> The cache tiering has nothing to do with the PG of the underlying
> pool being incomplete.
> You are just seeing these requests as stuck because it's the only
> thing trying to write to the underlying pool.

I agree. It was just to be sure that the problems on OSDs 32, 68 and 69 are related to only one "real" problem.

> What you need to fix is the PG showing incomplete. I assume you
> already tried reducing the min_size to 4 as suggested? Or did you by
> chance always run with min_size 4 on the ec pool, which is a common
> cause for problems like this.

Yes, it has always run with min_size 4.

We use Luminous 12.2.8 here, but some (~40%) OSDs still run Luminous 12.2.7. I was hoping to "fix" this problem before continuing the upgrade.

Pool details :

pool 37 'bkp-foo-raid6' erasure size 6 min_size 4 crush_rule 20 object_hash rjenkins pg_num 256 pgp_num 256 last_change 585715 lfor 585714/585714 flags hashpspool,backfillfull stripe_width 4096 fast_read 1 application rbd
	removed_snaps [1~3]

> Can you share the output of "ceph osd pool ls detail"?
> Also, which version of Ceph are you running?
>
> Paul
>
> Am Fr., 21. Sep. 2018 um 19:28 Uhr schrieb Olivier Bonvalet :
> >
> > So I've totally disabled cache-tiering and overlay. Now OSD 68 & 69
> > are fine, no more blocked.
> >
> > But OSD 32 is still blocked, and PG 37.9c still marked incomplete
> > with :
> >
> > "recovery_state": [
> >     {
> >         "name": "Started/Primary/Peering/Incomplete",
> >         "enter_time": "2018-09-21 18:56:01.222970",
> >         "comment": "not enough complete instances of this PG"
> >     },
> >
> > But I don't see blocked requests in OSD.32 logs, should I increase
> > one of the "debug_xx" flags ?
> >
> >
> > Le vendredi 21 septembre 2018 à 16:51 +0200, Maks Kowalik a écrit :
> > > According to the query output you pasted shards 1 and 2 are
> > > broken.
> > > But, on the other hand EC profile (4+2) should make it possible > > > to > > > recover from 2 shards lost simultanously... > > > > > > pt., 21 wrz 2018 o 16:29 Olivier Bonvalet > > > napisał(a): > > > > Well on drive, I can find thoses parts : > > > > > > > > - cs0 on OSD 29 and 30 > > > > - cs1 on OSD 18 and 19 > > > > - cs2 on OSD 13 > > > > - cs3 on OSD 66 > > > > - cs4 on OSD 0 > > > > - cs5 on OSD 75 > > > > > > > > And I can read thoses files too. > > > > > > > > And all thoses OSD are UP and IN. > > > > > > > > > > > > Le vendredi 21 septembre 2018 à 13:10 +, Eugen Block a > > > > écrit : > > > > > > > I tried to flush the cache with "rados -p cache-bkp-foo > > > > > > > > cache- > > > > > > > flush- > > > > > > > evict-all", but it blocks on the object > > > > > > > "rbd_data.f66c92ae8944a.000f2596". > > > > > > > > > > This is the object that's stuck in the cache tier (according > > > > > to > > > > > your > > > > > output in https://pastebin.com/zrwu5X0w). Can you verify if > > > > > that > > > > > block > > > > > device is in use and healthy or is it corrupt? > > > > > > > > > > > > > > > Zitat von Maks Kowalik : > > > > > > > > > > > Could you, please paste the output of pg 37.9c query > > > > > > > > > > > > pt., 21 wrz 2018 o 14:39 Olivier Bonvalet < > > > > > > ceph.l...@daevel.fr> > > > > > > napisał(a): > > > > > > > > > > > > > In fact, one object (only one) seem to be blocked on the > > > > > > > > cache > > > > > > > tier > > > > > > > (writeback). > > > > > > > > > > > > > > I tried to flush the cache with "rados -p cache-bkp-foo > > > > > > > > cache- > > > > > > > flush- > > > > > > > evict-all", but it blocks on the object > > > > > > > "rbd_data.f66c92ae8944a.000f2596". 
> > > > > > > > > > > > > > So I reduced (a lot) the cache tier to 200MB, "rados -p > > > > > > > > cache- > > > > > > > bkp-foo > > > > > > > ls" now show only 3 objects : > > > > > > > > > > > > > > rbd_directory > > > > > > > rbd_data.f66c92ae8944a.000f2596 > > > > > > > rbd_header.f66c92ae8944a > > > > > > > > > > > > > > And "cache-flush-evict-all" still hangs. > > > > > > > > > > > > > > I also switched the cache tier to "readproxy", to avoid > > > > > > > using > > > > > > > this > > > > > > > cache. But, it's still blocked. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Le vendredi 21 septembre 2018 à 02:14 +0200, Olivier > > > > > > > Bonvalet > > > > > > > > a > > > > > > > écrit : > > > > > > > > Hello, > > > > > > > > > > > > > > > > on a Luminous cluster, I have a PG incomplete and I > > > > > > > > can't > > > > > > > > find > > > > > > > > how to > > > > > > > > fix that. > > > > > > > > > > > > > > > > It's an EC pool (4+2) : > > > > > > > > > > > > > > > > pg 37.9c is incomplete, acting [32,50,59,1,0,75] > > > > > > > > (reducing > > > > > > > > pool > > > > > > > > bkp-sb-raid6 min_size from 4 may help; search > > > > > > > >
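As Paul hints, running an EC 4+2 pool with min_size 4 (= k) is a known way to end up with incomplete PGs after failures during degraded writes; the usual recommendation is min_size k+1. A hedged sketch of the steps involved (pool and PG names from the thread above; the objectstore-tool step is a last resort run against a stopped OSD, can lose the divergent writes, and should only ever follow an export):

```shell
# Inspect why the PG is incomplete (peering history, down_osds_we_would_probe):
ceph pg 37.9c query | jq '.recovery_state'

# Once the pool is healthy again, raise min_size to k+1 = 5 to prevent
# a recurrence of this failure mode:
ceph osd pool set bkp-foo-raid6 min_size 5

# Last resort, on the stopped primary OSD only, after exporting the shard:
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-32 \
#     --pgid 37.9cs0 --op export --file /root/pg37.9c.export
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-32 \
#     --pgid 37.9cs0 --op mark-complete
```

The `s0` suffix on the pgid is how EC shards are addressed in ceph-objectstore-tool; whether mark-complete is appropriate depends entirely on what the query output shows, so treat this as a sketch, not a recipe.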
Re: [ceph-users] Proxmox/ceph upgrade and addition of a new node/OSDs
On Fri, Sep 21, 2018 at 09:03:15AM +0200, Hervé Ballans wrote:
> Hi MJ (and all),
>
> So we upgraded our Proxmox/Ceph cluster, and if we have to summarize the
> operation in a few words: overall, everything went well :)
> The most critical operation of all is the 'osd crush tunables optimal'; I
> talk about it in more detail below...
>
> The Proxmox documentation is really well written and accurate, and normally,
> following the documentation step by step is almost sufficient!

Glad to hear that everything worked well.

> * First step: upgrade Ceph Jewel to Luminous:
> https://pve.proxmox.com/wiki/Ceph_Jewel_to_Luminous
> (Note here: OSDs remain on the FileStore backend, no BlueStore migration)
>
> * Second step: upgrade Proxmox version 4 to 5:
> https://pve.proxmox.com/wiki/Upgrade_from_4.x_to_5.0
>
> Just some numbers, observations and tips (based on our feedback, I'm not an
> expert!):
>
> * Before migration, make sure you are on the latest version of Proxmox 4
> (4.4-24) and Ceph Jewel (10.2.11)
>
> * We don't use the pve repository for ceph packages but the official one
> (download.ceph.com). Thus, during the upgrade of Proxmox PVE, we don't
> replace the ceph.com repository with the proxmox.com Ceph repository...

This is not recommended (and for a reason) - our packages are almost identical to the upstream/official ones, but we do include the occasional bug fix much faster than the official packages do, including reverting breakage. Furthermore, when using our repository, you know that the packages went through our own testing to ensure compatibility with our stack (e.g., issues like JSON output changing from one minor release to the next breaking our integration/GUI). Also, this natural delay between upstream releases and availability in our repository has saved our users from lots of "serious bug noticed one day after release" issues since we switched to providing Ceph via our own repositories.

> * When you upgrade Ceph to Luminous (without tunables optimal), there is no
> impact on Proxmox 4. VMs keep running normally.
> The side effect (non-blocking for the functioning of VMs) is located in the
> GUI, in the Ceph menu: it can't report the status of the ceph cluster, as it
> gets a JSON formatting error (indeed the output of the command 'ceph -s' is
> completely different, really more readable on Luminous)

Yes, this is to be expected. Backporting all of that just for the short time window of "upgrade in progress" is too much work for too little gain.

> * A little step is missing in section 8 "Create Manager instances" of the
> upgrade ceph documentation. As the Ceph manager daemon is new since
> Luminous, the package doesn't exist on Jewel. So you have to install the
> ceph-mgr package on each node first, before doing 'pveceph createmgr'

It actually does not ;) ceph-mgr is pulled in by ceph on upgrades from Jewel to Luminous - unless you manually removed that package at some point.

> Otherwise:
> - Verify that all your VMs are recently backed up to external storage (in
> case a disaster recovery plan is needed!)

Good idea in general :D

> - If you can, stop all your non-critical VMs (in order to limit client IO
> operations)
> - If any, wait for the end of current backups, then disable datacenter backups
> (in order to limit client IO operations). !! Do not forget to re-enable them
> when all is over !!
> - If any and if no longer needed, delete your snapshots; it removes many
> useless objects!
> - Start the tunables operation outside of major activity periods (night,
> weekend, ...) and take into account that it can be very slow...

Scheduling and carefully planning rebalancing operations is always needed on a production cluster. Note that the upgrade docs state that switching to "tunables optimal" is recommended, but "will cause a massive rebalance".

> There are probably some options to configure in ceph to avoid 'pgs stuck'
> states, but on our side, as we had previously moved our critical VMs' disks, we
> didn't care about that!
>
> * Anyway, the upgrade step of Proxmox PVE itself is done easily and quickly (just
> follow the documentation). Note that you can upgrade Proxmox PVE before
> doing the 'tunables optimal' operation.
>
> Hoping that you will find this information useful, good luck with your very
> next migration!

Thank you for the detailed report and feedback!
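When scheduling the 'tunables optimal' rebalance discussed above, the usual knobs for slowing it down to something production-safe are the backfill/recovery throttles. A sketch (the values are illustrative defaults-of-one, not a recommendation; option names are the Luminous-era ones):

```shell
# Throttle rebalance traffic before triggering the massive data movement.
ceph tell 'osd.*' injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'

# Trigger the rebalance (massive data movement, as the upgrade docs warn):
ceph osd crush tunables optimal

# Watch progress until HEALTH_OK:
ceph -s
```

injectargs changes are runtime-only; persist the values in ceph.conf if you want them to survive OSD restarts during the rebalance.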
Re: [ceph-users] PG stuck incomplete
The cache tiering has nothing to do with the PG of the underlying pool being incomplete. You are just seeing these requests as stuck because it's the only thing trying to write to the underlying pool. What you need to fix is the PG showing incomplete. I assume you already tried reducing the min_size to 4 as suggested? Or did you by chance always run with min_size 4 on the ec pool, which is a common cause for problems like this. Can you share the output of "ceph osd pool ls detail"? Also, which version of Ceph are you running? Paul Am Fr., 21. Sep. 2018 um 19:28 Uhr schrieb Olivier Bonvalet : > > So I've totally disable cache-tiering and overlay. Now OSD 68 & 69 are > fine, no more blocked. > > But OSD 32 is still blocked, and PG 37.9c still marked incomplete with > : > > "recovery_state": [ > { > "name": "Started/Primary/Peering/Incomplete", > "enter_time": "2018-09-21 18:56:01.222970", > "comment": "not enough complete instances of this PG" > }, > > But I don't see blocked requests in OSD.32 logs, should I increase one > of the "debug_xx" flag ? > > > Le vendredi 21 septembre 2018 à 16:51 +0200, Maks Kowalik a écrit : > > According to the query output you pasted shards 1 and 2 are broken. > > But, on the other hand EC profile (4+2) should make it possible to > > recover from 2 shards lost simultanously... > > > > pt., 21 wrz 2018 o 16:29 Olivier Bonvalet > > napisał(a): > > > Well on drive, I can find thoses parts : > > > > > > - cs0 on OSD 29 and 30 > > > - cs1 on OSD 18 and 19 > > > - cs2 on OSD 13 > > > - cs3 on OSD 66 > > > - cs4 on OSD 0 > > > - cs5 on OSD 75 > > > > > > And I can read thoses files too. > > > > > > And all thoses OSD are UP and IN. > > > > > > > > > Le vendredi 21 septembre 2018 à 13:10 +, Eugen Block a écrit : > > > > > > I tried to flush the cache with "rados -p cache-bkp-foo > > > cache- > > > > > > flush- > > > > > > evict-all", but it blocks on the object > > > > > > "rbd_data.f66c92ae8944a.000f2596". 
> > > > > > > > This is the object that's stuck in the cache tier (according to > > > > your > > > > output in https://pastebin.com/zrwu5X0w). Can you verify if that > > > > block > > > > device is in use and healthy or is it corrupt? > > > > > > > > > > > > Zitat von Maks Kowalik : > > > > > > > > > Could you, please paste the output of pg 37.9c query > > > > > > > > > > pt., 21 wrz 2018 o 14:39 Olivier Bonvalet > > > > > napisał(a): > > > > > > > > > > > In fact, one object (only one) seem to be blocked on the > > > cache > > > > > > tier > > > > > > (writeback). > > > > > > > > > > > > I tried to flush the cache with "rados -p cache-bkp-foo > > > cache- > > > > > > flush- > > > > > > evict-all", but it blocks on the object > > > > > > "rbd_data.f66c92ae8944a.000f2596". > > > > > > > > > > > > So I reduced (a lot) the cache tier to 200MB, "rados -p > > > cache- > > > > > > bkp-foo > > > > > > ls" now show only 3 objects : > > > > > > > > > > > > rbd_directory > > > > > > rbd_data.f66c92ae8944a.000f2596 > > > > > > rbd_header.f66c92ae8944a > > > > > > > > > > > > And "cache-flush-evict-all" still hangs. > > > > > > > > > > > > I also switched the cache tier to "readproxy", to avoid using > > > > > > this > > > > > > cache. But, it's still blocked. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Le vendredi 21 septembre 2018 à 02:14 +0200, Olivier Bonvalet > > > a > > > > > > écrit : > > > > > > > Hello, > > > > > > > > > > > > > > on a Luminous cluster, I have a PG incomplete and I can't > > > find > > > > > > > how to > > > > > > > fix that. > > > > > > > > > > > > > > It's an EC pool (4+2) : > > > > > > > > > > > > > > pg 37.9c is incomplete, acting [32,50,59,1,0,75] > > > (reducing > > > > > > > pool > > > > > > > bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs > > > for > > > > > > > 'incomplete') > > > > > > > > > > > > > > Of course, we can't reduce min_size from 4. 
> > > > > > > > > > > > > > And the full state : https://pastebin.com/zrwu5X0w > > > > > > > > > > > > > > So, IO are blocked, we can't access thoses damaged data. > > > > > > > OSD blocks too : > > > > > > > osds 32,68,69 have stuck requests > 4194.3 sec > > > > > > > > > > > > > > OSD 32 is the primary of this PG. > > > > > > > And OSD 68 and 69 are for cache tiering. > > > > > > > > > > > > > > Any idea how can I fix that ? > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > Olivier > > > > > > > > > > > > > > > > > > > > > ___ > > > > > > > ceph-users mailing list > > > > > > > ceph-users@lists.ceph.com > > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > > > > > > > > > > > > > > > > > ___ > > > > > > ceph-users mailing list > > > > > > ceph-users@lists.ceph.com > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > > > > > > > > > > > >
Re: [ceph-users] PG stuck incomplete
So I've totally disabled cache-tiering and overlay. Now OSD 68 & 69 are fine, no more blocked.

But OSD 32 is still blocked, and PG 37.9c is still marked incomplete with :

"recovery_state": [
    {
        "name": "Started/Primary/Peering/Incomplete",
        "enter_time": "2018-09-21 18:56:01.222970",
        "comment": "not enough complete instances of this PG"
    },

But I don't see blocked requests in the OSD.32 logs; should I increase one of the "debug_xx" flags ?

Le vendredi 21 septembre 2018 à 16:51 +0200, Maks Kowalik a écrit :
> According to the query output you pasted, shards 1 and 2 are broken.
> But, on the other hand, an EC profile (4+2) should make it possible to
> recover from 2 shards lost simultaneously...
>
> pt., 21 wrz 2018 o 16:29 Olivier Bonvalet napisał(a):
> > Well on drive, I can find those parts :
> >
> > - cs0 on OSD 29 and 30
> > - cs1 on OSD 18 and 19
> > - cs2 on OSD 13
> > - cs3 on OSD 66
> > - cs4 on OSD 0
> > - cs5 on OSD 75
> >
> > And I can read those files too.
> >
> > And all those OSDs are UP and IN.
> >
> >
> > Le vendredi 21 septembre 2018 à 13:10 +, Eugen Block a écrit :
> > > > > I tried to flush the cache with "rados -p cache-bkp-foo cache-
> > > > > flush-evict-all", but it blocks on the object
> > > > > "rbd_data.f66c92ae8944a.000f2596".
> > >
> > > This is the object that's stuck in the cache tier (according to
> > > your output in https://pastebin.com/zrwu5X0w). Can you verify if
> > > that block device is in use and healthy, or is it corrupt?
> > >
> > >
> > > Zitat von Maks Kowalik :
> > >
> > > > Could you, please paste the output of pg 37.9c query
> > > >
> > > > pt., 21 wrz 2018 o 14:39 Olivier Bonvalet napisał(a):
> > > >
> > > > > In fact, one object (only one) seems to be blocked on the
> > > > > cache tier (writeback).
> > > > > > > > > > I tried to flush the cache with "rados -p cache-bkp-foo > > cache- > > > > > flush- > > > > > evict-all", but it blocks on the object > > > > > "rbd_data.f66c92ae8944a.000f2596". > > > > > > > > > > So I reduced (a lot) the cache tier to 200MB, "rados -p > > cache- > > > > > bkp-foo > > > > > ls" now show only 3 objects : > > > > > > > > > > rbd_directory > > > > > rbd_data.f66c92ae8944a.000f2596 > > > > > rbd_header.f66c92ae8944a > > > > > > > > > > And "cache-flush-evict-all" still hangs. > > > > > > > > > > I also switched the cache tier to "readproxy", to avoid using > > > > > this > > > > > cache. But, it's still blocked. > > > > > > > > > > > > > > > > > > > > > > > > > Le vendredi 21 septembre 2018 à 02:14 +0200, Olivier Bonvalet > > a > > > > > écrit : > > > > > > Hello, > > > > > > > > > > > > on a Luminous cluster, I have a PG incomplete and I can't > > find > > > > > > how to > > > > > > fix that. > > > > > > > > > > > > It's an EC pool (4+2) : > > > > > > > > > > > > pg 37.9c is incomplete, acting [32,50,59,1,0,75] > > (reducing > > > > > > pool > > > > > > bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs > > for > > > > > > 'incomplete') > > > > > > > > > > > > Of course, we can't reduce min_size from 4. > > > > > > > > > > > > And the full state : https://pastebin.com/zrwu5X0w > > > > > > > > > > > > So, IO are blocked, we can't access thoses damaged data. > > > > > > OSD blocks too : > > > > > > osds 32,68,69 have stuck requests > 4194.3 sec > > > > > > > > > > > > OSD 32 is the primary of this PG. > > > > > > And OSD 68 and 69 are for cache tiering. > > > > > > > > > > > > Any idea how can I fix that ? 
> > > > > > Thanks,
> > > > > >
> > > > > > Olivier
Re: [ceph-users] PG stuck incomplete
According to the query output you pasted, shards 1 and 2 are broken. But, on the other hand, an EC profile (4+2) should make it possible to recover from 2 shards lost simultaneously...

pt., 21 wrz 2018 o 16:29 Olivier Bonvalet napisał(a):
> Well on drive, I can find those parts :
>
> - cs0 on OSD 29 and 30
> - cs1 on OSD 18 and 19
> - cs2 on OSD 13
> - cs3 on OSD 66
> - cs4 on OSD 0
> - cs5 on OSD 75
>
> And I can read those files too.
>
> And all those OSDs are UP and IN.
>
>
> Le vendredi 21 septembre 2018 à 13:10 +, Eugen Block a écrit :
> > > > I tried to flush the cache with "rados -p cache-bkp-foo cache-
> > > > flush-evict-all", but it blocks on the object
> > > > "rbd_data.f66c92ae8944a.000f2596".
> >
> > This is the object that's stuck in the cache tier (according to
> > your output in https://pastebin.com/zrwu5X0w). Can you verify if
> > that block device is in use and healthy, or is it corrupt?
> >
> >
> > Zitat von Maks Kowalik :
> >
> > > Could you, please paste the output of pg 37.9c query
> > >
> > > pt., 21 wrz 2018 o 14:39 Olivier Bonvalet napisał(a):
> > >
> > > > In fact, one object (only one) seems to be blocked on the cache
> > > > tier (writeback).
> > > >
> > > > I tried to flush the cache with "rados -p cache-bkp-foo cache-
> > > > flush-evict-all", but it blocks on the object
> > > > "rbd_data.f66c92ae8944a.000f2596".
> > > >
> > > > So I reduced (a lot) the cache tier to 200MB, and "rados -p cache-
> > > > bkp-foo ls" now shows only 3 objects :
> > > >
> > > > rbd_directory
> > > > rbd_data.f66c92ae8944a.000f2596
> > > > rbd_header.f66c92ae8944a
> > > >
> > > > And "cache-flush-evict-all" still hangs.
> > > >
> > > > I also switched the cache tier to "readproxy", to avoid using
> > > > this cache. But, it's still blocked.
> > > > > > > > > > > > > > > > > > > > Le vendredi 21 septembre 2018 à 02:14 +0200, Olivier Bonvalet a > > > > écrit : > > > > > Hello, > > > > > > > > > > on a Luminous cluster, I have a PG incomplete and I can't find > > > > > how to > > > > > fix that. > > > > > > > > > > It's an EC pool (4+2) : > > > > > > > > > > pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing > > > > > pool > > > > > bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for > > > > > 'incomplete') > > > > > > > > > > Of course, we can't reduce min_size from 4. > > > > > > > > > > And the full state : https://pastebin.com/zrwu5X0w > > > > > > > > > > So, IO are blocked, we can't access thoses damaged data. > > > > > OSD blocks too : > > > > > osds 32,68,69 have stuck requests > 4194.3 sec > > > > > > > > > > OSD 32 is the primary of this PG. > > > > > And OSD 68 and 69 are for cache tiering. > > > > > > > > > > Any idea how can I fix that ? > > > > > > > > > > Thanks, > > > > > > > > > > Olivier > > > > > > > > > > > > > > > ___ > > > > > ceph-users mailing list > > > > > ceph-users@lists.ceph.com > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > > > > > > > > > > > ___ > > > > ceph-users mailing list > > > > ceph-users@lists.ceph.com > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > > > > > > > > > > ___ > > ceph-users mailing list > > ceph-users@lists.ceph.com > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] No fix for 0x6706be76 CRCs ? [SOLVED] (WORKAROUND)
I have ubuntu servers. With ukuu I installed kernel 4.8.17-040817 (The last < 4.9 available kernel) and I haven't any 0x6706be76 crc since. Nor any inconsistence. On 19/09/18 12:01, Alfredo Daniel Rezinovsky wrote: Tried 4.17 with the same problem Just downgraded to 4.8. Let's see if no more 0x67... appears On 18/09/18 16:28, Alfredo Daniel Rezinovsky wrote: I started with this after upgrade to bionic. I had Xenial with lts kernels (4.13) without problem. I will try to change to ubuntu 4.13 and wait for the logs. Thanks On 18/09/18 16:27, Paul Emmerich wrote: Yeah, it's very likely a kernel bug (that no one managed to reduce to a simpler test case or even to reproduce it reliably with reasonable effort on a test system). 4.9 and earlier aren't affected as far as we can tell, we only encountered this after upgrading. But I think Bionic ships with a broken kernel. Try raising the issue with the ubuntu guys if you are using a distribution kernel. Paul 2018-09-18 21:23 GMT+02:00 Alfredo Daniel Rezinovsky : MOMENT !!! "Some kernels (4.9+) sometime fail to return data when reading from a block device under memory pressure." I dind't knew that was the problem. Can't I just dowgrade the kernel? There are known working versions o just need to be prior 4.9? On 18/09/18 16:19, Paul Emmerich wrote: We built a work-around here: https://github.com/ceph/ceph/pull/23273 Which hasn't been backported, but we'll ship 13.2.2 in our Debian packages for the croit OS image. Paul 2018-09-18 21:10 GMT+02:00 Alfredo Daniel Rezinovsky : Changed all my hardware. Now I have plenty of free ram. swap never needed, low iowait and still 7fdbbb73e700 -1 bluestore(/var/lib/ceph/osd/ceph-6) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x1e000, got 0x6706be76, expected 0x85a3fefe, device location [0x25ac04be000~1000], logical extent 0x1e000~1000, object #2:fd955b81:::1729cdb.0006 It happens sometimes, in all my OSDs. 
Bluestore OSDs with data on HDD and block.db on SSD. After running pg repair, the PGs were always repaired.

Running Ceph 13.2.1-1bionic on Ubuntu.

--
Alfredo Daniel Rezinovsky
Director de Tecnologías de Información y Comunicaciones
Facultad de Ingeniería - Universidad Nacional de Cuyo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
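For readers hitting the same scrub errors, the repair flow the thread describes can be sketched as below. The PG id is a made-up example, and the `ceph` stub only echoes the commands so the sketch runs as a dry run; on a real cluster you would drop the stub and call the actual `ceph` binary.

```shell
#!/bin/sh
# Dry-run sketch: the ceph() stub prints the command it would run.
# Remove it to execute against a live cluster. PG id 2.1a is an example.
ceph() { echo "ceph $*"; }

ceph health detail    # lists PGs flagged inconsistent after a scrub
ceph pg repair 2.1a   # ask the primary OSD to repair that PG
```

As the thread reports, once the buggy kernel was replaced, `pg repair` consistently brought the PGs back to a clean state.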
Re: [ceph-users] PG stuck incomplete
Well, on disk I can find those parts:

- cs0 on OSDs 29 and 30
- cs1 on OSDs 18 and 19
- cs2 on OSD 13
- cs3 on OSD 66
- cs4 on OSD 0
- cs5 on OSD 75

And I can read those files too. All of those OSDs are up and in.

On Friday, September 21, 2018, Eugen Block wrote:
> > > I tried to flush the cache with "rados -p cache-bkp-foo cache-flush-evict-all", but it blocks on the object "rbd_data.f66c92ae8944a.000f2596".
>
> This is the object that's stuck in the cache tier (according to your output in https://pastebin.com/zrwu5X0w). Can you verify whether that block device is in use and healthy, or is it corrupt?
>
> Quoting Maks Kowalik:
> > Could you please paste the output of pg 37.9c query?
> >
> > On Fri, 21 Sep 2018 at 14:39, Olivier Bonvalet wrote:
> > > In fact, one object (only one) seems to be blocked on the cache tier (writeback).
> > >
> > > I tried to flush the cache with "rados -p cache-bkp-foo cache-flush-evict-all", but it blocks on the object "rbd_data.f66c92ae8944a.000f2596".
> > >
> > > So I reduced (a lot) the cache tier to 200MB; "rados -p cache-bkp-foo ls" now shows only 3 objects:
> > >
> > > rbd_directory
> > > rbd_data.f66c92ae8944a.000f2596
> > > rbd_header.f66c92ae8944a
> > >
> > > And "cache-flush-evict-all" still hangs.
> > >
> > > I also switched the cache tier to "readproxy", to avoid using this cache. But it's still blocked.
> > >
> > > On Friday, September 21, 2018 at 02:14 +0200, Olivier Bonvalet wrote:
> > > > Hello,
> > > >
> > > > on a Luminous cluster, I have an incomplete PG and I can't find how to fix that.
> > > >
> > > > It's an EC pool (4+2):
> > > >
> > > > pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing pool bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for 'incomplete')
> > > >
> > > > Of course, we can't reduce min_size from 4.
> > > > And the full state: https://pastebin.com/zrwu5X0w
> > > >
> > > > So, IO is blocked and we can't access the damaged data. OSDs block too:
> > > > osds 32,68,69 have stuck requests > 4194.3 sec
> > > >
> > > > OSD 32 is the primary of this PG, and OSDs 68 and 69 are for cache tiering.
> > > >
> > > > Any idea how I can fix that?
> > > >
> > > > Thanks,
> > > > Olivier
Re: [ceph-users] rbd-nbd map question
Hi, I'm using 10.2.10. Thx

On Fri, Sep 21, 2018 at 9:14 AM Mykola Golub wrote:
> Vikas, could you tell which version you observe this on?
>
> Because I can reproduce this only on Jewel, and it has been fixed since Luminous 12.2.1 [1].
>
> [1] http://tracker.ceph.com/issues/20426
>
> On Wed, Sep 19, 2018 at 03:48:44PM -0400, Jason Dillaman wrote:
> > Thanks for reporting this -- it looks like we broke the part where command-line config overrides are parsed out of the arguments. I've opened a tracker ticket against the issue [1].
> >
> > On Wed, Sep 19, 2018 at 2:49 PM Vikas Rana wrote:
> > > Hi there,
> > >
> > > With the default cluster name "ceph" I can map with rbd-nbd without any issue.
> > >
> > > But for a different cluster name, I'm not able to map an image using rbd-nbd, and I get:
> > >
> > > root@vtier-P-node1:/etc/ceph# rbd-nbd --cluster cephdr map test-pool/testvol
> > > rbd-nbd: unknown command: --cluster
> > >
> > > I looked at the man page and the syntax looks right. Can someone please help me with what I'm doing wrong?
> > >
> > > Thanks,
> > > -Vikas
> >
> > [1] http://tracker.ceph.com/issues/36089
> >
> > --
> > Jason

--
Mykola Golub
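Until an affected installation is upgraded past the fix, two common ways to point a Ceph client at a non-default cluster without the broken `--cluster` flag are an explicit config-file path and the CEPH_ARGS environment variable. This is only a dry-run sketch (the `run` stub prints the command instead of executing it), and whether the Jewel-era rbd-nbd honors these is an assumption to verify:

```shell
#!/bin/sh
# Dry run: the run() stub prints the command instead of executing it.
run() { echo "$@"; }

# 1) Name the config file explicitly instead of using --cluster:
run rbd-nbd -c /etc/ceph/cephdr.conf map test-pool/testvol

# 2) Or inject the option through the environment, which Ceph tools
#    generally read before parsing their own arguments:
CEPH_ARGS="--cluster cephdr" run rbd-nbd map test-pool/testvol
```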
Re: [ceph-users] PG stuck incomplete
Yep:

pool 38 'cache-bkp-foo' replicated size 3 min_size 2 crush_rule 26 object_hash rjenkins pg_num 128 pgp_num 128 last_change 585369 lfor 68255/68255 flags hashpspool,incomplete_clones tier_of 37 cache_mode readproxy target_bytes 209715200 hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 300s x2 decay_rate 0 search_last_n 0 min_read_recency_for_promote 10 min_write_recency_for_promote 2 stripe_width 0

I can't totally disable cache tiering, because the OSDs are on Filestore (so without the "overwrites" feature).

On Friday, September 21, 2018, Eugen Block wrote:
> > I also switched the cache tier to "readproxy", to avoid using this cache. But it's still blocked.
>
> You could change the cache mode to "none" to disable it. Could you paste the output of:
>
> ceph osd pool ls detail | grep cache-bkp-foo
>
> Quoting Olivier Bonvalet:
> > In fact, one object (only one) seems to be blocked on the cache tier (writeback).
> >
> > I tried to flush the cache with "rados -p cache-bkp-foo cache-flush-evict-all", but it blocks on the object "rbd_data.f66c92ae8944a.000f2596".
> >
> > So I reduced (a lot) the cache tier to 200MB; "rados -p cache-bkp-foo ls" now shows only 3 objects:
> >
> > rbd_directory
> > rbd_data.f66c92ae8944a.000f2596
> > rbd_header.f66c92ae8944a
> >
> > And "cache-flush-evict-all" still hangs.
> >
> > I also switched the cache tier to "readproxy", to avoid using this cache. But it's still blocked.
> >
> > On Friday, September 21, 2018 at 02:14 +0200, Olivier Bonvalet wrote:
> > > Hello,
> > >
> > > on a Luminous cluster, I have an incomplete PG and I can't find how to fix that.
> > >
> > > It's an EC pool (4+2):
> > >
> > > pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing pool bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for 'incomplete')
> > >
> > > Of course, we can't reduce min_size from 4.
> > > And the full state: https://pastebin.com/zrwu5X0w
> > >
> > > So, IO is blocked and we can't access the damaged data. OSDs block too:
> > > osds 32,68,69 have stuck requests > 4194.3 sec
> > >
> > > OSD 32 is the primary of this PG, and OSDs 68 and 69 are for cache tiering.
> > >
> > > Any idea how I can fix that?
> > >
> > > Thanks,
> > > Olivier
Re: [ceph-users] Dashboard Object Gateway
Hi Hendrik,

thank you for reporting the issue. I've opened a tracker issue for that, see https://tracker.ceph.com/issues/36109.

As a workaround, manually configure the host and port via the CLI using "ceph dashboard set-rgw-api-host <host>" and "ceph dashboard set-rgw-api-port <port>".

Regards
Volker

On 18.09.2018 at 12:57, Hendrik Peyerl wrote:
> Hello all,
>
> we just deployed an Object Gateway to our Ceph cluster via ceph-deploy in an IPv6-only Mimic cluster. To make sure the RGW listens on IPv6 we set the following config:
>
> rgw_frontends = civetweb port=[::]:7480
>
> We then tried to enable the dashboard functionality for said gateway, but we are running into an error 500 when trying to access it via the dashboard. The mgr log shows the following:
>
> {"status": "500 Internal Server Error", "version": "3.2.2", "detail": "The server encountered an unexpected condition which prevented it from fulfilling the request.", "traceback": "Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/cherrypy/_cprequest.py", line 656, in respond
>     response.body = self.handler()
>   File "/usr/lib/python2.7/site-packages/cherrypy/lib/encoding.py", line 188, in __call__
>     self.body = self.oldhandler(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/cherrypy/lib/jsontools.py", line 61, in json_handler
>     value = cherrypy.serving.request._json_inner_handler(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/cherrypy/_cpdispatch.py", line 34, in __call__
>     return self.callable(*self.args, **self.kwargs)
>   File "/usr/lib64/ceph/mgr/dashboard/controllers/rgw.py", line 23, in status
>     instance = RgwClient.admin_instance()
>   File "/usr/lib64/ceph/mgr/dashboard/services/rgw_client.py", line 138, in admin_instance
>     return RgwClient.instance(RgwClient._SYSTEM_USERID)
>   File "/usr/lib64/ceph/mgr/dashboard/services/rgw_client.py", line 121, in instance
>     RgwClient._load_settings()
>   File "/usr/lib64/ceph/mgr/dashboard/services/rgw_client.py", line 102, in _load_settings
>     host, port = _determine_rgw_addr()
>   File "/usr/lib64/ceph/mgr/dashboard/services/rgw_client.py", line 78, in _determine_rgw_addr
>     raise LookupError('Failed to determine RGW port')
> LookupError: Failed to determine RGW port"}
>
> Any help would be greatly appreciated.
>
> Thanks,
> Hendrik

--
Volker Theile
Software Engineer | openATTIC
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
Phone: +49 173 5876879
E-Mail: vthe...@suse.com
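Volker's workaround can be applied as below. This is a dry-run sketch (the `ceph` stub only echoes the commands); the host and port values are example placeholders for an IPv6 endpoint, not values from the thread.

```shell
#!/bin/sh
# Dry run: the ceph() stub prints the commands it would execute.
ceph() { echo "ceph $*"; }

# Point the mgr dashboard at the RGW endpoint explicitly instead of
# letting it auto-detect (which fails here). Example values only:
ceph dashboard set-rgw-api-host "2001:db8::10"
ceph dashboard set-rgw-api-port 7480
```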
Re: [ceph-users] customized ceph cluster name by ceph-deploy
Cluster names are deprecated, don't use them. I think they might have been removed with ceph-deploy 2.x (?).

Paul

On Fri, 21 Sep 2018 at 15:13, Joshua Chen wrote:
> Hi all,
>
> I am using ceph-deploy 2.0.1 to create my testing cluster with this command:
>
> ceph-deploy --cluster pescadores new --cluster-network 100.109.240.0/24 --public-network 10.109.240.0/24 cephmon1 cephmon2 cephmon3
>
> but the --cluster pescadores (name of the cluster) doesn't seem to work. Could anyone help me with this or point me in the right direction? Is anything wrong with my CLI?
>
> Or what is the equivalent ceph command to do the same job?
>
> Cheers
> Joshua

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
Re: [ceph-users] PG stuck incomplete
> I also switched the cache tier to "readproxy", to avoid using this cache. But it's still blocked.

You could change the cache mode to "none" to disable it. Could you paste the output of:

ceph osd pool ls detail | grep cache-bkp-foo

Quoting Olivier Bonvalet:
> In fact, one object (only one) seems to be blocked on the cache tier (writeback).
>
> I tried to flush the cache with "rados -p cache-bkp-foo cache-flush-evict-all", but it blocks on the object "rbd_data.f66c92ae8944a.000f2596".
>
> So I reduced (a lot) the cache tier to 200MB; "rados -p cache-bkp-foo ls" now shows only 3 objects:
>
> rbd_directory
> rbd_data.f66c92ae8944a.000f2596
> rbd_header.f66c92ae8944a
>
> And "cache-flush-evict-all" still hangs.
>
> I also switched the cache tier to "readproxy", to avoid using this cache. But it's still blocked.
>
> On Friday, September 21, 2018 at 02:14 +0200, Olivier Bonvalet wrote:
> > Hello,
> >
> > on a Luminous cluster, I have an incomplete PG and I can't find how to fix that.
> >
> > It's an EC pool (4+2):
> >
> > pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing pool bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for 'incomplete')
> >
> > Of course, we can't reduce min_size from 4.
> >
> > And the full state: https://pastebin.com/zrwu5X0w
> >
> > So, IO is blocked and we can't access the damaged data. OSDs block too:
> > osds 32,68,69 have stuck requests > 4194.3 sec
> >
> > OSD 32 is the primary of this PG, and OSDs 68 and 69 are for cache tiering.
> >
> > Any idea how I can fix that?
> >
> > Thanks,
> > Olivier
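Eugen's suggestion maps to the tier commands below. This is a dry-run sketch (the `ceph` stub only echoes); note that switching the cache mode to "none" generally expects the tier to have been flushed first, so this shows the shape of the commands, not a guaranteed-safe sequence.

```shell
#!/bin/sh
# Dry run: the ceph() stub prints commands instead of executing them.
ceph() { echo "ceph $*"; }

# Inspect the tier settings, then disable caching on it
# (pool name taken from the thread):
ceph osd pool ls detail
ceph osd tier cache-mode cache-bkp-foo none --yes-i-really-mean-it
```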
Re: [ceph-users] rbd-nbd map question
Vikas, could you tell which version you observe this on?

Because I can reproduce this only on Jewel, and it has been fixed since Luminous 12.2.1 [1].

[1] http://tracker.ceph.com/issues/20426

On Wed, Sep 19, 2018 at 03:48:44PM -0400, Jason Dillaman wrote:
> Thanks for reporting this -- it looks like we broke the part where command-line config overrides are parsed out of the arguments. I've opened a tracker ticket against the issue [1].
>
> On Wed, Sep 19, 2018 at 2:49 PM Vikas Rana wrote:
> > Hi there,
> >
> > With the default cluster name "ceph" I can map with rbd-nbd without any issue.
> >
> > But for a different cluster name, I'm not able to map an image using rbd-nbd, and I get:
> >
> > root@vtier-P-node1:/etc/ceph# rbd-nbd --cluster cephdr map test-pool/testvol
> > rbd-nbd: unknown command: --cluster
> >
> > I looked at the man page and the syntax looks right. Can someone please help me with what I'm doing wrong?
> >
> > Thanks,
> > -Vikas
>
> [1] http://tracker.ceph.com/issues/36089
>
> --
> Jason

--
Mykola Golub
[ceph-users] customized ceph cluster name by ceph-deploy
Hi all,

I am using ceph-deploy 2.0.1 to create my testing cluster with this command:

ceph-deploy --cluster pescadores new --cluster-network 100.109.240.0/24 --public-network 10.109.240.0/24 cephmon1 cephmon2 cephmon3

but the --cluster pescadores (name of the cluster) doesn't seem to work. Could anyone help me with this or point me in the right direction? Is anything wrong with my CLI?

Or what is the equivalent ceph command to do the same job?

Cheers
Joshua
Re: [ceph-users] PG stuck incomplete
> I tried to flush the cache with "rados -p cache-bkp-foo cache-flush-evict-all", but it blocks on the object "rbd_data.f66c92ae8944a.000f2596".

This is the object that's stuck in the cache tier (according to your output in https://pastebin.com/zrwu5X0w). Can you verify whether that block device is in use and healthy, or is it corrupt?

Quoting Maks Kowalik:
> Could you please paste the output of pg 37.9c query?
>
> On Fri, 21 Sep 2018 at 14:39, Olivier Bonvalet wrote:
> > In fact, one object (only one) seems to be blocked on the cache tier (writeback).
> >
> > I tried to flush the cache with "rados -p cache-bkp-foo cache-flush-evict-all", but it blocks on the object "rbd_data.f66c92ae8944a.000f2596".
> >
> > So I reduced (a lot) the cache tier to 200MB; "rados -p cache-bkp-foo ls" now shows only 3 objects:
> >
> > rbd_directory
> > rbd_data.f66c92ae8944a.000f2596
> > rbd_header.f66c92ae8944a
> >
> > And "cache-flush-evict-all" still hangs.
> >
> > I also switched the cache tier to "readproxy", to avoid using this cache. But it's still blocked.
> >
> > On Friday, September 21, 2018 at 02:14 +0200, Olivier Bonvalet wrote:
> > > Hello,
> > >
> > > on a Luminous cluster, I have an incomplete PG and I can't find how to fix that.
> > >
> > > It's an EC pool (4+2):
> > >
> > > pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing pool bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for 'incomplete')
> > >
> > > Of course, we can't reduce min_size from 4.
> > >
> > > And the full state: https://pastebin.com/zrwu5X0w
> > >
> > > So, IO is blocked and we can't access the damaged data. OSDs block too:
> > > osds 32,68,69 have stuck requests > 4194.3 sec
> > >
> > > OSD 32 is the primary of this PG, and OSDs 68 and 69 are for cache tiering.
> > >
> > > Any idea how I can fix that?
> > >
> > > Thanks,
> > > Olivier
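The diagnostics Maks and Eugen ask for can be gathered as below. Dry-run sketch (the `ceph` stub only echoes); `list_missing` is a standard follow-up for PGs with missing objects, not something the thread itself ran.

```shell
#!/bin/sh
# Dry run: the ceph() stub prints commands instead of executing them.
ceph() { echo "ceph $*"; }

ceph pg 37.9c query          # full peering / recovery state of the PG
ceph pg 37.9c list_missing   # objects the PG knows it is missing
```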
Re: [ceph-users] PG stuck incomplete
Could you please paste the output of pg 37.9c query?

On Fri, 21 Sep 2018 at 14:39, Olivier Bonvalet wrote:
> In fact, one object (only one) seems to be blocked on the cache tier (writeback).
>
> I tried to flush the cache with "rados -p cache-bkp-foo cache-flush-evict-all", but it blocks on the object "rbd_data.f66c92ae8944a.000f2596".
>
> So I reduced (a lot) the cache tier to 200MB; "rados -p cache-bkp-foo ls" now shows only 3 objects:
>
> rbd_directory
> rbd_data.f66c92ae8944a.000f2596
> rbd_header.f66c92ae8944a
>
> And "cache-flush-evict-all" still hangs.
>
> I also switched the cache tier to "readproxy", to avoid using this cache. But it's still blocked.
>
> On Friday, September 21, 2018 at 02:14 +0200, Olivier Bonvalet wrote:
> > Hello,
> >
> > on a Luminous cluster, I have an incomplete PG and I can't find how to fix that.
> >
> > It's an EC pool (4+2):
> >
> > pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing pool bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for 'incomplete')
> >
> > Of course, we can't reduce min_size from 4.
> >
> > And the full state: https://pastebin.com/zrwu5X0w
> >
> > So, IO is blocked and we can't access the damaged data. OSDs block too:
> > osds 32,68,69 have stuck requests > 4194.3 sec
> >
> > OSD 32 is the primary of this PG, and OSDs 68 and 69 are for cache tiering.
> >
> > Any idea how I can fix that?
> >
> > Thanks,
> > Olivier
Re: [ceph-users] PG stuck incomplete
In fact, one object (only one) seems to be blocked on the cache tier (writeback).

I tried to flush the cache with "rados -p cache-bkp-foo cache-flush-evict-all", but it blocks on the object "rbd_data.f66c92ae8944a.000f2596".

So I reduced (a lot) the cache tier to 200MB; "rados -p cache-bkp-foo ls" now shows only 3 objects:

rbd_directory
rbd_data.f66c92ae8944a.000f2596
rbd_header.f66c92ae8944a

And "cache-flush-evict-all" still hangs.

I also switched the cache tier to "readproxy", to avoid using this cache. But it's still blocked.

On Friday, September 21, 2018 at 02:14 +0200, Olivier Bonvalet wrote:
> Hello,
>
> on a Luminous cluster, I have an incomplete PG and I can't find how to fix that.
>
> It's an EC pool (4+2):
>
> pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing pool bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for 'incomplete')
>
> Of course, we can't reduce min_size from 4.
>
> And the full state: https://pastebin.com/zrwu5X0w
>
> So, IO is blocked and we can't access the damaged data. OSDs block too:
> osds 32,68,69 have stuck requests > 4194.3 sec
>
> OSD 32 is the primary of this PG, and OSDs 68 and 69 are for cache tiering.
>
> Any idea how I can fix that?
>
> Thanks,
> Olivier
Re: [ceph-users] Hyper-v ISCSI support
On Fri, Sep 21, 2018 at 6:48 AM Glen Baars wrote:
> Hello Ceph Users,
>
> We have been using ceph-iscsi-cli for some time now with VMware and it is performing OK.
>
> We would like to use the same iSCSI service to store our Hyper-V VMs via Windows Cluster Shared Volumes. When we add the volume to Windows failover manager we get a "device is not ready" error. I am assuming this is due to SCSI-3 persistent reservations.

That is correct -- the upstream kernel LIO doesn't have any support for distributing SCSI-3 persistent reservations between iSCSI gateways at this time. SUSE has some custom kernel patches to distribute those reservations via the Ceph cluster, but they have previously been rejected from inclusion in the upstream kernel. There is also the PetaSAN project, which is derived from the SUSE kernel plus some other changes.

> Has anyone managed to get Ceph to serve iSCSI to Windows Cluster Shared Volumes? If so, how?
>
> Kind regards,
> Glen Baars

--
Jason
Re: [ceph-users] Hyper-v ISCSI support
Hi Glen,

Yes, you need clustered SCSI-3 persistent reservation support. This is supported in SUSE SLE kernels; you may also be interested in PetaSAN: http://www.petasan.org, which is based on these kernels.

Maged

On 21/09/18 12:48, Glen Baars wrote:
> Hello Ceph Users,
>
> We have been using ceph-iscsi-cli for some time now with VMware and it is performing OK.
>
> We would like to use the same iSCSI service to store our Hyper-V VMs via Windows Cluster Shared Volumes. When we add the volume to Windows failover manager we get a "device is not ready" error. I am assuming this is due to SCSI-3 persistent reservations.
>
> Has anyone managed to get Ceph to serve iSCSI to Windows Cluster Shared Volumes? If so, how?
>
> Kind regards,
> Glen Baars
Re: [ceph-users] PG stuck incomplete
Ok, so it's a replica 3 pool, and OSDs 68 & 69 are on the same host.

On Friday, September 21, 2018 at 11:09, Eugen Block wrote:
> > the cache tier on this pool has 26GB of data (for 5.7TB of data on the EC pool). We tried to flush the cache tier and restart OSDs 68 & 69, without any success.
>
> I meant the replication size of the pool:
>
> ceph osd pool ls detail | grep <pool name>
>
> In the experimental state of our cluster we had a cache tier (for an rbd pool) with size 2, which can cause problems during recovery. Since only OSDs 68 and 69 are mentioned, I was wondering if your cache tier also has size 2.
>
> Quoting Olivier Bonvalet:
> > Hi,
> >
> > the cache tier on this pool has 26GB of data (for 5.7TB of data on the EC pool). We tried to flush the cache tier and restart OSDs 68 & 69, without any success.
> >
> > But I don't see any related data on the cache-tier OSDs (Filestore) with:
> >
> > find /var/lib/ceph/osd/ -maxdepth 3 -name '*37.9c*'
> >
> > I don't see any useful information in the logs. Maybe I should increase the log level?
> >
> > Thanks,
> > Olivier
> >
> > On Friday, September 21, 2018 at 09:34, Eugen Block wrote:
> > > Hi Olivier,
> > >
> > > what size does the cache tier have? You could set cache-mode to forward and flush it; maybe restarting those OSDs (68, 69) helps too. Or there could be an issue with the cache tier -- what do those logs say?
> > >
> > > Regards,
> > > Eugen
> > >
> > > Quoting Olivier Bonvalet:
> > > > Hello,
> > > >
> > > > on a Luminous cluster, I have an incomplete PG and I can't find how to fix that.
> > > >
> > > > It's an EC pool (4+2):
> > > >
> > > > pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing pool bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for 'incomplete')
> > > >
> > > > Of course, we can't reduce min_size from 4.
> > > >
> > > > And the full state: https://pastebin.com/zrwu5X0w
> > > >
> > > > So, IO is blocked and we can't access the damaged data. OSDs block too:
> > > > osds 32,68,69 have stuck requests > 4194.3 sec
> > > >
> > > > OSD 32 is the primary of this PG, and OSDs 68 and 69 are for cache tiering.
> > > >
> > > > Any idea how I can fix that?
> > > >
> > > > Thanks,
> > > > Olivier
Re: [ceph-users] PG stuck incomplete
> the cache tier on this pool has 26GB of data (for 5.7TB of data on the EC pool). We tried to flush the cache tier and restart OSDs 68 & 69, without any success.

I meant the replication size of the pool:

ceph osd pool ls detail | grep <pool name>

In the experimental state of our cluster we had a cache tier (for an rbd pool) with size 2, which can cause problems during recovery. Since only OSDs 68 and 69 are mentioned, I was wondering if your cache tier also has size 2.

Quoting Olivier Bonvalet:
> Hi,
>
> the cache tier on this pool has 26GB of data (for 5.7TB of data on the EC pool). We tried to flush the cache tier and restart OSDs 68 & 69, without any success.
>
> But I don't see any related data on the cache-tier OSDs (Filestore) with:
>
> find /var/lib/ceph/osd/ -maxdepth 3 -name '*37.9c*'
>
> I don't see any useful information in the logs. Maybe I should increase the log level?
>
> Thanks,
> Olivier
>
> On Friday, September 21, 2018 at 09:34, Eugen Block wrote:
> > Hi Olivier,
> >
> > what size does the cache tier have? You could set cache-mode to forward and flush it; maybe restarting those OSDs (68, 69) helps too. Or there could be an issue with the cache tier -- what do those logs say?
> >
> > Regards,
> > Eugen
> >
> > Quoting Olivier Bonvalet:
> > > Hello,
> > >
> > > on a Luminous cluster, I have an incomplete PG and I can't find how to fix that.
> > >
> > > It's an EC pool (4+2):
> > >
> > > pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing pool bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for 'incomplete')
> > >
> > > Of course, we can't reduce min_size from 4.
> > >
> > > And the full state: https://pastebin.com/zrwu5X0w
> > >
> > > So, IO is blocked and we can't access the damaged data. OSDs block too:
> > > osds 32,68,69 have stuck requests > 4194.3 sec
> > >
> > > OSD 32 is the primary of this PG, and OSDs 68 and 69 are for cache tiering.
> > >
> > > Any idea how I can fix that?
> > >
> > > Thanks,
> > > Olivier
Re: [ceph-users] Remotely tell an OSD to stop ?
Thanks!

I was in the process of upgrading, so "noout" was already set, probably preventing setting "noin". I thus just ran "ceph osd set noup", then "ceph osd down <osd-id>", which stopped activity on the disks (probably not enough to clean everything in Bluestore, but I decided to trust its inner workings).

I now have an unbootable XFS root filesystem, some OSDs out but probably OK with their data, and 4x redundancy. I'll pause and think about the next steps with no urgency ;-)

On Friday, September 21, 2018 at 11:09 +0200, Patrick Nawracay wrote:
> Hi,
>
> you'll need to set `noup` to prevent OSDs from being started automatically. The `noin` flag prevents the cluster from setting the OSD `in` again after it has been set `out`.
>
> `ceph osd set noup` before `ceph osd down <osd-id>`
> `ceph osd set noin` before `ceph osd out <osd-id>`
>
> Those global flags (they prevent all OSDs from being automatically set up/in again) can be disabled with unset:
>
> `ceph osd unset <flag>`
>
> Please note that I'm not familiar with recovery of a Ceph cluster; I'm just trying to answer the question, but I don't know if that's the best approach in this case.
>
> Patrick
>
> On 21.09.2018 10:49, Nicolas Huillard wrote:
> > Hi all,
> >
> > One of my servers crashed its root filesystem, i.e. the currently open shell just says "command not found" for any basic command (ls, df, mount, dmesg, etc.). ACPI soft power-off won't work because it needs scripts on /...
> >
> > Before I reset the hardware, I'd like to cleanly stop the OSDs on this server (which still work because they do not need /). I was able to move the MGR off that server with "ceph mgr fail [hostname]". Is it possible to tell the OSDs on that host to stop, from another host? I tried "ceph osd down [osdnumber]", but the OSD just came back "in" immediately.
> >
> > Ceph 12.2.7 on Debian
> >
> > TIA,

--
Nicolas Huillard
Associé fondateur - Directeur Technique - Dolomède
nhuill...@dolomede.fr
Fixe : +33 9 52 31 06 10
Mobile : +33 6 50 27 69 08
http://www.dolomede.fr/
https://www.observatoire-climat-energie.fr/
https://reseauactionclimat.org/planetman/
https://350.org/fr/
https://reporterre.net/
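The flag sequence Patrick describes can be sketched as one dry run below (the `ceph` stub only echoes the commands; the OSD id is an example, not from the thread). On a real cluster you would also give the daemon time to flush before power-cycling the host.

```shell
#!/bin/sh
# Dry run: the ceph() stub prints commands instead of executing them.
ceph() { echo "ceph $*"; }
OSD=12   # example OSD id

ceph osd set noup      # stop the cluster from marking the OSD up again
ceph osd set noin      # stop it from being set back "in" automatically
ceph osd down "$OSD"   # mark the daemon down so it stops serving IO
# ... reset / repair the host ...
ceph osd unset noup
ceph osd unset noin
```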
Re: [ceph-users] PG stuck incomplete
Hi, the cache tier on this pool has 26 GB of data (for 5.7 TB of data on the EC pool). We tried to flush the cache tier and to restart OSDs 68 & 69, without any success. But I don't see any related data on the cache-tier OSDs (FileStore) with: find /var/lib/ceph/osd/ -maxdepth 3 -name '*37.9c*' I don't see any useful information in the logs. Maybe I should increase the log level? Thanks, Olivier On Friday 21 September 2018 at 09:34 +, Eugen Block wrote: > Hi Olivier, > > what size does the cache tier have? You could set cache-mode to > forward and flush it, maybe restarting those OSDs (68, 69) helps, > too. > Or there could be an issue with the cache tier, what do those logs > say? > > Regards, > Eugen > > > Zitat von Olivier Bonvalet : > > > Hello, > > > > on a Luminous cluster, I have a PG incomplete and I can't find how > > to > > fix that. > > > > It's an EC pool (4+2) : > > > > pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing pool > > bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for > > 'incomplete') > > > > Of course, we can't reduce min_size from 4. > > > > And the full state : https://pastebin.com/zrwu5X0w > > > > So, IO are blocked, we can't access thoses damaged data. > > OSD blocks too : > > osds 32,68,69 have stuck requests > 4194.3 sec > > > > OSD 32 is the primary of this PG. > > And OSD 68 and 69 are for cache tiering. > > > > Any idea how can I fix that ? > > > > Thanks, > > > > Olivier > > > > > > ___ > > ceph-users mailing list > > ceph-users@lists.ceph.com > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
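For reference, the forward-and-flush sequence Eugen suggests would look roughly like this (the pool name cache-sb is a placeholder for the actual cache-tier pool, and this sketch assumes the cache tier itself is healthy):

```shell
# Stop new writes from landing in the cache tier (placeholder pool name).
ceph osd tier cache-mode cache-sb forward --yes-i-really-mean-it
# Flush and evict all objects from the cache pool to the backing EC pool.
rados -p cache-sb cache-flush-evict-all
```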
[ceph-users] Hyper-v ISCSI support
Hello Ceph users, We have been using ceph-iscsi-cli for some time now with VMware and it is performing OK. We would like to use the same iSCSI service to store our Hyper-V VMs via Windows Cluster Shared Volumes. When we add the volume to Windows Failover Cluster Manager we get a "device is not ready" error. I am assuming this is due to SCSI-3 persistent reservations. Has anyone managed to get Ceph to serve iSCSI to Windows Cluster Shared Volumes? If so, how? Kind regards, Glen Baars This e-mail is intended solely for the benefit of the addressee(s) and any other named recipient. It is confidential and may contain legally privileged or confidential information. If you are not the recipient, any use, distribution, disclosure or copying of this e-mail is prohibited. The confidentiality and legal privilege attached to this communication is not waived or lost by reason of the mistaken transmission or delivery to you. If you have received this e-mail in error, please notify us immediately. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph-ansible
On Thu, Sep 20, 2018 at 7:04 PM solarflow99 wrote: > > oh, was that all it was... git clone https://github.com/ceph/ceph-ansible/ > I installed the notario package from EPEL, > python2-notario-0.0.11-2.el7.noarch and that's the newest they have Hey Ken, I thought the latest versions were being packaged, is there something I've missed? The tags have changed format it seems, from 0.0.11 > > > > > On Thu, Sep 20, 2018 at 3:57 PM Alfredo Deza wrote: >> >> Not sure how you installed ceph-ansible, the requirements mention a >> version of a dependency (the notario module) which needs to be 0.0.13 >> or newer, and you seem to be using an older one. >> >> >> On Thu, Sep 20, 2018 at 6:53 PM solarflow99 wrote: >> > >> > Hi, trying to get this to do a simple deployment, and I'm getting a strange >> > error, has anyone seen this? I'm using CentOS 7 rel 5, ansible 2.5.3, >> > python version = 2.7.5 >> > >> > I've tried with mimic, luminous and even jewel, no luck at all. >> > >> > >> > >> > TASK [ceph-validate : validate provided configuration] >> > ** >> > task path: >> > /home/jzygmont/ansible/ceph-ansible/roles/ceph-validate/tasks/main.yml:2 >> > Thursday 20 September 2018 14:05:18 -0700 (0:00:05.734) 0:00:37.439 >> > >> > The full traceback is: >> > Traceback (most recent call last): >> > File >> > "/usr/lib/python2.7/site-packages/ansible/executor/task_executor.py", line >> > 138, in run >> > res = self._execute() >> > File >> > "/usr/lib/python2.7/site-packages/ansible/executor/task_executor.py", line >> > 561, in _execute >> > result = self._handler.run(task_vars=variables) >> > File "/home/jzygmont/ansible/ceph-ansible/plugins/actions/validate.py", >> > line 43, in run >> > notario.validate(host_vars, install_options, defined_keys=True) >> > TypeError: validate() got an unexpected keyword argument 'defined_keys' >> > >> > fatal: [172.20.3.178]: FAILED! 
=> { >> > "msg": "Unexpected failure during module execution.", >> > "stdout": "" >> > } >> > >> > NO MORE HOSTS LEFT >> > ** >> > >> > PLAY RECAP >> > ** >> > 172.20.3.178 : ok=25 changed=0 unreachable=0 failed=1 >> > >> > ___ >> > ceph-users mailing list >> > ceph-users@lists.ceph.com >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
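If it helps anyone: the validate() TypeError above is exactly what an old notario (< 0.0.13) produces. One way out, assuming you run ceph-ansible from a git checkout, is to install the dependency via pip instead of the older EPEL package:

```shell
# From the ceph-ansible checkout: install the pinned dependency versions,
# which include notario >= 0.0.13, bypassing the older EPEL build.
cd ceph-ansible
pip install --user -r requirements.txt
# or just the one module:
pip install --user 'notario>=0.0.13'
```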
Re: [ceph-users] Remotely tell an OSD to stop ?
Hi, You won't be able to stop them, but if the OSDs are still running I would just set them as out, wait for all data to be moved from them and then it should be safe to power off the host. --- Alex On Fri, Sep 21, 2018 at 11:50 AM Nicolas Huillard wrote: > > Hi all, > > One of my server crashed its root filesystem, ie. the currently open > shell just says "command not found" for any basic command (ls, df, > mount, dmesg, etc.) > ACPI soft power-off won't work because it needs scripts on /... > > Before I reset the hardware, I'd like to cleanly stop the OSDs on this > server (with still work because they do not need /). > I was able to move the MGR out of that server with "ceph mgr fail > [hostname]". > Is it possible to tell the OSD on that host to stop, from another host? > I tried "ceph osd down [osdnumber]", but the OSD just got back "in" > immediately. > > Ceph 12.2.7 on Debian > > TIA, > > -- > Nicolas Huillard > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
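In case it's useful, the out-and-wait approach Alex describes would be something like the following (OSD ids 12 and 13 are made up for the example):

```shell
# Mark the OSDs on the broken host "out" so Ceph migrates their data away.
ceph osd out 12 13
# Watch recovery until all PGs are active+clean again before powering off.
ceph -s
ceph pg stat
```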
Re: [ceph-users] PG stuck incomplete
Hi Olivier, what size does the cache tier have? You could set cache-mode to forward and flush it, maybe restarting those OSDs (68, 69) helps, too. Or there could be an issue with the cache tier, what do those logs say? Regards, Eugen Quoting Olivier Bonvalet: Hello, on a Luminous cluster, I have a PG incomplete and I can't find how to fix that. It's an EC pool (4+2): pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing pool bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for 'incomplete') Of course, we can't reduce min_size from 4. And the full state: https://pastebin.com/zrwu5X0w So, IO is blocked, we can't access those damaged data. OSDs block too: osds 32,68,69 have stuck requests > 4194.3 sec OSD 32 is the primary of this PG. And OSDs 68 and 69 are for cache tiering. Any idea how I can fix that? Thanks, Olivier ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Remotely tell an OSD to stop ?
Hi, you'll need to set `noup` to prevent OSDs from being marked up again automatically. The `noin` flag prevents the cluster from setting the OSD `in` again after it has been set `out`. `ceph osd set noup` before `ceph osd down ` `ceph osd set noin` before `ceph osd out ` Those global flags (which prevent all OSDs from being automatically set up/in again) can be disabled with unset: `ceph osd unset ` Please note that I'm not familiar with recovery of a Ceph cluster; I'm just trying to answer the question, but don't know if that's the best approach in this case. Patrick On 21.09.2018 10:49, Nicolas Huillard wrote: > Hi all, > > One of my server crashed its root filesystem, ie. the currently open > shell just says "command not found" for any basic command (ls, df, > mount, dmesg, etc.) > ACPI soft power-off won't work because it needs scripts on /... > > Before I reset the hardware, I'd like to cleanly stop the OSDs on this > server (with still work because they do not need /). > I was able to move the MGR out of that server with "ceph mgr fail > [hostname]". > Is it possible to tell the OSD on that host to stop, from another host? > I tried "ceph osd down [osdnumber]", but the OSD just got back "in" > immediately. > > Ceph 12.2.7 on Debian > > TIA, > -- SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg) ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
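Putting the flags together into one sequence, roughly (OSD id 42 is a placeholder, and as said above I'm not certain this is the best recovery approach):

```shell
ceph osd set noup        # booted OSDs will not be marked "up" again
ceph osd set noin        # OSDs will not be marked "in" again automatically
ceph osd down 42         # mark the remote OSD down (placeholder id)
ceph osd out 42          # and out, so it stops receiving I/O
# ... after the host is recovered, restore normal behaviour:
ceph osd unset noup
ceph osd unset noin
```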
[ceph-users] Remotely tell an OSD to stop ?
Hi all, One of my servers crashed its root filesystem, i.e. the currently open shell just says "command not found" for any basic command (ls, df, mount, dmesg, etc.) ACPI soft power-off won't work because it needs scripts on /... Before I reset the hardware, I'd like to cleanly stop the OSDs on this server (which still work because they do not need /). I was able to move the MGR off that server with "ceph mgr fail [hostname]". Is it possible to tell the OSDs on that host to stop, from another host? I tried "ceph osd down [osdnumber]", but the OSD just got back "in" immediately. Ceph 12.2.7 on Debian TIA, -- Nicolas Huillard ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] how dynamic bucket sharding works
Hi Cephers, Could someone explain to me how dynamic bucket index sharding works? I created a test bucket with 4 million objects on Ceph 12.2.8; it showed 80 shards (ver, master_ver, max_marker from 0 to 79 in bucket stats) and I left it for a night. The next morning I found this in the reshard list:

    "time": "2018-09-21 06:15:12.094792Z",
    "tenant": "",
    "bucket_name": "test",
    "bucket_id": "_id_.7827818.1",
    "new_instance_id": "test:_id_.25481437.10",
    "old_num_shards": 8,
    "new_num_shards": 16

During this reshard, bucket stats showed 16 shards (counting ver, master_ver, max_marker from bucket stats on marker _id_.7827818.1). After deleting and re-adding 2 objects, resharding kicked in once more, this time from 16 to 80 shards. The actual bucket stats are:

{
    "bucket": "test",
    "zonegroup": "84d584b4-3e95-49f8-8285-4a704f8252e3",
    "placement_rule": "default-placement",
    "explicit_placement": {
        "data_pool": "",
        "data_extra_pool": "",
        "index_pool": ""
    },
    "id": "_id_.25481803.6",
    "marker": "_id_.7827818.1",
    "index_type": "Normal",
    "owner": "test",
    "ver": "0#789,1#785,2#787,3#782,4#790,5#798,6#784,7#784,8#782,9#791,10#788,11#785,12#786,13#792,14#783,15#783,16#786,17#776,18#787,19#783,20#784,21#785,22#786,23#782,24#787,25#794,26#786,27#789,28#794,29#781,30#785,31#779,32#780,33#776,34#790,35#775,36#780,37#781,38#779,39#782,40#778,41#776,42#774,43#781,44#779,45#785,46#778,47#779,48#783,49#778,50#784,51#779,52#780,53#782,54#781,55#779,56#789,57#783,58#774,59#780,60#779,61#782,62#780,63#775,64#783,65#783,66#781,67#785,68#777,69#785,70#781,71#782,72#778,73#778,74#778,75#777,76#783,77#775,78#790,79#792",
    "master_ver": "0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0,11#0,12#0,13#0,14#0,15#0,16#0,17#0,18#0,19#0,20#0,21#0,22#0,23#0,24#0,25#0,26#0,27#0,28#0,29#0,30#0,31#0,32#0,33#0,34#0,35#0,36#0,37#0,38#0,39#0,40#0,41#0,42#0,43#0,44#0,45#0,46#0,47#0,48#0,49#0,50#0,51#0,52#0,53#0,54#0,55#0,56#0,57#0,58#0,59#0,60#0,61#0,62#0,63#0,64#0,65#0,66#0,67#0,68#0,69#0,70#0,71#0,72#0,73#0,74#0,75#0,76#0,77#0,78#0,79#0",
    "mtime": "2018-09-21 08:40:33.652235",
    "max_marker": "0#,1#,2#,3#,4#,5#,6#,7#,8#,9#,10#,11#,12#,13#,14#,15#,16#,17#,18#,19#,20#,21#,22#,23#,24#,25#,26#,27#,28#,29#,30#,31#,32#,33#,34#,35#,36#,37#,38#,39#,40#,41#,42#,43#,44#,45#,46#,47#,48#,49#,50#,51#,52#,53#,54#,55#,56#,57#,58#,59#,60#,61#,62#,63#,64#,65#,66#,67#,68#,69#,70#,71#,72#,73#,74#,75#,76#,77#,78#,79#",
    "usage": {
        "rgw.none": {
            "size": 0,
            "size_actual": 0,
            "size_utilized": 0,
            "size_kb": 0,
            "size_kb_actual": 0,
            "size_kb_utilized": 0,
            "num_objects": 2
        },
        "rgw.main": {
            "size": 419286170636,
            "size_actual": 421335109632,
            "size_utilized": 0,
            "size_kb": 409459152,
            "size_kb_actual": 411460068,
            "size_kb_utilized": 0,
            "num_objects": 401
        }
    },
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    }
}

My question is: why on earth did Ceph reshard this bucket to 8 shards, then to 16 shards, and then to 80 again after re-adding 2 objects? Additional question: why do we need rgw_reshard_bucket_lock_duration if https://ceph.com/community/new-luminous-rgw-dynamic-bucket-sharding/ states: "...Furthermore, there is no need to stop IO operations that go to the bucket (although some concurrent operations may experience additional latency when resharding is in progress)..."? From my experience it blocks writes completely; only reads work. -- Thanks Tom ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] backup ceph
Hi, On 21.09.18 at 03:28, ST Wong (ITSC) wrote: > Hi, > >>> Will the RAID 6 be mirrored to another storage in remote site for DR >>> purpose? >> >> Not yet. Our goal is to have the backup ceph to which we will replicate >> spread across three different buildings, with 3 replicas. > > May I ask if the backup ceph is a single ceph cluster spanning 3 different > buildings, or composed of 3 ceph clusters in 3 different buildings? Thanks. > This will be a single ceph cluster with a failure domain corresponding to the building and three replicas. To test updates before rolling them out to the full cluster, we will also instantiate a small test cluster separately, but we try to keep the number of production clusters down and rather let Ceph handle failover and replication than doing that ourselves, which also allows us to grow / shrink the cluster more easily as needed ;-). All the best, Oliver > Thanks again for your help. > Best Regards, > /ST Wong > > -Original Message- > From: Oliver Freyermuth > Sent: Thursday, September 20, 2018 2:10 AM > To: ST Wong (ITSC) > Cc: Peter Wienemann ; ceph-users@lists.ceph.com > Subject: Re: [ceph-users] backup ceph > > Hi, > > On 19.09.18 at 18:32, ST Wong (ITSC) wrote: >> Thanks for your help. > > You're welcome! > I should also add we don't have very long-term experience with this yet - > Benji is pretty modern. > >>> For the moment, we use Benji to backup to a classic RAID 6. >> Will the RAID 6 be mirrored to another storage in remote site for DR purpose? > > Not yet. Our goal is to have the backup ceph to which we will replicate > spread across three different buildings, with 3 replicas. > >> >>> For RBD mirroring, you do indeed need another running Ceph Cluster, but we >>> plan to use that in the long run (on separate hardware of course). >> Seems this is the way to go, regardless of additional resources required? 
:) >> Btw, RBD mirroring looks like a DR copy instead of a daily backup from which >> we can restore an image of a particular date? > We would still perform daily snapshots, and keep those both in the RBD mirror > and in the Benji backup. Even when fading out the current RAID 6 machine at > some point, > we'd probably keep Benji and direct its output to a CephFS pool on our > backup Ceph cluster. If anything goes wrong with the mirroring, this still > leaves us > with an independent backup approach. We also keep several days of snapshots > in the production RBD pool to be able to quickly roll back a VM if anything > goes wrong. > With Benji, you can also mount any of these daily snapshots via NBD in case > it is needed, or restore from a specific date. > > All the best, > Oliver > >> >> Thanks again. >> /st wong >> >> -Original Message- >> From: Oliver Freyermuth >> Sent: Wednesday, September 19, 2018 5:28 PM >> To: ST Wong (ITSC) >> Cc: Peter Wienemann ; ceph-users@lists.ceph.com >> Subject: Re: [ceph-users] backup ceph >> >> Hi, >> >> On 19.09.18 at 03:24, ST Wong (ITSC) wrote: >>> Hi, >>> >>> Thanks for your information. >>> May I know more about the backup destination to use? As the size of the >>> cluster will be a bit large (~70TB to start with), we're looking for some >>> efficient method to do that backup. Seems RBD mirroring or incremental >>> snapshots with RBD >>> (https://ceph.com/geen-categorie/incremental-snapshots-with-rbd/) are some >>> ways to go, but requires another running Ceph cluster. Is my understanding >>> correct? Thanks. >> >> For the moment, we use Benji to backup to a classic RAID 6. With Benji, only >> the changed chunks are backed up, and it learns that by asking Ceph for a >> diff of the RBD snapshots. >> So that's really fast after the first backup, and especially if you do >> trimming (e.g. via guest agent if you run VMs) of the RBD volumes before >> backing them up. 
>> The same is true for Backy2, but it does not support compression (which >> really helps by several factors(!) in saving I/O and with zstd it does not >> use much CPU). >> >> For RBD mirroring, you do indeed need another running Ceph Cluster, but we >> plan to use that in the long run (on separate hardware of course). >> >>> Btw, is this one (https://benji-backup.me/) the Benji you're referring to? >>> Thanks a lot. >> >> Exactly :-). >> >> Cheers, >> Oliver >> >>> >>> >>> >>> Cheers, >>> /ST Wong >>> >>> >>> >>> -Original Message- >>> From: Oliver Freyermuth >>> Sent: Tuesday, September 18, 2018 6:09 PM >>> To: ST Wong (ITSC) >>> Cc: Peter Wienemann >>> Subject: Re: [ceph-users] backup ceph >>> >>> Hi, >>> >>> we're also just starting to collect experiences, so we have nothing to >>> share (yet). However, we are evaluating using Benji (a well-maintained fork >>> of Backy2 which can also compress) in addition, trimming and fsfreezing the >>> VM disks shortly before, >>> and additionally keeping a few daily and weekly snapshots. >>> We may
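The snapshot-diff mechanism behind Benji/Backy2 that Oliver describes can be sketched with plain rbd commands (the pool rbd, image vm-disk and snapshot names are placeholders; the real tools manage snapshots and chunk storage for you):

```shell
# Take today's snapshot of the image (placeholder names).
rbd snap create rbd/vm-disk@backup-2018-09-21
# Export only the blocks changed since yesterday's snapshot;
# omit --from-snap on the first run to get a full export.
rbd export-diff --from-snap backup-2018-09-20 \
    rbd/vm-disk@backup-2018-09-21 vm-disk-2018-09-21.diff
```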
Re: [ceph-users] Proxmox/ceph upgrade and addition of a new node/OSDs
Hi Hervé! Thanks for the detailed summary, much appreciated! Best, MJ On 09/21/2018 09:03 AM, Hervé Ballans wrote: Hi MJ (and all), So we upgraded our Proxmox/Ceph cluster, and if we have to summarize the operation in a few words : overall, everything went well :) The most critical operation of all is the 'osd crush tunables optimal', I talk about it in more detail after... The Proxmox documentation is really well written and accurate and, normally, following the documentation step by step is almost sufficient ! * first step : upgrade Ceph Jewel to Luminous : https://pve.proxmox.com/wiki/Ceph_Jewel_to_Luminous (Note here : OSDs remain in FileStore backend, no BlueStore migration) * second step : upgrade Proxmox version 4 to 5 : https://pve.proxmox.com/wiki/Upgrade_from_4.x_to_5.0 Just some numbers, observations and tips (based on our feedback, I'm not an expert !) : * Before migration, make sure you are on the latest version of Proxmox 4 (4.4-24) and Ceph Jewel (10.2.11) * We don't use the pve repository for ceph packages but the official one (download.ceph.com). Thus, during the upgrade of Proxmox PVE, we don't replace the ceph.com repository with the proxmox.com Ceph repository... * When you upgrade Ceph to Luminous (without tunables optimal), there is no impact on Proxmox 4. VMs are still running normally. The side effect (non blocking for the functioning of VMs) is located in the GUI, on the Ceph menu : it can't report the status of the ceph cluster as it has a JSON formatting error (indeed the output of the command 'ceph -s' is completely different, really more readable on Luminous) * A little step is missing in section 8 "Create Manager instances" of the upgrade ceph documentation. As the Ceph manager daemon is new since Luminous, the package doesn't exist on Jewel. So you have to install the ceph-mgr package on each node first before doing 'pveceph createmgr' * The 'osd crush tunables optimal' operation is time consuming ! 
in our case : 5 nodes (PE R730xd), 58 OSDs, replicated (3/2) rbd pool with 2048 pgs and 2 million objects, 22 TB used. The tunables operation took a little more than 24 hours ! * Really take the right time to make the 'tunables optimal' ! We encountered some pgs stuck and blocked requests during this operation. In our case, the involved OSDs were those with a high number of pgs (as they are high capacity disks). The consequences can be critical since it can freeze some VMs (I guess those whose replicas are stored on the stuck pgs ?). The stuck states were corrected by rebooting the involved OSDs. If you can move the disks of your critical VMs to another storage, these VMs should not be impacted by the recovery (we moved some disks to another Ceph cluster while keeping the conf in the Proxmox cluster being updated, and there was no impact) Otherwise : - verify that all your VMs are recently backed up on an external storage (in case of Disaster Recovery Plan !) - if you can, stop all your non-critical VMs (in order to limit client io operations) - if any, wait for the end of current backups then disable datacenter backup (in order to limit client io operations). !! do not forget to re-enable it when all is over !! - if any and if no longer needed, delete your snapshots, it removes many useless objects ! - start the tunables operation outside of major activity periods (night, weekend, ...) and take into account that it can be very slow... There are probably some options to configure in ceph to avoid 'pgs stuck' states, but on our side, as we previously moved our critical VMs' disks, we didn't care about that ! * Anyway, the upgrade step of Proxmox PVE is done easily and quickly (just follow the documentation). Note that you can upgrade Proxmox PVE before doing the 'tunables optimal' operation. Hoping that you will find this information useful, good luck with your very next migration ! 
Hervé On 13/09/2018 at 22:04, mj wrote: Hi Hervé, No answer from me, but just to say that I have exactly the same upgrade path ahead of me. :-) Please report here any tips, tricks, or things you encountered doing the upgrades. It could potentially save us a lot of time. :-) Thanks! MJ On 09/13/2018 05:23 PM, Hervé Ballans wrote: Dear list, I am currently in the process of upgrading Proxmox 4/Jewel to Proxmox5/Luminous. I also have a new node to add to my Proxmox cluster. What I plan to do is the following (from https://pve.proxmox.com/wiki/Ceph_Jewel_to_Luminous): * upgrade Jewel to Luminous * let the "ceph osd crush tunables optimal " command run * upgrade my proxmox to v5 * add the new node (already up to date in v5) * add the new OSDs * let ceph rebalance the lot A couple of questions I have : * would it be a good idea to add the new node+OSDs and run the "tunables optimal" command immediately after, which would maybe gain a little time and avoid two successive pg rebalancing ? * did I miss anything
Re: [ceph-users] Proxmox/ceph upgrade and addition of a new node/OSDs
Hi MJ (and all), So we upgraded our Proxmox/Ceph cluster, and if we have to summarize the operation in a few words : overall, everything went well :) The most critical operation of all is the 'osd crush tunables optimal', I talk about it in more detail after... The Proxmox documentation is really well written and accurate and, normally, following the documentation step by step is almost sufficient ! * first step : upgrade Ceph Jewel to Luminous : https://pve.proxmox.com/wiki/Ceph_Jewel_to_Luminous (Note here : OSDs remain in FileStore backend, no BlueStore migration) * second step : upgrade Proxmox version 4 to 5 : https://pve.proxmox.com/wiki/Upgrade_from_4.x_to_5.0 Just some numbers, observations and tips (based on our feedback, I'm not an expert !) : * Before migration, make sure you are on the latest version of Proxmox 4 (4.4-24) and Ceph Jewel (10.2.11) * We don't use the pve repository for ceph packages but the official one (download.ceph.com). Thus, during the upgrade of Proxmox PVE, we don't replace the ceph.com repository with the proxmox.com Ceph repository... * When you upgrade Ceph to Luminous (without tunables optimal), there is no impact on Proxmox 4. VMs are still running normally. The side effect (non blocking for the functioning of VMs) is located in the GUI, on the Ceph menu : it can't report the status of the ceph cluster as it has a JSON formatting error (indeed the output of the command 'ceph -s' is completely different, really more readable on Luminous) * A little step is missing in section 8 "Create Manager instances" of the upgrade ceph documentation. As the Ceph manager daemon is new since Luminous, the package doesn't exist on Jewel. So you have to install the ceph-mgr package on each node first before doing 'pveceph createmgr' * The 'osd crush tunables optimal' operation is time consuming ! in our case : 5 nodes (PE R730xd), 58 OSDs, replicated (3/2) rbd pool with 2048 pgs and 2 million objects, 22 TB used. 
The tunables operation took a little more than 24 hours ! * Really take the right time to make the 'tunables optimal' ! We encountered some pgs stuck and blocked requests during this operation. In our case, the involved OSDs were those with a high number of pgs (as they are high capacity disks). The consequences can be critical since it can freeze some VMs (I guess those whose replicas are stored on the stuck pgs ?). The stuck states were corrected by rebooting the involved OSDs. If you can move the disks of your critical VMs to another storage, these VMs should not be impacted by the recovery (we moved some disks to another Ceph cluster while keeping the conf in the Proxmox cluster being updated, and there was no impact) Otherwise : - verify that all your VMs are recently backed up on an external storage (in case of Disaster Recovery Plan !) - if you can, stop all your non-critical VMs (in order to limit client io operations) - if any, wait for the end of current backups then disable datacenter backup (in order to limit client io operations). !! do not forget to re-enable it when all is over !! - if any and if no longer needed, delete your snapshots, it removes many useless objects ! - start the tunables operation outside of major activity periods (night, weekend, ...) and take into account that it can be very slow... There are probably some options to configure in ceph to avoid 'pgs stuck' states, but on our side, as we previously moved our critical VMs' disks, we didn't care about that ! * Anyway, the upgrade step of Proxmox PVE is done easily and quickly (just follow the documentation). Note that you can upgrade Proxmox PVE before doing the 'tunables optimal' operation. Hoping that you will find this information useful, good luck with your very next migration ! Hervé On 13/09/2018 at 22:04, mj wrote: Hi Hervé, No answer from me, but just to say that I have exactly the same upgrade path ahead of me. 
:-) Please report here any tips, tricks, or things you encountered doing the upgrades. It could potentially save us a lot of time. :-) Thanks! MJ On 09/13/2018 05:23 PM, Hervé Ballans wrote: Dear list, I am currently in the process of upgrading Proxmox 4/Jewel to Proxmox5/Luminous. I also have a new node to add to my Proxmox cluster. What I plan to do is the following (from https://pve.proxmox.com/wiki/Ceph_Jewel_to_Luminous): * upgrade Jewel to Luminous * let the "ceph osd crush tunables optimal " command run * upgrade my proxmox to v5 * add the new node (already up to date in v5) * add the new OSDs * let ceph rebalance the lot A couple of questions I have : * would it be a good idea to add the new node+OSDs and run the "tunables optimal" command immediately after, which would maybe gain a little time and avoid two successive pg rebalancing ? * did I miss anything in this plan? Regards, Hervé ___ ceph-users mailing list
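For anyone following the same upgrade path: the critical step, plus one common way of softening its impact on client I/O, might look like this (the throttle values are illustrative examples, not tested recommendations):

```shell
# Kick off the big rebalance (can run for >24h on a loaded cluster, see above).
ceph osd crush tunables optimal
# Optionally throttle backfill/recovery so client I/O suffers less
# (values here are illustrative):
ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
# Watch progress:
ceph -s
```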