Re: [ceph-users] Ceph Upgrades - sanity check - MDS steps
Quoting James Wilkins (james.wilk...@fasthosts.com):

> Hi all,
>
> Just want to (double) check something – we’re in the process of
> luminous -> mimic upgrades for all of our clusters – particularly this
> section regarding MDS steps:
>
> • Confirm that only one MDS is online and is rank 0 for your FS: # ceph status
> • Upgrade the last remaining MDS daemon by installing the new packages and restarting the daemon.
>
> Namely – is it required to upgrade the live single MDS in place (and
> thus have downtime whilst the MDS restarts – on our first cluster this was
> typically 10 minutes of downtime) – or can we upgrade the
> standby-replays/standbys first and flip once they are back?

You should upgrade in place (the last remaining MDS), and yes, that causes a
bit of downtime. In our case it takes ~5 s. Make sure to _only_ upgrade the
Ceph packages (no apt upgrade of the whole system), as apt will happily
disable services, start updating initramfs for all installed kernels, etc.
The full system upgrade and reboot can be done later.

This is how we do it:

On the (active) standby (mds2):
  systemctl stop ceph-mds.target

On the active MDS:
  apt update
  apt policy ceph-base   <- check that the available version is indeed the version you want to upgrade to!
  apt install ceph-base ceph-common ceph-fuse ceph-mds ceph-mds-dbg libcephfs2 python-cephfs
  If the MDS doesn't get restarted by the upgrade, do it manually:
  systemctl restart ceph-mds.target   ^^ a bit of downtime
  ceph daemon mds.$id version   <- make sure you are running the upgraded version (or run "ceph versions")

On the standby:
  apt install ceph-base ceph-common ceph-fuse ceph-mds ceph-mds-dbg libcephfs2 python-cephfs
  systemctl restart ceph-mds.target
  ceph daemon mds.$id version   <- make sure you are running the upgraded version

On the active MDS:
  apt upgrade && reboot   (the standby becomes active); wait for HEALTH_OK

On the (now) active MDS (previously the standby):
  apt upgrade && reboot

If you follow this procedure you end up with the same active and standby as
before the upgrades, both up to date, with as little downtime as possible.

That said ... I've accidentally updated a standby MDS to a newer version than
the active one, and this didn't cause any issues (12.2.8 -> 12.2.11) ... but I
would not recommend it.

Gr. Stefan

--
| BIT BV   http://www.bit.nl/   Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
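For reference, here is Stefan's sequence condensed into one runnable sketch. It is only a restatement of the steps above, not an official procedure: the hostnames, the $id variable and the package list come straight from the mail and need adapting to your own cluster, and it is worth checking "ceph -s" / "ceph fs status" between every step.

```
# on the standby MDS (mds2 in the example above): stop it first
systemctl stop ceph-mds.target

# on the active MDS: upgrade only the Ceph packages, nothing else
apt update
apt policy ceph-base        # confirm the candidate is the version you want
apt install ceph-base ceph-common ceph-fuse ceph-mds ceph-mds-dbg libcephfs2 python-cephfs
systemctl restart ceph-mds.target          # <- the short window of downtime
ceph daemon mds.$id version                # or: ceph versions

# on the standby MDS: upgrade and bring it back
apt install ceph-base ceph-common ceph-fuse ceph-mds ceph-mds-dbg libcephfs2 python-cephfs
systemctl restart ceph-mds.target
ceph daemon mds.$id version

# later, one node at a time: full OS upgrade + reboot, waiting for HEALTH_OK in between
apt upgrade && reboot
```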
Re: [ceph-users] Possible to move RBD volumes between pools?
> Both pools are in the same Ceph cluster. Do you have any documentation on
> the live migration process? I'm running 14.2.1

Something like:

```
rbd migration prepare test1 rbd2/test2
rbd migration execute test1
rbd migration commit test1 --force
```

k
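A slightly expanded sketch of the same flow, with a progress check in the middle. The extra "rbd status" and "rbd migration abort" lines are an addition of mine; they are part of Nautilus's live-migration support, but verify the exact behaviour on your 14.2.1 build before relying on them:

```
rbd migration prepare test1 rbd2/test2
rbd status rbd2/test2              # shows migration state/progress while it runs
rbd migration execute test1        # copies the data; clients can keep using the image
rbd migration commit test1 --force
# rbd migration abort test1        # to back out at any point before the commit
```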
Re: [ceph-users] out of date python-rtslib repo on https://shaman.ceph.com/
On 06/17/2019 03:41 AM, Matthias Leopold wrote: > thank you very much for updating python-rtslib!! > could you maybe also do this for tcmu-runner (version 1.4.1)? I am just about to make a new 1.5 release. Give me a week. I am working on a last feature/bug for the gluster team, and then I am going to pass the code to the gluster tcmu-runner devs for some review and testing. > shaman repos are very convenient for installing and updating the ceph > iscsi stack, I would be very happy if I could continue using it > > matthias > > Am 14.06.19 um 18:08 schrieb Matthias Leopold: >> Hi, >> >> to the people running https://shaman.ceph.com/: >> please update the repo for python-rtslib so recent ceph-iscsi packages >> can be installed which need python-rtslib >= 2.1.fb68 >> >> shaman python-rtslib version is 2.1.fb67 >> upstream python-rtslib version is 2.1.fb69 >> >> thanks + thanks for running this service at all >> matthias >> >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ISCSI Setup
Sounds like progress! Thanks for the update! I will see if I can get it working in my test off of the GH site. -Brent -Original Message- From: Michael Christie Sent: Wednesday, June 19, 2019 5:24 PM To: Brent Kennedy ; 'Ceph Users' Subject: Re: [ceph-users] ISCSI Setup On 06/19/2019 12:34 AM, Brent Kennedy wrote: > Recently upgraded a ceph cluster to nautilus 14.2.1 from Luminous, no > issues. One of the reasons for doing so was to take advantage of some > of the new ISCSI updates that were added in Nautilus. I installed > CentOS 7.6 and did all the basic stuff to get the server online. I > then tried to use the > http://docs.ceph.com/docs/nautilus/rbd/iscsi-target-cli/ document and > hit a hard stop. Apparently, the package versions for the required > packages at the top nor the ceph-iscsi exist yet in any repositories. I am in the process of updating the upstream docs (Aaron wrote up the changes to the RHCS docs and I am just converting to the upstream docs and making into patches for a PR, and ceph-ansible (https://github.com/ceph/ceph-ansible/pull/3977) for the transition from ceph-iscsi-cli/config to ceph-iscsi. The upstream GH for ceph-iscsi is here https://github.com/ceph/ceph-iscsi and it is built here: https://shaman.ceph.com/repos/ceph-iscsi/ I think we are just waiting on one last patch for fqdn support from SUSE so we can make a new ceph-iscsi release. > Reminds me of when I first tried to setup RGWs. Is there a hidden > repository somewhere that hosts these required packages? Also, I > found a thread talking about those packages and the instructions being > off, which concerns me. Is there a good tutorial online somewhere? I > saw the ceph-ansible bits, but wasn't sure if that would even work > because of the package issue. I use ansible to deploy machines all > the time. I also wonder if the ISCSI bits are considered production > or Test ( I see RedHat has a bunch of docs talking about using iscsi, > so I would think production ). > > > > Thoughts anyone? > > > > Regards, > > -Brent > > > > Existing Clusters: > > Test: Nautilus 14.2.1 with 3 osd servers, 1 mon/man, 1 gateway ( all > virtual on SSD ) > > US Production(HDD): Nautilus 14.2.1 with 11 osd servers, 3 mons, 4 > gateways behind haproxy LB > > UK Production(HDD): Luminous 12.2.11 with 25 osd servers, 3 mons/man, > 3 gateways behind haproxy LB > > US Production(SSD): Luminous 12.2.11 with 6 osd servers, 3 mons/man, 3 > gateways behind haproxy LB > > > > > > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
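If you do try it straight from the GitHub repo, here is a rough and untested sketch. The dependency versions are the ones from the upstream docs and this thread rather than anything verified here:

```
# dependencies: tcmu-runner >= 1.4.0, python-rtslib >= 2.1.fb68, targetcli
# (built packages for these are on shaman.ceph.com, as discussed in this thread)
git clone https://github.com/ceph/ceph-iscsi.git
cd ceph-iscsi
python setup.py install            # check the repo README for the current install steps
# the repo ships systemd units for rbd-target-api / rbd-target-gw -- install and
# enable them per the README, then continue with the iscsi-target-cli docs
```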
Re: [ceph-users] Possible to move RBD volumes between pools?
Both pools are in the same Ceph cluster. Do you have any documentation on the live migration process? I'm running 14.2.1 On Wed, Jun 19, 2019, 8:35 PM Jason Dillaman wrote: > On Wed, Jun 19, 2019 at 6:25 PM Brett Chancellor > wrote: > > > > Background: We have a few ceph clusters, each serves multiple Openstack > cluster. Each cluster has it's own set of pools. > > > > I'd like to move ~50TB of volumes from an old cluster (we'll call the > pool cluster01-volumes) to an existing pool (cluster02-volumes) to later be > imported by a different Openstack cluster. I could run something like > this... > > rbd export cluster01-volumes/volume-12345 | rbd import > cluster02-volumes/volume-12345 . > > I'm getting a little confused by the dual use of "cluster" for both > Ceph and OpenStack. Are both pools in the same Ceph cluster? If so, > could you just clone the image to the new pool? The Nautilus release > also includes a simple image live migration tool where it creates a > clone, copies the data and all snapshots to the clone, and then > deletes the original image. > > > But that would be slow and duplicate the data which I'd rather not do. > Are there any better ways to it? > > > > Thanks, > > > > -Brett > > ___ > > ceph-users mailing list > > ceph-users@lists.ceph.com > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > > -- > Jason > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] BlueFS spillover detected - 14.2.1
On Thu, 20 Jun 2019 at 09:12, Vitaliy Filippov wrote:

> All values except 4, 30 and 286 GB are currently useless in ceph with
> default rocksdb settings :)

However, several commenters have said that RocksDB needs extra headroom
during compaction, and hence that the DB partition needs to be twice those
sizes, i.e. 8 GB, 60 GB and 600 GB. Does RocksDB spill over to the slow
device during compaction if the DB partition doesn't have enough space?
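Not an answer to the compaction question, but for anyone wanting to check whether a given OSD is actually spilling: the BlueFS counters on the admin socket show it directly. A quick sketch (osd.0 is a placeholder):

```
ceph health detail | grep -i spillover
# per OSD, on the OSD's host:
ceph daemon osd.0 perf dump bluefs | egrep 'db_(total|used)_bytes|slow_(total|used)_bytes'
```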
Re: [ceph-users] Possible to move RBD volumes between pools?
On Wed, Jun 19, 2019 at 6:25 PM Brett Chancellor wrote: > > Background: We have a few ceph clusters, each serves multiple Openstack > cluster. Each cluster has it's own set of pools. > > I'd like to move ~50TB of volumes from an old cluster (we'll call the pool > cluster01-volumes) to an existing pool (cluster02-volumes) to later be > imported by a different Openstack cluster. I could run something like this... > rbd export cluster01-volumes/volume-12345 | rbd import > cluster02-volumes/volume-12345 . I'm getting a little confused by the dual use of "cluster" for both Ceph and OpenStack. Are both pools in the same Ceph cluster? If so, could you just clone the image to the new pool? The Nautilus release also includes a simple image live migration tool where it creates a clone, copies the data and all snapshots to the clone, and then deletes the original image. > But that would be slow and duplicate the data which I'd rather not do. Are > there any better ways to it? > > Thanks, > > -Brett > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Jason ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
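For the clone route Jason mentions (both pools in the same Ceph cluster), here is a sketch of the usual snapshot/clone/flatten sequence. The image names are the ones from the original post, and the protect/unprotect steps can be dropped where clone v2 is enabled:

```
rbd snap create cluster01-volumes/volume-12345@move
rbd snap protect cluster01-volumes/volume-12345@move
rbd clone cluster01-volumes/volume-12345@move cluster02-volumes/volume-12345
rbd flatten cluster02-volumes/volume-12345       # detach the clone from its parent
rbd snap unprotect cluster01-volumes/volume-12345@move
rbd snap rm cluster01-volumes/volume-12345@move
rbd rm cluster01-volumes/volume-12345            # only once you're sure the copy is good
```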
Re: [ceph-users] BlueFS spillover detected - 14.2.1
All values except 4, 30 and 286 GB are currently useless in ceph with default
rocksdb settings :)

That's what you are seeing - all devices just use ~28 GB and everything else
goes to HDDs.

--
With best regards,
Vitaliy Filippov
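For anyone wondering where 4, 30 and 286 GB come from: as far as I understand it, they fall out of the default RocksDB level sizing Ceph ships (256 MB write buffers, max_bytes_for_level_base = 256 MB, multiplier 10), because a level only lives on the DB device if it fits there completely. A back-of-the-envelope sketch, not an exact formula:

```
# approximate level sizes with the defaults:
#   L0 (write buffers) ~ 1 GB,  L1 = 0.25 GB,  L2 = 2.5 GB,  L3 = 25.6 GB,  L4 = 256 GB
# useful DB partition sizes are therefore roughly:
#   L0+L1+L2        ~   4 GB
#   L0+L1+L2+L3     ~  30 GB
#   L0+L1+L2+L3+L4  ~ 286 GB
# anything in between those points just sits unused on the fast device,
# which is why "all devices just use ~28 GB" above.
```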
[ceph-users] Possible to move RBD volumes between pools?
Background: We have a few ceph clusters, each serves multiple Openstack cluster. Each cluster has it's own set of pools. I'd like to move ~50TB of volumes from an old cluster (we'll call the pool cluster01-volumes) to an existing pool (cluster02-volumes) to later be imported by a different Openstack cluster. I could run something like this... rbd export cluster01-volumes/volume-12345 | rbd import cluster02-volumes/volume-12345 . But that would be slow and duplicate the data which I'd rather not do. Are there any better ways to it? Thanks, -Brett ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ISCSI Setup
On 06/19/2019 12:34 AM, Brent Kennedy wrote: > Recently upgraded a ceph cluster to nautilus 14.2.1 from Luminous, no > issues. One of the reasons for doing so was to take advantage of some > of the new ISCSI updates that were added in Nautilus. I installed > CentOS 7.6 and did all the basic stuff to get the server online. I then > tried to use the > http://docs.ceph.com/docs/nautilus/rbd/iscsi-target-cli/ document and > hit a hard stop. Apparently, the package versions for the required > packages at the top nor the ceph-iscsi exist yet in any repositories. I am in the process of updating the upstream docs (Aaron wrote up the changes to the RHCS docs and I am just converting to the upstream docs and making into patches for a PR, and ceph-ansible (https://github.com/ceph/ceph-ansible/pull/3977) for the transition from ceph-iscsi-cli/config to ceph-iscsi. The upstream GH for ceph-iscsi is here https://github.com/ceph/ceph-iscsi and it is built here: https://shaman.ceph.com/repos/ceph-iscsi/ I think we are just waiting on one last patch for fqdn support from SUSE so we can make a new ceph-iscsi release. > Reminds me of when I first tried to setup RGWs. Is there a hidden > repository somewhere that hosts these required packages? Also, I found > a thread talking about those packages and the instructions being off, > which concerns me. Is there a good tutorial online somewhere? I saw > the ceph-ansible bits, but wasn’t sure if that would even work because > of the package issue. I use ansible to deploy machines all the time. I > also wonder if the ISCSI bits are considered production or Test ( I see > RedHat has a bunch of docs talking about using iscsi, so I would think > production ). > > > > Thoughts anyone? > > > > Regards, > > -Brent > > > > Existing Clusters: > > Test: Nautilus 14.2.1 with 3 osd servers, 1 mon/man, 1 gateway ( all > virtual on SSD ) > > US Production(HDD): Nautilus 14.2.1 with 11 osd servers, 3 mons, 4 > gateways behind haproxy LB > > UK Production(HDD): Luminous 12.2.11 with 25 osd servers, 3 mons/man, 3 > gateways behind haproxy LB > > US Production(SSD): Luminous 12.2.11 with 6 osd servers, 3 mons/man, 3 > gateways behind haproxy LB > > > > > > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] MDS getattr op stuck in snapshot
On 13/06/2019 14.31, Hector Martin wrote: > On 12/06/2019 22.33, Yan, Zheng wrote: >> I have tracked down the bug. thank you for reporting this. 'echo 2 > >> /proc/sys/vm/drop_cache' should fix the hang. If you can compile ceph >> from source, please try following patch. >> >> diff --git a/src/mds/Locker.cc b/src/mds/Locker.cc >> index ecd06294fa..94b947975a 100644 >> --- a/src/mds/Locker.cc >> +++ b/src/mds/Locker.cc >> @@ -2956,7 +2956,8 @@ void Locker::handle_client_caps(MClientCaps *m) >> >>// client flushes and releases caps at the same time. make sure >> MDCache::cow_inode() >>// properly setup CInode::client_need_snapflush >> - if ((m->get_dirty() & ~cap->issued()) && !need_snapflush) >> + if (!need_snapflush && (m->get_dirty() & ~cap->issued()) && >> + (m->flags & MClientCaps::FLAG_PENDING_CAPSNAP)) >> cap->mark_needsnapflush(); >> } >> >> >> > > That was quick, thanks! I can build from source but I won't have time to > do so and test it until next week, if that's okay. Okay, I tried building packages for Xenial following this doc, but that didn't go so well: http://docs.ceph.com/docs/mimic/install/build-ceph/ It seems install-deps pulls in a ppa with a newer GCC and libstdc++ (!) and that produces a build that is incompatible with a plain Xenial machine, no PPAs. The version tag is different too (the -1xenial thing isn't present). Is there documentation for how to build Ubuntu packages the exact same way as they are built for download.ceph.com? i.e. ceph-mds-dbg_13.2.6-1xenial_amd64.deb. If I can figure that out I can build a patched mds and test it. -- Hector Martin (hec...@marcansoft.com) Public Key: https://mrcn.st/pub ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
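Not an authoritative answer, but if I remember correctly the Ceph source tree ships a helper for exactly this kind of build. A rough, untested sketch follows; the script name and behaviour are from memory, so check what is actually present in the v13.2.6 tag before relying on it:

```
git clone --branch v13.2.6 https://github.com/ceph/ceph.git
cd ceph
./make-debs.sh /tmp/release     # if present: builds the dist tarball and runs
                                # dpkg-buildpackage against the in-tree debian/ dir
```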
Re: [ceph-users] CephFS damaged and cannot recover
On Wed, Jun 19, 2019 at 9:19 AM Wei Jin wrote: > > There are plenty of data in this cluster (2PB), please help us, thx. > Before doing this dangerous > operations(http://docs.ceph.com/docs/master/cephfs/disaster-recovery-experts/#disaster-recovery-experts) > , any suggestions? > > Ceph version: 12.2.12 > > ceph fs status: > > cephfs - 1057 clients > == > +--+-+-+--+---+---+ > | Rank | State | MDS | Activity | dns | inos | > +--+-+-+--+---+---+ > | 0 | failed | | | | | > | 1 | resolve | n31-023-214 | |0 |0 | > | 2 | resolve | n31-023-215 | |0 |0 | > | 3 | resolve | n31-023-218 | |0 |0 | > | 4 | resolve | n31-023-220 | |0 |0 | > | 5 | resolve | n31-023-217 | |0 |0 | > | 6 | resolve | n31-023-222 | |0 |0 | > | 7 | resolve | n31-023-216 | |0 |0 | > | 8 | resolve | n31-023-221 | |0 |0 | > | 9 | resolve | n31-023-223 | |0 |0 | > | 10 | resolve | n31-023-225 | |0 |0 | > | 11 | resolve | n31-023-224 | |0 |0 | > | 12 | resolve | n31-023-219 | |0 |0 | > | 13 | resolve | n31-023-229 | |0 |0 | > +--+-+-+--+---+---+ > +-+--+---+---+ > | Pool | type | used | avail | > +-+--+---+---+ > | cephfs_metadata | metadata | 2843M | 34.9T | > | cephfs_data | data | 2580T | 731T | > +-+--+---+---+ > > +-+ > | Standby MDS | > +-+ > | n31-023-227 | > | n31-023-226 | > | n31-023-228 | > +-+ Are there failovers occurring while all the ranks are in up:resolve? MDS logs at high debug level would be helpful. -- Patrick Donnelly, Ph.D. He / Him / His Senior Software Engineer Red Hat Sunnyvale, CA GPG: 19F28A586F808C2402351B93C3301A3E258DD79D ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
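In case it helps with gathering those logs: on 12.2.x the MDS debug level can be raised at runtime without a restart. A sketch; the daemon name is taken from the "ceph fs status" output above, and the levels should be dropped again afterwards because the logs grow quickly:

```
# on the MDS host, via the admin socket:
ceph daemon mds.n31-023-214 config set debug_mds 20
ceph daemon mds.n31-023-214 config set debug_ms 1
# or remotely, for all MDS daemons at once:
ceph tell mds.* injectargs '--debug_mds 20 --debug_ms 1'
# logs end up in /var/log/ceph/ceph-mds.<name>.log
```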
[ceph-users] CephFS damaged and cannot recover
There are plenty of data in this cluster (2PB), please help us, thx. Before doing this dangerous operations(http://docs.ceph.com/docs/master/cephfs/disaster-recovery-experts/#disaster-recovery-experts) , any suggestions? Ceph version: 12.2.12 ceph fs status: cephfs - 1057 clients == +--+-+-+--+---+---+ | Rank | State | MDS | Activity | dns | inos | +--+-+-+--+---+---+ | 0 | failed | | | | | | 1 | resolve | n31-023-214 | |0 |0 | | 2 | resolve | n31-023-215 | |0 |0 | | 3 | resolve | n31-023-218 | |0 |0 | | 4 | resolve | n31-023-220 | |0 |0 | | 5 | resolve | n31-023-217 | |0 |0 | | 6 | resolve | n31-023-222 | |0 |0 | | 7 | resolve | n31-023-216 | |0 |0 | | 8 | resolve | n31-023-221 | |0 |0 | | 9 | resolve | n31-023-223 | |0 |0 | | 10 | resolve | n31-023-225 | |0 |0 | | 11 | resolve | n31-023-224 | |0 |0 | | 12 | resolve | n31-023-219 | |0 |0 | | 13 | resolve | n31-023-229 | |0 |0 | +--+-+-+--+---+---+ +-+--+---+---+ | Pool | type | used | avail | +-+--+---+---+ | cephfs_metadata | metadata | 2843M | 34.9T | | cephfs_data | data | 2580T | 731T | +-+--+---+---+ +-+ | Standby MDS | +-+ | n31-023-227 | | n31-023-226 | | n31-023-228 | +-+ ceph fs dump: dumped fsmap epoch 22712 e22712 enable_multiple, ever_enabled_multiple: 0,0 compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2} legacy client fscid: 1 Filesystem 'cephfs' (1) fs_name cephfs epoch 22711 flags 4 created 2018-11-30 10:05:06.015325 modified 2019-06-19 23:37:41.400961 tableserver 0 root 0 session_timeout 60 session_autoclose 300 max_file_size 1099511627776 last_failure 0 last_failure_osd_epoch 22246 compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2} max_mds 14 in 0,1,2,3,4,5,6,7,8,9,10,11,12,13 up {1=31684663,2=31684674,3=31684576,4=31684673,5=31684678,6=31684612,7=31684688,8=31684683,9=31684698,10=31684695,11=31684693,12=31684586,13=31684617} failed damaged 0 stopped data_pools [2] metadata_pool 1 inline_data disabled balancer standby_count_wanted 1 31684663: 10.31.23.214:6800/829459839 'n31-023-214' mds.1.22682 up:resolve seq 6 31684674: 10.31.23.215:6800/2483123757 'n31-023-215' mds.2.22683 up:resolve seq 3 31684576: 10.31.23.218:6800/3381299029 'n31-023-218' mds.3.22683 up:resolve seq 3 31684673: 10.31.23.220:6800/3540255817 'n31-023-220' mds.4.22685 up:resolve seq 3 31684678: 10.31.23.217:6800/4004537495 'n31-023-217' mds.5.22689 up:resolve seq 3 31684612: 10.31.23.222:6800/1482899141 'n31-023-222' mds.6.22691 up:resolve seq 3 31684688: 10.31.23.216:6800/820115186 'n31-023-216' mds.7.22693 up:resolve seq 3 31684683: 10.31.23.221:6800/1996416037 'n31-023-221' mds.8.22693 up:resolve seq 3 31684698: 10.31.23.223:6800/2807778042 'n31-023-223' mds.9.22695 up:resolve seq 3 31684695: 10.31.23.225:6800/101451176 'n31-023-225' mds.10.22702 up:resolve seq 3 31684693: 10.31.23.224:6800/1597373084 'n31-023-224' mds.11.22695 up:resolve seq 3 31684586: 10.31.23.219:6800/3640206080 'n31-023-219' mds.12.22695 up:resolve seq 3 31684617: 10.31.23.229:6800/3511814011 'n31-023-229' mds.13.22697 up:resolve seq 3 Standby daemons: 31684637: 10.31.23.227:6800/1987867930 'n31-023-227' mds.-1.0 up:standby seq 2 31684690: 10.31.23.226:6800/3695913629 'n31-023-226' mds.-1.0 up:standby seq 2 31689991: 
10.31.23.228:6800/2624666750 'n31-023-228' mds.-1.0 up:standby seq 2 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
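Not an answer to what went wrong, but before attempting anything from the disaster-recovery page linked above, the cheap first step that page itself starts with is exporting the journals so later actions stay reversible. A sketch only: adjust the fs name and rank list, and if your 12.2.12 build does not accept --rank, check cephfs-journal-tool --help:

```
# back up the journal of every active rank before touching anything
for rank in $(seq 0 13); do
    cephfs-journal-tool --rank=cephfs:$rank journal export /root/backup.mds$rank.bin
done
```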
Re: [ceph-users] Ceph crush map randomly changes for one host
Could it because all the osds in it are set reweight =0? -75.45695 host ceph-osd3 26 hdd 1.81898 osd.26 down0 1.0 27 hdd 1.81898 osd.27 down0 1.0 30 hdd 1.81898 osd.30 down0 1.0 Best, Feng On Wed, Jun 19, 2019 at 11:36 AM Pelletier, Robert wrote: > Here is a fuller picture. I inherited this ceph cluster from a previous > admin whom has left the company. Although I am a linux administrator, I > have very little experience with ceph and have had to learn; definitely > still a lot to learn. I do know this crush map was made manually. To me it > does not look right and would like to reorganize it, but I am concerned > about what effects that would have with a cluster that has data on it. > > > > I would like to remove both of the osd3-shelf1 and osd3-shelf2 chassis > buckets and move them to host ceph-osd3 (I don’t see a need from separate > buckets here). The “chassis” are actually two SAS disk shelves connected to > ceph-osd3 host. > > > > However, just moving one osd causes ceph to go unhealthy with > OBJECT_MISPLACED messages and takes a while to go back into a healthy > state. I am not too sure this is a big concern, but I am wondering if there > is a recommended procedure for doing this. As I am just learning, I don’t > want to do anything that will cause data loss. > > > > > > > > > > My Tree is supposed to look like below but it keeps changing to the map > further > below. Notice the drives moving from chassis osd3-shelf1 chassis to host > ceph-osd3. Does anyone know why this may happen? > > > > I wrote a script to monitor for this and to place the osds back where they > belong if they notice the change, but this should obviously not be > necessary. I would appreciate any help with this. > > > > > > > > ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF > > -62 14.55199 root osd3-shelf2 > > -60 14.55199 chassis ceph-osd3-shelf2 > > 3 hdd 1.81898 osd.3down 1.0 1.0 > > 40 hdd 1.81898 osd.40 down 1.0 1.0 > > 41 hdd 1.81898 osd.41 down 1.0 1.0 > > 42 hdd 1.81898 osd.42 down 1.0 1.0 > > 43 hdd 1.81898 osd.43 down 1.0 1.0 > > 44 hdd 1.81898 osd.44 down 1.0 1.0 > > 45 hdd 1.81898 osd.45 down 1.0 1.0 > > 46 hdd 1.81898 osd.46 down 1.0 1.0 > > -581.71599 root osd3-internal > > -541.71599 chassis ceph-osd3-internal > > 34 hdd 0.42899 osd.34 down 1.0 1.0 > > 35 hdd 0.42899 osd.35 down 1.0 1.0 > > 36 hdd 0.42899 osd.36 down 1.0 1.0 > > 37 hdd 0.42899 osd.37 down 1.0 1.0 > > -50 14.55199 root osd3-shelf1 > > -56 14.55199 chassis ceph-osd3-shelf1 > > 21 hdd 1.81898 osd.21 down 1.0 1.0 > > 22 hdd 1.81898 osd.22 down 1.0 1.0 > > 23 hdd 1.81898 osd.23 down 1.0 1.0 > > 24 hdd 1.81898 osd.24 down 1.0 1.0 > > 25 hdd 1.81898 osd.25 down 1.0 1.0 > > 28 hdd 1.81898 osd.28 down 1.0 1.0 > > 29 hdd 1.81898 osd.29 down 1.0 1.0 > > 31 hdd 1.81898 osd.31 down 1.0 1.0 > > -75.45695 host ceph-osd3 > > 26 hdd 1.81898 osd.26 down0 1.0 > > 27 hdd 1.81898 osd.27 down0 1.0 > > 30 hdd 1.81898 osd.30 down0 1.0 > > -1 47.21199 root default > > -40 23.59000 rack mainehall > > -3 23.59000 host ceph-osd1 > > 0 hdd 1.81898 osd.0 up 1.0 1.0 > > 1 hdd 1.81898 osd.1 up 1.0 1.0 > > 2 hdd 1.81898 osd.2 up 1.0 1.0 > > 4 hdd 1.81898 osd.4 up 1.0 1.0 > > 5 hdd 1.81898 osd.5 up 0.90002 1.0 > > 6 hdd 1.81898 osd.6 up 1.0 1.0 > > 7 hdd 1.81898 osd.7 up 1.0 1.0 > > 8 hdd 1.81898 osd.8 up 1.0 1.0 > > 9 hdd 1.81898 osd.9 up 1.0 1.0 > > 10 hdd 1.81898 osd.10 up 0.95001 1.0 > > 33 hdd 1.76099
Re: [ceph-users] Ceph crush map randomly changes for one host
Here is a fuller picture. I inherited this ceph cluster from a previous admin whom has left the company. Although I am a linux administrator, I have very little experience with ceph and have had to learn; definitely still a lot to learn. I do know this crush map was made manually. To me it does not look right and would like to reorganize it, but I am concerned about what effects that would have with a cluster that has data on it. I would like to remove both of the osd3-shelf1 and osd3-shelf2 chassis buckets and move them to host ceph-osd3 (I don’t see a need from separate buckets here). The “chassis” are actually two SAS disk shelves connected to ceph-osd3 host. However, just moving one osd causes ceph to go unhealthy with OBJECT_MISPLACED messages and takes a while to go back into a healthy state. I am not too sure this is a big concern, but I am wondering if there is a recommended procedure for doing this. As I am just learning, I don’t want to do anything that will cause data loss. My Tree is supposed to look like below but it keeps changing to the map further below. Notice the drives moving from chassis osd3-shelf1 chassis to host ceph-osd3. Does anyone know why this may happen? I wrote a script to monitor for this and to place the osds back where they belong if they notice the change, but this should obviously not be necessary. I would appreciate any help with this. ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -62 14.55199 root osd3-shelf2 -60 14.55199 chassis ceph-osd3-shelf2 3 hdd 1.81898 osd.3down 1.0 1.0 40 hdd 1.81898 osd.40 down 1.0 1.0 41 hdd 1.81898 osd.41 down 1.0 1.0 42 hdd 1.81898 osd.42 down 1.0 1.0 43 hdd 1.81898 osd.43 down 1.0 1.0 44 hdd 1.81898 osd.44 down 1.0 1.0 45 hdd 1.81898 osd.45 down 1.0 1.0 46 hdd 1.81898 osd.46 down 1.0 1.0 -581.71599 root osd3-internal -541.71599 chassis ceph-osd3-internal 34 hdd 0.42899 osd.34 down 1.0 1.0 35 hdd 0.42899 osd.35 down 1.0 1.0 36 hdd 0.42899 osd.36 down 1.0 1.0 37 hdd 0.42899 osd.37 down 1.0 1.0 -50 14.55199 root osd3-shelf1 -56 14.55199 chassis ceph-osd3-shelf1 21 hdd 1.81898 osd.21 down 1.0 1.0 22 hdd 1.81898 osd.22 down 1.0 1.0 23 hdd 1.81898 osd.23 down 1.0 1.0 24 hdd 1.81898 osd.24 down 1.0 1.0 25 hdd 1.81898 osd.25 down 1.0 1.0 28 hdd 1.81898 osd.28 down 1.0 1.0 29 hdd 1.81898 osd.29 down 1.0 1.0 31 hdd 1.81898 osd.31 down 1.0 1.0 -75.45695 host ceph-osd3 26 hdd 1.81898 osd.26 down0 1.0 27 hdd 1.81898 osd.27 down0 1.0 30 hdd 1.81898 osd.30 down0 1.0 -1 47.21199 root default -40 23.59000 rack mainehall -3 23.59000 host ceph-osd1 0 hdd 1.81898 osd.0 up 1.0 1.0 1 hdd 1.81898 osd.1 up 1.0 1.0 2 hdd 1.81898 osd.2 up 1.0 1.0 4 hdd 1.81898 osd.4 up 1.0 1.0 5 hdd 1.81898 osd.5 up 0.90002 1.0 6 hdd 1.81898 osd.6 up 1.0 1.0 7 hdd 1.81898 osd.7 up 1.0 1.0 8 hdd 1.81898 osd.8 up 1.0 1.0 9 hdd 1.81898 osd.9 up 1.0 1.0 10 hdd 1.81898 osd.10 up 0.95001 1.0 33 hdd 1.76099 osd.33 up 1.0 1.0 38 hdd 3.63899 osd.38 up 1.0 1.0 -42 23.62199 rack rangleyhall -5 23.62199 host ceph-osd2 11 hdd 1.81898 osd.11 up 1.0 1.0 12 hdd 1.81898 osd.12 up 0.90002 1.0 13 hdd 1.81898 osd.13 up 1.0 1.0 14 hdd 1.81898 osd.14 up 1.0 1.0 15 hdd 1.81898 osd.15 up 1.0 1.0 16 hdd 1.81898 osd.16 up 1
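One hedged guess about the "randomly changes" part: by default every OSD re-asserts its own CRUSH location when its daemon starts (osd crush update on start = true), and unless a crush location hook says otherwise that default location is simply host=<hostname> under root=default, which would put the shelf OSDs back under host ceph-osd3 exactly as described. If this map really is meant to be maintained by hand, pinning it is one option; a sketch to test on one node first:

```
# ceph.conf on the OSD hosts (at least on ceph-osd3), then restart the OSDs one by one
[osd]
osd crush update on start = false
```

For the planned reorganisation itself, wrapping the batch of "ceph osd crush move" commands between "ceph osd set norebalance" and "ceph osd unset norebalance" avoids data shuffling after every single move, so the backfill happens once at the end instead.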
Re: [ceph-users] Stop metadata sync in multi-site RGW
Hi Casey, thanks for the quick reply. The goal is pause replication for a while. Thanks a lot, i'll try this rgw_run_sync_thread. Marcelo M. Miziara Serviço Federal de Processamento de Dados - SERPRO marcelo.mizi...@serpro.gov.br - Mensagem original - De: "Casey Bodley" Para: "ceph-users" Enviadas: Quarta-feira, 19 de junho de 2019 11:54:18 Assunto: Re: [ceph-users] Stop metadata sync in multi-site RGW Right, the sync_from fields in the zone configuration only relate to data sync within the zonegroup. Can you clarify what your goal is? Are you just trying to pause the replication for a while, or disable it permanently? To pause replication, you can configure rgw_run_sync_thread=0 on all gateways in that zone. Just note that replication logs will continue to grow, and because this 'paused' zone isn't consuming them, it will prevent the logs from being trimmed on all zones until sync is reenabled and replication catches up. To disable replication entirely, you'd want to move that zone out of the multisite configuration. This would involve removing the zone from its current zonegroup, creating a new realm and zonegroup, moving the zone into that, and setting its log_data/log_meta fields to false. I can follow up with radosgw-admin commands if that's what you're trying to do. On 6/19/19 10:14 AM, Marcelo Mariano Miziara wrote: > Hello all! > > I'm trying to stop the sync from two zones, but using the parameter > "--sync_from_all=false" seems to stop only the data sync, but not the > metadata (i.e. users and buckets are synced). > > > # radosgw-admin sync status > realm (xx) > zonegroup (xx) > zone (xx) > metadata sync syncing > full sync: 0/64 shards > incremental sync: 64/64 shards > metadata is caught up with master > data sync source: (xx) > not syncing from zone > > Thanks, > Marcelo M. > Serviço Federal de Processamento de Dados - SERPRO > marcelo.mizi...@serpro.gov.br > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com - "Esta mensagem do SERVIÇO FEDERAL DE PROCESSAMENTO DE DADOS (SERPRO), empresa pública federal regida pelo disposto na Lei Federal nº 5.615, é enviada exclusivamente a seu destinatário e pode conter informações confidenciais, protegidas por sigilo profissional. Sua utilização desautorizada é ilegal e sujeita o infrator às penas da lei. Se você a recebeu indevidamente, queira, por gentileza, reenviá-la ao emitente, esclarecendo o equívoco." "This message from SERVIÇO FEDERAL DE PROCESSAMENTO DE DADOS (SERPRO) -- a government company established under Brazilian law (5.615/70) -- is directed exclusively to its addressee and may contain confidential data, protected under professional secrecy rules. Its unauthorized use is illegal and may subject the transgressor to the law's penalties. If you're not the addressee, please send it back, elucidating the failure." ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
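For the archives, the pause Casey describes is just a config option on every radosgw in the zone being paused. A sketch, with the client section and unit names as placeholders for your actual gateway instances:

```
# ceph.conf on each gateway of the paused zone
[client.rgw.gateway1]
rgw_run_sync_thread = 0

# then restart the gateways
systemctl restart ceph-radosgw@rgw.gateway1
```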
Re: [ceph-users] Stop metadata sync in multi-site RGW
Right, the sync_from fields in the zone configuration only relate to data sync within the zonegroup. Can you clarify what your goal is? Are you just trying to pause the replication for a while, or disable it permanently? To pause replication, you can configure rgw_run_sync_thread=0 on all gateways in that zone. Just note that replication logs will continue to grow, and because this 'paused' zone isn't consuming them, it will prevent the logs from being trimmed on all zones until sync is reenabled and replication catches up. To disable replication entirely, you'd want to move that zone out of the multisite configuration. This would involve removing the zone from its current zonegroup, creating a new realm and zonegroup, moving the zone into that, and setting its log_data/log_meta fields to false. I can follow up with radosgw-admin commands if that's what you're trying to do. On 6/19/19 10:14 AM, Marcelo Mariano Miziara wrote: Hello all! I'm trying to stop the sync from two zones, but using the parameter "--sync_from_all=false" seems to stop only the data sync, but not the metadata (i.e. users and buckets are synced). # radosgw-admin sync status realm (xx) zonegroup (xx) zone (xx) metadata sync syncing full sync: 0/64 shards incremental sync: 64/64 shards metadata is caught up with master data sync source: (xx) not syncing from zone Thanks, Marcelo M. Serviço Federal de Processamento de Dados - SERPRO marcelo.mizi...@serpro.gov.br - "Esta mensagem do SERVIÇO FEDERAL DE PROCESSAMENTO DE DADOS (SERPRO), empresa pública federal regida pelo disposto na Lei Federal nº 5.615, é enviada exclusivamente a seu destinatário e pode conter informações confidenciais, protegidas por sigilo profissional. Sua utilização desautorizada é ilegal e sujeita o infrator às penas da lei. Se você a recebeu indevidamente, queira, por gentileza, reenviá-la ao emitente, esclarecendo o equívoco." "This message from SERVIÇO FEDERAL DE PROCESSAMENTO DE DADOS (SERPRO) -- a government company established under Brazilian law (5.615/70) -- is directed exclusively to its addressee and may contain confidential data, protected under professional secrecy rules. Its unauthorized use is illegal and may subject the transgressor to the law's penalties. If you're not the addressee, please send it back, elucidating the failure." ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Stop metadata sync in multi-site RGW
Hello all! I'm trying to stop the sync from two zones, but using the parameter "--sync_from_all=false" seems to stop only the data sync, but not the metadata (i.e. users and buckets are synced). # radosgw-admin sync status realm (xx) zonegroup (xx) zone (xx) metadata sync syncing full sync: 0/64 shards incremental sync: 64/64 shards metadata is caught up with master data sync source: (xx) not syncing from zone Thanks, Marcelo M. Serviço Federal de Processamento de Dados - SERPRO marcelo.mizi...@serpro.gov.br - "Esta mensagem do SERVIÇO FEDERAL DE PROCESSAMENTO DE DADOS (SERPRO), empresa pública federal regida pelo disposto na Lei Federal nº 5.615, é enviada exclusivamente a seu destinatário e pode conter informações confidenciais, protegidas por sigilo profissional. Sua utilização desautorizada é ilegal e sujeita o infrator às penas da lei. Se você a recebeu indevidamente, queira, por gentileza, reenviá-la ao emitente, esclarecendo o equívoco." "This message from SERVIÇO FEDERAL DE PROCESSAMENTO DE DADOS (SERPRO) -- a government company established under Brazilian law (5.615/70) -- is directed exclusively to its addressee and may contain confidential data, protected under professional secrecy rules. Its unauthorized use is illegal and may subject the transgressor to the law's penalties. If you're not the addressee, please send it back, elucidating the failure." ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Reduced data availability: 2 pgs inactive
Hi Paul,

thanks for the hint. Restarting the primary OSDs of the inactive PGs resolved
the problem. Before restarting them they said:

2019-06-19 15:55:36.190 7fcd55c4e700 -1 osd.5 33858 get_health_metrics reporting 15 slow ops, oldest is osd_op(client.220116.0:967410 21.2e4s0 21.d4e19ae4 (undecoded) ondisk+write+known_if_redirected e31569)

and

2019-06-19 15:53:31.214 7f9b946d1700 -1 osd.13 33849 get_health_metrics reporting 14560 slow ops, oldest is osd_op(mds.0.44294:99584053 23.5 23.cad28605 (undecoded) ondisk+write+known_if_redirected+full_force e31562)

Is this something to worry about?

Regards,
Lars

Wed, 19 Jun 2019 15:04:06 +0200, Paul Emmerich wrote:
> That shouldn't trigger the PG limit (yet), but increasing "mon max pg per
> osd" from the default of 200 is a good idea anyways since you are running
> with more than 200 PGs per OSD.
>
> I'd try to restart all OSDs that are in the UP set for that PG:
> 13, 21, 23, 7, 29, 9, 28, 11, 8
>
> Maybe that solves it (technically it shouldn't); if that doesn't work
> you'll have to dig deeper into the log files to see where exactly and
> why it is stuck activating.
>
> Paul

--
Informationstechnologie
Berlin-Brandenburgische Akademie der Wissenschaften
Jägerstraße 22-23, 10117 Berlin
Tel.: +49 30 20370-352   http://www.bbaw.de
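For anyone hitting the same thing later, the loop is roughly: find the acting primary of the stuck PG, look at what its slow ops are waiting on, and restart it. A sketch using the PG and OSD ids from this thread:

```
ceph pg dump_stuck inactive              # list the stuck PGs
ceph pg map 21.2e4                       # up/acting set; the first OSD listed is the primary
ceph daemon osd.5 dump_ops_in_flight     # on the primary's host: inspect the slow ops
systemctl restart ceph-osd@5             # restart the primary (then the next OSD in the set if needed)
ceph -s                                  # watch the PG go active again
```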
Re: [ceph-users] Reduced data availability: 2 pgs inactive
That shouldn't trigger the PG limit (yet), but increasing "mon max pg per osd" from the default of 200 is a good idea anyways since you are running with more than 200 PGs per OSD. I'd try to restart all OSDs that are in the UP set for that PG: 13, 21, 23 7, 29, 9, 28, 11, 8 Maybe that solves it (technically it shouldn't), if that doesn't work you'll have to dig in deeper into the log files to see where exactly and why it is stuck activating. Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München www.croit.io Tel: +49 89 1896585 90 On Wed, Jun 19, 2019 at 2:30 PM Lars Täuber wrote: > Hi Paul, > > thanks for your reply. > > Wed, 19 Jun 2019 13:19:55 +0200 > Paul Emmerich ==> Lars Täuber : > > Wild guess: you hit the PG hard limit, how many PGs per OSD do you have? > > If this is the case: increase "osd max pg per osd hard ratio" > > > > Check "ceph pg query" to see why it isn't activating. > > > > Can you share the output of "ceph osd df tree" and "ceph pg query" > > of the affected PGs? > > The pg queries are attached. I can't read them - to much information. > > > Here is the osd df tree: > # osd df tree > ID CLASS WEIGHTREWEIGHT SIZERAW USE DATAOMAPMETA > AVAIL %USE VAR PGS STATUS TYPE NAME > -1 167.15057- 167 TiB 4.7 TiB 1.2 TiB 952 MiB 57 GiB 162 > TiB 2.79 1.00 -root PRZ > -1772.43192- 72 TiB 2.0 TiB 535 GiB 393 MiB 25 GiB 70 > TiB 2.78 1.00 -rack 1-eins > -922.28674- 22 TiB 640 GiB 170 GiB 82 MiB 9.0 GiB 22 > TiB 2.80 1.01 -host onode1 > 2 hdd 5.57169 1.0 5.6 TiB 162 GiB 45 GiB 11 MiB 2.3 GiB 5.4 > TiB 2.84 1.02 224 up osd.2 > 9 hdd 5.57169 1.0 5.6 TiB 156 GiB 39 GiB 19 MiB 2.1 GiB 5.4 > TiB 2.74 0.98 201 up osd.9 > 14 hdd 5.57169 1.0 5.6 TiB 162 GiB 44 GiB 24 MiB 2.1 GiB 5.4 > TiB 2.84 1.02 230 up osd.14 > 21 hdd 5.57169 1.0 5.6 TiB 160 GiB 42 GiB 27 MiB 2.5 GiB 5.4 > TiB 2.80 1.00 219 up osd.21 > -1322.28674- 22 TiB 640 GiB 170 GiB 123 MiB 8.9 GiB 22 > TiB 2.80 1.00 -host onode4 > 4 hdd 5.57169 1.0 5.6 TiB 156 GiB 39 GiB 38 MiB 2.2 GiB 5.4 > TiB 2.73 0.98 205 up osd.4 > 11 hdd 5.57169 1.0 5.6 TiB 164 GiB 47 GiB 24 MiB 2.0 GiB 5.4 > TiB 2.87 1.03 241 up osd.11 > 18 hdd 5.57169 1.0 5.6 TiB 159 GiB 42 GiB 31 MiB 2.5 GiB 5.4 > TiB 2.79 1.00 221 up osd.18 > 22 hdd 5.57169 1.0 5.6 TiB 160 GiB 43 GiB 29 MiB 2.1 GiB 5.4 > TiB 2.81 1.01 225 up osd.22 > -527.85843- 28 TiB 782 GiB 195 GiB 188 MiB 6.9 GiB 27 > TiB 2.74 0.98 -host onode7 > 5 hdd 5.57169 1.0 5.6 TiB 158 GiB 41 GiB 26 MiB 1.2 GiB 5.4 > TiB 2.77 0.99 213 up osd.5 > 12 hdd 5.57169 1.0 5.6 TiB 159 GiB 42 GiB 31 MiB 993 MiB 5.4 > TiB 2.79 1.00 222 up osd.12 > 20 hdd 5.57169 1.0 5.6 TiB 157 GiB 40 GiB 47 MiB 1.2 GiB 5.4 > TiB 2.76 0.99 212 up osd.20 > 27 hdd 5.57169 1.0 5.6 TiB 151 GiB 33 GiB 28 MiB 1.9 GiB 5.4 > TiB 2.64 0.95 179 up osd.27 > 29 hdd 5.57169 1.0 5.6 TiB 156 GiB 39 GiB 56 MiB 1.7 GiB 5.4 > TiB 2.74 0.98 203 up osd.29 > -1844.57349- 45 TiB 1.3 TiB 341 GiB 248 MiB 14 GiB 43 > TiB 2.81 1.01 -rack 2-zwei > -722.28674- 22 TiB 641 GiB 171 GiB 132 MiB 6.7 GiB 22 > TiB 2.81 1.01 -host onode2 > 1 hdd 5.57169 1.0 5.6 TiB 155 GiB 38 GiB 35 MiB 1.2 GiB 5.4 > TiB 2.72 0.97 203 up osd.1 > 8 hdd 5.57169 1.0 5.6 TiB 163 GiB 46 GiB 36 MiB 2.4 GiB 5.4 > TiB 2.86 1.02 243 up osd.8 > 16 hdd 5.57169 1.0 5.6 TiB 161 GiB 43 GiB 24 MiB 1000 MiB 5.4 > TiB 2.82 1.01 221 up osd.16 > 23 hdd 5.57169 1.0 5.6 TiB 162 GiB 45 GiB 37 MiB 2.1 GiB 5.4 > TiB 2.84 1.02 228 up osd.23 > -322.28674- 22 TiB 640 GiB 170 GiB 116 MiB 7.6 GiB 22 > TiB 2.80 1.00 -host onode5 > 3 hdd 
5.57169 1.0 5.6 TiB 154 GiB 36 GiB 14 MiB 1010 MiB 5.4 > TiB 2.70 0.97 186 up osd.3 > 7 hdd 5.57169 1.0 5.6 TiB 161 GiB 44 GiB 22 MiB 2.2 GiB 5.4 > TiB 2.82 1.01 221 up osd.7 > 15 hdd 5.57169 1.0 5.6 TiB 165 GiB 48 GiB 26 MiB 2.3 GiB 5.4 > TiB 2.89 1.04 249 up osd.15 > 24 hdd 5.57169 1.0 5.6 TiB 160 GiB 42 GiB 54 MiB 2.1 GiB 5.4 > TiB 2.80 1.00 223 up osd.24 > -1950.14517- 50 TiB 1.4 TiB 376 GiB 311 MiB 18 GiB 49 >
Re: [ceph-users] Reduced data availability: 2 pgs inactive
Hi Paul, thanks for your reply. Wed, 19 Jun 2019 13:19:55 +0200 Paul Emmerich ==> Lars Täuber : > Wild guess: you hit the PG hard limit, how many PGs per OSD do you have? > If this is the case: increase "osd max pg per osd hard ratio" > > Check "ceph pg query" to see why it isn't activating. > > Can you share the output of "ceph osd df tree" and "ceph pg query" > of the affected PGs? The pg queries are attached. I can't read them - to much information. Here is the osd df tree: # osd df tree ID CLASS WEIGHTREWEIGHT SIZERAW USE DATAOMAPMETA AVAIL %USE VAR PGS STATUS TYPE NAME -1 167.15057- 167 TiB 4.7 TiB 1.2 TiB 952 MiB 57 GiB 162 TiB 2.79 1.00 -root PRZ -1772.43192- 72 TiB 2.0 TiB 535 GiB 393 MiB 25 GiB 70 TiB 2.78 1.00 -rack 1-eins -922.28674- 22 TiB 640 GiB 170 GiB 82 MiB 9.0 GiB 22 TiB 2.80 1.01 -host onode1 2 hdd 5.57169 1.0 5.6 TiB 162 GiB 45 GiB 11 MiB 2.3 GiB 5.4 TiB 2.84 1.02 224 up osd.2 9 hdd 5.57169 1.0 5.6 TiB 156 GiB 39 GiB 19 MiB 2.1 GiB 5.4 TiB 2.74 0.98 201 up osd.9 14 hdd 5.57169 1.0 5.6 TiB 162 GiB 44 GiB 24 MiB 2.1 GiB 5.4 TiB 2.84 1.02 230 up osd.14 21 hdd 5.57169 1.0 5.6 TiB 160 GiB 42 GiB 27 MiB 2.5 GiB 5.4 TiB 2.80 1.00 219 up osd.21 -1322.28674- 22 TiB 640 GiB 170 GiB 123 MiB 8.9 GiB 22 TiB 2.80 1.00 -host onode4 4 hdd 5.57169 1.0 5.6 TiB 156 GiB 39 GiB 38 MiB 2.2 GiB 5.4 TiB 2.73 0.98 205 up osd.4 11 hdd 5.57169 1.0 5.6 TiB 164 GiB 47 GiB 24 MiB 2.0 GiB 5.4 TiB 2.87 1.03 241 up osd.11 18 hdd 5.57169 1.0 5.6 TiB 159 GiB 42 GiB 31 MiB 2.5 GiB 5.4 TiB 2.79 1.00 221 up osd.18 22 hdd 5.57169 1.0 5.6 TiB 160 GiB 43 GiB 29 MiB 2.1 GiB 5.4 TiB 2.81 1.01 225 up osd.22 -527.85843- 28 TiB 782 GiB 195 GiB 188 MiB 6.9 GiB 27 TiB 2.74 0.98 -host onode7 5 hdd 5.57169 1.0 5.6 TiB 158 GiB 41 GiB 26 MiB 1.2 GiB 5.4 TiB 2.77 0.99 213 up osd.5 12 hdd 5.57169 1.0 5.6 TiB 159 GiB 42 GiB 31 MiB 993 MiB 5.4 TiB 2.79 1.00 222 up osd.12 20 hdd 5.57169 1.0 5.6 TiB 157 GiB 40 GiB 47 MiB 1.2 GiB 5.4 TiB 2.76 0.99 212 up osd.20 27 hdd 5.57169 1.0 5.6 TiB 151 GiB 33 GiB 28 MiB 1.9 GiB 5.4 TiB 2.64 0.95 179 up osd.27 29 hdd 5.57169 1.0 5.6 TiB 156 GiB 39 GiB 56 MiB 1.7 GiB 5.4 TiB 2.74 0.98 203 up osd.29 -1844.57349- 45 TiB 1.3 TiB 341 GiB 248 MiB 14 GiB 43 TiB 2.81 1.01 -rack 2-zwei -722.28674- 22 TiB 641 GiB 171 GiB 132 MiB 6.7 GiB 22 TiB 2.81 1.01 -host onode2 1 hdd 5.57169 1.0 5.6 TiB 155 GiB 38 GiB 35 MiB 1.2 GiB 5.4 TiB 2.72 0.97 203 up osd.1 8 hdd 5.57169 1.0 5.6 TiB 163 GiB 46 GiB 36 MiB 2.4 GiB 5.4 TiB 2.86 1.02 243 up osd.8 16 hdd 5.57169 1.0 5.6 TiB 161 GiB 43 GiB 24 MiB 1000 MiB 5.4 TiB 2.82 1.01 221 up osd.16 23 hdd 5.57169 1.0 5.6 TiB 162 GiB 45 GiB 37 MiB 2.1 GiB 5.4 TiB 2.84 1.02 228 up osd.23 -322.28674- 22 TiB 640 GiB 170 GiB 116 MiB 7.6 GiB 22 TiB 2.80 1.00 -host onode5 3 hdd 5.57169 1.0 5.6 TiB 154 GiB 36 GiB 14 MiB 1010 MiB 5.4 TiB 2.70 0.97 186 up osd.3 7 hdd 5.57169 1.0 5.6 TiB 161 GiB 44 GiB 22 MiB 2.2 GiB 5.4 TiB 2.82 1.01 221 up osd.7 15 hdd 5.57169 1.0 5.6 TiB 165 GiB 48 GiB 26 MiB 2.3 GiB 5.4 TiB 2.89 1.04 249 up osd.15 24 hdd 5.57169 1.0 5.6 TiB 160 GiB 42 GiB 54 MiB 2.1 GiB 5.4 TiB 2.80 1.00 223 up osd.24 -1950.14517- 50 TiB 1.4 TiB 376 GiB 311 MiB 18 GiB 49 TiB 2.79 1.00 -rack 3-drei -1522.28674- 22 TiB 649 GiB 179 GiB 112 MiB 8.2 GiB 22 TiB 2.84 1.02 -host onode3 0 hdd 5.57169 1.0 5.6 TiB 162 GiB 45 GiB 28 MiB 996 MiB 5.4 TiB 2.84 1.02 229 up osd.0 10 hdd 5.57169 1.0 5.6 TiB 159 GiB 42 GiB 21 MiB 2.2 GiB 5.4 TiB 2.79 1.00 213 up osd.10 17 hdd 5.57169 1.0 5.6 TiB 165 GiB 47 GiB 19 MiB 2.5 GiB 5.4 TiB 2.88 1.03 238 up osd.17 25 hdd 5.57169 1.0 5.6 TiB 163 GiB 46 GiB 
44 MiB 2.5 GiB 5.4 TiB 2.86 1.03 242 up osd.25 -1127.85843- 28 TiB 784 GiB 197 GiB 199 MiB 9.4 GiB 27 TiB 2.75 0.99 -host onode6 6 hdd 5.5
Re: [ceph-users] Reduced data availability: 2 pgs inactive
Wild guess: you hit the PG hard limit, how many PGs per OSD do you have? If this is the case: increase "osd max pg per osd hard ratio" Check "ceph pg query" to see why it isn't activating. Can you share the output of "ceph osd df tree" and "ceph pg query" of the affected PGs? Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München www.croit.io Tel: +49 89 1896585 90 On Wed, Jun 19, 2019 at 8:52 AM Lars Täuber wrote: > Hi there! > > Recently I made our cluster rack aware > by adding racks to the crush map. > The failure domain was and still is "host". > > rule cephfs2_data { > id 7 > type erasure > min_size 3 > max_size 6 > step set_chooseleaf_tries 5 > step set_choose_tries 100 > step take PRZ > step chooseleaf indep 0 type host > step emit > > > Then I sorted the hosts into the new > rack buckets of the crush map as they > are in reality, by: > # osd crush move onodeX rack=XYZ > for all hosts. > > The cluster started to reorder the data. > > In the end the cluster has now: > HEALTH_WARN 1 filesystem is degraded; Reduced data availability: 2 pgs > inactive; Degraded data redundancy: 678/2371785 objects degraded (0.029%), > 2 pgs degraded, 2 pgs undersized > FS_DEGRADED 1 filesystem is degraded > fs cephfs_1 is degraded > PG_AVAILABILITY Reduced data availability: 2 pgs inactive > pg 21.2e4 is stuck inactive for 142792.952697, current state > activating+undersized+degraded+remapped+forced_backfill, last acting > [5,2147483647,25,28,11,2] > pg 23.5 is stuck inactive for 142791.437243, current state > activating+undersized+degraded+remapped+forced_backfill, last acting [13,21] > PG_DEGRADED Degraded data redundancy: 678/2371785 objects degraded > (0.029%), 2 pgs degraded, 2 pgs undersized > pg 21.2e4 is stuck undersized for 142779.321192, current state > activating+undersized+degraded+remapped+forced_backfill, last acting > [5,2147483647,25,28,11,2] > pg 23.5 is stuck undersized for 142789.747915, current state > activating+undersized+degraded+remapped+forced_backfill, last acting [13,21] > > The cluster hosts a cephfs which is > not mountable anymore. > > I tried a few things (as you can see: > forced_backfill), but failed. > > The cephfs_data pool is EC 4+2. > Both inactive pgs seem to have enough > copies to recalculate the contents for > all osds. > > Is there a chance to get both pgs > clean again? > > How can I force the pgs to recalculate > all necessary copies? > > > Thanks > Lars > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Debian Buster builds
On Tue, Jun 18, 2019 at 6:29 PM Daniel Baumann wrote: > On 6/18/19 3:39 PM, Paul Emmerich wrote: > > we maintain (unofficial) Nautilus builds for Buster here: > > https://mirror.croit.io/debian-nautilus/ > > the repository doesn't contain the source packages. just out of > curiosity to see what you might have changes, apart from just > (re)building the packages.. are they available somewhere? > we (currently) don't apply any patches on Nautilus, some of the older Mimic packages have a few bug fixes applied. We build the packages from tags here: https://github.com/croit/ceph, i.e., the 14.2.1 packages are https://github.com/croit/ceph/tree/v14.2.1 Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München www.croit.io Tel: +49 89 1896585 90 > > Regards, > Daniel > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Nautilus HEALTH_WARN for msgr2 protocol
Aha, yes, that does help! I tried a lot of variations but couldn't quite get
it to work, so I used the simpler alternative instead. Thanks!

On Wed, 19 Jun 2019 at 09:21, Dominik Csapak wrote:
> On 6/14/19 6:10 PM, Bob Farrell wrote:
> > Hi. Firstly thanks to all involved in this great mailing list, I learn
> > lots from it every day.
>
> Hi,
>
> > I never figured out the correct syntax to set up the first monitor to
> > use both 6789 and 3300. The other monitors that join the cluster set
> > this config automatically but I couldn't work out how to apply it to the
> > first monitor node.
>
> I struggled with this myself yesterday and found that the relevant
> argument is not really documented:
>
> monmaptool --create --addv ID [v1:ip:6789,v2:ip:3300] /path/to/monmap
>
> hope this helps :)
Re: [ceph-users] Nautilus HEALTH_WARN for msgr2 protocol
On 6/14/19 6:10 PM, Bob Farrell wrote:
> Hi. Firstly thanks to all involved in this great mailing list, I learn
> lots from it every day.

Hi,

> I never figured out the correct syntax to set up the first monitor to
> use both 6789 and 3300. The other monitors that join the cluster set
> this config automatically but I couldn't work out how to apply it to the
> first monitor node.

I struggled with this myself yesterday and found that the relevant argument
is not really documented:

monmaptool --create --addv ID [v1:ip:6789,v2:ip:3300] /path/to/monmap

hope this helps :)
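A fuller sketch of bootstrapping the first monitor with both addresses, for the archives. The fsid, mon name and IP are placeholders, and on a monitor that is already running, "ceph mon enable-msgr2" is usually all that is needed instead:

```
monmaptool --create --fsid $(uuidgen) \
    --addv mon1 '[v1:192.168.0.10:6789,v2:192.168.0.10:3300]' /tmp/monmap
monmaptool --print /tmp/monmap       # verify both addresses are present
ceph-mon --mkfs -i mon1 --monmap /tmp/monmap --keyring /tmp/ceph.mon.keyring
```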