[ceph-users] Pool Max Avail and Ceph Dashboard Pool Usage on Nautilus giving different percentages
Hi! While browsing /#/pool in the Nautilus ceph dashboard I noticed it said 93% used on the single pool we have (3x replica). ceph df detail however shows 81% used on the pool and 67% raw usage.

# ceph df detail
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
    ssd       478 TiB     153 TiB     324 TiB     325 TiB          67.96
    TOTAL     478 TiB     153 TiB     324 TiB     325 TiB          67.96

POOLS:
    POOL     ID     STORED      OBJECTS     USED        %USED     MAX AVAIL     QUOTA OBJECTS     QUOTA BYTES     DIRTY      USED COMPR     UNDER COMPR
    echo      3     108 TiB      29.49M     324 TiB     81.61        24 TiB     N/A               N/A             29.49M     0 B            0 B

I know we're looking at the most full OSD (210 PGs, 79% used, 1.17 VAR) and count MAX AVAIL from that. But where does the 93% full in the dashboard come from? My guess is that it comes from calculating:

1 - MAX AVAIL / (USED + MAX AVAIL) = 0.93

Kind Regards,
David Majchrzak
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
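Both percentages can be reproduced from the numbers in that output. A small sketch (using the rounded TiB values shown above, so the results differ slightly from the exact figures ceph computes internally):

```python
# Rounded TiB values taken from the "ceph df detail" output above.
stored = 108.0      # STORED: client-visible data (per-replica)
used = 324.0        # USED: raw space consumed (3x replication)
max_avail = 24.0    # MAX AVAIL: usable space left, bounded by the fullest OSD

# ceph df's pool %USED compares per-replica figures:
ceph_df_pct = stored / (stored + max_avail) * 100

# The dashboard appears to compare raw USED against MAX AVAIL,
# matching the guess in the mail:
dashboard_pct = (1 - max_avail / (used + max_avail)) * 100

print(round(ceph_df_pct, 1), round(dashboard_pct, 1))
```

With these inputs the first formula lands near the 81.61% ceph df reports, and the second near the dashboard's 93%, which supports the guess that the dashboard mixes raw USED with per-replica MAX AVAIL.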
Re: [ceph-users] Tuning Nautilus for flash only
Paul, Absolutely, I said I was looking at those settings and most didn't make any sense to me in a production environment (we've been running ceph since Dumpling). However we only have 1 cluster on Bluestore and I wanted to get some opinions if anything other than the defaults in ceph.conf or sysctl or things like Wido suggested with c-states would make any differences. (Thank you Wido!) Yes, running benchmarks is great, and we're already doing that ourselves. Cheers and have a nice evening! -- David Majchrzak On tor, 2019-11-28 at 17:46 +0100, Paul Emmerich wrote: > Please don't run this config in production. > Disabling checksumming is a bad idea, disabling authentication is > also > pretty bad. > > There are also a few options in there that no longer exist (osd op > threads) or are no longer relevant (max open files), in general, you > should not blindly copy config files you find on the Internet. Only > set an option to its non-default value after carefully checking what > it does and whether it applies to your use case. > > Also, run benchmarks yourself. Use benchmarks that are relevant to > your use case. > > Paul > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Tuning Nautilus for flash only
Hi! We've deployed a new flash-only ceph cluster running Nautilus and I'm currently looking at any tunables we should set to get the most out of our NVMe SSDs. I've been looking a bit at the options from the blog post here: https://ceph.io/community/bluestore-default-vs-tuned-performance-comparison/ with the conf here: https://gist.github.com/likid0/1b52631ff5d0d649a22a3f30106ccea7 However some of them, like disabling checksumming, are for testing speed only and not really applicable in a real-life scenario with critical data. Should we stick with defaults or is there anything that could help? We have 256 GB of RAM on each OSD host, 8 OSD hosts with 10 SSDs each, and 2 OSD daemons on each SSD. Raise the SSD bluestore cache to 8 GB? Workload is about 50/50 r/w ops running qemu VMs through librbd, so mixed block sizes. 3 replicas. Appreciate any advice! Kind Regards, -- David Majchrzak ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
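For reference, on Nautilus the usual knob for this is osd_memory_target, under which BlueStore autotunes its cache. The fragment below is only an illustration of the sizing discussed above, not a recommendation: with 20 OSD daemons per 256 GB host, 8 GiB each budgets roughly 160 GiB for OSDs.

```ini
[osd]
# Illustrative sizing only: 8 GiB x 20 daemons = ~160 GiB of 256 GiB RAM,
# leaving headroom for the OS, page cache and recovery memory spikes.
osd_memory_target = 8589934592
```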
Re: [ceph-users] eu.ceph.com mirror out of sync?
Hi, I'll have a look at the status of se.ceph.com tomorrow morning, it's maintained by us. Kind Regards, David On mån, 2019-09-23 at 22:41 +0200, Oliver Freyermuth wrote: > Hi together, > > the EU mirror still seems to be out-of-sync - does somebody on this > list happen to know whom to contact about this? > Or is this mirror unmaintained and we should switch to something > else? > > Going through the list of appropriate mirrors from > https://docs.ceph.com/docs/master/install/mirrors/ (we are in > Germany) I also find: >http://de.ceph.com/ > (the mirror in Germany) to be non-resolvable. > > Closest by then for us is possibly France: >http://fr.ceph.com/rpm-nautilus/el7/x86_64/ > but also here, there's only 14.2.2, so that's also out-of-sync. > > So in the EU, at least geographically, this only leaves Sweden and > UK. > Sweden at se.ceph.com does not load for me, but UK indeed seems fine. > > Should people in the EU use that mirror, or should we all just use > download.ceph.com instead of something geographically close-by? > > Cheers, > Oliver > > > On 2019-09-17 23:01, Oliver Freyermuth wrote: > > Dear Cephalopodians, > > > > I realized just now that: > >https://eu.ceph.com/rpm-nautilus/el7/x86_64/ > > still holds only released up to 14.2.2, and nothing is to be seen > > of 14.2.3 or 14.2.4, > > while the main repository at: > >https://download.ceph.com/rpm-nautilus/el7/x86_64/ > > looks as expected. > > > > Is this issue with the eu.ceph.com mirror already knwon? > > > > Cheers, > > Oliver > > > > > > ___ > > ceph-users mailing list > > ceph-users@lists.ceph.com > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Testing a hypothetical crush map
Hi Andras, From what I can tell you can run crushtool with --test
http://docs.ceph.com/docs/master/man/8/crushtool/
http://cephnotes.ksperis.com/blog/2015/02/02/crushmap-example-of-a-hierarchical-cluster-map

David Majchrzak
CTO ODERLAND Webbhotell AB
E // da...@oderland.se
P // +46.313616161
A // Östra Hamngatan 50B, 411 09 Göteborg
W // https://www.oderland.se

On aug 6 2018, at 1:56 pm, Andras Pataki wrote:
>
> Hi cephers,
> Is there a way to see what a crush map change does to the PG mappings
> (i.e. what placement groups end up on what OSDs) without actually
> setting the crush map (and have the map take effect)? I'm looking for
> some way I could test hypothetical crush map changes without any effect
> on the running system. 
> > Andras > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
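Building on the crushtool pointer above, the offline workflow looks roughly like this (a sketch; rule number 0 and the 0-1023 sample range are placeholders, adjust them to the pool being tested):

```shell
# Grab the live CRUSH map and decompile it to editable text
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# ... edit crushmap.txt with the hypothetical change ...

# Recompile and simulate input -> OSD mappings without touching the cluster
crushtool -c crushmap.txt -o crushmap.new
crushtool -i crushmap.new --test --show-mappings \
    --rule 0 --num-rep 3 --min-x 0 --max-x 1023
```

Running the same `--test` invocation against the original and modified maps and diffing the output gives a rough picture of how many mappings would move.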
Re: [ceph-users] Error: journal specified but not allowed by osd backend
Thanks Eugen! I was looking into running all the commands manually, following the docs for add/remove osd but tried ceph-disk first. I actually made it work by changing the id part in ceph-disk ( it was checking the wrong journal device, which was owned by root:root ). The next part was that I tried re-using an old journal, so I had to create a new one ( parted / sgdisk to set ceph-journal parttype). Could I have just zapped the previous journal? After that it prepared successfully and starting peering. Unsetting nobackfill let it recover a 4TB HDD in approx 9 hours. The best part was that I didn't have to backfill twice then, by reusing the osd uuid. I'll see if I can add to the docs after we have updated to Luminous or Mimic and started using ceph-volume. Kind Regards David Majchrzak On aug 3 2018, at 4:16 pm, Eugen Block wrote: > > Hi, > we have a full bluestore cluster and had to deal with read errors on > the SSD for the block.db. Something like this helped us to recreate a > pre-existing OSD without rebalancing, just refilling the PGs. I would > zap the journal device and let it recreate. It's very similar to your > ceph-deploy output, but maybe you get more of it if you run it manually: > > ceph-osd [--cluster-uuid ] [--osd-objectstore filestore] > --mkfs -i --osd-journal --osd-data > /var/lib/ceph/osd/ceph-/ --mkjournal --setuser ceph --setgroup > ceph --osd-uuid > > Maybe after zapping the journal this will work. At least it would rule > out the old journal as the show-stopper. > > Regards, > Eugen > > > Zitat von David Majchrzak : > > Hi! > > Trying to replace an OSD on a Jewel cluster (filestore data on HDD + > > journal device on SSD). > > I've set noout and removed the flapping drive (read errors) and > > replaced it with a new one. > > > > I've taken down the osd UUID to be able to prepare the new disk with > > the same osd.ID. The journal device is the same as the previous one > > (should I delete the partition and recreate it?) 
> > However, running ceph-disk prepare returns: > > # ceph-disk -v prepare --cluster-uuid > > c51a2683-55dc-4634-9d9d-f0fec9a6f389 --osd-uuid > > dc49691a-2950-4028-91ea-742ffc9ed63f --journal-dev --data-dev > > --fs-type xfs /dev/sdo /dev/sda8 > > command: Running command: /usr/bin/ceph-osd --check-allows-journal > > -i 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph > > --setuser ceph --setgroup ceph > > command: Running command: /usr/bin/ceph-osd --check-wants-journal -i > > 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph > > --setuser ceph --setgroup ceph > > command: Running command: /usr/bin/ceph-osd --check-needs-journal -i > > 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph > > --setuser ceph --setgroup ceph > > Traceback (most recent call last): > > File "/usr/sbin/ceph-disk", line 9, in > > load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')() > > File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5371, in run > > main(sys.argv[1:]) > > File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5322, in > > main > > args.func(args) > > File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 1900, in > > main > > Prepare.factory(args).prepare() > > File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line > > 1896, in factory > > return PrepareFilestore(args) > > File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line > > 1909, in __init__ > > self.journal = PrepareJournal(args) > > File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line > > 2221, in __init__ > > raise Error('journal specified but not allowed by osd backend') > > ceph_disk.main.Error: Error: journal specified but not allowed by osd > > backend > > > > I tried googling first of course. It COULD be that we have set > > setuser_match_path globally in ceph.conf (like this bug report: > > https://tracker.ceph.com/issues/19642) since the cluster was created > > as dumpling a long time ago. 
> > Best practice to fix it? Create [osd.X] configs and set > > setuser_match_path in there instead for the old OSDs? > > Should I do any other steps preceding this if I want to use the same > > osd UUID? I've only stopped ceph-osd@21, removed the physical disk, > > inserted new one and tried running prepare. > > Kind Regards, > > David > > > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
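On the "could I have just zapped the previous journal?" question from the follow-up: recreating the journal partition with the Ceph journal partition type is the usual alternative to reusing it. A sketch, where the device, partition number and size are placeholders for this particular setup:

```shell
# Placeholders: /dev/sda is the journal SSD, partition 8, 10 GiB.
# Delete the stale journal partition and recreate it with the
# "Ceph journal" GPT type GUID so udev/ceph-disk recognize it.
sgdisk --delete=8 /dev/sda
sgdisk --new=8:0:+10G \
       --typecode=8:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sda
partprobe /dev/sda
```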
Re: [ceph-users] Error: journal specified but not allowed by osd backend
Hm. You are right. Seems ceph-osd uses id 0 in main.py. I'll have a look in my dev cluster and see if it helps things.

/usr/lib/python2.7/dist-packages/ceph_disk/main.py:

def check_journal_reqs(args):
    _, _, allows_journal = command([
        'ceph-osd', '--check-allows-journal',
        '-i', '0',
        '--log-file', '$run_dir/$cluster-osd-check.log',
        '--cluster', args.cluster,
        '--setuser', get_ceph_user(),
        '--setgroup', get_ceph_group(),
    ])
    _, _, wants_journal = command([
        'ceph-osd', '--check-wants-journal',
        '-i', '0',
        '--log-file', '$run_dir/$cluster-osd-check.log',
        '--cluster', args.cluster,
        '--setuser', get_ceph_user(),
        '--setgroup', get_ceph_group(),
    ])
    _, _, needs_journal = command([
        'ceph-osd', '--check-needs-journal',
        '-i', '0',
        '--log-file', '$run_dir/$cluster-osd-check.log',
        '--cluster', args.cluster,
        '--setuser', get_ceph_user(),
        '--setgroup', get_ceph_group(),
    ])
    return (not allows_journal, not wants_journal, not needs_journal)

# ceph-osd --help
usage: ceph-osd -i <ID>
  --osd-data PATH           data directory
  --osd-journal PATH        journal file or block device
  --mkfs                    create a [new] data directory
  --convert-filestore       run any pending upgrade operations
  --flush-journal           flush all data out of journal
  --mkjournal               initialize a new journal
  --check-wants-journal     check whether a journal is desired
  --check-allows-journal    check whether a journal is allowed
  --check-needs-journal     check whether a journal is required
  --debug_osd <N>           set debug level (e.g. 10)
  --get-device-fsid PATH    get OSD fsid for the given block device
  --conf/-c FILE            read configuration from the given configuration file
  --id/-i ID                set ID portion of my name
  --name/-n TYPE.ID         set name
  --cluster NAME            set cluster name (default: ceph)
  --setuser USER            set uid to user or uid (and gid to user's gid)
  --setgroup GROUP          set gid to group or gid
  --version                 show version and quit
  -d                        run in foreground, log to stderr.
  -f                        run in foreground, log to usual location.
  --debug_ms N              set message debug level (e.g. 1)

On aug 2 2018, at 11:57 am, Konstantin Shalygin wrote:
>
> > ceph_disk.main.Error: Error: journal specified but not allowed by osd
> > backend
>
> I faced this issue once before.
> The problem is that the function queries osd.0 instead of your osd.21.
> Change in main.py
>
> '-i', '0',
> to 21 (your osd number)
> '-i', '21',
> and try again.
>
> k
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Error: journal specified but not allowed by osd backend
Hi! Trying to replace an OSD on a Jewel cluster (filestore data on HDD + journal device on SSD). I've set noout and removed the flapping drive (read errors) and replaced it with a new one.

I've taken down the osd UUID to be able to prepare the new disk with the same osd.ID. The journal device is the same as the previous one (should I delete the partition and recreate it?)

However, running ceph-disk prepare returns:

# ceph-disk -v prepare --cluster-uuid c51a2683-55dc-4634-9d9d-f0fec9a6f389 --osd-uuid dc49691a-2950-4028-91ea-742ffc9ed63f --journal-dev --data-dev --fs-type xfs /dev/sdo /dev/sda8
command: Running command: /usr/bin/ceph-osd --check-allows-journal -i 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph --setuser ceph --setgroup ceph
command: Running command: /usr/bin/ceph-osd --check-wants-journal -i 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph --setuser ceph --setgroup ceph
command: Running command: /usr/bin/ceph-osd --check-needs-journal -i 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph --setuser ceph --setgroup ceph
Traceback (most recent call last):
  File "/usr/sbin/ceph-disk", line 9, in <module>
    load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5371, in run
    main(sys.argv[1:])
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5322, in main
    args.func(args)
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 1900, in main
    Prepare.factory(args).prepare()
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 1896, in factory
    return PrepareFilestore(args)
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 1909, in __init__
    self.journal = PrepareJournal(args)
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 2221, in __init__
    raise Error('journal specified but not allowed by osd backend')
ceph_disk.main.Error: Error: journal specified but not allowed by osd backend

I tried googling first of course. It COULD be that we have set setuser_match_path globally in ceph.conf (like this bug report: https://tracker.ceph.com/issues/19642) since the cluster was created as dumpling a long time ago.

Best practice to fix it? Create [osd.X] configs and set setuser_match_path in there instead for the old OSDs? Should I do any other steps preceding this if I want to use the same osd UUID? I've only stopped ceph-osd@21, removed the physical disk, inserted the new one and tried running prepare.

Kind Regards,
David
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] PGs stuck peering (looping?) after upgrade to Luminous.
Hi/Hej Magnus, We had a similar issue going from latest hammer to jewel (so might not be applicable for you), with PGs stuck peering / data misplaced, right after updating all mons to latest jewel at that time 10.2.10. Finally setting the require_jewel_osds put everything back in place ( we were going to do this after restarting all OSDs, following the docs/changelogs ). What does your ceph health detail look like? Did you perform any other commands after starting your mon upgrade? Any commands that might change the crush-map might cause issues AFAIK (correct me if im wrong, but i think we ran into this once) if your mons and osds are different versions. // david On jul 12 2018, at 11:45 am, Magnus Grönlund wrote: > > Hi list, > > Things went from bad to worse, tried to upgrade some OSDs to Luminous to see > if that could help but that didn’t appear to make any difference. > But for each restarted OSD there was a few PGs that the OSD seemed to > “forget” and the number of undersized PGs grew until some PGs had been > “forgotten” by all 3 acting OSDs and became stale, even though all OSDs (and > their disks) where available. > Then the OSDs grew so big that the servers ran out of memory (48GB per server > with 10 2TB-disks per server) and started killing the OSDs… > All OSDs where then shutdown to try and preserve some data on the disks at > least, but maybe it is too late? > > /Magnus > > 2018-07-11 21:10 GMT+02:00 Magnus Grönlund (mailto:mag...@gronlund.se)>: > > Hi Paul, > > > > No all OSDs are still jewel , the issue started before I had even started > > to upgrade the first OSD and they don't appear to be flapping. > > ceph -w shows a lot of slow request etc, but nothing unexpected as far as I > > can tell considering the state the cluster is in. 
> > > > 2018-07-11 20:40:09.396642 osd.37 [WRN] 100 slow requests, 2 included > > below; oldest blocked for > 25402.278824 secs > > 2018-07-11 20:40:09.396652 osd.37 [WRN] slow request 1920.957326 seconds > > old, received at 2018-07-11 20:08:08.439214: > > osd_op(client.73540057.0:8289463 2.e57b3e32 (undecoded) > > ack+ondisk+retry+write+known_if_redirected e160294) currently waiting for > > peered > > 2018-07-11 20:40:09.396660 osd.37 [WRN] slow request 1920.048094 seconds > > old, received at 2018-07-11 20:08:09.348446: > > osd_op(client.671628641.0:998704 2.42f88232 (undecoded) > > ack+ondisk+retry+write+known_if_redirected e160475) currently waiting for > > peered > > 2018-07-11 20:40:10.397008 osd.37 [WRN] 100 slow requests, 2 included > > below; oldest blocked for > 25403.279204 secs > > 2018-07-11 20:40:10.397017 osd.37 [WRN] slow request 1920.043860 seconds > > old, received at 2018-07-11 20:08:10.353060: > > osd_op(client.231731103.0:1007729 3.e0ff5786 (undecoded) > > ondisk+write+known_if_redirected e137428) currently waiting for peered > > 2018-07-11 20:40:10.397023 osd.37 [WRN] slow request 1920.034101 seconds > > old, received at 2018-07-11 20:08:10.362819: > > osd_op(client.207458703.0:2000292 3.a8143b86 (undecoded) > > ondisk+write+known_if_redirected e137428) currently waiting for peered > > 2018-07-11 20:40:10.790573 mon.0 [INF] pgmap 4104 pgs: 5 down+peering, 1142 > > peering, 210 remapped+peering, 5 active+recovery_wait+degraded, 1551 > > active+clean, 2 activating+undersized+degraded+remapped, 15 > > active+remapped+backfilling, 178 unknown, 1 active+remapped, 3 > > activating+remapped, 78 active+undersized+degraded+remapped+backfill_wait, > > 6 active+recovery_wait+degraded+remapped, 3 > > undersized+degraded+remapped+backfill_wait+peered, 5 > > active+undersized+degraded+remapped+backfilling, 295 > > active+remapped+backfill_wait, 3 active+recovery_wait+undersized+degraded, > > 21 activating+undersized+degraded, 559 
active+undersized+degraded, 4 > > remapped, 17 undersized+degraded+peered, 1 > > active+recovery_wait+undersized+degraded+remapped; 13439 GB data, 42395 GB > > used, 160 TB / 201 TB avail; 4069 B/s rd, 746 kB/s wr, 5 op/s; > > 534753/10756032 objects degraded (4.972%); 779027/10756032 objects > > misplaced (7.243%); 256 MB/s, 65 objects/s recovering > > > > > > > > > > There are a lot of things in the OSD-log files that I'm unfamiliar with but > > so far I haven't found anything that has given me a clue on how to fix the > > issue. > > BTW restarting a OSD doesn't seem to help, on the contrary, that sometimes > > results in PGs beeing stuck undersized! > > I have attaced a osd-log from when a OSD i restarted started up. > > > > Best regards > > /Magnus > > > > > > 2018-07-11 20:39 GMT+02:00 Paul Emmerich > (mailto:paul.emmer...@croit.io)>: > > > Did you finish the upgrade of the OSDs? Are OSDs flapping? (ceph -w) Is > > > there anything weird in the OSDs' log files? > > > > > > > > > > > > Paul > > > > > > 2018-07-11 20:30 GMT+02:00 Magnus Grönlund > > (mailto:mag...@gronlund.se)>: > > > > Hi, > > > > > > > > Started to upgrade a ceph-cluster from Jewel (10.2.10) to Luminou
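For anyone hitting a similar mid-upgrade peering hang: the release-requirement flags mentioned earlier in the thread are set as below. Only run each one once every mon and OSD is actually on the corresponding release; these are the stock commands, but whether they resolve a given peering loop depends on the cluster state.

```shell
# After all daemons are running Jewel:
ceph osd set require_jewel_osds

# After all daemons are running Luminous
# (Luminous replaces the per-release flags with this mechanism):
ceph osd require-osd-release luminous
```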
[ceph-users] Any issues with old tunables (cluster/pool created at dumpling)?
Hi, Upgrading an old cluster that was created with dumpling up to luminous soon (with a quick stop at jewel; currently upgrading deb7 -> deb8 so we can get newer packages). My idea is to keep the tunables as they are, since this pool has active data and I've already disabled tunable warnings in ceph.conf. Are there any "issues" running with old tunables? Disruption of service? Kind Regards, David Majchrzak ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
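For context, the profile in effect can be inspected read-only at any time, and changing it later is what triggers data movement, not the upgrade itself:

```shell
# Safe, read-only: show which tunables profile the cluster is using
ceph osd crush show-tunables

# Only if/when modernizing later -- this remaps data and causes a
# potentially large rebalance, so schedule it deliberately:
# ceph osd crush tunables optimal
```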
Re: [ceph-users] Reweight 0 - best way to backfill slowly?
Works great, seemed to have alot less impact than just letting it peer all PGs at the same time. Used an increment of 0.05 without issue, then a ceph tell 'osd.*' injectargs '--osd-max-backfills 2' seems to keep the HDD at around 85-100% util, but not really affecting the clients. Solid advice, cheers. Kind Regards, David Majchrzak > 29 jan. 2018 kl. 23:14 skrev David Majchrzak : > > Thanks Steve! > > So the peering won't actually move any blocks around, but will make sure that > all PGs know what state they are in? That means that when I start increasing > reweight, PGs will be allocated to the disk, but won't actually recover yet. > However, they will be set as "degraded". > So when all of the peering is done, I'll unset the norecover/nobackfill flags > and backfill will commence but will be less I/O intensive than peering and > backfilling at the same time? > > Kind Regards, > > David Majchrzak > >> 29 jan. 2018 kl. 22:57 skrev Steve Taylor > <mailto:steve.tay...@storagecraft.com>>: >> >> There are two concerns with setting the reweight to 1.0. The first is >> peering and the second is backfilling. Peering is going to block client I/O >> on the affected OSDs, while backfilling will only potentially slow things >> down. >> >> I don't know what your client I/O looks like, but personally I would >> probably set the norecover and nobackfill flags, slowly increment your >> reweight value by 0.01 or whatever you deem to be appropriate for your >> environment, waiting for peering to complete in between each step. Also >> allow any resulting blocked requests to clear up before incrementing your >> reweight again. >> >> When your reweight is all the way up to 1.0, inject osd_max_backfills to >> whatever you like (or don't if you're happy with it as is) and unset the >> norecover and nobackfill flags to let backfilling begin. 
If you are unable >> to handle the impact of backfilling with osd_max_backfills set to 1, then >> you need to add some new OSDs to your cluster before doing any of this. They >> will have to backfill too, but at least you'll have more spindles to handle >> it. >> >> >> >> >> Steve Taylor | Senior Software Engineer | StorageCraft Technology >> Corporation <https://storagecraft.com/> >> 380 Data Drive Suite 300 | Draper | Utah | 84020 >> Office: 801.871.2799 | >> >> If you are not the intended recipient of this message or received it >> erroneously, please notify the sender and delete it, together with any >> attachments, and be advised that any dissemination or copying of this >> message is prohibited. >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> On Mon, 2018-01-29 at 22:43 +0100, David Majchrzak wrote: >>> And so I totally forgot to add df tree to the mail. >>> Here's the interesting bit from two first nodes. where osd.11 has weight >>> but is reweighted to 0. >>> >>> >>> root@osd1:~# ceph osd df tree >>> ID WEIGHTREWEIGHT SIZE USEAVAIL %USE VAR TYPE NAME >>> -1 181.7- 109T 50848G 60878G 00 root default >>> -2 36.3- 37242G 16792G 20449G 45.09 0.99 host osd1 >>> 0 3.64000 1.0 3724G 1730G 1993G 46.48 1.02 osd.0 >>> 1 3.64000 1.0 3724G 1666G 2057G 44.75 0.98 osd.1 >>> 2 3.64000 1.0 3724G 1734G 1989G 46.57 1.02 osd.2 >>> 3 3.64000 1.0 3724G 1387G 2336G 37.25 0.82 osd.3 >>> 4 3.64000 1.0 3724G 1722G 2002G 46.24 1.01 osd.4 >>> 6 3.64000 1.0 3724G 1840G 1883G 49.43 1.08 osd.6 >>> 7 3.64000 1.0 3724G 1651G 2072G 44.34 0.97 osd.7 >>> 8 3.64000 1.0 3724G 1747G 1976G 46.93 1.03 osd.8 >>> 9 3.64000 1.0 3724G 1697G 2026G 45.58 1.00 osd.9 >>> 5 3.64000 1.0 3724G 1614G 2109G 43.34 0.95 osd.5 >>> -3 36.3- 0 0 0 00 host osd2 >>> 12 3.64000 1.0 3724G 1730G 1993G 46.46 1.02 osd.12 >>> 13 3.64000 1.0 3724G 1745G 1978G 46.88 1.03 osd.13 >>> 14 3.64000 1.0 3724G 1707G 2016G 45.84 1.01 osd.14 >>> 15 3.64000 1.0 3724G 1540G 2184G 41.35 0.91 osd.15 >>> 16 3.64000 
1.0 3724G 14
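The procedure that worked above can be sketched as a small loop. The 0.05 step, osd.11, and the polling interval are just the values from this thread; the `grep peering` poll on `ceph pg stat` is one rough way to wait for peering to settle, use whatever check you prefer.

```shell
# Block recovery/backfill so each reweight step only peers
ceph osd set norecover
ceph osd set nobackfill

# Raise the reweight in 0.05 increments, letting peering settle each time
for w in $(seq 0.05 0.05 1.00); do
    ceph osd reweight 11 "$w"
    while ceph pg stat | grep -q peering; do sleep 10; done
done

# Throttle backfill, then let recovery begin in one go
ceph tell 'osd.*' injectargs '--osd-max-backfills 2'
ceph osd unset nobackfill
ceph osd unset norecover
```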
Re: [ceph-users] Reweight 0 - best way to backfill slowly?
Thanks Steve! So the peering won't actually move any blocks around, but will make sure that all PGs know what state they are in? That means that when I start increasing reweight, PGs will be allocated to the disk, but won't actually recover yet. However, they will be set as "degraded". So when all of the peering is done, I'll unset the norecover/nobackfill flags and backfill will commence but will be less I/O intensive than peering and backfilling at the same time? Kind Regards, David Majchrzak > 29 jan. 2018 kl. 22:57 skrev Steve Taylor : > > There are two concerns with setting the reweight to 1.0. The first is peering > and the second is backfilling. Peering is going to block client I/O on the > affected OSDs, while backfilling will only potentially slow things down. > > I don't know what your client I/O looks like, but personally I would probably > set the norecover and nobackfill flags, slowly increment your reweight value > by 0.01 or whatever you deem to be appropriate for your environment, waiting > for peering to complete in between each step. Also allow any resulting > blocked requests to clear up before incrementing your reweight again. > > When your reweight is all the way up to 1.0, inject osd_max_backfills to > whatever you like (or don't if you're happy with it as is) and unset the > norecover and nobackfill flags to let backfilling begin. If you are unable to > handle the impact of backfilling with osd_max_backfills set to 1, then you > need to add some new OSDs to your cluster before doing any of this. They will > have to backfill too, but at least you'll have more spindles to handle it. 
> > > > > Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation > <https://storagecraft.com/> > 380 Data Drive Suite 300 | Draper | Utah | 84020 > Office: 801.871.2799 | > > If you are not the intended recipient of this message or received it > erroneously, please notify the sender and delete it, together with any > attachments, and be advised that any dissemination or copying of this message > is prohibited. > > > > > > > > > > > > > > > > > > > > > > > On Mon, 2018-01-29 at 22:43 +0100, David Majchrzak wrote: >> And so I totally forgot to add df tree to the mail. >> Here's the interesting bit from two first nodes. where osd.11 has weight but >> is reweighted to 0. >> >> >> root@osd1:~# ceph osd df tree >> ID WEIGHTREWEIGHT SIZE USEAVAIL %USE VAR TYPE NAME >> -1 181.7- 109T 50848G 60878G 00 root default >> -2 36.3- 37242G 16792G 20449G 45.09 0.99 host osd1 >> 0 3.64000 1.0 3724G 1730G 1993G 46.48 1.02 osd.0 >> 1 3.64000 1.0 3724G 1666G 2057G 44.75 0.98 osd.1 >> 2 3.64000 1.0 3724G 1734G 1989G 46.57 1.02 osd.2 >> 3 3.64000 1.0 3724G 1387G 2336G 37.25 0.82 osd.3 >> 4 3.64000 1.0 3724G 1722G 2002G 46.24 1.01 osd.4 >> 6 3.64000 1.0 3724G 1840G 1883G 49.43 1.08 osd.6 >> 7 3.64000 1.0 3724G 1651G 2072G 44.34 0.97 osd.7 >> 8 3.64000 1.0 3724G 1747G 1976G 46.93 1.03 osd.8 >> 9 3.64000 1.0 3724G 1697G 2026G 45.58 1.00 osd.9 >> 5 3.64000 1.0 3724G 1614G 2109G 43.34 0.95 osd.5 >> -3 36.3- 0 0 0 00 host osd2 >> 12 3.64000 1.0 3724G 1730G 1993G 46.46 1.02 osd.12 >> 13 3.64000 1.0 3724G 1745G 1978G 46.88 1.03 osd.13 >> 14 3.64000 1.0 3724G 1707G 2016G 45.84 1.01 osd.14 >> 15 3.64000 1.0 3724G 1540G 2184G 41.35 0.91 osd.15 >> 16 3.64000 1.0 3724G 1484G 2239G 39.86 0.87 osd.16 >> 18 3.64000 1.0 3724G 1928G 1796G 51.77 1.14 osd.18 >> 20 3.64000 1.0 3724G 1767G 1956G 47.45 1.04 osd.20 >> 10 3.64000 1.0 3724G 1797G 1926G 48.27 1.06 osd.10 >> 49 3.64000 1.0 3724G 1847G 1877G 49.60 1.09 osd.49 >> 11 3.640000 0 0 0 00 osd.11 >> >>> >>> 29 jan. 2018 kl. 
22:40 skrev David Majchrzak >> <mailto:da...@visions.se>>: >>> >>> Hi! >>> >>> Cluster: 5 HW nodes, 10 HDDs with SSD journals, filestore, 0.94.9 hammer, >>> debian wheezy (scheduled to upgrade once this is fixed). >>> >>> I have a replaced HDD that another admin set to reweight 0 instead of >>> weight 0 (I can'
[ceph-users] Reweight 0 - best way to backfill slowly?
Hi! Cluster: 5 HW nodes, 10 HDDs with SSD journals, filestore, 0.94.9 hammer, debian wheezy (scheduled to upgrade once this is fixed). I have a replaced HDD that another admin set to reweight 0 instead of weight 0 (I can't remember the reason). What would be the best way to slowly backfill it? Usually I'm using weight and slowly growing it to max size. I guess if I just set reweight to 1.0, it will backfill as fast as I let it, that is max 1 backfill / osd but it will probably disrupt client io (this being on hammer). And if I set the weight on it to 0, the node will get less weight, and will start moving data around everywhere right? Can I use reweight the same way as weight here, slowly increasing it up to 1.0 by increments of say 0.01? Kind Regards, David Majchrzak ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Reweight 0 - best way to backfill slowly?
And so I totally forgot to add df tree to the mail. Here's the interesting bit from the two first nodes, where osd.11 has weight but is reweighted to 0.

root@osd1:~# ceph osd df tree
ID WEIGHT    REWEIGHT SIZE   USE    AVAIL  %USE  VAR  TYPE NAME
-1 181.7     -        109T   50848G 60878G 0     0    root default
-2 36.3      -        37242G 16792G 20449G 45.09 0.99     host osd1
 0 3.64000   1.0      3724G  1730G  1993G  46.48 1.02         osd.0
 1 3.64000   1.0      3724G  1666G  2057G  44.75 0.98         osd.1
 2 3.64000   1.0      3724G  1734G  1989G  46.57 1.02         osd.2
 3 3.64000   1.0      3724G  1387G  2336G  37.25 0.82         osd.3
 4 3.64000   1.0      3724G  1722G  2002G  46.24 1.01         osd.4
 6 3.64000   1.0      3724G  1840G  1883G  49.43 1.08         osd.6
 7 3.64000   1.0      3724G  1651G  2072G  44.34 0.97         osd.7
 8 3.64000   1.0      3724G  1747G  1976G  46.93 1.03         osd.8
 9 3.64000   1.0      3724G  1697G  2026G  45.58 1.00         osd.9
 5 3.64000   1.0      3724G  1614G  2109G  43.34 0.95         osd.5
-3 36.3      -        0      0      0      0     0        host osd2
12 3.64000   1.0      3724G  1730G  1993G  46.46 1.02         osd.12
13 3.64000   1.0      3724G  1745G  1978G  46.88 1.03         osd.13
14 3.64000   1.0      3724G  1707G  2016G  45.84 1.01         osd.14
15 3.64000   1.0      3724G  1540G  2184G  41.35 0.91         osd.15
16 3.64000   1.0      3724G  1484G  2239G  39.86 0.87         osd.16
18 3.64000   1.0      3724G  1928G  1796G  51.77 1.14         osd.18
20 3.64000   1.0      3724G  1767G  1956G  47.45 1.04         osd.20
10 3.64000   1.0      3724G  1797G  1926G  48.27 1.06         osd.10
49 3.64000   1.0      3724G  1847G  1877G  49.60 1.09         osd.49
11 3.64000   0        0      0      0      0     0            osd.11

> 29 jan. 2018 kl. 22:40 skrev David Majchrzak :
>
> Hi!
>
> Cluster: 5 HW nodes, 10 HDDs with SSD journals, filestore, 0.94.9 hammer,
> debian wheezy (scheduled to upgrade once this is fixed).
>
> I have a replaced HDD that another admin set to reweight 0 instead of weight
> 0 (I can't remember the reason).
> What would be the best way to slowly backfill it? Usually I'm using weight
> and slowly growing it to max size.
>
> I guess if I just set reweight to 1.0, it will backfill as fast as I let it,
> that is max 1 backfill / osd but it will probably disrupt client io (this
> being on hammer). 
> 
> And if I set the weight on it to 0, the node will get less weight, and will start moving data around everywhere right?
> 
> Can I use reweight the same way as weight here, slowly increasing it up to 1.0 by increments of say 0.01?
> 
> Kind Regards,
> David Majchrzak

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
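Before and during a slow backfill like this, it helps to see which OSDs are fullest. A small sketch that sorts the per-OSD rows of `ceph osd df tree` output by %USE (the `sort_osds_by_use` name is mine; it assumes the column order shown above: ID WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR TYPE NAME):

```shell
#!/bin/sh
# Pick out the per-OSD rows of `ceph osd df tree` output and sort them
# by %USE, fullest first. OSD rows end in a name matching /^osd\./;
# %USE is the 7th whitespace-separated field on those rows.
sort_osds_by_use() {
  awk '$NF ~ /^osd\./ { print $7, $NF }' | sort -rn
}

# Demo on a trimmed copy of the output above; in practice pipe in
# `ceph osd df tree` directly.
sort_osds_by_use <<'EOF'
ID WEIGHT  REWEIGHT SIZE  USE   AVAIL %USE  VAR  TYPE NAME
 6 3.64000 1.0      3724G 1840G 1883G 49.43 1.08 osd.6
 3 3.64000 1.0      3724G 1387G 2336G 37.25 0.82 osd.3
18 3.64000 1.0      3724G 1928G 1796G 51.77 1.14 osd.18
EOF
# prints:
# 51.77 osd.18
# 49.43 osd.6
# 37.25 osd.3
```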
Re: [ceph-users] Migrating filestore to bluestore using ceph-volume
Yeah, the next one will be without the double rebalance; I just had a lot of time on my hands. I'd never used kill before, I just followed the docs here, which should probably be updated:
http://docs.ceph.com/docs/master/rados/operations/bluestore-migration/#convert-existing-osds

Is this the tracker for the docs?
http://tracker.ceph.com/projects/ceph-website/issues?set_filter=1&tracker_id=6

> On 26 Jan 2018, at 19:22, Wido den Hollander wrote:
> 
> On 01/26/2018 07:09 PM, David Majchrzak wrote:
>> destroy did remove the auth key, however create didn't add the auth, so I had to do it manually.
>> Then I tried to start osd.0 again and it failed because the osdmap said it was destroyed.
> 
> That seems like this bug: http://tracker.ceph.com/issues/22673
> 
>> I've summed up my steps below. Here are my commands prior to create:
>> root@int1:~# ceph osd out 0
>> <-- wait for rebalance/recover -->
>> root@int1:~# ceph osd safe-to-destroy 0
>> OSD(s) 0 are safe to destroy without reducing data durability.
> 
> Although it's a very safe route, it's not required. You'll have a double rebalance here.
> 
>> root@int1:~# systemctl kill ceph-osd@0
> 
> I recommend using 'stop' and not kill. The stop is a clear and graceful shutdown.
> 
> As I haven't used ceph-volume before I'm not able to tell exactly why the commands underneath fail.
> 
> Wido
> 
>> root@int1:~# ceph status
>>   cluster:
>>     id:     efad7df8-721d-43d8-8d02-449406e70b90
>>     health: HEALTH_OK
>>   services:
>>     mon: 3 daemons, quorum int1,int2,int3
>>     mgr: int1(active), standbys: int3, int2
>>     osd: 6 osds: 5 up, 5 in
>>   data:
>>     pools:   2 pools, 320 pgs
>>     objects: 97038 objects, 364 GB
>>     usage:   1096 GB used, 1128 GB / 2224 GB avail
>>     pgs:     320 active+clean
>>   io:
>>     client: 289 kB/s rd, 870 kB/s wr, 46 op/s rd, 48 op/s wr
>> root@int1:~# mount | grep /var/lib/ceph/osd/ceph-0
>> /dev/sdc1 on /var/lib/ceph/osd/ceph-0 type xfs (rw,noatime,attr2,inode64,noquota)
>> root@int1:~# umount /var/lib/ceph/osd/ceph-0
>> root@int1:~# ceph-volume lvm zap /dev/sdc
>> Zapping: /dev/sdc
>> Running command: sudo wipefs --all /dev/sdc
>>  stdout: /dev/sdc: 8 bytes were erased at offset 0x0200 (gpt): 45 46 49 20 50 41 52 54
>> /dev/sdc: 8 bytes were erased at offset 0x6fc86d5e00 (gpt): 45 46 49 20 50 41 52 54
>> /dev/sdc: 2 bytes were erased at offset 0x01fe (PMBR): 55 aa
>> /dev/sdc: calling ioctl to re-read partition table: Success
>> Running command: dd if=/dev/zero of=/dev/sdc bs=1M count=10
>>  stderr: 10+0 records in
>> 10+0 records out
>> 10485760 bytes (10 MB) copied
>>  stderr: , 0.0253999 s, 413 MB/s
>> --> Zapping successful for: /dev/sdc
>> root@int1:~# ceph osd destroy 0 --yes-i-really-mean-it
>> destroyed osd.0
>> root@int1:~# ceph status
>>   cluster:
>>     id:     efad7df8-721d-43d8-8d02-449406e70b90
>>     health: HEALTH_OK
>>   services:
>>     mon: 3 daemons, quorum int1,int2,int3
>>     mgr: int1(active), standbys: int3, int2
>>     osd: 6 osds: 5 up, 5 in
>>   data:
>>     pools:   2 pools, 320 pgs
>>     objects: 97038 objects, 364 GB
>>     usage:   1096 GB used, 1128 GB / 2224 GB avail
>>     pgs:     320 active+clean
>>   io:
>>     client: 56910 B/s rd, 1198 kB/s wr, 15 op/s rd, 48 op/s wr
>> root@int1:~# ceph-volume create --bluestore --data /dev/sdc --osd-id 0
>> usage: ceph-volume [-h] [--cluster CLUSTER] [--log-level LOG_LEVEL]
>>                    [--log-path LOG_PATH]
>> ceph-volume: error:
unrecognized arguments: create --bluestore --data /dev/sdc --osd-id 0
>> root@int1:~# ceph-volume lvm create --bluestore --data /dev/sdc --osd-id 0
>> Running command: sudo vgcreate --force --yes ceph-efad7df8-721d-43d8-8d02-449406e70b90 /dev/sdc
>>  stderr: WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
>>  stdout: Physical volume "/dev/sdc" successfully created
>>  stdout: Volume group "ceph-efad7df8-721d-43d8-8d02-449406e70b90" successfully created
>> Running command: sudo lvcreate --yes -l 100%FREE -n osd-block-138ce507-f28a-45bf-814c-7fa124a9d9b9 ceph-efad7df8-721d-43d8-8d02-449406e70b90
>>  stderr: WARNING: lvm
Re: [ceph-users] Migrating filestore to bluestore using ceph-volume
_t::decode(ceph::buffer::list::iterator&) decode past end of struct encoding
 stderr: 2018-01-26 14:59:10.039925 7fd7ef951cc0 -1 bluestore(/var/lib/ceph/osd/ceph-0//block) _read_bdev_label unable to decode label at offset 102: buffer::malformed_input: void bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode past end of struct encoding
 stderr: 2018-01-26 14:59:10.039984 7fd7ef951cc0 -1 bluestore(/var/lib/ceph/osd/ceph-0/) _read_fsid unparsable uuid
 stderr: 2018-01-26 14:59:11.359951 7fd7ef951cc0 -1 key AQA5Qmta9LERFhAAKU+AmT1Sm56nk7sWx2BATQ==
 stderr: 2018-01-26 14:59:11.888476 7fd7ef951cc0 -1 created object store /var/lib/ceph/osd/ceph-0/ for osd.0 fsid efad7df8-721d-43d8-8d02-449406e70b90
Running command: sudo ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-efad7df8-721d-43d8-8d02-449406e70b90/osd-block-138ce507-f28a-45bf-814c-7fa124a9d9b9 --path /var/lib/ceph/osd/ceph-0
Running command: sudo ln -snf /dev/ceph-efad7df8-721d-43d8-8d02-449406e70b90/osd-block-138ce507-f28a-45bf-814c-7fa124a9d9b9 /var/lib/ceph/osd/ceph-0/block
Running command: chown -R ceph:ceph /dev/dm-4
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
Running command: sudo systemctl enable ceph-volume@lvm-0-138ce507-f28a-45bf-814c-7fa124a9d9b9
 stderr: Created symlink from /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-0-138ce507-f28a-45bf-814c-7fa124a9d9b9.service to /lib/systemd/system/ceph-volume@.service.
Running command: sudo systemctl start ceph-osd@0
root@int1:~# ceph status
  cluster:
    id:     efad7df8-721d-43d8-8d02-449406e70b90
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum int1,int2,int3
    mgr: int1(active), standbys: int3, int2
    osd: 6 osds: 5 up, 5 in
  data:
    pools:   2 pools, 320 pgs
    objects: 97038 objects, 364 GB
    usage:   1095 GB used, 1128 GB / 2224 GB avail
    pgs:     320 active+clean
  io:
    client: 294 kB/s rd, 1827 kB/s wr, 61 op/s rd, 96 op/s wr
root@int1:~# ceph osd tree
ID CLASS WEIGHT  TYPE NAME     STATUS    REWEIGHT PRI-AFF
-1       2.60458 root default
-2       0.86819     host int1
 0   ssd 0.43159         osd.0 destroyed        0 1.0
 3   ssd 0.43660         osd.3 up             1.0 1.0
-3       0.86819     host int2
 1   ssd 0.43159         osd.1 up             1.0 1.0
 4   ssd 0.43660         osd.4 up             1.0 1.0
-4       0.86819     host int3
 2   ssd 0.43159         osd.2 up             1.0 1.0
 5   ssd 0.43660         osd.5 up             1.0 1.0
root@int1:~# ceph auth ls
Does not list osd.0
root@int1:~# ceph auth add osd.0 osd 'allow *' mon 'allow profile osd' mgr 'allow profile osd' -i /var/lib/ceph/osd/ceph-0/keyring
added key for osd.0
root@int1:~# systemctl start ceph-osd@0
root@int1:~# ceph status
  cluster:
    id:     efad7df8-721d-43d8-8d02-449406e70b90
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum int1,int2,int3
    mgr: int1(active), standbys: int3, int2
    osd: 6 osds: 5 up, 5 in
  data:
    pools:   2 pools, 320 pgs
    objects: 97163 objects, 365 GB
    usage:   1097 GB used, 1127 GB / 2224 GB avail
    pgs:     320 active+clean
  io:
    client: 284 kB/s rd, 539 kB/s wr, 32 op/s rd, 30 op/s wr
root@int1:~# systemctl status ceph-osd@0
● ceph-osd@0.service - Ceph object storage daemon osd.0
   Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled)
  Drop-In: /lib/systemd/system/ceph-osd@.service.d
           └─ceph-after-pve-cluster.conf
   Active: inactive (dead) since Fri 2018-01-26 17:02:08 UTC; 54s ago
  Process: 6857 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=0/SUCCESS)
  Process: 6851 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
 Main PID:
6857 (code=exited, status=0/SUCCESS)

Jan 26 17:02:08 int1 systemd[1]: Started Ceph object storage daemon osd.0.
Jan 26 17:02:08 int1 ceph-osd[6857]: starting osd.0 at - osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
Jan 26 17:02:08 int1 ceph-osd[6857]: 2018-01-26 17:02:08.801761 7fed0b5bbcc0 -1 osd.0 0 log_to_monitors {default=true}
Jan 26 17:02:08 int1 ceph-osd[6857]: 2018-01-26 17:02:08.804600 7fecf2ee4700 -1 osd.0 0 waiting for initial osdmap
Jan 26 17:02:08 int1 ceph-osd[6857]: 2018-01-26 17:02:08.909237 7fecf7eee700 -1 osd.0 1040 osdmap says I am destroyed, exiting

After this I followed Reed Dier's steps somewhat: zapped the disk again, removed auth, crush and the osd, zapped the disk/partitions and device mapper. I could then run the create command without issues.

Kind Regards,
David Majchrzak

> On 26 Jan 2018, at 18:56, Wido den Hollander wrote:
> 
> On 01/26/2018 06:53 PM, David Majchrzak wrote:
>> I did do that.
>> It didn't add the auth key to ceph, so I had to do that manually. Then it said that osd.0 was set as destroyed
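The recovery sequence described above (zap again, remove the stale auth/crush/osd entries, recreate) can be sketched as a dry-run helper that only prints the commands. The `redo_osd` name and the `$id`/`$dev` placeholders are mine, not from the thread, and `--osd-id` is deliberately omitted since the thread reports that specifying it causes the create to fail:

```shell
#!/bin/sh
# Dry run: print the cleanup + recreate sequence after a failed
# filestore -> bluestore conversion. Remove the echo prefixes to execute.
redo_osd() {
  id=$1; dev=$2
  echo "ceph-volume lvm zap $dev"
  echo "ceph auth del osd.$id"
  echo "ceph osd crush remove osd.$id"
  echo "ceph osd rm osd.$id"
  echo "ceph-volume lvm create --bluestore --data $dev"
}

redo_osd 0 /dev/sdc
```

Since the OSD is fully removed from the osdmap, the recreated OSD fills in the lowest free id, which here is the one it had before.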
Re: [ceph-users] Migrating filestore to bluestore using ceph-volume
I did do that. It didn't add the auth key to ceph, so I had to do that manually. Then it said that osd.0 was set as destroyed, which yes, it was still in the crushmap. I followed the docs to a point.

> On 26 Jan 2018, at 18:50, Wido den Hollander wrote:
> 
> On 01/26/2018 06:37 PM, David Majchrzak wrote:
>> Ran:
>> ceph auth del osd.0
>> ceph auth del osd.6
>> ceph auth del osd.7
>> ceph osd rm osd.0
>> ceph osd rm osd.6
>> ceph osd rm osd.7
>> which seems to have removed them.
> 
> Did you destroy the OSD prior to running ceph-volume?
> 
> $ ceph osd destroy 6
> 
> After you've done that you can use ceph-volume to re-create the OSD.
> 
> Wido
> 
>> Thanks for the help Reed!
>> Kind Regards,
>> David Majchrzak
>>> On 26 Jan 2018, at 18:32, David Majchrzak <da...@visions.se> wrote:
>>> 
>>> Thanks, that helped!
>>> 
>>> Since I had already "halfway" created an LVM volume I wanted to start from the beginning and zap it.
>>> 
>>> Tried to zap the raw device but failed, since --destroy doesn't seem to be in 12.2.2:
>>> http://docs.ceph.com/docs/master/ceph-volume/lvm/zap/
>>> 
>>> root@int1:~# ceph-volume lvm zap /dev/sdc --destroy
>>> usage: ceph-volume lvm zap [-h] [DEVICE]
>>> ceph-volume lvm zap: error: unrecognized arguments: --destroy
>>> 
>>> So I zapped it with the vg/lvm instead:
>>> ceph-volume lvm zap /dev/ceph-efad7df8-721d-43d8-8d02-449406e70b90/osd-block-138ce507-f28a-45bf-814c-7fa124a9d9b9
>>> 
>>> However I couldn't run create on it since the LVM was already there.
>>> So I zapped it with sgdisk and ran dmsetup remove. After that I was able to create it again.
>>> 
>>> However - each "ceph-volume lvm create" that I ran that failed still successfully added an osd to the crush map ;)
>>> 
>>> So I've got this now:
>>> 
>>> root@int1:~# ceph osd df tree
>>> ID CLASS WEIGHT  REWEIGHT SIZE  USE    AVAIL  %USE  VAR  PGS TYPE NAME
>>> -1       2.60959 -        2672G 1101G  1570G  41.24 1.00 -   root default
>>> -2       0.87320 -        894G  369G   524G   41.36 1.00 -       host int1
>>>  3   ssd 0.43660 1.0      447G  358G   90295M 80.27 1.95 301         osd.3
>>>  8   ssd 0.43660 1.0      447G  11273M 436G   2.46  0.06 19          osd.8
>>> -3       0.86819 -        888G  366G   522G   41.26 1.00 -       host int2
>>>  1   ssd 0.43159 1.0      441G  167G   274G   37.95 0.92 147         osd.1
>>>  4   ssd 0.43660 1.0      447G  199G   247G   44.54 1.08 173         osd.4
>>> -4       0.86819 -        888G  365G   523G   41.09 1.00 -       host int3
>>>  2   ssd 0.43159 1.0      441G  193G   248G   43.71 1.06 174         osd.2
>>>  5   ssd 0.43660 1.0      447G  172G   274G   38.51 0.93 146         osd.5
>>>  0         0     0        0     0      0      0     0    0           osd.0
>>>  6         0     0        0     0      0      0     0    0           osd.6
>>>  7         0     0        0     0      0      0     0    0           osd.7
>>> 
>>> I guess I can just remove them from crush, auth and rm them?
>>> 
>>> Kind Regards,
>>> 
>>> David Majchrzak
>>> 
>>>> On 26 Jan 2018, at 18:09, Reed Dier <reed.d...@focusvq.com> wrote:
>>>> 
>>>> This is the exact issue that I ran into when starting my bluestore conversion journey.
>>>> 
>>>> See my thread here: https://www.spinics.net/lists/ceph-users/msg41802.html
>>>> 
>>>> Specifying --osd-id causes it to fail.
>>>> 
>>>> Below are my steps for OSD replace/migrate from filestore to bluestore.
>>>> 
>>>> BIG caveat here in that I am doing a destructive replacement, in that I am not allowing my objects to be migrated off of the OSD I'm replacing before nuking it.
>>>> With 8TB drives it just takes way too long, and I trust my failure domains and other hardware to get me through the backfills.
>>>> So instead of 1) reading data off, writing data elsewhere 2) remove/re-add 3) reading data elsewhere, writing back on, I am taking step one out and trusting my two other copies of the objects. Just wanted to clarify my steps.
>>>> 
>>>> I also set norecover and norebalance flags immediately prior to running t
Re: [ceph-users] Migrating filestore to bluestore using ceph-volume
Ran:
ceph auth del osd.0
ceph auth del osd.6
ceph auth del osd.7
ceph osd rm osd.0
ceph osd rm osd.6
ceph osd rm osd.7
which seems to have removed them.

Thanks for the help Reed!

Kind Regards,
David Majchrzak

> On 26 Jan 2018, at 18:32, David Majchrzak wrote:
> 
> Thanks, that helped!
> 
> Since I had already "halfway" created an LVM volume I wanted to start from the beginning and zap it.
> 
> Tried to zap the raw device but failed, since --destroy doesn't seem to be in 12.2.2:
> http://docs.ceph.com/docs/master/ceph-volume/lvm/zap/
> 
> root@int1:~# ceph-volume lvm zap /dev/sdc --destroy
> usage: ceph-volume lvm zap [-h] [DEVICE]
> ceph-volume lvm zap: error: unrecognized arguments: --destroy
> 
> So I zapped it with the vg/lvm instead:
> ceph-volume lvm zap /dev/ceph-efad7df8-721d-43d8-8d02-449406e70b90/osd-block-138ce507-f28a-45bf-814c-7fa124a9d9b9
> 
> However I couldn't run create on it since the LVM was already there.
> So I zapped it with sgdisk and ran dmsetup remove. After that I was able to create it again.
> 
> However - each "ceph-volume lvm create" that I ran that failed still successfully added an osd to the crush map ;)
> 
> So I've got this now:
> 
> root@int1:~# ceph osd df tree
> ID CLASS WEIGHT  REWEIGHT SIZE  USE    AVAIL  %USE  VAR  PGS TYPE NAME
> -1       2.60959 -        2672G 1101G  1570G  41.24 1.00 -   root default
> -2       0.87320 -        894G  369G   524G   41.36 1.00 -       host int1
>  3   ssd 0.43660 1.0      447G  358G   90295M 80.27 1.95 301         osd.3
>  8   ssd 0.43660 1.0      447G  11273M 436G   2.46  0.06 19          osd.8
> -3       0.86819 -        888G  366G   522G   41.26 1.00 -       host int2
>  1   ssd 0.43159 1.0      441G  167G   274G   37.95 0.92 147         osd.1
>  4   ssd 0.43660 1.0      447G  199G   247G   44.54 1.08 173         osd.4
> -4       0.86819 -        888G  365G   523G   41.09 1.00 -       host int3
>  2   ssd 0.43159 1.0      441G  193G   248G   43.71 1.06 174         osd.2
>  5   ssd 0.43660 1.0      447G  172G   274G   38.51 0.93 146         osd.5
>  0         0     0        0     0      0      0     0    0           osd.0
>  6         0     0        0     0      0      0     0    0           osd.6
>  7         0     0        0     0      0      0     0    0           osd.7
> 
> I guess I can just remove them from crush, auth and rm them?
> 
> Kind Regards,
> 
> David Majchrzak
> 
>> On 26 Jan 2018, at 18:09, Reed Dier <reed.d...@focusvq.com> wrote:
>> 
>> This is the exact issue that I ran into when starting my bluestore conversion journey.
>> 
>> See my thread here: https://www.spinics.net/lists/ceph-users/msg41802.html
>> 
>> Specifying --osd-id causes it to fail.
>> 
>> Below are my steps for OSD replace/migrate from filestore to bluestore.
>> 
>> BIG caveat here in that I am doing a destructive replacement, in that I am not allowing my objects to be migrated off of the OSD I'm replacing before nuking it.
>> With 8TB drives it just takes way too long, and I trust my failure domains and other hardware to get me through the backfills.
>> So instead of 1) reading data off, writing data elsewhere 2) remove/re-add 3) reading data elsewhere, writing back on, I am taking step one out and trusting my two other copies of the objects. Just wanted to clarify my steps.
>> 
>> I also set norecover and norebalance flags immediately prior to running these commands so that it doesn't try to start moving data unnecessarily. Then when done, remove those flags, and let it backfill.
>> 
>>> systemctl stop ceph-osd@$ID.service
>>> ceph-osd -i $ID --flush-journal
>>> umount /var/lib/ceph/osd/ceph-$ID
>>> ceph-volume lvm zap /dev/$ID
>>> ceph osd crush remove osd.$ID
>>> ceph auth del osd.$ID
>>> ceph osd rm osd.$ID
>>> ceph-volume lvm create --bluestore --data /dev/$DATA --block.db /dev/$NVME
>> 
>> So essentially I fully remove the OSD from crush and the osdmap, and when I add the OSD back, like I would a new OSD, it fills in the numeric gap with the $ID it had before.
>> 
>> Hope this is helpful.
>> Been working well for me so far, doing 3 OSDs at a time (half of a failure domain).
>> 
>> Reed
>> 
>>> On Jan 26, 2018, at 10:01 AM, David <da...@visions.se> wrote:
>>> 
>>> Hi!
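Reed's destructive replacement steps above, including setting and clearing the norecover/norebalance flags around them, can be captured as a dry-run helper that only prints the commands. The `replace_osd` name and the example device arguments are illustrative assumptions:

```shell
#!/bin/sh
# Dry run of the destructive filestore -> bluestore replacement above.
# Prints the commands instead of running them; id/data/nvme are placeholders.
replace_osd() {
  ID=$1; DATA=$2; NVME=$3
  echo "ceph osd set norecover"
  echo "ceph osd set norebalance"
  echo "systemctl stop ceph-osd@$ID.service"
  echo "ceph-osd -i $ID --flush-journal"
  echo "umount /var/lib/ceph/osd/ceph-$ID"
  echo "ceph-volume lvm zap $DATA"
  echo "ceph osd crush remove osd.$ID"
  echo "ceph auth del osd.$ID"
  echo "ceph osd rm osd.$ID"
  echo "ceph-volume lvm create --bluestore --data $DATA --block.db $NVME"
  echo "ceph osd unset norecover"
  echo "ceph osd unset norebalance"
}

replace_osd 6 /dev/sdd /dev/nvme0n1
```

Note this sketch only makes sense when, as Reed says, you trust your remaining replicas: the OSD's data is discarded and rebuilt from the other copies during backfill.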
Re: [ceph-users] Migrating filestore to bluestore using ceph-volume
Thanks, that helped!

Since I had already "halfway" created an LVM volume I wanted to start from the beginning and zap it.

Tried to zap the raw device but failed, since --destroy doesn't seem to be in 12.2.2:
http://docs.ceph.com/docs/master/ceph-volume/lvm/zap/

root@int1:~# ceph-volume lvm zap /dev/sdc --destroy
usage: ceph-volume lvm zap [-h] [DEVICE]
ceph-volume lvm zap: error: unrecognized arguments: --destroy

So I zapped it with the vg/lvm instead:
ceph-volume lvm zap /dev/ceph-efad7df8-721d-43d8-8d02-449406e70b90/osd-block-138ce507-f28a-45bf-814c-7fa124a9d9b9

However I couldn't run create on it since the LVM was already there. So I zapped it with sgdisk and ran dmsetup remove. After that I was able to create it again.

However - each "ceph-volume lvm create" that I ran that failed still successfully added an osd to the crush map ;)

So I've got this now:

root@int1:~# ceph osd df tree
ID CLASS WEIGHT  REWEIGHT SIZE  USE    AVAIL  %USE  VAR  PGS TYPE NAME
-1       2.60959 -        2672G 1101G  1570G  41.24 1.00 -   root default
-2       0.87320 -        894G  369G   524G   41.36 1.00 -       host int1
 3   ssd 0.43660 1.0      447G  358G   90295M 80.27 1.95 301         osd.3
 8   ssd 0.43660 1.0      447G  11273M 436G   2.46  0.06 19          osd.8
-3       0.86819 -        888G  366G   522G   41.26 1.00 -       host int2
 1   ssd 0.43159 1.0      441G  167G   274G   37.95 0.92 147         osd.1
 4   ssd 0.43660 1.0      447G  199G   247G   44.54 1.08 173         osd.4
-4       0.86819 -        888G  365G   523G   41.09 1.00 -       host int3
 2   ssd 0.43159 1.0      441G  193G   248G   43.71 1.06 174         osd.2
 5   ssd 0.43660 1.0      447G  172G   274G   38.51 0.93 146         osd.5
 0         0     0        0     0      0      0     0    0           osd.0
 6         0     0        0     0      0      0     0    0           osd.6
 7         0     0        0     0      0      0     0    0           osd.7

I guess I can just remove them from crush, auth and rm them?

Kind Regards,
David Majchrzak

> On 26 Jan 2018, at 18:09, Reed Dier wrote:
> 
> This is the exact issue that I ran into when starting my bluestore conversion journey.
> 
> See my thread here: https://www.spinics.net/lists/ceph-users/msg41802.html
> 
> Specifying --osd-id causes it to fail.
> 
> Below are my steps for OSD replace/migrate from filestore to bluestore.
> 
> BIG caveat here in that I am doing a destructive replacement, in that I am not allowing my objects to be migrated off of the OSD I'm replacing before nuking it.
> With 8TB drives it just takes way too long, and I trust my failure domains and other hardware to get me through the backfills.
> So instead of 1) reading data off, writing data elsewhere 2) remove/re-add 3) reading data elsewhere, writing back on, I am taking step one out and trusting my two other copies of the objects. Just wanted to clarify my steps.
> 
> I also set norecover and norebalance flags immediately prior to running these commands so that it doesn't try to start moving data unnecessarily. Then when done, remove those flags, and let it backfill.
> 
>> systemctl stop ceph-osd@$ID.service
>> ceph-osd -i $ID --flush-journal
>> umount /var/lib/ceph/osd/ceph-$ID
>> ceph-volume lvm zap /dev/$ID
>> ceph osd crush remove osd.$ID
>> ceph auth del osd.$ID
>> ceph osd rm osd.$ID
>> ceph-volume lvm create --bluestore --data /dev/$DATA --block.db /dev/$NVME
> 
> So essentially I fully remove the OSD from crush and the osdmap, and when I add the OSD back, like I would a new OSD, it fills in the numeric gap with the $ID it had before.
> 
> Hope this is helpful.
> Been working well for me so far, doing 3 OSDs at a time (half of a failure domain).
> 
> Reed
> 
>> On Jan 26, 2018, at 10:01 AM, David <da...@visions.se> wrote:
>> 
>> Hi!
>> 
>> On luminous 12.2.2
>> 
>> I'm migrating some OSDs from filestore to bluestore using the "simple" method as described in the docs:
>> http://docs.ceph.com/docs/master/rados/operations/bluestore-migration/#convert-existing-osds
>> Mark out and Replace.
>> 
>> However, at 9.: ceph-volume create --bluestore --data $DEVICE --osd-id $ID
>> it seems to create the bluestore but it fails to authenticate with the old osd-id auth.
>> (the command above is also missing lvm or simple)
>> 
>> I think it's related to this:
>> http://tracker.ceph.com/issues/22642