I hear you on the time involved. I have 350 x 6TB drives to convert. I
recently posted about a disaster I created while automating my migration.
Good luck

On Jan 11, 2018 12:22 PM, "Reed Dier" <reed.d...@focusvq.com> wrote:

> I am in the process of migrating my OSDs to bluestore finally and thought
> I would give you some input on how I am approaching it.
> Some of the saga you can find in another ML thread here:
> https://www.spinics.net/lists/ceph-users/msg41802.html
>
> My first OSD I was cautious, and I outed the OSD without downing it,
> allowing it to move data off.
> Some background on my cluster, for this OSD, it is an 8TB spinner, with an
> NVMe partition previously used for journaling in filestore, intending to be
> used for block.db in bluestore.
>
> Then I downed it, flushed the journal, destroyed it, zapped with
> ceph-volume, set norecover and norebalance flags, did ceph osd crush remove
> osd.$ID, ceph auth del osd.$ID, and ceph osd rm osd.$ID and used
> ceph-volume locally to create the new LVM target. Then unset the norecover
> and norebalance flags and it backfilled like normal.
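>
> Spelled out, that first pass looked roughly like this (a sketch from
> memory rather than a transcript; $ID, /dev/sdX and the NVMe partition
> are placeholders for your own values):
>
>     # sketch only; adjust IDs and device names to your environment
>     ceph osd out $ID                             # let data drain off first
>     # ...wait for backfill to complete...
>     systemctl stop ceph-osd@$ID
>     ceph-osd -i $ID --flush-journal
>     ceph osd destroy $ID --yes-i-really-mean-it
>     ceph-volume lvm zap /dev/sdX
>     ceph osd set norecover
>     ceph osd set norebalance
>     ceph osd crush remove osd.$ID
>     ceph auth del osd.$ID
>     ceph osd rm osd.$ID
>     ceph-volume lvm create --bluestore --data /dev/sdX --block.db /dev/nvme0n1pY
>     ceph osd unset norecover
>     ceph osd unset norebalance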
>
> I initially ran into issues with specifying --osd-id, which caused my OSDs
> to fail to start, but after dropping that flag I was able to get the new
> OSD to fill in the gap of the one I had just removed.
>
> I’m now doing quicker, more destructive migrations in an attempt to reduce
> data movement.
> This way I avoid the long round trip of reading from the OSD I’m
> replacing, writing to other OSDs temporarily, reading back from those
> temporary OSDs, and writing back to the ‘new’ OSD.
> Instead, I’m just reading from the replicas and writing to the ‘new’ OSD.
>
> So I’m setting the norecover and norebalance flags, downing the OSD (but
> not outing it; it stays in, and I also have the noout flag set),
> destroying/zapping, recreating using ceph-volume, and unsetting the flags,
> at which point it starts backfilling.
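>
> In command form that is roughly the following (a sketch; $ID and /dev/sdX
> are placeholders, and I assume noout is already set cluster-wide):
>
>     # sketch of the in-place flow, placeholders throughout
>     ceph osd set norecover
>     ceph osd set norebalance
>     systemctl stop ceph-osd@$ID                  # down, but still "in"
>     ceph osd destroy $ID --yes-i-really-mean-it
>     ceph-volume lvm zap /dev/sdX
>     ceph-volume lvm create --bluestore --data /dev/sdX --block.db /dev/nvme0n1pY
>     ceph osd unset norecover
>     ceph osd unset norebalance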
> For 8TB disks, with 23 other 8TB disks in the pool, it takes a *long* time
> to offload one and then backfill it back from the others. I trust my disks
> enough to backfill directly from the other disks, and it’s going well. I’m
> also seeing very good write performance while backfilling compared to
> previous drive replacements under filestore, so that’s very promising.
>
> Reed
>
> On Jan 10, 2018, at 8:29 AM, Jens-U. Mozdzen <jmozd...@nde.ag> wrote:
>
> Hi Alfredo,
>
> thank you for your comments:
>
> Zitat von Alfredo Deza <ad...@redhat.com>:
>
> On Wed, Jan 10, 2018 at 8:57 AM, Jens-U. Mozdzen <jmozd...@nde.ag> wrote:
>
> Dear *,
>
> has anybody been successful migrating Filestore OSDs to Bluestore OSDs,
> keeping the OSD number? There have been a number of messages on the list,
> reporting problems, and my experience is the same. (Removing the existing
> OSD and creating a new one does work for me.)
>
> I'm working on a Ceph 12.2.2 cluster and tried following
> http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#replacing-an-osd
> - this basically says
>
> 1. destroy old OSD
> 2. zap the disk
> 3. prepare the new OSD
> 4. activate the new OSD
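>
> In concrete commands, my reading of that page is roughly the following
> sketch (using OSD 999 on /dev/sdzz, as in my example below):
>
>     # sketch of the documented flow, not a verified transcript
>     ceph osd destroy 999 --yes-i-really-mean-it      # 1. destroy old OSD
>     ceph-volume lvm zap /dev/sdzz                    # 2. zap the disk
>     ceph-volume lvm prepare --bluestore --osd-id 999 --data /dev/sdzz  # 3. prepare
>     ceph-volume lvm activate 999 <new-osd-fsid>      # 4. activate (substitute the fsid)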
>
> I never got step 4 to complete. The closest I got was by doing the
> following
> steps (assuming OSD ID "999" on /dev/sdzz):
>
> 1. Stop the old OSD via systemd (osd-node # systemctl stop
> ceph-osd@999.service)
>
> 2. umount the old OSD (osd-node # umount /var/lib/ceph/osd/ceph-999)
>
> 3a. if the old OSD was Bluestore with LVM, manually clean up the old OSD's
> volume group
>
> 3b. zap the block device (osd-node # ceph-volume lvm zap /dev/sdzz)
>
> 4. destroy the old OSD (osd-node # ceph osd destroy 999
> --yes-i-really-mean-it)
>
> 5. create a new OSD entry (osd-node # ceph osd new $(cat
> /var/lib/ceph/osd/ceph-999/fsid) 999)
>
>
> Steps 5 and 6 are problematic if you are going to be trying ceph-volume
> later on, which takes care of doing this for you.
>
>
> 6. add the OSD secret to Ceph authentication (osd-node # ceph auth add
> osd.999 mgr 'allow profile osd' osd 'allow *' mon 'allow profile osd' -i
> /var/lib/ceph/osd/ceph-999/keyring)
>
>
> I at first tried to follow the documented steps (without my steps 5 and
> 6), which did not work for me. The documented approach failed with "init
> authentication failed: (1) Operation not permitted", because ceph-volume
> actually did not add the auth entry for me.
>
> But even after manually adding the authentication, the "ceph-volume"
> approach failed, as the OSD was still marked "destroyed" in the osdmap
> epoch as used by ceph-osd (see the commented messages from ceph-osd.999.log
> below).
>
>
> 7. prepare the new OSD (osd-node # ceph-volume lvm prepare --bluestore
> --osd-id 999 --data /dev/sdzz)
>
>
> You are going to hit a bug in ceph-volume that is preventing you from
> specifying the osd id directly if the ID has been destroyed.
>
> See http://tracker.ceph.com/issues/22642
>
>
> If I read that bug description correctly, you're confirming why I needed
> step #6 above (manually adding the OSD auth entry). But even if
> ceph-volume had added it, the ceph-osd.log entries suggest that starting
> the OSD would still have failed, because of accessing the wrong osdmap
> epoch.
>
> To me it seems like I'm hitting a bug outside of ceph-volume - unless it's
> ceph-volume that somehow determines which osdmap epoch is used by ceph-osd.
>
> In order for this to work, you would need to make sure that the ID has
> really been destroyed and avoid passing --osd-id in ceph-volume. The
> caveat being that you will get whatever ID is available next in the
> cluster.
>
>
> Yes, that's the work-around I then used - purge the old OSD and create a
> new one.
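>
> For the record, that ended up being something like this (a sketch; the new
> OSD simply gets whatever ID the cluster hands out next):
>
>     # sketch of the work-around, IDs/devices as placeholders
>     ceph osd purge 999 --yes-i-really-mean-it    # fully remove the old ID
>     ceph-volume lvm zap /dev/sdzz
>     ceph-volume lvm create --bluestore --data /dev/sdzz   # note: no --osd-id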
>
> Thanks & regards,
> Jens
>
> [...]
> --- cut here ---
> # first of multiple attempts, before "ceph auth add ..."
> # no actual epoch referenced, as login failed due to missing auth
> 2018-01-10 00:00:02.173983 7f5cf1c89d00  0 osd.999 0 crush map has features
> 288232575208783872, adjusting msgr requires for clients
> 2018-01-10 00:00:02.173990 7f5cf1c89d00  0 osd.999 0 crush map has features
> 288232575208783872 was 8705, adjusting msgr requires for mons
> 2018-01-10 00:00:02.173994 7f5cf1c89d00  0 osd.999 0 crush map has features
> 288232575208783872, adjusting msgr requires for osds
> 2018-01-10 00:00:02.174046 7f5cf1c89d00  0 osd.999 0 load_pgs
> 2018-01-10 00:00:02.174051 7f5cf1c89d00  0 osd.999 0 load_pgs opened 0 pgs
> 2018-01-10 00:00:02.174055 7f5cf1c89d00  0 osd.999 0 using weightedpriority
> op queue with priority op cut off at 64.
> 2018-01-10 00:00:02.174891 7f5cf1c89d00 -1 osd.999 0 log_to_monitors
> {default=true}
> 2018-01-10 00:00:02.177479 7f5cf1c89d00 -1 osd.999 0 init authentication
> failed: (1) Operation not permitted
>
> # after "ceph auth ..."
> # note the different epochs below? BTW, 110587 is the current epoch at that
> time and osd.999 is marked destroyed there
> # 109892: much too old to offer any details
> # 110587: modified 2018-01-09 23:43:13.202381
>
> 2018-01-10 00:08:00.945507 7fc55905bd00  0 osd.999 0 crush map has features
> 288232575208783872, adjusting msgr requires for clients
> 2018-01-10 00:08:00.945514 7fc55905bd00  0 osd.999 0 crush map has features
> 288232575208783872 was 8705, adjusting msgr requires for mons
> 2018-01-10 00:08:00.945521 7fc55905bd00  0 osd.999 0 crush map has features
> 288232575208783872, adjusting msgr requires for osds
> 2018-01-10 00:08:00.945588 7fc55905bd00  0 osd.999 0 load_pgs
> 2018-01-10 00:08:00.945594 7fc55905bd00  0 osd.999 0 load_pgs opened 0 pgs
> 2018-01-10 00:08:00.945599 7fc55905bd00  0 osd.999 0 using weightedpriority
> op queue with priority op cut off at 64.
> 2018-01-10 00:08:00.946544 7fc55905bd00 -1 osd.999 0 log_to_monitors
> {default=true}
> 2018-01-10 00:08:00.951720 7fc55905bd00  0 osd.999 0 done with init,
> starting boot process
> 2018-01-10 00:08:00.952225 7fc54160a700 -1 osd.999 0 waiting for initial
> osdmap
> 2018-01-10 00:08:00.970644 7fc546614700  0 osd.999 109892 crush map has
> features 288232610642264064, adjusting msgr requires for clients
> 2018-01-10 00:08:00.970653 7fc546614700  0 osd.999 109892 crush map has
> features 288232610642264064 was 288232575208792577, adjusting msgr requires
> for mons
> 2018-01-10 00:08:00.970660 7fc546614700  0 osd.999 109892 crush map has
> features 1008808551021559808, adjusting msgr requires for osds
> 2018-01-10 00:08:01.349602 7fc546614700 -1 osd.999 110587 osdmap says I am
> destroyed, exiting
>
> # another try
> # it is now using epoch 110587 for everything. But that one is off by one
> # at that time already:
> # 110587: modified 2018-01-09 23:43:13.202381
> # 110588: modified 2018-01-10 00:12:55.271913
>
> # but both 110587 and 110588 have osd.999 as "destroyed", so never mind.
> 2018-01-10 00:13:04.332026 7f408d5a4d00  0 osd.999 110587 crush map has
> features 288232610642264064, adjusting msgr requires for clients
> 2018-01-10 00:13:04.332037 7f408d5a4d00  0 osd.999 110587 crush map has
> features 288232610642264064 was 8705, adjusting msgr requires for mons
> 2018-01-10 00:13:04.332043 7f408d5a4d00  0 osd.999 110587 crush map has
> features 1008808551021559808, adjusting msgr requires for osds
> 2018-01-10 00:13:04.332092 7f408d5a4d00  0 osd.999 110587 load_pgs
> 2018-01-10 00:13:04.332096 7f408d5a4d00  0 osd.999 110587 load_pgs opened 0
> pgs
> 2018-01-10 00:13:04.332100 7f408d5a4d00  0 osd.999 110587 using
> weightedpriority op queue with priority op cut off at 64.
> 2018-01-10 00:13:04.332990 7f408d5a4d00 -1 osd.999 110587 log_to_monitors
> {default=true}
> 2018-01-10 00:13:06.026628 7f408d5a4d00  0 osd.999 110587 done with init,
> starting boot process
> 2018-01-10 00:13:06.027627 7f4075352700 -1 osd.999 110587 osdmap says I am
> destroyed, exiting
>
> # the attempt after using "ceph osd new", which created epoch 110591 as the
> first with osd.999 as autoout,exists,new
> # But ceph-osd still uses 110587.
> # 110587: modified 2018-01-09 23:43:13.202381
> # 110591: modified 2018-01-10 00:30:44.850078
>
> 2018-01-10 00:31:15.453871 7f1c57c58d00  0 osd.999 110587 crush map has
> features 288232610642264064, adjusting msgr requires for clients
> 2018-01-10 00:31:15.453882 7f1c57c58d00  0 osd.999 110587 crush map has
> features 288232610642264064 was 8705, adjusting msgr requires for mons
> 2018-01-10 00:31:15.453887 7f1c57c58d00  0 osd.999 110587 crush map has
> features 1008808551021559808, adjusting msgr requires for osds
> 2018-01-10 00:31:15.453940 7f1c57c58d00  0 osd.999 110587 load_pgs
> 2018-01-10 00:31:15.453945 7f1c57c58d00  0 osd.999 110587 load_pgs opened 0
> pgs
> 2018-01-10 00:31:15.453952 7f1c57c58d00  0 osd.999 110587 using
> weightedpriority op queue with priority op cut off at 64.
> 2018-01-10 00:31:15.454862 7f1c57c58d00 -1 osd.999 110587 log_to_monitors
> {default=true}
> 2018-01-10 00:31:15.520533 7f1c57c58d00  0 osd.999 110587 done with init,
> starting boot process
> 2018-01-10 00:31:15.521278 7f1c40207700 -1 osd.999 110587 osdmap says I am
> destroyed, exiting
> --- cut here ---
> [...]
>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
