I do not believe the fix was in 16.2.4. I will build another patched version of the image tomorrow based on that version. I agree: I feel this breaks new deployments as well as existing ones, and I hope a point release that includes the fix will come soon.
> On May 31, 2021, at 15:33, Marco Pizzolo <marcopizz...@gmail.com> wrote:
>
> David,
>
> What I can confirm is that if this fix is already in 16.2.4 and 15.2.13, then there's another issue resulting in the same situation, as it continues to happen in the latest available images.
>
> We are going to try and see if we can install a 15.2.x release and subsequently upgrade using a fixed image. We were not finding a good way to bootstrap directly with a custom image, but maybe we missed something; the cephadm bootstrap command didn't seem to support an image path.
>
> Thanks for your help thus far. I'll update later today or tomorrow when we get the chance to go the upgrade route.
>
> It seems tragic that when an all-stopping, immediately reproducible issue such as this occurs, adopters are allowed to flounder for so long. Ceph has had a tremendously positive impact for us since we began using it in luminous/mimic, but situations such as this are hard to look past. It's really unfortunate, as our existing production clusters have been rock solid thus far, but this does shake one's confidence, and I would wager that I'm not alone.
>
> Marco
>
>> On Mon, May 31, 2021 at 3:57 PM David Orman <orma...@corenode.com> wrote:
>>
>> Does the image we built fix the problem for you? That's how we worked around it. Unfortunately, it even bites you with fewer OSDs if you have DB/WAL on other devices: we have 24 rotational drives/OSDs, but split DB/WAL onto multiple NVMes. We're hoping the remoto fix (since it's merged upstream and pushed) will land in the next point release of 16.x (and it sounds like 15.x), since this is a blocking issue without using patched containers. I guess testing isn't done against clusters with these kinds of configurations, as we can replicate it on any of our dev/test clusters with this type of drive configuration. We weren't able to upgrade any clusters or deploy new hosts on any clusters, so it caused quite an issue until we figured out the problem and resolved it.
>>
>> If you want to build your own images, this is the simple Dockerfile we used to get beyond this issue:
>>
>> $ cat Dockerfile
>> FROM docker.io/ceph/ceph:v16.2.3
>> COPY process.py /lib/python3.6/site-packages/remoto/process.py
>>
>> The process.py is the patched version we submitted here (merged upstream):
>> https://github.com/alfredodeza/remoto/pull/63/commits/6f98078a1479de1f246f971f311146a3c1605494
>>
>> Hope this helps,
>> David
>>
>> On Mon, May 31, 2021 at 11:43 AM Marco Pizzolo <marcopizz...@gmail.com> wrote:
>> >
>> > Unfortunately Ceph 16.2.4 is still not working for us. We continue to have issues where the 26th OSD is not fully created and started. We've confirmed that we do get the flock as described in:
>> >
>> > https://tracker.ceph.com/issues/50526
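(A quick illustration of why that flock wedges everything: cephadm takes an advisory lock on /run/cephadm/<fsid>.lock, which is what the lslocks output in the reproduction steps below shows, and every invocation for that cluster has to acquire it before doing anything. Below is a minimal sketch of probing such a lock from Python, assuming plain fcntl flock semantics; the lock path is the one from the lslocks example below.

import fcntl
import os

# Minimal sketch: probe the per-cluster cephadm lock without blocking.
# If another process (e.g. a hung ceph-volume call) still holds the
# flock, LOCK_NB raises BlockingIOError instead of hanging forever.
lock_path = "/run/cephadm/9fa2b396-adb5-11eb-a2d3-bc97e17cf960.lock"

fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
try:
    fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    print("lock is free")
    fcntl.flock(fd, fcntl.LOCK_UN)
except BlockingIOError:
    print("lock is held; find the holder with: lslocks | grep cephadm")
finally:
    os.close(fd)

A hung child that never exits never releases the lock, so every subsequent cephadm run on that host simply sits there, which matches the 26th-OSD hang described above.)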
>> > -----
>> >
>> > I have verified in our labs a way to reproduce the problem easily:
>> >
>> > 0. Stop the cephadm orchestrator. On your bootstrap node:
>> >
>> > # cephadm shell
>> > # ceph mgr module disable cephadm
>> >
>> > 1. On one of the hosts where you want to create OSDs and which has a large number of devices, check whether you have a "cephadm" file lock, for example:
>> >
>> > # lslocks | grep cephadm
>> > python3 1098782 FLOCK 0B WRITE 0 0 0 /run/cephadm/9fa2b396-adb5-11eb-a2d3-bc97e17cf960.lock
>> >
>> > If that is the case, just kill the process so you start from a "clean" situation.
>> >
>> > 2. Go to the folder /var/lib/ceph/<your_ceph_cluster_fsid>. You will find a file there called "cephadm.xxxxxxxxxxxxxx". Execute:
>> >
>> > # python3 cephadm.xxxxxxxxxxxxxx ceph-volume inventory
>> >
>> > 3. If the problem is present in your cephadm file, the command will hang and you will see a cephadm file lock again.
>> >
>> > 4. If the modification is not present, change your cephadm.xxxxxxxxxx file to include the modification I made (it just removes the verbosity parameter in the call_throws call):
>> >
>> > https://github.com/ceph/ceph/blob/2f4dc3147712f1991242ef0d059690b5fa3d8463/src/cephadm/cephadm#L4576
>> >
>> > Go to step 1 to clean the file lock and try again... with the modification in place it must work.
>> >
>> > -----
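(For anyone wondering why removing the verbosity parameter helps: both the remoto patch and the cephadm change above appear to work around the classic subprocess pipe-buffer deadlock. Once a child writes more to a pipe than the OS buffer holds (roughly 64 KiB on Linux) while the parent isn't reading, both sides stall, and with enough OSDs or devices the ceph-volume output gets big enough to hit that. A minimal standalone sketch of the failure mode and the safe pattern — not the actual remoto/cephadm code:

import subprocess

# A child that writes ~1 MiB to stdout, far more than a pipe buffer holds.
child = "import sys; sys.stdout.write('x' * 1024 * 1024)"

p = subprocess.Popen(["python3", "-c", child], stdout=subprocess.PIPE)

# p.wait()  # DEADLOCK: the child blocks in write() once the pipe is
#           # full, while the parent blocks waiting for it to exit.

out, _ = p.communicate()  # safe: drains the pipe while waiting
print(len(out))           # 1048576

Shrinking the output below the buffer size hides the problem, which would explain why hosts with ~25+ OSDs, or fewer OSDs with split DB/WAL, are the ones to hit it.)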
>> > For us, it takes a few seconds but then the manual execution does come back, and there are no file locks; however, we remain unable to add any further OSDs.
>> >
>> > Furthermore, this is happening as part of a new Pacific cluster creation, post bootstrap, adding one OSD daemon at a time and allowing each OSD to be created, set in, and brought up.
>> >
>> > How is everyone else managing to get past this, or are we the only ones (aside from David) using >25 OSDs per host?
>> >
>> > Our luck has been the same with 15.2.13 and 16.2.4, using both Docker and Podman on Ubuntu 20.04.2.
>> >
>> > Thanks,
>> > Marco
>> >
>> > On Sun, May 30, 2021 at 7:33 AM Peter Childs <pchi...@bcs.org> wrote:
>> >>
>> >> I've actually managed to get a little further with my problem.
>> >>
>> >> As I've said before, these servers are slightly distorted in config: 63 drives and only 48 GB of memory.
>> >>
>> >> Once I create about 15-20 osds, it continues to format the disks but won't actually create the containers or start any service.
>> >>
>> >> Worse than that, on reboot the disks disappear; they don't just stop working, they are no longer detected by Linux at all, which makes me think I'm hitting some kernel limit.
>> >>
>> >> At this point I'm going to cut my losses, give up, and use the smaller but slightly more powerful 30-drive systems I have (with 256 GB of memory), maybe transplanting the larger disks if I need more capacity.
>> >>
>> >> Peter
>> >>
>> >> On Sat, 29 May 2021, 23:19 Marco Pizzolo, <marcopizz...@gmail.com> wrote:
>> >>>
>> >>> Thanks David
>> >>> We will investigate the bugs as per your suggestion, and then will look to test with the custom image.
>> >>>
>> >>> Appreciate it.
>> >>>
>> >>> On Sat, May 29, 2021, 4:11 PM David Orman <orma...@corenode.com> wrote:
>> >>>>
>> >>>> You may be running into the same issue we ran into (make sure to read the first issue, there's a few mingled in there), for which we submitted a patch:
>> >>>>
>> >>>> https://tracker.ceph.com/issues/50526
>> >>>> https://github.com/alfredodeza/remoto/issues/62
>> >>>>
>> >>>> If you're brave (YMMV, test first non-prod), we pushed an image with the issue we encountered fixed as per above here: https://hub.docker.com/repository/docker/ormandj/ceph/tags?page=1 . We 'upgraded' to this when we encountered the mgr hanging on us after updating ceph to v16 and experiencing this issue, using: "ceph orch upgrade start --image docker.io/ormandj/ceph:v16.2.3-mgrfix". I've not tried to bootstrap a new cluster with a custom image, and I don't know when 16.2.4 will be released with this change (hopefully) integrated, as remoto accepted the patch upstream.
>> >>>>
>> >>>> I'm not sure if this is your exact issue; see the bug reports and check whether you see the lock and the behaviour matches. If so, it may help you out. The only change in that image is that patch to remoto being overlaid on the default 16.2.3 image.
>> >>>>
>> >>>> On Fri, May 28, 2021 at 1:15 PM Marco Pizzolo <marcopizz...@gmail.com> wrote:
>> >>>> >
>> >>>> > Peter,
>> >>>> >
>> >>>> > We're seeing the same issues as you are. We have 2 new hosts with Intel(R) Xeon(R) Gold 6248R CPUs @ 3.00GHz w/ 48 cores, 384GB RAM, and 60x 10TB SED drives, and we have tried both 15.2.13 and 16.2.4.
>> >>>> >
>> >>>> > Cephadm does NOT properly deploy and activate OSDs on Ubuntu 20.04.2 with Docker.
>> >>>> >
>> >>>> > It seems to be a bug in cephadm and a product regression, as we have 4 near-identical nodes on CentOS running Nautilus (240 x 10TB SED drives) and had no problems.
>> >>>> >
>> >>>> > FWIW we had no luck yet with one-by-one OSD daemon additions through ceph orch either. We also reproduced the issue easily in a virtual lab using small virtual disks on a single ceph VM with 1 mon.
>> >>>> >
>> >>>> > We are now looking into whether we can get past this with a manual buildout.
>> >>>> >
>> >>>> > If you, or anyone, has hit the same stumbling block and gotten past it, I would really appreciate some guidance.
>> >>>> >
>> >>>> > Thanks,
>> >>>> > Marco
>> >>>> >
>> >>>> > On Thu, May 27, 2021 at 2:23 PM Peter Childs <pchi...@bcs.org> wrote:
>> >>>> >
>> >>>> > > In the end it looks like I might be able to get the node up to about 30 osds before it stops creating any more.
>> >>>> > >
>> >>>> > > Or rather, it formats the disks but freezes up starting the daemons.
>> >>>> > >
>> >>>> > > I suspect I'm missing something I can tune to get it working better.
>> >>>> > >
>> >>>> > > If I could see any error messages that might help, but I'm yet to spot anything.
>> >>>> > >
>> >>>> > > Peter.
>> >>>> > >
>> >>>> > > On Wed, 26 May 2021, 10:57 Eugen Block, <ebl...@nde.ag> wrote:
>> >>>> > >
>> >>>> > > > > If I add the osd daemons one at a time with
>> >>>> > > > >
>> >>>> > > > > ceph orch daemon add osd drywood12:/dev/sda
>> >>>> > > > >
>> >>>> > > > > It does actually work,
>> >>>> > > >
>> >>>> > > > Great!
>> >>>> > > >
>> >>>> > > > > I suspect what's happening is that when my rule for creating osds runs and creates them all at once, it overloads cephadm and it can't cope.
>> >>>> > > >
>> >>>> > > > It's possible, I guess.
>> >>>> > > >
>> >>>> > > > > I suspect what I might need to do, at least to work around the issue, is set "limit:" and bring it up until it stops working.
>> >>>> > > >
>> >>>> > > > It's worth a try, yes, although the docs state you should try to avoid it. It's possible that it doesn't work properly; in that case create a bug report. ;-)
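(For reference, a sketch of what such a spec might look like — Peter's spec from further down the thread with a limit filter added. This is untested, and the limit value of 10 is an arbitrary starting point for bisecting, not a recommendation:

service_type: osd
service_name: osd.drywood-disks
placement:
  host_pattern: 'drywood*'
spec:
  data_devices:
    size: "7TB:"
    limit: 10        # assumption: cap at 10 devices per host, raise until it breaks
  objectstore: bluestore

As Eugen says, the docs advise against leaning on limit, so it is best treated as a bisection tool rather than a permanent setting.)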
>> >>>> > > > > I did work out how to get ceph-volume to nearly work manually.
>> >>>> > > > >
>> >>>> > > > > cephadm shell
>> >>>> > > > > ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring
>> >>>> > > > > ceph-volume lvm create --data /dev/sda --dmcrypt
>> >>>> > > > >
>> >>>> > > > > but given I've now got "add osd" to work, I suspect I just need to fine-tune my osd creation rules so they do not try to create too many osds on the same node at the same time.
>> >>>> > > >
>> >>>> > > > I agree, no need to do it manually if there is an automated way, especially if you're trying to bring up dozens of OSDs.
>> >>>> > > >
>> >>>> > > > Zitat von Peter Childs <pchi...@bcs.org>:
>> >>>> > > >
>> >>>> > > > > After a bit of messing around, I managed to get it somewhat working.
>> >>>> > > > >
>> >>>> > > > > If I add the osd daemons one at a time with
>> >>>> > > > >
>> >>>> > > > > ceph orch daemon add osd drywood12:/dev/sda
>> >>>> > > > >
>> >>>> > > > > It does actually work,
>> >>>> > > > >
>> >>>> > > > > I suspect what's happening is that when my rule for creating osds runs and creates them all at once, it overloads cephadm and it can't cope.
>> >>>> > > > >
>> >>>> > > > > service_type: osd
>> >>>> > > > > service_name: osd.drywood-disks
>> >>>> > > > > placement:
>> >>>> > > > >   host_pattern: 'drywood*'
>> >>>> > > > > spec:
>> >>>> > > > >   data_devices:
>> >>>> > > > >     size: "7TB:"
>> >>>> > > > >   objectstore: bluestore
>> >>>> > > > >
>> >>>> > > > > I suspect what I might need to do, at least to work around the issue, is set "limit:" and bring it up until it stops working.
>> >>>> > > > >
>> >>>> > > > > I did work out how to get ceph-volume to nearly work manually.
>> >>>> > > > >
>> >>>> > > > > cephadm shell
>> >>>> > > > > ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring
>> >>>> > > > > ceph-volume lvm create --data /dev/sda --dmcrypt
>> >>>> > > > >
>> >>>> > > > > but given I've now got "add osd" to work, I suspect I just need to fine-tune my osd creation rules so they do not try to create too many osds on the same node at the same time.
>> >>>> > > > >
>> >>>> > > > > On Wed, 26 May 2021 at 08:25, Eugen Block <ebl...@nde.ag> wrote:
>> >>>> > > > >
>> >>>> > > > >> Hi,
>> >>>> > > > >>
>> >>>> > > > >> I believe your current issue is due to a missing keyring for client.bootstrap-osd on the OSD node. But even after fixing that you probably still won't be able to deploy an OSD manually with ceph-volume, because 'ceph-volume activate' is not supported with cephadm [1].
>> >>>> > > > >> I just tried that in a virtual environment; it fails when activating the systemd unit:
>> >>>> > > > >>
>> >>>> > > > >> ---snip---
>> >>>> > > > >> [2021-05-26 06:47:16,677][ceph_volume.process][INFO ] Running command: /usr/bin/systemctl enable ceph-volume@lvm-8-1a8fc8ae-8f4c-4f91-b044-d5636bb52456
>> >>>> > > > >> [2021-05-26 06:47:16,692][ceph_volume.process][INFO ] stderr Failed to connect to bus: No such file or directory
>> >>>> > > > >> [2021-05-26 06:47:16,693][ceph_volume.devices.lvm.create][ERROR ] lvm activate was unable to complete, while creating the OSD
>> >>>> > > > >> Traceback (most recent call last):
>> >>>> > > > >>   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/create.py", line 32, in create
>> >>>> > > > >>     Activate([]).activate(args)
>> >>>> > > > >>   File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
>> >>>> > > > >>     return func(*a, **kw)
>> >>>> > > > >>   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/activate.py", line 294, in activate
>> >>>> > > > >>     activate_bluestore(lvs, args.no_systemd)
>> >>>> > > > >>   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/activate.py", line 214, in activate_bluestore
>> >>>> > > > >>     systemctl.enable_volume(osd_id, osd_fsid, 'lvm')
>> >>>> > > > >>   File "/usr/lib/python3.6/site-packages/ceph_volume/systemd/systemctl.py", line 82, in enable_volume
>> >>>> > > > >>     return enable(volume_unit % (device_type, id_, fsid))
>> >>>> > > > >>   File "/usr/lib/python3.6/site-packages/ceph_volume/systemd/systemctl.py", line 22, in enable
>> >>>> > > > >>     process.run(['systemctl', 'enable', unit])
>> >>>> > > > >>   File "/usr/lib/python3.6/site-packages/ceph_volume/process.py", line 153, in run
>> >>>> > > > >>     raise RuntimeError(msg)
>> >>>> > > > >> RuntimeError: command returned non-zero exit status: 1
>> >>>> > > > >> [2021-05-26 06:47:16,694][ceph_volume.devices.lvm.create][INFO ] will rollback OSD ID creation
>> >>>> > > > >> [2021-05-26 06:47:16,697][ceph_volume.process][INFO ] Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.8 --yes-i-really-mean-it
>> >>>> > > > >> [2021-05-26 06:47:17,597][ceph_volume.process][INFO ] stderr purged osd.8
>> >>>> > > > >> ---snip---
>> >>>> > > > >>
>> >>>> > > > >> There's a workaround described in [2], but that's not really an option for dozens of OSDs. I think your best approach is to get cephadm to activate the OSDs for you.
>> >>>> > > > >>
>> >>>> > > > >> You wrote you didn't find any helpful error messages, but did cephadm even try to deploy OSDs? What does your osd spec file look like? Did you explicitly run 'ceph orch apply osd -i specfile.yml'?
>> >>>> > > > >> This should trigger cephadm and you should see at least some output like this:
>> >>>> > > > >>
>> >>>> > > > >> Mai 26 08:21:48 pacific1 conmon[31446]: 2021-05-26T06:21:48.466+0000 7effc15ff700 0 log_channel(cephadm) log [INF] : Applying service osd.ssd-hdd-mix on host pacific2...
>> >>>> > > > >> Mai 26 08:21:49 pacific1 conmon[31009]: cephadm 2021-05-26T06:21:48.469611+0000 mgr.pacific1.whndiw (mgr.14166) 1646 : cephadm [INF] Applying service osd.ssd-hdd-mix on host pacific2...
>> >>>> > > > >>
>> >>>> > > > >> Regards,
>> >>>> > > > >> Eugen
>> >>>> > > > >>
>> >>>> > > > >> [1] https://tracker.ceph.com/issues/49159
>> >>>> > > > >> [2] https://tracker.ceph.com/issues/46691
>> >>>> > > > >>
>> >>>> > > > >> Zitat von Peter Childs <pchi...@bcs.org>:
>> >>>> > > > >>
>> >>>> > > > >> > Not sure what I'm doing wrong; I suspect it's the way I'm running ceph-volume.
>> >>>> > > > >> >
>> >>>> > > > >> > root@drywood12:~# cephadm ceph-volume lvm create --data /dev/sda --dmcrypt
>> >>>> > > > >> > Inferring fsid 1518c8e0-bbe4-11eb-9772-001e67dc85ea
>> >>>> > > > >> > Using recent ceph image ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949
>> >>>> > > > >> > /usr/bin/docker: Running command: /usr/bin/ceph-authtool --gen-print-key
>> >>>> > > > >> > /usr/bin/docker: Running command: /usr/bin/ceph-authtool --gen-print-key
>> >>>> > > > >> > /usr/bin/docker: --> RuntimeError: No valid ceph configuration file was loaded.
>> >>>> > > > >> > Traceback (most recent call last):
>> >>>> > > > >> >   File "/usr/sbin/cephadm", line 8029, in <module>
>> >>>> > > > >> >     main()
>> >>>> > > > >> >   File "/usr/sbin/cephadm", line 8017, in main
>> >>>> > > > >> >     r = ctx.func(ctx)
>> >>>> > > > >> >   File "/usr/sbin/cephadm", line 1678, in _infer_fsid
>> >>>> > > > >> >     return func(ctx)
>> >>>> > > > >> >   File "/usr/sbin/cephadm", line 1738, in _infer_image
>> >>>> > > > >> >     return func(ctx)
>> >>>> > > > >> >   File "/usr/sbin/cephadm", line 4514, in command_ceph_volume
>> >>>> > > > >> >     out, err, code = call_throws(ctx, c.run_cmd(), verbosity=verbosity)
>> >>>> > > > >> >   File "/usr/sbin/cephadm", line 1464, in call_throws
>> >>>> > > > >> >     raise RuntimeError('Failed command: %s' % ' '.join(command))
>> >>>> > > > >> > RuntimeError: Failed command: /usr/bin/docker run --rm --ipc=host --net=host --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk --init -e CONTAINER_IMAGE=ceph/ceph@sha256:54e95ae1e11404157d7b329d0t
>> >>>> > > > >> >
>> >>>> > > > >> > root@drywood12:~# cephadm shell
>> >>>> > > > >> > Inferring fsid 1518c8e0-bbe4-11eb-9772-001e67dc85ea
>> >>>> > > > >> > Inferring config /var/lib/ceph/1518c8e0-bbe4-11eb-9772-001e67dc85ea/mon.drywood12/config
>> >>>> > > > >> > Using recent ceph image ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949
>> >>>> > > > >> > root@drywood12:/# ceph-volume lvm create --data /dev/sda --dmcrypt
>> >>>> > > > >> > Running command: /usr/bin/ceph-authtool --gen-print-key
>> >>>> > > > >> > Running command: /usr/bin/ceph-authtool --gen-print-key
>> >>>> > > > >> > Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 70054a5c-c176-463a-a0ac-b44c5db0987c
>> >>>> > > > >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
>> >>>> > > > >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 AuthRegistry(0x7fdef405b378) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
>> >>>> > > > >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
>> >>>> > > > >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 AuthRegistry(0x7fdef405ef20) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
>> >>>> > > > >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
>> >>>> > > > >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 AuthRegistry(0x7fdef8f0bea0) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
>> >>>> > > > >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef2d9d700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
>> >>>> > > > >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef259c700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
>> >>>> > > > >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef1d9b700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
>> >>>> > > > >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication
>> >>>> > > > >> >  stderr: [errno 13] RADOS permission denied (error connecting to the cluster)
>> >>>> > > > >> > --> RuntimeError: Unable to create a new OSD id
>> >>>> > > > >> > root@drywood12:/# lsblk /dev/sda
>> >>>> > > > >> > NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
>> >>>> > > > >> > sda    8:0   0 7.3T  0 disk
>> >>>> > > > >> >
>> >>>> > > > >> > As far as I can see, cephadm gets a little further than this, as the disks have lvm volumes on them; just the osd daemons are not created or started. So maybe I'm invoking ceph-volume incorrectly.
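(The "unable to find a keyring" failures above are the missing client.bootstrap-osd keyring Eugen diagnoses further up in this digest. Peter's own later workaround, also quoted above, exports it inside the cephadm shell before calling ceph-volume:

# ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring

after which ceph-volume can authenticate and "osd new" succeeds. The earlier "No valid ceph configuration file was loaded" error looks like the same class of problem one level up: the direct "cephadm ceph-volume" invocation started a container without the cluster config available inside it, whereas "cephadm shell" infers and mounts the config, as its output shows.)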
>> >>>> > > > >> >
>> >>>> > > > >> > On Tue, 25 May 2021 at 06:57, Peter Childs <pchi...@bcs.org> wrote:
>> >>>> > > > >> >
>> >>>> > > > >> >> On Mon, 24 May 2021, 21:08 Marc, <m...@f1-outsourcing.eu> wrote:
>> >>>> > > > >> >>
>> >>>> > > > >> >>> > I'm attempting to use cephadm and Pacific, currently on debian buster, mostly because centos7 ain't supported any more and centos8 ain't supported by some of my hardware.
>> >>>> > > > >> >>>
>> >>>> > > > >> >>> Who says centos7 is not supported any more? Afaik centos7/el7 is being supported till its EOL 2024. By then maybe a good alternative for el8/stream has surfaced.
>> >>>> > > > >> >>
>> >>>> > > > >> >> Not supported by ceph Pacific; it's our OS of choice otherwise.
>> >>>> > > > >> >>
>> >>>> > > > >> >> My testing says the versions of podman, docker and python3 available there do not work with Pacific.
>> >>>> > > > >> >>
>> >>>> > > > >> >> Given I've needed to upgrade docker on buster, can we please have a list of versions that work with cephadm, and maybe even have cephadm say no, please upgrade, unless you're running the right version or better.
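(A sketch of the kind of up-front check being asked for here. The binaries probed are real, but the minimum versions below are invented for illustration and are not cephadm's actual requirements:

import re
import shutil
import subprocess

# Illustrative minimums only; NOT cephadm's real requirements.
MIN_VERSIONS = {"docker": (18, 9), "podman": (2, 0)}

def engine_version(binary):
    # Return (major, minor) parsed from `<binary> --version`, or None
    # if the binary is not installed.
    if not shutil.which(binary):
        return None
    out = subprocess.run([binary, "--version"],
                         capture_output=True, text=True).stdout
    m = re.search(r"(\d+)\.(\d+)", out)
    return (int(m.group(1)), int(m.group(2))) if m else None

for binary, minimum in MIN_VERSIONS.items():
    found = engine_version(binary)
    if found and found < minimum:
        raise SystemExit(f"{binary} {found[0]}.{found[1]} is older than the "
                         f"supported minimum {minimum[0]}.{minimum[1]}; "
                         "please upgrade")

Both engines print versions in a parseable form, e.g. "Docker version 20.10.5, build ..." or "podman version 3.0.1", so the first two numbers are enough for a coarse gate.)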
>> >>>> > > > >> >>
>> >>>> > > > >> >>> > Anyway I have a few nodes with 59x 7.2TB disks, but for some reason the osd daemons don't start: the disks get formatted and the osds are created, but the daemons never come up.
>> >>>> > > > >> >>>
>> >>>> > > > >> >>> What if you try with
>> >>>> > > > >> >>> ceph-volume lvm create --data /dev/sdi --dmcrypt ?
>> >>>> > > > >> >>
>> >>>> > > > >> >> I'll have a go.
>> >>>> > > > >> >>
>> >>>> > > > >> >>> > They are probably the wrong spec for ceph (48 GB of memory and only 4 cores)
>> >>>> > > > >> >>>
>> >>>> > > > >> >>> You can always start with just configuring a few disks per node. That should always work.
>> >>>> > > > >> >>
>> >>>> > > > >> >> That was my thought too.
>> >>>> > > > >> >>
>> >>>> > > > >> >> Thanks
>> >>>> > > > >> >>
>> >>>> > > > >> >> Peter
>> >>>> > > > >> >>
>> >>>> > > > >> >>> > but I was expecting them to start and be either dirt slow or crash later; anyway, I've got up to 30 of them, so I was hoping to get at least 6PB of raw storage out of them.
>> >>>> > > > >> >>> >
>> >>>> > > > >> >>> > As yet I've not spotted any helpful error messages.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io