I do not believe the fix was in 16.2.4. I will build another patched version of the image tomorrow based on that version. I agree: I feel this breaks new deployments as well as existing ones, and I hope a point release that includes the fix will come soon.
> On May 31, 2021, at 15:33, Marco Pizzolo <marcopizz...@gmail.com> wrote:
>
> David,
>
> What I can confirm is that if this fix is already in 16.2.4 and 15.2.13, then there's another issue resulting in the same situation, as it continues to happen in the latest available images.
>
> We are going to try and see if we can install a 15.2.x release and subsequently upgrade using a fixed image. We were not finding a good way to bootstrap directly with a custom image, but maybe we missed something; the cephadm bootstrap command didn't seem to support an image path.
>
> Thanks for your help thus far. I'll update later today or tomorrow when we get the chance to go the upgrade route.
>
> It seems tragic that when an all-stopping, immediately reproducible issue such as this occurs, adopters are allowed to flounder for so long. Ceph has had a tremendously positive impact for us since we began using it in luminous/mimic, but situations such as this are hard to look past. It's really unfortunate, as our existing production clusters have been rock solid thus far, but this does shake one's confidence, and I would wager that I'm not alone.
>
> Marco
>
>> On Mon, May 31, 2021 at 3:57 PM David Orman <orma...@corenode.com> wrote:
>>
>> Does the image we built fix the problem for you? That's how we worked around it. Unfortunately, it even bites you with fewer OSDs if you have DB/WAL on other devices: we have 24 rotational drives/OSDs, but split DB/WAL onto multiple NVMes. We're hoping the remoto fix (since it's merged upstream and pushed) will land in the next point release of 16.x (and it sounds like 15.x), since this is a blocking issue without using patched containers. I guess testing isn't done against clusters with these kinds of configurations, as we can replicate it on any of our dev/test clusters with this type of drive configuration. We weren't able to upgrade any clusters or deploy new hosts on any clusters, so it caused quite an issue until we figured out the problem and resolved it.
>>
>> If you want to build your own images, this is the simple Dockerfile we used to get beyond this issue:
>>
>> $ cat Dockerfile
>> FROM docker.io/ceph/ceph:v16.2.3
>> COPY process.py /lib/python3.6/site-packages/remoto/process.py
>>
>> The process.py is the patched version we submitted here (merged upstream):
>> https://github.com/alfredodeza/remoto/pull/63/commits/6f98078a1479de1f246f971f311146a3c1605494
>>
>> Hope this helps,
>> David
>>
>> On Mon, May 31, 2021 at 11:43 AM Marco Pizzolo <marcopizz...@gmail.com> wrote:
>> >
>> > Unfortunately Ceph 16.2.4 is still not working for us. We continue to have issues where the 26th OSD is not fully created and started. We've confirmed that we do get the flock as described in:
>> >
>> > https://tracker.ceph.com/issues/50526
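(A quick illustration of why that flock wedges everything: cephadm takes an advisory lock on /run/cephadm/<fsid>.lock, which is what the lslocks output in the reproduction steps below shows, and every invocation for that cluster has to acquire it before doing anything. Below is a minimal sketch of probing such a lock from Python, assuming plain fcntl flock semantics; the lock path is the one from the lslocks example below.

import fcntl
import os

# Minimal sketch: probe the per-cluster cephadm lock without blocking.
# If another process (e.g. a hung ceph-volume call) still holds the
# flock, LOCK_NB raises BlockingIOError instead of hanging forever.
lock_path = "/run/cephadm/9fa2b396-adb5-11eb-a2d3-bc97e17cf960.lock"

fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
try:
    fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    print("lock is free")
    fcntl.flock(fd, fcntl.LOCK_UN)
except BlockingIOError:
    print("lock is held; find the holder with: lslocks | grep cephadm")
finally:
    os.close(fd)

A hung child that never exits never releases the lock, so every subsequent cephadm run on that host simply sits there, which matches the 26th-OSD hang described above.)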
>> > -----
>> >
>> > I have verified in our labs a way to reproduce the problem easily:
>> >
>> > 0. Stop the cephadm orchestrator. On your bootstrap node:
>> >
>> > # cephadm shell
>> > # ceph mgr module disable cephadm
>> >
>> > 1. On one of the hosts where you want to create OSDs and which has a large number of devices, check whether you have a "cephadm" file lock, for example:
>> >
>> > # lslocks | grep cephadm
>> > python3 1098782 FLOCK 0B WRITE 0 0 0 /run/cephadm/9fa2b396-adb5-11eb-a2d3-bc97e17cf960.lock
>> >
>> > If that is the case, just kill the process so you start from a "clean" situation.
>> >
>> > 2. Go to the folder /var/lib/ceph/<your_ceph_cluster_fsid>. You will find a file there called "cephadm.xxxxxxxxxxxxxx". Execute:
>> >
>> > # python3 cephadm.xxxxxxxxxxxxxx ceph-volume inventory
>> >
>> > 3. If the problem is present in your cephadm file, the command will hang and you will see a cephadm file lock again.
>> >
>> > 4. If the modification is not present, change your cephadm.xxxxxxxxxx file to include the modification I made (it just removes the verbosity parameter in the call_throws call):
>> >
>> > https://github.com/ceph/ceph/blob/2f4dc3147712f1991242ef0d059690b5fa3d8463/src/cephadm/cephadm#L4576
>> >
>> > Go to step 1 to clean the file lock and try again... with the modification in place it must work.
>> >
>> > -----
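(For anyone wondering why removing the verbosity parameter helps: both the remoto patch and the cephadm change above appear to work around the classic subprocess pipe-buffer deadlock. Once a child writes more to a pipe than the OS buffer holds (roughly 64 KiB on Linux) while the parent isn't reading, both sides stall, and with enough OSDs or devices the ceph-volume output gets big enough to hit that. A minimal standalone sketch of the failure mode and the safe pattern — not the actual remoto/cephadm code:

import subprocess

# A child that writes ~1 MiB to stdout, far more than a pipe buffer holds.
child = "import sys; sys.stdout.write('x' * 1024 * 1024)"

p = subprocess.Popen(["python3", "-c", child], stdout=subprocess.PIPE)

# p.wait()  # DEADLOCK: the child blocks in write() once the pipe is
#           # full, while the parent blocks waiting for it to exit.

out, _ = p.communicate()  # safe: drains the pipe while waiting
print(len(out))           # 1048576

Shrinking the output below the buffer size hides the problem, which would explain why hosts with ~25+ OSDs, or fewer OSDs with split DB/WAL, are the ones to hit it.)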
>> > For us, it takes a few seconds but then the manual execution does come back, and there are no file locks; however, we remain unable to add any further OSDs.
>> >
>> > Furthermore, this is happening as part of a new Pacific cluster creation, post bootstrap, adding one OSD daemon at a time and allowing each OSD to be created, set in, and brought up.
>> >
>> > How is everyone else managing to get past this, or are we the only ones (aside from David) using >25 OSDs per host?
>> >
>> > Our luck has been the same with 15.2.13 and 16.2.4, using both Docker and Podman on Ubuntu 20.04.2.
>> >
>> > Thanks,
>> > Marco
>> >
>> > On Sun, May 30, 2021 at 7:33 AM Peter Childs <pchi...@bcs.org> wrote:
>> >>
>> >> I've actually managed to get a little further with my problem.
>> >>
>> >> As I've said before, these servers are slightly distorted in config: 63 drives and only 48 GB of memory.
>> >>
>> >> Once I create about 15-20 osds, it continues to format the disks but won't actually create the containers or start any service.
>> >>
>> >> Worse than that, on reboot the disks disappear; they don't just stop working, they are no longer detected by Linux at all, which makes me think I'm hitting some kernel limit.
>> >>
>> >> At this point I'm going to cut my losses, give up, and use the smaller but slightly more powerful 30-drive systems I have (with 256 GB of memory), maybe transplanting the larger disks if I need more capacity.
>> >>
>> >> Peter
>> >>
>> >> On Sat, 29 May 2021, 23:19 Marco Pizzolo, <marcopizz...@gmail.com> wrote:
>> >>>
>> >>> Thanks David
>> >>> We will investigate the bugs as per your suggestion, and then will look to test with the custom image.
>> >>>
>> >>> Appreciate it.
>> >>>
>> >>> On Sat, May 29, 2021, 4:11 PM David Orman <orma...@corenode.com> wrote:
>> >>>>
>> >>>> You may be running into the same issue we ran into (make sure to read the first issue, there's a few mingled in there), for which we submitted a patch:
>> >>>>
>> >>>> https://tracker.ceph.com/issues/50526
>> >>>> https://github.com/alfredodeza/remoto/issues/62
>> >>>>
>> >>>> If you're brave (YMMV, test first non-prod), we pushed an image with the issue we encountered fixed as per above here: https://hub.docker.com/repository/docker/ormandj/ceph/tags?page=1 . We 'upgraded' to this when we encountered the mgr hanging on us after updating ceph to v16 and experiencing this issue, using: "ceph orch upgrade start --image docker.io/ormandj/ceph:v16.2.3-mgrfix". I've not tried to bootstrap a new cluster with a custom image, and I don't know when 16.2.4 will be released with this change (hopefully) integrated, as remoto accepted the patch upstream.
>> >>>>
>> >>>> I'm not sure if this is your exact issue; see the bug reports and check whether you see the lock and the behaviour matches. If so, it may help you out. The only change in that image is that patch to remoto being overlaid on the default 16.2.3 image.
>> >>>>
>> >>>> On Fri, May 28, 2021 at 1:15 PM Marco Pizzolo <marcopizz...@gmail.com> wrote:
>> >>>> >
>> >>>> > Peter,
>> >>>> >
>> >>>> > We're seeing the same issues as you are. We have 2 new hosts with Intel(R) Xeon(R) Gold 6248R CPUs @ 3.00GHz w/ 48 cores, 384GB RAM, and 60x 10TB SED drives, and we have tried both 15.2.13 and 16.2.4.
>> >>>> >
>> >>>> > Cephadm does NOT properly deploy and activate OSDs on Ubuntu 20.04.2 with Docker.
>> >>>> >
>> >>>> > It seems to be a bug in cephadm and a product regression, as we have 4 near-identical nodes on CentOS running Nautilus (240 x 10TB SED drives) and had no problems.
>> >>>> >
>> >>>> > FWIW we had no luck yet with one-by-one OSD daemon additions through ceph orch either. We also reproduced the issue easily in a virtual lab using small virtual disks on a single ceph VM with 1 mon.
>> >>>> >
>> >>>> > We are now looking into whether we can get past this with a manual buildout.
>> >>>> >
>> >>>> > If you, or anyone, has hit the same stumbling block and gotten past it, I would really appreciate some guidance.
>> >>>> >
>> >>>> > Thanks,
>> >>>> > Marco
>> >>>> >
>> >>>> > On Thu, May 27, 2021 at 2:23 PM Peter Childs <pchi...@bcs.org> wrote:
>> >>>> >
>> >>>> > > In the end it looks like I might be able to get the node up to about 30 osds before it stops creating any more.
>> >>>> > >
>> >>>> > > Or rather, it formats the disks but freezes up starting the daemons.
>> >>>> > >
>> >>>> > > I suspect I'm missing something I can tune to get it working better.
>> >>>> > >
>> >>>> > > If I could see any error messages that might help, but I'm yet to spot anything.
>> >>>> > >
>> >>>> > > Peter.
>> >>>> > >
>> >>>> > > On Wed, 26 May 2021, 10:57 Eugen Block, <ebl...@nde.ag> wrote:
>> >>>> > >
>> >>>> > > > > If I add the osd daemons one at a time with
>> >>>> > > > >
>> >>>> > > > > ceph orch daemon add osd drywood12:/dev/sda
>> >>>> > > > >
>> >>>> > > > > It does actually work,
>> >>>> > > >
>> >>>> > > > Great!
>> >>>> > > >
>> >>>> > > > > I suspect what's happening is that when my rule for creating osds runs and creates them all at once, it overloads cephadm and it can't cope.
>> >>>> > > >
>> >>>> > > > It's possible, I guess.
>> >>>> > > >
>> >>>> > > > > I suspect what I might need to do, at least to work around the issue, is set "limit:" and bring it up until it stops working.
>> >>>> > > >
>> >>>> > > > It's worth a try, yes, although the docs state you should try to avoid it. It's possible that it doesn't work properly; in that case create a bug report. ;-)
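(For reference, a sketch of what such a spec might look like — Peter's spec from further down the thread with a limit filter added. This is untested, and the limit value of 10 is an arbitrary starting point for bisecting, not a recommendation:

service_type: osd
service_name: osd.drywood-disks
placement:
  host_pattern: 'drywood*'
spec:
  data_devices:
    size: "7TB:"
    limit: 10        # assumption: cap at 10 devices per host, raise until it breaks
  objectstore: bluestore

As Eugen says, the docs advise against leaning on limit, so it is best treated as a bisection tool rather than a permanent setting.)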
>> >>>> > > > > I did work out how to get ceph-volume to nearly work manually.
>> >>>> > > > >
>> >>>> > > > > cephadm shell
>> >>>> > > > > ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring
>> >>>> > > > > ceph-volume lvm create --data /dev/sda --dmcrypt
>> >>>> > > > >
>> >>>> > > > > but given I've now got "add osd" to work, I suspect I just need to fine-tune my osd creation rules so they do not try to create too many osds on the same node at the same time.
>> >>>> > > >
>> >>>> > > > I agree, no need to do it manually if there is an automated way, especially if you're trying to bring up dozens of OSDs.
>> >>>> > > >
>> >>>> > > > Zitat von Peter Childs <pchi...@bcs.org>:
>> >>>> > > >
>> >>>> > > > > After a bit of messing around, I managed to get it somewhat working.
>> >>>> > > > >
>> >>>> > > > > If I add the osd daemons one at a time with
>> >>>> > > > >
>> >>>> > > > > ceph orch daemon add osd drywood12:/dev/sda
>> >>>> > > > >
>> >>>> > > > > It does actually work,
>> >>>> > > > >
>> >>>> > > > > I suspect what's happening is that when my rule for creating osds runs and creates them all at once, it overloads cephadm and it can't cope.
>> >>>> > > > >
>> >>>> > > > > service_type: osd
>> >>>> > > > > service_name: osd.drywood-disks
>> >>>> > > > > placement:
>> >>>> > > > >   host_pattern: 'drywood*'
>> >>>> > > > > spec:
>> >>>> > > > >   data_devices:
>> >>>> > > > >     size: "7TB:"
>> >>>> > > > >   objectstore: bluestore
>> >>>> > > > >
>> >>>> > > > > I suspect what I might need to do, at least to work around the issue, is set "limit:" and bring it up until it stops working.
>> >>>> > > > >
>> >>>> > > > > I did work out how to get ceph-volume to nearly work manually.
>> >>>> > > > >
>> >>>> > > > > cephadm shell
>> >>>> > > > > ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring
>> >>>> > > > > ceph-volume lvm create --data /dev/sda --dmcrypt
>> >>>> > > > >
>> >>>> > > > > but given I've now got "add osd" to work, I suspect I just need to fine-tune my osd creation rules so they do not try to create too many osds on the same node at the same time.
>> >>>> > > > >
>> >>>> > > > > On Wed, 26 May 2021 at 08:25, Eugen Block <ebl...@nde.ag> wrote:
>> >>>> > > > >
>> >>>> > > > >> Hi,
>> >>>> > > > >>
>> >>>> > > > >> I believe your current issue is due to a missing keyring for client.bootstrap-osd on the OSD node. But even after fixing that you probably still won't be able to deploy an OSD manually with ceph-volume, because 'ceph-volume activate' is not supported with cephadm [1].
>> >>>> > > > >> I just tried that in a virtual environment; it fails when activating the systemd unit:
>> >>>> > > > >>
>> >>>> > > > >> ---snip---
>> >>>> > > > >> [2021-05-26 06:47:16,677][ceph_volume.process][INFO ] Running command: /usr/bin/systemctl enable ceph-volume@lvm-8-1a8fc8ae-8f4c-4f91-b044-d5636bb52456
>> >>>> > > > >> [2021-05-26 06:47:16,692][ceph_volume.process][INFO ] stderr Failed to connect to bus: No such file or directory
>> >>>> > > > >> [2021-05-26 06:47:16,693][ceph_volume.devices.lvm.create][ERROR ] lvm activate was unable to complete, while creating the OSD
>> >>>> > > > >> Traceback (most recent call last):
>> >>>> > > > >>   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/create.py", line 32, in create
>> >>>> > > > >>     Activate([]).activate(args)
>> >>>> > > > >>   File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
>> >>>> > > > >>     return func(*a, **kw)
>> >>>> > > > >>   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/activate.py", line 294, in activate
>> >>>> > > > >>     activate_bluestore(lvs, args.no_systemd)
>> >>>> > > > >>   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/activate.py", line 214, in activate_bluestore
>> >>>> > > > >>     systemctl.enable_volume(osd_id, osd_fsid, 'lvm')
>> >>>> > > > >>   File "/usr/lib/python3.6/site-packages/ceph_volume/systemd/systemctl.py", line 82, in enable_volume
>> >>>> > > > >>     return enable(volume_unit % (device_type, id_, fsid))
>> >>>> > > > >>   File "/usr/lib/python3.6/site-packages/ceph_volume/systemd/systemctl.py", line 22, in enable
>> >>>> > > > >>     process.run(['systemctl', 'enable', unit])
>> >>>> > > > >>   File "/usr/lib/python3.6/site-packages/ceph_volume/process.py", line 153, in run
>> >>>> > > > >>     raise RuntimeError(msg)
>> >>>> > > > >> RuntimeError: command returned non-zero exit status: 1
>> >>>> > > > >> [2021-05-26 06:47:16,694][ceph_volume.devices.lvm.create][INFO ] will rollback OSD ID creation
>> >>>> > > > >> [2021-05-26 06:47:16,697][ceph_volume.process][INFO ] Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.8 --yes-i-really-mean-it
>> >>>> > > > >> [2021-05-26 06:47:17,597][ceph_volume.process][INFO ] stderr purged osd.8
>> >>>> > > > >> ---snip---
>> >>>> > > > >>
>> >>>> > > > >> There's a workaround described in [2], but that's not really an option for dozens of OSDs. I think your best approach is to get cephadm to activate the OSDs for you.
>> >>>> > > > >>
>> >>>> > > > >> You wrote you didn't find any helpful error messages, but did cephadm even try to deploy OSDs? What does your osd spec file look like? Did you explicitly run 'ceph orch apply osd -i specfile.yml'?
>> >>>> > > > >> This should trigger cephadm and you should see at least some output like this:
>> >>>> > > > >>
>> >>>> > > > >> Mai 26 08:21:48 pacific1 conmon[31446]: 2021-05-26T06:21:48.466+0000 7effc15ff700 0 log_channel(cephadm) log [INF] : Applying service osd.ssd-hdd-mix on host pacific2...
>> >>>> > > > >> Mai 26 08:21:49 pacific1 conmon[31009]: cephadm 2021-05-26T06:21:48.469611+0000 mgr.pacific1.whndiw (mgr.14166) 1646 : cephadm [INF] Applying service osd.ssd-hdd-mix on host pacific2...
>> >>>> > > > >>
>> >>>> > > > >> Regards,
>> >>>> > > > >> Eugen
>> >>>> > > > >>
>> >>>> > > > >> [1] https://tracker.ceph.com/issues/49159
>> >>>> > > > >> [2] https://tracker.ceph.com/issues/46691
>> >>>> > > > >>
>> >>>> > > > >> Zitat von Peter Childs <pchi...@bcs.org>:
>> >>>> > > > >>
>> >>>> > > > >> > Not sure what I'm doing wrong; I suspect it's the way I'm running ceph-volume.
>> >>>> > > > >> >
>> >>>> > > > >> > root@drywood12:~# cephadm ceph-volume lvm create --data /dev/sda --dmcrypt
>> >>>> > > > >> > Inferring fsid 1518c8e0-bbe4-11eb-9772-001e67dc85ea
>> >>>> > > > >> > Using recent ceph image ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949
>> >>>> > > > >> > /usr/bin/docker: Running command: /usr/bin/ceph-authtool --gen-print-key
>> >>>> > > > >> > /usr/bin/docker: Running command: /usr/bin/ceph-authtool --gen-print-key
>> >>>> > > > >> > /usr/bin/docker: --> RuntimeError: No valid ceph configuration file was loaded.
>> >>>> > > > >> > Traceback (most recent call last):
>> >>>> > > > >> >   File "/usr/sbin/cephadm", line 8029, in <module>
>> >>>> > > > >> >     main()
>> >>>> > > > >> >   File "/usr/sbin/cephadm", line 8017, in main
>> >>>> > > > >> >     r = ctx.func(ctx)
>> >>>> > > > >> >   File "/usr/sbin/cephadm", line 1678, in _infer_fsid
>> >>>> > > > >> >     return func(ctx)
>> >>>> > > > >> >   File "/usr/sbin/cephadm", line 1738, in _infer_image
>> >>>> > > > >> >     return func(ctx)
>> >>>> > > > >> >   File "/usr/sbin/cephadm", line 4514, in command_ceph_volume
>> >>>> > > > >> >     out, err, code = call_throws(ctx, c.run_cmd(), verbosity=verbosity)
>> >>>> > > > >> >   File "/usr/sbin/cephadm", line 1464, in call_throws
>> >>>> > > > >> >     raise RuntimeError('Failed command: %s' % ' '.join(command))
>> >>>> > > > >> > RuntimeError: Failed command: /usr/bin/docker run --rm --ipc=host --net=host --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk --init -e CONTAINER_IMAGE=ceph/ceph@sha256:54e95ae1e11404157d7b329d0t
>> >>>> > > > >> >
>> >>>> > > > >> > root@drywood12:~# cephadm shell
>> >>>> > > > >> > Inferring fsid 1518c8e0-bbe4-11eb-9772-001e67dc85ea
>> >>>> > > > >> > Inferring config /var/lib/ceph/1518c8e0-bbe4-11eb-9772-001e67dc85ea/mon.drywood12/config
>> >>>> > > > >> > Using recent ceph image ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949
>> >>>> > > > >> > root@drywood12:/# ceph-volume lvm create --data /dev/sda --dmcrypt
>> >>>> > > > >> > Running command: /usr/bin/ceph-authtool --gen-print-key
>> >>>> > > > >> > Running command: /usr/bin/ceph-authtool --gen-print-key
>> >>>> > > > >> > Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 70054a5c-c176-463a-a0ac-b44c5db0987c
>> >>>> > > > >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
>> >>>> > > > >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 AuthRegistry(0x7fdef405b378) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
>> >>>> > > > >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
>> >>>> > > > >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 AuthRegistry(0x7fdef405ef20) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
>> >>>> > > > >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
>> >>>> > > > >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 AuthRegistry(0x7fdef8f0bea0) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
>> >>>> > > > >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef2d9d700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
>> >>>> > > > >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef259c700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
>> >>>> > > > >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef1d9b700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
>> >>>> > > > >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication
>> >>>> > > > >> >  stderr: [errno 13] RADOS permission denied (error connecting to the cluster)
>> >>>> > > > >> > --> RuntimeError: Unable to create a new OSD id
>> >>>> > > > >> > root@drywood12:/# lsblk /dev/sda
>> >>>> > > > >> > NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
>> >>>> > > > >> > sda    8:0   0 7.3T  0 disk
>> >>>> > > > >> >
>> >>>> > > > >> > As far as I can see, cephadm gets a little further than this, as the disks have lvm volumes on them; just the osd daemons are not created or started. So maybe I'm invoking ceph-volume incorrectly.
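(The "unable to find a keyring" failures above are the missing client.bootstrap-osd keyring Eugen diagnoses further up in this digest. Peter's own later workaround, also quoted above, exports it inside the cephadm shell before calling ceph-volume:

# ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring

after which ceph-volume can authenticate and "osd new" succeeds. The earlier "No valid ceph configuration file was loaded" error looks like the same class of problem one level up: the direct "cephadm ceph-volume" invocation started a container without the cluster config available inside it, whereas "cephadm shell" infers and mounts the config, as its output shows.)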
>> >>>> > > > >> >
>> >>>> > > > >> > On Tue, 25 May 2021 at 06:57, Peter Childs <pchi...@bcs.org> wrote:
>> >>>> > > > >> >
>> >>>> > > > >> >> On Mon, 24 May 2021, 21:08 Marc, <m...@f1-outsourcing.eu> wrote:
>> >>>> > > > >> >>
>> >>>> > > > >> >>> > I'm attempting to use cephadm and Pacific, currently on debian buster, mostly because centos7 ain't supported any more and centos8 ain't supported by some of my hardware.
>> >>>> > > > >> >>>
>> >>>> > > > >> >>> Who says centos7 is not supported any more? Afaik centos7/el7 is being supported till its EOL 2024. By then maybe a good alternative for el8/stream has surfaced.
>> >>>> > > > >> >>
>> >>>> > > > >> >> Not supported by ceph Pacific; it's our OS of choice otherwise.
>> >>>> > > > >> >>
>> >>>> > > > >> >> My testing says the versions of podman, docker and python3 available there do not work with Pacific.
>> >>>> > > > >> >>
>> >>>> > > > >> >> Given I've needed to upgrade docker on buster, can we please have a list of versions that work with cephadm, and maybe even have cephadm say no, please upgrade, unless you're running the right version or better.
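(A sketch of the kind of up-front check being asked for here. The binaries probed are real, but the minimum versions below are invented for illustration and are not cephadm's actual requirements:

import re
import shutil
import subprocess

# Illustrative minimums only; NOT cephadm's real requirements.
MIN_VERSIONS = {"docker": (18, 9), "podman": (2, 0)}

def engine_version(binary):
    # Return (major, minor) parsed from `<binary> --version`, or None
    # if the binary is not installed.
    if not shutil.which(binary):
        return None
    out = subprocess.run([binary, "--version"],
                         capture_output=True, text=True).stdout
    m = re.search(r"(\d+)\.(\d+)", out)
    return (int(m.group(1)), int(m.group(2))) if m else None

for binary, minimum in MIN_VERSIONS.items():
    found = engine_version(binary)
    if found and found < minimum:
        raise SystemExit(f"{binary} {found[0]}.{found[1]} is older than the "
                         f"supported minimum {minimum[0]}.{minimum[1]}; "
                         "please upgrade")

Both engines print versions in a parseable form, e.g. "Docker version 20.10.5, build ..." or "podman version 3.0.1", so the first two numbers are enough for a coarse gate.)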
>> >>>> > > > >> >>
>> >>>> > > > >> >>> > Anyway I have a few nodes with 59x 7.2TB disks, but for some reason the osd daemons don't start: the disks get formatted and the osds are created, but the daemons never come up.
>> >>>> > > > >> >>>
>> >>>> > > > >> >>> What if you try with
>> >>>> > > > >> >>> ceph-volume lvm create --data /dev/sdi --dmcrypt ?
>> >>>> > > > >> >>
>> >>>> > > > >> >> I'll have a go.
>> >>>> > > > >> >>
>> >>>> > > > >> >>> > They are probably the wrong spec for ceph (48 GB of memory and only 4 cores)
>> >>>> > > > >> >>>
>> >>>> > > > >> >>> You can always start with just configuring a few disks per node. That should always work.
>> >>>> > > > >> >>
>> >>>> > > > >> >> That was my thought too.
>> >>>> > > > >> >>
>> >>>> > > > >> >> Thanks
>> >>>> > > > >> >>
>> >>>> > > > >> >> Peter
>> >>>> > > > >> >>
>> >>>> > > > >> >>> > but I was expecting them to start and be either dirt slow or crash later; anyway, I've got up to 30 of them, so I was hoping to get at least 6PB of raw storage out of them.
>> >>>> > > > >> >>> >
>> >>>> > > > >> >>> > As yet I've not spotted any helpful error messages.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io