On 30.08.21 at 17:39, Alcatraz wrote:

Thanks for responding! And of course.

1. ceph orch ls --service-type osd --format yaml


service_type: osd
service_id: all-available-devices
service_name: osd.all-available-devices
placement:
  host_pattern: '*'
unmanaged: true
spec:
  data_devices:
    all: true
  filter_logic: AND
  objectstore: bluestore
status:
  created: '2021-08-30T13:57:51.000178Z'
  last_refresh: '2021-08-30T15:24:10.534710Z'
  running: 0
  size: 6
events:
- 2021-08-30T03:48:01.652108Z service:osd.all-available-devices [INFO] "service was
- "2021-08-30T03:49:00.267808Z service:osd.all-available-devices [ERROR] \"Failed\
  \ to apply: cephadm exited with an error code: 1, stderr:Non-zero exit code 1 from\
  \ /usr/bin/docker container inspect --format {{.State.Status}} ceph-d1405594-0944-11ec-8ebc-f23c92edc936-osd.0\n\
  /usr/bin/docker: stdout \n/usr/bin/docker: stderr Error: No such container: ceph-d1405594-0944-11ec-8ebc-f23c92edc936-osd.0\n\
  Deploy daemon osd.0 ...\nTraceback (most recent call last):\n File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\
  , line 8230, in <module>\n    main()\n  File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\
  , line 8218, in main\n    r = ctx.func(ctx)\n  File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\
  , line 1759, in _default_image\n    return func(ctx)\n  File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\
  , line 4326, in command_deploy\n    ports=daemon_ports)\n  File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\
  , line 2632, in deploy_daemon\n    c, osd_fsid=osd_fsid, ports=ports)\n  File \"\
  /var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\
  , line 2801, in deploy_daemon_units\n    install_sysctl(ctx, fsid, daemon_type)\n\
  \  File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\
  , line 2963, in install_sysctl\n    _write(conf, lines)\n  File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\
  , line 2948, in _write\n    with open(conf, 'w') as f:\nFileNotFoundError: [Errno\
  \ 2] No such file or directory: '/usr/lib/sysctl.d/90-ceph-d1405594-0944-11ec-8ebc-f23c92edc936-osd.conf'\""

- '2021-08-30T03:49:08.356762Z service:osd.all-available-devices [ERROR] "Failed to
  apply: auth get failed: failed to find osd.0 in keyring retval: -2"'
- '2021-08-30T03:52:34.100977Z service:osd.all-available-devices [ERROR] "Failed to
  apply: auth get failed: failed to find osd.3 in keyring retval: -2"'
- '2021-08-30T03:52:42.260439Z service:osd.all-available-devices [ERROR] "Failed to
  apply: auth get failed: failed to find osd.6 in keyring retval: -2"'
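
The first [ERROR] event above ends in a `FileNotFoundError` raised by `open(conf, 'w')`. In Python, opening a path for writing fails this way when a directory in the path does not exist, so a plausible reading is that `/usr/lib/sysctl.d` is missing on the host. That is an inference from the traceback, not something confirmed in this thread. A minimal sketch that reproduces the behaviour in a temporary directory; `write_conf` and the file name are hypothetical and are not cephadm code:

```python
import os
import tempfile


def write_conf(conf_path, lines, create_parent=False):
    """Write lines to conf_path, optionally creating the parent directory first."""
    if create_parent:
        os.makedirs(os.path.dirname(conf_path), exist_ok=True)
    with open(conf_path, "w") as f:
        f.write("\n".join(lines) + "\n")


# Temporary stand-in for a host where the sysctl.d directory is missing.
root = tempfile.mkdtemp()
conf = os.path.join(root, "sysctl.d", "90-ceph-example-osd.conf")

try:
    write_conf(conf, ["# placeholder setting"])
    failed = False
except FileNotFoundError:
    failed = True  # same exception type as in the traceback

# Creating the parent directory first lets the same write succeed.
write_conf(conf, ["# placeholder setting"], create_parent=True)
```

If the directory really is absent on the hosts, creating it (for example with `mkdir -p /usr/lib/sysctl.d`) before redeploying the OSDs would be one thing to try.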

Will be fixed by

2. ceph orch ps --daemon-type osd --format yaml

Output: ...snip...

3. ceph auth add osd.0 osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/osd.0/keyring

I verified that the file /var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/osd.0/keyring does exist.


Error EINVAL: caps cannot be specified both in keyring and in command

You only need to create the keyring; you don't need to store it anywhere. I'd still suggest creating the keyring somehow, but I haven't seen this particular error before.
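
The EINVAL message reads as if the keyring file already carries its own `caps` lines, in which case the caps would have to be dropped from the command line (or from the file). A minimal sketch of that check, assuming the usual INI-like keyring layout; `keyring_has_caps`, `auth_add_argv`, and the sample keyring contents are hypothetical illustrations, and the commands are only built here, not run:

```python
def keyring_has_caps(keyring_text, entity):
    """Return True if the keyring section for `entity` contains any 'caps' line."""
    in_section = False
    for raw in keyring_text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            in_section = (line[1:-1].strip() == entity)
        elif in_section and line.startswith("caps "):
            return True
    return False


def auth_add_argv(entity, keyring_path, keyring_text, caps):
    """Build a `ceph auth add` argument list that passes caps in only one place."""
    argv = ["ceph", "auth", "add", entity]
    if not keyring_has_caps(keyring_text, entity):
        # Keyring has no caps, so they go on the command line.
        for service, cap in caps.items():
            argv += [service, cap]
    return argv + ["-i", keyring_path]


# Illustrative keyring contents only; the real file was not posted.
with_caps = '[osd.0]\n\tkey = <elided>\n\tcaps mon = "allow rwx"\n\tcaps osd = "allow *"\n'
without_caps = "[osd.0]\n\tkey = <elided>\n"
wanted = {"osd": "allow *", "mon": "allow rwx"}

argv_a = auth_add_argv("osd.0", "keyring", with_caps, wanted)
argv_b = auth_add_argv("osd.0", "keyring", without_caps, wanted)
```

With caps present in the file, `argv_a` omits them from the command; with a key-only file, `argv_b` matches the form used in step 3.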




On 8/30/21 10:28, Sebastian Wagner wrote:
Could you run

On 30.08.21 at 14:49, Alcatraz wrote:
Hello all,

Running into some issues trying to build a virtual PoC for Ceph. Went to my cloud provider of choice and spun up some nodes. I have three identical hosts consisting of:

Debian 10
8 CPU cores
1x315GB Boot Drive
3x400GB Data drives

After deploying Ceph (v 16.2.5) using cephadm, adding hosts, and logging into the dashboard, Ceph showed 9 OSDs, 0 up, 9 in. I thought perhaps it just needed some time to bring up the OSDs, so I left it running overnight.

This morning, I checked, and the Ceph dashboard shows 9 OSDs, 0 up, 6 in, 3 out. I find this odd, as it hasn't been touched since it was deployed. Ceph health shows "HEALTH_OK", and `ceph osd tree` outputs:

-1              0  root default
 0              0  osd.0           down         0  1.00000
 1              0  osd.1           down         0  1.00000
 2              0  osd.2           down         0  1.00000
 3              0  osd.3           down   1.00000  1.00000
 4              0  osd.4           down   1.00000  1.00000
 5              0  osd.5           down   1.00000  1.00000
 6              0  osd.6           down   1.00000  1.00000
 7              0  osd.7           down   1.00000  1.00000
 8              0  osd.8           down   1.00000  1.00000

and if I run `ls /var/run/ceph` the only thing it outputs is "d1405594-0944-11ec-8ebc-f23c92edc936" (sans quotes), which I assume is the cluster ID? So of course, if I run `ceph daemon osd.8 help` for example, it just returns:

Can't get admin socket path: unable to get conf option admin_socket for osd: b"error parsing 'osd': expected string of the form TYPE.ID, valid types are: auth, mon, osd, mds, mgr, client\n"
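
The `ceph osd tree` rows above can be tallied to cross-check the dashboard's "0 up, 6 in, 3 out". A minimal sketch, assuming the columns are ID, WEIGHT, NAME, STATUS, REWEIGHT, PRI-AFF (the header row is not in the paste) and that a REWEIGHT of 0 means the OSD is marked out; `tally` is a hypothetical helper:

```python
from collections import Counter

# Rows copied from the `ceph osd tree` paste above (root line omitted).
ROWS = """\
 0              0  osd.0           down         0  1.00000
 1              0  osd.1           down         0  1.00000
 2              0  osd.2           down         0  1.00000
 3              0  osd.3           down   1.00000  1.00000
 4              0  osd.4           down   1.00000  1.00000
 5              0  osd.5           down   1.00000  1.00000
 6              0  osd.6           down   1.00000  1.00000
 7              0  osd.7           down   1.00000  1.00000
 8              0  osd.8           down   1.00000  1.00000
"""


def tally(rows):
    """Count OSDs by status (up/down) and by in/out, based on REWEIGHT."""
    counts = Counter()
    for line in rows.splitlines():
        fields = line.split()
        if len(fields) != 6 or not fields[2].startswith("osd."):
            continue  # skip anything that is not an OSD row
        status, reweight = fields[3], float(fields[4])
        counts[status] += 1
        counts["in" if reweight > 0 else "out"] += 1
    return counts


counts = tally(ROWS)
print(counts["up"], counts["down"], counts["in"], counts["out"])  # prints: 0 9 6 3
```

Under those assumptions the listing agrees with the dashboard: osd.0 to osd.2 are the three marked out, and none of the nine is up.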

If I look at the log within the Ceph dashboard, no errors or warnings appear. Will Ceph not work on virtual hardware? Is there something I need to do to bring up the OSDs?

Just as I was about to send this email, I went to check the logs, and they show the following (traceback omitted for length):

8/30/21 7:44:15 AM[ERR]Failed to apply osd.all-available-devices spec DriveGroupSpec(name=all-available-devices->placement=PlacementSpec(host_pattern='*'), service_id='all-available-devices', service_type='osd', data_devices=DeviceSelection(all=True), osd_id_claims={}, unmanaged=False, filter_logic='AND', preview_only=False): auth get failed: failed to find osd.6 in keyring retval: -2

8/30/21 7:45:19 AM[ERR]executing create_from_spec_one(([('ceph01', <ceph.deployment.drive_selection.selector.DriveSelection object at 0x7f63a930bf98>), ('ceph02', <ceph.deployment.drive_selection.selector.DriveSelection object at 0x7f63a81ac8d0>), ('ceph03', <ceph.deployment.drive_selection.selector.DriveSelection object at 0x7f63a930b0b8>)],)) failed.

and similar for the other OSDs. I'm not sure why it's complaining about auth, because in order to even add the hosts to the cluster I had to copy the ceph public key to the hosts to begin with.

