[ceph-users] Re: Replacing OSD with DB on shared NVMe

2022-05-25 Thread Edward R Huyer
That did it, thanks!

It seems like something that should be better documented and/or handled 
automatically when replacing drives.

And yeah, I know I don’t have to reapply my OSD spec, but doing so can be 
faster than waiting for the cluster to get around to it.

Thanks again.

From: David Orman 
Sent: Wednesday, May 25, 2022 5:03 PM
To: Edward R Huyer 
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Replacing OSD with DB on shared NVMe

In your example, you can log in to the server in question with the OSD, and run 
"ceph-volume lvm zap --osd-id <id> --destroy" and it will purge the DB/WAL 
LV. You don't need to reapply your OSD spec; the orchestrator will detect the available space 
on the NVMe and redeploy that OSD.

On Wed, May 25, 2022 at 3:37 PM Edward R Huyer <erh...@rit.edu> wrote:
Ok, I'm not sure if I'm missing something or if this is a gap in ceph orch 
functionality, or what:

On a given host all the OSDs share a single large NVMe drive for DB/WAL storage 
and were set up using a simple ceph orch spec file.  I'm replacing some of the 
OSDs.  After they've been removed with the dashboard equivalent of "ceph orch 
osd rm # --replace" and a new drive has been swapped in, how do I get the OSD 
recreated using the chunk of NVMe for DB/WAL storage?  Because the NVMe has 
data and is still in use by other OSDs, the orchestrator doesn't seem to 
recognize it as a valid storage location, so it won't create the OSDs when I do 
"ceph orch apply -i osdspec.yml".

Thoughts?

-
Edward Huyer
Golisano College of Computing and Information Sciences
Rochester Institute of Technology
Golisano 70-2373
152 Lomb Memorial Drive
Rochester, NY 14623
585-475-6651
erh...@rit.edu

Obligatory Legalese:
The information transmitted, including attachments, is intended only for the 
person(s) or entity to which it is addressed and may contain confidential 
and/or privileged material. Any review, retransmission, dissemination or other 
use of, or taking of any action in reliance upon this information by persons or 
entities other than the intended recipient is prohibited. If you received this 
in error, please contact the sender and destroy any copies of this information.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to 
ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Replacing OSD with DB on shared NVMe

2022-05-25 Thread David Orman
In your example, you can log in to the server in question with the OSD, and
run "ceph-volume lvm zap --osd-id <id> --destroy" and it will purge the
DB/WAL LV. You don't need to reapply your OSD spec; the orchestrator will detect the
available space on the NVMe and redeploy that OSD.
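
In other words, the full replacement flow might look roughly like this sketch
(<id> is a placeholder for the OSD id, and osdspec.yml is whatever spec file
was used originally):

# 1. mark the OSD for replacement (keeps its id), then swap the drive
ceph orch osd rm <id> --replace

# 2. on the host that carried the OSD, remove the leftover DB/WAL LV
#    from the shared NVMe
ceph-volume lvm zap --osd-id <id> --destroy

# 3. the orchestrator should now see the freed space and recreate the OSD;
#    reapplying the spec is optional, but can make it happen sooner
ceph orch apply -i osdspec.yml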

On Wed, May 25, 2022 at 3:37 PM Edward R Huyer  wrote:

> Ok, I'm not sure if I'm missing something or if this is a gap in ceph orch
> functionality, or what:
>
> On a given host all the OSDs share a single large NVMe drive for DB/WAL
> storage and were set up using a simple ceph orch spec file.  I'm replacing
> some of the OSDs.  After they've been removed with the dashboard equivalent
> of "ceph orch osd rm # --replace" and a new drive has been swapped in, how
> do I get the OSD recreated using the chunk of NVMe for DB/WAL storage?
> Because the NVMe has data and is still in use by other OSDs, the
> orchestrator doesn't seem to recognize it as a valid storage location, so
> it won't create the OSDs when I do "ceph orch apply -i osdspec.yml".
>
> Thoughts?
>
> -
> Edward Huyer
> Golisano College of Computing and Information Sciences
> Rochester Institute of Technology
> Golisano 70-2373
> 152 Lomb Memorial Drive
> Rochester, NY 14623
> 585-475-6651
> erh...@rit.edu
>
> Obligatory Legalese:
> The information transmitted, including attachments, is intended only for
> the person(s) or entity to which it is addressed and may contain
> confidential and/or privileged material. Any review, retransmission,
> dissemination or other use of, or taking of any action in reliance upon
> this information by persons or entities other than the intended recipient
> is prohibited. If you received this in error, please contact the sender and
> destroy any copies of this information.
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-05-25 Thread Sarunas Burdulis

On 25/05/2022 15.39, Tim Olow wrote:

Do you have any pools with only one replica?


All pools are 'replicated size' 2 or 3, 'min_size' 1 or 2.
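
(For what it's worth, a pool where min_size equals size, or any PG that is not
currently active+clean, could be enough to make the upgrade's ok-to-stop check
fail. A few commands that may help narrow down which PG is blocking, with a
placeholder OSD id:)

ceph osd ok-to-stop <osd-id>     # the same check the upgrade performs
ceph osd pool ls detail          # look for pools where min_size == size
ceph pg dump pgs_brief | grep -v 'active+clean'   # any non-clean PGs?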

--
Sarunas Burdulis
Dartmouth Mathematics
https://math.dartmouth.edu/~sarunas

· https://useplaintext.email ·
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Replacing OSD with DB on shared NVMe

2022-05-25 Thread Edward R Huyer
Ok, I'm not sure if I'm missing something or if this is a gap in ceph orch 
functionality, or what:

On a given host all the OSDs share a single large NVMe drive for DB/WAL storage 
and were set up using a simple ceph orch spec file.  I'm replacing some of the 
OSDs.  After they've been removed with the dashboard equivalent of "ceph orch 
osd rm # --replace" and a new drive has been swapped in, how do I get the OSD 
recreated using the chunk of NVMe for DB/WAL storage?  Because the NVMe has 
data and is still in use by other OSDs, the orchestrator doesn't seem to 
recognize it as a valid storage location, so it won't create the OSDs when I do 
"ceph orch apply -i osdspec.yml".

Thoughts?

-
Edward Huyer
Golisano College of Computing and Information Sciences
Rochester Institute of Technology
Golisano 70-2373
152 Lomb Memorial Drive
Rochester, NY 14623
585-475-6651
erh...@rit.edu

Obligatory Legalese:
The information transmitted, including attachments, is intended only for the 
person(s) or entity to which it is addressed and may contain confidential 
and/or privileged material. Any review, retransmission, dissemination or other 
use of, or taking of any action in reliance upon this information by persons or 
entities other than the intended recipient is prohibited. If you received this 
in error, please contact the sender and destroy any copies of this information.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Error deploying iscsi service through cephadm

2022-05-25 Thread Teoman Onay
Hello,

Silly question, but have you created the pool that will be used by the
gateway?
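
(If not, a minimal sketch of creating it and deploying the gateways through a
service spec would look something like the following; pool name, hostnames,
credentials and IPs are placeholders:)

ceph osd pool create iscsi-pool
rbd pool init iscsi-pool

cat > iscsi.yaml <<'EOF'
service_type: iscsi
service_id: iscsi
placement:
  hosts:
    - gw-host-1
    - gw-host-2
spec:
  pool: iscsi-pool
  api_user: admin
  api_password: admin
  trusted_ip_list: "192.168.0.10,192.168.0.11"
EOF

ceph orch apply -i iscsi.yaml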

On Wed, 25 May 2022, 21:25 Heiner Hardt,  wrote:

> Sorry to be insistent and ask again, but is there anyone facing issues on
> deploying iscsi-gateway daemons through CEPHADM?
>
> I'm still having issues trying to deploy iscsi-gateways, with no success
> because the service containers fail to load, ending up with the error
> "Error: No such object".
>
> Any help will be great. Thanks.
>
> On Thu, May 19, 2022 at 4:41 PM Heiner Hardt  wrote:
>
> > Hi,
> >
> > Is there anyone facing issues on deploying iscsi-gateway daemons through
> > CEPHADM? When I try to create a new iscsi service it finishes the deploy,
> > starts the container correctly, but destroys it right after starting it. The
> > daemon shows up with an error as it cannot be started.
> >
> > Below are the 2 containers deployed and destroyed in a matter of
> > seconds:
> >
> > 602dc3f8dce4   quay.io/ceph/ceph
> > "/usr/bin/rbd-target…"   Less than a second ago   Up Less than a second
> >
> > ceph-fec08570-b6d7-11ec-af55-01f54f89bfd2-iscsi-connector-iscsi-gw-gwpnnq
> > 28429d6f364a   quay.io/ceph/ceph
> > "/usr/bin/tcmu-runner"   Less than a second ago   Up Less than a second
> >
> >
> ceph-fec08570-b6d7-11ec-af55-01f54f89bfd2-iscsi-connector-iscsi-gw-gwpnnq-tcmu
> >
> > It seems the services are failing to come up for some reason, but I cannot
> > figure out why.
> >
> > 2022-05-19 19:25:36,543 7f739d861740 DEBUG /usr/bin/docker: [
> >
> quay.io/prometheus/node-exporter@sha256:f2269e73124dd0f60a7d19a2ce1264d33d08a985aed0ee6b0b89d0be470592cd
> > ]
> > 2022-05-19 19:25:36,679 7f739d861740 DEBUG /usr/bin/docker:
> node_exporter,
> > version 1.3.1 (branch: HEAD, revision:
> > a2321e7b940ddcff26873612bccdf7cd4c42b6b6)
> > 2022-05-19 19:25:36,679 7f739d861740 DEBUG /usr/bin/docker:   build user:
> >   root@243aafa5525c
> > 2022-05-19 19:25:36,680 7f739d861740 DEBUG /usr/bin/docker:   build date:
> >   20211205-11:09:49
> > 2022-05-19 19:25:36,680 7f739d861740 DEBUG /usr/bin/docker:   go version:
> >   go1.17.3
> > 2022-05-19 19:25:36,680 7f739d861740 DEBUG /usr/bin/docker:   platform:
> >   linux/amd64
> > 2022-05-19 19:25:36,696 7f739d861740 DEBUG systemctl: enabled
> > 2022-05-19 19:25:36,703 7f739d861740 DEBUG systemctl: failed
> > 2022-05-19 19:25:36,752 7f739d861740 DEBUG /usr/bin/docker:
> > 2022-05-19 19:25:36,752 7f739d861740 DEBUG /usr/bin/docker: Error: No
> such
> > object:
> > ceph-fec08570-b6d7-11ec-af55-01f54f89bfd2-iscsi-connector-iscsi-gw-gwpnnq
> > 2022-05-19 19:25:36,810 7f739d861740 DEBUG /usr/bin/docker:
> > 2022-05-19 19:25:36,810 7f739d861740 DEBUG /usr/bin/docker: Error: No
> such
> > object:
> > ceph-fec08570-b6d7-11ec-af55-01f54f89bfd2-iscsi.connector.iscsi-gw.gwpnnq
> > 2022-05-19 19:25:36,823 7f739d861740 DEBUG systemctl: enabled
> > 2022-05-19 19:25:36,833 7f739d861740 DEBUG systemctl: active
> >
> > This Ceph cluster runs on Ubuntu 20.04 and was upgraded to Quincy
> > recently (version 17.2.0).
> >
> > Any thoughts?
> >
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Repo Branch Rename - May 24

2022-05-25 Thread Neha Ojha
Great, thanks for all the hard work, David and team!

- Neha

On Wed, May 25, 2022 at 12:47 PM David Galloway  wrote:
>
> I was successfully able to get a 'main' build completed.
>
> This means you should be able to push your branches to ceph-ci.git and
> get a build now.
>
> Thank you for your patience.
>
> On 5/24/22 18:30, David Galloway wrote:
> > This maintenance is ongoing. This was a much larger effort than
> > anticipated.
> >
> > I've unpaused Jenkins but fully expect many jobs to fail for the next
> > couple days.
> >
> > If you had a PR targeting master, you will need to edit the PR to target
> > main now instead.
> >
> > I appreciate your patience.
> >
> > On 5/19/22 14:38, David Galloway wrote:
> >> Hi all,
> >>
> >> In an effort to use more inclusive language, we will be renaming all
> >> Ceph repo 'master' branches to 'main' on May 24.
> >>
> >> I anticipate making the change in the morning Eastern US time, merging
> >> all 's/master/main' pull requests I already have open, then tracking
> >> down and fixing any remaining references to the master branch.
> >>
> >> Please excuse the disruption and thank you for your patience.
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Repo Branch Rename - May 24

2022-05-25 Thread David Galloway

I was successfully able to get a 'main' build completed.

This means you should be able to push your branches to ceph-ci.git and 
get a build now.


Thank you for your patience.

On 5/24/22 18:30, David Galloway wrote:
This maintenance is ongoing. This was a much larger effort than 
anticipated.


I've unpaused Jenkins but fully expect many jobs to fail for the next 
couple days.


If you had a PR targeting master, you will need to edit the PR to target 
main now instead.
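
(For local clones, the usual sequence to follow a default-branch rename is
roughly:)

git branch -m master main
git fetch origin
git branch -u origin/main main
git remote set-head origin -a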


I appreciate your patience.

On 5/19/22 14:38, David Galloway wrote:

Hi all,

In an effort to use more inclusive language, we will be renaming all 
Ceph repo 'master' branches to 'main' on May 24.


I anticipate making the change in the morning Eastern US time, merging 
all 's/master/main' pull requests I already have open, then tracking 
down and fixing any remaining references to the master branch.


Please excuse the disruption and thank you for your patience.


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-05-25 Thread Tim Olow
Do you have any pools with only one replica?

Tim

On 5/25/22, 1:48 PM, "Sarunas Burdulis"  wrote:

> ceph health detail says my 5-node cluster is healthy, yet when I ran
> ceph orch upgrade start --ceph-version 16.2.7 everything seemed to go
> fine until we got to the OSD section. Now, for the past hour, every 15
> seconds a new log entry of 'Upgrade: unsafe to stop osd(s) at this time
> (1 PGs are or would become offline)' appears in the logs.

Hi,

Has there been any solution or workaround to this?

We have a seemingly healthy cluster, which is stuck on the OSD upgrade step 
when upgrading from 15.2.16 to 16.2.8 with the same error(s).

-- 
Sarunas Burdulis
Dartmouth Mathematics
math.dartmouth.edu/~sarunas

· https://useplaintext.email ·


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Error deploying iscsi service through cephadm

2022-05-25 Thread Heiner Hardt
Sorry to be insistent and ask again, but is there anyone facing issues on
deploying iscsi-gateway daemons through CEPHADM?

I'm still having issues trying to deploy iscsi-gateways, with no success
because the service containers fail to load, ending up with the error
"Error: No such object".

Any help will be great. Thanks.

On Thu, May 19, 2022 at 4:41 PM Heiner Hardt  wrote:

> Hi,
>
> Is there anyone facing issues on deploying iscsi-gateway daemons through
> CEPHADM? When I try to create a new iscsi service it finishes the deploy,
> starts the container correctly, but destroys it right after starting it. The
> daemon shows up with an error as it cannot be started.
>
> Below are the 2 containers deployed and destroyed in a matter of
> seconds:
>
> 602dc3f8dce4   quay.io/ceph/ceph
> "/usr/bin/rbd-target…"   Less than a second ago   Up Less than a second
>
> ceph-fec08570-b6d7-11ec-af55-01f54f89bfd2-iscsi-connector-iscsi-gw-gwpnnq
> 28429d6f364a   quay.io/ceph/ceph
> "/usr/bin/tcmu-runner"   Less than a second ago   Up Less than a second
>
> ceph-fec08570-b6d7-11ec-af55-01f54f89bfd2-iscsi-connector-iscsi-gw-gwpnnq-tcmu
>
> It seems the services are failing to come up for some reason, but I cannot
> figure out why.
>
> 2022-05-19 19:25:36,543 7f739d861740 DEBUG /usr/bin/docker: [
> quay.io/prometheus/node-exporter@sha256:f2269e73124dd0f60a7d19a2ce1264d33d08a985aed0ee6b0b89d0be470592cd
> ]
> 2022-05-19 19:25:36,679 7f739d861740 DEBUG /usr/bin/docker: node_exporter,
> version 1.3.1 (branch: HEAD, revision:
> a2321e7b940ddcff26873612bccdf7cd4c42b6b6)
> 2022-05-19 19:25:36,679 7f739d861740 DEBUG /usr/bin/docker:   build user:
>   root@243aafa5525c
> 2022-05-19 19:25:36,680 7f739d861740 DEBUG /usr/bin/docker:   build date:
>   20211205-11:09:49
> 2022-05-19 19:25:36,680 7f739d861740 DEBUG /usr/bin/docker:   go version:
>   go1.17.3
> 2022-05-19 19:25:36,680 7f739d861740 DEBUG /usr/bin/docker:   platform:
>   linux/amd64
> 2022-05-19 19:25:36,696 7f739d861740 DEBUG systemctl: enabled
> 2022-05-19 19:25:36,703 7f739d861740 DEBUG systemctl: failed
> 2022-05-19 19:25:36,752 7f739d861740 DEBUG /usr/bin/docker:
> 2022-05-19 19:25:36,752 7f739d861740 DEBUG /usr/bin/docker: Error: No such
> object:
> ceph-fec08570-b6d7-11ec-af55-01f54f89bfd2-iscsi-connector-iscsi-gw-gwpnnq
> 2022-05-19 19:25:36,810 7f739d861740 DEBUG /usr/bin/docker:
> 2022-05-19 19:25:36,810 7f739d861740 DEBUG /usr/bin/docker: Error: No such
> object:
> ceph-fec08570-b6d7-11ec-af55-01f54f89bfd2-iscsi.connector.iscsi-gw.gwpnnq
> 2022-05-19 19:25:36,823 7f739d861740 DEBUG systemctl: enabled
> 2022-05-19 19:25:36,833 7f739d861740 DEBUG systemctl: active
>
> This Ceph cluster runs on Ubuntu 20.04 and was upgraded to Quincy
> recently (version 17.2.0).
>
> Any thoughts?
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-05-25 Thread Sarunas Burdulis
ceph health detail says my 5-node cluster is healthy, yet when I ran
ceph orch upgrade start --ceph-version 16.2.7 everything seemed to go
fine until we got to the OSD section. Now, for the past hour, every 15
seconds a new log entry of 'Upgrade: unsafe to stop osd(s) at this time
(1 PGs are or would become offline)' appears in the logs.


Hi,

Has there been any solution or workaround to this?

We have a seemingly healthy cluster, which is stuck on the OSD upgrade step 
when upgrading from 15.2.16 to 16.2.8 with the same error(s).


--
Sarunas Burdulis
Dartmouth Mathematics
math.dartmouth.edu/~sarunas

· https://useplaintext.email ·
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] OSDs won't boot after host restart

2022-05-25 Thread Andrew Cowan
I have a small 4-host Ceph cluster.  Recently, after rebooting one of the
hosts, all of its daemons came back up smoothly EXCEPT the OSDs.

All of the OSDs have identical journal entries, as below.

'ceph-bluestore-tool fsck' fails with:

2022-05-25T17:27:02.208+ 7f45150a70c0 -1 bluefs _check_new_allocations
invalid extent 1: 0x1a72~1: wasn't given but allocated for ino 1
2022-05-25T17:27:02.208+ 7f45150a70c0 -1 bluefs mount failed to replay
log: (14) Bad address
2022-05-25T17:27:02.208+ 7f45150a70c0 -1
bluestore(/var/lib/ceph/d7511c3e-a570-11ec-9c50-3bd4b57e7e6e/osd.0)
_open_bluefs failed bluefs mount: (14) Bad address
fsck failed: (14) Bad address

Any help/thoughts would be very much appreciated.  OSD logs below.

Best,

Andy Piltser-Cowan

May 25 16:51:41 forth systemd[1]: Started Ceph osd.0 for
d7511c3e-a570-11ec-9c50-3bd4b57e7e6e.
May 25 16:51:44 forth bash[23554]: Running command: /usr/bin/chown -R
ceph:ceph /var/lib/ceph/osd/ceph-0
May 25 16:51:44 forth bash[23554]: Running command:
/usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev
/dev/ceph-979b08d5-747f-4103-a041-92ba46febdec/osd-block-42e57a20-3077-4208-a682-084b9dd1e634
--path /var/lib/ceph/osd/ce>
May 25 16:51:44 forth bash[23554]: Running command: /usr/bin/ln -snf
/dev/ceph-979b08d5-747f-4103-a041-92ba46febdec/osd-block-42e57a20-3077-4208-a682-084b9dd1e634
/var/lib/ceph/osd/ceph-0/block
May 25 16:51:44 forth bash[23554]: Running command: /usr/bin/chown -h
ceph:ceph /var/lib/ceph/osd/ceph-0/block
May 25 16:51:44 forth bash[23554]: Running command: /usr/bin/chown -R
ceph:ceph /dev/dm-3
May 25 16:51:44 forth bash[23554]: Running command: /usr/bin/chown -R
ceph:ceph /var/lib/ceph/osd/ceph-0
May 25 16:51:44 forth bash[23554]: --> ceph-volume lvm activate successful
for osd ID: 0
May 25 16:51:48 forth bash[24332]: debug 2022-05-25T16:51:48.527+
7fee82abf080  0 set uid:gid to 167:167 (ceph:ceph)
May 25 16:51:48 forth bash[24332]: debug 2022-05-25T16:51:48.527+
7fee82abf080  0 ceph version 16.2.7
(dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable), process
ceph-osd, pid 7
May 25 16:51:48 forth bash[24332]: debug 2022-05-25T16:51:48.527+
7fee82abf080  0 pidfile_write: ignore empty --pid-file
May 25 16:51:48 forth bash[24332]: debug 2022-05-25T16:51:48.531+
7fee82abf080  1 bdev(0x559b61752800 /var/lib/ceph/osd/ceph-0/block) open
path /var/lib/ceph/osd/ceph-0/block
May 25 16:51:48 forth bash[24332]: debug 2022-05-25T16:51:48.531+
7fee82abf080  1 bdev(0x559b61752800 /var/lib/ceph/osd/ceph-0/block) open
size 330461952 (0x2ba7fc0, 2.7 TiB) block_size 4096 (4 KiB)
rotational discard not sup>
May 25 16:51:48 forth bash[24332]: debug 2022-05-25T16:51:48.535+
7fee82abf080  1 bluestore(/var/lib/ceph/osd/ceph-0) _set_cache_sizes
cache_size 1073741824 meta 0.45 kv 0.45 data 0.06
May 25 16:51:48 forth bash[24332]: debug 2022-05-25T16:51:48.535+
7fee82abf080  1 bdev(0x559b61752c00 /var/lib/ceph/osd/ceph-0/block) open
path /var/lib/ceph/osd/ceph-0/block
May 25 16:51:48 forth bash[24332]: debug 2022-05-25T16:51:48.535+
7fee82abf080  1 bdev(0x559b61752c00 /var/lib/ceph/osd/ceph-0/block) open
size 330461952 (0x2ba7fc0, 2.7 TiB) block_size 4096 (4 KiB)
rotational discard not sup>
May 25 16:51:48 forth bash[24332]: debug 2022-05-25T16:51:48.535+
7fee82abf080  1 bluefs add_block_device bdev 1 path
/var/lib/ceph/osd/ceph-0/block size 2.7 TiB
May 25 16:51:48 forth bash[24332]: debug 2022-05-25T16:51:48.535+
7fee82abf080  1 bdev(0x559b61752c00 /var/lib/ceph/osd/ceph-0/block) close
May 25 16:51:48 forth bash[24332]: debug 2022-05-25T16:51:48.827+
7fee82abf080  1 bdev(0x559b61752800 /var/lib/ceph/osd/ceph-0/block) close
May 25 16:51:49 forth bash[24332]: debug 2022-05-25T16:51:49.111+
7fee82abf080  0 starting osd.0 osd_data /var/lib/ceph/osd/ceph-0
/var/lib/ceph/osd/ceph-0/journal
May 25 16:51:49 forth bash[24332]: debug 2022-05-25T16:51:49.115+
7fee82abf080 -1 Falling back to public interface
May 25 16:51:49 forth bash[24332]: debug 2022-05-25T16:51:49.147+
7fee82abf080  0 load: jerasure load: lrc load: isa
May 25 16:51:49 forth bash[24332]: debug 2022-05-25T16:51:49.147+
7fee82abf080  1 bdev(0x559b62422400 /var/lib/ceph/osd/ceph-0/block) open
path /var/lib/ceph/osd/ceph-0/block
May 25 16:51:49 forth bash[24332]: debug 2022-05-25T16:51:49.147+
7fee82abf080 -1 bdev(0x559b62422400 /var/lib/ceph/osd/ceph-0/block) open
open got: (13) Permission denied
May 25 16:51:49 forth bash[24332]: debug 2022-05-25T16:51:49.147+
7fee82abf080  1 bdev(0x559b62422400 /var/lib/ceph/osd/ceph-0/block) open
path /var/lib/ceph/osd/ceph-0/block
May 25 16:51:49 forth bash[24332]: debug 2022-05-25T16:51:49.147+
7fee82abf080 -1 bdev(0x559b62422400 /var/lib/ceph/osd/ceph-0/block) open
open got: (13) Permission denied
May 25 16:51:49 forth bash[24332]: debug 2022-05-25T16:51:49.147+
7fee82abf080  0 osd.0:0.OSDShard using op scheduler
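
(The repeated "open got: (13) Permission denied" lines above suggest the OSD
process cannot open the block device even after the chown done by ceph-volume
activate; on similar setups it can be worth double-checking the ownership of
the underlying device mapper node, roughly:)

# hypothetical checks on the affected host; device names are placeholders
ls -l /var/lib/ceph/osd/ceph-0/block   # symlink to the LV
ls -l /dev/dm-3                        # should be owned by ceph:ceph
# as a temporary test only (udev/ceph-volume normally handles this):
chown ceph:ceph /dev/dm-3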

[ceph-users] Ceph Leadership Team Meeting

2022-05-25 Thread Ernesto Puerta
Hi Cephers,

These are the topics discussed in today's meeting:


- *Change in the release process*
  - Patrick suggesting version bump PRs vs current commit push approach
  - Commits are not signed
  - Avoids freezing the branch during hotfixes
  - Both for hotfixes and regular dot releases
  - Needs further discussion
  - https://www.atlassian.com/git/tutorials/comparing-workflows/gitflow-workflow
    <-- more closely matches current model + proposed changes re: PRs, with the
    addition of a development branch
  - https://www.atlassian.com/continuous-delivery/continuous-integration/trunk-based-development

- *ceph-Jenkins account needs admin privs* to ceph.git in order to push
  directly to branches
  - doesn't apply to PR version bump

- *publishing (Windows) binaries signed by cloudbase?*
  - Issues with Linux/Ceph Foundation
  - RH might provide the signed binaries
  - Probably don't publish binaries signed by Cloudbase b/c we wouldn't
    get telemetry data back from crashes etc.

- *master-main rename*
  - rename completed
  - there are still issues with some Jenkins jobs

- *quincy blogs*

- *17.2.1 readiness*
  - 3 PRs left
  - release candidate by Jun 1st week

- *16.2.8 issue retrospective*:
  https://docs.google.com/presentation/d/1hbwo_GW48O4nnM78US2ghVXglxZw5TRwmWUOy_oxam8/
  - scale testing
  - stricter backport policy
  - lack of reviews in the backport
  - mgr component is orphaned (RADOS, lack of experience in other teams)
  - conflict solving was not properly documented
  - mgr has become too critical (due to cephadm it is now key)
  - don't merge with 1 approval
  - reviewers on sensitive PRs/files should acknowledge they understand
    the code changes
  - different standards across teams
  - try out requiring reviews from CODEOWNERS on the pacific branch


Kind Regards,
Ernesto
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rbd command hangs

2022-05-25 Thread Ilya Dryomov
On Wed, May 25, 2022 at 9:21 AM Sopena Ballesteros Manuel
 wrote:
>
> attached,
>
>
> nid001388:~ # ceph auth get client.noir
> 2022-05-25T09:20:00.731+0200 7f81f63f3700 -1 auth: unable to find a keyring 
> on 
> /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,:
>  (2) No such file or directory
> 2022-05-25T09:20:00.731+0200 7f81f63f3700 -1 AuthRegistry(0x7f81f005ec68) no 
> keyring found at 
> /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,,
>  disabling cephx
> 2022-05-25T09:20:00.731+0200 7f81f63f3700 -1 auth: unable to find a keyring 
> on 
> /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,:
>  (2) No such file or directory
> 2022-05-25T09:20:00.731+0200 7f81f63f3700 -1 AuthRegistry(0x7f81f63f2060) no 
> keyring found at 
> /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,,
>  disabling cephx
> 2022-05-25T09:20:00.731+0200 7f81f53f1700 -1 monclient(hunting): 
> handle_auth_bad_method server allowed_methods [2] but i only support [1]
> 2022-05-25T09:20:00.735+0200 7f81f63f3700 -1 monclient: authenticate NOTE: no 
> keyring found; disabled cephx authentication
> [errno 95] error connecting to the cluster

On Wed, May 25, 2022 at 9:29 AM Sopena Ballesteros Manuel
 wrote:
>
> also,
>
>
> nid001388:~ # ceph -n client.noir auth get client.noir
> Error EACCES: access denied

It looks like you have a general authentication issue, not related
to RBD.

Are you able to connect to the cluster from other nodes?  I would
suggest copying the ceph.conf and keyring files in /etc/ceph from there to
this node.  If this node needs to be restricted to just the client.noir key
(i.e. you don't want the client.admin key to be available there), that can be
sorted out later, after basic connectivity is in place.
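
(When that later step comes up, the documented way to give a client like
client.noir RBD access, including the blocklist permission, is via the rbd cap
profiles, roughly as below; the pool name is a placeholder:)

ceph auth caps client.noir \
    mon 'profile rbd' \
    osd 'profile rbd pool=<pool>' \
    mgr 'profile rbd pool=<pool>'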

Thanks,

Ilya
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: v17.2.0 Quincy released

2022-05-25 Thread Thomas Roth

Hello,

just found that this "feature" is not restricted to upgrades - I just tried to bootstrap an entirely new cluster with Quincy, also with the fatal 
switch to non-root-user: adding the second mon results in

> Unable to write lxmon1:/etc/ceph/ceph.conf: scp: /tmp/etc/ceph/ceph.conf.new:
> Permission denied



By now, I go to ceph.io every day to see if the motd has been changed to "If it 
compiles at all, release it as stable".

Cheers,
Thomas


On 5/4/22 14:57, Jozef Rebjak wrote:

Hello, if there is somebody who is using a non-root user within Pacific and would 
like to upgrade to Quincy, read this first:

https://blog.jozefrebjak.com/why-to-wait-with-upgrade-from-ceph-pacific-with-non-root-user-to-quincy

or message me with a solution. For me it’s just about waiting for v17.2.1.

Thanks



On 4 May 2022, at 11:16, Ilya Dryomov  wrote:

On Tue, May 3, 2022 at 9:31 PM Steve Taylor  wrote:


Just curious, is there any updated ETA on the 16.2.8 release? This
note implied that it was pretty close a couple of weeks ago, but the
release task seems to have several outstanding items before it's
wrapped up.

I'm just wondering if it's worth waiting a bit for new Pacific
deployments to try 16.2.8 or not. Thanks!


Hi Steve,

The last blocker PR just merged so it should be a matter of days now.

Thanks,

Ilya



Steve

On Wed, Apr 20, 2022 at 3:37 AM Ilya Dryomov  wrote:


On Wed, Apr 20, 2022 at 6:21 AM Harry G. Coin  wrote:


Great news!  Any notion when the many pending bug fixes will show up in
Pacific?  It's been a while.


Hi Harry,

The 16.2.8 release is planned within the next week or two.

Thanks,

Ilya
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm error mgr not available and ERROR: Failed to add host

2022-05-25 Thread Eugen Block

Hi,

first, you can bootstrap a cluster by providing the container image  
path in the bootstrap command like this:


cephadm --image **:5000/ceph/ceph bootstrap --mon-ip **

Check out the docs for an isolated environment [1]; I don't think it's
a good idea to change the runtime the way you did. The container paths
are configurable; for example, you can set it like this:


ceph config set global container_image :5000/my/ceph/image

And then your subject seems wrong: you write "mgr not available", but
the logs you pasted show this:



Waiting for mgr to start...
Waiting for mgr...
mgr not available, waiting (1/15)...
mgr not available, waiting (2/15)...
mgr not available, waiting (3/15)...
mgr not available, waiting (4/15)...
mgr is available
Enabling cephadm module...
Waiting for the mgr to restart...
Waiting for mgr epoch 5...
mgr epoch 5 is available


So the mgr seems to work; it's your bootstrap host that is not ready
to be managed by cephadm:



pp0101.fst/ceph/ceph:v16.2.7 orch host add opcpmfpsbpp0101 10.20.23.65
/usr/bin/ceph: stderr Error EINVAL: Failed to connect to opcpmfpsbpp0101
(10.20.23.65).
/usr/bin/ceph: stderr Please make sure that the host is reachable and
accepts connections using the cephadm SSH key


Is your host reachable and did you configure SSH access?
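
(A quick way to check is to test the connection with the key cephadm actually
uses, roughly along the lines of the cephadm troubleshooting docs; the
hostname is taken from the error above:)

ceph cephadm get-pub-key > ~/ceph.pub
ssh-copy-id -f -i ~/ceph.pub root@opcpmfpsbpp0101

ceph cephadm get-ssh-config > ~/cephadm_ssh_config
ceph config-key get mgr/cephadm/ssh_identity_key > ~/cephadm_private_key
chmod 0600 ~/cephadm_private_key
ssh -F ~/cephadm_ssh_config -i ~/cephadm_private_key root@opcpmfpsbpp0101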

[1]  
https://docs.ceph.com/en/latest/cephadm/install/#deployment-in-an-isolated-environment



Quoting farhad kh :


hi
I want to use a private registry for running the Ceph storage cluster, and I changed
the default registry of my container runtime (docker):
/etc/docker/deamon.json
{
  "registery-mirrors": ["https://private-registery.fst;]
}

and changed all the registry addresses in /usr/sbin/cephadm (quay.ceph.io and docker.io) to
my private registry: cat /usr/sbin/cephadm | grep private-registery.fst

DEFAULT_IMAGE = 'private-registery.fst/ceph/ceph:v16.2.7'
DEFAULT_PROMETHEUS_IMAGE = 'private-registery.fst/ceph/prometheus:v2.18.1'
DEFAULT_NODE_EXPORTER_IMAGE =
'private-registery.fst/ceph/node-exporter:v0.18.1'
DEFAULT_ALERT_MANAGER_IMAGE =
'private-registery.fst/ceph/alertmanager:v0.20.0'
DEFAULT_GRAFANA_IMAGE = 'private-registery.fst/ceph/ceph-grafana:6.7.4'
DEFAULT_HAPROXY_IMAGE = 'private-registery.fst/ceph/haproxy:2.3'
DEFAULT_KEEPALIVED_IMAGE = 'private-registery.fst/ceph/keepalived'
DEFAULT_REGISTRY = 'private-registery.fst'   # normalize unqualified
digests to this
>>> normalize_image_digest('ceph/ceph:v16', 'private-registery.fst')
>>> normalize_image_digest('private-registery.fst/ceph/ceph:v16',
'private-registery.fst')
'private-registery.fst/ceph/ceph:v16'
>>> normalize_image_digest('private-registery.fst/ceph',
'private-registery.fst')
>>> normalize_image_digest('localhost/ceph', 'private-registery.fst')

when I try to deploy the first node of the cluster with cephadm I get this error:

 cephadm bootstrap   --mon-ip 10.20.23.65 --allow-fqdn-hostname
--initial-dashboard-user admin   --initial-dashboard-password admin
--dashboard-password-noupdate
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit chronyd.service is enabled and running
Repeating the final host check...
docker (/bin/docker) is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
Cluster fsid: e52bee78-db8b-11ec-9099-00505695f8a8
Verifying IP 10.20.23.65 port 3300 ...
Verifying IP 10.20.23.65 port 6789 ...
Mon IP `10.20.23.65` is in CIDR network `10.20.23.0/24`
- internal network (--cluster-network) has not been provided, OSD
replication will default to the public_network
Pulling container image private-registery.fst/ceph/ceph:v16.2.7...
Ceph version: ceph version 16.2.7
(dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
Waiting for mon to start...
Waiting for mon...
mon is available
Assimilating anything we can from ceph.conf...
Generating new minimal ceph.conf...
Restarting the monitor...
Setting mon public_network to 10.20.23.0/24
Wrote config to /etc/ceph/ceph.conf
Wrote keyring to /etc/ceph/ceph.client.admin.keyring
Creating mgr...
Verifying port 9283 ...
Waiting for mgr to start...
Waiting for mgr...
mgr not available, waiting (1/15)...
mgr not available, waiting (2/15)...
mgr not available, waiting (3/15)...
mgr not available, waiting (4/15)...
mgr is available
Enabling cephadm module...
Waiting for the mgr to restart...
Waiting for mgr epoch 5...
mgr epoch 5 is available
Setting orchestrator backend to cephadm...
Generating ssh key...
Wrote public SSH key to /etc/ceph/ceph.pub
Adding key to root@localhost authorized_keys...
Adding host opcpmfpsbpp0101...
Non-zero exit code 22 from /bin/docker run --rm --ipc=host
--stop-signal=SIGTERM --net=host --entrypoint /usr/bin/ceph --init -e
CONTAINER_IMAGE=private-registery.fst/ceph/ceph:v16.2.7 -e
NODE_NAME=opcpmfpsbpp0101 -e  CEPH_USE_RANDOM_NONCE=1 -v

[ceph-users] Re: rbd command hangs

2022-05-25 Thread Sopena Ballesteros Manuel
also,


nid001388:~ # ceph -n client.noir auth get client.noir
Error EACCES: access denied




From: Ilya Dryomov 
Sent: Tuesday, May 24, 2022 8:45:23 PM
To: Sopena Ballesteros Manuel
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] rbd command hangs

On Tue, May 24, 2022 at 8:14 PM Sopena Ballesteros Manuel
 wrote:
>
> yes dmesg shows the following:
>
> ...
>
> [23661.367449] rbd: rbd12: failed to lock header: -13
> [23661.367968] rbd: rbd2: no lock owners detected
> [23661.369306] rbd: rbd11: no lock owners detected
> [23661.370068] rbd: rbd11: breaking header lock owned by client21473520
> [23661.370518] rbd: rbd11: blacklist of client21473520 failed: -13
> [23661.370519] rbd: rbd11: failed to lock header: -13
> [23661.370869] rbd: rbd5: no lock owners detected
> [23661.371994] rbd: rbd1: no lock owners detected
> [23661.372546] rbd: rbd1: breaking header lock owned by client21473520
> [23661.373058] rbd: rbd1: blacklist of client21473520 failed: -13
> [23661.373059] rbd: rbd1: failed to lock header: -13
> [23661.374111] rbd: rbd2: breaking header lock owned by client21473520
> [23661.374485] rbd: rbd4: no lock owners detected
> [23661.375210] rbd: rbd4: breaking header lock owned by client21473520
> [23661.375701] rbd: rbd4: blacklist of client21473520 failed: -13
> [23661.375702] rbd: rbd4: failed to lock header: -13
> [23661.376881] rbd: rbd5: breaking header lock owned by client21473520
> [23661.381151] rbd: rbd2: blacklist of client21473520 failed: -13
> [23661.385151] rbd: rbd5: blacklist of client21473520 failed: -13
> [23661.388279] rbd: rbd2: failed to lock header: -13

What is the output of "ceph auth get client.noir"?  The auth caps are
likely incorrect and missing blocklist permissions, see

https://docs.ceph.com/en/quincy/rbd/rados-rbd-cmds/#create-a-block-device-user

Thanks,

Ilya
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rbd command hangs

2022-05-25 Thread Sopena Ballesteros Manuel
attached,


nid001388:~ # ceph auth get client.noir
2022-05-25T09:20:00.731+0200 7f81f63f3700 -1 auth: unable to find a keyring on 
/etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,:
 (2) No such file or directory
2022-05-25T09:20:00.731+0200 7f81f63f3700 -1 AuthRegistry(0x7f81f005ec68) no 
keyring found at 
/etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,,
 disabling cephx
2022-05-25T09:20:00.731+0200 7f81f63f3700 -1 auth: unable to find a keyring on 
/etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,:
 (2) No such file or directory
2022-05-25T09:20:00.731+0200 7f81f63f3700 -1 AuthRegistry(0x7f81f63f2060) no 
keyring found at 
/etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,,
 disabling cephx
2022-05-25T09:20:00.731+0200 7f81f53f1700 -1 monclient(hunting): 
handle_auth_bad_method server allowed_methods [2] but i only support [1]
2022-05-25T09:20:00.735+0200 7f81f63f3700 -1 monclient: authenticate NOTE: no 
keyring found; disabled cephx authentication
[errno 95] error connecting to the cluster




From: Ilya Dryomov 
Sent: Tuesday, May 24, 2022 8:45:23 PM
To: Sopena Ballesteros Manuel
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] rbd command hangs

On Tue, May 24, 2022 at 8:14 PM Sopena Ballesteros Manuel
 wrote:
>
> yes dmesg shows the following:
>
> ...
>
> [23661.367449] rbd: rbd12: failed to lock header: -13
> [23661.367968] rbd: rbd2: no lock owners detected
> [23661.369306] rbd: rbd11: no lock owners detected
> [23661.370068] rbd: rbd11: breaking header lock owned by client21473520
> [23661.370518] rbd: rbd11: blacklist of client21473520 failed: -13
> [23661.370519] rbd: rbd11: failed to lock header: -13
> [23661.370869] rbd: rbd5: no lock owners detected
> [23661.371994] rbd: rbd1: no lock owners detected
> [23661.372546] rbd: rbd1: breaking header lock owned by client21473520
> [23661.373058] rbd: rbd1: blacklist of client21473520 failed: -13
> [23661.373059] rbd: rbd1: failed to lock header: -13
> [23661.374111] rbd: rbd2: breaking header lock owned by client21473520
> [23661.374485] rbd: rbd4: no lock owners detected
> [23661.375210] rbd: rbd4: breaking header lock owned by client21473520
> [23661.375701] rbd: rbd4: blacklist of client21473520 failed: -13
> [23661.375702] rbd: rbd4: failed to lock header: -13
> [23661.376881] rbd: rbd5: breaking header lock owned by client21473520
> [23661.381151] rbd: rbd2: blacklist of client21473520 failed: -13
> [23661.385151] rbd: rbd5: blacklist of client21473520 failed: -13
> [23661.388279] rbd: rbd2: failed to lock header: -13

What is the output of "ceph auth get client.noir"?  The auth caps are
likely incorrect and missing blocklist permissions, see

https://docs.ceph.com/en/quincy/rbd/rados-rbd-cmds/#create-a-block-device-user

Thanks,

Ilya
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io