[ceph-users] Re: ceph-mgr client.0 error registering admin socket command: (17) File exists

2024-02-26 Thread Eugen Block

Hi,

I see these messages regularly but haven't looked too deeply into the
cause. It appears to be related to short interruptions like log
rotation or a mgr failover. I think they're harmless.


Regards,
Eugen

Zitat von Denis Polom :


Hi,

running Ceph Quincy 17.2.7 on Ubuntu Focal LTS, the ceph-mgr service
reports the following error:


client.0 error registering admin socket command: (17) File exists

I don't use any extra mgr configuration:

mgr   advanced  mgr/balancer/active true
mgr   advanced  mgr/balancer/log_level debug
mgr   advanced  mgr/balancer/log_to_cluster true
mgr   advanced  mgr/balancer/mode upmap
mgr   advanced  mgr/balancer/upmap_max_deviation 1
mgr   advanced  mgr/balancer/upmap_max_optimizations 20
mgr   advanced  mgr/prometheus/cache true

Do you have any idea what the cause is and how to fix it?

Thank you

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph & iSCSI

2024-02-26 Thread Xiubo Li

Hi Michael,

Please see the previous threads about the same question:

https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/GDJJL7VSDUJITPM3JV7RCVXVOIQO2CAN/

https://www.spinics.net/lists/ceph-users/msg73969.html

Thanks

- Xiubo

On 2/27/24 11:22, Michael Worsham wrote:

I was reading on the Ceph site that iSCSI is no longer under active development 
since November 2022. Why is that?

https://docs.ceph.com/en/latest/rbd/iscsi-overview/

-- Michael

This message and its attachments are from Data Dimensions and are intended only 
for the use of the individual or entity to which it is addressed, and may 
contain information that is privileged, confidential, and exempt from 
disclosure under applicable law. If the reader of this message is not the 
intended recipient, or the employee or agent responsible for delivering the 
message to the intended recipient, you are hereby notified that any 
dissemination, distribution, or copying of this communication is strictly 
prohibited. If you have received this communication in error, please notify the 
sender immediately and permanently delete the original email and destroy any 
copies or printouts of this email as well as any attachments.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD with dm-crypt?

2024-02-26 Thread Michael Worsham
I was setting up the Ceph cluster via this URL 
(https://computingforgeeks.com/install-ceph-storage-cluster-on-ubuntu-linux-servers/)
and didn't know whether there is a way to do it via the "ceph orch daemon add osd 
ceph-osd-01:/dev/sdb" command or not.

Is it possible to enable encryption on an OSD after the fact, or does that involve 
some other process?

-- Michael



Get Outlook for Android

From: Alex Gorbachev 
Sent: Monday, February 26, 2024 11:10:54 PM
To: Michael Worsham 
Cc: ceph-users@ceph.io 
Subject: Re: [ceph-users] OSD with dm-crypt?

This is an external email. Please take care when clicking links or opening 
attachments. When in doubt, check with the Help Desk or Security.

If you are using a service spec, just set

encrypted: true

If using ceph-volume, pass this flag:

--dmcrypt

You can verify similar to 
https://smithfarm-thebrain.blogspot.com/2020/03/how-to-verify-that-encrypted-osd-is.html
--
Alex Gorbachev
ISS/Storcium



On Mon, Feb 26, 2024 at 10:25 PM Michael Worsham 
mailto:mwors...@datadimensions.com>> wrote:
Is there a how-to document or cheat sheet on how to enable OSD encryption using 
dm-crypt?

-- Michael

This message and its attachments are from Data Dimensions and are intended only 
for the use of the individual or entity to which it is addressed, and may 
contain information that is privileged, confidential, and exempt from 
disclosure under applicable law. If the reader of this message is not the 
intended recipient, or the employee or agent responsible for delivering the 
message to the intended recipient, you are hereby notified that any 
dissemination, distribution, or copying of this communication is strictly 
prohibited. If you have received this communication in error, please notify the 
sender immediately and permanently delete the original email and destroy any 
copies or printouts of this email as well as any attachments.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to 
ceph-users-le...@ceph.io
This message and its attachments are from Data Dimensions and are intended only 
for the use of the individual or entity to which it is addressed, and may 
contain information that is privileged, confidential, and exempt from 
disclosure under applicable law. If the reader of this message is not the 
intended recipient, or the employee or agent responsible for delivering the 
message to the intended recipient, you are hereby notified that any 
dissemination, distribution, or copying of this communication is strictly 
prohibited. If you have received this communication in error, please notify the 
sender immediately and permanently delete the original email and destroy any 
copies or printouts of this email as well as any attachments.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD with dm-crypt?

2024-02-26 Thread Alex Gorbachev
If you are using a service spec, just set

encrypted: true

If using ceph-volume, pass this flag:

--dmcrypt

You can verify similar to
https://smithfarm-thebrain.blogspot.com/2020/03/how-to-verify-that-encrypted-osd-is.html
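
For example, a minimal sketch of the service-spec route (the spec file name,
service id, host pattern and device selection below are placeholders to adapt
to your setup):

cat > osd-encrypted.yaml <<EOF
service_type: osd
service_id: encrypted_osds
placement:
  host_pattern: 'ceph-osd-*'
spec:
  data_devices:
    all: true
  encrypted: true
EOF
ceph orch apply -i osd-encrypted.yaml

Or, when creating the OSD manually with ceph-volume:

ceph-volume lvm create --data /dev/sdb --dmcrypt
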
--
Alex Gorbachev
ISS/Storcium



On Mon, Feb 26, 2024 at 10:25 PM Michael Worsham <
mwors...@datadimensions.com> wrote:

> Is there a how-to document or cheat sheet on how to enable OSD encryption
> using dm-crypt?
>
> -- Michael
>
> This message and its attachments are from Data Dimensions and are intended
> only for the use of the individual or entity to which it is addressed, and
> may contain information that is privileged, confidential, and exempt from
> disclosure under applicable law. If the reader of this message is not the
> intended recipient, or the employee or agent responsible for delivering the
> message to the intended recipient, you are hereby notified that any
> dissemination, distribution, or copying of this communication is strictly
> prohibited. If you have received this communication in error, please notify
> the sender immediately and permanently delete the original email and
> destroy any copies or printouts of this email as well as any attachments.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] OSD with dm-crypt?

2024-02-26 Thread Michael Worsham
Is there a how-to document or cheat sheet on how to enable OSD encryption using 
dm-crypt?

-- Michael

This message and its attachments are from Data Dimensions and are intended only 
for the use of the individual or entity to which it is addressed, and may 
contain information that is privileged, confidential, and exempt from 
disclosure under applicable law. If the reader of this message is not the 
intended recipient, or the employee or agent responsible for delivering the 
message to the intended recipient, you are hereby notified that any 
dissemination, distribution, or copying of this communication is strictly 
prohibited. If you have received this communication in error, please notify the 
sender immediately and permanently delete the original email and destroy any 
copies or printouts of this email as well as any attachments.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph & iSCSI

2024-02-26 Thread Michael Worsham
I was reading on the Ceph site that iSCSI is no longer under active development 
since November 2022. Why is that?

https://docs.ceph.com/en/latest/rbd/iscsi-overview/

-- Michael

This message and its attachments are from Data Dimensions and are intended only 
for the use of the individual or entity to which it is addressed, and may 
contain information that is privileged, confidential, and exempt from 
disclosure under applicable law. If the reader of this message is not the 
intended recipient, or the employee or agent responsible for delivering the 
message to the intended recipient, you are hereby notified that any 
dissemination, distribution, or copying of this communication is strictly 
prohibited. If you have received this communication in error, please notify the 
sender immediately and permanently delete the original email and destroy any 
copies or printouts of this email as well as any attachments.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Sata SSD trim latency with (WAL+DB on NVME + Sata OSD)

2024-02-26 Thread Özkan Göksu
Hello.

With SSD drives that lack tantalum capacitors (power-loss protection), Ceph
faces trim latency on every write.
I wonder if the behavior is the same if we locate the WAL+DB on NVMe drives
with tantalum capacitors?

Do I need to use NVME + SAS SSD to avoid this latency issue?

Best regards.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Some questions about cephadm

2024-02-26 Thread Adam King
In regards to
>
> From the reading you gave me I have understood the following :
> 1 - Set osd_memory_target_autotune to true then set
> autotune_memory_target_ratio to 0.2
> 2 - Or do the math. For my setup I have 384 GB per node, each node has 4
> NVMe disks of 7.6 TB, 0.2 of memory is 19.5 GB. So each OSD will have 19 GB of
> memory.
>
> Question : Should I take into account the size of the disk when calculating
> the required memory for an OSD?
>
The memory in question is RAM, not disk space. To see the exact value
cephadm will use for the host's total memory when doing this autotuning
(reported in kB; it is multiplied by 1024 when actually used), you can run

[root@vm-00 ~]# cephadm gather-facts | grep memory_total
  "memory_total_kb": 40802184,

on your machine. Then it multiplies that by the ratio and subtracts out an
amount for every non-OSD daemon on the node. Specifically (taking this from
the code)

min_size_by_type = {
'mds': 4096 * 1048576,
'mgr': 4096 * 1048576,
'mon': 1024 * 1048576,
'crash': 128 * 1048576,
'keepalived': 128 * 1048576,
'haproxy': 128 * 1048576,
}
default_size = 1024 * 1048576

so 1 GB for most daemons, with mgr and mds requiring extra (although for
mds it also uses the `mds_cache_memory_limit` config option if it's set)
and some others requiring less. What's left after all that is done is then
divided by the number of OSDs deployed on the host. If that number ends up
too small, however, there is some floor that it won't set below, but I
can't remember off the top of my head what that is. Maybe 4 GB.
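
To put the above into commands, a rough sketch (a hypothetical example; please
double-check the option names and the exact floor value for your release):

ceph config set osd osd_memory_target_autotune true
ceph config set mgr mgr/cephadm/autotune_memory_target_ratio 0.2
# back-of-the-envelope for a 384 GB host running only 4 OSDs:
#   384 GB * 0.2 = ~77 GB, minus ~1 GB per colocated non-OSD daemon,
#   divided by 4 OSDs = roughly 19 GB osd_memory_target per OSD,
#   independent of how large the disks are.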

On Mon, Feb 26, 2024 at 5:10 AM wodel youchi  wrote:

> Thank you all for your help.
>
> @Adam
> From the reading you gave me I have understood the following :
> 1 - Set osd_memory_target_autotune to true then set
> autotune_memory_target_ratio to 0.2
> 2 - Or do the math. For my setup I have 384 GB per node, each node has 4
> NVMe disks of 7.6 TB, 0.2 of memory is 19.5 GB. So each OSD will have 19 GB of
> memory.
>
> Question : Should I take into account the size of the disk when calculating
> the required memory for an OSD?
>
>
> I have another problem, the local registry. I deployed a local registry
> with the required images, then I used cephadm-ansible to prepare my hosts
> and inject the local registry url into /etc/container/registry.conf file
>
> Then I tried to deploy using this command on the admin node:
> cephadm --image 192.168.2.36:4000/ceph/ceph:v17 bootstrap --mon-ip
> 10.1.0.23 --cluster-network 10.2.0.0/16
>
> After the boot strap I found that it still downloads the images from the
> internet, even the ceph image itself, I see two images one from my registry
> the second from quay.
>
> There is a section that talks about using a local registry here
>
> https://docs.ceph.com/en/reef/cephadm/install/#deployment-in-an-isolated-environment
> ,
> but it's not clear especially about the other images. It talks about
> preparing a temporary file named initial-ceph.conf, then it does not use
> it???!!!
>
> Could you help?
>
> Regards.
>
> Le jeu. 22 févr. 2024 à 11:10, Eugen Block  a écrit :
>
> > Hi,
> >
> > just responding to the last questions:
> >
> > >- After the bootstrap, the Web interface was accessible :
> > >   - How can I access the wizard page again? If I don't use it the
> > first
> > >   time I could not find another way to get it.
> >
> > I don't know how to recall the wizard, but you should be able to
> > create a new dashboard user with your desired role (e. g.
> > administrator) from the CLI:
> >
> > ceph dashboard ac-user-create <username> [<rolename>] -i <password-file>
> >
> >
> > >   - I had a problem with telemetry, I did not configure telemetry,
> > then
> > >   when I clicked the button, the web gui became
> > inaccessible.!!!
> >
> > You can see what happened in the active MGR log.
> >
> > Zitat von wodel youchi :
> >
> > > Hi,
> > >
> > > I have some questions about ceph using cephadm.
> > >
> > > I used to deploy ceph using ceph-ansible, now I have to move to
> cephadm,
> > I
> > > am in my learning journey.
> > >
> > >
> > >- How can I tell my cluster that it's a part of an HCI deployment?
> > With
> > >ceph-ansible it was easy using is_hci : yes
> > >- The documentation of ceph does not indicate what versions of
> > grafana,
> > >prometheus, ...etc should be used with a certain version.
> > >   - I am trying to deploy Quincy, I did a bootstrap to see what
> > >   containers were downloaded and their version.
> > >   - I am asking because I need to use a local registry to deploy
> > those
> > >   images.
> > >- After the bootstrap, the Web interface was accessible :
> > >   - How can I access the wizard page again? If I don't use it the
> > first
> > >   time I could not find another way to get it.
> > >   - I had a problem with telemetry, I did not configure telemetry,
> > then
> > >   when I clicked the button, the web gui became
> > inaccessible.!!!
> 

[ceph-users] Re: pacific 16.2.15 QE validation status

2024-02-26 Thread Yuri Weinstein
Thank you all!

We want to merge the PR with whitelisting added
https://github.com/ceph/ceph/pull/55717 and will start the 16.2.15
build/release afterward.

On Mon, Feb 26, 2024 at 8:25 AM Laura Flores  wrote:

> Thank you Junior for your thorough review of the RADOS suite. Aside from a
> few remaining warnings in the final run that could benefit from
> whitelisting, these are not blockers.
>
> Rados-approved.
>
> On Mon, Feb 26, 2024 at 9:29 AM Kamoltat Sirivadhna 
> wrote:
>
>> details of RADOS run analysis:
>>
>> yuriw-2024-02-19_19:25:49-rados-pacific-release-distro-default-smithi
>> <
>> https://pulpito.ceph.com/yuriw-2024-02-19_19:25:49-rados-pacific-release-distro-default-smithi/#collapseOne
>> >
>>
>>
>>
>>1. https://tracker.ceph.com/issues/64455  task/test_orch_cli: Health
>>check failed: cephadm background work is paused (CEPHADM_PAUSED)" in
>>cluster log (White list)
>>2. https://tracker.ceph.com/issues/64454
>> rados/cephadm/mgr-nfs-upgrade:
>>Health check failed: 1 stray daemon(s) not managed by cephadm
>>(CEPHADM_STRAY_DAEMON)" in cluster log (whitelist)
>>3. https://tracker.ceph.com/issues/63887: Starting alertmanager fails
>>from missing container (happens in Pacific)
>>4. Failed to reconnect to smithi155 [7566763
>><
>> https://pulpito.ceph.com/yuriw-2024-02-19_19:25:49-rados-pacific-release-distro-default-smithi/7566763
>> >
>>]
>>5. https://tracker.ceph.com/issues/64278 Unable to update caps for
>>client.iscsi.iscsi.a (known failures)
>>6. https://tracker.ceph.com/issues/64452 Teuthology runs into
>>"TypeError: expected string or bytes-like object" during log scraping
>>(teuthology failure)
>>7. https://tracker.ceph.com/issues/64343 Expected warnings that need
>> to
>>be whitelisted cause rados/cephadm tests to fail for 7566717
>><
>> https://pulpito.ceph.com/yuriw-2024-02-19_19:25:49-rados-pacific-release-distro-default-smithi/7566717
>> >
   >>we need to add (ERR|WRN|SEC)
>>8. https://tracker.ceph.com/issues/58145 orch/cephadm: nfs tests
>> failing
>>to mount exports (mount -t nfs 10.0.31.120:/fake /mnt/foo' fails)
>>7566724 (resolved issue re-opened)
>>9. https://tracker.ceph.com/issues/63577 cephadm:
>>docker.io/library/haproxy: toomanyrequests: You have reached your pull
>>rate limit.
>>10. https://tracker.ceph.com/issues/54071 rdos/cephadm/osds: Invalid
>>command: missing required parameter hostname() 756674
>><
>> https://pulpito.ceph.com/yuriw-2024-02-19_19:25:49-rados-pacific-release-distro-default-smithi/7566747
>> >
>>
>>
>> Note:
>>
>>1. Although 7566762 seems like a different failure from what is
>>displayed in pulpito, in the teuth log it failed because of
>>https://tracker.ceph.com/issues/64278.
>>2. rados/cephadm/thrash/ … failed a lot because of
>>https://tracker.ceph.com/issues/64452
>>3. 7566717
>><
>> https://pulpito.ceph.com/yuriw-2024-02-19_19:25:49-rados-pacific-release-distro-default-smithi/7566717
>> >.
>>failed because we didn’t whitelist (ERR|WRN|SEC)
>> :tasks.cephadm:Checking
>>cluster log for badness...
>>4. 7566724 https://tracker.ceph.com/issues/58145 ganesha seems
>> resolved
>>1 year ago, but popped up again so re-opened tracker and ping Adam King
>>(resolved)
>>
>> 7566777, 7566781, 7566796 are due to
>> https://tracker.ceph.com/issues/63577
>>
>>
>>
>> White List and re-ran:
>>
>> yuriw-2024-02-22_21:39:39-rados-pacific-release-distro-default-smithi/
>> <
>> https://pulpito.ceph.com/yuriw-2024-02-22_21:39:39-rados-pacific-release-distro-default-smithi/
>> >
>>
>> rados/cephadm/mds_upgrade_sequence/ —> failed to shutdown mon (known
>> failure discussed with A.King)
>>
>> rados/cephadm/mgr-nfs-upgrade —> failed to shutdown mon (known failure
>> discussed with A.King)
>>
>> rados/cephadm/osds —> zap disk error (known failure)
>>
>> rados/cephadm/smoke-roleless —>  toomanyrequests: You have reached your
>> pull rate limit. https://www.docker.com/increase-rate-limit. (known
>> failures)
>>
>> rados/cephadm/thrash —> Just needs to whitelist (CACHE_POOL_NEAR_FULL)
>> (known failures)
>>
>> rados/cephadm/upgrade —> CEPHADM_FAILED_DAEMON (WRN)  node-exporter
>> (known
>> failure discussed with A.King)
>>
>> rados/cephadm/workunits —> known failure:
>> https://tracker.ceph.com/issues/63887
>>
>> On Mon, Feb 26, 2024 at 10:22 AM Kamoltat Sirivadhna > >
>> wrote:
>>
>> > RADOS approved
>> >
>> > On Wed, Feb 21, 2024 at 11:27 AM Yuri Weinstein 
>> > wrote:
>> >
>> >> Still seeking approvals:
>> >>
>> >> rados - Radek, Junior, Travis, Adam King
>> >>
>> >> All other product areas have been approved and are ready for the
>> release
>> >> step.
>> >>
>> >> Pls also review the Release Notes:
>> >> https://github.com/ceph/ceph/pull/55694
>> >>
>> >>
>> >> On Tue, Feb 20, 2024 at 7:58 AM Yuri Weinstein 
>> >> wrote:
>> >> >
>> >> > We have restarted QE validation after fixing issues and merging
>> several
>> >> 

[ceph-users] Re: pacific 16.2.15 QE validation status

2024-02-26 Thread Laura Flores
Thank you Junior for your thorough review of the RADOS suite. Aside from a
few remaining warnings in the final run that could benefit from
whitelisting, these are not blockers.

Rados-approved.

On Mon, Feb 26, 2024 at 9:29 AM Kamoltat Sirivadhna 
wrote:

> details of RADOS run analysis:
>
> yuriw-2024-02-19_19:25:49-rados-pacific-release-distro-default-smithi
> <
> https://pulpito.ceph.com/yuriw-2024-02-19_19:25:49-rados-pacific-release-distro-default-smithi/#collapseOne
> >
>
>
>
>1. https://tracker.ceph.com/issues/64455  task/test_orch_cli: Health
>check failed: cephadm background work is paused (CEPHADM_PAUSED)" in
>cluster log (White list)
>2. https://tracker.ceph.com/issues/64454 rados/cephadm/mgr-nfs-upgrade:
>Health check failed: 1 stray daemon(s) not managed by cephadm
>(CEPHADM_STRAY_DAEMON)" in cluster log (whitelist)
>3. https://tracker.ceph.com/issues/63887: Starting alertmanager fails
>from missing container (happens in Pacific)
>4. Failed to reconnect to smithi155 [7566763
><
> https://pulpito.ceph.com/yuriw-2024-02-19_19:25:49-rados-pacific-release-distro-default-smithi/7566763
> >
>]
>5. https://tracker.ceph.com/issues/64278 Unable to update caps for
>client.iscsi.iscsi.a (known failures)
>6. https://tracker.ceph.com/issues/64452 Teuthology runs into
>"TypeError: expected string or bytes-like object" during log scraping
>(teuthology failure)
>7. https://tracker.ceph.com/issues/64343 Expected warnings that need to
>be whitelisted cause rados/cephadm tests to fail for 7566717
><
> https://pulpito.ceph.com/yuriw-2024-02-19_19:25:49-rados-pacific-release-distro-default-smithi/7566717
> >
   >we need to add (ERR|WRN|SEC)
>8. https://tracker.ceph.com/issues/58145 orch/cephadm: nfs tests
> failing
>to mount exports (mount -t nfs 10.0.31.120:/fake /mnt/foo' fails)
>7566724 (resolved issue re-opened)
>9. https://tracker.ceph.com/issues/63577 cephadm:
>docker.io/library/haproxy: toomanyrequests: You have reached your pull
>rate limit.
>10. https://tracker.ceph.com/issues/54071 rdos/cephadm/osds: Invalid
>command: missing required parameter hostname() 756674
><
> https://pulpito.ceph.com/yuriw-2024-02-19_19:25:49-rados-pacific-release-distro-default-smithi/7566747
> >
>
>
> Note:
>
>1. Although 7566762 seems like a different failure from what is
>displayed in pulpito, in the teuth log it failed because of
>https://tracker.ceph.com/issues/64278.
>2. rados/cephadm/thrash/ … failed a lot because of
>https://tracker.ceph.com/issues/64452
>3. 7566717
><
> https://pulpito.ceph.com/yuriw-2024-02-19_19:25:49-rados-pacific-release-distro-default-smithi/7566717
> >.
>failed because we didn’t whitelist (ERR|WRN|SEC) :tasks.cephadm:Checking
>cluster log for badness...
>4. 7566724 https://tracker.ceph.com/issues/58145 ganesha seems resolved
>1 year ago, but popped up again so re-opened tracker and ping Adam King
>(resolved)
>
> 7566777, 7566781, 7566796 are due to https://tracker.ceph.com/issues/63577
>
>
>
> White List and re-ran:
>
> yuriw-2024-02-22_21:39:39-rados-pacific-release-distro-default-smithi/
> <
> https://pulpito.ceph.com/yuriw-2024-02-22_21:39:39-rados-pacific-release-distro-default-smithi/
> >
>
> rados/cephadm/mds_upgrade_sequence/ —> failed to shutdown mon (known
> failure discussed with A.King)
>
> rados/cephadm/mgr-nfs-upgrade —> failed to shutdown mon (known failure
> discussed with A.King)
>
> rados/cephadm/osds —> zap disk error (known failure)
>
> rados/cephadm/smoke-roleless —>  toomanyrequests: You have reached your
> pull rate limit. https://www.docker.com/increase-rate-limit. (known
> failures)
>
> rados/cephadm/thrash —> Just needs to whitelist (CACHE_POOL_NEAR_FULL)
> (known failures)
>
> rados/cephadm/upgrade —> CEPHADM_FAILED_DAEMON (WRN)  node-exporter  (known
> failure discussed with A.King)
>
> rados/cephadm/workunits —> known failure:
> https://tracker.ceph.com/issues/63887
>
> On Mon, Feb 26, 2024 at 10:22 AM Kamoltat Sirivadhna 
> wrote:
>
> > RADOS approved
> >
> > On Wed, Feb 21, 2024 at 11:27 AM Yuri Weinstein 
> > wrote:
> >
> >> Still seeking approvals:
> >>
> >> rados - Radek, Junior, Travis, Adam King
> >>
> >> All other product areas have been approved and are ready for the release
> >> step.
> >>
> >> Pls also review the Release Notes:
> >> https://github.com/ceph/ceph/pull/55694
> >>
> >>
> >> On Tue, Feb 20, 2024 at 7:58 AM Yuri Weinstein 
> >> wrote:
> >> >
> >> > We have restarted QE validation after fixing issues and merging
> several
> >> PRs.
> >> > The new Build 3 (rebase of pacific) tests are summarized in the same
> >> > note (see Build 3 runs) https://tracker.ceph.com/issues/64151#note-1
> >> >
> >> > Seeking approvals:
> >> >
> >> > rados - Radek, Junior, Travis, Ernesto, Adam King
> >> > rgw - Casey
> >> > fs - Venky
> >> > rbd - Ilya
> >> > krbd - Ilya
> >> >
> >> > upgrade/octopus-x (pacific) 

[ceph-users] Re: pacific 16.2.15 QE validation status

2024-02-26 Thread Kamoltat Sirivadhna
details of RADOS run analysis:

yuriw-2024-02-19_19:25:49-rados-pacific-release-distro-default-smithi
https://pulpito.ceph.com/yuriw-2024-02-19_19:25:49-rados-pacific-release-distro-default-smithi/#collapseOne



   1. https://tracker.ceph.com/issues/64455  task/test_orch_cli: Health
   check failed: cephadm background work is paused (CEPHADM_PAUSED)" in
   cluster log (White list)
   2. https://tracker.ceph.com/issues/64454 rados/cephadm/mgr-nfs-upgrade:
   Health check failed: 1 stray daemon(s) not managed by cephadm
   (CEPHADM_STRAY_DAEMON)" in cluster log (whitelist)
   3. https://tracker.ceph.com/issues/63887: Starting alertmanager fails
   from missing container (happens in Pacific)
   4. Failed to reconnect to smithi155 [7566763
   https://pulpito.ceph.com/yuriw-2024-02-19_19:25:49-rados-pacific-release-distro-default-smithi/7566763
   ]
   5. https://tracker.ceph.com/issues/64278 Unable to update caps for
   client.iscsi.iscsi.a (known failures)
   6. https://tracker.ceph.com/issues/64452 Teuthology runs into
   "TypeError: expected string or bytes-like object" during log scraping
   (teuthology failure)
   7. https://tracker.ceph.com/issues/64343 Expected warnings that need to
   be whitelisted cause rados/cephadm tests to fail for 7566717
   https://pulpito.ceph.com/yuriw-2024-02-19_19:25:49-rados-pacific-release-distro-default-smithi/7566717
   we need to add (ERR|WRN|SEC)
   8. https://tracker.ceph.com/issues/58145 orch/cephadm: nfs tests failing
   to mount exports (mount -t nfs 10.0.31.120:/fake /mnt/foo' fails)
   7566724 (resolved issue re-opened)
   9. https://tracker.ceph.com/issues/63577 cephadm:
   docker.io/library/haproxy: toomanyrequests: You have reached your pull
   rate limit.
   10. https://tracker.ceph.com/issues/54071 rdos/cephadm/osds: Invalid
   command: missing required parameter hostname() 756674
   https://pulpito.ceph.com/yuriw-2024-02-19_19:25:49-rados-pacific-release-distro-default-smithi/7566747



Note:

   1. Although 7566762 seems like a different failure from what is
   displayed in pulpito, in the teuth log it failed because of
   https://tracker.ceph.com/issues/64278.
   2. rados/cephadm/thrash/ … failed a lot because of
   https://tracker.ceph.com/issues/64452
   3. 7566717
   https://pulpito.ceph.com/yuriw-2024-02-19_19:25:49-rados-pacific-release-distro-default-smithi/7566717
   failed because we didn’t whitelist (ERR|WRN|SEC) :tasks.cephadm:Checking
   cluster log for badness...
   4. 7566724 https://tracker.ceph.com/issues/58145 ganesha seems resolved
   1 year ago, but popped up again so re-opened tracker and ping Adam King
   (resolved)

7566777, 7566781, 7566796 are due to https://tracker.ceph.com/issues/63577



White List and re-ran:

yuriw-2024-02-22_21:39:39-rados-pacific-release-distro-default-smithi/
https://pulpito.ceph.com/yuriw-2024-02-22_21:39:39-rados-pacific-release-distro-default-smithi/

rados/cephadm/mds_upgrade_sequence/ —> failed to shutdown mon (known
failure discussed with A.King)

rados/cephadm/mgr-nfs-upgrade —> failed to shutdown mon (known failure
discussed with A.King)

rados/cephadm/osds —> zap disk error (known failure)

rados/cephadm/smoke-roleless —>  toomanyrequests: You have reached your
pull rate limit. https://www.docker.com/increase-rate-limit. (known
failures)

rados/cephadm/thrash —> Just needs to whitelist (CACHE_POOL_NEAR_FULL)
(known failures)

rados/cephadm/upgrade —> CEPHADM_FAILED_DAEMON (WRN)  node-exporter  (known
failure discussed with A.King)

rados/cephadm/workunits —> known failure:
https://tracker.ceph.com/issues/63887

On Mon, Feb 26, 2024 at 10:22 AM Kamoltat Sirivadhna 
wrote:

> RADOS approved
>
> On Wed, Feb 21, 2024 at 11:27 AM Yuri Weinstein 
> wrote:
>
>> Still seeking approvals:
>>
>> rados - Radek, Junior, Travis, Adam King
>>
>> All other product areas have been approved and are ready for the release
>> step.
>>
>> Pls also review the Release Notes:
>> https://github.com/ceph/ceph/pull/55694
>>
>>
>> On Tue, Feb 20, 2024 at 7:58 AM Yuri Weinstein 
>> wrote:
>> >
>> > We have restarted QE validation after fixing issues and merging several
>> PRs.
>> > The new Build 3 (rebase of pacific) tests are summarized in the same
>> > note (see Build 3 runs) https://tracker.ceph.com/issues/64151#note-1
>> >
>> > Seeking approvals:
>> >
>> > rados - Radek, Junior, Travis, Ernesto, Adam King
>> > rgw - Casey
>> > fs - Venky
>> > rbd - Ilya
>> > krbd - Ilya
>> >
>> > upgrade/octopus-x (pacific) - Adam King, Casey PTL
>> >
>> > upgrade/pacific-p2p - Casey PTL
>> >
>> > ceph-volume - Guillaume, fixed by
>> > https://github.com/ceph/ceph/pull/55658 retesting
>> >
>> > On Thu, Feb 8, 2024 at 8:43 AM Casey Bodley  wrote:
>> > >
>> > > thanks, i've created https://tracker.ceph.com/issues/64360 to track
>> > > these backports to pacific/quincy/reef
>> > >
>> > > On Thu, Feb 8, 2024 at 7:50 AM Stefan Kooman  wrote:
>> > > >
>> > > > Hi,
>> > > >
>> > > > Is this PR: 

[ceph-users] Re: Seperate metadata pool in 3x MDS node

2024-02-26 Thread Özkan Göksu
 Hello Anthony,

The hardware is a second-hand build and does not have U.2 slots; U.2 servers
cost 3x-4x more. I mean the PCI-E "MZ-PLK3T20".
I have to buy SFP cards anyway, and 25G is only about $30 more than 10G, so why not.
Yes, I'm thinking pinned as (clients > rack MDS).
I don't have problems with building and I don't use PG autoscaler.

Hello David.

My system is all internal and I only use one /20 subnet at layer 2.
Yes, I'm thinking of distributing the metadata pool across racks 1,2,4,5 because
my clients use search a lot and I just want to shorten the metadata path.
I have redundant rack PDUs, so I don't have any problem with power, and I
only have a VPC (2x N9K switches) in the main rack 3. That's why I keep everything
data- and management-related in rack 3 as usual.
Normally I always use WAL+DB on NVMe with SATA OSDs. The only thing I wonder
is whether having a separate metadata pool on NVMe located in the client racks
is going to give some benefit or not.

Regards.

David C. , 25 Şub 2024 Paz, 00:07 tarihinde şunu
yazdı:

> Hello,
>
> Each rack works on different trees or is everything parallelized ?
> The meta pools would be distributed over racks 1,2,4,5 ?
> If it is distributed, even if the addressed MDS is on the same switch as
> the client, you will always have this MDS which will consult/write (nvme)
> OSDs on the other racks (among 1,2,4,5).
>
> In any case, the exercise is interesting.
>
>
>
> Le sam. 24 févr. 2024 à 19:56, Özkan Göksu  a écrit :
>
>> Hello folks!
>>
>> I'm designing a new Ceph storage from scratch and I want to increase
>> CephFS
>> speed and decrease latency.
>> Usually I always build (WAL+DB on NVME with Sas-Sata SSD's) and I deploy
>> MDS and MON's on the same servers.
>> This time a weird idea came to my mind and I think it has great potential
>> and will perform better on paper with my limited knowledge.
>>
>> I have 5 racks and the 3rd "middle" rack is my storage and management
>> rack.
>>
>> - At RACK-3 I'm gonna locate 8x 1u OSD server (Spec: 2x E5-2690V4, 256GB,
>> 4x 25G, 2x 1.6TB PCI-E NVME "MZ-PLK3T20", 8x 4TB SATA SSD)
>>
>> - My Cephfs kernel clients are 40x GPU nodes located at RACK1,2,4,5
>>
>> With my current workflow, all the clients;
>> 1- visit the rack data switch
>> 2- jump to main VPC switch via 2x100G,
>> 3- talk with MDS servers,
>> 4- Go back to the client with the answer,
>> 5- To access data follow the same HOP's and visit the OSD's everytime.
>>
>> If I deploy separate metadata pool by using 4x MDS server at top of
>> RACK-1,2,4,5 (Spec: 2x E5-2690V4, 128GB, 2x 10G(Public), 2x 25G (cluster),
>> 2x 960GB U.2 NVME "MZ-PLK3T20")
>> Then all the clients will make the request directly in-rack 1 HOP away MDS
>> servers and if the request is only metadata, then the MDS node doesn't
>> need
>> to redirect the request to OSD nodes.
>> Also by locating MDS servers with seperated metadata pool across all the
>> racks will reduce the high load on main VPC switch at RACK-3
>>
>> If I'm not missing anything then only Recovery workload will suffer with
>> this topology.
>>
>> What do you think?
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pacific 16.2.15 QE validation status

2024-02-26 Thread Kamoltat Sirivadhna
RADOS approved

On Wed, Feb 21, 2024 at 11:27 AM Yuri Weinstein  wrote:

> Still seeking approvals:
>
> rados - Radek, Junior, Travis, Adam King
>
> All other product areas have been approved and are ready for the release
> step.
>
> Pls also review the Release Notes: https://github.com/ceph/ceph/pull/55694
>
>
> On Tue, Feb 20, 2024 at 7:58 AM Yuri Weinstein 
> wrote:
> >
> > We have restarted QE validation after fixing issues and merging several
> PRs.
> > The new Build 3 (rebase of pacific) tests are summarized in the same
> > note (see Build 3 runs) https://tracker.ceph.com/issues/64151#note-1
> >
> > Seeking approvals:
> >
> > rados - Radek, Junior, Travis, Ernesto, Adam King
> > rgw - Casey
> > fs - Venky
> > rbd - Ilya
> > krbd - Ilya
> >
> > upgrade/octopus-x (pacific) - Adam King, Casey PTL
> >
> > upgrade/pacific-p2p - Casey PTL
> >
> > ceph-volume - Guillaume, fixed by
> > https://github.com/ceph/ceph/pull/55658 retesting
> >
> > On Thu, Feb 8, 2024 at 8:43 AM Casey Bodley  wrote:
> > >
> > > thanks, i've created https://tracker.ceph.com/issues/64360 to track
> > > these backports to pacific/quincy/reef
> > >
> > > On Thu, Feb 8, 2024 at 7:50 AM Stefan Kooman  wrote:
> > > >
> > > > Hi,
> > > >
> > > > Is this PR: https://github.com/ceph/ceph/pull/54918 included as
> well?
> > > >
> > > > You definitely want to build the Ubuntu / debian packages with the
> > > > proper CMAKE_CXX_FLAGS. The performance impact on RocksDB is _HUGE_.
> > > >
> > > > Thanks,
> > > >
> > > > Gr. Stefan
> > > >
> > > > P.s. Kudos to Mark Nelson for figuring it out / testing.
> > > > ___
> > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > > >
> > >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 

Kamoltat Sirivadhna (HE/HIM)

SoftWare Engineer - Ceph Storage

ksiri...@redhat.com   T: (857) 253-8927
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephadm and Ceph.conf

2024-02-26 Thread Robert Sander

On 2/26/24 15:24, Michael Worsham wrote:

So how would I be able to put configurations like this into it?

[global]
 fsid = 46620486-b8a6-11ee-bf23-6510c4d9efa7
 mon_host = [v2:10.20.27.10:3300/0,v1:10.20.27.10:6789/0] 
[v2:10.20.27.11:3300/0,v1:10.20.27.11:6789/0]
 osd pool default size = 3
 osd pool default min size = 2
 osd pool default pg num = 256
 osd pool default pgp num = 256
 mon_max_pg_per_osd = 800
 osd max pg per osd hard ratio = 10
 mon allow pool delete = true
 auth cluster required = cephx
 auth service required = cephx
 auth client required = cephx
 ms_mon_client_mode = crc

[client.radosgw.mon1]
 host = ceph-mon1
 log_file = /var/log/ceph/client.radosgw.mon1.log
 rgw_dns_name = ceph-mon1
 rgw_frontends = "beast port=80 num_threads=500"
 rgw_crypt_require_ssl = false


ceph config assimilate-conf may be of help here.
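
For example, a rough sketch (the file paths are placeholders): put the desired
settings into a plain ceph.conf-style file and import it into the cluster
configuration database:

ceph config assimilate-conf -i /root/old-ceph.conf -o /root/leftover-ceph.conf
ceph config dump

Anything that cannot be stored centrally by the monitors is written back to the
-o file, so it can be kept in a local ceph.conf if still needed.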

Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph-mgr client.0 error registering admin socket command: (17) File exists

2024-02-26 Thread Denis Polom

Hi,

running Ceph Quincy 17.2.7 on Ubuntu Focal LTS, the ceph-mgr service reports 
the following error:


client.0 error registering admin socket command: (17) File exists

I don't use any extra mgr configuration:

mgr   advanced  mgr/balancer/active true
mgr   advanced  mgr/balancer/log_level debug
mgr   advanced  mgr/balancer/log_to_cluster true
mgr   advanced  mgr/balancer/mode upmap
mgr   advanced  mgr/balancer/upmap_max_deviation 1
mgr   advanced  mgr/balancer/upmap_max_optimizations 20
mgr   advanced  mgr/prometheus/cache true

Do you have any idea what the cause is and how to fix it?

Thank you

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephadm and Ceph.conf

2024-02-26 Thread Michael Worsham
So how would I be able to put configurations like this into it?

[global]
fsid = 46620486-b8a6-11ee-bf23-6510c4d9efa7
mon_host = [v2:10.20.27.10:3300/0,v1:10.20.27.10:6789/0] 
[v2:10.20.27.11:3300/0,v1:10.20.27.11:6789/0]
osd pool default size = 3
osd pool default min size = 2
osd pool default pg num = 256
osd pool default pgp num = 256
mon_max_pg_per_osd = 800
osd max pg per osd hard ratio = 10
mon allow pool delete = true
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
ms_mon_client_mode = crc

[client.radosgw.mon1]
host = ceph-mon1
log_file = /var/log/ceph/client.radosgw.mon1.log
rgw_dns_name = ceph-mon1
rgw_frontends = "beast port=80 num_threads=500"
rgw_crypt_require_ssl = false


-Original Message-
From: Robert Sander 
Sent: Monday, February 26, 2024 8:29 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Cephadm and Ceph.conf

This is an external email. Please take care when clicking links or opening 
attachments. When in doubt, check with the Help Desk or Security.


On 2/26/24 14:24, Michael Worsham wrote:
> I deployed a Ceph reef cluster using cephadm. When it comes to the ceph.conf 
> file, which file should I be editing for making changes to the cluster - the 
> one running under the docker container or the local one on the Ceph monitors?

Neither of them. You can adjust settings with "ceph config" or the Configuration 
tab of the Dashboard.

Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin 
___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to 
ceph-users-le...@ceph.io
This message and its attachments are from Data Dimensions and are intended only 
for the use of the individual or entity to which it is addressed, and may 
contain information that is privileged, confidential, and exempt from 
disclosure under applicable law. If the reader of this message is not the 
intended recipient, or the employee or agent responsible for delivering the 
message to the intended recipient, you are hereby notified that any 
dissemination, distribution, or copying of this communication is strictly 
prohibited. If you have received this communication in error, please notify the 
sender immediately and permanently delete the original email and destroy any 
copies or printouts of this email as well as any attachments.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephadm and Ceph.conf

2024-02-26 Thread Robert Sander

On 2/26/24 14:24, Michael Worsham wrote:

I deployed a Ceph reef cluster using cephadm. When it comes to the ceph.conf 
file, which file should I be editing for making changes to the cluster - the 
one running under the docker container or the local one on the Ceph monitors?


Neither of them. You can adjust settings with "ceph config" or the 
Configuration tab of the Dashboard.
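
For example (a generic illustration; substitute the option and value you
actually want to change):

ceph config set global mon_max_pg_per_osd 800
ceph config get mon mon_max_pg_per_osd
ceph config dump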


Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Cephadm and Ceph.conf

2024-02-26 Thread Michael Worsham
I deployed a Ceph reef cluster using cephadm. When it comes to the ceph.conf 
file, which file should I be editing for making changes to the cluster - the 
one running under the docker container or the local one on the Ceph monitors?

-- Michael

This message and its attachments are from Data Dimensions and are intended only 
for the use of the individual or entity to which it is addressed, and may 
contain information that is privileged, confidential, and exempt from 
disclosure under applicable law. If the reader of this message is not the 
intended recipient, or the employee or agent responsible for delivering the 
message to the intended recipient, you are hereby notified that any 
dissemination, distribution, or copying of this communication is strictly 
prohibited. If you have received this communication in error, please notify the 
sender immediately and permanently delete the original email and destroy any 
copies or printouts of this email as well as any attachments.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Some questions about cephadm

2024-02-26 Thread wodel youchi
Hi;

So that was it: create the initial-ceph.conf and use the --config option.
Now all images come from the local registry.
Thank you all for your help.

Regards.

Le lun. 26 févr. 2024 à 14:09, wodel youchi  a
écrit :

> I've read that, but I didn't find how to use it.
> Should I use the --config CONFIG_FILE option?
>
> Le lun. 26 févr. 2024 à 13:59, Robert Sander 
> a écrit :
>
>> Hi,
>>
>> On 2/26/24 13:22, wodel youchi wrote:
>> >
>> > No didn't work, the bootstrap is still downloading the images from quay.
>>
>> For the image locations of the monitoring stack you have to create an
>> initial ceph.conf as mentioned in the chapter you referred to
>> earlier:
>>
>> https://docs.ceph.com/en/reef/cephadm/install/#deployment-in-an-isolated-environment
>>
>> Regards
>> --
>> Robert Sander
>> Heinlein Consulting GmbH
>> Schwedter Str. 8/9b, 10119 Berlin
>>
>> https://www.heinlein-support.de
>>
>> Tel: 030 / 405051-43
>> Fax: 030 / 405051-19
>>
>> Amtsgericht Berlin-Charlottenburg - HRB 220009 B
>> Geschäftsführer: Peer Heinlein - Sitz: Berlin
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Some questions about cephadm

2024-02-26 Thread wodel youchi
I've read that, but I didn't find how to use it.
Should I use the --config CONFIG_FILE option?

Le lun. 26 févr. 2024 à 13:59, Robert Sander 
a écrit :

> Hi,
>
> On 2/26/24 13:22, wodel youchi wrote:
> >
> > No didn't work, the bootstrap is still downloading the images from quay.
>
> For the image locations of the monitoring stack you have to create an
> initial ceph.conf as mentioned in the chapter you referred to
> earlier:
>
> https://docs.ceph.com/en/reef/cephadm/install/#deployment-in-an-isolated-environment
>
> Regards
> --
> Robert Sander
> Heinlein Consulting GmbH
> Schwedter Str. 8/9b, 10119 Berlin
>
> https://www.heinlein-support.de
>
> Tel: 030 / 405051-43
> Fax: 030 / 405051-19
>
> Amtsgericht Berlin-Charlottenburg - HRB 220009 B
> Geschäftsführer: Peer Heinlein - Sitz: Berlin
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Some questions about cephadm

2024-02-26 Thread John Mulligan
> 
> I have another problem, the local registry. I deployed a local registry
> with the required images, then I used cephadm-ansible to prepare my hosts
> and inject the local registry url into /etc/container/registry.conf file
> 
> Then I tried to deploy using this command on the admin node:
> cephadm --image 192.168.2.36:4000/ceph/ceph:v17 bootstrap --mon-ip
> 10.1.0.23 --cluster-network 10.2.0.0/16
> 
> After the boot strap I found that it still downloads the images from the
> internet, even the ceph image itself, I see two images one from my registry
> the second from quay.
> 
> There is a section that talks about using a local registry here
> https://docs.ceph.com/en/reef/cephadm/install/#deployment-in-an-isolated-environment,
> but it's not clear especially about the other images. It talks
> about preparing a temporary file named initial-ceph.conf, then it does not
> use it???!!!
> 
> Could you help?
> 


That's right, the docs are not clear here. I know I had a previous 
conversation about this but I can't find it nor recall if the conversation 
happened on the mailing list, slack, or a tracker issue.

Regardless, the option you need to pass the initial conf to bootstrap is
`--config`/`-c`. As per that section, you customize the image names in that
file and they will then be pulled from your local registry.
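
A rough sketch of that flow, reusing the registry address, IPs and image tags
already mentioned in this thread (the mgr/cephadm/container_image_* option
names come from the isolated-deployment docs; please double-check them for
your release):

cat > initial-ceph.conf <<EOF
[mgr]
mgr/cephadm/container_image_prometheus = 192.168.2.36:4000/prometheus/prometheus:v2.43.0
mgr/cephadm/container_image_grafana = 192.168.2.36:4000/ceph/ceph-grafana:9.4.7
mgr/cephadm/container_image_alertmanager = 192.168.2.36:4000/prometheus/alertmanager:v0.25.0
mgr/cephadm/container_image_node_exporter = 192.168.2.36:4000/prometheus/node-exporter:v1.5.0
EOF

cephadm --image 192.168.2.36:4000/ceph/ceph:v17 bootstrap \
  --config initial-ceph.conf \
  --registry-url 192.168.2.36:4000 --registry-username admin --registry-password admin \
  --mon-ip 10.1.0.23 --cluster-network 10.2.0.0/16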



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Some questions about cephadm

2024-02-26 Thread Robert Sander

Hi,

On 2/26/24 13:22, wodel youchi wrote:


No didn't work, the bootstrap is still downloading the images from quay.


For the image locations of the monitoring stack you have to create an 
initial ceph.conf as mentioned in the chapter you referred to 
earlier: 
https://docs.ceph.com/en/reef/cephadm/install/#deployment-in-an-isolated-environment


Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Some questions about cephadm

2024-02-26 Thread wodel youchi
Hi,

No, it didn't work; the bootstrap is still downloading the images from quay.
PS: my local registry does not require any login/password authentication; I
used fake credentials since it's mandatory to provide them.

cephadm --image 192.168.2.36:4000/ceph/ceph:v17 bootstrap --registry-url
192.168.2.36:4000  --registry-username admin --registry-password admin
--mon-ip 10.1.0.23 --cluster-network 10.2.0.0/16

[root@controllera ~]# podman images

REPOSITORYTAG IMAGE ID  CREATED
   SIZE
192.168.2.36:4000/ceph/ceph   v17 56993389bc29  11 days ago
   1.29 GB
quay.io/ceph/ceph-grafana 9.4.7   954c08fa6188  2 months ago
  647 MB
quay.io/prometheus/prometheus v2.43.0 a07b618ecd1d  11 months ago
 235 MB
quay.io/prometheus/alertmanager   v0.25.0 c8568f914cd2  14 months ago
 66.5 MB
quay.io/prometheus/node-exporter  v1.5.0  0da6a335fe13  15 months ago
 23.9 MB



Regards.

Le lun. 26 févr. 2024 à 11:42, Robert Sander 
a écrit :

> Hi,
>
> On 26.02.24 11:08, wodel youchi wrote:
>
> > Then I tried to deploy using this command on the admin node:
> > cephadm --image 192.168.2.36:4000/ceph/ceph:v17 bootstrap --mon-ip
> > 10.1.0.23 --cluster-network 10.2.0.0/16
> >
> > After the boot strap I found that it still downloads the images from the
> > internet, even the ceph image itself, I see two images one from my
> registry
> > the second from quay.
>
> To quote the docs: you can run cephadm bootstrap -h to see all of
> cephadm’s available options.
>
> These options are available:
>
>--registry-url REGISTRY_URL
>  url for custom registry
>--registry-username REGISTRY_USERNAME
>  username for custom registry
>--registry-password REGISTRY_PASSWORD
>  password for custom registry
>--registry-json REGISTRY_JSON
>  json file with custom registry login info (URL,
> Username, Password)
>
> Regards
> --
> Robert Sander
> Heinlein Consulting GmbH
> Schwedter Str. 8/9b, 10119 Berlin
>
> http://www.heinlein-support.de
>
> Tel: 030 / 405051-43
> Fax: 030 / 405051-19
>
> Zwangsangaben lt. §35a GmbHG:
> HRB 220009 B / Amtsgericht Berlin-Charlottenburg,
> Geschäftsführer: Peer Heinlein -- Sitz: Berlin
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Some questions about cephadm

2024-02-26 Thread Robert Sander

Hi,

On 26.02.24 11:08, wodel youchi wrote:


Then I tried to deploy using this command on the admin node:
cephadm --image 192.168.2.36:4000/ceph/ceph:v17 bootstrap --mon-ip
10.1.0.23 --cluster-network 10.2.0.0/16

After the boot strap I found that it still downloads the images from the
internet, even the ceph image itself, I see two images one from my registry
the second from quay.


To quote the docs: you can run cephadm bootstrap -h to see all of cephadm’s 
available options.

These options are available:

  --registry-url REGISTRY_URL
url for custom registry
  --registry-username REGISTRY_USERNAME
username for custom registry
  --registry-password REGISTRY_PASSWORD
password for custom registry
  --registry-json REGISTRY_JSON
json file with custom registry login info (URL, 
Username, Password)

Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 220009 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-26 Thread Matthew Leonard (BLOOMBERG/ 120 PARK)
Glad to hear it all worked out for you! 

From: nguyenvand...@baoviet.com.vn At: 02/26/24 05:32:32 UTC-5:00To:  
ceph-users@ceph.io
Subject: [ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in 
recovering

Dear Mr Eugen, Mr Matthew, Mr David, Mr Anthony

My System is UP.

Thank you so much. We got a lot of support from all of you; amazing, kind support 
from top Ceph professionals. 
Hope we have a chance to cooperate in the future. And if you travel to Vietnam 
in the future, let me know. I'll be your local tour guide and we can get some beers.

Once again, Thank you


The solution is:

Try :
ceph config set mds mds_deny_all_reconnect true

and restart MDS.

After that (mds active) :
ceph config rm mds mds_deny_all_reconnect

From Mr David
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-26 Thread nguyenvandiep
Dear Mr Eugen, Mr Matthew, Mr David, Mr Anthony

My System is UP.

Thank you so much. We got a lot of support from all of you; amazing, kind support 
from top Ceph professionals. 
Hope we have a chance to cooperate in the future. And if you travel to Vietnam 
in the future, let me know. I'll be your local tour guide and we can get some beers.

Once again, Thank you



The solution is:

Try :
ceph config set mds mds_deny_all_reconnect true

and restart MDS.

After that (mds active) :
ceph config rm mds mds_deny_all_reconnect

From Mr David
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: What exactly does the osd pool repair funtion do?

2024-02-26 Thread Eugen Block

Hi,

I'm not a dev, but as I understand it, the command would issue a 'pg  
repair' on each (primary) PG of the provided pool. It might be useful  
if you have multiple (or even many) inconsistent PGs in a pool. But  
I've never used that and this is just a hypothesis.
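
For comparison, the manual route would be to repair each inconsistent PG
yourself, something like this rough sketch (assumes the jq tool is available;
<poolname> is a placeholder):

for pg in $(rados list-inconsistent-pg <poolname> | jq -r '.[]'); do
    ceph pg repair "$pg"
done

The pool-level command presumably just saves you that loop, issuing the repair
across the pool's PGs instead of only the inconsistent ones.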


Regards,
Eugen

Zitat von Aleksander Pähn :


What exactly does the osd pool repair function do?
Documentation is not clear.

Kind regards,
AP


This e-mail may contain information that is privileged or  
confidential. If you are not the intended recipient, please delete  
the e-mail and any attachments and notify us immediately.


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Some questions about cephadm

2024-02-26 Thread wodel youchi
Thank you all for your help.

@Adam
From the reading you gave me I have understood the following :
1 - Set osd_memory_target_autotune to true then set
autotune_memory_target_ratio to 0.2
2 - Or do the math. For my setup I have 384 GB per node, each node has 4
NVMe disks of 7.6 TB, 0.2 of memory is 19.5 GB. So each OSD will have 19 GB of
memory.

Question : Should I take into account the size of the disk when calculating
the required memory for an OSD?


I have another problem, the local registry. I deployed a local registry
with the required images, then I used cephadm-ansible to prepare my hosts
and inject the local registry url into /etc/container/registry.conf file

Then I tried to deploy using this command on the admin node:
cephadm --image 192.168.2.36:4000/ceph/ceph:v17 bootstrap --mon-ip
10.1.0.23 --cluster-network 10.2.0.0/16

After the boot strap I found that it still downloads the images from the
internet, even the ceph image itself, I see two images one from my registry
the second from quay.

There is a section that talks about using a local registry here
https://docs.ceph.com/en/reef/cephadm/install/#deployment-in-an-isolated-environment,
but it's not clear especially about the other images. It talks about
preparing a temporary file named initial-ceph.conf, then it does not use
it???!!!

Could you help?

Regards.

Le jeu. 22 févr. 2024 à 11:10, Eugen Block  a écrit :

> Hi,
>
> just responding to the last questions:
>
> >- After the bootstrap, the Web interface was accessible :
> >   - How can I access the wizard page again? If I don't use it the
> first
> >   time I could not find another way to get it.
>
> I don't know how to recall the wizard, but you should be able to
> create a new dashboard user with your desired role (e. g.
> administrator) from the CLI:
>
> ceph dashboard ac-user-create  [] -i
> 
>
> >   - I had a problem with telemetry, I did not configure telemetry,
> then
> >   when I clicked the button, the web gui became
> inaccessible.!!!
>
> You can see what happened in the active MGR log.
>
> Zitat von wodel youchi :
>
> > Hi,
> >
> > I have some questions about ceph using cephadm.
> >
> > I used to deploy ceph using ceph-ansible, now I have to move to cephadm,
> I
> > am in my learning journey.
> >
> >
> >- How can I tell my cluster that it's a part of an HCI deployment?
> With
> >ceph-ansible it was easy using is_hci : yes
> >- The documentation of ceph does not indicate what versions of
> grafana,
> >prometheus, ...etc should be used with a certain version.
> >   - I am trying to deploy Quincy, I did a bootstrap to see what
> >   containers were downloaded and their version.
> >   - I am asking because I need to use a local registry to deploy
> those
> >   images.
> >- After the bootstrap, the Web interface was accessible :
> >   - How can I access the wizard page again? If I don't use it the
> first
> >   time I could not find another way to get it.
> >   - I had a problem with telemetry, I did not configure telemetry,
> then
> >   when I clicked the button, the web gui became
> inaccessible.!!!
> >
> >
> >
> > Regards.
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Scrub stuck and 'pg has invalid (post-split) stat'

2024-02-26 Thread Eugen Block

Hi,

thanks for the context. Was there any progress over the weekend? The  
hanging commands seem to be MGR related, and there's only one in your  
cluster according to your output. Can you deploy a second one  
manually, then adopt it with cephadm? Can you add 'ceph versions' as  
well?



Zitat von florian.le...@socgen.com:


Hi,
A bit of history might help to understand why we have the cache tier.

We have been running OpenStack on top of Ceph for many years now (we
started with Mimic, then upgraded to Nautilus about two years ago, and
today upgraded to Pacific). At the beginning of the setup we had a mix
of HDD+SSD devices in HCI mode for OpenStack Nova. After the upgrade to
Nautilus we did a hardware refresh with brand-new NVMe devices and
transitioned from mixed devices to NVMe. But we were never able to
evict all the data from the vms_cache pools (even when being aggressive
with the eviction; the last resort would have been to stop all the
virtual instances, and that was not an option for our customers), so we
decided to move on, set cache-mode proxy, and have served data from
NVMe only since then. And it has been like this for a year and a half.


But today, after the upgrade, the situation is that we cannot query
any stats (with "ceph pg x.x query"), rados queries hang, and scrubs
hang even though all PGs are "active+clean". There is no client
activity reported by the cluster, nor any recovery or rebalance. Some
other commands hang as well, e.g. "ceph balancer status".


--
bash-4.2$ ceph -s
  cluster:
id: 
health: HEALTH_WARN
mon is allowing insecure global_id reclaim
noscrub,nodeep-scrub,nosnaptrim flag(s) set
18432 slow ops, oldest one blocked for 7626 sec, daemons  
[osd.0,osd.1,osd.10,osd.11,osd.112,osd.113,osd.118,osd.119,osd.120,osd.122]... have slow  
ops.


  services:
mon: 3 daemons, quorum mon1,mon2,mon3(age 36m)
mgr: bm9612541(active, since 39m)
osd: 72 osds: 72 up (since 97m), 72 in (since 9h)
 flags noscrub,nodeep-scrub,nosnaptrim

  data:
pools:   8 pools, 2409 pgs
objects: 14.64M objects, 92 TiB
usage:   276 TiB used, 143 TiB / 419 TiB avail
pgs: 2409 active+clean
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pg repair doesn't fix "got incorrect hash on read" / "candidate had an ec hash mismatch"

2024-02-26 Thread Eugen Block

Hi,

I think your approach makes sense. But I'm wondering if moving only  
the problematic PGs to different OSDs could have an effect as well. I  
assume that moving the 2 PGs is much quicker than moving all BUT those  
2 PGs. If that doesn't work you could still fall back to draining the  
entire OSDs (except for the problematic PG).
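
Just as a sketch of what I mean (target OSDs 45 and 87 are placeholders,
pick ones that satisfy the crush rule; upmap needs
require-min-compat-client luminous or newer):

  # move shard s0 off osd.223 and shard s2 off osd.269
  ceph osd pg-upmap-items 404.bc 223 45 269 87

  # check where the PG is mapped now and watch the backfill
  ceph pg map 404.bc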


Regards,
Eugen

Zitat von Kai Stian Olstad :


Hi,

Does no one have any comment at all?
I'm not picky, so any speculation or guessing ("I would", "I wouldn't",
"it should work" and so on) would be highly appreciated.



Since 4 out of 6 shards in the EC 4+2 are OK and ceph pg repair doesn't
solve it, I think the following might work.


pg 404.bc acting [223,297,269,276,136,197]

- Use pgremapper to move all PGs on OSD 223 and 269, except 404.bc, to
other OSDs.

- Set min_size to 4: ceph osd pool set default.rgw.buckets.data min_size 4
- Stop osd.223 and osd.269

What I hope will happen is that Ceph then recreates the 404.bc shards
s0 (osd.223) and s2 (osd.269), since they are now down, from the
remaining shards

s1(osd.297), s3(osd.276), s4(osd.136) and s5(osd.197)
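
For the last two steps, something like this (assuming the cluster is
cephadm-managed; otherwise stop the corresponding systemd units on the
hosts instead):

  ceph osd pool set default.rgw.buckets.data min_size 4

  ceph orch daemon stop osd.223
  ceph orch daemon stop osd.269

  # after the s0/s2 shards have been recreated elsewhere, restore min_size
  ceph osd pool set default.rgw.buckets.data min_size 5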


_Any_ comment is highly appreciated.

-
Kai Stian Olstad


On 21.02.2024 13:27, Kai Stian Olstad wrote:

Hi,

Short summary

PG 404.bc is an EC 4+2 where s0 and s2 report hash mismatch for 698 objects.
Ceph pg repair doesn't fix it, because if you run deep-scrub on the
PG after repair is finished, it still reports scrub errors.


Why can't ceph pg repair fix this? With 4 out of 6 shards intact it should
be able to reconstruct the corrupted ones.
Is there a way to fix this? Like deleting the s0 and s2 shards so Ceph is
forced to recreate them?



Long detailed summary

A short backstory.
* This is the aftermath of problems with mclock, see the post "17.2.7:
Backfilling deadlock / stall / stuck / standstill" [1].

 - 4 OSDs had a few bad sectors; we set all 4 out and the cluster stopped.
 - The solution was to swap from mclock to wpq and restart all OSDs.
 - When all backfilling was finished, all 4 OSDs were replaced.
 - osd.223 and osd.269 were 2 of the 4 OSDs that were replaced.


PG / pool 404 is EC 4+2 default.rgw.buckets.data

9 days after osd.223 and osd.269 were replaced, deep-scrub was run
and reported errors

   ceph status
   ---
   HEALTH_ERR 1396 scrub errors; Possible data damage: 1 pg inconsistent
   [ERR] OSD_SCRUB_ERRORS: 1396 scrub errors
   [ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
   pg 404.bc is active+clean+inconsistent, acting  
[223,297,269,276,136,197]


I then run repair
   ceph pg repair 404.bc

And ceph status showed this
   ceph status
   ---
   HEALTH_WARN Too many repaired reads on 2 OSDs
   [WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 2 OSDs
   osd.223 had 698 reads repaired
   osd.269 had 698 reads repaired

But osd.223 and osd.269 are new disks, and the disks have no SMART
errors or any I/O errors in the OS logs.

So I tried to run deep-scrub again on the PG.
   ceph pg deep-scrub 404.bc

And got this result.

   ceph status
   ---
   HEALTH_ERR 1396 scrub errors; Too many repaired reads on 2 OSDs;  
Possible data damage: 1 pg inconsistent

   [ERR] OSD_SCRUB_ERRORS: 1396 scrub errors
   [WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 2 OSDs
   osd.223 had 698 reads repaired
   osd.269 had 698 reads repaired
   [ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
   pg 404.bc is  
active+clean+scrubbing+deep+inconsistent+repair, acting  
[223,297,269,276,136,197]


698 + 698 = 1396, so the same number of errors.

Run repair again on 404.bc and ceph status is

   HEALTH_WARN Too many repaired reads on 2 OSDs
   [WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 2 OSDs
   osd.223 had 1396 reads repaired
   osd.269 had 1396 reads repaired

So even when the repair finishes it doesn't fix the problem, since the
errors reappear after a deep-scrub.


The logs for osd.223 and osd.269 contain "got incorrect hash on
read" and "candidate had an ec hash mismatch" for 698 unique objects.
But I only show the log for 1 of the 698 objects; the log is the
same for the other 697 objects.


   osd.223 log (only showing 1 of the 698 objects, named
2021-11-08T19%3a43%3a50,145489260+00%3a00)

   ---
   Feb 20 10:31:00 ceph-hd-003 ceph-osd[3665432]: osd.223 pg_epoch:  
231235 pg[404.bcs0( v 231235'1636919  
(231078'1632435,231235'1636919] local-lis/les=226263/226264  
n=296580 ec=36041/27862 lis/c=226263/226263 les/c/f=226264/230954/0  
sis=226263) [223,297,269,276,136,197]p223(0) r=0 lpr=226263  
crt=231235'1636919 lcod 231235'1636918 mlcod 231235'1636918  
active+clean+scrubbing+deep+inconsistent+repair [ 404.bcs0:   
REQ_SCRUB ]  MUST_REPAIR MUST_DEEP_SCRUB MUST_SCRUB planned  
REQ_SCRUB] _scan_list   
404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head got incorrect hash on read 0xc5d1dd1b !=  expected  
0x7c2f86d7
   Feb 20 10:31:01 ceph-hd-003 ceph-osd[3665432]:  

[ceph-users] Re: Is a direct Octopus to Reef Upgrade Possible?

2024-02-26 Thread Eugen Block

Hi,

no, you can't go directly from O to R, you need to upgrade to Q first.  
Technically it might be possible but it's not supported.
Your approach of first adopting the cluster with cephadm is my preferred
way as well.
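
A rough outline of that route (the exact adopt steps depend on how
ceph-ansible deployed the daemons, and the image tags below are only
examples, check the latest point releases):

  # convert the existing daemons to cephadm management, host by host
  cephadm adopt --style legacy --name mon.$(hostname -s)
  cephadm adopt --style legacy --name mgr.$(hostname -s)
  # ... then the OSDs; MDS and RGW daemons are redeployed via 'ceph orch apply'

  # staged upgrade, one supported jump at a time
  ceph orch upgrade start --image quay.io/ceph/ceph:v17.2.7   # Octopus -> Quincy
  ceph orch upgrade status
  ceph orch upgrade start --image quay.io/ceph/ceph:v18.2.1   # Quincy -> Reef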


Regards,
Eugen

Zitat von "Alex Hussein-Kershaw (HE/HIM)" :


Hi ceph-users,

I currently use Ceph Octopus to provide CephFS & S3 Storage for our  
app servers, deployed in containers by ceph-ansible. I'm planning to  
take an upgrade to get off Ceph Octopus as it's EOL.


I'd love to go straight to Reef, but I vaguely remember reading a
statement that an upgrade can only span two major versions. I've
failed to find that statement again.


Is it possible to go directly from Octopus straight to Reef?

I think a sensible approach here is to first migrate our existing  
deployments to use cephadm, and then use cephadm to upgrade. Any
advice on this is very welcome.


Many thanks,
Alex

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ambigous mds behind on trimming and slowops (ceph 17.2.5 and rook operator 1.10.8)

2024-02-26 Thread Dhairya Parmar
Hi,

May I know which version is being used in the cluster?

It started about 2 hours after one of the active MDS daemons crashed.

Do we know the reason for the crash?

Please share more info; `ceph -s` and the MDS logs should reveal more insights.
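
For example, something along these lines should give a first picture (the
crash id, MDS name and rook namespace below are placeholders):

  ceph fs status
  ceph crash ls
  ceph crash info <crash-id>            # details of the MDS crash
  ceph tell mds.<active-mds> perf dump  # journal/trimming counters are under mds_log

  # with rook, the MDS logs live on the MDS pods, e.g.:
  kubectl -n rook-ceph logs <rook-ceph-mds-pod-name>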

--
*Dhairya Parmar*

Associate Software Engineer, CephFS

IBM, Inc.



On Fri, Feb 23, 2024 at 8:13 PM  wrote:

> Team,
>
> Guys,
>
> We were facing a CephFS volume mount issue, and ceph status was showing
>  mds slow requests
>  Mds behind on trimming
>
> After restarting the MDS pods it was resolved,
> but we wanted to know the root cause of this.
> It started about 2 hours after one of the active MDS daemons crashed.
> So can an active MDS crash cause this issue?
>
>
> Please provide your inputs anyone
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io