[ceph-users] Re: MDS Behind on Trimming...

2024-03-27 Thread Xiubo Li


On 3/28/24 04:03, Erich Weiler wrote:

Hi All,

I've been battling this for a while and I'm not sure where to go from 
here.  I have a Ceph health warning as such:


# ceph -s
  cluster:
    id: 58bde08a-d7ed-11ee-9098-506b4b4da440
    health: HEALTH_WARN
    1 MDSs report slow requests


There were slow requests. I suspect the behind-on-trimming warning was 
caused by them.


Could you share the logs about the slow requests? What are they?
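
For example, something like the following should capture them (a rough sketch,
assuming the MDS name from the output below, mds.slugfs.pr-md-01.xdtppo, and
shell access to the host running it):

# health detail plus any cluster log lines mentioning slow requests
ceph health detail
ceph log last 200 | grep -i 'slow request'
# dump the requests currently stuck in the MDS (run on the MDS host)
ceph daemon mds.slugfs.pr-md-01.xdtppo dump_ops_in_flight
ceph daemon mds.slugfs.pr-md-01.xdtppo dump_historic_ops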

Thanks



1 MDSs behind on trimming

  services:
    mon: 5 daemons, quorum 
pr-md-01,pr-md-02,pr-store-01,pr-store-02,pr-md-03 (age 5d)

    mgr: pr-md-01.jemmdf(active, since 3w), standbys: pr-md-02.emffhz
    mds: 1/1 daemons up, 2 standby
    osd: 46 osds: 46 up (since 9h), 46 in (since 2w)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 1313 pgs
    objects: 260.72M objects, 466 TiB
    usage:   704 TiB used, 424 TiB / 1.1 PiB avail
    pgs:     1306 active+clean
             4    active+clean+scrubbing+deep
             3    active+clean+scrubbing

  io:
    client:   123 MiB/s rd, 75 MiB/s wr, 109 op/s rd, 1.40k op/s wr

And the specifics are:

# ceph health detail
HEALTH_WARN 1 MDSs report slow requests; 1 MDSs behind on trimming
[WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
    mds.slugfs.pr-md-01.xdtppo(mds.0): 99 slow requests are blocked > 
30 secs

[WRN] MDS_TRIM: 1 MDSs behind on trimming
    mds.slugfs.pr-md-01.xdtppo(mds.0): Behind on trimming (13884/250) 
max_segments: 250, num_segments: 13884


That "num_segments" number slowly keeps increasing.  I suspect I just 
need to tell the MDS servers to trim faster but after hours of 
googling around I just can't figure out the best way to do it. The 
best I could come up with was to decrease "mds_cache_trim_decay_rate" 
from 1.0 to .8 (to start), based on this page:


https://www.suse.com/support/kb/doc/?id=19740

But it doesn't seem to help, maybe I should decrease it further? I am 
guessing this must be a common issue...?  I am running Reef on the MDS 
servers, but most clients are on Quincy.


Thanks for any advice!

cheers,
erich

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Failed adding back a node

2024-03-27 Thread Alex
Hi Adam!

In addition to my earlier question about whether there is a way to try a more
targeted upgrade first, so we don't risk accidentally breaking the
entire production cluster:

`ceph config dump | grep container_image` shows:

global     basic     container_image                            registry.redhat.io/rhceph/rhceph-5-rhel8@sha256:a193b0de114d19d2efd8750046b5d25da07e2c570e3c4eb4bd93e6de4b90a25a  *
mon.mon01  basic     container_image                            registry.redhat.io/rhceph/rhceph-5-rhel8:latest  *
mon.mon03  basic     container_image                            registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160  *
mgr        advanced  mgr/cephadm/container_image_alertmanager   registry.redhat.io/openshift4/ose-prometheus-alertmanager:v4.6  *
mgr        advanced  mgr/cephadm/container_image_base           registry.redhat.io/rhceph/rhceph-5-rhel8
mgr        advanced  mgr/cephadm/container_image_grafana        registry.redhat.io/rhceph/rhceph-5-dashboard-rhel8:5  *
mgr        advanced  mgr/cephadm/container_image_node_exporter  registry.redhat.io/openshift4/ose-prometheus-node-exporter:v4.6  *
mgr        advanced  mgr/cephadm/container_image_prometheus     registry.redhat.io/openshift4/ose-prometheus:v4.6  *
mgr.mon01  basic     container_image                            registry.redhat.io/rhceph/rhceph-5-rhel8:latest  *
mgr.mon03  basic     container_image                            registry.redhat.io/rhceph/rhceph-5-rhel8@sha256:a193b0de114d19d2efd8750046b5d25da07e2c570e3c4eb4bd93e6de4b90a25a  *

And do you think I'd still need to remove that one OSD that I successfully
created but never added, or would it get "pulled in" when I add the
other 19 OSDs?

`podman image list` shows:
REPOSITORY                                TAG     IMAGE ID      CREATED      SIZE
registry.redhat.io/rhceph/rhceph-5-rhel8  latest  1d636b23ab3e  8 weeks ago  1.02 GB
so would I be running `ceph orch upgrade start --image 1d636b23ab3e`?

Thanks again.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Failed adding back a node

2024-03-27 Thread Alex
Thanks! Is there a way of trying out the update on one osd first to make
sure we don't nuke the entire production cluster?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Failed adding back a node

2024-03-27 Thread Adam King
From the ceph versions output I can see

"osd": {
"ceph version 16.2.10-160.el8cp
(6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 160
},

It seems like all the OSD daemons on this cluster are using that
16.2.10-160 image, and I'm guessing most of them are running, so it must
have existed at some point. Curious if `ceph config dump | grep
container_image` will show a different image setting for the OSD. Anyway,
in terms of moving forward it might be best to try to get all the daemons
onto an image you know works. I also see both 16.2.10-208 and 16.2.10-248
listed as versions, which implies there are two different images being used
even between the other daemons. Unless there's a reason for all these
different images, I'd just pick the most up to date one, that you know can
be pulled on all hosts, and do a `ceph orch upgrade start --image <image>`.
That would get all the daemons onto that single image, and might fix the
broken OSDs that are failing to pull the 16.2.10-160 image.
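
A rough sketch of that workflow (the ":latest" tag below is only an example;
substitute whatever image you know every host can pull):

# see which image settings are pinned globally and per daemon
ceph config dump | grep container_image
# optionally drop stale per-daemon overrides so everything follows the global value
ceph config rm mon.mon03 container_image
# move the whole cluster onto one known-good image and watch progress
ceph orch upgrade start --image registry.redhat.io/rhceph/rhceph-5-rhel8:latest
ceph orch upgrade status

If you'd rather test a single daemon first, `ceph orch daemon redeploy
<daemon-name> <image>` can redeploy just that one daemon on the chosen image
before you commit the whole cluster to it.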

On Wed, Mar 27, 2024 at 8:56 PM Alex  wrote:

> Hello.
>
> We're rebuilding our OSD nodes.
> Once cluster worked without any issues, this one is being stubborn
>
> I attempted to add one back to the cluster and am seeing the error below
> in our logs:
>
> cephadm ['--image',
> 'registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160', 'pull']
> 2024-03-27 19:30:53,901 7f49792ed740 DEBUG /bin/podman: 4.6.1
> 2024-03-27 19:30:53,905 7f49792ed740 INFO Pulling container image
> registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160...
> 2024-03-27 19:30:54,045 7f49792ed740 DEBUG /bin/podman: Trying to pull
> registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160...
> 2024-03-27 19:30:54,266 7f49792ed740 DEBUG /bin/podman: Error:
> initializing source
> docker://registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160: reading
> manifest 16.2.10-160 in registry.redhat.io/rhceph/rhceph-5-rhel8:
> manifest unknown
> 2024-03-27 19:30:54,270 7f49792ed740 INFO Non-zero exit code 125 from
> /bin/podman pull registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160
> 2024-03-27 19:30:54,270 7f49792ed740 INFO /bin/podman: stderr Trying
> to pull registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160...
> 2024-03-27 19:30:54,270 7f49792ed740 INFO /bin/podman: stderr Error:
> initializing source
> docker://registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160: reading
> manifest 16.2.10-160 in registry.redhat.io/rhceph/rhceph-5-rhel8:
> manifest unknown
> 2024-03-27 19:30:54,270 7f49792ed740 ERROR ERROR: Failed command:
> /bin/podman pull registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160
>
> $ ceph versions
> {
> "mon": {
> "ceph version 16.2.10-208.el8cp
> (791f73fbb4bbca2ffe53a2ea0f8706dbffadcc0b) pacific (stable)": 1,
> "ceph version 16.2.10-248.el8cp
> (0edb63afd9bd3edb64f2e0031b77e62f4896) pacific (stable)": 2
> },
> "mgr": {
> "ceph version 16.2.10-208.el8cp
> (791f73fbb4bbca2ffe53a2ea0f8706dbffadcc0b) pacific (stable)": 1,
> "ceph version 16.2.10-248.el8cp
> (0edb63afd9bd3edb64f2e0031b77e62f4896) pacific (stable)": 2
> },
> "osd": {
> "ceph version 16.2.10-160.el8cp
> (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 160
> },
> "mds": {},
> "rgw": {
> "ceph version 16.2.10-208.el8cp
> (791f73fbb4bbca2ffe53a2ea0f8706dbffadcc0b) pacific (stable)": 3
> },
> "overall": {
> "ceph version 16.2.10-160.el8cp
> (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 160,
> "ceph version 16.2.10-208.el8cp
> (791f73fbb4bbca2ffe53a2ea0f8706dbffadcc0b) pacific (stable)": 5,
> "ceph version 16.2.10-248.el8cp
> (0edb63afd9bd3edb64f2e0031b77e62f4896) pacific (stable)": 4
> }
> }
>
> I don't understand why it's trying to pull 16.2.10-160 which doesn't exist.
>
> registry.redhat.io/rhceph/rhceph-5-dashboard-rhel8 5 93b3137e7a65 11
> months ago 696 MB
> registry.redhat.io/rhceph/rhceph-5-rhel8 5-416 838cea16e15c 11 months
> ago 1.02 GB
> registry.redhat.io/openshift4/ose-prometheus v4.6 ec2d358ca73c 17
> months ago 397 MB
>
>
> This happens using cepadm-ansible as well as
> $ ceph orch ls --export --service_name xxx > xxx.yml
> $ sudo ceph orch apply -i xxx.yml
>
> I tried ceph orch daemon add osd host:/dev/sda
> which surprisingly created a volume on host:/dev/sda and created an
> osd i can see in
> $ ceph osd tree
>
> but it did not get added to the host, I suspect because of the same Podman
> error, and now I'm unable to remove it.
> $ ceph orch osd rm
> does not work even with the --force flag.
>
> I stopped the removal with
> $ ceph orch osd rm stop
> after 10+ minutes
>
> I'm considering running $ ceph osd purge osd# --force but worried it
> may only make things worse.
> ceph -s shows that osd but not up or in.
>
> Thanks, and looking forward to any advice!
> ___
> ceph-users mailing list -- 

[ceph-users] Re: Call for Interest: Managed SMB Protocol Support

2024-03-27 Thread Angelo Hongens

Yes, I'd love this!

A lot of companies want Samba for simple file access from Windows/Mac 
clients. I know quite a few companies that buy NetApp as 'easy SMB storage'.


Having Ceph do built-in (or bolt-on) Samba, instead of having to manage 
external Samba clusters, would be nice and would make it a more accessible 
replacement for that kind of storage.


And the result would be better integration between samba and ceph. 
Perhaps in code, but also in documentation and example configs.


I set up my own two-node physical Samba cluster, with Gluster hosting the 
CTDB lock file (plus a third machine, a VM, acting as the third node in 
the Gluster cluster). According to 45drives, saving the CTDB lock file 
in CephFS is a bad idea, and doing some rados-mutex thing was too 
complex for me. This whole solution feels a bit hackish, although it 
works wonders. Having a unified, tried-and-tested solution where everyone 
is doing the same thing sounds great!


Angelo.


On 21/03/2024 15:12, John Mulligan wrote:

Hello Ceph List,

I'd like to formally let the wider community know of some work I've been
involved with for a while now: adding Managed SMB Protocol Support to Ceph.
SMB being the well known network file protocol native to Windows systems and
supported by MacOS (and Linux). The other key word "managed" meaning
integrating with Ceph management tooling - in this particular case cephadm for
orchestration and eventually a new MGR module for managing SMB shares.

The effort is still in its very early stages. We have a PR adding initial
support for Samba Containers to cephadm [1] and a prototype for an smb MGR
module [2]. We plan on using container images based on the samba-container
project [3] - a team I am already part of. What we're aiming for is a feature
set similar to the current NFS integration in Ceph, but with a focus on
bridging non-Linux/Unix clients to CephFS using a protocol built into those
systems.

A few major features we have planned include:
* Standalone servers (internally defined users/groups)
* Active Directory Domain Member Servers
* Clustered Samba support
* Exporting Samba stats via Prometheus metrics
* A `ceph` cli workflow loosely based on the nfs mgr module

I wanted to share this information in case there's wider community interest in
this effort. I'm happy to take your questions / thoughts / suggestions in this
email thread, via Ceph slack (or IRC), or feel free to attend a Ceph
Orchestration weekly meeting! I try to regularly attend, and we sometimes discuss
design aspects of the smb effort there. It's on the Ceph Community Calendar.
Thanks!


[1] - https://github.com/ceph/ceph/pull/55068
[2] - https://github.com/ceph/ceph/pull/56350
[3] - https://github.com/samba-in-kubernetes/samba-container/


Thanks for reading,
--John Mulligan


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



[ceph-users] Re: Cephadm host keeps trying to set osd_memory_target to less than minimum

2024-03-27 Thread Adam King
 I missed a step in the calculation. The total_memory_kb I mentioned
earlier is also multiplied by the value of the
mgr/cephadm/autotune_memory_target_ratio before doing the subtractions for
all the daemons. That value defaults to 0.7. That might explain it seeming
like it's getting a value lower than expected. Beyond that, I think I'd
need a list of the daemon types and counts on that host to try and work
through what it's doing.
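
As an illustration only (the 24 GiB figure and the daemon mix below are
assumptions, not taken from your host), the arithmetic works out roughly like
this:

# hypothetical host: 24 GiB RAM, 4 OSDs, plus one mgr, one mon and one crash daemon
total_kb=25165824                        # "memory_total_kb" from `cephadm gather-facts`
budget=$(( total_kb * 1024 * 7 / 10 ))   # autotune_memory_target_ratio defaults to 0.7
budget=$(( budget - 4096 * 1048576 ))    # minus the mgr minimum
budget=$(( budget - 1024 * 1048576 ))    # minus the mon minimum
budget=$(( budget - 128 * 1048576 ))     # minus the crash minimum
echo $(( budget / 4 ))                   # what's left, split across the 4 OSDs

And if the result still looks wrong for that one host, `ceph orch host label
add <host> _no_autotune_memory` is how to apply the opt-out label I mentioned
in the previous mail.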

On Wed, Mar 27, 2024 at 10:47 AM Mads Aasted  wrote:

> Hi Adam.
>
> So doing the calculations with what you are stating here I arrive at a
> total sum for all the listed processes at 13.3 (roughly) gb, for everything
> except the osds, leaving well in excess of +4gb for each OSD.
> Besides the mon daemon which i can tell on my host has a limit of 2gb ,
> none of the other daemons seem to have a limit set according to ceph orch
> ps. Then again, they are nowhere near the values stated in min_size_by_type
> that you list.
> Obviously yes, I could disable the auto tuning, but that would leave me
> none the wiser as to why this exact host is trying to do this.
>
>
>
> On Tue, Mar 26, 2024 at 10:20 PM Adam King  wrote:
>
>> For context, the value the autotune goes with takes the value from
>> `cephadm gather-facts` on the host (the "memory_total_kb" field) and then
>> subtracts from that per daemon on the host according to
>>
>> min_size_by_type = {
>> 'mds': 4096 * 1048576,
>> 'mgr': 4096 * 1048576,
>> 'mon': 1024 * 1048576,
>> 'crash': 128 * 1048576,
>> 'keepalived': 128 * 1048576,
>> 'haproxy': 128 * 1048576,
>> 'nvmeof': 4096 * 1048576,
>> }
>> default_size = 1024 * 1048576
>>
>> what's left is then divided by the number of OSDs on the host to arrive
>> at the value. I'll also add, since it seems to be an issue on this
>> particular host,  if you add the "_no_autotune_memory" label to the host,
>> it will stop trying to do this on that host.
>>
>> On Mon, Mar 25, 2024 at 6:32 PM  wrote:
>>
>>> I have a virtual ceph cluster running 17.2.6 with 4 ubuntu 22.04 hosts
>>> in it, each with 4 OSD's attached. The first 2 servers hosting mgr's have
>>> 32GB of RAM each, and the remaining have 24gb
>>> For some reason i am unable to identify, the first host in the cluster
>>> appears to constantly be trying to set the osd_memory_target variable to
>>> roughly half of what the calculated minimum is for the cluster, i see the
>>> following spamming the logs constantly
>>> Unable to set osd_memory_target on my-ceph01 to 480485376: error parsing
>>> value: Value '480485376' is below minimum 939524096
>>> Default is set to 4294967296.
>>> I did double check and osd_memory_base (805306368) +
>>> osd_memory_cache_min (134217728) adds up to minimum exactly
>>> osd_memory_target_autotune is currently enabled. But i cannot for the
>>> life of me figure out how it is arriving at 480485376 as a value for that
>>> particular host that even has the most RAM. Neither the cluster or the host
>>> is even approaching max utilization on memory, so it's not like there are
>>> processes competing for resources.
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>>
>>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Failed adding back a node

2024-03-27 Thread Alex
Hello.

We're rebuilding our OSD nodes.
Once cluster worked without any issues, this one is being stubborn

I attempted to add one back to the cluster and am seeing the error below
in our logs:

cephadm ['--image',
'registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160', 'pull']
2024-03-27 19:30:53,901 7f49792ed740 DEBUG /bin/podman: 4.6.1
2024-03-27 19:30:53,905 7f49792ed740 INFO Pulling container image
registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160...
2024-03-27 19:30:54,045 7f49792ed740 DEBUG /bin/podman: Trying to pull
registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160...
2024-03-27 19:30:54,266 7f49792ed740 DEBUG /bin/podman: Error:
initializing source
docker://registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160: reading
manifest 16.2.10-160 in registry.redhat.io/rhceph/rhceph-5-rhel8:
manifest unknown
2024-03-27 19:30:54,270 7f49792ed740 INFO Non-zero exit code 125 from
/bin/podman pull registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160
2024-03-27 19:30:54,270 7f49792ed740 INFO /bin/podman: stderr Trying
to pull registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160...
2024-03-27 19:30:54,270 7f49792ed740 INFO /bin/podman: stderr Error:
initializing source
docker://registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160: reading
manifest 16.2.10-160 in registry.redhat.io/rhceph/rhceph-5-rhel8:
manifest unknown
2024-03-27 19:30:54,270 7f49792ed740 ERROR ERROR: Failed command:
/bin/podman pull registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160

$ ceph versions
{
"mon": {
"ceph version 16.2.10-208.el8cp
(791f73fbb4bbca2ffe53a2ea0f8706dbffadcc0b) pacific (stable)": 1,
"ceph version 16.2.10-248.el8cp
(0edb63afd9bd3edb64f2e0031b77e62f4896) pacific (stable)": 2
},
"mgr": {
"ceph version 16.2.10-208.el8cp
(791f73fbb4bbca2ffe53a2ea0f8706dbffadcc0b) pacific (stable)": 1,
"ceph version 16.2.10-248.el8cp
(0edb63afd9bd3edb64f2e0031b77e62f4896) pacific (stable)": 2
},
"osd": {
"ceph version 16.2.10-160.el8cp
(6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 160
},
"mds": {},
"rgw": {
"ceph version 16.2.10-208.el8cp
(791f73fbb4bbca2ffe53a2ea0f8706dbffadcc0b) pacific (stable)": 3
},
"overall": {
"ceph version 16.2.10-160.el8cp
(6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 160,
"ceph version 16.2.10-208.el8cp
(791f73fbb4bbca2ffe53a2ea0f8706dbffadcc0b) pacific (stable)": 5,
"ceph version 16.2.10-248.el8cp
(0edb63afd9bd3edb64f2e0031b77e62f4896) pacific (stable)": 4
}
}

I don't understand why it's trying to pull 16.2.10-160 which doesn't exist.

registry.redhat.io/rhceph/rhceph-5-dashboard-rhel8 5 93b3137e7a65 11
months ago 696 MB
registry.redhat.io/rhceph/rhceph-5-rhel8 5-416 838cea16e15c 11 months
ago 1.02 GB
registry.redhat.io/openshift4/ose-prometheus v4.6 ec2d358ca73c 17
months ago 397 MB


This happens using cepadm-ansible as well as
$ ceph orch ls --export --service_name xxx > xxx.yml
$ sudo ceph orch apply -i xxx.yml

I tried ceph orch daemon add osd host:/dev/sda
which surprisingly created a volume on host:/dev/sda and created an
osd i can see in
$ ceph osd tree

but it did not get added to the host, I suspect because of the same Podman
error, and now I'm unable to remove it.
$ ceph orch osd rm
does not work even with the --force flag.

I stopped the removal with
$ ceph orch osd rm stop
after 10+ minutes

I'm considering running $ ceph osd purge osd# --force but am worried it
may only make things worse.
ceph -s shows that OSD, but it is not up or in.

Thanks, and looking forward to any advice!
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Return value from cephadm host-maintenance?

2024-03-27 Thread John Mulligan
(adding the list back to the thread)

On Wednesday, March 27, 2024 12:54:34 PM EDT Daniel Brown wrote:
> John
> 
> 
> I got curious and was taking another quick look through the python script
> for cephadm.
> 

That's always welcome. :-D

> This is probably too simple of a question to be asking — or maybe I should
> say, I’m not expecting that there’s a simple answer to what might seem like
> a simple question -
> 
> Is there anything that notifies the cluster, or the other hosts in a
> cluster, when a host is going into maintenance mode that it is going into
> maintenance mode, or is cephadm just doing systemctl commands behind the
> scenes to stop and later restart the appropriate ceph containers locally on
> that host?
> 
> Maybe a better way to say it would be - what is differentiating between
> maintenance mode and a host simply crashing or going offline?

I'll paraphrase Adam King, tech lead for cephadm here:

If one runs the command from the cephadm binary directly, it will only disable/
stop the systemd target. The intention is for users to use the `ceph 
orch host maintenance` ... commands.

When you use the orch command (quoting Adam here):
```
when we put something into maintenance mode we
1) disable and stop the systemd target for the daemons on the host
2) set the noout flag for all the OSDs on that host
3) internally to cephadm mark the host as having a status of "maintenance" 
which has some effects such as us not refreshing metadata on that host or 
attempting to place/remove daemons from there

The main difference from that to a host going offline is the noout flag for the 
OSDs, and that cephadm will not periodically try to check if the host is 
alive, as it would do for an offline host.

I believe the noout flag stops it from trying to migrate all the data on those 
OSDs to other OSDs, as that shouldn't be necessary if they will be coming back

```

The `cephadm host-maintenance enter` command is meant to be a component of the `ceph 
orch host maintenance` workflow. It still has a bug: the way it always exits 
with an error is wrong. But you may not want to use it directly.
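
For completeness, the orchestrator-level workflow is just (the hostname is a
placeholder):

ceph orch host maintenance enter host01   # stops the systemd target, sets noout for its OSDs, marks the host "maintenance"
# ... do the maintenance work ...
ceph orch host maintenance exit host01    # clears the flag and returns the host to normal management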


Reference links:
https://docs.ceph.com/en/latest/cephadm/host-management/#maintenance-mode

https://docs.ceph.com/en/latest/dev/cephadm/host-maintenance/





> > On Mar 22, 2024, at 6:26 AM, Daniel Brown 
> > wrote:
> > 
> > 
> > Looks like it got OK’ed. I’ll put in something today.
> > 
> > 
> > --
> > Dan Brown
> > 
> >> On Mar 21, 2024, at 13:44, John Mulligan 
> >> wrote:>> 
> >> On Thursday, March 21, 2024 11:43:19 AM EDT Daniel Brown wrote:
> >>> Assuming I need admin approval to report this on tracker, how long does
> >>> it
> >>> take to get approved?? Signed up a couple days ago, but still seeing
> >>> “Your
> >>> account was created and is now pending administrator approval.”
> >> 
> >> That's unfortunate. I pinged about  your issue signing up on the ceph
> >> slack
> >> channel for infrastructure. Hopefully, that'll get somebody's attention.
> >> If
> >> you don't get access by tomorrow feel free to ping me again directly and
> >> then *I'll* file the issue for you instead of having you wait around
> >> more.



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] MDS Behind on Trimming...

2024-03-27 Thread Erich Weiler

Hi All,

I've been battling this for a while and I'm not sure where to go from 
here.  I have a Ceph health warning as such:


# ceph -s
  cluster:
id: 58bde08a-d7ed-11ee-9098-506b4b4da440
health: HEALTH_WARN
1 MDSs report slow requests
1 MDSs behind on trimming

  services:
mon: 5 daemons, quorum 
pr-md-01,pr-md-02,pr-store-01,pr-store-02,pr-md-03 (age 5d)

mgr: pr-md-01.jemmdf(active, since 3w), standbys: pr-md-02.emffhz
mds: 1/1 daemons up, 2 standby
osd: 46 osds: 46 up (since 9h), 46 in (since 2w)

  data:
volumes: 1/1 healthy
pools:   4 pools, 1313 pgs
objects: 260.72M objects, 466 TiB
usage:   704 TiB used, 424 TiB / 1.1 PiB avail
pgs:     1306 active+clean
         4    active+clean+scrubbing+deep
         3    active+clean+scrubbing

  io:
client:   123 MiB/s rd, 75 MiB/s wr, 109 op/s rd, 1.40k op/s wr

And the specifics are:

# ceph health detail
HEALTH_WARN 1 MDSs report slow requests; 1 MDSs behind on trimming
[WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
mds.slugfs.pr-md-01.xdtppo(mds.0): 99 slow requests are blocked > 
30 secs

[WRN] MDS_TRIM: 1 MDSs behind on trimming
mds.slugfs.pr-md-01.xdtppo(mds.0): Behind on trimming (13884/250) 
max_segments: 250, num_segments: 13884


That "num_segments" number slowly keeps increasing.  I suspect I just 
need to tell the MDS servers to trim faster but after hours of googling 
around I just can't figure out the best way to do it.  The best I could 
come up with was to decrease "mds_cache_trim_decay_rate" from 1.0 to .8 
(to start), based on this page:


https://www.suse.com/support/kb/doc/?id=19740

But it doesn't seem to help, maybe I should decrease it further?  I am 
guessing this must be a common issue...?  I am running Reef on the MDS 
servers, but most clients are on Quincy.
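
For what it's worth, this is the kind of thing I have been trying (a sketch
only; the numbers are guesses on my part, not recommendations):

# check what the MDS is currently running with
ceph config get mds mds_cache_trim_decay_rate
ceph config get mds mds_log_max_segments
# lower the decay rate so trimming is throttled less aggressively
ceph config set mds mds_cache_trim_decay_rate 0.8
# and/or raise how many journal segments are tolerated before the warning fires
ceph config set mds mds_log_max_segments 512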


Thanks for any advice!

cheers,
erich
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph user/bucket usage metrics

2024-03-27 Thread Kushagr Gupta
Hi team,

I am new to ceph and I am looking to monitor the user/bucket usage for ceph.
As per the following link:
https://docs.ceph.com/en/latest/radosgw/metrics/

But when I enabled the same using the command:
'ceph config set client.rgw CONFIG_VARIABLE VALUE'

I could only see the following perf schema:
```
   "rgw": {
"req": {
"type": 10,
"metric_type": "counter",
"value_type": "integer",
"description": "Requests",
"nick": "",
"priority": 5,
"units": "none"
},
"failed_req": {
"type": 10,
"metric_type": "counter",
"value_type": "integer",
"description": "Aborted requests",
"nick": "",
"priority": 5,
"units": "none"
xxxSNIPxxx
```

But as per the link, we should have also gotten the following metrics:
```
"rgw_op": [
{
"labels": {},
"counters": {
"put_obj_ops": 2,
"put_obj_bytes": 5327,
"put_obj_lat": {
"avgcount": 2,
"sum": 2.818064835,
"avgtime": 1.409032417
},
"get_obj_ops": 5,
"get_obj_bytes": 5325,
"get_obj_lat": {
"avgcount": 2,
"sum": 0.00369,
"avgtime": 0.001500034
},
...
"list_buckets_ops": 1,
"list_buckets_lat": {
"avgcount": 1,
"sum": 0.00230,
"avgtime": 0.00230
}
}
},
]
```

But as per the following links:
https://github.com/ceph/ceph/blob/v19.0.0/src/rgw/rgw_perf_counters.cc
https://github.com/ceph/ceph/blob/v18.2.2/src/rgw/rgw_perf_counters.cc

I don't think this feature is currently supported.
Could anyone please help me with this?
Ceph-version being used by us - 18.2.0(reef)/18.2.2
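
In the meantime, would the classic usage log be a workable fallback? Something
like the following (the uid, bucket and service names are placeholders) should
give per-user and per-bucket numbers on any release:

# enable the RGW usage log, then restart the RGW daemons
ceph config set client.rgw rgw_enable_usage_log true
ceph orch restart rgw.<service-name>
# per-user usage
radosgw-admin usage show --uid=testuser
# per-bucket size and object counts
radosgw-admin bucket stats --bucket=testbucket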

Thanks and Regards,
Kushagra Gupta
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW Data Loss Bug in Octopus 15.2.0 through 15.2.6

2024-03-27 Thread xu chenhui
Hi, Eric Ivancich
  I have a similar problem on ceph version 16.2.5. Has this problem been 
completely resolved in the Pacific release?
Our bucket has no lifecycle rules and no copy operations. This is a very serious 
data loss issue for us, and it happens occasionally in our environment. 

Detailed description: 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/XQRUOEPZ7YY3ZR46EGMDIYY6SQAGCI3H/

thanks.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Call for Interest: Managed SMB Protocol Support

2024-03-27 Thread John Mulligan
On Tuesday, March 26, 2024 10:53:29 PM EDT David Yang wrote:
> This is great, we are currently using the smb protocol heavily to
> export kernel-mounted cephfs.
> But I encountered a problem. When there are many smb clients
> enumerating or listing the same directory, the smb server will
> experience high load, and the smb process will become D state.
> This problem has been going on for some time and no suitable solution
> has been found yet.
> 

Thanks for the heads up. I'll make sure concurrent dir access is part of the 
test plan.

> John Mulligan  于2024年3月26日周二 03:43写道:
> 
> >
> >
> > On Monday, March 25, 2024 3:22:26 PM EDT Alexander E. Patrakov wrote:
> > 
> > > On Mon, Mar 25, 2024 at 11:01 PM John Mulligan
> > >
> > >
> > >
> > >  wrote:
> > > 
> > > > On Friday, March 22, 2024 2:56:22 PM EDT Alexander E. Patrakov wrote:
> > > > 
> > > > > Hi John,
> > > > >
> > > > >
> > > > >
> > > > > > A few major features we have planned include:
> > > > > > * Standalone servers (internally defined users/groups)
> > > > >
> > > > >
> > > > >
> > > > > No concerns here
> > > > >
> > > > >
> > > > >
> > > > > > * Active Directory Domain Member Servers
> > > > >
> > > > >
> > > > >
> > > > > In the second case, what is the plan regarding UID mapping? Is NFS
> > > > > coexistence planned, or a concurrent mount of the same directory
> > > > > using
> > > > > CephFS directly?
> > > >
> > > >
> > > >
> > > > In the immediate future the plan is to have a very simple, fairly
> > > > "opinionated" idmapping scheme based on the autorid backend.
> > >
> > >
> > >
> > > OK, the docs for clustered SAMBA do mention the autorid backend in
> > > examples. It's a shame that the manual page does not explicitly list
> > > it as compatible with clustered setups.
> > >
> > >
> > >
> > > However, please consider that the majority of Linux distributions
> > > (tested: CentOS, Fedora, Alt Linux, Ubuntu, OpenSUSE) use "realmd" to
> > > join AD domains by default (where "default" means a pointy-clicky way
> > > in a workstation setup), which uses SSSD, and therefore, by this
> > > opinionated choice of the autorid backend, you create mappings that
> > > disagree with the supposed majority and the default. This will create
> > > problems in the future when you do consider NFS coexistence.
> > >
> > >
> >
> >
> >
> > Thanks, I'll keep that in mind.
> >
> >
> >
> > > Well, it's a different topic that most organizations that I have seen
> > > seem to ignore this default. Maybe those that don't have any problems
> > > don't have any reason to talk to me? I think that more research is
> > > needed here on whether RedHat's and GNOME's push of SSSD is something
> > > not-ready or indeed the de-facto standard setup.
> > >
> > >
> >
> >
> >
> > I think it's a bit of a mix, but am not sure either.
> >
> >
> >
> >
> > > Even if you don't want to use SSSD, providing an option to provision a
> > > few domains with idmap rid backend with statically configured ranges
> > > (as an override to autorid) would be a good step forward, as this can
> > > be made compatible with the default RedHat setup.
> >
> >
> >
> > That's reasonable. Thanks for the suggestion.
> >
> >
> >
> >
> > >
> > >
> > > > Sharing the same directories over both NFS and SMB at the same time,
> > > > also
> > > > known as "multi-protocol", is not planned for now, however we're all
> > > > aware
> > > > that there's often a demand for this feature and we're aware of the
> > > > complexity it brings. I expect we'll work on that at some point but
> > > > not
> > > > initially. Similarly, sharing the same directories over a SMB share
> > > > and
> > > > directly on a cephfs mount won't be blocked but we won't recommend
> > > > it.
> > >
> > >
> > >
> > > OK. Feature request: in the case if there are several CephFS
> > > filesystems, support configuration of which one to serve.
> > >
> > >
> >
> >
> >
> > Putting it on the list.
> >
> >
> >
> > > > > In fact, I am quite skeptical, because, at least in my experience,
> > > > > every customer's SAMBA configuration as a domain member is a unique
> > > > > snowflake, and cephadm would need an ability to specify arbitrary
> > > > > UID
> > > > > mapping configuration to match what the customer uses elsewhere -
> > > > > and
> > > > > the match must be precise.
> > > >
> > > >
> > > >
> > > > I agree - our initial use case is something along the lines:
> > > > Users of a Ceph Cluster that have Windows systems, Mac systems, or
> > > > appliances that are joined to an existing AD
> > > > but are not currently interoperating with the Ceph cluster.
> > > >
> > > >
> > > >
> > > > I expect to add some idpapping configuration and agility down the
> > > > line,
> > > > especially supporting some form of rfc2307 idmapping (where unix IDs
> > > > are
> > > > stored in AD).
> > >
> > >
> > >
> > > Yes, for whatever reason, people do this, even though it is cumbersome
> > > to manage.
> > >
> > >
> > >
> > > > But those who already have idmapping schemes 

[ceph-users] nvme hpe

2024-03-27 Thread Albert Shih
Hi.

I notice in the logs I get entries like these from each node:

Mar 27 01:12:59 cthulhu1 sudo: ceph : TTY=unknown ; PWD=/ ; USER=root ; 
COMMAND=/sbin/nvme mo000800kxprv smart-log-add --json /dev/nvme1n1
Mar 27 01:13:06 cthulhu1 sudo: ceph : TTY=unknown ; PWD=/ ; USER=root ; 
COMMAND=/sbin/nvme hpe smart-log-add --json /dev/nvme0n1
Mar 27 01:13:01 cthulhu2 sudo: ceph : TTY=unknown ; PWD=/ ; USER=root ; 
COMMAND=/sbin/nvme mo000800kxprv smart-log-add --json /dev/nvme1n1
Mar 27 01:13:07 cthulhu2 sudo: ceph : TTY=unknown ; PWD=/ ; USER=root ; 
COMMAND=/sbin/nvme hpe smart-log-add --json /dev/nvme0n1
Mar 27 01:13:02 cthulhu3 sudo: ceph : TTY=unknown ; PWD=/ ; USER=root ; 
COMMAND=/sbin/nvme mo000800kxprv smart-log-add --json /dev/nvme1n1
Mar 27 01:13:07 cthulhu3 sudo: ceph : TTY=unknown ; PWD=/ ; USER=root ; 
COMMAND=/sbin/nvme hpe smart-log-add --json /dev/nvme0n1
Mar 27 01:13:03 cthulhu4 sudo: ceph : TTY=unknown ; PWD=/ ; USER=root ; 
COMMAND=/sbin/nvme mo000800kxnxh smart-log-add --json /dev/nvme2n1
Mar 27 01:13:07 cthulhu4 sudo: ceph : TTY=unknown ; PWD=/ ; USER=root ; 
COMMAND=/sbin/nvme hpe smart-log-add --json /dev/nvme0n1
Mar 27 01:13:03 cthulhu5 sudo: ceph : TTY=unknown ; PWD=/ ; USER=root ; 
COMMAND=/sbin/nvme mo000800kxnxh smart-log-add --json /dev/nvme1n1
Mar 27 01:13:06 cthulhu5 sudo: ceph : TTY=unknown ; PWD=/ ; USER=root ; 
COMMAND=/sbin/nvme hpe smart-log-add --json /dev/nvme0n1

So the problems are: 

  The plugin for mo000800kxnxh is, I'm guessing, wrong (and does not exist).

  The plugin hpe doesn't exist either. 

nvme finds (I'm guessing) the model with `nvme list`, but: 

# nvme list
Node          SN              Model                                Namespace  Usage                  Format       FW Rev
/dev/nvme0n1  PWWVF0DSTHO1E7  HPE NS204i-p Gen10+ Boot Controller  1          480.04 GB / 480.04 GB  512 B + 0 B  12141004
/dev/nvme1n1  231940892591    MO000800KXNXH                        1            2.76 GB / 800.17 GB  512 B + 0 B  HPS0
/dev/nvme2n1  23194089256A    MO000800KXNXH                        1          799.47 GB / 800.17 GB  512 B + 0 B  HPS0

Still, with lshw I found out it was a Micron SSD.

So my question: what's the best thing to do?

Which «plugin» should I use, and how do I tell cephadm what to do?
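
One thing I'm considering, if the vendor plugin simply isn't supported by
nvme-cli, is turning off the scraping that issues these calls (I believe it
comes from the device health monitoring), for example:

# see what the cluster knows about the devices
ceph device ls
# stop the periodic SMART/NVMe scraping (this also stops collecting device health metrics)
ceph device monitoring off

But I'd rather keep the metrics if there is a correct plugin to point it at.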

Regards





-- 
Albert SHIH 嶺 
France
Heure locale/Local time:
mer. 27 mars 2024 15:43:54 CET
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephadm host keeps trying to set osd_memory_target to less than minimum

2024-03-27 Thread Mads Aasted
Hi Adam.

So, doing the calculations with what you state here, I arrive at a total of
roughly 13.3 GB for all the listed processes, everything except the OSDs,
leaving well in excess of 4 GB for each OSD.
Besides the mon daemon, which I can tell has a limit of 2 GB on my host,
none of the other daemons seem to have a limit set according to ceph orch
ps. Then again, they are nowhere near the values stated in the min_size_by_type
list you gave.
Obviously yes, I could disable the auto tuning, but that would leave me
none the wiser as to why this exact host is trying to do this.



On Tue, Mar 26, 2024 at 10:20 PM Adam King  wrote:

> For context, the value the autotune goes with takes the value from
> `cephadm gather-facts` on the host (the "memory_total_kb" field) and then
> subtracts from that per daemon on the host according to
>
> min_size_by_type = {
> 'mds': 4096 * 1048576,
> 'mgr': 4096 * 1048576,
> 'mon': 1024 * 1048576,
> 'crash': 128 * 1048576,
> 'keepalived': 128 * 1048576,
> 'haproxy': 128 * 1048576,
> 'nvmeof': 4096 * 1048576,
> }
> default_size = 1024 * 1048576
>
> what's left is then divided by the number of OSDs on the host to arrive at
> the value. I'll also add, since it seems to be an issue on this particular
> host,  if you add the "_no_autotune_memory" label to the host, it will stop
> trying to do this on that host.
>
> On Mon, Mar 25, 2024 at 6:32 PM  wrote:
>
>> I have a virtual ceph cluster running 17.2.6 with 4 ubuntu 22.04 hosts in
>> it, each with 4 OSD's attached. The first 2 servers hosting mgr's have 32GB
>> of RAM each, and the remaining have 24gb
>> For some reason i am unable to identify, the first host in the cluster
>> appears to constantly be trying to set the osd_memory_target variable to
>> roughly half of what the calculated minimum is for the cluster, i see the
>> following spamming the logs constantly
>> Unable to set osd_memory_target on my-ceph01 to 480485376: error parsing
>> value: Value '480485376' is below minimum 939524096
>> Default is set to 4294967296.
>> I did double check and osd_memory_base (805306368) + osd_memory_cache_min
>> (134217728) adds up to minimum exactly
>> osd_memory_target_autotune is currently enabled. But i cannot for the
>> life of me figure out how it is arriving at 480485376 as a value for that
>> particular host that even has the most RAM. Neither the cluster or the host
>> is even approaching max utilization on memory, so it's not like there are
>> processes competing for resources.
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Erasure Code with Autoscaler and Backfill_toofull

2024-03-27 Thread Alexander E. Patrakov
Hello Daniel,

The situation is not as bad as you described. It is just
PG_BACKFILL_FULL, which means: if the backfills proceed, then one osd
will become backfillfull (i.e., over 90% by default).

This is definitely something that the balancer should be able to
resolve if it were allowed to act. You have probably set the "target
max misplaced ratio" option to 0.01. Please increase it to 0.03 (the
default is 0.05).
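
If that is the case, the check and the change are just:

ceph config get mgr target_max_misplaced_ratio
ceph config set mgr target_max_misplaced_ratio 0.03   # or back to the default of 0.05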

Or, you can fix the worst offenders using a few runs of TheJJ
balancer: 
https://raw.githubusercontent.com/TheJJ/ceph-balancer/master/placementoptimizer.py

./placementoptimizer.py -v balance --osdsize device --osdused delta
--max-pg-moves 20 --osdfrom fullest | bash



On Wed, Mar 27, 2024 at 5:14 PM Daniel Williams  wrote:
>
> The backfilling was caused by decommissioning an old host and moving a
> bunch of OSD to new machines.
>
> Balancer has not been activated since the backfill started / OSDs were
> moved around on hosts.
>
> Busy OSD level ? Do you mean fullness? The cluster is relatively unused in
> terms of business.
>
> # ceph status
>   cluster:
> health: HEALTH_WARN
> noout flag(s) set
> Low space hindering backfill (add storage if this doesn't
> resolve itself): 10 pgs backfill_toofull
>
>   services:
> mon: 4 daemons, quorum
> ceph-server-02,ceph-server-04,ceph-server-01,ceph-server-05 (age 6d)
> mgr: ceph-server-01.gfavjb(active, since 6d), standbys:
> ceph-server-05.swmxto, ceph-server-04.ymoarr, ceph-server-02.zzcppv
> mds: 1/1 daemons up, 3 standby
> osd: 44 osds: 44 up (since 6d), 44 in (since 6d); 19 remapped pgs
>  flags noout
>
>   data:
> volumes: 1/1 healthy
> pools:   9 pools, 481 pgs
> objects: 57.41M objects, 222 TiB
> usage:   351 TiB used, 129 TiB / 480 TiB avail
> pgs: 13895113/514097636 objects misplaced (2.703%)
>  455 active+clean
>  10  active+remapped+backfill_toofull
>  9   active+remapped+backfilling
>  5   active+clean+scrubbing+deep
>  2   active+clean+scrubbing
>
>   io:
> client:   7.5 MiB/s rd, 4.8 KiB/s wr, 28 op/s rd, 1 op/s wr
>
> # ceph osd df | sort -rnk 17
> ID  CLASS  WEIGHTREWEIGHT  SIZE RAW USE  DATA OMAP META
>  AVAIL %USE   VAR   PGS  STATUS
>  0hdd   9.09598   1.0  9.1 TiB  6.0 TiB  6.0 TiB  0 B18 GiB
>   3.1 TiB  65.96  0.90   62  up
> 11hdd  10.91423   1.0   11 TiB  7.0 TiB  7.0 TiB   40 MiB18 GiB
>   3.9 TiB  64.26  0.88   70  up
> 43hdd  14.55269   1.0   15 TiB  9.3 TiB  9.3 TiB  117 MiB24 GiB
>   5.3 TiB  63.92  0.87   87  up
> 26hdd  12.73340   1.0   13 TiB  7.9 TiB  7.9 TiB   54 MiB21 GiB
>   4.8 TiB  61.98  0.85   80  up
> 35hdd  14.55269   1.0   15 TiB  8.9 TiB  8.9 TiB   46 MiB25 GiB
>   5.7 TiB  61.05  0.83   87  up
>  5hdd   9.09569   1.0  9.1 TiB  5.5 TiB  5.5 TiB1 KiB15 GiB
>   3.6 TiB  60.71  0.83   54  up
> TOTAL  480 TiB  351 TiB  350 TiB  2.6 GiB  1018 GiB
>   129 TiB  73.12
>
> # ceph balancer status
> {
> "active": true,
> "last_optimize_duration": "0:00:00.000326",
> "last_optimize_started": "Wed Mar 27 09:04:32 2024",
> "mode": "upmap",
> "no_optimization_needed": false,
> "optimize_result": "Too many objects (0.027028 > 0.01) are
> misplaced; try again later",
> "plans": []
> }
>
> On Wed, Mar 27, 2024 at 4:53 PM David C.  wrote:
>
> > Hi Daniel,
> >
> > Changing pg_num when some OSD is almost full is not a good strategy (or
> > even dangerous).
> >
> > What is causing this backfilling? loss of an OSD? balancer? other ?
> >
> > What is the least busy OSD level (sort -nrk17)
> >
> > Is the balancer activated? (upmap?)
> >
> > Once the situation stabilizes, it becomes interesting to think about the
> > number of pg/osd =>
> > https://docs.ceph.com/en/latest/rados/operations/placement-groups/#managing-pools-that-are-flagged-with-bulk
> >
> >
> > Le mer. 27 mars 2024 à 09:41, Daniel Williams  a
> > écrit :
> >
> >> Hey,
> >>
> >> I'm running ceph version 18.2.1 (reef) but this problem must have existed
> >> a
> >> long time before reef.
> >>
> >> The documentation says the autoscaler will target 100 pgs per OSD but I'm
> >> only seeing ~10. My erasure encoding is a stripe of 6 data 3 parity.
> >> Could that be the reason? PGs numbers for that EC pool are therefore
> >> multiplied by k+m by the autoscaler calculations?
> >>
> >> Is backfill_toofull calculated against the total size of the PG against
> >> every OSD it is destined for? For my case I have ~1TiB PGs because the
> >> autoscaler is creating only 10 per host, and then backfill too full is
> >> considering that one of my OSDs only has 500GiB free, although that
> >> doesn't
> >> quite add up either because two 1TiB PGs are backfilling two pg's that
> >> have
> >> OSD 1 in them. My backfill full ratio is set to 97%.
> >>
> >> Would it be correct for me 

[ceph-users] Re: Erasure Code with Autoscaler and Backfill_toofull

2024-03-27 Thread Daniel Williams
The backfilling was caused by decommissioning an old host and moving a
bunch of OSDs to new machines.

The balancer has not been activated since the backfill started / OSDs were
moved around on hosts.

Busy OSD level? Do you mean fullness? The cluster is relatively unused in
terms of busyness.

# ceph status
  cluster:
health: HEALTH_WARN
noout flag(s) set
Low space hindering backfill (add storage if this doesn't
resolve itself): 10 pgs backfill_toofull

  services:
mon: 4 daemons, quorum
ceph-server-02,ceph-server-04,ceph-server-01,ceph-server-05 (age 6d)
mgr: ceph-server-01.gfavjb(active, since 6d), standbys:
ceph-server-05.swmxto, ceph-server-04.ymoarr, ceph-server-02.zzcppv
mds: 1/1 daemons up, 3 standby
osd: 44 osds: 44 up (since 6d), 44 in (since 6d); 19 remapped pgs
 flags noout

  data:
volumes: 1/1 healthy
pools:   9 pools, 481 pgs
objects: 57.41M objects, 222 TiB
usage:   351 TiB used, 129 TiB / 480 TiB avail
pgs: 13895113/514097636 objects misplaced (2.703%)
 455 active+clean
 10  active+remapped+backfill_toofull
 9   active+remapped+backfilling
 5   active+clean+scrubbing+deep
 2   active+clean+scrubbing

  io:
client:   7.5 MiB/s rd, 4.8 KiB/s wr, 28 op/s rd, 1 op/s wr

# ceph osd df | sort -rnk 17
ID  CLASS  WEIGHT    REWEIGHT  SIZE     RAW USE  DATA     OMAP     META      AVAIL    %USE   VAR   PGS  STATUS
 0  hdd     9.09598   1.0      9.1 TiB  6.0 TiB  6.0 TiB      0 B   18 GiB   3.1 TiB  65.96  0.90   62  up
11  hdd    10.91423   1.0       11 TiB  7.0 TiB  7.0 TiB   40 MiB   18 GiB   3.9 TiB  64.26  0.88   70  up
43  hdd    14.55269   1.0       15 TiB  9.3 TiB  9.3 TiB  117 MiB   24 GiB   5.3 TiB  63.92  0.87   87  up
26  hdd    12.73340   1.0       13 TiB  7.9 TiB  7.9 TiB   54 MiB   21 GiB   4.8 TiB  61.98  0.85   80  up
35  hdd    14.55269   1.0       15 TiB  8.9 TiB  8.9 TiB   46 MiB   25 GiB   5.7 TiB  61.05  0.83   87  up
 5  hdd     9.09569   1.0      9.1 TiB  5.5 TiB  5.5 TiB    1 KiB   15 GiB   3.6 TiB  60.71  0.83   54  up
                      TOTAL    480 TiB  351 TiB  350 TiB  2.6 GiB  1018 GiB  129 TiB  73.12

# ceph balancer status
{
"active": true,
"last_optimize_duration": "0:00:00.000326",
"last_optimize_started": "Wed Mar 27 09:04:32 2024",
"mode": "upmap",
"no_optimization_needed": false,
"optimize_result": "Too many objects (0.027028 > 0.01) are
misplaced; try again later",
"plans": []
}

On Wed, Mar 27, 2024 at 4:53 PM David C.  wrote:

> Hi Daniel,
>
> Changing pg_num when some OSD is almost full is not a good strategy (or
> even dangerous).
>
> What is causing this backfilling? loss of an OSD? balancer? other ?
>
> What is the least busy OSD level (sort -nrk17)
>
> Is the balancer activated? (upmap?)
>
> Once the situation stabilizes, it becomes interesting to think about the
> number of pg/osd =>
> https://docs.ceph.com/en/latest/rados/operations/placement-groups/#managing-pools-that-are-flagged-with-bulk
>
>
> Le mer. 27 mars 2024 à 09:41, Daniel Williams  a
> écrit :
>
>> Hey,
>>
>> I'm running ceph version 18.2.1 (reef) but this problem must have existed
>> a
>> long time before reef.
>>
>> The documentation says the autoscaler will target 100 pgs per OSD but I'm
>> only seeing ~10. My erasure encoding is a stripe of 6 data 3 parity.
>> Could that be the reason? PGs numbers for that EC pool are therefore
>> multiplied by k+m by the autoscaler calculations?
>>
>> Is backfill_toofull calculated against the total size of the PG against
>> every OSD it is destined for? For my case I have ~1TiB PGs because the
>> autoscaler is creating only 10 per host, and then backfill too full is
>> considering that one of my OSDs only has 500GiB free, although that
>> doesn't
>> quite add up either because two 1TiB PGs are backfilling two pg's that
>> have
>> OSD 1 in them. My backfill full ratio is set to 97%.
>>
>> Would it be correct for me to change the autoscaler to target ~700 pgs per
>> osd and bias for storagefs and all EC pools to k+m? Should that be the
>> default or the documentation recommended value?
>>
>> How scary is changing PG_NUM while backfilling misplaced PGs? It seems
>> like
>> there's a chance the backfill might succeed so I think I can wait.
>>
>> Any help is greatly appreciated, I've tried to include as much of the
>> relevant debugging output as I can think of.
>>
>> Daniel
>>
>> # ceph osd ls | wc -l
>> 44
>> # ceph pg ls | wc -l
>> 484
>>
>> # ceph osd pool autoscale-status
>> POOL SIZE  TARGET SIZE   RATE  RAW CAPACITY   RATIO
>>  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
>> .rgw.root  216.0k 3.0480.2T  0.
>>  1.0  32  on False
>> default.rgw.control0  3.0480.2T  0.
>>   

[ceph-users] Re: Erasure Code with Autoscaler and Backfill_toofull

2024-03-27 Thread David C.
Hi Daniel,

Changing pg_num when some OSDs are almost full is not a good strategy (it
can even be dangerous).

What is causing this backfilling? Loss of an OSD? The balancer? Something else?

What is the level of the least busy OSD (sort -nrk17)?

Is the balancer activated? (upmap?)

Once the situation stabilizes, it becomes interesting to think about the
number of pg/osd =>
https://docs.ceph.com/en/latest/rados/operations/placement-groups/#managing-pools-that-are-flagged-with-bulk
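
Concretely, once things are healthy again, flagging the big data pool as bulk
lets the autoscaler aim for the full PG budget up front, e.g. (pool name taken
from your output):

ceph osd pool set storagefs bulk true
ceph osd pool autoscale-status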


Le mer. 27 mars 2024 à 09:41, Daniel Williams  a
écrit :

> Hey,
>
> I'm running ceph version 18.2.1 (reef) but this problem must have existed a
> long time before reef.
>
> The documentation says the autoscaler will target 100 pgs per OSD but I'm
> only seeing ~10. My erasure encoding is a stripe of 6 data 3 parity.
> Could that be the reason? PGs numbers for that EC pool are therefore
> multiplied by k+m by the autoscaler calculations?
>
> Is backfill_toofull calculated against the total size of the PG against
> every OSD it is destined for? For my case I have ~1TiB PGs because the
> autoscaler is creating only 10 per host, and then backfill too full is
> considering that one of my OSDs only has 500GiB free, although that doesn't
> quite add up either because two 1TiB PGs are backfilling two pg's that have
> OSD 1 in them. My backfill full ratio is set to 97%.
>
> Would it be correct for me to change the autoscaler to target ~700 pgs per
> osd and bias for storagefs and all EC pools to k+m? Should that be the
> default or the documentation recommended value?
>
> How scary is changing PG_NUM while backfilling misplaced PGs? It seems like
> there's a chance the backfill might succeed so I think I can wait.
>
> Any help is greatly appreciated, I've tried to include as much of the
> relevant debugging output as I can think of.
>
> Daniel
>
> # ceph osd ls | wc -l
> 44
> # ceph pg ls | wc -l
> 484
>
> # ceph osd pool autoscale-status
> POOL SIZE  TARGET SIZE   RATE  RAW CAPACITY   RATIO
>  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
> .rgw.root  216.0k 3.0480.2T  0.
>  1.0  32  on False
> default.rgw.control0  3.0480.2T  0.
>  1.0  32  on False
> default.rgw.meta   0  3.0480.2T  0.
>  1.0  32  on False
> default.rgw.log 1636k 3.0480.2T  0.
>  1.0  32  on False
> storagefs  233.5T 1.5480.2T  0.7294
>  1.0 256  on False
> storagefs-meta 850.2M 4.0480.2T  0.
>  4.0  32  on False
> storagefs_wide 355.3G   1.375480.2T  0.0010
>  1.0  32  on False
> .mgr   457.3M 3.0480.2T  0.
>  1.0   1  on False
> mgr-backup-2022-08-19  370.6M 3.0480.2T  0.
>  1.0  32  on False
>
> # ceph osd pool ls detail | column -t
> pool  15  '.rgw.root'  replicated  size 3min_size  2
> crush_rule  0  object_hash  rjenkins  pg_num   32pgp_num  32
> autoscale_mode  on
> pool  16  'default.rgw.control'replicated  size 3min_size  2
> crush_rule  0  object_hash  rjenkins  pg_num   32pgp_num  32
> autoscale_mode  on
> pool  17  'default.rgw.meta'   replicated  size 3min_size  2
> crush_rule  0  object_hash  rjenkins  pg_num   32pgp_num  32
> autoscale_mode  on
> pool  18  'default.rgw.log'replicated  size 3min_size  2
> crush_rule  0  object_hash  rjenkins  pg_num   32pgp_num  32
> autoscale_mode  on
> pool  36  'storagefs'  erasure profile  6.3  size  9
> min_size7  crush_rule   2 object_hash  rjenkins  pg_num   256
>  pgp_num 256  autoscale_mode  on
> pool  37  'storagefs-meta' replicated  size 4min_size  1
> crush_rule  0  object_hash  rjenkins  pg_num   32pgp_num  32
> autoscale_mode  on
> pool  45  'storagefs_wide' erasure profile  8.3  size  11
>  min_size9  crush_rule   8 object_hash  rjenkins  pg_num   32
> pgp_num 32   autoscale_mode  on
> pool  46  '.mgr'   replicated  size 3min_size  2
> crush_rule  0  object_hash  rjenkins  pg_num   1 pgp_num  1
>  autoscale_mode  on
> pool  48  'mgr-backup-2022-08-19'  replicated  size 3min_size  2
> crush_rule  0  object_hash  rjenkins  pg_num   32pgp_num  32
> 

[ceph-users] Re: Ha proxy and S3

2024-03-27 Thread Gheorghiță Butnaru
Yes, you can deploy an ingress service with cephadm [1].

You can customize the haproxy config if you need something specific [2]:
ceph config-key set mgr/cephadm/services/ingress/haproxy.cfg -i haproxy.cfg.j2
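
A minimal ingress spec applied via cephadm could look roughly like this (the
service id, VIP, ports and placement are placeholders to adapt):

cat > ingress.yaml <<'EOF'
service_type: ingress
service_id: rgw.myrgw
placement:
  count: 2
spec:
  backend_service: rgw.myrgw      # the existing RGW service to front
  virtual_ip: 192.168.1.100/24    # VIP managed by keepalived
  frontend_port: 8080             # port clients connect to
  monitor_port: 1967              # haproxy status endpoint
EOF
ceph orch apply -i ingress.yaml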


[1]
https://docs.ceph.com/en/latest/cephadm/services/rgw/#high-availability-service-for-rgw
[2]
https://docs.ceph.com/en/quincy/cephadm/services/monitoring/#using-custom-configuration-files

On Wed, Mar 27, 2024 at 10:21 AM Albert Shih  wrote:

> Hi,
>
> If I'm correct in a S3 installation it's good practice to have a HA proxy,
> I also read somewhere the cephadm tool can deploy the HA Proxy.
>
> But is it a good practice to use cephadm to deploy the HA Proxy or it's
> better do deploy it manually on a other server (who does only that).
>
> Regards
>
> --
> Albert SHIH 嶺 
> France
> Heure locale/Local time:
> mer. 27 mars 2024 09:18:04 CET
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ha proxy and S3

2024-03-27 Thread Marc
> 
> But is it a good practice to use cephadm to deploy the HA Proxy or it's
> better do deploy it manually on a other server (who does only that).
>

Afaik cephadm's only viable option is podman. As I understand it, podman does 
nothing with managing tasks that can move to other hosts automatically. When I 
chose an orchestrator, none were even offering multi homed tasks. So if you 
choose to go with a container, choose something that will automatically start 
(stateless) containers on a different node when something fails.


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Erasure Code with Autoscaler and Backfill_toofull

2024-03-27 Thread Daniel Williams
Hey,

I'm running ceph version 18.2.1 (reef) but this problem must have existed a
long time before reef.

The documentation says the autoscaler will target 100 PGs per OSD, but I'm
only seeing ~10. My erasure coding is a stripe of 6 data + 3 parity chunks.
Could that be the reason? Are PG numbers for that EC pool therefore
multiplied by k+m in the autoscaler calculations?

Is backfill_toofull calculated against the total size of the PG against
every OSD it is destined for? For my case I have ~1TiB PGs because the
autoscaler is creating only 10 per host, and then backfill too full is
considering that one of my OSDs only has 500GiB free, although that doesn't
quite add up either because two 1TiB PGs are backfilling two pg's that have
OSD 1 in them. My backfill full ratio is set to 97%.

Would it be correct for me to change the autoscaler to target ~700 pgs per
osd and bias for storagefs and all EC pools to k+m? Should that be the
default or the documentation recommended value?

How scary is changing PG_NUM while backfilling misplaced PGs? It seems like
there's a chance the backfill might succeed so I think I can wait.

Any help is greatly appreciated, I've tried to include as much of the
relevant debugging output as I can think of.

Daniel

# ceph osd ls | wc -l
44
# ceph pg ls | wc -l
484

# ceph osd pool autoscale-status
POOL SIZE  TARGET SIZE   RATE  RAW CAPACITY   RATIO
 TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
.rgw.root  216.0k 3.0480.2T  0.
 1.0  32  on False
default.rgw.control0  3.0480.2T  0.
 1.0  32  on False
default.rgw.meta   0  3.0480.2T  0.
 1.0  32  on False
default.rgw.log 1636k 3.0480.2T  0.
 1.0  32  on False
storagefs  233.5T 1.5480.2T  0.7294
 1.0 256  on False
storagefs-meta 850.2M 4.0480.2T  0.
 4.0  32  on False
storagefs_wide 355.3G   1.375480.2T  0.0010
 1.0  32  on False
.mgr   457.3M 3.0480.2T  0.
 1.0   1  on False
mgr-backup-2022-08-19  370.6M 3.0480.2T  0.
 1.0  32  on False

# ceph osd pool ls detail | column -t
pool  15  '.rgw.root'  replicated  size 3min_size  2
crush_rule  0  object_hash  rjenkins  pg_num   32pgp_num  32
autoscale_mode  on
pool  16  'default.rgw.control'replicated  size 3min_size  2
crush_rule  0  object_hash  rjenkins  pg_num   32pgp_num  32
autoscale_mode  on
pool  17  'default.rgw.meta'   replicated  size 3min_size  2
crush_rule  0  object_hash  rjenkins  pg_num   32pgp_num  32
autoscale_mode  on
pool  18  'default.rgw.log'replicated  size 3min_size  2
crush_rule  0  object_hash  rjenkins  pg_num   32pgp_num  32
autoscale_mode  on
pool  36  'storagefs'  erasure profile  6.3  size  9
min_size7  crush_rule   2 object_hash  rjenkins  pg_num   256
 pgp_num 256  autoscale_mode  on
pool  37  'storagefs-meta' replicated  size 4min_size  1
crush_rule  0  object_hash  rjenkins  pg_num   32pgp_num  32
autoscale_mode  on
pool  45  'storagefs_wide' erasure profile  8.3  size  11
 min_size9  crush_rule   8 object_hash  rjenkins  pg_num   32
pgp_num 32   autoscale_mode  on
pool  46  '.mgr'   replicated  size 3min_size  2
crush_rule  0  object_hash  rjenkins  pg_num   1 pgp_num  1
 autoscale_mode  on
pool  48  'mgr-backup-2022-08-19'  replicated  size 3min_size  2
crush_rule  0  object_hash  rjenkins  pg_num   32pgp_num  32
autoscale_mode  on

# ceph osd erasure-code-profile get 6.3
crush-device-class=
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=6
m=3
plugin=jerasure
technique=reed_sol_van
w=8

# ceph pg ls | awk 'NR==1 || /backfill_toofull/' | awk '{print $1" "$2"
"$4" "$6" "$11" "$15" "$16}' | column -t
PG     OBJECTS  MISPLACED  BYTES         STATE                             UP                              ACTING
36.f   222077   141392     953817797727  active+remapped+backfill_toofull  [1,27,41,8,36,17,14,40,32]p1    [33,32,29,23,16,17,28,1,14]p33
36.5c  221761   147015     950692130045  active+remapped+backfill_toofull  [26,27,40,29,1,37,39,11,42]p26

[ceph-users] Ha proxy and S3

2024-03-27 Thread Albert Shih
Hi, 

If I'm correct, in an S3 installation it's good practice to have an HA proxy,
and I also read somewhere that the cephadm tool can deploy the HA proxy.

But is it good practice to use cephadm to deploy the HA proxy, or is it
better to deploy it manually on another server (which does only that)?

Regards

-- 
Albert SHIH 嶺 
France
Heure locale/Local time:
mer. 27 mars 2024 09:18:04 CET
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io