[ceph-users] Re: Upgrade from 17.2.5 to 17.2.6 stuck at MDS

2023-04-10 Thread Adam King
It seems like it maybe didn't actually do the redeploy, as it should log
something saying it's actually doing it on top of the line saying it
scheduled it. To confirm, is the upgrade actually paused ("ceph orch upgrade
status" should report is_paused as true)? If so, maybe try doing a mgr
failover ("ceph mgr fail") and then check "ceph orch ps" and "ceph orch
device ls" a few minutes later and look at the REFRESHED column. If any of
those show times farther back than when you did the failover, there's
probably something going on on the host(s) that haven't refreshed recently
that's holding things up (you'd have to go on that host and look for hanging
cephadm commands). Lastly, you could look at the /var/lib/ceph///unit.run
file on the hosts where the mds daemons are deployed. The (very long) last
podman/docker run line in that file should have the name of the image the
daemon is being deployed with, so you could use that to confirm whether
cephadm ever actually tried a redeploy of the mds with the new image. You
could also check the journal logs for the mds. Cephadm reports the systemd
unit name for the daemon as part of "cephadm ls" output: if you put a copy
of the cephadm binary on the host, run "cephadm ls" with it, and grab the
systemd unit name for the mds daemon from that output, you can use it to
check the journal logs, which should tell you the last restart time and why
the daemon went down.
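
A rough sketch of those checks as commands (a sketch only; the fsid and the
daemon/host names are placeholders or examples from this thread, and the
paths assume the standard cephadm layout /var/lib/ceph/<fsid>/<daemon>/):

ceph orch upgrade status | grep is_paused
ceph mgr fail
# a few minutes later, check that every host has refreshed recently:
ceph orch ps
ceph orch device ls
# on a host running an mds, see which image the last (re)deploy used:
grep -E 'podman|docker' /var/lib/ceph/*/mds.mds01.*/unit.run | tail -n 1
# find the systemd unit name for the daemon and check its journal:
cephadm ls                      # look for the mds entry and its systemd unit
journalctl -u ceph-<fsid>@mds.mds01.ceph06.rrxmks --since "2 hours ago"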

On Mon, Apr 10, 2023 at 4:25 PM Thomas Widhalm wrote:

> I did what you told me.
>
> I also see in the log that the commands went through:
>
> 2023-04-10T19:58:46.522477+ mgr.ceph04.qaexpv [INF] Schedule
> redeploy daemon mds.mds01.ceph06.rrxmks
> 2023-04-10T20:01:03.360559+ mgr.ceph04.qaexpv [INF] Schedule
> redeploy daemon mds.mds01.ceph05.pqxmvt
> 2023-04-10T20:01:21.787635+ mgr.ceph04.qaexpv [INF] Schedule
> redeploy daemon mds.mds01.ceph07.omdisd
>
>
> But the MDS never start. They stay in error state. I tried to redeploy
> and start them a few times, and even restarted one host where an MDS should run.
>
> mds.mds01.ceph03.xqwdjy  ceph03   error   32m ago   2M   -   -
> mds.mds01.ceph04.hcmvae  ceph04   error   31m ago   2h   -   -
> mds.mds01.ceph05.pqxmvt  ceph05   error   32m ago   9M   -   -
> mds.mds01.ceph06.rrxmks  ceph06   error   32m ago   10w  -   -
> mds.mds01.ceph07.omdisd  ceph07   error   32m ago   2M   -   -
>
>
> Any other ideas? Or am I missing something?
>
> Cheers,
> Thomas
>
> On 10.04.23 21:53, Adam King wrote:
> > Will also note that the normal upgrade process scales down the mds
> > service to have only 1 mds per fs before upgrading it, so maybe
> > something you'd want to do as well if the upgrade didn't do it already.
> > It does so by setting the max_mds to 1 for the fs.
> >
> > On Mon, Apr 10, 2023 at 3:51 PM Adam King wrote:
> >
> > You could try pausing the upgrade and manually "upgrading" the mds
> > daemons by redeploying them on the new image. Something like "ceph
> > orch daemon redeploy <daemon name> --image <17.2.6 image>"
> > (daemon names should match those in "ceph orch ps" output). If you
> > do that for all of them and then get them into an up state you
> > should be able to resume the upgrade and have it complete.
> >
> > On Mon, Apr 10, 2023 at 3:25 PM Thomas Widhalm
> > <widha...@widhalm.or.at> wrote:
> >
> > Hi,
> >
> > If you remember, I hit bug https://tracker.ceph.com/issues/58489
> >  so I
> > was very relieved when 17.2.6 was released and started to update
> > immediately.
> >
> > But now I'm stuck again with my broken MDS. MDS won't get into
> > up:active
> > without the update but the update waits for them to get into
> > up:active
> > state. Seems like a deadlock / chicken-egg problem to me.
> >
> > Since I'm still relatively new to Ceph, could you help me?
> >
> > What I see when watching the update status:
> >
> > {
> >   "target_image":
> > "
> quay.io/ceph/ceph@sha256:1161e35e4e02cf377c93b913ce78773f8413f5a8d7c5eaee4b4773a4f9dd6635
> <
> http://quay.io/ceph/ceph@sha256:1161e35e4e02cf377c93b913ce78773f8413f5a8d7c5eaee4b4773a4f9dd6635
> >",
> >   "in_progress": true,
> >   "which": "Upgrading all daemon types on all hosts",
> >   "services_complete": [
> >   "crash",
> >   "mgr",
> >  "mon",
> >  "osd"
> >   ],
> >   "progress": "18/40 daemons upgraded",
> >   "message": "Error: UPGRADE_OFFLINE_HOST: Upgrade: Failed
> > to connect
> > to host ceph01 at addr (192.168.23.61)",
> >   

[ceph-users] Re: ceph.v17 multi-mds ephemeral directory pinning: cannot set or retrieve extended attribute

2023-04-10 Thread Patrick Donnelly
On Sun, Apr 9, 2023 at 11:21 PM Ulrich Pralle
 wrote:
>
> Hi,
>
> we are using ceph version 17.2.5 on Ubuntu 22.04.1 LTS.
>
> We deployed multi-mds (max_mds=4, plus standby-replay mds).
> Currently we have statically directory-pinned our user home directories (~50k).
> The cephfs' root directory is pinned to '-1', ./homes is pinned to "0".
> All user home directories below ./homes/ are pinned to -1, 1, 2, or 3
> depending on a simple hash algorithm.
> Cephfs is provided to our users as samba/cifs (clustered samba,ctdb).
>
> We want to try ephemeral directory pinning.
>
> We can successfully set the extended attribute
> "ceph.dir.pin.distributed" with setfattr(1), but cannot retrieve its
> setting afterwards:
>
> # setfattr -n ceph.dir.pin.distributed -v 1 ./units
> # getfattr -n ceph.dir.pin.distributed ./units
> ./units: ceph.dir.pin.distributed: No such attribute
>
> strace setfattr reports success on setxattr
>
> setxattr("./units", "ceph.dir.pin.distributed", "1", 1, 0) = 0
>
> strace getfattr reports
>
> lstat("./units", {st_mode=S_IFDIR|0751, st_size=1, ...}) = 0
> getxattr("./units", "ceph.dir.pin.distributed", NULL, 0) = -1 ENODATA
> (No data available)
>
> The file system is mounted
> rw,noatime,,name=,mds_namespace=.acl,recover_session=clean.
> The cephfs mds caps are "allow rwps".
> "./units" has a ceph.dir.layout="stripe_unit=4194304 stripe_count=1
> object_size=4194304 pool=fs_data_units"
> Ubuntu's setfattr is version 2.4.48.
>
> Defining other cephfs extended attributes (like ceph.dir.pin,
> ceph.quota.max_bytes, etc.) works as expected.
>
> What are we missing?

Your kernel doesn't appear to know how to check virtual extended
attributes yet. It should be in 5.18.
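
A quick client-side sanity check (assuming you are using the kernel CephFS
client):

uname -r

Anything older than 5.18 will return ENODATA on getxattr for these vxattrs,
even though the setxattr itself succeeds, as your strace shows.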

> Should we clear all static directory pinnings in advance?

Start by removing the pin on /home. Then remove a group of pins on
some users' directories. Confirm /home looks something like:

ceph tell mds.<fs_name>:0 dump tree /home 0 | jq '.[0].dirfrags[] | .dir_auth'
"0"
"0"
"1"
"1"
"1"
"1"
"0"
"0"

Which tells you the dirfrags for /home are distributed across the
ranks (in this case, 0 and 1).

At that point, it should be fine to remove the rest of the manual pins.

> Is there any experience with ephemeral directory pinning?
> Or should one refrain from multi-mds at all?

It should work fine. Please give it a try and report back!

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] How can I use not-replicated pool (replication 1 or raid-0)

2023-04-10 Thread mhnx
Hello.

I have a 10-node cluster. I want to create a non-replicated pool
(replication 1) and I have some questions about it:

Let me tell you my use case:
- I don't care about losing data,
- All of my data is JUNK and these junk files are usually between 1 KB and 32 MB.
- These files will be deleted in 5 days.
- Writable space and I/O speed are more important.
- I have high write/read/delete operations, minimum 200 GB a day.

I'm afraid that, in case of a failure, I won't be able to access the whole
cluster. Losing data is okay, but I need to be able to ignore the missing
files, remove the lost data from the cluster, and continue with the existing
data, all while still being able to write new data to the cluster.

My questions are:
1- To reach this goal do you have any recommendations?
2- With this setup, what potential problems do you have in mind?
3- I think erasure coding is not an option because of the performance
problems and slow file deletion. With this kind of I/O, EC will miss files
and leaks may happen (I've seen this before on Nautilus).
4- Given my needs, is there a better way to do this?
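
To make it concrete, what I have in mind is roughly this (just a sketch; the
pool name and PG count are placeholders):

ceph config set global mon_allow_pool_size_one true
ceph osd pool create junk 128 128 replicated
ceph osd pool set junk size 1 --yes-i-really-mean-it
ceph osd pool set junk min_size 1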

Thank you for the answers.
Best regards.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Upgrade from 17.2.5 to 17.2.6 stuck at MDS

2023-04-10 Thread Thomas Widhalm

I did what you told me.

I also see in the log that the commands went through:

2023-04-10T19:58:46.522477+ mgr.ceph04.qaexpv [INF] Schedule 
redeploy daemon mds.mds01.ceph06.rrxmks
2023-04-10T20:01:03.360559+ mgr.ceph04.qaexpv [INF] Schedule 
redeploy daemon mds.mds01.ceph05.pqxmvt
2023-04-10T20:01:21.787635+ mgr.ceph04.qaexpv [INF] Schedule 
redeploy daemon mds.mds01.ceph07.omdisd



But the MDS never start. They stay in error state. I tried to redeploy
and start them a few times, and even restarted one host where an MDS should run.


mds.mds01.ceph03.xqwdjy  ceph03   error   32m ago   2M   -   -
mds.mds01.ceph04.hcmvae  ceph04   error   31m ago   2h   -   -
mds.mds01.ceph05.pqxmvt  ceph05   error   32m ago   9M   -   -
mds.mds01.ceph06.rrxmks  ceph06   error   32m ago   10w  -   -
mds.mds01.ceph07.omdisd  ceph07   error   32m ago   2M   -   -



Any other ideas? Or am I missing something?

Cheers,
Thomas

On 10.04.23 21:53, Adam King wrote:
Will also note that the normal upgrade process scales down the mds 
service to have only 1 mds per fs before upgrading it, so maybe 
something you'd want to do as well if the upgrade didn't do it already. 
It does so by setting the max_mds to 1 for the fs.


On Mon, Apr 10, 2023 at 3:51 PM Adam King wrote:


You could try pausing the upgrade and manually "upgrading" the mds
daemons by redeploying them on the new image. Something like "ceph
orch daemon redeploy <daemon name> --image <17.2.6 image>"
(daemon names should match those in "ceph orch ps" output). If you
do that for all of them and then get them into an up state you
should be able to resume the upgrade and have it complete.

On Mon, Apr 10, 2023 at 3:25 PM Thomas Widhalm
<widha...@widhalm.or.at> wrote:

Hi,

If you remember, I hit bug https://tracker.ceph.com/issues/58489
 so I
was very relieved when 17.2.6 was released and started to update
immediately.

But now I'm stuck again with my broken MDS. MDS won't get into
up:active
without the update but the update waits for them to get into
up:active
state. Seems like a deadlock / chicken-egg problem to me.

Since I'm still relatively new to Ceph, could you help me?

What I see when watching the update status:

{
      "target_image":

"quay.io/ceph/ceph@sha256:1161e35e4e02cf377c93b913ce78773f8413f5a8d7c5eaee4b4773a4f9dd6635 
",
      "in_progress": true,
      "which": "Upgrading all daemon types on all hosts",
      "services_complete": [
          "crash",
          "mgr",
         "mon",
         "osd"
      ],
      "progress": "18/40 daemons upgraded",
      "message": "Error: UPGRADE_OFFLINE_HOST: Upgrade: Failed
to connect
to host ceph01 at addr (192.168.23.61)",
      "is_paused": false
}

(The offline host was one host that broke during the upgrade. I
fixed
that in the meantime and the update went on.)

And in the log:

2023-04-10T19:23:48.750129+ mgr.ceph04.qaexpv [INF] Upgrade:
Waiting
for mds.mds01.ceph04.hcmvae to be up:active (currently up:replay)
2023-04-10T19:23:58.758141+ mgr.ceph04.qaexpv [WRN] Upgrade:
No mds
is up; continuing upgrade procedure to poke things in the right
direction


Please give me a hint what I can do.

Cheers,
Thomas
-- 
http://www.widhalm.or.at 

GnuPG : 6265BAE6 , A84CB603
Threema: H7AV7D33
Telegram, Signal: widha...@widhalm.or.at

___
ceph-users mailing list -- ceph-users@ceph.io

To unsubscribe send an email to ceph-users-le...@ceph.io




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Upgrade from 17.2.5 to 17.2.6 stuck at MDS

2023-04-10 Thread Adam King
Will also note that the normal upgrade process scales down the mds service
to have only 1 mds per fs before upgrading it, so maybe something you'd
want to do as well if the upgrade didn't do it already. It does so by
setting the max_mds to 1 for the fs.
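
For example, assuming the filesystem is called mds01, as the daemon names in
this thread suggest (substitute the name from "ceph fs ls"):

ceph fs set mds01 max_mds 1

and set max_mds back to its previous value once the upgrade has finished.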

On Mon, Apr 10, 2023 at 3:51 PM Adam King  wrote:

> You could try pausing the upgrade and manually "upgrading" the mds daemons
> by redeploying them on the new image. Something like "ceph orch daemon
> redeploy <daemon name> --image <17.2.6 image>" (daemon names should
> match those in "ceph orch ps" output). If you do that for all of them and
> then get them into an up state you should be able to resume the upgrade and
> have it complete.
>
> On Mon, Apr 10, 2023 at 3:25 PM Thomas Widhalm 
> wrote:
>
>> Hi,
>>
>> If you remember, I hit bug https://tracker.ceph.com/issues/58489 so I
>> was very relieved when 17.2.6 was released and started to update
>> immediately.
>>
>> But now I'm stuck again with my broken MDS. MDS won't get into up:active
>> without the update but the update waits for them to get into up:active
>> state. Seems like a deadlock / chicken-egg problem to me.
>>
>> Since I'm still relatively new to Ceph, could you help me?
>>
>> What I see when watching the update status:
>>
>> {
>>  "target_image":
>> "
>> quay.io/ceph/ceph@sha256:1161e35e4e02cf377c93b913ce78773f8413f5a8d7c5eaee4b4773a4f9dd6635
>> ",
>>  "in_progress": true,
>>  "which": "Upgrading all daemon types on all hosts",
>>  "services_complete": [
>>  "crash",
>>  "mgr",
>> "mon",
>> "osd"
>>  ],
>>  "progress": "18/40 daemons upgraded",
>>  "message": "Error: UPGRADE_OFFLINE_HOST: Upgrade: Failed to connect
>> to host ceph01 at addr (192.168.23.61)",
>>  "is_paused": false
>> }
>>
>> (The offline host was one host that broke during the upgrade. I fixed
>> that in the meantime and the update went on.)
>>
>> And in the log:
>>
>> 2023-04-10T19:23:48.750129+ mgr.ceph04.qaexpv [INF] Upgrade: Waiting
>> for mds.mds01.ceph04.hcmvae to be up:active (currently up:replay)
>> 2023-04-10T19:23:58.758141+ mgr.ceph04.qaexpv [WRN] Upgrade: No mds
>> is up; continuing upgrade procedure to poke things in the right direction
>>
>>
>> Please give me a hint what I can do.
>>
>> Cheers,
>> Thomas
>> --
>> http://www.widhalm.or.at
>> GnuPG : 6265BAE6 , A84CB603
>> Threema: H7AV7D33
>> Telegram, Signal: widha...@widhalm.or.at
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Upgrade from 17.2.5 to 17.2.6 stuck at MDS

2023-04-10 Thread Adam King
You could try pausing the upgrade and manually "upgrading" the mds daemons
by redeploying them on the new image. Something like "ceph orch daemon
redeploy <daemon name> --image <17.2.6 image>" (daemon names should
match those in "ceph orch ps" output). If you do that for all of them and
then get them into an up state you should be able to resume the upgrade and
have it complete.
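
A rough sketch of the whole sequence, using the v17.2.6 release tag as the
image (the sha-pinned image from your upgrade status works too) and example
daemon names; take the real names from "ceph orch ps":

ceph orch upgrade pause
ceph orch daemon redeploy mds.mds01.ceph06.rrxmks --image quay.io/ceph/ceph:v17.2.6
ceph orch daemon redeploy mds.mds01.ceph05.pqxmvt --image quay.io/ceph/ceph:v17.2.6
ceph orch ps --daemon-type mds   # wait for the daemons to come back up
ceph orch upgrade resume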

On Mon, Apr 10, 2023 at 3:25 PM Thomas Widhalm 
wrote:

> Hi,
>
> If you remember, I hit bug https://tracker.ceph.com/issues/58489 so I
> was very relieved when 17.2.6 was released and started to update
> immediately.
>
> But now I'm stuck again with my broken MDS. MDS won't get into up:active
> without the update but the update waits for them to get into up:active
> state. Seems like a deadlock / chicken-egg problem to me.
>
> Since I'm still relatively new to Ceph, could you help me?
>
> What I see when watching the update status:
>
> {
>  "target_image":
> "
> quay.io/ceph/ceph@sha256:1161e35e4e02cf377c93b913ce78773f8413f5a8d7c5eaee4b4773a4f9dd6635
> ",
>  "in_progress": true,
>  "which": "Upgrading all daemon types on all hosts",
>  "services_complete": [
>  "crash",
>  "mgr",
> "mon",
> "osd"
>  ],
>  "progress": "18/40 daemons upgraded",
>  "message": "Error: UPGRADE_OFFLINE_HOST: Upgrade: Failed to connect
> to host ceph01 at addr (192.168.23.61)",
>  "is_paused": false
> }
>
> (The offline host was one host that broke during the upgrade. I fixed
> that in the meantime and the update went on.)
>
> And in the log:
>
> 2023-04-10T19:23:48.750129+ mgr.ceph04.qaexpv [INF] Upgrade: Waiting
> for mds.mds01.ceph04.hcmvae to be up:active (currently up:replay)
> 2023-04-10T19:23:58.758141+ mgr.ceph04.qaexpv [WRN] Upgrade: No mds
> is up; continuing upgrade procedure to poke things in the right direction
>
>
> Please give me a hint what I can do.
>
> Cheers,
> Thomas
> --
> http://www.widhalm.or.at
> GnuPG : 6265BAE6 , A84CB603
> Threema: H7AV7D33
> Telegram, Signal: widha...@widhalm.or.at
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Upgrade from 17.2.5 to 17.2.6 stuck at MDS

2023-04-10 Thread Thomas Widhalm

Hi,

If you remember, I hit bug https://tracker.ceph.com/issues/58489 so I 
was very relieved when 17.2.6 was released and started to update 
immediately.


But now I'm stuck again with my broken MDS. MDS won't get into up:active 
without the update but the update waits for them to get into up:active 
state. Seems like a deadlock / chicken-egg problem to me.


Since I'm still relatively new to Ceph, could you help me?

What I see when watching the update status:

{
    "target_image": "quay.io/ceph/ceph@sha256:1161e35e4e02cf377c93b913ce78773f8413f5a8d7c5eaee4b4773a4f9dd6635",
    "in_progress": true,
    "which": "Upgrading all daemon types on all hosts",
    "services_complete": [
        "crash",
        "mgr",
        "mon",
        "osd"
    ],
    "progress": "18/40 daemons upgraded",
    "message": "Error: UPGRADE_OFFLINE_HOST: Upgrade: Failed to connect to host ceph01 at addr (192.168.23.61)",
    "is_paused": false
}

(The offline host was one host that broke during the upgrade. I fixed 
that in the meantime and the update went on.)


And in the log:

2023-04-10T19:23:48.750129+ mgr.ceph04.qaexpv [INF] Upgrade: Waiting 
for mds.mds01.ceph04.hcmvae to be up:active (currently up:replay)
2023-04-10T19:23:58.758141+ mgr.ceph04.qaexpv [WRN] Upgrade: No mds 
is up; continuing upgrade procedure to poke things in the right direction



Please give me a hint what I can do.

Cheers,
Thomas
--
http://www.widhalm.or.at
GnuPG : 6265BAE6 , A84CB603
Threema: H7AV7D33
Telegram, Signal: widha...@widhalm.or.at


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Disks are filling up even if there is not a single placement group on them

2023-04-10 Thread Alexander E. Patrakov
On Sat, Apr 8, 2023 at 2:26 PM Michal Strnad  wrote:
>cluster:
>  id: a12aa2d2-fae7-df35-ea2f-3de23100e345
>  health: HEALTH_WARN
...
>  pgs: 1656117639/32580808518 objects misplaced (5.083%)

That's why the space is eaten. The stuff that eats the disk space on
MONs is osdmaps, and the MONs have to keep old osdmaps back to the
moment in the past when the cluster was 100% healthy. Note that
osdmaps are also copied to all OSDs and eat space there, which is what
you have seen.
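
A quick way to see how many osdmap epochs the mons are holding on to (a
rough check, assuming jq is available):

ceph report 2>/dev/null | jq '.osdmap_first_committed, .osdmap_last_committed'

A healthy cluster normally keeps only a few hundred epochs between those two
numbers (mon_min_osdmap_epochs defaults to 500); a much larger gap means a
lot of old maps are being retained.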

The relevant (but dangerous) configuration parameter is
"mon_osd_force_trim_to". Better don't use it, and let your ceph
cluster recover. If you can't wait, try to use upmaps to tell Ceph that all
the PGs are fine where they are now, i.e. that they are not misplaced.
There is a script somewhere on GitHub that does this, but
unfortunately I can't find it right now.
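
The core of what such a script does is, for every misplaced PG, add an upmap
entry that replaces the CRUSH-chosen OSD with the OSD that currently holds
the data (the PG id and OSD ids below are made-up examples):

ceph osd pg-upmap-items 2.7 121 34   # in pg 2.7, use osd.34 (has the data) instead of osd.121

Once the up set matches the acting set, the PG is no longer counted as
misplaced and the mons can trim old osdmaps again.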


--
Alexander E. Patrakov
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] v17.2.6 Quincy released

2023-04-10 Thread Yuri Weinstein
We're happy to announce the 6th backport release in the Quincy series.

https://ceph.io/en/news/blog/2023/v17-2-6-quincy-released/

Notable Changes
---

* `ceph mgr dump` command now outputs `last_failure_osd_epoch` and
  `active_clients` fields at the top level. Previously, these fields were
  output under the `always_on_modules` field.

* telemetry: Added new metrics to the 'basic' channel to report per-pool
  bluestore compression metrics. See a sample report with `ceph telemetry preview`.
  Opt-in with `ceph telemetry on`.

Getting Ceph

* Git at git://github.com/ceph/ceph.git
* Tarball at https://download.ceph.com/tarballs/ceph-17.2.6.tar.gz
* Containers at https://quay.io/repository/ceph/ceph
* For packages, see https://docs.ceph.com/en/latest/install/get-packages/
* Release git sha1: d7ff0d10654d2280e08f1ab989c7cdf3064446a5
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io