[ceph-users] Re: the image used size becomes 0 after export/import with snapshot

2023-12-04 Thread Tony Liu
Hi Ilya,

That explains it. Thank you for clarification!

Tony

From: Ilya Dryomov 
Sent: December 4, 2023 09:40 AM
To: Tony Liu
Cc: ceph-users@ceph.io; d...@ceph.io
Subject: Re: [ceph-users] the image used size becomes 0 after export/import 
with snapshot

On Tue, Nov 28, 2023 at 8:18 AM Tony Liu  wrote:
>
> Hi,
>
> I have an image with a snapshot and some changes after snapshot.
> ```
> $ rbd du backup/f0408e1e-06b6-437b-a2b5-70e3751d0a26
> NAME                                                                                 PROVISIONED  USED
> f0408e1e-06b6-437b-a2b5-70e3751d0a26@snapshot-eb085877-7557-4620-9c01-c5587b857029        10 GiB  2.4 GiB
> f0408e1e-06b6-437b-a2b5-70e3751d0a26                                                       10 GiB  2.4 GiB
>                                                                                            10 GiB  4.8 GiB
> ```
> If there are no changes after the snapshot, the image line shows 0 used.
>
> I did export and import.
> ```
> $ rbd export --export-format 2 backup/f0408e1e-06b6-437b-a2b5-70e3751d0a26 - 
> | rbd import --export-format 2 - backup/test
> Exporting image: 100% complete...done.
> Importing image: 100% complete...done.
> ```
>
> When checking the imported image, the image line shows 0 used.
> ```
> $ rbd du backup/test
> NAME                                                 PROVISIONED  USED
> test@snapshot-eb085877-7557-4620-9c01-c5587b857029   10 GiB  2.4 GiB
> test 10 GiB  0 B
>   10 GiB  2.4 GiB
> ```
> Any clues how that happened? I'd expect the same du as the source.

Hi Tony,

"rbd import" command does zero detection at 4k granularity by default.
If the "after snapshot" changes just zeroed everything in the snapshot,
such a discrepancy in "rbd du" USED column is expected.

Thanks,

Ilya
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported"

2023-12-04 Thread Venky Shankar
On Tue, Dec 5, 2023 at 6:34 AM Xiubo Li  wrote:
>
>
> On 12/4/23 16:25, zxcs wrote:
> > Thanks a lot, Xiubo!
> >
> > we already set ‘mds_bal_interval’ to 0, and the slow MDS requests seem to have decreased.
> >
> > But somehow we still see the MDS complaining about slow requests, and in the MDS log we can see
> >
> > “slow request *** seconds old, received at 2023-12-04T…: internal op
> > exportdir:mds.* currently acquired locks”
> >
> > so our question is: why do we still see "internal op exportdir”? Does any other
> > config also need to be set to 0? Could you please shed some light on which config we
> > need to set?
> >
> IMO, this should be enough.
>
> Venky,
>
> Did I miss something here ?

You missed nothing. Setting `mds_bal_interval = 0` disables the
balancer. I guess there are in-progress exports that would take some
time to back off, and the slow ops should eventually get cleaned up.

I'd say wait a bit and see if the slow requests resolve by themselves.
FWIW, there was a feature request a while back to cancel an ongoing
export. We should prioritize having that.
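
For what it's worth, a quick way to confirm that the remaining slow ops are just
the draining "exportdir" internal ops is to dump the in-flight operations on the
active MDS; this is only a sketch, and mds.<name> is a placeholder for your
daemon name:

```
# on the host running the active MDS; "ops" is an alias for dump_ops_in_flight
ceph daemon mds.<name> ops | grep -c exportdir
ceph daemon mds.<name> dump_blocked_ops | head -50
# double-check the balancer is really off for all MDS daemons
ceph config get mds mds_bal_interval
```

If the exportdir count shrinks over time, the in-progress exports are just
draining and no additional config change should be needed.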

>
> Thanks
>
> - Xiubo
>
>
> > Thanks,
> > xz
> >
> >> 2023年11月27日 13:19,Xiubo Li  写道:
> >>
> >>
> >> On 11/27/23 13:12, zxcs wrote:
> >>> current, we using `ceph config set mds mds_bal_interval 3600` to set a 
> >>> fixed time(1 hour).
> >>>
> >>> we also have a question about how to set no balance for multi active mds.
> >>>
> >>> means, we will enable multi active mds(to improve throughput) and no 
> >>> balance for these mds.
> >>>
> >>> and if we set mds_bal_interval to a big number, it seems that can avoid this issue?
> >>>
> >> You can just set 'mds_bal_interval' to 0.
> >>
> >>
> >>>
> >>> Thanks,
> >>> xz
> >>>
>  2023年11月27日 10:56,Ben  写道:
> 
>  with the same mds configuration, we see exactly the same(problem, log and
>  solution) with 17.2.5, constantly happening again and again in couples 
>  days
>  intervals. MDS servers are stuck somewhere, ceph status reports no issue
>  however. We need to restart some of the mds (if not all of them) to 
>  restore
>  them back. Hopefully this could be fixed soon or get docs updated with
>  warning for the balancer's usage in production environment.
> 
>  thanks and regards
> 
>  Xiubo Li  于2023年11月23日周四 15:47写道:
> 
> > On 11/23/23 11:25, zxcs wrote:
> >> Thanks a ton, Xiubo!
> >>
> >> it does not disappear,
> >>
> >> even after we umount the ceph directory on these two old OS nodes.
> >>
> >> after dumping ops in flight, we can see some requests, and the earliest
> > complains “failed to authpin, subtree is being exported"
> >> And how can we avoid this? Would you please help to shed some light here?
> > Okay, as Frank mentioned you can try to disable the balancer by pinning
> > the directories. As I remember, the balancer is buggy.
> >
> > And also you can raise one ceph tracker and provide the debug logs if
> > you have.
> >
> > Thanks
> >
> > - Xiubo
> >
> >
> >> Thanks,
> >> xz
> >>
> >>
> >>> 2023年11月22日 19:44,Xiubo Li  写道:
> >>>
> >>>
> >>> On 11/22/23 16:02, zxcs wrote:
>  HI, Experts,
> 
>  we are using cephfs with  16.2.* with multi active mds, and recently,
> > we have two nodes mount with ceph-fuse due to the old os system.
>  and  one nodes run a python script with `glob.glob(path)`, and 
>  another
> > client doing `cp` operation on the same path.
>  then we see some log about `mds slow request`, and logs complain
> > “failed to authpin, subtree is being exported"
>  then need to restart mds,
> 
> 
>  our question is, does there any dead lock?  how can we avoid this and
> > how to fix it without restart mds(it will influence other users) ?
> >>> BTW, won't the slow requests disappear themself later ?
> >>>
> >>> It looks like the exporting is slow or there too many exports are 
> >>> going
> > on.
> >>> Thanks
> >>>
> >>> - Xiubo
> >>>
>  Thanks a ton!
> 
> 
>  xz
>  ___
>  ceph-users mailing list -- ceph-users@ceph.io
>  To unsubscribe send an email to ceph-users-le...@ceph.io
> >>> ___
> >>> ceph-users mailing list -- ceph-users@ceph.io
> >>> To unsubscribe send an email to ceph-users-le...@ceph.io
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
>  ___
>  ceph-users mailing list -- ceph-users@ceph.io
>  To unsubscribe send an email to ceph-users-le...@ceph.io
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to 

[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported"

2023-12-04 Thread Xiubo Li


On 12/4/23 16:25, zxcs wrote:

Thanks a lot, Xiubo!

we already set ‘mds_bal_interval’ to 0, and the slow MDS requests seem to have decreased.

But somehow we still see the MDS complaining about slow requests, and in the MDS log we can see

“slow request *** seconds old, received at 2023-12-04T…: internal op
exportdir:mds.* currently acquired locks”

so our question is: why do we still see "internal op exportdir”? Does any other config
also need to be set to 0? Could you please shed some light on which config we need to set?


IMO, this should be enough.

Venky,

Did I miss something here ?

Thanks

- Xiubo



Thanks,
xz


2023年11月27日 13:19,Xiubo Li  写道:


On 11/27/23 13:12, zxcs wrote:

currently, we are using `ceph config set mds mds_bal_interval 3600` to set a fixed
time (1 hour).

we also have a question about how to run with no balancing for multiple active MDS.

that is, we will enable multiple active MDS (to improve throughput) and no balancing
for these MDS.

and if we set mds_bal_interval to a big number, it seems that can avoid this issue?


You can just set 'mds_bal_interval' to 0.




Thanks,
xz


2023年11月27日 10:56,Ben  写道:

with the same mds configuration, we see exactly the same (problem, log and
solution) with 17.2.5, constantly happening again and again at intervals of a
couple of days. MDS servers are stuck somewhere, yet ceph status reports no issue.
We need to restart some of the mds (if not all of them) to restore
them. Hopefully this can be fixed soon, or the docs updated with a
warning about the balancer's usage in production environments.

thanks and regards

Xiubo Li  于2023年11月23日周四 15:47写道:


On 11/23/23 11:25, zxcs wrote:

Thanks a ton, Xiubo!

it does not disappear,

even after we umount the ceph directory on these two old OS nodes.

after dumping ops in flight, we can see some requests, and the earliest

complains “failed to authpin, subtree is being exported"

And how can we avoid this? Would you please help to shed some light here?

Okay, as Frank mentioned you can try to disable the balancer by pinning
the directories. As I remember, the balancer is buggy.
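
For reference, a minimal sketch of pinning directories via the ceph.dir.pin
extended attribute; the mount point and rank numbers below are placeholders for
your own layout:

```
# pin a top-level directory (and everything below it) to MDS rank 0,
# and another one to rank 1, so the balancer never migrates them
setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/projects
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/scratch
getfattr -n ceph.dir.pin /mnt/cephfs/projects
# setting the value to -1 reverts the directory to following its parent
```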

And also you can raise one ceph tracker and provide the debug logs if
you have.

Thanks

- Xiubo



Thanks,
xz



2023年11月22日 19:44,Xiubo Li  写道:


On 11/22/23 16:02, zxcs wrote:

Hi Experts,

we are using cephfs 16.2.* with multiple active MDS, and recently
we have two nodes mounted with ceph-fuse due to their old OS.

one node runs a python script with `glob.glob(path)`, while another
client is doing a `cp` operation on the same path.

then we see some logs about `mds slow request`, and the logs complain
“failed to authpin, subtree is being exported",
so we need to restart the MDS.


our question is: is there any deadlock? how can we avoid this, and
how can we fix it without restarting the MDS (a restart will influence other users)?

BTW, won't the slow requests disappear by themselves later?

It looks like the exporting is slow, or there are too many exports going on.

Thanks

- Xiubo


Thanks a ton!


xz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] EC Profiles & DR

2023-12-04 Thread duluxoz

Hi All,

Looking for some help/explanation around erasure code pools, etc.

I set up a 3-node Ceph (Quincy) cluster with each box holding 7 OSDs 
(HDDs) and each box running Monitor, Manager, and iSCSI Gateway. For the 
record the cluster runs beautifully, without resource issues, etc.


I created an Erasure Code Profile, etc:

~~~
ceph osd erasure-code-profile set my_ec_profile plugin=jerasure k=4 m=2 
crush-failure-domain=osd

ceph osd crush rule create-erasure my_ec_rule my_ec_profile
ceph osd crush rule create-replicated my_replicated_rule default host
~~~

My Crush Map is:

~~~
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class hdd
device 7 osd.7 class hdd
device 8 osd.8 class hdd
device 9 osd.9 class hdd
device 10 osd.10 class hdd
device 11 osd.11 class hdd
device 12 osd.12 class hdd
device 13 osd.13 class hdd
device 14 osd.14 class hdd
device 15 osd.15 class hdd
device 16 osd.16 class hdd
device 17 osd.17 class hdd
device 18 osd.18 class hdd
device 19 osd.19 class hdd
device 20 osd.20 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host ceph_1 {
  id -3    # do not change unnecessarily
  id -4 class hdd  # do not change unnecessarily
  # weight 38.09564
  alg straw2
  hash 0  # rjenkins1
  item osd.0 weight 5.34769
  item osd.1 weight 5.45799
  item osd.2 weight 5.45799
  item osd.3 weight 5.45799
  item osd.4 weight 5.45799
  item osd.5 weight 5.45799
  item osd.6 weight 5.45799
}
host ceph_2 {
  id -5    # do not change unnecessarily
  id -6 class hdd  # do not change unnecessarily
  # weight 38.09564
  alg straw2
  hash 0  # rjenkins1
  item osd.7 weight 5.34769
  item osd.8 weight 5.45799
  item osd.9 weight 5.45799
  item osd.10 weight 5.45799
  item osd.11 weight 5.45799
  item osd.12 weight 5.45799
  item osd.13 weight 5.45799
}
host ceph_3 {
  id -7    # do not change unnecessarily
  id -8 class hdd  # do not change unnecessarily
  # weight 38.09564
  alg straw2
  hash 0  # rjenkins1
  item osd.14 weight 5.34769
  item osd.15 weight 5.45799
  item osd.16 weight 5.45799
  item osd.17 weight 5.45799
  item osd.18 weight 5.45799
  item osd.19 weight 5.45799
  item osd.20 weight 5.45799
}
root default {
  id -1    # do not change unnecessarily
  id -2 class hdd  # do not change unnecessarily
  # weight 114.28693
  alg straw2
  hash 0  # rjenkins1
  item ceph_1 weight 38.09564
  item ceph_2 weight 38.09564
  item ceph_3 weight 38.09564
}

# rules
rule replicated_rule {
  id 0
  type replicated
  step take default
  step chooseleaf firstn 0 type host
  step emit
}
rule my_replicated_rule {
  id 1
  type replicated
  step take default
  step chooseleaf firstn 0 type host
  step emit
}
rule my_ec_rule {
  id 2
  type erasure
  step set_chooseleaf_tries 5
  step set_choose_tries 100
  step take default
  step choose indep 3 type host
  step chooseleaf indep 2 type osd
  step emit
}

# end crush map
~~~

Finally I create a pool:

~~~
ceph osd pool create my_pool 32 32 erasure my_ec_profile my_ec_rule
ceph osd pool application enable my_meta_pool rbd
rbd pool init my_meta_pool
rbd pool init my_pool
rbd create --size 16T my_pool/my_disk_1 --data-pool my_pool 
--image-feature journaling

~~~

So all this is to have some VMs (oVirt VMs, for the record) with 
automatic failover in the case of a Ceph Node loss - ie I was trying to 
"replicate" a 3-Disk RAID 5 array across the Ceph Nodes, so that I could 
lose a Node and still have a working set of VMs.


However, I took one of the Ceph Nodes down (gracefully) for some 
maintenance the other day and I lost *all* the VMs (ie oVirt complained 
that there was no active pool). As soon as I brought the down node back 
up everything was good again.


So my question is: What did I do wrong with my config?

Should I, for example, change the EC Profile to `k=2, m=1`, and how is 
that practically different from `k=4, m=2`? Yes, the latter spreads the 
pool over more disks, but it should still only put 2 disks on each node, 
shouldn't it?
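
(For anyone trying to reason about this, a quick sketch of how to check the
pool's min_size and the per-PG shard placement, using the pool name from above:)

```
# min_size is the number of shards that must stay available for the pool to accept I/O
ceph osd pool ls detail | grep my_pool
# the UP/ACTING columns show which OSDs (and therefore which hosts) hold each PG's shards
ceph pg ls-by-pool my_pool | head
```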


Thanks in advance

Cheers

Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph fs (meta) data inconsistent

2023-12-04 Thread Xiubo Li

Frank,

Using your script I still couldn't reproduce it. Locally my python 
version is 3.9.16, and I don't have other VMs to test other python 
versions.


Could you check the tracker and provide the debug logs there?

Thanks

- Xiubo

On 12/1/23 21:08, Frank Schilder wrote:

Hi Xiubo,

I uploaded a test script with session output showing the issue. When I look at 
your scripts, I can't see the stat-check on the second host anywhere. Hence, I 
don't really know what you are trying to compare.

If you want me to run your test scripts on our system for comparison, please 
include the part executed on the second host explicitly in an ssh-command. 
Running your scripts alone in their current form will not reproduce the issue.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Xiubo Li 
Sent: Monday, November 27, 2023 3:59 AM
To: Frank Schilder; Gregory Farnum
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: ceph fs (meta) data inconsistent


On 11/24/23 21:37, Frank Schilder wrote:

Hi Xiubo,

thanks for the update. I will test your scripts in our system next week. 
Something important: running both scripts on a single client will not produce a 
difference. You need 2 clients. The inconsistency is between clients, not on 
the same client. For example:

Frank,

Yeah, I did this with 2 different kclients.

Thanks


Setup: host1 and host2 with a kclient mount to a cephfs under /mnt/kcephfs

Test 1
- on host1: execute shutil.copy2
- execute ls -l /mnt/kcephfs/ on host1 and host2: same result

Test 2
- on host1: shutil.copy
- execute ls -l /mnt/kcephfs/ on host1 and host2: file size=0 on host 2 while 
correct on host 1

Your scripts only show output of one host, but the inconsistency requires two 
hosts for observation. The stat information is updated on host1, but not 
synchronized to host2 in the second test. In case you can't reproduce that, I 
will append results from our system to the case.
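
For completeness, a minimal sketch of the two-client check described above,
assuming /mnt/kcephfs is a kclient mount on both hosts, host2 is reachable over
ssh, and the file names are placeholders:

```
# host1: copy without metadata (shutil.copy), then stat locally
python3 -c "import shutil; shutil.copy('/mnt/kcephfs/src.bin', '/mnt/kcephfs/dst.bin')"
stat -c '%n %s' /mnt/kcephfs/dst.bin
# host2: stat the same file right afterwards; in the bad case the size shows 0 here
ssh host2 stat -c '%n %s' /mnt/kcephfs/dst.bin
```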

Also it would be important to know the python and libc versions. We observe 
this only for newer versions of both.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Xiubo Li 
Sent: Thursday, November 23, 2023 3:47 AM
To: Frank Schilder; Gregory Farnum
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: ceph fs (meta) data inconsistent

I just raised one tracker to follow this:
https://tracker.ceph.com/issues/63510

Thanks

- Xiubo






___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS stuck in up:rejoin

2023-12-04 Thread Venky Shankar
Hi Eric,

On Mon, Nov 27, 2023 at 8:00 PM Eric Tittley  wrote:
>
> Hi all,
>
> For about a week our CephFS has experienced issues with its MDS.
>
> Currently the MDS is stuck in "up:rejoin"
>
> Issues become apparent when simple commands like "mv foo bar/" hung.

I assume the MDS was active at this point in time where the command
hung. Would that be correct?

>
> I unmounted CephFS offline on the clients, evicted those remaining, and then 
> issued
>
> ceph config set mds.0 mds_wipe_sessions true
> ceph config set mds.1 mds_wipe_sessions true
>
> which allowed me to delete the hung requests.

Most likely, the above steps weren't really required. The hung command
was possibly due to a deadlock in the MDS (during rename).
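
If it happens again, it is usually more useful to capture MDS debug output
before wiping any state; a rough sketch, using the daemon name from the status
output below and leaving the exact log path elided:

```
# raise MDS verbosity while the daemon is stuck, then lower it again
ceph config set mds debug_mds 20
ceph config set mds debug_ms 1
# ... wait a minute, collect /var/log/ceph/.../ceph-mds.cephfs.ceph00.uvlkrw.log ...
ceph config set mds debug_mds 1/5
ceph config set mds debug_ms 0
```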

>
> I've lost the exact commands I used, but something like
> rados -p cephfs_metadata ls | grep mds
> rados rm -p cephfs_metadata mds0_openfiles.0
>
> etc
>
> This allowed the MDS to get to "up:rejoin" where it has been stuck ever since 
> which is getting on five days.
>
> # ceph mds stat
> cephfs:1/1 {0=cephfs.ceph00.uvlkrw=up:rejoin} 2 up:standby
>
>
>
> root@ceph00:/var/log/ceph/a614303a-5eb5-11ed-b492-011f01e12c9a# ceph -s
>   cluster:
> id: a614303a-5eb5-11ed-b492-011f01e12c9a
> health: HEALTH_WARN
> 1 filesystem is degraded
> 1 pgs not deep-scrubbed in time
> 2 pool(s) do not have an application enabled
> 1 daemons have recently crashed
>
>   services:
> mon: 3 daemons, quorum ceph00,ceph01,ceph02 (age 57m)
> mgr: ceph01.lvdgyr(active, since 2h), standbys: ceph00.gpwpgs
> mds: 1/1 daemons up, 2 standby
> osd: 91 osds: 90 up (since 78m), 90 in (since 112m)
>
>   data:
> volumes: 0/1 healthy, 1 recovering
> pools:   5 pools, 1539 pgs
> objects: 138.83M objects, 485 TiB
> usage:   971 TiB used, 348 TiB / 1.3 PiB avail
> pgs: 1527 active+clean
>  12   active+clean+scrubbing+deep
>
>   io:
> client:   3.1 MiB/s rd, 3.16k op/s rd, 0 op/s wr
>
>
> # ceph --version
> ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
>
>
> I've tried failing the MDS so it switches.  Rebooted a couple of times.
> I've added more OSDs to the metadata pool and took one out as I thought it 
> might be a bad metadata OSD (The "recently crashed" daemon).

This isn't really going to do any good btw.

The recently crashed daemon is likely the MDS which you mentioned in
your subsequent email.

>
> The error logs are full of
> (prefix to all are:
> Nov 27 14:02:44 ceph00 bash[2145]: debug 2023-11-27T14:02:44.619+ 
> 7f74e845e700  1 -- 
> [v2:192.168.1.128:6800/2157301677,v1:192.168.1.128:6801/2157301677] --> 
> [v2:192.168.1.133:6896/4289132926,v1:192.168.1.133:6897/4289132926]
> )
>
> crc :-1 s=READY pgs=12 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0 
> tx=0).send_message enqueueing message m=0x559be00adc00 type=42 
> osd_op(mds.0.36244:8142873 3.ff 3:ff5b34d6:::1.:head [getxattr parent 
> in=6b] snapc 0=[] 
> ondisk+read+known_if_redirected+full_force+supports_pool_eio e32465) v8
> crc :-1 s=READY pgs=12 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0 
> tx=0).write_message sending message m=0x559be00adc00 seq=8142643 
> osd_op(mds.0.36244:8142873 3.ff 3:ff5b34d6:::1.:head [getxattr parent 
> in=6b] snapc 0=[] 
> ondisk+read+known_if_redirected+full_force+supports_pool_eio e32465) v8
> crc :-1 s=THROTTLE_DONE pgs=12 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0 
> tx=0).handle_message got 154 + 0 + 30 byte message. envelope type=43 src 
> osd.89 off 0
> crc :-1 s=READ_MESSAGE_COMPLETE pgs=12 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp 
> rx=0 tx=0).handle_message received message m=0x559be01f4480 seq=8142643 
> from=osd.89 type=43 osd_op_reply(8142873 1. [getxattr (30) out=30b] 
> v0'0 uv560123 ondisk = 0) v8
> osd_op_reply(8142873 1. [getxattr (30) out=30b] v0'0 uv560123 ondisk 
> = 0) v8  154+0+30 (crc 0 0 0) 0x559be01f4480 con 0x559be00ad800
> osd_op(unknown.0.36244:8142874 3.ff 3:ff5b34d6:::1.:head [getxattr 
> parent in=6b] snapc 0=[] 
> ondisk+read+known_if_redirected+full_force+supports_pool_eio e32465) v8 -- 
> 0x559be2caec00 con 0x559be00ad800
>
>
>
>
> Repeating multiple times a second (and filling /var)
> Prior to taking one of the cephfs_metadata OSDs offline, these came from 
> communications from ceph00 to the node hosting the suspected bad OSD.
> Now they are between ceph00 and the host of the replacement metadata OSD.
>
> Does anyone have any suggestion on how to get the MDS to switch from 
> "up:rejoin" to "up:active"?
>
> Is there any way to debug this, to determine what the issue really is? I'm unable 
> to interpret the debug log.
>
> Cheers,
> Eric
>
> 
> Dr Eric Tittley
> Research Computing Officer www.roe.ac.uk/~ert
> Institute for Astronomy Royal Observatory, Edinburgh
>
>
>
>
> The University of Edinburgh is a charitable body, registered in 

[ceph-users] Re: [ext] CephFS pool not releasing space after data deletion

2023-12-04 Thread Venky Shankar
Hi Mathias/Frank,

(sorry for the late reply - this didn't get much attention including
the tracker report and eventually got parked).

Will have this looked into - expect an update in a day or two.

On Sat, Dec 2, 2023 at 5:46 PM Frank Schilder  wrote:
>
> Hi Mathias,
>
> have you made any progress on this? Did the capacity become available 
> eventually?
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Kuhring, Mathias 
> Sent: Friday, October 27, 2023 3:52 PM
> To: ceph-users@ceph.io; Frank Schilder
> Subject: Re: [ext] [ceph-users] CephFS pool not releasing space after data 
> deletion
>
> Dear ceph users,
>
> We are wondering, if this might be the same issue as with this bug:
> https://tracker.ceph.com/issues/52581
>
> Except that we seem to have been snapshots dangling on the old pool.
> And the bug report snapshots dangling on the new pool.
> But maybe it's both?
>
> I mean, once the global root layout was set to a new pool,
> the new pool became in charge of snapshotting at least the new data, right?
> What about data which is overwritten? Is there a conflict of responsibility?
>
> We do have similar listings of snaps with "ceph osd pool ls detail", I
> think:
>
> 0|0[root@osd-1 ~]# ceph osd pool ls detail | grep -B 1 removed_snaps_queue
> pool 1 'cephfs_data' replicated size 3 min_size 2 crush_rule 1
> object_hash rjenkins pg_num 115 pgp_num 107 pg_num_target 32
> pgp_num_target 32 autoscale_mode on last_change 803558 lfor
> 0/803250/803248 flags hashpspool,selfmanaged_snaps stripe_width 0
> expected_num_objects 1 application cephfs
>  removed_snaps_queue
> [3541~1,36e4~1,379f~2,3862~1,3876~1,387d~1,388b~1,389a~1,38a6~1,38bc~1,3993~1,3999~1,39a0~1,39a7~1,39ae~1,39b5~3,39be~1,39c5~1,39cc~1]
> --
> pool 3 'hdd_ec' erasure profile hdd_ec size 3 min_size 2 crush_rule 3
> object_hash rjenkins pg_num 2048 pgp_num 2048 autoscale_mode off
> last_change 803558 lfor 0/87229/87229 flags
> hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 8192 application
> cephfs
>  removed_snaps_queue
> [3541~1,36e4~1,379f~2,3862~1,3876~1,387d~1,388b~1,389a~1,38a6~1,38bc~1,3993~1,3999~1,39a0~1,39a7~1,39ae~1,39b5~3,39be~1,39c5~1,39cc~1]
> --
> pool 20 'hdd_ec_8_2_pool' erasure profile hdd_ec_8_2_profile size 10
> min_size 9 crush_rule 5 object_hash rjenkins pg_num 8192 pgp_num 8192
> autoscale_mode off last_change 803558 lfor 0/0/681917 flags
> hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 32768
> application cephfs
>  removed_snaps_queue
> [3541~1,36e4~1,379f~2,3862~1,3876~1,387d~1,388b~1,389a~1,38a6~1,38bc~1,3993~1,3999~1,39a0~1,39a7~1,39ae~1,39b5~3,39be~1,39c5~1,39cc~1]
>
>
> Here, pool hdd_ec_8_2_pool is the one we recently assigned to the root
> layout.
> Pool hdd_ec is the one which was assigned before and which won't release
> space (at least where I know of).
>
> Is this removed_snaps_queue the same as removed_snaps in the bug issue
> (i.e. the label was renamed)?
> And is it normal that all queues list the same info or should this be
> different per pool?
> Might this be related to pools now sharing responsibility over some
> snaps due to the layout changes?
>
> And for the big question:
> How can I actually trigger/speedup the removal of those snaps?
> I find the removed_snaps/removed_snaps_queue mentioned a few times in
> the user list.
> But never with some conclusive answer how to deal with them.
> And the only mentions in the docs are just change logs.
>
> I also looked into and started cephfs stray scrubbing:
> https://docs.ceph.com/en/latest/cephfs/scrub/#evaluate-strays-using-recursive-scrub
> But according to the status output, no scrubbing is actually active.
>
> I would appreciate any further ideas. Thanks a lot.
>
> Best Wishes,
> Mathias
>
> On 10/23/2023 12:42 PM, Kuhring, Mathias wrote:
> > Dear Ceph users,
> >
> > Our CephFS is not releasing/freeing up space after deleting hundreds of
> > terabytes of data.
> > By now, this drives us in a "nearfull" osd/pool situation and thus
> > throttles IO.
> >
> > We are on ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5)
> > quincy (stable).
> >
> > Recently, we moved a bunch of data to a new pool with better EC.
> > This was done by adding a new EC pool to the FS.
> > Then assigning the FS root to the new EC pool via the directory layout xattr
> > (so all new data is written to the new pool).
> > And finally copying old data to new folders.
> >
> > I swapped the data as follows to retain the old directory structures.
> > I also made snapshots for validation purposes.
> >
> > So basically:
> > cp -r mymount/mydata/ mymount/new/ # this creates copy on new pool
> > mkdir mymount/mydata/.snap/tovalidate
> > mkdir mymount/new/mydata/.snap/tovalidate
> > mv mymount/mydata/ mymount/old/
> > mv mymount/new/mydata mymount/
> >
> > I could see the increase of data in the new pool as expected (ceph df).
> > I 

[ceph-users] Re: the image used size becomes 0 after export/import with snapshot

2023-12-04 Thread Ilya Dryomov
On Tue, Nov 28, 2023 at 8:18 AM Tony Liu  wrote:
>
> Hi,
>
> I have an image with a snapshot and some changes after snapshot.
> ```
> $ rbd du backup/f0408e1e-06b6-437b-a2b5-70e3751d0a26
> NAME                                                                                 PROVISIONED  USED
> f0408e1e-06b6-437b-a2b5-70e3751d0a26@snapshot-eb085877-7557-4620-9c01-c5587b857029        10 GiB  2.4 GiB
> f0408e1e-06b6-437b-a2b5-70e3751d0a26                                                       10 GiB  2.4 GiB
>                                                                                            10 GiB  4.8 GiB
> ```
> If there are no changes after the snapshot, the image line shows 0 used.
>
> I did export and import.
> ```
> $ rbd export --export-format 2 backup/f0408e1e-06b6-437b-a2b5-70e3751d0a26 - 
> | rbd import --export-format 2 - backup/test
> Exporting image: 100% complete...done.
> Importing image: 100% complete...done.
> ```
>
> When checking the imported image, the image line shows 0 used.
> ```
> $ rbd du backup/test
> NAME                                                 PROVISIONED  USED
> test@snapshot-eb085877-7557-4620-9c01-c5587b857029   10 GiB  2.4 GiB
> test 10 GiB  0 B
>   10 GiB  2.4 GiB
> ```
> Any clues how that happened? I'd expect the same du as the source.

Hi Tony,

"rbd import" command does zero detection at 4k granularity by default.
If the "after snapshot" changes just zeroed everything in the snapshot,
such a discrepancy in "rbd du" USED column is expected.
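
A rough way to double-check this on the source image, and to play with the
detection granularity on import; the 4M value is only an example, and
--sparse-size assumes a reasonably recent rbd client:

```
# list the extents that changed after the snapshot; the Type column shows
# whether each changed extent is data or zero
rbd diff --from-snap snapshot-eb085877-7557-4620-9c01-c5587b857029 \
    backup/f0408e1e-06b6-437b-a2b5-70e3751d0a26
# re-import with a different sparse/zero-detection unit to compare
rbd export --export-format 2 backup/f0408e1e-06b6-437b-a2b5-70e3751d0a26 - \
  | rbd import --export-format 2 --sparse-size 4M - backup/test2
```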

Thanks,

Ilya
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Libvirt and Ceph: libvirtd tries to open random RBD images

2023-12-04 Thread Jayanth Reddy
Hello Eugen,
Thanks for the response. No, we don't have a pool named "rbd" or any
namespaces defined. I'll figure out a way to increase libvirtd debug level
and check.
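
(One way to do that, sketched here; the filter list is just an example and may
need tuning for your case:)

```
# append debug logging settings to /etc/libvirt/libvirtd.conf, then restart libvirtd
cat >> /etc/libvirt/libvirtd.conf <<'EOF'
log_filters="1:storage 1:qemu 3:object 3:event"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"
EOF
systemctl restart libvirtd
```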

Regards,
Jayanth

On Mon, Dec 4, 2023 at 3:16 PM Eugen Block  wrote:

> Hi,
>
> I'm not familiar with Cloudstack, I was just wondering if it tries to
> query the pool "rbd"? Some tools refer to a default pool "rbd" if no
> pool is specified. Do you have an "rbd" pool in that cluster?
> Another thought is namespaces: do you have those defined? Can you
> increase the debug level to see what exactly it tries to do?
>
> Regards,
> Eugen
>
> Zitat von Jayanth Reddy :
>
> > Hello Users,
> > We're using libvirt with KVM and the orchestrator is Cloudstack. I raised
> > the issue already at Cloudstack at
> > https://github.com/apache/cloudstack/issues/8211 but appears to be at
> > libvirtd. Did the same in libvirt ML at
> >
> https://lists.libvirt.org/archives/list/us...@lists.libvirt.org/thread/SA2I4QZGVVEIKPJU7E2KAFYYFZLJZDMV/
> > but I'm now here looking for answers.
> >
> > Below is our environment & issue description:
> >
> > Ceph: v17.2.0
> > Pool: replicated
> > Number of block images in this pool: more than 1250
> >
> > # virsh pool-info c15508c7-5c2c-317f-aa2e-29f307771415
> > Name:   c15508c7-5c2c-317f-aa2e-29f307771415
> > UUID:   c15508c7-5c2c-317f-aa2e-29f307771415
> > State:  running
> > Persistent: no
> > Autostart:  no
> > Capacity:   1.25 PiB
> > Allocation: 489.52 TiB
> > Available:  787.36 TiB
> >
> > # kvm --version
> > QEMU emulator version 4.2.1 (Debian 1:4.2-3ubuntu6.27)
> > Copyright (c) 2003-2019 Fabrice Bellard and the QEMU Project developers
> >
> > # libvirtd --version
> > libvirtd (libvirt) 6.0.0
> >
> > It appears that one of our Cloudstack KVM clusters having 8 hosts is
> having
> > the issue. We have HCI on these 8 hosts and there are around 700+ VMs
> > running. But strange enough, there are these logs like below on hosts.
> >
> >
> > Oct 25 13:38:11 hv-01 libvirtd[9464]: failed to open the RBD image
> > '087bb114-448a-41d2-9f5d-6865b62eed15': No such file or directory
> > Oct 25 20:35:22 hv-01 libvirtd[9464]: failed to open the RBD image
> > 'ccc1168a-5ffa-4b6d-a953-8e0ac788ebc5': No such file or directory
> > Oct 26 09:48:33 hv-01 libvirtd[9464]: failed to open the RBD image
> > 'a3fe82f8-afc9-4604-b55e-91b676514a18': No such file or directory
> >
> > We've got DNS servers on which there is an`A` record resolving to all the
> > IPv4 Addresses of 5 monitors and there have not been any issues with the
> > DNS resolution. But the issue of "failed to open the RBD image
> > 'ccc1168a-5ffa-4b6d-a953-8e0ac788ebc5': No such file or directory" gets
> > more weird because the VM that is making use of that RBD image lets say
> > "087bb114-448a-41d2-9f5d-6865b62eed15" is running on an altogether
> > different host like "hv-06". On further inspection of that specific
> Virtual
> > Machine, it has been running on that host "hv-06" for more than 4 months
> or
> > so. Fortunately, the Virtual Machine has no issues and has been running
> > since then. There are absolutely no issues with any of the Virtual
> Machines
> > because of these warnings.
> >
> > From libvirtd mailing lists, one of the community members helped me
> > understand that libvirt only tries to get the info of the images and
> > doesn't open for reading or writing. All hosts where there is libvirtd
> > tries doing the same. We manually did "virsh pool-refresh" which
> CloudStack
> > itself takes care of at regular intervals and the warning messages still
> > appear. Please help me find the cause and let me know if further
> > information is needed.
> >
> > Thanks,
> > Jayanth Reddy
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Nov/Dec Ceph Science Virtual User Group

2023-12-04 Thread Kevin Hrpcek

Hey All,

So I got busy and failed to get an email out with a couple of days' 
notice for last week, so let's meet up this week!  We will be having a 
Ceph science/research/big cluster call on Wednesday, December 6th. If 
anyone wants to discuss something specific they can add it to the pad 
linked below. If you have questions or comments you can contact me.


This is an informal open call of community members mostly from 
hpc/htc/research/big cluster environments (though anyone is welcome) 
where we discuss whatever is on our minds regarding ceph. Updates, 
outages, features, maintenance, etc...there is no set presenter but I do 
attempt to keep the conversation lively.


NOTE: We have changed to using Jitsi for the meeting. We are no longer using 
the bluejeans meeting links. The ceph calendar event does not yet 
reflect this and has the wrong day as well.


Pad URL:
https://pad.ceph.com/p/Ceph_Science_User_Group_20231206

Virtual event details:
December 6th, 2023
15:00 UTC
4pm Central European
9am Central US

Description: Main pad for discussions: 
https://pad.ceph.com/p/Ceph_Science_User_Group_Index

Meetings will be recorded and posted to the Ceph Youtube channel.

To join the meeting on a computer or mobile phone: 
https://meet.jit.si/ceph-science-wg


Kevin

--
Kevin Hrpcek
NASA VIIRS Atmosphere SIPS/TROPICS
Space Science & Engineering Center
University of Wisconsin-Madison
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS stuck in up:rejoin

2023-12-04 Thread Eric Tittley

I rebooted the servers and now the MDS won't start at all.

They give the (truncated) error:
    -1> 2023-12-04T14:44:41.354+ 7f6351715640 -1 
./src/mds/MDCache.cc: In function 'void MDCache::rejoin_send_rejoins()' 
thread 7f6351715640 time 2023-12-04T14:44:41.354292+

./src/mds/MDCache.cc: 4084: FAILED ceph_assert(auth >= 0)

Is this a file permission problem?

Eric


On 27/11/2023 14:29, Eric Tittley wrote:

Hi all,

For about a week our CephFS has experienced issues with its MDS.

Currently the MDS is stuck in "up:rejoin"

Issues become apparent when simple commands like "mv foo bar/" hung.

I unmounted CephFS offline on the clients, evicted those remaining, 
and then issued


ceph config set mds.0 mds_wipe_sessions true
ceph config set mds.1 mds_wipe_sessions true

which allowed me to delete the hung requests.

I've lost the exact commands I used, but something like
rados -p cephfs_metadata ls | grep mds
rados rm -p cephfs_metadata mds0_openfiles.0

etc

This allowed the MDS to get to "up:rejoin" where it has been stuck 
ever since which is getting on five days.


# ceph mds stat
cephfs:1/1 {0=cephfs.ceph00.uvlkrw=up:rejoin} 2 up:standby



root@ceph00:/var/log/ceph/a614303a-5eb5-11ed-b492-011f01e12c9a# ceph -s
 cluster:
   id: a614303a-5eb5-11ed-b492-011f01e12c9a
   health: HEALTH_WARN
   1 filesystem is degraded
   1 pgs not deep-scrubbed in time
   2 pool(s) do not have an application enabled
   1 daemons have recently crashed

 services:
   mon: 3 daemons, quorum ceph00,ceph01,ceph02 (age 57m)
   mgr: ceph01.lvdgyr(active, since 2h), standbys: ceph00.gpwpgs
   mds: 1/1 daemons up, 2 standby
   osd: 91 osds: 90 up (since 78m), 90 in (since 112m)

 data:
   volumes: 0/1 healthy, 1 recovering
   pools:   5 pools, 1539 pgs
   objects: 138.83M objects, 485 TiB
   usage:   971 TiB used, 348 TiB / 1.3 PiB avail
   pgs: 1527 active+clean
    12   active+clean+scrubbing+deep

 io:
   client:   3.1 MiB/s rd, 3.16k op/s rd, 0 op/s wr


# ceph --version
ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy 
(stable)



I've tried failing the MDS so it switches.  Rebooted a couple of times.
I've added more OSDs to the metadata pool and took one out as I 
thought it might be a bad metadata OSD (The "recently crashed" daemon).


The error logs are full of
(prefix to all are:
Nov 27 14:02:44 ceph00 bash[2145]: debug 2023-11-27T14:02:44.619+ 
7f74e845e700  1 -- 
[v2:192.168.1.128:6800/2157301677,v1:192.168.1.128:6801/2157301677] 
--> [v2:192.168.1.133:6896/4289132926,v1:192.168.1.133:6897/4289132926]

)

crc :-1 s=READY pgs=12 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0 
tx=0).send_message enqueueing message m=0x559be00adc00 type=42 
osd_op(mds.0.36244:8142873 3.ff 3:ff5b34d6:::1.:head [getxattr 
parent in=6b] snapc 0=[] 
ondisk+read+known_if_redirected+full_force+supports_pool_eio e32465) v8
crc :-1 s=READY pgs=12 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0 
tx=0).write_message sending message m=0x559be00adc00 seq=8142643 
osd_op(mds.0.36244:8142873 3.ff 3:ff5b34d6:::1.:head [getxattr 
parent in=6b] snapc 0=[] 
ondisk+read+known_if_redirected+full_force+supports_pool_eio e32465) v8
crc :-1 s=THROTTLE_DONE pgs=12 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp 
rx=0 tx=0).handle_message got 154 + 0 + 30 byte message. envelope 
type=43 src osd.89 off 0
crc :-1 s=READ_MESSAGE_COMPLETE pgs=12 cs=0 l=1 rev1=1 crypto rx=0 
tx=0 comp rx=0 tx=0).handle_message received message m=0x559be01f4480 
seq=8142643 from=osd.89 type=43 osd_op_reply(8142873 1. 
[getxattr (30) out=30b] v0'0 uv560123 ondisk = 0) v8
osd_op_reply(8142873 1. [getxattr (30) out=30b] v0'0 uv560123 
ondisk = 0) v8  154+0+30 (crc 0 0 0) 0x559be01f4480 con 
0x559be00ad800
osd_op(unknown.0.36244:8142874 3.ff 3:ff5b34d6:::1.:head 
[getxattr parent in=6b] snapc 0=[] 
ondisk+read+known_if_redirected+full_force+supports_pool_eio e32465) 
v8 -- 0x559be2caec00 con 0x559be00ad800





Repeating multiple times a second (and filling /var)
Prior to taking one of the cephfs_metadata OSDs offline, these came 
from communications from ceph00 to the node hosting the suspected bad 
OSD.

Now they are between ceph00 and the host of the replacement metadata OSD.

Does anyone have any suggestion on how to get the MDS to switch from 
"up:rejoin" to "up:active"?


Is there any way to debug this, to determine what the issue really is? I'm 
unable to interpret the debug log.


Cheers,
Eric


Dr Eric Tittley
Research Computing Officer www.roe.ac.uk/~ert
Institute for Astronomy Royal Observatory, Edinburgh




The University of Edinburgh is a charitable body, registered in 
Scotland, with registration number SC005336. Is e buidheann 
carthannais a th’ ann an Oilthigh Dhùn Èideann, clàraichte an Alba, 
àireamh clàraidh SC005336.

___
ceph-users mailing 

[ceph-users] Re: reef 18.2.1 QE Validation status

2023-12-04 Thread Venky Shankar
Hi Yuri,

On Fri, Dec 1, 2023 at 8:47 PM Yuri Weinstein  wrote:
>
> Venky, pls review the test results for smoke and fs after the PRs were merged.

fs run looks good. Summarized here

https://tracker.ceph.com/projects/cephfs/wiki/Reef#04-Dec-2023

>
> Radek, Igor, Adam - any updates on https://tracker.ceph.com/issues/63618?
>
> Thx
>
> On Thu, Nov 30, 2023 at 8:08 AM Yuri Weinstein  wrote:
> >
> > The fs PRs:
> > https://github.com/ceph/ceph/pull/54407
> > https://github.com/ceph/ceph/pull/54677
> > were approved/tested and ready for merge.
> >
> > What is the status/plan for https://tracker.ceph.com/issues/63618?
> >
> > On Wed, Nov 29, 2023 at 10:51 AM Igor Fedotov  wrote:
> > >
> > > https://tracker.ceph.com/issues/63618 to be considered as a blocker for
> > > the next Reef release.
> > >
> > > On 07/11/2023 00:30, Yuri Weinstein wrote:
> > > > Details of this release are summarized here:
> > > >
> > > > https://tracker.ceph.com/issues/63443#note-1
> > > >
> > > > Seeking approvals/reviews for:
> > > >
> > > > smoke - Laura, Radek, Prashant, Venky (POOL_APP_NOT_ENABLE failures)
> > > > rados - Neha, Radek, Travis, Ernesto, Adam King
> > > > rgw - Casey
> > > > fs - Venky
> > > > orch - Adam King
> > > > rbd - Ilya
> > > > krbd - Ilya
> > > > upgrade/quincy-x (reef) - Laura PTL
> > > > powercycle - Brad
> > > > perf-basic - Laura, Prashant (POOL_APP_NOT_ENABLE failures)
> > > >
> > > > Please reply to this email with approval and/or trackers of known
> > > > issues/PRs to address them.
> > > >
> > > > TIA
> > > > YuriW
> > > > ___
> > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



-- 
Cheers,
Venky
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Space reclaim doesn't happening in nautilus RBD pool

2023-12-04 Thread Ilya Dryomov
Hi Istvan,

The number of objects in "im" pool (918.34k) doesn't line up with
"rbd du" output which says that only 2.2T are provisioned (that would
take roughly ~576k objects).  This usually occurs when there are object
clones caused by previous snapshots -- keep in mind that trimming
object clones after a snapshot is removed is an asynchronous process
and it can take a while.
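
(The ~576k figure simply assumes the default 4 MiB RBD object size:)

```
# 2.2 TiB provisioned / 4 MiB per object: 2.2 * 2^20 MiB / 4 MiB
echo $(( 22 * 2**20 / 10 / 4 ))   # ~576716 objects
```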

Just to confirm, what is the output of "rbd info im/root",
"rbd snap ls --all im/root", "ceph df" (please recapture) and
"ceph osd pool ls detail" (only "im" pool is of interest)?

Thanks,

Ilya

On Fri, Dec 1, 2023 at 5:31 AM Szabo, Istvan (Agoda)
 wrote:
>
> Thrash empty.
>
> Istvan Szabo
> Staff Infrastructure Engineer
> ---
> Agoda Services Co., Ltd.
> e: istvan.sz...@agoda.com
> ---
>
>
>
> 
> From: Ilya Dryomov 
> Sent: Thursday, November 30, 2023 6:27 PM
> To: Szabo, Istvan (Agoda) 
> Cc: Ceph Users 
> Subject: Re: [ceph-users] Space reclaim doesn't happening in nautilus RBD pool
>
> Email received from the internet. If in doubt, don't click any link nor open 
> any attachment !
> 
>
> On Thu, Nov 30, 2023 at 8:25 AM Szabo, Istvan (Agoda)
>  wrote:
> >
> > Hi,
> >
> > Is there any config on Ceph that blocks / does not perform space reclaim?
> > I tested on one pool which has only one image, 1.8 TiB used.
> >
> >
> > rbd $p du im/root
> > warning: fast-diff map is not enabled for root. operation may be slow.
> > NAME  PROVISIONED  USED
> > root 2.2 TiB 1.8 TiB
> >
> >
> >
> > I already removed all snapshots and now the pool has only this one image.
> > I ran both fstrim over the filesystem (XFS) and tried rbd sparsify im/root
> > (I don't know what it does exactly, but it mentions reclaiming something).
> > It still shows the pool using 6.9 TiB, which totally does not make sense, right? It 
> > should be at most 3.6 (1.8 * 2) according to its replica count?
>
> Hi Istvan,
>
> Have you checked RBD trash?
>
> $ rbd trash ls -p im
>
> Thanks,
>
> Ilya
>
> 
>
> This message is confidential and is for the sole use of the intended 
> recipient(s). It may also be privileged or otherwise protected by copyright 
> or other legal rules. If you have received it by mistake please let us know 
> by reply email and delete it from your system. It is prohibited to copy this 
> message or disclose its content to anyone. Any confidentiality or privilege 
> is not waived or lost by any mistaken delivery or unauthorized disclosure of 
> the message. All messages sent to and from Agoda may be monitored to ensure 
> compliance with company policies, to protect the company's interests and to 
> remove potential malware. Electronic messages may be intercepted, amended, 
> lost or deleted, or contain viruses.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.14: osd crash, bdev() _aio_thread got r=-1 ((1) Operation not permitted)

2023-12-04 Thread Zakhar Kirpichenko
Hi,

Just to reiterate, I'm referring to an OSD crash loop because of the
following error:

"2023-12-03T04:00:36.686+ 7f08520e2700 -1 bdev(0x55f02a28a400
/var/lib/ceph/osd/ceph-56/block) _aio_thread got r=-1 ((1) Operation not
permitted)". More relevant log entries: https://pastebin.com/gDat6rfk

The crash log suggested that there could be a hardware issue, but there was
none: I was able to access the block device for testing purposes without
any issues, and the problem went away after I rebooted the host. This OSD
is currently operating without any issues under load.

Any ideas?

/Z

On Sun, 3 Dec 2023 at 16:09, Zakhar Kirpichenko  wrote:

> Thanks! The bug I referenced is the reason for the 1st OSD crash, but not
> for the subsequent crashes. The reason for those is described where you
> . I'm asking for help with that one.
>
> /Z
>
> On Sun, 3 Dec 2023 at 15:31, Kai Stian Olstad 
> wrote:
>
>> On Sun, Dec 03, 2023 at 06:53:08AM +0200, Zakhar Kirpichenko wrote:
>> >One of our 16.2.14 cluster OSDs crashed again because of the dreaded
>> >https://tracker.ceph.com/issues/53906 bug.
>>
>> 
>>
>> >It would be good to understand what has triggered this condition and how
>> it
>> >can be resolved without rebooting the whole host. I would very much
>> >appreciate any suggestions.
>>
>> If you look closely at 53906 you'll see it's a duplicate of
>> https://tracker.ceph.com/issues/53907
>>
>> In there you have the fix and a workaround until next minor is released.
>>
>> --
>> Kai Stian Olstad
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to identify the index pool real usage?

2023-12-04 Thread David C.
Yes that's right.
Test them on a single OSD, to validate.
Does your platform write a lot and everywhere?
From what I just saw, it seems to me that the discard only applies to
transactions (and not the entire disk).
If you can report back the results, that would be great.
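
(A minimal sketch of testing this on a single OSD, taking osd.195 from the
listing quoted further down; the bdev_* options are read at startup, so the OSD
most likely needs a restart for them to take effect:)

```
ceph config set osd.195 bdev_enable_discard true
ceph config set osd.195 bdev_async_discard true
# restart the OSD however you normally do, e.g. with cephadm:
ceph orch daemon restart osd.195
ceph config get osd.195 bdev_enable_discard   # verify
```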



Cordialement,

*David CASIER*




Le lun. 4 déc. 2023 à 10:14, Szabo, Istvan (Agoda) 
a écrit :

> Shouldn't these values be true to be able to do trimming?
>
> "bdev_async_discard": "false",
> "bdev_enable_discard": "false",
>
>
>
> Istvan Szabo
> Staff Infrastructure Engineer
> ---
> Agoda Services Co., Ltd.
> e: istvan.sz...@agoda.com
> ---
>
>
> --
> *From:* David C. 
> *Sent:* Monday, December 4, 2023 3:44 PM
> *To:* Szabo, Istvan (Agoda) 
> *Cc:* Anthony D'Atri ; Ceph Users <
> ceph-users@ceph.io>
> *Subject:* Re: [ceph-users] How to identify the index pool real usage?
>
> Email received from the internet. If in doubt, don't click any link nor
> open any attachment !
> --
> Hi,
>
> A flash system needs free space to work efficiently.
>
> Hence my hypothesis that fully allocated disks need to be notified of free
> blocks (trim)
> 
>
> Cordialement,
>
> *David CASIER*
> 
>
>
>
>
> Le lun. 4 déc. 2023 à 06:01, Szabo, Istvan (Agoda) 
> a écrit :
>
> With the nodes that has some free space on that namespace, we don't have
> issue, only with this which is weird.
> --
> *From:* Anthony D'Atri 
> *Sent:* Friday, December 1, 2023 10:53 PM
> *To:* David C. 
> *Cc:* Szabo, Istvan (Agoda) ; Ceph Users <
> ceph-users@ceph.io>
> *Subject:* Re: [ceph-users] How to identify the index pool real usage?
>
> Email received from the internet. If in doubt, don't click any link nor
> open any attachment !
> 
>
> >>
> >> Today we had a big issue with slow ops on the nvme drives which hold
> >> the index pool.
> >>
> >> Why does the nvme show full if on ceph it is barely utilized? Which one
> >> should I believe?
> >>
> >> When I check the ceph osd df it shows 10% usage of the osds (1x 2TB nvme
> >> drive has 4x osds on it):
>
> Why split each device into 4 very small OSDs?  You're losing a lot of
> capacity to overhead.
>
> >>
> >> ID   CLASS  WEIGHT   REWEIGHT  SIZE RAW USE  DATA OMAP
> META  AVAIL%USE   VAR   PGS  STATUS
> >> 195   nvme  0.43660   1.0  447 GiB   47 GiB  161 MiB   46 GiB   656
> MiB  400 GiB  10.47  0.21   64  up
> >> 252   nvme  0.43660   1.0  447 GiB   46 GiB  161 MiB   45 GiB   845
> MiB  401 GiB  10.35  0.21   64  up
> >> 253   nvme  0.43660   1.0  447 GiB   46 GiB  229 MiB   45 GiB   662
> MiB  401 GiB  10.26  0.21   66  up
> >> 254   nvme  0.43660   1.0  447 GiB   46 GiB  161 MiB   44 GiB   1.3
> GiB  401 GiB  10.26  0.21   65  up
> >> 255   nvme  0.43660   1.0  447 GiB   47 GiB  161 MiB   46 GiB   1.2
> GiB  400 GiB  10.58  0.21   64  up
> >> 288   nvme  0.43660   1.0  447 GiB   46 GiB  161 MiB   44 GiB   1.2
> GiB  401 GiB  10.25  0.21   64  up
> >> 289   nvme  0.43660   1.0  447 GiB   46 GiB  161 MiB   45 GiB   641
> MiB  401 GiB  10.33  0.21   64  up
> >> 290   nvme  0.43660   1.0  447 GiB   45 GiB  229 MiB   44 GiB   668
> MiB  402 GiB  10.14  0.21   65  up
> >>
> >> However in nvme list it says full:
> >> Node SN   ModelNamespace
> Usage  Format   FW Rev
> >>  
> --- -
> >> --  
>
> >> /dev/nvme0n1 90D0A00XTXTR KCD6XLUL1T92 1
> 1.92  TB /   1.92  TB512   B +  0 B   GPK6
> >> /dev/nvme1n1 60P0A003TXTR KCD6XLUL1T92 1
> 1.92  TB /   1.92  TB512   B +  0 B   GPK6
>
> That command isn't telling you what you think it is.  It has no awareness
> of actual data, it's looking at NVMe namespaces.
>
> >>
> >> With some other node the test was like:
> >>
> >>  *   if none of the disk full, no slow ops.
> >>  *   If 1x disk full and the other not, has slow ops but not too much
> >>  *   if none of the disk full, no slow ops.
> >>
> >> The full disks are very highly utilized during recovery and they are
> >> holding back the operations from the other nvmes.
> >>
> >> What's the reason that even if the pgs are the same in the cluster +/-1
> >> regarding space they are not equally utilized.
> >>
> >> Thank you
> >>
> >>
> >>
> >> 
> >> This message is confidential and is for the sole use of the intended
> >> recipient(s). It may also be privileged or otherwise protected by
> copyright
> 

[ceph-users] Re: Libvirt and Ceph: libvirtd tries to open random RBD images

2023-12-04 Thread Eugen Block

Hi,

I'm not familiar with Cloudstack, I was just wondering if it tries to  
query the pool "rbd"? Some tools refer to a default pool "rbd" if no  
pool is specified. Do you have an "rbd" pool in that cluster?
Another thought is namespaces: do you have those defined? Can you  
increase the debug level to see what exactly it tries to do?
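
(A quick way to check both of those, with a placeholder pool name:)

```
ceph osd lspools                       # is there a pool literally named "rbd"?
rbd namespace ls --pool <your-pool>    # any RBD namespaces defined in that pool?
```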


Regards,
Eugen

Zitat von Jayanth Reddy :


Hello Users,
We're using libvirt with KVM and the orchestrator is Cloudstack. I raised
the issue already at Cloudstack at
https://github.com/apache/cloudstack/issues/8211 but appears to be at
libvirtd. Did the same in libvirt ML at
https://lists.libvirt.org/archives/list/us...@lists.libvirt.org/thread/SA2I4QZGVVEIKPJU7E2KAFYYFZLJZDMV/
but I'm now here looking for answers.

Below is our environment & issue description:

Ceph: v17.2.0
Pool: replicated
Number of block images in this pool: more than 1250

# virsh pool-info c15508c7-5c2c-317f-aa2e-29f307771415
Name:   c15508c7-5c2c-317f-aa2e-29f307771415
UUID:   c15508c7-5c2c-317f-aa2e-29f307771415
State:  running
Persistent: no
Autostart:  no
Capacity:   1.25 PiB
Allocation: 489.52 TiB
Available:  787.36 TiB

# kvm --version
QEMU emulator version 4.2.1 (Debian 1:4.2-3ubuntu6.27)
Copyright (c) 2003-2019 Fabrice Bellard and the QEMU Project developers

# libvirtd --version
libvirtd (libvirt) 6.0.0

It appears that one of our Cloudstack KVM clusters having 8 hosts is having
the issue. We have HCI on these 8 hosts and there are around 700+ VMs
running. But strange enough, there are these logs like below on hosts.


Oct 25 13:38:11 hv-01 libvirtd[9464]: failed to open the RBD image
'087bb114-448a-41d2-9f5d-6865b62eed15': No such file or directory
Oct 25 20:35:22 hv-01 libvirtd[9464]: failed to open the RBD image
'ccc1168a-5ffa-4b6d-a953-8e0ac788ebc5': No such file or directory
Oct 26 09:48:33 hv-01 libvirtd[9464]: failed to open the RBD image
'a3fe82f8-afc9-4604-b55e-91b676514a18': No such file or directory

We've got DNS servers on which there is an`A` record resolving to all the
IPv4 Addresses of 5 monitors and there have not been any issues with the
DNS resolution. But the issue of "failed to open the RBD image
'ccc1168a-5ffa-4b6d-a953-8e0ac788ebc5': No such file or directory" gets
more weird because the VM that is making use of that RBD image lets say
"087bb114-448a-41d2-9f5d-6865b62eed15" is running on an altogether
different host like "hv-06". On further inspection of that specific Virtual
Machine, it has been running on that host "hv-06" for more than 4 months or
so. Fortunately, the Virtual Machine has no issues and has been running
since then. There are absolutely no issues with any of the Virtual Machines
because of these warnings.

From libvirtd mailing lists, one of the community members helped me
understand that libvirt only tries to get the info of the images and
doesn't open for reading or writing. All hosts where there is libvirtd
tries doing the same. We manually did "virsh pool-refresh" which CloudStack
itself takes care of at regular intervals and the warning messages still
appear. Please help me find the cause and let me know if further
information is needed.

Thanks,
Jayanth Reddy
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to identify the index pool real usage?

2023-12-04 Thread Szabo, Istvan (Agoda)
Shouldn't these values be true to be able to do trimming?


"bdev_async_discard": "false",
"bdev_enable_discard": "false",



Istvan Szabo
Staff Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---




From: David C. 
Sent: Monday, December 4, 2023 3:44 PM
To: Szabo, Istvan (Agoda) 
Cc: Anthony D'Atri ; Ceph Users 
Subject: Re: [ceph-users] How to identify the index pool real usage?

Email received from the internet. If in doubt, don't click any link nor open 
any attachment !

Hi,

A flash system needs free space to work efficiently.

Hence my hypothesis that fully allocated disks need to be notified of free 
blocks (trim)


Cordialement,

David CASIER





Le lun. 4 déc. 2023 à 06:01, Szabo, Istvan (Agoda) 
mailto:istvan.sz...@agoda.com>> a écrit :
With the nodes that has some free space on that namespace, we don't have issue, 
only with this which is weird.

From: Anthony D'Atri mailto:anthony.da...@gmail.com>>
Sent: Friday, December 1, 2023 10:53 PM
To: David C. mailto:david.cas...@aevoo.fr>>
Cc: Szabo, Istvan (Agoda) 
mailto:istvan.sz...@agoda.com>>; Ceph Users 
mailto:ceph-users@ceph.io>>
Subject: Re: [ceph-users] How to identify the index pool real usage?

Email received from the internet. If in doubt, don't click any link nor open 
any attachment !


>>
>> Today we had a big issue with slow ops on the nvme drives which hold
>> the index pool.
>>
>> Why does the nvme show full if on ceph it is barely utilized? Which one should I
>> believe?
>>
>> When I check the ceph osd df it shows 10% usage of the osds (1x 2TB nvme
>> drive has 4x osds on it):

Why split each device into 4 very small OSDs?  You're losing a lot of capacity 
to overhead.

>>
>> ID   CLASS  WEIGHT   REWEIGHT  SIZE RAW USE  DATA OMAP META  
>> AVAIL%USE   VAR   PGS  STATUS
>> 195   nvme  0.43660   1.0  447 GiB   47 GiB  161 MiB   46 GiB   656 MiB  
>> 400 GiB  10.47  0.21   64  up
>> 252   nvme  0.43660   1.0  447 GiB   46 GiB  161 MiB   45 GiB   845 MiB  
>> 401 GiB  10.35  0.21   64  up
>> 253   nvme  0.43660   1.0  447 GiB   46 GiB  229 MiB   45 GiB   662 MiB  
>> 401 GiB  10.26  0.21   66  up
>> 254   nvme  0.43660   1.0  447 GiB   46 GiB  161 MiB   44 GiB   1.3 GiB  
>> 401 GiB  10.26  0.21   65  up
>> 255   nvme  0.43660   1.0  447 GiB   47 GiB  161 MiB   46 GiB   1.2 GiB  
>> 400 GiB  10.58  0.21   64  up
>> 288   nvme  0.43660   1.0  447 GiB   46 GiB  161 MiB   44 GiB   1.2 GiB  
>> 401 GiB  10.25  0.21   64  up
>> 289   nvme  0.43660   1.0  447 GiB   46 GiB  161 MiB   45 GiB   641 MiB  
>> 401 GiB  10.33  0.21   64  up
>> 290   nvme  0.43660   1.0  447 GiB   45 GiB  229 MiB   44 GiB   668 MiB  
>> 402 GiB  10.14  0.21   65  up
>>
>> However in nvme list it says full:
>> Node SN   ModelNamespace Usage   
>>Format   FW Rev
>>   
>> --- -
>> --  

>> /dev/nvme0n1 90D0A00XTXTR KCD6XLUL1T92 1   1.92  TB 
>> /   1.92  TB512   B +  0 B   GPK6
>> /dev/nvme1n1 60P0A003TXTR KCD6XLUL1T92 1   1.92  TB 
>> /   1.92  TB512   B +  0 B   GPK6

That command isn't telling you what you think it is.  It has no awareness of 
actual data, it's looking at NVMe namespaces.

>>
>> With some other nodes the test went like this:
>>
>>  *   if none of the disks is full, no slow ops.
>>  *   if 1x disk is full and the others are not, there are slow ops but not too many
>>  *   if none of the disks is full, no slow ops.
>>
>> The full disks are very highly utilized during recovery and they are
>> holding back the operations on the other nvmes.
>>
>> What's the reason that, even though the pgs are the same across the cluster
>> (+/-1), they are not equally utilized in terms of space?
>>
>> Thank you
>>
>>
>>
>> 

[ceph-users] Re: How to identify the index pool real usage?

2023-12-04 Thread David C.
Hi,

A flash system needs free space to work efficiently.

Hence my hypothesis that fully allocated disks need to be notified of free
blocks (trim)
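
(For illustration, a minimal sketch of one way to hand the free blocks back on
a drive that is already fully allocated, assuming the OSDs on it have been
drained and destroyed first; the device path is a placeholder and blkdiscard
wipes the device:)

```
# only after every OSD on this device has been drained and destroyed
blkdiscard /dev/nvme0n1
# then redeploy the OSDs, ideally with bdev_enable_discard=true so the
# allocation does not creep back to 100% over time
```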


Cordialement,

*David CASIER*





On Mon, Dec 4, 2023 at 06:01, Szabo, Istvan (Agoda) 
wrote:

> With the nodes that have some free space in that namespace, we don't have
> the issue, only with this one, which is weird.
> --
> *From:* Anthony D'Atri 
> *Sent:* Friday, December 1, 2023 10:53 PM
> *To:* David C. 
> *Cc:* Szabo, Istvan (Agoda) ; Ceph Users <
> ceph-users@ceph.io>
> *Subject:* Re: [ceph-users] How to identify the index pool real usage?
>
> 
>
> >>
> >> Today we had a big issue with slow ops on the nvme drives which hold
> >> the index pool.
> >>
> >> Why does the nvme show as full if in ceph it is barely utilized? Which one
> >> should I believe?
> >>
> >> When I check ceph osd df it shows 10% usage of the osds (1x 2TB nvme
> >> drive has 4x osds on it):
>
> Why split each device into 4 very small OSDs?  You're losing a lot of
> capacity to overhead.
>
> >>
> >> ID    CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
> >> 195   nvme  0.43660   1.0  447 GiB   47 GiB  161 MiB   46 GiB   656 MiB  400 GiB  10.47  0.21   64  up
> >> 252   nvme  0.43660   1.0  447 GiB   46 GiB  161 MiB   45 GiB   845 MiB  401 GiB  10.35  0.21   64  up
> >> 253   nvme  0.43660   1.0  447 GiB   46 GiB  229 MiB   45 GiB   662 MiB  401 GiB  10.26  0.21   66  up
> >> 254   nvme  0.43660   1.0  447 GiB   46 GiB  161 MiB   44 GiB   1.3 GiB  401 GiB  10.26  0.21   65  up
> >> 255   nvme  0.43660   1.0  447 GiB   47 GiB  161 MiB   46 GiB   1.2 GiB  400 GiB  10.58  0.21   64  up
> >> 288   nvme  0.43660   1.0  447 GiB   46 GiB  161 MiB   44 GiB   1.2 GiB  401 GiB  10.25  0.21   64  up
> >> 289   nvme  0.43660   1.0  447 GiB   46 GiB  161 MiB   45 GiB   641 MiB  401 GiB  10.33  0.21   64  up
> >> 290   nvme  0.43660   1.0  447 GiB   45 GiB  229 MiB   44 GiB   668 MiB  402 GiB  10.14  0.21   65  up
> >>
> >> However in nvme list it says full:
> >> Node              SN            Model           Namespace  Usage                    Format         FW Rev
> >> ----------------  ------------  --------------  ---------  -----------------------  -------------  ------
> >> /dev/nvme0n1      90D0A00XTXTR  KCD6XLUL1T92    1          1.92  TB /   1.92  TB    512   B +  0 B  GPK6
> >> /dev/nvme1n1      60P0A003TXTR  KCD6XLUL1T92    1          1.92  TB /   1.92  TB    512   B +  0 B  GPK6
>
> That command isn't telling you what you think it is.  It has no awareness
> of actual data, it's looking at NVMe namespaces.
>
> >>
> >> With some other nodes the test went like this:
> >>
> >>  *   if none of the disks is full, no slow ops.
> >>  *   if 1x disk is full and the others are not, there are slow ops but not too many
> >>  *   if none of the disks is full, no slow ops.
> >>
> >> The full disks are very highly utilized during recovery and they are
> >> holding back the operations on the other nvmes.
> >>
> >> What's the reason that, even though the pgs are the same across the cluster
> >> (+/-1), they are not equally utilized in terms of space?
> >>
> >> Thank you
> >>
> >>
> >>
> >> 
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>

[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported"

2023-12-04 Thread zxcs
Thanks a lot, Xiubo!

we already set ‘mds_bal_interval’ to 0, and the slow MDS requests seem to have decreased.

But somehow we still see the MDS complaining about slow requests, and in the MDS log we can see:

“slow request *** seconds old, received at 2023-12-04T…: internal op 
exportdir:mds.* currently acquired locks”

So our question is: why do we still see "internal op exportdir”? Does any other config 
also need to be set to 0? Could you please shed some light on which config we need to set?
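
(For reference, a minimal sketch of how the balancer can be kept out of the
picture and the remaining exportdir ops watched; the path, rank and MDS name
below are placeholders, and the "ceph daemon" call has to run on the MDS host:)

```
# balancer disabled
ceph config set mds mds_bal_interval 0

# pin a top-level directory to a fixed rank so it is never exported again
setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/some/dir

# check whether the old exportdir ops eventually drain away
ceph daemon mds.<name> dump_ops_in_flight | grep -i exportdir
```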


Thanks,
xz 

> On Nov 27, 2023, at 13:19, Xiubo Li  wrote:
> 
> 
> On 11/27/23 13:12, zxcs wrote:
>> currently, we are using `ceph config set mds mds_bal_interval 3600` to set a fixed 
>> interval (1 hour).
>> 
>> we also have a question about how to disable balancing for multiple active mds.
>> 
>> that is, we will enable multiple active mds (to improve throughput) with no balancing 
>> for these mds.
>> 
>> and if we set mds_bal_interval to a big number, would that avoid this issue?
>> 
> You can just set 'mds_bal_interval' to 0.
> 
> 
>> 
>> 
>> Thanks,
>> xz
>> 
>>> On Nov 27, 2023, at 10:56, Ben  wrote:
>>> 
>>> with the same mds configuration, we see exactly the same (problem, log and
>>> solution) with 17.2.5, constantly happening again and again at intervals of a
>>> couple of days. The MDS servers get stuck somewhere, yet ceph status reports no
>>> issue. We need to restart some of the mds daemons (if not all of them) to restore
>>> them. Hopefully this can be fixed soon, or the docs updated with a
>>> warning about the balancer's usage in production environments.
>>> 
>>> thanks and regards
>>> 
 Xiubo Li  wrote on Thu, Nov 23, 2023 at 15:47:
>>> 
 
 On 11/23/23 11:25, zxcs wrote:
> Thanks a ton, Xiubo!
> 
> it does not disappear.
> 
> even after we umount the ceph directory on these two old-OS nodes.
> 
> after dumping ops in flight, we can see some requests, and the earliest ones
 complain “failed to authpin, subtree is being exported"
> 
> And how can we avoid this? Would you please shed some light here?
 
 Okay, as Frank mentioned, you can try to disable the balancer by pinning
 the directories. As I recall, the balancer is buggy.
 
 And also you can raise one ceph tracker and provide the debug logs if
 you have.
 
 Thanks
 
 - Xiubo
 
 
> Thanks,
> xz
> 
> 
>> 2023年11月22日 19:44,Xiubo Li  写道:
>> 
>> 
>> On 11/22/23 16:02, zxcs wrote:
>>> HI, Experts,
>>> 
>>> we are using cephfs 16.2.* with multiple active mds, and recently
 we have two nodes mounted with ceph-fuse due to their old OS.
>>> 
>>> and one node runs a python script with `glob.glob(path)`, and another
 client does a `cp` operation on the same path.
>>> 
>>> then we see some log about `mds slow request`, and logs complain
 “failed to authpin, subtree is being exported"
>>> 
>>> then we need to restart the mds,
>>> 
>>> 
>>> our question is: is there any deadlock? how can we avoid this and
 how can we fix it without restarting the mds (it will influence other users)?
>> BTW, won't the slow requests disappear by themselves later?
>> 
>> It looks like the exporting is slow or there are too many exports going
 on.
>> 
>> Thanks
>> 
>> - Xiubo
>> 
>>> Thanks a ton!
>>> 
>>> 
>>> xz
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
 ___
 ceph-users mailing list -- ceph-users@ceph.io
 To unsubscribe send an email to ceph-users-le...@ceph.io
 
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io