[ceph-users] Re: Need easy way to calculate Ceph cluster space for SolarWinds

2024-03-20 Thread Anthony D'Atri
0.84 to 1.22 is a pretty big spread.  I suspect your balancer is turned off or 
something in your CRUSH map is confounding it.
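
For what it's worth, a minimal sketch of checking and re-enabling the balancer 
(upmap mode assumed; requires luminous-or-newer clients):

ceph balancer status
# if it reports "active: false", enable it; upmap usually evens things out best
ceph osd set-require-min-compat-client luminous   # only if your clients allow it
ceph balancer mode upmap
ceph balancer on
# the score should shrink toward zero as PGs are moved
ceph balancer eval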

> On Mar 20, 2024, at 5:20 PM, Michael Worsham  
> wrote:
> 
> It seems to be relatively close to that +/- 1.00 range.
> 
> ubuntu@juju-5dcfd8-3-lxd-2:~$ sudo ceph osd df
> ID  CLASS  WEIGHTREWEIGHT  SIZE RAW USE  DATA OMAP META 
> AVAIL%USE   VAR   PGS  STATUS
> 1ssd  18.19040   1.0   18 TiB   10 TiB   10 TiB   11 GiB   38 GiB  
> 8.1 TiB  55.59  0.92  174  up
> 4ssd  18.19040   1.0   18 TiB   11 TiB   11 TiB   20 GiB   34 GiB  
> 7.3 TiB  59.90  0.99  175  up
> 9ssd  18.19040   1.0   18 TiB  9.7 TiB  9.6 TiB   23 GiB   35 GiB  
> 8.5 TiB  53.11  0.88  185  up
> 13ssd  18.19040   1.0   18 TiB   13 TiB   13 TiB   19 GiB   48 GiB  
> 4.8 TiB  73.87  1.22  199  up
> 17ssd  18.19040   1.0   18 TiB   12 TiB   12 TiB   14 GiB   55 GiB  
> 5.9 TiB  67.49  1.12  185  up
> 21ssd  18.19040   1.0   18 TiB   10 TiB   10 TiB   24 GiB   38 GiB  
> 7.8 TiB  57.27  0.95  179  up
> 25ssd  18.19040   1.0   18 TiB   12 TiB   12 TiB   20 GiB   57 GiB  
> 6.1 TiB  66.70  1.10  192  up
> 29ssd  18.19040   1.0   18 TiB  9.2 TiB  9.2 TiB   15 GiB   34 GiB  
> 9.0 TiB  50.61  0.84  170  up
> 33ssd  18.19040   1.0   18 TiB  9.2 TiB  9.1 TiB   13 GiB   36 GiB  
> 9.0 TiB  50.56  0.84  180  up
> 39ssd  18.19040   1.0   18 TiB   12 TiB   12 TiB  4.3 GiB   45 GiB  
> 6.0 TiB  66.84  1.11  188  up
> 44ssd  18.19040   1.0   18 TiB   13 TiB   13 TiB   10 GiB   56 GiB  
> 5.3 TiB  70.59  1.17  187  up
> 46ssd  18.19040   1.0   18 TiB   10 TiB   10 TiB   14 GiB   44 GiB  
> 7.8 TiB  57.24  0.95  174  up
> 0ssd  18.19040   1.0   18 TiB   11 TiB   11 TiB  5.0 GiB   38 GiB  
> 6.9 TiB  62.13  1.03  172  up
> 5ssd  18.19040   1.0   18 TiB   11 TiB   11 TiB   11 GiB   44 GiB  
> 7.3 TiB  59.69  0.99  177  up
> 10ssd  18.19040   1.0   18 TiB   12 TiB   11 TiB   18 GiB   47 GiB  
> 6.7 TiB  63.34  1.05  190  up
> 14ssd  18.19040   1.0   18 TiB   12 TiB   12 TiB  5.1 GiB   48 GiB  
> 6.3 TiB  65.51  1.08  189  up
> 18ssd  18.19040   1.0   18 TiB  9.7 TiB  9.6 TiB   13 GiB   33 GiB  
> 8.5 TiB  53.14  0.88  175  up
> 22ssd  18.19040   1.0   18 TiB   11 TiB   11 TiB   20 GiB   42 GiB  
> 7.3 TiB  59.61  0.99  183  up
> 26ssd  18.19040   1.0   18 TiB  9.8 TiB  9.8 TiB  9.9 GiB   31 GiB  
> 8.4 TiB  53.88  0.89  186  up
> 30ssd  18.19040   1.0   18 TiB   12 TiB   12 TiB  4.5 GiB   56 GiB  
> 6.4 TiB  64.65  1.07  179  up
> 34ssd  18.19040   1.0   18 TiB   13 TiB   13 TiB   18 GiB   49 GiB  
> 4.9 TiB  73.17  1.21  192  up
> 38ssd  18.19040   1.0   18 TiB   12 TiB   12 TiB   16 GiB   51 GiB  
> 6.4 TiB  64.67  1.07  186  up
> 40ssd  18.19040   1.0   18 TiB   13 TiB   12 TiB   23 GiB   52 GiB  
> 5.7 TiB  68.91  1.14  184  up
> 42ssd  18.19040   1.0   18 TiB  9.7 TiB  9.7 TiB   14 GiB   26 GiB  
> 8.5 TiB  53.35  0.88  171  up
> 3ssd  18.19040   1.0   18 TiB  9.9 TiB  9.8 TiB   15 GiB   36 GiB  
> 8.3 TiB  54.42  0.90  184  up
> 7ssd  18.19040   1.0   18 TiB   10 TiB   10 TiB  5.4 GiB   36 GiB  
> 7.9 TiB  56.73  0.94  184  up
> 11ssd  18.19040   1.0   18 TiB   12 TiB   12 TiB  9.6 GiB   54 GiB  
> 6.6 TiB  63.76  1.06  188  up
> 15ssd  18.19040   1.0   18 TiB   11 TiB   11 TiB  7.2 GiB   47 GiB  
> 7.5 TiB  58.51  0.97  192  up
> 19ssd  18.19040   1.0   18 TiB   11 TiB   11 TiB   19 GiB   41 GiB  
> 7.3 TiB  59.87  0.99  181  up
> 23ssd  18.19040   1.0   18 TiB   13 TiB   12 TiB   20 GiB   54 GiB  
> 5.7 TiB  68.89  1.14  181  up
> 27ssd  18.19040   1.0   18 TiB  9.0 TiB  9.0 TiB   15 GiB   31 GiB  
> 9.2 TiB  49.63  0.82  173  up
> 32ssd  18.19040   1.0   18 TiB   11 TiB   11 TiB   20 GiB   26 GiB  
> 7.3 TiB  59.92  0.99  183  up
> 36ssd  18.19040   1.0   18 TiB  8.3 TiB  8.3 TiB   11 GiB   17 GiB  
> 9.8 TiB  45.86  0.76  177  up
> 41ssd  18.19040   1.0   18 TiB   13 TiB   13 TiB   25 GiB   49 GiB  
> 5.2 TiB  71.30  1.18  191  up
> 45ssd  18.19040   1.0   18 TiB   10 TiB   10 TiB   13 GiB   42 GiB  
> 7.9 TiB  56.37  0.93  163  up
> 47ssd  18.19040   1.0   18 TiB   11 TiB   11 TiB   13 GiB   38 GiB  
> 7.2 TiB  60.45  1.00  167  up
> 2ssd  18.19040   1.0   18 TiB   11 TiB   11 TiB  5.2 GiB   43 GiB  
> 7.2 TiB  60.42  1.00  179  up
> 6ssd  18.19040   1.0   18 TiB   11 TiB   11 TiB   28 GiB   47 GiB  
> 7.1 TiB  60.99  1.01  184  up
> 8ssd  18.19040   1.0   18 TiB   13 TiB   13 TiB   20 GiB   59 GiB  
> 5.5 TiB  69.95  1.16  184  up
> 12ssd  18.19040   1.0   18 TiB   11 TiB   11 TiB  6.2 GiB   39 GiB  
> 7.4 TiB  59.22  0.98  180  up

[ceph-users] Re: CephFS space usage

2024-03-20 Thread Anthony D'Atri
Grep through the ls output for ‘rados bench’ leftovers; it’s easy to leave them 
behind.
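
A rough sketch, with the pool name as a placeholder (bench objects are normally 
named benchmark_data_<host>_<pid>_object<N>):

rados -p <datapool> ls | grep -c '^benchmark_data'
# recent rados builds can clean them up directly
rados -p <datapool> cleanup --prefix benchmark_data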

> On Mar 20, 2024, at 5:28 PM, Igor Fedotov  wrote:
> 
> Hi Thorne,
> 
> unfortunately I'm unaware of any tools high-level enough to easily map files 
> to rados objects without a deep understanding of how this works. You might want 
> to try the "rados ls" command to get the list of all the objects in the cephfs 
> data pool, and then learn how that mapping is performed and parse your listing.
> 
> 
> Thanks,
> 
> Igor
> 
>> On 3/20/2024 1:30 AM, Thorne Lawler wrote:
>> 
>> Igor,
>> 
>> Those files are VM disk images, and they're under constant heavy use, so 
>> yes, there *is* constant severe write load against this disk.
>> 
>> Apart from writing more test files into the filesystems, there must be Ceph 
>> diagnostic tools to describe what those objects are being used for, surely?
>> 
>> We're talking about an extra 10TB of space. How hard can it be to determine 
>> which file those objects are associated with?
>> 
>>> On 19/03/2024 8:39 pm, Igor Fedotov wrote:
>>> 
>>> Hi Thorne,
>>> 
>>> given the number of files on the CephFS volume I presume you don't have a severe 
>>> write load against it. Is that correct?
>>> 
>>> If so, we can assume that the numbers you're sharing mostly refer to 
>>> your experiment. At peak I can see a bytes_used increase of 629,461,893,120 
>>> bytes (45978612027392 - 45349150134272). With replica factor = 3 this 
>>> roughly matches your written data (200 GB I presume?).
>>> 
>>> 
>>> More interesting is that after the file's removal we can see a delta of 
>>> 419,450,880 bytes (=45349569585152 - 45349150134272). I could see two options 
>>> (apart from someone else writing additional stuff to CephFS during the 
>>> experiment) to explain this:
>>> 
>>> 1. File removal wasn't completed at the last probe, half an hour after the 
>>> file's removal. Did you see a stale object counter when making that probe?
>>> 
>>> 2. Some space is leaking. If that's the case this could be a reason for 
>>> your issue if huge(?) files at CephFS are created/removed periodically. So 
>>> if we're certain that the leak really occurred (and option 1. above isn't 
>>> the case) it makes sense to run more experiments with writing/removing a 
>>> bunch of huge files to the volume to confirm space leakage.
>>> 
>>> On 3/18/2024 3:12 AM, Thorne Lawler wrote:
 
 Thanks Igor,
 
 I have tried that, and the number of objects and bytes_used took a long 
 time to drop, but they seem to have dropped back to almost the original 
 level:
 
  * Before creating the file:
  o 3885835 objects
  o 45349150134272 bytes_used
  * After creating the file:
  o 3931663 objects
  o 45924147249152 bytes_used
  * Immediately after deleting the file:
  o 3935995 objects
  o 45978612027392 bytes_used
  * Half an hour after deleting the file:
  o 3886013 objects
  o 45349569585152 bytes_used
 
 Unfortunately, this is all production infrastructure, so there is always 
 other activity taking place.
 
 What tools are there to visually inspect the object map and see how it 
 relates to the filesystem?
 
>>> Not sure if there is anything like that at CephFS level but you can use 
>>> rados tool to view objects in cephfs data pool and try to build some 
>>> mapping between them and CephFS file list. Could be a bit tricky though.
 
 On 15/03/2024 7:18 pm, Igor Fedotov wrote:
> ceph df detail --format json-pretty
 --
 
 Regards,
 
 Thorne Lawler - Senior System Administrator
 *DDNS* | ABN 76 088 607 265
 First registrar certified ISO 27001-2013 Data Security Standard ITGOV40172
 P +61 499 449 170
 
 _DDNS
 
 /_*Please note:* The information contained in this email message and any 
 attached files may be confidential information, and may also be the 
 subject of legal professional privilege. _If you are not the intended 
 recipient any use, disclosure or copying of this email is unauthorised. 
 _If you received this email in error, please notify Discount Domain Name 
 Services Pty Ltd on 03 9815 6868 to report this matter and delete all 
 copies of this transmission together with any attachments. /
 
>>> --
>>> Igor Fedotov
>>> Ceph Lead Developer
>>> 
>>> Looking for help with your Ceph cluster? Contact us athttps://croit.io
>>> 
>>> croit GmbH, Freseniusstr. 31h, 81247 Munich
>>> CEO: Martin Verges - VAT-ID: DE310638492
>>> Com. register: Amtsgericht Munich HRB 231263
>>> Web:https://croit.io  | YouTube:https://goo.gl/PGE1Bx
>> --
>> 
>> Regards,
>> 
>> Thorne Lawler - Senior System Administrator
>> *DDNS* | ABN 76 088 607 265
>> First registrar certified ISO 27001-2013 Data Security Standard ITGOV40172
>> P +61 499 449 170
>> 
>> _DDNS
>> 
>> /_*Please note:* The information contained in this email message and any 
>> attached files may be confidential

[ceph-users] Re: CephFS space usage

2024-03-20 Thread Igor Fedotov

Thorne,

if that's a bug in Ceph which causes space leakage you might be unable 
to reclaim the space without total purge of the pool.


The problem is that we are still uncertain whether this is a leak or something 
else. Hence the need for more thorough research.



Thanks,

Igor

On 3/20/2024 9:13 PM, Thorne Lawler wrote:


Alexander,

Thanks for explaining this. As I suspected, this is a highly abstract 
pursuit of what caused the problem, and while I'm sure this makes 
sense for Ceph developers, it isn't going to happen in this case.


I don't care how it got this way. The tools used to create this pool 
will never be used in our environment again after I recover this disk 
space; the entire reason I need to recover the missing space is so I 
can move enough filesystems around to remove the current structure and 
the tools that made it.


I only need to get that disk space back. Any analysis I do will be 
solely directed towards achieving that.


Thanks.

On 21/03/2024 3:10 am, Alexander E. Patrakov wrote:

Hi Thorne,

The idea is quite simple. By retesting the leak with a separate pool, 
used by nobody except you, in the case if the leak exists and is 
reproducible (which is not a given), you can definitely pinpoint it 
without giving any chance to the alternate hypothesis "somebody wrote 
some data in parallel". And then, even if the leak is small but 
reproducible, one can say that multiple such events accumulated to 10 
TB of garbage in the original pool.


On Wed, Mar 20, 2024 at 7:29 PM Thorne Lawler  wrote:

Alexander,

I'm happy to create a new pool if it will help, but I don't
presently see how creating a new pool will help us to identify
the source of the 10TB discrepancy in this original cephfs pool.

Please help me to understand what you are hoping to find...?

On 20/03/2024 6:35 pm, Alexander E. Patrakov wrote:

Thorne,

That's why I asked you to create a separate pool. All writes go
to the original pool, and it is possible to see object counts
per-pool.

On Wed, Mar 20, 2024 at 6:32 AM Thorne Lawler
 wrote:

Alexander,

Thank you, but as I said to Igor: The 5.5TB of files on this
filesystem are virtual machine disks. They are under
constant, heavy write load. There is no way to turn this off.

On 19/03/2024 9:36 pm, Alexander E. Patrakov wrote:

Hello Thorne,

Here is one more suggestion on how to debug this. Right now, there is
uncertainty on whether there is really a disk space leak or if
something simply wrote new data during the test.

If you have at least three OSDs you can reassign, please set their
CRUSH device class to something different than before. E.g., "test".
Then, create a new pool that targets this device class and add it to
CephFS. Then, create an empty directory on CephFS and assign this pool
to it using setfattr. Finally, try reproducing the issue using only
files in this directory. This way, you will be sure that nobody else
is writing any data to the new pool.
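
A rough command sketch of that procedure (OSD ids, pool/rule names, the fs name 
and the mount path are placeholders; double-check the syntax on your release):

# move three spare OSDs into a dedicated device class
ceph osd crush rm-device-class osd.10 osd.11 osd.12
ceph osd crush set-device-class test osd.10 osd.11 osd.12
# CRUSH rule and pool that only target that class
ceph osd crush rule create-replicated test-rule default host test
ceph osd pool create cephfs_leaktest 32 32 replicated test-rule
ceph fs add_data_pool <fsname> cephfs_leaktest
# pin a fresh directory to the new pool and run the write/delete test in it
mkdir /mnt/cephfs/leaktest
setfattr -n ceph.dir.layout.pool -v cephfs_leaktest /mnt/cephfs/leaktest
# watch only that pool's counters while testing
ceph df detail | grep cephfs_leaktest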

On Tue, Mar 19, 2024 at 5:40 PM Igor Fedotov  
  wrote:

Hi Thorne,

given the number of files on the CephFS volume I presume you don't have
a severe write load against it. Is that correct?

If so, we can assume that the numbers you're sharing mostly refer to
your experiment. At peak I can see a bytes_used increase of 629,461,893,120
bytes (45978612027392 - 45349150134272). With replica factor = 3 this
roughly matches your written data (200 GB I presume?).


More interesting is that after the file's removal we can see a delta of
419,450,880 bytes (=45349569585152 - 45349150134272). I could see two options
(apart from someone else writing additional stuff to CephFS during the
experiment) to explain this:

1. File removal wasn't completed at the last probe, half an hour after the
file's removal. Did you see a stale object counter when making that probe?

2. Some space is leaking. If that's the case this could be a reason for
your issue if huge(?) files at CephFS are created/removed periodically.
So if we're certain that the leak really occurred (and option 1. above
isn't the case) it makes sense to run more experiments with
writing/removing a bunch of huge files to the volume to confirm space
leakage.

On 3/18/2024 3:12 AM, Thorne Lawler wrote:

Thanks Igor,

I have tried that, and the number of objects and bytes_used took a
long time to drop, but they seem to have dropped back to almost the
original level:

   * Before creating the file:
   o 3885835 objects
   o 45349150134272 bytes_used
   * After creating the file:
   o 3931663 objects
   o 45924147249152 bytes_used
 

[ceph-users] Re: CephFS space usage

2024-03-20 Thread Igor Fedotov

Hi Thorne,

unfortunately I'm unaware of any tools high-level enough to easily map 
files to rados objects without a deep understanding of how this works. You 
might want to try the "rados ls" command to get the list of all the objects 
in the cephfs data pool, and then learn how that mapping is performed 
and parse your listing.
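
As a rough illustration of that mapping (CephFS data objects are named 
<inode-in-hex>.<block-index-in-hex>; pool name and file path are placeholders):

ino=$(stat -c %i /mnt/cephfs/path/to/file)
printf -v hexino '%x' "$ino"
# every RADOS object belonging to that file shares the hex inode as a prefix
rados -p cephfs_data ls | grep -c "^${hexino}\."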



Thanks,

Igor

On 3/20/2024 1:30 AM, Thorne Lawler wrote:


Igor,

Those files are VM disk images, and they're under constant heavy use, 
so yes, there *is* constant severe write load against this disk.


Apart from writing more test files into the filesystems, there must be 
Ceph diagnostic tools to describe what those objects are being used 
for, surely?


We're talking about an extra 10TB of space. How hard can it be to 
determine which file those objects are associated with?


On 19/03/2024 8:39 pm, Igor Fedotov wrote:


Hi Thorne,

given the number of files on the CephFS volume I presume you don't have 
a severe write load against it. Is that correct?


If so, we can assume that the numbers you're sharing mostly refer 
to your experiment. At peak I can see a bytes_used increase of 
629,461,893,120 bytes (45978612027392 - 45349150134272). With 
replica factor = 3 this roughly matches your written data (200 GB I 
presume?).



More interesting is that after the file's removal we can see a delta of 
419,450,880 bytes (=45349569585152 - 45349150134272). I could 
see two options (apart from someone else writing additional stuff to 
CephFS during the experiment) to explain this:


1. File removal wasn't completed at the last probe, half an hour after the 
file's removal. Did you see a stale object counter when making that probe?


2. Some space is leaking. If that's the case this could be a reason 
for your issue if huge(?) files at CephFS are created/removed 
periodically. So if we're certain that the leak really occurred (and 
option 1. above isn't the case) it makes sense to run more 
experiments with writing/removing a bunch of huge files to the volume 
to confirm space leakage.


On 3/18/2024 3:12 AM, Thorne Lawler wrote:


Thanks Igor,

I have tried that, and the number of objects and bytes_used took a 
long time to drop, but they seem to have dropped back to almost the 
original level:


  * Before creating the file:
  o 3885835 objects
  o 45349150134272 bytes_used
  * After creating the file:
  o 3931663 objects
  o 45924147249152 bytes_used
  * Immediately after deleting the file:
  o 3935995 objects
  o 45978612027392 bytes_used
  * Half an hour after deleting the file:
  o 3886013 objects
  o 45349569585152 bytes_used

Unfortunately, this is all production infrastructure, so there is 
always other activity taking place.


What tools are there to visually inspect the object map and see how 
it relates to the filesystem?


Not sure if there is anything like that at CephFS level but you can 
use rados tool to view objects in cephfs data pool and try to build 
some mapping between them and CephFS file list. Could be a bit tricky 
though.
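
Going the other way (from a suspicious-looking object back to a file) can be 
sketched like this, assuming the usual <hex-inode>.<index> object naming and a 
mounted filesystem (the object name below is just an example):

obj=10000000f3a.00000000
ino=$((16#${obj%%.*}))
# no match here suggests the object no longer belongs to any visible file
find /mnt/cephfs -xdev -inum "$ino" 2>/dev/null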


On 15/03/2024 7:18 pm, Igor Fedotov wrote:

ceph df detail --format json-pretty

--

Regards,

Thorne Lawler - Senior System Administrator
*DDNS* | ABN 76 088 607 265
First registrar certified ISO 27001-2013 Data Security Standard 
ITGOV40172

P +61 499 449 170

_DDNS

/_*Please note:* The information contained in this email message and 
any attached files may be confidential information, and may also be 
the subject of legal professional privilege. _If you are not the 
intended recipient any use, disclosure or copying of this email is 
unauthorised. _If you received this email in error, please notify 
Discount Domain Name Services Pty Ltd on 03 9815 6868 to report this 
matter and delete all copies of this transmission together with any 
attachments. /



--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us athttps://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web:https://croit.io  | YouTube:https://goo.gl/PGE1Bx

--

Regards,

Thorne Lawler - Senior System Administrator
*DDNS* | ABN 76 088 607 265
First registrar certified ISO 27001-2013 Data Security Standard ITGOV40172
P +61 499 449 170

_DDNS

/_*Please note:* The information contained in this email message and 
any attached files may be confidential information, and may also be 
the subject of legal professional privilege. _If you are not the 
intended recipient any use, disclosure or copying of this email is 
unauthorised. _If you received this email in error, please notify 
Discount Domain Name Services Pty Ltd on 03 9815 6868 to report this 
matter and delete all copies of this transmission together with any 
attachments. /



--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us athttps://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Mun

[ceph-users] Re: Need easy way to calculate Ceph cluster space for SolarWinds

2024-03-20 Thread Michael Worsham
It seems to be relatively close to that +/- 1.00 range.

ubuntu@juju-5dcfd8-3-lxd-2:~$ sudo ceph osd df
ID  CLASS  WEIGHTREWEIGHT  SIZE RAW USE  DATA OMAP META 
AVAIL%USE   VAR   PGS  STATUS
 1ssd  18.19040   1.0   18 TiB   10 TiB   10 TiB   11 GiB   38 GiB  8.1 
TiB  55.59  0.92  174  up
 4ssd  18.19040   1.0   18 TiB   11 TiB   11 TiB   20 GiB   34 GiB  7.3 
TiB  59.90  0.99  175  up
 9ssd  18.19040   1.0   18 TiB  9.7 TiB  9.6 TiB   23 GiB   35 GiB  8.5 
TiB  53.11  0.88  185  up
13ssd  18.19040   1.0   18 TiB   13 TiB   13 TiB   19 GiB   48 GiB  4.8 
TiB  73.87  1.22  199  up
17ssd  18.19040   1.0   18 TiB   12 TiB   12 TiB   14 GiB   55 GiB  5.9 
TiB  67.49  1.12  185  up
21ssd  18.19040   1.0   18 TiB   10 TiB   10 TiB   24 GiB   38 GiB  7.8 
TiB  57.27  0.95  179  up
25ssd  18.19040   1.0   18 TiB   12 TiB   12 TiB   20 GiB   57 GiB  6.1 
TiB  66.70  1.10  192  up
29ssd  18.19040   1.0   18 TiB  9.2 TiB  9.2 TiB   15 GiB   34 GiB  9.0 
TiB  50.61  0.84  170  up
33ssd  18.19040   1.0   18 TiB  9.2 TiB  9.1 TiB   13 GiB   36 GiB  9.0 
TiB  50.56  0.84  180  up
39ssd  18.19040   1.0   18 TiB   12 TiB   12 TiB  4.3 GiB   45 GiB  6.0 
TiB  66.84  1.11  188  up
44ssd  18.19040   1.0   18 TiB   13 TiB   13 TiB   10 GiB   56 GiB  5.3 
TiB  70.59  1.17  187  up
46ssd  18.19040   1.0   18 TiB   10 TiB   10 TiB   14 GiB   44 GiB  7.8 
TiB  57.24  0.95  174  up
 0ssd  18.19040   1.0   18 TiB   11 TiB   11 TiB  5.0 GiB   38 GiB  6.9 
TiB  62.13  1.03  172  up
 5ssd  18.19040   1.0   18 TiB   11 TiB   11 TiB   11 GiB   44 GiB  7.3 
TiB  59.69  0.99  177  up
10ssd  18.19040   1.0   18 TiB   12 TiB   11 TiB   18 GiB   47 GiB  6.7 
TiB  63.34  1.05  190  up
14ssd  18.19040   1.0   18 TiB   12 TiB   12 TiB  5.1 GiB   48 GiB  6.3 
TiB  65.51  1.08  189  up
18ssd  18.19040   1.0   18 TiB  9.7 TiB  9.6 TiB   13 GiB   33 GiB  8.5 
TiB  53.14  0.88  175  up
22ssd  18.19040   1.0   18 TiB   11 TiB   11 TiB   20 GiB   42 GiB  7.3 
TiB  59.61  0.99  183  up
26ssd  18.19040   1.0   18 TiB  9.8 TiB  9.8 TiB  9.9 GiB   31 GiB  8.4 
TiB  53.88  0.89  186  up
30ssd  18.19040   1.0   18 TiB   12 TiB   12 TiB  4.5 GiB   56 GiB  6.4 
TiB  64.65  1.07  179  up
34ssd  18.19040   1.0   18 TiB   13 TiB   13 TiB   18 GiB   49 GiB  4.9 
TiB  73.17  1.21  192  up
38ssd  18.19040   1.0   18 TiB   12 TiB   12 TiB   16 GiB   51 GiB  6.4 
TiB  64.67  1.07  186  up
40ssd  18.19040   1.0   18 TiB   13 TiB   12 TiB   23 GiB   52 GiB  5.7 
TiB  68.91  1.14  184  up
42ssd  18.19040   1.0   18 TiB  9.7 TiB  9.7 TiB   14 GiB   26 GiB  8.5 
TiB  53.35  0.88  171  up
 3ssd  18.19040   1.0   18 TiB  9.9 TiB  9.8 TiB   15 GiB   36 GiB  8.3 
TiB  54.42  0.90  184  up
 7ssd  18.19040   1.0   18 TiB   10 TiB   10 TiB  5.4 GiB   36 GiB  7.9 
TiB  56.73  0.94  184  up
11ssd  18.19040   1.0   18 TiB   12 TiB   12 TiB  9.6 GiB   54 GiB  6.6 
TiB  63.76  1.06  188  up
15ssd  18.19040   1.0   18 TiB   11 TiB   11 TiB  7.2 GiB   47 GiB  7.5 
TiB  58.51  0.97  192  up
19ssd  18.19040   1.0   18 TiB   11 TiB   11 TiB   19 GiB   41 GiB  7.3 
TiB  59.87  0.99  181  up
23ssd  18.19040   1.0   18 TiB   13 TiB   12 TiB   20 GiB   54 GiB  5.7 
TiB  68.89  1.14  181  up
27ssd  18.19040   1.0   18 TiB  9.0 TiB  9.0 TiB   15 GiB   31 GiB  9.2 
TiB  49.63  0.82  173  up
32ssd  18.19040   1.0   18 TiB   11 TiB   11 TiB   20 GiB   26 GiB  7.3 
TiB  59.92  0.99  183  up
36ssd  18.19040   1.0   18 TiB  8.3 TiB  8.3 TiB   11 GiB   17 GiB  9.8 
TiB  45.86  0.76  177  up
41ssd  18.19040   1.0   18 TiB   13 TiB   13 TiB   25 GiB   49 GiB  5.2 
TiB  71.30  1.18  191  up
45ssd  18.19040   1.0   18 TiB   10 TiB   10 TiB   13 GiB   42 GiB  7.9 
TiB  56.37  0.93  163  up
47ssd  18.19040   1.0   18 TiB   11 TiB   11 TiB   13 GiB   38 GiB  7.2 
TiB  60.45  1.00  167  up
 2ssd  18.19040   1.0   18 TiB   11 TiB   11 TiB  5.2 GiB   43 GiB  7.2 
TiB  60.42  1.00  179  up
 6ssd  18.19040   1.0   18 TiB   11 TiB   11 TiB   28 GiB   47 GiB  7.1 
TiB  60.99  1.01  184  up
 8ssd  18.19040   1.0   18 TiB   13 TiB   13 TiB   20 GiB   59 GiB  5.5 
TiB  69.95  1.16  184  up
12ssd  18.19040   1.0   18 TiB   11 TiB   11 TiB  6.2 GiB   39 GiB  7.4 
TiB  59.22  0.98  180  up
16ssd  18.19040   1.0   18 TiB  9.8 TiB  9.7 TiB   14 GiB   37 GiB  8.4 
TiB  53.63  0.89  187  up
20ssd  18.19040   1.0   18 TiB  9.7 TiB  9.6 TiB   21 GiB   33 GiB  8.5 
TiB  53.14  0.88  181  up
24ssd  18.19040   1.0   18 TiB   12 TiB   12 TiB   10 GiB   46 GiB  6.3 
TiB  65.24  1.08  180  up
28ssd  

[ceph-users] Re: OSD does not die when disk has failures

2024-03-20 Thread Igor Fedotov

Hi Robert,

I presume the plan was to support handling EIO at upper layers. But 
apparently that hasn't been completed. Or there are some bugs...


Will take a look.


Thanks,

Igor

On 3/19/2024 3:36 PM, Robert Sander wrote:

Hi,

On 3/19/24 13:00, Igor Fedotov wrote:


translating EIO to upper layers rather than crashing an OSD is a 
valid default behavior. One can alter this by setting 
bluestore_fail_eio parameter to true.
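
For reference, that switch would be flipped with something like the following 
(whether it applies without an OSD restart may depend on the release):

ceph config set osd bluestore_fail_eio true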


What benefit lies in this behavior when in the end client IO stalls?

Regards


--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Need easy way to calculate Ceph cluster space for SolarWinds

2024-03-20 Thread Anthony D'Atri
Looks like you have one device class and the same replication on all pools, 
which makes that simpler.

Your MAX AVAIL figures are lower than I would expect if you're using size=3, so 
I'd check whether you have the balancer enabled and whether it's working properly.

Run

ceph osd df

and look at the VAR column, 

[rook@rook-ceph-tools-5ff8d58445-p9npl /]$ ceph osd df | head
ID   CLASS  WEIGHT   REWEIGHT  SIZE RAW USE   DATA  OMAP META 
AVAIL%USE   VAR   PGS  STATUS

Ideally the numbers should all be close to 1.00 + / -
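
A quick way to get the spread without eyeballing the whole table (field names as 
emitted by `ceph osd df -f json` on recent releases; requires jq):

ceph osd df -f json | jq '[.nodes[].var] | {min: min, max: max}'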

> On Mar 20, 2024, at 16:55, Michael Worsham  
> wrote:
> 
> I had a request from upper management wanting to use SolarWinds to be 
> able to extract what I am looking at and have SolarWinds track it in terms of 
> total available space, remaining space for the overall cluster, and I guess 
> also the current RGW pools/buckets we have, their allocated sizes and the 
> space remaining in them. I am sort of in the dark when it comes to 
> trying to break things down to make it readable/understandable for those who 
> are non-technical.
> 
> I was told that when it comes to pools and buckets, you sort of have to see 
> it this way:
> - Bucket is like a folder
> - Pool is like a hard drive.
> - You can create many folders in a hard drive and you can add quota to each 
> folder.
> - But if you want to know the remaining space, you need to check the hard 
> drive.
> 
> I did the "ceph df" command on the ceph monitor and we have something that 
> looks like this:
> 
>>> sudo ceph df
> --- RAW STORAGE ---
> CLASS SIZEAVAIL USED  RAW USED  %RAW USED
> ssd873 TiB  346 TiB  527 TiB   527 TiB  60.40
> TOTAL  873 TiB  346 TiB  527 TiB   527 TiB  60.40
> 
> --- POOLS ---
> POOL ID   PGS   STORED  OBJECTS USED  %USED  
> MAX AVAIL
> .mgr  1 1  449 KiB2  1.3 MiB  0   
>   61 TiB
> default.rgw.buckets.data  2  2048  123 TiB   41.86M  371 TiB  66.76   
>   61 TiB
> default.rgw.control   3 2  0 B8  0 B  0   
>   61 TiB
> default.rgw.data.root 4 2  0 B0  0 B  0   
>   61 TiB
> default.rgw.gc5 2  0 B0  0 B  0   
>   61 TiB
> default.rgw.log   6 2   41 KiB  209  732 KiB  0   
>   61 TiB
> default.rgw.intent-log7 2  0 B0  0 B  0   
>   61 TiB
> default.rgw.meta  8 2   20 KiB   96  972 KiB  0   
>   61 TiB
> default.rgw.otp   9 2  0 B0  0 B  0   
>   61 TiB
> default.rgw.usage10 2  0 B0  0 B  0   
>   61 TiB
> default.rgw.users.keys   11 2  0 B0  0 B  0   
>   61 TiB
> default.rgw.users.email  12 2  0 B0  0 B  0   
>   61 TiB
> default.rgw.users.swift  13 2  0 B0  0 B  0   
>   61 TiB
> default.rgw.users.uid14 2  0 B0  0 B  0   
>   61 TiB
> default.rgw.buckets.extra1516  0 B0  0 B  0   
>   61 TiB
> default.rgw.buckets.index1664  6.3 GiB  184   19 GiB   0.01   
>   61 TiB
> .rgw.root17 2  2.3 KiB4   48 KiB  0   
>   61 TiB
> ceph-benchmarking18   128  596 GiB  302.20k  1.7 TiB   0.94   
>   61 TiB
> ceph-fs_data 1964  438 MiB  110  1.3 GiB  0   
>   61 TiB
> ceph-fs_metadata 2016   37 MiB   32  111 MiB  0   
>   61 TiB
> test 2132   21 TiB5.61M   64 TiB  25.83   
>   61 TiB
> DD-Test  2232   11 MiB   13   32 MiB  0   
>   61 TiB
> nativesqlbackup  2432  539 MiB  147  1.6 GiB  0   
>   61 TiB
> default.rgw.buckets.non-ec   2532  1.7 MiB0  5.0 MiB  0   
>   61 TiB
> ceph-fs_sql_backups  2632  0 B0  0 B  0   
>   61 TiB
> ceph-fs_sql_backups_metadata 2732  0 B0  0 B  0   
>   61 TiB
> dd-drs-backups   2832  0 B0  0 B  0   
>   61 TiB
> default.rgw.jv-corp-pool.data5932   16 TiB   63.90M   49 TiB  21.12   
>   61 TiB
> default.rgw.jv-corp-pool.index   6032  108 GiB1.19k  323 GiB   0.17   
>   61 TiB
> default.rgw.jv-corp-pool.non-ec  6132  0 B0  0 B  0   
>   61 TiB
> default.rgw.jv-comm-pool.data6232  8.1 TiB   44.20M   24 TiB  11.65   
>   61 TiB
> default.rgw.jv-comm-pool.index   6332   83 GiB  811  248 GiB   0.13   
>   61 TiB
> default.rgw.jv-comm-pool.non-ec  6432  0 B0  0 B  0   
>   61 TiB
> default.rgw.jv-va-pool.data  6532  4.8 TiB   22.17M   14 TiB   7.28   
>   61 TiB
> default.rgw.jv-va-pool.index 

[ceph-users] Re: Need easy way to calculate Ceph cluster space for SolarWinds

2024-03-20 Thread Michael Worsham
I had a request from upper management wanting to use SolarWinds to be able 
to extract what I am looking at and have SolarWinds track it in terms of total 
available space, remaining space for the overall cluster, and I guess also 
the current RGW pools/buckets we have, their allocated sizes and the space 
remaining in them. I am sort of in the dark when it comes to trying to 
break things down to make it readable/understandable for those who are 
non-technical.

I was told that when it comes to pools and buckets, you sort of have to see it 
this way:
- Bucket is like a folder
- Pool is like a hard drive.
- You can create many folders in a hard drive and you can add quota to each 
folder.
- But if you want to know the remaining space, you need to check the hard drive.

I did the "ceph df" command on the ceph monitor and we have something that 
looks like this:

>> sudo ceph df
--- RAW STORAGE ---
CLASS SIZEAVAIL USED  RAW USED  %RAW USED
ssd873 TiB  346 TiB  527 TiB   527 TiB  60.40
TOTAL  873 TiB  346 TiB  527 TiB   527 TiB  60.40

--- POOLS ---
POOL ID   PGS   STORED  OBJECTS USED  %USED  
MAX AVAIL
.mgr  1 1  449 KiB2  1.3 MiB  0 
61 TiB
default.rgw.buckets.data  2  2048  123 TiB   41.86M  371 TiB  66.76 
61 TiB
default.rgw.control   3 2  0 B8  0 B  0 
61 TiB
default.rgw.data.root 4 2  0 B0  0 B  0 
61 TiB
default.rgw.gc5 2  0 B0  0 B  0 
61 TiB
default.rgw.log   6 2   41 KiB  209  732 KiB  0 
61 TiB
default.rgw.intent-log7 2  0 B0  0 B  0 
61 TiB
default.rgw.meta  8 2   20 KiB   96  972 KiB  0 
61 TiB
default.rgw.otp   9 2  0 B0  0 B  0 
61 TiB
default.rgw.usage10 2  0 B0  0 B  0 
61 TiB
default.rgw.users.keys   11 2  0 B0  0 B  0 
61 TiB
default.rgw.users.email  12 2  0 B0  0 B  0 
61 TiB
default.rgw.users.swift  13 2  0 B0  0 B  0 
61 TiB
default.rgw.users.uid14 2  0 B0  0 B  0 
61 TiB
default.rgw.buckets.extra1516  0 B0  0 B  0 
61 TiB
default.rgw.buckets.index1664  6.3 GiB  184   19 GiB   0.01 
61 TiB
.rgw.root17 2  2.3 KiB4   48 KiB  0 
61 TiB
ceph-benchmarking18   128  596 GiB  302.20k  1.7 TiB   0.94 
61 TiB
ceph-fs_data 1964  438 MiB  110  1.3 GiB  0 
61 TiB
ceph-fs_metadata 2016   37 MiB   32  111 MiB  0 
61 TiB
test 2132   21 TiB5.61M   64 TiB  25.83 
61 TiB
DD-Test  2232   11 MiB   13   32 MiB  0 
61 TiB
nativesqlbackup  2432  539 MiB  147  1.6 GiB  0 
61 TiB
default.rgw.buckets.non-ec   2532  1.7 MiB0  5.0 MiB  0 
61 TiB
ceph-fs_sql_backups  2632  0 B0  0 B  0 
61 TiB
ceph-fs_sql_backups_metadata 2732  0 B0  0 B  0 
61 TiB
dd-drs-backups   2832  0 B0  0 B  0 
61 TiB
default.rgw.jv-corp-pool.data5932   16 TiB   63.90M   49 TiB  21.12 
61 TiB
default.rgw.jv-corp-pool.index   6032  108 GiB1.19k  323 GiB   0.17 
61 TiB
default.rgw.jv-corp-pool.non-ec  6132  0 B0  0 B  0 
61 TiB
default.rgw.jv-comm-pool.data6232  8.1 TiB   44.20M   24 TiB  11.65 
61 TiB
default.rgw.jv-comm-pool.index   6332   83 GiB  811  248 GiB   0.13 
61 TiB
default.rgw.jv-comm-pool.non-ec  6432  0 B0  0 B  0 
61 TiB
default.rgw.jv-va-pool.data  6532  4.8 TiB   22.17M   14 TiB   7.28 
61 TiB
default.rgw.jv-va-pool.index 6632   38 GiB  401  113 GiB   0.06 
61 TiB
default.rgw.jv-va-pool.non-ec6732  0 B0  0 B  0 
61 TiB
jv-edi-pool  6832  0 B0  0 B  0 
61 TiB

-- Michael

-Original Message-
From: Anthony D'Atri 
Sent: Wednesday, March 20, 2024 2:48 PM
To: Michael Worsham 
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Need easy way to calculate Ceph cluster space for 
SolarWinds

This is an external email. Please take care when clicking links or opening 
attachments. When in doubt, check with the Help Desk or Security.


> On Mar 20, 2024, at 14:42, Michael Worsham  
> wrote:
>
> Is there an easy way to poll a Ceph cluster to see how much space is
> available

`ceph df`

The exporter has percen

[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-20 Thread Bandelow, Gunnar
Hi,

I just wanted to mention that I am running a cluster with Reef 18.2.1
with the same issue.

4 PGs start to deep scrub but haven't finished since mid-February. In the pg
dump they are shown as scheduled for deep scrub. They sometimes change
their status from active+clean to active+clean+scrubbing+deep and
back.


Best regards,
Gunnar 


===


Gunnar Bandelow

Universitätsrechenzentrum (URZ)
Universität Greifswald

Felix-Hausdorff-Straße 18
17489 Greifswald, Germany

Tel.: +49 3834 420 1450  



--- Original Message ---
Subject: [ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep
scrubbed for 1 month
From: "Michel Jouvin" 
To: ceph-users@ceph.io
Date: 20-03-2024 20:00






Hi Rafael,

Good to know I am not alone!

Additional information ~6h after the OSD restart: of the 20 PGs 
impacted, 2 have been processed successfully... I don't have a clear 
picture of how Ceph prioritizes the scrub of one PG over another; I had 
thought that the oldest/expired scrubs are taken first, but it may not 
be the case. Anyway, I have seen a very significant decrease in scrub 
activity this afternoon and the cluster is not loaded at all (almost 
no users yet)...

Michel

On 20/03/2024 at 17:55, quag...@bol.com.br wrote:
> Hi,
>      I upgraded a cluster 2 weeks ago here. The situation is the
same 
> as Michel.
>      A lot of PGs no scrubbed/deep-scrubed.
>
> Rafael.
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-20 Thread Michel Jouvin

Hi Rafael,

Good to know I am not alone!

Additional information ~6h after the OSD restart: of the 20 PGs 
impacted, 2 have been processed successfully... I don't have a clear 
picture of how Ceph prioritizes the scrub of one PG over another; I had 
thought that the oldest/expired scrubs are taken first, but it may not be 
the case. Anyway, I have seen a very significant decrease in scrub 
activity this afternoon and the cluster is not loaded at all (almost no 
users yet)...


Michel

On 20/03/2024 at 17:55, quag...@bol.com.br wrote:

Hi,
     I upgraded a cluster 2 weeks ago here. The situation is the same 
as Michel.

     A lot of PGs no scrubbed/deep-scrubed.

Rafael.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] node-exporter error

2024-03-20 Thread quag...@bol.com.br
Hello,
 After some time, I'm adding some more disks on a new machine in the Ceph 
cluster.
 However, there is a container that is not coming up. It is the 
"node-exporter".

 Below is an excerpt from the log that reports the error:

Mar 20 15:51:08 adafn02 
ceph-da43a27a-eee8-11eb-9c87-525400baa344-node-exporter-adafn02[736348]: 
ts=2024-03-20T18:51:08.606Z caller=node_exporter.go:117 level=info collector=xfs
Mar 20 15:51:08 adafn02 
ceph-da43a27a-eee8-11eb-9c87-525400baa344-node-exporter-adafn02[736348]: 
ts=2024-03-20T18:51:08.606Z caller=node_exporter.go:117 level=info collector=zfs
Mar 20 15:51:08 adafn02 
ceph-da43a27a-eee8-11eb-9c87-525400baa344-node-exporter-adafn02[736348]: 
ts=2024-03-20T18:51:08.606Z caller=tls_config.go:232 level=info msg="Listening 
on" address=[::]:9100
Mar 20 15:51:08 adafn02 
ceph-da43a27a-eee8-11eb-9c87-525400baa344-node-exporter-adafn02[736348]: 
ts=2024-03-20T18:51:08.606Z caller=tls_config.go:235 level=info msg="TLS is 
disabled." http2=false address=[::]:9100
Mar 20 15:51:09 adafn02 systemd[1]: 
var-lib-containers-storage-overlay-a80fe574f464677d2fc313cd0e92b12930370b64ec56477ced79e24293953e99-merged.mount:
 Succeeded.
Mar 20 15:51:09 adafn02 systemd[1]: 
ceph-da43a27a-eee8-11eb-9c87-525400baa344@node-exporter.adafn02.service: Main 
process exited, code=exited, status=137/n/a
Mar 20 15:51:10 adafn02 systemd[1]: 
ceph-da43a27a-eee8-11eb-9c87-525400baa344@node-exporter.adafn02.service: Failed 
with result 'exit-code'.

 Version is:
[root@adafn02 ~]# ceph orch ps | grep adafn
crash.adafn02adafn02running (26m)   
 65s ago  38m7440k-  18.2.1 5be31c24972a  839c3ba37349  
node-exporter.adafn02adafn02  *:9100error   
 65s ago   2m--  
osd.62   adafn02running (26m)   
 65s ago  29m54.7M 352G  18.2.1 5be31c24972a  368d60d5ac3c  
osd.83   adafn02running (26m)   
 65s ago  28m56.3M 352G  18.2.1 5be31c24972a  4f9052698265  
osd.134  adafn02running (24m)   
 65s ago  24m 105M 352G  18.2.1 5be31c24972a  40fc99160112  
osd.135  adafn02running (23m)   
 65s ago  23m 103M 352G  18.2.1 5be31c24972a  6f352c76f2e5  


 Other containers on this machine are OK. Could anyone help me identify 
where the error is?
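
A few generic things worth checking, as a sketch (the unit and daemon names below 
are copied from the log excerpt; exit status 137 usually means the process received 
SIGKILL, e.g. from the OOM killer):

journalctl -u ceph-da43a27a-eee8-11eb-9c87-525400baa344@node-exporter.adafn02.service -n 200
# anything else already listening on the node-exporter port?
ss -tlnp | grep 9100
# once the cause is addressed, ask cephadm to redeploy the daemon
ceph orch daemon redeploy node-exporter.adafn02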

Thanks 
Rafael.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Need easy way to calculate Ceph cluster space for SolarWinds

2024-03-20 Thread Anthony D'Atri



> On Mar 20, 2024, at 14:42, Michael Worsham  
> wrote:
> 
> Is there an easy way to poll a Ceph cluster to see how much space is available

`ceph df`

The exporter has percentages per pool as well.
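
If SolarWinds will poll this through a script, the JSON output is easier to parse; 
a minimal sketch (field names as emitted by recent releases, and jq assumed):

# cluster-wide totals in bytes plus overall raw utilisation
ceph df -f json | jq '.stats | {total_bytes, total_avail_bytes, total_used_raw_ratio}'
# per-pool stored bytes and the MAX AVAIL figure shown by 'ceph df'
ceph df -f json | jq -r '.pools[] | [.name, .stats.stored, .stats.max_avail] | @tsv'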


> and how much space is available per bucket?

Are you using RGW quotas?

> 
> Looking for a way to use SolarWinds to monitor the entire Ceph cluster space 
> utilization and then also be able to break down each RGW bucket to see how 
> much space it was provisioned for and how much is available.

RGW buckets do not provision space.  Optionally there may be some RGW quotas 
but they're a different thing than you're implying.
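
If you do want per-bucket numbers, usage and any configured quota are visible via 
radosgw-admin; a sketch (bucket and uid are placeholders, jq optional):

# objects and bytes currently used by one bucket
radosgw-admin bucket stats --bucket=<bucket-name>
# quotas, if set, hang off users or buckets rather than "provisioned" space
radosgw-admin user info --uid=<user-id> | jq '{bucket_quota, user_quota}'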


> 
> -- Michael
> 
> 
> Get Outlook for Android
> This message and its attachments are from Data Dimensions and are intended 
> only for the use of the individual or entity to which it is addressed, and 
> may contain information that is privileged, confidential, and exempt from 
> disclosure under applicable law. If the reader of this message is not the 
> intended recipient, or the employee or agent responsible for delivering the 
> message to the intended recipient, you are hereby notified that any 
> dissemination, distribution, or copying of this communication is strictly 
> prohibited. If you have received this communication in error, please notify 
> the sender immediately and permanently delete the original email and destroy 
> any copies or printouts of this email as well as any attachments.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Need easy way to calculate Ceph cluster space for SolarWinds

2024-03-20 Thread Michael Worsham
Is there an easy way to poll a Ceph cluster to see how much space is available 
and how much space is available per bucket?

Looking for a way to use SolarWinds to monitor the entire Ceph cluster space 
utilization and then also be able to break down each RGW bucket to see how much 
space it was provisioned for and how much is available.

-- Michael


Get Outlook for Android
This message and its attachments are from Data Dimensions and are intended only 
for the use of the individual or entity to which it is addressed, and may 
contain information that is privileged, confidential, and exempt from 
disclosure under applicable law. If the reader of this message is not the 
intended recipient, or the employee or agent responsible for delivering the 
message to the intended recipient, you are hereby notified that any 
dissemination, distribution, or copying of this communication is strictly 
prohibited. If you have received this communication in error, please notify the 
sender immediately and permanently delete the original email and destroy any 
copies or printouts of this email as well as any attachments.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS space usage

2024-03-20 Thread Thorne Lawler

Alexander,

Thanks for explaining this. As I suspected, this is a highly abstract 
pursuit of what caused the problem, and while I'm sure this makes sense 
for Ceph developers, it isn't going to happen in this case.


I don't care how it got this way. The tools used to create this pool 
will never be used in our environment again after I recover this disk 
space; the entire reason I need to recover the missing space is so I 
can move enough filesystems around to remove the current structure and 
the tools that made it.


I only need to get that disk space back. Any analysis I do will be 
solely directed towards achieving that.


Thanks.

On 21/03/2024 3:10 am, Alexander E. Patrakov wrote:

Hi Thorne,

The idea is quite simple. By retesting the leak with a separate pool, 
used by nobody except you, in the case if the leak exists and is 
reproducible (which is not a given), you can definitely pinpoint it 
without giving any chance to the alternate hypothesis "somebody wrote 
some data in parallel". And then, even if the leak is small but 
reproducible, one can say that multiple such events accumulated to 10 
TB of garbage in the original pool.


On Wed, Mar 20, 2024 at 7:29 PM Thorne Lawler  wrote:

Alexander,

I'm happy to create a new pool if it will help, but I don't
presently see how creating a new pool will help us to identify the
source of the 10TB discrepancy in this original cephfs pool.

Please help me to understand what you are hoping to find...?

On 20/03/2024 6:35 pm, Alexander E. Patrakov wrote:

Thorne,

That's why I asked you to create a separate pool. All writes go
to the original pool, and it is possible to see object counts
per-pool.

On Wed, Mar 20, 2024 at 6:32 AM Thorne Lawler
 wrote:

Alexander,

Thank you, but as I said to Igor: The 5.5TB of files on this
filesystem are virtual machine disks. They are under
constant, heavy write load. There is no way to turn this off.

On 19/03/2024 9:36 pm, Alexander E. Patrakov wrote:

Hello Thorne,

Here is one more suggestion on how to debug this. Right now, there is
uncertainty on whether there is really a disk space leak or if
something simply wrote new data during the test.

If you have at least three OSDs you can reassign, please set their
CRUSH device class to something different than before. E.g., "test".
Then, create a new pool that targets this device class and add it to
CephFS. Then, create an empty directory on CephFS and assign this pool
to it using setfattr. Finally, try reproducing the issue using only
files in this directory. This way, you will be sure that nobody else
is writing any data to the new pool.

On Tue, Mar 19, 2024 at 5:40 PM Igor Fedotov 
 wrote:

Hi Thorne,

given the number of files on the CephFS volume I presume you don't have
a severe write load against it. Is that correct?

If so, we can assume that the numbers you're sharing mostly refer to
your experiment. At peak I can see a bytes_used increase of 629,461,893,120
bytes (45978612027392 - 45349150134272). With replica factor = 3 this
roughly matches your written data (200 GB I presume?).


More interesting is that after the file's removal we can see a delta of
419,450,880 bytes (=45349569585152 - 45349150134272). I could see two options
(apart from someone else writing additional stuff to CephFS during the
experiment) to explain this:

1. File removal wasn't completed at the last probe, half an hour after the
file's removal. Did you see a stale object counter when making that probe?

2. Some space is leaking. If that's the case this could be a reason for
your issue if huge(?) files at CephFS are created/removed periodically.
So if we're certain that the leak really occurred (and option 1. above
isn't the case) it makes sense to run more experiments with
writing/removing a bunch of huge files to the volume to confirm space
leakage.

On 3/18/2024 3:12 AM, Thorne Lawler wrote:

Thanks Igor,

I have tried that, and the number of objects and bytes_used took a
long time to drop, but they seem to have dropped back to almost the
original level:

   * Before creating the file:
   o 3885835 objects
   o 45349150134272 bytes_used
   * After creating the file:
   o 3931663 objects
   o 45924147249152 bytes_used
   * Immediately after deleting the file:
   o 3935995 objects
   o 45978612027392 bytes_used
   * Half an hour after deleting the file:
   o 3886013 objects
   o 45349569585152 bytes_used

Unfortunately, this is all production infrastructure, so there is
  

[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-20 Thread quag...@bol.com.br
Hi,
     I upgraded a cluster 2 weeks ago here. The situation is the same as Michel's.
     A lot of PGs are not scrubbed/deep-scrubbed.

Rafael.___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS space usage

2024-03-20 Thread Alexander E. Patrakov
Hi Thorne,

The idea is quite simple. By retesting the leak with a separate pool, used
by nobody except you, in the case if the leak exists and is reproducible
(which is not a given), you can definitely pinpoint it without giving any
chance to the alternate hypothesis "somebody wrote some data in parallel".
And then, even if the leak is small but reproducible, one can say that
multiple such events accumulated to 10 TB of garbage in the original pool.

On Wed, Mar 20, 2024 at 7:29 PM Thorne Lawler  wrote:

> Alexander,
>
> I'm happy to create a new pool if it will help, but I don't presently see
> how creating a new pool will help us to identify the source of the 10TB
> discrepancy in this original cephfs pool.
>
> Please help me to understand what you are hoping to find...?
> On 20/03/2024 6:35 pm, Alexander E. Patrakov wrote:
>
> Thorne,
>
> That's why I asked you to create a separate pool. All writes go to the
> original pool, and it is possible to see object counts per-pool.
>
> On Wed, Mar 20, 2024 at 6:32 AM Thorne Lawler  wrote:
>
>> Alexander,
>>
>> Thank you, but as I said to Igor: The 5.5TB of files on this filesystem
>> are virtual machine disks. They are under constant, heavy write load. There
>> is no way to turn this off.
>> On 19/03/2024 9:36 pm, Alexander E. Patrakov wrote:
>>
>> Hello Thorne,
>>
>> Here is one more suggestion on how to debug this. Right now, there is
>> uncertainty on whether there is really a disk space leak or if
>> something simply wrote new data during the test.
>>
>> If you have at least three OSDs you can reassign, please set their
>> CRUSH device class to something different than before. E.g., "test".
>> Then, create a new pool that targets this device class and add it to
>> CephFS. Then, create an empty directory on CephFS and assign this pool
>> to it using setfattr. Finally, try reproducing the issue using only
>> files in this directory. This way, you will be sure that nobody else
>> is writing any data to the new pool.
>>
>> On Tue, Mar 19, 2024 at 5:40 PM Igor Fedotov  
>>  wrote:
>>
>> Hi Thorne,
>>
>> given the number of files on the CephFS volume I presume you don't have
>> a severe write load against it. Is that correct?
>>
>> If so, we can assume that the numbers you're sharing mostly refer to
>> your experiment. At peak I can see a bytes_used increase of 629,461,893,120
>> bytes (45978612027392 - 45349150134272). With replica factor = 3 this
>> roughly matches your written data (200 GB I presume?).
>>
>>
>> More interesting is that after the file's removal we can see a delta of
>> 419,450,880 bytes (=45349569585152 - 45349150134272). I could see two options
>> (apart from someone else writing additional stuff to CephFS during the
>> experiment) to explain this:
>>
>> 1. File removal wasn't completed at the last probe, half an hour after the
>> file's removal. Did you see a stale object counter when making that probe?
>>
>> 2. Some space is leaking. If that's the case this could be a reason for
>> your issue if huge(?) files at CephFS are created/removed periodically.
>> So if we're certain that the leak really occurred (and option 1. above
>> isn't the case) it makes sense to run more experiments with
>> writing/removing a bunch of huge files to the volume to confirm space
>> leakage.
>>
>> On 3/18/2024 3:12 AM, Thorne Lawler wrote:
>>
>> Thanks Igor,
>>
>> I have tried that, and the number of objects and bytes_used took a
>> long time to drop, but they seem to have dropped back to almost the
>> original level:
>>
>>   * Before creating the file:
>>   o 3885835 objects
>>   o 45349150134272 bytes_used
>>   * After creating the file:
>>   o 3931663 objects
>>   o 45924147249152 bytes_used
>>   * Immediately after deleting the file:
>>   o 3935995 objects
>>   o 45978612027392 bytes_used
>>   * Half an hour after deleting the file:
>>   o 3886013 objects
>>   o 45349569585152 bytes_used
>>
>> Unfortunately, this is all production infrastructure, so there is
>> always other activity taking place.
>>
>> What tools are there to visually inspect the object map and see how it
>> relates to the filesystem?
>>
>>
>> Not sure if there is anything like that at CephFS level but you can use
>> rados tool to view objects in cephfs data pool and try to build some
>> mapping between them and CephFS file list. Could be a bit tricky though.
>>
>> On 15/03/2024 7:18 pm, Igor Fedotov wrote:
>>
>> ceph df detail --format json-pretty
>>
>> --
>>
>> Regards,
>>
>> Thorne Lawler - Senior System Administrator
>> *DDNS* | ABN 76 088 607 265
>> First registrar certified ISO 27001-2013 Data Security Standard ITGOV40172
>> P +61 499 449 170
>>
>> _DDNS
>>
>> /_*Please note:* The information contained in this email message and
>> any attached files may be confidential information, and may also be
>> the subject of legal professional privilege. _If you are not the
>> intended recipient any use, disclosure or copying of this email is
>> unauthorised. _If you received this email

[ceph-users] Re: Why a lot of pgs are degraded after host(+osd) restarted?

2024-03-20 Thread Joshua Baergen
Hi Jaemin,

It is normal for PGs to become degraded during a host reboot, since a
copy of the data was taken offline and needs to be resynchronized
after the host comes back. Normally this is quick, as the recovery
mechanism only needs to modify those objects that have changed while
the host is down.

However, if you have backfills ongoing and reboot a host that contains
OSDs involved in those backfills, then those backfills become
degraded, and you will need to wait for them to complete for
degradation to clear. Do you know if you had backfills at the time the
host was rebooted? If so, the way to avoid this is to wait for
backfill to complete before taking any OSDs/hosts down for
maintenance.
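
A rough pre-maintenance check along those lines (noout is optional but common for 
planned reboots):

# any backfill/recovery still in flight?
ceph status | grep -iE 'backfill|recover'
# optionally keep CRUSH from marking the host's OSDs out during the reboot
ceph osd set noout
# ... reboot the host, wait for its OSDs to rejoin and PGs to go active+clean ...
ceph osd unset noout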

Josh

On Wed, Mar 20, 2024 at 1:50 AM Jaemin Joo  wrote:
>
> Hi all,
>
> While I am testing host failover, there are a lot of degraded PGs after the
> host(+osd) comes back up. Even though it takes only a short time to restart, I don't
> understand why the PGs should check all objects related to the failed host(+osd).
> I'd like to know how to prevent PGs from becoming degraded when an OSD restarts.
>
> FYI. degraded pg means "active+undersized+degraded+remapped+backfilling" pg
> state
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-20 Thread Anthony D'Atri
Suggest issuing an explicit deep scrub against one of the subject PGs, see if 
it takes.
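
For example (a rough sketch; 12.1f is a placeholder, substitute a PG ID taken from `ceph health detail`):

```
ceph health detail | grep 'not deep-scrubbed since'   # pick one of the listed PGs
ceph pg deep-scrub 12.1f                              # request an explicit deep scrub
ceph pg ls scrubbing                                  # see whether it actually starts
```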

> On Mar 20, 2024, at 8:20 AM, Michel Jouvin  
> wrote:
> 
> Hi,
> 
> We have a Reef cluster that started to complain a couple of weeks ago about 
> ~20 PGs (out of ~10K) not scrubbed/deep-scrubbed in time. Looking at it over the last 
> few days, I saw this affects only those PGs that have not been scrubbed since 
> mid-February. All the other PGs are scrubbed regularly.
> 
> I decided to check whether one OSD was present in all these PGs, and found one! I 
> restarted this OSD but it had no effect. Looking at the logs for the suspect 
> OSD, I found nothing related to abnormal behaviour (but the log is very 
> verbose at restart time, so it is easy to miss something...). And there is no error 
> associated with the OSD disk.
> 
> Any advice about where to look for some useful information would be 
> appreciated! Should I try to destroy the OSD and re-add it? I'd be more 
> comfortable if I were able to find some diagnostics first...
> 
> Best regards,
> 
> Michel
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-20 Thread Michel Jouvin

Hi,

We have a Reef cluster that started to complain a couple of weeks ago 
about ~20 PGs (out of ~10K) not scrubbed/deep-scrubbed in time. Looking at 
it over the last few days, I saw this affects only those PGs that have not 
been scrubbed since mid-February. All the other PGs are scrubbed regularly.


I decided to check whether one OSD was present in all these PGs, and found one! 
I restarted this OSD but it had no effect. Looking at the logs for the 
suspect OSD, I found nothing related to abnormal behaviour (but the log 
is very verbose at restart time, so it is easy to miss something...). And there 
is no error associated with the OSD disk.
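
For anyone who wants to repeat that check, here is a rough sketch of counting how often each OSD appears in the acting sets of the late PGs (it assumes the health messages use the usual "pg <id> not deep-scrubbed since ..." wording, which may vary slightly between releases):

```
ceph health detail | awk '/not (deep-)?scrubbed since/ {print $2}' |
while read -r pg; do
    ceph pg map "$pg"                              # prints the up and acting OSD sets
done | sed -n 's/.*acting \[\([^]]*\)\].*/\1/p' |
tr ',' '\n' | sort -n | uniq -c | sort -rn         # OSDs ranked by how often they appear
```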


Any advice about where to look for some useful information would be 
appreciated! Should I try to destroy the OSD and re-add it? I'd be more 
comfortable if I were able to find some diagnostics first...


Best regards,

Michel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS space usage

2024-03-20 Thread Thorne Lawler

Alexander,

I'm happy to create a new pool if it will help, but I don't presently 
see how that will help us identify the source of the 
10TB discrepancy in the original CephFS pool.


Please help me to understand what you are hoping to find...?

On 20/03/2024 6:35 pm, Alexander E. Patrakov wrote:

Thorne,

That's why I asked you to create a separate pool. All writes go to the 
original pool, and it is possible to see object counts per-pool.


On Wed, Mar 20, 2024 at 6:32 AM Thorne Lawler  wrote:

Alexander,

Thank you, but as I said to Igor: The 5.5TB of files on this
filesystem are virtual machine disks. They are under constant,
heavy write load. There is no way to turn this off.

On 19/03/2024 9:36 pm, Alexander E. Patrakov wrote:

Hello Thorne,

Here is one more suggestion on how to debug this. Right now, there is
uncertainty on whether there is really a disk space leak or if
something simply wrote new data during the test.

If you have at least three OSDs you can reassign, please set their
CRUSH device class to something different than before. E.g., "test".
Then, create a new pool that targets this device class and add it to
CephFS. Then, create an empty directory on CephFS and assign this pool
to it using setfattr. Finally, try reproducing the issue using only
files in this directory. This way, you will be sure that nobody else
is writing any data to the new pool.
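
A condensed sketch of those steps; OSD IDs 10-12 (assumed to sit on three different hosts), the rule name, the pool name cephfs_leaktest, the file system name cephfs and the mount point /mnt/cephfs are all placeholders:

```
ceph osd crush rm-device-class osd.10 osd.11 osd.12
ceph osd crush set-device-class test osd.10 osd.11 osd.12
ceph osd crush rule create-replicated test-rule default host test
ceph osd pool create cephfs_leaktest 32 32 replicated test-rule
ceph fs add_data_pool cephfs cephfs_leaktest
mkdir /mnt/cephfs/leaktest
setfattr -n ceph.dir.layout.pool -v cephfs_leaktest /mnt/cephfs/leaktest
# new files created under /mnt/cephfs/leaktest now land only in cephfs_leaktest,
# so its object/byte counters in "ceph df" reflect the test alone
```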

On Tue, Mar 19, 2024 at 5:40 PM Igor Fedotov 
 wrote:

Hi Thorn,

Given the number of files on the CephFS volume, I presume you don't have a
severe write load against it. Is that correct?

If so, we can assume that the numbers you're sharing mostly refer to
your experiment. At the peak I can see a bytes_used increase of 629,461,893,120
bytes (45978612027392 - 45349150134272). With a replica factor of 3 this
roughly matches your written data (200GB I presume?).
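
For reference, those deltas can be checked with plain shell arithmetic (assuming bytes_used is raw usage across all three replicas):

```
echo $(( 45978612027392 - 45349150134272 ))          # 629461893120 raw bytes at peak
echo $(( (45978612027392 - 45349150134272) / 3 ))    # 209820631040, i.e. ~210 GB of client data
echo $(( 45349569585152 - 45349150134272 ))          # 419450880 raw bytes remaining after deletion
```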


More interesting is that after the file's removal we can see a delta of 419,450,880
bytes (= 45349569585152 - 45349150134272). I could see two options
(apart from someone else writing additional data to CephFS during the
experiment) to explain this:

1. File removal wasn't completed at the last probe half an hour after
file's removal. Did you see stale object counter when making that probe?

2. Some space is leaking. If that's the case this could be a reason for
your issue if huge(?) files at CephFS are created/removed periodically.
So if we're certain that the leak really occurred (and option 1. above
isn't the case) it makes sense to run more experiments with
writing/removing a bunch of huge files to the volume to confirm space
leakage.

On 3/18/2024 3:12 AM, Thorne Lawler wrote:

Thanks Igor,

I have tried that, and the number of objects and bytes_used took a
long time to drop, but they seem to have dropped back to almost the
original level:

   * Before creating the file:
   o 3885835 objects
   o 45349150134272 bytes_used
   * After creating the file:
   o 3931663 objects
   o 45924147249152 bytes_used
   * Immediately after deleting the file:
   o 3935995 objects
   o 45978612027392 bytes_used
   * Half an hour after deleting the file:
   o 3886013 objects
   o 45349569585152 bytes_used

Unfortunately, this is all production infrastructure, so there is
always other activity taking place.

What tools are there to visually inspect the object map and see how it
relates to the filesystem?


Not sure if there is anything like that at the CephFS level, but you can use
the rados tool to view objects in the CephFS data pool and try to build some
mapping between them and the CephFS file list. Could be a bit tricky though.
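
One crude way to approach that mapping, assuming the data pool is named cephfs_data and the file system is mounted at /mnt/cephfs (both placeholders): CephFS data objects are named <inode-in-hex>.<block-index>, so a file's backing objects can be found from its inode number.

```
# hex inode of a given file (the path is just an example)
ino_hex=$(printf '%x' "$(stat -c %i /mnt/cephfs/path/to/file)")

# list the RADOS objects backing that file
rados -p cephfs_data ls | grep "^${ino_hex}\." | sort

# or count objects per inode across the whole pool (slow on large pools)
rados -p cephfs_data ls | cut -d. -f1 | sort | uniq -c | sort -rn | head
```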

On 15/03/2024 7:18 pm, Igor Fedotov wrote:

ceph df detail --format json-pretty
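
To pull only the relevant counters out of that JSON, a sketch assuming jq is available and the data pool is named cephfs_data (exact field names can differ slightly between releases):

```
ceph df detail --format json-pretty |
  jq '.pools[] | select(.name == "cephfs_data")
      | {name, objects: .stats.objects, bytes_used: .stats.bytes_used}'
```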

--

Regards,

Thorne Lawler - Senior System Administrator
*DDNS* | ABN 76 088 607 265
First registrar certified ISO 27001-2013 Data Security Standard ITGOV40172
P +61 499 449 170

_DDNS

/_*Please note:* The information contained in this email message and
any attached files may be confidential information, and may also be
the subject of legal professional privilege. _If you are not the
intended recipient any use, disclosure or copying of this email is
unauthorised. _If you received this email in error, please notify
Discount Domain Name Services Pty Ltd on 03 9815 6868 to report this
matter and delete all copies of this transmission together with any
attachments. /


--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io


croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 2

[ceph-users] cephadm auto disk preparation and OSD installation incomplete

2024-03-20 Thread Kuhring, Mathias
Dear ceph community,

We are having trouble with new disks not being properly prepared, i.e. OSDs not being 
fully installed by cephadm.
We just added one new node, each with ~40 HDDs, to two of our Ceph clusters.
In one cluster all but 5 disks got installed automatically.
In the other none got installed.

We are on ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy 
(stable) on both clusters.
(I haven't added new disks since the last upgrade if I recall correctly).

This is our OSD service definition:
```
0|0[root@ceph-3-10 ~]# ceph orch ls osd --export
service_type: osd
service_id: all-available-devices
service_name: osd.all-available-devices
placement:
  host_pattern: '*'
spec:
  data_devices:
all: true
  filter_logic: AND
  objectstore: bluestore
---
service_type: osd
service_id: unmanaged
service_name: osd.unmanaged
unmanaged: true
spec:
  filter_logic: AND
  objectstore: bluestore
```

Usually, new disks are installed properly (as expected due to 
all-available-devices).
This time, I can see that LVs were created (via `lsblk`, `lvs`, `cephadm 
ceph-volume lvm list`).
And the OSDs were entered into the CRUSH map.
However, they are not assigned to a host yet, nor do they have a type or 
weight, e.g.:
```
0|0[root@ceph-2-10 ~]# ceph osd tree | grep "0  osd"
518  0  osd.518   down 0  1.0
519  0  osd.519   down 0  1.0
520  0  osd.520   down 0  1.0
521  0  osd.521   down 0  1.0
522  0  osd.522   down 0  1.0
```

And there is also no OSD daemon created (no docker container).
So, OSD creation is somehow stuck halfway.

I thought of fully cleaning up the OSDs/disks,
hoping cephadm might pick them up properly next time.
Just zapping was not possible, e.g. `cephadm ceph-volume lvm zap --destroy 
/dev/sdab` results in these errors:
```
/usr/bin/docker: stderr  stderr: wipefs: error: /dev/sdab: probing initialization failed: Device or resource busy
/usr/bin/docker: stderr --> failed to wipefs device, will try again to workaround probable race condition
```

So, I cleaned up more manually by purging them from the CRUSH map and "resetting" 
the disk and LV with dd and dmsetup, respectively:
```
ceph osd purge 480 --force
dd if=/dev/zero of=/dev/sdab bs=1M count=1
dmsetup remove ceph--e10e0f08--8705--441a--8caa--4590de22a611-osd--block--d464211c--f513--4513--86c1--c7ad63e6c142
```

ceph-volume still reported the old volumes, but then zapping actually got rid 
of them (only cleaned out the left-over entries, I guess).

Now, cephadm was able to get one OSD up when I did this cleanup for only one disk.
When I did it in bulk for the rest, they all got stuck again in the same way.

Looking into ceph-volume logs (here for osd.522 as representative):
```
0|0[root@ceph-2-11 /var/log/ceph/55633ec3-6c0c-4a02-990c-0f87e0f7a01f]# ll *20240316
-rw-r--r-- 1 ceph ceph   613789 Mar 14 17:10 ceph-osd.522.log-20240316
-rw-r--r-- 1 root root 42473553 Mar 16 03:13 ceph-volume.log-20240316
```

ceph-volume only reports keyring creation:
```
[2024-03-14 16:10:19,509][ceph_volume.util.prepare][INFO  ] Creating keyring file for osd.522
[2024-03-14 16:10:19,510][ceph_volume.process][INFO  ] Running command: /usr/bin/ceph-authtool /var/lib/ceph/osd/ceph-522/keyring --create-keyring --name osd.522 --add-key AQBfIfNlinc7EBAAHeFicrjmLEjRPGSjuFuLiQ==
```

In the OSD logs I found a couple of these, but don't know if they are related:
```
2024-03-14T16:10:54.706+ 7fab26988540  2 rocksdb: [db/column_family.cc:546] Failed to register data paths of column family (id: 11, name: P)
```

Has anyone seen this behaviour before?
Or could tell me where I should look next to troubleshoot this (which logs)?
Any help is appreciated.
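
For context, a few places that are usually worth checking when cephadm stalls like this (a sketch; osd.522 and <fsid> are placeholders):

```
ceph log last 200 debug cephadm                 # the cephadm module's own log channel
ceph orch device ls --wide                      # how the orchestrator currently sees the disks
cephadm ls                                      # daemons cephadm believes exist on this host
cephadm logs --name osd.522                     # per-daemon logs, if the daemon was created
journalctl -u "ceph-<fsid>@osd.522.service"     # the same via systemd on the host
```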

Best Wishes,
Mathias
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Why a lot of pgs are degraded after host(+osd) restarted?

2024-03-20 Thread Jaemin Joo
Hi all,

While I am testing host failover, there are a lot of degraded PGs after the
host (+OSDs) comes back up. Even though the restart only takes a short time, I don't
understand why the PGs should check all objects related to the failed host (+OSDs).
I'd like to know how to prevent PGs from becoming degraded when an OSD restarts.

FYI, by "degraded PG" I mean a PG in the
"active+undersized+degraded+remapped+backfilling" state.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS space usage

2024-03-20 Thread Alexander E. Patrakov
Thorne,

That's why I asked you to create a separate pool. All writes go to the
original pool, and it is possible to see object counts per-pool.

On Wed, Mar 20, 2024 at 6:32 AM Thorne Lawler  wrote:

> Alexander,
>
> Thank you, but as I said to Igor: The 5.5TB of files on this filesystem
> are virtual machine disks. They are under constant, heavy write load. There
> is no way to turn this off.
> On 19/03/2024 9:36 pm, Alexander E. Patrakov wrote:
>
> Hello Thorne,
>
> Here is one more suggestion on how to debug this. Right now, there is
> uncertainty on whether there is really a disk space leak or if
> something simply wrote new data during the test.
>
> If you have at least three OSDs you can reassign, please set their
> CRUSH device class to something different than before. E.g., "test".
> Then, create a new pool that targets this device class and add it to
> CephFS. Then, create an empty directory on CephFS and assign this pool
> to it using setfattr. Finally, try reproducing the issue using only
> files in this directory. This way, you will be sure that nobody else
> is writing any data to the new pool.
>
> On Tue, Mar 19, 2024 at 5:40 PM Igor Fedotov  
>  wrote:
>
> Hi Thorn,
>
> Given the number of files on the CephFS volume, I presume you don't have a
> severe write load against it. Is that correct?
>
> If so, we can assume that the numbers you're sharing mostly refer to
> your experiment. At the peak I can see a bytes_used increase of 629,461,893,120
> bytes (45978612027392 - 45349150134272). With a replica factor of 3 this
> roughly matches your written data (200GB I presume?).
>
>
> More interesting is that after the file's removal we can see a delta of 419,450,880
> bytes (= 45349569585152 - 45349150134272). I could see two options
> (apart from someone else writing additional data to CephFS during the
> experiment) to explain this:
>
> 1. File removal wasn't completed at the last probe half an hour after
> file's removal. Did you see stale object counter when making that probe?
>
> 2. Some space is leaking. If that's the case this could be a reason for
> your issue if huge(?) files at CephFS are created/removed periodically.
> So if we're certain that the leak really occurred (and option 1. above
> isn't the case) it makes sense to run more experiments with
> writing/removing a bunch of huge files to the volume to confirm space
> leakage.
>
> On 3/18/2024 3:12 AM, Thorne Lawler wrote:
>
> Thanks Igor,
>
> I have tried that, and the number of objects and bytes_used took a
> long time to drop, but they seem to have dropped back to almost the
> original level:
>
>   * Before creating the file:
>   o 3885835 objects
>   o 45349150134272 bytes_used
>   * After creating the file:
>   o 3931663 objects
>   o 45924147249152 bytes_used
>   * Immediately after deleting the file:
>   o 3935995 objects
>   o 45978612027392 bytes_used
>   * Half an hour after deleting the file:
>   o 3886013 objects
>   o 45349569585152 bytes_used
>
> Unfortunately, this is all production infrastructure, so there is
> always other activity taking place.
>
> What tools are there to visually inspect the object map and see how it
> relates to the filesystem?
>
>
> Not sure if there is anything like that at the CephFS level, but you can use
> the rados tool to view objects in the CephFS data pool and try to build some
> mapping between them and the CephFS file list. Could be a bit tricky though.
>
> On 15/03/2024 7:18 pm, Igor Fedotov wrote:
>
> ceph df detail --format json-pretty
>
> --
>
> Regards,
>
> Thorne Lawler - Senior System Administrator
> *DDNS* | ABN 76 088 607 265
> First registrar certified ISO 27001-2013 Data Security Standard ITGOV40172
> P +61 499 449 170
>
> _DDNS
>
> /_*Please note:* The information contained in this email message and
> any attached files may be confidential information, and may also be
> the subject of legal professional privilege. _If you are not the
> intended recipient any use, disclosure or copying of this email is
> unauthorised. _If you received this email in error, please notify
> Discount Domain Name Services Pty Ltd on 03 9815 6868 to report this
> matter and delete all copies of this transmission together with any
> attachments. /
>
>
> --
> Igor Fedotov
> Ceph Lead Developer
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
> Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
> --
>
> Regards,
>
> Thorne Lawler - Senior System Administrator
> *DDNS* | ABN 76 088 607 265
> First registrar certified ISO 27001-2013 Data Security Standard ITGOV40172
> P +61 499 449 170
>
> [image: DDNS]
> *Please note: The information contained in this email message and any
> attached fil

[ceph-users] Re: mon stuck in probing

2024-03-20 Thread faicker mo
Hi, this is the debug log,

2024-03-13T11:14:28.087+0800 7f6984a95640  4 mon.memb4@3(probing) e6
probe_timeout 0x5650c2b0c3a0
2024-03-13T11:14:28.087+0800 7f6984a95640 10 mon.memb4@3(probing) e6
bootstrap
2024-03-13T11:14:28.087+0800 7f6984a95640 10 mon.memb4@3(probing) e6
sync_reset_requester
2024-03-13T11:14:28.087+0800 7f6984a95640 10 mon.memb4@3(probing) e6
unregister_cluster_logger - not registered
2024-03-13T11:14:28.087+0800 7f6984a95640 10 mon.memb4@3(probing) e6
cancel_probe_timeout (none scheduled)
2024-03-13T11:14:28.087+0800 7f6984a95640 10 mon.memb4@3(probing) e6 monmap
e6: 5 mons at {memb1=[v2:10.0.4.111:3300/0,v1:10.0.4.111:6789/0],memb2=[v2:
10.0.4.112:3300/0,v1:10.0.4.112:6789/0],memb3=[v2:
10.0.4.113:3300/0,v1:10.0.4.113:6789/0],memb4=[v2:
10.0.4.114:3300/0,v1:10.0.4.114:6789/0],memb5=[v2:
10.0.4.115:3300/0,v1:10.0.4.115:6789/0]} removed_ranks: {}
disallowed_leaders: {}
2024-03-13T11:14:28.087+0800 7f6984a95640 10 mon.memb4@3(probing) e6 _reset
2024-03-13T11:14:28.087+0800 7f6984a95640 10 mon.memb4@3(probing).auth
v2121 _set_mon_num_rank num 0 rank 0
2024-03-13T11:14:28.087+0800 7f6984a95640 10 mon.memb4@3(probing) e6
cancel_probe_timeout (none scheduled)
2024-03-13T11:14:28.087+0800 7f6984a95640 10 mon.memb4@3(probing) e6
timecheck_finish
2024-03-13T11:14:28.087+0800 7f6984a95640 10 mon.memb4@3(probing) e6
scrub_event_cancel
2024-03-13T11:14:28.087+0800 7f6984a95640 10 mon.memb4@3(probing) e6
scrub_reset
2024-03-13T11:14:28.087+0800 7f6984a95640 10 mon.memb4@3(probing) e6
cancel_probe_timeout (none scheduled)
2024-03-13T11:14:28.087+0800 7f6984a95640 10 mon.memb4@3(probing) e6
reset_probe_timeout 0x5650bb5c8380 after 2 seconds
2024-03-13T11:14:28.087+0800 7f6984a95640 10 mon.memb4@3(probing) e6
probing other monitors
2024-03-13T11:14:28.087+0800 7f6984a95640  1 -- [v2:
10.0.4.114:3300/0,v1:10.0.4.114:6789/0] send_to--> mon [v2:
10.0.4.111:3300/0,v1:10.0.4.111:6789/0] -- mon_probe(probe
c6ee9a01-944f-4745-be86-86e4a2a30e0d name memb4 leader -1 mon_release reef)
v8 -- ?+0 0x5650d8765a00
2024-03-13T11:14:28.087+0800 7f6984a95640 10 -- [v2:
10.0.4.114:3300/0,v1:10.0.4.114:6789/0] connect_to [v2:
10.0.4.111:3300/0,v1:10.0.4.111:6789/0] existing 0x565071e1dc00
2024-03-13T11:14:28.087+0800 7f6984a95640  1 -- [v2:
10.0.4.114:3300/0,v1:10.0.4.114:6789/0] --> [v2:
10.0.4.111:3300/0,v1:10.0.4.111:6789/0] -- mon_probe(probe
c6ee9a01-944f-4745-be86-86e4a2a30e0d name memb4 leader -1 mon_release reef)
v8 -- 0x5650d8765a00 con 0x565071e1dc00
2024-03-13T11:14:28.087+0800 7f6984a95640  5 --2- [v2:
10.0.4.114:3300/0,v1:10.0.4.114:6789/0] >> [v2:
10.0.4.111:3300/0,v1:10.0.4.111:6789/0] conn(0x565071e1dc00 0x565070fbac00
unknown :-1 s=BANNER_CONNECTING pgs=20 cs=955 l=0 rev1=1 crypto rx=0 tx=0
comp rx=0 tx=0).send_message enqueueing message m=0x5650d8765a00 type=67
mon_probe(probe c6ee9a01-944f-4745-be86-86e4a2a30e0d name memb4 leader -1
mon_release reef) v8
2024-03-13T11:14:28.087+0800 7f6984a95640  1 -- [v2:
10.0.4.114:3300/0,v1:10.0.4.114:6789/0] send_to--> mon [v2:
10.0.4.112:3300/0,v1:10.0.4.112:6789/0] -- mon_probe(probe
c6ee9a01-944f-4745-be86-86e4a2a30e0d name memb4 leader -1 mon_release reef)
v8 -- ?+0 0x5650d8765c00
2024-03-13T11:14:28.087+0800 7f6984a95640 10 -- [v2:
10.0.4.114:3300/0,v1:10.0.4.114:6789/0] connect_to [v2:
10.0.4.112:3300/0,v1:10.0.4.112:6789/0] existing 0x5650721d6c00
2024-03-13T11:14:28.087+0800 7f6984a95640  1 -- [v2:
10.0.4.114:3300/0,v1:10.0.4.114:6789/0] --> [v2:
10.0.4.112:3300/0,v1:10.0.4.112:6789/0] -- mon_probe(probe
c6ee9a01-944f-4745-be86-86e4a2a30e0d name memb4 leader -1 mon_release reef)
v8 -- 0x5650d8765c00 con 0x5650721d6c00
2024-03-13T11:14:28.087+0800 7f6984a95640  5 --2- [v2:
10.0.4.114:3300/0,v1:10.0.4.114:6789/0] >> [v2:
10.0.4.112:3300/0,v1:10.0.4.112:6789/0] conn(0x5650721d6c00 0x565070fba680
unknown :-1 s=BANNER_CONNECTING pgs=92 cs=960 l=0 rev1=1 crypto rx=0 tx=0
comp rx=0 tx=0).send_message enqueueing message m=0x5650d8765c00 type=67
mon_probe(probe c6ee9a01-944f-4745-be86-86e4a2a30e0d name memb4 leader -1
mon_release reef) v8
2024-03-13T11:14:28.087+0800 7f6984a95640  1 -- [v2:
10.0.4.114:3300/0,v1:10.0.4.114:6789/0] send_to--> mon [v2:
10.0.4.113:3300/0,v1:10.0.4.113:6789/0] -- mon_probe(probe
c6ee9a01-944f-4745-be86-86e4a2a30e0d name memb4 leader -1 mon_release reef)
v8 -- ?+0 0x5650d8765e00
2024-03-13T11:14:28.087+0800 7f6984a95640 10 -- [v2:
10.0.4.114:3300/0,v1:10.0.4.114:6789/0] connect_to [v2:
10.0.4.113:3300/0,v1:10.0.4.113:6789/0] existing 0x5650721d7000
2024-03-13T11:14:28.087+0800 7f6984a95640  1 -- [v2:
10.0.4.114:3300/0,v1:10.0.4.114:6789/0] --> [v2:
10.0.4.113:3300/0,v1:10.0.4.113:6789/0] -- mon_probe(probe
c6ee9a01-944f-4745-be86-86e4a2a30e0d name memb4 leader -1 mon_release reef)
v8 -- 0x5650d8765e00 con 0x5650721d7000
2024-03-13T11:14:28.087+0800 7f6984a95640  5 --2- [v2:
10.0.4.114:3300/0,v1:10.0.4.114:6789/0] >> [v2:
10.0.4.113:3300/0,v1:10.0.4.113:6789/0] conn(0x5650721d7000 0x565070fba100
unknown :-1 s=BANNER_CONNECTING pgs=20 cs=962