[ceph-users] CephFS space usage

2024-03-12 Thread Thorne Lawler

Hi everyone!

My Ceph cluster (17.2.6) has a CephFS volume which is showing 41TB usage 
for the data pool, but there are only 5.5TB of files in it. There are 
fewer than 100 files on the filesystem in total, so where is all that 
space going?


How can I analyze my cephfs to understand what is using that space, and 
if possible, how can I reclaim that space?
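
A few commands that can help narrow down where the space is going; this is
only a starting sketch, and it assumes the filesystem is mounted at
/mnt/cephfs:

    # raw vs. stored usage per pool (shows replication/EC overhead)
    ceph df detail

    # object counts and space consumption at the RADOS level
    rados df

    # what CephFS itself accounts for in the tree (recursive statistics)
    getfattr -n ceph.dir.rbytes /mnt/cephfs
    getfattr -n ceph.dir.rfiles /mnt/cephfs

Comparing the CephFS-level numbers against the pool-level ones usually shows
which layer is holding the space, e.g. replication overhead or objects still
referenced by snapshots.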


Thank you.

--

Regards,

Thorne Lawler - Senior System Administrator
*DDNS* | ABN 76 088 607 265
First registrar certified ISO 27001-2013 Data Security Standard ITGOV40172
P +61 499 449 170


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: bluestore_min_alloc_size and bluefs_shared_alloc_size

2024-03-12 Thread Joel Davidow
Hi Igor,

Thanks, that's very helpful.

So in this case the Ceph developers recommend that all osds originally
built under octopus be redeployed with default settings and that default
settings continue to be used going forward. Is that correct?

Thanks for your assistance,
Joel


On Tue, Mar 12, 2024 at 4:13 AM Igor Fedotov  wrote:

> Hi Joel,
>
> my primary statement would be - do not adjust "alloc size" settings on
> your own; use the default values!
>
> We've had a pretty long and convoluted evolution of this stuff, so tuning
> recommendations and their aftermath greatly depend on the exact Ceph
> version, while using improper settings could result in severe performance
> impact and even data loss.
>
> The current state of the art is that we support a minimal allocation size
> of 4K for everything: both HDDs and SSDs, user and BlueFS data. The
> effective bluefs_shared_alloc_size (i.e. the allocation unit we generally
> use when BlueFS allocates space for DB [meta]data) is at 64K, but BlueFS
> can fall back to 4K allocations on its own if main disk space fragmentation
> is high. The higher base value (=64K) generally means less overhead for
> both performance and metadata mem/disk footprint. This approach shouldn't
> be applied to OSDs which run legacy Ceph versions though, as they could
> lack proper support for some aspects of this stuff.
> Using the legacy 64K min allocation size for the block device (aka
> bfm_bytes_per_block) can sometimes result in significant space waste; in
> that case one should upgrade to a version which supports a 4K alloc unit
> and redeploy the legacy OSDs. Again, with no custom tunings for either new
> or old OSDs.
>
> So in short your choice should be: upgrade, redeploy with default settings
> if needed and keep using defaults.
>
>
> Hope this helps.
>
> Thanks,
>
> Igor
> On 29/02/2024 01:55, Joel Davidow wrote:
>
> Summary
> -------
> The relationship between the values configured for bluestore_min_alloc_size
> and bluefs_shared_alloc_size is reported to impact space amplification,
> partial overwrites in erasure-coded pools, and storage capacity as an osd
> becomes more fragmented and/or more full.
>
>
> Previous discussions including this topic
> -----------------------------------------
> comment #7 in bug 63618 in Dec 2023 - 
> https://tracker.ceph.com/issues/63618#note-7
>
> pad writeup related to bug 62282 likely from late 2023 - 
> https://pad.ceph.com/p/RCA_62282
>
> email sent 13 Sept 2023 in mail list discussion of cannot create new osd - 
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/5M4QAXJDCNJ74XVIBIFSHHNSETCCKNMC/
>
> comment #9 in bug 58530 likely from early 2023 - 
> https://tracker.ceph.com/issues/58530#note-9
>
> email sent 30 Sept 2021 in mail list discussion of flapping osds - 
> https://www.mail-archive.com/ceph-users@ceph.io/msg13072.html
>
> email sent 25 Feb 2020 in mail list discussion of changing allocation size - 
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/B3DGKH6THFGHALLX6ATJ4GGD4SVFNEKU/
>
>
> Current situation
> -----------------
> We have three Ceph clusters that were originally built via cephadm on octopus 
> and later upgraded to pacific. All osds are HDD (will be moving to wal+db on 
> SSD) and were resharded after the upgrade to enable rocksdb sharding.
>
> The value for bluefs_shared_alloc_size has remained unchanged at 65535.
>
> The value for bluestore_min_alloc_size_hdd was 65535 in octopus but is
> reported as 4096 by ceph daemon osd.<id> config show in pacific. However,
> the osd label after upgrading to pacific retains the value of 65535 for
> bfm_bytes_per_block. BitmapFreelistManager.h in the Ceph source code
> (src/os/bluestore/BitmapFreelistManager.h) indicates that bytes_per_block
> is bdev_block_size. This indicates that the physical layout of the osd has
> not changed from 65535 despite the output of the ceph daemon command
> reporting it as 4096. This interpretation is supported by the Minimum
> Allocation Size section of the BlueStore configuration reference for quincy
> (https://docs.ceph.com/en/quincy/rados/configuration/bluestore-config-ref/#minimum-allocation-size).
>
> Questions
> ---------
> What are the pros and cons of the following three cases, with two
> variations per case - when using co-located wal+db on HDD and when using
> separate wal+db on SSD:
> 1) bluefs_shared_alloc_size, bluestore_min_alloc_size, and
> bfm_bytes_per_block all equal
> 2) bluefs_shared_alloc_size greater than but a multiple of
> bluestore_min_alloc_size with bfm_bytes_per_block equal to
> bluestore_min_alloc_size
> 3) bluefs_shared_alloc_size greater than but a multiple of
> bluestore_min_alloc_size with bfm_bytes_per_block equal to
> bluefs_shared_alloc_size
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Ceph Users Feedback Survey

2024-03-12 Thread Neha Ojha
Hi everyone,

On behalf of the Ceph Foundation Board, I would like to announce the
creation of, and cordially invite you to, the first of a recurring series
of meetings focused solely on gathering feedback from the users of
Ceph. The overarching goal of these meetings is to elicit feedback from the
users, companies, and organizations who use Ceph in their production
environments. You can find more details about the motivation behind this
effort in our user survey [1], which we highly encourage all of you to take.
This is an extension of the Ceph User Dev Meeting, with a concerted focus on
Performance (led by Vincent Hsu, IBM) and Orchestration/Deployment (led by
Matt Leonard, Bloomberg) to start off with. We would like to kick off this
series of meetings on March 21, 2024. The survey will be open until March
18, 2024.

Looking forward to hearing from you!

Thanks,
Neha

[1]
https://docs.google.com/forms/d/15aWxoG4wSQz7ziBaReVNYVv94jA0dSNQsDJGqmHCLMg/viewform?ts=65e87dd8_requested=true
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: bluestore_min_alloc_size and bluefs_shared_alloc_size

2024-03-12 Thread Igor Fedotov

Hi Joel,

my primary statement would be - do not adjust "alloc size" settings on
your own; use the default values!


We've had a pretty long and convoluted evolution of this stuff, so tuning
recommendations and their aftermath greatly depend on the exact Ceph
version, while using improper settings could result in severe performance
impact and even data loss.


The current state of the art is that we support a minimal allocation size of
4K for everything: both HDDs and SSDs, user and BlueFS data. The effective
bluefs_shared_alloc_size (i.e. the allocation unit we generally use when
BlueFS allocates space for DB [meta]data) is at 64K, but BlueFS can fall back
to 4K allocations on its own if main disk space fragmentation is high. The
higher base value (=64K) generally means less overhead for both performance
and metadata mem/disk footprint. This approach shouldn't be applied to OSDs
which run legacy Ceph versions though, as they could lack proper support for
some aspects of this stuff.
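
One way to gauge whether main-device fragmentation is high enough for that
BlueFS fallback to matter is the allocator score admin-socket command (a
sketch; osd.0 is an assumed id):

    # prints a fragmentation rating between 0 (none) and 1 (heavily fragmented)
    ceph daemon osd.0 bluestore allocator score block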


Using the legacy 64K min allocation size for the block device (aka
bfm_bytes_per_block) can sometimes result in significant space waste; in that
case one should upgrade to a version which supports a 4K alloc unit and
redeploy the legacy OSDs. Again, with no custom tunings for either new or old
OSDs.


So in short your choice should be: upgrade, redeploy with default 
settings if needed and keep using defaults.
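
For a cephadm-managed cluster, that redeploy could look roughly like the
following (a sketch only; osd id 12 is an example, and it assumes any custom
overrides are dropped first so the new OSD is created with defaults):

    # drop custom alloc-size overrides, if any were ever set
    ceph config rm osd bluestore_min_alloc_size_hdd
    ceph config rm osd bluefs_shared_alloc_size

    # drain, destroy and zap the OSD; the orchestrator then recreates it
    ceph orch osd rm 12 --replace --zap

The on-disk allocation unit (bfm_bytes_per_block) is fixed at mkfs time, which
is why a redeploy is needed rather than just changing the config value.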



Hope this helps.

Thanks,

Igor

On 29/02/2024 01:55, Joel Davidow wrote:

Summary
-------
The relationship between the values configured for bluestore_min_alloc_size
and bluefs_shared_alloc_size is reported to impact space amplification,
partial overwrites in erasure-coded pools, and storage capacity as an osd
becomes more fragmented and/or more full.


Previous discussions including this topic
-----------------------------------------
comment #7 in bug 63618 in Dec 2023 -
https://tracker.ceph.com/issues/63618#note-7

pad writeup related to bug 62282 likely from late 2023 -
https://pad.ceph.com/p/RCA_62282

email sent 13 Sept 2023 in mail list discussion of cannot create new osd -
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/5M4QAXJDCNJ74XVIBIFSHHNSETCCKNMC/

comment #9 in bug 58530 likely from early 2023 -
https://tracker.ceph.com/issues/58530#note-9

email sent 30 Sept 2021 in mail list discussion of flapping osds -
https://www.mail-archive.com/ceph-users@ceph.io/msg13072.html

email sent 25 Feb 2020 in mail list discussion of changing allocation size -
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/B3DGKH6THFGHALLX6ATJ4GGD4SVFNEKU/


Current situation
-----------------
We have three Ceph clusters that were originally built via cephadm on octopus 
and later upgraded to pacific. All osds are HDD (will be moving to wal+db on 
SSD) and were resharded after the upgrade to enable rocksdb sharding.

The value for bluefs_shared_alloc_size has remained unchanged at 65535.

The value for bluestore_min_alloc_size_hdd was 65535 in octopus but is
reported as 4096 by ceph daemon osd.<id> config show in pacific. However, the
osd label after upgrading to pacific retains the value of 65535 for
bfm_bytes_per_block. BitmapFreelistManager.h in the Ceph source code
(src/os/bluestore/BitmapFreelistManager.h) indicates that bytes_per_block is
bdev_block_size. This indicates that the physical layout of the osd has not
changed from 65535 despite the output of the ceph daemon command reporting it
as 4096. This interpretation is supported by the Minimum Allocation Size
section of the BlueStore configuration reference for quincy
(https://docs.ceph.com/en/quincy/rados/configuration/bluestore-config-ref/#minimum-allocation-size).
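
For reference, one way to compare the runtime setting against the value baked
into the osd at mkfs time (a sketch; the osd id and device path are
assumptions):

    # what the running osd is configured with
    ceph daemon osd.0 config get bluestore_min_alloc_size_hdd

    # what is recorded in the osd label, including bfm_bytes_per_block
    ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-0/block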

Questions
---------
What are the pros and cons of the following three cases, with two variations
per case - when using co-located wal+db on HDD and when using separate wal+db
on SSD:
1) bluefs_shared_alloc_size, bluestore_min_alloc_size, and bfm_bytes_per_block
all equal
2) bluefs_shared_alloc_size greater than but a multiple of
bluestore_min_alloc_size with bfm_bytes_per_block equal to
bluestore_min_alloc_size
3) bluefs_shared_alloc_size greater than but a multiple of
bluestore_min_alloc_size with bfm_bytes_per_block equal to
bluefs_shared_alloc_size
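
For reference, case 2 is essentially the relationship Igor describes above as
the current default (4K min alloc unit, 64K BlueFS allocation unit). On a
cluster running defaults it can be read back like this (a sketch, not a
tuning recommendation):

    ceph config get osd bluestore_min_alloc_size_hdd   # 4096 by default
    ceph config get osd bluefs_shared_alloc_size       # 65536, a 16x multiple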
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Hanging request in S3

2024-03-12 Thread Christian Kugler
Hi Casey,

Interesting. Especially since the request it hangs on is a GET request.
I set the option and restarted the RGW I test with.

The POSTs for deleting take a while, but there are no longer any blocking GET
or POST requests.
Thank you!

Best,
Christian

PS: Sorry for pressing the wrong reply button, Casey
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 18.2.2 dashboard really messed up.

2024-03-12 Thread Nizamudeen A
Hi,

The warning and danger indicators in the capacity chart correspond to the
nearfull and full ratios set on the cluster; the default values for them are
85% and 95% respectively. You can do a `ceph osd dump | grep ratio` and see
those.
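
For example, with the defaults in place the output looks like this:

    ceph osd dump | grep ratio
    full_ratio 0.95
    backfillfull_ratio 0.9
    nearfull_ratio 0.85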

When this was introduced, there was a blog post explaining how this is mapped
in the chart. When your used storage crosses that 85% mark, the chart is
colored yellow to alert the user, and when it crosses 95% (or the full ratio)
the chart is colored red. That doesn't mean the cluster is in bad shape; it's
a visual indicator telling you that you are running out of storage.

Regarding the Cluster Utilization chart, it gets its metrics directly from
Prometheus so that it can show time-series data in the UI rather than the
metrics at the current point in time (which is what was used before). So if
you have Prometheus configured for the dashboard and its URL is provided in
the dashboard settings (`ceph dashboard set-prometheus-api-host <url>`), then
you should be able to see the metrics.
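
For example (the URL here is only a placeholder for your Prometheus endpoint):

    ceph dashboard set-prometheus-api-host http://prometheus.example.com:9090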

In case you need to read more about the new page, you can check the dashboard
documentation.

Regards,
Nizam



On Mon, Mar 11, 2024 at 11:47 PM Harry G Coin  wrote:

> Looking at ceph -s, all is well.  Looking at the dashboard, 85% of my
> capacity is 'warned', and 95% is 'in danger'.   There is no hint given
> as to the nature of the danger or reason for the warning.  Though
> apparently with merely 5% of my ceph world 'normal', the cluster reports
> 'ok'.  Which, you know, seems contradictory.  I've used just under 40%
> of capacity.
>
> Further down the dashboard, all the subsections of 'Cluster Utilization'
> are '1' and '0.5' with nothing whatever in the graphics area.
>
> Previous versions of ceph presented a normal dashboard.
>
> It's just a little half rack, 5 hosts, a few physical drives each, been
> running ceph for a couple years now.  Orchestrator is cephadm.  It's
> just about as 'plain vanilla' as it gets.  I've had to mute one alert,
> because cephadm refresh aborts when it finds drives on any host that
> have nothing to do with ceph and don't have a blkid_ip 'TYPE' key.
> Seems unrelated to a totally messed up dashboard.  (The tracker for that
> is here: https://tracker.ceph.com/issues/63502 ).
>
> Any idea what the steps are to get useful stuff back on the dashboard?
> Any idea where I can learn what my 85% danger and 95% warning are
> 'about'?  (You'd think 'danger' (The volcano is blowing up now!) would
> be worse than 'warning' (the volcano might blow up soon), so how can
> warning+danger > 100%, or if not additive how can warning < danger?)
>
>   Here's a bit of detail:
>
> root@noc1:~# ceph -s
>   cluster:
> id: 4067126d-01cb-40af-824a-881c130140f8
> health: HEALTH_OK
> (muted: CEPHADM_REFRESH_FAILED)
>
>   services:
> mon: 5 daemons, quorum noc4,noc2,noc1,noc3,sysmon1 (age 70m)
> mgr: noc2.yhyuxd(active, since 82m), standbys: noc4.tvhgac,
> noc3.sybsfb, noc1.jtteqg
> mds: 1/1 daemons up, 3 standby
> osd: 27 osds: 27 up (since 20m), 27 in (since 2d)
>
>   data:
> volumes: 1/1 healthy
> pools:   16 pools, 1809 pgs
> objects: 12.29M objects, 17 TiB
> usage:   44 TiB used, 67 TiB / 111 TiB avail
> pgs: 1793 active+clean
>  9    active+clean+scrubbing
>  7    active+clean+scrubbing+deep
>
>   io:
> client:   5.6 MiB/s rd, 273 KiB/s wr, 41 op/s rd, 58 op/s wr
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io