[ceph-users] Re: quincy v17.2.7 QE Validation status

2023-10-19 Thread Venky Shankar
Hi Yuri,

On Thu, Oct 19, 2023 at 10:48 PM Venky Shankar  wrote:
>
> Hi Yuri,
>
> On Thu, Oct 19, 2023 at 9:32 PM Yuri Weinstein  wrote:
> >
> > We are still finishing off:
> >
> > - revert PR https://github.com/ceph/ceph/pull/54085, needs smoke suite rerun
> > - removed s3tests https://github.com/ceph/ceph/pull/54078 merged
> >
> > Venky, Casey FYI
>
> https://github.com/ceph/ceph/pull/53139 is causing a smoke test
> failure. Details:
> https://github.com/ceph/ceph/pull/53139#issuecomment-1771388202
>
> I've sent a revert for that change -
> https://github.com/ceph/ceph/pull/54108 - will let you know when it's
> ready for testing.

smoke passes with this revert


https://pulpito.ceph.com/vshankar-2023-10-19_20:24:36-smoke-wip-vshankar-testing-quincy-20231019.172112-testing-default-smithi/

fs suite running now...

>
> >
> > On Wed, Oct 18, 2023 at 9:07 PM Venky Shankar  wrote:
> > >
> > > On Tue, Oct 17, 2023 at 12:23 AM Yuri Weinstein  
> > > wrote:
> > > >
> > > > Details of this release are summarized here:
> > > >
> > > > https://tracker.ceph.com/issues/63219#note-2
> > > > Release Notes - TBD
> > > >
> > > > Issue https://tracker.ceph.com/issues/63192 appears to be failing 
> > > > several runs.
> > > > Should it be fixed for this release?
> > > >
> > > > Seeking approvals/reviews for:
> > > >
> > > > smoke - Laura
> > >
> > > There's one failure in the smoke tests
> > >
> > > 
> > > https://pulpito.ceph.com/yuriw-2023-10-18_14:58:31-smoke-quincy-release-distro-default-smithi/
> > >
> > > caused by
> > >
> > > https://github.com/ceph/ceph/pull/53647
> > >
> > > (which was marked DNM but got merged). However, it's a test case thing
> > > and we can live with it.
> > >
> > > Yuri mentioned in Slack that he might do another round of build/tests,
> > > so, Yuri, here's the revert:
> > >
> > >https://github.com/ceph/ceph/pull/54085
> > >
> > > > rados - Laura, Radek, Travis, Ernesto, Adam King
> > > >
> > > > rgw - Casey
> > > > fs - Venky
> > > > orch - Adam King
> > > >
> > > > rbd - Ilya
> > > > krbd - Ilya
> > > >
> > > > upgrade/quincy-p2p - Known issue IIRC, Casey pls confirm/approve
> > > >
> > > > client-upgrade-quincy-reef - Laura
> > > >
> > > > powercycle - Brad pls confirm
> > > >
> > > > ceph-volume - Guillaume pls take a look
> > > >
> > > > Please reply to this email with approval and/or trackers of known
> > > > issues/PRs to address them.
> > > >
> > > > Josh, Neha - gibba and LRC upgrades -- N/A for quincy now after reef 
> > > > release.
> > > >
> > > > Thx
> > > > YuriW
> > > > ___
> > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > > >
> > >
> > >
> > > --
> > > Cheers,
> > > Venky
> > >
> >
>
>
> --
> Cheers,
> Venky



-- 
Cheers,
Venky
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.14: OSDs randomly crash in bstore_kv_sync

2023-10-19 Thread Zakhar Kirpichenko
Igor, I noticed that there's no roadmap for the next 16.2.x release. May I
ask what time frame we are looking at with regard to a possible fix?

We're experiencing several OSD crashes per day caused by this issue.

/Z

On Mon, 16 Oct 2023 at 14:19, Igor Fedotov  wrote:

> That's true.
> On 16/10/2023 14:13, Zakhar Kirpichenko wrote:
>
> Many thanks, Igor. I found previously submitted bug reports and subscribed
> to them. My understanding is that the issue is going to be fixed in the
> next Pacific minor release.
>
> /Z
>
> On Mon, 16 Oct 2023 at 14:03, Igor Fedotov  wrote:
>
>> Hi Zakhar,
>>
>> please see my reply for the post on the similar issue at:
>>
>> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/YNJ35HXN4HXF4XWB6IOZ2RKXX7EQCEIY/
>>
>>
>> Thanks,
>>
>> Igor
>>
>> On 16/10/2023 09:26, Zakhar Kirpichenko wrote:
>> > Hi,
>> >
>> > After upgrading to Ceph 16.2.14 we had several OSD crashes
>> > in bstore_kv_sync thread:
>> >
>> >
>> > 1. "assert_thread_name": "bstore_kv_sync",
>> > 2. "backtrace": [
>> > 3. "/lib64/libpthread.so.0(+0x12cf0) [0x7ff2f6750cf0]",
>> > 4. "gsignal()",
>> > 5. "abort()",
>> > 6. "(ceph::__ceph_assert_fail(char const*, char const*, int, char
>> > const*)+0x1a9) [0x564dc5f87d0b]",
>> > 7. "/usr/bin/ceph-osd(+0x584ed4) [0x564dc5f87ed4]",
>> > 8. "(RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t
>> > const&)+0x15e) [0x564dc6604a9e]",
>> > 9. "(BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long,
>> unsigned
>> > long)+0x77d) [0x564dc66951cd]",
>> > 10. "(BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0x90)
>> > [0x564dc6695670]",
>> > 11. "(BlueFS::fsync(BlueFS::FileWriter*)+0x18b) [0x564dc66b1a6b]",
>> > 12. "(BlueRocksWritableFile::Sync()+0x18) [0x564dc66c1768]",
>> > 13. "(rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions
>> > const&, rocksdb::IODebugContext*)+0x1f) [0x564dc6b6496f]",
>> > 14. "(rocksdb::WritableFileWriter::SyncInternal(bool)+0x402)
>> > [0x564dc6c761c2]",
>> > 15. "(rocksdb::WritableFileWriter::Sync(bool)+0x88)
>> [0x564dc6c77808]",
>> > 16. "(rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup
>> > const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned
>> > long)+0x309) [0x564dc6b780c9]",
>> > 17. "(rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&,
>> > rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*,
>> unsigned
>> > long, bool, unsigned long*, unsigned long,
>> > rocksdb::PreReleaseCallback*)+0x2629) [0x564dc6b80c69]",
>> > 18. "(rocksdb::DBImpl::Write(rocksdb::WriteOptions const&,
>> > rocksdb::WriteBatch*)+0x21) [0x564dc6b80e61]",
>> > 19. "(RocksDBStore::submit_common(rocksdb::WriteOptions&,
>> > std::shared_ptr)+0x84)
>> [0x564dc6b1f644]",
>> > 20.
>> "(RocksDBStore::submit_transaction_sync(std::shared_ptr)+0x9a)
>> > [0x564dc6b2004a]",
>> > 21. "(BlueStore::_kv_sync_thread()+0x30d8) [0x564dc6602ec8]",
>> > 22. "(BlueStore::KVSyncThread::entry()+0x11) [0x564dc662ab61]",
>> > 23. "/lib64/libpthread.so.0(+0x81ca) [0x7ff2f67461ca]",
>> > 24. "clone()"
>> > 25. ],
>> >
>> >
>> > I am attaching two instances of crash info for further reference:
>> > https://pastebin.com/E6myaHNU
>> >
>> > OSD configuration is rather simple and close to default:
>> >
>> > osd.6   dev       bluestore_cache_size_hdd    4294967296
>> > osd.6   dev       bluestore_cache_size_ssd    4294967296
>> > osd     advanced  debug_rocksdb               1/5
>> > osd     advanced  osd_max_backfills           2
>> > osd     basic     osd_memory_target           17179869184
>> > osd     advanced  osd_recovery_max_active     2
>> > osd     advanced  osd_scrub_sleep             0.10
>> > osd     advanced  rbd_balance_parent_reads    false
>> >
>> > debug_rocksdb is a recent change, otherwise this configuration has been
>> > running without issues for months. The crashes happened on two different
>> > hosts with identical hardware, the hosts and storage (NVME DB/WAL, HDD
>> > block) don't exhibit any issues. We have not experienced such crashes
>> with
>> > Ceph < 16.2.14.
>> >
>> > Is this a known issue, or should I open a bug report?
>> >
>> > Best regards,
>> > Zakhar
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Specify priority for active MGR and MDS

2023-10-19 Thread Patrick Donnelly
Hello Nicolas,

On Wed, Sep 27, 2023 at 9:32 AM Nicolas FONTAINE  wrote:
>
> Hi everyone,
>
> Is there a way to specify which MGR and which MDS should be the active one?

With respect to the MDS, if your reason for asking is that you want to
have the better-provisioned MDS as the active, then I discourage you
from architecting your system that way. The standby should be equally
provisioned, as recovery is generally a resource-intensive process that
will usually consume more resources than the steady-state active. If
you underprovision memory, especially, then your standby will simply
not function (it will go OOM).
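
For completeness (a sketch, not a provisioning recommendation): you can check
which daemons are currently active and force a failover so that a standby
takes over, along these lines:

$ ceph mgr stat
$ ceph fs status
$ ceph mgr fail          # fail the current active mgr
$ ceph mds fail 0        # fail MDS rank 0 so a standby takes over

As far as I know there is no persistent "priority" setting; a failover simply
promotes one of the available standbys.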

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy v17.2.7 QE Validation status

2023-10-19 Thread Laura Flores
Yuri pointed me to a new failure in the quincy-p2p suite from 17.2.7
testing: https://tracker.ceph.com/issues/63257

RADOS is currently investigating.

On Thu, Oct 19, 2023 at 12:21 PM Venky Shankar  wrote:

> Hi Yuri,
>
> On Thu, Oct 19, 2023 at 9:32 PM Yuri Weinstein 
> wrote:
> >
> > We are still finishing off:
> >
> > - revert PR https://github.com/ceph/ceph/pull/54085, needs smoke suite
> rerun
> > - removed s3tests https://github.com/ceph/ceph/pull/54078 merged
> >
> > Venky, Casey FYI
>
> https://github.com/ceph/ceph/pull/53139 is causing a smoke test
> failure. Details:
> https://github.com/ceph/ceph/pull/53139#issuecomment-1771388202
>
> I've sent a revert for that change -
> https://github.com/ceph/ceph/pull/54108 - will let you know when it's
> ready for testing.
>
> >
> > On Wed, Oct 18, 2023 at 9:07 PM Venky Shankar 
> wrote:
> > >
> > > On Tue, Oct 17, 2023 at 12:23 AM Yuri Weinstein 
> wrote:
> > > >
> > > > Details of this release are summarized here:
> > > >
> > > > https://tracker.ceph.com/issues/63219#note-2
> > > > Release Notes - TBD
> > > >
> > > > Issue https://tracker.ceph.com/issues/63192 appears to be failing
> several runs.
> > > > Should it be fixed for this release?
> > > >
> > > > Seeking approvals/reviews for:
> > > >
> > > > smoke - Laura
> > >
> > > There's one failure in the smoke tests
> > >
> > >
> https://pulpito.ceph.com/yuriw-2023-10-18_14:58:31-smoke-quincy-release-distro-default-smithi/
> > >
> > > caused by
> > >
> > > https://github.com/ceph/ceph/pull/53647
> > >
> > > (which was marked DNM but got merged). However, it's a test case thing
> > > and we can live with it.
> > >
> > > Yuri mentioned in Slack that he might do another round of build/tests,
> > > so, Yuri, here's the revert:
> > >
> > >https://github.com/ceph/ceph/pull/54085
> > >
> > > > rados - Laura, Radek, Travis, Ernesto, Adam King
> > > >
> > > > rgw - Casey
> > > > fs - Venky
> > > > orch - Adam King
> > > >
> > > > rbd - Ilya
> > > > krbd - Ilya
> > > >
> > > > upgrade/quincy-p2p - Known issue IIRC, Casey pls confirm/approve
> > > >
> > > > client-upgrade-quincy-reef - Laura
> > > >
> > > > powercycle - Brad pls confirm
> > > >
> > > > ceph-volume - Guillaume pls take a look
> > > >
> > > > Please reply to this email with approval and/or trackers of known
> > > > issues/PRs to address them.
> > > >
> > > > Josh, Neha - gibba and LRC upgrades -- N/A for quincy now after reef
> release.
> > > >
> > > > Thx
> > > > YuriW
> > > > ___
> > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > > >
> > >
> > >
> > > --
> > > Cheers,
> > > Venky
> > >
> >
>
>
> --
> Cheers,
> Venky
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>


-- 

Laura Flores

She/Her/Hers

Software Engineer, Ceph Storage 

Chicago, IL

lflo...@ibm.com | lflo...@redhat.com 
M: +17087388804
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Turn off Dashboard CephNodeDiskspaceWarning for specific drives?

2023-10-19 Thread Eugen Block

Hi,

I assume it's this prometheus alert rule definition:

  - alert: "CephNodeDiskspaceWarning"
annotations:
  description: "Mountpoint {{ $labels.mountpoint }} on {{  
$labels.nodename }} will be full in less than 5 days based on the 48  
hour trailing fill rate."

  summary: "Host filesystem free space is getting low"
expr:  
"predict_linear(node_filesystem_free_bytes{device=~\"/.*\"}[2d], 3600  
* 24 * 5) *on(instance) group_left(nodename) node_uname_info < 0"

labels:
  oid: "1.3.6.1.4.1.50495.1.2.1.8.4"
  severity: "warning"
  type: "ceph_default"

It's located on your prometheus node under
/var/lib/ceph/{fsid}/prometheus.{hostname}/etc/prometheus/alerting/ceph_alerts.yml.
You could try playing around with the expression so that it does not consider
all devices. You can just edit this file and then restart prometheus.
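
For example, a variant of the expression that ignores /boot/firmware could
look like this (untested sketch; adjust the mountpoint regex to your
environment):

expr: "predict_linear(node_filesystem_free_bytes{device=~\"/.*\", mountpoint!~\"/boot.*\"}[2d], 3600 * 24 * 5) *on(instance) group_left(nodename) node_uname_info < 0"

Note that cephadm may regenerate this file when the prometheus service is
redeployed or reconfigured, so the change might have to be reapplied.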


Regards,
Eugen

Zitat von Daniel Brown :


Greetings -

Forgive me if this is an elementary question - am fairly new to  
running CEPH. Have searched but didn’t see anything specific that  
came up.


Is there any way to disable the disk space warnings  
(CephNodeDiskspaceWarning) for specific drives or filesystems on my  
CEPH servers?


Running 18.2.0, installed with cephadm on Ubuntu 22.04 on Arm. Keep  
seeing these warnings in the Dashboard for /boot/firmware, which, in  
my opinion shouldn’t really be something that ceph needs to worry  
about - or at least, should be something I can configure it to ignore.



Thanks in advance.

Dan.


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How do you handle large Ceph object storage cluster?

2023-10-19 Thread Peter Grandi
> [...] (>10k OSDs, >60 PB of data).

6TB on average per OSD? Hopefully SSDs or RAID10 (or low-member-count,
3-5 disk, RAID5).

> It is entirely dedicated to object storage with S3 interface.
> Maintenance and its extension are getting more and more
> problematic and time consuming.

Ah the joys of a single large unified storage pool :-).
https://www.sabi.co.uk/blog/0804apr.html?080417#080417

> We consider to split it to two or more completely separate
> clusters

I would suggest doing it 1-2 years ago...

> create S3 layer of abstraction with some additional metadata
> that will allow us to use these 2+ physically independent
> instances as a one logical cluster.

That's what the bucket hierarchy in a Ceph cluster instance
already does. What your layer is going to do is either:

 1) Lookup the object ID in a list of instances, and fetch the
object from the instance that validates the object ID;
 2) Maintain a huge table of all object IDs and which instances
they are in.

But 1) is basically what CRUSH already does and 2) means giving
up the Ceph "decentralized" philosophy based on CRUSH.

BTW one old practice that so few systems follow is to use as
object keys neither addresses nor identifiers, but *both*: first
access the address treating it as a hint, check that the
identifier matches, if not do a slower lookup using the object
identifier part to find the actual address.

> Additionally, newest data is the most demanded data, so we
> have to spread it equally among clusters to avoid skews in
> cluster load.

I usually do the opposite, but that depends on your application.

My practice is to recognize that data is indeed usually
stratified by date, to regard filesystem instances as "silos",
and to create a new filesystem instance every few months or
years, directing all new file creation to the latest instance,
and then progressively retire the older instances or copy
their "active" data onwards into the new instance and the
"inactive" data to offline storage.
http://www.sabi.co.uk/blog/12-fou.html?121218b#121218b

If you really need to keep all data online forever, which is
usually not the case (that's why there are laws that expire
matters after N years), the second-best option is to keep old
silos powered up indefinitely; they will take very little
attention beyond refreshing the hardware periodically and
migrating the data to new instances when that stops being
economical.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy v17.2.7 QE Validation status

2023-10-19 Thread Venky Shankar
Hi Yuri,

On Thu, Oct 19, 2023 at 9:32 PM Yuri Weinstein  wrote:
>
> We are still finishing off:
>
> - revert PR https://github.com/ceph/ceph/pull/54085, needs smoke suite rerun
> - removed s3tests https://github.com/ceph/ceph/pull/54078 merged
>
> Venky, Casey FYI

https://github.com/ceph/ceph/pull/53139 is causing a smoke test
failure. Details:
https://github.com/ceph/ceph/pull/53139#issuecomment-1771388202

I've sent a revert for that change -
https://github.com/ceph/ceph/pull/54108 - will let you know when it's
ready for testing.

>
> On Wed, Oct 18, 2023 at 9:07 PM Venky Shankar  wrote:
> >
> > On Tue, Oct 17, 2023 at 12:23 AM Yuri Weinstein  wrote:
> > >
> > > Details of this release are summarized here:
> > >
> > > https://tracker.ceph.com/issues/63219#note-2
> > > Release Notes - TBD
> > >
> > > Issue https://tracker.ceph.com/issues/63192 appears to be failing several 
> > > runs.
> > > Should it be fixed for this release?
> > >
> > > Seeking approvals/reviews for:
> > >
> > > smoke - Laura
> >
> > There's one failure in the smoke tests
> >
> > 
> > https://pulpito.ceph.com/yuriw-2023-10-18_14:58:31-smoke-quincy-release-distro-default-smithi/
> >
> > caused by
> >
> > https://github.com/ceph/ceph/pull/53647
> >
> > (which was marked DNM but got merged). However, it's a test case thing
> > and we can live with it.
> >
> > Yuri mentioned in Slack that he might do another round of build/tests,
> > so, Yuri, here's the revert:
> >
> >https://github.com/ceph/ceph/pull/54085
> >
> > > rados - Laura, Radek, Travis, Ernesto, Adam King
> > >
> > > rgw - Casey
> > > fs - Venky
> > > orch - Adam King
> > >
> > > rbd - Ilya
> > > krbd - Ilya
> > >
> > > upgrade/quincy-p2p - Known issue IIRC, Casey pls confirm/approve
> > >
> > > client-upgrade-quincy-reef - Laura
> > >
> > > powercycle - Brad pls confirm
> > >
> > > ceph-volume - Guillaume pls take a look
> > >
> > > Please reply to this email with approval and/or trackers of known
> > > issues/PRs to address them.
> > >
> > > Josh, Neha - gibba and LRC upgrades -- N/A for quincy now after reef 
> > > release.
> > >
> > > Thx
> > > YuriW
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > >
> >
> >
> > --
> > Cheers,
> > Venky
> >
>


-- 
Cheers,
Venky
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy v17.2.7 QE Validation status

2023-10-19 Thread Yuri Weinstein
We are still finishing off:

- revert PR https://github.com/ceph/ceph/pull/54085, needs smoke suite rerun
- removed s3tests https://github.com/ceph/ceph/pull/54078 merged

Venky, Casey FYI

On Wed, Oct 18, 2023 at 9:07 PM Venky Shankar  wrote:
>
> On Tue, Oct 17, 2023 at 12:23 AM Yuri Weinstein  wrote:
> >
> > Details of this release are summarized here:
> >
> > https://tracker.ceph.com/issues/63219#note-2
> > Release Notes - TBD
> >
> > Issue https://tracker.ceph.com/issues/63192 appears to be failing several 
> > runs.
> > Should it be fixed for this release?
> >
> > Seeking approvals/reviews for:
> >
> > smoke - Laura
>
> There's one failure in the smoke tests
>
> 
> https://pulpito.ceph.com/yuriw-2023-10-18_14:58:31-smoke-quincy-release-distro-default-smithi/
>
> caused by
>
> https://github.com/ceph/ceph/pull/53647
>
> (which was marked DNM but got merged). However, it's a test case thing
> and we can live with it.
>
> Yuri mentioned in Slack that he might do another round of build/tests,
> so, Yuri, here's the revert:
>
>https://github.com/ceph/ceph/pull/54085
>
> > rados - Laura, Radek, Travis, Ernesto, Adam King
> >
> > rgw - Casey
> > fs - Venky
> > orch - Adam King
> >
> > rbd - Ilya
> > krbd - Ilya
> >
> > upgrade/quincy-p2p - Known issue IIRC, Casey pls confirm/approve
> >
> > client-upgrade-quincy-reef - Laura
> >
> > powercycle - Brad pls confirm
> >
> > ceph-volume - Guillaume pls take a look
> >
> > Please reply to this email with approval and/or trackers of known
> > issues/PRs to address them.
> >
> > Josh, Neha - gibba and LRC upgrades -- N/A for quincy now after reef 
> > release.
> >
> > Thx
> > YuriW
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
>
>
> --
> Cheers,
> Venky
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Turn off Dashboard CephNodeDiskspaceWarning for specific drives?

2023-10-19 Thread Daniel Brown

Greetings - 

Forgive me if this is an elementary question - am fairly new to running CEPH. 
Have searched but didn’t see anything specific that came up. 

Is there any way to disable the disk space warnings (CephNodeDiskspaceWarning) 
for specific drives or filesystems on my CEPH servers? 

Running 18.2.0, installed with cephadm on Ubuntu 22.04 on Arm. Keep seeing 
these warnings in the Dashboard for /boot/firmware, which, in my opinion 
shouldn’t really be something that ceph needs to worry about - or at least, 
should be something I can configure it to ignore. 


Thanks in advance. 

Dan. 


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Remove empty orphaned PGs not mapped to a pool

2023-10-19 Thread Malte Stroem

Hello Eugen,

before answering your questions:

Creating a new simple replicated crush rule and setting it to the 
metadata pool was the solution.


> - Your cache tier was on SSDs which need to be removed.

Yes, the cache tier pool and the metadata pool for the RBDs.

> - Cache tier was removed successfully.

Yes.

> - But since the rbd-meta pool is not aware of device classes it used
> the same SSDs.


No, it was located on the SSDs from the beginning.

Looking at the device classes was wrong.

Just setting the right crush rule was the right way.

Thank you very much and please feel free to ask me about the cache tier 
process or if you need some help.


Best regards,
Malte

On 18.10.23 11:50, Eugen Block wrote:

Hi,


So now we need to empty these OSDs.

The device class was SSD. I changed it to HDD and moved the OSDs 
inside the Crush tree to the other HDD OSDs of the host.
I need to move the PGs away from the OSDs to other OSDs but I do not 
know how to do it.


your crush rule doesn't specify a device class so moving them around 
doesn't really help (as you already noticed). Are other pools using that 
crush rule? You can see the applied rule IDs in 'ceph osd pool ls 
detail' output. If other pools use the same rule, make sure the cluster 
can handle data movement if you change it. Test the modified rule with 
crushtool first, and maybe in your test cluster as well.
If you add a device class statement (step take default class hdd) to the 
rule the SSDs will be drained automatically, at least they should.
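
A rough sketch of how such a rule change could be tested offline with
crushtool before injecting it (file names are illustrative, rule id 8 taken
from the rule dump quoted below):

$ ceph osd getcrushmap -o crushmap.bin
$ crushtool -d crushmap.bin -o crushmap.txt
  (edit crushmap.txt: add the device class to the "step take" line of rbd-meta)
$ crushtool -c crushmap.txt -o crushmap.new
$ crushtool -i crushmap.new --test --rule 8 --num-rep 3 --show-mappings
$ ceph osd setcrushmap -i crushmap.new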


But before you change anything I just want to make sure I understand
correctly:


- Your cache tier was on SSDs which need to be removed.
- Cache tier was removed successfully.
- But since the rbd-meta pool is not aware of device classes it used the 
same SSDs.


Don't change the root bmeta, only the crush rule "rbd-meta", here's an 
example from a replicated pool:


# rules
rule replicated_ruleset {
     id 0
     type replicated
     min_size 1
     max_size 10
     step take default class hdd
     step chooseleaf firstn 0 type host
     step emit
}


Zitat von Malte Stroem :


Hello Eugen,

I was wrong. I am sorry.

The PGs are not empty and orphaned.

Most of the PGs are empty but a few are indeed used.

And the pool for these PGs is still there. It is the metadata pool of 
the erasure coded pool for RBDs. The cache tier pool was removed 
successfully.


So now we need to empty these OSDs.

The device class was SSD. I changed it to HDD and moved the OSDs 
inside the Crush tree to the other HDD OSDs of the host.


I need to move the PGs away from the OSDs to other OSDs but I do not 
know how to do it.


Is using pg-upmap the solution?

Is using the objectstore-tool the solution?

Is moving the OSDs inside Crush to the right place the solution?

Is migrating the metadata pool to another with another crush rule the 
solution?


The crush rule of this metadata pool looks like this:

{
    "rule_id": 8,
    "rule_name": "rbd-meta",
    "ruleset": 6,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
    {
    "op": "take",
    "item": -4,
    "item_name": "bmeta"
    },
    {
    "op": "chooseleaf_firstn",
    "num": 0,
    "type": "host"
    },
    {
    "op": "emit"
    }
    ]
}

When stopping one of the OSDs the status gets degraded.

How to move PGs away from the OSDs?

How to let the pool use other OSDs?

Changing the crush rule?

Best,
Malte

Am 05.10.23 um 11:35 schrieb Malte Stroem:

Hello Eugen, Hello Joachim,

@Joachim: Interesting! And you got empty PGs, too? How did you solve 
the problem?


@Eugen: This is one of our biggest clusters and we're in the process 
to migrate from Nautilus to Octopus and to migrate from CentOS to 
Ubuntu.


The cache tier pool's OSDs were still version 14 OSDs. Most of the 
other OSDs are version 15 already.


So I tested the command:

ceph-objectstore-tool --data-path /path/to/osd --op remove --pgid 3.0 
--force


in a test cluster environment and this worked fine.

But the test scenario was not similar to our productive environment 
and the PG wasn't empty.


I did not find a way to emulate the same situation in the test 
scenario, yet.


Best,
Malte

Am 05.10.23 um 11:03 schrieb Eugen Block:
I know, I know... but since we are already using it (for years) I 
have to check how to remove it safely, maybe as long as we're on 
Pacific. ;-)


Zitat von Joachim Kraftmayer - ceph ambassador 
:



@Eugen

We have seen the same problems 8 years ago. I can only recommend 
never to use cache tiering in production.
At Cephalocon this was part of my talk and as far as I remember 
cache tiering will also disappear from ceph soon.


Cache tiering has been deprecated in the Reef release as it has 
lacked a maintainer for a very long time. This does not mean it 
will be certainly removed, but we may choose to remove it without 
much further 

[ceph-users] Re: stuck MDS warning: Client HOST failing to respond to cache pressure

2023-10-19 Thread Frank Schilder
I just tried sending SIGSTOP and SIGCONT to see what they do. After stopping
the process, 3 caps were returned. After resuming the process, these 3 caps
were allocated again. There seems to be a large number of stale caps that are
not released. While the process was stopped, the kworker thread continued to
show 2% CPU usage even though there was no file IO going on.
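
For reference, the pause/resume above is plain job control, along the lines
of (PID is illustrative):

$ kill -STOP 12345   # pause the process
$ kill -CONT 12345   # resume it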

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Frank Schilder 
Sent: Thursday, October 19, 2023 10:02 AM
To: Stefan Kooman; ceph-users@ceph.io
Subject: [ceph-users] Re: stuck MDS warning: Client HOST failing to respond to 
cache pressure

Hi Stefan,

the jobs ended and the warning disappeared as expected. However, a new job 
started and the warning showed up again. There is something very strange going 
on and, maybe, you can help out here:

We have a low client CAPS limit configured for performance reasons:

# ceph config dump | grep client
[...]
  mds   advanced  mds_max_caps_per_client   65536

The job in question holds more than that:

# ceph tell mds.0 session ls | jq -c '[.[] | {id: .id, h: 
.client_metadata.hostname, addr: .inst, fs: .client_metadata.root, caps: 
.num_caps, req: .request_load_avg}]|sort_by(.caps)|.[]' | tail
[...]
{"id":172249397,"h":"sn272...","addr":"client.172249397 
v1:192.168.57.143:0/195146548","fs":"/hpc/home","caps":105417,"req":1442}

This CAPS allocation is stable over time, the number doesn't change (I queried 
multiple times with several minutes interval). My guess is that the MDS message 
is not about cache pressure but rather about caps trimming. We do have clients 
that regularly exceed the limit though without MDS warnings. My guess is that 
these return at least some CAPS on request and are, therefore, not flagged. The 
client above seems to sit on a fixed set of CAPS that doesn't change and this 
causes the warning to show up.

The strange thing now is that very few files (on ceph fs) are actually open on 
the client:

[USER@sn272 ~]$ lsof -u USER | grep -e /home -e /groups -e /apps | wc -l
170

The kworker thread is at about 3% CPU and should be able to release CAPS. I'm 
wondering why it doesn't happen though. I also don't believe that 170 open 
files can allocate 105417 client caps.
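
One way to cross-check from the client side (assuming the kernel client and a
mounted debugfs; the exact layout varies with kernel version) is the client's
debugfs state, which lists the caps currently held:

# ls /sys/kernel/debug/ceph/
# wc -l /sys/kernel/debug/ceph/<fsid>.client<id>/caps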

Questions:

- Why does the client have so many caps allocated? Is there another way than 
open files that requires allocations?
- Is there a way to find out what these caps are for?
- We will look at the code (it's python+miniconda), any pointers on what to
look for?

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Frank Schilder 
Sent: Tuesday, October 17, 2023 11:27 AM
To: Stefan Kooman; ceph-users@ceph.io
Subject: [ceph-users] Re: stuck MDS warning: Client HOST failing to respond to 
cache pressure

Hi Stefan,

Probably. It's 2 compute nodes and there are jobs running. Our epilogue script
will drop the caches, at which point I indeed expect the warning to disappear.
We have no time limit on these nodes though, so this can be a while. I was
hoping there was an alternative to that, say, a user-level command that I could
execute on the client without possibly affecting other users' jobs.

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Stefan Kooman 
Sent: Tuesday, October 17, 2023 11:13 AM
To: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] stuck MDS warning: Client HOST failing to respond to 
cache pressure

On 17-10-2023 09:22, Frank Schilder wrote:
> Hi all,
>
> I'm affected by a stuck MDS warning for 2 clients: "failing to respond to 
> cache pressure". This is a false alarm as no MDS is under any cache pressure. 
> The warning is stuck already for a couple of days. I found some old threads 
> about cases where the MDS does not update flags/triggers for this warning in 
> certain situations. Dating back to luminous and I'm probably hitting one of 
> these.
>
> In these threads I could find a lot except for instructions for how to clear 
> this out in a nice way. Is there something I can do on the clients to clear 
> this warning? I don't want to evict/reboot just because of that.

echo 2 > /proc/sys/vm/drop_caches on the clients  does that help?

Gr. Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.14: pgmap updated every few seconds for no apparent reason

2023-10-19 Thread Zakhar Kirpichenko
Thanks, Eugen. This is a useful setting.

/Z

On Thu, 19 Oct 2023 at 10:43, Eugen Block  wrote:

> Hi,
>
> you can change the report interval with this config option (default 2
> seconds):
>
> $ ceph config get mgr mgr_tick_period
> 2
>
> $ ceph config set mgr mgr_tick_period 10
>
> Regards,
> Eugen
>
> Zitat von Chris Palmer :
>
> > I have just checked 2 quincy 17.2.6 clusters, and I see exactly the
> > same. The pgmap version is bumping every two seconds (which ties in
> > with the frequency you observed). Both clusters are healthy with
> > nothing apart from client IO happening.
> >
> > On 13/10/2023 12:09, Zakhar Kirpichenko wrote:
> >> Hi,
> >>
> >> I am investigating excessive mon writes in our cluster and wondering
> >> whether excessive pgmap updates could be the culprit. Basically pgmap is
> >> updated every few seconds, sometimes over ten times per minute, in a
> >> healthy cluster with no OSD and/or PG changes:
> >>
> >> Oct 13 11:03:03 ceph03 bash[4019]: cluster
> 2023-10-13T11:03:01.515438+
> >> mgr.ceph01.vankui (mgr.336635131) 838252 : cluster [DBG] pgmap v606575:
> >> 2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB
> data, 61
> >> TiB used, 716 TiB / 777 TiB avail; 60 MiB/s rd, 109 MiB/s wr, 5.65k op/s
> >> Oct 13 11:03:04 ceph03 bash[4019]: cluster
> 2023-10-13T11:03:03.520953+
> >> mgr.ceph01.vankui (mgr.336635131) 838253 : cluster [DBG] pgmap v606576:
> >> 2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB
> data, 61
> >> TiB used, 716 TiB / 777 TiB avail; 64 MiB/s rd, 128 MiB/s wr, 5.76k op/s
> >> Oct 13 11:03:06 ceph03 bash[4019]: cluster
> 2023-10-13T11:03:05.524474+
> >> mgr.ceph01.vankui (mgr.336635131) 838255 : cluster [DBG] pgmap v606577:
> >> 2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB
> data, 61
> >> TiB used, 716 TiB / 777 TiB avail; 64 MiB/s rd, 122 MiB/s wr, 5.57k op/s
> >> Oct 13 11:03:08 ceph03 bash[4019]: cluster
> 2023-10-13T11:03:07.530484+
> >> mgr.ceph01.vankui (mgr.336635131) 838256 : cluster [DBG] pgmap v606578:
> >> 2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB
> data, 61
> >> TiB used, 716 TiB / 777 TiB avail; 79 MiB/s rd, 127 MiB/s wr, 6.62k op/s
> >> Oct 13 11:03:10 ceph03 bash[4019]: cluster
> 2023-10-13T11:03:09.57+
> >> mgr.ceph01.vankui (mgr.336635131) 838258 : cluster [DBG] pgmap v606579:
> >> 2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB
> data, 61
> >> TiB used, 716 TiB / 777 TiB avail; 66 MiB/s rd, 104 MiB/s wr, 5.38k op/s
> >> Oct 13 11:03:12 ceph03 bash[4019]: cluster
> 2023-10-13T11:03:11.537908+
> >> mgr.ceph01.vankui (mgr.336635131) 838259 : cluster [DBG] pgmap v606580:
> >> 2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB
> data, 61
> >> TiB used, 716 TiB / 777 TiB avail; 85 MiB/s rd, 121 MiB/s wr, 6.43k op/s
> >> Oct 13 11:03:13 ceph03 bash[4019]: cluster
> 2023-10-13T11:03:13.543490+
> >> mgr.ceph01.vankui (mgr.336635131) 838260 : cluster [DBG] pgmap v606581:
> >> 2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB
> data, 61
> >> TiB used, 716 TiB / 777 TiB avail; 78 MiB/s rd, 127 MiB/s wr, 6.54k op/s
> >> Oct 13 11:03:16 ceph03 bash[4019]: cluster
> 2023-10-13T11:03:15.547122+
> >> mgr.ceph01.vankui (mgr.336635131) 838262 : cluster [DBG] pgmap v606582:
> >> 2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB
> data, 61
> >> TiB used, 716 TiB / 777 TiB avail; 71 MiB/s rd, 122 MiB/s wr, 6.08k op/s
> >> Oct 13 11:03:18 ceph03 bash[4019]: cluster
> 2023-10-13T11:03:17.553180+
> >> mgr.ceph01.vankui (mgr.336635131) 838263 : cluster [DBG] pgmap v606583:
> >> 2400 pgs: 1 active+clean+scrubbing, 5 active+clean+scrubbing+deep, 2394
> >> active+clean; 16 TiB data, 61 TiB used, 716 TiB / 777 TiB avail; 75
> MiB/s
> >> rd, 176 MiB/s wr, 6.83k op/s
> >> Oct 13 11:03:20 ceph03 bash[4019]: cluster
> 2023-10-13T11:03:19.555960+
> >> mgr.ceph01.vankui (mgr.336635131) 838264 : cluster [DBG] pgmap v606584:
> >> 2400 pgs: 1 active+clean+scrubbing, 5 active+clean+scrubbing+deep, 2394
> >> active+clean; 16 TiB data, 61 TiB used, 716 TiB / 777 TiB avail; 58
> MiB/s
> >> rd, 161 MiB/s wr, 5.55k op/s
> >> Oct 13 11:03:22 ceph03 bash[4019]: cluster
> 2023-10-13T11:03:21.560597+
> >> mgr.ceph01.vankui (mgr.336635131) 838266 : cluster [DBG] pgmap v606585:
> >> 2400 pgs: 1 active+clean+scrubbing, 5 active+clean+scrubbing+deep, 2394
> >> active+clean; 16 TiB data, 61 TiB used, 716 TiB / 777 TiB avail; 62
> MiB/s
> >> rd, 221 MiB/s wr, 6.19k op/s
> >> Oct 13 11:03:24 ceph03 bash[4019]: cluster
> 2023-10-13T11:03:23.565974+
> >> mgr.ceph01.vankui (mgr.336635131) 838267 : cluster [DBG] pgmap v606586:
> >> 2400 pgs: 1 active+clean+scrubbing, 5 active+clean+scrubbing+deep, 2394
> >> active+clean; 16 TiB data, 61 TiB used, 716 TiB / 777 TiB avail; 50
> MiB/s
> >> rd, 246 MiB/s wr, 5.93k op/s
> >> Oct 13 11:03:26 ceph03 bash[4019]: cluster
> 2023-10-13T11:03:25.569471+
> >> 

[ceph-users] Re: stuck MDS warning: Client HOST failing to respond to cache pressure

2023-10-19 Thread Frank Schilder
Hi Stefan,

the jobs ended and the warning disappeared as expected. However, a new job 
started and the warning showed up again. There is something very strange going 
on and, maybe, you can help out here:

We have a low client CAPS limit configured for performance reasons:

# ceph config dump | grep client
[...]
  mds   advanced  mds_max_caps_per_client   65536

The job in question holds more than that:

# ceph tell mds.0 session ls | jq -c '[.[] | {id: .id, h: 
.client_metadata.hostname, addr: .inst, fs: .client_metadata.root, caps: 
.num_caps, req: .request_load_avg}]|sort_by(.caps)|.[]' | tail
[...]
{"id":172249397,"h":"sn272...","addr":"client.172249397 
v1:192.168.57.143:0/195146548","fs":"/hpc/home","caps":105417,"req":1442}

This CAPS allocation is stable over time, the number doesn't change (I queried 
multiple times with several minutes interval). My guess is that the MDS message 
is not about cache pressure but rather about caps trimming. We do have clients 
that regularly exceed the limit though without MDS warnings. My guess is that 
these return at least some CAPS on request and are, therefore, not flagged. The 
client above seems to sit on a fixed set of CAPS that doesn't change and this 
causes the warning to show up.

The strange thing now is that very few files (on ceph fs) are actually open on 
the client:

[USER@sn272 ~]$ lsof -u USER | grep -e /home -e /groups -e /apps | wc -l
170

The kworker thread is at about 3% CPU and should be able to release CAPS. I'm 
wondering why it doesn't happen though. I also don't believe that 170 open 
files can allocate 105417 client caps.

Questions:

- Why does the client have so many caps allocated? Is there another way than 
open files that requires allocations?
- Is there a way to find out what these caps are for?
- We will look at the code (it's python+miniconda), any pointers on what to
look for?

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Frank Schilder 
Sent: Tuesday, October 17, 2023 11:27 AM
To: Stefan Kooman; ceph-users@ceph.io
Subject: [ceph-users] Re: stuck MDS warning: Client HOST failing to respond to 
cache pressure

Hi Stefan,

Probably. It's 2 compute nodes and there are jobs running. Our epilogue script
will drop the caches, at which point I indeed expect the warning to disappear.
We have no time limit on these nodes though, so this can be a while. I was
hoping there was an alternative to that, say, a user-level command that I could
execute on the client without possibly affecting other users' jobs.

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Stefan Kooman 
Sent: Tuesday, October 17, 2023 11:13 AM
To: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] stuck MDS warning: Client HOST failing to respond to 
cache pressure

On 17-10-2023 09:22, Frank Schilder wrote:
> Hi all,
>
> I'm affected by a stuck MDS warning for 2 clients: "failing to respond to 
> cache pressure". This is a false alarm as no MDS is under any cache pressure. 
> The warning is stuck already for a couple of days. I found some old threads 
> about cases where the MDS does not update flags/triggers for this warning in 
> certain situations. Dating back to luminous and I'm probably hitting one of 
> these.
>
> In these threads I could find a lot except for instructions for how to clear 
> this out in a nice way. Is there something I can do on the clients to clear 
> this warning? I don't want to evict/reboot just because of that.

echo 2 > /proc/sys/vm/drop_caches on the clients  does that help?

Gr. Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.14: pgmap updated every few seconds for no apparent reason

2023-10-19 Thread Eugen Block

Hi,

you can change the report interval with this config option (default 2  
seconds):


$ ceph config get mgr mgr_tick_period
2

$ ceph config set mgr mgr_tick_period 10

Regards,
Eugen

Zitat von Chris Palmer :

I have just checked 2 quincy 17.2.6 clusters, and I see exactly the  
same. The pgmap version is bumping every two seconds (which ties in  
with the frequency you observed). Both clusters are healthy with  
nothing apart from client IO happening.


On 13/10/2023 12:09, Zakhar Kirpichenko wrote:

Hi,

I am investigating excessive mon writes in our cluster and wondering
whether excessive pgmap updates could be the culprit. Basically pgmap is
updated every few seconds, sometimes over ten times per minute, in a
healthy cluster with no OSD and/or PG changes:

Oct 13 11:03:03 ceph03 bash[4019]: cluster 2023-10-13T11:03:01.515438+
mgr.ceph01.vankui (mgr.336635131) 838252 : cluster [DBG] pgmap v606575:
2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB data, 61
TiB used, 716 TiB / 777 TiB avail; 60 MiB/s rd, 109 MiB/s wr, 5.65k op/s
Oct 13 11:03:04 ceph03 bash[4019]: cluster 2023-10-13T11:03:03.520953+
mgr.ceph01.vankui (mgr.336635131) 838253 : cluster [DBG] pgmap v606576:
2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB data, 61
TiB used, 716 TiB / 777 TiB avail; 64 MiB/s rd, 128 MiB/s wr, 5.76k op/s
Oct 13 11:03:06 ceph03 bash[4019]: cluster 2023-10-13T11:03:05.524474+
mgr.ceph01.vankui (mgr.336635131) 838255 : cluster [DBG] pgmap v606577:
2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB data, 61
TiB used, 716 TiB / 777 TiB avail; 64 MiB/s rd, 122 MiB/s wr, 5.57k op/s
Oct 13 11:03:08 ceph03 bash[4019]: cluster 2023-10-13T11:03:07.530484+
mgr.ceph01.vankui (mgr.336635131) 838256 : cluster [DBG] pgmap v606578:
2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB data, 61
TiB used, 716 TiB / 777 TiB avail; 79 MiB/s rd, 127 MiB/s wr, 6.62k op/s
Oct 13 11:03:10 ceph03 bash[4019]: cluster 2023-10-13T11:03:09.57+
mgr.ceph01.vankui (mgr.336635131) 838258 : cluster [DBG] pgmap v606579:
2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB data, 61
TiB used, 716 TiB / 777 TiB avail; 66 MiB/s rd, 104 MiB/s wr, 5.38k op/s
Oct 13 11:03:12 ceph03 bash[4019]: cluster 2023-10-13T11:03:11.537908+
mgr.ceph01.vankui (mgr.336635131) 838259 : cluster [DBG] pgmap v606580:
2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB data, 61
TiB used, 716 TiB / 777 TiB avail; 85 MiB/s rd, 121 MiB/s wr, 6.43k op/s
Oct 13 11:03:13 ceph03 bash[4019]: cluster 2023-10-13T11:03:13.543490+
mgr.ceph01.vankui (mgr.336635131) 838260 : cluster [DBG] pgmap v606581:
2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB data, 61
TiB used, 716 TiB / 777 TiB avail; 78 MiB/s rd, 127 MiB/s wr, 6.54k op/s
Oct 13 11:03:16 ceph03 bash[4019]: cluster 2023-10-13T11:03:15.547122+
mgr.ceph01.vankui (mgr.336635131) 838262 : cluster [DBG] pgmap v606582:
2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB data, 61
TiB used, 716 TiB / 777 TiB avail; 71 MiB/s rd, 122 MiB/s wr, 6.08k op/s
Oct 13 11:03:18 ceph03 bash[4019]: cluster 2023-10-13T11:03:17.553180+
mgr.ceph01.vankui (mgr.336635131) 838263 : cluster [DBG] pgmap v606583:
2400 pgs: 1 active+clean+scrubbing, 5 active+clean+scrubbing+deep, 2394
active+clean; 16 TiB data, 61 TiB used, 716 TiB / 777 TiB avail; 75 MiB/s
rd, 176 MiB/s wr, 6.83k op/s
Oct 13 11:03:20 ceph03 bash[4019]: cluster 2023-10-13T11:03:19.555960+
mgr.ceph01.vankui (mgr.336635131) 838264 : cluster [DBG] pgmap v606584:
2400 pgs: 1 active+clean+scrubbing, 5 active+clean+scrubbing+deep, 2394
active+clean; 16 TiB data, 61 TiB used, 716 TiB / 777 TiB avail; 58 MiB/s
rd, 161 MiB/s wr, 5.55k op/s
Oct 13 11:03:22 ceph03 bash[4019]: cluster 2023-10-13T11:03:21.560597+
mgr.ceph01.vankui (mgr.336635131) 838266 : cluster [DBG] pgmap v606585:
2400 pgs: 1 active+clean+scrubbing, 5 active+clean+scrubbing+deep, 2394
active+clean; 16 TiB data, 61 TiB used, 716 TiB / 777 TiB avail; 62 MiB/s
rd, 221 MiB/s wr, 6.19k op/s
Oct 13 11:03:24 ceph03 bash[4019]: cluster 2023-10-13T11:03:23.565974+
mgr.ceph01.vankui (mgr.336635131) 838267 : cluster [DBG] pgmap v606586:
2400 pgs: 1 active+clean+scrubbing, 5 active+clean+scrubbing+deep, 2394
active+clean; 16 TiB data, 61 TiB used, 716 TiB / 777 TiB avail; 50 MiB/s
rd, 246 MiB/s wr, 5.93k op/s
Oct 13 11:03:26 ceph03 bash[4019]: cluster 2023-10-13T11:03:25.569471+
mgr.ceph01.vankui (mgr.336635131) 838269 : cluster [DBG] pgmap v606587:
2400 pgs: 1 active+clean+scrubbing, 5 active+clean+scrubbing+deep, 2394
active+clean; 16 TiB data, 61 TiB used, 716 TiB / 777 TiB avail; 41 MiB/s
rd, 240 MiB/s wr, 4.99k op/s
Oct 13 11:03:28 ceph03 bash[4019]: cluster 2023-10-13T11:03:27.575618+
mgr.ceph01.vankui (mgr.336635131) 838270 : cluster [DBG] pgmap v606588:
2400 pgs: 4 active+clean+scrubbing+deep, 2396 active+clean; 16 TiB data, 61
TiB used, 716 TiB / 

[ceph-users] Re: How to deal with increasing HDD sizes ? 1 OSD for 2 LVM-packed HDDs ?

2023-10-19 Thread Marc
> 
> The question here is a rather simple one:
> when you add to an existing Ceph cluster a new node whose disks (12TB) are
> twice the size of the existing disks (6TB), how do you let Ceph evenly
> distribute the data across all disks?

Ceph already does this. If you have 6TB + 12TB disks in one node, you will see
that the 12TB disks get more data (at least on Nautilus).
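
This works because an OSD's default CRUSH weight is derived from its capacity
(roughly its size in TiB), so a 12TB OSD gets about twice the weight of a 6TB
one and receives proportionally more PGs. A quick way to check the weights and
per-OSD utilization:

$ ceph osd df tree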

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io