[ceph-users] Re: quincy v17.2.7 QE Validation status

2023-10-18 Thread Venky Shankar
On Tue, Oct 17, 2023 at 12:23 AM Yuri Weinstein  wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/63219#note-2
> Release Notes - TBD
>
> Issue https://tracker.ceph.com/issues/63192 appears to be failing several 
> runs.
> Should it be fixed for this release?
>
> Seeking approvals/reviews for:
>
> smoke - Laura

There's one failure in the smoke tests


https://pulpito.ceph.com/yuriw-2023-10-18_14:58:31-smoke-quincy-release-distro-default-smithi/

caused by

https://github.com/ceph/ceph/pull/53647

(which was marked DNM but got merged). However, it's a test case thing
and we can live with it.

Yuri mentioned in Slack that he might do another round of builds/tests,
so, Yuri, here's the reverted change:

   https://github.com/ceph/ceph/pull/54085

> rados - Laura, Radek, Travis, Ernesto, Adam King
>
> rgw - Casey
> fs - Venky
> orch - Adam King
>
> rbd - Ilya
> krbd - Ilya
>
> upgrade/quincy-p2p - Known issue IIRC, Casey pls confirm/approve
>
> client-upgrade-quincy-reef - Laura
>
> powercycle - Brad pls confirm
>
> ceph-volume - Guillaume pls take a look
>
> Please reply to this email with approval and/or trackers of known
> issues/PRs to address them.
>
> Josh, Neha - gibba and LRC upgrades -- N/A for quincy now after reef release.
>
> Thx
> YuriW
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Cheers,
Venky
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to deal with increasing HDD sizes ? 1 OSD for 2 LVM-packed HDDs ?

2023-10-18 Thread Renaud Jean Christophe Miel
Thank you for your feedback.

We have a failure domain of "node".

The question here is a rather simple one:
when you add to an existing Ceph cluster a new node whose disks (12TB) are twice
the size of the existing disks (6TB), how do you let Ceph distribute the data
evenly across all disks?

You mentioned CRUSH: would the creation of a new "12TB-virtual-disk" CRUSH
hierarchy level do the trick?

At this level you would either pick one 12TB HDD on the new node, or a pair of
6TB HDDs on an old node.

Has anyone already experimented with this kind of CRUSH hierarchy?

Regards,

Renaud Miel
NAOJ




From: Anthony D'Atri 
Sent: Thursday, October 19, 2023 00:59
To: Robert Sander 
Cc: ceph-users@ceph.io 
Subject: [ceph-users] Re: How to deal with increasing HDD sizes ? 1 OSD for 2 
LVM-packed HDDs ?

This is one of many reasons for not using HDDs ;)

One nuance that is easily overlooked is the CRUSH weight of failure domains.

If, say, you have a failure domain of "rack" with size=3 replicated pools and
3 CRUSH racks, and you add the new, larger OSDs to only one rack, you will not
increase the cluster's capacity.

If in this scenario you add them as a fourth rack, this is mostly obviated.  
Another strategy is to add them uniformly to the existing racks.

The larger OSDs will get more PGs as Herr Sander touches upon.
* The higher IO can be somewhat ameliorated by adjusting primary-affinity 
values to favor the smaller OSDs being primaries for given PGs.
* The larger OSDs will have an increased risk of running into the 
mon_max_pg_per_osd limit, especially when an OSD or host fails.  Ensure that 
this setting is high enough to avoid this, suggest 500 as a value.

> On Oct 18, 2023, at 04:05, Robert Sander  wrote:
>
> On 10/18/23 09:25, Renaud Jean Christophe Miel wrote:
>> Hi,
>> Use case:
>> * Ceph cluster with old nodes having 6TB HDDs
>> * Add new node with new 12TB HDDs
>> Is it supported/recommended to pack 2 6TB HDDs handled by 2 old OSDs
>> into 1 12TB LVM disk handled by 1 new OSD ?
>
> The 12 TB HDD will get double the IO of one of the 6 TB HDDs.
> But it will still only be able to handle about 120 IOPS.
> This makes the newer larger HDDs a bottleneck when run in the same pool.
>
> If you are not planning to decommission the smaller HDDs it is recommended to 
> use the larger ones in a separate pool for performance reasons.
>
> Regards
> --
> Robert Sander
> Heinlein Consulting GmbH
> Schwedter Str. 8/9b, 10119 Berlin
>
> https://www.heinlein-support.de
>
> Tel: 030 / 405051-43
> Fax: 030 / 405051-19
>
> Amtsgericht Berlin-Charlottenburg - HRB 220009 B
> Geschäftsführer: Peer Heinlein - Sitz: Berlin
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy v17.2.7 QE Validation status

2023-10-18 Thread Yuri Weinstein
Ok I merged all PRs known to me.

If I hear no objections I will start the build

(Casey FYI -> and will run quincy-p2p in parallel)

On Wed, Oct 18, 2023 at 11:44 AM Yuri Weinstein  wrote:
>
> Per our chat with Casey, we will remove s3tests and include
> https://github.com/ceph/ceph/pull/54078 into 17.2.7
>
> On Wed, Oct 18, 2023 at 9:30 AM Casey Bodley  wrote:
> >
> > On Mon, Oct 16, 2023 at 2:52 PM Yuri Weinstein  wrote:
> > >
> > > Details of this release are summarized here:
> > >
> > > https://tracker.ceph.com/issues/63219#note-2
> > > Release Notes - TBD
> > >
> > > Issue https://tracker.ceph.com/issues/63192 appears to be failing several 
> > > runs.
> > > Should it be fixed for this release?
> > >
> > > Seeking approvals/reviews for:
> > >
> > > smoke - Laura
> > > rados - Laura, Radek, Travis, Ernesto, Adam King
> > >
> > > rgw - Casey
> > > fs - Venky
> > > orch - Adam King
> > >
> > > rbd - Ilya
> > > krbd - Ilya
> > >
> > > upgrade/quincy-p2p - Known issue IIRC, Casey pls confirm/approve
> >
> > sorry, missed this part
> >
> > these point-to-point upgrade tests are failing because they're running
> > s3-tests against older quincy releases that don't have fixes for the
> > bugs they're testing. we don't maintain separate tests for each point
> > release, so we can't expect these upgrade tests to pass in general
> >
> > specifically:
> > test_post_object_wrong_bucket is failing because it requires the
> > 17.2.7 fix from https://github.com/ceph/ceph/pull/53757
> > test_set_bucket_tagging is failing because it requires the 17.2.7 fix
> > from https://github.com/ceph/ceph/pull/50103
> >
> > so the rgw failures are expected, but i can't tell whether they're
> > masking other important upgrade test coverage
> >
> > >
> > > client-upgrade-quincy-reef - Laura
> > >
> > > powercycle - Brad pls confirm
> > >
> > > ceph-volume - Guillaume pls take a look
> > >
> > > Please reply to this email with approval and/or trackers of known
> > > issues/PRs to address them.
> > >
> > > Josh, Neha - gibba and LRC upgrades -- N/A for quincy now after reef 
> > > release.
> > >
> > > Thx
> > > YuriW
> > > ___
> > > Dev mailing list -- d...@ceph.io
> > > To unsubscribe send an email to dev-le...@ceph.io
> > >
> >
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy v17.2.7 QE Validation status

2023-10-18 Thread Laura Flores
The upgrade-clients/client-upgrade-quincy-reef suite passed with Prashant’s
POOL_APP_NOT_ENABLED PR. Approved!

On Wed, Oct 18, 2023 at 1:45 PM Yuri Weinstein  wrote:

> Per our chat with Casey, we will remove s3tests and include
> https://github.com/ceph/ceph/pull/54078 into 17.2.7
>
> On Wed, Oct 18, 2023 at 9:30 AM Casey Bodley  wrote:
> >
> > On Mon, Oct 16, 2023 at 2:52 PM Yuri Weinstein 
> wrote:
> > >
> > > Details of this release are summarized here:
> > >
> > > https://tracker.ceph.com/issues/63219#note-2
> > > Release Notes - TBD
> > >
> > > Issue https://tracker.ceph.com/issues/63192 appears to be failing
> several runs.
> > > Should it be fixed for this release?
> > >
> > > Seeking approvals/reviews for:
> > >
> > > smoke - Laura
> > > rados - Laura, Radek, Travis, Ernesto, Adam King
> > >
> > > rgw - Casey
> > > fs - Venky
> > > orch - Adam King
> > >
> > > rbd - Ilya
> > > krbd - Ilya
> > >
> > > upgrade/quincy-p2p - Known issue IIRC, Casey pls confirm/approve
> >
> > sorry, missed this part
> >
> > these point-to-point upgrade tests are failing because they're running
> > s3-tests against older quincy releases that don't have fixes for the
> > bugs they're testing. we don't maintain separate tests for each point
> > release, so we can't expect these upgrade tests to pass in general
> >
> > specifically:
> > test_post_object_wrong_bucket is failing because it requires the
> > 17.2.7 fix from https://github.com/ceph/ceph/pull/53757
> > test_set_bucket_tagging is failing because it requires the 17.2.7 fix
> > from https://github.com/ceph/ceph/pull/50103
> >
> > so the rgw failures are expected, but i can't tell whether they're
> > masking other important upgrade test coverage
> >
> > >
> > > client-upgrade-quincy-reef - Laura
> > >
> > > powercycle - Brad pls confirm
> > >
> > > ceph-volume - Guillaume pls take a look
> > >
> > > Please reply to this email with approval and/or trackers of known
> > > issues/PRs to address them.
> > >
> > > Josh, Neha - gibba and LRC upgrades -- N/A for quincy now after reef
> release.
> > >
> > > Thx
> > > YuriW
> > > ___
> > > Dev mailing list -- d...@ceph.io
> > > To unsubscribe send an email to dev-le...@ceph.io
> > >
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy v17.2.7 QE Validation status

2023-10-18 Thread Yuri Weinstein
Per our chat with Casey, we will remove s3tests and include
https://github.com/ceph/ceph/pull/54078 into 17.2.7

On Wed, Oct 18, 2023 at 9:30 AM Casey Bodley  wrote:
>
> On Mon, Oct 16, 2023 at 2:52 PM Yuri Weinstein  wrote:
> >
> > Details of this release are summarized here:
> >
> > https://tracker.ceph.com/issues/63219#note-2
> > Release Notes - TBD
> >
> > Issue https://tracker.ceph.com/issues/63192 appears to be failing several 
> > runs.
> > Should it be fixed for this release?
> >
> > Seeking approvals/reviews for:
> >
> > smoke - Laura
> > rados - Laura, Radek, Travis, Ernesto, Adam King
> >
> > rgw - Casey
> > fs - Venky
> > orch - Adam King
> >
> > rbd - Ilya
> > krbd - Ilya
> >
> > upgrade/quincy-p2p - Known issue IIRC, Casey pls confirm/approve
>
> sorry, missed this part
>
> these point-to-point upgrade tests are failing because they're running
> s3-tests against older quincy releases that don't have fixes for the
> bugs they're testing. we don't maintain separate tests for each point
> release, so we can't expect these upgrade tests to pass in general
>
> specifically:
> test_post_object_wrong_bucket is failing because it requires the
> 17.2.7 fix from https://github.com/ceph/ceph/pull/53757
> test_set_bucket_tagging is failing because it requires the 17.2.7 fix
> from https://github.com/ceph/ceph/pull/50103
>
> so the rgw failures are expected, but i can't tell whether they're
> masking other important upgrade test coverage
>
> >
> > client-upgrade-quincy-reef - Laura
> >
> > powercycle - Brad pls confirm
> >
> > ceph-volume - Guillaume pls take a look
> >
> > Please reply to this email with approval and/or trackers of known
> > issues/PRs to address them.
> >
> > Josh, Neha - gibba and LRC upgrades -- N/A for quincy now after reef 
> > release.
> >
> > Thx
> > YuriW
> > ___
> > Dev mailing list -- d...@ceph.io
> > To unsubscribe send an email to dev-le...@ceph.io
> >
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy v17.2.7 QE Validation status

2023-10-18 Thread Casey Bodley
On Mon, Oct 16, 2023 at 2:52 PM Yuri Weinstein  wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/63219#note-2
> Release Notes - TBD
>
> Issue https://tracker.ceph.com/issues/63192 appears to be failing several 
> runs.
> Should it be fixed for this release?
>
> Seeking approvals/reviews for:
>
> smoke - Laura
> rados - Laura, Radek, Travis, Ernesto, Adam King
>
> rgw - Casey
> fs - Venky
> orch - Adam King
>
> rbd - Ilya
> krbd - Ilya
>
> upgrade/quincy-p2p - Known issue IIRC, Casey pls confirm/approve

sorry, missed this part

these point-to-point upgrade tests are failing because they're running
s3-tests against older quincy releases that don't have fixes for the
bugs they're testing. we don't maintain separate tests for each point
release, so we can't expect these upgrade tests to pass in general

specifically:
test_post_object_wrong_bucket is failing because it requires the
17.2.7 fix from https://github.com/ceph/ceph/pull/53757
test_set_bucket_tagging is failing because it requires the 17.2.7 fix
from https://github.com/ceph/ceph/pull/50103

so the rgw failures are expected, but i can't tell whether they're
masking other important upgrade test coverage

>
> client-upgrade-quincy-reef - Laura
>
> powercycle - Brad pls confirm
>
> ceph-volume - Guillaume pls take a look
>
> Please reply to this email with approval and/or trackers of known
> issues/PRs to address them.
>
> Josh, Neha - gibba and LRC upgrades -- N/A for quincy now after reef release.
>
> Thx
> YuriW
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to deal with increasing HDD sizes ? 1 OSD for 2 LVM-packed HDDs ?

2023-10-18 Thread Anthony D'Atri
This is one of many reasons for not using HDDs ;)

One nuance that is easily overlooked is the CRUSH weight of failure domains.

If, say, you have a failure domain of "rack" with size=3 replicated pools and
3 CRUSH racks, and you add the new, larger OSDs to only one rack, you will not
increase the cluster's capacity.

If in this scenario you add them as a fourth rack, this is mostly obviated.  
Another strategy is to add them uniformly to the existing racks.

The larger OSDs will get more PGs as Herr Sander touches upon.
* The higher IO can be somewhat ameliorated by adjusting primary-affinity 
values to favor the smaller OSDs being primaries for given PGs.
* The larger OSDs will have an increased risk of running into the 
mon_max_pg_per_osd limit, especially when an OSD or host fails.  Ensure that 
this setting is high enough to avoid this, suggest 500 as a value.
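A minimal sketch of both knobs (osd.42 and the value 500 are placeholders, adjust
for your own cluster):

# favor the smaller OSDs as primaries by lowering the affinity of a larger OSD
ceph osd primary-affinity osd.42 0.5

# check and raise the per-OSD PG limit cluster-wide
ceph config get mon mon_max_pg_per_osd
ceph config set global mon_max_pg_per_osd 500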

> On Oct 18, 2023, at 04:05, Robert Sander  wrote:
> 
> On 10/18/23 09:25, Renaud Jean Christophe Miel wrote:
>> Hi,
>> Use case:
>> * Ceph cluster with old nodes having 6TB HDDs
>> * Add new node with new 12TB HDDs
>> Is it supported/recommended to pack 2 6TB HDDs handled by 2 old OSDs
>> into 1 12TB LVM disk handled by 1 new OSD ?
> 
> The 12 TB HDD will get double the IO of one of the 6 TB HDDs.
> But it will still only be able to handle about 120 IOPS.
> This makes the newer larger HDDs a bottleneck when run in the same pool.
> 
> If you are not planning to decommission the smaller HDDs it is recommended to 
> use the larger ones in a separate pool for performance reasons.
> 
> Regards
> -- 
> Robert Sander
> Heinlein Consulting GmbH
> Schwedter Str. 8/9b, 10119 Berlin
> 
> https://www.heinlein-support.de
> 
> Tel: 030 / 405051-43
> Fax: 030 / 405051-19
> 
> Amtsgericht Berlin-Charlottenburg - HRB 220009 B
> Geschäftsführer: Peer Heinlein - Sitz: Berlin
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Join us for the User + Dev Meeting, happening tomorrow!

2023-10-18 Thread Laura Flores
Hi Ceph users and developers,

You are invited to join us at the User + Dev meeting tomorrow at 10:00 AM
EST! See below for more meeting details.

We have two guest speakers joining us tomorrow:

1. "CRUSH Changes at Scale" by Joshua Baergen, Digital Ocean
In this talk, Joshua Baergen will discuss the problems that operators
encounter with CRUSH changes at scale and how DigitalOcean built
pg-remapper to control and speed up CRUSH-induced backfill.

2. "CephFS Management with Ceph Dashboard" by Pedro Gonzalez Gomez, IBM
This talk will demonstrate new Dashboard behavior regarding CephFS
management.

The last part of the meeting will be dedicated to open discussion. Feel
free to add questions for the speakers or additional topics under the "Open
Discussion" section on the agenda:
https://pad.ceph.com/p/ceph-user-dev-monthly-minutes

If you have an idea for a focus topic you'd like to present at a future
meeting, you are welcome to submit it to this Google Form:
https://docs.google.com/forms/d/e/1FAIpQLSdboBhxVoBZoaHm8xSmeBoemuXoV_rmh4vJDGBrp6d-D3-BlQ/viewform?usp=sf_link
Any Ceph user or developer is eligible to submit!

Thanks,
Laura Flores

Meeting link: https://meet.jit.si/ceph-user-dev-monthly

Time conversions:
UTC:   Thursday, October 19, 14:00 UTC
Mountain View, CA, US: Thursday, October 19,  7:00 PDT
Phoenix, AZ, US:   Thursday, October 19,  7:00 MST
Denver, CO, US:Thursday, October 19,  8:00 MDT
Huntsville, AL, US:Thursday, October 19,  9:00 CDT
Raleigh, NC, US:   Thursday, October 19, 10:00 EDT
London, England:   Thursday, October 19, 15:00 BST
Paris, France: Thursday, October 19, 16:00 CEST
Helsinki, Finland: Thursday, October 19, 17:00 EEST
Tel Aviv, Israel:  Thursday, October 19, 17:00 IDT
Pune, India:   Thursday, October 19, 19:30 IST
Brisbane, Australia:   Friday, October 20,  0:00 AEST
Singapore, Asia:   Thursday, October 19, 22:00 +08
Auckland, New Zealand: Friday, October 20,  3:00 NZDT

-- 

Laura Flores

She/Her/Hers

Software Engineer, Ceph Storage 

Chicago, IL

lflo...@ibm.com | lflo...@redhat.com 
M: +17087388804
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy v17.2.7 QE Validation status

2023-10-18 Thread Laura Flores
@Prashant Dhange 
raised PR https://github.com/ceph/ceph/pull/54065 to help with
POOL_APP_NOT_ENABLED warnings in the smoke, rados, perf-basic, and
upgrade-clients/client-upgrade-quincy-reef suites.

The tracker has been updated with reruns including Prashant's PR. *Smoke,
rados, and perf-basic are approved.*

As for *upgrade-clients/client-upgrade-quincy-reef,* Yuri will rerun the
test after we merge Prashant's PR. Then we will approve this last suite.

Thanks,
Laura

On Wed, Oct 18, 2023 at 7:30 AM Guillaume Abrioux  wrote:

> Hi Yuri,
>
> ceph-volume approved  https://jenkins.ceph.com/job/ceph-volume-test/566/
>
> Regards,
>
> --
> Guillaume Abrioux
> Software Engineer
>
> From: Yuri Weinstein 
> Date: Monday, 16 October 2023 at 20:53
> To: dev , ceph-users 
> Subject: [EXTERNAL] [ceph-users] quincy v17.2.7 QE Validation status
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/63219#note-2
> Release Notes - TBD
>
> Issue https://tracker.ceph.com/issues/63192  appears to be failing
> several runs.
> Should it be fixed for this release?
>
> Seeking approvals/reviews for:
>
> smoke - Laura
> rados - Laura, Radek, Travis, Ernesto, Adam King
>
> rgw - Casey
> fs - Venky
> orch - Adam King
>
> rbd - Ilya
> krbd - Ilya
>
> upgrade/quincy-p2p - Known issue IIRC, Casey pls confirm/approve
>
> client-upgrade-quincy-reef - Laura
>
> powercycle - Brad pls confirm
>
> ceph-volume - Guillaume pls take a look
>
> Please reply to this email with approval and/or trackers of known
> issues/PRs to address them.
>
> Josh, Neha - gibba and LRC upgrades -- N/A for quincy now after reef
> release.
>
> Thx
> YuriW
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
> Unless otherwise stated above:
>
> Compagnie IBM France
> Siège Social : 17, avenue de l'Europe, 92275 Bois-Colombes Cedex
> RCS Nanterre 552 118 465
> Forme Sociale : S.A.S.
> Capital Social : 664 069 390,60 €
> SIRET : 552 118 465 03644 - Code NAF 6203Z
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 

Laura Flores

She/Her/Hers

Software Engineer, Ceph Storage 

Chicago, IL

lflo...@ibm.com | lflo...@redhat.com 
M: +17087388804
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to trigger scrubbing in Ceph on-demand ?

2023-10-18 Thread Reto Gysi
Hi

I haven't updated to reef yet. I've tried this on quincy.

# create a testfile on cephfs.rgysi.data pool
root@zephir:/home/rgysi/misc# echo cephtest123 > cephtest.txt

#list inode of new file
root@zephir:/home/rgysi/misc# ls -i cephtest.txt
1099518867574 cephtest.txt

# convert inode value to hex value
root@zephir:/home/rgysi/misc# printf "%x" 1099518867574
16e7876

# search for this value in the rados pool cephfs.rgysi.data, to find
object(s)
root@zephir:/home/rgysi/misc# rados -p cephfs.rgysi.data ls | grep
16e7876
16e7876.

# find pg for the object
root@zephir:/home/rgysi/misc# ceph osd map cephfs.rgysi.data
16e7876.
osdmap e105365 pool 'cephfs.rgysi.data' (25) object '16e7876.'
-> pg 25.ee1befa1 (25.1) -> up ([0,2,8], p0) acting ([0,2,8], p0)

#Initiate a deep-scrub for this pg
root@zephir:/home/rgysi/misc# ceph pg deep-scrub 25.1
instructing pg 25.1 on osd.0 to deep-scrub

# check status of scrubbing
root@zephir:/home/rgysi/misc# ceph pg ls scrubbing
PG:                   25.1
OBJECTS:              37774
DEGRADED:             0
MISPLACED:            0
UNFOUND:              0
BYTES:                62869823142
OMAP_BYTES*:          0
OMAP_KEYS*:           0
LOG:                  2402
STATE:                active+clean+scrubbing+deep
SINCE:                7s
VERSION:              105365'1178098
REPORTED:             105365:8066292
UP:                   [0,2,8]p0
ACTING:               [0,2,8]p0
SCRUB_STAMP:          2023-10-18T05:17:48.631392+
DEEP_SCRUB_STAMP:     2023-10-08T11:30:58.883164+
LAST_SCRUB_DURATION:  3
SCRUB_SCHEDULING:     deep scrubbing for 1s
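Once it completes, the refreshed deep-scrub stamp can be double-checked against
the same PG (25.1 here), for example:

# verify that last_deep_scrub_stamp was updated
ceph pg 25.1 query | grep -i deep_scrub_stamp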


Best Regards,

Reto

Am Mi., 18. Okt. 2023 um 16:24 Uhr schrieb Jayjeet Chakraborty <
jayje...@ucsc.edu>:

> Hi all,
>
> Just checking if someone had a chance to go through the scrub trigger issue
> above. Thanks.
>
> Best Regards,
> *Jayjeet Chakraborty*
> Ph.D. Student
> Department of Computer Science and Engineering
> University of California, Santa Cruz
> *Email: jayje...@ucsc.edu *
>
>
> On Mon, Oct 16, 2023 at 9:01 PM Jayjeet Chakraborty 
> wrote:
>
> > Hi all,
> >
> > I am trying to trigger deep scrubbing in Ceph reef (18.2.0) on demand on
> a
> > set of files that I randomly write to CephFS. I have tried both invoking
> > deep-scrub on CephFS using ceph tell and just deep scrubbing a
> > particular PG. Unfortunately, none of that seems to be working for me. I
> am
> > monitoring the ceph status output, it never shows any scrubbing
> > information. Can anyone please help me out on this ? In a nutshell, I
> need
> > Ceph to scrub for me anytime I want. I am using Ceph with default configs
> > for scrubbing. Thanks all.
> >
> > Best Regards,
> > *Jayjeet Chakraborty*
> > Ph.D. Student
> > Department of Computer Science and Engineering
> > University of California, Santa Cruz
> > *Email: jayje...@ucsc.edu *
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to trigger scrubbing in Ceph on-demand ?

2023-10-18 Thread Jayjeet Chakraborty
Hi all,

Just checking if someone had a chance to go through the scrub trigger issue
above. Thanks.

Best Regards,
*Jayjeet Chakraborty*
Ph.D. Student
Department of Computer Science and Engineering
University of California, Santa Cruz
*Email: jayje...@ucsc.edu *


On Mon, Oct 16, 2023 at 9:01 PM Jayjeet Chakraborty 
wrote:

> Hi all,
>
> I am trying to trigger deep scrubbing in Ceph reef (18.2.0) on demand on a
> set of files that I randomly write to CephFS. I have tried both invoking
> deep-scrub on CephFS using ceph tell and just deep scrubbing a
> particular PG. Unfortunately, none of that seems to be working for me. I am
> monitoring the ceph status output, it never shows any scrubbing
> information. Can anyone please help me out on this ? In a nutshell, I need
> Ceph to scrub for me anytime I want. I am using Ceph with default configs
> for scrubbing. Thanks all.
>
> Best Regards,
> *Jayjeet Chakraborty*
> Ph.D. Student
> Department of Computer Science and Engineering
> University of California, Santa Cruz
> *Email: jayje...@ucsc.edu *
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] How to confirm cache hit rate in ceph osd.

2023-10-18 Thread mitsu
Hi, 

I'd like to know the cache hit rate of the Ceph OSDs. I installed Prometheus and
Grafana, but there is no cache hit rate panel on the Grafana dashboards...
Does Ceph have a cache hit rate counter? I'd like to understand the impact on
READ performance of the Ceph cluster.
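For what it's worth, BlueStore OSDs expose onode cache hit/miss counters in their
perf counters; assuming a BlueStore OSD and access to its admin socket on the host
(or container) running it, something like this should show them (osd.0 is just an
example, and counter names can vary slightly between releases):

# sample the BlueStore onode cache hit/miss counters of one OSD
ceph daemon osd.0 perf dump | grep -E 'onode_hits|onode_misses'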

Regards,
--
Mitsumasa KONDO
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to deal with increasing HDD sizes ? 1 OSD for 2 LVM-packed HDDs ?

2023-10-18 Thread Peter Grandi
> * Ceph cluster with old nodes having 6TB HDDs
> * Add new node with new 12TB HDDs

Halving IOPS-per-TB?

https://www.sabi.co.uk/blog/17-one.html?170610#170610
https://www.sabi.co.uk/blog/15-one.html?150329#150329

> Is it supported/recommended to pack 2 6TB HDDs handled by 2
> old OSDs into 1 12TB LVM disk handled by 1 new OSD ?

The OSDs are just random daemons, what matters to chunk
distribution in Ceph is buckets, and in this case leaf buckets.

So it all depends on the CRUSH map but I suspect that
manipulating it so that two existing leaf buckets become one is
not possible or too tricky to attempt.

One option would be to divide the 12TB disk into 2 partitions/LVs of 6TB
and run 2 OSDs against them. It is not recommended, but I don't see a big
issue in this case other than IOPS-per-TB.
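A rough sketch of that layout, assuming the 12TB drive shows up as /dev/sdx and
with arbitrary VG/LV names:

# carve the 12TB drive into two ~6TB logical volumes
pvcreate /dev/sdx
vgcreate ceph-sdx /dev/sdx
lvcreate -L 6T -n osd-a ceph-sdx
lvcreate -l 100%FREE -n osd-b ceph-sdx

# create one OSD per logical volume
ceph-volume lvm create --data ceph-sdx/osd-a
ceph-volume lvm create --data ceph-sdx/osd-b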
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-18 Thread Zakhar Kirpichenko
Frank,

The only changes in ceph.conf are just the compression settings, most of
the cluster configuration is in the monitor database thus my ceph.conf is
rather short:

---
[global]
fsid = xxx
mon_host = [list of mons]

[mon.yyy]
public network = a.b.c.d/e
mon_rocksdb_options =
"write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true,bottommost_compression=kLZ4HCCompression"
---

Note that my bottommost_compression choice is LZ4HC, whose compression is
better than LZ4 at the expense of higher CPU usage. My nodes have lots of
CPU to spare, so I went for LZ4HC for better space savings and a lower
amount of writes. In general, I would recommend trying a faster and less
intense compression first, LZ4 across the board is a good starting choice.
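To confirm that a restarted monitor actually picked the option up, querying its
admin socket on the mon host (inside the mon container, if cephadm-deployed)
should work; mon.yyy is the placeholder id from the config above:

ceph daemon mon.yyy config get mon_rocksdb_options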

/Z

On Wed, 18 Oct 2023 at 12:02, Frank Schilder  wrote:

> Hi Zakhar,
>
> since its a bit beyond of the scope of basic, could you please post the
> complete ceph.conf config section for these changes for reference?
>
> Thanks!
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Zakhar Kirpichenko 
> Sent: Wednesday, October 18, 2023 6:14 AM
> To: Eugen Block
> Cc: ceph-users@ceph.io
> Subject: [ceph-users] Re: Ceph 16.2.x mon compactions, disk writes
>
> Many thanks for this, Eugen! I very much appreciate yours and Mykola's
> efforts and insight!
>
> Another thing I noticed was a reduction of RocksDB store after the
> reduction of the total PG number by 30%, from 590-600 MB:
>
> 65M 3675511.sst
> 65M 3675512.sst
> 65M 3675513.sst
> 65M 3675514.sst
> 65M 3675515.sst
> 65M 3675516.sst
> 65M 3675517.sst
> 65M 3675518.sst
> 62M 3675519.sst
>
> to about half of the original size:
>
> -rw-r--r-- 1 167 167  7218886 Oct 13 16:16 3056869.log
> -rw-r--r-- 1 167 167 67250650 Oct 13 16:15 3056871.sst
> -rw-r--r-- 1 167 167 67367527 Oct 13 16:15 3056872.sst
> -rw-r--r-- 1 167 167 63268486 Oct 13 16:15 3056873.sst
>
> Then when I restarted the monitors one by one before adding compression,
> RocksDB store reduced even further. I am not sure why and what exactly got
> automatically removed from the store:
>
> -rw-r--r-- 1 167 167   841960 Oct 18 03:31 018779.log
> -rw-r--r-- 1 167 167 67290532 Oct 18 03:31 018781.sst
> -rw-r--r-- 1 167 167 53287626 Oct 18 03:31 018782.sst
>
> Then I have enabled LZ4 and LZ4HC compression in our small production
> cluster (6 nodes, 96 OSDs) on 3 out of 5
> monitors:
> compression=kLZ4Compression,bottommost_compression=kLZ4HCCompression.
> I specifically went for LZ4 and LZ4HC because of the balance between
> compression/decompression speed and impact on CPU usage. The compression
> doesn't seem to affect the cluster in any negative way, the 3 monitors with
> compression are operating normally. The effect of the compression on
> RocksDB store size and disk writes is quite noticeable:
>
> Compression disabled, 155 MB store.db, ~125 MB RocksDB sst, and ~530 MB
> writes over 5 minutes:
>
> -rw-r--r-- 1 167 167  4227337 Oct 18 03:58 3080868.log
> -rw-r--r-- 1 167 167 67253592 Oct 18 03:57 3080870.sst
> -rw-r--r-- 1 167 167 57783180 Oct 18 03:57 3080871.sst
>
> # du -hs
> /var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph04/store.db/;
> iotop -ao -bn 2 -d 300 2>&1 | grep ceph-mon
> 155M
>  /var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph04/store.db/
> 2471602 be/4 167   6.05 M473.24 M  0.00 %  0.16 % ceph-mon -n
> mon.ceph04 -f --setuser ceph --setgroup ceph --default-log-to-file=false
> --default-log-to-stderr=true --default-log-stderr-prefix=debug
>  --default-mon-cluster-log-to-file=false
> --default-mon-cluster-log-to-stderr=true [rocksdb:low0]
> 2471633 be/4 167 188.00 K 40.91 M  0.00 %  0.02 % ceph-mon -n
> mon.ceph04 -f --setuser ceph --setgroup ceph --default-log-to-file=false
> --default-log-to-stderr=true --default-log-stderr-prefix=debug
>  --default-mon-cluster-log-to-file=false
> --default-mon-cluster-log-to-stderr=true [ms_dispatch]
> 2471603 be/4 167  16.00 K 24.16 M  0.00 %  0.01 % ceph-mon -n
> mon.ceph04 -f --setuser ceph --setgroup ceph --default-log-to-file=false
> --default-log-to-stderr=true --default-log-stderr-prefix=debug
>  --default-mon-cluster-log-to-file=false
> --default-mon-cluster-log-to-stderr=true [rocksdb:high0]
>
> Compression enabled, 60 MB store.db, ~23 MB RocksDB sst, and ~130 MB of
> writes over 5 minutes:
>
> -rw-r--r-- 1 167 167  5766659 Oct 18 03:56 3723355.log
> -rw-r--r-- 1 167 167 22240390 Oct 18 03:56 3723357.sst
>
> # du -hs
> /var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph03/store.db/;
> iotop -ao -bn 2 -d 300 2>&1 | grep ceph-mon
> 60M
> /var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph03/store.db/
> 2052031 be/4 1671040.00 K 83.48 M  0.00 %  0.01 % ceph-mon -n
> mon.ceph03 -f --setuser ceph --setgroup ceph --default-log-to-file=false
> --default-log

[ceph-users] Re: Nautilus - Octopus upgrade - more questions

2023-10-18 Thread Tim Holloway
I started with Octopus. It had one very serious flaw that I only fixed
by having Ceph self-upgrade to Pacific. Octopus required perfect health
to alter daemons and often the health problems were themselves issues
with daemons. Pacific can overlook most of those problems, so it's a
lot easier to repair stuff.

Nautilus is also the highest release supported natively by CentOS 7 (I
went with Octopus because I misread that). I can attest that a CephFS
mount using the Nautilus packages against Octopus and Pacific clusters
seems to work just fine, incidentally.

I can't use Ansible myself, because my hardware distribution is too
irregular, so I'm doing everything from Ceph commands.

If you are currently running on Ceph as installed services rather than
containerized, I should also note that apparently the version of Docker
for CentOS7 (which I'll take as a rough equivalent to what you've got)
cannot properly run the Ceph containers. AlmaLinux 8, however has no
problems.

Since I've yet to migrate most of my hosts off CentOS 7, I run most of
Ceph in VMs running AlmaLinux 8. The OSD storage I mount as raw disks
to avoid the extra layer that virtualizing them would entail. The raw
disks are themselves LVM logical volumes as I don't have dedicated
physical drives. It's a work in progress, and a messy one.

I don't know if you can do incremental migration with mixed
Nautilus/Pacific OSDs, if Nautilus supports Ceph's internal upgrade
(Octopus does), or if it's best to just crank up a fresh Ceph and
migrate the data via something like rsync (which I did, since I was
moving from glusterfs to Ceph). Maybe someone with more knowledge of
the internals can answer these questions.

   Tim

On Tue, 2023-10-17 at 20:18 -0400, Dave Hall wrote:
> Hello,
> 
> I have a Nautilus cluster built using Ceph packages from Debian 10
> Backports, deployed with Ceph-Ansible.
> 
> I see that Debian does not offer Ceph 15/Octopus packages.  However,
> download.ceph.com does offer such packages.
> 
> Question:  Is it a safe upgrade to install the download.ceph.com
> packages
> over top of the buster-backports packages?
> 
> If so, the next question is how to deploy this?  Should I pull down
> an
> appropriate version of Ceph-Ansible and use the rolling-upgrade
> playbook?
> Or just apg-get -f dist-upgrade the new Ceph packages into place?
> 
> BTW, in the long run I'll probably want to get to container-based
> Reef, but
> I need to keep a stable cluster throughout.
> 
> Any advice or reassurance much appreciated.
> 
> Thanks.
> 
> -Dave
> 
> --
> Dave Hall
> Binghamton University
> kdh...@binghamton.edu
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy v17.2.7 QE Validation status

2023-10-18 Thread Guillaume Abrioux
Hi Yuri,

ceph-volume approved  https://jenkins.ceph.com/job/ceph-volume-test/566/

Regards,

--
Guillaume Abrioux
Software Engineer

From: Yuri Weinstein 
Date: Monday, 16 October 2023 at 20:53
To: dev , ceph-users 
Subject: [EXTERNAL] [ceph-users] quincy v17.2.7 QE Validation status
Details of this release are summarized here:

https://tracker.ceph.com/issues/63219#note-2
Release Notes - TBD

Issue https://tracker.ceph.com/issues/63192  appears to be failing several runs.
Should it be fixed for this release?

Seeking approvals/reviews for:

smoke - Laura
rados - Laura, Radek, Travis, Ernesto, Adam King

rgw - Casey
fs - Venky
orch - Adam King

rbd - Ilya
krbd - Ilya

upgrade/quincy-p2p - Known issue IIRC, Casey pls confirm/approve

client-upgrade-quincy-reef - Laura

powercycle - Brad pls confirm

ceph-volume - Guillaume pls take a look

Please reply to this email with approval and/or trackers of known
issues/PRs to address them.

Josh, Neha - gibba and LRC upgrades -- N/A for quincy now after reef release.

Thx
YuriW
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

Unless otherwise stated above:

Compagnie IBM France
Siège Social : 17, avenue de l'Europe, 92275 Bois-Colombes Cedex
RCS Nanterre 552 118 465
Forme Sociale : S.A.S.
Capital Social : 664 069 390,60 €
SIRET : 552 118 465 03644 - Code NAF 6203Z
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD

2023-10-18 Thread Patrick Begou

Hi all,

I'm trying to catch the faulty commit. I'm able to build Ceph from the 
git repo in a fresh podman container but at this time, the lsblk command 
returns nothing in my container.

In Ceph containers lsblk works.
So something is wrong with how I launch my podman container (or it differs
from how the Ceph containers are launched) and I cannot find what.
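For comparison, the Ceph containers are normally started privileged with the host
/dev bind-mounted, so a sanity check along those lines (the image name is just a
placeholder for my build image) would be:

podman run --rm -it --privileged -v /dev:/dev <my-build-image> lsblk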


Any help about this step ?

Thanks

Patrick


Le 13/10/2023 à 09:18, Eugen Block a écrit :

Trying to resend with the attachment.
I can't really find anything suspicious, ceph-volume (16.2.11) does 
recognize /dev/sdc though:


[2023-10-12 08:58:14,135][ceph_volume.process][INFO  ] stdout 
NAME="sdc" KNAME="sdc" PKNAME="" MAJ:MIN="8:32" FSTYPE="" 
MOUNTPOINT="" LABEL="" UUID="" RO="0" RM="1" MODEL="SAMSUNG HE253GJ " 
SIZE="232.9G" STATE="running" OWNER="root" GROUP="disk" 
MODE="brw-rw" ALIGNMENT="0" PHY-SEC="512" LOG-SEC="512" ROTA="1" 
SCHED="mq-deadline" TYPE="disk" DISC-ALN="0" DISC-GRAN="0B" 
DISC-MAX="0B" DISC-ZERO="0" PKNAME="" PARTLABEL=""
[2023-10-12 08:58:14,139][ceph_volume.util.system][INFO  ] Executable 
pvs found on the host, will use /sbin/pvs
[2023-10-12 08:58:14,140][ceph_volume.process][INFO  ] Running 
command: nsenter --mount=/rootfs/proc/1/ns/mnt 
--ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net 
--uts=/rootfs/proc/1/ns/uts /sbin/pvs --noheadings --readonly 
--units=b --nosuffix --separator=";" -o 
pv_name,vg_name,pv_count,lv_count,vg_attr,vg_extent_count,vg_free_count,vg_extent_size


But apparently it just stops after that. I already tried to find a 
debug log-level for ceph-volume but it's not applicable to all 
subcommands.
The cephadm.log also just stops without even finishing the "copying 
blob", which makes me wonder if it actually pulls the entire image? I 
assume you have enough free disk space (otherwise I would expect a 
message "failed to pull target image"), do you see any other warnings 
in syslog or something? Or are the logs incomplete?

Maybe someone else finds any clues in the logs...

Regards,
Eugen

Zitat von Patrick Begou :


Hi Eugen,

You will find in attachment cephadm.log and cepĥ-volume.log. Each 
contains the outputs for the 2 versions.  v16.2.10-20220920 is really 
more verbose or v16.2.11-20230125 does not execute all the detection 
process


Patrick


Le 12/10/2023 à 09:34, Eugen Block a écrit :
Good catch, and I found the thread I had in my mind, it was this 
exact one. :-D Anyway, can you share the ceph-volume.log from the 
working and the not working attempt?
I tried to look for something significant in the pacific release 
notes for 16.2.11, and there were some changes to ceph-volume, but 
I'm not sure what it could be.


Zitat von Patrick Begou :

I've run additional tests with Pacific releases and with
"ceph-volume inventory" things went wrong with the first v16.2.11
release (v16.2.11-20230125)


=== Ceph v16.2.10-20220920 ===

Device Path   Size rotates available Model name
/dev/sdc  232.83 GB    True    True SAMSUNG HE253GJ
/dev/sda  232.83 GB    True    False SAMSUNG HE253GJ
/dev/sdb  465.76 GB    True    False WDC 
WD5003ABYX-1


=== Ceph v16.2.11-20230125 ===

Device Path   Size Device nodes rotates 
available Model name



May be this could help to see what has changed ?

Patrick

Le 11/10/2023 à 17:38, Eugen Block a écrit :
That's really strange. Just out of curiosity, have you tried 
Quincy (and/or Reef) as well? I don't recall what inventory does 
in the background exactly, I believe Adam King mentioned that in 
some thread, maybe that can help here. I'll search for that thread 
tomorrow.


Zitat von Patrick Begou :


Hi Eugen,

[root@mostha1 ~]# rpm -q cephadm
cephadm-16.2.14-0.el8.noarch

Log associated to the

2023-10-11 16:16:02,167 7f820515fb80 DEBUG 


cephadm ['gather-facts']
2023-10-11 16:16:02,208 7f820515fb80 DEBUG /bin/podman: 4.4.1
2023-10-11 16:16:02,313 7f820515fb80 DEBUG sestatus: SELinux 
status: disabled
2023-10-11 16:16:02,317 7f820515fb80 DEBUG sestatus: SELinux 
status: disabled
2023-10-11 16:16:02,322 7f820515fb80 DEBUG sestatus: SELinux 
status: disabled
2023-10-11 16:16:02,326 7f820515fb80 DEBUG sestatus: SELinux 
status: disabled
2023-10-11 16:16:02,329 7f820515fb80 DEBUG sestatus: SELinux 
status: disabled
2023-10-11 16:16:02,333 7f820515fb80 DEBUG sestatus: SELinux 
status: disabled
2023-10-11 16:16:04,474 7ff2a5c08b80 DEBUG 


cephadm ['ceph-volume', 'inventory']
2023-10-11 16:16:04,516 7ff2a5c08b80 DEBUG /usr/bin/podman: 4.4.1
2023-10-11 16:16:04,520 7ff2a5c08b80 DEBUG Using default config: 
/etc/ceph/ceph.conf
2023-10-11 16:16:04,573 7ff2a5c08b80

[ceph-users] Re: Remove empty orphaned PGs not mapped to a pool

2023-10-18 Thread Eugen Block

Hi,


So now we need to empty these OSDs.

The device class was SSD. I changed it to HDD and moved the OSDs  
inside the Crush tree to the other HDD OSDs of the host.
I need to move the PGs away from the OSDs to other OSDs but I do not  
know how to do it.


your crush rule doesn't specify a device class so moving them around  
doesn't really help (as you already noticed). Are other pools using  
that crush rule? You can see the applied rule IDs in 'ceph osd pool ls  
detail' output. If other pools use the same rule, make sure the  
cluster can handle data movement if you change it. Test the modified  
rule with crushtool first, and maybe in your test cluster as well.
If you add a device class statement (step take default class hdd) to  
the rule the SSDs will be drained automatically, at least they should.


But before you change anyhting I just want to make sure I understand  
correctly:


- Your cache tier was on SSDs which need to be removed.
- Cache tier was removed successfully.
- But since the rbd-meta pool is not aware of device classes it used  
the same SSDs.


Don't change the root bmeta, only the crush rule "rbd-meta", here's an  
example from a replicated pool:


# rules
rule replicated_ruleset {
id 0
type replicated
min_size 1
max_size 10
step take default class hdd
step chooseleaf firstn 0 type host
step emit
}
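To dry-run such a modified rule before injecting it, something along these lines
should do (rule id 0 and num-rep 3 are placeholders):

# compile the edited map and simulate the rule's mappings
crushtool -c crushmap.txt -o crushmap.new
crushtool -i crushmap.new --test --rule 0 --num-rep 3 --show-mappings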


Zitat von Malte Stroem :


Hello Eugen,

I was wrong. I am sorry.

The PGs are not empty and orphaned.

Most of the PGs are empty but a few are indeed used.

And the pool for these PGs is still there. It is the metadata pool  
of the erasure coded pool for RBDs. The cache tier pool was removed  
successfully.


So now we need to empty these OSDs.

The device class was SSD. I changed it to HDD and moved the OSDs  
inside the Crush tree to the other HDD OSDs of the host.


I need to move the PGs away from the OSDs to other OSDs but I do not  
know how to do it.


Is using pg-upmap the solution?

Is using the objectstore-tool the solution?

Is moving the OSDs inside Crush to the right place the solution?

Is migrating the metadata pool to another with another crush rule  
the solution?


The crush rule of this metadata pool looks like this:

{
"rule_id": 8,
"rule_name": "rbd-meta",
"ruleset": 6,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{
"op": "take",
"item": -4,
"item_name": "bmeta"
},
{
"op": "chooseleaf_firstn",
"num": 0,
"type": "host"
},
{
"op": "emit"
}
]
}

When stopping one of the OSDs the status gets degraded.

How to move PGs away from the OSDs?

How to let the pool use other OSDs?

Changing the crush rule?

Best,
Malte

Am 05.10.23 um 11:35 schrieb Malte Stroem:

Hello Eugen, Hello Joachim,

@Joachim: Interesting! And you got empty PGs, too? How did you  
solve the problem?


@Eugen: This is one of our biggest clusters and we're in the  
process to migrate from Nautilus to Octopus and to migrate from  
CentOS to Ubuntu.


The cache tier pool's OSDs were still version 14 OSDs. Most of the  
other OSDs are version 15 already.


So I tested the command:

ceph-objectstore-tool --data-path /path/to/osd --op remove --pgid  
3.0 --force


in a test cluster environment and this worked fine.

But the test scenario was not similar to our productive environment  
and the PG wasn't empty.


I did not find a way to emulate the same situation in the test  
scenario, yet.


Best,
Malte

Am 05.10.23 um 11:03 schrieb Eugen Block:
I know, I know... but since we are already using it (for years) I  
have to check how to remove it safely, maybe as long as we're on  
Pacific. ;-)


Zitat von Joachim Kraftmayer - ceph ambassador  
:



@Eugen

We have seen the same problems 8 years ago. I can only recommend  
never to use cache tiering in production.
At Cephalocon this was part of my talk and as far as I remember  
cache tiering will also disappear from ceph soon.


Cache tiering has been deprecated in the Reef release as it has  
lacked a maintainer for a very long time. This does not mean it  
will be certainly removed, but we may choose to remove it without  
much further notice.


https://docs.ceph.com/en/latest/rados/operations/cache-tiering/

Regards, Joachim


___
ceph ambassador DACH
ceph consultant since 2012

Clyso GmbH - Premier Ceph Foundation Member

https://www.clyso.com/

Am 05.10.23 um 10:02 schrieb Eugen Block:
Which ceph version is this? I'm trying to understand how  
removing a pool leaves the PGs of that pool... Do you have any  
logs or something from when you removed the pool?
We'll have to deal with a cache tier in the forseeable future as  
well so this is quite relevant for us as well. Maybe I'll try to  
reproduce it in a test cluster first.
Are those SSDs exclusively for the cache tier or are they used  
by oth

[ceph-users] Re: Remove empty orphaned PGs not mapped to a pool

2023-10-18 Thread Malte Stroem

Hello,

well yes, I think I have to edit the Crush rule and modify:

item_name

or to be clear:

I need to modify this in the decompiled crush map:

root bmeta {
id -4   # do not change unnecessarily
id -254 class hdd   # do not change unnecessarily
id -256 class ssd   # do not change unnecessarily
# weight 0.000
alg straw2
hash 0  # rjenkins1
item gor-bmeta weight 0.000

root and item have to be modified to match the other host I moved the 
OSDs to, I think.
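Roughly this edit cycle, I suppose (file names are arbitrary):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# edit the root/item entries in crushmap.txt, then recompile and inject it
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new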


What do you think?

Best,
Malte

Am 18.10.23 um 11:30 schrieb Malte Stroem:

Hello Eugen,

I was wrong. I am sorry.

The PGs are not empty and orphaned.

Most of the PGs are empty but a few are indeed used.

And the pool for these PGs is still there. It is the metadata pool of 
the erasure coded pool for RBDs. The cache tier pool was removed 
successfully.


So now we need to empty these OSDs.

The device class was SSD. I changed it to HDD and moved the OSDs inside 
the Crush tree to the other HDD OSDs of the host.


I need to move the PGs away from the OSDs to other OSDs but I do not 
know how to do it.


Is using pg-upmap the solution?

Is using the objectstore-tool the solution?

Is moving the OSDs inside Crush to the right place the solution?

Is migrating the metadata pool to another with another crush rule the 
solution?


The crush rule of this metadata pool looks like this:

{
     "rule_id": 8,
     "rule_name": "rbd-meta",
     "ruleset": 6,
     "type": 1,
     "min_size": 1,
     "max_size": 10,
     "steps": [
     {
     "op": "take",
     "item": -4,
     "item_name": "bmeta"
     },
     {
     "op": "chooseleaf_firstn",
     "num": 0,
     "type": "host"
     },
     {
     "op": "emit"
     }
     ]
}

When stopping one of the OSDs the status gets degraded.

How to move PGs away from the OSDs?

How to let the pool use other OSDs?

Changing the crush rule?

Best,
Malte

Am 05.10.23 um 11:35 schrieb Malte Stroem:

Hello Eugen, Hello Joachim,

@Joachim: Interesting! And you got empty PGs, too? How did you solve 
the problem?


@Eugen: This is one of our biggest clusters and we're in the process 
to migrate from Nautilus to Octopus and to migrate from CentOS to Ubuntu.


The cache tier pool's OSDs were still version 14 OSDs. Most of the 
other OSDs are version 15 already.


So I tested the command:

ceph-objectstore-tool --data-path /path/to/osd --op remove --pgid 3.0 
--force


in a test cluster environment and this worked fine.

But the test scenario was not similar to our productive environment 
and the PG wasn't empty.


I did not find a way to emulate the same situation in the test 
scenario, yet.


Best,
Malte

Am 05.10.23 um 11:03 schrieb Eugen Block:
I know, I know... but since we are already using it (for years) I 
have to check how to remove it safely, maybe as long as we're on 
Pacific. ;-)


Zitat von Joachim Kraftmayer - ceph ambassador 
:



@Eugen

We have seen the same problems 8 years ago. I can only recommend 
never to use cache tiering in production.
At Cephalocon this was part of my talk and as far as I remember 
cache tiering will also disappear from ceph soon.


Cache tiering has been deprecated in the Reef release as it has 
lacked a maintainer for a very long time. This does not mean it will 
be certainly removed, but we may choose to remove it without much 
further notice.


https://docs.ceph.com/en/latest/rados/operations/cache-tiering/

Regards, Joachim


___
ceph ambassador DACH
ceph consultant since 2012

Clyso GmbH - Premier Ceph Foundation Member

https://www.clyso.com/

Am 05.10.23 um 10:02 schrieb Eugen Block:
Which ceph version is this? I'm trying to understand how removing a 
pool leaves the PGs of that pool... Do you have any logs or 
something from when you removed the pool?
We'll have to deal with a cache tier in the forseeable future as 
well so this is quite relevant for us as well. Maybe I'll try to 
reproduce it in a test cluster first.
Are those SSDs exclusively for the cache tier or are they used by 
other pools as well? If they were used only for the cache tier you 
should be able to just remove them without any risk. But as I said, 
I'd rather try to understand before purging them.



Zitat von Malte Stroem :


Hello Eugen,

yes, we followed the documentation and everything worked fine. The 
cache is gone.


Removing the pool worked well. Everything is clean.

The PGs are empty active+clean.

Possible solutions:

1.

ceph pg {pg-id} mark_unfound_lost delete

I do not think this is the right way since it is for PGs with 
status unfound. But it could work also.


2.

Set the following for the three disk:

ceph osd lost {osd-id}

I am not sure how the cluster will react to this.

3.

ceph-objectstore-tool --data-path /path/to/osd --op remove --pgid 
3.0 --force


Now, will the cluster accept the 

[ceph-users] Re: stuck MDS warning: Client HOST failing to respond to cache pressure

2023-10-18 Thread Loïc Tortay

On 18/10/2023 10:02, Frank Schilder wrote:

Hi Loïc,

thanks for the pointer. It's kind of the opposite extreme to dropping just everything. I
need to know the file name that is in cache. I'm looking for a middle way, say, 
"drop_caches -u USER" that drops all caches of files owned by user USER. This 
way I could try dropping caches for a bunch of users who are *not* running a job.


Hello,
You can use something like the following to get the list of filenames 
opened by $USER in $USERDIR (CephFS mountpoint):

lsof -au $USER $USERDIR | awk '/REG/{print $NF}' | sort -u

Some level of "drop_caches -u $USER" can then be achieved by piping
the above command to "xargs -r $SOMEWHERE/drop-from-pagecache".


No need to be "root" if run as $USER.
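A possible sketch of such a drop-from-pagecache helper, assuming GNU dd (its
nocache flag asks the kernel to drop the cached pages of each file without
modifying it):

#!/bin/sh
# drop-from-pagecache: evict the given files from the page cache
for f in "$@"; do
    dd of="$f" oflag=nocache conv=notrunc,fdatasync count=0 status=none
done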


Loïc
--
|   Loïc Tortay  - IN2P3 Computing Centre  |
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Remove empty orphaned PGs not mapped to a pool

2023-10-18 Thread Malte Stroem

Hello Eugen,

I was wrong. I am sorry.

The PGs are not empty and orphaned.

Most of the PGs are empty but a few are indeed used.

And the pool for these PGs is still there. It is the metadata pool of 
the erasure coded pool for RBDs. The cache tier pool was removed 
successfully.


So now we need to empty these OSDs.

The device class was SSD. I changed it to HDD and moved the OSDs inside 
the Crush tree to the other HDD OSDs of the host.


I need to move the PGs away from the OSDs to other OSDs but I do not 
know how to do it.


Is using pg-upmap the solution?

Is using the objectstore-tool the solution?

Is moving the OSDs inside Crush to the right place the solution?

Is migrating the metadata pool to another with another crush rule the 
solution?


The crush rule of this metadata pool looks like this:

{
"rule_id": 8,
"rule_name": "rbd-meta",
"ruleset": 6,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{
"op": "take",
"item": -4,
"item_name": "bmeta"
},
{
"op": "chooseleaf_firstn",
"num": 0,
"type": "host"
},
{
"op": "emit"
}
]
}

When stopping one of the OSDs the status gets degraded.

How to move PGs away from the OSDs?

How to let the pool use other OSDs?

Changing the crush rule?

Best,
Malte

Am 05.10.23 um 11:35 schrieb Malte Stroem:

Hello Eugen, Hello Joachim,

@Joachim: Interesting! And you got empty PGs, too? How did you solve the 
problem?


@Eugen: This is one of our biggest clusters and we're in the process to 
migrate from Nautilus to Octopus and to migrate from CentOS to Ubuntu.


The cache tier pool's OSDs were still version 14 OSDs. Most of the other 
OSDs are version 15 already.


So I tested the command:

ceph-objectstore-tool --data-path /path/to/osd --op remove --pgid 3.0 
--force


in a test cluster environment and this worked fine.

But the test scenario was not similar to our productive environment and 
the PG wasn't empty.


I did not find a way to emulate the same situation in the test scenario, 
yet.


Best,
Malte

Am 05.10.23 um 11:03 schrieb Eugen Block:
I know, I know... but since we are already using it (for years) I have 
to check how to remove it safely, maybe as long as we're on Pacific. ;-)


Zitat von Joachim Kraftmayer - ceph ambassador 
:



@Eugen

We have seen the same problems 8 years ago. I can only recommend 
never to use cache tiering in production.
At Cephalocon this was part of my talk and as far as I remember cache 
tiering will also disappear from ceph soon.


Cache tiering has been deprecated in the Reef release as it has 
lacked a maintainer for a very long time. This does not mean it will 
be certainly removed, but we may choose to remove it without much 
further notice.


https://docs.ceph.com/en/latest/rados/operations/cache-tiering/

Regards, Joachim


___
ceph ambassador DACH
ceph consultant since 2012

Clyso GmbH - Premier Ceph Foundation Member

https://www.clyso.com/

Am 05.10.23 um 10:02 schrieb Eugen Block:
Which ceph version is this? I'm trying to understand how removing a 
pool leaves the PGs of that pool... Do you have any logs or 
something from when you removed the pool?
We'll have to deal with a cache tier in the forseeable future as 
well so this is quite relevant for us as well. Maybe I'll try to 
reproduce it in a test cluster first.
Are those SSDs exclusively for the cache tier or are they used by 
other pools as well? If they were used only for the cache tier you 
should be able to just remove them without any risk. But as I said, 
I'd rather try to understand before purging them.



Zitat von Malte Stroem :


Hello Eugen,

yes, we followed the documentation and everything worked fine. The 
cache is gone.


Removing the pool worked well. Everything is clean.

The PGs are empty active+clean.

Possible solutions:

1.

ceph pg {pg-id} mark_unfound_lost delete

I do not think this is the right way since it is for PGs with 
status unfound. But it could work also.


2.

Set the following for the three disk:

ceph osd lost {osd-id}

I am not sure how the cluster will react to this.

3.

ceph-objectstore-tool --data-path /path/to/osd --op remove --pgid 
3.0 --force


Now, will the cluster accept the removed PG status?

4.

The three disks are still presented in the crush rule, class ssd, 
each single OSD under one host entry.


What if I remove them from crush?

Do you have a better idea, Eugen?

Best,
Malte

Am 04.10.23 um 09:21 schrieb Eugen Block:

Hi,

just for clarity, you're actually talking about the cache tier as 
described in the docs [1]? And you followed the steps until 'ceph 
osd tier remove cold-storage hot-storage' successfully? And the 
pool has been really deleted successfully ('ceph osd pool ls 
detail')?


[1] 
https://docs.ceph.com/en/latest/rados/operations/cache-tiering/#removing-a-cache-tier


Zitat von Malte Stroem :

[ceph-users] Re: Time Estimation for cephfs-data-scan scan_links

2023-10-18 Thread Peter Grandi
[...]
> What is being done is a serial tree walk and copy in 3
> replicas of all objects in the CephFS metadata pool, so it
> depends on both the read and write IOPS rate for the metadata
> pools, but mostly in the write IOPS. [...] Wild guess:
> metadata is on 10x 3.84TB SSDs without persistent cache, data
> is on 48x 8TB devices probably HDDs. Very cost effective :-).

I do not know if those guesses are right, but in general most
Ceph instances I have seen have been designed with the "cost
effective" choice of providing enough IOPS to run the user
workload (but often not even that), without the extra headroom
needed to run the admin workload quickly (checking, scanning,
scrubbing, migrating, 'fsck' or 'resilvering' of the underlying
filesystem). There is often a similar situation for non-HPC
filesystem types, but the scale and pressure on instances of
those are usually much lower than for HPC filesystem instances,
so the consequences are less obvious.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] traffic by IP address / bucket / user

2023-10-18 Thread Boris Behrens
Hi,
did someone have a solution ready to monitor traffic by IP address?

Cheers
 Boris
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-18 Thread Frank Schilder
Hi Zakhar,

since it's a bit beyond the scope of the basics, could you please post the 
complete ceph.conf config section for these changes, for reference?
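
For context, I assume it ends up looking roughly like this (a sketch only: 
everything besides the two compression settings is guessed and should be 
merged with the existing defaults):

   [mon]
   # only the two compression settings come from this thread; the rest of
   # the string is an assumption about the stock RocksDB defaults
   mon_rocksdb_options = write_buffer_size=33554432,compression=kLZ4Compression,bottommost_compression=kLZ4HCCompression,level_compaction_dynamic_level_bytes=true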

Thanks!
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Zakhar Kirpichenko 
Sent: Wednesday, October 18, 2023 6:14 AM
To: Eugen Block
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

Many thanks for this, Eugen! I very much appreciate your and Mykola's
efforts and insight!

Another thing I noticed was a reduction of the RocksDB store size after
reducing the total PG count by 30%, from 590-600 MB:
65M 3675511.sst
65M 3675512.sst
65M 3675513.sst
65M 3675514.sst
65M 3675515.sst
65M 3675516.sst
65M 3675517.sst
65M 3675518.sst
62M 3675519.sst

to about half of the original size:

-rw-r--r-- 1 167 167  7218886 Oct 13 16:16 3056869.log
-rw-r--r-- 1 167 167 67250650 Oct 13 16:15 3056871.sst
-rw-r--r-- 1 167 167 67367527 Oct 13 16:15 3056872.sst
-rw-r--r-- 1 167 167 63268486 Oct 13 16:15 3056873.sst

Then, when I restarted the monitors one by one before adding compression,
the RocksDB store shrank even further. I am not sure why, or what exactly
got automatically removed from the store:

-rw-r--r-- 1 167 167   841960 Oct 18 03:31 018779.log
-rw-r--r-- 1 167 167 67290532 Oct 18 03:31 018781.sst
-rw-r--r-- 1 167 167 53287626 Oct 18 03:31 018782.sst

Then I enabled LZ4 and LZ4HC compression on 3 out of 5 monitors in our
small production cluster (6 nodes, 96 OSDs):
compression=kLZ4Compression,bottommost_compression=kLZ4HCCompression.
I specifically went for LZ4 and LZ4HC because of the balance between
compression/decompression speed and impact on CPU usage. The compression
doesn't seem to affect the cluster in any negative way; the 3 monitors with
compression are operating normally. The effect of the compression on
RocksDB store size and disk writes is quite noticeable:

Compression disabled, 155 MB store.db, ~125 MB RocksDB sst, and ~530 MB
writes over 5 minutes:

-rw-r--r-- 1 167 167  4227337 Oct 18 03:58 3080868.log
-rw-r--r-- 1 167 167 67253592 Oct 18 03:57 3080870.sst
-rw-r--r-- 1 167 167 57783180 Oct 18 03:57 3080871.sst

# du -hs
/var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph04/store.db/;
iotop -ao -bn 2 -d 300 2>&1 | grep ceph-mon
155M
 /var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph04/store.db/
2471602 be/4 167   6.05 M 473.24 M  0.00 %  0.16 % ceph-mon -n
mon.ceph04 -f --setuser ceph --setgroup ceph --default-log-to-file=false
--default-log-to-stderr=true --default-log-stderr-prefix=debug
 --default-mon-cluster-log-to-file=false
--default-mon-cluster-log-to-stderr=true [rocksdb:low0]
2471633 be/4 167 188.00 K 40.91 M  0.00 %  0.02 % ceph-mon -n
mon.ceph04 -f --setuser ceph --setgroup ceph --default-log-to-file=false
--default-log-to-stderr=true --default-log-stderr-prefix=debug
 --default-mon-cluster-log-to-file=false
--default-mon-cluster-log-to-stderr=true [ms_dispatch]
2471603 be/4 167  16.00 K 24.16 M  0.00 %  0.01 % ceph-mon -n
mon.ceph04 -f --setuser ceph --setgroup ceph --default-log-to-file=false
--default-log-to-stderr=true --default-log-stderr-prefix=debug
 --default-mon-cluster-log-to-file=false
--default-mon-cluster-log-to-stderr=true [rocksdb:high0]

Compression enabled, 60 MB store.db, ~23 MB RocksDB sst, and ~130 MB of
writes over 5 minutes:

-rw-r--r-- 1 167 167  5766659 Oct 18 03:56 3723355.log
-rw-r--r-- 1 167 167 22240390 Oct 18 03:56 3723357.sst

# du -hs
/var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph03/store.db/;
iotop -ao -bn 2 -d 300 2>&1 | grep ceph-mon
60M
/var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph03/store.db/
2052031 be/4 167 1040.00 K 83.48 M  0.00 %  0.01 % ceph-mon -n
mon.ceph03 -f --setuser ceph --setgroup ceph --default-log-to-file=false
--default-log-to-stderr=true --default-log-stderr-prefix=debug
 --default-mon-cluster-log-to-file=false
--default-mon-cluster-log-to-stderr=true [rocksdb:low0]
2052062 be/4 167   0.00 B 40.79 M  0.00 %  0.01 % ceph-mon -n
mon.ceph03 -f --setuser ceph --setgroup ceph --default-log-to-file=false
--default-log-to-stderr=true --default-log-stderr-prefix=debug
 --default-mon-cluster-log-to-file=false
--default-mon-cluster-log-to-stderr=true [ms_dispatch]
2052032 be/4 167  16.00 K  4.68 M  0.00 %  0.00 % ceph-mon -n
mon.ceph03 -f --setuser ceph --setgroup ceph --default-log-to-file=false
--default-log-to-stderr=true --default-log-stderr-prefix=debug
 --default-mon-cluster-log-to-file=false
--default-mon-cluster-log-to-stderr=true [rocksdb:high0]
2052052 be/4 167  44.00 K  0.00 B  0.00 %  0.00 % ceph-mon -n
mon.ceph03 -f --setuser ceph --setgroup ceph --default-log-to-file=false
--default-log-to-stderr=true --default-log-stderr-prefix=debug
 --default-mon-cluster-log-to-file=false
--default-mon-cluster-log-to-stderr=tru

[ceph-users] Re: How to deal with increasing HDD sizes ? 1 OSD for 2 LVM-packed HDDs ?

2023-10-18 Thread Robert Sander

On 10/18/23 09:25, Renaud Jean Christophe Miel wrote:

Hi,

Use case:
* Ceph cluster with old nodes having 6TB HDDs
* Add new node with new 12TB HDDs

Is it supported/recommended to pack 2 6TB HDDs handled by 2 old OSDs
into 1 12TB LVM disk handled by 1 new OSD ?


The 12 TB HDD will get double the IO of one of the 6 TB HDDs,
but it will still only be able to handle about 120 IOPS.
This makes the newer, larger HDDs a bottleneck when run in the same pool.

If you are not planning to decommission the smaller HDDs it is 
recommended to use the larger ones in a separate pool for performance 
reasons.
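
A minimal sketch of one way to do that with a dedicated device class 
(the class name, OSD IDs, pool name and PG counts are placeholders):

   ceph osd crush rm-device-class osd.48 osd.49
   ceph osd crush set-device-class hdd12 osd.48 osd.49
   ceph osd crush rule create-replicated big-hdd default host hdd12
   ceph osd pool create archive 128 128 replicated big-hdd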


Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: stuck MDS warning: Client HOST failing to respond to cache pressure

2023-10-18 Thread Frank Schilder
Hi Loïc,

thanks for the pointer. It's kind of the opposite extreme to just dropping 
everything: I would need to know the names of the files that are in the cache. 
I'm looking for a middle way, say, a "drop_caches -u USER" that drops all 
cached data of files owned by user USER. This way I could try dropping caches 
for a bunch of users who are *not* running a job.
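
Something along those lines could perhaps be approximated on a kernel-client 
node with vmtouch as a stand-in for the linked helper (a rough sketch only: 
it covers files the user's processes currently have open, not everything the 
user owns, and it ignores paths containing spaces):

   # evict the page-cache pages of files currently open by one user
   USER_TO_FLUSH=someuser
   for pid in $(pgrep -u "$USER_TO_FLUSH"); do
       readlink "/proc/$pid/fd/"* 2>/dev/null
   done | grep '^/' | sort -u | xargs -r vmtouch -e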

I guess I have to wait for the jobs to end.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Loïc Tortay 
Sent: Tuesday, October 17, 2023 3:40 PM
To: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: stuck MDS warning: Client HOST failing to respond 
to cache pressure

On 17/10/2023 11:27, Frank Schilder wrote:
> Hi Stefan,
>
> probably. Its 2 compute nodes and there are jobs running. Our epilogue script 
> will drop the caches, at which point I indeed expect the warning to 
> disappear. We have no time limit on these nodes though, so this can be a 
> while. I was hoping there was an alternative to that, say, a user-level 
> command that I could execute on the client without possibly affecting other 
> users jobs.
>
Hello,
If you know the names of the files to flush from the cache (from
/proc/$PID/fd, lsof, batch job script, ...), you can use something like
https://github.com/tortay/cache-toys/blob/master/drop-from-pagecache.c
on the client.

See comments line 16 to 22 of the source code for caveats/limitations.


Loïc.
--
|   Loïc Tortay  - IN2P3 Computing Centre  |
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Nautilus - Octopus upgrade - more questions

2023-10-18 Thread Marc
> 
> I have a Nautilus cluster built using Ceph packages from Debian 10
> Backports, deployed with Ceph-Ansible.
> 
> I see that Debian does not offer Ceph 15/Octopus packages.  However,
> download.ceph.com does offer such packages.
> 
> Question:  Is it a safe upgrade to install the download.ceph.com packages
> over top of the buster-backports packages?

I am also still on Nautilus. However, I am planning a different upgrade path. 
First I will update CentOS 7 to an el9 equivalent. There is no Nautilus for el9, 
but I tried compiling it once and I sort of got the necessary packages. 
From el9 you can upgrade to the newest Ceph.
Soon I will add an el9 OSD node to the existing el7 cluster to see how this 
goes.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] How to deal with increasing HDD sizes ? 1 OSD for 2 LVM-packed HDDs ?

2023-10-18 Thread Renaud Jean Christophe Miel
Hi,

Use case:
* Ceph cluster with old nodes having 6TB HDDs
* Add new node with new 12TB HDDs

Is it supported/recommended to pack 2 6TB HDDs handled by 2 old OSDs
into 1 12TB LVM disk handled by 1 new OSD ?

Regards,

Renaud Miel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io