[ceph-users] Re: Cephalocon Amsterdam 2023 Photographer Volunteer Help Needed

2023-03-21 Thread Alvaro Soto
Did you find a volunteer yet?

---
Alvaro Soto.

Note: My work hours may not be your work hours. Please do not feel the need
to respond during a time that is not convenient for you.
--
Great people talk about ideas,
ordinary people talk about things,
small people talk... about other people.

On Wed, Mar 15, 2023, 10:41 AM Mike Perez  wrote:

> Hi everyone,
>
> To help with costs for Cephalocon Amsterdam 2023, we wanted to see if
> anyone would like to volunteer to help with photography for the event. A
> group of people would be ideal so that we have good coverage in the expo
> hall and sessions.
> If you're interested, please reply to me directly for more information.
>
> --
> Mike Perez
> Community Manager
> Ceph Foundation
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Moving From BlueJeans to Jitsi for Ceph meetings

2023-03-21 Thread Alvaro Soto
+1 jitsi

---
Alvaro Soto.

Note: My work hours may not be your work hours. Please do not feel the need
to respond during a time that is not convenient for you.
--
Great people talk about ideas,
ordinary people talk about things,
small people talk... about other people.

On Tue, Mar 21, 2023, 2:02 PM Federico Lucifredi 
wrote:

> Jitsi is really good, and getting better — we have been using it with my
> local User’s Group for the last couple of years.
>
> My only observation is to find out the maximum allowable number of guests in
> advance if this is not already known - we had a fairly generous allowance
> in the BlueJeans accounts for Red Hat, and Jitsi community accounts may not
> be as large.
>
> Best-F
>
> > On Mar 21, 2023, at 12:26, Mike Perez  wrote:
> >
> > I'm not familiar with BBB myself. Are there any objections to Jitsi? I
> > want to update the calendar invites this week.
> >
> >> On Thu, Mar 16, 2023 at 6:16 PM huxia...@horebdata.cn
> >>  wrote:
> >>
> >> Besides Jitsi, another option would be BigBlueButton(BBB). Does anyone
> know how BBB compares with Jitsi?
> >>
> >>
> >>
> >>
> >> huxia...@horebdata.cn
> >>
> >> From: Mike Perez
> >> Date: 2023-03-16 21:54
> >> To: ceph-users
> >> Subject: [ceph-users] Moving From BlueJeans to Jitsi for Ceph meetings
> >> Hi everyone,
> >>
> >> We have been using BlueJeans to meet and record some of our meetings
> >> that later get posted to our YouTube channel. Unfortunately, we have
> >> to figure out a new meeting platform due to Red Hat discontinuing
> >> BlueJeans by the end of this month.
> >>
> >> Google Meets is an option, but some users in other countries have
> >> trouble using Google's services.
> >>
> >> For some meetings, we have tried out Jitsi, and it works well, meets
> >> our requirements, and is free.
> >>
> >> Does anyone else have suggestions for another free meeting platform
> >> that provides recording capabilities?
> >>
> >> --
> >> Mike Perez
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
> >
> >
> > --
> > Mike Perez
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy v17.2.6 QE Validation status

2023-03-21 Thread Laura Flores
I reviewed the upgrade tests.

I opened two new trackers:
1. https://tracker.ceph.com/issues/59121 - "No upgrade in progress" during
upgrade tests - Ceph - Orchestrator
2. https://tracker.ceph.com/issues/59124 - "Health check failed: 1/3 mons
down, quorum b,c (MON_DOWN)" during quincy p2p upgrade test - Ceph - RADOS

Starred (*) trackers occurred frequently throughout the suites.

https://pulpito.ceph.com/yuriw-2023-03-14_21:33:13-upgrade:octopus-x-quincy-release-distro-default-smithi/
Failures:
1. *https://tracker.ceph.com/issues/59121*
2. *https://tracker.ceph.com/issues/56393*
3. https://tracker.ceph.com/issues/53615
Details:
1. *"No upgrade in progress" during upgrade tests - Ceph - Orchestrator*
2. *thrash-erasure-code-big: failed to complete snap trimming before
timeout - Ceph - RADOS*
3. qa: upgrade test fails with "timeout expired in wait_until_healthy"
- Ceph - CephFS

https://pulpito.ceph.com/yuriw-2023-03-15_21:14:59-upgrade:pacific-x-quincy-release-distro-default-smithi/
Failures:
1. *https://tracker.ceph.com/issues/56393*
2. https://tracker.ceph.com/issues/58914
3. *https://tracker.ceph.com/issues/59121*
4. https://tracker.ceph.com/issues/59123
Details:
1. *thrash-erasure-code-big: failed to complete snap trimming before
timeout - Ceph - RADOS*
2. [ FAILED ] TestClsRbd.group_snap_list_max_read in
upgrade:quincy-x-reef - Ceph - RBD
3. *"No upgrade in progress" during upgrade tests*
4. Timeout opening channel - Tools - Teuthology

https://pulpito.ceph.com/yuriw-2023-03-14_21:36:24-upgrade:quincy-p2p-quincy-release-distro-default-smithi/
Failures:
1. https://tracker.ceph.com/issues/59124
Details:
1. "Health check failed: 1/3 mons down, quorum b,c (MON_DOWN)" during
quincy p2p upgrade test - Ceph - RADOS

https://pulpito.ceph.com/yuriw-2023-03-15_15:25:50-upgrade-clients:client-upgrade-octopus-quincy-quincy-release%C2%A0-distro-default-smithi/
1 test failed due to a failure to fetch a package

https://pulpito.ceph.com/yuriw-2023-03-15_15:26:37-upgrade-clients:client-upgrade-pacific-quincy-quincy-release-distro-default-smithi/
All green

On Tue, Mar 21, 2023 at 3:06 PM Yuri Weinstein  wrote:

> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/59070#note-1
> Release Notes - TBD
>
> The reruns were in the queue for 4 days because of some slowness issues.
> The core team (Neha, Radek, Laura, and others) are trying to narrow
> down the root cause.
>
> Seeking approvals/reviews for:
>
> rados - Neha, Radek, Travis, Ernesto, Adam King (we still have to test
> and merge at least one PR https://github.com/ceph/ceph/pull/50575 for
> the core)
> rgw - Casey
> fs - Venky (the fs suite has an unusually high number of failed jobs,
> any reason to suspect it in the observed slowness?)
> orch - Adam King
> rbd - Ilya
> krbd - Ilya
> upgrade/octopus-x - Laura is looking into failures
> upgrade/pacific-x - Laura is looking into failures
> upgrade/quincy-p2p - Laura is looking into failures
> client-upgrade-octopus-quincy-quincy - missing packages, Adam Kraitman
> is looking into it
> powercycle - Brad
> ceph-volume - needs a rerun on merged
> https://github.com/ceph/ceph-ansible/pull/7409
>
> Please reply to this email with approval and/or trackers of known
> issues/PRs to address them.
>
> Also, share any findings or hypotheses about the slowness in the
> execution of the suite.
>
> Josh, Neha - gibba and LRC upgrades pending major suites approvals.
> RC release - pending major suites approvals.
>
> Thx
> YuriW
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>

-- 

Laura Flores

She/Her/Hers

Software Engineer, Ceph Storage 

Chicago, IL

lflo...@ibm.com | lflo...@redhat.com 
M: +17087388804
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] quincy v17.2.6 QE Validation status

2023-03-21 Thread Yuri Weinstein
Details of this release are summarized here:

https://tracker.ceph.com/issues/59070#note-1
Release Notes - TBD

The reruns were in the queue for 4 days because of some slowness issues.
The core team (Neha, Radek, Laura, and others) are trying to narrow
down the root cause.

Seeking approvals/reviews for:

rados - Neha, Radek, Travis, Ernesto, Adam King (we still have to test
and merge at least one PR https://github.com/ceph/ceph/pull/50575 for
the core)
rgw - Casey
fs - Venky (the fs suite has an unusually high number of failed jobs,
any reason to suspect it in the observed slowness?)
orch - Adam King
rbd - Ilya
krbd - Ilya
upgrade/octopus-x - Laura is looking into failures
upgrade/pacific-x - Laura is looking into failures
upgrade/quincy-p2p - Laura is looking into failures
client-upgrade-octopus-quincy-quincy - missing packages, Adam Kraitman
is looking into it
powercycle - Brad
ceph-volume - needs a rerun on merged
https://github.com/ceph/ceph-ansible/pull/7409

Please reply to this email with approval and/or trackers of known
issues/PRs to address them.

Also, share any findings or hypotheses about the slowness in the
execution of the suite.

Josh, Neha - gibba and LRC upgrades pending major suites approvals.
RC release - pending major suites approvals.

Thx
YuriW
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Moving From BlueJeans to Jitsi for Ceph meetings

2023-03-21 Thread Federico Lucifredi
Jitsi is really good, and getting better — we have been using it with my local 
User’s Group for the last couple of years.

My only observation is to find out the maximum allowable number of guests in
advance if this is not already known - we had a fairly generous allowance in
the BlueJeans accounts for Red Hat, and Jitsi community accounts may not be as large.

Best-F

> On Mar 21, 2023, at 12:26, Mike Perez  wrote:
> 
> I'm not familiar with BBB myself. Are there any objections to Jitsi? I
> want to update the calendar invites this week.
> 
>> On Thu, Mar 16, 2023 at 6:16 PM huxia...@horebdata.cn
>>  wrote:
>> 
>> Besides Jitsi, another option would be BigBlueButton(BBB). Does anyone know 
>> how BBB compares with Jitsi?
>> 
>> 
>> 
>> 
>> huxia...@horebdata.cn
>> 
>> From: Mike Perez
>> Date: 2023-03-16 21:54
>> To: ceph-users
>> Subject: [ceph-users] Moving From BlueJeans to Jitsi for Ceph meetings
>> Hi everyone,
>> 
>> We have been using BlueJeans to meet and record some of our meetings
>> that later get posted to our YouTube channel. Unfortunately, we have
>> to figure out a new meeting platform due to Red Hat discontinuing
>> BlueJeans by the end of this month.
>> 
>> Google Meets is an option, but some users in other countries have
>> trouble using Google's services.
>> 
>> For some meetings, we have tried out Jitsi, and it works well, meets
>> our requirements, and is free.
>> 
>> Does anyone else have suggestions for another free meeting platform
>> that provides recording capabilities?
>> 
>> --
>> Mike Perez
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> 
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> 
> 
> 
> -- 
> Mike Perez
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: s3 compatible interface

2023-03-21 Thread Fox, Kevin M
Will either the file store or the posix/gpfs filter support the underlying 
files changing underneath so you can access the files either through s3 or by 
other out of band means (smb, nfs, etc)?

Thanks,
Kevin


From: Matt Benjamin 
Sent: Monday, March 20, 2023 5:27 PM
To: Chris MacNaughton
Cc: ceph-users@ceph.io; Kyle Bader
Subject: [ceph-users] Re: s3 compatible interface


Hi Chris,

This looks useful.  Note for this thread:  this *looks like* it's using the
zipper dbstore backend?  Yes, that's coming in Reef.  We think of dbstore
as mostly the zipper reference driver, but it can be useful as a standalone
setup, potentially.

But there's now a prototype of a posix file filter that can be stacked on
dbstore (or rados, I guess)--not yet merged, and iiuc post-Reef.  That's
the project Daniel was describing.  The posix/gpfs filter is aiming for
being thin and fast and horizontally scalable.

The s3gw project that Clyso and folks were writing about is distinct from
both of these.  I *think* it's truthful to say that s3gw is its own
thing--a hybrid backing store with objects in files, but also metadata
atomicity from an embedded db--plus interesting orchestration.

Matt

On Mon, Mar 20, 2023 at 3:45 PM Chris MacNaughton <
chris.macnaugh...@canonical.com> wrote:

> On 3/20/23 12:02, Frank Schilder wrote:
>
> Hi Marc,
>
> I'm also interested in an S3 service that uses a file system as a back-end. I
> looked at the documentation of https://github.com/aquarist-labs/s3gw and have
> to say that it doesn't make much sense to me. I don't see this kind of gateway
> anywhere there. What I see is a build of a rados gateway that can be pointed
> at a ceph cluster. That's not a gateway to an FS.
>
> Did I misunderstand your actual request or can you point me to the part of 
> the documentation where it says how to spin up an S3 interface using a file 
> system for user data?
>
> The only thing I found is
> https://s3gw-docs.readthedocs.io/en/latest/helm-charts/#local-storage,
> but it sounds to me that this is not where the user data will be going.
>
> Thanks for any hints and best regards,
>
>
> for testing you can try: https://github.com/aquarist-labs/s3gw
>
> Yes indeed, that looks like it can be used with a simple fs backend.
>
> Hey,
>
> (Re-sending this email from a mailing-list subscribed email)
>
> I was playing around with RadosGW's file backend (coming in Reef, zipper)
> a few months back and ended up making this docker container that just works
> to setup things:
> https://github.com/ChrisMacNaughton/ceph-rgw-docker; published (still,
> maybe for a while?) at https://hub.docker.com/r/iceyec/ceph-rgw-zipper
>
> Chris
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


--

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103


[ceph-users] Re: Moving From BlueJeans to Jitsi for Ceph meetings

2023-03-21 Thread Mike Perez
I'm not familiar with BBB myself. Are there any objections to Jitsi? I
want to update the calendar invites this week.

On Thu, Mar 16, 2023 at 6:16 PM huxia...@horebdata.cn
 wrote:
>
> Besides Jitsi, another option would be BigBlueButton(BBB). Does anyone know 
> how BBB compares with Jitsi?
>
>
>
>
> huxia...@horebdata.cn
>
> From: Mike Perez
> Date: 2023-03-16 21:54
> To: ceph-users
> Subject: [ceph-users] Moving From BlueJeans to Jitsi for Ceph meetings
> Hi everyone,
>
> We have been using BlueJeans to meet and record some of our meetings
> that later get posted to our YouTube channel. Unfortunately, we have
> to figure out a new meeting platform due to Red Hat discontinuing
> BlueJeans by the end of this month.
>
> Google Meets is an option, but some users in other countries have
> trouble using Google's services.
>
> For some meetings, we have tried out Jitsi, and it works well, meets
> our requirements, and is free.
>
> Does anyone else have suggestions for another free meeting platform
> that provides recording capabilities?
>
> --
> Mike Perez
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Mike Perez
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] MDS host in OSD blacklist

2023-03-21 Thread Frank Schilder
Hi all,

we have an octopus v15.2.17 cluster and observe that one of our MDS hosts 
showed up in the OSD blacklist:

# ceph osd blacklist ls
192.168.32.87:6801/3841823949 2023-03-22T10:08:02.589698+0100
192.168.32.87:6800/3841823949 2023-03-22T10:08:02.589698+0100

I see an MDS restart that might be related; see log snippets below. There are 
no clients running on this host, only OSDs and one MDS. What could be the 
reason for the blacklist entries?

Thanks!
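For reference: entries like these are typically added by the monitors when an
MDS instance crashes or is failed over, to fence the old instance off from the
metadata pool; they expire on their own at the timestamp shown. A minimal
sketch for inspecting and, only if the old instance is confirmed gone, clearing
them (the address below is taken from the listing above):

  # list current blacklist entries (they also show up in "ceph osd dump")
  ceph osd blacklist ls

  # optional: remove a specific entry instead of waiting for it to expire
  ceph osd blacklist rm 192.168.32.87:6801/3841823949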

Log snippets:

Mar 21 10:07:54 ceph-23 journal: 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/mds/ScatterLock.h:
 In function 'void ScatterLock::set_xlock_snap_sync(MDSContext*)' thread 
7f99e63d5700 time 2023-03-21T10:07:54.967936+0100
Mar 21 10:07:54 ceph-23 journal: 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/mds/ScatterLock.h:
 59: FAILED ceph_assert(state == LOCK_XLOCK || state == LOCK_XLOCKDONE)
Mar 21 10:07:54 ceph-23 journal: ceph version 15.2.17 
(8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
Mar 21 10:07:54 ceph-23 journal: 1: (ceph::__ceph_assert_fail(char const*, char 
const*, int, char const*)+0x158) [0x7f99f4a25b92]
Mar 21 10:07:54 ceph-23 journal: 2: (()+0x27ddac) [0x7f99f4a25dac]
Mar 21 10:07:54 ceph-23 journal: 3: (MDCache::truncate_inode(CInode*, 
LogSegment*)+0x32c) [0x561bd623962c]
Mar 21 10:07:54 ceph-23 journal: 4: 
(C_MDS_inode_update_finish::finish(int)+0x133) [0x561bd6210a83]
Mar 21 10:07:54 ceph-23 journal: 5: (MDSContext::complete(int)+0x56) 
[0x561bd6422656]
Mar 21 10:07:54 ceph-23 journal: 6: (MDSIOContextBase::complete(int)+0x39c) 
[0x561bd6422b5c]
Mar 21 10:07:54 ceph-23 journal: 7: (MDSLogContextBase::complete(int)+0x44) 
[0x561bd6422cb4]
Mar 21 10:07:54 ceph-23 journal: 8: (Finisher::finisher_thread_entry()+0x1a5) 
[0x7f99f4ab6a95]
Mar 21 10:07:54 ceph-23 journal: 9: (()+0x81ca) [0x7f99f35fb1ca]
Mar 21 10:07:54 ceph-23 journal: 10: (clone()+0x43) [0x7f99f204ddd3]
Mar 21 10:07:54 ceph-23 journal: *** Caught signal (Aborted) **
Mar 21 10:07:54 ceph-23 journal: in thread 7f99e63d5700 thread_name:MR_Finisher
Mar 21 10:07:54 ceph-23 journal: 2023-03-21T10:07:54.980+0100 7f99e63d5700 -1 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/mds/ScatterLock.h:
 In function 'void ScatterLock::set_xlock_snap_sync(MDSContext*)' thread 
7f99e63d5700 time 2023-03-21T10:07:54.967936+0100
Mar 21 10:07:54 ceph-23 journal: 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/mds/ScatterLock.h:
 59: FAILED ceph_assert(state == LOCK_XLOCK || state == LOCK_XLOCKDONE)
Mar 21 10:07:54 ceph-23 journal: 
Mar 21 10:07:54 ceph-23 journal: ceph version 15.2.17 
(8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
Mar 21 10:07:54 ceph-23 journal: 1: (ceph::__ceph_assert_fail(char const*, char 
const*, int, char const*)+0x158) [0x7f99f4a25b92]
Mar 21 10:07:54 ceph-23 journal: 2: (()+0x27ddac) [0x7f99f4a25dac]
Mar 21 10:07:54 ceph-23 journal: 3: (MDCache::truncate_inode(CInode*, 
LogSegment*)+0x32c) [0x561bd623962c]
Mar 21 10:07:54 ceph-23 journal: 4: 
(C_MDS_inode_update_finish::finish(int)+0x133) [0x561bd6210a83]
Mar 21 10:07:54 ceph-23 journal: 5: (MDSContext::complete(int)+0x56) 
[0x561bd6422656]
Mar 21 10:07:54 ceph-23 journal: 6: (MDSIOContextBase::complete(int)+0x39c) 
[0x561bd6422b5c]
Mar 21 10:07:54 ceph-23 journal: 7: (MDSLogContextBase::complete(int)+0x44) 
[0x561bd6422cb4]
Mar 21 10:07:54 ceph-23 journal: 8: (Finisher::finisher_thread_entry()+0x1a5) 
[0x7f99f4ab6a95]
Mar 21 10:07:54 ceph-23 journal: 9: (()+0x81ca) [0x7f99f35fb1ca]
Mar 21 10:07:54 ceph-23 journal: 10: (clone()+0x43) [0x7f99f204ddd3]
Mar 21 10:07:54 ceph-23 journal: 
Mar 21 10:07:54 ceph-23 journal: ceph version 15.2.17 
(8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
Mar 21 10:07:54 ceph-23 journal: 1: (()+0x12ce0) [0x7f99f3605ce0]
Mar 21 10:07:54 ceph-23 journal: 2: (gsignal()+0x10f) [0x7f99f2062a9f]
Mar 21 10:07:54 ceph-23 journal: 3: (abort()+0x127) [0x7f99f2035e05]
Mar 21 10:07:54 ceph-23 journal: 4: (ceph::__ceph_assert_fail(char const*, char 
const*, int, char const*)+0x1a9) [0x7f99f4a25be3]
Mar 21 10:07:54 ceph-23 journal: 5: (()+0x27ddac) [0x7f99f4a25dac]
Mar 21 10:07:54 ceph-23 journal: 6: (MDCache::truncate_inode(CInode*, 
LogSegment*)+0x32c) [0x561bd623962c]
Mar 21 10:07:54 ceph-23 journal: 7: 
(C_MDS_inode_update_finish::finish(int)+0x133) [0x561bd6210a83]
Mar 21 10:07:54 ceph-23 journal: 8: (MDSContext::complete(int)+0x56) 
[0x561bd6422656]
Mar 21 10:07:54 ceph-23 journal: 9: 

[ceph-users] Re: avg apply latency went up after update from octopus to pacific

2023-03-21 Thread Boris Behrens
Hi Igor,
I've offline-compacted all the OSDs and re-enabled bluefs_buffered_io.

It didn't change anything, and the commit and apply latencies are around
5-10 times higher than on our nautilus cluster. The pacific cluster shows a
5-minute mean of 2.2 ms across all OSDs, while the nautilus cluster is around
0.2 - 0.7 ms.

I also see this kind of log message; Google didn't really help:
2023-03-21T14:08:22.089+ 7efe7b911700  3 rocksdb:
[le/block_based/filter_policy.cc:579] Using legacy Bloom filter with high
(20) bits/key. Dramatic filter space and/or accuracy improvement is
available with format_version>=5.
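For reference, a minimal sketch of the knobs discussed in this thread, assuming
a non-containerized Octopus/Pacific cluster; adjust device names and OSD IDs to
your environment:

  # watch per-OSD commit/apply latency
  ceph osd perf

  # toggle bluefs_buffered_io cluster-wide; restarting the OSDs is the safe
  # way to make sure the change is picked up
  ceph config set osd bluefs_buffered_io true
  ceph config get osd bluefs_buffered_io

  # trigger an online RocksDB compaction on all OSDs
  ceph tell osd.* compact

  # check / disable the volatile write cache on a SATA SSD (see the
  # hardware-recommendations link quoted later in the thread; NVMe and SAS
  # devices need different tooling)
  hdparm -W /dev/sdX
  hdparm -W 0 /dev/sdX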




Am Di., 21. März 2023 um 10:46 Uhr schrieb Igor Fedotov <
igor.fedo...@croit.io>:

> Hi Boris,
>
> additionally you might want to manually compact RocksDB for every OSD.
>
>
> Thanks,
>
> Igor
> On 3/21/2023 12:22 PM, Boris Behrens wrote:
>
> Disabling the write cache and the bluefs_buffered_io did not change
> anything.
> What we see is that larger disks seem to be the leader in terms of
> slowness (we have 70% 2TB, 20% 4TB and 10% 8TB SSDs in the cluster), but
> removing some of the 8TB disks and replace them with 2TB (because it's by
> far the majority and we have a lot of them) disks did also not change
> anything.
>
> Are there any other ideas I could try? Customers start to complain about the
> slower performance and our k8s team mentions problems with ETCD because the
> latency is too high.
>
> Would it be an option to recreate every OSD?
>
> Cheers
>  Boris
>
> Am Di., 28. Feb. 2023 um 22:46 Uhr schrieb Boris Behrens  
> :
>
>
> Hi Josh,
> thanks a lot for the breakdown and the links.
> I disabled the write cache but it didn't change anything. Tomorrow I will
> try to disable bluefs_buffered_io.
>
> It doesn't sound that I can mitigate the problem with more SSDs.
>
>
> Am Di., 28. Feb. 2023 um 15:42 Uhr schrieb Josh Baergen 
> :
>
>
> Hi Boris,
>
> OK, what I'm wondering is whether https://tracker.ceph.com/issues/58530 is
> involved. There are two
> aspects to that ticket:
> * A measurable increase in the number of bytes written to disk in
> Pacific as compared to Nautilus
> * The same, but for IOPS
>
> Per the current theory, both are due to the loss of rocksdb log
> recycling when using default recovery options in rocksdb 6.8; Octopus
> uses version 6.1.2, Pacific uses 6.8.1.
>
> 16.2.11 largely addressed the bytes-written amplification, but the
> IOPS amplification remains. In practice, whether this results in a
> write performance degradation depends on the speed of the underlying
> media and the workload, and thus the things I mention in the next
> paragraph may or may not be applicable to you.
>
> There's no known workaround or solution for this at this time. In some
> cases I've seen that disabling bluefs_buffered_io (which itself can
> cause IOPS amplification in some cases) can help; I think most folks
> do this by setting it in local conf and then restarting OSDs in order
> to gain the config change. Something else to consider is
> https://docs.ceph.com/en/quincy/start/hardware-recommendations/#write-caches
> ,
> as sometimes disabling these write caches can improve the IOPS
> performance of SSDs.
>
> Josh
>
> On Tue, Feb 28, 2023 at 7:19 AM Boris Behrens  
>  wrote:
>
> Hi Josh,
> we upgraded 15.2.17 -> 16.2.11 and we only use rbd workload.
>
>
>
> Am Di., 28. Feb. 2023 um 15:00 Uhr schrieb Josh Baergen <
>
> jbaer...@digitalocean.com>:
>
> Hi Boris,
>
> Which version did you upgrade from and to, specifically? And what
> workload are you running (RBD, etc.)?
>
> Josh
>
> On Tue, Feb 28, 2023 at 6:51 AM Boris Behrens  
>  wrote:
>
> Hi,
> today I did the first update from octopus to pacific, and it looks
>
> like the
>
> avg apply latency went up from 1ms to 2ms.
>
> All 36 OSDs are 4TB SSDs and nothing else changed.
> Someone knows if this is an issue, or am I just missing a config
>
> value?
>
> Cheers
>  Boris
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
>
> im groüen Saal.
>
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> groüen Saal.
>
>
> --
> Igor Fedotov
> Ceph Lead Developer
> --
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
> Web  | LinkedIn  |
> Youtube  |
> Twitter 
>
> Meet us at the SC22 Conference! Learn more 
> Technology Fast50 Award Winner by Deloitte
> 
> !
>
>
> 

[ceph-users] Re: Very slow backfilling/remapping of EC pool PGs

2023-03-21 Thread Gauvain Pocentek
On Tue, Mar 21, 2023 at 2:21 PM Clyso GmbH - Ceph Foundation Member <
joachim.kraftma...@clyso.com> wrote:

>
>
> https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/#confval-osd_op_queue
>

Since this requires a restart, I went another way to speed up the recovery
of degraded PGs and avoid weirdness while restarting the OSDs. I've
increased the value of osd_mclock_max_capacity_iops_hdd to a ridiculous
number for spinning disks (6000). The effect is not magical but the
recovery went from 4 to 60 objects/s. Ceph should be back to normal in a
few hours.

I will change the osd_op_queue value once the cluster is stable.
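For reference, a rough sketch of the approaches discussed here, assuming a
Quincy cluster running the default mclock scheduler:

  # override the auto-measured IOPS capacity for all HDD OSDs (runtime change)
  ceph config set osd osd_mclock_max_capacity_iops_hdd 6000

  # or drop a suspicious per-OSD value (it is typically re-measured at the
  # next OSD start)
  ceph config rm osd.137 osd_mclock_max_capacity_iops_hdd

  # alternatively, keep mclock but favour recovery over client traffic
  ceph config set osd osd_mclock_profile high_recovery_ops

  # switch the scheduler back to wpq (effectively disables mclock);
  # this only takes effect after the OSDs have been restarted
  ceph config set osd osd_op_queue wpq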

Thanks for the help, it's been really useful, and I know a little bit more
about Ceph :)

Gauvain



> ___
> Clyso GmbH - Ceph Foundation Member
>
> Am 21.03.23 um 12:51 schrieb Gauvain Pocentek:
>
> (adding back the list)
>
> On Tue, Mar 21, 2023 at 11:25 AM Joachim Kraftmayer <
> joachim.kraftma...@clyso.com> wrote:
>
>> i added the questions and answers below.
>>
>> ___
>> Best Regards,
>> Joachim Kraftmayer
>> CEO | Clyso GmbH
>>
>> Clyso GmbH
>> p: +49 89 21 55 23 91 2
>> a: Loristraße 8 | 80335 München | Germany
>> w: https://clyso.com | e: joachim.kraftma...@clyso.com
>>
>> We are hiring: https://www.clyso.com/jobs/
>> ---
>> CEO: Dipl. Inf. (FH) Joachim Kraftmayer
>> Unternehmenssitz: Utting am Ammersee
>> Handelsregister beim Amtsgericht: Augsburg
>> Handelsregister-Nummer: HRB 25866
>> USt. ID-Nr.: DE275430677
>>
>> Am 21.03.23 um 11:14 schrieb Gauvain Pocentek:
>>
>> Hi Joachim,
>>
>>
>> On Tue, Mar 21, 2023 at 10:13 AM Joachim Kraftmayer <
>> joachim.kraftma...@clyso.com> wrote:
>>
>>> Which Ceph version are you running, is mclock active?
>>>
>>>
>> We're using Quincy (17.2.5), upgraded step by step from Luminous if I
>> remember correctly.
>>
>> did you recreate the osds? if yes, at which version?
>>
>
> I actually don't remember all the history, but I think we added the HDD
> nodes while running Pacific.
>
>
>
>>
>> mclock seems active, set to high_client_ops profile. HDD OSDs have very
>> different settings for max capacity iops:
>>
>> osd.137basic osd_mclock_max_capacity_iops_hdd
>>  929.763899
>> osd.161basic osd_mclock_max_capacity_iops_hdd
>>  4754.250946
>> osd.222basic osd_mclock_max_capacity_iops_hdd
>>  540.016984
>> osd.281basic osd_mclock_max_capacity_iops_hdd
>>  1029.193945
>> osd.282basic osd_mclock_max_capacity_iops_hdd
>>  1061.762870
>> osd.283basic osd_mclock_max_capacity_iops_hdd
>>  462.984562
>>
>> We haven't set those explicitly, could they be the reason of the slow
>> recovery?
>>
>> I recommend disabling mclock for now, and yes, we have seen slow recovery
>> caused by mclock.
>>
>
> Stupid question: how do you do that? I've looked through the docs but
> could only find information about changing the settings.
>
>
>>
>>
>> Bonus question: does ceph set that itself?
>>
>> Yes, and if you have a setup with HDD + SSD (db & wal), the discovery does
>> not work in the right way.
>>
>
> Good to know!
>
>
> Gauvain
>
>
>>
>> Thanks!
>>
>> Gauvain
>>
>>
>>
>>
>>> Joachim
>>>
>>> ___
>>> Clyso GmbH - Ceph Foundation Member
>>>
>>> Am 21.03.23 um 06:53 schrieb Gauvain Pocentek:
>>> > Hello all,
>>> >
>>> > We have an EC (4+2) pool for RGW data, with HDDs + SSDs for WAL/DB.
>>> This
>>> > pool has 9 servers with each 12 disks of 16TBs. About 10 days ago we
>>> lost a
>>> > server and we've removed its OSDs from the cluster. Ceph has started to
>>> > remap and backfill as expected, but the process has been getting
>>> slower and
>>> > slower. Today the recovery rate is around 12 MiB/s and 10 objects/s.
>>> All
>>> > the remaining unclean PGs are backfilling:
>>> >
>>> >data:
>>> >  volumes: 1/1 healthy
>>> >  pools:   14 pools, 14497 pgs
>>> >  objects: 192.38M objects, 380 TiB
>>> >  usage:   764 TiB used, 1.3 PiB / 2.1 PiB avail
>>> >  pgs: 771559/1065561630 objects degraded (0.072%)
>>> >   1215899/1065561630 objects misplaced (0.114%)
>>> >   14428 active+clean
>>> >   50active+undersized+degraded+remapped+backfilling
>>> >   18active+remapped+backfilling
>>> >   1 active+clean+scrubbing+deep
>>> >
>> > We've checked the health of the remaining servers, and everything looks
>> > fine (CPU/RAM/network/disks).
>>> >
>>> > Any hints on what could be happening?
>>> >
>>> > Thank you,
>>> > Gauvain
>>> > ___
>>> > ceph-users mailing list -- ceph-users@ceph.io
>>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>>>
>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Very slow backfilling/remapping of EC pool PGs

2023-03-21 Thread Clyso GmbH - Ceph Foundation Member


https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/#confval-osd_op_queue

___
Clyso GmbH - Ceph Foundation Member

Am 21.03.23 um 12:51 schrieb Gauvain Pocentek:

(adding back the list)

On Tue, Mar 21, 2023 at 11:25 AM Joachim Kraftmayer 
 wrote:


i added the questions and answers below.

___
Best Regards,
Joachim Kraftmayer
CEO | Clyso GmbH

Clyso GmbH
p: +49 89 21 55 23 91 2
a: Loristraße 8 | 80335 München | Germany
w:https://clyso.com  | e:joachim.kraftma...@clyso.com

We are hiring:https://www.clyso.com/jobs/
---
CEO: Dipl. Inf. (FH) Joachim Kraftmayer
Unternehmenssitz: Utting am Ammersee
Handelsregister beim Amtsgericht: Augsburg
Handelsregister-Nummer: HRB 25866
USt. ID-Nr.: DE275430677

Am 21.03.23 um 11:14 schrieb Gauvain Pocentek:

Hi Joachim,


On Tue, Mar 21, 2023 at 10:13 AM Joachim Kraftmayer
 wrote:

Which Ceph version are you running, is mclock active?


We're using Quincy (17.2.5), upgraded step by step from Luminous
if I remember correctly.

did you recreate the osds? if yes, at which version?


I actually don't remember all the history, but I think we added the 
HDD nodes while running Pacific.




mclock seems active, set to high_client_ops profile. HDD OSDs have
very different settings for max capacity iops:

osd.137        basic osd_mclock_max_capacity_iops_hdd  929.763899
osd.161        basic osd_mclock_max_capacity_iops_hdd  4754.250946
osd.222        basic osd_mclock_max_capacity_iops_hdd  540.016984
osd.281        basic osd_mclock_max_capacity_iops_hdd  1029.193945
osd.282        basic osd_mclock_max_capacity_iops_hdd  1061.762870
osd.283        basic osd_mclock_max_capacity_iops_hdd  462.984562

We haven't set those explicitly, could they be the reason of the
slow recovery?


I recommend disabling mclock for now, and yes, we have seen slow
recovery caused by mclock.


Stupid question: how do you do that? I've looked through the docs but 
could only find information about changing the settings.





Bonus question: does ceph set that itself?

Yes, and if you have a setup with HDD + SSD (db & wal), the
discovery does not work in the right way.


Good to know!


Gauvain



Thanks!

Gauvain


Joachim

___
Clyso GmbH - Ceph Foundation Member

Am 21.03.23 um 06:53 schrieb Gauvain Pocentek:
> Hello all,
>
> We have an EC (4+2) pool for RGW data, with HDDs + SSDs for
WAL/DB. This
> pool has 9 servers with each 12 disks of 16TBs. About 10
days ago we lost a
> server and we've removed its OSDs from the cluster. Ceph
has started to
> remap and backfill as expected, but the process has been
getting slower and
> slower. Today the recovery rate is around 12 MiB/s and 10
objects/s. All
> the remaining unclean PGs are backfilling:
>
>    data:
>      volumes: 1/1 healthy
>      pools:   14 pools, 14497 pgs
>      objects: 192.38M objects, 380 TiB
>      usage:   764 TiB used, 1.3 PiB / 2.1 PiB avail
>      pgs:     771559/1065561630 objects degraded (0.072%)
>               1215899/1065561630 objects misplaced (0.114%)
>               14428 active+clean
>               50
active+undersized+degraded+remapped+backfilling
>               18 active+remapped+backfilling
>               1  active+clean+scrubbing+deep
>
> We've checked the health of the remaining servers, and everything looks
> fine (CPU/RAM/network/disks).
>
> Any hints on what could be happening?
>
> Thank you,
> Gauvain
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Bluestore tweaks for Bcache

2023-03-21 Thread Matthias Ferdinand
Hi,

I found a way to preserve the rotational=1 flag for bcache-backed OSDs between
reboots. Using a systemd drop-in for ceph-osd@.service, it now uses lsblk to
look for a bcache device somewhere below the OSD, but sets rotational=1 in the
uppermost LVM volume device mapper target only. This is sufficient to
keep the osd metadata on rotational:1.

Setting rotational=1 additionally on the bcache device itself or along
the path up in the stack would certainly be possible, but be even more
convoluted than this.

This simpler version works well when all bcache-backed OSDs are finally
backed by rotating media. If you mix bcaches backed by HDD and backed
by flash on the same host you would need to dig further.

Regards
Matthias

#---

systemd drop-in at
/etc/systemd/system/ceph-osd@.service.d/10-set-bcache-rotational-flags.conf:

[Service]
ExecStartPre=/usr/local/sbin/set-bcache-rotational-flags %i

#---

/usr/local/sbin/set-bcache-rotational-flags:

#!/bin/sh

OSD_ID=$1
echo "# set rotational flag for osd.${OSD_ID} block device"

date
grep . /sys/block/*/queue/rotational | grep -v -e '^/sys/block/loop' -e '/sys/block/sr'

sleep 1

# ASSUMPTION: on this host, any bcache device holding OSD data is
#   backed by rotating media.
#
# This simplification allows to skip determining the exact bcache
# device and the exact backing device underneath this bcache device
# from where to copy the rotational flag to the ceph OSD LVM
# volume rotational flag.
#
# Instead, set the LVM volume rotational flag to '1' if lsblk
# finds any bcache device underneath. The rotational flag of the
# bcache device itself is not modified.

dev_basename=`readlink -f /var/lib/ceph/osd/ceph-${OSD_ID}/block | xargs -r basename`

if [ -n "${dev_basename}" ]; then
bcache_major=`awk '$2=="bcache" { print $1; }' /proc/devices`
if [ -n "${bcache_major}" ]; then
if lsblk --list --inverse --noheadings -o name,maj:min /dev/${dev_basename} | grep -q "^bcache[0-9]*[[:space:]][[:space:]]*${bcache_major}:.*$"; then
# this OSD sits on a bcache
r=/sys/block/${dev_basename}/queue/rotational
if [ -e "${r}" ]; then
echo "# setting rotational=1 on ${r}"
echo "1" >${r}
fi
fi
fi
fi


#---
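For completeness, a quick way to verify after a reboot that an OSD actually
picked up the intended flag (osd.0 and bcache0 used here only as examples):

  # rotational flag as seen by the kernel for the bcache device
  cat /sys/block/bcache0/queue/rotational

  # what the OSD recorded at startup
  ceph osd metadata 0 | grep -E '"rotational"|bdev_type'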


On Thu, Feb 02, 2023 at 12:18:55AM +0100, Matthias Ferdinand wrote:
> ceph version: 17.2.0 on Ubuntu 22.04
>   non-containerized ceph from Ubuntu repos
>   cluster started on luminous
> 
> I have been using bcache on filestore on rotating disks for many years
> without problems.  Now converting OSDs to bluestore, there are some
> strange effects.
> 
> If I create the bcache device, set its rotational flag to '1', then do
> ceph-volume lvm create ... --crush-device-class=hdd
> the OSD comes up with the right parameters and much improved latency
> compared to OSD directly on /dev/sdX. 
> 
> ceph osd metadata ...
> shows
> "bluestore_bdev_type": "hdd",
> "rotational": "1"
> 
> But after reboot, the bcache rotational flag is set to '0' again, and the OSD
> now comes up with "rotational": "0"
> Latency immediately starts to increase (and continually increases over
> the next days, possibly due to accumulating fragmentation).
> 
> These wrong settings stay in place even if I stop the OSD, set the
> bcache rotational flag to '1' again and restart the OSD. I have found no
> way to get back to the original settings other than destroying and
> recreating the OSD. I guess I am just not seeing something obvious, like
> from where these settings get pulled at OSD startup.
> 
> I even created udev rules to set bcache rotational=1 at boot time,
> before any ceph daemon starts, but it did not help. Something running
> after these rules reset the bcache rotational flags back to 0.
> Haven't found the culprit yet, but not sure if it even matters.
> 
> Are these OSD settings (bluestore_bdev_type, rotational) persisted
> somewhere and can they be edited and pinned?
> 
> Alternatively, can I manually set and persist the relevant bluestore
> tunables (per OSD / per device class) so as to make the bcache
> rotational flag irrelevant after the OSD is first created?
> 
> Regards
> Matthias
> 
> 
> On Fri, Apr 08, 2022 at 03:05:38PM +0300, Igor Fedotov wrote:
> > Hi Frank,
> > 
> > in fact this parameter impacts OSD behavior at both build-time and during
> > regular operation. It simply substitutes hdd/ssd auto-detection with
> > manual specification.  And hence relevant config parameters are applied. If
> > e.g. min_alloc_size is persistent after OSD creation - it wouldn't be
> > updated. But if a specific setting allows changes at run-time, it would be altered.
> > 
> > So the proper usage would definitely be manual ssd/hdd mode selection before
> > the 

[ceph-users] Re: Very slow backfilling/remapping of EC pool PGs

2023-03-21 Thread Gauvain Pocentek
(adding back the list)

On Tue, Mar 21, 2023 at 11:25 AM Joachim Kraftmayer <
joachim.kraftma...@clyso.com> wrote:

> i added the questions and answers below.
>
> ___
> Best Regards,
> Joachim Kraftmayer
> CEO | Clyso GmbH
>
> Clyso GmbH
> p: +49 89 21 55 23 91 2
> a: Loristraße 8 | 80335 München | Germany
> w: https://clyso.com | e: joachim.kraftma...@clyso.com
>
> We are hiring: https://www.clyso.com/jobs/
> ---
> CEO: Dipl. Inf. (FH) Joachim Kraftmayer
> Unternehmenssitz: Utting am Ammersee
> Handelsregister beim Amtsgericht: Augsburg
> Handelsregister-Nummer: HRB 25866
> USt. ID-Nr.: DE275430677
>
> Am 21.03.23 um 11:14 schrieb Gauvain Pocentek:
>
> Hi Joachim,
>
>
> On Tue, Mar 21, 2023 at 10:13 AM Joachim Kraftmayer <
> joachim.kraftma...@clyso.com> wrote:
>
>> Which Ceph version are you running, is mclock active?
>>
>>
> We're using Quincy (17.2.5), upgraded step by step from Luminous if I
> remember correctly.
>
> did you recreate the osds? if yes, at which version?
>

I actually don't remember all the history, but I think we added the HDD
nodes while running Pacific.



>
> mclock seems active, set to high_client_ops profile. HDD OSDs have very
> different settings for max capacity iops:
>
> osd.137basic osd_mclock_max_capacity_iops_hdd
>  929.763899
> osd.161basic osd_mclock_max_capacity_iops_hdd
>  4754.250946
> osd.222basic osd_mclock_max_capacity_iops_hdd
>  540.016984
> osd.281basic osd_mclock_max_capacity_iops_hdd
>  1029.193945
> osd.282basic osd_mclock_max_capacity_iops_hdd
>  1061.762870
> osd.283basic osd_mclock_max_capacity_iops_hdd
>  462.984562
>
> We haven't set those explicitly, could they be the reason of the slow
> recovery?
>
> I recommend disabling mclock for now, and yes, we have seen slow recovery
> caused by mclock.
>

Stupid question: how do you do that? I've looked through the docs but could
only find information about changing the settings.


>
>
> Bonus question: does ceph set that itself?
>
> Yes, and if you have a setup with HDD + SSD (db & wal), the discovery does
> not work in the right way.
>

Good to know!


Gauvain


>
> Thanks!
>
> Gauvain
>
>
>
>
>> Joachim
>>
>> ___
>> Clyso GmbH - Ceph Foundation Member
>>
>> Am 21.03.23 um 06:53 schrieb Gauvain Pocentek:
>> > Hello all,
>> >
>> > We have an EC (4+2) pool for RGW data, with HDDs + SSDs for WAL/DB. This
>> > pool has 9 servers with each 12 disks of 16TBs. About 10 days ago we
>> lost a
>> > server and we've removed its OSDs from the cluster. Ceph has started to
>> > remap and backfill as expected, but the process has been getting slower
>> and
>> > slower. Today the recovery rate is around 12 MiB/s and 10 objects/s. All
>> > the remaining unclean PGs are backfilling:
>> >
>> >data:
>> >  volumes: 1/1 healthy
>> >  pools:   14 pools, 14497 pgs
>> >  objects: 192.38M objects, 380 TiB
>> >  usage:   764 TiB used, 1.3 PiB / 2.1 PiB avail
>> >  pgs: 771559/1065561630 objects degraded (0.072%)
>> >   1215899/1065561630 objects misplaced (0.114%)
>> >   14428 active+clean
>> >   50active+undersized+degraded+remapped+backfilling
>> >   18active+remapped+backfilling
>> >   1 active+clean+scrubbing+deep
>> >
>> > We've checked the health of the remaining servers, and everything looks
>> > fine (CPU/RAM/network/disks).
>> >
>> > Any hints on what could be happening?
>> >
>> > Thank you,
>> > Gauvain
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Changing os to ubuntu from centos 8

2023-03-21 Thread Szabo, Istvan (Agoda)
Thank you, I’ll take note of that and give it a try.

Istvan Szabo
Staff Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

From: Boris Behrens 
Sent: Tuesday, March 21, 2023 4:29 PM
To: Szabo, Istvan (Agoda) ; Ceph Users 

Cc: dietr...@internet-sicherheit.de; ji...@spets.org
Subject: Re: [ceph-users] Changing os to ubuntu from centos 8


Hi Istvan,

I'm currently making the move from CentOS 7 to Ubuntu 18.04 (we want to jump directly 
from Nautilus to Pacific). When everything in the cluster runs the same version, 
and that version is available on the new OS, you can just reinstall the hosts 
with the new OS.

With the mons, I remove the current mon from the list while reinstalling and 
recreate the mon afterward, so I don't need to carry over any files. With the 
OSD hosts I just set the cluster to "noout" and have the system down for 20 
minutes, which is about the time I require to install the new OS and provision 
all the configs. Afterwards I just start all the OSDs (ceph-volume lvm activate 
--all) and wait for the cluster to become green again.

Cheers
 Boris

Am Di., 21. März 2023 um 08:54 Uhr schrieb Szabo, Istvan (Agoda) 
mailto:istvan.sz...@agoda.com>>:
Hi,

I'd like to change the os to ubuntu 20.04.5 from my bare metal deployed octopus 
15.2.14 on centos 8. On the first run I would go with octopus 15.2.17 just to 
not make big changes in the cluster.
I've found a couple of threads on the mailing list, but those were containerized 
(like: Re: Upgrade/migrate host operating system for ceph nodes (CentOS/Rocky) 
or  Re: Migrating CEPH OS looking for suggestions).

I wonder what the proper steps are for this kind of migration. Do we need to start 
with mgr or mon or rgw or osd?
Is it possible to reuse the osd with ceph-volume scan on the reinstalled 
machine?
I'd stay with a bare-metal deployment, and maybe even with octopus, but I'm curious 
about your advice.

Thank you


This message is confidential and is for the sole use of the intended 
recipient(s). It may also be privileged or otherwise protected by copyright or 
other legal rules. If you have received it by mistake please let us know by 
reply email and delete it from your system. It is prohibited to copy this 
message or disclose its content to anyone. Any confidentiality or privilege is 
not waived or lost by any mistaken delivery or unauthorized disclosure of the 
message. All messages sent to and from Agoda may be monitored to ensure 
compliance with company policies, to protect the company's interests and to 
remove potential malware. Electronic messages may be intercepted, amended, lost 
or deleted, or contain viruses.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to 
ceph-users-le...@ceph.io


--
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im 
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: avg apply latency went up after update from octopus to pacific

2023-03-21 Thread Igor Fedotov

Hi Boris,

additionally you might want to manually compact RocksDB for every OSD.


Thanks,

Igor

On 3/21/2023 12:22 PM, Boris Behrens wrote:

Disabling the write cache and the bluefs_buffered_io did not change
anything.
What we see is that larger disks seem to be the leader in terms of
slowness (we have 70% 2TB, 20% 4TB and 10% 8TB SSDs in the cluster), but
removing some of the 8TB disks and replace them with 2TB (because it's by
far the majority and we have a lot of them) disks did also not change
anything.

Are there any other ideas I could try? Customers start to complain about the
slower performance and our k8s team mentions problems with ETCD because the
latency is too high.

Would it be an option to recreate every OSD?

Cheers
  Boris

Am Di., 28. Feb. 2023 um 22:46 Uhr schrieb Boris Behrens:


Hi Josh,
thanks a lot for the breakdown and the links.
I disabled the write cache but it didn't change anything. Tomorrow I will
try to disable bluefs_buffered_io.

It doesn't sound that I can mitigate the problem with more SSDs.


Am Di., 28. Feb. 2023 um 15:42 Uhr schrieb Josh Baergen <
jbaer...@digitalocean.com>:


Hi Boris,

OK, what I'm wondering is whether
https://tracker.ceph.com/issues/58530  is involved. There are two
aspects to that ticket:
* A measurable increase in the number of bytes written to disk in
Pacific as compared to Nautilus
* The same, but for IOPS

Per the current theory, both are due to the loss of rocksdb log
recycling when using default recovery options in rocksdb 6.8; Octopus
uses version 6.1.2, Pacific uses 6.8.1.

16.2.11 largely addressed the bytes-written amplification, but the
IOPS amplification remains. In practice, whether this results in a
write performance degradation depends on the speed of the underlying
media and the workload, and thus the things I mention in the next
paragraph may or may not be applicable to you.

There's no known workaround or solution for this at this time. In some
cases I've seen that disabling bluefs_buffered_io (which itself can
cause IOPS amplification in some cases) can help; I think most folks
do this by setting it in local conf and then restarting OSDs in order
to gain the config change. Something else to consider is

https://docs.ceph.com/en/quincy/start/hardware-recommendations/#write-caches
,
as sometimes disabling these write caches can improve the IOPS
performance of SSDs.

Josh

On Tue, Feb 28, 2023 at 7:19 AM Boris Behrens  wrote:

Hi Josh,
we upgraded 15.2.17 -> 16.2.11 and we only use rbd workload.



Am Di., 28. Feb. 2023 um 15:00 Uhr schrieb Josh Baergen <

jbaer...@digitalocean.com>:

Hi Boris,

Which version did you upgrade from and to, specifically? And what
workload are you running (RBD, etc.)?

Josh

On Tue, Feb 28, 2023 at 6:51 AM Boris Behrens  wrote:

Hi,
today I did the first update from octopus to pacific, and it looks

like the

avg apply latency went up from 1ms to 2ms.

All 36 OSDs are 4TB SSDs and nothing else changed.
Someone knows if this is an issue, or am I just missing a config

value?

Cheers
  Boris
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



--
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend

im groüen Saal.



--
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.




--
Igor Fedotov
Ceph Lead Developer
--
croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web  | LinkedIn  | 
Youtube  | 
Twitter 


Meet us at the SC22 Conference! Learn more 
Technology Fast50 Award Winner by Deloitte 
!



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Changing os to ubuntu from centos 8

2023-03-21 Thread Boris Behrens
Hi Istvan,

I'm currently making the move from CentOS 7 to Ubuntu 18.04 (we want to jump
directly from Nautilus to Pacific). When everything in the cluster runs the
same version, and that version is available on the new OS, you can just
reinstall the hosts with the new OS.

With the mons, I remove the current mon from the list while reinstalling
and recreate the mon afterward, so I don't need to carry over any files.
With the OSD hosts I just set the cluster to "noout" and have the system
down for 20 minutes, which is about the time I require to install the new
OS and provision all the configs. Afterwards I just start all the OSDs
(ceph-volume lvm activate --all) and wait for the cluster to become green
again.
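
As a rough sketch of that per-host cycle, assuming non-containerized OSDs whose
LVM volumes survive the reinstall and that ceph.conf plus the keyrings are
restored first:

  # before taking the host down
  ceph osd set noout

  # ... reinstall the OS, install matching ceph packages, restore configs ...

  # bring the OSDs back up from their existing LVM volumes
  ceph-volume lvm activate --all

  # once the cluster reports HEALTH_OK again, re-enable rebalancing
  ceph -s
  ceph osd unset noout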

Cheers
 Boris

Am Di., 21. März 2023 um 08:54 Uhr schrieb Szabo, Istvan (Agoda) <
istvan.sz...@agoda.com>:

> Hi,
>
> I'd like to change the os to ubuntu 20.04.5 from my bare metal deployed
> octopus 15.2.14 on centos 8. On the first run I would go with octopus
> 15.2.17 just to not make big changes in the cluster.
> I've found a couple of threads on the mailing list, but those were
> containerized (like: Re: Upgrade/migrate host operating system for ceph
> nodes (CentOS/Rocky) or  Re: Migrating CEPH OS looking for suggestions).
>
> I wonder what the proper steps are for this kind of migration. Do we need to
> start with mgr or mon or rgw or osd?
> Is it possible to reuse the osd with ceph-volume scan on the reinstalled
> machine?
> I'd stay with a bare-metal deployment, and maybe even with octopus, but I'm
> curious about your advice.
>
> Thank you
>
> 
> This message is confidential and is for the sole use of the intended
> recipient(s). It may also be privileged or otherwise protected by copyright
> or other legal rules. If you have received it by mistake please let us know
> by reply email and delete it from your system. It is prohibited to copy
> this message or disclose its content to anyone. Any confidentiality or
> privilege is not waived or lost by any mistaken delivery or unauthorized
> disclosure of the message. All messages sent to and from Agoda may be
> monitored to ensure compliance with company policies, to protect the
> company's interests and to remove potential malware. Electronic messages
> may be intercepted, amended, lost or deleted, or contain viruses.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: avg apply latency went up after update from octopus to pacific

2023-03-21 Thread Boris Behrens
Disabling the write cache and the bluefs_buffered_io did not change
anything.
What we see is that larger disks seem to be the leader in terms of
slowness (we have 70% 2TB, 20% 4TB and 10% 8TB SSDs in the cluster), but
removing some of the 8TB disks and replace them with 2TB (because it's by
far the majority and we have a lot of them) disks did also not change
anything.

Are there any other ideas I could try? Customers start to complain about the
slower performance and our k8s team mentions problems with ETCD because the
latency is too high.

Would it be an option to recreate every OSD?

Cheers
 Boris

Am Di., 28. Feb. 2023 um 22:46 Uhr schrieb Boris Behrens :

> Hi Josh,
> thanks a lot for the breakdown and the links.
> I disabled the write cache but it didn't change anything. Tomorrow I will
> try to disable bluefs_buffered_io.
>
> It doesn't sound that I can mitigate the problem with more SSDs.
>
>
> Am Di., 28. Feb. 2023 um 15:42 Uhr schrieb Josh Baergen <
> jbaer...@digitalocean.com>:
>
>> Hi Boris,
>>
>> OK, what I'm wondering is whether
>> https://tracker.ceph.com/issues/58530 is involved. There are two
>> aspects to that ticket:
>> * A measurable increase in the number of bytes written to disk in
>> Pacific as compared to Nautilus
>> * The same, but for IOPS
>>
>> Per the current theory, both are due to the loss of rocksdb log
>> recycling when using default recovery options in rocksdb 6.8; Octopus
>> uses version 6.1.2, Pacific uses 6.8.1.
>>
>> 16.2.11 largely addressed the bytes-written amplification, but the
>> IOPS amplification remains. In practice, whether this results in a
>> write performance degradation depends on the speed of the underlying
>> media and the workload, and thus the things I mention in the next
>> paragraph may or may not be applicable to you.
>>
>> There's no known workaround or solution for this at this time. In some
>> cases I've seen that disabling bluefs_buffered_io (which itself can
>> cause IOPS amplification in some cases) can help; I think most folks
>> do this by setting it in local conf and then restarting OSDs in order
>> to gain the config change. Something else to consider is
>>
>> https://docs.ceph.com/en/quincy/start/hardware-recommendations/#write-caches
>> ,
>> as sometimes disabling these write caches can improve the IOPS
>> performance of SSDs.
>>
>> Josh
>>
>> On Tue, Feb 28, 2023 at 7:19 AM Boris Behrens  wrote:
>> >
>> > Hi Josh,
>> > we upgraded 15.2.17 -> 16.2.11 and we only use rbd workload.
>> >
>> >
>> >
>> > Am Di., 28. Feb. 2023 um 15:00 Uhr schrieb Josh Baergen <
>> jbaer...@digitalocean.com>:
>> >>
>> >> Hi Boris,
>> >>
>> >> Which version did you upgrade from and to, specifically? And what
>> >> workload are you running (RBD, etc.)?
>> >>
>> >> Josh
>> >>
>> >> On Tue, Feb 28, 2023 at 6:51 AM Boris Behrens  wrote:
>> >> >
>> >> > Hi,
>> >> > today I did the first update from octopus to pacific, and it looks
>> like the
>> >> > avg apply latency went up from 1ms to 2ms.
>> >> >
>> >> > All 36 OSDs are 4TB SSDs and nothing else changed.
>> >> > Does someone know if this is an issue, or am I just missing a config value?
>> >> >
>> >> > Cheers
>> >> >  Boris
>> >> > ___
>> >> > ceph-users mailing list -- ceph-users@ceph.io
>> >> > To unsubscribe send an email to ceph-users-le...@ceph.io
>> >
>> >
>> >
>> > --
>> > The self-help group "UTF-8 problems" will meet in the large hall this time, as an exception.
>>
>
>
> --
> The self-help group "UTF-8 problems" will meet in the large hall this time,
> as an exception.
>


-- 
The self-help group "UTF-8 problems" will meet in the large hall this time,
as an exception.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: s3 compatible interface

2023-03-21 Thread Joachim Kraftmayer
Hi, maybe I should have mentioned the zipper project as well; I watched 
both the IBM and SUSE presentations at FOSDEM 2023.


I personally follow the zipper project with great interest.

Joachim


___

Ceph Foundation Member

On 21.03.23 at 01:27, Matt Benjamin wrote:

Hi Chris,

This looks useful.  Note for this thread:  this *looks like* it's using the
zipper dbstore backend?  Yes, that's coming in Reef.  We think of dbstore
as mostly the zipper reference driver, but it can be useful as a standalone
setup, potentially.

But there's now a prototype of a posix file filter that can be stacked on
dbstore (or rados, I guess)--not yet merged, and iiuc post-Reef.  That's
the project Daniel was describing.  The posix/gpfs filter is aiming for
being thin and fast and horizontally scalable.

The s3gw project that Clyso and folks were writing about is distinct from
both of these.  I *think* it's truthful to say that s3gw is its own
thing--a hybrid backing store with objects in files, but also metadata
atomicity from an embedded db--plus interesting orchestration.

Matt

On Mon, Mar 20, 2023 at 3:45 PM Chris MacNaughton <
chris.macnaugh...@canonical.com> wrote:


On 3/20/23 12:02, Frank Schilder wrote:

Hi Marc,

I'm also interested in an S3 service that uses a file system as a back-end. I 
looked at the documentation of https://github.com/aquarist-labs/s3gw and have 
to say that it doesn't make much sense to me. I don't see this kind of gateway 
anywhere there. What I see is a build of a rados gateway that can be pointed at 
a ceph cluster. That's not a gateway to an FS.

Did I misunderstand your actual request or can you point me to the part of the 
documentation where it says how to spin up an S3 interface using a file system 
for user data?

The only thing I found is 
https://s3gw-docs.readthedocs.io/en/latest/helm-charts/#local-storage, but it 
sounds to me like this is not where the user data will be going.

Thanks for any hints and best regards,


for testing you can try: https://github.com/aquarist-labs/s3gw

Yes indeed, that looks like it can be used with a simple fs backend.

Hey,

(Re-sending this email from a mailing-list subscribed email)

I was playing around with RadosGW's file backend (coming in Reef, zipper)
a few months back and ended up making this docker container that just works
for setting things up:
https://github.com/ChrisMacNaughton/ceph-rgw-docker; published (still,
maybe for a while?) at https://hub.docker.com/r/iceyec/ceph-rgw-zipper

Chris
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Very slow backfilling/remapping of EC pool PGs

2023-03-21 Thread Joachim Kraftmayer

Which Ceph version are you running? Is mclock active?
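
A quick way to check, as a sketch (assuming the config database interface;
the mclock options only exist on recent releases):

  ceph versions                           # which release the daemons actually run
  ceph config get osd osd_op_queue        # "mclock_scheduler" means mclock is active
  ceph config get osd osd_mclock_profile  # e.g. balanced / high_client_ops / high_recovery_ops

If mclock is active, the high_recovery_ops profile is one knob that can
speed up backfill at the expense of client I/O.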

Joachim

___
Clyso GmbH - Ceph Foundation Member

On 21.03.23 at 06:53, Gauvain Pocentek wrote:

Hello all,

We have an EC (4+2) pool for RGW data, with HDDs plus SSDs for WAL/DB. The
pool spans 9 servers, each with 12 disks of 16 TB. About 10 days ago we lost a
server and removed its OSDs from the cluster. Ceph started to remap and
backfill as expected, but the process has been getting slower and slower.
Today the recovery rate is around 12 MiB/s and 10 objects/s. All the
remaining unclean PGs are backfilling:

  data:
    volumes: 1/1 healthy
    pools:   14 pools, 14497 pgs
    objects: 192.38M objects, 380 TiB
    usage:   764 TiB used, 1.3 PiB / 2.1 PiB avail
    pgs:     771559/1065561630 objects degraded (0.072%)
             1215899/1065561630 objects misplaced (0.114%)
             14428 active+clean
             50    active+undersized+degraded+remapped+backfilling
             18    active+remapped+backfilling
             1     active+clean+scrubbing+deep

We've checked the health of the remaining servers, and everything looks
fine (CPU/RAM/network/disks).

Any hints on what could be happening?

Thank you,
Gauvain
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unexpected ceph pool creation error with Ceph Quincy

2023-03-21 Thread Eugen Block
Sorry, hit send too early. It seems I could reproduce it by reducing  
the value to 1:


host1:~ # ceph config set mon mon_max_pool_pg_num 1
host1:~ # ceph config get mon mon_max_pool_pg_num
1
host1:~ # ceph osd pool create pool3
Error ERANGE: 'pg_num' must be greater than 0 and less than or equal  
to 1 (you may adjust 'mon max pool pg num' for higher values)


The default is 65536. Can you verify if this is your issue?
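
If it was changed at some point, resetting it should be enough (a sketch):

host1:~ # ceph config rm mon mon_max_pool_pg_num    # falls back to the default (65536)

It could also be set as 'mon max pool pg num' in a local ceph.conf rather
than in the config database, so it is worth grepping there as well.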

Quoting Eugen Block:

Did you ever adjust mon_max_pool_pg_num? Can you check what your  
current config value is?


host1:~ # ceph config get mon mon_max_pool_pg_num
65536

Quoting Geert Kloosterman:


Hi,

Thanks Eugen for checking this.  I get the same default values as  
you when I remove the entries from my ceph.conf:


  [root@gjk-ceph ~]# ceph-conf -D | grep default_pg
  osd_pool_default_pg_autoscale_mode = on
  osd_pool_default_pg_num = 32
  osd_pool_default_pgp_num = 0

However, in my case, the pool creation error remains:

  [root@gjk-ceph ~]# ceph osd pool create asdf
  Error ERANGE: 'pgp_num' must be greater than 0 and lower or equal
  than 'pg_num', which in this case is 1

But I can create the pool when passing the same pg_num and pgp_num
  values explicitly:

  [root@gjk-ceph ~]# ceph osd pool create asdf 32 0
  pool 'asdf' created

Does anyone have an idea how I can debug this further?

I'm running Ceph on a virtualized Rocky 8.7 test cluster, with Ceph
rpms installed from http://download.ceph.com/rpm-quincy/el8/

Best regards,
Geert Kloosterman


On Wed, 2023-03-15 at 13:42 +, Eugen Block wrote:

External email: Use caution opening links or attachments


Hi,

I could not confirm this in a virtual lab cluster, also on 17.2.5:

host1:~ # ceph osd pool create asdf
pool 'asdf' created

host1:~ # ceph-conf -D | grep 'osd_pool_default_pg'
osd_pool_default_pg_autoscale_mode = on
osd_pool_default_pg_num = 32
osd_pool_default_pgp_num = 0

So it looks quite similar except the pgp_num value (I can't remember
having that modified). This is an upgraded Nautilus cluster.

Quoting Geert Kloosterman:


Hi all,

I'm trying out Ceph Quincy (17.2.5) for the first time and I'm
running into unexpected behavior of "ceph osd pool create".

When not passing any pg_num and pgp_num values, I get the following
error with Quincy:

[root@gjk-ceph ~]# ceph osd pool create asdf
Error ERANGE: 'pgp_num' must be greater than 0 and lower or
equal than 'pg_num', which in this case is 1

I checked with Ceph Pacific (16.2.11) and there the extra arguments
are not needed.

I expected it would use osd_pool_default_pg_num and
osd_pool_default_pgp_num as defined in my ceph.conf:

[root@gjk-ceph ~]# ceph-conf -D | grep 'osd_pool_default_pg'
osd_pool_default_pg_autoscale_mode = on
osd_pool_default_pg_num = 8
osd_pool_default_pgp_num = 8

At least, this is what appears to be used with Pacific.

Is this an intended change of behavior?  I could not find anything
related in the release notes.

Best regards,
Geert Kloosterman
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unexpected ceph pool creation error with Ceph Quincy

2023-03-21 Thread Eugen Block
Did you ever adjust mon_max_pool_pg_num? Can you check what your  
current config value is?


host1:~ # ceph config get mon mon_max_pool_pg_num
65536

Quoting Geert Kloosterman:


Hi,

Thanks Eugen for checking this.  I get the same default values as  
you when I remove the entries from my ceph.conf:


   [root@gjk-ceph ~]# ceph-conf -D | grep default_pg
   osd_pool_default_pg_autoscale_mode = on
   osd_pool_default_pg_num = 32
   osd_pool_default_pgp_num = 0

However, in my case, the pool creation error remains:

   [root@gjk-ceph ~]# ceph osd pool create asdf
   Error ERANGE: 'pgp_num' must be greater than 0 and lower or equal
   than 'pg_num', which in this case is 1

But I can create the pool when passing the same pg_num and pgp_num
   values explicitly:

   [root@gjk-ceph ~]# ceph osd pool create asdf 32 0
   pool 'asdf' created

Does anyone have an idea how I can debug this further?

I'm running Ceph on a virtualized Rocky 8.7 test cluster, with Ceph
rpms installed from http://download.ceph.com/rpm-quincy/el8/

Best regards,
Geert Kloosterman


On Wed, 2023-03-15 at 13:42 +, Eugen Block wrote:

External email: Use caution opening links or attachments


Hi,

I could not confirm this in a virtual lab cluster, also on 17.2.5:

host1:~ # ceph osd pool create asdf
pool 'asdf' created

host1:~ # ceph-conf -D | grep 'osd_pool_default_pg'
osd_pool_default_pg_autoscale_mode = on
osd_pool_default_pg_num = 32
osd_pool_default_pgp_num = 0

So it looks quite similar except the pgp_num value (I can't remember
having that modified). This is an upgraded Nautilus cluster.

Quoting Geert Kloosterman:

> Hi all,
>
> I'm trying out Ceph Quincy (17.2.5) for the first time and I'm
> running into unexpected behavior of "ceph osd pool create".
>
> When not passing any pg_num and pgp_num values, I get the following
> error with Quincy:
>
> [root@gjk-ceph ~]# ceph osd pool create asdf
> Error ERANGE: 'pgp_num' must be greater than 0 and lower or
> equal than 'pg_num', which in this case is 1
>
> I checked with Ceph Pacific (16.2.11) and there the extra arguments
> are not needed.
>
> I expected it would use osd_pool_default_pg_num and
> osd_pool_default_pgp_num as defined in my ceph.conf:
>
> [root@gjk-ceph ~]# ceph-conf -D | grep 'osd_pool_default_pg'
> osd_pool_default_pg_autoscale_mode = on
> osd_pool_default_pg_num = 8
> osd_pool_default_pgp_num = 8
>
> At least, this is what appears to be used with Pacific.
>
> Is this an intended change of behavior?  I could not find anything
> related in the release notes.
>
> Best regards,
> Geert Kloosterman
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Changing os to ubuntu from centos 8

2023-03-21 Thread Szabo, Istvan (Agoda)
Hi,

I'd like to change the OS of my bare-metal-deployed Octopus 15.2.14 cluster 
from CentOS 8 to Ubuntu 20.04.5. On the first run I would stay on Octopus 
(upgrading only to 15.2.17) so as not to make big changes in the cluster.
I've found a couple of threads on the mailing list, but those were about 
containerized deployments (like: Re: Upgrade/migrate host operating system 
for ceph nodes (CentOS/Rocky), or: Re: Migrating CEPH OS looking for suggestions).

I wonder what the proper steps are for this kind of migration. Do we need to 
start with the mgr, mon, rgw or osd daemons?
Is it possible to reuse the OSDs with a ceph-volume scan on the reinstalled 
machine?
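
To make it concrete, this is roughly what I have in mind for reusing the OSDs 
after reinstalling the OS and the matching Octopus packages and putting 
ceph.conf plus the keyrings back in place (just a sketch, not tested):

  ceph-volume lvm activate --all       # re-detect and start all existing LVM bluestore OSDs
  # or, if the OSDs were originally created with ceph-disk:
  ceph-volume simple scan
  ceph-volume simple activate --all

Does that sound right?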
I'd stay with the bare-metal deployment, and maybe even with Octopus, but I'm 
curious about your advice.

Thank you


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io