[ceph-users] Re: Ceph RADOSGW with Keycloak OIDC

2022-03-18 Thread Pritha Srivastava
Hi,

When you list the roles, the Condition element of the trust policy in the
role doesn't seem quite right:

"Condition": {
>"StringEquals": {
>"localhost:8080/auth/realms/demo:myclient
": "account"
>}

But what you have mentioned in the policy_document just above is correct:

"Condition":{"StringEquals":{"localhost:8080/auth/realms/demo:app_id":"account"}}

Is the value of the 'aud' field in the access token that you generated set to
"account"?

Another thing to check is that the client id (myclient) that you have set in
clientIdList as part of the create_open_id_connect_provider() call matches the
value of either the clientId or client_id field in the access token.
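
For example, you can decode the token payload locally and look at those claims
directly. A minimal sketch (plain stdlib Python, not from the docs; it assumes
the token is passed as the first argument, e.g. the value printed by your
access_token.sh):

#!/usr/bin/python3
# Decode a JWT access token payload (no signature verification) and print
# the claims relevant to the role's trust policy and clientIdList matching.
import base64
import json
import sys

def b64url_decode(segment):
    # JWT segments are base64url-encoded without padding; add the padding back.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

token = sys.argv[1]
payload = json.loads(b64url_decode(token.split(".")[1]))

for claim in ("iss", "aud", "azp", "client_id", "clientId"):
    print(claim, "=", payload.get(claim))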

Or you can also check rgw logs and see what error is being logged for
AssumeRoleWithWebIdentity.

Thanks,
Pritha

On Sat, Mar 19, 2022 at 12:21 AM Seth Cagampang 
wrote:

> Hello,
>
>
>
> It seems like Pritha is the Ceph RGW expert in this forum. I am currently
> trying to integrate CephRGW object storage with KeyCloak as the OIDC
> provider. I am running ceph version 16.2.7 Pacific stable.
>
>
>
> At this point, I am just trying to get a POC working with the python
> scripts provided in the example in these docs <
> https://docs.ceph.com/en/latest/radosgw/STS/#sts-configuration> . Here are
> some step by step instructions on how I set up the ceph cluster and
> KeyCloak server:
>
>
>
> *Set up keycloak server*:
>
> 1. Create new Realm 'demo'
>
> 2. Create 'testuser' and add credentials. Verify that I am able to login to
> the realm using the new credentials.
>
> 3. Create a client 'myclient' and set Access Type as 'confidential' to
> generate client secret
>
> 4. Add a keycloak-oidc provider using the client credentials.
>
> 5. On the client set 'Authorization Enabled' to ON and 'Service Accounts
> Enabled' to ON.
>
>
>
> We should now be able to get the access tokens from the OIDC provider. To
> do this I used the sample curl calls from these docs <
> https://docs.ceph.com/en/latest/radosgw/keycloak/#setting-up-keycloak>
> which I put into scripts:
>
> access_token.sh
>
> #!/bin/bash
>
> KC_REALM=demo
>
> KC_CLIENT=myclient
>
> KC_CLIENT_SECRET=620b31fa----
>
> KC_SERVER=localhost:8080 
>
> KC_CONTEXT=auth
>
>
>
> # Request Tokens for credentials
>
> KC_RESPONSE=$( \
>
> curl -k -v -X POST \
>
> -H "Content-Type: application/x-www-form-urlencoded" \
>
> -d "scope=openid" \
>
> -d "grant_type=client_credentials" \
>
> -d "client_id=$KC_CLIENT" \
>
> -d "client_secret=$KC_CLIENT_SECRET" \
>
> "http://
> $KC_SERVER/$KC_CONTEXT/realms/$KC_REALM/protocol/openid-connect/token"
> \
>
> | jq .
>
> )
>
>
>
> KC_ACCESS_TOKEN=$(echo $KC_RESPONSE| jq -r .access_token)
>
> echo $KC_RESPONSE | jq .
>
> echo $KC_ACCESS_TOKEN
>
>
>
> Using this script I am able to get the access token for later usage and it
> has been verified that we are able to get the access token from the key
> cloak OIDC.
>
>
>
> *Set up Ceph Cluster w/ RGW*:
>
> 1. Create Ceph Cluster with OSD's and journals. Create an S3 object storage
> pool and then create an RGW on the cluster manager node.
>
> 2. Enable sts in the gateway config in /etc/ceph/ceph.conf as seen in the
> example from the docs <
> https://docs.ceph.com/en/latest/radosgw/keycloak/#setting-up-keycloak> :
>
> > [client.radosgw.gateway_name]
>
> > rgw sts key = abcdefghijklmnop
>
> > rgw s3 auth use sts = true
>
> 3. Create test users to be used in the test application python script.
>
> > radosgw-admin --uid TESTER --display-name "testuser" --access_key TESTER
> --secret test123 user create
> > radosgw-admin caps add --uid="TESTER" --caps="oidc-provider=*"
> >   radosgw-admin caps add --uid="TESTER" --caps="roles=*"
> >
> >   radosgw-admin --uid TESTER1 --display-name "testuser1" --access_key
> TESTER1 --secret test321 user create
> >   radosgw-admin caps add --uid="TESTER1" --caps="roles=*"
>
> 4. We need to generate thumbprints of the OIDC provider. I used the docs
> here  to
> write a script to generate the thumbprints:
>
> # Get the 'x5c' from this response to turn into an IDP-cert
>
> KEY1_RESPONSE=$(curl -k -v \
>
>  -X GET \
>
>  -H "Content-Type: application/x-www-form-urlencoded" \
>
>  "http://localhost:8080/auth/realms/demo/protocol/openid-connect/certs
> "
> \
>
>  | jq -r .keys[0].x5c)
>
>
>
> KEY2_RESPONSE=$(curl -k -v \
>
>  -X GET \
>
>  -H "Content-Type: application/x-www-form-urlencoded" \
>
>  "http://localhost:8080/auth/realms/demo/protocol/openid-connect/certs
> "
> \
>
>  | jq -r .keys[1].x5c)
>
>
>
> echo
>
> echo "Assembling Certificates"
>
>
>
> # Assemble Cert1
>
> echo '-----BEGIN CERTIFICATE-----' > certificate1.crt
>
> echo $(echo $KEY1_RESPONSE) | sed
> 's/^.//;s/.$//;s/^.//;s/.$//;s/^.//;s/.$//' >> certificate1.crt

[ceph-users] Re: Local NTP servers on monitor nodes.

2022-03-18 Thread Robin H. Johnson
On Wed, Mar 16, 2022 at 10:49:15AM +, Frank Schilder wrote:
> Returning to this thread, I finally managed to capture the problem I'm
> facing in a log. The time service to the outside world is blocked by
> our organisation's firewall and I'm restricted to use internal time
> servers. Unfortunately, these seem to be periodically unstable. I
> caught a time-excursion in the log extracts shown below. My problem
> now is that such a transient causes time-havoc on the cluster, because
> the servers start to adjust in all directions.
...
> Is there a config to tell the head node to take it easy with jumps in
> the external clock source?
This is the "step" config knobs.

> Here the observation. It is annotated and filtered to contain only
> lines where the offset changes and I reduced it to show the incident
> with few lines, all as seen from the head node:
...
> I know that the providers of the time service should get their act
> together, but I doubt that will happen and I would like to harden my
> time sync config to survive such events without chaos. If anyone can
> point me to a suitable config, please do. I need a way to smoothen out
> steep upstream oscillations, like a low-pass filter would do.
If you did filter out the sudden jumps, you'd end up with your mons
all (rightly) distrusting the bad time service, and then they could
drift on their own.

There are better timenuts than I on the list, but I think the following
MIGHT be a reasonable course of action.

1. Disable time stepping: "tinker stepfwd 0 stepback 0" (the exact syntax might 
vary depending on NTP version)
2. Set up your mons to all be NTP servers (possibly in addition to the
   existing head node). They should peer with each other explicitly (see the
   sketch after this list).
3. Set up the rest of your cluster to consume from the mons ONLY.
4. Optional: if your time service providers are unreliable, investigate
   building/buying your own, and use it to feed time to the mons.
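
To make 1-3 concrete, a rough sketch of the mon-side config with classic ntpd
(a sketch only: directive names and availability vary by ntpd version, chrony
uses a different syntax entirely, and the hostnames are placeholders):

# /etc/ntp.conf on each mon
tinker stepfwd 0 stepback 0           # never step the clock, only slew
server time1.internal.example iburst  # internal upstream time service (item 4)
peer mon2.internal.example iburst     # explicit peering between the mons
peer mon3.internal.example iburst

# /etc/ntp.conf on OSD hosts and other clients: consume from the mons only
server mon1.internal.example iburst
server mon2.internal.example iburst
server mon3.internal.example iburst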

If all the mons end up distrusting the time-service you have, they
*should* retain consistent time between themselves, and thus the clients
should also keep consistent time.





-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph RADOSGW with Keycloak OIDC

2022-03-18 Thread Seth Cagampang
Hello,



It seems like Pritha is the Ceph RGW expert in this forum. I am currently
trying to integrate Ceph RGW object storage with Keycloak as the OIDC
provider. I am running Ceph version 16.2.7 Pacific stable.



At this point, I am just trying to get a POC working with the Python
scripts provided in the example in these docs <
https://docs.ceph.com/en/latest/radosgw/STS/#sts-configuration> . Here are
some step-by-step instructions on how I set up the Ceph cluster and
Keycloak server:



*Set up keycloak server*:

1. Create new Realm 'demo'

2. Create 'testuser' and add credentials. Verify that I am able to log in to
the realm using the new credentials.

3. Create a client 'myclient' and set Access Type to 'confidential' to
generate a client secret.

4. Add a keycloak-oidc provider using the client credentials.

5. On the client set 'Authorization Enabled' to ON and 'Service Accounts
Enabled' to ON.



We should now be able to get the access tokens from the OIDC provider. To
do this I used the sample curl calls from these docs <
https://docs.ceph.com/en/latest/radosgw/keycloak/#setting-up-keycloak>
which I put into scripts:

access_token.sh

#!/bin/bash

KC_REALM=demo
KC_CLIENT=myclient
KC_CLIENT_SECRET=620b31fa----
KC_SERVER=localhost:8080
KC_CONTEXT=auth

# Request Tokens for credentials
KC_RESPONSE=$( \
curl -k -v -X POST \
-H "Content-Type: application/x-www-form-urlencoded" \
-d "scope=openid" \
-d "grant_type=client_credentials" \
-d "client_id=$KC_CLIENT" \
-d "client_secret=$KC_CLIENT_SECRET" \
"http://$KC_SERVER/$KC_CONTEXT/realms/$KC_REALM/protocol/openid-connect/token" \
| jq .
)

KC_ACCESS_TOKEN=$(echo $KC_RESPONSE | jq -r .access_token)
echo $KC_RESPONSE | jq .
echo $KC_ACCESS_TOKEN



Using this script I am able to get the access token for later use, and it
has been verified that we are able to get the access token from the
Keycloak OIDC provider.



*Set up Ceph Cluster w/ RGW*:

1. Create a Ceph cluster with OSDs and journals. Create an S3 object storage
pool and then create an RGW on the cluster manager node.

2. Enable STS in the gateway config in /etc/ceph/ceph.conf as seen in the
example from the docs <
https://docs.ceph.com/en/latest/radosgw/keycloak/#setting-up-keycloak> :

> [client.radosgw.gateway_name]

> rgw sts key = abcdefghijklmnop

> rgw s3 auth use sts = true

3. Create test users to be used in the test application python script.

> radosgw-admin --uid TESTER --display-name "testuser" --access_key TESTER --secret test123 user create
> radosgw-admin caps add --uid="TESTER" --caps="oidc-provider=*"
> radosgw-admin caps add --uid="TESTER" --caps="roles=*"
>
> radosgw-admin --uid TESTER1 --display-name "testuser1" --access_key TESTER1 --secret test321 user create
> radosgw-admin caps add --uid="TESTER1" --caps="roles=*"

4. We need to generate thumbprints of the OIDC provider. I used the docs
here  to
write a script to generate the thumbprints:

# Get the 'x5c' from this response to turn into an IDP-cert
KEY1_RESPONSE=$(curl -k -v \
 -X GET \
 -H "Content-Type: application/x-www-form-urlencoded" \
 "http://localhost:8080/auth/realms/demo/protocol/openid-connect/certs" \
 | jq -r .keys[0].x5c)

KEY2_RESPONSE=$(curl -k -v \
 -X GET \
 -H "Content-Type: application/x-www-form-urlencoded" \
 "http://localhost:8080/auth/realms/demo/protocol/openid-connect/certs" \
 | jq -r .keys[1].x5c)

echo
echo "Assembling Certificates"

# Assemble Cert1
echo '-----BEGIN CERTIFICATE-----' > certificate1.crt
echo $(echo $KEY1_RESPONSE) | sed 's/^.//;s/.$//;s/^.//;s/.$//;s/^.//;s/.$//' >> certificate1.crt
echo '-----END CERTIFICATE-----' >> certificate1.crt
echo $(cat certificate1.crt)

# Assemble Cert2
echo '-----BEGIN CERTIFICATE-----' > certificate2.crt
echo $(echo $KEY2_RESPONSE) | sed 's/^.//;s/.$//;s/^.//;s/.$//;s/^.//;s/.$//' >> certificate2.crt
echo '-----END CERTIFICATE-----' >> certificate2.crt
echo $(cat certificate2.crt)

echo
echo "Generating thumbprints"

# Create Thumbprint for both certs
PRETHUMBPRINT1=$(openssl x509 -in certificate1.crt -fingerprint -noout)
PRETHUMBPRINT2=$(openssl x509 -in certificate2.crt -fingerprint -noout)

PRETHUMBPRINT1=$(echo $PRETHUMBPRINT1 | awk '{ print substr($0, 18) }')
PRETHUMBPRINT2=$(echo $PRETHUMBPRINT2 | awk '{ print substr($0, 18) }')

echo "${PRETHUMBPRINT1//:}"
echo "${PRETHUMBPRINT2//:}"

I copied and pasted these thumbprints into the example application Python
script to perform the create_open_id_connect_provider() call with the
'iam_client'.
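
For reference, this is roughly what that call looks like (a sketch based on the
STS docs example; the endpoint, credentials and thumbprint values here are
placeholders, not my exact values):

import boto3

iam_client = boto3.client('iam',
    aws_access_key_id="TESTER",
    aws_secret_access_key="test123",
    endpoint_url="http://10.x.x.x:7480",
    region_name="")

oidc_response = iam_client.create_open_id_connect_provider(
    Url="http://localhost:8080/auth/realms/demo",
    ClientIDList=["myclient"],
    ThumbprintList=["<THUMBPRINT1>", "<THUMBPRINT2>"])
print(oidc_response)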

5. Next I filled out the missing information in the example application
script:

#!/usr/bin/python3
import boto3

iam_client = boto3.client('iam',
aws_access_key_id="TESTER",
aws_secret_access_key="test123",
endpoint_url="http://10.x.x.x:7480", #

[ceph-users] radosgw-admin zonegroup synced user with colon in name is not working

2022-03-18 Thread Boris Behrens
Hi,
we've got a user with a colon in its name (please don't ask me why,
I have no clue) and it shows some strange behavior in the syncing process.

In the master zonegroup the user looks like this:
root@s3db1:~# radosgw-admin user info --uid
94c13787-0e79-4a67-b4f5-d4c71c59c16e:104648ad-9a3e-4b7a-8b30-3f14b261c20a
{
    "user_id": "94c13787-0e79-4a67-b4f5-d4c71c59c16e:104648ad-9a3e-4b7a-8b30-3f14b261c20a",
    "display_name": "94c13787-0e79-4a67-b4f5-d4c71c59c16e:104648ad-9a3e-4b7a-8b30-3f14b261c20a",
    ...
    "subusers": [],
    "keys": [
        {
            "user": "94c13787-0e79-4a67-b4f5-d4c71c59c16e:104648ad-9a3e-4b7a-8b30-3f14b261c20a",
            "access_key": "ACCESS-KEY",
            "secret_key": "SECRET"
        }
    ],
    ...
}

In the other zonegroup the user looks like this:
root@ac1f6b4abef6:~# radosgw-admin user info --uid
94c13787-0e79-4a67-b4f5-d4c71c59c16e:104648ad-9a3e-4b7a-8b30-3f14b261c20a
{
    "user_id": "94c13787-0e79-4a67-b4f5-d4c71c59c16e:104648ad-9a3e-4b7a-8b30-3f14b261c20a",
    "display_name": "94c13787-0e79-4a67-b4f5-d4c71c59c16e:104648ad-9a3e-4b7a-8b30-3f14b261c20a",
    ...
    "subusers": [],
    "keys": [
        {
            "user": "94c13787-0e79-4a67-b4f5-d4c71c59c16e:104648ad-9a3e-4b7a-8b30-3f14b261c20a:104648ad-9a3e-4b7a-8b30-3f14b261c20a",
            "access_key": "ACCESS-KEY",
            "secret_key": "SECRET"
        }
    ]
    ...
}

Buckets that belong to the user are not accessible at all in the 2nd
zonegroup.
Buckets that are in the master zonegroup work fine.

Maybe this is a bug, maybe colons are not allowed in user names?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Take the Ceph User Survey for 2022!

2022-03-18 Thread Mike Perez
Hi all,

There is one week left until the Ceph User Survey closes. Please
consider taking it or sharing it with others that use Ceph.

On Fri, Feb 11, 2022 at 3:59 PM Mike Perez  wrote:
>
> Hi everyone!
>
> Be sure to make your voice heard by taking the Ceph User Survey before
> March 25, 2022. This information will help guide the Ceph community’s
> investment in Ceph and the Ceph community's future development.
>
> https://survey.zohopublic.com/zs/tLCskv
>
> Thank you to the Ceph User Survey Working Group for designing this
> year's survey!
>
> https://tracker.ceph.com/projects/ceph/wiki/User_Survey_Working_Group
>
> We will provide the final results in the coming month after
> the survey has ended.
>
> --
> Mike Perez

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph OSD's take 10+ minutes to start on reboot

2022-03-18 Thread Chris Page
Hi,

Following up from this, is it just normal for them to take a while? I
notice that once I have restarted an OSD, the 'meta' value drops right down
to empty and slowly builds back up. The restarted OSDs start with just 1 GB
or so of metadata and increase over time to 160-170 GB of metadata.

So perhaps the delay is just the rebuilding of this metadata pool?

Thanks,
Chris.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS snaptrim bug?

2022-03-18 Thread Linkriver Technology
Hello,

If I understand my issue correctly, it is in fact unrelated to CephFS itself;
rather, the problem happens at a lower level (in Ceph itself). IOW, it affects
all kinds of snapshots, not just CephFS ones. I believe my FS is healthy
otherwise. In any case, here is the output of the command you asked for:

I ran it a few hours ago:

"num_strays": 235,
"num_strays_delayed": 38,
"num_strays_enqueuing": 0,
"strays_created": 5414436,
"strays_enqueued": 5405983,
"strays_reintegrated": 17892,
"strays_migrated": 0,

And just now:

"num_strays": 186,
"num_strays_delayed": 0,
"num_strays_enqueuing": 0,
"strays_created": 5540016,
"strays_enqueued": 5531494,
"strays_reintegrated": 18128,
"strays_migrated": 0,


Regards,

LRT

-Original Message-
From: Arnaud M 
To: Linkriver Technology 
Cc: Dan van der Ster , Ceph Users 
Subject: [ceph-users] Re: CephFS snaptrim bug?
Date: Thu, 17 Mar 2022 21:48:18 +0100

Hello Linkriver

I might have an issue close to yours.

Can you tell us if your stray dirs are full?

What does this command output for you?

ceph tell mds.0 perf dump | grep strays

Does the value change over time?

All the best

Arnaud

On Wed, 16 Mar 2022 at 15:35, Linkriver Technology <
technol...@linkriver-capital.com> wrote:

> Hi,
> 
> Has anyone figured whether those "lost" snaps are rediscoverable /
> trimmable?
> All pgs in the cluster have been deep scrubbed since my previous email and
> I'm
> not seeing any of that wasted space being recovered.
> 
> Regards,
> 
> LRT
> 
> -Original Message-
> From: Dan van der Ster 
> To: technol...@linkriver-capital.com
> Cc: Ceph Users , Neha Ojha 
> Subject: Re: [ceph-users] CephFS snaptrim bug?
> Date: Thu, 24 Feb 2022 09:48:04 +0100
> 
> See https://tracker.ceph.com/issues/54396
> 
> I don't know how to tell the osds to rediscover those trimmed snaps.
> Neha does that possible?
> 
> Cheers, Dan
> 
> On Thu, Feb 24, 2022 at 9:27 AM Dan van der Ster 
> wrote:
> > 
> > Hi,
> > 
> > I had a look at the code -- looks like there's a flaw in the logic:
> > the snaptrim queue is cleared if osd_pg_max_concurrent_snap_trims = 0.
> > 
> > I'll open a tracker and send a PR to restrict
> > osd_pg_max_concurrent_snap_trims to >= 1.
> > 
> > Cheers, Dan
> > 
> > On Wed, Feb 23, 2022 at 9:44 PM Linkriver Technology
> >  wrote:
> > > 
> > > Hello,
> > > 
> > > I have upgraded our Ceph cluster from Nautilus to Octopus (15.2.15)
> over the
> > > weekend. The upgrade went well as far as I can tell.
> > > 
> > > Earlier today, noticing that our CephFS data pool was approaching
> capacity, I
> > > removed some old CephFS snapshots (taken weekly at the root of the
> filesystem),
> > > keeping only the most recent one (created today, 2022-02-21). As
> expected, a
> > > good fraction of the PGs transitioned from active+clean to
> active+clean+snaptrim
> > > or active+clean+snaptrim_wait. In previous occasions when I removed a
> snapshot
> > > it took a few days for snaptrimming to complete. This would happen
> without
> > > noticeably impacting other workloads, and would also free up an
> appreciable
> > > amount of disk space.
> > > 
> > > This time around, after a few hours of snaptrimming, users complained
> of high IO
> > > latency, and indeed Ceph reported "slow ops" on a number of OSDs and
> on the
> > > active MDS. I attributed this to the snaptrimming and decided to
> reduce it by
> > > initially setting osd_pg_max_concurrent_snap_trims to 1, which didn't
> seem to
> > > help much, so I then set it to 0, which had the surprising effect of
> > > transitioning all PGs back to active+clean (is this intended?). I also
> restarted
> > > the MDS which seemed to be struggling. IO latency went back to normal
> > > immediately.
> > > 
> > > Outside of users' working hours, I decided to resume snaptrimming by
> setting
> > > osd_pg_max_concurrent_snap_trims back to 1. Much to my surprise,
> nothing
> > > happened. All PGs remained (and still remain at time of writing) in
> the state
> > > active+clean, even after restarting some of them. This definitely seems
> > > abnormal, as I mentioned earlier, snaptrimming this FS previously
> would take in
> > > the order of multiple days. Moreover, if snaptrim were truly complete,
> I would
> > > expect pool usage to have dropped by appreciable amounts (at least a
> dozen
> > > terabytes), but that doesn't seem to be the case.
> > > 
> > > A du on the CephFS root gives:
> > > 
> > > # du -sh /mnt/pve/cephfs
> > > 31T    /mnt/pve/cephfs
> > > 
> > > But:
> > > 
> > > # ceph df
> > > 
> > > --- POOLS ---
> > > POOL             ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
> > > cephfs_data       7  512  43 TiB   190.83M  147 TiB  93.22    3.6 TiB
> > > cephfs_metadata   8   32  89 GiB   694.60k  266 GiB   1.32    6.4 TiB
> > > 
> > > 
> > > ceph pg dump reports a SNAPTRIMQ_LEN of 0 on all PGs.
> > > 

[ceph-users] March 2022 Ceph Tech Talk:

2022-03-18 Thread Mike Perez
Hi everyone

On March 24 at 17:00 UTC, hear Kamoltat (Junior) Sirivadhna give a
Ceph Tech Talk on how Teuthology, Ceph's integration test framework,
works!

https://ceph.io/en/community/tech-talks/

Also, if you would like to present and share with the community what
you're doing with Ceph or development, please let me know as we are
looking for content. Thanks!

-- 
Mike Perez

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW/S3 losing multipart upload objects

2022-03-18 Thread Ulrich Klein
I tried it on a mini-cluster (4 Raspberries) with 16.2.7. 
Same procedure, same effect. I just can’t get rid of these objects.

Is there any method that would allow me to delete these objects without 
damaging RGW?

Ciao, Uli 

> On 17. 03 2022, at 15:30, Soumya Koduri  wrote:
> 
> On 3/17/22 17:16, Ulrich Klein wrote:
>> Hi,
>> 
>> My second attempt to get help with a problem I'm trying to solve for about 6 
>> month now.
>> 
>> I have a Ceph 16.2.6 test cluster, used almost exclusively for providing 
>> RGW/S3 service. similar to a production cluster.
>> 
>> The problem I have is this:
>> A client uploads (via S3) a bunch of large files into a bucket via multiparts
>> The upload(s) get interrupted and retried
>> In the end from a client's perspective all the files are visible and 
>> everything looks fine.
>> But on the cluster there are many more objects in the buckets
>> Even after cleaning out the incomplete multipart uploads there are too many 
>> objects
>> Even after deleting all the visible objects from the bucket there are still 
>> objects in the bucket
>> I have so far found no way to get rid of those left-over objects.
>> It's screwing up space accounting and I'm afraid I'll eventually have a 
>> cluster full of those lost objects.
>> The only way to clean up seems to be to copy te contents of a bucket to a 
>> new bucket and delete the screwed-up bucket. But on a production system 
>> that's not always a real option.
>> 
>> I've found a variety of older threads that describe a similar problem. None 
>> of them decribing a solution :(
>> 
>> 
>> 
>> I can pretty easily reproduce the problem with this sequence:
>> 
>> On a client system create a directory with ~30 200MB files. (On a faster 
>> system I'd probably need bigger or more files)
>> tstfiles/tst01 - tst29
>> 
>> run
>> $ rclone mkdir tester:/test-bucket # creates a bucket on the test system 
>> with user tester
>> Run
>> $ rclone sync -v tstfiles tester:/test-bucket/tstfiles
>> a couple of times (6-8), interrupting each one via CNTRL-C
>> Eventually let one finish.
>> 
>> Now I can use s3cmd to see all the files:
>> $ s3cmd ls -lr s3://test-bucket/tstfiles
>> 2022-03-16 17:11   200M  ecb28853bd18eeae185b0b12bd47333c-40  STANDARD 
>> s3://test-bucket/tstfiles/tst01
>> ...
>> 2022-03-16 17:13   200M  ecb28853bd18eeae185b0b12bd47333c-40  STANDARD 
>> s3://test-bucket/tstfiles/tst29
>> 
>> ... and to list incomplete uploads:
>> $ s3cmd multipart s3://test-bucket
>> s3://test-bucket/
>> InitiatedPathId
>> 2022-03-16T17:11:19.074Z s3://test-bucket/tstfiles/tst05 
>> 2~1nElF0c3uq5FnZ9cKlsnGlXKATvjr0g
>> ...
>> 2022-03-16T17:12:41.583Z s3://test-bucket/tstfiles/tst28 
>> 2~exVQUILhVSmFqWxCuAflRa4Tfq4nUQa
>> 
>> I can abort the uploads with
>> $  s3cmd abortmp s3://test-bucket/tstfiles/tst05 
>> 2~1nElF0c3uq5FnZ9cKlsnGlXKATvjr0g
>> ...
> 
> 
> 
> On the latest master, I see that these objects are deleted immediately post 
> abortmp. I believe this issue may have been fixed as part of [1], backported 
> to v16.2.7 [2]. Maybe you could try upgrading your cluster and recheck.
> 
> 
> Thanks,
> 
> Soumya
> 
> 
> [1] https://tracker.ceph.com/issues/53222
> 
> [2] https://tracker.ceph.com/issues/53291
> 
> 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Question about auto scale and changing the PG Num

2022-03-18 Thread Claas Goltz
Hello,
I have an SSD pool that was initially created years ago with 128 PGs. This
seems suboptimal to me. The pool spans 32 OSDs of 1.6 TiB each: 8 servers
with 4 OSDs each.

ceph osd pool autoscale-status recommends 2048 PGs.

Is it safe to enable autoscale mode? Is the pool still accessible during
this time? Or should I rather go up step by step? My idea would be to go
first to 512, then 1024, and finally to 2048. I would set the recovery
priority to low during working hours and back to default outside that time.
Thank you very much!
Claas
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 3 node CEPH PVE hyper-converged cluster serious fragmentation and performance loss in matter of days.

2022-03-18 Thread Igor Fedotov



On 3/10/2022 6:10 PM, Sasa Glumac wrote:



> In this respect could you please try to switch bluestore and bluefs
> allocators to bitmap and run some smoke benchmarking again.
Can I change this on a live server (is there a possibility of losing data,
etc.)? Can you please share the correct procedure.



To change the allocator for an OSD.N one should run:

ceph config set osd.N bluestore_allocator bitmap

and restart an OSD.

I'm unaware of any issues with such a switch...

Alternatively/additionally you might want to try the stupid allocator as well.



> Additionally you might want to upgrade to 15.2.16 which includes a bunch
> of improvements for Avl/Hybrid allocators tail latency numbers as per
> the ticket above.
Atm we use the PVE repository where 15.2.15 is the latest; I will need to
either wait for .16 from them or create a second cluster without Proxmox,
but I would like to test on the existing one.
Is there any difference between PVE Ceph and regular Ceph, so I can change
the repo and install over the existing installation?

Sorry I don't know.

--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io