[ceph-users] Re: ceph 17.2.6 and iam roles (pr#48030)

2023-04-13 Thread Christopher Durham
 

 Casey,
Thanks. This all worked. Some observations and comments for others who may be
in my situation:

1. When deleting the roles on the secondary with 'radosgw-admin role delete', I
had to delete all of the policies attached to each role before I could delete
the role itself.
2. radosgw-admin complained when I tried and told me to use
--yes-i-really-mean-it, as otherwise the delete would result in inconsistent
metadata between clusters (which is what I wanted in this case).
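
For anyone scripting the same cleanup on the secondary, the per-role sequence
looked roughly like this (a sketch only: 'myrole' and 'mypolicy' are
placeholder names, and depending on the release the policy subcommand may be
spelled 'role-policy rm' rather than 'role-policy delete'):

  # list the roles present on the secondary
  radosgw-admin role list
  # for each role, list and remove its attached policies first
  radosgw-admin role-policy list --role-name=myrole
  radosgw-admin role-policy delete --role-name=myrole --policy-name=mypolicy
  # then remove the role itself (radosgw-admin insists on the override flag)
  radosgw-admin role delete --role-name=myrole --yes-i-really-mean-it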
Note: This was on my test cluster; I'll do the production cluster once some
downtime is scheduled, as my users' stuff will complain/die when, at least for
a short time, there are no roles on the secondary.
Another note: I have no idea how this happened, but my test cluster was still
using leveldb for the mon data! I updated a single mon on the master side and
the mon would not start! After realizing I had leveldb (.ldb) files in
/var/lib/ceph/mon/ceph-servername/store.db, I extracted the monmap from a
still-up mon, removed the broken mon from the map, re-injected the monmap into
the working mon, and then reinitialized the broken mon. Again, the mon would
not start! What I determined was that ALL my mons on the test cluster were
still running leveldb (on pacific). Yuk. How I finally fixed it was to extract
the monmap from a working mon (which was still using leveldb), remove the
broken one and re-inject the map into the working mon, but then, before
reinitializing the broken mon, create a file
/var/lib/ceph/mon/ceph-servername/kv_backend containing the single line
'rocksdb', and then and only then restore the mon. Repeat for all mons, using
the first restored mon (which was now on rocksdb) as the starting point so I
didn't have to create the kv_backend file.
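
Roughly, the per-mon recovery sequence was something like the following (a
sketch with placeholder mon IDs 'mon-good' and 'mon-broken'; the authoritative
extract/inject steps are in the add-or-rm-mons doc linked below):

  # with mon-good stopped, pull the current map and drop the broken mon from it
  ceph-mon -i mon-good --extract-monmap /tmp/monmap
  monmaptool /tmp/monmap --rm mon-broken
  ceph-mon -i mon-good --inject-monmap /tmp/monmap

  # on the broken mon's host, pin the kv backend to rocksdb BEFORE reinitializing
  echo rocksdb > /var/lib/ceph/mon/ceph-mon-broken/kv_backend

  # then re-create and start mon-broken per the add-or-rm-mons procedure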

 Again, I have no idea how I got into this situation, but this cluster started 
at nautilus. It might make sense for ceph to provide an automated conversion 
script to recover a mon in this situation.
For those of you wondering how to do what I'm vaguely describing (manually
remove and restore a broken mon), see here:
https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/
-Chris


-----Original Message-----
From: Casey Bodley 
To: Christopher Durham 
Cc: ceph-users@ceph.io 
Sent: Tue, Apr 11, 2023 1:59 pm
Subject: [ceph-users] Re: ceph 17.2.6 and iam roles (pr#48030)

On Tue, Apr 11, 2023 at 3:53 PM Casey Bodley  wrote:
>
> On Tue, Apr 11, 2023 at 3:19 PM Christopher Durham  wrote:
> >
> >
> > Hi,
> > I see that this PR: https://github.com/ceph/ceph/pull/48030
> > made it into ceph 17.2.6, as per the change log  at: 
> > https://docs.ceph.com/en/latest/releases/quincy/  That's great.
> > But my scenario is as follows:
> > I have two clusters set up as multisite. Because of  the lack of 
> > replication for IAM roles, we have set things up so that roles on the 
> > primary 'manually' get replicated to the secondary site via a python 
> > script. Thus, if I create a role on the primary, add/delete users or 
> > buckets from said role, the role, including the AssumeRolePolicyDocument 
> > and policies, gets pushed to the replicated site. This has served us well 
> > for three years.
> > With the advent of this fix, what should I do before I upgrade to 17.2.6 
> > (currently on 17.2.5, rocky 8)
> >
> > I know that in my situation, roles of the same name have different RoleIDs 
> > on the two sites. What should I do before I upgrade? Possibilities that 
> > *could* happen if I don't rectify things as we upgrade:
> > 1. The different RoleIDs lead to two roles of the same name on the 
> > replicated site, perhaps with the system unable to address/look at/modify 
> > either
> > 2. Roles just don't get replicated to the second site
>
> no replication would happen until the metadata changes again on the
> primary zone. once that gets triggered, the role metadata would
> probably fail to sync due to the name conflicts
>
> >
> > or other similar situations, all of which I want to avoid.
> > Perhaps the safest thing to do is to remove all roles on the secondary 
> > site, upgrade, and then force a replication of roles (How would I *force* 
> > that for IAM roles if it is the correct answer?)
>
> this removal will probably be necessary to avoid those conflicts. once
> that's done, you can force a metadata full sync on the secondary zone
> by running 'radosgw-admin metadata sync init' there, then restarting
> its gateways. this will have to resync all of the bucket and user
> metadata as well

p.s. don't use the DeleteRole rest api on the secondary zone after
upgrading, as the request would get forwarded to the primary zone and
delete it there too. you can use 'radosgw-admin role delete' on the
secondary instead
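
For reference, the full-resync sequence on the secondary zone would look
roughly like this (a sketch; the systemd unit name for the gateways depends on
how they were deployed):

  # on the secondary zone, after the conflicting roles have been removed
  radosgw-admin metadata sync init
  # restart that zone's radosgw instances (unit name varies by deployment)
  systemctl restart ceph-radosgw.target
  # watch progress; metadata sync should report a full sync in progress
  radosgw-admin sync status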

>
> > Here is the original bug report:
> >
> > https://tracker.ceph.com/issues/57364
> > 
