I'm trying to set up the OIDC provider for RGW so that I can have roles that can 
be assumed by people logging into their regular Azure AD identities. The client 
I'm planning to use is Cyberduck – it seems like one of the few GUI S3 clients 
that manages the OIDC login process in a way that could work for relatively 
naive users.

I've gotten a fair ways down the road. I've been able to configure Cyberduck so 
that it performs the login with Azure AD, gets an identity token, and then 
sends it to Ceph to engage with the AssumeRoleWithWebIdentity process. However, 
I then get an error, which shows up in the Ceph rgw logs like this:

2024-07-08T17:18:09.749+0000 7fb2d7845700  0 req 15967124976712370684 
1.284013867s sts:assume_role_web_identity Signature validation failed: evp 
verify final failed: 0 error:0407008A:rsa 
routines:RSA_padding_check_PKCS1_type_1:invalid padding

I turned the logging for rgw up to 20 to see if I could follow along to see how 
much of the process succeeds and learn more about what fails. I can then see 
logging messages from this file in the source code:

https://github.com/ceph/ceph/blob/08d7ff952d78d1bbda04d5ff7e3db1e733301072/src/rgw/rgw_rest_sts.cc

We get to WebTokenEngine::get_from_jwt, and it logs the JWT payload in a way 
that seems to be as expected. The logs then indicate that a request is sent to 
the /.well-known/openid-configuration endpoint that appears to be appropriate 
for the issuer of the JWT. The logs eventually indicate what looks like a 
successful and appropriate response to that. The logs then show that a request 
is sent to the jwks_uri that is indicated in the openid-configuration document. 
The response to that is logged, and it appears to be appropriate.

We then get some logging starting with "Certificate is", so it looks like we're 
getting as far as WebTokenEngine::validate_signature. So, several things appear 
to have happened successfully – we've loaded the OIDC provider that 
corresponds to the iss, and we've found a client ID that corresponds to what I 
registered when I configured things. (This is why I say we appear to be a fair 
ways down the road – a lot of this is working).

It looks as though what's happening in the code now is that it's iterating 
through the certificates given in the jwks_uri content. There are 6 
certificates listed, but the code only gets as far as the first one. Looking at 
the code, what appears to be happening is that, among the various certificates 
in the jwks_uri, it's finding the first one which matches a thumbprint 
registered with Ceph (that is, which I registered with Ceph). This must be 
succeeding (for the first certificate), because the "Signature validation 
failed" logging comes later. So, the code does verify that the thumbprint of 
the first certificate matches one of the thumbprints I registered with Ceph for 
this OIDC provider.

We then get to a part of the code where it tries to verify the JWT using the 
certificate, with jwt::verify. Given what gets logged ("Signature validation 
failed: "), this must be throwing an exception.

The thing I find surprising about this is that there really isn't any reason to 
think that the first certificate listed in the jwks_uri content is going to be 
the certificate used to sign the JWT. If I understand JWT correctly, it's 
appropriate to sign the JWT with any of the certificates listed in the jwks_uri 
content. Furthermore, the JWT header includes a reference to the kid, so it's 
possible for Ceph to know exactly which certificate the JWT purports to be 
signed by. And, Ceph knows that there might be multiple thumbprints, because we 
can register 5. So, the logic of trying the first valid certificate in x5c and 
then stopping if it fails seems broken, actually.
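
To illustrate the point about the kid: the JWT header is just base64url-encoded JSON, so it can be read without verifying anything. This is a minimal Python sketch using only the standard library; the function names are my own and are not anything from Ceph.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding that JWTs strip."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def jwt_header(token: str) -> dict:
    """Return the unverified header of a compact JWT (header.payload.signature)."""
    header_segment = token.split(".")[0]
    return json.loads(b64url_decode(header_segment))
```

Calling jwt_header(token).get("kid") on a token from Azure AD should give the kid to look for in the jwks_uri content. Nothing here checks the signature, so the result is only a hint about which key to try.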

I suppose what I could do as a workaround is try to figure out whether Azure AD 
is consistently using the same kid to sign the JWTs for me, and then only 
register that thumbprint with Ceph. Then, Ceph would actually choose the 
correct certificate (as the others wouldn't match a thumbprint I registered). I 
may try this – in part, just to verify what I think is happening. But it would 
be awfully fragile – I don't believe anything requires the issuer to keep 
signing with just one of the keys listed in the jwks_uri content.
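
For that workaround, it would help to know which thumbprint belongs to which kid. Here is a rough Python sketch, assuming the thumbprint Ceph compares against is the hex SHA-1 digest of the DER-encoded certificate in each key's x5c entry (that is my reading of the AWS-style convention, and I have not confirmed it against the Ceph code). The function name is hypothetical.

```python
import base64
import hashlib


def thumbprints_by_kid(jwks: dict) -> dict:
    """Map each kid in a JWKS document to the SHA-1 thumbprint of its first x5c cert."""
    result = {}
    for key in jwks.get("keys", []):
        x5c = key.get("x5c")
        if not x5c:
            continue  # key published without a certificate chain
        # x5c entries are standard base64 (not base64url) of the DER certificate
        der = base64.b64decode(x5c[0])
        result[key.get("kid")] = hashlib.sha1(der).hexdigest().upper()
    return result
```

Feeding it the parsed JSON from the jwks_uri and looking up the kid from the JWT header would give the single thumbprint to register.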

An alternative would be to try rewriting the code to apply a different kind of 
logic. The way it ought to work (it seems to me) is something like this:


  * Get the openid-configuration, and get the jwks stuff from the jwks_uri
    (which Ceph does already).
  * Look at the header of the JWT to see which kid it purports to be signed by.
  * Find the certificate that corresponds to that kid (from the jwks_uri
    content).
  * Validate the JWT with that certificate.

That ought to work, at least given what I'm seeing. (But, I'm not a JWT expert, 
so I don't know whether there is something unusual in how Azure AD generates 
JWTs and handles the jwks_uri content).
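
The last two steps of the list above could be sketched roughly like this in Python. It assumes the kid has already been read from the JWT header and the jwks_uri content has been parsed as JSON. The verify callback is a hypothetical stand-in for whatever actually checks the signature (jwt::verify in Ceph's case); all names here are mine.

```python
from typing import Callable, Optional


def find_jwk(jwks: dict, kid: str) -> Optional[dict]:
    """Return the key entry whose kid matches, or None if there is no match."""
    for key in jwks.get("keys", []):
        if key.get("kid") == kid:
            return key
    return None


def verify_with_matching_key(
    token: str,
    kid: str,
    jwks: dict,
    verify: Callable[[str, str], bool],
) -> bool:
    """Verify the token only against the certificate named by its kid."""
    key = find_jwk(jwks, kid)
    if key is None or not key.get("x5c"):
        return False  # unknown kid, or key published without a certificate
    # Hand the first x5c certificate (base64 DER) to the hypothetical verifier
    return verify(token, key["x5c"][0])
```

The thumbprint check that Ceph does today would presumably still apply to the selected certificate; the only change is which certificate gets selected.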

Anyway, I'm curious whether anyone else has been trying to get this to work 
with Azure AD, and whether they have run into similar problems. And, of course, 
whether I appear to be misunderstanding anything about how this is supposed to 
work.


Ryan Rempel

Director of Information Technology

Canadian Mennonite University