Hi Akira,

Thank you for the information. Knox plays a central role as the reverse proxy for a Hadoop cluster, and I understand the importance of keeping Knox running to centralize audit logging for ingress into the cluster. Other reverse proxy solutions like Nginx are more feature rich for caching static content and load balancing, so it would be great to have the ability to use either Knox or Nginx as the reverse proxy. A company-wide OIDC provider is likely to run independently from the Hadoop cluster, but it could also run inside one. Either way, the reverse proxy must be able to redirect to the OIDC provider where the exposed endpoint is appropriate.
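To illustrate the redirect behavior I have in mind, here is a minimal, hypothetical sketch of what either proxy would need to do, shown as a plain servlet filter for illustration only. The provider URL, client id, callback path, and class name are placeholders I made up; this is not existing Hadoop or Knox code:

import java.io.IOException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical filter: requests without an authenticated session are sent to
// the OIDC provider's authorization endpoint (authorization code flow).
public class OidcRedirectFilter implements Filter {

  // Placeholder values; in practice these would come from gateway configuration.
  private static final String AUTHORIZE_URL = "https://sso.example.com/oauth2/authorize";
  private static final String CLIENT_ID = "hadoop-gateway";
  private static final String CALLBACK_URL = "https://gateway.example.com/oidc/callback";

  @Override
  public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
      throws IOException, ServletException {
    HttpServletRequest request = (HttpServletRequest) req;
    HttpServletResponse response = (HttpServletResponse) resp;
    if (request.getSession(false) == null) {
      // Standard OIDC authorization code flow parameters.
      String location = AUTHORIZE_URL
          + "?response_type=code"
          + "&scope=openid"
          + "&client_id=" + URLEncoder.encode(CLIENT_ID, StandardCharsets.UTF_8)
          + "&redirect_uri=" + URLEncoder.encode(CALLBACK_URL, StandardCharsets.UTF_8)
          + "&state=" + URLEncoder.encode(request.getRequestURI(), StandardCharsets.UTF_8);
      response.sendRedirect(location);
      return;
    }
    chain.doFilter(req, resp);
  }

  @Override
  public void init(FilterConfig config) {
  }

  @Override
  public void destroy() {
  }
}

In practice the proxy should also remember the original request URL so the user lands back where they started after the provider redirects to the callback.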
HADOOP-11717 was a good effort to enable SSO integration, except that it
is written as an extension of Kerberos authentication, which prevents
decoupling from Kerberos from becoming a reality. I gathered a few design
requirements this morning, and contributions are welcome:

1. Encryption is mandatory. Server certificate validation is required.
2. The existing token infrastructure for block access tokens remains the
same.
3. Replace the delegation token transport with OIDC JWT tokens.
4. Patch the token renewer logic to renew tokens against the OIDC endpoint
before they expire (a rough sketch is appended below the quoted thread).
5. Impersonation logic uses service user credentials. We need a new way to
renew service user credentials securely.
6. Replace the Hadoop RPC SASL transport with TLS, because OIDC works with
TLS natively.
7. CLI improvements to use environment variables or files for supplying
client credentials.
8. Downgrade the use of UGI.doAs() to be private to Hadoop. Services should
not run with elevated privileges unless there is a good reason for it
(e.g. loading Hive external tables).

I think this is a good starting point, and feedback can help turn these
requirements into tasks. Let me know what you think.

Thanks and regards,
Eric

On Tue, May 19, 2020 at 9:47 PM Akira Ajisaka <aajis...@apache.org> wrote:

> Hi Eric, thank you for starting the discussion.
>
> I'm interested in OpenID Connect (OIDC) integration.
>
> In addition to the benefits (security, cloud native), operating costs may
> be reduced in some companies.
> We have a company-wide OIDC provider and enable SSO for Hadoop Web UIs
> via Knox + OIDC at Yahoo! JAPAN.
> On the other hand, Hadoop administrators have to manage our own KDC
> servers only for the Hadoop ecosystem.
> If Hadoop and its ecosystem can support OIDC, we won't have to manage the
> KDC, and operating costs will be reduced.
>
> Regards,
> Akira
>
> On Thu, May 7, 2020 at 7:32 AM Eric Yang <eric...@gmail.com> wrote:
>
>> Hi all,
>>
>> Kerberos was developed decades before web development became popular.
>> There are some Kerberos limitations which do not work well in Hadoop. A
>> few examples of corner cases:
>>
>> 1. A Kerberos principal doesn't encode a port number, so it is difficult
>> to know whether the principal is coming from an authorized daemon or from
>> a hacker's container trying to forge a service principal.
>> 2. Hadoop Kerberos principals are used as highly privileged principals, a
>> form of credential to impersonate end users.
>> 3. Delegation tokens may allow expired users to continue to run jobs long
>> after they are gone, without rechecking whether the end user's credentials
>> are still valid.
>> 4. Passing different forms of tokens does not work well with cloud
>> provider security mechanisms. For example, when passing an AWS STS token
>> for an S3 bucket, there is no renewal mechanism, nor a good way to
>> identify when the token will expire.
>>
>> There are companies that work on bridging security mechanisms of
>> different types, but this is not a primary goal for Hadoop. Hadoop can
>> benefit from modernized security using open standards like OpenID
>> Connect, which proposes to unify web applications using SSO. This ensures
>> that client credentials are transported at each stage of client-server
>> interaction. This may improve overall security and provide a more
>> cloud-native form factor. I wonder if there is any interest in the
>> community in enabling Hadoop OpenID Connect integration work?
>>
>> regards,
>> Eric
>>
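P.S. The rough sketch mentioned in requirement 4 above: a minimal, hypothetical client-side renewer that exchanges a refresh token at the provider's token endpoint before the access token expires (standard OAuth2/OIDC refresh_token grant). The endpoint URL, client id, and class name are assumptions for illustration, not existing Hadoop code:

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical renewer: shortly before the current access token expires,
// exchange the refresh token at the OIDC token endpoint for a new one.
public class OidcTokenRenewer {

  // Placeholder values; in practice these would come from client configuration.
  private static final String TOKEN_URL = "https://sso.example.com/oauth2/token";
  private static final String CLIENT_ID = "hadoop-client";

  private final HttpClient http = HttpClient.newHttpClient();

  // Returns the raw JSON response, which carries a new access_token and expires_in.
  public String renew(String refreshToken) throws Exception {
    String form = "grant_type=refresh_token"
        + "&client_id=" + URLEncoder.encode(CLIENT_ID, StandardCharsets.UTF_8)
        + "&refresh_token=" + URLEncoder.encode(refreshToken, StandardCharsets.UTF_8);
    HttpRequest request = HttpRequest.newBuilder(URI.create(TOKEN_URL))
        .header("Content-Type", "application/x-www-form-urlencoded")
        .POST(HttpRequest.BodyPublishers.ofString(form))
        .build();
    HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
    if (response.statusCode() != 200) {
      throw new IllegalStateException("Token renewal failed: HTTP " + response.statusCode());
    }
    return response.body();
  }
}

In Hadoop this logic would presumably hook into the existing token renewer machinery rather than live in a standalone class, but the shape of the exchange would be the same.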