Hi Akira,

Thank you for the information.  Knox plays a central role as a reverse proxy
for the Hadoop cluster, and I understand the importance of keeping Knox
running to centralize the audit log for ingress into the cluster.  Other
reverse proxy solutions like Nginx are more feature rich for caching static
content and load balancing.  It would be great to have the ability to use
either Knox or Nginx as the reverse proxy.  A company-wide OIDC provider is
likely to run independently from the Hadoop cluster, but it could also run
inside one.  The reverse proxy must be able to redirect to the OIDC provider
wherever an appropriate endpoint is exposed, as in the sketch below.
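
For illustration only, here is a minimal sketch of that redirect as a servlet
filter; the IdP URL (idp.example.com), client id, and class name are all
assumptions, not existing Hadoop or Knox code.

import java.io.IOException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class OidcRedirectFilter implements Filter {
  // Assumed company-wide OIDC provider; not a real endpoint.
  private static final String AUTHORIZE_URL = "https://idp.example.com/authorize";

  @Override
  public void init(FilterConfig config) {}

  @Override
  public void doFilter(ServletRequest req, ServletResponse resp,
      FilterChain chain) throws IOException, ServletException {
    HttpServletRequest request = (HttpServletRequest) req;
    HttpServletResponse response = (HttpServletResponse) resp;
    if (request.getHeader("Authorization") == null) {
      // No credential presented: start the OIDC authorization code flow.
      String redirect = AUTHORIZE_URL
          + "?response_type=code&client_id=hadoop-ui"
          + "&redirect_uri=" + URLEncoder.encode(
              request.getRequestURL().toString(), StandardCharsets.UTF_8);
      response.sendRedirect(redirect);
      return;
    }
    chain.doFilter(req, resp);
  }

  @Override
  public void destroy() {}
}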

HADOOP-11717 was a good effort to enable SSO integration, except that it is
written as an extension of Kerberos authentication, which keeps decoupling
from Kerberos from becoming a reality.  I gathered a few design requirements
this morning, and contributions are welcome:

1.  Encryption is mandatory.  Server certificate validation is required.
2.  The existing token infrastructure for block access tokens remains the
same.
3.  Replace the delegation token transport with an OIDC JWT token.
4.  Patch the token renewer logic to renew tokens against the OIDC endpoint
before they expire (see the sketch after this list).
5.  Impersonation logic uses service user credentials.  We need a new way to
renew service user credentials securely.
6.  Replace the Hadoop RPC SASL transport with TLS, because OIDC works with
TLS natively.
7.  CLI improvements to read client credentials from environment variables
or files.
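
To make requirement 4 concrete, here is a minimal sketch of renewing a token
against a standard OIDC token endpoint with the refresh_token grant.  The
endpoint URL and client id are assumptions, JSON parsing and error handling
are elided, and using HTTPS gives the server certificate validation from
requirement 1.  For requirement 7, the CLI could supply clientId and
refreshToken from environment variables or files rather than command line
arguments.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OidcTokenRenewer {
  // Assumed OIDC provider token endpoint; not a real service.
  private static final URI TOKEN_ENDPOINT =
      URI.create("https://idp.example.com/oauth2/token");

  /** Exchanges a refresh token for a new access token (JWT) before expiry. */
  public static String renew(String clientId, String refreshToken)
      throws Exception {
    // A real client would URL-encode the form values.
    String form = "grant_type=refresh_token"
        + "&client_id=" + clientId
        + "&refresh_token=" + refreshToken;
    HttpRequest request = HttpRequest.newBuilder(TOKEN_ENDPOINT)
        .header("Content-Type", "application/x-www-form-urlencoded")
        .POST(HttpRequest.BodyPublishers.ofString(form))
        .build();
    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());
    // The JSON body carries the new access_token and expires_in; a renewer
    // thread would parse it and schedule the next renewal before expiry.
    return response.body();
  }
}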

In addition, downgrade UGI.doAs() to a Hadoop-private API, as illustrated
below.  A service should not run with elevated privileges unless there is a
good reason for it (e.g. loading Hive external tables).
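
For context, this is the kind of proxy-user call that would become
Hadoop-private; createProxyUser() and doAs() are existing UGI APIs, while the
user name and path below are made up for illustration.

import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUserExample {
  public static void main(String[] args) throws Exception {
    // Service credentials (for example a keytab login) become the real user.
    UserGroupInformation service = UserGroupInformation.getLoginUser();
    // The service re-executes work as the end user.  Under this proposal,
    // such elevation would no longer be available to code outside Hadoop.
    UserGroupInformation proxy =
        UserGroupInformation.createProxyUser("alice", service);
    proxy.doAs((PrivilegedExceptionAction<Void>) () -> {
      FileSystem fs = FileSystem.get(new Configuration());
      fs.listStatus(new Path("/user/alice"));
      return null;
    });
  }
}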

I think this is a good starting point, and feedback can help turn these
requirements into tasks.  Let me know what you think.  Thanks

regards,
Eric

On Tue, May 19, 2020 at 9:47 PM Akira Ajisaka <aajis...@apache.org> wrote:

> Hi Eric, thank you for starting the discussion.
>
> I'm interested in OpenID Connect (OIDC) integration.
>
> In addition to the benefits (security, cloud native), operating costs may
> be reduced in some companies.
> At Yahoo! JAPAN we have a company-wide OIDC provider and enable SSO for
> Hadoop Web UIs via Knox + OIDC.
> On the other hand, Hadoop administrators have to manage our own KDC
> servers only for the Hadoop ecosystem.
> If Hadoop and its ecosystem can support OIDC, we won't have to manage a
> KDC, and operating costs will be reduced.
>
> Regards,
> Akira
>
> On Thu, May 7, 2020 at 7:32 AM Eric Yang <eric...@gmail.com> wrote:
>
>> Hi all,
>>
>> Kerberos was developed decades before web development became popular.
>> There are some Kerberos limitations that do not work well in Hadoop.  A
>> few examples of corner cases:
>>
>> 1. A Kerberos principal doesn't encode a port number, so it is difficult
>> to know whether the principal is coming from an authorized daemon or from
>> a malicious container trying to forge a service principal.
>> 2. Hadoop Kerberos principals are used as highly privileged principals, a
>> form of credential for impersonating end users.
>> 3. Delegation tokens may allow departed users to continue to run jobs long
>> after they are gone, without rechecking whether the end user's credentials
>> are still valid.
>> 4. Passing different forms of tokens does not work well with cloud
>> provider security mechanisms.  For example, when passing an AWS STS token
>> for an S3 bucket, there is no renewal mechanism, nor a good way to
>> identify when the token would expire.
>>
>> There are companies that work on bridging security mechanisms of
>> different types, but this is not a primary goal for Hadoop.  Hadoop can
>> benefit from modernized security using open standards like OpenID
>> Connect, which proposes to unify web applications using SSO.  This
>> ensures that client credentials are transported at each stage of the
>> client-server interaction.  This may improve overall security and provide
>> a more cloud-native form factor.  I wonder if there is any interest in
>> the community in enabling Hadoop OpenID Connect integration work?
>>
>> regards,
>> Eric
>>
>
