See my comments inline:

On Wed, May 20, 2020 at 4:50 PM Rajive Chittajallu <raj...@ieee.org> wrote:

> On Wed, May 20, 2020 at 1:47 PM Eric Yang <eric...@gmail.com> wrote:
> >
> >> > Kerberos was developed a decade before web development became
> >> > popular.  There are some Kerberos limitations which do not work
> >> > well in Hadoop.  A few examples of corner cases:
> >>
> >> Microsoft Active Directory, which is extensively used in many
> >> organizations, is based on Kerberos.
> >
> >
> > True, but with the rise of Google and AWS, OIDC seems to be a
> > formidable standard that can replace Kerberos for authentication.  I
> > think providing an option for the new standard is good for Hadoop.
> >
>
> I think you are referring to OAuth2, and adoption varies significantly
> across vendors. When one refers to Kerberos, it's mostly about MIT
> Kerberos or Microsoft Active Directory. But OAuth2 is a specification;
> implementations vary and are quite prone to bugs. I would be very
> careful about making a generic statement like "formidable standard".
>
> AWS services, at least in the context of data processing / analytics,
> do not support OAuth2. It's more of a GCP thing. AWS uses signed
> requests [1].
>
> [1] https://docs.aws.amazon.com/general/latest/gr/signature-version-4.html


Kerberos is a protocol for authentication.  OIDC is also an authentication
protocol.  MIT Kerberos and OAuth2 are frameworks, not authentication
protocols.  By no means am I suggesting that we adopt an OAuth2 framework,
because implementing according to the protocol spec is better than
hard-wiring to certain libraries.  We can adopt an existing OIDC library
like pac4j to reduce the maintenance cost of implementing the OIDC
protocol in Hadoop.  AWS has been offering OIDC authentication for EKS and
as an IAM identity provider.  By offering native OIDC support, Hadoop
would be able to access cloud services that are secured by OIDC more
easily.
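
For illustration, a minimal sketch of what wiring up pac4j's OIDC client
could look like (the discovery URI, client id, and secret below are
hypothetical placeholders, not a proposed Hadoop API):

    import org.pac4j.oidc.client.OidcClient;
    import org.pac4j.oidc.config.OidcConfiguration;

    // Configure the client from the IdP's discovery document; pac4j
    // handles the authorization code flow, token validation, and JWKS
    // signature checks based on this metadata.
    OidcConfiguration config = new OidcConfiguration();
    config.setClientId("hadoop-services");        // hypothetical client id
    config.setSecret("change-me");                // hypothetical secret
    config.setDiscoveryURI(
        "https://idp.example.com/.well-known/openid-configuration");
    OidcClient client = new OidcClient(config);

The point is that the protocol logic (discovery, token exchange, signature
verification) stays in the library, so Hadoop would only need to map the
resulting profile to its own user identity.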


>
>
>>
> >> > 1. A Kerberos principal doesn't encode a port number, so it is
> >> > difficult to know if the principal is coming from an authorized
> >> > daemon or a hacker container trying to forge a service principal.
> >>
> >> Clients use ephemeral ports. Not sure of the relevance of this
> >> statement.
> >
> > Hint: CVE-2020-9492
> >
>
> It's a reserved one. You can help the conversation by describing a
> threat model.
>

The Hadoop security mailing list has the problem listed, if you are
interested in this area.  Hadoop's Kerberos security quirks are off topic
for decoupling Kerberos from Hadoop.


> >> > 2. Hadoop Kerberos principals are used as high-privileged
> >> > principals, a form of credential to impersonate the end user.
> >>
> >> Principals are identities of the user. You can make identities fully
> >> qualified, to include the issuing authority if you want to. This is
> >> not Kerberos specific.
> >>
> >> Remember, Kerberos is an authentication mechanism; how those
> >> assertions are translated to authorization rules is application
> >> specific.
> >>
> >> Probably consider alternatives to auth_to_local rules.
> >
> >
> > Trust must be validated.  Hadoop Kerberos principals for services that
> > can perform impersonation are equal to root power.  Transporting root
> > power securely without it being intercepted is quite difficult when
> > services are running as root instead of daemons.  There is an
> > alternate solution: always forward a signed end-user token, so there
> > is no need to validate the proxy user credential.  The downside of
> > forwarding signed tokens is that it is difficult to forward multiple
> > tokens of incompatible security mechanisms, because the renewal
> > mechanism and expiration time may not be deciphered by the transport
> > mechanism.  This is the reason that using an SSO token is a good way
> > to ensure every library and framework abides by the same security
> > practice, to eliminate confused deputy problems.
>
> Trust of what? Service principals should not be used for
> authentication in a client context; they are there for server
> identification.


The trust refers to a service (Oozie/Hive) impersonating an end user: the
namenode issues a delegation token after checking the proxy user ACL.  The
credential presented to the namenode is the service's TGT, not the end
user's TGT.  The service TGT is validated against the proxy user ACL by
the namenode to allow the impersonation to happen.  If the service TGT is
intercepted due to a lack of encryption in the RPC or HTTP transport, the
service ticket is vulnerable to a replay attack.
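
For readers less familiar with the flow, this is roughly how a service
impersonates an end user through Hadoop's UserGroupInformation API (a
minimal sketch; the user name and path are hypothetical):

    import java.security.PrivilegedExceptionAction;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    // The service logs in with its own Kerberos credentials (its TGT),
    // then wraps the end-user identity around that real user.
    UserGroupInformation service = UserGroupInformation.getLoginUser();
    UserGroupInformation proxy =
        UserGroupInformation.createProxyUser("alice", service);

    // Calls inside doAs() run as the end user; the namenode authorizes
    // the impersonation against the proxy user ACL for the service.
    proxy.doAs((PrivilegedExceptionAction<Void>) () -> {
      FileSystem fs = FileSystem.get(new Configuration());
      fs.listStatus(new Path("/user/alice"));
      return null;
    });

Anyone who can replay the service's credential gets this same
impersonation power, which is why the transport must be encrypted.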


>
>
> OAuth2 (which the OIDC flow is based on) suggests JWTs, which are
> signed tokens. Can you elaborate more on what you mean by "SSO Token"?


The SSO token is a JWT in this context.  My advice is that only one token
should be transported, instead of multiple tokens, to prevent the problem
of out-of-sync expiration dates across multiple tokens.
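
As an illustration, checking a JWT's expiration claim with the Nimbus
JOSE+JWT library looks roughly like this (a sketch only; signature
verification against the IdP's JWKS is omitted, and serializedToken is a
hypothetical variable):

    import java.util.Date;
    import com.nimbusds.jwt.SignedJWT;

    // Parse the compact-serialized JWT and read its "exp" claim.  A
    // real deployment must verify the signature before trusting claims.
    SignedJWT jwt = SignedJWT.parse(serializedToken);
    Date exp = jwt.getJWTClaimsSet().getExpirationTime();
    if (exp == null || exp.before(new Date())) {
      throw new SecurityException("SSO token expired");
    }

With a single token there is exactly one exp claim to renew and enforce,
rather than several renewal timelines to keep in sync.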


> To improve security for doAs use cases, add context to the calls. Just
> replacing Kerberos with a different authentication mechanism is not
> going to solve the problem.


The focus is to support alternate security mechanisms that may have been
chosen by other companies.  This is not strictly about solving any doAs
problem, but it is worth considering the impact on Hadoop's proxy user
implementation.

> And how to improve proxy user use cases varies by application.
> Asserting an 'on-behalf-of' action when there is an active client on
> the other end (e.g. HDFS proxy) would be different from one that is
> initiated on a schedule, e.g. Oozie.


I don't agree that doAs is any different between HDFS proxy and Oozie.
They are both using impersonation power and behaving like root programs.
As a result, they must be treated as root programs, with extra effort to
secure all entry points to avoid security mistakes.
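
On the server side, Hadoop already centralizes that check in the proxy
user ACL; a minimal sketch of the kind of validation the namenode
performs, using Hadoop's ProxyUsers helper (the client address below is a
hypothetical placeholder):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.security.UserGroupInformation;
    import org.apache.hadoop.security.authorize.AuthorizationException;
    import org.apache.hadoop.security.authorize.ProxyUsers;

    // Load the hadoop.proxyuser.<service>.hosts/.groups settings.
    Configuration conf = new Configuration();
    ProxyUsers.refreshSuperUserGroupsConfiguration(conf);

    // ugi is the effective (proxy) user whose real user is the service;
    // authorize() throws unless that service may impersonate this user
    // from this client address.
    UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
    try {
      ProxyUsers.authorize(ugi, "10.0.0.5");
    } catch (AuthorizationException e) {
      // Treat as an unauthorized impersonation attempt.
      throw e;
    }

Every entry point that accepts an impersonated call has to run an
equivalent check, which is exactly the root-program discipline described
above.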


>

> >>
> >> > 3. Delegation tokens may allow expired users to continue to run
> >> > jobs long after they are gone, without rechecking if the end
> >> > user's credentials are still valid.
> >>
> >> Delegation tokens are a Hadoop-specific implementation, whose
> >> lifecycle is outside the scope of Kerberos. Hadoop (NN/RM) can
> >> periodically check the respective IdP policy and revoke tokens, or
> >> have a central token management service, similar to KMS.
> >>
> >> > 4. Passing different forms of tokens does not work well with cloud
> >> > provider security mechanisms.  For example, passing an AWS STS
> >> > token for an S3 bucket: there is no renewal mechanism, nor a good
> >> > way to identify when the token would expire.
> >>
> >> This is outside the scope of Kerberos.
> >>
> >> Assuming you are using YARN, making the RM handle S3 temporary
> >> credentials, similar to HDFS delegation tokens, is something to
> >> consider.
> >>
> >> > There are companies that work on bridging security mechanisms of
> >> > different types, but this is not a primary goal for Hadoop.  Hadoop
> >> > can benefit from modernized security using open standards like
> >> > OpenID Connect, which proposes to unify web applications using SSO.
> >> > This ensures the client credentials are transported in each stage
> >> > of client-server interaction.  This may improve overall security,
> >> > and provide a more cloud-native form factor.  I wonder if there is
> >> > any interest in the community to enable Hadoop OpenID Connect
> >> > integration work?
> >>
> >> End-to-end identity assertion is something Kerberos in itself does
> >> not address. But any implementation should not pass "credentials".
> >> We need a way to pass signed requests that can be verified along the
> >> chain.
> >
> >
> > We agree on this, and OIDC seems like a good option to pass signed
> > requests and verify the signed token.
> >
> >>
> >> >
> >> > regards,
> >> > Eric
>
