Hi Tien -

Apache Knox sounds like exactly what you need here.
Let me explain a bit about how Knox fits into the Hadoop ecosystem.

Apache Hadoop established an integration pattern, called proxyuser or
Trusted Proxy [1], that is used across the ecosystem of related projects.
This pattern allows specific processes/services to make requests on
behalf of other end users.
These trusted proxy services establish a trust relationship with the
backend services through a combination of:
* Kerberos for strong authentication, which determines the identity of
the trusted service
* doAs/impersonation - typically a doAs query param sent to the backend
service by the trusted proxy
* configuration that dictates which hosts trusted proxies are expected to
call from and which users the trusted service is allowed to impersonate
(see the core-site.xml sketch below)
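
For illustration, the proxyuser entries in core-site.xml typically look
something like the following - this assumes the Knox service runs as the
'knox' user; the host and group values here are just placeholders:

  <property>
    <name>hadoop.proxyuser.knox.hosts</name>
    <value>knox-host.example.com</value>
  </property>
  <property>
    <name>hadoop.proxyuser.knox.groups</name>
    <value>users</value>
  </property>

With that in place, the backend will accept doAs requests from the knox
user, but only from that host and only for end users in the 'users' group.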

Now, a high-level view of Knox in this context - we'll use WebHDFS as the
example backend service:
* an end user makes a curl call to WebHDFS through Knox:
  curl -ivku guest:guest-password \
    'https://localhost:8443/gateway/sandbox/webhdfs/v1/tmp?op=LISTSTATUS'
* Knox exposes an endpoint that is configured via a Knox topology -
sandbox.xml in this example - and that topology is configured to support
HTTP Basic Auth (a trimmed topology sketch follows this list)
* Knox authenticates the user via the ShiroProvider authentication
provider and establishes a Java Subject security context for processing
the request internally
* The request flows through the provider chain, which enforces whatever
security policies are configured: authorization checks, identity
assertion, etc.
* The last provider is the dispatch to the backend service - essentially
an HTTP client that Knox uses to interact with that service
* The WebHDFS dispatch takes the authenticated username, sets it as the
doAs query param on the outgoing request, and dispatches the client's
original request with that param added (see the example after this list)
* The WebHDFS endpoint will issue a Kerberos challenge to its client
(Knox, in this case), and Knox will authenticate as its own identity via
Kerberos/SPNEGO
* WebHDFS will note that there is a doAs query param, that the Knox
identity is indeed a trusted proxy, that the request is coming from an
expected host, and that impersonation is allowed for the user asserted
via doAs.
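
For reference, here is a trimmed sketch of what a topology like
sandbox.xml looks like - the Shiro params are abbreviated and the WebHDFS
URL is just a placeholder for your NameNode address:

  <topology>
    <gateway>
      <provider>
        <role>authentication</role>
        <name>ShiroProvider</name>
        <enabled>true</enabled>
        <param>
          <name>urls./**</name>
          <value>authcBasic</value>
        </param>
        ...
      </provider>
      ...
    </gateway>
    <service>
      <role>WEBHDFS</role>
      <url>http://namenode-host:50070/webhdfs</url>
    </service>
  </topology>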
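
So, for the curl call above, the request that actually arrives at WebHDFS
ends up looking something like this (again, namenode-host is just a
placeholder), with Knox's own SPNEGO credentials on the connection:

  http://namenode-host:50070/webhdfs/v1/tmp?op=LISTSTATUS&doAs=guest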

Hopefully that wasn't too much detail, and it proves helpful.

thanks,

--larry

[1] https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html


On Tue, Dec 1, 2020 at 2:23 PM Tien Dat PHAN <tphan....@gmail.com> wrote:

> Dear experts,
>
> We are recently starting to adopt Knox as the principal component for
> equipping our data processing cluster with a complete security layer.
>
> In fact, the situation is that, in our cluster, there are Apache
> components like Apache HBase and HDFS that play the role of our data
> processing backend.
> These components work perfectly with Kerberos authentication for access
> control.
> On the other hand, our frontend uses CAS for authenticating users (when
> they access the data stored in our cluster).
>
> We just wonder (sorry if this turns out to be a dumb question for you
> all) if the following scenario is possible:
> 1) A user accesses our web UI, inputting their username and password
> 2) CAS authenticates that username and password, and a token is stored
> in this session
> 3) We (somehow) convert this token into a Kerberos token, which will be
> passed to the backend API when querying data.
>
> The main concern is about step 3). The reason we think of this scenario
> is that we don't expect the users to log in one more time to create a
> Kerberos token (for backend access).
>
> Do you think this is a reasonable authentication setup? And if YES, do
> you think it is possible with help from the Knox API?
>
> Thank you in advance for your time and consideration.
>
> Best regards
> Tien Dat PHAN
>
