[
https://issues.apache.org/jira/browse/HADOOP-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642048#action_12642048
]
Doug Cutting commented on HADOOP-4348:
--------------------------------------
Sanjay> The best way to represent that service access is when a service proxy
object is created - e.g. when the connection is established.
A proxy is not bound to a single connection. Connections are retrieved from a
cache each time a call is made. Different proxies may share the same
connection, and a single proxy may use different connections for different calls.
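The pooling Doug describes can be sketched roughly as follows. This is a hypothetical illustration, not the actual org.apache.hadoop.ipc.Client code; the names ConnectionCache and getConnection are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: connections are pooled in a cache keyed by remote
// address and protocol, so no proxy owns a connection outright. Strings stand
// in for real connection objects to keep the sketch self-contained.
public class ConnectionCache {
    private final Map<String, String> connections = new HashMap<>();

    // Every call fetches a connection from the cache, creating one on demand.
    // Two proxies asking for the same (address, protocol) share one connection.
    public String getConnection(String remoteAddress, String protocol) {
        String key = remoteAddress + "/" + protocol;
        return connections.computeIfAbsent(key, k -> "connection-to-" + k);
    }

    public static void main(String[] args) {
        ConnectionCache cache = new ConnectionCache();
        String a = cache.getConnection("namenode:8020", "ClientProtocol");
        String b = cache.getConnection("namenode:8020", "ClientProtocol");
        System.out.println(a == b); // same cached object for both callers
    }
}
```

Because the cache, not the proxy, owns connections, any per-connection state (such as an authorization decision) is decoupled from the proxy's lifetime.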
Sanjay> We could share multiple service sessions in a single connection but
that complexity is not worth it.
It would be simpler to implement this way, not more complex. In HADOOP-4049 it
was considerably simpler to pass extra data by modifying the RPC code than
Client/Server. That's my primary motivation here: to keep the code simple. So
unless there's a reason why we must authorize per connection rather than per
request, it would be easier to authorize requests and would better
compartmentalize the code. There are some performance implications.
Authorizing per request will use fewer connections but perform more
authorizations. I don't know whether this is significant. I expect that ACLs
will be cached, and that authorization will not be too expensive, but that
remains to be seen. So performance may provide a motivation to authorize per
connection. But let's not prematurely optimize.
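Per-request authorization with cached ACLs might look roughly like this; all names here are illustrative assumptions, not code from the attached patch.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of per-request authorization. The ACL for each protocol
// is cached in memory, so the per-call cost is one hash lookup plus one
// set-membership test.
public class ServiceAuthorizer {
    // Protocol name -> principals allowed to call it.
    private final Map<String, Set<String>> aclByProtocol = new ConcurrentHashMap<>();

    public void setAcl(String protocol, Set<String> allowedUsers) {
        aclByProtocol.put(protocol, allowedUsers);
    }

    // Invoked once per RPC call rather than once per connection. For now only
    // the protocol name is examined; a method name could be checked in the
    // same place later.
    public boolean authorize(String user, String protocol) {
        Set<String> acl = aclByProtocol.get(protocol);
        return acl != null && acl.contains(user);
    }

    public static void main(String[] args) {
        ServiceAuthorizer auth = new ServiceAuthorizer();
        auth.setAcl("ClientProtocol", Set.of("alice", "bob"));
        System.out.println(auth.authorize("alice", "ClientProtocol")); // allowed
        System.out.println(auth.authorize("eve", "ClientProtocol"));   // denied
    }
}
```

The point of the sketch is the cost model: with the ACL held in memory, authorizing every request is cheap, which is why per-request checking may not be the performance problem it first appears to be.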
Sanjay> I see your argument to be equivalent to arguing against service level
authorization and that method level authorization is sufficient.
No, but we will probably eventually need method-level authorization too, and it
would be nice if whatever support we add now also helps then. If we do this in
RPC, then we can examine only the protocol name for now, and subsequently add
method-level authorization at the same place. So implementing
service-level authorization this way better prepares us for method-level
authorization.
Sanjay> Would you be happier if we created an intermediate layer, say
rpc-session, in between. I am not seriously suggesting we do that.
We have two layers today. We could add this at either layer. It would be
cleaner to add it only at one layer, not mixed between the two, as in the
current patch. It would be simpler to add it to the RPC layer, and I have yet
to hear a strong reason why that would be wrong. That's all I'm saying.
> Adding service-level authorization to Hadoop
> --------------------------------------------
>
> Key: HADOOP-4348
> URL: https://issues.apache.org/jira/browse/HADOOP-4348
> Project: Hadoop Core
> Issue Type: New Feature
> Reporter: Kan Zhang
> Assignee: Arun C Murthy
> Fix For: 0.20.0
>
> Attachments: HADOOP-4348_0_20081022.patch
>
>
> Service-level authorization is the initial checking done by a Hadoop service
> to find out if a connecting client is a pre-defined user of that service. If
> not, the connection or service request will be declined. This feature allows
> services to limit access to a clearly defined group of users. For example,
> service-level authorization allows "world-readable" files on an HDFS cluster
> to be readable only by the pre-defined users of that cluster, not by anyone
> who can connect to the cluster. It also allows an M/R cluster to define its
> group of users so that only those users can submit jobs to it.
> Here is an initial list of requirements I came up with.
> 1. Users of a cluster are defined by a flat list of usernames and groups.
> A client is a user of the cluster if and only if her username is listed in
> the flat list or one of her groups is explicitly listed in the flat list.
> Nested groups are not supported.
> 2. The flat list is stored in a conf file and pushed to every cluster
> node so that services can access it.
> 3. Services will monitor the conf file for modifications periodically
> (5-minute interval by default) and reload the list if needed.
> 4. Checking against the flat list is done as early as possible and before
> any other authorization checking. Both HDFS and M/R clusters will implement
> this feature.
> 5. This feature can be switched off and is off by default.
> I'm aware of interests in pulling user data from LDAP. For this JIRA, I
> suggest we implement it using a conf file. Additional data sources may be
> supported via new JIRAs.
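Requirements 1-3 above can be sketched as follows. This is a rough illustration under assumptions of my own (one username or group name per line in the conf file; the class and method names are invented), not the attached patch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the flat user list: entries loaded from a conf file
// (one username or group name per line, a format assumed here) and reloaded
// when the file's modification time changes.
public class FlatUserList {
    private final Path confFile;
    private final long checkIntervalMs;          // 5 minutes by default, per req. 3
    private Set<String> allowed = new HashSet<>();
    private long lastModified = -1;
    private long lastCheck = 0;                  // forces a load on first use

    public FlatUserList(Path confFile, long checkIntervalMs) {
        this.confFile = confFile;
        this.checkIntervalMs = checkIntervalMs;
    }

    // Requirement 1: a client is a user iff her username, or one of her
    // groups, is explicitly listed. Nested groups are not supported.
    public boolean isUser(String username, Set<String> groups) {
        maybeReload();
        if (allowed.contains(username)) {
            return true;
        }
        for (String group : groups) {
            if (allowed.contains(group)) {
                return true;
            }
        }
        return false;
    }

    // Requirement 3: poll the conf file's mtime at most once per interval and
    // reload the list only when the file has actually changed.
    private void maybeReload() {
        long now = System.currentTimeMillis();
        if (now - lastCheck < checkIntervalMs) {
            return;
        }
        lastCheck = now;
        try {
            long mtime = Files.getLastModifiedTime(confFile).toMillis();
            if (mtime != lastModified) {
                allowed = new HashSet<>(Files.readAllLines(confFile));
                lastModified = mtime;
            }
        } catch (IOException e) {
            // Keep the previously loaded list if the file is briefly unreadable.
        }
    }
}
```

Because the list is consulted in memory, the check can run early on every incoming request (requirement 4) without touching disk except at the reload interval.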
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.