Hi all, I wanted to start a (hopefully short) discussion around the treatment of the hadoop.job.ugi configuration in Hadoop 0.22 and beyond (as well as the secure 0.20 branch). In the current security implementation, the following incompatible changes have been made even for users who are sticking with "simple" security.
1) Groups resolution happens on the server side, where it used to happen on the client. Thus, all Hadoop users must exist on the NN/JT machines in order for group mapping to succeed (or the user must write a custom group mapper).

2) The hadoop.job.ugi parameter is ignored - instead the user has to use the new UGI.createRemoteUser("foo").doAs() API, even with simple security.

I'm curious whether the general user community feels these are acceptable breaking changes. The potential solutions I can see are:

For 1) Add a configuration option like hadoop.security.simple.groupmappinglocation -> "client" or "server". If it's set to "client", group mapping would continue to happen on the client side, as it does in prior versions.

For 2) If security is "simple", have the FileSystem and JobClient constructors check for this parameter. If it is set, and there is no Subject associated with the current AccessControlContext, wrap the creation of the RPC proxy in the appropriate doAs() call.

Although security is obviously an absolute necessity for many organizations, I know of a lot of people with small clusters and small teams who have no plans to deploy it. For those people, I imagine the backward-compatibility layer above could be very helpful as they adopt the next releases of Hadoop. If we don't want to support these options going forward, we can of course emit deprecation warnings while they are in effect and remove the compatibility layer in the next major release.

Any thoughts? Do people make use of the hadoop.job.ugi variable to such an extent that this breaking change would block your organization from upgrading?

Thanks
-Todd

--
Todd Lipcon
Software Engineer, Cloudera
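To make point 1) concrete: "client-side group mapping" in pre-security Hadoop amounted to resolving the user's Unix groups on whatever machine the client ran on. A rough, standalone sketch of that behavior (shelling out to `id -Gn`, much like a shell-based groups mapping would; the class and method names here are mine, not actual Hadoop code):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.List;

public class ClientSideGroups {
    // Hypothetical client-side group resolution: shell out to
    // `id -Gn <user>` on the local (client) machine, so the user
    // does not need to exist on the NN/JT hosts.
    static List<String> getGroups(String user) throws Exception {
        Process p = new ProcessBuilder("id", "-Gn", user).start();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line = r.readLine();
            p.waitFor();
            return line == null ? List.of()
                                : Arrays.asList(line.trim().split("\\s+"));
        }
    }

    public static void main(String[] args) throws Exception {
        // Resolve groups for the current local user; on any Unix
        // system this should yield at least one group.
        String user = System.getProperty("user.name");
        System.out.println(getGroups(user).size() > 0);
    }
}
```

With server-side mapping, the equivalent lookup runs on the NN/JT, which is exactly why the user must exist there; a "client" setting for the proposed option would keep the lookup local as above.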
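And to illustrate the compatibility check proposed for 2): the idea is that a constructor sees a configured user name, finds no Subject on the current AccessControlContext, and wraps the protected action in doAs() itself. A minimal plain-Java sketch using javax.security.auth.Subject directly (real Hadoop code would go through UserGroupInformation; the helper name and the use of an X500Principal as a stand-in identity are my inventions):

```java
import java.security.AccessController;
import java.security.PrivilegedAction;
import javax.security.auth.Subject;
import javax.security.auth.x500.X500Principal;

public class SimpleAuthCompat {
    // Hypothetical compatibility shim: if no Subject is associated
    // with the current AccessControlContext and a user name was
    // configured (e.g. via hadoop.job.ugi), run the action inside
    // a doAs() with a Subject built from that name.
    static <T> T runAsConfiguredUser(String configuredUser,
                                     PrivilegedAction<T> action) {
        Subject current = Subject.getSubject(AccessController.getContext());
        if (current != null || configuredUser == null) {
            return action.run();  // caller already has an identity, or nothing configured
        }
        Subject subject = new Subject();
        subject.getPrincipals().add(new X500Principal("CN=" + configuredUser));
        return Subject.doAs(subject, action);
    }

    public static void main(String[] args) {
        // Inside the wrapped action, the configured identity is visible
        // on the AccessControlContext.
        String name = runAsConfiguredUser("foo", () -> {
            Subject s = Subject.getSubject(AccessController.getContext());
            return s.getPrincipals().iterator().next().getName();
        });
        System.out.println(name);
    }
}
```

Callers that already use the new UGI.createRemoteUser("foo").doAs() API would take the first branch (a Subject is present), so the shim only kicks in for legacy configuration-driven clients.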