[
https://issues.apache.org/jira/browse/HADOOP-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644843#action_12644843
]
Chris Douglas commented on HADOOP-4575:
---------------------------------------
On core changes:
* There is a race condition in UserGroupInformationManager::getUgiForUser; the
critical sections protect the get and the put, but not the creation in between, so
simultaneous requests for an absent ugi will likely create duplicate entries, and
whichever thread acquires the lock around the put last wins the insert.
o.a.h.mapred.IndexCache uses a standard idiom that applies here. Alternatively,
simply making the method synchronized (a rough sketch follows) would perform
adequately and avoid considerable complication, particularly given the cleanup
semantics for this cache.
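For reference, a minimal sketch of the synchronized variant; the field and helper
names are illustrative, not taken from the patch:
{code}
// Sketch only: keep the lookup, creation, and insert under one lock so concurrent
// misses for the same user cannot race.
private final Map<String, UnixUserGroupInformation> ugiCache =
    new HashMap<String, UnixUserGroupInformation>();

public synchronized UnixUserGroupInformation getUgiForUser(String userName)
    throws IOException {
  UnixUserGroupInformation ugi = ugiCache.get(userName);
  if (ugi == null) {
    ugi = lookupUgi(userName);   // hypothetical helper that resolves the user's groups
    ugiCache.put(userName, ugi);
  }
  return ugi;
}
{code}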
* A cleanup thread is overkill. It should be sufficient to check the timestamp of
the item retrieved from the cache and discard it if it's too old, refreshing the ugi
in the cache; see the sketch below. If the size of the cache is a concern, setting a
threshold that triggers cleanup on insertion is a sound strategy. The ugi+timestamp
is almost certainly less than 100 bytes of memory, and closer to half that in most
cases. If the limit is only checked on insertion and entries aren't pushed out of
the cache otherwise, there's no risk of the degenerate case where the cache is
scanned on each access. Further, only valid users are inserted, so the DoS threat
is low.
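Something along these lines would cover both the expiry on access and the optional
size check on insertion (again, only a sketch; expiryMs, sizeThreshold, and the
helpers are made-up names):
{code}
// Sketch: store the insertion time with the ugi; refresh stale entries when they are
// looked up, and prune only when an insert pushes the cache past a threshold.
private static class CachedUgi {
  final UnixUserGroupInformation ugi;
  final long timestamp;
  CachedUgi(UnixUserGroupInformation ugi, long timestamp) {
    this.ugi = ugi;
    this.timestamp = timestamp;
  }
}

private final Map<String, CachedUgi> ugiCache = new HashMap<String, CachedUgi>();

public synchronized UnixUserGroupInformation getUgiForUser(String userName)
    throws IOException {
  final long now = System.currentTimeMillis();
  CachedUgi entry = ugiCache.get(userName);
  if (entry == null || now - entry.timestamp > expiryMs) {
    entry = new CachedUgi(lookupUgi(userName), now);   // refresh in place
    ugiCache.put(userName, entry);
    if (ugiCache.size() > sizeThreshold) {             // only checked on insertion
      pruneOlderThan(now - expiryMs);                   // hypothetical helper
    }
  }
  return entry.ugi;
}
{code}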
* Since there are no references to UserGroupInformationManager in core, perhaps
it can live with the proxy in contrib (including the changes to
hadoop-default.xml)
* I don't understand the changes to FileDataServlet. They introduce a race condition
in the jspHelper field initialization, which should be fixed. Is this working around
a static initialization order problem? If so, a proper pattern is preferred; see the
sketch below.
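If lazy initialization is the goal, the initialization-on-demand holder idiom is one
clean option (sketch only; the actual field and type in FileDataServlet may differ):
{code}
// Sketch: the JVM initializes the nested holder class exactly once, on first use,
// so no explicit locking or mutable static field is needed.
private static class JspHelperHolder {
  static final JspHelper INSTANCE = new JspHelper();
}

private static JspHelper jspHelper() {
  return JspHelperHolder.INSTANCE;
}
{code}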
* UnixUserGroupInformation::getUgiForUser is a dangerous method:
{code}
public static UnixUserGroupInformation getUgiForUser(String userName)
    throws IOException {
  String[] cmd = new String[] { "bash", "-c", "id -Gn " + userName };
  String[] groups = Shell.execCommand(cmd).split("\\s+");
  return new UnixUserGroupInformation(userName, groups);
}
{code}
Though the proxy does a fair amount of sanity checking and input sanitizing,
future callers may not be so careful (e.g. userName = {{"foo ; rm \-rf /"}}).
Further, the login functionality is more general, and a public static method on
UnixUGI will impede future design. As a separate issue, replacing
Shell::getGroupsCommand with "id -Gn" and the StringTokenizer with String::split
are sound improvements. For this issue, though I can see why this functionality is
associated with UnixUGI, looking up groups for arbitrary users belongs in the proxy
code and is not required in core. The method should also do some of its own sanity
checks; see the sketch below.
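As an illustration of the kind of checks meant here, a sketch that bypasses the
shell entirely and rejects anything that doesn't look like a plain account name (the
validation pattern is only an example):
{code}
// Sketch: exec "id" directly instead of via "bash -c", so the user name is never
// interpreted by a shell, and validate the name before running anything.
private static final Pattern SIMPLE_NAME =
    Pattern.compile("[A-Za-z0-9_][A-Za-z0-9_.-]*");

static String[] getGroupsForUser(String userName) throws IOException {
  if (userName == null || !SIMPLE_NAME.matcher(userName).matches()) {
    throw new IOException("Invalid user name: " + userName);
  }
  String[] cmd = new String[] { "id", "-Gn", userName };  // no shell involved
  return Shell.execCommand(cmd).trim().split("\\s+");
}
{code}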
* HftpFileSystem.ran should be {{final}}
On contrib:
* Why does ProxyHttpServer not extend or own an HttpServer? There is
considerable overlap in their interfaces and implementations.
* I don't understand why the proxy would return a status of
[402|http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.3] when the
client doesn't provide a cert. Is there a canonical response for this case?
* Given that authentication is bypassed for testing:
{noformat}
+ } else { // http request, set ugi for servlets, only for testing purposes
+ String ugi = rqst.getParameter("ugi");
+ rqst.setAttribute("authorized.ugi", new UnixUserGroupInformation(ugi
+ .split(",")));
+ }
{noformat}
if a non-SSL listener is attached, this should probably log at WARN level on init,
at least; see the sketch below.
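For instance, something like this at startup would do (sketch; the logger, class
name, and flag are illustrative):
{code}
// Sketch: warn loudly when an insecure listener is configured, since requests on it
// will trust whatever "ugi" parameter the client supplies.
private static final Log LOG = LogFactory.getLog(HdfsProxy.class);

private void warnIfInsecure(boolean hasNonSslListener) {
  if (hasNonSslListener) {
    LOG.warn("A non-SSL listener is enabled; the client-supplied ugi parameter "
        + "will be trusted. This configuration is intended for testing only.");
  }
}
{code}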
* Calling System.exit from createHdfsProxy is unnecessarily forceful; returning
null for success and throwing on failure is clearer. Similarly, the switch stmt
is less readable than humble if stmts.
> An independent HTTPS proxy for HDFS
> -----------------------------------
>
> Key: HADOOP-4575
> URL: https://issues.apache.org/jira/browse/HADOOP-4575
> Project: Hadoop Core
> Issue Type: New Feature
> Reporter: Kan Zhang
> Assignee: Kan Zhang
> Fix For: 0.20.0
>
> Attachments: 4575-1-contrib.patch, 4575-1-core.patch, 4575-1.patch,
> lib.tgz
>
>
> Currently, an HDFS cluster itself exposes an HTTP/HTTPS interface for reading
> files (i.e., HFTP/HSFTP). However, there is a need for an HTTPS proxy server
> running independently from the cluster and serving data to users who
> otherwise can't access the cluster directly. This proxy will authenticate its
> users via SSL certificates and implement the HSFTP interface for reading
> files.