[ https://issues.apache.org/jira/browse/HADOOP-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644843#action_12644843 ]

Chris Douglas commented on HADOOP-4575:
---------------------------------------

On core changes:
* There is a race condition in UserGroupInformationManager::getUgiForUser; the 
critical sections protect the get and the put individually, but not the 
compound check-then-insert, so simultaneous requests for an absent ugi will 
likely create duplicate entries, with the last thread to acquire the second 
lock winning. o.a.h.mapred.IndexCache uses a standard idiom that applies here. 
Alternatively, simply making the method synchronized would perform adequately 
and avoid considerable complication, particularly given the cleanup semantics 
for this cache; a sketch of the synchronized variant follows.
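For illustration, a minimal sketch of the synchronized variant; the class and 
member names are illustrative, not the actual UserGroupInformationManager 
members:
{code}
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

class UgiCache {
  private final Map<String, Object> cache = new HashMap<String, Object>();

  // One lock covers the whole get-check-put sequence, so two threads
  // racing on an absent user cannot both insert.
  synchronized Object getUgiForUser(String userName) throws IOException {
    Object ugi = cache.get(userName);
    if (ugi == null) {
      ugi = lookupUgi(userName); // expensive lookup, still under the lock
      cache.put(userName, ugi);
    }
    return ugi;
  }

  private Object lookupUgi(String userName) throws IOException {
    return new Object(); // stand-in for the real ugi/group resolution
  }
}
{code}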
* A cleanup thread is overkill. It should be sufficient to check the timestamp 
of the item coming out of the cache and discard it if it's too old (refreshing 
the ugi in the cache). If the size of the cache is a concern, setting a 
threshold that triggers cleanup on insertion is a sound strategy. The 
ugi+timestamp is almost certainly less than 100 bytes of memory, and closer to 
half that in most cases. If the limit is only checked on insertion and entries 
aren't pushed out of the cache, there's no risk of the degenerate case where 
the cache is scanned on every access. Further, only valid users are inserted, 
so the DoS threat is low. (A sketch follows.)
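A rough sketch of the lazy-expiry approach; the names, TTL, and threshold are 
assumptions, not values from the patch:
{code}
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class ExpiringUgiCache {
  private static final long TTL_MS = 4 * 60 * 60 * 1000L; // assumed TTL
  private static final int CLEANUP_THRESHOLD = 1000;      // assumed limit

  private static class Entry {
    final Object ugi;
    final long created;
    Entry(Object ugi, long created) { this.ugi = ugi; this.created = created; }
  }

  private final Map<String, Entry> cache = new HashMap<String, Entry>();

  synchronized Object get(String user) {
    long now = System.currentTimeMillis();
    Entry e = cache.get(user);
    if (e == null || now - e.created > TTL_MS) { // absent or stale: refresh
      if (cache.size() >= CLEANUP_THRESHOLD) {   // scan only on insertion
        evictExpired(now);
      }
      e = new Entry(lookupUgi(user), now);
      cache.put(user, e);
    }
    return e.ugi;
  }

  private void evictExpired(long now) {
    for (Iterator<Entry> it = cache.values().iterator(); it.hasNext();) {
      if (now - it.next().created > TTL_MS) {
        it.remove();
      }
    }
  }

  private Object lookupUgi(String user) {
    return new Object(); // stand-in for the real ugi lookup
  }
}
{code}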
* Since there are no references to UserGroupInformationManager in core, perhaps 
it can live with the proxy in contrib (including the changes to 
hadoop-default.xml).
* I don't understand the changes to FileDataServlet. They introduce a race 
condition in the jspHelper field initialization, which should be fixed. Is this 
getting around a static initialization order problem? If so, a proper pattern 
is preferred; one option is sketched below.
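One proper pattern here is the initialization-on-demand holder idiom; a sketch, 
with a stand-in for the real JspHelper:
{code}
class FileDataServletSketch {
  // The JVM initializes the holder class at most once, on first access,
  // so lazy initialization is thread-safe without explicit locking.
  private static class JspHelperHolder {
    static final Object INSTANCE = new Object(); // stand-in for new JspHelper()
  }

  static Object getJspHelper() {
    return JspHelperHolder.INSTANCE;
  }
}
{code}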
* UnixUserGroupInformation::getUgiForUser is a dangerous method:
{code}
  public static UnixUserGroupInformation getUgiForUser(String userName)
      throws IOException {
    String[] cmd = new String[] { "bash", "-c", "id -Gn " + userName };
    String[] groups = Shell.execCommand(cmd).split("\\s+");
    return new UnixUserGroupInformation(userName, groups);
  }
{code}
Though the proxy does a fair amount of sanity checking and input sanitizing, 
future callers may not be so careful (e.g. userName = {{"foo ; rm \-rf /"}}, 
which the {{bash -c}} invocation would happily hand to the shell). Further, the 
login functionality is more general, and a public static method on UnixUGI will 
impede future design. As a separate issue, replacing Shell::getGroupsCommand 
with "id -Gn" and the StringTokenizer with String::split are sound 
improvements. For this issue, though I can see why this functionality is 
associated with UnixUGI, looking up groups for arbitrary users belongs in the 
proxy code and is not required in core. The method should also do some of its 
own sanity checks; a hardened sketch follows.
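For comparison, a hardened sketch that skips the shell entirely and rejects 
suspect names; the whitelist pattern is my assumption, not part of the patch:
{code}
import java.io.IOException;
import java.util.regex.Pattern;

import org.apache.hadoop.util.Shell;

class GroupLookup {
  // Conservative whitelist for Unix user names; an assumption, tighten as needed.
  private static final Pattern SAFE_USER =
      Pattern.compile("[A-Za-z0-9_][A-Za-z0-9_.-]*");

  static String[] getGroups(String userName) throws IOException {
    if (userName == null || !SAFE_USER.matcher(userName).matches()) {
      throw new IOException("Rejecting suspect user name: " + userName);
    }
    // No "bash -c": userName is passed as a discrete argv element, so no
    // shell ever gets a chance to interpret it.
    String[] cmd = new String[] { "id", "-Gn", userName };
    return Shell.execCommand(cmd).trim().split("\\s+");
  }
}
{code}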
* HftpFileSystem.ran should be {{final}}.

On contrib:
* Why does ProxyHttpServer not extend or own an HttpServer? There is 
considerable overlap in their interfaces and implementations.
* I don't understand why the proxy would return a status of 
[402 (Payment Required)|http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.3] 
when the client doesn't provide a cert. Is there a canonical response for this 
case?
* Given that authentication is bypassed for testing:
{noformat}
+    } else { // http request, set ugi for servlets, only for testing purposes
+      String ugi = rqst.getParameter("ugi");
+      rqst.setAttribute("authorized.ugi", new UnixUserGroupInformation(ugi
+          .split(",")));
+    }
{noformat}
if a non-SSL listener is attached, this should probably log at WARN level, at 
least at init; see the sketch below.
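Something along these lines at startup would do; the flag name is hypothetical:
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

class ProxyListenerCheck {
  private static final Log LOG = LogFactory.getLog(ProxyListenerCheck.class);

  static void warnIfInsecure(boolean nonSslListenerAttached) {
    if (nonSslListenerAttached) { // hypothetical flag for the http listener
      LOG.warn("Non-SSL listener attached: the client-supplied ugi parameter"
          + " is trusted without authentication. For testing only.");
    }
  }
}
{code}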
* Calling System.exit from createHdfsProxy is unnecessarily forceful; returning 
null for success and throwing on failure is clearer, and it leaves the decision 
to exit with the caller. Similarly, the switch statement is less readable than 
humble if statements.
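Roughly, under an assumed option set:
{code}
import java.io.IOException;

class HdfsProxyLauncher {
  // Sketch: throw on failure instead of calling System.exit, and use
  // humble ifs instead of a switch. Option names are illustrative.
  static Object createHdfsProxy(String[] argv) throws IOException {
    if (argv == null || argv.length < 1) {
      throw new IOException("no arguments given");
    }
    if ("-start".equals(argv[0])) {
      return startProxy(); // stand-in for the real startup path
    }
    throw new IOException("unrecognized option: " + argv[0]);
  }

  private static Object startProxy() {
    return new Object();
  }
}
{code}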

> An independent HTTPS proxy for HDFS
> -----------------------------------
>
>                 Key: HADOOP-4575
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4575
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Kan Zhang
>            Assignee: Kan Zhang
>             Fix For: 0.20.0
>
>         Attachments: 4575-1-contrib.patch, 4575-1-core.patch, 4575-1.patch, 
> lib.tgz
>
>
> Currently, an HDFS cluster itself exposes an HTTP/HTTPS interface for reading 
> files (i.e., HFTP/HSFTP). However, there is a need for an HTTPS proxy server 
> running independently from the cluster and serving data to users who 
> otherwise can't access the cluster directly. This proxy will authenticate its 
> users via SSL certificates and implement the HSFTP interface for reading 
> files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
