[ https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455181#comment-13455181 ]
Todd Lipcon commented on HADOOP-8803:
-------------------------------------

Oh, I see. So, you'd modify getBlockLocations() so that it returns a block token which is byte-range restricted?

> Make Hadoop running more secure public cloud envrionment
> --------------------------------------------------------
>
>                 Key: HADOOP-8803
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8803
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs, ipc, security
>    Affects Versions: 0.20.204.0
>            Reporter: Xianqing Yu
>              Labels: hadoop
>   Original Estimate: 2m
>  Remaining Estimate: 2m
>
> I am a Ph.D. student at North Carolina State University. I am modifying
> Hadoop's code (including most parts of Hadoop, e.g. JobTracker,
> TaskTracker, NameNode, DataNode) to achieve better security.
>
> My main goal is to make Hadoop run more securely in cloud environments,
> especially public cloud environments. To achieve that, I have redesigned
> the current security mechanism to provide the following properties:
>
> 1. Bring byte-level access control to HDFS. As of 0.20.204, HDFS access
> control works at user or block granularity: the HDFS Delegation Token only
> checks whether a file can be accessed by a certain user, and the Block
> Token only proves which block or blocks can be accessed. I make Hadoop
> enforce byte-granularity access control, so that each accessing party,
> whether a user or a task process, can only access the bytes it actually
> needs.
>
> 2. I assume that in a public cloud environment, only the NameNode,
> secondary NameNode, and JobTracker can be trusted. A large number of
> DataNodes and TaskTrackers may be compromised, because some of them may be
> running in less secure environments. So I have redesigned the security
> mechanism to minimize the damage an attacker can do.
>
> a. Redesign the Block Access Token to solve the widely shared key problem
> of HDFS. In the original Block Access Token design, all HDFS nodes
> (NameNode and DataNodes) share one master key to generate Block Access
> Tokens. If one DataNode is compromised, the attacker can obtain the key
> and generate any Block Access Token he or she wants.
>
> b. Redesign the HDFS Delegation Token to provide fine-grained access
> control for TaskTracker and MapReduce Task processes on HDFS.
>
> In Hadoop 0.20.204, all TaskTrackers can use their Kerberos credentials to
> access any MapReduce files on HDFS. So they have the same privileges as
> the JobTracker to read or write tokens, copy job files, etc. However, if
> one of them is compromised, every critical item in the MapReduce directory
> (job file, Delegation Token) is exposed to the attacker. I solve the
> problem by having the JobTracker decide which TaskTracker can access which
> file in the MapReduce directory on HDFS.
>
> For a Task process, once it gets an HDFS Delegation Token, it can access
> everything belonging to that job or user on HDFS. In my design, it can
> only access the bytes it needs from HDFS.
>
> There are some other security improvements: for example, the TaskTracker
> cannot learn information such as the block ID from the Block Token
> (because the token is encrypted in my design), and HDFS can optionally set
> up a secure channel to send data.
>
> With these features, Hadoop can run much more securely in untrusted
> environments such as the public cloud. I have already started testing my
> prototype. I would like to know whether the community is interested in my
> work. Would it be a valuable contribution to production Hadoop?
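
To make the byte-range restricted block token from the comment, and the shared-key concern in item 2a, more concrete, here is a minimal sketch in plain Java. It is not the reporter's design and it does not use any Hadoop API: the class ByteRangeTokenSketch, the RangeToken type, and the methods deriveDataNodeKey, issue, and permits are all hypothetical names. It assumes the NameNode derives a separate HMAC key per DataNode from its master key, so a compromised DataNode holds only its own key, and that each token names one block plus an allowed byte range.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch; none of these names exist in Hadoop.
final class ByteRangeTokenSketch {

    /** Grants read access to bytes [offset, offset + length) of one block. */
    static final class RangeToken {
        final long blockId, offset, length;
        final byte[] mac;
        RangeToken(long blockId, long offset, long length, byte[] mac) {
            this.blockId = blockId; this.offset = offset;
            this.length = length; this.mac = mac;
        }
    }

    static byte[] hmac(byte[] key, byte[] data) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        return mac.doFinal(data);
    }

    /** NameNode side (assumption): one derived key per DataNode, not a shared master key. */
    static byte[] deriveDataNodeKey(byte[] masterKey, String dataNodeId) throws Exception {
        return hmac(masterKey, dataNodeId.getBytes(StandardCharsets.UTF_8));
    }

    static byte[] encode(long blockId, long offset, long length) {
        return ByteBuffer.allocate(3 * Long.BYTES)
                .putLong(blockId).putLong(offset).putLong(length).array();
    }

    /** NameNode side: sign the block ID and the allowed byte range. */
    static RangeToken issue(byte[] dataNodeKey, long blockId, long offset, long length)
            throws Exception {
        if (offset < 0 || length < 0) {
            throw new IllegalArgumentException("offset and length must be non-negative");
        }
        return new RangeToken(blockId, offset, length,
                hmac(dataNodeKey, encode(blockId, offset, length)));
    }

    /** DataNode side: verify the MAC, then check the request lies inside the granted range. */
    static boolean permits(byte[] dataNodeKey, RangeToken t,
                           long blockId, long reqOffset, long reqLength) throws Exception {
        byte[] expected = hmac(dataNodeKey, encode(t.blockId, t.offset, t.length));
        if (!MessageDigest.isEqual(expected, t.mac)) {
            return false; // token was not issued for this DataNode's key
        }
        return blockId == t.blockId
                && reqOffset >= t.offset
                && reqLength >= 0
                && reqLength <= t.length - (reqOffset - t.offset);
    }

    public static void main(String[] args) throws Exception {
        byte[] master = "demo-master-key".getBytes(StandardCharsets.UTF_8);
        byte[] dn1 = deriveDataNodeKey(master, "dn-1");
        byte[] dn2 = deriveDataNodeKey(master, "dn-2");
        RangeToken token = issue(dn1, 42L, 100L, 50L); // bytes 100..149 of block 42

        System.out.println(permits(dn1, token, 42L, 110L, 20L)); // inside the range
        System.out.println(permits(dn1, token, 42L, 140L, 20L)); // runs past the range
        System.out.println(permits(dn2, token, 42L, 110L, 20L)); // different DataNode key
    }
}
```

In this sketch a DataNode that leaks its key only lets an attacker forge tokens that the same DataNode would accept. Whether getBlockLocations() is the right place to hand out such tokens, and how expiry and key rotation would work, are left open here.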