On Fri, Jul 2, 2010 at 4:15 PM, Gary Helmling <[email protected]> wrote:
> Hi folks,
>
> With Yahoo's latest security release on github
> (http://github.com/yahoo/hadoop-common/tree/yahoo-hadoop-0.20.104), it looks
> like we now have a real-world usable version of secure Hadoop, based on
> 0.20. This is exciting stuff, because now we have something solid to start
> working towards implementing similar security controls in HBase (HBASE-1697,
> HBASE-2014, HBASE-2016, HBASE-2420)!
>
> However, this is going to be a large undertaking, with a strong dependency
> on the secure Hadoop branch (more on that in a bit -- unfortunately the
> fragmented hadoop-0.20 world is already leaking through). So I'd like to
> propose a feature branch in the HBase svn repo for security work, to:
>
> 1) ensure that changes towards implementing secure HBase have an ASF home
> 2) provide more visibility and granularity for review (esp. JIRA &
>    reviewboard usage)
> 3) ease interaction/integration with other branched changes underway
>    (master rewrite)
>
> I've already started pushing some preliminary changes up to github
> (http://github.com/ghelmling/hbase/tree/security), and will continue to do
> so, but I'd like to avoid both massive patch sets accumulating too many
> changes and making interested committers & contributors go digging to see
> what the current state is.
>
> On the secure Hadoop branch dependency -- I've integrated the
> org.apache.hadoop.ipc changes into o.a.h.hbase.ipc.* (HBASE-2742) and run
> into a couple complications:
>
> * Hadoop RPC version rolled from 3 to 4 (apparently 0.20-append also does
>   this!)
>

Can you explain further which version you're talking about here? Our HBase
IPC is already wire-incompatible with Hadoop IPC, so the version numbers
needn't match up, right?

> * various bits in the updated HBaseClient, HBaseServer, etc. now depend on
>   the security implementation, so building and running on top of non-secure
>   Hadoop will not be possible.
>

Which classes do we depend on? Can we copy-paste those over into our tree?
It's more of a maintenance pain in some ways, but in other ways it allows us
to fix bugs, etc, without waiting on Hadoop releases.

> I'd like to post the diff on review.hbase.org for more review and feedback,
> but that begs the question of where the changes should go?
>

I can set up review.hbase.org to fetch from your github repo, so that diffs
against your github branch will upload properly. Is that useful?

> Longer term, I think we need to dump Hadoop RPC (AVRO-405 seems promising in
> this) so that HBase internals aren't so intertwined with Hadoop
> implementation details, but that's its own large scale project which we
> shouldn't couple to security.
>
> So, to sum up, thoughts on:
>
> a) creating a "security" feature branch in svn?
> b) RPC related changes, specifically cross Hadoop branch incompatibility due
>    to version increment and Hadoop security dependencies?
>
> Thanks,
> Gary
>

--
Todd Lipcon
Software Engineer, Cloudera
