> I think there is precedence for competing and/or "duplicate" Apache
> projects, Avro/Thrift and HBase/Cassandra come to mind.

That argument isn't helping you make your case.

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)

>________________________________
>From: Joey Echeverria <[email protected]>
>To: [email protected]
>Sent: Saturday, September 3, 2011 3:30 AM
>Subject: Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on
>Apache Incubator as a proposal
>
>To add to what Todd said, I actually worked with those guys for the
>last 3 years and have used Accumulo in production. It's true that it
>would have been better if they had been able to contribute to HBase
>rather than go on their own, but it's not easy to contribute to open
>source, either officially or unofficially when you work at NSA. I
>think there is precedence for competing and/or "duplicate" Apache
>projects, Avro/Thrift and HBase/Cassandra come to mind. I'm mostly
>interested in this project setting a precedent for other work at NSA
>to be developed as open source.
>
>-Joey
>
>On Fri, Sep 2, 2011 at 3:09 PM, Todd Lipcon <[email protected]> wrote:
>> Hey folks,
>>
>> <wearing my Todd hat and not my Cloudera hat!>
>>
>> I've been in touch with this team for the last 18 months or so.
>> They're good people, smart, and have a healthy respect for HBase and
>> our team. Though they haven't contributed code or participated on the
>> lists, I can vouch that they do follow our development and generally
>> do understand HBase as well as what makes their system different. In
>> the context of the incubator proposal, they're trying to explain why
>> their system is different than HBase, and not trying to knock our
>> project. They do borrow our ideas, and in the future we'll be able to
>> borrow some of theirs. Iterator trees, for example, are distinct from
>> coprocessors and have some really nice capabilities which I'm looking
>> forward to adapting into HBase.
>>
>> There are a couple things to keep in mind about the story here:
>> - they first evaluated HBase 3 years ago. HBase at that point was not
>> usable for their application - I think several of us here remember the
>> state of HBase at the time and might have made the same decision. So,
>> they started their own project with an internal team of 5-6 people.
>> - contributing to open source from within the NSA is not easy, for
>> obvious reasons. They've jumped through many hoops to open source
>> this, and we should be thankful for that. Now that they're out in open
>> source land, I think we'll see them collaborating with us much more
>> openly.
>>
>> I for one look forward to working with these folks, and maybe merging
>> the projects some time down the road as the feature lists converge.
>>
>> -Todd
>>
>> On Fri, Sep 2, 2011 at 11:40 AM, Gary Helmling <[email protected]> wrote:
>>> Some comments on the proposal and differentiation vs HBase:
>>>
>>> Access Labels:
>>>
>>> The proposal claims that this is "unlikely to be adopted [in HBase]". This
>>> is completely untrue. This has been discussed many times in the past in
>>> relation to our security implementation. It's just been deferred at the
>>> moment due to a need to focus on the initial implementation. But it's
>>> certainly viewed as a potentially important feature for a future iteration.
>>> Contributions always welcome!
>>>
>>> see HBASE-3435: Provide per-column-qualifier and per-key-value security for
>>> HBASE-3025
>>>
>>>
>>> Iterators:
>>>
>>> What do these provide that RegionObservers don't? I'm speculating since the
>>> proposal provides little in the way of details, but if these are "unlikely
>>> to be adopted" it's only because coprocessors already offer more extensive
>>> functionality.
>>>
>>>
>>> "Flexibility" aka online schema changes and locality groups
>>>
>>> Locality groups seem to be the only meaningful differentiation in this
>>> entire comparison.
>>>
>>>
>>> Testing
>>>
>>> Performance under "some configurations and conditions" and unsubstantiated
>>> "greater data integrity" is not meaningful differentiation.
>>>
>>>
>>> Apache Brand
>>>
>>> Claims a relationship with HBase. Is there overlapping code or is this just
>>> the duplication of functionality? There's no community relationship that
>>> I'm aware of. I haven't seen any of the proposed committers on the HBase
>>> user and dev lists to this point, so that doesn't set much of a precedent
>>> for community interaction.
>>>
>>>
>>> Overall I see no meaningful differentiation vs HBase as an existing project,
>>> no past attempts to interact with the most relevant Apache community, and
>>> only an, until now, private "community" of government users. I think it's
>>> great that they want to open source this. I don't want to discourage that
>>> -- go for it! But I don't see what the benefit is of ASF incubating this.
>>> I only see the potential for community fragmentation and market confusion
>>> over such closely similar projects.
>>>
>>>
>>> Gary
>>>
>>>
>>> On Fri, Sep 2, 2011 at 11:06 AM, Stack <[email protected]> wrote:
>>>
>>>> See here for the incubator proposal:
>>>> http://wiki.apache.org/incubator/AccumuloProposal
>>>>
>>>> Reactions probably better belong over on the incubator mailing list
>>>> but I thought a discussion here first might be useful developing a
>>>> stance.
>>>>
>>>> Initial reaction, not having seen the code, is that it seems to be close to
>>>> HBase; so close, they call HBase out explicitly in their proposal.
>>>>
>>>> The cell based 'access labels' seem like a matter of adding
>>>> an extra field to KV and their Iterators seem like a specialization on
>>>> Coprocessors. The ability to add column families on the fly seems too
>>>> minor a difference to call out especially if online schema edits are
>>>> now (soon) supported.
>>>> They talk of locality group like functionality too -- that
>>>> could be a significant difference. We would have to see the code but at
>>>> first blush, differences look small.
>>>>
>>>> Yet another BT implementation further divides this contended space.
>>>> If there were to be an effort integrating HBase into Accumulo or vice
>>>> versa, it's likely to distract significantly from project forward motion (If
>>>> the Accumulo fellows were interested in integrating the two projects,
>>>> I'd have thought they'd have tried to talk to us before this so that's
>>>> probably not their intent).
>>>>
>>>> On the other hand, if their once-secret project is out in the open, we can
>>>> steal the Apache-licensed good bits and....
>>>>
>>>> What do folks think?
>>>>
>>>> St.Ack
>>>>
>>>
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>
>
>--
>Joseph Echeverria
>Cloudera, Inc.
>443.305.9434
>
>
>
