Nice!
Is this related to HBase? Or similar to it?

mahadev

On Fri, Sep 2, 2011 at 9:27 AM, Patrick Hunt <ph...@apache.org> wrote:
> FYI, another project using ZK -- woot!!! (note that they have their
> own WAL - perhaps a good application for BookKeeper?)
>
> ---------- Forwarded message ----------
> From: Billie J Rinaldi <billie.j.rina...@ugov.gov>
> Date: Fri, Sep 2, 2011 at 8:45 AM
> Subject: [PROPOSAL] Accumulo for the Apache Incubator
> To: gene...@incubator.apache.org
>
>
> Greetings,
>
> I would like to propose Accumulo to be an Apache Incubator project.
> Accumulo is a distributed key/value store that provides expressive
> cell-level access labels and a server-side programming mechanism that
> can modify key/value pairs at various points in the data management
> process.  It is based on Google's BigTable design and runs over Apache
> Hadoop and Zookeeper.
>
> Here is a link to the proposal in the Incubator wiki:
> http://wiki.apache.org/incubator/AccumuloProposal
>
> I've also pasted the initial contents below.
>
> Thanks,
> Billie Rinaldi
>
>
> = Accumulo Proposal =
>
> == Abstract ==
> Accumulo is a distributed key/value store that provides expressive,
> cell-level access labels.
>
> == Proposal ==
> Accumulo is a sorted, distributed key/value store based on Google's
> BigTable design.  It is built on top of Apache Hadoop, Zookeeper, and
> Thrift.  It features a few novel improvements on the BigTable design
> in the form of cell-level access labels and a server-side programming
> mechanism that can modify key/value pairs at various points in the
> data management process.
>
> == Background ==
> Google published the design of BigTable in 2006.  Several other open
> source projects have implemented aspects of this design including
> HBase, CloudStore, and Cassandra.  Accumulo began its development in
> 2008.
>
> == Rationale ==
> There is a need for a flexible, high performance distributed key/value
> store that provides expressive, fine-grained access labels.  The
> communities we expect to be most interested in such a project are
> government, health care, and other industries where privacy is a
> concern.  We have made much progress in developing this project over
> the past 3 years and believe both the project and the interested
> communities would benefit from this work being openly available and
> having open development.
>
> == Current Status ==
>
> === Meritocracy ===
> We intend to strongly encourage the community to help with and
> contribute to the code.  We will actively seek potential committers
> and help them become familiar with the codebase.
>
> === Community ===
> A strong government community has developed around Accumulo and
> training classes have been ongoing for about a year.  Hundreds of
> developers use Accumulo.
>
> === Core Developers ===
> The developers are mainly employed by the National Security Agency,
> but we anticipate interest developing among other companies.
>
> === Alignment ===
> Accumulo is built on top of Hadoop, Zookeeper, and Thrift.  It builds
> with Maven.  Due to the strong relationship with these Apache
> projects, the incubator is a good match for Accumulo.
>
> == Known Risks ==
> === Orphaned Products ===
> There is only a small risk of being orphaned.  The community is
> committed to improving the codebase of the project due to its
> fulfilling needs not addressed by any other software.
>
> === Inexperience with Open Source ===
> The codebase has been treated internally as an open source project
> since its beginning, and the initial Apache committers have been
> involved with the code for multiple years.  While our experience with
> public open source is limited, we do not anticipate difficulty in
> operating under Apache's development process.
>
> === Homogeneous Developers ===
> The committers have multiple employers and it is expected that
> committers from different companies will be recruited.
>
> === Reliance on Salaried Developers ===
> The initial committers are all paid by their employers to work on
> Accumulo and we expect such employment to continue.  Some of the
> initial committers would continue as volunteers even if no longer
> employed to do so.
>
> === Relationships with Other Apache Products ===
> Accumulo uses Hadoop, Zookeeper, Thrift, Maven, log4j, commons-lang,
> -net, -io, -jci, -collections, -configuration, -logging, and -codec.
>
> === Relationship to HBase ===
> Accumulo and HBase are both based on the design of Google's BigTable,
> so there is a danger that potential users will have difficulty
> distinguishing the two or that they will not see an incentive in
> adopting Accumulo.  There are a few key areas in which Accumulo
> differs from HBase.  Some of the desired features of Accumulo could be
> incorporated into HBase; however, the most important of these may be
> unlikely to be adopted (see cell-level access labels and iterators
> below).  It is a possibility that the codebases will ultimately
> converge, but the number of differences at the current time warrants a
> separate project for Accumulo.
>
> ==== Access Labels ====
> Accumulo has an additional portion of its key that sorts after the
> column qualifier and before the timestamp.  It is called column
> visibility and enables expressive cell-level access control.
> Authorizations are passed with each query to control what data is
> returned to the user.  The column visibilities are boolean AND and OR
> combinations of arbitrary strings (such as "(A&B)|C") and
> authorizations are sets of strings (such as {C,D}).
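
Inline note: to make the access-label idea above concrete, here is a minimal Python sketch of checking a visibility expression such as "(A&B)|C" against an authorization set such as {C,D}. It is an illustration only, not Accumulo's implementation or API. The function names `tokenize` and `is_visible` are hypothetical, and as a simplification the sketch evaluates mixed `&` and `|` left to right when no parentheses are given.

```python
def tokenize(expr):
    """Split a visibility expression into labels and the symbols ( ) & |."""
    tokens, label = [], ""
    for ch in expr:
        if ch in "()&|":
            if label:
                tokens.append(label)
                label = ""
            tokens.append(ch)
        elif not ch.isspace():
            label += ch
    if label:
        tokens.append(label)
    return tokens


def is_visible(expr, auths):
    """Return True if the set of authorization strings satisfies expr.

    Hypothetical helper for illustration; not part of any Accumulo API.
    """
    tokens = tokenize(expr)
    pos = 0

    def parse_term():
        # A term is either a parenthesized sub-expression or a single label.
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        if tok in (")", "&", "|"):
            raise ValueError("unexpected token: " + tok)
        # A bare label is satisfied when the caller holds that authorization.
        return tok in auths

    def parse_expr():
        # Combine terms left to right (simplification: no operator precedence).
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            op = tokens[pos]
            pos += 1
            rhs = parse_term()
            result = (result and rhs) if op == "&" else (result or rhs)
        return result

    value = parse_expr()
    if pos != len(tokens):
        raise ValueError("trailing tokens in expression")
    return value
```

With the example from the proposal, `is_visible("(A&B)|C", {"C", "D"})` is True because the user holds C, while `is_visible("(A&B)|C", {"A", "D"})` is False because neither A&B nor C is satisfied.
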
>
> ==== Iterators ====
> Accumulo has a novel server-side programming mechanism that can modify
> the data written to disk or returned to the user.  This mechanism can
> be configured for any of the scopes where data is read from or written
> to disk.  It can be used to perform joins on data within a single
> tablet.
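
Inline note: one way to picture the iterator mechanism described above is as a stack of small transforms over a sorted stream of key/value pairs. The Python sketch below is a loose analogy under that assumption, not Accumulo's actual (Java) iterator interface. The names `summing_iterator` and `filtering_iterator` and the (row, column) key shape are hypothetical.

```python
from itertools import groupby


def summing_iterator(source):
    """Collapse consecutive entries that share a key into one summed entry.

    Assumes the source yields (key, value) pairs already sorted by key.
    """
    for key, group in groupby(source, key=lambda entry: entry[0]):
        yield key, sum(value for _, value in group)


def filtering_iterator(source, predicate):
    """Pass through only the entries for which predicate(key, value) holds."""
    for key, value in source:
        if predicate(key, value):
            yield key, value


# Hypothetical sorted entries: ((row, column), value)
entries = [
    (("r1", "count"), 1),
    (("r1", "count"), 4),
    (("r2", "count"), 2),
]

# Stack the two transforms: sum per key, then keep totals above 2.
summed = list(summing_iterator(iter(entries)))
kept = list(filtering_iterator(iter(summed), lambda key, value: value > 2))
```

The same stack could in principle be applied at read time or when data is written to disk, which is the "scopes" idea in the proposal. How Accumulo actually configures that is not shown here.
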
>
> ==== Flexibility ====
> HBase requires the user to specify the set of column families to be
> used up front.  Accumulo places no restrictions on the column
> families.  Also, each column family in HBase is stored separately on
> disk.  Accumulo allows column families to be grouped together on disk,
> as does BigTable.  This enables users to configure how their data is
> stored, potentially providing improvements in compression and lookup
> speeds.  It gives Accumulo a row/column hybrid nature, while HBase is
> currently column-oriented.
>
> ==== Testing ====
> Accumulo has testing frameworks that have resulted in its achieving a
> high level of correctness and performance.  We have observed that
> under some configurations and conditions Accumulo will outperform
> HBase and provide greater data integrity.
>
> ==== Logging ====
> HBase uses a write-ahead log on the Hadoop Distributed File System.
> Accumulo has its own logging service that does not depend on
> communication with the HDFS NameNode.
>
> ==== Storage ====
> Accumulo has a relative key file format that improves compression.
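
Inline note: the proposal does not say how the relative key format works. One plausible reading, assumed here, is that each key in a sorted file stores only the fields that differ from the key before it. The Python sketch below illustrates that idea and is not Accumulo's on-disk format. The key layout (row, column family, column qualifier, column visibility, timestamp) and the function names are assumptions, and real field values are assumed never to be None.

```python
def encode_relative(keys):
    """Replace each field that equals the previous key's field with None."""
    previous = None
    encoded = []
    for key in keys:
        if previous is None:
            # The first key has nothing to be relative to; store it whole.
            encoded.append(tuple(key))
        else:
            encoded.append(
                tuple(None if field == prev else field
                      for field, prev in zip(key, previous))
            )
        previous = key
    return encoded


def decode_relative(encoded):
    """Rebuild full keys by filling each None from the previous decoded key."""
    previous = None
    keys = []
    for entry in encoded:
        if previous is None:
            key = tuple(entry)
        else:
            key = tuple(prev if field is None else field
                        for field, prev in zip(entry, previous))
        keys.append(key)
        previous = key
    return keys


# Hypothetical sorted keys:
# (row, column family, column qualifier, column visibility, timestamp)
keys = [
    ("row1", "fam", "q1", "A&B", 5),
    ("row1", "fam", "q2", "A&B", 5),
    ("row2", "fam", "q2", "C", 7),
]
encoded = encode_relative(keys)
```

Because neighbouring keys in a sorted store tend to share a row and column family, most fields collapse to the "same as previous" marker, which is where the compression benefit would come from under this reading.
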
>
> ==== Areas in which HBase features improvements over Accumulo ====
> In-memory tables, upserts, coprocessors, and connections to other
> projects such as Cascading and Pig.
>
> === Expectations ===
> There is a risk that Accumulo will be criticized for not providing
> adequate security.  The access labels in Accumulo do not in themselves
> provide a complete security solution, but are a mechanism for labeling
> each piece of data with the authorizations that are necessary to see
> it.
>
> === Apache Brand ===
> Our interest in releasing this code as an Apache incubator project is
> due to its strong relationship with other Apache projects, i.e.
> Hadoop, Zookeeper, and HBase.
>
> == Documentation ==
> There is not currently documentation about Accumulo on the web, but a
> fair amount of documentation and training materials exists and will be
> provided on the Accumulo wiki at apache.org.  Also, a paper discussing
> YCSB results for Accumulo will be presented at the 2011 Symposium on
> Cloud Computing.
>
> == Initial Source ==
> Accumulo has been in development since spring 2008.  There are
> hundreds of developers using it and tens of developers have
> contributed to it.  The core codebase consists of 200,000 lines of
> code (mainly Java) and 100s of pages of documentation.  There are also
> a few projects built on top of Accumulo that may be added to its
> contrib in the future.  These include support for Hive, Matlab, YCSB,
> and graph processing.
>
> == Source and Intellectual Property Submission Plan ==
> Accumulo core code, examples, documentation, and training materials will
> be submitted by the National Security Agency.
>
> We will also be soliciting contributions of further plugins from MIT
> Lincoln Labs, Carnegie Mellon University, and others.
>
> Accumulo has been developed by a mix of government employees and
> private companies under government contract.  Material developed by
> government employees is in the public domain and no U.S. copyright
> exists in works of the federal government.  For the contractor
> developed material in the initial submission, the U.S. Government has
> sufficient authority per the ICLA from the copyright owner to
> contribute the Accumulo code to the incubator.
>
> There has been some discussion regarding accepting contributions from
> US Government sources on
> [https://issues.apache.org/jira/browse/LEGAL-93 LEGAL-93]. We propose
> that the NSA will sign an ICLA/CCLA if that document could be slightly
> modified to explicitly address copyright in works of government
> employees. Specifically, we propose that the definition of “You” be
> modified to include “the copyright owner, the owner of a Contribution
> not subject to copyright, or legal entity authorized by the copyright
> owner that is making this Agreement.” In addition, we propose that the
> copyright license grant in section 2 be modified after “You hereby
> grant” to state either “to the extent authorized by law” or “to the
> extent copyright exists in the Contribution.”  These changes will
> permit US Government employee developed work to be included.
>
> One proposed solution is to form a Collaborative Research and
> Development Agreement (CRADA) between the Apache Software Foundation
> and the US Government, but this will not solve the underlying problem
> that U.S. law does not grant copyright to works of government
> employees.  At this time a CRADA is not necessary, but should one be
> determined to be necessary, we would like to work through that process
> during the incubation phase of Accumulo rather than before acceptance,
> as entering into such an agreement may take time.
>
> == External Dependencies ==
> jetty (Apache and EPL), jline (BSD), jfreechart (LGPL), jcommon
> (LGPL), slf4j (MIT), junit (CPL)
>
> == Cryptography ==
> none
>
> == Required Resources ==
>  * Mailing Lists
>   * accumulo-private
>   * accumulo-dev
>   * accumulo-commits
>   * accumulo-user
>
>  * Subversion Directory
>   * https://svn.apache.org/repos/asf/incubator/accumulo
>
>  * Issue Tracking
>   * JIRA Accumulo (ACCUMULO)
>
>  * Continuous Integration
>   * Jenkins builds on https://builds.apache.org/
>
>  * Web
>   * http://incubator.apache.org/accumulo/
>   * wiki at http://wiki.apache.org or http://cwiki.apache.org
>
> == Initial Committers ==
>  * Aaron Cordova (aaron at cordovas dot org)
>  * Adam Fuchs (adam.p.fuchs at ugov dot gov)
>  * Eric Newton (ecn at swcomplete dot com)
>  * Billie Rinaldi (billie.j.rinaldi at ugov dot gov)
>  * Keith Turner (keith.turner at ptech-llc dot com)
>  * John Vines (john.w.vines at ugov dot gov)
>  * Chris Waring (christopher.a.waring at ugov dot gov)
>
> == Affiliations ==
>  * Aaron Cordova, The Interllective
>  * Adam Fuchs, National Security Agency
>  * Eric Newton, SW Complete Incorporated
>  * Billie Rinaldi, National Security Agency
>  * Keith Turner, Peterson Technology LLC
>  * John Vines, National Security Agency
>  * Chris Waring, National Security Agency
>
> == Sponsors ==
>  * Champion: Doug Cutting
>  * Nominated Mentors: Benson Margulies, ?, ?
>  * Sponsoring Entity: Apache Incubator
>
