Re: [PROPOSAL] Apache AsterixDB Incubator

2015-01-19 Thread Mike Carey

Ditto - thanks for the support!
Cheers,
Mike

On 1/19/15 5:39 PM, Till Westmann wrote:


On Jan 19, 2015, at 11:34 AM, jan i j...@apache.org wrote:


Looks like a real challenging project, and the proposal looks as if 
it has already been through a couple of refinement rounds.


Count on my +1, when it comes to voting.


Will do!

Thanks,
Till



rgds
jan i

On 19 January 2015 at 19:26, Henry Saputra henry.sapu...@gmail.com wrote:


+1 This is GREAT News!

Was watching and trying AsterixDB last year and looked in awesome shape.

I have my plate full but would love to help mentor this project to get
it going to ASF if needed!

- Henry

On Wed, Jan 14, 2015 at 6:21 PM, Mattmann, Chris A (3980)
chris.a.mattm...@jpl.nasa.gov wrote:
 Hi Folks,

 I am pleased to bring forth the Apache AsterixDB proposal to the
 Apache Incubator as Champion, working in collaboration with the
 team. Please find the wiki proposal here:

 https://wiki.apache.org/incubator/AsterixDBProposal


 Full text of the proposal is below. Please discuss and enjoy. I’ll
 leave the discussion open for a week, and then look to call a VOTE
 hopefully end of next week if all is well.

 Cheers!
 Chris Mattmann

 =
 Apache AsterixDB Proposal

 Abstract

 Apache AsterixDB is a scalable big data management system (BDMS) that
 provides storage, management, and query capabilities for large
 collections of semi-structured data.

 Proposal

 AsterixDB is a big data management system (BDMS) that is
 well-suited to needs such as web data warehousing and social data
 storage and analysis. Feature-wise, AsterixDB has:

 * A NoSQL style data model (ADM) based on extending JSON with object
   database concepts.
 * An expressive and declarative query language (AQL) for querying
   semi-structured data.
 * A runtime query execution engine, Hyracks, for partitioned-parallel
   execution of query plans.
 * Partitioned LSM-based data storage and indexing for efficient
   ingestion of newly arriving data.
 * Support for querying and indexing external data (e.g., in HDFS) as
   well as data stored within AsterixDB.
 * A rich set of primitive data types, including support for spatial,
   temporal, and textual data.
 * Indexing options that include B+ trees, R trees, and inverted
   keyword index support.
 * Basic transactional (concurrency and recovery) capabilities akin to
   those of a NoSQL store.


 Background and Rationale

 In the world of relational databases, the need to tackle data volumes
 that exceed the capabilities of a single server led to the
 development of “shared-nothing” parallel database systems several
 decades ago. These systems spread data over a cluster based on a
 partitioning strategy, such as hash partitioning, and queries are
 processed by employing partitioned-parallel divide-and-conquer
 techniques. Since these systems are fronted by a high-level,
 declarative language (SQL), their users are shielded from the
 complexities of parallel programming. Parallel database systems have
 been an extremely successful application of parallel computing, and
 quite a number of commercial products exist today.

 In the distributed systems world, the Web brought a need to index and
 query its huge content. SQL and relational databases were not the
 answer, though shared-nothing clusters again emerged as the hardware
 platform of choice. Google developed the Google File System (GFS) and
 MapReduce programming model to allow programmers to store and process
 Big Data by writing a few user-defined functions. The MapReduce
 framework applies these functions in parallel to data instances in
 distributed files (map) and to sorted groups of instances sharing a
 common key (reduce) -- not unlike the partitioned parallelism in
 parallel database systems. Apache's Hadoop MapReduce platform is the
 most prominent implementation of this paradigm for the rest of the
 Big Data community. On top of Hadoop and HDFS sit declarative
 languages like Pig and Hive that each compile down to Hadoop
 MapReduce jobs.

 The big Web companies were also challenged by extreme user bases
 (100s of millions of users) and needed fast simple lookups and
 updates to very large keyed data sets like user profiles. SQL
 databases were deemed either too expensive or not scalable, so the
 “NoSQL movement” was born. The ASF now has HBase and Cassandra, two
 popular key-value stores, in this space. MongoDB and 

Re: [PROPOSAL] Apache AsterixDB Incubator

2015-01-19 Thread Mike Carey

Indeed - thanks!!
Cheers,
Mike

On 1/19/15 5:28 PM, Till Westmann wrote:

Hi Henry,

thanks! It’s great that you’ve seen (and liked) AsterixDB before.

Even if your time is very limited we would be very happy to have you on board 
as a mentor.
I’ll add you to the proposal.

Cheers,
Till


On Jan 19, 2015, at 10:26 AM, Henry Saputra henry.sapu...@gmail.com wrote:

+1 This is GREAT News!

Was watching and trying AsterixDB last year and looked in awesome shape.

I have my plate full but would love to help mentor this project to get
it going to ASF if needed!

- Henry

On Wed, Jan 14, 2015 at 6:21 PM, Mattmann, Chris A (3980)
chris.a.mattm...@jpl.nasa.gov wrote:

Hi Folks,

I am pleased to bring forth the Apache AsterixDB proposal to the
Apache Incubator as Champion, working in collaboration with the
team. Please find the wiki proposal here:

https://wiki.apache.org/incubator/AsterixDBProposal


Full text of the proposal is below. Please discuss and enjoy. I’ll
leave the discussion open for a week, and then look to call a VOTE
hopefully end of next week if all is well.

Cheers!
Chris Mattmann

=
Apache AsterixDB Proposal

Abstract

Apache AsterixDB is a scalable big data management system (BDMS) that
provides storage, management, and query capabilities for large
collections of semi-structured data.

Proposal

AsterixDB is a big data management system (BDMS) that is
well-suited to needs such as web data warehousing and social data
storage and analysis. Feature-wise, AsterixDB has:

* A NoSQL style data model (ADM) based on extending JSON with object
  database concepts.
* An expressive and declarative query language (AQL) for querying
  semi-structured data.
* A runtime query execution engine, Hyracks, for partitioned-parallel
  execution of query plans.
* Partitioned LSM-based data storage and indexing for efficient
  ingestion of newly arriving data.
* Support for querying and indexing external data (e.g., in HDFS) as
  well as data stored within AsterixDB.
* A rich set of primitive data types, including support for spatial,
  temporal, and textual data.
* Indexing options that include B+ trees, R trees, and inverted
  keyword index support.
* Basic transactional (concurrency and recovery) capabilities akin to
  those of a NoSQL store.


Background and Rationale

In the world of relational databases, the need to tackle data volumes
that exceed the capabilities of a single server led to the
development of “shared-nothing” parallel database systems several
decades ago. These systems spread data over a cluster based on a
partitioning strategy, such as hash partitioning, and queries are
processed by employing partitioned-parallel divide-and-conquer
techniques. Since these systems are fronted by a high-level,
declarative language (SQL), their users are shielded from the
complexities of parallel programming. Parallel database systems have
been an extremely successful application of parallel computing, and
quite a number of commercial products exist today.
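
The hash partitioning and partitioned-parallel processing described in
the paragraph above can be illustrated with a small Python sketch. It is
only a toy under stated assumptions: the function names (partition_of,
hash_partition, local_count, parallel_count) and the sample records are
hypothetical, the "nodes" are plain in-memory lists, and nothing here is
taken from AsterixDB or from any parallel database product.

```python
import zlib
from collections import defaultdict


def partition_of(key: str, num_partitions: int) -> int:
    """Map a key to a partition using a stable hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def hash_partition(records, key_field, num_partitions):
    """Spread records over partitions by hashing the partitioning key."""
    partitions = [[] for _ in range(num_partitions)]
    for record in records:
        target = partition_of(record[key_field], num_partitions)
        partitions[target].append(record)
    return partitions


def local_count(partition, key_field):
    """Per-partition work: count records for each key held locally."""
    counts = defaultdict(int)
    for record in partition:
        counts[record[key_field]] += 1
    return dict(counts)


def parallel_count(records, key_field, num_partitions):
    """Divide and conquer: count per partition, then combine.

    In a real shared-nothing system each partition would be processed
    on its own node; here the loop stands in for that parallelism.
    """
    result = {}
    for partition in hash_partition(records, key_field, num_partitions):
        # Equal keys always hash to the same partition, so the local
        # counts never overlap and can simply be merged.
        result.update(local_count(partition, key_field))
    return result
```

Because every record with a given key lands in the same partition, each
partition can answer its part of the query without talking to the
others, which is the property the declarative front end relies on.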

In the distributed systems world, the Web brought a need to index and
query its huge content. SQL and relational databases were not the
answer, though shared-nothing clusters again emerged as the hardware
platform of choice. Google developed the Google File System (GFS) and
MapReduce programming model to allow programmers to store and process
Big Data by writing a few user-defined functions. The MapReduce
framework applies these functions in parallel to data instances in
distributed files (map) and to sorted groups of instances sharing a
common key (reduce) -- not unlike the partitioned parallelism in
parallel database systems. Apache's Hadoop MapReduce platform is the
most prominent implementation of this paradigm for the rest of the
Big Data community. On top of Hadoop and HDFS sit declarative
languages like Pig and Hive that each compile down to Hadoop
MapReduce jobs.
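
As a rough illustration of the map and reduce phases described above,
here is a single-process Python sketch of a word count. The names
map_fn, reduce_fn, and run_mapreduce are assumptions made up for this
example; this is not the Hadoop API, and a real framework would run the
map calls and the per-key reduce calls in parallel across a cluster.

```python
from itertools import groupby
from operator import itemgetter


def map_fn(line):
    """User-defined map: emit (word, 1) for every word in a line."""
    for word in line.split():
        yield (word.lower(), 1)


def reduce_fn(key, values):
    """User-defined reduce: sum the counts collected for one key."""
    return (key, sum(values))


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Apply map to each input, sort by key, then reduce each group."""
    intermediate = []
    for item in inputs:
        intermediate.extend(map_fn(item))
    # Sorting brings pairs that share a key next to each other.
    intermediate.sort(key=itemgetter(0))
    return [
        reduce_fn(key, [value for _, value in group])
        for key, group in groupby(intermediate, key=itemgetter(0))
    ]
```

The programmer supplies only map_fn and reduce_fn; the sort-and-group
step in run_mapreduce plays the role of the framework's shuffle.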

The big Web companies were also challenged by extreme user bases
(100s of millions of users) and needed fast simple lookups and
updates to very large keyed data sets like user profiles. SQL
databases were deemed either too expensive or not scalable, so the
“NoSQL movement” was born. The ASF now has HBase and Cassandra, two
popular key-value stores, in this space. MongoDB and Couchbase are
other open source alternatives (document stores).

It is evident from the rapidly growing popularity of NoSQL stores,
as well as the strong demand for Big Data analytics engines today,
that there is a strong (and growing!) need to store, process, *and*
query large volumes of semi-structured data in many application
areas. Until very recently, developers have had to “choose” between
using big data analytics engines like Apache Hive or Apache Spark,
which can do complex query processing and analysis over HDFS-resident
files, and flexible but low-function data stores like MongoDB or
Apache HBase. (The Apache Phoenix project,

Re: Next steps for various proposals (mentor re-boot, pTLP, etc.)

2015-01-19 Thread Mattmann, Chris A (3980)
Ditto, kudos to ChrisD

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++






-Original Message-
From: Ted Dunning ted.dunn...@gmail.com
Reply-To: general@incubator.apache.org general@incubator.apache.org
Date: Monday, January 19, 2015 at 5:48 PM
To: general@incubator.apache.org general@incubator.apache.org
Subject: Re: Next steps for various proposals (mentor re-boot, pTLP, etc.)

On Mon, Jan 19, 2015 at 4:37 PM, Chris Douglas cdoug...@apache.org
wrote:

 submit a proposal to the
 board to start a new project. Fork the incubator.


Hmm...

That is the first interesting variation here.


-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org


Re: Next steps for various proposals (mentor re-boot, pTLP, etc.)

2015-01-19 Thread Andrew Purtell
I do not dispute anything written below nor do I intend this to be a last
word, just a clarification.

 In neither model are people powerless in any meaningful sense.

I approached these proposals by putting myself in the shoes of a newcomer
as best as I'm able (I've been PMC for years and PPMC also). The feeling of
investment in the process I'd have would be different than before under the
second two options (*not* the mentor reboot), as would be the calculus of
bringing a project to Apache. I have not observed the IPMC model to take
ownership away, because the initial contributors bringing their project
here are formed into a PPMC of equals and the usual release votes done by
the IPMC are up-or-down checks on releases, not exercises in differential
power.


On Mon, Jan 19, 2015 at 4:15 PM, Benson Margulies bimargul...@gmail.com
wrote:

 I'm in the odd situation of not particularly wanting to argue in favor
 of the proposal I wrote, yet finding it hard to resist the provocation
 of messages that appear, to me, to misunderstand it. So I'll restrict
 myself to the following, and I won't reply to any further dispute.
 Anyone else is welcome to have a last-er word than me.

 The incubator is like no other Apache project. It is not a
 meritocratic, volunteer, community, producing a software product for
 the public good. It is a volunteer, meritocratic, group of people
 solving a problem for the board.

 The problem that the incubator sets out to solve is this: How do you
 bootstrap a community from scratch?

 Because it is a group of people solving a problem for the board,
 there's no special 'merit' in shaping it in the usual ASF PMC growing
 community mold. There may be some problems with that shape related to
 scale, noise, and responsibility. Some people who find those problems
 to be severe want to make changes. Others, not so much. The board is
 always free to solve any problem with any structure that it finds
 effective; there's no 'constitutional' requirement that everything is
 a meritocratic PMC. Witness what happened to ApacheCon.

 We have here two competing visions. The current vision says: Let
 people who have never run an Apache community start doing it with
 coaching and supervision from 'mentors'. The alternative vision says,
 Start with a kernel of people who have done it before. Those of you
 who are happy with the current vision? Great! I wrote up the
 alternative vision to try to put some clarity onto a lot of prior
 writing that found fault with the current model and looked for an
 alternative.

 In neither model are people powerless in any meaningful sense. In the
 current model, people have an interaction with the full IPMC. They can
 get pretty frustrated, but, as Marvin has documented, the frustration
 is more the fault of the lack of documentation than of the behavior of
 the IPMC. In the alternative model, they _start out_ with a group of
 'strangers' at the center of their community, but those strangers are
 chosen specifically for their ability and experience in building a
 consensus community. And, in any case, they will rapidly become
 an ever-smaller fraction of the group.

 Badly-behaved mentors (and other IPMC members) can overbear in the
 current model, and badly-behaved seed-PMC members could overbear in
 the alternative.

 I very much doubt that email discussion will yield any consensus to do
 anything radical. Which might be fine. When the time comes to find
 Roman's successor, an interesting situation may arise in which
 candidates might declare their intention to implement changes. And
 just to be clear, _I_ am not running on the platform of implementing
 what I wrote -- or any other way.





-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)


Re: [VOTE] Release Apache htrace-3.1.0-incubating

2015-01-19 Thread Nick Dimiduk
+1 (non-binding, see my post from dev thread*)

*
http://mail-archives.apache.org/mod_mbox/incubator-htrace-dev/201501.mbox/%3cCANZa=guvaa2oeqfiudiz1tg82e23w80uzu+fspsk6xcbwtn...@mail.gmail.com%3e

On Mon, Jan 19, 2015 at 7:34 AM, Jake Farrell jfarr...@apache.org wrote:

 +1 binding

 -Jake


 On Sat, Jan 17, 2015 at 10:36 PM, Stack st...@duboce.net wrote:

  Apache HTrace (incubating), after ten release candidates, has voted to
  release the below referenced Apache HTrace 3.1.0-incubating release
  candidate.
 
  Dear IPMC, please vote on our first release candidate as an Apache
  Incubator project.
 
  Here is the vote thread we ran on our dev list (six binding +1 votes and
  no dissent) with the subject "[VOTE] htrace-3.1.0, the this is it for
  sure!, tenth release candidate":
 
  http://mail-archives.apache.org/mod_mbox/incubator-htrace-dev/201501.mbox/%3CCADcMMgF_agDCzcwsxpdGsJOzCQ1ebA5z7hsM_-oFbneAVzh4dg%40mail.gmail.com%3E
 
  The source tarball, hashes, and signing are here:
 
http://people.apache.org/~stack/htrace-3.1.0-incubatingRC9/
 
  (Over in htrace our RC number was zero based so RC9 == tenth RC)
 
  Related maven artifacts are posted here:
 
 
 https://repository.apache.org/content/repositories/orgapachehtrace-1014
 
  The tag for the RC is here:
 
 
 https://git-wip-us.apache.org/repos/asf?p=incubator-htrace.git;a=commit;h=0cabe569bc05a58c7a319a460eed5e50e136bae7
 
  The KEYS file with the key used for signing is available here:
 
  https://dist.apache.org/repos/dist/release/incubator/htrace/KEYS
 
 
  44 issues were closed/resolved for this release:
 
 
 
 https://issues.apache.org/jira/issues/?jql=project%20%3D%20HTRACE%20AND%20status%20%3D%20resolved%20AND%20fixVersion%20%3D%203.1.0%20ORDER%20BY%20issuetype%20DESC
 
 
  The vote will be open for 72 hours.
 
  [ ] +1 Release this package as Apache HTrace 3.1.0-incubating
  [ ] +0 no opinion
  [ ] -1 Do not release this package because ...
 
 
  Thanks,
  St.Ack
 



Re: Next steps for various proposals (mentor re-boot, pTLP, etc.)

2015-01-19 Thread Benson Margulies
I'm in the odd situation of not particularly wanting to argue in favor
of the proposal I wrote, yet finding it hard to resist the provocation
of messages that appear, to me, to misunderstand it. So I'll restrict
myself to the following, and I won't reply to any further dispute.
Anyone else is welcome to have a last-er word than me.

The incubator is like no other Apache project. It is not a
meritocratic, volunteer, community, producing a software product for
the public good. It is a volunteer, meritocratic, group of people
solving a problem for the board.

The problem that the incubator sets out to solve is this: How do you
bootstrap a community from scratch?

Because it is a group of people solving a problem for the board,
there's no special 'merit' in shaping it in the usual ASF PMC growing
community mold. There may be some problems with that shape related to
scale, noise, and responsibility. Some people who find those problems
to be severe want to make changes. Others, not so much. The board is
always free to solve any problem with any structure that it finds
effective; there's no 'constitutional' requirement that everything is
a meritocratic PMC. Witness what happened to ApacheCon.

We have here two competing visions. The current vision says: Let
people who have never run an Apache community start doing it with
coaching and supervision from 'mentors'. The alternative vision says,
Start with a kernel of people who have done it before. Those of you
who are happy with the current vision? Great! I wrote up the
alternative vision to try to put some clarity onto a lot of prior
writing that found fault with the current model and looked for an
alternative.

In neither model are people powerless in any meaningful sense. In the
current model, people have an interaction with the full IPMC. They can
get pretty frustrated, but, as Marvin has documented, the frustration
is more the fault of the lack of documentation than of the behavior of
the IPMC. In the alternative model, they _start out_ with a group of
'strangers' at the center of their community, but those strangers are
chosen specifically for their ability and experience in building a
consensus community. And, in any case, they will rapidly become
an ever-smaller fraction of the group.

Badly-behaved mentors (and other IPMC members) can overbear in the
current model, and badly-behaved seed-PMC members could overbear in
the alternative.

I very much doubt that email discussion will yield any consensus to do
anything radical. Which might be fine. When the time comes to find
Roman's successor, an interesting situation may arise in which
candidates might declare their intention to implement changes. And
just to be clear, _I_ am not running on the platform of implementing
what I wrote -- or any other way.




Re: Next steps for various proposals (mentor re-boot, pTLP, etc.)

2015-01-19 Thread Andrew Purtell
I think the cures are all problematic and might be worse than the disease.


On Mon, Jan 19, 2015 at 1:47 PM, Roman Shaposhnik r...@apache.org wrote:

 On Wed, Jan 14, 2015 at 8:48 AM, Roman Shaposhnik r...@apache.org wrote:
  Hi!
 
  at this point we have had a few lively threads
  discussing three somewhat different proposals:
 #1 mentor re-boot
 #2 pTLP
 #3 Ross's strawman http://s.apache.org/8eS
  it feels to me that all three need additional work
  to be done before we can have any reasonable
  consensus around them (let alone voting).
 
  Wearing my chair hat, I would like to suggest that
  the next step should be: for each proposal we identify
  points that are going to block consensus (AKA would
  result in -1 vote if it comes to a vote). I suggest we
  do it on the wiki pages themselves (I'll wikify Ross's
  proposal tonight). Not editing the wikis but simply
  collecting this feedback as the last section in each
  proposal. The idea would be to identify all such
  points in a week or so.
 
  Sounds good?

 To follow up. Each of the proposals:
 https://wiki.apache.org/incubator/MentorRebootProposal


An active mentor is removed from a podling if that mentor does not
review/sign off on a release.

The above implies the foundation has a pool of mentors able to
consistently meet every reporting requirement in a timely manner, without
regard to personal or professional obstacles. I don't see it. For an
organization almost entirely made up of volunteers this seems overly
optimistic. There is only a small core membership who are capable and
willing to do this as evidenced by a skim of history of general@incubator
and members@. Perhaps this core group will end up shouldering the
incubation load in its entirety. Although sadly this is more or less the
current state of affairs, individual podlings do come with new mentors not
part of the professional membership motivated to see at least that
specific podling through. It's also risky to expect mentors kicked from a
podling to be okay with it and want to try again, especially if listed on
some naughty list to the board.




 https://wiki.apache.org/incubator/Strawman


Only ASF members on the PPMC will have binding votes for the releases.

This proposal seems better than the others in my estimation, but doesn't
allow podlings full investment in their own release management. The members
on the PPMC who have binding votes will drive the release process out of
necessity. Once the podling graduates and the members on the PPMC leave to
resume other interests or duties, only then for the first time is the
project running their own releases. I think it was better to let the
podling own their release process but have the IPMC (or equivalent) have an
up-or-down vote afterward as a check on their activities.




 https://wiki.apache.org/incubator/IncubatorV2


This proposal revokes merit earned by existing IPMC members and reboots
incubator supervision as a sub-board limited to 15 members. How members
apply to this board is not specified. It is suggested the current board
make recommendations to the board for their replacements, a very
unmeritocratic suggestion that is quite surprising. It's not clear at all
how the membership can address issues with this sub-board as they can
with the Board. I think this proposal takes the likely outcome of the first
proposal, that only a small core group of professional membership can
manage sufficient activity as mentors to not be kicked from podlings, and
codifies it with new structure and bylaws. Maybe in the end this is
admitting reality. However, discussion of this proposal also floated the
idea that the sub-board be later given authority to supervise the affairs
of established TLPs, which is deeply problematic* and I suspect still
hovers in the wings. I would hope not.

All proposals for new ASF projects must include an initial PMC chair and
an initial set of PMC members. These people must be acceptable to the
board. It is the responsibility of the Incubator Committee to vet these
people. All of them must have experience on existing PMCs.

This doubles down on the aspect of the Strawman proposal where PPMC members
are powerless to vote on releases. Here they are powerless to make any and
all project management decisions about their own software they brought to
Apache. It's not mentoring if you make all of the decisions for them.

* - Find me any PMC of any TLP that would welcome the self-introduction
of newly empowered meddlers who by definition are uninformed of their
project particulars.



 now has the feedback gathering section at the end.
 I am done with my personal feedback. Please provide
 yours.

 Here's the criteria you can apply when deciding whether
 to spend time on this or not: imagine that the proposal
 the way it is written were to come to a vote. If at that point
 you'd be inclined to vote -1 -- please let us know NOW.

 Using a VOTE thread as a forcing function for folks to
 

Re: Next steps for various proposals (mentor re-boot, pTLP, etc.)

2015-01-19 Thread Chris Douglas
I agree with Andrew. Creating a sub-board of the most vocal members of
the IPMC distills its dysfunction.

But this doesn't require consensus. The proposals focus on identifying
the true members of the IPMC; clearly, the authors believe
themselves to be among the elect. So make a list of the IPMC members
you believe should judge the other 90%, and submit a proposal to the
board to start a new project. Fork the incubator.

If the board is interested in your proposal, then you can demonstrate
how successful a small, committed group can be in contrast to the 150+
member IPMC. Concurrently, others may propose TLPs directly to the
board, and this committee will continue as-is.

Surely all of you have better things to do than... this. -C


Re: [PROPOSAL] Apache AsterixDB Incubator

2015-01-19 Thread Henry Saputra
Thanks Till,

Will try to solicit more mentors to help.
Especially since the initial committers mostly have not been exposed to
contributing the Apache way.

- Henry

On Mon, Jan 19, 2015 at 5:28 PM, Till Westmann t...@westmann.org wrote:
 Hi Henry,

 thanks! It’s great that you’ve seen (and liked) AsterixDB before.

 Even if your time is very limited we would be very happy to have you on board 
 as a mentor.
 I’ll add you to the proposal.

 Cheers,
 Till

 On Jan 19, 2015, at 10:26 AM, Henry Saputra henry.sapu...@gmail.com wrote:

 +1 This is GREAT News!

 Was watching and trying AsterixDB last year and looked in awesome shape.

 I have my plate full but would love to help mentor this project to get
 it going to ASF if needed!

 - Henry


Re: Next steps for various proposals (mentor re-boot, pTLP, etc.)

2015-01-19 Thread Bertrand Delacretaz
On Tue, Jan 20, 2015 at 1:37 AM, Chris Douglas cdoug...@apache.org wrote:
 ...So make a list of the IPMC members
 you believe should judge the other 90%, and submit a proposal to the
 board to start a new project. Fork the incubator

How is that different from pruning the current IPMC membership by
removing inactive members?

I don't think those inactive folks are a problem, but if people think
they are it's easy to ask them if they want to stay, and remove those
who reply no or don't reply.

-Bertrand

-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: [PROPOSAL] Apache AsterixDB Incubator

2015-01-19 Thread Ted Dunning
Chris just asked me under separate cover.

I am happy to help out as mentor.



On Mon, Jan 19, 2015 at 8:17 PM, Henry Saputra henry.sapu...@gmail.com
wrote:

 Thanks Till,

 Will try to solicit more mentors to help.
 Especially since most of the initial committers have not been exposed to
 contributing the Apache way.

 - Henry

 On Mon, Jan 19, 2015 at 5:28 PM, Till Westmann t...@westmann.org wrote:
  Hi Henry,
 
  thanks! It’s great that you’ve seen (and liked) AsterixDB before.
 
  Even if your time is very limited we would be very happy to have you on
 board as a mentor.
  I’ll add you to the proposal.
 
  Cheers,
  Till
 
  On Jan 19, 2015, at 10:26 AM, Henry Saputra henry.sapu...@gmail.com
 wrote:
 
  +1 This is GREAT News!
 
  Was watching and trying AsterixDB last year and it looked to be in awesome shape.
 
  I have my plate full but would love to help mentor this project to get
  it going to ASF if needed!
 
  - Henry
 

Reporting and releasing for Ripple

2015-01-19 Thread Ross Gardler (MS OPEN TECH)
The Ripple community asked for a stay of execution before being moved to the 
attic, as was recommended by some. This was granted in November 2014 with a 
review in six months.

No board report was submitted this month and no action has been taken with 
respect to the concerns I raised about releases from this project.

If this project community wishes to continue to operate as an incubating 
project, and eventually graduate, then these items need to be addressed.  There 
are two months remaining. Without an ASF-approved release happening in that 
period it is unlikely that the IPMC will approve a further six months.

I'm here as a mentor and ready to help guide any member of this community 
(committer or otherwise, everyone is welcome) in making a release.

Ross

Microsoft Open Technologies, Inc.
A subsidiary of Microsoft Corporation



Re: Next steps for various proposals (mentor re-boot, pTLP, etc.)

2015-01-19 Thread Roman Shaposhnik
On Wed, Jan 14, 2015 at 8:48 AM, Roman Shaposhnik r...@apache.org wrote:
 Hi!

 at this point we have had a few lively threads
 discussing three somewhat different proposals:
#1 mentor re-boot
#2 pTLP
#3 Ross's strawman http://s.apache.org/8eS
 it feels to me that all three need additional work
 to be done before we can have any reasonable
 consensus around them (let alone voting).

 Wearing my chair hat, I would like to suggest that
 the next step should be: for each proposal we identify
 points that are going to block consensus (AKA would
 result in -1 vote if it comes to a vote). I suggest we
 do it on the wiki pages themselves (I'll wikify Ross's
 proposal tonight). Not editing the wikis but simply
 collecting this feedback as the last section in each
 proposal. The idea would be to identify all such
 points in a week or so.

 Sounds good?

To follow up. Each of the proposals:
https://wiki.apache.org/incubator/MentorRebootProposal
https://wiki.apache.org/incubator/Strawman
https://wiki.apache.org/incubator/IncubatorV2

now has the feedback gathering section at the end.
I am done with my personal feedback. Please provide
yours.

Here are the criteria you can apply when deciding whether
to spend time on this or not: imagine that the proposal
the way it is written were to come to a vote. If at that point
you'd be inclined to vote -1 -- please let us know NOW.

Using a VOTE thread as a forcing function for folks to
provide feedback would be *really* unfortunate.

Also, please try to keep the 'deal breakers' section as small
as possible (pushing all the non-critical pieces of your
feedback to the 'suggestions' section). When in doubt
(even if it is -0) -- make it go to suggestions.

The only items that belong to 'dealbreakers' are the ones
that would *strongly* motivate you to vote -1.

Thanks,
Roman.




Re: Next steps for various proposals (mentor re-boot, pTLP, etc.)

2015-01-19 Thread Roman Shaposhnik
On Fri, Jan 16, 2015 at 3:04 PM, Ross Gardler (MS OPEN TECH)
ross.gard...@microsoft.com wrote:
 Or we could just do it

 We debated plenty. Three proposals came out of it (two if you look at mine as 
 the strawman it was intended to be).

As a matter of fact, while editing yours (sorry, I took the liberty) and leaving
feedback for Alan's I felt like they were pretty close in spirit, with yours
going all the way to make podlings behave as close to TLPs as possible while
still not overwhelming the board.

 Those proposals are not mutually exclusive.

 I say record them in the wiki. Run them for a while. Then compare against the
 problems document we drew up a couple of years back and see how effective
 they are.

That's the plan.

Thanks,
Roman.




Re: Next steps for various proposals (mentor re-boot, pTLP, etc.)

2015-01-19 Thread Bertrand Delacretaz
On Sat, Jan 17, 2015 at 12:16 AM, Ross Gardler (MS OPEN TECH)
ross.gard...@microsoft.com wrote:
 http://wiki.apache.org/incubator/IncubatorIssues2013

If someone reviews this it would be nice to add brief comments about
today's state, maybe right after each item's title (like early 2014
status: still a problem)

-Bertrand




[RESULTS] Release Apache Usergrid 1.0.1 (incubating) RC5

2015-01-19 Thread Dave
+1 (binding) John Ament
+1 (binding) Dave Johnson
+1 (binding) Justin Mclean

The IPMC vote passes. Thanks everybody!

- Dave




On Sat, Jan 17, 2015 at 6:29 PM, Dave snoopd...@gmail.com wrote:

 Thanks for the careful review and the +1!

 I filed the issues you raised as USERGRID-358.
 https://issues.apache.org/jira/browse/USERGRID-358

 - Dave



 On Sat, Jan 17, 2015 at 6:10 PM, Justin Mclean jus...@classsoftware.com
 wrote:

 Hi,

 +1 (binding) if LICENSE and NOTICE are fixed up for next release.

 - incubating in name
 - DISCLAIMER exists
 - Signatures and MD5 correct (in dist area)
 - LICENSE and NOTICE minor issues (see below)
 - Source files have headers.
 - No unexpected binary files in release
 - Can compile from source

 LICENSE issues
 - May require Font Awesome license (see
 ./docs/_theme/sphinx_rtd_theme/static/css/theme.css)
 - Bootstrap version is MIT not Apache
 ./portal/js/libs/bootstrap/LICENSE.txt
 - Missing MIT license for
 ./sdks/dotnet/packages/Newtonsoft.Json.4.5.11/LICENSE.txt
 - Missing BSD license for NSubstitute
 ./sdks/dotnet/packages/NSubstitute.1.6.0.0/LICENSE.txt
 - Missing BSD license for Sphinx, e.g.
 ./docs/_theme/sphinx_rtd_theme/search.html

 NOTICE issues
 - Intro.js should be in LICENSE not NOTICE. However, I was unable to find
 intro.js anywhere in the package, so it may no longer be required.

 Thanks,
 Justin








Re: Next steps for various proposals (mentor re-boot, pTLP, etc.)

2015-01-19 Thread Roman Shaposhnik
On Mon, Jan 19, 2015 at 2:59 AM, Bertrand Delacretaz
bdelacre...@apache.org wrote:
 On Sat, Jan 17, 2015 at 12:16 AM, Ross Gardler (MS OPEN TECH)
 ross.gard...@microsoft.com wrote:
 http://wiki.apache.org/incubator/IncubatorIssues2013

 If someone reviews this it would be nice to add brief comments about
 today's state, maybe right after each item's title (like early 2014
 status: still a problem)

First of all, this appears to be an immutable page.

That said, looking at the list of issues, I'd say
that every one of them still applies (the caveat
being: to what degree) although the analysis and
suggestions could be slightly out of date.

In general, I'd split the issues in two categories:

Operational/Structural issues
Issue 01 - lack of mentor participation
Issue 02 - lack of progress towards graduation
Issue 03 - Too many cooks spoil the IPMC broth
Issue 04 - Horrible signal-to-noise ratio on general@a.o for
podling contributors
Issue 05 - Inadequate reporting
Issue 07 - Vetting releases is a huge pain
Issue 08 - The IPMC is broken

Documentation:
Issue 06 - Podlings status metadata is not reliable
Issue 09 - People do not follow through to improve Incubator documentation
Issue 10 - Steps for Podlings to Acquire Resources Are Disparate
and Poorly Documented
Issue 11 - Clearly and concisely document the principles and
constraints for the ASF
Issue 12 - Cold welcome for new projects

In my view, the two categories can be addressed in
parallel. The Documentation agenda seems to be
passionately championed by Marvin and a few other folks.
I'd expect them to make quite a bit of progress there.

Personally, I'd like to focus this thread on the first category.

Thanks,
Roman.




Re: [PROPOSAL] Apache AsterixDB Incubator

2015-01-19 Thread jan i
Looks like a really challenging project, and the proposal looks as if it has
already been through a couple of refinement rounds.

Count on my +1, when it comes to voting.

rgds
jan i

On 19 January 2015 at 19:26, Henry Saputra henry.sapu...@gmail.com wrote:

 +1 This is GREAT News!

 Was watching and trying AsterixDB last year and it looked to be in awesome shape.

 I have my plate full but would love to help mentor this project to get
 it going to ASF if needed!

 - Henry


Re: [PROPOSAL] Apache AsterixDB Incubator

2015-01-19 Thread Henry Saputra
+1 This is GREAT News!

Was watching and trying AsterixDB last year and it looked to be in awesome shape.

I have my plate full but would love to help mentor this project to get
it going to ASF if needed!

- Henry

On Wed, Jan 14, 2015 at 6:21 PM, Mattmann, Chris A (3980)
chris.a.mattm...@jpl.nasa.gov wrote:
 Hi Folks,

 I am pleased to bring forth the Apache AsterixDB proposal to the
 Apache Incubator as Champion, working in collaboration with the
 team. Please find the wiki proposal here:

 https://wiki.apache.org/incubator/AsterixDBProposal


 Full text of the proposal is below. Please discuss and enjoy. I’ll
 leave the discussion open for a week, and then look to call a VOTE
 hopefully end of next week if all is well.

 Cheers!
 Chris Mattmann

 =
 Apache AsterixDB Proposal

 Abstract

 Apache AsterixDB is a scalable big data management system (BDMS) that
 provides storage, management, and query capabilities for large
 collections of semi-structured data.

 Proposal

 AsterixDB is a big data management system (BDMS) whose feature set
 makes it well-suited to needs such as web data warehousing and social
 data storage and analysis. Feature-wise, AsterixDB has:

 * A NoSQL style data model (ADM) based on extending JSON with object
   database concepts.
 * An expressive and declarative query language (AQL) for querying
   semi-structured data.
 * A runtime query execution engine, Hyracks, for partitioned-parallel
   execution of query plans.
 * Partitioned LSM-based data storage and indexing for efficient
   ingestion of newly arriving data.
 * Support for querying and indexing external data (e.g., in HDFS) as
   well as data stored within AsterixDB.
 * A rich set of primitive data types, including support for spatial,
   temporal, and textual data.
 * Indexing options that include B+ trees, R trees, and inverted
   keyword index support.
 * Basic transactional (concurrency and recovery) capabilities akin to
   those of a NoSQL store.


 Background and Rationale

 In the world of relational databases, the need to tackle data volumes
 that exceed the capabilities of a single server led to the
 development of “shared-nothing” parallel database systems several
 decades ago. These systems spread data over a cluster based on a
 partitioning strategy, such as hash partitioning, and queries are
 processed by employing partitioned-parallel divide-and-conquer
 techniques. Since these systems are fronted by a high-level,
 declarative language (SQL), their users are shielded from the
 complexities of parallel programming. Parallel database systems have
 been an extremely successful application of parallel computing, and
 quite a number of commercial products exist today.

 In the distributed systems world, the Web brought a need to index and
 query its huge content. SQL and relational databases were not the
 answer, though shared-nothing clusters again emerged as the hardware
 platform of choice. Google developed the Google File System (GFS) and the
 MapReduce programming model to allow programmers to store and process
 Big Data by writing a few user-defined functions. The MapReduce
 framework applies these functions in parallel to data instances in
 distributed files (map) and to sorted groups of instances sharing a
 common key (reduce) -- not unlike the partitioned parallelism in
 parallel database systems. Apache's Hadoop MapReduce platform is the
 most prominent implementation of this paradigm for the rest of the
 Big Data community. On top of Hadoop and HDFS sit declarative
 languages like Pig and Hive that each compile down to Hadoop
 MapReduce jobs.

 The big Web companies were also challenged by extreme user bases
 (100s of millions of users) and needed fast simple lookups and
 updates to very large keyed data sets like user profiles. SQL
 databases were deemed either too expensive or not scalable, so the
 “NoSQL movement” was born. The ASF now has HBase and Cassandra, two
 popular key-value stores, in this space. MongoDB and Couchbase are
 other open source alternatives (document stores).

 It is evident from the rapidly growing popularity of NoSQL stores,
 as well as the strong demand for Big Data analytics engines today,
 that there is a strong (and growing!) need to store, process, *and*
 query large volumes of semi-structured data in many application
 areas. Until very recently, developers have had to “choose” between
 using big data analytics engines like Apache Hive or Apache Spark,
 which can do complex query processing and analysis over HDFS-resident
 files, and flexible but low-function data stores like MongoDB or
 Apache HBase. (The Apache Phoenix project,
 http://phoenix.apache.org/, is a recent SQL-over-HBase effort that
 aims to bridge between these choices.)

 AsterixDB is a highly scalable data management system that can store,
 index, and manage semi-structured data, e.g., much like MongoDB, but
 it also supports a full-power query language with the