Re: [PROPOSAL] Apache AsterixDB Incubator
Ditto - thanks for the support! Cheers, Mike On 1/19/15 5:39 PM, Till Westmann wrote: On Jan 19, 2015, at 11:34 AM, jan i j...@apache.org wrote: Looks like a real challenging project, and the proposal looks as if it has already been through a couple of refinement rounds. Count on my +1, when it comes to voting. Will do! Thanks, Till rgds jan i On 19 January 2015 at 19:26, Henry Saputra henry.sapu...@gmail.com wrote: +1 This is GREAT News! Was watching and trying AsterixDB last year and looked in awesome shape. I have my plate full but would love to help mentor this project to get it going to ASF if needed! - Henry On Wed, Jan 14, 2015 at 6:21 PM, Mattmann, Chris A (3980) chris.a.mattm...@jpl.nasa.gov wrote: Hi Folks, I am pleased to bring forth the Apache AsterixDB proposal to the Apache Incubator as Champion, working in collaboration with the team. Please find the wiki proposal here: https://wiki.apache.org/incubator/AsterixDBProposal Full text of the proposal is below. Please discuss and enjoy. I’ll leave the discussion open for a week, and then look to call a VOTE hopefully end of next week if all is well. Cheers! Chris Mattmann = Apache AsterixDB Proposal Abstract Apache AsterixDB is a scalable big data management system (BDMS) that provides storage, management, and query capabilities for large collections of semi-structured data. Proposal AsterixDB is a big data management system (BDMS) that makes it well-suited to needs such as web data warehousing and social data storage and analysis. Feature-wise, AsterixDB has: * A NoSQL style data model (ADM) based on extending JSON with object database concepts. * An expressive and declarative query language (AQL) for querying semi-structured data. * A runtime query execution engine, Hyracks, for partitioned-parallel execution of query plans.
* Partitioned LSM-based data storage and indexing for efficient ingestion of newly arriving data. * Support for querying and indexing external data (e.g., in HDFS) as well as data stored within AsterixDB. * A rich set of primitive data types, including support for spatial, temporal, and textual data. * Indexing options that include B+ trees, R trees, and inverted keyword index support. * Basic transactional (concurrency and recovery) capabilities akin to those of a NoSQL store. Background and Rationale In the world of relational databases, the need to tackle data volumes that exceed the capabilities of a single server led to the development of “shared-nothing” parallel database systems several decades ago. These systems spread data over a cluster based on a partitioning strategy, such as hash partitioning, and queries are processed by employing partitioned-parallel divide-and-conquer techniques. Since these systems are fronted by a high-level, declarative language (SQL), their users are shielded from the complexities of parallel programming. Parallel database systems have been an extremely successful application of parallel computing, and quite a number of commercial products exist today. In the distributed systems world, the Web brought a need to index and query its huge content. SQL and relational databases were not the answer, though shared-nothing clusters again emerged as the hardware platform of choice. Google developed the Google File System (GFS) and MapReduce programming model to allow programmers to store and process Big Data by writing a few user-defined functions. The MapReduce framework applies these functions in parallel to data instances in distributed files (map) and to sorted groups of instances sharing a common key (reduce) -- not unlike the partitioned parallelism in parallel database systems. Apache's Hadoop MapReduce platform is the most prominent implementation of this paradigm for the rest of the Big Data community. 
On top of Hadoop and HDFS sit declarative languages like Pig and Hive that each compile down to Hadoop MapReduce jobs. The big Web companies were also challenged by extreme user bases (100s of millions of users) and needed fast simple lookups and updates to very large keyed data sets like user profiles. SQL databases were deemed either too expensive or not scalable, so the “NoSQL movement” was born. The ASF now has HBase and Cassandra, two popular key-value stores, in this space. MongoDB and
Re: [PROPOSAL] Apache AsterixDB Incubator
Indeed - thanks!! Cheers, Mike On 1/19/15 5:28 PM, Till Westmann wrote: Hi Henry, thanks! It’s great that you’ve seen (and liked) AsterixDB before. Even if your time is very limited we would be very happy to have you on board as a mentor. I’ll add you to the proposal. Cheers, Till On Jan 19, 2015, at 10:26 AM, Henry Saputra henry.sapu...@gmail.com wrote: +1 This is GREAT News! Was watching and trying AsterixDB last year and looked in awesome shape. I have my plate full but would love to help mentor this project to get it going to ASF if needed! - Henry On Wed, Jan 14, 2015 at 6:21 PM, Mattmann, Chris A (3980) chris.a.mattm...@jpl.nasa.gov wrote: Hi Folks, I am pleased to bring forth the Apache AsterixDB proposal to the Apache Incubator as Champion, working in collaboration with the team. Please find the wiki proposal here: https://wiki.apache.org/incubator/AsterixDBProposal Full text of the proposal is below. Please discuss and enjoy. I’ll leave the discussion open for a week, and then look to call a VOTE hopefully end of next week if all is well. Cheers! Chris Mattmann = Apache AsterixDB Proposal Abstract Apache AsterixDB is a scalable big data management system (BDMS) that provides storage, management, and query capabilities for large collections of semi-structured data. Proposal AsterixDB is a big data management system (BDMS) that makes it well-suited to needs such as web data warehousing and social data storage and analysis. Feature-wise, AsterixDB has: * A NoSQL style data model (ADM) based on extending JSON with object database concepts. * An expressive and declarative query language (AQL) for querying semi-structured data. * A runtime query execution engine, Hyracks, for partitioned-parallel execution of query plans. * Partitioned LSM-based data storage and indexing for efficient ingestion of newly arriving data. * Support for querying and indexing external data (e.g., in HDFS) as well as data stored within AsterixDB. 
* A rich set of primitive data types, including support for spatial, temporal, and textual data. * Indexing options that include B+ trees, R trees, and inverted keyword index support. * Basic transactional (concurrency and recovery) capabilities akin to those of a NoSQL store. Background and Rationale In the world of relational databases, the need to tackle data volumes that exceed the capabilities of a single server led to the development of “shared-nothing” parallel database systems several decades ago. These systems spread data over a cluster based on a partitioning strategy, such as hash partitioning, and queries are processed by employing partitioned-parallel divide-and-conquer techniques. Since these systems are fronted by a high-level, declarative language (SQL), their users are shielded from the complexities of parallel programming. Parallel database systems have been an extremely successful application of parallel computing, and quite a number of commercial products exist today. In the distributed systems world, the Web brought a need to index and query its huge content. SQL and relational databases were not the answer, though shared-nothing clusters again emerged as the hardware platform of choice. Google developed the Google File System (GFS) and MapReduce programming model to allow programmers to store and process Big Data by writing a few user-defined functions. The MapReduce framework applies these functions in parallel to data instances in distributed files (map) and to sorted groups of instances sharing a common key (reduce) -- not unlike the partitioned parallelism in parallel database systems. Apache's Hadoop MapReduce platform is the most prominent implementation of this paradigm for the rest of the Big Data community. On top of Hadoop and HDFS sit declarative languages like Pig and Hive that each compile down to Hadoop MapReduce jobs. 
The big Web companies were also challenged by extreme user bases (100s of millions of users) and needed fast simple lookups and updates to very large keyed data sets like user profiles. SQL databases were deemed either too expensive or not scalable, so the “NoSQL movement” was born. The ASF now has HBase and Cassandra, two popular key-value stores, in this space. MongoDB and Couchbase are other open source alternatives (document stores). It is evident from the rapidly growing popularity of NoSQL stores, as well as the strong demand for Big Data analytics engines today, that there is a strong (and growing!) need to store, process, *and* query large volumes of semi-structured data in many application areas. Until very recently, developers have had to ``choose'' between using big data analytics engines like Apache Hive or Apache Spark, which can do complex query processing and analysis over HDFS-resident files, and flexible but low-function data stores like MongoDB or Apache HBase. (The Apache Phoenix project,
Re: Next steps for various proposals (mentor re-boot, pTLP, etc.)
Ditto, kudos to ChrisD ++ Chris Mattmann, Ph.D. Chief Architect Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 168-519, Mailstop: 168-527 Email: chris.a.mattm...@nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Associate Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++ -Original Message- From: Ted Dunning ted.dunn...@gmail.com Reply-To: general@incubator.apache.org general@incubator.apache.org Date: Monday, January 19, 2015 at 5:48 PM To: general@incubator.apache.org general@incubator.apache.org Subject: Re: Next steps for various proposals (mentor re-boot, pTLP, etc.) On Mon, Jan 19, 2015 at 4:37 PM, Chris Douglas cdoug...@apache.org wrote: submit a proposal to the board to start a new project. Fork the incubator. Hmm... That is the first interesting variation here. - To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org For additional commands, e-mail: general-h...@incubator.apache.org
Re: Next steps for various proposals (mentor re-boot, pTLP, etc.)
I do not dispute anything written below nor do I intend this to be a last word, just a clarification. In neither model are people powerless in any meaningful sense. I approached these proposals by putting myself in the shoes of a newcomer as best as I'm able (I've been PMC for years and PPMC also). The feeling of investment in the process I'd have would be different than before under the second two options (*not* the mentor reboot), as would be the calculus of bringing a project to Apache. I have not observed the IPMC model to take ownership away, because the initial contributors bringing their project here are formed into a PPMC of equals and the usual release votes done by the IPMC are up-or-down checks on releases, not exercises in differential power. On Mon, Jan 19, 2015 at 4:15 PM, Benson Margulies bimargul...@gmail.com wrote: I'm in the odd situation of not particularly wanting to argue in favor of the proposal I wrote, yet finding it hard to resist the provocation of messages that appear, to me, to misunderstand it. So I'll restrict myself to the following, and I won't reply to any further dispute. Anyone else is welcome to have a last-er word than me. The incubator is like no other Apache project. It is not a meritocratic, volunteer, community, producing a software product for the public good. It is a volunteer, meritocratic, group of people solving a problem for the board. The problem that the incubator sets out to solve is this: How do you bootstrap a community from scratch? Because it is a group of people solving a problem for the board, there's no special 'merit' in shaping it in the usual ASF PMC growing community mold. There may be some problems with that shape related to scale, noise, and responsibility. Some people who find those problems to be severe want to make changes. Others, not so much.
The board is always free to solve any problem with any structure that it finds effective; there's no 'constitutional' requirement that everything is a meritocratic PMC. Witness what happened to ApacheCon. We have here two competing visions. The current vision says: Let people who have never run an Apache community start doing it with coaching and supervision from 'mentors'. The alternative vision says, Start with a kernel of people who have done it before. Those of you who are happy with the current vision? Great! I wrote up the alternative vision to try to put some clarity onto a lot of prior writing that found fault with the current model and looked for an alternative. In neither model are people powerless in any meaningful sense. In the current model, people have an interaction with the full IPMC. They can get pretty frustrated, but, as Mavin has documented, the frustration is more the fault of the lack of documentation than of the behavior of the IPMC. In the alternative model, they _start out_ with a group of 'strangers' at the center of their community, but those strangers are chosen specifically for their ability and experience in building a consensus community. And, in any case, they will rapidly become an ever-smaller fraction of the group. Badly-behaved mentors (and other IPMC members) can overbear in the current model, and badly-behaved seed-PMC members could overbear in the alternative. I very much doubt that email discussion will yield any consensus to do anything radical. Which might be fine. When the time comes to find Roman's successor, an interesting situation may arise in which candidates might declare their intention to implement changes. And just to be clear, _I_ am not running on the platform of implementing what I wrote -- or any other way.
-- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: [VOTE] Release Apache htrace-3.1.0-incubating
+1 (non-binding, see my post from dev thread*) * http://mail-archives.apache.org/mod_mbox/incubator-htrace-dev/201501.mbox/%3cCANZa=guvaa2oeqfiudiz1tg82e23w80uzu+fspsk6xcbwtn...@mail.gmail.com%3e On Mon, Jan 19, 2015 at 7:34 AM, Jake Farrell jfarr...@apache.org wrote: +1 binding -Jake On Sat, Jan 17, 2015 at 10:36 PM, Stack st...@duboce.net wrote: Apache HTrace (incubating), after ten release candidates, has voted to release the below referenced Apache HTrace 3.1.0-incubating release candidate. Dear IPMC, please vote on our first release candidate as an Apache Incubator project. Here is the vote thread we ran on our dev list (Six binding +1 votes and no dissent) with a subject: [VOTE] htrace-3.1.0, the this is it for sure!, tenth release candidate http://mail-archives.apache.org/mod_mbox/incubator-htrace-dev/201501.mbox/%3CCADcMMgF_agDCzcwsxpdGsJOzCQ1ebA5z7hsM_-oFbneAVzh4dg%40mail.gmail.com%3E The source tarball, hashes, and signing are here: http://people.apache.org/~stack/htrace-3.1.0-incubatingRC9/ (Over in htrace our RC number was zero based so RC9 == tenth RC) Related maven artifacts are posted here: https://repository.apache.org/content/repositories/orgapachehtrace-1014 The tag for the RC is here: https://git-wip-us.apache.org/repos/asf?p=incubator-htrace.git;a=commit;h=0cabe569bc05a58c7a319a460eed5e50e136bae7 The KEYS file with the key used for signing is available here: https://dist.apache.org/repos/dist/release/incubator/htrace/KEYS 44 issues were closed/resolved for this release: https://issues.apache.org/jira/issues/?jql=project%20%3D%20HTRACE%20AND%20status%20%3D%20resolved%20AND%20fixVersion%20%3D%203.1.0%20ORDER%20BY%20issuetype%20DESC The vote will be open for 72 hours.
[ ] +1 Release this package as Apache HTrace 3.1.0-incubating [ ] +0 no opinion [ ] -1 Do not release this package because ... Thanks, St.Ack
Re: Next steps for various proposals (mentor re-boot, pTLP, etc.)
I'm in the odd situation of not particularly wanting to argue in favor of the proposal I wrote, yet finding it hard to resist the provocation of messages that appear, to me, to misunderstand it. So I'll restrict myself to the following, and I won't reply to any further dispute. Anyone else is welcome to have a last-er word than me. The incubator is like no other Apache project. It is not a meritocratic, volunteer, community, producing a software product for the public good. It is a volunteer, meritocratic, group of people solving a problem for the board. The problem that the incubator sets out to solve is this: How do you bootstrap a community from scratch? Because it is a group of people solving a problem for the board, there's no special 'merit' in shaping it in the usual ASF PMC growing community mold. There may be some problems with that shape related to scale, noise, and responsibility. Some people who find those problems to be severe want to make changes. Others, not so much. The board is always free to solve any problem with any structure that it finds effective; there's no 'constitutional' requirement that everything is a meritocratic PMC. Witness what happened to ApacheCon. We have here two competing visions. The current vision says: Let people who have never run an Apache community start doing it with coaching and supervision from 'mentors'. The alternative vision says, Start with a kernel of people who have done it before. Those of you who are happy with the current vision? Great! I wrote up the alternative vision to try to put some clarity onto a lot of prior writing that found fault with the current model and looked for an alternative. In neither model are people powerless in any meaningful sense. In the current model, people have an interaction with the full IPMC. They can get pretty frustrated, but, as Mavin has documented, the frustration is more the fault of the lack of documentation than of the behavior of the IPMC.
In the alternative model, they _start out_ with a group of 'strangers' at the center of their community, but those strangers are chosen specifically for their ability and experience in building a consensus community. And, in any case, they will rapidly become an ever-smaller fraction of the group. Badly-behaved mentors (and other IPMC members) can overbear in the current model, and badly-behaved seed-PMC members could overbear in the alternative. I very much doubt that email discussion will yield any consensus to do anything radical. Which might be fine. When the time comes to find Roman's successor, an interesting situation may arise in which candidates might declare their intention to implement changes. And just to be clear, _I_ am not running on the platform of implementing what I wrote -- or any other way.
Re: Next steps for various proposals (mentor re-boot, pTLP, etc.)
I think the cures are all problematic and might be worse than the disease. On Mon, Jan 19, 2015 at 1:47 PM, Roman Shaposhnik r...@apache.org wrote: On Wed, Jan 14, 2015 at 8:48 AM, Roman Shaposhnik r...@apache.org wrote: Hi! at this point we have had a few lively threads discussing three somewhat different proposals: #1 mentor re-boot #2 pTLP #3 Ross's strawman http://s.apache.org/8eS it feels to me that all three need additional work to be done before we can have any reasonable consensus around them (let alone voting). Wearing my chair hat, I would like to suggest that the next step should be: for each proposal we identify points that are going to block consensus (AKA would result in -1 vote if it comes to a vote). I suggest we do it on the wiki pages themselves (I'll wikify Ross's proposal tonight). Not editing the wikis but simply collecting this feedback as the last section in each proposal. The idea would be to identify all such points in a week or so. Sounds good? To follow up. Each of the proposals: https://wiki.apache.org/incubator/MentorRebootProposal An active mentor is removed from a podling if that mentor does not review/sign off on a release. The above implies the foundation has a pool of mentors able to consistently meet every reporting requirement in a timely manner, without regard to personal or professional obstacles. I don't see it. For an organization almost entirely made up of volunteers this seems overly optimistic. There is only a small core membership who are capable and willing to do this as evidenced by a skim of history of general@incubator and members@. Perhaps this core group will end up shouldering the incubation load in its entirety. Although sadly this is more or less the current state of affairs, individual podlings do come with new mentors not part of the professional membership motivated to see at least that specific podling through. 
It's also risky to expect mentors kicked from a podling to be okay with it and want to try again, especially if listed on some naughty list to the board. https://wiki.apache.org/incubator/Strawman Only ASF members on the PPMC will have binding votes for the releases. This proposal seems better than the others in my estimation, but doesn't allow podlings full investment in their own release management. The members on the PPMC who have binding votes will drive the release process out of necessity. Once the podling graduates and the members on the PPMC leave to resume other interests or duties, only then for the first time is the project running their own releases. I think it was better to let the podling own their release process but have the IPMC (or equivalent) have an up-or-down vote afterward as a check on their activities. https://wiki.apache.org/incubator/IncubatorV2 This proposal revokes merit earned by existing IPMC members and reboots incubator supervision as a sub-board limited to 15 members. How members apply to this board is not specified. It is suggested the current board make recommendations to the board for their replacements, a very unmeritocratic suggestion that is quite surprising. It's not clear at all how the membership can address issues with this sub-board as they can with the Board. I think this proposal takes the likely outcome of the first proposal, that only a small core group of professional membership can manage sufficient activity as mentors to not be kicked from podlings, and codifies it with new structure and bylaws. Maybe in the end this is admitting reality. However, discussion of this proposal also floated the idea that the sub-board be later given authority to supervise the affairs of established TLPs, which is deeply problematic* and I suspect still hovers in the wings. I would hope not. All proposals for new ASF projects must include an initial PMC chair and an initial set of PMC members. 
These people must be acceptable to the board. It is the responsibility of the Incubator Committee to vet these people. All of them must have experience on existing PMCs. This doubles down on the aspect of the Strawman proposal where PPMC members are powerless to vote on releases. Here they are powerless to make any and all project management decisions about their own software they brought to Apache. It's not mentoring if you make all of the decisions for them. * - Find me any PMC of any TLP that would welcome the self-introduction of newly empowered meddlers who by definition are uninformed of their project particulars. now has the feedback gathering section at the end. I am done with my personal feedback. Please provide yours. Here's the criteria you can apply when deciding whether to spend time on this or not: imagine that the proposal the way it is written were to come to a vote. If at that point you'd be inclined to vote -1 -- please let us know NOW. Using a VOTE thread as a forcing function for folks to
Re: Next steps for various proposals (mentor re-boot, pTLP, etc.)
I agree with Andrew. Creating a sub-board of the most vocal members of the IPMC distills its dysfunction. But this doesn't require consensus. The proposals focus on identifying the true members of the IPMC; clearly, the authors believe themselves to be among the elect. So make a list of the IPMC members you believe should judge the other 90%, and submit a proposal to the board to start a new project. Fork the incubator. If the board is interested in your proposal, then you can demonstrate how successful a small, committed group can be in contrast to the 150+ member IPMC. Concurrently, others may propose TLPs directly to the board, and this committee will continue as-is. Surely all of you have better things to do than... this. -C On Mon, Jan 19, 2015 at 3:55 PM, Andrew Purtell apurt...@apache.org wrote: I think the cures are all problematic and might be worse than the disease. On Mon, Jan 19, 2015 at 1:47 PM, Roman Shaposhnik r...@apache.org wrote: On Wed, Jan 14, 2015 at 8:48 AM, Roman Shaposhnik r...@apache.org wrote: Hi! at this point we have had a few lively threads discussing three somewhat different proposals: #1 mentor re-boot #2 pTLP #3 Ross's strawman http://s.apache.org/8eS it feels to me that all three need additional work to be done before we can have any reasonable consensus around them (let alone voting). Wearing my chair hat, I would like to suggest that the next step should be: for each proposal we identify points that are going to block consensus (AKA would result in -1 vote if it comes to a vote). I suggest we do it on the wiki pages themselves (I'll wikify Ross's proposal tonight). Not editing the wikis but simply collecting this feedback as the last section in each proposal. The idea would be to identify all such points in a week or so. Sounds good? To follow up. Each of the proposals: https://wiki.apache.org/incubator/MentorRebootProposal An active mentor is removed from a podling if that mentor does not review/sign off on a release. 
The above implies the foundation has a pool of mentors able to consistently meet every reporting requirement in a timely manner, without regard to personal or professional obstacles. I don't see it. For an organization almost entirely made up of volunteers this seems overly optimistic. There is only a small core membership who are capable and willing to do this as evidenced by a skim of history of general@incubator and members@. Perhaps this core group will end up shouldering the incubation load in its entirety. Although sadly this is more or less the current state of affairs, individual podlings do come with new mentors not part of the professional membership motivated to see at least that specific podling through. It's also risky to expect mentors kicked from a podling to be okay with it and want to try again, especially if listed on some naughty list to the board. https://wiki.apache.org/incubator/Strawman Only ASF members on the PPMC will have binding votes for the releases. This proposal seems better than the others in my estimation, but doesn't allow podlings full investment in their own release management. The members on the PPMC who have binding votes will drive the release process out of necessity. Once the podling graduates and the members on the PPMC leave to resume other interests or duties, only then for the first time is the project running their own releases. I think it was better to let the podling own their release process but have the IPMC (or equivalent) have an up-or-down vote afterward as a check on their activities. https://wiki.apache.org/incubator/IncubatorV2 This proposal revokes merit earned by existing IPMC members and reboots incubator supervision as a sub-board limited to 15 members. How members apply to this board is not specified. It is suggested the current board make recommendations to the board for their replacements, a very unmeritocratic suggestion that is quite surprising. 
It's not clear at all how the membership can address issues with this sub-board as they can with the Board. I think this proposal takes the likely outcome of the first proposal, that only a small core group of professional membership can manage sufficient activity as mentors to not be kicked from podlings, and codifies it with new structure and bylaws. Maybe in the end this is admitting reality. However, discussion of this proposal also floated the idea that the sub-board be later given authority to supervise the affairs of established TLPs, which is deeply problematic* and I suspect still hovers in the wings. I would hope not. All proposals for new ASF projects must include an initial PMC chair and an initial set of PMC members. These people must be acceptable to the board. It is the responsibility of the Incubator Committee to vet these people. All of them must have experience on existing PMCs. This doubles
Re: [PROPOSAL] Apache AsterixDB Incubator
Thanks Till, Will try to solicit more mentors to help. Especially since most of the initial committers have not been exposed to contributing the Apache way. - Henry On Mon, Jan 19, 2015 at 5:28 PM, Till Westmann t...@westmann.org wrote: Hi Henry, thanks! It’s great that you’ve seen (and liked) AsterixDB before. Even if your time is very limited we would be very happy to have you on board as a mentor. I’ll add you to the proposal. Cheers, Till On Jan 19, 2015, at 10:26 AM, Henry Saputra henry.sapu...@gmail.com wrote: +1 This is GREAT News! Was watching and trying AsterixDB last year and looked in awesome shape. I have my plate full but would love to help mentor this project to get it going to ASF if needed! - Henry On Wed, Jan 14, 2015 at 6:21 PM, Mattmann, Chris A (3980) chris.a.mattm...@jpl.nasa.gov wrote: Hi Folks, I am pleased to bring forth the Apache AsterixDB proposal to the Apache Incubator as Champion, working in collaboration with the team. Please find the wiki proposal here: https://wiki.apache.org/incubator/AsterixDBProposal Full text of the proposal is below. Please discuss and enjoy. I’ll leave the discussion open for a week, and then look to call a VOTE hopefully end of next week if all is well. Cheers! Chris Mattmann = Apache AsterixDB Proposal Abstract Apache AsterixDB is a scalable big data management system (BDMS) that provides storage, management, and query capabilities for large collections of semi-structured data. Proposal AsterixDB is a big data management system (BDMS) that makes it well-suited to needs such as web data warehousing and social data storage and analysis. Feature-wise, AsterixDB has: * A NoSQL style data model (ADM) based on extending JSON with object database concepts. * An expressive and declarative query language (AQL) for querying semi-structured data. * A runtime query execution engine, Hyracks, for partitioned-parallel execution of query plans.
* Partitioned LSM-based data storage and indexing for efficient ingestion of newly arriving data. * Support for querying and indexing external data (e.g., in HDFS) as well as data stored within AsterixDB. * A rich set of primitive data types, including support for spatial, temporal, and textual data. * Indexing options that include B+ trees, R trees, and inverted keyword index support. * Basic transactional (concurrency and recovery) capabilities akin to those of a NoSQL store. Background and Rationale In the world of relational databases, the need to tackle data volumes that exceed the capabilities of a single server led to the development of “shared-nothing” parallel database systems several decades ago. These systems spread data over a cluster based on a partitioning strategy, such as hash partitioning, and queries are processed by employing partitioned-parallel divide-and-conquer techniques. Since these systems are fronted by a high-level, declarative language (SQL), their users are shielded from the complexities of parallel programming. Parallel database systems have been an extremely successful application of parallel computing, and quite a number of commercial products exist today. In the distributed systems world, the Web brought a need to index and query its huge content. SQL and relational databases were not the answer, though shared-nothing clusters again emerged as the hardware platform of choice. Google developed the Google File System (GFS) and MapReduce programming model to allow programmers to store and process Big Data by writing a few user-defined functions. The MapReduce framework applies these functions in parallel to data instances in distributed files (map) and to sorted groups of instances sharing a common key (reduce) -- not unlike the partitioned parallelism in parallel database systems. Apache's Hadoop MapReduce platform is the most prominent implementation of this paradigm for the rest of the Big Data community. 
On top of Hadoop and HDFS sit declarative languages like Pig and Hive that each compile down to Hadoop MapReduce jobs. The big Web companies were also challenged by extreme user bases (100s of millions of users) and needed fast simple lookups and updates to very large keyed data sets like user profiles. SQL databases were deemed either too expensive or not scalable, so the “NoSQL movement” was born. The ASF now has HBase and Cassandra, two popular key-value stores, in this space. MongoDB and Couchbase are other open source alternatives (document stores). It is evident from the rapidly growing popularity of NoSQL stores, as well as the strong demand for Big Data analytics engines today, that there is a strong (and growing!) need to store, process, *and* query large volumes of semi-structured data in many application areas. Until very recently, developers have had to ``choose'' between using big data analytics
Re: Next steps for various proposals (mentor re-boot, pTLP, etc.)
On Tue, Jan 20, 2015 at 1:37 AM, Chris Douglas cdoug...@apache.org wrote: ...So make a list of the IPMC members you believe should judge the other 90%, and submit a proposal to the board to start a new project. Fork the incubator How is that different from pruning the current IPMC membership by removing inactive members? I don't think those inactive folks are a problem, but if people think they are it's easy to ask them if they want to stay, and remove those who reply no or don't reply. -Bertrand - To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org For additional commands, e-mail: general-h...@incubator.apache.org
Re: [PROPOSAL] Apache AsterixDB Incubator
Chris just asked me under separate cover. I am happy to help out as mentor.

On Mon, Jan 19, 2015 at 8:17 PM, Henry Saputra henry.sapu...@gmail.com wrote:

Thanks Till, Will try to solicit more mentors to help, especially since the initial committers mostly have not been exposed to contributing the Apache way.

- Henry
Reporting and releasing for Ripple
The Ripple community asked for a stay of execution before being moved to the attic, as was recommended by some. This was granted in November 2014 with a review in six months. No board report was submitted this month and no action has been taken with respect to the concerns I raised about releases from this project. If this project community wishes to continue to operate as an incubating project, and eventually graduate, then these items need to be addressed. There are two months remaining. Without an ASF approved release happening in that period it is unlikely that the IPMC will approve a further six months. I'm here as a mentor and ready to help guide any member of this community (committer or otherwise, everyone is welcome) in making a release. Ross Microsoft Open Technologies, Inc. A subsidiary of Microsoft Corporation
Re: Next steps for various proposals (mentor re-boot, pTLP, etc.)
On Wed, Jan 14, 2015 at 8:48 AM, Roman Shaposhnik r...@apache.org wrote:

Hi! At this point we have had a few lively threads discussing three somewhat different proposals:
#1 mentor re-boot
#2 pTLP
#3 Ross's strawman http://s.apache.org/8eS

It feels to me that all three need additional work to be done before we can have any reasonable consensus around them (let alone voting). Wearing my chair hat, I would like to suggest that the next step should be: for each proposal we identify points that are going to block consensus (AKA would result in a -1 vote if it comes to a vote). I suggest we do it on the wiki pages themselves (I'll wikify Ross's proposal tonight). Not editing the wikis but simply collecting this feedback as the last section in each proposal. The idea would be to identify all such points in a week or so. Sounds good?

To follow up: each of the proposals
https://wiki.apache.org/incubator/MentorRebootProposal
https://wiki.apache.org/incubator/Strawman
https://wiki.apache.org/incubator/IncubatorV2
now has a feedback-gathering section at the end. I am done with my personal feedback. Please provide yours.

Here are the criteria you can apply when deciding whether to spend time on this or not: imagine that the proposal the way it is written were to come to a vote. If at that point you'd be inclined to vote -1 -- please let us know NOW. Using a VOTE thread as a forcing function for folks to provide feedback would be *really* unfortunate.

Also, please try to keep the 'deal breakers' section as small as possible (pushing all the non-critical pieces of your feedback to the 'suggestions' section). When in doubt (even if it is -0) -- make it go to suggestions. The only items that belong in 'deal breakers' are the ones that would *strongly* motivate you to vote -1.

Thanks,
Roman.
Re: Next steps for various proposals (mentor re-boot, pTLP, etc.)
On Fri, Jan 16, 2015 at 3:04 PM, Ross Gardler (MS OPEN TECH) ross.gard...@microsoft.com wrote: Or we could just do it We debated plenty. Three proposals came out of it (two if you look at mine as the strawman it was intended to be). As a matter of fact, while editing yours (sorry, I took the liberty) and leaving feedback for Alan's I felt like they were pretty close in spirit, with yours going all the way to make podlings behave as close to TLPs as possible while still not overwhelming the board. Those proposals are not mutually exclusive. I say record them in the wiki. Run them for a while. Then compare against the problems document we drew up a couple of years back and see how effective they are. That's the plan. Thanks, Roman.
Re: Next steps for various proposals (mentor re-boot, pTLP, etc.)
On Sat, Jan 17, 2015 at 12:16 AM, Ross Gardler (MS OPEN TECH) ross.gard...@microsoft.com wrote: http://wiki.apache.org/incubator/IncubatorIssues2013 If someone reviews this it would be nice to add brief comments about today's state, maybe right after each item's title (like early 2014 status: still a problem) -Bertrand
[RESULTS] Release Apache Usergrid 1.0.1 (incubating) RC5
+1 (binding) John Ament
+1 (binding) Dave Johnson
+1 (binding) Justin Mclean

The IPMC vote passes. Thanks everybody!

- Dave

On Sat, Jan 17, 2015 at 6:29 PM, Dave snoopd...@gmail.com wrote:

Thanks for the careful review and the +1! I filed the issues you raised as USERGRID-358.
https://issues.apache.org/jira/browse/USERGRID-358

- Dave

On Sat, Jan 17, 2015 at 6:10 PM, Justin Mclean jus...@classsoftware.com wrote:

Hi,

+1 (binding) if LICENSE and NOTICE are fixed up for next release.

- incubating in name
- DISCLAIMER exists
- Signatures and MD5 correct (in dist area)
- LICENSE and NOTICE minor issues (see below)
- Source files have headers.
- No unexpected binary files in release
- Can compile from source

LICENSE issues
- May require Font Awesome license (see ./docs/_theme/sphinx_rtd_theme/static/css/theme.css)
- Bootstrap version is MIT not Apache ./portal/js/libs/bootstrap/LICENSE.txt
- Missing MIT license for ./sdks/dotnet/packages/Newtonsoft.Json.4.5.11/LICENSE.txt
- Missing BSD license for NSubstitute ./sdks/dotnet/packages/NSubstitute.1.6.0.0/LICENSE.txt
- Missing BSD license for sphinx, e.g. ./docs/_theme/sphinx_rtd_theme/search.html

NOTICE issues
- Intro.js should be in LICENSE not NOTICE. However, I was unable to find intro.js anywhere in the package so it may no longer be required

Thanks,
Justin
Re: Next steps for various proposals (mentor re-boot, pTLP, etc.)
On Mon, Jan 19, 2015 at 2:59 AM, Bertrand Delacretaz bdelacre...@apache.org wrote:

On Sat, Jan 17, 2015 at 12:16 AM, Ross Gardler (MS OPEN TECH) ross.gard...@microsoft.com wrote: http://wiki.apache.org/incubator/IncubatorIssues2013

If someone reviews this it would be nice to add brief comments about today's state, maybe right after each item's title (like early 2014 status: still a problem)

First of all, this appears to be an immutable page. That said, looking at the list of issues, I'd say that every one of them still applies (the caveat being: to what degree) although the analysis suggestions could be slightly out of date. In general, I'd split the issues into two categories:

Operational/Structural issues:
Issue 01 - lack of mentor participation
Issue 02 - lack of progress towards graduation
Issue 03 - Too many cooks spoil the IPMC broth
Issue 04 - Horrible signal-to-noise ratio on general@a.o for podling contributors
Issue 05 - Inadequate reporting
Issue 07 - Vetting releases is a huge pain
Issue 08 - The IPMC is broken

Documentation:
Issue 06 - Podlings status metadata is not reliable
Issue 09 - People do not follow through to improve Incubator documentation
Issue 10 - Steps for Podlings to Acquire Resources Are Disparate and Poorly Documented
Issue 11 - Clearly and concisely document the principles and constraints for the ASF
Issue 12 - Cold welcome for new projects

In my view, the two categories can be addressed in parallel. The Documentation agenda seems to be passionately championed by Marvin and a few other folks. I'd expect them to make quite a bit of progress there. Personally, I'd like to focus this thread on the first category.

Thanks,
Roman.
Re: [PROPOSAL] Apache AsterixDB Incubator
Looks like a real challenging project, and the proposal looks as if it has already been through a couple of refinement rounds. Count on my +1, when it comes to voting.

rgds
jan i

On 19 January 2015 at 19:26, Henry Saputra henry.sapu...@gmail.com wrote:

+1 This is GREAT News! Was watching and trying AsterixDB last year and it looked in awesome shape. I have my plate full but would love to help mentor this project to get it going to ASF if needed!

- Henry
Re: [PROPOSAL] Apache AsterixDB Incubator
+1 This is GREAT News! Was watching and trying AsterixDB last year and it looked in awesome shape. I have my plate full but would love to help mentor this project to get it going to ASF if needed!

- Henry

On Wed, Jan 14, 2015 at 6:21 PM, Mattmann, Chris A (3980) chris.a.mattm...@jpl.nasa.gov wrote:

Hi Folks,

I am pleased to bring forth the Apache AsterixDB proposal to the Apache Incubator as Champion, working in collaboration with the team. Please find the wiki proposal here: https://wiki.apache.org/incubator/AsterixDBProposal

Full text of the proposal is below. Please discuss and enjoy. I’ll leave the discussion open for a week, and then look to call a VOTE hopefully end of next week if all is well.

Cheers!
Chris Mattmann

= Apache AsterixDB Proposal

Abstract

Apache AsterixDB is a scalable big data management system (BDMS) that provides storage, management, and query capabilities for large collections of semi-structured data.

Proposal

AsterixDB is a big data management system (BDMS) whose feature set makes it well-suited to needs such as web data warehousing and social data storage and analysis. Feature-wise, AsterixDB has:

* A NoSQL style data model (ADM) based on extending JSON with object database concepts.
* An expressive and declarative query language (AQL) for querying semi-structured data.
* A runtime query execution engine, Hyracks, for partitioned-parallel execution of query plans.
* Partitioned LSM-based data storage and indexing for efficient ingestion of newly arriving data.
* Support for querying and indexing external data (e.g., in HDFS) as well as data stored within AsterixDB.
* A rich set of primitive data types, including support for spatial, temporal, and textual data.
* Indexing options that include B+ trees, R trees, and inverted keyword index support.
* Basic transactional (concurrency and recovery) capabilities akin to those of a NoSQL store.

Background and Rationale

In the world of relational databases, the need to tackle data volumes that exceed the capabilities of a single server led to the development of “shared-nothing” parallel database systems several decades ago. These systems spread data over a cluster based on a partitioning strategy, such as hash partitioning, and queries are processed by employing partitioned-parallel divide-and-conquer techniques. Since these systems are fronted by a high-level, declarative language (SQL), their users are shielded from the complexities of parallel programming. Parallel database systems have been an extremely successful application of parallel computing, and quite a number of commercial products exist today.

In the distributed systems world, the Web brought a need to index and query its huge content. SQL and relational databases were not the answer, though shared-nothing clusters again emerged as the hardware platform of choice. Google developed the Google File System (GFS) and MapReduce programming model to allow programmers to store and process Big Data by writing a few user-defined functions. The MapReduce framework applies these functions in parallel to data instances in distributed files (map) and to sorted groups of instances sharing a common key (reduce) -- not unlike the partitioned parallelism in parallel database systems. Apache's Hadoop MapReduce platform is the most prominent implementation of this paradigm for the rest of the Big Data community. On top of Hadoop and HDFS sit declarative languages like Pig and Hive that each compile down to Hadoop MapReduce jobs.

The big Web companies were also challenged by extreme user bases (hundreds of millions of users) and needed fast, simple lookups and updates to very large keyed data sets like user profiles. SQL databases were deemed either too expensive or not scalable, so the “NoSQL movement” was born. The ASF now has HBase and Cassandra, two popular key-value stores, in this space. MongoDB and Couchbase are other open source alternatives (document stores).

It is evident from the rapidly growing popularity of NoSQL stores, as well as the strong demand for Big Data analytics engines today, that there is a strong (and growing!) need to store, process, *and* query large volumes of semi-structured data in many application areas. Until very recently, developers have had to “choose” between using big data analytics engines like Apache Hive or Apache Spark, which can do complex query processing and analysis over HDFS-resident files, and flexible but low-function data stores like MongoDB or Apache HBase. (The Apache Phoenix project, http://phoenix.apache.org/, is a recent SQL-over-HBase effort that aims to bridge between these choices.) AsterixDB is a highly scalable data management system that can store, index, and manage semi-structured data, e.g., much like MongoDB, but it also supports a full-power query language with the