Re: [VOTE] Accept Impala into the Apache Incubator

2015-11-25 Thread Doug Cutting
+1 (binding)

Doug

On Tue, Nov 24, 2015 at 1:03 PM, Henry Robinson  wrote:

> Hi -
>
> The [DISCUSS] thread has been quiet for a few days, so I think there's been
> sufficient opportunity for discussion around our proposal to bring Impala
> to the ASF Incubator.
>
> I'd like to call a VOTE on that proposal, which is on the wiki at
> https://wiki.apache.org/incubator/ImpalaProposal, and which I've pasted
> below.
>
> During the discussion period, the proposal has been amended to add Brock
> Noland as a new mentor, to add one missed committer from the list and to
> correct some issues with the dependency list.
>
> Please cast your votes as follows:
>
> [] +1, accept Impala into the Incubator
> [] +/-0, non-counted vote to express a disposition
> [] -1, do not accept Impala into the Incubator (please give your reason(s))
>
> As with the concurrent Kudu vote, I propose leaving the vote open for a
> full seven days (to close at Tuesday, December 1st at noon PST), due to the
> upcoming US holiday.
>
> Thanks,
> Henry
>
> 
>
> = Abstract =
> Impala is a high-performance C++ and Java SQL query engine for data stored
> in Apache Hadoop-based clusters.
>
> = Proposal =
>
> We propose to contribute the Impala codebase and associated artifacts (e.g.
> documentation, web-site content etc.) to the Apache Software Foundation
> with the intent of forming a productive, meritocratic and open community
> around Impala’s continued development, according to the ‘Apache Way’.
>
> Cloudera owns several trademarks regarding Impala, and proposes to transfer
> ownership of those trademarks in full to the ASF.
>
> = Background =
> Engineers at Cloudera developed Impala and released it as an
> Apache-licensed open-source project in Fall 2012. Impala was written as a
> brand-new, modern C++ SQL engine targeted from the start for data stored in
> Apache Hadoop clusters.
>
> Impala’s most important benefit to users is high performance, making it
> extremely appropriate for common enterprise analytic and business
> intelligence workloads. This is achieved by a number of software
> techniques, including: native support for data stored in HDFS and related
> filesystems, just-in-time compilation and optimization of individual query
> plans, high-performance C++ codebase and massively-parallel distributed
> architecture. In benchmarks, Impala is routinely amongst the very highest
> performing SQL query engines.
>
> = Rationale =
>
> Despite the exciting innovation in the so-called ‘big-data’ space, SQL
> remains by far the most common interface for interacting with data in both
> traditional warehouses and modern ‘big-data’ clusters. There is clearly a
> need, as evidenced by the eager adoption of Impala and other SQL engines in
> enterprise contexts, for a query engine that offers the familiar SQL
> interface, but that has been specifically designed to operate in massive,
> distributed clusters rather than in traditional, fixed-hardware,
> warehouse-specific deployments. Impala is one such query engine.
>
> We believe that the ASF is the right venue to foster an open-source
> community around Impala’s development. We expect that Impala will benefit
> from more productive collaboration with related Apache projects, and under
> the auspices of the ASF will attract talented contributors who will push
> Impala’s development forward at pace.
>
> We believe that the timing is right for Impala’s development to move
> wholesale to the ASF: Impala is well-established, has been Apache-licensed
> open-source for more than three years, and the core project is relatively
> stable. We are excited to see where an ASF-based community can take Impala
> from this strong starting point.
>
> = Initial Goals =
> Our initial goals are as follows:
>
>  * Establish ASF-compatible engineering practices and workflows
>  * Refactor and publish existing internal build scripts and test
> infrastructure, in order to make them usable by any community member.
>  * Transfer source code, documentation and associated artifacts to the ASF.
>  * Grow the user and developer communities
>
> = Current Status =
>
> Impala is developed as an Apache-licensed open-source project. The source
> code is available at http://github.com/cloudera/Impala, and developer
> documentation is at https://github.com/cloudera/Impala/wiki. The majority
> of commits to the project have come from Cloudera-employed developers, but
> we have accepted some contributions from individuals from other
> organizations.
>
> All code reviews are done via a public instance of the Gerrit review tool
> at http://gerrit.cloudera.org:8080/, and discussed on a public mailing
> list. All patches must be reviewed before they are accepted into the
> codebase, via a voting mechanism that is similar to that used on Apache
> projects such as Hadoop and HBase.
>
> Before a patch is committed, it must pass a suite of pre-commit tests.
> These tests are currently run on Cloudera’s internal 

Re: RTC vs CTR (was: Concerning Sentry...)

2015-11-25 Thread Stephen Connolly
Spoilsport ;-)

On 25 November 2015 at 16:47, Upayavira  wrote:

> Not replying to this mail specifically, but to the thread in general...
>
> People keep using the terms RTC and CTR as if we all mean the same
> thing. Please don't. If you must use these terms, please define what you
> mean by them.
>
> CTR is a less ambiguous term - I'd suggest we all assume that "commit"
> means a push to a version control system.
>
> However, RTC seems to mean many things - from "push to JIRA for review
> first, wait a bit, then commit to VCS" through "push to JIRA, and once
> you have sufficient +1 votes, you can commit" to "push to JIRA for a
> review, then another committer must commit it".
>
> If we're gonna debate RTC, can we please describe which of these we are
> talking about (or some other mechanism that I haven't described)?
> Otherwise, we will end up endlessly debating over the top of each other.
>
> Upayavira
>
> On Wed, Nov 25, 2015, at 09:28 AM, Harbs wrote:
> > AIUI, there are two ways to go about RTC which are easier in Git:
> > 1) Working in feature/bug fix branches. Assuming RTC only applies to the
> > main branch, changes are done in separate branches where commits do not
> > require review. The feature/bug fix branch is then only merged back in
> > after it has had a review. This is easier because branching and
> > merging are almost zero effort in Git. Many Git workflows don’t work on
> > the main branch anyway, so this is a particularly good fit for those
> > workflows.
> > 2) Pull requests. Using pull requests, all changes can be pulled in with
> > a single command.
> >
> > I’ve personally never participated in RTC (unless you count Github
> > projects and before I was a committer in Flex), so it could be I’m
> > missing something.
> >
> > Of course there’s nothing to ENFORCE that the commit is not done before a
> > review, but why would you want to do that? That’s where trust comes into
> > play… ;-)
> >
> > Harbs
> >
> > On Nov 25, 2015, at 4:08 AM, Konstantin Boudnik  wrote:
> >
> > > I don't think Git is particularly empowering RTC - there's nothing in it
> > > that requires someone to look over one's shoulder.
> >
>
> -
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
>


Re: [VOTE] Accept Kudu into the Apache Incubator

2015-11-25 Thread Doug Cutting
+1 (binding)

Doug

On Wed, Nov 25, 2015 at 8:45 AM, Chris Douglas  wrote:

> +1 (binding) -C
>
> On Tue, Nov 24, 2015 at 11:32 AM, Todd Lipcon  wrote:
> > Hi all,
> >
> > Discussion on the [DISCUSS] thread seems to have wound down, so I'd like to
> > call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
> > pasted below and also available on the wiki at:
> > https://wiki.apache.org/incubator/KuduProposal
> >
> > The proposal is unchanged since the original version, except for the
> > addition of Carl Steinbach as a Mentor.
> >
> > Please cast your votes:
> >
> > [] +1, accept Kudu into the Incubator
> > [] +/-0, positive/negative non-counted expression of feelings
> > [] -1, do not accept Kudu into the incubator (please state reasoning)
> >
> > Given the US holiday this week, I imagine many folks are traveling or
> > otherwise offline. So, let's run the vote for a full week rather than the
> > traditional 72 hours. Unless the IPMC objects to the extended voting
> > period, the vote will close on Tues, Dec 1st at noon PST.
> >
> > Thanks
> > -Todd
> > -
> >
> > = Kudu Proposal =
> >
> > == Abstract ==
> >
> > Kudu is a distributed columnar storage engine built for the Apache Hadoop
> > ecosystem.
> >
> > == Proposal ==
> >
> > Kudu is an open source storage engine for structured data which supports
> > low-latency random access together with efficient analytical access
> > patterns. Kudu distributes data using horizontal partitioning and
> > replicates each partition using Raft consensus, providing low
> > mean-time-to-recovery and low tail latencies. Kudu is designed within the
> > context of the Apache Hadoop ecosystem and supports many integrations with
> > other data analytics projects both inside and outside of the Apache
> > Software Foundation.
> >
> >
> >
> > We propose to incubate Kudu as a project of the Apache Software Foundation.
> >
> > == Background ==
> >
> > In recent years, explosive growth in the amount of data being generated and
> > captured by enterprises has resulted in the rapid adoption of open source
> > technology which is able to store massive data sets at scale and at low
> > cost. In particular, the Apache Hadoop ecosystem has become a focal point
> > for such “big data” workloads, because many traditional open source
> > database systems have lagged in offering a scalable alternative.
> >
> >
> >
> > Structured storage in the Hadoop ecosystem has typically been achieved in
> > two ways: for static data sets, data is typically stored on Apache HDFS
> > using binary data formats such as Apache Avro or Apache Parquet. However,
> > neither HDFS nor these formats has any provision for updating individual
> > records, or for efficient random access. Mutable data sets are typically
> > stored in semi-structured stores such as Apache HBase or Apache Cassandra.
> > These systems allow for low-latency record-level reads and writes, but lag
> > far behind the static file formats in terms of sequential read throughput
> > for applications such as SQL-based analytics or machine learning.
> >
> >
> >
> > Kudu is a new storage system designed and implemented from the ground up to
> > fill this gap between high-throughput sequential-access storage systems
> > such as HDFS and low-latency random-access systems such as HBase or
> > Cassandra. While these existing systems continue to hold advantages in some
> > situations, Kudu offers a “happy medium” alternative that can dramatically
> > simplify the architecture of many common workloads. In particular, Kudu
> > offers a simple API for row-level inserts, updates, and deletes, while
> > providing table scans at throughputs similar to Parquet, a commonly-used
> > columnar format for static data.
> >
> >
> >
> > More information on Kudu can be found at the existing open source project
> > website: http://getkudu.io and in particular in the Kudu white-paper PDF:
> > http://getkudu.io/kudu.pdf from which the above was excerpted.
> >
> > == Rationale ==
> >
> > As described above, Kudu fills an important gap in the open source storage
> > ecosystem. After our initial open source project release in September 2015,
> > we have seen a great amount of interest across a diverse set of users and
> > companies. We believe that, as a storage system, it is critical to build an
> > equally diverse set of contributors in the development community. Our
> > experiences as committers and PMC members on other Apache projects have
> > taught us the value of diverse communities in ensuring both longevity and
> > high quality for such foundational systems.
> >
> > == Initial Goals ==
> >
> >  * Move the existing codebase, website, documentation, and mailing lists to
> > Apache-hosted infrastructure
> >  * Work with the infrastructure team to implement and approve our code
> > review, build, and testing workflows in the context of the ASF
> >  * Incremental development and 

Re: [VOTE] Accept Kudu into the Apache Incubator

2015-11-25 Thread Chris Douglas
+1 (binding) -C

On Tue, Nov 24, 2015 at 11:32 AM, Todd Lipcon  wrote:
> Hi all,
>
> Discussion on the [DISCUSS] thread seems to have wound down, so I'd like to
> call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
> pasted below and also available on the wiki at:
> https://wiki.apache.org/incubator/KuduProposal
>
> The proposal is unchanged since the original version, except for the
> addition of Carl Steinbach as a Mentor.
>
> Please cast your votes:
>
> [] +1, accept Kudu into the Incubator
> [] +/-0, positive/negative non-counted expression of feelings
> [] -1, do not accept Kudu into the incubator (please state reasoning)
>
> Given the US holiday this week, I imagine many folks are traveling or
> otherwise offline. So, let's run the vote for a full week rather than the
> traditional 72 hours. Unless the IPMC objects to the extended voting
> period, the vote will close on Tues, Dec 1st at noon PST.
>
> Thanks
> -Todd
> -
>
> = Kudu Proposal =
>
> == Abstract ==
>
> Kudu is a distributed columnar storage engine built for the Apache Hadoop
> ecosystem.
>
> == Proposal ==
>
> Kudu is an open source storage engine for structured data which supports
> low-latency random access together with efficient analytical access
> patterns. Kudu distributes data using horizontal partitioning and
> replicates each partition using Raft consensus, providing low
> mean-time-to-recovery and low tail latencies. Kudu is designed within the
> context of the Apache Hadoop ecosystem and supports many integrations with
> other data analytics projects both inside and outside of the Apache
> Software Foundation.
>
>
>
> We propose to incubate Kudu as a project of the Apache Software Foundation.
>
> == Background ==
>
> In recent years, explosive growth in the amount of data being generated and
> captured by enterprises has resulted in the rapid adoption of open source
> technology which is able to store massive data sets at scale and at low
> cost. In particular, the Apache Hadoop ecosystem has become a focal point
> for such “big data” workloads, because many traditional open source
> database systems have lagged in offering a scalable alternative.
>
>
>
> Structured storage in the Hadoop ecosystem has typically been achieved in
> two ways: for static data sets, data is typically stored on Apache HDFS
> using binary data formats such as Apache Avro or Apache Parquet. However,
> neither HDFS nor these formats has any provision for updating individual
> records, or for efficient random access. Mutable data sets are typically
> stored in semi-structured stores such as Apache HBase or Apache Cassandra.
> These systems allow for low-latency record-level reads and writes, but lag
> far behind the static file formats in terms of sequential read throughput
> for applications such as SQL-based analytics or machine learning.
>
>
>
> Kudu is a new storage system designed and implemented from the ground up to
> fill this gap between high-throughput sequential-access storage systems
> such as HDFS and low-latency random-access systems such as HBase or
> Cassandra. While these existing systems continue to hold advantages in some
> situations, Kudu offers a “happy medium” alternative that can dramatically
> simplify the architecture of many common workloads. In particular, Kudu
> offers a simple API for row-level inserts, updates, and deletes, while
> providing table scans at throughputs similar to Parquet, a commonly-used
> columnar format for static data.
>
>
>
> More information on Kudu can be found at the existing open source project
> website: http://getkudu.io and in particular in the Kudu white-paper PDF:
> http://getkudu.io/kudu.pdf from which the above was excerpted.
>
> == Rationale ==
>
> As described above, Kudu fills an important gap in the open source storage
> ecosystem. After our initial open source project release in September 2015,
> we have seen a great amount of interest across a diverse set of users and
> companies. We believe that, as a storage system, it is critical to build an
> equally diverse set of contributors in the development community. Our
> experiences as committers and PMC members on other Apache projects have
> taught us the value of diverse communities in ensuring both longevity and
> high quality for such foundational systems.
>
> == Initial Goals ==
>
>  * Move the existing codebase, website, documentation, and mailing lists to
> Apache-hosted infrastructure
>  * Work with the infrastructure team to implement and approve our code
> review, build, and testing workflows in the context of the ASF
>  * Incremental development and releases per Apache guidelines
>
> == Current Status ==
>
> === Releases ===
>
> Kudu has undergone one public release, tagged here
> https://github.com/cloudera/kudu/tree/kudu0.5.0-release
>
> This initial release was not performed in the typical ASF fashion -- no
> source tarball was released, but rather only convenience binaries made
> 

Re: [VOTE] Accept Impala into the Apache Incubator

2015-11-25 Thread Chris Douglas
+1 (binding) -C

On Tue, Nov 24, 2015 at 1:03 PM, Henry Robinson  wrote:
> Hi -
>
> The [DISCUSS] thread has been quiet for a few days, so I think there's been
> sufficient opportunity for discussion around our proposal to bring Impala
> to the ASF Incubator.
>
> I'd like to call a VOTE on that proposal, which is on the wiki at
> https://wiki.apache.org/incubator/ImpalaProposal, and which I've pasted
> below.
>
> During the discussion period, the proposal has been amended to add Brock
> Noland as a new mentor, to add one missed committer from the list and to
> correct some issues with the dependency list.
>
> Please cast your votes as follows:
>
> [] +1, accept Impala into the Incubator
> [] +/-0, non-counted vote to express a disposition
> [] -1, do not accept Impala into the Incubator (please give your reason(s))
>
> As with the concurrent Kudu vote, I propose leaving the vote open for a
> full seven days (to close at Tuesday, December 1st at noon PST), due to the
> upcoming US holiday.
>
> Thanks,
> Henry
>
> 
>
> = Abstract =
> Impala is a high-performance C++ and Java SQL query engine for data stored
> in Apache Hadoop-based clusters.
>
> = Proposal =
>
> We propose to contribute the Impala codebase and associated artifacts (e.g.
> documentation, web-site content etc.) to the Apache Software Foundation
> with the intent of forming a productive, meritocratic and open community
> around Impala’s continued development, according to the ‘Apache Way’.
>
> Cloudera owns several trademarks regarding Impala, and proposes to transfer
> ownership of those trademarks in full to the ASF.
>
> = Background =
> Engineers at Cloudera developed Impala and released it as an
> Apache-licensed open-source project in Fall 2012. Impala was written as a
> brand-new, modern C++ SQL engine targeted from the start for data stored in
> Apache Hadoop clusters.
>
> Impala’s most important benefit to users is high performance, making it
> extremely appropriate for common enterprise analytic and business
> intelligence workloads. This is achieved by a number of software
> techniques, including: native support for data stored in HDFS and related
> filesystems, just-in-time compilation and optimization of individual query
> plans, high-performance C++ codebase and massively-parallel distributed
> architecture. In benchmarks, Impala is routinely amongst the very highest
> performing SQL query engines.
>
> = Rationale =
>
> Despite the exciting innovation in the so-called ‘big-data’ space, SQL
> remains by far the most common interface for interacting with data in both
> traditional warehouses and modern ‘big-data’ clusters. There is clearly a
> need, as evidenced by the eager adoption of Impala and other SQL engines in
> enterprise contexts, for a query engine that offers the familiar SQL
> interface, but that has been specifically designed to operate in massive,
> distributed clusters rather than in traditional, fixed-hardware,
> warehouse-specific deployments. Impala is one such query engine.
>
> We believe that the ASF is the right venue to foster an open-source
> community around Impala’s development. We expect that Impala will benefit
> from more productive collaboration with related Apache projects, and under
> the auspices of the ASF will attract talented contributors who will push
> Impala’s development forward at pace.
>
> We believe that the timing is right for Impala’s development to move
> wholesale to the ASF: Impala is well-established, has been Apache-licensed
> open-source for more than three years, and the core project is relatively
> stable. We are excited to see where an ASF-based community can take Impala
> from this strong starting point.
>
> = Initial Goals =
> Our initial goals are as follows:
>
>  * Establish ASF-compatible engineering practices and workflows
>  * Refactor and publish existing internal build scripts and test
> infrastructure, in order to make them usable by any community member.
>  * Transfer source code, documentation and associated artifacts to the ASF.
>  * Grow the user and developer communities
>
> = Current Status =
>
> Impala is developed as an Apache-licensed open-source project. The source
> code is available at http://github.com/cloudera/Impala, and developer
> documentation is at https://github.com/cloudera/Impala/wiki. The majority
> of commits to the project have come from Cloudera-employed developers, but
> we have accepted some contributions from individuals from other
> organizations.
>
> All code reviews are done via a public instance of the Gerrit review tool
> at http://gerrit.cloudera.org:8080/, and discussed on a public mailing
> list. All patches must be reviewed before they are accepted into the
> codebase, via a voting mechanism that is similar to that used on Apache
> projects such as Hadoop and HBase.
>
> Before a patch is committed, it must pass a suite of pre-commit tests.
> These tests are currently run on Cloudera’s internal 

Re: RTC vs CTR (was: Concerning Sentry...)

2015-11-25 Thread Upayavira
Not replying to this mail specifically, but to the thread in general...

People keep using the terms RTC and CTR as if we all mean the same
thing. Please don't. If you must use these terms, please define what you
mean by them.

CTR is a less ambiguous term - I'd suggest we all assume that "commit"
means a push to a version control system.

However, RTC seems to mean many things - from "push to JIRA for review
first, wait a bit, then commit to VCS" through "push to JIRA, and once
you have sufficient +1 votes, you can commit" to "push to JIRA for a
review, then another committer must commit it".

If we're gonna debate RTC, can we please describe which of these we are
talking about (or some other mechanism that I haven't described)?
Otherwise, we will end up endlessly debating over the top of each other.
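
To make the distinction concrete, here is a minimal sketch that puts the three
readings of RTC listed above next to CTR. Everything in it is an assumption made
for illustration: the names `Policy`, `Change` and `may_commit`, the 72-hour
wait, and the single +1 threshold are invented here and do not describe any
project's actual rules.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Policy(Enum):
    """Hypothetical labels for the workflows named in this thread."""
    CTR = auto()                  # commit to the VCS first, review afterwards
    RTC_WAIT = auto()             # post for review, wait a bit, then commit
    RTC_VOTES = auto()            # post for review, commit once enough +1s arrive
    RTC_OTHER_COMMITTER = auto()  # post for review, another committer commits it


@dataclass
class Change:
    author: str
    committer: str             # the person who would push it to the VCS
    hours_posted: float = 0.0  # how long the patch has been up for review
    plus_ones: int = 0         # +1 votes received so far


def may_commit(policy: Policy, change: Change,
               wait_hours: float = 72.0, votes_needed: int = 1) -> bool:
    """Return True if `change` may be pushed to the VCS under `policy`.

    `wait_hours` and `votes_needed` are illustrative defaults only.
    """
    if policy is Policy.CTR:
        return True
    if policy is Policy.RTC_WAIT:
        return change.hours_posted >= wait_hours
    if policy is Policy.RTC_VOTES:
        return change.plus_ones >= votes_needed
    if policy is Policy.RTC_OTHER_COMMITTER:
        return change.committer != change.author
    raise ValueError(f"unknown policy: {policy}")


# A brand-new change that its own author wants to push right away.
fresh = Change(author="alice", committer="alice")
for policy in Policy:
    print(policy.name, may_commit(policy, fresh))
```

Under these assumptions only CTR lets the fresh, unreviewed change through, and
each RTC reading blocks it for a different reason, which is why it helps to say
which one is meant.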

Upayavira

On Wed, Nov 25, 2015, at 09:28 AM, Harbs wrote:
> AIUI, there are two ways to go about RTC which are easier in Git:
> 1) Working in feature/bug fix branches. Assuming RTC only applies to the
> main branch, changes are done in separate branches where commits do not
> require review. The feature/bug fix branch is then only merged back in
> after it has had a review. This is easier because branching and
> merging are almost zero effort in Git. Many Git workflows don’t work on
> the main branch anyway, so this is a particularly good fit for those
> workflows.
> 2) Pull requests. Using pull requests, all changes can be pulled in with
> a single command.
> 
> I’ve personally never participated in RTC (unless you count Github
> projects and before I was a committer in Flex), so it could be I’m
> missing something.
> 
> Of course there’s nothing to ENFORCE that the commit is not done before a
> review, but why would you want to do that? That’s where trust comes into
> play… ;-)
> 
> Harbs
> 
> On Nov 25, 2015, at 4:08 AM, Konstantin Boudnik  wrote:
> 
> > I don't think Git is particularly empowering RTC - there's nothing in it
> > that requires someone to look over one's shoulder.
> 




Re: RTC vs CTR (was: Concerning Sentry...)

2015-11-25 Thread Harbs
Very good point, but I’m not sure that CTR is that much less ambiguous.

It would be interesting to compare the models that users consider CTR with
those they consider RTC. I have a feeling there is some overlap of “CTR” and “RTC”.

I’m pretty sure that a lot of folks call some CTR cases “RTC”. It’s pretty hard 
to review changes which are not in a source control system some way or another. 
Attaching a patch to a JIRA is a pretty clunky way of going about that.

In particular, I’m interested in knowing how much “R” prior to “C” people have 
trouble with (Greg specifically as he seems to be the most vocal in his 
opposition). What workflows do “CTR” proponents like to use?

Thanks,
Harbs

On Nov 25, 2015, at 6:47 PM, Upayavira  wrote:

> Not replying to this mail specifically, but to the thread in general...
> 
> People keep using the terms RTC and CTR as if we all mean the same
> thing. Please don't. If you must use these terms, please define what you
> mean by them.
> 
> CTR is a less ambiguous term - I'd suggest we all assume that "commit"
> means a push to a version control system.
> 
> However, RTC seems to mean many things - from "push to JIRA for review
> first, wait a bit, then commit to VCS" through "push to JIRA, and once
> you have sufficient +1 votes, you can commit" to "push to JIRA for a
> review, then another committer must commit it".
> 
> If we're gonna debate RTC, can we please describe which of these we are
> talking about (or some other mechanism that I haven't described)?
> Otherwise, we will end up endlessly debating over the top of each other.
> 
> Upayavira
> 
> On Wed, Nov 25, 2015, at 09:28 AM, Harbs wrote:
>> AIUI, there are two ways to go about RTC which are easier in Git:
>> 1) Working in feature/bug fix branches. Assuming RTC only applies to the
>> main branch, changes are done in separate branches where commits do not
>> require review. The feature/bug fix branch is then only merged back in
>> after it has had a review. This is easier because branching and
>> merging are almost zero effort in Git. Many Git workflows don’t work on
>> the main branch anyway, so this is a particularly good fit for those
>> workflows.
>> 2) Pull requests. Using pull requests, all changes can be pulled in with
>> a single command.
>> 
>> I’ve personally never participated in RTC (unless you count Github
>> projects and before I was a committer in Flex), so it could be I’m
>> missing something.
>> 
>> Of course there’s nothing to ENFORCE that the commit is not done before a
>> review, but why would you want to do that? That’s where trust comes into
>> play… ;-)
>> 
>> Harbs
>> 
>> On Nov 25, 2015, at 4:08 AM, Konstantin Boudnik  wrote:
>> 
>>> I don't think Git is particularly empowering RTC - there's nothing in it
>>> that requires someone to look over one's shoulder.
>> 
> 
> 





Re: RTC vs CTR (was: Concerning Sentry...)

2015-11-25 Thread Sam Ruby
On Wed, Nov 25, 2015 at 5:18 PM, Greg Stein  wrote:
> On Wed, Nov 25, 2015 at 4:02 PM, Sam Ruby  wrote:
>
>> On Wed, Nov 25, 2015 at 4:51 PM, Greg Stein  wrote:
>> >
>> > Don't shut down trunk/master for product development.
>>
>> I don't believe you heard my point, but I'm not going to repeat it.
>
> I read your post several times, completely :-P ... I just think it didn't
> argue against RTC being a form of control. (and yeah, maybe you weren't
> trying to argue that?)

I don't believe that RTC is a form of control over others.  I believe
that RTC is a mechanism to ensure that every change is adequately
reviewed.

>> Instead I will add a new point.
>>
>> 'trunk/master for product development' is not the only development
>> model available to a project.  As an example, I've seen models where
>> 'trunk/master is for product maintenance', and all development occurs
>> in a branch explicitly designated as where work on the next release is
>> to occur.
>>
>
> I think that is just playing with names. In Apache Subversion the "product
> maintenance" is branches/1.8.x and branches/1.9.x (1.7.x and prior are
> deprecated). trunk is for "next release".
>
> In your naming model, where we've seen the name "develop" for "next
> release" (aka where all new dev occurs), then I'd say making it RTC is
> harmful.
>
> trunk/master was shorthand for "where dev occurs". If you want to use a
> different name... okay. :-)

I don't believe it is just playing with names.  There are projects in
which all non-trivial development occurs in feature branches.

> Cheers,
> -g
>
> ps. fwiw, trunk/tags/branches isn't mandated in svn either. It was just an
> ad hoc template we came up with back near the start of the project. We
> assumed third-party tools would focus around that naming, which is
> generally true, but svn itself has never cared.

Ack.

- Sam Ruby




Re: Adopting non-ASF AL projects (was Re: [DISCUSS] Kudu incubator proposal)

2015-11-25 Thread John D. Ament
If we use Groovy as an example, a single contributor provided an SGA and
signed it himself. No other contributors signed the SGA.

On Wed, Nov 25, 2015 at 1:01 PM Alex Harui  wrote:

> Renaming thread since my question doesn't have anything to do with Kudu.
>
> I'm trying to resolve Greg's "opt-out" response, vs Roy's "blessing of the
> original authors" in the link to the archives Owen posted.  I've always
> assumed that the "blessing..." part meant that any non-ASF code base, even
> ones under AL, had to come in with an SGA signed by ALL of the original
> copyright holders.
>
> Specifically, there are two code bases under AL where the major
> contributors have indicated that they would like our project to take over
> change-control.  These donations have been held up by trying to chase down
> all of the folks who made smaller contributions and getting them to sign
> an SGA.  There really isn't any community around these code bases right
> now, but our project is interested in them because under ASF practices,
> they can at least get occasional attention without the major contributors
> having to be involved.
>
> Is an SGA needed?  If not, is there a recommended practice for providing
> notification such that folks who want to opt out can find out that
> change-control for the code base is moving to the ASF?
>
> Thanks,
> -Alex
>
> On 11/24/15, 8:01 PM, "Owen O'Malley"  wrote:
>
> >On Tue, Nov 24, 2015 at 7:39 PM, Greg Stein  wrote:
> >
> >> On Mon, Nov 23, 2015 at 12:46 PM, Alex Harui  wrote:
> >>
> >> > On 11/23/15, 8:23 AM, "Mattmann, Chris A (3980)"
> >> >  wrote:
> >> >
> >> > >Alex,
> >> > >
> >> > >Please re-read my email. As I stated we don’t take code that
> >> > >authors don’t want us to have. So far, we haven’t heard from any of
> >> > >the authors on the incoming Kudu project that that’s the case. If
> >> > >it’s not the case, we go by the license of the project which stipulates
> >> > >how code can be copied, modified, reused, etc.
> >> >
> >> > Yes, but my interpretation of your words is that folks have to opt out,
> >> >
> >>
> >> Correct: opt-out.
> >>
> >> Since this code is under ALv2, we can import it to the ASF under that
> >> license. We have always done stuff like this, including other permissive
> >> licenses.
> >>
> >> But this isn't simply importing a library, this is saying "the ASF is now
> >> the primary locus of development for >this< code." And that's where people
> >> can say, "woah. I hate you guys. don't develop my code there", and so we
> >> nuke it.
> >>
> >> SGA/iCLA is to give us rights that we otherwise wouldn't have (ie. the code
> >> was under a different license).
> >>
> >
> >It is worth looking back at the thread on Bloodhound
> ><http://mail-archives.apache.org/mod_mbox/incubator-general/201201.mbox/%3C0F2EA54E-4419-428F-A604-46EF59C40469%40gbiv.com%3E>
> >.
> >
> >The important thing is that Apache doesn't fork communities. In this case,
> >the community wants to move to Apache. That is great and should be
> >allowed.
> >They shouldn't need to get an explicit permission from each contributor
> >over the years.
> >
> >.. Owen
>
>


Re: [VOTE] Apache Wave Release 0.4.0-incubating (RC10)

2015-11-25 Thread Ali Lown
Bump. I could still do with another vote from an IPMC member.

Thanks,
Ali

On 12 November 2015 at 16:02, Ali Lown  wrote:
> Hi all,
>
> I could still do with someone else taking a look at these artifacts.
>
> Currently there is one +1 vote from Justin Mclean, and one +1 vote
> from Christian Grobmeier that can be forwarded from the wave-dev list.
>
> My understanding is that I require at least 3 binding +1's from IPMC.
>
> Thanks,
> Ali
>
> On 3 November 2015 at 16:09, Ali Lown  wrote:
>> Hi,
>>
>> Wave has just completed its first passing vote, and would like to make
>> its first release since joining the incubator in 2010.
>>
>> The Wave PPMC has voted in favour with 6 binding votes, and 4
>> non-binding votes from the community.
>>
>> PPMC vote call:
>> https://mail-archives.apache.org/mod_mbox/incubator-wave-dev/201510.mbox/%3CCABRGrVdhdhhdRMwJ9jsWxzqMX9ijzgG%3Di5BF1YJBAZ2juYFORg%40mail.gmail.com%3E
>>
>> PPMC vote result (second result after a slow start for the first attempt):
>> https://mail-archives.apache.org/mod_mbox/incubator-wave-dev/201511.mbox/%3CCABRGrVenajwQPBw98Zy99UqrMNNQMJ-X5tGcWWAMR3xNX%2Bnu7w%40mail.gmail.com%3E
>>
>> Staging directory:
>> https://dist.apache.org/repos/dist/dev/incubator/wave/0.4.0-rc10/
>>
>> Please note that
>> - checksums are generated from 'gpg --print-md SHA512 $f > $f.sha'
>> - .zip versions are generated from the .tar.bz2 version by repacking
>> - binary artifacts are provided for convenience of the end-user, the
>> vote is primarily regarding the source artifacts
>>
>> Git branch:
>> https://git-wip-us.apache.org/repos/asf?p=incubator-wave.git;a=shortlog;h=refs/heads/0.4.0-rc10
>>
>> KEYS file:
>> https://dist.apache.org/repos/dist/dev/incubator/wave/0.4.0-rc10/KEYS
>>
>> Please verify the release and vote. This vote will close after 72
>> hours on 6th November.
>>
>> [] +1: I approve
>> [] -1: I disapprove because...
>>
>> Please add (binding) if your vote is binding.
>>
>> Thanks,
>> Ali
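
For anyone checking the artifacts, the checksum note in the vote call could be verified along these lines. This is a minimal sketch, not the project's own tooling: it assumes each `.sha` file holds the output of `gpg --print-md SHA512`, i.e. an optional `filename:` prefix followed by hex groups separated by whitespace, and the helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def normalize_gpg_digest(text: str) -> str:
    """Reduce `gpg --print-md` style output to a bare lowercase hex string."""
    # Drop the "filename:" prefix if present (assumed format; a file name
    # that itself contains ':' would need more careful handling).
    if ":" in text:
        text = text.split(":", 1)[1]
    # Remove the spaces and newlines placed between the hex groups.
    return "".join(text.split()).lower()


def sha512_matches(artifact: Path, sha_file: Path) -> bool:
    """Return True if the artifact's SHA-512 equals the digest in sha_file."""
    digest = hashlib.sha512()
    with artifact.open("rb") as fh:
        # Read in 1 MiB chunks so large release archives fit in memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    expected = normalize_gpg_digest(sha_file.read_text())
    return digest.hexdigest() == expected
```

A checksum match only shows the download is intact; the signatures still need to be checked against the KEYS file.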




Re: RTC vs CTR (was: Concerning Sentry...)

2015-11-25 Thread Konstantin Boudnik
On Wed, Nov 25, 2015 at 05:12PM, Chris Douglas wrote:
> RTC is regulation. That's not a synonym for control when it's
> conflated with suspicion of people. Regulation is a set of deliberate
> checks on a system.
> 
> Good regulation estimates (or reacts to) a system's natural excesses,
> then attempts to constrain existential threats. It isn't a lack of
> trust, but how trust is scaled. RTC can encourage review (where

And that goes, as always, to the question "Who makes the decision about the
_right_ level of trust". And, of course, how to make sure the governing body
is corruption-proof. In other words: there is no such thing as 'good
externalized regulation' because sooner or later it gets abused one way or
another. And I dare you to prove me wrong on this ;)

> oversight might be weak), throttle the pace of change (where sheer
> volume might discourage volunteers and exclude individuals), and
> identify code with a discouraging "bus factor" for attention or
> removal (where an isolated contributor can't solicit any feedback).
> Todd, Steve, Andrew, and others already covered other, intended
> desiderata.
> 
> Bad regulation erroneously classifies the power structure as part of
> the system, and threats to powerful people as existential threats to
> the system. It preserves privilege at the expense of individual
> initiative. RTC can mire committers in review, throttle the pace of
> change artificially, and entrench project members behind an inertial
> default. These unintended consequences create new existential threats
> to a project, which either require subsequent regulation/monitoring or
> they prove RTC to be worse than the diseases it remedied.[1]

Supposedly, regulations are introduced as a reaction to failures, if I read
what you're saying correctly. Empirically, the majority of the failures in
self-regulating systems are a result of ill-conceived interventions from the
last time. And we are going into a self-perpetuating cycle where a bad idea
leads to an even worse one and so on, until the system grinds to a halt or
collapses.

And you're right - there are always unintended consequences to artificial
limitations of any sort. One cannot create a perfect set of fixed rules to
address all possible future permutations. After all, this is exactly how
complex dynamic systems work.

In this respect, it is wiser to let the system find its equilibrium by
letting it go, and make small, localized tweaks when/if they are needed. In our
case, CTR relies on actors' best judgement with postponed negative feedback if
something goes wrong or is deemed incorrect. Such systems prove to be the most
effective when compared with the rigidity of an N-page guidelines document for
every step along the way.

Cos

> In practice, RTC does all these simultaneously, and the community is
> responsible for ensuring the implementation is effective, efficient,
> and just. That balance isn't static, either. One chooses RTC not
> because the code has some property (complexity, size, etc.), but
> because the community does, at the time.
> 
> All that said: many, maybe most projects entering incubation should
> try CTR, and adopt RTC if there's some concrete reason that justifies
> added governance. If the culture requests reviews, enforces tests/CI,
> members can keep up with changes, etc. then most probably won't bother
> with RTC. If the project already has an RTC culture and they want to
> keep it, we've seen that work, too. -C
> 
> 
> [1] RTC/CTR isn't the last policy choice the project makes, either.
> Allowing feature branches to work as CTR (complemented by branch
> committers) can dampen the shortcomings of enforcing RTC on
> trunk/release branches. Policies allowing non-code changes, etc. have
> been mentioned elsewhere in the thread.
> 
> 
> On Wed, Nov 25, 2015 at 12:39 PM, Greg Stein  wrote:
> > Boo hoo. Todd said it wasn't about control, and then a few days later said
> > he was forcing people into doing reviews. So yeah: in his case, it *is*
> > about control.
> >
> > Over the 17 years I've been around Apache, every single time I've seen
> > somebody attempt to justify something like RTC, it always goes back to
> > control. Always.
> >
> > -g
> >
> >
> > On Wed, Nov 25, 2015 at 2:35 PM, Andrew Purtell 
> > wrote:
> >
> >> I have to completely disagree and find your assertion vaguely offensive.
> >>
> >> > On Nov 25, 2015, at 12:32 PM, Greg Stein  wrote:
> >> >
> >> > On Wed, Nov 25, 2015 at 12:44 PM, Andrew Purtell 
> >> > wrote:
> >> >> ...
> >> >>
> >> >> and inherited the RTC ethic from our parent community. I did recently test
> >> >> the state of consensus on RTC vs CTR there and it still holds. I think this
> >> >> model makes sense for HBase, which is a mature (read: complex) code base
> >> >> that implements a distributed database. For sure we want multiple sets of
> >> >>
> >> >
> >> > I call bullshit. "complex" 

Re: RTC vs CTR (was: Concerning Sentry...)

2015-11-25 Thread Ralph Goers

> On Nov 25, 2015, at 3:22 PM, Todd Lipcon  wrote:
> 
> Isn't it an issue of scalability? With pre-commit code reviews, typically
> the uploader of the code will pick out one or two people to review the code
> who know the area well. Or, if no one is picked by the submitter of the
> patch, the committers will organically end up deciding who is to review the
> code, at which point that reviewer ends up being a sort of shepherd for the
> patch, sticking with the contributor through multiple revs until it's ready
> for commit.
> 
> With post-commit review, do you expect to watch the mailing list and review
> every patch that comes in? In a project like Hadoop, that's not feasible --
> we've had ~35,000 lines of code changed in the last month in 267 patches.
> If everyone tries to review every patch post-commit, you end up with n^2
> work as the community grows.
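
A rough sketch of the arithmetic behind the n^2 remark quoted above. The 267 patches per month come from the quoted message; the committer counts, the assumption that patch volume grows in proportion to the number of committers, and the function names are illustrative assumptions only.

```python
def reviews_if_everyone_reviews(patches: int, committers: int) -> int:
    """Every committer except the author reads every patch."""
    return patches * (committers - 1)


def reviews_if_shepherded(patches: int, reviewers_per_patch: int = 1) -> int:
    """Each patch gets a fixed, small number of reviewers."""
    return patches * reviewers_per_patch


def monthly_reviews(committers: int, patches_per_committer: int) -> int:
    """Assumption: patch volume scales linearly with the number of committers."""
    patches = committers * patches_per_committer
    return reviews_if_everyone_reviews(patches, committers)


if __name__ == "__main__":
    # 267 patches is the figure from the message; 50 committers is hypothetical.
    print(reviews_if_everyone_reviews(267, 50))
    print(reviews_if_shepherded(267))
    # Doubling the committers roughly quadruples the review-everything load.
    for n in (10, 20, 40):
        print(n, monthly_reviews(n, patches_per_committer=5))
```

Under those assumptions the review-everything count grows with n(n-1), while the shepherded count grows linearly with the number of patches.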

Maven is a large project with a decent number of committers. People naturally 
pick their areas and review the code in their areas of interest because it is 
important to them. Maven also has a large suite of integration tests for the 
core and a group of people who are interested in that. Each of the Maven 
plugins has a group of people who gravitate towards them. No one really has to 
assign anything.

With Log4j we have one individual who commits a lot but most of his commits are 
to change variable names, fix javadoc bugs, and other general code cleanup 
tasks.  I will look at the commit log message and won’t review those directly 
because I know that is what he is doing. But when he makes a code change with a 
Jira issue tag I will do my best to review that. That won’t be reflected on the 
dev list because I only comment when I find something that needs to be updated.

The major difference I see between the CTR projects and the RTC projects is 
that the RTC projects mandate that everything has to go through the 
review-then-commit process.  CTR projects a) allow portions of the project to 
be RTC or b) allow committers to choose to use RTC for specific commits.  It is 
almost like we are dealing with the difference between the GPL, where the code 
is supposedly free, and non-copyleft licenses where the user is free. Both can 
get the job done but both come with a different set of benefits and costs. Much 
as I don’t care to participate in GPL projects I also don’t care to participate 
in pure RTC projects as both restrict me in ways I very much dislike.

Ralph







Re: RTC vs CTR (was: Concerning Sentry...)

2015-11-25 Thread Todd Lipcon
On Wed, Nov 25, 2015 at 7:06 PM, Ralph Goers 
wrote:


> Much as I don’t care to participate in GPL projects I also don’t care to
> participate in pure RTC projects as both restrict me in ways I very much
> dislike.
>
>
You're entitled to that opinion. I personally don't care to participate in
CTR projects. So, as stated above, let's agree to disagree, and let each
community within Apache decide for themselves.

-Todd


Re: RTC vs CTR (was: Concerning Sentry...)

2015-11-25 Thread Chris Douglas
RTC is regulation. That's not a synonym for control when it's
conflated with suspicion of people. Regulation is a set of deliberate
checks on a system.

Good regulation estimates (or reacts to) a system's natural excesses,
then attempts to constrain existential threats. It isn't a lack of
trust, but how trust is scaled. RTC can encourage review (where
oversight might be weak), throttle the pace of change (where sheer
volume might discourage volunteers and exclude individuals), and
identify code with a discouraging "bus factor" for attention or
removal (where an isolated contributor can't solicit any feedback).
Todd, Steve, Andrew, and others already covered other, intended
desiderata.

Bad regulation erroneously classifies the power structure as part of
the system, and threats to powerful people as existential threats to
the system. It preserves privilege at the expense of individual
initiative. RTC can mire committers in review, throttle the pace of
change artificially, and entrench project members behind an inertial
default. These unintended consequences create new existential threats
to a project, which either require subsequent regulation/monitoring or
they prove RTC to be worse than the diseases it remedied.[1]

In practice, RTC does all these simultaneously, and the community is
responsible for ensuring the implementation is effective, efficient,
and just. That balance isn't static, either. One chooses RTC not
because the code has some property (complexity, size, etc.), but
because the community does, at the time.

All that said: many, maybe most projects entering incubation should
try CTR, and adopt RTC if there's some concrete reason that justifies
added governance. If the culture requests reviews, enforces tests/CI,
members can keep up with changes, etc. then most probably won't bother
with RTC. If the project already has an RTC culture and they want to
keep it, we've seen that work, too. -C


[1] RTC/CTR isn't the last policy choice the project makes, either.
Allowing feature branches to work as CTR (complemented by branch
committers) can dampen the shortcomings of enforcing RTC on
trunk/release branches. Policies allowing non-code changes, etc. have
been mentioned elsewhere in the thread.


On Wed, Nov 25, 2015 at 12:39 PM, Greg Stein  wrote:
> Boo hoo. Todd said it wasn't about control, and then a few days later said
> he was forcing people into doing reviews. So yeah: in his case, it *is*
> about control.
>
> Over the 17 years I've been around Apache, every single time I've seen
> somebody attempt to justify something like RTC, it always goes back to
> control. Always.
>
> -g
>
>
> On Wed, Nov 25, 2015 at 2:35 PM, Andrew Purtell 
> wrote:
>
>> I have to completely disagree and find your assertion vaguely offensive.
>>
>> > On Nov 25, 2015, at 12:32 PM, Greg Stein  wrote:
>> >
>> > On Wed, Nov 25, 2015 at 12:44 PM, Andrew Purtell 
>> > wrote:
>> >> ...
>> >>
>> >> and inherited the RTC ethic from our parent community. I did recently test
>> >> the state of consensus on RTC vs CTR there and it still holds. I think this
>> >> model makes sense for HBase, which is a mature (read: complex) code base
>> >> that implements a distributed database. For sure we want multiple sets of
>> >>
>> >
>> > I call bullshit. "complex" my ass. I've said it before: all software is
>> > complex, and yours is no more complex than another. That is NOT a rationale
>> > for installing RTC. It is an excuse for maintaining undue control.
>> >
>> > -g
>>



Re: RTC vs CTR (was: Concerning Sentry...)

2015-11-25 Thread Sam Ruby
On Wed, Nov 25, 2015 at 9:13 PM, Konstantin Boudnik  wrote:
>
> And that goes, as always, to the question "Who makes the decision about the
> _right_ level of trust".

The community.

- Sam Ruby




Re: RTC vs CTR (was: Concerning Sentry...)

2015-11-25 Thread Andrew Purtell
Most of the Hadoop ecosystem uses RTC. I can't speak to other projects but
on the one I chair there's no conspiracy to exclude anyone.

I chair Bigtop. We recently tested a switch to CTR. It went very well and
so we just wrapped up a vote to make it the permanent state of affairs. I
think this is the best option for Bigtop, which has a small but active
group of committers each working in loosely coupled ways on different parts
of the tree. I also chair HBase. We were spun out of Hadoop direct to TLP
and inherited the RTC ethic from our parent community. I did recently test
the state of consensus on RTC vs CTR there and it still holds. I think this
model makes sense for HBase, which is a mature (read: complex) code base
that implements a distributed database. For sure we want multiple sets of
eyes on changes there. They can have unexpected consequences. Almost above
all, we want to do due diligence on not introducing bugs that lose user
data.

So, to each their own? Please.




On Sun, Nov 22, 2015 at 2:05 PM, Ralph Goers 
wrote:

> Yes, it would be good to take a survey.  Interestingly, I wasn’t aware
> that ANY Apache projects used RTC until I became involved with a project in
> the Hadoop ecosystem, which seems to align with Todd’s statement since all
> the projects he is listed as being involved in are part of that.  In fact,
> when I was mentoring the project I am familiar with I asked during
> incubation why they wanted to use RTC and was told that it was because that
> is the way all Hadoop related projects worked. Since most of the committers
> were paid to work on the project by their employer I also got the feeling
> that it aligned with that.
>
> Ralph
>
> > On Nov 22, 2015, at 1:18 PM, Konstantin Boudnik  wrote:
> >
> > On Tue, Nov 17, 2015 at 11:12PM, Todd Lipcon wrote:
> >> On Tue, Nov 17, 2015 at 10:48 PM, Emmanuel Lécharny <elecha...@gmail.com>
> >> wrote:
> >>>
> >>>> Except that there seems to be great disagreement among the Members as to
> >>>> whether RTC is somehow anti-Apache-Way.
> >>>>
> >>>> If you want to try to create an ASF-wide resolution that RTC doesn't follow
> >>>> the Apache Way, and get the board/membership to vote on it, go ahead, but
> >>>> it confuses podlings who are new to the ASF when people espouse personal
> >>>> opinions as if they are ASF rules.
> >>>
> >>> That is not the point.
> >>>
> >>>
> >>> The question is not to decide if C-T-R is The Apache Way over R-T-C. The
> >>> question is whether a project entering incubation with a selected R-T-C
> >>> mode is likely to exit incubation for the simple reason it will be very
> >>> hard for this project to grow its community due to this choice. It's
> >>> like starting a 100m race with a 20kg backpack on your shoulders...
> >>>
> >>
> >> If you have any statistics that show this to be the case, I'd be very
> >> interested. RTC is the norm in basically every Apache project I've been a
> >> part of, many of which have thriving communities and are generally regarded
> >> as successful software projects.
> >
> > Do you have any statistics on that, Todd? Would be very interesting to see,
> > indeed.
> >
>
>
>


-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)


[ANNOUNCE] CFP open for ApacheCon North America 2016

2015-11-25 Thread Rich Bowen
Community growth starts by talking with those interested in your
project. ApacheCon North America is coming, are you?

We are delighted to announce that the Call For Presentations (CFP) is
now open for ApacheCon North America. You can submit your proposed
sessions at
http://events.linuxfoundation.org/events/apache-big-data-north-america/program/cfp
for big data talks and
http://events.linuxfoundation.org/events/apachecon-north-america/program/cfp
for all other topics.

ApacheCon North America will be held in Vancouver, Canada, May 9-13th
2016. ApacheCon has been running every year since 2000, and is the place
to build your project communities.

While we will consider individual talks, we prefer to see related
sessions that are likely to draw users and community members. When
submitting your talk, work with your project community and with related
communities to come up with a full program that will walk attendees
through the basics and on into mastery of your project in example use
cases. Content that introduces what's new in your latest release is also
of particular interest, especially when it builds upon existing well-known
application models. The goal should be to showcase your project in
ways that will attract participants and encourage engagement in your
community. Please remember to involve your whole project community (user
and dev lists) when building content. This is your chance to create a
project-specific event within the broader ApacheCon conference.

Content at ApacheCon North America will be cross-promoted as
mini-conferences, such as ApacheCon Big Data, and ApacheCon Mobile, so
be sure to indicate which larger category your proposed sessions fit into.

Finally, please plan to attend ApacheCon, even if you're not proposing a
talk. The biggest value of the event is community building, and we count
on you to make it a place where your project community is likely to
congregate, not just for the technical content in sessions, but for
hackathons, project summits, and good old fashioned face-to-face networking.

-- 
rbo...@apache.org
http://apache.org/




Adopting non-ASF AL projects (was Re: [DISCUSS] Kudu incubator proposal)

2015-11-25 Thread Alex Harui
Renaming thread since my question doesn't have anything to do with Kudu.

I'm trying to resolve Greg's "opt-out" response, vs Roy's "blessing of the
original authors" in the link to the archives Owen posted.  I've always
assumed that the "blessing..." part meant that any non-ASF code base, even
ones under AL, had to come in with an SGA signed by ALL of the original
copyright holders.

Specifically, there are two code bases under AL where the major
contributors have indicated that they would like our project to take over
change-control.  These donations have been held up by trying to chase down
all of the folks who made smaller contributions and getting them to sign
an SGA.  There really isn't any community around these code bases right
now, but our project is interested in them because under ASF practices,
they can at least get occasional attention without the major contributors
having to be involved.

Is an SGA needed?  If not, is there a recommended practice for providing
notification such that folks who want to opt-out can find out that
change-control for the code base is moving to the ASF?

Thanks,
-Alex

On 11/24/15, 8:01 PM, "Owen O'Malley"  wrote:

>On Tue, Nov 24, 2015 at 7:39 PM, Greg Stein  wrote:
>
>> On Mon, Nov 23, 2015 at 12:46 PM, Alex Harui  wrote:
>>
>> > On 11/23/15, 8:23 AM, "Mattmann, Chris A (3980)"
>> >  wrote:
>> >
>> > >Alex,
>> > >
>> > >Please re-read my email. As I stated we don’t take code that
>> > >authors don’t want us to have. So far, we haven’t heard from any of
>> > >the authors on the incoming Kudu project that that’s the case. If
>> > >it’s not the case, we go by the license of the project which stipulates
>> > >how code can be copied, modified, reused, etc.
>> >
>> > Yes, but my interpretation of your words is that folks have to opt out,
>> >
>>
>> Correct: opt-out.
>>
>> Since this code is under ALv2, we can import it to the ASF under that
>> license. We have always done stuff like this, including other permissive
>> licenses.
>>
>> But this isn't simply importing a library, this is saying "the ASF is now
>> the primary locus of development for >this< code." And that's where people
>> can say, "woah. I hate you guys. don't develop my code there", and so we
>> nuke it.
>>
>> SGA/iCLA is to give us rights that we otherwise wouldn't have (ie. the code
>> was under a different license).
>>
>
>It is worth looking back at the thread on Bloodhound
><http://mail-archives.apache.org/mod_mbox/incubator-general/201201.mbox/%3C0F2EA54E-4419-428F-A604-46EF59C40469%40gbiv.com%3E>
>.
>
>The important thing is that Apache doesn't fork communities. In this case,
>the community wants to move to Apache. That is great and should be
>allowed.
>They shouldn't need to get an explicit permission from each contributor
>over the years.
>
>.. Owen




Re: RTC vs CTR (was: Concerning Sentry...)

2015-11-25 Thread Steve Loughran

> On 22 Nov 2015, at 22:34, Branko Čibej  wrote:
> 
> 
> The major question here, for me, is: if the project is RTC, then why
> would I make an effort to become a committer if at the end of the day
> I'm still not trusted to know when to ask for review? It'd be less work
> to throw patches at the developers and let them deal with the pain of
> reviewing and applying.
> 

what you gain as committer is not so much the right to do the housekeeping of 
svn commit/git commit, it's the right to commit other people's code in after 
reviewing it.

And while anyone is encouraged to review patches on JIRA/github, etc, your 
ability to +1 code says you are trusted to make changes to the code without 
breaking things. That is: your knowledge of the code is deemed sufficient to be 
able to review the work of others, to help guide them into a state where it can 
be committed, and if not: explain why not. You just still have to go through 
the same process of submission and review with your peers, so there is a 
guarantee that 1 other person is always aware of what you do.

That ability to +1 code is the right and the responsibility. 



> How would it feel to get a mail like this:
> 
>Congratulations! The developers of Project FOO invite you to become
>a committer. All your patches to date have been perfect and your
>other contributions outstanding. Of course we still won't let you
>commit your changes unless [brass hats] have reviewed and approved
>them; we operate by a review-then-commit process. The only real
>benefit of committer status is that you can now review other
>people's patches and have a binding opinion, unless [brass hats]
>have written otherwise in the bylaws.

yes: you get to have a direct say in what goes into the codebase.

you also get a duty: you need to review other people's work. We need to 
encourage more of that in the Hadoop codebase. I know it's a chore, but Yetus is 
helping, as should the github integration.

> 
>P.S.: Any competent engineer will immediately see that the optimal
>way to proceed is to join an informal group of committers that
>mutually +1 each other's patches without unnecessary hassle, and/or
>ingratiate yourself with [brass hats] to achieve equivalent results.
>After all, it's all about building a healthy community, right?

it would, though it'd stand out. And if you want things to work without 
fielding support calls, you want the quality of what goes in to be high -no 
matter from whom it came.

If you work in a specific part of the code, you do end up knowing the people who 
also work there, their skills, their weaknesses: who is most likely to break 
things. So you may show some favouritism to people you trust. Explicit 
trading of patches? Not me.


Re: RTC vs CTR (was: Concerning Sentry...)

2015-11-25 Thread Greg Stein
On Nov 25, 2015, at 4:08 AM, Konstantin Boudnik  wrote:
> I don't think Git is particularly empowering RTC - there's nothing in it that
> requires someone to look over one's shoulder.

On Wed, Nov 25, 2015 at 3:28 AM, Harbs  wrote:

> AIUI, there’s two ways to go about RTC which is easier in Git:
>

That's not what Cos said. He said using Git does not lead to RTC.

If RTC has been chosen, then you're right: Git makes it easier [than svn].
But you've swapped cause/effect from what Cos was saying.

Cheers,
-g


Re: RTC vs CTR (was: Concerning Sentry...)

2015-11-25 Thread Harbs
If a review is required for non-code changes to the main branch, then I agree.

I’m sure you agree that reviews on code make for fewer bugs. We all make 
mistakes and can overlook things. It seems kind of extreme to assume that this 
kind of required review is all about control. Since anyone who can commit can 
review, it’s kind of hard for me to swallow that.

I assume your logic is that the reviews can come after the commit. Sure. But 
what if they don’t?

Case in point: I made some pretty major changes to TLF in Flex which 
constituted a number of months of work. I’m willing to bet that not every 
commit I did was checked by others. I did a decent job, but there were a few 
regressive bugs in my code, and I accidentally reverted some code in my commit 
as well. In a workflow where my code would have to get one or more +1s before I 
committed it to the main branch, it’s likely that the reverted commit (at the 
least) would have been caught.

I would actually welcome knowing someone looked over my code for a sanity check.

Harbs

On Nov 25, 2015, at 10:49 PM, Greg Stein  wrote:

> That is pretty normal operation in both styles of workflow. My concern is
> with trunk/master. Is a committer trusted enough to make changes directly?
> 
> If all meaningful changes (ie. changing APIs and algorithms, not just
> fixing typos w/o review) are not trusted, and require review/permission,
> then I'm against that.
> 
> It is good practice to put potentially disruptive code onto a branch while
> it is developed, then merge it when complete. Trusting a committer to ask
> for review before the merge is great. Requiring it, less so.
> 
> But RTC on trunk/master is harmful.
> 
> Cheers,
> -g
> 
> On Wed, Nov 25, 2015 at 2:44 PM, Harbs  wrote:
> 
>> What about commit to feature/bug branch, review and then commit to main
>> branch?
>> 
>> Is that CTR or RTC in your book?
>> 
>> On Nov 25, 2015, at 10:42 PM, Greg Stein  wrote:
>> 
>>> I object to Lucene's path, too. A committer's judgement is not trusted
>>> enough to make a change without upload/review. They need permission first
>>> (again: to use your term; it works great).
>>> 
>>> On Wed, Nov 25, 2015 at 2:39 PM, Upayavira  wrote:
>>> 
>>>> Some setups that people call RTC are actually CTR in your nomenclature,
>>>> so we could be talking at cross-purposes. That's all I'm trying to avoid.
>>>> E.g. Lucene - everything happens in JIRA first (upload patch, wait for
>>>> review), but once that has happened, you are free to commit away. So
>>>> strictly, it is RTC, but not seemingly in the sense you are objecting
>>>> to.
>>>>
>>>> Upayavira
>>>>
>>>> On Wed, Nov 25, 2015, at 08:35 PM, Greg Stein wrote:
>>>>> I think this is a distraction. You said it best the other day: RTC implies
>>>>> the need for "permission" before making a change to the codebase.
>>>>> Committers are not trusted to make a judgement on whether a change should
>>>>> be made.
>>>>>
>>>>> CTR trusts committers to use their judgement. RTC distrusts committers, and
>>>>> makes them seek permission [through one of several mechanisms].
>>>>>
>>>>> -g
>>>>>
>>>>> On Wed, Nov 25, 2015 at 10:47 AM, Upayavira  wrote:
>>>>>
>>>>>> Not replying to this mail specifically, but to the thread in general...
>>>>>>
>>>>>> People keep using the terms RTC and CTR as if we all mean the same
>>>>>> thing. Please don't. If you must use these terms, please define what you
>>>>>> mean by them.
>>>>>>
>>>>>> CTR is a less ambiguous term - I'd suggest we all assume that "commit"
>>>>>> means a push to a version control system.
>>>>>>
>>>>>> However, RTC seems to mean many things - from "push to JIRA for review
>>>>>> first, wait a bit, then commit to VCS" through "push to JIRA, and once
>>>>>> you have sufficient +1 votes, you can commit" to "push to JIRA for a
>>>>>> review, then another committer must commit it".
>>>>>>
>>>>>> If we're gonna debate RTC, can we please describe which of these we are
>>>>>> talking about (or some other mechanism that I haven't described)?
>>>>>> Otherwise, we will end up endlessly debating over the top of each other.
>>>>>>
>>>>>> Upayavira
>>>>>>
>>>>>> On Wed, Nov 25, 2015, at 09:28 AM, Harbs wrote:
>>>>>>> AIUI, there’s two ways to go about RTC which is easier in Git:
>>>>>>> 1) Working in feature/bug fix branches. Assuming RTC only applies to the
>>>>>>> main branch, changes are done in separate branches where commits do not
>>>>>>> require review. The feature/bug fix branch is then only merged back in
>>>>>>> after it had a review. The reason this is easier is because branching and
>>>>>>> merging is almost zero effort in Git. Many Git workflows don’t work on
>>>>>>> the main branch anyway, so this is a particularly good fit for those
>>>>>>> workflows.
>>>>>>> 2) Pull requests. Using pull requests, all changes can be pulled in with

Re: RTC vs CTR (was: Concerning Sentry...)

2015-11-25 Thread Ralph Goers
1. What makes you think all bugs are caught during code reviews (they aren’t)?
2. What makes you think that code reviews after the commit are any less 
thorough than reviews required before the commit?

If you don’t trust your community to do code reviews after you commit then 
there is a problem in your community. Forcing a code review to occur first 
won’t fix that.

In a CTR world you can choose to do every piece of work on a branch and ask for 
a code review before you commit. That is your choice.  But if you know that the 
code you are committing is good because you have a thorough knowledge of your 
product you shouldn’t be forced to have it reviewed before you can commit it.  
I actually love the line in Kudu where it says that automation ensures quality. 
I am a big fan of that. In my experience having lots of tests is the best way 
to ensure stuff doesn’t get broken.

So how did you catch the bugs in your code in Flex?  Would you have preferred 
that they stay on a branch for months so they could be reviewed before 
committing?  Despite how great people say git is I have still had lots of 
problems resolving merge conflicts when the code isn’t merged back quickly.  If 
I understand what you are saying you would prefer that Flex use RTC because you 
don’t trust your fellow committers to review your code.  That is a community 
problem that needs to be fixed. Forcing them to review the code first isn’t the 
proper way to fix it.

Ralph

> On Nov 25, 2015, at 2:00 PM, Harbs  wrote:
> 
> If a review is required for non-code changes to the main branch, then I agree.
> 
> I’m sure you agree that reviews on code make for fewer bugs. We all make 
> mistakes and can overlook things. It seems kind of extreme to assume that 
> this kind of required review is all about control. Since anyone who can 
> commit can review, it’s kind of hard for me to swallow that.
> 
> I assume your logic is that the reviews can come after the commit. Sure. But 
> what if it doesn’t?
> 
> Case in point: I made some pretty major changes to TLF in Flex which 
> constituted a number of months of work. I’m willing to bet that not every 
> commit I did was checked by others. I did a decent job, but there were a few 
> regressive bugs in my code, and I accidentally reverted some code in my 
> commit as well. In a workflow where my code would have to get one or more +1s 
> before I committed it to the main branch, it’s likely that the reverted 
> commit (at the least) would have been caught.
> 
> I would actually welcome knowing someone looked over my code for a sanity 
> check.
> 
> Harbs
> 
> On Nov 25, 2015, at 10:49 PM, Greg Stein  wrote:
> 
>> That is pretty normal operation in both styles of workflow. My concern is
>> with trunk/master. Is a committer trusted enough to make changes directly?
>> 
>> If all meaningful changes (ie. changing APIs and algorithms, not just
>> fixing typos w/o review) are not trusted, and require review/permission,
>> then I'm against that.
>> 
>> It is good practice to put potentially disruptive code onto a branch while
>> it is developed, then merge it when complete. Trusting a committer to ask
>> for review before the merge is great. Requiring it, less so.
>> 
>> But RTC on trunk/master is harmful.
>> 
>> Cheers,
>> -g
>> 
>> On Wed, Nov 25, 2015 at 2:44 PM, Harbs  wrote:
>> 
>>> What about commit to feature/bug branch, review and then commit to main
>>> branch?
>>> 
>>> Is that CTR or RTC in your book?
>>> 
>>> On Nov 25, 2015, at 10:42 PM, Greg Stein  wrote:
>>> 
>>>> I object to Lucene's path, too. A committer's judgement is not trusted
>>>> enough to make a change without upload/review. They need permission first
>>>> (again: to use your term; it works great).
>>>> 
>>>> On Wed, Nov 25, 2015 at 2:39 PM, Upayavira  wrote:
>>>> 
>>>>> Some setups that people call RTC are actually CTR in your nomenclature,
>>>>> so we could be talking cross-purposes. That's all I'm trying to avoid.
>>>>> E.g. Lucene - everything happens in JIRA first (upload patch, wait for
>>>>> review), but once that has happened, you are free to commit away. So
>>>>> strictly, it is RTC, but not seemingly in the sense you are objecting
>>>>> to.
>>>>> 
>>>>> Upayavira
>>>>> 
>>>>> On Wed, Nov 25, 2015, at 08:35 PM, Greg Stein wrote:
>>>>>> I think this is a distraction. You said it best the other day: RTC implies
>>>>>> the need for "permission" before making a change to the codebase.
>>>>>> Committers are not trusted to make a judgement on whether a change should
>>>>>> be made.
>>>>>> 
>>>>>> CTR trusts committers to use their judgement. RTC distrusts committers, and
>>>>>> makes them seek permission [through one of several mechanisms].
>>>>>> 
>>>>>> -g
>>>>>> 
>>>>>> On Wed, Nov 25, 2015 at 10:47 AM, Upayavira  wrote:
>>>>>> 
>>>>>>> Not replying to this mail specifically, but to the thread in

Re: RTC vs CTR (was: Concerning Sentry...)

2015-11-25 Thread Andrew Purtell
And I challenge you to comb over all HBase mailing lists and JIRAs and find any 
instance where we were not the model of a meritocratic and consensus driven 
community, or any instance where a committer has ever been aggrieved by our 
practices, and especially where I as chair have tried to exert control. It's 
harder to impugn people's motives if there's at least a minimal evidentiary 
bar. 


> On Nov 25, 2015, at 12:39 PM, Greg Stein  wrote:
> 
> Boo hoo. Todd said it wasn't about control, and then a few days later said
> he was forcing people into doing reviews. So yeah: in his case, it *is*
> about control.
> 
> Over the 17 years I've been around Apache, every single time I've seen
> somebody attempt to justify something like RTC, it always goes back to
> control. Always.
> 
> -g
> 
> 
> On Wed, Nov 25, 2015 at 2:35 PM, Andrew Purtell 
> wrote:
> 
>> I have to completely disagree and find your assertion vaguely offensive.
>> 
>>> On Nov 25, 2015, at 12:32 PM, Greg Stein  wrote:
>>> 
>>> On Wed, Nov 25, 2015 at 12:44 PM, Andrew Purtell 
>>> wrote:
>>>> ...
>>>> 
>>>> and inherited the RTC ethic from our parent community. I did recently test
>>>> the state of consensus on RTC vs CTR there and it still holds. I think this
>>>> model makes sense for HBase, which is a mature (read: complex) code base
>>>> that implements a distributed database. For sure we want multiple sets of
>>> 
>>> I call bullshit. "complex" my ass. I've said it before: all software is
>>> complex, and yours is no more complex than another. That is NOT a rationale
>>> for installing RTC. It is an excuse for maintaining undue control.
>>> 
>>> -g
>> 
>> -
>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>> For additional commands, e-mail: general-h...@incubator.apache.org
>> 
>> 

-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: RTC vs CTR (was: Concerning Sentry...)

2015-11-25 Thread Sam Ruby
On Wed, Nov 25, 2015 at 4:51 PM, Greg Stein  wrote:
>
> Don't shut down trunk/master for product development.

I don't believe you heard my point, but I'm not going to repeat it.
Instead I will add a new point.

'trunk/master for product development' is not the only development
model available to a project.  As an example, I've seen models where
'trunk/master is for product maintenance', and all development occurs
in a branch explicitly designated as where work on the next release is
to occur.

- Sam Ruby




Re: RTC vs CTR (was: Concerning Sentry...)

2015-11-25 Thread Harbs

On Nov 25, 2015, at 11:32 PM, Ralph Goers  wrote:

> 1. What makes you think all bugs are caught during code reviews (they aren’t)?

I don’t, and I did not imply that.

> 2. What makes you think that code reviews after the commit are any less 
> thorough than reviews required before the commit?

Nothing. I did not say that. If the code review is done, it makes no difference 
when. I only said that RTC ensures that the CODE REVIEW IS ACTUALLY DONE.

> If you don’t trust your community to do code reviews after you commit then 
> there is a problem in your community. Forcing a code review to occur first 
> won’t fix that.
> 
I don’t see it as not trusting the community. I think Flex is doing just fine 
right now. But Flex is a big code base and there are areas that few people 
work on. On a big code base especially, people are working on different 
things. It’s totally reasonable for something to go under everyone’s radar, 
especially in an area of the code that few people work on. Mandatory code 
reviews seem to me a method of making sure that it doesn’t get missed. If the 
code doesn’t get reviewed after x number of days, the person who's committing 
can send an email to the list asking for someone to look it over. What’s wrong 
with that?

> In a CTR world you can choose to do every piece of work on a branch and ask 
> for a code review before you commit. That is your choice.  But if you know 
> that the code you are committing is good because you have a thorough 
> knowledge of your product you shouldn’t be forced to have it reviewed before 
> you can commit it.  I actually love the line in Kudu where it says that 
> automation ensures quality. I am a big fan of that. In my experience having 
> lots of tests is the best way to ensure stuff doesn’t get broken.
> 
> So how did you catch the bugs in your code in Flex?  Would you have preferred 
> that they stay on a branch for months so they could be reviewed before 
> committing?  Despite how great people say git is I have still had lots of 
> problems resolving merge conflicts when the code isn’t merged back quickly.  
> If I understand what you are saying you would prefer that Flex use RTC 
> because you don’t trust your fellow committers to review your code.  That is 
> a community problem that needs to be fixed. Forcing them to review the code 
> first isn’t the proper way to fix it.

The bugs were found after the last release of Flex and reported by users.

No. I’m not saying I want RTC. I don’t. I’m quite happy with CTR in my 
community. Small bugs in Flex, even if not caught, are not likely to cost 
users millions of dollars. I’m okay with possibly having more bugs in Flex in 
exchange for not requiring code review, because code review DOES make things 
more difficult. All I’m saying is that I understand the rationale as quality 
assurance for communities who consider the damage from regressive bugs to be 
very high. It seems like a certain amount of RTC can be a reasonable price to 
pay.


> Ralph
> 
>> On Nov 25, 2015, at 2:00 PM, Harbs  wrote:
>> 
>> If a review is required for non-code changes to the main branch, then I 
>> agree.
>> 
>> I’m sure you agree that reviews on code make for fewer bugs. We all make 
>> mistakes and can overlook things. It seems kind of extreme to assume that 
>> this kind of required review is all about control. Since anyone who can 
>> commit can review, it’s kind of hard for me to swallow that.
>> 
>> I assume your logic is that the reviews can come after the commit. Sure. But 
>> what if it doesn’t?
>> 
>> Case in point: I made some pretty major changes to TLF in Flex which 
>> constituted a number of months of work. I’m willing to bet that not every 
>> commit I did was checked by others. I did a decent job, but there were a few 
>> regressive bugs in my code, and I accidentally reverted some code in my 
>> commit as well. In a workflow where my code would have to get one or more 
>> +1s before I committed it to the main branch, it’s likely that the reverted 
>> commit (at the least) would have been caught.
>> 
>> I would actually welcome knowing someone looked over my code for a sanity 
>> check.
>> 
>> Harbs
>> 
>> On Nov 25, 2015, at 10:49 PM, Greg Stein  wrote:
>> 
>>> That is pretty normal operation in both styles of workflow. My concern is
>>> with trunk/master. Is a committer trusted enough to make changes directly?
>>> 
>>> If all meaningful changes (ie. changing APIs and algorithms, not just
>>> fixing typos w/o review) are not trusted, and require review/permission,
>>> then I'm against that.
>>> 
>>> It is good practice to put potentially disruptive code onto a branch while
>>> it is developed, then merge it when complete. Trusting a committer to ask
>>> for review before the merge is great. Requiring it, less so.
>>> 
>>> But RTC on trunk/master is harmful.
>>> 
>>> Cheers,
>>> -g
>>> 
>>> On Wed, Nov 25, 2015 

Re: RTC vs CTR (was: Concerning Sentry...)

2015-11-25 Thread Greg Stein
On Wed, Nov 25, 2015 at 4:02 PM, Sam Ruby  wrote:

> On Wed, Nov 25, 2015 at 4:51 PM, Greg Stein  wrote:
> >
> > Don't shut down trunk/master for product development.
>
> I don't believe you heard my point, but I'm not going to repeat it.
>

I read your post several times, completely :-P ... I just think it didn't
argue against RTC being a form of control. (and yeah, maybe you weren't
trying to argue that?)


> Instead I will add a new point.
>
> 'trunk/master for product development' is not the only development
> model available to a project.  As an example, I've seen models where
> 'trunk/master is for product maintenance', and all development occurs
> in a branch explicitly designated as where work on the next release is
> to occur.
>

I think that is just playing with names. In Apache Subversion the "product
maintenance" is branches/1.8.x and branches/1.9.x (1.7.x and prior are
deprecated). trunk is for "next release".

In your naming model, where we've seen the name "develop" for "next
release" (aka where all new dev occurs), then I'd say making it RTC is
harmful.

trunk/master was shorthand for "where dev occurs". If you want to use a
different name... okay. :-)

Cheers,
-g

ps. fwiw, trunk/tags/branches isn't mandated in svn either. It was just an
ad hoc template we came up with back near the start of the project. We
assumed third-party tools would focus around that naming, which is
generally true, but svn itself has never cared.


Re: RTC vs CTR (was: Concerning Sentry...)

2015-11-25 Thread Todd Lipcon
On Wed, Nov 25, 2015 at 1:32 PM, Ralph Goers 
wrote:

> 1. What makes you think all bugs are caught during code reviews (they
> aren’t)?
>

They aren't. But some are. And catching them in code review is cheaper than
catching them when a user hits them.

Additionally, plenty of other things are caught in code reviews other than
bugs (style, compat issues, design issues, poor test coverage, etc)


> 2. What makes you think that code reviews after the commit are any less
> thorough than reviews required before the commit?
>
> If you don’t trust your community to do code reviews after you commit then
> there is a problem in your community. Forcing a code review to occur first
> won’t fix that.
>

Isn't it an issue of scalability? With pre-commit code reviews, typically
the uploader of the code will pick out one or two people to review the code
who know the area well. Or, if no one is picked by the submitter of the
patch, the committers will organically end up deciding who is to review the
code, at which point that reviewer ends up being a sort of shepherd for the
patch, sticking with the contributor through multiple revs until it's ready
for commit.

With post-commit review, do you expect to watch the mailing list and review
every patch that comes in? In a project like Hadoop, that's not feasible --
we've had ~35,000 lines of code changed in the last month in 267 patches.
If everyone tries to review every patch post-commit, you end up with n^2
work as the community grows.
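
To make the n^2 point concrete, here is a rough back-of-the-envelope sketch. It assumes (hypothetically) that every committer lands the same number of patches per month, that post-commit review means everyone reads everyone else's patches, and that pre-commit review uses a fixed number of reviewers per patch. The function names and the numbers are illustrative only, not measurements from Hadoop or any other project.

```python
def post_commit_reviews(committers: int, patches_each: int) -> int:
    """Total reviews per month if every committer reads every other
    committer's patches (the 'everyone watches the commit list' model)."""
    total_patches = committers * patches_each
    return total_patches * (committers - 1)


def pre_commit_reviews(committers: int, patches_each: int,
                       reviewers_per_patch: int = 2) -> int:
    """Total reviews per month if each patch gets a fixed number of
    reviewers picked by the submitter (hypothetical default of 2)."""
    return committers * patches_each * reviewers_per_patch


if __name__ == "__main__":
    # Assumed workload: 3 patches per committer per month.
    for n in (5, 20, 80):
        print(n, post_commit_reviews(n, 3), pre_commit_reviews(n, 3))
```

Under these assumptions the first figure grows with the square of the community size while the second grows linearly. In practice nobody reviews every commit, so this is an upper bound on the post-commit model rather than a prediction.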

Amusingly enough, I happened upon a chapter in "Producing Open Source
Software" that invokes Greg's name on the subject of open source code
review (http://producingoss.com/en/setting-tone.html):

> There was no guarantee that every commit would be reviewed, though one
> might sometimes look over a change if one were particularly interested in
> that area of the code. Bugs slipped in that really could and should have
> been caught. A developer named Greg Stein, who knew the value of code
> review from past work, decided that he was going to set an example by
> reviewing every line of every single commit that went into the code
> repository. Each commit anyone made was soon followed by an email to the
> developer's list from Greg, dissecting the commit, analyzing possible
> problems, and occasionally praising a clever bit of code.


I'm impressed that Greg was able to do this with Subversion, but not sure
how it could work in a faster paced project, and also feel like this
practice produces a serious "bus factor" issue.

-Todd


Re: [VOTE] Accept Impala into the Apache Incubator

2015-11-25 Thread Hitesh Shah
+1 (binding)

— Hitesh

On Nov 24, 2015, at 1:03 PM, Henry Robinson  wrote:

> Hi -
> 
> The [DISCUSS] thread has been quiet for a few days, so I think there's been
> sufficient opportunity for discussion around our proposal to bring Impala
> to the ASF Incubator.
> 
> I'd like to call a VOTE on that proposal, which is on the wiki at
> https://wiki.apache.org/incubator/ImpalaProposal, and which I've pasted
> below.
> 
> During the discussion period, the proposal has been amended to add Brock
> Noland as a new mentor, to add one missed committer from the list and to
> correct some issues with the dependency list.
> 
> Please cast your votes as follows:
> 
> [] +1, accept Impala into the Incubator
> [] +/-0, non-counted vote to express a disposition
> [] -1, do not accept Impala into the Incubator (please give your reason(s))
> 
> As with the concurrent Kudu vote, I propose leaving the vote open for a
> full seven days (to close at Tuesday, December 1st at noon PST), due to the
> upcoming US holiday.
> 
> Thanks,
> Henry
> 
> 
> 
> = Abstract =
> Impala is a high-performance C++ and Java SQL query engine for data stored
> in Apache Hadoop-based clusters.
> 
> = Proposal =
> 
> We propose to contribute the Impala codebase and associated artifacts (e.g.
> documentation, web-site content etc.) to the Apache Software Foundation
> with the intent of forming a productive, meritocratic and open community
> around Impala’s continued development, according to the ‘Apache Way’.
> 
> Cloudera owns several trademarks regarding Impala, and proposes to transfer
> ownership of those trademarks in full to the ASF.
> 
> = Background =
> Engineers at Cloudera developed Impala and released it as an
> Apache-licensed open-source project in Fall 2012. Impala was written as a
> brand-new, modern C++ SQL engine targeted from the start for data stored in
> Apache Hadoop clusters.
> 
> Impala’s most important benefit to users is high-performance, making it
> extremely appropriate for common enterprise analytic and business
> intelligence workloads. This is achieved by a number of software
> techniques, including: native support for data stored in HDFS and related
> filesystems, just-in-time compilation and optimization of individual query
> plans, high-performance C++ codebase and massively-parallel distributed
> architecture. In benchmarks, Impala is routinely amongst the very highest
> performing SQL query engines.
> 
> = Rationale =
> 
> Despite the exciting innovation in the so-called ‘big-data’ space, SQL
> remains by far the most common interface for interacting with data in both
> traditional warehouses and modern ‘big-data’ clusters. There is clearly a
> need, as evidenced by the eager adoption of Impala and other SQL engines in
> enterprise contexts, for a query engine that offers the familiar SQL
> interface, but that has been specifically designed to operate in massive,
> distributed clusters rather than in traditional, fixed-hardware,
> warehouse-specific deployments. Impala is one such query engine.
> 
> We believe that the ASF is the right venue to foster an open-source
> community around Impala’s development. We expect that Impala will benefit
> from more productive collaboration with related Apache projects, and under
> the auspices of the ASF will attract talented contributors who will push
> Impala’s development forward at pace.
> 
> We believe that the timing is right for Impala’s development to move
> wholesale to the ASF: Impala is well-established, has been Apache-licensed
> open-source for more than three years, and the core project is relatively
> stable. We are excited to see where an ASF-based community can take Impala
> from this strong starting point.
> 
> = Initial Goals =
> Our initial goals are as follows:
> 
> * Establish ASF-compatible engineering practices and workflows
> * Refactor and publish existing internal build scripts and test
> infrastructure, in order to make them usable by any community member.
> * Transfer source code, documentation and associated artifacts to the ASF.
> * Grow the user and developer communities
> 
> = Current Status =
> 
> Impala is developed as an Apache-licensed open-source project. The source
> code is available at http://github.com/cloudera/Impala, and developer
> documentation is at https://github.com/cloudera/Impala/wiki. The majority
> of commits to the project have come from Cloudera-employed developers, but
> we have accepted some contributions from individuals from other
> organizations.
> 
> All code reviews are done via a public instance of the Gerrit review tool
> at http://gerrit.cloudera.org:8080/, and discussed on a public mailing
> list. All patches must be reviewed before they are accepted into the
> codebase, via a voting mechanism that is similar to that used on Apache
> projects such as Hadoop and HBase.
> 
> Before a patch is committed, it must pass a suite of pre-commit tests.
> These tests are currently run on 

Re: RTC vs CTR (was: Concerning Sentry...)

2015-11-25 Thread Andrew Purtell
I have to completely disagree and find your assertion vaguely offensive. 

> On Nov 25, 2015, at 12:32 PM, Greg Stein  wrote:
> 
> On Wed, Nov 25, 2015 at 12:44 PM, Andrew Purtell 
> wrote:
>> ...
>> 
>> and inherited the RTC ethic from our parent community. I did recently test
>> the state of consensus on RTC vs CTR there and it still holds. I think this
>> model makes sense for HBase, which is a mature (read: complex) code base
>> that implements a distributed database. For sure we want multiple sets of
>> 
> 
> I call bullshit. "complex" my ass. I've said it before: all software is
> complex, and yours is no more complex than another. That is NOT a rationale
> for installing RTC. It is an excuse for maintaining undue control.
> 
> -g




Re: RTC vs CTR (was: Concerning Sentry...)

2015-11-25 Thread Greg Stein
That is pretty normal operation in both styles of workflow. My concern is
with trunk/master. Is a committer trusted enough to make changes directly?

If all meaningful changes (ie. changing APIs and algorithms, not just
fixing typos w/o review) are not trusted, and require review/permission,
then I'm against that.

It is good practice to put potentially disruptive code onto a branch while
it is developed, then merge it when complete. Trusting a committer to ask
for review before the merge is great. Requiring it, less so.

But RTC on trunk/master is harmful.

Cheers,
-g

On Wed, Nov 25, 2015 at 2:44 PM, Harbs  wrote:

> What about commit to feature/bug branch, review and then commit to main
> branch?
>
> Is that CTR or RTC in your book?
>
> On Nov 25, 2015, at 10:42 PM, Greg Stein  wrote:
>
> > I object to Lucene's path, too. A committer's judgement is not trusted
> > enough to make a change without upload/review. They need permission first
> > (again: to use your term; it works great).
> >
> > On Wed, Nov 25, 2015 at 2:39 PM, Upayavira  wrote:
> >
> >> Some setups that people call RTC are actually CTR in your nomenclature,
> >> so we could be talking cross-purposes. That's all I'm trying to avoid.
> >> E.g. Lucene - everything happens in JIRA first (upload patch, wait for
> >> review), but once that has happened, you are free to commit away. So
> >> strictly, it is RTC, but not seemingly in the sense you are objecting
> >> to.
> >>
> >> Upayavira
> >>
> >> On Wed, Nov 25, 2015, at 08:35 PM, Greg Stein wrote:
> >>> I think this is a distraction. You said it best the other day: RTC implies
> >>> the need for "permission" before making a change to the codebase.
> >>> Committers are not trusted to make a judgement on whether a change should
> >>> be made.
> >>>
> >>> CTR trusts committers to use their judgement. RTC distrusts committers, and
> >>> makes them seek permission [through one of several mechanisms].
> >>>
> >>> -g
> >>>
> >>> On Wed, Nov 25, 2015 at 10:47 AM, Upayavira  wrote:
> >>>
> >>>> Not replying to this mail specifically, but to the thread in general...
> >>>>
> >>>> People keep using the terms RTC and CTR as if we all mean the same
> >>>> thing. Please don't. If you must use these terms, please define what you
> >>>> mean by them.
> >>>>
> >>>> CTR is a less ambiguous term - I'd suggest we all assume that "commit"
> >>>> means a push to a version control system.
> >>>>
> >>>> However, RTC seems to mean many things - from "push to JIRA for review
> >>>> first, wait a bit, then commit to VCS" through "push to JIRA, and once
> >>>> you have sufficient +1 votes, you can commit" to "push to JIRA for a
> >>>> review, then another committer must commit it".
> >>>>
> >>>> If we're gonna debate RTC, can we please describe which of these we are
> >>>> talking about (or some other mechanism that I haven't described)?
> >>>> Otherwise, we will end up endlessly debating over the top of each other.
> >>>>
> >>>> Upayavira
> >>>>
> >>>> On Wed, Nov 25, 2015, at 09:28 AM, Harbs wrote:
> >>>>> AIUI, there’s two ways to go about RTC which is easier in Git:
> >>>>> 1) Working in feature/bug fix branches. Assuming RTC only applies to the
> >>>>> main branch, changes are done in separate branches where commits do not
> >>>>> require review. The feature/bug fix branch is then only merged back in
> >>>>> after it had a review. The reason this is easier is because branching and
> >>>>> merging is almost zero effort in Git. Many Git workflows don’t work on
> >>>>> the main branch anyway, so this is a particularly good fit for those
> >>>>> workflows.
> >>>>> 2) Pull requests. Using pull requests, all changes can be pulled in with
> >>>>> a single command.
> >>>>>
> >>>>> I’ve personally never participated in RTC (unless you count Github
> >>>>> projects and before I was a committer in Flex), so it could be I’m
> >>>>> missing something.
> >>>>>
> >>>>> Of course there’s nothing to ENFORCE that the commit is not done before a
> >>>>> review, but why would you want to do that? That’s where trust comes to
> >>>>> play… ;-)
> >>>>>
> >>>>> Harbs
> >>>>>
> >>>>> On Nov 25, 2015, at 4:08 AM, Konstantin Boudnik  wrote:
> >>>>>
> >>>>>> I don't think Git is particularly empowering RTC - there's nothing in it that
> >>>>>> requires someone to look over one's shoulder.
> >>>>>
> >>>>
>
>
> 

Re: RTC vs CTR (was: Concerning Sentry...)

2015-11-25 Thread Jim Jagielski

> On Nov 25, 2015, at 3:49 PM, Greg Stein  wrote:
> 
> That is pretty normal operation in both styles of workflow. My concern is
> with trunk/master.

As far as I know, that condition was unclear... You seemed
to imply that RTC *anyplace* was harmful or all about control.

Both CTR and RTC are processes, with known reasons, rationales,
scenarios and use-cases. It's HOW they are used which is
the rub.




Re: RTC vs CTR (was: Concerning Sentry...)

2015-11-25 Thread Greg Stein
On Wed, Nov 25, 2015 at 12:44 PM, Andrew Purtell 
wrote:
>...
>
> and inherited the RTC ethic from our parent community. I did recently test
> the state of consensus on RTC vs CTR there and it still holds. I think this
> model makes sense for HBase, which is a mature (read: complex) code base
> that implements a distributed database. For sure we want multiple sets of
>

I call bullshit. "complex" my ass. I've said it before: all software is
complex, and yours is no more complex than another. That is NOT a rationale
for installing RTC. It is an excuse for maintaining undue control.

-g


Re: RTC vs CTR (was: Concerning Sentry...)

2015-11-25 Thread Greg Stein
I think this is a distraction. You said it best the other day: RTC implies
the need for "permission" before making a change to the codebase.
Committers are not trusted to make a judgement on whether a change should
be made.

CTR trusts committers to use their judgement. RTC distrusts committers, and
makes them seek permission [through one of several mechanisms].

-g

On Wed, Nov 25, 2015 at 10:47 AM, Upayavira  wrote:

> Not replying to this mail specifically, but to the thread in general...
>
> People keep using the terms RTC and CTR as if we all mean the same
> thing. Please don't. If you must use these terms, please define what you
> mean by them.
>
> CTR is a less ambiguous term - I'd suggest we all assume that "commit"
> means a push to a version control system.
>
> However, RTC seems to mean many things - from "push to JIRA for review
> first, wait a bit, then commit to VCS" through "push to JIRA, and once
> you have sufficient +1 votes, you can commit" to "push to JIRA for a
> review, then another committer must commit it".
>
> If we're gonna debate RTC, can we please describe which of these we are
> talking about (or some other mechanism that I haven't described)?
> Otherwise, we will end up endlessly debating over the top of each other.
>
> Upayavira
>
> On Wed, Nov 25, 2015, at 09:28 AM, Harbs wrote:
> > AIUI, there are two ways to go about RTC, both of which are easier in Git:
> > 1) Working in feature/bug fix branches. Assuming RTC only applies to the
> > main branch, changes are done in separate branches where commits do not
> > require review. The feature/bug fix branch is then only merged back in
> > after it has had a review. This is easier because branching and merging
> > are almost zero effort in Git. Many Git workflows don’t work on
> > the main branch anyway, so this is a particularly good fit for those
> > workflows.
> > 2) Pull requests. Using pull requests, all changes can be pulled in with
> > a single command.
> >
> > I’ve personally never participated in RTC (unless you count GitHub
> > projects and the time before I was a committer in Flex), so it could be
> > I’m missing something.
> >
> > Of course there’s nothing to ENFORCE that the commit is not done before a
> > review, but why would you want to do that? That’s where trust comes into
> > play… ;-)
> >
> > Harbs
> >
> > On Nov 25, 2015, at 4:08 AM, Konstantin Boudnik  wrote:
> >
> > > I don't think Git is particularly empowering RTC - there's nothing in it
> > > that requires someone to look over one's shoulder.
> >
>
> -
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
>


Re: RTC vs CTR (was: Concerning Sentry...)

2015-11-25 Thread Harbs

On Nov 25, 2015, at 10:37 PM, Greg Stein  wrote:

>> AIUI, there are two ways to go about RTC, both of which are easier in Git:
>> 
> 
> That's not what Cos said. He said using Git does not lead to RTC.
> 
> If RTC has been chosen, then you're right: Git makes it easier [than svn].
> But you've swapped cause/effect from what Cos was saying.

Cos was responding to my email. So if I’m swapping his intent then I’m swapping 
a swapped intent… ;-)



Re: RTC vs CTR (was: Concerning Sentry...)

2015-11-25 Thread Sam Ruby
On Wed, Nov 25, 2015 at 3:39 PM, Greg Stein  wrote:
>
> Over the 17 years I've been around Apache, every single time I've seen
> somebody attempt to justify something like RTC, it always goes back to
> control. Always.

Strongly disagree.  If you say 'every', all it takes is one counter
example to disprove the assertion.  Here is a counter example:

https://cwiki.apache.org/confluence/display/INFRA/Git+workflow+for+infrastructure-puppet+repo

It is not a hypothetical example from the distant past.  It is a live
example which seems to work well.  I've witnessed it being used for
single line patches (a removal of a line, in fact) in a YAML file.
Gavin created a branch, made a patch, pushed it, and Daniel merged it.
Not for provenance reasons.  Or for control reasons.  But to ensure a
second set of eyes looked at the change and evaluated whether or not
there may be some unanticipated side effect.

I'll propose a thought experiment.  We seem to agree that there is
room for teams to impose some form of RTC on branches that are to be
released "soonish" (for some value of "soonish").  Let's take the next
step... what happens if releases are frequent (i.e. approaching
continuous?).

That's essentially what the infrastructure team is faced with.

I don't give a whit about 'control issues' (perceived or real, doesn't
matter).  Anything I commit may be reverted.  I'm fine with that.  I
don't presume to control anything.  And if somebody wants to try to
control me -- all I can say is: good luck with that.  :-P

What I care most about is languishing patches.  Whether they come from
team members or drive by contributors, doesn't matter.  That's
harmful.  Git, and in particular, GitHub, makes them less harmful, but
they are the root problem not whether the process is
Commit-Then-Revert or Post-Then-Ignore.

If most communities in the Hadoop ecosystem use RTC, I don't care
UNLESS there is evidence of them not being responsive to patches.  For
quieter communities (including apparently BigTop), RTC could lead to
problems, and CTR is arguably more appropriate.  I'm fine with that
too.

- Sam Ruby

P.S.  My personal preference remains CTR.  I would much rather be
reverted with an explanation than to be ignored without one.




Re: RTC vs CTR (was: Concerning Sentry...)

2015-11-25 Thread Greg Stein
On Wed, Nov 25, 2015 at 3:27 PM, Sam Ruby  wrote:

> On Wed, Nov 25, 2015 at 3:39 PM, Greg Stein  wrote:
> >
> > Over the 17 years I've been around Apache, every single time I've seen
> > somebody attempt to justify something like RTC, it always goes back to
> > control. Always.
>
> Strongly disagree.  If you say 'every', all it takes is one counter
> example to disprove the assertion.  Here is a counter example:
>
>
> https://cwiki.apache.org/confluence/display/INFRA/Git+workflow+for+infrastructure-puppet+repo
>
> It is not a hypothetical example from the distant past.  It is a live
> example which seems to work well.  I've witnessed it being used for
> single line patches (a removal of a line, in fact) in a YAML file.
> Gavin created a branch, made a patch, pushed it, and Daniel merged it.
> Not for provenance reasons.  Or for control reasons.  But to ensure a
> second set of eyes looked at the change and evaluated whether or not
> there may be some unanticipated side effect.
>

I disagree. It *is* for control reasons. Infra can't allow a patch to be
deployed willy-nilly, or shit goes wrong. Fast.

Infra is not building a software product. They are maintaining live
systems. Control is absolutely needed.

Their entire repository is like a release stabilization branch. It needs to
be vetted before release.

> I'll propose a thought experiment.  We seem to agree that there is
> room for teams to impose some form of RTC on branches that are to be
> released "soonish" (for some value of "soonish").  Let's take the next
>

Yes, I call those branches "owned by" or "personal branch of" the RM, who
decides to apply his/her rules on what is allowed onto the branch. At some
point, the RM cuts a release and the community votes on it.

One RM might allow any change. Another RM might require (3) +1 votes for
any change to be applied. Yet another refuses all change, and only applies
changes themselves.

> step... what happens if releases are frequent (i.e. approaching
> continuous?).
>

Don't shut down trunk/master for product development. If the RMs want to
push my work into the releases each week, then great. I'll help them, under
the rules they set. If I find their release rules too onerous, then I'll
start my own release under Apache's "any committer can be an RM" and hope
for (3) +1 votes on the result.


> That's essentially what the infrastructure team is faced with.
>

It's not a product. They are solving a very different problem.

>...

Cheers,
-g


Re: RTC vs CTR (was: Concerning Sentry...)

2015-11-25 Thread Upayavira
Some setups that people call RTC are actually CTR in your nomenclature,
so we could be talking at cross-purposes. That's all I'm trying to avoid.
E.g. Lucene - everything happens in JIRA first (upload patch, wait for
review), but once that has happened, you are free to commit away. So
strictly, it is RTC, but not seemingly in the sense you are objecting
to.

Upayavira

On Wed, Nov 25, 2015, at 08:35 PM, Greg Stein wrote:
> I think this is a distraction. You said it best the other day: RTC implies
> the need for "permission" before making a change to the codebase.
> Committers are not trusted to make a judgement on whether a change should
> be made.
> 
> CTR trusts committers to use their judgement. RTC distrusts committers, and
> makes them seek permission [through one of several mechanisms].
> 
> -g
> 
> On Wed, Nov 25, 2015 at 10:47 AM, Upayavira  wrote:
> 
> > Not replying to this mail specifically, but to the thread in general...
> >
> > People keep using the terms RTC and CTR as if we all mean the same
> > thing. Please don't. If you must use these terms, please define what you
> > mean by them.
> >
> > CTR is a less ambiguous term - I'd suggest we all assume that "commit"
> > means a push to a version control system.
> >
> > However, RTC seems to mean many things - from "push to JIRA for review
> > first, wait a bit, then commit to VCS" through "push to JIRA, and once
> > you have sufficient +1 votes, you can commit" to "push to JIRA for a
> > review, then another committer must commit it".
> >
> > If we're gonna debate RTC, can we please describe which of these we are
> > talking about (or some other mechanism that I haven't described)?
> > Otherwise, we will end up endlessly debating over the top of each other.
> >
> > Upayavira
> >
> > On Wed, Nov 25, 2015, at 09:28 AM, Harbs wrote:
> > > AIUI, there are two ways to go about RTC, both of which are easier in Git:
> > > 1) Working in feature/bug fix branches. Assuming RTC only applies to the
> > > main branch, changes are done in separate branches where commits do not
> > > require review. The feature/bug fix branch is then only merged back in
> > > after it has had a review. This is easier because branching and merging
> > > are almost zero effort in Git. Many Git workflows don’t work on
> > > the main branch anyway, so this is a particularly good fit for those
> > > workflows.
> > > 2) Pull requests. Using pull requests, all changes can be pulled in with
> > > a single command.
> > >
> > > I’ve personally never participated in RTC (unless you count GitHub
> > > projects and the time before I was a committer in Flex), so it could be
> > > I’m missing something.
> > >
> > > Of course there’s nothing to ENFORCE that the commit is not done before a
> > > review, but why would you want to do that? That’s where trust comes into
> > > play… ;-)
> > >
> > > Harbs
> > >
> > > On Nov 25, 2015, at 4:08 AM, Konstantin Boudnik  wrote:
> > >
> > > > I don't think Git is particularly empowering RTC - there's nothing in it
> > > > that requires someone to look over one's shoulder.
> > >
> >
> >
> >




Re: RTC vs CTR (was: Concerning Sentry...)

2015-11-25 Thread Greg Stein
Boo hoo. Todd said it wasn't about control, and then a few days later said
he was forcing people into doing reviews. So yeah: in his case, it *is*
about control.

Over the 17 years I've been around Apache, every single time I've seen
somebody attempt to justify something like RTC, it always goes back to
control. Always.

-g


On Wed, Nov 25, 2015 at 2:35 PM, Andrew Purtell wrote:

> I have to completely disagree and find your assertion vaguely offensive.
>
> > On Nov 25, 2015, at 12:32 PM, Greg Stein  wrote:
> >
> > On Wed, Nov 25, 2015 at 12:44 PM, Andrew Purtell wrote:
> >> ...
> >>
> >> and inherited the RTC ethic from our parent community. I did recently test
> >> the state of consensus on RTC vs CTR there and it still holds. I think this
> >> model makes sense for HBase, which is a mature (read: complex) code base
> >> that implements a distributed database. For sure we want multiple sets of
> >>
> >
> > I call bullshit. "complex" my ass. I've said it before: all software is
> > complex, and yours is no more complex than another. That is NOT a rationale
> > for installing RTC. It is an excuse for maintaining undue control.
> >
> > -g
>
>
>


Re: [VOTE] Accept Kudu into the Apache Incubator

2015-11-25 Thread Hitesh Shah
+1 (binding)

— Hitesh

On Nov 24, 2015, at 11:32 AM, Todd Lipcon  wrote:

> Hi all,
> 
> Discussion on the [DISCUSS] thread seems to have wound down, so I'd like to
> call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
> pasted below and also available on the wiki at:
> https://wiki.apache.org/incubator/KuduProposal
> 
> The proposal is unchanged since the original version, except for the
> addition of Carl Steinbach as a Mentor.
> 
> Please cast your votes:
> 
> [] +1, accept Kudu into the Incubator
> [] +/-0, positive/negative non-counted expression of feelings
> [] -1, do not accept Kudu into the incubator (please state reasoning)
> 
> Given the US holiday this week, I imagine many folks are traveling or
> otherwise offline. So, let's run the vote for a full week rather than the
> traditional 72 hours. Unless the IPMC objects to the extended voting
> period, the vote will close on Tues, Dec 1st at noon PST.
> 
> Thanks
> -Todd
> -
> 
> = Kudu Proposal =
> 
> == Abstract ==
> 
> Kudu is a distributed columnar storage engine built for the Apache Hadoop
> ecosystem.
> 
> == Proposal ==
> 
> Kudu is an open source storage engine for structured data which supports
> low-latency random access together with efficient analytical access
> patterns. Kudu distributes data using horizontal partitioning and
> replicates each partition using Raft consensus, providing low
> mean-time-to-recovery and low tail latencies. Kudu is designed within the
> context of the Apache Hadoop ecosystem and supports many integrations with
> other data analytics projects both inside and outside of the Apache
> Software Foundation.
> 
> 
> 
> We propose to incubate Kudu as a project of the Apache Software Foundation.
> 
> == Background ==
> 
> In recent years, explosive growth in the amount of data being generated and
> captured by enterprises has resulted in the rapid adoption of open source
> technology which is able to store massive data sets at scale and at low
> cost. In particular, the Apache Hadoop ecosystem has become a focal point
> for such “big data” workloads, because many traditional open source
> database systems have lagged in offering a scalable alternative.
> 
> 
> 
> Structured storage in the Hadoop ecosystem has typically been achieved in
> two ways: for static data sets, data is typically stored on Apache HDFS
> using binary data formats such as Apache Avro or Apache Parquet. However,
> neither HDFS nor these formats has any provision for updating individual
> records, or for efficient random access. Mutable data sets are typically
> stored in semi-structured stores such as Apache HBase or Apache Cassandra.
> These systems allow for low-latency record-level reads and writes, but lag
> far behind the static file formats in terms of sequential read throughput
> for applications such as SQL-based analytics or machine learning.
> 
> 
> 
> Kudu is a new storage system designed and implemented from the ground up to
> fill this gap between high-throughput sequential-access storage systems
> such as HDFS and low-latency random-access systems such as HBase or
> Cassandra. While these existing systems continue to hold advantages in some
> situations, Kudu offers a “happy medium” alternative that can dramatically
> simplify the architecture of many common workloads. In particular, Kudu
> offers a simple API for row-level inserts, updates, and deletes, while
> providing table scans at throughputs similar to Parquet, a commonly-used
> columnar format for static data.
> 
> 
> 
> More information on Kudu can be found at the existing open source project
> website: http://getkudu.io and in particular in the Kudu white-paper PDF:
> http://getkudu.io/kudu.pdf from which the above was excerpted.
> 
> == Rationale ==
> 
> As described above, Kudu fills an important gap in the open source storage
> ecosystem. After our initial open source project release in September 2015,
> we have seen a great amount of interest across a diverse set of users and
> companies. We believe that, as a storage system, it is critical to build an
> equally diverse set of contributors in the development community. Our
> experiences as committers and PMC members on other Apache projects have
> taught us the value of diverse communities in ensuring both longevity and
> high quality for such foundational systems.
> 
> == Initial Goals ==
> 
> * Move the existing codebase, website, documentation, and mailing lists to
> Apache-hosted infrastructure
> * Work with the infrastructure team to implement and approve our code
> review, build, and testing workflows in the context of the ASF
> * Incremental development and releases per Apache guidelines
> 
> == Current Status ==
> 
>  Releases 
> 
> Kudu has undergone one public release, tagged here
> https://github.com/cloudera/kudu/tree/kudu0.5.0-release
> 
> This initial release was not performed in the typical ASF fashion -- no
> source tarball was released, but rather 

Re: [VOTE] Accept Impala into the Apache Incubator

2015-11-25 Thread Jarek Jarcec Cecho
[X] +1, accept Impala into the Incubator

(Binding)

Jarcec

> On Nov 24, 2015, at 1:03 PM, Henry Robinson  wrote:
> 
> Hi -
> 
> The [DISCUSS] thread has been quiet for a few days, so I think there's been
> sufficient opportunity for discussion around our proposal to bring Impala
> to the ASF Incubator.
> 
> I'd like to call a VOTE on that proposal, which is on the wiki at
> https://wiki.apache.org/incubator/ImpalaProposal, and which I've pasted
> below.
> 
> During the discussion period, the proposal has been amended to add Brock
> Noland as a new mentor, to add one missed committer from the list and to
> correct some issues with the dependency list.
> 
> Please cast your votes as follows:
> 
> [] +1, accept Impala into the Incubator
> [] +/-0, non-counted vote to express a disposition
> [] -1, do not accept Impala into the Incubator (please give your reason(s))
> 
> As with the concurrent Kudu vote, I propose leaving the vote open for a
> full seven days (to close at Tuesday, December 1st at noon PST), due to the
> upcoming US holiday.
> 
> Thanks,
> Henry
> 
> 
> 
> = Abstract =
> Impala is a high-performance C++ and Java SQL query engine for data stored
> in Apache Hadoop-based clusters.
> 
> = Proposal =
> 
> We propose to contribute the Impala codebase and associated artifacts (e.g.
> documentation, web-site content etc.) to the Apache Software Foundation
> with the intent of forming a productive, meritocratic and open community
> around Impala’s continued development, according to the ‘Apache Way’.
> 
> Cloudera owns several trademarks regarding Impala, and proposes to transfer
> ownership of those trademarks in full to the ASF.
> 
> = Background =
> Engineers at Cloudera developed Impala and released it as an
> Apache-licensed open-source project in Fall 2012. Impala was written as a
> brand-new, modern C++ SQL engine targeted from the start for data stored in
> Apache Hadoop clusters.
> 
> Impala’s most important benefit to users is high-performance, making it
> extremely appropriate for common enterprise analytic and business
> intelligence workloads. This is achieved by a number of software
> techniques, including: native support for data stored in HDFS and related
> filesystems, just-in-time compilation and optimization of individual query
> plans, high-performance C++ codebase and massively-parallel distributed
> architecture. In benchmarks, Impala is routinely amongst the very highest
> performing SQL query engines.
> 
> = Rationale =
> 
> Despite the exciting innovation in the so-called ‘big-data’ space, SQL
> remains by far the most common interface for interacting with data in both
> traditional warehouses and modern ‘big-data’ clusters. There is clearly a
> need, as evidenced by the eager adoption of Impala and other SQL engines in
> enterprise contexts, for a query engine that offers the familiar SQL
> interface, but that has been specifically designed to operate in massive,
> distributed clusters rather than in traditional, fixed-hardware,
> warehouse-specific deployments. Impala is one such query engine.
> 
> We believe that the ASF is the right venue to foster an open-source
> community around Impala’s development. We expect that Impala will benefit
> from more productive collaboration with related Apache projects, and under
> the auspices of the ASF will attract talented contributors who will push
> Impala’s development forward at pace.
> 
> We believe that the timing is right for Impala’s development to move
> wholesale to the ASF: Impala is well-established, has been Apache-licensed
> open-source for more than three years, and the core project is relatively
> stable. We are excited to see where an ASF-based community can take Impala
> from this strong starting point.
> 
> = Initial Goals =
> Our initial goals are as follows:
> 
> * Establish ASF-compatible engineering practices and workflows
> * Refactor and publish existing internal build scripts and test
> infrastructure, in order to make them usable by any community member.
> * Transfer source code, documentation and associated artifacts to the ASF.
> * Grow the user and developer communities
> 
> = Current Status =
> 
> Impala is developed as an Apache-licensed open-source project. The source
> code is available at http://github.com/cloudera/Impala, and developer
> documentation is at https://github.com/cloudera/Impala/wiki. The majority
> of commits to the project have come from Cloudera-employed developers, but
> we have accepted some contributions from individuals from other
> organizations.
> 
> All code reviews are done via a public instance of the Gerrit review tool
> at http://gerrit.cloudera.org:8080/, and discussed on a public mailing
> list. All patches must be reviewed before they are accepted into the
> codebase, via a voting mechanism that is similar to that used on Apache
> projects such as Hadoop and HBase.
> 
> Before a patch is committed, it must pass a suite of pre-commit tests.
> 

Re: RTC vs CTR (was: Concerning Sentry...)

2015-11-25 Thread Pierre Smits
I see some are trying to spread quite some FUD.

Pierre Smits

*OFBiz Extensions Marketplace*
http://oem.ofbizci.net/oci-2/

On Wed, Nov 25, 2015 at 11:47 PM, Sam Ruby  wrote:

> On Wed, Nov 25, 2015 at 5:18 PM, Greg Stein  wrote:
> > On Wed, Nov 25, 2015 at 4:02 PM, Sam Ruby wrote:
> >
> >> On Wed, Nov 25, 2015 at 4:51 PM, Greg Stein  wrote:
> >> >
> >> > Don't shut down trunk/master for product development.
> >>
> >> I don't believe you heard my point, but I'm not going to repeat it.
> >
> > I read your post several times, completely :-P ... I just think it didn't
> > argue against RTC being a form of control. (and yeah, maybe you weren't
> > trying to argue that?)
>
> I don't believe that RTC is a form of control over others.  I believe
> that RTC is a mechanism to ensure that every change is adequately
> reviewed.
>
> >> Instead I will add a new point.
> >>
> >> 'trunk/master for product development' is not the only development
> >> model available to a project.  As an example, I've seen models where
> >> 'trunk/master is for product maintenance', and all development occurs
> >> in a branch explicitly designated as where work on the next release is
> >> to occur.
> >>
> >
> > I think that is just playing with names. In Apache Subversion the "product
> > maintenance" is branches/1.8.x and branches/1.9.x (1.7.x and prior are
> > deprecated). trunk is for "next release".
> >
> > In your naming model, where we've seen the name "develop" for "next
> > release" (aka where all new dev occurs), then I'd say making it RTC is
> > harmful.
> >
> > trunk/master was shorthand for "where dev occurs". If you want to use a
> > different name... okay. :-)
>
> I don't believe it is just playing with names.  There are projects in
> which all non-trivial development occurs in feature branches.
>
> > Cheers,
> > -g
> >
> > ps. fwiw, trunk/tags/branches isn't mandated in svn either. It was just an
> > ad hoc template we came up with back near the start of the project. We
> > assumed third-party tools would focus around that naming, which is
> > generally true, but svn itself has never cared.
>
> Ack.
>
> - Sam Ruby
>
>
>


Re: [VOTE] Accept Kudu into the Apache Incubator

2015-11-25 Thread Rob Vesse
+1 (binding)

Rob

On 24/11/2015 19:32, "Todd Lipcon"  wrote:

>Hi all,
>
>Discussion on the [DISCUSS] thread seems to have wound down, so I'd like
>to
>call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
>pasted below and also available on the wiki at:
>https://wiki.apache.org/incubator/KuduProposal
>
>The proposal is unchanged since the original version, except for the
>addition of Carl Steinbach as a Mentor.
>
>Please cast your votes:
>
>[] +1, accept Kudu into the Incubator
>[] +/-0, positive/negative non-counted expression of feelings
>[] -1, do not accept Kudu into the incubator (please state reasoning)
>
>Given the US holiday this week, I imagine many folks are traveling or
>otherwise offline. So, let's run the vote for a full week rather than the
>traditional 72 hours. Unless the IPMC objects to the extended voting
>period, the vote will close on Tues, Dec 1st at noon PST.
>
>Thanks
>-Todd
>-
>
>= Kudu Proposal =
>
>== Abstract ==
>
>Kudu is a distributed columnar storage engine built for the Apache Hadoop
>ecosystem.
>
>== Proposal ==
>
>Kudu is an open source storage engine for structured data which supports
>low-latency random access together with efficient analytical access
>patterns. Kudu distributes data using horizontal partitioning and
>replicates each partition using Raft consensus, providing low
>mean-time-to-recovery and low tail latencies. Kudu is designed within the
>context of the Apache Hadoop ecosystem and supports many integrations with
>other data analytics projects both inside and outside of the Apache
>Software Foundation.
>
>
>
>We propose to incubate Kudu as a project of the Apache Software
>Foundation.
>
>== Background ==
>
>In recent years, explosive growth in the amount of data being generated
>and
>captured by enterprises has resulted in the rapid adoption of open source
>technology which is able to store massive data sets at scale and at low
>cost. In particular, the Apache Hadoop ecosystem has become a focal point
>for such “big data” workloads, because many traditional open source
>database systems have lagged in offering a scalable alternative.
>
>
>
>Structured storage in the Hadoop ecosystem has typically been achieved in
>two ways: for static data sets, data is typically stored on Apache HDFS
>using binary data formats such as Apache Avro or Apache Parquet. However,
>neither HDFS nor these formats has any provision for updating individual
>records, or for efficient random access. Mutable data sets are typically
>stored in semi-structured stores such as Apache HBase or Apache Cassandra.
>These systems allow for low-latency record-level reads and writes, but lag
>far behind the static file formats in terms of sequential read throughput
>for applications such as SQL-based analytics or machine learning.
>
>
>
>Kudu is a new storage system designed and implemented from the ground up
>to
>fill this gap between high-throughput sequential-access storage systems
>such as HDFS and low-latency random-access systems such as HBase or
>Cassandra. While these existing systems continue to hold advantages in
>some
>situations, Kudu offers a “happy medium” alternative that can dramatically
>simplify the architecture of many common workloads. In particular, Kudu
>offers a simple API for row-level inserts, updates, and deletes, while
>providing table scans at throughputs similar to Parquet, a commonly-used
>columnar format for static data.
>
>
>
>More information on Kudu can be found at the existing open source project
>website: http://getkudu.io and in particular in the Kudu white-paper PDF:
>http://getkudu.io/kudu.pdf from which the above was excerpted.
>
>== Rationale ==
>
>As described above, Kudu fills an important gap in the open source storage
>ecosystem. After our initial open source project release in September
>2015,
>we have seen a great amount of interest across a diverse set of users and
>companies. We believe that, as a storage system, it is critical to build
>an
>equally diverse set of contributors in the development community. Our
>experiences as committers and PMC members on other Apache projects have
>taught us the value of diverse communities in ensuring both longevity and
>high quality for such foundational systems.
>
>== Initial Goals ==
>
> * Move the existing codebase, website, documentation, and mailing lists
>to
>Apache-hosted infrastructure
> * Work with the infrastructure team to implement and approve our code
>review, build, and testing workflows in the context of the ASF
> * Incremental development and releases per Apache guidelines
>
>== Current Status ==
>
> Releases 
>
>Kudu has undergone one public release, tagged here
>https://github.com/cloudera/kudu/tree/kudu0.5.0-release
>
>This initial release was not performed in the typical ASF fashion -- no
>source tarball was released, but rather only convenience binaries made
>available in Cloudera’s repositories. We will adopt 

Re: RTC vs CTR (was: Concerning Sentry...)

2015-11-25 Thread Harbs
AIUI, there are two ways to go about RTC, both of which are easier in Git:
1) Working in feature/bug fix branches. Assuming RTC only applies to the main 
branch, changes are done in separate branches where commits do not require 
review. The feature/bug fix branch is then only merged back in after it has had 
a review. This is easier because branching and merging are almost zero effort 
in Git. Many Git workflows don’t work on the main branch anyway, so this is a 
particularly good fit for those workflows.
2) Pull requests. Using pull requests, all changes can be pulled in with a 
single command.

I’ve personally never participated in RTC (unless you count GitHub projects and 
the time before I was a committer in Flex), so it could be I’m missing something.

Of course there’s nothing to ENFORCE that the commit is not done before a 
review, but why would you want to do that? That’s where trust comes into play… ;-)

Harbs

On Nov 25, 2015, at 4:08 AM, Konstantin Boudnik  wrote:

> I don't think Git is particularly empowering RTC - there's nothing in it that
> requires someone to look over one's shoulder.



Re: [VOTE] Accept Impala into the Apache Incubator

2015-11-25 Thread Alex Karasulu
+1 (binding)

On Wed, Nov 25, 2015 at 10:44 AM, Tom White  wrote:

> +1 (binding)
>
> Tom
>
> On Tue, Nov 24, 2015 at 9:03 PM, Henry Robinson wrote:
> > Hi -
> >
> > The [DISCUSS] thread has been quiet for a few days, so I think there's been
> > sufficient opportunity for discussion around our proposal to bring Impala
> > to the ASF Incubator.
> >
> > I'd like to call a VOTE on that proposal, which is on the wiki at
> > https://wiki.apache.org/incubator/ImpalaProposal, and which I've pasted
> > below.
> >
> > During the discussion period, the proposal has been amended to add Brock
> > Noland as a new mentor, to add one missed committer from the list and to
> > correct some issues with the dependency list.
> >
> > Please cast your votes as follows:
> >
> > [] +1, accept Impala into the Incubator
> > [] +/-0, non-counted vote to express a disposition
> > [] -1, do not accept Impala into the Incubator (please give your reason(s))
> >
> > As with the concurrent Kudu vote, I propose leaving the vote open for a
> > full seven days (to close at Tuesday, December 1st at noon PST), due to the
> > upcoming US holiday.
> >
> > Thanks,
> > Henry
> >
> > 
> >
> > = Abstract =
> > Impala is a high-performance C++ and Java SQL query engine for data stored
> > in Apache Hadoop-based clusters.
> >
> > = Proposal =
> >
> > We propose to contribute the Impala codebase and associated artifacts (e.g.
> > documentation, web-site content etc.) to the Apache Software Foundation
> > with the intent of forming a productive, meritocratic and open community
> > around Impala’s continued development, according to the ‘Apache Way’.
> >
> > Cloudera owns several trademarks regarding Impala, and proposes to transfer
> > ownership of those trademarks in full to the ASF.
> >
> > = Background =
> > Engineers at Cloudera developed Impala and released it as an
> > Apache-licensed open-source project in Fall 2012. Impala was written as a
> > brand-new, modern C++ SQL engine targeted from the start for data stored in
> > Apache Hadoop clusters.
> >
> > Impala’s most important benefit to users is high-performance, making it
> > extremely appropriate for common enterprise analytic and business
> > intelligence workloads. This is achieved by a number of software
> > techniques, including: native support for data stored in HDFS and related
> > filesystems, just-in-time compilation and optimization of individual query
> > plans, high-performance C++ codebase and massively-parallel distributed
> > architecture. In benchmarks, Impala is routinely amongst the very highest
> > performing SQL query engines.
> >
> > = Rationale =
> >
> > Despite the exciting innovation in the so-called ‘big-data’ space, SQL
> > remains by far the most common interface for interacting with data in both
> > traditional warehouses and modern ‘big-data’ clusters. There is clearly a
> > need, as evidenced by the eager adoption of Impala and other SQL engines in
> > enterprise contexts, for a query engine that offers the familiar SQL
> > interface, but that has been specifically designed to operate in massive,
> > distributed clusters rather than in traditional, fixed-hardware,
> > warehouse-specific deployments. Impala is one such query engine.
> >
> > We believe that the ASF is the right venue to foster an open-source
> > community around Impala’s development. We expect that Impala will benefit
> > from more productive collaboration with related Apache projects, and under
> > the auspices of the ASF will attract talented contributors who will push
> > Impala’s development forward at pace.
> >
> > We believe that the timing is right for Impala’s development to move
> > wholesale to the ASF: Impala is well-established, has been Apache-licensed
> > open-source for more than three years, and the core project is relatively
> > stable. We are excited to see where an ASF-based community can take Impala
> > from this strong starting point.
> >
> > = Initial Goals =
> > Our initial goals are as follows:
> >
> >  * Establish ASF-compatible engineering practices and workflows
> >  * Refactor and publish existing internal build scripts and test
> > infrastructure, in order to make them usable by any community member.
> >  * Transfer source code, documentation and associated artifacts to the ASF.
> >  * Grow the user and developer communities
> >
> > = Current Status =
> >
> > Impala is developed as an Apache-licensed open-source project. The source
> > code is available at http://github.com/cloudera/Impala, and developer
> > documentation is at https://github.com/cloudera/Impala/wiki. The majority
> > of commits to the project have come from Cloudera-employed developers, but
> > we have accepted some contributions from individuals from other
> > organizations.
> >
> > All code reviews are done via a public instance of the Gerrit review tool
> > at http://gerrit.cloudera.org:8080/, and discussed on a 

Re: [VOTE] Accept Impala into the Apache Incubator

2015-11-25 Thread Tom White
+1 (binding)

Tom

On Tue, Nov 24, 2015 at 9:03 PM, Henry Robinson  wrote:
> Hi -
>
> The [DISCUSS] thread has been quiet for a few days, so I think there's been
> sufficient opportunity for discussion around our proposal to bring Impala
> to the ASF Incubator.
>
> I'd like to call a VOTE on that proposal, which is on the wiki at
> https://wiki.apache.org/incubator/ImpalaProposal, and which I've pasted
> below.
>
> During the discussion period, the proposal has been amended to add Brock
> Noland as a new mentor, to add one missed committer from the list and to
> correct some issues with the dependency list.
>
> Please cast your votes as follows:
>
> [] +1, accept Impala into the Incubator
> [] +/-0, non-counted vote to express a disposition
> [] -1, do not accept Impala into the Incubator (please give your reason(s))
>
> As with the concurrent Kudu vote, I propose leaving the vote open for a
> full seven days (to close at Tuesday, December 1st at noon PST), due to the
> upcoming US holiday.
>
> Thanks,
> Henry
>
> 
>
> = Abstract =
> Impala is a high-performance C++ and Java SQL query engine for data stored
> in Apache Hadoop-based clusters.
>
> = Proposal =
>
> We propose to contribute the Impala codebase and associated artifacts (e.g.
> documentation, web-site content etc.) to the Apache Software Foundation
> with the intent of forming a productive, meritocratic and open community
> around Impala’s continued development, according to the ‘Apache Way’.
>
> Cloudera owns several trademarks regarding Impala, and proposes to transfer
> ownership of those trademarks in full to the ASF.
>
> = Background =
> Engineers at Cloudera developed Impala and released it as an
> Apache-licensed open-source project in Fall 2012. Impala was written as a
> brand-new, modern C++ SQL engine targeted from the start for data stored in
> Apache Hadoop clusters.
>
> Impala’s most important benefit to users is high-performance, making it
> extremely appropriate for common enterprise analytic and business
> intelligence workloads. This is achieved by a number of software
> techniques, including: native support for data stored in HDFS and related
> filesystems, just-in-time compilation and optimization of individual query
> plans, high-performance C++ codebase and massively-parallel distributed
> architecture. In benchmarks, Impala is routinely amongst the very highest
> performing SQL query engines.
>
> = Rationale =
>
> Despite the exciting innovation in the so-called ‘big-data’ space, SQL
> remains by far the most common interface for interacting with data in both
> traditional warehouses and modern ‘big-data’ clusters. There is clearly a
> need, as evidenced by the eager adoption of Impala and other SQL engines in
> enterprise contexts, for a query engine that offers the familiar SQL
> interface, but that has been specifically designed to operate in massive,
> distributed clusters rather than in traditional, fixed-hardware,
> warehouse-specific deployments. Impala is one such query engine.
>
> We believe that the ASF is the right venue to foster an open-source
> community around Impala’s development. We expect that Impala will benefit
> from more productive collaboration with related Apache projects, and under
> the auspices of the ASF will attract talented contributors who will push
> Impala’s development forward at pace.
>
> We believe that the timing is right for Impala’s development to move
> wholesale to the ASF: Impala is well-established, has been Apache-licensed
> open-source for more than three years, and the core project is relatively
> stable. We are excited to see where an ASF-based community can take Impala
> from this strong starting point.
>
> = Initial Goals =
> Our initial goals are as follows:
>
>  * Establish ASF-compatible engineering practices and workflows
>  * Refactor and publish existing internal build scripts and test
> infrastructure, in order to make them usable by any community member.
>  * Transfer source code, documentation and associated artifacts to the ASF.
>  * Grow the user and developer communities
>
> = Current Status =
>
> Impala is developed as an Apache-licensed open-source project. The source
> code is available at http://github.com/cloudera/Impala, and developer
> documentation is at https://github.com/cloudera/Impala/wiki. The majority
> of commits to the project have come from Cloudera-employed developers, but
> we have accepted some contributions from individuals from other
> organizations.
>
> All code reviews are done via a public instance of the Gerrit review tool
> at http://gerrit.cloudera.org:8080/, and discussed on a public mailing
> list. All patches must be reviewed before they are accepted into the
> codebase, via a voting mechanism that is similar to that used on Apache
> projects such as Hadoop and HBase.
>
> Before a patch is committed, it must pass a suite of pre-commit tests.
> These tests are currently run on Cloudera’s internal 

Re: [VOTE] Accept Kudu into the Apache Incubator

2015-11-25 Thread Tom White
+1 (binding)

Tom

On Tue, Nov 24, 2015 at 7:32 PM, Todd Lipcon  wrote:
> Hi all,
>
> Discussion on the [DISCUSS] thread seems to have wound down, so I'd like to
> call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
> pasted below and also available on the wiki at:
> https://wiki.apache.org/incubator/KuduProposal
>
> The proposal is unchanged since the original version, except for the
> addition of Carl Steinbach as a Mentor.
>
> Please cast your votes:
>
> [] +1, accept Kudu into the Incubator
> [] +/-0, positive/negative non-counted expression of feelings
> [] -1, do not accept Kudu into the incubator (please state reasoning)
>
> Given the US holiday this week, I imagine many folks are traveling or
> otherwise offline. So, let's run the vote for a full week rather than the
> traditional 72 hours. Unless the IPMC objects to the extended voting
> period, the vote will close on Tues, Dec 1st at noon PST.
>
> Thanks
> -Todd
> -
>
> = Kudu Proposal =
>
> == Abstract ==
>
> Kudu is a distributed columnar storage engine built for the Apache Hadoop
> ecosystem.
>
> == Proposal ==
>
> Kudu is an open source storage engine for structured data which supports
> low-latency random access together with efficient analytical access
> patterns. Kudu distributes data using horizontal partitioning and
> replicates each partition using Raft consensus, providing low
> mean-time-to-recovery and low tail latencies. Kudu is designed within the
> context of the Apache Hadoop ecosystem and supports many integrations with
> other data analytics projects both inside and outside of the Apache
> Software Foundation.
>
>
>
> We propose to incubate Kudu as a project of the Apache Software Foundation.
>
> == Background ==
>
> In recent years, explosive growth in the amount of data being generated and
> captured by enterprises has resulted in the rapid adoption of open source
> technology which is able to store massive data sets at scale and at low
> cost. In particular, the Apache Hadoop ecosystem has become a focal point
> for such “big data” workloads, because many traditional open source
> database systems have lagged in offering a scalable alternative.
>
>
>
> Structured storage in the Hadoop ecosystem has typically been achieved in
> two ways: for static data sets, data is typically stored on Apache HDFS
> using binary data formats such as Apache Avro or Apache Parquet. However,
> neither HDFS nor these formats has any provision for updating individual
> records, or for efficient random access. Mutable data sets are typically
> stored in semi-structured stores such as Apache HBase or Apache Cassandra.
> These systems allow for low-latency record-level reads and writes, but lag
> far behind the static file formats in terms of sequential read throughput
> for applications such as SQL-based analytics or machine learning.
>
>
>
> Kudu is a new storage system designed and implemented from the ground up to
> fill this gap between high-throughput sequential-access storage systems
> such as HDFS and low-latency random-access systems such as HBase or
> Cassandra. While these existing systems continue to hold advantages in some
> situations, Kudu offers a “happy medium” alternative that can dramatically
> simplify the architecture of many common workloads. In particular, Kudu
> offers a simple API for row-level inserts, updates, and deletes, while
> providing table scans at throughputs similar to Parquet, a commonly-used
> columnar format for static data.
>
>
>
> More information on Kudu can be found at the existing open source project
> website: http://getkudu.io and in particular in the Kudu white-paper PDF:
> http://getkudu.io/kudu.pdf from which the above was excerpted.
>
> == Rationale ==
>
> As described above, Kudu fills an important gap in the open source storage
> ecosystem. After our initial open source project release in September 2015,
> we have seen a great amount of interest across a diverse set of users and
> companies. We believe that, as a storage system, it is critical to build an
> equally diverse set of contributors in the development community. Our
> experiences as committers and PMC members on other Apache projects have
> taught us the value of diverse communities in ensuring both longevity and
> high quality for such foundational systems.
>
> == Initial Goals ==
>
>  * Move the existing codebase, website, documentation, and mailing lists to
> Apache-hosted infrastructure
>  * Work with the infrastructure team to implement and approve our code
> review, build, and testing workflows in the context of the ASF
>  * Incremental development and releases per Apache guidelines
>
> == Current Status ==
>
>  Releases 
>
> Kudu has undergone one public release, tagged here
> https://github.com/cloudera/kudu/tree/kudu0.5.0-release
>
> This initial release was not performed in the typical ASF fashion -- no
> source tarball was released, but rather only convenience binaries made
> 

Re: [VOTE] Accept Impala into the Apache Incubator

2015-11-25 Thread Amol Kekre
+1 (non-binding)

Amol


On Wed, Nov 25, 2015 at 12:44 AM, Tom White  wrote:

> +1 (binding)
>
> Tom
>
> On Tue, Nov 24, 2015 at 9:03 PM, Henry Robinson wrote:
> > Hi -
> >
> > The [DISCUSS] thread has been quiet for a few days, so I think there's been
> > sufficient opportunity for discussion around our proposal to bring Impala
> > to the ASF Incubator.
> >
> > I'd like to call a VOTE on that proposal, which is on the wiki at
> > https://wiki.apache.org/incubator/ImpalaProposal, and which I've pasted
> > below.
> >
> > During the discussion period, the proposal has been amended to add Brock
> > Noland as a new mentor, to add one missed committer from the list and to
> > correct some issues with the dependency list.
> >
> > Please cast your votes as follows:
> >
> > [] +1, accept Impala into the Incubator
> > [] +/-0, non-counted vote to express a disposition
> > [] -1, do not accept Impala into the Incubator (please give your reason(s))
> >
> > As with the concurrent Kudu vote, I propose leaving the vote open for a
> > full seven days (to close at Tuesday, December 1st at noon PST), due to the
> > upcoming US holiday.
> >
> > Thanks,
> > Henry
> >
> > 
> >
> > = Abstract =
> > Impala is a high-performance C++ and Java SQL query engine for data stored
> > in Apache Hadoop-based clusters.
> >
> > = Proposal =
> >
> > We propose to contribute the Impala codebase and associated artifacts (e.g.
> > documentation, web-site content etc.) to the Apache Software Foundation
> > with the intent of forming a productive, meritocratic and open community
> > around Impala’s continued development, according to the ‘Apache Way’.
> >
> > Cloudera owns several trademarks regarding Impala, and proposes to transfer
> > ownership of those trademarks in full to the ASF.
> >
> > = Background =
> > Engineers at Cloudera developed Impala and released it as an
> > Apache-licensed open-source project in Fall 2012. Impala was written as a
> > brand-new, modern C++ SQL engine targeted from the start for data stored in
> > Apache Hadoop clusters.
> >
> > Impala’s most important benefit to users is high-performance, making it
> > extremely appropriate for common enterprise analytic and business
> > intelligence workloads. This is achieved by a number of software
> > techniques, including: native support for data stored in HDFS and related
> > filesystems, just-in-time compilation and optimization of individual query
> > plans, high-performance C++ codebase and massively-parallel distributed
> > architecture. In benchmarks, Impala is routinely amongst the very highest
> > performing SQL query engines.
> >
> > = Rationale =
> >
> > Despite the exciting innovation in the so-called ‘big-data’ space, SQL
> > remains by far the most common interface for interacting with data in both
> > traditional warehouses and modern ‘big-data’ clusters. There is clearly a
> > need, as evidenced by the eager adoption of Impala and other SQL engines in
> > enterprise contexts, for a query engine that offers the familiar SQL
> > interface, but that has been specifically designed to operate in massive,
> > distributed clusters rather than in traditional, fixed-hardware,
> > warehouse-specific deployments. Impala is one such query engine.
> >
> > We believe that the ASF is the right venue to foster an open-source
> > community around Impala’s development. We expect that Impala will benefit
> > from more productive collaboration with related Apache projects, and under
> > the auspices of the ASF will attract talented contributors who will push
> > Impala’s development forward at pace.
> >
> > We believe that the timing is right for Impala’s development to move
> > wholesale to the ASF: Impala is well-established, has been Apache-licensed
> > open-source for more than three years, and the core project is relatively
> > stable. We are excited to see where an ASF-based community can take Impala
> > from this strong starting point.
> >
> > = Initial Goals =
> > Our initial goals are as follows:
> >
> >  * Establish ASF-compatible engineering practices and workflows
> >  * Refactor and publish existing internal build scripts and test
> > infrastructure, in order to make them usable by any community member.
> >  * Transfer source code, documentation and associated artifacts to the ASF.
> >  * Grow the user and developer communities
> >
> > = Current Status =
> >
> > Impala is developed as an Apache-licensed open-source project. The source
> > code is available at http://github.com/cloudera/Impala, and developer
> > documentation is at https://github.com/cloudera/Impala/wiki. The majority
> > of commits to the project have come from Cloudera-employed developers, but
> > we have accepted some contributions from individuals from other
> > organizations.
> >
> > All code reviews are done via a public instance of the Gerrit review tool
> > at http://gerrit.cloudera.org:8080/, and 

Re: [VOTE] Accept Kudu into the Apache Incubator

2015-11-25 Thread Amol Kekre
+1 (non-binding)

Amol


On Wed, Nov 25, 2015 at 3:19 AM, Roman Shaposhnik  wrote:

> On Tue, Nov 24, 2015 at 11:32 AM, Todd Lipcon  wrote:
> > Hi all,
> >
> > Discussion on the [DISCUSS] thread seems to have wound down, so I'd like to
> > call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
> > pasted below and also available on the wiki at:
> > https://wiki.apache.org/incubator/KuduProposal
> >
> > The proposal is unchanged since the original version, except for the
> > addition of Carl Steinbach as a Mentor.
> >
> > Please cast your votes:
> >
> > [] +1, accept Kudu into the Incubator
> > [] +/-0, positive/negative non-counted expression of feelings
> > [] -1, do not accept Kudu into the incubator (please state reasoning)
>
> +1 (binding)
>
> Best of luck guys!
>
> Thanks,
> Roman.
>
> -
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
>


Re: [VOTE] Accept Kudu into the Apache Incubator

2015-11-25 Thread Roman Shaposhnik
On Tue, Nov 24, 2015 at 11:32 AM, Todd Lipcon  wrote:
> Hi all,
>
> Discussion on the [DISCUSS] thread seems to have wound down, so I'd like to
> call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
> pasted below and also available on the wiki at:
> https://wiki.apache.org/incubator/KuduProposal
>
> The proposal is unchanged since the original version, except for the
> addition of Carl Steinbach as a Mentor.
>
> Please cast your votes:
>
> [] +1, accept Kudu into the Incubator
> [] +/-0, positive/negative non-counted expression of feelings
> [] -1, do not accept Kudu into the incubator (please state reasoning)

+1 (binding)

Best of luck guys!

Thanks,
Roman.




Re: [VOTE] Accept Impala into the Apache Incubator

2015-11-25 Thread Roman Shaposhnik
On Tue, Nov 24, 2015 at 1:03 PM, Henry Robinson  wrote:
> Hi -
>
> The [DISCUSS] thread has been quiet for a few days, so I think there's been
> sufficient opportunity for discussion around our proposal to bring Impala
> to the ASF Incubator.
>
> I'd like to call a VOTE on that proposal, which is on the wiki at
> https://wiki.apache.org/incubator/ImpalaProposal, and which I've pasted
> below.
>
> During the discussion period, the proposal has been amended to add Brock
> Noland as a new mentor, to add one missed committer from the list and to
> correct some issues with the dependency list.
>
> Please cast your votes as follows:
>
> [] +1, accept Impala into the Incubator
> [] +/-0, non-counted vote to express a disposition
> [] -1, do not accept Impala into the Incubator (please give your reason(s))

-1 (binding)

I wasn't convinced by the results of the RTC vs. CTR discussion
and given the initial composition of the community, I'd like to see
an initial commitment to erring on the side of inclusiveness rather
than the walled-garden community protected by Gerrit.

Thanks,
Roman.




Re: [DISCUSS] Kudu incubator proposal

2015-11-25 Thread Ted Dunning
Since the contributors were employed at Cloudera, they probably signed an
invention assignment.  That means Cloudera can sign an SGA.

On Wed, Nov 25, 2015 at 11:39 AM, Greg Stein  wrote:

> On Mon, Nov 23, 2015 at 12:46 PM, Alex Harui  wrote:
>
> > On 11/23/15, 8:23 AM, "Mattmann, Chris A (3980)"
> >  wrote:
> >
> > >Alex,
> > >
> > >Please re-read my email. As I stated we don’t take code that
> > >authors don’t want us to have. So far, we haven’t heard from any of
> > >the authors on the incoming Kudu project that that’s the case. If
> > >it’s not the case, we go by the license of the project which stipulates
> > >how code can be copied, modified, reused, etc.
> >
> > Yes, but my interpretation of your words is that folks have to opt out,
> >
>
> Correct: opt-out.
>
> Since this code is under ALv2, we can import it to the ASF under that
> license. We have always done stuff like this, including other permissive
> licenses.
>
> But this isn't simply importing a library, this is saying "the ASF is now
> the primary locus of development for >this< code." And that's where people
> can say, "woah. I hate you guys. don't develop my code there", and so we
> nuke it.
>
> SGA/iCLA is to give us rights that we otherwise wouldn't have (ie. the code
> was under a different license).
>
> Cheers,
> -g
>