Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

2015-07-11 Thread Sean Busbey
As mentioned on HADOOP-12111, there is now an incubator-style proposal:
http://wiki.apache.org/incubator/YetusProposal

On Wed, Jun 24, 2015 at 9:41 AM, Sean Busbey  wrote:

> Hi Folks!
>
> Work in a feature branch is now being tracked by HADOOP-12111.
>
> On Thu, Jun 18, 2015 at 10:07 PM, Sean Busbey  wrote:
>
>> It looks like we have consensus.
>>
>> I'll start drafting up a proposal for the next board meeting (July 15th).
>> Once we work out the name I'll submit a PODLINGNAMESEARCH jira to track
>> that we did due diligence on whatever we pick.
>>
>> In the meantime, Hadoop PMC, would y'all be willing to host us in a
>> branch so that we can start prepping things now? We would want branch
>> commit rights for the proposed new PMC.
>>
>>
>> -Sean
>>
>>
>> On Mon, Jun 15, 2015 at 6:47 PM, Sean Busbey  wrote:
>>
>>> Oof. I had meant to push on this again but life got in the way and now
>>> the June board meeting is upon us. Sorry everyone. In the event that this
>>> ends up contentious, hopefully one of the copied communities can give us a
>>> branch to work in.
>>>
>>> I know everyone is busy, so here's the short version of this email: I'd
>>> like to move some of the code currently in Hadoop (test-patch) into a new
>>> TLP focused on QA tooling. I'm not sure what the best format for priming
>>> this conversation is. ORC filled in the incubator project proposal
>>> template, but I'm not sure how much that confused the issue. So to start,
>>> I'll just write what I'm hoping we can accomplish in general terms here.
>>>
>>> All software development projects that are community based (that is,
>>> accepting outside contributions) face a common QA problem for vetting
>>> in-coming contributions. Hadoop is fortunate enough to be sufficiently
>>> popular that the weight of the problem drove tool development (i.e.
>>> test-patch). That tool is generalizable enough that a bunch of other TLPs
>>> have adopted their own forks. Unfortunately, in most projects this kind of
>>> QA work is an enabler rather than a primary concern, so often the tooling
>>> is worked on ad-hoc and few shared improvements happen across projects.
>>> Since the tooling itself is never a primary concern, any improvement made
>>> is rarely reused outside of ASF projects.
>>>
>>> Over the last couple months a few of us have been working on
>>> generalizing the tooling present in the Hadoop code base (because it was
>>> the most mature out of all those in the various projects) and it's reached
>>> a point where we think we can start bringing on other downstream users.
>>> This means we need to start establishing things like a release cadence and
>>> to grow the new contributors we have to handle more project responsibility.
>>> Personally, I think that means it's time to move out from under Hadoop to
>>> drive things as our own community. Eventually, I hope the community can
>>> help draw in a group of folks traditionally underrepresented in ASF
>>> projects, namely QA and operations folks.
>>>
>>> I think test-patch by itself has enough scope to justify a project.
>>> Having a solid set of build tools that are customizable to fit the norms of
>>> different software communities is a bunch of work. Making it work well in
>>> both the context of automated test systems like Jenkins and for individual
>>> developers is even more work. We could easily also take over maintenance of
>>> things like shelldocs, since test-patch is the primary consumer of that
>>> currently but it's generally useful tooling.
>>>
>>> In addition to test-patch, I think the proposed project has some future
>>> growth potential. Given some adoption of test-patch to prove utility, the
>>> project could build on the ties it makes to start building tools to help
>>> projects do their own longer-run testing. Note that I'm talking about the
>>> tools to build QA processes and not a particular set of tested components.
>>> Specifically, I think the ChaosMonkey work that's in HBase should be
>>> generalizable as a fault injection framework (either based on that code or
>>> something like it). Doing this for arbitrary software is obviously very
>>> difficult, and a part of easing that will be to make (and then favor)
>>> tooling to allow projects to have operational glue that looks the same.
>>> Namely, the shell work that's been done in hadoop-functions.sh would be a
>>> great foundational layer that could bring good daemon handling practices to
>>> a whole slew of software projects. In the event that these frameworks and
>>> tools get adopted by parts of the Hadoop ecosystem, that could make the job
>>> of e.g. Bigtop substantially easier.
>>>
>>> I've reached out to a few folks who have been involved in the current
>>> test-patch work or expressed interest in helping out on getting it used in
>>> other projects. Right now, the proposed PMC would be (alphabetical by last
>>> name):
>>>
>>> * Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc,
>>> jclouds pmc, sqoop pmc, all 

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

2015-06-24 Thread Sean Busbey
Hi Folks!

Work in a feature branch is now being tracked by HADOOP-12111.

On Thu, Jun 18, 2015 at 10:07 PM, Sean Busbey  wrote:

> It looks like we have consensus.
>
> I'll start drafting up a proposal for the next board meeting (July 15th).
> Once we work out the name I'll submit a PODLINGNAMESEARCH jira to track
> that we did due diligence on whatever we pick.
>
> In the meantime, Hadoop PMC, would y'all be willing to host us in a branch
> so that we can start prepping things now? We would want branch commit
> rights for the proposed new PMC.
>
>
> -Sean
>
>
> On Mon, Jun 15, 2015 at 6:47 PM, Sean Busbey  wrote:
>
>> Oof. I had meant to push on this again but life got in the way and now
>> the June board meeting is upon us. Sorry everyone. In the event that this
>> ends up contentious, hopefully one of the copied communities can give us a
>> branch to work in.
>>
>> I know everyone is busy, so here's the short version of this email: I'd
>> like to move some of the code currently in Hadoop (test-patch) into a new
>> TLP focused on QA tooling. I'm not sure what the best format for priming
>> this conversation is. ORC filled in the incubator project proposal
>> template, but I'm not sure how much that confused the issue. So to start,
>> I'll just write what I'm hoping we can accomplish in general terms here.
>>
>> All software development projects that are community based (that is,
>> accepting outside contributions) face a common QA problem for vetting
>> in-coming contributions. Hadoop is fortunate enough to be sufficiently
>> popular that the weight of the problem drove tool development (i.e.
>> test-patch). That tool is generalizable enough that a bunch of other TLPs
>> have adopted their own forks. Unfortunately, in most projects this kind of
>> QA work is an enabler rather than a primary concern, so often the tooling
>> is worked on ad-hoc and few shared improvements happen across projects.
>> Since the tooling itself is never a primary concern, any improvement made
>> is rarely reused outside of ASF projects.
>>
>> Over the last couple months a few of us have been working on generalizing
>> the tooling present in the Hadoop code base (because it was the most mature
>> out of all those in the various projects) and it's reached a point where we
>> think we can start bringing on other downstream users. This means we need
>> to start establishing things like a release cadence and to grow the new
>> contributors we have to handle more project responsibility. Personally, I
>> think that means it's time to move out from under Hadoop to drive things as
>> our own community. Eventually, I hope the community can help draw in a
>> group of folks traditionally underrepresented in ASF projects, namely QA
>> and operations folks.
>>
>> I think test-patch by itself has enough scope to justify a project.
>> Having a solid set of build tools that are customizable to fit the norms of
>> different software communities is a bunch of work. Making it work well in
>> both the context of automated test systems like Jenkins and for individual
>> developers is even more work. We could easily also take over maintenance of
>> things like shelldocs, since test-patch is the primary consumer of that
>> currently but it's generally useful tooling.
>>
>> In addition to test-patch, I think the proposed project has some future
>> growth potential. Given some adoption of test-patch to prove utility, the
>> project could build on the ties it makes to start building tools to help
>> projects do their own longer-run testing. Note that I'm talking about the
>> tools to build QA processes and not a particular set of tested components.
>> Specifically, I think the ChaosMonkey work that's in HBase should be
>> generalizable as a fault injection framework (either based on that code or
>> something like it). Doing this for arbitrary software is obviously very
>> difficult, and a part of easing that will be to make (and then favor)
>> tooling to allow projects to have operational glue that looks the same.
>> Namely, the shell work that's been done in hadoop-functions.sh would be a
>> great foundational layer that could bring good daemon handling practices to
>> a whole slew of software projects. In the event that these frameworks and
>> tools get adopted by parts of the Hadoop ecosystem, that could make the job
>> of e.g. Bigtop substantially easier.
>>
>> I've reached out to a few folks who have been involved in the current
>> test-patch work or expressed interest in helping out on getting it used in
>> other projects. Right now, the proposed PMC would be (alphabetical by last
>> name):
>>
>> * Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc, jclouds
>> pmc, sqoop pmc, all around Jenkins expert)
>> * Sean Busbey (ASF member, accumulo pmc, hbase pmc)
>> * Nick Dimiduk (hbase pmc, phoenix pmc)
>> * Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
>> * Andrew Purtell  (ASF member, incubator pmc, bigtop pmc, hbase pmc,
>> phoenix pm

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

2015-06-24 Thread Nigel Daley
+1 for a separate project and going directly to TLP if possible (as Hadoop 
itself did when split out of Nutch)

+1 for having language discussions once it's a TLP :-)

Cheers,
Nigel

> On Jun 22, 2015, at 1:55 PM, Andrew Purtell  wrote:
> 
>> On Mon, Jun 22, 2015 at 1:03 PM, Nick Dimiduk  wrote:
>> 
>> On Mon, Jun 22, 2015 at 12:43 PM, Colin P. McCabe 
>> wrote:
>> 
>>> You mentioned that "most of our project will be focused on shell
>>> scripts" I guess based on the existing test-patch code.  Allen did a
>>> lot of good work in this area recently.  I am curious if you evaluated
>>> languages such as Python or Node.js for this use-case.  Shell scripts
>>> can get a little... tricky beyond a certain size.  On the other hand,
>>> if we are standardizing on shell, which shell and which version?
>>> Perhaps bash 3.5+?
>> 
>> I'll also add that shell is not helpful for a cross-platform set of
>> tooling. I recently added a daemon to Apache Phoenix; an explicit
>> requirement was Windows support. I ended up implementing a solution in
>> python because that environment is platform-agnostic and still systems-y
>> enough. I think this is something this project should seriously consider.
> 
> In my opinion, historically, test-patch hasn't needed to be cross platform
> because the only first class development environment for Hadoop has been
> Linux. Growing beyond this could absolutely be one focus of Yetus should
> that be a consensus goal of the community. The seed of the project, though,
> is today's test-patch, which is implemented in bash. That's where we are
> today. Language "discussions" (smile) can and should be forward looking.
> 
> 
>> On Mon, Jun 22, 2015 at 1:03 PM, Nick Dimiduk  wrote:
>> 
>> On Mon, Jun 22, 2015 at 12:43 PM, Colin P. McCabe 
>> wrote:
>> 
>>> You mentioned that "most of our project will be focused on shell
>>> scripts" I guess based on the existing test-patch code.  Allen did a
>>> lot of good work in this area recently.  I am curious if you evaluated
>>> languages such as Python or Node.js for this use-case.  Shell scripts
>>> can get a little... tricky beyond a certain size.  On the other hand,
>>> if we are standardizing on shell, which shell and which version?
>>> Perhaps bash 3.5+?
>> 
>> I'll also add that shell is not helpful for a cross-platform set of
>> tooling. I recently added a daemon to Apache Phoenix; an explicit
>> requirement was Windows support. I ended up implementing a solution in
>> python because that environment is platform-agnostic and still systems-y
>> enough. I think this is something this project should seriously consider.
>> 
>> -n
>> 
>> On Tue, Jun 16, 2015 at 7:55 PM, Sean Busbey  wrote:
>>>>> I'm going to try responding to several things at once here, so apologies if
>>>>> I miss anyone and sorry for the long email. :)
>>>>>
>>>>>
>>>>> On Tue, Jun 16, 2015 at 3:44 PM, Steve Loughran <ste...@hortonworks.com>
>>>>> wrote:
>>>>>
>>>>>> I think it's good to have a general build/test process projects can share,
>>>>>> so +1 to pulling it out. You should get help from others.
>>>>>>
>>>>>> regarding incubation, it is a lot of work, especially for something that's
>>>>>> more of an in-house tool than an artifact to release and redistribute.
>>>>>>
>>>>>> You can't just use apache labs or the build project's repo to work on this?
>>>>>>
>>>>>> if you do want to incubate, we may want to nominate the hadoop project as
>>>>>> the monitoring PMC, rather than incubator@.
>>>>>>
>>>>>> -steve
>>>>> Important note: we're proposing a board resolution that would directly pull
>>>>> this code base out into a new TLP; there'd be no incubator, we'd just
>>>>> continue building community and start making releases.
>>>>>
>>>>> The proposed PMC believes the tooling we're talking about has direct
>>>>> applicability to projects well outside of the ASF. Lots of other open
>>>>> source projects run on community contributions and have a general need for
>>>>> better QA tools. Given that problem set and the presence of a community
>>>>> working to solve it, there's no reason this needs to be treated as an
>>>>> in-house build project. We certainly want to be useful to ASF projects and
>>>>> getting them on-board given our current optimization for ASF infra will
>>>>> certainly be easier, but we're not limited to that (and our current
>>>>> prerequisites, a CI tool and jira or github, are pretty broadly available).
>>>>>
>>>>>
>>>>> On Tue, Jun 16, 2015 at 10:13 AM, Nick Dimiduk wrote:
>>>>>
>>>>>>
>>>>>> Since we're tossing out names, how about Apache Bootstrap? It's a
>>>>>> meta-project to help other projects get off the ground, after all.
>>>>>
>>>>>
>>>>> There's already a web development framework named Bootstrap[1]. It's also
>>>>> used by several ASF projects, so I think it best to avoid the confusion.
>>>>>
>>>>> The name is, of course, up to the proposed PMC. As a bit of background, the
>>>>> current name Yetus fulfills Allen's desire to ha

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

2015-06-22 Thread Andrew Purtell
On Mon, Jun 22, 2015 at 1:03 PM, Nick Dimiduk  wrote:

> On Mon, Jun 22, 2015 at 12:43 PM, Colin P. McCabe 
> wrote:
>
> > You mentioned that "most of our project will be focused on shell
> > scripts" I guess based on the existing test-patch code.  Allen did a
> > lot of good work in this area recently.  I am curious if you evaluated
> > languages such as Python or Node.js for this use-case.  Shell scripts
> > can get a little... tricky beyond a certain size.  On the other hand,
> > if we are standardizing on shell, which shell and which version?
> > Perhaps bash 3.5+?
> >
>
> I'll also add that shell is not helpful for a cross-platform set of
> tooling. I recently added a daemon to Apache Phoenix; an explicit
> requirement was Windows support. I ended up implementing a solution in
> python because that environment is platform-agnostic and still systems-y
> enough. I think this is something this project should seriously consider.
>

In my opinion, historically, test-patch hasn't needed to be cross platform
because the only first class development environment for Hadoop has been
Linux. Growing beyond this could absolutely be one focus of Yetus should
that be a consensus goal of the community. The seed of the project, though,
is today's test-patch, which is implemented in bash. That's where we are
today. Language "discussions" (smile) can and should be forward looking.


On Mon, Jun 22, 2015 at 1:03 PM, Nick Dimiduk  wrote:

> On Mon, Jun 22, 2015 at 12:43 PM, Colin P. McCabe 
> wrote:
>
> > You mentioned that "most of our project will be focused on shell
> > scripts" I guess based on the existing test-patch code.  Allen did a
> > lot of good work in this area recently.  I am curious if you evaluated
> > languages such as Python or Node.js for this use-case.  Shell scripts
> > can get a little... tricky beyond a certain size.  On the other hand,
> > if we are standardizing on shell, which shell and which version?
> > Perhaps bash 3.5+?
> >
>
> I'll also add that shell is not helpful for a cross-platform set of
> tooling. I recently added a daemon to Apache Phoenix; an explicit
> requirement was Windows support. I ended up implementing a solution in
> python because that environment is platform-agnostic and still systems-y
> enough. I think this is something this project should seriously consider.
>
> -n
>
> On Tue, Jun 16, 2015 at 7:55 PM, Sean Busbey  wrote:
> > > > I'm going to try responding to several things at once here, so apologies if
> > > > I miss anyone and sorry for the long email. :)
> > > >
> > > >
> > > > On Tue, Jun 16, 2015 at 3:44 PM, Steve Loughran <ste...@hortonworks.com>
> > > > wrote:
> > > >
> > > >> I think it's good to have a general build/test process projects can share,
> > > >> so +1 to pulling it out. You should get help from others.
> > > >>
> > > >> regarding incubation, it is a lot of work, especially for something that's
> > > >> more of an in-house tool than an artifact to release and redistribute.
> > > >>
> > > >> You can't just use apache labs or the build project's repo to work on this?
> > > >>
> > > >> if you do want to incubate, we may want to nominate the hadoop project as
> > > >> the monitoring PMC, rather than incubator@.
> > > >>
> > > >> -steve
> > > >>
> > > >>
> > > > Important note: we're proposing a board resolution that would directly pull
> > > > this code base out into a new TLP; there'd be no incubator, we'd just
> > > > continue building community and start making releases.
> > > >
> > > > The proposed PMC believes the tooling we're talking about has direct
> > > > applicability to projects well outside of the ASF. Lots of other open
> > > > source projects run on community contributions and have a general need for
> > > > better QA tools. Given that problem set and the presence of a community
> > > > working to solve it, there's no reason this needs to be treated as an
> > > > in-house build project. We certainly want to be useful to ASF projects and
> > > > getting them on-board given our current optimization for ASF infra will
> > > > certainly be easier, but we're not limited to that (and our current
> > > > prerequisites, a CI tool and jira or github, are pretty broadly available).
> > > >
> > > >
> > > > On Tue, Jun 16, 2015 at 10:13 AM, Nick Dimiduk wrote:
> > > >
> > > >>
> > > >> Since we're tossing out names, how about Apache Bootstrap? It's a
> > > >> meta-project to help other projects get off the ground, after all.
> > > >>
> > > >
> > > >
> > > > There's already a web development framework named Bootstrap[1]. It's also
> > > > used by several ASF projects, so I think it best to avoid the confusion.
> > > >
> > > > The name is, of course, up to the proposed PMC. As a bit of background, the
> > > > current name Yetus fulfills Allen's desire to have something shell related
> > > > and my desire to have a project that starts with Y (there are currently no
> > > > ASF projects that start with Y). The universe of names that fill in these
> > > > two is very small, AFAICT. I did a brie

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

2015-06-22 Thread Nick Dimiduk
On Mon, Jun 22, 2015 at 12:43 PM, Colin P. McCabe 
wrote:

> You mentioned that "most of our project will be focused on shell
> scripts" I guess based on the existing test-patch code.  Allen did a
> lot of good work in this area recently.  I am curious if you evaluated
> languages such as Python or Node.js for this use-case.  Shell scripts
> can get a little... tricky beyond a certain size.  On the other hand,
> if we are standardizing on shell, which shell and which version?
> Perhaps bash 3.5+?
>

I'll also add that shell is not helpful for a cross-platform set of
tooling. I recently added a daemon to Apache Phoenix; an explicit
requirement was Windows support. I ended up implementing a solution in
python because that environment is platform-agnostic and still systems-y
enough. I think this is something this project should seriously consider.
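
To make the platform-agnostic point concrete, here is a minimal sketch in Python of launching a background process on both Windows and POSIX systems. It is not the Phoenix code; the helper names `background_popen_kwargs` and `start_background` are hypothetical, and it relies only on the standard `subprocess` module.

```python
import subprocess
import sys


def background_popen_kwargs(platform=sys.platform):
    """Return Popen keyword arguments that detach a child on this platform."""
    if platform.startswith("win"):
        # CREATE_NEW_PROCESS_GROUP is only defined in the Windows build of
        # subprocess, so fall back to its Win32 numeric value elsewhere.
        flag = getattr(subprocess, "CREATE_NEW_PROCESS_GROUP", 0x00000200)
        return {"creationflags": flag}
    # POSIX: run the child in a new session (setsid), away from our terminal.
    return {"start_new_session": True}


def start_background(command):
    """Launch `command` (a list of strings) without waiting for it to finish."""
    return subprocess.Popen(command, **background_popen_kwargs())
```

Something like `start_background([sys.executable, "-m", "http.server"])` would return a `Popen` handle right away on either platform. A real daemon would also need pid files, logging and shutdown handling, which this sketch leaves out.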

-n

On Tue, Jun 16, 2015 at 7:55 PM, Sean Busbey  wrote:
> > I'm going to try responding to several things at once here, so apologies if
> > I miss anyone and sorry for the long email. :)
> >
> >
> > On Tue, Jun 16, 2015 at 3:44 PM, Steve Loughran
> > wrote:
> >
> >> I think it's good to have a general build/test process projects can share,
> >> so +1 to pulling it out. You should get help from others.
> >>
> >> regarding incubation, it is a lot of work, especially for something that's
> >> more of an in-house tool than an artifact to release and redistribute.
> >>
> >> You can't just use apache labs or the build project's repo to work on this?
> >>
> >> if you do want to incubate, we may want to nominate the hadoop project as
> >> the monitoring PMC, rather than incubator@.
> >>
> >> -steve
> >>
> >>
> > Important note: we're proposing a board resolution that would directly pull
> > this code base out into a new TLP; there'd be no incubator, we'd just
> > continue building community and start making releases.
> >
> > The proposed PMC believes the tooling we're talking about has direct
> > applicability to projects well outside of the ASF. Lots of other open
> > source projects run on community contributions and have a general need for
> > better QA tools. Given that problem set and the presence of a community
> > working to solve it, there's no reason this needs to be treated as an
> > in-house build project. We certainly want to be useful to ASF projects and
> > getting them on-board given our current optimization for ASF infra will
> > certainly be easier, but we're not limited to that (and our current
> > prerequisites, a CI tool and jira or github, are pretty broadly available).
> >
> >
> > On Tue, Jun 16, 2015 at 10:13 AM, Nick Dimiduk wrote:
> >
> >>
> >> Since we're tossing out names, how about Apache Bootstrap? It's a
> >> meta-project to help other projects get off the ground, after all.
> >>
> >
> >
> > There's already a web development framework named Bootstrap[1]. It's also
> > used by several ASF projects, so I think it best to avoid the confusion.
> >
> > The name is, of course, up to the proposed PMC. As a bit of background, the
> > current name Yetus fulfills Allen's desire to have something shell related
> > and my desire to have a project that starts with Y (there are currently no
> > ASF projects that start with Y). The universe of names that fill in these
> > two is very small, AFAICT. I did a brief suitability search and didn't find
> > any blockers.
> >
> >
> >  On Tue, Jun 16, 2015 at 11:59 AM, Allen Wittenauer 
> >  wrote:
> >
> >>
> >> Since a couple of people have brought it up:
> >>
> >> I think the release question is probably one of the big question
> >> marks.  Other than tarballs, how does something like this actually get
> >> used downstream?
> >>
> >> For test-patch, in particular, I have a few thoughts on this:
> >>
> >> Short term:
> >>
> >> * Projects that want to move RIGHT NOW would modify their Jenkins
> >> jobs to check out from the Yetus repo (preferably at a well known tag or
> >> branch) in one directory and their project repo in another directory.  Then
> >> it’s just a matter of passing the correct flags to test-patch.  This is
> >> pretty much how I’ve been personally running test-patch for about 6 months
> >> now. Under Jenkins, we’ve seen this work with NiFi (incubating) already.
> >>
> >> * Create a stub version of test-patch that projects could check
> >> into their repo, replacing the existing test-patch.  This stub version
> >> would git clone from either ASF or github and then execute test-patch
> >> accordingly on demand.  With the correct smarts, it could make sure it has
> >> a cached version to prevent continual clones.
> >>
> >> Longer term:
> >>
> >> * I’ve been toying with the idea of (ab)using Java repos and
> >> packaging as a transportation layer, either in addition or in combination
> >> with something like a maven plugin.  Something like this would clearly be
> >> better for offline usage and/or to lower the network tra

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

2015-06-22 Thread Colin P. McCabe
+1 for making this a separate project.  We've always struggled with a
lot of forks of the test-patch code and perhaps this project can help
create something that works well for multiple projects.

Bypassing the incubator seems kind of weird (I didn't know that was an
option) but I will let other people with more experience in the ASF
comment on that.

You mentioned that "most of our project will be focused on shell
scripts" I guess based on the existing test-patch code.  Allen did a
lot of good work in this area recently.  I am curious if you evaluated
languages such as Python or Node.js for this use-case.  Shell scripts
can get a little... tricky beyond a certain size.  On the other hand,
if we are standardizing on shell, which shell and which version?
Perhaps bash 3.5+?
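
To make the version question concrete, here is a minimal sketch of how a minimum bash version could be enforced before any tooling runs. It is written in Python purely for illustration and the function names are hypothetical. As far as I know the bash 3.x series stopped at 3.2, so the default minimum below is (3, 2); treat that number as an assumption, not a decision.

```python
import re

# Hypothetical helper names; nothing here comes from test-patch itself.
_VERSION_RE = re.compile(r"version (\d+)\.(\d+)")


def parse_bash_version(banner):
    """Extract (major, minor) from the text printed by `bash --version`."""
    match = _VERSION_RE.search(banner)
    if match is None:
        raise ValueError("no bash version found in: %r" % banner)
    return int(match.group(1)), int(match.group(2))


def meets_minimum(banner, minimum=(3, 2)):
    """Return True when the reported bash is at least `minimum` (assumed value)."""
    return parse_bash_version(banner) >= minimum
```

The banner would normally come from something like `subprocess.check_output(["bash", "--version"])`, decoded to text. Keeping the parsing separate from the process call makes the check easy to test without a particular bash installed.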

Also, what will be the mechanism for customizing this for each
project?  Ideally the customizations needed would be small so we could
share the most code.
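
One possible shape for that, as a sketch only: shared defaults plus a small per-project override, with unknown keys rejected so projects cannot drift silently. The setting names below are made up for illustration and are not test-patch options, and Python is used only to show the idea.

```python
# Hypothetical defaults; the keys are illustrative, not real test-patch options.
DEFAULTS = {
    "build_tool": "maven",
    "issue_tracker": "jira",
    "checks": ["compile", "unit"],
}


def project_settings(overrides):
    """Merge a project's small override dict onto the shared defaults."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError("unknown setting(s): %s" % ", ".join(sorted(unknown)))
    settings = dict(DEFAULTS)  # shallow copy so DEFAULTS itself is not modified
    settings.update(overrides)
    return settings
```

A project that only differs in its build tool would then carry a one-line customization such as `project_settings({"build_tool": "ant"})`, and everything else stays shared.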

cheers,
Colin


On Tue, Jun 16, 2015 at 7:55 PM, Sean Busbey  wrote:
> I'm going to try responding to several things at once here, so apologies if
> I miss anyone and sorry for the long email. :)
>
>
> On Tue, Jun 16, 2015 at 3:44 PM, Steve Loughran 
> wrote:
>
>> I think it's good to have a general build/test process projects can share,
>> so +1 to pulling it out. You should get help from others.
>>
>> regarding incubation, it is a lot of work, especially for something that's
>> more of an in-house tool than an artifact to release and redistribute.
>>
>> You can't just use apache labs or the build project's repo to work on this?
>>
>> if you do want to incubate, we may want to nominate the hadoop project as
>> the monitoring PMC, rather than incubator@.
>>
>> -steve
>>
>>
> Important note: we're proposing a board resolution that would directly pull
> this code base out into a new TLP; there'd be no incubator, we'd just
> continue building community and start making releases.
>
> The proposed PMC believes the tooling we're talking about has direct
> applicability to projects well outside of the ASF. Lots of other open
> source projects run on community contributions and have a general need for
> better QA tools. Given that problem set and the presence of a community
> working to solve it, there's no reason this needs to be treated as an
> in-house build project. We certainly want to be useful to ASF projects and
> getting them on-board given our current optimization for ASF infra will
> certainly be easier, but we're not limited to that (and our current
> prerequisites, a CI tool and jira or github, are pretty broadly available).
>
>
> On Tue, Jun 16, 2015 at 10:13 AM, Nick Dimiduk  wrote:
>
>>
>> Since we're tossing out names, how about Apache Bootstrap? It's a
>> meta-project to help other projects get off the ground, after all.
>>
>
>
> There's already a web development framework named Bootstrap[1]. It's also
> used by several ASF projects, so I think it best to avoid the confusion.
>
> The name is, of course, up to the proposed PMC. As a bit of background, the
> current name Yetus fulfills Allen's desire to have something shell related
> and my desire to have a project that starts with Y (there are currently no
> ASF projects that start with Y). The universe of names that fill in these
> two is very small, AFAICT. I did a brief suitability search and didn't find
> any blockers.
>
>
>  On Tue, Jun 16, 2015 at 11:59 AM, Allen Wittenauer 
>  wrote:
>
>>
>> Since a couple of people have brought it up:
>>
>> I think the release question is probably one of the big question
>> marks.  Other than tarballs, how does something like this actually get
>> used downstream?
>>
>> For test-patch, in particular, I have a few thoughts on this:
>>
>> Short term:
>>
>> * Projects that want to move RIGHT NOW would modify their Jenkins
>> jobs to check out from the Yetus repo (preferably at a well known tag or
>> branch) in one directory and their project repo in another directory.  Then
>> it’s just a matter of passing the correct flags to test-patch.  This is
>> pretty much how I’ve been personally running test-patch for about 6 months
>> now. Under Jenkins, we’ve seen this work with NiFi (incubating) already.
>>
>> * Create a stub version of test-patch that projects could check
>> into their repo, replacing the existing test-patch.  This stub version
>> would git clone from either ASF or github and then execute test-patch
>> accordingly on demand.  With the correct smarts, it could make sure it has
>> a cached version to prevent continual clones.
>>
>> Longer term:
>>
>> * I’ve been toying with the idea of (ab)using Java repos and
>> packaging as a transportation layer, either in addition or in combination
>> with something like a maven plugin.  Something like this would clearly be
>> better for offline usage and/or to lower the network traffic.
>>
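
The stub idea quoted above (clone once, keep a cached copy, then run the real script) could look roughly like the sketch below. It is an illustration in Python, not the actual stub: the function names are hypothetical, and the repository URL, tag or branch, script path and flags are all left for the caller to supply because the thread does not specify them.

```python
import os
import subprocess


def clone_command(repo_url, cache_dir, ref):
    """Return the git command needed to fill the cache, or None if it exists."""
    if os.path.isdir(os.path.join(cache_dir, ".git")):
        return None  # reuse the cached checkout instead of cloning again
    # `git clone --branch` accepts either a branch name or a tag name.
    return ["git", "clone", "--branch", ref, repo_url, cache_dir]


def run_stub(repo_url, cache_dir, ref, script_path, args):
    """Clone into cache_dir when needed, then run the script found inside it."""
    command = clone_command(repo_url, cache_dir, ref)
    if command is not None:
        subprocess.check_call(command)
    script = os.path.join(cache_dir, script_path)
    # Pass the caller's flags through untouched; return the script's exit code.
    return subprocess.call([script] + list(args))
```

Pinning `ref` to a well known tag keeps runs reproducible. A real stub would also need a way to refresh a stale cache, which is not shown here.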
>
> It's important that the project follow ASF guidelines on publishing
> releases[2]. So long as we publish rel

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

2015-06-18 Thread Sean Busbey
It looks like we have consensus.

I'll start drafting up a proposal for the next board meeting (July 15th).
Once we work out the name I'll submit a PODLINGNAMESEARCH jira to track
that we did due diligence on whatever we pick.

In the meantime, Hadoop PMC, would y'all be willing to host us in a branch
so that we can start prepping things now? We would want branch commit
rights for the proposed new PMC.


-Sean


On Mon, Jun 15, 2015 at 6:47 PM, Sean Busbey  wrote:

> Oof. I had meant to push on this again but life got in the way and now the
> June board meeting is upon us. Sorry everyone. In the event that this ends
> up contentious, hopefully one of the copied communities can give us a
> branch to work in.
>
> I know everyone is busy, so here's the short version of this email: I'd
> like to move some of the code currently in Hadoop (test-patch) into a new
> TLP focused on QA tooling. I'm not sure what the best format for priming
> this conversation is. ORC filled in the incubator project proposal
> template, but I'm not sure how much that confused the issue. So to start,
> I'll just write what I'm hoping we can accomplish in general terms here.
>
> All software development projects that are community based (that is,
> accepting outside contributions) face a common QA problem for vetting
> in-coming contributions. Hadoop is fortunate enough to be sufficiently
> popular that the weight of the problem drove tool development (i.e.
> test-patch). That tool is generalizable enough that a bunch of other TLPs
> have adopted their own forks. Unfortunately, in most projects this kind of
> QA work is an enabler rather than a primary concern, so often the tooling
> is worked on ad hoc and few shared improvements happen across projects.
> Since the tooling itself is never a primary concern, any improvements made
> are rarely reused outside of ASF projects.
>
> Over the last couple months a few of us have been working on generalizing
> the tooling present in the Hadoop code base (because it was the most mature
> out of all those in the various projects) and it's reached a point where we
> think we can start bringing on other downstream users. This means we need
> to start establishing things like a release cadence and to grow the new
> contributors we have to handle more project responsibility. Personally, I
> think that means it's time to move out from under Hadoop to drive things as
> our own community. Eventually, I hope the community can help draw in a
> group of folks traditionally underrepresented in ASF projects, namely QA
> and operations folks.
>
> I think test-patch by itself has enough scope to justify a project. Having
> a solid set of build tools that are customizable to fit the norms of
> different software communities is a bunch of work. Making it work well in
> both the context of automated test systems like Jenkins and for individual
> developers is even more work. We could easily also take over maintenance of
> things like shelldocs, since test-patch is the primary consumer of that
> currently but it's generally useful tooling.
>
> In addition to test-patch, I think the proposed project has some future
> growth potential. Given some adoption of test-patch to prove utility, the
> project could build on the ties it makes to start building tools to help
> projects do their own longer-run testing. Note that I'm talking about the
> tools to build QA processes and not a particular set of tested components.
> Specifically, I think the ChaosMonkey work that's in HBase should be
> generalizable as a fault injection framework (either based on that code or
> something like it). Doing this for arbitrary software is obviously very
> difficult, and a part of easing that will be to make (and then favor)
> tooling to allow projects to have operational glue that looks the same.
> Namely, the shell work that's been done in hadoop-functions.sh would be a
> great foundational layer that could bring good daemon handling practices to
> a whole slew of software projects. In the event that these frameworks and
> tools get adopted by parts of the Hadoop ecosystem, that could make the job
> of e.g. Bigtop substantially easier.
>
> I've reached out to a few folks who have been involved in the current
> test-patch work or expressed interest in helping out on getting it used in
> other projects. Right now, the proposed PMC would be (alphabetical by last
> name):
>
> * Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc, jclouds
> pmc, sqoop pmc, all around Jenkins expert)
> * Sean Busbey (ASF member, accumulo pmc, hbase pmc)
> * Nick Dimiduk (hbase pmc, phoenix pmc)
> * Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
> * Andrew Purtell  (ASF member, incubator pmc, bigtop pmc, hbase pmc,
> phoenix pmc)
> * Allen Wittenauer (hadoop committer)
>
> That PMC gives us several members and a bunch of folks familiar with the
> ASF. Combined with the code already existing in Apache spaces, I think that
> gives us sufficient justif

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

2015-06-17 Thread Ray Chiang
For words beginning with "Y"

Yale: Mythical animal elephant/boar.  After some quick Googling, it's
apparently originally documented by "Pliny the Elder", so it's got a
beer-related connotation too.  Downside, might get confused with the
university of the same name.
Yare: adj for quick/agile/lively
Yucca: noun for a plant that an elephant *could* eat
Yair: Scottish term for a fish trap.  Also a Hebrew name for "He will
enlighten"

-Ray


On Wed, Jun 17, 2015 at 5:03 AM, Steve Loughran 
wrote:

>
> > On 17 Jun 2015, at 03:55, Sean Busbey  wrote:
> >
> > The name is, of course, up to the proposed PMC. As a bit of background, the
> > current name Yetus fulfills Allen's desire to have something shell related
> > and my desire to have a project that starts with Y (there are currently no
> > ASF projects that start with Y). The universe of names that fill in these
> > two is very small, AFAICT. I did a brief suitability search and didn't find
> > any blockers.
>
>
> Apache YouBrokeTheBuild?
>
> I'd thought of "yeti", but there's a couple of software projects/products
> called that already.
>
> Here's a complete list of things that live alongside elephants in
> Tanzania; nothing beginning with Y
>
> http://www.serengeti.org/animals.html
>
> if you pick one from that list I may have a photo for your slides
>


Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

2015-06-17 Thread Steve Loughran

> On 17 Jun 2015, at 03:55, Sean Busbey  wrote:
> 
> The name is, of course, up to the proposed PMC. As a bit of background, the
> current name Yetus fulfills Allen's desire to have something shell related
> and my desire to have a project that starts with Y (there are currently no
> ASF projects that start with Y). The universe of names that fill in these
> two is very small, AFAICT. I did a brief suitability search and didn't find
> any blockers.


Apache YouBrokeTheBuild?

I'd thought of "yeti", but there's a couple of software projects/products 
called that already.

Here's a complete list of things that live alongside elephants in Tanzania; 
nothing beginning with Y

http://www.serengeti.org/animals.html

if you pick one from that list I may have a photo for your slides


Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

2015-06-16 Thread Sean Busbey
I'm going to try responding to several things at once here, so apologies if
I miss anyone and sorry for the long email. :)


On Tue, Jun 16, 2015 at 3:44 PM, Steve Loughran 
wrote:

> I think it's good to have a general build/test process projects can share,
> so +1 to pulling it out. You should get help from others.
>
> regarding incubation, it is a lot of work, especially for something that's
> more of an in-house tool than an artifact to release and redistribute.
>
> You can't just use apache labs or the build project's repo to work on this?
>
> if you do want to incubate, we may want to nominate the hadoop project as
> the monitoring PMC, rather than incubator@.
>
> -steve
>
>
Important note: we're proposing a board resolution that would directly pull
this code base out into a new TLP; there'd be no incubator, we'd just
continue building community and start making releases.

The proposed PMC believes the tooling we're talking about has direct
applicability to projects well outside of the ASF. Lots of other open
source projects run on community contributions and have a general need for
better QA tools. Given that problem set and the presence of a community
working to solve it, there's no reason this needs to be treated as an
in-house build project. We certainly want to be useful to ASF projects and
getting them on-board given our current optimization for ASF infra will
certainly be easier, but we're not limited to that (and our current
prerequisites, a CI tool and jira or github, are pretty broadly available).


On Tue, Jun 16, 2015 at 10:13 AM, Nick Dimiduk  wrote:

>
> Since we're tossing out names, how about Apache Bootstrap? It's a
> meta-project to help other projects get off the ground, after all.
>


There's already a web development framework named Bootstrap[1]. It's also
used by several ASF projects, so I think it best to avoid the confusion.

The name is, of course, up to the proposed PMC. As a bit of background, the
current name Yetus fulfills Allen's desire to have something shell related
and my desire to have a project that starts with Y (there are currently no
ASF projects that start with Y). The universe of names that fill in these
two is very small, AFAICT. I did a brief suitability search and didn't find
any blockers.


 On Tue, Jun 16, 2015 at 11:59 AM, Allen Wittenauer 
 wrote:

>
> Since a couple of people have brought it up:
>
> I think the release question is probably one of the big question
> marks.  Other than tar balls, how does something like this actually get
> used downstream?
>
> For test-patch, in particular, I have a few thoughts on this:
>
> Short term:
>
> * Projects that want to move RIGHT NOW would modify their Jenkins
> jobs to checkout from the Yetus repo (preferably at a well known tag or
> branch) in one directory and their project repo in another directory.  Then
> it’s just a matter of passing the correct flags to test-patch.  This is
> pretty much how I’ve been personally running test-patch for about 6 months
> now. Under Jenkins, we’ve seen this work with NiFi (incubating) already.
>
> * Create a stub version of test-patch that projects could check
> into their repo, replacing the existing test-patch.  This stub version
> would git clone from either ASF or github and then execute test-patch
> accordingly on demand.  With the correct smarts, it could make sure it has
> a cached version to prevent continual clones.
>
> Longer term:
>
> * I’ve been toying with the idea of (ab)using Java repos and
> packaging as a transportation layer, either in addition or in combination
> with something like a maven plugin.  Something like this would clearly be
> better for offline usage and/or to lower the network traffic.
>

It's important that the project follow ASF guidelines on publishing
releases[2]. So long as we publish releases to the distribution directory I
think we'd be fine having folks work off of the corresponding tag. I'm not
sure there's much reason to do that, however. A Jenkins job can just as
easily grab a release tarball as a git tag and we're not talking about a
large amount of stuff. The kind of build setup that Chris N mentioned is
also totally doable now that there's a build description DSL for Jenkins[3].

For individual developers, I don't see any reason we can't package things
up as a tool, similar to how findbugs or shellcheck work. We can make OS
packages (or homebrew for OS X) if we want to make stand alone installation
on developer machines real easy. Those same packages could be installed on
the ASF build machines, provided some ASF project wanted to make use of
Yetus.

Having releases will incur some turn around time for when folks want to see
fixes, but that's a trade off around release cadence we can work out longer
term.

I would like to have one or two projects that can work off of the bleeding
edge repo, but we'd have to get that to mesh with foundation policy. My gut
tells me we should be able to com

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

2015-06-16 Thread Steve Loughran
I think it's good to have a general build/test process projects can share, so 
+1 to pulling it out. You should get help from others. 

regarding incubation, it is a lot of work, especially for something that's more 
of an in-house tool than an artifact to release and redistribute.

You can't just use apache labs or the build project's repo to work on this? 

if you do want to incubate, we may want to nominate the hadoop project as the 
monitoring PMC, rather than incubator@. 

-steve

> On 16 Jun 2015, at 17:59, Allen Wittenauer  wrote:
> 
> 
> Since a couple of people have brought it up:
> 
>   I think the release question is probably one of the big question marks. 
>  Other than tar balls, how does something like this actually get used 
> downstream?
> 
>   For test-patch, in particular, I have a few thoughts on this:
> 
> Short term:
> 
>   * Projects that want to move RIGHT NOW would modify their Jenkins jobs 
> to checkout from the Yetus repo (preferably at a well known tag or branch) in 
> one directory and their project repo in another directory.  Then it’s just a 
> matter of passing the correct flags to test-patch.  This is pretty much how 
> I’ve been personally running test-patch for about 6 months now. Under 
> Jenkins, we’ve seen this work with NiFi (incubating) already.
> 
>   * Create a stub version of test-patch that projects could check into 
> their repo, replacing the existing test-patch.  This stub version would git 
> clone from either ASF or github and then execute test-patch accordingly on 
> demand.  With the correct smarts, it could make sure it has a cached version 
> to prevent continual clones.
> 
> Longer term:
> 
>   * I’ve been toying with the idea of (ab)using Java repos and packaging 
> as a transportation layer, either in addition or in combination with 
> something like a maven plugin.  Something like this would clearly be better 
> for offline usage and/or to lower the network traffic.
> 
> 
>   It’s probably worth pointing out that plugins can get sucked in from 
> outside the Yetus dir structure, so project specific bits can remain in those 
> projects.  This would mean that, e.g., if ambari decides they want to change 
> the dependency ordering such that ambari-metrics always gets built first, 
> that’s completely doable without the Yetus project getting involved.  This is 
> particularly relevant for things like the Dockerfile where projects would 
> almost certainly want to dictate their build and test time dependencies.  



Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

2015-06-16 Thread Allen Wittenauer

Since a couple of people have brought it up:

I think the release question is probably one of the big question marks.
Other than tarballs, how does something like this actually get used
downstream?

For test-patch, in particular, I have a few thoughts on this:

Short term:

* Projects that want to move RIGHT NOW would modify their Jenkins jobs
to check out the Yetus repo (preferably at a well known tag or branch) in
one directory and their project repo in another directory.  Then it’s just a 
matter of passing the correct flags to test-patch.  This is pretty much how 
I’ve been personally running test-patch for about 6 months now. Under Jenkins, 
we’ve seen this work with NiFi (incubating) already.

* Create a stub version of test-patch that projects could check into 
their repo, replacing the existing test-patch.  This stub version would git 
clone from either ASF or github and then execute test-patch accordingly on 
demand.  With the correct smarts, it could make sure it has a cached version to 
prevent continual clones.

Longer term:

* I’ve been toying with the idea of (ab)using Java repos and packaging 
as a transportation layer, either in addition or in combination with something 
like a maven plugin.  Something like this would clearly be better for offline 
usage and/or to lower the network traffic.
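
As a rough illustration of the stub idea above (not the actual test-patch
code), the caching logic could look something like the Python sketch below.
The repo URL, the pinned ref, the cache location, and the script path inside
the checkout are all placeholders invented for the example. The plan is built
separately from its execution so it can be inspected without touching the
network.

```python
import subprocess
from pathlib import Path

# Hypothetical values: none of these are real Yetus coordinates.
REPO_URL = "https://example.invalid/yetus.git"
PINNED_REF = "some-well-known-tag"
CACHE_DIR = Path.home() / ".cache" / "test-patch-stub"
SCRIPT_RELPATH = "test-patch.sh"  # assumed location inside the checkout


def plan_fetch(cache_dir, repo_url, ref):
    """Return the git commands needed to make cache_dir hold `ref`.

    A cached clone is reused (fetch only); a full clone happens just once.
    Nothing is executed here.
    """
    cache_dir = Path(cache_dir)
    checkout = ["git", "-C", str(cache_dir), "checkout", ref]
    if (cache_dir / ".git").is_dir():
        fetch = ["git", "-C", str(cache_dir), "fetch", "--tags", "origin"]
        return [fetch, checkout]
    return [["git", "clone", repo_url, str(cache_dir)], checkout]


def run_stub(argv):
    """Refresh the cached checkout, then hand all arguments to test-patch."""
    for cmd in plan_fetch(CACHE_DIR, REPO_URL, PINNED_REF):
        subprocess.run(cmd, check=True)
    script = CACHE_DIR / SCRIPT_RELPATH
    return subprocess.run([str(script), *argv]).returncode
```

A project would check in something like this in place of its own test-patch
and call run_stub() with whatever flags the Jenkins job already passes.
Pinning to a tag rather than a branch keeps the behavior from changing
underneath the project between runs.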


It’s probably worth pointing out that plugins can get sucked in from 
outside the Yetus dir structure, so project specific bits can remain in those 
projects.  This would mean that, e.g., if ambari decides they want to change 
the dependency ordering such that ambari-metrics always gets built first, 
that’s completely doable without the Yetus project getting involved.  This is 
particularly relevant for things like the Dockerfile where projects would 
almost certainly want to dictate their build and test time dependencies.  
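
To make the plugin point concrete, here is a small Python sketch under stated
assumptions: plugins are files picked up from a list of directories (the
tool's own plus any a project supplies), and a project can name modules that
must be built first. The function names, the ".sh" suffix, and the "later
directory wins" rule are hypothetical and not taken from test-patch itself.

```python
from pathlib import Path


def discover_plugins(plugin_dirs, suffix=".sh"):
    """Collect plugin files from several directories.

    Directories are scanned in the order given; on a file-name clash the
    later directory wins, so a project can override a bundled plugin.
    """
    found = {}
    for directory in plugin_dirs:
        for path in sorted(Path(directory).glob("*" + suffix)):
            found[path.name] = path
    return [found[name] for name in sorted(found)]


def order_modules(changed_modules, build_first=()):
    """Move any project-designated modules to the front, keeping the rest in order."""
    first = [m for m in build_first if m in changed_modules]
    rest = [m for m in changed_modules if m not in first]
    return first + rest
```

In the ambari example, the project's own plugin would call something like
order_modules(changed, build_first=["ambari-metrics"]), and the shared tool
would never need to know about that rule.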

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

2015-06-16 Thread Tsuyoshi Ozawa
+1 on the idea.

It would be great if tests covering dependency management, multiple
branches, and distributed environments could be done in the project. One
discussion point is how Hadoop depends on Yetus, including the
development cycles. It's a good time to rethink what can be done to
make Hadoop better.

Thanks,
- Tsuyoshi

On Tue, Jun 16, 2015 at 8:47 AM, Sean Busbey  wrote:
> Oof. I had meant to push on this again but life got in the way and now the
> June board meeting is upon us. Sorry everyone. In the event that this ends
> up contentious, hopefully one of the copied communities can give us a
> branch to work in.
>
> I know everyone is busy, so here's the short version of this email: I'd
> like to move some of the code currently in Hadoop (test-patch) into a new
> TLP focused on QA tooling. I'm not sure what the best format for priming
> this conversation is. ORC filled in the incubator project proposal
> template, but I'm not sure how much that confused the issue. So to start,
> I'll just write what I'm hoping we can accomplish in general terms here.
>
> All software development projects that are community based (that is,
> accepting outside contributions) face a common QA problem for vetting
> in-coming contributions. Hadoop is fortunate enough to be sufficiently
> popular that the weight of the problem drove tool development (i.e.
> test-patch). That tool is generalizable enough that a bunch of other TLPs
> have adopted their own forks. Unfortunately, in most projects this kind of
> QA work is an enabler rather than a primary concern, so often the tooling
> is worked on ad hoc and few shared improvements happen across projects.
> Since the tooling itself is never a primary concern, any improvements made
> are rarely reused outside of ASF projects.
>
> Over the last couple months a few of us have been working on generalizing
> the tooling present in the Hadoop code base (because it was the most mature
> out of all those in the various projects) and it's reached a point where we
> think we can start bringing on other downstream users. This means we need
> to start establishing things like a release cadence and to grow the new
> contributors we have to handle more project responsibility. Personally, I
> think that means it's time to move out from under Hadoop to drive things as
> our own community. Eventually, I hope the community can help draw in a
> group of folks traditionally underrepresented in ASF projects, namely QA
> and operations folks.
>
> I think test-patch by itself has enough scope to justify a project. Having
> a solid set of build tools that are customizable to fit the norms of
> different software communities is a bunch of work. Making it work well in
> both the context of automated test systems like Jenkins and for individual
> developers is even more work. We could easily also take over maintenance of
> things like shelldocs, since test-patch is the primary consumer of that
> currently but it's generally useful tooling.
>
> In addition to test-patch, I think the proposed project has some future
> growth potential. Given some adoption of test-patch to prove utility, the
> project could build on the ties it makes to start building tools to help
> projects do their own longer-run testing. Note that I'm talking about the
> tools to build QA processes and not a particular set of tested components.
> Specifically, I think the ChaosMonkey work that's in HBase should be
> generalizable as a fault injection framework (either based on that code or
> something like it). Doing this for arbitrary software is obviously very
> difficult, and a part of easing that will be to make (and then favor)
> tooling to allow projects to have operational glue that looks the same.
> Namely, the shell work that's been done in hadoop-functions.sh would be a
> great foundational layer that could bring good daemon handling practices to
> a whole slew of software projects. In the event that these frameworks and
> tools get adopted by parts of the Hadoop ecosystem, that could make the job
> of e.g. Bigtop substantially easier.
>
> I've reached out to a few folks who have been involved in the current
> test-patch work or expressed interest in helping out on getting it used in
> other projects. Right now, the proposed PMC would be (alphabetical by last
> name):
>
> * Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc, jclouds
> pmc, sqoop pmc, all around Jenkins expert)
> * Sean Busbey (ASF member, accumulo pmc, hbase pmc)
> * Nick Dimiduk (hbase pmc, phoenix pmc)
> * Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
> * Andrew Purtell  (ASF member, incubator pmc, bigtop pmc, hbase pmc,
> phoenix pmc)
> * Allen Wittenauer (hadoop committer)
>
> That PMC gives us several members and a bunch of folks familiar with the
> ASF. Combined with the code already existing in Apache spaces, I think that
> gives us sufficient justification for a direct board proposal.
>
> The planned project name is "Apache Yetus". It's

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

2015-06-16 Thread Nick Dimiduk
I think this is a great idea! Having just gone through the process of
getting Phoenix up to speed with precommits, it would be really nice to
have a place to go other than "fork/hack someone else's work". For the same
project, I recently integrated its first daemon service. This meant adding
a bunch of servicy Python code (multi platform support is required) which I
only sort of trust. Again, would be great to have an explicit resource for
this kind of thing in the ecosystem. I expect Calcite and Kylin will be
following along shortly.

Since we're tossing out names, how about Apache Bootstrap? It's a
meta-project to help other projects get off the ground, after all.

-n

On Monday, June 15, 2015, Sean Busbey  wrote:

> Oof. I had meant to push on this again but life got in the way and now the
> June board meeting is upon us. Sorry everyone. In the event that this ends
> up contentious, hopefully one of the copied communities can give us a
> branch to work in.
>
> I know everyone is busy, so here's the short version of this email: I'd
> like to move some of the code currently in Hadoop (test-patch) into a new
> TLP focused on QA tooling. I'm not sure what the best format for priming
> this conversation is. ORC filled in the incubator project proposal
> template, but I'm not sure how much that confused the issue. So to start,
> I'll just write what I'm hoping we can accomplish in general terms here.
>
> All software development projects that are community based (that is,
> accepting outside contributions) face a common QA problem for vetting
> in-coming contributions. Hadoop is fortunate enough to be sufficiently
> popular that the weight of the problem drove tool development (i.e.
> test-patch). That tool is generalizable enough that a bunch of other TLPs
> have adopted their own forks. Unfortunately, in most projects this kind of
> QA work is an enabler rather than a primary concern, so often the tooling
> is worked on ad hoc and few shared improvements happen across projects.
> Since the tooling itself is never a primary concern, any improvements made
> are rarely reused outside of ASF projects.
>
> Over the last couple months a few of us have been working on generalizing
> the tooling present in the Hadoop code base (because it was the most mature
> out of all those in the various projects) and it's reached a point where we
> think we can start bringing on other downstream users. This means we need
> to start establishing things like a release cadence and to grow the new
> contributors we have to handle more project responsibility. Personally, I
> think that means it's time to move out from under Hadoop to drive things as
> our own community. Eventually, I hope the community can help draw in a
> group of folks traditionally underrepresented in ASF projects, namely QA
> and operations folks.
>
> I think test-patch by itself has enough scope to justify a project. Having
> a solid set of build tools that are customizable to fit the norms of
> different software communities is a bunch of work. Making it work well in
> both the context of automated test systems like Jenkins and for individual
> developers is even more work. We could easily also take over maintenance of
> things like shelldocs, since test-patch is the primary consumer of that
> currently but it's generally useful tooling.
>
> In addition to test-patch, I think the proposed project has some future
> growth potential. Given some adoption of test-patch to prove utility, the
> project could build on the ties it makes to start building tools to help
> projects do their own longer-run testing. Note that I'm talking about the
> tools to build QA processes and not a particular set of tested components.
> Specifically, I think the ChaosMonkey work that's in HBase should be
> generalizable as a fault injection framework (either based on that code or
> something like it). Doing this for arbitrary software is obviously very
> difficult, and a part of easing that will be to make (and then favor)
> tooling to allow projects to have operational glue that looks the same.
> Namely, the shell work that's been done in hadoop-functions.sh would be a
> great foundational layer that could bring good daemon handling practices to
> a whole slew of software projects. In the event that these frameworks and
> tools get adopted by parts of the Hadoop ecosystem, that could make the job
> of e.g. Bigtop substantially easier.
>
> I've reached out to a few folks who have been involved in the current
> test-patch work or expressed interest in helping out on getting it used in
> other projects. Right now, the proposed PMC would be (alphabetical by last
> name):
>
> * Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc, jclouds
> pmc, sqoop pmc, all around Jenkins expert)
> * Sean Busbey (ASF member, accumulo pmc, hbase pmc)
> * Nick Dimiduk (hbase pmc, phoenix pmc)
> * Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
> * Andrew Purtell  (ASF member, incubator pmc, bigto

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

2015-06-16 Thread Kengo Seki
+1. From a user's viewpoint, recent improvements to test-patch have made my
work much more efficient.
For example: quick feedback from skipping unnecessary tests, automated
build environment setup through Docker support, automated patch download
from JIRA, automated shellcheck and whitespace checks, etc.
I believe it is worth spreading these ideas, as a TLP, to other projects
that have the same problems, such as a long QA process.

2015-06-16 15:08 GMT+09:00 Chris Douglas :

> +1 A separate project sounds great. It'd be great to have more
> standard tooling across the ecosystem.
>
> As a practical matter, how should projects consume releases? -C
>
> On Mon, Jun 15, 2015 at 4:47 PM, Sean Busbey  wrote:
> > Oof. I had meant to push on this again but life got in the way and now the
> > June board meeting is upon us. Sorry everyone. In the event that this ends
> > up contentious, hopefully one of the copied communities can give us a
> > branch to work in.
> >
> > I know everyone is busy, so here's the short version of this email: I'd
> > like to move some of the code currently in Hadoop (test-patch) into a new
> > TLP focused on QA tooling. I'm not sure what the best format for priming
> > this conversation is. ORC filled in the incubator project proposal
> > template, but I'm not sure how much that confused the issue. So to start,
> > I'll just write what I'm hoping we can accomplish in general terms here.
> >
> > All software development projects that are community based (that is,
> > accepting outside contributions) face a common QA problem for vetting
> > in-coming contributions. Hadoop is fortunate enough to be sufficiently
> > popular that the weight of the problem drove tool development (i.e.
> > test-patch). That tool is generalizable enough that a bunch of other TLPs
> > have adopted their own forks. Unfortunately, in most projects this kind of
> > QA work is an enabler rather than a primary concern, so often the tooling
> > is worked on ad hoc and few shared improvements happen across projects.
> > Since the tooling itself is never a primary concern, any improvements made
> > are rarely reused outside of ASF projects.
> >
> > Over the last couple months a few of us have been working on generalizing
> > the tooling present in the Hadoop code base (because it was the most mature
> > out of all those in the various projects) and it's reached a point where we
> > think we can start bringing on other downstream users. This means we need
> > to start establishing things like a release cadence and to grow the new
> > contributors we have to handle more project responsibility. Personally, I
> > think that means it's time to move out from under Hadoop to drive things as
> > our own community. Eventually, I hope the community can help draw in a
> > group of folks traditionally underrepresented in ASF projects, namely QA
> > and operations folks.
> >
> > I think test-patch by itself has enough scope to justify a project. Having
> > a solid set of build tools that are customizable to fit the norms of
> > different software communities is a bunch of work. Making it work well in
> > both the context of automated test systems like Jenkins and for individual
> > developers is even more work. We could easily also take over maintenance of
> > things like shelldocs, since test-patch is the primary consumer of that
> > currently but it's generally useful tooling.
> >
> > In addition to test-patch, I think the proposed project has some future
> > growth potential. Given some adoption of test-patch to prove utility, the
> > project could build on the ties it makes to start building tools to help
> > projects do their own longer-run testing. Note that I'm talking about the
> > tools to build QA processes and not a particular set of tested components.
> > Specifically, I think the ChaosMonkey work that's in HBase should be
> > generalizable as a fault injection framework (either based on that code or
> > something like it). Doing this for arbitrary software is obviously very
> > difficult, and a part of easing that will be to make (and then favor)
> > tooling to allow projects to have operational glue that looks the same.
> > Namely, the shell work that's been done in hadoop-functions.sh would be a
> > great foundational layer that could bring good daemon handling practices to
> > a whole slew of software projects. In the event that these frameworks and
> > tools get adopted by parts of the Hadoop ecosystem, that could make the job
> > of e.g. Bigtop substantially easier.
> >
> > I've reached out to a few folks who have been involved in the current
> > test-patch work or expressed interest in helping out on getting it used in
> > other projects. Right now, the proposed PMC would be (alphabetical by last
> > name):
> >
> > * Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc, jclouds
> > pmc, sqoop pmc, all around Jenkins expert)
> > * Sean Busbey (ASF member, accumulo pmc, hbase pmc)
> > * Nick Di

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

2015-06-15 Thread Chris Douglas
+1 A separate project sounds great. It'd be great to have more
standard tooling across the ecosystem.

As a practical matter, how should projects consume releases? -C

On Mon, Jun 15, 2015 at 4:47 PM, Sean Busbey  wrote:
> Oof. I had meant to push on this again but life got in the way and now the
> June board meeting is upon us. Sorry everyone. In the event that this ends
> up contentious, hopefully one of the copied communities can give us a
> branch to work in.
>
> I know everyone is busy, so here's the short version of this email: I'd
> like to move some of the code currently in Hadoop (test-patch) into a new
> TLP focused on QA tooling. I'm not sure what the best format for priming
> this conversation is. ORC filled in the incubator project proposal
> template, but I'm not sure how much that confused the issue. So to start,
> I'll just write what I'm hoping we can accomplish in general terms here.
>
> All software development projects that are community based (that is,
> accepting outside contributions) face a common QA problem for vetting
> in-coming contributions. Hadoop is fortunate enough to be sufficiently
> popular that the weight of the problem drove tool development (i.e.
> test-patch). That tool is generalizable enough that a bunch of other TLPs
> have adopted their own forks. Unfortunately, in most projects this kind of
> QA work is an enabler rather than a primary concern, so often the tooling
> is worked on ad-hoc and little shared improvements happen across
> projects. Since
> the tooling itself is never a primary concern, any made is rarely reused
> outside of ASF projects.
>
> Over the last couple months a few of us have been working on generalizing
> the tooling present in the Hadoop code base (because it was the most mature
> out of all those in the various projects) and it's reached a point where we
> think we can start bringing on other downstream users. This means we need
> to start establishing things like a release cadence and to grow the new
> contributors we have to handle more project responsibility. Personally, I
> think that means it's time to move out from under Hadoop to drive things as
> our own community. Eventually, I hope the community can help draw in a
> group of folks traditionally underrepresented in ASF projects, namely QA
> and operations folks.
>
> I think test-patch by itself has enough scope to justify a project. Having
> a solid set of build tools that are customizable to fit the norms of
> different software communities is a bunch of work. Making it work well in
> both the context of automated test systems like Jenkins and for individual
> developers is even more work. We could easily also take over maintenance of
> things like shelldocs, since test-patch is the primary consumer of that
> currently but it's generally useful tooling.
>
> In addition to test-patch, I think the proposed project has some future
> growth potential. Given some adoption of test-patch to prove utility, the
> project could build on the ties it makes to start building tools to help
> projects do their own longer-run testing. Note that I'm talking about the
> tools to build QA processes and not a particular set of tested components.
> Specifically, I think the ChaosMonkey work that's in HBase should be
> generalizable as a fault injection framework (either based on that code or
> something like it). Doing this for arbitrary software is obviously very
> difficult, and a part of easing that will be to make (and then favor)
> tooling to allow projects to have operational glue that looks the same.
> Namely, the shell work that's been done in hadoop-functions.sh would be a
> great foundational layer that could bring good daemon handling practices to
> a whole slew of software projects. In the event that these frameworks and
> tools get adopted by parts of the Hadoop ecosystem, that could make the job
> of i.e. Bigtop substantially easier.
>
> I've reached out to a few folks who have been involved in the current
> test-patch work or expressed interest in helping out on getting it used in
> other projects. Right now, the proposed PMC would be (alphabetical by last
> name):
>
> * Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc, jclouds
> pmc, sqoop pmc, all around Jenkins expert)
> * Sean Busbey (ASF member, accumulo pmc, hbase pmc)
> * Nick Dimiduk (hbase pmc, phoenix pmc)
> * Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
> * Andrew Purtell  (ASF member, incubator pmc, bigtop pmc, hbase pmc,
> phoenix pmc)
> * Allen Wittenauer (hadoop committer)
>
> That PMC gives us several members and a bunch of folks familiar with the
> ASF. Combined with the code already existing in Apache spaces, I think that
> gives us sufficient justification for a direct board proposal.
>
> The planned project name is "Apache Yetus". It's an archaic genus of sea
> snail and most of our project will be focused on shell scripts.
>
> N.b.: this does not mean that the Hadoop community would _have_ to

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

2015-06-15 Thread Chris Nauroth
+1

ZooKeeper is another project that has expressed interest in improving its
pre-commit process lately.  I understand Allen has had some success
applying this to the ZooKeeper build too, with some small caveats around
quirks in the build.xml that I think we can resolve.

I'm interested in defining how the release model works for a project like
this.  The current model of forking and checking it in directly to
multiple projects leads to the fragmentation and bugs described earlier in
the thread.  Another possible model is something more dynamic, like a
bootstrap script capable of checking out a release from a git tag before
launching pre-commit.  I'm interested to hear from various projects on how
they'd like to integrate.
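
To make the bootstrap idea concrete, here is a rough sketch only, not an existing script: the `fetch_release` and `run_cmd` helpers, the repository URL, and the tag name are all placeholders invented for illustration. A wrapper along these lines could fetch a tagged release and leave it in a directory from which the pre-commit job is then launched:

```shell
#!/usr/bin/env bash
# Hypothetical bootstrap sketch: fetch a tagged release of the shared
# pre-commit tooling before running it. All names here are illustrative.

# Print the command (when DRY_RUN=1) or execute it.
run_cmd() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# fetch_release <repo-url> <tag> <dest-dir>
# Shallow-clones the given tag unless dest-dir already exists.
fetch_release() {
  local repo="$1" tag="$2" dest="$3"
  if [ -d "${dest}" ]; then
    echo "reusing ${dest}"
    return 0
  fi
  # git clone --branch also accepts a tag name and leaves HEAD detached at it.
  run_cmd git clone --quiet --depth 1 --branch "${tag}" "${repo}" "${dest}"
}
```

For example, `DRY_RUN=1 fetch_release https://example.invalid/qa-tools.git rel/0.1.0 /tmp/qa-tools` prints the git command instead of running it. Pinning to a tag rather than a branch would mean each project decides when to move to a newer release.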

--Chris Nauroth




On 6/15/15, 8:57 PM, "Josh Elser"  wrote:

>+1
>
>(Have been talking to Sean in private on the subject -- seems
>appropriate to voice some public support)
>
>I'd be interested in this for Accumulo and Slider. For Accumulo, we've
>come a far way without a pre-commit build, primarily due to a CTR
>process. We have seen the repeated questions of "how do I run the tests"
>which a more automated workflow would help with, IMO. I think Slider
>could benefit with the same reasons.
>
>I'd also be giddy to see the recent improvements in Hadoop trickle down
>into the other projects that Allen already mentioned.
>
>Take this as record that I'd be happy to try to help out where possible.
>
>Sean Busbey wrote:
>> thank you for making a more digestible version Allen. :)
>>
>> If you're interested in soliciting feedback from other projects, I
>>created
>> ASF short links to this thread in common-dev and hbase:
>>
>>
>> * http://s.apache.org/yetus-discuss-hadoop
>> * http://s.apache.org/yetus-discuss-hbase
>>
>> While I agree that it's important to get feedback from ASF projects that
>> might find this useful, I can say that recently I've been involved in
>>the
>> non-ASF project YCSB and both the pretest and better shell stuff would
>>be
>> immensely useful over there.
>>
>> On Mon, Jun 15, 2015 at 10:36 PM, Allen Wittenauer
>>wrote:
>>
>>>  I'm clearly +1 on this idea.  As part of the rewrite in
>>>Hadoop of
>>> test-patch, it was amazing to see how far and wide this bit of code as
>>> spread.  So I see consolidating everyone's efforts as a huge win for a
>>> large number of projects.  (esp considering how many I saw suffering
>>>from a
>>> variety of identified bugs! )
>>>
>>>  But….
>>>
>>>  I think it's important for people involved in those other
>>>projects
>>> to speak up and voice an opinion as to whether this is useful.
>>>
>>> To summarize:
>>>
>>>  In the short term, a single location to get/use a precommit
>>>patch
>>> tester rather than everyone building/supporting their own in their
>>>spare
>>> time.
>>>
>>>   FWIW, we've already got the code base modified to be
>>>pluggable.
>>> We've written some basic/simple plugins that support Hadoop, HBase,
>>>Tajo,
>>> Tez, Pig, and Flink.  For HBase and Flink, this does include their
>>>custom
>>> checks.  Adding support for other project shouldn't be hard.  Simple
>>> projects take almost no time after seeing the basic pattern.
>>>
>>>  I think it's worthwhile highlighting that means support for
>>>both
>>> JIRA and GitHub as well as Ant and Maven from the same code base.
>>>
>>> Longer term:
>>>
>>>  Well, we clearly have ideas of things that we want to do.
>>>Adding
>>> more features to test-patch (review board? gradle?) is obvious. But
>>>what
>>> about teasing apart and generalizing some of the other shell bits from
>>> projects? A common library for building CLI tools to fault injection to
>>> release documentation creation tools to …  I'd even like to see us get
>>>as
>>> advanced as a "run this program to auto-generate daemon stop/start
>>>bits".
>>>
>>>  I had a few chats with people about this idea at Hadoop
>>>Summit.
>>> What's truly exciting are the ideas that people had once they realized
>>>what
>>> kinds of problems we're trying to solve.  It's always amazing the
>>>problems
>>> that projects have that could be solved by these types of solutions.
>>>Let's
>>> stop hiding our cool toys in this area.
>>>
>>>  So, what feedback and ideas do you have in this area?  Are
>>>you a
>>> yay or a nay?
>>>
>>>
>>> On Jun 15, 2015, at 4:47 PM, Sean Busbey  wrote:
>>>
>>>> Oof. I had meant to push on this again but life got in the way and now
>>>> the June board meeting is upon us. Sorry everyone. In the event that this
>>>> ends up contentious, hopefully one of the copied communities can give us a
>>>> branch to work in.
>>>>
>>>> I know everyone is busy, so here's the short version of this email: I'd
>>>> like to move some of the code currently in Hadoop (test-patch) into a new
>>>> TLP focused on QA tooling. I'm not sure what the best format for priming
>>>> this conversation is. ORC filled in the incubator project proposal
>>>> t

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

2015-06-15 Thread Josh Elser

+1

(Have been talking to Sean in private on the subject -- seems 
appropriate to voice some public support)


I'd be interested in this for Accumulo and Slider. For Accumulo, we've
come a long way without a pre-commit build, primarily due to a CTR
process. We have seen the repeated questions of "how do I run the tests"
which a more automated workflow would help with, IMO. I think Slider
could benefit for the same reasons.


I'd also be giddy to see the recent improvements in Hadoop trickle down 
into the other projects that Allen already mentioned.


Take this as record that I'd be happy to try to help out where possible.

Sean Busbey wrote:

thank you for making a more digestible version Allen. :)

If you're interested in soliciting feedback from other projects, I created
ASF short links to this thread in common-dev and hbase:


* http://s.apache.org/yetus-discuss-hadoop
* http://s.apache.org/yetus-discuss-hbase

While I agree that it's important to get feedback from ASF projects that
might find this useful, I can say that recently I've been involved in the
non-ASF project YCSB and both the pretest and better shell stuff would be
immensely useful over there.

On Mon, Jun 15, 2015 at 10:36 PM, Allen Wittenauer  wrote:


 I'm clearly +1 on this idea.  As part of the rewrite in Hadoop of
test-patch, it was amazing to see how far and wide this bit of code as
spread.  So I see consolidating everyone's efforts as a huge win for a
large number of projects.  (esp considering how many I saw suffering from a
variety of identified bugs! )

 But….

 I think it's important for people involved in those other projects
to speak up and voice an opinion as to whether this is useful.

To summarize:

 In the short term, a single location to get/use a precommit patch
tester rather than everyone building/supporting their own in their spare
time.

  FWIW, we've already got the code base modified to be pluggable.
We've written some basic/simple plugins that support Hadoop, HBase, Tajo,
Tez, Pig, and Flink.  For HBase and Flink, this does include their custom
checks.  Adding support for other project shouldn't be hard.  Simple
projects take almost no time after seeing the basic pattern.

 I think it's worthwhile highlighting that means support for both
JIRA and GitHub as well as Ant and Maven from the same code base.

Longer term:

 Well, we clearly have ideas of things that we want to do. Adding
more features to test-patch (review board? gradle?) is obvious. But what
about teasing apart and generalizing some of the other shell bits from
projects? A common library for building CLI tools to fault injection to
release documentation creation tools to …  I'd even like to see us get as
advanced as a "run this program to auto-generate daemon stop/start bits".

 I had a few chats with people about this idea at Hadoop Summit.
What's truly exciting are the ideas that people had once they realized what
kinds of problems we're trying to solve.  It's always amazing the problems
that projects have that could be solved by these types of solutions.  Let's
stop hiding our cool toys in this area.

 So, what feedback and ideas do you have in this area?  Are you a
yay or a nay?


On Jun 15, 2015, at 4:47 PM, Sean Busbey  wrote:


Oof. I had meant to push on this again but life got in the way and now the
June board meeting is upon us. Sorry everyone. In the event that this ends
up contentious, hopefully one of the copied communities can give us a
branch to work in.

I know everyone is busy, so here's the short version of this email: I'd
like to move some of the code currently in Hadoop (test-patch) into a new
TLP focused on QA tooling. I'm not sure what the best format for priming
this conversation is. ORC filled in the incubator project proposal
template, but I'm not sure how much that confused the issue. So to start,
I'll just write what I'm hoping we can accomplish in general terms here.

All software development projects that are community based (that is,
accepting outside contributions) face a common QA problem for vetting
in-coming contributions. Hadoop is fortunate enough to be sufficiently
popular that the weight of the problem drove tool development (i.e.
test-patch). That tool is generalizable enough that a bunch of other TLPs
have adopted their own forks. Unfortunately, in most projects this kind of
QA work is an enabler rather than a primary concern, so often the tooling
is worked on ad-hoc and little shared improvements happen across
projects. Since
the tooling itself is never a primary concern, any made is rarely reused
outside of ASF projects.

Over the last couple months a few of us have been working on generalizing
the tooling present in the Hadoop code base (because it was the most mature
out of all those in the various projects) and it's reached a point where we
think we can start bringing on other downstream users. This means we need
to start

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

2015-06-15 Thread Sean Busbey
Thank you for making a more digestible version, Allen. :)

If you're interested in soliciting feedback from other projects, I created
ASF short links to this thread in common-dev and hbase:


* http://s.apache.org/yetus-discuss-hadoop
* http://s.apache.org/yetus-discuss-hbase

While I agree that it's important to get feedback from ASF projects that
might find this useful, I can say that recently I've been involved in the
non-ASF project YCSB and both the pretest and better shell stuff would be
immensely useful over there.

On Mon, Jun 15, 2015 at 10:36 PM, Allen Wittenauer  wrote:

>
> I'm clearly +1 on this idea.  As part of the rewrite in Hadoop of
> test-patch, it was amazing to see how far and wide this bit of code as
> spread.  So I see consolidating everyone's efforts as a huge win for a
> large number of projects.  (esp considering how many I saw suffering from a
> variety of identified bugs! )
>
> But….
>
> I think it's important for people involved in those other projects
> to speak up and voice an opinion as to whether this is useful.
>
> To summarize:
>
> In the short term, a single location to get/use a precommit patch
> tester rather than everyone building/supporting their own in their spare
> time.
>
>  FWIW, we've already got the code base modified to be pluggable.
> We've written some basic/simple plugins that support Hadoop, HBase, Tajo,
> Tez, Pig, and Flink.  For HBase and Flink, this does include their custom
> checks.  Adding support for other project shouldn't be hard.  Simple
> projects take almost no time after seeing the basic pattern.
>
> I think it's worthwhile highlighting that means support for both
> JIRA and GitHub as well as Ant and Maven from the same code base.
>
> Longer term:
>
> Well, we clearly have ideas of things that we want to do. Adding
> more features to test-patch (review board? gradle?) is obvious. But what
> about teasing apart and generalizing some of the other shell bits from
> projects? A common library for building CLI tools to fault injection to
> release documentation creation tools to …  I'd even like to see us get as
> advanced as a "run this program to auto-generate daemon stop/start bits".
>
> I had a few chats with people about this idea at Hadoop Summit.
> What's truly exciting are the ideas that people had once they realized what
> kinds of problems we're trying to solve.  It's always amazing the problems
> that projects have that could be solved by these types of solutions.  Let's
> stop hiding our cool toys in this area.
>
> So, what feedback and ideas do you have in this area?  Are you a
> yay or a nay?
>
>
> On Jun 15, 2015, at 4:47 PM, Sean Busbey  wrote:
>
> > Oof. I had meant to push on this again but life got in the way and now the
> > June board meeting is upon us. Sorry everyone. In the event that this ends
> > up contentious, hopefully one of the copied communities can give us a
> > branch to work in.
> >
> > I know everyone is busy, so here's the short version of this email: I'd
> > like to move some of the code currently in Hadoop (test-patch) into a new
> > TLP focused on QA tooling. I'm not sure what the best format for priming
> > this conversation is. ORC filled in the incubator project proposal
> > template, but I'm not sure how much that confused the issue. So to start,
> > I'll just write what I'm hoping we can accomplish in general terms here.
> >
> > All software development projects that are community based (that is,
> > accepting outside contributions) face a common QA problem for vetting
> > in-coming contributions. Hadoop is fortunate enough to be sufficiently
> > popular that the weight of the problem drove tool development (i.e.
> > test-patch). That tool is generalizable enough that a bunch of other TLPs
> > have adopted their own forks. Unfortunately, in most projects this kind
> of
> > QA work is an enabler rather than a primary concern, so often the tooling
> > is worked on ad-hoc and little shared improvements happen across
> > projects. Since
> > the tooling itself is never a primary concern, any made is rarely reused
> > outside of ASF projects.
> >
> > Over the last couple months a few of us have been working on generalizing
> > the tooling present in the Hadoop code base (because it was the most
> mature
> > out of all those in the various projects) and it's reached a point where
> we
> > think we can start bringing on other downstream users. This means we need
> > to start establishing things like a release cadence and to grow the new
> > contributors we have to handle more project responsibility. Personally, I
> > think that means it's time to move out from under Hadoop to drive things
> as
> > our own community. Eventually, I hope the community can help draw in a
> > group of folks traditionally underrepresented in ASF projects, namely QA
> > and operations folks.
> >
> > I think test-patch by itself has enough scope to justify a proj

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

2015-06-15 Thread Allen Wittenauer

I'm clearly +1 on this idea.  As part of the rewrite in Hadoop of
test-patch, it was amazing to see how far and wide this bit of code has spread.
So I see consolidating everyone's efforts as a huge win for a large number of
projects.  (Especially considering how many I saw suffering from a variety of
identified bugs!)

But….

I think it's important for people involved in those other projects to 
speak up and voice an opinion as to whether this is useful. 

To summarize:

In the short term, a single location to get/use a precommit patch 
tester rather than everyone building/supporting their own in their spare time. 

 FWIW, we've already got the code base modified to be pluggable.  We've 
written some basic/simple plugins that support Hadoop, HBase, Tajo, Tez, Pig, 
and Flink.  For HBase and Flink, this does include their custom checks.  Adding
support for other projects shouldn't be hard.  Simple projects take almost no
time after seeing the basic pattern.
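
To give a feel for what "pluggable" can mean in shell, here is a minimal sketch under stated assumptions. It is not the actual test-patch plugin interface: `register_check`, `run_checks`, and the two example checks are names made up for this illustration. Each plugin registers a name and supplies a `<name>_check` function, and the runner calls them all against a patch file:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a pluggable check runner. The function and
# variable names are invented for illustration, not taken from test-patch.

CHECKS=""

# register_check <name>: a plugin calls this to announce itself.
register_check() {
  CHECKS="${CHECKS} $1"
}

# run_checks <patch-file>: call <name>_check for every registered plugin,
# print a +1/-1 vote per check, and return the number of failed checks
# as the exit status.
run_checks() {
  local patch="$1" name failed=0
  for name in ${CHECKS}; do
    if "${name}_check" "${patch}"; then
      echo "+1 ${name}"
    else
      echo "-1 ${name}"
      failed=$((failed + 1))
    fi
  done
  return "${failed}"
}

# Example plugin: fail if the patch adds lines with trailing whitespace.
register_check whitespace
whitespace_check() {
  ! grep -q -E '^\+.*[[:space:]]$' "$1"
}

# Example plugin: fail if the patch file is empty.
register_check nonempty
nonempty_check() {
  [ -s "$1" ]
}
```

A project-specific plugin in this sketch is just another file that calls `register_check` and defines one function, which is roughly why simple projects need so little work once the pattern is clear.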

I think it's worthwhile highlighting that means support for both JIRA 
and GitHub as well as Ant and Maven from the same code base.

Longer term:

Well, we clearly have ideas of things that we want to do. Adding more 
features to test-patch (review board? gradle?) is obvious. But what about 
teasing apart and generalizing some of the other shell bits from projects? A 
common library for building CLI tools to fault injection to release 
documentation creation tools to …  I'd even like to see us get as advanced as a 
"run this program to auto-generate daemon stop/start bits".

I had a few chats with people about this idea at Hadoop Summit.  What's 
truly exciting are the ideas that people had once they realized what kinds of 
problems we're trying to solve.  It's always amazing the problems that projects 
have that could be solved by these types of solutions.  Let's stop hiding our 
cool toys in this area.

So, what feedback and ideas do you have in this area?  Are you a yay or 
a nay?


On Jun 15, 2015, at 4:47 PM, Sean Busbey  wrote:

> Oof. I had meant to push on this again but life got in the way and now the
> June board meeting is upon us. Sorry everyone. In the event that this ends
> up contentious, hopefully one of the copied communities can give us a
> branch to work in.
> 
> I know everyone is busy, so here's the short version of this email: I'd
> like to move some of the code currently in Hadoop (test-patch) into a new
> TLP focused on QA tooling. I'm not sure what the best format for priming
> this conversation is. ORC filled in the incubator project proposal
> template, but I'm not sure how much that confused the issue. So to start,
> I'll just write what I'm hoping we can accomplish in general terms here.
> 
> All software development projects that are community based (that is,
> accepting outside contributions) face a common QA problem for vetting
> in-coming contributions. Hadoop is fortunate enough to be sufficiently
> popular that the weight of the problem drove tool development (i.e.
> test-patch). That tool is generalizable enough that a bunch of other TLPs
> have adopted their own forks. Unfortunately, in most projects this kind of
> QA work is an enabler rather than a primary concern, so often the tooling
> is worked on ad-hoc and little shared improvements happen across
> projects. Since
> the tooling itself is never a primary concern, any made is rarely reused
> outside of ASF projects.
> 
> Over the last couple months a few of us have been working on generalizing
> the tooling present in the Hadoop code base (because it was the most mature
> out of all those in the various projects) and it's reached a point where we
> think we can start bringing on other downstream users. This means we need
> to start establishing things like a release cadence and to grow the new
> contributors we have to handle more project responsibility. Personally, I
> think that means it's time to move out from under Hadoop to drive things as
> our own community. Eventually, I hope the community can help draw in a
> group of folks traditionally underrepresented in ASF projects, namely QA
> and operations folks.
> 
> I think test-patch by itself has enough scope to justify a project. Having
> a solid set of build tools that are customizable to fit the norms of
> different software communities is a bunch of work. Making it work well in
> both the context of automated test systems like Jenkins and for individual
> developers is even more work. We could easily also take over maintenance of
> things like shelldocs, since test-patch is the primary consumer of that
> currently but it's generally useful tooling.
> 
> In addition to test-patch, I think the proposed project has some future
> growth potential. Given some adoption of test-patch to prove utility, the
> project could build on the ties it makes to start building tools to help
> projects do their own longer-run testing. Note that I'm talk

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

2015-06-15 Thread Sean Busbey
Oof. I had meant to push on this again but life got in the way and now the
June board meeting is upon us. Sorry everyone. In the event that this ends
up contentious, hopefully one of the copied communities can give us a
branch to work in.

I know everyone is busy, so here's the short version of this email: I'd
like to move some of the code currently in Hadoop (test-patch) into a new
TLP focused on QA tooling. I'm not sure what the best format for priming
this conversation is. ORC filled in the incubator project proposal
template, but I'm not sure how much that confused the issue. So to start,
I'll just write what I'm hoping we can accomplish in general terms here.

All software development projects that are community based (that is,
accepting outside contributions) face a common QA problem for vetting
in-coming contributions. Hadoop is fortunate enough to be sufficiently
popular that the weight of the problem drove tool development (i.e.
test-patch). That tool is generalizable enough that a bunch of other TLPs
have adopted their own forks. Unfortunately, in most projects this kind of
QA work is an enabler rather than a primary concern, so often the tooling
is worked on ad-hoc and few shared improvements happen across projects.
Since the tooling itself is never a primary concern, any improvements made
are rarely reused outside of ASF projects.

Over the last couple months a few of us have been working on generalizing
the tooling present in the Hadoop code base (because it was the most mature
out of all those in the various projects) and it's reached a point where we
think we can start bringing on other downstream users. This means we need
to start establishing things like a release cadence and to grow the new
contributors we have to handle more project responsibility. Personally, I
think that means it's time to move out from under Hadoop to drive things as
our own community. Eventually, I hope the community can help draw in a
group of folks traditionally underrepresented in ASF projects, namely QA
and operations folks.

I think test-patch by itself has enough scope to justify a project. Having
a solid set of build tools that are customizable to fit the norms of
different software communities is a bunch of work. Making it work well in
both the context of automated test systems like Jenkins and for individual
developers is even more work. We could easily also take over maintenance of
things like shelldocs, since test-patch is currently its primary consumer
but it's generally useful tooling.

In addition to test-patch, I think the proposed project has some future
growth potential. Given some adoption of test-patch to prove utility, the
project could build on the ties it makes to start building tools to help
projects do their own longer-run testing. Note that I'm talking about the
tools to build QA processes and not a particular set of tested components.
Specifically, I think the ChaosMonkey work that's in HBase should be
generalizable as a fault injection framework (either based on that code or
something like it). Doing this for arbitrary software is obviously very
difficult, and a part of easing that will be to make (and then favor)
tooling to allow projects to have operational glue that looks the same.
Namely, the shell work that's been done in hadoop-functions.sh would be a
great foundational layer that could bring good daemon handling practices to
a whole slew of software projects. In the event that these frameworks and
tools get adopted by parts of the Hadoop ecosystem, that could make the job
of, e.g., Bigtop substantially easier.

I've reached out to a few folks who have been involved in the current
test-patch work or expressed interest in helping out on getting it used in
other projects. Right now, the proposed PMC would be (alphabetical by last
name):

* Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc, jclouds
pmc, sqoop pmc, all around Jenkins expert)
* Sean Busbey (ASF member, accumulo pmc, hbase pmc)
* Nick Dimiduk (hbase pmc, phoenix pmc)
* Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
* Andrew Purtell (ASF member, incubator pmc, bigtop pmc, hbase pmc,
phoenix pmc)
* Allen Wittenauer (hadoop committer)

That PMC gives us several members and a bunch of folks familiar with the
ASF. Combined with the code already existing in Apache spaces, I think that
gives us sufficient justification for a direct board proposal.

The planned project name is "Apache Yetus". It's an archaic genus of sea
snail and most of our project will be focused on shell scripts.

N.b.: this does not mean that the Hadoop community would _have_ to rely on
the new TLP, but I hope that once we have a release that can be evaluated
there'd be enough benefit to strongly encourage it.

This has mostly been focused on scope and community issues, and I'd love to
talk through any feedback on that. Additionally, are there any other points
folks want to make sure are covered before we have a resolution?

On Sat, Jun 6, 2015

[DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

2015-06-06 Thread Sean Busbey
Sorry for the resend. I figured this deserves a [DISCUSS] flag.



On Sat, Jun 6, 2015 at 10:39 PM, Sean Busbey  wrote:

> Hi Folks!
>
> After working on test-patch with other folks for the last few months, I
> think we've reached the point where we can make the fastest progress
> towards the goal of a general use pre-commit patch tester by spinning
> things into a project focused on just that. I think we have a mature enough
> code base and a sufficient fledgling community, so I'm going to put
> together a tlp proposal.
>
> Thanks for the feedback thus far from use within Hadoop. I hope we can
> continue to make things more useful.
>
> -Sean
>
> On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey  wrote:
>
>> HBase's dev-support folder is where the scripts and support files live.
>> We've only recently started adding anything to the maven builds that's
>> specific to jenkins[1]; so far it's diagnostic stuff, but that's where I'd
>> add in more if we ran into the same permissions problems y'all are having.
>>
>> There's also our precommit job itself, though it isn't large[2]. AFAIK,
>> we don't properly back this up anywhere, we just notify each other of
>> changes on a particular mail thread[3].
>>
>> [1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
>> [2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're all
>> red because I just finished fixing "mvn site" running out of permgen)
>> [3]: http://s.apache.org/NT0
>>
>>
>> On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth 
>> wrote:
>>
>>> Sure, thanks Sean!  Do we just look in the dev-support folder in the
>>> HBase
>>> repo?  Is there any additional context we need to be aware of?
>>>
>>> Chris Nauroth
>>> Hortonworks
>>> http://hortonworks.com/
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 3/11/15, 2:44 PM, "Sean Busbey"  wrote:
>>>
>>> >+dev@hbase
>>> >
>>> >HBase has recently been cleaning up our precommit jenkins jobs to make
>>> >them
>>> >more robust. From what I can tell our stuff started off as an earlier
>>> >version of what Hadoop uses for testing.
>>> >
>>> >Folks on either side open to an experiment of combining our precommit
>>> >check
>>> >tooling? In principle we should be looking for the same kinds of things.
>>> >
>>> >Naturally we'll still need different jenkins jobs to handle different
>>> >resource needs and we'd need to figure out where stuff eventually lives,
>>> >but that could come later.
>>> >
>>> >On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <
>>> cnaur...@hortonworks.com>
>>> >wrote:
>>> >
>>> >> The only thing I'm aware of is the failOnError option:
>>> >>
>>> >> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
>>> >>
>>> >> I prefer that we don't disable this, because ignoring different kinds of
>>> >> failures could leave our build directories in an indeterminate state. For
>>> >> example, we could end up with an old class file on the classpath for test
>>> >> runs that was supposedly deleted.
>>> >>
>>> >> I think it's worth exploring Eddy's suggestion to try simulating failure
>>> >> by placing a file where the code expects to see a directory.  That might
>>> >> even let us enable some of these tests that are skipped on Windows,
>>> >> because Windows allows access for the owner even after permissions have
>>> >> been stripped.
>>> >>
>>> >> Chris Nauroth
>>> >> Hortonworks
>>> >> http://hortonworks.com/
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> On 3/11/15, 2:10 PM, "Colin McCabe"  wrote:
>>> >>
>>> >> >Is there a maven plugin or setting we can use to simply remove
>>> >> >directories that have no executable permissions on them?  Clearly we
>>> >> >have the permission to do this from a technical point of view (since
>>> >> >we created the directories as the jenkins user), it's simply that the
>>> >> >code refuses to do it.
>>> >> >
>>> >> >Otherwise I guess we can just fix those tests...
>>> >> >
>>> >> >Colin
>>> >> >
>>> >> >On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu  wrote:
>>> >> >> Thanks a lot for looking into HDFS-7722, Chris.
>>> >> >>
>>> >> >> In HDFS-7722:
>>> >> >> TestDataNodeVolumeFailureXXX tests reset data dir permissions in TearDown().
>>> >> >> TestDataNodeHotSwapVolumes resets permissions in a finally clause.
>>> >> >>
>>> >> >> Also I ran mvn test several times on my machine and all tests passed.
>>> >> >>
>>> >> >> However, DiskChecker#checkDirAccess() already rejects anything that is
>>> >> >> not a directory:
>>> >> >>
>>> >> >> private static void checkDirAccess(File dir)
>>> >> >>     throws DiskErrorException {
>>> >> >>   if (!dir.isDirectory()) {
>>> >> >>     throw new DiskErrorException("Not a directory: "
>>> >> >>         + dir.toString());
>>> >> >>   }
>>> >> >>
>>> >> >>   checkAccessByFileMethods(dir);
>>> >> >> }
>>> >> >>
>>> >> >> One potentially safer alternative is replacing the data dir with a
>>> >> >> regular file to simulate disk failures.
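
The regular-file approach described above can be sketched in plain Java. This
is only a rough illustration under stated assumptions, not the HDFS-7722
change: DirCheckException, checkDir, replaceWithRegularFile and passesCheck
are hypothetical stand-ins, and checkDir mirrors only the isDirectory() test
quoted from DiskChecker#checkDirAccess(). It also assumes the data dir is
empty when it is swapped out.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SimulatedDiskFailure {

    /** Hypothetical stand-in for Hadoop's DiskErrorException. */
    static class DirCheckException extends IOException {
        DirCheckException(String msg) {
            super(msg);
        }
    }

    /** Mirrors only the isDirectory() test from the quoted checkDirAccess(). */
    static void checkDir(File dir) throws DirCheckException {
        if (!dir.isDirectory()) {
            throw new DirCheckException("Not a directory: " + dir);
        }
    }

    /**
     * Simulates a failed volume by swapping an (empty) directory for a
     * regular file at the same path. No permission bits are changed, so a
     * later recursive delete of the build directory is not blocked.
     */
    static void replaceWithRegularFile(Path dir) throws IOException {
        Files.delete(dir);      // only succeeds if the directory is empty
        Files.createFile(dir);  // same path, now a regular file
    }

    /** Returns true if the path passes the directory check. */
    static boolean passesCheck(File dir) {
        try {
            checkDir(dir);
            return true;
        } catch (DirCheckException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("sim-disk-failure");
        Path dataDir = Files.createDirectory(root.resolve("data"));

        System.out.println("healthy dir passes check: "
            + passesCheck(dataDir.toFile()));

        replaceWithRegularFile(dataDir);
        System.out.println("after swap passes check:  "
            + passesCheck(dataDir.toFile()));

        // Cleanup needs no permission reset, unlike the chmod-based tests.
        Files.delete(dataDir);
        Files.delete(root);
    }
}
```

Because nothing is chmod'ed, a test that dies before its teardown should still
leave a tree that a normal clean can remove, which is the property the
permission-stripping tests lack.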
>>> >> >>
>>> >> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris