It looks like we have consensus. I'll start drafting a proposal for the next board meeting (July 15th). Once we work out the name, I'll submit a PODLINGNAMESEARCH JIRA to track that we did due diligence on whatever we pick.
In the meantime, Hadoop PMC, would y'all be willing to host us in a branch so that we can start prepping things now? We would want branch commit rights for the proposed new PMC.

-Sean

On Mon, Jun 15, 2015 at 6:47 PM, Sean Busbey <bus...@cloudera.com> wrote:

> Oof. I had meant to push on this again, but life got in the way and now the June board meeting is upon us. Sorry, everyone. In the event that this ends up contentious, hopefully one of the copied communities can give us a branch to work in.
>
> I know everyone is busy, so here's the short version of this email: I'd like to move some of the code currently in Hadoop (test-patch) into a new TLP focused on QA tooling. I'm not sure what the best format for priming this conversation is. ORC filled in the incubator project proposal template, but I'm not sure how much that confused the issue. So to start, I'll just write what I'm hoping we can accomplish in general terms here.
>
> All software development projects that are community based (that is, accepting outside contributions) face a common QA problem when vetting incoming contributions. Hadoop is fortunate enough to be sufficiently popular that the weight of the problem drove tool development (i.e. test-patch). That tool is generalizable enough that a bunch of other TLPs have adopted their own forks. Unfortunately, in most projects this kind of QA work is an enabler rather than a primary concern, so the tooling is often worked on ad hoc and few improvements get shared across projects. Since the tooling itself is never a primary concern, what does get built is rarely reused outside of ASF projects.
>
> Over the last couple of months a few of us have been working on generalizing the tooling present in the Hadoop code base (because it was the most mature out of all those in the various projects), and it's reached a point where we think we can start bringing on other downstream users. This means we need to start establishing things like a release cadence, and to grow the new contributors we have to handle more project responsibility. Personally, I think that means it's time to move out from under Hadoop to drive things as our own community. Eventually, I hope the community can help draw in a group of folks traditionally underrepresented in ASF projects, namely QA and operations folks.
>
> I think test-patch by itself has enough scope to justify a project. Having a solid set of build tools that are customizable to fit the norms of different software communities is a bunch of work. Making it work well in both the context of automated test systems like Jenkins and for individual developers is even more work. We could easily also take over maintenance of things like shelldocs, since test-patch is the primary consumer of that currently, but it's generally useful tooling.
>
> In addition to test-patch, I think the proposed project has some future growth potential. Given some adoption of test-patch to prove utility, the project could build on the ties it makes to start building tools to help projects do their own longer-run testing. Note that I'm talking about the tools to build QA processes and not a particular set of tested components. Specifically, I think the ChaosMonkey work that's in HBase should be generalizable as a fault injection framework (either based on that code or something like it). Doing this for arbitrary software is obviously very difficult, and a part of easing that will be to make (and then favor) tooling that allows projects to have operational glue that looks the same. Namely, the shell work that's been done in hadoop-functions.sh would be a great foundational layer that could bring good daemon-handling practices to a whole slew of software projects. In the event that these frameworks and tools get adopted by parts of the Hadoop ecosystem, that could make the job of, e.g., Bigtop substantially easier.
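To make the fault-injection idea above concrete, here is a minimal sketch of the kind of pluggable action such a framework might expose. The interface, class, and script names are hypothetical illustrations, not actual HBase ChaosMonkey code:

    import java.io.IOException;

    /**
     * Illustrative sketch only -- hypothetical names, not real ChaosMonkey
     * code. A generalized fault-injection framework might schedule
     * perform()/undo() pairs against a running cluster.
     */
    interface FaultAction {
        void perform() throws IOException; // inject the fault
        void undo() throws IOException;    // let the cluster recover
    }

    /** Example: stop and restart a daemon through shared operational glue. */
    class RestartDaemonAction implements FaultAction {
        private final String daemon; // e.g. "datanode" (hypothetical)

        RestartDaemonAction(String daemon) { this.daemon = daemon; }

        @Override public void perform() throws IOException {
            // Hypothetical control script standing in for the kind of common
            // shell layer (hadoop-functions.sh) described above.
            new ProcessBuilder("bin/daemon-control.sh", "stop", daemon).inheritIO().start();
        }

        @Override public void undo() throws IOException {
            new ProcessBuilder("bin/daemon-control.sh", "start", daemon).inheritIO().start();
        }
    }

The point of a shared shell layer is that the same perform()/undo() pair would work across any project that adopts the common daemon-handling conventions.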
> I've reached out to a few folks who have been involved in the current test-patch work or who have expressed interest in helping get it used in other projects. Right now, the proposed PMC would be (alphabetical by last name):
>
> * Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc, jclouds pmc, sqoop pmc, all-around Jenkins expert)
> * Sean Busbey (ASF member, accumulo pmc, hbase pmc)
> * Nick Dimiduk (hbase pmc, phoenix pmc)
> * Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
> * Andrew Purtell (ASF member, incubator pmc, bigtop pmc, hbase pmc, phoenix pmc)
> * Allen Wittenauer (hadoop committer)
>
> That PMC gives us several ASF members and a bunch of folks familiar with the ASF. Combined with the code already existing in Apache spaces, I think that gives us sufficient justification for a direct board proposal.
>
> The planned project name is "Apache Yetus". It's an archaic genus of sea snail, and most of our project will be focused on shell scripts.
>
> N.b.: this does not mean that the Hadoop community would _have_ to rely on the new TLP, but I hope that once we have a release that can be evaluated, there'd be enough benefit to strongly encourage it.
>
> This has mostly been focused on scope and community issues, and I'd love to talk through any feedback on that. Additionally, are there any other points folks want to make sure are covered before we have a resolution?
>
> On Sat, Jun 6, 2015 at 10:43 PM, Sean Busbey <bus...@cloudera.com> wrote:
>
>> Sorry for the resend. I figured this deserves a [DISCUSS] flag.
>>
>> On Sat, Jun 6, 2015 at 10:39 PM, Sean Busbey <bus...@cloudera.com> wrote:
>>
>>> Hi folks!
>>>
>>> After working on test-patch with other folks for the last few months, I think we've reached the point where we can make the fastest progress towards the goal of a general-use pre-commit patch tester by spinning things into a project focused on just that. I think we have a mature enough code base and a sufficient fledgling community, so I'm going to put together a TLP proposal.
>>>
>>> Thanks for the feedback thus far from use within Hadoop. I hope we can continue to make things more useful.
>>>
>>> -Sean
>>>
>>> On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey <bus...@cloudera.com> wrote:
>>>
>>>> HBase's dev-support folder is where the scripts and support files live. We've only recently started adding anything to the maven builds that's specific to jenkins[1]; so far it's diagnostic stuff, but that's where I'd add in more if we ran into the same permissions problems y'all are having.
>>>>
>>>> There's also our precommit job itself, though it isn't large[2]. AFAIK, we don't properly back this up anywhere; we just notify each other of changes on a particular mail thread[3].
>>>>
>>>> [1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
>>>> [2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're all red because I just finished fixing "mvn site" running out of permgen)
>>>> [3]: http://s.apache.org/NT0
>>>>
>>>> On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth <cnaur...@hortonworks.com> wrote:
>>>>
>>>>> Sure, thanks Sean! Do we just look in the dev-support folder in the HBase repo? Is there any additional context we need to be aware of?
>>>>>
>>>>> Chris Nauroth
>>>>> Hortonworks
>>>>> http://hortonworks.com/
>>>>>
>>>>> On 3/11/15, 2:44 PM, "Sean Busbey" <bus...@cloudera.com> wrote:
>>>>>
>>>>>> +dev@hbase
>>>>>>
>>>>>> HBase has recently been cleaning up our precommit jenkins jobs to make them more robust. From what I can tell, our stuff started off as an earlier version of what Hadoop uses for testing.
>>>>>>
>>>>>> Folks on either side open to an experiment of combining our precommit check tooling? In principle we should be looking for the same kinds of things.
>>>>>>
>>>>>> Naturally we'll still need different jenkins jobs to handle different resource needs, and we'd need to figure out where stuff eventually lives, but that could come later.
>>>>>>
>>>>>> On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <cnaur...@hortonworks.com> wrote:
>>>>>>
>>>>>>> The only thing I'm aware of is the failOnError option:
>>>>>>>
>>>>>>> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
>>>>>>>
>>>>>>> I prefer that we don't disable this, because ignoring different kinds of failures could leave our build directories in an indeterminate state. For example, we could end up with an old class file on the classpath for test runs that was supposedly deleted.
>>>>>>>
>>>>>>> I think it's worth exploring Eddy's suggestion to try simulating failure by placing a file where the code expects to see a directory. That might even let us enable some of these tests that are skipped on Windows, because Windows allows access for the owner even after permissions have been stripped.
>>>>>>>
>>>>>>> Chris Nauroth
>>>>>>> Hortonworks
>>>>>>> http://hortonworks.com/
>>>>>>>
>>>>>>> On 3/11/15, 2:10 PM, "Colin McCabe" <cmcc...@alumni.cmu.edu> wrote:
>>>>>>>
>>>>>>>> Is there a maven plugin or setting we can use to simply remove directories that have no executable permissions on them? Clearly we have the permission to do this from a technical point of view (since we created the directories as the jenkins user); it's simply that the code refuses to do it.
>>>>>>>>
>>>>>>>> Otherwise I guess we can just fix those tests...
>>>>>>>>
>>>>>>>> Colin
>>>>>>>>
>>>>>>>> On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <l...@cloudera.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks a lot for looking into HDFS-7722, Chris.
>>>>>>>>>
>>>>>>>>> In HDFS-7722:
>>>>>>>>> TestDataNodeVolumeFailureXXX tests reset data dir permissions in TearDown().
>>>>>>>>> TestDataNodeHotSwapVolumes resets permissions in a finally clause.
>>>>>>>>>
>>>>>>>>> Also, I ran mvn test several times on my machine and all tests passed.
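For readers who don't have the HDFS tests in front of them, the cleanup pattern Lei describes looks roughly like this. A simplified sketch, not the actual HDFS test code:

    import java.io.File;

    import org.junit.Test;
    import static org.junit.Assert.assertTrue;

    // Illustrative only: strip execute permission to simulate a failed
    // disk, then restore it in a finally block so the workspace stays
    // deletable even when the test body throws.
    public class VolumeFailurePatternSketch {

        @Test
        public void simulatedDiskFailureRestoresPermissions() throws Exception {
            File dataDir = new File("target/test/data/dfs/data/data3");
            assertTrue(dataDir.isDirectory() || dataDir.mkdirs());
            dataDir.setExecutable(false); // simulate the failed volume
            try {
                // ... exercise volume-failure handling against dataDir ...
            } finally {
                // Without this, a later "mvn clean" cannot delete the
                // directory -- exactly the Jenkins failure discussed below.
                assertTrue(dataDir.setExecutable(true));
            }
        }
    }

The weakness Chris notes below still applies: if the JVM dies inside the try block, the finally clause never runs and the stripped permissions persist on the build slave.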
>>>>>>>>> However, note that DiskChecker#checkDirAccess() rejects anything that is not a directory:
>>>>>>>>>
>>>>>>>>>   private static void checkDirAccess(File dir) throws DiskErrorException {
>>>>>>>>>     if (!dir.isDirectory()) {
>>>>>>>>>       throw new DiskErrorException("Not a directory: " + dir.toString());
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>>     checkAccessByFileMethods(dir);
>>>>>>>>>   }
>>>>>>>>>
>>>>>>>>> One potentially safer alternative is replacing the data dir with a regular file to simulate disk failures.
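A sketch of that alternative, assuming the checkDirAccess() behavior quoted above (illustrative code, not from HDFS): a regular file in place of the data directory trips the isDirectory() check, so no permission bits ever need to be stripped or restored:

    import java.io.File;
    import java.io.IOException;

    // Hypothetical sketch: simulate a failed volume by replacing the data
    // directory with a regular file. A DiskChecker-style check then fails
    // on isDirectory(), with no permission changes to undo afterwards.
    public class SimulateDiskFailureSketch {
        static void simulateVolumeFailure(File dataDir) throws IOException {
            // Remove the directory first (sketch assumes it is empty).
            if (dataDir.isDirectory() && !dataDir.delete()) {
                throw new IOException("could not remove " + dataDir);
            }
            // A plain file now sits where the volume directory used to be.
            if (!dataDir.createNewFile()) {
                throw new IOException("could not create placeholder " + dataDir);
            }
        }
    }

This sidesteps the cleanup problem entirely: even if the test dies early, nothing is left behind that "mvn clean" can't delete.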
>>>>>>>>>
>>>>>>>>> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <cnaur...@hortonworks.com> wrote:
>>>>>>>>>
>>>>>>>>>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure, TestDataNodeVolumeFailureReporting, and TestDataNodeVolumeFailureToleration all remove executable permissions from directories like the one Colin mentioned to simulate disk failures at data nodes. I reviewed the code for all of those, and they all appear to be doing the necessary work to restore executable permissions at the end of the test. The only recent uncommitted patch I've seen that makes changes in these test suites is HDFS-7722. That patch still looks fine, though. I don't know if there are other uncommitted patches that changed these test suites.
>>>>>>>>>>
>>>>>>>>>> I suppose it's also possible that the JUnit process unexpectedly died after removing executable permissions but before restoring them. That always would have been a weakness of these test suites, regardless of any recent changes.
>>>>>>>>>>
>>>>>>>>>> Chris Nauroth
>>>>>>>>>> Hortonworks
>>>>>>>>>> http://hortonworks.com/
>>>>>>>>>>
>>>>>>>>>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <a...@cloudera.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hey Colin,
>>>>>>>>>>>
>>>>>>>>>>> I asked Andrew Bayer, who works with Apache Infra, what's going on with these boxes. He took a look and concluded that some perms are being set in those directories by our unit tests which are precluding those files from getting deleted. He's going to clean up the boxes for us, but we should expect this to keep happening until we can fix the test in question to properly clean up after itself.
>>>>>>>>>>>
>>>>>>>>>>> To help narrow down which commit it was that started this, Andrew sent me this info:
>>>>>>>>>>>
>>>>>>>>>>> "/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/ has 500 perms, so I'm guessing that's the problem. Been that way since 9:32 UTC on March 5th."
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Aaron T. Myers
>>>>>>>>>>> Software Engineer, Cloudera
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cmcc...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> A very quick (and not thorough) survey shows that I can't find any jenkins jobs that succeeded in the last 24 hours. Most of them seem to be failing with some variant of this message:
>>>>>>>>>>>>
>>>>>>>>>>>> [ERROR] Failed to execute goal org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean) on project hadoop-hdfs: Failed to clean project: Failed to delete /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3 -> [Help 1]
>>>>>>>>>>>>
>>>>>>>>>>>> Any ideas how this happened? Bad disk, unit test setting wrong permissions?
>>>>>>>>>>>>
>>>>>>>>>>>> Colin
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Lei (Eddy) Xu
>>>>>>>>> Software Engineer, Cloudera
>>>>>>
>>>>>> --
>>>>>> Sean
>>>>
>>>> --
>>>> Sean
>>>
>>> --
>>> Sean
>>
>> --
>> Sean
>
> --
> Sean

--
Sean