Hi Folks! Work in a feature branch is now being tracked by HADOOP-12111.

On Thu, Jun 18, 2015 at 10:07 PM, Sean Busbey <bus...@cloudera.com> wrote:

> It looks like we have consensus.
>
> I'll start drafting up a proposal for the next board meeting (July 15th).
> Once we work out the name I'll submit a PODLINGNAMESEARCH jira to track
> that we did due diligence on whatever we pick.
>
> In the mean time, Hadoop PMC, would y'all be willing to host us in a
> branch so that we can start prepping things now? We would want branch
> commit rights for the proposed new PMC.
>
> -Sean
>
> On Mon, Jun 15, 2015 at 6:47 PM, Sean Busbey <bus...@cloudera.com> wrote:
>
>> Oof. I had meant to push on this again, but life got in the way and now
>> the June board meeting is upon us. Sorry everyone. In the event that
>> this ends up contentious, hopefully one of the copied communities can
>> give us a branch to work in.
>>
>> I know everyone is busy, so here's the short version of this email: I'd
>> like to move some of the code currently in Hadoop (test-patch) into a
>> new TLP focused on QA tooling. I'm not sure what the best format for
>> priming this conversation is. ORC filled in the incubator project
>> proposal template, but I'm not sure how much that confused the issue.
>> So to start, I'll just write out what I'm hoping we can accomplish in
>> general terms.
>>
>> All community-based software development projects (that is, those
>> accepting outside contributions) face a common QA problem when vetting
>> incoming contributions. Hadoop is fortunate enough to be sufficiently
>> popular that the weight of the problem drove tool development (i.e.
>> test-patch). That tool is generalizable enough that a bunch of other
>> TLPs have adopted their own forks. Unfortunately, in most projects this
>> kind of QA work is an enabler rather than a primary concern, so the
>> tooling is often worked on ad hoc, and few improvements are shared
>> across projects.
>> Since the tooling itself is never a primary concern, any progress made
>> is rarely reused outside of ASF projects.
>>
>> Over the last couple of months a few of us have been working on
>> generalizing the tooling present in the Hadoop code base (because it
>> was the most mature out of all those in the various projects), and it's
>> reached a point where we think we can start bringing on other
>> downstream users. This means we need to start establishing things like
>> a release cadence, and to grow the new contributors we have to handle
>> more project responsibility. Personally, I think that means it's time
>> to move out from under Hadoop to drive things as our own community.
>> Eventually, I hope the community can help draw in a group of folks
>> traditionally underrepresented in ASF projects, namely QA and
>> operations folks.
>>
>> I think test-patch by itself has enough scope to justify a project.
>> Having a solid set of build tools that are customizable to fit the
>> norms of different software communities is a bunch of work. Making it
>> work well both in the context of automated test systems like Jenkins
>> and for individual developers is even more work. We could easily also
>> take over maintenance of things like shelldocs, since test-patch is
>> currently the primary consumer of it, but it's generally useful
>> tooling.
>>
>> In addition to test-patch, I think the proposed project has some future
>> growth potential. Given some adoption of test-patch to prove utility,
>> the project could build on the ties it makes to start building tools to
>> help projects do their own longer-run testing. Note that I'm talking
>> about the tools to build QA processes, not a particular set of tested
>> components. Specifically, I think the ChaosMonkey work that's in HBase
>> should be generalizable as a fault injection framework (either based on
>> that code or something like it).
>> Doing this for arbitrary software is obviously very difficult, and part
>> of easing that will be to make (and then favor) tooling that gives
>> projects operational glue that looks the same. Namely, the shell work
>> that's been done in hadoop-functions.sh would be a great foundational
>> layer that could bring good daemon-handling practices to a whole slew
>> of software projects. In the event that these frameworks and tools get
>> adopted by parts of the Hadoop ecosystem, that could make the job of,
>> e.g., Bigtop substantially easier.
>>
>> I've reached out to a few folks who have been involved in the current
>> test-patch work or have expressed interest in helping get it used in
>> other projects. Right now, the proposed PMC would be (alphabetical by
>> last name):
>>
>> * Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc,
>>   jclouds pmc, sqoop pmc, all-around Jenkins expert)
>> * Sean Busbey (ASF member, accumulo pmc, hbase pmc)
>> * Nick Dimiduk (hbase pmc, phoenix pmc)
>> * Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
>> * Andrew Purtell (ASF member, incubator pmc, bigtop pmc, hbase pmc,
>>   phoenix pmc)
>> * Allen Wittenauer (hadoop committer)
>>
>> That PMC gives us several ASF members and a bunch of folks familiar
>> with the ASF. Combined with the code already existing in Apache spaces,
>> I think that gives us sufficient justification for a direct board
>> proposal.
>>
>> The planned project name is "Apache Yetus". It's an archaic genus of
>> sea snail, and most of our project will be focused on shell scripts.
>>
>> N.b.: this does not mean that the Hadoop community would _have_ to rely
>> on the new TLP, but I hope that once we have a release that can be
>> evaluated, there'd be enough benefit to strongly encourage it.
>>
>> This has mostly been focused on scope and community issues, and I'd
>> love to talk through any feedback on that.
>> Additionally, are there any other points folks want to make sure are
>> covered before we have a resolution?
>>
>> On Sat, Jun 6, 2015 at 10:43 PM, Sean Busbey <bus...@cloudera.com> wrote:
>>
>>> Sorry for the resend. I figured this deserves a [DISCUSS] flag.
>>>
>>> On Sat, Jun 6, 2015 at 10:39 PM, Sean Busbey <bus...@cloudera.com>
>>> wrote:
>>>
>>>> Hi Folks!
>>>>
>>>> After working on test-patch with other folks for the last few months,
>>>> I think we've reached the point where we can make the fastest
>>>> progress towards the goal of a general-use pre-commit patch tester by
>>>> spinning things into a project focused on just that. I think we have
>>>> a mature enough code base and a sufficient fledgling community, so
>>>> I'm going to put together a TLP proposal.
>>>>
>>>> Thanks for the feedback thus far from use within Hadoop. I hope we
>>>> can continue to make things more useful.
>>>>
>>>> -Sean
>>>>
>>>> On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey <bus...@cloudera.com>
>>>> wrote:
>>>>
>>>>> HBase's dev-support folder is where the scripts and support files
>>>>> live. We've only recently started adding anything to the maven
>>>>> builds that's specific to jenkins[1]; so far it's diagnostic stuff,
>>>>> but that's where I'd add in more if we ran into the same permissions
>>>>> problems y'all are having.
>>>>>
>>>>> There's also our precommit job itself, though it isn't large[2].
>>>>> AFAIK, we don't properly back this up anywhere; we just notify each
>>>>> other of changes on a particular mail thread[3].
>>>>>
>>>>> [1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
>>>>> [2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're
>>>>> all red because I just finished fixing "mvn site" running out of
>>>>> permgen)
>>>>> [3]: http://s.apache.org/NT0
>>>>>
>>>>> On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth <
>>>>> cnaur...@hortonworks.com> wrote:
>>>>>
>>>>>> Sure, thanks Sean!
>>>>>> Do we just look in the dev-support folder in the HBase repo? Is
>>>>>> there any additional context we need to be aware of?
>>>>>>
>>>>>> Chris Nauroth
>>>>>> Hortonworks
>>>>>> http://hortonworks.com/
>>>>>>
>>>>>> On 3/11/15, 2:44 PM, "Sean Busbey" <bus...@cloudera.com> wrote:
>>>>>>
>>>>>> >+dev@hbase
>>>>>> >
>>>>>> >HBase has recently been cleaning up our precommit jenkins jobs to
>>>>>> >make them more robust. From what I can tell, our stuff started off
>>>>>> >as an earlier version of what Hadoop uses for testing.
>>>>>> >
>>>>>> >Folks on either side open to an experiment of combining our
>>>>>> >precommit check tooling? In principle we should be looking for the
>>>>>> >same kinds of things.
>>>>>> >
>>>>>> >Naturally we'll still need different jenkins jobs to handle
>>>>>> >different resource needs, and we'd need to figure out where stuff
>>>>>> >eventually lives, but that could come later.
>>>>>> >
>>>>>> >On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <
>>>>>> cnaur...@hortonworks.com> wrote:
>>>>>> >
>>>>>> >> The only thing I'm aware of is the failOnError option:
>>>>>> >>
>>>>>> >> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
>>>>>> >>
>>>>>> >> I prefer that we don't disable this, because ignoring different
>>>>>> >> kinds of failures could leave our build directories in an
>>>>>> >> indeterminate state. For example, we could end up with an old
>>>>>> >> class file on the classpath for test runs that was supposedly
>>>>>> >> deleted.
>>>>>> >>
>>>>>> >> I think it's worth exploring Eddy's suggestion to try simulating
>>>>>> >> failure by placing a file where the code expects to see a
>>>>>> >> directory.
>>>>>> >> That might even let us enable some of these tests that are
>>>>>> >> skipped on Windows, because Windows allows access for the owner
>>>>>> >> even after permissions have been stripped.
>>>>>> >>
>>>>>> >> Chris Nauroth
>>>>>> >> Hortonworks
>>>>>> >> http://hortonworks.com/
>>>>>> >>
>>>>>> >> On 3/11/15, 2:10 PM, "Colin McCabe" <cmcc...@alumni.cmu.edu>
>>>>>> >> wrote:
>>>>>> >>
>>>>>> >> >Is there a maven plugin or setting we can use to simply remove
>>>>>> >> >directories that have no executable permissions on them?
>>>>>> >> >Clearly we have the permission to do this from a technical
>>>>>> >> >point of view (since we created the directories as the jenkins
>>>>>> >> >user); it's simply that the code refuses to do it.
>>>>>> >> >
>>>>>> >> >Otherwise I guess we can just fix those tests...
>>>>>> >> >
>>>>>> >> >Colin
>>>>>> >> >
>>>>>> >> >On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <l...@cloudera.com>
>>>>>> >> >wrote:
>>>>>> >> >> Thanks a lot for looking into HDFS-7722, Chris.
>>>>>> >> >>
>>>>>> >> >> In HDFS-7722:
>>>>>> >> >> TestDataNodeVolumeFailureXXX tests reset data dir permissions
>>>>>> >> >> in tearDown().
>>>>>> >> >> TestDataNodeHotSwapVolumes resets permissions in a finally
>>>>>> >> >> clause.
>>>>>> >> >>
>>>>>> >> >> Also, I ran mvn test several times on my machine and all
>>>>>> >> >> tests passed.
>>>>>> >> >>
>>>>>> >> >> However, note what DiskChecker#checkDirAccess() does:
>>>>>> >> >>
>>>>>> >> >>   private static void checkDirAccess(File dir) throws DiskErrorException {
>>>>>> >> >>     if (!dir.isDirectory()) {
>>>>>> >> >>       throw new DiskErrorException("Not a directory: " + dir.toString());
>>>>>> >> >>     }
>>>>>> >> >>     checkAccessByFileMethods(dir);
>>>>>> >> >>   }
>>>>>> >> >>
>>>>>> >> >> One potentially safer alternative is replacing the data dir
>>>>>> >> >> with a regular file to simulate disk failures.
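[Editor's note: Eddy's "file where a directory is expected" alternative can be sketched in shell terms. The paths below are stand-ins, not the real test paths; the HDFS tests do the equivalent through java.io.File.]

```shell
# Sketch of the safer failure injection: swap the data directory for a
# regular file instead of stripping its permissions. Nothing is ever left
# in an undeletable state, so a crashed test run can't wedge the workspace.
datadir="$(mktemp -d)/data3"   # hypothetical stand-in for a DataNode dir
mkdir -p "$datadir"

# Simulate the failed disk: replace the directory with a plain file.
rm -rf "$datadir"
touch "$datadir"

# A DiskChecker-style probe now fails its isDirectory() test, as intended.
if [ ! -d "$datadir" ]; then
  echo "Not a directory: $datadir"
fi

# Teardown: just remove the stand-in file; no chmod restore step needed.
rm -f "$datadir"
```

The point of the design is in the teardown: unlike the chmod-based tests, there is no permissions state to restore, so there is nothing for a dying JUnit process to leave behind.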
>>>>>> >> >>
>>>>>> >> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
>>>>>> >> >> <cnaur...@hortonworks.com> wrote:
>>>>>> >> >>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>>>>> >> >>> TestDataNodeVolumeFailureReporting, and
>>>>>> >> >>> TestDataNodeVolumeFailureToleration all remove executable
>>>>>> >> >>> permissions from directories like the one Colin mentioned to
>>>>>> >> >>> simulate disk failures at data nodes. I reviewed the code
>>>>>> >> >>> for all of those, and they all appear to be doing the
>>>>>> >> >>> necessary work to restore executable permissions at the end
>>>>>> >> >>> of the test. The only recent uncommitted patch I've seen
>>>>>> >> >>> that makes changes in these test suites is HDFS-7722. That
>>>>>> >> >>> patch still looks fine though. I don't know if there are
>>>>>> >> >>> other uncommitted patches that changed these test suites.
>>>>>> >> >>>
>>>>>> >> >>> I suppose it's also possible that the JUnit process
>>>>>> >> >>> unexpectedly died after removing executable permissions but
>>>>>> >> >>> before restoring them. That always would have been a
>>>>>> >> >>> weakness of these test suites, regardless of any recent
>>>>>> >> >>> changes.
>>>>>> >> >>>
>>>>>> >> >>> Chris Nauroth
>>>>>> >> >>> Hortonworks
>>>>>> >> >>> http://hortonworks.com/
>>>>>> >> >>>
>>>>>> >> >>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <a...@cloudera.com>
>>>>>> >> >>> wrote:
>>>>>> >> >>>
>>>>>> >> >>>>Hey Colin,
>>>>>> >> >>>>
>>>>>> >> >>>>I asked Andrew Bayer, who works with Apache Infra, what's
>>>>>> >> >>>>going on with these boxes.
>>>>>> >> >>>>He took a look and concluded that some perms are being set
>>>>>> >> >>>>in those directories by our unit tests, which are precluding
>>>>>> >> >>>>those files from getting deleted. He's going to clean up the
>>>>>> >> >>>>boxes for us, but we should expect this to keep happening
>>>>>> >> >>>>until we can fix the test in question to properly clean up
>>>>>> >> >>>>after itself.
>>>>>> >> >>>>
>>>>>> >> >>>>To help narrow down which commit it was that started this,
>>>>>> >> >>>>Andrew sent me this info:
>>>>>> >> >>>>
>>>>>> >> >>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
>>>>>> >> >>>>has 500 perms, so I'm guessing that's the problem. Been that
>>>>>> >> >>>>way since 9:32 UTC on March 5th."
>>>>>> >> >>>>
>>>>>> >> >>>>--
>>>>>> >> >>>>Aaron T. Myers
>>>>>> >> >>>>Software Engineer, Cloudera
>>>>>> >> >>>>
>>>>>> >> >>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe
>>>>>> >> >>>><cmcc...@apache.org> wrote:
>>>>>> >> >>>>
>>>>>> >> >>>>> Hi all,
>>>>>> >> >>>>>
>>>>>> >> >>>>> A very quick (and not thorough) survey shows that I can't
>>>>>> >> >>>>> find any jenkins jobs that succeeded from the last 24
>>>>>> >> >>>>> hours.
>>>>>> >> >>>>> Most of them seem to be failing with some variant of this
>>>>>> >> >>>>> message:
>>>>>> >> >>>>>
>>>>>> >> >>>>> [ERROR] Failed to execute goal
>>>>>> >> >>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
>>>>>> >> >>>>> (default-clean) on project hadoop-hdfs: Failed to clean
>>>>>> >> >>>>> project: Failed to delete
>>>>>> >> >>>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>>> >> >>>>> -> [Help 1]
>>>>>> >> >>>>>
>>>>>> >> >>>>> Any ideas how this happened? Bad disk, unit test setting
>>>>>> >> >>>>> wrong permissions?
>>>>>> >> >>>>>
>>>>>> >> >>>>> Colin
>>>>>> >> >>
>>>>>> >> >> --
>>>>>> >> >> Lei (Eddy) Xu
>>>>>> >> >> Software Engineer, Cloudera
>>>>>> >
>>>>>> >--
>>>>>> >Sean

--
Sean
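[Editor's note: the recovery Colin is asking about (a directory left at mode 500 that maven-clean then cannot delete) can be sketched as below. The paths and file name are hypothetical stand-ins for the Jenkins workspace contents quoted above.]

```shell
# Reproduce the stuck state Andrew Bayer found: a test run dies after
# chmod'ing a data dir to 500 (r-x------), leaving its contents undeletable.
workspace="$(mktemp -d)"
datadir="$workspace/target/test/data/dfs/data/data3"
mkdir -p "$datadir"
touch "$datadir/blk_1001"        # hypothetical block file inside the dir
chmod 500 "$datadir"             # plain "rm -rf" would now fail on blk_1001

# The fix: give the owner back write+exec on every directory first, then
# delete. This is what a Jenkins pre-build cleanup step could run.
chmod -R u+rwx "$workspace/target"
rm -rf "$workspace/target"

# Alternatively, maven-clean-plugin can be told to ignore such failures
# (the failOnError option Chris linked), at the cost of possibly leaving
# stale files on the classpath:
#   mvn clean -Dmaven.clean.failOnError=false
```

As Chris notes upthread, the chmod-then-delete route is preferable to disabling failOnError, since it leaves no stale build output behind.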