+1 A separate project sounds great. It'd be good to have more
standard tooling across the ecosystem.

As a practical matter, how should projects consume releases? -C

On Mon, Jun 15, 2015 at 4:47 PM, Sean Busbey <bus...@cloudera.com> wrote:
> Oof. I had meant to push on this again but life got in the way and now the
> June board meeting is upon us. Sorry everyone. In the event that this ends
> up contentious, hopefully one of the copied communities can give us a
> branch to work in.
>
> I know everyone is busy, so here's the short version of this email: I'd
> like to move some of the code currently in Hadoop (test-patch) into a new
> TLP focused on QA tooling. I'm not sure what the best format for priming
> this conversation is. ORC filled in the incubator project proposal
> template, but I'm not sure how much that confused the issue. So to start,
> I'll just write what I'm hoping we can accomplish in general terms here.
>
> All software development projects that are community based (that is,
> accepting outside contributions) face a common QA problem for vetting
> incoming contributions. Hadoop is fortunate enough to be sufficiently
> popular that the weight of the problem drove tool development (i.e.
> test-patch). That tool is generalizable enough that a bunch of other TLPs
> have adopted their own forks. Unfortunately, in most projects this kind of
> QA work is an enabler rather than a primary concern, so the tooling is often
> worked on ad hoc and few improvements are shared across projects. Since the
> tooling itself is never a primary concern, any improvement that is made is
> rarely reused outside of ASF projects.
>
> Over the last couple months a few of us have been working on generalizing
> the tooling present in the Hadoop code base (because it was the most mature
> out of all those in the various projects) and it's reached a point where we
> think we can start bringing on other downstream users. This means we need
> to start establishing things like a release cadence and to grow our new
> contributors so they can handle more project responsibility. Personally, I
> think that means it's time to move out from under Hadoop to drive things as
> our own community. Eventually, I hope the community can help draw in a
> group of folks traditionally underrepresented in ASF projects, namely QA
> and operations folks.
>
> I think test-patch by itself has enough scope to justify a project. Having
> a solid set of build tools that are customizable to fit the norms of
> different software communities is a bunch of work. Making it work well in
> both the context of automated test systems like Jenkins and for individual
> developers is even more work. We could also easily take over maintenance of
> things like shelldocs, since test-patch is currently its primary consumer but
> it's generally useful tooling.
>
> In addition to test-patch, I think the proposed project has some future
> growth potential. Given some adoption of test-patch to prove utility, the
> project could build on the ties it makes to start building tools to help
> projects do their own longer-run testing. Note that I'm talking about the
> tools to build QA processes and not a particular set of tested components.
> Specifically, I think the ChaosMonkey work that's in HBase should be
> generalizable as a fault injection framework (either based on that code or
> something like it). Doing this for arbitrary software is obviously very
> difficult, and a part of easing that will be to make (and then favor)
> tooling to allow projects to have operational glue that looks the same.
> Namely, the shell work that's been done in hadoop-functions.sh would be a
> great foundational layer that could bring good daemon handling practices to
> a whole slew of software projects. In the event that these frameworks and
> tools get adopted by parts of the Hadoop ecosystem, that could make the job
> of, e.g., Bigtop substantially easier.
>
> I've reached out to a few folks who have been involved in the current
> test-patch work or expressed interest in helping out on getting it used in
> other projects. Right now, the proposed PMC would be (alphabetical by last
> name):
>
> * Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc, jclouds
> pmc, sqoop pmc, all around Jenkins expert)
> * Sean Busbey (ASF member, accumulo pmc, hbase pmc)
> * Nick Dimiduk (hbase pmc, phoenix pmc)
> * Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
> * Andrew Purtell  (ASF member, incubator pmc, bigtop pmc, hbase pmc,
> phoenix pmc)
> * Allen Wittenauer (hadoop committer)
>
> That PMC gives us several ASF members and a bunch of folks familiar with the
> ASF. Combined with the code already existing in Apache spaces, I think that
> gives us sufficient justification for a direct board proposal.
>
> The planned project name is "Apache Yetus". It's an archaic genus of sea
> snail and most of our project will be focused on shell scripts.
>
> N.b.: this does not mean that the Hadoop community would _have_ to rely on
> the new TLP, but I hope that once we have a release that can be evaluated
> there'd be enough benefit to strongly encourage it.
>
> This has mostly been focused on scope and community issues, and I'd love to
> talk through any feedback on that. Additionally, are there any other points
> folks want to make sure are covered before we have a resolution?
>
> On Sat, Jun 6, 2015 at 10:43 PM, Sean Busbey <bus...@cloudera.com> wrote:
>
>> Sorry for the resend. I figured this deserves a [DISCUSS] flag.
>>
>>
>>
>> On Sat, Jun 6, 2015 at 10:39 PM, Sean Busbey <bus...@cloudera.com> wrote:
>>
>>> Hi Folks!
>>>
>>> After working on test-patch with other folks for the last few months, I
>>> think we've reached the point where we can make the fastest progress
>>> towards the goal of a general use pre-commit patch tester by spinning
>>> things into a project focused on just that. I think we have a mature enough
>>> code base and a sufficient fledgling community, so I'm going to put
>>> together a TLP proposal.
>>>
>>> Thanks for the feedback thus far from use within Hadoop. I hope we can
>>> continue to make things more useful.
>>>
>>> -Sean
>>>
>>> On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey <bus...@cloudera.com> wrote:
>>>
>>>> HBase's dev-support folder is where the scripts and support files live.
>>>> We've only recently started adding anything to the maven builds that's
>>>> specific to jenkins[1]; so far it's diagnostic stuff, but that's where I'd
>>>> add in more if we ran into the same permissions problems y'all are having.
>>>>
>>>> There's also our precommit job itself, though it isn't large[2]. AFAIK,
>>>> we don't properly back this up anywhere; we just notify each other of
>>>> changes on a particular mail thread[3].
>>>>
>>>> [1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
>>>> [2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're all
>>>> red because I just finished fixing "mvn site" running out of permgen)
>>>> [3]: http://s.apache.org/NT0
>>>>
>>>>
>>>> On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth <cnaur...@hortonworks.com> wrote:
>>>>
>>>>> Sure, thanks Sean!  Do we just look in the dev-support folder in the
>>>>> HBase repo?  Is there any additional context we need to be aware of?
>>>>>
>>>>> Chris Nauroth
>>>>> Hortonworks
>>>>> http://hortonworks.com/
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 3/11/15, 2:44 PM, "Sean Busbey" <bus...@cloudera.com> wrote:
>>>>>
>>>>> >+dev@hbase
>>>>> >
>>>>> >HBase has recently been cleaning up our precommit jenkins jobs to make
>>>>> >them
>>>>> >more robust. From what I can tell our stuff started off as an earlier
>>>>> >version of what Hadoop uses for testing.
>>>>> >
>>>>> >Folks on either side open to an experiment of combining our precommit
>>>>> >check tooling? In principle we should be looking for the same kinds of
>>>>> >things.
>>>>> >
>>>>> >Naturally we'll still need different jenkins jobs to handle different
>>>>> >resource needs and we'd need to figure out where stuff eventually lives,
>>>>> >but that could come later.
>>>>> >
>>>>> >On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <cnaur...@hortonworks.com> wrote:
>>>>> >
>>>>> >> The only thing I'm aware of is the failOnError option:
>>>>> >>
>>>>> >> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
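>>>>> >>
>>>>> >> For reference, turning it off would just be a flag in the plugin
>>>>> >> configuration, something like this illustrative pom.xml snippet (the
>>>>> >> plugin version is whatever the build already uses):
>>>>> >>
>>>>> >>   <plugin>
>>>>> >>     <groupId>org.apache.maven.plugins</groupId>
>>>>> >>     <artifactId>maven-clean-plugin</artifactId>
>>>>> >>     <configuration>
>>>>> >>       <!-- ignore failures while deleting build output -->
>>>>> >>       <failOnError>false</failOnError>
>>>>> >>     </configuration>
>>>>> >>   </plugin>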
>>>>> >>
>>>>> >>
>>>>> >> I prefer that we don't disable this, because ignoring different kinds of
>>>>> >> failures could leave our build directories in an indeterminate state.
>>>>> >> For example, we could end up with an old class file on the classpath for
>>>>> >> test runs that was supposedly deleted.
>>>>> >>
>>>>> >> I think it's worth exploring Eddy's suggestion to try simulating failure
>>>>> >> by placing a file where the code expects to see a directory.  That might
>>>>> >> even let us enable some of these tests that are skipped on Windows,
>>>>> >> because Windows allows access for the owner even after permissions have
>>>>> >> been stripped.
>>>>> >>
>>>>> >> Chris Nauroth
>>>>> >> Hortonworks
>>>>> >> http://hortonworks.com/
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> On 3/11/15, 2:10 PM, "Colin McCabe" <cmcc...@alumni.cmu.edu> wrote:
>>>>> >>
>>>>> >> >Is there a maven plugin or setting we can use to simply remove
>>>>> >> >directories that have no executable permissions on them?  Clearly we
>>>>> >> >have the permission to do this from a technical point of view (since
>>>>> >> >we created the directories as the jenkins user), it's simply that the
>>>>> >> >code refuses to do it.
>>>>> >> >
>>>>> >> >Otherwise I guess we can just fix those tests...
>>>>> >> >
>>>>> >> >Colin
>>>>> >> >
>>>>> >> >On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <l...@cloudera.com> wrote:
>>>>> >> >> Thanks a lot for looking into HDFS-7722, Chris.
>>>>> >> >>
>>>>> >> >> In HDFS-7722:
>>>>> >> >> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>>>>> >> >> TearDown().
>>>>> >> >> TestDataNodeHotSwapVolumes resets permissions in a finally clause.
>>>>> >> >>
>>>>> >> >> Also I ran mvn test several times on my machine and all tests passed.
>>>>> >> >>
>>>>> >> >> However, since DiskChecker#checkDirAccess() rejects anything that is
>>>>> >> >> not a directory:
>>>>> >> >>
>>>>> >> >> private static void checkDirAccess(File dir) throws DiskErrorException {
>>>>> >> >>   if (!dir.isDirectory()) {
>>>>> >> >>     throw new DiskErrorException("Not a directory: "
>>>>> >> >>                                  + dir.toString());
>>>>> >> >>   }
>>>>> >> >>
>>>>> >> >>   checkAccessByFileMethods(dir);
>>>>> >> >> }
>>>>> >> >>
>>>>> >> >> one potentially safer alternative is replacing the data dir with a
>>>>> >> >> regular file to simulate disk failures.
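>>>>> >> >>
>>>>> >> >> Something like this rough, untested sketch is what I have in mind (the
>>>>> >> >> helper names are made up, not an existing API; it uses commons-io for
>>>>> >> >> the recursive delete):
>>>>> >> >>
>>>>> >> >> import java.io.File;
>>>>> >> >> import java.io.IOException;
>>>>> >> >> import org.apache.commons.io.FileUtils;
>>>>> >> >>
>>>>> >> >> final class DataDirFaultHelper {
>>>>> >> >>   /** Make checkDirAccess() fail by turning the data dir into a plain file. */
>>>>> >> >>   static void injectFailure(File dataDir) throws IOException {
>>>>> >> >>     FileUtils.deleteDirectory(dataDir);   // remove the real directory
>>>>> >> >>     if (!dataDir.createNewFile()) {       // leave a regular file in its place
>>>>> >> >>       throw new IOException("could not create placeholder: " + dataDir);
>>>>> >> >>     }
>>>>> >> >>   }
>>>>> >> >>
>>>>> >> >>   /** Undo the above; safe to call from TearDown() or a finally clause. */
>>>>> >> >>   static void restore(File dataDir) throws IOException {
>>>>> >> >>     if (dataDir.isFile() && !dataDir.delete()) {
>>>>> >> >>       throw new IOException("could not remove placeholder: " + dataDir);
>>>>> >> >>     }
>>>>> >> >>     if (!dataDir.isDirectory() && !dataDir.mkdirs()) {
>>>>> >> >>       throw new IOException("could not recreate directory: " + dataDir);
>>>>> >> >>     }
>>>>> >> >>   }
>>>>> >> >> }
>>>>> >> >>
>>>>> >> >> No chmod is involved, so a crashed JVM can't leave the workspace in a
>>>>> >> >> state the jenkins user is unable to clean up.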
>>>>> >> >>
>>>>> >> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <cnaur...@hortonworks.com> wrote:
>>>>> >> >>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>>>> >> >>> TestDataNodeVolumeFailureReporting, and
>>>>> >> >>> TestDataNodeVolumeFailureToleration all remove executable permissions
>>>>> >> >>> from directories like the one Colin mentioned to simulate disk
>>>>> >> >>> failures at data nodes.  I reviewed the code for all of those, and
>>>>> >> >>> they all appear to be doing the necessary work to restore executable
>>>>> >> >>> permissions at the end of the test.  The only recent uncommitted
>>>>> >> >>> patch I've seen that makes changes in these test suites is HDFS-7722.
>>>>> >> >>> That patch still looks fine though.  I don't know if there are other
>>>>> >> >>> uncommitted patches that changed these test suites.
>>>>> >> >>>
>>>>> >> >>> I suppose it's also possible that the JUnit process unexpectedly died
>>>>> >> >>> after removing executable permissions but before restoring them.
>>>>> >> >>> That always would have been a weakness of these test suites,
>>>>> >> >>> regardless of any recent changes.
>>>>> >> >>>
>>>>> >> >>> Chris Nauroth
>>>>> >> >>> Hortonworks
>>>>> >> >>> http://hortonworks.com/
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <a...@cloudera.com> wrote:
>>>>> >> >>>
>>>>> >> >>>>Hey Colin,
>>>>> >> >>>>
>>>>> >> >>>>I asked Andrew Bayer, who works with Apache Infra, what's going on
>>>>> >> >>>>with these boxes. He took a look and concluded that some perms are
>>>>> >> >>>>being set in those directories by our unit tests which are precluding
>>>>> >> >>>>those files from getting deleted. He's going to clean up the boxes
>>>>> >> >>>>for us, but we should expect this to keep happening until we can fix
>>>>> >> >>>>the test in question to properly clean up after itself.
>>>>> >> >>>>
>>>>> >> >>>>To help narrow down which commit it was that started this, Andrew
>>>>> >> >>>>sent me this info:
>>>>> >> >>>>
>>>>> >> >>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
>>>>> >> >>>>has 500 perms, so I'm guessing that's the problem. Been that way
>>>>> >> >>>>since 9:32 UTC on March 5th."
>>>>> >> >>>>
>>>>> >> >>>>--
>>>>> >> >>>>Aaron T. Myers
>>>>> >> >>>>Software Engineer, Cloudera
>>>>> >> >>>>
>>>>> >> >>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cmcc...@apache.org> wrote:
>>>>> >> >>>>
>>>>> >> >>>>> Hi all,
>>>>> >> >>>>>
>>>>> >> >>>>> A very quick (and not thorough) survey shows that I can't find any
>>>>> >> >>>>> jenkins jobs that succeeded from the last 24 hours.  Most of them
>>>>> >> >>>>> seem to be failing with some variant of this message:
>>>>> >> >>>>>
>>>>> >> >>>>> [ERROR] Failed to execute goal
>>>>> >> >>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
>>>>> >> >>>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>>>>> >> >>>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>> >> >>>>> -> [Help 1]
>>>>> >> >>>>>
>>>>> >> >>>>> Any ideas how this happened?  Bad disk, unit test setting wrong
>>>>> >> >>>>> permissions?
>>>>> >> >>>>>
>>>>> >> >>>>> Colin
>>>>> >> >>>>>
>>>>> >> >>>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> --
>>>>> >> >> Lei (Eddy) Xu
>>>>> >> >> Software Engineer, Cloudera
>>>>> >>
>>>>> >>
>>>>> >
>>>>> >
>>>>> >--
>>>>> >Sean
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Sean
>>>>
>>>
>>>
>>>
>>> --
>>> Sean
>>>
>>
>>
>>
>> --
>> Sean
>>
>
>
>
> --
> Sean
