On Mon, Nov 23, 2015 at 1:53 PM, Colin P. McCabe <cmcc...@apache.org> wrote:
> I agree that our tests are in a bad state. It would help if we could
> maintain a list of "flaky tests" somewhere in git and have Yetus
> consider the flakiness of a test before -1ing a patch. Right now, we
> pretty much all have that list in our heads, and we're not applying it
> very consistently. Having this list would also let us know where to
> concentrate our efforts to fix things.
>
> On Sun, Nov 22, 2015 at 4:21 AM, Steve Loughran <ste...@hortonworks.com>
> wrote:
>>
>> Jenkins is pretty much dead in the water these days; a test run that works
>> is a rare miracle rather than the default state. Which also means most
>> patches are being +1'd in even though their test runs are failing, with
>> comments like "the test failures are probably unrelated".
>>
>> I think everyone has to be grateful that I'm not volunteering to be release
>> manager for 2.8, as if I were I'd have already imposed a block on any
>> patches going in until Jenkins was stable. That is: nothing but test fixes
>> would go in.
>>
>> As it is, at least for the next couple of weeks, I'm going to experiment
>> with reverting patches which break the build. Usually those breakages are
>> being fixed, eventually, with followup patches. With a "patches which break
>> the build get reverted" policy, whoever submitted that first patch gets to
>> write the fix *and test it again*. This should encourage people to be more
>> rigorous first time round.
>>
>> 1. Yes, I'm going to have to be ruthless and do this for myself too. Or
>> others can. I'm not doing much (any?) core Hadoop coding right now, so
>> I'm more isolated.
>> 2. No, I don't plan to show favouritism: break the build and it gets
>> rolled back.
>> 3. We can review this in a week or two to see how it goes. And someone
>> else can volunteer to keep Jenkins happy.
>> 4. I'll get a smaller fix for HDFS-9263 in.
>> 5. I've also started running Slider 0.90-SNAPSHOT test runs with Hadoop
>> 2.8.0-SNAPSHOT, so I'm being the first to find problems beyond Jenkins. So
>> far HADOOP-12050 is the first blocker. It went in in August, which shows we
>> aren't doing enough cross-version testing beyond just Jenkins. That breakage
>> (HADOOP-12587) is stopping my test code working against secure clusters. If
>> I was being really harsh I'd have reverted that too, but it's been in long
>> enough that I think a fix is probably the best solution.
>
> Well, this is already directly contradicting point #2, isn't it? :)

Just to be clear, I'm not trying to imply that this was favoritism (I
don't think it was), just that a revert is not always the right
solution. A short discussion usually helps to find the right one,
which could be a revert, a follow-on fix, or something else.

best,
Colin

>
> I am open to being more critical about patches going in, but I think
> we should have some very minimal discussion before reverting things.
> It's just polite.
>
> Colin
>
>
>> 6. Finally: everyone should feel free to fix tests. Don't be shy now!
>>
>> Given this is a US vacation week, it should be a quieter week for breakages.
>>
>> Sorry, but if we can't even get Jenkins stable, then what hope do we have
>> of getting a 2.8 release working?
>>
>> -Steve