Re: [DISCUSS] When is a test not flaky anymore?

2018-07-05 Thread Dan Smith
Honestly I've never liked the flaky category. What it means is that at some
point in the past, we decided to put off tracking down and fixing a failure
and now we're left with a bug number and a description and that's it.

I think we will be better off if we just get rid of the flaky category
entirely. That way no one can label anything else as flaky and push it off
for later, and if flaky tests fail again we will actually prioritize and
fix them instead of ignoring them.

I think Patrick was looking at rerunning the flaky tests to see what is
still failing. How about we just run the whole flaky suite some number of
times (100?), fix whatever is still failing and close out and remove the
category from the rest?
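
For concreteness, here is a rough, self-contained sketch (the FlakyTest
category and SampleTest class below are stand-ins, not actual Geode code) of
what looping over a Flaky-categorized suite with plain JUnit 4 could look like:

import org.junit.Assert;
import org.junit.Test;
import org.junit.experimental.categories.Categories;
import org.junit.experimental.categories.Category;
import org.junit.runner.JUnitCore;
import org.junit.runner.Result;
import org.junit.runner.RunWith;
import org.junit.runner.notification.Failure;
import org.junit.runners.Suite;

public class FlakyShakeout {

  // Stand-in for the real flaky-test category marker interface.
  public interface FlakyTest {}

  // Stand-in for a test class with a flaky-categorized method.
  public static class SampleTest {
    @Test
    @Category(FlakyTest.class)
    public void sometimesFails() {
      // Simulated intermittent failure, roughly 1 run in 100.
      Assert.assertTrue(Math.random() > 0.01);
    }
  }

  // A suite that selects only tests carrying the flaky category.
  @RunWith(Categories.class)
  @Categories.IncludeCategory(FlakyTest.class)
  @Suite.SuiteClasses({SampleTest.class})
  public static class FlakySuite {}

  public static void main(String[] args) {
    int runs = 100; // the "some number of times (100?)" above
    int failedRuns = 0;
    for (int i = 1; i <= runs; i++) {
      Result result = JUnitCore.runClasses(FlakySuite.class);
      for (Failure failure : result.getFailures()) {
        System.out.println("Run " + i + " failed: " + failure.getTestHeader());
      }
      if (!result.wasSuccessful()) {
        failedRuns++;
      }
    }
    System.out.println(failedRuns + " of " + runs + " runs had failures");
  }
}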

I think we will get more benefit from shaking out and fixing the issues we
have in the current codebase than we will from carefully explaining the
flaky failures from the past.

-Dan

On Thu, Jul 5, 2018 at 7:03 PM, Dale Emery  wrote:

> Hi Alexander and all,
>
> > On Jul 5, 2018, at 5:11 PM, Alexander Murmann 
> wrote:
> >
> > Hi everyone!
> >
> > Dan Smith started a discussion about shaking out more flaky DUnit tests.
> > That's a great effort and I am happy it's happening.
> >
> > As a corollary to that conversation I wonder what the criteria should be
> > for a test to not be considered flaky any longer and have the category
> > removed.
> >
> > In general the bar should be fairly high. Even if a test only fails ~1 in
> > 500 runs that's still a problem given how many tests we have.
> >
> > I see two ends of the spectrum:
> > 1. We have a good understanding why the test was flaky and think we fixed
> > it.
> > 2. We have a hard time reproducing the flaky behavior and have no good
> > theory as to why the test might have shown flaky behavior.
> >
> > In the first case I'd suggest running the test ~100 times to get a little
> > more confidence that we fixed the flaky behavior, and then removing the
> > category.
>
> Here’s a test for case 1:
>
> If we really understand why it was flaky, we will be able to:
> - Identify the “faults”—the broken places in the code (whether system
> code or test code).
> - Identify the exact conditions under which those faults led to the
> failures we observed.
> - Explain how those faults, under those conditions, led to those
> failures.
> - Run unit tests that exercise the code under those same conditions,
> and demonstrate that
>   the formerly broken code now does the right thing.
>
> If we’re lacking any of these things, I’d say we’re dealing with case 2.
>
> > The second case is a lot more problematic. How often do we want to run a
> > test like that before we decide that it might have been fixed since we last
> > saw it happen? Anything else we could/should do to verify the test deserves
> > our trust again?
>
>
> I would want a clear, compelling explanation of the failures we observed.
>
> Clear and compelling are subjective, of course. For me, clear and compelling
> would include descriptions of:
>    - The faults in the code. What, specifically, was broken.
>    - The specific conditions under which the code did the wrong thing.
>    - How those faults, under those conditions, led to those failures.
>    - How the fix either prevents those conditions, or causes the formerly
>      broken code to now do the right thing.
>
> Even if we don’t have all of these elements, we may have some of them. That
> can help us calibrate our confidence. But the elements work together. If
> we’re lacking one, the others are shaky, to some extent.
>
> The more elements are missing in our explanation, the more times I’d want to
> run the test before trusting it.
>
> Cheers,
> Dale
>
> —
> Dale Emery
> dem...@pivotal.io
>
>



Re: [DISCUSS] When is a test not flaky anymore?

2018-07-05 Thread Dale Emery
Hi Alexander and all,

> On Jul 5, 2018, at 5:11 PM, Alexander Murmann  wrote:
> 
> Hi everyone!
> 
> Dan Smith started a discussion about shaking out more flaky DUnit tests.
> That's a great effort and I am happy it's happening.
> 
> As a corollary to that conversation I wonder what the criteria should be
> for a test to not be considered flaky any longer and have the category
> removed.
> 
> In general the bar should be fairly high. Even if a test only fails ~1 in
> 500 runs that's still a problem given how many tests we have.
> 
> I see two ends of the spectrum:
> 1. We have a good understanding why the test was flaky and think we fixed
> it.
> 2. We have a hard time reproducing the flaky behavior and have no good
> theory as to why the test might have shown flaky behavior.
> 
> In the first case I'd suggest running the test ~100 times to get a little
> more confidence that we fixed the flaky behavior, and then removing the
> category.

Here’s a test for case 1:

If we really understand why it was flaky, we will be able to:
- Identify the “faults”—the broken places in the code (whether system code
  or test code).
- Identify the exact conditions under which those faults led to the failures
  we observed.
- Explain how those faults, under those conditions, led to those failures.
- Run unit tests that exercise the code under those same conditions, and
  demonstrate that the formerly broken code now does the right thing.

If we’re lacking any of these things, I’d say we’re dealing with case 2.
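
To make the last item on that list concrete, here is a purely illustrative
sketch (the classes are invented for this example and are not Geode code) of a
deterministic unit test that exercises formerly broken code under exactly the
condition that used to fail only intermittently:

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class FormerlyFlakyBehaviorTest {

  // Suppose the fault was an entry expiring early because the code mixed two
  // clock sources. Injecting a single controllable clock recreates the exact
  // boundary condition on every run instead of once in hundreds of runs.
  static class FakeClock {
    private long nowMillis;

    FakeClock(long startMillis) {
      this.nowMillis = startMillis;
    }

    long now() {
      return nowMillis;
    }

    void advance(long millis) {
      nowMillis += millis;
    }
  }

  static class ExpiringEntry {
    private final FakeClock clock;
    private final long createdAtMillis;
    private final long ttlMillis;

    ExpiringEntry(FakeClock clock, long ttlMillis) {
      this.clock = clock;
      this.createdAtMillis = clock.now();
      this.ttlMillis = ttlMillis;
    }

    boolean isExpired() {
      // The fix: compare against the one injected clock, consistently.
      return clock.now() - createdAtMillis >= ttlMillis;
    }
  }

  @Test
  public void entryExpiresExactlyAtItsTtl() {
    FakeClock clock = new FakeClock(0);
    ExpiringEntry entry = new ExpiringEntry(clock, 1000);

    clock.advance(999); // one millisecond before the boundary that used to fail
    assertFalse(entry.isExpired());

    clock.advance(1); // exactly at the boundary
    assertTrue(entry.isExpired());
  }
}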

> The second case is a lot more problematic. How often do we want to run a
> test like that before we decide that it might have been fixed since we last
> saw it happen? Anything else we could/should do to verify the test deserves
> our trust again?


I would want a clear, compelling explanation of the failures we observed.

Clear and compelling are subjective, of course. For me, clear and compelling
would include descriptions of:
   - The faults in the code. What, specifically, was broken.
   - The specific conditions under which the code did the wrong thing.
   - How those faults, under those conditions, led to those failures.
   - How the fix either prevents those conditions, or causes the formerly
     broken code to now do the right thing.

Even if we don’t have all of these elements, we may have some of them. That can
help us calibrate our confidence. But the elements work together. If we’re
lacking one, the others are shaky, to some extent.

The more elements are missing in our explanation, the more times I’d want to
run the test before trusting it.

Cheers,
Dale

—
Dale Emery
dem...@pivotal.io



Geode unit tests completed in 'develop/DistributedTest' with non-zero exit code

2018-07-05 Thread apachegeodeci
Pipeline results can be found at:

Concourse: 
https://concourse.apachegeode-ci.info/teams/main/pipelines/develop/jobs/DistributedTest/builds/98



Geode unit tests completed in 'develop/AcceptanceTest' with non-zero exit code

2018-07-05 Thread apachegeodeci
Pipeline results can be found at:

Concourse: 
https://concourse.apachegeode-ci.info/teams/main/pipelines/develop/jobs/AcceptanceTest/builds/133



[DISCUSS] When is a test not flaky anymore?

2018-07-05 Thread Alexander Murmann
Hi everyone!

Dan Smith started a discussion about shaking out more flaky DUnit tests.
That's a great effort and I am happy it's happening.

As a corollary to that conversation I wonder what the criteria should be
for a test to not be considered flaky any longer and have the category
removed.

In general the bar should be fairly high. Even if a test only fails ~1 in
500 runs that's still a problem given how many tests we have.

I see two ends of the spectrum:
1. We have a good understanding why the test was flaky and think we fixed
it.
2. We have a hard time reproducing the flaky behavior and have no good
theory as to why the test might have shown flaky behavior.

In the first case I'd suggest running the test ~100 times to get a little
more confidence that we fixed the flaky behavior, and then removing the
category.

The second case is a lot more problematic. How often do we want to run a
test like that before we decide that it might have been fixed since we last
saw it happen? Anything else we could/should do to verify the test deserves
our trust again?
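
Whatever threshold we pick, one lightweight way to re-run a single test many
times in-process is a JUnit 4 rule along these lines (a sketch, not existing
Geode test infrastructure; RepeatRule and the placeholder test are made up for
illustration):

import org.junit.Assert;
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TestRule;
import org.junit.runner.Description;
import org.junit.runners.model.Statement;

public class RepeatedRunTest {

  // Re-runs each test method a fixed number of times; any single failure
  // fails the whole repeated run.
  static class RepeatRule implements TestRule {
    private final int times;

    RepeatRule(int times) {
      this.times = times;
    }

    @Override
    public Statement apply(Statement base, Description description) {
      return new Statement() {
        @Override
        public void evaluate() throws Throwable {
          for (int i = 0; i < times; i++) {
            base.evaluate();
          }
        }
      };
    }
  }

  @Rule
  public RepeatRule repeat = new RepeatRule(100);

  @Test
  public void previouslyFlakyScenario() {
    // Placeholder for the real scenario; it must pass on every iteration.
    Assert.assertTrue(1 + 1 == 2);
  }
}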


Re: [DISCUSS] Run DistributedTest more frequently

2018-07-05 Thread Dan Smith
A separate pipeline would work too. Maybe we could even create multiple
DistributedTest jobs to run in parallel that way. We'd want to create
some metrics jobs for that separate pipeline for sure.

-Dan

On Thu, Jul 5, 2018 at 4:53 PM, Anthony Baker  wrote:

> What do you think about making a separate pipeline for this?
>
> Anthony
>
>
> > On Jul 5, 2018, at 4:50 PM, Alexander Murmann 
> wrote:
> >
> > Sounds like a great idea.
> >
> > I particularly like running it on CI a bunch because it should identify
> > flaky tests and flaky product issues, but also might find problematic
> > interactions with how we run on CI.
> >
> >
> > On Thu, Jul 5, 2018 at 4:40 PM, Dan Smith  wrote:
> >
> >> We seem to keep hitting intermittent failures in DistributedTest in our
> >> concourse pipeline.
> >>
> >> I'd like to increase the frequency that DistributedTest runs in order to
> >> shake out more flaky tests. With concourse we should be able to trigger
> >> this job on a timer so that it is running continuously.
> >>
> >> What do you all think about this plan? It means more runs to analyze, but
> >> it should help us clean up this job in the long run.
> >>
> >> -Dan
> >>
>
>


Geode unit tests completed in 'develop/AcceptanceTest' with non-zero exit code

2018-07-05 Thread apachegeodeci
Pipeline results can be found at:

Concourse: 
https://concourse.apachegeode-ci.info/teams/main/pipelines/develop/jobs/AcceptanceTest/builds/132



Geode unit tests completed in 'develop/UITests' with non-zero exit code

2018-07-05 Thread apachegeodeci
Pipeline results can be found at:

Concourse: 
https://concourse.apachegeode-ci.info/teams/main/pipelines/develop/jobs/UITests/builds/131



Re: [DISCUSS] Run DistributedTest more frequently

2018-07-05 Thread Anthony Baker
What do you think about making a separate pipeline for this?

Anthony


> On Jul 5, 2018, at 4:50 PM, Alexander Murmann  wrote:
> 
> Sounds like a great idea.
> 
> I particularly like running it on CI a bunch because it should identify
> flaky tests and flaky product issues, but also might find problematic
> interactions with how we run on CI.
> 
> 
> On Thu, Jul 5, 2018 at 4:40 PM, Dan Smith  wrote:
> 
>> We seem to keep hitting intermittent failures in DistributedTest in our
>> concourse pipeline.
>> 
>> I'd like to increase the frequency that DistributedTest runs in order to
>> shake out more flaky tests. With concourse we should be able to trigger
>> this job on a timer so that it is running continuously.
>> 
>> What do you all think about this plan? It means more runs to analyze, but
>> it should help us clean up this job in the long run.
>> 
>> -Dan
>> 



[DISCUSS] Run DistributedTest more frequently

2018-07-05 Thread Dan Smith
We seem to keep hitting intermittent failures in DistributedTest in our
concourse pipeline.

I'd like to increase the frequency that DistributedTest runs in order to
shake out more flaky tests. With concourse we should be able to trigger
this job on a timer so that it is running continuously.

What do you all think about this plan? It means more runs to analyze, but
it should help us clean up this job in the long run.

-Dan


[Spring CI] Spring Data GemFire > Nightly-ApacheGeode > #969 was SUCCESSFUL (with 2423 tests)

2018-07-05 Thread Spring CI

---
Spring Data GemFire > Nightly-ApacheGeode > #969 was successful.
---
Scheduled
2425 tests in total.

https://build.spring.io/browse/SGF-NAG-969/





--
This message is automatically generated by Atlassian Bamboo

Geode unit tests completed in 'develop/AcceptanceTest' with non-zero exit code

2018-07-05 Thread apachegeodeci
Pipeline results can be found at:

Concourse: 
https://concourse.apachegeode-ci.info/teams/main/pipelines/develop/jobs/AcceptanceTest/builds/131



Re: Concourse instability

2018-07-05 Thread Sean Goller
I think we're firing on all cylinders again. I'll be monitoring it for the
rest of the day for issues.

On Thu, Jul 5, 2018 at 11:39 AM Sean Goller  wrote:

> We're experiencing a minor amount of instability relating to filled up
> disks on workers, along with massive job execution. We're recreating
> workers so for the next hour or so things may not be working optimally.
> Thank you for your patience.
>
>
> -Sean.
>
>


Concourse instability

2018-07-05 Thread Sean Goller
We're experiencing a minor amount of instability relating to filled up
disks on workers, along with massive job execution. We're recreating
workers so for the next hour or so things may not be working optimally.
Thank you for your patience.


-Sean.


Re: PR integration with concourse

2018-07-05 Thread Anthony Baker
After a bit of discussion with ASF INFRA, we can now start adding GitHub
‘checks’ to PRs.

This is awesome because it makes it easier for new and existing contributors to 
verify that their changes are good and ready to be merged.  When you submit a 
PR, the ‘pr-develop’ pipeline [1] will trigger and run the same jobs as the 
main ‘develop’ pipeline (but on your PR).  For each job that runs, there will 
be status information added to the PR along with links that go to specific 
concourse jobs.  See [2] for an example.  If you push another commit to the PR, 
the jobs will be triggered again.

Feedback and contributions are appreciated.

Anthony

[1] https://concourse.apachegeode-ci.info/teams/main/pipelines/pr-develop 

[2] https://github.com/apache/geode/pull/2106 



> On Apr 20, 2018, at 9:51 AM, Anthony Baker  wrote:
> 
> FYI, I’ve filed https://issues.apache.org/jira/browse/INFRA-16409 to allow us
> to better integrate concourse and GitHub PRs.
> 
> Anthony
> 



Geode unit tests completed in 'develop/AcceptanceTest' with non-zero exit code

2018-07-05 Thread apachegeodeci
Pipeline results can be found at:

Concourse: 
https://concourse.apachegeode-ci.info/teams/main/pipelines/develop/jobs/AcceptanceTest/builds/129



Geode unit tests completed in 'develop/AcceptanceTest' with non-zero exit code

2018-07-05 Thread apachegeodeci
Pipeline results can be found at:

Concourse: 
https://concourse.apachegeode-ci.info/teams/main/pipelines/develop/jobs/AcceptanceTest/builds/128



Geode unit tests completed in 'develop/UITests' with non-zero exit code

2018-07-05 Thread apachegeodeci
Pipeline results can be found at:

Concourse: 
https://concourse.apachegeode-ci.info/teams/main/pipelines/develop/jobs/UITests/builds/127



Build for version 1.8.0-build.1095 of Apache Geode failed.

2018-07-05 Thread apachegeodeci
=

The build job for Apache Geode version 1.8.0-build.1095 has failed.


Build artifacts are available at:
http://files.apachegeode-ci.info/builds/1.8.0-build.1095/geode-build-artifacts-1.8.0-build.1095.tgz

Test results are available at:
http://files.apachegeode-ci.info/builds/1.8.0-build.1095/test-results/build/


Job: 
https://concourse.apachegeode-ci.info/teams/main/pipelines/develop/jobs/Build/builds/139

=


Build for version 1.8.0-build.1094 of Apache Geode failed.

2018-07-05 Thread apachegeodeci
=

The build job for Apache Geode version 1.8.0-build.1094 has failed.


Build artifacts are available at:
http://files.apachegeode-ci.info/builds/1.8.0-build.1094/geode-build-artifacts-1.8.0-build.1094.tgz

Test results are available at:
http://files.apachegeode-ci.info/builds/1.8.0-build.1094/test-results/build/


Job: 
https://concourse.apachegeode-ci.info/teams/main/pipelines/develop/jobs/Build/builds/138

=