Re: Debugging intermittent dunit failures

Kirk Lund Fri, 15 Dec 2017 10:59:18 -0800

This is even better than I hoped for.

+1 to make the change. I already approved the PR.


On Fri, Dec 15, 2017 at 7:50 AM, Jens Deppe <jde...@pivotal.io> wrote:

> Using forkEvery=1, my distributedTest run took 3h5m which is maybe 10
> minutes slower than the random sampling I did of other tests and
> distributedTestCore took 2h35m which is on par with others.
>
> I'd say that it's worth switching to forkEvery=1 as it will most likely
> provide more consistency for test results.
>
> --Jens
>
> On Thu, Dec 14, 2017 at 4:29 PM, Patrick Rhomberg <prhomb...@pivotal.io>
> wrote:
>
> > Are the AcceptanceTests currently set to fork after every one?  It might
> be
> > apples to oranges, but we could try to look at the cost difference there.
> >
> > On Thu, Dec 14, 2017 at 4:24 PM, Kirk Lund <kl...@apache.org> wrote:
> >
> > > Thanks Jens! Let us know if you end up running dunit in parallel with
> > > forkevery 1 or if it's completely serial (one at a time). I think
> > parallel
> > > with docker with forkevery 1 is going to give us the best results but
> I'm
> > > sure you already know that since you're the docker king.
> > >
> > > This change would eliminate memory pollution. Any test pollution
> > remaining
> > > after this should be caused by dunit writing to non-TemporaryFolder
> > > directories such as current working directory or user's home directory.
> > >
> > > On Thu, Dec 14, 2017 at 4:08 PM, Jens Deppe <jde...@pivotal.io> wrote:
> > >
> > > > I'll do that.
> > > >
> > > > On Thu, Dec 14, 2017 at 2:02 PM, Kirk Lund <kl...@apache.org> wrote:
> > > >
> > > > > Someone needs to try it out. If there are any build-engineer types
> > > > working
> > > > > on geode, then my suggestion is for them to try this change and
> > report
> > > > the
> > > > > timing difference.
> > > > >
> > > > > On Tue, Dec 12, 2017 at 2:22 PM, Alexander Murmann <
> > > amurm...@pivotal.io>
> > > > > wrote:
> > > > >
> > > > > > Do we have a rough idea how forking every time would impact how
> > long
> > > > > tests
> > > > > > run?
> > > > > >
> > > > > > On Tue, Dec 12, 2017 at 1:39 PM, Kirk Lund <kl...@apache.org>
> > wrote:
> > > > > >
> > > > > > > We should just change to fork every 1 instead of 30. Wasting
> time
> > > > > trying
> > > > > > to
> > > > > > > debug statics is well... it's a waste of time. We should be
> > focused
> > > > on
> > > > > > > other things.
> > > > > > >
> > > > > > > On Mon, Dec 11, 2017 at 9:05 PM, Jinmei Liao <
> jil...@pivotal.io>
> > > > > wrote:
> > > > > > >
> > > > > > > > It doesn't call as much static methods as
> > > > JUnit4DistributedTestCase.
> > > > > > > > tearDownVM,
> > > > > > > > see MemberStarterRule.after().
> > > > > > > >
> > > > > > > > On Mon, Dec 11, 2017 at 4:36 PM, Dan Smith <
> dsm...@pivotal.io>
> > > > > wrote:
> > > > > > > >
> > > > > > > > > I don't think we are trying to reuse the distributed system
> > -
> > > it
> > > > > > gets
> > > > > > > > > disconnected after each test. See
> JUnit4DistributedTestCase.
> > > > > > > tearDownVM.
> > > > > > > > >
> > > > > > > > > Are the new junit rules also cleaning things up?
> > > > > > > > >
> > > > > > > > > -Dan
> > > > > > > > >
> > > > > > > > > On Mon, Dec 11, 2017 at 4:16 PM, Kirk Lund <
> kl...@apache.org
> > >
> > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Is there a reason we can't change DistributedTestCase and
> > > > > > subclasses
> > > > > > > to
> > > > > > > > > use
> > > > > > > > > > TemporaryFolder for all artifacts?
> > > > > > > > > >
> > > > > > > > > > We could also disconnectAllFromDS in @AfterClass (or even
> > > > @After)
> > > > > > to
> > > > > > > > get
> > > > > > > > > > things a bit more separate between dunit test classes.
> > > > > > > > > >
> > > > > > > > > > Running dunit tests in parallel is much more important
> than
> > > > > trying
> > > > > > to
> > > > > > > > > reuse
> > > > > > > > > > distributed system across multiple dunit tests. The
> latter
> > > just
> > > > > > isn't
> > > > > > > > > worth
> > > > > > > > > > the headache and trouble that it causes when static vars
> or
> > > > > > constants
> > > > > > > > or
> > > > > > > > > > disk artifacts pollute later tests.
> > > > > > > > > >
> > > > > > > > > > On Mon, Dec 11, 2017 at 1:42 PM, Dan Smith <
> > > dsm...@pivotal.io>
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > One other thing you can do is look for the below line
> in
> > > the
> > > > > logs
> > > > > > > of
> > > > > > > > > your
> > > > > > > > > > > failure. These are the tests that ran in the same JVM
> as
> > > your
> > > > > > > tests.
> > > > > > > > > This
> > > > > > > > > > > won't help if your tests are getting messed up by disk
> > > > > artifacts
> > > > > > or
> > > > > > > > > port
> > > > > > > > > > > issues, but if it is some JVM state left by a previous
> > test
> > > > it
> > > > > > > would
> > > > > > > > be
> > > > > > > > > > in
> > > > > > > > > > > this list.
> > > > > > > > > > >
> > > > > > > > > > > Previously run tests: [ClientServerMiscSelectorDUnitT
> > est,
> > > > > > > > > > > ClientConflationDUnitTest, ReliableMessagingDUnitTest]
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Dec 11, 2017 at 1:14 PM, Jens Deppe <
> > > > > > jensde...@apache.org>
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > I've recently debugged various distributed tests
> which
> > > fail
> > > > > as
> > > > > > a
> > > > > > > > > result
> > > > > > > > > > > of
> > > > > > > > > > > > prior tests not cleaning up enough. It's somewhat
> > painful
> > > > and
> > > > > > > this
> > > > > > > > is
> > > > > > > > > > my
> > > > > > > > > > > > usual debug process:
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >    - Examine the progress.txt file to determine which
> > > tests
> > > > > ran
> > > > > > > > > before
> > > > > > > > > > my
> > > > > > > > > > > >    failing test.
> > > > > > > > > > > >    - I pick 20-25 of these tests and create a Suite
> > > > > (including
> > > > > > my
> > > > > > > > > > failing
> > > > > > > > > > > >    test) - as these tests may have run in parallel,
> > it's
> > > > not
> > > > > > > clear
> > > > > > > > > > which
> > > > > > > > > > > > tests
> > > > > > > > > > > >    would have run immediately prior to your test
> > > > > > > > > > > >    - Run the whole suite to see if I can get my test
> to
> > > > fail
> > > > > > > > > > > >    - Bisect or manually iterate through the tests to
> > see
> > > > > which
> > > > > > > one
> > > > > > > > is
> > > > > > > > > > > >    causing the problem.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > The last step is painful, so I've updated SuiteRunner
> > to
> > > > use
> > > > > a
> > > > > > > > > > > 'candidate'
> > > > > > > > > > > > test class and run this class after every other class
> > in
> > > > the
> > > > > > list
> > > > > > > > of
> > > > > > > > > > > > SuiteClasses. For example:
> > > > > > > > > > > >
> > > > > > > > > > > > @Suite.SuiteClasses(value = {
> > > > > > > > > > > >     org.apache.geode.cache.snapshot.
> > > > > > SnapshotByteArrayDUnitTest.
> > > > > > > > > class,
> > > > > > > > > > > >     org.apache.geode.cache.query.dunit.
> > > > > > > > > QueryDataInconsistencyDUnitTes
> > > > > > > > > > > > t.class,
> > > > > > > > > > > >     org.apache.geode.cache.query.internal.index.
> > > > > > > > > > > > MultiIndexCreationDUnitTest.class,
> > > > > > > > > > > > })
> > > > > > > > > > > >  @SuiteRunner.Candidate(org.apache.geode.management.
> > > > > > > > > > > > internal.configuration.
> ClusterConfigDistributionDUnit
> > > > > > Test.class)
> > > > > > > > > > > > @RunWith(SuiteRunner.class)
> > > > > > > > > > > > public class DebugSuite {
> > > > > > > > > > > > }
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > The Candidate is optional, but this would run the
> > > following
> > > > > > > tests:
> > > > > > > > > > > >
> > > > > > > > > > > > - SnapshotByteArrayDUnitTest
> > > > > > > > > > > > - *ClusterConfigDistributionDUnitTest*
> > > > > > > > > > > > - QueryDataInconsistencyDUnitTest
> > > > > > > > > > > > - *ClusterConfigDistributionDUnitTest*
> > > > > > > > > > > > - MultiIndexCreationDUnitTest
> > > > > > > > > > > > - *ClusterConfigDistributionDUnitTest*
> > > > > > > > > > > >
> > > > > > > > > > > > --Jens
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Cheers
> > > > > > > >
> > > > > > > > Jinmei
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Debugging intermittent dunit failures

Reply via email to