Re: Debugging intermittent dunit failures

Jens Deppe Fri, 15 Dec 2017 07:51:38 -0800

Using forkEvery=1, my distributedTest run took 3h5m which is maybe 10
minutes slower than the random sampling I did of other tests and
distributedTestCore took 2h35m which is on par with others.


I'd say that it's worth switching to forkEvery=1 as it will most likely
provide more consistency for test results.

--Jens

On Thu, Dec 14, 2017 at 4:29 PM, Patrick Rhomberg <prhomb...@pivotal.io>
wrote:

> Are the AcceptanceTests currently set to fork after every one?  It might be
> apples to oranges, but we could try to look at the cost difference there.
>
> On Thu, Dec 14, 2017 at 4:24 PM, Kirk Lund <kl...@apache.org> wrote:
>
> > Thanks Jens! Let us know if you end up running dunit in parallel with
> > forkevery 1 or if it's completely serial (one at a time). I think
> parallel
> > with docker with forkevery 1 is going to give us the best results but I'm
> > sure you already know that since you're the docker king.
> >
> > This change would eliminate memory pollution. Any test pollution
> remaining
> > after this should be caused by dunit writing to non-TemporaryFolder
> > directories such as current working directory or user's home directory.
> >
> > On Thu, Dec 14, 2017 at 4:08 PM, Jens Deppe <jde...@pivotal.io> wrote:
> >
> > > I'll do that.
> > >
> > > On Thu, Dec 14, 2017 at 2:02 PM, Kirk Lund <kl...@apache.org> wrote:
> > >
> > > > Someone needs to try it out. If there are any build-engineer types
> > > working
> > > > on geode, then my suggestion is for them to try this change and
> report
> > > the
> > > > timing difference.
> > > >
> > > > On Tue, Dec 12, 2017 at 2:22 PM, Alexander Murmann <
> > amurm...@pivotal.io>
> > > > wrote:
> > > >
> > > > > Do we have a rough idea how forking every time would impact how
> long
> > > > tests
> > > > > run?
> > > > >
> > > > > On Tue, Dec 12, 2017 at 1:39 PM, Kirk Lund <kl...@apache.org>
> wrote:
> > > > >
> > > > > > We should just change to fork every 1 instead of 30. Wasting time
> > > > trying
> > > > > to
> > > > > > debug statics is well... it's a waste of time. We should be
> focused
> > > on
> > > > > > other things.
> > > > > >
> > > > > > On Mon, Dec 11, 2017 at 9:05 PM, Jinmei Liao <jil...@pivotal.io>
> > > > wrote:
> > > > > >
> > > > > > > It doesn't call as much static methods as
> > > JUnit4DistributedTestCase.
> > > > > > > tearDownVM,
> > > > > > > see MemberStarterRule.after().
> > > > > > >
> > > > > > > On Mon, Dec 11, 2017 at 4:36 PM, Dan Smith <dsm...@pivotal.io>
> > > > wrote:
> > > > > > >
> > > > > > > > I don't think we are trying to reuse the distributed system
> -
> > it
> > > > > gets
> > > > > > > > disconnected after each test. See JUnit4DistributedTestCase.
> > > > > > tearDownVM.
> > > > > > > >
> > > > > > > > Are the new junit rules also cleaning things up?
> > > > > > > >
> > > > > > > > -Dan
> > > > > > > >
> > > > > > > > On Mon, Dec 11, 2017 at 4:16 PM, Kirk Lund <kl...@apache.org
> >
> > > > wrote:
> > > > > > > >
> > > > > > > > > Is there a reason we can't change DistributedTestCase and
> > > > > subclasses
> > > > > > to
> > > > > > > > use
> > > > > > > > > TemporaryFolder for all artifacts?
> > > > > > > > >
> > > > > > > > > We could also disconnectAllFromDS in @AfterClass (or even
> > > @After)
> > > > > to
> > > > > > > get
> > > > > > > > > things a bit more separate between dunit test classes.
> > > > > > > > >
> > > > > > > > > Running dunit tests in parallel is much more important than
> > > > trying
> > > > > to
> > > > > > > > reuse
> > > > > > > > > distributed system across multiple dunit tests. The latter
> > just
> > > > > isn't
> > > > > > > > worth
> > > > > > > > > the headache and trouble that it causes when static vars or
> > > > > constants
> > > > > > > or
> > > > > > > > > disk artifacts pollute later tests.
> > > > > > > > >
> > > > > > > > > On Mon, Dec 11, 2017 at 1:42 PM, Dan Smith <
> > dsm...@pivotal.io>
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > One other thing you can do is look for the below line in
> > the
> > > > logs
> > > > > > of
> > > > > > > > your
> > > > > > > > > > failure. These are the tests that ran in the same JVM as
> > your
> > > > > > tests.
> > > > > > > > This
> > > > > > > > > > won't help if your tests are getting messed up by disk
> > > > artifacts
> > > > > or
> > > > > > > > port
> > > > > > > > > > issues, but if it is some JVM state left by a previous
> test
> > > it
> > > > > > would
> > > > > > > be
> > > > > > > > > in
> > > > > > > > > > this list.
> > > > > > > > > >
> > > > > > > > > > Previously run tests: [ClientServerMiscSelectorDUnitT
> est,
> > > > > > > > > > ClientConflationDUnitTest, ReliableMessagingDUnitTest]
> > > > > > > > > >
> > > > > > > > > > On Mon, Dec 11, 2017 at 1:14 PM, Jens Deppe <
> > > > > jensde...@apache.org>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > I've recently debugged various distributed tests which
> > fail
> > > > as
> > > > > a
> > > > > > > > result
> > > > > > > > > > of
> > > > > > > > > > > prior tests not cleaning up enough. It's somewhat
> painful
> > > and
> > > > > > this
> > > > > > > is
> > > > > > > > > my
> > > > > > > > > > > usual debug process:
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >    - Examine the progress.txt file to determine which
> > tests
> > > > ran
> > > > > > > > before
> > > > > > > > > my
> > > > > > > > > > >    failing test.
> > > > > > > > > > >    - I pick 20-25 of these tests and create a Suite
> > > > (including
> > > > > my
> > > > > > > > > failing
> > > > > > > > > > >    test) - as these tests may have run in parallel,
> it's
> > > not
> > > > > > clear
> > > > > > > > > which
> > > > > > > > > > > tests
> > > > > > > > > > >    would have run immediately prior to your test
> > > > > > > > > > >    - Run the whole suite to see if I can get my test to
> > > fail
> > > > > > > > > > >    - Bisect or manually iterate through the tests to
> see
> > > > which
> > > > > > one
> > > > > > > is
> > > > > > > > > > >    causing the problem.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > The last step is painful, so I've updated SuiteRunner
> to
> > > use
> > > > a
> > > > > > > > > > 'candidate'
> > > > > > > > > > > test class and run this class after every other class
> in
> > > the
> > > > > list
> > > > > > > of
> > > > > > > > > > > SuiteClasses. For example:
> > > > > > > > > > >
> > > > > > > > > > > @Suite.SuiteClasses(value = {
> > > > > > > > > > >     org.apache.geode.cache.snapshot.
> > > > > SnapshotByteArrayDUnitTest.
> > > > > > > > class,
> > > > > > > > > > >     org.apache.geode.cache.query.dunit.
> > > > > > > > QueryDataInconsistencyDUnitTes
> > > > > > > > > > > t.class,
> > > > > > > > > > >     org.apache.geode.cache.query.internal.index.
> > > > > > > > > > > MultiIndexCreationDUnitTest.class,
> > > > > > > > > > > })
> > > > > > > > > > >  @SuiteRunner.Candidate(org.apache.geode.management.
> > > > > > > > > > > internal.configuration.ClusterConfigDistributionDUnit
> > > > > Test.class)
> > > > > > > > > > > @RunWith(SuiteRunner.class)
> > > > > > > > > > > public class DebugSuite {
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > The Candidate is optional, but this would run the
> > following
> > > > > > tests:
> > > > > > > > > > >
> > > > > > > > > > > - SnapshotByteArrayDUnitTest
> > > > > > > > > > > - *ClusterConfigDistributionDUnitTest*
> > > > > > > > > > > - QueryDataInconsistencyDUnitTest
> > > > > > > > > > > - *ClusterConfigDistributionDUnitTest*
> > > > > > > > > > > - MultiIndexCreationDUnitTest
> > > > > > > > > > > - *ClusterConfigDistributionDUnitTest*
> > > > > > > > > > >
> > > > > > > > > > > --Jens
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Cheers
> > > > > > >
> > > > > > > Jinmei
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Debugging intermittent dunit failures

Reply via email to