On 17 July 2015 at 13:08, Dimiter Naydenov
<dimiter.nayde...@canonical.com> wrote:
> On 17.07.2015 12:07, James Tunnicliffe wrote:
>> /me opens can of worms
> Thanks for starting the discussion :)
>
>>
>> Having spent perhaps too long trying to parallelise the running of
>> the unit test suite over multiple machines using various bat guano
>> crazy ideas, I know too much about this but haven't got an easy
>> fix. I do know the right fix is to re-write the very long tests
>> that we have.
>>
>> If you want to find long running tests, go test -v ./... -check.v
>> is the command to run at top level. You will get a lot of output,
>> but it isn't difficult to find tests that take longer than 10
>> seconds with grep and I am sure I could dig the script out that I
>> wrote that examines the output and tells you all tests over a
>> certain runtime.
>>
>> When you run "go test ./..." at the top of juju/juju it runs suites
>> in parallel. If you have multiple long tests in a suite then it has
>> a significant impact on the total runtime. We have no way with the
>> current tools to exclude single tests without modifying the tests
>> themselves;
>
> How about GOMAXPROCS=1 go test ./... ? Won't that force the runtime to
> run all suites sequentially?

I don't want to run them sequentially - that would be slower.

There are several things going on.

First, long tests are bad, but if they have to be long then starting
them as early as possible is good, because it is more efficient to
pack big things first and small things last. Think of a bucket: put
the big rocks in first and the sand in last and you can easily level
the sand off, but put the sand in first and you end up with a lumpy
surface.

Second, long tests tend to be ones that sit and wait for things to
happen, so they aren't very CPU intensive. If you increase GOMAXPROCS
in the hope of using that idle CPU time, you mostly end up slowing the
timing-dependent tests down enough to make them fail.

Third, the scheduler that runs the tests seems to (though I haven't
looked at the code) run one suite per process, each suite single
threaded, in alphabetical directory search order. Our longer suites
tend to be closer to the end of that list than the start, so the
schedule isn't optimal.
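
To put rough numbers on the bucket analogy, here is a minimal sketch
in Python. The suite runtimes and the worker count are made up for
illustration, and the greedy "next suite goes to whichever worker
frees up first" model is an assumption about how the runner behaves,
not something checked against the go tool's source.

```python
import heapq


def makespan(durations, workers):
    """Total wall time when each job, taken in the given order,
    is handed to the worker that becomes free first."""
    free_at = [0] * workers          # time at which each worker is idle
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)   # earliest available worker
        heapq.heappush(free_at, start + d)
    return max(free_at)


# Hypothetical suite runtimes in seconds, in directory order,
# with the one long suite at the end of the list.
suites = [5, 5, 5, 5, 5, 5, 5, 5, 40]
workers = 4

as_listed = makespan(suites, workers)
longest_first = makespan(sorted(suites, reverse=True), workers)

print("directory order:", as_listed, "s")
print("longest first:  ", longest_first, "s")
```

With these made-up numbers the run takes 50 seconds in directory order
and 40 seconds when the long suite starts first, because the long
suite no longer waits behind the short ones before it gets a worker.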

I know there is work ongoing to improve the go scheduler, which may
help if it looks at load and not just number of active processes.

>> if we did we could run all the tests that take less than a few
>> seconds by maintaining a list of long tests, and run those long
>> tests as a separate, parallel task. The real fix is to put some
>> effort into making the long running tests more unit test and less
>> full stack test. 30+ seconds is not what we want. The least worst
>> idea I have is making a sub-suite for tests that take > 10 seconds,
>> one test per suite, so the standard tools will run them in parallel
>> with everything else. Providing you have many CPUs there is a
>> reasonable chance this will help. It is not remotely nice though.
>
> Using go tool pprof can also help figure out why certain tests take
> a long time and/or a lot of memory. I'm planning to experiment with
> it and come up with some feedback.

I did take a quick look a while ago, but I was a young Juju hacker and
young go hacker, so didn't get much further than looking at the
numbers and thinking "yep, they are big". I would be very surprised if
there was an easy fix for the long running tests. I expect that
testing in a different way is required. The good news is the number of
long tests is small.

These are the long tests as found by the combination of these two:
http://pastebin.ubuntu.com/11892666/
http://pastebin.ubuntu.com/11892667/

PASS: pinger_test.go:131: mongoPingerSuite.TestAgentConnectionsShutDownWhenStateDies 30.368s
PASS: fetch_test.go:60: FetchSuite.TestRun 9.003s
PASS: fetch_test.go:60: FetchSuite.TestRun 9.002s
PASS: status_test.go:2673: StatusSuite.TestStatusAllFormats 13.327s
PASS: upgradejuju_test.go:308: UpgradeJujuSuite.TestUpgradeJuju 16.219s
PASS: machine_test.go:409: MachineSuite.TestHostUnits 10.795s
PASS: machine_test.go:498: MachineSuite.TestManageEnviron 9.919s
PASS: machine_test.go:1941: mongoSuite.TestStateWorkerDialSetsWriteMajority 12.071s
PASS: unit_test.go:225: UnitSuite.TestUpgradeFailsWithoutTools 10.116s
PASS: bootstrap_test.go:142: bootstrapSuite.TestBootstrapNoToolsDevelopmentConfig 11.892s
PASS: bootstrap_test.go:123: bootstrapSuite.TestBootstrapNoToolsNonReleaseStream 11.623s
PASS: leadership_test.go:130: leadershipSuite.TestClaimLeadership 10.021s
PASS: dblog_test.go:65: dblogSuite.TestMachineAgentWithoutFeatureFlag 10.012s
PASS: dblog_test.go:83: dblogSuite.TestUnitAgentWithoutFeatureFlag 10.060s
PASS: oplog_test.go:26: oplogSuite.TestWithRealOplog 14.208s
PASS: assign_test.go:1259: assignCleanSuite.TestAssignUnitPolicyConcurrently 10.530s
PASS: assign_test.go:1259: assignCleanSuite.TestAssignUnitPolicyConcurrently 10.834s
PASS: state_test.go:189: MultiEnvStateSuite.TestWatchTwoEnvironments 9.766s
PASS: restore_test.go:98: RestoreSuite.TestReplicasetIsReset 11.175s
PASS: initiate_test.go:24: InitiateSuite.TestInitiateReplicaSet 10.075s
PASS: kvm-broker_test.go:403: kvmProvisionerSuite.TestContainerStartedAndStopped 10.056s
PASS: lxc-broker_test.go:1087: lxcProvisionerSuite.TestContainerStartedAndStopped 15.054s
PASS: provisioner_test.go:1095: ProvisionerSuite.TestSetInstanceInfoFailureSetsErrorStatusAndStopsInstanceButKeepsGoing 10.144s
PASS: uniter_test.go:1508: UniterSuite.TestActionEvents 39.614s
PASS: uniter_test.go:1114: UniterSuite.TestUniterRelations 16.092s
PASS: uniter_test.go:970: UniterSuite.TestUniterUpgradeGitConflicts 10.982s
{'quick': 71, 'long': 19, 'ok': 51, 'sub-second': 6970, 'very-long': 26}
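
For anyone who wants to produce a list like this without the pastebin
scripts, here is a minimal sketch of the idea in Python. It is not the
script linked above. The bucket names are copied from the summary
line, but the boundaries between them (and their ordering) are guesses,
and the regular expression assumes PASS lines shaped like the ones in
this mail.

```python
import re
from collections import Counter

# Matches verbose PASS lines such as
#   PASS: uniter_test.go:1508: UniterSuite.TestActionEvents 39.614s
# \s+ also spans line breaks, so lines wrapped by a mail client still match.
PASS_LINE = re.compile(r"PASS:\s+(\S+?):(\d+):\s+(\S+)\s+(\d+\.\d+)s")

# Assumed lower bounds in seconds for each bucket (hypothetical values).
BUCKETS = [(9.0, "very-long"), (5.0, "long"), (2.0, "ok"), (1.0, "quick")]


def parse(output):
    """Return (test name, seconds) for every PASS line in the output."""
    return [(m.group(3), float(m.group(4)))
            for m in PASS_LINE.finditer(output)]


def bucket(seconds):
    """Name of the first bucket whose lower bound the runtime reaches."""
    for floor, name in BUCKETS:
        if seconds >= floor:
            return name
    return "sub-second"


def summarise(output, threshold=10.0):
    """Tests at or over the threshold (slowest first) plus bucket counts."""
    results = parse(output)
    slow = sorted((r for r in results if r[1] >= threshold),
                  key=lambda r: -r[1])
    counts = Counter(bucket(seconds) for _, seconds in results)
    return slow, dict(counts)


sample = """\
PASS: machine_test.go:1941:
mongoSuite.TestStateWorkerDialSetsWriteMajority 12.071s
PASS: uniter_test.go:1508: UniterSuite.TestActionEvents 39.614s
PASS: fetch_test.go:60: FetchSuite.TestRun 9.003s
"""

slow, counts = summarise(sample)
for name, seconds in slow:
    print(f"{seconds:8.3f}s  {name}")
print(counts)
```

To use it on a real run, read the saved test output from a file
instead of the sample string and adjust the threshold to taste.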

>>
>> 0 ✓ dooferlad@homework2
>> ~/dev/go/src/github.com/juju/juju/worker/uniter $ go test -check.v
>>
>> Shorter tests deleted from this list. The longest are:
>> PASS: uniter_test.go:1508: UniterSuite.TestActionEvents 39.711s
>> PASS: uniter_test.go:1114: UniterSuite.TestUniterRelations 16.276s
>> PASS: uniter_test.go:970: UniterSuite.TestUniterUpgradeGitConflicts 11.354s
>>
>> These are worth a look:
>> PASS: uniter_test.go:2053: UniterSuite.TestLeadership 5.146s
>> PASS: util_unix_test.go:103: UniterSuite.TestRunCommand 6.946s
>> PASS: uniter_test.go:2104: UniterSuite.TestStorage 4.593s
>> PASS: uniter_test.go:1367: UniterSuite.TestUniterCollectMetrics 4.102s
>> PASS: uniter_test.go:774: UniterSuite.TestUniterDeployerConversion 6.904s
>> PASS: uniter_test.go:427: UniterSuite.TestUniterDyingReaction 5.772s
>> PASS: uniter_test.go:393: UniterSuite.TestUniterHookSynchronisation 4.546s
>> PASS: uniter_test.go:1274: UniterSuite.TestUniterRelationErrors 4.536s
>> PASS: uniter_test.go:476: UniterSuite.TestUniterSteadyStateUpgrade 6.405s
>> PASS: uniter_test.go:895: UniterSuite.TestUniterUpgradeConflicts 6.430s
>>
>> ok   github.com/juju/juju/worker/uniter 175.014s
>>
>> James
>>
>> On 17 July 2015 at 04:59, Tim Penhey <tim.pen...@canonical.com>
>> wrote:
>>> Hi Curtis,
>>>
>>> I have been looking at some of the recent cursings from ppc64le,
>>> and the last two included timeouts for the worker/uniter tests.
>>>
>>> On my machine, amd64, i7, 16 gig ram, I get the following:
>>>
>>> $ time go test
>>> 2015-07-17 03:53:03 WARNING juju.worker.uniter upgrade123.go:26 no uniter state file found for unit unit-mysql-0, skipping uniter upgrade step
>>> OK: 51 passed
>>> PASS
>>> ok      github.com/juju/juju/worker/uniter      433.256s
>>>
>>> real    7m24.270s
>>> user    3m18.647s
>>> sys     1m2.472s
>>>
>>> Now let's ignore the logging output that someone should fix,
>>> we can see how long it takes here. Given that gccgo on power is
>>> slower, we are going to do two things:
>>>
>>> 1) increase the timeouts for the uniter
>>>
>>> 2) change the uniter tests
>>>
>>> WRT point 2, most of the uniter tests are actually fully
>>> functional end to end tests, and should not be run every time we
>>> land code.
>>>
>>> They should be moved into the featuretest package.
>>>
>>> Thanks, Tim
>>>
>>> -- Juju-dev mailing list Juju-dev@lists.ubuntu.com Modify
>>> settings or unsubscribe at:
>>> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>>
>
>
> --
> Dimiter Naydenov <dimiter.nayde...@canonical.com>
> Juju Core Sapphire team <http://juju.ubuntu.com>
