Mark,

On 6/16/15 4:13 PM, Mark Thomas wrote:
> On 16/06/2015 20:39, Christopher Schultz wrote:
>> Mark,
>>
>> On 6/15/15 8:02 AM, Mark Thomas wrote:
>>> I have been experimenting with the free Azure credits that come with the
>>> MSDN subscription Microsoft kindly offers to all Apache committers to
>>> use for their ASF work.
>>>
>>> I have been looking at options for making the unit tests run faster.
>>>
>>> All the figures below are for running the trunk unit tests on a fully
>>> updated Ubuntu 14.04 LTS instance.
>>>
>>>
>>> A2 Basic 233:53 tests on hdd, with code coverage, 1 thread
>>> D2       120:57 tests on hdd, with code coverage, 1 thread
>>> D2       119:53 tests on ssd, with code coverage, 1 thread
>>> D2        32:16 tests on hdd, no code coverage,   2 threads
>>> D2        23:24 tests on hdd, no code coverage,   4 threads
>>>
>>> (Both A2 and D2 boxes have 2 cores. D2 has 60% faster processors).
>>>
>>> I'll be testing larger instances with more cores later.
>>>
>>> So far, I think it is safe to draw the following conclusions:
>>> - code coverage is expensive
>>> - code coverage (as currently configured) requires single thread
>>>   execution (more on this below)
>>> - 1 test thread per core definitely gives better performance
>>> - 2 test threads per core gives even better performance
>>
>> Obviously, code coverage and CPU power (more likely access to the CPU,
>> not the CPU speed itself) are bigger factors in the equation, here.
> 
> Comparing A2 and D2 above, the only difference is CPU speed.

I don't know anything about Azure specifically, but I do know that in
AWS the virtual machine classes include both differences in the CPU
itself (the hardware) and also access to certain numbers of cores. So
for example, they may have a 16-core box, but your VM will be limited to
at most one of them at a time. I was wondering if Azure did the same
kind of thing.

>> Multi-threaded is nice, but it's marginal compared to the other factors
>> (which are orders of magnitude at this point).
> 
> Not when you increase the number of cores it isn't.

Okay, we'll see with more data. Your numbers below are encouraging.
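
To put rough numbers on the figures quoted in this thread, here is a small sketch of the arithmetic. The class and method names are mine, purely for illustration; the timings are the ones you posted (D2 at 32:16 with 2 threads and 23:24 with 4 threads, D8 at 09:53 with 8 threads).

```java
// Illustrative only: computes how much elapsed time drops between two runs.
final class Speedup {
    /** Converts a "minutes:seconds" figure such as "32:16" to seconds. */
    static int toSeconds(String mmss) {
        String[] parts = mmss.split(":");
        return Integer.parseInt(parts[0]) * 60 + Integer.parseInt(parts[1]);
    }

    /** Percentage by which elapsed time dropped going from 'before' to 'after'. */
    static double reductionPercent(String before, String after) {
        int b = toSeconds(before);
        return 100.0 * (b - toSeconds(after)) / b;
    }

    public static void main(String[] args) {
        // Timings taken from the figures posted in this thread.
        System.out.printf("D2, 2 -> 4 threads: %.1f%% less time%n",
                reductionPercent("32:16", "23:24"));
        System.out.printf("D2 (2 threads) -> D8 (8 threads): %.1f%% less time%n",
                reductionPercent("32:16", "09:53"));
    }
}
```

By that arithmetic, going from 2 to 4 threads on the D2 saves a bit under 30% of the wall-clock time, while the 8-core D8 run saves roughly 70% relative to the 2-thread D2 run.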

When you say "core", do you mean a physical core or a hardware thread?
For example, most Intel processors these days have hyperthreading, which
means two hardware threads per core. I have a quad-core laptop, but it
can run 8 threads simultaneously.

I'm curious as to why you are getting I/O timeouts when Nthreads exceeds
Ncores, but I suppose that depends upon the way you count. On my laptop,
does that mean I shouldn't exceed 4 threads or 8 threads?
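
For what it's worth, the JVM's own count is of logical processors (hardware threads), not physical cores, as far as I know. A minimal sketch to check what a given box reports; the class name is just for illustration:

```java
// Illustrative only: prints the processor count the JVM sees.
final class CpuCount {
    /** Logical processors visible to this JVM (hardware threads, not physical cores). */
    static int logicalProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("Logical processors: " + logicalProcessors());
    }
}
```

On a hyperthreaded quad-core machine I would expect this to report 8, so whether "1 thread per core" means 4 or 8 depends on which count we use.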

>> One more data point would have been good to have:
>>
>> D2    ???:?? tests on hdd, no code coverage, 1 thread
> 
> Agreed. I'll set a test running and see where it ends up.
> 
> Some more figures I do have:
> 
> D8 09:53 tests on hdd, no code coverage, 8 threads (8 core box).
> 
> On my laptop (4 core) the time taken to run the unit tests dropped from
> ~60 mins to ~15 mins with 4 threads (pretty much linear).
> 
>>> Where the limit is for threads per core is TBD.
>>>
>>> I've already fixed the unit tests (I think) so parallel running is
>>> possible. I'll be adding a threads option to build.xml shortly. It will
>>> default to 1 and I'll add a comment to build.properties.default not to
>>> increase it above 1 if code coverage is enabled (I might try and detect
>>> and handle that case). Once I have data on threads vs cores I'll add
>>> that too.
>>>
>>> The reason code coverage doesn't work with the junit threads option is
>>> that cobertura serialises the coverage data between tests. If we
>>> partitioned the tests (e.g. by name) and configured separated coverage
>>> data files for each partition (merging them at the end) then cobertura
>>> would be OK. Sensibly partitioning the tests is more effort than I have
>>> time for at the moment so I am going with the simple option.
>>
>> If doubling the number of threads delivers a ~30% performance
>> improvement with code coverage enabled (extrapolating from the
>> results for merely running the tests), then a heavy-handed split of
>> the Cobertura tests into two arbitrarily-selected sets might be a
>> worthwhile trial that doesn't take too much effort.
>>
>> What do you think?
> 
> I think that is more effort than it took to get the multi-threaded tests
> working. You'll need to set up parallel junit tests in Ant, manage the
> separate code coverage files and then merge the result.

Hmm. Merging the results would be ugly, especially if we have to
reconcile two separate Cobertura runs that exercised the same piece of
code (we might have some cross-over).
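
Conceptually, the cross-over case is just a matter of summing hit counts for lines that both runs touched. A minimal sketch, assuming a made-up "file:line -> hit count" representation (this is not Cobertura's actual data format or API, and the class name is hypothetical):

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: merges per-line hit counts from two coverage runs.
final class CoverageMerge {
    /** Returns a new map; counts are summed where both runs recorded the same line. */
    static Map<String, Integer> merge(Map<String, Integer> a, Map<String, Integer> b) {
        Map<String, Integer> merged = new TreeMap<>(a);
        b.forEach((line, hits) -> merged.merge(line, hits, Integer::sum));
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical data: the two partitions overlap on Foo.java:11.
        Map<String, Integer> run1 = new TreeMap<>();
        run1.put("Foo.java:10", 3);
        run1.put("Foo.java:11", 0);

        Map<String, Integer> run2 = new TreeMap<>();
        run2.put("Foo.java:11", 2);
        run2.put("Bar.java:5", 1);

        System.out.println(merge(run1, run2));
    }
}
```

Whether the real data files can be combined this simply is something I haven't checked.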

Or we could just have two separate reports ;)

-chris
