Gregory P. Smith <g...@krypto.org> added the comment:

re: slow tests in the first half of the list: the same total amount of CPU time is going to be spent regardless.  In our test suite on a modern fast 16-thread system, all but ~10 tests complete in parallel within the first 30 seconds.  The remaining ~10 take 10x+ that wall time, running on for several more minutes.

So the most latency you will shave off on a modern system is probably <30 seconds.  On a slower system the absolute numbers grow, but the savings stay the same in proportion.  CI systems are not workstations.  On a -j1 or -j2 system I doubt it will make a meaningful difference at all.

Picture test execution as a utilization graph:

```
|ttttttttttttttttttttttt
|                       tttt
|                           ttt
|                              tttttttttt
+----------------------------------------
```

The total area under that curve is going to remain the same no matter what so 
long as we execute everything.  Reordering the tests can pull the final long 
tail in a bit by pushing out the top layer.  You move more towards an optimal 
rectangle, but you're still limited by the area.  **The fewer CPU cores you have backing -jN parallelism, the less difference any reordering change makes.**
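To make that concrete, here's a toy simulation (all durations invented) of greedy scheduling onto -jN workers.  Running it shows reordering shaving ~2 minutes off the tail at -j16 and nothing at all at -j2:

```
import heapq

def makespan(durations, workers):
    """Greedy list scheduling: each test goes to whichever worker frees up first."""
    finish = [0.0] * workers
    heapq.heapify(finish)
    for d in durations:
        heapq.heappush(finish, heapq.heappop(finish) + d)
    return max(finish)

# Invented durations: 400 quick tests plus a few multi-minute stragglers.
quick = [5.0] * 400
slow = [300.0, 250.0, 200.0, 150.0]

for j in (2, 16):
    tail_last = makespan(quick + slow, j)                         # slow tests hit late
    tail_first = makespan(sorted(quick + slow, reverse=True), j)  # slow tests first
    print(f"-j{j}: slow-last={tail_last:.0f}s  slow-first={tail_first:.0f}s")
```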

What actual parallelism do our GitHub CI systems offer?
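One way to answer that empirically would be a throwaway step in a workflow that just asks the runner (plain stdlib, nothing GitHub-specific):

```
import os

print("os.cpu_count():", os.cpu_count())
# On Linux the scheduler affinity mask can be narrower than the machine's
# core count, and that is what actually limits useful -jN:
if hasattr(os, "sched_getaffinity"):
    print("usable CPUs:", len(os.sched_getaffinity(0)))
```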

The fundamental problem is that we do a LOT in our test suite and have no 
concept of what depends on what and thus _needs_ to be run.  So we run it all.  
For specialized tests like test_peg_generator and test_tools it should be easy 
to determine from a list of modified files if those tests are relevant.

That gets a lot more complicated to accurately express for things like 
test_multiprocessing and test_concurrent_futures.
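For the easy cases, though, even a crude path-prefix mapping would go a long way.  A sketch of the idea (the mapping entries and the base ref are illustrative guesses, not a vetted dependency map):

```
import subprocess

# Hypothetical mapping from source path prefixes to the tests they imply.
# Building an *accurate* map is the hard part; these entries are examples only.
PREFIX_TO_TESTS = {
    "Tools/peg_generator/": {"test_peg_generator"},
    "Grammar/": {"test_peg_generator"},
    "Tools/": {"test_tools"},
}

def relevant_tests(base="origin/main"):
    """Return the tests implied by files changed relative to `base`."""
    changed = subprocess.check_output(
        ["git", "diff", "--name-only", base], text=True
    ).splitlines()
    selected = set()
    for path in changed:
        for prefix, tests in PREFIX_TO_TESTS.items():
            if path.startswith(prefix):
                selected |= tests
    return sorted(selected)

print(relevant_tests())
```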

test_peg_generator and test_tools are also *packages of tests* that should themselves be parallelized individually instead of being treated as a single serialized unit.
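Roughly, something like this could enumerate the inner units so each one could be dispatched as its own parallel task (this is not regrtest's actual interface, just stdlib introspection):

```
import pkgutil
import test.test_tools  # a package of tests, scheduled today as one unit

# List the submodules so each could be scheduled separately instead of
# running the whole package serially under a single worker.
units = [
    f"test.test_tools.{name}"
    for _, name, _ in pkgutil.iter_modules(test.test_tools.__path__)
    if name.startswith("test_")
]
print(units)
```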

At work we even shard test methods within TestCase classes so that big ones can be split across test executor tasks; see the _setup_sharding() function in absltest here:
https://github.com/abseil/abseil-py/blob/main/absl/testing/absltest.py#L2368
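The core idea is small.  A minimal standalone rendition of the method-level filter (the gist only, not absltest's actual implementation, which also handles a shard status file and other protocol details):

```
import os
import unittest

# Bazel-style sharding protocol, which absltest also speaks: the runner
# launches N copies of the test binary with these environment variables set.
SHARD_INDEX = int(os.environ.get("TEST_SHARD_INDEX", "0"))
TOTAL_SHARDS = int(os.environ.get("TEST_TOTAL_SHARDS", "1"))

class ShardedLoader(unittest.TestLoader):
    """Keep only every TOTAL_SHARDS-th test method, offset by SHARD_INDEX."""

    def getTestCaseNames(self, testCaseClass):
        names = super().getTestCaseNames(testCaseClass)
        # Round-robin bucketing: shard i runs names[i], names[i + N], ...
        return names[SHARD_INDEX::TOTAL_SHARDS]

if __name__ == "__main__":
    unittest.main(testLoader=ShardedLoader())
```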

In the absence of implementing an approach like that within test.regrtest, sharding at a more granular level so we can approach the golden rectangle of optimal parallel test latency, we're left with manually splitting long-running test modules/packages into smaller units to achieve a similar effect.

----------
nosy: +gregory.p.smith

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue46524>
_______________________________________