On Wednesday 2022-10-05 23:24, Karl Berry wrote:
>
>What troubles me most is that there's no obvious way to debug any test
>failure involving parallelism, since they go away with serial execution.
>Any ideas about how to determine what is going wrong in the parallel
>make? Any way to make parallel failures more reproducible?

1. Throw more processes into the mix (make -jN with a larger-than-normal N),
   which widens the window in which a race can hit, so that either
   - for each (single) process, the "critical section" execution time goes up, or
   - for the whole job set, the total time spent in/around critical sections
     goes up.
2. Determine exactly which (sub-)program and syscall failed, in which process
   of which job (strace), then construct a hypothesis around that failure.
3. Watch whether any one job is somehow executed twice, or a file is written
   to concurrently:

	foo: foo.c foo.h
		ld -o foo ...
	foo.c foo.h:
		generate_from_somewhere
3b. Or a file is read and written concurrently:

	%.o: %.c
		generate_version.h
		cc -c -o $@ $<
	foo: foo.o bar.o

    (where foo.c and bar.c are not generated, and each has a
    #include "version.h")
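
One possible way to close the read/write race in 3b, as a generic sketch
rather than a fix taken from any particular project: give version.h a rule
of its own and list it as a prerequisite of every object. The gen-version
script is a hypothetical stand-in for whatever produces the header.

```make
# Sketch only: ./gen-version is a hypothetical generator script.
OBJS = foo.o bar.o

foo: $(OBJS)
	$(CC) -o $@ $(OBJS)

# version.h has exactly one rule, so make starts the generator at most once
# per invocation. Writing to a temporary name and renaming means a reader
# never sees a half-written header.
version.h:
	./gen-version > $@.tmp
	mv $@.tmp $@

# Listing version.h as a prerequisite orders every compile after the
# generator. $< is still the .c file, because it is the first prerequisite.
%.o: %.c version.h
	$(CC) $(CFLAGS) -c -o $@ $<
```

As written, version.h is only generated when it is missing. Regenerating it
on every build needs extra care so that unchanged content does not trigger
a full recompile.
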
I've seen something like that in libtracefs commit
b64dc07ca44ccfed40eae8d345867fd938ce6e0e
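
For the double execution in item 3, a minimal sketch assuming GNU make 4.3
or later: a grouped target tells make that a single run of the recipe
produces both files. generate_from_somewhere is the same placeholder as in
the example above, and the link line here is illustrative only.

```make
# Sketch only: generate_from_somewhere stands in for the real generator.

foo: foo.c foo.h
	$(CC) -o $@ foo.c

# "&:" declares foo.c and foo.h as a grouped target (GNU make >= 4.3):
# the recipe runs once and is understood to update both files, so
# make -j no longer starts it once per target.
foo.c foo.h &:
	generate_from_somewhere
```

With a plain ":" the rule is treated as two independent rules sharing one
recipe, which is why a parallel build can run the generator twice at the
same time. On older make versions a stamp file is the usual alternative.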