Hi Timothée,

Timothee Mathieu <[email protected]> writes:

> After further investigation, it seems that the problem was not the
> simulator itself but the fact that it simulates contacts, which
> are very sensitive to even small differences in the input actions. I
> discovered that PyTorch (and maybe other dependencies) has a
> reproducibility problem: results differ on the order of 1e-5 on AVX512
> compared to AVX2. I first tried to solve the problem by disabling AVX512
> at the level of PyTorch, but it did not work. A PyTorch developer said
> this may be because some components dispatch computation to MKL-DNN. I
> then tried to disable AVX512 in MKL, and the results were still not
> reproducible; I also tried to deactivate it in Open MPI, without success.
> I finally concluded that there was a problem with AVX512 somewhere in
> the dependency graph, but I gave up identifying where, as this seems
> very complicated.
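
A minimal sketch of one plausible mechanism behind differences of this size (an assumption for illustration, not PyTorch's actual code path): a vectorized float32 reduction keeps one partial sum per SIMD lane, 8 lanes in a 256-bit AVX2 register versus 16 in a 512-bit AVX512 register, so the grouping of additions, and therefore the rounding, changes with the instruction set. The helpers `to_f32` and `lane_sum` are hypothetical names that emulate float32 arithmetic in pure Python.

```python
import math
import random
import struct


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 (float32) value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lane_sum(values, lanes: int) -> float:
    """Hypothetical model of a SIMD float32 reduction: element i is added to
    accumulator i % lanes, then the accumulators are summed left to right.
    Every intermediate result is rounded to float32."""
    acc = [0.0] * lanes
    for i, v in enumerate(values):
        acc[i % lanes] = to_f32(acc[i % lanes] + v)
    total = 0.0
    for a in acc:
        total = to_f32(total + a)
    return total


# Tiny case where the grouping visibly matters: 2**24 + 1 is not representable
# in float32, so a 1.0 added directly to 2**24 is rounded away.
tiny = [2.0**24, 1.0, -(2.0**24), 1.0]
print(lane_sum(tiny, 1), lane_sum(tiny, 2))  # 1.0 2.0

# Larger case: the same data reduced with 8 lanes (AVX2-like)
# and with 16 lanes (AVX512-like), compared with an exact sum.
rng = random.Random(0)
data = [to_f32(rng.uniform(-1.0, 1.0)) for _ in range(100_000)]
avx2_like = lane_sum(data, 8)
avx512_like = lane_sum(data, 16)
exact = math.fsum(data)
print(f"8 lanes:  {avx2_like!r}")
print(f"16 lanes: {avx512_like!r}")
print(f"exact:    {exact!r}")
```

Both larger sums stay close to the exact value but will typically disagree in their low-order digits; in a contact-rich simulation, such tiny discrepancies in the inputs can then grow into visibly different trajectories.
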

Oh, not fully satisfactory then.  :-)

> Instead, I found a tool, https://github.com/twosigma/libvirtcpuid/,
> which allows me to mask AVX512 from the process, and this worked! I was
> able to use it to modify glibc with a graft in the guix shell command,
> which disables AVX512 inside that shell, and I get exactly the same
> results on both AVX512 and non-AVX512 computers without much
> overhead (there is no VM; the only difference seems to be a slight
> acceleration when using AVX512, as expected).

Interesting, thanks!

Ludo’.
