Bug#1025480: another manifestation of the problem with AVX-512 kernel, workaround

2023-04-18 Thread Enzo Alberto Dari
This bug was correctly retitled by the maintainer as related to the AVX-512
kernel.
However, it is filed against the libopenblas0-pthread Debian package, while
I have found another case that gives wrong results even when using a single
thread. The script solves a linear elasticity problem using the finite element
method in Octave; the files are shared here:
https://drive.google.com/file/d/1xkw-1YfX3W-mfqD92I1fNYnQM78yA9Sy/view?usp=sharing
After the system of linear equations is assembled, it is solved both with the
Octave operator "\" and by "lu" factorization. The results are checked by
computing the residual vectors, whose norm should be around the truncation
error.

These are the results of the tests:
-Using defaults:
$ octave fem_lame2d.m
Residualbackslash = 17.947
ResidualLUPQbackslash = 7.2444

-Forcing sequential mode (1 thread):
$ OMP_NUM_THREADS=1 octave fem_lame2d.m
Residualbackslash = 17.947
ResidualLUPQbackslash = 7.2444

-Avoiding the use of the openblas AVX-512 kernel (working workaround !!):
$ OPENBLAS_CORETYPE=Haswell octave fem_lame2d.m
Residualbackslash = 1.4547e-13
ResidualLUPQbackslash = 1.4879e-13
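
The workaround above could be applied only on machines that need it. The
following is a minimal sketch, assuming a Linux system where /proc/cpuinfo
lists the CPU flags, and assuming that the presence of the avx512f flag is a
reasonable proxy for OpenBLAS picking an AVX-512 kernel. The function names
pick_coretype and run_with_workaround are hypothetical, not part of any
package:

```shell
#!/bin/sh
# Hypothetical helper: prints "Haswell" if the given cpuinfo file lists the
# avx512f flag, and prints nothing otherwise.
# Assumption: avx512f in the flags means an AVX-512 kernel would be selected.
pick_coretype() {
    cpuinfo="${1:-/proc/cpuinfo}"
    if grep -qw avx512f "$cpuinfo" 2>/dev/null; then
        echo Haswell
    fi
}

# Hypothetical wrapper: runs the given command with OPENBLAS_CORETYPE set
# only when pick_coretype reports that the workaround is needed.
run_with_workaround() {
    coretype="$(pick_coretype)"
    if [ -n "$coretype" ]; then
        env OPENBLAS_CORETYPE="$coretype" "$@"
    else
        "$@"
    fi
}
```

With these definitions sourced, the test would be started as
"run_with_workaround octave fem_lame2d.m".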

-- 
Enzo A. Dari
Profesor Titular
Instituto Balseiro 


Bug#1025480: libopenblas0-pthread: octave inv function gives wrong results apparently in newer processors

2022-12-05 Thread Enzo Alberto Dari
Package: libopenblas0-pthread
Version: 0.3.13+ds-3
Severity: important
Tags: upstream
X-Debbugs-Cc: da...@ib.edu.ar

Dear Maintainer,

While upgrading my Debian system from 10.x to 11.x (Octave 4.4.5 to 6.2.0),
one of my scripts started failing. I managed to create the following test
that reproduces the problem:

% non-singular matrix
b=[7110.327, -2592.219, 631.419, -288.541, 169.250, -113.431, 82.646, -63.812, 51.448, -42.914;
 -1218.551, 1508.124, -720.486, 169.250, -74.433, 42.572, -28.131, 20.364, -15.701, 12.683;
 169.250, -482.641, 674.499, -350.244, 82.646, -36.010, 20.364, -13.333, 9.592, -7.371;
 -49.544, 82.646, -268.958, 399.001, -215.550, 51.448, -22.463, 12.683, -8.285, 5.950;
 20.364, -27.810, 51.448, -178.696, 275.664, -152.325, 36.804, -16.173, 9.164, -6.000;
 -10.260, 12.683, -18.791, 36.804, -132.946, 211.205, -118.438, 28.958, -12.831, 7.317;
 5.950, -6.944, 9.164, -14.257, 28.958, -107.451, 174.780, -99.104, 24.511, -10.963;
 -3.839, 4.320, -5.318, 7.317, -11.762, 24.511, -92.803, 153.928, -88.144, 22.050;
 2.702, -2.966, 3.488, -4.451, 6.304, -10.377, 22.050, -84.854, 143.052, -82.752;
 -2.050, 2.211, -2.519, 3.056, -4.000, 5.789, -9.704, 20.944, -81.727, 139.664];
sizeb=n=size(b,1)
rankb=rank(b)
% Builds blocked matrix
B=[eye(n) zeros(n); zeros(n) b];
% Computes inverse
inv1=inv(B);
% Computes inverse by blocks: non-trivial block:
invb=inv(b);
% Build inverse by blocks
inv2=[eye(n) zeros(n); zeros(n) invb];
% Both inverse matrices should be equal
diffinvs=norm(inv1-inv2)
% All these condition numbers should be 1
cond(inv1*B)
cond(B*inv1)
cond(inv2*B)
cond(B*inv2)

The computation of "inv1" gives wrong results on:
-Intel Core i9-9900X
-Intel Core i9-7900X
-Intel Core i5-1035G1
and correct results on:
-Intel Core i5-750
-Intel Core i7-4930K
What pointed me in the direction of a threads problem was the fact that
setting OMP_NUM_THREADS to 1 modifies the output of the computation (giving
correct results in some cases).
The last tests I performed were running Octave while preloading the pthread
and the OpenMP builds of the OpenBLAS library:
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblas.so.0 octave-cli test.m
gives incorrect results, while
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/openblas-openmp/libopenblas.so.0 octave-cli test.m
works ok.

(By "works ok" I mean that the inverses computed by both methods differ only
at the level of floating-point precision, ~2e-17, and all the condition
numbers are 1. In the case of "failure", the inverses differ by ~0.035 and the
condition numbers of inv1*B and B*inv1 are about 4.7-4.9.)
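
The two preload runs can be repeated side by side with a small loop. This is
only a sketch: the library paths are the ones quoted above, while the function
names openblas_lib and compare_flavors are hypothetical, and the script assumes
octave-cli is on the PATH:

```shell
#!/bin/sh
# Library path layout taken from the report above; the argument is the
# OpenBLAS flavor, "pthread" or "openmp".
openblas_lib() {
    echo "/usr/lib/x86_64-linux-gnu/openblas-$1/libopenblas.so.0"
}

# Hypothetical driver: runs the given Octave script once per flavor, with
# the matching library preloaded, and labels each block of output.
compare_flavors() {
    script="$1"
    for flavor in pthread openmp; do
        lib="$(openblas_lib "$flavor")"
        if [ ! -e "$lib" ]; then
            echo "== $flavor: $lib not installed, skipped"
            continue
        fi
        echo "== $flavor"
        LD_PRELOAD="$lib" octave-cli "$script"
    done
}
```

Calling "compare_flavors test.m" then prints the diffinvs value and the four
condition numbers for each flavor, so the two cases can be compared directly.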


-- System Information:
Debian Release: 11.5
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 5.10.0-18-amd64 (SMP w/20 CPU threads)
Kernel taint flags: TAINT_FIRMWARE_WORKAROUND
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US:en
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages libopenblas0-pthread depends on:
ii  libc6 2.31-13+deb11u5
ii  libgfortran5  10.2.1-6

libopenblas0-pthread recommends no packages.

libopenblas0-pthread suggests no packages.

-- no debconf information