On Tue, Feb 9, 2016 at 11:37 AM, Julian Taylor
<jtaylor.deb...@googlemail.com> wrote:
> On 09.02.2016 04:59, Nathaniel Smith wrote:
>> On Mon, Feb 8, 2016 at 6:07 PM, Nathaniel Smith <n...@pobox.com> wrote:
>>> On Mon, Feb 8, 2016 at 6:04 PM, Matthew Brett <matthew.br...@gmail.com> wrote:
>>>> On Mon, Feb 8, 2016 at 5:26 PM, Nathaniel Smith <n...@pobox.com> wrote:
>>>>> On Mon, Feb 8, 2016 at 4:37 PM, Matthew Brett <matthew.br...@gmail.com> wrote:
>>>>> [...]
>>>>>> I can't replicate the segfault with manylinux wheels and scipy. On
>>>>>> the other hand, I get a new test error for numpy from manylinux, scipy
>>>>>> from manylinux, like this:
>>>>>>
>>>>>> $ python -c 'import scipy.linalg; scipy.linalg.test()'
>>>>>>
>>>>>> ======================================================================
>>>>>> FAIL: test_decomp.test_eigh('general ', 6, 'F', True, False, False, (2, 4))
>>>>>> ----------------------------------------------------------------------
>>>>>> Traceback (most recent call last):
>>>>>>   File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
>>>>>>     self.test(*self.arg)
>>>>>>   File "/usr/local/lib/python2.7/dist-packages/scipy/linalg/tests/test_decomp.py", line 658, in eigenhproblem_general
>>>>>>     assert_array_almost_equal(diag2_, ones(diag2_.shape[0]), DIGITS[dtype])
>>>>>>   File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 892, in assert_array_almost_equal
>>>>>>     precision=decimal)
>>>>>>   File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 713, in assert_array_compare
>>>>>>     raise AssertionError(msg)
>>>>>> AssertionError:
>>>>>> Arrays are not almost equal to 4 decimals
>>>>>>
>>>>>> (mismatch 100.0%)
>>>>>>  x: array([ 0.,  0.,  0.], dtype=float32)
>>>>>>  y: array([ 1.,  1.,  1.])
>>>>>>
>>>>>> ----------------------------------------------------------------------
>>>>>> Ran 1507 tests in 14.928s
>>>>>>
>>>>>> FAILED (KNOWNFAIL=4, SKIP=1, failures=1)
>>>>>>
>>>>>> This is a very odd error, which we don't get when running over a numpy
>>>>>> installed from source, linked to ATLAS, and doesn't happen when
>>>>>> running the tests via:
>>>>>>
>>>>>>   nosetests /usr/local/lib/python2.7/dist-packages/scipy/linalg
>>>>>>
>>>>>> So, something about the copy of numpy (linked to openblas) is
>>>>>> affecting the results of scipy (also linked to openblas), and only
>>>>>> with a particular environment / test order.
>>>>>>
>>>>>> If you'd like to try and see whether y'all can do a better job of
>>>>>> debugging than me:
>>>>>>
>>>>>> # Run this script inside a docker container started with this incantation:
>>>>>> #   docker run -ti --rm ubuntu:12.04 /bin/bash
>>>>>> apt-get update
>>>>>> apt-get install -y python curl
>>>>>> apt-get install libpython2.7  # this won't be necessary with next
>>>>>>                               # iteration of manylinux wheel builds
>>>>>> curl -LO https://bootstrap.pypa.io/get-pip.py
>>>>>> python get-pip.py
>>>>>> pip install -f https://nipy.bic.berkeley.edu/manylinux numpy scipy nose
>>>>>> python -c 'import scipy.linalg; scipy.linalg.test()'
>>>>>
>>>>> I just tried this and on my laptop it completed without error.
>>>>>
>>>>> Best guess is that we're dealing with some memory corruption bug
>>>>> inside openblas, so it's getting perturbed by things like exactly what
>>>>> other calls to openblas have happened (which is different depending on
>>>>> whether numpy is linked to openblas), and which core type openblas has
>>>>> detected.
>>>>>
>>>>> On my laptop, which *doesn't* show the problem, running with
>>>>> OPENBLAS_VERBOSE=2 says "Core: Haswell".
>>>>>
>>>>> Guess the next step is checking what core type the failing machines
>>>>> use, and running valgrind... anyone have a good valgrind suppressions
>>>>> file?
>>>>
>>>> My machine (which does give the failure) gives
>>>>
>>>>   Core: Core2
>>>>
>>>> with OPENBLAS_VERBOSE=2
>>>
>>> Yep, that allows me to reproduce it:
>>>
>>> root@f7153f0cc841:/# OPENBLAS_VERBOSE=2 OPENBLAS_CORETYPE=Core2 python -c 'import scipy.linalg; scipy.linalg.test()'
>>> Core: Core2
>>> [...]
>>> ======================================================================
>>> FAIL: test_decomp.test_eigh('general ', 6, 'F', True, False, False, (2, 4))
>>> ----------------------------------------------------------------------
>>> [...]
>>>
>>> So this is indeed sounding like an OpenBLAS issue... next stop
>>> valgrind, I guess :-/
>>
>> Here's the valgrind output:
>> https://gist.github.com/njsmith/577d028e79f0a80d2797
>>
>> There's a lot of it, but no smoking guns have jumped out at me :-/
>>
>> -n
>
> plenty of smoking guns, e.g.:
>
> ==3695== Invalid read of size 8
> ==3695==    at 0x7AAA9C0: daxpy_k_CORE2 (in /usr/local/lib/python2.7/dist-packages/numpy/.libs/libopenblas.so.0)
> ==3695==    by 0x76BEEFC: ger_kernel (in /usr/local/lib/python2.7/dist-packages/numpy/.libs/libopenblas.so.0)
> ==3695==    by 0x788F618: exec_blas (in /usr/local/lib/python2.7/dist-packages/numpy/.libs/libopenblas.so.0)
> ==3695==    by 0x76BF099: dger_thread (in /usr/local/lib/python2.7/dist-packages/numpy/.libs/libopenblas.so.0)
> ==3695==    by 0x767DC37: dger_ (in /usr/local/lib/python2.7/dist-packages/numpy/.libs/libopenblas.so.0)
>
> I think I have reported that to openblas already; they said they do that
> intentionally, though last I checked they are missing the code that
> verifies this is actually allowed (if you're not crossing a page, you
> can read beyond the boundaries). It's pretty likely a pointless
> micro-optimization; you normally only use that trick for string
> functions, where you don't know the size of the string.
>
> Your output also indicates it ran on Core2, while the issues occur on
> Sandy Bridge; maybe valgrind messes with the CPU detection, so it won't
> show anything.
Julian - thanks for having a look.

Do you happen to remember the openblas issue number for this?

Was there an obvious place we could patch openblas to avoid this error
in particular?

Cheers,

Matthew
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion