On Mon, Feb 8, 2016 at 6:07 PM, Nathaniel Smith <n...@pobox.com> wrote:
> On Mon, Feb 8, 2016 at 6:04 PM, Matthew Brett <matthew.br...@gmail.com> wrote:
>> On Mon, Feb 8, 2016 at 5:26 PM, Nathaniel Smith <n...@pobox.com> wrote:
>>> On Mon, Feb 8, 2016 at 4:37 PM, Matthew Brett <matthew.br...@gmail.com> wrote:
>>> [...]
>>>> I can't replicate the segfault with manylinux wheels and scipy. On
>>>> the other hand, I get a new test error for numpy from manylinux, scipy
>>>> from manylinux, like this:
>>>>
>>>> $ python -c 'import scipy.linalg; scipy.linalg.test()'
>>>>
>>>> ======================================================================
>>>> FAIL: test_decomp.test_eigh('general ', 6, 'F', True, False, False, (2, 4))
>>>> ----------------------------------------------------------------------
>>>> Traceback (most recent call last):
>>>>   File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
>>>>     self.test(*self.arg)
>>>>   File "/usr/local/lib/python2.7/dist-packages/scipy/linalg/tests/test_decomp.py", line 658, in eigenhproblem_general
>>>>     assert_array_almost_equal(diag2_, ones(diag2_.shape[0]), DIGITS[dtype])
>>>>   File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 892, in assert_array_almost_equal
>>>>     precision=decimal)
>>>>   File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 713, in assert_array_compare
>>>>     raise AssertionError(msg)
>>>> AssertionError:
>>>> Arrays are not almost equal to 4 decimals
>>>>
>>>> (mismatch 100.0%)
>>>>  x: array([ 0., 0., 0.], dtype=float32)
>>>>  y: array([ 1., 1., 1.])
>>>>
>>>> ----------------------------------------------------------------------
>>>> Ran 1507 tests in 14.928s
>>>>
>>>> FAILED (KNOWNFAIL=4, SKIP=1, failures=1)
>>>>
>>>> This is a very odd error, which we don't get when running over a numpy
>>>> installed from source, linked to ATLAS, and doesn't happen when
>>>> running the tests via:
>>>>
>>>> nosetests /usr/local/lib/python2.7/dist-packages/scipy/linalg
>>>>
>>>> So, something about the copy of numpy (linked to openblas) is
>>>> affecting the results of scipy (also linked to openblas), and only
>>>> with a particular environment / test order.
>>>>
>>>> If you'd like to try and see whether y'all can do a better job of
>>>> debugging than me:
>>>>
>>>> # Run this script inside a docker container started with this incantation:
>>>> # docker run -ti --rm ubuntu:12.04 /bin/bash
>>>> apt-get update
>>>> apt-get install -y python curl
>>>> apt-get install libpython2.7 # this won't be necessary with next iteration of manylinux wheel builds
>>>> curl -LO https://bootstrap.pypa.io/get-pip.py
>>>> python get-pip.py
>>>> pip install -f https://nipy.bic.berkeley.edu/manylinux numpy scipy nose
>>>> python -c 'import scipy.linalg; scipy.linalg.test()'
>>>
>>> I just tried this and on my laptop it completed without error.
>>>
>>> Best guess is that we're dealing with some memory corruption bug
>>> inside openblas, so it's getting perturbed by things like exactly what
>>> other calls to openblas have happened (which is different depending on
>>> whether numpy is linked to openblas), and which core type openblas has
>>> detected.
>>>
>>> On my laptop, which *doesn't* show the problem, running with
>>> OPENBLAS_VERBOSE=2 says "Core: Haswell".
>>>
>>> Guess the next step is checking what core type the failing machines
>>> use, and running valgrind... anyone have a good valgrind suppressions
>>> file?
>>
>> My machine (which does give the failure) gives
>>
>> Core: Core2
>>
>> with OPENBLAS_VERBOSE=2
>
> Yep, that allows me to reproduce it:
>
> root@f7153f0cc841:/# OPENBLAS_VERBOSE=2 OPENBLAS_CORETYPE=Core2 python -c 'import scipy.linalg; scipy.linalg.test()'
> Core: Core2
> [...]
> ======================================================================
> FAIL: test_decomp.test_eigh('general ', 6, 'F', True, False, False, (2, 4))
> ----------------------------------------------------------------------
> [...]
>
> So this is indeed sounding like an OpenBLAS issue... next stop
> valgrind, I guess :-/
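
For anyone who wants to repeat the comparison quoted above, here is a rough sketch of forcing each core type in turn and rerunning the same test command. Only the OPENBLAS_VERBOSE and OPENBLAS_CORETYPE variables, the two core names, and the test command come from this thread; the helper names (env_for_core, run_linalg_tests) are hypothetical, and the script assumes the numpy/scipy/nose install from the docker recipe is already in place.

```python
import os
import subprocess
import sys

# Test command taken from the thread; everything else here is illustrative.
TEST_SNIPPET = "import scipy.linalg; scipy.linalg.test()"

# Only the two core types reported in this thread.
CORE_TYPES = ["Haswell", "Core2"]


def env_for_core(core_type, base_env=None):
    """Return a copy of the environment that forces an OpenBLAS core type."""
    env = dict(os.environ if base_env is None else base_env)
    env["OPENBLAS_VERBOSE"] = "2"           # makes OpenBLAS print "Core: ..."
    env["OPENBLAS_CORETYPE"] = core_type    # overrides the detected core
    return env


def run_linalg_tests(core_type):
    """Run the scipy.linalg tests in a fresh interpreter for one core type.

    The child's output goes straight to the terminal; read the test
    summary there rather than relying on the exit status.
    """
    subprocess.call([sys.executable, "-c", TEST_SNIPPET],
                    env=env_for_core(core_type))


if __name__ == "__main__":
    for core in CORE_TYPES:
        print("=== OPENBLAS_CORETYPE=%s ===" % core)
        run_linalg_tests(core)
```

Each core type gets its own interpreter because the core selection has to be in the environment before OpenBLAS is loaded.
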

Here's the valgrind output:
  https://gist.github.com/njsmith/577d028e79f0a80d2797

There's a lot of it, but no smoking guns have jumped out at me :-/

-n

--
Nathaniel J. Smith -- https://vorpus.org
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion