On Mon, Feb 8, 2016 at 7:59 PM, Nathaniel Smith <n...@pobox.com> wrote:
> On Mon, Feb 8, 2016 at 6:07 PM, Nathaniel Smith <n...@pobox.com> wrote:
>> On Mon, Feb 8, 2016 at 6:04 PM, Matthew Brett <matthew.br...@gmail.com> wrote:
>>> On Mon, Feb 8, 2016 at 5:26 PM, Nathaniel Smith <n...@pobox.com> wrote:
>>>> On Mon, Feb 8, 2016 at 4:37 PM, Matthew Brett <matthew.br...@gmail.com> wrote:
>>>> [...]
>>>>> I can't replicate the segfault with manylinux wheels and scipy. On
>>>>> the other hand, I get a new test error for numpy from manylinux, scipy
>>>>> from manylinux, like this:
>>>>>
>>>>> $ python -c 'import scipy.linalg; scipy.linalg.test()'
>>>>>
>>>>> ======================================================================
>>>>> FAIL: test_decomp.test_eigh('general ', 6, 'F', True, False, False, (2, 4))
>>>>> ----------------------------------------------------------------------
>>>>> Traceback (most recent call last):
>>>>>   File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
>>>>>     self.test(*self.arg)
>>>>>   File "/usr/local/lib/python2.7/dist-packages/scipy/linalg/tests/test_decomp.py", line 658, in eigenhproblem_general
>>>>>     assert_array_almost_equal(diag2_, ones(diag2_.shape[0]), DIGITS[dtype])
>>>>>   File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 892, in assert_array_almost_equal
>>>>>     precision=decimal)
>>>>>   File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 713, in assert_array_compare
>>>>>     raise AssertionError(msg)
>>>>> AssertionError:
>>>>> Arrays are not almost equal to 4 decimals
>>>>>
>>>>> (mismatch 100.0%)
>>>>>  x: array([ 0., 0., 0.], dtype=float32)
>>>>>  y: array([ 1., 1., 1.])
>>>>>
>>>>> ----------------------------------------------------------------------
>>>>> Ran 1507 tests in 14.928s
>>>>>
>>>>> FAILED (KNOWNFAIL=4, SKIP=1, failures=1)
>>>>>
>>>>> This is a very odd error, which we don't get when running over a numpy
>>>>> installed from source, linked to ATLAS, and doesn't happen when
>>>>> running the tests via:
>>>>>
>>>>> nosetests /usr/local/lib/python2.7/dist-packages/scipy/linalg
>>>>>
>>>>> So, something about the copy of numpy (linked to openblas) is
>>>>> affecting the results of scipy (also linked to openblas), and only
>>>>> with a particular environment / test order.
>>>>>
>>>>> If you'd like to try and see whether y'all can do a better job of
>>>>> debugging than me:
>>>>>
>>>>> # Run this script inside a docker container started with this incantation:
>>>>> # docker run -ti --rm ubuntu:12.04 /bin/bash
>>>>> apt-get update
>>>>> apt-get install -y python curl
>>>>> apt-get install libpython2.7 # this won't be necessary with next iteration of manylinux wheel builds
>>>>> curl -LO https://bootstrap.pypa.io/get-pip.py
>>>>> python get-pip.py
>>>>> pip install -f https://nipy.bic.berkeley.edu/manylinux numpy scipy nose
>>>>> python -c 'import scipy.linalg; scipy.linalg.test()'
>>>>
>>>> I just tried this and on my laptop it completed without error.
>>>>
>>>> Best guess is that we're dealing with some memory corruption bug
>>>> inside openblas, so it's getting perturbed by things like exactly what
>>>> other calls to openblas have happened (which is different depending on
>>>> whether numpy is linked to openblas), and which core type openblas has
>>>> detected.
>>>>
>>>> On my laptop, which *doesn't* show the problem, running with
>>>> OPENBLAS_VERBOSE=2 says "Core: Haswell".
>>>>
>>>> Guess the next step is checking what core type the failing machines
>>>> use, and running valgrind... anyone have a good valgrind suppressions
>>>> file?
>>>
>>> My machine (which does give the failure) gives
>>>
>>> Core: Core2
>>>
>>> with OPENBLAS_VERBOSE=2
>>
>> Yep, that allows me to reproduce it:
>>
>> root@f7153f0cc841:/# OPENBLAS_VERBOSE=2 OPENBLAS_CORETYPE=Core2 python -c 'import scipy.linalg; scipy.linalg.test()'
>> Core: Core2
>> [...]
>> ======================================================================
>> FAIL: test_decomp.test_eigh('general ', 6, 'F', True, False, False, (2, 4))
>> ----------------------------------------------------------------------
>> [...]
>>
>> So this is indeed sounding like an OpenBLAS issue... next stop
>> valgrind, I guess :-/
>
> Here's the valgrind output:
> https://gist.github.com/njsmith/577d028e79f0a80d2797
>
> There's a lot of it, but no smoking guns have jumped out at me :-/

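For anyone who wants to repeat the core-type comparison without retyping the environment variables, here is a rough Python sketch of the reproduction quoted above. Only OPENBLAS_CORETYPE, OPENBLAS_VERBOSE and the `python -c` test command come from this thread; the helper names (`openblas_env`, `run_linalg_tests`, `has_eigh_failure`) are made up for illustration, and the sketch assumes nose reports the failure on a line starting with "FAIL: test_decomp.test_eigh", as in the output above. Forcing a core type the CPU does not actually support may crash, so treat the list of core types as something to adjust per machine.

```python
import os
import subprocess
import sys

# The test command quoted in the thread.
TEST_SNIPPET = "import scipy.linalg; scipy.linalg.test()"


def openblas_env(coretype=None, verbose=2, base=None):
    """Copy an environment, optionally forcing the OpenBLAS core type.

    coretype=None leaves OpenBLAS free to autodetect the core.
    """
    env = dict(os.environ if base is None else base)
    env["OPENBLAS_VERBOSE"] = str(verbose)
    if coretype is not None:
        env["OPENBLAS_CORETYPE"] = coretype
    return env


def has_eigh_failure(output):
    """True if the captured nose output reports the test_eigh failure."""
    return any(line.startswith("FAIL: test_decomp.test_eigh")
               for line in output.splitlines())


def run_linalg_tests(coretype=None):
    """Run the scipy.linalg tests in a child process; return its output."""
    proc = subprocess.Popen([sys.executable, "-c", TEST_SNIPPET],
                            env=openblas_env(coretype),
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    out = proc.communicate()[0]
    return out.decode("utf-8", "replace")


if __name__ == "__main__":
    # None = autodetected core; "Core2" = the core type that fails here.
    for coretype in (None, "Core2"):
        failed = has_eigh_failure(run_linalg_tests(coretype))
        print("%s: test_eigh failure seen = %s" % (coretype or "autodetect", failed))
```

Running each configuration in a fresh child process keeps the OpenBLAS state from one run from leaking into the next, which matters if the failure really does depend on what other OpenBLAS calls have already happened.
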
Could you send me instructions for replicating the valgrind run? I'll run it on the actual Core2 machine.

Matthew