On Tue, Feb 9, 2016 at 11:37 AM, Julian Taylor
<jtaylor.deb...@googlemail.com> wrote:
> On 09.02.2016 04:59, Nathaniel Smith wrote:
>> On Mon, Feb 8, 2016 at 6:07 PM, Nathaniel Smith <n...@pobox.com> wrote:
>>> On Mon, Feb 8, 2016 at 6:04 PM, Matthew Brett <matthew.br...@gmail.com> 
>>> wrote:
>>>> On Mon, Feb 8, 2016 at 5:26 PM, Nathaniel Smith <n...@pobox.com> wrote:
>>>>> On Mon, Feb 8, 2016 at 4:37 PM, Matthew Brett <matthew.br...@gmail.com> 
>>>>> wrote:
>>>>> [...]
>>>>>> I can't replicate the segfault with manylinux wheels and scipy.  On
>>>>>> the other hand, I get a new test error for numpy from manylinux, scipy
>>>>>> from manylinux, like this:
>>>>>>
>>>>>> $ python -c 'import scipy.linalg; scipy.linalg.test()'
>>>>>>
>>>>>> ======================================================================
>>>>>> FAIL: test_decomp.test_eigh('general ', 6, 'F', True, False, False, (2, 
>>>>>> 4))
>>>>>> ----------------------------------------------------------------------
>>>>>> Traceback (most recent call last):
>>>>>>   File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line
>>>>>> 197, in runTest
>>>>>>     self.test(*self.arg)
>>>>>>   File 
>>>>>> "/usr/local/lib/python2.7/dist-packages/scipy/linalg/tests/test_decomp.py",
>>>>>> line 658, in eigenhproblem_general
>>>>>>     assert_array_almost_equal(diag2_, ones(diag2_.shape[0]), 
>>>>>> DIGITS[dtype])
>>>>>>   File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py",
>>>>>> line 892, in assert_array_almost_equal
>>>>>>     precision=decimal)
>>>>>>   File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py",
>>>>>> line 713, in assert_array_compare
>>>>>>     raise AssertionError(msg)
>>>>>> AssertionError:
>>>>>> Arrays are not almost equal to 4 decimals
>>>>>>
>>>>>> (mismatch 100.0%)
>>>>>>  x: array([ 0.,  0.,  0.], dtype=float32)
>>>>>>  y: array([ 1.,  1.,  1.])
>>>>>>
>>>>>> ----------------------------------------------------------------------
>>>>>> Ran 1507 tests in 14.928s
>>>>>>
>>>>>> FAILED (KNOWNFAIL=4, SKIP=1, failures=1)
>>>>>>
>>>>>> This is a very odd error, which we don't get when running over a numpy
>>>>>> installed from source, linked to ATLAS, and doesn't happen when
>>>>>> running the tests via:
>>>>>>
>>>>>> nosetests /usr/local/lib/python2.7/dist-packages/scipy/linalg
>>>>>>
>>>>>> So, something about the copy of numpy (linked to openblas) is
>>>>>> affecting the results of scipy (also linked to openblas), and only
>>>>>> with a particular environment / test order.
>>>>>>
>>>>>> If you'd like to try and see whether y'all can do a better job of
>>>>>> debugging than me:
>>>>>>
>>>>>> # Run this script inside a docker container started with this incantation:
>>>>>> # docker run -ti --rm ubuntu:12.04 /bin/bash
>>>>>> apt-get update
>>>>>> apt-get install -y python curl
>>>>>> # libpython2.7 won't be necessary with next iteration of manylinux wheel builds
>>>>>> apt-get install libpython2.7
>>>>>> curl -LO https://bootstrap.pypa.io/get-pip.py
>>>>>> python get-pip.py
>>>>>> pip install -f https://nipy.bic.berkeley.edu/manylinux numpy scipy nose
>>>>>> python -c 'import scipy.linalg; scipy.linalg.test()'
>>>>>
>>>>> I just tried this and on my laptop it completed without error.
>>>>>
>>>>> Best guess is that we're dealing with some memory corruption bug
>>>>> inside openblas, so it's getting perturbed by things like exactly what
>>>>> other calls to openblas have happened (which is different depending on
>>>>> whether numpy is linked to openblas), and which core type openblas has
>>>>> detected.
>>>>>
>>>>> On my laptop, which *doesn't* show the problem, running with
>>>>> OPENBLAS_VERBOSE=2 says "Core: Haswell".
>>>>>
>>>>> Guess the next step is checking what core type the failing machines
>>>>> use, and running valgrind... anyone have a good valgrind suppressions
>>>>> file?
>>>>
>>>> My machine (which does give the failure) gives
>>>>
>>>> Core: Core2
>>>>
>>>> with OPENBLAS_VERBOSE=2
>>>
>>> Yep, that allows me to reproduce it:
>>>
>>> root@f7153f0cc841:/# OPENBLAS_VERBOSE=2 OPENBLAS_CORETYPE=Core2 python -c 'import scipy.linalg; scipy.linalg.test()'
>>> Core: Core2
>>> [...]
>>> ======================================================================
>>> FAIL: test_decomp.test_eigh('general ', 6, 'F', True, False, False, (2, 4))
>>> ----------------------------------------------------------------------
>>> [...]
>>>
>>> So this is indeed sounding like an OpenBLAS issue... next stop
>>> valgrind, I guess :-/
>>
>> Here's the valgrind output:
>>   https://gist.github.com/njsmith/577d028e79f0a80d2797
>>
>> There's a lot of it, but no smoking guns have jumped out at me :-/
>>
>> -n
>>
>
> plenty of smoking guns, e.g.:
>
> ==3695== Invalid read of size 8
> ==3695==    at 0x7AAA9C0: daxpy_k_CORE2 (in
> /usr/local/lib/python2.7/dist-packages/numpy/.libs/libopenblas.so.0)
> ==3695==    by 0x76BEEFC: ger_kernel (in
> /usr/local/lib/python2.7/dist-packages/numpy/.libs/libopenblas.so.0)
> ==3695==    by 0x788F618: exec_blas (in
> /usr/local/lib/python2.7/dist-packages/numpy/.libs/libopenblas.so.0)
> ==3695==    by 0x76BF099: dger_thread (in
> /usr/local/lib/python2.7/dist-packages/numpy/.libs/libopenblas.so.0)
> ==3695==    by 0x767DC37: dger_ (in
> /usr/local/lib/python2.7/dist-packages/numpy/.libs/libopenblas.so.0)
>
>
> I think I have reported that to openblas already; they said they do that
> intentionally, though last I checked they are missing the code that
> verifies this is actually allowed (if you're not crossing a page you can
> read beyond the boundaries). It's pretty likely it's a pointless micro
> optimization; you normally only use that trick for string functions
> where you don't know the size of the string.
>
> Your code also indicates it ran on core2, while the issues occur on
> sandybridge, maybe valgrind messes with the cpu detection so it won't
> show anything.
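
To check whether the detected core changes the result, the forced-core run from earlier in the thread could be scripted roughly as below. This is only a sketch: the helper names openblas_env and run_linalg_tests are made up for illustration, and it assumes the installed openblas was built with runtime core selection, so that OPENBLAS_CORETYPE is honoured at all.

```python
import os
import subprocess
import sys


def openblas_env(coretype, verbose=2, base_env=None):
    """Return a copy of the environment with the OpenBLAS core type forced.

    Hypothetical helper; OPENBLAS_VERBOSE and OPENBLAS_CORETYPE are the
    variables used earlier in this thread.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["OPENBLAS_VERBOSE"] = str(verbose)
    env["OPENBLAS_CORETYPE"] = coretype
    return env


def run_linalg_tests(coretype):
    """Run scipy.linalg.test() in a fresh interpreter under the given core type."""
    cmd = [sys.executable, "-c", "import scipy.linalg; scipy.linalg.test()"]
    return subprocess.call(cmd, env=openblas_env(coretype))


# Example (not executed here): run_linalg_tests("Core2")
```

Running the tests in a child process matters because openblas picks its kernels when the library is loaded, so the variable has to be set before numpy or scipy is imported.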

Julian - thanks for having a look.  Do you happen to remember the
openblas issue number for this?

Was there an obvious place we could patch openblas to avoid this error
in particular?
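
For reference, my understanding of the "not crossing a page" condition is roughly the following. It is an illustration of the idea only, not openblas code; the function name and the 4096-byte page size are assumptions.

```python
PAGE_SIZE = 4096  # assumed page size; the real value is platform dependent


def read_stays_in_page(addr, nbytes, page_size=PAGE_SIZE):
    """Return True if the bytes [addr, addr + nbytes) all lie in one page.

    A read past the end of a buffer cannot fault if it stays inside a page
    that is already mapped, which is the condition a kernel would need to
    verify before over-reading.
    """
    if nbytes <= 0:
        raise ValueError("nbytes must be positive")
    first_page = addr // page_size
    last_page = (addr + nbytes - 1) // page_size
    return first_page == last_page
```

With these assumptions, an 8-byte read starting at offset 4088 stays inside the first page, while the same read starting at offset 4092 spills into the next one.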

Cheers,

Matthew
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion
