On 09.02.2016 20:52, Matthew Brett wrote:
> On Mon, Feb 8, 2016 at 7:59 PM, Nathaniel Smith <n...@pobox.com> wrote:
>> On Mon, Feb 8, 2016 at 6:07 PM, Nathaniel Smith <n...@pobox.com> wrote:
>>> On Mon, Feb 8, 2016 at 6:04 PM, Matthew Brett <matthew.br...@gmail.com>
>>> wrote:
>>>> On Mon, Feb 8, 2016 at 5:26 PM, Nathaniel Smith <n...@pobox.com> wrote:
>>>>> On Mon, Feb 8, 2016 at 4:37 PM, Matthew Brett <matthew.br...@gmail.com>
>>>>> wrote:
>>>>> [...]
>>>>>> I can't replicate the segfault with manylinux wheels and scipy. On
>>>>>> the other hand, I get a new test error for numpy from manylinux, scipy
>>>>>> from manylinux, like this:
>>>>>>
>>>>>> $ python -c 'import scipy.linalg; scipy.linalg.test()'
>>>>>>
>>>>>> ======================================================================
>>>>>> FAIL: test_decomp.test_eigh('general ', 6, 'F', True, False, False, (2,
>>>>>> 4))
>>>>>> ----------------------------------------------------------------------
>>>>>> Traceback (most recent call last):
>>>>>>   File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line
>>>>>> 197, in runTest
>>>>>>     self.test(*self.arg)
>>>>>>   File
>>>>>> "/usr/local/lib/python2.7/dist-packages/scipy/linalg/tests/test_decomp.py",
>>>>>> line 658, in eigenhproblem_general
>>>>>>     assert_array_almost_equal(diag2_, ones(diag2_.shape[0]),
>>>>>> DIGITS[dtype])
>>>>>>   File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py",
>>>>>> line 892, in assert_array_almost_equal
>>>>>>     precision=decimal)
>>>>>>   File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py",
>>>>>> line 713, in assert_array_compare
>>>>>>     raise AssertionError(msg)
>>>>>> AssertionError:
>>>>>> Arrays are not almost equal to 4 decimals
>>>>>>
>>>>>> (mismatch 100.0%)
>>>>>>  x: array([ 0., 0., 0.], dtype=float32)
>>>>>>  y: array([ 1., 1., 1.])
>>>>>>
>>>>>> ----------------------------------------------------------------------
>>>>>> Ran 1507 tests in 14.928s
>>>>>>
>>>>>> FAILED (KNOWNFAIL=4, SKIP=1, failures=1)
>>>>>>
>>>>>> This is a very odd error, which we don't get when running over a numpy
>>>>>> installed from source, linked to ATLAS, and doesn't happen when
>>>>>> running the tests via:
>>>>>>
>>>>>> nosetests /usr/local/lib/python2.7/dist-packages/scipy/linalg
>>>>>>
>>>>>> So, something about the copy of numpy (linked to openblas) is
>>>>>> affecting the results of scipy (also linked to openblas), and only
>>>>>> with a particular environment / test order.
>>>>>>
>>>>>> If you'd like to try and see whether y'all can do a better job of
>>>>>> debugging than me:
>>>>>>
>>>>>> # Run this script inside a docker container started with this
>>>>>> incantation:
>>>>>> # docker run -ti --rm ubuntu:12.04 /bin/bash
>>>>>> apt-get update
>>>>>> apt-get install -y python curl
>>>>>> apt-get install libpython2.7 # this won't be necessary with next
>>>>>> iteration of manylinux wheel builds
>>>>>> curl -LO https://bootstrap.pypa.io/get-pip.py
>>>>>> python get-pip.py
>>>>>> pip install -f https://nipy.bic.berkeley.edu/manylinux numpy scipy nose
>>>>>> python -c 'import scipy.linalg; scipy.linalg.test()'
>>>>>
>>>>> I just tried this and on my laptop it completed without error.
>>>>>
>>>>> Best guess is that we're dealing with some memory corruption bug
>>>>> inside openblas, so it's getting perturbed by things like exactly what
>>>>> other calls to openblas have happened (which is different depending on
>>>>> whether numpy is linked to openblas), and which core type openblas has
>>>>> detected.
>>>>>
>>>>> On my laptop, which *doesn't* show the problem, running with
>>>>> OPENBLAS_VERBOSE=2 says "Core: Haswell".
>>>>>
>>>>> Guess the next step is checking what core type the failing machines
>>>>> use, and running valgrind... anyone have a good valgrind suppressions
>>>>> file?
>>>>
>>>> My machine (which does give the failure) gives
>>>>
>>>> Core: Core2
>>>>
>>>> with OPENBLAS_VERBOSE=2
>>>
>>> Yep, that allows me to reproduce it:
>>>
>>> root@f7153f0cc841:/# OPENBLAS_VERBOSE=2 OPENBLAS_CORETYPE=Core2 python
>>> -c 'import scipy.linalg; scipy.linalg.test()'
>>> Core: Core2
>>> [...]
>>> ======================================================================
>>> FAIL: test_decomp.test_eigh('general ', 6, 'F', True, False, False, (2, 4))
>>> ----------------------------------------------------------------------
>>> [...]
>>>
>>> So this is indeed sounding like an OpenBLAS issue... next stop
>>> valgrind, I guess :-/
>>
>> Here's the valgrind output:
>> https://gist.github.com/njsmith/577d028e79f0a80d2797
>>
>> There's a lot of it, but no smoking guns have jumped out at me :-/
>
> Could you send me instructions on replicating the valgrind run, I'll
> run on on the actual Core2 machine...
>
> Matthew

Please also use this suppression file. It should reduce the Python noise significantly, though it might be a bit out of date. It used to work fine with an Ubuntu-built Python.

#
# This is a valgrind suppression file that should be used when using valgrind.
#
#  Here's an example of running valgrind:
#
#       cd python/dist/src
#       valgrind --tool=memcheck --suppressions=Misc/valgrind-python.supp \
#               ./python -E -tt ./Lib/test/regrtest.py -u bsddb,network
#
# You must edit Objects/obmalloc.c and uncomment Py_USING_MEMORY_DEBUGGER
# to use the preferred suppressions with Py_ADDRESS_IN_RANGE.
#
# If you do not want to recompile Python, you can uncomment
# suppressions for PyObject_Free and PyObject_Realloc.
#
# See Misc/README.valgrind for more information.

# all tool names: Addrcheck,Memcheck,cachegrind,helgrind,massif
{
   ADDRESS_IN_RANGE/Invalid read of size 4
   Memcheck:Addr4
   fun:Py_ADDRESS_IN_RANGE
}

{
   ADDRESS_IN_RANGE/Invalid read of size 4
   Memcheck:Value4
   fun:Py_ADDRESS_IN_RANGE
}

{
   ADDRESS_IN_RANGE/Invalid read of size 8 (x86_64 aka amd64)
   Memcheck:Value8
   fun:Py_ADDRESS_IN_RANGE
}

{
   ADDRESS_IN_RANGE/Conditional jump or move depends on uninitialised value
   Memcheck:Cond
   fun:Py_ADDRESS_IN_RANGE
}

#
# Leaks (including possible leaks)
#    Hmmm, I wonder if this masks some real leaks.  I think it does.
#    Will need to fix that.
#

{
   Suppress leaking the GIL.  Happens once per process, see comment in ceval.c.
   Memcheck:Leak
   fun:malloc
   fun:PyThread_allocate_lock
   fun:PyEval_InitThreads
}

{
   Suppress leaking the GIL after a fork.
   Memcheck:Leak
   fun:malloc
   fun:PyThread_allocate_lock
   fun:PyEval_ReInitThreads
}

{
   Suppress leaking the autoTLSkey.  This looks like it shouldn't leak though.
   Memcheck:Leak
   fun:malloc
   fun:PyThread_create_key
   fun:_PyGILState_Init
   fun:Py_InitializeEx
   fun:Py_Main
}

{
   Hmmm, is this a real leak or like the GIL?
   Memcheck:Leak
   fun:malloc
   fun:PyThread_ReInitTLS
}

{
   Handle PyMalloc confusing valgrind (possibly leaked)
   Memcheck:Leak
   fun:realloc
   fun:_PyObject_GC_Resize
#   fun:COMMENT_THIS_LINE_TO_DISABLE_LEAK_WARNING
}

{
   Handle PyMalloc confusing valgrind (possibly leaked)
   Memcheck:Leak
   fun:malloc
   fun:_PyObject_GC_New
#   fun:COMMENT_THIS_LINE_TO_DISABLE_LEAK_WARNING
}

{
   Handle PyMalloc confusing valgrind (possibly leaked)
   Memcheck:Leak
   fun:malloc
   fun:_PyObject_GC_NewVar
#   fun:COMMENT_THIS_LINE_TO_DISABLE_LEAK_WARNING
}

#
# Non-python specific leaks
#

{
   Handle pthread issue (possibly leaked)
   Memcheck:Leak
   fun:calloc
   fun:allocate_dtv
   fun:_dl_allocate_tls_storage
   fun:_dl_allocate_tls
}

{
   Handle pthread issue (possibly leaked)
   Memcheck:Leak
   fun:memalign
   fun:_dl_allocate_tls_storage
   fun:_dl_allocate_tls
}

# Object Malloc/Free/Realloc stuff, very broad
{
   ADDRESS_IN_RANGE/Invalid read of size 4
   Memcheck:Addr4
   fun:PyObject_Free*
}

{
   ADDRESS_IN_RANGE/Invalid read of size 4
   Memcheck:Value4
   fun:PyObject_Free*
}

{
   ADDRESS_IN_RANGE/Conditional jump or move depends on uninitialised value
   Memcheck:Cond
   fun:PyObject_Free*
}

{
   ADDRESS_IN_RANGE/Invalid read of size 4
   Memcheck:Addr4
   fun:PyObject_Realloc*
}

{
   ADDRESS_IN_RANGE/Invalid read of size 4
   Memcheck:Value4
   fun:PyObject_Realloc*
}

{
   ADDRESS_IN_RANGE/Conditional jump or move depends on uninitialised value
   Memcheck:Cond
   fun:PyObject_Realloc*
}

# Object Malloc/Free/Realloc stuff for size 8
{
   ADDRESS_IN_RANGE/Invalid read of size 8
   Memcheck:Addr8
   fun:PyObject_Free*
}

{
   ADDRESS_IN_RANGE/Invalid read of size 8
   Memcheck:Value8
   fun:PyObject_Free*
}

{
   ADDRESS_IN_RANGE/Invalid read of size 8
   Memcheck:Addr8
   fun:PyObject_Realloc*
}

{
   ADDRESS_IN_RANGE/Invalid read of size 8
   Memcheck:Value8
   fun:PyObject_Realloc*
}

###
### All the suppressions below are for errors that occur within libraries
### that Python uses.  The problems to not appear to be related to Python's
### use of the libraries.
###

{
   Generic ubuntu ld problems
   Memcheck:Addr8
   obj:/lib/ld-2.4.so
   obj:/lib/ld-2.4.so
   obj:/lib/ld-2.4.so
   obj:/lib/ld-2.4.so
}

{
   Generic gentoo ld problems
   Memcheck:Cond
   obj:/lib/ld-2.3.4.so
   obj:/lib/ld-2.3.4.so
   obj:/lib/ld-2.3.4.so
   obj:/lib/ld-2.3.4.so
}

{
   DBM problems, see test_dbm
   Memcheck:Param
   write(buf)
   fun:write
   obj:/usr/lib/libdb1.so.2
   obj:/usr/lib/libdb1.so.2
   obj:/usr/lib/libdb1.so.2
   obj:/usr/lib/libdb1.so.2
   fun:dbm_close
}

{
   DBM problems, see test_dbm
   Memcheck:Value8
   fun:memmove
   obj:/usr/lib/libdb1.so.2
   obj:/usr/lib/libdb1.so.2
   obj:/usr/lib/libdb1.so.2
   obj:/usr/lib/libdb1.so.2
   fun:dbm_store
   fun:dbm_ass_sub
}

{
   DBM problems, see test_dbm
   Memcheck:Cond
   obj:/usr/lib/libdb1.so.2
   obj:/usr/lib/libdb1.so.2
   obj:/usr/lib/libdb1.so.2
   fun:dbm_store
   fun:dbm_ass_sub
}

{
   DBM problems, see test_dbm
   Memcheck:Cond
   fun:memmove
   obj:/usr/lib/libdb1.so.2
   obj:/usr/lib/libdb1.so.2
   obj:/usr/lib/libdb1.so.2
   obj:/usr/lib/libdb1.so.2
   fun:dbm_store
   fun:dbm_ass_sub
}

{
   GDBM problems, see test_gdbm
   Memcheck:Param
   write(buf)
   fun:write
   fun:gdbm_open
}

{
   ZLIB problems, see test_gzip
   Memcheck:Cond
   obj:/lib/libz.so.1.2.3
   obj:/lib/libz.so.1.2.3
   fun:deflate
}

{
   Avoid problems w/readline doing a putenv and leaking on exit
   Memcheck:Leak
   fun:malloc
   fun:xmalloc
   fun:sh_set_lines_and_columns
   fun:_rl_get_screen_size
   fun:_rl_init_terminal_io
   obj:/lib/libreadline.so.4.3
   fun:rl_initialize
}

###
### These occur from somewhere within the SSL, when running
###  test_socket_sll.  They are too general to leave on by default.
###
###{
###   somewhere in SSL stuff
###   Memcheck:Cond
###   fun:memset
###}
###{
###   somewhere in SSL stuff
###   Memcheck:Value4
###   fun:memset
###}
###
###{
###   somewhere in SSL stuff
###   Memcheck:Cond
###   fun:MD5_Update
###}
###
###{
###   somewhere in SSL stuff
###   Memcheck:Value4
###   fun:MD5_Update
###}

#
# All of these problems come from using test_socket_ssl
#
{
   from test_socket_ssl
   Memcheck:Cond
   fun:BN_bin2bn
}

{
   from test_socket_ssl
   Memcheck:Cond
   fun:BN_num_bits_word
}

{
   from test_socket_ssl
   Memcheck:Value4
   fun:BN_num_bits_word
}

{
   from test_socket_ssl
   Memcheck:Cond
   fun:BN_mod_exp_mont_word
}

{
   from test_socket_ssl
   Memcheck:Cond
   fun:BN_mod_exp_mont
}

{
   from test_socket_ssl
   Memcheck:Param
   write(buf)
   fun:write
   obj:/usr/lib/libcrypto.so.0.9.7
}

{
   from test_socket_ssl
   Memcheck:Cond
   fun:RSA_verify
}

{
   from test_socket_ssl
   Memcheck:Value4
   fun:RSA_verify
}

{
   from test_socket_ssl
   Memcheck:Value4
   fun:DES_set_key_unchecked
}

{
   from test_socket_ssl
   Memcheck:Value4
   fun:DES_encrypt2
}

{
   from test_socket_ssl
   Memcheck:Cond
   obj:/usr/lib/libssl.so.0.9.7
}

{
   from test_socket_ssl
   Memcheck:Value4
   obj:/usr/lib/libssl.so.0.9.7
}

{
   from test_socket_ssl
   Memcheck:Cond
   fun:BUF_MEM_grow_clean
}

{
   from test_socket_ssl
   Memcheck:Cond
   fun:memcpy
   fun:ssl3_read_bytes
}

{
   from test_socket_ssl
   Memcheck:Cond
   fun:SHA1_Update
}

{
   from test_socket_ssl
   Memcheck:Value4
   fun:SHA1_Update
}

#jtaylor added
{
   <insert_a_suppression_name_here>
   Memcheck:Addr4
   fun:PyObject_GC_Del
   fun:tupledealloc.*
}

{
   <insert_a_suppression_name_here>
   Memcheck:Addr4
   fun:PyObject_GC_Del
   fun:code_dealloc.*
}

{
   <insert_a_suppression_name_here>
   Memcheck:Cond
   fun:PyObject_GC_Del
   fun:code_dealloc.*
}

{
   <insert_a_suppression_name_here>
   Memcheck:Value8
   fun:PyObject_GC_Del
   fun:code_dealloc.*
}

{
   <insert_a_suppression_name_here>
   Memcheck:Value8
   fun:PyObject_GC_Del
   fun:tupledealloc.*
}

{
   <insert_a_suppression_name_here>
   Memcheck:Cond
   fun:PyObject_GC_Del
   fun:tupledealloc.*
}

{
   <insert_a_suppression_name_here>
   Memcheck:Addr4
   fun:PyObject_GC_Del
   fun:dict_dealloc.*
}

{
   <insert_a_suppression_name_here>
   Memcheck:Cond
   fun:PyObject_GC_Del
   fun:dict_dealloc.*
}

{
   <insert_a_suppression_name_here>
   Memcheck:Value8
   fun:PyObject_GC_Del
   fun:dict_dealloc.*
}

{
   <insert_a_suppression_name_here>
   Memcheck:Addr4
   fun:PyObject_GC_Del
   fun:collect.*
}

{
   <insert_a_suppression_name_here>
   Memcheck:Cond
   fun:PyObject_GC_Del
   fun:collect.*
}

{
   <insert_a_suppression_name_here>
   Memcheck:Value8
   fun:PyObject_GC_Del
   fun:collect.*
}

{
   <insert_a_suppression_name_here>
   Memcheck:Addr4
   fun:match_dealloc.*
   fun:frame_dealloc.*
}

{
   <insert_a_suppression_name_here>
   Memcheck:Addr4
   fun:PyObject_GC_Del
   fun:subtype_dealloc.*
}

{
   <insert_a_suppression_name_here>
   Memcheck:Addr4
   fun:PyObject_GC_Del
   fun:frame_dealloc.*
   fun:PyEval_EvalFrameEx
   fun:PyEval_EvalFrameEx
   fun:PyEval_EvalFrameEx
   fun:PyEval_EvalFrameEx
}

{
   <insert_a_suppression_name_here>
   Memcheck:Addr4
   fun:PyObject_GC_Del
   fun:PyFrame_ClearFreeList
   fun:collect.*
   fun:_PyObject_GC_New
}

{
   <insert_a_suppression_name_here>
   Memcheck:Addr4
   fun:PyObject_GC_Del
   fun:PyFrame_ClearFreeList
   fun:collect.*
}

{
   <insert_a_suppression_name_here>
   Memcheck:Cond
   fun:PyObject_GC_Del
   fun:PyFrame_ClearFreeList
   fun:collect.*
}

{
   <insert_a_suppression_name_here>
   Memcheck:Cond
   fun:PyObject_GC_Del
   fun:subtype_dealloc.*
}

{
   <insert_a_suppression_name_here>
   Memcheck:Addr4
   fun:PyObject_GC_Del
   fun:PyDict_Fini
   fun:Py_Finalize
}

{
   <insert_a_suppression_name_here>
   Memcheck:Cond
   fun:PyObject_GC_Del
   fun:PyDict_Fini
   fun:Py_Finalize
}

{
   <insert_a_suppression_name_here>
   Memcheck:Value8
   fun:PyObject_GC_Del
   fun:PyDict_Fini
   fun:Py_Finalize
}

{
   <insert_a_suppression_name_here>
   Memcheck:Value8
   fun:PyGrammar_RemoveAccelerators
   fun:Py_Finalize
}

{
   <insert_a_suppression_name_here>
   Memcheck:Addr4
   fun:PyGrammar_RemoveAccelerators
   fun:Py_Finalize
}

{
   <insert_a_suppression_name_here>
   Memcheck:Cond
   fun:PyGrammar_RemoveAccelerators
   fun:Py_Finalize
}

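For anyone wanting to repeat the valgrind run with the file above, here is a rough sketch of how the command might be put together. It only assembles the command line and environment; it does not run anything. The file name valgrind-python.supp and the helper names build_valgrind_command and build_env are my own placeholders, not something from the thread. The --tool and --suppressions options are the ones shown in the header of the suppression file, and the two OPENBLAS_* variables are the ones used earlier in the thread to force the Core2 code path.

```python
import os


def build_valgrind_command(supp_path, python_exe="python"):
    """Return an argv list for a memcheck run of the scipy.linalg tests.

    supp_path is wherever the suppression file was saved (hypothetical name
    chosen by the caller); python_exe is the interpreter under test.
    """
    return [
        "valgrind",
        "--tool=memcheck",
        "--suppressions=" + supp_path,
        python_exe,
        "-c",
        "import scipy.linalg; scipy.linalg.test()",
    ]


def build_env(coretype="Core2"):
    """Copy the current environment and force the OpenBLAS core type."""
    env = dict(os.environ)
    env["OPENBLAS_VERBOSE"] = "2"        # makes OpenBLAS print "Core: ..."
    env["OPENBLAS_CORETYPE"] = coretype  # Core2 reproduced the failure
    return env


cmd = build_valgrind_command("valgrind-python.supp")
print(" ".join(cmd))
```

To actually launch it, the list and environment could be handed to subprocess.call(cmd, env=build_env()). On a real Core2 machine the OPENBLAS_CORETYPE override should not be needed, since OpenBLAS detects the core type itself.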
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion