Hi,

I don't know how many people are aware of it, but I recently discovered perf, a profiling tool that ships with the Linux kernel sources. It is extremely simple to use, and very useful for tracking down numpy/scipy performance issues in compiled code. For example, a single command, without special compilation flags, gives me this kind of result for the numpy neighborhood iterator:

    44.69%  python  /home/david/local/stow/scipy.git/lib/python2.6/site-packages/scipy/signal/sigtools.so  [.] _imp_correlate_nd_double
    39.47%  python  /home/david/local/stow/numpy-1.4.0/lib/python2.6/site-packages/numpy/core/multiarray.so  [.] get_ptr_constant
     9.98%  python  /home/david/local/stow/numpy-1.4.0/lib/python2.6/site-packages/numpy/core/multiarray.so  [.] get_ptr_simple
     0.65%  python  /usr/bin/python2.6  [.] 0x0000000012b8a0
     0.40%  python  /usr/bin/python2.6  [.] 0x000000000a6662
     0.37%  python  /usr/bin/python2.6  [.] 0x0000000004c10d
     0.32%  python  /usr/bin/python2.6  [.] PyEval_EvalFrameEx
     0.15%  python  [kernel]  [k] __d_lookup
     0.14%  python  /lib/libc-2.10.1.so  [.] _int_malloc
     0.12%  python  /usr/bin/python2.6  [.] 0x0000000004f90e
     0.10%  python  [kernel]  [k] __link_path_walk
     0.09%  python  /usr/bin/python2.6  [.] PyObject_Malloc
     0.09%  python  /lib/ld-2.10.1.so  [.] do_lookup_x
     0.09%  python  /lib/libc-2.10.1.so  [.] __GI_memcpy
     0.08%  python  [kernel]  [k] __ticket_spin_lock
     0.07%  python  /usr/bin/python2.6  [.] PyParser_AddToken

And even cooler, annotated sources:

------------------------------------------------
 Percent |      Source code & Disassembly of multiarray.so
------------------------------------------------
         :
         :
         :
         :      Disassembly of section .text:
         :
         :      000000000001d8a0 <get_ptr_constant>:
         :              _coordinates[c] = bd;
         :
         :      /* set the dataptr from its current coordinates */
         :      static char*
         :      get_ptr_constant(PyArrayIterObject* _iter, npy_intp *coordinates)
         :      {
   15.69 :         1d8a0:  48 81 ec 08 01 00 00    sub    $0x108,%rsp
         :          int i;
         :          npy_intp bd, _coordinates[NPY_MAXDIMS];
         :          PyArrayNeighborhoodIterObject *niter = (PyArrayNeighborhoodIterObject*)_iter;
         :          PyArrayIterObject *p = niter->_internal_iter;
         :
         :          for(i = 0; i < niter->nd; ++i) {
    0.02 :         1d8a7:  48 83 bf 48 0a 00 00    cmpq   $0x0,0xa48(%rdi)
    0.00 :         1d8ae:  00
         :      get_ptr_constant(PyArrayIterObject* _iter, npy_intp *coordinates)
         :      {
         :          int i;
         :          npy_intp bd, _coordinates[NPY_MAXDIMS];
         :          PyArrayNeighborhoodIterObject *niter = (PyArrayNeighborhoodIterObject*)_iter;
         :          PyArrayIterObject *p = niter->_internal_iter;
    0.01 :         1d8af:  48 8b 87 50 0b 00 00    mov    0xb50(%rdi),%rax
         :
         :          for(i = 0; i < niter->nd; ++i) {
    7.92 :         1d8b6:  7e 64                   jle    1d91c <get_ptr_constant+0x7c>
         :              _INF_SET_PTR(i)
    0.01 :         1d8b8:  48 8b 0e                mov    (%rsi),%rcx
    0.00 :         1d8bb:  48 03 48 28             add    0x28(%rax),%rcx
    0.03 :         1d8bf:  48 3b 88 40 07 00 00    cmp    0x740(%rax),%rcx
    7.97 :         1d8c6:  7c 68                   jl     1d930 <get_ptr_constant+0x90>
    0.02 :         1d8c8:  45 31 c9                xor    %r9d,%r9d
    0.00 :         1d8cb:  31 d2                   xor    %edx,%edx
    0.00 :         1d8cd:  48 3b 88 48 07 00 00    cmp    0x748(%rax),%rcx
    7.75 :         1d8d4:  7e 32                   jle    1d908 <get_ptr_constant+0x68>
    0.00 :         1d8d6:  eb 58                   jmp    1d930 <get_ptr_constant+0x90>
    0.00 :         1d8d8:  0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
    0.00 :         1d8df:  00
    7.68 :         1d8e0:  4c 8d 42 74             lea    0x74(%rdx),%r8
    0.00 :         1d8e4:  48 8b 0c d6             mov    (%rsi,%rdx,8),%rcx
    0.00 :         1d8e8:  48 03 4c d0 28          add    0x28(%rax,%rdx,8),%rcx
    0.00 :         1d8ed:  49 c1 e0 04             shl    $0x4,%r8
    7.89 :         1d8f1:  49 3b 0c 00             cmp    (%r8,%rax,1),%rcx
    0.00 :         1d8f5:  7c 39                   jl     1d930 <get_ptr_constant+0x90>
    0.01 :         1d8f7:  49 89 d0                mov    %rdx,%r8
    0.11 :         1d8fa:  49 c1 e0 04             shl    $0x4,%r8
    7.18 :         1d8fe:  4a 3b 8c 00 48 07 00    cmp    0x748(%rax,%r8,1),%rcx
    0.00 :         1d905:  00
    0.09 :         1d906:  7f 28                   jg     1d930 <get_ptr_constant+0x90>
         :          int i;
         :          npy_intp bd, _coordinates[NPY_MAXDIMS];
         :          PyArrayNeighborhoodIterObject *niter = (PyArrayNeighborhoodIterObject*)_iter;
         :          PyArrayIterObject *p = niter->_internal_iter;
         :

It works for C and Fortran, BTW.

cheers,

David
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
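
The exact command behind the profile above is not given in the message, so the following is only a rough sketch of how such a report might be collected and summarized from Python. It assumes a perf build that accepts `perf record -o`, `perf report -i` and `--stdio`; the helper names `parse_row`, `percent_by_dso` and `profile` are made up for illustration and are not part of perf, numpy or scipy.

```python
import re
import subprocess
from collections import defaultdict

# Matches one row of a report like the one above, e.g.
#   "39.47%  python  /path/to/multiarray.so  [.] get_ptr_constant"
ROW = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s+(\S+)\s+(\S+)\s+\[(.)\]\s+(.+?)\s*$")


def parse_row(line):
    """Return (percent, command, dso, symbol) for a report row, or None."""
    m = ROW.match(line)
    if m is None:
        return None
    percent, command, dso, _level, symbol = m.groups()
    return float(percent), command, dso, symbol


def percent_by_dso(lines):
    """Sum the sample percentages per shared object (DSO)."""
    totals = defaultdict(float)
    for line in lines:
        row = parse_row(line)
        if row is not None:
            percent, _command, dso, _symbol = row
            totals[dso] += percent
    return dict(totals)


def profile(argv, data_file="perf.data"):
    """Assumed workflow: record a command under perf, return the text report lines."""
    subprocess.run(["perf", "record", "-o", data_file, "--"] + list(argv), check=True)
    result = subprocess.run(
        ["perf", "report", "-i", data_file, "--stdio"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.splitlines()
```

Something like `percent_by_dso(profile(["python", "bench.py"]))` (with `bench.py` standing in for whatever script exercises the iterator) would then give per-library totals. Applied to the rows quoted above, the two multiarray.so entries, get_ptr_constant and get_ptr_simple, add up to 39.47 + 9.98 = 49.45% of the samples.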