Hi,

We experienced constant crashes of opensm 3.3.15 (3.3.15-1.el6.cq5) after a recent upgrade. We compiled and installed 3.3.17 and the problem went away.

OpenSM server: CentOS 6.5 w/ stock RDMA. OpenSM 3.3.15 was from the CentOS repository.

A behaviour that may help diagnose this: an unusually large number of messages like the following were filling up the opensm.log file:

Mar 13 09:50:04 909147 [4FAFC700] 0x01 -> log_rcv_cb_error: ERR 3111: Received MAD with error status = 0x1C
    SubnGetResp(SwitchInfo), attr_mod 0x0, TID 0x73c86e46
    Initial path: 0,1,33,30,28 Return path: 0,10,32,13,28

80 of these messages occur periodically. smpquery on the paths shows that they all point to the Sun QNEM switches (80 I4 chips). Setting "use_mfttop FALSE" eliminated these messages.

Florent

*** glibc detected *** /usr/sbin/opensm: malloc(): smallbin double linked list corrupted: 0x00007f9b3c4352a0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x76166)[0x7f9b56279166]
/lib64/libc.so.6(+0x79f1f)[0x7f9b5627cf1f]
/lib64/libc.so.6(__libc_malloc+0x71)[0x7f9b5627d991]
/usr/sbin/opensm[0x4216f3]
/usr/sbin/opensm(osm_pkey_mgr_process+0x467)[0x422187]
/usr/sbin/opensm[0x446efb]
/usr/sbin/opensm(osm_state_mgr_process+0x1f8)[0x448538]
/usr/sbin/opensm[0x4422bb]
/usr/lib64/libosmcomp.so.3(+0x85fe)[0x7f9b56ddb5fe]
/lib64/libpthread.so.0(+0x79d1)[0x7f9b5659e9d1]
/lib64/libc.so.6(clone+0x6d)[0x7f9b562ebb6d]

*** glibc detected *** /usr/sbin/opensm: double free or corruption (out): 0x00007fe2f42e1830 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x76166)[0x7fe30ec9d166]
/lib64/libc.so.6(+0x78c93)[0x7fe30ec9fc93]
/usr/sbin/opensm[0x449cf6]
/usr/sbin/opensm(osm_subn_rescan_conf_files+0x194)[0x44af14]
/usr/sbin/opensm[0x447260]
/usr/sbin/opensm(osm_state_mgr_process+0x1f8)[0x448538]
/usr/sbin/opensm[0x4422bb]
/usr/lib64/libosmcomp.so.3(+0x85fe)[0x7fe30f7ff5fe]
/lib64/libpthread.so.0(+0x79d1)[0x7fe30efc29d1]
/lib64/libc.so.6(clone+0x6d)[0x7fe30ed0fb6d]

*** glibc detected *** /usr/sbin/opensm: malloc(): smallbin double linked list corrupted: 0x00007f200838ede0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x76166)[0x7f2025131166]
/lib64/libc.so.6(+0x79f1f)[0x7f2025134f1f]
/lib64/libc.so.6(__libc_malloc+0x71)[0x7f2025135991]
/usr/sbin/opensm[0x4216f3]
/usr/sbin/opensm(osm_pkey_mgr_process+0x467)[0x422187]
/usr/sbin/opensm[0x446efb]
/usr/sbin/opensm(osm_state_mgr_process+0x1f8)[0x448538]
/usr/sbin/opensm[0x4422bb]
/usr/lib64/libosmcomp.so.3(+0x85fe)[0x7f2025c935fe]
/lib64/libpthread.so.0(+0x79d1)[0x7f20254569d1]
/lib64/libc.so.6(clone+0x6d)[0x7f20251a3b6d]

*** glibc detected *** /usr/sbin/opensm: malloc(): smallbin double linked list corrupted: 0x00007f8464013df0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x76166)[0x7f847ec95166]
/lib64/libc.so.6(+0x79f1f)[0x7f847ec98f1f]
/lib64/libc.so.6(__libc_malloc+0x71)[0x7f847ec99991]
/usr/sbin/opensm[0x4216f3]
/usr/sbin/opensm(osm_pkey_mgr_process+0x467)[0x422187]
/usr/sbin/opensm[0x446efb]
/usr/sbin/opensm(osm_state_mgr_process+0x1f8)[0x448538]
/usr/sbin/opensm[0x4422bb]
/usr/lib64/libosmcomp.so.3(+0x85fe)[0x7f847f7f75fe]
/lib64/libpthread.so.0(+0x79d1)[0x7f847efba9d1]
/lib64/libc.so.6(clone+0x6d)[0x7f847ed07b6d]
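
For anyone wanting to check whether the ERR 3111 messages quoted earlier all come back along the same few paths, a rough sketch like the one below could tally them by error status and return path. The regular expression is inferred only from the single log entry in this report, so other opensm versions may need a different pattern; the function name tally_return_paths and the sample text are made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the one ERR 3111 entry quoted in this report.
# Assumption: other entries use the same wording and field order.
ERR_3111 = re.compile(
    r"ERR 3111: Received MAD with error status = (0x[0-9A-Fa-f]+)"
    r".*?Return path: ([0-9,]+)",
    re.DOTALL,  # the entry may be wrapped over several lines in the log
)


def tally_return_paths(log_text):
    """Count ERR 3111 entries per (error status, return path) pair."""
    counts = Counter()
    for status, path in ERR_3111.findall(log_text):
        counts[(status, path.rstrip(","))] += 1
    return counts


if __name__ == "__main__":
    # Sample built from the entry in this report; in practice the text
    # would be read from the opensm.log file instead.
    sample = (
        "Mar 13 09:50:04 909147 [4FAFC700] 0x01 -> log_rcv_cb_error: "
        "ERR 3111: Received MAD with error status = 0x1C\n"
        "    SubnGetResp(SwitchInfo), attr_mod 0x0, TID 0x73c86e46\n"
        "    Initial path: 0,1,33,30,28 Return path: 0,10,32,13,28\n"
    )
    for (status, path), n in tally_return_paths(sample).most_common():
        print(status, path, n)
```

Each distinct return path can then be looked up with smpquery, as was done by hand above, to see which switch it points to.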