Are you able to reproduce this error with the IB verbs bandwidth test? I hope you are running on a lossless Ethernet fabric setup and selecting the correct VLAN.
-Devendar

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Dave Turner
Sent: Wednesday, January 28, 2015 4:31 PM
To: de...@open-mpi.org
Subject: [OMPI devel] mlx4 QP operation err

I'm testing RoCE on 40 Gbps Mellanox Ethernet cards and am getting an mlx4 QP operation error every time the test reaches 132 kB packets. These are aggregate tests: 16 cores on one host are doing bidirectional ping-pongs with 16 cores on another host across the Mellanox cards.

I've found some old references to similar mlx4 errors dating back to 2009 that lead me to believe this may be a firmware error. I believe we're running the most up-to-date version of the firmware. Could someone comment on whether these are firmware issues, and if so, how to report them to Mellanox?

I've attached some files with more detailed information on this problem.

Dave Turner
--
Work: davetur...@ksu.edu
(785) 532-7791
118 Nichols Hall, Manhattan KS 66502
Home: drdavetur...@gmail.com
cell: (785) 770-5929