Are you able to reproduce this error with the IB verbs bandwidth test? I hope you are
running on a lossless Ethernet fabric and selecting the correct VLAN.

-Devendar

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Dave Turner
Sent: Wednesday, January 28, 2015 4:31 PM
To: de...@open-mpi.org
Subject: [OMPI devel] mlx4 QP operation err


    I'm testing RoCE on 40 Gbps Mellanox Ethernet cards and am getting an
mlx4 QP operation error every time the test reaches 132 kB packets.  These
are aggregate tests in that 16 cores on one host are doing bi-directional
ping-pongs to 16 cores on another host across the Mellanox cards.

    I've found some old references to similar mlx4 errors dating back to
2009 that lead me to believe this may be a firmware error.  I believe we're
running the most up-to-date version of the firmware.

     Could someone comment on whether these are firmware issues, and
if so how to report them to Mellanox?  I've attached some files with more
detailed information on this problem.

                 Dave Turner

--
Work:     davetur...@ksu.edu     (785) 532-7791
             118 Nichols Hall, Manhattan KS  66502
Home:    drdavetur...@gmail.com
              cell: (785) 770-5929
