Re: rdma provider module references
On 12/15/2010 11:09 AM, Roland Dreier wrote:

> I notice that if I have a user RDMA application running that has an
> RDMA connection using iw_cxgb3, the iw_cxgb3 module reference count is
> bumped and the module cannot be unloaded. However, when an NFSRDMA
> connection uses iw_cxgb3, the module reference count is not bumped, and
> iw_cxgb3 can erroneously be unloaded while the NFSRDMA connection is
> still active, causing a crash.

What is supposed to happen is that as the HW driver is unloading, it calls ib_unregister_device() first, and this calls each client's .remove() method to have it release everything related to that device. However, I guess NFS/RDMA is behind the RDMA CM, which is supposed to handle device removal. In that code it seems to end up in cma_process_remove(), which at first glance appears to do the right things to destroy all connections etc.

The idea is that RDMA devices should be like net devices, i.e. you can remove them even if they're in use -- things should just clean up, rather than blocking the module removal. The uverbs case is a bit of a hack because we don't have a way to handle revoking the mmap regions etc. yet. What goes wrong with NFS/RDMA in this scheme? It looks like it should work.

Here's one stack. From this I assume the offload connection was still active after iw_cxgb3 was unloaded...
Call Trace:
 <IRQ>
 [80037136] kref_get+0x38/0x3d
 [885fb5b1] :iw_cxgb3:sched+0x17/0x49
 [8824cf37] :cxgb3:process_rx+0x37/0x8b
 [8824a3e7] :cxgb3:process_responses+0xc09/0xc63
 [8824ac65] :cxgb3:napi_rx_handler+0x36/0xa4
 [8000c88a] net_rx_action+0xac/0x1e0
 [8824ac15] :cxgb3:t3_sge_intr_msix_napi+0x173/0x18d
 [80012409] __do_softirq+0x89/0x133
 [8005f2fc] call_softirq+0x1c/0x28
 [8006dba8] do_softirq+0x2c/0x85
 [8006da30] do_IRQ+0xec/0xf5
 [800575d0] mwait_idle+0x0/0x4a
 [8005e615] ret_from_intr+0x0/0xa
 <EOI>
 [80057606] mwait_idle+0x36/0x4a
 [800497be] cpu_idle+0x95/0xb8
 [80078997] start_secondary+0x498/0x4a7

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: rdma provider module references
> However I guess NFS/RDMA is behind the RDMA CM, which is supposed to
> handle device removal. In that code it seems to end up in
> cma_process_remove(), which appears at first glance to do the right
> things to destroy all connections etc.

cma_process_remove() calls cma_remove_id_dev() for each cm_id bound to the device being removed. cma_remove_id_dev() in turn calls each cm_id's event handler, passing an RDMA_CM_EVENT_DEVICE_REMOVAL event.

The NFSRDMA server marks the RPC transport XPT_CLOSE, but does not immediately destroy the cm_id in the event handler function. This is in rdma_cma_handler() in net/sunrpc/xprtrdma/svc_rdma_transport.c. That's the issue, methinks. Each kernel RDMA user must destroy all of its resources in the event handler function itself; given the current design, that teardown cannot be scheduled or deferred in any way.

Steve.
RE: Intel NetEffect NE020 E10G81GP use
Hi,

I wrote about RDMA over TCP/IP (iWARP). Today I installed mvapich2-1.6rc1 and openmpi-1.4.3 on my machine. I don't know much about MPI code; I just tried to run the hello file from the examples folder.

  [r...@redigo-01 examples]# ls
  connectivity_c.c  hello_c.c     hello_f77.f    Makefile          README    ring_cxx.cc  ring_f90.f90
  hello             hello_cxx.cc  hello_f90.f90  Makefile.include  ring_c.c  ring_f77.f
  [r...@redigo-01 examples]# pwd
  /root/openmpi-1.4.3/examples

COMMAND LINE:

A) [r...@redigo-01 examples]# mpiexec -n 6 /root/openmpi-1.4.3/examples/cpi
   problem with execution of /root/openmpi-1.4.3/examples/cpi on redigo-01.lnl.infn.it: [Errno 2] No such file or directory

B) [r...@redigo-01 examples]# mpirun rsh redigo-01.lnl.infn.it_59240 hello
   redigo-01.lnl.infn.it_59240: host unknown
   redigo-01.lnl.infn.it_59240: host unknown
   trying normal rsh (/usr/bin/rsh)
   redigo-01.lnl.infn.it_59240: Unknown host

I wrote the mod.conf file (as root) and put it in /etc with the correct permissions. I don't understand where my error is. Could you help me with these first steps in MPI? Thank you very much.

Regards,
Andrea

On Dec 15, 2010 10:45 PM, Jaszcza, Andrzej andrzej.jasz...@intel.com wrote:

Hi Andrea,

I think you meant RDMA over TCP/IP (iWarp) and not RDMA over Converged Ethernet (RoCE), as Intel NetEffect cards do not support the latter. You may safely try running any MPI benchmarks using MVAPICH2, Open MPI or Intel MPI. There is more information on iWarp, Intel NetEffect cards and their possible uses on the Intel website: http://www.intel.com/technology/comms/iwarp/index.htm.

Regards,
Andrzej

-----Original Message-----
From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Andrea Gozzelino
Sent: Wednesday, December 15, 2010 11:08 AM
To: linux-rdma@vger.kernel.org
Subject: Intel NetEffect NE020 E10G81GP use
Importance: High

Hi all,

I have tested two Intel NetEffect NE020 E10G81GP cards with RDMA over Ethernet; the latency and bandwidth results are OK.
Are there other protocols supported by the cards that offer the same performance (TCP offload, storage, MPI)? Sockets Direct Protocol (SDP) does not reach that performance. Any suggestions for using the cards at high performance? What are the benefits of using these cards in daily storage and data-movement operations?

Thank you. Best regards,

Andrea Gozzelino
INFN - Laboratori Nazionali di Legnaro (LNL)
Viale dell'Universita' 2 - I-35020 - Legnaro (PD) - ITALIA
Office: E-101
Tel: +39 049 8068346
Fax: +39 049 641925
Mail: andrea.gozzel...@lnl.infn.it
Cell: +39 3488245552
RE: Intel NetEffect NE020 E10G81GP use
> [r...@redigo-01 examples]# ls
> connectivity_c.c  hello_c.c     hello_f77.f    Makefile          README    ring_cxx.cc  ring_f90.f90
> hello             hello_cxx.cc  hello_f90.f90  Makefile.include  ring_c.c  ring_f77.f
> [r...@redigo-01 examples]# pwd
> /root/openmpi-1.4.3/examples
>
> COMMAND LINE:
> A) [r...@redigo-01 examples]# mpiexec -n 6 /root/openmpi-1.4.3/examples/cpi
> problem with execution of /root/openmpi-1.4.3/examples/cpi on redigo-01.lnl.infn.it: [Errno 2] No such file or directory

You don't have cpi in the /root/openmpi-1.4.3/examples directory. Try running hello instead.

Chien
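The "[Errno 2] No such file or directory" in case A simply means the launcher was given a path that does not exist, as Chien notes. A minimal sketch of how one might check and launch instead, assuming the Open MPI 1.4.3 source tree from the thread (the paths and the `make` target are taken from that listing; this is illustrative, not a verified recipe for that machine):

```shell
# The cpi binary was never built in this tree; confirm before launching.
ls /root/openmpi-1.4.3/examples/cpi 2>/dev/null || echo "cpi not built here"

# Build the shipped examples, then launch a binary that actually exists.
# (Commented out: requires an MPI installation on the host.)
#   make -C /root/openmpi-1.4.3/examples
#   mpirun -np 6 /root/openmpi-1.4.3/examples/hello_c
```

For case B, `mpirun` does not take an rsh-style `host_port` argument; with Open MPI, remote hosts are normally given via `--host` or `--hostfile` instead.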
ibv_get_cq_event blocking forever after successful ibv_post_send...
Mine is a simple client/server-mode program: the client keeps sending messages to the server. After a certain number of messages, the server just blocks in ibv_get_cq_event(); the client receives no reply and keeps waiting. If I manually stop the client, the server gets a completion with wc.status == IBV_WC_WR_FLUSH_ERR.