Re: [PATCH V2 5/5] RDMA CM: Netlink Client
On Wed, 2010-12-08 at 11:23 -0700, Jason Gunthorpe wrote:

>> Sorry, I still need some clarifications... When you say deadlocks, do
>> you mean calling malloc while holding a lock, or overflowing a socket
>> receive buffer? For the second case, when we use netlink_unicast, the
>> skbuff is sent and freed. It is transferred to the userspace socket
>> via netlink_sendskb and accumulated in its receive buffer. Are you
>> referring to a deadlock there? I still fail to see the issue. Why
>> would the kernel socket receive buffer reach a limit? Could you
>> please elaborate?
>
> Netlink is all driven from user space syscalls... so it looks like:
>
>   sendmsg()
>   [..] ibnl_rcv_msg
>        cma_get_stats
>   [..] ibnl_unicast
>   [..] netlink_attachskb (now we block on the socket recv queue once it fills)
>
> The deadlock is that userspace is sitting in sendmsg() while the kernel
> is sleeping in netlink_attachskb waiting for the recvbuf to empty. User
> space cannot call recvmsg() while it is blocked in sendmsg(), so it all
> goes boom.

Oh, now I see what you mean. I thought you meant the receive buffer in the kernel netlink socket... But I'm using MSG_DONTWAIT when calling netlink_unicast, so attachskb shouldn't block. I also tested that. I do agree that freeing the skb and simply giving up is not the best we can do, so we can try to send as much as we can instead, but either way, a deadlock shouldn't occur.

Nir

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: rdma_lat whos
Ido Shamai wrote:
> The latest git tree is available at git://git.openfabrics.org/~shamoya/perftest.git

Ido, on a related issue - I'm trying to run ib_send_lat in an IBoE environment and it fails. I'm using the latest cut of the perftest sources from git; for the other components (libibverbs, libmlx4, kernel, FW and HW) see below. It's a system of two nodes connected back-to-back, with port1 being IB and port2 being Eth, and the same perftest code works okay on IB / p1-p1. I have ping working fine over mlx4_en, so basically things are okay. I think you made a comment a few weeks ago that perftest should be working now with IBoE, so I wonder what goes wrong here?

client side:

ib_send_lat -d mlx4_0 -i 2 boo1
------------------------------------------------------------------
                    Send Latency Test
Connection type : RC
Inline data is used up to 400 bytes message
Mtu : 1024
Link type is Ethernet
Using gid index 0 as source GID
 local address:  LID QPN 0x44004f PSN 0x6b567a
 GID: 254:128:00:00:00:00:00:00:02:02:201:255:254:07:237:03
 remote address: LID QPN 0x48004f PSN 0x3e78fc
 GID: 254:128:00:00:00:00:00:00:02:02:201:255:254:07:236:243
------------------------------------------------------------------
 #bytes #iterations t_min[usec] t_max[usec] t_typical[usec]
Completion with error at server
Failed status 5: wr_id 0 syndrom 0xf4
rcnt=0

server side:

ib_send_lat -d mlx4_0 -i 2
------------------------------------------------------------------
                    Send Latency Test
Connection type : RC
Inline data is used up to 400 bytes message
Mtu : 1024
Link type is Ethernet
 local address:  LID QPN 0x48004f PSN 0x3e78fc
 GID: 254:128:00:00:00:00:00:00:02:02:201:255:254:07:236:243
 remote address: LID QPN 0x44004f PSN 0x6b567a
 GID: 254:128:00:00:00:00:00:00:02:02:201:255:254:07:237:03
------------------------------------------------------------------
 #bytes #iterations t_min[usec] t_max[usec] t_typical[usec]

Or.
It's ConnectX2 on both sides with firmware 2.7.700.

ibv_devinfo
hca_id: mlx4_0
        transport:       InfiniBand (0)
        fw_ver:          2.7.700
        node_guid:       0002:c903:0007:ed02
        sys_image_guid:  0002:c903:0007:ed05
        vendor_id:       0x02c9
        vendor_part_id:  26428
        hw_ver:          0xB0
        board_id:        MT_0DD0120009
        phys_port_cnt:   2
                port:   1
                        state:       PORT_ACTIVE (4)
                        max_mtu:     2048 (4)
                        active_mtu:  2048 (4)
                        sm_lid:      12
                        port_lid:    9
                        port_lmc:    0x00
                        link_layer:  IB
                port:   2
                        state:       PORT_ACTIVE (4)
                        max_mtu:     2048 (4)
                        active_mtu:  1024 (3)
                        sm_lid:      0
                        port_lid:    0
                        port_lmc:    0x00
                        link_layer:  Ethernet

ofa_kernel: git://git.openfabrics.org/ofed_1_5/linux-2.6.git ofed_kernel_1_5 commit 21556e24411b4e4b0694f70244d4a33a454ddbf5
libibverbs: http://www.openfabrics.org/downloads/libibverbs/libibverbs-1.1.4-0.14.gb6c138b.tar.gz
libmlx4: http://www.openfabrics.org/downloads/libmlx4/libmlx4-1.0-0.13.g4e5c43f.tar.gz
Re: rdma_lat whos
> local address: LID QPN 0x44004f PSN 0x6b567a GID: 254:128:00:00:00:00:00:00:02:02:201:255:254:07:237:03

Also, it would be much easier to track/debug if the GID octets were printed in hexadecimal - can you do that?

Or
[PATCH] Variable multicast and path record queue lengths.
Allow the user to set the size of the multicast and path record queues. This should solve the problem of packets being dropped when using a slow SM: currently only 3 packets are stored in the send queue before drops take place. The queue lengths may be changed at runtime via the files mcast_qlen and prec_qlen under the /sys/module/ib_ipoib/parameters/ directory.

This patch is based on an idea of Christoph Lameter:
http://lists.openfabrics.org/pipermail/general/2009-June/059853.html
A tool for generating multicast traffic can be found at http://www.gentwo.org/ll.

Signed-off-by: Aleksey Senin aleks...@voltaire.com
---
 drivers/infiniband/ulp/ipoib/ipoib.h           |  2 +
 drivers/infiniband/ulp/ipoib/ipoib_main.c      | 91 +++-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |  2 +-
 3 files changed, 91 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 753a983..159e29c 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -698,6 +698,8 @@ static inline void ipoib_unregister_debugfs(void) { }
 extern int ipoib_sendq_size;
 extern int ipoib_recvq_size;
+extern unsigned int ipoib_prec_qlen;
+extern unsigned int ipoib_mcast_qlen;
 extern struct ib_sa_client ipoib_sa_client;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 9ff7bc7..c07a788 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -69,6 +69,85 @@ module_param(lro_max_aggr, int, 0644);
 MODULE_PARM_DESC(lro_max_aggr, "LRO: Max packets to be aggregated (default = 64)");
+unsigned int ipoib_prec_qlen = IPOIB_MAX_PATH_REC_QUEUE;
+unsigned int ipoib_mcast_qlen = IPOIB_MAX_MCAST_QUEUE;
+
+static struct ctl_table_header *ipoib_table_header;
+
+#define MIN_IPOIB_QLENGTH 1
+#define MAX_IPOIB_QLENGTH 256
+
+static unsigned int min_ipoib_qlen = MIN_IPOIB_QLENGTH;
+static unsigned int max_ipoib_qlen = MAX_IPOIB_QLENGTH;
+
+static ctl_table ipoib_tunable_table[] = {
+	{
+		.procname	= "prec_qlen",
+		.data		= &ipoib_prec_qlen,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &min_ipoib_qlen,
+		.extra2		= &max_ipoib_qlen
+	},
+	{
+		.procname	= "mcast_qlen",
+		.data		= &ipoib_mcast_qlen,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &min_ipoib_qlen,
+		.extra2		= &max_ipoib_qlen
+	},
+	{},
+};
+
+static ctl_table ipoib_table[] = {
+	{
+		.procname	= "ib_ipoib",
+		.mode		= 0555,
+		.maxlen		= 0,
+		.child		= ipoib_tunable_table
+	},
+	{},
+};
+
+static int param_set_uint_minmax(const char *val,
+				 const struct kernel_param *kp,
+				 unsigned int min, unsigned int max)
+{
+	unsigned long num;
+	int ret;
+
+	if (!val)
+		return -EINVAL;
+	ret = strict_strtoul(val, 0, &num);
+	if (ret == -EINVAL || num < min || num > max)
+		return -EINVAL;
+	*((unsigned int *)kp->arg) = num;
+	return 0;
+}
+
+static int param_set_queue_length(const char *val,
+				  const struct kernel_param *kp)
+{
+	return param_set_uint_minmax(val, kp,
+				     MIN_IPOIB_QLENGTH, MAX_IPOIB_QLENGTH);
+}
+
+static struct kernel_param_ops param_ops_queue_length = {
+	.set = param_set_queue_length,
+	.get = param_get_uint,
+};
+
+#define param_check_queue_length(name, p) \
+	__param_check(name, p, unsigned int);
+
+module_param_named(prec_qlen, ipoib_prec_qlen, queue_length, 0644);
+MODULE_PARM_DESC(prec_qlen, "Path record queue length ([1..256], default = 3)");
+module_param_named(mcast_qlen, ipoib_mcast_qlen, queue_length, 0644);
+MODULE_PARM_DESC(mcast_qlen, "Multicast queue length ([1..256], default = 3)");
+
 #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG
 int ipoib_debug_level;
@@ -597,7 +676,7 @@ static void neigh_add_path(struct sk_buff *skb, struct net_device *dev)
Re: rdma_lat whos
Or Gerlitz wrote:
> Ido Shamai wrote:
> I'm trying to run ib_send_lat in IBoE environment and it fails.

I got this to work now by specifying the IP address associated with the relevant mlx4_en network device on the server side - is this documented anywhere?

Or.
Re: NFS-RDMA hangs: connection closed (-103)
On 12/8/10 9:10 AM, Spelic wrote:
> Tom, have you reproduced the "RDMA hangs - connection closes" bug, or the "sparse file at server side upon NFS hitting ENOSPC" bug? Because for the latter, people have already given an exhaustive explanation: see this other thread at http://fossplanet.com/f13/%5Blinux-lvm%5D-bugs-mkfs-xfs-device-mapper-xfs-dev-ram-81653/ while the former bug is still open and very interesting for us.

I'm working on the 'former' bug. The bug I think you've run into is in how RDMA transport errors are handled and how RPCs are retried in the event of an error. With hard mounts (which I suspect you have), the RPC will be retried forever. In this bug, the transport never 'recovers' after the error, so the RPC never succeeds and the mount is effectively hung. There were bugs fixed in this area between 34 and top-of-tree, which is why you now see the less catastrophic, but still broken, behavior. Unfortunately I can only support this part-time, but I'll keep you updated on the progress. Thanks for finding this and helping to debug,

Tom

> Thanks for your help. S.

On 12/07/2010 05:12 PM, Tom Tucker wrote:
> Status update... I have reproduced the bug a number of different ways. It seems to be most easily reproduced by simply writing more data than the filesystem has space for. I can do this reliably with any FS. I think the XFS bug may have tickled this bug somehow.
Tom

On 12/2/10 1:09 PM, Spelic wrote:

Hello all, please be aware that the file oversize bug is reproducible also without InfiniBand, with just NFS over Ethernet over XFS over ramdisk (but it doesn't hang, so it's a different bug than the one I posted here on the RDMA mailing list). I have posted another thread regarding the file oversize bug, which you can read in the LVM, XFS, and LKML mailing lists; please have a look:
http://fossplanet.com/f13/%5Blinux-lvm%5D-bugs-mkfs-xfs-device-mapper-xfs-dev-ram-81653/
Especially my second post, replying to myself at +30 minutes, explains that it's reproducible also with Ethernet. Thank you

On 12/02/2010 07:37 PM, Roland Dreier wrote:

Adding Dave Chinner to the cc list, since he's both an XFS guru as well as being very familiar with NFS and RDMA... Dave, if you read below, it seems there is some strange behavior exporting XFS with NFS/RDMA.

- R.

On 12/02/2010 12:59 AM, Tom Tucker wrote:

Spelic, I have seen this problem before, but have not been able to reliably reproduce it. When I saw the problem, there were no transport errors and it appeared as if the I/O had actually completed, but the waiter was not being awoken. I was not able to determine whether the problem was a latent bug in NFS in general or a bug in the RDMA transport in particular. I will try your setup here, but I don't have a system like yours so I'll have to settle for a smaller ramdisk. However, I have a few questions:

- Does the FS matter? For example, can you use ext[2-4] on the ramdisk and still reproduce?
- As I mentioned earlier, NFS v3 vs. NFS v4?
- RAMDISK size, i.e. 2G vs. 14G?

Thanks,
Tom

Hello Tom, thanks for replying.

- The FS matters to some extent: as I wrote, with ext4 it's not possible to reproduce the bug in this way, so immediately and reliably; however ext4 will also hang eventually if you work on it for hours, so I had to switch to IPoIB for our real work; reread my previous post.
- NFS3 not tried yet. Never tried to do RDMA on NFS3... do you have a pointer to instructions?
- RAMDISK size: I am testing it. Ok, I confirm it's reproducible with a 1.5GB ramdisk.

boot option: ramdisk_size=1572864   (1.5*1024**2 = 1572864)
confirm: blockdev --getsize64 /dev/ram0 == 1610612736

Now at server side, mkfs and mount with defaults:

mkfs.xfs /dev/ram0
mount /dev/ram0 /mnt/ram

(this is a simplification over my previous email, and it's needed with a smaller ramdisk or mkfs.xfs will refuse to work. The bug is still reproducible like this)

DOH! another bug: it's strange how at the end of the test "ls -lh /mnt/ram" at server side will show a zerofile larger than 1.5GB; sometimes it's 3GB, sometimes it's 2.3GB... but it's larger than the ramdisk size.

# ll -h /mnt/ram
total 1.5G
drwxr-xr-x 2 root root   21 2010-12-02 12:54 ./
drwxr-xr-x 3 root root 4.0K 2010-11-29 23:51 ../
-rw-r--r-- 1 root root 2.3G 2010-12-02 12:59 zerofile

# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             294G  4.1G  275G   2% /
devtmpfs              7.9G  184K  7.9G   1% /dev
none                  7.9G     0  7.9G   0% /dev/shm
none                  7.9G  100K  7.9G   1% /var/run
none                  7.9G     0  7.9G   0% /var/lock
none                  7.9G     0  7.9G   0% /lib/init/rw
/dev/ram0             1.5G  1.5G   20K 100% /mnt/ram

# dd
Re: [PATCH] replace (long*)(long) casting with transportable data type (uintptr_t)
On 15:18 Tue 07 Dec, Smith, Stan wrote:
>> We should return to my original patch submission: remove the (long *)(long) and replace it with (uintptr_t):
>>
>> -(osmv_query_req_t *) (long *)(long)(p_madw->context.ni_context.node_guid);
>> +(osmv_query_req_t *) (uintptr_t) p_madw->context.ni_context.node_guid;
>
> Sasha, can you take care of this?

Done. Thanks.

Sasha
Re: ibnetdiscover issue
Hi Tom,

On 16:14 Wed 08 Dec, Tom Ammon wrote: Is there a quick workaround we could put in place? I want to map out our fabric, and I especially need the spine GUIDs on the GD4200 because I'm going to be doing up/down routing and want to specify the root GUIDs. I can also submit a support case to Voltaire, if you think that would make it go faster. I want to make sure we are using OFED as distributed from OFA.

As far as I can see, ibnetdiscover supports the 4200 device. Could you rerun ibnetdiscover without the '-g' option and send me the results?

Sasha

Tom

On 12/8/2010 11:28 AM, Hal Rosenstock wrote:

Hi Tom,

On 12/8/2010 12:48 PM, Tom Ammon wrote: Hi, I get the following when I try to run ibnetdiscover from a server plugged in to a Voltaire 4036 switch. We're using OFED 1.5.2:

[r...@sm1 ~]# ibnetdiscover
src/chassis.c:535; Unexpected node found: guid 0x0008f1050075134c
ibnetdiscover: iberror: failed: discover failed

Looks to me like there's an is_spine_4200() clause missing in get_router_slot in libibnetdisc/src/chassis.c. Eli added the changes to support the 4200, so he's the best one to comment.

-- Hal

However, ibdiagnet runs fine:

[r...@sm1 ~]# ibdiagnet
Loading IBDIAGNET from: /usr/lib64/ibdiagnet1.5.4
-W- Topology file is not specified. Reports regarding cluster links will use direct routes.
Loading IBDM from: /usr/lib64/ibdm1.5.4
-I- Using port 1 as the local port.
-I- Discovering ... 277 nodes (23 Switches, 254 CA-s) discovered.
-I---
-I- Bad Guids/LIDs Info
-I---
-I- No bad Guids were found
-I---
-I- Links With Logical State = INIT
-I---
-I- No bad Links (with logical state = INIT) were found
-I---
-I- General Device Info
-I---
-I---
-I- PM Counters Info
-I---
-W- lid=0x0007 guid=0x0008f105006515ba dev=23131 Port=33
    Performance Monitor counter     : Value
    link_error_recovery_counter     : 0xff (overflow)
-W- lid=0x0010 guid=0x0008f10500201d7c dev=23130 Port=14
    Performance Monitor counter     : Value
    symbol_error_counter            : 0x (overflow)
-W- lid=0x0001 guid=0x0008f10500108a76 dev=23130 Port=30
    Performance Monitor counter     : Value
    symbol_error_counter            : 0x (overflow)
-I---
-I- Fabric Partitions Report (see ibdiagnet.pkey for a full hosts list)
-I---
-I- PKey:0x7fff Hosts:254 full:254 limited:0
-I---
-I- IPoIB Subnets Check
-I---
-I- Subnet: IPv4 PKey:0x7fff QKey:0x0b1b MTU:2048Byte rate:10Gbps SL:0x00
-I---
-I- Bad Links Info
-I- No bad link were found
-I---
-I- Stages Status Report:
    STAGE                           Errors  Warnings
    Bad GUIDs/LIDs Check            0       0
    Link State Active Check         0       0
    General Devices Info Report     0       0
    Performance Counters Report     0       3
    Partitions Check                0       0
    IPoIB Subnets Check             0       0

Please see /tmp/ibdiagnet.log for complete log
-I- Done. Run time was 21 seconds.

Any ideas?

Tom

--
Tom Ammon
Network Engineer
Office: 801.587.0976  Mobile: 801.674.9273
Center for High Performance Computing
University of Utah
http://www.chpc.utah.edu
Re: [PATCH V2 5/5] RDMA CM: Netlink Client
On Thu, Dec 09, 2010 at 10:47:18AM +0200, Nir Muchtar wrote:
> But I'm using MSG_DONTWAIT when calling netlink_unicast, so attachskb shouldn't block. I also tested that.

But then you are guaranteed to have an incomplete dump once you have enough entries! The best trade-off is what the other dump_start users do: you might get an inconsistent dump sometimes, but at least it is complete and correct most of the time.

Jason
Re: rdma_lat whos
On Thu, Dec 09, 2010 at 12:39:00PM +0200, Or Gerlitz wrote:
>> local address: LID QPN 0x44004f PSN 0x6b567a GID: 254:128:00:00:00:00:00:00:02:02:201:255:254:07:237:03
> Also, it would be much easier to track/debug if the GID octets were printed in hexadecimal, can you?

GIDs should be printed with inet_ntop(AF_INET6).

Jason
rdma_cm resource over use
Hi, the rdma_cm library opens all available devices and keeps them open even after binding to a specific device/port. This eats resources on devices that would otherwise be available to other applications. Is there a way to avoid this? If not, maybe we should close all other devices after binding?
RE: rdma_cm resource over use
> the rdma_cm library opens all available devices and keeps them open even after binding to a specific device/port. This eats resources on devices that would otherwise be available to other applications. Is there a way to avoid this? If not, maybe we should close all other devices after binding?

The librdmacm documents that all RDMA devices remain open while the librdmacm is loaded (see rdma_get_devices). Also, a listen doesn't need to be bound to any specific device, but the connection request will be. Personally, I really don't see this as a major use of resources.

- Sean
Re: rdma_cm resource over use
For ConnectX, mlx4 will consume a UAR page for each open of the libibverbs device file. On 4K-page architectures we usually have 1024 UAR pages; with a 64K page size the situation becomes worse. Suppose you have a system with more than one device, where one device is used by rdma_cm and the other is not. You can easily exhaust all the resources. We actually have this situation at a customer.

On Thu, Dec 9, 2010 at 8:30 PM, Hefty, Sean sean.he...@intel.com wrote:
> The librdmacm documents that all RDMA devices remain open while the librdmacm is loaded (see rdma_get_devices). Also, a listen doesn't need to be bound to any specific device, but the connection request will be. Personally, I really don't see this as a major use of resources.
>
> - Sean
Re: [ewg] IPoIB to Ethernet routing performance
On Mon, 6 Dec 2010, sebastien dugue wrote:
>> The Mellanox BridgeX looks a better hardware solution with 12x 10GE ports, but when I tested it they could only provide vNIC functionality and would not commit to adding an IPoIB gateway to their roadmap.
> Right, we did some evaluation on it and this was really a show stopper.

Did the same thing here, came to the same conclusions.

>> QLogic also offers the 12400 Gateway. This has 6x 10GE ports. However, like the Mellanox, I understand they only provide host vNIC support.

Really? I was hoping that they would have something worth looking at.
[patch] IB: handle -ENOMEM in forward_trap()
ib_create_send_mad() can return ERR_PTR(-ENOMEM) here. Signed-off-by: Dan Carpenter erro...@gmail.com diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c index c9a8dd6..a1add16 100644 --- a/drivers/infiniband/hw/mlx4/mad.c +++ b/drivers/infiniband/hw/mlx4/mad.c @@ -211,6 +211,9 @@ static void forward_trap(struct mlx4_ib_dev *dev, u8 port_num, struct ib_mad *ma if (agent) { send_buf = ib_create_send_mad(agent, qpn, 0, 0, IB_MGMT_MAD_HDR, IB_MGMT_MAD_DATA, GFP_ATOMIC); + + if (IS_ERR(send_buf)) + return; /* * We rely here on the fact that MLX QPs don't use the * address handle after the send is posted (this is diff --git a/drivers/infiniband/hw/mthca/mthca_mad.c b/drivers/infiniband/hw/mthca/mthca_mad.c index 5648659..03a59534 100644 --- a/drivers/infiniband/hw/mthca/mthca_mad.c +++ b/drivers/infiniband/hw/mthca/mthca_mad.c @@ -171,6 +171,8 @@ static void forward_trap(struct mthca_dev *dev, if (agent) { send_buf = ib_create_send_mad(agent, qpn, 0, 0, IB_MGMT_MAD_HDR, IB_MGMT_MAD_DATA, GFP_ATOMIC); + if (IS_ERR(send_buf)) + return; /* * We rely here on the fact that MLX QPs don't use the * address handle after the send is posted (this is -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html