Re: [PATCH V2 5/5] RDMA CM: Netlink Client

2010-12-09 Thread Nir Muchtar
On Wed, 2010-12-08 at 11:23 -0700, Jason Gunthorpe wrote:

  Sorry, I still need some clarifications...
  When you say deadlocks, do you mean when calling malloc with a lock or
  when overflowing a socket receive buffer?
  For the second case, when we use netlink_unicast, the skbuff is sent and
  freed. It is transferred to the userspace's socket using netlink_sendskb
  and accumulated in its recv buff.
  
  Are you referring to a deadlock there? I still fail to see the issue.
  Why would the kernel socket recv buff reach a limit? Could you please
  elaborate?
 
 Netlink is all driven from user-space syscalls, so it looks like:
 
 sendmsg()
 [..]
 ibnl_rcv_msg
 cma_get_stats
 [..]
 ibnl_unicast
 [..]
 netlink_attachskb
 (now we block on the socket recv queue once it fills)
 
 The deadlock is that userspace is sitting in sendmsg() while the
 kernel is sleeping in netlink_attachskb waiting for the recvbuf to
 empty.
 
 User space cannot call recvmsg() while it is blocked in sendmsg(),
 so it all goes boom.
 

Oh, now I see what you mean. I thought you meant the recv buffer in the
netlink socket... 

But I'm using MSG_DONTWAIT when calling netlink_unicast, so attachskb
shouldn't block. I also tested that.
I do agree that freeing the skb and simply giving up is not the best we
can do, so we can try and send as much as we can instead, but either
way, a deadlock shouldn't occur.
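
For reference, a minimal sketch (not the patch code; the helper name and the
struct sock pointer are assumptions) of the point above: passing MSG_DONTWAIT
as the nonblock argument makes netlink_unicast(), and netlink_attachskb()
underneath it, return -EAGAIN instead of sleeping when the destination
socket's receive buffer is full, which is what avoids the sendmsg()/recvmsg()
deadlock described above.

static int cma_send_netlink_reply(struct sock *nls, struct sk_buff *skb,
				  u32 pid)
{
	/* netlink_unicast() consumes the skb on both success and failure,
	 * so there is nothing to free here if it returns an error. */
	return netlink_unicast(nls, skb, pid, MSG_DONTWAIT);
}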

Nir



Re: rdma_lat whos

2010-12-09 Thread Or Gerlitz
Ido Shamai wrote:
 The latest git tree is available at 
 git://git.openfabrics.org/~shamoya/perftest.git 

Ido, on a related issue - I'm trying to run ib_send_lat in an IBoE environment
and it fails.

I'm using the latest cut of the perftest sources from git; for the other
components (libibverbs, libmlx4, kernel, FW and HW), see below. It's a system of
two nodes connected back-to-back, with port1 being IB and port2 being Eth, so
the same perftest code works okay on IB / p1-p1. I have ping working fine over
mlx4_en, so basically things are okay. I think you made a comment a few weeks
ago that perftest should be working now with IBoE, so I wonder what goes wrong here?

client side:
 ib_send_lat -d mlx4_0 -i 2 boo1
 --
 Send Latency Test
  Connection type : RC
  Inline data is used up to 400 bytes message
  Mtu : 1024
  Link type is Ethernet
  Using gid index 0 as source GID
  local address: LID  QPN 0x44004f PSN 0x6b567a
  GID: 254:128:00:00:00:00:00:00:02:02:201:255:254:07:237:03
  remote address: LID  QPN 0x48004f PSN 0x3e78fc
  GID: 254:128:00:00:00:00:00:00:02:02:201:255:254:07:236:243
 --
  #bytes #iterations    t_min[usec]    t_max[usec]    t_typical[usec]
  Completion with error at server
  Failed status 5: wr_id 0 syndrom 0xf4
 rcnt=0

server side
 ib_send_lat -d mlx4_0 -i 2
 --
 Send Latency Test
  Connection type : RC
  Inline data is used up to 400 bytes message
  Mtu : 1024
  Link type is Ethernet
  local address: LID  QPN 0x48004f PSN 0x3e78fc
  GID: 254:128:00:00:00:00:00:00:02:02:201:255:254:07:236:243
  remote address: LID  QPN 0x44004f PSN 0x6b567a
  GID: 254:128:00:00:00:00:00:00:02:02:201:255:254:07:237:03
 --
  #bytes #iterations    t_min[usec]    t_max[usec]    t_typical[usec]


Or.

It's ConnectX-2 on both sides with firmware 2.7.700
  ibv_devinfo
 hca_id: mlx4_0
 transport:  InfiniBand (0)
 fw_ver: 2.7.700
 node_guid:  0002:c903:0007:ed02
 sys_image_guid: 0002:c903:0007:ed05
 vendor_id:  0x02c9
 vendor_part_id: 26428
 hw_ver: 0xB0
 board_id:   MT_0DD0120009
 phys_port_cnt:  2
 port:   1
 state:  PORT_ACTIVE (4)
 max_mtu:2048 (4)
 active_mtu: 2048 (4)
 sm_lid: 12
 port_lid:   9
 port_lmc:   0x00
 link_layer: IB
 
 port:   2
 state:  PORT_ACTIVE (4)
 max_mtu:2048 (4)
 active_mtu: 1024 (3)
 sm_lid: 0
 port_lid:   0
 port_lmc:   0x00
 link_layer: Ethernet


 ofa_kernel:
 git://git.openfabrics.org/ofed_1_5/linux-2.6.git ofed_kernel_1_5
 commit 21556e24411b4e4b0694f70244d4a33a454ddbf5

 libibverbs:
 http://www.openfabrics.org/downloads/libibverbs/libibverbs-1.1.4-0.14.gb6c138b.tar.gz

 libmlx4:
 http://www.openfabrics.org/downloads/libmlx4/libmlx4-1.0-0.13.g4e5c43f.tar.gz


Re: rdma_lat whos

2010-12-09 Thread Or Gerlitz
  local address: LID  QPN 0x44004f PSN 0x6b567a
  GID: 254:128:00:00:00:00:00:00:02:02:201:255:254:07:237:03

Also, it would be much easier to track/debug if the GID octets were printed
in hexadecimal. Can you do that?

Or


[PATCH] Variable multicast and path record queue lengths.

2010-12-09 Thread Aleksey Senin
Allow the user to set the size of the multicast and path record queues.
This should solve the problem where packets are dropped when using a
slow SM.
Currently only 3 packets are stored in the send queue before drops take
place. The queue lengths may be changed at runtime via the mcast_qlen
and prec_qlen files under /sys/module/ib_ipoib/parameters/.

This patch is based on an idea from Christoph Lameter:
http://lists.openfabrics.org/pipermail/general/2009-June/059853.html

The tool for generating multicast traffic can be found at
http://www.gentwo.org/ll.

Signed-off-by: Aleksey Senin aleks...@voltaire.com
---
 drivers/infiniband/ulp/ipoib/ipoib.h   |2 +
 drivers/infiniband/ulp/ipoib/ipoib_main.c  |   91 +++-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |2 +-
 3 files changed, 91 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 753a983..159e29c 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -698,6 +698,8 @@ static inline void ipoib_unregister_debugfs(void) { }
 
 extern int ipoib_sendq_size;
 extern int ipoib_recvq_size;
+extern unsigned int ipoib_prec_qlen;
+extern unsigned int ipoib_mcast_qlen;
 
 extern struct ib_sa_client ipoib_sa_client;
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 9ff7bc7..c07a788 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -69,6 +69,85 @@ module_param(lro_max_aggr, int, 0644);
 MODULE_PARM_DESC(lro_max_aggr, "LRO: Max packets to be aggregated "
 		 "(default = 64)");
 
+unsigned int ipoib_prec_qlen = IPOIB_MAX_PATH_REC_QUEUE;
+unsigned int ipoib_mcast_qlen = IPOIB_MAX_MCAST_QUEUE;
+
+static struct ctl_table_header *ipoib_table_header;
+
+#define MIN_IPOIB_QLENGTH 1
+#define MAX_IPOIB_QLENGTH 256
+
+static unsigned int min_ipoib_qlen  = MIN_IPOIB_QLENGTH;
+static unsigned int max_ipoib_qlen  = MAX_IPOIB_QLENGTH;
+
+static ctl_table ipoib_tunable_table[] = {
+	{
+		.procname	= "prec_qlen",
+		.data		= &ipoib_prec_qlen,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &min_ipoib_qlen,
+		.extra2		= &max_ipoib_qlen
+	},
+	{
+		.procname	= "mcast_qlen",
+		.data		= &ipoib_mcast_qlen,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &min_ipoib_qlen,
+		.extra2		= &max_ipoib_qlen
+	},
+	{},
+};
+
+static ctl_table ipoib_table[] = {
+	{
+		.procname	= "ib_ipoib",
+		.mode		= 0555,
+		.maxlen		= 0,
+		.child		= ipoib_tunable_table
+	},
+	{},
+};
+
+static int param_set_uint_minmax(const char *val,
+				 const struct kernel_param *kp,
+				 unsigned int min, unsigned int max)
+{
+	unsigned long num;
+	int ret;
+
+	if (!val)
+		return -EINVAL;
+	ret = strict_strtoul(val, 0, &num);
+	if (ret == -EINVAL || num < min || num > max)
+		return -EINVAL;
+	*((unsigned int *)kp->arg) = num;
+	return 0;
+}
+
+static int param_set_queue_length(const char *val,
+				  const struct kernel_param *kp)
+{
+	return param_set_uint_minmax(val, kp,
+				     MIN_IPOIB_QLENGTH, MAX_IPOIB_QLENGTH);
+}
+
+static struct kernel_param_ops param_ops_queue_length = {
+   .set = param_set_queue_length,
+   .get = param_get_uint,
+};
+
+#define param_check_queue_length(name, p) \
+   __param_check(name, p, unsigned int);
+
+module_param_named(prec_qlen, ipoib_prec_qlen, queue_length, 0644);
+MODULE_PARM_DESC(prec_qlen, "Path record queue length ([1..256], default = 3)");
+module_param_named(mcast_qlen, ipoib_mcast_qlen, queue_length, 0644);
+MODULE_PARM_DESC(mcast_qlen, "Multicast queue length ([1..256], default = 3)");
+
 #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG
 int ipoib_debug_level;
 
@@ -597,7 +676,7 @@ static void neigh_add_path(struct sk_buff *skb, struct net_device *dev)
 

Re: rdma_lat whos

2010-12-09 Thread Or Gerlitz
Or Gerlitz wrote:
 Ido Shamai wrote:
 I'm trying to run ib_send_lat in IBoE environment and it fails. 

I got this to work now by specifying the ip address associated with the 
relevant mlx4_en network device on the server side, is this documented anywhere?

Or.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NFS-RDMA hangs: connection closed (-103)

2010-12-09 Thread Tom Tucker

On 12/8/10 9:10 AM, Spelic wrote:
Tom, have you reproduced the "RDMA hangs: connection closed" bug, or
the sparse file at the server side upon NFS hitting ENOSPC?


Because for the latter, people have already given an exhaustive
explanation: see this other thread at
http://fossplanet.com/f13/%5Blinux-lvm%5D-bugs-mkfs-xfs-device-mapper-xfs-dev-ram-81653/ 



While the former bug is still open and very interesting for us.

I'm working on the 'former' bug. The bug I think you've run into has to
do with how RDMA transport errors are handled and how RPCs are retried in
the event of an error. With hard mounts (which I suspect you have),
the RPC will be retried forever. In this bug, the transport never
'recovers' after the error, so the RPC never succeeds and the
mount is effectively hung.


There were bugs fixed in this area between 34 and the top of the tree,
which is why you now see the less catastrophic, but still broken, behavior.


Unfortunately I can only support this part-time, but I'll keep you 
updated on the progress.


Thanks for finding this and helping to debug,
Tom


Thanks for your help
S.


On 12/07/2010 05:12 PM, Tom Tucker wrote:

Status update...

I have reproduced the bug a number of different ways. It seems to be 
most easily reproduced by simply writing more data than the 
filesystem has space for. I can do this reliably with any FS. I think 
the XFS bug may have tickled this bug somehow.


Tom

On 12/2/10 1:09 PM, Spelic wrote:

Hello all
Please be aware that the file oversize bug is also reproducible
without InfiniBand, with just NFS over Ethernet over XFS over a
ramdisk (but it doesn't hang, so it's a different bug than the one I
posted here at the RDMA mailing list).
I have posted another thread regarding the file oversize bug,
which you can read in the LVM, XFS, and LKML mailing lists; please
have a look:
http://fossplanet.com/f13/%5Blinux-lvm%5D-bugs-mkfs-xfs-device-mapper-xfs-dev-ram-81653/ 

Especially my second post, replying to myself at +30 minutes, explains
that it's also reproducible with Ethernet.


Thank you

On 12/02/2010 07:37 PM, Roland Dreier wrote:
Adding Dave Chinner to the cc list, since he's both an XFS guru as well
as being very familiar with NFS and RDMA...

Dave, if you read below, it seems there is some strange behavior
exporting XFS with NFS/RDMA.

  - R.

  On 12/02/2010 12:59 AM, Tom Tucker wrote:
   Spelic,
 
   I have seen this problem before, but have not been able to reliably
   reproduce it. When I saw the problem, there were no transport errors
   and it appeared as if the I/O had actually completed, but that the
   waiter was not being awoken. I was not able to reliably reproduce
   the problem and was not able to determine if the problem was a
   latent bug in NFS in general or a bug in the RDMA transport in
   particular.
 
   I will try your setup here, but I don't have a system like yours, so
   I'll have to settle for a smaller ramdisk; however, I have a few
   questions:
 
   - Does the FS matter? For example, if you use ext[2-4] on the
   ramdisk, can you still reproduce it?
   - As I mentioned earlier, NFS v3 vs. NFS v4
   - RAMDISK size, i.e. 2G vs. 14G
 
   Thanks,
   Tom

  Hello Tom, thanks for replying

  - The FS matters to some extent: as I wrote, with ext4 it's not
  possible to reproduce the bug in this way, i.e. immediately and
  reliably; however, ext4 will also hang eventually if you work on it
  for hours, so I had to switch to IPoIB for our real work; reread my
  previous post.

  - NFS3 not tried yet. Never tried to do RDMA on NFS3... do you have a
  pointer to instructions?


  - RAMDISK size: I am testing it.

  Ok I confirm with 1.5GB ramdisk it's reproducible.
  boot option ramdisk_size=1572864
  (1.5*1024**2=1572864.0)
  confirm: blockdev --getsize64 /dev/ram0 == 1610612736

  now at server side mkfs and mount with defaults:
  mkfs.xfs /dev/ram0
  mount /dev/ram0 /mnt/ram
  (this is a simplification over my previous email, and it's needed with
  a smaller ramdisk or mkfs.xfs will refuse to work. The bug is still
  reproducible like this)


  DOH! Another bug:
  It's strange how, at the end of the test,
  ls -lh /mnt/ram
  at the server side will show a zerofile larger than 1.5GB;
  sometimes it's 3GB, sometimes it's 2.3GB... but it's
  larger than the ramdisk size.

  # ll -h /mnt/ram
  total 1.5G
  drwxr-xr-x 2 root root   21 2010-12-02 12:54 ./
  drwxr-xr-x 3 root root 4.0K 2010-11-29 23:51 ../
  -rw-r--r-- 1 root root 2.3G 2010-12-02 12:59 zerofile
  # df -h
  Filesystem            Size  Used Avail Use% Mounted on
  /dev/sda1 294G  4.1G  275G   2% /
  devtmpfs  7.9G  184K  7.9G   1% /dev
  none  7.9G 0  7.9G   0% /dev/shm
  none  7.9G  100K  7.9G   1% /var/run
  none  7.9G 0  7.9G   0% /var/lock
  none  7.9G 0  7.9G   0% /lib/init/rw
  /dev/ram0 1.5G  1.5G   20K 100% /mnt/ram

  # dd 

Re: [PATCH] replace (long*)(long) casting with transportable data type (uintptr_t)

2010-12-09 Thread Sasha Khapyorsky
On 15:18 Tue 07 Dec , Smith, Stan wrote:
 
 We should return to my original patch submission.
 remove the (long*) (long) and replace with (uintptr_t)
 
  -(osmv_query_req_t *) (long *)(long)(p_madw->context.ni_context.
  -node_guid);
  +(osmv_query_req_t *) (uintptr_t) p_madw->context.ni_context.node_guid;
 
 Sasha, can you take care of this?

Done. Thanks.

Sasha
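
As background on why (uintptr_t) is the right cast here (illustrative code,
not taken from the opensm tree): on LLP64 platforms such as 64-bit Windows,
long is only 32 bits while pointers are 64 bits, so round-tripping a pointer
value through (long) can truncate it, whereas uintptr_t is defined to be wide
enough to hold any object pointer. The helper name below is hypothetical.

#include <stdint.h>

typedef struct osmv_query_req osmv_query_req_t;	/* opaque for the example */

/* The MAD context stores the pointer widened to a 64-bit integer;
 * recover it through uintptr_t rather than (long *)(long). */
static osmv_query_req_t *context_to_query_req(uint64_t node_guid)
{
	return (osmv_query_req_t *)(uintptr_t)node_guid;
}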


Re: ibnetdiscover issue

2010-12-09 Thread Sasha Khapyorsky
Hi Tom,

On 16:14 Wed 08 Dec , Tom Ammon wrote:
 
 Is there a quick workaround we could put in place? I want to map out our 
 fabric, and I especially need the spine GUIDs on the GD4200 because I'm 
 going to be doing up/down routing and want to specify the root GUIDs. I 
 can also submit a support case to Voltaire, if you think that would make 
 it go faster. I want to make sure we are using OFED as distributed from OFA.

As far as I can see, ibnetdiscover supports the 4200 device. Could you rerun
ibnetdiscover without the '-g' option and send me the results?

Sasha

 
 Tom
 
 On 12/8/2010 11:28 AM, Hal Rosenstock wrote:
  Hi Tom,
 
  On 12/8/2010 12:48 PM, Tom Ammon wrote:
  Hi,
 
  I get the following when I try to run ibnetdiscover from a server
  plugged in to a voltaire 4036 switch. We're using OFED 1.5.2:
 
  [r...@sm1 ~]# ibnetdiscover
  src/chassis.c:535; Unexpected node found: guid 0x0008f1050075134c
  ibnetdiscover: iberror: failed: discover failed
 
  Looks to me like there's an is_spine_4200() clause missing in
  get_router_slot() in libibnetdisc/src/chassis.c. Eli had added the changes to
  support the 4200, so he's the best one to comment.
 
  -- Hal
 
 
  However, ibdiagnet runs fine:
 
  [r...@sm1 ~]# ibdiagnet
  Loading IBDIAGNET from: /usr/lib64/ibdiagnet1.5.4
  -W- Topology file is not specified.
  Reports regarding cluster links will use direct routes.
  Loading IBDM from: /usr/lib64/ibdm1.5.4
  -I- Using port 1 as the local port.
   -I- Discovering ... 277 nodes (23 Switches & 254 CA-s) discovered.
 
 
  -I---
  -I- Bad Guids/LIDs Info
  -I---
  -I- No bad Guids were found
 
  -I---
  -I- Links With Logical State = INIT
  -I---
  -I- No bad Links (with logical state = INIT) were found
 
  -I---
  -I- General Device Info
  -I---
 
  -I---
  -I- PM Counters Info
  -I---
  -W- lid=0x0007 guid=0x0008f105006515ba dev=23131 Port=33
  Performance Monitor counter : Value
  link_error_recovery_counter : 0xff (overflow)
  -W- lid=0x0010 guid=0x0008f10500201d7c dev=23130 Port=14
  Performance Monitor counter : Value
  symbol_error_counter : 0x (overflow)
  -W- lid=0x0001 guid=0x0008f10500108a76 dev=23130 Port=30
  Performance Monitor counter : Value
  symbol_error_counter : 0x (overflow)
 
  -I---
  -I- Fabric Partitions Report (see ibdiagnet.pkey for a full hosts list)
  -I---
  -I- PKey:0x7fff Hosts:254 full:254 limited:0
 
  -I---
  -I- IPoIB Subnets Check
  -I---
  -I- Subnet: IPv4 PKey:0x7fff QKey:0x0b1b MTU:2048Byte rate:10Gbps
  SL:0x00
 
  -I---
  -I- Bad Links Info
  -I- No bad link were found
  -I---
  
  -I- Stages Status Report:
  STAGE Errors Warnings
  Bad GUIDs/LIDs Check 0 0
  Link State Active Check 0 0
  General Devices Info Report 0 0
  Performance Counters Report 0 3
  Partitions Check 0 0
  IPoIB Subnets Check 0 0
 
  Please see /tmp/ibdiagnet.log for complete log
  
 
  -I- Done. Run time was 21 seconds.
 
  Any ideas?
 
  Tom
 
 
 
 -- 
 Tom Ammon
 Network Engineer
 Office: 801.587.0976
 Mobile: 801.674.9273
 
 Center for High Performance Computing
 University of Utah
 http://www.chpc.utah.edu


Re: [PATCH V2 5/5] RDMA CM: Netlink Client

2010-12-09 Thread Jason Gunthorpe
On Thu, Dec 09, 2010 at 10:47:18AM +0200, Nir Muchtar wrote:

 But I'm using MSG_DONTWAIT when calling netlink_unicast, so attachskb
 shouldn't block. I also tested that.

But then you are guaranteed to have an incomplete dump once you have
enough entries!

The best trade-off is what the other dump_start users do: you might
have an inconsistent dump sometimes, but at least it is complete and
correct most of the time.
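
A rough sketch of that dump pattern, with hypothetical names (cma_id_entry,
cma_id_list, cma_fill_one_entry) rather than the actual RDMA CM patch code:
the netlink core calls the callback repeatedly, and cb->args[0] is used as a
cursor so the dump resumes where the previous skb filled up. That keeps the
dump complete without ever blocking inside sendmsg().

static int cma_stats_dump(struct sk_buff *skb, struct netlink_callback *cb)
{
	struct cma_id_entry *entry;		/* hypothetical list element */
	int idx = 0, start = cb->args[0];

	list_for_each_entry(entry, &cma_id_list, list) {
		if (idx < start)
			goto next;	/* already sent in an earlier pass */
		if (cma_fill_one_entry(skb, entry) < 0)
			break;		/* skb full: resume here on the next call */
next:
		idx++;
	}
	cb->args[0] = idx;
	return skb->len;	/* non-zero: the core will call us again */
}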

Jason


Re: rdma_lat whos

2010-12-09 Thread Jason Gunthorpe
On Thu, Dec 09, 2010 at 12:39:00PM +0200, Or Gerlitz wrote:
   local address: LID  QPN 0x44004f PSN 0x6b567a
   GID: 254:128:00:00:00:00:00:00:02:02:201:255:254:07:237:03
 
 Also, it would be much easier to track/debug if the GID octets were
 printed in hexadecimal. Can you do that?

GIDs should be printed with inet_ntop(AF_INET6)
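
A minimal user-space sketch of that suggestion (the helper name is
illustrative): a GID is 16 bytes with IPv6 layout, so inet_ntop() renders it
in the usual hex-and-colon form.

#include <arpa/inet.h>
#include <stdio.h>
#include <infiniband/verbs.h>

static void print_gid(const union ibv_gid *gid)
{
	char buf[INET6_ADDRSTRLEN];

	/* gid->raw is the 16-byte GID in network byte order. */
	if (inet_ntop(AF_INET6, gid->raw, buf, sizeof(buf)))
		printf(" GID: %s\n", buf);
}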

Jason


rdma_cm resource over use

2010-12-09 Thread Eli Cohen
Hi,
the rdma_cm library opens all available devices and keeps them open
even after binding to a specific device/port. This eats resources from
other devices that would otherwise be available to other applications.
Is there a way to avoid this? If not, maybe we should close all the other
devices after binding?


RE: rdma_cm resource over use

2010-12-09 Thread Hefty, Sean
 the rdma_cm library opens all available devices and keeps them open
 even after binding to a specific device/port. This eats resources from
 other devices that would otherwise be available to other applications.
 Is there a way to avoid this? If not, maybe we should close all the other
 devices after binding?

The librdmacm documents that all RDMA devices remain open while the librdmacm 
is loaded (see rdma_get_devices).  Also, a listen doesn't need to be bound to 
any specific device, but the connection request will be.  Personally, I really 
don't see this as a major use of resources.
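
A minimal sketch of the documented behavior being referred to (it uses only
librdmacm's public API, not its internals): rdma_get_devices() returns the
verbs contexts the library already holds open, and rdma_free_devices()
releases only the returned array, not the devices themselves.

#include <stdio.h>
#include <rdma/rdma_cma.h>

int main(void)
{
	int i, num;
	struct ibv_context **list = rdma_get_devices(&num);

	if (!list)
		return 1;
	for (i = 0; i < num; i++)
		printf("librdmacm holds open: %s\n",
		       ibv_get_device_name(list[i]->device));
	rdma_free_devices(list);
	return 0;
}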

- Sean




Re: rdma_cm resource over use

2010-12-09 Thread Eli Cohen
For ConnectX, mlx4 will consume a UAR page for each open of the libibverbs
device file. On 4K-page architectures we usually have 1024 UAR
pages; with a 64K page size the situation becomes worse. Suppose you
have a system with more than one device, where one of the devices is used by
rdma_cm and the other is not. You can easily exhaust all the
resources. We actually have this situation at a customer.

On Thu, Dec 9, 2010 at 8:30 PM, Hefty, Sean sean.he...@intel.com wrote:
 the rdma_cm library opens all available devices and keeps them open
 even after binding to a specific device/port. This eats resources from
 other devices that would otherwise be available to other applications.
 Is there a way to avoid this? If not, maybe we should close all the other
 devices after binding?

 The librdmacm documents that all RDMA devices remain open while the librdmacm 
 is loaded (see rdma_get_devices).  Also, a listen doesn't need to be bound to 
 any specific device, but the connection request will be.  Personally, I 
 really don't see this as a major use of resources.

 - Sean




Re: [ewg] IPoIB to Ethernet routing performance

2010-12-09 Thread Christoph Lameter
On Mon, 6 Dec 2010, sebastien dugue wrote:

  The Mellanox BridgeX looks a better hardware solution with 12x 10Ge
  ports but when I tested this they could only provide vNIC
  functionality and would not commit to adding IPoIB gateway on their
  roadmap.

   Right, we did some evaluation on it and this was really a show stopper.

Did the same thing here and came to the same conclusions.

  Qlogic also offer the 12400 Gateway.  This has 6x 10ge ports.
  However, like the Mellanox, I understand they only provide host vNIC
  support.

Really? I was hoping that they would have something worth looking at.


[patch] IB: handle -ENOMEM in forward_trap()

2010-12-09 Thread Dan Carpenter
ib_create_send_mad() can return ERR_PTR(-ENOMEM) here.

Signed-off-by: Dan Carpenter erro...@gmail.com

diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c
index c9a8dd6..a1add16 100644
--- a/drivers/infiniband/hw/mlx4/mad.c
+++ b/drivers/infiniband/hw/mlx4/mad.c
@@ -211,6 +211,9 @@ static void forward_trap(struct mlx4_ib_dev *dev, u8 port_num, struct ib_mad *mad)
if (agent) {
send_buf = ib_create_send_mad(agent, qpn, 0, 0, IB_MGMT_MAD_HDR,
  IB_MGMT_MAD_DATA, GFP_ATOMIC);
+
+   if (IS_ERR(send_buf))
+   return;
/*
 * We rely here on the fact that MLX QPs don't use the
 * address handle after the send is posted (this is
diff --git a/drivers/infiniband/hw/mthca/mthca_mad.c b/drivers/infiniband/hw/mthca/mthca_mad.c
index 5648659..03a59534 100644
--- a/drivers/infiniband/hw/mthca/mthca_mad.c
+++ b/drivers/infiniband/hw/mthca/mthca_mad.c
@@ -171,6 +171,8 @@ static void forward_trap(struct mthca_dev *dev,
if (agent) {
send_buf = ib_create_send_mad(agent, qpn, 0, 0, IB_MGMT_MAD_HDR,
  IB_MGMT_MAD_DATA, GFP_ATOMIC);
+   if (IS_ERR(send_buf))
+   return;
/*
 * We rely here on the fact that MLX QPs don't use the
 * address handle after the send is posted (this is