Re: [PATCH 20/22] IB/iser: Support up to 8MB data transfer in a single command

2015-08-03 Thread Atchley, Scott
On Aug 2, 2015, at 4:01 AM, Sagi Grimberg sa...@dev.mellanox.co.il wrote:

 +static void
 +iser_calc_scsi_params(struct iser_conn *iser_conn,
 +  unsigned int max_sectors)
 +{
 +struct iser_device *device = iser_conn->ib_conn.device;
 +unsigned short sg_tablesize, sup_sg_tablesize;
 +
 +sg_tablesize = DIV_ROUND_UP(max_sectors * 512, SIZE_4K);
 +sup_sg_tablesize = min_t(unsigned, ISCSI_ISER_MAX_SG_TABLESIZE,
 + device->dev_attr.max_fast_reg_page_list_len);
 +
 +if (sg_tablesize > sup_sg_tablesize) {
 +sg_tablesize = sup_sg_tablesize;
 +iser_conn->scsi_max_sectors = sg_tablesize * SIZE_4K / 512;
 +} else {
 +iser_conn->scsi_max_sectors = max_sectors;
 +}
 +
 
 Why SIZE_4K and not PAGE_SIZE?
 
 Yes, I'll change that to PAGE_SIZE.
 
 Thanks.

Would non-4KB pages (e.g. PPC 64KB) be an issue? Would this work between hosts 
with different page sizes?
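
For what it's worth, here is how the arithmetic above works out on the two page
sizes, assuming max_sectors = 16384 (the 8 MB case in the subject line); the
numbers are my own illustration, not from the patch:

/*
 * PAGE_SIZE = 4 KB:  sg_tablesize = DIV_ROUND_UP(16384 * 512, 4096)  = 2048
 * PAGE_SIZE = 64 KB: sg_tablesize = DIV_ROUND_UP(16384 * 512, 65536) = 128
 *
 * If sg_tablesize is clamped to sup_sg_tablesize, the back-calculation
 * scsi_max_sectors = sg_tablesize * PAGE_SIZE / 512 keeps the byte count
 * consistent for whatever page size the initiator happens to have.
 */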

Scott



Re: [PATCH v2 1/3] scsi_cmnd: Introduce scsi_transfer_length helper

2014-06-26 Thread Atchley, Scott
On Jun 26, 2014, at 10:55 AM, James Bottomley 
james.bottom...@hansenpartnership.com wrote:

 On Thu, 2014-06-26 at 16:53 +0200, Bart Van Assche wrote:
 On 06/11/14 11:09, Sagi Grimberg wrote:
 +   return xfer_len + (xfer_len >> ilog2(sector_size)) * 8;
 
 Sorry that I just noticed this now, but why is a shift-right and ilog2()
 used in the above expression instead of just dividing the transfer
 length by the sector size ?
 
 It's a performance thing.  Division is really slow on most CPUs.
 However, we know the divisor is a power of two so we re-express the
 division as a shift, which the processor can do really fast.
 
 James

I have done this in the past as well, but have you benchmarked it? Compilers
typically do the right thing in this case (i.e. replace division with shift).
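
To make the comparison concrete, a minimal sketch of the two forms in question
(my illustration, not kernel code; ilog2() is the kernel helper and sector_size
is assumed to be a power of two at runtime):

static unsigned int pi_transfer_length(unsigned int xfer_len,
                                       unsigned int sector_size)
{
        /* shift form, valid only because sector_size is a power of two;
         * equivalent to xfer_len / sector_size */
        unsigned int blocks = xfer_len >> ilog2(sector_size);

        return xfer_len + blocks * 8;   /* 8 bytes of protection info per block */
}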

Scott


Re: [PATCH v2 1/3] scsi_cmnd: Introduce scsi_transfer_length helper

2014-06-26 Thread Atchley, Scott
On Jun 26, 2014, at 12:38 PM, James Bottomley 
james.bottom...@hansenpartnership.com wrote:

 On June 26, 2014 11:41:48 AM EDT, Atchley, Scott atchle...@ornl.gov wrote:
 On Jun 26, 2014, at 10:55 AM, James Bottomley
 james.bottom...@hansenpartnership.com wrote:
 
 On Thu, 2014-06-26 at 16:53 +0200, Bart Van Assche wrote:
 On 06/11/14 11:09, Sagi Grimberg wrote:
 + return xfer_len + (xfer_len >> ilog2(sector_size)) * 8;
 
 Sorry that I just noticed this now, but why is a shift-right and
 ilog2()
 used in the above expression instead of just dividing the transfer
 length by the sector size ?
 
 It's a performance thing.  Division is really slow on most CPUs.
 However, we know the divisor is a power of two so we re-express the
 division as a shift, which the processor can do really fast.
 
 James
 
 I have done this in the past as well, but have you benchmarked it?
 Compilers typically do the right thing in this case (i.e replace
 division with shift).
 
 The compiler can only do that for values which are reducible to constants at 
 compile time. This is a runtime value, the compiler has no way of deducing 
 that it will be a power of 2
 
 James

You're right, I should have said runtime.

However, gcc on Intel seems to choose the right algorithm at runtime. On a
trivial app with -O0, I see the same performance for shift and division if the
divisor is a power of two. I see a ~38% penalty if the divisor is not a power of
two. With -O3, shift is faster than division by about 17% when the divisor is a
power of two.
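
For reference, a loop along these lines is enough to reproduce the comparison
(a sketch, not necessarily the exact program; volatile keeps the divisor a
runtime value so the compiler cannot constant-fold the division):

#include <stdint.h>
#include <stdio.h>
#include <time.h>

static long elapsed_ms(struct timespec a, struct timespec b)
{
        return (b.tv_sec - a.tv_sec) * 1000 + (b.tv_nsec - a.tv_nsec) / 1000000;
}

int main(void)
{
        volatile uint32_t divisor = 512;        /* runtime power of two */
        uint32_t shift = __builtin_ctz(divisor);
        uint64_t sum = 0, i;
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < 100000000ULL; i++)
                sum += i / divisor;             /* division form */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("div:   %ld ms\n", elapsed_ms(t0, t1));

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < 100000000ULL; i++)
                sum += i >> shift;              /* shift form */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("shift: %ld ms (checksum %llu)\n", elapsed_ms(t0, t1),
               (unsigned long long)sum);

        return 0;
}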

Scott


Re: Slow performance with librspreload.so

2013-08-30 Thread Atchley, Scott
On Aug 30, 2013, at 1:38 PM, Hefty, Sean sean.he...@intel.com wrote:

 Another strange issue:
 
 $ sudo LD_PRELOAD=/usr/local/lib/rsocket/librspreload.so iperf -c
 172.17.0.2
 
 Client connecting to 172.17.0.2, TCP port 5001
 TCP window size:  128 KByte (default)
 
 Increasing the window size may improve the results.  E.g. on my systems I go 
 from 17.7 Gbps at 128 KB to 24.3 Gbps for 512 KB.
 
 
 [  3] local 172.17.0.1 port 57926 connected with 172.17.0.2 port 5001
 [ ID] Interval   Transfer Bandwidth
 [  3]  0.0-10.0 sec  12.2 GBytes  10.4 Gbits/sec
 
 $ iperf -c 172.17.0.2
 
 Client connecting to 172.17.0.2, TCP port 5001
 TCP window size:  648 KByte (default)
 
 [  3] local 172.17.0.1 port 58113 connected with 172.17.0.2 port 5001
 [ ID] Interval   Transfer Bandwidth
 [  3]  0.0-10.0 sec  14.5 GBytes  12.5 Gbits/sec
 
 rsocket slower than IPoIB ?
 
 This is surprising to me - just getting 12.5 Gbps out of ipoib is surprising. 
  Does iperf use sendfile()?

I have a pair of nodes connected by QDR via a switch. Using normal IPoIB, a 
single Netperf can reach 18.4 Gb/s if I bind to the same core that the IRQ 
handler is bound to. With four concurrent Netperfs, I can reach 23 Gb/s. This 
is in datagram mode. Connected mode is slower.
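
For reference, the binding is nothing more exotic than netperf's -T option plus
checking which core services the HCA's interrupts, roughly like this (the IRQ
number and core are placeholders for whatever your system reports):

$ grep mlx4 /proc/interrupts             # find the HCA's IRQ number(s)
$ cat /proc/irq/<irq>/smp_affinity_list  # see which core handles it
$ netperf -H <server> -t TCP_STREAM -l 30 -T <core>,<core>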

I have not tried rsockets on these nodes.

Scott


 
 My results with iperf (version 2.0.5) over ipoib (default configurations) 
 vary considerably based on the TCP window size.  (Note that this is a 40 Gbps 
 link.)  Results summarized:
 
 TCP window size: 27.9 KByte (default)
 [  3]  0.0-10.0 sec  12.8 GBytes  11.0 Gbits/sec
 
 TCP window size:  416 KByte (WARNING: requested  500 KByte)
 [  3]  0.0-10.0 sec  8.19 GBytes  7.03 Gbits/sec
 
 TCP window size:  250 KByte (WARNING: requested  125 KByte)
 [  3]  0.0-10.0 sec  4.99 GBytes  4.29 Gbits/sec
 
 I'm guessing that there are some settings I can change to increase the ipoib 
 performance on my systems.  Using rspreload, I get:
 
 LD_PRELOAD=/usr/local/lib/rsocket/librspreload.so iperf -c 192.168.0.103
 TCP window size:  512 KByte (default)
 [  3]  0.0-10.0 sec  28.3 GBytes  24.3 Gbits/sec
 
 It seems that ipoib bandwidth should be close to rsockets, similar to what 
 you see.  I also don't understand the effect that the TCP window size is 
 having on the results.  The smallest window gives the best bandwidth for 
 ipoib?!
 
 - Sean



Re: [ceph-users] Help needed porting Ceph to RSockets

2013-08-14 Thread Atchley, Scott
On Aug 14, 2013, at 3:21 AM, Andreas Bluemle andreas.blue...@itxperts.de 
wrote:

 Hi,
 
 maybe some information about the environment I am
 working in:
 
 - CentOS 6.4 with custom kernel 3.8.13
 - librdmacm / librspreload from git, tag 1.0.17
 - application started with librspreload in LD_PRELOAD environment
 
 Currently, I have increased the value of the spin time by setting the
 default value for polling_time in the source code.
 
 I guess that the correct way to do this is via
 configuration in /etc/rdma/rsocket/polling_time?
 
 Concerning the rpoll() itself, some more comments/questions
 embedded below.
 
 On Tue, 13 Aug 2013 21:44:42 +
 Hefty, Sean sean.he...@intel.com wrote:
 
 I found a workaround for my (our) problem: in the librdmacm
 code, rsocket.c, there is a global constant polling_time, which
 is set to 10 microseconds at the moment.
 
  I raise this to 10000 - and all of a sudden things work nicely.
 
 I am adding the linux-rdma list to CC so Sean might see this.
 
 If I understand what you are describing, the caller to rpoll()
 spins for up to 10 ms (10,000 us) before calling the real poll().
 
 What is the purpose of the real poll() call? Is it simply a means
 to block the caller and avoid spinning? Or does it actually expect
 to detect an event?
 
 When the real poll() is called, an event is expected on an fd
 associated with the CQ's completion channel. 
 
 The first question I would have is: why is the rpoll() split into
 these two pieces? There must have been some reason to do a busy
 loop on some local state information rather than just call the
 real poll() directly.

Sean can answer specifically, but this is a typical HPC technique. The worst 
thing you can do is handle an event and then block when the next event is 
available. This adds 1-3 us to latency which is unacceptable in HPC. In HPC, we 
poll. If we worry about power, we poll until we get no more events and then we 
poll a little more before blocking. Determining the little more is the fun 
part. ;-) 
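
As a sketch of that pattern (illustrative only; try_poll() stands in for
whatever the cheap check is, e.g. ibv_poll_cq() on the CQ, and fd would be the
completion channel's fd):

#include <poll.h>
#include <stdint.h>
#include <time.h>

#define SPIN_BUDGET_US 10000            /* the "little more" */

static uint64_t now_us(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* try_poll() returns > 0 if it handled events, 0 if nothing was pending */
static int progress(int fd, int (*try_poll)(void), int timeout_ms)
{
        uint64_t deadline = now_us() + SPIN_BUDGET_US;
        struct pollfd pfd = { .fd = fd, .events = POLLIN };

        do {
                if (try_poll() > 0)
                        return 1;       /* still busy, stay in polling mode */
        } while (now_us() < deadline);

        /* quiet long enough: block until the channel's fd fires */
        return poll(&pfd, 1, timeout_ms) > 0 ? try_poll() : 0;
}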

Scott


Re: [ceph-users] Help needed porting Ceph to RSockets

2013-08-13 Thread Atchley, Scott
On Aug 13, 2013, at 10:06 AM, Andreas Bluemle andreas.blue...@itxperts.de 
wrote:

 Hi Matthew,
 
 I found a workaround for my (our) problem: in the librdmacm
 code, rsocket.c, there is a global constant polling_time, which
 is set to 10 microseconds at the moment.
 
  I raise this to 10000 - and all of a sudden things work nicely.

I am adding the linux-rdma list to CC so Sean might see this.

If I understand what you are describing, the caller to rpoll() spins for up to 
10 ms (10,000 us) before calling the real poll().

What is the purpose of the real poll() call? Is it simply a means to block the 
caller and avoid spinning? Or does it actually expect to detect an event?

 I think we are looking at two issues here:
 1. the thread structure of ceph messenger
   For a given socket connection, there are 3 threads of interest
   here: the main messenger thread, the Pipe::reader and the
   Pipe::writer.
 
   For a ceph client like the ceph admin command, I see the following
   sequence
 - the connection to the ceph monitor is created by the
   main messenger  thread, the Pipe::reader and
   Pipe::writer are instantiated.
 - the requested command is sent to the ceph monitor, the
   answer is read and printed
 - at this point the Pipe::reader already has called
   tcp_read_wait(), polling for more data or connection termination
 - after the response had been printed, the main loop calls the
   shutdown routines which in in turn shutdown() the socket
 
There is some time between the last two steps - and this gap is
long enough to open a race:
 
 2. rpoll, ibv and poll
   the rpoll implementation in rsockets is split in 2 phases:
   - a busy loop which checks the state of the underlying ibv queue pair
   - the call to real poll() system call (i.e. the uverbs(?)
 implementation of poll() inside the kernel)
 
   The busy loop has a maximum duration of polling_time (10 microseconds
   by default) - and is able detect the local shutdown and returns a
   POLLHUP.
 
   The poll() system call (i.e. the uverbs implementation of poll() 
   in the kernel) does not detect the local shutdown - and only returns
   after the caller supplied timeout expires.
 
  Increasing the rsockets polling_time from 10 to 10000 microseconds
  results in rpoll detecting the local shutdown within the busy loop.
 
 Decreasing the ceph ms tcp read timeout from the default of 900 to 5
 seconds serves a similar purpose, but is much coarser.
 
 From my understanding, the real issue is neither at the ceph nor at the
 rsockets level: it is related to the uverbs kernel module.
 
  An alternative way to address the current problem at the rsockets level
  would be a re-write of rpoll(): instead of the busy loop at the
  beginning followed by the real poll() call with the full user-specified
  timeout value (ms tcp read timeout in our case), I would embed the real
  poll() into a loop, splitting the user-specified timeout into smaller
  portions and doing the rsockets-specific rs_poll_check() on every
  timeout of the real poll().

I have not looked at the rsocket code, so take the following with a grain of 
salt. If the purpose of the real poll() is to simply block the user for a 
specified time, then you can simply make it a short duration (taking into 
consideration what granularity the OS provides) and then call ibv_poll_cq(). 
Keep in mind, polling will prevent your CPU from reducing power.

If the real poll() is actually checking for something (e.g. checking on the 
RDMA channel's fd or the IB channel's fd), then you may not want to spin too 
much.
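
For concreteness, a sketch of the restructuring Andreas describes above (this
is not librdmacm code; the rs_poll_check() signature is assumed):

#include <poll.h>

int rs_poll_check(struct pollfd *fds, nfds_t nfds);   /* rsockets internal, assumed */

/* Chunk the caller's timeout and re-run the rsockets-side check after each
 * short real poll(), so a local shutdown is noticed within one slice rather
 * than only after the full ms-tcp-read-timeout. */
int rpoll_sliced(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
        const int slice_ms = 10;
        int waited = 0;

        for (;;) {
                int ret = rs_poll_check(fds, nfds);   /* local/rsocket state */
                if (ret)
                        return ret;     /* e.g. POLLHUP on local shutdown */

                ret = poll(fds, nfds, slice_ms);      /* the real poll() */
                if (ret != 0)
                        return ret;     /* fd event or error */

                waited += slice_ms;
                if (timeout_ms >= 0 && waited >= timeout_ms)
                        return 0;       /* caller's timeout expired */
        }
}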

Scott

 Best Regards
 
 Andreas Bluemle
 
 
 On Tue, 13 Aug 2013 07:53:12 +0200
 Andreas Bluemle andreas.blue...@itxperts.de wrote:
 
 Hi Matthew,
 
  I can confirm the behaviour which you describe.
 I too believe that the problem is on the client side (ceph command).
 My log files show the very same symptom, i.e. the client side
 not being able to shutdown the pipes properly.
 
  (Q: I had problems yesterday sending a mail to the ceph-users list
  with the log files attached because the size of the attachments
  exceeded some limit; I hadn't been subscribed to the list at that
  point. Is the use of pastebin.com the better way to provide such
  lengthy information in general?)
 
 
 Best Regards
 
 Andreas Bluemle
 
 On Tue, 13 Aug 2013 11:59:36 +0800
 Matthew Anderson manderson8...@gmail.com wrote:
 
 Moving this conversation to ceph-devel where the dev's might be able
 to shed some light on this.
 
 I've added some additional debug to my code to narrow the issue down
 a bit and the reader thread appears to be getting locked by
 tcp_read_wait() because rpoll never returns an event when the socket
 is shutdown. A hack way of proving this was to lower the timeout in
 rpoll to 5 seconds. When command like 'ceph osd tree' completes you
 can see it block for 5 seconds until rpoll times out and returns 0.
 The reader thread is then able to join and the 

Re: [PATCH V2] libibverbs: Allow arbitrary int values for MTU

2013-07-30 Thread Atchley, Scott
On Jul 30, 2013, at 2:31 PM, Christoph Lameter c...@gentwo.org wrote:

 On Tue, 30 Jul 2013, Jeff Squyres (jsquyres) wrote:
 
 On Jul 30, 2013, at 12:44 PM, Christoph Lameter c...@gentwo.org wrote:
 
 What in the world does that mean? I am an oldtimer I guess. Seems that
 this is something that can be done in the newfangled forum? How does this
 affect mailing lists?
 
 
 I'm not sure what you're asking me; please see the prior posts on this
 thread that describes the MTU issue and why we still need a solution.
 
 What does bump mean? You keep sending replies that just says bump.

http://en.wikipedia.org/wiki/Internet_forum#Thread

When a member posts in a thread it will jump to the top since it is the latest 
updated thread. Similarly, other threads will jump in front of it when they 
receive posts. When a member posts in a thread for no reason but to have it go 
to the top, it is referred to as a bump or bumping.

He is trying to bring it back to everyone's attention.

Scott



Re: NFS over RDMA benchmark

2013-04-18 Thread Atchley, Scott
On Apr 18, 2013, at 3:15 PM, Wendy Cheng s.wendy.ch...@gmail.com wrote:

 On Thu, Apr 18, 2013 at 10:50 AM, Spencer Shepler
 spencer.shep...@gmail.com wrote:
 
 Note that SPEC SFS does not support RDMA.
 
 
 IIRC, the benchmark comes with source code - wondering anyone has
 modified it to run on RDMA ?  Or is there any real user to share the
 experience ?

I am not familiar with SpecSFS, but if it exercises the filesystem, it does not
know which RPC layer NFS uses, no? Or does it implement its own client and
directly access the RPC layer?

 
 -- Wendy
 
 
 From: Wendy Cheng
 Sent: 4/18/2013 9:16 AM
 To: Yan Burman
 Cc: Atchley, Scott; J. Bruce Fields; Tom Tucker; linux-rdma@vger.kernel.org;
 linux-...@vger.kernel.org; Or Gerlitz
 
 Subject: Re: NFS over RDMA benchmark
 
 On Thu, Apr 18, 2013 at 5:47 AM, Yan Burman y...@mellanox.com wrote:
 
 
 What do you suggest for benchmarking NFS?
 
 
 I believe SPECsfs has been widely used by NFS (server) vendors to
 position their product lines. Its workload was based on a real life
 NFS deployment. I think it is more torward office type of workload
 (large client/user count with smaller file sizes e.g. software
 development with build, compile, etc).
 
 BTW, we're experimenting a similar project and would be interested to
 know your findings.
 
 -- Wendy



Re: NFS over RDMA benchmark

2013-04-17 Thread Atchley, Scott
On Apr 17, 2013, at 1:15 PM, Wendy Cheng s.wendy.ch...@gmail.com wrote:

 On Wed, Apr 17, 2013 at 7:36 AM, Yan Burman y...@mellanox.com wrote:
 Hi.
 
 I've been trying to do some benchmarks for NFS over RDMA and I seem to only 
 get about half of the bandwidth that the HW can give me.
 My setup consists of 2 servers each with 16 cores, 32Gb of memory, and 
 Mellanox ConnectX3 QDR card over PCI-e gen3.
 These servers are connected to a QDR IB switch. The backing storage on the 
 server is tmpfs mounted with noatime.
 I am running kernel 3.5.7.
 
 When running ib_send_bw, I get 4.3-4.5 GB/sec for block sizes 4-512K.
 When I run fio over rdma mounted nfs, I get 260-2200MB/sec for the same 
 block sizes (4-512K). running over IPoIB-CM, I get 200-980MB/sec.

Yan,

Are you trying to optimize single client performance or server performance with 
multiple clients?


 Remember there are always gaps between wire speed (that ib_send_bw
 measures) and real world applications.
 
 That being said, does your server use default export (sync) option ?
 Export the share with async option can bring you closer to wire
 speed. However, the practice (async) is generally not recommended in a
 real production system - as it can cause data integrity issues, e.g.
 you have more chances to lose data when the boxes crash.
 
 -- Wendy


Wendy,

It has been a few years since I looked at RPCRDMA, but I seem to remember
that RPCs were limited to 32 KB, which means that you have to pipeline them to
get line rate. In addition to requiring pipelining, the argument from the
authors was that the goal was to maximize server performance and not single
client performance.

Scott



Re: Sharing MR Between Multiple Connections

2012-11-14 Thread Atchley, Scott
On Nov 13, 2012, at 11:36 PM, Christopher Mitchell christop...@cemetech.net 
wrote:

 Hi,
 
 I am working on building an Infiniband application with a server that
 can handle many simultaneous clients. The server exposes a chunk of
 memory that each of the clients can read via RDMA. I was previously
 creating a new MR on the server for each client (and of course in that
 connection's PD). However, under stress testing, I realized that
 ibv_reg_mr() started failing after I simultaneously MRed the same area
 enough times to cover 20.0 GB. I presume that the problem is reaching
 some pinning limit, although ulimit reports unlimited for all
 relevant possibilities. I tried creating a single global PD and a
 single MR to be shared among the multiple connections, but
 rdma_create_qp() fails with an invalid argument when I try to do that.
 I therefore deduce that the PD specified in rdma_create_qp() must
 correspond to an active connection, not simply be created by opening a
 device.
 
 Long question short: is there any way I can share the same MR among
 multiple clients, so that my shared memory region is limited to N
 bytes instead of N/C (clients) bytes?

Christopher,

Yes, it is possible. You have to use the same PD for all QPs/connections. We do 
this in CCI when using the Verbs transport.
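
A minimal sketch of what that looks like with librdmacm + libibverbs (error
handling omitted; everything except the verbs/rdmacm calls is illustrative):

#include <rdma/rdma_cma.h>

static struct ibv_pd *shared_pd;
static struct ibv_mr *shared_mr;

/* called once, with the device context of the listening/resolved cm_id */
void server_init(struct ibv_context *verbs, void *buf, size_t len)
{
        shared_pd = ibv_alloc_pd(verbs);                /* one PD for every client */
        shared_mr = ibv_reg_mr(shared_pd, buf, len,
                               IBV_ACCESS_LOCAL_WRITE |
                               IBV_ACCESS_REMOTE_READ); /* register the region once */
}

/* called for each RDMA_CM_EVENT_CONNECT_REQUEST */
int on_connect_request(struct rdma_cm_id *id, struct ibv_qp_init_attr *attr)
{
        /* id->verbs must be the same context shared_pd was allocated from */
        return rdma_create_qp(id, shared_pd, attr);     /* reuse the shared PD */
}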

Scott


Re: how to preserve QP over HA events for librdmacm applications

2012-09-20 Thread Atchley, Scott
On Sep 20, 2012, at 1:37 PM, Pradeep Satyanarayana 
prade...@linux.vnet.ibm.com wrote:

 On 09/19/2012 11:14 AM, Atchley, Scott wrote:
 On Sep 19, 2012, at 1:05 PM, Hefty, Seansean.he...@intel.com  wrote:
 
 I too would be interested in bringing a QP from error back to a usable 
 state. I
 have been debating whether to reconnect using the current RDMA calls versus
 trying to transition the existing RC QP.
 
 I assumed to transition the existing QP that I would need to open a socket 
 to
 coordinate the two sides. Is that correct?
 
 If I were instead to use rdma_connect(), does it require a new CM id or 
 just a
 new QP within the same id?
 
 What if you say pre-created a second (fail over) QP for HA purposes all 
 under the covers of a single socket? And both QPs were connected before 
 the failure. Not sure if that would work with the same CM id though. If 
 not, we will need to rdma_connect() the second QP after failure.
 
 By having a second QP and bound to say a different port/device, one 
 could survive not just link up/down events, but device failures too. 
 Would that be more generic?

Hi Pradeep,

What is the memory cost of a QP? I assume it will require a second CM id as 
well.

Involving a second device and/or port is not an option for my usage.

Scott


Re: how to preserve QP over HA events for librdmacm applications

2012-09-19 Thread Atchley, Scott
On Sep 19, 2012, at 11:58 AM, Alex Rosenbaum al...@mellanox.com wrote:

 Since we use the RDMA_PS_IPOIB we need librdmacm to help get the correct 
 pkey_index and qkey (in INIT-RTR transition) to match IPoIB's UD QP own 
 values. If not, than our user space UD QP will not be able to send/recv from 
 IPoIB on remote machines (which is what we want to gain by using the IPOIB 
 port space).
 
 Maybe we can save the values used from the rdma_create_qp and reuse them once 
 modify the UD QP state by libverbs (ibv_modify_qp).
 It would be nice if we had access to the rdma's modify qp wrapper to do this 
 nicely from application level.

I too would be interested in bringing a QP from error back to a usable state. I 
have been debating whether to reconnect using the current RDMA calls versus 
trying to transition the existing RC QP.

I assumed to transition the existing QP that I would need to open a socket to 
coordinate the two sides. Is that correct?

If I were instead to use rdma_connect(), does it require a new CM id or just a 
new QP within the same id?
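
For context, the verbs-level piece of transitioning an existing QP is just
ibv_modify_qp(); a minimal sketch of the first step (the INIT/RTR/RTS re-arm
afterwards still needs the out-of-band exchange I am asking about):

/* kick a QP that has gone to error back to RESET; the usual
 * RESET -> INIT -> RTR -> RTS sequence must then be redone with
 * freshly exchanged attributes */
int qp_error_to_reset(struct ibv_qp *qp)
{
        struct ibv_qp_attr attr = { .qp_state = IBV_QPS_RESET };

        return ibv_modify_qp(qp, &attr, IBV_QP_STATE);
}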

Thanks,

Scott


 -Original Message-
 From: Or Gerlitz 
 Sent: Wednesday, September 19, 2012 6:53 PM
 To: Alex Rosenbaum
 Cc: Hefty, Sean; linux-rdma (linux-rdma@vger.kernel.org)
 Subject: Re: how to preserve QP over HA events for librdmacm applications
 
 On 19/09/2012 18:48, Hefty, Sean wrote:
 Can this flushing be somehow done with the current 
 librdmacm/libibverbs APIs or we need some enhancement?
 You can call verbs directly to transition the QP state.  That leaves the CM 
 state unchanged, which doesn't really matter for UD QPs anyway.
 
 
 
 Alex,
 
 Any reason we can't deploy this hack? is that for the IPoIB port space it 
 would require copying some low level code from librdmacm or even from the 
 kernel? e.g the IPoIB qkey, etc.
 
 Or.



Re: how to preserve QP over HA events for librdmacm applications

2012-09-19 Thread Atchley, Scott
On Sep 19, 2012, at 1:05 PM, Hefty, Sean sean.he...@intel.com wrote:

 I too would be interested in bringing a QP from error back to a usable 
 state. I
 have been debating whether to reconnect using the current RDMA calls versus
 trying to transition the existing RC QP.
 
 I assumed to transition the existing QP that I would need to open a socket to
 coordinate the two sides. Is that correct?
 
 If I were instead to use rdma_connect(), does it require a new CM id or just 
 a
 new QP within the same id?
 
 What do you gain by transitioning an RC QP from error to RTS, versus just 
 establishing a new connection?

I have a certain amount of state regarding a peer. I look up that state based on
the qp_num returned within a work completion, for example. If I reconnect, I
will need to migrate the state from the old qp_num to the new qp_num.

I have no preference which is why I asked about the two options (opening a 
socket to coordinate state transitions versus connecting with a new QP).

Scott


Re: how to preserve QP over HA events for librdmacm applications

2012-09-19 Thread Atchley, Scott
On Sep 19, 2012, at 2:14 PM, Atchley, Scott atchle...@ornl.gov wrote:

 On Sep 19, 2012, at 1:05 PM, Hefty, Sean sean.he...@intel.com wrote:
 
 I too would be interested in bringing a QP from error back to a usable 
 state. I
 have been debating whether to reconnect using the current RDMA calls versus
 trying to transition the existing RC QP.
 
 I assumed to transition the existing QP that I would need to open a socket 
 to
 coordinate the two sides. Is that correct?
 
 If I were instead to use rdma_connect(), does it require a new CM id or 
 just a
 new QP within the same id?
 
 What do you gain by transitioning an RC QP from error to RTS, versus just 
 establishing a new connection?
 
 I have a certain amount of state regarding a peer. I lookup that state based 
 on the qp_num returned within a work completion, for example. If I reconnect, 
 I will need to migrate the state from the old qp_num to the new qp_num.
 
 I have no preference which is why I asked about the two options (opening a 
 socket to coordinate state transitions versus connecting with a new QP).

I don't know if it matters to the conversation or not, but I use an SRQ. I am 
unclear how to remove a QP from the SRQ. Is ibv_destroy_qp() sufficient? Or do 
I need to use rdma_destroy_qp()?

I basically use the rdma_* calls for connection setup. After that, I use only
ibv_* calls for communication (Send/Recv and RDMA).

Scott


Re: IPoIB performance

2012-09-05 Thread Atchley, Scott
On Sep 5, 2012, at 11:51 AM, Christoph Lameter wrote:

 On Wed, 29 Aug 2012, Atchley, Scott wrote:
 
 I am benchmarking a sockets based application and I want a sanity check
 on IPoIB performance expectations when using connected mode (65520 MTU).
 I am using the tuning tips in Documentation/infiniband/ipoib.txt. The
 machines have Mellanox QDR cards (see below for the verbose ibv_devinfo
 output). I am using a 2.6.36 kernel. The hosts have single socket Intel
 E5520 (4 core with hyper-threading on) at 2.27 GHz.
 
 I am using netperf's TCP_STREAM and binding cores. The best I have seen
 is ~13 Gbps. Is this the best I can expect from these cards?
 
 Sounds about right, This is not a hardware limitation but
 a limitation of the socket I/O layer / PCI-E bus. The cards generally can
 process more data than the PCI bus and the OS can handle.
 
 PCI-E on PCI 2.0 should give you up to about 2.3 Gbytes/sec with these
 nics. So there is like something that the network layer does to you that
 limits the bandwidth.

First, thanks for the reply.

I am not sure where you are getting the 2.3 GB/s value. When using verbs
natively, I can get ~3.4 GB/s. I am assuming that these HCAs lack certain TCP
offloads that might allow higher socket performance. Ethtool reports:

# ethtool -k ib0
Offload parameters for ib0:
rx-checksumming: off
tx-checksumming: off
scatter-gather: off
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: on
generic-receive-offload: off

There is no checksum support, which I would expect to lower performance. Since
checksums need to be calculated in the host, I would expect faster processors
to help performance some.

So basically, am I in the ball park given this hardware?

 
 What should I expect as a max for ipoib with FDR cards?
 
 More of the same. You may want to
 
 A) increase the block size handled by the socket layer

Do you mean altering sysctl with something like:

# increase TCP max buffer size setable using setsockopt()
net.core.rmem_max = 16777216 
net.core.wmem_max = 16777216 
# increase Linux autotuning TCP buffer limit 
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
# increase the length of the processor input queue
net.core.netdev_max_backlog = 3

or something increasing the SO_SNDBUF and SO_RCVBUF sizes, or something else?

 B) Increase the bandwidth by using PCI-E 3 or more PCI-E lanes.
 
 C) Bypass the socket layer. Look at Sean's rsockets layer f.e.

We actually want to test the socket stack and not bypass it.

Thanks again!

Scott



Re: IPoIB performance

2012-09-05 Thread Atchley, Scott
On Sep 5, 2012, at 1:50 PM, Reeted wrote:

 On 08/29/12 21:35, Atchley, Scott wrote:
 Hi all,
 
 I am benchmarking a sockets based application and I want a sanity check on 
 IPoIB performance expectations when using connected mode (65520 MTU).
 
 I have read that with newer cards the datagram (unconnected) mode is 
 faster at IPoIB than connected mode. Do you want to check?

I have read that the latency is lower (better) but the bandwidth is lower.

Using datagram mode limits the MTU to 2044 and the throughput to ~3 Gb/s on
these machines/cards. Connected mode at the same MTU performs roughly the same.
The win in connected mode comes with larger MTUs. With a 9000 MTU, I see ~6
Gb/s. Pushing the MTU to 65520 (the maximum for IPoIB), I can get ~13 Gb/s.
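
For completeness, switching modes and MTU for these tests is just the usual
sysfs/ip incantation (interface name here is ib0, as on my hosts):

# per-interface: datagram vs connected mode, then the MTU
echo connected > /sys/class/net/ib0/mode
ip link set ib0 mtu 65520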

 What benchmark program are you using?

netperf with process binding (-T). I tune sysctl per the DOE FasterData specs:

http://fasterdata.es.net/host-tuning/linux/

Scott


Re: IPoIB performance

2012-09-05 Thread Atchley, Scott
On Sep 5, 2012, at 2:20 PM, Christoph Lameter wrote:

 On Wed, 5 Sep 2012, Atchley, Scott wrote:
 
 # ethtool -k ib0
 Offload parameters for ib0:
 rx-checksumming: off
 tx-checksumming: off
 scatter-gather: off
 tcp segmentation offload: off
 udp fragmentation offload: off
 generic segmentation offload: on
 generic-receive-offload: off
 
 There is no checksum support which I would expect to lower performance.
 Since checksums need to be calculated in the host, I would expect faster
 processors to help performance some.
 
 K that is a major problem. Both are on by default here. What NIC is this?

These are Mellanox QDR HCAs (board id is MT_0D90110009). The full output of 
ibv_devinfo is in my original post.

 A) increase the block size handled by the socket layer
 
 Do you mean altering sysctl with something like:
 
 Nope increase mtu. Connected mode supports up to 64k mtu size I believe.

Yes, I am using the max MTU (65520).

 or something increasing the SO_SNFBUF and SO_RCVBUF sizes or something else?
 
 That does nothing for performance. The problem is that the handling of the
 data by the kernel causes too much latency so that you cannot reach the
 full bw of the hardware.
 
 We actually want to test the socket stack and not bypass it.
 
 AFAICT the network stack is useful up to 1Gbps and
 after that more and more band-aid comes into play.

Hmm, many 10G Ethernet NICs can reach line rate. I have not yet tested any 40G 
Ethernet NICs, but I hope that they will get close to line rate. If not, what 
is the point? ;-)

Scott


Re: IPoIB performance

2012-09-05 Thread Atchley, Scott
On Sep 5, 2012, at 3:04 PM, Reeted wrote:

 On 09/05/12 19:59, Atchley, Scott wrote:
 On Sep 5, 2012, at 1:50 PM, Reeted wrote:
 
 
 I have read that with newer cards the datagram (unconnected) mode is
 faster at IPoIB than connected mode. Do you want to check?
 I have read that the latency is lower (better) but the bandwidth is lower.
 
 Using datagram mode limits the MTU to 2044 and the throughput to ~3 Gb/s on 
 these machines/cards. Connected mode at the same MTU performs roughly the 
 same. The win in connected mode comes with larger MTUs. With a 9000 MTU, I 
 see ~6 Gb/s. Pushing the MTU to 655120 (the maximum for ipoib), I can get 
 ~13 Gb/s.
 
 
  Have a look at an old thread in this ML by Sebastien Dugue, "IPoIB to
  Ethernet routing performance".
 He had numbers much higher than yours on similar hardware, and was 
 suggested to use datagram to achieve offloading and even higher speeds.
 Keep me informed if you can fix this, I am interested but can't test 
 infiniband myself right now.

He claims 20 Gb/s and Or replies that one should also get near 20 Gb/s using 
datagram mode. I checked and datagram mode shows support via ethtool for more 
offloads. In my case, I still see better performance with connected mode.

Thanks,

Scott


Re: IPoIB performance

2012-09-05 Thread Atchley, Scott
On Sep 5, 2012, at 3:06 PM, Christoph Lameter wrote:

 On Wed, 5 Sep 2012, Atchley, Scott wrote:
 
 AFAICT the network stack is useful up to 1Gbps and
 after that more and more band-aid comes into play.
 
 Hmm, many 10G Ethernet NICs can reach line rate. I have not yet tested any 
 40G Ethernet NICs, but I hope that they will get close to line rate. If not, 
 what is the point? ;-)
 
 Oh yes they can under restricted circumstances. Large packets, multiple
 cores etc. With the band-aids….

With Myricom 10G NICs, for example, you just need one core and it can do line 
rate with 1500 byte MTU. Do you count the stateless offloads as band-aids? Or 
something else?

I have not tested any 40G NICs yet, but I imagine that one core will not be 
enough.

Thanks,

Scott


Re: IPoIB performance

2012-09-05 Thread Atchley, Scott

On Sep 5, 2012, at 3:13 PM, Christoph Lameter wrote:

 On Wed, 5 Sep 2012, Atchley, Scott wrote:
 
 These are Mellanox QDR HCAs (board id is MT_0D90110009). The full output of 
 ibv_devinfo is in my original post.
 
 Hmmm... You are running an old kernel. What version of OFED do you use?

Hah, if you think my kernel is old, you should see my userland (RHEL5.5). ;-)

Does the version of OFED impact the kernel modules? I am using the modules that 
came with the kernel. I don't believe that libibverbs or librdmacm are used by 
the kernel's socket stack. That said, I am using source builds with tags 
libibverbs-1.1.6 and v1.0.16 (librdmacm).

Scott


Re: IPoIB performance

2012-09-05 Thread Atchley, Scott
On Sep 5, 2012, at 4:12 PM, Ezra Kissel wrote:

 On 9/5/2012 3:48 PM, Atchley, Scott wrote:
 On Sep 5, 2012, at 3:06 PM, Christoph Lameter wrote:
 
 On Wed, 5 Sep 2012, Atchley, Scott wrote:
 
 AFAICT the network stack is useful up to 1Gbps and
 after that more and more band-aid comes into play.
 
 Hmm, many 10G Ethernet NICs can reach line rate. I have not yet tested any 
 40G Ethernet NICs, but I hope that they will get close to line rate. If 
 not, what is the point? ;-)
 
 Oh yes they can under restricted circumstances. Large packets, multiple
 cores etc. With the band-aids….
 
 With Myricom 10G NICs, for example, you just need one core and it can do 
 line rate with 1500 byte MTU. Do you count the stateless offloads as 
 band-aids? Or something else?
 
 I have not tested any 40G NICs yet, but I imagine that one core will not be 
 enough.
 
 Since you are using netperf, you might also considering experimenting 
 with the TCP_SENDFILE test.  Using sendfile/splice calls can have a 
 significant impact for sockets-based apps.
 
 Using 40G NICs (Mellanox ConnectX-3 EN), I've seen our applications hit 
 22Gb/s single core/stream while fully CPU bound.  With sendfile/splice, 
 there is no issue saturating a 40G link with about 40-50% core 
 utilization.  That being said, binding to the right core/node, message 
 size and memory alignment, interrupt handling, and proper host/NIC 
 tuning all have an impact on the performance.  The state of 
 high-performance networking is certainly not plug-and-play.

Thanks for the tip. The app we want to test does not use sendfile() or splice().

I do bind to the best core (determined by testing all combinations on client 
and server).

I have heard others within DOE reach ~16 Gb/s on a 40G Mellanox NIC. I'm glad 
to hear that you got to 22 Gb/s for a single stream. That is more reassuring.

Scott


IPoIB performance

2012-08-29 Thread Atchley, Scott
Hi all,

I am benchmarking a sockets based application and I want a sanity check on 
IPoIB performance expectations when using connected mode (65520 MTU). I am 
using the tuning tips in Documentation/infiniband/ipoib.txt. The machines have 
Mellanox QDR cards (see below for the verbose ibv_devinfo output). I am using a 
2.6.36 kernel. The hosts have single socket Intel E5520 (4 core with 
hyper-threading on) at 2.27 GHz.

I am using netperf's TCP_STREAM and binding cores. The best I have seen is ~13 
Gbps. Is this the best I can expect from these cards?

What should I expect as a max for ipoib with FDR cards?

Thanks,

Scott



hca_id: mlx4_0
transport:  InfiniBand (0)
fw_ver: 2.7.626
node_guid:  0002:c903:000b:6520
sys_image_guid: 0002:c903:000b:6523
vendor_id:  0x02c9
vendor_part_id: 26428
hw_ver: 0xB0
board_id:   MT_0D90110009
phys_port_cnt:  1
max_mr_size:0x
page_size_cap:  0xfe00
max_qp: 65464
max_qp_wr:  16384
device_cap_flags:   0x006c9c76
max_sge:32
max_sge_rd: 0
max_cq: 65408
max_cqe:4194303
max_mr: 131056
max_pd: 32764
max_qp_rd_atom: 16
max_ee_rd_atom: 0
max_res_rd_atom:1047424
max_qp_init_rd_atom:128
max_ee_init_rd_atom:0
atomic_cap: ATOMIC_HCA (1)
max_ee: 0
max_rdd:0
max_mw: 0
max_raw_ipv6_qp:0
max_raw_ethy_qp:0
max_mcast_grp:  8192
max_mcast_qp_attach:56
max_total_mcast_qp_attach:  458752
max_ah: 0
max_fmr:0
max_srq:65472
max_srq_wr: 16383
max_srq_sge:31
max_pkeys:  128
local_ca_ack_delay: 15
port:   1
state:  PORT_ACTIVE (4)
max_mtu:2048 (4)
active_mtu: 2048 (4)
sm_lid: 6
port_lid:   8
port_lmc:   0x00
link_layer: InfiniBand
max_msg_sz: 0x4000
port_cap_flags: 0x02510868
max_vl_num: 8 (4)
bad_pkey_cntr:  0x0
qkey_viol_cntr: 0x0
sm_sl:  0
pkey_tbl_len:   128
gid_tbl_len:128
subnet_timeout: 18
init_type_reply:0
active_width:   4X (2)
active_speed:   10.0 Gbps (4)
phys_state: LINK_UP (5)
GID[  0]:   
fe80::::0002:c903:000b:6521



Re: OT: netmap - a novel framework for fast packet I/O

2012-01-20 Thread Atchley, Scott
On Jan 20, 2012, at 11:20 AM, Ira Weiny wrote:

 On Fri, 20 Jan 2012 06:18:44 -0800
 Atchley, Scott atchle...@ornl.gov wrote:
 
 Interesting. It totally hijacks the NIC; all traffic is captured. You would 
 have to implement your own IP stack, Verbs stack, etc.
 
 
 Can multiple user space processes share the card?  If so, how is security 
 handled between them?

It is not clear from the paper I scanned.

There does seem to be a mechanism to send selected packets up the host stack.

Scott

 
 Ira
 
 Scott
 
 On Jan 19, 2012, at 11:50 AM, Yann Droneaud wrote:
 
 Hi,
 
 I have discovered today the netmap project[1] through an ACM Queue
 article[2].
 
 Netmap is a new interface to send and receive packets through an
 Ethernet interface (NIC). It seems to provide a raw access to network
 interface in order to process packets at high rate with a low overhead.
 
 This is an another example of kernel-bypass/zero-copy which are core
 features of InfiniBand verbs/RDMA.
 
 But unlike InfiniBand verbs/RDMA, Netmap seems to have a very small API.
 
 Such API could be enough to build an unreliable datagram messaging
 system on low cost hardware (without concerns of determinism, flow
 control, etc.).
 
 I'm asking myself if the way netmap exposes internal NIC rings could be
 applicable for IB/IBoE HCA ? e.g. beyond 10GbE NIC, is netmap relevant ?
 
 Regards.
 
 [1] http://info.iet.unipi.it/~luigi/netmap/
 
 netmap - a novel framework for fast packet I/O
 Luigi Rizzo Università di Pisa
 
 [2] http://queue.acm.org/detail.cfm?id=2103536
 
 Revisiting Network I/O APIs: The netmap Framework 
 Luigi Rizzo, 2012-01-17  
 
 -- 
 Yann Droneaud
 
 
 
 
 -- 
 Ira Weiny
 Member of Technical Staff
 Lawrence Livermore National Lab
 925-423-8008
 wei...@llnl.gov



Re: Send with immediate data completion

2012-01-12 Thread Atchley, Scott
On Jan 11, 2012, at 5:22 PM, Hefty, Sean wrote:

 I'm still waiting on feedback from the IBTA, but they are looking into the
 matter.
 
 The intent is for immediate data only to be provided on receive work 
 completions.  The IBTA will clarify the spec on this.  I'll submit patches 
 that remove setting the wc flag, which may help avoid this confusion some.

Sean,

Thanks for looking into this.

Scott


Re: Send with immediate data completion

2012-01-11 Thread Atchley, Scott
On Jan 3, 2012, at 12:35 PM, Atchley, Scott wrote:

 On Jan 3, 2012, at 11:55 AM, Hefty, Sean wrote:
 
 I have a question about a completion for a send with immediate data. The IB
 spec (1.2.1) only mentions that the WC's immediate data be present at the
 receiver. It is silent on the value on the sender at completion. It does say
 that it is only valid if the WC's immediate data indicator is set.
 
 Can you provide a section reference to the spec on the areas that you're 
 looking at?  Looking quickly, section 11.4.2.1 reads like immediate data 
 should be available in either case.
 
 I've never checked imm data on the send wc.  I'm just trying to determine if 
 there's an issue in the spec that should be addressed, or if this is simply 
 a bug in the hca/driver.
 
 There is the definition in the glossary:
 
 Immediate Data
 
 Data contained in a Work Queue Element that is sent along with the payload to 
 the remote Channel Adapter and placed in a Receive Work Completion.
 
 Section 3.7.4 Transport Layer:
 
 The Immediate Data (IMMDT) field is optionally present in RDMA WRITE and SEND 
 messages. It contains data that the consumer placed in the Send or RDMA Write 
 request and the receiving QP will place that value in the current receive 
 WQE. An RDMA Write with immediate data will consume a receive WQE even though 
 the QP did not place any data into the receive buffer since the IMMDT is 
 placed in a CQE that references the receive WQE and indicates that the WQE 
 has completed.
 
 Section 11.4.1.1 Post Send Request has:
 
 Immediate Data Indicator. This is set if Immediate Data is to
 be included in the outgoing request. Valid only for Send or
 Write RDMA operations.
 
 4-byte Immediate Data. Valid only for Send or Write RDMA operations.
 
 11.4.2.1 Poll for Completion
 
 Immediate data indicator. This is set if immediate data is present.
 
 4-byte immediate data.
 
 
 None specifically mention the sender's completion event.

Sean,

Any thoughts?

Personally, I would like to have it in the send completion, but it might not be 
possible for all drivers to implement. If not, then the spec should be 
clarified.

Scott


Send with immediate data completion

2012-01-03 Thread Atchley, Scott
Hi all,

I have a question about a completion for a send with immediate data. The IB 
spec (1.2.1) only mentions that the WC's immediate data be present at the 
receiver. It is silent on the value on the sender at completion. It does say 
that it is only valid if the WC's immediate data indicator is set.

When I test using a 2.6.38 kernel with the kernel.org libibverbs git tree, I 
see a send completion's wc_flags set with IBV_WC_WITH_IMM yet the imm_data is 
not what I passed in. Since the spec is silent on setting imm_data on the 
sender, I assume that I should not rely on looking at the imm_data on a send 
completion.
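
For concreteness, the shape of the test is along these lines (a sketch, not the
actual test program; QP, SGE and CQ setup are the usual boilerplate):

#include <arpa/inet.h>
#include <stdio.h>
#include <infiniband/verbs.h>

/* sender: post a signaled send carrying 4 bytes of immediate data */
int post_send_with_imm(struct ibv_qp *qp, struct ibv_sge *sge, uint32_t value)
{
        struct ibv_send_wr wr = {
                .sg_list    = sge,
                .num_sge    = 1,
                .opcode     = IBV_WR_SEND_WITH_IMM,
                .send_flags = IBV_SEND_SIGNALED,
                .imm_data   = htonl(value),
        };
        struct ibv_send_wr *bad_wr;

        return ibv_post_send(qp, &wr, &bad_wr);
}

/* receiver: only trust imm_data when the flag is set on a *receive* WC */
void handle_recv_completion(struct ibv_wc *wc)
{
        if (wc->wc_flags & IBV_WC_WITH_IMM)
                printf("imm_data = 0x%x\n", ntohl(wc->imm_data));
}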

Given that, should IBV_WC_WITH_IMM ever be set on the sender?

Thanks,

Scott

-
Scott Atchley
HPC Systems Engineer
Center for Computational Sciences
Oak Ridge National Laboratory
atchle...@ornl.gov





Re: Send with immediate data completion

2012-01-03 Thread Atchley, Scott
On Jan 3, 2012, at 11:55 AM, Hefty, Sean wrote:

 I have a question about a completion for a send with immediate data. The IB
 spec (1.2.1) only mentions that the WC's immediate data be present at the
 receiver. It is silent on the value on the sender at completion. It does say
 that it is only valid if the WC's immediate data indicator is set.
 
 Can you provide a section reference to the spec on the areas that you're 
 looking at?  Looking quickly, section 11.4.2.1 reads like immediate data 
 should be available in either case.
 
 I've never checked imm data on the send wc.  I'm just trying to determine if 
 there's an issue in the spec that should be addressed, or if this is simply a 
 bug in the hca/driver.

For the record, I am using:

hca_id: mlx4_0
transport:  InfiniBand (0)
fw_ver: 2.7.626
node_guid:  0002:c903:000b:64e8
sys_image_guid: 0002:c903:000b:64eb
vendor_id:  0x02c9
vendor_part_id: 26428
hw_ver: 0xB0
board_id:   MT_0D90110009

Scott