Re: [PATCH V4 for-next 1/5] IB/core: Add RSS and TSS QP groups - suggesting BOF during OFA conf to further discuss that

2013-04-30 Thread Shlomo Pongratz

On 4/29/2013 11:36 PM, Jason Gunthorpe wrote:

On Mon, Apr 29, 2013 at 10:52:21PM +0300, Or Gerlitz wrote:

On Fri, Apr 26, 2013 at 12:40 AM, Jason Gunthorpe wrote:

But I don't follow why the send QPNs have to be sequential for
IPoIB. It looks like this is being motivated by RSS and RSS QPNs are
just being reused for TSS?

Go read "It turns out that there are IPoIB drivers used by some
operating-systems and/or Hypervisors in a para-virtualization (PV) scheme
which extract the source QPN from the CQ WC associated with an incoming
packet in order to..." and what follows in the change-log of patch 4/5
http://marc.info/?l=linux-rdma&m=136412901621797&w=2

This is what I said in the first place: the RFC is premised on the
src QPN being set properly. You can't just mess with it, because stuff
needs it.

I think you should have split this patch up, there is lots going on
here.

- Add proper TSS that doesn't change the wire protocol
- Add fake TSS that does change the wire protocol, and
   properly document those changes so other people can
   follow/implement them
- Add RSS

And.. 'tss_qpn_mask_sz' seems unnecessarily limiting, using
  WC.srcQPN + ipoib_header.tss_qpn_offset == real QPN
  (ie use a signed offset, not a mask)
Seems much better than
  Wc.srcQPN & ~((1 << (ipoib_header.tss_qpn_mask_sz >> 12)) - 1) == real QPN
  (Did I even get that right?)

Specifically it means the requirements for alignment and
contiguous-ness are gone. This means you can implement it without
using the QP groups API and it will work immediately with every HCA
out there. I think if we are going to actually mess with the wire
protocol this sort of broad applicability is important.
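
To make the offset idea concrete, here is a minimal sketch (not from the
patch set; the field name tss_qpn_offset is an assumption for illustration)
of how a receiver could recover the parent QPN under a signed-offset
encoding:

/*
 * Minimal sketch only: recover the parent ("real") QPN from the WC source
 * QPN under a signed-offset scheme. tss_qpn_offset is illustrative.
 */
#include <stdint.h>

static inline uint32_t recover_parent_qpn_offset(uint32_t wc_src_qpn,
                                                 int16_t tss_qpn_offset)
{
        /* QPNs are 24-bit values; wrap within that space after the offset. */
        return (uint32_t)(wc_src_qpn + tss_qpn_offset) & 0x00ffffff;
}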

As for the other two questions: seems reasonable to me. Without a
consensus among HW vendors how to do this it makes sense to move ahead
*in the kernel* with a minimal API. Userspace is a different question
of course..

Jason

Hi Jason,

Your suggestion could have been valid if the IPoIB header was larger.
Please note that a QPN occupies 3 octets and thus its value lies in
the range [0..0xFFFFFF].
On the other hand, the reserved field in the IPoIB header occupies only 2
octets, so given an arbitrary group of source QPNs it may not be possible
to recover the real QPN.
This is why the real QPN should be aligned on a power of two and the rest
should have consecutive numbers. And since the number of TSS QPs is
relatively small, that is, on the order of the number of cores, masking
the lower bits of the WC.srcQPN will recover the real QPN.
Also, by sending only the mask length we don't use the entire reserved
field but only 4 bits, leaving 12 bits for future use.
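
To illustrate, a minimal sketch of that mask-based recovery (the placement
of the 4-bit mask length in the top bits of the 16-bit reserved word is an
assumption for illustration):

/*
 * Minimal sketch of mask-based parent QPN recovery. Works only if the
 * parent QPN is aligned so that its low mask_sz bits are zero; the
 * remaining 12 bits of the reserved word stay free for future use.
 */
#include <stdint.h>

static inline uint32_t recover_parent_qpn_mask(uint32_t wc_src_qpn,
                                               uint16_t ipoib_reserved)
{
        unsigned int mask_sz = ipoib_reserved >> 12;   /* 4-bit mask length */
        uint32_t low_bits = (1u << mask_sz) - 1;       /* bits used by TSS QPs */

        return wc_src_qpn & ~low_bits;
}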


Best regards,

S.P.



Re: NFS over RDMA benchmark

2013-04-30 Thread Tom Talpey

On 4/30/2013 1:09 AM, Yan Burman wrote:




-Original Message-
From: J. Bruce Fields [mailto:bfie...@fieldses.org]
Sent: Sunday, April 28, 2013 17:43
To: Yan Burman
Cc: Wendy Cheng; Atchley, Scott; Tom Tucker; linux-rdma@vger.kernel.org;
linux-...@vger.kernel.org; Or Gerlitz
Subject: Re: NFS over RDMA benchmark

On Sun, Apr 28, 2013 at 06:28:16AM +0000, Yan Burman wrote:

On Wed, Apr 17, 2013 at 7:36 AM, Yan Burman y...@mellanox.com wrote:

I've been trying to do some benchmarks for NFS over RDMA and I seem to only get about half of the bandwidth that the HW can give me.
My setup consists of 2 servers each with 16 cores, 32Gb of memory, and Mellanox ConnectX3 QDR card over PCI-e gen3.
These servers are connected to a QDR IB switch. The backing storage on the server is tmpfs mounted with noatime.
I am running kernel 3.5.7.

When running ib_send_bw, I get 4.3-4.5 GB/sec for block sizes 4-512K.
When I run fio over rdma mounted nfs, I get 260-2200MB/sec for the same block sizes (4-512K). Running over IPoIB-CM, I get 200-980MB/sec.

...

I am trying to get maximum performance from a single server - I used 2 processes in fio test - more than 2 did not show any performance boost.
I tried running fio from 2 different PCs on 2 different files, but the sum of the two is more or less the same as running from single client PC.




I finally got up to 4.1GB/sec bandwidth with RDMA (ipoib-CM bandwidth is also 
way higher now).
For some reason when I had intel IOMMU enabled, the performance dropped 
significantly.
I now get up to ~95K IOPS and 4.1GB/sec bandwidth.


Excellent, but is that 95K IOPS a typo? At 4KB, that's less than 400MBps.

What is the client CPU percentage you see under this workload, and
how different are the NFS/RDMA and NFS/IPoIB overheads?


Now I will take care of the issue that I am running only at 40Gbit/s instead of 
56Gbit/s, but that is another unrelated problem (I suspect I have a cable 
issue).

This is still strange, since ib_send_bw with intel iommu enabled did get up to 
4.5GB/sec, so why did intel iommu affect only nfs code?


You'll need to do more profiling to track that down. I would suspect
that ib_send_bw is using some sort of direct hardware access, bypassing
the IOMMU management and possibly performing no dynamic memory registration.

The NFS/RDMA code goes via the standard kernel DMA API, and correctly
registers/deregisters memory on a per-i/o basis in order to provide
storage data integrity. Perhaps there are overheads in the IOMMU
management which can be addressed.
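
As a rough illustration of that difference (a sketch only, not the actual
xprtrdma/svcrdma code): the per-I/O path goes through the kernel DMA API
for every request, so each call crosses the IOMMU, while a benchmark that
maps one buffer up front pays that cost only once.

/* Sketch of one per-I/O mapping cycle via the kernel ib_dma_* wrappers;
 * with an IOMMU enabled, every map/unmap here is extra work. */
#include <rdma/ib_verbs.h>

static int xfer_one_request(struct ib_device *dev, void *buf, size_t len)
{
        u64 dma_addr;

        dma_addr = ib_dma_map_single(dev, buf, len, DMA_TO_DEVICE);
        if (ib_dma_mapping_error(dev, dma_addr))
                return -EIO;

        /* ... build the WR, post the send/RDMA op, wait for completion ... */

        ib_dma_unmap_single(dev, dma_addr, len, DMA_TO_DEVICE);
        return 0;
}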


Re: NFS over RDMA benchmark

2013-04-30 Thread J. Bruce Fields
On Sun, Apr 28, 2013 at 10:42:48AM -0400, J. Bruce Fields wrote:
On Sun, Apr 28, 2013 at 06:28:16AM +0000, Yan Burman wrote:

On Wed, Apr 17, 2013 at 7:36 AM, Yan Burman y...@mellanox.com wrote:

I've been trying to do some benchmarks for NFS over RDMA and I seem to only get about half of the bandwidth that the HW can give me.
My setup consists of 2 servers each with 16 cores, 32Gb of memory, and Mellanox ConnectX3 QDR card over PCI-e gen3.
These servers are connected to a QDR IB switch. The backing storage on the server is tmpfs mounted with noatime.
I am running kernel 3.5.7.

When running ib_send_bw, I get 4.3-4.5 GB/sec for block sizes 4-512K.
When I run fio over rdma mounted nfs, I get 260-2200MB/sec for the same block sizes (4-512K). Running over IPoIB-CM, I get 200-980MB/sec.

...

I am trying to get maximum performance from a single server - I used 2 processes in fio test - more than 2 did not show any performance boost.
I tried running fio from 2 different PCs on 2 different files, but the sum of the two is more or less the same as running from single client PC.
  
What I did see is that the server is sweating a lot more than the clients, and more than that, it has 1 core (CPU5) at 100% in softirq tasklet:
cat /proc/softirqs
...
Perf top for the CPU with high tasklet count gives:

 samples  pcnt   RIP       function             DSO
 ...
 2787.00  24.1%  81062a00  mutex_spin_on_owner  /root/vmlinux
 ...
   Googling around ... I think we want:
   
 perf record -a --call-graph
 (give it a chance to collect some samples, then ^C)
 perf report --call-graph --stdio
   
  
  Sorry it took me a while to get perf to show the call trace (did not enable 
  frame pointers in kernel and struggled with perf options...), but what I 
  get is:
  36.18%  nfsd  [kernel.kallsyms]   [k] mutex_spin_on_owner
  |
  --- mutex_spin_on_owner
 |
 |--99.99%-- __mutex_lock_slowpath
 |  mutex_lock
 |  |
 |  |--85.30%-- generic_file_aio_write
 
 That's the inode i_mutex.

Looking at the code ... With CONFIG_MUTEX_SPIN_ON_OWNER it spins
(instead of sleeping) as long as the lock owner's still running.  So
this is just a lot of contention on the i_mutex, I guess.  Not sure what
to do about that.
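
For reference, a rough, abridged sketch of the ~3.x write path being hit
here (from memory, details elided), which shows why many nfsd threads
writing the same file serialize on that mutex:

ssize_t generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
                               unsigned long nr_segs, loff_t pos)
{
        struct inode *inode = iocb->ki_filp->f_mapping->host;
        ssize_t ret;

        mutex_lock(&inode->i_mutex);    /* one buffered writer per inode */
        ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
        mutex_unlock(&inode->i_mutex);

        /* ... syncing of dirty pages for O_SYNC etc. elided ... */
        return ret;
}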

--b.


Re: NFS over RDMA benchmark

2013-04-30 Thread Tom Talpey

On 4/30/2013 1:09 AM, Yan Burman wrote:

I now get up to ~95K IOPS and 4.1GB/sec bandwidth.
...
 ib_send_bw with intel iommu enabled did get up to 4.5GB/sec


BTW, you may want to verify that these are the same GB. Many
benchmarks say KB/MB/GB when they really mean KiB/MiB/GiB.

At GB/GiB, the difference is about 7.5%, very close to the
difference between 4.1 and 4.5.

Just a thought.


RE: NFS over RDMA benchmark

2013-04-30 Thread Yan Burman


 -Original Message-
 From: Tom Talpey [mailto:t...@talpey.com]
 Sent: Tuesday, April 30, 2013 16:05
 To: Yan Burman
 Cc: J. Bruce Fields; Wendy Cheng; Atchley, Scott; Tom Tucker; linux-r...@vger.kernel.org; linux-...@vger.kernel.org; Or Gerlitz
 Subject: Re: NFS over RDMA benchmark
 
 On 4/30/2013 1:09 AM, Yan Burman wrote:
 
 
  -Original Message-
  From: J. Bruce Fields [mailto:bfie...@fieldses.org]
  Sent: Sunday, April 28, 2013 17:43
  To: Yan Burman
  Cc: Wendy Cheng; Atchley, Scott; Tom Tucker;
  linux-rdma@vger.kernel.org; linux-...@vger.kernel.org; Or Gerlitz
  Subject: Re: NFS over RDMA benchmark
 
On Sun, Apr 28, 2013 at 06:28:16AM +0000, Yan Burman wrote:

On Wed, Apr 17, 2013 at 7:36 AM, Yan Burman y...@mellanox.com wrote:

I've been trying to do some benchmarks for NFS over RDMA and I seem to only get about half of the bandwidth that the HW can give me.
My setup consists of 2 servers each with 16 cores, 32Gb of memory, and Mellanox ConnectX3 QDR card over PCI-e gen3.
These servers are connected to a QDR IB switch. The backing storage on the server is tmpfs mounted with noatime.
I am running kernel 3.5.7.

When running ib_send_bw, I get 4.3-4.5 GB/sec for block sizes 4-512K.
When I run fio over rdma mounted nfs, I get 260-2200MB/sec for the same block sizes (4-512K). Running over IPoIB-CM, I get 200-980MB/sec.

...

I am trying to get maximum performance from a single server - I used 2 processes in fio test - more than 2 did not show any performance boost.
I tried running fio from 2 different PCs on 2 different files, but the sum of the two is more or less the same as running from single client PC.
 
 
I finally got up to 4.1GB/sec bandwidth with RDMA (ipoib-CM bandwidth is also way higher now).
For some reason when I had intel IOMMU enabled, the performance dropped significantly.
I now get up to ~95K IOPS and 4.1GB/sec bandwidth.

Excellent, but is that 95K IOPS a typo? At 4KB, that's less than 400MBps.
 

That is not a typo. I get 95K IOPS with a randrw test at a 4K block size.
I get 4.1GB/s with a 256K block size randread test.

 What is the client CPU percentage you see under this workload, and how
 different are the NFS/RDMA and NFS/IPoIB overheads?

NFS/RDMA has about 20-30% more CPU usage than NFS/IPoIB, but RDMA has almost
twice the bandwidth of IPoIB.
Overall, CPU usage gets up to about 20% for randread and 50% for randwrite.

 
  Now I will take care of the issue that I am running only at 40Gbit/s instead
 of 56Gbit/s, but that is another unrelated problem (I suspect I have a cable
 issue).
 
  This is still strange, since ib_send_bw with intel iommu enabled did get up
 to 4.5GB/sec, so why did intel iommu affect only nfs code?
 
 You'll need to do more profiling to track that down. I would suspect that
 ib_send_bw is using some sort of direct hardware access, bypassing the
 IOMMU management and possibly performing no dynamic memory
 registration.
 
 The NFS/RDMA code goes via the standard kernel DMA API, and correctly
 registers/deregisters memory on a per-i/o basis in order to provide storage
 data integrity. Perhaps there are overheads in the IOMMU management
 which can be addressed.


Re: [PATCH 1/2] libibverbs: Use autoreconf in autogen.sh

2013-04-30 Thread Jeff Squyres (jsquyres)
Bump bump.  :-)

On Apr 25, 2013, at 11:38 AM, Jeff Squyres (jsquyres) jsquy...@cisco.com 
wrote:

 Bump.
 
 On Apr 22, 2013, at 1:41 PM, Jeff Squyres jsquy...@cisco.com wrote:
 
 The old sequence of Autotools commands listed in autogen.sh is no
 longer correct.  Instead, just use the single autoreconf command,
 which will invoke all the Right Autotools commands in the correct
 order.
 
 Signed-off-by: Jeff Squyres jsquy...@cisco.com
 ---
 autogen.sh | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)
 
 diff --git a/autogen.sh b/autogen.sh
 index fd47839..6c9233e 100755
 --- a/autogen.sh
 +++ b/autogen.sh
 @@ -1,8 +1,4 @@
 #! /bin/sh
 
 set -x
 -aclocal -I config
 -libtoolize --force --copy
 -autoheader
 -automake --foreign --add-missing --copy
 -autoconf
 +autoreconf -ifv -I config
 -- 
 1.8.1.1
 
 
 
 -- 
 Jeff Squyres
 jsquy...@cisco.com
 For corporate legal information go to: 
 http://www.cisco.com/web/about/doing_business/legal/cri/
 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



RE: NFS over RDMA benchmark

2013-04-30 Thread Yan Burman


 -Original Message-
 From: Tom Talpey [mailto:t...@talpey.com]
 Sent: Tuesday, April 30, 2013 17:20
 To: Yan Burman
 Cc: J. Bruce Fields; Wendy Cheng; Atchley, Scott; Tom Tucker; linux-r...@vger.kernel.org; linux-...@vger.kernel.org; Or Gerlitz
 Subject: Re: NFS over RDMA benchmark
 
 On 4/30/2013 1:09 AM, Yan Burman wrote:
  I now get up to ~95K IOPS and 4.1GB/sec bandwidth.
 ...
   ib_send_bw with intel iommu enabled did get up to 4.5GB/sec
 
 BTW, you may want to verify that these are the same GB. Many benchmarks
 say KB/MB/GB when they really mean KiB/MiB/GiB.
 
 At GB/GiB, the difference is about 7.5%, very close to the difference between
 4.1 and 4.5.
 
 Just a thought.

The question is not why there is a 400MBps difference between ib_send_bw and
NFSoRDMA.
The question is why, with the IOMMU enabled, ib_send_bw got to the same
bandwidth as without it while NFSoRDMA got half.

From some googling, it seems that when IOMMU is enabled, dma mapping functions 
get a lot more expensive.
Perhaps that is the reason for the performance drop.

Yan



Re: NFS over RDMA benchmark

2013-04-30 Thread Tom Talpey

On 4/30/2013 10:23 AM, Yan Burman wrote:

-Original Message-
From: Tom Talpey [mailto:t...@talpey.com]

On Sun, Apr 28, 2013 at 06:28:16AM +0000, Yan Burman wrote:

I finally got up to 4.1GB/sec bandwidth with RDMA (ipoib-CM bandwidth is also way higher now).
For some reason when I had intel IOMMU enabled, the performance dropped significantly.
I now get up to ~95K IOPS and 4.1GB/sec bandwidth.


Excellent, but is that 95K IOPS a typo? At 4KB, that's less than 400MBps.



That is not a typo. I get 95K IOPS with a randrw test at a 4K block size.
I get 4.1GB/s with a 256K block size randread test.


Well, then I suggest you focus on whether you are satisfied with a
high bandwidth goal or a high IOPS goal. They are two very different
things, and clearly there are still significant issues to track down
in the server.


What is the client CPU percentage you see under this workload, and how
different are the NFS/RDMA and NFS/IPoIB overheads?


NFS/RDMA has about 20-30% more CPU usage than NFS/IPoIB, but RDMA has almost
twice the bandwidth of IPoIB.


So, for 125% of the CPU, RDMA is delivering 200% of the bandwidth.
A common reporting approach is to calculate cycles per Byte (roughly,
CPU/MB/sec), and you'll find this can be a great tool for comparison
when overhead is a consideration.
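
For example, with purely illustrative numbers (the client clock rate isn't
given in this thread): a 16-core client at an assumed 2.5 GHz running at 25%
utilization spends 16 * 2.5e9 * 0.25 = 1e10 cycles/sec; delivering 4.1 GB/sec,
that is roughly 1e10 / 4.1e9, or about 2.4 cycles per Byte.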


Overall, CPU usage gets up to about 20% for randread and 50% for randwrite.


This is *client* CPU? Writes require the server to take additional
overhead to make RDMA Read requests, but the client side is doing
practically the same thing for the read vs write path. Again, you
may want to profile more deeply to track that difference down.



Re: NFS over RDMA benchmark

2013-04-30 Thread Wendy Cheng
On Mon, Apr 29, 2013 at 10:09 PM, Yan Burman y...@mellanox.com wrote:

 I finally got up to 4.1GB/sec bandwidth with RDMA (ipoib-CM bandwidth is also 
 way higher now).
 For some reason when I had intel IOMMU enabled, the performance dropped 
 significantly.
 I now get up to ~95K IOPS and 4.1GB/sec bandwidth.
 Now I will take care of the issue that I am running only at 40Gbit/s instead 
 of 56Gbit/s, but that is another unrelated problem (I suspect I have a cable 
 issue).

 This is still strange, since ib_send_bw with intel iommu enabled did get up 
 to 4.5GB/sec, so why did intel iommu affect only nfs code?


That's very exciting! The sad part is that the IOMMU has to be turned off.

I think ib_send_bw uses a single buffer so the DMA mapping search
overhead is not an issue.

-- Wendy


Re: [PATCH V4 for-next 1/5] IB/core: Add RSS and TSS QP groups - suggesting BOF during OFA conf to further discuss that

2013-04-30 Thread Jason Gunthorpe
On Tue, Apr 30, 2013 at 12:04:25PM +0300, Shlomo Pongratz wrote:

 And.. 'tss_qpn_mask_sz' seems unnecessarily limiting, using
   WC.srcQPN + ipoib_header.tss_qpn_offset == real QPN
   (ie use a signed offset, not a mask)
 Seems much better than
   Wc.srcQPN & ~((1 << (ipoib_header.tss_qpn_mask_sz >> 12)) - 1) == real QPN
   (Did I even get that right?)

 Your suggestion could have been valid if the IPoIB header was larger.
 Please note that a QPN occupies 3 octets and thus its value lies
 in the range [0..0xFFFFFF].

I am aware of this, and it isn't really a problem: adaptors that
allocate randomly across the entire QPN space would not be compatible
with this approach, but most adaptors allocate QPNs
quasi-contiguously.

Basically, at startup, IPoIB would allocate a TX QP, then allocate TSS
QPs, and throw away any that can't fit in the encoding, until it
reaches the target number or tries too long. No need for a special API
to the driver.
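
A sketch of that startup strategy (illustration only, not actual IPoIB
code; qpn_fits_encoding() and the retry budget are assumed helpers):

#include <rdma/ib_verbs.h>

#define TSS_ALLOC_BUDGET 64     /* illustrative "tries too long" limit */

static int ipoib_alloc_tss_qps(struct ib_pd *pd, struct ib_qp_init_attr *attr,
                               struct ib_qp *parent, struct ib_qp **tss,
                               int target)
{
        struct ib_qp *rejects[TSS_ALLOC_BUDGET];
        int found = 0, nrej = 0, i;

        while (found < target && found + nrej < TSS_ALLOC_BUDGET) {
                struct ib_qp *qp = ib_create_qp(pd, attr);

                if (IS_ERR(qp))
                        break;
                if (qpn_fits_encoding(parent->qp_num, qp->qp_num))
                        tss[found++] = qp;
                else
                        rejects[nrej++] = qp;   /* hold it so the next QPN differs */
        }

        /* Throw away the QPs whose numbers could not be encoded. */
        for (i = 0; i < nrej; i++)
                ib_destroy_qp(rejects[i]);

        return found;   /* caller settles for fewer TSS QPs if needed */
}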

Jason


Re: NFS over RDMA benchmark

2013-04-30 Thread Tom Tucker

On 4/30/13 9:38 AM, Yan Burman wrote:



-Original Message-
From: Tom Talpey [mailto:t...@talpey.com]
Sent: Tuesday, April 30, 2013 17:20
To: Yan Burman
Cc: J. Bruce Fields; Wendy Cheng; Atchley, Scott; Tom Tucker; linux-r...@vger.kernel.org; linux-...@vger.kernel.org; Or Gerlitz
Subject: Re: NFS over RDMA benchmark

On 4/30/2013 1:09 AM, Yan Burman wrote:

I now get up to ~95K IOPS and 4.1GB/sec bandwidth.
...
  ib_send_bw with intel iommu enabled did get up to 4.5GB/sec

BTW, you may want to verify that these are the same GB. Many benchmarks
say KB/MB/GB when they really mean KiB/MiB/GiB.

At GB/GiB, the difference is about 7.5%, very close to the difference between
4.1 and 4.5.

Just a thought.

The question is not why there is a 400MBps difference between ib_send_bw and
NFSoRDMA.
The question is why, with the IOMMU enabled, ib_send_bw got to the same
bandwidth as without it while NFSoRDMA got half.

NFSRDMA is constantly registering and unregistering memory when you use
FRMR mode. By contrast IPoIB has a descriptor ring that is set up once
and re-used. I suspect this is the difference maker. Have you tried
running the server in ALL_PHYSICAL mode, i.e. where it uses a DMA_MR for
all of memory?


Tom

From some googling, it seems that when IOMMU is enabled, dma mapping functions 
get a lot more expensive.
Perhaps that is the reason for the performance drop.

Yan




Re: [PATCH V4 for-next 1/5] IB/core: Add RSS and TSS QP groups - suggesting BOF during OFA conf to further discuss that

2013-04-30 Thread Or Gerlitz
Jason Gunthorpe jguntho...@obsidianresearch.com wrote:

 For the TSS case, I'd say just allocate normal QPs and provide
 something like ibv_override_ud_src_qpn(). This is very general and
 broadly useful for any application using UD QPs.


I've lost you; how do you suggest implementing ibv_override_ud_src_qpn()?
Is that for future HW, or do you have a method that works today?


Re: [PATCH V4 for-next 1/5] IB/core: Add RSS and TSS QP groups - suggesting BOF during OFA conf to further discuss that

2013-04-30 Thread Jason Gunthorpe
On Tue, Apr 30, 2013 at 11:08:19PM +0300, Or Gerlitz wrote:
 Jason Gunthorpe jguntho...@obsidianresearch.com wrote:
 
 For the TSS case, I'd say just allocate normal QPs and provide
 something like ibv_override_ud_src_qpn(). This is very general and
 broadly useful for any application using UD QPs.
 
 I've lost you; how do you suggest implementing ibv_override_ud_src_qpn()?
 Is that for future HW, or do you have a method that works today?

I meant as a user space API alternative to the parent/child group API
for transmit. It would require some level of driver/FW/HW support of
course.
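
For concreteness, a purely hypothetical sketch of what such a verb could
look like (ibv_override_ud_src_qpn() does not exist in libibverbs today;
this is only the shape of the proposal):

#include <infiniband/verbs.h>
#include <stdint.h>

/* Proposed, non-existent verb: would need driver/FW/HW support. */
int ibv_override_ud_src_qpn(struct ibv_qp *qp, uint32_t src_qpn);

/* A TSS-style sender would stamp the parent QPN on every packet it sends,
 * so receivers that key on wc.src_qp keep working unmodified. */
static int make_tss_sender(struct ibv_qp *send_qp, uint32_t parent_qpn)
{
        return ibv_override_ud_src_qpn(send_qp, parent_qpn);
}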

Jason


Re: [PATCH for-next 6/9] IB/core: Add receive Flow Steering support

2013-04-30 Thread Or Gerlitz
On Mon, Apr 29, 2013 at 10:40 PM, Christoph Lameter c...@linux.com wrote:
 On Mon, 29 Apr 2013, Steve Wise wrote:

 Hey Or, This looks good at first glance. I must confess I cannot tell yet
 if this will provide everything we need for Chelsio's RAW packet requirements.
 But I think we should move forward on this, and enhance as needed.

 Well, we are using the raw QPs here too and would like to use receive
 flow steering. Could we please get this merged?

Steve, Christoph -- thanks for the positive feedback.

So Roland, not that I expect this double ack to behave like our gerrit
system, where a +2 triggers acceptance... but still, there's a real-world
need here and real patches that address that need. Any questions or
comments on them? If not, are they going to get into 3.10?

Or.