Re: [PATCH rdma-next V2 00/32] Soft-RoCE driver

2015-12-26 Thread Doug Ledford
On 12/24/2015 09:20 AM, Christoph Hellwig wrote:
> On Thu, Dec 24, 2015 at 02:58:10PM +0200, Or Gerlitz wrote:
>> On Thu, Dec 24, 2015 at 12:02 PM, Christoph Hellwig wrote:
>>> On Thu, Dec 24, 2015 at 11:17:46AM +0200, Kamal Heib wrote:
>>>> We've located the driver in the staging subtree. This follows a requirement
>>>> to implement an IB transport library - Soft RoCE is in the same boat as the
>>>> hfi1 driver. We need to define and implement a lib to prevent this code
>>>> duplication.
>>>
>>> Given the trainwreck that the staging process is it might seems more
>>> sensible to get it into a stage and then merge it directly.  You'll
>>> probably save yourself a lot of work that way.
>>
>> I am not sure what you mean by "get it into a stage and then merge it
>> directly" -- does that mean not going through staging at all?
> 
> Sorry, I should not have finished that email in a hurry before leaving
> the house.  Let me rephrase:
> 
> Given the trainwreck that the staging process is, it might be more
> sensible to get it into shape and then merge it directly.  You'll
> probably save yourself a lot of work that way.
> 

Greg and I have an agreement (again) on staging/rdma.  This shouldn't be
an issue.

-- 
Doug Ledford 
  GPG KeyID: 0E572FDD


Re: [PATCH rdma-next V2 00/32] Soft-RoCE driver

2015-12-26 Thread Ming Lin
On Thu, 2015-12-24 at 11:17 +0200, Kamal Heib wrote:
> Hi Doug and list,
> 
> This patchset introduces the Soft RoCE driver.

Thanks for submitting this driver.
I have been looking for this kind of driver for a while, to set up an RDMA
environment with 2 virtual machines.

I did a quick test with 2 VMs.

root@vm1:~# rxe_cfg start
tee: /sys/kernel/config/rdma_cm/rxe0/default_roce_mode: Permission denied
IBOE V2
  Name  Link  Driver  Speed  NMTU  IPv4_addr      RDEV  RMTU
  eth0  yes   8139cp         1500  192.168.122.3  rxe0  1024  (3)

ibv_rc_pingpong works!

root@vm1:~# ibv_rc_pingpong -d rxe0 -g 0
  local address:  LID 0x, QPN 0x11, PSN 0xfb6f08, GID fe80::5054:ff:fe12:3456
  remote address: LID 0x, QPN 0x11, PSN 0xcb2acb, GID fe80::5054:ff:fe5f:8a49
8192000 bytes in 0.66 seconds = 99.94 Mbit/sec
1000 iters in 0.66 seconds = 655.76 usec/iter

root@vm2:~# ibv_rc_pingpong -d rxe0 -g 0 192.168.122.89
  local address:  LID 0x, QPN 0x11, PSN 0xcb2acb, GID fe80::5054:ff:fe5f:8a49
  remote address: LID 0x, QPN 0x11, PSN 0xfb6f08, GID fe80::5054:ff:fe12:3456
8192000 bytes in 0.66 seconds = 99.70 Mbit/sec
1000 iters in 0.66 seconds = 657.32 usec/iter
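
The reported rates can be cross-checked against the other figures in the
output. The following is only a rough sketch in Python, not part of
ibv_rc_pingpong: it assumes the byte count covers both directions and that
the rate is computed from the unrounded elapsed time, which is recovered
here from the usec/iter figure rather than from the rounded "0.66 seconds".
The function name is made up for illustration.

```python
def pingpong_mbit_per_sec(total_bytes, usec_per_iter, iters):
    """Recompute the Mbit/sec figure from the other numbers in the output.

    total_bytes   -- the "N bytes" value printed by the tool
    usec_per_iter -- the "usec/iter" value printed by the tool
    iters         -- the iteration count printed by the tool
    """
    # Elapsed time in seconds, recovered from the per-iteration latency.
    seconds = usec_per_iter * iters / 1e6
    # Bits transferred divided by elapsed time, scaled to megabits.
    return total_bytes * 8 / seconds / 1e6


if __name__ == "__main__":
    # Values copied from the vm1 and vm2 runs above.
    print("vm1: %.2f Mbit/sec" % pingpong_mbit_per_sec(8192000, 655.76, 1000))
    print("vm2: %.2f Mbit/sec" % pingpong_mbit_per_sec(8192000, 657.32, 1000))
```

Dividing by the rounded 0.66 seconds instead gives roughly 99.3 Mbit/sec,
which is why the usec/iter figure is used here.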

rping seems to work too!

root@vm1:~# rping -s -d
created cm_id 0x1683d20
rdma_bind_addr successful
rdma_listen
cma_event type RDMA_CM_EVENT_CONNECT_REQUEST cma_id 0x1684200 (child)
child cma 0x1684200
created pd 0x16844a0
created channel 0x16844c0
created cq 0x16844e0
created qp 0x1684590
rping_setup_buffers called on cb 0x1683010
allocated & registered buffers...
accepting client connection request
cq_thread started.
cma_event type RDMA_CM_EVENT_ESTABLISHED cma_id 0x1684200 (child)
ESTABLISHED
recv completion
Received rkey 118b addr 22899e0 len 64 from peer
server received sink adv
server posted rdma read req 
rdma read completion
server received read complete
server posted go ahead
send completion
recv completion
Received rkey 1045 addr 2289950 len 64 from peer
server received sink adv
rdma write from lkey d11 laddr 16846d0 len 64
rdma write completion
server rdma write complete 
server posted go ahead
send completion
cma_event type RDMA_CM_EVENT_DISCONNECTED cma_id 0x1684200 (child)
server DISCONNECT EVENT...
wait for RDMA_READ_ADV state 9
rping_free_buffers called on cb 0x1683010
destroy cm_id 0x1683d20

root@vm2:~# rping -c -d -a 192.168.122.89 -C 1
count 1
created cm_id 0x2289200
cma_event type RDMA_CM_EVENT_ADDR_RESOLVED cma_id 0x2289200 (parent)
cma_event type RDMA_CM_EVENT_ROUTE_RESOLVED cma_id 0x2289200 (parent)
rdma_resolve_addr - rdma_resolve_route successful
created pd 0x22896e0
created channel 0x2289700
created cq 0x2289720
created qp 0x22897d0
rping_setup_buffers called on cb 0x2288010
allocated & registered buffers...
cq_thread started.
cma_event type RDMA_CM_EVENT_ESTABLISHED cma_id 0x2289200 (parent)
ESTABLISHED
rmda_connect successful
RDMA addr 22899e0 rkey 118b len 64
send completion
recv completion
RDMA addr 2289950 rkey 1045 len 64
send completion
recv completion
rping_free_buffers called on cb 0x2288010
destroy cm_id 0x2289200



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH rdma-next V2 00/32] Soft-RoCE driver

2015-12-24 Thread Kamal Heib
Hi Doug and list,

This patchset introduces the Soft RoCE driver.

Some background on the driver: The original Soft-RoCE driver was implemented by
Bob Pearson from SFW. Bob started the submission process [1], but his work was
abandoned after v2.
Mellanox decided to pick it up and continue the submission. As part of the
process we detected some problems with the original implementation. Mainly, we
wanted to support RoCEv2, and there were too many locks and context switches
in the data path. Most of them have already been removed.

We've located the driver in the staging subtree. This follows a requirement
to implement an IB transport library - Soft RoCE is in the same boat as the
hfi1 driver. We need to define and implement a lib to prevent this code
duplication.

We did address the feedback provided on the original submission.

Soft-RoCE sits on top of Matan's RoCEv2 series [2], which was taken
for 4.5 and is present in Doug's k.o/for-4.5 branch.

The RXE user space library (librxe) is located on GitHub [4], with instructions
on how to use it [5].

Some notes on the architecture and design:

ib_rxe implements the RDMA transport and registers with the RDMA core as a
kernel verbs provider. It also implements the packet IO layer. ib_rxe attaches
to the Linux netdev stack as a UDP encapsulating protocol and can send and
receive packets over any Ethernet device. It uses the RoCEv2 protocol to handle
RDMA transport.
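
To make the encapsulation concrete: a RoCEv2 packet is a UDP datagram sent to
destination port 4791 whose payload starts with the 12-byte InfiniBand Base
Transport Header (BTH). The following is only an illustrative sketch in
Python, not code from the patchset. The helper names are made up, the
reserved bits are left at zero, and the sample values (QPN 0x11,
PSN 0xfb6f08) are simply borrowed from the ibv_rc_pingpong output posted
in this thread.

```python
import struct

ROCE_V2_UDP_DPORT = 4791   # UDP destination port assigned to RoCEv2
BTH_LEN = 12               # Base Transport Header length in bytes
RC_SEND_ONLY = 0x04        # opcode: reliable connection, SEND Only


def pack_bth(opcode, dest_qp, psn, pkey=0xFFFF, ack_req=False,
             solicited=False, migreq=False, pad_count=0, tver=0):
    """Build a 12-byte BTH in network byte order (hypothetical helper)."""
    # Byte 1: SE (1 bit), M (1 bit), PadCnt (2 bits), TVer (4 bits).
    flags = ((int(solicited) << 7) | (int(migreq) << 6)
             | ((pad_count & 0x3) << 4) | (tver & 0xF))
    # Word 2: 8 reserved bits (left zero here) + 24-bit destination QP.
    qp_word = dest_qp & 0xFFFFFF
    # Word 3: AckReq (1 bit) + 7 reserved bits + 24-bit PSN.
    psn_word = (int(ack_req) << 31) | (psn & 0xFFFFFF)
    return struct.pack("!BBHII", opcode, flags, pkey, qp_word, psn_word)


def unpack_bth(data):
    """Parse the first 12 bytes of a RoCEv2 UDP payload (hypothetical helper)."""
    opcode, flags, pkey, qp_word, psn_word = struct.unpack(
        "!BBHII", data[:BTH_LEN])
    return {
        "opcode": opcode,
        "solicited": bool(flags & 0x80),
        "migreq": bool(flags & 0x40),
        "pad_count": (flags >> 4) & 0x3,
        "tver": flags & 0xF,
        "pkey": pkey,
        "dest_qp": qp_word & 0xFFFFFF,
        "ack_req": bool(psn_word >> 31),
        "psn": psn_word & 0xFFFFFF,
    }


if __name__ == "__main__":
    hdr = pack_bth(RC_SEND_ONLY, dest_qp=0x11, psn=0xFB6F08, ack_req=True)
    print(hdr.hex(), unpack_bth(hdr))
```

In a full packet the BTH is followed by any extended headers, the payload and
the ICRC; those are omitted here to keep the sketch short.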

The modules are configured by entries in /sys. There is a configuration script
(rxe_cfg) that simplifies the use of this interface. rxe_cfg is part of the
rxe user space code, librxe.

The use of rxe verbs in user space requires the inclusion of librxe as a device
specific plug-in to libibverbs. librxe is packaged separately [4].

Copies of the user space library and tools for 'upstream', and a clone of
Doug's tree with these patches applied, are available on GitHub [3] under the
rxe_submission-v2 branch.

Architecture:

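~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

     +-----------------------------------------------------------+
     |                        Application                        |
     +-----------------------------------------------------------+
                            +------------------------------------+
                            |             libibverbs             |
User                        +------------------------------------+
                            +----------------+ +-----------------+
                            |     librxe     | |   HW RoCE lib   |
                            +----------------+ +-----------------+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     +--------------+                          +-----------------+
     |   Sockets    |                          |    RDMA ULP     |
     +--------------+                          +-----------------+
     +--------------+                   +------------------------+
     |    TCP/IP    |                   |        ib_core         |
     +--------------+                   +------------------------+
                            +----------------+ +-----------------+
Kernel                      |     ib_rxe     | | HW RoCE driver  |
                            +----------------+ +-----------------+
     +-----------------------------------------------------------+
     |                        NIC driver                         |
     +-----------------------------------------------------------+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~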

The driver components and a non-ASCII chart of the module can be found in a
PDF [6] presented by Bob before the original submission.
The design is very similar. One thing that changed is that the arbiter task
was removed, which reduced the number of context switches and locks in
the data path.

A TODO file is placed under the driver folder.

Thanks,
Kamal, Liran and Amir

[1] - http://www.spinics.net/lists/linux-rdma/msg08936.html
[2] - http://marc.info/?l=linux-rdma&m=145087562709661&w=2
[3] - https://github.com/SoftRoCE/rxe-dev
[4] - https://github.com/SoftRoCE/librxe-dev
[5] - https://github.com/SoftRoCE/rxe-dev/wiki/rxe-dev:-Home
[6] - http://downloads.openfabrics.org/Media/Sonoma2010/Sonoma_2010_Wednesday_rxe.pdf

Changes from V0:
- Rebased to 4.3-rc1
- IPv4 based sessions work
- Fixed the link speed and width we report to the query port verb
- Update the TODO file with Sagi's request

Changes from V1:
- Rebased to 4.4.0-rc6 and to Doug's k.o/for-4.5 github branch 
- Move driver to be under "drivers/staging/rdma/"

Amir Vadai (3):
  IB/core: Macro for RoCEv2 UDP port
  IB/rxe: Shared objects between user and kernel
  IB/rxe: TODO file while in staging

Kamal Heib (29):
  IB/core: Add SEND_LAST_INV and SEND_ONLY_INV opcodes
  IB/rxe: IBA header types and methods
  IB/rxe: Bit mask and lengths declaration for different opcodes
  IB/rxe: Default rxe device and port parameters
  IB/rxe: External interface to lower level modules
  IB/rxe: Misc local interfaces between files in ib_rxe
  IB/rxe: Add maintainer for rxe driver
  IB/rxe: Work 

Re: [PATCH rdma-next V2 00/32] Soft-RoCE driver

2015-12-24 Thread Christoph Hellwig
On Thu, Dec 24, 2015 at 11:17:46AM +0200, Kamal Heib wrote:
> We've located the driver in the staging subtree. This follows a requirement
> to implement an IB transport library - Soft RoCE is in the same boat as the
> hfi1 driver. We need to define and implement a lib to prevent this code
> duplication.

Given the trainwreck that the staging process is it might seems more
sensible to get it into a stage and then merge it directly.  You'll
probably save yourself a lot of work that way.


Re: [PATCH rdma-next V2 00/32] Soft-RoCE driver

2015-12-24 Thread Christoph Hellwig
On Thu, Dec 24, 2015 at 02:58:10PM +0200, Or Gerlitz wrote:
> On Thu, Dec 24, 2015 at 12:02 PM, Christoph Hellwig wrote:
> > On Thu, Dec 24, 2015 at 11:17:46AM +0200, Kamal Heib wrote:
> >> We've located the driver in the staging subtree. This follows a requirement
> >> to implement an IB transport library - Soft RoCE is in the same boat as the
> >> hfi1 driver. We need to define and implement a lib to prevent this code
> >> duplication.
> >
> > Given the trainwreck that the staging process is it might seems more
> > sensible to get it into a stage and then merge it directly.  You'll
> > probably save yourself a lot of work that way.
> 
> I am not sure what you mean by "get it into a stage and then merge it
> directly" -- does that mean not going through staging at all?

Sorry, I should not have finished that email in a hurry before leaving
the house.  Let me rephrase:

Given the trainwreck that the staging process is, it might be more
sensible to get it into shape and then merge it directly.  You'll
probably save yourself a lot of work that way.


Re: [PATCH rdma-next V2 00/32] Soft-RoCE driver

2015-12-24 Thread Or Gerlitz
On Thu, Dec 24, 2015 at 12:02 PM, Christoph Hellwig wrote:
> On Thu, Dec 24, 2015 at 11:17:46AM +0200, Kamal Heib wrote:
>> We've located the driver in the staging subtree. This follows a requirement
>> to implement an IB transport library - Soft RoCE is in the same boat as the
>> hfi1 driver. We need to define and implement a lib to prevent this code
>> duplication.
>
> Given the trainwreck that the staging process is it might seems more
> sensible to get it into a stage and then merge it directly.  You'll
> probably save yourself a lot of work that way.

I am not sure what you mean by "get it into a stage and then merge it
directly" -- does that mean not going through staging at all?

Or.