[RFC v3 00/11] HFI Virtual Network Interface Controller (VNIC)

2017-02-07 Thread Vishwanathapura, Niranjana
ChangeLog:
=
v2 => v3:
a) Introduce and adopt generic RDMA netdev interface including,
 - having bottom hfi1 driver directly interfacing with netstack.
 - optimizing interface between hfi_vnic and hfi1 driver.
b) Remove bitfield usage.
c) Move hfi_vnic driver to drivers/infiniband/ulp folder.
d) Minor fixes

v2: Initial post @ https://www.spinics.net/lists/linux-rdma/msg44110.html

Description:

Intel Omni-Path Host Fabric Interface (HFI) Virtual Network Interface
Controller (VNIC) feature supports Ethernet functionality over Omni-Path
fabric by encapsulating the Ethernet packets between HFI nodes.

Exchanges of Omni-Path encapsulated Ethernet packets involve one or more
virtual Ethernet switches overlaid on the Omni-Path fabric topology. A
subset of HFI nodes on the Omni-Path fabric are permitted to exchange
encapsulated Ethernet packets across a particular virtual Ethernet switch.
The virtual Ethernet switches are logical abstractions achieved by
configuring the HFI nodes on the fabric for header generation and
processing. In the simplest configuration, all HFI nodes across the fabric
exchange encapsulated Ethernet packets over a single virtual Ethernet
switch. A virtual Ethernet switch is effectively an independent Ethernet
network. The configuration is performed by an Ethernet Manager (EM), which
is part of the trusted Fabric Manager (FM) application. HFI nodes can have
multiple VNICs, each connected to a different virtual Ethernet switch. The
diagram below presents a case of two virtual Ethernet switches with two
HFI nodes.

                      +-------------------+
                      |      Subnet/      |
                      |     Ethernet      |
                      |      Manager      |
                      +-------------------+
                         /             \
                       /                 \
                     /                     \
                   /                         \
+----------------------------+    +----------------------------+
|  Virtual Ethernet Switch   |    |  Virtual Ethernet Switch   |
|  +--------+    +--------+  |    |  +--------+    +--------+  |
|  | VPORT  |    | VPORT  |  |    |  | VPORT  |    | VPORT  |  |
+--+--------+----+--------+--+    +--+--------+----+--------+--+
       |               \               /               |
       |                  \         /                  |
       |                     \   /                     |
       |                     /   \                     |
       |                  /         \                  |
       |               /               \               |
   +-----------+-----------+         +-----------+-----------+
   |   VNIC    |   VNIC    |         |   VNIC    |   VNIC    |
   +-----------+-----------+         +-----------+-----------+
   |          HFI          |         |          HFI          |
   +-----------------------+         +-----------------------+


The Intel HFI VNIC software design is presented in the diagram below.
HFI VNIC functionality has a HW dependent component and a HW independent
component.

Support has been added for an IB device to allocate and free RDMA netdev
devices. The RDMA netdev interfaces with the network stack, thus creating
standard network interfaces. HFI_VNIC is one such RDMA netdev device type.
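
For illustration only, here is a minimal sketch of how a ULP could request
such an RDMA netdev from an IB device; the callback type, enum and names
below are assumptions made for this sketch, not the exact interface
introduced by the patches:

/*
 * Illustrative sketch only: an IB device is assumed to expose a callback
 * that hands the caller a fully initialized net_device of the requested
 * RDMA netdev type.  The names and the exact signature below are
 * assumptions, not the definitive interface added by these patches.
 */
#include <linux/err.h>
#include <linux/errno.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <rdma/ib_verbs.h>

enum sketch_rdma_netdev_t {
        SKETCH_RDMA_NETDEV_HFI_VNIC,    /* HW accelerated VNIC netdev */
};

/* Hypothetical allocation callback assumed to be provided by the HW driver */
typedef struct net_device *(*sketch_alloc_rdma_netdev_t)(
                struct ib_device *ibdev, u8 port_num,
                enum sketch_rdma_netdev_t type, const char *name,
                unsigned char name_assign_type,
                void (*setup)(struct net_device *));

static struct net_device *sketch_vnic_alloc(struct ib_device *ibdev, u8 port,
                                            sketch_alloc_rdma_netdev_t alloc)
{
        if (!alloc)
                return ERR_PTR(-EOPNOTSUPP);

        /* The HW driver (hfi1) allocates the netdev and sets its initial
         * net_device_ops; the ULP may later override some of them. */
        return alloc(ibdev, port, SKETCH_RDMA_NETDEV_HFI_VNIC,
                     "veth%d", NET_NAME_UNKNOWN, ether_setup);
}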

The HW dependent VNIC functionality is part of the HFI1 driver. It
implements the verbs to allocate and free the HFI_VNIC RDMA netdev.
It involves HW resource allocation/management for VNIC functionality.
It interfaces with the network stack and implements the required
net_device_ops functions. It expects Omni-Path encapsulated Ethernet
packets in the transmit path and provides HW access to them. It strips
the Omni-Path header from the received packets before passing them up
the network stack. It also implements the RDMA netdev control operations.
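
A rough sketch of the shape of that receive path is shown below; the header
length and function names are assumptions for illustration, not the actual
hfi1 code:

#include <linux/skbuff.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>

/* Assumed size of the Omni-Path encapsulation header; illustrative only */
#define SKETCH_OPA_VNIC_HDR_LEN 8

/* Hypothetical receive handler on the HW dependent (hfi1) side: strip the
 * Omni-Path header and hand the plain Ethernet frame to the stack. */
static void sketch_hfi_vnic_rx(struct net_device *netdev, struct sk_buff *skb)
{
        /* drop the Omni-Path encapsulation prepended by the sender */
        skb_pull(skb, SKETCH_OPA_VNIC_HDR_LEN);

        /* let the stack see a normal Ethernet frame */
        skb->protocol = eth_type_trans(skb, netdev);
        netif_rx(skb);
}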

The HFI VNIC module implements the HW independent VNIC functionality.
It consists of two parts. The VNIC Ethernet Management Agent (VEMA)
registers itself with the IB core as an IB client and interfaces with the
IB MAD stack. It exchanges the management information with the Ethernet
Manager (EM) and the VNIC netdev. The VNIC netdev part allocates and frees
the HFI_VNIC RDMA netdev devices. It overrides the net_device_ops functions
set by the HW dependent VNIC driver where required to accommodate any
control operation. It also handles the encapsulation of Ethernet packets
with an Omni-Path header in the transmit path. For each VNIC interface, the
information required for encapsulation is configured by the EM via the VEMA
MAD interface. It also passes any control information to the HW dependent
driver by invoking the RDMA netdev control operations.
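
The transmit side encapsulation could look roughly like the sketch below;
the header layout, field names and the saved xmit pointer are assumptions
made for illustration, not the definitions used by these patches:

#include <linux/skbuff.h>
#include <linux/netdevice.h>
#include <linux/string.h>

/* Hypothetical Omni-Path encapsulation header; the real layout differs. */
struct sketch_opa_hdr {
        __be32 dlid_slid;       /* fabric addressing for the virtual switch */
        __be16 pkey;            /* partition key of the virtual switch */
        __be16 len_sc;          /* payload length / service class */
};

/* Assumed to hold the HW driver's original ndo_start_xmit, saved when the
 * ULP overrode the net_device_ops. */
static netdev_tx_t (*sketch_hw_start_xmit)(struct sk_buff *skb,
                                           struct net_device *dev);

static netdev_tx_t sketch_vnic_xmit(struct sk_buff *skb,
                                    struct net_device *dev)
{
        struct sketch_opa_hdr *hdr;

        /* make sure there is writable headroom for the encapsulation */
        if (skb_cow_head(skb, sizeof(*hdr))) {
                dev_kfree_skb_any(skb);
                return NETDEV_TX_OK;
        }

        hdr = (struct sketch_opa_hdr *)skb_push(skb, sizeof(*hdr));
        memset(hdr, 0, sizeof(*hdr));
        /* ... fill hdr from the per-VNIC parameters configured by the EM ... */

        /* hand the encapsulated frame to the HW dependent driver */
        return sketch_hw_start_xmit(skb, dev);
}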

           +---------------+    +------------------+
           |               |    |      Linux       |
           |    IB MAD     |    |     Network      |
           |               |    |      Stack       |
           +---------------+    +------------------+
           [rest of the software design diagram not recovered: the HW
            independent HFI VNIC module (VEMA and VNIC netdev) sits below
            these and above the HW dependent VNIC support in hfi1]

Re: [RFC v3 00/11] HFI Virtual Network Interface Controller (VNIC)

2017-02-22 Thread Vishwanathapura, Niranjana

On Mon, Feb 13, 2017 at 10:09:35AM -0700, Jason Gunthorpe wrote:
> On Sun, Feb 12, 2017 at 01:26:35PM +, Liran Liss wrote:
> > > From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> > > ow...@vger.kernel.org] On Behalf Of Vishwanathapura, Niranjana
> >
> > > ChangeLog:
> > > =
> > > v2 => v3:
> > > a) Introduce and adopt generic RDMA netdev interface including,
> > >  - having bottom hfi1 driver directly interfacing with netstack.
> > >  - optimizing interface between hfi_vnic and hfi1 driver.
> > > b) Remove bitfield usage.
> > > c) Move hfi_vnic driver to drivers/infiniband/ulp folder.
> >
> > The vnic driver should be placed under drivers/infiniband/hw/hfi1/*
> > since it is HFI-specific.
>
> I think they should call it opa_vnic and keep it in ulp to avoid this
> confusion.
>
> Jason

Alright, I am renaming it as opa_vnic to avoid any confusion.

Thanks,
Niranjana


Re: [RFC v3 00/11] HFI Virtual Network Interface Controller (VNIC)

2017-02-07 Thread Leon Romanovsky
On Tue, Feb 07, 2017 at 12:22:59PM -0800, Vishwanathapura, Niranjana wrote:
> ChangeLog:
> =
> v2 => v3:
> a) Introduce and adopt generic RDMA netdev interface including,
>  - having bottom hfi1 driver directly interfacing with netstack.
>  - optimizing interface between hfi_vnic and hfi1 driver.
> b) Remove bitfield usage.
> c) Move hfi_vnic driver to drivers/infiniband/ulp folder.

I didn't read patches yet, and prefer to ask it in advance. Does this new ULP 
work with all
drivers/infiniband/hw/* devices as it is expected from ULP?

Thanks




RE: [RFC v3 00/11] HFI Virtual Network Interface Controller (VNIC)

2017-02-07 Thread Hefty, Sean
> I didn't read patches yet, and prefer to ask it in advance. Does this
> new ULP work with all
> drivers/infiniband/hw/* devices as it is expected from ULP?

Like the way ipoib or srp work with all hw devices?  What is the real point of 
this question?


Re: [RFC v3 00/11] HFI Virtual Network Interface Controller (VNIC)

2017-02-07 Thread Bart Van Assche
On Tue, 2017-02-07 at 12:22 -0800, Vishwanathapura, Niranjana wrote:
> Intel Omni-Path Host Fabric Interface (HFI) Virtual Network Interface
> Controller (VNIC) feature supports Ethernet functionality over Omni-Path
> fabric by encapsulating the Ethernet packets between HFI nodes.

This may have been stated before, but what is missing from this description
is an explanation of why accepting an Ethernet over RDMA driver in the
upstream kernel is considered useful. We already have an IPoIB driver, so
why to add another driver to the kernel tree for communicating IP packets
over an RDMA network?

Bart.


Re: [RFC v3 00/11] HFI Virtual Network Interface Controller (VNIC)

2017-02-07 Thread Vishwanathapura, Niranjana

On Tue, Feb 07, 2017 at 01:00:05PM -0800, Hefty, Sean wrote:
> > I didn't read patches yet, and prefer to ask it in advance. Does this
> > new ULP work with all
> > drivers/infiniband/hw/* devices as it is expected from ULP?
>
> Like the way ipoib or srp work with all hw devices?  What is the real point
> of this question?

Leon,
It was already discussed in below threads.

https://www.spinics.net/lists/linux-rdma/msg44128.html
https://www.spinics.net/lists/linux-rdma/msg44131.html
https://www.spinics.net/lists/linux-rdma/msg44155.html

Niranjana



RE: [RFC v3 00/11] HFI Virtual Network Interface Controller (VNIC)

2017-02-07 Thread Hefty, Sean
> This may have been stated before, but what is missing from this
> description
> is an explanation of why accepting an Ethernet over RDMA driver in the
> upstream kernel is considered useful. We already have an IPoIB driver,
> so
> why to add another driver to the kernel tree for communicating IP
> packets
> over an RDMA network?

This is Ethernet - not IP - encapsulation over a non-InfiniBand device/protocol.


Re: [RFC v3 00/11] HFI Virtual Network Interface Controller (VNIC)

2017-02-07 Thread Bart Van Assche
On Tue, 2017-02-07 at 21:44 +, Hefty, Sean wrote:
> This is Ethernet - not IP - encapsulation over a non-InfiniBand 
> device/protocol.

That's more than clear from the cover letter. In my opinion the cover letter
should explain why it is considered useful to have such a driver upstream
and what the use cases are of encapsulating Ethernet frames inside RDMA
packets.

Bart.


Re: [RFC v3 00/11] HFI Virtual Network Interface Controller (VNIC)

2017-02-07 Thread Vishwanathapura, Niranjana

On Tue, Feb 07, 2017 at 09:58:50PM +, Bart Van Assche wrote:
> On Tue, 2017-02-07 at 21:44 +, Hefty, Sean wrote:
> > This is Ethernet - not IP - encapsulation over a non-InfiniBand
> > device/protocol.
>
> That's more than clear from the cover letter. In my opinion the cover letter
> should explain why it is considered useful to have such a driver upstream
> and what the use cases are of encapsulating Ethernet frames inside RDMA
> packets.

We believe on our HW, HFI VNIC design gives better hardware resource usage
which is also scalable and hence room for better performance.
Also as evident in the cover letter, it gives us better manageability by
defining virtual Ethernet switches overlaid on the fabric and
use standard Ethernet support provided by Linux.

Niranjana




Re: [RFC v3 00/11] HFI Virtual Network Interface Controller (VNIC)

2017-02-07 Thread Bart Van Assche
On Tue, 2017-02-07 at 16:54 -0800, Vishwanathapura, Niranjana wrote:
> On Tue, Feb 07, 2017 at 09:58:50PM +, Bart Van Assche wrote:
> > On Tue, 2017-02-07 at 21:44 +, Hefty, Sean wrote:
> > > This is Ethernet - not IP - encapsulation over a non-InfiniBand 
> > > device/protocol.
> > 
> > That's more than clear from the cover letter. In my opinion the cover letter
> > should explain why it is considered useful to have such a driver upstream
> > and what the use cases are of encapsulating Ethernet frames inside RDMA
> > packets.
> 
> We believe on our HW, HFI VNIC design gives better hardware resource usage 
> which is also scalable and hence room for better performance.
> Also as evident in the cover letter, it gives us better manageability by 
> defining virtual Ethernet switches overlaid on the fabric and
> use standard Ethernet support provided by Linux.

That kind of language is appropriate for a marketing brochure but not for a
technical forum. Even reading your statement twice did not make me any wiser.
You mentioned "better hardware resource usage". Compared to what? Is that
perhaps compared to IPoIB? Since Ethernet frames have an extra header and are
larger than IPoIB frames, how can larger frames result in better hardware
resource usage? And what is a virtual Ethernet switch? Is this perhaps packet
forwarding by software? If so, why are virtual Ethernet switches needed since
the Linux networking stack already supports packet forwarding?

Thanks,

Bart.


Re: [RFC v3 00/11] HFI Virtual Network Interface Controller (VNIC)

2017-02-07 Thread Leon Romanovsky
On Tue, Feb 07, 2017 at 01:43:03PM -0800, Vishwanathapura, Niranjana wrote:
> On Tue, Feb 07, 2017 at 01:00:05PM -0800, Hefty, Sean wrote:
> > > I didn't read patches yet, and prefer to ask it in advance. Does this
> > > new ULP work with all
> > > drivers/infiniband/hw/* devices as it is expected from ULP?
> >
> > Like the way ipoib or srp work with all hw devices?  What is the real point 
> > of this question?
>
> Leon,
> It was already discussed in below threads.
>
> https://www.spinics.net/lists/linux-rdma/msg44128.html
> https://www.spinics.net/lists/linux-rdma/msg44131.html
> https://www.spinics.net/lists/linux-rdma/msg44155.html

Yes, but you still didn't answer my question.
From the first link:
--
  If that is your position then this should be a straight up IB ULP that
  works with any IB hardware.

Yes, see my comments in point #3 of my previous email...
--

Can I grab these patches and run them on one of the 14 drivers available in
drivers/infiniband/hw/* ?

>
> Niranjana
>




Re: [RFC v3 00/11] HFI Virtual Network Interface Controller (VNIC)

2017-02-07 Thread Leon Romanovsky
On Tue, Feb 07, 2017 at 09:00:05PM +, Hefty, Sean wrote:
> > I didn't read patches yet, and prefer to ask it in advance. Does this
> > new ULP work with all
> > drivers/infiniband/hw/* devices as it is expected from ULP?
>
> Like the way ipoib or srp work with all hw devices?  What is the real point 
> of this question?

Sorry, but I don't understand your response. Both IPoIB and SRP were
standardized and implemented years before hfi was brought into the RDMA
stack, so at the time of their introduction they clearly supported all the
devices.

Does this VNIC interface have a standard? Where can I see the HFI wire
protocol, so we can implement HFI VNIC support in our HW?

Thanks.




RE: [RFC v3 00/11] HFI Virtual Network Interface Controller (VNIC)

2017-02-07 Thread Weiny, Ira
> 
> On Tue, 2017-02-07 at 16:54 -0800, Vishwanathapura, Niranjana wrote:
> > On Tue, Feb 07, 2017 at 09:58:50PM +, Bart Van Assche wrote:
> > > On Tue, 2017-02-07 at 21:44 +, Hefty, Sean wrote:
> > > > This is Ethernet - not IP - encapsulation over a non-InfiniBand
> device/protocol.
> > >
> > > That's more than clear from the cover letter. In my opinion the
> > > cover letter should explain why it is considered useful to have such
> > > a driver upstream and what the use cases are of encapsulating
> > > Ethernet frames inside RDMA packets.
> >
> > We believe on our HW, HFI VNIC design gives better hardware resource
> > usage which is also scalable and hence room for better performance.
> > Also as evident in the cover letter, it gives us better manageability
> > by defining virtual Ethernet switches overlaid on the fabric and use
> > standard Ethernet support provided by Linux.
> 
> That kind of language is appropriate for a marketing brochure but not for a
> technical forum.

Well, that is not totally true.  Perhaps we could give more detail on how we
get better performance, but we thought this had been covered already.

> Even reading your statement twice did not make me any wiser.
> You mentioned "better hardware resource usage". Compared to what? Is that
> perhaps compared to IPoIB?  Since Ethernet frames have an extra header and
> are larger than IPoIB frames, how can larger frames result in better hardware
> resource usage? 

Yes, as compared to IPoIB.  The problem with IPoIB is that it introduces a
significant amount of Verbs overhead which is not needed for Ethernet
encapsulation, especially on hardware such as ours.  As Jason has mentioned,
having a more generic "skb_send" or "skb_qp" has been discussed in the past.

As we discussed at the plumbers conference, not all send/receive paths are
"Queue Pairs".  Yes, we have a send queue (multiple send queues actually) and
a recv queue (again multiple queues), but there is no pairing of the queues
at all.  There are no completion semantics required either.  This reduced
overhead results in better performance on our hardware.
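
(Purely as an illustration of the idea, not our actual data structures, the
resources could be pictured something like this:)

#include <linux/types.h>

/* Rough sketch of the idea only: independent send and receive contexts with
 * no QP-style pairing and no completion queue.  All names are assumptions. */
struct sketch_vnic_txq {
        void *descs;            /* ring of transmit descriptors */
        u16 head, tail;
};

struct sketch_vnic_rxq {
        void *bufs;             /* ring of receive buffers */
        u16 head, tail;
};

struct sketch_vnic_port {
        struct sketch_vnic_txq *txqs;   /* several independent send queues */
        struct sketch_vnic_rxq *rxqs;   /* several independent receive queues */
        u8 num_txqs;
        u8 num_rxqs;                    /* no pairing, no completion queues */
};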

> And what is a virtual Ethernet switch? Is this perhaps packet
> forwarding by software? If so, why are virtual Ethernet switches needed since
> the Linux networking stack already supports packet forwarding?

Virtual Ethernet switches provide packet switching through the native OPA
switches via OPA Virtual Fabrics (a tuple of the path information including
lid/pkey/sl/mtu).  This is not packet forwarding within the node.  A large
advantage here is that the virtual switches are centrally managed by the EM
in a very scalable way.  For example, the IPoIB configuration semantics such
as multicast group join/create, Path Record queries, etc. are all eliminated,
further reducing overhead.
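
(Again just as an illustration, the path tuple above could be modeled roughly
like this; the struct name and field widths are assumptions, not OPA's actual
management structures:)

#include <linux/types.h>

/* Illustration only: one virtual Ethernet switch path entry as a tuple of
 * fabric path information.  Names and widths are assumptions. */
struct sketch_vesw_path {
        u32 dlid;       /* destination LID of the remote HFI */
        u16 pkey;       /* partition key identifying the virtual fabric */
        u16 mtu;        /* path MTU */
        u8 sl;          /* service level */
};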

Ira



RE: [RFC v3 00/11] HFI Virtual Network Interface Controller (VNIC)

2017-02-08 Thread Hefty, Sean
> > Even reading your statement twice did not make me any wiser.
> > You mentioned "better hardware resource usage". Compared to what? Is
> that
> > perhaps compared to IPoIB?  Since Ethernet frames have an extra
> header and
> > are larger than IPoIB frames, how can larger frames result in better
> hardware
> > resource usage?
> 
> Yes, as compared to IPoIB.  The problem with IPoIB is it introduces a
> significant amount of Verbs overhead which is not needed for Ethernet
> encapsulation.  Especially on hardware such as ours.  As Jason has
> mentioned having a more generic "skb_send" or "skb_qp" has been
> discussed in the past.

Let's start discussing whether ipoib should be in the upstream kernel while 
we're at this nonsense.

The IBTA chose to encapsulate IP over IB transport as their mechanism for 
supporting traditional socket based applications.  OPA chose Ethernet 
encapsulated over OPA link layer.  RDMA isn't involved with OPA.  This is a 
pointless discussion over architecture.



Re: [RFC v3 00/11] HFI Virtual Network Interface Controller (VNIC)

2017-02-08 Thread Jason Gunthorpe
On Tue, Feb 07, 2017 at 04:54:16PM -0800, Vishwanathapura, Niranjana wrote:
> On Tue, Feb 07, 2017 at 09:58:50PM +, Bart Van Assche wrote:
> >On Tue, 2017-02-07 at 21:44 +, Hefty, Sean wrote:
> >>This is Ethernet - not IP - encapsulation over a non-InfiniBand 
> >>device/protocol.
> >
> >That's more than clear from the cover letter. In my opinion the cover letter
> >should explain why it is considered useful to have such a driver upstream
> >and what the use cases are of encapsulating Ethernet frames inside RDMA
> >packets.
> >
> 
> We believe on our HW, HFI VNIC design gives better hardware resource usage
> which is also scalable and hence room for better performance.

Let's not go overboard here: you are mainly getting better performance
because vnic has the new netdev interface to the HFI driver. There is
nothing stopping ipoib from having the same capability and a similar
HFI driver implementation.

> Also as evident in the cover letter, it gives us better manageability by
> defining virtual Ethernet switches overlaid on the fabric and
> use standard Ethernet support provided by Linux.

This is true and probably the most important reason.

Jason


RE: [RFC v3 00/11] HFI Virtual Network Interface Controller (VNIC)

2017-02-12 Thread Liran Liss
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> ow...@vger.kernel.org] On Behalf Of Vishwanathapura, Niranjana

> 
> ChangeLog:
> =
> v2 => v3:
> a) Introduce and adopt generic RDMA netdev interface including,
>  - having bottom hfi1 driver directly interfacing with netstack.
>  - optimizing interface between hfi_vnic and hfi1 driver.
> b) Remove bitfield usage.
> c) Move hfi_vnic driver to drivers/infiniband/ulp folder.

The vnic driver should be placed under drivers/infiniband/hw/hfi1/* since it is 
HFI-specific.
--Liran



Re: [RFC v3 00/11] HFI Virtual Network Interface Controller (VNIC)

2017-02-13 Thread Jason Gunthorpe
On Sun, Feb 12, 2017 at 01:26:35PM +, Liran Liss wrote:
> > From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> > ow...@vger.kernel.org] On Behalf Of Vishwanathapura, Niranjana
> 
> > 
> > ChangeLog:
> > =
> > v2 => v3:
> > a) Introduce and adopt generic RDMA netdev interface including,
> >  - having bottom hfi1 driver directly interfacing with netstack.
> >  - optimizing interface between hfi_vnic and hfi1 driver.
> > b) Remove bitfield usage.
> > c) Move hfi_vnic driver to drivers/infiniband/ulp folder.
> 
> The vnic driver should be placed under drivers/infiniband/hw/hfi1/*
> since it is HFI-specific.

I think they should call it opa_vnic and keep it in ulp to avoid this
confusion.

Jason