Re: [PATCH for-next V2 00/11] Add RoCE v2 support

2015-12-17 Thread Moni Shoua
> No, not true.  You are implementing RoCEv2 support, which is an entirely
> new feature.  So this feature can't have had a security hole since
> forever as it has never been in the kernel before now.  The objections
> are arising because of the ordering of events.  Specifically, we added
> the core namespace support (even though it isn't complete, so far it's
> the infrastructure ready for various upper portions of the stack to
> start using, but it isn't a complete stack wide solution yet) first, and
> so this new feature, which will need to be a part of that namespace
> infrastructure that other parts of the IB stack can use, should have its
> namespace support already enabled (ideally, but if it didn't, it should
> at least have a clear plan for how to enable it in the future).  Jason's
> objection is based upon this premise and the fact that a technical
> review of the code makes it look like the core namespace infrastructure
> becomes less complete, not more, with the inclusion of these patches.
>
> As I understand it, prior to these patches there would always be a 1:1
> mapping of GID to gid_index because you would never have duplicate GIDs
> in the GID table.  That allowed an easy, definitive 1:1 mapping of GID
> to namespace via the existing infrastructure for any received packet [1].
>

Incorrect. Even with RoCEv1 you can have 2 GID table entries with the same
GID value but with different VLANs. The conflict in looking for a matching index
was resolved by making the pair  as a key for this table. The value
of VLAN was obtained from the completion BTW. All we do now is widen the
definition to include network_type (or L3 protocol if you like) to the
key (and therefore
to the completion structure)

> These patches add the concept of duplicate GIDs that are differentiated
> by their RoCE version (also called network type).  So, now, an incoming
> packet could match a couple different gid_indexes and we need additional
> information to get back to the definitive 1:1 mapping.  The submitted
> patches are designed around a lazy resolution of the namespace,
> preferring to defer the work of mapping the incoming packet to a unique
> namespace until that information is actually needed.  To enable this
> lazy resolution, it provides the network_type so that the resolution can
> be done.
>
You could say that even for native IB. Now, you resolve sgid_index
only when you really
need it (with ib_init_ah_from_wc()) and you don't fill the completion
structure with gid_index
every time you poll.

> This is a fair assessment of the current state of things and what these
> patches do, yes?
>
> Jason's objections are this:
>
> 1)  The lazy resolution is wrong.
> 2)  The use of network_type as the additional information to get to the
> unique namespace is vendor specific cruft that shouldn't be part of the
> core kernel API.

Not true. This is not vendor specific information. Spec says that network_type
be a part of the completion. Honestly, I really don't understand the resistance
from implementing the spec as it is written.
>
> Jason's preference would be that the above issues be resolved by
> skipping the lazy resolution and instead doing proactive resolution on
> receipt of a packet and then probably just pass the namespace around
> instead of passing around the information needed to resolve the
> namespace.  Or, at a minimum, at least make the information added to the
> core API not something vendor specific like network_type, which is a
> detail of the Mellanox implementation.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH for-next V2 00/11] Add RoCE v2 support

2015-12-17 Thread Liran Liss
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> ow...@vger.kernel.org] On Behalf Of Doug Ledford

> These patches add the concept of duplicate GIDs that are differentiated by
> their RoCE version (also called network type).  So, now, an incoming packet
> could match a couple different gid_indexes and we need additional
> information to get back to the definitive 1:1 mapping.  The submitted patches
> are designed around a lazy resolution of the namespace, preferring to defer
> the work of mapping the incoming packet to a unique namespace until that
> information is actually needed.  To enable this lazy resolution, it provides 
> the
> network_type so that the resolution can be done.
> 
> This is a fair assessment of the current state of things and what these
> patches do, yes?

Hi Doug,
Yes, your description nails the crux of the discussion.
Thanks.

> 
> Jason's objections are this:
> 
> 1)  The lazy resolution is wrong.
> 2)  The use of network_type as the additional information to get to the
> unique namespace is vendor specific cruft that shouldn't be part of the core
> kernel API.
> 

Which is totally wrong. RoCE versions and the network type attribute are part 
of the RoCE specification.
There is nothing vendor specific. In fact, this patch initially came from 
Avago!!!

> Jason's preference would be that the above issues be resolved by skipping
> the lazy resolution and instead doing proactive resolution on receipt of a
> packet and then probably just pass the namespace around instead of passing
> around the information needed to resolve the namespace.  Or, at a
> minimum, at least make the information added to the core API not
> something vendor specific like network_type, which is a detail of the
> Mellanox implementation.

Again, there is nothing here specific to the Mellanox implementation.
This argument is invalid.

> 
> Jason, is this accurate for your position?
> 
> If everyone agrees that this is a fair statement of where we stand, then I'll
> continue my response.  If not, please correct anything I have wrong above
> and I'll take that into my continued response.
> 

Since we are following the standard, and:
- This is *not* a vendor specific issue
- Lazy resolution adheres to the current flow of the stack (which IMO, is the 
right thing to do anyway)
- There are no security issues whatsoever (see below)
- The code could be easily extended to support namespaces in the future
- We should "release early release often", and not hold progress because of 
endless philosophical discussions
then let's apply these patches.

I perfectly understand Jason's proposal and its merits.
I just don't agree that it is a better approach, and in any case this is not 
the time to continue discussing it.

> 1 - Actually, for any received packet with associated IP address information.
> We've only enabled net namespaces for IP connections between user space
> applications, for direct IB connections or for kernel connections there is not
> yet any namespace support.

Correct.
That's why we are not risking anything in the KAPI with regard to namespaces; 
there is no security issue.

--Liran

> 
> --
> Doug Ledford 
>   GPG KeyID: 0E572FDD
> 

N�r��yb�X��ǧv�^�)޺{.n�+{��ٚ�{ay�ʇڙ�,j��f���h���z��w���
���j:+v���w�j�mzZ+�ݢj"��!�i

Re: [PATCH for-next V2 00/11] Add RoCE v2 support

2015-12-16 Thread Doug Ledford
On 12/16/2015 01:56 AM, Moni Shoua wrote:
>> The part that bothers me about this is that this statement makes sense
>> when just thinking about the spec, as you say.  However, once you
>> consider namespaces, security implications make this statement spec
>> compliant, but still unacceptable.  The spec itself is silent on
>> namespaces.  But, you guys wanted, and you got, namespace support.
>> Since that's beyond spec, and carries security requirements, I think
>> it's fair to say that from now on, the Linux kernel RDMA stack can no
>> longer *just* be spec compliant.  There are additional concerns that
>> must always be addressed with new changes, and those are the namespace
>> constraint preservation concerns.
> 
> I can't object to that but I really would like to get an example of a
> security risk.

*This* is exactly the conversation to be having right now.  The
namespace support has been added to the core, and so now we need to
define exactly what the impact of that is for new feature submissions
like this one.  More on that below...

> So far, besides hearing that the way we choose to handle completions
> is wrong, I didn't get a convincing example of how where it doesn't
> work.

Work is too fuzzy of a word to use here.  It could mean "applications
keep running", but that could be contrary to the namespace restrictions
as it may be that the application should *not* have continued to run
when namespace considerations were taken into account.

> Moreover, regarding security, all we wanted is for HW to report
> the L3 protocol (IB, IPv4, or IPv6) in the packet. This is data that
> with some extra CPU cycles can be obtained from the 40 bytes that are
> scattered to the receive bufs anyway. So, if there is a security hole
> it exists from day one of the IB stack and this is not the time we
> should insist on fixing it.

No, not true.  You are implementing RoCEv2 support, which is an entirely
new feature.  So this feature can't have had a security hole since
forever as it has never been in the kernel before now.  The objections
are arising because of the ordering of events.  Specifically, we added
the core namespace support (even though it isn't complete, so far it's
the infrastructure ready for various upper portions of the stack to
start using, but it isn't a complete stack wide solution yet) first, and
so this new feature, which will need to be a part of that namespace
infrastructure that other parts of the IB stack can use, should have its
namespace support already enabled (ideally, but if it didn't, it should
at least have a clear plan for how to enable it in the future).  Jason's
objection is based upon this premise and the fact that a technical
review of the code makes it look like the core namespace infrastructure
becomes less complete, not more, with the inclusion of these patches.

As I understand it, prior to these patches there would always be a 1:1
mapping of GID to gid_index because you would never have duplicate GIDs
in the GID table.  That allowed an easy, definitive 1:1 mapping of GID
to namespace via the existing infrastructure for any received packet [1].

These patches add the concept of duplicate GIDs that are differentiated
by their RoCE version (also called network type).  So, now, an incoming
packet could match a couple different gid_indexes and we need additional
information to get back to the definitive 1:1 mapping.  The submitted
patches are designed around a lazy resolution of the namespace,
preferring to defer the work of mapping the incoming packet to a unique
namespace until that information is actually needed.  To enable this
lazy resolution, it provides the network_type so that the resolution can
be done.

This is a fair assessment of the current state of things and what these
patches do, yes?

Jason's objections are this:

1)  The lazy resolution is wrong.
2)  The use of network_type as the additional information to get to the
unique namespace is vendor specific cruft that shouldn't be part of the
core kernel API.

Jason's preference would be that the above issues be resolved by
skipping the lazy resolution and instead doing proactive resolution on
receipt of a packet and then probably just pass the namespace around
instead of passing around the information needed to resolve the
namespace.  Or, at a minimum, at least make the information added to the
core API not something vendor specific like network_type, which is a
detail of the Mellanox implementation.

Jason, is this accurate for your position?

If everyone agrees that this is a fair statement of where we stand, then
I'll continue my response.  If not, please correct anything I have wrong
above and I'll take that into my continued response.

1 - Actually, for any received packet with associated IP address
information.  We've only enabled net namespaces for IP connections
between user space applications, for direct IB connections or for kernel
connections there is not yet any namespace support.

-- 
Doug Ledford 

Re: [PATCH for-next V2 00/11] Add RoCE v2 support

2015-12-16 Thread Jason Gunthorpe
On Wed, Dec 16, 2015 at 08:56:01AM +0200, Moni Shoua wrote:

> I can't object to that but I really would like to get an example of a
> security risk.

How can anyone give you an example when nobody knows exactly how mlx
hardware works in this area?

>From an kapi prespective, the security design is very simple.

Every single UD packet the kapi side has to process must be
unambiguously associated with a gid_index or dropped. Period full
stop. I would think that is an obvious conclusion based on the design
of the gid cache.

This is why we need a clear API to get this critical information. It
should not be open coded in init_ah_from_wc, it should not be done
some other way in the CMA code.

This is a simple matter of sane kapi design, nothing more.

I have no idea why this is so objectionable.

> scattered to the receive bufs anyway. So, if there is a security hole
> it exists from day one of the IB stack and this is not the time we
> should insist on fixing it.

IB isn't interacting with the net stack in the same way rocev2 is, so
this is not a pre-existing problem.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next V2 00/11] Add RoCE v2 support

2015-12-16 Thread Jason Gunthorpe
On Wed, Dec 16, 2015 at 09:57:02AM +, Liran Liss wrote:
> Currently, namespaces are not supported for RoCE.

IMHO, we should not be accepting rocev2 without at least basic
namespace support too, since it is fairly trivial to do based on the
work that is already done for verbs. An obvious missing piece is the
'wc to gid index' API I keep asking for.

> That said, we have everything we need for RoCE namespace support when we get 
> there.

Then there is no problem with the 'wc to gid index' stuff, so stop
complaining about it.

> All of this has nothing to do with "broken" and enshrining anything in the 
> kapi.
> That's just bullshit.

No, it is a critique of the bad kAPI choices in this patch that mean
it broadly doesn't use namespaces, net devices or IP routing
correctly.

> The design of the RDMA stack is that Verbs are used by core IB
> services, such as addressing.  For these services, as the
> specification requires, all relevant fields must be reported in the
> CQE, period.  All spec-compliant HW devices follow this.

Wrong, the kapi needs to meet the needs of the kernel, and is
influenced but not set by the various standards.

That means we get to make better choices in the kapi than exposing
wc.network_type.

> If a ULP wants to create an address handle from a completion, there
> are service routines to accomplish that, based on the reported
> fields.  If it doesn't care, there is no reason to sacrifice
> performance.

I have no idea why you think there would be a performance sacrifice,
maybe you should review the patches and my remarks again.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next V2 00/11] Add RoCE v2 support

2015-12-16 Thread Jason Gunthorpe
On Wed, Dec 16, 2015 at 03:39:16PM -0500, Doug Ledford wrote:

> These patches add the concept of duplicate GIDs that are differentiated
> by their RoCE version (also called network type).

and by vlan, and smac, and ... Basically everything network unique about
a namespace has to be encapsulted in the gid index now. Each namespace
thus has a subset of gid indexes that are valid for it to use for
outbound and to recieve packets on.

roce didn't really have a way to work with net namespaces, AFAIK (?)
so it gets a pass.

But rocev2 very clearly does. It needs needs to address the issue
outlined in commit b8cab5dab15ff5c2acc3faefdde28919b0341c11 (IB/cma:
Accept connection without a valid netdev on RoCE)

That means cma.c needs to get the gid index every single CMA packet it
processes and confirm that the associated net device is permitted to
talk to the matching CM ID. 

It is no mistake there is a hole in cma.c waiting for this code, when
Haggai did that work it was very clear in my mind that rocev2 would
need to slot into here as well.

> Jason's objections are this:
> 
> 1)  The lazy resolution is wrong.

Wrong in the sense it doesn't actually exist in a usable form
anyplace.

cma.c does not do it, and absoultely must as discussed above.

init_ah_from_wc needs to do it, and maybe does. It is hard to tell,
perhaps a 'rdma_wc_to_dgid_index()' is actually open coded in there
now. Just from a code readability perspective that is ugly.

Then we get into the missing route handling in all places that
construct a rocev2 AH...

> Jason's preference would be that the above issues be resolved by
> skipping the lazy resolution and instead doing proactive resolution
> on

I am happy with lazy resolution, that is a fine compromise.

I just want to see kapi that makes sense here. It is very clear to me
no kernel user can possibly correctly touch a rocev2 UD packet without
retrieving the gid index, so we must have a kAPI for this.

> namespace.  Or, at a minimum, at least make the information added to the
> core API not something vendor specific like network_type, which is a
> detail of the Mellanox implementation.

I keep suggesting a rdma_wc_to_dgid_index() API call.

Perhasp most of he code for this already seems to exist in
init_ah_from_wc.

> 1 - Actually, for any received packet with associated IP address
> information.  We've only enabled net namespaces for IP connections
> between user space applications, for direct IB connections or for kernel
> connections there is not yet any namespace support.

IHMO, this is actually a problem for rocev2.

IB needs more work to create a rdma namespace, but rocve2 does not.

The kernel software side should certainly be completed as a quick
follow on to this series, that means the use of gid_indexes at all
uAPI access points needs to be checked for rocev2.

HW support is needed to complete rocve2 containment, as the hw must
check the gid_index on all directly posted WCs and *ALL* rx'd packets
for a QP to ensure it is allowed.

Some kind of warn on until that support is available would also be
great.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH for-next V2 00/11] Add RoCE v2 support

2015-12-16 Thread Liran Liss
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> ow...@vger.kernel.org] On Behalf Of Doug Ledford

> In particular, Liran piped up with this comment:
> 
> "Also, I don't want to do any route resolution on the Rx path. A UD QP
> completion just reports the details of the packet it received.
> 
> Conceptually, an incoming packet may not even match an SGID index at all.
> Maybe, responses should be sent from another device. This should not be
> decided that the point that a packet was received."
> 
> The part that bothers me about this is that this statement makes sense when
> just thinking about the spec, as you say.  However, once you consider
> namespaces, security implications make this statement spec compliant, but
> still unacceptable.  The spec itself is silent on namespaces.  But, you guys
> wanted, and you got, namespace support.
> Since that's beyond spec, and carries security requirements, I think it's 
> fair to
> say that from now on, the Linux kernel RDMA stack can no longer *just* be
> spec compliant.  There are additional concerns that must always be
> addressed with new changes, and those are the namespace constraint
> preservation concerns.
> 

Hi Doug,

Currently, there is no namespace support for RoCE, so the RoCEv2 patches have 
*nothing* to do with this.
That said, the RoCE specification does not contradict or inhibit any future 
implementation for namespaces.
The CMA will get the  from ib_wc and resolve to a netdev (or 
sgid_index->netdev, whatever) and process the request accordingly.

We can have endless theoretical discussions on features that are not even 
implemented yet (e.g., RoCE namespace support) each time we add a minor 
straightforward, *spec-compliant* change that *all* RoCE vendors adhere to.
If someone wishes to introduce a new concept, API refactoring proposal, or 
similar for community review, please do so with a different RFC.

This is hindering progress of the whole RDMA stack development!
For example, the posted SoftRoCE patches are waiting just for this.

The RoCEv2 patches have been posted upstream for review for months (!) now.
I simply cannot understand why this is lagging for so long; let's start to get 
the wheels rolling.
--Liran



RE: [PATCH for-next V2 00/11] Add RoCE v2 support

2015-12-16 Thread Liran Liss
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-

> > Since you and Jason did not reach a consensus, I have to dig in and
> > see if these patches make it possible to break namespace confinement,
> > either accidentally or with intentionally tricky behavior.  That's
> > going to take me some time.
> 
> Everything to do with parsing a wc and constructing an AH is wrong in this
> series, and the fixes require the API changes I mentioned ( add 'wc to gid
> index' API call, add 'route to AH' API call)
> 
> Every time you read 'route validation' - that is an error, the route should
> never just be validated, it is needed information to construct a rocev2 AH. 
> All
> the places that roughly hand parse the rocev2 WC should not be open coded.
> 
> Even if current HW is broken for namespaces we should not enshrine that in
> the kapi.
>

Currently, namespaces are not supported for RoCE.
So for this patches, this is irrelevant.
That said, we have everything we need for RoCE namespace support when we get 
there.

All of this has nothing to do with "broken" and enshrining anything in the kapi.
That's just bullshit.

The crux of the discussion is the meaning of the API.
The design of the RDMA stack is that Verbs are used by core IB services, such 
as addressing.
For these services, as the specification requires, all relevant fields must be 
reported in the CQE, period.
All spec-compliant HW devices follow this.

If a ULP wants to create an address handle from a completion, there are service 
routines to accomplish that, based on the reported fields.
If it doesn't care, there is no reason to sacrifice performance.

--Liran

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next V2 00/11] Add RoCE v2 support

2015-12-15 Thread Doug Ledford
On 12/15/2015 02:15 AM, Moni Shoua wrote:
> On Thu, Dec 3, 2015 at 3:47 PM, Matan Barak  wrote:
>> Hi Doug,
>>
>> This series adds the support for RoCE v2. In order to support RoCE v2,
>> we add gid_type attribute to every GID. When the RoCE GID management
>> populates the GID table, it duplicates each GID with all supported types.
>> This gives the user the ability to communicate over each supported
>> type.
>>
>> Patch 0001, 0002 and 0003 add support for multiple GID types to the
>> cache and related APIs. The third patch exposes the GID attributes
>> information is sysfs.
>>
>> Patch 0004 adds the RoCE v2 GID type and the capabilities required
>> from the vendor in order to implement RoCE v2. These capabilities
>> are grouped together as RDMA_CORE_PORT_IBA_ROCE_UDP_ENCAP.
>>
>> RoCE v2 could work at IPv4 and IPv6 networks. When receiving ib_wc, this
>> information should come from the vendor's driver. In case the vendor
>> doesn't supply this information, we parse the packet headers and resolve
>> its network type. Patch 0005 adds this information and required utilities.
>>
>> Patches 0006 and 0007 adds route validation. This is mandatory to ensure
>> that we send packets using GIDS which corresponds to a net-device that
>> can be routed to the destination.
>>
>> Patches 0008 and 0009 add configfs support (and the required
>> infrastructure) for CMA. The administrator should be able to set the
>> default RoCE type. This is done through a new per-port
>> default_roce_mode configfs file.
>>
>> Patch 0010 formats a QP1 packet in order to support RoCE v2 CM
>> packets. This is required for vendors which implement their
>> QP1 as a Raw QP.
>>
>> Patch 0011 adds support for IPv4 multicast as an IPv4 network
>> requires IGMP to be sent in order to join multicast groups.
>>
>> Vendors code aren't part of this patch-set. Soft-Roce will be
>> sent soon and depends on these patches. Other vendors, like
>> mlx4, ocrdma and mlx5 will follow.
>>
>> This patch is applied on "Change per-entry locks in GID cache to table lock"
>> which was sent to the mailing list.
>>
>> Thanks,
>> Matan
>>
>> Changed from V1:
>>  - Rebased against Linux 4.4-rc2 master branch.
>>  - Add route validation
>>  - ConfigFS - avoid compiling INFINIBAND=y and CONFIGFS_FS=m
>>  - Add documentation for configfs and sysfs ABI
>>  - Remove ifindex and gid_type from mcmember
>>
>> Changes from V0:
>>  - Rebased patches against Doug's latest k.o/for-4.4 tree.
>>  - Fixed a bug in configfs (rmdir caused an incorrect free).
>>
>> Matan Barak (8):
>>   IB/core: Add gid_type to gid attribute
>>   IB/cm: Use the source GID index type
>>   IB/core: Add gid attributes to sysfs
>>   IB/core: Add ROCE_UDP_ENCAP (RoCE V2) type
>>   IB/core: Move rdma_is_upper_dev_rcu to header file
>>   IB/core: Validate route in ib_init_ah_from_wc and ib_init_ah_from_path
>>   IB/rdma_cm: Add wrapper for cma reference count
>>   IB/cma: Add configfs for rdma_cm
>>
>> Moni Shoua (2):
>>   IB/core: Initialize UD header structure with IP and UDP headers
>>   IB/cma: Join and leave multicast groups with IGMP
>>
>> Somnath Kotur (1):
>>   IB/core: Add rdma_network_type to wc
>>
>>  Documentation/ABI/testing/configfs-rdma_cm   |  22 ++
>>  Documentation/ABI/testing/sysfs-class-infiniband |  16 ++
>>  drivers/infiniband/Kconfig   |   9 +
>>  drivers/infiniband/core/Makefile |   2 +
>>  drivers/infiniband/core/addr.c   | 185 +
>>  drivers/infiniband/core/cache.c  | 169 
>>  drivers/infiniband/core/cm.c |  31 ++-
>>  drivers/infiniband/core/cma.c| 261 --
>>  drivers/infiniband/core/cma_configfs.c   | 321 
>> +++
>>  drivers/infiniband/core/core_priv.h  |  45 
>>  drivers/infiniband/core/device.c |  10 +-
>>  drivers/infiniband/core/multicast.c  |  17 +-
>>  drivers/infiniband/core/roce_gid_mgmt.c  |  81 --
>>  drivers/infiniband/core/sa_query.c   |  76 +-
>>  drivers/infiniband/core/sysfs.c  | 184 -
>>  drivers/infiniband/core/ud_header.c  | 155 ++-
>>  drivers/infiniband/core/uverbs_marshall.c|   1 +
>>  drivers/infiniband/core/verbs.c  | 170 ++--
>>  drivers/infiniband/hw/mlx4/qp.c  |   7 +-
>>  drivers/infiniband/hw/mthca/mthca_qp.c   |   2 +-
>>  drivers/infiniband/hw/ocrdma/ocrdma_ah.c |   2 +-
>>  include/rdma/ib_addr.h   |  11 +-
>>  include/rdma/ib_cache.h  |   4 +
>>  include/rdma/ib_pack.h   |  45 +++-
>>  include/rdma/ib_sa.h |   3 +
>>  include/rdma/ib_verbs.h  |  78 +-
>>  26 files changed, 1704 insertions(+), 203 deletions(-)
>>  create mode 100644 

Re: [PATCH for-next V2 00/11] Add RoCE v2 support

2015-12-15 Thread Jason Gunthorpe
On Tue, Dec 15, 2015 at 04:45:21PM -0500, Doug Ledford wrote:

> In particular, Liran piped up with this comment:
> 
> "Also, I don't want to do any route resolution on the Rx path. A UD QP
> completion just reports the details of the packet it received.
> 
> Conceptually, an incoming packet may not even match an SGID index at
> all.  Maybe, responses should be sent from another device. This should
> not be decided that the point that a packet was received."
> 
> The part that bothers me about this is that this statement makes sense
> when just thinking about the spec, as you say.  However, once you
> consider namespaces, security implications make this statement spec
> compliant, but still unacceptable.

This is where I got to and decided there is no desire to make a
correct patch for this stuff.

You are completely correct in your comments above.

> Since you and Jason did not reach a consensus, I have to dig in and see
> if these patches make it possible to break namespace confinement, either
> accidentally or with intentionally tricky behavior.  That's going to
> take me some time.

Everything to do with parsing a wc and constructing an AH is wrong in
this series, and the fixes require the API changes I mentioned ( add
'wc to gid index' API call, add 'route to AH' API call)

Every time you read 'route validation' - that is an error, the route
should never just be validated, it is needed information to construct
a rocev2 AH. All the places that roughly hand parse the rocev2 WC
should not be open coded.

Even if current HW is broken for namespaces we should not enshrine that
in the kapi.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next V2 00/11] Add RoCE v2 support

2015-12-15 Thread Moni Shoua
> The part that bothers me about this is that this statement makes sense
> when just thinking about the spec, as you say.  However, once you
> consider namespaces, security implications make this statement spec
> compliant, but still unacceptable.  The spec itself is silent on
> namespaces.  But, you guys wanted, and you got, namespace support.
> Since that's beyond spec, and carries security requirements, I think
> it's fair to say that from now on, the Linux kernel RDMA stack can no
> longer *just* be spec compliant.  There are additional concerns that
> must always be addressed with new changes, and those are the namespace
> constraint preservation concerns.

I can't object to that but I really would like to get an example of a
security risk.
So far, besides hearing that the way we choose to handle completions
is wrong, I didn't get a convincing example of how where it doesn't
work. Moreover, regarding security, all we wanted is for HW to report
the L3 protocol (IB, IPv4, or IPv6) in the packet. This is data that
with some extra CPU cycles can be obtained from the 40 bytes that are
scattered to the receive bufs anyway. So, if there is a security hole
it exists from day one of the IB stack and this is not the time we
should insist on fixing it.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next V2 00/11] Add RoCE v2 support

2015-12-14 Thread Moni Shoua
On Thu, Dec 3, 2015 at 3:47 PM, Matan Barak  wrote:
> Hi Doug,
>
> This series adds the support for RoCE v2. In order to support RoCE v2,
> we add gid_type attribute to every GID. When the RoCE GID management
> populates the GID table, it duplicates each GID with all supported types.
> This gives the user the ability to communicate over each supported
> type.
>
> Patch 0001, 0002 and 0003 add support for multiple GID types to the
> cache and related APIs. The third patch exposes the GID attributes
> information is sysfs.
>
> Patch 0004 adds the RoCE v2 GID type and the capabilities required
> from the vendor in order to implement RoCE v2. These capabilities
> are grouped together as RDMA_CORE_PORT_IBA_ROCE_UDP_ENCAP.
>
> RoCE v2 could work at IPv4 and IPv6 networks. When receiving ib_wc, this
> information should come from the vendor's driver. In case the vendor
> doesn't supply this information, we parse the packet headers and resolve
> its network type. Patch 0005 adds this information and required utilities.
>
> Patches 0006 and 0007 adds route validation. This is mandatory to ensure
> that we send packets using GIDS which corresponds to a net-device that
> can be routed to the destination.
>
> Patches 0008 and 0009 add configfs support (and the required
> infrastructure) for CMA. The administrator should be able to set the
> default RoCE type. This is done through a new per-port
> default_roce_mode configfs file.
>
> Patch 0010 formats a QP1 packet in order to support RoCE v2 CM
> packets. This is required for vendors which implement their
> QP1 as a Raw QP.
>
> Patch 0011 adds support for IPv4 multicast as an IPv4 network
> requires IGMP to be sent in order to join multicast groups.
>
> Vendors code aren't part of this patch-set. Soft-Roce will be
> sent soon and depends on these patches. Other vendors, like
> mlx4, ocrdma and mlx5 will follow.
>
> This patch is applied on "Change per-entry locks in GID cache to table lock"
> which was sent to the mailing list.
>
> Thanks,
> Matan
>
> Changed from V1:
>  - Rebased against Linux 4.4-rc2 master branch.
>  - Add route validation
>  - ConfigFS - avoid compiling INFINIBAND=y and CONFIGFS_FS=m
>  - Add documentation for configfs and sysfs ABI
>  - Remove ifindex and gid_type from mcmember
>
> Changes from V0:
>  - Rebased patches against Doug's latest k.o/for-4.4 tree.
>  - Fixed a bug in configfs (rmdir caused an incorrect free).
>
> Matan Barak (8):
>   IB/core: Add gid_type to gid attribute
>   IB/cm: Use the source GID index type
>   IB/core: Add gid attributes to sysfs
>   IB/core: Add ROCE_UDP_ENCAP (RoCE V2) type
>   IB/core: Move rdma_is_upper_dev_rcu to header file
>   IB/core: Validate route in ib_init_ah_from_wc and ib_init_ah_from_path
>   IB/rdma_cm: Add wrapper for cma reference count
>   IB/cma: Add configfs for rdma_cm
>
> Moni Shoua (2):
>   IB/core: Initialize UD header structure with IP and UDP headers
>   IB/cma: Join and leave multicast groups with IGMP
>
> Somnath Kotur (1):
>   IB/core: Add rdma_network_type to wc
>
>  Documentation/ABI/testing/configfs-rdma_cm   |  22 ++
>  Documentation/ABI/testing/sysfs-class-infiniband |  16 ++
>  drivers/infiniband/Kconfig   |   9 +
>  drivers/infiniband/core/Makefile |   2 +
>  drivers/infiniband/core/addr.c   | 185 +
>  drivers/infiniband/core/cache.c  | 169 
>  drivers/infiniband/core/cm.c |  31 ++-
>  drivers/infiniband/core/cma.c| 261 --
>  drivers/infiniband/core/cma_configfs.c   | 321 
> +++
>  drivers/infiniband/core/core_priv.h  |  45 
>  drivers/infiniband/core/device.c |  10 +-
>  drivers/infiniband/core/multicast.c  |  17 +-
>  drivers/infiniband/core/roce_gid_mgmt.c  |  81 --
>  drivers/infiniband/core/sa_query.c   |  76 +-
>  drivers/infiniband/core/sysfs.c  | 184 -
>  drivers/infiniband/core/ud_header.c  | 155 ++-
>  drivers/infiniband/core/uverbs_marshall.c|   1 +
>  drivers/infiniband/core/verbs.c  | 170 ++--
>  drivers/infiniband/hw/mlx4/qp.c  |   7 +-
>  drivers/infiniband/hw/mthca/mthca_qp.c   |   2 +-
>  drivers/infiniband/hw/ocrdma/ocrdma_ah.c |   2 +-
>  include/rdma/ib_addr.h   |  11 +-
>  include/rdma/ib_cache.h  |   4 +
>  include/rdma/ib_pack.h   |  45 +++-
>  include/rdma/ib_sa.h |   3 +
>  include/rdma/ib_verbs.h  |  78 +-
>  26 files changed, 1704 insertions(+), 203 deletions(-)
>  create mode 100644 Documentation/ABI/testing/configfs-rdma_cm
>  create mode 100644 Documentation/ABI/testing/sysfs-class-infiniband
>  create mode 100644 

[PATCH for-next V2 00/11] Add RoCE v2 support

2015-12-03 Thread Matan Barak
Hi Doug,

This series adds the support for RoCE v2. In order to support RoCE v2,
we add gid_type attribute to every GID. When the RoCE GID management
populates the GID table, it duplicates each GID with all supported types.
This gives the user the ability to communicate over each supported
type.

Patch 0001, 0002 and 0003 add support for multiple GID types to the
cache and related APIs. The third patch exposes the GID attributes
information is sysfs.

Patch 0004 adds the RoCE v2 GID type and the capabilities required
from the vendor in order to implement RoCE v2. These capabilities
are grouped together as RDMA_CORE_PORT_IBA_ROCE_UDP_ENCAP.

RoCE v2 could work at IPv4 and IPv6 networks. When receiving ib_wc, this
information should come from the vendor's driver. In case the vendor
doesn't supply this information, we parse the packet headers and resolve
its network type. Patch 0005 adds this information and required utilities.

Patches 0006 and 0007 adds route validation. This is mandatory to ensure
that we send packets using GIDS which corresponds to a net-device that
can be routed to the destination.

Patches 0008 and 0009 add configfs support (and the required
infrastructure) for CMA. The administrator should be able to set the
default RoCE type. This is done through a new per-port
default_roce_mode configfs file.

Patch 0010 formats a QP1 packet in order to support RoCE v2 CM
packets. This is required for vendors which implement their
QP1 as a Raw QP.

Patch 0011 adds support for IPv4 multicast as an IPv4 network
requires IGMP to be sent in order to join multicast groups.

Vendors code aren't part of this patch-set. Soft-Roce will be
sent soon and depends on these patches. Other vendors, like
mlx4, ocrdma and mlx5 will follow.

This patch is applied on "Change per-entry locks in GID cache to table lock"
which was sent to the mailing list.

Thanks,
Matan

Changed from V1:
 - Rebased against Linux 4.4-rc2 master branch.
 - Add route validation
 - ConfigFS - avoid compiling INFINIBAND=y and CONFIGFS_FS=m
 - Add documentation for configfs and sysfs ABI
 - Remove ifindex and gid_type from mcmember

Changes from V0:
 - Rebased patches against Doug's latest k.o/for-4.4 tree.
 - Fixed a bug in configfs (rmdir caused an incorrect free).

Matan Barak (8):
  IB/core: Add gid_type to gid attribute
  IB/cm: Use the source GID index type
  IB/core: Add gid attributes to sysfs
  IB/core: Add ROCE_UDP_ENCAP (RoCE V2) type
  IB/core: Move rdma_is_upper_dev_rcu to header file
  IB/core: Validate route in ib_init_ah_from_wc and ib_init_ah_from_path
  IB/rdma_cm: Add wrapper for cma reference count
  IB/cma: Add configfs for rdma_cm

Moni Shoua (2):
  IB/core: Initialize UD header structure with IP and UDP headers
  IB/cma: Join and leave multicast groups with IGMP

Somnath Kotur (1):
  IB/core: Add rdma_network_type to wc

 Documentation/ABI/testing/configfs-rdma_cm   |  22 ++
 Documentation/ABI/testing/sysfs-class-infiniband |  16 ++
 drivers/infiniband/Kconfig   |   9 +
 drivers/infiniband/core/Makefile |   2 +
 drivers/infiniband/core/addr.c   | 185 +
 drivers/infiniband/core/cache.c  | 169 
 drivers/infiniband/core/cm.c |  31 ++-
 drivers/infiniband/core/cma.c| 261 --
 drivers/infiniband/core/cma_configfs.c   | 321 +++
 drivers/infiniband/core/core_priv.h  |  45 
 drivers/infiniband/core/device.c |  10 +-
 drivers/infiniband/core/multicast.c  |  17 +-
 drivers/infiniband/core/roce_gid_mgmt.c  |  81 --
 drivers/infiniband/core/sa_query.c   |  76 +-
 drivers/infiniband/core/sysfs.c  | 184 -
 drivers/infiniband/core/ud_header.c  | 155 ++-
 drivers/infiniband/core/uverbs_marshall.c|   1 +
 drivers/infiniband/core/verbs.c  | 170 ++--
 drivers/infiniband/hw/mlx4/qp.c  |   7 +-
 drivers/infiniband/hw/mthca/mthca_qp.c   |   2 +-
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c |   2 +-
 include/rdma/ib_addr.h   |  11 +-
 include/rdma/ib_cache.h  |   4 +
 include/rdma/ib_pack.h   |  45 +++-
 include/rdma/ib_sa.h |   3 +
 include/rdma/ib_verbs.h  |  78 +-
 26 files changed, 1704 insertions(+), 203 deletions(-)
 create mode 100644 Documentation/ABI/testing/configfs-rdma_cm
 create mode 100644 Documentation/ABI/testing/sysfs-class-infiniband
 create mode 100644 drivers/infiniband/core/cma_configfs.c

-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH for-next V2 00/11] Add RoCE v2 support

2015-12-03 Thread Matan Barak
Hi Doug,

This series adds the support for RoCE v2. In order to support RoCE v2,
we add gid_type attribute to every GID. When the RoCE GID management
populates the GID table, it duplicates each GID with all supported types.
This gives the user the ability to communicate over each supported
type.

Patch 0001, 0002 and 0003 add support for multiple GID types to the
cache and related APIs. The third patch exposes the GID attributes
information is sysfs.

Patch 0004 adds the RoCE v2 GID type and the capabilities required
from the vendor in order to implement RoCE v2. These capabilities
are grouped together as RDMA_CORE_PORT_IBA_ROCE_UDP_ENCAP.

RoCE v2 could work at IPv4 and IPv6 networks. When receiving ib_wc, this
information should come from the vendor's driver. In case the vendor
doesn't supply this information, we parse the packet headers and resolve
its network type. Patch 0005 adds this information and required utilities.

Patches 0006 and 0007 adds route validation. This is mandatory to ensure
that we send packets using GIDS which corresponds to a net-device that
can be routed to the destination.

Patches 0008 and 0009 add configfs support (and the required
infrastructure) for CMA. The administrator should be able to set the
default RoCE type. This is done through a new per-port
default_roce_mode configfs file.

Patch 0010 formats a QP1 packet in order to support RoCE v2 CM
packets. This is required for vendors which implement their
QP1 as a Raw QP.

Patch 0011 adds support for IPv4 multicast as an IPv4 network
requires IGMP to be sent in order to join multicast groups.

Vendors code aren't part of this patch-set. Soft-Roce will be
sent soon and depends on these patches. Other vendors, like
mlx4, ocrdma and mlx5 will follow.

This patch is applied on "Change per-entry locks in GID cache to table lock"
which was sent to the mailing list.

Thanks,
Matan

Changed from V1:
 - Rebased against Linux 4.4-rc2 master branch.
 - Add route validation
 - ConfigFS - avoid compiling INFINIBAND=y and CONFIGFS_FS=m
 - Add documentation for configfs and sysfs ABI
 - Remove ifindex and gid_type from mcmember

Changes from V0:
 - Rebased patches against Doug's latest k.o/for-4.4 tree.
 - Fixed a bug in configfs (rmdir caused an incorrect free).

Matan Barak (8):
  IB/core: Add gid_type to gid attribute
  IB/cm: Use the source GID index type
  IB/core: Add gid attributes to sysfs
  IB/core: Add ROCE_UDP_ENCAP (RoCE V2) type
  IB/core: Move rdma_is_upper_dev_rcu to header file
  IB/core: Validate route in ib_init_ah_from_wc and ib_init_ah_from_path
  IB/rdma_cm: Add wrapper for cma reference count
  IB/cma: Add configfs for rdma_cm

Moni Shoua (2):
  IB/core: Initialize UD header structure with IP and UDP headers
  IB/cma: Join and leave multicast groups with IGMP

Somnath Kotur (1):
  IB/core: Add rdma_network_type to wc

 Documentation/ABI/testing/configfs-rdma_cm   |  22 ++
 Documentation/ABI/testing/sysfs-class-infiniband |  16 ++
 drivers/infiniband/Kconfig   |   9 +
 drivers/infiniband/core/Makefile |   2 +
 drivers/infiniband/core/addr.c   | 185 +
 drivers/infiniband/core/cache.c  | 169 
 drivers/infiniband/core/cm.c |  31 ++-
 drivers/infiniband/core/cma.c| 261 --
 drivers/infiniband/core/cma_configfs.c   | 321 +++
 drivers/infiniband/core/core_priv.h  |  45 
 drivers/infiniband/core/device.c |  10 +-
 drivers/infiniband/core/multicast.c  |  17 +-
 drivers/infiniband/core/roce_gid_mgmt.c  |  81 --
 drivers/infiniband/core/sa_query.c   |  76 +-
 drivers/infiniband/core/sysfs.c  | 184 -
 drivers/infiniband/core/ud_header.c  | 155 ++-
 drivers/infiniband/core/uverbs_marshall.c|   1 +
 drivers/infiniband/core/verbs.c  | 170 ++--
 drivers/infiniband/hw/mlx4/qp.c  |   7 +-
 drivers/infiniband/hw/mthca/mthca_qp.c   |   2 +-
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c |   2 +-
 include/rdma/ib_addr.h   |  11 +-
 include/rdma/ib_cache.h  |   4 +
 include/rdma/ib_pack.h   |  45 +++-
 include/rdma/ib_sa.h |   3 +
 include/rdma/ib_verbs.h  |  78 +-
 26 files changed, 1704 insertions(+), 203 deletions(-)
 create mode 100644 Documentation/ABI/testing/configfs-rdma_cm
 create mode 100644 Documentation/ABI/testing/sysfs-class-infiniband
 create mode 100644 drivers/infiniband/core/cma_configfs.c

-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html