Re: [PATCH v1 08/12] IB/cma: Add net_dev and private data checks to RDMA CM

2015-07-16 Thread Jason Gunthorpe
On Thu, Jul 16, 2015 at 12:01:55PM +, Liran Liss wrote:

> - Name space lookup is done based on BTH.pkey, private_data.IP, and
>   optionally GRH.DGID (if present, for extra validation)

Just changing the pkey to BTH.pkey would be fine by me.

Using GRH.DGID if available instead of the primary path hack is also
smart, I pointed that out at the beginning..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v1 08/12] IB/cma: Add net_dev and private data checks to RDMA CM

2015-07-16 Thread Liran Liss
> From: Jason Gunthorpe [mailto:jguntho...@obsidianresearch.com]

> > After all, it is the payload that designates the entity that you
> > want to establish a connection to, rather than the packet headers,
> > which are just meant to relay the packet to the proper CM
> 
> No, that isn't right. The IBA uses the GMP's destination first, then
> serviceID as the demux. Services IDs are not globally unique, they are
> scoped by the destination.
> 

The destination is the physical CA port and kernel CM agent, so I don't think 
the answer is that clear.
Going forward along these lines:
- Name space lookup is done based on BTH.pkey, private_data.IP, and optionally 
GRH.DGID (if present, for extra validation)
-- Primary and alternate paths are not considered at all

- If P_Key enforcement is set up via cgroups:
-- For CM processing, we only check BTH.pkey
--- Upon conflict, the packet is dropped
-- The primary/alternate path pkeys are not validated by CM, but will be 
validated during QP modification

Does this sound OK?

In any case, let's complete the namespace lookup first, and then follow up with 
a cgroup patchset.

> The path data is just *routing* it doesn't describe at all the entity
> we want to talk to, it is only a proposal for how to flow data to it.
> 
> In any event, both the GMP headers and the path data needs to be
> checked against the container's pkey list. I don't know why this is so
> contentions.
> 
> Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 08/12] IB/cma: Add net_dev and private data checks to RDMA CM

2015-07-15 Thread Jason Gunthorpe
On Wed, Jul 15, 2015 at 08:27:06PM +, Liran Liss wrote:
> If you want to restrict a container to a specific set of pkeys, use
> cgroups.

Ideally yes, but in the absence of a cgroup the set of pkeys assigned
to the container via ipoib is a reasonable alternate.

> This would apply both to CM MADs and QPs.
> - In the MAD case, CM MADs would be first matched to a namespace and rdma_id 
> and dropped upon pkey conflict (with either the headers or the payload).
> - In the QP case, modify_qp() would fail on conflict.
> Partitioning needs to be enforced also for applications that don't use the CM 
> at all...

Yep, that is how pkey checking should work, cgroup or not.

So, you agree with me.

I say that until we get a cgroup capability, the pkey list of the
container is the set of IPoIB interfaces associated with it, and we
still have to do the various checks above. The first check is relavent
to this patchset and should be done by using the GMP's headers to
locate the net device.

> For namespaces, it seems more natural to lookup the namespace based
> solely on the CM payload.

How so? Which payload content do you use? The primary path? The
alternate path?

> After all, it is the payload that designates the entity that you
> want to establish a connection to, rather than the packet headers,
> which are just meant to relay the packet to the proper CM

No, that isn't right. The IBA uses the GMP's destination first, then
serviceID as the demux. Services IDs are not globally unique, they are
scoped by the destination.

The path data is just *routing* it doesn't describe at all the entity
we want to talk to, it is only a proposal for how to flow data to it.

In any event, both the GMP headers and the path data needs to be
checked against the container's pkey list. I don't know why this is so
contentions.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v1 08/12] IB/cma: Add net_dev and private data checks to RDMA CM

2015-07-15 Thread Liran Liss
> From: Jason Gunthorpe [mailto:jguntho...@obsidianresearch.com]
 
> 
> > What is really missing here I guess is a mechanism that would
> > enforce containers to only use certain pkeys - perhaps with
> > something like an RDMA cgroup.  It could force containers to only
> > use approved pkeys not only with RDMA CM, but through uverbs, and
> > other user-space interfaces.
> 
> Ideally yes. Without that it just means you can't use pkey
> meaningfully with containers..
> 

That's why partition enforcement should be separated from namespaces.

If you want to restrict a container to a specific set of pkeys, use cgroups.
This would apply both to CM MADs and QPs.
- In the MAD case, CM MADs would be first matched to a namespace and rdma_id 
and dropped upon pkey conflict (with either the headers or the payload).
- In the QP case, modify_qp() would fail on conflict.
Partitioning needs to be enforced also for applications that don't use the CM 
at all...

For namespaces, it seems more natural to lookup the namespace based solely on 
the CM payload.
After all, it is the payload that designates the entity that you want to 
establish a connection to, rather than the packet headers, which are just meant 
to relay the packet to the proper CM agent. (Spec-wise, it is even possible to 
use completely different node to open a connection on behalf of another node.)
So, the sole role of pkeys in the context of namespaces is to locate the proper 
namespace; not for partition isolation.

So, as a first step, let's get the namespace lookup in place.
Next, add an rdma crgoup, which would control pkeys, as well as resource counts 
(you don't want one container to rob the whole system from all QPs...).

> Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 08/12] IB/cma: Add net_dev and private data checks to RDMA CM

2015-07-15 Thread Jason Gunthorpe
On Wed, Jul 15, 2015 at 01:57:48PM +0300, Haggai Eran wrote:
> On 13/07/2015 21:14, Jason Gunthorpe wrote:
> > On Mon, Jun 22, 2015 at 03:42:37PM +0300, Haggai Eran wrote:
> >> +  switch (ib_event->event) {
> >> +  case IB_CM_REQ_RECEIVED:
> >> +  req->device = req_param->listen_id->device;
> >> +  req->port   = req_param->port;
> >> +  req->local_gid  = &req_param->primary_path->sgid;
> >> +  req->service_id = req_param->primary_path->service_id;
> >> +  req->pkey   = be16_to_cpu(req_param->primary_path->pkey);
> > 
> > I feel pretty strongly that we should be using the pkey from the work
> > completion, not the pkey in the message.
> > 
> > The reason, if someone is using pkey like vlan, and expecting a
> > container to never receive packets outside the assigned pkey, then we
> > need to check each and every packet for the correct pkey before
> > associating it with that container.
> 
> The way I see it is that you have one RDMA CM agent in the system, and
> the header parameters address it. This agent allows addressing several
> namespaces, and they are de-muxed according to the parameters in the
> payload.

That doesn't address my argument at all.

A container should never, ever, process a packet on a pkey that is not
associated with it.

So, it doesn't matter one bit how you decide to demux. After the demux
is done all three pkeys needs to be checked to be compatible with the
container: GMP, Primary Path, Alternate Path.

If any are not compatible the packet *MUST* be dropped.

If you do the above check, you may as well start with the wc's pkey,
since it reflects the logical 'netdevice' that received the packet.

> So a container never receives a packet outside of its assigned pkeys,
> but the pkey you look at (as well as the GID, and possibly the IP
> address) all come from the payload.

That isn't entirey true, at a minimum the above scheme allows an
attacking pkey to create an unbounded number of REQs delivered to
userspace on a victim container, without needing to be part of the
victim's pkey.

> > When doing the namespace patches you should probably also look at
> > other CM GMPs than just the REQ and how the paths are setup and
> > consider what to do with the pkey. I'd probably suggest that the pkey
> > should be forced throughout the entire process to ensure it always
> > matches the ip device - at least for containers that is the right
> > thing.. I probably wouldn't turn it on for the root namespace though..
> 
> Once a connection has been established, following GMPs use a unique ID
> to address this connection, so no more de-muxing is needed.

I think there are open issues here about pkey correctness. Ie the
unique ID is not locked to a pkey, so an attacking pkey can guess the
unique ID and then issue GMPs that will be accepted.

Once locked to a container the unique ID absolutely needs to drop all
packets that are outside the container's pkey space. IMHO this would
be part of the namespace patches.

Basically, pkey in a container should work exactly the same as pkey on
a HCA, packets are dropped before they ever reach the container. This
means every single resource associated with a container much check the
pkey of any packet targeted at that resource.

> What is really missing here I guess is a mechanism that would
> enforce containers to only use certain pkeys - perhaps with
> something like an RDMA cgroup.  It could force containers to only
> use approved pkeys not only with RDMA CM, but through uverbs, and
> other user-space interfaces.

Ideally yes. Without that it just means you can't use pkey
meaningfully with containers..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 08/12] IB/cma: Add net_dev and private data checks to RDMA CM

2015-07-15 Thread Haggai Eran
On 13/07/2015 21:14, Jason Gunthorpe wrote:
> On Mon, Jun 22, 2015 at 03:42:37PM +0300, Haggai Eran wrote:
>> +switch (ib_event->event) {
>> +case IB_CM_REQ_RECEIVED:
>> +req->device = req_param->listen_id->device;
>> +req->port   = req_param->port;
>> +req->local_gid  = &req_param->primary_path->sgid;
>> +req->service_id = req_param->primary_path->service_id;
>> +req->pkey   = be16_to_cpu(req_param->primary_path->pkey);
> 
> I feel pretty strongly that we should be using the pkey from the work
> completion, not the pkey in the message.
> 
> The reason, if someone is using pkey like vlan, and expecting a
> container to never receive packets outside the assigned pkey, then we
> need to check each and every packet for the correct pkey before
> associating it with that container.

The way I see it is that you have one RDMA CM agent in the system, and
the header parameters address it. This agent allows addressing several
namespaces, and they are de-muxed according to the parameters in the
payload.

So a container never receives a packet outside of its assigned pkeys,
but the pkey you look at (as well as the GID, and possibly the IP
address) all come from the payload.

> When doing the namespace patches you should probably also look at
> other CM GMPs than just the REQ and how the paths are setup and
> consider what to do with the pkey. I'd probably suggest that the pkey
> should be forced throughout the entire process to ensure it always
> matches the ip device - at least for containers that is the right
> thing.. I probably wouldn't turn it on for the root namespace though..

Once a connection has been established, following GMPs use a unique ID
to address this connection, so no more de-muxing is needed. What is
really missing here I guess is a mechanism that would enforce containers
to only use certain pkeys - perhaps with something like an RDMA cgroup.
It could force containers to only use approved pkeys not only with RDMA
CM, but through uverbs, and other user-space interfaces.

Haggai
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 08/12] IB/cma: Add net_dev and private data checks to RDMA CM

2015-07-13 Thread Jason Gunthorpe
On Mon, Jun 22, 2015 at 03:42:37PM +0300, Haggai Eran wrote:
> + switch (ib_event->event) {
> + case IB_CM_REQ_RECEIVED:
> + req->device = req_param->listen_id->device;
> + req->port   = req_param->port;
> + req->local_gid  = &req_param->primary_path->sgid;
> + req->service_id = req_param->primary_path->service_id;
> + req->pkey   = be16_to_cpu(req_param->primary_path->pkey);

I feel pretty strongly that we should be using the pkey from the work
completion, not the pkey in the message.

The reason, if someone is using pkey like vlan, and expecting a
container to never receive packets outside the assigned pkey, then we
need to check each and every packet for the correct pkey before
associating it with that container.

When doing the namespace patches you should probably also look at
other CM GMPs than just the REQ and how the paths are setup and
consider what to do with the pkey. I'd probably suggest that the pkey
should be forced throughout the entire process to ensure it always
matches the ip device - at least for containers that is the right
thing.. I probably wouldn't turn it on for the root namespace though..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html