Re: [PATCH] IB/ipoib: CSUM support in connected mode

2015-11-24 Thread Yann Droneaud
Hi,

Le mercredi 18 novembre 2015 à 12:46 +0200, Yuval Shaia a écrit :
> On Thu, Jul 30, 2015 at 03:58:13PM +0200, Yann Droneaud wrote:
> > Le jeudi 30 juillet 2015 à 04:46 -0700, Yuval Shaia a écrit :

> > > This enhancement suggest the usage of IB CRC instead of CSUM in
> > > IPoIB CM. IPoIB CM uses RC (Reliable Connection) which guarantees
> > > the corruption free delivery of the packet.
> > > 
> > > InfiniBand uses 32b CRC which provides stronger data integrity 
> > > protection compare to 16b IP Checksum.
> > 
> > InfiniBand 32b CRC <=> Ethernet 32b CRC, it's link layer, layer 2.
> > 
> > IPv4 checksum is at another level, it's internet layer, layer 3.
> > 
> > > So, there is no added value that IP/TCP Checksum provides in the
> > > IB world.
> > > 
> > 
> > Sure, IPv4 checksum is a thing of the past: checksum was dropped
> > from IP header in IPv6: it assumes the lower layer, such as
> > Ethernet, provides the required integrety check.

> Right, will change description to something like this:
> InfiniBand 32b CRC offers strong data integrity protection compare
> CSUM.

This patch must clearly state that, without IPv4 header checksum, it's
no more IP protocol as described by RFC 791

> > 
> > I think not checking the IPv4 checksum should be a choice,
> > carefully thought, for inside a fabric, as I understand your
> > proposal, packet with invalid checksum will be allowed to go in/out
> > of the fabric.

> If peer supports this feature do you see any reason why not?

You're free to run you own protocol inside your private network, but
this protocol is not compatible with IPv4.

> > 
> > It sound like it's a departure from the behavior one can expect
> > from an IPv4 network stack.
> > 
> > > The proposal is to tell network stack that IPoIB-CM supports IP 
> > > Checksum offload. This enables the kernel to save the time of 
> > > checksum calculation of IPoIB CM packets. Network sends the IP
> > > packet without adding the IP Checksum to the header. On the
> > > receive side, IPoIB driver again tells the network stack that IP
> > > Checksum is good for the incoming packets and network stack
> > > avoids the IP Checksum calculations.
> > > 
> > > During connection establishment the driver determine if peer
> > > supports IB CRC as checksum. This is done so driver will be able
> > > to calculate checksum before transmiting the packet in case the
> > > peer does not support this feature.
> > > 
> > 
> > Two questions:
> > 
> > - What will see tool such as wireshark/tcpdump when sniffing
> > checksum-less IPv4 packets sent/received on IPoIB interface ?

> Never checked it but i assume 0 or what ever the kernel put there
> when driver reports NETIF_F_IP_CSUM. I can check and zero if needed
> but any reason to believe it is needed?

I don't think it's needed if you agree that this modification creates a
new protocol which is no more IPv4.

> > 
> > - What might happen if such checksum-less IPv4 packet is later
> > routed to a different IPv4 network ?

> Same as my question above, if peer supports this feature do you see
> anything wrong?

If peer is going to forward this packet to a different network, which
is not IPoIB based, the checksum will be checked and the packet will be
rejected.

Regards.

-- 
Yann Droneaud
OPTEYA

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags

2015-11-10 Thread Yann Droneaud
Hi,

Le mardi 10 novembre 2015 à 12:44 +0200, Sagi Grimberg a écrit :
> Different devices may require different access permissions
> for rdma reads e.g. for Infiniband devices, local write access
> suffices while iWARP devices require remote write permissions as
> well.
> 
> This situation generates extra logic for ULPs that need to be aware
> of the underlying device to determine rdma read access. Instead,
> add a device attribute for ULPs to use without being aware of the
> underlying device.
> 
> Also, set rdma_read_access_flags in the relevant device drivers:
> mlx4, mlx5, qib, ocrdma, nes: IB_ACCESS_LOCAL_WRITE
> cxgb3, cxgb4, hfi: IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE
> 

Why were those hw providers not modified to
enforce IB_ACCESS_REMOTE_WRITE when needed, instead of asking users to
set it for them ?

Regards.

-- 
Yann Droneaud
OPTEYA

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IB/core: avoid 32-bit warning

2015-10-07 Thread Yann Droneaud
Hi,

Le mercredi 07 octobre 2015 à 16:19 +0300, Sagi Grimberg a écrit :
> On 10/7/2015 3:29 PM, Arnd Bergmann wrote:
> > The INIT_UDATA() macro requires a pointer or unsigned long argument
> > for
> > both input and output buffer, and all callers had a cast from when
> > the code was merged until a recent restructuring, so now we get
> > 
> > core/uverbs_cmd.c: In function 'ib_uverbs_create_cq':
> > core/uverbs_cmd.c:1481:66: warning: cast to pointer from integer of
> > different size [-Wint-to-pointer-cast]
> > 
> > This makes the code behave as before by adding back the cast to
> > unsigned long.
> > 
> > Signed-off-by: Arnd Bergmann <a...@arndb.de>
> > Fixes: 565197dd8fb1 ("IB/core: Extend ib_uverbs_create_cq")
> > 
> > diff --git a/drivers/infiniband/core/uverbs_cmd.c
> > b/drivers/infiniband/core/uverbs_cmd.c
> > index be4cb9f04be3..88b3b78340f2 100644
> > --- a/drivers/infiniband/core/uverbs_cmd.c
> > +++ b/drivers/infiniband/core/uverbs_cmd.c
> > @@ -1478,7 +1478,7 @@ ssize_t ib_uverbs_create_cq(struct
> > ib_uverbs_file *file,
> > if (copy_from_user(, buf, sizeof(cmd)))
> > return -EFAULT;
> > 
> > -   INIT_UDATA(, buf, cmd.response, sizeof(cmd),
> > sizeof(resp));
> > +   INIT_UDATA(, buf, (unsigned long)cmd.response,
> > sizeof(cmd), sizeof(resp));
> 
> Would it make sense to cast inside INIT_UDATA() and not have callers
> worry about it?

It's ... complicated. See INIT_UDATA_BUF_OR_NULL().

Awayway, I have patch to do the opposite, eg. explicitly cast u64 value
to (void __user *)(unsigned long) in the caller function instead, as it
allows some safer / new operations on the response buffer.

Regards.

-- 
Yann Droneaud
OPTEYA

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IB/core: avoid 32-bit warning

2015-10-07 Thread Yann Droneaud
Hi,

Le mercredi 07 octobre 2015 à 14:29 +0200, Arnd Bergmann a écrit :
> The INIT_UDATA() macro requires a pointer or unsigned long argument 
> for both input and output buffer, and all callers had a cast from 
> when the code was merged until a recent restructuring, so now we get
> 
> core/uverbs_cmd.c: In function 'ib_uverbs_create_cq':
> core/uverbs_cmd.c:1481:66: warning: cast to pointer from integer of
> different size [-Wint-to-pointer-cast]
> 
> This makes the code behave as before by adding back the cast to
> unsigned long.
> 
> Signed-off-by: Arnd Bergmann <a...@arndb.de>
> Fixes: 565197dd8fb1 ("IB/core: Extend ib_uverbs_create_cq")
> 

Reviewed-by: Yann Droneaud <ydrone...@opteya.com>

> diff --git a/drivers/infiniband/core/uverbs_cmd.c
> b/drivers/infiniband/core/uverbs_cmd.c
> index be4cb9f04be3..88b3b78340f2 100644
> --- a/drivers/infiniband/core/uverbs_cmd.c
> +++ b/drivers/infiniband/core/uverbs_cmd.c
> @@ -1478,7 +1478,7 @@ ssize_t ib_uverbs_create_cq(struct
> ib_uverbs_file *file,
>   if (copy_from_user(, buf, sizeof(cmd)))
>   return -EFAULT;
>  
> - INIT_UDATA(, buf, cmd.response, sizeof(cmd),
> sizeof(resp));
> + INIT_UDATA(, buf, (unsigned long)cmd.response,
> sizeof(cmd), sizeof(resp));
>  
>   INIT_UDATA(, buf + sizeof(cmd),
>  (unsigned long)cmd.response + sizeof(resp),
> 

Regards.

-- 
Yann Droneaud
OPTEYA

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: message size, was Re: merge struct ib_device_attr into struct ib_device

2015-09-23 Thread Yann Droneaud
Hi,

Le mardi 22 septembre 2015 à 23:55 +0200, 'Christoph Hellwig' a écrit :
> On Tue, Sep 22, 2015 at 04:06:17PM -0500, Steve Wise wrote:
> > Can you create a series of smaller patches that will fit on the
> > list?
> > That would make it easier for everyone to review/comment. 
> 
> I don't see how that is possible, as it's a flag day change.
> 

Perhaps you could change ib_query_device() to use the ib_device_attr
from struct ib_device so that ulp doesn't have to be modified at the
same time the drivers. Then further patches can update the users of
ib_query_device(), one at a time. And the last patch would remove
ib_query_device().

Regards.

-- 
Yann Droneaud

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: merge struct ib_device_attr into struct ib_device

2015-09-22 Thread Yann Droneaud
Le lundi 21 septembre 2015 à 13:59 -0700, Christoph Hellwig a écrit :
> This patch gets rid of struct ib_device_attr and cleans up drivers
> nicely.
> 
> It goes on top of my send_wr cleanups and the memory registration
> udpates
> from Sagi.
> 

Is the patch missing ?

Regards.

-- 
Yann Droneaud
OPTEYA

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next 02/10] IB/core: Add flags parameter for ib_create_cq

2015-05-20 Thread Yann Droneaud
Hi,

Le dimanche 17 mai 2015 à 16:36 +0300, Or Gerlitz a écrit :
 diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c 
 b/drivers/infiniband/ulp/iser/iser_verbs.c
 index cc2dd35..922d322 100644
 --- a/drivers/infiniband/ulp/iser/iser_verbs.c
 +++ b/drivers/infiniband/ulp/iser/iser_verbs.c
 @@ -126,11 +126,9 @@ static int iser_create_device_ib_res(struct 
 iser_device *device)
 struct iser_comp *comp = device-comps[i];
  
 comp-device = device;
 -   comp-cq = ib_create_cq(device-ib_device,
 -   iser_cq_callback,
 -   iser_cq_event_callback,
 -   (void *)comp,
 -   max_cqe, i);
 +   comp-cq = ib_create_cq(device-ib_device, iser_cq_callback,
 +   iser_cq_event_callback, (void *)comp,
 +   max_cqe, i, 0);

Don't change indentation needlessly.


 if (IS_ERR(comp-cq)) {
 comp-cq = NULL;
 goto cq_err;


 diff --git a/drivers/infiniband/ulp/isert/ib_isert.c 
 b/drivers/infiniband/ulp/isert/ib_isert.c
 index 327529e..f7d9ae0 100644
 --- a/drivers/infiniband/ulp/isert/ib_isert.c
 +++ b/drivers/infiniband/ulp/isert/ib_isert.c
 @@ -320,11 +320,12 @@ isert_alloc_comps(struct isert_device *device,
  
 comp-device = device;
 INIT_WORK(comp-work, isert_cq_work);
 -   comp-cq = ib_create_cq(device-ib_device,
 -   isert_cq_callback,
 +   comp-cq = ib_create_cq(device-ib_device, isert_cq_callback,

Don't change indentation.

 isert_cq_event_callback,
 (void *)comp,
 -   max_cqe, i);
 +   max_cqe,
 +   i,
 +   0);
 if (IS_ERR(comp-gt;cq)) {
 isert_err(Unable to allocate cq\n);
 ret = PTR_ERR(comp-gt;cq);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
 index ad0e2ea..1bbe4a4 100644
 --- a/include/rdma/ib_verbs.h
 +++ b/include/rdma/ib_verbs.h
 @@ -173,6 +173,10 @@ struct ib_odp_caps {
 } per_transport_caps;
  };
  
 +enum ib_cq_creation_flags {
 +   IB_CQ_FLAGS_TIMESTAMP   = 1  0,
 +};
 +

That's must be part of a different patch, otherwise the commit message
is not true.

  struct ib_cq_init_attr {
 int cqe;
 int comp_vector;

diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c
 b/net/sunrpc/xprtrdma/svc_rdma_transport.c
 index 3df8320..cfb5915 100644
 --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
 +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
 @@ -901,21 +901,16 @@ static struct svc_xprt *svc_rdma_accept(struct
 svc_xprt *xprt)
 goto errout;
 }
 newxprt-sc_sq_cq = ib_create_cq(newxprt-sc_cm_id-device,
 -sq_comp_handler,
 -cq_event_handler,
 -newxprt,
 -newxprt-sc_sq_depth,
 -0);
 +sq_comp_handler, cq_event_handler,
 +newxprt, newxprt-sc_sq_depth, 0, 0);

Don't change indentation.

 if (IS_ERR(newxprt-sc_sq_cq)) {
 dprintk(svcrdma: error creating SQ CQ for connect
 request\n);
 goto errout;
 }
 newxprt-sc_rq_cq = ib_create_cq(newxprt-sc_cm_id-device,
 -rq_comp_handler,
 -cq_event_handler,
 -newxprt,
 -newxprt-sc_max_requests,
 -0);
 +rq_comp_handler, cq_event_handler,
 +newxprt, newxprt-sc_max_requests,
 +0, 0);

Don't change indentation.

 if (IS_ERR(newxprt-sc_rq_cq)) {
 dprintk(svcrdma: error creating RQ CQ for connect
 request\n);
 goto errout;


Regards

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device

2015-05-20 Thread Yann Droneaud
Hi,

Le mercredi 20 mai 2015 à 17:41 +0300, Or Gerlitz a écrit :
 On 5/20/2015 3:29 AM, Jason Gunthorpe wrote:
  On Tue, May 19, 2015 at 10:30:00PM +0300, Or Gerlitz wrote:
Are you objecting adding the clock frequency and mask to the 
qeury device verb?
why?
  Lets see the verbs side and I'll let you know.
 
 You mean the user series of libibverbs/libmlx4? I don't see why this 
 should be a must for the review of the kernel bits. The user-space 
 code 
 is coming up soon, sure, but we should be able to review kernel 
 patches 
 without requiring to actually see the user-space code.
 

In some other subsystems: no userspace code, no merge.

http://blog.ffwll.ch/2015/05/gfx-kernel-upstreaming-requirements.html


 As the change-logs here explained, the clock frequency is needed for 
 applications to convert the HCA clock delta (current time - timestamp 
 on 
 WC) into nano-secs and such.The mask is needed to realize how many 
 bits
 from the 64b time-stamp are supported by the HW.
 
  
If not, are objecting using the vendor specific track of the 
verb to
pass from the vendor driver to the vendor library this or that 
detail
which is needed for proper operation? why?
  I'm uncomfortable seeing otherwise vendor-neutral calls gain vendor
  extensions.
 
 But this is whole purpose of the udata framework in uverbs, right? 
 for 
 each uverb command the vendor user-space library has a well defined 
 channel to communicate directly with the low level vendor driver 
 throughout the uverbs channels.
 

Uverbs convey information between kernel and userspace drivers to
implement verbs for userspace application. I don't think it's designed
to allow vendor to add random extensions in the best way with regard to
backward/forward compability.


Anyway, please, we have to make drivers which are going to behave as
good citizens to the kernel *and* userspace. Adding a dedicated
extensions which is going to be replaced later by a generic, vendor
neutral, extension will be painful to maintain to ensure backward
compatibility.

So let's think how this timestamp extension can be made generic enough
to be future-proof (and at least present proof to address current use
cases).

Regards.

-- 
Yann Droneaud
OPTEYA



--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next 01/10] IB/core: Change provider's API of create_cq to be extendible

2015-05-20 Thread Yann Droneaud
Le mardi 19 mai 2015 à 12:45 -0600, Jason Gunthorpe a écrit :
  + struct ib_cq * (*create_cq)(struct ib_device *device,
  + struct ib_cq_init_attr *attr,
 
 const struct ib_cq_init_attr *attr,
 
 And related changes that will cause.

I was going to ask for the same change.

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: rdma kernel tree

2015-05-19 Thread Yann Droneaud
Hi,

Le mardi 19 mai 2015 à 15:49 -0400, Doug Ledford a écrit :
 On Tue, 2015-05-19 at 21:47 +0300, Or Gerlitz wrote:
  On Tue, May 19, 2015 at 3:22 PM, Doug Ledford dledf...@redhat.com wrote:
   On Tue, 2015-05-19 at 07:35 +0300, Or Gerlitz wrote:
I pulled that via a bundle from patchworks.  I'll double check it.
Did  you check it out? fixed it out?
  
   I took a look now, you've rebased to-be-rebased/for-4.2 to 4.1-rc4 and
  
   Correct.
  
   it seems this is what you are going to push into the kernel.org treem
  
   Correct.
  
   but this series is still there with the zillion tested/reviewed/etc
   signature per one 2-3 patch, I think we've agreed this needs to be
   addressed prior to the upstream push, right?
  
   Incorrect.  What you objected to before was the large Cc: list in the
   patches.  That is gone.  What is there now is just the reviewed-by: list
   of three people, and the tested-by list of two people.  As the entire
   patch set as a whole was reviewed and tested by those people, it seems
   accurate to me.
  
  Doug, I have never ever seen a patch set (specifically the 15~23 part
  of it) with that level of simplicity
 
 That portion of the patchset didn't start out with that level of
 simplicity per patch, it evolved to that because it made review
 *significantly* easier.  It's very simple to review a patch that does:
 
 Add 1 helper
 Replace tests in code with just that 1 helper
 
 because you can scroll through that patch and know that every line being
 replaced is related to that one helper.  If you want to know every line
 that was replaced with rdma_cap_iw_cm, you go to that one patch and it's
 all listed very easy to read.
 
 On the other hand, when you squash all those patches together, review
 becomes much harder because if you want to see what a single helper
 does, you have to sift through all of the other helper changes and hope
 you find the right ones, and that you found all of the right ones.
 
   and
  signature/reviewers/tested-by/etc inflation.
 
 I added those myself as part of an automated addition.  It applied to
 the entire series, so it was put on each patch.  The people that
 tested/reviewed the series did not do so to individual patches, they hit
 all of them.  And as was pointed out a couple of weeks ago on an earlier
 patchset I picked up, it is generally good behavior to give attribution
 where it is due to encourage people to participate.
 

I agree with all the above replies from Doug:

- small patches are easier to read and review: I prefer reviewing 10
littles patches than one big patch.

- Signed-off-by:, Reviewed-by:, Tested-by: and Cc: are required too
so that when it come to modifying existing code, we can contact people
previously involved for reviews, tests, advice, etc.

(I usually add Link: with a reference to the e-mail so that one commit
 can be traced back to the related patch submission and discussions  
 (cover-letter !), and, to be able to trace back all patch iterations,
 I add a link to the previous patchset submission, and so on).

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] libibverbs init.c: remove stderr warnings if no userspace driver found

2015-05-09 Thread Yann Droneaud
Hi,

Le vendredi 08 mai 2015 à 11:21 -0700, Jeff Squyres a écrit :
 Signed-off-by: Jeff Squyres jsquy...@cisco.com

This is a little short for an explanation: what was the issue with the
error messages ?


 ---
  src/init.c | 14 --
  1 file changed, 14 deletions(-)
 
 diff --git a/src/init.c b/src/init.c
 index d0e4b1c..9c21768 100644
 --- a/src/init.c
 +++ b/src/init.c
 @@ -557,19 +557,5 @@ HIDDEN int ibverbs_init(struct ibv_device ***list)
   }
  
  out:
 - for (sysfs_dev = sysfs_dev_list,
 -  next_dev = sysfs_dev ? sysfs_dev-next : NULL;
 -  sysfs_dev;
 -  sysfs_dev = next_dev, next_dev = sysfs_dev ? sysfs_dev-next : 
 NULL) {
 - if (!sysfs_dev-have_driver) {
 - fprintf(stderr, PFX Warning: no userspace 
 device-specific 
 - driver found for %s\n, sysfs_dev-sysfs_path);
 - if (statically_linked)
 - fprintf(stderr,When linking libibverbs 
 statically, 
 - driver must be statically linked 
 too.\n);
 - }
 - free(sysfs_dev);

I believe this free() was necessary to not leak some memory.

 - }
 -
   return num_devices;
  }


Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V1 net-next] IB/ipoib: Fix ndo_get_iflink

2015-04-16 Thread Yann Droneaud
Hi,

Le jeudi 16 avril 2015 à 16:34 +0300, Erez Shitrit a écrit :
 Currently, iflink of the parent interface was always accessed, even 
 when interface didn't have a parent and hence we crashed there.

+ since commit 5aa7add8f14b ('infiniband/ipoib: implement
ndo_get_iflink').

as there was no crash here before AFAIK.

 
 Handle the interface types properly: for a child interface, return
 the ifindex of the parent, for parent interface, return its ifindex.
 
 For child devices, make sure to set the parent pointer prior to
 invoking register_netdevice(), this allows the new ndo to be called
 by the stack immediately after the child device is registered.
 
 Fixes: 5aa7add8f14b ('infiniband/ipoib: implement ndo_get_iflink')

Cc: Nicolas Dichtel nicolas.dich...@6wind.com

 Reported-by: Honggang Li ho...@redhat.com
 Signed-off-by: Erez Shitrit ere...@mellanox.com
 Signed-off-by: Honggang Li ho...@redhat.com
 ---
 
 changes from V0:
  - fixed two typos in the change-log
 
  drivers/infiniband/ulp/ipoib/ipoib_main.c | 5 +
  drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 3 +--
  2 files changed, 6 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c 
 b/drivers/infiniband/ulp/ipoib/ipoib_main.c
 index 657b89b..915ad04 100644
 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
 +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
 @@ -846,6 +846,11 @@ static int ipoib_get_iflink(const struct net_device *dev)
  {
   struct ipoib_dev_priv *priv = netdev_priv(dev);
  
 + /* parent interface */
 + if (!test_bit(IPOIB_FLAG_SUBINTERFACE, priv-flags))
 + return dev-ifindex;
 +
 + /* child/vlan interface */
   return priv-parent-ifindex;
  }
  
 diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c 
 b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
 index 4dd1313..fca1a88 100644
 --- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
 +++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
 @@ -58,6 +58,7 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct 
 ipoib_dev_priv *priv,
   /* MTU will be reset when mcast join happens */
   priv-dev-mtu   = IPOIB_UD_MTU(priv-max_ib_mtu);
   priv-mcast_mtu  = priv-admin_mtu = priv-dev-mtu;
 + priv-parent = ppriv-dev;
   set_bit(IPOIB_FLAG_SUBINTERFACE, priv-flags);
  
   result = ipoib_set_dev_features(priv, ppriv-ca);
 @@ -84,8 +85,6 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct 
 ipoib_dev_priv *priv,
   goto register_failed;
   }
  
 - priv-parent = ppriv-dev;
 -
   ipoib_create_debug_files(priv-dev);
  
   /* RTNL childs don't need proprietary sysfs entries */

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 2/2] IB/core: don't disallow registering region starting at 0x0

2015-04-14 Thread Yann Droneaud
Hi,

Le mardi 14 avril 2015 à 12:20 +0300, Sagi Grimberg a écrit :
 On 4/13/2015 3:56 PM, Yann Droneaud wrote:
  In a call to ib_umem_get(), if address is 0x0 and size is
  already page aligned, check added in commit 8494057ab5e4
  (IB/uverbs: Prevent integer overflow in ib_umem_get address
  arithmetic) will refuse to register a memory region that
  could otherwise be valid (provided vm.mmap_min_addr sysctl
  and mmap_low_allowed SELinux knobs allow userspace to map
  something at address 0x0).
 
  This patch allows back such registration: ib_umem_get()
  should probably don't care of the base address provided it
  can be pinned with get_user_pages().
 
  There's two possible overflows, in (addr + size) and in
  PAGE_ALIGN(addr + size), this patch keep ensuring none
  of them happen while allowing to pin memory at address
  0x0. Anyway, the case of size equal 0 is no more (partially)
  handled as 0-length memory region are disallowed by an
  earlier check.
 
  Link: http://mid.gmane.org/cover.1428929103.git.ydrone...@opteya.com
  Cc: sta...@vger.kernel.org # 8494057ab5e4 (IB/uverbs: Prevent integer 
  overflow in ib_umem_get address arithmetic)
  Cc: Shachar Raindel rain...@mellanox.com
  Cc: Jack Morgenstein ja...@mellanox.com
  Cc: Or Gerlitz ogerl...@mellanox.com
  Signed-off-by: Yann Droneaud ydrone...@opteya.com
  ---
drivers/infiniband/core/umem.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
 
  diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
  index 9ac4068d2088..38acb3cfc545 100644
  --- a/drivers/infiniband/core/umem.c
  +++ b/drivers/infiniband/core/umem.c
  @@ -106,8 +106,8 @@ struct ib_umem *ib_umem_get(struct ib_ucontext 
  *context, unsigned long addr,
   * If the combination of the addr and size requested for this memory
   * region causes an integer overflow, return error.
   */
  -   if ((PAGE_ALIGN(addr + size) = size) ||
  -   (PAGE_ALIGN(addr + size) = addr))
  +   if (((addr + size)  addr) ||
  +   PAGE_ALIGN(addr + size)  (addr + size))
 
 If you do change the first statement to be: (addr + size) = addr
 wouldn't it cover patch #1?
 

Yes, but it doesn't sound a great place to do it: here it's about
overflow, so I'd prefer not doing the null memory region check there.

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH linux-next 1/4] infiniband/ipoib: fix possible NULL pointer dereference in ipoib_get_iflink

2015-04-14 Thread Yann Droneaud
Hi Nicolas,

Le mardi 14 avril 2015 à 17:49 +0200, Nicolas Dichtel a écrit :
 Le 14/04/2015 17:44, Honggang LI a écrit :
  On Tue, Apr 14, 2015 at 08:34:33AM -0700, Eric Dumazet wrote:
  On Tue, 2015-04-14 at 23:20 +0800, Honggang Li wrote:
  Starting monitoring for VG vg_rdma01:   3 logical volume(s) in volume
  group vg_rdma01 monitored
  [  OK  ]
 
 
  CR2: 0120
  ---[ end trace a8610f6e9640eb85 ]---
 
  Signed-off-by: Honggang Li ho...@redhat.com
 
  When was this bug added ?
 
 
  Sorry, I do not know. I ran into this bug today when I'm tracing a race
  condition issue related qib. In order to check upstream linux-next fix
  the race condition or not, I build linux-next-20150414 on two machines. Both
  machines panic at modprobe ib_ipoib. Do you means I need to report a
  bug? But I do not know report to who or where.
 
 Here is the tag:
 Fixes: 5aa7add8f14b (infiniband/ipoib: implement ndo_get_iflink)
 

Pardon me, but this patch was never submitted to
linux-rdma@vger.kernel.org for review !?

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v1 0/2] Fixes on top of CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access

2015-04-13 Thread Yann Droneaud
Hi,

Please find one patch to prevent a possible issue partially
addressed by commit 8494057ab5e4 (IB/uverbs: Prevent integer
overflow in ib_umem_get address arithmetic) (see discussions
in [1]) and another one to add back the possibility of registering
memory mapped at 0 (which is probably not something to be allowed,
but it's probably not up to ib_umem_get() to prevent it).

Changes from v0 [2]:
- don't touch to overflow logic in first patch:
  not modifying the logic here so that the patch can be applied
  even on kernel without the overflow preventing checks,
  and second patch is going to rewrite the check.

- don't break overflow detection in second patch:
  changing less or equal to less comparison broke the overflow
  detection logic regarding to rounding done by PAGE_ALIGN,
  so fixes this by checking for overflow in addr + size,
  then by checking for overflow in PAGE_ALIGN(addr + size).

[1] Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical
 memory access

 http://mid.gmane.org/1428497043.22575.176.ca...@opteya.com
 http://marc.info/?i=1428497043.22575.176.ca...@opteya.com

[2] [PATCH RESEND 0/2] Fixes on top of CVE-2014-8159 kernel: infiniband: 
uverbs: unprotected physical memory access

 http://mid.gmane.org/cover.1428523125.git.ydrone...@opteya.com
 http://marc.info/?i=cover.1428523125.git.ydrone...@opteya.com

Yann Droneaud (2):
  IB/core: disallow registering 0-sized memory region
  IB/core: don't disallow registering region starting at 0x0

 drivers/infiniband/core/umem.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v1 1/2] IB/core: disallow registering 0-sized memory region

2015-04-13 Thread Yann Droneaud
If ib_umem_get() is called with a size equal to 0 and an
non-page aligned address, one page will be pinned and a
0-sized umem will be returned to the caller.

This should not be allowed: it's not expected for a memory
region to have a size equal to 0.

This patch adds a check to explicitly refuse to register
a 0-sized region.

Link: http://mid.gmane.org/cover.1428929103.git.ydrone...@opteya.com
Cc: sta...@vger.kernel.org
Cc: Shachar Raindel rain...@mellanox.com
Cc: Jack Morgenstein ja...@mellanox.com
Cc: Or Gerlitz ogerl...@mellanox.com
Signed-off-by: Yann Droneaud ydrone...@opteya.com
---
 drivers/infiniband/core/umem.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 8c014b5dab4c..9ac4068d2088 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -99,6 +99,9 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, 
unsigned long addr,
if (dmasync)
dma_set_attr(DMA_ATTR_WRITE_BARRIER, attrs);
 
+   if (!size)
+   return ERR_PTR(-EINVAL);
+
/*
 * If the combination of the addr and size requested for this memory
 * region causes an integer overflow, return error.
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access

2015-04-13 Thread Yann Droneaud
Hi,

Le jeudi 02 avril 2015 à 18:12 +, Haggai Eran a écrit :
 On Thursday, April 2, 2015 7:44 PM, Shachar Raindel wrote:
  -Original Message-
  From: Yann Droneaud [mailto:ydrone...@opteya.com]
  Le jeudi 02 avril 2015 à 18:18 +0300, Haggai Eran a écrit :
   On 02/04/2015 16:30, Yann Droneaud wrote:
   Le jeudi 02 avril 2015 à 10:52 +, Shachar Raindel a écrit :
   -Original Message-
   From: Yann Droneaud [mailto:ydrone...@opteya.com]
   Sent: Thursday, April 02, 2015 1:05 PM
   Le mercredi 18 mars 2015 à 17:39 +, Shachar Raindel a écrit :
  
   +  /*
   +   * If the combination of the addr and size requested for this
   memory
   +   * region causes an integer overflow, return error.
   +   */
   +  if ((PAGE_ALIGN(addr + size) = size) ||
   +  (PAGE_ALIGN(addr + size) = addr))
   +  return ERR_PTR(-EINVAL);
   +
  
   Can access_ok() be used here ?
  
if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
   addr, size))
 return ERR_PTR(-EINVAL);
  
  
   No, this will break the current ODP semantics.
  
   ODP allows the user to register memory that is not accessible yet.
   This is a critical design feature, as it allows avoiding holding
   a registration cache. Adding this check will break the behavior,
   forcing memory to be all accessible when registering an ODP MR.
  
  
   Where's the check for the range being in userspace memory space,
   especially for the ODP case ?
  
   For non ODP case (eg. plain old behavior), does get_user_pages()
   ensure the requested pages fit in userspace region on all
   architectures ? I think so.
  
   Yes, get_user_pages will return a smaller amount of pages than
  requested
   if it encounters an unmapped region (or a region without write
   permissions for write requests). If this happens, the loop in
   ib_umem_get calls get_user_pages again with the next set of pages, and
   this time if it the first page still cannot be mapped an error is
  returned.
  
  
   In ODP case, I'm not sure such check is ever done ?
  
   In ODP, we also call get_user_pages, but only when a page fault occurs
   (see ib_umem_odp_map_dma_pages()). This allows the user to pre-
  register
   a memory region that contains unmapped virtual space, and then mmap
   different files into that area without needing to re-register.
  
 
  OK, thanks for the description.
 
  ...
 
  Another related question: as the large memory range could be registered
  by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND),
  what's prevent the kernel to map a file as the result of mmap(0, ...)
  in this  region, making it available remotely through IBV_WR_RDMA_READ /
  IBV_WR_RDMA_WRITE ?
 
  
  This is not a bug. This is a feature.
  
  Exposing a file through RDMA, using ODP, can be done exactly like this.
  Given that the application explicitly requested this behavior, I don't
  see why it is a problem. Actually, some of our tests use such flows.
  The mmu notifiers mechanism allow us to do this safely. When the page is
  written back to disk, it is removed from the ODP mapping. When it is
  accessed by the HCA, it is brought back to RAM.
  
 
 I want to add that we would like to see users registering a very large
 memory region (perhaps the entire process address space) for local
 access, and then enabling remote access only to specific regions using
 memory windows. However, this isn't supported yet by our driver.

In such scheme, the registration must still be handled manually:
one has to create a memory window to get a rkey to be exchanged with
a peer, so why one would want to register such a large memory region
(the whole process space) ?

I guess creating the memory window is faster than registering memory
region. I'd rather say this is not an excuse to register a larger 
memory region (up to the whole process space, current and future) as it 
sounds like a surprising optimisation: let the HCA known too many
pages just to be sure it already knows some when the process want to 
use them. It seems it would become difficult to handle if there's too
many processes.

I would prefer creating memory region becoming costless (through ODP :).

  Still, there are valid cases where you would still want the results
 of an mmap(0,...) call to be remotely accessible, in cases where there
 is enough trust between the local process and the remote process.

mmap(0, , fd) let the kernel choose where to put the file in 
process virtual memory space, so it may, may not, partially, end up in 
an ODP pre registered memory region for a range unallocated/unused yet.

I don't think one want such to happen.

  It may help a middleware communication library register a large
 portion of the address space in advance, and still work with random
 pointers given to it by another application module.
 

But as said in the beginnig of your message, the middleware would have
bind a memory window

[PATCH v1 2/2] IB/core: don't disallow registering region starting at 0x0

2015-04-13 Thread Yann Droneaud
In a call to ib_umem_get(), if address is 0x0 and size is
already page aligned, check added in commit 8494057ab5e4
(IB/uverbs: Prevent integer overflow in ib_umem_get address
arithmetic) will refuse to register a memory region that
could otherwise be valid (provided vm.mmap_min_addr sysctl
and mmap_low_allowed SELinux knobs allow userspace to map
something at address 0x0).

This patch allows back such registration: ib_umem_get()
should probably don't care of the base address provided it
can be pinned with get_user_pages().

There's two possible overflows, in (addr + size) and in
PAGE_ALIGN(addr + size), this patch keep ensuring none
of them happen while allowing to pin memory at address
0x0. Anyway, the case of size equal 0 is no more (partially)
handled as 0-length memory region are disallowed by an
earlier check.

Link: http://mid.gmane.org/cover.1428929103.git.ydrone...@opteya.com
Cc: sta...@vger.kernel.org # 8494057ab5e4 (IB/uverbs: Prevent integer 
overflow in ib_umem_get address arithmetic)
Cc: Shachar Raindel rain...@mellanox.com
Cc: Jack Morgenstein ja...@mellanox.com
Cc: Or Gerlitz ogerl...@mellanox.com
Signed-off-by: Yann Droneaud ydrone...@opteya.com
---
 drivers/infiniband/core/umem.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 9ac4068d2088..38acb3cfc545 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -106,8 +106,8 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, 
unsigned long addr,
 * If the combination of the addr and size requested for this memory
 * region causes an integer overflow, return error.
 */
-   if ((PAGE_ALIGN(addr + size) = size) ||
-   (PAGE_ALIGN(addr + size) = addr))
+   if (((addr + size)  addr) ||
+   PAGE_ALIGN(addr + size)  (addr + size))
return ERR_PTR(-EINVAL);
 
if (!can_do_mlock())
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Stepping down as maintainer (was Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management)

2015-04-12 Thread Yann Droneaud
Hi,

Le vendredi 10 avril 2015 à 20:39 +, Hefty, Sean a écrit :
  I'm on a perhaps-unfortunately-timed vacation this week, but to make
  the transition smooth, I'll handle the easy patches that are in flight
  for 4.1, and then turn things over to the community for 4.2.  (Unless
  the feeling is that it's better for me to step down immediately — I
  have no problem doing that, I just don't want to leave people in the
  lurch)
 
 This transition plan makes sense to me.  I can dedicate more of my
 time helping to review kernel patches, at least for a couple of
 months.  Hal is listed as the other sub-maintainer.  He usually limits
 his input to the IB mgmt components, but maybe he can contribute more
 until we're over this hump?
 
 I'm not trying to volunteer anyone, but longer term, Doug appears to
 be in an ideal situation for maintaining the RDMA subtree.  Because of
 his position at RedHat, he has a strong interest in ensuring user
 space compatibility, can test across a wide range of devices, and
 needs to deal with long term maintenance issues.
 

That would make sense to have someone like Doug or Jason being the
potential successors to Roland in order to keep InfiniBand subsystem
vendor neutrality.

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access

2015-04-08 Thread Yann Droneaud
Hi,

Le jeudi 02 avril 2015 à 16:34 +, Shachar Raindel a écrit :
  -Original Message-
  From: Yann Droneaud [mailto:ydrone...@opteya.com]
  Sent: Thursday, April 02, 2015 6:16 PM
  Le jeudi 02 avril 2015 à 10:52 +, Shachar Raindel a écrit :
-Original Message-
From: Yann Droneaud [mailto:ydrone...@opteya.com]
Sent: Thursday, April 02, 2015 1:05 PM
Le mercredi 18 mars 2015 à 17:39 +, Shachar Raindel a écrit :
  ...
 + /*
 +  * If the combination of the addr and size requested for this
memory
 +  * region causes an integer overflow, return error.
 +  */
 + if ((PAGE_ALIGN(addr + size) = size) ||
 + (PAGE_ALIGN(addr + size) = addr))
 + return ERR_PTR(-EINVAL);
 +
   
Can access_ok() be used here ?
   
 if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
addr, size))
  return ERR_PTR(-EINVAL);
   
  
   No, this will break the current ODP semantics.
  
   ODP allows the user to register memory that is not accessible yet.
   This is a critical design feature, as it allows avoiding holding
   a registration cache. Adding this check will break the behavior,
   forcing memory to be all accessible when registering an ODP MR.
  
  
  Failed to notice previously, but since this would break ODP, and ODP is
  only available starting v3.19-rc1, my proposed fix might be applicable
  for older kernel (if not better).
  
 
 Can you explain how this proposed fix is better than the existing patch?
 Why do we want to push to the stable tree a patch that is not in the
 upstream? There is an existing, tested, patch that is going to the tip
 of the development. It even applies cleanly on every kernel version around.
 

access_ok() check for overflow *and* that the region is the memory range
for the current process. The later check is not done in your proposed 
fix (but it should not be needed as get_user_pages() will be called 
to validate the whole region for non-ODP memory registration).

Anyway, AFAIK access_ok() won't check for address being not NULL and
size not being 0, and I've noticed your proposed fix also ensure address
is not equal to NULL and, more important, that size is not equal to 0:
before v3.15-rc1 and commit eeb8461e36c9 (IB: Refactor umem to use
linear SG table), calling ib_umem_get() with size equal to 0 would 
succeed with any arbitrary address ... who knows what might happen in 
the lowlevel drivers (aka. providers) if they got an umem for a 0-sized
memory region.
This part of the changes was not detailled in your commit message: it's
an issue not related to overflow which is addressed by your patch.

So I agree my proposed patch is no better than yours: I've missed the
0-sized memory region issue and didn't take care of NULL address.

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access

2015-04-08 Thread Yann Droneaud
Hi,

Le mercredi 08 avril 2015 à 14:19 +0200, Yann Droneaud a écrit :
 Le jeudi 02 avril 2015 à 16:34 +, Shachar Raindel a écrit :
   -Original Message-
   From: Yann Droneaud [mailto:ydrone...@opteya.com]
   Sent: Thursday, April 02, 2015 6:16 PM
   Le jeudi 02 avril 2015 à 10:52 +, Shachar Raindel a écrit :
 -Original Message-
 From: Yann Droneaud [mailto:ydrone...@opteya.com]
 Sent: Thursday, April 02, 2015 1:05 PM
 Le mercredi 18 mars 2015 à 17:39 +, Shachar Raindel a écrit :
   ...
  +   /*
  +* If the combination of the addr and size requested for this
 memory
  +* region causes an integer overflow, return error.
  +*/
  +   if ((PAGE_ALIGN(addr + size) = size) ||
  +   (PAGE_ALIGN(addr + size) = addr))
  +   return ERR_PTR(-EINVAL);
  +

 Can access_ok() be used here ?

  if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
 addr, size))
   return ERR_PTR(-EINVAL);

   
No, this will break the current ODP semantics.
   
ODP allows the user to register memory that is not accessible yet.
This is a critical design feature, as it allows avoiding holding
a registration cache. Adding this check will break the behavior,
forcing memory to be all accessible when registering an ODP MR.
   
   
   Failed to notice previously, but since this would break ODP, and ODP is
   only available starting v3.19-rc1, my proposed fix might be applicable
   for older kernel (if not better).
   
  
  Can you explain how this proposed fix is better than the existing patch?
  Why do we want to push to the stable tree a patch that is not in the
  upstream? There is an existing, tested, patch that is going to the tip
  of the development. It even applies cleanly on every kernel version around.
  
 
 access_ok() check for overflow *and* that the region is the memory range
 for the current process. The later check is not done in your proposed 
 fix (but it should not be needed as get_user_pages() will be called 
 to validate the whole region for non-ODP memory registration).
 
 Anyway, AFAIK access_ok() won't check for address being not NULL and
 size not being 0, and I've noticed your proposed fix also ensure address
 is not equal to NULL and, more important, that size is not equal to 0

It only check address not being 0 if size is already PAGE_SIZE aligned,
and it only check size not being 0 if address is already PAGE_SIZE
aligned.

 before v3.15-rc1 and commit eeb8461e36c9 (IB: Refactor umem to use
 linear SG table), calling ib_umem_get() with size equal to 0 would 
 succeed with any arbitrary address ... who knows what might happen in 
 the lowlevel drivers (aka. providers) if they got an umem for a 0-sized
 memory region.
 This part of the changes was not detailled in your commit message: it's
 an issue not related to overflow which is addressed by your patch.
 
 So I agree my proposed patch is no better than yours: I've missed the
 0-sized memory region issue and didn't take care of NULL address.
 

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] IB/core: disallow registering 0-sized memory region

2015-04-08 Thread Yann Droneaud
If ib_umem_get() is called with a size equal to 0 and an
non-page aligned address, one page will be pinned and a
0-sized umem will be returned to the caller.

This should not be allowed: it's not expected for a memory
region to have a size equal to 0.

This patch adds a check to explicitly refuse to register
a 0-sized region.

Additionally, it updates check added in commit 8494057ab5e4
(IB/uverbs: Prevent integer overflow in ib_umem_get
address arithmetic) to not care about 0-sized region:
it would had catched 0-sized region only if address was
already page aligned.

Link: http://mid.gmane.org/cover.1428502843.git.ydrone...@opteya.com
Cc: sta...@vger.kernel.org # 8494057ab5e4 (IB/uverbs: Prevent integer overflow 
in ib_umem_get address arithmetic)
Cc: Shachar Raindel rain...@mellanox.com
Cc: Jack Morgenstein ja...@mellanox.com
Cc: Or Gerlitz ogerl...@mellanox.com
Signed-off-by: Yann Droneaud ydrone...@opteya.com
---
 drivers/infiniband/core/umem.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 8c014b5dab4c..cbe361645c1b 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -99,12 +99,15 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, 
unsigned long addr,
if (dmasync)
dma_set_attr(DMA_ATTR_WRITE_BARRIER, attrs);
 
+   if (!size)
+   return ERR_PTR(-EINVAL);
+
/*
 * If the combination of the addr and size requested for this memory
 * region causes an integer overflow, return error.
 */
if ((PAGE_ALIGN(addr + size) = size) ||
-   (PAGE_ALIGN(addr + size) = addr))
+   (PAGE_ALIGN(addr + size)  addr))
return ERR_PTR(-EINVAL);
 
if (!can_do_mlock())
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] IB/core: don't disallow registering region starting at 0x0

2015-04-08 Thread Yann Droneaud
In a call to ib_umem_get(), if address is 0x0 and size is
already page aligned, check added in commit 8494057ab5e4
(IB/uverbs: Prevent integer overflow in ib_umem_get address
arithmetic) will refuse to register a memory region that
could otherwise be valid (provided vm.mmap_min_addr sysctl
and mmap_low_allowed SELinux knobs allow userspace to map
something at address 0x0).

This patch allows back such registration: ib_umem_get()
should probably don't care of the base address provided it
can be pinned with get_user_pages().

Link: http://mid.gmane.org/cover.1428502843.git.ydrone...@opteya.com
Cc: sta...@vger.kernel.org # 8494057ab5e4 (IB/uverbs: Prevent integer overflow 
in ib_umem_get address arithmetic)
Cc: Shachar Raindel rain...@mellanox.com
Cc: Jack Morgenstein ja...@mellanox.com
Cc: Or Gerlitz ogerl...@mellanox.com
Signed-off-by: Yann Droneaud ydrone...@opteya.com
---
 drivers/infiniband/core/umem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index cbe361645c1b..a0aadd05ab6d 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -106,7 +106,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, 
unsigned long addr,
 * If the combination of the addr and size requested for this memory
 * region causes an integer overflow, return error.
 */
-   if ((PAGE_ALIGN(addr + size) = size) ||
+   if ((PAGE_ALIGN(addr + size)  size) ||
(PAGE_ALIGN(addr + size)  addr))
return ERR_PTR(-EINVAL);
 
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2] Fixes on top of CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access

2015-04-08 Thread Yann Droneaud
Hi,

Please find one patch to prevent a possible issue partially
addressed by commit 8494057ab5e4 (IB/uverbs: Prevent integer
overflow in ib_umem_get address arithmetic) (see discussions
in [1]) and another one to add back the possibility of registering
memory mapped at 0 (which is probably not something to be allowed,
but it's not up to ib_umem_get() to prevent it).

[1] Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical
 memory access

 http://mid.gmane.org/1428497043.22575.176.ca...@opteya.com
 http://marc.info/?i=1428497043.22575.176.ca...@opteya.com

Regards.

Yann Droneaud (2):
  IB/core: disallow registering 0-sized memory region
  IB/core: don't disallow registering region starting at 0x0

 drivers/infiniband/core/umem.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RESEND 1/2] IB/core: disallow registering 0-sized memory region

2015-04-08 Thread Yann Droneaud
If ib_umem_get() is called with a size equal to 0 and an
non-page aligned address, one page will be pinned and a
0-sized umem will be returned to the caller.

This should not be allowed: it's not expected for a memory
region to have a size equal to 0.

This patch adds a check to explicitly refuse to register
a 0-sized region.

Additionally, it updates check added in commit 8494057ab5e4
(IB/uverbs: Prevent integer overflow in ib_umem_get
address arithmetic) to not care about 0-sized region:
it would had catched 0-sized region only if address was
already page aligned.

Link: http://mid.gmane.org/cover.1428523125.git.ydrone...@opteya.com
Cc: sta...@vger.kernel.org # 8494057ab5e4 (IB/uverbs: Prevent integer 
overflow in ib_umem_get address arithmetic)
Cc: Shachar Raindel rain...@mellanox.com
Cc: Jack Morgenstein ja...@mellanox.com
Cc: Or Gerlitz ogerl...@mellanox.com
Signed-off-by: Yann Droneaud ydrone...@opteya.com
---
 drivers/infiniband/core/umem.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 8c014b5dab4c..cbe361645c1b 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -99,12 +99,15 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, 
unsigned long addr,
if (dmasync)
dma_set_attr(DMA_ATTR_WRITE_BARRIER, attrs);
 
+   if (!size)
+   return ERR_PTR(-EINVAL);
+
/*
 * If the combination of the addr and size requested for this memory
 * region causes an integer overflow, return error.
 */
if ((PAGE_ALIGN(addr + size) = size) ||
-   (PAGE_ALIGN(addr + size) = addr))
+   (PAGE_ALIGN(addr + size)  addr))
return ERR_PTR(-EINVAL);
 
if (!can_do_mlock())
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RESEND 2/2] IB/core: don't disallow registering region starting at 0x0

2015-04-08 Thread Yann Droneaud
In a call to ib_umem_get(), if address is 0x0 and size is
already page aligned, check added in commit 8494057ab5e4
(IB/uverbs: Prevent integer overflow in ib_umem_get address
arithmetic) will refuse to register a memory region that
could otherwise be valid (provided vm.mmap_min_addr sysctl
and mmap_low_allowed SELinux knobs allow userspace to map
something at address 0x0).

This patch allows back such registration: ib_umem_get()
should probably don't care of the base address provided it
can be pinned with get_user_pages().

Link: http://mid.gmane.org/cover.1428523125.git.ydrone...@opteya.com
Cc: sta...@vger.kernel.org # 8494057ab5e4 (IB/uverbs: Prevent integer 
overflow in ib_umem_get address arithmetic)
Cc: Shachar Raindel rain...@mellanox.com
Cc: Jack Morgenstein ja...@mellanox.com
Cc: Or Gerlitz ogerl...@mellanox.com
Signed-off-by: Yann Droneaud ydrone...@opteya.com
---
 drivers/infiniband/core/umem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index cbe361645c1b..a0aadd05ab6d 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -106,7 +106,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, 
unsigned long addr,
 * If the combination of the addr and size requested for this memory
 * region causes an integer overflow, return error.
 */
-   if ((PAGE_ALIGN(addr + size) = size) ||
+   if ((PAGE_ALIGN(addr + size)  size) ||
(PAGE_ALIGN(addr + size)  addr))
return ERR_PTR(-EINVAL);
 
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access

2015-04-03 Thread Yann Droneaud
Hi,

Le vendredi 03 avril 2015 à 08:39 +, Haggai Eran a écrit :
 On Thursday, April 2, 2015 11:40 PM, Yann Droneaud ydrone...@opteya.com 
 wrote:
  Le jeudi 02 avril 2015 à 16:44 +, Shachar Raindel a écrit :
   -Original Message-
   From: Yann Droneaud [mailto:ydrone...@opteya.com]
   Sent: Thursday, April 02, 2015 7:35 PM
  
   Another related question: as the large memory range could be registered
   by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND),
   what's prevent the kernel to map a file as the result of mmap(0, ...)
   in this  region, making it available remotely through IBV_WR_RDMA_READ /
   IBV_WR_RDMA_WRITE ?
  
 
  This is not a bug. This is a feature.
 
  Exposing a file through RDMA, using ODP, can be done exactly like this.
  Given that the application explicitly requested this behavior, I don't
  see why it is a problem.
  
  If the application cannot choose what will end up in the region it has
  registered, it's an issue !
  
  What might happen if one library in a program call mmap(0, size, ...) to
  load a file storing a secret (a private key), and that file ends up
  being mapped in an registered but otherwise free region (afaict, the
  kernel is allowed to do it) ?
  What might happen if one library in a program call call mmap(0,
  size, ..., MAP_ANONYMOUS,...) to allocate memory, call mlock(), then
  write in this location a secret (a passphrase), and that area ends up
  in the memory region registered for on demand paging ?
  
  The application haven't choose to disclose these confidential piece of
  information, but they are available for reading/writing by remote
  through RDMA given it knows the rkey of the memory region (which is a
  32bits value).
  
  I hope I'm missing something, because I'm not feeling confident such
  behavor is a feature.
 
 What we are aiming for is the possibility to register the entire process' 
 address 
 space for RDMA operations (if the process chooses to use this feature).
 This is similar to multiple threads accessing the same address space. I'm 
 sure 
 you wouldn't be complaining about the ability of one thread to access the 
 secret 
 passphrase mmapped by another thread in your example.
 
  I'm trying to understand how the application can choose what is exposed
  through RDMA if it registers a very large memory region for later use
  (but do not actually explicitly map something there yet): what's the
  consequences ?
  
 void *start = sbrk(0);
 size_t size = ULONG_MAX - (unsigned long)start;
  
 ibv_reg_mr(pd, start, size, IB_ACCESS_ON_DEMAND)
 
 The consequences are exactly as you wrote. Just as giving a non-ODP rkey 
 to a remote node allows the node to access the registered memory behind that 
 rkey, giving an ODP rkey to a remote node allows that node to access the 
 virtual address space behind that rkey.
 

There's a difference: it's impossible to give a valid non-ODP rkey that
point to a memory region not already mapped (backed by a file for 
example), so the application *choose* the content of the memory to be
made accessible remotely before making it accessible.

As I understand the last explanation regarding ODP, at creation time,
an ODP rkey can point to a free, unused, unallocated memory portion.
At this point the kernel can happily map anything the application
(and its libraries) want to map at a (almost) *random* address that
could be in (or partially in) the ODP memory region.

And I have a problem with such random behavior. Allowing this is seems
dangerous and should be done with care.

I believe the application must kept the control of what's end up in its 
ODP registered memory region.

Especially for multi thread program: imagine one thread creating a large
memory region for its future purposes, then send the rkey to a remote 
peer and wait for some work to be done.
In the mean time another call mmap(0, ...) to map a file at a kernel 
chosen address, and that address happen to be in the memory region 
registered by the other thread:

1) the first thread is amputated from a portion of memory it was 
willing to use;
2) the data used by the second thread is accessible to the remote 
peer(s) while not expected.

Speculatively registering memory seems dangerous for any use case I
could think of.

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access

2015-04-02 Thread Yann Droneaud
Hi,

Le jeudi 02 avril 2015 à 16:44 +, Shachar Raindel a écrit :
  -Original Message-
  From: Yann Droneaud [mailto:ydrone...@opteya.com]
  Sent: Thursday, April 02, 2015 7:35 PM

  Another related question: as the large memory range could be registered
  by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND),
  what's prevent the kernel to map a file as the result of mmap(0, ...)
  in this  region, making it available remotely through IBV_WR_RDMA_READ /
  IBV_WR_RDMA_WRITE ?
  
 
 This is not a bug. This is a feature.
 
 Exposing a file through RDMA, using ODP, can be done exactly like this.
 Given that the application explicitly requested this behavior, I don't
 see why it is a problem. 

If the application cannot choose what will end up in the region it has
registered, it's an issue !

What might happen if one library in a program call mmap(0, size, ...) to
load a file storing a secret (a private key), and that file ends up 
being mapped in an registered but otherwise free region (afaict, the 
kernel is allowed to do it) ?
What might happen if one library in a program call call mmap(0, 
size, ..., MAP_ANONYMOUS,...) to allocate memory, call mlock(), then
write in this location a secret (a passphrase), and that area ends up
in the memory region registered for on demand paging ?

The application haven't choose to disclose these confidential piece of 
information, but they are available for reading/writing by remote
through RDMA given it knows the rkey of the memory region (which is a 
32bits value).

I hope I'm missing something, because I'm not feeling confident such
behavor is a feature.


 Actually, some of our tests use such flows.
 The mmu notifiers mechanism allow us to do this safely. When the page is
 written back to disk, it is removed from the ODP mapping. When it is
 accessed by the HCA, it is brought back to RAM.
 

I'm not discussing about the benefit of On Demand Paging and why it's a
very good feature to expose files through RDMA.

I'm trying to understand how the application can choose what is exposed
through RDMA if it registers a very large memory region for later use 
(but do not actually explicitly map something there yet): what's the
consequences ?

   void *start = sbrk(0);
   size_t size = ULONG_MAX - (unsigned long)start;

   ibv_reg_mr(pd, start, size, IB_ACCESS_ON_DEMAND)


Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access

2015-04-02 Thread Yann Droneaud
Hi,

Le mercredi 18 mars 2015 à 17:39 +, Shachar Raindel a écrit :
 Hi,
 
 It was found that the Linux kernel's InfiniBand/RDMA subsystem did not
 properly sanitize input parameters while registering memory regions
 from user space via the (u)verbs API. A local user with access to
 a /dev/infiniband/uverbsX device could use this flaw to crash the
 system or, potentially, escalate their privileges on the system.
 
 The issue has been assigned CVE-2014-8159.
 
 The issue exists in the InfiniBand/RDMA/iWARP drivers since Linux
 Kernel version 2.6.13.
 
 Mellanox OFED 2.4-1.0.4 fixes the issue. Available from:
 http://www.mellanox.com/page/products_dyn?product_family=26mtag=linux_sw_drivers
  
 
 RedHat errata: https://access.redhat.com/security/cve/CVE-2014-8159
 Canonical errata: 
 http://people.canonical.com/~ubuntu-security/cve/2014/CVE-2014-8159.html
 Novell (Suse) bug tracking: https://bugzilla.novell.com/show_bug.cgi?id=914742
 
 
 The following patch fixes the issue:
 
 --- 8 --
 
 From d4d68430d4a12c569e28b4f4468284ea2286 Mon Sep 17 00:00:00 2001
 From: Shachar Raindel rain...@mellanox.com
 Date: Sun, 04 Jan 2015 18:30:32 +0200
 Subject: [PATCH] IB/core: Prevent integer overflow in ib_umem_get address 
 arithmetic
 
 Properly verify that the resulting page aligned end address is larger
 than both the start address and the length of the memory area
 requested.
 
 Both the start and length arguments for ib_umem_get are controlled by
 the user. A misbehaving user can provide values which will cause an
 integer overflow when calculating the page aligned end address.
 
 This overflow can cause also miscalculation of the number of pages
 mapped, and additional logic issues.
 
 Signed-off-by: Shachar Raindel rain...@mellanox.com
 Signed-off-by: Jack Morgenstein ja...@mellanox.com
 Signed-off-by: Or Gerlitz ogerl...@mellanox.com
 ---
 
 diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
 index aec7a6a..8c014b5 100644
 --- a/drivers/infiniband/core/umem.c
 +++ b/drivers/infiniband/core/umem.c
 @@ -99,6 +99,14 @@
   if (dmasync)
   dma_set_attr(DMA_ATTR_WRITE_BARRIER, attrs);
  
 + /*
 +  * If the combination of the addr and size requested for this memory
 +  * region causes an integer overflow, return error.
 +  */
 + if ((PAGE_ALIGN(addr + size) = size) ||
 + (PAGE_ALIGN(addr + size) = addr))
 + return ERR_PTR(-EINVAL);
 +

Can access_ok() be used here ?

 if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
addr, size))
  return ERR_PTR(-EINVAL);


   if (!can_do_mlock())
   return ERR_PTR(-EPERM);
 

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access

2015-04-02 Thread Yann Droneaud
Hi,
Le jeudi 02 avril 2015 à 10:52 +, Shachar Raindel a écrit :
  -Original Message-
  From: Yann Droneaud [mailto:ydrone...@opteya.com]
  Sent: Thursday, April 02, 2015 1:05 PM
  Le mercredi 18 mars 2015 à 17:39 +, Shachar Raindel a écrit :
...
   + /*
   +  * If the combination of the addr and size requested for this
  memory
   +  * region causes an integer overflow, return error.
   +  */
   + if ((PAGE_ALIGN(addr + size) = size) ||
   + (PAGE_ALIGN(addr + size) = addr))
   + return ERR_PTR(-EINVAL);
   +
  
  Can access_ok() be used here ?
  
   if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
  addr, size))
return ERR_PTR(-EINVAL);
  
 
 No, this will break the current ODP semantics.
 
 ODP allows the user to register memory that is not accessible yet.
 This is a critical design feature, as it allows avoiding holding
 a registration cache. Adding this check will break the behavior,
 forcing memory to be all accessible when registering an ODP MR.
 

Failed to notice previously, but since this would break ODP, and ODP is 
only available starting v3.19-rc1, my proposed fix might be applicable 
for older kernel (if not better).

From 2a3beaeb4b35d968f127cb59cfda2f12dbd87e4b Mon Sep 17 00:00:00 2001
From: Yann Droneaud ydrone...@opteya.com
Date: Thu, 2 Apr 2015 17:01:05 +0200
Subject: [RFC PATCH] IB/core: reject invalid userspace memory range registration

Signed-off-by: Yann Droneaud ydrone...@opteya.com
---
 drivers/infiniband/core/umem.c | 23 +++
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index df0c4f605a21..6758b4ac56eb 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -90,6 +90,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, 
unsigned long addr,
DEFINE_DMA_ATTRS(attrs);
struct scatterlist *sg, *sg_list_start;
int need_release = 0;
+   bool writable;
 
if (dmasync)
dma_set_attr(DMA_ATTR_WRITE_BARRIER, attrs);
@@ -97,6 +98,19 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, 
unsigned long addr,
if (!can_do_mlock())
return ERR_PTR(-EPERM);
 
+   /*
+* We ask for writable memory if any access flags other than
+* remote read are set.  Local write and remote write
+* obviously require write access.  Remote atomic can do
+* things like fetch and add, which will modify memory, and
+* MW bind can change permissions by binding a window.
+*/
+   writable  = !!(access  ~IB_ACCESS_REMOTE_READ);
+
+   if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
+  (void __user *)addr, size))
+   return ERR_PTR(-EFAULT);
+
umem = kzalloc(sizeof *umem, GFP_KERNEL);
if (!umem)
return ERR_PTR(-ENOMEM);
@@ -106,14 +120,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, 
unsigned long addr,
umem-offset= addr  ~PAGE_MASK;
umem-page_size = PAGE_SIZE;
umem-pid   = get_task_pid(current, PIDTYPE_PID);
-   /*
-* We ask for writable memory if any access flags other than
-* remote read are set.  Local write and remote write
-* obviously require write access.  Remote atomic can do
-* things like fetch and add, which will modify memory, and
-* MW bind can change permissions by binding a window.
-*/
-   umem-writable  = !!(access  ~IB_ACCESS_REMOTE_READ);
+   umem-writable  = writable;
 
/* We assume the memory is from hugetlb until proved otherwise */
umem-hugetlb   = 1;
-- 
2.1.0

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access

2015-04-02 Thread Yann Droneaud
Hi,

Le jeudi 02 avril 2015 à 10:52 +, Shachar Raindel a écrit :
  -Original Message-
  From: Yann Droneaud [mailto:ydrone...@opteya.com]
  Sent: Thursday, April 02, 2015 1:05 PM
  Le mercredi 18 mars 2015 à 17:39 +, Shachar Raindel a écrit :

   + /*
   +  * If the combination of the addr and size requested for this
  memory
   +  * region causes an integer overflow, return error.
   +  */
   + if ((PAGE_ALIGN(addr + size) = size) ||
   + (PAGE_ALIGN(addr + size) = addr))
   + return ERR_PTR(-EINVAL);
   +
  
  Can access_ok() be used here ?
  
   if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
  addr, size))
return ERR_PTR(-EINVAL);
  
 
 No, this will break the current ODP semantics.
 
 ODP allows the user to register memory that is not accessible yet.
 This is a critical design feature, as it allows avoiding holding
 a registration cache. Adding this check will break the behavior,
 forcing memory to be all accessible when registering an ODP MR.
 

Where's the check for the range being in userspace memory space,
especially for the ODP case ?

For non ODP case (eg. plain old behavior), does get_user_pages()
ensure the requested pages fit in userspace region on all 
architectures ? I think so.

In ODP case, I'm not sure such check is ever done ?
(Aside, does it take special mesure to protect shared mapping from
being read and/or *written* ?)

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[-stable] commit 377b513485fd (IB/core: Avoid leakage from kernel to user space)

2015-03-27 Thread Yann Droneaud
Hi,

Please add commit 377b513485fd (IB/core: Avoid leakage from kernel to
user space) to -stable. It can be applied to v2.6.32 and later.

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rdma: replace deprecated ifconfig in doc

2015-03-10 Thread Yann Droneaud
Hi,

Le lundi 09 mars 2015 à 11:29 -0700, Stephen Hemminger a écrit :
 On Mon, 09 Mar 2015 13:06:30 +0100
 Yann Droneaud ydrone...@opteya.com wrote:
  Le dimanche 08 mars 2015 à 16:36 -0700, Stephen Hemminger a écrit :
   The ifconfig command has been deprecated for many years.
   To encourage new users not to continue using it and learning
   iproute2; the ifconfig should not be used in examples.
   
   Signed-off-by: Stephen Hemminger step...@networkplumber.org
   
   
   --- a/Documentation/filesystems/nfs/nfs-rdma.txt  2014-09-28 
   16:27:20.393505440 -0700
   +++ b/Documentation/filesystems/nfs/nfs-rdma.txt  2015-03-08 
   16:33:21.483985810 -0700
   @@ -187,8 +187,10 @@ Check RDMA and NFS Setup
To further test the InfiniBand software stack, use IPoIB (this
assumes you have two IB hosts named host1 and host2):

   -host1$ ifconfig ib0 a.b.c.x
   -host2$ ifconfig ib0 a.b.c.y
   +host1$ ip link set dev ib0 up
   +host1$ ip address add dev ib0 a.b.c.x
   +host2$ ip link set dev ib0 up
  
  I don't think we need to up the device twice.
 
 You need to do it on each host.
 

Oops, missed that. Sorry.

   +host2$ ip address add dev ib0 a.b.c.y
host1$ ping a.b.c.y
host2$ ping a.b.c.x

   @@ -229,7 +231,8 @@ NFS/RDMA Setup

$ modprobe ib_mthca
$ modprobe ib_ipoib
   -$ ifconfig ib0 a.b.c.d
   +$ ip li set dev ib0 up
  
  s/li/link ?
 
 ip commands support shorthand.
 

Yes, sure, but here, it's looking more like a typo.
It should be consistent throughout the document.

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rdma: replace deprecated ifconfig in doc

2015-03-09 Thread Yann Droneaud
Hi,

Le dimanche 08 mars 2015 à 16:36 -0700, Stephen Hemminger a écrit :
 The ifconfig command has been deprecated for many years.
 To encourage new users not to continue using it and learning
 iproute2; the ifconfig should not be used in examples.
 
 Signed-off-by: Stephen Hemminger step...@networkplumber.org
 
 
 --- a/Documentation/filesystems/nfs/nfs-rdma.txt  2014-09-28 
 16:27:20.393505440 -0700
 +++ b/Documentation/filesystems/nfs/nfs-rdma.txt  2015-03-08 
 16:33:21.483985810 -0700
 @@ -187,8 +187,10 @@ Check RDMA and NFS Setup
  To further test the InfiniBand software stack, use IPoIB (this
  assumes you have two IB hosts named host1 and host2):
  
 -host1$ ifconfig ib0 a.b.c.x
 -host2$ ifconfig ib0 a.b.c.y
 +host1$ ip link set dev ib0 up
 +host1$ ip address add dev ib0 a.b.c.x
 +host2$ ip link set dev ib0 up

I don't think we need to up the device twice.

 +host2$ ip address add dev ib0 a.b.c.y
  host1$ ping a.b.c.y
  host2$ ping a.b.c.x
  
 @@ -229,7 +231,8 @@ NFS/RDMA Setup
  
  $ modprobe ib_mthca
  $ modprobe ib_ipoib
 -$ ifconfig ib0 a.b.c.d
 +$ ip li set dev ib0 up

s/li/link ?

 +$ ip addr add dev ib0 a.b.c.d
  
  NOTE: use unique addresses for the client and server

Regards.

-- 
Yann Droneaud
OPTEYA



--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] IB/srp: Add 64-bit LUN support

2015-03-06 Thread Yann Droneaud
-opcode = SRP_TSK_MGMT;
 - tsk_mgmt-lun = cpu_to_be64(((u64) lun)  48);
 + int_to_scsilun(lun, tsk_mgmt-lun);
   tsk_mgmt-tsk_mgmt_func = SRP_TSK_ABORT_TASK;
   tsk_mgmt-task_tag = (u64) found_evt;
  
 @@ -1652,7 +1652,7 @@ static int ibmvscsi_eh_device_reset_handler(struct 
 scsi_cmnd *cmd)
   /* Set up a lun reset SRP command */
   memset(tsk_mgmt, 0x00, sizeof(*tsk_mgmt));
   tsk_mgmt-opcode = SRP_TSK_MGMT;
 - tsk_mgmt-lun = cpu_to_be64(((u64) lun)  48);
 + int_to_scsilun(lun, tsk_mgmt-lun);
   tsk_mgmt-tsk_mgmt_func = SRP_TSK_LUN_RESET;
  
   evt-sync_srp = srp_rsp;
 diff --git a/include/scsi/srp.h b/include/scsi/srp.h
 index 1ae84db..5be834d 100644
 --- a/include/scsi/srp.h
 +++ b/include/scsi/srp.h
 @@ -42,6 +42,7 @@
   */
  
  #include linux/types.h
 +#include scsi/scsi.h
  
  enum {
   SRP_LOGIN_REQ   = 0x00,
 @@ -179,7 +180,7 @@ struct srp_tsk_mgmt {
   u8  reserved1[6];
   u64 tag;
   u8  reserved2[4];
 - __be64  lun __attribute__((packed));
 + struct scsi_lun lun;
   u8  reserved3[2];
   u8  tsk_mgmt_func;
   u8  reserved4;
 @@ -200,7 +201,7 @@ struct srp_cmd {
   u8  data_in_desc_cnt;
   u64 tag;
   u8  reserved2[4];
 - __be64  lun __attribute__((packed));
 + struct scsi_lun lun;
   u8  reserved3;
   u8  task_attr;
   u8  reserved4;
 @@ -265,7 +266,7 @@ struct srp_aer_req {
   __be32  req_lim_delta;
   u64 tag;
   u32 reserved2;
 - __be64  lun;
 + struct scsi_lun lun;
   __be32  sense_data_len;
   u32 reserved3;
   u8  sense_data[0];

Reviewed-by: Yann Droneaud ydrone...@opteya.com

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH, resend] IB/srp: Add 64-bit LUN support

2015-03-04 Thread Yann Droneaud
Hi,

Le mercredi 04 mars 2015 à 11:01 +0100, Bart Van Assche a écrit :
 The SCSI standard defines 64-bit values for LUNs. Large arrays
 employing large or hierarchical LUN numbers become more and more
 common. So update the SRP initiator to use 64-bit LUN numbers.
 See also Hannes Reinecke, commit 9cb78c16f5da (scsi: use 64-bit LUNs),
 June 2014.
 

Thanks for the added description.

 The largest LUN number that has been tested is 0xd2003fff.
 
 Checked the following structure sizes with gdb:
 * sizeof(struct srp_cmd) = 48
 * sizeof(struct srp_tsk_mgmt) = 48
 * sizeof(struct srp_aer_req) = 36
 
 The ibmvscsi changes have been compile tested only on a PPC system.
 
 Signed-off-by: Bart Van Assche bart.vanass...@sandisk.com
 Reviewed-by: Hannes Reinecke h...@suse.de
 Cc: Sagi Grimberg sa...@mellanox.com
 Cc: Sebastian Parschauer sebastian.rie...@profitbricks.com
 Cc: Brian King brk...@linux.vnet.ibm.com
 Cc: Nathan Fontenot nf...@linux.vnet.ibm.com
 Cc: Tyrel Datwyler tyr...@linux.vnet.ibm.com
 ---
  drivers/infiniband/ulp/srp/ib_srp.c | 12 ++--
  drivers/scsi/ibmvscsi/ibmvscsi.c|  6 +++---
  include/scsi/srp.h  |  7 ---
  3 files changed, 13 insertions(+), 12 deletions(-)
 
 diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
 b/drivers/infiniband/ulp/srp/ib_srp.c
 index a0e24a8..e427454 100644
 --- a/drivers/infiniband/ulp/srp/ib_srp.c
 +++ b/drivers/infiniband/ulp/srp/ib_srp.c
 @@ -3146,7 +3146,7 @@ static ssize_t srp_create_target(struct device *dev,
   target_host-transportt  = ib_srp_transport_template;
   target_host-max_channel = 0;
   target_host-max_id  = 1;
 - target_host-max_lun = SRP_MAX_LUN;
 + target_host-max_lun = -1LL;

I guess you can remove SRP_MAX_LUN from
drivers/infiniband/ulp/srp/ib_srp.h too.

   target_host-max_cmd_len = sizeof ((struct srp_cmd *) (void *) 0L)-cdb;
  
   target = host_to_target(target_host);

 diff --git a/include/scsi/srp.h b/include/scsi/srp.h
 index 1ae84db..5be834d 100644
 --- a/include/scsi/srp.h
 +++ b/include/scsi/srp.h
 @@ -42,6 +42,7 @@
   */
  
  #include linux/types.h
 +#include scsi/scsi.h
  
  enum {
   SRP_LOGIN_REQ   = 0x00,

@@ -54,7 +54,6 @@ enum {
SRP_DLID_REDIRECT   = 2,
SRP_STALE_CONN  = 3,
 
-   SRP_MAX_LUN = 512,
SRP_DEF_SG_TABLESIZE= 12,
 
SRP_DEFAULT_QUEUE_SIZE  = 1  6,


Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/5] IB/uverbs: ex_query_device: answer must not depend on request's comp_mask

2015-03-02 Thread Yann Droneaud
Hi Or,

Le samedi 31 janvier 2015 à 22:09 +0200, Or Gerlitz a écrit :
 On Fri, Jan 30, 2015, Yann Droneaud ydrone...@opteya.com wrote:
  [..] I have not the chance
  of owning HCA with the support for this feature, nor the patches
  libibverbs / libmlx5 ... (anyway I would not have the time to test). [...]
 
 Yann, have you ever run Linux over RDMA capable HW (IB, RoCE, iWARP or alike?)

That was a silly question.

So I've took a little time to consider answering it,
perhaps you were gently kidding me.

Anyway, please take the following links as an answer to your question:

rdma_create_qp() and max_send_wr
http://mid.gmane.org/1303404264.2243.19.ca...@deela.quest-ce.net

ibv_create_qp() and max_inline_data behavior
http://mid.gmane.org/d7685e13978f93890b2939c5ac2d5...@meuh.org

[PATCH librdmacm 0/3] no IBV_SEND_INLINE thus memory registration
required on QLogic/Intel HCA
http://mid.gmane.org/cover.1376746185.git.ydrone...@opteya.com

[PATCH 00/22] infiniband: improve userspace input check
http://mid.gmane.org/cover.1376847403.git.ydrone...@opteya.com

I hope this would be enough to join the gang^Wclub. Where're the
by-laws ?

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IB/ipatch: Use setup_timer and mod_timer

2015-03-02 Thread Yann Droneaud
Le lundi 02 mars 2015 à 01:16 +0530, Vaishali Thakkar a écrit :
 Use timer API functions setup_timer and mod_timer instead
 of structure assignments as they are standard way to set
 the timer and to update the expire field of an active timer
 respectively.
 
 This is done using Coccinelle and semantic patch used for
 this is as follows:
 
 // smpl
 @@
 expression x,y,z,a,b;
 @@
 
 -init_timer (x);
 +setup_timer (x, y, z);
 +mod_timer (a, b);
 -x.function = y;
 -x.data = z;
 -x.expires = b;
 -add_timer(a);
 // /smpl
 
 Signed-off-by: Vaishali Thakkar vthakkar1...@gmail.com
 ---
  drivers/infiniband/hw/ipath/ipath_driver.c|  9 +++--
  drivers/infiniband/hw/ipath/ipath_init_chip.c | 10 +++---
  drivers/infiniband/hw/ipath/ipath_verbs.c |  7 ++-
  3 files changed, 8 insertions(+), 18 deletions(-)
 
[...]
 diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c 
 b/drivers/infiniband/hw/ipath/ipath_init_chip.c
 index be2a60e..34ffb43 100644
 --- a/drivers/infiniband/hw/ipath/ipath_init_chip.c
 +++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c
 @@ -950,13 +950,9 @@ int ipath_init_chip(struct ipath_devdata *dd, int reinit)
* set up stats retrieval timer, even if we had errors
* in last portion of setup
*/
 - init_timer(dd-ipath_stats_timer);
 - dd-ipath_stats_timer.function = ipath_get_faststats;
 - dd-ipath_stats_timer.data = (unsigned long) dd;
 - /* every 5 seconds; */
 - dd-ipath_stats_timer.expires = jiffies + 5 * HZ;
 - /* takes ~16 seconds to overflow at full IB 4x bandwdith */
 - add_timer(dd-ipath_stats_timer);
 + setup_timer(dd-ipath_stats_timer, ipath_get_faststats,
 + (unsigned long)dd);
 + mod_timer(dd-ipath_stats_timer, jiffies + 5 * HZ);

The code seems correct, but you remove the comments, loosing some useful
information.

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RE-RESEND V2 for-next 3/5] IB/core: Call ib_unregister_mad_agent() only for valid agents

2015-02-17 Thread Yann Droneaud
Hi,

Le lundi 16 février 2015 à 22:28 +0100, Yann Droneaud a écrit :
 Le jeudi 05 février 2015 à 13:53 +0200, Or Gerlitz a écrit :
  From: Majd Dibbiny m...@mellanox.com
  
  In some error flows, ib_mad_unregister_agent is being invoked also in cases 
  where
   ^^^
   ib_unregister_mad_agent()
 
  the ib_mad_register_agent call failed (resulting in an illegal pointer in 
  the
  ^
  ib_register_mad_agent()
 
  agent field).  This causes a kernel crash in the error flow.
  
  Fix this by calling ib_unregister_mad_agent only for cases where
  ib_register_mad_agent succeeded.
  
 
 The code was checking for struct ib_mad_agent *agent not being NULL, 
 while ib_register_mad_agent() returns a ERR_PTR() in case of error
 ... bad :(
 

After reading Jason's comments [1], I don't understand the purpose of
this patch. How can an ERR_PTR() value be present in the arrays ?

A quick review makes me think that mthca and mlx4 error paths lacks NULL
assignment after calling ib_unregister_mad_agent(). In other words, qib
might be correct while the others should be fixed.

[1] http://mid.gmane.org/20150205174337.gb31...@obsidianresearch.com

  Signed-off-by: Majd Dibbiny m...@mellanox.com
  Signed-off-by: Or Gerlitz ogerl...@mellanox.com
  ---
   drivers/infiniband/hw/mlx4/mad.c|2 +-
   drivers/infiniband/hw/mthca/mthca_mad.c |2 +-
   drivers/infiniband/hw/qib/qib_mad.c |2 +-
   3 files changed, 3 insertions(+), 3 deletions(-)
  
  diff --git a/drivers/infiniband/hw/mlx4/mad.c 
  b/drivers/infiniband/hw/mlx4/mad.c
  index 82a7dd8..6be0d2c 100644
  --- a/drivers/infiniband/hw/mlx4/mad.c
  +++ b/drivers/infiniband/hw/mlx4/mad.c
  @@ -907,7 +907,7 @@ int mlx4_ib_mad_init(struct mlx4_ib_dev *dev)
   err:
  for (p = 0; p  dev-num_ports; ++p)
  for (q = 0; q = 1; ++q)
  -   if (dev-send_agent[p][q])
  +   if (!IS_ERR(dev-send_agent[p][q]))
  ib_unregister_mad_agent(dev-send_agent[p][q]);
   
  return ret;
  diff --git a/drivers/infiniband/hw/mthca/mthca_mad.c 
  b/drivers/infiniband/hw/mthca/mthca_mad.c
  index 8881fa3..5f1a7ce 100644
  --- a/drivers/infiniband/hw/mthca/mthca_mad.c
  +++ b/drivers/infiniband/hw/mthca/mthca_mad.c
  @@ -317,7 +317,7 @@ int mthca_create_agents(struct mthca_dev *dev)
   err:
  for (p = 0; p  dev-limits.num_ports; ++p)
  for (q = 0; q = 1; ++q)
  -   if (dev-send_agent[p][q])
  +   if (!IS_ERR(dev-send_agent[p][q]))
  ib_unregister_mad_agent(dev-send_agent[p][q]);
   
  return ret;
  diff --git a/drivers/infiniband/hw/qib/qib_mad.c 
  b/drivers/infiniband/hw/qib/qib_mad.c
  index 636be11..6c7cc80 100644
  --- a/drivers/infiniband/hw/qib/qib_mad.c
  +++ b/drivers/infiniband/hw/qib/qib_mad.c
  @@ -2499,7 +2499,7 @@ int qib_create_agents(struct qib_ibdev *dev)
   err:
  for (p = 0; p  dd-num_pports; p++) {
  ibp = dd-pport[p].ibport_data;
  -   if (ibp-send_agent) {
  +   if (!IS_ERR(ibp-send_agent)) {
  agent = ibp-send_agent;
  ibp-send_agent = NULL;
  ib_unregister_mad_agent(agent);
 

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RE-RESEND V2 for-next 5/5] IB/core: Fix deadlock on uverbs modify_qp error flow

2015-02-16 Thread Yann Droneaud
Le jeudi 05 février 2015 à 13:53 +0200, Or Gerlitz a écrit :
 From: Moshe Lazer mos...@mellanox.com
 
 The deadlock occurs on __uverbs_modify_qp, we take a lock (idr_read_qp)
 and in case of failure in ib_resolve_eth_l2_attrs we don't release
 it (put_qp_read). Fix that.
 
 Issue: 355606
 ^^
I'm not sure this is relevant

 Fixes: ed4c54e5b4ba (IB/core: Resolve Ethernet L2 addresses when modifying 
 QP)
 Signed-off-by: Moshe Lazer mos...@mellanox.com
 Signed-off-by: Or Gerlitz ogerl...@mellanox.com
 ---
  drivers/infiniband/core/uverbs_cmd.c |9 +
  1 files changed, 5 insertions(+), 4 deletions(-)
 
 diff --git a/drivers/infiniband/core/uverbs_cmd.c 
 b/drivers/infiniband/core/uverbs_cmd.c
 index ecb6430..a602ce9 100644
 --- a/drivers/infiniband/core/uverbs_cmd.c
 +++ b/drivers/infiniband/core/uverbs_cmd.c
 @@ -2097,20 +2097,21 @@ ssize_t ib_uverbs_modify_qp(struct ib_uverbs_file 
 *file,
   if (qp-real_qp == qp) {
   ret = ib_resolve_eth_l2_attrs(qp, attr, cmd.attr_mask);
   if (ret)
 - goto out;
 + goto release_qp;
   ret = qp-device-modify_qp(qp, attr,
   modify_qp_mask(qp-qp_type, cmd.attr_mask), udata);
   } else {
   ret = ib_modify_qp(qp, attr, modify_qp_mask(qp-qp_type, 
 cmd.attr_mask));
   }
  
 - put_qp_read(qp);
 -
   if (ret)
 - goto out;
 + goto release_qp;
  
   ret = in_len;
  
 +release_qp:
 + put_qp_read(qp);
 +
  out:
   kfree(attr);
  

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RE-RESEND V2 for-next 4/5] IB/core: Make sure that the PSN does not overflow

2015-02-16 Thread Yann Droneaud
Hi,

Le jeudi 05 février 2015 à 13:53 +0200, Or Gerlitz a écrit :
 From: Majd Dibbiny m...@mellanox.com
 
 The rq/sq-psn is 24 bits as defined in the IB spec, therefore ULPs and User
 space applications shouldn't use the 8 most significant bits in the 32 bits
 variables to avoid overflow in modify_qp.
 
 Fixed the PSN generation by the RDMA-CM to mask out the 8 most significant 
 bits,
 also mask out these bits in uverbs for attributes provided by user-space.
 
 Signed-off-by: Majd Dibbiny m...@mellanox.com
 Signed-off-by: Or Gerlitz ogerl...@mellanox.com
 ---
  drivers/infiniband/core/cma.c|1 +
  drivers/infiniband/core/uverbs_cmd.c |4 ++--
  2 files changed, 3 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
 index d570030..fab0ee5 100644
 --- a/drivers/infiniband/core/cma.c
 +++ b/drivers/infiniband/core/cma.c
 @@ -512,6 +512,7 @@ struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler 
 event_handler,
   INIT_LIST_HEAD(id_priv-listen_list);
   INIT_LIST_HEAD(id_priv-mc_list);
   get_random_bytes(id_priv-seq_num, sizeof id_priv-seq_num);
 + id_priv-seq_num = 0xff;
  
   return id_priv-id;
  }
 diff --git a/drivers/infiniband/core/uverbs_cmd.c 
 b/drivers/infiniband/core/uverbs_cmd.c
 index 532d8eb..ecb6430 100644
 --- a/drivers/infiniband/core/uverbs_cmd.c
 +++ b/drivers/infiniband/core/uverbs_cmd.c
 @@ -2053,8 +2053,8 @@ ssize_t ib_uverbs_modify_qp(struct ib_uverbs_file *file,
   attr-path_mtu= cmd.path_mtu;
   attr-path_mig_state  = cmd.path_mig_state;
   attr-qkey= cmd.qkey;
 - attr-rq_psn  = cmd.rq_psn;
 - attr-sq_psn  = cmd.sq_psn;
 + attr-rq_psn  = cmd.rq_psn  0xff;
 + attr-sq_psn  = cmd.sq_psn  0xff;
   attr-dest_qp_num = cmd.dest_qp_num;
   attr-qp_access_flags = cmd.qp_access_flags;
   attr-pkey_index  = cmd.pkey_index;

0xff could be made a #define

Question: userspace is allowed to ask for a PSN on 32bits, but it will 
be silently truncated, is it going to puzzle applications ?

Anyway, it would have been better to return an error in
the first place ... not sure if we can do it now ...

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] IB/core: Add support for extended query device caps

2015-02-16 Thread Yann Droneaud
Le dimanche 08 février 2015 à 13:28 +0200, Haggai Eran a écrit :
 From: Eli Cohen e...@mellanox.com
 
 Add extensible query device capabilities verb to allow adding new features.
 ib_uverbs_ex_query_device is added and copy_query_dev_fields is used to copy
 capability fields to be used by both ib_uverbs_query_device and
 ib_uverbs_ex_query_device.
 
 Following the discussion about this patch [1], the code now validates the
 command's comp_mask is zero, returning -EINVAL for unknown values, in order to
 allow extending the verb in the future.
 
 The verb also checks the user-space provided response buffer size and only
 fills in capabilities that will fit in the buffer. In attempt to follow the
 spirit of presentation [2] by Tzahi Oved that was presented during OpenFabrics
 Alliance International Developer Workshop 2013, the comp_mask bits will only
 describe which fields are valid. Furthermore, fields that can simply be
 cleared when they are not supported, do not require a comp_mask bit at all.
 The verb returns a response_length field containing the actual number of bytes
 written by the kernel, so that a newer version running on an older kernel can
 tell which fields were actually returned.
 
 [1] [PATCH v1 0/5] IB/core: extended query device caps cleanup for v3.19
 http://thread.gmane.org/gmane.linux.kernel.api/7889/
 
 [2] 
 https://www.openfabrics.org/images/docs/2013_Dev_Workshop/Tues_0423/2013_Workshop_Tues_0830_Tzahi_Oved-verbs_extensions_ofa_2013-tzahio.pdf
 
 Cc: Yann Droneaud ydrone...@opteya.com
 Cc: Ira Weiny ira.we...@intel.com
 Cc: Jason Gunthorpe jguntho...@obsidianresearch.com
 Signed-off-by: Eli Cohen e...@mellanox.com
 Signed-off-by: Haggai Eran hagg...@mellanox.com
 ---
  drivers/infiniband/core/uverbs.h  |   1 +
  drivers/infiniband/core/uverbs_cmd.c  | 131 
 +++---
  drivers/infiniband/core/uverbs_main.c |   1 +
  include/uapi/rdma/ib_user_verbs.h |  12 
  4 files changed, 104 insertions(+), 41 deletions(-)
 
 diff --git a/drivers/infiniband/core/uverbs.h 
 b/drivers/infiniband/core/uverbs.h
 index 643c08a025a5..b716b0815644 100644
 --- a/drivers/infiniband/core/uverbs.h
 +++ b/drivers/infiniband/core/uverbs.h
 @@ -258,5 +258,6 @@ IB_UVERBS_DECLARE_CMD(close_xrcd);
  
  IB_UVERBS_DECLARE_EX_CMD(create_flow);
  IB_UVERBS_DECLARE_EX_CMD(destroy_flow);
 +IB_UVERBS_DECLARE_EX_CMD(query_device);
  
  #endif /* UVERBS_H */
 diff --git a/drivers/infiniband/core/uverbs_cmd.c 
 b/drivers/infiniband/core/uverbs_cmd.c
 index b7943ff16ed3..8f507538c42b 100644
 --- a/drivers/infiniband/core/uverbs_cmd.c
 +++ b/drivers/infiniband/core/uverbs_cmd.c
 @@ -400,6 +400,52 @@ err:
   return ret;
  }
  
 +static void copy_query_dev_fields(struct ib_uverbs_file *file,
 +   struct ib_uverbs_query_device_resp *resp,
 +   struct ib_device_attr *attr)
 +{
 + resp-fw_ver= attr-fw_ver;
 + resp-node_guid = file-device-ib_dev-node_guid;
 + resp-sys_image_guid= attr-sys_image_guid;
 + resp-max_mr_size   = attr-max_mr_size;
 + resp-page_size_cap = attr-page_size_cap;
 + resp-vendor_id = attr-vendor_id;
 + resp-vendor_part_id= attr-vendor_part_id;
 + resp-hw_ver= attr-hw_ver;
 + resp-max_qp= attr-max_qp;
 + resp-max_qp_wr = attr-max_qp_wr;
 + resp-device_cap_flags  = attr-device_cap_flags;
 + resp-max_sge   = attr-max_sge;
 + resp-max_sge_rd= attr-max_sge_rd;
 + resp-max_cq= attr-max_cq;
 + resp-max_cqe   = attr-max_cqe;
 + resp-max_mr= attr-max_mr;
 + resp-max_pd= attr-max_pd;
 + resp-max_qp_rd_atom= attr-max_qp_rd_atom;
 + resp-max_ee_rd_atom= attr-max_ee_rd_atom;
 + resp-max_res_rd_atom   = attr-max_res_rd_atom;
 + resp-max_qp_init_rd_atom   = attr-max_qp_init_rd_atom;
 + resp-max_ee_init_rd_atom   = attr-max_ee_init_rd_atom;
 + resp-atomic_cap= attr-atomic_cap;
 + resp-max_ee= attr-max_ee;
 + resp-max_rdd   = attr-max_rdd;
 + resp-max_mw= attr-max_mw;
 + resp-max_raw_ipv6_qp   = attr-max_raw_ipv6_qp;
 + resp-max_raw_ethy_qp   = attr-max_raw_ethy_qp;
 + resp-max_mcast_grp = attr-max_mcast_grp;
 + resp-max_mcast_qp_attach   = attr-max_mcast_qp_attach;
 + resp-max_total_mcast_qp_attach = attr-max_total_mcast_qp_attach;
 + resp-max_ah= attr-max_ah;
 + resp-max_fmr   = attr-max_fmr;
 + resp-max_map_per_fmr   = attr-max_map_per_fmr;
 + resp-max_srq   = attr-max_srq;
 + resp-max_srq_wr= attr-max_srq_wr;
 + resp-max_srq_sge   = attr-max_srq_sge;
 + resp-max_pkeys = attr-max_pkeys;
 + resp-local_ca_ack_delay

Re: [PATCH 2/3] IB/core: Add on demand paging caps to ib_uverbs_ex_query_device

2015-02-16 Thread Yann Droneaud
Hi,

Le dimanche 08 février 2015 à 13:28 +0200, Haggai Eran a écrit :
 Add a on-demand capabilities reporting to the extended query device verb.
 

Add on-demand paging (ODP) capabilities [...]

Perhaps we could also add a mention such the following to get an idea
of the purpose of those flags:
see also commit 860f10a799c8 (IB/core: Add flags for on demand paging
support)

 Yann Droneaud writes:
  Note: as offsetof() is used to retrieve the size of the lower chunk
  of the response, beware that it only works if the upper chunk
  is right after, without any implicit padding. And, as the size of
  the latter chunk is added to the base size, implicit padding at the
  end of the structure is not taken in account. Both point must be
  taken in account when extending the uverbs functionalities.
 
 Cc: Yann Droneaud ydrone...@opteya.com
 Cc: Ira Weiny ira.we...@intel.com
 Cc: Jason Gunthorpe jguntho...@obsidianresearch.com
 Cc: Eli Cohen e...@mellanox.com
 Signed-off-by: Haggai Eran hagg...@mellanox.com
 ---
  drivers/infiniband/core/uverbs_cmd.c | 20 +++-
  include/uapi/rdma/ib_user_verbs.h| 11 +++
  2 files changed, 30 insertions(+), 1 deletion(-)
 
 diff --git a/drivers/infiniband/core/uverbs_cmd.c 
 b/drivers/infiniband/core/uverbs_cmd.c
 index 8f507538c42b..04ca04559ce5 100644
 --- a/drivers/infiniband/core/uverbs_cmd.c
 +++ b/drivers/infiniband/core/uverbs_cmd.c
 @@ -3318,7 +3318,7 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file 
 *file,
   if (cmd.reserved)
   return -EINVAL;
  
 - resp.response_length = sizeof(resp);
 + resp.response_length = offsetof(typeof(resp), odp_caps);
  
   if (ucore-outlen  resp.response_length)
   return -ENOSPC;
 @@ -3330,6 +3330,24 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file 
 *file,
   copy_query_dev_fields(file, resp.base, attr);
   resp.comp_mask = 0;
  
 + if (ucore-outlen  resp.response_length + sizeof(resp.odp_caps))
 + goto end;
 +
 +#ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
 + resp.odp_caps.general_caps = attr.odp_caps.general_caps;
 + resp.odp_caps.per_transport_caps.rc_odp_caps =
 + attr.odp_caps.per_transport_caps.rc_odp_caps;
 + resp.odp_caps.per_transport_caps.uc_odp_caps =
 + attr.odp_caps.per_transport_caps.uc_odp_caps;
 + resp.odp_caps.per_transport_caps.ud_odp_caps =
 + attr.odp_caps.per_transport_caps.ud_odp_caps;
 + resp.odp_caps.reserved = 0;
 +#else
 + memset(resp.odp_caps, 0, sizeof(resp.odp_caps));
 +#endif
 + resp.response_length += sizeof(resp.odp_caps);
 +
 +end:
   err = ib_copy_to_udata(ucore, resp, resp.response_length);
   if (err)
   return err;
 diff --git a/include/uapi/rdma/ib_user_verbs.h 
 b/include/uapi/rdma/ib_user_verbs.h
 index f0f799afd856..b513e662d8e4 100644
 --- a/include/uapi/rdma/ib_user_verbs.h
 +++ b/include/uapi/rdma/ib_user_verbs.h
 @@ -207,10 +207,21 @@ struct ib_uverbs_ex_query_device {
   __u32 reserved;
  };
  
 +struct ib_uverbs_odp_caps {
 + __u64 general_caps;
 + struct {
 + __u32 rc_odp_caps;
 + __u32 uc_odp_caps;
 + __u32 ud_odp_caps;
 + } per_transport_caps;
 + __u32 reserved;
 +};
 +
  struct ib_uverbs_ex_query_device_resp {
   struct ib_uverbs_query_device_resp base;
   __u32 comp_mask;
   __u32 response_length;
 + struct ib_uverbs_odp_caps odp_caps;
  };
  
  struct ib_uverbs_query_port {

Reviewed-by: Yann Droneaud ydrone...@opteya.com

Thanks a lot for this updated patch.

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RE-RESEND V2 for-next 3/5] IB/core: Call ib_unregister_mad_agent() only for valid agents

2015-02-16 Thread Yann Droneaud
Le jeudi 05 février 2015 à 13:53 +0200, Or Gerlitz a écrit :
 From: Majd Dibbiny m...@mellanox.com
 
 In some error flows, ib_mad_unregister_agent is being invoked also in cases 
 where
  ^^^
  ib_unregister_mad_agent()

 the ib_mad_register_agent call failed (resulting in an illegal pointer in the
 ^
 ib_register_mad_agent()

 agent field).  This causes a kernel crash in the error flow.
 
 Fix this by calling ib_unregister_mad_agent only for cases where
 ib_register_mad_agent succeeded.
 

The code was checking for struct ib_mad_agent *agent not being NULL, 
while ib_register_mad_agent() returns a ERR_PTR() in case of error
... bad :(

 Signed-off-by: Majd Dibbiny m...@mellanox.com
 Signed-off-by: Or Gerlitz ogerl...@mellanox.com
 ---
  drivers/infiniband/hw/mlx4/mad.c|2 +-
  drivers/infiniband/hw/mthca/mthca_mad.c |2 +-
  drivers/infiniband/hw/qib/qib_mad.c |2 +-
  3 files changed, 3 insertions(+), 3 deletions(-)
 
 diff --git a/drivers/infiniband/hw/mlx4/mad.c 
 b/drivers/infiniband/hw/mlx4/mad.c
 index 82a7dd8..6be0d2c 100644
 --- a/drivers/infiniband/hw/mlx4/mad.c
 +++ b/drivers/infiniband/hw/mlx4/mad.c
 @@ -907,7 +907,7 @@ int mlx4_ib_mad_init(struct mlx4_ib_dev *dev)
  err:
   for (p = 0; p  dev-num_ports; ++p)
   for (q = 0; q = 1; ++q)
 - if (dev-send_agent[p][q])
 + if (!IS_ERR(dev-send_agent[p][q]))
   ib_unregister_mad_agent(dev-send_agent[p][q]);
  
   return ret;
 diff --git a/drivers/infiniband/hw/mthca/mthca_mad.c 
 b/drivers/infiniband/hw/mthca/mthca_mad.c
 index 8881fa3..5f1a7ce 100644
 --- a/drivers/infiniband/hw/mthca/mthca_mad.c
 +++ b/drivers/infiniband/hw/mthca/mthca_mad.c
 @@ -317,7 +317,7 @@ int mthca_create_agents(struct mthca_dev *dev)
  err:
   for (p = 0; p  dev-limits.num_ports; ++p)
   for (q = 0; q = 1; ++q)
 - if (dev-send_agent[p][q])
 + if (!IS_ERR(dev-send_agent[p][q]))
   ib_unregister_mad_agent(dev-send_agent[p][q]);
  
   return ret;
 diff --git a/drivers/infiniband/hw/qib/qib_mad.c 
 b/drivers/infiniband/hw/qib/qib_mad.c
 index 636be11..6c7cc80 100644
 --- a/drivers/infiniband/hw/qib/qib_mad.c
 +++ b/drivers/infiniband/hw/qib/qib_mad.c
 @@ -2499,7 +2499,7 @@ int qib_create_agents(struct qib_ibdev *dev)
  err:
   for (p = 0; p  dd-num_pports; p++) {
   ibp = dd-pport[p].ibport_data;
 - if (ibp-send_agent) {
 + if (!IS_ERR(ibp-send_agent)) {
   agent = ibp-send_agent;
   ibp-send_agent = NULL;
   ib_unregister_mad_agent(agent);

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] IB/mlx5: Enable the ODP capability query verb

2015-02-16 Thread Yann Droneaud
Hi,

Le dimanche 08 février 2015 à 13:28 +0200, Haggai Eran a écrit :
 Re-enable the on-demand paging capability query through the
 extended query device verb.
 

I would find useful to have a link to the initial commit in this one,
so that the thing makes more sense:

commit 8cdd312cfed7 (IB/mlx5: Implement the ODP capability query
verb).

 Signed-off-by: Haggai Eran hagg...@mellanox.com
 ---
  drivers/infiniband/hw/mlx5/main.c | 2 ++
  1 file changed, 2 insertions(+)
 
 diff --git a/drivers/infiniband/hw/mlx5/main.c 
 b/drivers/infiniband/hw/mlx5/main.c
 index 03bf81211a54..8a87404e9c76 100644
 --- a/drivers/infiniband/hw/mlx5/main.c
 +++ b/drivers/infiniband/hw/mlx5/main.c
 @@ -1331,6 +1331,8 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
   (1ull  IB_USER_VERBS_CMD_DESTROY_SRQ) |
   (1ull  IB_USER_VERBS_CMD_CREATE_XSRQ) |
   (1ull  IB_USER_VERBS_CMD_OPEN_QP);
 + dev-ib_dev.uverbs_ex_cmd_mask =
 + (1ull  IB_USER_VERBS_EX_CMD_QUERY_DEVICE);
  
   dev-ib_dev.query_device= mlx5_ib_query_device;
   dev-ib_dev.query_port  = mlx5_ib_query_port;

Reviewed-by: Yann Droneaud ydrone...@opteya.com

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IB/srp: Add 64-bit LUN support

2015-02-10 Thread Yann Droneaud
[2];
   u8  tsk_mgmt_func;
   u8  reserved4;
 @@ -200,7 +201,7 @@ struct srp_cmd {
   u8  data_in_desc_cnt;
   u64 tag;
   u8  reserved2[4];
 - __be64  lun __attribute__((packed));
 + struct scsi_lun lun;
   u8  reserved3;
   u8  task_attr;
   u8  reserved4;
 @@ -265,7 +266,7 @@ struct srp_aer_req {
   __be32  req_lim_delta;
   u64 tag;
   u32 reserved2;
 - __be64  lun;
 + struct scsi_lun lun;
   __be32  sense_data_len;
   u32 reserved3;
   u8  sense_data[0];

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IB/srp: Add 64-bit LUN support

2015-02-10 Thread Yann Droneaud
Hi Bart,

Le mardi 10 février 2015 à 12:41 +0100, Bart Van Assche a écrit :
 On 02/10/15 11:13, Yann Droneaud wrote:
  Le lundi 02 février 2015 à 18:21 +0100, Bart Van Assche a écrit :
  The largest LUN number that has been tested is 0xd2003fff.
 
  Checked the following structure sizes with gdb:
  * sizeof(struct srp_cmd) = 48
  * sizeof(struct srp_tsk_mgmt) = 48
  * sizeof(struct srp_aer_req) = 36
  
  You can use 'pahole' tool to retrieve the size (and layout) of any
  data structure from a .o object file.
 
 Thanks, I will have a look at that tool.
 
  The ibmvscsi changes have been compile tested only (on a PPC system).
  
  The changelog lack of any information about the purpose of 64bit LUNs.
  Please detail a bit why and how.
 
 You might have missed Hannes' patch series through which 64-bit LUN
 support was added in the SCSI core and in most SCSI LLD drivers.

Yes I've missed this thread, and, as this commit doesn't have a clear
reference to it, I feel I might not be alone wondering about it.

  The
 description of one of the patches in that series reads as follows:
 
The SCSI standard defines 64-bit values for LUNs, and large arrays
 employing large or hierarchical LUN numbers become more and more
 common.
 
 So update the linux SCSI stack to use 64-bit LUN numbers.
 
 Is it OK for you if I resend this patch with that text added in the
 patch description ?
 

That's would be perfect: it makes a lot more sense with these
information added. Additionally, you could add a link to the patch
series.

Thanks a lot.

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL] please pull infiniband.git

2015-02-07 Thread Yann Droneaud
Hi,

Le vendredi 06 février 2015 à 13:19 -0800, Roland Dreier a écrit :
 
 Please pull from
 
 git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git 
 tags/rdma-for-linus
 
 
 One more last-second RDMA change for 3.19:
  - Yann realized that the previous revert of new userspace ABI did not
go far enough, and we're still exposing a change that we don't want.
Revert even closer to 3.18 interface to make sure we get things right
in the long run.
 
 Sorry for sending this at the very end of the release cycle, but we
 didn't realize the scope of the required fix until just now.
 

I hope this could go in v3.19 as, at this stage, we don't want to expose
any bits of this ABI in a released kernel.

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Revert IB/core: Add support for extended query device caps

2015-02-06 Thread Yann Droneaud
Hi,

Le vendredi 06 février 2015 à 00:55 -0800, Roland Dreier a écrit :
 Thanks for noticing this at the last moment.
 

I should have noticed this earlier, but, as small is beautiful,
I was impressed by the patch from Haggai and thought it would better 
than an ugly revert ... Foolish I was.

 I will send this to Linus on Friday.

Thanks a lot. And sorry about this epic adventure in ABI land :/

Regards.

-- 
Yann Droneaud
OPTEYA



--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Revert IB/core: Add support for extended query device caps

2015-02-05 Thread Yann Droneaud
While commit 7e36ef8205ff (IB/core: Temporarily disable ex_query_device uverb)
is correct as it make the extended QUERY_DEVICE uverb as integrated as
part of commit 5a77abf9a97a (IB/core: Add support for extended query device 
caps)
and commit 860f10a799c8 (IB/core: Add flags for on demand paging support)
not available to userspace, it doesn't address the initial issue
regarding ib_copy_to_udata() [1][2].

Additionally, further discussions around this new uverb seems to
conclude it would require a different data structure than the one
currently described in rdma/ib_user_verbs.h [3].

Both of these issues require a revert of the changes,
so this patch partially revert commit 8cdd312cfed7 (IB/mlx5: Implement the ODP 
capability query verb),
commit 860f10a799c8 (IB/core: Add flags for on demand paging support)
and fully revert commit 5a77abf9a97a (IB/core: Add support for extended query 
device caps).

[1] Re: [PATCH v3 06/17] IB/core: Add support for extended query device caps
http://mid.gmane.org/1418733236.2779.26.ca...@opteya.com

[2] Re: [PATCH] IB/core: Temporarily disable ex_query_device uverb
http://mid.gmane.org/1423067503.3030.83.ca...@opteya.com

[3] RE: [PATCH v1 1/5] IB/uverbs: ex_query_device: answer must not depend on 
request's comp_mask

http://mid.gmane.org/2807e5fd2f6fda4886f6618eac48510e0cc12...@crsmsx101.amr.corp.intel.com

Cc: Eli Cohen e...@mellanox.com
Cc: Haggai Eran hagg...@mellanox.com
Cc: Ira Weiny ira.we...@intel.com
Cc: Jason Gunthorpe jguntho...@obsidianresearch.com
Cc: Sagi Grimberg sa...@mellanox.com
Cc: Shachar Raindel rain...@mellanox.com
Signed-off-by: Yann Droneaud ydrone...@opteya.com
---
 drivers/infiniband/core/uverbs.h |   1 -
 drivers/infiniband/core/uverbs_cmd.c | 137 +++
 drivers/infiniband/hw/mlx5/main.c|   2 -
 include/rdma/ib_verbs.h  |   5 +-
 include/uapi/rdma/ib_user_verbs.h|  27 ---
 5 files changed, 42 insertions(+), 130 deletions(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index b716b0815644..643c08a025a5 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -258,6 +258,5 @@ IB_UVERBS_DECLARE_CMD(close_xrcd);
 
 IB_UVERBS_DECLARE_EX_CMD(create_flow);
 IB_UVERBS_DECLARE_EX_CMD(destroy_flow);
-IB_UVERBS_DECLARE_EX_CMD(query_device);
 
 #endif /* UVERBS_H */
diff --git a/drivers/infiniband/core/uverbs_cmd.c 
b/drivers/infiniband/core/uverbs_cmd.c
index 532d8eba8b02..b7943ff16ed3 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -400,52 +400,6 @@ err:
return ret;
 }
 
-static void copy_query_dev_fields(struct ib_uverbs_file *file,
- struct ib_uverbs_query_device_resp *resp,
- struct ib_device_attr *attr)
-{
-   resp-fw_ver= attr-fw_ver;
-   resp-node_guid = file-device-ib_dev-node_guid;
-   resp-sys_image_guid= attr-sys_image_guid;
-   resp-max_mr_size   = attr-max_mr_size;
-   resp-page_size_cap = attr-page_size_cap;
-   resp-vendor_id = attr-vendor_id;
-   resp-vendor_part_id= attr-vendor_part_id;
-   resp-hw_ver= attr-hw_ver;
-   resp-max_qp= attr-max_qp;
-   resp-max_qp_wr = attr-max_qp_wr;
-   resp-device_cap_flags  = attr-device_cap_flags;
-   resp-max_sge   = attr-max_sge;
-   resp-max_sge_rd= attr-max_sge_rd;
-   resp-max_cq= attr-max_cq;
-   resp-max_cqe   = attr-max_cqe;
-   resp-max_mr= attr-max_mr;
-   resp-max_pd= attr-max_pd;
-   resp-max_qp_rd_atom= attr-max_qp_rd_atom;
-   resp-max_ee_rd_atom= attr-max_ee_rd_atom;
-   resp-max_res_rd_atom   = attr-max_res_rd_atom;
-   resp-max_qp_init_rd_atom   = attr-max_qp_init_rd_atom;
-   resp-max_ee_init_rd_atom   = attr-max_ee_init_rd_atom;
-   resp-atomic_cap= attr-atomic_cap;
-   resp-max_ee= attr-max_ee;
-   resp-max_rdd   = attr-max_rdd;
-   resp-max_mw= attr-max_mw;
-   resp-max_raw_ipv6_qp   = attr-max_raw_ipv6_qp;
-   resp-max_raw_ethy_qp   = attr-max_raw_ethy_qp;
-   resp-max_mcast_grp = attr-max_mcast_grp;
-   resp-max_mcast_qp_attach   = attr-max_mcast_qp_attach;
-   resp-max_total_mcast_qp_attach = attr-max_total_mcast_qp_attach;
-   resp-max_ah= attr-max_ah;
-   resp-max_fmr   = attr-max_fmr;
-   resp-max_map_per_fmr   = attr-max_map_per_fmr;
-   resp-max_srq   = attr-max_srq;
-   resp-max_srq_wr= attr-max_srq_wr;
-   resp-max_srq_sge   = attr-max_srq_sge;
-   resp-max_pkeys = attr-max_pkeys;
-   resp-local_ca_ack_delay

Re: [PATCH] IB/core: Temporarily disable ex_query_device uverb

2015-02-04 Thread Yann Droneaud
Hi,

Le lundi 02 février 2015 à 14:10 +0100, Yann Droneaud a écrit :
 Le dimanche 01 février 2015 à 15:35 +0200, Haggai Eran a écrit :
  Commit 5a77abf9a97a (IB/core: Add support for extended query device caps)
  added a new extended verb to query the capabilities of RDMA devices, but the
  semantics of this verb are still under debate [1].
  
  Block access to this verb from user-space until the new semantics are in
  place.
  
  Cc: Yann Droneaud ydrone...@opteya.com
  Cc: Jason Gunthorpe jguntho...@obsidianresearch.com
  Cc: Eli Cohen e...@mellanox.com
  
  [1] [PATCH v1 0/5] IB/core: extended query device caps cleanup for v3.19
  http://www.spinics.net/lists/linux-rdma/msg22904.html
  
  Signed-off-by: Haggai Eran hagg...@mellanox.com
 
 Reviewed-by: Yann Droneaud ydrone...@opteya.com

   drivers/infiniband/core/uverbs_main.c | 1 -
   1 file changed, 1 deletion(-)
  
  diff --git a/drivers/infiniband/core/uverbs_main.c 
  b/drivers/infiniband/core/uverbs_main.c
  index e6c23b9eab33..5db1a8cc388d 100644
  --- a/drivers/infiniband/core/uverbs_main.c
  +++ b/drivers/infiniband/core/uverbs_main.c
  @@ -123,7 +123,6 @@ static int (*uverbs_ex_cmd_table[])(struct 
  ib_uverbs_file *file,
  struct ib_udata *uhw) = {
  [IB_USER_VERBS_EX_CMD_CREATE_FLOW]  = ib_uverbs_ex_create_flow,
  [IB_USER_VERBS_EX_CMD_DESTROY_FLOW] = ib_uverbs_ex_destroy_flow,
  -   [IB_USER_VERBS_EX_CMD_QUERY_DEVICE] = ib_uverbs_ex_query_device
   };
   
   static void ib_uverbs_add_one(struct ib_device *device);
 
 That's the smallest (and smartest) patch to be applied instead of
 reverting.
 

Unfortunately, I've missed the issue I was complaining about in the
first place [1]. And I feel a bit guilty having missed the issue.

The present patch is fine as it fully disable the new extended 
QUERY_DEVICE uverb, but it doesn't disable the broken logic added in
ib_copy_to_udata() by commit 5a77abf9a97a ('IB/core: Add support for
extended query device caps'):

  diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
  index 470a011d6fa4..97a999f9e4d8 100644
  --- a/include/rdma/ib_verbs.h
  +++ b/include/rdma/ib_verbs.h
  @@ -1662,7 +1662,10 @@ static inline int ib_copy_from_udata(void *dest, 
struct ib_udata *udata, size_t
   
   static inline int ib_copy_to_udata(struct ib_udata *udata, void *src, size_t 
len)
   {
  - return copy_to_user(udata-outbuf, src, len) ? -EFAULT : 0;
  + size_t copy_sz;
  +
  + copy_sz = min_t(size_t, len, udata-outlen);
  + return copy_to_user(udata-outbuf, src, copy_sz) ? -EFAULT : 0;
   }

That part of commit 5a77abf9a97a should be reverted as I'm not sure
it doesn't introduce regressions, especially difficult to notice ones.

Regards.

[1] Re: [PATCH v3 06/17] IB/core: Add support for extended query device caps
http://mid.gmane.org/1418733236.2779.26.ca...@opteya.com

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IB/core: Temporarily disable ex_query_device uverb

2015-02-02 Thread Yann Droneaud
Le dimanche 01 février 2015 à 15:35 +0200, Haggai Eran a écrit :
 Commit 5a77abf9a97a (IB/core: Add support for extended query device caps)
 added a new extended verb to query the capabilities of RDMA devices, but the
 semantics of this verb are still under debate [1].
 
 Block access to this verb from user-space until the new semantics are in
 place.
 
 Cc: Yann Droneaud ydrone...@opteya.com
 Cc: Jason Gunthorpe jguntho...@obsidianresearch.com
 Cc: Eli Cohen e...@mellanox.com
 
 [1] [PATCH v1 0/5] IB/core: extended query device caps cleanup for v3.19
 http://www.spinics.net/lists/linux-rdma/msg22904.html
 
 Signed-off-by: Haggai Eran hagg...@mellanox.com

Reviewed-by: Yann Droneaud ydrone...@opteya.com

 ---
 
 Hi Roland,
 
 As I wrote in a separate mail, I hope you can accept Yann's patches to modify
 the new query device verb [1] in time for kernel 3.19. 
 
 If you decide to revert the extended query device verb, here is a minimal 
 patch
 to do block this verb temporarily, leaving the rest of the ODP infrastructure
 in-place.
 
 Regards,
 Haggai
 
  drivers/infiniband/core/uverbs_main.c | 1 -
  1 file changed, 1 deletion(-)
 
 diff --git a/drivers/infiniband/core/uverbs_main.c 
 b/drivers/infiniband/core/uverbs_main.c
 index e6c23b9eab33..5db1a8cc388d 100644
 --- a/drivers/infiniband/core/uverbs_main.c
 +++ b/drivers/infiniband/core/uverbs_main.c
 @@ -123,7 +123,6 @@ static int (*uverbs_ex_cmd_table[])(struct ib_uverbs_file 
 *file,
   struct ib_udata *uhw) = {
   [IB_USER_VERBS_EX_CMD_CREATE_FLOW]  = ib_uverbs_ex_create_flow,
   [IB_USER_VERBS_EX_CMD_DESTROY_FLOW] = ib_uverbs_ex_destroy_flow,
 - [IB_USER_VERBS_EX_CMD_QUERY_DEVICE] = ib_uverbs_ex_query_device
  };
  
  static void ib_uverbs_add_one(struct ib_device *device);

That's the smallest (and smartest) patch to be applied instead of
reverting.

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next 05/10] IB/cm,cma: Move RDMA IP CM private-data parsing code from ib_cma to ib_cm

2015-02-01 Thread Yann Droneaud
(cma_mask, 0xF);
   if (!cma_any_addr(addr)) {
   cma_data-dst_addr.ip6 = ip6_addr;
   memset(cma_mask-dst_addr.ip6, 0xFF,
 @@ -2615,35 +2510,6 @@ err1:
  }
  EXPORT_SYMBOL(rdma_bind_addr);
  
 -static int cma_format_hdr(void *hdr, struct rdma_id_private *id_priv)
 -{
 - struct cma_hdr *cma_hdr;
 -
 - cma_hdr = hdr;
 - cma_hdr-cma_version = CMA_VERSION;
 - if (cma_family(id_priv) == AF_INET) {
 - struct sockaddr_in *src4, *dst4;
 -
 - src4 = (struct sockaddr_in *) cma_src_addr(id_priv);
 - dst4 = (struct sockaddr_in *) cma_dst_addr(id_priv);
 -
 - cma_set_ip_ver(cma_hdr, 4);
 - cma_hdr-src_addr.ip4.addr = src4-sin_addr.s_addr;
 - cma_hdr-dst_addr.ip4.addr = dst4-sin_addr.s_addr;
 - cma_hdr-port = src4-sin_port;
 - } else if (cma_family(id_priv) == AF_INET6) {
 - struct sockaddr_in6 *src6, *dst6;
 -
 - src6 = (struct sockaddr_in6 *) cma_src_addr(id_priv);
 - dst6 = (struct sockaddr_in6 *) cma_dst_addr(id_priv);
 -
 - cma_set_ip_ver(cma_hdr, 6);
 - cma_hdr-src_addr.ip6 = src6-sin6_addr;
 - cma_hdr-dst_addr.ip6 = dst6-sin6_addr;
 - cma_hdr-port = src6-sin6_port;
 - }
 - return 0;
 -}
  
  static int cma_sidr_rep_handler(struct ib_cm_id *cm_id,
   struct ib_cm_event *ib_event)
 @@ -2731,7 +2597,9 @@ static int cma_resolve_ib_udp(struct rdma_id_private 
 *id_priv,
  conn_param-private_data_len);
  
   if (private_data) {
 - ret = cma_format_hdr(private_data, id_priv);
 + ret = cm_format_hdr(private_data, cma_family(id_priv),
 + cma_src_addr(id_priv),
 + cma_dst_addr(id_priv));
   if (ret)
   goto out;
   req.private_data = private_data;
 @@ -2796,7 +2664,9 @@ static int cma_connect_ib(struct rdma_id_private 
 *id_priv,
  
   route = id_priv-id.route;
   if (private_data) {
 - ret = cma_format_hdr(private_data, id_priv);
 + ret = cm_format_hdr(private_data, cma_family(id_priv),
 + cma_src_addr(id_priv),
 + cma_dst_addr(id_priv));
   if (ret)
   goto out;
   req.private_data = private_data;
 diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h
 index 0e3ff30647d5..e418a11afcfe 100644
 --- a/include/rdma/ib_cm.h
 +++ b/include/rdma/ib_cm.h
 @@ -274,6 +274,52 @@ struct ib_cm_event {
  #define CM_LAP_ATTR_ID   cpu_to_be16(0x0019)
  #define CM_APR_ATTR_ID   cpu_to_be16(0x001A)
  
 +union cm_ip_addr {
 + struct in6_addr ip6;
 + struct {
 + __be32 pad[3];
 + __be32 addr;
 + } ip4;
 +};
 +
 +struct cm_hdr {
 + u8 cm_version;
 + u8 ip_version;  /* IP version: 7:4 */
 + __be16 port;
 + union cm_ip_addr src_addr;
 + union cm_ip_addr dst_addr;
 +};
 +
 +#define RDMA_IP_CM_VERSION 0x00
 +
 +/**
 + * cm_format_hdr - Fill in a cm_hdr struct according to connection details
 + * @hdr:  cm_hdr struct to fill
 + * @family:   ip family of the addresses - AF_INET or AF_INTET6
 + * @src_addr: source address of the connection
 + * @dst_addr: destination address of the connection
 + **/
 +int cm_format_hdr(void *hdr, int family,
 +   struct sockaddr *src_addr,
 +   struct sockaddr *dst_addr);
 +
 +/**
 + * cm_save_net_info - saves ib connection event details
 + * @src_addr: source address of the connection
 + * @dst_addr: destination address of the connection
 + * @ib_event: ib event to take connection details from
 + **/
 +int cm_save_net_info(struct sockaddr *src_addr,
 +  struct sockaddr *dst_addr,
 +  struct ib_cm_event *ib_event);
 +
 +/**
 + * cm_set_ip_ver - sets the ip version of a cm_hdr struct
 + * @hdr:cm_hdr struct to change
 + * @ip_ver: ip version to set - a 4 bit value
 + **/
 +void cm_set_ip_ver(struct cm_hdr *hdr, u8 ip_ver);
 +
  /**
   * ib_cm_handler - User-defined callback to process communication events.
   * @cm_id: Communication identifier associated with the reported event.

Every other symbols in ib_cm.h are prefixed by ib_cm_, so I would
prefer having symbols moved from ib_cma.c to ib_cm.c be renamed, except
it would create a lot of code change ...

Regards.

-- 
Yann Droneaud
OPTEYA



--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next 08/10] IB/cma: Add support for network namespaces

2015-02-01 Thread Yann Droneaud
 error;
  
 diff --git a/net/rds/ib.c b/net/rds/ib.c
 index ba2dffeff608..cc137f523248 100644
 --- a/net/rds/ib.c
 +++ b/net/rds/ib.c
 @@ -326,7 +326,7 @@ static int rds_ib_laddr_check(__be32 addr)
   /* Create a CMA ID and try to bind it. This catches both
* IB and iWARP capable NICs.
*/
 - cm_id = rdma_create_id(NULL, NULL, RDMA_PS_TCP, IB_QPT_RC);
 + cm_id = rdma_create_id(NULL, NULL, RDMA_PS_TCP, IB_QPT_RC, init_net);
   if (IS_ERR(cm_id))
   return PTR_ERR(cm_id);
  
 diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c
 index 31b74f5e61ad..d19b91296ddc 100644
 --- a/net/rds/ib_cm.c
 +++ b/net/rds/ib_cm.c
 @@ -584,7 +584,7 @@ int rds_ib_conn_connect(struct rds_connection *conn)
   /* XXX I wonder what affect the port space has */
   /* delegate cm event handler to rdma_transport */
   ic-i_cm_id = rdma_create_id(rds_rdma_cm_event_handler, conn,
 -  RDMA_PS_TCP, IB_QPT_RC);
 +  RDMA_PS_TCP, IB_QPT_RC, init_net);
   if (IS_ERR(ic-i_cm_id)) {
   ret = PTR_ERR(ic-i_cm_id);
   ic-i_cm_id = NULL;
 diff --git a/net/rds/iw.c b/net/rds/iw.c
 index 589935661d66..8501b73ed12f 100644
 --- a/net/rds/iw.c
 +++ b/net/rds/iw.c
 @@ -227,7 +227,7 @@ static int rds_iw_laddr_check(__be32 addr)
   /* Create a CMA ID and try to bind it. This catches both
* IB and iWARP capable NICs.
*/
 - cm_id = rdma_create_id(NULL, NULL, RDMA_PS_TCP, IB_QPT_RC);
 + cm_id = rdma_create_id(NULL, NULL, RDMA_PS_TCP, IB_QPT_RC, init_net);
   if (IS_ERR(cm_id))
   return PTR_ERR(cm_id);
  
 diff --git a/net/rds/iw_cm.c b/net/rds/iw_cm.c
 index a91e1db62ee6..e5ee2d562a60 100644
 --- a/net/rds/iw_cm.c
 +++ b/net/rds/iw_cm.c
 @@ -521,7 +521,7 @@ int rds_iw_conn_connect(struct rds_connection *conn)
   /* XXX I wonder what affect the port space has */
   /* delegate cm event handler to rdma_transport */
   ic-i_cm_id = rdma_create_id(rds_rdma_cm_event_handler, conn,
 -  RDMA_PS_TCP, IB_QPT_RC);
 +  RDMA_PS_TCP, IB_QPT_RC, init_net);
   if (IS_ERR(ic-i_cm_id)) {
   ret = PTR_ERR(ic-i_cm_id);
   ic-i_cm_id = NULL;
 diff --git a/net/rds/rdma_transport.c b/net/rds/rdma_transport.c
 index 6cd9d1deafc3..066b60b27b12 100644
 --- a/net/rds/rdma_transport.c
 +++ b/net/rds/rdma_transport.c
 @@ -160,7 +160,7 @@ static int rds_rdma_listen_init(void)
   int ret;
  
   cm_id = rdma_create_id(rds_rdma_cm_event_handler, NULL, RDMA_PS_TCP,
 -IB_QPT_RC);
 +IB_QPT_RC, init_net);
   if (IS_ERR(cm_id)) {
   ret = PTR_ERR(cm_id);
   printk(KERN_ERR RDS/RDMA: failed to setup listener, 
 diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c 
 b/net/sunrpc/xprtrdma/svc_rdma_transport.c
 index 4e618808bc98..e3b246e305f9 100644
 --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
 +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
 @@ -701,7 +701,7 @@ static struct svc_xprt *svc_rdma_create(struct svc_serv 
 *serv,
   xprt = cma_xprt-sc_xprt;
  
   listen_id = rdma_create_id(rdma_listen_handler, cma_xprt, RDMA_PS_TCP,
 -IB_QPT_RC);
 +IB_QPT_RC, init_net);
   if (IS_ERR(listen_id)) {
   ret = PTR_ERR(listen_id);
   dprintk(svcrdma: rdma_create_id failed = %d\n, ret);
 diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
 index c98e40643910..f574e77165f4 100644
 --- a/net/sunrpc/xprtrdma/verbs.c
 +++ b/net/sunrpc/xprtrdma/verbs.c
 @@ -483,7 +483,8 @@ rpcrdma_create_id(struct rpcrdma_xprt *xprt,
  
   init_completion(ia-ri_done);
  
 - id = rdma_create_id(rpcrdma_conn_upcall, xprt, RDMA_PS_TCP, IB_QPT_RC);
 + id = rdma_create_id(rpcrdma_conn_upcall, xprt, RDMA_PS_TCP, IB_QPT_RC,
 + init_net);
   if (IS_ERR(id)) {
   rc = PTR_ERR(id);
   dprintk(RPC:   %s: rdma_create_id() failed %i\n,

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 2/5] IB/uverbs: ex_query_device: check request's comp_mask

2015-02-01 Thread Yann Droneaud
Hi,

Le dimanche 01 février 2015 à 10:12 +0200, Haggai Eran a écrit :
 On 29/01/2015 19:59, Yann Droneaud wrote:
  This patch ensures the extended QUERY_DEVICE uverbs request's
  comp_mask has only known and supported bits (currently none).
  
  If userspace set unknown features bits, -EINVAL will be returned,
  ensuring current programs are not allowed to set random feature
  bits: such bits could enable new extended features in future kernel
  versions and those features can trigger a behavior not unsupported
  by the older programs or make the newer kernels return an error
  for a request which was valid on older kernels.
  
  Additionally, returning an error for unsupported feature would
  allow userspace to probe/discover which extended features are
  currently supported by a kernel.
 
 As I wrote before, I hope in the future we don't force userspace to
 probe features this way, because it may be unnecessarily complex.
 

I believe that most use cases won't need probing as applications are
often built according to the current kernel features in mind.

If applications need to use new features, it seems to be a small price
to pay to be prepared to get -EINVAL.

In another word: backward compatibility from application point of view:
a newer application wanting to run on older kernel must be prepared to.

 I agree though that we should have a way to extend this verb in the future.
 
 Reviewed-by: Haggai Eran hagg...@mellanox.com
 

Thanks.

  
  Link: http://mid.gmane.org/cover.1422553023.git.ydrone...@opteya.com
  Cc: Sagi Grimberg sa...@mellanox.com
  Cc: Shachar Raindel rain...@mellanox.com
  Cc: Eli Cohen e...@mellanox.com
  Cc: Haggai Eran hagg...@mellanox.com
  Signed-off-by: Yann Droneaud ydrone...@opteya.com
  ---
   drivers/infiniband/core/uverbs_cmd.c | 3 +++
   1 file changed, 3 insertions(+)
  
  diff --git a/drivers/infiniband/core/uverbs_cmd.c 
  b/drivers/infiniband/core/uverbs_cmd.c
  index 6ef06a9b4362..fbcc54b86795 100644
  --- a/drivers/infiniband/core/uverbs_cmd.c
  +++ b/drivers/infiniband/core/uverbs_cmd.c
  @@ -3312,6 +3312,9 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file 
  *file,
  if (err)
  return err;
   
  +   if (cmd.comp_mask)
  +   return -EINVAL;
  +
  if (cmd.reserved)
  return -EINVAL;
   
  
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-api in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] infiniband: Help gcc generate better code for ocrdma_srq_toggle_bit

2015-01-31 Thread Yann Droneaud
Hi,

Le samedi 31 janvier 2015 à 00:00 +0100, Rasmus Villemoes a écrit :
 ping
 

As you're fixing ocrdma driver, I think you might want to find people
@emulex.com to review your patches.

BTW, there's no MAINTAINERS entry for ocrdma driver ... which is a pity.

 On Fri, Jan 16 2015, Rasmus Villemoes li...@rasmusvillemoes.dk wrote:
 
  gcc emits a surprising amount of code in order to flip a bit. One
  would think that a single instruction is enough.
 
  $ scripts/bloat-o-meter /tmp/ocrdma_verbs.o 
  drivers/infiniband/hw/ocrdma/ocrdma_verbs.o
  add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-142 (-142)
  function old new   delta
  ocrdma_post_srq_recv 498 460 -38
  ocrdma_poll_cq  20101962 -48
  ocrdma_discard_cqes  495 439 -56
 
  All three calls of ocrdma_srq_toggle_bit happen within spinlocks, so
  saving a few useless instructions might be worthwhile.
 
  Signed-off-by: Rasmus Villemoes li...@rasmusvillemoes.dk
  ---
   drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 5 +
   1 file changed, 1 insertion(+), 4 deletions(-)
 
  diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c 
  b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
  index fb8d8c4dfbb9..eff11e6c6183 100644
  --- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
  +++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
  @@ -1484,10 +1484,7 @@ static void ocrdma_srq_toggle_bit(struct ocrdma_srq 
  *srq, int idx)
  int i = idx / 32;
  unsigned int mask = (1  (idx % 32));
   
  -   if (srq-idx_bit_fields[i]  mask)
  -   srq-idx_bit_fields[i] = ~mask;
  -   else
  -   srq-idx_bit_fields[i] |= mask;
  +   srq-idx_bit_fields[i] ^= mask;
   }
   
   static int ocrdma_hwq_free_cnt(struct ocrdma_qp_hwq_info *q)
 --

Regards.

-- 
Yann Droneaud
OPTEYA



--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/5] IB/uverbs: ex_query_device: answer must not depend on request's comp_mask

2015-01-30 Thread Yann Droneaud
Hi,

Le jeudi 29 janvier 2015 à 15:17 -0800, Roland Dreier a écrit :
 On Thu, Jan 29, 2015 at 1:59 PM, Yann Droneaud ydrone...@opteya.com wrote:
  Roland: I agree with Yann, these patches need to go in, or the ODP
  patches reverted.
 
  Reverting all On Demand Paging patches seems overkill:
  if something as to be reverted it should be commit 5a77abf9a97a
  (IB/core: Add support for extended query device caps) and the part of
  commit 860f10a799c8 (IB/core: Add flags for on demand paging support)
  which modify ib_uverbs_ex_query_device().
 
 Thank you and Jason for taking on this interface.
 
 At this point I feel like I do about the IPoIB changes -- we should
 revert the broken stuff and get it right for 3.20.
 
 If we revert the two things you describe above, is everything else OK
 to leave in 3.19 with respect to ABI?
 

I've tried to review every changes since v3.18 on drivers/infiniband
include/rdma and include/uapi/rdma with respect to ABI issues.

I've noticed no other issue, but I have to admit I've not well reviewed
the drivers (hw/) internal changes.

If the IB_USER_VERBS_EX_CMD_QUERY_DEVICE and ib_uverbs_ex_query_device
changes are going to be reverted for v3.19, the on-demand-paging
feature will be available (IB_DEVICE_ON_DEMAND_PAGING will be set 
device_cap_flags in response to non extended QUERY_DEVICE for mlx5 HCA
and IB_ACCESS_ON_DEMAND access flag will be effective for REG_MR 
uverbs), but its parameters won't be. I don't know if it's a no-go for 
the usage of on-demand paging by userspace: I have not the chance
of owning HCA with the support for this feature, nor the patches
libibverbs / libmlx5 ... (anyway I would not have the time to test).
I've hoped people from Mellanox would have commented on the revert 
option too.

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] IB/uverbs: ex_query_device: check request's comp_mask

2015-01-29 Thread Yann Droneaud
Hi,

Le mercredi 28 janvier 2015 à 17:40 +0200, Haggai Eran a écrit :
 On 28/01/2015 15:19, Yann Droneaud wrote:
  Le mardi 27 janvier 2015 à 08:50 +0200, Haggai Eran a écrit :
  On 26/01/2015 13:17, Yann Droneaud wrote:
  ...
  Le dimanche 25 janvier 2015 à 17:23 +0200, Haggai Eran a écrit :
  On 22/01/2015 15:28, Yann Droneaud wrote:
  This patch ensures the extended QUERY_DEVICE uverbs request's
  comp_mask has only known values. If userspace returns unknown
  features, -EINVAL will be returned, allowing to probe/discover
  which features are currently supported by the kernel.
 
  This probing process will be much more cumbersome than it needs to be
  because userspace will have to call QUERY_DEVICE repetitively with
  different comp_mask fields until it finds out the exact set of supported
  bits.
 
 
  O(log2(N))
 
  I don't think user space developers will be happy having to do trial and 
  error to determine what features the kernel driver supports. It might be 
  even more then O(log2(N)). If my understanding of comp_mask bits usage is 
  correct it would O(N). But it's not the time complexity I'm worried about,
  it's the fact that it requires user-space developers to go through hoops in
  order to get information that can be much more easily exported.
 
  
  But there's currently *NO* such mean that could give a hint to userspace
  developer whether one bit of request's comp_mask is currently effective
  in extended QUERY_DEVICE uverbs. While we have such mean in CREATE_FLOW
  and DESTROY_FLOW: unsupported bits trigger -EINVAL.
  
  In QUERY_DEVICE, userspace developer is allowed to set request's 
  comp_mask to -1: it won't hurt it on current kernel, so why not using 
  this value or any other random value ? The program will run: it's Works
  For Me.
  
  But the same program (either binary or source code) might fail on
  newer kernel where some bits in comp_mask gain a meaning not supported
  by the program.
 
 Why don't we make the command comp_mask in QUERY_DEVICE zero in both
 versions. The behavior of the command with comp_mask = 0 will be to
 return the maximum amount of data that fits in the response buffer. The
 kernel will return -EINVAL if the input command comp_mask is not zero in
 the current version.
 

Yes, that's exactly what I wanted to do.

 If in the future we want to alter the behavior or add more input fields
 to QUERY_DEVICE, we would use new bits.
 

Yes.

  Or you had to add a different interface, dedicated to retrieve the exact
  supported feature mask.
 
  Moreover, it also ensure the requested features set in comp_mask
  are sequentially set, not skipping intermediate features, eg. the
  highest requested feature also request all the lower ones.
  This way each new feature will have to be stacked on top of the
  existing ones: this is especially important for the request and
  response data structures where fields are added after the
  current ones when expanded to support a new feature.
 
  I think it is perfectly acceptable that not all drivers will implement
  all available features, and so you shouldn't enforce this requirement.
 
  With regard to QUERY_DEVICE: the data structure layout depends on the
  comp_mask value. So either you propose a way to express multipart data
  structure (see CMSG or NETLINK), or we have to ensure the data structure
  is ever-growing, with each new chunck stacked over the existing ones:
  that's the purpose of :
 
if (cmd.comp_mask  (cmd.comp_mask + 1))
return -EINVAL;
 
  Also, it makes the comp_mask nothing more than a wasteful version number
  between 0 and 63.
 
  That's what I've already explained earlier in Re: [PATCH v3 06/17]
  IB/core: Add support for extended query device caps:
 
  http://mid.gmane.org/1421844612.13543.40.ca...@opteya.com
 
  Yes, you wrote there:
  Regarding comp_mask (not for this particular verb):
 
  It's not clear whether request's comp_mask describe the request or the
  response, as such I'm puzzled.
 
  How would the kernel and the userspace be able to parse the request and
  the response if they ignore unknown bits ?
 
  How would they be able to skip the unrecognized chunk of the request and
  response buffer ?
 
  How one bit in a comp_mask is related to a chunk in the request or
  response ?
 
  It's likely the kernel or userspace would have to skip the remaining
  comp_mask's bits after encountering an unknown bit as the size of the
  corresponding chunk in request or response would be unknown, making
  impossible to locate the corresponding chunk for the next bit set in
  comp_mask. Having said that, comp_mask is just a complicated way of
  expressing a version, which is turn describe a size (ever growing).
 
  It is my understanding that each comp_mask bit marks a set of fields in 
  the command or in the response struct as valid, so the struct format 
  remains the same and the kernel and userspace don't need to make 
  difficult calculation as to where each field is, but you

[PATCH v1 4/5] IB/uverbs: ex_query_device: no need to clear the whole structure

2015-01-29 Thread Yann Droneaud
As only the requested fields are set and copied to userspace,
there's no need to clear the content of the response structure
beforehand.

Link: http://mid.gmane.org/cover.1422553023.git.ydrone...@opteya.com
Cc: Sagi Grimberg sa...@mellanox.com
Cc: Shachar Raindel rain...@mellanox.com
Cc: Eli Cohen e...@mellanox.com
Cc: Haggai Eran hagg...@mellanox.com
Signed-off-by: Yann Droneaud ydrone...@opteya.com
---
 drivers/infiniband/core/uverbs_cmd.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c 
b/drivers/infiniband/core/uverbs_cmd.c
index 81d8b5aa2eb6..9520880a163f 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3328,9 +3328,9 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file,
if (err)
return err;
 
-   memset(resp, 0, sizeof(resp));
copy_query_dev_fields(file, resp.base, attr);
resp.comp_mask = 0;
+   resp.reserved = 0;
 
if (ucore-outlen  resp_len + sizeof(resp.odp_caps))
goto end;
@@ -3343,6 +3343,9 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file,
attr.odp_caps.per_transport_caps.uc_odp_caps;
resp.odp_caps.per_transport_caps.ud_odp_caps =
attr.odp_caps.per_transport_caps.ud_odp_caps;
+   resp.odp_caps.reserved = 0;
+#else
+   memset(resp.odp_caps, 0, sizeof(resp.odp_caps));
 #endif
resp.comp_mask |= IB_USER_VERBS_EX_QUERY_DEVICE_ODP;
resp_len += sizeof(resp.odp_caps);
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v1 1/5] IB/uverbs: ex_query_device: answer must not depend on request's comp_mask

2015-01-29 Thread Yann Droneaud
As specified in Extending Verbs API presentation [1] by Tzahi Oved
during OFA International Developer Workshop 2013, the request's
comp_mask should describe the request data: it's describe the
availability of extended fields in the request.
Conversely, the response's comp_mask should describe the presence
of extended fields in the response.

So this patch changes ib_uverbs_ex_query_device() function to always
return the maximum supported features, currently only
IB_USER_VERBS_EX_QUERY_DEVICE_ODP per commit 860f10a799c8 (IB/core:
Add flags for on demand paging support), regardless of the value of
request's comp_mask.

[1] 
https://www.openfabrics.org/images/docs/2013_Dev_Workshop/Tues_0423/2013_Workshop_Tues_0830_Tzahi_Oved-verbs_extensions_ofa_2013-tzahio.pdf

Link: http://mid.gmane.org/cover.1422553023.git.ydrone...@opteya.com
Cc: Sagi Grimberg sa...@mellanox.com
Cc: Shachar Raindel rain...@mellanox.com
Cc: Eli Cohen e...@mellanox.com
Cc: Haggai Eran hagg...@mellanox.com
Signed-off-by: Yann Droneaud ydrone...@opteya.com
---
 drivers/infiniband/core/uverbs_cmd.c | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c 
b/drivers/infiniband/core/uverbs_cmd.c
index 532d8eba8b02..6ef06a9b4362 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3324,17 +3324,15 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file 
*file,
resp.comp_mask = 0;
 
 #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
-   if (cmd.comp_mask  IB_USER_VERBS_EX_QUERY_DEVICE_ODP) {
-   resp.odp_caps.general_caps = attr.odp_caps.general_caps;
-   resp.odp_caps.per_transport_caps.rc_odp_caps =
-   attr.odp_caps.per_transport_caps.rc_odp_caps;
-   resp.odp_caps.per_transport_caps.uc_odp_caps =
-   attr.odp_caps.per_transport_caps.uc_odp_caps;
-   resp.odp_caps.per_transport_caps.ud_odp_caps =
-   attr.odp_caps.per_transport_caps.ud_odp_caps;
-   resp.comp_mask |= IB_USER_VERBS_EX_QUERY_DEVICE_ODP;
-   }
+   resp.odp_caps.general_caps = attr.odp_caps.general_caps;
+   resp.odp_caps.per_transport_caps.rc_odp_caps =
+   attr.odp_caps.per_transport_caps.rc_odp_caps;
+   resp.odp_caps.per_transport_caps.uc_odp_caps =
+   attr.odp_caps.per_transport_caps.uc_odp_caps;
+   resp.odp_caps.per_transport_caps.ud_odp_caps =
+   attr.odp_caps.per_transport_caps.ud_odp_caps;
 #endif
+   resp.comp_mask |= IB_USER_VERBS_EX_QUERY_DEVICE_ODP;
 
err = ib_copy_to_udata(ucore, resp, sizeof(resp));
if (err)
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v1 5/5] IB/core: ib_copy_to_udata(): don't silently truncate response

2015-01-29 Thread Yann Droneaud
While ib_copy_to_udata() should check for the available output
space as already proposed in some other patches [1][2][3], the
changes brought by commit 5a77abf9a97a (IB/core: Add support for
extended query device caps) are silently truncating the data to
be written to userspace if the output buffer is not large enough
to hold the response data.

Silently truncating the response is not a reliable behavior as
userspace is not given any hint about this truncation: userspace
is leaved with garbage to play with.

Not checking the response buffer size and writing past the
userspace buffer is no good either, but it's the current behavior.

So this patch revert the particular change on ib_copy_to_udata()
as a better behavior is implemented in the upper level function
ib_uverbs_ex_query_device().

[1] [PATCH 00/22] infiniband: improve userspace input check

http://mid.gmane.org/cover.1376847403.git.ydrone...@opteya.com

[2] [PATCH 03/22] infiniband: ib_copy_from_udata(): check input length

http://mid.gmane.org/2bf102a41c51f61965ee09df827abe8fefb523a9.1376847403.git.ydrone...@opteya.com

[3] [PATCH 04/22] infiniband: ib_copy_to_udata(): check output length

http://mid.gmane.org/d27716a3a1c180f832d153a7402f65ea8a75b734.1376847403.git.ydrone...@opteya.com

Link: http://mid.gmane.org/cover.1422553023.git.ydrone...@opteya.com
Cc: Sagi Grimberg sa...@mellanox.com
Cc: Shachar Raindel rain...@mellanox.com
Cc: Eli Cohen e...@mellanox.com
Cc: Haggai Eran hagg...@mellanox.com
Signed-off-by: Yann Droneaud ydrone...@opteya.com
---
 include/rdma/ib_verbs.h | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 0d74f1de99aa..65994a19e840 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1707,10 +1707,7 @@ static inline int ib_copy_from_udata(void *dest, struct 
ib_udata *udata, size_t
 
 static inline int ib_copy_to_udata(struct ib_udata *udata, void *src, size_t 
len)
 {
-   size_t copy_sz;
-
-   copy_sz = min_t(size_t, len, udata-outlen);
-   return copy_to_user(udata-outbuf, src, copy_sz) ? -EFAULT : 0;
+   return copy_to_user(udata-outbuf, src, len) ? -EFAULT : 0;
 }
 
 /**
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v1 3/5] IB/uverbs: ex_query_device: answer must depend on response buffer's size

2015-01-29 Thread Yann Droneaud
As specified in Extending Verbs API presentation [1] by Tzahi Oved
during OFA International Developer Workshop 2013, the request's
comp_mask should describe the request data: it's describe the
availability of extended fields in the request.
Conversely, the response's comp_mask should describe the presence
of extended fields in the response.

So instead of silently truncating extended QUERY_DEVICE uverb's
response, see commit 5a77abf9a97a (IB/core: Add support for
extended query device caps)), this patch makes function
ib_uverbs_ex_query_device() check the available space in the
response buffer against the minimal response structure and fail
with -ENOSPC if this base structure cannot be returned to
userspace: it's required to be able to return the comp_mask's
value, at least.

For extended features, currently only IB_USER_VERBS_EX_QUERY_DEVICE_ODP
per commit 860f10a799c8 (IB/core: Add flags for on demand paging
support), the extension's data is returned if the needed space
is available in the response buffer: it is expected that newer
userspace program pass a bigger response buffer so that it can
retrieve extended features. The comp_mask value will match the fields
that were effectively returned to userspace.

In the end:
- userspace won't get truncated responses;
- newer kernel would be able to support older binaries and
  older binaries will work flawlessly with newer kernel;
- additionally, newer binaries would even be able to support
  older kernel, as far as they don't set new feature bit in
  request's comp_mask.

Note: as offsetof() is used to retrieve the size of the lower chunk
of the response, beware that it only works if the upper chunk
is right after, without any implicit padding. And, as the size of
the latter chunk is added to the base size, implicit padding at the
end of the structure is not taken in account. Both point must be
taken in account when extending the uverbs functionalities.

[1] 
https://www.openfabrics.org/images/docs/2013_Dev_Workshop/Tues_0423/2013_Workshop_Tues_0830_Tzahi_Oved-verbs_extensions_ofa_2013-tzahio.pdf

Link: http://mid.gmane.org/cover.1422553023.git.ydrone...@opteya.com
Cc: Sagi Grimberg sa...@mellanox.com
Cc: Shachar Raindel rain...@mellanox.com
Cc: Eli Cohen e...@mellanox.com
Cc: Haggai Eran hagg...@mellanox.com
Signed-off-by: Yann Droneaud ydrone...@opteya.com
---
 drivers/infiniband/core/uverbs_cmd.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c 
b/drivers/infiniband/core/uverbs_cmd.c
index fbcc54b86795..81d8b5aa2eb6 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3302,6 +3302,7 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file,
struct ib_uverbs_ex_query_device  cmd;
struct ib_device_attr attr;
struct ib_device *device;
+   size_t resp_len;
int err;
 
device = file-device-ib_dev;
@@ -3318,6 +3319,11 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file 
*file,
if (cmd.reserved)
return -EINVAL;
 
+   resp_len = offsetof(typeof(resp), odp_caps);
+
+   if (ucore-outlen  resp_len)
+   return -ENOSPC;
+
err = device-query_device(device, attr);
if (err)
return err;
@@ -3326,6 +3332,9 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file,
copy_query_dev_fields(file, resp.base, attr);
resp.comp_mask = 0;
 
+   if (ucore-outlen  resp_len + sizeof(resp.odp_caps))
+   goto end;
+
 #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
resp.odp_caps.general_caps = attr.odp_caps.general_caps;
resp.odp_caps.per_transport_caps.rc_odp_caps =
@@ -3336,8 +3345,10 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file 
*file,
attr.odp_caps.per_transport_caps.ud_odp_caps;
 #endif
resp.comp_mask |= IB_USER_VERBS_EX_QUERY_DEVICE_ODP;
+   resp_len += sizeof(resp.odp_caps);
 
-   err = ib_copy_to_udata(ucore, resp, sizeof(resp));
+end:
+   err = ib_copy_to_udata(ucore, resp, resp_len);
if (err)
return err;
 
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v1 2/5] IB/uverbs: ex_query_device: check request's comp_mask

2015-01-29 Thread Yann Droneaud
This patch ensures the extended QUERY_DEVICE uverbs request's
comp_mask has only known and supported bits (currently none).

If userspace set unknown features bits, -EINVAL will be returned,
ensuring current programs are not allowed to set random feature
bits: such bits could enable new extended features in future kernel
versions and those features can trigger a behavior not unsupported
by the older programs or make the newer kernels return an error
for a request which was valid on older kernels.

Additionally, returning an error for unsupported feature would
allow userspace to probe/discover which extended features are
currently supported by a kernel.

Link: http://mid.gmane.org/cover.1422553023.git.ydrone...@opteya.com
Cc: Sagi Grimberg sa...@mellanox.com
Cc: Shachar Raindel rain...@mellanox.com
Cc: Eli Cohen e...@mellanox.com
Cc: Haggai Eran hagg...@mellanox.com
Signed-off-by: Yann Droneaud ydrone...@opteya.com
---
 drivers/infiniband/core/uverbs_cmd.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/infiniband/core/uverbs_cmd.c 
b/drivers/infiniband/core/uverbs_cmd.c
index 6ef06a9b4362..fbcc54b86795 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3312,6 +3312,9 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file,
if (err)
return err;
 
+   if (cmd.comp_mask)
+   return -EINVAL;
+
if (cmd.reserved)
return -EINVAL;
 
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v1 0/5] IB/core: extended query device caps cleanup for v3.19

2015-01-29 Thread Yann Droneaud
Hi,

Following discussions in thread [PATCH v3 06/17] IB/core: Add support
for extended query device caps [1] and further comments on previous
patch in [PATCH 3/4] IB/uverbs: ex_query_device: check request's
comp_mask thread [2], I'm proposing an updated patchset to implement
a slighly different behavior to handle the request parameters in
ib_uverbs_ex_query_device() in order to restore the current behavior
of ib_copy_to_udata().

I think we found an agreement on the scheme to be implemented in
Haggai Eran's response [3]:

- input's comp_mask is currently not used, so it must be 0;
- the response buffer size is checked so that the base answer
  can be returned without being truncated;
- the extended feature (odp) information are returned
  if, and only if, there's enough space in the response buffer,
  allowing older programs to get only the features they would
  expect. Newer programs are expected to provide more space,
  but as the response's comp_mask will describe only the available
  fields, newer program cannot be surprized to not get information
  when run on older kernel (if we want to support this use case).

I feel even more confident this behavior would allow a better
maintainability. Additionally, it's still looking more like the
behavior already implemented by other InfiniBand/RDMA kernel -
userspace interfaces.

I hope this would go in v3.19 before its release so that the extended
QUERY_DEVICE uverbs won't hurt us being part of an official release:
in it's current shape it won't be easy to support it.

Regards.

Changes from v0 [4]
- don't use input's comp_mask to conditionnaly build the response
- ensure input's comp_mask is set 0

[1] http://mid.gmane.org/1418733236.2779.26.ca...@opteya.com
[2] 
http://mid.gmane.org/063956366559d6919693aabb324bab83d676bc28.1421931555.git.ydrone...@opteya.com
[3] http://mid.gmane.org/54c902e4.5010...@mellanox.com
[4] http://mid.gmane.org/cover.1421931555.git.ydrone...@opteya.com

Yann Droneaud (5):
  IB/uverbs: ex_query_device: answer must not depend on request's
comp_mask
  IB/uverbs: ex_query_device: check request's comp_mask
  IB/uverbs: ex_query_device: answer must depend on response buffer's
size
  IB/uverbs: ex_query_device: no need to clear the whole structure
  IB/core: ib_copy_to_udata(): don't silently truncate response

 drivers/infiniband/core/uverbs_cmd.c | 39 +---
 include/rdma/ib_verbs.h  |  5 +
 2 files changed, 28 insertions(+), 16 deletions(-)

-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] IB/uverbs: ex_query_device: check request's comp_mask

2015-01-29 Thread Yann Droneaud
Hi,

Le jeudi 29 janvier 2015 à 11:09 -0700, Jason Gunthorpe a écrit :
 On Wed, Jan 28, 2015 at 02:19:11PM +0100, Yann Droneaud wrote:
 
  But the same program (either binary or source code) might fail on
  newer kernel where some bits in comp_mask gain a meaning not supported
  by the program.
 
 To clarify some of this:
 
 The intention of the comp_mask scheme was to define a growing
 structure.
 
 The invariant is: A bigger structure allways has all smaller
 structures (past versions) as a prefix.
 
 So the size alone is enough for user space to know what is going
 on. userspace only processes structure members up to the size returned
 by the kernel, and there is only one structure layout.
 

Unfortunately, the userspace don't get the size of the returned data:
it's only a single write() syscall after all.

So the kernel is left with the comp_mask in the response to express
the returned size.

 The purpose of the output comp_mask is to allow drivers to declare
 they do not support the new structure members, and comp_mask bits
 should only be used with new structure members do not have a natural
 'null' value.
 

It's not (yet) about drivers as the output's comp_mask (in the 
response), is set by uverbs layer.

Do you think we have to enforce a 1 to 1 relation between the request 
comp_mask's bits and the response comp_mask's bits ?

That would not fit with my understanding of Extending Verbs API 
presentation [1]: request's comp_mask describe the fields present in 
the request and response's comp_mask describe the fields present in the 
response.

 Further, we need to distinguish cases where additional data is
 mandatory or optional.
 
 query_device is clearly optional, the only purpose the input comp mask
 serves is to reduce expensive work in the driver by indicating that
 some result bits are not needed.

That's not how I've understand the purpose of the request's comp_mask
after reading the presentation: request's comp_mask describe the content
of the request. Then that additional content can trigger the presence 
of additional fields in the response if needed.

  It is perfectly OK for the kernel to
 ignore the input comp mask, and OK for userspace to typically request
 all bits. (indeed, I think this is a pretty silly optimization myself,
 and the original patch that motivated this was restructured so it is
 not needed)
 

It's not at all OK to ignore request's comp_mask: it has to be checked
to find if there's a feature requested the kernel cannot fullfil, and 
any unknown bit is a possible feature request. So the kernel has to 
reject the request as a whole as it don't know how to process it.

As we don't know yet how we would extend the extended QUERY_DEVICE
uverbs, the kernel have to refuse any random value.

http://blog.ffwll.ch/2013/11/botching-up-ioctls.html


 Other verbs must be more strict, if one side does not understand the
 comp mask then the verb must fail in some way. Presumably the valid
 comp mask values are inferred in some way (eg query_device says the
 extended function is supported)
 

 Everything should fit in one of those two general cases..

I believe every interface should return an error for any unknown value
(every unused bits of a data structure), that is, there's only one case.

   
  Thanks for the link to the presentation.
 
 If I recall the presentation is old and had some flaws that were
 addressed before things made it into libibverbs..
 

I have to have a look to this part of libibverbs: I'm not sure
the extended QUERY_DEVICE is already implemented.

 Thank you for looking at this, it is very important that this scheme
 is used properly, and it is very easy to make mistakes. I haven't had
 time to review any of these new patches myself.
 

I hope you would find some time to review my latest patchset
so that we don't miss a corner case. It's starting to become urgent.

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/5] IB/uverbs: ex_query_device: answer must not depend on request's comp_mask

2015-01-29 Thread Yann Droneaud
Hi,

Le jeudi 29 janvier 2015 à 11:28 -0700, Jason Gunthorpe a écrit :
 On Thu, Jan 29, 2015 at 06:59:58PM +0100, Yann Droneaud wrote:
  As specified in Extending Verbs API presentation [1] by Tzahi Oved
  during OFA International Developer Workshop 2013, the request's
  comp_mask should describe the request data: it's describe the
  availability of extended fields in the request.
  Conversely, the response's comp_mask should describe the presence
  of extended fields in the response.
 
 Roland: I agree with Yann, these patches need to go in, or the ODP
 patches reverted.
 
   #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
  -   if (cmd.comp_mask  IB_USER_VERBS_EX_QUERY_DEVICE_ODP) {
 
 Absolutely, a query output should never depend on the input comp_mask.
 

We should keep this in mind then.

  -   resp.odp_caps.general_caps = attr.odp_caps.general_caps;
  -   resp.odp_caps.per_transport_caps.rc_odp_caps =
  -   attr.odp_caps.per_transport_caps.rc_odp_caps;
  -   resp.odp_caps.per_transport_caps.uc_odp_caps =
  -   attr.odp_caps.per_transport_caps.uc_odp_caps;
  -   resp.odp_caps.per_transport_caps.ud_odp_caps =
  -   attr.odp_caps.per_transport_caps.ud_odp_caps;
  -   resp.comp_mask |= IB_USER_VERBS_EX_QUERY_DEVICE_ODP;
  -   }
  +   resp.odp_caps.general_caps = attr.odp_caps.general_caps;
  +   resp.odp_caps.per_transport_caps.rc_odp_caps =
  +   attr.odp_caps.per_transport_caps.rc_odp_caps;
  +   resp.odp_caps.per_transport_caps.uc_odp_caps =
  +   attr.odp_caps.per_transport_caps.uc_odp_caps;
  +   resp.odp_caps.per_transport_caps.ud_odp_caps =
  +   attr.odp_caps.per_transport_caps.ud_odp_caps;
   #endif
  +   resp.comp_mask |= IB_USER_VERBS_EX_QUERY_DEVICE_ODP;
 
 Not sure about this - if 0 is a valid null answer for all the _caps
 then it is fine, and the comp_mask bit should just be removed as the
 size alone should be enough.
 

That's difficult to say. But I hope 0 are valids answers in this case.

Anyway, the response's comp_mask is needed, as it's the sole way for 
the userspace to know the size of the response. See below.

 This looks wrong however:
 
  err = ib_copy_to_udata(ucore, resp, sizeof(resp));
  if (err)
return err;
 return 0;
 
 Why isn't this returning the filled structure size to userspace?  This
 would seem to be very urgently wrong to me? Am I missing something?
 
 Patch 3 probably should gain:
 
 - return 0;
 + return resp_len;
 

The write() syscall must return the size buffer passed to it, or less,
but in such case it would ask for trouble as userspace would be allowed
to write() the remaining bytes.
Returning a size bigger than the one passed to write() is not acceptable
and would break any expectation.

 This patch looks like an improvement to me:
 
 Reviewed-By: Jason Gunthorpe jguntho...@obsidianresearch.com
 

Thanks a lot.

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/5] IB/uverbs: ex_query_device: answer must not depend on request's comp_mask

2015-01-29 Thread Yann Droneaud
Hi,

Le jeudi 29 janvier 2015 à 11:28 -0700, Jason Gunthorpe a écrit :
 On Thu, Jan 29, 2015 at 06:59:58PM +0100, Yann Droneaud wrote:
  As specified in Extending Verbs API presentation [1] by Tzahi Oved
  during OFA International Developer Workshop 2013, the request's
  comp_mask should describe the request data: it's describe the
  availability of extended fields in the request.
  Conversely, the response's comp_mask should describe the presence
  of extended fields in the response.
 
 Roland: I agree with Yann, these patches need to go in, or the ODP
 patches reverted.
 

Reverting all On Demand Paging patches seems overkill:
if something as to be reverted it should be commit 5a77abf9a97a
(IB/core: Add support for extended query device caps) and the part of
commit 860f10a799c8 (IB/core: Add flags for on demand paging support)
which modify ib_uverbs_ex_query_device().

But I wonder about this part of commit 860f10a799c8:

diff --git a/drivers/infiniband/core/uverbs_cmd.c 
b/drivers/infiniband/core/uverbs_cmd.c
index c7a43624c96b..f9326ccda4b5 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -953,6 +953,18 @@ ssize_t ib_uverbs_reg_mr(struct ib_uverbs_file *file,
goto err_free;
}
 
+   if (cmd.access_flags  IB_ACCESS_ON_DEMAND) {
+   struct ib_device_attr attr;
+
+   ret = ib_query_device(pd-device, attr);
+   if (ret || !(attr.device_cap_flags 
+   IB_DEVICE_ON_DEMAND_PAGING)) {
+   pr_debug(ODP support not available\n);
+   ret = -EINVAL;
+   goto err_put;
+   }
+   }
+

AFAICT (1  6) bit from struct ib_uverbs_reg_mr access_flags field
was not enforced to be 0 previously, as ib_check_mr_access() only check 
for some coherency between a subset of the bits (it's not a function
dedicated to check flags provided by userspace):

include/rdma/ib_verbs.h:

   1094 enum ib_access_flags {
   1095 IB_ACCESS_LOCAL_WRITE   = 1,
   1096 IB_ACCESS_REMOTE_WRITE  = (11),
   1097 IB_ACCESS_REMOTE_READ   = (12),
   1098 IB_ACCESS_REMOTE_ATOMIC = (13),
   1099 IB_ACCESS_MW_BIND   = (14),
   1100 IB_ZERO_BASED   = (15),
   1101 IB_ACCESS_ON_DEMAND = (16),
   1102 };

drivers/infiniband/core/uverbs_cmd.c: ib_uverbs_reg_mr()

961 ret = ib_check_mr_access(cmd.access_flags);
962 if (ret)
963 return ret;

include/rdma/ib_verbs.h:

   2643 static inline int ib_check_mr_access(int flags)
   2644 {
   2645 /*
   2646  * Local write permission is required if remote write or
   2647  * remote atomic permission is also requested.
   2648  */
   2649 if (flags  (IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_REMOTE_WRITE) 

   2650 !(flags  IB_ACCESS_LOCAL_WRITE))
   2651 return -EINVAL;
   2652 
   2653 return 0;
   2654 }

drivers/infiniband/core/uverbs_cmd.c: ib_uverbs_reg_mr()

990 mr = pd-device-reg_user_mr(pd, cmd.start, cmd.length, 
cmd.hca_va,
991  cmd.access_flags, udata);

reg_user_mr() functions may call ib_umem_get() and pass access_flags to
it:

drivers/infiniband/core/umem.c: ib_umem_get()

114 /*
115  * We ask for writable memory if any of the following
116  * access flags are set.  Local write and remote write
117  * obviously require write access.  Remote atomic can do
118  * things like fetch and add, which will modify memory, and
119  * MW bind can change permissions by binding a window.
120  */
121 umem-writable  = !!(access 
122 (IB_ACCESS_LOCAL_WRITE   | IB_ACCESS_REMOTE_WRITE |
123  IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND));
124 
125 if (access  IB_ACCESS_ON_DEMAND) {
126 ret = ib_umem_odp_get(context, umem);
127 if (ret) {
128 kfree(umem);
129 return ERR_PTR(ret);
130 }
131 return umem;
132 }


As you can see only a few bits in access_flags are checked in the end,
so it may exist a very unlikely possibility that an existing userspace
program is setting this IB_ACCESS_ON_DEMAND bit without the intention
of enabling on demand paging as this would be unnoticed by older kernel.

In the other hand, a newer program built with on-demand-paging in mind
will set the bit, but when run on older kernel, it will be a no-op,
allowing the program to continue, perhaps thinking on-demand-paging
is available.

That should be avoided as much as possible.

Unfortunately, I think this cannot be fixed as it's was long since 
IB_ZERO_BASED was added by commit 7083e42ee2 (IB/core: Add type 2

Re: [PATCH v1 1/5] IB/uverbs: ex_query_device: answer must not depend on request's comp_mask

2015-01-29 Thread Yann Droneaud
Le jeudi 29 janvier 2015 à 19:43 +0100, Yann Droneaud a écrit :
 Le jeudi 29 janvier 2015 à 11:28 -0700, Jason Gunthorpe a écrit :
  On Thu, Jan 29, 2015 at 06:59:58PM +0100, Yann Droneaud wrote:

   - resp.odp_caps.general_caps = attr.odp_caps.general_caps;
   - resp.odp_caps.per_transport_caps.rc_odp_caps =
   - attr.odp_caps.per_transport_caps.rc_odp_caps;
   - resp.odp_caps.per_transport_caps.uc_odp_caps =
   - attr.odp_caps.per_transport_caps.uc_odp_caps;
   - resp.odp_caps.per_transport_caps.ud_odp_caps =
   - attr.odp_caps.per_transport_caps.ud_odp_caps;
   - resp.comp_mask |= IB_USER_VERBS_EX_QUERY_DEVICE_ODP;
   - }
   + resp.odp_caps.general_caps = attr.odp_caps.general_caps;
   + resp.odp_caps.per_transport_caps.rc_odp_caps =
   + attr.odp_caps.per_transport_caps.rc_odp_caps;
   + resp.odp_caps.per_transport_caps.uc_odp_caps =
   + attr.odp_caps.per_transport_caps.uc_odp_caps;
   + resp.odp_caps.per_transport_caps.ud_odp_caps =
   + attr.odp_caps.per_transport_caps.ud_odp_caps;
#endif
   + resp.comp_mask |= IB_USER_VERBS_EX_QUERY_DEVICE_ODP;
  
  Not sure about this - if 0 is a valid null answer for all the _caps
  then it is fine, and the comp_mask bit should just be removed as the
  size alone should be enough.
  
 
 That's difficult to say. But I hope 0 are valids answers in this case.
 

Hum, according to include/rdma/ib_verbs.h, there's a bit set
for ODP support:

148 enum ib_odp_general_cap_bits {
149 IB_ODP_SUPPORT = 1  0,
150 };

So it should be safe to set everything to 0 in struct
ib_uverbs_odp_caps.

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 2/5] IB/uverbs: ex_query_device: check request's comp_mask

2015-01-29 Thread Yann Droneaud
Hi,

Le jeudi 29 janvier 2015 à 11:36 -0700, Jason Gunthorpe a écrit :
 On Thu, Jan 29, 2015 at 06:59:59PM +0100, Yann Droneaud wrote:
  This patch ensures the extended QUERY_DEVICE uverbs request's
  comp_mask has only known and supported bits (currently none).
 
 I think I would be happy to see the input comp_mask removed
 entirely. I can't see a possible use for input data to a QUERY command
 that wouldn't be better served by creating a new command
 
 But forcing the value to 0 seems reasonable as well.
 

I cannot forsee the future, but having at least one unused bit
available allow for any kind of yet unknown extension: having this
bit/these bits checked now, permit fixing mistake later.

So let it be.

 Reviewed-By: Jason Gunthorpe jguntho...@obsidianresearch.com
 

Thanks.

 Also, the _ex varients were supposed to be supersets of the base call,
 so it is wrong that query_device_ex doesn't return all the same data
 as query_device, layed out so that the original response structure is
 a prefix of the extended response structure.
 
 The other _ex calls in verbs were designed that way so it will be
 surprising to the user that this one is different.
 

It seems to me it has the layout you're expecting:

224 struct ib_uverbs_ex_query_device_resp {
225 struct ib_uverbs_query_device_resp base;
226 __u32 comp_mask;
227 __u32 reserved;
228 struct ib_uverbs_odp_caps odp_caps;
229 };

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 3/5] IB/uverbs: ex_query_device: answer must depend on response buffer's size

2015-01-29 Thread Yann Droneaud
Hi,

Le jeudi 29 janvier 2015 à 11:38 -0700, Jason Gunthorpe a écrit :
 On Thu, Jan 29, 2015 at 07:00:00PM +0100, Yann Droneaud wrote:
  As specified in Extending Verbs API presentation [1] by Tzahi Oved
  during OFA International Developer Workshop 2013, the request's
  comp_mask should describe the request data: it's describe the
  availability of extended fields in the request.
  Conversely, the response's comp_mask should describe the presence
  of extended fields in the response.
 
 This makes sense to me as well.
 
  -   err = ib_copy_to_udata(ucore, resp, sizeof(resp));
  +end:
  +   err = ib_copy_to_udata(ucore, resp, resp_len);
  if (err)
  return err;
   
 
 I think resp_len should be returned, not 0?
 

As said previously it's not possible to use the effective size of the
response as the return value of the write() syscall: the syscall
must return the size of the input buffer, not the size of the output
buffer, otherwise it would break the semantics of the write() syscall.

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/5] IB/uverbs: ex_query_device: answer must not depend on request's comp_mask

2015-01-29 Thread Yann Droneaud
Hi,

Le jeudi 29 janvier 2015 à 12:18 -0700, Jason Gunthorpe a écrit :
 On Thu, Jan 29, 2015 at 07:43:29PM +0100, Yann Droneaud wrote:
 
  The write() syscall must return the size buffer passed to it, or
  less, but in such case it would ask for trouble as userspace would
  be allowed to write() the remaining bytes.  Returning a size bigger
  than the one passed to write() is not acceptable and would break any
  expectation.
 
 By that logic the 0 return is still wrong, and it should be ucore-in_len
 

This is handled by ib_uverbs_write() in
drivers/infiniband/core/uverbs_main.c:

709 if (err)
710 return err;
711 
712 return written_count;


 But I think we can return less without risking anything breaking, it
 already violates the invariant associated with write() - it mutates
 the buffer passed in!
 

I don't think so, as only the response buffer is written to and the
response buffer pointer is provided in the buffer given to write().

AFAIK, no uverbs ever change the content of the input buffer (eg. the
request): I've managed to declare the various input buffers const so
it would surprising to find it use for writing to userspace.

Anyway, I recognize that uverb way of abusing write() syscall is 
borderline (at best) regarding other Linux subsystems and Unix paradigm 
in general. But it's not enough to screw it more.

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] IB/uverbs: ex_query_device: check request's comp_mask

2015-01-29 Thread Yann Droneaud
Le jeudi 29 janvier 2015 à 12:14 -0700, Jason Gunthorpe a écrit :
 On Thu, Jan 29, 2015 at 07:35:14PM +0100, Yann Droneaud wrote:
 
  Unfortunately, the userspace don't get the size of the returned data:
  it's only a single write() syscall after all.
 
 A write syscall that behaves nothing like write() actually should, so
 I don't see why we can't have
 
 resp_len = write(fd,inout_buf,sizeof(input_len));
 
 Returning 0 from write makes no sense at all.
 

0 is not the result of the write() syscall, as for extended uverbs
I've ensure the return value of the write() syscall was the size
it was given. See
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/infiniband/core/uverbs_main.c?id=v3.19-rc6#n599

705 err = uverbs_ex_cmd_table[command](file,
706ucore,
707uhw);
708 
709 if (err)
710 return err;
711 
712 return written_count;

See commit f21519b23c1 (IB/core: extended command: an improved
infrastructure for uverbs commands)

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f21519b23c1b6fa25366be4114ccf7fcf1c190f9

 In the fullness of your patchset it will maintain the invariant that
 resp_len = sizeof(input_len)
 

I don't catch your point: the response can be bigger than the request.

 Which seems OK to me considering what we have to work with, and a
 significantly better choice than 0.
 
  So the kernel is left with the comp_mask in the response to express
  the returned size.
 
 It was never the intent that the size should be computed from
 comp_mask. If the size is necessary it must be explicit.
 
 In this instance if the size is not returned then libibverbs will have
 to zero the entire user buffer before passing it to the kernel,
 because it has to ensure any tail for the user app is 0'd.
 

The proposed patch ensure the integrity of the response regarding
comp_mask: if a bit is set in response's comp_mask that means the
related fields are presents (and valid).

So before parsing the response fields, userspace have to check
response's comp_mask: fields access must be protected by correct
check on comp_mask ... but it might be useful for the userspace
developer to clear the response buffer just in case he/she decided
to be lazy with the check.

 Remember for santity we want comp mask bits for things that can't be 0

For me, it's better if a bit is set in response's comp_mask by the 
kernel when the kernel have written something in the related fields 
even if the those fields are all 0.

 and we want 0 for things that are not set.
 
 struct ib_query_device_ex res;
 ibv_query_device_ex(..,res,sizeof(res));
 
 printf(%u,res.foo_cap); // 0 if unsupported is OK
 if (res.comp_mask  COMP_BAR)
   printf(%u,res.bar_thingy);  // 0 has meaning, must use COMP_BAR
 
 Ideally we want to minimize the number of COMP bits because it is a
 huge burden on the end user to work with them.
 

Sure. So I think comp_mask is just an overly complicated way of
expressing the version and the size of the response.

   The purpose of the output comp_mask is to allow drivers to declare
   they do not support the new structure members, and comp_mask bits
   should only be used with new structure members do not have a natural
   'null' value.
 
  It's not (yet) about drivers as the output's comp_mask (in the 
  response), is set by uverbs layer.
  
  Do you think we have to enforce a 1 to 1 relation between the request 
  comp_mask's bits and the response comp_mask's bits ?
 
 In the query context I would be happy enough to just treat the in
 bound buffer as uninitialized buffer space, but certainly generally
 speaking the comp_mask+size should refer to the structure - so
 input/output are not directly related.
 

OK.

   Further, we need to distinguish cases where additional data is
   mandatory or optional.
   
   query_device is clearly optional, the only purpose the input comp mask
   serves is to reduce expensive work in the driver by indicating that
   some result bits are not needed.
  
  That's not how I've understand the purpose of the request's comp_mask
  after reading the presentation: request's comp_mask describe the content
  of the request. Then that additional content can trigger the presence 
  of additional fields in the response if needed.
 
 Agreed - what I described was an abuse that some early Mellanox
 patches for a query_ex included and it was just a shortcut to avoid
 defining a dedicated input structure. It seems that same scheme was
 copied into these new patches.
 

OK

It is perfectly OK for the kernel to
   ignore the input comp mask, and OK for userspace to typically request
   all bits. (indeed, I think this is a pretty silly optimization myself,
   and the original patch that motivated this was restructured so it is
   not needed

Re: [PATCH] ib_ipoib: Scatter-Gather support in connected mode

2015-01-28 Thread Yann Droneaud
 = netdev_priv(dev);
  
   if (test_bit(IPOIB_FLAG_ADMIN_CM, priv-flags))
 - features = ~(NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO);
 + features = ~(NETIF_F_IP_CSUM | NETIF_F_TSO);
  
   return features;
  }
 @@ -233,7 +233,6 @@ int ipoib_set_mode(struct net_device *dev, const char 
 *buf)
  will cause multicast packet drops\n);
   netdev_update_features(dev);
   rtnl_unlock();
 - priv-tx_wr.send_flags = ~IB_SEND_IP_CSUM;

Sure about this removal ? I don't see it related to scatter-gather.

  
   ipoib_flush_paths(dev);
   rtnl_lock();

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] IB/uverbs: ex_query_device: check request's comp_mask

2015-01-28 Thread Yann Droneaud
Hi,

Le mardi 27 janvier 2015 à 08:50 +0200, Haggai Eran a écrit :
 On 26/01/2015 13:17, Yann Droneaud wrote:
  ...
  Le dimanche 25 janvier 2015 à 17:23 +0200, Haggai Eran a écrit :
  On 22/01/2015 15:28, Yann Droneaud wrote:
  This patch ensures the extended QUERY_DEVICE uverbs request's
  comp_mask has only known values. If userspace returns unknown
  features, -EINVAL will be returned, allowing to probe/discover
  which features are currently supported by the kernel.
 
  This probing process will be much more cumbersome than it needs to be
  because userspace will have to call QUERY_DEVICE repetitively with
  different comp_mask fields until it finds out the exact set of supported
  bits.
 
  
  O(log2(N))
 
 I don't think user space developers will be happy having to do trial and 
 error to determine what features the kernel driver supports. It might be 
 even more then O(log2(N)). If my understanding of comp_mask bits usage is 
 correct it would O(N). But it's not the time complexity I'm worried about,
 it's the fact that it requires user-space developers to go through hoops in
 order to get information that can be much more easily exported.
 

But there's currently *NO* such mean that could give a hint to userspace
developer whether one bit of request's comp_mask is currently effective
in extended QUERY_DEVICE uverbs. While we have such mean in CREATE_FLOW
and DESTROY_FLOW: unsupported bits trigger -EINVAL.

In QUERY_DEVICE, userspace developer is allowed to set request's 
comp_mask to -1: it won't hurt it on current kernel, so why not using 
this value or any other random value ? The program will run: it's Works
For Me.

But the same program (either binary or source code) might fail on
newer kernel where some bits in comp_mask gain a meaning not supported
by the program.

  Or you had to add a different interface, dedicated to retrieve the exact
  supported feature mask.
  
  Moreover, it also ensure the requested features set in comp_mask
  are sequentially set, not skipping intermediate features, eg. the
  highest requested feature also request all the lower ones.
  This way each new feature will have to be stacked on top of the
  existing ones: this is especially important for the request and
  response data structures where fields are added after the
  current ones when expanded to support a new feature.
 
  I think it is perfectly acceptable that not all drivers will implement
  all available features, and so you shouldn't enforce this requirement.
  
  With regard to QUERY_DEVICE: the data structure layout depends on the
  comp_mask value. So either you propose a way to express multipart data
  structure (see CMSG or NETLINK), or we have to ensure the data structure
  is ever-growing, with each new chunck stacked over the existing ones:
  that's the purpose of :
  
  if (cmd.comp_mask  (cmd.comp_mask + 1))
  return -EINVAL;
  
  Also, it makes the comp_mask nothing more than a wasteful version number
  between 0 and 63.
  
  That's what I've already explained earlier in Re: [PATCH v3 06/17]
  IB/core: Add support for extended query device caps:
  
  http://mid.gmane.org/1421844612.13543.40.ca...@opteya.com
 
 Yes, you wrote there:
  Regarding comp_mask (not for this particular verb):
  
  It's not clear whether request's comp_mask describe the request or the
  response, as such I'm puzzled.
  
  How would the kernel and the userspace be able to parse the request and
  the response if they ignore unknown bits ?
  
  How would they be able to skip the unrecognized chunk of the request and
  response buffer ?
  
  How one bit in a comp_mask is related to a chunk in the request or
  response ?
  
  It's likely the kernel or userspace would have to skip the remaining
  comp_mask's bits after encountering an unknown bit as the size of the
  corresponding chunk in request or response would be unknown, making
  impossible to locate the corresponding chunk for the next bit set in
  comp_mask. Having said that, comp_mask is just a complicated way of
  expressing a version, which is turn describe a size (ever growing).
 
 It is my understanding that each comp_mask bit marks a set of fields in 
 the command or in the response struct as valid, so the struct format 
 remains the same and the kernel and userspace don't need to make 
 difficult calculation as to where each field is, but you can still pass 
 a high bit set in comp_mask with one of the lower bits cleared.
 

How can the struct format remain the same, if some fields are added
to implement newer feature ?

 I couldn't find this explicit detail in the mailing list, but I did found
 a presentation that was presented in OFA International Developer 
 Workshop 2013 [1], that gave an example of of an verb where each 
 comp_mask bit marked a different field as valid.
 

Thanks for the link to the presentation.

As I read it the presentation:

- in request, comp_mask give hint to the kernel of additional fields in
the request

Re: [PATCH 3/4] IB/uverbs: ex_query_device: check request's comp_mask

2015-01-26 Thread Yann Droneaud
Hi,

(adding linux-api@ for comments:

 We're introducing a new uverb in the InfiniBand subsystem:
 extended QUERY_DEVICE in v3.19:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/infiniband/core/uverbs_cmd.c?id=v3.19-rc6#n3297
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/uapi/rdma/ib_user_verbs.h?id=v3.19-rc6#n209
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/uapi/rdma/ib_user_verbs.h?id=v3.19-rc6#n224

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5a77abf9a97a
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=860f10a799c8

 As you may not already know, InfiniBand subsystem use a write() 
 syscall[1] to issue ioctl()-like operations. Many operations (aka   
 verbs) are available [2], for each there's a query data structures
 and for some there's a response data structure [3]. As a result to a 
 write() operation, kernel could allocate resources on the task behalf 
 and/or write data back to userspace provided buffers whose pointers 
 were part of buffer passed to write().

 I've expressed concern on the way the new extended QUERY_DEVICE
 was trying to be itself expandable: by silently ignoring shorter
 buffer, returning truncated data, the interface seems awkward.

Re: [PATCH v2 06/17] IB/core: Add support for extended query device caps
http://mid.gmane.org/1418216676.1.45.ca...@opteya.com

Re: [PATCH v3 06/17] IB/core: Add support for extended query device caps
http://mid.gmane.org/1418733236.2779.26.ca...@opteya.com
http://mid.gmane.org/1418824811.3334.3.camel@dworkin
http://mid.gmane.org/1421844612.13543.40.ca...@opteya.com

 I recognize the author's intention to provide a way for userspace
 to retrieve easily the highest supported information as something
 desirable.

 But I believe we must be more strict on the request content and
 fail for any unrecognized, unsupported, incorrect bits to make
 safer to extend the interface latter.

 I've submitted a patchset[4] to address these issues.
 But, while I'm convinced it the way to go, I'm not able to find
 how it could be made to satisfy the original author expectations.
 
 I hope linux-api@ readers could provide some insight regarding
 the way we should implement such interface

[1] 
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/infiniband/core/uverbs_main.c?id=v3.19-rc6#n599

[2] 
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/infiniband/core/uverbs_main.c?id=v3.19-rc6#n81

[3] 
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/uapi/rdma/ib_user_verbs.h?id=v3.19-rc6

[4] http://mid.gmane.org/cover.1421931555.git.ydrone...@opteya.com

)

Le dimanche 25 janvier 2015 à 17:23 +0200, Haggai Eran a écrit :
 On 22/01/2015 15:28, Yann Droneaud wrote:
  This patch ensures the extended QUERY_DEVICE uverbs request's
  comp_mask has only known values. If userspace returns unknown
  features, -EINVAL will be returned, allowing to probe/discover
  which features are currently supported by the kernel.
 
 This probing process will be much more cumbersome than it needs to be
 because userspace will have to call QUERY_DEVICE repetitively with
 different comp_mask fields until it finds out the exact set of supported
 bits.
 

O(log2(N))

Or you had to add a different interface, dedicated to retrieve the exact
supported feature mask.

  Moreover, it also ensure the requested features set in comp_mask
  are sequentially set, not skipping intermediate features, eg. the
  highest requested feature also request all the lower ones.
  This way each new feature will have to be stacked on top of the
  existing ones: this is especially important for the request and
  response data structures where fields are added after the
  current ones when expanded to support a new feature.
 
 I think it is perfectly acceptable that not all drivers will implement
 all available features, and so you shouldn't enforce this requirement.

With regard to QUERY_DEVICE: the data structure layout depends on the
comp_mask value. So either you propose a way to express multipart data
structure (see CMSG or NETLINK), or we have to ensure the data structure
is ever-growing, with each new chunck stacked over the existing ones:
that's the purpose of :

if (cmd.comp_mask  (cmd.comp_mask + 1))
return -EINVAL;

 Also, it makes the comp_mask nothing more than a wasteful version number
 between 0 and 63.

That's what I've already explained earlier in Re: [PATCH v3 06/17]
IB/core: Add support for extended query device caps:

http://mid.gmane.org/1421844612.13543.40.ca...@opteya.com

 
 In the specific case of QUERY_DEVICE you might argue that there isn't
 any need for input comp_mask, only for output, and then you may enforce
 the input comp_mask will always be zero.

The extended QUERY_DEVICE uverbs as currently merged is using comp_mask
from input to choose

Re: [PATCH 1/4] IB/uverbs: ex_query_device: fail if output buffer size does not match

2015-01-26 Thread Yann Droneaud
Hi,

Le dimanche 25 janvier 2015 à 17:29 +0200, Haggai Eran a écrit :
 On 22/01/2015 15:28, Yann Droneaud wrote:
  Instead of silently truncating extended QUERY_DEVICE uverb's
  response, see commit 5a77abf9a97a (IB/core: Add support for
  extended query device caps)), this patch makes function
  ib_uverbs_ex_query_device() check the available space in the
  response buffer against the requested features set in comp_mask
  (currently only IB_USER_VERBS_EX_QUERY_DEVICE_ODP per commit
  860f10a799c8 (IB/core: Add flags for on demand paging support).
 
 Currently you only need to check two possible sizes. With each
 additional field you will need to check another size.

Yes, that's mandatory for any interface with a dynamic, ever growing,
data structure.

 Sooner or later someone will forget to add a check and will create a 
 non-backward
 compatible kernel.
 

That's why we have to review userspace interface patch with lot of care.

Anyway, such a change should be fixed easily as it's would be
definitively a bug.

If the behavior I'm proposing is in place, userspace won't be allowed
to supply unsupported bits in a request, so older userspace requests on
newer kernels won't trigger unsupported new behavior. This ensure we can
extend the interfance with no risk of breaking existing program.

  If the response buffer is not large enough to store the expected
  response, -ENOSPC is returned to userspace so that it can adjust
  the size of its buffer.
  
  Note: as offsetof() is used to retrieve the size of the lower chunk
  of the response, beware that it only works if the upper chunk
  is right after, without any implicit padding. And, as the size of
  the latter chunk is added to the base size, implicit padding at the
  end of the structure is not taken in account. Both point must be
  taken in account when extending the uverbs functionalities.
 
 Another point future contributors will easily miss.
 

One should use pahole tool when modifying data structure part of the
ABI to ensure the data structure are correctly aligned on various
architectures.

  Link: http://mid.gmane.org/cover.1421931555.git.ydrone...@opteya.com
  Cc: Sagi Grimberg sa...@mellanox.com
  Cc: Shachar Raindel rain...@mellanox.com
  Cc: Eli Cohen e...@mellanox.com
  Cc: Haggai Eran hagg...@mellanox.com
  Signed-off-by: Yann Droneaud ydrone...@opteya.com
  ---
   drivers/infiniband/core/uverbs_cmd.c | 13 -
   1 file changed, 12 insertions(+), 1 deletion(-)
  
  diff --git a/drivers/infiniband/core/uverbs_cmd.c 
  b/drivers/infiniband/core/uverbs_cmd.c
  index 532d8eba8b02..8668b328b7e6 100644
  --- a/drivers/infiniband/core/uverbs_cmd.c
  +++ b/drivers/infiniband/core/uverbs_cmd.c
  @@ -3302,6 +3302,7 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file 
  *file,
  struct ib_uverbs_ex_query_device  cmd;
  struct ib_device_attr attr;
  struct ib_device *device;
  +   size_t resp_len;
  int err;
   
  device = file-device-ib_dev;
  @@ -3315,6 +3316,11 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file 
  *file,
  if (cmd.reserved)
  return -EINVAL;
   
  +   resp_len = offsetof(typeof(resp), odp_caps);
  +
  +   if (ucore-outlen  resp_len)
  +   return -ENOSPC;
  +
  err = device-query_device(device, attr);
  if (err)
  return err;
  @@ -3325,6 +3331,11 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file 
  *file,
   
   #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
  if (cmd.comp_mask  IB_USER_VERBS_EX_QUERY_DEVICE_ODP) {
  +   resp_len += sizeof(resp.odp_caps);
  +
  +   if (ucore-outlen  resp_len)
  +   return -ENOSPC;
  +
  resp.odp_caps.general_caps = attr.odp_caps.general_caps;
  resp.odp_caps.per_transport_caps.rc_odp_caps =
  attr.odp_caps.per_transport_caps.rc_odp_caps;
  @@ -3336,7 +3347,7 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file 
  *file,
  }
   #endif
   
  -   err = ib_copy_to_udata(ucore, resp, sizeof(resp));
  +   err = ib_copy_to_udata(ucore, resp, resp_len);
  if (err)
  return err;
   
  
 

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] IB/core: ib_copy_to_udata(): don't silently truncate response

2015-01-22 Thread Yann Droneaud
While ib_copy_to_udata() should check for the available output
space as already proposed in some other patches [1][2][3], the
changes brought by commit 5a77abf9a97a (IB/core: Add support for
extended query device caps) are silently truncating the data to
be written to userspace if the output buffer is not large enough
to hold the response data.

Silently truncating the response is not a reliable behavior as
userspace is not given any hint about this truncation: userspace
is leaved with garbage to play with.

Not checking the response buffer size and writing past the
userspace buffer is no good either, but it's the current behavior.

So this patch revert the particular change on ib_copy_to_udata()
as a better behavior is implemented in the upper level function
ib_uverbs_ex_query_device().

[1] [PATCH 00/22] infiniband: improve userspace input check

http://mid.gmane.org/cover.1376847403.git.ydrone...@opteya.com

[2] [PATCH 03/22] infiniband: ib_copy_from_udata(): check input length

http://mid.gmane.org/2bf102a41c51f61965ee09df827abe8fefb523a9.1376847403.git.ydrone...@opteya.com

[3] [PATCH 04/22] infiniband: ib_copy_to_udata(): check output length

http://mid.gmane.org/d27716a3a1c180f832d153a7402f65ea8a75b734.1376847403.git.ydrone...@opteya.com

Link: http://mid.gmane.org/cover.1421931555.git.ydrone...@opteya.com
Cc: Sagi Grimberg sa...@mellanox.com
Cc: Shachar Raindel rain...@mellanox.com
Cc: Eli Cohen e...@mellanox.com
Cc: Haggai Eran hagg...@mellanox.com
Signed-off-by: Yann Droneaud ydrone...@opteya.com
---
 include/rdma/ib_verbs.h | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 0d74f1de99aa..65994a19e840 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1707,10 +1707,7 @@ static inline int ib_copy_from_udata(void *dest, struct 
ib_udata *udata, size_t
 
 static inline int ib_copy_to_udata(struct ib_udata *udata, void *src, size_t 
len)
 {
-   size_t copy_sz;
-
-   copy_sz = min_t(size_t, len, udata-outlen);
-   return copy_to_user(udata-outbuf, src, copy_sz) ? -EFAULT : 0;
+   return copy_to_user(udata-outbuf, src, len) ? -EFAULT : 0;
 }
 
 /**
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/4] IB/uverbs: ex_query_device: check request's comp_mask

2015-01-22 Thread Yann Droneaud
This patch ensures the extended QUERY_DEVICE uverbs request's
comp_mask has only known values. If userspace returns unknown
features, -EINVAL will be returned, allowing to probe/discover
which features are currently supported by the kernel.

Moreover, it also ensure the requested features set in comp_mask
are sequentially set, not skipping intermediate features, eg. the
highest requested feature also request all the lower ones.
This way each new feature will have to be stacked on top of the
existing ones: this is especially important for the request and
response data structures where fields are added after the
current ones when expanded to support a new feature.

Link: http://mid.gmane.org/cover.1421931555.git.ydrone...@opteya.com
Cc: Sagi Grimberg sa...@mellanox.com
Cc: Shachar Raindel rain...@mellanox.com
Cc: Eli Cohen e...@mellanox.com
Cc: Haggai Eran hagg...@mellanox.com
Signed-off-by: Yann Droneaud ydrone...@opteya.com
---
 drivers/infiniband/core/uverbs_cmd.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/infiniband/core/uverbs_cmd.c 
b/drivers/infiniband/core/uverbs_cmd.c
index 8668b328b7e6..80a1c90f1dbf 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3313,6 +3313,12 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file 
*file,
if (err)
return err;
 
+   if (cmd.comp_mask  (cmd.comp_mask + 1))
+   return -EINVAL;
+
+   if (cmd.comp_mask  ~(__u32)IB_USER_VERBS_EX_QUERY_DEVICE_ODP)
+   return -EINVAL;
+
if (cmd.reserved)
return -EINVAL;
 
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4] IB/uverbs: ex_query_device: no need to clear the whole structure

2015-01-22 Thread Yann Droneaud
As only the requested fields are set and copied to userspace,
there's no need to clear the content of the response structure
beforehand.

Link: http://mid.gmane.org/cover.1421931555.git.ydrone...@opteya.com
Cc: Sagi Grimberg sa...@mellanox.com
Cc: Shachar Raindel rain...@mellanox.com
Cc: Eli Cohen e...@mellanox.com
Cc: Haggai Eran hagg...@mellanox.com
Signed-off-by: Yann Droneaud ydrone...@opteya.com
---
 drivers/infiniband/core/uverbs_cmd.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c 
b/drivers/infiniband/core/uverbs_cmd.c
index 80a1c90f1dbf..bedcd8499c9c 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3331,9 +3331,9 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file,
if (err)
return err;
 
-   memset(resp, 0, sizeof(resp));
copy_query_dev_fields(file, resp.base, attr);
resp.comp_mask = 0;
+   resp.reserved = 0;
 
 #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
if (cmd.comp_mask  IB_USER_VERBS_EX_QUERY_DEVICE_ODP) {
@@ -3349,6 +3349,9 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file,
attr.odp_caps.per_transport_caps.uc_odp_caps;
resp.odp_caps.per_transport_caps.ud_odp_caps =
attr.odp_caps.per_transport_caps.ud_odp_caps;
+
+   resp.odp_caps.reserved = 0;
+
resp.comp_mask |= IB_USER_VERBS_EX_QUERY_DEVICE_ODP;
}
 #endif
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4] IB/uverbs: ex_query_device: fail if output buffer size does not match

2015-01-22 Thread Yann Droneaud
Instead of silently truncating extended QUERY_DEVICE uverb's
response, see commit 5a77abf9a97a (IB/core: Add support for
extended query device caps)), this patch makes function
ib_uverbs_ex_query_device() check the available space in the
response buffer against the requested features set in comp_mask
(currently only IB_USER_VERBS_EX_QUERY_DEVICE_ODP per commit
860f10a799c8 (IB/core: Add flags for on demand paging support).

If the response buffer is not large enough to store the expected
response, -ENOSPC is returned to userspace so that it can adjust
the size of its buffer.

Note: as offsetof() is used to retrieve the size of the lower chunk
of the response, beware that it only works if the upper chunk
is right after, without any implicit padding. And, as the size of
the latter chunk is added to the base size, implicit padding at the
end of the structure is not taken in account. Both point must be
taken in account when extending the uverbs functionalities.

Link: http://mid.gmane.org/cover.1421931555.git.ydrone...@opteya.com
Cc: Sagi Grimberg sa...@mellanox.com
Cc: Shachar Raindel rain...@mellanox.com
Cc: Eli Cohen e...@mellanox.com
Cc: Haggai Eran hagg...@mellanox.com
Signed-off-by: Yann Droneaud ydrone...@opteya.com
---
 drivers/infiniband/core/uverbs_cmd.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c 
b/drivers/infiniband/core/uverbs_cmd.c
index 532d8eba8b02..8668b328b7e6 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3302,6 +3302,7 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file,
struct ib_uverbs_ex_query_device  cmd;
struct ib_device_attr attr;
struct ib_device *device;
+   size_t resp_len;
int err;
 
device = file-device-ib_dev;
@@ -3315,6 +3316,11 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file 
*file,
if (cmd.reserved)
return -EINVAL;
 
+   resp_len = offsetof(typeof(resp), odp_caps);
+
+   if (ucore-outlen  resp_len)
+   return -ENOSPC;
+
err = device-query_device(device, attr);
if (err)
return err;
@@ -3325,6 +3331,11 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file 
*file,
 
 #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
if (cmd.comp_mask  IB_USER_VERBS_EX_QUERY_DEVICE_ODP) {
+   resp_len += sizeof(resp.odp_caps);
+
+   if (ucore-outlen  resp_len)
+   return -ENOSPC;
+
resp.odp_caps.general_caps = attr.odp_caps.general_caps;
resp.odp_caps.per_transport_caps.rc_odp_caps =
attr.odp_caps.per_transport_caps.rc_odp_caps;
@@ -3336,7 +3347,7 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file,
}
 #endif
 
-   err = ib_copy_to_udata(ucore, resp, sizeof(resp));
+   err = ib_copy_to_udata(ucore, resp, resp_len);
if (err)
return err;
 
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/4] IB/core: extended query device caps cleanup for v3.19

2015-01-22 Thread Yann Droneaud
Hi,

Following discussions in thread [PATCH v3 06/17] IB/core: Add support
for extended query device caps [1], I'm proposing the following patchset
to implement a different behavior to handle the request parameters
in ib_uverbs_ex_query_device() in order to restore the current behavior
of ib_copy_to_udata().

The behavior is different from the initial one but I feel more confident
it would allow a better maintainability. Additionally, I believe it's
looking more like the behavior already implemented by other InfiniBand/RDMA
kernel - userspace interfaces.

[1] http://mid.gmane.org/1418733236.2779.26.ca...@opteya.com

Yann Droneaud (4):
  IB/uverbs: ex_query_device: fail if output buffer size does not match
  IB/core: ib_copy_to_udata(): don't silently truncate response
  IB/uverbs: ex_query_device: check request's comp_mask
  IB/uverbs: ex_query_device: no need to clear the whole structure

 drivers/infiniband/core/uverbs_cmd.c | 24 ++--
 include/rdma/ib_verbs.h  |  5 +
 2 files changed, 23 insertions(+), 6 deletions(-)

-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 06/17] IB/core: Add support for extended query device caps

2015-01-21 Thread Yann Droneaud
Hi,

Le mardi 16 décembre 2014 à 21:14 +0100, Yann Droneaud a écrit :
 Le mardi 16 décembre 2014 à 22:07 +0200, Or Gerlitz a écrit :
  On Tue, Dec 16, 2014 at 7:41 PM, Roland Dreier rol...@kernel.org wrote:
   On Tue, Dec 16, 2014 at 4:33 AM, Yann Droneaud ydrone...@opteya.com 
   wrote:
  
   With the suggested change here, buffer overflow won't happen,
   but the error is silently ignored, allowing uverb to return a
   partial result, which is likely not expected by userspace as
   it's a bit difficult to handle it gracefully.
  
   So this has to be removed, and a check on userspace response
   buffer must be added to ib_uverbs_ex_query_device() instead.
  
   I'm not sure of the specifics of the change you're suggesting here.
   Would it be OK to go forward with the patch set we have, and then fix
   this issue before 3.19-rc2?
  
  Roland,
  
  Haggai will address the change in an incremental patch against your
  for-next (3.19-rc1) so the fix will be ready on time for 3.19-rc2
  

AFAICT, no fix has been posted on the list so far.
Do you have a patch to address the issue ready ?

Regards.

-- 
Yann Droneaud
OPTEYA



--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 06/17] IB/core: Add support for extended query device caps

2015-01-21 Thread Yann Droneaud
Hi,

Le mercredi 17 décembre 2014 à 16:12 +0200, Haggai Eran a écrit :
 On 17/12/2014 16:00, Yann Droneaud wrote:
  Le mercredi 17 décembre 2014 à 08:54 +0200, Haggai Eran a écrit :
  On 16/12/2014 14:33, Yann Droneaud wrote:
  Le jeudi 11 décembre 2014 à 17:04 +0200, Haggai Eran a écrit :
   static inline int ib_copy_to_udata(struct ib_udata *udata, void *src, 
  size_t len)
   {
  -return copy_to_user(udata-outbuf, src, len) ? -EFAULT : 0;
  +size_t copy_sz;
  +
  +copy_sz = min_t(size_t, len, udata-outlen);
  +return copy_to_user(udata-outbuf, src, copy_sz) ? -EFAULT : 0;
   }
 
 
  This is not the place to do this: as I'm guessing the purpose of this 
  change from the patch in '[PATCH v3 07/17] IB/core: Add flags for on 
  demand paging support', you're trying to handle uverbs call from 
  a userspace program using a previous, shorter ABI.
 
  Yes, that was my intention.
 
 
  But that's hidding bug where userspace will get it wrong at passing the 
  correct buffer / size for all others uverb calls.
 
  That cannot work that way.
 
  In a previous patchset [1], I've suggested to add a check in 
  ib_copy_{from,to}_udata()[2][3] in order to check the input/output
  buffer size to not read/write past userspace provided buffer
  boundaries: in case of mismatch an error would be returned to
  userspace.
 
  With the suggested change here, buffer overflow won't happen,
  but the error is silently ignored, allowing uverb to return a
  partial result, which is likely not expected by userspace as
  it's a bit difficult to handle it gracefully.
 
  So this has to be removed, and a check on userspace response
  buffer must be added to ib_uverbs_ex_query_device() instead.
 
  I agree that we shouldn't silently ignore bugs in userspace, but I'm not
  sure the alternative is maintainable. If we have in the future N new
  extensions to this verb, will we need to validate the user space given
  output buffer is one of the N possible sizes?
 
  
  Yes.
 
 It would very easy for someone to forget one of the possible sizes, and
 thus blocking support for an older version of libibverbs. Such a bug
 would be hard to detect because it requires testing all previous
 versions of libibverbs. I think the problem you are trying to solve -
 userspace accidentally setting a smaller response size then required -
 will be detected immediately when one attempts to use that code.
 
  
  Additionnaly the size should be checked related to the flags set in the
  comp_mask: eg. requiring IB_USER_VERBS_EX_QUERY_DEVICE_ODP but not
  providing the expected response buffer should be an error.
 
 In a query verb like EX_QUERY_DEVICE, I would expect the user-space code
 to request all bits in the comp_mask, since there's very little benefit
 from requesting a specific set (only a slightly shorter response for the
 system call). The kernel would ignore bits it doesn't know, and the
 user-space would ignore bits it doesn't know in the response.
 

Regarding comp_mask (not for this particular verb):

It's not clear whether request's comp_mask describe the request or the
response, as such I'm puzzled.

How would the kernel and the userspace be able to parse the request and
the response if they ignore unknown bits ?

How would they be able to skip the unrecognized chunk of the request and
response buffer ?

How one bit in a comp_mask is related to a chunk in the request or
response ?

It's likely the kernel or userspace would have to skip the remaining
comp_mask's bits after encountering an unknown bit as the size of the
corresponding chunk in request or response would be unknown, making
impossible to locate the corresponding chunk for the next bit set in
comp_mask. Having said that, comp_mask is just a complicated way of
expressing a version, which is turn describe a size (ever growing).

Anyway, I wanted to point to the perf subsystem and tool:

perf subsystem use a version in it's request (ok, it's a size):

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/perf_event.h?id=v3.19-rc5#n255

It's checked in perf_copy_attr():

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/kernel/events/core.c?id=v3.19-rc5#n7072

As you can see, any unknown value passed to the kernel is rejected with
-EINVAL. Which is the way to follow, with respect to
http://blog.ffwll.ch/2013/11/botching-up-ioctls.html rules.

Then perf userspace tool is able to probe the maximum supported
features:

See __perf_evsel__open()
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/evsel.c?id=v3.19-rc5#n1038

Such complicated construct could be used in userspace to check what's
the highest supported comp_mask is.

I don't think it would solve our issue here, but this is a piece to be
reviewed.

By the way, we have to find how to handle the issue as soon as possible,
because we can't let this thing working this way for v3.19.

Regards.

-- 
Yann Droneaud
OPTEYA

Re: [PATCH v3 07/17] IB/core: Add flags for on demand paging support

2014-12-16 Thread Yann Droneaud
;
 + resp.comp_mask |= IB_USER_VERBS_EX_QUERY_DEVICE_ODP;
 + }

You need to clear the tail of the response otherwise, kernel will leak
stack content to userspace:

+ #else /* !CONFIG_INFINIBAND_ON_DEMAND_PAGING */
+   resp.odp_caps.general_caps = 0;
+   resp.odp_caps.per_transport_caps.rc_odp_caps = 0;
+   resp.odp_caps.per_transport_caps.uc_odp_caps = 0;
+   resp.odp_caps.per_transport_caps.ud_odp_caps = 0;

 +#endif
 +

+   resp.odp_caps.reserved = 0

   err = ib_copy_to_udata(ucore, resp, sizeof(resp));
   if (err)
   return err;
 diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
 index 97a999f9e4d8..a41bc5a39ebf 100644
 --- a/include/rdma/ib_verbs.h
 +++ b/include/rdma/ib_verbs.h
 @@ -123,7 +123,8 @@ enum ib_device_cap_flags {
   IB_DEVICE_MEM_WINDOW_TYPE_2A= (123),
   IB_DEVICE_MEM_WINDOW_TYPE_2B= (124),
   IB_DEVICE_MANAGED_FLOW_STEERING = (129),
 - IB_DEVICE_SIGNATURE_HANDOVER= (130)
 + IB_DEVICE_SIGNATURE_HANDOVER= (130),
 + IB_DEVICE_ON_DEMAND_PAGING  = (131),
  };
  
  enum ib_signature_prot_cap {
 @@ -143,6 +144,27 @@ enum ib_atomic_cap {
   IB_ATOMIC_GLOB
  };
  
 +enum ib_odp_general_cap_bits {
 + IB_ODP_SUPPORT = 1  0,
 +};
 +
 +enum ib_odp_transport_cap_bits {
 + IB_ODP_SUPPORT_SEND = 1  0,
 + IB_ODP_SUPPORT_RECV = 1  1,
 + IB_ODP_SUPPORT_WRITE= 1  2,
 + IB_ODP_SUPPORT_READ = 1  3,
 + IB_ODP_SUPPORT_ATOMIC   = 1  4,
 +};
 +
 +struct ib_odp_caps {
 + uint64_t general_caps;
 + struct {
 + uint32_t  rc_odp_caps;
 + uint32_t  uc_odp_caps;
 + uint32_t  ud_odp_caps;
 + } per_transport_caps;
 +};
 +
  struct ib_device_attr {
   u64 fw_ver;
   __be64  sys_image_guid;
 @@ -186,6 +208,7 @@ struct ib_device_attr {
   u8  local_ca_ack_delay;
   int sig_prot_cap;
   int sig_guard_cap;
 + struct ib_odp_caps  odp_caps;
  };
  
  enum ib_mtu {
 @@ -1073,7 +1096,8 @@ enum ib_access_flags {
   IB_ACCESS_REMOTE_READ   = (12),
   IB_ACCESS_REMOTE_ATOMIC = (13),
   IB_ACCESS_MW_BIND   = (14),
 - IB_ZERO_BASED   = (15)
 + IB_ZERO_BASED   = (15),
 + IB_ACCESS_ON_DEMAND = (16),
  };
  
  struct ib_phys_buf {
 diff --git a/include/uapi/rdma/ib_user_verbs.h 
 b/include/uapi/rdma/ib_user_verbs.h
 index e8a96071e352..4275b961bf60 100644
 --- a/include/uapi/rdma/ib_user_verbs.h
 +++ b/include/uapi/rdma/ib_user_verbs.h
 @@ -202,15 +202,30 @@ struct ib_uverbs_query_device_resp {
   __u8  reserved[4];
  };
  
 +enum {
 + IB_USER_VERBS_EX_QUERY_DEVICE_ODP = 1ULL  0,
 +};
 +
  struct ib_uverbs_ex_query_device {
   __u32 comp_mask;
   __u32 reserved;
  };
  
 +struct ib_uverbs_odp_caps {
 + __u64 general_caps;
 + struct {
 + __u32 rc_odp_caps;
 + __u32 uc_odp_caps;
 + __u32 ud_odp_caps;
 + } per_transport_caps;
 + __u32 reserved;
 +};
 +
  struct ib_uverbs_ex_query_device_resp {
   struct ib_uverbs_query_device_resp base;
   __u32 comp_mask;
   __u32 reserved;
 + struct ib_uverbs_odp_caps odp_caps;
  };

Hopefully, no kernel was released with ib_uverbs_ex_query_device_resp
without odp_caps (eg. in between '[PATCH v3 06/17] IB/core: Add support
for extended query device caps' and this one, or 
ib_uverbs_ex_query_device() should have been modified
to handle shorter ib_uverbs_ex_query_device_resp to accomodate the ABI
variations.

  
  struct ib_uverbs_query_port {

Regards.

-- 
Yann Droneaud


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 07/17] IB/core: Add flags for on demand paging support

2014-12-16 Thread Yann Droneaud
Le mardi 16 décembre 2014 à 13:02 +0100, Yann Droneaud a écrit :
 Le jeudi 11 décembre 2014 à 17:04 +0200, Haggai Eran a écrit :
  From: Sagi Grimberg sa...@mellanox.com
  
  * Add a configuration option for enable on-demand paging support in the
infiniband subsystem (CONFIG_INFINIBAND_ON_DEMAND_PAGING). In a later 
  patch,
this configuration option will select the MMU_NOTIFIER configuration 
  option
to enable mmu notifiers.
  * Add a flag for on demand paging (ODP) support in the IB device 
  capabilities.
  * Add a flag to request ODP MR in the access flags to reg_mr.
  * Fail registrations done with the ODP flag when the low-level driver 
  doesn't
support this.
  * Change the conditions in which an MR will be writable to explicitly
specify the access flags. This is to avoid making an MR writable just
because it is an ODP MR.
  * Add a ODP capabilities to the extended query device verb.
  
  Signed-off-by: Sagi Grimberg sa...@mellanox.com
  Signed-off-by: Shachar Raindel rain...@mellanox.com
  Signed-off-by: Haggai Eran hagg...@mellanox.com
  ---
   drivers/infiniband/Kconfig   | 10 ++
   drivers/infiniband/core/umem.c   |  8 +---
   drivers/infiniband/core/uverbs_cmd.c | 25 +
   include/rdma/ib_verbs.h  | 28 ++--
   include/uapi/rdma/ib_user_verbs.h| 15 +++
   5 files changed, 81 insertions(+), 5 deletions(-)
  
  diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
  index 77089399359b..089a2c2af329 100644
  --- a/drivers/infiniband/Kconfig
  +++ b/drivers/infiniband/Kconfig
  @@ -38,6 +38,16 @@ config INFINIBAND_USER_MEM
  depends on INFINIBAND_USER_ACCESS != n
  default y
   
  +config INFINIBAND_ON_DEMAND_PAGING
  +   bool InfiniBand on-demand paging support
  +   depends on INFINIBAND_USER_MEM
  +   default y
  +   ---help---
  + On demand paging support for the InfiniBand subsystem.
  + Together with driver support this allows registration of
  + memory regions without pinning their pages, fetching the
  + pages on demand instead.
  +
   config INFINIBAND_ADDR_TRANS
  bool
  depends on INFINIBAND
  diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
  index 6f152628e0d2..c328e4693d14 100644
  --- a/drivers/infiniband/core/umem.c
  +++ b/drivers/infiniband/core/umem.c
  @@ -107,13 +107,15 @@ struct ib_umem *ib_umem_get(struct ib_ucontext 
  *context, unsigned long addr,
  umem-page_size = PAGE_SIZE;
  umem-pid   = get_task_pid(current, PIDTYPE_PID);
  /*
  -* We ask for writable memory if any access flags other than
  -* remote read are set.  Local write and remote write
  +* We ask for writable memory if any of the following
  +* access flags are set.  Local write and remote write
   * obviously require write access.  Remote atomic can do
   * things like fetch and add, which will modify memory, and
   * MW bind can change permissions by binding a window.
   */
  -   umem-writable  = !!(access  ~IB_ACCESS_REMOTE_READ);
  +   umem-writable  = !!(access 
  +   (IB_ACCESS_LOCAL_WRITE   | IB_ACCESS_REMOTE_WRITE |
  +IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND));
   
  /* We assume the memory is from hugetlb until proved otherwise */
  umem-hugetlb   = 1;
  diff --git a/drivers/infiniband/core/uverbs_cmd.c 
  b/drivers/infiniband/core/uverbs_cmd.c
  index c7a43624c96b..f9326ccda4b5 100644
  --- a/drivers/infiniband/core/uverbs_cmd.c
  +++ b/drivers/infiniband/core/uverbs_cmd.c
  @@ -953,6 +953,18 @@ ssize_t ib_uverbs_reg_mr(struct ib_uverbs_file *file,
  goto err_free;
  }
   
  +   if (cmd.access_flags  IB_ACCESS_ON_DEMAND) {
  +   struct ib_device_attr attr;
  +
  +   ret = ib_query_device(pd-device, attr);
  +   if (ret || !(attr.device_cap_flags 
  +   IB_DEVICE_ON_DEMAND_PAGING)) {
  +   pr_debug(ODP support not available\n);
  +   ret = -EINVAL;
  +   goto err_put;
  +   }
  +   }
  +
  mr = pd-device-reg_user_mr(pd, cmd.start, cmd.length, cmd.hca_va,
   cmd.access_flags, udata);
  if (IS_ERR(mr)) {
  @@ -3289,6 +3301,19 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file 
  *file,
  copy_query_dev_fields(file, resp.base, attr);
  resp.comp_mask = 0;
   
  +#ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
  +   if (cmd.comp_mask  IB_USER_VERBS_EX_QUERY_DEVICE_ODP) {
  +   resp.odp_caps.general_caps = attr.odp_caps.general_caps;
  +   resp.odp_caps.per_transport_caps.rc_odp_caps =
  +   attr.odp_caps.per_transport_caps.rc_odp_caps;
  +   resp.odp_caps.per_transport_caps.uc_odp_caps =
  +   attr.odp_caps.per_transport_caps.uc_odp_caps;
  +   resp.odp_caps.per_transport_caps.ud_odp_caps

Re: [PATCH v3 06/17] IB/core: Add support for extended query device caps

2014-12-16 Thread Yann Droneaud
 uverb to return a
partial result, which is likely not expected by userspace as
it's a bit difficult to handle it gracefully.

So this has to be removed, and a check on userspace response
buffer must be added to ib_uverbs_ex_query_device() instead.

[1] [PATCH 00/22] infiniband: improve userspace input check

http://marc.info/?i=cover.1376847403.git.ydrone...@opteya.com
http://mid.gmane.org/cover.1376847403.git.ydrone...@opteya.com

[2] [PATCH 03/22] infiniband: ib_copy_from_udata(): check input length

http://mid.gmane.org/2bf102a41c51f61965ee09df827abe8fefb523a9.1376847403.git.ydrone...@opteya.com

[3] [PATCH 04/22] infiniband: ib_copy_to_udata(): check output length

http://mid.gmane.org/d27716a3a1c180f832d153a7402f65ea8a75b734.1376847403.git.ydrone...@opteya.com

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 06/17] IB/core: Add support for extended query device caps

2014-12-16 Thread Yann Droneaud
Hi,

Le mardi 16 décembre 2014 à 22:07 +0200, Or Gerlitz a écrit :
 On Tue, Dec 16, 2014 at 7:41 PM, Roland Dreier rol...@kernel.org wrote:
  On Tue, Dec 16, 2014 at 4:33 AM, Yann Droneaud ydrone...@opteya.com wrote:
 
  With the suggested change here, buffer overflow won't happen,
  but the error is silently ignored, allowing uverb to return a
  partial result, which is likely not expected by userspace as
  it's a bit difficult to handle it gracefully.
 
  So this has to be removed, and a check on userspace response
  buffer must be added to ib_uverbs_ex_query_device() instead.
 
  I'm not sure of the specifics of the change you're suggesting here.
  Would it be OK to go forward with the patch set we have, and then fix
  this issue before 3.19-rc2?
 
 Roland,
 
 Haggai will address the change in an incremental patch against your
 for-next (3.19-rc1) so the fix will be ready on time for 3.19-rc2
 

That's would be great.

Thanks

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 04/17] IB/core: Add umem function to read data from user-space

2014-12-11 Thread Yann Droneaud
Le jeudi 11 décembre 2014 à 13:09 +0200, Haggai Eran a écrit :
 On 10/12/2014 18:22, Yann Droneaud wrote:
  Hi,
  
  Le mardi 11 novembre 2014 à 18:36 +0200, Haggai Eran a écrit :
  In some drivers there's a need to read data from a user space area that
  was pinned using ib_umem, when running from a different process context.
 
  The ib_umem_copy_from function allows reading data from the physical pages
  pinned in the ib_umem struct.
 
  Signed-off-by: Haggai Eran hagg...@mellanox.com
  ---
   drivers/infiniband/core/umem.c | 26 ++
   include/rdma/ib_umem.h |  2 ++
   2 files changed, 28 insertions(+)
 
  diff --git a/drivers/infiniband/core/umem.c 
  b/drivers/infiniband/core/umem.c
  index e0f883292374..77bec75963e7 100644
  --- a/drivers/infiniband/core/umem.c
  +++ b/drivers/infiniband/core/umem.c
  @@ -292,3 +292,29 @@ int ib_umem_page_count(struct ib_umem *umem)
 return n;
   }
   EXPORT_SYMBOL(ib_umem_page_count);
  +
  +/*
  + * Copy from the given ib_umem's pages to the given buffer.
  + *
  + * umem - the umem to copy from
  + * offset - offset to start copying from
  + * dst - destination buffer
  + * length - buffer length
  + *
  + * Returns the number of copied bytes, or an error code.
  + */
  +int ib_umem_copy_from(struct ib_umem *umem, size_t offset, void *dst,
  +size_t length)
  
  I would prefer the arguments in the same order as ib_copy_from_udata()
  
  int ib_umem_copy_from(void *dst,
struct ib_umem *umem, size_t umem_offset,
size_t length);
 
 No problem.
 
  +{
  +  size_t end = offset + length;
  +
  +  if (offset  umem-length || end  umem-length || end  offset) {
  +  pr_err(ib_umem_copy_from not in range. offset: %zd umem 
  length: %zd end: %zd\n,
  + offset, umem-length, end);
  +  return -EINVAL;
  +  }
  +

I think the test could be rewritten as:

if (offset  umem-length || length  umem-length - offset)

That's one operation less.


  +  return sg_pcopy_to_buffer(umem-sg_head.sgl, umem-nmap, dst, length,
  +  offset + ib_umem_offset(umem));
  +}
  +EXPORT_SYMBOL(ib_umem_copy_from);
  
  As the function return a int, no more than INT_MAX bytes (likely 2^31
  - 1) can be copied. Perhaps changing the return type to to ssize_t would
  be better (and a check to enfore ssize_t maximum value). Or change the
  function could return 0 in case of success or a error code, just like
  ib_copy_from_udata().
  
 
 Okay. I'll change it to match ib_copy_from_udata. We're checking the
 umem size in the call site of this function anyway, and the only reason
 I see sg_pcopy_to_buffer would return less than *length* bytes is when
 reaching the end of the scatterlist.
 

As the length is compared against umem-length (+ offset), this would 
mean umem-length is not synchronized with the length of the data 
described by the scatter/gather list ?


BTW, ib_copy_from_udata() is defined as an inline function. Would it be
better to have ib_umem_copy_from() being an inline function too ?
(In such case, I would remove the error message to not duplicate it
 across all modules using the function)

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 04/17] IB/core: Add umem function to read data from user-space

2014-12-11 Thread Yann Droneaud
Hi Or,

Le jeudi 11 décembre 2014 à 15:09 +0200, Or Gerlitz a écrit :
 On Thu, Dec 11, 2014 at 2:39 PM, Yann Droneaud ydrone...@opteya.com wrote:

  BTW, ib_copy_from_udata() is defined as an inline function. Would it be
  better to have ib_umem_copy_from() being an inline function too ?
  (In such case, I would remove the error message to not duplicate it
   across all modules using the function)
 
 Yann, Lets leave your 2nd comment to be addressed as future
 cleanup/improvement. Really, these patches
 are on the list for 7-8 moths and we were asking for feedback over all
 this time, jumping now with inlining comments and such when the merge
 window is just across the door isn't appropriate.

Sure, I've no right and no power to delay or NACK anything: I'm just 
lazily following the list, and my involvement in InfiniBand/RDMA
community is slowly dropping to 0, being involved in other (paid) 
projects, with some spike from time to time, waiting to find time to 
drain my patch queue. Unfortunately I'm not on the same schedule.

Anyway, everyone is free to provide comments at anytime.
And you're free to take them in account.

It's a fact of kernel development: sometime a patchset take many months
and multiple revision before being applied.

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 06/17] IB/core: Add support for extended query device caps

2014-12-10 Thread Yann Droneaud
Hi Roland,

Le mardi 09 décembre 2014 à 13:43 -0800, Roland Dreier a écrit :
 I was getting ready to apply the ODP series, but then I noticed:
 
 On Tue, Nov 11, 2014 at 8:36 AM, Haggai Eran hagg...@mellanox.com wrote:
  diff --git a/drivers/infiniband/core/uverbs_main.c 
  b/drivers/infiniband/core/uverbs_main.c
  index 71ab83fde472..974025028790 100644
  --- a/drivers/infiniband/core/uverbs_main.c
  +++ b/drivers/infiniband/core/uverbs_main.c
  @@ -122,7 +122,8 @@ static int (*uverbs_ex_cmd_table[])(struct 
  ib_uverbs_file *file,
  struct ib_udata *ucore,
  struct ib_udata *uhw) = {
  [IB_USER_VERBS_EX_CMD_CREATE_FLOW]  = ib_uverbs_ex_create_flow,
  -   [IB_USER_VERBS_EX_CMD_DESTROY_FLOW] = ib_uverbs_ex_destroy_flow
  +   [IB_USER_VERBS_EX_CMD_DESTROY_FLOW] = ib_uverbs_ex_destroy_flow,
  +   [IB_USER_VERBS_EX_CMD_QUERY_DEVICE] = ib_uverbs_ex_query_device
   };
 
   static void ib_uverbs_add_one(struct ib_device *device);
 
  diff --git a/include/uapi/rdma/ib_user_verbs.h 
  b/include/uapi/rdma/ib_user_verbs.h
  index 26daf55ff76e..ed8c3d9da42c 100644
  --- a/include/uapi/rdma/ib_user_verbs.h
  +++ b/include/uapi/rdma/ib_user_verbs.h
  @@ -90,8 +90,9 @@ enum {
   };
 
   enum {
  +   IB_USER_VERBS_EX_CMD_QUERY_DEVICE = IB_USER_VERBS_CMD_QUERY_DEVICE,
  IB_USER_VERBS_EX_CMD_CREATE_FLOW = IB_USER_VERBS_CMD_THRESHOLD,
  -   IB_USER_VERBS_EX_CMD_DESTROY_FLOW
  +   IB_USER_VERBS_EX_CMD_DESTROY_FLOW,
   };
 
 And this makes no sense to me.  I thought the whole point of EX
 commands was to add then after IB_USER_VERBS_CMD_THRESHOLD.
 
 In this case, if you make IB_USER_VERBS_EX_CMD_QUERY_DEVICE =
 IB_USER_VERBS_CMD_QUERY_DEVICE then doesn't the entry in
 uverbs_cmd_table[] for normal query device get overwritten with
 ib_uverbs_ex_query_device()??
 

IB_USER_VERBS_CMD_THRESHOLD was introduced as part of commit
400dbc96583f ('IB/core: Infrastructure for extensible uverbs commands').

When I've 'upgraded' it to more extensible scheme, in commit
f21519b23c1b ('IB/core: extended command: an improved infrastructure 
for uverbs commands'), I've put the extended commands in a different 
namespace, with the idea we could later convert non-extended commands 
to extended ones and phase out the older ABI in the future.

I seems that what I've envisioned is starting to happen.

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 06/17] IB/core: Add support for extended query device caps

2014-12-10 Thread Yann Droneaud
 a check
for it being 0.

 +};
 +
 +struct ib_uverbs_ex_query_device_resp {
 + struct ib_uverbs_query_device_resp base;
 + __u32 comp_mask;
 +};
 +

_ex response buffer are supposed to be aligned on 64bit boundary:
you should probably add padding at the end of the structure and ensure
it's cleared before send it to userspace.

See commit f21519b23c1b ('IB/core: extended command: an improved
infrastructure for uverbs commands').

Regards.

-- 
Yann Droneaud


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 04/17] IB/core: Add umem function to read data from user-space

2014-12-10 Thread Yann Droneaud
Hi,

Le mardi 11 novembre 2014 à 18:36 +0200, Haggai Eran a écrit :
 In some drivers there's a need to read data from a user space area that
 was pinned using ib_umem, when running from a different process context.
 
 The ib_umem_copy_from function allows reading data from the physical pages
 pinned in the ib_umem struct.
 
 Signed-off-by: Haggai Eran hagg...@mellanox.com
 ---
  drivers/infiniband/core/umem.c | 26 ++
  include/rdma/ib_umem.h |  2 ++
  2 files changed, 28 insertions(+)
 
 diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
 index e0f883292374..77bec75963e7 100644
 --- a/drivers/infiniband/core/umem.c
 +++ b/drivers/infiniband/core/umem.c
 @@ -292,3 +292,29 @@ int ib_umem_page_count(struct ib_umem *umem)
   return n;
  }
  EXPORT_SYMBOL(ib_umem_page_count);
 +
 +/*
 + * Copy from the given ib_umem's pages to the given buffer.
 + *
 + * umem - the umem to copy from
 + * offset - offset to start copying from
 + * dst - destination buffer
 + * length - buffer length
 + *
 + * Returns the number of copied bytes, or an error code.
 + */
 +int ib_umem_copy_from(struct ib_umem *umem, size_t offset, void *dst,
 +   size_t length)

I would prefer the arguments in the same order as ib_copy_from_udata()

int ib_umem_copy_from(void *dst,
  struct ib_umem *umem, size_t umem_offset,
  size_t length);
 

 +{
 + size_t end = offset + length;
 +
 + if (offset  umem-length || end  umem-length || end  offset) {
 + pr_err(ib_umem_copy_from not in range. offset: %zd umem 
 length: %zd end: %zd\n,
 +offset, umem-length, end);
 + return -EINVAL;
 + }
 +
 + return sg_pcopy_to_buffer(umem-sg_head.sgl, umem-nmap, dst, length,
 + offset + ib_umem_offset(umem));
 +}
 +EXPORT_SYMBOL(ib_umem_copy_from);

As the function return a int, no more than INT_MAX bytes (likely 2^31
- 1) can be copied. Perhaps changing the return type to to ssize_t would
be better (and a check to enfore ssize_t maximum value). Or change the
function could return 0 in case of success or a error code, just like
ib_copy_from_udata().

 diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
 index 7ed6d4ff58dc..ee897724cbf8 100644
 --- a/include/rdma/ib_umem.h
 +++ b/include/rdma/ib_umem.h
 @@ -84,6 +84,8 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, 
 unsigned long addr,
   size_t size, int access, int dmasync);
  void ib_umem_release(struct ib_umem *umem);
  int ib_umem_page_count(struct ib_umem *umem);
 +int ib_umem_copy_from(struct ib_umem *umem, size_t start, void *dst,
 +   size_t length);
  
  #else /* CONFIG_INFINIBAND_USER_MEM */
  


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH libibverbs 6/9] ibv_cmd_open_qp(): initialize reserved field in struct ibv_open_qp

2014-10-01 Thread Yann Droneaud
Properly initialize data structure exchanged with kernel uverbs
layer:
- may remove valgrind spurious warnings;
- may allow to use the field later for future expansion.

Link: http://marc.info/?i=cover.1412163687.git.ydrone...@opteya.com
Fixes: 5f5292bbc51fb ('Add ibv_open_qp() for XRC receive QPs')
Cc: Sean Hefty sean.he...@intel.com
Cc: Yishai Hadas yish...@mellanox.com
Signed-off-by: Yann Droneaud ydrone...@opteya.com
---
 src/cmd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/cmd.c b/src/cmd.c
index f80ce4d30832..f9e4a03fd84d 100644
--- a/src/cmd.c
+++ b/src/cmd.c
@@ -785,6 +785,7 @@ int ibv_cmd_open_qp(struct ibv_context *context, struct 
verbs_qp *qp,
cmd-pd_handle   = xrcd-handle;
cmd-qpn = attr-qp_num;
cmd-qp_type = attr-qp_type;
+   memset(cmd-reserved, 0, sizeof(cmd-reserved));
 
if (write(context-cmd_fd, cmd, cmd_size) != cmd_size)
return errno;
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH libibverbs 3/9] ibv_cmd_post_send(): initialize reserved field in struct ibv_kern_send_wr

2014-10-01 Thread Yann Droneaud
Properly initialize data structure exchanged with kernel uverbs
layer:
- may remove valgrind spurious warnings;
- may allow to use the field later for future expansion.

Link: http://marc.info/?i=cover.1412163687.git.ydrone...@opteya.com
Fixes: eb0663777c24 ('Add support for new datapath kernel commands')
Signed-off-by: Yann Droneaud ydrone...@opteya.com
---
 src/cmd.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/src/cmd.c b/src/cmd.c
index d057494260ee..8d8f7e8294e5 100644
--- a/src/cmd.c
+++ b/src/cmd.c
@@ -999,6 +999,9 @@ int ibv_cmd_post_send(struct ibv_qp *ibqp, struct 
ibv_send_wr *wr,
tmp-wr.ud.ah  = i-wr.ud.ah-handle;
tmp-wr.ud.remote_qpn  = i-wr.ud.remote_qpn;
tmp-wr.ud.remote_qkey = i-wr.ud.remote_qkey;
+   memset(tmp-wr.ud.reserved, 0,
+  sizeof(tmp-wr.ud.reserved)
+  + (sizeof(tmp-wr) - sizeof(tmp-wr.ud)));
} else {
switch (i-opcode) {
case IBV_WR_RDMA_WRITE:
@@ -1007,6 +1010,9 @@ int ibv_cmd_post_send(struct ibv_qp *ibqp, struct 
ibv_send_wr *wr,
tmp-wr.rdma.remote_addr =
i-wr.rdma.remote_addr;
tmp-wr.rdma.rkey = i-wr.rdma.rkey;
+   memset(tmp-wr.rdma.reserved, 0,
+  sizeof(tmp-wr.rdma.reserved)
+  + (sizeof(tmp-wr) - 
sizeof(tmp-wr.rdma)));
break;
case IBV_WR_ATOMIC_CMP_AND_SWP:
case IBV_WR_ATOMIC_FETCH_AND_ADD:
@@ -1016,8 +1022,12 @@ int ibv_cmd_post_send(struct ibv_qp *ibqp, struct 
ibv_send_wr *wr,
i-wr.atomic.compare_add;
tmp-wr.atomic.swap = i-wr.atomic.swap;
tmp-wr.atomic.rkey = i-wr.atomic.rkey;
+   memset(tmp-wr.atomic.reserved, 0,
+  sizeof(tmp-wr.atomic.reserved)
+  + (sizeof(tmp-wr) - 
sizeof(tmp-wr.atomic)));
break;
default:
+   memset(tmp-wr, 0, sizeof(tmp-wr));
break;
}
}
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH libibverbs 1/9] ibv_cmd_create_ah(): initialize reserved fields in various structures

2014-10-01 Thread Yann Droneaud
Reserved fields in struct ibv_kern_global_route,
struct ibv_kern_ah_attr and struct ibv_create_ah must be
properly initialized when exchanged with kernel uverbs layer,
as:
- it may remove valgrind spurious warnings;
- it may allow to use the field later for future expansion.

Link: http://marc.info/?i=cover.1412163687.git.ydrone...@opteya.com
Fixes: eb0663777c24 ('Add support for new datapath kernel commands')
Signed-off-by: Yann Droneaud ydrone...@opteya.com
---
 src/cmd.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/cmd.c b/src/cmd.c
index 45ea06ff4705..b281082f2f4d 100644
--- a/src/cmd.c
+++ b/src/cmd.c
@@ -1179,16 +1179,19 @@ int ibv_cmd_create_ah(struct ibv_pd *pd, struct ibv_ah 
*ah,
IBV_INIT_CMD_RESP(cmd, sizeof cmd, CREATE_AH, resp, sizeof resp);
cmd.user_handle= (uintptr_t) ah;
cmd.pd_handle  = pd-handle;
+   cmd.reserved   = 0;
cmd.attr.dlid  = attr-dlid;
cmd.attr.sl= attr-sl;
cmd.attr.src_path_bits = attr-src_path_bits;
cmd.attr.static_rate   = attr-static_rate;
cmd.attr.is_global = attr-is_global;
cmd.attr.port_num  = attr-port_num;
+   cmd.attr.reserved  = 0;
cmd.attr.grh.flow_label= attr-grh.flow_label;
cmd.attr.grh.sgid_index= attr-grh.sgid_index;
cmd.attr.grh.hop_limit = attr-grh.hop_limit;
cmd.attr.grh.traffic_class = attr-grh.traffic_class;
+   cmd.attr.grh.reserved  = 0;
memcpy(cmd.attr.grh.dgid, attr-grh.dgid.raw, 16);
 
if (write(pd-context-cmd_fd, cmd, sizeof cmd) != sizeof cmd)
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH libibverbs 4/9] ibv_cmd_create_srq_ex(): set reserved field in struct ibv_create_xsrq

2014-10-01 Thread Yann Droneaud
Properly initialize data structure exchanged with kernel uverbs
layer:
- may remove valgrind spurious warnings;
- may allow to use the field later for future expansion.

Link: http://marc.info/?i=cover.1412163687.git.ydrone...@opteya.com
Fixes: 40c1365b2198 ('Add support for XRC SRQs')
Cc: Sean Hefty sean.he...@intel.com
Cc: Yishai Hadas yish...@mellanox.com
Signed-off-by: Yann Droneaud ydrone...@opteya.com
---
 src/cmd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/cmd.c b/src/cmd.c
index 8d8f7e8294e5..f80ce4d30832 100644
--- a/src/cmd.c
+++ b/src/cmd.c
@@ -463,6 +463,7 @@ int ibv_cmd_create_srq_ex(struct ibv_context *context,
cmd-max_wr  = attr_ex-attr.max_wr;
cmd-max_sge = attr_ex-attr.max_sge;
cmd-srq_limit   = attr_ex-attr.srq_limit;
+   cmd-reserved= 0;
 
cmd-srq_type = (attr_ex-comp_mask  IBV_SRQ_INIT_ATTR_TYPE) ?
attr_ex-srq_type : IBV_SRQT_BASIC;
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH libibverbs 9/9] ibv_cmd_destroy_flow(): initialize comp_mask explicitly

2014-10-01 Thread Yann Droneaud
There's no need to clear the whole struct ibv_destroy_flow
with memset(): only comp_mask field is not explicitly
initialized in the function.

This patch removes call to memset() and initialize
comp_mask to 0.

Link: http://marc.info/?i=cover.1412163687.git.ydrone...@opteya.com
Fixes: 389de6a6ef4e ('Add receive flow steering support')
Cc: Hadar Hen Zion had...@mellanox.com
Cc: Or Gerlitz ogerl...@mellanox.com
Cc: Matan Barak mat...@mellanox.com
Signed-off-by: Yann Droneaud ydrone...@opteya.com
---
 src/cmd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/cmd.c b/src/cmd.c
index 2f039f6852bb..d849e6dac214 100644
--- a/src/cmd.c
+++ b/src/cmd.c
@@ -1384,8 +1384,8 @@ int ibv_cmd_destroy_flow(struct ibv_flow *flow_id)
struct ibv_destroy_flow cmd;
int ret = 0;
 
-   memset(cmd, 0, sizeof(cmd));
IBV_INIT_CMD_EX(cmd, sizeof(cmd), DESTROY_FLOW);
+   cmd.comp_mask = 0;
cmd.flow_handle = flow_id-handle;
 
if (write(flow_id-context-cmd_fd, cmd, sizeof(cmd)) != sizeof(cmd))
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH libibverbs 8/9] ibv_cmd_create_flow(): initialize flow_id-comp_mask

2014-10-01 Thread Yann Droneaud
ibv_cmd_create_flow() allocates and returns a struct ibv_flow,
but doesn't initialize its comp_mask field.

Link: http://marc.info/?i=cover.1412163687.git.ydrone...@opteya.com
Fixes: 389de6a6ef4 ('Add receive flow steering support')
Cc: Hadar Hen Zion had...@mellanox.com
Cc: Or Gerlitz ogerl...@mellanox.com
Cc: Matan Barak mat...@mellanox.com
Signed-off-by: Yann Droneaud ydrone...@opteya.com
---
 src/cmd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/cmd.c b/src/cmd.c
index dc155d6bf4c7..2f039f6852bb 100644
--- a/src/cmd.c
+++ b/src/cmd.c
@@ -1372,6 +1372,7 @@ struct ibv_flow *ibv_cmd_create_flow(struct ibv_qp *qp,
 
flow_id-context = qp-context;
flow_id-handle = resp.flow_handle;
+   flow_id-comp_mask = 0;
return flow_id;
 err:
free(flow_id);
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH libibverbs 5/9] kern-abi: remove unused qp_type union in struct ibv_kern_send_wr

2014-10-01 Thread Yann Droneaud
Added by commit c7e3e61052dd ('Add support for XRC QPs') in
struct ibv_kern_send_wr, qp_type union is not initialized
in ibv_cmd_post_send().

Additionnaly, as reported by pahole, this field introduces
padding in the end of struct ibv_kern_send_wr:

struct ibv_kern_send_wr {

/* ... */

union {
struct {
__u32  remote_srqn;  /*56 4 */
} xrc;   /*   4 */
} qp_type;   /*56 4 */

/* size: 64, cachelines: 1, members: 7 */
/* padding: 4 */
};

This padding is not initialized too.

Since this field is not used at all kernel side and not exposed
to drivers, qp_type union in struct ibv_kern_send_wr could be
removed from the structure.

Link: http://marc.info/?i=cover.1412163687.git.ydrone...@opteya.com
Fixes: c7e3e61052dd ('Add support for XRC QPs')
Cc: Sean Hefty sean.he...@intel.com
Cc: Yishai Hadas yish...@mellanox.com
Signed-off-by: Yann Droneaud ydrone...@opteya.com
---
 include/infiniband/kern-abi.h | 5 -
 1 file changed, 5 deletions(-)

diff --git a/include/infiniband/kern-abi.h b/include/infiniband/kern-abi.h
index 91b45d837239..7023a0294df2 100644
--- a/include/infiniband/kern-abi.h
+++ b/include/infiniband/kern-abi.h
@@ -676,11 +676,6 @@ struct ibv_kern_send_wr {
__u32 reserved;
} ud;
} wr;
-   union {
-   struct {
-   __u32 remote_srqn;
-   } xrc;
-   } qp_type;
 };
 
 struct ibv_kern_eth_filter {
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH libibverbs 7/9] ibv_cmd_create_flow(): explicitly clear fields

2014-10-01 Thread Yann Droneaud
There's no need to clear the whole struct ibv_create_flow
+ struct ibv_kern_spec with memset().

This patch removes call to memset(), adds explicit
initialization for reserved fields, set comp_mask
explicitly

Link: http://marc.info/?i=cover.1412163687.git.ydrone...@opteya.com
Fixes: 389de6a6ef4e ('Add receive flow steering support')
Cc: Hadar Hen Zion had...@mellanox.com
Cc: Or Gerlitz ogerl...@mellanox.com
Cc: Matan Barak mat...@mellanox.com
Signed-off-by: Yann Droneaud ydrone...@opteya.com
---
 src/cmd.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/cmd.c b/src/cmd.c
index f9e4a03fd84d..dc155d6bf4c7 100644
--- a/src/cmd.c
+++ b/src/cmd.c
@@ -1289,6 +1289,7 @@ static int ib_spec_to_kern_spec(struct ibv_flow_spec 
*ib_spec,
struct ibv_kern_spec *kern_spec)
 {
kern_spec-hdr.type = ib_spec-hdr.type;
+   kern_spec-hdr.reserved = 0;
 
switch (ib_spec-hdr.type) {
case IBV_FLOW_SPEC_ETH:
@@ -1337,15 +1338,17 @@ struct ibv_flow *ibv_cmd_create_flow(struct ibv_qp *qp,
flow_id = malloc(sizeof(*flow_id));
if (!flow_id)
return NULL;
-   memset(cmd, 0, cmd_size);
 
+   cmd-comp_mask = 0;
cmd-qp_handle = qp-handle;
 
+   cmd-flow_attr.size = 0;
cmd-flow_attr.type = flow_attr-type;
cmd-flow_attr.priority = flow_attr-priority;
cmd-flow_attr.num_of_specs = flow_attr-num_of_specs;
cmd-flow_attr.port = flow_attr-port;
cmd-flow_attr.flags = flow_attr-flags;
+   cmd-flow_attr.reserved[0] = cmd-flow_attr.reserved[1] = 1;
 
kern_spec = cmd + 1;
ib_spec = flow_attr + 1;
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   3   4   5   >