Re: BUG: NULL pointer dereference at ib_uverbs_comp_handler+0x20

2017-08-01 Thread Matan Barak
On Fri, Jul 28, 2017 at 8:38 PM, Logan Gunthorpe <log...@deltatee.com> wrote:
> Hi,
>
> My system has been failing with recent kernels (4.12.x and 4.13-rc2)
> with a NULL pointer dereference at the stack trace given at the end of
> this email. This happens when simply running 'ib_write_bw -R '
> with a Chelsio T6 (cxgb4). I've bisected (log attached) to find the
> offending commit to be:
>
> commit 1e7710f3f6563940bb6bbc94aa8eadfd344a86af
> Author: Matan Barak <mat...@mellanox.com>
>   IB/core: Change completion channel to use the reworked objects schema
>
> Reverting this commit (and the dependent commits db1b5ddd53365 and
> e0fcc61113c that also fix other bugs with this commit) from v4.12.3
> fixes the issue.
>
> I did the bisect with the userspace libraries in Debian Stretch but I
> also had this bug with rdma-core v14. I was pretty sure v4.12 kernels
> worked for me in the past but likely only before I upgraded from Jessie
> to Stretch.
>
> Thanks,
>
> Logan
>

Hi Logan,

I've tried to reproduce this in my setup (ConnectX 4, RoCE mode) using
1e7710f3f6563940bb6bbc94aa8eadfd344a86af as the kernel's head.
I've used d779dd9a9e8f as rdma-core user-space and the latest perftest bits.
I couldn't reproduce this problem.
I'll try to review this commit again, but please provide more information.
For example, do you see the iwpm_register_pid error when these commits
are reverted?
Does this also happen when using the plain rdma-cm examples (ucmatose,
rping)? Does it happen in a plain verbs application (ibv_rc_pingpong)?
I assume you use iWarp, right? Did you test other modes?
Did you reproduce this issue with your ConnectX 4 as well?
Could you please reproduce it with KASAN as well?

PS, e0fcc61113c isn't a bug fix, it's just a simple refactor.

Regards,
Matan

>
> PS. As a side rant, this bug was found after a very *frustrating* day of
> what was supposed to be the 20 minute task of getting my RDMA cards
> plugged in again. I tried with both CX4s and the T6s (and I'm still not
> sure if my CX4s work yet). Instead, it turns out there's a whole mess of
> bugs in the kernel I had to go up against. I went back and forth between
> different versions of the userspace libraries because I was sure 4.11
> worked -- but it turned out 4.11.10+, 4.12.x and who knows what other
> stable kernels are currently broken by the bug fixed in [1]. And there
> was a whole other bug that broke things that was fixed in the 4.12-rc
> series that I had to carefully bisect around to find the one reported
> above. So frustrating!!
>
> [1] 5a7a88f1b488e4ee49eb3d5b82612d4d9ffdf2c3
>
> --
>
> [   53.320439] iwpm_register_pid: Unable to send a nlmsg (client = 2)
> [   54.738579] BUG: unable to handle kernel NULL pointer dereference at
> 0058
> [   54.747439] IP: _raw_spin_lock_irqsave+0x10/0x30
> [   54.752719] PGD 0
> [   54.752721] P4D 0
> [   54.755049]
> [   54.759109] Oops: 0002 [#1] SMP
> [   54.762699] Modules linked in:
> [   54.766195] CPU: 0 PID: 5 Comm: kworker/u16:0 Not tainted
> 4.13.0-rc2.direct #708
> [   54.774536] Hardware name: Supermicro SYS-7047GR-TRF/X9DRG-QF, BIOS
> 3.0a 12/05/2013
> [   54.783182] Workqueue: iw_cxgb4 process_work
> [   54.788036] task: 880276a5ee80 task.stack: c90c4000
> [   54.794728] RIP: 0010:_raw_spin_lock_irqsave+0x10/0x30
> [   54.800552] RSP: 0018:c90c7c70 EFLAGS: 00010046
> [   54.806473] RAX:  RBX: 0002 RCX:
> 
> [   54.814524] RDX: 0001 RSI: 0058 RDI:
> 0058
> [   54.822583] RBP: 880470484600 R08: 0001 R09:
> 0001
> [   54.830663] R10: 0040 R11: 88047420b400 R12:
> 0282
> [   54.838744] R13: c90c7dc0 R14: 0001 R15:
> 880470484600
> [   54.846825] FS:  () GS:880277c0()
> knlGS:
> [   54.855997] CS:  0010 DS:  ES:  CR0: 80050033
> [   54.862522] CR2: 0058 CR3: 01e0a000 CR4:
> 000406f0
> [   54.870602] Call Trace:
> [   54.873442]  ? ib_uverbs_comp_handler+0x20/0xe0
> [   54.878610]  ? flush_qp+0x6e/0x2b0
> [   54.882514]  ? c4iw_modify_qp+0x11c2/0x1870
> [   54.887295]  ? close_con_rpl+0xe7/0x170
> [   54.891686]  ? kfree_skb+0x33/0x90
> [   54.895592]  ? skb_dequeue+0x52/0x60
> [   54.899690]  ? process_work+0x4a/0x60
> [   54.903887]  ? process_one_work+0x1c2/0x3e0
> [   54.908664]  ? worker_thread+0x47/0x3d0
> [   54.913056]  ? kthread+0xfc/0x130
> [   54.916864]  ? create_worker+0x180/0x180
> [   54.921353]  ? kthread_create_on_node+0x40/0x40
> [   54.926521]  ? ret_from_fork+0x22/0x30
> [   54.930811] Code: c0 74 05 e8 b3 1c 73 ff 48 89 

Re: BUG: NULL pointer dereference at ib_uverbs_comp_handler+0x20

2017-08-01 Thread Matan Barak
On Fri, Jul 28, 2017 at 8:38 PM, Logan Gunthorpe  wrote:
> Hi,
>
> My system has been failing with recent kernels (4.12.x and 4.13-rc2)
> with a NULL pointer dereference at the stack trace given at the end of
> this email. This happens when simply running 'ib_write_bw -R '
> with a Chelsio T6 (cxgb4). I've bisected (log attached) to find the
> offending commit to be:
>
> commit 1e7710f3f6563940bb6bbc94aa8eadfd344a86af
> Author: Matan Barak 
>   IB/core: Change completion channel to use the reworked objects schema
>
> Reverting this commit (and the dependent commits db1b5ddd53365 and
> e0fcc61113c that also fix other bugs with this commit) from v4.12.3
> fixes the issue.
>
> I did the bisect with the userspace libraries in Debian Stretch but I
> also had this bug with rdma-core v14. I was pretty sure v4.12 kernels
> worked for me in the past but likely only before I upgraded from Jessie
> to Stretch.
>
> Thanks,
>
> Logan
>

Hi Logan,

I've tried to reproduce this in my setup (ConnectX 4, RoCE mode) using
1e7710f3f6563940bb6bbc94aa8eadfd344a86af as the kernel's head.
I've used d779dd9a9e8f as rdma-core user-space and the latest perftest bits.
I couldn't reproduce this problem.
I'll try to review this commit again, but please provide more information.
For example, do you see the iwpm_register_pid error when these commits
are reverted?
Does this also happen when using the plain rdma-cm examples (ucmatose,
rping)? Does it happen in a plain verbs application (ibv_rc_pingpong)?
I assume you use iWarp, right? Did you test other modes?
Did you reproduce this issue with your ConnectX 4 as well?
Could you please reproduce it with KASAN as well?

PS, e0fcc61113c isn't a bug fix, it's just a simple refactor.

Regards,
Matan

>
> PS. As a side rant, this bug was found after a very *frustrating* day of
> what was supposed to be the 20 minute task of getting my RDMA cards
> plugged in again. I tried with both CX4s and the T6s (and I'm still not
> sure if my CX4s work yet). Instead, it turns out there's a whole mess of
> bugs in the kernel I had to go up against. I went back and forth between
> different versions of the userspace libraries because I was sure 4.11
> worked -- but it turned out 4.11.10+, 4.12.x and who knows what other
> stable kernels are currently broken by the bug fixed in [1]. And there
> was a whole other bug that broke things that was fixed in the 4.12-rc
> series that I had to carefully bisect around to find the one reported
> above. So frustrating!!
>
> [1] 5a7a88f1b488e4ee49eb3d5b82612d4d9ffdf2c3
>
> --
>
> [   53.320439] iwpm_register_pid: Unable to send a nlmsg (client = 2)
> [   54.738579] BUG: unable to handle kernel NULL pointer dereference at
> 0058
> [   54.747439] IP: _raw_spin_lock_irqsave+0x10/0x30
> [   54.752719] PGD 0
> [   54.752721] P4D 0
> [   54.755049]
> [   54.759109] Oops: 0002 [#1] SMP
> [   54.762699] Modules linked in:
> [   54.766195] CPU: 0 PID: 5 Comm: kworker/u16:0 Not tainted
> 4.13.0-rc2.direct #708
> [   54.774536] Hardware name: Supermicro SYS-7047GR-TRF/X9DRG-QF, BIOS
> 3.0a 12/05/2013
> [   54.783182] Workqueue: iw_cxgb4 process_work
> [   54.788036] task: 880276a5ee80 task.stack: c90c4000
> [   54.794728] RIP: 0010:_raw_spin_lock_irqsave+0x10/0x30
> [   54.800552] RSP: 0018:c90c7c70 EFLAGS: 00010046
> [   54.806473] RAX:  RBX: 0002 RCX:
> 
> [   54.814524] RDX: 0001 RSI: 0058 RDI:
> 0058
> [   54.822583] RBP: 880470484600 R08: 0001 R09:
> 0001
> [   54.830663] R10: 0040 R11: 88047420b400 R12:
> 0282
> [   54.838744] R13: c90c7dc0 R14: 0001 R15:
> 880470484600
> [   54.846825] FS:  () GS:880277c0()
> knlGS:
> [   54.855997] CS:  0010 DS:  ES:  CR0: 80050033
> [   54.862522] CR2: 0058 CR3: 01e0a000 CR4:
> 000406f0
> [   54.870602] Call Trace:
> [   54.873442]  ? ib_uverbs_comp_handler+0x20/0xe0
> [   54.878610]  ? flush_qp+0x6e/0x2b0
> [   54.882514]  ? c4iw_modify_qp+0x11c2/0x1870
> [   54.887295]  ? close_con_rpl+0xe7/0x170
> [   54.891686]  ? kfree_skb+0x33/0x90
> [   54.895592]  ? skb_dequeue+0x52/0x60
> [   54.899690]  ? process_work+0x4a/0x60
> [   54.903887]  ? process_one_work+0x1c2/0x3e0
> [   54.908664]  ? worker_thread+0x47/0x3d0
> [   54.913056]  ? kthread+0xfc/0x130
> [   54.916864]  ? create_worker+0x180/0x180
> [   54.921353]  ? kthread_create_on_node+0x40/0x40
> [   54.926521]  ? ret_from_fork+0x22/0x30
> [   54.930811] Code: c0 74 05 e8 b3 1c 73 ff 48 89 d8 5b c3 0f 1f 40 00
> 66 2e 0f 1f 84 00 

Re: [PATCH] IB/core: fix semicolon.cocci warnings

2017-06-08 Thread Matan Barak
On Wed, Jun 7, 2017 at 11:26 PM, Julia Lawall  wrote:
> Remove unneeded semicolon.
>
> Generated by: scripts/coccinelle/misc/semicolon.cocci

Thanks. I'll fix this in the next version of this series.


Re: [PATCH] IB/core: fix semicolon.cocci warnings

2017-06-08 Thread Matan Barak
On Wed, Jun 7, 2017 at 11:26 PM, Julia Lawall  wrote:
> Remove unneeded semicolon.
>
> Generated by: scripts/coccinelle/misc/semicolon.cocci

Thanks. I'll fix this in the next version of this series.


Re: [PATCH] net/mlx5: drop duplicate header delay.h

2016-11-24 Thread Matan Barak

On 24/11/2016 15:58, Geliang Tang wrote:

Drop duplicate header delay.h from mlx5/core/main.c.

Signed-off-by: Geliang Tang <geliangt...@gmail.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index f28df33..d7a55eb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -46,7 +46,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #ifdef CONFIG_RFS_ACCEL
 #include 



Thanks.
Acked-by: Matan Barak <mat...@mellanox.com>


Re: [PATCH] net/mlx5: drop duplicate header delay.h

2016-11-24 Thread Matan Barak

On 24/11/2016 15:58, Geliang Tang wrote:

Drop duplicate header delay.h from mlx5/core/main.c.

Signed-off-by: Geliang Tang 
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index f28df33..d7a55eb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -46,7 +46,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #ifdef CONFIG_RFS_ACCEL
 #include 



Thanks.
Acked-by: Matan Barak 


Re: [PATCH] net/mlx5: Simplify a test

2016-11-01 Thread Matan Barak

On 01/11/2016 09:10, Christophe JAILLET wrote:

'create_root_ns()' does not return an error pointer, so the test can be
simplified to be more consistent.

Signed-off-by: Christophe JAILLET <christophe.jail...@wanadoo.fr>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 904853f9cf7a..330955f6badc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -1833,7 +1833,7 @@ static int init_root_ns(struct mlx5_flow_steering 
*steering)
 {

steering->root_ns = create_root_ns(steering, FS_FT_NIC_RX);
-   if (IS_ERR_OR_NULL(steering->root_ns))
+   if (!steering->root_ns)
goto cleanup;

if (init_root_tree(steering, _fs, >root_ns->ns.node))



Thanks.
Acked-by: Matan Barak <mat...@mellanox.com>



Re: [PATCH] net/mlx5: Simplify a test

2016-11-01 Thread Matan Barak

On 01/11/2016 09:10, Christophe JAILLET wrote:

'create_root_ns()' does not return an error pointer, so the test can be
simplified to be more consistent.

Signed-off-by: Christophe JAILLET 
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 904853f9cf7a..330955f6badc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -1833,7 +1833,7 @@ static int init_root_ns(struct mlx5_flow_steering 
*steering)
 {

steering->root_ns = create_root_ns(steering, FS_FT_NIC_RX);
-   if (IS_ERR_OR_NULL(steering->root_ns))
+   if (!steering->root_ns)
goto cleanup;

if (init_root_tree(steering, _fs, >root_ns->ns.node))



Thanks.
Acked-by: Matan Barak 



Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller

2016-09-14 Thread Matan Barak

On 14/09/2016 10:06, Parav Pandit wrote:

Hi Dennis,

Do you know how would HFI1 driver would work along with rdma cgroup?

Hi Matan, Leon, Jason,
Apart from HFI1, is there any other concern?


I just wonder how things like RSS will work. For example, a RSS QP 
doesn't really have a queue (if I recall, it's connected to work queues 
via an indirection table). So, when a user creates such a QP, do you 
want to account it as a regular QP?

How are work queues accounted?



Or Patch is good to go?

4.8 dates are close by (2 weeks) and there are two git trees involved
(that might cause merge error to Linus) so if there are no issues, I
would like to make request to Doug to consider it for 4.8 early on.

Parav

On Mon, Sep 12, 2016 at 10:37 AM, Leon Romanovsky  wrote:

On Sun, Sep 11, 2016 at 11:52:35AM -0600, Jason Gunthorpe wrote:

On Sun, Sep 11, 2016 at 07:24:45PM +0200, Christoph Hellwig wrote:

I've posted some initial work toward a) a while ago, and once we


Did it get merged? Do you have a pointer?


http://www.spinics.net/lists/linux-rdma/msg31958.html


Right, I remember that. Certainly the right direction


However, everything under verbs is not straightforward. The files in
userspace are not copies...

user:

struct ibv_query_device {
   __u32 command;
   __u16 in_words;
   __u16 out_words;
   __u64 response;
   __u64 driver_data[0];
};

kernel:

struct ib_uverbs_query_device {
__u64 response;
__u64 driver_data[0];
};


We'll obviously need different strutures for the libibvers API
and the kernel interface in this case, and we'll need to figure out
how to properly translate them.  I think a cast, plus compile time
type checking ala BUILD_BUG_ON is the way to go.


I'm not sure I follow, which would I cast?

BUILD_BUG_ON(sizeof(ibv_query_device) == sizeof(ib_uverbs_cmd_hdr) +
 sizeof(ib_uverbs_query_device))

?


I'm thinking the best way forward might be to use a script and
transform userspace into:

struct ibv_query_device {
  struct ib_uverbs_cmd_hdr hdr;
  struct ib_uverbs_query_device cmd;
};


That would break the users of the interface.


Sorry, I mean doing this inside rdma-plumbing. Since the change is ABI
identical the modified libibverbs would still be binary compatible
with all providers but not source compatible. Since all kernel
supported providers are in rdma-plumbing we can add the '.cmd.' at the
same time.

The kernel uapi header would stay the same.


However automatically generating the user ABI from the kernel one
might still be a good idea in the long run.


My preference would be to try and use the kernel headers directly.


I thought the same, especially after realizing that they are almost
copy/paste from the vendor *-abi.h files.



Jason




Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller

2016-09-14 Thread Matan Barak

On 14/09/2016 10:06, Parav Pandit wrote:

Hi Dennis,

Do you know how would HFI1 driver would work along with rdma cgroup?

Hi Matan, Leon, Jason,
Apart from HFI1, is there any other concern?


I just wonder how things like RSS will work. For example, a RSS QP 
doesn't really have a queue (if I recall, it's connected to work queues 
via an indirection table). So, when a user creates such a QP, do you 
want to account it as a regular QP?

How are work queues accounted?



Or Patch is good to go?

4.8 dates are close by (2 weeks) and there are two git trees involved
(that might cause merge error to Linus) so if there are no issues, I
would like to make request to Doug to consider it for 4.8 early on.

Parav

On Mon, Sep 12, 2016 at 10:37 AM, Leon Romanovsky  wrote:

On Sun, Sep 11, 2016 at 11:52:35AM -0600, Jason Gunthorpe wrote:

On Sun, Sep 11, 2016 at 07:24:45PM +0200, Christoph Hellwig wrote:

I've posted some initial work toward a) a while ago, and once we


Did it get merged? Do you have a pointer?


http://www.spinics.net/lists/linux-rdma/msg31958.html


Right, I remember that. Certainly the right direction


However, everything under verbs is not straightforward. The files in
userspace are not copies...

user:

struct ibv_query_device {
   __u32 command;
   __u16 in_words;
   __u16 out_words;
   __u64 response;
   __u64 driver_data[0];
};

kernel:

struct ib_uverbs_query_device {
__u64 response;
__u64 driver_data[0];
};


We'll obviously need different strutures for the libibvers API
and the kernel interface in this case, and we'll need to figure out
how to properly translate them.  I think a cast, plus compile time
type checking ala BUILD_BUG_ON is the way to go.


I'm not sure I follow, which would I cast?

BUILD_BUG_ON(sizeof(ibv_query_device) == sizeof(ib_uverbs_cmd_hdr) +
 sizeof(ib_uverbs_query_device))

?


I'm thinking the best way forward might be to use a script and
transform userspace into:

struct ibv_query_device {
  struct ib_uverbs_cmd_hdr hdr;
  struct ib_uverbs_query_device cmd;
};


That would break the users of the interface.


Sorry, I mean doing this inside rdma-plumbing. Since the change is ABI
identical the modified libibverbs would still be binary compatible
with all providers but not source compatible. Since all kernel
supported providers are in rdma-plumbing we can add the '.cmd.' at the
same time.

The kernel uapi header would stay the same.


However automatically generating the user ABI from the kernel one
might still be a good idea in the long run.


My preference would be to try and use the kernel headers directly.


I thought the same, especially after realizing that they are almost
copy/paste from the vendor *-abi.h files.



Jason




Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller

2016-09-11 Thread Matan Barak

On 10/09/2016 20:01, Jason Gunthorpe wrote:

On Sat, Sep 10, 2016 at 06:14:42PM +0200, Christoph Hellwig wrote:

OFVWG meetings have absolutely zero relevance for Linux development.


Well, to be fair there are a fair number of kernel developers on that
particular call..


More "flexibility" for drivers just means giving up on designing a
coherent API and leaving it to drivers authors to add crap to their
own drivers.  That's a major step backwards.


Sadly, it isn't a step backwards, it is status quo - at least as far
as the uapi is concerned.

Every single user space driver has its own private abi file, carefully
hidden in their driver, and dutifully copied over to user space:

providers/cxgb3/iwch-abi.h
providers/cxgb4/cxgb4-abi.h
providers/hfi1verbs/hfi-abi.h
providers/i40iw/i40iw-abi.h
providers/ipathverbs/ipath-abi.h
providers/mlx4/mlx4-abi.h
providers/mlx5/mlx5-abi.h
providers/mthca/mthca-abi.h
providers/nes/nes-abi.h
providers/ocrdma/ocrdma_abi.h
providers/rxe/rxe-abi.h

Just to pick two random examples:

struct mlx5_create_cq {
struct ibv_create_cqibv_cmd;
__u64   buf_addr;
__u64   db_addr;
__u32   cqe_size;
};

struct iwch_create_cq {
struct ibv_create_cq ibv_cmd;
uint64_t user_rptr_addr;
};

Love to hear ideas on a way forward that doesn't involve rewriting
everything :(



Yeah, unfortunately, the RDMA ABI is more driver specific ABI than a 
common user-kernel ABI. I guess this will become even worse, as the RDMA 
subsystem is evolving to serve more drivers with different object types. 
For example, I would like to hear how hfi1 are going to define their 
user-kernel ABI (once they leave the custom ioctls).



They should not be using the code in drivers/infiniband.  usnic is such
an example of a driver that should never have been added in it's current
form.


+1

Jason



Matan


Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller

2016-09-11 Thread Matan Barak

On 10/09/2016 20:01, Jason Gunthorpe wrote:

On Sat, Sep 10, 2016 at 06:14:42PM +0200, Christoph Hellwig wrote:

OFVWG meetings have absolutely zero relevance for Linux development.


Well, to be fair there are a fair number of kernel developers on that
particular call..


More "flexibility" for drivers just means giving up on designing a
coherent API and leaving it to drivers authors to add crap to their
own drivers.  That's a major step backwards.


Sadly, it isn't a step backwards, it is status quo - at least as far
as the uapi is concerned.

Every single user space driver has its own private abi file, carefully
hidden in their driver, and dutifully copied over to user space:

providers/cxgb3/iwch-abi.h
providers/cxgb4/cxgb4-abi.h
providers/hfi1verbs/hfi-abi.h
providers/i40iw/i40iw-abi.h
providers/ipathverbs/ipath-abi.h
providers/mlx4/mlx4-abi.h
providers/mlx5/mlx5-abi.h
providers/mthca/mthca-abi.h
providers/nes/nes-abi.h
providers/ocrdma/ocrdma_abi.h
providers/rxe/rxe-abi.h

Just to pick two random examples:

struct mlx5_create_cq {
struct ibv_create_cqibv_cmd;
__u64   buf_addr;
__u64   db_addr;
__u32   cqe_size;
};

struct iwch_create_cq {
struct ibv_create_cq ibv_cmd;
uint64_t user_rptr_addr;
};

Love to hear ideas on a way forward that doesn't involve rewriting
everything :(



Yeah, unfortunately, the RDMA ABI is more driver specific ABI than a 
common user-kernel ABI. I guess this will become even worse, as the RDMA 
subsystem is evolving to serve more drivers with different object types. 
For example, I would like to hear how hfi1 are going to define their 
user-kernel ABI (once they leave the custom ioctls).



They should not be using the code in drivers/infiniband.  usnic is such
an example of a driver that should never have been added in it's current
form.


+1

Jason



Matan


Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller

2016-09-11 Thread Matan Barak

On 10/09/2016 19:12, Christoph Hellwig wrote:

On Wed, Sep 07, 2016 at 01:25:13PM +0530, Parav Pandit wrote:

 a) delay cgroups support until the grand rewrite is done
 b) add it now and deal with the consequences later


Can we do (b) now and differ adding any HW resources to cgroup until
they are clearly called out.
Architecture and APIs are already in place to support this.


Sounds fine to me.  The only thing I want to avoid is pie in the
sky "future proofing" that leads to horrible architectures.  And I assume
that's what Matan proposed.



NO, that not what I proposed. User-kernel API/ABI should be designed 
with drivers differences in mind. The internal design or internals APIs 
could ignore such things as they can be changed later.





Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller

2016-09-11 Thread Matan Barak

On 10/09/2016 19:12, Christoph Hellwig wrote:

On Wed, Sep 07, 2016 at 01:25:13PM +0530, Parav Pandit wrote:

 a) delay cgroups support until the grand rewrite is done
 b) add it now and deal with the consequences later


Can we do (b) now and differ adding any HW resources to cgroup until
they are clearly called out.
Architecture and APIs are already in place to support this.


Sounds fine to me.  The only thing I want to avoid is pie in the
sky "future proofing" that leads to horrible architectures.  And I assume
that's what Matan proposed.



NO, that not what I proposed. User-kernel API/ABI should be designed 
with drivers differences in mind. The internal design or internals APIs 
could ignore such things as they can be changed later.





Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller

2016-09-07 Thread Matan Barak

On 07/09/2016 10:55, Parav Pandit wrote:

Hi Matan,

On Thu, Sep 1, 2016 at 2:14 PM, Christoph Hellwig <h...@lst.de> wrote:

On Thu, Sep 01, 2016 at 10:25:40AM +0300, Matan Barak wrote:

Well, if I recall, the reason doing so last time was in order to allow
flexible updating of ib_core independently, which is obviously not a good
reason (to say the least).

Since the new ABI will probably define new object types (all recent
proposals go this way), the current approach could lead to either trying to
map new objects to existing cgroup resource types, which could lead to some
weird non 1:1 mapping, or having a split definitions - such that each
driver will declare its objects both in the cgroups mechanism and in its
driver dispatch table.



Even worse than that, drivers could simply ignore the cgroups support while
implementing their own resource types and get a very broken containers
support.

If drivers are broken due to ignorance of not-calling cgroup APIs,
that should be ok.
That particular driver should fix it.
If the resource creation using uverbs is using well defined rdma level
resource, uverbs level will make sure to honor cgroup limits without
involving hw drivers anyway.



All recent proposals of the new ABI schema deals with extending the 
flexibility of the current schema by letting drivers define their 
specific types, actions, attributes, etc. Even more than that, the 
dispatching starts from the driver and it chooses if it wants to use the 
common RDMA core layer or have it's own wise implementation instead.

Some drivers might even prefer not to implement the current verbs types.
These decisions were made in the OFVWG meetings.

Anyway, maybe we should consider using a more higher-level logic objects 
that could fit multiple drivers requirements.



RDMA Verb level resource is charged/uncharged by RDMA core.
RDMA HW level resource is charged/uncharged by RDMA HW driver using
well defined API and resource by cgroup core.
This scheme ensures that there is 1:1 mapping.



Sounds reasonable, but what about drivers which ignore the common code 
and implement it in their own way? What about drivers which don't 
support the standard RDMA types at all?
I guess we should find a balance between something abstract and common 
enough that will ease administrator configuration but be specific enough 
for the various models we have (or will have) in the RDMA stack.



I don't think current definition of resource type is carved out on stone.
They can be extended as we forward.
As many of us agree that, they should be well defined and it should be
agreed by cgroup and rdma community.



Of course, but since the ABI and cgroups model are somehow related, they 
should be dealt with together and approved by Doug who participated in 
some of the OFVWG meetings.



For example, today we have RDMA_VERB_xxx resources.
New well defined RDMA HW resources can be defined in rdma_cgroup.h
file as RDMA_HW_xx in same enum table.



So a driver will change the cgroups file for every new type it adds?
Will we just have a super set enum of all crazy types vendors added?



Sorry guys, but arbitrary extensibility for something not finished is the
worst idea ever.  We have two options here:

 a) delay cgroups support until the grand rewrite is done
 b) add it now and deal with the consequences later


Can we do (b) now and differ adding any HW resources to cgroup until
they are clearly called out.
Architecture and APIs are already in place to support this.



Since this affect the user, it's better to think how it fits our 
"optional standard"/"vendor types" model first. Maybe we could force all 
standard types even if the driver we use doesn't support any of them.



That being said, adding random non-Verbs hardwasre to the RDMA core is
the second worst idea ever.


We can differ adding HW resource to core and cgroup until they are
clearly defined.
In that case current architecture still holds good.



Clearly we should differ adding the actual code until a driver could 
declare such objects, but we need to decide how to expose the standard 
optional RDMA types (basically, the types you've added) and how future 
driver specific types fit in.



Guess I need to catch up with the
discussion and start using the flame thrower.


Matan,
Can you please point us to the new RFC/ABI email thread which
describes the design, partitioning of code between core vs hw drivers
etc.



One proposal is [1]. There's another one from Sean which aims for 
similar targets regards the driver specific types.


[1] https://www.spinics.net/lists/linux-rdma/msg38997.html

Matan


Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller

2016-09-07 Thread Matan Barak

On 07/09/2016 10:55, Parav Pandit wrote:

Hi Matan,

On Thu, Sep 1, 2016 at 2:14 PM, Christoph Hellwig  wrote:

On Thu, Sep 01, 2016 at 10:25:40AM +0300, Matan Barak wrote:

Well, if I recall, the reason doing so last time was in order to allow
flexible updating of ib_core independently, which is obviously not a good
reason (to say the least).

Since the new ABI will probably define new object types (all recent
proposals go this way), the current approach could lead to either trying to
map new objects to existing cgroup resource types, which could lead to some
weird non 1:1 mapping, or having a split definitions - such that each
driver will declare its objects both in the cgroups mechanism and in its
driver dispatch table.



Even worse than that, drivers could simply ignore the cgroups support while
implementing their own resource types and get a very broken containers
support.

If drivers are broken due to ignorance of not-calling cgroup APIs,
that should be ok.
That particular driver should fix it.
If the resource creation using uverbs is using well defined rdma level
resource, uverbs level will make sure to honor cgroup limits without
involving hw drivers anyway.



All recent proposals of the new ABI schema deals with extending the 
flexibility of the current schema by letting drivers define their 
specific types, actions, attributes, etc. Even more than that, the 
dispatching starts from the driver and it chooses if it wants to use the 
common RDMA core layer or have it's own wise implementation instead.

Some drivers might even prefer not to implement the current verbs types.
These decisions were made in the OFVWG meetings.

Anyway, maybe we should consider using a more higher-level logic objects 
that could fit multiple drivers requirements.



RDMA Verb level resource is charged/uncharged by RDMA core.
RDMA HW level resource is charged/uncharged by RDMA HW driver using
well defined API and resource by cgroup core.
This scheme ensures that there is 1:1 mapping.



Sounds reasonable, but what about drivers which ignore the common code 
and implement it in their own way? What about drivers which don't 
support the standard RDMA types at all?
I guess we should find a balance between something abstract and common 
enough that will ease administrator configuration but be specific enough 
for the various models we have (or will have) in the RDMA stack.



I don't think current definition of resource type is carved out on stone.
They can be extended as we forward.
As many of us agree that, they should be well defined and it should be
agreed by cgroup and rdma community.



Of course, but since the ABI and cgroups model are somehow related, they 
should be dealt with together and approved by Doug who participated in 
some of the OFVWG meetings.



For example, today we have RDMA_VERB_xxx resources.
New well defined RDMA HW resources can be defined in rdma_cgroup.h
file as RDMA_HW_xx in same enum table.



So a driver will change the cgroups file for every new type it adds?
Will we just have a super set enum of all crazy types vendors added?



Sorry guys, but arbitrary extensibility for something not finished is the
worst idea ever.  We have two options here:

 a) delay cgroups support until the grand rewrite is done
 b) add it now and deal with the consequences later


Can we do (b) now and differ adding any HW resources to cgroup until
they are clearly called out.
Architecture and APIs are already in place to support this.



Since this affect the user, it's better to think how it fits our 
"optional standard"/"vendor types" model first. Maybe we could force all 
standard types even if the driver we use doesn't support any of them.



That being said, adding random non-Verbs hardwasre to the RDMA core is
the second worst idea ever.


We can differ adding HW resource to core and cgroup until they are
clearly defined.
In that case current architecture still holds good.



Clearly we should differ adding the actual code until a driver could 
declare such objects, but we need to decide how to expose the standard 
optional RDMA types (basically, the types you've added) and how future 
driver specific types fit in.



Guess I need to catch up with the
discussion and start using the flame thrower.


Matan,
Can you please point us to the new RFC/ABI email thread which
describes the design, partitioning of code between core vs hw drivers
etc.



One proposal is [1]. There's another one from Sean which aims for 
similar targets regards the driver specific types.


[1] https://www.spinics.net/lists/linux-rdma/msg38997.html

Matan


Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller

2016-09-01 Thread Matan Barak

On 01/09/2016 00:16, Tejun Heo wrote:

Hello,

On Wed, Aug 31, 2016 at 06:07:30PM +0300, Matan Barak wrote:

Currently, there are some discussions regarding the RDMA ABI. The current
proposed approach (after a lot of discussions in the OFVWG) is to have
driver dependent object types rather than the fixed set of IB object types
we have today.
AFAIK, some vendors might want to use the RDMA subsystem for a different
fabrics which has a different set of objects.
You could see RFCs for such concepts both from Mellanox and Intel on the
linux-rdma mailing list.

Saying that, maybe we need to make the resource types a bit more flexible
and dynamic.


That'd be back to square one and Christoph was dead against it too,
so...



Well, if I recall, the reason doing so last time was in order to allow 
flexible updating of ib_core independently, which is obviously not a 
good reason (to say the least).


Since the new ABI will probably define new object types (all recent 
proposals go this way), the current approach could lead to either trying 
to map new objects to existing cgroup resource types, which could lead 
to some weird non 1:1 mapping, or having a split definitions - such that 
each driver will declare its objects both in the cgroups mechanism and 
in its driver dispatch table.
Even worse than that, drivers could simply ignore the cgroups support 
while implementing their own resource types and get a very broken 
containers support.




Thanks.



Matan


Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller

2016-09-01 Thread Matan Barak

On 01/09/2016 00:16, Tejun Heo wrote:

Hello,

On Wed, Aug 31, 2016 at 06:07:30PM +0300, Matan Barak wrote:

Currently, there are some discussions regarding the RDMA ABI. The current
proposed approach (after a lot of discussions in the OFVWG) is to have
driver dependent object types rather than the fixed set of IB object types
we have today.
AFAIK, some vendors might want to use the RDMA subsystem for a different
fabrics which has a different set of objects.
You could see RFCs for such concepts both from Mellanox and Intel on the
linux-rdma mailing list.

Saying that, maybe we need to make the resource types a bit more flexible
and dynamic.


That'd be back to square one and Christoph was dead against it too,
so...



Well, if I recall, the reason doing so last time was in order to allow 
flexible updating of ib_core independently, which is obviously not a 
good reason (to say the least).


Since the new ABI will probably define new object types (all recent 
proposals go this way), the current approach could lead to either trying 
to map new objects to existing cgroup resource types, which could lead 
to some weird non 1:1 mapping, or having a split definitions - such that 
each driver will declare its objects both in the cgroups mechanism and 
in its driver dispatch table.
Even worse than that, drivers could simply ignore the cgroups support 
while implementing their own resource types and get a very broken 
containers support.




Thanks.



Matan


Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller

2016-08-31 Thread Matan Barak

On 31/08/2016 11:37, Parav Pandit wrote:

Added rdma cgroup controller that does accounting, limit enforcement
on rdma/IB verbs and hw resources.

Added rdma cgroup header file which defines its APIs to perform
charing/uncharing functionality. It also defined APIs for RDMA/IB
stack for device registration. Devices which are registered will
participate in controller functions of accounting and limit
enforcements. It define rdmacg_device structure to bind IB stack
and RDMA cgroup controller.

RDMA resources are tracked using resource pool. Resource pool is per
device, per cgroup entity which allows setting up accounting limits
on per device basis.

Currently resources are defined by the RDMA cgroup.

Resource pool is created/destroyed dynamically whenever
charging/uncharging occurs respectively and whenever user
configuration is done. Its a tradeoff of memory vs little more code
space that creates resource pool object whenever necessary, instead of
creating them during cgroup creation and device registration time.

Signed-off-by: Parav Pandit 
---
 include/linux/cgroup_rdma.h   |  66 +
 include/linux/cgroup_subsys.h |   4 +
 init/Kconfig  |  10 +
 kernel/Makefile   |   1 +
 kernel/cgroup_rdma.c  | 664 ++
 5 files changed, 745 insertions(+)
 create mode 100644 include/linux/cgroup_rdma.h
 create mode 100644 kernel/cgroup_rdma.c

diff --git a/include/linux/cgroup_rdma.h b/include/linux/cgroup_rdma.h
new file mode 100644
index 000..6710e28
--- /dev/null
+++ b/include/linux/cgroup_rdma.h
@@ -0,0 +1,66 @@
+/*
+ * Copyright (C) 2016 Parav Pandit 
+ *
+ * This file is subject to the terms and conditions of version 2 of the GNU
+ * General Public License. See the file COPYING in the main directory of the
+ * Linux distribution for more details.
+ */
+
+#ifndef _CGROUP_RDMA_H
+#define _CGROUP_RDMA_H
+
+#include 
+
+enum rdmacg_resource_type {
+   RDMACG_VERB_RESOURCE_UCTX,
+   RDMACG_VERB_RESOURCE_AH,
+   RDMACG_VERB_RESOURCE_PD,
+   RDMACG_VERB_RESOURCE_CQ,
+   RDMACG_VERB_RESOURCE_MR,
+   RDMACG_VERB_RESOURCE_MW,
+   RDMACG_VERB_RESOURCE_SRQ,
+   RDMACG_VERB_RESOURCE_QP,
+   RDMACG_VERB_RESOURCE_FLOW,
+   /*
+* add any hw specific resource here as RDMA_HW_RESOURCE_NAME
+*/
+   RDMACG_RESOURCE_MAX,
+};
+
+#ifdef CONFIG_CGROUP_RDMA
+


Currently, there are some discussions regarding the RDMA ABI. The 
current proposed approach (after a lot of discussions in the OFVWG) is 
to have driver dependent object types rather than the fixed set of IB 
object types we have today.
AFAIK, some vendors might want to use the RDMA subsystem for a different 
fabrics which has a different set of objects.
You could see RFCs for such concepts both from Mellanox and Intel on the 
linux-rdma mailing list.


Saying that, maybe we need to make the resource types a bit more 
flexible and dynamic.


Regards,
Matan


Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller

2016-08-31 Thread Matan Barak

On 31/08/2016 11:37, Parav Pandit wrote:

Added rdma cgroup controller that does accounting, limit enforcement
on rdma/IB verbs and hw resources.

Added rdma cgroup header file which defines its APIs to perform
charing/uncharing functionality. It also defined APIs for RDMA/IB
stack for device registration. Devices which are registered will
participate in controller functions of accounting and limit
enforcements. It define rdmacg_device structure to bind IB stack
and RDMA cgroup controller.

RDMA resources are tracked using resource pool. Resource pool is per
device, per cgroup entity which allows setting up accounting limits
on per device basis.

Currently resources are defined by the RDMA cgroup.

Resource pool is created/destroyed dynamically whenever
charging/uncharging occurs respectively and whenever user
configuration is done. Its a tradeoff of memory vs little more code
space that creates resource pool object whenever necessary, instead of
creating them during cgroup creation and device registration time.

Signed-off-by: Parav Pandit 
---
 include/linux/cgroup_rdma.h   |  66 +
 include/linux/cgroup_subsys.h |   4 +
 init/Kconfig  |  10 +
 kernel/Makefile   |   1 +
 kernel/cgroup_rdma.c  | 664 ++
 5 files changed, 745 insertions(+)
 create mode 100644 include/linux/cgroup_rdma.h
 create mode 100644 kernel/cgroup_rdma.c

diff --git a/include/linux/cgroup_rdma.h b/include/linux/cgroup_rdma.h
new file mode 100644
index 000..6710e28
--- /dev/null
+++ b/include/linux/cgroup_rdma.h
@@ -0,0 +1,66 @@
+/*
+ * Copyright (C) 2016 Parav Pandit 
+ *
+ * This file is subject to the terms and conditions of version 2 of the GNU
+ * General Public License. See the file COPYING in the main directory of the
+ * Linux distribution for more details.
+ */
+
+#ifndef _CGROUP_RDMA_H
+#define _CGROUP_RDMA_H
+
+#include 
+
+enum rdmacg_resource_type {
+   RDMACG_VERB_RESOURCE_UCTX,
+   RDMACG_VERB_RESOURCE_AH,
+   RDMACG_VERB_RESOURCE_PD,
+   RDMACG_VERB_RESOURCE_CQ,
+   RDMACG_VERB_RESOURCE_MR,
+   RDMACG_VERB_RESOURCE_MW,
+   RDMACG_VERB_RESOURCE_SRQ,
+   RDMACG_VERB_RESOURCE_QP,
+   RDMACG_VERB_RESOURCE_FLOW,
+   /*
+* add any hw specific resource here as RDMA_HW_RESOURCE_NAME
+*/
+   RDMACG_RESOURCE_MAX,
+};
+
+#ifdef CONFIG_CGROUP_RDMA
+


Currently, there are some discussions regarding the RDMA ABI. The 
current proposed approach (after a lot of discussions in the OFVWG) is 
to have driver dependent object types rather than the fixed set of IB 
object types we have today.
AFAIK, some vendors might want to use the RDMA subsystem for a different 
fabrics which has a different set of objects.
You could see RFCs for such concepts both from Mellanox and Intel on the 
linux-rdma mailing list.


Saying that, maybe we need to make the resource types a bit more 
flexible and dynamic.


Regards,
Matan


Re: [BUG] mellanox IB driver fails to load on large config

2015-07-21 Thread Matan Barak
On Tue, Jul 21, 2015 at 5:56 AM, Alex Thorlton  wrote:
> On Mon, Jul 20, 2015 at 11:28:03AM -0500, Alex Thorlton wrote:
>> I've got some time on the large machine later today.  I'll give this a
>> try then.
>
> I ran a boot with this patch applied:
>
> diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
> index 83e80ab..c84aea0 100644
> --- a/include/linux/mlx4/device.h
> +++ b/include/linux/mlx4/device.h
> @@ -45,7 +45,7 @@
>  #include 
>
>  #define MAX_MSIX_P_PORT17
> -#define MAX_MSIX   64
> +#define MAX_MSIX   8192
>  #define MSIX_LEGACY_SZ 4
>  #define MIN_MSIX_P_PORT5
>
> I went for a max of 8192, since I was actually booting the machine with
> 6144 cores (not 4096) for this run.  It doesn't look like this fixed the
> problem.  I still saw the same errors during boot.
>
> FWIW, the module does appear to still successfully load:
>
> 8<---
> # lsmod | grep mlx
> mlx4_ib   151552  0
> ib_sa  32768  1 mlx4_ib
> ib_mad 49152  2 ib_sa,mlx4_ib
> ib_core   102400  3 ib_sa,mlx4_ib,ib_mad
> mlx4_core 278528  1 mlx4_ib
> --->8
>
> If the module loading is good enough, and we should just ignore the
> errors, then I'm fine with that.  Just wanting to make sure that
> everything is behaving correctly.

It shouldn't be a problem,  as all unused/erroneous EQs get "-1".
We'll try to reproduce the problem here, it might take awhile though.

Thanks for checking this,
Matan

>
> - Alex
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [BUG] mellanox IB driver fails to load on large config

2015-07-21 Thread Matan Barak
On Tue, Jul 21, 2015 at 5:56 AM, Alex Thorlton athorl...@sgi.com wrote:
 On Mon, Jul 20, 2015 at 11:28:03AM -0500, Alex Thorlton wrote:
 I've got some time on the large machine later today.  I'll give this a
 try then.

 I ran a boot with this patch applied:

 diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
 index 83e80ab..c84aea0 100644
 --- a/include/linux/mlx4/device.h
 +++ b/include/linux/mlx4/device.h
 @@ -45,7 +45,7 @@
  #include linux/timecounter.h

  #define MAX_MSIX_P_PORT17
 -#define MAX_MSIX   64
 +#define MAX_MSIX   8192
  #define MSIX_LEGACY_SZ 4
  #define MIN_MSIX_P_PORT5

 I went for a max of 8192, since I was actually booting the machine with
 6144 cores (not 4096) for this run.  It doesn't look like this fixed the
 problem.  I still saw the same errors during boot.

 FWIW, the module does appear to still successfully load:

 8---
 # lsmod | grep mlx
 mlx4_ib   151552  0
 ib_sa  32768  1 mlx4_ib
 ib_mad 49152  2 ib_sa,mlx4_ib
 ib_core   102400  3 ib_sa,mlx4_ib,ib_mad
 mlx4_core 278528  1 mlx4_ib
 ---8

 If the module loading is good enough, and we should just ignore the
 errors, then I'm fine with that.  Just wanting to make sure that
 everything is behaving correctly.

It shouldn't be a problem,  as all unused/erroneous EQs get -1.
We'll try to reproduce the problem here, it might take awhile though.

Thanks for checking this,
Matan


 - Alex
 --
 To unsubscribe from this list: send the line unsubscribe linux-rdma in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [BUG] mellanox IB driver fails to load on large config

2015-07-15 Thread Matan Barak



On 7/14/2015 11:28 PM, Alex Thorlton wrote:

On Tue, Jul 14, 2015 at 11:06:26PM +0300, Or Gerlitz wrote:

On Tue, Jul 14, 2015 at 9:48 PM, Alex Thorlton  wrote:

On Tue, Jul 14, 2015 at 01:22:34PM -0500, andrew banman wrote:

On Sat, Jul 11, 2015 at 11:20:19PM +0300, Or Gerlitz wrote:

On Fri, Jul 10, 2015 at 10:15 PM, andrew banman  wrote:

I'm seeing a large number of allocation errors originating from the Mellanox IB
driver when booting the 4.2-rc1 kernel on a 4096cpu 32TB memory system:


Just to make sure, mlx4 works fine on this small (...) system with 4.1
and 4.2-rc1 breaks, or 4.2-rc1 is the 1st time you're trying that
config?


I'll let Alex comment on that, he did some testing on that.


I started seeing this on a 4.1-rc8 kernel, so it's been around for a
little while.  It may have been around before 4.1-rc8, but I haven't run
any kernels older than that on the big machine for some time.


To make sure I am correctly following, on 4.1-rc8  you also see
something, right?


Yes, that's correct.


are these the same messages or different ones? if the latter send to us.


We see the same exact messages on 4.1-rc8.


Hi,

We don't recall getting those error with 32cpu machines, but we'll try 
to reproduce this issue.


Matan



Thanks for looking into this!

- Alex


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [BUG] mellanox IB driver fails to load on large config

2015-07-15 Thread Matan Barak



On 7/14/2015 11:28 PM, Alex Thorlton wrote:

On Tue, Jul 14, 2015 at 11:06:26PM +0300, Or Gerlitz wrote:

On Tue, Jul 14, 2015 at 9:48 PM, Alex Thorlton athorl...@sgi.com wrote:

On Tue, Jul 14, 2015 at 01:22:34PM -0500, andrew banman wrote:

On Sat, Jul 11, 2015 at 11:20:19PM +0300, Or Gerlitz wrote:

On Fri, Jul 10, 2015 at 10:15 PM, andrew banman aban...@sgi.com wrote:

I'm seeing a large number of allocation errors originating from the Mellanox IB
driver when booting the 4.2-rc1 kernel on a 4096cpu 32TB memory system:


Just to make sure, mlx4 works fine on this small (...) system with 4.1
and 4.2-rc1 breaks, or 4.2-rc1 is the 1st time you're trying that
config?


I'll let Alex comment on that, he did some testing on that.


I started seeing this on a 4.1-rc8 kernel, so it's been around for a
little while.  It may have been around before 4.1-rc8, but I haven't run
any kernels older than that on the big machine for some time.


To make sure I am correctly following, on 4.1-rc8  you also see
something, right?


Yes, that's correct.


are these the same messages or different ones? if the latter send to us.


We see the same exact messages on 4.1-rc8.


Hi,

We don't recall getting those error with 32cpu machines, but we'll try 
to reproduce this issue.


Matan



Thanks for looking into this!

- Alex


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/