Re: [PATCH V1] NFS-RDMA: fix qp pointer validation checks

2014-04-24 Thread Sagi Grimberg

On 4/24/2014 2:30 AM, Devesh Sharma wrote:

Hi Chuck

Following is the complete call trace of a typical NFS-RDMA transaction while
mounting a share.
Stopping post-send calls when the QP is not created is unavoidable.
Therefore, applying checks to the connection state is a must while
registering/deregistering FRMRs on-the-fly: an unconnected QP implies
post_send/post_recv must not be called from any context.



Long thread... didn't follow it all.

If I understand correctly, this race comes up only for *cleanup* (LINV) of
an FRMR registration after the teardown flow has destroyed the QP.

I think this might disappear if for each registration you post LINV+FRMR.
This assumes that a situation where you try to post a fastreg on a "bad"
QP can never happen (usually true, since the teardown flow typically
suspends outgoing commands).
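
For readers who haven't seen the pattern, here is a minimal sketch of
chaining the two work requests in a single post, using the 3.x-era kernel
verbs API. The function name and the iova/len/page_list parameters are
illustrative only; this is not xprtrdma's actual code.

#include <rdma/ib_verbs.h>

/* Sketch: invalidate the old mapping and re-register it in one chain,
 * so a stale FRMR is never left waiting for a separate LINV post.
 */
static int post_linv_frmr(struct ib_qp *qp, struct ib_mr *mr,
			  struct ib_fast_reg_page_list *page_list,
			  unsigned int n_pages, u64 iova, u32 len)
{
	struct ib_send_wr inv_wr, fr_wr, *bad_wr;

	memset(&inv_wr, 0, sizeof(inv_wr));
	inv_wr.opcode = IB_WR_LOCAL_INV;
	inv_wr.ex.invalidate_rkey = mr->rkey;
	inv_wr.next = &fr_wr;			/* chain: LINV then FRMR */

	memset(&fr_wr, 0, sizeof(fr_wr));
	fr_wr.opcode = IB_WR_FAST_REG_MR;
	fr_wr.send_flags = IB_SEND_SIGNALED;
	fr_wr.wr.fast_reg.iova_start = iova;
	fr_wr.wr.fast_reg.page_list = page_list;
	fr_wr.wr.fast_reg.page_list_len = n_pages;
	fr_wr.wr.fast_reg.page_shift = PAGE_SHIFT;
	fr_wr.wr.fast_reg.length = len;
	fr_wr.wr.fast_reg.access_flags = IB_ACCESS_LOCAL_WRITE |
					 IB_ACCESS_REMOTE_WRITE;
	fr_wr.wr.fast_reg.rkey = mr->rkey;

	/* fails cleanly, rather than oopsing, if the QP has gone bad */
	return ib_post_send(qp, &inv_wr, &bad_wr);
}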


Sagi.


Re: IB/cma: Make timeout dependent on the subnet timeout

2014-04-24 Thread Or Gerlitz

On 23/04/2014 16:44, Hefty, Sean wrote:

Regarding SubnetTimeout changes: the code in
drivers/infiniband/core/cache.c already queues a work request after each
port state change. Inside that work request e.g. the P_Key cache is
updated. Would it be acceptable to modify ib_cache_update() such that it
also queries the port attributes and caches these? Cached port
attributes could e.g. be stored in struct ib_port.

Without looking at details, this at least sounds reasonable.
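
For concreteness, a hedged sketch of the suggested change. ib_query_port()
and struct ib_port_attr are the real verbs-layer interfaces; the cached slot
and helper name are hypothetical, since where the cached attributes would
live (e.g. struct ib_port) is precisely what is left open above.

#include <linux/spinlock.h>
#include <rdma/ib_verbs.h>

/* Hypothetical single-port cache; a real patch would presumably hang
 * one of these off struct ib_port, refreshed from ib_cache_update().
 */
static struct ib_port_attr cached_port_attr;
static DEFINE_SPINLOCK(cached_port_attr_lock);

static void cache_port_attr(struct ib_device *device, u8 port_num)
{
	struct ib_port_attr tprops;

	/* the same query ib_cache_update() would gain */
	if (ib_query_port(device, port_num, &tprops))
		return;

	spin_lock_irq(&cached_port_attr_lock);
	cached_port_attr = tprops;	/* includes subnet_timeout */
	spin_unlock_irq(&cached_port_attr_lock);
}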



Sean, can't we have the CMA follow the same practice used in the CM, where
we derive the RC QP timeout from the packet life time retrieved in path
queries? E.g., base the CM response timeout on this value too?




Re: [PATCH] mm: get_user_pages(write,force) refuse to COW in shared areas

2014-04-24 Thread Oleg Nesterov
Hi Hugh,

Sorry for the late reply. First of all, to avoid confusion, I think the
patch is fine.

When I saw this patch I decided that uprobes should be updated accordingly,
but I just realized that I do not understand what I should write in the
changelog.

On 04/04, Hugh Dickins wrote:
>
> + if (gup_flags & FOLL_WRITE) {
> + if (!(vm_flags & VM_WRITE)) {
> + if (!(gup_flags & FOLL_FORCE))
> + goto efault;
> + /*
> +  * We used to let the write,force case do COW
> +  * in a VM_MAYWRITE VM_SHARED !VM_WRITE vma, so
> +  * ptrace could set a breakpoint in a read-only
> +  * mapping of an executable, without corrupting
> +  * the file (yet only when that file had been
> +  * opened for writing!).  Anon pages in shared
> +  * mappings are surprising: now just reject it.
> +  */
> + if (!is_cow_mapping(vm_flags)) {
> + WARN_ON_ONCE(vm_flags & VM_MAYWRITE);
> + goto efault;
> + }

OK. But could you please clarify "Anon pages in shared mappings are
surprising"? I mean, does this only apply to the "VM_MAYWRITE VM_SHARED
!VM_WRITE vma" mentioned above, or is this bad even if a !FMODE_WRITE file
was mmaped as MAP_SHARED?

Yes, in this case the vma is not VM_SHARED and it is not VM_MAYWRITE, it is
only VM_MAYSHARE. This is in fact a private mapping, except that
mprotect(PROT_WRITE) will not work.

But with or without this patch, gup(FOLL_WRITE | FOLL_FORCE) won't work in
this case (although perhaps it could?): is_cow_mapping() == false because of
!VM_MAYWRITE.

However, currently uprobes assumes that a COWed anon page is fine in this
case, and this differs from gup().

So, what do you think about the patch below? It is probably fine in any case,
but is there any "strong" reason to follow gup's behaviour and forbid the
anon page in a VM_MAYSHARE && !VM_MAYWRITE vma?

Oleg.

--- x/kernel/events/uprobes.c
+++ x/kernel/events/uprobes.c
@@ -127,12 +127,13 @@ struct xol_area {
  */
 static bool valid_vma(struct vm_area_struct *vma, bool is_register)
 {
-   vm_flags_t flags = VM_HUGETLB | VM_MAYEXEC | VM_SHARED;
+   vm_flags_t flags = VM_HUGETLB | VM_MAYEXEC;
 
if (is_register)
flags |= VM_WRITE;
 
-   return vma->vm_file && (vma->vm_flags & flags) == VM_MAYEXEC;
+   return  vma->vm_file && is_cow_mapping(vma->vm_flags) &&
+   (vma->vm_flags & flags) == VM_MAYEXEC;
 }
 
 static unsigned long offset_to_vaddr(struct vm_area_struct *vma, loff_t offset)
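
For reference, is_cow_mapping() as defined in mm/memory.c around this time
treats a mapping as COW-able exactly when it is MAY-writable but not shared,
which is why !VM_MAYWRITE alone makes it return false above:

static inline bool is_cow_mapping(vm_flags_t flags)
{
	return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}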



Re: [RFC 00/20] On demand paging

2014-04-24 Thread Or Gerlitz

On 02/03/2014 12:49, Haggai Eran wrote:

The following set of patches implements on-demand paging (ODP) support
in the RDMA stack and in the mlx5_ib Infiniband driver.



I've placed the latest cut of the ODP patches on my public git tree @

git://beany.openfabrics.org/~ogerlitz/linux-2.6.git odp

this is actually V0.1 with the following changes:

- Rebase against v3.15-rc2
- Removed dependency on patches that were accepted upstream
- Changed use of compound_trans_head to compound_head, as the former was
removed in 3.14



f5d7fc1 IB/mlx5: Implement on demand paging by adding support for MMU notifiers
09eae22 IB/mlx5: Add support for RDMA write responder page faults
ca84a78 IB/mlx5: Handle page faults
302a6ea IB/mlx5: Page faults handling infrastructure
df792a8 IB/mlx5: Add function to read WQE from user-space
1b4f69b IB/mlx5: Add mlx5_ib_update_mtt to update page tables after creation
bc5a6b0 IB/mlx5: Changes in memory region creation to support on-demand paging
335c8ef IB/mlx5: Implement the ODP capability query verb
5a67390 net/mlx5_core: Add support for page faults events and low level handling
11450c4 IB/mlx5: Refactor UMR to have its own context struct
51deb37 IB/mlx5: Enhance UMR support to allow partial page table update
107bc64 IB/mlx5: Set QP offsets and parameters for user QPs and not just for kernel QPs
e91a314 mlx5: Store MR attributes in mlx5_mr_core during creation and after UMR
894f946 IB/mlx5: Add MR to radix tree in reg_mr_callback
9283891 IB/mlx5: Fix error handling in reg_umr
467f4e7 IB/core: Implement support for MMU notifiers regarding on demand paging regions
c98a42e IB/core: Add support for on demand paging regions
8fb5241 IB/core: Add umem function to read data from user-space
16c9cf0 IB/core: Replace ib_umem's offset field with a full address
9f0d8b5 IB/core: Add flags for on demand paging support



Re: [PATCH V1] NFS-RDMA: fix qp pointer validation checks

2014-04-24 Thread Chuck Lever

On Apr 24, 2014, at 3:12 AM, Sagi Grimberg  wrote:

> On 4/24/2014 2:30 AM, Devesh Sharma wrote:
>> Hi Chuck
>> 
>> Following is the complete call trace of a typical NFS-RDMA transaction
>> while mounting a share.
>> Stopping post-send calls when the QP is not created is unavoidable.
>> Therefore, applying checks to the connection state is a must while
>> registering/deregistering FRMRs on-the-fly: an unconnected QP implies
>> post_send/post_recv must not be called from any context.
>> 
> 
> Long thread... didn't follow it all.

I think you got the gist of it.

> If I understand correctly, this race comes up only for *cleanup* (LINV) of
> an FRMR registration after the teardown flow has destroyed the QP.
> I think this might disappear if for each registration you post LINV+FRMR.
> This assumes that a situation where you try to post a fastreg on a "bad"
> QP can never happen (usually true, since the teardown flow typically
> suspends outgoing commands).

That’s typically true for “hard” NFS mounts. But “soft” NFS mounts
wake RPCs after a timeout while the transport is disconnected, in
order to kill them.  At that point, deregistration still needs to
succeed somehow.

IMO there are three related problems.

1.  rpcrdma_ep_connect() is allowing RPC tasks to be awoken while
    there is no QP at all (->qp is NULL). The woken RPC tasks are
    trying to deregister buffers that may include page cache pages,
    and it’s oopsing because ->qp is NULL.

    That’s a logic bug in rpcrdma_ep_connect(), and I have an idea
    how to address it.

2.  If a QP is present but disconnected, posting LOCAL_INV won’t work.
    That leaves buffers (and page cache pages, potentially) registered.
    That could be addressed with LINV+FRMR. But...

3.  The client should not leave page cache pages registered indefinitely.
    Both LINV+FRMR and our current approach depend on having a working
    QP _at_ _some_ _point_ … but the client simply can’t depend on that.
    What happens if an NFS server is, say, destroyed by fire while there
    are active client mount points? What if the HCA’s firmware is
    permanently not allowing QP creation?
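
For problem 1, a minimal defensive sketch of the kind of guard involved.
The ia->ri_id->qp naming follows xprtrdma's rpcrdma_ia of that era; the
ib_dereg_mr() fallback is illustrative only, not the fix being planned:

/* Sketch: tolerate a missing QP on the deregistration path instead of
 * oopsing; fall back to synchronous dereg when there is no QP at all.
 */
static int rpcrdma_safe_dereg(struct rpcrdma_ia *ia, struct ib_mr *mr,
			      struct ib_send_wr *invalidate_wr)
{
	struct ib_send_wr *bad_wr;

	if (!ia->ri_id || !ia->ri_id->qp)
		return ib_dereg_mr(mr);		/* no QP to post LINV on */

	return ib_post_send(ia->ri_id->qp, invalidate_wr, &bad_wr);
}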

Here's a relevant comment in rpcrdma_ep_connect():

	/* TEMP TEMP TEMP - fail if new device:
	 * Deregister/remarshal *all* requests!
	 * Close and recreate adapter, pd, etc!
	 * Re-determine all attributes still sane!
	 * More stuff I haven't thought of!
	 * Rrrgh!
	 */

xprtrdma does not do this today.

When a new device is created, all existing RPC requests could be
deregistered and re-marshalled.  As far as I can tell,
rpcrdma_ep_connect() is executing in a synchronous context (the connect
worker) and we can simply use dereg_mr, as long as later, when the RPCs
are re-driven, they know they need to re-marshal.

I’ll try some things today.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





RE: [PATCH V1] NFS-RDMA: fix qp pointer validation checks

2014-04-24 Thread Devesh Sharma
Thanks, Chuck, for summarizing.
One more issue to add to the list below.

> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> ow...@vger.kernel.org] On Behalf Of Chuck Lever
> Sent: Thursday, April 24, 2014 8:31 PM
> To: Sagi Grimberg
> Cc: Devesh Sharma; Linux NFS Mailing List; linux-rdma@vger.kernel.org;
> Trond Myklebust
> Subject: Re: [PATCH V1] NFS-RDMA: fix qp pointer validation checks
> 
> 
> On Apr 24, 2014, at 3:12 AM, Sagi Grimberg  wrote:
> 
> > On 4/24/2014 2:30 AM, Devesh Sharma wrote:
> >> Hi Chuck
> >>
> >> Following is the complete call trace of a typical NFS-RDMA transaction
> >> while mounting a share.
> >> Stopping post-send calls when the QP is not created is unavoidable.
> >> Therefore, applying checks to the connection state is a must while
> >> registering/deregistering FRMRs on-the-fly: an unconnected QP implies
> >> post_send/post_recv must not be called from any context.
> >>
> >
> > Long thread... didn't follow it all.
> 
> I think you got the gist of it.
> 
> > If I understand correctly, this race comes up only for *cleanup* (LINV)
> > of an FRMR registration after the teardown flow has destroyed the QP.
> > I think this might disappear if for each registration you post LINV+FRMR.
> > This assumes that a situation where you try to post a fastreg on a
> > "bad" QP can never happen (usually true, since the teardown flow
> > typically suspends outgoing commands).
> 
> That's typically true for "hard" NFS mounts. But "soft" NFS mounts wake
> RPCs after a timeout while the transport is disconnected, in order to kill
> them.  At that point, deregistration still needs to succeed somehow.
> 
> IMO there are three related problems.
> 
> 1.  rpcrdma_ep_connect() is allowing RPC tasks to be awoken while
>     there is no QP at all (->qp is NULL). The woken RPC tasks are
>     trying to deregister buffers that may include page cache pages,
>     and it's oopsing because ->qp is NULL.
> 
>     That's a logic bug in rpcrdma_ep_connect(), and I have an idea
>     how to address it.
> 
> 2.  If a QP is present but disconnected, posting LOCAL_INV won't work.
>     That leaves buffers (and page cache pages, potentially) registered.
>     That could be addressed with LINV+FRMR. But...
> 
> 3.  The client should not leave page cache pages registered indefinitely.
>     Both LINV+FRMR and our current approach depend on having a working
>     QP _at_ _some_ _point_ ... but the client simply can't depend on that.
>     What happens if an NFS server is, say, destroyed by fire while there
>     are active client mount points? What if the HCA's firmware is
>     permanently not allowing QP creation?
Addition to the list:
4. If RDMA traffic is in progress and the network link goes down and comes
back up after some time (t > 10 secs), rpcrdma_ep_connect() does not destroy
the existing QP, because rpcrdma_create_id() fails (rdma_resolve_addr()
fails). Now, every time the connect worker thread gets rescheduled, the CM
fails with an establishment error. Finally, after multiple tries, the CM
fails with rdma_cm_event = 15, the entire recovery thread sits silently
forever, and the kernel reports that the user app has been blocked for more
than 120 secs.
> 
> Here's a relevant comment in rpcrdma_ep_connect():
> 
>	/* TEMP TEMP TEMP - fail if new device:
>	 * Deregister/remarshal *all* requests!
>	 * Close and recreate adapter, pd, etc!
>	 * Re-determine all attributes still sane!
>	 * More stuff I haven't thought of!
>	 * Rrrgh!
>	 */
> 
> xprtrdma does not do this today.
> 
> When a new device is created, all existing RPC requests could be
> deregistered and re-marshalled.  As far as I can tell,
> rpcrdma_ep_connect() is executing in a synchronous context (the connect
> worker) and we can simply use dereg_mr, as long as later, when the RPCs are
> re-driven, they know they need to re-marshal.
> 
> I'll try some things today.
> 
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
> 
> 
> 


Re: [PATCH V1] NFS-RDMA: fix qp pointer validation checks

2014-04-24 Thread Chuck Lever

On Apr 24, 2014, at 11:48 AM, Devesh Sharma  wrote:

> Thanks Chuck for summarizing.
> One more issue is being added to the list below.
> 
>> -Original Message-
>> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
>> ow...@vger.kernel.org] On Behalf Of Chuck Lever
>> Sent: Thursday, April 24, 2014 8:31 PM
>> To: Sagi Grimberg
>> Cc: Devesh Sharma; Linux NFS Mailing List; linux-rdma@vger.kernel.org;
>> Trond Myklebust
>> Subject: Re: [PATCH V1] NFS-RDMA: fix qp pointer validation checks
>> 
>> 
>> On Apr 24, 2014, at 3:12 AM, Sagi Grimberg  wrote:
>> 
>>> On 4/24/2014 2:30 AM, Devesh Sharma wrote:
>>>> Hi Chuck
>>>>
>>>> Following is the complete call trace of a typical NFS-RDMA transaction
>>>> while mounting a share.
>>>> Stopping post-send calls when the QP is not created is unavoidable.
>>>> Therefore, applying checks to the connection state is a must while
>>>> registering/deregistering FRMRs on-the-fly: an unconnected QP implies
>>>> post_send/post_recv must not be called from any context.
>>>>
>>> 
>>> Long thread... didn't follow it all.
>> 
>> I think you got the gist of it.
>> 
>>> If I understand correctly, this race comes up only for *cleanup* (LINV)
>>> of an FRMR registration after the teardown flow has destroyed the QP.
>>> I think this might disappear if for each registration you post LINV+FRMR.
>>> This assumes that a situation where you try to post a fastreg on a
>>> "bad" QP can never happen (usually true, since the teardown flow
>>> typically suspends outgoing commands).
>> 
>> That's typically true for "hard" NFS mounts. But "soft" NFS mounts wake
>> RPCs after a timeout while the transport is disconnected, in order to kill
>> them.  At that point, deregistration still needs to succeed somehow.
>> 
>> IMO there are three related problems.
>> 
>> 1.  rpcrdma_ep_connect() is allowing RPC tasks to be awoken while
>>     there is no QP at all (->qp is NULL). The woken RPC tasks are
>>     trying to deregister buffers that may include page cache pages,
>>     and it's oopsing because ->qp is NULL.
>> 
>>     That's a logic bug in rpcrdma_ep_connect(), and I have an idea
>>     how to address it.
>> 
>> 2.  If a QP is present but disconnected, posting LOCAL_INV won't work.
>>     That leaves buffers (and page cache pages, potentially) registered.
>>     That could be addressed with LINV+FRMR. But...
>> 
>> 3.  The client should not leave page cache pages registered indefinitely.
>>     Both LINV+FRMR and our current approach depend on having a working
>>     QP _at_ _some_ _point_ ... but the client simply can't depend on that.
>>     What happens if an NFS server is, say, destroyed by fire while there
>>     are active client mount points? What if the HCA's firmware is
>>     permanently not allowing QP creation?
> Addition to the list:
> 4. If RDMA traffic is in progress and the network link goes down and comes
> back up after some time (t > 10 secs), rpcrdma_ep_connect() does not
> destroy the existing QP, because rpcrdma_create_id() fails
> (rdma_resolve_addr() fails). Now, every time the connect worker thread gets
> rescheduled, the CM fails with an establishment error. Finally, after
> multiple tries, the CM fails with rdma_cm_event = 15, the entire recovery
> thread sits silently forever, and the kernel reports that the user app has
> been blocked for more than 120 secs.

I think I see that now. I should be able to address it with the fixes for 1.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





[PATCH 3.15-rc3--to=rol...@purestorage.com 1/3] iw_cxgb4: Fix endpoint mutex deadlocks

2014-04-24 Thread Steve Wise
In cases where the cm calls c4iw_modify_rc_qp() with the endpoint
mutex held, it must be called with internal == 1.  rx_data() and
process_mpa_reply() are not doing this.  This causes a deadlock because
c4iw_modify_rc_qp() might call c4iw_ep_disconnect() in some !internal
cases, and c4iw_ep_disconnect() acquires the endpoint mutex.  The design
was intended to only do the disconnect for !internal calls.

Change rx_data(), FPDU_MODE case, to call c4iw_modify_rc_qp() with
internal == 1, and then disconnect only after releasing the mutex.

Change process_mpa_reply() to call c4iw_modify_rc_qp(TERMINATE) with
internal == 1 and set a new attr flag telling it to send a TERMINATE
message.  Previously this was implied by !internal.

Change process_mpa_reply() to return whether the caller should disconnect
after releasing the endpoint mutex.  Now rx_data() will do the disconnect
in the cases where process_mpa_reply() wants to disconnect after the
TERMINATE is sent.

Change c4iw_modify_rc_qp() RTS->TERM to only disconnect if !internal, and
to send a TERMINATE message if attrs->send_term is 1.

Change abort_connection() to not acquire the ep mutex for setting the state,
and make all calls to abort_connection() do so with the mutex held.

Signed-off-by: Steve Wise 
---

 drivers/infiniband/hw/cxgb4/cm.c   |   31 ---
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h |1 +
 drivers/infiniband/hw/cxgb4/qp.c   |9 +
 3 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index 185452a..f9b04bc 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -996,7 +996,7 @@ static void close_complete_upcall(struct c4iw_ep *ep, int status)
 static int abort_connection(struct c4iw_ep *ep, struct sk_buff *skb, gfp_t gfp)
 {
PDBG("%s ep %p tid %u\n", __func__, ep, ep->hwtid);
-   state_set(&ep->com, ABORTING);
+   __state_set(&ep->com, ABORTING);
set_bit(ABORT_CONN, &ep->com.history);
return send_abort(ep, skb, gfp);
 }
@@ -1154,7 +1154,7 @@ static int update_rx_credits(struct c4iw_ep *ep, u32 credits)
return credits;
 }
 
-static void process_mpa_reply(struct c4iw_ep *ep, struct sk_buff *skb)
+static int process_mpa_reply(struct c4iw_ep *ep, struct sk_buff *skb)
 {
struct mpa_message *mpa;
struct mpa_v2_conn_params *mpa_v2_params;
@@ -1164,6 +1164,7 @@ static void process_mpa_reply(struct c4iw_ep *ep, struct sk_buff *skb)
struct c4iw_qp_attributes attrs;
enum c4iw_qp_attr_mask mask;
int err;
+   int disconnect = 0;
 
PDBG("%s ep %p tid %u\n", __func__, ep, ep->hwtid);
 
@@ -1173,7 +1174,7 @@ static void process_mpa_reply(struct c4iw_ep *ep, struct sk_buff *skb)
 * will abort the connection.
 */
if (stop_ep_timer(ep))
-   return;
+   return 0;
 
/*
 * If we get more than the supported amount of private data
@@ -1195,7 +1196,7 @@ static void process_mpa_reply(struct c4iw_ep *ep, struct sk_buff *skb)
 * if we don't even have the mpa message, then bail.
 */
if (ep->mpa_pkt_len < sizeof(*mpa))
-   return;
+   return 0;
mpa = (struct mpa_message *) ep->mpa_pkt;
 
/* Validate MPA header. */
@@ -1235,7 +1236,7 @@ static void process_mpa_reply(struct c4iw_ep *ep, struct sk_buff *skb)
 * We'll continue process when more data arrives.
 */
if (ep->mpa_pkt_len < (sizeof(*mpa) + plen))
-   return;
+   return 0;
 
if (mpa->flags & MPA_REJECT) {
err = -ECONNREFUSED;
@@ -1337,9 +1338,11 @@ static void process_mpa_reply(struct c4iw_ep *ep, struct sk_buff *skb)
attrs.layer_etype = LAYER_MPA | DDP_LLP;
attrs.ecode = MPA_NOMATCH_RTR;
attrs.next_state = C4IW_QP_STATE_TERMINATE;
+   attrs.send_term = 1;
err = c4iw_modify_qp(ep->com.qp->rhp, ep->com.qp,
-   C4IW_QP_ATTR_NEXT_STATE, &attrs, 0);
+   C4IW_QP_ATTR_NEXT_STATE, &attrs, 1);
err = -ENOMEM;
+   disconnect = 1;
goto out;
}
 
@@ -1355,9 +1358,11 @@ static void process_mpa_reply(struct c4iw_ep *ep, struct sk_buff *skb)
attrs.layer_etype = LAYER_MPA | DDP_LLP;
attrs.ecode = MPA_INSUFF_IRD;
attrs.next_state = C4IW_QP_STATE_TERMINATE;
+   attrs.send_term = 1;
err = c4iw_modify_qp(ep->com.qp->rhp, ep->com.qp,
-   C4IW_QP_ATTR_NEXT_STATE, &attrs, 0);
+   C4IW_QP_ATTR_NEXT_STATE, &attrs, 1);
err = -ENOMEM;
+   disconnect = 1;
goto out;
}
goto out;
@@ -1366,7 +1371,7 @@ err:
send_abort(ep, sk

[PATCH 3.15-rc3--to=rol...@purestorage.com 2/3] iw_cxgb4: force T5 connections to use TAHOE cong control

2014-04-24 Thread Steve Wise
This is required to work around a T5 HW issue.

Signed-off-by: Steve Wise 
---

 drivers/infiniband/hw/cxgb4/cm.c  |8 
 drivers/infiniband/hw/cxgb4/t4fw_ri_api.h |   14 ++
 2 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index f9b04bc..1f863a9 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -587,6 +587,10 @@ static int send_connect(struct c4iw_ep *ep)
opt2 |= SACK_EN(1);
if (wscale && enable_tcp_window_scaling)
opt2 |= WND_SCALE_EN(1);
+   if (is_t5(ep->com.dev->rdev.lldi.adapter_type)) {
+   opt2 |= T5_OPT_2_VALID;
+   opt2 |= V_CONG_CNTRL(CONG_ALG_TAHOE);
+   }
t4_set_arp_err_handler(skb, NULL, act_open_req_arp_failure);
 
if (is_t4(ep->com.dev->rdev.lldi.adapter_type)) {
@@ -2018,6 +2022,10 @@ static void accept_cr(struct c4iw_ep *ep, struct sk_buff *skb,
if (tcph->ece && tcph->cwr)
opt2 |= CCTRL_ECN(1);
}
+   if (is_t5(ep->com.dev->rdev.lldi.adapter_type)) {
+   opt2 |= T5_OPT_2_VALID;
+   opt2 |= V_CONG_CNTRL(CONG_ALG_TAHOE);
+   }
 
rpl = cplhdr(skb);
INIT_TP_WR(rpl, ep->hwtid);
diff --git a/drivers/infiniband/hw/cxgb4/t4fw_ri_api.h b/drivers/infiniband/hw/cxgb4/t4fw_ri_api.h
index dc193c2..6121ca0 100644
--- a/drivers/infiniband/hw/cxgb4/t4fw_ri_api.h
+++ b/drivers/infiniband/hw/cxgb4/t4fw_ri_api.h
@@ -836,4 +836,18 @@ struct ulptx_idata {
 #define V_RX_DACK_CHANGE(x) ((x) << S_RX_DACK_CHANGE)
#define F_RX_DACK_CHANGE	V_RX_DACK_CHANGE(1U)
 
+enum { /* TCP congestion control algorithms */
+   CONG_ALG_RENO,
+   CONG_ALG_TAHOE,
+   CONG_ALG_NEWRENO,
+   CONG_ALG_HIGHSPEED
+};
+
+#define S_CONG_CNTRL	14
+#define M_CONG_CNTRL	0x3
+#define V_CONG_CNTRL(x) ((x) << S_CONG_CNTRL)
+#define G_CONG_CNTRL(x) (((x) >> S_CONG_CNTRL) & M_CONG_CNTRL)
+
+#define T5_OPT_2_VALID   (1 << 31)
+
 #endif /* _T4FW_RI_API_H_ */



[PATCH 3.15-rc3--to=rol...@purestorage.com 3/3] iw_cxgb4: only allow kernel db ringing for T4 devs

2014-04-24 Thread Steve Wise
The whole db drop avoidance stuff is for T4 only.  So we cannot allow
that to be enabled for T5 devices.

Signed-off-by: Steve Wise 
---

 drivers/infiniband/hw/cxgb4/qp.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/qp.c b/drivers/infiniband/hw/cxgb4/qp.c
index f18ef34..086f62f 100644
--- a/drivers/infiniband/hw/cxgb4/qp.c
+++ b/drivers/infiniband/hw/cxgb4/qp.c
@@ -1777,11 +1777,15 @@ int c4iw_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
/*
 * Use SQ_PSN and RQ_PSN to pass in IDX_INC values for
 * ringing the queue db when we're in DB_FULL mode.
+* Only allow this on T4 devices.
 */
attrs.sq_db_inc = attr->sq_psn;
attrs.rq_db_inc = attr->rq_psn;
mask |= (attr_mask & IB_QP_SQ_PSN) ? C4IW_QP_ATTR_SQ_DB : 0;
mask |= (attr_mask & IB_QP_RQ_PSN) ? C4IW_QP_ATTR_RQ_DB : 0;
+   if (is_t5(to_c4iw_qp(ibqp)->rhp->rdev.lldi.adapter_type) &&
+   (mask & (C4IW_QP_ATTR_SQ_DB|C4IW_QP_ATTR_RQ_DB)))
+   return -EINVAL;
 
return c4iw_modify_qp(rhp, qhp, mask, &attrs, 0);
 }



[PATCH RESEND 3.15-rc3 1/3] iw_cxgb4: Fix endpoint mutex deadlocks

2014-04-24 Thread Steve Wise
In cases where the cm calls c4iw_modify_rc_qp() with the endpoint
mutex held, it must be called with internal == 1.  rx_data() and
process_mpa_reply() are not doing this.  This causes a deadlock because
c4iw_modify_rc_qp() might call c4iw_ep_disconnect() in some !internal
cases, and c4iw_ep_disconnect() acquires the endpoint mutex.  The design
was intended to only do the disconnect for !internal calls.

Change rx_data(), FPDU_MODE case, to call c4iw_modify_rc_qp() with
internal == 1, and then disconnect only after releasing the mutex.

Change process_mpa_reply() to call c4iw_modify_rc_qp(TERMINATE) with
internal == 1 and set a new attr flag telling it to send a TERMINATE
message.  Previously this was implied by !internal.

Change process_mpa_reply() to return whether the caller should disconnect
after releasing the endpoint mutex.  Now rx_data() will do the disconnect
in the cases where process_mpa_reply() wants to disconnect after the
TERMINATE is sent.

Change c4iw_modify_rc_qp() RTS->TERM to only disconnect if !internal, and
to send a TERMINATE message if attrs->send_term is 1.

Change abort_connection() to not acquire the ep mutex for setting the state,
and make all calls to abort_connection() do so with the mutex held.

Signed-off-by: Steve Wise 
---

 drivers/infiniband/hw/cxgb4/cm.c   |   31 ---
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h |1 +
 drivers/infiniband/hw/cxgb4/qp.c   |9 +
 3 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index 185452a..f9b04bc 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -996,7 +996,7 @@ static void close_complete_upcall(struct c4iw_ep *ep, int status)
 static int abort_connection(struct c4iw_ep *ep, struct sk_buff *skb, gfp_t gfp)
 {
PDBG("%s ep %p tid %u\n", __func__, ep, ep->hwtid);
-   state_set(&ep->com, ABORTING);
+   __state_set(&ep->com, ABORTING);
set_bit(ABORT_CONN, &ep->com.history);
return send_abort(ep, skb, gfp);
 }
@@ -1154,7 +1154,7 @@ static int update_rx_credits(struct c4iw_ep *ep, u32 credits)
return credits;
 }
 
-static void process_mpa_reply(struct c4iw_ep *ep, struct sk_buff *skb)
+static int process_mpa_reply(struct c4iw_ep *ep, struct sk_buff *skb)
 {
struct mpa_message *mpa;
struct mpa_v2_conn_params *mpa_v2_params;
@@ -1164,6 +1164,7 @@ static void process_mpa_reply(struct c4iw_ep *ep, struct sk_buff *skb)
struct c4iw_qp_attributes attrs;
enum c4iw_qp_attr_mask mask;
int err;
+   int disconnect = 0;
 
PDBG("%s ep %p tid %u\n", __func__, ep, ep->hwtid);
 
@@ -1173,7 +1174,7 @@ static void process_mpa_reply(struct c4iw_ep *ep, struct sk_buff *skb)
 * will abort the connection.
 */
if (stop_ep_timer(ep))
-   return;
+   return 0;
 
/*
 * If we get more than the supported amount of private data
@@ -1195,7 +1196,7 @@ static void process_mpa_reply(struct c4iw_ep *ep, struct sk_buff *skb)
 * if we don't even have the mpa message, then bail.
 */
if (ep->mpa_pkt_len < sizeof(*mpa))
-   return;
+   return 0;
mpa = (struct mpa_message *) ep->mpa_pkt;
 
/* Validate MPA header. */
@@ -1235,7 +1236,7 @@ static void process_mpa_reply(struct c4iw_ep *ep, struct sk_buff *skb)
 * We'll continue process when more data arrives.
 */
if (ep->mpa_pkt_len < (sizeof(*mpa) + plen))
-   return;
+   return 0;
 
if (mpa->flags & MPA_REJECT) {
err = -ECONNREFUSED;
@@ -1337,9 +1338,11 @@ static void process_mpa_reply(struct c4iw_ep *ep, struct sk_buff *skb)
attrs.layer_etype = LAYER_MPA | DDP_LLP;
attrs.ecode = MPA_NOMATCH_RTR;
attrs.next_state = C4IW_QP_STATE_TERMINATE;
+   attrs.send_term = 1;
err = c4iw_modify_qp(ep->com.qp->rhp, ep->com.qp,
-   C4IW_QP_ATTR_NEXT_STATE, &attrs, 0);
+   C4IW_QP_ATTR_NEXT_STATE, &attrs, 1);
err = -ENOMEM;
+   disconnect = 1;
goto out;
}
 
@@ -1355,9 +1358,11 @@ static void process_mpa_reply(struct c4iw_ep *ep, struct sk_buff *skb)
attrs.layer_etype = LAYER_MPA | DDP_LLP;
attrs.ecode = MPA_INSUFF_IRD;
attrs.next_state = C4IW_QP_STATE_TERMINATE;
+   attrs.send_term = 1;
err = c4iw_modify_qp(ep->com.qp->rhp, ep->com.qp,
-   C4IW_QP_ATTR_NEXT_STATE, &attrs, 0);
+   C4IW_QP_ATTR_NEXT_STATE, &attrs, 1);
err = -ENOMEM;
+   disconnect = 1;
goto out;
}
goto out;
@@ -1366,7 +1371,7 @@ err:
send_abort(ep, sk

[PATCH RESEND 3.15-rc3 2/3] iw_cxgb4: force T5 connections to use TAHOE cong control

2014-04-24 Thread Steve Wise
This is required to work around a T5 HW issue.

Signed-off-by: Steve Wise 
---

 drivers/infiniband/hw/cxgb4/cm.c  |8 
 drivers/infiniband/hw/cxgb4/t4fw_ri_api.h |   14 ++
 2 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index f9b04bc..1f863a9 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -587,6 +587,10 @@ static int send_connect(struct c4iw_ep *ep)
opt2 |= SACK_EN(1);
if (wscale && enable_tcp_window_scaling)
opt2 |= WND_SCALE_EN(1);
+   if (is_t5(ep->com.dev->rdev.lldi.adapter_type)) {
+   opt2 |= T5_OPT_2_VALID;
+   opt2 |= V_CONG_CNTRL(CONG_ALG_TAHOE);
+   }
t4_set_arp_err_handler(skb, NULL, act_open_req_arp_failure);
 
if (is_t4(ep->com.dev->rdev.lldi.adapter_type)) {
@@ -2018,6 +2022,10 @@ static void accept_cr(struct c4iw_ep *ep, struct sk_buff *skb,
if (tcph->ece && tcph->cwr)
opt2 |= CCTRL_ECN(1);
}
+   if (is_t5(ep->com.dev->rdev.lldi.adapter_type)) {
+   opt2 |= T5_OPT_2_VALID;
+   opt2 |= V_CONG_CNTRL(CONG_ALG_TAHOE);
+   }
 
rpl = cplhdr(skb);
INIT_TP_WR(rpl, ep->hwtid);
diff --git a/drivers/infiniband/hw/cxgb4/t4fw_ri_api.h b/drivers/infiniband/hw/cxgb4/t4fw_ri_api.h
index dc193c2..6121ca0 100644
--- a/drivers/infiniband/hw/cxgb4/t4fw_ri_api.h
+++ b/drivers/infiniband/hw/cxgb4/t4fw_ri_api.h
@@ -836,4 +836,18 @@ struct ulptx_idata {
 #define V_RX_DACK_CHANGE(x) ((x) << S_RX_DACK_CHANGE)
#define F_RX_DACK_CHANGE	V_RX_DACK_CHANGE(1U)
 
+enum { /* TCP congestion control algorithms */
+   CONG_ALG_RENO,
+   CONG_ALG_TAHOE,
+   CONG_ALG_NEWRENO,
+   CONG_ALG_HIGHSPEED
+};
+
+#define S_CONG_CNTRL	14
+#define M_CONG_CNTRL	0x3
+#define V_CONG_CNTRL(x) ((x) << S_CONG_CNTRL)
+#define G_CONG_CNTRL(x) (((x) >> S_CONG_CNTRL) & M_CONG_CNTRL)
+
+#define T5_OPT_2_VALID   (1 << 31)
+
 #endif /* _T4FW_RI_API_H_ */



[PATCH RESEND 3.15-rc3 3/3] iw_cxgb4: only allow kernel db ringing for T4 devs

2014-04-24 Thread Steve Wise
The whole db drop avoidance stuff is for T4 only.  So we cannot allow
that to be enabled for T5 devices.

Signed-off-by: Steve Wise 
---

 drivers/infiniband/hw/cxgb4/qp.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/qp.c b/drivers/infiniband/hw/cxgb4/qp.c
index f18ef34..086f62f 100644
--- a/drivers/infiniband/hw/cxgb4/qp.c
+++ b/drivers/infiniband/hw/cxgb4/qp.c
@@ -1777,11 +1777,15 @@ int c4iw_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
/*
 * Use SQ_PSN and RQ_PSN to pass in IDX_INC values for
 * ringing the queue db when we're in DB_FULL mode.
+* Only allow this on T4 devices.
 */
attrs.sq_db_inc = attr->sq_psn;
attrs.rq_db_inc = attr->rq_psn;
mask |= (attr_mask & IB_QP_SQ_PSN) ? C4IW_QP_ATTR_SQ_DB : 0;
mask |= (attr_mask & IB_QP_RQ_PSN) ? C4IW_QP_ATTR_RQ_DB : 0;
+   if (is_t5(to_c4iw_qp(ibqp)->rhp->rdev.lldi.adapter_type) &&
+   (mask & (C4IW_QP_ATTR_SQ_DB|C4IW_QP_ATTR_RQ_DB)))
+   return -EINVAL;
 
return c4iw_modify_qp(rhp, qhp, mask, &attrs, 0);
 }



Re: [PATCH] mm: get_user_pages(write,force) refuse to COW in shared areas

2014-04-24 Thread Hugh Dickins
On Thu, 24 Apr 2014, Oleg Nesterov wrote:

> Hi Hugh,
> 
> Sorry for the late reply. First of all, to avoid confusion, I think the
> patch is fine.
> 
> When I saw this patch I decided that uprobes should be updated accordingly,
> but I just realized that I do not understand what I should write in the
> changelog.

Thanks a lot for considering similar issues in uprobes, Oleg: I merely
checked that its uses of get_user_pages() would not be problematic,
and didn't look around to rediscover the worrying mm business that
goes on down there in kernel/events.

> 
> On 04/04, Hugh Dickins wrote:
> >
> > +   if (gup_flags & FOLL_WRITE) {
> > +   if (!(vm_flags & VM_WRITE)) {
> > +   if (!(gup_flags & FOLL_FORCE))
> > +   goto efault;
> > +   /*
> > +* We used to let the write,force case do COW
> > +* in a VM_MAYWRITE VM_SHARED !VM_WRITE vma, so
> > +* ptrace could set a breakpoint in a read-only
> > +* mapping of an executable, without corrupting
> > +* the file (yet only when that file had been
> > +* opened for writing!).  Anon pages in shared
> > +* mappings are surprising: now just reject it.
> > +*/
> > +   if (!is_cow_mapping(vm_flags)) {
> > +   WARN_ON_ONCE(vm_flags & VM_MAYWRITE);
> > +   goto efault;
> > +   }
> 
> OK. But could you please clarify "Anon pages in shared mappings are
> surprising"? I mean, does this only apply to the "VM_MAYWRITE VM_SHARED
> !VM_WRITE vma" mentioned above, or is this bad even if a !FMODE_WRITE file
> was mmaped as MAP_SHARED?

Good question. I simply didn't consider that - and (as you have realized)
didn't need to consider it, because I was just stopping the problematic
behaviour in gup(), and didn't need to consider whether other behaviour
prohibited by gup() was actually unproblematic.

> 
> Yes, in this case the vma is not VM_SHARED and it is not VM_MAYWRITE, it is
> only VM_MAYSHARE. This is in fact a private mapping, except that
> mprotect(PROT_WRITE) will not work.
> 
> But with or without this patch, gup(FOLL_WRITE | FOLL_FORCE) won't work in
> this case,
 "this" meaning my patch rather than yours below
> (although perhaps it could?): is_cow_mapping() == false because of
> !VM_MAYWRITE.
> 
> However, currently uprobes assumes that a COWed anon page is fine in this
> case, and this differs from gup().
> 
> So, what do you think about the patch below? It is probably fine in any
> case, but is there any "strong" reason to follow gup's behaviour and forbid
> the anon page in a VM_MAYSHARE && !VM_MAYWRITE vma?

I don't think there is a "strong" reason to forbid it.

The strongest reason is simply that it's much safer if uprobes follows
the same conventions as mm, and get_user_pages() happens to have
forbidden that all along.

The philosophical reason to forbid it is that the user mmapped with
MAP_SHARED, and it's merely a kernel-internal detail that we flip off
VM_SHARED and treat these read-only shared mappings very much like
private mappings.  The user asked for MAP_SHARED, and we prefer to
respect that by not letting private COWs creep in.

We could treat those mappings even more like private mappings, and
allow the COWs; but better to be strict about it, so long as doing
so doesn't give you regressions.

> 
> Oleg.
> 
> --- x/kernel/events/uprobes.c
> +++ x/kernel/events/uprobes.c
> @@ -127,12 +127,13 @@ struct xol_area {
>   */
>  static bool valid_vma(struct vm_area_struct *vma, bool is_register)
>  {
> - vm_flags_t flags = VM_HUGETLB | VM_MAYEXEC | VM_SHARED;
> + vm_flags_t flags = VM_HUGETLB | VM_MAYEXEC;

I think a one-line patch changing VM_SHARED to VM_MAYSHARE would do it,
wouldn't it?  And save you from having to export is_cow_mapping()
from mm/memory.c.  (I used is_cow_mapping() because I had to make the
test more complex anyway, just to exclude the case which had been
oddly handled before.)
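
That is, a sketch of the suggested one-liner against the same valid_vma()
(hunk context abbreviated):

--- x/kernel/events/uprobes.c
+++ x/kernel/events/uprobes.c
@@ static bool valid_vma(struct vm_area_struct *vma, bool is_register)
-	vm_flags_t flags = VM_HUGETLB | VM_MAYEXEC | VM_SHARED;
+	vm_flags_t flags = VM_HUGETLB | VM_MAYEXEC | VM_MAYSHARE;

Since (vma->vm_flags & flags) must equal VM_MAYEXEC, masking VM_MAYSHARE in
rejects every VM_MAYSHARE vma, writable or not.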

Hugh

>  
>   if (is_register)
>   flags |= VM_WRITE;
>  
> - return vma->vm_file && (vma->vm_flags & flags) == VM_MAYEXEC;
> + return  vma->vm_file && is_cow_mapping(vma->vm_flags) &&
> + (vma->vm_flags & flags) == VM_MAYEXEC;
>  }
>  
>  static unsigned long offset_to_vaddr(struct vm_area_struct *vma, loff_t offset)


[PATCH opensm] libvendor/osm_vendor_ibumad.c: Support GRH (for GS classes)

2014-04-24 Thread Hal Rosenstock

If a GRH is present in an incoming GS class management packet, convert the
umad GRH information to OpenSM GRH information. On the outgoing side,
convert OpenSM GRH information into umad GRH information.

Note that only base port 0 (GID index 0) is supported.

This is mainly for SA, although other GS classes could use it (but don't).

Note also that SA reports with GRH are not handled by this patch.

Signed-off-by: Hal Rosenstock 
---
diff --git a/libvendor/osm_vendor_ibumad.c b/libvendor/osm_vendor_ibumad.c
index f9d3036..e9651f6 100644
--- a/libvendor/osm_vendor_ibumad.c
+++ b/libvendor/osm_vendor_ibumad.c
@@ -261,9 +261,20 @@ ib_mad_addr_conv(ib_user_mad_t * umad, osm_mad_addr_t * osm_mad_addr,
osm_mad_addr->addr_type.gsi.remote_qkey = ib_mad_addr->qkey;
osm_mad_addr->addr_type.gsi.pkey_ix = umad_get_pkey(umad);
osm_mad_addr->addr_type.gsi.service_level = ib_mad_addr->sl;
-   osm_mad_addr->addr_type.gsi.global_route = 0;   /* FIXME: handle GRH */
-   memset(&osm_mad_addr->addr_type.gsi.grh_info, 0,
-  sizeof osm_mad_addr->addr_type.gsi.grh_info);
+   if (ib_mad_addr->grh_present) {
+   osm_mad_addr->addr_type.gsi.global_route = 1;
+		osm_mad_addr->addr_type.gsi.grh_info.hop_limit = ib_mad_addr->hop_limit;
+   osm_mad_addr->addr_type.gsi.grh_info.ver_class_flow =
+   ib_grh_set_ver_class_flow(6,/* GRH version */
+ ib_mad_addr->traffic_class,
+ ib_mad_addr->flow_label);
+   memcpy(&osm_mad_addr->addr_type.gsi.grh_info.dest_gid,
+  &ib_mad_addr->gid, 16);
+   } else {
+   osm_mad_addr->addr_type.gsi.global_route = 0;
+   memset(&osm_mad_addr->addr_type.gsi.grh_info, 0,
+  sizeof osm_mad_addr->addr_type.gsi.grh_info);
+   }
 }
 
 static void *swap_mad_bufs(osm_madw_t * p_madw, void *umad)
@@ -290,6 +301,7 @@ static void *umad_receiver(void *p_ptr)
osm_mad_addr_t osm_addr;
osm_madw_t *p_madw, *p_req_madw;
ib_mad_t *p_mad, *p_req_mad;
+   ib_mad_addr_t *p_mad_addr;
void *umad = 0;
int mad_agent, length;
 
@@ -342,6 +354,14 @@ static void *umad_receiver(void *p_ptr)
}
 
p_mad = (ib_mad_t *) umad_get_mad(umad);
+   p_mad_addr = umad_get_mad_addr(umad);
+   /* Only support GID index 0 currently */
+   if (p_mad_addr->grh_present && p_mad_addr->gid_index) {
+   OSM_LOG(p_ur->p_log, OSM_LOG_ERROR, "ERR 5409: "
+				"GRH received on GID index %d for mgmt class 0x%x\n",
+   p_mad_addr->gid_index, p_mad->mgmt_class);
+   continue;
+   }
 
ib_mad_addr_conv(umad, &osm_addr,
 p_mad->mgmt_class == IB_MCLASS_SUBN_LID ||
@@ -1070,6 +1090,7 @@ osm_vendor_send(IN osm_bind_handle_t h_bind,
osm_mad_addr_t *const p_mad_addr = osm_madw_get_mad_addr_ptr(p_madw);
ib_mad_t *const p_mad = osm_madw_get_mad_ptr(p_madw);
ib_sa_mad_t *const p_sa = (ib_sa_mad_t *) p_mad;
+   ib_mad_addr_t mad_addr;
int ret = -1;
int __attribute__((__unused__)) is_rmpp = 0;
uint32_t sent_mad_size;
@@ -1098,7 +1119,17 @@ osm_vendor_send(IN osm_bind_handle_t h_bind,
  p_mad_addr->addr_type.gsi.remote_qp,
  p_mad_addr->addr_type.gsi.service_level,
  IB_QP1_WELL_KNOWN_Q_KEY);
-   umad_set_grh(p_vw->umad, NULL); /* FIXME: GRH support */
+   if (p_mad_addr->addr_type.gsi.global_route) {
+   mad_addr.grh_present = 1;
+   mad_addr.gid_index = 0;
+		mad_addr.hop_limit = p_mad_addr->addr_type.gsi.grh_info.hop_limit;
+		ib_grh_get_ver_class_flow(p_mad_addr->addr_type.gsi.grh_info.ver_class_flow,
+ NULL, &mad_addr.traffic_class,
+ &mad_addr.flow_label);
+		memcpy(&mad_addr.gid, &p_mad_addr->addr_type.gsi.grh_info.dest_gid, 16);
+   umad_set_grh(p_vw->umad, &mad_addr);
+   } else
+   umad_set_grh(p_vw->umad, NULL);
umad_set_pkey(p_vw->umad, p_mad_addr->addr_type.gsi.pkey_ix);
	if (ib_class_is_rmpp(p_mad->mgmt_class)) {	/* RMPP GS classes FIXME: no GRH */
if (!ib_rmpp_is_flag_set((ib_rmpp_mad_t *) p_sa,


RE: IB/cma: Make timeout dependent on the subnet timeout

2014-04-24 Thread Hefty, Sean
> Sean, can't we have the CMA follow the same practice used in the CM, where
> we derive the RC QP timeout from the packet life time retrieved in path
> queries? E.g., base the CM response timeout on this value too?

We could.  The timeout that's being modified by the patch is the time needed by 
the remote peer to process the incoming message and send a response.  This time 
is in addition to the packet life time value that gets used.
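
(For reference, both values are IBTA timeout exponents: an exponent t means
4.096 us * 2^t. A sketch of the conversion, mirroring what the CM's
cm_convert_to_ms() helper in drivers/infiniband/core/cm.c does:

	/* 4.096 us * 2^t is roughly 2^(t - 8) milliseconds */
	static inline int iba_time_to_ms(int t)
	{
		return 1 << max(t - 8, 0);
	}

The REQ timeout is then built from both terms, roughly
2 * packet-life-time + remote-CM-response-time, each converted to ms.)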

For a remote kernel agent, the time needed to respond to a CM message may be 
fairly small.  For a user space client, the time may be significant, on the 
order of seconds to minutes.  We can probably make do with a fairly short 
timeout, provided that MRAs are used by the remote side.

There's no great solution that I can think of.  Maybe the RDMA CM can adjust 
the timeout based on the remote address, assuming that it can determine if the 
remote address is a user space or kernel agent.

- Sean


Re: isert for mellanox drivers

2014-04-24 Thread Christoph Hellwig
On Thu, Apr 24, 2014 at 12:54:40PM +0300, sagi grimberg wrote:
> Well, I feel the same way (although less harsh about it), I would
> prefer to have it all inbox.
> As I see it, OFED is useful for customers who want to upgrade RDMA
> functionality (or get tech previews)
> without upgrading their distro or waiting for it to land upstream.

For that we have the compat drivers project, which could easily handle
the rdma drivers as well.

The problem with OFED is (or was the last time I looked) that it's a big
pile that includes backports and new features not submitted or even
rejected upstream.





Re: isert for mellanox drivers

2014-04-24 Thread Jason Gunthorpe
On Thu, Apr 24, 2014 at 10:38:25PM -0700, Christoph Hellwig wrote:
> On Thu, Apr 24, 2014 at 12:54:40PM +0300, sagi grimberg wrote:
> > Well, I feel the same way (although less harsh about it), I would
> > prefer to have it all inbox.
> > As I see it, OFED is useful for customers who want to upgrade RDMA
> > functionality (or get tech previews)
> > without upgrading their distro or waiting for it to land upstream.
> 
> For that we have the compat drivers project, which could easily handle
> the rdma drivers as well.
> 
> The problem with OFED is (or was the last time I looked) that it's a big
> pile that includes backports and new features not submitted or even
> rejected upstream.

Official OFA OFED is now strictly backports from a given kernel
version and, TBH, is not widely used now that everything is included
in the modern distros.

The vendor 'OFEDs' remain a big pile. I'm not even sure source is
provided for them... at least it isn't readily apparent.

IMHO, the vendors should not be co-opting the OFED branding, but that
is a whole other topic

Jason


Re: isert for mellanox drivers

2014-04-24 Thread Christoph Hellwig
On Thu, Apr 24, 2014 at 11:55:08PM -0600, Jason Gunthorpe wrote:
> Official OFA OFED is now strictly backports from a given kernel
> version and, TBH, is not widely used now that everything is included
> in the modern distros.
> 
> The vendor 'OFEDs' remain a big pile. I'm not even sure source is
> provided for them... at least it isn't readily apparent.
> 
> IMHO, the vendors should not be co-opting the OFED branding, but that
> is a whole other topic

Thanks for the clarification Jason!  I'll take back my rant and will
apply it to the vendors instead :)
