Re: [PATCH] ib_isert: prevent NULL pointer dereference in isert_login_recv_done()
On Thu, 2017-06-29 at 11:28 +0300, Sagi Grimberg wrote: > >> Can you test just the one liner fix below? > >> > @@ -1452,7 +1452,7 @@ > isert_login_recv_done(struct ib_cq *cq, struct ib_wc *wc) > { > struct isert_conn *isert_conn = wc->qp->qp_context; > -struct ib_device *ib_dev = isert_conn->cm_id->device; > +struct ib_device *ib_dev = isert_conn->device->ib_device; > if (unlikely(wc->status != IB_WC_SUCCESS)) { > isert_print_wc(wc, "login recv"); > > > > I'll test also this one-liner fix as soon as I can. > > > > But I can say that I'm pretty sure it will work as well, because all the > > previous NULL pointer dereferences that we've got in the past happened > > all 100% in isert_login_recv_done(). The other cases are probably a safe > > precaution, but they can't really happen. > > Agree, I'd prefer to start with a surgical fix before moving forward. Sounds fine. The single line fix above in isert_login_recv_done() has been queued up in target-pending/for-next with Andrea's Tested-by and your Reviewed-by: https://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git/commit/?h=for-next=702c0f17403765cc5aa1c18f6ea6eb549c1bac9b Please let me know if there are any other issues wrt the surgical fix.
Re: [PATCH] ib_isert: prevent NULL pointer dereference in isert_login_recv_done()
On Thu, 2017-06-29 at 11:28 +0300, Sagi Grimberg wrote: > >> Can you test just the one liner fix below? > >> > @@ -1452,7 +1452,7 @@ > isert_login_recv_done(struct ib_cq *cq, struct ib_wc *wc) > { > struct isert_conn *isert_conn = wc->qp->qp_context; > -struct ib_device *ib_dev = isert_conn->cm_id->device; > +struct ib_device *ib_dev = isert_conn->device->ib_device; > if (unlikely(wc->status != IB_WC_SUCCESS)) { > isert_print_wc(wc, "login recv"); > > > > I'll test also this one-liner fix as soon as I can. > > > > But I can say that I'm pretty sure it will work as well, because all the > > previous NULL pointer dereferences that we've got in the past happened > > all 100% in isert_login_recv_done(). The other cases are probably a safe > > precaution, but they can't really happen. > > Agree, I'd prefer to start with a surgical fix before moving forward. Sounds fine. The single line fix above in isert_login_recv_done() has been queued up in target-pending/for-next with Andrea's Tested-by and your Reviewed-by: https://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git/commit/?h=for-next=702c0f17403765cc5aa1c18f6ea6eb549c1bac9b Please let me know if there are any other issues wrt the surgical fix.
Re: [PATCH] ib_isert: prevent NULL pointer dereference in isert_login_recv_done()
Can you test just the one liner fix below? @@ -1452,7 +1452,7 @@ isert_login_recv_done(struct ib_cq *cq, struct ib_wc *wc) { struct isert_conn *isert_conn = wc->qp->qp_context; - struct ib_device *ib_dev = isert_conn->cm_id->device; + struct ib_device *ib_dev = isert_conn->device->ib_device; if (unlikely(wc->status != IB_WC_SUCCESS)) { isert_print_wc(wc, "login recv"); I'll test also this one-liner fix as soon as I can. But I can say that I'm pretty sure it will work as well, because all the previous NULL pointer dereferences that we've got in the past happened all 100% in isert_login_recv_done(). The other cases are probably a safe precaution, but they can't really happen. Agree, I'd prefer to start with a surgical fix before moving forward.
Re: [PATCH] ib_isert: prevent NULL pointer dereference in isert_login_recv_done()
Can you test just the one liner fix below? @@ -1452,7 +1452,7 @@ isert_login_recv_done(struct ib_cq *cq, struct ib_wc *wc) { struct isert_conn *isert_conn = wc->qp->qp_context; - struct ib_device *ib_dev = isert_conn->cm_id->device; + struct ib_device *ib_dev = isert_conn->device->ib_device; if (unlikely(wc->status != IB_WC_SUCCESS)) { isert_print_wc(wc, "login recv"); I'll test also this one-liner fix as soon as I can. But I can say that I'm pretty sure it will work as well, because all the previous NULL pointer dereferences that we've got in the past happened all 100% in isert_login_recv_done(). The other cases are probably a safe precaution, but they can't really happen. Agree, I'd prefer to start with a surgical fix before moving forward.
Re: [PATCH] ib_isert: prevent NULL pointer dereference in isert_login_recv_done()
On Thu, Jun 29, 2017 at 08:36:51AM +0300, Sagi Grimberg wrote: > > >Just tested this patch, I wasn't able to reproduce the NULL pointer > >dereference or any other bugs, so this fix seems safe enough to me. > > > >Tested-by: Andrea Righi> > Can you test just the one liner fix below? > > >>@@ -1452,7 +1452,7 @@ > >> isert_login_recv_done(struct ib_cq *cq, struct ib_wc *wc) > >> { > >>struct isert_conn *isert_conn = wc->qp->qp_context; > >>- struct ib_device *ib_dev = isert_conn->cm_id->device; > >>+ struct ib_device *ib_dev = isert_conn->device->ib_device; > >>if (unlikely(wc->status != IB_WC_SUCCESS)) { > >>isert_print_wc(wc, "login recv"); I'll test also this one-liner fix as soon as I can. But I can say that I'm pretty sure it will work as well, because all the previous NULL pointer dereferences that we've got in the past happened all 100% in isert_login_recv_done(). The other cases are probably a safe precaution, but they can't really happen. -Andrea
Re: [PATCH] ib_isert: prevent NULL pointer dereference in isert_login_recv_done()
On Thu, Jun 29, 2017 at 08:36:51AM +0300, Sagi Grimberg wrote: > > >Just tested this patch, I wasn't able to reproduce the NULL pointer > >dereference or any other bugs, so this fix seems safe enough to me. > > > >Tested-by: Andrea Righi > > Can you test just the one liner fix below? > > >>@@ -1452,7 +1452,7 @@ > >> isert_login_recv_done(struct ib_cq *cq, struct ib_wc *wc) > >> { > >>struct isert_conn *isert_conn = wc->qp->qp_context; > >>- struct ib_device *ib_dev = isert_conn->cm_id->device; > >>+ struct ib_device *ib_dev = isert_conn->device->ib_device; > >>if (unlikely(wc->status != IB_WC_SUCCESS)) { > >>isert_print_wc(wc, "login recv"); I'll test also this one-liner fix as soon as I can. But I can say that I'm pretty sure it will work as well, because all the previous NULL pointer dereferences that we've got in the past happened all 100% in isert_login_recv_done(). The other cases are probably a safe precaution, but they can't really happen. -Andrea
Re: [PATCH] ib_isert: prevent NULL pointer dereference in isert_login_recv_done()
Just tested this patch, I wasn't able to reproduce the NULL pointer dereference or any other bugs, so this fix seems safe enough to me. Tested-by: Andrea RighiCan you test just the one liner fix below? @@ -1452,7 +1452,7 @@ isert_login_recv_done(struct ib_cq *cq, struct ib_wc *wc) { struct isert_conn *isert_conn = wc->qp->qp_context; - struct ib_device *ib_dev = isert_conn->cm_id->device; + struct ib_device *ib_dev = isert_conn->device->ib_device; if (unlikely(wc->status != IB_WC_SUCCESS)) { isert_print_wc(wc, "login recv");
Re: [PATCH] ib_isert: prevent NULL pointer dereference in isert_login_recv_done()
Just tested this patch, I wasn't able to reproduce the NULL pointer dereference or any other bugs, so this fix seems safe enough to me. Tested-by: Andrea Righi Can you test just the one liner fix below? @@ -1452,7 +1452,7 @@ isert_login_recv_done(struct ib_cq *cq, struct ib_wc *wc) { struct isert_conn *isert_conn = wc->qp->qp_context; - struct ib_device *ib_dev = isert_conn->cm_id->device; + struct ib_device *ib_dev = isert_conn->device->ib_device; if (unlikely(wc->status != IB_WC_SUCCESS)) { isert_print_wc(wc, "login recv");
Re: [PATCH] ib_isert: prevent NULL pointer dereference in isert_login_recv_done() (was: Re: NULL pointer dereference in isert_login_recv_done in 4.9.32)
On Sun, Jun 25, 2017 at 04:58:04PM -0700, Nicholas A. Bellinger wrote: ... > So I assume isert_cma_handler() -> isert_connect_error() getting called > to clear isert_conn->cm_id before connection established, and > subsequently isert_conn->login_req_buf->rx_cqe.done() -> > isert_login_recv_done() still getting invoked after connection failure > is new RDMA API behavior.. > > That said, following Sagi's original patch that added the clearing of > isert_conn->cm_id to isert_connect_error(), it probably makes sense to > use isert_conn->device->ib_device, and avoid dereferencing > isert_conn->cm_id before connection is established. > > Here's a quick (untested) patch to do this for recv/send done callbacks, > and avoid using isert_conn->cm_id during isert_rdma_accept(). > > Sagi + Co, WDYT..? Just tested this patch, I wasn't able to reproduce the NULL pointer dereference or any other bugs, so this fix seems safe enough to me. Tested-by: Andrea Righi> > diff --git a/drivers/infiniband/ulp/isert/ib_isert.c > b/drivers/infiniband/ulp/isert/ib_isert.c > index fcbed35..f7f97f3 100644 > --- a/drivers/infiniband/ulp/isert/ib_isert.c > +++ b/drivers/infiniband/ulp/isert/ib_isert.c > @@ -52,7 +52,7 @@ > static int > isert_login_post_recv(struct isert_conn *isert_conn); > static int > -isert_rdma_accept(struct isert_conn *isert_conn); > +isert_rdma_accept(struct isert_conn *isert_conn, struct rdma_cm_id *cm_id); > struct rdma_cm_id *isert_setup_id(struct isert_np *isert_np); > > static void isert_release_work(struct work_struct *work); > @@ -543,7 +543,7 @@ > if (ret) > goto out_conn_dev; > > - ret = isert_rdma_accept(isert_conn); > + ret = isert_rdma_accept(isert_conn, cma_id); > if (ret) > goto out_conn_dev; > > @@ -1452,7 +1452,7 @@ > isert_login_recv_done(struct ib_cq *cq, struct ib_wc *wc) > { > struct isert_conn *isert_conn = wc->qp->qp_context; > - struct ib_device *ib_dev = isert_conn->cm_id->device; > + struct ib_device *ib_dev = isert_conn->device->ib_device; > > if (unlikely(wc->status != IB_WC_SUCCESS)) { > isert_print_wc(wc, "login recv"); > @@ -1769,7 +1769,7 @@ > isert_login_send_done(struct ib_cq *cq, struct ib_wc *wc) > { > struct isert_conn *isert_conn = wc->qp->qp_context; > - struct ib_device *ib_dev = isert_conn->cm_id->device; > + struct ib_device *ib_dev = isert_conn->device->ib_device; > struct iser_tx_desc *tx_desc = cqe_to_tx_desc(wc->wr_cqe); > > if (unlikely(wc->status != IB_WC_SUCCESS)) { > @@ -2369,9 +2369,8 @@ struct rdma_cm_id * > } > > static int > -isert_rdma_accept(struct isert_conn *isert_conn) > +isert_rdma_accept(struct isert_conn *isert_conn, struct rdma_cm_id *cm_id) > { > - struct rdma_cm_id *cm_id = isert_conn->cm_id; > struct rdma_conn_param cp; > int ret; > struct iser_cm_hdr rsp_hdr;
Re: [PATCH] ib_isert: prevent NULL pointer dereference in isert_login_recv_done() (was: Re: NULL pointer dereference in isert_login_recv_done in 4.9.32)
On Sun, Jun 25, 2017 at 04:58:04PM -0700, Nicholas A. Bellinger wrote: ... > So I assume isert_cma_handler() -> isert_connect_error() getting called > to clear isert_conn->cm_id before connection established, and > subsequently isert_conn->login_req_buf->rx_cqe.done() -> > isert_login_recv_done() still getting invoked after connection failure > is new RDMA API behavior.. > > That said, following Sagi's original patch that added the clearing of > isert_conn->cm_id to isert_connect_error(), it probably makes sense to > use isert_conn->device->ib_device, and avoid dereferencing > isert_conn->cm_id before connection is established. > > Here's a quick (untested) patch to do this for recv/send done callbacks, > and avoid using isert_conn->cm_id during isert_rdma_accept(). > > Sagi + Co, WDYT..? Just tested this patch, I wasn't able to reproduce the NULL pointer dereference or any other bugs, so this fix seems safe enough to me. Tested-by: Andrea Righi > > diff --git a/drivers/infiniband/ulp/isert/ib_isert.c > b/drivers/infiniband/ulp/isert/ib_isert.c > index fcbed35..f7f97f3 100644 > --- a/drivers/infiniband/ulp/isert/ib_isert.c > +++ b/drivers/infiniband/ulp/isert/ib_isert.c > @@ -52,7 +52,7 @@ > static int > isert_login_post_recv(struct isert_conn *isert_conn); > static int > -isert_rdma_accept(struct isert_conn *isert_conn); > +isert_rdma_accept(struct isert_conn *isert_conn, struct rdma_cm_id *cm_id); > struct rdma_cm_id *isert_setup_id(struct isert_np *isert_np); > > static void isert_release_work(struct work_struct *work); > @@ -543,7 +543,7 @@ > if (ret) > goto out_conn_dev; > > - ret = isert_rdma_accept(isert_conn); > + ret = isert_rdma_accept(isert_conn, cma_id); > if (ret) > goto out_conn_dev; > > @@ -1452,7 +1452,7 @@ > isert_login_recv_done(struct ib_cq *cq, struct ib_wc *wc) > { > struct isert_conn *isert_conn = wc->qp->qp_context; > - struct ib_device *ib_dev = isert_conn->cm_id->device; > + struct ib_device *ib_dev = isert_conn->device->ib_device; > > if (unlikely(wc->status != IB_WC_SUCCESS)) { > isert_print_wc(wc, "login recv"); > @@ -1769,7 +1769,7 @@ > isert_login_send_done(struct ib_cq *cq, struct ib_wc *wc) > { > struct isert_conn *isert_conn = wc->qp->qp_context; > - struct ib_device *ib_dev = isert_conn->cm_id->device; > + struct ib_device *ib_dev = isert_conn->device->ib_device; > struct iser_tx_desc *tx_desc = cqe_to_tx_desc(wc->wr_cqe); > > if (unlikely(wc->status != IB_WC_SUCCESS)) { > @@ -2369,9 +2369,8 @@ struct rdma_cm_id * > } > > static int > -isert_rdma_accept(struct isert_conn *isert_conn) > +isert_rdma_accept(struct isert_conn *isert_conn, struct rdma_cm_id *cm_id) > { > - struct rdma_cm_id *cm_id = isert_conn->cm_id; > struct rdma_conn_param cp; > int ret; > struct iser_cm_hdr rsp_hdr;
Re: [PATCH] ib_isert: prevent NULL pointer dereference in isert_login_recv_done()
Hi Andrea & Robert, Hi Andrea, Robert, Nic and Co We have hit this four times today. Any ideas? [ 169.382113] BUG: unable to handle kernel NULL pointer dereference at (null) [ 169.382152] IP: [] isert_login_recv_done+0x28/0x170 [ib_isert] So, we spent more time to track down this bug. It seems that at login time an error is happening, not sure exactly what kind of error, but isert_connect_error() is called and isert_conn->cm_id is set to NULL. Later, isert_login_recv_done() is trying to access isert_conn->cm_id->device and we get the NULL pointer dereference. Following there's the patch that we have applied to track down this problem. And this is what we see in dmesg with this patch applied: [ 658.633188] isert: isert_connect_error: conn 887f2209c000 error [ 658.633226] isert: isert_login_recv_done: login with broken rdma_cm_id As we can see isert_connect_error() is called before isert_login_recv_done and at that point isert_conn->cm_id is NULL. Obviously simply checking if the pointer is NULL, returning and ignoring the error in isert_login_recv_done() is not the best fix ever and I'm not sure if I'm breaking something else doing so (even if with this patch the kernel doesn't crash and I've not seen any problem so far). Maybe a better way is to tear down the whole connection when this particular case is happening? Suggestions? That is an accurate analysis of what is going on, good job! So I assume isert_cma_handler() -> isert_connect_error() getting called to clear isert_conn->cm_id before connection established, and subsequently isert_conn->login_req_buf->rx_cqe.done() -> isert_login_recv_done() still getting invoked after connection failure is new RDMA API behavior.. That is correct, but its not really a new behavior, and absolutely not introduced by the new RDMA API. We will _always_ see the completion of _all_ recv wrs posted on the qp (given that we assigned a ->done handler), this is a FLUSH error completion, we just don't get to verify that because we deref NULL before. The issue here, as you indicated was the assumption that dereferencing the connection cm_id is always safe, which is not true since: commit 4a579da2586bd3b79b025947ea24ede2bbfede62 Author: Sagi GrimbergDate: Sun Mar 29 15:52:04 2015 +0300 iser-target: Fix possible deadlock in RDMA_CM connection error Before we reach to connection established we may get an error event. In this case the core won't teardown this connection (never established it), so we take care of freeing it ourselves. Signed-off-by: Sagi Grimberg Cc: # v3.10+ Signed-off-by: Nicholas Bellinger As I see it, we have a direct reference to the isert_device from isert_conn which is the one-liner fix that we actually need (like we do in isert_rdma_read_done, isert_rdma_write_done), guess we also need it in isert_login_send_done, isert_send_done (although it can't really happen, but better safe and consistent than sorry). @@ -1452,7 +1452,7 @@ isert_login_recv_done(struct ib_cq *cq, struct ib_wc *wc) { struct isert_conn *isert_conn = wc->qp->qp_context; - struct ib_device *ib_dev = isert_conn->cm_id->device; + struct ib_device *ib_dev = isert_conn->device->ib_device; if (unlikely(wc->status != IB_WC_SUCCESS)) { isert_print_wc(wc, "login recv"); That's the one...
Re: [PATCH] ib_isert: prevent NULL pointer dereference in isert_login_recv_done()
Hi Andrea & Robert, Hi Andrea, Robert, Nic and Co We have hit this four times today. Any ideas? [ 169.382113] BUG: unable to handle kernel NULL pointer dereference at (null) [ 169.382152] IP: [] isert_login_recv_done+0x28/0x170 [ib_isert] So, we spent more time to track down this bug. It seems that at login time an error is happening, not sure exactly what kind of error, but isert_connect_error() is called and isert_conn->cm_id is set to NULL. Later, isert_login_recv_done() is trying to access isert_conn->cm_id->device and we get the NULL pointer dereference. Following there's the patch that we have applied to track down this problem. And this is what we see in dmesg with this patch applied: [ 658.633188] isert: isert_connect_error: conn 887f2209c000 error [ 658.633226] isert: isert_login_recv_done: login with broken rdma_cm_id As we can see isert_connect_error() is called before isert_login_recv_done and at that point isert_conn->cm_id is NULL. Obviously simply checking if the pointer is NULL, returning and ignoring the error in isert_login_recv_done() is not the best fix ever and I'm not sure if I'm breaking something else doing so (even if with this patch the kernel doesn't crash and I've not seen any problem so far). Maybe a better way is to tear down the whole connection when this particular case is happening? Suggestions? That is an accurate analysis of what is going on, good job! So I assume isert_cma_handler() -> isert_connect_error() getting called to clear isert_conn->cm_id before connection established, and subsequently isert_conn->login_req_buf->rx_cqe.done() -> isert_login_recv_done() still getting invoked after connection failure is new RDMA API behavior.. That is correct, but its not really a new behavior, and absolutely not introduced by the new RDMA API. We will _always_ see the completion of _all_ recv wrs posted on the qp (given that we assigned a ->done handler), this is a FLUSH error completion, we just don't get to verify that because we deref NULL before. The issue here, as you indicated was the assumption that dereferencing the connection cm_id is always safe, which is not true since: commit 4a579da2586bd3b79b025947ea24ede2bbfede62 Author: Sagi Grimberg Date: Sun Mar 29 15:52:04 2015 +0300 iser-target: Fix possible deadlock in RDMA_CM connection error Before we reach to connection established we may get an error event. In this case the core won't teardown this connection (never established it), so we take care of freeing it ourselves. Signed-off-by: Sagi Grimberg Cc: # v3.10+ Signed-off-by: Nicholas Bellinger As I see it, we have a direct reference to the isert_device from isert_conn which is the one-liner fix that we actually need (like we do in isert_rdma_read_done, isert_rdma_write_done), guess we also need it in isert_login_send_done, isert_send_done (although it can't really happen, but better safe and consistent than sorry). @@ -1452,7 +1452,7 @@ isert_login_recv_done(struct ib_cq *cq, struct ib_wc *wc) { struct isert_conn *isert_conn = wc->qp->qp_context; - struct ib_device *ib_dev = isert_conn->cm_id->device; + struct ib_device *ib_dev = isert_conn->device->ib_device; if (unlikely(wc->status != IB_WC_SUCCESS)) { isert_print_wc(wc, "login recv"); That's the one...
Re: [PATCH] ib_isert: prevent NULL pointer dereference in isert_login_recv_done() (was: Re: NULL pointer dereference in isert_login_recv_done in 4.9.32)
On Fri, Jun 23, 2017 at 12:37:57AM +0200, Andrea Righi wrote: > On Wed, Jun 21, 2017 at 10:33:45AM -0600, Robert LeBlanc wrote: > > On Wed, Jun 21, 2017 at 9:17 AM, Robert LeBlanc> > wrote: > > > On Tue, Jun 20, 2017 at 12:54 PM, Robert LeBlanc > > > wrote: > > >> We have hit this four times today. Any ideas? > > >> > > >> [ 169.382113] BUG: unable to handle kernel NULL pointer dereference at > > >> (null) > > >> [ 169.382152] IP: [] isert_login_recv_done+0x28/0x170 > > >> [ib_isert] > > So, we spent more time to track down this bug. > > It seems that at login time an error is happening, not sure exactly what > kind of error, but isert_connect_error() is called and isert_conn->cm_id > is set to NULL. > > Later, isert_login_recv_done() is trying to access > isert_conn->cm_id->device and we get the NULL pointer dereference. > > Following there's the patch that we have applied to track down this > problem. > > And this is what we see in dmesg with this patch applied: > > [ 658.633188] isert: isert_connect_error: conn 887f2209c000 error > [ 658.633226] isert: isert_login_recv_done: login with broken rdma_cm_id > > As we can see isert_connect_error() is called before isert_login_recv_done > and at that point isert_conn->cm_id is NULL. > > Obviously simply checking if the pointer is NULL, returning and ignoring > the error in isert_login_recv_done() is not the best fix ever and I'm > not sure if I'm breaking something else doing so (even if with this > patch the kernel doesn't crash and I've not seen any problem so far). > > Maybe a better way is to tear down the whole connection when this > particular case is happening? Suggestions? > > Thanks, > -Andrea > > --- > ib_isert: prevent NULL pointer dereference in isert_login_recv_done() > > During a login if an error is happening isert_connect_error() is called > and isert_conn->cm_id is set to NULL. > > Later, isert_login_recv_done() is executed, trying to access > isert_conn->cm_id->device, causing the following BUG: > > [ 169.382113] BUG: unable to handle kernel NULL pointer dereference at > (null) > [ 169.382152] IP: [] isert_login_recv_done+0x28/0x170 > [ib_isert] > > Check if isert_con->cm_id is set to NULL in isert_login_recv_done() to > avoid this problem. > > Signed-off-by: Andrea Righi > Signed-off-by: Robert LeBlanc > --- > drivers/infiniband/ulp/isert/ib_isert.c | 9 - > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/drivers/infiniband/ulp/isert/ib_isert.c > b/drivers/infiniband/ulp/isert/ib_isert.c > index fcbed35..a8c1143 100644 > --- a/drivers/infiniband/ulp/isert/ib_isert.c > +++ b/drivers/infiniband/ulp/isert/ib_isert.c > @@ -741,6 +741,7 @@ isert_connect_error(struct rdma_cm_id *cma_id) > { > struct isert_conn *isert_conn = cma_id->qp->qp_context; > > + isert_warn("conn %p error\n", isert_conn); the same can be achieved with tracing, please don't put it. > list_del_init(_conn->node); > isert_conn->cm_id = NULL; > isert_put_conn(isert_conn); > @@ -1452,7 +1453,13 @@ static void > isert_login_recv_done(struct ib_cq *cq, struct ib_wc *wc) > { > struct isert_conn *isert_conn = wc->qp->qp_context; > - struct ib_device *ib_dev = isert_conn->cm_id->device; > +struct ib_device *ib_dev; > + > + if (unlikely(isert_conn->cm_id == NULL)) { There is no need to explicitly compare with NULL. !isert_conn->cm_id will do the trick. Thanks > + isert_warn("login with broken rdma_cm_id"); > + return; > + } > + ib_dev = isert_conn->cm_id->device; > > if (unlikely(wc->status != IB_WC_SUCCESS)) { > isert_print_wc(wc, "login recv"); > -- > 2.7.4 > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html signature.asc Description: PGP signature
Re: [PATCH] ib_isert: prevent NULL pointer dereference in isert_login_recv_done() (was: Re: NULL pointer dereference in isert_login_recv_done in 4.9.32)
On Fri, Jun 23, 2017 at 12:37:57AM +0200, Andrea Righi wrote: > On Wed, Jun 21, 2017 at 10:33:45AM -0600, Robert LeBlanc wrote: > > On Wed, Jun 21, 2017 at 9:17 AM, Robert LeBlanc > > wrote: > > > On Tue, Jun 20, 2017 at 12:54 PM, Robert LeBlanc > > > wrote: > > >> We have hit this four times today. Any ideas? > > >> > > >> [ 169.382113] BUG: unable to handle kernel NULL pointer dereference at > > >> (null) > > >> [ 169.382152] IP: [] isert_login_recv_done+0x28/0x170 > > >> [ib_isert] > > So, we spent more time to track down this bug. > > It seems that at login time an error is happening, not sure exactly what > kind of error, but isert_connect_error() is called and isert_conn->cm_id > is set to NULL. > > Later, isert_login_recv_done() is trying to access > isert_conn->cm_id->device and we get the NULL pointer dereference. > > Following there's the patch that we have applied to track down this > problem. > > And this is what we see in dmesg with this patch applied: > > [ 658.633188] isert: isert_connect_error: conn 887f2209c000 error > [ 658.633226] isert: isert_login_recv_done: login with broken rdma_cm_id > > As we can see isert_connect_error() is called before isert_login_recv_done > and at that point isert_conn->cm_id is NULL. > > Obviously simply checking if the pointer is NULL, returning and ignoring > the error in isert_login_recv_done() is not the best fix ever and I'm > not sure if I'm breaking something else doing so (even if with this > patch the kernel doesn't crash and I've not seen any problem so far). > > Maybe a better way is to tear down the whole connection when this > particular case is happening? Suggestions? > > Thanks, > -Andrea > > --- > ib_isert: prevent NULL pointer dereference in isert_login_recv_done() > > During a login if an error is happening isert_connect_error() is called > and isert_conn->cm_id is set to NULL. > > Later, isert_login_recv_done() is executed, trying to access > isert_conn->cm_id->device, causing the following BUG: > > [ 169.382113] BUG: unable to handle kernel NULL pointer dereference at > (null) > [ 169.382152] IP: [] isert_login_recv_done+0x28/0x170 > [ib_isert] > > Check if isert_con->cm_id is set to NULL in isert_login_recv_done() to > avoid this problem. > > Signed-off-by: Andrea Righi > Signed-off-by: Robert LeBlanc > --- > drivers/infiniband/ulp/isert/ib_isert.c | 9 - > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/drivers/infiniband/ulp/isert/ib_isert.c > b/drivers/infiniband/ulp/isert/ib_isert.c > index fcbed35..a8c1143 100644 > --- a/drivers/infiniband/ulp/isert/ib_isert.c > +++ b/drivers/infiniband/ulp/isert/ib_isert.c > @@ -741,6 +741,7 @@ isert_connect_error(struct rdma_cm_id *cma_id) > { > struct isert_conn *isert_conn = cma_id->qp->qp_context; > > + isert_warn("conn %p error\n", isert_conn); the same can be achieved with tracing, please don't put it. > list_del_init(_conn->node); > isert_conn->cm_id = NULL; > isert_put_conn(isert_conn); > @@ -1452,7 +1453,13 @@ static void > isert_login_recv_done(struct ib_cq *cq, struct ib_wc *wc) > { > struct isert_conn *isert_conn = wc->qp->qp_context; > - struct ib_device *ib_dev = isert_conn->cm_id->device; > +struct ib_device *ib_dev; > + > + if (unlikely(isert_conn->cm_id == NULL)) { There is no need to explicitly compare with NULL. !isert_conn->cm_id will do the trick. Thanks > + isert_warn("login with broken rdma_cm_id"); > + return; > + } > + ib_dev = isert_conn->cm_id->device; > > if (unlikely(wc->status != IB_WC_SUCCESS)) { > isert_print_wc(wc, "login recv"); > -- > 2.7.4 > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html signature.asc Description: PGP signature
Re: [PATCH] ib_isert: prevent NULL pointer dereference in isert_login_recv_done() (was: Re: NULL pointer dereference in isert_login_recv_done in 4.9.32)
Hi Andrea & Robert, (Adding HCH CC') On Fri, 2017-06-23 at 00:37 +0200, Andrea Righi wrote: > On Wed, Jun 21, 2017 at 10:33:45AM -0600, Robert LeBlanc wrote: > > On Wed, Jun 21, 2017 at 9:17 AM, Robert LeBlanc> > wrote: > > > On Tue, Jun 20, 2017 at 12:54 PM, Robert LeBlanc > > > wrote: > > >> We have hit this four times today. Any ideas? > > >> > > >> [ 169.382113] BUG: unable to handle kernel NULL pointer dereference at > > >> (null) > > >> [ 169.382152] IP: [] isert_login_recv_done+0x28/0x170 > > >> [ib_isert] > > So, we spent more time to track down this bug. > > It seems that at login time an error is happening, not sure exactly what > kind of error, but isert_connect_error() is called and isert_conn->cm_id > is set to NULL. > > Later, isert_login_recv_done() is trying to access > isert_conn->cm_id->device and we get the NULL pointer dereference. > > Following there's the patch that we have applied to track down this > problem. > > And this is what we see in dmesg with this patch applied: > > [ 658.633188] isert: isert_connect_error: conn 887f2209c000 error > [ 658.633226] isert: isert_login_recv_done: login with broken rdma_cm_id > > As we can see isert_connect_error() is called before isert_login_recv_done > and at that point isert_conn->cm_id is NULL. > > Obviously simply checking if the pointer is NULL, returning and ignoring > the error in isert_login_recv_done() is not the best fix ever and I'm > not sure if I'm breaking something else doing so (even if with this > patch the kernel doesn't crash and I've not seen any problem so far). > > Maybe a better way is to tear down the whole connection when this > particular case is happening? Suggestions? > > Thanks, > -Andrea > > --- > ib_isert: prevent NULL pointer dereference in isert_login_recv_done() > > During a login if an error is happening isert_connect_error() is called > and isert_conn->cm_id is set to NULL. > > Later, isert_login_recv_done() is executed, trying to access > isert_conn->cm_id->device, causing the following BUG: > > [ 169.382113] BUG: unable to handle kernel NULL pointer dereference at > (null) > [ 169.382152] IP: [] isert_login_recv_done+0x28/0x170 > [ib_isert] > > Check if isert_con->cm_id is set to NULL in isert_login_recv_done() to > avoid this problem. > > Signed-off-by: Andrea Righi > Signed-off-by: Robert LeBlanc > --- > drivers/infiniband/ulp/isert/ib_isert.c | 9 - > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/drivers/infiniband/ulp/isert/ib_isert.c > b/drivers/infiniband/ulp/isert/ib_isert.c > index fcbed35..a8c1143 100644 > --- a/drivers/infiniband/ulp/isert/ib_isert.c > +++ b/drivers/infiniband/ulp/isert/ib_isert.c > @@ -741,6 +741,7 @@ isert_connect_error(struct rdma_cm_id *cma_id) > { > struct isert_conn *isert_conn = cma_id->qp->qp_context; > > + isert_warn("conn %p error\n", isert_conn); > list_del_init(_conn->node); > isert_conn->cm_id = NULL; > isert_put_conn(isert_conn); > @@ -1452,7 +1453,13 @@ static void > isert_login_recv_done(struct ib_cq *cq, struct ib_wc *wc) > { > struct isert_conn *isert_conn = wc->qp->qp_context; > - struct ib_device *ib_dev = isert_conn->cm_id->device; > +struct ib_device *ib_dev; > + > + if (unlikely(isert_conn->cm_id == NULL)) { > + isert_warn("login with broken rdma_cm_id"); > + return; > + } > + ib_dev = isert_conn->cm_id->device; > > if (unlikely(wc->status != IB_WC_SUCCESS)) { > isert_print_wc(wc, "login recv"); So I assume isert_cma_handler() -> isert_connect_error() getting called to clear isert_conn->cm_id before connection established, and subsequently isert_conn->login_req_buf->rx_cqe.done() -> isert_login_recv_done() still getting invoked after connection failure is new RDMA API behavior.. That said, following Sagi's original patch that added the clearing of isert_conn->cm_id to isert_connect_error(), it probably makes sense to use isert_conn->device->ib_device, and avoid dereferencing isert_conn->cm_id before connection is established. Here's a quick (untested) patch to do this for recv/send done callbacks, and avoid using isert_conn->cm_id during isert_rdma_accept(). Sagi + Co, WDYT..? diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c index fcbed35..f7f97f3 100644 --- a/drivers/infiniband/ulp/isert/ib_isert.c +++ b/drivers/infiniband/ulp/isert/ib_isert.c @@ -52,7 +52,7 @@ static int isert_login_post_recv(struct isert_conn *isert_conn); static int -isert_rdma_accept(struct isert_conn *isert_conn); +isert_rdma_accept(struct isert_conn *isert_conn, struct rdma_cm_id *cm_id); struct rdma_cm_id *isert_setup_id(struct isert_np *isert_np); static void isert_release_work(struct work_struct *work); @@ -543,7 +543,7 @@ if (ret)
Re: [PATCH] ib_isert: prevent NULL pointer dereference in isert_login_recv_done() (was: Re: NULL pointer dereference in isert_login_recv_done in 4.9.32)
Hi Andrea & Robert, (Adding HCH CC') On Fri, 2017-06-23 at 00:37 +0200, Andrea Righi wrote: > On Wed, Jun 21, 2017 at 10:33:45AM -0600, Robert LeBlanc wrote: > > On Wed, Jun 21, 2017 at 9:17 AM, Robert LeBlanc > > wrote: > > > On Tue, Jun 20, 2017 at 12:54 PM, Robert LeBlanc > > > wrote: > > >> We have hit this four times today. Any ideas? > > >> > > >> [ 169.382113] BUG: unable to handle kernel NULL pointer dereference at > > >> (null) > > >> [ 169.382152] IP: [] isert_login_recv_done+0x28/0x170 > > >> [ib_isert] > > So, we spent more time to track down this bug. > > It seems that at login time an error is happening, not sure exactly what > kind of error, but isert_connect_error() is called and isert_conn->cm_id > is set to NULL. > > Later, isert_login_recv_done() is trying to access > isert_conn->cm_id->device and we get the NULL pointer dereference. > > Following there's the patch that we have applied to track down this > problem. > > And this is what we see in dmesg with this patch applied: > > [ 658.633188] isert: isert_connect_error: conn 887f2209c000 error > [ 658.633226] isert: isert_login_recv_done: login with broken rdma_cm_id > > As we can see isert_connect_error() is called before isert_login_recv_done > and at that point isert_conn->cm_id is NULL. > > Obviously simply checking if the pointer is NULL, returning and ignoring > the error in isert_login_recv_done() is not the best fix ever and I'm > not sure if I'm breaking something else doing so (even if with this > patch the kernel doesn't crash and I've not seen any problem so far). > > Maybe a better way is to tear down the whole connection when this > particular case is happening? Suggestions? > > Thanks, > -Andrea > > --- > ib_isert: prevent NULL pointer dereference in isert_login_recv_done() > > During a login if an error is happening isert_connect_error() is called > and isert_conn->cm_id is set to NULL. > > Later, isert_login_recv_done() is executed, trying to access > isert_conn->cm_id->device, causing the following BUG: > > [ 169.382113] BUG: unable to handle kernel NULL pointer dereference at > (null) > [ 169.382152] IP: [] isert_login_recv_done+0x28/0x170 > [ib_isert] > > Check if isert_con->cm_id is set to NULL in isert_login_recv_done() to > avoid this problem. > > Signed-off-by: Andrea Righi > Signed-off-by: Robert LeBlanc > --- > drivers/infiniband/ulp/isert/ib_isert.c | 9 - > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/drivers/infiniband/ulp/isert/ib_isert.c > b/drivers/infiniband/ulp/isert/ib_isert.c > index fcbed35..a8c1143 100644 > --- a/drivers/infiniband/ulp/isert/ib_isert.c > +++ b/drivers/infiniband/ulp/isert/ib_isert.c > @@ -741,6 +741,7 @@ isert_connect_error(struct rdma_cm_id *cma_id) > { > struct isert_conn *isert_conn = cma_id->qp->qp_context; > > + isert_warn("conn %p error\n", isert_conn); > list_del_init(_conn->node); > isert_conn->cm_id = NULL; > isert_put_conn(isert_conn); > @@ -1452,7 +1453,13 @@ static void > isert_login_recv_done(struct ib_cq *cq, struct ib_wc *wc) > { > struct isert_conn *isert_conn = wc->qp->qp_context; > - struct ib_device *ib_dev = isert_conn->cm_id->device; > +struct ib_device *ib_dev; > + > + if (unlikely(isert_conn->cm_id == NULL)) { > + isert_warn("login with broken rdma_cm_id"); > + return; > + } > + ib_dev = isert_conn->cm_id->device; > > if (unlikely(wc->status != IB_WC_SUCCESS)) { > isert_print_wc(wc, "login recv"); So I assume isert_cma_handler() -> isert_connect_error() getting called to clear isert_conn->cm_id before connection established, and subsequently isert_conn->login_req_buf->rx_cqe.done() -> isert_login_recv_done() still getting invoked after connection failure is new RDMA API behavior.. That said, following Sagi's original patch that added the clearing of isert_conn->cm_id to isert_connect_error(), it probably makes sense to use isert_conn->device->ib_device, and avoid dereferencing isert_conn->cm_id before connection is established. Here's a quick (untested) patch to do this for recv/send done callbacks, and avoid using isert_conn->cm_id during isert_rdma_accept(). Sagi + Co, WDYT..? diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c index fcbed35..f7f97f3 100644 --- a/drivers/infiniband/ulp/isert/ib_isert.c +++ b/drivers/infiniband/ulp/isert/ib_isert.c @@ -52,7 +52,7 @@ static int isert_login_post_recv(struct isert_conn *isert_conn); static int -isert_rdma_accept(struct isert_conn *isert_conn); +isert_rdma_accept(struct isert_conn *isert_conn, struct rdma_cm_id *cm_id); struct rdma_cm_id *isert_setup_id(struct isert_np *isert_np); static void isert_release_work(struct work_struct *work); @@ -543,7 +543,7 @@ if (ret) goto out_conn_dev; - ret = isert_rdma_accept(isert_conn); + ret =
[PATCH] ib_isert: prevent NULL pointer dereference in isert_login_recv_done() (was: Re: NULL pointer dereference in isert_login_recv_done in 4.9.32)
On Wed, Jun 21, 2017 at 10:33:45AM -0600, Robert LeBlanc wrote: > On Wed, Jun 21, 2017 at 9:17 AM, Robert LeBlancwrote: > > On Tue, Jun 20, 2017 at 12:54 PM, Robert LeBlanc > > wrote: > >> We have hit this four times today. Any ideas? > >> > >> [ 169.382113] BUG: unable to handle kernel NULL pointer dereference at > >>(null) > >> [ 169.382152] IP: [] isert_login_recv_done+0x28/0x170 > >> [ib_isert] So, we spent more time to track down this bug. It seems that at login time an error is happening, not sure exactly what kind of error, but isert_connect_error() is called and isert_conn->cm_id is set to NULL. Later, isert_login_recv_done() is trying to access isert_conn->cm_id->device and we get the NULL pointer dereference. Following there's the patch that we have applied to track down this problem. And this is what we see in dmesg with this patch applied: [ 658.633188] isert: isert_connect_error: conn 887f2209c000 error [ 658.633226] isert: isert_login_recv_done: login with broken rdma_cm_id As we can see isert_connect_error() is called before isert_login_recv_done and at that point isert_conn->cm_id is NULL. Obviously simply checking if the pointer is NULL, returning and ignoring the error in isert_login_recv_done() is not the best fix ever and I'm not sure if I'm breaking something else doing so (even if with this patch the kernel doesn't crash and I've not seen any problem so far). Maybe a better way is to tear down the whole connection when this particular case is happening? Suggestions? Thanks, -Andrea --- ib_isert: prevent NULL pointer dereference in isert_login_recv_done() During a login if an error is happening isert_connect_error() is called and isert_conn->cm_id is set to NULL. Later, isert_login_recv_done() is executed, trying to access isert_conn->cm_id->device, causing the following BUG: [ 169.382113] BUG: unable to handle kernel NULL pointer dereference at (null) [ 169.382152] IP: [] isert_login_recv_done+0x28/0x170 [ib_isert] Check if isert_con->cm_id is set to NULL in isert_login_recv_done() to avoid this problem. Signed-off-by: Andrea Righi Signed-off-by: Robert LeBlanc --- drivers/infiniband/ulp/isert/ib_isert.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c index fcbed35..a8c1143 100644 --- a/drivers/infiniband/ulp/isert/ib_isert.c +++ b/drivers/infiniband/ulp/isert/ib_isert.c @@ -741,6 +741,7 @@ isert_connect_error(struct rdma_cm_id *cma_id) { struct isert_conn *isert_conn = cma_id->qp->qp_context; + isert_warn("conn %p error\n", isert_conn); list_del_init(_conn->node); isert_conn->cm_id = NULL; isert_put_conn(isert_conn); @@ -1452,7 +1453,13 @@ static void isert_login_recv_done(struct ib_cq *cq, struct ib_wc *wc) { struct isert_conn *isert_conn = wc->qp->qp_context; - struct ib_device *ib_dev = isert_conn->cm_id->device; +struct ib_device *ib_dev; + + if (unlikely(isert_conn->cm_id == NULL)) { + isert_warn("login with broken rdma_cm_id"); + return; + } + ib_dev = isert_conn->cm_id->device; if (unlikely(wc->status != IB_WC_SUCCESS)) { isert_print_wc(wc, "login recv"); -- 2.7.4
[PATCH] ib_isert: prevent NULL pointer dereference in isert_login_recv_done() (was: Re: NULL pointer dereference in isert_login_recv_done in 4.9.32)
On Wed, Jun 21, 2017 at 10:33:45AM -0600, Robert LeBlanc wrote: > On Wed, Jun 21, 2017 at 9:17 AM, Robert LeBlanc wrote: > > On Tue, Jun 20, 2017 at 12:54 PM, Robert LeBlanc > > wrote: > >> We have hit this four times today. Any ideas? > >> > >> [ 169.382113] BUG: unable to handle kernel NULL pointer dereference at > >>(null) > >> [ 169.382152] IP: [] isert_login_recv_done+0x28/0x170 > >> [ib_isert] So, we spent more time to track down this bug. It seems that at login time an error is happening, not sure exactly what kind of error, but isert_connect_error() is called and isert_conn->cm_id is set to NULL. Later, isert_login_recv_done() is trying to access isert_conn->cm_id->device and we get the NULL pointer dereference. Following there's the patch that we have applied to track down this problem. And this is what we see in dmesg with this patch applied: [ 658.633188] isert: isert_connect_error: conn 887f2209c000 error [ 658.633226] isert: isert_login_recv_done: login with broken rdma_cm_id As we can see isert_connect_error() is called before isert_login_recv_done and at that point isert_conn->cm_id is NULL. Obviously simply checking if the pointer is NULL, returning and ignoring the error in isert_login_recv_done() is not the best fix ever and I'm not sure if I'm breaking something else doing so (even if with this patch the kernel doesn't crash and I've not seen any problem so far). Maybe a better way is to tear down the whole connection when this particular case is happening? Suggestions? Thanks, -Andrea --- ib_isert: prevent NULL pointer dereference in isert_login_recv_done() During a login if an error is happening isert_connect_error() is called and isert_conn->cm_id is set to NULL. Later, isert_login_recv_done() is executed, trying to access isert_conn->cm_id->device, causing the following BUG: [ 169.382113] BUG: unable to handle kernel NULL pointer dereference at (null) [ 169.382152] IP: [] isert_login_recv_done+0x28/0x170 [ib_isert] Check if isert_con->cm_id is set to NULL in isert_login_recv_done() to avoid this problem. Signed-off-by: Andrea Righi Signed-off-by: Robert LeBlanc --- drivers/infiniband/ulp/isert/ib_isert.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c index fcbed35..a8c1143 100644 --- a/drivers/infiniband/ulp/isert/ib_isert.c +++ b/drivers/infiniband/ulp/isert/ib_isert.c @@ -741,6 +741,7 @@ isert_connect_error(struct rdma_cm_id *cma_id) { struct isert_conn *isert_conn = cma_id->qp->qp_context; + isert_warn("conn %p error\n", isert_conn); list_del_init(_conn->node); isert_conn->cm_id = NULL; isert_put_conn(isert_conn); @@ -1452,7 +1453,13 @@ static void isert_login_recv_done(struct ib_cq *cq, struct ib_wc *wc) { struct isert_conn *isert_conn = wc->qp->qp_context; - struct ib_device *ib_dev = isert_conn->cm_id->device; +struct ib_device *ib_dev; + + if (unlikely(isert_conn->cm_id == NULL)) { + isert_warn("login with broken rdma_cm_id"); + return; + } + ib_dev = isert_conn->cm_id->device; if (unlikely(wc->status != IB_WC_SUCCESS)) { isert_print_wc(wc, "login recv"); -- 2.7.4