Re: [PATCH FIX for-3.19] IB/ipoib: Fix failed multicast joins/sends
On 1/14/2015 6:09 PM, Doug Ledford wrote:
> On Wed, 2015-01-14 at 18:02 +0200, Erez Shitrit wrote:
>> Hi Doug,
>>
>> Perhaps I am missing something here, but ping6 still doesn't work for
>> me in many cases. I think the reason is that your original patch does
>> the following in ipoib_mcast_join_task:
>>
>>     if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags))
>>         ipoib_mcast_sendonly_join(mcast);
>>     else
>>         ipoib_mcast_join(dev, mcast, 1);
>>     return;
>>
>> The flow for sendonly_join doesn't include handling the mc_task, so
>> only the first mc in the list (if it is a sendonly mcg) will be sent,
>> and no more mcg's that are in the ipoib mc list are going to be sent.
>> (See how it is done in the ipoib_mcast_join flow.)
>
> Yes, I know what you are talking about. However, my patches did not
> add this bug, it was present in the original code. Please check a
> plain v3.18 kernel, which does not have my patches, and you will see
> that ipoib_mcast_sendonly_join_complete also fails to restart the
> mcast join thread there as well.

Agree, but in 3.18 there was no call from mc_task to sendonly_join, just
to the full-member join, so there was no need at that point to handle
the task. (The call to sendonly-join was made on demand whenever a new
packet for the mcg was sent by the kernel.) Only in 3.19 is the sendonly
join called explicitly from the mc_task.
>> I can demonstrate it with the ipoib log. I am trying to
>> ping6 fe80::202:c903:9f:3b0a via ib0. The log is:
>>
>> ib0: restarting multicast task
>> ib0: setting up send only multicast group for ff12:601b::::::0016
>> ib0: adding multicast entry for mgid ff12:601b::::0001:ff43:3bf1
>> ib0: no multicast record for ff12:601b::::::0016, starting sendonly join
>> ib0: join completion for ff12:601b::::::0001 (status 0)
>> ib0: MGID ff12:601b::::::0001 AV 88081afb5f40, LID 0xc015, SL 0
>> ib0: join completion for ff12:401b::::::0001 (status 0)
>> ib0: MGID ff12:401b::::::0001 AV 88081e1c42c0, LID 0xc014, SL 0
>> ib0: sendonly multicast join failed for ff12:601b::::::0016, status -22
>> ib0: no multicast record for ff12:601b::::::0016, starting sendonly join
>> ib0: sendonly multicast join failed for ff12:601b::::::0016, status -22
>> ib0: no multicast record for ff12:601b::::::0016, starting sendonly join
>> ib0: sendonly multicast join failed for ff12:601b::::::0016, status -22
>> ib0: setting up send only multicast group for ff12:601b::::::0002
>> ib0: no multicast record for ff12:601b::::::0016, starting sendonly join
>> ib0: sendonly multicast join failed for ff12:601b::::::0016, status -22
>> ib0: setting up send only multicast group for ff12:601b::::0001:ff9f:3b0a
>>
>> Here you can see that the ipv6 address is added and queued to the list.
>>
>> ib0: no multicast record for ff12:601b::::::0016, starting sendonly join
>> ib0: sendonly multicast join failed for ff12:601b::::::0016, status -22
>>
>> The ipv6 mcg will not be sent because it is after some other sendonly
>> group, and no one in that flow re-queues the mc_task again.
>
> This is a problem with the design of the original mcast task thread.
> I'm looking at a fix now. Currently the design only allows one join to
> be outstanding at a time. Is there a reason for that that I'm not
> aware of? Some historical context that I don't know about?

IMHO, the reason for only one mc on the air at a time was to make our
life easier; otherwise there are locks to take/manage, races between a
few responses, etc.
Also, the multicast module in the core keeps all the requests
serialized.

Perhaps you can use the relevant code from the full-member join in the
sendonly join in order to handle the mc_task, or return the call to
send-only to mcast_send instead of the mc_task.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
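The stuck-list behavior described in this exchange can be reduced to a
small userspace model. Everything here is a hypothetical sketch, not
actual ipoib code: `join_task_once`, `joined_after_idle`, and the
`mcast` struct are illustrative stand-ins for the kernel flow; the model
only captures the claim that the join task handles one entry per pass
and that, pre-fix, only the full-member completion path requeued it.

```c
#include <stddef.h>

/* Toy model: the join task handles the first un-joined entry per pass.
 * In the pre-fix 3.19 code only full-member completions requeue the
 * task, so a sendonly entry at the head of the list stalls everything
 * queued behind it. All names are illustrative, not ipoib symbols. */
enum { FULL_MEMBER, SENDONLY };

struct mcast { int type; int joined; };

/* Handle one pending entry; return nonzero if the task re-queues. */
static int join_task_once(struct mcast *list, size_t n, int sendonly_requeues)
{
    for (size_t i = 0; i < n; i++) {
        if (list[i].joined)
            continue;
        list[i].joined = 1;
        return list[i].type == FULL_MEMBER || sendonly_requeues;
    }
    return 0; /* list drained, nothing re-queued */
}

/* Run the task until no one re-queues it; count handled groups. */
static size_t joined_after_idle(int sendonly_requeues)
{
    struct mcast list[3] = {
        { SENDONLY,    0 }, /* like the failing :0016 group above   */
        { SENDONLY,    0 },
        { FULL_MEMBER, 0 }, /* the IPv6 group queued later waits here */
    };
    size_t joined = 0;
    int requeued = 1;

    while (requeued)
        requeued = join_task_once(list, 3, sendonly_requeues);
    for (size_t i = 0; i < 3; i++)
        joined += list[i].joined;
    return joined;
}
```

With `sendonly_requeues == 0` (the behavior Erez's log shows) only the
head entry is ever processed; once the sendonly path also requeues the
task, the whole list drains.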
Re: [PATCH FIX for-3.19] IB/ipoib: Fix failed multicast joins/sends
On Wed, 2015-01-14 at 20:47 +0200, Erez Shitrit wrote:
> On 1/14/2015 6:09 PM, Doug Ledford wrote:
>> On Wed, 2015-01-14 at 18:02 +0200, Erez Shitrit wrote:
>>> Hi Doug,
>>>
>>> Perhaps I am missing something here, but ping6 still doesn't work
>>> for me in many cases. I think the reason is that your original patch
>>> does the following in ipoib_mcast_join_task:
>>>
>>>     if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags))
>>>         ipoib_mcast_sendonly_join(mcast);
>>>     else
>>>         ipoib_mcast_join(dev, mcast, 1);
>>>     return;
>>>
>>> The flow for sendonly_join doesn't include handling the mc_task, so
>>> only the first mc in the list (if it is a sendonly mcg) will be
>>> sent, and no more mcg's that are in the ipoib mc list are going to
>>> be sent. (See how it is done in the ipoib_mcast_join flow.)
>>
>> Yes, I know what you are talking about. However, my patches did not
>> add this bug, it was present in the original code. Please check a
>> plain v3.18 kernel, which does not have my patches, and you will see
>> that ipoib_mcast_sendonly_join_complete also fails to restart the
>> mcast join thread there as well.
>
> Agree, but in 3.18 there was no call from mc_task to sendonly_join,
> just to the full-member join, so there was no need at that point to
> handle the task. (The call to sendonly-join was made on demand
> whenever a new packet for the mcg was sent by the kernel.) Only in
> 3.19 is the sendonly join called explicitly from the mc_task.

I just sent a patch set that fixes this.
>>> [ ipoib ping6 log snipped; it is quoted in full earlier in the
>>>   thread ]
>>>
>>> The ipv6 mcg will not be sent because it is after some other
>>> sendonly group, and no one in that flow re-queues the mc_task again.
>>
>> This is a problem with the design of the original mcast task thread.
>> I'm looking at a fix now. Currently the design only allows one join
>> to be outstanding at a time. Is there a reason for that that I'm not
>> aware of? Some historical context that I don't know about?
>
> IMHO, the reason for only one mc on the air at a time was to make our
> life easier; otherwise there are locks to take/manage, races between a
> few responses, etc.
> Also, the multicast module in the core keeps all the requests
> serialized. Perhaps you can use the relevant code from the full-member
> join in the sendonly join in order to handle the mc_task, or return
> the call to send-only to mcast_send instead of the mc_task.

I reworked things a bit, but yes, the send only task now does the right
thing. Please review the latest patchset I posted. It's working just
fine for me here.

--
Doug Ledford dledf...@redhat.com
GPG KeyID: 0E572FDD
Re: [PATCH FIX for-3.19] IB/ipoib: Fix failed multicast joins/sends
Hi Doug,

Perhaps I am missing something here, but ping6 still doesn't work for me
in many cases. I think the reason is that your original patch does the
following in ipoib_mcast_join_task:

    if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags))
        ipoib_mcast_sendonly_join(mcast);
    else
        ipoib_mcast_join(dev, mcast, 1);
    return;

The flow for sendonly_join doesn't include handling the mc_task, so only
the first mc in the list (if it is a sendonly mcg) will be sent, and no
more mcg's that are in the ipoib mc list are going to be sent. (See how
it is done in the ipoib_mcast_join flow.)

I can demonstrate it with the ipoib log. I am trying to
ping6 fe80::202:c903:9f:3b0a via ib0.

[ ipoib log snipped; it is quoted in full earlier in the thread ]

The ipv6 mcg will not be sent because it is after some other sendonly
group, and no one in that flow re-queues the mc_task again.

Thanks, Erez

> The usage of IPOIB_MCAST_RUN as a flag is inconsistent. In some places
> it is used to mean "our device is administratively allowed to send
> multicast joins/leaves/packets" and in other places it means "our
> multicast join task thread is currently running and will process your
> request if you put it on the queue".
>
> However, this latter meaning is in fact flawed, as there is a race
> condition between the join task testing the mcast list and finding it
> empty of remaining work, dropping the mcast mutex and also the
> priv->lock spinlock, and clearing the IPOIB_MCAST_RUN flag. Further,
> there are numerous locations that use the flag in the former fashion,
> and when all tasks complete and the task thread clears the RUN flag,
> all of those other locations will fail to ever again queue any work.
> This results in the interface coming up fine initially, but having
> problems adding new multicast groups after the first round of groups
> have all been added and the RUN flag is cleared by the join task
> thread when it thinks it is done.
>
> To resolve this issue, convert all locations in the code to treat the
> RUN flag as an indicator that the multicast portion of this interface
> is in fact administratively up and joins/leaves/sends can be
> performed. There is no harm (other than a slight performance penalty)
> in never clearing this flag and using it in this fashion, as it simply
> means that a few places that used to micro-optimize how often this
> task was queued on a work queue will now queue the task a few extra
> times. We can address that suboptimal behavior in future patches.
>
> Signed-off-by: Doug Ledford dledf...@redhat.com
> ---
>  drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> index bc50dd0d0e4..91b8fe118ec 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> @@ -630,8 +630,6 @@ void ipoib_mcast_join_task(struct work_struct *work)
>  	}
>  
>  	ipoib_dbg_mcast(priv, "successfully joined all multicast groups\n");
> -
> -	clear_bit(IPOIB_MCAST_RUN, &priv->flags);
>  }
>  
>  int ipoib_mcast_start_thread(struct net_device *dev)
> @@ -641,8 +639,8 @@ int ipoib_mcast_start_thread(struct net_device *dev)
>  	ipoib_dbg_mcast(priv, "starting multicast thread\n");
Re: [PATCH FIX for-3.19] IB/ipoib: Fix failed multicast joins/sends
On Wed, 2015-01-14 at 18:02 +0200, Erez Shitrit wrote:
> Hi Doug,
>
> Perhaps I am missing something here, but ping6 still doesn't work for
> me in many cases. I think the reason is that your original patch does
> the following in ipoib_mcast_join_task:
>
>     if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags))
>         ipoib_mcast_sendonly_join(mcast);
>     else
>         ipoib_mcast_join(dev, mcast, 1);
>     return;
>
> The flow for sendonly_join doesn't include handling the mc_task, so
> only the first mc in the list (if it is a sendonly mcg) will be sent,
> and no more mcg's that are in the ipoib mc list are going to be sent.
> (See how it is done in the ipoib_mcast_join flow.)

Yes, I know what you are talking about. However, my patches did not add
this bug, it was present in the original code. Please check a plain
v3.18 kernel, which does not have my patches, and you will see that
ipoib_mcast_sendonly_join_complete also fails to restart the mcast join
thread there as well.

> [ ipoib ping6 log snipped; it is quoted in full earlier in the
>   thread ]
>
> The ipv6 mcg will not be sent because it is after some other sendonly
> group, and no one in that flow re-queues the mc_task again.

This is a problem with the design of the original mcast task thread. I'm
looking at a fix now. Currently the design only allows one join to be
outstanding at a time. Is there a reason for that that I'm not aware of?
Some historical context that I don't know about?

--
Doug Ledford dledf...@redhat.com
GPG KeyID: 0E572FDD
Re: [PATCH FIX for-3.19] IB/ipoib: Fix failed multicast joins/sends
On Wed, 2015-01-14 at 11:54 +0200, Or Gerlitz wrote:
> On 1/14/2015 8:38 AM, Doug Ledford wrote:
>> The usage of IPOIB_MCAST_RUN as a flag is inconsistent. In some
>> places it is used to mean our device is administratively allowed to
>> send multicast joins/leaves/packets [...]
>
> what? please tell which of these upstream hits (prior to any fix) has
> this semantics?

Actually, almost all of them. I don't think they were originally
intended to be that way, but as a result of the mechanics of how
multicast joins are done, the net result is what it is.

But even more importantly, this semantic is what we *want*. For example,
in join_finish(), where it tests IPOIB_MCAST_RUN before queueing delayed
work on a join failure, we *want* this to queue work whether the current
task thinks it is done or not, as this is an async failure case and the
thread could have thought it was finished and cleared the flag before we
got our failure. Treating this flag with the old semantic leaves this
situation with a dangling error condition that doesn't get retried until
something else restarts the mcast thread.

In fact, in almost all of the places where this flag is checked, we
really do want to run the thread again unless we are in the middle of a
mcast flush or we are downing the dev. In all other cases, we want to
requeue the task. So the semantic I'm applying here, even if not what
you or some other people had in mind, is in fact closer to the truth of
how the flag is used, and is more in line with how we *want* the flag
used.
> # git grep -n IPOIB_MCAST_RUN drivers/infiniband/ulp/ipoib
> drivers/infiniband/ulp/ipoib/ipoib.h:90:	IPOIB_MCAST_RUN = 6,
> drivers/infiniband/ulp/ipoib/ipoib_multicast.c:434:	if (test_bit(IPOIB_MCAST_RUN, &priv->flags))
> drivers/infiniband/ulp/ipoib/ipoib_multicast.c:466:	if (status && test_bit(IPOIB_MCAST_RUN, &priv->flags))
> drivers/infiniband/ulp/ipoib/ipoib_multicast.c:536:	if (test_bit(IPOIB_MCAST_RUN, &priv->flags))
> drivers/infiniband/ulp/ipoib/ipoib_multicast.c:550:	if (!test_bit(IPOIB_MCAST_RUN, &priv->flags))
> drivers/infiniband/ulp/ipoib/ipoib_multicast.c:576:	if (test_bit(IPOIB_MCAST_RUN, &priv->flags))
> drivers/infiniband/ulp/ipoib/ipoib_multicast.c:634:	clear_bit(IPOIB_MCAST_RUN, &priv->flags);
> drivers/infiniband/ulp/ipoib/ipoib_multicast.c:644:	if (!test_and_set_bit(IPOIB_MCAST_RUN, &priv->flags))
> drivers/infiniband/ulp/ipoib/ipoib_multicast.c:658:	clear_bit(IPOIB_MCAST_RUN, &priv->flags);
> drivers/infiniband/ulp/ipoib/ipoib_multicast.c:728:	if (!test_and_set_bit(IPOIB_MCAST_RUN, &priv->flags))

--
Doug Ledford dledf...@redhat.com
GPG KeyID: 0E572FDD
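The dangling-error window Doug describes can be shown with a tiny
userspace model. This is a hedged sketch, not kernel code: plain bit
arithmetic stands in for the atomic bit helpers, and `retry_queued` is a
hypothetical name modeling only the ordering of "thread clears RUN"
versus "failure completion tests RUN".

```c
#define IPOIB_MCAST_RUN 6

/* Model of the race: the join task thread believes it is done and (in
 * the old code) clears RUN; an asynchronous join-failure completion
 * then tests RUN to decide whether to queue the delayed retry work. */
static int retry_queued(int thread_clears_run_when_done)
{
    unsigned long flags = 0;

    flags |= 1UL << IPOIB_MCAST_RUN;        /* mcast thread started    */
    if (thread_clears_run_when_done)
        flags &= ~(1UL << IPOIB_MCAST_RUN); /* old code: "all joined"  */

    /* the async failure completion now arrives and checks the flag */
    return (flags >> IPOIB_MCAST_RUN) & 1;  /* 1 = retry work queued   */
}
```

Under the old semantic the retry is silently dropped whenever the
failure lands after the thread's clear; under the new semantic (RUN
stays set while the interface is up) it is always queued.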
Re: [PATCH FIX for-3.19] IB/ipoib: Fix failed multicast joins/sends
On 1/14/2015 8:38 AM, Doug Ledford wrote:
> The usage of IPOIB_MCAST_RUN as a flag is inconsistent. In some places
> it is used to mean our device is administratively allowed to send
> multicast joins/leaves/packets [...]

what? please tell which of these upstream hits (prior to any fix) has
this semantics?

# git grep -n IPOIB_MCAST_RUN drivers/infiniband/ulp/ipoib
drivers/infiniband/ulp/ipoib/ipoib.h:90:	IPOIB_MCAST_RUN = 6,
drivers/infiniband/ulp/ipoib/ipoib_multicast.c:434:	if (test_bit(IPOIB_MCAST_RUN, &priv->flags))
drivers/infiniband/ulp/ipoib/ipoib_multicast.c:466:	if (status && test_bit(IPOIB_MCAST_RUN, &priv->flags))
drivers/infiniband/ulp/ipoib/ipoib_multicast.c:536:	if (test_bit(IPOIB_MCAST_RUN, &priv->flags))
drivers/infiniband/ulp/ipoib/ipoib_multicast.c:550:	if (!test_bit(IPOIB_MCAST_RUN, &priv->flags))
drivers/infiniband/ulp/ipoib/ipoib_multicast.c:576:	if (test_bit(IPOIB_MCAST_RUN, &priv->flags))
drivers/infiniband/ulp/ipoib/ipoib_multicast.c:634:	clear_bit(IPOIB_MCAST_RUN, &priv->flags);
drivers/infiniband/ulp/ipoib/ipoib_multicast.c:644:	if (!test_and_set_bit(IPOIB_MCAST_RUN, &priv->flags))
drivers/infiniband/ulp/ipoib/ipoib_multicast.c:658:	clear_bit(IPOIB_MCAST_RUN, &priv->flags);
drivers/infiniband/ulp/ipoib/ipoib_multicast.c:728:	if (!test_and_set_bit(IPOIB_MCAST_RUN, &priv->flags))
[PATCH FIX for-3.19] IB/ipoib: Fix failed multicast joins/sends
The usage of IPOIB_MCAST_RUN as a flag is inconsistent. In some places
it is used to mean "our device is administratively allowed to send
multicast joins/leaves/packets" and in other places it means "our
multicast join task thread is currently running and will process your
request if you put it on the queue".

However, this latter meaning is in fact flawed, as there is a race
condition between the join task testing the mcast list and finding it
empty of remaining work, dropping the mcast mutex and also the
priv->lock spinlock, and clearing the IPOIB_MCAST_RUN flag. Further,
there are numerous locations that use the flag in the former fashion,
and when all tasks complete and the task thread clears the RUN flag, all
of those other locations will fail to ever again queue any work. This
results in the interface coming up fine initially, but having problems
adding new multicast groups after the first round of groups have all
been added and the RUN flag is cleared by the join task thread when it
thinks it is done.

To resolve this issue, convert all locations in the code to treat the
RUN flag as an indicator that the multicast portion of this interface is
in fact administratively up and joins/leaves/sends can be performed.
There is no harm (other than a slight performance penalty) in never
clearing this flag and using it in this fashion, as it simply means that
a few places that used to micro-optimize how often this task was queued
on a work queue will now queue the task a few extra times. We can
address that suboptimal behavior in future patches.
Signed-off-by: Doug Ledford dledf...@redhat.com
---
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index bc50dd0d0e4..91b8fe118ec 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -630,8 +630,6 @@ void ipoib_mcast_join_task(struct work_struct *work)
 	}
 
 	ipoib_dbg_mcast(priv, "successfully joined all multicast groups\n");
-
-	clear_bit(IPOIB_MCAST_RUN, &priv->flags);
 }
 
 int ipoib_mcast_start_thread(struct net_device *dev)
@@ -641,8 +639,8 @@ int ipoib_mcast_start_thread(struct net_device *dev)
 	ipoib_dbg_mcast(priv, "starting multicast thread\n");
 
 	mutex_lock(&mcast_mutex);
-	if (!test_and_set_bit(IPOIB_MCAST_RUN, &priv->flags))
-		queue_delayed_work(priv->wq, &priv->mcast_task, 0);
+	set_bit(IPOIB_MCAST_RUN, &priv->flags);
+	queue_delayed_work(priv->wq, &priv->mcast_task, 0);
 	mutex_unlock(&mcast_mutex);
 
 	return 0;
@@ -725,7 +723,7 @@ void ipoib_mcast_send(struct net_device *dev, u8 *daddr, struct sk_buff *skb)
 		memcpy(mcast->mcmember.mgid.raw, mgid, sizeof (union ib_gid));
 		__ipoib_mcast_add(dev, mcast);
 		list_add_tail(&mcast->list, &priv->multicast_list);
-		if (!test_and_set_bit(IPOIB_MCAST_RUN, &priv->flags))
+		if (test_bit(IPOIB_MCAST_RUN, &priv->flags))
 			queue_delayed_work(priv->wq, &priv->mcast_task, 0);
 	}
 
@@ -951,7 +949,8 @@ void ipoib_mcast_restart_task(struct work_struct *work)
 	/*
 	 * Restart our join task if needed
 	 */
-	ipoib_mcast_start_thread(dev);
+	if (test_bit(IPOIB_MCAST_RUN, &priv->flags))
+		queue_delayed_work(priv->wq, &priv->mcast_task, 0);
 
 	rtnl_unlock();
 }
-- 
2.1.0
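The semantic change in the patch above is small, and it can be modeled
in plain userspace C. This is a hedged sketch: the helpers below are
simplified, single-threaded stand-ins for the kernel's atomic
`set_bit`/`test_bit`/`test_and_set_bit`, and the counting functions are
hypothetical names invented for illustration.

```c
#define IPOIB_MCAST_RUN 6

static unsigned long flags;
static int queued; /* how many times the join task got queued */

/* Simplified, single-threaded stand-ins for the kernel bit helpers. */
static int test_and_set_bit(int nr, unsigned long *f)
{
    int old = (*f >> nr) & 1;
    *f |= 1UL << nr;
    return old;
}
static void set_bit(int nr, unsigned long *f)  { *f |= 1UL << nr; }
static int  test_bit(int nr, unsigned long *f) { return (*f >> nr) & 1; }

static void queue_join_task(void) { queued++; }

/* Old semantic: RUN gates queueing, so only the first of several
 * requests actually queues the task; later ones are silently dropped
 * once the thread has (or merely appears to have) started. */
static int queued_old(int requests)
{
    flags = 0; queued = 0;
    for (int i = 0; i < requests; i++)
        if (!test_and_set_bit(IPOIB_MCAST_RUN, &flags))
            queue_join_task();
    return queued;
}

/* New semantic: RUN only means "multicast is administratively up";
 * every request queues the task, at worst redundantly. */
static int queued_new(int requests)
{
    flags = 0; queued = 0;
    set_bit(IPOIB_MCAST_RUN, &flags); /* as in ipoib_mcast_start_thread */
    for (int i = 0; i < requests; i++)
        if (test_bit(IPOIB_MCAST_RUN, &flags))
            queue_join_task();
    return queued;
}
```

The extra queueings in the new semantic are the "slight performance
penalty" the commit message accepts in exchange for never losing a
request.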