Re: [PATCH FIX For-3.19 v4 0/7] IB/ipoib: follow fixes for multicast handling
On Wed, 2015-01-21 at 12:37 -0800, Roland Dreier wrote: On Wed, Jan 21, 2015 at 12:34 PM, Or Gerlitz gerlitz...@gmail.com wrote: Because Doug's changes fixed some bad, easy-to-reproduce issues. On the other hand we don't want to introduce new regressions to fix the old issues. See above, we did introduced regressions. Yes, I know, that's my whole point. We need to fix the current 3.19-rc code, and the two choices are to keep the fixes we added during 3.19 or revert back to 3.18. Doug's opinion is that your proposed fix is broken, and we don't have an alternate fix. I will second that opinion. Over night we ran a series of tests on some new patches I made, and they resolved the rmmod/insmod failure case in our testing. There were two significant fixes. One of them was related to the switch to using a separate work queue per device. The other was an oversight in ipoib_mcast_restart_task(). Neither of these issues were addressed by the alternate fix. So, at best, the alternate fix is paper machete that covers over two holes but leaves the holes in place. So I suggest we revert the whole series from 3.19 and get this right for 3.20. Before you decide, please take a look at the final fix as I see it. This was a 7 patch series, now it's 10 patches. But the final three patches are small, well understood, and obviously correct. Regardless of whether you take these 10, I do *not* suggest leaving the first 8 and using the alternate patch. I suggest either an all or nothing approach. But, like I said, the rmmod issue is now fixed in my testing. -- Doug Ledford dledf...@redhat.com GPG KeyID: 0E572FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH FIX For-3.19 v4 0/7] IB/ipoib: follow fixes for multicast handling
On Tue, Jan 20, 2015 at 8:16 AM, Erez Shitrit ere...@dev.mellanox.co.il wrote: After trying your V4 patch series, I can tell that first, the endless scheduling of the mcast task is indeed over, but still, the multicast functionality in ipoib is unstable. Is this worse than 3.18? (Have you tested that?) Because Doug's changes fixed some bad, easy-to-reproduce issues. On the other hand we don't want to introduce new regressions to fix the old issues. I think we only have a few days to decide whether to revert back to 3.18 code, or push forward with these fixes. - R. -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH FIX For-3.19 v4 0/7] IB/ipoib: follow fixes for multicast handling
It's worse than 3.18, but only if your test is rmmod ipoib based. For normal running conditions it's better. However, I'm going to try and fix the rmmod issue today if at all possible. Sent from my iPhone On Jan 21, 2015, at 12:20 PM, Roland Dreier rol...@kernel.org wrote: On Tue, Jan 20, 2015 at 8:16 AM, Erez Shitrit ere...@dev.mellanox.co.il wrote: After trying your V4 patch series, I can tell that first, the endless scheduling of the mcast task is indeed over, but still, the multicast functionality in ipoib is unstable. Is this worse than 3.18? (Have you tested that?) Because Doug's changes fixed some bad, easy-to-reproduce issues. On the other hand we don't want to introduce new regressions to fix the old issues. I think we only have a few days to decide whether to revert back to 3.18 code, or push forward with these fixes. - R. -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH FIX For-3.19 v4 0/7] IB/ipoib: follow fixes for multicast handling
On Wed, 2015-01-21 at 12:37 -0800, Roland Dreier wrote: On Wed, Jan 21, 2015 at 12:34 PM, Or Gerlitz gerlitz...@gmail.com wrote: Because Doug's changes fixed some bad, easy-to-reproduce issues. On the other hand we don't want to introduce new regressions to fix the old issues. See above, we did introduced regressions. Yes, I know, that's my whole point. We need to fix the current 3.19-rc code, and the two choices are to keep the fixes we added during 3.19 or revert back to 3.18. Doug's opinion is that your proposed fix is broken, and we don't have an alternate fix. So I suggest we revert the whole series from 3.19 and get this right for 3.20. - R. Give me 24 hours before making a final decision on that please. -- Doug Ledford dledf...@redhat.com GPG KeyID: 0E572FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH FIX For-3.19 v4 0/7] IB/ipoib: follow fixes for multicast handling
On Wed, Jan 21, 2015 at 7:19 PM, Roland Dreier rol...@kernel.org wrote: On Tue, Jan 20, 2015 at 8:16 AM, Erez Shitrit ere...@dev.mellanox.co.il wrote: After trying your V4 patch series, I can tell that first, the endless scheduling of the mcast task is indeed over, but still, the multicast functionality in ipoib is unstable. Is this worse than 3.18? (Have you tested that?) Roland, Doug, To be fully clear here by this we're talking on seven patches of complexity and volume which I think go way beyond post -rc5 timeline: IB/ipoib: Fix failed multicast joins/sends IB/ipoib: Add a helper to restart the multicast task IB/ipoib: make delayed tasks not hold up everything IB/ipoib: Handle -ENETRESET properly in our callback IB/ipoib: don't restart our thread on ENETRESET IB/ipoib: remove unneeded locks IB/ipoib: fix race between mcast_dev_flush and mcast_join drivers/infiniband/ulp/ipoib/ipoib.h | 1 + drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 204 +++-- 2 files changed, 121 insertions(+), 84 deletions(-) Doug, I understand your claim and frustration that with 3.18 and such (older kernels) your ifdown/up loop manages to break the driver, but fixing the driver such that this test works and in the same time practically breaking IPv6 and IPv5 multicast introduces a deep regression vs. 3.18 - which as you wrote here, would be wrong to fix with such a further big change. Are you really sure that reverting the offending patch 016d9fb25cd9 IPoIB: fix MCAST_FLAG_BUSY usage and maybe some more dependent related hunks from downstream patches of that series isn't possible? If this is the case, I would suggest that we either revive the review on the fix we sent [1] or drop the whole 3.19-rc1 changes. I vote for the former. [1] http://marc.info/?l=linux-rdmam=142064313123254w=2 Because Doug's changes fixed some bad, easy-to-reproduce issues. On the other hand we don't want to introduce new regressions to fix the old issues. See above, we did introduced regressions. I think we only have a few days to decide whether to revert back to 3.18 code, or push forward with these fixes. Or. -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH FIX For-3.19 v4 0/7] IB/ipoib: follow fixes for multicast handling
On Wed, Jan 21, 2015 at 10:37 PM, Roland Dreier rol...@kernel.org wrote: On Wed, Jan 21, 2015 at 12:34 PM, Or Gerlitz gerlitz...@gmail.com wrote: Because Doug's changes fixed some bad, easy-to-reproduce issues. On the other hand we don't want to introduce new regressions to fix the old issues. See above, we did introduced regressions. Yes, I know, that's my whole point. We need to fix the current 3.19-rc code, and the two choices are to keep the fixes we added during 3.19 or revert back to 3.18. Doug's opinion is that your proposed fix is broken, and we don't have an alternate fix. Before we go back to 3.18, there's the option I was asking Doug to comment on, namely is it really impossible to revert the offending patch? if this is the case, I'd like to get Erez few more days to discuss his proposed patch + changes to it from code review. So I suggest we revert the whole series from 3.19 and get this right for 3.20. Roland, I'm very happy to see that you're finally up to non -rc1 activity. This didn't happen for 3.18 and maybe also for 3.17 and you know what, it's terribly impossible to run a kernel sub-system like that. Or. -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH FIX For-3.19 v4 0/7] IB/ipoib: follow fixes for multicast handling
On Wed, 2015-01-21 at 22:34 +0200, Or Gerlitz wrote: On Wed, Jan 21, 2015 at 7:19 PM, Roland Dreier rol...@kernel.org wrote: On Tue, Jan 20, 2015 at 8:16 AM, Erez Shitrit ere...@dev.mellanox.co.il wrote: After trying your V4 patch series, I can tell that first, the endless scheduling of the mcast task is indeed over, but still, the multicast functionality in ipoib is unstable. Is this worse than 3.18? (Have you tested that?) Roland, Doug, To be fully clear here by this we're talking on seven patches of complexity and volume which I think go way beyond post -rc5 timeline: IB/ipoib: Fix failed multicast joins/sends IB/ipoib: Add a helper to restart the multicast task IB/ipoib: make delayed tasks not hold up everything IB/ipoib: Handle -ENETRESET properly in our callback IB/ipoib: don't restart our thread on ENETRESET IB/ipoib: remove unneeded locks IB/ipoib: fix race between mcast_dev_flush and mcast_join drivers/infiniband/ulp/ipoib/ipoib.h | 1 + drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 204 +++-- 2 files changed, 121 insertions(+), 84 deletions(-) I don't usually look at the line count or the patch count, I look at the changes. Most of the 7 patches above are very self contained and very small. The exception is the final patch and even though it is large, the net result is that it takes a needlessly complex function and makes it easier to read, easier to understand, easier to verify that it does what it's supposed to do, and then fixes a locking issue. I also look at testing. While it's not obvious to the people on this list, I am testing this rather profusely and so is our internal QE department and a few others. I'm really down to only the rmmod race (which I suspect the 7 patches above did not introduce, but the original 8 patches did). Doug, I understand your claim and frustration that with 3.18 and such (older kernels) your ifdown/up loop manages to break the driver, Yes. Quite easily. but fixing the driver such that this test works and in the same time practically breaking IPv6 and IPv5 multicast introduces a deep regression vs. 3.18 You're right. That's why there are these 7 patches. - which as you wrote here, would be wrong to fix with such a further big change. Are you really sure that reverting the offending patch 016d9fb25cd9 IPoIB: fix MCAST_FLAG_BUSY usage and maybe some more dependent related hunks from downstream patches of that series isn't possible? Positive. The 8 patches change the semantics of the MCAST_BUSY_FLAG usage throughout the code. If you pick some and leave others, you can end up with parts using the flag one way and parts another and then you really blow your locking and usage to hell. If this is the case, I would suggest that we either revive the review on the fix we sent [1] or drop the whole 3.19-rc1 changes. I vote for the former. [1] http://marc.info/?l=linux-rdmam=142064313123254w=2 So, my original 8 patches were broken. That's easy enough to see. My sin in my testing of those patches was that my test methodology was too myopic. I focused on the ifup/ifdown loop scenario, and didn't test other things that might be effected. Similarly, if people are only testing this latest set of code one way (for example, using rmmod/insmod as the means to force the connection up and down and test the joins/leaves), that can be similarly myopic. This patch from Erez is horribly broken for cases other than rmmod/insmod testing. I'm guessing you didn't see it because you never tested any other way, like using ifup/ifdown or pulling a cable or turning a switch port off and back on. If you had, there would not be a question of using it, at least as it stands. Look, smaller is not always better. A fix that seems to work and is smaller, but renders the code almost unintelligible because it makes the code internally inconsistent between various areas is not a preferable patch to a larger patch that fixes things right and makes the overall code consistent and understandable. Because Doug's changes fixed some bad, easy-to-reproduce issues. On the other hand we don't want to introduce new regressions to fix the old issues. See above, we did introduced regressions. Which are already fixed by this 7 patch patchset. The one remaining issue isn't actually an ipv6 or ipv4 multicast issue, but specifically an issue related to locking around rmmod/insmod and that we are allowing ourselves to get into an inconsistent state during these cycles which then effects later running (for instance, we might not clean out all of our multicast joins before rmmod on ipoib succeeds, so on the next insmod and attempt to join that same multicast group, the ib_sa layer multicast membership state no longer agrees with us and we have problems). I've got to fix that. But if you run your tests without rmmod/insmod being how you bring the link up/down, I think
Re: [PATCH FIX For-3.19 v4 0/7] IB/ipoib: follow fixes for multicast handling
On Wed, Jan 21, 2015 at 12:34 PM, Or Gerlitz gerlitz...@gmail.com wrote: Because Doug's changes fixed some bad, easy-to-reproduce issues. On the other hand we don't want to introduce new regressions to fix the old issues. See above, we did introduced regressions. Yes, I know, that's my whole point. We need to fix the current 3.19-rc code, and the two choices are to keep the fixes we added during 3.19 or revert back to 3.18. Doug's opinion is that your proposed fix is broken, and we don't have an alternate fix. So I suggest we revert the whole series from 3.19 and get this right for 3.20. - R. -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH FIX For-3.19 v4 0/7] IB/ipoib: follow fixes for multicast handling
On Tue, 2015-01-20 at 18:16 +0200, Erez Shitrit wrote: On 1/20/2015 5:58 AM, Doug Ledford wrote: These patches are to resolve issues created by my previous patch set. While that set worked fine in my testing, there were problems with multicast joins after the initial set of joins had completed. Since my testing relied upon the normal set of multicast joins that happen when the interface is first brought up, I missed those problems. Symptoms vary from failure to send packets due to a failed join, to loss of connectivity after a subnet manager restart, to failure to properly release multicast groups on shutdown resulting in hangs when the mlx4 driver attempts to unload itself via its reboot notifier handler. This set of patches has passed a number of tests above and beyond my original tests. As suggested by Or Gerlitz I added IPv6 and IPv4 multicast tests. I also added both subnet manager restarts and manual shutdown/restart of individual ports at the switch in order to ensure that the ENETRESET path was properly tested. I included testing, then a subnet manager restart, then a quiescent period for caches to expire, then restarting testing to make sure that arp and neighbor discovery work after the subnet manager restart. All in all, I have not been able to trip the multicast joins up any longer. Additionally, the original impetus for my first 8 patch set was that it was simply too easy to break the IPoIB subsystem with this simple loop: while true; do ifconfig ib0 up ifconfig ib0 down done Just to be safe, I made sure this problem did not resurface. Roland, the 3.19-rc code is broken. We either need to revert my original patchset, or grab these, but I would not recommend leaving it as it currently stands. Doug Ledford (7): IB/ipoib: Fix failed multicast joins/sends IB/ipoib: Add a helper to restart the multicast task IB/ipoib: make delayed tasks not hold up everything IB/ipoib: Handle -ENETRESET properly in our callback IB/ipoib: don't restart our thread on ENETRESET IB/ipoib: remove unneeded locks IB/ipoib: fix race between mcast_dev_flush and mcast_join drivers/infiniband/ulp/ipoib/ipoib.h | 1 + drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 204 +++-- 2 files changed, 121 insertions(+), 84 deletions(-) Hi Doug, After trying your V4 patch series, I can tell that first, the endless scheduling of the mcast task is indeed over, Good. but still, the multicast functionality in ipoib is unstable. I'm not seeing that here. Let's try to figure out what's different. I see that there are times that ping6 works good, and sometimes it doesn't, to make it clear I always use the link-local address assigned by the stack to the IPoIB device, see [1] below for how I run it. As do I. I'll attach the scripts I used to run it for your reference. I also see that send-only mcast stops working from time to time, see [2] below for how I run this. I can narrow the problem to be on the sender (client) side, since I work with a peer node which has well functioning IPoIB multicast code. I don't think the peer side really denotes a conclusive argument ;-) One more phenomena, that in some cases I can see that the driver (after the mcast_debug_level is set) prints endless message: ib0: no address vector, but multicast join already started OK, this is to be expected from your tests I think. In particular, this is generated by mcast_send() if it's called by your program while the send only join has not yet completed. The flow goes like this: First packet after interface comes up: mcast_send() - ipoib_mcast_alloc() - ipoib_mcast_add() - schedule join task thread In a different thread: mcast_join_task() find unjoined mcast group mark mcast-flags with IPOIB_MCAST_FLAG BUSY - mcast_join() send join request over the wire Back on original thread context: mcast_send() this time we find a matching mcast entry but mcast-ah is NULL queue packet, unless backlog is full and then drop packet if mcast-flags IPOIB_MCAST_FLAG_BUSY, emit notice that you see In a different thread: mcast_sendonly_join_complete() - mcast_join_finish() set mcast-ah send skb backlog queue clear IPOIB_MCAST_FLAG_BUSY Back on original thread context: mcast_send() now we find the mcast entry, and we find the mcast-ah entry, so sends now proceed as expected with no messages, and any lost packets while waiting on mcast-ah
[PATCH FIX For-3.19 v4 0/7] IB/ipoib: follow fixes for multicast handling
These patches are to resolve issues created by my previous patch set. While that set worked fine in my testing, there were problems with multicast joins after the initial set of joins had completed. Since my testing relied upon the normal set of multicast joins that happen when the interface is first brought up, I missed those problems. Symptoms vary from failure to send packets due to a failed join, to loss of connectivity after a subnet manager restart, to failure to properly release multicast groups on shutdown resulting in hangs when the mlx4 driver attempts to unload itself via its reboot notifier handler. This set of patches has passed a number of tests above and beyond my original tests. As suggested by Or Gerlitz I added IPv6 and IPv4 multicast tests. I also added both subnet manager restarts and manual shutdown/restart of individual ports at the switch in order to ensure that the ENETRESET path was properly tested. I included testing, then a subnet manager restart, then a quiescent period for caches to expire, then restarting testing to make sure that arp and neighbor discovery work after the subnet manager restart. All in all, I have not been able to trip the multicast joins up any longer. Additionally, the original impetus for my first 8 patch set was that it was simply too easy to break the IPoIB subsystem with this simple loop: while true; do ifconfig ib0 up ifconfig ib0 down done Just to be safe, I made sure this problem did not resurface. Roland, the 3.19-rc code is broken. We either need to revert my original patchset, or grab these, but I would not recommend leaving it as it currently stands. Doug Ledford (7): IB/ipoib: Fix failed multicast joins/sends IB/ipoib: Add a helper to restart the multicast task IB/ipoib: make delayed tasks not hold up everything IB/ipoib: Handle -ENETRESET properly in our callback IB/ipoib: don't restart our thread on ENETRESET IB/ipoib: remove unneeded locks IB/ipoib: fix race between mcast_dev_flush and mcast_join drivers/infiniband/ulp/ipoib/ipoib.h | 1 + drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 204 +++-- 2 files changed, 121 insertions(+), 84 deletions(-) -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html