Re: [PATCH FIX For-3.19 v4 0/7] IB/ipoib: follow fixes for multicast handling

2015-01-22 Thread Doug Ledford
On Wed, 2015-01-21 at 12:37 -0800, Roland Dreier wrote:
 On Wed, Jan 21, 2015 at 12:34 PM, Or Gerlitz gerlitz...@gmail.com wrote:
  Because Doug's changes fixed some bad, easy-to-reproduce issues.  On
  the other hand we don't want to introduce new regressions to fix the
  old issues.
 
  See above, we did introduced regressions.
 
 Yes, I know, that's my whole point.
 
 We need to fix the current 3.19-rc code, and the two choices are to
 keep the fixes we added during 3.19 or revert back to 3.18.
 
 Doug's opinion is that your proposed fix is broken, and we don't have
 an alternate fix.

I will second that opinion.  Over night we ran a series of tests on some
new patches I made, and they resolved the rmmod/insmod failure case in
our testing.  There were two significant fixes.  One of them was related
to the switch to using a separate work queue per device.  The other was
an oversight in ipoib_mcast_restart_task().  Neither of these issues
were addressed by the alternate fix.  So, at best, the alternate fix is
paper machete that covers over two holes but leaves the holes in place.

 So I suggest we revert the whole series from 3.19 and get this right for 3.20.

Before you decide, please take a look at the final fix as I see it.
This was a 7 patch series, now it's 10 patches.  But the final three
patches are small, well understood, and obviously correct.

Regardless of whether you take these 10, I do *not* suggest leaving the
first 8 and using the alternate patch.  I suggest either an all or
nothing approach.  But, like I said, the rmmod issue is now fixed in my
testing.

-- 
Doug Ledford dledf...@redhat.com
  GPG KeyID: 0E572FDD




signature.asc
Description: This is a digitally signed message part


Re: [PATCH FIX For-3.19 v4 0/7] IB/ipoib: follow fixes for multicast handling

2015-01-21 Thread Roland Dreier
On Tue, Jan 20, 2015 at 8:16 AM, Erez Shitrit ere...@dev.mellanox.co.il wrote:

 After trying your V4 patch series, I can tell that first, the endless 
 scheduling of
 the mcast task is indeed over, but still, the multicast functionality in 
 ipoib is unstable.

Is this worse than 3.18?  (Have you tested that?)

Because Doug's changes fixed some bad, easy-to-reproduce issues.  On
the other hand we don't want to introduce new regressions to fix the
old issues.

I think we only have a few days to decide whether to revert back to
3.18 code, or push forward with these fixes.

 - R.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH FIX For-3.19 v4 0/7] IB/ipoib: follow fixes for multicast handling

2015-01-21 Thread Doug Ledford
It's worse than 3.18, but only if your test is rmmod ipoib based. For normal 
running conditions it's better. However, I'm going to try and fix the rmmod 
issue today if at all possible.

Sent from my iPhone

 On Jan 21, 2015, at 12:20 PM, Roland Dreier rol...@kernel.org wrote:
 
 On Tue, Jan 20, 2015 at 8:16 AM, Erez Shitrit ere...@dev.mellanox.co.il 
 wrote:
 
 After trying your V4 patch series, I can tell that first, the endless 
 scheduling of
 the mcast task is indeed over, but still, the multicast functionality in 
 ipoib is unstable.
 
 Is this worse than 3.18?  (Have you tested that?)
 
 Because Doug's changes fixed some bad, easy-to-reproduce issues.  On
 the other hand we don't want to introduce new regressions to fix the
 old issues.
 
 I think we only have a few days to decide whether to revert back to
 3.18 code, or push forward with these fixes.
 
 - R.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH FIX For-3.19 v4 0/7] IB/ipoib: follow fixes for multicast handling

2015-01-21 Thread Doug Ledford
On Wed, 2015-01-21 at 12:37 -0800, Roland Dreier wrote:
 On Wed, Jan 21, 2015 at 12:34 PM, Or Gerlitz gerlitz...@gmail.com wrote:
  Because Doug's changes fixed some bad, easy-to-reproduce issues.  On
  the other hand we don't want to introduce new regressions to fix the
  old issues.
 
  See above, we did introduced regressions.
 
 Yes, I know, that's my whole point.
 
 We need to fix the current 3.19-rc code, and the two choices are to
 keep the fixes we added during 3.19 or revert back to 3.18.
 
 Doug's opinion is that your proposed fix is broken, and we don't have
 an alternate fix.
 
 So I suggest we revert the whole series from 3.19 and get this right for 3.20.
 
  - R.

Give me 24 hours before making a final decision on that please.

-- 
Doug Ledford dledf...@redhat.com
  GPG KeyID: 0E572FDD




signature.asc
Description: This is a digitally signed message part


Re: [PATCH FIX For-3.19 v4 0/7] IB/ipoib: follow fixes for multicast handling

2015-01-21 Thread Or Gerlitz
On Wed, Jan 21, 2015 at 7:19 PM, Roland Dreier rol...@kernel.org wrote:
 On Tue, Jan 20, 2015 at 8:16 AM, Erez Shitrit ere...@dev.mellanox.co.il 
 wrote:

 After trying your V4 patch series, I can tell that first, the endless 
 scheduling of
 the mcast task is indeed over, but still, the multicast functionality in 
 ipoib is unstable.

 Is this worse than 3.18?  (Have you tested that?)

Roland, Doug,

To be fully clear here by this we're talking on seven patches of
complexity and volume which I think go way beyond post -rc5 timeline:

  IB/ipoib: Fix failed multicast joins/sends
  IB/ipoib: Add a helper to restart the multicast task
  IB/ipoib: make delayed tasks not hold up everything
  IB/ipoib: Handle -ENETRESET properly in our callback
  IB/ipoib: don't restart our thread on ENETRESET
  IB/ipoib: remove unneeded locks
  IB/ipoib: fix race between mcast_dev_flush and mcast_join

 drivers/infiniband/ulp/ipoib/ipoib.h   |   1 +
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 204 +++--
 2 files changed, 121 insertions(+), 84 deletions(-)

Doug, I understand your claim and frustration that with 3.18 and such
(older kernels) your ifdown/up loop manages to break the driver, but
fixing the driver such that this test works and in the same time
practically breaking IPv6 and IPv5 multicast introduces a deep
regression vs. 3.18 - which as you wrote here, would be wrong to fix
with such a further big change.

Are you really sure that reverting the offending patch 016d9fb25cd9
IPoIB: fix MCAST_FLAG_BUSY usage and maybe some more dependent
related hunks from downstream patches of that series isn't possible?

If this is the case, I would suggest that we either revive the review
on the fix we sent [1] or drop the whole 3.19-rc1 changes. I vote for
the former.

[1] http://marc.info/?l=linux-rdmam=142064313123254w=2

 Because Doug's changes fixed some bad, easy-to-reproduce issues.  On
 the other hand we don't want to introduce new regressions to fix the
 old issues.

See above, we did introduced regressions.

 I think we only have a few days to decide whether to revert back to
 3.18 code, or push forward with these fixes.

Or.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH FIX For-3.19 v4 0/7] IB/ipoib: follow fixes for multicast handling

2015-01-21 Thread Or Gerlitz
On Wed, Jan 21, 2015 at 10:37 PM, Roland Dreier rol...@kernel.org wrote:
 On Wed, Jan 21, 2015 at 12:34 PM, Or Gerlitz gerlitz...@gmail.com wrote:
 Because Doug's changes fixed some bad, easy-to-reproduce issues.  On
 the other hand we don't want to introduce new regressions to fix the
 old issues.

 See above, we did introduced regressions.

 Yes, I know, that's my whole point.

 We need to fix the current 3.19-rc code, and the two choices are to
 keep the fixes we added during 3.19 or revert back to 3.18.

 Doug's opinion is that your proposed fix is broken, and we don't have
 an alternate fix.

Before we go back to 3.18, there's the option I was asking Doug to
comment on, namely is it really impossible to revert the offending
patch? if this is the case, I'd like to get Erez few more days to
discuss his proposed patch + changes to it from code review.

 So I suggest we revert the whole series from 3.19 and get this right for 3.20.

Roland, I'm very happy to see that you're finally up to non -rc1 activity.

This didn't happen for 3.18 and maybe also for 3.17 and you know what,
it's terribly impossible to run a kernel sub-system like that.

 Or.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH FIX For-3.19 v4 0/7] IB/ipoib: follow fixes for multicast handling

2015-01-21 Thread Doug Ledford
On Wed, 2015-01-21 at 22:34 +0200, Or Gerlitz wrote:
 On Wed, Jan 21, 2015 at 7:19 PM, Roland Dreier rol...@kernel.org wrote:
  On Tue, Jan 20, 2015 at 8:16 AM, Erez Shitrit ere...@dev.mellanox.co.il 
  wrote:
 
  After trying your V4 patch series, I can tell that first, the endless 
  scheduling of
  the mcast task is indeed over, but still, the multicast functionality in 
  ipoib is unstable.
 
  Is this worse than 3.18?  (Have you tested that?)
 
 Roland, Doug,
 
 To be fully clear here by this we're talking on seven patches of
 complexity and volume which I think go way beyond post -rc5 timeline:
 
   IB/ipoib: Fix failed multicast joins/sends
   IB/ipoib: Add a helper to restart the multicast task
   IB/ipoib: make delayed tasks not hold up everything
   IB/ipoib: Handle -ENETRESET properly in our callback
   IB/ipoib: don't restart our thread on ENETRESET
   IB/ipoib: remove unneeded locks
   IB/ipoib: fix race between mcast_dev_flush and mcast_join
 
  drivers/infiniband/ulp/ipoib/ipoib.h   |   1 +
  drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 204 
 +++--
  2 files changed, 121 insertions(+), 84 deletions(-)

I don't usually look at the line count or the patch count, I look at the
changes.  Most of the 7 patches above are very self contained and very
small.  The exception is the final patch and even though it is large,
the net result is that it takes a needlessly complex function and makes
it easier to read, easier to understand, easier to verify that it does
what it's supposed to do, and then fixes a locking issue.  I also look
at testing.  While it's not obvious to the people on this list, I am
testing this rather profusely and so is our internal QE department and a
few others.  I'm really down to only the rmmod race (which I suspect the
7 patches above did not introduce, but the original 8 patches did).

 Doug, I understand your claim and frustration that with 3.18 and such
 (older kernels) your ifdown/up loop manages to break the driver,

Yes.  Quite easily.

  but
 fixing the driver such that this test works and in the same time
 practically breaking IPv6 and IPv5 multicast introduces a deep
 regression vs. 3.18

You're right.  That's why there are these 7 patches.

  - which as you wrote here, would be wrong to fix
 with such a further big change.
 
 Are you really sure that reverting the offending patch 016d9fb25cd9
 IPoIB: fix MCAST_FLAG_BUSY usage and maybe some more dependent
 related hunks from downstream patches of that series isn't possible?

Positive.  The 8 patches change the semantics of the MCAST_BUSY_FLAG
usage throughout the code.  If you pick some and leave others, you can
end up with parts using the flag one way and parts another and then you
really blow your locking and usage to hell.

 If this is the case, I would suggest that we either revive the review
 on the fix we sent [1] or drop the whole 3.19-rc1 changes. I vote for
 the former.
 
 [1] http://marc.info/?l=linux-rdmam=142064313123254w=2

So, my original 8 patches were broken.  That's easy enough to see.  My
sin in my testing of those patches was that my test methodology was
too myopic.  I focused on the ifup/ifdown loop scenario, and didn't test
other things that might be effected.  Similarly, if people are only
testing this latest set of code one way (for example, using rmmod/insmod
as the means to force the connection up and down and test the
joins/leaves), that can be similarly myopic.  This patch from Erez is
horribly broken for cases other than rmmod/insmod testing.  I'm guessing
you didn't see it because you never tested any other way, like using
ifup/ifdown or pulling a cable or turning a switch port off and back on.
If you had, there would not be a question of using it, at least as it
stands.

Look, smaller is not always better.  A fix that seems to work and is
smaller, but renders the code almost unintelligible because it makes the
code internally inconsistent between various areas is not a preferable
patch to a larger patch that fixes things right and makes the overall
code consistent and understandable.

  Because Doug's changes fixed some bad, easy-to-reproduce issues.  On
  the other hand we don't want to introduce new regressions to fix the
  old issues.
 
 See above, we did introduced regressions.

Which are already fixed by this 7 patch patchset.  The one remaining
issue isn't actually an ipv6 or ipv4 multicast issue, but specifically
an issue related to locking around rmmod/insmod and that we are allowing
ourselves to get into an inconsistent state during these cycles which
then effects later running (for instance, we might not clean out all of
our multicast joins before rmmod on ipoib succeeds, so on the next
insmod and attempt to join that same multicast group, the ib_sa layer
multicast membership state no longer agrees with us and we have
problems).  I've got to fix that.  But if you run your tests without
rmmod/insmod being how you bring the link up/down, I think 

Re: [PATCH FIX For-3.19 v4 0/7] IB/ipoib: follow fixes for multicast handling

2015-01-21 Thread Roland Dreier
On Wed, Jan 21, 2015 at 12:34 PM, Or Gerlitz gerlitz...@gmail.com wrote:
 Because Doug's changes fixed some bad, easy-to-reproduce issues.  On
 the other hand we don't want to introduce new regressions to fix the
 old issues.

 See above, we did introduced regressions.

Yes, I know, that's my whole point.

We need to fix the current 3.19-rc code, and the two choices are to
keep the fixes we added during 3.19 or revert back to 3.18.

Doug's opinion is that your proposed fix is broken, and we don't have
an alternate fix.

So I suggest we revert the whole series from 3.19 and get this right for 3.20.

 - R.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH FIX For-3.19 v4 0/7] IB/ipoib: follow fixes for multicast handling

2015-01-20 Thread Doug Ledford
On Tue, 2015-01-20 at 18:16 +0200, Erez Shitrit wrote:
 On 1/20/2015 5:58 AM, Doug Ledford wrote:
  These patches are to resolve issues created by my previous patch set.
  While that set worked fine in my testing, there were problems with
  multicast joins after the initial set of joins had completed.  Since my
  testing relied upon the normal set of multicast joins that happen
  when the interface is first brought up, I missed those problems.
 
  Symptoms vary from failure to send packets due to a failed join, to
  loss of connectivity after a subnet manager restart, to failure
  to properly release multicast groups on shutdown resulting in hangs
  when the mlx4 driver attempts to unload itself via its reboot
  notifier handler.
 
  This set of patches has passed a number of tests above and beyond my
  original tests.  As suggested by Or Gerlitz I added IPv6 and IPv4
  multicast tests.  I also added both subnet manager restarts and
  manual shutdown/restart of individual ports at the switch in order to
  ensure that the ENETRESET path was properly tested.  I included
  testing, then a subnet manager restart, then a quiescent period for
  caches to expire, then restarting testing to make sure that arp and
  neighbor discovery work after the subnet manager restart.
 
  All in all, I have not been able to trip the multicast joins up any
  longer.
 
  Additionally, the original impetus for my first 8 patch set was that
  it was simply too easy to break the IPoIB subsystem with this simple
  loop:
 
  while true; do
   ifconfig ib0 up
   ifconfig ib0 down
  done
 
  Just to be safe, I made sure this problem did not resurface.
 
  Roland, the 3.19-rc code is broken.  We either need to revert my
  original patchset, or grab these, but I would not recommend leaving
  it as it currently stands.
 
  Doug Ledford (7):
 IB/ipoib: Fix failed multicast joins/sends
 IB/ipoib: Add a helper to restart the multicast task
 IB/ipoib: make delayed tasks not hold up everything
 IB/ipoib: Handle -ENETRESET properly in our callback
 IB/ipoib: don't restart our thread on ENETRESET
 IB/ipoib: remove unneeded locks
 IB/ipoib: fix race between mcast_dev_flush and mcast_join
 
drivers/infiniband/ulp/ipoib/ipoib.h   |   1 +
drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 204 
  +++--
2 files changed, 121 insertions(+), 84 deletions(-)
 
 Hi Doug,
 
 After trying your V4 patch series, I can tell that first, the endless 
 scheduling of
 the mcast task is indeed over,

Good.

  but still, the multicast functionality in 
 ipoib is unstable.

I'm not seeing that here.  Let's try to figure out what's different.

 I see that there are times that ping6 works good, and sometimes it 
 doesn't, to
 make it clear I always use the link-local address assigned by the stack 
 to the
 IPoIB device, see [1] below for how I run it.

As do I.  I'll attach the scripts I used to run it for your reference.

 I also see that send-only mcast stops working from time to time, see [2] 
 below
 for how I run this. I can narrow the problem to be on the sender 
 (client) side,
 since I work with a peer node which has well functioning IPoIB multicast 
 code.

I don't think the peer side really denotes a conclusive argument ;-)

 One more phenomena, that in some cases I can see that the driver (after the
 mcast_debug_level is set) prints endless message:
 ib0: no address vector, but multicast join already started

OK, this is to be expected from your tests I think.  In particular, this
is generated by mcast_send() if it's called by your program while the
send only join has not yet completed.  The flow goes like this:

First packet after interface comes up:
mcast_send() - ipoib_mcast_alloc() - ipoib_mcast_add() - schedule join task 
thread

In a different thread:
mcast_join_task()
  find unjoined mcast group
  mark mcast-flags with IPOIB_MCAST_FLAG BUSY
  - mcast_join()
 send join request over the wire

Back on original thread context:
mcast_send()
  this time we find a matching mcast entry but mcast-ah is NULL
  queue packet, unless backlog is full and then drop packet
  if mcast-flags  IPOIB_MCAST_FLAG_BUSY, emit notice that you see

In a different thread:
mcast_sendonly_join_complete() -
mcast_join_finish()
  set mcast-ah
  send skb backlog queue
  clear IPOIB_MCAST_FLAG_BUSY

Back on original thread context:
mcast_send()
  now we find the mcast entry, and we find the mcast-ah entry, so
  sends now proceed as expected with no messages, and any lost packets
  while waiting on mcast-ah 

[PATCH FIX For-3.19 v4 0/7] IB/ipoib: follow fixes for multicast handling

2015-01-19 Thread Doug Ledford
These patches are to resolve issues created by my previous patch set.
While that set worked fine in my testing, there were problems with
multicast joins after the initial set of joins had completed.  Since my
testing relied upon the normal set of multicast joins that happen
when the interface is first brought up, I missed those problems.

Symptoms vary from failure to send packets due to a failed join, to
loss of connectivity after a subnet manager restart, to failure
to properly release multicast groups on shutdown resulting in hangs
when the mlx4 driver attempts to unload itself via its reboot
notifier handler.

This set of patches has passed a number of tests above and beyond my
original tests.  As suggested by Or Gerlitz I added IPv6 and IPv4
multicast tests.  I also added both subnet manager restarts and
manual shutdown/restart of individual ports at the switch in order to
ensure that the ENETRESET path was properly tested.  I included
testing, then a subnet manager restart, then a quiescent period for
caches to expire, then restarting testing to make sure that arp and
neighbor discovery work after the subnet manager restart.

All in all, I have not been able to trip the multicast joins up any
longer.

Additionally, the original impetus for my first 8 patch set was that
it was simply too easy to break the IPoIB subsystem with this simple
loop:

while true; do
ifconfig ib0 up
ifconfig ib0 down
done

Just to be safe, I made sure this problem did not resurface.

Roland, the 3.19-rc code is broken.  We either need to revert my
original patchset, or grab these, but I would not recommend leaving
it as it currently stands.

Doug Ledford (7):
  IB/ipoib: Fix failed multicast joins/sends
  IB/ipoib: Add a helper to restart the multicast task
  IB/ipoib: make delayed tasks not hold up everything
  IB/ipoib: Handle -ENETRESET properly in our callback
  IB/ipoib: don't restart our thread on ENETRESET
  IB/ipoib: remove unneeded locks
  IB/ipoib: fix race between mcast_dev_flush and mcast_join

 drivers/infiniband/ulp/ipoib/ipoib.h   |   1 +
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 204 +++--
 2 files changed, 121 insertions(+), 84 deletions(-)

-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html