On 1/26/2015 12:21 AM, Doug Ledford wrote:
On Sun, 2015-01-25 at 14:54 +0200, Erez Shitrit wrote:
On 1/23/2015 6:52 PM, Doug Ledford wrote:
On Thu, 2015-01-22 at 09:31 -0500, Doug Ledford wrote:
My 8 patch set taken into 3.19 caused some regressions.  This patch
set resolves those issues.

These patches are to resolve issues created by my previous patch set.
While that set worked fine in my testing, there were problems with
multicast joins after the initial set of joins had completed.  Since my
testing relied upon the normal set of multicast joins that happen
when the interface is first brought up, I missed those problems.

Symptoms vary from failure to send packets due to a failed join, to
loss of connectivity after a subnet manager restart, to failure
to properly release multicast groups on shutdown resulting in hangs
when the mlx4 driver attempts to unload itself via its reboot
notifier handler.

This set of patches has passed a number of tests above and beyond my
original tests.  As suggested by Or Gerlitz I added IPv6 and IPv4
multicast tests.  I also added both subnet manager restarts and
manual shutdown/restart of individual ports at the switch in order to
ensure that the ENETRESET path was properly tested.  I also ran a
round of testing, then a subnet manager restart, then a quiescent
period for caches to expire, then restarted testing to make sure that
arp and neighbor discovery work after the subnet manager restart.

All in all, I have not been able to trip the multicast joins up any
longer.

Additionally, the original impetus for my first 8 patch set was that
it was simply too easy to break the IPoIB subsystem with this simple
loop:

while true; do
      ifconfig ib0 up
      ifconfig ib0 down
done

Just to be safe, I made sure this problem did not resurface.

v5: fix an oversight in mcast_restart_task that leaked mcast joins
      fix a failure to flush the ipoib_workqueue on deregister that
      meant we could end up running our code after our device had been
      removed, resulting in an oops
      remove a debug message that could be triggered so fast that the
      kernel printk mechanism would starve out the mcast join task
      thread, resulting in what looked like a mcast failure that was
      really just delayed action


Doug Ledford (10):
    IB/ipoib: fix IPOIB_MCAST_RUN flag usage
    IB/ipoib: Add a helper to restart the multicast task
    IB/ipoib: make delayed tasks not hold up everything
    IB/ipoib: Handle -ENETRESET properly in our callback
    IB/ipoib: don't restart our thread on ENETRESET
    IB/ipoib: remove unneeded locks
    IB/ipoib: fix race between mcast_dev_flush and mcast_join
    IB/ipoib: fix ipoib_mcast_restart_task
    IB/ipoib: flush the ipoib_workqueue on unregister
    IB/ipoib: cleanup a couple debug messages

   drivers/infiniband/ulp/ipoib/ipoib.h           |   1 +
   drivers/infiniband/ulp/ipoib/ipoib_main.c      |   2 +
   drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 234 ++++++++++++++-----------
   3 files changed, 131 insertions(+), 106 deletions(-)

FWIW, a couple different customers have tried a test kernel I built
internally with my patches and I've had multiple reports that all
previously observed issues have been resolved.

Hi Doug,
I still see an issue with the latest version, and as a result no
sendonly or IPv6 traffic is working.
I disagree with your assessment.  See below.

The scenario is fairly simple to reproduce: if there is a sendonly
multicast group that failed to join (the SM refuses, perhaps the group
was closed, etc.),
This is not the case.  "failed to join" implies that the join completed
with an error.  According to the logs below, something impossible is
happening.

  that causes all the other mcgs to be blocked behind it
forever.
You're right.  Because the join tasks are serialized and we only queue
up one join at a time (something I would like to address, but not in
the 3.19 timeframe), if one join simply never returns, as the logs
below indicate, then it blocks up the whole system.

For example, there is a bad mcg ff12:601b:ffff:0000:0000:0000:0000:0016
that the SM refuses to join, and after some time the user tries to send
packets to IP address 225.5.5.5 (mcg:
ff12:401b:ffff:0000:0000:0000:0105:0505).
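
(As an aside, the 225.5.5.5 -> ff12:401b:ffff:0000:0000:0000:0105:0505
mapping above is just the usual IPv4-multicast-to-IPoIB-MGID mapping.
Here is a minimal, self-contained sketch of it, with the 0x401b
signature and the default P_Key 0xffff simply read off the MGID shown
above -- an illustration only, not the kernel's ip_ib_mc_map():

#include <stdint.h>
#include <stdio.h>

static void ipv4_mcast_to_mgid(uint32_t group /* host order */, uint8_t mgid[16])
{
	mgid[0] = 0xff; mgid[1] = 0x12;		/* multicast, link-local scope, as above */
	mgid[2] = 0x40; mgid[3] = 0x1b;		/* IPv4 signature */
	mgid[4] = 0xff; mgid[5] = 0xff;		/* default partition key */
	for (int i = 6; i < 12; i++)
		mgid[i] = 0;
	mgid[12] = (group >> 24) & 0x0f;	/* low 28 bits of the group */
	mgid[13] = (group >> 16) & 0xff;
	mgid[14] = (group >> 8) & 0xff;
	mgid[15] = group & 0xff;
}

int main(void)
{
	uint8_t mgid[16];

	ipv4_mcast_to_mgid(0xe1050505, mgid);	/* 225.5.5.5 */
	for (int i = 0; i < 16; i += 2)
		printf("%02x%02x%s", mgid[i], mgid[i + 1], i < 14 ? ":" : "\n");
	return 0;
}

Running it prints ff12:401b:ffff:0000:0000:0000:0105:0505, matching
the MGID in the log below.)
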
The log will then show something like:
[1561627.426080] ib0: no multicast record for ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
This is printed out by sendonly_join

[1561633.726768] ib0: setting up send only multicast group for ff12:401b:ffff:0000:0000:0000:0105:0505
This is part of mcast_send and indicates we have created the new group
and added it to the mcast rbtree, but not that we've started the join.

[1561643.498990] ib0: no multicast record for ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
Our mcast task got run again, and we are restarting the same group for
some reason.  Theoretically that means we should have cleared the busy
flag on this group somehow, which requires that sendonly_join_complete
must have been called.  However, I can't find a single code path in
sendonly_join_complete that could possibly clear out the busy flag on
this group and not result in a printout message of some sort.  Are you
trimming the list of messages here?  The reason I say this is that
these are the possibilities:

- sendonly_join_complete with status == -ENETRESET: we silently drop
  out, but we would immediately get an ib_event with netreset and
  process a flush, which would print out copious messages.
- sendonly_join_complete with status == 0: we would call
  mcast_join_finish, and the only silent return is if this is our
  broadcast group and priv->broadcast is not set, in which case we
  don't have link up; the silent failures and the failure to join any
  other address are then normal and expected, since the ipoib layer
  considers things dead without the broadcast group joined.  Every
  other return path will print out something, and in particular the
  attempt to get an ah on the group will always print out either
  success or failure.
- sendonly_join_complete with any other status: this results in a
  debug message about the join failing.
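
To make those three paths concrete, here is a small self-contained
model (illustrative only; hypothetical names, not the actual driver
code) of the branching described above -- every path except the
"broadcast group not yet joined" case leaves something in the log:

#include <errno.h>
#include <stdio.h>

static int sendonly_join_complete_model(int status, int broadcast_joined)
{
	if (status == -ENETRESET)
		return 0;	/* silent here, but the ensuing flush prints plenty */

	if (status == 0) {
		if (!broadcast_joined)
			return 0;	/* the one truly silent path: link not up yet */
		/* mcast_join_finish(): getting an ah on the group always
		 * logs either success or failure */
		printf("ib0: Created ah ... (or an error message)\n");
		return 0;
	}

	/* any other status */
	printf("ib0: sendonly multicast join failed, status %d\n", status);
	return status;
}

int main(void)
{
	sendonly_join_complete_model(-EINVAL, 1);	/* logs a failure */
	sendonly_join_complete_model(0, 1);		/* logs the ah result */
	sendonly_join_complete_model(0, 0);		/* silent: no broadcast yet */
	return 0;
}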

As far as I know, your setup has the IPv4 broadcast attached and the
link is up.  The group you are actually showing as bad is the IPv6
MLDv2-capable Routers group and failure to join it should not be
blocking anything else.  So I'm back to what I said before: unless you
are trimming messages, this debug output makes no sense whatsoever and
should be impossible to generate.

A new (and full) dmesg is attached (taken after modprobe ib_ipoib,
with all debug flags set); it is all there.

The case is simple: ping6.
On device A:
# ip addr show ib0
19: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP qlen 256 link/infiniband a0:04:02:10:fe:80:00:00:00:00:00:00:00:02:c9:03:00:9f:3b:0a brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    inet 11.134.33.1/8 brd 11.255.255.255 scope global ib0
    inet6 fe80::202:c903:9f:3b0a/64 scope link
       valid_lft forever preferred_lft forever

On device B, where I have the latest version:
#ping6 fe80::202:c903:9f:3b0a -I ib0


Are you sure you have a clean tree with no possible merge oddities
that are throwing your testing off?

I double-checked: a fresh git tree with the last 10 patches,
recompiled all modules, rebooted, etc.  The same issue.

#git log --oneline
362509a IB/ipoib: cleanup a couple debug messages
e8e1618 IB/ipoib: flush the ipoib_workqueue on unregister
867555c IB/ipoib: fix ipoib_mcast_restart_task
125a466 IB/ipoib: fix race between mcast_dev_flush and mcast_join
9b0908b IB/ipoib: remove unneeded locks
c2fb665 IB/ipoib: don't restart our thread on ENETRESET
2345213 IB/ipoib: Handle -ENETRESET properly in our callback
6ed68e8 IB/ipoib: make delayed tasks not hold up everything
bee92be5 IB/ipoib: Add a helper to restart the multicast task
2084aee IB/ipoib: fix IPOIB_MCAST_RUN flag usage
a7cfef2 Merge branches 'core', 'cxgb4', 'ipoib', 'iser', 'mlx4', 'ocrdma', 'odp' and 'srp' into for-next
....
....



[1561675.645424] ib0: no multicast record for ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[1561691.718464] ib0: no multicast record for ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[1561707.791609] ib0: no multicast record for ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[1561723.864839] ib0: no multicast record for ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[1561739.937981] ib0: no multicast record for ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[1561756.010895] ib0: no multicast record for ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
These are all being queued with successively longer backoffs, exactly as
they should be.

In __ipoib_mcast_continue_join_thread, when we get called with delay !=
0, we end up doing two things.  First, we change mcast->backoff from 1
to whatever it should be now (minimum of 2, maximum of 16), and we set
delay_until to jiffies + (HZ * mcast->backoff).  This lets the mcast
task know to skip this group until its backoff has expired.  From the
messages above, that seems to be working.  Second, we don't queue up
the mcast thread just once, we queue it up twice: once at the right
time for the backoff, and again now.  (There is a bug here: the second
instance queues it up with delay as the time offset when it should be
0.  That's a holdover from before; fortunately delay is only ever 1 or
0, so it's not horrible, but things would go a little faster if I fix
it so the second instance is always 0.)  That way the instance that's
queued up first will know to skip the delayed group and continue on
through the list, and that should allow us to get to these other
groups that you say are blocked.
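
To spell that out, here is a small self-contained model of the
behaviour just described (hypothetical names, not the driver code, and
with the second queueing already using the fixed delay of 0):

#include <stdio.h>

#define HZ		1000	/* pretend jiffies per second */
#define MAX_BACKOFF	16	/* seconds */

struct mcast_model {
	unsigned int backoff;		/* seconds, starts at 1 */
	unsigned long delay_until;	/* jiffies; the task skips the group until then */
};

static void continue_join_thread_model(struct mcast_model *m,
				       unsigned long jiffies, int delay)
{
	if (!delay) {
		printf("queue join task now\n");
		return;
	}

	/* grow the backoff: minimum 2, maximum 16 seconds */
	m->backoff *= 2;
	if (m->backoff < 2)
		m->backoff = 2;
	if (m->backoff > MAX_BACKOFF)
		m->backoff = MAX_BACKOFF;
	m->delay_until = jiffies + (unsigned long)HZ * m->backoff;

	/* queue the task twice: once when this group's backoff expires ... */
	printf("queue join task in %u s for the delayed group\n", m->backoff);
	/* ... and once right now, so the rest of the list is not held up */
	printf("queue join task now (delay 0) for the remaining groups\n");
}

int main(void)
{
	struct mcast_model m = { .backoff = 1, .delay_until = 0 };

	/* a join that keeps failing: the backoff goes 2, 4, 8, 16, 16 ... */
	for (int i = 0; i < 5; i++)
		continue_join_thread_model(&m, (unsigned long)i * HZ, 1);
	return 0;
}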

....
forever, or until the SM decides in the future to let
ff12:601b:ffff:0000:0000:0000:0000:0016 join; until then, no sendonly
traffic.


The main cause is that the concept was broken for the send-only join:
you treat the sendonly group like a regular mcg and add it to the mc
list and to the mc_task, etc.
I'm looking at 3.18 right now, and 3.18 adds sendonly groups to the mcg
just like my current code does.  The only difference, and I do mean
*only*, is that it calls sendonly_join directly instead of via the
mcast_task.

Yes, and I already wrote that it is more than just "only"; it changed
the concept of the sendonly mc packet.

The net effect of that is that sendonly joins are not serialized in
the old version and they are in the new.  I'm more than happy to
deserialize them, but I don't think that necessarily means we have to
call sendonly_join directly from mcast_send; it could also be done by
allowing the mcast_task to fire off more than one join at a time, as
sketched below.  Whether we use the mcast_task or call sendonly_join
directly is inconsequential; it's more an issue of whether or not we
serialize the joins.
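
Purely as an illustration of the "more than one join at a time" idea
(hypothetical names, not code from either version of the driver), an
in-flight counter would let the task keep walking the multicast list
instead of stopping at the first group with an outstanding join:

#include <stdio.h>

#define MAX_INFLIGHT_JOINS 4	/* arbitrary illustrative cap */

struct join_sched_model {
	int inflight;	/* joins issued but not yet completed */
};

/* called from the (modelled) multicast task for each group needing a join */
static int try_issue_join(struct join_sched_model *s)
{
	if (s->inflight >= MAX_INFLIGHT_JOINS)
		return 0;	/* leave this group for the next pass */
	s->inflight++;		/* the real join would be issued here */
	return 1;
}

/* called from the (modelled) join completion handler */
static void join_done(struct join_sched_model *s)
{
	s->inflight--;	/* and re-queue the task to pick up skipped groups */
}

int main(void)
{
	struct join_sched_model s = { 0 };

	for (int i = 0; i < 6; i++)
		printf("group %d: %s\n", i,
		       try_issue_join(&s) ? "join issued" : "deferred");
	join_done(&s);	/* one completion frees a slot for the next pass */
	return 0;
}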

(not considering other issues like packet drops and the bandwidth of
the sendonly traffic)  IMHO, sendonly is part of the TX flow, as it
was until now in the ipoib driver, and should stay that way.
It seems like you're thinking of the mcast task thread as something that
only runs as part of mcast joins or leaves when the upper layer calls
set multicast list in our driver.  It used to be that way, but since I
reworked the task thread, that's no longer the case.  As such, I don't
think this distinction you are drawing is necessarily still correct.

I went over your comments on my patch and will try to respond to /
cover them ASAP.

Thanks, Erez








[ 2425.913245] ib0: bringing up interface
[ 2425.918035] ib0: starting multicast thread
[ 2425.922260] IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready
[ 2425.928542] ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
[ 2425.935065] ib0: restarting multicast task
[ 2425.939228] ib0: adding multicast entry for mgid 
ff12:601b:ffff:0000:0000:0000:0000:0001
[ 2425.939335] ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff 
(status 0)
[ 2425.955538] ib0: adding multicast entry for mgid 
ff12:401b:ffff:0000:0000:0000:0000:0001
[ 2425.964891] ib0: Created ah ffff8800ca0b24c0
[ 2425.969231] ib0: MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff AV 
ffff8800ca0b24c0, LID 0xc000, SL 0
[ 2425.978434] ib0: joining MGID ff12:601b:ffff:0000:0000:0000:0000:0001
[ 2425.985613] ib0: joining MGID ff12:401b:ffff:0000:0000:0000:0000:0001
[ 2425.992595] IPv6: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
[ 2425.998967] ib0: restarting multicast task
[ 2426.001183] ib0: setting up send only multicast group for 
ff12:601b:ffff:0000:0000:0000:0000:0016
[ 2426.012123] ib0: adding multicast entry for mgid 
ff12:601b:ffff:0000:0000:0001:ff43:3bf1
[ 2426.021059] ib0: no multicast record for 
ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[ 2426.330297] ib0: join completion for ff12:601b:ffff:0000:0000:0000:0000:0001 
(status 0)
[ 2426.339054] ib0: Created ah ffff8800ca0b25c0
[ 2426.343389] ib0: MGID ff12:601b:ffff:0000:0000:0000:0000:0001 AV 
ffff8800ca0b25c0, LID 0xc016, SL 0
[ 2426.353901] ib0: join completion for ff12:401b:ffff:0000:0000:0000:0000:0001 
(status 0)
[ 2426.362984] ib0: Created ah ffff88041f104440
[ 2426.367316] ib0: MGID ff12:401b:ffff:0000:0000:0000:0000:0001 AV 
ffff88041f104440, LID 0xc014, SL 0
[ 2426.376487] ib0: sendonly multicast join failed for 
ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
[ 2426.386032] ib0: joining MGID ff12:601b:ffff:0000:0000:0001:ff43:3bf1
[ 2426.393162] ib0: join completion for ff12:601b:ffff:0000:0000:0001:ff43:3bf1 
(status 0)
[ 2426.402013] ib0: Created ah ffff88041f115340
[ 2426.406349] ib0: MGID ff12:601b:ffff:0000:0000:0001:ff43:3bf1 AV 
ffff88041f115340, LID 0xc018, SL 0
[ 2426.415567] ib0: successfully joined all multicast groups
[ 2426.421610] ib0: successfully joined all multicast groups
[ 2427.037903] ib0: setting up send only multicast group for 
ff12:601b:ffff:0000:0000:0000:0000:0002
[ 2427.047677] ib0: no multicast record for 
ff12:601b:ffff:0000:0000:0000:0000:0002, starting sendonly join
[ 2427.057362] ib0: sendonly multicast join failed for 
ff12:601b:ffff:0000:0000:0000:0000:0002, status -22
[ 2429.073774] ib0: no multicast record for 
ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[ 2429.083391] ib0: sendonly multicast join failed for 
ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
[ 2433.100207] ib0: no multicast record for 
ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[ 2433.109820] ib0: sendonly multicast join failed for 
ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
[ 2434.594637] ib0: setting up send only multicast group for 
ff12:601b:ffff:0000:0000:0001:ff9f:3b0a
[ 2441.136671] ib0: no multicast record for 
ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[ 2441.146279] ib0: sendonly multicast join failed for 
ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
[ 2457.210044] ib0: no multicast record for 
ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[ 2457.219652] ib0: sendonly multicast join failed for 
ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
[ 2473.282869] ib0: no multicast record for 
ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[ 2473.292482] ib0: sendonly multicast join failed for 
ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
[ 2489.356138] ib0: no multicast record for 
ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[ 2489.365749] ib0: sendonly multicast join failed for 
ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
[ 2505.429325] ib0: no multicast record for 
ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[ 2505.438941] ib0: sendonly multicast join failed for 
ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
[ 2516.191339] ib0: neigh free for ffffff 
ff12:601b:ffff:0000:0000:0001:ff43:3bf1
[ 2521.502141] ib0: no multicast record for 
ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[ 2521.511744] ib0: sendonly multicast join failed for 
ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
[ 2537.575375] ib0: no multicast record for 
ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[ 2537.584980] ib0: sendonly multicast join failed for 
ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
[ 2553.648531] ib0: no multicast record for 
ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[ 2553.658137] ib0: sendonly multicast join failed for 
ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
[ 2569.721539] ib0: no multicast record for 
ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[ 2569.731143] ib0: sendonly multicast join failed for 
ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
[ 2585.794694] ib0: no multicast record for 
ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[ 2585.804307] ib0: sendonly multicast join failed for 
ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
[ 2601.867993] ib0: no multicast record for 
ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[ 2601.877600] ib0: sendonly multicast join failed for 
ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
[ 2617.940929] ib0: no multicast record for 
ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[ 2617.950538] ib0: sendonly multicast join failed for 
ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
[ 2634.014165] ib0: no multicast record for 
ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[ 2634.023776] ib0: sendonly multicast join failed for 
ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
[ 2650.087367] ib0: no multicast record for 
ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[ 2650.096972] ib0: sendonly multicast join failed for 
ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
[ 2666.160196] ib0: no multicast record for 
ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[ 2666.169811] ib0: sendonly multicast join failed for 
ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
[ 2682.233359] ib0: no multicast record for 
ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[ 2682.242969] ib0: sendonly multicast join failed for 
ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
[ 2698.306545] ib0: no multicast record for 
ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[ 2698.316151] ib0: sendonly multicast join failed for 
ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
[ 2714.379507] ib0: no multicast record for 
ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[ 2730.452688] ib0: no multicast record for 
ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[ 2746.525811] ib0: no multicast record for 
ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[ 2762.598925] ib0: no multicast record for 
ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[ 2778.672146] ib0: no multicast record for 
ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[ 2794.745171] ib0: no multicast record for 
ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[ 2810.818171] ib0: no multicast record for 
ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[ 2826.891541] ib0: no multicast record for 
ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[ 2842.964546] ib0: no multicast record for 
ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
