On Wed, 2015-01-14 at 18:02 +0200, Erez Shitrit wrote:
> Hi Doug,
>
> Perhaps I am missing something here, but ping6 still doesn't work for me
> in many cases.
>
> I think the reason is that your origin patch does the following:
> in function ipoib_mcast_join_task
>         if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags))
>                 ipoib_mcast_sendonly_join(mcast);
>         else
>                 ipoib_mcast_join(dev, mcast, 1);
>         return;
> The flow for sendonly_join doesn't include handling the mc_task, so only
> the first mc in the list (if it is sendonly mcg) will be sent, and no
> more mcg's that are in the ipoib mc list are going to be sent. (see how
> it is in ipoib_mcast_join flow)
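
The stall described above can be modelled in user space. What follows is only
a rough sketch, not the ipoib code: every name in it (struct mcast_model,
join_task_run, join_complete, run_until_idle, the requeue_sendonly switch) is
hypothetical, and it assumes every join succeeds, which the real log shows is
not always the case. It only illustrates why a completion path that never
re-queues the task leaves the rest of the list unjoined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
enum group_kind { GROUP_FULL, GROUP_SENDONLY };

struct group {
	enum group_kind kind;
	bool joined;
};

struct mcast_model {
	struct group *groups;
	size_t ngroups;
	bool task_queued;      /* stands in for the queued mc_task work item */
	bool requeue_sendonly; /* false models the behaviour described above */
};

/*
 * One pass of the join task: it consumes the queued work item and starts a
 * join for the first group that is not joined yet (one join at a time).
 * Returns the index of that group, or -1 if nothing is left to join.
 */
static int join_task_run(struct mcast_model *m)
{
	m->task_queued = false;
	for (size_t i = 0; i < m->ngroups; i++) {
		if (!m->groups[i].joined)
			return (int)i;
	}
	return -1;
}

/*
 * Completion handler. The full-join path always re-queues the task; the
 * send-only path re-queues it only when requeue_sendonly is set.
 */
static void join_complete(struct mcast_model *m, int idx)
{
	struct group *g = &m->groups[idx];

	g->joined = true; /* assumption: the join always succeeds */
	if (g->kind == GROUP_FULL || m->requeue_sendonly)
		m->task_queued = true;
}

/* Run the task until nothing is queued; return how many groups joined. */
static size_t run_until_idle(struct mcast_model *m)
{
	size_t joined = 0;

	m->task_queued = true;
	while (m->task_queued) {
		int idx = join_task_run(m);

		if (idx < 0)
			break;
		join_complete(m, idx);
	}
	for (size_t i = 0; i < m->ngroups; i++) {
		if (m->groups[i].joined)
			joined++;
	}
	return joined;
}
```

With a list of one send-only group followed by two full groups, calling
run_until_idle() with requeue_sendonly cleared stops after the first group,
while setting it lets all three groups join.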

Yes, I know what you are talking about.  However, my patches did not add
this bug; it was present in the original code.  Please check a plain
v3.18 kernel, which does not have my patches, and you will see that
ipoib_mcast_sendonly_join_complete fails to restart the mcast join
thread there as well.

> I can demonstrate it with the log of ipoib:
> I am trying to ping6 fe80::202:c903:9f:3b0a via ib0
>
> The log is:
> ib0: restarting multicast task
> ib0: setting up send only multicast group for
> ff12:601b:ffff:0000:0000:0000:0000:0016
> ib0: adding multicast entry for mgid ff12:601b:ffff:0000:0000:0001:ff43:3bf1
> ib0: no multicast record for ff12:601b:ffff:0000:0000:0000:0000:0016,
> starting sendonly join
> ib0: join completion for ff12:601b:ffff:0000:0000:0000:0000:0001 (status 0)
> ib0: MGID ff12:601b:ffff:0000:0000:0000:0000:0001 AV ffff88081afb5f40,
> LID 0xc015, SL 0
> ib0: join completion for ff12:401b:ffff:0000:0000:0000:0000:0001 (status 0)
> ib0: MGID ff12:401b:ffff:0000:0000:0000:0000:0001 AV ffff88081e1c42c0,
> LID 0xc014, SL 0
> ib0: sendonly multicast join failed for
> ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
> ib0: no multicast record for ff12:601b:ffff:0000:0000:0000:0000:0016,
> starting sendonly join
> ib0: sendonly multicast join failed for
> ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
> ib0: no multicast record for ff12:601b:ffff:0000:0000:0000:0000:0016,
> starting sendonly join
> ib0: sendonly multicast join failed for
> ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
> ib0: setting up send only multicast group for
> ff12:601b:ffff:0000:0000:0000:0000:0002
> ib0: no multicast record for ff12:601b:ffff:0000:0000:0000:0000:0016,
> starting sendonly join
> ib0: sendonly multicast join failed for
> ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
> ib0: setting up send only multicast group for
> ff12:601b:ffff:0000:0000:0001:ff9f:3b0a
> >>>>>> here you can see that the ipv6 address is added and queued
> to the list
> ib0: no multicast record for ff12:601b:ffff:0000:0000:0000:0000:0016,
> starting sendonly join
> ib0: sendonly multicast join failed for
> ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
> >>>>>> the ipv6 mcg will not be sent because it is after some other
> sendonly, and no one in that flow re-queue the mc_task again.

This is a problem with the design of the original mcast task thread.
I'm looking at a fix now.  Currently the design only allows one join to
be outstanding at a time.  Is there a reason for that that I'm not aware
of?  Some historical context that I don't know about?

-- 
Doug Ledford <dledf...@redhat.com>
              GPG KeyID: 0E572FDD