On Wed, 2015-01-14 at 18:02 +0200, Erez Shitrit wrote:
> Hi Doug,
> 
> Perhaps I am missing something here, but ping6 still doesn't work for me 
> in many cases.
> 
> I think the reason is that your original patch does the following
> in function ipoib_mcast_join_task:
>          if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags))
>              ipoib_mcast_sendonly_join(mcast);
>          else
>              ipoib_mcast_join(dev, mcast, 1);
>          return;
> The sendonly_join flow does not reschedule the mc_task, so only the
> first mcg in the list (if it is a sendonly mcg) will be sent, and none
> of the remaining mcgs in the ipoib mc list will be sent (compare with
> the ipoib_mcast_join flow).

Yes, I know what you are talking about.  However, my patches did not add
this bug; it was present in the original code.  Please check a plain
v3.18 kernel, which does not have my patches, and you will see that
ipoib_mcast_sendonly_join_complete fails to restart the mcast join
thread there as well.

> 
> I can demonstrate it with the log of ipoib:
> I am trying to ping6 fe80::202:c903:9f:3b0a via ib0
> 
> The log is:
> ib0: restarting multicast task
> ib0: setting up send only multicast group for 
> ff12:601b:ffff:0000:0000:0000:0000:0016
> ib0: adding multicast entry for mgid ff12:601b:ffff:0000:0000:0001:ff43:3bf1
> ib0: no multicast record for ff12:601b:ffff:0000:0000:0000:0000:0016, 
> starting sendonly join
> ib0: join completion for ff12:601b:ffff:0000:0000:0000:0000:0001 (status 0)
> ib0: MGID ff12:601b:ffff:0000:0000:0000:0000:0001 AV ffff88081afb5f40, 
> LID 0xc015, SL 0
> ib0: join completion for ff12:401b:ffff:0000:0000:0000:0000:0001 (status 0)
> ib0: MGID ff12:401b:ffff:0000:0000:0000:0000:0001 AV ffff88081e1c42c0, 
> LID 0xc014, SL 0
> ib0: sendonly multicast join failed for 
> ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
> ib0: no multicast record for ff12:601b:ffff:0000:0000:0000:0000:0016, 
> starting sendonly join
> ib0: sendonly multicast join failed for 
> ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
> ib0: no multicast record for ff12:601b:ffff:0000:0000:0000:0000:0016, 
> starting sendonly join
> ib0: sendonly multicast join failed for 
> ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
> ib0: setting up send only multicast group for 
> ff12:601b:ffff:0000:0000:0000:0000:0002
> ib0: no multicast record for ff12:601b:ffff:0000:0000:0000:0000:0016, 
> starting sendonly join
> ib0: sendonly multicast join failed for 
> ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
> ib0: setting up send only multicast group for 
> ff12:601b:ffff:0000:0000:0001:ff9f:3b0a
>      >>>>>> here you can see that the ipv6 address is added and queued 
> to the list
> ib0: no multicast record for ff12:601b:ffff:0000:0000:0000:0000:0016, 
> starting sendonly join
> ib0: sendonly multicast join failed for 
> ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
>      >>>>>> the ipv6 mcg will not be sent because it is queued behind another
> sendonly mcg, and nothing in that flow requeues the mc_task.

This is a problem with the design of the original mcast task thread.
I'm looking at a fix now.  Currently the design only allows one join to
be outstanding at a time.  Is there a reason for that, some historical
context that I'm not aware of?
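
To make the stall concrete, here is a minimal user-space sketch of the
one-join-at-a-time design.  It is not kernel code: every name in it
(model_group, model_dev, model_join_task, model_join_complete,
model_run, requeue_sendonly) is hypothetical and exists only for
illustration, and the join completion is modelled synchronously.  The
only behaviour it borrows from the discussion above is that the task
starts a single join per pass and depends on the completion handler to
be scheduled again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; none of these names are kernel symbols. */
struct model_group {
	bool sendonly;	/* stands in for IPOIB_MCAST_FLAG_SENDONLY */
	bool joined;	/* set once the simulated join has completed */
};

struct model_dev {
	struct model_group *groups;
	size_t ngroups;
	bool task_queued;	/* stands in for the mcast task being scheduled */
	bool requeue_sendonly;	/* false models the behaviour described above */
};

/*
 * Completion handler.  A regular join always reschedules the task; a
 * sendonly join does so only when requeue_sendonly is set.
 */
static void model_join_complete(struct model_dev *dev, struct model_group *g)
{
	g->joined = true;
	if (!g->sendonly || dev->requeue_sendonly)
		dev->task_queued = true;
}

/* One pass of the task: start at most one join, then return. */
static void model_join_task(struct model_dev *dev)
{
	dev->task_queued = false;
	for (size_t i = 0; i < dev->ngroups; i++) {
		struct model_group *g = &dev->groups[i];

		if (g->joined)
			continue;
		model_join_complete(dev, g);	/* completion modelled inline */
		return;
	}
}

/* Run the task until nothing reschedules it; return the joined count. */
static size_t model_run(struct model_dev *dev, size_t max_passes)
{
	size_t joined = 0;

	assert(dev->groups != NULL || dev->ngroups == 0);
	dev->task_queued = true;
	for (size_t pass = 0; pass < max_passes && dev->task_queued; pass++)
		model_join_task(dev);
	for (size_t i = 0; i < dev->ngroups; i++)
		joined += dev->groups[i].joined ? 1 : 0;
	return joined;
}
```

With a list of { sendonly, regular, sendonly } and requeue_sendonly
left false, model_run() stops after the first group, because nothing
reschedules the task once the sendonly completion returns.  Setting
requeue_sendonly to true lets all three groups join.  This only models
the missing reschedule; it says nothing about allowing more than one
join to be outstanding at once.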

-- 
Doug Ledford <dledf...@redhat.com>
              GPG KeyID: 0E572FDD

