Re: [PATCH V3 FIX For-3.19 0/3] IB/ipoib: Fix multicast join flow

2015-01-15 Thread Doug Ledford
On Thu, 2015-01-15 at 09:19 +, Erez Shitrit wrote:
 Hi Doug,
 
 Thank you for the quick response.
 
 Now I can see 2 issues that I want to draw your attention to:
 
 1. If there is an mcg that the driver failed to join, the mc_task enters an 
 endless re-queue loop, and the log fills with the following messages:
 [682560.569826] ib0: no multicast record for 
 ff12:601b::::::0016, starting sendonly join
 [682560.580136] ib0: no multicast record for 
 ff12:601b::::::0016, starting sendonly join
 [682560.590364] ib0: no multicast record for 
 ff12:601b::::::0016, starting sendonly join
 [682560.600504] ib0: no multicast record for 
 ff12:601b::::::0016, starting sendonly join
 [682560.610627] ib0: no multicast record for 
 ff12:601b::::::0016, starting sendonly join
 [682560.620769] ib0: no multicast record for 
 ff12:601b::::::0016, starting sendonly join
 [682560.631082] ib0: no multicast record for 
 ff12:601b::::::0016, starting sendonly join
 [682560.640835] ib0: sendonly multicast join failed for 
 ff12:601b::::::0016, status -22
 [682560.651033] ib0: no multicast record for 
 ff12:601b::::::0016, starting sendonly join
 [682560.660758] ib0: sendonly multicast join failed for 
 ff12:601b::::::0016, status -22
 [682560.670923] ib0: no multicast record for 
 ff12:601b::::::0016, starting sendonly join
 [682560.680676] ib0: sendonly multicast join failed for 
 ff12:601b::::::0016, status -22
 [682560.690898] ib0: no multicast record for 
 ff12:601b::::::0016, starting sendonly join
 [682560.700630] ib0: sendonly multicast join failed for 
 ff12:601b::::::0016, status -22
 
 around 100 times a sec.

OK, this looks like the send-only joins that fail are not setting a
fallback properly or something like that.  There is a separate bug that
I've isolated that I'm going to fix; then we can see if that fix
affects things here, as it very well might.
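
For illustration only: one common way to keep a failing join from being re-queued continuously is a capped exponential backoff between attempts. The following is a minimal user-space sketch in C, not the ipoib driver code; `struct mcast_group`, `mcast_group_init()`, `join_complete()`, `count_attempts()` and the 16-second cap are hypothetical names and values chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cap on the retry delay, in seconds (an assumption of this sketch). */
#define MAX_BACKOFF_SECONDS 16

/* Hypothetical per-group state; this is not the driver's struct ipoib_mcast. */
struct mcast_group {
    unsigned int backoff;    /* seconds to wait before the next join attempt */
    unsigned long retry_at;  /* earliest time (in seconds) of the next attempt */
    bool joined;
};

static void mcast_group_init(struct mcast_group *g)
{
    g->backoff = 1;
    g->retry_at = 0;
    g->joined = false;
}

/* Record the result of a join attempt that was made at time `now`. */
static void join_complete(struct mcast_group *g, int status, unsigned long now)
{
    if (status == 0) {
        g->joined = true;
        g->backoff = 1;
        return;
    }
    /* Failure: wait out the current backoff, then double it up to the cap. */
    g->retry_at = now + g->backoff;
    g->backoff *= 2;
    if (g->backoff > MAX_BACKOFF_SECONDS)
        g->backoff = MAX_BACKOFF_SECONDS;
}

/*
 * Simulate `seconds` one-second ticks in which every join attempt completes
 * with `status`, and return how many attempts were made.
 */
static unsigned int count_attempts(struct mcast_group *g, int status,
                                   unsigned long seconds)
{
    unsigned int attempts = 0;

    for (unsigned long now = 0; now < seconds; now++) {
        if (g->joined || now < g->retry_at)
            continue;  /* nothing to do yet for this group */
        attempts++;
        join_complete(g, status, now);
    }
    return attempts;
}
```

With a join that always fails with status -22, `count_attempts()` over 60 simulated seconds makes 7 attempts (at t = 0, 1, 3, 7, 15, 31 and 47) rather than retrying on every pass.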

 2. IPv6 still doesn't work for me, in the same case where it is not the first 
 mcg in the list.

Can you give me some sort of instructions on how to replicate your
testing?  Things are working for me here, but I don't have a complex
IPv6 setup and mine may be too simple to reproduce what you are seeing.

 Thanks, Erez
 
 -Original Message-
 From: Doug Ledford [mailto:dledf...@redhat.com] 
 Sent: Wednesday, January 14, 2015 9:53 PM
 To: linux-rdma@vger.kernel.org; rol...@kernel.org
 Cc: Amir Vadai; Eyal Perry; Erez Shitrit; Or Gerlitz; Doug Ledford
 Subject: [PATCH V3 FIX For-3.19 0/3] IB/ipoib: Fix multicast join flow
 
 This patch series fixes the multicast join behavior problems introduced by my 
 previous patchset.  In particular, the original code did not use the send 
 only join code from the multicast thread context, and so it did not need to 
 restart the multicast thread.  After my previous patchset, it does get called 
 from the thread context, and so the send only join completion areas need to 
 restart the join thread but they don't.  This patchset makes them do so.  It 
 then adds in some cleanups for restarting the thread, and fixes the fact that 
 one delayed join holds up the entire list of joins.
 
 v3: Resend because the last send didn't register in patchworks properly
 (because the subject-prefix was not on all of the emails, only the
 first) and because the Cc: list did not pass from cover letter
 to patches
 
 v2: Added two new patches, the first creates a helper to restart the
 multicast join thread and also adds using it in the two places where
 it should have been used but wasn't, the second allows the joins to
 proceed around a delayed join instead of stalling everything.
 
 v1: Addressed the usage of the IPOIB_MCAST_RUN flag
 
 Doug Ledford (3):
   IB/ipoib: Fix failed multicast joins/sends
   IB/ipoib: Add a helper to restart the multicast task
   IB/ipoib: make delayed tasks not hold up everything
 
  drivers/infiniband/ulp/ipoib/ipoib.h   |  1 +
  drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 94 ++
  2 files changed, 66 insertions(+), 29 deletions(-)
 
 --
 2.1.0
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-rdma in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


-- 
Doug Ledford dledf...@redhat.com
  GPG KeyID: 0E572FDD






Re: [PATCH V3 FIX For-3.19 0/3] IB/ipoib: Fix multicast join flow

2015-01-15 Thread Doug Ledford
On Thu, 2015-01-15 at 22:08 +0200, Erez Shitrit wrote:
 On 1/15/2015 5:24 PM, Doug Ledford wrote:
  On Thu, 2015-01-15 at 09:19 +, Erez Shitrit wrote:
  Hi Doug,
 
  Thank you for the quick response.
 
  Now I can see 2 issues that I want to draw your attention to:
 
  1. If there is an mcg that the driver failed to join, the mc_task enters an 
  endless re-queue loop, and the log fills with the following messages:
  [682560.569826] ib0: no multicast record for 
  ff12:601b::::::0016, starting sendonly join
  [682560.580136] ib0: no multicast record for 
  ff12:601b::::::0016, starting sendonly join
  [682560.590364] ib0: no multicast record for 
  ff12:601b::::::0016, starting sendonly join
  [682560.600504] ib0: no multicast record for 
  ff12:601b::::::0016, starting sendonly join
  [682560.610627] ib0: no multicast record for 
  ff12:601b::::::0016, starting sendonly join
  [682560.620769] ib0: no multicast record for 
  ff12:601b::::::0016, starting sendonly join
  [682560.631082] ib0: no multicast record for 
  ff12:601b::::::0016, starting sendonly join
  [682560.640835] ib0: sendonly multicast join failed for 
  ff12:601b::::::0016, status -22
  [682560.651033] ib0: no multicast record for 
  ff12:601b::::::0016, starting sendonly join
  [682560.660758] ib0: sendonly multicast join failed for 
  ff12:601b::::::0016, status -22
  [682560.670923] ib0: no multicast record for 
  ff12:601b::::::0016, starting sendonly join
  [682560.680676] ib0: sendonly multicast join failed for 
  ff12:601b::::::0016, status -22
  [682560.690898] ib0: no multicast record for 
  ff12:601b::::::0016, starting sendonly join
  [682560.700630] ib0: sendonly multicast join failed for 
  ff12:601b::::::0016, status -22
 
  around 100 times a sec.
  OK, this looks like the send-only joins that fail are not setting a
  fallback properly or something like that.  There is a separate bug that
  I've isolated that I'm going to fix; then we can see if that fix
  affects things here, as it very well might.
 
  2. IPv6 still doesn't work for me, in the same case where it is not the 
  first mcg in the list.
  Can you give me some sort of instructions on how to replicate your
  testing?  Things are working for me here, but I don't have a complex
  IPv6 setup and mine may be too simple to reproduce what you are seeing.
 I don't have a complex setup. I have 2 devices, and I do a regular ping6 
 from the device with the full series applied to some other device. Nothing 
 special; the only thing I can say is that there is one sendonly mcg 
 (ff12:601b::::::0016) in first place in the list.
 Anyway, I think it is connected to the first issue: because it is stuck in an 
 endless loop with the first mcg, it never gets the chance to handle the other 
 mcgs.

OK, well, I have this all working here.  However, there is still one
lingering issue (not reported on this thread yet) that needs to be
addressed, so I don't yet consider the patchset complete.  But I'll post
it as it stands so far for you to try your tests again.

The outstanding issue is that it is possible for ipoib_mcast_flush_dev
to race with ipoib_mcast_join and cause ipoib_mcast_join to oops.  It's
rare, I've only seen it once, but I was afraid that it was possible by
looking at the code, and now I have confirmation that it is indeed
possible.  So, it needs to be fixed.

 
  Thanks, Erez
 
  -Original Message-
  From: Doug Ledford [mailto:dledf...@redhat.com]
  Sent: Wednesday, January 14, 2015 9:53 PM
  To: linux-rdma@vger.kernel.org; rol...@kernel.org
  Cc: Amir Vadai; Eyal Perry; Erez Shitrit; Or Gerlitz; Doug Ledford
  Subject: [PATCH V3 FIX For-3.19 0/3] IB/ipoib: Fix multicast join flow
 
  This patch series fixes the multicast join behavior problems introduced by 
  my previous patchset.  In particular, the original code did not use the 
  send only join code from the multicast thread context, and so it did not 
  need to restart the multicast thread.  After my previous patchset, it does 
  get called from the thread context, and so the send only join completion 
  areas need to restart the join thread but they don't.  This patchset makes 
  them do so.  It then adds in some cleanups for restarting the thread, and 
  fixes the fact that one delayed join holds up the entire list of joins.
 
  v3: Resend because the last send didn't register in patchworks properly
   (because the subject-prefix was not on all of the emails, only the
   first) and because the Cc: list did not pass from cover letter
   to patches
 
  v2: Added two new patches, the first creates a helper to restart the
   multicast join thread and also adds using

Re: [PATCH V3 FIX For-3.19 0/3] IB/ipoib: Fix multicast join flow

2015-01-15 Thread Erez Shitrit

On 1/15/2015 5:24 PM, Doug Ledford wrote:

On Thu, 2015-01-15 at 09:19 +, Erez Shitrit wrote:

Hi Doug,

Thank you for the quick response.

Now I can see 2 issues that I want to draw your attention to:

1. If there is an mcg that the driver failed to join, the mc_task enters an 
endless re-queue loop, and the log fills with the following messages:
[682560.569826] ib0: no multicast record for 
ff12:601b::::::0016, starting sendonly join
[682560.580136] ib0: no multicast record for 
ff12:601b::::::0016, starting sendonly join
[682560.590364] ib0: no multicast record for 
ff12:601b::::::0016, starting sendonly join
[682560.600504] ib0: no multicast record for 
ff12:601b::::::0016, starting sendonly join
[682560.610627] ib0: no multicast record for 
ff12:601b::::::0016, starting sendonly join
[682560.620769] ib0: no multicast record for 
ff12:601b::::::0016, starting sendonly join
[682560.631082] ib0: no multicast record for 
ff12:601b::::::0016, starting sendonly join
[682560.640835] ib0: sendonly multicast join failed for 
ff12:601b::::::0016, status -22
[682560.651033] ib0: no multicast record for 
ff12:601b::::::0016, starting sendonly join
[682560.660758] ib0: sendonly multicast join failed for 
ff12:601b::::::0016, status -22
[682560.670923] ib0: no multicast record for 
ff12:601b::::::0016, starting sendonly join
[682560.680676] ib0: sendonly multicast join failed for 
ff12:601b::::::0016, status -22
[682560.690898] ib0: no multicast record for 
ff12:601b::::::0016, starting sendonly join
[682560.700630] ib0: sendonly multicast join failed for 
ff12:601b::::::0016, status -22

around 100 times a sec.

OK, this looks like the send-only joins that fail are not setting a
fallback properly or something like that.  There is a separate bug that
I've isolated that I'm going to fix; then we can see if that fix
affects things here, as it very well might.


2. IPv6 still doesn't work for me, in the same case where it is not the first 
mcg in the list.

Can you give me some sort of instructions on how to replicate your
testing?  Things are working for me here, but I don't have a complex
IPv6 setup and mine may be too simple to reproduce what you are seeing.
I don't have a complex setup. I have 2 devices, and I do a regular ping6 
from the device with the full series applied to some other device. Nothing 
special; the only thing I can say is that there is one sendonly mcg 
(ff12:601b::::::0016) in first place in the list.
Anyway, I think it is connected to the first issue: because it is stuck in an 
endless loop with the first mcg, it never gets the chance to handle the other 
mcgs.




Thanks, Erez

-Original Message-
From: Doug Ledford [mailto:dledf...@redhat.com]
Sent: Wednesday, January 14, 2015 9:53 PM
To: linux-rdma@vger.kernel.org; rol...@kernel.org
Cc: Amir Vadai; Eyal Perry; Erez Shitrit; Or Gerlitz; Doug Ledford
Subject: [PATCH V3 FIX For-3.19 0/3] IB/ipoib: Fix multicast join flow

This patch series fixes the multicast join behavior problems introduced by my 
previous patchset.  In particular, the original code did not use the send only 
join code from the multicast thread context, and so it did not need to restart 
the multicast thread.  After my previous patchset, it does get called from the 
thread context, and so the send only join completion areas need to restart the 
join thread but they don't.  This patchset makes them do so.  It then adds in 
some cleanups for restarting the thread, and fixes the fact that one delayed 
join holds up the entire list of joins.

v3: Resend because the last send didn't register in patchworks properly
 (because the subject-prefix was not on all of the emails, only the
 first) and because the Cc: list did not pass from cover letter
 to patches

v2: Added two new patches, the first creates a helper to restart the
 multicast join thread and also adds using it in the two places where
 it should have been used but wasn't, the second allows the joins to
 proceed around a delayed join instead of stalling everything.

v1: Addressed the usage of the IPOIB_MCAST_RUN flag

Doug Ledford (3):
   IB/ipoib: Fix failed multicast joins/sends
   IB/ipoib: Add a helper to restart the multicast task
   IB/ipoib: make delayed tasks not hold up everything

  drivers/infiniband/ulp/ipoib/ipoib.h   |  1 +
  drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 94 ++
  2 files changed, 66 insertions(+), 29 deletions(-)

--
2.1.0


RE: [PATCH V3 FIX For-3.19 0/3] IB/ipoib: Fix multicast join flow

2015-01-15 Thread Erez Shitrit
Hi Doug,

Thank you for the quick response.

Now I can see 2 issues that I want to draw your attention to:

1. If there is an mcg that the driver failed to join, the mc_task enters an 
endless re-queue loop, and the log fills with the following messages:
[682560.569826] ib0: no multicast record for 
ff12:601b::::::0016, starting sendonly join
[682560.580136] ib0: no multicast record for 
ff12:601b::::::0016, starting sendonly join
[682560.590364] ib0: no multicast record for 
ff12:601b::::::0016, starting sendonly join
[682560.600504] ib0: no multicast record for 
ff12:601b::::::0016, starting sendonly join
[682560.610627] ib0: no multicast record for 
ff12:601b::::::0016, starting sendonly join
[682560.620769] ib0: no multicast record for 
ff12:601b::::::0016, starting sendonly join
[682560.631082] ib0: no multicast record for 
ff12:601b::::::0016, starting sendonly join
[682560.640835] ib0: sendonly multicast join failed for 
ff12:601b::::::0016, status -22
[682560.651033] ib0: no multicast record for 
ff12:601b::::::0016, starting sendonly join
[682560.660758] ib0: sendonly multicast join failed for 
ff12:601b::::::0016, status -22
[682560.670923] ib0: no multicast record for 
ff12:601b::::::0016, starting sendonly join
[682560.680676] ib0: sendonly multicast join failed for 
ff12:601b::::::0016, status -22
[682560.690898] ib0: no multicast record for 
ff12:601b::::::0016, starting sendonly join
[682560.700630] ib0: sendonly multicast join failed for 
ff12:601b::::::0016, status -22

around 100 times a sec.

2. IPv6 still doesn't work for me, in the same case where it is not the first 
mcg in the list.

Thanks, Erez

-Original Message-
From: Doug Ledford [mailto:dledf...@redhat.com] 
Sent: Wednesday, January 14, 2015 9:53 PM
To: linux-rdma@vger.kernel.org; rol...@kernel.org
Cc: Amir Vadai; Eyal Perry; Erez Shitrit; Or Gerlitz; Doug Ledford
Subject: [PATCH V3 FIX For-3.19 0/3] IB/ipoib: Fix multicast join flow

This patch series fixes the multicast join behavior problems introduced by my 
previous patchset.  In particular, the original code did not use the send only 
join code from the multicast thread context, and so it did not need to restart 
the multicast thread.  After my previous patchset, it does get called from the 
thread context, and so the send only join completion areas need to restart the 
join thread but they don't.  This patchset makes them do so.  It then adds in 
some cleanups for restarting the thread, and fixes the fact that one delayed 
join holds up the entire list of joins.

v3: Resend because the last send didn't register in patchworks properly
(because the subject-prefix was not on all of the emails, only the
first) and because the Cc: list did not pass from cover letter
to patches

v2: Added two new patches, the first creates a helper to restart the
multicast join thread and also adds using it in the two places where
it should have been used but wasn't, the second allows the joins to
proceed around a delayed join instead of stalling everything.

v1: Addressed the usage of the IPOIB_MCAST_RUN flag

Doug Ledford (3):
  IB/ipoib: Fix failed multicast joins/sends
  IB/ipoib: Add a helper to restart the multicast task
  IB/ipoib: make delayed tasks not hold up everything

 drivers/infiniband/ulp/ipoib/ipoib.h   |  1 +
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 94 ++
 2 files changed, 66 insertions(+), 29 deletions(-)

--
2.1.0



[PATCH V3 FIX For-3.19 0/3] IB/ipoib: Fix multicast join flow

2015-01-14 Thread Doug Ledford
This patch series fixes the multicast join behavior problems introduced
by my previous patchset.  In particular, the original code did not use
the send only join code from the multicast thread context, and so it
did not need to restart the multicast thread.  After my previous patchset,
it does get called from the thread context, and so the send only join
completion areas need to restart the join thread but they don't.  This
patchset makes them do so.  It then adds in some cleanups for restarting
the thread, and fixes the fact that one delayed join holds up the entire
list of joins.
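
To make the last point concrete, here is a minimal user-space sketch in C of the difference between a scan that stalls on a delayed entry and one that proceeds around it. It is an illustration under stated assumptions, not the code in this series: `struct pending_join`, `next_join_stalling()` and `next_join_skipping()` are hypothetical names, and the pending joins are modeled as a plain array rather than the driver's list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pending-join entry; this is not the driver's data structure. */
struct pending_join {
    const char *name;           /* label used only for this illustration */
    unsigned long delay_until;  /* earliest time the join may be attempted */
    bool done;                  /* join already completed */
};

/*
 * Stalling scan: look only at the first unfinished entry. If that entry is
 * still delayed, nothing is started, so every entry behind it waits as well.
 */
static struct pending_join *next_join_stalling(struct pending_join *list,
                                               size_t n, unsigned long now)
{
    for (size_t i = 0; i < n; i++) {
        if (list[i].done)
            continue;
        return list[i].delay_until <= now ? &list[i] : NULL;
    }
    return NULL;
}

/*
 * Skipping scan: pass over entries that are finished or still delayed and
 * return the first entry that is ready to be attempted now.
 */
static struct pending_join *next_join_skipping(struct pending_join *list,
                                               size_t n, unsigned long now)
{
    for (size_t i = 0; i < n; i++) {
        if (list[i].done || list[i].delay_until > now)
            continue;
        return &list[i];
    }
    return NULL;
}
```

Given a list whose first entry is delayed and whose second entry is ready, the stalling scan returns NULL while the skipping scan returns the second entry.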

v3: Resend because the last send didn't register in patchworks properly
(because the subject-prefix was not on all of the emails, only the
first) and because the Cc: list did not pass from cover letter
to patches

v2: Added two new patches, the first creates a helper to restart the
multicast join thread and also adds using it in the two places where
it should have been used but wasn't, the second allows the joins to
proceed around a delayed join instead of stalling everything.

v1: Addressed the usage of the IPOIB_MCAST_RUN flag

Doug Ledford (3):
  IB/ipoib: Fix failed multicast joins/sends
  IB/ipoib: Add a helper to restart the multicast task
  IB/ipoib: make delayed tasks not hold up everything

 drivers/infiniband/ulp/ipoib/ipoib.h   |  1 +
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 94 ++
 2 files changed, 66 insertions(+), 29 deletions(-)

-- 
2.1.0
