On Wed, 2015-01-21 at 12:37 -0800, Roland Dreier wrote:
> On Wed, Jan 21, 2015 at 12:34 PM, Or Gerlitz <gerlitz...@gmail.com> wrote:
> >> Because Doug's changes fixed some bad, easy-to-reproduce issues.  On
> >> the other hand we don't want to introduce new regressions to fix the
> >> old issues.
> >
> > See above, we did introduce regressions.
> 
> Yes, I know, that's my whole point.
> 
> We need to fix the current 3.19-rc code, and the two choices are to
> keep the fixes we added during 3.19 or revert back to 3.18.
> 
> Doug's opinion is that your proposed fix is broken, and we don't have
> an alternate fix.

I will second that opinion.  Overnight we ran a series of tests on some
new patches I made, and they resolved the rmmod/insmod failure case in
our testing.  There were two significant fixes.  One of them was related
to the switch to using a separate work queue per device.  The other was
an oversight in ipoib_mcast_restart_task().  Neither of these issues
was addressed by the alternate fix.  So, at best, the alternate fix is
papier-mâché that covers over two holes but leaves the holes in place.

> So I suggest we revert the whole series from 3.19 and get this right for 3.20.

Before you decide, please take a look at the final fix as I see it.
This was a 7-patch series; now it's 10 patches.  But the final three
patches are small, well understood, and obviously correct.

Regardless of whether you take these 10, I do *not* suggest leaving the
first 7 and using the alternate patch.  I suggest an all-or-nothing
approach.  But, like I said, the rmmod issue is now fixed in my
testing.

-- 
Doug Ledford <dledf...@redhat.com>
              GPG KeyID: 0E572FDD

