On Wed, 2015-01-21 at 12:37 -0800, Roland Dreier wrote:
> On Wed, Jan 21, 2015 at 12:34 PM, Or Gerlitz <gerlitz...@gmail.com> wrote:
> >> Because Doug's changes fixed some bad, easy-to-reproduce issues. On
> >> the other hand we don't want to introduce new regressions to fix the
> >> old issues.
> >
> > See above, we did introduced regressions.
>
> Yes, I know, that's my whole point.
>
> We need to fix the current 3.19-rc code, and the two choices are to
> keep the fixes we added during 3.19 or revert back to 3.18.
>
> Doug's opinion is that your proposed fix is broken, and we don't have
> an alternate fix.

I will second that opinion. Overnight we ran a series of tests on some
new patches I made, and they resolved the rmmod/insmod failure case in
our testing. There were two significant fixes. One of them was related
to the switch to using a separate work queue per device. The other was
an oversight in ipoib_mcast_restart_task(). Neither of these issues was
addressed by the alternate fix. So, at best, the alternate fix is
papier-mâché that covers over two holes but leaves the holes in place.

> So I suggest we revert the whole series from 3.19 and get this right for 3.20.

Before you decide, please take a look at the final fix as I see it.
This was a 7-patch series, now it's 10 patches. But the final three
patches are small, well understood, and obviously correct. Regardless
of whether you take these 10, I do *not* suggest leaving the first 8
and using the alternate patch. I suggest an all-or-nothing approach.
But, like I said, the rmmod issue is now fixed in my testing.

-- 
Doug Ledford <dledf...@redhat.com>
GPG KeyID: 0E572FDD