On 2018/01/18 07:51, Alexander Duyck wrote: > On Wed, Jan 17, 2018 at 10:50 PM, Benjamin Poirier <bpoir...@suse.com> wrote: > > It was reported that emulated e1000e devices in vmware esxi 6.5 Build > > 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver > > overrun interrupt bursts", v4.15-rc1). Some tracing shows that after > > e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other() > > on emulated e1000e devices. In comparison, on real e1000e 82574 hardware, > > icr=0x80000004 (_INT_ASSERTED | _OTHER) in the same situation. > > Isn't 0x80000004 (_INT_ASSERTED | _LSC)? The assumption I based my
Yes. The numeric value is correct. I made a mistake when writing down the flag names. > patch on was that the VMware code was sending _OTHER instead of _LSC > to trigger LSC events. As such in my version of the workaround I just It's not so deterministic, sadly. In my tests, upon entry into e1000_msix_other() after e1000e_trigger_lsc(), with no workaround patch I've seen: icr=0x0 icr=0x3d Reserved RXDMT0 Reserved LSC TXDW icr=0x46 RXO LSC TXQE and someone else reported: icr=0x3c Reserved RXDMT0 Reserved LSC > went through and did the testing if the _RXO bit was set, otherwise I > assumed that whatever event was received must have been meant to > trigger an _LSC type event since that worked in the past. > > > Some experimentation showed that this flaw in vmware e1000e emulation can > > be worked around by not setting Other in EIAC. This is how it was before > > 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1). > > Did this actually set the _LSC bit or was it just giving you the > _OTHER bit? The ICR for that combination would come out 0x81000000. With my patch, after e1000e_trigger_lsc(), it results in icr=0x81000004 on real and emulated hardware. IMO, the resulting icr read is cleaner than with your patch but it depends on an undocumented quirk of the emulated vmware e1000e, so I don't know which of the two workarounds is more desirable. If you'd like to stick with your patch though, I think that you should definitely rewrite it as: diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c index 9f18d39bdc8f..68c0bcb8287f 100644 --- a/drivers/net/ethernet/intel/e1000e/netdev.c +++ b/drivers/net/ethernet/intel/e1000e/netdev.c @@ -1928,7 +1928,12 @@ static irqreturn_t e1000_msix_other(int __always_unused irq, void *data) __napi_schedule(&adapter->napi); } } - if (icr & E1000_ICR_LSC) { + if (icr & E1000_ICR_LSC || !(icr & E1000_ICR_RXO)) { + /* We assume if the RXO bit is not set that this is a + * link status change event. This is needed due to emulated + * versions of the device that may not correctly populate + * the LSC bit. + */ ew32(ICR, E1000_ICR_LSC); hw->mac.get_link_status = true; /* guard against interrupt when we're going down */ > > > Fixes: 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts") > > Signed-off-by: Benjamin Poirier <bpoir...@suse.com> > > --- > > > > Jeff, I'm sending as RFC since it looks like a problem that should be fixed > > in vmware. If you'd like to have the workaround in e1000e, I'll submit. > > I would appreciate it if you could review/test the patch I submitted > for the same issue. Specifically I would want to make certain that it > still addresses the original RXO interrupt burst issue your reported. I've tested both my patch and yours; they both allow link up on vmware; link up on real 82574 and rxo mitigation on real 82574. I couldn't conveniently test rxo on vmware. > > Thanks. > > - Alex >