> On 19 Oct 2016, at 13:07 PM, Kevin Wolf <kw...@redhat.com> wrote: > > Am 19.10.2016 um 09:57 hat Dmitry Fleytman geschrieben: >> >> On 19 Oct 2016, at 10:25 AM, Kevin Wolf <kw...@redhat.com> wrote: >> >> Am 19.10.2016 um 08:48 hat Dmitry Fleytman geschrieben: >> >> Another related thing that I noticed while debugging this and >> turning on >> tracing is that the interrupt throttling timers kept firing even if >> there was no activity at all. Something might be wrong, there, too. >> >> Next thing I wondered why throttling was enabled at all because the >> spec >> says the default is 0 (turned off). So one thing that I'm pretty >> sure is >> just a misunderstanding is the following defintion: >> >> #define E1000E_MIN_XITR (500) /* No more then 7813 interrupts >> per >> second according to spec >> 10.2.4.2 */ >> >> >> As I understand it, the spec is just giving an example there and >> lower >> values are valid as well. At the very least, 0 should be accepted >> as >> a >> special case because it means "disabled" and it's specified to be >> the >> default. >> >> >> Right, this according to the spec this value should be 0 by default >> and >> throttling should be disabled. >> >> Current device implementation does not allow specification of >> throttling interval less than 500 and treats interval 0 as throttling >> enabled with interval 500. >> >> This is done by intention because according to the spec (10.2.4.2) >> device cannot produce more than 7813 interrupts per second even when >> throttling is disabled. Therefore, even in case of interrupt storm >> (continuous interrupt re-injection by device), number of interrupts >> produced by device is limited and CPU (driver) has enough time to do >> its job and handle problematic interrupt state. >> >> >> I think you're misinterpreting the spec here. This is the paragraph >> we're talking about, right? >> >> For example, if the interval is programmed to 500 (decimal), the >> 82574 guarantees the CPU is not interrupted by it for 128 µs from >> the last interrupt. The maximum observable interrupt rate from the >> 82574 should never exceed 7813 interrupts/sec. >> >> It says "for example", so this is just demonstrating how you can >> calculate the effects of a specific throttling setting. It says that >> _if_ you set ITR to 500, you get an interrupt at most every >> 500 * 256 ns = 128 µs. And 1 / 128 µs = 7821.5 Hz, so this is the >> effective maximum frequency that _this specific_ ITR setting allows. >> >> I also don't think it would make any sense for hardware to be unable to >> trigger interrupts more often than that. Triggering an interrupt is not >> a complex operation that involves a lot of calculation or anything. >> >> >> Hi Kevin, >> >> Yes, I assume that sentence >> >> “The maximum observable interrupt rate from the >> 82574 should never exceed 7813 interrupts/sec." >> >> is not a related to a specific case, but describes a generic limitation, >> however it might be I’m misreading the spec indeed. > > For me everything hints at this being only an example: Not only do the > numbers match the example made in the previous sentence (which is > explicitly called an example) and look weird as a real limit, but it's > also in the same paragraph as the explicit example and the spec is > generally good at starting a new paragraph when talking about a new > aspect.
I tend to agree. > > I don't care enough to actually make you change anything, but I wanted > you to be aware that the interpretation of the spec as coded into our > emulation isn't clear at all (in fact, I would think it's clear that > it's _not_ meant this way) and that real hardware probably doesn't do > the same thing as we do. Thanks, Kevin. > > What we're doing may still have merit, as a workaround for a guest > driver bug. > >> Opposed to this, virtual device is able to raise interrupts with rate >> limited by CPU speed only therefore driver has no chance to fix >> interrupt storm condition. >> >> Windows e1000e drivers rely on upper limit for number of interrupts >> per second in some cases and absence of this limit leads to infinite >> interrupt storms. >> >> To summarise, while usage of throttling mechanisms is a little bit >> different from what specification says, effective emulated device >> behavior is totally compliant to the real device. >> >> >> So Windows doesn't configure ITR (i.e. it is 0) even though it can't >> handle unlimited interrupts? That would be a driver bug then, and >> perhaps an important enough one to keep a workaround in our code. But >> then let's be explicit that this is a workaround for a Windows bug and >> not mandated by the spec. >> >> I'm not sure in what setup you produced this error, but possibly a >> reason why this doesn't happen with real hardware isn't the NIC itself >> but the backend: Communication with the host can obviously be faster >> than talking to a physical network (so if you were doing the latter, the >> rate in the VM wouldn't be limited by the CPU, but by the physical >> network). >> >> >> This issue is reproduced on device disable and not related >> to intensive device/backend communication. One RX packet with >> right timing is enough to trigged the problem. >> >> The same issue was fixed in e1000 device some time ago as well. > > Commit 9596ef7c was good in flagging it as a guest driver bug. Only a > later series brought in the questionable spec interpretation. > > Kevin