Re: [Qemu-devel] [PATCH] implement moderation registers for e1000
On Wed, Feb 06, 2013 at 03:23:41PM +0100, Luigi Rizzo wrote: The following patch implements interrupt moderation registers for the e1000 adapter. These feature is normally used by OS drivers, and their implementation improves performance significantly, especially on the transmit path. The feature can be disabled through a command line option. We have seen only benefits in throughput, while latency slightly increases (that is an inherent feature of interrupt moderation) because very short delays cannot be emulated precisely. For those curious on performance, there is a set of measurements (of this and other changes that we will post separately) at http://info.iet.unipi.it/~luigi/research.html#qemu http://info.iet.unipi.it/~luigi/papers/20130206-qemu.pdf is 404. +static void +mit_set_ics(E1000State *s, uint32_t cause) +{ +if (s-mit_on == 0) { +set_ics(s, 0, cause); +return; +} +s-mit_cause |= cause; +if (!s-mit_timer_on) +mit_rearm_and_int(s); Please run scripts/checkpatch.pl. QEMU coding style always uses braces, even for an if statement with a single line body. @@ -1302,6 +1377,10 @@ static int pci_e1000_init(PCIDevice *pci_dev) d-autoneg_timer = qemu_new_timer_ms(vm_clock, e1000_autoneg_timer, d); +d-mit_cause = 0; +d-mit_timer_on = 0; +d-mit_timer = qemu_new_timer_ns(vm_clock, mit_rearm_and_int, d); Missing qemu_del_timer(d-mit_timer) and qemu_free_timer(d-mit_timer) in pci_e1000_uninit(). + return 0; } @@ -1313,6 +1392,7 @@ static void qdev_e1000_reset(DeviceState *dev) static Property e1000_properties[] = { DEFINE_NIC_PROPERTIES(E1000State, conf), +DEFINE_PROP_UINT32(mit_on, E1000State, mit_on, 1), /* default on */ Can access the performance results you linked to. Therefore it's hard to know whether default on makes sense. We must avoid performance regressions. Stefan
Re: [Qemu-devel] [PATCH] implement moderation registers for e1000
On Fri, Feb 8, 2013 at 2:02 AM, Stefan Hajnoczi stefa...@gmail.com wrote: On Wed, Feb 06, 2013 at 03:23:41PM +0100, Luigi Rizzo wrote: The following patch implements interrupt moderation registers for the e1000 adapter. These feature is normally used by OS drivers, and their implementation improves performance significantly, especially on the transmit path. The feature can be disabled through a command line option. We have seen only benefits in throughput, while latency slightly increases (that is an inherent feature of interrupt moderation) because very short delays cannot be emulated precisely. For those curious on performance, there is a set of measurements (of this and other changes that we will post separately) at http://info.iet.unipi.it/~luigi/research.html#qemu http://info.iet.unipi.it/~luigi/papers/20130206-qemu.pdf is 404. sorry, fixed now. And, will resubmit a fixed patch with uninit and fixed braces in the if() statement. I am happy to make this default to off. But it would be good if you could actually give it a try. Note that linux and FreeBSD (and presumably windows) do use moderation by default so enabling the emulation of the registers makes the guest OS behave differently (and closer to bare metal). To test that the patch itself does not cause regression in the emulator one should also turn off moderation in the guest (one of the tests i have run). cheers luigi
Re: [Qemu-devel] [PATCH] implement moderation registers for e1000
Il 08/02/2013 11:20, Luigi Rizzo ha scritto: On Fri, Feb 8, 2013 at 2:02 AM, Stefan Hajnoczi stefa...@gmail.com mailto:stefa...@gmail.com wrote: On Wed, Feb 06, 2013 at 03:23:41PM +0100, Luigi Rizzo wrote: The following patch implements interrupt moderation registers for the e1000 adapter. These feature is normally used by OS drivers, and their implementation improves performance significantly, especially on the transmit path. The feature can be disabled through a command line option. We have seen only benefits in throughput, while latency slightly increases (that is an inherent feature of interrupt moderation) because very short delays cannot be emulated precisely. For those curious on performance, there is a set of measurements (of this and other changes that we will post separately) at http://info.iet.unipi.it/~luigi/research.html#qemu http://info.iet.unipi.it/~luigi/papers/20130206-qemu.pdf is 404. sorry, fixed now. And, will resubmit a fixed patch with uninit and fixed braces in the if() statement. I am happy to make this default to off. But it would be good if you could actually give it a try. Note that linux and FreeBSD (and presumably windows) do use moderation by default so enabling the emulation of the registers makes the guest OS behave differently (and closer to bare metal). To test that the patch itself does not cause regression in the emulator one should also turn off moderation in the guest (one of the tests i have run). I think making the default on is fine, but you need to add compatibility options to leave it off in older machine types (pc-1.4 and earlier). Paolo
Re: [Qemu-devel] [PATCH] implement moderation registers for e1000
On Fri, Feb 8, 2013 at 11:46 AM, Paolo Bonzini pbonz...@redhat.com wrote: Il 08/02/2013 11:20, Luigi Rizzo ha scritto: On Fri, Feb 8, 2013 at 2:02 AM, Stefan Hajnoczi stefa...@gmail.com mailto:stefa...@gmail.com wrote: On Wed, Feb 06, 2013 at 03:23:41PM +0100, Luigi Rizzo wrote: The following patch implements interrupt moderation registers for the e1000 adapter. These feature is normally used by OS drivers, and their implementation improves performance significantly, especially on the transmit path. The feature can be disabled through a command line option. We have seen only benefits in throughput, while latency slightly increases (that is an inherent feature of interrupt moderation) because very short delays cannot be emulated precisely. For those curious on performance, there is a set of measurements (of this and other changes that we will post separately) at http://info.iet.unipi.it/~luigi/research.html#qemu http://info.iet.unipi.it/~luigi/papers/20130206-qemu.pdf is 404. sorry, fixed now. And, will resubmit a fixed patch with uninit and fixed braces in the if() statement. I am happy to make this default to off. But it would be good if you could actually give it a try. Note that linux and FreeBSD (and presumably windows) do use moderation by default so enabling the emulation of the registers makes the guest OS behave differently (and closer to bare metal). To test that the patch itself does not cause regression in the emulator one should also turn off moderation in the guest (one of the tests i have run). I think making the default on is fine, but you need to add compatibility options to leave it off in older machine types (pc-1.4 and earlier). Latency regression. Would need to see real results to understand how bad it is. Stefan
Re: [Qemu-devel] [PATCH] implement moderation registers for e1000
On Fri, Feb 08, 2013 at 11:46:34AM +0100, Paolo Bonzini wrote: Il 08/02/2013 11:20, Luigi Rizzo ha scritto: ... I am happy to make this default to off. But it would be good if you could actually give it a try. Note that linux and FreeBSD (and presumably windows) do use moderation by default so enabling the emulation of the registers makes the guest OS behave differently (and closer to bare metal). To test that the patch itself does not cause regression in the emulator one should also turn off moderation in the guest (one of the tests i have run). I think making the default on is fine, but you need to add compatibility options to leave it off in older machine types (pc-1.4 and earlier). I am unclear on what you mean by older machine types ? The hardware (E1000_DEV_ID_82540EM) does have these registers, and it is entirely up to the guest OS to use them or not. cheers luigi
Re: [Qemu-devel] [PATCH] implement moderation registers for e1000
On Fri, Feb 08, 2013 at 11:59:12AM +0100, Stefan Hajnoczi wrote: ... http://info.iet.unipi.it/~luigi/papers/20130206-qemu.pdf is 404. sorry, fixed now. And, will resubmit a fixed patch with uninit and fixed braces in the if() statement. I am happy to make this default to off. But it would be good if you could actually give it a try. Note that linux and FreeBSD (and presumably windows) do use moderation by default so enabling the emulation of the registers makes the guest OS behave differently (and closer to bare metal). To test that the patch itself does not cause regression in the emulator one should also turn off moderation in the guest (one of the tests i have run). I think making the default on is fine, but you need to add compatibility options to leave it off in older machine types (pc-1.4 and earlier). Latency regression. Would need to see real results to understand how bad it is. most of the numbers in the paper are for FreeBSD, not sure if they qualify as real here :) For Linux, in guest-host configuration, we have these numbers for TCP_RR CONFIGURATION TCP_RR (KTps) 1 VCPU TX itr=0 17.7 1 VCPU TX itr=1007.3 2 VCPU TX itr=0 13.9 2 VCPU TX itr=1007.4 These are computed forcing the value in the itr register. However, by default, linux seems to dynamically adjust the itr values depending on the load so it is difficult to make repeatable experiments, This is why I'd encourage you to run some tests with what you think is an appropriate configuration. cheers luigi
Re: [Qemu-devel] [PATCH] implement moderation registers for e1000
Il 07/02/2106 07:28, Luigi Rizzo ha scritto: On Fri, Feb 08, 2013 at 11:46:34AM +0100, Paolo Bonzini wrote: Il 08/02/2013 11:20, Luigi Rizzo ha scritto: ... I am happy to make this default to off. But it would be good if you could actually give it a try. Note that linux and FreeBSD (and presumably windows) do use moderation by default so enabling the emulation of the registers makes the guest OS behave differently (and closer to bare metal). To test that the patch itself does not cause regression in the emulator one should also turn off moderation in the guest (one of the tests i have run). I think making the default on is fine, but you need to add compatibility options to leave it off in older machine types (pc-1.4 and earlier). I am unclear on what you mean by older machine types ? The hardware (E1000_DEV_ID_82540EM) does have these registers, and it is entirely up to the guest OS to use them or not. Yes, but suppose a guest has a bug that was masked by the old non-implementation. -M pc-1.4 is supposed to bring back the old version's behavior as much as possible, including making the guest bug latent again. Paolo
Re: [Qemu-devel] [PATCH] implement moderation registers for e1000
On Fri, Feb 08, 2013 at 12:52:12PM +0100, Paolo Bonzini wrote: Il 07/02/2106 07:28, Luigi Rizzo ha scritto: On Fri, Feb 08, 2013 at 11:46:34AM +0100, Paolo Bonzini wrote: Il 08/02/2013 11:20, Luigi Rizzo ha scritto: ... I am happy to make this default to off. But it would be good if you could actually give it a try. Note that linux and FreeBSD (and presumably windows) do use moderation by default so enabling the emulation of the registers makes the guest OS behave differently (and closer to bare metal). To test that the patch itself does not cause regression in the emulator one should also turn off moderation in the guest (one of the tests i have run). I think making the default on is fine, but you need to add compatibility options to leave it off in older machine types (pc-1.4 and earlier). I am unclear on what you mean by older machine types ? The hardware (E1000_DEV_ID_82540EM) does have these registers, and it is entirely up to the guest OS to use them or not. Yes, but suppose a guest has a bug that was masked by the old non-implementation. -M pc-1.4 is supposed to bring back the old version's behavior as much as possible, including making the guest bug latent again. ok i see. Is there some example code that already handles a similar '-M' option so i can look at it and use to set the mit_on parameter ? cheers luigi
Re: [Qemu-devel] [PATCH] implement moderation registers for e1000
Il 07/02/2106 07:28, Luigi Rizzo ha scritto: Yes, but suppose a guest has a bug that was masked by the old non-implementation. -M pc-1.4 is supposed to bring back the old version's behavior as much as possible, including making the guest bug latent again. ok i see. Is there some example code that already handles a similar '-M' option so i can look at it and use to set the mit_on parameter ? Yes, see http://permalink.gmane.org/gmane.comp.emulators.qemu/193676. You actually have to set it *off* in the compatibility option. Paolo
[Qemu-devel] [PATCH] implement moderation registers for e1000
The following patch implements interrupt moderation registers for the e1000 adapter. These feature is normally used by OS drivers, and their implementation improves performance significantly, especially on the transmit path. The feature can be disabled through a command line option. We have seen only benefits in throughput, while latency slightly increases (that is an inherent feature of interrupt moderation) because very short delays cannot be emulated precisely. For those curious on performance, there is a set of measurements (of this and other changes that we will post separately) at http://info.iet.unipi.it/~luigi/research.html#qemu cheers luigi Signed-off-by: Luigi Rizzo ri...@iet.unipi.it diff --git a/hw/e1000.c b/hw/e1000.c index bb150c6..b4c0f4a 100644 --- a/hw/e1000.c +++ b/hw/e1000.c @@ -131,6 +131,10 @@ typedef struct E1000State_st { } eecd_state; QEMUTimer *autoneg_timer; +QEMUTimer *mit_timer; /* handle for the timer */ +uint32_t mit_timer_on; /* mitigation timer active */ +uint32_t mit_cause;/* pending interrupt cause */ +uint32_t mit_on; /* mitigation enable */ } E1000State; #definedefreg(x) x = (E1000_##x2) @@ -146,6 +150,7 @@ enum { defreg(TPR), defreg(TPT),defreg(TXDCTL), defreg(WUFC), defreg(RA),defreg(MTA),defreg(CRCERRS),defreg(VFTA), defreg(VET), +defreg(RDTR), defreg(RADV), defreg(TADV), defreg(ITR), }; static void @@ -652,6 +657,73 @@ static uint64_t tx_desc_base(E1000State *s) return (bah 32) + bal; } +/* helper function, 0 means the value is not set */ +static inline void +mit_update_delay(uint32_t *curr, uint32_t value) +{ +if (value (*curr == 0 || value *curr)) { +*curr = value; +} +} + +/* + * If necessary, rearm the timer and post an interrupt. + * Called at the end of tx/rx routines (mit_timer_on == 0), + * and when the timer fires (mit_timer_on == 1). + * We provide a partial implementation of interrupt mitigation, + * emulating only RADV, TADV and ITR (lower 16 bits, 1024ns units for + * RADV and TADV, 256ns units for ITR). RDTR is only used to enable RADV; + * relative timers based on TIDV and RDTR are not implemented. + */ +static void +mit_rearm_and_int(void *opaque) +{ +E1000State *s = opaque; +uint32_t mit_delay = 0; + +/* + * Clear the flag. It is only set when the callback fires, + * and we need to clear it anyways. + */ +s-mit_timer_on = 0; +if (s-mit_cause == 0) { /* no events pending, we are done */ +return; +} +/* + * Compute the next mitigation delay according to pending interrupts + * and the current values of RADV (provided RDTR!=0), TADV and ITR. + * Then rearm the timer. + */ +if (s-mit_cause (E1000_ICR_TXQE | E1000_ICR_TXDW)) { +mit_update_delay(mit_delay, s-mac_reg[TADV] * 4); +} +if (s-mac_reg[RDTR] (s-mit_cause E1000_ICS_RXT0)) { +mit_update_delay(mit_delay, s-mac_reg[RADV] * 4); +} +mit_update_delay(mit_delay, s-mac_reg[ITR]); + +if (mit_delay) { +s-mit_timer_on = 1; +qemu_mod_timer(s-mit_timer, +qemu_get_clock_ns(vm_clock) + mit_delay * 256); +} + +set_ics(s, 0, s-mit_cause); +s-mit_cause = 0; +} + +static void +mit_set_ics(E1000State *s, uint32_t cause) +{ +if (s-mit_on == 0) { +set_ics(s, 0, cause); +return; +} +s-mit_cause |= cause; +if (!s-mit_timer_on) +mit_rearm_and_int(s); +} + static void start_xmit(E1000State *s) { @@ -689,7 +761,7 @@ start_xmit(E1000State *s) break; } } -set_ics(s, 0, cause); +mit_set_ics(s, cause); } static int @@ -908,7 +980,7 @@ e1000_receive(NetClientState *nc, const uint8_t *buf, size_t size) s-rxbuf_min_shift) n |= E1000_ICS_RXDMT0; -set_ics(s, 0, n); +mit_set_ics(s, n); return size; } @@ -1013,6 +1085,7 @@ static uint32_t (*macreg_readops[])(E1000State *, int) = { getreg(RDH), getreg(RDT),getreg(VET),getreg(ICS), getreg(TDBAL), getreg(TDBAH), getreg(RDBAH), getreg(RDBAL), getreg(TDLEN), getreg(RDLEN), +getreg(RDTR), getreg(RADV), getreg(TADV), getreg(ITR), [TOTH] = mac_read_clr8,[TORH] = mac_read_clr8, [GPRC] = mac_read_clr4, [GPTC] = mac_read_clr4,[TPR] = mac_read_clr4, [TPT] = mac_read_clr4, @@ -1029,6 +1102,8 @@ static void (*macreg_writeops[])(E1000State *, int, uint32_t) = { putreg(PBA), putreg(EERD), putreg(SWSM), putreg(WUFC), putreg(TDBAL), putreg(TDBAH), putreg(TXDCTL), putreg(RDBAH), putreg(RDBAL), putreg(LEDCTL), putreg(VET), +[RDTR] = set_16bit, [RADV] = set_16bit, [TADV] = set_16bit, +[ITR] = set_16bit, [TDLEN] = set_dlen,[RDLEN] = set_dlen, [TCTL] = set_tctl, [TDT] = set_tctl, [MDIC] = set_mdic,