On Thu, 31 May 2012, Paolo Bonzini wrote:
> On 31/05/2012 00:53, Luigi Rizzo wrote:
> > The image contains my fast packet generator "pkt-gen" (a stock
> > traffic generator such as netperf etc. is too slow to show the
> > problem). pkt-gen can send about 1Mpps in this configuration using
> > -net netmap in the backend. The qemu process in this case takes 100%
> > CPU. On the receive side, I cannot receive more than 50Kpps, even if I
> > flood the bridge with a huge amount of traffic. The qemu process stays
> > at 5% CPU or less.
> >
> > Then I read in the docs in main-loop.h that one case where
> > qemu_notify_event() is needed is when using
> > qemu_set_fd_handler2(), which is exactly what my backend uses
> > (similar to tap.c).
>
> The path is a bit involved, but I think Luigi is right. The docs say
> "Remember to call qemu_notify_event whenever the [return value of the
> fd_read_poll callback] may change from false to true." Now net/tap.c has
>
> static int tap_can_send(void *opaque)
> {
>     TAPState *s = opaque;
>
>     return qemu_can_send_packet(&s->nc);
> }
>
> and (ignoring VLANs) qemu_can_send_packet is
>
> int qemu_can_send_packet(VLANClientState *sender)
> {
>     if (sender->peer->receive_disabled) {
>         return 0;
>     } else if (sender->peer->info->can_receive &&
>                !sender->peer->info->can_receive(sender->peer)) {
>         return 0;
>     } else {
>         return 1;
>     }
> }
>
> So whenever receive_disabled goes from 1 to 0 or can_receive goes from 0 to 1,
> the _peer_ has to call qemu_notify_event. In e1000.c we have
>
> static bool e1000_has_rxbufs(E1000State *s, size_t total_size)
> {
>     int bufs;
>     /* Fast-path short packets */
>     if (total_size <= s->rxbuf_size) {
>         return s->mac_reg[RDH] != s->mac_reg[RDT] || !s->check_rxov;
>     }
>     if (s->mac_reg[RDH] < s->mac_reg[RDT]) {
>         bufs = s->mac_reg[RDT] - s->mac_reg[RDH];
>     } else if (s->mac_reg[RDH] > s->mac_reg[RDT] || !s->check_rxov) {
>         bufs = s->mac_reg[RDLEN] / sizeof(struct e1000_rx_desc) +
>             s->mac_reg[RDT] - s->mac_reg[RDH];
>     } else {
>         return false;
>     }
>     return total_size <= bufs * s->rxbuf_size;
> }
>
> static int
> e1000_can_receive(VLANClientState *nc)
> {
>     E1000State *s = DO_UPCAST(NICState, nc, nc)->opaque;
>
>     return (s->mac_reg[RCTL] & E1000_RCTL_EN) && e1000_has_rxbufs(s, 1);
> }
>
> So as a conservative approximation, you need to fire qemu_notify_event
> whenever you write to RDH, RDT, RDLEN and RCTL, or when check_rxov becomes
> zero. In practice, only RDT, RCTL and check_rxov matter. Luigi, does this
> patch work for you?
>
> diff --git a/hw/e1000.c b/hw/e1000.c
> index 4573f13..0069103 100644
> --- a/hw/e1000.c
> +++ b/hw/e1000.c
> @@ -295,6 +295,7 @@ set_rx_control(E1000State *s, int index, uint32_t val)
>      s->rxbuf_min_shift = ((val / E1000_RCTL_RDMTS_QUAT) & 3) + 1;
>      DBGOUT(RX, "RCTL: %d, mac_reg[RCTL] = 0x%x\n", s->mac_reg[RDT],
>             s->mac_reg[RCTL]);
> +    qemu_notify_event();
>  }
>  
>  static void
> @@ -922,6 +923,7 @@ set_rdt(E1000State *s, int index, uint32_t val)
>  {
>      s->check_rxov = 0;
>      s->mac_reg[index] = val & 0xffff;
> +    qemu_notify_event();
>  }
>  
>  static void
>
>
> RDT is indeed written in the ISR. In the Linux driver, e1000_clean_rx_irq
> calls adapter->alloc_rx_buf which is e1000_alloc_rx_buffers. There you
> see this:
>
> if (likely(rx_ring->next_to_use != i)) {
> 	rx_ring->next_to_use = i;
> 	if (unlikely(i-- == 0))
> 		i = (rx_ring->count - 1);
>
> 	/* Force memory writes to complete before letting h/w
> 	 * know there are new descriptors to fetch. (Only
> 	 * applicable for weak-ordered memory model archs,
> 	 * such as IA-64). */
> 	wmb();
> 	writel(i, hw->hw_addr + rx_ring->rdt);
> }
>
> Similarly, every other device needs a qemu_notify_event call in these places:
> - cadence_gem -> GEM_NWCTRL
> - dp8393x -> SONIC_CR, SONIC_ISR
> - eepro100 -> set_ru_state
> - mcf_fec -> mcf_fec_enable_rx
> - milkymist-minimax2 -> R_STATE0, R_STATE1
> - mipsnet -> MIPSNET_INT_CTL, MIPSNET_RX_DATA_BUFFER
> - ne2000 -> EN0_STARTPG, EN0_STOPPG, E8390_CMD
> - opencores_eth -> TX_BD_NUM, MODER, rx_desc
> - pcnet -> pcnet_start, csr[5]
> - rtl8139 -> RxBufPtr and Cfg9346
> - smc91c111 -> RCR, smc91c111_release_packet
> - spapr_llan -> h_add_logical_lan_buffer
> - stellaris_enet -> RCTL, DATA
> - xgmac -> DMA_CONTROL
> - xilinx_axienet -> rcw[1]
> - xilinx_ethlite -> R_RX_CTRL0
>
> For Xen I think this is not possible at the moment because it doesn't
> implement rx notification.
Why do you say that?
Xen supports the iothread and CONFIG_EVENTFD.