On Fri, May 06, 2016 at 06:34:33AM +0200, Ingo Krabbe wrote:
> > On Sun, May 01, 2016 at 02:31:57PM +0200, Ingo Krabbe wrote:
> >> Good Mayday Qemu Developers,
> >>
> >> today I tried to find a reference to a networking problem that seems
> >> to be of quite general nature: TCP Segmentation Offloading (TSO) in
> >> virtual environments.
> >>
> >> When I set up a TAP network adapter for a virtual machine and put it
> >> into a host bridge, the known best practice is to manually set
> >> "tso off gso off" with ethtool: for the guest driver if I use a
> >> hardware emulation such as e1000, and/or for the host driver and/or
> >> the bridge adapter if I use the virtio driver. Otherwise you
> >> (sometimes?) experience performance problems or even lost packets.
> >
> > I can't parse this sentence. In what cases do you think it's a "known
> > best practice" to disable tso and gso? Maybe a table would be a
> > clearer way to communicate this.
> >
> > Can you provide a link to the source claiming tso and gso should be
> > disabled?
>
> Sorry for that long sentence. The consequence seems to be that it is
> most stable to turn off tso and gso for host bridges and for adapters
> in virtual machines.
>
> One of the most comprehensive collections of arguments is this article
>
> https://kris.io/2015/10/01/kvm-network-performance-tso-and-gso-turn-it-off/
>
> while I also found a documentation for Centos 6
>
> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Virtualization_Host_Configuration_and_Guest_Installation_Guide/ch10s04.html
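For reference, a sketch of the ethtool commands under discussion; "eth0"
is only an example device name, and on a bridged setup the same commands
would be applied to the tap device, the bridge, or the guest NIC as
appropriate (root required):

```shell
# Show the current offload settings for a device:
ethtool -k eth0 | grep -E 'tcp-segmentation-offload|generic-segmentation-offload'

# The workaround described in the linked articles
# (note capital -K sets features, lowercase -k only shows them):
ethtool -K eth0 tso off gso off
```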
This documentation is about (ancient) RHEL 3.9 guests.  I would not apply
anything on that page to modern Linux distro releases without re-checking.

> In google groups this one is discussed
>
> https://code.google.com/p/ganeti/wiki/PerformanceTuning
>
> Of course the same is found for Xen machines
>
> http://cloudnull.io/2012/07/xenserver-network-tuning/
>
> You see there are several links on the internet, and my first question
> is: why can't I find this discussion in the qemu-wiki space?
>
> I think the bug
>
> https://bugs.launchpad.net/bugs/1202289
>
> is related.

Thanks for posting all the links!  I hope Michael and/or Jason explain
the current status for RHEL 6/7 and other modern distros.  Maybe they can
also follow up with the kris.io blog author if an update to the post is
necessary.

TSO/GSO is enabled by default on my Fedora and RHEL hosts/guests.  If
disabling them were a best practice for those distros, I'd expect the
default settings to reflect that.

Also, I would be surprised if the offload features were broken, since
work has been put into supporting and extending them in virtio-net over
the years.

> >> I haven't found a complete analysis of the background of these
> >> problems, but there seem to be some effects on MTU based
> >> fragmentation and UDP checksums.
> >>
> >> There is a tso related bug on launchpad, but the context of this bug
> >> is too narrow for the generality of the problem.
> >>
> >> Also it seems that there is a problem in LXC contexts too (I found
> >> such a reference, without detailed description, in a post about a
> >> Xen setup).
> >>
> >> My question now is: Is there a bug in the driver code, and shouldn't
> >> this be documented somewhere in wiki.qemu.org? Were there
> >> developments on this topic in the past, or is there any
> >> planned/ongoing work to do on the qemu drivers?
> >>
> >> Most problem reports found relate to deprecated Centos6 qemu-kvm
> >> packages.
> >>
> >> In our company we have similar or even worse problems with Centos7
> >> hosts and guest machines.
> >
> > You haven't explained what problem you are experiencing. If you want
> > help with your setup please include your QEMU command-line (ps aux |
> > grep qemu), the traffic pattern (ideally how to reproduce it with a
> > benchmarking tool), and what observation you are making (e.g. netstat
> > counters showing dropped packets).
>
> I was quite astonished about the many hints about virtio drivers, as we
> had this problem with the e1000 driver in a Centos7 guest on a Centos6
> host.
>
> e1000 0000:00:03.0 ens3: Detected Tx Unit Hang
>   Tx Queue             <0>
>   TDH                  <42>
>   TDT                  <42>
>   next_to_use          <2e>
>   next_to_clean        <42>
> buffer_info[next_to_clean]
>   time_stamp           <104aff1b8>
>   next_to_watch        <44>
>   jiffies              <104b00ee9>
>   next_to_watch.status <0>
> Apr 25 21:08:48 db03 kernel: ------------[ cut here ]------------
> Apr 25 21:08:48 db03 kernel: WARNING: at net/sched/sch_generic.c:297 dev_watchdog+0x270/0x280()
> Apr 25 21:08:48 db03 kernel: NETDEV WATCHDOG: ens3 (e1000): transmit queue 0 timed out
> Apr 25 21:08:48 db03 kernel: Modules linked in: binfmt_misc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack ip6table_filter ip6_tables btrfs zlib_deflate raid6_pq xor ext4 mbcache jbd2 crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper i2c_piix4 ppdev cryptd pcspkr virtio_balloon parport_pc parport sg nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi virtio_scsi cirrus syscopyarea sysfillrect sysimgblt drm_kms_helper ttm drm crct10dif_pclmul crct10dif_common ata_piix crc32c_intel virtio_pci e1000 i2c_core virtio_ring libata serio_raw virtio floppy dm_mirror dm_region_hash dm_log dm_mod
> Apr 25 21:08:48 db03 kernel: CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.10.0-327.13.1.el7.x86_64 #1
> Apr 25 21:08:48 db03 kernel: Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
> Apr 25 21:08:48 db03 kernel: ffff88126f483d88 685d892e8a452abb ffff88126f483d40 ffffffff8163571c
> Apr 25 21:08:48 db03 kernel: ffff88126f483d78 ffffffff8107b200 0000000000000000 ffff881203b9a000
> Apr 25 21:08:48 db03 kernel: ffff881201c3e080 0000000000000001 0000000000000002 ffff88126f483de0
> Apr 25 21:08:48 db03 kernel: Call Trace:
> Apr 25 21:08:48 db03 kernel: <IRQ>  [<ffffffff8163571c>] dump_stack+0x19/0x1b
> Apr 25 21:08:48 db03 kernel: [<ffffffff8107b200>] warn_slowpath_common+0x70/0xb0
> Apr 25 21:08:48 db03 kernel: [<ffffffff8107b29c>] warn_slowpath_fmt+0x5c/0x80
> Apr 25 21:08:48 db03 kernel: [<ffffffff8154cd40>] dev_watchdog+0x270/0x280
> Apr 25 21:08:48 db03 kernel: [<ffffffff8154cad0>] ? dev_graft_qdisc+0x80/0x80
> Apr 25 21:08:48 db03 kernel: [<ffffffff8108b0a6>] call_timer_fn+0x36/0x110
> Apr 25 21:08:48 db03 kernel: [<ffffffff8154cad0>] ? dev_graft_qdisc+0x80/0x80
> Apr 25 21:08:48 db03 kernel: [<ffffffff8108dd97>] run_timer_softirq+0x237/0x340
> Apr 25 21:08:48 db03 kernel: [<ffffffff81084b0f>] __do_softirq+0xef/0x280
> Apr 25 21:08:48 db03 kernel: [<ffffffff816477dc>] call_softirq+0x1c/0x30
> Apr 25 21:08:48 db03 kernel: [<ffffffff81016fc5>] do_softirq+0x65/0xa0
> Apr 25 21:08:48 db03 kernel: [<ffffffff81084ea5>] irq_exit+0x115/0x120
> Apr 25 21:08:48 db03 kernel: [<ffffffff81648455>] smp_apic_timer_interrupt+0x45/0x60
> Apr 25 21:08:48 db03 kernel: [<ffffffff81646b1d>] apic_timer_interrupt+0x6d/0x80
> Apr 25 21:08:48 db03 kernel: <EOI>  [<ffffffff81058e96>] ? native_safe_halt+0x6/0x10
> Apr 25 21:08:48 db03 kernel: [<ffffffff8101dbcf>] default_idle+0x1f/0xc0
> Apr 25 21:08:48 db03 kernel: [<ffffffff8101e4d6>] arch_cpu_idle+0x26/0x30
> Apr 25 21:08:48 db03 kernel: [<ffffffff810d6325>] cpu_startup_entry+0x245/0x290
> Apr 25 21:08:48 db03 kernel: [<ffffffff810475fa>] start_secondary+0x1ba/0x230
> Apr 25 21:08:48 db03 kernel: ---[ end trace 71ac4360272e207e ]---
> Apr 25 21:08:48 db03 kernel: e1000 0000:00:03.0 ens3: Reset adapter
>
> I'm still not sure why this happens on this host "db03", while db02 and
> db01 are not affected. All guests are running on different hosts and
> the network is controlled by an openvswitch.

This looks interesting. It could be a bug in QEMU's e1000 NIC emulation.
Maybe it has already been fixed in qemu.git, but I didn't see any
relevant commits.

Please post the RPM version numbers you are using (rpm -qa | grep qemu in
the host, rpm -qa | grep kernel in the host).

The e1000 driver can print additional information (a dump of the contents
of the tx ring). Please increase your kernel's log level to collect that
information:

  # echo 8 >/proc/sys/kernel/printk

The tx ring dump may allow someone to figure out why the packet caused tx
to stall.

Stefan
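For completeness, the diagnostics requested above can be collected
roughly like this (package and path names are the standard RHEL/CentOS
ones; the echo requires root):

```shell
# Package versions of QEMU (host) and kernel, as requested above:
rpm -qa | grep qemu
rpm -qa | grep kernel

# Raise the console log level so the e1000 tx ring dump reaches the
# logs the next time the hang occurs:
echo 8 > /proc/sys/kernel/printk

# Verify: the four values shown are the console, default, minimum and
# boot-time default log levels.
cat /proc/sys/kernel/printk
```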