On Oct 20, 2013, at 6:52 AM, "Chris Murray" <[email protected]> wrote:
> Hi all,
>
> I'm hoping for some troubleshooting advice. I have an OpenIndiana
> oi_151a8 virtual machine which was functioning correctly on vSphere 5.1
> but now isn't on vSphere 5.5 (ESXi-5.5.0-1331820-standard).
>
> A small corner of my network infrastructure has a vSphere host upon
> which live two virtual machines:
>
> ape - "Debian Linux ape 2.6.32-5-amd64 #1 SMP Sun Sep 23 10:07:46 UTC
> 2012 x86_64 GNU/Linux"; uses USB passthrough to read from an APC UPS and
> e-mail me when power is lost.
> giraffe - oi_151a8; serves up virtual machine images over NFS.
>
> Since the upgrade of vSphere from 5.1 to 5.5, virtual machines on other
> hosts whose VMDKs are on this NFS mount are now very slow. PuTTY
> sessions to the oi_151a8 VM also 'stutter', and I see patterns in ping
> such as these:
>
> Reply from 192.168.0.13: bytes=32 time=1367ms TTL=255
> Reply from 192.168.0.13: bytes=32 time<1ms TTL=255
> Reply from 192.168.0.13: bytes=32 time<1ms TTL=255
> Reply from 192.168.0.13: bytes=32 time=1369ms TTL=255
> Reply from 192.168.0.13: bytes=32 time<1ms TTL=255
> Reply from 192.168.0.13: bytes=32 time<1ms TTL=255
> Reply from 192.168.0.13: bytes=32 time=1356ms TTL=255
> Reply from 192.168.0.13: bytes=32 time<1ms TTL=255
> Reply from 192.168.0.13: bytes=32 time<1ms TTL=255
> Reply from 192.168.0.13: bytes=32 time=1376ms TTL=255
> Reply from 192.168.0.13: bytes=32 time<1ms TTL=255
> Reply from 192.168.0.13: bytes=32 time<1ms TTL=255
> Reply from 192.168.0.13: bytes=32 time<1ms TTL=255
> Request timed out.
>
> At the same time, pings to the neighbouring VM (ape) or to the host
> follow the normal "time<1ms" pattern, as do pings to other random
> machines on the network. I've therefore ruled out switch infrastructure,
> including the vSwitch inside this vSphere host, given that the 'giraffe'
> VM exhibits the problem whereas 'ape' does not.
>
> Interestingly, if I power down the VMs whose storage lives on giraffe,
> the pings return to sub-1ms.
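To quantify how regular those ~1.3 s spikes are (e.g. whether they track a periodic event), the saved ping output can be run through a small awk filter. This is a minimal sketch: the 100 ms threshold and the `classify_pings` name are my own choices, and it assumes the Windows-style `time=1367ms` / `time<1ms` lines quoted above.

```shell
#!/bin/sh
# classify_pings: flag RTT spikes in captured ping output.
# Threshold of 100 ms is an assumption, not from the thread.
classify_pings() {
    awk '
        /time[=<][0-9]/ {
            rtt = $0
            sub(/.*time[=<]/, "", rtt)   # strip up to the RTT value
            sub(/ms.*/, "", rtt)         # strip trailing "ms TTL=..."
            total++
            if (rtt + 0 > 100) { spikes++; print "spike:", rtt, "ms" }
        }
        END { printf "%d of %d replies over 100 ms\n", spikes, total }
    '
}

# Example with two of the quoted lines:
printf 'Reply from 192.168.0.13: bytes=32 time=1367ms TTL=255\nReply from 192.168.0.13: bytes=32 time<1ms TTL=255\n' | classify_pings
```

Redirecting a long-running `ping` into a file and filtering it this way makes it easier to correlate spike timing with VM activity on giraffe.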
> I am drawing the conclusion that this is some symptom of the
> combination of OI, vSphere 5.5 and network load, although I'm not sure
> where to turn next.
>
> Tried:
>
> "zpool scrub rpool" - to induce high read load on the SSD in the
> vSphere host. This may look like a strange thing to test, but I've seen
> odd effects in the past on Windows machines whose storage is
> struggling.
>
> Created a test pool on SSD and induced write load using
> "cat /dev/zero > /testpool/zerofile".
>
> "zpool scrub giraffepool" - to induce high read load on the spinning
> drives.
>
> Still no effect from these three tests, further hinting that it's
> network load which is the trigger.
>
> Checked that ipfilter is off with the following, yet there is a message
> in dmesg: "IP Filter: v4.1.9, running."
>
> chris@giraffe:~# svcs -xv ipfilter
> svc:/network/ipfilter:default (IP Filter)
>  State: disabled since October 20, 2013 12:17:02 PM UTC
> Reason: Disabled by an administrator.
>    See: http://illumos.org/msg/SMF-8000-05
>    See: man -M /usr/share/man -s 5 ipfilter
> Impact: This service is not running.
>
> Haven't tried yet:
>
> Installing OI again in another VM to see if the problem is localised to
> giraffe, since I'd also have to induce load to be confident of the
> issue existing or not.
>
> I'm using the e1000 NIC in vSphere and don't have VMware Tools
> installed.
>
> Any troubleshooting advice to help me focus somewhere would be
> appreciated.

Check your network configuration: routes, netmasks, MTU, duplicate IPs, etc.
 -- richard

--

[email protected]
+1-760-896-4422

_______________________________________________
OpenIndiana-discuss mailing list
[email protected]
http://openindiana.org/mailman/listinfo/openindiana-discuss
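[Editor's note: for Richard's checklist above, one possible sweep of the basics on giraffe. `netstat -rn`, `ifconfig -a`, and `dladm show-linkprop -p mtu` exist on illumos/OI; the `check_net_config` wrapper and the guard for missing commands are my own, so the same sketch runs on other systems for comparison.]

```shell
#!/bin/sh
# check_net_config: dump routes, interface config (addresses/netmasks),
# link MTU, and the ARP table (to spot a duplicate IP answering for
# 192.168.0.13). Skips any command not present on this system.
check_net_config() {
    for entry in "netstat -rn" "ifconfig -a" "dladm show-linkprop -p mtu" "arp -an"; do
        set -- $entry                      # split entry into command + args
        if command -v "$1" >/dev/null 2>&1; then
            echo "== $entry =="
            "$@"
        else
            echo "== $entry == (command not present on this system)"
        fi
    done
    return 0
}

check_net_config
```

Comparing this output between giraffe and ape (which pings normally on the same vSwitch) would isolate a guest-side misconfiguration from a host-side one.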
