Hi all,

We ran another test (much better than the last one, which had too many unpredictable variables):

1. One destination machine; let's call it "dest1".
2. dest1 has 2 NICs:
   2.1 VLAN 411, access mode, through a VSWITCH
   2.2 VLAN 420, trunk mode, directly through OSA devices (a different OSA than the VSWITCH uses)
3. One source machine; let's call it "source1".
4. source1 has the same NIC configuration as dest1.
5. Using two different ssh connections to source1, we ran the ping command against dest1:
   5.1 The first ping from source1 goes to dest1 on VLAN 411.
   5.2 The second ping from source1 goes to dest1 on VLAN 420.
6. What we were trying to prove is that the same two machines, at the same time, respond much better over the OSA directly than over the VSWITCH.
7. Well... we failed to prove that: when the VSWITCH pings peaked, so did the OSA pings. We didn't expect that.
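For anyone who wants to repeat the comparison from step 7 without eyeballing two terminals, here is a rough Python sketch of one way to check whether the spikes on the two paths coincide. It assumes both pings were started at the same moment with the same interval (so the icmp_seq numbers line up) and that the output is in the usual Linux iputils ping format. The function names and the threshold are made up for illustration; this is not a tool we actually used.

```python
import re

# Matches the icmp_seq and time fields of a Linux iputils ping reply line
# (assumed format: "... icmp_seq=12 ttl=64 time=0.31 ms").
PING_LINE = re.compile(r"icmp_seq=(\d+).*?time=([\d.]+) ms")


def parse_rtts(ping_output):
    """Return {icmp_seq: rtt_ms} for every reply line in saved ping output."""
    rtts = {}
    for line in ping_output.splitlines():
        match = PING_LINE.search(line)
        if match:
            rtts[int(match.group(1))] = float(match.group(2))
    return rtts


def spikes(rtts, threshold_ms):
    """Return the set of sequence numbers whose RTT exceeds the threshold."""
    return {seq for seq, rtt in rtts.items() if rtt > threshold_ms}


def shared_spike_ratio(rtts_a, rtts_b, threshold_ms):
    """Fraction of path A's spikes that also spike on path B at the same seq.

    Hypothetical helper: a value near 1.0 suggests the delay comes from
    something the two paths have in common rather than from either path.
    """
    spikes_a = spikes(rtts_a, threshold_ms)
    spikes_b = spikes(rtts_b, threshold_ms)
    if not spikes_a:
        return 0.0
    return len(spikes_a & spikes_b) / len(spikes_a)
```

Feed it the saved output of the VLAN 411 ping and the VLAN 420 ping, with a threshold of your choosing (say 10 ms). Matching by sequence number is crude; if the two pings were not started together, matching on timestamps (ping -D) would be a better basis.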
So the problem is not the VSWITCH and not the OSA, as the peaks happen on both of them simultaneously. Our next guess is that the SRM dispatching time slice is too big. Please share if you have a better idea...

The default is 5 ms? Really? So if I have 5 machines in the dispatch list (let's say with 1 CPU), it is very possible that a simple ping will take 25 ms.

Now, we have 2 IFLs running about 15 guests on one z/VM, and another z/VM on the same CEC (sharing the 2 IFLs) with about 4 guests that do not do much (PR/SM weight is 98% for the big z/VM and 2% for the small one). Total CPU utilization is about 80%. We are having trouble on the big z/VM :-)

Although not all guests have the same share (and I admit that I don't fully understand how shares really work, even though I have read more than one article about it), 15 times 5 ms divided by 2 IFLs is 37.5 ms. And if the guest answering the ping is busy with other stuff (like real application work), the reply can land in the next time slice... that is 75 ms.

Just to be clear: we really don't care about the ping response time. We are having real performance issues and just can't find the reason. The CPU is not fully utilized and we can still see stolen time on all guests.

Have any of you changed your time slice settings? Does this make any sense to you CPU performance experts?

Thanks!
Offer Baruch

-----Original Message-----
From: Linux on 390 Port [mailto:LINUX-390@VM.MARIST.EDU] On Behalf Of Leland Lucius
Sent: Friday, October 28, 2011 7:41 AM
To: LINUX-390@VM.MARIST.EDU
Subject: Re: z/VM Switch performance

On 10/27/11 11:07 AM, Marcy Cortes wrote:
> Offer, I do get the erratic pings too. Not as high as 200, but some 50s.
>
> It was reported to me a few weeks ago by one of our more sophisticated users
> that traceroute occasionally fails over the same vswitch.
> Run it like 20 times 1 right after another to recreate or use the -q option
> with something like -q 8.
>
> I checked with another customer and he also saw the same behavior.
>
> I opened a PMR with VM but they said to open one with Linux. I have not
> gotten around to opening one with Novell yet.
>
> Could some of you others out there try this simple ping test?
>
> We are also vlan aware and it does happen on both LACP and non-LACP.
>

We have both aware and unaware VSWITCHes.

Results of two guests connected to the same aware VSWITCH and on the same VLAN:

pzawap01:~ # traceroute pzawap03
traceroute to pzawap03 (172.2.2.211), 30 hops max, 40 byte packets
 1  pzawap03.svc (172.2.2.211)  0.114 ms  0.134 ms  0.016 ms
pzawap01:~ # traceroute pzawap03
traceroute to pzawap03 (172.2.2.211), 30 hops max, 40 byte packets
 1  pzawap03.svc (172.2.2.211)  0.055 ms  0.136 ms  0.024 ms
pzawap01:~ # traceroute pzawap03
traceroute to pzawap03 (172.2.2.211), 30 hops max, 40 byte packets
 1  pzawap03.svc (172.2.2.211)  0.000 ms * *

Results of two guests connected to the same unaware VSWITCH:

pzsdns01:~ # traceroute 192.1.1.28
traceroute to 192.1.1.28 (192.1.1.28), 30 hops max, 40 byte packets
 1  192.1.1.28 (192.1.1.28)  0.199 ms  0.036 ms  0.074 ms
pzsdns01:~ # traceroute 192.1.1.28
traceroute to 192.1.1.28 (192.1.1.28), 30 hops max, 40 byte packets
 1  192.1.1.28 (192.1.1.28)  0.189 ms  0.492 ms *
pzsdns01:~ # traceroute 192.1.1.28
traceroute to 192.1.1.28 (192.1.1.28), 30 hops max, 40 byte packets
 1  192.1.1.28 (192.1.1.28)  0.140 ms  0.038 ms  0.087 ms
pzsdns01:~ # traceroute 192.1.1.28
traceroute to 192.1.1.28 (192.1.1.28), 30 hops max, 40 byte packets
 1  192.1.1.28 (192.1.1.28)  0.244 ms  0.044 ms  0.026 ms
pzsdns01:~ # traceroute 192.1.1.28
traceroute to 192.1.1.28 (192.1.1.28), 30 hops max, 40 byte packets
 1  * * *
 2  * * *
 3  * * *
 4  * * *
 5  * * *
 6  * * *
 7  192.1.1.28 (192.1.1.28)  0.306 ms  0.196 ms  0.286 ms

Results from a guest on an unaware VSWITCH to a guest on a different unaware VSWITCH (same LPAR):

pzsdns01:~ # traceroute pzsadm01
traceroute to pzsadm01 (172.1.1.35), 30 hops max, 40 byte packets
 1  192.1.1.1 (192.1.1.1)  0.289 ms  0.254 ms  0.230 ms
 2  pzsadm01.svc (172.1.1.35)  0.271 ms  0.311 ms  0.324 ms
pzsdns01:~ # traceroute pzsadm01
traceroute to pzsadm01 (172.1.1.35), 30 hops max, 40 byte packets
 1  192.1.1.1 (192.1.1.1)  0.320 ms  0.276 ms  0.682 ms
 2  pzsadm01.svc (172.1.1.35)  0.321 ms  0.263 ms  0.296 ms
pzsdns01:~ # traceroute pzsadm01
traceroute to pzsadm01 (172.1.1.35), 30 hops max, 40 byte packets
 1  192.1.1.1 (192.1.1.1)  0.322 ms  0.278 ms  0.259 ms
 2  * * *
 3  * * *
 4  * * *
 5  * * *
 6  pzsadm01.svc (172.1.1.35)  0.755 ms * *

Results from a guest on an aware VSWITCH to a guest on an aware VSWITCH on different LPARs (same VLAN):

pzawap01:~ # traceroute pzawap04
traceroute to pzawap04 (172.3.3.212), 30 hops max, 40 byte packets
 1  pzawap04.svc (172.3.3.212)  0.540 ms  0.298 ms  0.310 ms
pzawap01:~ # traceroute pzawap04
traceroute to pzawap04 (172.3.3.212), 30 hops max, 40 byte packets
 1  pzawap04.svc (172.3.3.212)  0.463 ms  0.312 ms  0.313 ms
pzawap01:~ # traceroute pzawap04
traceroute to pzawap04 (172.3.3.212), 30 hops max, 40 byte packets
 1  pzawap04.svc (172.3.3.212)  0.381 ms * *

All guests are SLES10-SP3 (kernel 2.6.16.60-0.76.8-default)

Leland

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to lists...@vm.marist.edu with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
----------------------------------------------------------------------
For more information on Linux on System z, visit
http://wiki.linuxvm.org/