Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and the Greek economy

2015-03-01 Thread W Verb
Hello all,



Well, I no longer blame the ixgbe driver for the problems I'm seeing.


I tried Joerg's updated driver, which didn't improve the issue. So I went
back to the drawing board and rebuilt the server from scratch.

What I noted is that if I have only a single 1-gig physical interface
active on the ESXi host, everything works as expected. As soon as I enable
two interfaces, I start seeing the performance problems I've described.

Response pauses from the server that I see in tcpdump captures still lead me
to believe the problem is delay on the server side, so I ran a series of
kernel DTrace profiles and produced some flamegraphs.
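
(For anyone who wants to reproduce these: the flamegraphs were produced with
the usual kernel-profiling recipe, roughly like the following. The 997 Hz rate
and 60-second window are illustrative rather than necessarily what I used, and
stackcollapse.pl/flamegraph.pl are Brendan Gregg's FlameGraph scripts, assumed
to be on hand.)

  dtrace -x stackframes=100 \
    -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-60s { exit(0); }' \
    -o kern_stacks.out
  ./stackcollapse.pl kern_stacks.out > kern_folded.out
  ./flamegraph.pl kern_folded.out > kernel_flamegraph.svg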


This was taken during a read operation with two active 10G interfaces on
the server, with a single target shared by two TPGs (one TPG for each
10G physical port). The ESXi host has two 1G ports enabled, with VLANs
separating the active ports into 10G/1G pairs. ESXi is set to multipath
across both VLANs with a round-robin IO interval of 1.

https://drive.google.com/file/d/0BwyUMjibonYQd3ZYOGh4d2pteGs/view?usp=sharing


This was taken during a write operation:

https://drive.google.com/file/d/0BwyUMjibonYQMnBtU1Q2SXM2ams/view?usp=sharing
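
(For reference, the round-robin interval of 1 mentioned above is the sort of
thing set per device with esxcli on ESXi 5.5; the naa ID below is a
placeholder.)

  esxcli storage nmp psp roundrobin deviceconfig set \
    --type=iops --iops=1 --device=naa.xxxxxxxxxxxxxxxx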


I then rebooted the server and disabled C-State, ACPI T-State, and general
EIST (Turbo boost) functionality in the CPU.

When I attempted to boot my guest VM, the iSCSI transfer gradually ground
to a halt during the boot-loading process, and the guest OS never completed
its boot.

Here is a flamegraph taken while iSCSI is slowly dying:

https://drive.google.com/file/d/0BwyUMjibonYQM21JeFZPX3dZWTg/view?usp=sharing



I edited out cpu_idle_adaptive from the dtrace output and regenerated the
slowdown graph:

https://drive.google.com/file/d/0BwyUMjibonYQbTVwV3NvXzlPS1E/view?usp=sharing



I then edited cpu_idle_adaptive out of the speedy write operation and
regenerated that graph:

https://drive.google.com/file/d/0BwyUMjibonYQeWFYM0pCMDZ1X2s/view?usp=sharing
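
(Editing a frame out of the folded stacks is just a grep before re-running
flamegraph.pl, e.g., with the filenames from the sketch above:)

  grep -v cpu_idle_adaptive kern_folded.out > kern_folded_edited.out
  ./flamegraph.pl kern_folded_edited.out > kernel_noidle.svg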



I have zero experience with interpreting flamegraphs, but the most
significant difference I see between the slow read example and the fast
write example is in unix`thread_start -- unix`idle. There's a good chunk
of unix`i86_mwait in the read example that is not present in the write
example at all.

Disabling the l2arc cache device didn't make a difference, and I had to
reenable EIST support on the CPU to get my VMs to boot.

I am seeing a variety of bug reports going back to 2010 regarding excessive
mwait operations, with the suggested solution usually being to set "cpupm
enable poll-mode" in power.conf. That change also had no effect on speed.
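
(For completeness, that power.conf change, applied with pmconfig. The
cpu-deep-idle line is an additional knob that gets suggested in the same
threads, not something I can vouch for.)

  # additions to /etc/power.conf
  cpupm enable poll-mode
  cpu-deep-idle disable

  # re-read power.conf without a reboot
  pmconfig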

-Warren V







-Original Message-

From: Chris Siebenmann [mailto:c...@cs.toronto.edu]

Sent: Monday, February 23, 2015 8:30 AM

To: W Verb

Cc: omnios-discuss@lists.omniti.com; c...@cs.toronto.edu

Subject: Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and the
Greek economy



 Chris, thanks for your specific details. I'd appreciate it if you

 could tell me which copper NIC you tried, as well as to pass on the

 iSCSI tuning parameters.



 Our copper NIC experience is with onboard X540-AT2 ports on SuperMicro
hardware (which have the guaranteed 10-20 msec lock hold) and dual-port
82599EB TN cards (which have some sort of driver/hardware failure under
load that eventually leads to 2-second lock holds). I can't recommend
either with the current driver; we had to revert to 1G networking in order
to get stable servers.



 The iSCSI parameter modifications we do, across both initiators and
targets, are:



  initialr2t        no

  firstburstlength  128k

  maxrecvdataseglen 128k    [only on Linux backends]

  maxxmitdataseglen 128k    [only on Linux backends]



The OmniOS initiator doesn't need tuning for more than the first two
parameters; on the Linux backends we tune up all four. My extended thoughts
on these tuning parameters and why we touch them can be found here:



   http://utcc.utoronto.ca/~cks/space/blog/tech/UnderstandingiSCSIProtocol

   http://utcc.utoronto.ca/~cks/space/blog/tech/LikelyISCSITuning



The short version is that these parameters probably only make a small
difference but their overall goal is to do 128KB ZFS reads and writes in
single iSCSI operations (although they will be fragmented at the TCP
layer) and to do iSCSI writes without a back-and-forth delay between
initiator and target (that's 'initialr2t no').
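
(For reference, in login-negotiation terms these settings correspond to the
standard iSCSI keys, with 128KB spelled as 131072; maxxmitdataseglen is a
local cap whose on-the-wire effect is bounded by the peer's declared
MaxRecvDataSegmentLength.)

   InitialR2T=No
   FirstBurstLength=131072
   MaxRecvDataSegmentLength=131072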



 I think basically everyone should use InitialR2T set to no and in fact
that it should be the software default. These days only unusually limited
iSCSI targets should need it to be otherwise and they can change their
setting for it (initiator and target must both agree to it being 'yes', so
either can veto it).



  - cks



On Mon, Feb 23, 2015 at 8:21 AM, Joerg Goltermann j...@osn.de wrote:

 Hi,

 I think your problem is caused by your link properties or your
 switch settings. In general the standard ixgbe seems to perform

Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and the Greek economy

2015-02-23 Thread Joerg Goltermann

Hi,

I think your problem is caused by your link properties or your
switch settings. In general the standard ixgbe seems to perform
well.

I had trouble after changing the default flow control settings to bi,
and that was my motivation to update the ixgbe driver a long time ago.
After I updated our systems to ixgbe 2.5.8 I never had any
problems.

Make sure your switch has support for jumbo frames and that you use
the same MTU on all ports; otherwise the smallest will be used.

What switch do you use? I can tell you nice horror stories about
different vendors...

 - Joerg

On 23.02.2015 10:31, W Verb wrote:

Thank you Joerg,

I've downloaded the package and will try it tomorrow.

The only thing I can add at this point is that upon review of my
testing, I may have performed my pkg -u between the initial quad-gig
performance test and installing the 10G NIC. So this may be a new
problem introduced in the latest updates.

Those of you who are running 10G and have not upgraded to the latest
kernel, etc, might want to do some additional testing before running the
update.

-Warren V

On Mon, Feb 23, 2015 at 1:15 AM, Joerg Goltermann j...@osn.de wrote:

Hi,

I remember there was a problem with the flow control settings in the
ixgbe
driver, so I updated it a long time ago for our internal servers to
2.5.8.
Last weekend I integrated the latest changes from the FreeBSD driver
to bring
the illumos ixgbe to 2.5.25 but I had no time to test it, so it's
completely
untested!


If you would like to give the latest driver a try you can fetch the
kernel modules from
https://cloud.osn.de/index.php/s/Fb4so9RsNnXA7r9

Clone your boot environment, place the modules in the new environment
and update the boot-archive of the new BE.

  - Joerg





On 23.02.2015 02:54, W Verb wrote:

By the way, to those of you who have working setups: please send me
your pool/volume settings, interface linkprops, and any kernel
tuning
parameters you may have set.

Thanks,
Warren V

On Sat, Feb 21, 2015 at 7:59 AM, Schweiss, Chip
c...@innovates.com mailto:c...@innovates.com wrote:

I can't say I totally agree with your performance
assessment.   I run Intel
X520 in all my OmniOS boxes.

Here is a capture of nfssvrtop I made while running many
storage vMotions
between two OmniOS boxes hosting NFS datastores.   This is a
10 host VMware
cluster.  Both OmniOS boxes are dual 10G connected with
copper twin-ax to
the in rack Nexus 5010.

VMware does 100% sync writes, I use ZeusRAM SSDs for log
devices.

-Chip

2014 Apr 24 08:05:51, load: 12.64, read: 17330243 KB, swrite: 15985 KB,
awrite: 1875455 KB

Ver Client   NFSOPS   Reads SWrites AWrites Commits   Rd_bw
SWr_bw  AWr_bwRd_t   SWr_t   AWr_t   Com_t  Align%

4   10.28.17.105  0   0   0   0   0   0
0   0   0   0   0   0   0

4   10.28.17.215  0   0   0   0   0   0
0   0   0   0   0   0   0

4   10.28.17.213  0   0   0   0   0   0
0   0   0   0   0   0   0

4   10.28.16.151  0   0   0   0   0   0
0   0   0   0   0   0   0

4   all   1   0   0   0   0   0
0   0   0   0   0   0   0

3   10.28.16.175  3   0   3   0   0   1
11   04806  48   0   0  85

3   10.28.16.183  6   0   6   0   0   3
162   0 549 124   0   0  73

3   10.28.16.180 11   0  10   0   0   3
27   0 776  89   0   0  67

3   10.28.16.176 28   2  26   0   0  10
405   02572 198   0   0 100

3   10.28.16.178   46064602   4   0   0  294534
3   0 723  49   0   0  99

3   10.28.16.179   49054879  26   0   0  312208
311   0 735 271   0   0  99

3   10.28.16.181   55155502  13   0   0  352107
77   0

Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and the Greek economy

2015-02-23 Thread Joerg Goltermann

Hi,

I remember there was a problem with the flow control settings in the ixgbe
driver, so I updated it a long time ago for our internal servers to 2.5.8.
Last weekend I integrated the latest changes from the FreeBSD driver to bring
the illumos ixgbe to 2.5.25 but I had no time to test it, so it's completely
untested!


If you would like to give the latest driver a try you can fetch the
kernel modules from https://cloud.osn.de/index.php/s/Fb4so9RsNnXA7r9

Clone your boot environment, place the modules in the new environment
and update the boot-archive of the new BE.
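
(A minimal sketch of those steps, assuming the tarball carries the 64-bit
module and that it lives under /kernel/drv/amd64; adjust names and paths to
whatever is actually in the archive.)

  beadm create ixgbe-2.5.25
  beadm mount ixgbe-2.5.25 /mnt
  cp ixgbe /mnt/kernel/drv/amd64/ixgbe
  bootadm update-archive -R /mnt
  beadm unmount ixgbe-2.5.25
  beadm activate ixgbe-2.5.25
  init 6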

 - Joerg




On 23.02.2015 02:54, W Verb wrote:

By the way, to those of you who have working setups: please send me
your pool/volume settings, interface linkprops, and any kernel tuning
parameters you may have set.

Thanks,
Warren V

On Sat, Feb 21, 2015 at 7:59 AM, Schweiss, Chip c...@innovates.com wrote:

I can't say I totally agree with your performance assessment.   I run Intel
X520 in all my OmniOS boxes.

Here is a capture of nfssvrtop I made while running many storage vMotions
between two OmniOS boxes hosting NFS datastores.   This is a 10 host VMware
cluster.  Both OmniOS boxes are dual 10G connected with copper twin-ax to
the in rack Nexus 5010.

VMware does 100% sync writes, I use ZeusRAM SSDs for log devices.

-Chip

2014 Apr 24 08:05:51, load: 12.64, read: 17330243 KB, swrite: 15985KB,
awrite: 1875455  KB

Ver Client   NFSOPS   Reads SWrites AWrites Commits   Rd_bw
SWr_bw  AWr_bwRd_t   SWr_t   AWr_t   Com_t  Align%

4   10.28.17.105  0   0   0   0   0   0
0   0   0   0   0   0   0

4   10.28.17.215  0   0   0   0   0   0
0   0   0   0   0   0   0

4   10.28.17.213  0   0   0   0   0   0
0   0   0   0   0   0   0

4   10.28.16.151  0   0   0   0   0   0
0   0   0   0   0   0   0

4   all   1   0   0   0   0   0
0   0   0   0   0   0   0

3   10.28.16.175  3   0   3   0   0   1
11   04806  48   0   0  85

3   10.28.16.183  6   0   6   0   0   3
162   0 549 124   0   0  73

3   10.28.16.180 11   0  10   0   0   3
27   0 776  89   0   0  67

3   10.28.16.176 28   2  26   0   0  10
405   02572 198   0   0 100

3   10.28.16.178   46064602   4   0   0  294534
3   0 723  49   0   0  99

3   10.28.16.179   49054879  26   0   0  312208
311   0 735 271   0   0  99

3   10.28.16.181   55155502  13   0   0  352107
77   0  89  87   0   0  99

3   10.28.16.184  12095   12059  10   0   0  763014
39   0 249 147   0   0  99

3   10.28.58.1154016040 1166354  53  191605
474  202346 192  96 144  83  99

3   all   42574   33086 2176354  53 1913488
1582  202300 348 138 153 105  99





On Fri, Feb 20, 2015 at 11:46 PM, W Verb wver...@gmail.com wrote:


Hello All,

Thank you for your replies.
I tried a few things, and found the following:

1: Disabling hyperthreading support in the BIOS drops performance overall
by a factor of 4.
2: Disabling VT support also seems to have some effect, although it
appears to be minor. But this has the amusing side effect of fixing the
hangs I've been experiencing with fast reboot. Probably by disabling kvm.
3: The performance tests are a bit tricky to quantify because of caching
effects. In fact, I'm not entirely sure what is happening here. It's just
best to describe what I'm seeing:

The commands I'm using to test are
dd if=/dev/zero of=./test.dd bs=2M count=5000
dd of=/dev/null if=./test.dd bs=2M count=5000
The host vm is running Centos 6.6, and has the latest vmtools installed.
There is a host cache on an SSD local to the host that is also in place.
Disabling the host cache didn't immediately have an effect as far as I could
see.

The host MTU set to 3000 on all iSCSI interfaces for all tests.

Test 1: Right after reboot, with an ixgbe MTU of 9000, the write test
yields an average speed over three tests of 137MB/s. The read test yields an
average over three tests of 5MB/s.

Test 2: After setting ifconfig ixgbe0 mtu 3000, the write tests yield
140MB/s, and the read tests yield 53MB/s. It's important to note here that
if I cut the read test short at only 2-3GB, I get results upwards of
350MB/s, which I assume is local cache-related distortion.

Test 3: MTU of 1500. Read tests are up to 156 MB/s. Write tests yield

Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and the Greek economy

2015-02-23 Thread W Verb
Thank you Joerg,

I've downloaded the package and will try it tomorrow.

The only thing I can add at this point is that upon review of my testing, I
may have performed my pkg -u between the initial quad-gig performance
test and installing the 10G NIC. So this may be a new problem introduced in
the latest updates.

Those of you who are running 10G and have not upgraded to the latest
kernel, etc, might want to do some additional testing before running the
update.

-Warren V

On Mon, Feb 23, 2015 at 1:15 AM, Joerg Goltermann j...@osn.de wrote:

 Hi,

 I remember there was a problem with the flow control settings in the ixgbe
 driver, so I updated it a long time ago for our internal servers to 2.5.8.
 Last weekend I integrated the latest changes from the FreeBSD driver to
 bring
 the illumos ixgbe to 2.5.25 but I had no time to test it, so it's
 completely
 untested!


 If you would like to give the latest driver a try you can fetch the
 kernel modules from https://cloud.osn.de/index.php/s/Fb4so9RsNnXA7r9

 Clone your boot environment, place the modules in the new environment
 and update the boot-archive of the new BE.

  - Joerg





 On 23.02.2015 02:54, W Verb wrote:

 By the way, to those of you who have working setups: please send me
 your pool/volume settings, interface linkprops, and any kernel tuning
 parameters you may have set.

 Thanks,
 Warren V

 On Sat, Feb 21, 2015 at 7:59 AM, Schweiss, Chip c...@innovates.com
 wrote:

 I can't say I totally agree with your performance assessment.   I run
 Intel
 X520 in all my OmniOS boxes.

 Here is a capture of nfssvrtop I made while running many storage vMotions
 between two OmniOS boxes hosting NFS datastores.   This is a 10 host
 VMware
 cluster.  Both OmniOS boxes are dual 10G connected with copper twin-ax to
 the in rack Nexus 5010.

 VMware does 100% sync writes, I use ZeusRAM SSDs for log devices.

 -Chip

 2014 Apr 24 08:05:51, load: 12.64, read: 17330243 KB, swrite: 15985
 KB,
 awrite: 1875455  KB

 Ver Client   NFSOPS   Reads SWrites AWrites Commits   Rd_bw
 SWr_bw  AWr_bwRd_t   SWr_t   AWr_t   Com_t  Align%

 4   10.28.17.105  0   0   0   0   0   0
 0   0   0   0   0   0   0

 4   10.28.17.215  0   0   0   0   0   0
 0   0   0   0   0   0   0

 4   10.28.17.213  0   0   0   0   0   0
 0   0   0   0   0   0   0

 4   10.28.16.151  0   0   0   0   0   0
 0   0   0   0   0   0   0

 4   all   1   0   0   0   0   0
 0   0   0   0   0   0   0

 3   10.28.16.175  3   0   3   0   0   1
 11   04806  48   0   0  85

 3   10.28.16.183  6   0   6   0   0   3
 162   0 549 124   0   0  73

 3   10.28.16.180 11   0  10   0   0   3
 27   0 776  89   0   0  67

 3   10.28.16.176 28   2  26   0   0  10
 405   02572 198   0   0 100

 3   10.28.16.178   46064602   4   0   0  294534
 3   0 723  49   0   0  99

 3   10.28.16.179   49054879  26   0   0  312208
 311   0 735 271   0   0  99

 3   10.28.16.181   55155502  13   0   0  352107
 77   0  89  87   0   0  99

 3   10.28.16.184  12095   12059  10   0   0  763014
 39   0 249 147   0   0  99

 3   10.28.58.1154016040 1166354  53  191605
 474  202346 192  96 144  83  99

 3   all   42574 33086 2176354  53 1913488
 1582  202300 348 138 153 105  99





 On Fri, Feb 20, 2015 at 11:46 PM, W Verb wver...@gmail.com wrote:


 Hello All,

 Thank you for your replies.
 I tried a few things, and found the following:

 1: Disabling hyperthreading support in the BIOS drops performance
 overall
 by a factor of 4.
 2: Disabling VT support also seems to have some effect, although it
 appears to be minor. But this has the amusing side effect of fixing the
 hangs I've been experiencing with fast reboot. Probably by disabling
 kvm.
 3: The performance tests are a bit tricky to quantify because of caching
 effects. In fact, I'm not entirely sure what is happening here. It's
 just
 best to describe what I'm seeing:

 The commands I'm using to test are
 dd if=/dev/zero of=./test.dd bs=2M count=5000
 dd of=/dev/null if=./test.dd bs=2M count=5000
 The host vm is running Centos 6.6, and has the latest vmtools installed.
 There is a host cache on an SSD local to the host that is also in place.
 Disabling the host cache didn't immediately have 

Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and the Greek economy

2015-02-22 Thread W Verb
By the way, to those of you who have working setups: please send me
your pool/volume settings, interface linkprops, and any kernel tuning
parameters you may have set.

Thanks,
Warren V

On Sat, Feb 21, 2015 at 7:59 AM, Schweiss, Chip c...@innovates.com wrote:
 I can't say I totally agree with your performance assessment.   I run Intel
 X520 in all my OmniOS boxes.

 Here is a capture of nfssvrtop I made while running many storage vMotions
 between two OmniOS boxes hosting NFS datastores.   This is a 10 host VMware
 cluster.  Both OmniOS boxes are dual 10G connected with copper twin-ax to
 the in rack Nexus 5010.

 VMware does 100% sync writes, I use ZeusRAM SSDs for log devices.

 -Chip

 2014 Apr 24 08:05:51, load: 12.64, read: 17330243 KB, swrite: 15985KB,
 awrite: 1875455  KB

 Ver Client   NFSOPS   Reads SWrites AWrites Commits   Rd_bw
 SWr_bw  AWr_bwRd_t   SWr_t   AWr_t   Com_t  Align%

 4   10.28.17.105  0   0   0   0   0   0
 0   0   0   0   0   0   0

 4   10.28.17.215  0   0   0   0   0   0
 0   0   0   0   0   0   0

 4   10.28.17.213  0   0   0   0   0   0
 0   0   0   0   0   0   0

 4   10.28.16.151  0   0   0   0   0   0
 0   0   0   0   0   0   0

 4   all   1   0   0   0   0   0
 0   0   0   0   0   0   0

 3   10.28.16.175  3   0   3   0   0   1
 11   04806  48   0   0  85

 3   10.28.16.183  6   0   6   0   0   3
 162   0 549 124   0   0  73

 3   10.28.16.180 11   0  10   0   0   3
 27   0 776  89   0   0  67

 3   10.28.16.176 28   2  26   0   0  10
 405   02572 198   0   0 100

 3   10.28.16.178   46064602   4   0   0  294534
 3   0 723  49   0   0  99

 3   10.28.16.179   49054879  26   0   0  312208
 311   0 735 271   0   0  99

 3   10.28.16.181   55155502  13   0   0  352107
 77   0  89  87   0   0  99

 3   10.28.16.184  12095   12059  10   0   0  763014
 39   0 249 147   0   0  99

 3   10.28.58.1154016040 1166354  53  191605
 474  202346 192  96 144  83  99

 3   all   42574   33086 2176354  53 1913488
 1582  202300 348 138 153 105  99





 On Fri, Feb 20, 2015 at 11:46 PM, W Verb wver...@gmail.com wrote:

 Hello All,

 Thank you for your replies.
 I tried a few things, and found the following:

 1: Disabling hyperthreading support in the BIOS drops performance overall
 by a factor of 4.
 2: Disabling VT support also seems to have some effect, although it
 appears to be minor. But this has the amusing side effect of fixing the
 hangs I've been experiencing with fast reboot. Probably by disabling kvm.
 3: The performance tests are a bit tricky to quantify because of caching
 effects. In fact, I'm not entirely sure what is happening here. It's just
 best to describe what I'm seeing:

 The commands I'm using to test are
 dd if=/dev/zero of=./test.dd bs=2M count=5000
 dd of=/dev/null if=./test.dd bs=2M count=5000
 The host vm is running Centos 6.6, and has the latest vmtools installed.
 There is a host cache on an SSD local to the host that is also in place.
 Disabling the host cache didn't immediately have an effect as far as I could
 see.

 The host MTU set to 3000 on all iSCSI interfaces for all tests.

 Test 1: Right after reboot, with an ixgbe MTU of 9000, the write test
 yields an average speed over three tests of 137MB/s. The read test yields an
 average over three tests of 5MB/s.

 Test 2: After setting ifconfig ixgbe0 mtu 3000, the write tests yield
 140MB/s, and the read tests yield 53MB/s. It's important to note here that
 if I cut the read test short at only 2-3GB, I get results upwards of
 350MB/s, which I assume is local cache-related distortion.

 Test 3: MTU of 1500. Read tests are up to 156 MB/s. Write tests yield
 about 142MB/s.
 Test 4: MTU of 1000: Read test at 182MB/s.
 Test 5: MTU of 900: Read test at 130 MB/s.
 Test 6: MTU of 1000: Read test at 160MB/s. Write tests are now
 consistently at about 300MB/s.
 Test 7: MTU of 1200: Read test at 124MB/s.
 Test 8: MTU of 1000: Read test at 161MB/s. Write at 261MB/s.

 A few final notes:
 L1ARC grabs about 10GB of RAM during the tests, so there's definitely some
 read caching going on.
 The write operations are easier to observe with iostat, and I'm seeing io
 rates that closely correlate with the network write 

Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and the Greek economy

2015-02-22 Thread Chris Nagele
Is the issue here only related to iSCSI? We've used the X520's for NFS
for a couple of years and it has worked really well for us.

Not sure if this is an accurate test, but iperf shows the following
results for me:

Over 1GbE:

[ ID] Interval   Transfer Bandwidth
[  3]  0.0-10.0 sec981 MBytes823 Mbits/sec

Over 10GbE on the same machines:

[ ID] Interval   Transfer Bandwidth
[  3]  0.0-10.0 sec  9.42 GBytes  8.09 Gbits/sec

I could be going in the wrong direction here, but I was curious as
well since we rely on 10G heavily.

Chris


On Sun, Feb 22, 2015 at 8:54 PM, W Verb wver...@gmail.com wrote:
 By the way, to those of you who have working setups: please send me
 your pool/volume settings, interface linkprops, and any kernel tuning
 parameters you may have set.

 Thanks,
 Warren V

 On Sat, Feb 21, 2015 at 7:59 AM, Schweiss, Chip c...@innovates.com wrote:
 I can't say I totally agree with your performance assessment.   I run Intel
 X520 in all my OmniOS boxes.

 Here is a capture of nfssvrtop I made while running many storage vMotions
 between two OmniOS boxes hosting NFS datastores.   This is a 10 host VMware
 cluster.  Both OmniOS boxes are dual 10G connected with copper twin-ax to
 the in rack Nexus 5010.

 VMware does 100% sync writes, I use ZeusRAM SSDs for log devices.

 -Chip

 2014 Apr 24 08:05:51, load: 12.64, read: 17330243 KB, swrite: 15985KB,
 awrite: 1875455  KB

 Ver Client   NFSOPS   Reads SWrites AWrites Commits   Rd_bw
 SWr_bw  AWr_bwRd_t   SWr_t   AWr_t   Com_t  Align%

 4   10.28.17.105  0   0   0   0   0   0
 0   0   0   0   0   0   0

 4   10.28.17.215  0   0   0   0   0   0
 0   0   0   0   0   0   0

 4   10.28.17.213  0   0   0   0   0   0
 0   0   0   0   0   0   0

 4   10.28.16.151  0   0   0   0   0   0
 0   0   0   0   0   0   0

 4   all   1   0   0   0   0   0
 0   0   0   0   0   0   0

 3   10.28.16.175  3   0   3   0   0   1
 11   04806  48   0   0  85

 3   10.28.16.183  6   0   6   0   0   3
 162   0 549 124   0   0  73

 3   10.28.16.180 11   0  10   0   0   3
 27   0 776  89   0   0  67

 3   10.28.16.176 28   2  26   0   0  10
 405   02572 198   0   0 100

 3   10.28.16.178   46064602   4   0   0  294534
 3   0 723  49   0   0  99

 3   10.28.16.179   49054879  26   0   0  312208
 311   0 735 271   0   0  99

 3   10.28.16.181   55155502  13   0   0  352107
 77   0  89  87   0   0  99

 3   10.28.16.184  12095   12059  10   0   0  763014
 39   0 249 147   0   0  99

 3   10.28.58.1154016040 1166354  53  191605
 474  202346 192  96 144  83  99

 3   all   42574   33086 2176354  53 1913488
 1582  202300 348 138 153 105  99





 On Fri, Feb 20, 2015 at 11:46 PM, W Verb wver...@gmail.com wrote:

 Hello All,

 Thank you for your replies.
 I tried a few things, and found the following:

 1: Disabling hyperthreading support in the BIOS drops performance overall
 by a factor of 4.
 2: Disabling VT support also seems to have some effect, although it
 appears to be minor. But this has the amusing side effect of fixing the
 hangs I've been experiencing with fast reboot. Probably by disabling kvm.
 3: The performance tests are a bit tricky to quantify because of caching
 effects. In fact, I'm not entirely sure what is happening here. It's just
 best to describe what I'm seeing:

 The commands I'm using to test are
 dd if=/dev/zero of=./test.dd bs=2M count=5000
 dd of=/dev/null if=./test.dd bs=2M count=5000
 The host vm is running Centos 6.6, and has the latest vmtools installed.
 There is a host cache on an SSD local to the host that is also in place.
 Disabling the host cache didn't immediately have an effect as far as I could
 see.

 The host MTU set to 3000 on all iSCSI interfaces for all tests.

 Test 1: Right after reboot, with an ixgbe MTU of 9000, the write test
 yields an average speed over three tests of 137MB/s. The read test yields an
 average over three tests of 5MB/s.

 Test 2: After setting ifconfig ixgbe0 mtu 3000, the write tests yield
 140MB/s, and the read tests yield 53MB/s. It's important to note here that
 if I cut the read test short at only 2-3GB, I get results upwards of
 350MB/s, which I assume is local cache-related 

Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and the Greek economy

2015-02-21 Thread Schweiss, Chip
I can't say I totally agree with your performance assessment.   I run Intel
X520 in all my OmniOS boxes.

Here is a capture of nfssvrtop I made while running many storage vMotions
between two OmniOS boxes hosting NFS datastores.   This is a 10 host VMware
cluster.  Both OmniOS boxes are dual 10G connected with copper twin-ax to
the in rack Nexus 5010.

VMware does 100% sync writes, I use ZeusRAM SSDs for log devices.

-Chip

2014 Apr 24 08:05:51, load: 12.64, read: 17330243 KB, swrite: 15985KB,
awrite: 1875455  KB

Ver Client   NFSOPS   Reads SWrites AWrites Commits   Rd_bw
SWr_bw  AWr_bwRd_t   SWr_t   AWr_t   Com_t  Align%

4   10.28.17.105  0   0   0   0   0
0   0   0   0   0   0   0   0

4   10.28.17.215  0   0   0   0   0
0   0   0   0   0   0   0   0

4   10.28.17.213  0   0   0   0   0
0   0   0   0   0   0   0   0

4   10.28.16.151  0   0   0   0   0
0   0   0   0   0   0   0   0

4   all   1   0   0   0   0
0   0   0   0   0   0   0   0

3   10.28.16.175  3   0   3   0   0
1  11   04806  48   0   0  85

3   10.28.16.183  6   0   6   0   0   3
162   0 549 124   0   0  73

3   10.28.16.180 11   0  10   0   0
3  27   0 776  89   0   0  67

3   10.28.16.176 28   2  26   0   0  10
405   02572 198   0   0 100

3   10.28.16.178   46064602   4   0   0
294534   3   0 723  49   0   0  99

3   10.28.16.179   49054879  26   0   0  312208
311   0 735 271   0   0  99

3   10.28.16.181   55155502  13   0   0
352107  77   0  89  87   0   0  99

3   10.28.16.184  12095   12059  10   0   0
763014  39   0 249 147   0   0  99

3   10.28.58.1154016040 1166354  53  191605
474  202346 192  96 144  83  99

3   all   42574   33086 2176354  53 1913488
1582  202300 348 138 153 105  99




On Fri, Feb 20, 2015 at 11:46 PM, W Verb wver...@gmail.com wrote:

 Hello All,

 Thank you for your replies.
 I tried a few things, and found the following:

 1: Disabling hyperthreading support in the BIOS drops performance overall
 by a factor of 4.
 2: Disabling VT support also seems to have some effect, although it
 appears to be minor. But this has the amusing side effect of fixing the
 hangs I've been experiencing with fast reboot. Probably by disabling kvm.
 3: The performance tests are a bit tricky to quantify because of caching
 effects. In fact, I'm not entirely sure what is happening here. It's just
 best to describe what I'm seeing:

 The commands I'm using to test are
 dd if=/dev/zero of=./test.dd bs=2M count=5000
 dd of=/dev/null if=./test.dd bs=2M count=5000
 The host vm is running Centos 6.6, and has the latest vmtools installed.
 There is a host cache on an SSD local to the host that is also in place.
 Disabling the host cache didn't immediately have an effect as far as I
 could see.

 The host MTU set to 3000 on all iSCSI interfaces for all tests.

 Test 1: Right after reboot, with an ixgbe MTU of 9000, the write test
 yields an average speed over three tests of 137MB/s. The read test yields
 an average over three tests of 5MB/s.

 Test 2: After setting ifconfig ixgbe0 mtu 3000, the write tests yield
 140MB/s, and the read tests yield 53MB/s. It's important to note here that
 if I cut the read test short at only 2-3GB, I get results upwards of
 350MB/s, which I assume is local cache-related distortion.

 Test 3: MTU of 1500. Read tests are up to 156 MB/s. Write tests yield
 about 142MB/s.
 Test 4: MTU of 1000: Read test at 182MB/s.
 Test 5: MTU of 900: Read test at 130 MB/s.
 Test 6: MTU of 1000: Read test at 160MB/s. Write tests are now
 consistently at about 300MB/s.
 Test 7: MTU of 1200: Read test at 124MB/s.
 Test 8: MTU of 1000: Read test at 161MB/s. Write at 261MB/s.

 A few final notes:
 L1ARC grabs about 10GB of RAM during the tests, so there's definitely some
 read caching going on.
 The write operations are easier to observe with iostat, and I'm seeing io
 rates that closely correlate with the network write speeds.


 Chris, thanks for your specific details. I'd appreciate it if you could
 tell me which copper NIC you tried, as well as to pass on the iSCSI tuning
 parameters.

 I've ordered an Intel EXPX9502AFXSR, which uses the 82598 chip instead of
 the 82599 in the X520. If I get 

[OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and the Greek economy

2015-02-20 Thread W Verb
Hello all,

Each of the things in the subject line are:
1: Horrendously broken
2: Have an extremely poor short-term outlook
3: Will take a huge investment of time by intelligent, dedicated, and
insightful people to fix.

It's common knowledge that the ixgbe driver in omnios/illumos/opensolaris
is broke as hell. The point of this message is not to complain. The point
is to pass on a configuration that is working for me, albeit in a
screwed-up degraded fashion.

I have four ESXi 5.5u2 host servers, each with one Intel PCI-e quad-gigabit
NIC installed. Three of the gigabit ports on each host are dedicated to
carrying iSCSI traffic between that host and a single storage server.

The storage server is based on a Supermicro X10SLM-F mainboard, which has
three PCI-e slots. Two of the slots are used for storage controllers, and a
single slot is used for an Intel X520 dual-port fiber 10G NIC.

Previously, I had a single storage controller and two quad-gig NICs
installed in the storage server, and was able to get close to line-rate on
multipath iSCSI with three host clients. But when I added the fourth, I
upgraded to 10G.

After installation and configuration, I observed all kinds of bad behavior
in the network traffic between the hosts and the server. All of this bad
behavior is traced to the ixgbe driver on the storage server. Without going
into the full troubleshooting process, here are my takeaways:

1: The only tuning factor that appears to have a significant effect on the
driver is MTU size. This applies both to the MTU of the ixgbe NIC and to
the MTU of the 1-gig NICs used in the hosts.

2: I have seen best performance with the MTU on the ixgbe set to 1000 bytes
(yes, 1k). The MTU on the ESXi interfaces is set to 3000 bytes.

3: Setting 9000-byte MTUs on both sides results in about 150MB/s write
speeds on a Linux VMware guest running a 10GB dd operation. But read
speeds are at 5MB/s.

4: Testing of dd operations on the storage server itself shows that the
filesystem is capable of performing 500MB/s+ reads and writes.

5: After setting the MTUs listed in point 2, I am able to get 270-300MB/s
writes on the guest OS, and ~200MB/s reads. Not perfect, but I'll take it.

6: No /etc/system or other kernel tunings are in use.

7: Delayed ACK, Nagle, and L2 flow control tests had no effect.

8: pkg -u was performed before all tests, so I should be using the latest
kernel code, etc.

9: When capturing traffic on omnios, I used the CSW distribution of
tcpdump. It's worth noting that unlike EVERY ... OTHER ... IMPLEMENTATION
... of tcpdump I've ever used (BSD flavors, OSX, various linux distros,
various embedded distros), libpcap doesn't appear to get individual frame
reports from the omnios kernel, and so aggregates multi-frame TCP segments
into a single record. This has the appearance of 20-60kB frames being
transmitted by omnios when reading a packet capture with Wireshark. I
cannot tell you how irritating this is when troubleshooting network issues.

10: At the wire level, the speed problems are clearly due to pauses in
response time by omnios. At 9000 byte frame sizes, I see a good number of
duplicate ACKs and fast retransmits during read operations (when omnios is
transmitting). But below about a 4100-byte MTU on omnios (which seems to
correlate to 4096-byte iSCSI block transfers), the transmission errors fade
away and we only see the transmission pause problem.

I'm in the process of aggregating the 10G ports and performing some IO
testing with the vmware IO performance tool. That should show the
performance of the 10G NIC when both physical ports are in use, and
hopefully get me some more granularity on the MTU settings.
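
(If anyone wants to set up something similar, a plain dladm link aggregation
looks roughly like this; ixgbe1, aggr0, and the address are placeholders, and
the switch side has to be configured to match if LACP is used.)

  dladm create-aggr -l ixgbe0 -l ixgbe1 aggr0
  dladm set-linkprop -p mtu=1000 aggr0
  ipadm create-if aggr0
  ipadm create-addr -T static -a 10.10.10.1/24 aggr0/v4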

If anyone has a list of kernel tuning parameters to test, I'm happy to try
them out and report back. I've found a variety of suggestions online, but
between Illumos, solaris, openindiana, Nexenta, opensolaris, etc, the
supported variables are, um, inconsistent.

-Warren V
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and the Greek economy

2015-02-20 Thread Dan McDonald
You should PLEASE share your original note with the illumos developers' list.

Also, please keep in mind that it is VERY possible you're seeing crosscall 
effects where the interrupt-servicing CPU core is not on the same PCIe bus as 
the one your card is plugged into.

I never got a chance to perform these tests fully when I *had* ixgbe HW handy, 
but I observed bizarre improvements, or the disappearance of bizarre effects, 
if I:

- disabled the HT-inspired CPUs (psradm -f HT CPUs)

- Disabled one of the two CPUs (again, using psradm).

You may wish to try messing around with what OS-reported CPUs are on your 
Romley (Xeon E5) system.

I will also note that it's high time for illumos to pull in the ixgbe updates 
from upstream.  Intel is NOT being very helpful here, partially because of 
fear-of-Oracle, and partially because there aren't enough illumos customers to 
make a dent in their HW sales.

In the past, illumos developers have found the time to yank in the newest 
driver updates from upstream.  That hasn't happened recently. For the record, 
OmniTI might be able to contribute here IF AND ONLY IF a sufficiently paying 
customer motivates us. I suspect the same answer (sufficiently paying customer) 
applies to engineers from any other illumos shop as well.

Dan

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and the Greek economy

2015-02-20 Thread Dan McDonald

 On Feb 20, 2015, at 9:10 PM, Dan McDonald dan...@omniti.com wrote:
 
 I never got a chance to perform these tests fully when I *had* ixgbe HW 
 handy, but I observed bizarre improvements, or the disappearance of bizarre 
 effects, if I:
 
   - disabled the HT-inspired CPUs (psradm -f HT CPUs)
 
   - Disabled one of the two CPUs (again, using psradm).
 
 You may wish to try messing around with what OS-reported CPUs are on your 
 Romley (Xeon E5) system.

To see the layouts, the psrinfo(1M) command is your friend, especially psrinfo 
-vp.
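
For example (which virtual CPU IDs are the HT siblings varies by system, so
check the psrinfo -vp output first; the IDs below are placeholders):

  psrinfo -vp        # show physical cores and their virtual processor IDs
  psradm -f 1 3 5 7  # take the second thread of each core offline (example IDs)
  psradm -n 1 3 5 7  # bring them back online afterwards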

Dan

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and the Greek economy

2015-02-20 Thread Chris Siebenmann
 After installation and configuration, I observed all kinds of bad behavior
 in the network traffic between the hosts and the server. All of this bad
 behavior is traced to the ixgbe driver on the storage server. Without going
 into the full troubleshooting process, here are my takeaways:
[...]

 For what it's worth, we managed to achieve much better line rates on
copper 10G ixgbe hardware of various descriptions between OmniOS
and CentOS 7 (I don't think we ever tested OmniOS to OmniOS). I don't
believe OmniOS could do TCP at full line rate but I think we managed 700+
Mbytes/sec on both transmit and receive and we got basically disk-limited
speeds with iSCSI (across multiple disks on multi-disk mirrored pools,
OmniOS iSCSI initiator, Linux iSCSI targets).

 I don't believe we did any specific kernel tuning (and in fact some of
our attempts to fiddle ixgbe driver parameters blew up in our face).
We did tune iSCSI connection parameters to increase various buffer
sizes so that ZFS could do even large single operations in single iSCSI
transactions. (More details available if people are interested.)

 10: At the wire level, the speed problems are clearly due to pauses in
 response time by omnios. At 9000 byte frame sizes, I see a good number
 of duplicate ACKs and fast retransmits during read operations (when
 omnios is transmitting). But below about a 4100-byte MTU on omnios
 (which seems to correlate to 4096-byte iSCSI block transfers), the
 transmission errors fade away and we only see the transmission pause
 problem.

 This is what really attracted my attention. In our OmniOS setup, our
specific Intel hardware had ixgbe driver issues that could cause
activity stalls during once-a-second link heartbeat checks. This
obviously had an effect at the TCP and iSCSI layers. My initial message
to illumos-developer sparked a potentially interesting discussion:

http://www.listbox.com/member/archive/182179/2014/10/sort/time_rev/page/16/entry/6:405/20141003125035:6357079A-4B1D-11E4-A39C-D534381BA44D/

If you think this is a possibility in your setup, I've put the DTrace
script I used to hunt for this up on the web:

http://www.cs.toronto.edu/~cks/src/omnios-ixgbe/ixgbe_delay.d

This isn't the only potential source of driver stalls by any means, it's
just the one I found. You may also want to look at lockstat in general,
as information it reported is what led us to look specifically at the
ixgbe code here.

(If you suspect kernel/driver issues, lockstat combined with kernel
source is a really excellent resource.)
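
(Typical starting invocations, if it helps; the first profiles where the
kernel is spending cycles, the second records lock contention events, both
over 30 seconds:)

  lockstat -kIW -D 20 sleep 30   # kernel profiling, top 20 callers
  lockstat -C -D 20 sleep 30     # contention events, top 20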

- cks
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and the Greek economy

2015-02-20 Thread W Verb
Hello All,

Thank you for your replies.
I tried a few things, and found the following:

1: Disabling hyperthreading support in the BIOS drops performance overall
by a factor of 4.
2: Disabling VT support also seems to have some effect, although it appears
to be minor. But this has the amusing side effect of fixing the hangs I've
been experiencing with fast reboot. Probably by disabling kvm.
3: The performance tests are a bit tricky to quantify because of caching
effects. In fact, I'm not entirely sure what is happening here. It's just
best to describe what I'm seeing:

The commands I'm using to test are
dd if=/dev/zero of=./test.dd bs=2M count=5000
dd of=/dev/null if=./test.dd bs=2M count=5000
The host vm is running Centos 6.6, and has the latest vmtools installed.
There is a host cache on an SSD local to the host that is also in place.
Disabling the host cache didn't immediately have an effect as far as I
could see.

The host MTU was set to 3000 on all iSCSI interfaces for all tests.

Test 1: Right after reboot, with an ixgbe MTU of 9000, the write test
yields an average speed over three tests of 137MB/s. The read test yields
an average over three tests of 5MB/s.

Test 2: After setting ifconfig ixgbe0 mtu 3000, the write tests yield
140MB/s, and the read tests yield 53MB/s. It's important to note here that
if I cut the read test short at only 2-3GB, I get results upwards of
350MB/s, which I assume is local cache-related distortion.

Test 3: MTU of 1500. Read tests are up to 156 MB/s. Write tests yield about
142MB/s.
Test 4: MTU of 1000: Read test at 182MB/s.
Test 5: MTU of 900: Read test at 130 MB/s.
Test 6: MTU of 1000: Read test at 160MB/s. Write tests are now consistently
at about 300MB/s.
Test 7: MTU of 1200: Read test at 124MB/s.
Test 8: MTU of 1000: Read test at 161MB/s. Write at 261MB/s.
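
(Side note for anyone reproducing this: the ifconfig MTU changes above do not
survive a reboot; the persistent illumos form would be something like

  dladm set-linkprop -p mtu=1000 ixgbe0

with the link unplumbed first if the driver refuses to change it live.)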

A few final notes:
L1ARC grabs about 10GB of RAM during the tests, so there's definitely some
read caching going on.
The write operations are easier to observe with iostat, and I'm seeing io
rates that closely correlate with the network write speeds.


Chris, thanks for your specific details. I'd appreciate it if you could
tell me which copper NIC you tried, as well as to pass on the iSCSI tuning
parameters.

I've ordered an Intel EXPX9502AFXSR, which uses the 82598 chip instead of
the 82599 in the X520. If I get similar results with my fiber transcievers,
I'll see if I can get a hold of copper ones.

But I should mention that I did indeed look at PHY/MAC error rates, and
they are nil.

-Warren V

On Fri, Feb 20, 2015 at 7:25 PM, Chris Siebenmann c...@cs.toronto.edu
wrote:

  After installation and configuration, I observed all kinds of bad
 behavior
  in the network traffic between the hosts and the server. All of this bad
  behavior is traced to the ixgbe driver on the storage server. Without
 going
  into the full troubleshooting process, here are my takeaways:
 [...]

  For what it's worth, we managed to achieve much better line rates on
 copper 10G ixgbe hardware of various descriptions between OmniOS
 and CentOS 7 (I don't think we ever tested OmniOS to OmniOS). I don't
 believe OmniOS could do TCP at full line rate but I think we managed 700+
 Mbytes/sec on both transmit and receive and we got basically disk-limited
 speeds with iSCSI (across multiple disks on multi-disk mirrored pools,
 OmniOS iSCSI initiator, Linux iSCSI targets).

  I don't believe we did any specific kernel tuning (and in fact some of
 our attempts to fiddle ixgbe driver parameters blew up in our face).
 We did tune iSCSI connection parameters to increase various buffer
 sizes so that ZFS could do even large single operations in single iSCSI
 transactions. (More details available if people are interested.)

  10: At the wire level, the speed problems are clearly due to pauses in
  response time by omnios. At 9000 byte frame sizes, I see a good number
  of duplicate ACKs and fast retransmits during read operations (when
  omnios is transmitting). But below about a 4100-byte MTU on omnios
  (which seems to correlate to 4096-byte iSCSI block transfers), the
  transmission errors fade away and we only see the transmission pause
  problem.

  This is what really attracted my attention. In our OmniOS setup, our
 specific Intel hardware had ixgbe driver issues that could cause
 activity stalls during once-a-second link heartbeat checks. This
 obviously had an effect at the TCP and iSCSI layers. My initial message
 to illumos-developer sparked a potentially interesting discussion:


 http://www.listbox.com/member/archive/182179/2014/10/sort/time_rev/page/16/entry/6:405/20141003125035:6357079A-4B1D-11E4-A39C-D534381BA44D/

 If you think this is a possibility in your setup, I've put the DTrace
 script I used to hunt for this up on the web:

 http://www.cs.toronto.edu/~cks/src/omnios-ixgbe/ixgbe_delay.d

 This isn't the only potential source of driver stalls by any means, it's
 just the one I found. You may also want to look at lockstat in