Just a word of warning.
I had multiple simultaneous node failures from running the "i7z" monitoring tool 
while investigating latency issues. It does nothing more than read MSRs from 
the CPU.
That was on a CentOS 6.5 kernel.
/dev/cpu_dma_latency was held open with "1" written to it, with an occasional run 
of cyclictest from rt-tests.
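
(For context, a minimal sketch of the kind of MSR read an i7z-style tool performs; this assumes the msr kernel module is loaded, root access, and Python. i7z itself is C and reads several more registers; 0x198 is IA32_PERF_STATUS, the current P-state.)

import os, struct

def read_msr(cpu, reg):
    # The msr driver exposes one file per CPU; the register number is the file offset.
    fd = os.open("/dev/cpu/%d/msr" % cpu, os.O_RDONLY)
    try:
        os.lseek(fd, reg, os.SEEK_SET)
        raw = os.read(fd, 8)
    finally:
        os.close(fd)
    return struct.unpack("<Q", raw)[0]

print(hex(read_msr(0, 0x198)))  # IA32_PERF_STATUS on CPU 0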

Not sure how powertop works (it probably just peeks into sysfs).

Anyway, you should really investigate whether Turbo Boost is working now; losing 
it could easily cause a 50% drop.
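
(One rough, self-contained way to check from userspace, hedged: intel_pstate exposes a no_turbo knob, while with acpi-cpufreq the turbo state instead shows up in scaling_available_frequencies as a pseudo-frequency 1 MHz above the nominal maximum, e.g. 2401000 for a 2.4 GHz part.)

# Heuristic Turbo Boost check via sysfs.
try:
    with open("/sys/devices/system/cpu/intel_pstate/no_turbo") as f:
        print("Turbo " + ("disabled" if f.read().strip() == "1" else "enabled"))
except IOError:
    with open("/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies") as f:
        freqs = sorted(int(x) for x in f.read().split())
    # With acpi-cpufreq the turbo P-state is listed 1000 kHz above the next state.
    print("Turbo state advertised" if len(freqs) > 1 and freqs[-1] - freqs[-2] == 1000
          else "No turbo state advertised")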

Jan


> On 03 Sep 2015, at 09:37, Nick Fisk <n...@fisk.me.uk> wrote:
> 
> acpi_cpufreq was the driver I used.
>  
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Robert LeBlanc
> Sent: 02 September 2015 22:34
> To: Nick Fisk <n...@fisk.me.uk>
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Ceph SSD CPU Frequency Benchmarks
>  
>  
> Changing to the acpi_idle driver dropped the performance by about 50%. That 
> was an unexpected result.
>  
> I'm having issues with powertop and the userspace governor: it always shows 
> 100% idle. I downloaded the latest version with the same result. There is still 
> more work to do, but I wanted to share my findings.
>  
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>  
> On Wed, Sep 2, 2015 at 9:50 AM, Robert LeBlanc  wrote:
>  
> Thanks for the responses.
>  
> I forgot to include the fio test for completeness:
>  
> 8 job QD=8
> [ext4-test]
> runtime=150
> name=ext4-test
> readwrite=randrw
> size=15G
> blocksize=4k
> ioengine=sync
> iodepth=8
> numjobs=8
> thread
> group_reporting
> time_based
> direct=1
>  
>  
> 1 job QD=1
> [ext4-test]
> runtime=150
> name=ext4-test
> readwrite=randrw
> size=15G
> blocksize=4k
> ioengine=sync
> iodepth=1
> numjobs=1
> thread
> group_reporting
> time_based
> direct=1
>  
> I have not disabled all of the power management; I've only prevented the CPU 
> from going to an idle state below C1. I'll have to check on Jan's suggestion 
> of swapping out the intel_idle driver to see what difference it makes. I did 
> not run powertop during the testing because it (or cpupower monitor) 
> impacted performance and would have thrown off the results. I'll do some runs 
> at lower clocks and make sure that the CPU is staying at the lower speeds. Here 
> is some additional output:
>  
> # cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor              
> userspace
> # cpupower monitor
>     |Nehalem                    || Mperf              || Idle_Stats         
> CPU | C3   | C6   | PC3  | PC6  || C0   | Cx   | Freq || POLL | C1-A | C6-A 
>    0|  0.00| 94.19|  0.00|  0.00||  5.70| 94.30|  1299||  0.00|  0.00| 94.32
>    1|  0.00| 99.39|  0.00|  0.00||  0.53| 99.47|  1298||  0.00|  0.00| 99.48
>    2|  0.00| 99.60|  0.00|  0.00||  0.38| 99.62|  1299||  0.00|  0.00| 99.61
>    3|  0.00| 99.63|  0.00|  0.00||  0.36| 99.64|  1299||  0.00|  0.00| 99.64
>    4|  0.00| 99.84|  0.00|  0.00||  0.11| 99.89|  1301||  0.00|  0.00| 99.97
>    5|  0.00| 99.57|  0.00|  0.00||  0.40| 99.60|  1299||  0.00|  0.00| 99.61
>    6|  0.00| 99.72|  0.00|  0.00||  0.27| 99.73|  1299||  0.00|  0.00| 99.73
>    7|  0.00| 99.98|  0.00|  0.00||  0.01| 99.99|  1321||  0.00|  0.00| 99.99
> # cat /sys/devices/system/cpu/cpuidle/current_driver 
> intel_idle
>  
> I then wrote "1" to /dev/cpu_dma_latency. We can see that the idle time 
> moves from C6 to C1:
>  
> # cpupower monitor
>     |Nehalem                    || Mperf              || Idle_Stats         
> CPU | C3   | C6   | PC3  | PC6  || C0   | Cx   | Freq || POLL | C1-A | C6-A 
>    0|  0.00|  0.00|  0.00|  0.00||  0.37| 99.63|  1299||  0.00| 99.63|  0.00
>    1|  0.00|  0.00|  0.00|  0.00||  0.16| 99.84|  1299||  0.00| 99.84|  0.00
>    2|  0.00|  0.00|  0.00|  0.00||  0.47| 99.53|  1299||  0.00| 99.53|  0.00
>    3|  0.00|  0.00|  0.00|  0.00||  0.43| 99.57|  1299||  0.00| 99.57|  0.00
>    4|  0.00|  0.00|  0.00|  0.00||  0.09| 99.91|  1300||  0.00| 99.91|  0.00
>    5|  0.00|  0.00|  0.00|  0.00||  0.06| 99.94|  1298||  0.00| 99.94|  0.00
>    6|  0.00|  0.00|  0.00|  0.00||  0.09| 99.91|  1300||  0.00| 99.91|  0.00
>    7|  0.00|  0.00|  0.00|  0.00||  0.28| 99.72|  1299||  0.00| 99.72|  0.00
> # cat /sys/devices/system/cpu/cpu0/cpuidle/state*/latency
> 0
> 2
> 15
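> 
> (To make those three numbers easier to read, a small sketch, assuming the standard 
> cpuidle sysfs layout, that pairs each exit latency with its state name; here they 
> should correspond to POLL, C1 and C6 as in the cpupower output above:)
> 
> import glob
> 
> # Print each cpuidle state's name and exit latency (us) for CPU 0.
> for state in sorted(glob.glob("/sys/devices/system/cpu/cpu0/cpuidle/state*")):
>     with open(state + "/name") as f:
>         name = f.read().strip()
>     with open(state + "/latency") as f:
>         latency = f.read().strip()
>     print("%-6s %s us" % (name, latency))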
> # cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_{min,max,cur}_freq 
> 1200000
> 1200000
> 1200000
> 1200000
> 1200000
> 1200000
> 1200000
> 1200000
> 1200000
> 2401000
> 2401000
> 2401000
> 2401000
> 2401000
> 2401000
> 2401000
> 1200000
> 1200000
> 1200000
> 1600000
> 1200000
> 1200000
> 1200000
> 1200000
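> 
> (That dump is hard to scan; a rough sketch, assuming the usual cpufreq sysfs 
> files, that prints min/cur/max per CPU instead:)
> 
> import glob
> 
> # Print scaling min/cur/max frequency (kHz) for each CPU that exposes cpufreq.
> for path in sorted(glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cpufreq")):
>     vals = {}
>     for name in ("scaling_min_freq", "scaling_cur_freq", "scaling_max_freq"):
>         with open(path + "/" + name) as f:
>             vals[name] = f.read().strip()
>     cpu = path.split("/")[-2]
>     print("%s  min=%s cur=%s max=%s" % (cpu, vals["scaling_min_freq"],
>                                         vals["scaling_cur_freq"], vals["scaling_max_freq"]))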
>  
> Thanks for taking the time to collaborate with me on this.
>  
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>  
> On Wed, Sep 2, 2015 at 3:21 AM, Nick Fisk  wrote:
> I think this may be related to what I had to do, it rings a bell at least.
>  
> http://unix.stackexchange.com/questions/153693/cant-use-userspace-cpufreq-governor-and-set-cpu-frequency
>  
> The P-state driver doesn't support the userspace governor, so you need to disable it and 
> make Linux use the old acpi-cpufreq driver instead.
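> 
> (For example, if you disable intel_pstate, e.g. with intel_pstate=disable on the 
> kernel command line, a quick sanity check of which driver and governors you actually 
> got; a sketch assuming the standard cpufreq sysfs files:)
> 
> with open("/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver") as f:
>     print("driver:    " + f.read().strip())   # should report acpi-cpufreq
> with open("/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors") as f:
>     print("governors: " + f.read().strip())   # should now include "userspace"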
>  
> > -----Original Message-----
> > From: Nick Fisk [mailto:n...@fisk.me.uk]
> > Sent: 01 September 2015 22:21
> > To: 'Robert LeBlanc' 
> > Cc: ceph-users@lists.ceph.com
> > Subject: RE: [ceph-users] Ceph SSD CPU Frequency Benchmarks
> > 
> > > -----Original Message-----
> > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> > > Of Robert LeBlanc
> > > Sent: 01 September 2015 21:48
> > > To: Nick Fisk 
> > > Cc: ceph-users@lists.ceph.com
> > > Subject: Re: [ceph-users] Ceph SSD CPU Frequency Benchmarks
> > >
> > >
> > > Nick,
> > >
> > > I've been trying to replicate your results without success. Can you
> > > help me understand what I'm doing that is not the same as your test?
> > >
> > > My setup is two boxes, one is a client and the other is a server. The
> > > server has Intel(R) Atom(TM) CPU  C2750  @ 2.40GHz, 32 GB RAM and 2
> > > Intel S3500
> > > 240 GB SSD drives. The boxes have Infiniband FDR cards connected to a
> > > QDR switch using IPoIB. I set up OSDs on the 2 SSDs and set pool
> > > size=1. I mapped a 200GB RBD using the kernel module ran fio on the
> > > RBD. I adjusted the number of cores, clock speed and C-states of the
> > > server and here are my
> > > results:
> > >
> > > Adjusted core number and set the processor to a set frequency using
> > > the userspace governor.
> > >
> > > 8 jobs 8 depth   Cores
> > >                   1    2     3     4     5     6     7     8
> > > Frequency  2.4  387  762  1121  1432  1657  1900  2092  2260
> > > GHz        2    386  758  1126  1428  1657  1890  2090  2232
> > >            1.6  382  756  1127  1428  1656  1894  2083  2201
> > >            1.2  385  756  1125  1431  1656  1885  2093  2244
> > >
> > 
> > I tested at QD=1 as this tends to highlight the difference in clock speed,
> > whereas a higher queue depth will probably scale with both frequency and
> > cores. I'm not sure this is your problem, but to make sure your environment
> > is doing what you want I would suggest QD=1 and 1 job to start with.
> > 
> > But thank you for sharing these results regardless of your current frequency
> > scaling issues. Information like this is really useful for people trying to decide
> > on hardware purchases. Those Atom boards look like they could support 12
> > normal HDDs quite happily, assuming 80 IOPS x 12.
> > 
> > I wonder if we can get enough data from various people to generate an
> > IOPS-per-CPU-frequency comparison for various CPU architectures?
> > 
> > 
> > > I then adjusted the processor to not go into a deeper sleep state than
> > > C1 and also tested setting the highest CPU frequency with the ondemand
> > > governor.
> > >
> > > 1 job 1 depth
> > > Cores = 1
> > >                <=C1, freq range  C0-C6, freq range  C0-C6, static freq  <=C1, static freq
> > > Frequency 2.4  381               381                379                 381
> > > GHz       2    382               380                381                 381
> > >           1.6  380               381                379                 382
> > >           1.2  383               378                379                 383
> > > Cores = 8
> > >                <=C1, freq range  C0-C6, freq range  C0-C6, static freq  <=C1, static freq
> > > Frequency 2.4  629               580                584                 629
> > > GHz       2    630               579                584                 634
> > >           1.6  630               579                584                 634
> > >           1.2  632               581                582                 634
> > >
> > > Here I see a correlation between the number of cores and C-states, but not
> > > frequency.
> > >
> > > Frequency was controlled with:
> > > cpupower frequency-set -d 1.2GHz -u 1.2GHz -g userspace
> > > and
> > > cpupower frequency-set -d 1.2GHz -u 2.0GHz -g ondemand
> > >
> > > Core count adjusted by:
> > > for i in {1..7}; do echo 0 > /sys/devices/system/cpu/cpu$i/online;
> > > done
> > >
> > > C-states controlled by:
> > > # python
> > > Python 2.7.5 (default, Jun 24 2015, 00:41:19)
> > > [GCC 4.8.3 20140911 (Red Hat 4.8.3-9)] on linux2
> > > Type "help", "copyright", "credits" or "license" for more information.
> > > >>> fd = open('/dev/cpu_dma_latency', 'wb')
> > > >>> fd.write('1')
> > > >>> fd.flush()
> > > >>> fd.close()  # Don't run this until the tests are completed (the handle has to stay open).
> > > >>>
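> > >
> > > (Roughly the same thing as a standalone sketch that holds the request open for
> > > the duration of a run, since the kernel drops it the moment the descriptor is
> > > closed; the 600-second sleep is just an example value:)
> > >
> > > import os, time
> > >
> > > # Request a max C-state exit latency of 1 us; honoured only while the fd stays open.
> > > fd = os.open("/dev/cpu_dma_latency", os.O_WRONLY)
> > > try:
> > >     os.write(fd, b"1")
> > >     time.sleep(600)  # keep the request active while the benchmark runs
> > > finally:
> > >     os.close(fd)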
> > >
> > > I'd like to replicate your results. I'd also like it if you could verify
> > > some of mine in your setup around C-states and cores.
> > 
> > I can't remember exactly, but I think I had to do something to get the
> > userspace governor to behave as I expected it to. I tend to recall setting the
> > frequency low and yet still seeing it burst up to max. I will have a look
> > through my notes tomorrow and see if I can recall anything. One thing I do
> > remember though is that the Intel powertop utility was very useful in
> > confirming what the actual CPU frequency was. It might be worth installing
> > and running this and seeing what the CPU cores are doing.
> > 
> > 
> > >
> > > Thanks,
> > >
> > >
> > >
> > > ----------------
> > > Robert LeBlanc
> > > PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> > >
> > > On Sat, Jun 13, 2015 at 8:58 AM, Nick Fisk  wrote:
> > > Hi All,
> > >
> > > I know there have been lots of discussions around needing fast CPUs to
> > > get the most out of SSDs. However, I have never really seen any
> > > solid numbers showing how much difference a faster
> > > CPU makes and whether Ceph scales linearly with clock speed. So I did a
> > > little experiment today.
> > >
> > > I set up a 1-OSD Ceph instance on a desktop PC. The desktop has an i5
> > > Sandy Bridge CPU with the turbo overclocked to 4.3GHz. By using
> > > the userspace governor in Linux, I was able to set static clock speeds
> > > to see the possible performance effects on Ceph. My PC only has an old
> > > X25-M G2 SSD, so I had to limit the IO testing to 4kB QD=1, as
> > > otherwise the SSD ran out of puff when I got to the higher clock
> > > speeds.
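> > >
> > > (For anyone repeating this, roughly the sysfs writes involved; a sketch that
> > > assumes the acpi-cpufreq driver, which exposes scaling_setspeed once the
> > > userspace governor is selected. The frequency is in kHz and 2400000 is just
> > > an example value:)
> > >
> > > import glob
> > >
> > > def set_static_freq(khz):
> > >     # Pin every CPU to one frequency via the userspace governor.
> > >     for path in glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cpufreq"):
> > >         with open(path + "/scaling_governor", "w") as f:
> > >             f.write("userspace")
> > >         with open(path + "/scaling_setspeed", "w") as f:
> > >             f.write(str(khz))
> > >
> > > set_static_freq(2400000)  # e.g. pin all cores to 2.4 GHz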
> > >
> > > CPU MHz   4kB Write IO   Min Latency (us)   Avg Latency (us)   CPU usr   CPU sys
> > > 1600      797            886                1250               10.14     2.35
> > > 2000      815            746                1222               8.45      1.82
> > > 2400      1161           630                857                9.5       1.6
> > > 2800      1227           549                812                8.74      1.24
> > > 3300      1320           482                755                7.87      1.08
> > > 4300      1548           437                644                7.72      0.9
> > >
> > > The figures show a fairly linear trend right through the clock range
> > > and clearly show the importance of having fast CPUs (GHz, not cores)
> > > if you want to achieve high IO, especially at low queue depths.
> > >
> > >
> > > Things to note:
> > > - These figures are from a desktop CPU; no doubt Xeons will be slightly
> > >   faster at the same clock speed.
> > > - I'm assuming using the userspace governor in this way is a realistic way
> > >   to simulate different CPU clock speeds?
> > > - My old SSD is probably skewing the figures slightly.
> > > - I have complete control over the turbo settings and big cooling; many
> > >   server CPUs will limit the max turbo if multiple cores are under load or
> > >   get too hot.
> > > - Ceph SSD OSD nodes are probably best with high-end E3 CPUs as they have
> > >   the highest clock speeds.
> > > - HDDs with journals will probably benefit slightly from higher clock
> > >   speeds, if the disk isn't the bottleneck (i.e. small block sequential writes).
> > > - These numbers are for Replica=1; at 2 or 3 these numbers will be at
> > >   least half, I would imagine.
> > >
> > >
> > > I hope someone finds this useful
> > >
> > > Nick
> > >
> > >
> > >
> > >
> 
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
