Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Philippe Schwarz

On 28/02/2015 12:19, mad Engineer wrote:
 Hello All,
 
 I am trying ceph-firefly 0.80.8
 (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSDs, all Samsung
 SSD 850 EVO, on 3 servers with 24 GB RAM and 16 cores @ 2.27 GHz, running
 Ubuntu 14.04 LTS with the 3.16-3 kernel. All are connected to 10G ports with
 maximum MTU. There are no extra disks for journaling and there is no
 separate network for replication and data transfer. All 3 nodes are
 also hosting a monitor process. The operating system runs on a SATA disk.
 
 When doing a sequential benchmark using dd on an RBD, mounted on the
 client as ext4, it takes 110 s to write 100 MB of data at an average
 speed of 926 kB/s.
 
 time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
 25000+0 records in
 25000+0 records out
 102400000 bytes (102 MB) copied, 110.582 s, 926 kB/s
 
 real    1m50.585s
 user    0m0.106s
 sys     0m2.233s
 
 While doing this directly on ssd mount point shows:
 
 time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
 25000+0 records in
 25000+0 records out
 102400000 bytes (102 MB) copied, 1.38567 s, 73.9 MB/s
 
 OSDs are in XFS with these extra arguments :
 
 rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M
 
 ceph.conf
 
 [global]
 fsid = 7d889081-7826-439c-9fe5-d4e57480d9be
 mon_initial_members = ceph1, ceph2, ceph3
 mon_host = 10.99.10.118,10.99.10.119,10.99.10.120
 auth_cluster_required = cephx
 auth_service_required = cephx
 auth_client_required = cephx
 filestore_xattr_use_omap = true
 osd_pool_default_size = 2
 osd_pool_default_min_size = 2
 osd_pool_default_pg_num = 450
 osd_pool_default_pgp_num = 450
 max_open_files = 131072
 
 [osd]
 osd_mkfs_type = xfs
 osd_op_threads = 8
 osd_disk_threads = 4
 osd_mount_options_xfs = rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M
 
 
 On our traditional storage with full SAS disks, the same dd completes
 in 16 s with an average write speed of ~6 MB/s.
 
 Rados bench:
 
 rados bench -p rbd 10 write
 Maintaining 16 concurrent writes of 4194304 bytes for up to 10 seconds or 0 objects
 Object prefix: benchmark_data_ceph1_2977
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
     0       0         0         0         0         0         -         0
     1      16        94        78   311.821       312  0.041228  0.140132
     2      16       192       176   351.866       392  0.106294  0.175055
     3      16       275       259   345.216       332  0.076795  0.166036
     4      16       302       286   285.912       108  0.043888  0.196419
     5      16       395       379    303.11       372  0.126033  0.207488
     6      16       501       485   323.242       424  0.125972  0.194559
     7      16       621       605   345.621       480  0.194155  0.183123
     8      16       730       714   356.903       436  0.086678  0.176099
     9      16       814       798   354.572       336  0.081567  0.174786
    10      16       832       816   326.313        72  0.037431  0.182355
    11      16       833       817   297.013         4  0.533326  0.182784
 Total time run:         11.489068
 Total writes made:      833
 Write size:             4194304
 Bandwidth (MB/sec):     290.015
 
 Stddev Bandwidth:       175.723
 Max bandwidth (MB/sec): 480
 Min bandwidth (MB/sec): 0
 Average Latency:        0.220582
 Stddev Latency:         0.343697
 Max latency:            2.85104
 Min latency:            0.035381
 
 Our ultimate aim is to replace our existing SAN with ceph, but for that
 it should meet a minimum of 8000 iops. Can anyone help me with this? The OSDs
 are SSDs, the CPUs have good clock speed, and the backend network is good, but still
 we are not able to extract the full capability of the SSD disks.
 
 
 
 Thanks,

Hi, i'm new to ceph so, don't consider my words as holy truth.

It seems that Samsung 840 (so i assume 850) are crappy for ceph :

MTBF:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-November/044258.html
Bandwidth:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-December/045247.html

And according to a confirmed user of Ceph/ProxmoX, Samsung SSDs should
be avoided if possible in ceph storage.

Apart from that, it seems there was a limitation in ceph on using
the complete bandwidth available in SSDs; but i think with less
than 1 MB/s you haven't hit this limit.

I remind you that i'm not a ceph-guru (far from that, indeed), so feel
free to disagree; i'm on the way to improve my knowledge.

Best regards.






Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Stefan Priebe - Profihost AG

 On 28.02.2015 at 12:43, Alexandre DERUMIER aderum...@odiso.com wrote:
 
 Hi,
 
 First, test if your ssd can write fast with O_DSYNC
 check this blog:
 http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
 
 
 Then, try with ceph Giant (or maybe wait for Hammer), because there are a lot
 of optimisations for ssd (threads sharding).
 
 In my last test with giant, I was able to reach around 120000 iops with
 6 OSDs on Intel S3500 SSDs, but I was cpu limited.

But this was with replication 1? I was never able to do more than 30 000 with
replication 3.

Stefan


 
 [...]


Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Alexandre DERUMIER
But this was with replication 1? I was never able to do more than 30 000 with
replication 3.

Oh, sorry, that was about read.

For write, I think I was around 30000 iops with 3 nodes (2x 4 cores @ 2.1 GHz each),
cpu bound, with replication x1.
With replication x3, around 9000 iops.


Going to test on 2x 10 cores @ 3.1 GHz in some weeks.





[...]

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread mad Engineer
thanks for that link Alexandre,
as per that link, i tried these:
 *850 EVO*
*without dsync*

 dd if=randfile of=/dev/sdb1 bs=4k count=100000 oflag=direct
100000+0 records in
100000+0 records out
409600000 bytes (410 MB) copied, 4.42913 s, 92.5 MB/s

with *dsync*:

 dd if=randfile of=/dev/sdb1 bs=4k count=100000 oflag=direct,dsync
100000+0 records in
100000+0 records out
409600000 bytes (410 MB) copied, 83.4916 s, 4.9 MB/s

*on 840 EVO*
dd if=randfile of=/dev/sdd1 bs=4k count=100000 oflag=direct
100000+0 records in
100000+0 records out
409600000 bytes (410 MB) copied, 5.11912 s, 80.0 MB/s

*with dsync*
 dd if=randfile of=/dev/sdd1 bs=4k count=100000 oflag=direct,dsync
100000+0 records in
100000+0 records out
409600000 bytes (410 MB) copied, 196.738 s, 2.1 MB/s

So with dsync there is a significant reduction in performance; it looks like the 850
is better than the 840. Can this be the reason for the reduced write speed of
926 kB/s?
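
To put rough numbers on that, using the dd figures above:

926 kB/s / 4 kB per write  ~ 230 writes/s  ~ 4.3 ms per 4k write through the whole RBD path
4.9 MB/s / 4 kB per write  ~ 1200 writes/s ~ 0.8 ms per direct+dsync 4k write on the 850 EVO

So a queue-depth-1 dd is bound by per-write latency, and the SSD's dsync behaviour
accounts for only part of that latency; the rest is the ceph write path (network,
replication, journal + data writes on the same disk).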

Also, before trying on physical servers i ran ceph on vmware vms with SAS
disks using giant 0.87; at that time firefly 0.80.8 was giving higher
numbers, so i decided to use firefly.

On Sat, Feb 28, 2015 at 5:13 PM, Alexandre DERUMIER aderum...@odiso.com
wrote:

 Hi,

 First, test if your ssd can write fast with O_DSYNC
 check this blog:

 http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/


 Then, try with ceph Giant (or maybe wait for Hammer), because there are a
 lot of optimisations for ssd (threads sharding).

 In my last test with giant, I was able to reach around 120000 iops with
 6 OSDs on Intel S3500 SSDs, but I was cpu limited.

 [...]

[ceph-users] Mail not reaching the list?

2015-02-28 Thread Tony Harris
Hi, I've sent a couple of emails to the list since subscribing, but I've never
seen them reach the list; I was just wondering if there was something wrong?


[ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread mad Engineer
Hello All,

I am trying ceph-firefly 0.80.8
(69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSDs, all Samsung SSD
850 EVO, on 3 servers with 24 GB RAM and 16 cores @ 2.27 GHz, running Ubuntu 14.04 LTS
with the 3.16-3 kernel. All are connected to 10G ports with maximum
MTU. There are no extra disks for journaling and there is no
separate network for replication and data transfer. All 3 nodes are
also hosting a monitor process. The operating system runs on a SATA disk.

When doing a sequential benchmark using dd on an RBD, mounted on the client
as ext4, it takes 110 s to write 100 MB of data at an average speed of
926 kB/s.

  time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
  25000+0 records in
  25000+0 records out
  102400000 bytes (102 MB) copied, 110.582 s, 926 kB/s

  real    1m50.585s
  user    0m0.106s
  sys     0m2.233s

While doing this directly on ssd mount point shows:

  time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
  25000+0 records in
  25000+0 records out
  102400000 bytes (102 MB) copied, 1.38567 s, 73.9 MB/s

OSDs are in XFS with these extra arguments :

rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M

ceph.conf

[global]
fsid = 7d889081-7826-439c-9fe5-d4e57480d9be
mon_initial_members = ceph1, ceph2, ceph3
mon_host = 10.99.10.118,10.99.10.119,10.99.10.120
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_pool_default_size = 2
osd_pool_default_min_size = 2
osd_pool_default_pg_num = 450
osd_pool_default_pgp_num = 450
max_open_files = 131072

[osd]
osd_mkfs_type = xfs
osd_op_threads = 8
osd_disk_threads = 4
osd_mount_options_xfs = rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M


On our traditional storage with full SAS disks, the same dd completes in
16 s with an average write speed of ~6 MB/s.

Rados bench:

rados bench -p rbd 10 write
 Maintaining 16 concurrent writes of 4194304 bytes for up to 10
seconds or 0 objects
 Object prefix: benchmark_data_ceph1_2977
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
     0       0         0         0         0         0         -         0
     1      16        94        78   311.821       312  0.041228  0.140132
     2      16       192       176   351.866       392  0.106294  0.175055
     3      16       275       259   345.216       332  0.076795  0.166036
     4      16       302       286   285.912       108  0.043888  0.196419
     5      16       395       379    303.11       372  0.126033  0.207488
     6      16       501       485   323.242       424  0.125972  0.194559
     7      16       621       605   345.621       480  0.194155  0.183123
     8      16       730       714   356.903       436  0.086678  0.176099
     9      16       814       798   354.572       336  0.081567  0.174786
    10      16       832       816   326.313        72  0.037431  0.182355
    11      16       833       817   297.013         4  0.533326  0.182784
 Total time run:         11.489068
Total writes made:      833
Write size:             4194304
Bandwidth (MB/sec):     290.015

Stddev Bandwidth:       175.723
Max bandwidth (MB/sec): 480
Min bandwidth (MB/sec): 0
Average Latency:        0.220582
Stddev Latency:         0.343697
Max latency:            2.85104
Min latency:            0.035381

Our ultimate aim is to replace our existing SAN with ceph, but for that it
should meet a minimum of 8000 iops. Can anyone help me with this? The OSDs are
SSDs, the CPUs have good clock speed, and the backend network is good, but still we are
not able to extract the full capability of the SSD disks.



Thanks,


Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Alexandre DERUMIER
As an optimisation,

try to set the ioscheduler to noop,

and also enable rbd_cache=true. (It really helps for sequential writes.)
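
For example (sdX is a placeholder for an OSD data disk; rbd cache only applies to
librbd clients such as QEMU, not to the kernel RBD client):

# on each OSD node, per data disk
echo noop > /sys/block/sdX/queue/scheduler

# in ceph.conf on the client side
[client]
rbd cache = true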

but your results seem quite low: 926 kB/s with 4k is only about 230 IO/s.

check if you don't have any big network latencies, or MTU fragmentation
problems.
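
One quick way to check that the full MTU really makes it end-to-end (assuming a
9000-byte MTU; the hostname is a placeholder):

ping -M do -s 8972 ceph2
# 8972 = 9000 - 20 (IP header) - 8 (ICMP header); if this fails while smaller
# sizes work, the path is not actually carrying jumbo frames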

Maybe also try to bench with fio, with more parallel jobs.
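
A sketch of such a fio run against a file on the mounted RBD (path, job count and
runtime are arbitrary examples):

fio --name=rbd-4k-write --filename=/mnt/rbd/fio.test --size=1G \
    --rw=randwrite --bs=4k --direct=1 --ioengine=libaio \
    --iodepth=32 --numjobs=4 --runtime=60 --time_based --group_reporting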




[...]

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Alexandre DERUMIER
Hi,

First, test if your ssd can write fast with O_DSYNC
check this blog:
http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
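
The test from that post boils down to a dd like this (sdX is a placeholder; it
writes to the raw device, so only point it at a disk or partition you can wipe):

dd if=/dev/zero of=/dev/sdX bs=4k count=100000 oflag=direct,dsync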


Then, try with ceph Giant (or maybe wait for Hammer), because there are a lot of
optimisations for ssd (threads sharding).

In my last test with giant, I was able to reach around 120000 iops with
6 OSDs on Intel S3500 SSDs, but I was cpu limited.

[...]


Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Andrei Mikhailovsky
Martin, 

I have been using Samsung 840 Pro for journals for about 2 years now and have just
replaced all my samsung drives with Intel. We have found a lot of performance
issues with the 840 Pro (we are using the 128GB model). In particular, a very strange
behaviour with using 4 partitions (with 50% underprovisioning left as empty
unpartitioned space on the drive) where the drive would grind to almost a halt
after a few weeks of use. I was getting 100% utilisation on the drives while doing
just 3-4 MB/s of writes. This was not the case when I installed the new drives.
Manual trimming helps for a few weeks until the same happens again.
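
For reference, manual trimming here would be an fstrim run against each OSD
filesystem, e.g. (the mount point is just an example path):

fstrim -v /var/lib/ceph/osd/ceph-0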

This has been happening with all 840 Pro ssds that we have and contacting 
Samsung Support has proven to be utterly useless. They do not want to speak 
with you until you install windows and run their monkey utility ((. 

Also, i've noticed the latencies of the Samsung 840 Pro ssd drives to be about
15-20x slower compared with consumer grade Intel drives, like the Intel 520.
According to ceph osd perf, I would consistently get higher figures on the osds
with a Samsung journal drive compared with the Intel drive on the same server.
Something like 2-3 ms for Intel vs 40-50 ms for the Samsungs.
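
The command meant here is ceph osd perf, which prints the commit and apply latency
(in ms) for every OSD and makes exactly this kind of per-OSD comparison:

ceph osd perf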

At some point we had enough with Samsungs and scrapped them. 

Andrei 

- Original Message -

 From: Martin B Nielsen mar...@unity3d.com
 To: Philippe Schwarz p...@schwarz-fr.net
 Cc: ceph-users@lists.ceph.com
 Sent: Saturday, 28 February, 2015 11:51:57 AM
 Subject: Re: [ceph-users] Extreme slowness in SSD cluster with 3
 nodes and 9 OSD with 3.16-3 kernel

 Hi,

 I cannot recognize that picture; we've been using samsung 840 pro in
 production for almost 2 years now - and have had 1 fail.

 We run a 8node mixed ssd/platter cluster with 4x samsung 840 pro
 (500gb) in each so that is 32x ssd.

 They've written ~25TB data in avg each.

 Using the dd you had inside an existing semi-busy mysql-guest I get:

 102400000 bytes (102 MB) copied, 5.58218 s, 18.3 MB/s

 Which is still not a lot, but I think it is more a limitation of our
 setup/load.

 We are using dumpling.

 All that aside, I would prob. go with something tried and tested if I
 was to redo it today - we haven't had any issues, but it is still
 nice to use something you know should have a baseline performance
 and can compare to that.

 Cheers,
 Martin

 [...]

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Martin B Nielsen
Hi Andrei,

If there is one thing I've come to understand by now, it is that ceph configs,
performance, hw and, well, everything seem to vary on an almost per-person
basis.

I do not recognize that latency issue either, this is from one of our nodes
(4x 500GB samsung 840 pro - sd[c-f]) which has been running for 600+ days
(so the iostat -x is an avg of that):

# uptime
 16:24:57 up 611 days,  4:03,  1 user,  load average: 1.18, 1.55, 1.72

# iostat -x
[ ... ]
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.16    4.87   22.62   344.18   458.65    58.41     0.05    1.92    0.45    2.24   0.76   2.10
sdd               0.00     0.12    4.37   20.02   317.98   437.95    61.98     0.05    1.90    0.44    2.21   0.78   1.91
sde               0.00     0.12    4.17   19.33   302.45   403.02    60.02     0.04    1.87    0.43    2.18   0.77   1.80
sdf               0.00     0.12    4.51   20.84   322.84   439.70    60.17     0.05    1.84    0.43    2.15   0.76   1.93
[ ... ]

Granted, we do not have very high usage on this cluster on an ssd-basis and
it might change as we put more load on it, but we will deal with it then. I
do not think ~2 ms access time is either particularly good or bad.

This is from another cluster we operate - this one has an intel DC S3700
800gb ssd (sdb)
# uptime
 09:37:26 up 654 days,  8:40,  1 user,  load average: 0.33, 0.40, 0.54

# iostat -x
[ ... ]
sdb               0.01     1.49   39.76   86.79  1252.80  2096.98    52.94     0.02    0.76    1.22    0.54   0.41   5.21
[ ... ]

It is misleading as the latter just has 3 disks + a hardware RAID controller with
a 1 GB cache, whereas the first is a 'cheap' dumb 12-disk JBOD (IT-mode HBA)
setup.

All the ssd from both clusters have 3 partitions - 1 ceph-data and 2
journal partitions (1 journal for the ssd itself and 1 journal for 1
platter disk).

The intel ssd is very sturdy though - it has had a 2.1MB/sec avg. write
over 654 days - that is somewhere around 120TB so far.
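
As a quick sanity check on that number: 2.1 MB/s x 86,400 s/day x 654 days is
roughly 119 TB, so the ~120 TB estimate holds up.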

But ultimately it boils down to what you need - in our usecase the latter
cluster has to be rock-stable and performant - and we chose the intel ones
based on that. The first one we don't really care if we lose a node or two
and we replace disks every month or whenever it fits into our
going-to-datacenter-schedule - we wanted an ok'ish performing cluster and
focused more on total space / price than highperforming hardware. The
fantastic thing is we are not locked into any specific hardware and we can
replace any of it if we need to and/or find it is suddenly starting to have
issues.

Cheers,
Martin



On Sat, Feb 28, 2015 at 2:55 PM, Andrei Mikhailovsky and...@arhont.com
wrote:


 [...]

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Somnath Roy
I would say check with a rados tool like ceph_smalliobench/rados bench first to
see how much performance these tools are reporting. This will help you to
isolate any upstream issues.
Also, check with 'iostat -xk 1' for the resource utilization. Hope you are
running with a powerful enough cpu complex, since you are saying the network is
not a bottleneck.
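
For example, while the benchmark is running:

iostat -xk 1
# watch await and %util on the SSD data disks: saturated disks point at the
# drives/journals, idle disks point further up the stack (network, CPU, ceph itself)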

Thanks & Regards
Somnath

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of mad 
Engineer
Sent: Saturday, February 28, 2015 12:29 PM
To: Alexandre DERUMIER
Cc: ceph-users
Subject: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 
OSD with 3.16-3 kernel

reinstalled ceph packages and now with the memstore backend [osd objectstore =
memstore] it's giving 400 kB/s. No idea where the problem is.

On Sun, Mar 1, 2015 at 12:30 AM, mad Engineer themadengin...@gmail.com wrote:
tried changing the scheduler from deadline to noop, also upgraded to Giant and tried
the btrfs filesystem, and downgraded the kernel to 3.16 from 3.16-3 - not much difference

dd if=/dev/zero of=hi bs=4k count=25000 oflag=direct
25000+0 records in
25000+0 records out
102400000 bytes (102 MB) copied, 94.691 s, 1.1 MB/s

Earlier on a vmware setup i was getting ~850 kB/s and now even on a physical
server with SSD drives it's just over 1 MB/s. I suspect some serious configuration
issue.

Tried iperf between the 3 servers, all are showing 9 Gbps; tried icmp with different
packet sizes, no fragmentation.

i also noticed that out of the 9 osds, 5 are 850 EVO and 4 are 840 EVO. I believe this
will not cause this much drop in performance.

Thanks for any help


[...]

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread mad Engineer
Thanks for the reply Philippe; we were using these disks in our NAS, and now
it looks like i am in big trouble :-(

On Sat, Feb 28, 2015 at 5:02 PM, Philippe Schwarz p...@schwarz-fr.net wrote:
 [...]


Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Somnath Roy
Sorry, I saw you have already tried with ‘rados bench’. So, some points here.

1. If you are considering a write workload, I think with a total of 2 copies and
a 4K workload, you should be able to get ~4K iops (considering it hitting
the disk, not with memstore).

2. You have 9 OSDs, and if you created only one pool with only 450 PGs,
you should try to increase that and see if you get any improvement or not.

3. Also, you ran rados bench with a very low queue depth; try increasing that,
maybe to 32/64 (see the example after this list for both points).

4. If you are running firefly, other optimizations won't work here. But you can
add the following to your ceph.conf file and it should give you some boost.

debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_filer = 0/0
debug_objecter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_journaler = 0/0
debug_objectcacher = 0/0
debug_client = 0/0
debug_osd = 0/0
debug_optracker = 0/0
debug_objclass = 0/0
debug_filestore = 0/0
debug_journal = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0
debug_mon = 0/0
debug_paxos = 0/0
debug_rgw = 0/0

5. Give us the ceph -s output and the iostat output while IO is going on.
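
For points 2 and 3, that could look like the following (the pg_num value and
queue depth are only examples; pg_num can only be increased, and pgp_num should
be raised to match):

ceph osd pool set rbd pg_num 512
ceph osd pool set rbd pgp_num 512
rados bench -p rbd 60 write -t 64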

Thanks & Regards
Somnath



[...]
Re: [ceph-users] Shutting down a cluster fully and powering it back up

2015-02-28 Thread Gregory Farnum
Sounds good!
-Greg
On Sat, Feb 28, 2015 at 10:55 AM David da...@visions.se wrote:

 Hi!

 I’m about to do maintenance on a Ceph Cluster, where we need to shut it
 all down fully.
 We’re currently only using it for rados block devices to KVM Hypervizors.

 Are these steps sane?

 Shutting it down

 1. Shut down all IO to the cluster. Means turning off all clients (KVM
 Hypervizors in our case).
 2. Set cluster to noout by running: ceph osd set noout
 3. Shut down the MON nodes.
 4. Shut down the OSD nodes.

 Starting it up

 1. Start the OSD nodes.
 2. Start the MON nodes.
 3. Check ceph -w to see the status of ceph and take actions if something
 is wrong.
 4. Start up the clients (KVM Hypervizors)
 5. Run ceph osd unset noout

 Kind Regards,
 David
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Stefan Priebe


Am 28.02.2015 um 19:41 schrieb Kevin Walker:

What about the Samsung 845DC Pro SSD's?

These have fantastic enterprise performance characteristics.

http://www.thessdreview.com/our-reviews/samsung-845dc-pro-review-800gb-class-leading-speed-endurance/


Or use the SV843 from Samsung Semiconductor (separate Samsung company).

Stefan


Kind regards

Kevin

On 28 February 2015 at 15:32, Philippe Schwarz p...@schwarz-fr.net
mailto:p...@schwarz-fr.net wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Le 28/02/2015 12:19, mad Engineer a écrit :
  Hello All,
 
  I am trying ceph-firefly 0.80.8
  (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all Samsung
  SSD 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu
  14.04 LTS with 3.16-3 kernel.All are connected to 10G ports with
  maximum MTU.There are no extra disks for journaling and also there
  are no separate network for replication and data transfer.All 3
  nodes are also hosting monitoring process.Operating system runs on
  SATA disk.
 
  When doing a sequential benchmark using dd on RBD, mounted on
  client as ext4 its taking 110s to write 100Mb data at an average
  speed of 926Kbps.
 
  time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
  25000+0 records in 25000+0 records out 10240 bytes (102 MB)
  copied, 110.582 s, 926 kB/s
 
  real1m50.585s user0m0.106s sys 0m2.233s
 
  While doing this directly on ssd mount point shows:
 
  time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
  25000+0 records in 25000+0 records out 10240 bytes (102 MB)
  copied, 1.38567 s, 73.9 MB/s
 
  OSDs are in XFS with these extra arguments :
 
  rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M
 
  ceph.conf
 
  [global] fsid = 7d889081-7826-439c-9fe5-d4e57480d9be
  mon_initial_members = ceph1, ceph2, ceph3 mon_host =
  10.99.10.118,10.99.10.119,10.99.10.120 auth_cluster_required =
  cephx auth_service_required = cephx auth_client_required = cephx
  filestore_xattr_use_omap = true osd_pool_default_size = 2
  osd_pool_default_min_size = 2 osd_pool_default_pg_num = 450
  osd_pool_default_pgp_num = 450 max_open_files = 131072
 
  [osd] osd_mkfs_type = xfs osd_op_threads = 8 osd_disk_threads = 4
  osd_mount_options_xfs =
  rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M
 
 
  on our traditional storage with Full SAS disk, same dd completes
  in 16s with an average write speed of 6Mbps.
 
  Rados bench:
 
  rados bench -p rbd 10 write Maintaining 16 concurrent writes of
  4194304 bytes for up to 10 seconds or 0 objects Object prefix:
  benchmark_data_ceph1_2977 sec Cur ops   started  finished  avg MB/s
  cur MB/s  last lat   avg lat 0   0 0 0
  0 0 - 0 1  169478
  311.821   312  0.041228  0.140132 2  16   192   176
  351.866   392  0.106294  0.175055 3  16   275   259
  345.216   332  0.076795  0.166036 4  16   302   286
  285.912   108  0.043888  0.196419 5  16   395   379
  303.11   372  0.126033  0.207488 6  16   501   485
  323.242   424  0.125972  0.194559 7  16   621   605
  345.621   480  0.194155  0.183123 8  16   730   714
  356.903   436  0.086678  0.176099 9  16   814   798
  354.572   336  0.081567  0.174786 10  16   832
  816   326.31372  0.037431  0.182355 11  16   833
  817   297.013 4  0.533326  0.182784 Total time run:
  11.489068 Total writes made:  833 Write size:
  4194304 Bandwidth (MB/sec): 290.015
 
  Stddev Bandwidth:   175.723 Max bandwidth (MB/sec): 480 Min
  bandwidth (MB/sec): 0 Average Latency:0.220582 Stddev
  Latency: 0.343697 Max latency:2.85104 Min
  latency:0.035381
 
  Our ultimate aim is to replace existing SAN with ceph,but for that
  it should meet minimum 8000 iops.Can any one help me with this,OSD
  are SSD,CPU has good clock speed,backend network is good but still
  we are not able to extract full capability of SSD disks.
 
 
 
  Thanks,

Hi, i'm new to ceph so, don't consider my words as holy truth.

It seems that Samsung 840 (so i assume 850) are crappy for ceph :

MTBF :

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-November/044258.html
Bandwidth

:http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-December/045247.html

And according to a confirmed user of Ceph/ProxmoX, Samsung SSDs should
be avoided if possible in ceph storage.

Apart from that, it seems there was an limitation in ceph for the use
of 

[ceph-users] RGW hammer/master woes

2015-02-28 Thread Pavan Rallabhandi
I am struggling to get a basic PUT through via the swift client, with RGW and Ceph
binaries built out of the Hammer/master codebase, whereas the same command on the
same setup goes through with RGW and Ceph binaries built out of Giant.

Find below the RGW log snippet and the command that was run. Am I missing anything
obvious here?

The user info looks like this:

{ "user_id": "johndoe",
  "display_name": "John Doe",
  "email": "j...@example.com",
  "suspended": 0,
  "max_buckets": 1000,
  "auid": 0,
  "subusers": [
        { "id": "johndoe:swift",
          "permissions": "full-control"}],
  "keys": [
        { "user": "johndoe",
          "access_key": "7B39L2TUQ448LZW4RI3M",
          "secret_key": "lshKCoacSlbyVc7mBLLr4cJ26fEEM22Tcmp29hT3"},
        { "user": "johndoe:swift",
          "access_key": "SHZ64EF7CIB4V42I14AH",
          "secret_key": ""}],
  "swift_keys": [
        { "user": "johndoe:swift",
          "secret_key": "asdf"}],
  "caps": [],
  "op_mask": "read, write, delete",
  "default_placement": "",
  "placement_tags": [],
  "bucket_quota": { "enabled": false,
      "max_size_kb": -1,
      "max_objects": -1},
  "user_quota": { "enabled": false,
      "max_size_kb": -1,
      "max_objects": -1},
  "temp_url_keys": []}


The command that was run and the logs:

snip

swift -A http://localhost:8989/auth -U johndoe:swift -K asdf upload mycontainer 
ceph

2015-02-28 23:28:39.272897 7fb610ff9700  1 == starting new request 
req=0x7fb5f0009990 =
2015-02-28 23:28:39.272913 7fb610ff9700  2 req 0:0.16::PUT 
/swift/v1/mycontainer/ceph::initializing
2015-02-28 23:28:39.272918 7fb610ff9700 10 host=localhost:8989
2015-02-28 23:28:39.272921 7fb610ff9700 20 subdomain= domain= in_hosted_domain=0
2015-02-28 23:28:39.272938 7fb610ff9700 10 meta HTTP_X_OBJECT_META_MTIME
2015-02-28 23:28:39.272945 7fb610ff9700 10 x 
x-amz-meta-mtime:1425140933.648506
2015-02-28 23:28:39.272964 7fb610ff9700 10 ver=v1 first=mycontainer req=ceph
2015-02-28 23:28:39.272971 7fb610ff9700 10 s-object=ceph s-bucket=mycontainer
2015-02-28 23:28:39.272976 7fb610ff9700  2 req 0:0.79:swift:PUT 
/swift/v1/mycontainer/ceph::getting op
2015-02-28 23:28:39.272982 7fb610ff9700  2 req 0:0.85:swift:PUT 
/swift/v1/mycontainer/ceph:put_obj:authorizing
2015-02-28 23:28:39.273008 7fb610ff9700 10 swift_user=johndoe:swift
2015-02-28 23:28:39.273026 7fb610ff9700 20 build_token 
token=0d006a6f686e646f653a73776966744436beb90402b13c4f53f35472c2cf0f
2015-02-28 23:28:39.273057 7fb610ff9700  2 req 0:0.000160:swift:PUT 
/swift/v1/mycontainer/ceph:put_obj:reading permissions
2015-02-28 23:28:39.273100 7fb610ff9700 15 Read 
AccessControlPolicy<AccessControlPolicy 
xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>johndoe</ID><DisplayName>John 
Doe</DisplayName></Owner><AccessControlList><Grant><Grantee 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:type="CanonicalUser"><ID>johndoe</ID><DisplayName>John 
Doe</DisplayName></Grantee><Permission>FULL_CONTROL</Permission></Grant></AccessControlList></AccessControlPolicy>
2015-02-28 23:28:39.273114 7fb610ff9700  2 req 0:0.000216:swift:PUT 
/swift/v1/mycontainer/ceph:put_obj:init op
2015-02-28 23:28:39.273120 7fb610ff9700  2 req 0:0.000223:swift:PUT 
/swift/v1/mycontainer/ceph:put_obj:verifying op mask
2015-02-28 23:28:39.273123 7fb610ff9700 20 required_mask= 2 user.op_mask=7
2015-02-28 23:28:39.273125 7fb610ff9700  2 req 0:0.000228:swift:PUT 
/swift/v1/mycontainer/ceph:put_obj:verifying op permissions
2015-02-28 23:28:39.273129 7fb610ff9700  5 Searching permissions for 
uid=johndoe mask=50
2015-02-28 23:28:39.273131 7fb610ff9700  5 Found permission: 15
2015-02-28 23:28:39.273133 7fb610ff9700  5 Searching permissions for group=1 
mask=50
2015-02-28 23:28:39.273135 7fb610ff9700  5 Permissions for group not found
2015-02-28 23:28:39.273136 7fb610ff9700  5 Searching permissions for group=2 
mask=50
2015-02-28 23:28:39.273137 7fb610ff9700  5 Permissions for group not found
2015-02-28 23:28:39.273138 7fb610ff9700  5 Getting permissions id=johndoe 
owner=johndoe perm=2
2015-02-28 23:28:39.273140 7fb610ff9700 10  uid=johndoe requested perm 
(type)=2, policy perm=2, user_perm_mask=2, acl perm=2
2015-02-28 23:28:39.273143 7fb610ff9700  2 req 0:0.000246:swift:PUT 
/swift/v1/mycontainer/ceph:put_obj:verifying op params
2015-02-28 23:28:39.273146 7fb610ff9700  2 req 0:0.000249:swift:PUT 
/swift/v1/mycontainer/ceph:put_obj:executing
2015-02-28 23:28:39.273279 7fb610ff9700 10 x 
x-amz-meta-mtime:1425140933.648506
2015-02-28 23:28:39.273313 7fb610ff9700 20 get_obj_state: rctx=0x7fb610ff41f0 
obj=mycontainer:ceph state=0x7fb5f0016940 s->prefetch_data=0
2015-02-28 23:28:39.274354 7fb610ff9700 20 get_obj_state: rctx=0x7fb610ff41f0 
obj=mycontainer:ceph state=0x7fb5f0016940 s->prefetch_data=0
2015-02-28 23:28:39.274394 7fb610ff9700 10 setting object 
write_tag=default.14199.0
2015-02-28 23:28:39.274554 7fb610ff9700 20 reading from 
.rgw:.bucket.meta.mycontainer:default.14199.3
2015-02-28 23:28:39.274574 7fb610ff9700 20 get_obj_state: rctx=0x7fb610ff2ef0 
obj=.rgw:.bucket.meta.mycontainer:default.14199.3 state=0x7fb5f001db30 

Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

2015-02-28 Thread Chris Murray
After noticing that the number increases by 101 on each attempt to start
osd.11, I figured I was only 7 iterations away from the output being
within 101 of 63675. So, I killed the osd process, started it again,
lather, rinse, repeat. I then did the same for other OSDs. Some created
very small logs, and some created logs into the gigabytes. Grepping the
latter for update_osd_stat showed me where the maps were up to, and
therefore which OSDs needed some special attention. Some of the epoch
numbers appeared to increase by themselves to a point and then plateau,
after which I'd kill and then start the osd again, and this number would
start to increase again.
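In case it helps anyone else, the check was essentially just this (osd id and log path
as examples):

grep update_osd_stat /var/log/ceph/ceph-osd.11.log | tail -1
# the 'osd.11 <epoch> update_osd_stat ...' lines show how far the osdmap has caught up,
# which can then be compared against the target epoch (63675 here)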

After all either showed 63675, or nothing at all, I turned debugging
back off, deleted logs, and tried to bring the cluster back by unsetting
noup, nobackfill, norecovery etc. It hasn't got very far before
appearing stuck again, with nothing progressing in ceph status. It
appears that 11/15 OSDs are now properly up, but four still aren't. A
lot of placement groups are stale, so I guess I really need the
remaining four to come up.

The OSDs in question are 1, 7, 10 & 12. All have a line similar to this
as the last in their log:

2015-02-28 10:35:04.240822 7f375ef40780  1 journal _open
/var/lib/ceph/osd/ceph-1/journal fd 21: 5367660544 bytes, block size
4096 bytes, directio = 1, aio = 1

Even with the following in ceph.conf, I'm not seeing anything after that
last line in the log.

 debug osd = 20
 debug filestore = 1

CPU is still being consumed by the ceph-osd process though, but not much
memory is being used compared to the other two OSDs which are up on that
node.

Is there perhaps even further logging that I can use to see why the logs
aren't progressing past this point? 
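For example, would bumping these any higher give more detail? Something along these
lines (values are just a guess on my part):

 debug osd = 20
 debug filestore = 20
 debug journal = 20
 debug ms = 1

Or injected at runtime via the admin socket, e.g.:

ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok config set debug_journal 20/20
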
Osd.1 is on /dev/sdb. iostat still shows some activity as the minutes go
on, but not much:

(60 second intervals)
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sdb   5.45 0.00   807.33  0  48440
sdb   5.75 0.00   807.33  0  48440
sdb   5.43 0.00   807.20  0  48440

Thanks,
Chris

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Chris Murray
Sent: 27 February 2015 10:32
To: Gregory Farnum
Cc: ceph-users
Subject: Re: [ceph-users] More than 50% osds down, CPUs still busy;will
the cluster recover without help?

A little further logging:

2015-02-27 10:27:15.745585 7fe8e3f2f700 20 osd.11 62839 update_osd_stat
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
[])
2015-02-27 10:27:15.745619 7fe8e3f2f700  5 osd.11 62839 heartbeat:
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
[])
2015-02-27 10:27:23.530913 7fe8e8536700  1 -- 192.168.12.25:6800/673078
-- 192.168.12.25:6789/0 -- mon_subscribe({monmap=6+,osd_pg_creates=0})
v2 -- ?+0 0xe5f26380 con 0xe1f0cc60
2015-02-27 10:27:30.645902 7fe8e3f2f700 20 osd.11 62839 update_osd_stat
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
[])
2015-02-27 10:27:30.645938 7fe8e3f2f700  5 osd.11 62839 heartbeat:
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
[])
2015-02-27 10:27:33.531142 7fe8e8536700  1 -- 192.168.12.25:6800/673078
-- 192.168.12.25:6789/0 -- mon_subscribe({monmap=6+,osd_pg_creates=0})
v2 -- ?+0 0xe5f26540 con 0xe1f0cc60
2015-02-27 10:27:43.531333 7fe8e8536700  1 -- 192.168.12.25:6800/673078
-- 192.168.12.25:6789/0 -- mon_subscribe({monmap=6+,osd_pg_creates=0})
v2 -- ?+0 0xe5f26700 con 0xe1f0cc60
2015-02-27 10:27:45.546275 7fe8e3f2f700 20 osd.11 62839 update_osd_stat
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
[])
2015-02-27 10:27:45.546311 7fe8e3f2f700  5 osd.11 62839 heartbeat:
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
[])
2015-02-27 10:27:53.531564 7fe8e8536700  1 -- 192.168.12.25:6800/673078
-- 192.168.12.25:6789/0 -- mon_subscribe({monmap=6+,osd_pg_creates=0})
v2 -- ?+0 0xe5f268c0 con 0xe1f0cc60
2015-02-27 10:27:56.846593 7fe8e3f2f700 20 osd.11 62839 update_osd_stat
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
[])
2015-02-27 10:27:56.846627 7fe8e3f2f700  5 osd.11 62839 heartbeat:
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
[])
2015-02-27 10:27:57.346965 7fe8e3f2f700 20 osd.11 62839 update_osd_stat
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
[])
2015-02-27 10:27:57.347001 7fe8e3f2f700  5 osd.11 62839 heartbeat:
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
[])
2015-02-27 10:28:03.531785 7fe8e8536700  1 -- 192.168.12.25:6800/673078
-- 192.168.12.25:6789/0 -- mon_subscribe({monmap=6+,osd_pg_creates=0})
v2 -- ?+0 0xe5f26a80 con 0xe1f0cc60
2015-02-27 10:28:13.532027 7fe8e8536700  1 -- 192.168.12.25:6800/673078
-- 192.168.12.25:6789/0 -- mon_subscribe({monmap=6+,osd_pg_creates=0})
v2 

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Kevin Walker
What about the Samsung 845DC Pro SSD's?

These have fantastic enterprise performance characteristics.

http://www.thessdreview.com/our-reviews/samsung-845dc-pro-review-800gb-class-leading-speed-endurance/

Kind regards

Kevin

On 28 February 2015 at 15:32, Philippe Schwarz p...@schwarz-fr.net wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Le 28/02/2015 12:19, mad Engineer a écrit :
  Hello All,
 
  I am trying ceph-firefly 0.80.8
  (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all Samsung
  SSD 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu
  14.04 LTS with 3.16-3 kernel.All are connected to 10G ports with
  maximum MTU.There are no extra disks for journaling and also there
  are no separate network for replication and data transfer.All 3
  nodes are also hosting monitoring process.Operating system runs on
  SATA disk.
 
  When doing a sequential benchmark using dd on RBD, mounted on
  client as ext4 its taking 110s to write 100Mb data at an average
  speed of 926Kbps.
 
  time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
  25000+0 records in 25000+0 records out 10240 bytes (102 MB)
  copied, 110.582 s, 926 kB/s
 
  real1m50.585s user0m0.106s sys 0m2.233s
 
  While doing this directly on ssd mount point shows:
 
  time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
  25000+0 records in 25000+0 records out 10240 bytes (102 MB)
  copied, 1.38567 s, 73.9 MB/s
 
  OSDs are in XFS with these extra arguments :
 
  rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M
 
  ceph.conf
 
  [global] fsid = 7d889081-7826-439c-9fe5-d4e57480d9be
  mon_initial_members = ceph1, ceph2, ceph3 mon_host =
  10.99.10.118,10.99.10.119,10.99.10.120 auth_cluster_required =
  cephx auth_service_required = cephx auth_client_required = cephx
  filestore_xattr_use_omap = true osd_pool_default_size = 2
  osd_pool_default_min_size = 2 osd_pool_default_pg_num = 450
  osd_pool_default_pgp_num = 450 max_open_files = 131072
 
  [osd] osd_mkfs_type = xfs osd_op_threads = 8 osd_disk_threads = 4
  osd_mount_options_xfs =
  rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M
 
 
  on our traditional storage with Full SAS disk, same dd completes
  in 16s with an average write speed of 6Mbps.
 
  Rados bench:
 
  rados bench -p rbd 10 write Maintaining 16 concurrent writes of
  4194304 bytes for up to 10 seconds or 0 objects Object prefix:
  benchmark_data_ceph1_2977 sec Cur ops   started  finished  avg MB/s
  cur MB/s  last lat   avg lat 0   0 0 0
  0 0 - 0 1  169478
  311.821   312  0.041228  0.140132 2  16   192   176
  351.866   392  0.106294  0.175055 3  16   275   259
  345.216   332  0.076795  0.166036 4  16   302   286
  285.912   108  0.043888  0.196419 5  16   395   379
  303.11   372  0.126033  0.207488 6  16   501   485
  323.242   424  0.125972  0.194559 7  16   621   605
  345.621   480  0.194155  0.183123 8  16   730   714
  356.903   436  0.086678  0.176099 9  16   814   798
  354.572   336  0.081567  0.174786 10  16   832
  816   326.31372  0.037431  0.182355 11  16   833
  817   297.013 4  0.533326  0.182784 Total time run:
  11.489068 Total writes made:  833 Write size:
  4194304 Bandwidth (MB/sec): 290.015
 
  Stddev Bandwidth:   175.723 Max bandwidth (MB/sec): 480 Min
  bandwidth (MB/sec): 0 Average Latency:0.220582 Stddev
  Latency: 0.343697 Max latency:2.85104 Min
  latency:0.035381
 
  Our ultimate aim is to replace existing SAN with ceph,but for that
  it should meet minimum 8000 iops.Can any one help me with this,OSD
  are SSD,CPU has good clock speed,backend network is good but still
  we are not able to extract full capability of SSD disks.
 
 
 
  Thanks,

 Hi, i'm new to ceph so, don't consider my words as holy truth.

 It seems that Samsung 840 (so i assume 850) are crappy for ceph :

 MTBF :

 http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-November/044258.html
 Bandwidth
 :
 http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-December/045247.html

 And according to a confirmed user of Ceph/ProxmoX, Samsung SSDs should
 be avoided if possible in ceph storage.

 Apart from that, it seems there was an limitation in ceph for the use
 of the complete bandwidth available in SSDs; but i think with less
 than 1Mb/s you haven't hit this limit.

 I remind you that i'm not a ceph-guru (far from that, indeed), so feel
 free to disagree; i'm on the way to improve my knowledge.

 Best regards.




 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1

 iEYEARECAAYFAlTxp0UACgkQlhqCFkbqHRb5+wCgrXCM3VsnVE6PCbbpOmQXCXbr
 8u0An2BUgZWismSK0PxbwVDOD5+/UWik
 =0o0v
 -END PGP SIGNATURE-
 ___
 ceph-users mailing list
 

[ceph-users] Shutting down a cluster fully and powering it back up

2015-02-28 Thread David
Hi!

I’m about to do maintenance on a Ceph Cluster, where we need to shut it all 
down fully.
We’re currently only using it for rados block devices to KVM Hypervizors.

Are these steps sane?

Shutting it down

1. Shut down all IO to the cluster. Means turning off all clients (KVM 
Hypervizors in our case).
2. Set cluster to noout by running: ceph osd set noout
3. Shut down the MON nodes.
4. Shut down the OSD nodes.

Starting it up

1. Start the OSD nodes.
2. Start the MON nodes.
3. Check ceph -w to see the status of ceph and take actions if something is 
wrong.
4. Start up the clients (KVM Hypervizors)
5. Run ceph osd unset noout
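
As a compressed sketch of the above in shell terms (assuming the ceph CLI and admin
keyring on the node you run this from):

# before shutdown, once all client IO is stopped:
ceph osd set noout
# ...shut down the MON nodes, then the OSD nodes, do the maintenance...

# after powering the OSD nodes and then the MON nodes back on:
ceph -w                # watch until the cluster settles and all OSDs are back up
ceph osd unset noout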

Kind Regards,
David
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Booting from journal devices

2015-02-28 Thread Nick Fisk
Hi All,

 

Thought I would just share this in case someone finds it useful.

 

I've just finished building our new Ceph cluster where the journals are
installed on the same SSD's as the OS. The SSD's have a MD raid partitions
for the OS and swap and the rest of the SSD's are used for individual
journal partitions. The OS is Ubuntu and as such the default install is
using MBR partitions.

 

After we created the OSDs, the servers were unable to boot anymore. This
was caused by ceph-deploy making GPT partitions for the journals on the
SSDs; effectively, what it looks like is that it's overwriting the MBR boot
record.

 

To fix it I carried out the following steps:-

 

1.   Used gdisk on both SSD's to create a new partition from sector 34
to 2047, of type EF02

2.   Ran grub-install against each SSD device
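
In shell terms the fix was roughly the following (device names are examples;
double-check them against your own layout before running anything):

gdisk /dev/sda      # 'n' -> new partition, first sector 34, last sector 2047, type code EF02
gdisk /dev/sdb      # repeat on the second SSD
grub-install /dev/sda
grub-install /dev/sdb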

 

Hope that helps someone, if they come across the same problem.

 

Nick




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] New Cluster - Any requests?

2015-02-28 Thread Nick Fisk
Hi All,

 

I've just finished building a new POC cluster comprised of the following:-

 

4 Hosts in 1 chassis
(http://www.supermicro.com/products/system/4U/F617/SYS-F617H6-FTPT_.cfm)
each with the following:-

 

2x Xeon 2620 v2 (2.1Ghz)

32GB Ram

2x Onboard 10GB-T into 10GB switches

10x 3TB WD Red Pro Disks (currently in a k=3, m=3 EC pool, so 55TB usable)

2x 100G S3700 SSD's for journals and OS

1x 400GB S3700 SSD for SSD cache tier

Ubuntu 14.04.2 (3.16 Kernel)

Running Ceph 0.87.1

 

It's currently in testing whilst I get iSCSI over RBD working to a state I'm
happy with.

 

As a very rough idea of performance from the SSD tier, I'm seeing about 10K
writes and 40K reads of 4kb at a queue depth of 32. During these benches, total
CPU of each host is at about 80%, and this is just 1 SSD OSD per host,
remember.

 

Idle power usage is around 500-600W

 

I'm intending to post some performance numbers of the individual components
and RBD performance within the next couple of weeks, but if anybody has any
requests for me to carry out any tests or changes whilst it's in testing
please let me know. I'm happy to create new pools and carry out config
changes, but nothing that will result in me rebuilding the cluster from
scratch.

 

 

Nick




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread mad Engineer
tried changing scheduler from deadline to noop, also upgraded to Giant and
btrfs filesystem, downgraded kernel to 3.16 from 3.16-3; not much difference

dd if=/dev/zero of=hi bs=4k count=25000 oflag=direct
25000+0 records in
25000+0 records out
102400000 bytes (102 MB) copied, 94.691 s, 1.1 MB/s

Earlier on a vmware setup i was getting ~850 KBps and now even on physical
server with SSD drives its just over 1MBps.I doubt some serious
configuration issues.

Tried iperf between 3 servers all are showing 9 Gbps,tried icmp with
different packet size ,no fragmentation.

i also noticed that out of 9 osd 5 are 850 EVO and 4 are 840 EVO.I believe
this will not cause this much drop in performance.

Thanks for any help


On Sat, Feb 28, 2015 at 6:49 PM, Alexandre DERUMIER aderum...@odiso.com
wrote:

 As optimisation,

 try to set ioscheduler to noop,

 and also enable rbd_cache=true. (It's really helping for for sequential
 writes)

 but your results seem quite low, 926kb/s with 4k, it's only 200io/s.

 check if you don't have any big network latencies, or mtu fragementation
 problem.

 Maybe also try to bench with fio, with more parallel jobs.




 - Mail original -
 De: mad Engineer themadengin...@gmail.com
 À: Philippe Schwarz p...@schwarz-fr.net
 Cc: ceph-users ceph-users@lists.ceph.com
 Envoyé: Samedi 28 Février 2015 13:06:59
 Objet: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9
 OSD with 3.16-3 kernel

 Thanks for the reply Philippe,we were using these disks in our NAS,now
 it looks like i am in big trouble :-(

 On Sat, Feb 28, 2015 at 5:02 PM, Philippe Schwarz p...@schwarz-fr.net
 wrote:
  -BEGIN PGP SIGNED MESSAGE-
  Hash: SHA1
 
  Le 28/02/2015 12:19, mad Engineer a écrit :
  Hello All,
 
  I am trying ceph-firefly 0.80.8
  (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all Samsung
  SSD 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu
  14.04 LTS with 3.16-3 kernel.All are connected to 10G ports with
  maximum MTU.There are no extra disks for journaling and also there
  are no separate network for replication and data transfer.All 3
  nodes are also hosting monitoring process.Operating system runs on
  SATA disk.
 
  When doing a sequential benchmark using dd on RBD, mounted on
  client as ext4 its taking 110s to write 100Mb data at an average
  speed of 926Kbps.
 
  time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
  25000+0 records in 25000+0 records out 10240 bytes (102 MB)
  copied, 110.582 s, 926 kB/s
 
  real 1m50.585s user 0m0.106s sys 0m2.233s
 
  While doing this directly on ssd mount point shows:
 
  time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
  25000+0 records in 25000+0 records out 10240 bytes (102 MB)
  copied, 1.38567 s, 73.9 MB/s
 
  OSDs are in XFS with these extra arguments :
 
  rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M
 
  ceph.conf
 
  [global] fsid = 7d889081-7826-439c-9fe5-d4e57480d9be
  mon_initial_members = ceph1, ceph2, ceph3 mon_host =
  10.99.10.118,10.99.10.119,10.99.10.120 auth_cluster_required =
  cephx auth_service_required = cephx auth_client_required = cephx
  filestore_xattr_use_omap = true osd_pool_default_size = 2
  osd_pool_default_min_size = 2 osd_pool_default_pg_num = 450
  osd_pool_default_pgp_num = 450 max_open_files = 131072
 
  [osd] osd_mkfs_type = xfs osd_op_threads = 8 osd_disk_threads = 4
  osd_mount_options_xfs =
  rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M
 
 
  on our traditional storage with Full SAS disk, same dd completes
  in 16s with an average write speed of 6Mbps.
 
  Rados bench:
 
  rados bench -p rbd 10 write Maintaining 16 concurrent writes of
  4194304 bytes for up to 10 seconds or 0 objects Object prefix:
  benchmark_data_ceph1_2977 sec Cur ops started finished avg MB/s
  cur MB/s last lat avg lat 0 0 0 0
  0 0 - 0 1 16 94 78
  311.821 312 0.041228 0.140132 2 16 192 176
  351.866 392 0.106294 0.175055 3 16 275 259
  345.216 332 0.076795 0.166036 4 16 302 286
  285.912 108 0.043888 0.196419 5 16 395 379
  303.11 372 0.126033 0.207488 6 16 501 485
  323.242 424 0.125972 0.194559 7 16 621 605
  345.621 480 0.194155 0.183123 8 16 730 714
  356.903 436 0.086678 0.176099 9 16 814 798
  354.572 336 0.081567 0.174786 10 16 832
  816 326.313 72 0.037431 0.182355 11 16 833
  817 297.013 4 0.533326 0.182784 Total time run:
  11.489068 Total writes made: 833 Write size:
  4194304 Bandwidth (MB/sec): 290.015
 
  Stddev Bandwidth: 175.723 Max bandwidth (MB/sec): 480 Min
  bandwidth (MB/sec): 0 Average Latency: 0.220582 Stddev
  Latency: 0.343697 Max latency: 2.85104 Min
  latency: 0.035381
 
  Our ultimate aim is to replace existing SAN with ceph,but for that
  it should meet minimum 8000 iops.Can any one help me with this,OSD
  are SSD,CPU has good clock speed,backend network is good but still
  we are not able to extract full capability of SSD disks.
 
 
 
  Thanks,
 
  Hi, i'm new to ceph so, don't consider my words as holy truth.
 
  It 

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread mad Engineer
reinstalled ceph packages and now with the memstore backend [osd objectstore
= memstore] it's giving 400 KBps. No idea where the problem is.

On Sun, Mar 1, 2015 at 12:30 AM, mad Engineer themadengin...@gmail.com
wrote:

 tried changing scheduler from deadline to noop also upgraded to Gaint and
 btrfs filesystem,downgraded kernel to 3.16 from 3.16-3 not much difference

 dd if=/dev/zero of=hi bs=4k count=25000 oflag=direct
 25000+0 records in
 25000+0 records out
 10240 bytes (102 MB) copied, 94.691 s, 1.1 MB/s

 Earlier on a vmware setup i was getting ~850 KBps and now even on physical
 server with SSD drives its just over 1MBps.I doubt some serious
 configuration issues.

 Tried iperf between 3 servers all are showing 9 Gbps,tried icmp with
 different packet size ,no fragmentation.

 i also noticed that out of 9 osd 5 are 850 EVO and 4 are 840 EVO.I believe
 this will not cause this much drop in performance.

 Thanks for any help


 On Sat, Feb 28, 2015 at 6:49 PM, Alexandre DERUMIER aderum...@odiso.com
 wrote:

 As optimisation,

 try to set ioscheduler to noop,

 and also enable rbd_cache=true. (It's really helping for for sequential
 writes)

 but your results seem quite low, 926kb/s with 4k, it's only 200io/s.

 check if you don't have any big network latencies, or mtu fragementation
 problem.

 Maybe also try to bench with fio, with more parallel jobs.




 - Mail original -
 De: mad Engineer themadengin...@gmail.com
 À: Philippe Schwarz p...@schwarz-fr.net
 Cc: ceph-users ceph-users@lists.ceph.com
 Envoyé: Samedi 28 Février 2015 13:06:59
 Objet: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and
 9 OSD with 3.16-3 kernel

 Thanks for the reply Philippe,we were using these disks in our NAS,now
 it looks like i am in big trouble :-(

 On Sat, Feb 28, 2015 at 5:02 PM, Philippe Schwarz p...@schwarz-fr.net
 wrote:
  -BEGIN PGP SIGNED MESSAGE-
  Hash: SHA1
 
  Le 28/02/2015 12:19, mad Engineer a écrit :
  Hello All,
 
  I am trying ceph-firefly 0.80.8
  (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all Samsung
  SSD 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu
  14.04 LTS with 3.16-3 kernel.All are connected to 10G ports with
  maximum MTU.There are no extra disks for journaling and also there
  are no separate network for replication and data transfer.All 3
  nodes are also hosting monitoring process.Operating system runs on
  SATA disk.
 
  When doing a sequential benchmark using dd on RBD, mounted on
  client as ext4 its taking 110s to write 100Mb data at an average
  speed of 926Kbps.
 
  time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
  25000+0 records in 25000+0 records out 10240 bytes (102 MB)
  copied, 110.582 s, 926 kB/s
 
  real 1m50.585s user 0m0.106s sys 0m2.233s
 
  While doing this directly on ssd mount point shows:
 
  time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
  25000+0 records in 25000+0 records out 10240 bytes (102 MB)
  copied, 1.38567 s, 73.9 MB/s
 
  OSDs are in XFS with these extra arguments :
 
  rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M
 
  ceph.conf
 
  [global] fsid = 7d889081-7826-439c-9fe5-d4e57480d9be
  mon_initial_members = ceph1, ceph2, ceph3 mon_host =
  10.99.10.118,10.99.10.119,10.99.10.120 auth_cluster_required =
  cephx auth_service_required = cephx auth_client_required = cephx
  filestore_xattr_use_omap = true osd_pool_default_size = 2
  osd_pool_default_min_size = 2 osd_pool_default_pg_num = 450
  osd_pool_default_pgp_num = 450 max_open_files = 131072
 
  [osd] osd_mkfs_type = xfs osd_op_threads = 8 osd_disk_threads = 4
  osd_mount_options_xfs =
  rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M
 
 
  on our traditional storage with Full SAS disk, same dd completes
  in 16s with an average write speed of 6Mbps.
 
  Rados bench:
 
  rados bench -p rbd 10 write Maintaining 16 concurrent writes of
  4194304 bytes for up to 10 seconds or 0 objects Object prefix:
  benchmark_data_ceph1_2977 sec Cur ops started finished avg MB/s
  cur MB/s last lat avg lat 0 0 0 0
  0 0 - 0 1 16 94 78
  311.821 312 0.041228 0.140132 2 16 192 176
  351.866 392 0.106294 0.175055 3 16 275 259
  345.216 332 0.076795 0.166036 4 16 302 286
  285.912 108 0.043888 0.196419 5 16 395 379
  303.11 372 0.126033 0.207488 6 16 501 485
  323.242 424 0.125972 0.194559 7 16 621 605
  345.621 480 0.194155 0.183123 8 16 730 714
  356.903 436 0.086678 0.176099 9 16 814 798
  354.572 336 0.081567 0.174786 10 16 832
  816 326.313 72 0.037431 0.182355 11 16 833
  817 297.013 4 0.533326 0.182784 Total time run:
  11.489068 Total writes made: 833 Write size:
  4194304 Bandwidth (MB/sec): 290.015
 
  Stddev Bandwidth: 175.723 Max bandwidth (MB/sec): 480 Min
  bandwidth (MB/sec): 0 Average Latency: 0.220582 Stddev
  Latency: 0.343697 Max latency: 2.85104 Min
  latency: 0.035381
 
  Our ultimate aim is to replace existing SAN with ceph,but for that
  it should meet minimum 8000 iops.Can any one help me with 

Re: [ceph-users] SSD selection

2015-02-28 Thread Christian Balzer
On Sat, 28 Feb 2015 20:42:35 -0600 Tony Harris wrote:

 Hi all,
 
 I have a small cluster together and it's running fairly well (3 nodes, 21
 osds).  I'm looking to improve the write performance a bit though, which
 I was hoping that using SSDs for journals would do.  But, I was wondering
 what people had as recommendations for SSDs to act as journal drives.
 If I read the docs on ceph.com correctly, I'll need 2 ssds per node
 (with 7 drives in each node, I think the recommendation was 1ssd per 4-5
 drives?) so I'm looking for drives that will work well without breaking
 the bank for where I work (I'll probably have to purchase them myself
 and donate, so my budget is somewhat small).  Any suggestions?  I'd
 prefer one that can finish its write in a power outage case, the only
 one I know of off hand is the intel dcs3700 I think, but at $300 it's
 WAY above my affordability range.

Firstly, an uneven number of OSDs (HDDs) per node will bite you in the
proverbial behind down the road when combined with journal SSDs, as one of
those SSDs will wear out faster than the other.

Secondly, how many SSDs you need is basically a trade-off between price,
performance, endurance and limiting failure impact. 

I have a cluster where I used 4 100GB DC S3700s with 8 HDD OSDs, optimizing
the write paths, IOPS and failure domain, but not the sequential speed
or cost.

Depending on what your write load is and the expected lifetime of this
cluster, you might be able to get away with DC S3500s or even better the
new DC S3610s.
Keep in mind that buying a cheap, low endurance SSD now might cost you
more down the road if you have to replace it after a year (TBW/$).
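
As a back-of-the-envelope illustration of the TBW/$ point (the prices are assumptions
and the ratings are from memory, so treat the numbers as rough):

 100GB DC S3700:  rated ~10 drive writes/day for 5 years
                  = 10 x 100 GB x 365 x 5 ~= 1,800 TB written over its life;
                  at the ~$300 mentioned above that is still ~6 TB written per dollar.
 Consumer SSD:    rated at, say, ~75 TBW; at ~$100 that is under 1 TB written per
                  dollar, before even counting the missing powercaps and GC stalls.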

All the cheap alternatives to DC level SSDs tend to wear out too fast,
have no powercaps and tend to have unpredictable (caused by garbage
collection) and steadily decreasing performance.

Christian
-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mail not reaching the list?

2015-02-28 Thread Sudarshan Pathak
The mail landed in Spam.

Here is the message from Google:
*Why is this message in Spam?* It has a from address in yahoo.com but has
failed yahoo.com's required tests for authentication.  Learn more
https://support.google.com/mail/answer/1366858?hl=enexpand=5





Regards,
Sudarshan Pathak

On Sat, Feb 28, 2015 at 9:25 PM, Tony Harris kg4...@yahoo.com wrote:

 Hi,
 I've sent a couple of emails to the list since subscribing, but I've never
 seen them reach the list; I was just wondering if there was something wrong?

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Am I reaching the list now?

2015-02-28 Thread Tony Harris
I was subscribed with a yahoo email address, but it was getting some grief
so I decided to try using my gmail address, hopefully this one is working

-Tony
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] SSD selection

2015-02-28 Thread Tony Harris
Hi all,

I have a small cluster together and it's running fairly well (3 nodes, 21
osds).  I'm looking to improve the write performance a bit though, which I
was hoping that using SSDs for journals would do.  But, I was wondering
what people had as recommendations for SSDs to act as journal drives.  If I
read the docs on ceph.com correctly, I'll need 2 ssds per node (with 7
drives in each node, I think the recommendation was 1ssd per 4-5 drives?)
so I'm looking for drives that will work well without breaking the bank for
where I work (I'll probably have to purchase them myself and donate, so my
budget is somewhat small).  Any suggestions?  I'd prefer one that can
finish its write in a power outage case, the only one I know of off hand is
the intel dcs3700 I think, but at $300 it's WAY above my affordability
range.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Booting from journal devices

2015-02-28 Thread Christian Balzer

Hello,

On Sat, 28 Feb 2015 18:47:14 - Nick Fisk wrote:

 Hi All,
 
  
 
 Thought I would just share this in case someone finds it useful.
 
  
 
 I've just finished building our new Ceph cluster where the journals are
 installed on the same SSD's as the OS. The SSD's have a MD raid
 partitions for the OS and swap and the rest of the SSD's are used for
 individual journal partitions. The OS is Ubuntu and as such the default
 install is using MBR partitions.

When you say MBR, do you mean DOS partitions?
 
  
 
 After we created the OSD's, the servers were unable to boot anymore. This
 was caused by the ceph-deploy making GPT partitions for the journals on
 the SSD's, effectively what it looks like is that its overwriting the
 MBR boot record.
 
Another reason I dislike these "black box, we know better what you want"
approaches.

Which version of ceph-deploy was this?

Did you create the journal partitions beforehand?

Because I have a cluster very much like it and ceph-deploy v1.5.7 did not
touch the OS/Journal SSDs other than initializing the journals
(partitions had been created with fdisk before).

Christian

  
 
 To fix it I carried out the following steps:-
 
  
 
 1.   Used gdisk on both SSD's to create a new partition from sector
 34 to 2047, of type EF02
 
 2.   Ran grub-install against each SSD device
 
  
 
 Hope that's helps someone, if they come across the same problem.
 
  
 
 Nick
 
 
 
 


-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread mad Engineer
I am re-installing ceph with the Giant release; I will soon update results with
the above configuration changes.

My servers are Cisco UCS C200 M1 with the integrated Intel ICH10R SATA
controller. Before installing ceph I changed it to use software RAID,
quoting from the link below [When using the integrated RAID, you must enable
the ICH10R controller in SW RAID mode]:
http://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/c/hw/C200M1/install/c200M1/RAID.html#88713


Not sure this is the problem. Without ceph, Linux is giving better results
with this controller and SSD disks. With ceph on top of it, results are slower
than SATA disks.

Thanks for all your support


On Sun, Mar 1, 2015 at 3:07 AM, Somnath Roy somnath@sandisk.com wrote:

  Sorry, I saw you have already tried with ‘rados bench’. So, some points
 here.



 1. If you are considering a write workload, I think with a total of 2 copies
 and a 4K workload, you should be able to get ~4K iops (considering it
 hitting the disk, not with memstore).



 2. You are having 9 OSDs, and if you created only one pool with only 450
 PGs, you should try to increase that and see whether you get any improvement
 or not.



 3. Also, the rados bench you ran used a very low QD; try increasing that,
 maybe to 32/64.
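
 For points 2 and 3, that would look something like this (the target PG count is
 only an example; pgp_num should follow pg_num):

 ceph osd pool set rbd pg_num 1024
 ceph osd pool set rbd pgp_num 1024
 rados bench -p rbd 30 write -b 4096 -t 64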



 4. If you are running firefly, the other optimizations won't work here. But
 you can add the following to your ceph.conf file and it should give you
 some boost.



 debug_lockdep = 0/0
 debug_context = 0/0
 debug_crush = 0/0
 debug_buffer = 0/0
 debug_timer = 0/0
 debug_filer = 0/0
 debug_objecter = 0/0
 debug_rados = 0/0
 debug_rbd = 0/0
 debug_journaler = 0/0
 debug_objectcatcher = 0/0
 debug_client = 0/0
 debug_osd = 0/0
 debug_optracker = 0/0
 debug_objclass = 0/0
 debug_filestore = 0/0
 debug_journal = 0/0
 debug_ms = 0/0
 debug_monc = 0/0
 debug_tp = 0/0
 debug_auth = 0/0
 debug_finisher = 0/0
 debug_heartbeatmap = 0/0
 debug_perfcounter = 0/0
 debug_asok = 0/0
 debug_throttle = 0/0
 debug_mon = 0/0
 debug_paxos = 0/0
 debug_rgw = 0/0



 5. Give us the ceph -s output and the iostat output while IO is going on.



 Thanks & Regards

 Somnath







 *From:* Somnath Roy
 *Sent:* Saturday, February 28, 2015 12:59 PM
 *To:* 'mad Engineer'; Alexandre DERUMIER
 *Cc:* ceph-users
 *Subject:* RE: [ceph-users] Extreme slowness in SSD cluster with 3 nodes
 and 9 OSD with 3.16-3 kernel



 I would say check with rados tool like ceph_smalliobench/rados bench first
 to see how much performance these tools are reporting. This will help you
 to isolate any upstream issues.

 Also, check with ‘iostat –xk 1’ for the resource utilization. Hope you are
 running with powerful enough cpu complex since you are saying network is
 not a bottleneck.



 Thanks  Regards

 Somnath



 *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
 Of *mad Engineer
 *Sent:* Saturday, February 28, 2015 12:29 PM
 *To:* Alexandre DERUMIER
 *Cc:* ceph-users
 *Subject:* Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes
 and 9 OSD with 3.16-3 kernel



 reinstalled ceph packages and now with memstore backend [osd objectstore
 =memstore] its giving 400Kbps .No idea where the problem is.



 On Sun, Mar 1, 2015 at 12:30 AM, mad Engineer themadengin...@gmail.com
 wrote:

 tried changing scheduler from deadline to noop also upgraded to Gaint and
 btrfs filesystem,downgraded kernel to 3.16 from 3.16-3 not much difference



 dd if=/dev/zero of=hi bs=4k count=25000 oflag=direct

 25000+0 records in

 25000+0 records out

 10240 bytes (102 MB) copied, 94.691 s, 1.1 MB/s



 Earlier on a vmware setup i was getting ~850 KBps and now even on physical
 server with SSD drives its just over 1MBps.I doubt some serious
 configuration issues.



 Tried iperf between 3 servers all are showing 9 Gbps,tried icmp with
 different packet size ,no fragmentation.



 i also noticed that out of 9 osd 5 are 850 EVO and 4 are 840 EVO.I believe
 this will not cause this much drop in performance.



 Thanks for any help





 On Sat, Feb 28, 2015 at 6:49 PM, Alexandre DERUMIER aderum...@odiso.com
 wrote:

 As optimisation,

 try to set ioscheduler to noop,

 and also enable rbd_cache=true. (It's really helping for for sequential
 writes)

 but your results seem quite low, 926kb/s with 4k, it's only 200io/s.

 check if you don't have any big network latencies, or mtu fragementation
 problem.

 Maybe also try to bench with fio, with more parallel jobs.




 - Mail original -
 De: mad Engineer themadengin...@gmail.com
 À: Philippe Schwarz p...@schwarz-fr.net
 Cc: ceph-users ceph-users@lists.ceph.com
 Envoyé: Samedi 28 Février 2015 13:06:59
 Objet: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9
 OSD with 3.16-3 kernel

 Thanks for the reply Philippe,we were using these disks in our NAS,now
 it looks like i am in big trouble :-(

 On Sat, Feb 28, 2015 at 5:02 PM, Philippe Schwarz p...@schwarz-fr.net
 wrote:
  -BEGIN PGP SIGNED MESSAGE-
  Hash: