Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
On 28/02/2015 12:19, mad Engineer wrote: [original post quoted in full - snipped]

Hi, I'm new to Ceph, so don't take my words as holy truth. It seems that the Samsung 840 (and so, I assume, the 850) is a poor fit for Ceph:

MTBF: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-November/044258.html
Bandwidth: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-December/045247.html

And according to a confirmed user of Ceph/Proxmox, Samsung SSDs should be avoided in Ceph storage if possible.

Apart from that, it seems there was a limitation in Ceph on using the full bandwidth available in SSDs; but at less than 1 MB/s I don't think you have hit that limit. I remind you that I'm not a Ceph guru (far from it, indeed), so feel free to disagree; I'm on my way to improving my knowledge.

Best regards.
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
On 28.02.2015 12:43, Alexandre DERUMIER aderum...@odiso.com wrote:

Hi, first, test whether your SSDs can write fast with O_DSYNC. Check this blog: http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/ Then, try Ceph Giant (or maybe wait for Hammer), because there are a lot of SSD optimisations (thread sharding). In my last test with Giant, I was able to reach around 120 000 iops with 6 OSDs on Intel S3500 SSDs, but I was CPU-limited.

But was this with replication 1? I was never able to do more than 30 000 with replication 3.

Stefan

- Original Message - From: mad Engineer themadengin...@gmail.com To: ceph-users ceph-users@lists.ceph.com Sent: Saturday, 28 February 2015 12:19:56 Subject: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

[original post snipped]
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
But was this with replication 1? I was never able to do more than 30 000 with replication 3.

Oh, sorry, that was about reads. For writes, I think I was around 30 000 iops with 3 nodes (2x 4 cores @ 2.1 GHz each), CPU-bound, with replication x1. With replication x3, around 9 000 iops. Going to test on 2x 10 cores @ 3.1 GHz in some weeks.

- Original Message - From: Stefan Priebe s.pri...@profihost.ag To: aderumier aderum...@odiso.com Cc: mad Engineer themadengin...@gmail.com, ceph-users ceph-users@lists.ceph.com Sent: Saturday, 28 February 2015 13:42:54 Subject: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

[quoted thread snipped]
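(A back-of-the-envelope check on those figures, assuming filestore with co-located journals: at replication x3 each client write becomes 3 OSD writes, so ~30 000 iops at x1 dropping to ~9 000-10 000 at x3 is roughly the expected factor of three.)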
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
Thanks for that link Alexandre. As per that link, I tried these:

850 EVO, without dsync:
dd if=randfile of=/dev/sdb1 bs=4k count=100000 oflag=direct
100000+0 records in
100000+0 records out
409600000 bytes (410 MB) copied, 4.42913 s, 92.5 MB/s

850 EVO, with dsync:
dd if=randfile of=/dev/sdb1 bs=4k count=100000 oflag=direct,dsync
100000+0 records in
100000+0 records out
409600000 bytes (410 MB) copied, 83.4916 s, 4.9 MB/s

840 EVO, without dsync:
dd if=randfile of=/dev/sdd1 bs=4k count=100000 oflag=direct
100000+0 records in
100000+0 records out
409600000 bytes (410 MB) copied, 5.11912 s, 80.0 MB/s

840 EVO, with dsync:
dd if=randfile of=/dev/sdd1 bs=4k count=100000 oflag=direct,dsync
100000+0 records in
100000+0 records out
409600000 bytes (410 MB) copied, 196.738 s, 2.1 MB/s

So with dsync there is a significant reduction in performance; it looks like the 850 is better than the 840. Can this be the reason for the reduced write speed of 926 kB/s? Also, before trying on physical servers, I ran Ceph on VMware VMs with SAS disks using Giant 0.87; at that time Firefly 0.80.8 was giving higher numbers, so I decided to use Firefly.

On Sat, Feb 28, 2015 at 5:13 PM, Alexandre DERUMIER aderum...@odiso.com wrote: [quoted message snipped]
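(Converting those dsync figures to iops, assuming 4 kB writes: 4.9 MB/s / 4 kB ≈ 1 250 sync write iops for the 850 EVO, and 2.1 MB/s / 4 kB ≈ 540 for the 840 EVO. Since a filestore journal commits every write with O_DSYNC, numbers in this range are a hard ceiling on what each OSD can acknowledge, which is consistent with the slow cluster-wide 4k results above.)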
[ceph-users] Mail not reaching the list?
Hi, I've sent a couple of emails to the list since subscribing, but I've never seen them reach the list; I was just wondering if there is something wrong?
[ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
Hello All,

I am trying ceph-firefly 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSDs, all Samsung SSD 850 EVO, on 3 servers with 24 GB RAM and 16 cores @ 2.27 GHz, running Ubuntu 14.04 LTS with the 3.16-3 kernel. All are connected to 10G ports with maximum MTU. There are no extra disks for journaling, and there is no separate network for replication and data transfer. All 3 nodes also host a monitor process. The operating system runs on a SATA disk.

When doing a sequential benchmark using dd on an RBD mounted on the client as ext4, it takes 110 s to write 100 MB of data at an average speed of 926 kB/s:

time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
25000+0 records in
25000+0 records out
102400000 bytes (102 MB) copied, 110.582 s, 926 kB/s

real 1m50.585s
user 0m0.106s
sys 0m2.233s

Doing the same directly on the SSD mount point shows:

time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
25000+0 records in
25000+0 records out
102400000 bytes (102 MB) copied, 1.38567 s, 73.9 MB/s

OSDs are on XFS with these extra arguments: rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M

ceph.conf:

[global]
fsid = 7d889081-7826-439c-9fe5-d4e57480d9be
mon_initial_members = ceph1, ceph2, ceph3
mon_host = 10.99.10.118,10.99.10.119,10.99.10.120
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_pool_default_size = 2
osd_pool_default_min_size = 2
osd_pool_default_pg_num = 450
osd_pool_default_pgp_num = 450
max_open_files = 131072

[osd]
osd_mkfs_type = xfs
osd_op_threads = 8
osd_disk_threads = 4
osd_mount_options_xfs = rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M

On our traditional storage with full SAS disks, the same dd completes in 16 s at an average write speed of 6 MB/s.

Rados bench:

rados bench -p rbd 10 write
Maintaining 16 concurrent writes of 4194304 bytes for up to 10 seconds or 0 objects
Object prefix: benchmark_data_ceph1_2977
 sec Cur ops started finished avg MB/s cur MB/s last lat  avg lat
   0       0       0        0        0        0        -        0
   1      16      94       78  311.821      312 0.041228 0.140132
   2      16     192      176  351.866      392 0.106294 0.175055
   3      16     275      259  345.216      332 0.076795 0.166036
   4      16     302      286  285.912      108 0.043888 0.196419
   5      16     395      379   303.11      372 0.126033 0.207488
   6      16     501      485  323.242      424 0.125972 0.194559
   7      16     621      605  345.621      480 0.194155 0.183123
   8      16     730      714  356.903      436 0.086678 0.176099
   9      16     814      798  354.572      336 0.081567 0.174786
  10      16     832      816  326.313       72 0.037431 0.182355
  11      16     833      817  297.013        4 0.533326 0.182784
Total time run:         11.489068
Total writes made:      833
Write size:             4194304
Bandwidth (MB/sec):     290.015
Stddev Bandwidth:       175.723
Max bandwidth (MB/sec): 480
Min bandwidth (MB/sec): 0
Average Latency:        0.220582
Stddev Latency:         0.343697
Max latency:            2.85104
Min latency:            0.035381

Our ultimate aim is to replace the existing SAN with Ceph, but for that it should meet a minimum of 8000 iops. Can anyone help me with this? The OSDs are SSDs, the CPUs have good clock speed, and the backend network is good, but we are still not able to extract the full capability of the SSD disks.

Thanks,
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
As optimisations, try setting the I/O scheduler to noop, and also enable rbd_cache=true (it really helps for sequential writes). But your results seem quite low: 926 kB/s with 4k is only ~200 io/s. Check that you don't have any big network latencies or MTU fragmentation problems. Maybe also try to bench with fio, with more parallel jobs.

- Original Message - De: mad Engineer themadengin...@gmail.com To: Philippe Schwarz p...@schwarz-fr.net Cc: ceph-users ceph-users@lists.ceph.com Sent: Saturday, 28 February 2015 13:06:59 Subject: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

[quoted message snipped]
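A minimal sketch of those three suggestions; the device name, the ceph.conf section and the fio job parameters are illustrative choices of mine, not from Alexandre's mail (and note the fio run writes raw to the RBD device, so point it at a scratch image):

# per SSD, on each OSD node: switch to the noop elevator
echo noop > /sys/block/sdb/queue/scheduler

# on the client, in ceph.conf: enable RBD write-back caching
[client]
rbd cache = true
rbd cache writethrough until flush = true

# 4k random writes with real parallelism, unlike a single dd stream
fio --name=rbd-4k --filename=/dev/rbd0 --rw=randwrite --bs=4k --ioengine=libaio --direct=1 --iodepth=32 --numjobs=4 --runtime=60 --group_reporting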
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
Hi,

First, test whether your SSDs can write fast with O_DSYNC. Check this blog: http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/

Then, try Ceph Giant (or maybe wait for Hammer), because there are a lot of SSD optimisations (thread sharding). In my last test with Giant, I was able to reach around 120 000 iops with 6 OSDs on Intel S3500 SSDs, but I was CPU-limited.

- Original Message - From: mad Engineer themadengin...@gmail.com To: ceph-users ceph-users@lists.ceph.com Sent: Saturday, 28 February 2015 12:19:56 Subject: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

[original post snipped]
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
Martin,

I have been using Samsung 840 Pro for journals for about 2 years now and have just replaced all my Samsung drives with Intel. We have found a lot of performance issues with the 840 Pro (we are using the 128 GB model). In particular, a very strange behaviour with 4 partitions (with 50% underprovisioning left as empty unpartitioned space on the drive), where the drive would grind to almost a halt after a few weeks of use. I was getting 100% utilisation on the drives doing just 3-4 MB/s of writes. This was not the case when I installed the new drives. Manual trimming helps for a few weeks until the same happens again. This has been happening with all the 840 Pro SSDs that we have, and contacting Samsung support has proven utterly useless: they do not want to speak with you until you install Windows and run their monkey utility ((.

Also, I've noticed the latencies of the Samsung 840 Pro drives to be about 15-20 times higher compared with consumer-grade Intel drives, like the Intel 520. According to ceph osd perf, I would consistently get higher figures on the OSDs with a Samsung journal drive compared with the Intel drive on the same server - something like 2-3 ms for Intel vs 40-50 ms for the Samsungs. At some point we had enough of the Samsungs and scrapped them.

Andrei

- Original Message - From: Martin B Nielsen mar...@unity3d.com To: Philippe Schwarz p...@schwarz-fr.net Cc: ceph-users@lists.ceph.com Sent: Saturday, 28 February, 2015 11:51:57 AM Subject: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

Hi, I cannot recognize that picture; we've been using Samsung 840 Pro in production for almost 2 years now - and have had 1 fail. We run an 8-node mixed ssd/platter cluster with 4x Samsung 840 Pro (500 GB) in each, so that is 32x SSD. They've written ~25 TB of data each on average. Using the dd you had, inside an existing semi-busy mysql guest I get: 102400000 bytes (102 MB) copied, 5.58218 s, 18.3 MB/s. Which is still not a lot, but I think it is more a limitation of our setup/load. We are using Dumpling. All that aside, I would probably go with something tried and tested if I were to redo it today - we haven't had any issues, but it is still nice to use something you know should have a baseline performance and can compare to that. Cheers, Martin

On Sat, Feb 28, 2015 at 12:32 PM, Philippe Schwarz p...@schwarz-fr.net wrote: [quoted reply and original post snipped]
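For reference, the manual trim Andrei mentions can be done with fstrim on a mounted filesystem (path illustrative; it only works where the filesystem and controller pass TRIM through, and it can stall a busy OSD, so best done in a maintenance window):

fstrim -v /var/lib/ceph/osd/ceph-0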
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
Hi Andrei,

If there is one thing I've come to understand by now, it is that Ceph configs, performance, hardware and, well, everything seem to vary on an almost per-person basis. I do not recognize that latency issue either; this is from one of our nodes (4x 500 GB Samsung 840 Pro - sd[c-f]) which has been running for 600+ days (so the iostat -x is an avg of that):

# uptime
16:24:57 up 611 days, 4:03, 1 user, load average: 1.18, 1.55, 1.72

# iostat -x
[ ... ]
Device: rrqm/s wrqm/s   r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc       0.00   0.16  4.87  22.62  344.18  458.65    58.41     0.05  1.92    0.45    2.24  0.76  2.10
sdd       0.00   0.12  4.37  20.02  317.98  437.95    61.98     0.05  1.90    0.44    2.21  0.78  1.91
sde       0.00   0.12  4.17  19.33  302.45  403.02    60.02     0.04  1.87    0.43    2.18  0.77  1.80
sdf       0.00   0.12  4.51  20.84  322.84  439.70    60.17     0.05  1.84    0.43    2.15  0.76  1.93
[ ... ]

Granted, we do not have very high usage on this cluster on an SSD basis, and it might change as we put more load on it, but we will deal with that then. I do not think ~2 ms access time is either good or bad. This is from another cluster we operate - this one has an Intel DC S3700 800 GB SSD (sdb):

# uptime
09:37:26 up 654 days, 8:40, 1 user, load average: 0.33, 0.40, 0.54

# iostat -x
[ ... ]
sdb       0.01   1.49 39.76  86.79 1252.80 2096.98    52.94     0.02  0.76    1.22    0.54  0.41  5.21
[ ... ]

It is misleading, as the latter has just 3 disks + a hardware raid controller with 1 GB of battery-backed cache, whereas the first is a 'cheap' dumb 12-disk JBOD IT-mode setup. All the SSDs in both clusters have 3 partitions - 1 ceph-data and 2 journal partitions (1 journal for the SSD itself and 1 journal for 1 platter disk). The Intel SSD is very sturdy, though - it has had a 2.1 MB/s avg. write over 654 days; that is somewhere around 120 TB so far.

But ultimately it boils down to what you need - in our use case the latter cluster has to be rock-stable and performing, and we chose the Intel ones based on that. For the first one we don't really care if we lose a node or two, and we replace disks every month or whenever it fits into our going-to-datacenter schedule - we wanted an OK-ish performing cluster and focused more on total space / price than high-performing hardware. The fantastic thing is we are not locked into any specific hardware, and we can replace any of it if we need to and/or find it is suddenly starting to have issues.

Cheers, Martin

On Sat, Feb 28, 2015 at 2:55 PM, Andrei Mikhailovsky and...@arhont.com wrote: [quoted message snipped]
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
I would say check with a rados tool like ceph_smalliobench/rados bench first to see how much performance those tools are reporting. This will help you to isolate any upstream issues. Also, check with 'iostat -xk 1' for the resource utilisation. Hope you are running with a powerful enough CPU complex, since you are saying the network is not a bottleneck.

Thanks & Regards
Somnath

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of mad Engineer Sent: Saturday, February 28, 2015 12:29 PM To: Alexandre DERUMIER Cc: ceph-users Subject: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

Reinstalled the ceph packages, and now with the memstore backend [osd objectstore = memstore] it's giving 400 kB/s. No idea where the problem is.

On Sun, Mar 1, 2015 at 12:30 AM, mad Engineer themadengin...@gmail.com wrote:

Tried changing the scheduler from deadline to noop; also upgraded to Giant and a btrfs filesystem, and downgraded the kernel to 3.16 from 3.16-3. Not much difference:

dd if=/dev/zero of=hi bs=4k count=25000 oflag=direct
25000+0 records in
25000+0 records out
102400000 bytes (102 MB) copied, 94.691 s, 1.1 MB/s

Earlier on a VMware setup I was getting ~850 kB/s, and now even on physical servers with SSD drives it's just over 1 MB/s. I suspect some serious configuration issue. Tried iperf between the 3 servers; all are showing 9 Gbps. Tried ICMP with different packet sizes: no fragmentation. I also noticed that out of the 9 OSDs, 5 are 850 EVO and 4 are 840 EVO; I believe this will not cause this much of a drop in performance. Thanks for any help.

On Sat, Feb 28, 2015 at 6:49 PM, Alexandre DERUMIER aderum...@odiso.com wrote: [quoted thread snipped]
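Since both MTU fragmentation and latency have been raised as suspects, a quick way to make the "different packet size" ICMP test explicit (the payload size assumes a 9000-byte MTU, and the addresses are the mon hosts from the ceph.conf above - adjust both):

# 8972 = 9000 - 20 (IP header) - 8 (ICMP header); -M do forbids fragmentation
ping -M do -s 8972 -c 5 10.99.10.119

# look for latency outliers between the nodes
ping -i 0.2 -c 100 10.99.10.119 | tail -2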
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
Thanks for the reply Philippe; we were using these disks in our NAS. Now it looks like I am in big trouble :-(

On Sat, Feb 28, 2015 at 5:02 PM, Philippe Schwarz p...@schwarz-fr.net wrote: [quoted reply and original post snipped]
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
Sorry, I saw you have already tried 'rados bench'. So, some points here.

1. If you are considering a write workload, I think with a total of 2 copies and a 4K workload you should be able to get ~4K iops (considering it hitting the disk, not with memstore).
2. You have 9 OSDs, and if you created only one pool with only 450 PGs, you should try to increase that and see whether you get any improvement.
3. Also, you ran rados bench with a very low queue depth; try increasing that, maybe to 32/64. (See the sketch after this list for points 2 and 3.)
4. If you are running Firefly, other optimisations won't work here. But you can add the following to your ceph.conf file and it should give you some boost:

debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_filer = 0/0
debug_objecter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_journaler = 0/0
debug_objectcatcher = 0/0
debug_client = 0/0
debug_osd = 0/0
debug_optracker = 0/0
debug_objclass = 0/0
debug_filestore = 0/0
debug_journal = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0
debug_mon = 0/0
debug_paxos = 0/0
debug_rgw = 0/0

5. Give us the 'ceph -s' output and the iostat output while IO is going on.

Thanks & Regards
Somnath

From: Somnath Roy Sent: Saturday, February 28, 2015 12:59 PM To: 'mad Engineer'; Alexandre DERUMIER Cc: ceph-users Subject: RE: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

[earlier messages in the thread snipped]
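A sketch of points 2 and 3 as concrete commands (the pool name and target PG count are illustrative, not from Somnath's mail; note pg_num can be increased but never decreased):

# point 2: more placement groups on the benched pool
ceph osd pool set rbd pg_num 900
ceph osd pool set rbd pgp_num 900

# point 3: rerun the bench with a deeper queue (-t = concurrent ops; the default is 16)
rados bench -p rbd 60 write -t 64

# and watch per-device utilisation while it runs
iostat -xk 1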
Re: [ceph-users] Shutting down a cluster fully and powering it back up
Sounds good! -Greg

On Sat, Feb 28, 2015 at 10:55 AM, David da...@visions.se wrote:

Hi! I'm about to do maintenance on a Ceph cluster, where we need to shut it all down fully. We're currently only using it for rados block devices for KVM hypervisors. Are these steps sane?

Shutting it down:
1. Shut down all IO to the cluster. Means turning off all clients (KVM hypervisors in our case).
2. Set the cluster to noout by running: ceph osd set noout
3. Shut down the MON nodes.
4. Shut down the OSD nodes.

Starting it up:
1. Start the OSD nodes.
2. Start the MON nodes.
3. Check ceph -w to see the status of ceph and take action if something is wrong.
4. Start up the clients (KVM hypervisors).
5. Run: ceph osd unset noout

Kind Regards, David
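Condensed to the commands involved (this just restates David's list; the only flag it relies on is noout, exactly as in the plan):

ceph osd set noout       # before stopping anything
# stop clients, then mon and osd nodes; do maintenance; power back up
ceph -w                  # watch until PGs settle; only the noout warning should remain
ceph osd unset noout     # once everything is back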
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
Am 28.02.2015 um 19:41 schrieb Kevin Walker: What about the Samsung 845DC Pro SSD's? These have fantastic enterprise performance characteristics. http://www.thessdreview.com/our-reviews/samsung-845dc-pro-review-800gb-class-leading-speed-endurance/ Or use SV843 from Samsung Semiconductor (seperate samsung company). Stefan Kind regards Kevin On 28 February 2015 at 15:32, Philippe Schwarz p...@schwarz-fr.net mailto:p...@schwarz-fr.net wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Le 28/02/2015 12:19, mad Engineer a écrit : Hello All, I am trying ceph-firefly 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all Samsung SSD 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu 14.04 LTS with 3.16-3 kernel.All are connected to 10G ports with maximum MTU.There are no extra disks for journaling and also there are no separate network for replication and data transfer.All 3 nodes are also hosting monitoring process.Operating system runs on SATA disk. When doing a sequential benchmark using dd on RBD, mounted on client as ext4 its taking 110s to write 100Mb data at an average speed of 926Kbps. time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct 25000+0 records in 25000+0 records out 10240 bytes (102 MB) copied, 110.582 s, 926 kB/s real1m50.585s user0m0.106s sys 0m2.233s While doing this directly on ssd mount point shows: time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct 25000+0 records in 25000+0 records out 10240 bytes (102 MB) copied, 1.38567 s, 73.9 MB/s OSDs are in XFS with these extra arguments : rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M ceph.conf [global] fsid = 7d889081-7826-439c-9fe5-d4e57480d9be mon_initial_members = ceph1, ceph2, ceph3 mon_host = 10.99.10.118,10.99.10.119,10.99.10.120 auth_cluster_required = cephx auth_service_required = cephx auth_client_required = cephx filestore_xattr_use_omap = true osd_pool_default_size = 2 osd_pool_default_min_size = 2 osd_pool_default_pg_num = 450 osd_pool_default_pgp_num = 450 max_open_files = 131072 [osd] osd_mkfs_type = xfs osd_op_threads = 8 osd_disk_threads = 4 osd_mount_options_xfs = rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M on our traditional storage with Full SAS disk, same dd completes in 16s with an average write speed of 6Mbps. Rados bench: rados bench -p rbd 10 write Maintaining 16 concurrent writes of 4194304 bytes for up to 10 seconds or 0 objects Object prefix: benchmark_data_ceph1_2977 sec Cur ops started finished avg MB/s cur MB/s last lat avg lat 0 0 0 0 0 0 - 0 1 169478 311.821 312 0.041228 0.140132 2 16 192 176 351.866 392 0.106294 0.175055 3 16 275 259 345.216 332 0.076795 0.166036 4 16 302 286 285.912 108 0.043888 0.196419 5 16 395 379 303.11 372 0.126033 0.207488 6 16 501 485 323.242 424 0.125972 0.194559 7 16 621 605 345.621 480 0.194155 0.183123 8 16 730 714 356.903 436 0.086678 0.176099 9 16 814 798 354.572 336 0.081567 0.174786 10 16 832 816 326.31372 0.037431 0.182355 11 16 833 817 297.013 4 0.533326 0.182784 Total time run: 11.489068 Total writes made: 833 Write size: 4194304 Bandwidth (MB/sec): 290.015 Stddev Bandwidth: 175.723 Max bandwidth (MB/sec): 480 Min bandwidth (MB/sec): 0 Average Latency:0.220582 Stddev Latency: 0.343697 Max latency:2.85104 Min latency:0.035381 Our ultimate aim is to replace existing SAN with ceph,but for that it should meet minimum 8000 iops.Can any one help me with this,OSD are SSD,CPU has good clock speed,backend network is good but still we are not able to extract full capability of SSD disks. 
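(For anyone evaluating drives for this role: the usual community quick test for journal suitability is small synchronous writes, since that is how the journal hits the disk; consumer drives that look fast in ordinary benchmarks often collapse under O_DSYNC. A sketch, with /dev/sdX as a placeholder scratch device; note this writes to the raw device and is destructive:)

    # DESTRUCTIVE: writes raw to the device; /dev/sdX is a placeholder scratch disk.
    dd if=/dev/zero of=/dev/sdX bs=4k count=25000 oflag=direct,dsync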
[ceph-users] RGW hammer/master woes
I am struggling to get a basic PUT through via the swift client with RGW and Ceph binaries built out of the Hammer/master codebase, whereas the same command on the same setup goes through with RGW and Ceph binaries built out of Giant. Find below an RGW log snippet and the command that was run. Am I missing anything obvious here? The user info looks like this:

{ "user_id": "johndoe",
  "display_name": "John Doe",
  "email": "j...@example.com",
  "suspended": 0,
  "max_buckets": 1000,
  "auid": 0,
  "subusers": [
      { "id": "johndoe:swift",
        "permissions": "full-control"}],
  "keys": [
      { "user": "johndoe",
        "access_key": "7B39L2TUQ448LZW4RI3M",
        "secret_key": "lshKCoacSlbyVc7mBLLr4cJ26fEEM22Tcmp29hT3"},
      { "user": "johndoe:swift",
        "access_key": "SHZ64EF7CIB4V42I14AH",
        "secret_key": ""}],
  "swift_keys": [
      { "user": "johndoe:swift",
        "secret_key": "asdf"}],
  "caps": [],
  "op_mask": "read, write, delete",
  "default_placement": "",
  "placement_tags": [],
  "bucket_quota": { "enabled": false,
      "max_size_kb": -1,
      "max_objects": -1},
  "user_quota": { "enabled": false,
      "max_size_kb": -1,
      "max_objects": -1},
  "temp_url_keys": []}

The command that was run and the logs:

<snip>
swift -A http://localhost:8989/auth -U johndoe:swift -K asdf upload mycontainer ceph

2015-02-28 23:28:39.272897 7fb610ff9700 1 ====== starting new request req=0x7fb5f0009990 =====
2015-02-28 23:28:39.272913 7fb610ff9700 2 req 0:0.16::PUT /swift/v1/mycontainer/ceph::initializing
2015-02-28 23:28:39.272918 7fb610ff9700 10 host=localhost:8989
2015-02-28 23:28:39.272921 7fb610ff9700 20 subdomain= domain= in_hosted_domain=0
2015-02-28 23:28:39.272938 7fb610ff9700 10 meta HTTP_X_OBJECT_META_MTIME
2015-02-28 23:28:39.272945 7fb610ff9700 10 x>> x-amz-meta-mtime:1425140933.648506
2015-02-28 23:28:39.272964 7fb610ff9700 10 ver=v1 first=mycontainer req=ceph
2015-02-28 23:28:39.272971 7fb610ff9700 10 s->object=ceph s->bucket=mycontainer
2015-02-28 23:28:39.272976 7fb610ff9700 2 req 0:0.79:swift:PUT /swift/v1/mycontainer/ceph::getting op
2015-02-28 23:28:39.272982 7fb610ff9700 2 req 0:0.85:swift:PUT /swift/v1/mycontainer/ceph:put_obj:authorizing
2015-02-28 23:28:39.273008 7fb610ff9700 10 swift_user=johndoe:swift
2015-02-28 23:28:39.273026 7fb610ff9700 20 build_token token=0d006a6f686e646f653a73776966744436beb90402b13c4f53f35472c2cf0f
2015-02-28 23:28:39.273057 7fb610ff9700 2 req 0:0.000160:swift:PUT /swift/v1/mycontainer/ceph:put_obj:reading permissions
2015-02-28 23:28:39.273100 7fb610ff9700 15 Read AccessControlPolicy<AccessControlPolicy xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>johndoe</ID><DisplayName>John Doe</DisplayName></Owner><AccessControlList><Grant><Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="CanonicalUser"><ID>johndoe</ID><DisplayName>John Doe</DisplayName></Grantee><Permission>FULL_CONTROL</Permission></Grant></AccessControlList></AccessControlPolicy>
2015-02-28 23:28:39.273114 7fb610ff9700 2 req 0:0.000216:swift:PUT /swift/v1/mycontainer/ceph:put_obj:init op
2015-02-28 23:28:39.273120 7fb610ff9700 2 req 0:0.000223:swift:PUT /swift/v1/mycontainer/ceph:put_obj:verifying op mask
2015-02-28 23:28:39.273123 7fb610ff9700 20 required_mask= 2 user.op_mask=7
2015-02-28 23:28:39.273125 7fb610ff9700 2 req 0:0.000228:swift:PUT /swift/v1/mycontainer/ceph:put_obj:verifying op permissions
2015-02-28 23:28:39.273129 7fb610ff9700 5 Searching permissions for uid=johndoe mask=50
2015-02-28 23:28:39.273131 7fb610ff9700 5 Found permission: 15
2015-02-28 23:28:39.273133 7fb610ff9700 5 Searching permissions for group=1 mask=50
2015-02-28 23:28:39.273135 7fb610ff9700 5 Permissions for group not found
2015-02-28 23:28:39.273136 7fb610ff9700 5 Searching permissions for group=2 mask=50
2015-02-28 23:28:39.273137 7fb610ff9700 5 Permissions for group not found
2015-02-28 23:28:39.273138 7fb610ff9700 5 Getting permissions id=johndoe owner=johndoe perm=2
2015-02-28 23:28:39.273140 7fb610ff9700 10 uid=johndoe requested perm (type)=2, policy perm=2, user_perm_mask=2, acl perm=2
2015-02-28 23:28:39.273143 7fb610ff9700 2 req 0:0.000246:swift:PUT /swift/v1/mycontainer/ceph:put_obj:verifying op params
2015-02-28 23:28:39.273146 7fb610ff9700 2 req 0:0.000249:swift:PUT /swift/v1/mycontainer/ceph:put_obj:executing
2015-02-28 23:28:39.273279 7fb610ff9700 10 x>> x-amz-meta-mtime:1425140933.648506
2015-02-28 23:28:39.273313 7fb610ff9700 20 get_obj_state: rctx=0x7fb610ff41f0 obj=mycontainer:ceph state=0x7fb5f0016940 s->prefetch_data=0
2015-02-28 23:28:39.274354 7fb610ff9700 20 get_obj_state: rctx=0x7fb610ff41f0 obj=mycontainer:ceph state=0x7fb5f0016940 s->prefetch_data=0
2015-02-28 23:28:39.274394 7fb610ff9700 10 setting object write_tag=default.14199.0
2015-02-28 23:28:39.274554 7fb610ff9700 20 reading from .rgw:.bucket.meta.mycontainer:default.14199.3
2015-02-28 23:28:39.274574 7fb610ff9700 20 get_obj_state: rctx=0x7fb610ff2ef0 obj=.rgw:.bucket.meta.mycontainer:default.14199.3 state=0x7fb5f001db30
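(For reference, not from the original mail: a subuser and swift key like those in the dump above are typically created with radosgw-admin along these lines, which can help rule out a key-setup mistake when the same PUT works on Giant:)

    # Create the swift subuser and key shown in the user dump above (values from this thread):
    radosgw-admin subuser create --uid=johndoe --subuser=johndoe:swift --access=full
    radosgw-admin key create --subuser=johndoe:swift --key-type=swift --secret=asdf
    radosgw-admin user info --uid=johndoe    # should reproduce the JSON above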
Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?
After noticing that the number increases by 101 on each attempt to start osd.11, I figured I was only 7 iterations away from the output being within 101 of 63675. So, I killed the osd process, started it again, lather, rinse, repeat. I then did the same for the other OSDs. Some created very small logs, and some created logs into the gigabytes. Grepping the latter for update_osd_stat showed me where the maps were up to, and therefore which OSDs needed some special attention. Some of the epoch numbers appeared to increase by themselves to a point and then plateau, after which I'd kill then start the OSD again, and the number would start to increase once more.

After all either showed 63675, or nothing at all, I turned debugging back off, deleted the logs, and tried to bring the cluster back by unsetting noout, nobackfill, norecovery etc. It hasn't got very far before appearing stuck again, with nothing progressing in ceph status. It appears that 11/15 OSDs are now properly up, but four still aren't. A lot of placement groups are stale, so I guess I really need the remaining four to come up. The OSDs in question are 1, 7, 10 and 12. All have a line similar to this as the last in their log:

2015-02-28 10:35:04.240822 7f375ef40780 1 journal _open /var/lib/ceph/osd/ceph-1/journal fd 21: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1

Even with the following in ceph.conf, I'm not seeing anything after that last line in the log:

debug osd = 20
debug filestore = 1

CPU is still being consumed by the ceph-osd process though, but not much memory is being used compared to the other two OSDs which are up on that node. Is there perhaps even further logging that I can use to see why the logs aren't progressing past this point? osd.1 is on /dev/sdb. iostat still shows some activity as the minutes go on, but not much (60-second intervals):

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sdb               5.45         0.00       807.33          0      48440
sdb               5.75         0.00       807.33          0      48440
sdb               5.43         0.00       807.20          0      48440

Thanks,
Chris
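(Editorial aside: the restart-and-watch cycle described above reduces to something like the sketch below. The OSD id, log path and target epoch 63675 are from this thread; the service commands assume Ubuntu upstart, so adjust for your init system.)

    # See which osdmap epoch osd.11 has caught up to:
    grep update_osd_stat /var/log/ceph/ceph-osd.11.log | tail -n 1
    # If the epoch has plateaued below the cluster's current epoch (63675 here),
    # bounce the daemon and watch again:
    stop ceph-osd id=11
    start ceph-osd id=11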
-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Chris Murray
Sent: 27 February 2015 10:32
To: Gregory Farnum
Cc: ceph-users
Subject: Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

A little further logging:

2015-02-27 10:27:15.745585 7fe8e3f2f700 20 osd.11 62839 update_osd_stat osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist [])
2015-02-27 10:27:15.745619 7fe8e3f2f700 5 osd.11 62839 heartbeat: osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist [])
2015-02-27 10:27:23.530913 7fe8e8536700 1 -- 192.168.12.25:6800/673078 --> 192.168.12.25:6789/0 -- mon_subscribe({monmap=6+,osd_pg_creates=0}) v2 -- ?+0 0xe5f26380 con 0xe1f0cc60
2015-02-27 10:27:30.645902 7fe8e3f2f700 20 osd.11 62839 update_osd_stat osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist [])
2015-02-27 10:27:30.645938 7fe8e3f2f700 5 osd.11 62839 heartbeat: osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist [])
2015-02-27 10:27:33.531142 7fe8e8536700 1 -- 192.168.12.25:6800/673078 --> 192.168.12.25:6789/0 -- mon_subscribe({monmap=6+,osd_pg_creates=0}) v2 -- ?+0 0xe5f26540 con 0xe1f0cc60
2015-02-27 10:27:43.531333 7fe8e8536700 1 -- 192.168.12.25:6800/673078 --> 192.168.12.25:6789/0 -- mon_subscribe({monmap=6+,osd_pg_creates=0}) v2 -- ?+0 0xe5f26700 con 0xe1f0cc60
2015-02-27 10:27:45.546275 7fe8e3f2f700 20 osd.11 62839 update_osd_stat osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist [])
2015-02-27 10:27:45.546311 7fe8e3f2f700 5 osd.11 62839 heartbeat: osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist [])
2015-02-27 10:27:53.531564 7fe8e8536700 1 -- 192.168.12.25:6800/673078 --> 192.168.12.25:6789/0 -- mon_subscribe({monmap=6+,osd_pg_creates=0}) v2 -- ?+0 0xe5f268c0 con 0xe1f0cc60
2015-02-27 10:27:56.846593 7fe8e3f2f700 20 osd.11 62839 update_osd_stat osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist [])
2015-02-27 10:27:56.846627 7fe8e3f2f700 5 osd.11 62839 heartbeat: osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist [])
2015-02-27 10:27:57.346965 7fe8e3f2f700 20 osd.11 62839 update_osd_stat osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist [])
2015-02-27 10:27:57.347001 7fe8e3f2f700 5 osd.11 62839 heartbeat: osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist [])
2015-02-27 10:28:03.531785 7fe8e8536700 1 -- 192.168.12.25:6800/673078 --> 192.168.12.25:6789/0 -- mon_subscribe({monmap=6+,osd_pg_creates=0}) v2 -- ?+0 0xe5f26a80 con 0xe1f0cc60
2015-02-27 10:28:13.532027 7fe8e8536700 1 -- 192.168.12.25:6800/673078 --> 192.168.12.25:6789/0 -- mon_subscribe({monmap=6+,osd_pg_creates=0}) v2
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
What about the Samsung 845DC Pro SSDs? These have fantastic enterprise performance characteristics.
http://www.thessdreview.com/our-reviews/samsung-845dc-pro-review-800gb-class-leading-speed-endurance/

Kind regards
Kevin

On 28 February 2015 at 15:32, Philippe Schwarz p...@schwarz-fr.net wrote:
[snip - original benchmark report and the Samsung 840/850 remarks quoted in full]
Apart from that, it seems there was a limitation in Ceph on using the complete bandwidth available in SSDs; but I think with less than 1 MB/s you haven't hit that limit. I remind you that I'm not a Ceph guru (far from it, indeed), so feel free to disagree; I'm on my way to improving my knowledge.

Best regards.
[ceph-users] Shutting down a cluster fully and powering it back up
Hi!

I'm about to do maintenance on a Ceph cluster, where we need to shut it all down fully. We're currently only using it for RADOS block devices to KVM hypervisors. Are these steps sane?

Shutting it down:
1. Stop all IO to the cluster. This means turning off all clients (KVM hypervisors in our case).
2. Set the cluster to noout by running: ceph osd set noout
3. Shut down the MON nodes.
4. Shut down the OSD nodes.

Starting it up:
1. Start the OSD nodes.
2. Start the MON nodes.
3. Check ceph -w to see the status of Ceph and take action if something is wrong.
4. Start up the clients (KVM hypervisors).
5. Run ceph osd unset noout.

Kind Regards,
David
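(For reference, the plan above reduces to a handful of commands around the host power operations; this sketch mirrors the steps as written, without taking a position on the ordering:)

    # Shutdown: stop client IO first (power off the KVM hypervisors), then:
    ceph osd set noout        # keep OSDs from being marked out and triggering rebalancing
    # ...power off the MON nodes, then the OSD nodes.

    # Startup: power on the OSD nodes, then the MON nodes, then:
    ceph -w                   # watch cluster state until health looks sane
    ceph osd unset noout      # restore normal out-marking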
[ceph-users] Booting from journal devices
Hi All,

Thought I would just share this in case someone finds it useful. I've just finished building our new Ceph cluster where the journals are installed on the same SSDs as the OS. The SSDs have MD RAID partitions for the OS and swap, and the rest of each SSD is used for individual journal partitions. The OS is Ubuntu, and as such the default install uses MBR partitions.

After we created the OSDs, the servers were unable to boot anymore. This was caused by ceph-deploy making GPT partitions for the journals on the SSDs; effectively, it looks like it's overwriting the MBR boot record. To fix it I carried out the following steps (see the sketch below):

1. Used gdisk on both SSDs to create a new partition from sector 34 to 2047, of type EF02.
2. Ran grub-install against each SSD device.

Hope that helps someone, if they come across the same problem.

Nick
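(A non-interactive equivalent of those two steps, assuming the SSDs are /dev/sda and /dev/sdb; the device names and partition number 128 are placeholders:)

    # Create a BIOS boot partition (GPT type EF02) in the gap before sector 2048,
    # then reinstall GRUB so the machine can boot from the GPT-labelled disk again:
    for dev in /dev/sda /dev/sdb; do
        sgdisk --new=128:34:2047 --typecode=128:ef02 "$dev"
        grub-install "$dev"
    done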
[ceph-users] New Cluster - Any requests?
Hi All,

I've just finished building a new POC cluster comprising 4 hosts in 1 chassis (http://www.supermicro.com/products/system/4U/F617/SYS-F617H6-FTPT_.cfm), each with the following:

- 2x Xeon 2620 v2 (2.1GHz)
- 32GB RAM
- 2x onboard 10GB-T into 10GB switches
- 10x 3TB WD Red Pro disks (currently in a k=3 m=3 EC pool, so 55TB usable)
- 2x 100GB S3700 SSDs for journals and OS
- 1x 400GB S3700 SSD for the SSD cache tier
- Ubuntu 14.04.2 (3.16 kernel)
- Running Ceph 0.87.1

It's currently in testing whilst I get iSCSI over RBD working to a state I'm happy with. As a very rough idea of performance from the SSD tier, I'm seeing about 10K writes and 40K reads of 4kb at a queue depth of 32. During these benches the total CPU of each host is at about 80%, and this is just 1 SSD OSD per host, remember. Idle power usage is around 500-600W.

I'm intending to post some performance numbers for the individual components and RBD performance within the next couple of weeks, but if anybody has any requests for me to carry out any tests or changes whilst it's in testing, please let me know. I'm happy to create new pools and carry out config changes, but nothing that will result in me rebuilding the cluster from scratch.

Nick
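(Figures like the 4kb/QD32 numbers above are commonly produced with fio's rbd engine; a sketch, where the pool name "rbd" and a pre-created test image "fiotest" are assumptions, not details from the original mail:)

    # 4k random writes at iodepth 32 against an RBD image, as a rough IOPS probe:
    fio --name=rbd-4k-randwrite --ioengine=rbd --clientname=admin --pool=rbd \
        --rbdname=fiotest --rw=randwrite --bs=4k --iodepth=32 --direct=1 \
        --runtime=60 --time_based --group_reporting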
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
Tried changing the scheduler from deadline to noop, also upgraded to Giant and a btrfs filesystem, and downgraded the kernel to 3.16 from 3.16-3. Not much difference:

dd if=/dev/zero of=hi bs=4k count=25000 oflag=direct
25000+0 records in
25000+0 records out
102400000 bytes (102 MB) copied, 94.691 s, 1.1 MB/s

Earlier on a VMware setup I was getting ~850 KB/s, and now even on a physical server with SSD drives it's just over 1 MB/s. I suspect some serious configuration issue. Tried iperf between the 3 servers, all showing 9 Gbps; tried ICMP with different packet sizes, no fragmentation. I also noticed that out of the 9 OSDs, 5 are 850 EVO and 4 are 840 EVO; I believe this should not cause this much of a drop in performance.

Thanks for any help.

On Sat, Feb 28, 2015 at 6:49 PM, Alexandre DERUMIER aderum...@odiso.com wrote:

As an optimisation, try setting the IO scheduler to noop, and also enable rbd_cache=true (it really helps for sequential writes). But your results seem quite low: 926 kB/s with 4k is only ~200 IO/s. Check that you don't have any big network latencies or an MTU fragmentation problem. Maybe also try to bench with fio, with more parallel jobs.

----- Original message -----
From: mad Engineer themadengin...@gmail.com
To: Philippe Schwarz p...@schwarz-fr.net
Cc: ceph-users ceph-users@lists.ceph.com
Sent: Saturday 28 February 2015 13:06:59
Subject: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

Thanks for the reply Philippe; we were using these disks in our NAS. Now it looks like I am in big trouble :-(

On Sat, Feb 28, 2015 at 5:02 PM, Philippe Schwarz p...@schwarz-fr.net wrote:
[snip - original benchmark report quoted in full]
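(A sketch of the two tweaks Alexandre suggests; the device name is a placeholder, and rbd cache is a client-side option:)

    # Switch the elevator for an OSD data disk to noop:
    echo noop > /sys/block/sdb/queue/scheduler

    # And in ceph.conf on the client side:
    [client]
    rbd cache = true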
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
Reinstalled the Ceph packages, and now with the memstore backend [osd objectstore = memstore] it's giving 400 KB/s. No idea where the problem is.

On Sun, Mar 1, 2015 at 12:30 AM, mad Engineer themadengin...@gmail.com wrote:
[snip - previous message and quoted thread repeated in full]
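(For anyone reproducing the memstore sanity check above: it replaces the on-disk object store with RAM, so if throughput is still ~400 KB/s the bottleneck is somewhere above the disks, i.e. network, client, or the OSD code path. The config used, per the message:)

    [osd]
    # Test-only: objects live in memory and are lost on restart.
    osd objectstore = memstore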
Re: [ceph-users] SSD selection
On Sat, 28 Feb 2015 20:42:35 -0600 Tony Harris wrote:
[snip - question quoted in full]

Firstly, an uneven number of OSDs (HDDs) per node will bite you in the proverbial behind down the road when combined with journal SSDs, as one of those SSDs will wear out faster than the other.

Secondly, how many SSDs you need is basically a trade-off between price, performance, endurance and limiting failure impact. I have a cluster where I used 4x 100GB DC S3700s with 8 HDD OSDs, optimizing the write paths, IOPS and failure domain, but not the sequential speed or cost.

Depending on what your write load is and the expected lifetime of this cluster, you might be able to get away with DC S3500s, or better yet the new DC S3610s. Keep in mind that buying a cheap, low-endurance SSD now might cost you more down the road if you have to replace it after a year (TBW/$). All the cheap alternatives to DC-level SSDs tend to wear out too fast, have no power caps, and tend to have unpredictable (caused by garbage collection) and steadily decreasing performance.

Christian

--
Christian Balzer           Network/Systems Engineer
ch...@gol.com              Global OnLine Japan/Fusion Communications
http://www.gol.com/
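(To make the TBW/$ point concrete, a back-of-the-envelope comparison with purely illustrative figures; check real datasheets and street prices. A DC-class drive rated for 1,800 TBW at $300 gives 1800/300 = 6 TBW per dollar, while a consumer drive rated for 75 TBW at $100 gives 75/100 = 0.75 TBW per dollar: the "cheap" drive costs roughly 8x more per byte of rated write endurance.)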
Re: [ceph-users] Mail not reaching the list?
Your mail landed in Spam. Here is the message from Google:

"Why is this message in Spam? It has a from address in yahoo.com but has failed yahoo.com's required tests for authentication." (Learn more: https://support.google.com/mail/answer/1366858?hl=en&expand=5)

Regards,
Sudarshan Pathak

On Sat, Feb 28, 2015 at 9:25 PM, Tony Harris kg4...@yahoo.com wrote:
Hi, I've sent a couple of emails to the list since subscribing, but I've never seen them reach the list; I was just wondering if there was something wrong?
[ceph-users] Am I reaching the list now?
I was subscribed with a Yahoo email address, but it was getting some grief, so I decided to try using my Gmail address; hopefully this one is working.

-Tony
[ceph-users] SSD selection
Hi all,

I have a small cluster together and it's running fairly well (3 nodes, 21 OSDs). I'm looking to improve the write performance a bit, though, which I was hoping that using SSDs for journals would do. I was wondering what people had as recommendations for SSDs to act as journal drives. If I read the docs on ceph.com correctly, I'll need 2 SSDs per node (with 7 drives in each node; I think the recommendation was 1 SSD per 4-5 drives?), so I'm looking for drives that will work well without breaking the bank for where I work (I'll probably have to purchase them myself and donate them, so my budget is somewhat small). Any suggestions? I'd prefer one that can finish its writes in a power-outage case; the only one I know of offhand is the Intel DC S3700 I think, but at $300 it's WAY above my affordability range.
Re: [ceph-users] Booting from journal devices
Hello,

On Sat, 28 Feb 2015 18:47:14 -0000 Nick Fisk wrote:

> Hi All, Thought I would just share this in case someone finds it useful.
> I've just finished building our new Ceph cluster where the journals are
> installed on the same SSDs as the OS. The SSDs have MD RAID partitions
> for the OS and swap, and the rest of the SSDs are used for individual
> journal partitions. The OS is Ubuntu, and as such the default install is
> using MBR partitions.

When you say MBR, do you mean DOS partitions?

> After we created the OSDs, the servers were unable to boot anymore. This
> was caused by ceph-deploy making GPT partitions for the journals on the
> SSDs; effectively what it looks like is that it's overwriting the MBR
> boot record.

Another reason I dislike these black-box, "we know better what you want" approaches.

Which version of ceph-deploy was this? Did you create the journal partitions beforehand? Because I have a cluster very much like it, and ceph-deploy v1.5.7 did not touch the OS/journal SSDs other than initializing the journals (the partitions had been created with fdisk before).

Christian

[snip - Nick's fix steps quoted in full above]

--
Christian Balzer           Network/Systems Engineer
ch...@gol.com              Global OnLine Japan/Fusion Communications
http://www.gol.com/
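(For anyone pre-creating journal partitions as Christian describes, so the deploy tooling has nothing left to repartition, a sketch with sgdisk; the partition number and 10G size are placeholders, and the type code is the GUID conventionally used for Ceph journal partitions:)

    # Create a journal partition that ceph-deploy/ceph-disk can adopt as-is:
    sgdisk --new=4:0:+10G \
           --typecode=4:45b0969e-9b03-4f30-b4c6-b4b80ceff106 \
           --change-name=4:'ceph journal' /dev/sda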
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
I am reinstalling Ceph with the Giant release and will soon update results with the above configuration changes. My servers are Cisco UCS C200 M1 with the integrated Intel ICH10R SATA controller. Before installing Ceph, I changed it to use software RAID, quoting from the link below ["When using the integrated RAID, you must enable the ICH10R controller in SW RAID mode"]:
http://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/c/hw/C200M1/install/c200M1/RAID.html#88713

Not sure whether this is the problem. Without Ceph, Linux gives better results with this controller and SSD disks; with Ceph on top, results are slower than SATA disks. Thanks for all your support.

On Sun, Mar 1, 2015 at 3:07 AM, Somnath Roy somnath@sandisk.com wrote:

Sorry, I saw you have already tried 'rados bench'. So, some points here:

1. If you are considering a write workload, with a total of 2 copies and a 4K workload you should be able to get ~4K IOPS (considering it hits the disk, not with memstore).
2. You have 9 OSDs, and if you created only one pool with only 450 PGs, you should try increasing that and see whether you get any improvement.
3. Also, you ran the rados bench script with a very low queue depth; try increasing that, maybe to 32/64.
4. If you are running Firefly, other optimizations won't work here. But you can add the following to your ceph.conf file, and it should give you some boost:

debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_filer = 0/0
debug_objecter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_journaler = 0/0
debug_objectcatcher = 0/0
debug_client = 0/0
debug_osd = 0/0
debug_optracker = 0/0
debug_objclass = 0/0
debug_filestore = 0/0
debug_journal = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0
debug_mon = 0/0
debug_paxos = 0/0
debug_rgw = 0/0

5. Give us the 'ceph -s' output and the iostat output while IO is going on.

Thanks & Regards
Somnath

From: Somnath Roy
Sent: Saturday, February 28, 2015 12:59 PM
To: 'mad Engineer'; Alexandre DERUMIER
Cc: ceph-users
Subject: RE: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

I would say check with a rados tool like ceph_smalliobench/rados bench first to see how much performance these tools report. This will help you isolate any upstream issues. Also, check with 'iostat -xk 1' for the resource utilization. Hope you are running with a powerful enough CPU complex, since you are saying the network is not a bottleneck.

Thanks & Regards
Somnath

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of mad Engineer
Sent: Saturday, February 28, 2015 12:29 PM
To: Alexandre DERUMIER
Cc: ceph-users
Subject: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

Reinstalled the Ceph packages, and now with the memstore backend [osd objectstore = memstore] it's giving 400 KB/s. No idea where the problem is.
On Sun, Mar 1, 2015 at 12:30 AM, mad Engineer themadengin...@gmail.com wrote:
[snip - earlier messages in the thread quoted in full]