Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
Can you run the fio test again, but with a queue depth of 32? This will probably show what your cluster is capable of. Adding more nodes with SSDs will probably help scale, but only at higher IO depths. At low queue depths you are probably already at the limit, as per my earlier email.

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of mad Engineer
Sent: 09 March 2015 17:23
To: Nick Fisk
Cc: ceph-users
Subject: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

Thank you Nick for explaining the problem with 4k writes. The queue depth used in this setup is 256, the maximum supported. Can you clarify that adding more nodes will not increase IOPS? In general, how do we increase the IOPS of a Ceph cluster? Thanks for your help.

On Sat, Mar 7, 2015 at 5:57 PM, Nick Fisk wrote:

You are hitting serial latency limits. For a 4k sync write to happen it has to:

1. Travel across the network from the client to the primary OSD
2. Be processed by Ceph
3. Get written to the primary OSD
4. Ack travels across the network back to the client

At 4k these 4 steps take up a very high percentage of the actual processing time as compared to the actual write to the SSD. Apart from faster (more GHz) CPUs, which will improve step 2, there's not much that can be done. Future Ceph releases may improve step 2 as well, but I wouldn't imagine it will change dramatically.

Replication level >1 will also see the IOPS drop, as you are introducing yet more Ceph processing and network delays. Unless a future Ceph feature can be implemented where it returns the ack to the client once data has hit the 1st OSD.

Still, 1000 IOPS is not that bad. You mention it needs to achieve 8000 IOPS to replace your existing SAN; at what queue depth is this required? You are getting way above that at a queue depth of only 16.

I doubt most Ethernet-based enterprise SANs would be able to provide 8000 IOPS at a queue depth of 1, as network delays alone would be limiting you to around that figure. A network delay of 0.1 ms will limit you to 10,000 IOPS, 0.2 ms = 5,000 IOPS, and so on.

If you really do need pure SSD performance for a certain client you will need to move the SSD local to it using some sort of caching software running on the client, although this can bring its own challenges.

Nick

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> mad Engineer
> Sent: 07 March 2015 10:55
> To: Somnath Roy
> Cc: ceph-users
> Subject: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9
> OSD with 3.16-3 kernel
>
> Update:
>
> Hardware:
> Upgraded RAID controller to LSI MegaRAID 9341 - 12 Gbps
> 3x Samsung 840 EVO - were showing 45K IOPS for a fio test with 7 threads and 4k
> block size in JBOD mode
> CPU - 16 cores @ 2.27 GHz
> RAM - 24 GB
> NIC - 10 Gbit with under 1 ms latency, iperf shows 9.18 Gbps between host
> and client
>
> Software:
> Ubuntu 14.04 with stock kernel 3.13
> Upgraded from firefly to giant [ceph version 0.87.1
> (283c2e7cfa2457799f534744d7d549f83ea1335e)]
> Changed the file system to btrfs and the I/O scheduler to noop.
>
> Ceph Setup:
> Replication set to 1, using 2 SSD OSDs and 1 SSD for the journal. All are Samsung 840
> EVO in JBOD mode on a single server.
> > Configuration: > [global] > fsid = 979f32fc-6f31-43b0-832f-29fcc4c5a648 > mon_initial_members = ceph1 > mon_host = 10.99.10.118 > auth_cluster_required = cephx > auth_service_required = cephx > auth_client_required = cephx > filestore_xattr_use_omap = true > osd_pool_default_size = 1 > osd_pool_default_min_size = 1 > osd_pool_default_pg_num = 250 > osd_pool_default_pgp_num = 250 > debug_lockdep = 0/0 > debug_context = 0/0 > debug_crush = 0/0 > debug_buffer = 0/0 > debug_timer = 0/0 > debug_filer = 0/0 > debug_objecter = 0/0 > debug_rados = 0/0 > debug_rbd = 0/0 > debug_journaler = 0/0 > debug_objectcatcher = 0/0 > debug_client = 0/0 > debug_osd = 0/0 > debug_optracker = 0/0 > debug_objclass = 0/0 > debug_filestore = 0/0 > debug_journal = 0/0 > debug_ms = 0/0 > debug_monc = 0/0 > debug_tp = 0/0 > debug_auth = 0/0 > debug_finisher = 0/0 > debug_heartbeatmap = 0/0 > debug_perfcounter = 0/0 > debug_asok = 0/0 > debug_throttle = 0/0 > debug_mon = 0/0 > debug_paxos = 0/0 > debug_rgw = 0/0 > > [client] > rbd_cache = true > > Client > Ubuntu 14.04 with 16 Core @2.53 Ghz and 24G RAM > > Results > rados bench -p rdp -b 4096 -t 16 10 write > > rados bench -p rbd -b 4096 -t 16 10 write > Maintaining 16 concurrent writes of 4096 bytes for up to 10 seconds or 0 > objects > Object prefix: benchmark_data_ubuntucompute_3931 >sec Cur ops started finished avg MB/s cur MB/s last lat avg lat > 0 0 0 0
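For reference, a fio run along the lines Nick asks for (queue depth 32, 4k random writes against the mounted RBD) might look like the sketch below; the test file path, size and runtime are illustrative, not taken from the thread:

fio --name=rbd-4k-qd32 --filename=/mnt/rbd/fio-test --size=1G \
    --rw=randwrite --bs=4k --ioengine=libaio --direct=1 \
    --iodepth=32 --numjobs=1 --runtime=60 --time_based --group_reporting

With 32 I/Os in flight the per-request latencies overlap, so aggregate IOPS should rise well above the queue-depth-1 figures discussed in this thread.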
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
Thank you Nick for explaining the problem with 4k writes.Queue depth used in this setup is 256 the maximum supported. Can you clarify that adding more nodes will not increase iops.In general how will we increase iops of a ceph cluster. Thanks for your help On Sat, Mar 7, 2015 at 5:57 PM, Nick Fisk wrote: > You are hitting serial latency limits. For a 4kb sync write to happen it > has to:- > > 1. Travel across network from client to Primary OSD > 2. Be processed by Ceph > 3. Get Written to Pri OSD > 4. Ack travels across network to client > > At 4kb these 4 steps take up a very high percentage of the actual > processing time as compared to the actual write to the SSD. Apart from > faster (more ghz) CPU's which will improve step 2, there's not much that > can be done. Future Ceph releases may improve step2 as well, but I wouldn't > imagine it will change dramitcally. > > Replication level >1 will also see the IOPs drop as you are introducing > yet more ceph processing and network delays. Unless a future Ceph feature > can be implemented where it returns the ack to client once data has hit the > 1st OSD. > > Still a 1000 iops, is not that bad. You mention it needs to achieve 8000 > iops to replace your existing SAN, at what queue depth is this required? > You are getting way above that at a queue depth of only 16. > > I doubt most Ethernet based enterprise SANs would be able to provide 8000 > iops at a queue depth of 1, as just network delays would be limiting you to > around that figure. A network delay of .1ms will limit you to 10,000 IOPs, > .2ms = 5000IOPs and so on. > > If you really do need pure SSD performance for a certain client you will > need to move the SSD local to it using some sort of caching software > running on the client , although this can bring its own challenges. > > Nick > > > -Original Message- > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > > mad Engineer > > Sent: 07 March 2015 10:55 > > To: Somnath Roy > > Cc: ceph-users > > Subject: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes > and 9 > > OSD with 3.16-3 kernel > > > > Update: > > > > Hardware: > > Upgraded RAID controller to LSI Megaraid 9341 -12Gbps > > 3 Samsung 840 EVO - was showing 45K iops for fio test with 7 threads and > 4k > > block size in JBOD mode > > CPU- 16 cores @2.27Ghz > > RAM- 24Gb > > NIC- 10Gbits with under 1 ms latency, iperf shows 9.18 Gbps between host > > and client > > > > Software > > Ubuntu 14.04 with stock kernel 3.13- > > Upgraded from firefly to giant [ceph version 0.87.1 > > (283c2e7cfa2457799f534744d7d549f83ea1335e)] > > Changed file system to btrfs and i/o scheduler to noop. > > > > Ceph Setup > > replication to 1 and using 2 SSD OSD and 1 SSD for Journal.All are > samsung 840 > > EVO in JBOD mode on single server. 
> > > > Configuration: > > [global] > > fsid = 979f32fc-6f31-43b0-832f-29fcc4c5a648 > > mon_initial_members = ceph1 > > mon_host = 10.99.10.118 > > auth_cluster_required = cephx > > auth_service_required = cephx > > auth_client_required = cephx > > filestore_xattr_use_omap = true > > osd_pool_default_size = 1 > > osd_pool_default_min_size = 1 > > osd_pool_default_pg_num = 250 > > osd_pool_default_pgp_num = 250 > > debug_lockdep = 0/0 > > debug_context = 0/0 > > debug_crush = 0/0 > > debug_buffer = 0/0 > > debug_timer = 0/0 > > debug_filer = 0/0 > > debug_objecter = 0/0 > > debug_rados = 0/0 > > debug_rbd = 0/0 > > debug_journaler = 0/0 > > debug_objectcatcher = 0/0 > > debug_client = 0/0 > > debug_osd = 0/0 > > debug_optracker = 0/0 > > debug_objclass = 0/0 > > debug_filestore = 0/0 > > debug_journal = 0/0 > > debug_ms = 0/0 > > debug_monc = 0/0 > > debug_tp = 0/0 > > debug_auth = 0/0 > > debug_finisher = 0/0 > > debug_heartbeatmap = 0/0 > > debug_perfcounter = 0/0 > > debug_asok = 0/0 > > debug_throttle = 0/0 > > debug_mon = 0/0 > > debug_paxos = 0/0 > > debug_rgw = 0/0 > > > > [client] > > rbd_cache = true > > > > Client > > Ubuntu 14.04 with 16 Core @2.53 Ghz and 24G RAM > > > > Results > > rados bench -p rdp -b 4096 -t 16 10 write > > > > rados bench -p rbd -b 4096 -t 16 10 write > > Maintaining 16 concurrent writes of 4096 bytes for up to 10 seconds or 0 > > objects > > Object prefix: benchmark_data_ubuntucompute_3931 > >sec Cur ops started finished avg MB/s cu
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
You are hitting serial latency limits. For a 4kb sync write to happen it has to:- 1. Travel across network from client to Primary OSD 2. Be processed by Ceph 3. Get Written to Pri OSD 4. Ack travels across network to client At 4kb these 4 steps take up a very high percentage of the actual processing time as compared to the actual write to the SSD. Apart from faster (more ghz) CPU's which will improve step 2, there's not much that can be done. Future Ceph releases may improve step2 as well, but I wouldn't imagine it will change dramitcally. Replication level >1 will also see the IOPs drop as you are introducing yet more ceph processing and network delays. Unless a future Ceph feature can be implemented where it returns the ack to client once data has hit the 1st OSD. Still a 1000 iops, is not that bad. You mention it needs to achieve 8000 iops to replace your existing SAN, at what queue depth is this required? You are getting way above that at a queue depth of only 16. I doubt most Ethernet based enterprise SANs would be able to provide 8000 iops at a queue depth of 1, as just network delays would be limiting you to around that figure. A network delay of .1ms will limit you to 10,000 IOPs, .2ms = 5000IOPs and so on. If you really do need pure SSD performance for a certain client you will need to move the SSD local to it using some sort of caching software running on the client , although this can bring its own challenges. Nick > -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > mad Engineer > Sent: 07 March 2015 10:55 > To: Somnath Roy > Cc: ceph-users > Subject: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 > OSD with 3.16-3 kernel > > Update: > > Hardware: > Upgraded RAID controller to LSI Megaraid 9341 -12Gbps > 3 Samsung 840 EVO - was showing 45K iops for fio test with 7 threads and 4k > block size in JBOD mode > CPU- 16 cores @2.27Ghz > RAM- 24Gb > NIC- 10Gbits with under 1 ms latency, iperf shows 9.18 Gbps between host > and client > > Software > Ubuntu 14.04 with stock kernel 3.13- > Upgraded from firefly to giant [ceph version 0.87.1 > (283c2e7cfa2457799f534744d7d549f83ea1335e)] > Changed file system to btrfs and i/o scheduler to noop. > > Ceph Setup > replication to 1 and using 2 SSD OSD and 1 SSD for Journal.All are samsung 840 > EVO in JBOD mode on single server. 
>
> Configuration:
> [global]
> fsid = 979f32fc-6f31-43b0-832f-29fcc4c5a648
> mon_initial_members = ceph1
> mon_host = 10.99.10.118
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
> osd_pool_default_size = 1
> osd_pool_default_min_size = 1
> osd_pool_default_pg_num = 250
> osd_pool_default_pgp_num = 250
> debug_lockdep = 0/0
> debug_context = 0/0
> debug_crush = 0/0
> debug_buffer = 0/0
> debug_timer = 0/0
> debug_filer = 0/0
> debug_objecter = 0/0
> debug_rados = 0/0
> debug_rbd = 0/0
> debug_journaler = 0/0
> debug_objectcatcher = 0/0
> debug_client = 0/0
> debug_osd = 0/0
> debug_optracker = 0/0
> debug_objclass = 0/0
> debug_filestore = 0/0
> debug_journal = 0/0
> debug_ms = 0/0
> debug_monc = 0/0
> debug_tp = 0/0
> debug_auth = 0/0
> debug_finisher = 0/0
> debug_heartbeatmap = 0/0
> debug_perfcounter = 0/0
> debug_asok = 0/0
> debug_throttle = 0/0
> debug_mon = 0/0
> debug_paxos = 0/0
> debug_rgw = 0/0
>
> [client]
> rbd_cache = true
>
> Client
> Ubuntu 14.04 with 16 Core @2.53 Ghz and 24G RAM
>
> Results
> rados bench -p rdp -b 4096 -t 16 10 write
>
> rados bench -p rbd -b 4096 -t 16 10 write
> Maintaining 16 concurrent writes of 4096 bytes for up to 10 seconds or 0 objects
> Object prefix: benchmark_data_ubuntucompute_3931
>   sec Cur ops   started  finished  avg MB/s  cur MB/s   last lat     avg lat
>     0       0         0         0         0         0          -           0
>     1      16      6370      6354   24.8124   24.8203   0.00221  0.00251512
>     2      16     11618     11602   22.6536      20.5  0.001025  0.00275493
>     3      16     16889     16873   21.9637   20.5898  0.001288  0.00281797
>     4      16     17310     17294    16.884   1.64453  0.054066  0.00365805
>     5      16     17695     17679    13.808   1.50391  0.001451      0.0009
>     6      16     18127     18111   11.7868    1.6875  0.001463  0.00527521
>     7      16     21647     21631   12.0669     13.75  0.001601   0.0051773
>     8      16     28056     28040   13.6872   25.0352  0.005268  0.00456353
>     9      16     28947     28931    12.553   3.48047   0.06647  0.00494762
>    10      16     29346     29330   11.4536   1.55859  0.001341   0.0054231
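Nick's latency ceiling can be sanity-checked with simple arithmetic: for synchronous small writes, achievable IOPS is roughly the queue depth divided by the average round-trip latency.

max IOPS ≈ queue depth / average latency
0.1 ms latency, QD 1:  1 / 0.0001 s = 10,000 IOPS
0.2 ms latency, QD 1:  1 / 0.0002 s =  5,000 IOPS
~5.4 ms avg latency, QD 16 (the 4k rados bench above): 16 / 0.0054 s ≈ 2,960 writes/s, which matches the roughly 2,900 writes/s that run sustained over 10 seconds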
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
], | 99.00th=[ 1928], 99.50th=[ 1992], 99.90th=[ 2160], 99.95th=[ 2288], | 99.99th=[39168] bw (KB /s): min= 54, max= 3568, per=64.10%, avg=2529.43, stdev=315.56 lat (usec) : 750=0.07%, 1000=2.53% lat (msec) : 2=96.96%, 4=0.43%, 50=0.01%, >=2000=0.01% cpu : usr=0.51%, sys=2.04%, ctx=73550, majf=0, minf=93 IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued: total=r=0/w=73234/d=0, short=r=0/w=0/d=0 Run status group 0 (all jobs): WRITE: io=292936KB, aggrb=3946KB/s, minb=3946KB/s, maxb=3946KB/s, mint=74236msec, maxt=74236msec Disk stats (read/write): rbd0: ios=186/73232, merge=0/0, ticks=120/109676, in_queue=143448, util=100.00% How can i improve performance of 4k write? Will adding more Nodes improve this Thanks for any help On Sun, Mar 1, 2015 at 3:07 AM, Somnath Roy wrote: > Sorry, I saw you have already tried with ‘rados bench’. So, some points > here. > > > > 1. If you are considering write workload, I think with total of 2 copies > and with 4K workload , you should be able to get ~4K iops (considering it > hitting the disk, not with memstore). > > > > 2. You are having 9 OSDs and if you created only one pool with only 450 > PGS, you should try to increase that and see if getting any improvement or > not. > > > > 3. Also, the rados bench script you ran with very low QD, try increasing > that, may be 32/64. > > > > 4. If you are running firefly, other optimization won’t work here..But, > you can add the following in your ceph.conf file and it should give you > some boost. > > > > debug_lockdep = 0/0 > > debug_context = 0/0 > > debug_crush = 0/0 > > debug_buffer = 0/0 > > debug_timer = 0/0 > > debug_filer = 0/0 > > debug_objecter = 0/0 > > debug_rados = 0/0 > > debug_rbd = 0/0 > > debug_journaler = 0/0 > > debug_objectcatcher = 0/0 > > debug_client = 0/0 > > debug_osd = 0/0 > > debug_optracker = 0/0 > > debug_objclass = 0/0 > > debug_filestore = 0/0 > > debug_journal = 0/0 > > debug_ms = 0/0 > > debug_monc = 0/0 > > debug_tp = 0/0 > > debug_auth = 0/0 > > debug_finisher = 0/0 > > debug_heartbeatmap = 0/0 > > debug_perfcounter = 0/0 > > debug_asok = 0/0 > > debug_throttle = 0/0 > > debug_mon = 0/0 > > debug_paxos = 0/0 > > debug_rgw = 0/0 > > > > 5. Give us the ceph –s output and the iostat output while io is going on. > > > > Thanks & Regards > > Somnath > > > > > > > > *From:* Somnath Roy > *Sent:* Saturday, February 28, 2015 12:59 PM > *To:* 'mad Engineer'; Alexandre DERUMIER > *Cc:* ceph-users > *Subject:* RE: [ceph-users] Extreme slowness in SSD cluster with 3 nodes > and 9 OSD with 3.16-3 kernel > > > > I would say check with rados tool like ceph_smalliobench/rados bench first > to see how much performance these tools are reporting. This will help you > to isolate any upstream issues. > > Also, check with ‘iostat –xk 1’ for the resource utilization. Hope you are > running with powerful enough cpu complex since you are saying network is > not a bottleneck. 
> > > > Thanks & Regards > > Somnath > > > > *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf > Of *mad Engineer > *Sent:* Saturday, February 28, 2015 12:29 PM > *To:* Alexandre DERUMIER > *Cc:* ceph-users > *Subject:* Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes > and 9 OSD with 3.16-3 kernel > > > > reinstalled ceph packages and now with memstore backend [osd objectstore > =memstore] its giving 400Kbps .No idea where the problem is. > > > > On Sun, Mar 1, 2015 at 12:30 AM, mad Engineer > wrote: > > tried changing scheduler from deadline to noop also upgraded to Gaint and > btrfs filesystem,downgraded kernel to 3.16 from 3.16-3 not much difference > > > > dd if=/dev/zero of=hi bs=4k count=25000 oflag=direct > > 25000+0 records in > > 25000+0 records out > > 10240 bytes (102 MB) copied, 94.691 s, 1.1 MB/s > > > > Earlier on a vmware setup i was getting ~850 KBps and now even on physical > server with SSD drives its just over 1MBps.I doubt some serious > configuration issues. > > > > Tried iperf between 3 servers all are showing 9 Gbps,tried icmp with > different packet size ,no fragmentation. > > > > i also noticed that out of 9 osd 5 are 850 EVO and 4 are 840 EVO.I believe > this will not cause this much drop in performance. > &g
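Somnath's points 2 and 3 translate into commands roughly like these; the pool name matches the thread, but the PG count of 1024 is only an illustrative target and should be sized to the number of OSDs:

ceph osd pool set rbd pg_num 1024
ceph osd pool set rbd pgp_num 1024
rados bench -p rbd -b 4096 -t 32 10 write

pgp_num has to be raised to match pg_num before data actually rebalances onto the new placement groups.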
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
I am re installing ceph with giant release,will soon update results with above configuration changes. my servers are Cisco UCS C 200 M1 with Integrated Intel ICH10R SATA controller.Before installing ceph i changed it to use Software RAID quoting from below link [When using the integrated RAID, you must enable the ICH10R controller in SW RAID mode] http://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/c/hw/C200M1/install/c200M1/RAID.html#88713 Not sure this is the problem.With out ceph ,linux is giving better results with this controller and SSD disks. WIth ceph over it results are slower than SATA disks. Thanks for all your support On Sun, Mar 1, 2015 at 3:07 AM, Somnath Roy wrote: > Sorry, I saw you have already tried with ‘rados bench’. So, some points > here. > > > > 1. If you are considering write workload, I think with total of 2 copies > and with 4K workload , you should be able to get ~4K iops (considering it > hitting the disk, not with memstore). > > > > 2. You are having 9 OSDs and if you created only one pool with only 450 > PGS, you should try to increase that and see if getting any improvement or > not. > > > > 3. Also, the rados bench script you ran with very low QD, try increasing > that, may be 32/64. > > > > 4. If you are running firefly, other optimization won’t work here..But, > you can add the following in your ceph.conf file and it should give you > some boost. > > > > debug_lockdep = 0/0 > > debug_context = 0/0 > > debug_crush = 0/0 > > debug_buffer = 0/0 > > debug_timer = 0/0 > > debug_filer = 0/0 > > debug_objecter = 0/0 > > debug_rados = 0/0 > > debug_rbd = 0/0 > > debug_journaler = 0/0 > > debug_objectcatcher = 0/0 > > debug_client = 0/0 > > debug_osd = 0/0 > > debug_optracker = 0/0 > > debug_objclass = 0/0 > > debug_filestore = 0/0 > > debug_journal = 0/0 > > debug_ms = 0/0 > > debug_monc = 0/0 > > debug_tp = 0/0 > > debug_auth = 0/0 > > debug_finisher = 0/0 > > debug_heartbeatmap = 0/0 > > debug_perfcounter = 0/0 > > debug_asok = 0/0 > > debug_throttle = 0/0 > > debug_mon = 0/0 > > debug_paxos = 0/0 > > debug_rgw = 0/0 > > > > 5. Give us the ceph –s output and the iostat output while io is going on. > > > > Thanks & Regards > > Somnath > > > > > > > > *From:* Somnath Roy > *Sent:* Saturday, February 28, 2015 12:59 PM > *To:* 'mad Engineer'; Alexandre DERUMIER > *Cc:* ceph-users > *Subject:* RE: [ceph-users] Extreme slowness in SSD cluster with 3 nodes > and 9 OSD with 3.16-3 kernel > > > > I would say check with rados tool like ceph_smalliobench/rados bench first > to see how much performance these tools are reporting. This will help you > to isolate any upstream issues. > > Also, check with ‘iostat –xk 1’ for the resource utilization. Hope you are > running with powerful enough cpu complex since you are saying network is > not a bottleneck. > > > > Thanks & Regards > > Somnath > > > > *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf > Of *mad Engineer > *Sent:* Saturday, February 28, 2015 12:29 PM > *To:* Alexandre DERUMIER > *Cc:* ceph-users > *Subject:* Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes > and 9 OSD with 3.16-3 kernel > > > > reinstalled ceph packages and now with memstore backend [osd objectstore > =memstore] its giving 400Kbps .No idea where the problem is. 
> > > > On Sun, Mar 1, 2015 at 12:30 AM, mad Engineer > wrote: > > tried changing scheduler from deadline to noop also upgraded to Gaint and > btrfs filesystem,downgraded kernel to 3.16 from 3.16-3 not much difference > > > > dd if=/dev/zero of=hi bs=4k count=25000 oflag=direct > > 25000+0 records in > > 25000+0 records out > > 10240 bytes (102 MB) copied, 94.691 s, 1.1 MB/s > > > > Earlier on a vmware setup i was getting ~850 KBps and now even on physical > server with SSD drives its just over 1MBps.I doubt some serious > configuration issues. > > > > Tried iperf between 3 servers all are showing 9 Gbps,tried icmp with > different packet size ,no fragmentation. > > > > i also noticed that out of 9 osd 5 are 850 EVO and 4 are 840 EVO.I believe > this will not cause this much drop in performance. > > > > Thanks for any help > > > > > > On Sat, Feb 28, 2015 at 6:49 PM, Alexandre DERUMIER > wrote: > > As optimisation, > > try to set ioscheduler to noop, > > and also enable rbd_cache=true. (It's really helping for for sequential > writes) > > but your result
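To help rule the ICH10R SW-RAID mode in or out, it may be worth checking how the kernel actually sees the SSDs and whether their write caches are enabled; a sketch, with device names as placeholders:

lsblk -o NAME,MODEL,ROTA,SIZE       # confirm the SSDs are exposed as plain disks
cat /sys/block/sdb/queue/scheduler  # verify noop is really the active scheduler
hdparm -W /dev/sdb                  # 1 = drive write cache enabled, 0 = disabled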
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
Sorry, I saw you have already tried with ‘rados bench’. So, some points here. 1. If you are considering write workload, I think with total of 2 copies and with 4K workload , you should be able to get ~4K iops (considering it hitting the disk, not with memstore). 2. You are having 9 OSDs and if you created only one pool with only 450 PGS, you should try to increase that and see if getting any improvement or not. 3. Also, the rados bench script you ran with very low QD, try increasing that, may be 32/64. 4. If you are running firefly, other optimization won’t work here..But, you can add the following in your ceph.conf file and it should give you some boost. debug_lockdep = 0/0 debug_context = 0/0 debug_crush = 0/0 debug_buffer = 0/0 debug_timer = 0/0 debug_filer = 0/0 debug_objecter = 0/0 debug_rados = 0/0 debug_rbd = 0/0 debug_journaler = 0/0 debug_objectcatcher = 0/0 debug_client = 0/0 debug_osd = 0/0 debug_optracker = 0/0 debug_objclass = 0/0 debug_filestore = 0/0 debug_journal = 0/0 debug_ms = 0/0 debug_monc = 0/0 debug_tp = 0/0 debug_auth = 0/0 debug_finisher = 0/0 debug_heartbeatmap = 0/0 debug_perfcounter = 0/0 debug_asok = 0/0 debug_throttle = 0/0 debug_mon = 0/0 debug_paxos = 0/0 debug_rgw = 0/0 5. Give us the ceph –s output and the iostat output while io is going on. Thanks & Regards Somnath From: Somnath Roy Sent: Saturday, February 28, 2015 12:59 PM To: 'mad Engineer'; Alexandre DERUMIER Cc: ceph-users Subject: RE: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel I would say check with rados tool like ceph_smalliobench/rados bench first to see how much performance these tools are reporting. This will help you to isolate any upstream issues. Also, check with ‘iostat –xk 1’ for the resource utilization. Hope you are running with powerful enough cpu complex since you are saying network is not a bottleneck. Thanks & Regards Somnath From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of mad Engineer Sent: Saturday, February 28, 2015 12:29 PM To: Alexandre DERUMIER Cc: ceph-users Subject: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel reinstalled ceph packages and now with memstore backend [osd objectstore =memstore] its giving 400Kbps .No idea where the problem is. On Sun, Mar 1, 2015 at 12:30 AM, mad Engineer mailto:themadengin...@gmail.com>> wrote: tried changing scheduler from deadline to noop also upgraded to Gaint and btrfs filesystem,downgraded kernel to 3.16 from 3.16-3 not much difference dd if=/dev/zero of=hi bs=4k count=25000 oflag=direct 25000+0 records in 25000+0 records out 10240 bytes (102 MB) copied, 94.691 s, 1.1 MB/s Earlier on a vmware setup i was getting ~850 KBps and now even on physical server with SSD drives its just over 1MBps.I doubt some serious configuration issues. Tried iperf between 3 servers all are showing 9 Gbps,tried icmp with different packet size ,no fragmentation. i also noticed that out of 9 osd 5 are 850 EVO and 4 are 840 EVO.I believe this will not cause this much drop in performance. Thanks for any help On Sat, Feb 28, 2015 at 6:49 PM, Alexandre DERUMIER mailto:aderum...@odiso.com>> wrote: As optimisation, try to set ioscheduler to noop, and also enable rbd_cache=true. (It's really helping for for sequential writes) but your results seem quite low, 926kb/s with 4k, it's only 200io/s. check if you don't have any big network latencies, or mtu fragementation problem. Maybe also try to bench with fio, with more parallel jobs. 
- Mail original - De: "mad Engineer" mailto:themadengin...@gmail.com>> À: "Philippe Schwarz" mailto:p...@schwarz-fr.net>> Cc: "ceph-users" mailto:ceph-users@lists.ceph.com>> Envoyé: Samedi 28 Février 2015 13:06:59 Objet: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel Thanks for the reply Philippe,we were using these disks in our NAS,now it looks like i am in big trouble :-( On Sat, Feb 28, 2015 at 5:02 PM, Philippe Schwarz mailto:p...@schwarz-fr.net>> wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > Le 28/02/2015 12:19, mad Engineer a écrit : >> Hello All, >> >> I am trying ceph-firefly 0.80.8 >> (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all Samsung >> SSD 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu >> 14.04 LTS with 3.16-3 kernel.All are connected to 10G ports with >> maximum MTU.There are no extra disks for journaling and also there >> are no separate network for replication and data transfer.All 3 >> nodes are also hosting monitoring process.Operating system runs on >> SATA disk. >> >> When doing a sequential benchmark using "dd" on RBD, mounted on >> client as ext4 its taking 110s to write 100Mb
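The debug settings Somnath lists can also be applied to a running cluster without restarting daemons, along these lines (a sketch; the ceph.conf entries are still needed for the settings to survive restarts):

ceph tell osd.* injectargs '--debug_osd 0/0 --debug_ms 0/0 --debug_filestore 0/0 --debug_journal 0/0'
ceph tell mon.* injectargs '--debug_mon 0/0 --debug_paxos 0/0'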
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
I would say check with rados tool like ceph_smalliobench/rados bench first to see how much performance these tools are reporting. This will help you to isolate any upstream issues. Also, check with ‘iostat –xk 1’ for the resource utilization. Hope you are running with powerful enough cpu complex since you are saying network is not a bottleneck. Thanks & Regards Somnath From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of mad Engineer Sent: Saturday, February 28, 2015 12:29 PM To: Alexandre DERUMIER Cc: ceph-users Subject: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel reinstalled ceph packages and now with memstore backend [osd objectstore =memstore] its giving 400Kbps .No idea where the problem is. On Sun, Mar 1, 2015 at 12:30 AM, mad Engineer mailto:themadengin...@gmail.com>> wrote: tried changing scheduler from deadline to noop also upgraded to Gaint and btrfs filesystem,downgraded kernel to 3.16 from 3.16-3 not much difference dd if=/dev/zero of=hi bs=4k count=25000 oflag=direct 25000+0 records in 25000+0 records out 10240 bytes (102 MB) copied, 94.691 s, 1.1 MB/s Earlier on a vmware setup i was getting ~850 KBps and now even on physical server with SSD drives its just over 1MBps.I doubt some serious configuration issues. Tried iperf between 3 servers all are showing 9 Gbps,tried icmp with different packet size ,no fragmentation. i also noticed that out of 9 osd 5 are 850 EVO and 4 are 840 EVO.I believe this will not cause this much drop in performance. Thanks for any help On Sat, Feb 28, 2015 at 6:49 PM, Alexandre DERUMIER mailto:aderum...@odiso.com>> wrote: As optimisation, try to set ioscheduler to noop, and also enable rbd_cache=true. (It's really helping for for sequential writes) but your results seem quite low, 926kb/s with 4k, it's only 200io/s. check if you don't have any big network latencies, or mtu fragementation problem. Maybe also try to bench with fio, with more parallel jobs. - Mail original - De: "mad Engineer" mailto:themadengin...@gmail.com>> À: "Philippe Schwarz" mailto:p...@schwarz-fr.net>> Cc: "ceph-users" mailto:ceph-users@lists.ceph.com>> Envoyé: Samedi 28 Février 2015 13:06:59 Objet: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel Thanks for the reply Philippe,we were using these disks in our NAS,now it looks like i am in big trouble :-( On Sat, Feb 28, 2015 at 5:02 PM, Philippe Schwarz mailto:p...@schwarz-fr.net>> wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > Le 28/02/2015 12:19, mad Engineer a écrit : >> Hello All, >> >> I am trying ceph-firefly 0.80.8 >> (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all Samsung >> SSD 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu >> 14.04 LTS with 3.16-3 kernel.All are connected to 10G ports with >> maximum MTU.There are no extra disks for journaling and also there >> are no separate network for replication and data transfer.All 3 >> nodes are also hosting monitoring process.Operating system runs on >> SATA disk. >> >> When doing a sequential benchmark using "dd" on RBD, mounted on >> client as ext4 its taking 110s to write 100Mb data at an average >> speed of 926Kbps. 
>> >> time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct >> 25000+0 records in 25000+0 records out 10240 bytes (102 MB) >> copied, 110.582 s, 926 kB/s >> >> real 1m50.585s user 0m0.106s sys 0m2.233s >> >> While doing this directly on ssd mount point shows: >> >> time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct >> 25000+0 records in 25000+0 records out 10240 bytes (102 MB) >> copied, 1.38567 s, 73.9 MB/s >> >> OSDs are in XFS with these extra arguments : >> >> rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M >> >> ceph.conf >> >> [global] fsid = 7d889081-7826-439c-9fe5-d4e57480d9be >> mon_initial_members = ceph1, ceph2, ceph3 mon_host = >> 10.99.10.118,10.99.10.119,10.99.10.120 auth_cluster_required = >> cephx auth_service_required = cephx auth_client_required = cephx >> filestore_xattr_use_omap = true osd_pool_default_size = 2 >> osd_pool_default_min_size = 2 osd_pool_default_pg_num = 450 >> osd_pool_default_pgp_num = 450 max_open_files = 131072 >> >> [osd] osd_mkfs_type = xfs osd_op_threads = 8 osd_disk_threads = 4 >> osd_mount_options_xfs = >> "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M" >> >> >> on our traditional storage with Full SAS disk, same "dd" completes >> in 16s with an average write speed of 6Mbps. >> >>
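While a benchmark is running, something like the following on each OSD node gives the utilization picture Somnath is asking for (a sketch):

iostat -xk 1   # watch w/s, await and %util on the SSD devices
vmstat 1       # watch user/system CPU and the run queue
ceph -s        # cluster health and current client op/s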
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
reinstalled ceph packages and now with memstore backend [osd objectstore =memstore] its giving 400Kbps .No idea where the problem is. On Sun, Mar 1, 2015 at 12:30 AM, mad Engineer wrote: > tried changing scheduler from deadline to noop also upgraded to Gaint and > btrfs filesystem,downgraded kernel to 3.16 from 3.16-3 not much difference > > dd if=/dev/zero of=hi bs=4k count=25000 oflag=direct > 25000+0 records in > 25000+0 records out > 10240 bytes (102 MB) copied, 94.691 s, 1.1 MB/s > > Earlier on a vmware setup i was getting ~850 KBps and now even on physical > server with SSD drives its just over 1MBps.I doubt some serious > configuration issues. > > Tried iperf between 3 servers all are showing 9 Gbps,tried icmp with > different packet size ,no fragmentation. > > i also noticed that out of 9 osd 5 are 850 EVO and 4 are 840 EVO.I believe > this will not cause this much drop in performance. > > Thanks for any help > > > On Sat, Feb 28, 2015 at 6:49 PM, Alexandre DERUMIER > wrote: > >> As optimisation, >> >> try to set ioscheduler to noop, >> >> and also enable rbd_cache=true. (It's really helping for for sequential >> writes) >> >> but your results seem quite low, 926kb/s with 4k, it's only 200io/s. >> >> check if you don't have any big network latencies, or mtu fragementation >> problem. >> >> Maybe also try to bench with fio, with more parallel jobs. >> >> >> >> >> ----- Mail original ----- >> De: "mad Engineer" >> À: "Philippe Schwarz" >> Cc: "ceph-users" >> Envoyé: Samedi 28 Février 2015 13:06:59 >> Objet: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and >> 9 OSD with 3.16-3 kernel >> >> Thanks for the reply Philippe,we were using these disks in our NAS,now >> it looks like i am in big trouble :-( >> >> On Sat, Feb 28, 2015 at 5:02 PM, Philippe Schwarz >> wrote: >> > -BEGIN PGP SIGNED MESSAGE- >> > Hash: SHA1 >> > >> > Le 28/02/2015 12:19, mad Engineer a écrit : >> >> Hello All, >> >> >> >> I am trying ceph-firefly 0.80.8 >> >> (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all Samsung >> >> SSD 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu >> >> 14.04 LTS with 3.16-3 kernel.All are connected to 10G ports with >> >> maximum MTU.There are no extra disks for journaling and also there >> >> are no separate network for replication and data transfer.All 3 >> >> nodes are also hosting monitoring process.Operating system runs on >> >> SATA disk. >> >> >> >> When doing a sequential benchmark using "dd" on RBD, mounted on >> >> client as ext4 its taking 110s to write 100Mb data at an average >> >> speed of 926Kbps. 
>> >> >> >> time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct >> >> 25000+0 records in 25000+0 records out 10240 bytes (102 MB) >> >> copied, 110.582 s, 926 kB/s >> >> >> >> real 1m50.585s user 0m0.106s sys 0m2.233s >> >> >> >> While doing this directly on ssd mount point shows: >> >> >> >> time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct >> >> 25000+0 records in 25000+0 records out 10240 bytes (102 MB) >> >> copied, 1.38567 s, 73.9 MB/s >> >> >> >> OSDs are in XFS with these extra arguments : >> >> >> >> rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M >> >> >> >> ceph.conf >> >> >> >> [global] fsid = 7d889081-7826-439c-9fe5-d4e57480d9be >> >> mon_initial_members = ceph1, ceph2, ceph3 mon_host = >> >> 10.99.10.118,10.99.10.119,10.99.10.120 auth_cluster_required = >> >> cephx auth_service_required = cephx auth_client_required = cephx >> >> filestore_xattr_use_omap = true osd_pool_default_size = 2 >> >> osd_pool_default_min_size = 2 osd_pool_default_pg_num = 450 >> >> osd_pool_default_pgp_num = 450 max_open_files = 131072 >> >> >> >> [osd] osd_mkfs_type = xfs osd_op_threads = 8 osd_disk_threads = 4 >> >> osd_mount_options_xfs = >> >> "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M" >> >> >> >> >> >> on our traditional storage with Full SAS disk, same "dd" completes >> >> in 16s with an average write speed of 6Mbps. >> >> >> >> Rados bench: >>
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
tried changing scheduler from deadline to noop also upgraded to Gaint and btrfs filesystem,downgraded kernel to 3.16 from 3.16-3 not much difference dd if=/dev/zero of=hi bs=4k count=25000 oflag=direct 25000+0 records in 25000+0 records out 10240 bytes (102 MB) copied, 94.691 s, 1.1 MB/s Earlier on a vmware setup i was getting ~850 KBps and now even on physical server with SSD drives its just over 1MBps.I doubt some serious configuration issues. Tried iperf between 3 servers all are showing 9 Gbps,tried icmp with different packet size ,no fragmentation. i also noticed that out of 9 osd 5 are 850 EVO and 4 are 840 EVO.I believe this will not cause this much drop in performance. Thanks for any help On Sat, Feb 28, 2015 at 6:49 PM, Alexandre DERUMIER wrote: > As optimisation, > > try to set ioscheduler to noop, > > and also enable rbd_cache=true. (It's really helping for for sequential > writes) > > but your results seem quite low, 926kb/s with 4k, it's only 200io/s. > > check if you don't have any big network latencies, or mtu fragementation > problem. > > Maybe also try to bench with fio, with more parallel jobs. > > > > > - Mail original - > De: "mad Engineer" > À: "Philippe Schwarz" > Cc: "ceph-users" > Envoyé: Samedi 28 Février 2015 13:06:59 > Objet: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 > OSD with 3.16-3 kernel > > Thanks for the reply Philippe,we were using these disks in our NAS,now > it looks like i am in big trouble :-( > > On Sat, Feb 28, 2015 at 5:02 PM, Philippe Schwarz > wrote: > > -BEGIN PGP SIGNED MESSAGE- > > Hash: SHA1 > > > > Le 28/02/2015 12:19, mad Engineer a écrit : > >> Hello All, > >> > >> I am trying ceph-firefly 0.80.8 > >> (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all Samsung > >> SSD 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu > >> 14.04 LTS with 3.16-3 kernel.All are connected to 10G ports with > >> maximum MTU.There are no extra disks for journaling and also there > >> are no separate network for replication and data transfer.All 3 > >> nodes are also hosting monitoring process.Operating system runs on > >> SATA disk. > >> > >> When doing a sequential benchmark using "dd" on RBD, mounted on > >> client as ext4 its taking 110s to write 100Mb data at an average > >> speed of 926Kbps. 
> >> > >> time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct > >> 25000+0 records in 25000+0 records out 10240 bytes (102 MB) > >> copied, 110.582 s, 926 kB/s > >> > >> real 1m50.585s user 0m0.106s sys 0m2.233s > >> > >> While doing this directly on ssd mount point shows: > >> > >> time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct > >> 25000+0 records in 25000+0 records out 10240 bytes (102 MB) > >> copied, 1.38567 s, 73.9 MB/s > >> > >> OSDs are in XFS with these extra arguments : > >> > >> rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M > >> > >> ceph.conf > >> > >> [global] fsid = 7d889081-7826-439c-9fe5-d4e57480d9be > >> mon_initial_members = ceph1, ceph2, ceph3 mon_host = > >> 10.99.10.118,10.99.10.119,10.99.10.120 auth_cluster_required = > >> cephx auth_service_required = cephx auth_client_required = cephx > >> filestore_xattr_use_omap = true osd_pool_default_size = 2 > >> osd_pool_default_min_size = 2 osd_pool_default_pg_num = 450 > >> osd_pool_default_pgp_num = 450 max_open_files = 131072 > >> > >> [osd] osd_mkfs_type = xfs osd_op_threads = 8 osd_disk_threads = 4 > >> osd_mount_options_xfs = > >> "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M" > >> > >> > >> on our traditional storage with Full SAS disk, same "dd" completes > >> in 16s with an average write speed of 6Mbps. > >> > >> Rados bench: > >> > >> rados bench -p rbd 10 write Maintaining 16 concurrent writes of > >> 4194304 bytes for up to 10 seconds or 0 objects Object prefix: > >> benchmark_data_ceph1_2977 sec Cur ops started finished avg MB/s > >> cur MB/s last lat avg lat 0 0 0 0 > >> 0 0 - 0 1 16 94 78 > >> 311.821 312 0.041228 0.140132 2 16 192 176 > >> 351.866 392 0.106294 0.175055 3 16 275 259 > >> 345.216 332 0.076795 0.166036 4 16 302 286 > >> 285.912 108 0.043888 0.196419 5 16 395 379 > >> 303.11 372 0.126033 0.207488 6 16 501 485 > >> 323.242 424 0.125972 0
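Alexandre's two suggestions from the quoted mail look roughly like this in practice (device name illustrative; note that rbd_cache only affects librbd clients such as QEMU, not a kernel-mounted RBD device):

echo noop > /sys/block/sdb/queue/scheduler

# ceph.conf on the client side
[client]
rbd cache = true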
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
Am 28.02.2015 um 19:41 schrieb Kevin Walker: What about the Samsung 845DC Pro SSD's? These have fantastic enterprise performance characteristics. http://www.thessdreview.com/our-reviews/samsung-845dc-pro-review-800gb-class-leading-speed-endurance/ Or use SV843 from Samsung Semiconductor (seperate samsung company). Stefan Kind regards Kevin On 28 February 2015 at 15:32, Philippe Schwarz mailto:p...@schwarz-fr.net>> wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Le 28/02/2015 12:19, mad Engineer a écrit : > Hello All, > > I am trying ceph-firefly 0.80.8 > (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all Samsung > SSD 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu > 14.04 LTS with 3.16-3 kernel.All are connected to 10G ports with > maximum MTU.There are no extra disks for journaling and also there > are no separate network for replication and data transfer.All 3 > nodes are also hosting monitoring process.Operating system runs on > SATA disk. > > When doing a sequential benchmark using "dd" on RBD, mounted on > client as ext4 its taking 110s to write 100Mb data at an average > speed of 926Kbps. > > time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct > 25000+0 records in 25000+0 records out 10240 bytes (102 MB) > copied, 110.582 s, 926 kB/s > > real1m50.585s user0m0.106s sys 0m2.233s > > While doing this directly on ssd mount point shows: > > time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct > 25000+0 records in 25000+0 records out 10240 bytes (102 MB) > copied, 1.38567 s, 73.9 MB/s > > OSDs are in XFS with these extra arguments : > > rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M > > ceph.conf > > [global] fsid = 7d889081-7826-439c-9fe5-d4e57480d9be > mon_initial_members = ceph1, ceph2, ceph3 mon_host = > 10.99.10.118,10.99.10.119,10.99.10.120 auth_cluster_required = > cephx auth_service_required = cephx auth_client_required = cephx > filestore_xattr_use_omap = true osd_pool_default_size = 2 > osd_pool_default_min_size = 2 osd_pool_default_pg_num = 450 > osd_pool_default_pgp_num = 450 max_open_files = 131072 > > [osd] osd_mkfs_type = xfs osd_op_threads = 8 osd_disk_threads = 4 > osd_mount_options_xfs = > "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M" > > > on our traditional storage with Full SAS disk, same "dd" completes > in 16s with an average write speed of 6Mbps. 
> > Rados bench: > > rados bench -p rbd 10 write Maintaining 16 concurrent writes of > 4194304 bytes for up to 10 seconds or 0 objects Object prefix: > benchmark_data_ceph1_2977 sec Cur ops started finished avg MB/s > cur MB/s last lat avg lat 0 0 0 0 > 0 0 - 0 1 169478 > 311.821 312 0.041228 0.140132 2 16 192 176 > 351.866 392 0.106294 0.175055 3 16 275 259 > 345.216 332 0.076795 0.166036 4 16 302 286 > 285.912 108 0.043888 0.196419 5 16 395 379 > 303.11 372 0.126033 0.207488 6 16 501 485 > 323.242 424 0.125972 0.194559 7 16 621 605 > 345.621 480 0.194155 0.183123 8 16 730 714 > 356.903 436 0.086678 0.176099 9 16 814 798 > 354.572 336 0.081567 0.174786 10 16 832 > 816 326.31372 0.037431 0.182355 11 16 833 > 817 297.013 4 0.533326 0.182784 Total time run: > 11.489068 Total writes made: 833 Write size: > 4194304 Bandwidth (MB/sec): 290.015 > > Stddev Bandwidth: 175.723 Max bandwidth (MB/sec): 480 Min > bandwidth (MB/sec): 0 Average Latency:0.220582 Stddev > Latency: 0.343697 Max latency:2.85104 Min > latency:0.035381 > > Our ultimate aim is to replace existing SAN with ceph,but for that > it should meet minimum 8000 iops.Can any one help me with this,OSD > are SSD,CPU has good clock speed,backend network is good but still > we are not able to extract full capability of SSD disks. > > > > Thanks, Hi, i'm new to ceph so, don't consider my words as holy truth. It seems that Samsung 840 (so i assume 850) are crappy for ceph : MTBF : http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-November/044258.html Bandwidth :http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-December/045247.html And according to a confirmed user of Ceph/ProxmoX, Samsung SSDs should be avoided if possible in ceph storage. Apart fr
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
What about the Samsung 845DC Pro SSD's? These have fantastic enterprise performance characteristics. http://www.thessdreview.com/our-reviews/samsung-845dc-pro-review-800gb-class-leading-speed-endurance/ Kind regards Kevin On 28 February 2015 at 15:32, Philippe Schwarz wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > Le 28/02/2015 12:19, mad Engineer a écrit : > > Hello All, > > > > I am trying ceph-firefly 0.80.8 > > (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all Samsung > > SSD 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu > > 14.04 LTS with 3.16-3 kernel.All are connected to 10G ports with > > maximum MTU.There are no extra disks for journaling and also there > > are no separate network for replication and data transfer.All 3 > > nodes are also hosting monitoring process.Operating system runs on > > SATA disk. > > > > When doing a sequential benchmark using "dd" on RBD, mounted on > > client as ext4 its taking 110s to write 100Mb data at an average > > speed of 926Kbps. > > > > time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct > > 25000+0 records in 25000+0 records out 10240 bytes (102 MB) > > copied, 110.582 s, 926 kB/s > > > > real1m50.585s user0m0.106s sys 0m2.233s > > > > While doing this directly on ssd mount point shows: > > > > time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct > > 25000+0 records in 25000+0 records out 10240 bytes (102 MB) > > copied, 1.38567 s, 73.9 MB/s > > > > OSDs are in XFS with these extra arguments : > > > > rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M > > > > ceph.conf > > > > [global] fsid = 7d889081-7826-439c-9fe5-d4e57480d9be > > mon_initial_members = ceph1, ceph2, ceph3 mon_host = > > 10.99.10.118,10.99.10.119,10.99.10.120 auth_cluster_required = > > cephx auth_service_required = cephx auth_client_required = cephx > > filestore_xattr_use_omap = true osd_pool_default_size = 2 > > osd_pool_default_min_size = 2 osd_pool_default_pg_num = 450 > > osd_pool_default_pgp_num = 450 max_open_files = 131072 > > > > [osd] osd_mkfs_type = xfs osd_op_threads = 8 osd_disk_threads = 4 > > osd_mount_options_xfs = > > "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M" > > > > > > on our traditional storage with Full SAS disk, same "dd" completes > > in 16s with an average write speed of 6Mbps. 
> > > > Rados bench: > > > > rados bench -p rbd 10 write Maintaining 16 concurrent writes of > > 4194304 bytes for up to 10 seconds or 0 objects Object prefix: > > benchmark_data_ceph1_2977 sec Cur ops started finished avg MB/s > > cur MB/s last lat avg lat 0 0 0 0 > > 0 0 - 0 1 169478 > > 311.821 312 0.041228 0.140132 2 16 192 176 > > 351.866 392 0.106294 0.175055 3 16 275 259 > > 345.216 332 0.076795 0.166036 4 16 302 286 > > 285.912 108 0.043888 0.196419 5 16 395 379 > > 303.11 372 0.126033 0.207488 6 16 501 485 > > 323.242 424 0.125972 0.194559 7 16 621 605 > > 345.621 480 0.194155 0.183123 8 16 730 714 > > 356.903 436 0.086678 0.176099 9 16 814 798 > > 354.572 336 0.081567 0.174786 10 16 832 > > 816 326.31372 0.037431 0.182355 11 16 833 > > 817 297.013 4 0.533326 0.182784 Total time run: > > 11.489068 Total writes made: 833 Write size: > > 4194304 Bandwidth (MB/sec): 290.015 > > > > Stddev Bandwidth: 175.723 Max bandwidth (MB/sec): 480 Min > > bandwidth (MB/sec): 0 Average Latency:0.220582 Stddev > > Latency: 0.343697 Max latency:2.85104 Min > > latency:0.035381 > > > > Our ultimate aim is to replace existing SAN with ceph,but for that > > it should meet minimum 8000 iops.Can any one help me with this,OSD > > are SSD,CPU has good clock speed,backend network is good but still > > we are not able to extract full capability of SSD disks. > > > > > > > > Thanks, > > Hi, i'm new to ceph so, don't consider my words as holy truth. > > It seems that Samsung 840 (so i assume 850) are crappy for ceph : > > MTBF : > > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-November/044258.html > Bandwidth > : > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-December/045247.html > > And according to a confirmed user of Ceph/ProxmoX, Samsung SSDs should > be avoided if possible in ceph storage. > > Apart from that, it seems there was an limitation in ceph for the use > of the complete bandwidth available in SSDs; but i think with less > than 1Mb/s you haven't hit this limit. > > I remind you that i'm not a ceph-guru (far from that, indeed), so feel > free to disagree; i'm on the way to improve my knowledge. > > Best regards. > > > > > -BEGIN PGP SIGNATURE- > Version: GnuPG v1 > > iEYEARECAAYFAlTxp0UACgkQlhqCFk
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
Hi Andrei,

If there is one thing I've come to understand by now, it is that ceph configs, performance, hw and, well, everything seems to vary on an almost per-person basis.

I do not recognize that latency issue either; this is from one of our nodes (4x 500GB Samsung 840 Pro - sd[c-f]) which has been running for 600+ days (so the iostat -x is an avg of that):

# uptime
 16:24:57 up 611 days, 4:03, 1 user, load average: 1.18, 1.55, 1.72

# iostat -x
[ ... ]
Device:  rrqm/s  wrqm/s    r/s    w/s    rkB/s    wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
sdc        0.00    0.16   4.87  22.62   344.18   458.65    58.41     0.05   1.92    0.45    2.24   0.76   2.10
sdd        0.00    0.12   4.37  20.02   317.98   437.95    61.98     0.05   1.90    0.44    2.21   0.78   1.91
sde        0.00    0.12   4.17  19.33   302.45   403.02    60.02     0.04   1.87    0.43    2.18   0.77   1.80
sdf        0.00    0.12   4.51  20.84   322.84   439.70    60.17     0.05   1.84    0.43    2.15   0.76   1.93
[ ... ]

Granted, we do not have very high usage on this cluster on an ssd-basis and it might change as we put more load on it, but we will deal with it then. I do not think ~2ms access time is either good or bad.

This is from another cluster we operate - this one has an Intel DC S3700 800GB ssd (sdb):

# uptime
 09:37:26 up 654 days, 8:40, 1 user, load average: 0.33, 0.40, 0.54

# iostat -x
[ ... ]
sdb        0.01    1.49  39.76  86.79  1252.80  2096.98    52.94     0.02   0.76    1.22    0.54   0.41   5.21
[ ... ]

It is misleading, as the latter has just 3 disks + a hardware-based 1GB-backed RAID controller, whereas the first is a 'cheap' dumb 12-disk JBOD IT-based setup. All the ssds from both clusters have 3 partitions - 1 ceph-data and 2 journal partitions (1 journal for the ssd itself and 1 journal for 1 platter disk).

The Intel ssd is very sturdy though - it has had a 2.1MB/sec avg. write over 654 days - that is somewhere around 120TB so far.

But ultimately it boils down to what you need - in our use case the latter cluster has to be rock stable and performing - and we chose the Intel ones based on that. The first one we don't really care if we lose a node or two, and we replace disks every month or whenever it fits into our going-to-datacenter schedule - we wanted an ok'ish performing cluster and focused more on total space / price than high-performing hardware. The fantastic thing is we are not locked into any specific hardware and we can replace any of it if we need to and/or find it is suddenly starting to have issues.

Cheers,
Martin

On Sat, Feb 28, 2015 at 2:55 PM, Andrei Mikhailovsky wrote:
>
> Martin,
>
> I have been using Samsung 840 Pro for journals about 2 years now and have
> just replaced all my samsung drives with Intel. We have found a lot of
> performance issues with 840 Pro (we are using 128mb). In particular, a very
> strange behaviour with using 4 partitions (with 50% underprovisioning left
> as empty unpartitioned space on the drive) where the drive would grind to
> almost a halt after a few weeks of use. I was getting 100% utilisation on
> the drives doing just 3-4MB/s writes. This was not the case when I've
> installed the new drives. Manual Trimming helps for a few weeks until the
> same happens again.
>
> This has been happening with all 840 Pro ssds that we have and contacting
> Samsung Support has proven to be utterly useless. They do not want to speak
> with you until you install windows and run their monkey utility ((.
>
> Also, i've noticed the latencies of the Samsung 840 Pro ssd drives to be
> about 15-20x slower compared with consumer grade Intel drives, like Intel
> 520.
According to ceph osd pef, I would consistently get higher figures on > the osds with Samsung journal drive compared with the Intel drive on the > same server. Something like 2-3ms for Intel vs 40-50ms for Samsungs. > > At some point we had enough with Samsungs and scrapped them. > > Andrei > > -- > > *From: *"Martin B Nielsen" > *To: *"Philippe Schwarz" > *Cc: *ceph-users@lists.ceph.com > *Sent: *Saturday, 28 February, 2015 11:51:57 AM > *Subject: *Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes > and 9 OSD with 3.16-3 kernel > > > Hi, > > I cannot recognize that picture; we've been using samsumg 840 pro in > production for almost 2 years now - and have had 1 fail. > > We run a 8node mixed ssd/platter cluster with 4x samsung 840 pro (500gb) > in each so that is 32x ssd. > > They've written ~25TB data in avg each. > > Using the dd you had inside an existing semi-busy mysql-guest I get: > > 10240 bytes (102 MB) copied, 5.58218 s, 18.3 MB/s > > Which is still not a lot, but I think it i
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
Martin, I have been using Samsung 840 Pro for journals about 2 years now and have just replaced all my samsung drives with Intel. We have found a lot of performance issues with 840 Pro (we are using 128mb). In particular, a very strange behaviour with using 4 partitions (with 50% underprovisioning left as empty unpartitioned space on the drive) where the drive would grind to almost a halt after a few weeks of use. I was getting 100% utilisation on the drives doing just 3-4MB/s writes. This was not the case when I've installed the new drives. Manual Trimming helps for a few weeks until the same happens again. This has been happening with all 840 Pro ssds that we have and contacting Samsung Support has proven to be utterly useless. They do not want to speak with you until you install windows and run their monkey utility ((. Also, i've noticed the latencies of the Samsung 840 Pro ssd drives to be about 15-20 slower compared with a consumer grade Intel drives, like Intel 520. According to ceph osd pef, I would consistently get higher figures on the osds with Samsung journal drive compared with the Intel drive on the same server. Something like 2-3ms for Intel vs 40-50ms for Samsungs. At some point we had enough with Samsungs and scrapped them. Andrei - Original Message - > From: "Martin B Nielsen" > To: "Philippe Schwarz" > Cc: ceph-users@lists.ceph.com > Sent: Saturday, 28 February, 2015 11:51:57 AM > Subject: Re: [ceph-users] Extreme slowness in SSD cluster with 3 > nodes and 9 OSD with 3.16-3 kernel > Hi, > I cannot recognize that picture; we've been using samsumg 840 pro in > production for almost 2 years now - and have had 1 fail. > We run a 8node mixed ssd/platter cluster with 4x samsung 840 pro > (500gb) in each so that is 32x ssd. > They've written ~25TB data in avg each. > Using the dd you had inside an existing semi-busy mysql-guest I get: > 10240 bytes (102 MB) copied, 5.58218 s, 18.3 MB/s > Which is still not a lot, but I think it is more a limitation of our > setup/load. > We are using dumpling. > All that aside, I would prob. go with something tried and tested if I > was to redo it today - we haven't had any issues, but it is still > nice to use something you know should have a baseline performance > and can compare to that. > Cheers, > Martin > On Sat, Feb 28, 2015 at 12:32 PM, Philippe Schwarz < > p...@schwarz-fr.net > wrote: > > -BEGIN PGP SIGNED MESSAGE- > > > Hash: SHA1 > > > Le 28/02/2015 12:19, mad Engineer a écrit : > > > > Hello All, > > > > > > > > I am trying ceph-firefly 0.80.8 > > > > (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all > > > Samsung > > > > SSD 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu > > > > 14.04 LTS with 3.16-3 kernel.All are connected to 10G ports with > > > > maximum MTU.There are no extra disks for journaling and also > > > there > > > > are no separate network for replication and data transfer.All 3 > > > > nodes are also hosting monitoring process.Operating system runs > > > on > > > > SATA disk. > > > > > > > > When doing a sequential benchmark using "dd" on RBD, mounted on > > > > client as ext4 its taking 110s to write 100Mb data at an average > > > > speed of 926Kbps. 
> > > > > > > > time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct > > > > 25000+0 records in 25000+0 records out 10240 bytes (102 MB) > > > > copied, 110.582 s, 926 kB/s > > > > > > > > real 1m50.585s user 0m0.106s sys 0m2.233s > > > > > > > > While doing this directly on ssd mount point shows: > > > > > > > > time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct > > > > 25000+0 records in 25000+0 records out 10240 bytes (102 MB) > > > > copied, 1.38567 s, 73.9 MB/s > > > > > > > > OSDs are in XFS with these extra arguments : > > > > > > > > rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M > > > > > > > > ceph.conf > > > > > > > > [global] fsid = 7d889081-7826-439c-9fe5-d4e57480d9be > > > > mon_initial_members = ceph1, ceph2, ceph3 mon_host = > > > > 10.99.10.118,10.99.10.119,10.99.10.120 auth_cluster_required = > > > > cephx auth_service_required = cephx auth_client_required = cephx > > > > filestore_xattr_use_omap = true osd_pool_default_size = 2 > > > > osd_pool_default_min_size = 2 osd
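For anyone who wants to reproduce the kind of comparison Andrei describes above, a rough sketch of the commands involved (the OSD mount point and device names are examples, not values from this thread):

ceph osd perf                        # per-OSD commit/apply latency in ms; OSDs journaling on a slow SSD show consistently high numbers here
iostat -x 1 /dev/sdb /dev/sdc        # compare utilisation and await of the two journal SSDs under load
fstrim -v /var/lib/ceph/osd/ceph-0   # manually discard unused blocks on a mounted filesystem; only useful if the SSD supports TRIM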
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
Thanks for that link Alexandre, as per that link tried these: *850 EVO* *without dsync* dd if=randfile of=/dev/sdb1 bs=4k count=10 oflag=direct 10+0 records in 10+0 records out 40960 bytes (410 MB) copied, 4.42913 s, 92.5 MB/s with *dsync*: dd if=randfile of=/dev/sdb1 bs=4k count=10 oflag=direct,dsync 10+0 records in 10+0 records out 40960 bytes (410 MB) copied, 83.4916 s, 4.9 MB/s *on 840 EVO* dd if=randfile of=/dev/sdd1 bs=4k count=10 oflag=direct 10+0 records in 10+0 records out 40960 bytes (410 MB) copied, 5.11912 s, 80.0 MB/s *with dsync* dd if=randfile of=/dev/sdd1 bs=4k count=10 oflag=direct,dsync 10+0 records in 10+0 records out 40960 bytes (410 MB) copied, 196.738 s, 2.1 MB/s So with dsync there is a significant reduction in performance, looks like 850 is better than 840. Can this be the reason for the reduced write speed of 926 kB/s? Also, before trying on physical servers I ran ceph on vmware vms with SAS disks using giant 0.87; at that time firefly 0.80.8 was giving higher numbers, so I decided to use firefly. On Sat, Feb 28, 2015 at 5:13 PM, Alexandre DERUMIER wrote: > Hi, > > First, test if your ssd can write fast with O_DSYNC > check this blog: > > http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/ > > > Then, try with ceph Giant (or maybe wait for Hammer), because they are a > lot of optimisations for ssd for threads sharding. > > In my last test with giant, I was able to reach around 12iops with > 6osd/intel s3500 ssd, but I was cpu limited. > > - Original Message - > From: "mad Engineer" > To: "ceph-users" > Sent: Saturday 28 February 2015 12:19:56 > Subject: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 > OSD with 3.16-3 kernel > > Hello All, > > I am trying ceph-firefly 0.80.8 > (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all Samsung SSD > 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu 14.04 LTS > with 3.16-3 kernel.All are connected to 10G ports with maximum > MTU.There are no extra disks for journaling and also there are no > separate network for replication and data transfer.All 3 nodes are > also hosting monitoring process.Operating system runs on SATA disk. > > When doing a sequential benchmark using "dd" on RBD, mounted on client > as ext4 its taking 110s to write 100Mb data at an average speed of > 926Kbps.
> > time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct > 25000+0 records in > 25000+0 records out > 10240 bytes (102 MB) copied, 110.582 s, 926 kB/s > > real 1m50.585s > user 0m0.106s > sys 0m2.233s > > While doing this directly on ssd mount point shows: > > time dd if=/dev/zero of=hello bs=4k count=25000 > oflag=direct > 25000+0 records in > 25000+0 records out > 10240 bytes (102 MB) copied, 1.38567 > s, 73.9 MB/s > > OSDs are in XFS with these extra arguments : > > rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M > > ceph.conf > > [global] > fsid = 7d889081-7826-439c-9fe5-d4e57480d9be > mon_initial_members = ceph1, ceph2, ceph3 > mon_host = 10.99.10.118,10.99.10.119,10.99.10.120 > auth_cluster_required = cephx > auth_service_required = cephx > auth_client_required = cephx > filestore_xattr_use_omap = true > osd_pool_default_size = 2 > osd_pool_default_min_size = 2 > osd_pool_default_pg_num = 450 > osd_pool_default_pgp_num = 450 > max_open_files = 131072 > > [osd] > osd_mkfs_type = xfs > osd_op_threads = 8 > osd_disk_threads = 4 > osd_mount_options_xfs = > "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M" > > > on our traditional storage with Full SAS disk, same "dd" completes in > 16s with an average write speed of 6Mbps. > > Rados bench: > > rados bench -p rbd 10 write > Maintaining 16 concurrent writes of 4194304 bytes for up to 10 > seconds or 0 objects > Object prefix: benchmark_data_ceph1_2977 > sec Cur ops started finished avg MB/s cur MB/s last lat avg lat > 0 0 0 0 0 0 - 0 > 1 16 94 78 311.821 312 0.041228 0.140132 > 2 16 192 176 351.866 392 0.106294 0.175055 > 3 16 275 259 345.216 332 0.076795 0.166036 > 4 16 302 286 285.912 108 0.043888 0.196419 > 5 16 395 379 303.11 372 0.126033 0.207488 > 6 16 501 485 323.242 424 0.125972 0.194559 > 7 16 621 605 345.621 480 0.194155 0.183123 > 8 16 730 714 356.903 436 0.086678 0.176099 > 9 16 814 798 354.572 336 0.081567 0.174786 > 10 16 832 816 326.313 72 0.037431 0.182355 > 11 16 833 817 297.013 4 0.533326 0.182784 > Total time run: 11.489068 > Total writes made: 833 > Write size: 4194304 > Bandwidth (MB/sec): 290.015 > > Stddev Bandwidth: 175.723 > Max
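As a side note, the same O_DSYNC behaviour can be measured with fio instead of dd; a minimal sketch, assuming the partition is expendable since the test overwrites it (device name is an example):

fio --name=journal-test --filename=/dev/sdb1 --direct=1 --sync=1 --rw=write --bs=4k --iodepth=1 --numjobs=1 --runtime=60 --time_based --group_reporting

A journal device has to sustain reasonable IOPS in exactly this direct+sync mode, which is what the dd figures above are approximating.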
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
As an optimisation, try to set ioscheduler to noop, and also enable rbd_cache=true. (It really helps for sequential writes) but your results seem quite low, 926 kB/s with 4k, that's only about 200 io/s. Check that you don't have any big network latencies or MTU fragmentation problems. Maybe also try to bench with fio, with more parallel jobs. - Original Message - From: "mad Engineer" To: "Philippe Schwarz" Cc: "ceph-users" Sent: Saturday 28 February 2015 13:06:59 Subject: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel Thanks for the reply Philippe, we were using these disks in our NAS, now it looks like I am in big trouble :-( On Sat, Feb 28, 2015 at 5:02 PM, Philippe Schwarz wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > On 28/02/2015 12:19, mad Engineer wrote: >> Hello All, >> >> I am trying ceph-firefly 0.80.8 >> (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all Samsung >> SSD 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu >> 14.04 LTS with 3.16-3 kernel.All are connected to 10G ports with >> maximum MTU.There are no extra disks for journaling and also there >> are no separate network for replication and data transfer.All 3 >> nodes are also hosting monitoring process.Operating system runs on >> SATA disk. >> >> When doing a sequential benchmark using "dd" on RBD, mounted on >> client as ext4 its taking 110s to write 100Mb data at an average >> speed of 926Kbps. >> >> time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct >> 25000+0 records in 25000+0 records out 10240 bytes (102 MB) >> copied, 110.582 s, 926 kB/s >> >> real 1m50.585s user 0m0.106s sys 0m2.233s >> >> While doing this directly on ssd mount point shows: >> >> time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct >> 25000+0 records in 25000+0 records out 10240 bytes (102 MB) >> copied, 1.38567 s, 73.9 MB/s >> >> OSDs are in XFS with these extra arguments : >> >> rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M >> >> ceph.conf >> >> [global] fsid = 7d889081-7826-439c-9fe5-d4e57480d9be >> mon_initial_members = ceph1, ceph2, ceph3 mon_host = >> 10.99.10.118,10.99.10.119,10.99.10.120 auth_cluster_required = >> cephx auth_service_required = cephx auth_client_required = cephx >> filestore_xattr_use_omap = true osd_pool_default_size = 2 >> osd_pool_default_min_size = 2 osd_pool_default_pg_num = 450 >> osd_pool_default_pgp_num = 450 max_open_files = 131072 >> >> [osd] osd_mkfs_type = xfs osd_op_threads = 8 osd_disk_threads = 4 >> osd_mount_options_xfs = >> "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M" >> >> >> on our traditional storage with Full SAS disk, same "dd" completes >> in 16s with an average write speed of 6Mbps.
>> >> Rados bench: >> >> rados bench -p rbd 10 write Maintaining 16 concurrent writes of >> 4194304 bytes for up to 10 seconds or 0 objects Object prefix: >> benchmark_data_ceph1_2977 sec Cur ops started finished avg MB/s >> cur MB/s last lat avg lat 0 0 0 0 >> 0 0 - 0 1 16 94 78 >> 311.821 312 0.041228 0.140132 2 16 192 176 >> 351.866 392 0.106294 0.175055 3 16 275 259 >> 345.216 332 0.076795 0.166036 4 16 302 286 >> 285.912 108 0.043888 0.196419 5 16 395 379 >> 303.11 372 0.126033 0.207488 6 16 501 485 >> 323.242 424 0.125972 0.194559 7 16 621 605 >> 345.621 480 0.194155 0.183123 8 16 730 714 >> 356.903 436 0.086678 0.176099 9 16 814 798 >> 354.572 336 0.081567 0.174786 10 16 832 >> 816 326.313 72 0.037431 0.182355 11 16 833 >> 817 297.013 4 0.533326 0.182784 Total time run: >> 11.489068 Total writes made: 833 Write size: >> 4194304 Bandwidth (MB/sec): 290.015 >> >> Stddev Bandwidth: 175.723 Max bandwidth (MB/sec): 480 Min >> bandwidth (MB/sec): 0 Average Latency: 0.220582 Stddev >> Latency: 0.343697 Max latency: 2.85104 Min >> latency: 0.035381 >> >> Our ultimate aim is to replace existing SAN with ceph,but for that >> it should meet minimum 8000 iops.Can any one help me with this,OSD >> are SSD,CPU has good clock speed,backend network is good but still >> we are not able to extract full capability of SSD disks. >> >> >> >> Thanks, > > Hi, i'm new to ceph so, don't consider my words as holy truth. > > It seems that Samsung 840 (so i assume 850) are crappy for ceph : > > MTBF : > http://lists.cep
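For reference, the suggestions above map onto roughly the following commands and settings; the device name, the monitor IP used as a ping target and the fio parameters are examples only:

echo noop > /sys/block/sdb/queue/scheduler     # set the I/O scheduler per data SSD

# in ceph.conf on the client:
[client]
rbd cache = true

ping -M do -s 8972 10.99.10.118                # a 9000-byte MTU path must pass 8972-byte unfragmented pings (28 bytes of headers)

fio --name=rbd-4k --directory=/mnt/rbd --rw=randwrite --bs=4k --direct=1 --size=1G --numjobs=4 --iodepth=32 --group_reporting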
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
>>But this was replication1? I never was able to do more than 30 000 with >>replication 3. Oh, sorry, it's was about read. for write, I think I was around 3iops with 3 nodes (2x4cores 2,1ghz each), cpu bound, with replication x1. with replication x3, around 9000iops. Going to test on 2x10cores 3,1ghz in some weeks. - Mail original - De: "Stefan Priebe" À: "aderumier" Cc: "mad Engineer" , "ceph-users" Envoyé: Samedi 28 Février 2015 13:42:54 Objet: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel > Am 28.02.2015 um 12:43 schrieb Alexandre DERUMIER : > > Hi, > > First, test if your ssd can write fast with O_DSYNC > check this blog: > http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/ > > > > Then, try with ceph Giant (or maybe wait for Hammer), because they are a lot > of optimisations for ssd for threads sharding. > > In my last test with giant, I was able to reach around 12iops with > 6osd/intel s3500 ssd, but I was cpu limited. But this was replication1? I never was able to do more than 30 000 with replication 3. Stefan > > - Mail original - > De: "mad Engineer" > À: "ceph-users" > Envoyé: Samedi 28 Février 2015 12:19:56 > Objet: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD > with 3.16-3 kernel > > Hello All, > > I am trying ceph-firefly 0.80.8 > (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all Samsung SSD > 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu 14.04 LTS > with 3.16-3 kernel.All are connected to 10G ports with maximum > MTU.There are no extra disks for journaling and also there are no > separate network for replication and data transfer.All 3 nodes are > also hosting monitoring process.Operating system runs on SATA disk. > > When doing a sequential benchmark using "dd" on RBD, mounted on client > as ext4 its taking 110s to write 100Mb data at an average speed of > 926Kbps. > > time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct > 25000+0 records in > 25000+0 records out > 10240 bytes (102 MB) copied, 110.582 s, 926 kB/s > > real 1m50.585s > user 0m0.106s > sys 0m2.233s > > While doing this directly on ssd mount point shows: > > time dd if=/dev/zero of=hello bs=4k count=25000 > oflag=direct > 25000+0 records in > 25000+0 records out > 10240 bytes (102 MB) copied, 1.38567 > s, 73.9 MB/s > > OSDs are in XFS with these extra arguments : > > rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M > > ceph.conf > > [global] > fsid = 7d889081-7826-439c-9fe5-d4e57480d9be > mon_initial_members = ceph1, ceph2, ceph3 > mon_host = 10.99.10.118,10.99.10.119,10.99.10.120 > auth_cluster_required = cephx > auth_service_required = cephx > auth_client_required = cephx > filestore_xattr_use_omap = true > osd_pool_default_size = 2 > osd_pool_default_min_size = 2 > osd_pool_default_pg_num = 450 > osd_pool_default_pgp_num = 450 > max_open_files = 131072 > > [osd] > osd_mkfs_type = xfs > osd_op_threads = 8 > osd_disk_threads = 4 > osd_mount_options_xfs = > "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M" > > > on our traditional storage with Full SAS disk, same "dd" completes in > 16s with an average write speed of 6Mbps. 
> > Rados bench: > > rados bench -p rbd 10 write > Maintaining 16 concurrent writes of 4194304 bytes for up to 10 > seconds or 0 objects > Object prefix: benchmark_data_ceph1_2977 > sec Cur ops started finished avg MB/s cur MB/s last lat avg lat > 0 0 0 0 0 0 - 0 > 1 16 94 78 311.821 312 0.041228 0.140132 > 2 16 192 176 351.866 392 0.106294 0.175055 > 3 16 275 259 345.216 332 0.076795 0.166036 > 4 16 302 286 285.912 108 0.043888 0.196419 > 5 16 395 379 303.11 372 0.126033 0.207488 > 6 16 501 485 323.242 424 0.125972 0.194559 > 7 16 621 605 345.621 480 0.194155 0.183123 > 8 16 730 714 356.903 436 0.086678 0.176099 > 9 16 814 798 354.572 336 0.081567 0.174786 > 10 16 832 816 326.313 72 0.037431 0.182355 > 11 16 833 817 297.013 4 0.533326 0.182784 > Total time run: 11.489068 > Total writes made: 833 > Write size: 4194304 > Bandwidth (MB/sec): 290.015 > > Stddev Bandwidth: 175.723 > Max bandwidth (MB/sec): 480 > Min bandwidth (MB/sec): 0 > Average Latency: 0.220582 > Stddev Latency: 0.343697 > Max latency: 2.85104 > Min latency: 0.035381 > > Our ultimate aim is to replace existing SAN with ceph,but for that it &g
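One way to repeat that replication comparison is to benchmark a throw-away pool at different sizes; the pool name and PG count below are arbitrary:

ceph osd pool create bench 128 128
ceph osd pool set bench size 1                 # re-run the bench with size 3 to see the replication cost
ceph osd pool set bench min_size 1
rados bench -p bench -b 4096 -t 32 30 write
ceph osd pool delete bench bench --yes-i-really-really-mean-it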
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
> Am 28.02.2015 um 12:43 schrieb Alexandre DERUMIER : > > Hi, > > First, test if your ssd can write fast with O_DSYNC > check this blog: > http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/ > > > Then, try with ceph Giant (or maybe wait for Hammer), because they are a lot > of optimisations for ssd for threads sharding. > > In my last test with giant, I was able to reach around 12iops with > 6osd/intel s3500 ssd, but I was cpu limited. But this was replication1? I never was able to do more than 30 000 with replication 3. Stefan > > - Mail original - > De: "mad Engineer" > À: "ceph-users" > Envoyé: Samedi 28 Février 2015 12:19:56 > Objet: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD > with 3.16-3 kernel > > Hello All, > > I am trying ceph-firefly 0.80.8 > (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all Samsung SSD > 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu 14.04 LTS > with 3.16-3 kernel.All are connected to 10G ports with maximum > MTU.There are no extra disks for journaling and also there are no > separate network for replication and data transfer.All 3 nodes are > also hosting monitoring process.Operating system runs on SATA disk. > > When doing a sequential benchmark using "dd" on RBD, mounted on client > as ext4 its taking 110s to write 100Mb data at an average speed of > 926Kbps. > > time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct > 25000+0 records in > 25000+0 records out > 10240 bytes (102 MB) copied, 110.582 s, 926 kB/s > > real 1m50.585s > user 0m0.106s > sys 0m2.233s > > While doing this directly on ssd mount point shows: > > time dd if=/dev/zero of=hello bs=4k count=25000 > oflag=direct > 25000+0 records in > 25000+0 records out > 10240 bytes (102 MB) copied, 1.38567 > s, 73.9 MB/s > > OSDs are in XFS with these extra arguments : > > rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M > > ceph.conf > > [global] > fsid = 7d889081-7826-439c-9fe5-d4e57480d9be > mon_initial_members = ceph1, ceph2, ceph3 > mon_host = 10.99.10.118,10.99.10.119,10.99.10.120 > auth_cluster_required = cephx > auth_service_required = cephx > auth_client_required = cephx > filestore_xattr_use_omap = true > osd_pool_default_size = 2 > osd_pool_default_min_size = 2 > osd_pool_default_pg_num = 450 > osd_pool_default_pgp_num = 450 > max_open_files = 131072 > > [osd] > osd_mkfs_type = xfs > osd_op_threads = 8 > osd_disk_threads = 4 > osd_mount_options_xfs = > "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M" > > > on our traditional storage with Full SAS disk, same "dd" completes in > 16s with an average write speed of 6Mbps. 
> > Rados bench: > > rados bench -p rbd 10 write > Maintaining 16 concurrent writes of 4194304 bytes for up to 10 > seconds or 0 objects > Object prefix: benchmark_data_ceph1_2977 > sec Cur ops started finished avg MB/s cur MB/s last lat avg lat > 0 0 0 0 0 0 - 0 > 1 16 94 78 311.821 312 0.041228 0.140132 > 2 16 192 176 351.866 392 0.106294 0.175055 > 3 16 275 259 345.216 332 0.076795 0.166036 > 4 16 302 286 285.912 108 0.043888 0.196419 > 5 16 395 379 303.11 372 0.126033 0.207488 > 6 16 501 485 323.242 424 0.125972 0.194559 > 7 16 621 605 345.621 480 0.194155 0.183123 > 8 16 730 714 356.903 436 0.086678 0.176099 > 9 16 814 798 354.572 336 0.081567 0.174786 > 10 16 832 816 326.313 72 0.037431 0.182355 > 11 16 833 817 297.013 4 0.533326 0.182784 > Total time run: 11.489068 > Total writes made: 833 > Write size: 4194304 > Bandwidth (MB/sec): 290.015 > > Stddev Bandwidth: 175.723 > Max bandwidth (MB/sec): 480 > Min bandwidth (MB/sec): 0 > Average Latency: 0.220582 > Stddev Latency: 0.343697 > Max latency: 2.85104 > Min latency: 0.035381 > > Our ultimate aim is to replace existing SAN with ceph,but for that it > should meet minimum 8000 iops.Can any one help me with this,OSD are > SSD,CPU has good clock speed,backend network is good but still we are > not able to extract full capability of SSD disks. > > > > Thanks, > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
Thanks for the reply Philippe,we were using these disks in our NAS,now it looks like i am in big trouble :-( On Sat, Feb 28, 2015 at 5:02 PM, Philippe Schwarz wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > Le 28/02/2015 12:19, mad Engineer a écrit : >> Hello All, >> >> I am trying ceph-firefly 0.80.8 >> (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all Samsung >> SSD 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu >> 14.04 LTS with 3.16-3 kernel.All are connected to 10G ports with >> maximum MTU.There are no extra disks for journaling and also there >> are no separate network for replication and data transfer.All 3 >> nodes are also hosting monitoring process.Operating system runs on >> SATA disk. >> >> When doing a sequential benchmark using "dd" on RBD, mounted on >> client as ext4 its taking 110s to write 100Mb data at an average >> speed of 926Kbps. >> >> time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct >> 25000+0 records in 25000+0 records out 10240 bytes (102 MB) >> copied, 110.582 s, 926 kB/s >> >> real1m50.585s user0m0.106s sys 0m2.233s >> >> While doing this directly on ssd mount point shows: >> >> time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct >> 25000+0 records in 25000+0 records out 10240 bytes (102 MB) >> copied, 1.38567 s, 73.9 MB/s >> >> OSDs are in XFS with these extra arguments : >> >> rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M >> >> ceph.conf >> >> [global] fsid = 7d889081-7826-439c-9fe5-d4e57480d9be >> mon_initial_members = ceph1, ceph2, ceph3 mon_host = >> 10.99.10.118,10.99.10.119,10.99.10.120 auth_cluster_required = >> cephx auth_service_required = cephx auth_client_required = cephx >> filestore_xattr_use_omap = true osd_pool_default_size = 2 >> osd_pool_default_min_size = 2 osd_pool_default_pg_num = 450 >> osd_pool_default_pgp_num = 450 max_open_files = 131072 >> >> [osd] osd_mkfs_type = xfs osd_op_threads = 8 osd_disk_threads = 4 >> osd_mount_options_xfs = >> "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M" >> >> >> on our traditional storage with Full SAS disk, same "dd" completes >> in 16s with an average write speed of 6Mbps. >> >> Rados bench: >> >> rados bench -p rbd 10 write Maintaining 16 concurrent writes of >> 4194304 bytes for up to 10 seconds or 0 objects Object prefix: >> benchmark_data_ceph1_2977 sec Cur ops started finished avg MB/s >> cur MB/s last lat avg lat 0 0 0 0 >> 0 0 - 0 1 169478 >> 311.821 312 0.041228 0.140132 2 16 192 176 >> 351.866 392 0.106294 0.175055 3 16 275 259 >> 345.216 332 0.076795 0.166036 4 16 302 286 >> 285.912 108 0.043888 0.196419 5 16 395 379 >> 303.11 372 0.126033 0.207488 6 16 501 485 >> 323.242 424 0.125972 0.194559 7 16 621 605 >> 345.621 480 0.194155 0.183123 8 16 730 714 >> 356.903 436 0.086678 0.176099 9 16 814 798 >> 354.572 336 0.081567 0.174786 10 16 832 >> 816 326.31372 0.037431 0.182355 11 16 833 >> 817 297.013 4 0.533326 0.182784 Total time run: >> 11.489068 Total writes made: 833 Write size: >> 4194304 Bandwidth (MB/sec): 290.015 >> >> Stddev Bandwidth: 175.723 Max bandwidth (MB/sec): 480 Min >> bandwidth (MB/sec): 0 Average Latency:0.220582 Stddev >> Latency: 0.343697 Max latency:2.85104 Min >> latency:0.035381 >> >> Our ultimate aim is to replace existing SAN with ceph,but for that >> it should meet minimum 8000 iops.Can any one help me with this,OSD >> are SSD,CPU has good clock speed,backend network is good but still >> we are not able to extract full capability of SSD disks. 
>> >> >> >> Thanks, > > Hi, i'm new to ceph so, don't consider my words as holy truth. > > It seems that Samsung 840 (so i assume 850) are crappy for ceph : > > MTBF : > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-November/044258.html > Bandwidth > :http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-December/045247.html > > And according to a confirmed user of Ceph/ProxmoX, Samsung SSDs should > be avoided if possible in ceph storage. > > Apart from that, it seems there was an limitation in ceph for the use > of the complete bandwidth available in SSDs; but i think with less > than 1Mb/s you haven't hit this limit. > > I remind you that i'm not a ceph-guru (far from that, indeed), so feel > free to disagree; i'm on the way to improve my knowledge. > > Best regards. > > > > > -BEGIN PGP SIGNATURE- > Version: GnuPG v1 > > iEYEARECAAYFAlTxp0UACgkQlhqCFkbqHRb5+wCgrXCM3VsnVE6PCbbpOmQXCXbr > 8u0An2BUgZWismSK0PxbwVDOD5+/UWik > =0o0v > -END PGP SIGNATURE- ___ ceph-users mailing list ceph-users@lists.ceph.c
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
Hi, I cannot recognize that picture; we've been using samsumg 840 pro in production for almost 2 years now - and have had 1 fail. We run a 8node mixed ssd/platter cluster with 4x samsung 840 pro (500gb) in each so that is 32x ssd. They've written ~25TB data in avg each. Using the dd you had inside an existing semi-busy mysql-guest I get: 10240 bytes (102 MB) copied, 5.58218 s, 18.3 MB/s Which is still not a lot, but I think it is more a limitation of our setup/load. We are using dumpling. All that aside, I would prob. go with something tried and tested if I was to redo it today - we haven't had any issues, but it is still nice to use something you know should have a baseline performance and can compare to that. Cheers, Martin On Sat, Feb 28, 2015 at 12:32 PM, Philippe Schwarz wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > Le 28/02/2015 12:19, mad Engineer a écrit : > > Hello All, > > > > I am trying ceph-firefly 0.80.8 > > (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all Samsung > > SSD 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu > > 14.04 LTS with 3.16-3 kernel.All are connected to 10G ports with > > maximum MTU.There are no extra disks for journaling and also there > > are no separate network for replication and data transfer.All 3 > > nodes are also hosting monitoring process.Operating system runs on > > SATA disk. > > > > When doing a sequential benchmark using "dd" on RBD, mounted on > > client as ext4 its taking 110s to write 100Mb data at an average > > speed of 926Kbps. > > > > time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct > > 25000+0 records in 25000+0 records out 10240 bytes (102 MB) > > copied, 110.582 s, 926 kB/s > > > > real1m50.585s user0m0.106s sys 0m2.233s > > > > While doing this directly on ssd mount point shows: > > > > time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct > > 25000+0 records in 25000+0 records out 10240 bytes (102 MB) > > copied, 1.38567 s, 73.9 MB/s > > > > OSDs are in XFS with these extra arguments : > > > > rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M > > > > ceph.conf > > > > [global] fsid = 7d889081-7826-439c-9fe5-d4e57480d9be > > mon_initial_members = ceph1, ceph2, ceph3 mon_host = > > 10.99.10.118,10.99.10.119,10.99.10.120 auth_cluster_required = > > cephx auth_service_required = cephx auth_client_required = cephx > > filestore_xattr_use_omap = true osd_pool_default_size = 2 > > osd_pool_default_min_size = 2 osd_pool_default_pg_num = 450 > > osd_pool_default_pgp_num = 450 max_open_files = 131072 > > > > [osd] osd_mkfs_type = xfs osd_op_threads = 8 osd_disk_threads = 4 > > osd_mount_options_xfs = > > "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M" > > > > > > on our traditional storage with Full SAS disk, same "dd" completes > > in 16s with an average write speed of 6Mbps. 
> > > > Rados bench: > > > > rados bench -p rbd 10 write Maintaining 16 concurrent writes of > > 4194304 bytes for up to 10 seconds or 0 objects Object prefix: > > benchmark_data_ceph1_2977 sec Cur ops started finished avg MB/s > > cur MB/s last lat avg lat 0 0 0 0 > > 0 0 - 0 1 169478 > > 311.821 312 0.041228 0.140132 2 16 192 176 > > 351.866 392 0.106294 0.175055 3 16 275 259 > > 345.216 332 0.076795 0.166036 4 16 302 286 > > 285.912 108 0.043888 0.196419 5 16 395 379 > > 303.11 372 0.126033 0.207488 6 16 501 485 > > 323.242 424 0.125972 0.194559 7 16 621 605 > > 345.621 480 0.194155 0.183123 8 16 730 714 > > 356.903 436 0.086678 0.176099 9 16 814 798 > > 354.572 336 0.081567 0.174786 10 16 832 > > 816 326.31372 0.037431 0.182355 11 16 833 > > 817 297.013 4 0.533326 0.182784 Total time run: > > 11.489068 Total writes made: 833 Write size: > > 4194304 Bandwidth (MB/sec): 290.015 > > > > Stddev Bandwidth: 175.723 Max bandwidth (MB/sec): 480 Min > > bandwidth (MB/sec): 0 Average Latency:0.220582 Stddev > > Latency: 0.343697 Max latency:2.85104 Min > > latency:0.035381 > > > > Our ultimate aim is to replace existing SAN with ceph,but for that > > it should meet minimum 8000 iops.Can any one help me with this,OSD > > are SSD,CPU has good clock speed,backend network is good but still > > we are not able to extract full capability of SSD disks. > > > > > > > > Thanks, > > Hi, i'm new to ceph so, don't consider my words as holy truth. > > It seems that Samsung 840 (so i assume 850) are crappy for ceph : > > MTBF : > > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-November/044258.html > Bandwidth > : > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-December/045247.html > > And according to
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
Hi, First, test if your ssd can write fast with O_DSYNC check this blog: http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/ Then, try with ceph Giant (or maybe wait for Hammer), because they are a lot of optimisations for ssd for threads sharding. In my last test with giant, I was able to reach around 12iops with 6osd/intel s3500 ssd, but I was cpu limited. - Mail original - De: "mad Engineer" À: "ceph-users" Envoyé: Samedi 28 Février 2015 12:19:56 Objet: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel Hello All, I am trying ceph-firefly 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all Samsung SSD 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu 14.04 LTS with 3.16-3 kernel.All are connected to 10G ports with maximum MTU.There are no extra disks for journaling and also there are no separate network for replication and data transfer.All 3 nodes are also hosting monitoring process.Operating system runs on SATA disk. When doing a sequential benchmark using "dd" on RBD, mounted on client as ext4 its taking 110s to write 100Mb data at an average speed of 926Kbps. time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct 25000+0 records in 25000+0 records out 10240 bytes (102 MB) copied, 110.582 s, 926 kB/s real 1m50.585s user 0m0.106s sys 0m2.233s While doing this directly on ssd mount point shows: time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct 25000+0 records in 25000+0 records out 10240 bytes (102 MB) copied, 1.38567 s, 73.9 MB/s OSDs are in XFS with these extra arguments : rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M ceph.conf [global] fsid = 7d889081-7826-439c-9fe5-d4e57480d9be mon_initial_members = ceph1, ceph2, ceph3 mon_host = 10.99.10.118,10.99.10.119,10.99.10.120 auth_cluster_required = cephx auth_service_required = cephx auth_client_required = cephx filestore_xattr_use_omap = true osd_pool_default_size = 2 osd_pool_default_min_size = 2 osd_pool_default_pg_num = 450 osd_pool_default_pgp_num = 450 max_open_files = 131072 [osd] osd_mkfs_type = xfs osd_op_threads = 8 osd_disk_threads = 4 osd_mount_options_xfs = "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M" on our traditional storage with Full SAS disk, same "dd" completes in 16s with an average write speed of 6Mbps. 
Rados bench: rados bench -p rbd 10 write Maintaining 16 concurrent writes of 4194304 bytes for up to 10 seconds or 0 objects Object prefix: benchmark_data_ceph1_2977 sec Cur ops started finished avg MB/s cur MB/s last lat avg lat 0 0 0 0 0 0 - 0 1 16 94 78 311.821 312 0.041228 0.140132 2 16 192 176 351.866 392 0.106294 0.175055 3 16 275 259 345.216 332 0.076795 0.166036 4 16 302 286 285.912 108 0.043888 0.196419 5 16 395 379 303.11 372 0.126033 0.207488 6 16 501 485 323.242 424 0.125972 0.194559 7 16 621 605 345.621 480 0.194155 0.183123 8 16 730 714 356.903 436 0.086678 0.176099 9 16 814 798 354.572 336 0.081567 0.174786 10 16 832 816 326.313 72 0.037431 0.182355 11 16 833 817 297.013 4 0.533326 0.182784 Total time run: 11.489068 Total writes made: 833 Write size: 4194304 Bandwidth (MB/sec): 290.015 Stddev Bandwidth: 175.723 Max bandwidth (MB/sec): 480 Min bandwidth (MB/sec): 0 Average Latency: 0.220582 Stddev Latency: 0.343697 Max latency: 2.85104 Min latency: 0.035381 Our ultimate aim is to replace existing SAN with ceph,but for that it should meet minimum 8000 iops.Can any one help me with this,OSD are SSD,CPU has good clock speed,backend network is good but still we are not able to extract full capability of SSD disks. Thanks, ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
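Since the limiting factor in that giant test was CPU rather than the SSDs, it can be worth watching the ceph-osd daemons while a benchmark runs; a simple sketch (pidstat comes from the sysstat package):

pidstat -u -p $(pgrep -d, ceph-osd) 1          # per-process CPU usage, refreshed every second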
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Le 28/02/2015 12:19, mad Engineer a écrit : > Hello All, > > I am trying ceph-firefly 0.80.8 > (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all Samsung > SSD 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu > 14.04 LTS with 3.16-3 kernel.All are connected to 10G ports with > maximum MTU.There are no extra disks for journaling and also there > are no separate network for replication and data transfer.All 3 > nodes are also hosting monitoring process.Operating system runs on > SATA disk. > > When doing a sequential benchmark using "dd" on RBD, mounted on > client as ext4 its taking 110s to write 100Mb data at an average > speed of 926Kbps. > > time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct > 25000+0 records in 25000+0 records out 10240 bytes (102 MB) > copied, 110.582 s, 926 kB/s > > real1m50.585s user0m0.106s sys 0m2.233s > > While doing this directly on ssd mount point shows: > > time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct > 25000+0 records in 25000+0 records out 10240 bytes (102 MB) > copied, 1.38567 s, 73.9 MB/s > > OSDs are in XFS with these extra arguments : > > rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M > > ceph.conf > > [global] fsid = 7d889081-7826-439c-9fe5-d4e57480d9be > mon_initial_members = ceph1, ceph2, ceph3 mon_host = > 10.99.10.118,10.99.10.119,10.99.10.120 auth_cluster_required = > cephx auth_service_required = cephx auth_client_required = cephx > filestore_xattr_use_omap = true osd_pool_default_size = 2 > osd_pool_default_min_size = 2 osd_pool_default_pg_num = 450 > osd_pool_default_pgp_num = 450 max_open_files = 131072 > > [osd] osd_mkfs_type = xfs osd_op_threads = 8 osd_disk_threads = 4 > osd_mount_options_xfs = > "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M" > > > on our traditional storage with Full SAS disk, same "dd" completes > in 16s with an average write speed of 6Mbps. > > Rados bench: > > rados bench -p rbd 10 write Maintaining 16 concurrent writes of > 4194304 bytes for up to 10 seconds or 0 objects Object prefix: > benchmark_data_ceph1_2977 sec Cur ops started finished avg MB/s > cur MB/s last lat avg lat 0 0 0 0 > 0 0 - 0 1 169478 > 311.821 312 0.041228 0.140132 2 16 192 176 > 351.866 392 0.106294 0.175055 3 16 275 259 > 345.216 332 0.076795 0.166036 4 16 302 286 > 285.912 108 0.043888 0.196419 5 16 395 379 > 303.11 372 0.126033 0.207488 6 16 501 485 > 323.242 424 0.125972 0.194559 7 16 621 605 > 345.621 480 0.194155 0.183123 8 16 730 714 > 356.903 436 0.086678 0.176099 9 16 814 798 > 354.572 336 0.081567 0.174786 10 16 832 > 816 326.31372 0.037431 0.182355 11 16 833 > 817 297.013 4 0.533326 0.182784 Total time run: > 11.489068 Total writes made: 833 Write size: > 4194304 Bandwidth (MB/sec): 290.015 > > Stddev Bandwidth: 175.723 Max bandwidth (MB/sec): 480 Min > bandwidth (MB/sec): 0 Average Latency:0.220582 Stddev > Latency: 0.343697 Max latency:2.85104 Min > latency:0.035381 > > Our ultimate aim is to replace existing SAN with ceph,but for that > it should meet minimum 8000 iops.Can any one help me with this,OSD > are SSD,CPU has good clock speed,backend network is good but still > we are not able to extract full capability of SSD disks. > > > > Thanks, Hi, i'm new to ceph so, don't consider my words as holy truth. 
It seems that Samsung 840 (so i assume 850) are crappy for ceph : MTBF : http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-November/044258.html Bandwidth :http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-December/045247.html And according to a confirmed user of Ceph/ProxmoX, Samsung SSDs should be avoided if possible in ceph storage. Apart from that, it seems there was an limitation in ceph for the use of the complete bandwidth available in SSDs; but i think with less than 1Mb/s you haven't hit this limit. I remind you that i'm not a ceph-guru (far from that, indeed), so feel free to disagree; i'm on the way to improve my knowledge. Best regards. -BEGIN PGP SIGNATURE- Version: GnuPG v1 iEYEARECAAYFAlTxp0UACgkQlhqCFkbqHRb5+wCgrXCM3VsnVE6PCbbpOmQXCXbr 8u0An2BUgZWismSK0PxbwVDOD5+/UWik =0o0v -END PGP SIGNATURE- ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
Hello All, I am trying ceph-firefly 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all Samsung SSD 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu 14.04 LTS with 3.16-3 kernel.All are connected to 10G ports with maximum MTU.There are no extra disks for journaling and also there are no separate network for replication and data transfer.All 3 nodes are also hosting monitoring process.Operating system runs on SATA disk. When doing a sequential benchmark using "dd" on RBD, mounted on client as ext4 its taking 110s to write 100Mb data at an average speed of 926Kbps. time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct 25000+0 records in 25000+0 records out 10240 bytes (102 MB) copied, 110.582 s, 926 kB/s real 1m50.585s user 0m0.106s sys 0m2.233s While doing this directly on ssd mount point shows: time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct 25000+0 records in 25000+0 records out 10240 bytes (102 MB) copied, 1.38567 s, 73.9 MB/s OSDs are in XFS with these extra arguments : rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M ceph.conf [global] fsid = 7d889081-7826-439c-9fe5-d4e57480d9be mon_initial_members = ceph1, ceph2, ceph3 mon_host = 10.99.10.118,10.99.10.119,10.99.10.120 auth_cluster_required = cephx auth_service_required = cephx auth_client_required = cephx filestore_xattr_use_omap = true osd_pool_default_size = 2 osd_pool_default_min_size = 2 osd_pool_default_pg_num = 450 osd_pool_default_pgp_num = 450 max_open_files = 131072 [osd] osd_mkfs_type = xfs osd_op_threads = 8 osd_disk_threads = 4 osd_mount_options_xfs = "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M" on our traditional storage with Full SAS disk, same "dd" completes in 16s with an average write speed of 6Mbps. Rados bench: rados bench -p rbd 10 write Maintaining 16 concurrent writes of 4194304 bytes for up to 10 seconds or 0 objects Object prefix: benchmark_data_ceph1_2977 sec Cur ops started finished avg MB/s cur MB/s last lat avg lat 0 0 0 0 0 0 - 0 1 16 94 78 311.821 312 0.041228 0.140132 2 16 192 176 351.866 392 0.106294 0.175055 3 16 275 259 345.216 332 0.076795 0.166036 4 16 302 286 285.912 108 0.043888 0.196419 5 16 395 379 303.11 372 0.126033 0.207488 6 16 501 485 323.242 424 0.125972 0.194559 7 16 621 605 345.621 480 0.194155 0.183123 8 16 730 714 356.903 436 0.086678 0.176099 9 16 814 798 354.572 336 0.081567 0.174786 10 16 832 816 326.313 72 0.037431 0.182355 11 16 833 817 297.013 4 0.533326 0.182784 Total time run: 11.489068 Total writes made: 833 Write size: 4194304 Bandwidth (MB/sec): 290.015 Stddev Bandwidth: 175.723 Max bandwidth (MB/sec): 480 Min bandwidth (MB/sec): 0 Average Latency: 0.220582 Stddev Latency: 0.343697 Max latency: 2.85104 Min latency: 0.035381 Our ultimate aim is to replace existing SAN with ceph,but for that it should meet minimum 8000 iops.Can any one help me with this,OSD are SSD,CPU has good clock speed,backend network is good but still we are not able to extract full capability of SSD disks. Thanks, ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
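Since the target is a minimum of 8000 iops and the rados bench above uses the default 4MB objects (which measures bandwidth rather than small-block IOPS), it is probably more telling to benchmark 4k writes directly; the concurrency values and the test file path below are only examples:

rados bench -p rbd -b 4096 -t 32 30 write

fio --name=rbd-iops --filename=/mnt/rbd/fio-test --rw=randwrite --bs=4k --direct=1 --size=2G --iodepth=32 --numjobs=4 --group_reporting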