Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-03-09 Thread Nick Fisk
Can you run the fio test again, but with a queue depth of 32? That will probably 
show what your cluster is capable of. Adding more nodes with SSDs will probably 
help it scale, but only at higher IO depths. At low queue depths you are 
probably already at the limit, as per my earlier email.
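
Something along these lines would do it (the RBD device path and runtime are 
only examples here, adjust them to your environment, and note that writing to 
the device directly is destructive):

fio --name=qd32-test --filename=/dev/rbd0 --rw=randwrite --bs=4k \
    --ioengine=libaio --direct=1 --iodepth=32 --numjobs=1 \
    --runtime=60 --time_based --group_reporting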


From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of mad 
Engineer
Sent: 09 March 2015 17:23
To: Nick Fisk
Cc: ceph-users
Subject: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 
OSD with 3.16-3 kernel

Thank you Nick for explaining the problem with 4k writes. The queue depth used in 
this setup is 256, the maximum supported.
Can you clarify that adding more nodes will not increase IOPS? In general, how 
can we increase the IOPS of a Ceph cluster?

Thanks for your help


On Sat, Mar 7, 2015 at 5:57 PM, Nick Fisk  wrote:
You are hitting serial latency limits. For a 4kb sync write to happen it has 
to:-

1. Travel across network from client to Primary OSD
2. Be processed by Ceph
3. Get Written to Pri OSD
4. Ack travels across network to client

At 4kb these 4 steps take up a very high percentage of the actual processing 
time compared to the actual write to the SSD. Apart from faster (higher GHz) 
CPUs, which will improve step 2, there's not much that can be done. Future Ceph 
releases may improve step 2 as well, but I wouldn't imagine it will change 
dramatically.

Replication level >1 will also see the IOPS drop, as you are introducing yet 
more Ceph processing and network delays - unless a future Ceph feature is 
implemented where the ack is returned to the client once data has hit the 1st OSD.

Still, 1000 IOPS is not that bad. You mention it needs to achieve 8000 IOPS 
to replace your existing SAN; at what queue depth is that required? You are 
getting way above that at a queue depth of only 16.

I doubt most Ethernet-based enterprise SANs would be able to provide 8000 IOPS 
at a queue depth of 1, as network delays alone would limit you to around 
that figure. A network delay of 0.1ms will limit you to 10,000 IOPS, 0.2ms to 
5,000 IOPS, and so on.

If you really do need pure SSD performance for a certain client, you will need 
to move the SSD local to it using some sort of caching software running on the 
client, although this can bring its own challenges.

Nick

> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> mad Engineer
> Sent: 07 March 2015 10:55
> To: Somnath Roy
> Cc: ceph-users
> Subject: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9
> OSD with 3.16-3 kernel
>
> Update:
>
> Hardware:
> Upgraded RAID controller to LSI Megaraid 9341 -12Gbps
> 3 Samsung 840 EVO - was showing 45K iops for fio test with 7 threads and 4k
> block size in JBOD mode
> CPU- 16 cores @2.27Ghz
> RAM- 24Gb
> NIC- 10Gbits with under 1 ms latency, iperf shows 9.18 Gbps between host
> and client
>
>  Software
> Ubuntu 14.04 with stock kernel 3.13-
> Upgraded from firefly to giant [ceph version 0.87.1
> (283c2e7cfa2457799f534744d7d549f83ea1335e)]
> Changed file system to btrfs and i/o scheduler to noop.
>
> Ceph Setup
> replication to 1 and using 2 SSD OSD and 1 SSD for Journal.All are samsung 840
> EVO in JBOD mode on single server.
>
> Configuration:
> [global]
> fsid = 979f32fc-6f31-43b0-832f-29fcc4c5a648
> mon_initial_members = ceph1
> mon_host = 10.99.10.118
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
> osd_pool_default_size = 1
> osd_pool_default_min_size = 1
> osd_pool_default_pg_num = 250
> osd_pool_default_pgp_num = 250
> debug_lockdep = 0/0
> debug_context = 0/0
> debug_crush = 0/0
> debug_buffer = 0/0
> debug_timer = 0/0
> debug_filer = 0/0
> debug_objecter = 0/0
> debug_rados = 0/0
> debug_rbd = 0/0
> debug_journaler = 0/0
> debug_objectcatcher = 0/0
> debug_client = 0/0
> debug_osd = 0/0
> debug_optracker = 0/0
> debug_objclass = 0/0
> debug_filestore = 0/0
> debug_journal = 0/0
> debug_ms = 0/0
> debug_monc = 0/0
> debug_tp = 0/0
> debug_auth = 0/0
> debug_finisher = 0/0
> debug_heartbeatmap = 0/0
> debug_perfcounter = 0/0
> debug_asok = 0/0
> debug_throttle = 0/0
> debug_mon = 0/0
> debug_paxos = 0/0
> debug_rgw = 0/0
>
> [client]
> rbd_cache = true
>
> Client
> Ubuntu 14.04 with 16 Core @2.53 Ghz and 24G RAM
>
> Results
> rados bench -p rdp -b 4096 -t 16 10 write
>
> rados bench -p rbd -b 4096 -t 16 10 write
>  Maintaining 16 concurrent writes of 4096 bytes for up to 10 seconds or 0
> objects
>  Object prefix: benchmark_data_ubuntucompute_3931
>sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>  0   0 0 0 

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-03-09 Thread mad Engineer
Thank you Nick for explaining the problem with 4k writes. The queue depth used
in this setup is 256, the maximum supported.
Can you clarify that adding more nodes will not increase IOPS? In general,
how can we increase the IOPS of a Ceph cluster?

Thanks for your help


On Sat, Mar 7, 2015 at 5:57 PM, Nick Fisk  wrote:

> You are hitting serial latency limits. For a 4kb sync write to happen it
> has to:-
>
> 1. Travel across network from client to Primary OSD
> 2. Be processed by Ceph
> 3. Get Written to Pri OSD
> 4. Ack travels across network to client
>
> At 4kb these 4 steps take up a very high percentage of the actual
> processing time as compared to the actual write to the SSD. Apart from
> faster (more ghz) CPU's which will improve step 2, there's not much that
> can be done. Future Ceph releases may improve step 2 as well, but I wouldn't
> imagine it will change dramatically.
>
> Replication level >1 will also see the IOPs drop as you are introducing
> yet more ceph processing and network delays. Unless a future Ceph feature
> can be implemented where it returns the ack to client once data has hit the
> 1st OSD.
>
> Still a 1000 iops, is not that bad.  You mention it needs to achieve 8000
> iops to replace your existing SAN, at what queue depth is this required?
> You are getting way above that at a queue depth of only 16.
>
> I doubt most Ethernet based enterprise SANs would be able to provide 8000
> iops at a queue depth of 1, as just network delays would be limiting you to
> around that figure. A network delay of .1ms will limit you to 10,000 IOPs,
> .2ms = 5000IOPs and so on.
>
> If you really do need pure SSD performance for a certain client you will
> need to move the SSD local to it using some sort of caching software
> running on the client , although this can bring its own challenges.
>
> Nick
>
> > -Original Message-
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> > mad Engineer
> > Sent: 07 March 2015 10:55
> > To: Somnath Roy
> > Cc: ceph-users
> > Subject: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes
> and 9
> > OSD with 3.16-3 kernel
> >
> > Update:
> >
> > Hardware:
> > Upgraded RAID controller to LSI Megaraid 9341 -12Gbps
> > 3 Samsung 840 EVO - was showing 45K iops for fio test with 7 threads and
> 4k
> > block size in JBOD mode
> > CPU- 16 cores @2.27Ghz
> > RAM- 24Gb
> > NIC- 10Gbits with under 1 ms latency, iperf shows 9.18 Gbps between host
> > and client
> >
> >  Software
> > Ubuntu 14.04 with stock kernel 3.13-
> > Upgraded from firefly to giant [ceph version 0.87.1
> > (283c2e7cfa2457799f534744d7d549f83ea1335e)]
> > Changed file system to btrfs and i/o scheduler to noop.
> >
> > Ceph Setup
> > replication to 1 and using 2 SSD OSD and 1 SSD for Journal.All are
> samsung 840
> > EVO in JBOD mode on single server.
> >
> > Configuration:
> > [global]
> > fsid = 979f32fc-6f31-43b0-832f-29fcc4c5a648
> > mon_initial_members = ceph1
> > mon_host = 10.99.10.118
> > auth_cluster_required = cephx
> > auth_service_required = cephx
> > auth_client_required = cephx
> > filestore_xattr_use_omap = true
> > osd_pool_default_size = 1
> > osd_pool_default_min_size = 1
> > osd_pool_default_pg_num = 250
> > osd_pool_default_pgp_num = 250
> > debug_lockdep = 0/0
> > debug_context = 0/0
> > debug_crush = 0/0
> > debug_buffer = 0/0
> > debug_timer = 0/0
> > debug_filer = 0/0
> > debug_objecter = 0/0
> > debug_rados = 0/0
> > debug_rbd = 0/0
> > debug_journaler = 0/0
> > debug_objectcatcher = 0/0
> > debug_client = 0/0
> > debug_osd = 0/0
> > debug_optracker = 0/0
> > debug_objclass = 0/0
> > debug_filestore = 0/0
> > debug_journal = 0/0
> > debug_ms = 0/0
> > debug_monc = 0/0
> > debug_tp = 0/0
> > debug_auth = 0/0
> > debug_finisher = 0/0
> > debug_heartbeatmap = 0/0
> > debug_perfcounter = 0/0
> > debug_asok = 0/0
> > debug_throttle = 0/0
> > debug_mon = 0/0
> > debug_paxos = 0/0
> > debug_rgw = 0/0
> >
> > [client]
> > rbd_cache = true
> >
> > Client
> > Ubuntu 14.04 with 16 Core @2.53 Ghz and 24G RAM
> >
> > Results
> > rados bench -p rdp -b 4096 -t 16 10 write
> >
> > rados bench -p rbd -b 4096 -t 16 10 write
> >  Maintaining 16 concurrent writes of 4096 bytes for up to 10 seconds or 0
> > objects
> >  Object prefix: benchmark_data_ubuntucompute_3931
> >sec Cur ops   started  finished  avg MB/s  cu

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-03-07 Thread Nick Fisk
You are hitting serial latency limits. For a 4kb sync write to happen it has 
to:-

1. Travel across network from client to Primary OSD
2. Be processed by Ceph
3. Get Written to Pri OSD
4. Ack travels across network to client

At 4kb these 4 steps take up a very high percentage of the actual processing 
time compared to the actual write to the SSD. Apart from faster (higher GHz) 
CPUs, which will improve step 2, there's not much that can be done. Future Ceph 
releases may improve step 2 as well, but I wouldn't imagine it will change 
dramatically.

Replication level >1 will also see the IOPS drop, as you are introducing yet 
more Ceph processing and network delays - unless a future Ceph feature is 
implemented where the ack is returned to the client once data has hit the 1st OSD.

Still, 1000 IOPS is not that bad. You mention it needs to achieve 8000 IOPS 
to replace your existing SAN; at what queue depth is that required? You are 
getting way above that at a queue depth of only 16.

I doubt most Ethernet-based enterprise SANs would be able to provide 8000 IOPS 
at a queue depth of 1, as network delays alone would limit you to around 
that figure. A network delay of 0.1ms will limit you to 10,000 IOPS, 0.2ms to 
5,000 IOPS, and so on.
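
To spell the arithmetic out: at queue depth 1 a new write cannot start until the 
previous ack has come back, so the ceiling is roughly 1 / round-trip latency. 
0.1ms = 0.0001s per op gives 1 / 0.0001 = 10,000 IOPS, and 0.2ms gives 5,000 IOPS. 
At a queue depth of QD there can be QD writes in flight, so the ceiling becomes 
roughly QD / latency, which is why deeper queues hide the per-op latency.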

If you really do need pure SSD performance for a certain client, you will need 
to move the SSD local to it using some sort of caching software running on the 
client, although this can bring its own challenges.

Nick

> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> mad Engineer
> Sent: 07 March 2015 10:55
> To: Somnath Roy
> Cc: ceph-users
> Subject: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9
> OSD with 3.16-3 kernel
> 
> Update:
> 
> Hardware:
> Upgraded RAID controller to LSI Megaraid 9341 -12Gbps
> 3 Samsung 840 EVO - was showing 45K iops for fio test with 7 threads and 4k
> block size in JBOD mode
> CPU- 16 cores @2.27Ghz
> RAM- 24Gb
> NIC- 10Gbits with under 1 ms latency, iperf shows 9.18 Gbps between host
> and client
> 
>  Software
> Ubuntu 14.04 with stock kernel 3.13-
> Upgraded from firefly to giant [ceph version 0.87.1
> (283c2e7cfa2457799f534744d7d549f83ea1335e)]
> Changed file system to btrfs and i/o scheduler to noop.
> 
> Ceph Setup
> replication to 1 and using 2 SSD OSD and 1 SSD for Journal.All are samsung 840
> EVO in JBOD mode on single server.
> 
> Configuration:
> [global]
> fsid = 979f32fc-6f31-43b0-832f-29fcc4c5a648
> mon_initial_members = ceph1
> mon_host = 10.99.10.118
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
> osd_pool_default_size = 1
> osd_pool_default_min_size = 1
> osd_pool_default_pg_num = 250
> osd_pool_default_pgp_num = 250
> debug_lockdep = 0/0
> debug_context = 0/0
> debug_crush = 0/0
> debug_buffer = 0/0
> debug_timer = 0/0
> debug_filer = 0/0
> debug_objecter = 0/0
> debug_rados = 0/0
> debug_rbd = 0/0
> debug_journaler = 0/0
> debug_objectcatcher = 0/0
> debug_client = 0/0
> debug_osd = 0/0
> debug_optracker = 0/0
> debug_objclass = 0/0
> debug_filestore = 0/0
> debug_journal = 0/0
> debug_ms = 0/0
> debug_monc = 0/0
> debug_tp = 0/0
> debug_auth = 0/0
> debug_finisher = 0/0
> debug_heartbeatmap = 0/0
> debug_perfcounter = 0/0
> debug_asok = 0/0
> debug_throttle = 0/0
> debug_mon = 0/0
> debug_paxos = 0/0
> debug_rgw = 0/0
> 
> [client]
> rbd_cache = true
> 
> Client
> Ubuntu 14.04 with 16 Core @2.53 Ghz and 24G RAM
> 
> Results
> rados bench -p rdp -b 4096 -t 16 10 write
> 
> rados bench -p rbd -b 4096 -t 16 10 write
>  Maintaining 16 concurrent writes of 4096 bytes for up to 10 seconds or 0
> objects
>  Object prefix: benchmark_data_ubuntucompute_3931
>    sec Cur ops   started  finished  avg MB/s  cur MB/s   last lat     avg lat
>      0       0         0         0         0         0          -           0
>      1      16      6370      6354   24.8124   24.8203    0.00221  0.00251512
>      2      16     11618     11602   22.6536      20.5   0.001025  0.00275493
>      3      16     16889     16873   21.9637   20.5898   0.001288  0.00281797
>      4      16     17310     17294    16.884   1.64453   0.054066  0.00365805
>      5      16     17695     17679    13.808   1.50391   0.001451      0.0009
>      6      16     18127     18111   11.7868    1.6875   0.001463  0.00527521
>      7      16     21647     21631   12.0669     13.75   0.001601   0.0051773
>      8      16     28056     28040   13.6872   25.0352   0.005268  0.00456353
>      9      16     28947     28931    12.553   3.48047    0.06647  0.00494762
>     10      16     29346     29330   11.4536   1.55859   0.001341   0.0054231
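
As a rough sanity check on the figures above: each object in this run is 4 KB, so 
IOPS is roughly MB/s x 256. The 24.8 MB/s in the first second is about 6,350 write 
IOPS (matching the 6,354 ops finished after 1 s), while the 11.45 MB/s average over 
the whole run works out to roughly 2,900 IOPS.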

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-03-07 Thread mad Engineer
],
 | 99.00th=[ 1928], 99.50th=[ 1992], 99.90th=[ 2160], 99.95th=[ 2288],
 | 99.99th=[39168]
bw (KB  /s): min=   54, max= 3568, per=64.10%, avg=2529.43, stdev=315.56
lat (usec) : 750=0.07%, 1000=2.53%
lat (msec) : 2=96.96%, 4=0.43%, 50=0.01%, >=2000=0.01%
  cpu  : usr=0.51%, sys=2.04%, ctx=73550, majf=0, minf=93
  IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
 issued: total=r=0/w=73234/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
  WRITE: io=292936KB, aggrb=3946KB/s, minb=3946KB/s, maxb=3946KB/s,
mint=74236msec, maxt=74236msec

Disk stats (read/write):
  rbd0: ios=186/73232, merge=0/0, ticks=120/109676, in_queue=143448,
util=100.00%
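
To put the summary above into IOPS terms: 73,234 4k writes completed in 74,236 ms 
is roughly 986 write IOPS, which at 4 KB per write matches the ~3,946 KB/s aggregate 
bandwidth fio reports.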

How can I improve 4k write performance? Will adding more nodes improve this?

Thanks for any help

On Sun, Mar 1, 2015 at 3:07 AM, Somnath Roy  wrote:

>  Sorry, I saw you have already tried with ‘rados bench’. So, some points
> here.
>
>
>
> 1. If you are considering write workload, I think with total of 2 copies
> and with 4K workload , you should be able to get ~4K iops (considering it
> hitting the disk, not with memstore).
>
>
>
> 2. You are having 9 OSDs and if you created only one pool with only 450
> PGS, you should try to increase that and see if getting any improvement or
> not.
>
>
>
> 3. Also, the rados bench script you ran with very low QD, try increasing
> that, may be 32/64.
>
>
>
> 4. If you are running firefly, other optimization won’t work here..But,
> you can add the following in your ceph.conf file and it should give you
> some boost.
>
>
>
> debug_lockdep = 0/0
>
> debug_context = 0/0
>
> debug_crush = 0/0
>
> debug_buffer = 0/0
>
> debug_timer = 0/0
>
> debug_filer = 0/0
>
> debug_objecter = 0/0
>
> debug_rados = 0/0
>
> debug_rbd = 0/0
>
> debug_journaler = 0/0
>
> debug_objectcatcher = 0/0
>
> debug_client = 0/0
>
> debug_osd = 0/0
>
> debug_optracker = 0/0
>
> debug_objclass = 0/0
>
> debug_filestore = 0/0
>
> debug_journal = 0/0
>
> debug_ms = 0/0
>
> debug_monc = 0/0
>
> debug_tp = 0/0
>
> debug_auth = 0/0
>
> debug_finisher = 0/0
>
> debug_heartbeatmap = 0/0
>
> debug_perfcounter = 0/0
>
> debug_asok = 0/0
>
> debug_throttle = 0/0
>
> debug_mon = 0/0
>
> debug_paxos = 0/0
>
> debug_rgw = 0/0
>
>
>
> 5. Give us the ceph –s output and the iostat output while io is going on.
>
>
>
> Thanks & Regards
>
> Somnath
>
>
>
>
>
>
>
> *From:* Somnath Roy
> *Sent:* Saturday, February 28, 2015 12:59 PM
> *To:* 'mad Engineer'; Alexandre DERUMIER
> *Cc:* ceph-users
> *Subject:* RE: [ceph-users] Extreme slowness in SSD cluster with 3 nodes
> and 9 OSD with 3.16-3 kernel
>
>
>
> I would say check with rados tool like ceph_smalliobench/rados bench first
> to see how much performance these tools are reporting. This will help you
> to isolate any upstream issues.
>
> Also, check with ‘iostat –xk 1’ for the resource utilization. Hope you are
> running with powerful enough cpu complex since you are saying network is
> not a bottleneck.
>
>
>
> Thanks & Regards
>
> Somnath
>
>
>
> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
> Of *mad Engineer
> *Sent:* Saturday, February 28, 2015 12:29 PM
> *To:* Alexandre DERUMIER
> *Cc:* ceph-users
> *Subject:* Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes
> and 9 OSD with 3.16-3 kernel
>
>
>
> reinstalled ceph packages and now with memstore backend [osd objectstore
> =memstore] its giving 400Kbps .No idea where the problem is.
>
>
>
> On Sun, Mar 1, 2015 at 12:30 AM, mad Engineer 
> wrote:
>
> tried changing scheduler from deadline to noop also upgraded to Gaint and
> btrfs filesystem,downgraded kernel to 3.16 from 3.16-3 not much difference
>
>
>
> dd if=/dev/zero of=hi bs=4k count=25000 oflag=direct
>
> 25000+0 records in
>
> 25000+0 records out
>
> 10240 bytes (102 MB) copied, 94.691 s, 1.1 MB/s
>
>
>
> Earlier on a vmware setup i was getting ~850 KBps and now even on physical
> server with SSD drives its just over 1MBps.I doubt some serious
> configuration issues.
>
>
>
> Tried iperf between 3 servers all are showing 9 Gbps,tried icmp with
> different packet size ,no fragmentation.
>
>
>
> i also noticed that out of 9 osd 5 are 850 EVO and 4 are 840 EVO.I believe
> this will not cause this much drop in performance.
>

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread mad Engineer
I am reinstalling Ceph with the Giant release, and will soon update the results with
the above configuration changes.

My servers are Cisco UCS C200 M1 with the integrated Intel ICH10R SATA
controller. Before installing Ceph I changed it to use software RAID,
quoting from the link below ["When using the integrated RAID, you must enable
the ICH10R controller in SW RAID mode"]:
http://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/c/hw/C200M1/install/c200M1/RAID.html#88713


Not sure whether this is the problem. Without Ceph, Linux gives better results
with this controller and SSD disks; with Ceph on top, the results are slower
than SATA disks.

Thanks for all your support


On Sun, Mar 1, 2015 at 3:07 AM, Somnath Roy  wrote:

>  Sorry, I saw you have already tried with ‘rados bench’. So, some points
> here.
>
>
>
> 1. If you are considering write workload, I think with total of 2 copies
> and with 4K workload , you should be able to get ~4K iops (considering it
> hitting the disk, not with memstore).
>
>
>
> 2. You are having 9 OSDs and if you created only one pool with only 450
> PGS, you should try to increase that and see if getting any improvement or
> not.
>
>
>
> 3. Also, the rados bench script you ran with very low QD, try increasing
> that, may be 32/64.
>
>
>
> 4. If you are running firefly, other optimization won’t work here..But,
> you can add the following in your ceph.conf file and it should give you
> some boost.
>
>
>
> debug_lockdep = 0/0
>
> debug_context = 0/0
>
> debug_crush = 0/0
>
> debug_buffer = 0/0
>
> debug_timer = 0/0
>
> debug_filer = 0/0
>
> debug_objecter = 0/0
>
> debug_rados = 0/0
>
> debug_rbd = 0/0
>
> debug_journaler = 0/0
>
> debug_objectcatcher = 0/0
>
> debug_client = 0/0
>
> debug_osd = 0/0
>
> debug_optracker = 0/0
>
> debug_objclass = 0/0
>
> debug_filestore = 0/0
>
> debug_journal = 0/0
>
> debug_ms = 0/0
>
> debug_monc = 0/0
>
> debug_tp = 0/0
>
> debug_auth = 0/0
>
> debug_finisher = 0/0
>
> debug_heartbeatmap = 0/0
>
> debug_perfcounter = 0/0
>
> debug_asok = 0/0
>
> debug_throttle = 0/0
>
> debug_mon = 0/0
>
> debug_paxos = 0/0
>
> debug_rgw = 0/0
>
>
>
> 5. Give us the ceph –s output and the iostat output while io is going on.
>
>
>
> Thanks & Regards
>
> Somnath
>
>
>
>
>
>
>
> *From:* Somnath Roy
> *Sent:* Saturday, February 28, 2015 12:59 PM
> *To:* 'mad Engineer'; Alexandre DERUMIER
> *Cc:* ceph-users
> *Subject:* RE: [ceph-users] Extreme slowness in SSD cluster with 3 nodes
> and 9 OSD with 3.16-3 kernel
>
>
>
> I would say check with rados tool like ceph_smalliobench/rados bench first
> to see how much performance these tools are reporting. This will help you
> to isolate any upstream issues.
>
> Also, check with ‘iostat –xk 1’ for the resource utilization. Hope you are
> running with powerful enough cpu complex since you are saying network is
> not a bottleneck.
>
>
>
> Thanks & Regards
>
> Somnath
>
>
>
> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
> Of *mad Engineer
> *Sent:* Saturday, February 28, 2015 12:29 PM
> *To:* Alexandre DERUMIER
> *Cc:* ceph-users
> *Subject:* Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes
> and 9 OSD with 3.16-3 kernel
>
>
>
> reinstalled ceph packages and now with memstore backend [osd objectstore
> =memstore] its giving 400Kbps .No idea where the problem is.
>
>
>
> On Sun, Mar 1, 2015 at 12:30 AM, mad Engineer 
> wrote:
>
> tried changing scheduler from deadline to noop also upgraded to Gaint and
> btrfs filesystem,downgraded kernel to 3.16 from 3.16-3 not much difference
>
>
>
> dd if=/dev/zero of=hi bs=4k count=25000 oflag=direct
>
> 25000+0 records in
>
> 25000+0 records out
>
> 10240 bytes (102 MB) copied, 94.691 s, 1.1 MB/s
>
>
>
> Earlier on a vmware setup i was getting ~850 KBps and now even on physical
> server with SSD drives its just over 1MBps.I doubt some serious
> configuration issues.
>
>
>
> Tried iperf between 3 servers all are showing 9 Gbps,tried icmp with
> different packet size ,no fragmentation.
>
>
>
> i also noticed that out of 9 osd 5 are 850 EVO and 4 are 840 EVO.I believe
> this will not cause this much drop in performance.
>
>
>
> Thanks for any help
>
>
>
>
>
> On Sat, Feb 28, 2015 at 6:49 PM, Alexandre DERUMIER 
> wrote:
>
> As optimisation,
>
> try to set ioscheduler to noop,
>
> and also enable rbd_cache=true. (It's really helping for for sequential
> writes)
>
> but your result

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Somnath Roy
Sorry, I saw you have already tried with ‘rados bench’. So, some points here.

1. If you are considering a write workload, I think with a total of 2 copies and 
a 4K workload you should be able to get ~4K IOPS (considering it hitting 
the disk, not with memstore).

2. You have 9 OSDs, and if you created only one pool with only 450 PGs, 
you should try to increase that and see whether it gives any improvement (see the 
example commands after this list).

3. Also, the rados bench run you did used a very low queue depth; try increasing 
that, maybe to 32/64 (again, see the example after this list).

4. If you are running Firefly, other optimizations won't work here. But you can 
add the following to your ceph.conf file and it should give you some boost.

debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_filer = 0/0
debug_objecter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_journaler = 0/0
debug_objectcatcher = 0/0
debug_client = 0/0
debug_osd = 0/0
debug_optracker = 0/0
debug_objclass = 0/0
debug_filestore = 0/0
debug_journal = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0
debug_mon = 0/0
debug_paxos = 0/0
debug_rgw = 0/0

5. Give us the ceph -s output and the iostat output while IO is going on.
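
For points 2, 3 and 4 above, something along these lines would do it (the pool name 
and PG count are only examples, pick values that fit your cluster; the injectargs 
call is an optional way to apply the debug settings to running OSDs without a restart):

ceph osd pool set rbd pg_num 1024
ceph osd pool set rbd pgp_num 1024

rados bench -p rbd -b 4096 -t 32 60 write

ceph tell osd.* injectargs '--debug_ms 0/0 --debug_osd 0/0 --debug_filestore 0/0'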

Thanks & Regards
Somnath



From: Somnath Roy
Sent: Saturday, February 28, 2015 12:59 PM
To: 'mad Engineer'; Alexandre DERUMIER
Cc: ceph-users
Subject: RE: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 
OSD with 3.16-3 kernel

I would say check with rados tool like ceph_smalliobench/rados bench first to 
see how much performance these tools are reporting. This will help you to 
isolate any upstream issues.
Also, check with 'iostat -xk 1' for the resource utilization. Hope you are 
running with powerful enough cpu complex since you are saying network is not a 
bottleneck.

Thanks & Regards
Somnath

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of mad 
Engineer
Sent: Saturday, February 28, 2015 12:29 PM
To: Alexandre DERUMIER
Cc: ceph-users
Subject: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 
OSD with 3.16-3 kernel

reinstalled ceph packages and now with memstore backend [osd objectstore 
=memstore] its giving 400Kbps .No idea where the problem is.

On Sun, Mar 1, 2015 at 12:30 AM, mad Engineer <themadengin...@gmail.com> wrote:
tried changing scheduler from deadline to noop also upgraded to Gaint and btrfs 
filesystem,downgraded kernel to 3.16 from 3.16-3 not much difference

dd if=/dev/zero of=hi bs=4k count=25000 oflag=direct
25000+0 records in
25000+0 records out
10240 bytes (102 MB) copied, 94.691 s, 1.1 MB/s

Earlier on a vmware setup i was getting ~850 KBps and now even on physical 
server with SSD drives its just over 1MBps.I doubt some serious configuration 
issues.

Tried iperf between 3 servers all are showing 9 Gbps,tried icmp with different 
packet size ,no fragmentation.

i also noticed that out of 9 osd 5 are 850 EVO and 4 are 840 EVO.I believe this 
will not cause this much drop in performance.

Thanks for any help


On Sat, Feb 28, 2015 at 6:49 PM, Alexandre DERUMIER <aderum...@odiso.com> wrote:
As optimisation,

try to set ioscheduler to noop,

and also enable rbd_cache=true. (It's really helping for for sequential writes)

but your results seem quite low, 926kb/s with 4k, it's only 200io/s.

check if you don't have any big network latencies, or an MTU fragmentation 
problem.

Maybe also try to bench with fio, with more parallel jobs.
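
If it helps, a minimal sketch of those two changes (the device name is an example, 
and the scheduler setting does not survive a reboot):

echo noop > /sys/block/sdb/queue/scheduler

# in ceph.conf on the client side
[client]
rbd cache = true
rbd cache writethrough until flush = true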




- Original Message -
From: "mad Engineer" <themadengin...@gmail.com>
To: "Philippe Schwarz" <p...@schwarz-fr.net>
Cc: "ceph-users" <ceph-users@lists.ceph.com>
Sent: Saturday, 28 February 2015 13:06:59
Subject: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD 
with 3.16-3 kernel
Thanks for the reply Philippe,we were using these disks in our NAS,now
it looks like i am in big trouble :-(

On Sat, Feb 28, 2015 at 5:02 PM, Philippe Schwarz <p...@schwarz-fr.net> wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> Le 28/02/2015 12:19, mad Engineer a écrit :
>> Hello All,
>>
>> I am trying ceph-firefly 0.80.8
>> (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all Samsung
>> SSD 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu
>> 14.04 LTS with 3.16-3 kernel.All are connected to 10G ports with
>> maximum MTU.There are no extra disks for journaling and also there
>> are no separate network for replication and data transfer.All 3
>> nodes are also hosting monitoring process.Operating system runs on
>> SATA disk.
>>
>> When doing a sequential benchmark using "dd" on RBD, mounted on
>> client as ext4 its taking 110s to write 100Mb 

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Somnath Roy
I would say check with rados tool like ceph_smalliobench/rados bench first to 
see how much performance these tools are reporting. This will help you to 
isolate any upstream issues.
Also, check with 'iostat -xk 1' for the resource utilization. Hope you are 
running with a powerful enough CPU complex, since you are saying the network is not a 
bottleneck.

Thanks & Regards
Somnath

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of mad 
Engineer
Sent: Saturday, February 28, 2015 12:29 PM
To: Alexandre DERUMIER
Cc: ceph-users
Subject: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 
OSD with 3.16-3 kernel

reinstalled ceph packages and now with memstore backend [osd objectstore 
=memstore] its giving 400Kbps .No idea where the problem is.

On Sun, Mar 1, 2015 at 12:30 AM, mad Engineer <themadengin...@gmail.com> wrote:
tried changing scheduler from deadline to noop also upgraded to Gaint and btrfs 
filesystem,downgraded kernel to 3.16 from 3.16-3 not much difference

dd if=/dev/zero of=hi bs=4k count=25000 oflag=direct
25000+0 records in
25000+0 records out
10240 bytes (102 MB) copied, 94.691 s, 1.1 MB/s

Earlier on a vmware setup i was getting ~850 KBps and now even on physical 
server with SSD drives its just over 1MBps.I doubt some serious configuration 
issues.

Tried iperf between 3 servers all are showing 9 Gbps,tried icmp with different 
packet size ,no fragmentation.

i also noticed that out of 9 osd 5 are 850 EVO and 4 are 840 EVO.I believe this 
will not cause this much drop in performance.

Thanks for any help


On Sat, Feb 28, 2015 at 6:49 PM, Alexandre DERUMIER <aderum...@odiso.com> wrote:
As optimisation,

try to set ioscheduler to noop,

and also enable rbd_cache=true. (It's really helping for for sequential writes)

but your results seem quite low, 926kb/s with 4k, it's only 200io/s.

check if you don't have any big network latencies, or mtu fragementation 
problem.

Maybe also try to bench with fio, with more parallel jobs.




- Original Message -
From: "mad Engineer" <themadengin...@gmail.com>
To: "Philippe Schwarz" <p...@schwarz-fr.net>
Cc: "ceph-users" <ceph-users@lists.ceph.com>
Sent: Saturday, 28 February 2015 13:06:59
Subject: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD 
with 3.16-3 kernel
Thanks for the reply Philippe,we were using these disks in our NAS,now
it looks like i am in big trouble :-(

On Sat, Feb 28, 2015 at 5:02 PM, Philippe Schwarz <p...@schwarz-fr.net> wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> Le 28/02/2015 12:19, mad Engineer a écrit :
>> Hello All,
>>
>> I am trying ceph-firefly 0.80.8
>> (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all Samsung
>> SSD 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu
>> 14.04 LTS with 3.16-3 kernel.All are connected to 10G ports with
>> maximum MTU.There are no extra disks for journaling and also there
>> are no separate network for replication and data transfer.All 3
>> nodes are also hosting monitoring process.Operating system runs on
>> SATA disk.
>>
>> When doing a sequential benchmark using "dd" on RBD, mounted on
>> client as ext4 its taking 110s to write 100Mb data at an average
>> speed of 926Kbps.
>>
>> time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
>> 25000+0 records in 25000+0 records out 10240 bytes (102 MB)
>> copied, 110.582 s, 926 kB/s
>>
>> real 1m50.585s user 0m0.106s sys 0m2.233s
>>
>> While doing this directly on ssd mount point shows:
>>
>> time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
>> 25000+0 records in 25000+0 records out 10240 bytes (102 MB)
>> copied, 1.38567 s, 73.9 MB/s
>>
>> OSDs are in XFS with these extra arguments :
>>
>> rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M
>>
>> ceph.conf
>>
>> [global] fsid = 7d889081-7826-439c-9fe5-d4e57480d9be
>> mon_initial_members = ceph1, ceph2, ceph3 mon_host =
>> 10.99.10.118,10.99.10.119,10.99.10.120 auth_cluster_required =
>> cephx auth_service_required = cephx auth_client_required = cephx
>> filestore_xattr_use_omap = true osd_pool_default_size = 2
>> osd_pool_default_min_size = 2 osd_pool_default_pg_num = 450
>> osd_pool_default_pgp_num = 450 max_open_files = 131072
>>
>> [osd] osd_mkfs_type = xfs osd_op_threads = 8 osd_disk_threads = 4
>> osd_mount_options_xfs =
>> "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M"
>>
>>
>> on our traditional storage with Full SAS disk, same "dd" completes
>> in 16s with an average write speed of 6Mbps.
>>
>> 

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread mad Engineer
reinstalled ceph packages and now with the memstore backend [osd objectstore
= memstore] it's giving 400Kbps. No idea where the problem is.
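
For reference, that switch is just a ceph.conf setting along these lines (a sketch; 
the OSDs typically have to be recreated after changing the objectstore backend):

[osd]
osd objectstore = memstore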

On Sun, Mar 1, 2015 at 12:30 AM, mad Engineer 
wrote:

> tried changing scheduler from deadline to noop also upgraded to Gaint and
> btrfs filesystem,downgraded kernel to 3.16 from 3.16-3 not much difference
>
> dd if=/dev/zero of=hi bs=4k count=25000 oflag=direct
> 25000+0 records in
> 25000+0 records out
> 10240 bytes (102 MB) copied, 94.691 s, 1.1 MB/s
>
> Earlier on a vmware setup i was getting ~850 KBps and now even on physical
> server with SSD drives its just over 1MBps.I doubt some serious
> configuration issues.
>
> Tried iperf between 3 servers all are showing 9 Gbps,tried icmp with
> different packet size ,no fragmentation.
>
> i also noticed that out of 9 osd 5 are 850 EVO and 4 are 840 EVO.I believe
> this will not cause this much drop in performance.
>
> Thanks for any help
>
>
> On Sat, Feb 28, 2015 at 6:49 PM, Alexandre DERUMIER 
> wrote:
>
>> As optimisation,
>>
>> try to set ioscheduler to noop,
>>
>> and also enable rbd_cache=true. (It's really helping for for sequential
>> writes)
>>
>> but your results seem quite low, 926kb/s with 4k, it's only 200io/s.
>>
>> check if you don't have any big network latencies, or mtu fragementation
>> problem.
>>
>> Maybe also try to bench with fio, with more parallel jobs.
>>
>>
>>
>>
>> ----- Mail original -----
>> De: "mad Engineer" 
>> À: "Philippe Schwarz" 
>> Cc: "ceph-users" 
>> Envoyé: Samedi 28 Février 2015 13:06:59
>> Objet: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and
>> 9 OSD with 3.16-3 kernel
>>
>> Thanks for the reply Philippe,we were using these disks in our NAS,now
>> it looks like i am in big trouble :-(
>>
>> On Sat, Feb 28, 2015 at 5:02 PM, Philippe Schwarz 
>> wrote:
>> > -BEGIN PGP SIGNED MESSAGE-
>> > Hash: SHA1
>> >
>> > Le 28/02/2015 12:19, mad Engineer a écrit :
>> >> Hello All,
>> >>
>> >> I am trying ceph-firefly 0.80.8
>> >> (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all Samsung
>> >> SSD 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu
>> >> 14.04 LTS with 3.16-3 kernel.All are connected to 10G ports with
>> >> maximum MTU.There are no extra disks for journaling and also there
>> >> are no separate network for replication and data transfer.All 3
>> >> nodes are also hosting monitoring process.Operating system runs on
>> >> SATA disk.
>> >>
>> >> When doing a sequential benchmark using "dd" on RBD, mounted on
>> >> client as ext4 its taking 110s to write 100Mb data at an average
>> >> speed of 926Kbps.
>> >>
>> >> time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
>> >> 25000+0 records in 25000+0 records out 10240 bytes (102 MB)
>> >> copied, 110.582 s, 926 kB/s
>> >>
>> >> real 1m50.585s user 0m0.106s sys 0m2.233s
>> >>
>> >> While doing this directly on ssd mount point shows:
>> >>
>> >> time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
>> >> 25000+0 records in 25000+0 records out 10240 bytes (102 MB)
>> >> copied, 1.38567 s, 73.9 MB/s
>> >>
>> >> OSDs are in XFS with these extra arguments :
>> >>
>> >> rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M
>> >>
>> >> ceph.conf
>> >>
>> >> [global] fsid = 7d889081-7826-439c-9fe5-d4e57480d9be
>> >> mon_initial_members = ceph1, ceph2, ceph3 mon_host =
>> >> 10.99.10.118,10.99.10.119,10.99.10.120 auth_cluster_required =
>> >> cephx auth_service_required = cephx auth_client_required = cephx
>> >> filestore_xattr_use_omap = true osd_pool_default_size = 2
>> >> osd_pool_default_min_size = 2 osd_pool_default_pg_num = 450
>> >> osd_pool_default_pgp_num = 450 max_open_files = 131072
>> >>
>> >> [osd] osd_mkfs_type = xfs osd_op_threads = 8 osd_disk_threads = 4
>> >> osd_mount_options_xfs =
>> >> "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M"
>> >>
>> >>
>> >> on our traditional storage with Full SAS disk, same "dd" completes
>> >> in 16s with an average write speed of 6Mbps.
>> >>
>> >> Rados bench:
>> 

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread mad Engineer
tried changing the scheduler from deadline to noop, also upgraded to Giant and
the btrfs filesystem, and downgraded the kernel to 3.16 from 3.16-3 - not much difference

dd if=/dev/zero of=hi bs=4k count=25000 oflag=direct
25000+0 records in
25000+0 records out
102400000 bytes (102 MB) copied, 94.691 s, 1.1 MB/s

Earlier on a VMware setup I was getting ~850 KBps, and now even on a physical
server with SSD drives it's just over 1MBps. I suspect some serious
configuration issue.

Tried iperf between the 3 servers, all are showing 9 Gbps; tried ICMP with
different packet sizes, no fragmentation.

I also noticed that out of 9 OSDs, 5 are 850 EVO and 4 are 840 EVO. I believe
this will not cause this much of a drop in performance.

Thanks for any help


On Sat, Feb 28, 2015 at 6:49 PM, Alexandre DERUMIER 
wrote:

> As optimisation,
>
> try to set ioscheduler to noop,
>
> and also enable rbd_cache=true. (It's really helping for for sequential
> writes)
>
> but your results seem quite low, 926kb/s with 4k, it's only 200io/s.
>
> check if you don't have any big network latencies, or mtu fragementation
> problem.
>
> Maybe also try to bench with fio, with more parallel jobs.
>
>
>
>
> - Mail original -
> De: "mad Engineer" 
> À: "Philippe Schwarz" 
> Cc: "ceph-users" 
> Envoyé: Samedi 28 Février 2015 13:06:59
> Objet: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9
> OSD with 3.16-3 kernel
>
> Thanks for the reply Philippe,we were using these disks in our NAS,now
> it looks like i am in big trouble :-(
>
> On Sat, Feb 28, 2015 at 5:02 PM, Philippe Schwarz 
> wrote:
> > -BEGIN PGP SIGNED MESSAGE-
> > Hash: SHA1
> >
> > Le 28/02/2015 12:19, mad Engineer a écrit :
> >> Hello All,
> >>
> >> I am trying ceph-firefly 0.80.8
> >> (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all Samsung
> >> SSD 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu
> >> 14.04 LTS with 3.16-3 kernel.All are connected to 10G ports with
> >> maximum MTU.There are no extra disks for journaling and also there
> >> are no separate network for replication and data transfer.All 3
> >> nodes are also hosting monitoring process.Operating system runs on
> >> SATA disk.
> >>
> >> When doing a sequential benchmark using "dd" on RBD, mounted on
> >> client as ext4 its taking 110s to write 100Mb data at an average
> >> speed of 926Kbps.
> >>
> >> time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
> >> 25000+0 records in 25000+0 records out 10240 bytes (102 MB)
> >> copied, 110.582 s, 926 kB/s
> >>
> >> real 1m50.585s user 0m0.106s sys 0m2.233s
> >>
> >> While doing this directly on ssd mount point shows:
> >>
> >> time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
> >> 25000+0 records in 25000+0 records out 10240 bytes (102 MB)
> >> copied, 1.38567 s, 73.9 MB/s
> >>
> >> OSDs are in XFS with these extra arguments :
> >>
> >> rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M
> >>
> >> ceph.conf
> >>
> >> [global] fsid = 7d889081-7826-439c-9fe5-d4e57480d9be
> >> mon_initial_members = ceph1, ceph2, ceph3 mon_host =
> >> 10.99.10.118,10.99.10.119,10.99.10.120 auth_cluster_required =
> >> cephx auth_service_required = cephx auth_client_required = cephx
> >> filestore_xattr_use_omap = true osd_pool_default_size = 2
> >> osd_pool_default_min_size = 2 osd_pool_default_pg_num = 450
> >> osd_pool_default_pgp_num = 450 max_open_files = 131072
> >>
> >> [osd] osd_mkfs_type = xfs osd_op_threads = 8 osd_disk_threads = 4
> >> osd_mount_options_xfs =
> >> "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M"
> >>
> >>
> >> on our traditional storage with Full SAS disk, same "dd" completes
> >> in 16s with an average write speed of 6Mbps.
> >>
> >> Rados bench:
> >>
> >> rados bench -p rbd 10 write Maintaining 16 concurrent writes of
> >> 4194304 bytes for up to 10 seconds or 0 objects Object prefix:
> >> benchmark_data_ceph1_2977 sec Cur ops started finished avg MB/s
> >> cur MB/s last lat avg lat 0 0 0 0
> >> 0 0 - 0 1 16 94 78
> >> 311.821 312 0.041228 0.140132 2 16 192 176
> >> 351.866 392 0.106294 0.175055 3 16 275 259
> >> 345.216 332 0.076795 0.166036 4 16 302 286
> >> 285.912 108 0.043888 0.196419 5 16 395 379
> >> 303.11 372 0.126033 0.207488 6 16 501 485
> >> 323.242 424 0.125972 0

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Stefan Priebe


On 28.02.2015 at 19:41, Kevin Walker wrote:

What about the Samsung 845DC Pro SSD's?

These have fantastic enterprise performance characteristics.

http://www.thessdreview.com/our-reviews/samsung-845dc-pro-review-800gb-class-leading-speed-endurance/


Or use the SV843 from Samsung Semiconductor (a separate Samsung company).

Stefan


Kind regards

Kevin

On 28 February 2015 at 15:32, Philippe Schwarz <p...@schwarz-fr.net> wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Le 28/02/2015 12:19, mad Engineer a écrit :
 > Hello All,
 >
 > I am trying ceph-firefly 0.80.8
 > (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all Samsung
 > SSD 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu
 > 14.04 LTS with 3.16-3 kernel.All are connected to 10G ports with
 > maximum MTU.There are no extra disks for journaling and also there
 > are no separate network for replication and data transfer.All 3
 > nodes are also hosting monitoring process.Operating system runs on
 > SATA disk.
 >
 > When doing a sequential benchmark using "dd" on RBD, mounted on
 > client as ext4 its taking 110s to write 100Mb data at an average
 > speed of 926Kbps.
 >
 > time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
 > 25000+0 records in 25000+0 records out 10240 bytes (102 MB)
 > copied, 110.582 s, 926 kB/s
 >
 > real1m50.585s user0m0.106s sys 0m2.233s
 >
 > While doing this directly on ssd mount point shows:
 >
 > time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
 > 25000+0 records in 25000+0 records out 10240 bytes (102 MB)
 > copied, 1.38567 s, 73.9 MB/s
 >
 > OSDs are in XFS with these extra arguments :
 >
 > rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M
 >
 > ceph.conf
 >
 > [global] fsid = 7d889081-7826-439c-9fe5-d4e57480d9be
 > mon_initial_members = ceph1, ceph2, ceph3 mon_host =
 > 10.99.10.118,10.99.10.119,10.99.10.120 auth_cluster_required =
 > cephx auth_service_required = cephx auth_client_required = cephx
 > filestore_xattr_use_omap = true osd_pool_default_size = 2
 > osd_pool_default_min_size = 2 osd_pool_default_pg_num = 450
 > osd_pool_default_pgp_num = 450 max_open_files = 131072
 >
 > [osd] osd_mkfs_type = xfs osd_op_threads = 8 osd_disk_threads = 4
 > osd_mount_options_xfs =
 > "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M"
 >
 >
 > on our traditional storage with Full SAS disk, same "dd" completes
 > in 16s with an average write speed of 6Mbps.
 >
 > Rados bench:
 >
 > rados bench -p rbd 10 write Maintaining 16 concurrent writes of
 > 4194304 bytes for up to 10 seconds or 0 objects Object prefix:
 > benchmark_data_ceph1_2977 sec Cur ops   started  finished  avg MB/s
 > cur MB/s  last lat   avg lat 0   0 0 0
 > 0 0 - 0 1  169478
 > 311.821   312  0.041228  0.140132 2  16   192   176
 > 351.866   392  0.106294  0.175055 3  16   275   259
 > 345.216   332  0.076795  0.166036 4  16   302   286
 > 285.912   108  0.043888  0.196419 5  16   395   379
 > 303.11   372  0.126033  0.207488 6  16   501   485
 > 323.242   424  0.125972  0.194559 7  16   621   605
 > 345.621   480  0.194155  0.183123 8  16   730   714
 > 356.903   436  0.086678  0.176099 9  16   814   798
 > 354.572   336  0.081567  0.174786 10  16   832
 > 816   326.31372  0.037431  0.182355 11  16   833
 > 817   297.013 4  0.533326  0.182784 Total time run:
 > 11.489068 Total writes made:  833 Write size:
 > 4194304 Bandwidth (MB/sec): 290.015
 >
 > Stddev Bandwidth:   175.723 Max bandwidth (MB/sec): 480 Min
 > bandwidth (MB/sec): 0 Average Latency:0.220582 Stddev
 > Latency: 0.343697 Max latency:2.85104 Min
 > latency:0.035381
 >
 > Our ultimate aim is to replace existing SAN with ceph,but for that
 > it should meet minimum 8000 iops.Can any one help me with this,OSD
 > are SSD,CPU has good clock speed,backend network is good but still
 > we are not able to extract full capability of SSD disks.
 >
 >
 >
 > Thanks,

Hi, i'm new to ceph so, don't consider my words as holy truth.

It seems that Samsung 840 (so i assume 850) are crappy for ceph :

MTBF :

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-November/044258.html
Bandwidth

:http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-December/045247.html

And according to a confirmed user of Ceph/ProxmoX, Samsung SSDs should
be avoided if possible in ceph storage.

Apart fr

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Kevin Walker
What about the Samsung 845DC Pro SSD's?

These have fantastic enterprise performance characteristics.

http://www.thessdreview.com/our-reviews/samsung-845dc-pro-review-800gb-class-leading-speed-endurance/

Kind regards

Kevin

On 28 February 2015 at 15:32, Philippe Schwarz  wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> Le 28/02/2015 12:19, mad Engineer a écrit :
> > Hello All,
> >
> > I am trying ceph-firefly 0.80.8
> > (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all Samsung
> > SSD 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu
> > 14.04 LTS with 3.16-3 kernel.All are connected to 10G ports with
> > maximum MTU.There are no extra disks for journaling and also there
> > are no separate network for replication and data transfer.All 3
> > nodes are also hosting monitoring process.Operating system runs on
> > SATA disk.
> >
> > When doing a sequential benchmark using "dd" on RBD, mounted on
> > client as ext4 its taking 110s to write 100Mb data at an average
> > speed of 926Kbps.
> >
> > time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
> > 25000+0 records in 25000+0 records out 10240 bytes (102 MB)
> > copied, 110.582 s, 926 kB/s
> >
> > real1m50.585s user0m0.106s sys 0m2.233s
> >
> > While doing this directly on ssd mount point shows:
> >
> > time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
> > 25000+0 records in 25000+0 records out 10240 bytes (102 MB)
> > copied, 1.38567 s, 73.9 MB/s
> >
> > OSDs are in XFS with these extra arguments :
> >
> > rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M
> >
> > ceph.conf
> >
> > [global] fsid = 7d889081-7826-439c-9fe5-d4e57480d9be
> > mon_initial_members = ceph1, ceph2, ceph3 mon_host =
> > 10.99.10.118,10.99.10.119,10.99.10.120 auth_cluster_required =
> > cephx auth_service_required = cephx auth_client_required = cephx
> > filestore_xattr_use_omap = true osd_pool_default_size = 2
> > osd_pool_default_min_size = 2 osd_pool_default_pg_num = 450
> > osd_pool_default_pgp_num = 450 max_open_files = 131072
> >
> > [osd] osd_mkfs_type = xfs osd_op_threads = 8 osd_disk_threads = 4
> > osd_mount_options_xfs =
> > "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M"
> >
> >
> > on our traditional storage with Full SAS disk, same "dd" completes
> > in 16s with an average write speed of 6Mbps.
> >
> > Rados bench:
> >
> > rados bench -p rbd 10 write
> > Maintaining 16 concurrent writes of 4194304 bytes for up to 10 seconds or 0 objects
> > Object prefix: benchmark_data_ceph1_2977
> >   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
> >     0       0         0         0         0         0         -         0
> >     1      16        94        78   311.821       312  0.041228  0.140132
> >     2      16       192       176   351.866       392  0.106294  0.175055
> >     3      16       275       259   345.216       332  0.076795  0.166036
> >     4      16       302       286   285.912       108  0.043888  0.196419
> >     5      16       395       379    303.11       372  0.126033  0.207488
> >     6      16       501       485   323.242       424  0.125972  0.194559
> >     7      16       621       605   345.621       480  0.194155  0.183123
> >     8      16       730       714   356.903       436  0.086678  0.176099
> >     9      16       814       798   354.572       336  0.081567  0.174786
> >    10      16       832       816   326.313        72  0.037431  0.182355
> >    11      16       833       817   297.013         4  0.533326  0.182784
> > Total time run:         11.489068
> > Total writes made:      833
> > Write size:             4194304
> > Bandwidth (MB/sec):     290.015
> > Stddev Bandwidth:       175.723
> > Max bandwidth (MB/sec): 480
> > Min bandwidth (MB/sec): 0
> > Average Latency:        0.220582
> > Stddev Latency:         0.343697
> > Max latency:            2.85104
> > Min latency:            0.035381
> >
> > Our ultimate aim is to replace existing SAN with ceph,but for that
> > it should meet minimum 8000 iops.Can any one help me with this,OSD
> > are SSD,CPU has good clock speed,backend network is good but still
> > we are not able to extract full capability of SSD disks.
> >
> >
> >
> > Thanks,
>
> Hi, i'm new to ceph so, don't consider my words as holy truth.
>
> It seems that Samsung 840 (so i assume 850) are crappy for ceph :
>
> MTBF :
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-November/044258.html
> Bandwidth
> :
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-December/045247.html
>
> And according to a confirmed user of Ceph/ProxmoX, Samsung SSDs should
> be avoided if possible in ceph storage.
>
> Apart from that, it seems there was an limitation in ceph for the use
> of the complete bandwidth available in SSDs; but i think with less
> than 1Mb/s you haven't hit this limit.
>
> I remind you that i'm not a ceph-guru (far from that, indeed), so feel
> free to disagree; i'm on the way to improve my knowledge.
>
> Best regards.
>
>
>
>

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Martin B Nielsen
Hi Andrei,

If there is one thing I've come to understand by now, it is that Ceph configs,
performance, hardware and, well, everything seem to vary on an almost per-person
basis.

I do not recognize that latency issue either, this is from one of our nodes
(4x 500GB samsung 840 pro - sd[c-f]) which has been running for 600+ days
(so the iostat -x is an avg of that):

# uptime
 16:24:57 up 611 days,  4:03,  1 user,  load average: 1.18, 1.55, 1.72

# iostat -x
[ ... ]
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.16    4.87   22.62   344.18   458.65    58.41     0.05    1.92    0.45    2.24   0.76   2.10
sdd               0.00     0.12    4.37   20.02   317.98   437.95    61.98     0.05    1.90    0.44    2.21   0.78   1.91
sde               0.00     0.12    4.17   19.33   302.45   403.02    60.02     0.04    1.87    0.43    2.18   0.77   1.80
sdf               0.00     0.12    4.51   20.84   322.84   439.70    60.17     0.05    1.84    0.43    2.15   0.76   1.93
[ ... ]

Granted, we do not have very high usage on this cluster on the SSD side, and
it might change as we put more load on it, but we will deal with that then. I
do not think ~2ms access time is either good or bad.

This is from another cluster we operate - this one has an intel DC S3700
800gb ssd (sdb)
# uptime
 09:37:26 up 654 days,  8:40,  1 user,  load average: 0.33, 0.40, 0.54

# iostat -x
[ ... ]
sdb               0.01     1.49   39.76   86.79  1252.80  2096.98    52.94     0.02    0.76    1.22    0.54   0.41   5.21
[ ... ]

It is a bit misleading, as the latter has just 3 disks plus a hardware-based,
1GB-backed RAID controller, whereas the first is a 'cheap' dumb 12-disk JBOD
IT-mode setup.

All the ssd from both clusters have 3 partitions - 1 ceph-data and 2
journal partitions (1 journal for the ssd itself and 1 journal for 1
platter disk).

The intel ssd is very sturdy though - it has had a 2.1MB/sec avg. write
over 654 days - that is somewhere around 120TB so far.

But ultimately it boils down to what you need - in our use case the latter
cluster has to be rock stable and performant - and we chose the Intel ones
based on that. For the first one we don't really care if we lose a node or two,
and we replace disks every month or whenever it fits into our
going-to-datacenter schedule - we wanted an ok-ish performing cluster and
focused more on total space / price than high-performing hardware. The
fantastic thing is we are not locked into any specific hardware, and we can
replace any of it if we need to and/or find it is suddenly starting to have
issues.

Cheers,
Martin



On Sat, Feb 28, 2015 at 2:55 PM, Andrei Mikhailovsky 
wrote:

>
> Martin,
>
> I have been using Samsung 840 Pro for journals about 2 years now and have
> just replaced all my samsung drives with Intel. We have found a lot of
> performance issues with 840 Pro (we are using 128mb). In particular, a very
> strange behaviour with using 4 partitions (with 50% underprovisioning left
> as empty unpartitioned space on the drive) where the drive would grind to
> almost a halt after a few weeks of use. I was getting 100% utilisation on
> the drives doing just 3-4MB/s writes. This was not the case when I've
> installed the new drives. Manual Trimming helps for a few weeks until the
> same happens again.
>
> This has been happening with all 840 Pro ssds that we have and contacting
> Samsung Support has proven to be utterly useless. They do not want to speak
> with you until you install windows and run their monkey utility ((.
>
> Also, i've noticed the latencies of the Samsung 840 Pro ssd drives to be
> about 15-20 slower compared with a consumer grade Intel drives, like Intel
> 520. According to  ceph osd pef, I would consistently get higher figures on
> the osds with Samsung journal drive compared with the Intel drive on the
> same server. Something like 2-3ms for Intel vs 40-50ms for Samsungs.
>
> At some point we had enough with Samsungs and scrapped them.
>
> Andrei
>
> --
>
> *From: *"Martin B Nielsen" 
> *To: *"Philippe Schwarz" 
> *Cc: *ceph-users@lists.ceph.com
> *Sent: *Saturday, 28 February, 2015 11:51:57 AM
> *Subject: *Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes
> and 9 OSD with 3.16-3 kernel
>
>
> Hi,
>
> I cannot recognize that picture; we've been using samsumg 840 pro in
> production for almost 2 years now - and have had 1 fail.
>
> We run a 8node mixed ssd/platter cluster with 4x samsung 840 pro (500gb)
> in each so that is 32x ssd.
>
> They've written ~25TB data in avg each.
>
> Using the dd you had inside an existing semi-busy mysql-guest I get:
>
> 10240 bytes (102 MB) copied, 5.58218 s, 18.3 MB/s
>
> Which is still not a lot, but I think it i

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Andrei Mikhailovsky
Martin, 

I have been using Samsung 840 Pro for journals for about 2 years now and have just 
replaced all my Samsung drives with Intel. We have found a lot of performance 
issues with the 840 Pro (we are using the 128GB model). In particular, a very strange 
behaviour with using 4 partitions (with 50% under-provisioning left as empty 
unpartitioned space on the drive) where the drive would grind to almost a halt 
after a few weeks of use. I was getting 100% utilisation on the drives doing 
just 3-4MB/s of writes. This was not the case when I installed the new drives. 
Manual trimming helps for a few weeks until the same thing happens again. 

This has been happening with all 840 Pro ssds that we have and contacting 
Samsung Support has proven to be utterly useless. They do not want to speak 
with you until you install windows and run their monkey utility ((. 

Also, I've noticed the latencies of the Samsung 840 Pro SSD drives to be about 
15-20 times higher compared with consumer-grade Intel drives, like the Intel 520. 
According to ceph osd perf, I would consistently get higher figures on the OSDs 
with a Samsung journal drive compared with the Intel drive on the same server. 
Something like 2-3ms for Intel vs 40-50ms for the Samsungs. 
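
For anyone who wants to reproduce that comparison, the command is just:

ceph osd perf

which prints per-OSD commit and apply latencies in milliseconds; something like 
'watch -n 2 ceph osd perf' while a benchmark is running makes the slow journal 
drives stand out quickly.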

At some point we had enough with Samsungs and scrapped them. 

Andrei 

- Original Message -

> From: "Martin B Nielsen" 
> To: "Philippe Schwarz" 
> Cc: ceph-users@lists.ceph.com
> Sent: Saturday, 28 February, 2015 11:51:57 AM
> Subject: Re: [ceph-users] Extreme slowness in SSD cluster with 3
> nodes and 9 OSD with 3.16-3 kernel

> Hi,

> I cannot recognize that picture; we've been using samsumg 840 pro in
> production for almost 2 years now - and have had 1 fail.

> We run a 8node mixed ssd/platter cluster with 4x samsung 840 pro
> (500gb) in each so that is 32x ssd.

> They've written ~25TB data in avg each.

> Using the dd you had inside an existing semi-busy mysql-guest I get:

> 10240 bytes (102 MB) copied, 5.58218 s, 18.3 MB/s

> Which is still not a lot, but I think it is more a limitation of our
> setup/load.

> We are using dumpling.

> All that aside, I would prob. go with something tried and tested if I
> was to redo it today - we haven't had any issues, but it is still
> nice to use something you know should have a baseline performance
> and can compare to that.

> Cheers,
> Martin

> On Sat, Feb 28, 2015 at 12:32 PM, Philippe Schwarz <
> p...@schwarz-fr.net > wrote:

> > -BEGIN PGP SIGNED MESSAGE-
> 
> > Hash: SHA1
> 

> > Le 28/02/2015 12:19, mad Engineer a écrit :
> 

> > > Hello All,
> 
> > >
> 
> > > I am trying ceph-firefly 0.80.8
> 
> > > (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all
> > > Samsung
> 
> > > SSD 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu
> 
> > > 14.04 LTS with 3.16-3 kernel.All are connected to 10G ports with
> 
> > > maximum MTU.There are no extra disks for journaling and also
> > > there
> 
> > > are no separate network for replication and data transfer.All 3
> 
> > > nodes are also hosting monitoring process.Operating system runs
> > > on
> 
> > > SATA disk.
> 
> > >
> 
> > > When doing a sequential benchmark using "dd" on RBD, mounted on
> 
> > > client as ext4 its taking 110s to write 100Mb data at an average
> 
> > > speed of 926Kbps.
> 
> > >
> 
> > > time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
> 
> > > 25000+0 records in 25000+0 records out 10240 bytes (102 MB)
> 
> > > copied, 110.582 s, 926 kB/s
> 
> > >
> 
> > > real 1m50.585s user 0m0.106s sys 0m2.233s
> 
> > >
> 
> > > While doing this directly on ssd mount point shows:
> 
> > >
> 
> > > time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
> 
> > > 25000+0 records in 25000+0 records out 10240 bytes (102 MB)
> 
> > > copied, 1.38567 s, 73.9 MB/s
> 
> > >
> 
> > > OSDs are in XFS with these extra arguments :
> 
> > >
> 
> > > rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M
> 
> > >
> 
> > > ceph.conf
> 
> > >
> 
> > > [global] fsid = 7d889081-7826-439c-9fe5-d4e57480d9be
> 
> > > mon_initial_members = ceph1, ceph2, ceph3 mon_host =
> 
> > > 10.99.10.118,10.99.10.119,10.99.10.120 auth_cluster_required =
> 
> > > cephx auth_service_required = cephx auth_client_required = cephx
> 
> > > filestore_xattr_use_omap = true osd_pool_default_size = 2
> 
> > > osd_pool_default_min_size = 2 osd

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread mad Engineer
Thanks for that link, Alexandre.
As per that link, I tried these:
 *850 EVO*
*without dsync*

 dd if=randfile of=/dev/sdb1 bs=4k count=100000 oflag=direct
100000+0 records in
100000+0 records out
409600000 bytes (410 MB) copied, 4.42913 s, 92.5 MB/s

with *dsync*:

 dd if=randfile of=/dev/sdb1 bs=4k count=100000 oflag=direct,dsync
100000+0 records in
100000+0 records out
409600000 bytes (410 MB) copied, 83.4916 s, 4.9 MB/s

*on 840 EVO*
dd if=randfile of=/dev/sdd1 bs=4k count=100000 oflag=direct
100000+0 records in
100000+0 records out
409600000 bytes (410 MB) copied, 5.11912 s, 80.0 MB/s

*with dsync*
 dd if=randfile of=/dev/sdd1 bs=4k count=100000 oflag=direct,dsync
100000+0 records in
100000+0 records out
409600000 bytes (410 MB) copied, 196.738 s, 2.1 MB/s

So with dsync there is a significant reduction in performance; it looks like the 850
is better than the 840. Can this be the reason for the reduced write speed of
926 kB/s?
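(If it helps to quantify that, the same O_DSYNC behaviour can be expressed as sustained IOPS with fio; the partition name is illustrative and, as with the dd runs above, this writes straight to the partition, so only point it at one you can wipe:)

fio --name=journal-test --filename=/dev/sdb1 --direct=1 --sync=1 \
    --rw=write --bs=4k --iodepth=1 --numjobs=1 --runtime=60 --time_based \
    --group_reporting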

Also, before trying on physical servers I ran Ceph on VMware VMs with SAS
disks using Giant 0.87; at that time Firefly 0.80.8 was giving higher
numbers, so I decided to use Firefly.

On Sat, Feb 28, 2015 at 5:13 PM, Alexandre DERUMIER 
wrote:

> Hi,
>
> First, test if your ssd can write fast with O_DSYNC
> check this blog:
>
> http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
>
>
> Then, try with ceph Giant (or maybe wait for Hammer), because they are a
> lot of optimisations for ssd for threads sharding.
>
> In my last test with giant, I was able to reach around 120000 iops with
> 6osd/intel s3500 ssd, but I was cpu limited.
>
> - Mail original -
> De: "mad Engineer" 
> À: "ceph-users" 
> Envoyé: Samedi 28 Février 2015 12:19:56
> Objet: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9
> OSD  with 3.16-3 kernel
>
> Hello All,
>
> I am trying ceph-firefly 0.80.8
> (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all Samsung SSD
> 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu 14.04 LTS
> with 3.16-3 kernel.All are connected to 10G ports with maximum
> MTU.There are no extra disks for journaling and also there are no
> separate network for replication and data transfer.All 3 nodes are
> also hosting monitoring process.Operating system runs on SATA disk.
>
> When doing a sequential benchmark using "dd" on RBD, mounted on client
> as ext4 its taking 110s to write 100Mb data at an average speed of
> 926Kbps.
>
> time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
> 25000+0 records in
> 25000+0 records out
> 10240 bytes (102 MB) copied, 110.582 s, 926 kB/s
>
> real 1m50.585s
> user 0m0.106s
> sys 0m2.233s
>
> While doing this directly on ssd mount point shows:
>
> time dd if=/dev/zero of=hello bs=4k count=25000
> oflag=direct
> 25000+0 records in
> 25000+0 records out
> 10240 bytes (102 MB) copied, 1.38567
> s, 73.9 MB/s
>
> OSDs are in XFS with these extra arguments :
>
> rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M
>
> ceph.conf
>
> [global]
> fsid = 7d889081-7826-439c-9fe5-d4e57480d9be
> mon_initial_members = ceph1, ceph2, ceph3
> mon_host = 10.99.10.118,10.99.10.119,10.99.10.120
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
> osd_pool_default_size = 2
> osd_pool_default_min_size = 2
> osd_pool_default_pg_num = 450
> osd_pool_default_pgp_num = 450
> max_open_files = 131072
>
> [osd]
> osd_mkfs_type = xfs
> osd_op_threads = 8
> osd_disk_threads = 4
> osd_mount_options_xfs =
> "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M"
>
>
> on our traditional storage with Full SAS disk, same "dd" completes in
> 16s with an average write speed of 6Mbps.
>
> Rados bench:
>
> rados bench -p rbd 10 write
> Maintaining 16 concurrent writes of 4194304 bytes for up to 10
> seconds or 0 objects
> Object prefix: benchmark_data_ceph1_2977
> sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
> 0 0 0 0 0 0 - 0
> 1 16 94 78 311.821 312 0.041228 0.140132
> 2 16 192 176 351.866 392 0.106294 0.175055
> 3 16 275 259 345.216 332 0.076795 0.166036
> 4 16 302 286 285.912 108 0.043888 0.196419
> 5 16 395 379 303.11 372 0.126033 0.207488
> 6 16 501 485 323.242 424 0.125972 0.194559
> 7 16 621 605 345.621 480 0.194155 0.183123
> 8 16 730 714 356.903 436 0.086678 0.176099
> 9 16 814 798 354.572 336 0.081567 0.174786
> 10 16 832 816 326.313 72 0.037431 0.182355
> 11 16 833 817 297.013 4 0.533326 0.182784
> Total time run: 11.489068
> Total writes made: 833
> Write size: 4194304
> Bandwidth (MB/sec): 290.015
>
> Stddev Bandwidth: 175.723
> Max 

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Alexandre DERUMIER
As an optimisation,

try setting the I/O scheduler to noop,

and also enable rbd_cache=true (it really helps for sequential writes).

But your results seem quite low: 926 kB/s at 4k is only about 230 IOPS.

Check that you don't have any large network latencies, or an MTU fragmentation
problem.
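(A quick way to check both from the client; the 8972-byte payload assumes a 9000-byte MTU, and 10.99.10.118 is just the first monitor host from the posted ceph.conf:)

# forbid fragmentation; 8972 = 9000 MTU - 20 IP header - 8 ICMP header
ping -M do -s 8972 -c 5 10.99.10.118
# round-trip latency with default-size packets
ping -c 100 -q 10.99.10.118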

Maybe also try to benchmark with fio, with more parallel jobs.
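(A minimal sketch of those suggestions; the device name, mount point and job parameters are illustrative, and rbd_cache only applies to librbd clients, not the kernel RBD driver:)

# switch the SSD OSD's scheduler to noop (sdb is illustrative)
echo noop > /sys/block/sdb/queue/scheduler

# client-side ceph.conf snippet for librbd caching
[client]
rbd cache = true

# 4k random writes with far more parallelism than a single dd stream
fio --name=rbd4k --filename=/mnt/rbd/testfile --size=1G --bs=4k \
    --rw=randwrite --direct=1 --ioengine=libaio --iodepth=32 --numjobs=4 \
    --group_reporting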




- Original Message -
From: "mad Engineer" 
To: "Philippe Schwarz" 
Cc: "ceph-users" 
Sent: Saturday, 28 February 2015 13:06:59
Subject: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD 
with 3.16-3 kernel

Thanks for the reply, Philippe. We were using these disks in our NAS; now 
it looks like I am in big trouble :-( 

On Sat, Feb 28, 2015 at 5:02 PM, Philippe Schwarz  wrote: 
> -BEGIN PGP SIGNED MESSAGE- 
> Hash: SHA1 
> 
> Le 28/02/2015 12:19, mad Engineer a écrit : 
>> Hello All, 
>> 
>> I am trying ceph-firefly 0.80.8 
>> (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all Samsung 
>> SSD 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu 
>> 14.04 LTS with 3.16-3 kernel.All are connected to 10G ports with 
>> maximum MTU.There are no extra disks for journaling and also there 
>> are no separate network for replication and data transfer.All 3 
>> nodes are also hosting monitoring process.Operating system runs on 
>> SATA disk. 
>> 
>> When doing a sequential benchmark using "dd" on RBD, mounted on 
>> client as ext4 its taking 110s to write 100Mb data at an average 
>> speed of 926Kbps. 
>> 
>> time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct 
>> 25000+0 records in 25000+0 records out 10240 bytes (102 MB) 
>> copied, 110.582 s, 926 kB/s 
>> 
>> real 1m50.585s user 0m0.106s sys 0m2.233s 
>> 
>> While doing this directly on ssd mount point shows: 
>> 
>> time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct 
>> 25000+0 records in 25000+0 records out 10240 bytes (102 MB) 
>> copied, 1.38567 s, 73.9 MB/s 
>> 
>> OSDs are in XFS with these extra arguments : 
>> 
>> rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M 
>> 
>> ceph.conf 
>> 
>> [global] fsid = 7d889081-7826-439c-9fe5-d4e57480d9be 
>> mon_initial_members = ceph1, ceph2, ceph3 mon_host = 
>> 10.99.10.118,10.99.10.119,10.99.10.120 auth_cluster_required = 
>> cephx auth_service_required = cephx auth_client_required = cephx 
>> filestore_xattr_use_omap = true osd_pool_default_size = 2 
>> osd_pool_default_min_size = 2 osd_pool_default_pg_num = 450 
>> osd_pool_default_pgp_num = 450 max_open_files = 131072 
>> 
>> [osd] osd_mkfs_type = xfs osd_op_threads = 8 osd_disk_threads = 4 
>> osd_mount_options_xfs = 
>> "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M" 
>> 
>> 
>> on our traditional storage with Full SAS disk, same "dd" completes 
>> in 16s with an average write speed of 6Mbps. 
>> 
>> Rados bench: 
>> 
>> rados bench -p rbd 10 write Maintaining 16 concurrent writes of 
>> 4194304 bytes for up to 10 seconds or 0 objects Object prefix: 
>> benchmark_data_ceph1_2977 sec Cur ops started finished avg MB/s 
>> cur MB/s last lat avg lat 0 0 0 0 
>> 0 0 - 0 1 16 94 78 
>> 311.821 312 0.041228 0.140132 2 16 192 176 
>> 351.866 392 0.106294 0.175055 3 16 275 259 
>> 345.216 332 0.076795 0.166036 4 16 302 286 
>> 285.912 108 0.043888 0.196419 5 16 395 379 
>> 303.11 372 0.126033 0.207488 6 16 501 485 
>> 323.242 424 0.125972 0.194559 7 16 621 605 
>> 345.621 480 0.194155 0.183123 8 16 730 714 
>> 356.903 436 0.086678 0.176099 9 16 814 798 
>> 354.572 336 0.081567 0.174786 10 16 832 
>> 816 326.313 72 0.037431 0.182355 11 16 833 
>> 817 297.013 4 0.533326 0.182784 Total time run: 
>> 11.489068 Total writes made: 833 Write size: 
>> 4194304 Bandwidth (MB/sec): 290.015 
>> 
>> Stddev Bandwidth: 175.723 Max bandwidth (MB/sec): 480 Min 
>> bandwidth (MB/sec): 0 Average Latency: 0.220582 Stddev 
>> Latency: 0.343697 Max latency: 2.85104 Min 
>> latency: 0.035381 
>> 
>> Our ultimate aim is to replace existing SAN with ceph,but for that 
>> it should meet minimum 8000 iops.Can any one help me with this,OSD 
>> are SSD,CPU has good clock speed,backend network is good but still 
>> we are not able to extract full capability of SSD disks. 
>> 
>> 
>> 
>> Thanks, 
> 
> Hi, i'm new to ceph so, don't consider my words as holy truth. 
> 
> It seems that Samsung 840 (so i assume 850) are crappy for ceph : 
> 
> MTBF : 
> http://lists.cep

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Alexandre DERUMIER
>>But this was replication1? I never was able to do more than 30 000 with 
>>replication 3.

Oh, sorry, that was about reads.

For writes, I think I was around 30000 iops with 3 nodes (2x 4 cores @ 2.1GHz each),
CPU bound, with replication x1.
With replication x3, around 9000 iops.


Going to test on 2x 10 cores @ 3.1GHz in a few weeks.





- Original Message -
From: "Stefan Priebe" 
To: "aderumier" 
Cc: "mad Engineer" , "ceph-users" 

Sent: Saturday, 28 February 2015 13:42:54
Subject: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD  
with 3.16-3 kernel

> Am 28.02.2015 um 12:43 schrieb Alexandre DERUMIER : 
> 
> Hi, 
> 
> First, test if your ssd can write fast with O_DSYNC 
> check this blog: 
> http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
>  
> 
> 
> Then, try with ceph Giant (or maybe wait for Hammer), because they are a lot 
> of optimisations for ssd for threads sharding. 
> 
> In my last test with giant, I was able to reach around 120000 iops with 
> 6osd/intel s3500 ssd, but I was cpu limited. 

But this was replication1? I never was able to do more than 30 000 with 
replication 3. 

Stefan 


> 
> - Mail original - 
> De: "mad Engineer"  
> À: "ceph-users"  
> Envoyé: Samedi 28 Février 2015 12:19:56 
> Objet: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD 
> with 3.16-3 kernel 
> 
> Hello All, 
> 
> I am trying ceph-firefly 0.80.8 
> (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all Samsung SSD 
> 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu 14.04 LTS 
> with 3.16-3 kernel.All are connected to 10G ports with maximum 
> MTU.There are no extra disks for journaling and also there are no 
> separate network for replication and data transfer.All 3 nodes are 
> also hosting monitoring process.Operating system runs on SATA disk. 
> 
> When doing a sequential benchmark using "dd" on RBD, mounted on client 
> as ext4 its taking 110s to write 100Mb data at an average speed of 
> 926Kbps. 
> 
> time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct 
> 25000+0 records in 
> 25000+0 records out 
> 10240 bytes (102 MB) copied, 110.582 s, 926 kB/s 
> 
> real 1m50.585s 
> user 0m0.106s 
> sys 0m2.233s 
> 
> While doing this directly on ssd mount point shows: 
> 
> time dd if=/dev/zero of=hello bs=4k count=25000 
> oflag=direct 
> 25000+0 records in 
> 25000+0 records out 
> 10240 bytes (102 MB) copied, 1.38567 
> s, 73.9 MB/s 
> 
> OSDs are in XFS with these extra arguments : 
> 
> rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M 
> 
> ceph.conf 
> 
> [global] 
> fsid = 7d889081-7826-439c-9fe5-d4e57480d9be 
> mon_initial_members = ceph1, ceph2, ceph3 
> mon_host = 10.99.10.118,10.99.10.119,10.99.10.120 
> auth_cluster_required = cephx 
> auth_service_required = cephx 
> auth_client_required = cephx 
> filestore_xattr_use_omap = true 
> osd_pool_default_size = 2 
> osd_pool_default_min_size = 2 
> osd_pool_default_pg_num = 450 
> osd_pool_default_pgp_num = 450 
> max_open_files = 131072 
> 
> [osd] 
> osd_mkfs_type = xfs 
> osd_op_threads = 8 
> osd_disk_threads = 4 
> osd_mount_options_xfs = 
> "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M" 
> 
> 
> on our traditional storage with Full SAS disk, same "dd" completes in 
> 16s with an average write speed of 6Mbps. 
> 
> Rados bench: 
> 
> rados bench -p rbd 10 write 
> Maintaining 16 concurrent writes of 4194304 bytes for up to 10 
> seconds or 0 objects 
> Object prefix: benchmark_data_ceph1_2977 
> sec Cur ops started finished avg MB/s cur MB/s last lat avg lat 
> 0 0 0 0 0 0 - 0 
> 1 16 94 78 311.821 312 0.041228 0.140132 
> 2 16 192 176 351.866 392 0.106294 0.175055 
> 3 16 275 259 345.216 332 0.076795 0.166036 
> 4 16 302 286 285.912 108 0.043888 0.196419 
> 5 16 395 379 303.11 372 0.126033 0.207488 
> 6 16 501 485 323.242 424 0.125972 0.194559 
> 7 16 621 605 345.621 480 0.194155 0.183123 
> 8 16 730 714 356.903 436 0.086678 0.176099 
> 9 16 814 798 354.572 336 0.081567 0.174786 
> 10 16 832 816 326.313 72 0.037431 0.182355 
> 11 16 833 817 297.013 4 0.533326 0.182784 
> Total time run: 11.489068 
> Total writes made: 833 
> Write size: 4194304 
> Bandwidth (MB/sec): 290.015 
> 
> Stddev Bandwidth: 175.723 
> Max bandwidth (MB/sec): 480 
> Min bandwidth (MB/sec): 0 
> Average Latency: 0.220582 
> Stddev Latency: 0.343697 
> Max latency: 2.85104 
> Min latency: 0.035381 
> 
> Our ultimate aim is to replace existing SAN with ceph,but for that it 
&g

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Stefan Priebe - Profihost AG

> On 28.02.2015 at 12:43, Alexandre DERUMIER wrote:
> 
> Hi,
> 
> First, test if your ssd can write fast with O_DSYNC
> check this blog:
> http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
> 
> 
> Then, try with ceph Giant (or maybe wait for Hammer), because they are a lot 
> of optimisations for ssd for threads sharding.
> 
> In my last test with giant, I was able to reach around 120000 iops with 
> 6osd/intel s3500 ssd, but I was cpu limited.

But was this with replication 1? I was never able to do more than 30,000 with 
replication 3. 

Stefan


> 
> - Mail original -
> De: "mad Engineer" 
> À: "ceph-users" 
> Envoyé: Samedi 28 Février 2015 12:19:56
> Objet: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD
> with 3.16-3 kernel
> 
> Hello All, 
> 
> I am trying ceph-firefly 0.80.8 
> (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all Samsung SSD 
> 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu 14.04 LTS 
> with 3.16-3 kernel.All are connected to 10G ports with maximum 
> MTU.There are no extra disks for journaling and also there are no 
> separate network for replication and data transfer.All 3 nodes are 
> also hosting monitoring process.Operating system runs on SATA disk. 
> 
> When doing a sequential benchmark using "dd" on RBD, mounted on client 
> as ext4 its taking 110s to write 100Mb data at an average speed of 
> 926Kbps. 
> 
> time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct 
> 25000+0 records in 
> 25000+0 records out 
> 10240 bytes (102 MB) copied, 110.582 s, 926 kB/s 
> 
> real 1m50.585s 
> user 0m0.106s 
> sys 0m2.233s 
> 
> While doing this directly on ssd mount point shows: 
> 
> time dd if=/dev/zero of=hello bs=4k count=25000 
> oflag=direct 
> 25000+0 records in 
> 25000+0 records out 
> 10240 bytes (102 MB) copied, 1.38567 
> s, 73.9 MB/s 
> 
> OSDs are in XFS with these extra arguments : 
> 
> rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M 
> 
> ceph.conf 
> 
> [global] 
> fsid = 7d889081-7826-439c-9fe5-d4e57480d9be 
> mon_initial_members = ceph1, ceph2, ceph3 
> mon_host = 10.99.10.118,10.99.10.119,10.99.10.120 
> auth_cluster_required = cephx 
> auth_service_required = cephx 
> auth_client_required = cephx 
> filestore_xattr_use_omap = true 
> osd_pool_default_size = 2 
> osd_pool_default_min_size = 2 
> osd_pool_default_pg_num = 450 
> osd_pool_default_pgp_num = 450 
> max_open_files = 131072 
> 
> [osd] 
> osd_mkfs_type = xfs 
> osd_op_threads = 8 
> osd_disk_threads = 4 
> osd_mount_options_xfs = 
> "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M" 
> 
> 
> on our traditional storage with Full SAS disk, same "dd" completes in 
> 16s with an average write speed of 6Mbps. 
> 
> Rados bench: 
> 
> rados bench -p rbd 10 write 
> Maintaining 16 concurrent writes of 4194304 bytes for up to 10 
> seconds or 0 objects 
> Object prefix: benchmark_data_ceph1_2977 
> sec Cur ops started finished avg MB/s cur MB/s last lat avg lat 
> 0 0 0 0 0 0 - 0 
> 1 16 94 78 311.821 312 0.041228 0.140132 
> 2 16 192 176 351.866 392 0.106294 0.175055 
> 3 16 275 259 345.216 332 0.076795 0.166036 
> 4 16 302 286 285.912 108 0.043888 0.196419 
> 5 16 395 379 303.11 372 0.126033 0.207488 
> 6 16 501 485 323.242 424 0.125972 0.194559 
> 7 16 621 605 345.621 480 0.194155 0.183123 
> 8 16 730 714 356.903 436 0.086678 0.176099 
> 9 16 814 798 354.572 336 0.081567 0.174786 
> 10 16 832 816 326.313 72 0.037431 0.182355 
> 11 16 833 817 297.013 4 0.533326 0.182784 
> Total time run: 11.489068 
> Total writes made: 833 
> Write size: 4194304 
> Bandwidth (MB/sec): 290.015 
> 
> Stddev Bandwidth: 175.723 
> Max bandwidth (MB/sec): 480 
> Min bandwidth (MB/sec): 0 
> Average Latency: 0.220582 
> Stddev Latency: 0.343697 
> Max latency: 2.85104 
> Min latency: 0.035381 
> 
> Our ultimate aim is to replace existing SAN with ceph,but for that it 
> should meet minimum 8000 iops.Can any one help me with this,OSD are 
> SSD,CPU has good clock speed,backend network is good but still we are 
> not able to extract full capability of SSD disks. 
> 
> 
> 
> Thanks, 
> ___ 
> ceph-users mailing list 
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread mad Engineer
Thanks for the reply, Philippe. We were using these disks in our NAS; now
it looks like I am in big trouble :-(

On Sat, Feb 28, 2015 at 5:02 PM, Philippe Schwarz  wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> Le 28/02/2015 12:19, mad Engineer a écrit :
>> Hello All,
>>
>> I am trying ceph-firefly 0.80.8
>> (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all Samsung
>> SSD 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu
>> 14.04 LTS with 3.16-3 kernel.All are connected to 10G ports with
>> maximum MTU.There are no extra disks for journaling and also there
>> are no separate network for replication and data transfer.All 3
>> nodes are also hosting monitoring process.Operating system runs on
>> SATA disk.
>>
>> When doing a sequential benchmark using "dd" on RBD, mounted on
>> client as ext4 its taking 110s to write 100Mb data at an average
>> speed of 926Kbps.
>>
>> time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
>> 25000+0 records in 25000+0 records out 10240 bytes (102 MB)
>> copied, 110.582 s, 926 kB/s
>>
>> real1m50.585s user0m0.106s sys 0m2.233s
>>
>> While doing this directly on ssd mount point shows:
>>
>> time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
>> 25000+0 records in 25000+0 records out 10240 bytes (102 MB)
>> copied, 1.38567 s, 73.9 MB/s
>>
>> OSDs are in XFS with these extra arguments :
>>
>> rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M
>>
>> ceph.conf
>>
>> [global] fsid = 7d889081-7826-439c-9fe5-d4e57480d9be
>> mon_initial_members = ceph1, ceph2, ceph3 mon_host =
>> 10.99.10.118,10.99.10.119,10.99.10.120 auth_cluster_required =
>> cephx auth_service_required = cephx auth_client_required = cephx
>> filestore_xattr_use_omap = true osd_pool_default_size = 2
>> osd_pool_default_min_size = 2 osd_pool_default_pg_num = 450
>> osd_pool_default_pgp_num = 450 max_open_files = 131072
>>
>> [osd] osd_mkfs_type = xfs osd_op_threads = 8 osd_disk_threads = 4
>> osd_mount_options_xfs =
>> "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M"
>>
>>
>> on our traditional storage with Full SAS disk, same "dd" completes
>> in 16s with an average write speed of 6Mbps.
>>
>> Rados bench:
>>
>> rados bench -p rbd 10 write Maintaining 16 concurrent writes of
>> 4194304 bytes for up to 10 seconds or 0 objects Object prefix:
>> benchmark_data_ceph1_2977 sec Cur ops   started  finished  avg MB/s
>> cur MB/s  last lat   avg lat 0   0 0 0
>> 0 0 - 0 1  169478
>> 311.821   312  0.041228  0.140132 2  16   192   176
>> 351.866   392  0.106294  0.175055 3  16   275   259
>> 345.216   332  0.076795  0.166036 4  16   302   286
>> 285.912   108  0.043888  0.196419 5  16   395   379
>> 303.11   372  0.126033  0.207488 6  16   501   485
>> 323.242   424  0.125972  0.194559 7  16   621   605
>> 345.621   480  0.194155  0.183123 8  16   730   714
>> 356.903   436  0.086678  0.176099 9  16   814   798
>> 354.572   336  0.081567  0.174786 10  16   832
>> 816   326.31372  0.037431  0.182355 11  16   833
>> 817   297.013 4  0.533326  0.182784 Total time run:
>> 11.489068 Total writes made:  833 Write size:
>> 4194304 Bandwidth (MB/sec): 290.015
>>
>> Stddev Bandwidth:   175.723 Max bandwidth (MB/sec): 480 Min
>> bandwidth (MB/sec): 0 Average Latency:0.220582 Stddev
>> Latency: 0.343697 Max latency:2.85104 Min
>> latency:0.035381
>>
>> Our ultimate aim is to replace existing SAN with ceph,but for that
>> it should meet minimum 8000 iops.Can any one help me with this,OSD
>> are SSD,CPU has good clock speed,backend network is good but still
>> we are not able to extract full capability of SSD disks.
>>
>>
>>
>> Thanks,
>
> Hi, i'm new to ceph so, don't consider my words as holy truth.
>
> It seems that Samsung 840 (so i assume 850) are crappy for ceph :
>
> MTBF :
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-November/044258.html
> Bandwidth
> :http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-December/045247.html
>
> And according to a confirmed user of Ceph/ProxmoX, Samsung SSDs should
> be avoided if possible in ceph storage.
>
> Apart from that, it seems there was an limitation in ceph for the use
> of the complete bandwidth available in SSDs; but i think with less
> than 1Mb/s you haven't hit this limit.
>
> I remind you that i'm not a ceph-guru (far from that, indeed), so feel
> free to disagree; i'm on the way to improve my knowledge.
>
> Best regards.
>
>
>
>
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v1
>
> iEYEARECAAYFAlTxp0UACgkQlhqCFkbqHRb5+wCgrXCM3VsnVE6PCbbpOmQXCXbr
> 8u0An2BUgZWismSK0PxbwVDOD5+/UWik
> =0o0v
> -END PGP SIGNATURE-
___
ceph-users mailing list
ceph-users@lists.ceph.c

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Martin B Nielsen
Hi,

I cannot recognize that picture; we've been using Samsung 840 Pro in
production for almost 2 years now - and have had 1 fail.

We run an 8-node mixed SSD/platter cluster with 4x Samsung 840 Pro (500GB) in
each, so that is 32x SSD.

They've written ~25TB of data on average each.

Using the dd command you had, inside an existing semi-busy MySQL guest, I get:

102400000 bytes (102 MB) copied, 5.58218 s, 18.3 MB/s

Which is still not a lot, but I think it is more a limitation of our
setup/load.

We are using dumpling.

All that aside, I would probably go with something tried and tested if I were
to redo it today - we haven't had any issues, but it is still nice to use
something with a known baseline performance that you can compare against.

Cheers,
Martin

On Sat, Feb 28, 2015 at 12:32 PM, Philippe Schwarz 
wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> Le 28/02/2015 12:19, mad Engineer a écrit :
> > Hello All,
> >
> > I am trying ceph-firefly 0.80.8
> > (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all Samsung
> > SSD 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu
> > 14.04 LTS with 3.16-3 kernel.All are connected to 10G ports with
> > maximum MTU.There are no extra disks for journaling and also there
> > are no separate network for replication and data transfer.All 3
> > nodes are also hosting monitoring process.Operating system runs on
> > SATA disk.
> >
> > When doing a sequential benchmark using "dd" on RBD, mounted on
> > client as ext4 its taking 110s to write 100Mb data at an average
> > speed of 926Kbps.
> >
> > time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
> > 25000+0 records in 25000+0 records out 10240 bytes (102 MB)
> > copied, 110.582 s, 926 kB/s
> >
> > real1m50.585s user0m0.106s sys 0m2.233s
> >
> > While doing this directly on ssd mount point shows:
> >
> > time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
> > 25000+0 records in 25000+0 records out 10240 bytes (102 MB)
> > copied, 1.38567 s, 73.9 MB/s
> >
> > OSDs are in XFS with these extra arguments :
> >
> > rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M
> >
> > ceph.conf
> >
> > [global] fsid = 7d889081-7826-439c-9fe5-d4e57480d9be
> > mon_initial_members = ceph1, ceph2, ceph3 mon_host =
> > 10.99.10.118,10.99.10.119,10.99.10.120 auth_cluster_required =
> > cephx auth_service_required = cephx auth_client_required = cephx
> > filestore_xattr_use_omap = true osd_pool_default_size = 2
> > osd_pool_default_min_size = 2 osd_pool_default_pg_num = 450
> > osd_pool_default_pgp_num = 450 max_open_files = 131072
> >
> > [osd] osd_mkfs_type = xfs osd_op_threads = 8 osd_disk_threads = 4
> > osd_mount_options_xfs =
> > "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M"
> >
> >
> > on our traditional storage with Full SAS disk, same "dd" completes
> > in 16s with an average write speed of 6Mbps.
> >
> > Rados bench:
> >
> > rados bench -p rbd 10 write Maintaining 16 concurrent writes of
> > 4194304 bytes for up to 10 seconds or 0 objects Object prefix:
> > benchmark_data_ceph1_2977 sec Cur ops   started  finished  avg MB/s
> > cur MB/s  last lat   avg lat 0   0 0 0
> > 0 0 - 0 1  169478
> > 311.821   312  0.041228  0.140132 2  16   192   176
> > 351.866   392  0.106294  0.175055 3  16   275   259
> > 345.216   332  0.076795  0.166036 4  16   302   286
> > 285.912   108  0.043888  0.196419 5  16   395   379
> > 303.11   372  0.126033  0.207488 6  16   501   485
> > 323.242   424  0.125972  0.194559 7  16   621   605
> > 345.621   480  0.194155  0.183123 8  16   730   714
> > 356.903   436  0.086678  0.176099 9  16   814   798
> > 354.572   336  0.081567  0.174786 10  16   832
> > 816   326.31372  0.037431  0.182355 11  16   833
> > 817   297.013 4  0.533326  0.182784 Total time run:
> > 11.489068 Total writes made:  833 Write size:
> > 4194304 Bandwidth (MB/sec): 290.015
> >
> > Stddev Bandwidth:   175.723 Max bandwidth (MB/sec): 480 Min
> > bandwidth (MB/sec): 0 Average Latency:0.220582 Stddev
> > Latency: 0.343697 Max latency:2.85104 Min
> > latency:0.035381
> >
> > Our ultimate aim is to replace existing SAN with ceph,but for that
> > it should meet minimum 8000 iops.Can any one help me with this,OSD
> > are SSD,CPU has good clock speed,backend network is good but still
> > we are not able to extract full capability of SSD disks.
> >
> >
> >
> > Thanks,
>
> Hi, i'm new to ceph so, don't consider my words as holy truth.
>
> It seems that Samsung 840 (so i assume 850) are crappy for ceph :
>
> MTBF :
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-November/044258.html
> Bandwidth
> :
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-December/045247.html
>
> And according to 

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Alexandre DERUMIER
Hi,

First, test whether your SSD can write fast with O_DSYNC;
check this blog:
http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
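(The test from that blog boils down to a single direct, synchronous 4k write stream against the raw device; the device name is illustrative and the command overwrites it, so use a spare disk or partition:)

dd if=/dev/zero of=/dev/sdX bs=4k count=100000 oflag=direct,dsync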


Then, try with Ceph Giant (or maybe wait for Hammer), because there are a lot of 
optimisations for SSDs from thread sharding.

In my last test with Giant, I was able to reach around 120000 iops with 
6 OSDs on Intel S3500 SSDs, but I was CPU limited.

- Original Message -
From: "mad Engineer" 
To: "ceph-users" 
Sent: Saturday, 28 February 2015 12:19:56
Subject: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD  
with 3.16-3 kernel

Hello All, 

I am trying ceph-firefly 0.80.8 
(69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all Samsung SSD 
850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu 14.04 LTS 
with 3.16-3 kernel.All are connected to 10G ports with maximum 
MTU.There are no extra disks for journaling and also there are no 
separate network for replication and data transfer.All 3 nodes are 
also hosting monitoring process.Operating system runs on SATA disk. 

When doing a sequential benchmark using "dd" on RBD, mounted on client 
as ext4 its taking 110s to write 100Mb data at an average speed of 
926Kbps. 

time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct 
25000+0 records in 
25000+0 records out 
10240 bytes (102 MB) copied, 110.582 s, 926 kB/s 

real 1m50.585s 
user 0m0.106s 
sys 0m2.233s 

While doing this directly on ssd mount point shows: 

time dd if=/dev/zero of=hello bs=4k count=25000 
oflag=direct 
25000+0 records in 
25000+0 records out 
10240 bytes (102 MB) copied, 1.38567 
s, 73.9 MB/s 

OSDs are in XFS with these extra arguments : 

rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M 

ceph.conf 

[global] 
fsid = 7d889081-7826-439c-9fe5-d4e57480d9be 
mon_initial_members = ceph1, ceph2, ceph3 
mon_host = 10.99.10.118,10.99.10.119,10.99.10.120 
auth_cluster_required = cephx 
auth_service_required = cephx 
auth_client_required = cephx 
filestore_xattr_use_omap = true 
osd_pool_default_size = 2 
osd_pool_default_min_size = 2 
osd_pool_default_pg_num = 450 
osd_pool_default_pgp_num = 450 
max_open_files = 131072 

[osd] 
osd_mkfs_type = xfs 
osd_op_threads = 8 
osd_disk_threads = 4 
osd_mount_options_xfs = 
"rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M" 


on our traditional storage with Full SAS disk, same "dd" completes in 
16s with an average write speed of 6Mbps. 

Rados bench: 

rados bench -p rbd 10 write 
Maintaining 16 concurrent writes of 4194304 bytes for up to 10 
seconds or 0 objects 
Object prefix: benchmark_data_ceph1_2977 
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat 
0 0 0 0 0 0 - 0 
1 16 94 78 311.821 312 0.041228 0.140132 
2 16 192 176 351.866 392 0.106294 0.175055 
3 16 275 259 345.216 332 0.076795 0.166036 
4 16 302 286 285.912 108 0.043888 0.196419 
5 16 395 379 303.11 372 0.126033 0.207488 
6 16 501 485 323.242 424 0.125972 0.194559 
7 16 621 605 345.621 480 0.194155 0.183123 
8 16 730 714 356.903 436 0.086678 0.176099 
9 16 814 798 354.572 336 0.081567 0.174786 
10 16 832 816 326.313 72 0.037431 0.182355 
11 16 833 817 297.013 4 0.533326 0.182784 
Total time run: 11.489068 
Total writes made: 833 
Write size: 4194304 
Bandwidth (MB/sec): 290.015 

Stddev Bandwidth: 175.723 
Max bandwidth (MB/sec): 480 
Min bandwidth (MB/sec): 0 
Average Latency: 0.220582 
Stddev Latency: 0.343697 
Max latency: 2.85104 
Min latency: 0.035381 

Our ultimate aim is to replace existing SAN with ceph,but for that it 
should meet minimum 8000 iops.Can any one help me with this,OSD are 
SSD,CPU has good clock speed,backend network is good but still we are 
not able to extract full capability of SSD disks. 



Thanks, 
___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Philippe Schwarz
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 28/02/2015 12:19, mad Engineer wrote:
> Hello All,
> 
> I am trying ceph-firefly 0.80.8 
> (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSD ,all Samsung
> SSD 850 EVO on 3 servers with 24 G RAM,16 cores @2.27 Ghz Ubuntu
> 14.04 LTS with 3.16-3 kernel.All are connected to 10G ports with
> maximum MTU.There are no extra disks for journaling and also there
> are no separate network for replication and data transfer.All 3
> nodes are also hosting monitoring process.Operating system runs on
> SATA disk.
> 
> When doing a sequential benchmark using "dd" on RBD, mounted on
> client as ext4 its taking 110s to write 100Mb data at an average
> speed of 926Kbps.
> 
> time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct 
> 25000+0 records in 25000+0 records out 10240 bytes (102 MB)
> copied, 110.582 s, 926 kB/s
> 
> real1m50.585s user0m0.106s sys 0m2.233s
> 
> While doing this directly on ssd mount point shows:
> 
> time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct 
> 25000+0 records in 25000+0 records out 10240 bytes (102 MB)
> copied, 1.38567 s, 73.9 MB/s
> 
> OSDs are in XFS with these extra arguments :
> 
> rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M
> 
> ceph.conf
> 
> [global] fsid = 7d889081-7826-439c-9fe5-d4e57480d9be 
> mon_initial_members = ceph1, ceph2, ceph3 mon_host =
> 10.99.10.118,10.99.10.119,10.99.10.120 auth_cluster_required =
> cephx auth_service_required = cephx auth_client_required = cephx 
> filestore_xattr_use_omap = true osd_pool_default_size = 2 
> osd_pool_default_min_size = 2 osd_pool_default_pg_num = 450 
> osd_pool_default_pgp_num = 450 max_open_files = 131072
> 
> [osd] osd_mkfs_type = xfs osd_op_threads = 8 osd_disk_threads = 4 
> osd_mount_options_xfs =
> "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M"
> 
> 
> on our traditional storage with Full SAS disk, same "dd" completes
> in 16s with an average write speed of 6Mbps.
> 
> Rados bench:
> 
> rados bench -p rbd 10 write Maintaining 16 concurrent writes of
> 4194304 bytes for up to 10 seconds or 0 objects Object prefix:
> benchmark_data_ceph1_2977 sec Cur ops   started  finished  avg MB/s
> cur MB/s  last lat   avg lat 0   0 0 0
> 0 0 - 0 1  169478
> 311.821   312  0.041228  0.140132 2  16   192   176
> 351.866   392  0.106294  0.175055 3  16   275   259
> 345.216   332  0.076795  0.166036 4  16   302   286
> 285.912   108  0.043888  0.196419 5  16   395   379
> 303.11   372  0.126033  0.207488 6  16   501   485
> 323.242   424  0.125972  0.194559 7  16   621   605
> 345.621   480  0.194155  0.183123 8  16   730   714
> 356.903   436  0.086678  0.176099 9  16   814   798
> 354.572   336  0.081567  0.174786 10  16   832
> 816   326.31372  0.037431  0.182355 11  16   833
> 817   297.013 4  0.533326  0.182784 Total time run:
> 11.489068 Total writes made:  833 Write size:
> 4194304 Bandwidth (MB/sec): 290.015
> 
> Stddev Bandwidth:   175.723 Max bandwidth (MB/sec): 480 Min
> bandwidth (MB/sec): 0 Average Latency:0.220582 Stddev
> Latency: 0.343697 Max latency:2.85104 Min
> latency:0.035381
> 
> Our ultimate aim is to replace existing SAN with ceph,but for that
> it should meet minimum 8000 iops.Can any one help me with this,OSD
> are SSD,CPU has good clock speed,backend network is good but still
> we are not able to extract full capability of SSD disks.
> 
> 
> 
> Thanks,

Hi, I'm new to Ceph, so don't consider my words as holy truth.

It seems that the Samsung 840 (so I assume the 850 too) is crappy for Ceph:

MTBF:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-November/044258.html
Bandwidth:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-December/045247.html

And according to a confirmed user of Ceph/Proxmox, Samsung SSDs should
be avoided if possible in Ceph storage.

Apart from that, it seems there was a limitation in Ceph on using the
complete bandwidth available in SSDs; but I think at less than 1 MB/s you
haven't hit that limit.

I remind you that I'm not a Ceph guru (far from it, indeed), so feel
free to disagree; I'm on my way to improving my knowledge.

Best regards.




-BEGIN PGP SIGNATURE-
Version: GnuPG v1

iEYEARECAAYFAlTxp0UACgkQlhqCFkbqHRb5+wCgrXCM3VsnVE6PCbbpOmQXCXbr
8u0An2BUgZWismSK0PxbwVDOD5+/UWik
=0o0v
-END PGP SIGNATURE-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread mad Engineer
Hello All,

I am trying ceph-firefly 0.80.8
(69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSDs, all Samsung SSD
850 EVO, on 3 servers with 24 GB RAM and 16 cores @ 2.27 GHz, running
Ubuntu 14.04 LTS with the 3.16-3 kernel. All are connected to 10G ports with
maximum MTU. There are no extra disks for journaling, and there is also no
separate network for replication and data transfer. All 3 nodes also host a
monitor process. The operating system runs on a SATA disk.

When doing a sequential benchmark using "dd" on RBD, mounted on the client
as ext4, it takes 110 s to write 102 MB of data at an average speed of
926 kB/s.

  time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
 25000+0 records in
 25000+0 records out
 102400000 bytes (102 MB) copied, 110.582 s, 926 kB/s

 real    1m50.585s
 user    0m0.106s
 sys     0m2.233s

While doing this directly on the SSD mount point shows:

  time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
  25000+0 records in
  25000+0 records out
  102400000 bytes (102 MB) copied, 1.38567 s, 73.9 MB/s

OSDs are in XFS with these extra arguments :

rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M

ceph.conf

[global]
fsid = 7d889081-7826-439c-9fe5-d4e57480d9be
mon_initial_members = ceph1, ceph2, ceph3
mon_host = 10.99.10.118,10.99.10.119,10.99.10.120
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_pool_default_size = 2
osd_pool_default_min_size = 2
osd_pool_default_pg_num = 450
osd_pool_default_pgp_num = 450
max_open_files = 131072

[osd]
osd_mkfs_type = xfs
osd_op_threads = 8
osd_disk_threads = 4
osd_mount_options_xfs = "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M"


On our traditional storage with full SAS disks, the same "dd" completes in
16 s with an average write speed of about 6.4 MB/s.

Rados bench:

rados bench -p rbd 10 write
 Maintaining 16 concurrent writes of 4194304 bytes for up to 10 seconds or 0 objects
 Object prefix: benchmark_data_ceph1_2977
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
     0       0         0         0         0         0         -         0
     1      16        94        78   311.821       312  0.041228  0.140132
     2      16       192       176   351.866       392  0.106294  0.175055
     3      16       275       259   345.216       332  0.076795  0.166036
     4      16       302       286   285.912       108  0.043888  0.196419
     5      16       395       379    303.11       372  0.126033  0.207488
     6      16       501       485   323.242       424  0.125972  0.194559
     7      16       621       605   345.621       480  0.194155  0.183123
     8      16       730       714   356.903       436  0.086678  0.176099
     9      16       814       798   354.572       336  0.081567  0.174786
    10      16       832       816   326.313        72  0.037431  0.182355
    11      16       833       817   297.013         4  0.533326  0.182784
 Total time run:         11.489068
Total writes made:      833
Write size:             4194304
Bandwidth (MB/sec):     290.015

Stddev Bandwidth:       175.723
Max bandwidth (MB/sec): 480
Min bandwidth (MB/sec): 0
Average Latency:        0.220582
Stddev Latency:         0.343697
Max latency:            2.85104
Min latency:            0.035381

Our ultimate aim is to replace our existing SAN with Ceph, but for that it
should meet a minimum of 8000 iops. Can anyone help me with this? The OSDs
are SSDs, the CPUs have a good clock speed, and the backend network is good,
but we are still not able to extract the full capability of the SSD disks.
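(As a side note for the 8000 iops target: the 4MB-object rados bench above mostly measures bandwidth, so a small-object run with more concurrency may be more telling; -b sets the object size in bytes and -t the number of concurrent ops:)

rados bench -p rbd 30 write -b 4096 -t 32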



Thanks,
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com