Re: [ceph-users] Analysing ceph performance with SSD journal, 10gbe NIC and 2 replicas -Hammer release

2017-01-06 Thread Maged Mokhtar


The numbers are very low. I would first benchmark the system without the VM
client, using an rbd 4k write test such as:
rbd bench-write image01  --pool=rbd --io-threads=32 --io-size 4096
--io-pattern rand --rbd_cache=false
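
If that still looks slow, it is also worth going one layer lower and benchmarking
the pool directly with rados bench, which takes librbd out of the picture entirely
(the pool name rbd is only an example; the random-read pass reuses the objects left
behind by --no-cleanup):

rados bench -p rbd 60 write -b 4096 -t 32 --no-cleanup
rados bench -p rbd 60 rand -t 32
rados -p rbd cleanup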


 Original message 
From: kevin parrikar  
Date: 07/01/2017  05:48  (GMT+02:00) 
To: Christian Balzer  
Cc: ceph-users@lists.ceph.com 
Subject: Re: [ceph-users] Analysing ceph performance with SSD journal, 10gbe 
NIC and 2 replicas -Hammer release 

i really need some help here :(

I replaced all the 7200 rpm SAS disks with new Samsung 840 EVO 512 GB SSDs with no
separate journal disk. Both OSD nodes now have 2 SSDs each, with a replica count
of 2.
The total number of OSD processes in the cluster is 4, all on SSD.

But throughput has gone down from 1.4 MB/s to 1.3 MB/s for 4k writes, and for 4M
writes it has gone down from 140 MB/s to 126 MB/s.

Now atop no longer shows the OSD devices as 100% busy.
However, I can see both ceph-osd processes in atop at 53% and 47% disk
utilization.

 PID     RDDSK    WRDSK     WCANCL   DSK    CMD
 20771   0K       648.8M    0K       53%    ceph-osd
 19547   0K       576.7M    0K       47%    ceph-osd


OSD disks(ssd) utilization from atop

DSK |  sdc | busy  6%  | read  0  | write  517  | KiB/r   0  | KiB/w  293 | 
MBr/s 0.00  | MBw/s 148.18  | avq   9.44  | avio 0.12 ms  |

DSK |  sdd | busy   5% | read   0 | write   336 | KiB/r   0  | KiB/w   292 | 
MBr/s 0.00 | MBw/s  96.12  | avq     7.62  | avio 0.15 ms  |


Queue Depth of OSD disks
 cat /sys/block/sdd/device//queue_depth
256

atop inside virtual machine: [4 CPU / 3 GB RAM]
DSK |   vdc  | busy     96%  | read     0  | write  256  | KiB/r   0  | KiB/w  
512  | MBr/s   0.00  | MBw/s 128.00  | avq    7.96  | avio 3.77 ms  |


Both Guest and Host are using deadline I/O scheduler

Virtual Machine Configuration:

[libvirt disk XML stripped by the mail archive; disk serial 449da0e7-6223-457c-b2c6-b5e112099212]



ceph.conf

 cat /etc/ceph/ceph.conf
[global]
fsid = c4e1a523-9017-492e-9c30-8350eba1bd51
mon_initial_members = node-16 node-30 node-31
mon_host = 172.16.1.11 172.16.1.12 172.16.1.8
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
log_to_syslog_level = info
log_to_syslog = True
osd_pool_default_size = 2
osd_pool_default_min_size = 1
osd_pool_default_pg_num = 64
public_network = 172.16.1.0/24
log_to_syslog_facility = LOG_LOCAL0
osd_journal_size = 2048
auth_supported = cephx
osd_pool_default_pgp_num = 64
osd_mkfs_type = xfs
cluster_network = 172.16.1.0/24
osd_recovery_max_active = 1
osd_max_backfills = 1

[client]
rbd_cache_writethrough_until_flush = True
rbd_cache = True

[client.radosgw.gateway]
rgw_keystone_accepted_roles = _member_, Member, admin, swiftoperator
keyring = /etc/ceph/keyring.radosgw.gateway
rgw_frontends = fastcgi socket_port=9000 socket_host=127.0.0.1
rgw_socket_path = /tmp/radosgw.sock
rgw_keystone_revocation_interval = 100
Any guidance on where to look for issues?

Regards,
Kevin
On Fri, Jan 6, 2017 at 4:42 PM, kevin parrikar  
wrote:
Thanks Christian for your valuable comments, each comment is a new learning for
me.
Please see inline 

On Fri, Jan 6, 2017 at 9:32 AM, Christian Balzer  wrote:

Hello,

On Fri, 6 Jan 2017 08:40:36 +0530 kevin parrikar wrote:

> Hello All,
>
> I have setup a ceph cluster based on 0.94.6 release in  2 servers each with
> 80Gb intel s3510 and 2x3 Tb 7.2 SATA disks,16 CPU,24G RAM
> which is connected to a 10G switch with a replica of 2 [ i will add 3 more
> servers to the cluster] and 3 seperate monitor nodes which are vms.
>
I'd go to the latest hammer, this version has a lethal cache-tier bug if
you should decide to try that.

80Gb Intel DC S3510 are a) slow and b) have only 0.3 DWPD.
You're going to wear those out quickly and if not replaced in time lose
data.

2 HDDs give you a theoretical speed of something like 300MB/s sustained,
when used as OSDs I'd expect the usual 50-60MB/s per OSD due to
seeks, journal (file system) and leveldb overheads.
Which perfectly matches your results.

Hmm, that makes sense; it is hitting the 7200 rpm OSDs' peak write speed. I was
under the assumption that writes from the SSD journal to the OSDs would happen
slowly at a later time, and hence that I could use slower and cheaper disks for
the OSDs. But in practice, many articles on the internet that talk about a fast
journal in front of slow OSDs don't seem to hold up.

Will adding more OSD disks per node improve the overall performance?

I can add 4 more disks to each node, but all are 7200 rpm disks. I am expecting
some kind of parallel writes across these disks to magically improve
performance :D

This is my second experiment with Ceph; last time I gave up and purchased
another costly solution from a vendor. But this time I am determined to fix all
issues and bring up a solid cluster.

Re: [ceph-users] Analysing ceph performance with SSD journal, 10gbe NIC and 2 replicas -Hammer release

2017-01-06 Thread kevin parrikar
i really need some help here :(

I replaced all the 7200 rpm SAS disks with new Samsung 840 EVO 512 GB SSDs with no
separate journal disk. Both OSD nodes now have 2 SSDs each, with a replica count
of *2*.
The total number of OSD processes in the cluster is *4*, all on SSD.


But throughput has gone down from 1.4 MB/s to 1.3 MB/s for 4k writes, and for 4M
writes it has gone down from 140 MB/s to 126 MB/s.

Now atop no longer shows the OSD devices as 100% busy.

However, I can see both ceph-osd processes in atop at 53% and 47% disk
utilization.

 PID     RDDSK    WRDSK     WCANCL   DSK    CMD
 20771   0K       648.8M    0K       53%    ceph-osd
 19547   0K       576.7M    0K       47%    ceph-osd


OSD disks(ssd) utilization from atop

DSK |  sdc | busy  6%  | read  0  | write  517  | KiB/r   0  | KiB/w  293 |
MBr/s 0.00  | MBw/s 148.18  | avq   9.44  | avio 0.12 ms  |

DSK |  sdd | busy   5% | read   0 | write   336 | KiB/r   0  | KiB/w   292
| MBr/s 0.00 | MBw/s  96.12  | avq 7.62  | avio 0.15 ms  |


Queue Depth of OSD disks
 cat /sys/block/sdd/device//queue_depth
256

atop inside virtual machine:[4 CPU/3Gb RAM]
DSK |   vdc  | busy 96%  | read 0  | write  256  | KiB/r   0  |
KiB/w  512  | MBr/s   0.00  | MBw/s 128.00  | avq7.96  | avio 3.77 ms  |


Both Guest and Host are using deadline I/O scheduler
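
For reference, the active scheduler can be confirmed per device (sdc and vdc are
just the device names from the outputs above); the entry shown in square brackets
is the one in use:

cat /sys/block/sdc/queue/scheduler
cat /sys/block/vdc/queue/scheduler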


Virtual Machine Configuration:

[libvirt disk XML stripped by the mail archive; disk serial 449da0e7-6223-457c-b2c6-b5e112099212]




ceph.conf

 cat /etc/ceph/ceph.conf

[global]
fsid = c4e1a523-9017-492e-9c30-8350eba1bd51
mon_initial_members = node-16 node-30 node-31
mon_host = 172.16.1.11 172.16.1.12 172.16.1.8
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
log_to_syslog_level = info
log_to_syslog = True
osd_pool_default_size = 2
osd_pool_default_min_size = 1
osd_pool_default_pg_num = 64
public_network = 172.16.1.0/24
log_to_syslog_facility = LOG_LOCAL0
osd_journal_size = 2048
auth_supported = cephx
osd_pool_default_pgp_num = 64
osd_mkfs_type = xfs
cluster_network = 172.16.1.0/24
osd_recovery_max_active = 1
osd_max_backfills = 1


[client]
rbd_cache_writethrough_until_flush = True
rbd_cache = True

[client.radosgw.gateway]
rgw_keystone_accepted_roles = _member_, Member, admin, swiftoperator
keyring = /etc/ceph/keyring.radosgw.gateway
rgw_frontends = fastcgi socket_port=9000 socket_host=127.0.0.1
rgw_socket_path = /tmp/radosgw.sock
rgw_keystone_revocation_interval = 100

Any guidance on where to look for issues?

Regards,
Kevin

On Fri, Jan 6, 2017 at 4:42 PM, kevin parrikar 
wrote:

> Thanks Christian for your valuable comments,each comment is a new learning
> for me.
> Please see inline
>
> On Fri, Jan 6, 2017 at 9:32 AM, Christian Balzer  wrote:
>
>>
>> Hello,
>>
>> On Fri, 6 Jan 2017 08:40:36 +0530 kevin parrikar wrote:
>>
>> > Hello All,
>> >
>> > I have setup a ceph cluster based on 0.94.6 release in  2 servers each
>> with
>> > 80Gb intel s3510 and 2x3 Tb 7.2 SATA disks,16 CPU,24G RAM
>> > which is connected to a 10G switch with a replica of 2 [ i will add 3
>> more
>> > servers to the cluster] and 3 seperate monitor nodes which are vms.
>> >
>> I'd go to the latest hammer, this version has a lethal cache-tier bug if
>> you should decide to try that.
>>
>> 80Gb Intel DC S3510 are a) slow and b) have only 0.3 DWPD.
>> You're going to wear those out quickly and if not replaced in time loose
>> data.
>>
>> 2 HDDs give you a theoretical speed of something like 300MB/s sustained,
>> when used a OSDs I'd expect the usual 50-60MB/s per OSD due to
>> seeks, journal (file system) and leveldb overheads.
>> Which perfectly matches your results.
>>
>
> H that makes sense ,its hitting 7.2 rpm OSD's peak write speed.I was
> in an assumption that ssd Journal to OSD will happen slowly at a later time
> and hence  i could use slower and cheaper disks for OSD.But in practise it
> looks like many articles in the internet that talks about faster journal
> and slower OSD dont seems to be correct.
>
> Will adding more OSD disks per node improve the overall performance?
>
>  i can add 4 more disks to each node,but all are 7.2 rpm disks .I am
> expecting some kind of parallel writes on these disks and magically
> improves performance :D
>
> This is my second experiment with Ceph last time i gave up and purchased
> another costly solution from a vendor.But this time i am determined to fix
> all issues and bring up a solid cluster .
> Last time clsuter was  giving a throughput of around 900kbps for 1G writes
> from virtual machine and now things have improved ,its giving 1.4 Mbps but
> still far slower than the target of 24Mbps.
>
> Expecting to make some progress with the help of experts here :)
>
>>
>> > rbd_cache is enabled in configurations,XFS filesystem,LSI 92465-4i raid
>> > card 

[ceph-users] cephfs AND rbds

2017-01-06 Thread David Turner
Can cephfs and rbds use the same pool to store data?  I know you would need a 
separate metadata pool for cephfs, but could they share the same data pool?



David Turner | Cloud Operations Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2760 | Mobile: 385.224.2943



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs ata1.00: status: { DRDY }

2017-01-06 Thread Anthony D'Atri
YMMV of course, but the first thing that struck me was the constraint of scrub 
times.  Constraining them to fewer hours can mean that more run in parallel.  
If you truly have off-hours for client ops (Graphite / Grafana are great for 
visualizing that) that might make sense, but in my 24x7 OpenStack world, there 
is little or no off-hour lull, so I let scrubs run all the time.

You might also up osd_deep_scrub_interval.  The default is one week; I raise 
that to four weeks as a compromise between aggressive protection and the 
realities of contention.
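
For reference, that is just the osd_deep_scrub_interval value in seconds, so four
weeks is 28 * 86400 = 2419200:

osd_deep_scrub_interval = 2419200

in the [osd] section of ceph.conf, or pushed to a running cluster with something
like:

ceph tell osd.* injectargs '--osd_deep_scrub_interval 2419200'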

— Anthony, currently looking for a new Ceph opportunity.

>> In our ceph.conf we already have this settings active:
>> 
>> osd max scrubs = 1
>> osd scrub begin hour = 20
>> osd scrub end hour = 7
>> osd op threads = 16
>> osd client op priority = 63
>> osd recovery op priority = 1
>> osd op thread timeout = 5
>> 
>> osd disk thread ioprio class = idle
>> osd disk thread ioprio priority = 7
>> 
> You're missing the most powerful scrub dampener there is:
> osd_scrub_sleep = 0.1
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Blog Planet

2017-01-06 Thread Patrick McGarry
Hey cephers,

Now that we're getting ready to flip the switch on the new ceph.com
I'd like to start rebuilding our blog planet. If you have a blog that
regularly has posts tagged with 'ceph' I'd love to include you. Please
drop me a line with your ceph-specific feed and I'll add it to the
list. Thanks!


-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] new user error

2017-01-06 Thread David Turner
It should happen automatically when you start the node.  Please provide ceph 
status, ceph osd tree, and any other pertinent specifics you can think of.
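
For example, from a node that still has a working admin keyring:

ceph -s
ceph osd tree
ceph health detail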



David Turner | Cloud Operations Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2760 | Mobile: 385.224.2943



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.




From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Alex Evonosky 
[alex.evono...@gmail.com]
Sent: Thursday, January 05, 2017 10:25 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] new user error


Hello group--

I have been running ceph 10.2.3 for awhile now without any issues.  This 
evening my admin node (which is also an OSD and Monitor) crashed.  I checked my 
other OSD servers and the data seems to still be there.

Is there an easy way to bring the admin node back into the cluster?  I am 
trying to bring this admin node/OSD back up without losing any data...

Thank you!

-Alex
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs ata1.00: status: { DRDY }

2017-01-06 Thread Oliver Dzombic
Hi Christian,

thank you for your comments.

Unfortunately I have some software restrictions here, so I cannot go
with the librbd backend and have to go with cephfs.

I understand that a newer kernel might help solve issues. But
we cannot upgrade to every new kernel that gets released.

If cephfs is that sensitive, it is simply not really usable. And I
think it is usable, so I think the kernel is not the issue here.

Between 4.5 and 4.8 there is not so much time that I would look for the
issue there.



Right now, unfortunately, I have some trouble matching log warnings
within ceph to the issues inside the VMs, because the VMs are not run
by us. So I lack exact information.

I will have to set up our own VMs there and do some logging/checking.

But as far as the logs tell me, there are no slow requests.



The HDDs have their journals on separate HW RAID-10 SSDs (SM863s).



Yes, writeback mode.



I inserted osd_scrub_sleep = 0.1 now. But we are currently also doing an
extension of the cluster, so it is busy with recovery anyway (without any
issues so far, even though utilization is basically the same as with (deep)
scrubbing).
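
(In case it is useful to others: rather than waiting for an OSD restart, the
value can apparently also be pushed into the running OSDs with something like

ceph tell osd.* injectargs '--osd_scrub_sleep 0.1'

though I have not verified whether it takes effect without a restart on our
version.)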

The SSD cache we run is quite big and quite fast, so most of the heavy
IO is handled there.

The HDDs usually have almost nothing to do (200-500 KB/s throughput
per device at ~5% utilization and 5-6 IOPS).

That is during production hours.

At night, while scrubs run, the stats rise to

50 MB/s throughput and 80-95% utilization at 400 read IOPS.


-

I hope this osd_scrub_sleep = 0.1 will have some impact as soon as the
cluster is back to normal and I turn scrubbing on again.


Again, thank you very much !

-- 
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:i...@ip-interactive.de

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107


Am 06.01.2017 um 01:56 schrieb Christian Balzer:
> 
> Hello,
> 
> On Thu, 5 Jan 2017 23:02:51 +0100 Oliver Dzombic wrote:
> 
> 
> I've never seen hung qemu tasks, slow/hung I/O tasks inside VMs with a
> broken/slow cluster I've seen.
> That's because mine are all RBD librbd backed.
> 
> I think your approach with cephfs probably isn't the way forward.
> Also with cephfs you probably want to run the latest and greatest kernel
> there is (4.8?).
> 
> Is your cluster logging slow request warnings during that time?
> 
>>
>> In the night, thats when this issues occure primary/(only?), we run the
>> scrubs and deep scrubs.
>>
>> In this time the HDD Utilization of the cold storage peaks to 80-95%.
>>
> Never a good thing, if they are also expected to do something useful.
> HDD OSDs have their journals inline?
> 
>> But we have a SSD hot storage in front of this, which is buffering
>> writes and reads.
>>
> With that you mean cache-tier in writeback mode?
>  
>> In our ceph.conf we already have this settings active:
>>
>> osd max scrubs = 1
>> osd scrub begin hour = 20
>> osd scrub end hour = 7
>> osd op threads = 16
>> osd client op priority = 63
>> osd recovery op priority = 1
>> osd op thread timeout = 5
>>
>> osd disk thread ioprio class = idle
>> osd disk thread ioprio priority = 7
>>
> You're missing the most powerful scrub dampener there is:
> osd_scrub_sleep = 0.1
> 
>>
>>
>> All in all i do not think that there is not enough IO for the clients on
>> the cold storage ( even it looks like that on the first view ).
>>
> I find that one of the best ways to understand and thus manage your
> cluster is to run something like collectd with graphite (or grafana or
> whatever cranks your tractor).
> 
> This should in combination with detailed spot analysis by atop or similar
> give a very good idea of what is going on.
> 
> So in this case, watch cache-tier promotions and flushes, see if your
> clients I/Os really are covered by the cache or if during the night your
> VMs may do log rotates or access other cold data and thus have to go to
> the HDD based OSDs...
>  
>> And if its really as simple as too view IO for the clients, my question
>> would be, how to avoid it ?
>>
>> Turning off scrub/deep scrub completely ? That should not be needed and
>> is also not too much advisable.
>>
> From where I'm standing deep-scrub is a luxury bling thing of limited
> value when compared to something with integrated live checksums as in
> Bluestore (so we hope) and BTRFS/ZFS. 
> 
> That said, your cluster NEEDs to be able to survive scrubs or it will be
> in even bigger trouble when OSDs/nodes fail.
> 
> Christian
> 
>> We simply can not run less than
>>
>> osd max scrubs = 1
>>
>>
>> So if scrub is eating away all IO, the scrub algorythem is simply too
>> aggressiv.
>>
>> Or, and thats most probable i guess, i have some kind of config mistake.
>>
>>
> 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com

Re: [ceph-users] Analysing ceph performance with SSD journal, 10gbe NIC and 2 replicas -Hammer release

2017-01-06 Thread kevin parrikar
Thanks Christian for your valuable comments, each comment is a new learning
for me.
Please see inline

On Fri, Jan 6, 2017 at 9:32 AM, Christian Balzer  wrote:

>
> Hello,
>
> On Fri, 6 Jan 2017 08:40:36 +0530 kevin parrikar wrote:
>
> > Hello All,
> >
> > I have setup a ceph cluster based on 0.94.6 release in  2 servers each
> with
> > 80Gb intel s3510 and 2x3 Tb 7.2 SATA disks,16 CPU,24G RAM
> > which is connected to a 10G switch with a replica of 2 [ i will add 3
> more
> > servers to the cluster] and 3 seperate monitor nodes which are vms.
> >
> I'd go to the latest hammer, this version has a lethal cache-tier bug if
> you should decide to try that.
>
> 80Gb Intel DC S3510 are a) slow and b) have only 0.3 DWPD.
> You're going to wear those out quickly and if not replaced in time loose
> data.
>
> 2 HDDs give you a theoretical speed of something like 300MB/s sustained,
> when used a OSDs I'd expect the usual 50-60MB/s per OSD due to
> seeks, journal (file system) and leveldb overheads.
> Which perfectly matches your results.
>

Hmm, that makes sense; it is hitting the 7200 rpm OSDs' peak write speed. I was
under the assumption that writes from the SSD journal to the OSDs would happen
slowly at a later time, and hence that I could use slower and cheaper disks for
the OSDs. But in practice, many articles on the internet that talk about a fast
journal in front of slow OSDs don't seem to hold up.
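
(Doing the math on Christian's numbers: 4 HDD OSDs x ~55 MB/s each, divided by 2
for the replica count, is roughly 110 MB/s of aggregate client throughput, which
is in the same ballpark as the 110-140 MB/s I have been seeing.)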

Will adding more OSD disks per node improve the overall performance?

I can add 4 more disks to each node, but all are 7200 rpm disks. I am
expecting some kind of parallel writes across these disks to magically
improve performance :D

This is my second experiment with Ceph; last time I gave up and purchased
another costly solution from a vendor. But this time I am determined to fix
all issues and bring up a solid cluster.
Last time the cluster was giving a throughput of around 900 KB/s for 1G writes
from a virtual machine, and now things have improved; it is giving 1.4 MB/s,
but that is still far slower than the target of 24 MB/s.

Expecting to make some progress with the help of experts here :)

>
> > rbd_cache is enabled in configurations,XFS filesystem,LSI 92465-4i raid
> > card with 512Mb cache [ssd is in writeback mode wth BBU]
> >
> >
> > Before installing ceph, i tried to check max throughpit of intel 3500
> 80G
> > SSD using block size of 4M [i read somewhere that ceph uses 4m objects]
> and
> > it was giving 220mbps {dd if=/dev/zero of=/dev/sdb bs=4M count=1000
> > oflag=direct}
> >
> Irrelevant, sustained sequential writes will be limited by what your OSDs
> (HDDs) can sustain.
>
> > *Observation:*
> > Now the cluster is up and running and from the vm i am trying to write a
> 4g
> > file to its volume using dd if=/dev/zero of=/dev/sdb bs=4M count=1000
> > oflag=direct .It takes aroud 39 seconds to write.
> >
> >  during this time ssd journal was showing disk write of 104M on both the
> > ceph servers (dstat sdb) and compute node a network transfer rate of
> ~110M
> > on its 10G storage interface(dstat -nN eth2]
> >
> As I said, sounds about right.
>
> >
> > my questions are:
> >
> >
> >- Is this the best throughput ceph can offer or can anything in my
> >environment be optmised to get  more performance? [iperf shows a max
> >throughput 9.8Gbits/s]
> >
> Not your network.
>
> Watch your nodes with atop and you will note that your HDDs are maxed out.
>
> >
> >
> >- I guess Network/SSD is under utilized and it can handle more writes
> >how can this be improved to send more data over network to ssd?
> >
> As jiajia wrote, a cache-tier might give you some speed boosts.
> But with those SSDs I'd advise against it, both too small and too low
> endurance.
>
> >
> >
> >- rbd kernel module wasn't loaded on compute node,i loaded it manually
> >using "modprobe" and later destroyed/re-created vms,but this doesnot
> give
> >any performance boost. So librbd and RBD are equally fast?
> >
> Irrelevant and confusing.
> Your VMs will use on or the other depending on how they are configured.
>
> >
> >
> >- Samsung evo 840 512Gb shows a throughput of 500Mbps for 4M writes
> [dd
> >if=/dev/zero of=/dev/sdb bs=4M count=1000 oflag=direct] and for 4Kb
> it was
> >equally fast as that of intel S3500 80gb .Does changing my SSD from
> intel
> >s3500 100Gb to Samsung 840 500Gb make any performance  difference
> here just
> >because for 4M wirtes samsung 840 evo is faster?Can Ceph utilize this
> extra
> >speed.Since samsung evo 840 is faster in 4M writes.
> >
> Those SSDs would be an even worse choice for endurance/reliability
> reasons, though their larger size offsets that a bit.
>
> Unless you have a VERY good understanding and data on how much your
> cluster is going to write, pick at the very least SSDs with 3+ DWPD
> endurance like the DC S3610s.
> In very light loaded cases DC S3520 with 1DWPD may be OK, but again, you
> need to know what you're doing here.
>
> Christian
> >
> > Can somebody help me understand this better.
> >
>

Re: [ceph-users] Analysing ceph performance with SSD journal, 10gbe NIC and 2 replicas -Hammer release

2017-01-06 Thread kevin parrikar
Thanks Zhong.
We got 5 servers for testing; two are already configured as OSD nodes, and as
per the storage requirement we need at least 5 OSD nodes. Let me try to get
more servers to try a cache tier, but I am not hopeful :( .

Will try bcache and see how it improves performance, thanks for your
suggestion.
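
(For future reference, the Hammer cache-tiering docs make the basic setup look
roughly like this, assuming a fast pool named cache-pool layered over the
existing rbd pool; the pool names here are only examples:

ceph osd tier add rbd cache-pool
ceph osd tier cache-mode cache-pool writeback
ceph osd tier set-overlay rbd cache-pool

plus the hit_set and target sizing parameters described in that documentation.)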

Regards,
Kevin

On Fri, Jan 6, 2017 at 8:56 AM, jiajia zhong  wrote:

>
>
> 2017-01-06 11:10 GMT+08:00 kevin parrikar :
>
>> Hello All,
>>
>> I have setup a ceph cluster based on 0.94.6 release in  2 servers each
>> with 80Gb intel s3510 and 2x3 Tb 7.2 SATA disks,16 CPU,24G RAM
>> which is connected to a 10G switch with a replica of 2 [ i will add 3
>> more servers to the cluster] and 3 seperate monitor nodes which are vms.
>>
>> rbd_cache is enabled in configurations,XFS filesystem,LSI 92465-4i raid
>> card with 512Mb cache [ssd is in writeback mode wth BBU]
>>
>>
>> Before installing ceph, i tried to check max throughpit of intel 3500
>>  80G SSD using block size of 4M [i read somewhere that ceph uses 4m
>> objects] and it was giving 220mbps {dd if=/dev/zero of=/dev/sdb bs=4M
>> count=1000 oflag=direct}
>>
>> *Observation:*
>> Now the cluster is up and running and from the vm i am trying to write a
>> 4g file to its volume using dd if=/dev/zero of=/dev/sdb bs=4M count=1000
>> oflag=direct .It takes aroud 39 seconds to write.
>>
>>  during this time ssd journal was showing disk write of 104M on both the
>> ceph servers (dstat sdb) and compute node a network transfer rate of ~110M
>> on its 10G storage interface(dstat -nN eth2]
>>
>>
>> my questions are:
>>
>>
>>- Is this the best throughput ceph can offer or can anything in my
>>environment be optmised to get  more performance? [iperf shows a max
>>throughput 9.8Gbits/s]
>>
>>
>>
>>- I guess Network/SSD is under utilized and it can handle more writes
>>how can this be improved to send more data over network to ssd?
>>
>> cache tiering? http://docs.ceph.com/docs/hammer/rados/operations/cache-
> tiering/
> or try bcache in kernel.
>
>>
>>- rbd kernel module wasn't loaded on compute node,i loaded it
>>manually using "modprobe" and later destroyed/re-created vms,but this
>>doesnot give any performance boost. So librbd and RBD are equally fast?
>>
>>
>>
>>- Samsung evo 840 512Gb shows a throughput of 500Mbps for 4M writes
>>[dd if=/dev/zero of=/dev/sdb bs=4M count=1000 oflag=direct] and for 4Kb it
>>was equally fast as that of intel S3500 80gb .Does changing my SSD from
>>intel s3500 100Gb to Samsung 840 500Gb make any performance  difference
>>here just because for 4M wirtes samsung 840 evo is faster?Can Ceph utilize
>>this extra speed.Since samsung evo 840 is faster in 4M writes.
>>
>>
>> Can somebody help me understand this better.
>>
>> Regards,
>> Kevin
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph and rrdtool

2017-01-06 Thread Bjoern Laessig
On Fri, 2016-12-02 at 19:47 +, Steve Jankowski wrote:
> Anyone using rrdtool with Ceph via rados or cephfs ?

i just read your mail, so i hope it helps.


> If so, how many rrd files and how many rrd file updates per minute.

My munin runs on rbd in a VM. It has an rrdcached with the options
''rrdcached -w 1800 -z 600''. It uses 1.5G as cache for writing to 6 rrds
updated every 5 minutes. This creates a permanent throughput of 1.2 MB/s with
80 disk IO/s as a 5 min average. Peak disk IO is 1000 IO/s on a 14 OSD-node
cluster.

Björn

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com