Re: [ceph-users] Improving Performance with more OSD's?

2015-01-05 Thread Nick Fisk
Hi Udo,

Lindsay did this for performance reasons so that the data is spread evenly
over the disks; I believe it has been accepted that the remaining 2TB on the
3TB disks will not be used.

Nick


-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Udo
Lembke
Sent: 05 January 2015 07:15
To: Lindsay Mathieson
Cc: ceph-us...@ceph.com  ceph-users
Subject: Re: [ceph-users] Improving Performance with more OSD's?

Hi Lindsay,

On 05.01.2015 06:52, Lindsay Mathieson wrote:
 ...
 So two OSD Nodes had:
 - Samsung 840 EVO SSD for Op. Sys.
 - Intel 530 SSD for Journals (10GB Per OSD)
 - 3TB WD Red
 - 1 TB WD Blue
 - 1 TB WD Blue
 - Each disk weighted at 1.0
 - Primary affinity of the WD Red (slow) set to 0
The weight should reflect the size of the filesystem. With a weight of 1 for
all disks you will run into trouble as the cluster fills, because the 1TB disks
will be full before the 3TB disk!

You should have something like 0.9 for the 1TB and 2.82 for the 3TB disks (
df -k | grep osd | awk '{print $2/(1024^3) }'  ).

Udo
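
(For illustration, a sketch of how such weights would be applied; the OSD ids
are examples, and the values come from the df one-liner above:)

    # set the CRUSH weight to roughly the usable TB of each OSD
    ceph osd crush reweight osd.0 2.82   # 3TB WD Red (example id)
    ceph osd crush reweight osd.1 0.9    # 1TB WD Blue (example id)
    ceph osd tree                        # verify the new weights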


Re: [ceph-users] Improving Performance with more OSD's?

2015-01-05 Thread Nick Fisk
I've been having good results with OMD (Check_MK + Nagios).

There is a Ceph plugin as well, which I made a small modification to so that it
works with a wider range of cluster sizes:

http://www.spinics.net/lists/ceph-users/msg13355.html

Nick
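
(For anyone without Check_MK handy, a minimal Nagios-style check can be built
straight on top of "ceph health". This is just an illustrative sketch, not the
plugin linked above:)

    #!/bin/sh
    # map "ceph health" output to Nagios exit codes
    STATUS=$(ceph health 2>/dev/null)
    case "$STATUS" in
        HEALTH_OK*)   echo "OK - $STATUS";       exit 0 ;;
        HEALTH_WARN*) echo "WARNING - $STATUS";  exit 1 ;;
        *)            echo "CRITICAL - $STATUS"; exit 2 ;;
    esac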

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Lindsay Mathieson
Sent: 05 January 2015 12:35
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Improving Performance with more OSD's?

On Mon, 5 Jan 2015 09:21:16 AM Nick Fisk wrote:
 Lindsay did this for performance reasons so that the data is spread 
 evenly over the disks, I believe it has been accepted that the 
 remaining 2tb on the 3tb disks will not be used.

Exactly, thanks Nick.

I only have a terabyte of data, and it's not going to grow much, if at all.
With 3 OSD's per node the 1TB OSD's are only at 40% utilisation, but you can
bet I'll be keeping a close eye on that.

Next step, get nagios or icinga setup.
--
Lindsay






Re: [ceph-users] Improving Performance with more OSD's?

2015-01-05 Thread Lindsay Mathieson
On Mon, 5 Jan 2015 01:15:03 PM Nick Fisk wrote:
 I've been having good results with OMD (Check_MK + Nagios)
 
 There is a plugin for Ceph as well that I made a small modification to, to
 work with a wider range of cluster sizes


Thanks, I'll check it out.

Currently trying Zabbix; it seems more straightforward than Nagios.
-- 
Lindsay



Re: [ceph-users] Improving Performance with more OSD's?

2015-01-05 Thread Lindsay Mathieson
On Mon, 5 Jan 2015 09:21:16 AM Nick Fisk wrote:
 Lindsay did this for performance reasons so that the data is spread evenly
 over the disks, I believe it has been accepted that the remaining 2tb on the
 3tb disks will not be used.

Exactly, thanks Nick.

I only have a terabyte of data, and it's not going to grow much, if at all.
With 3 OSD's per node the 1TB OSD's are only at 40% utilisation, but you can 
bet I'll be keeping a close eye on that.

Next step, get nagios or icinga setup.
-- 
Lindsay



Re: [ceph-users] Improving Performance with more OSD's?

2015-01-04 Thread Lindsay Mathieson
Well I upgraded my cluster over the weekend :)
To each node I added:
- Intel SSD 530 for journals
- 2 * 1TB WD Blue

So two OSD Nodes had:
- Samsung 840 EVO SSD for Op. Sys.
- Intel 530 SSD for Journals (10GB Per OSD)
- 3TB WD Red
- 1 TB WD Blue
- 1 TB WD Blue
- Each disk weighted at 1.0
- Primary affinity of the WD Red (slow) set to 0

Took about 8 hours for 1TB of data to rebalance over the OSD's

Very pleased with results so far.

rados benchmark:
- Write bandwidth has increased from 49 MB/s to 140 MB/s
- Reads have stayed roughly the same at 500 MB/s

VM Benchmarks:
- Have actually stayed much the same, but there is more depth available -
multiple VM's share the bandwidth nicely.

Users are finding their VM's *much* less laggy.

Thanks for all the help and suggestions.

Lindsay
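
(For reference, the primary-affinity setting described above is applied along
these lines; the OSD ids are illustrative, not Lindsay's actual ones:)

    # note: firefly-era clusters may need "mon osd allow primary affinity = true"
    # in ceph.conf before the command below is accepted
    ceph osd primary-affinity osd.0 0      # slow 3TB WD Red, example id
    ceph osd primary-affinity osd.1 1.0    # 1TB WD Blue
    ceph osd primary-affinity osd.2 1.0    # 1TB WD Blue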


Re: [ceph-users] Improving Performance with more OSD's?

2015-01-04 Thread Udo Lembke
Hi Lindsay,

On 05.01.2015 06:52, Lindsay Mathieson wrote:
 ...
 So two OSD Nodes had:
 - Samsung 840 EVO SSD for Op. Sys.
 - Intel 530 SSD for Journals (10GB Per OSD)
 - 3TB WD Red
 - 1 TB WD Blue
 - 1 TB WD Blue
 - Each disk weighted at 1.0
 - Primary affinity of the WD Red (slow) set to 0
The weight should reflect the size of the filesystem. With a weight of 1 for
all disks you will run into trouble as the cluster fills, because the 1TB disks
will be full before the 3TB disk!

You should have something like 0.9 for the 1TB and 2.82 for the 3TB
disks ( df -k | grep osd | awk '{print $2/(1024^3) }'  ).

Udo


Re: [ceph-users] Improving Performance with more OSD's?

2014-12-30 Thread Eneko Lacunza

Hi,

On 29/12/14 15:12, Christian Balzer wrote:

3rd Node
  - Monitor only, for quorum
- Intel Nuc
- 8GB RAM
- CPU: Celeron N2820

Uh oh, a bit weak for a monitor. Where does the OS live (on this and the
other nodes)? The leveldb (/var/lib/ceph/..) of the monitors likes it fast,
SSDs preferably.


I have a small setup with such a node (only 4 GB RAM, another 2 good 
nodes for OSD and virtualization) - it works like a charm and CPU max is 
always under 5% in the graphs. It only peaks when backups are dumped to 
its 1TB disk using NFS.

I'd prefer to use the existing third node (the Intel Nuc), but its
expansion is limited to USB3 devices. Are there USB3 external drives
with decent performance stats?


I'd advise against it.
That node doing both monitor and OSDs is not going to end well.
My experience has led me not to trust USB disks for continuous 
operation, I wouldn't do this either.


Just my cents
Eneko

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997
  943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es



Re: [ceph-users] Improving Performance with more OSD's?

2014-12-30 Thread Eneko Lacunza

Hi,

On 30/12/14 11:55, Lindsay Mathieson wrote:

On Tue, 30 Dec 2014 11:26:08 AM Eneko Lacunza wrote:

  have a small setup with such a node (only 4 GB RAM, another 2 good
nodes for OSD and virtualization) - it works like a charm and CPU max is
always under 5% in the graphs. It only peaks when backups are dumped to
its 1TB disk using NFS.

Yes, CPU has not been a problem for me at all; I even occasionally run a
Windows VM on the NUC.

Sounds like we have very similar setups - 2 good nodes that run full OSD's,
mon and VM's, and a third smaller node for quorum.

Do you have OSD's on your third node as well?
No, I have never had a VM running on it, there are only 6 VMs in this 
cluster and the other 2 nodes have plenty of RAM/CPU for them. I might 
try if one of the good nodes goes down ;)

I'd advise against it.
That node doing both monitor and OSDs is not going to end well.

My experience has led me not to trust USB disks for continuous
operation, I wouldn't do this either.

Yeah, it doesn't sound like a good idea. Pity, the NUCs are so small and quiet.

Yes. But I think the CPU would become a problem as soon as we put 1-2 
OSDs on that NUC. Maybe with a Core i3 NUC... :)


Cheers
Eneko

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997
  943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es



Re: [ceph-users] Improving Performance with more OSD's?

2014-12-30 Thread Lindsay Mathieson
On Tue, 30 Dec 2014 11:26:08 AM Eneko Lacunza wrote:
  have a small setup with such a node (only 4 GB RAM, another 2 good 
 nodes for OSD and virtualization) - it works like a charm and CPU max is 
 always under 5% in the graphs. It only peaks when backups are dumped to 
 its 1TB disk using NFS.

Yes, CPU has not been a problem for me at all; I even occasionally run a
Windows VM on the NUC.

Sounds like we have very similar setups - 2 good nodes that run full OSD's,
mon and VM's, and a third smaller node for quorum.

Do you have OSD's on your third node as well?

  I'd advise against it.
  That node doing both monitor and OSDs is not going to end well.
 
 My experience has led me not to trust USB disks for continuous 
 operation, I wouldn't do this either.

Yeah, it doesn't sound like a good idea. Pity, the NUCs are so small and quiet.

thanks,

-- 
Lindsay



Re: [ceph-users] Improving Performance with more OSD's?

2014-12-29 Thread Tomasz Kuzemko
On Sun, Dec 28, 2014 at 02:49:08PM +0900, Christian Balzer wrote:
 You really, really want size 3 and a third node for both performance
 (reads) and redundancy.

How does it benefit read performance? I thought all reads are made only
from the active primary OSD.

-- 
Tomasz Kuzemko
tomasz.kuze...@ovh.net





Re: [ceph-users] Improving Performance with more OSD's?

2014-12-29 Thread Andrey Korolyov
On Mon, Dec 29, 2014 at 12:47 PM, Tomasz Kuzemko tomasz.kuze...@ovh.net wrote:
 On Sun, Dec 28, 2014 at 02:49:08PM +0900, Christian Balzer wrote:
 You really, really want size 3 and a third node for both performance
 (reads) and redundancy.

 How does it benefit read performance? I thought all reads are made only
 from the active primary OSD.

 --
 Tomasz Kuzemko
 tomasz.kuze...@ovh.net

You'll have chunks of primary data scattered between three devices
instead of two, as each PG will have a random acting set (until you
decide to pin the primary).
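
(For illustration, the acting set and primary of any given PG can be inspected
directly; the pool and PG id below are just examples:)

    ceph osd pool get rbd size    # replica count of the pool
    ceph pg map 0.1f              # prints the up/acting set; the first OSD listed is the primary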


Re: [ceph-users] Improving Performance with more OSD's?

2014-12-29 Thread Christian Balzer

Hello,

On Mon, 29 Dec 2014 00:05:40 +1000 Lindsay Mathieson wrote:

 Appreciate the detailed reply Christian.
 
 On Sun, 28 Dec 2014 02:49:08 PM Christian Balzer wrote:
  On Sun, 28 Dec 2014 08:59:33 +1000 Lindsay Mathieson wrote:
   I'm looking to improve the raw performance on my small setup (2
   Compute Nodes, 2 OSD's). Only used for hosting KVM images.
  
  This doesn't really make things clear, do you mean 2 STORAGE nodes
  with 2 OSDs (HDDs) each?
 
 2 Nodes, 1 OSD per node
 
Hardware is identical for all nodes & disks
- Mobo: P9X79 WS
- CPU: Intel Xeon E5-2620
Not particularly fast, but sufficient for about 4 OSDs

 - RAM: 32 GB ECC
Good enough.

 - 1GB Nic Public Access
 - 2 * 1GB Bond for ceph
Is that a private cluster network just between Ceph storage nodes or is
this for all ceph traffic (including clients)?
The latter would probably be better; a private cluster network twice as
fast as the client one isn't particularly helpful 99% of the time.
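
(For reference, the split is controlled by two ceph.conf options; the subnets
below are placeholders. With no cluster network set, everything shares the
public network, which is the setup being discussed here:)

    [global]
        public network  = 192.168.1.0/24   # clients, mons, OSDs
        cluster network = 192.168.2.0/24   # OSD replication/recovery only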

 - OSD: 3TB WD Red
 - Journal: 10GB on Samsung 840 EVO
 
 3rd Node
  - Monitor only, for quorum
 - Intel Nuc 
 - 8GB RAM
 - CPU: Celeron N2820
 
Uh oh, a bit weak for a monitor. Where does the OS live (on this and the
other nodes)? The leveldb (/var/lib/ceph/..) of the monitors likes it fast,
SSDs preferably.

 
 
  In either case that's a very small setup (and with a replication of 2 a
  risky one, too), so don't expect great performance.
 
 Ok.
 
  
  Throughput numbers aren't exactly worthless, but you will find IOPS to
  be the killer in most cases. Also without describing how you measured
  these numbers (rados bench, fio, bonnie, on the host, inside a VM)
  they become even more muddled.
 
 - rados bench on the node to test raw write 
 - fio in a VM
 - Crystal DiskMark in a windows VM to test IOPS
 
 
  You really, really want size 3 and a third node for both performance
  (reads) and redundancy.
 
 I can probably scare up a desktop PC to use as a fourth node with
 another 3TB disk.
 
The closer it is to the current storage nodes, the better. 
The slowest OSD in a cluster can impede all (most of) the others.

 I'd prefer to use the existing third node (the Intel Nuc), but its
 expansion is limited to USB3 devices. Are there USB3 external drives
 with decent performance stats?
 
I'd advise against it.
That node doing both monitor and OSDs is not going to end well.

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/


Re: [ceph-users] Improving Performance with more OSD's?

2014-12-29 Thread Christian Balzer

Hello,

On Mon, 29 Dec 2014 13:49:49 +0400 Andrey Korolyov wrote:

 On Mon, Dec 29, 2014 at 12:47 PM, Tomasz Kuzemko
 tomasz.kuze...@ovh.net wrote:
  On Sun, Dec 28, 2014 at 02:49:08PM +0900, Christian Balzer wrote:
  You really, really want size 3 and a third node for both performance
  (reads) and redundancy.
 
  How does it benefit read performance? I thought all reads are made only
  from the active primary OSD.
 
  --
  Tomasz Kuzemko
  tomasz.kuze...@ovh.net
 
 You`ll have chunks of primary data scattered between three devices
 instead of two, as each pg will have a random acting set (until you
 decide to pin primary).
 
What Andrey wrote.

Reads will scale up (on a cluster basis, individual clients might
not benefit as much) linearly with each additional device (host/OSD).

Writes will scale up with each additional device divided by replica size. 

Fun fact, if you have 1 node with replica 1 and add 2 more identical nodes
and increase the replica to 3, your write performance will be less than 50%
of the single node. 
Once you add a 4th node, write speed will increase again.

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/


Re: [ceph-users] Improving Performance with more OSD's?

2014-12-29 Thread Lindsay Mathieson
On Mon, 29 Dec 2014 11:12:06 PM Christian Balzer wrote:
 Is that a private cluster network just between Ceph storage nodes or is
 this for all ceph traffic (including clients)?
 The later would probably be better, a private cluster network twice as
 fast as the client one isn't particular helpful 99% of the time.


The latter - all ceph traffic including clients (qemu rbd).

  3rd Node
  
   - Monitor only, for quorum
  
  - Intel Nuc
  - 8GB RAM
  - CPU: Celeron N2820
 
 Uh oh, a bit weak for a monitor. Where does the OS live (on this and the
 other nodes)? The leveldb (/var/lib/ceph/..) of the monitors likes it fast,
 SSDs preferably.

On a SSD (all the nodes have OS on SSD).

Looks like I misunderstood the purpose of the monitors; I presumed they were
just for monitoring node health. They do more than that?


 The closer it is to the current storage nodes, the better.
 The slowest OSD in a cluster can impede all (most of) the others.

Closer as in similar hardware specs?




-- 
Lindsay



Re: [ceph-users] Improving Performance with more OSD's?

2014-12-29 Thread Lindsay Mathieson
On Sun, 28 Dec 2014 04:08:03 PM Nick Fisk wrote:
 If you can't add another full host, your best bet would be to add another
 2-3 disks to each server. This should give you a bit more performance. It's
 much better to have lots of small disks rather than large multi-TB ones from
 a performance perspective. So maybe look to see if you can get 500GB/1TB
 drives cheap.


Thanks, will do.

Can you set replica 3 with two nodes and 6-8 OSD's? One would have to tweak
the crush map?
-- 
Lindsay



Re: [ceph-users] Improving Performance with more OSD's?

2014-12-29 Thread Lindsay Mathieson
On Mon, 29 Dec 2014 11:29:11 PM Christian Balzer wrote:
 Reads will scale up (on a cluster basis, individual clients might
 not benefit as much) linearly with each additional device (host/OSD).

I'm taking that to mean individual clients as a whole will be limited by the 
speed of individual OSD's, but multiple clients will spread their reads 
between multiple OSD's, leading to a higher aggregate bandwidth than 
individual disks could sustain.

I guess the limiting factor there would be network.

 
 Writes will scale up with each additional device divided by replica size. 

So adding OSD's will increase write speed from individual clients? seq writes 
go out to different OSD's simultaneously?

 
 Fun fact, if you have 1 node with replica 1 and add 2 more identical nodes
 and increase the replica to 3, your write performance will be less than 50%
 of the single node. 

Interesting - this seems to imply that writes go to the replica OSD's one 
after another, rather than simultaneously like I expected.

thanks,

-- 
Lindsay



Re: [ceph-users] Improving Performance with more OSD's?

2014-12-29 Thread Nick Fisk
You would need to modify the crush map so that it would store two of the
same replicas on the same host; however, I'm not sure how you would go about
this while still making sure that at least one other replica is on a different
host. But to be honest, with the number of OSD's you will have, the data-loss
probability with a replica size of 2 is not as bad as with much larger
clusters, so you may decide that size 2 is fine. But as always, make sure you
have backups.
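
(For what it's worth, an untested sketch of such a CRUSH rule: choose both
hosts, then up to two OSDs per host, so with size 3 you end up with two copies
on one host and one on the other. Rule name and ruleset number are arbitrary.)

    rule replicated_2hosts {
            ruleset 1
            type replicated
            min_size 1
            max_size 3
            step take default
            step choose firstn 2 type host
            step chooseleaf firstn 2 type osd
            step emit
    }
    # roughly: ceph osd getcrushmap -o map; crushtool -d map -o map.txt; edit;
    # crushtool -c map.txt -o newmap; ceph osd setcrushmap -i newmap;
    # then "ceph osd pool set <pool> crush_ruleset 1"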



-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Lindsay Mathieson
Sent: 29 December 2014 22:24
To: Nick Fisk
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] Improving Performance with more OSD's?

On Sun, 28 Dec 2014 04:08:03 PM Nick Fisk wrote:
 If you can't add another full host, your best bet would be to add 
 another
 2-3 disks to each server. This should give you a bit more performance. 
 It's much better to have lots of small disks rather than large 
 multi-TB ones from a performance perspective. So maybe look to see if 
 you can get 500GB/1TB drives cheap.


Thanks, will do.

Can you set replica 3 with two nodes and 6-8 OSD's? one would have to tweak
the crush map?
--
Lindsay






Re: [ceph-users] Improving Performance with more OSD's?

2014-12-29 Thread Lindsay Mathieson
On Sun, 28 Dec 2014 04:08:03 PM Nick Fisk wrote:
  This should give you a bit more performance. It's
 much better to have lots of small disks rather than large multi-TB ones from
 a performance perspective. So maybe look to see if you can get 500GB/1TB
 drives cheap.

Is this from the docs still relevant in this case?

/A weight is the relative difference between device capacities. We recommend 
using
1.00 as the relative weight for a 1TB storage device. In such a scenario, a 
weight of
0.5 would represent approximately 500GB, and a weight of 3.00 would represent
approximately 3TB/

So I would have maybe 1 3TB and 2 * 1TB

Kinda regret getting the 3TB drives now - learning experience.


--
Lindsay




Re: [ceph-users] Improving Performance with more OSD's?

2014-12-29 Thread Christian Balzer

Hello,

On Tue, 30 Dec 2014 08:12:21 +1000 Lindsay Mathieson wrote:

 On Mon, 29 Dec 2014 11:12:06 PM Christian Balzer wrote:
  Is that a private cluster network just between Ceph storage nodes or is
  this for all ceph traffic (including clients)?
  The later would probably be better, a private cluster network twice as
  fast as the client one isn't particular helpful 99% of the time.
 
 
 The later - all ceph traffic including clients (qemu rbd).
 
Very good. ^.^

   3rd Node
   
- Monitor only, for quorum
   
   - Intel Nuc
   - 8GB RAM
   - CPU: Celeron N2820
  
  Uh oh, a bit weak for a monitor. Where does the OS live (on this and
  the other nodes)? The leveldb (/var/lib/ceph/..) of the monitors likes
  it fast, SSDs preferably.
 
 On a SSD (all the nodes have OS on SSD).
 
Good.

 Looks like I misunderstood the purpose of the monitors, I presumed they
 were just for monitoring node health. They do more than that?
 
They keep the maps and the pgmap in particular is of course very busy.
All that action is at: /var/lib/ceph/mon/monitorname/store.db/ .

In addition monitors log like no tomorrow, also straining the OS storage.
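
(A quick way to check both on a mon node, assuming the default paths:)

    du -sh /var/lib/ceph/mon/*/store.db    # size of the monitor's leveldb
    df -h /var/lib/ceph /var/log/ceph      # headroom on the OS SSD
    ls -lh /var/log/ceph/ceph-mon*.log     # how chatty the mon logs are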

 
  The closer it is to the current storage nodes, the better.
  The slowest OSD in a cluster can impede all (most of) the others.
 
 Closer as in similar hardware specs?
 
Ayup. The less variation, the better and the more predictable things
become.
Again, having 1 node slow down 2 fast nodes is not what you want.

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/


Re: [ceph-users] Improving Performance with more OSD's?

2014-12-29 Thread Lindsay Mathieson
On Tue, 30 Dec 2014 12:48:58 PM Christian Balzer wrote:
  Looks like I misunderstood the purpose of the monitors, I presumed they
  were just for monitoring node health. They do more than that?
 
  
 
 They keep the maps and the pgmap in particular is of course very busy.
 All that action is at: /var/lib/ceph/mon/monitorname/store.db/ .
 
 In addition monitors log like no tomorrow, also straining the OS storage.


Yikes!

Did a quick check, root & data storage at under 10% usage - Phew!

Could the third under-spec'd monitor (which only has 1GbE) be slowing
things down? Worthwhile removing it as a test?
-- 
Lindsay



Re: [ceph-users] Improving Performance with more OSD's?

2014-12-29 Thread Christian Balzer
On Tue, 30 Dec 2014 08:22:01 +1000 Lindsay Mathieson wrote:

 On Mon, 29 Dec 2014 11:29:11 PM Christian Balzer wrote:
  Reads will scale up (on a cluster basis, individual clients might
  not benefit as much) linearly with each additional device (host/OSD).
 
 I'm taking that to mean individual clients as a whole will be limited by
 the speed of individual OSD's, but multiple clients will spread their
 reads between multiple OSD's, leading to a higher aggregate bandwidth
 than individual disks could sustain.
 
A single client like a VM or an application (see rados bench threads)
might of course do things in parallel, too, and thus benefit from
accessing multiple OSDs on multiple nodes at the same time.

However a client doing a single, sequential read won't improve much of
course (the fact that there are more OSDs with less spindle competition may
still help though).

 I guess the limiting factor there would be network.
 
For bandwidth/throughput, most likely and certainly in your case.

But bandwidth really tends to become very quickly the least of your
concerns, IOPS is where bottlenecks tend to appear first.

And there aside from the obvious limitations of your disks (and SSDs) the
for most people surprising next bottleneck is the CPU.

  
  Writes will scale up with each additional device divided by replica
  size. 
 
 So adding OSD's will increase write speed from individual clients? 

More OSDs help in and of themselves because activity can be distributed
between more spindles (HDDs). So you can certainly increase the speed of
your current storage nodes by adding more OSDs (let's say 4 per node).
However increasing the node count and replica size to 3 will not improve
things, rather the opposite.
Because, simply put, in that configuration each node will have to do the same
work as the others, plus the overhead and limitations imposed by things like
the network.
Once you add a 4th node, things speed up again.

 seq writes go out to different OSD's simultaneously?
 
Unless there are multiple threads, no. But given the default object size
of 4MB, they go to different OSDs sequentially and rather quickly so. 

  
  Fun fact, if you have 1 node with replica 1 and add 2 more identical
  nodes and increase the replica to 3, your write performance will be
  less than 50% of the single node. 
 
 Interesting - this seems to imply that writes go to the replica OSD's
 one after another, rather than simultaneously like I expected.
 
There is a graphic on the Ceph documentation:
http://ceph.com/docs/master/architecture/#smart-daemons-enable-hyperscale

The numbering of the requests suggests sequential operation, but even if
the primary OSD sends the data to the secondary one(s) in parallel your
network bandwidth and LATENCY as well as the activity on those nodes and
OSDs will of course delay things when compared to just a single, local
write.

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/


Re: [ceph-users] Improving Performance with more OSD's?

2014-12-29 Thread Christian Balzer
On Tue, 30 Dec 2014 14:08:32 +1000 Lindsay Mathieson wrote:

 On Tue, 30 Dec 2014 12:48:58 PM Christian Balzer wrote:
   Looks like I misunderstood the purpose of the monitors, I presumed
   they were just for monitoring node health. They do more than that?
  
   
  
  They keep the maps and the pgmap in particular is of course very busy.
  All that action is at: /var/lib/ceph/mon/monitorname/store.db/ .
  
  In addition monitors log like no tomorrow, also straining the OS
  storage.
 
 
 Yikes!
 
  Did a quick check, root & data storage at under 10% usage - Phew!
 
The DB doesn't (shouldn't) grow out of bounds and the logs while chatty
ought to be rotated. 
Your issue is IOPS, how busy those SSDs are more than anything.
But even crappy SSDs should be just fine.

Use a good monitoring tool like atop to watch how busy things are.

And do that while running a normal rados bench like this from a client
node:
rados -p rbd bench 60 write -t 32

And again like this:
rados -p rbd bench 60 write -t 32 -b 4096

In particular (but not only), compare the CPU usage during those runs.

 Could the third under spec'd monitor (which only has 1GB Eth) be slowing 
 things down? worthwhile removing it as a test?

Check with atop, but I doubt it. The network should be fine, storage on
SSD should be fine, the memory (if not doing anything else) should do for
your cluster size. CPU probably as well, but that is for you to check.

Also the primary monitor is the one with the lowest IP (unfortunately not
documented anywhere or configurable).

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/



Re: [ceph-users] Improving Performance with more OSD's?

2014-12-29 Thread Lindsay Mathieson
On 30 December 2014 at 14:28, Christian Balzer ch...@gol.com wrote:

 Use a good monitoring tool like atop to watch how busy things are.

 And do that while running a normal rados bench like this from a client
 node:
 rados -p rbd bench 60 write -t 32

 And again like this:
 rados -p rbd bench 60 write -t 32 -b 4096

 In particular (but not only), compare the CPU usage during those runs.

Interesting results -

First 14 seconds:
CPU: 1 core at sys/user 2%/1%, rest idle
HD:  45% Busy
SSD: 35% Busy

After 14 seconds:
CPU: 1 core at sys/user 20%/7%, rest idle
HD:  100% Busy
SSD: 30% - 50% Busy

Journal size is 10GB
max sync interval = 46.5
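
(For reference, those two settings presumably correspond to the following
ceph.conf options, with the values reported above:)

    [osd]
        osd journal size            = 10240   # MB, i.e. the 10GB journal partitions
        filestore max sync interval = 46.5    # seconds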


Re: [ceph-users] Improving Performance with more OSD's?

2014-12-28 Thread Lindsay Mathieson
Appreciate the detailed reply Christian.

On Sun, 28 Dec 2014 02:49:08 PM Christian Balzer wrote:
 On Sun, 28 Dec 2014 08:59:33 +1000 Lindsay Mathieson wrote:
  I'm looking to improve the raw performance on my small setup (2 Compute
  Nodes, 2 OSD's). Only used for hosting KVM images.
 
 This doesn't really make things clear, do you mean 2 STORAGE nodes with 2
 OSDs (HDDs) each?

2 Nodes, 1 OSD per node

Hardware is identical for all nodes & disks
- Mobo: P9X79 WS
- CPU: Intel Xeon E5-2620
- RAM: 32 GB ECC
- 1GB Nic Public Access
- 2 * 1GB Bond for ceph
- OSD: 3TB WD Red
- Journal: 10GB on Samsung 840 EVO

3rd Node
 - Monitor only, for quorum
- Intel Nuc 
- 8GB RAM
- CPU: Celeron N2820



 In either case that's a very small setup (and with a replication of 2 a
 risky one, too), so don't expect great performance.

Ok.

 
 Throughput numbers aren't exactly worthless, but you will find IOPS to be
 the killer in most cases. Also without describing how you measured these
 numbers (rados bench, fio, bonnie, on the host, inside a VM) they become
 even more muddled.

- rados bench on the node to test raw write 
- fio in a VM
- Crystal DiskMark in a windows VM to test IOPS


 You really, really want size 3 and a third node for both performance
 (reads) and redundancy.

I can probably scare up a desktop PC to use as a fourth node with another 3TB 
disk.

I'd prefer to use the existing third node (the Intel Nuc), but its expansion 
is limited to USB3 devices. Are there USB3 external drives with decent 
performance stats?


thanks,
-- 
Lindsay
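
(As an illustration of the kind of in-VM fio run used for tests like these;
the parameters and file name here are made up, not Lindsay's exact command:)

    fio --name=vm-randwrite --filename=/root/fio.test --size=1G \
        --rw=randwrite --bs=4k --iodepth=32 --ioengine=libaio \
        --direct=1 --runtime=60 --time_based --group_reporting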



Re: [ceph-users] Improving Performance with more OSD's?

2014-12-28 Thread Nick Fisk
Hi Lindsay,

Ceph is really designed to scale across large numbers of OSD's, and whilst it
will still function with only 2 OSD's, I wouldn't expect it to perform as well
as a RAID 1 mirror with battery-backed cache.

I wouldn't recommend running the OSD's on USB, although it should work
reasonably well.

If you can't add another full host, your best bet would be to add another
2-3 disks to each server. This should give you a bit more performance. It's
much better to have lots of small disks rather than large multi-TB ones from
a performance perspective. So maybe look to see if you can get 500GB/1TB
drives cheap.

Nick
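
(For illustration, adding one such disk with its journal on a spare SSD
partition would look roughly like this with the ceph-deploy of that era; the
host and device names are placeholders:)

    ceph-deploy disk zap node1:sdc
    ceph-deploy osd create node1:sdc:/dev/sdb5   # data disk : 10GB journal partition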

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Lindsay Mathieson
Sent: 28 December 2014 14:06
To: ceph-us...@ceph.com
Subject: Re: [ceph-users] Improving Performance with more OSD's?

Appreciate the detailed reply Christian.

On Sun, 28 Dec 2014 02:49:08 PM Christian Balzer wrote:
 On Sun, 28 Dec 2014 08:59:33 +1000 Lindsay Mathieson wrote:
  I'm looking to improve the raw performance on my small setup (2 
  Compute Nodes, 2 OSD's). Only used for hosting KVM images.
 
 This doesn't really make things clear, do you mean 2 STORAGE nodes 
 with 2 OSDs (HDDs) each?

2 Nodes, 1 OSD per node

Hardware is identical for all nodes & disks
- Mobo: P9X79 WS
- CPU: Intel Xeon E5-2620
- RAM: 32 GB ECC
- 1GB Nic Public Access
- 2 * 1GB Bond for ceph
- OSD: 3TB WD Red
- Journal: 10GB on Samsung 840 EVO

3rd Node
 - Monitor only, for quorum
- Intel Nuc
- 8GB RAM
- CPU: Celeron N2820



 In either case that's a very small setup (and with a replication of 2 a
 risky one, too), so don't expect great performance.

Ok.

 
 Throughput numbers aren't exactly worthless, but you will find IOPS to be
 the killer in most cases. Also without describing how you measured these
 numbers (rados bench, fio, bonnie, on the host, inside a VM) they become
 even more muddled.

- rados bench on the node to test raw write 
- fio in a VM
- Crystal DiskMark in a windows VM to test IOPS


 You really, really want size 3 and a third node for both performance
 (reads) and redundancy.

I can probably scare up a desktop PC to use as a fourth node with another 3TB
disk.

I'd prefer to use the existing third node (the Intel Nuc), but its expansion
is limited to USB3 devices. Are there USB3 external drives with decent
performance stats?


thanks,
-- 
Lindsay






[ceph-users] Improving Performance with more OSD's?

2014-12-27 Thread Lindsay Mathieson
I'm looking to improve the raw performance on my small setup (2 Compute Nodes, 
2 OSD's). Only used for hosting KVM images.

Raw read/write is roughly 200/35 MB/s. Starting 4+ VM's simultaneously pushes 
iowaits over 30%, though the system keeps chugging along.

Budget is limited ... :(

I plan to upgrade my SSD journals to something better than the Samsung 840 
EVO's (Intel 520/530?)

One of the things I see mentioned a lot in blogs etc is how ceph's performance 
improves as you add more OSD's and that the quality of the disks does not 
matter so much as the quantity.

How does this work? does ceph stripe reads and writes across the OSD's to 
improve performance?

If I add 3 cheap OSD's to each node (500GB - 1TB) with 10GB SSD journal 
partition each could I expect a big improvement in performance?

What sort of redundancy to set up? Currently it's min=1, size=2. Size is not an
issue, we already have 150% more space than we need; redundancy and
performance are more important.

Now I think on it, we can live with the slow write performance, but reducing 
iowait would be *really* good.

thanks,
-- 
Lindsay



Re: [ceph-users] Improving Performance with more OSD's?

2014-12-27 Thread Christian Balzer
On Sun, 28 Dec 2014 08:59:33 +1000 Lindsay Mathieson wrote:

 I'm looking to improve the raw performance on my small setup (2 Compute
 Nodes, 2 OSD's). Only used for hosting KVM images.
 
This doesn't really make things clear, do you mean 2 STORAGE nodes with 2
OSDs (HDDs) each?
In either case that's a very small setup (and with a replication of 2 a
risky one, too), so don't expect great performance.

It would help if you'd tell us what these nodes are made of
(CPU, RAM, disks, network) so we can at least guess what that cluster
might be capable of.

 Raw read/write is roughly 200/35 MB/s. Starting 4+ VM's simultaneously
 pushes iowaits over 30%, though the system keeps chugging along.
 
Throughput numbers aren't exactly worthless, but you will find IOPS to be
the killer in most cases. Also without describing how you measured these
numbers (rados bench, fio, bonnie, on the host, inside a VM) they become
even more muddled. 

 Budget is limited ... :(
 
 I plan to upgrade my SSD journals to something better than the Samsung
 840 EVO's (Intel 520/530?)
 
Not a big improvement really.
Take a look at the 100GB Intel DC S3700s: while they can write only at
200MB/s, they are priced rather nicely and will deliver that
performance at ANY time, and for a long time, too.

 One of the things I see mentioned a lot in blogs etc is how ceph's
 performance improves as you add more OSD's and that the quality of the
 disks does not matter so much as the quantity.
 
 How does this work? does ceph stripe reads and writes across the OSD's
 to improve performance?
 
Yes and no. It stripes by default to 4MB objects, so with enough OSDs and
clients I/Os will become distributed, scaling up nicely. However a single
client could be hitting the same object on the same OSD all the time
(small DB file for example), so you won't see much or any improvement in
that case.
There is also the option to stripe things on a much smaller scale, however
that takes some planning and needs to be done at pool creation time. 
See and read the Ceph documentation.
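
(For illustration, with RBD the non-default striping is chosen per image at
creation time; the pool/image names and numbers below are examples, and only
librbd clients such as qemu could use fancy striping at the time:)

    # defaults: 4MB objects, stripe unit = object size, stripe count = 1
    rbd create rbd/testimg --size 10240 --image-format 2 \
        --stripe-unit 65536 --stripe-count 8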

 If I add 3 cheap OSD's to each node (500GB - 1TB) with 10GB SSD journal 
 partition each could I expect a big improvement in performance?
 
That depends a lot on the stuff you haven't told us (CPU/RAM/network).
Given that there is sufficient of those, especially CPU, the answer is yes.
A large amount of RAM on the storage nodes will improve reads, as hot
objects become and remain cached.

Of course having decent HDDs will help even with journals on SSDs, for
example the Toshiba DTxx (totally not recommended for ANYTHING) HDDs
cost about the same as their entry level enterprise MG0x drives, which
are nearly twice as fast in the IOPS department.

 What sort of redundancy to setup? currently its min= 1, size=2. Size is
 not an issue, we already have 150% more space than we need, redundancy
 and performance is more important.
 
You really, really want size 3 and a third node for both performance
(reads) and redundancy.

 Now I think on it, we can live with the slow write performance, but
 reducing iowait would be *really* good.
 
Decent SSDs (see above) and more (decent) spindles will help with both.

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/