[ceph-users] osds on 2 nodes vs. on one node

2015-09-02 Thread Deneau, Tom
In a small cluster I have 2 OSD nodes with identical hardware, each with 6 osds.

* Configuration 1:  I shut down the OSDs on one node so I am using 6 OSDs on a 
single node.

* Configuration 2:  I shut down 3 OSDs on each node so now I have 6 total OSDs, 
with 3 on each node.

I measure read performance using rados bench from a separate client node.
The client has plenty of spare CPU power and the network and disk utilization 
are not limiting factors.
In all cases, the pool type is replicated so we're just reading from the 
primary.

With Configuration 1, I see approximately 70% more bandwidth than with 
Configuration 2.
In general, any configuration where the OSDs span 2 nodes gets poorer 
performance, particularly when the 2 nodes have equal amounts of traffic.

Is there any ceph parameter that might be throttling the cases where osds span 
2 nodes?

-- Tom Deneau, AMD
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osds on 2 nodes vs. on one node

2015-09-02 Thread Christian Balzer

Hello,

On Wed, 2 Sep 2015 22:38:12 + Deneau, Tom wrote:

> In a small cluster I have 2 OSD nodes with identical hardware, each with
> 6 osds.
> 
> * Configuration 1:  I shut down the osds on one node so I am using 6
> OSDS on a single node
>
Shut down how?
Just a "service blah stop" or actually removing them from the cluster aka
CRUSH map?
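(Roughly, the two cases would look like this; osd.3 is a placeholder ID and
the exact service command depends on the init system in use:)

    # Case A: only stopping the daemon (OSD stays in the CRUSH map, marked down)
    ceph osd set noout              # optional: keep data from rebalancing away
    sudo service ceph stop osd.3    # or: systemctl stop ceph-osd@3

    # Case B: actually taking it out of the cluster / CRUSH map
    ceph osd out 3
    ceph osd crush remove osd.3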
 
> * Configuration 2:  I shut down 3 osds on each node so now I have 6
> total OSDS but 3 on each node.
> 
Same as above. 
And in this case even more relevant, because just shutting down random OSDs
on both nodes would result in massive recovery action at best and more
likely a broken cluster.

> I measure read performance using rados bench from a separate client node.
Default parameters?

> The client has plenty of spare CPU power and the network and disk
> utilization are not limiting factors. In all cases, the pool type is
> replicated so we're just reading from the primary.
>
Replicated as in size 2? 
We can guess/assume that from your cluster size, but w/o you telling us
or giving us all the various config/crush outputs, that is only a guess.
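(For reference, output along these lines would answer that; "testpool" is
just a placeholder name:)

    ceph osd tree                      # host/OSD layout and weights
    ceph osd dump | grep pool          # per-pool size, crush ruleset, pg_num
    ceph osd pool get testpool size    # replication size of a single pool
    ceph osd crush rule dump           # the CRUSH rules in use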
 
> With Configuration 1, I see approximately 70% more bandwidth than with
> configuration 2. 

Never mind that bandwidth is mostly irrelevant in real life: which
bandwidth, read or write?

> In general, any configuration where the osds span 2
> nodes gets poorer performance but in particular when the 2 nodes have
> equal amounts of traffic.
>

Again, guessing from what you're actually doing, this isn't particularly
surprising. 
Because with a single node, default rules and replication of 2, your OSDs
never have to replicate anything when it comes to writes. 
Whereas with 2 nodes replication happens and takes more time (latency) and
might also saturate your network (we have of course no idea what your
cluster looks like).

Christian
 
> Is there any ceph parameter that might be throttling the cases where
> osds span 2 nodes?
> 
> -- Tom Deneau, AMD
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osds on 2 nodes vs. on one node

2015-09-03 Thread Deneau, Tom
Rewording to remove confusion...

Config 1: set up a cluster with 1 node with 6 OSDs
Config 2: identical hardware, set up a cluster with 2 nodes with 3 OSDs each

In each case I do the following:
   1) rados bench write --no-cleanup the same number of 4M size objects
   2) drop caches on all osd nodes
   3) rados bench seq  -t 4 to sequentially read the objects
  and record the read bandwidth
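(As a concrete sketch of the above, assuming a pool named "testpool" and
60-second runs, both of which are placeholders:)

    # 1) on the client: write 4M objects and leave them in place
    rados bench -p testpool 60 write -b 4194304 --no-cleanup

    # 2) on each OSD node: drop the page cache
    sync; echo 3 | sudo tee /proc/sys/vm/drop_caches

    # 3) on the client: sequential read of the objects just written
    rados bench -p testpool 60 seq -t 4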

Rados bench is running on a separate client, not on an OSD node.
The client has plenty of spare CPU power and the network and disk
utilization are not limiting factors.

With Config 1, I see approximately 70% more sequential read bandwidth than with 
Config 2.

In both cases the primary OSDs of the objects appear evenly distributed across 
OSDs.
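(One way to check that, again assuming the placeholder pool name "testpool";
the first OSD in the acting set printed by "ceph osd map" is the primary:)

    for obj in $(rados -p testpool ls); do
        ceph osd map testpool "$obj"
    done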

Yes, replication factor is 2 but since we are only measuring read performance,
I don't think that matters. 

The question is whether there is a ceph parameter that might be throttling the
2-node configuration.

-- Tom

> -Original Message-
> From: Christian Balzer [mailto:ch...@gol.com]
> Sent: Wednesday, September 02, 2015 7:29 PM
> To: ceph-users
> Cc: Deneau, Tom
> Subject: Re: [ceph-users] osds on 2 nodes vs. on one node
> 
> 
> Hello,
> 
> On Wed, 2 Sep 2015 22:38:12 + Deneau, Tom wrote:
> 
> > In a small cluster I have 2 OSD nodes with identical hardware, each
> > with
> > 6 osds.
> >
> > * Configuration 1:  I shut down the osds on one node so I am using 6
> > OSDS on a single node
> >
> Shut down how?
> Just a "service blah stop" or actually removing them from the cluster aka
> CRUSH map?
> 
> > * Configuration 2:  I shut down 3 osds on each node so now I have 6
> > total OSDS but 3 on each node.
> >
> Same as above.
> And in this case even more relevant, because just shutting down random OSDs
> on both nodes would result in massive recovery action at best and more likely
> a broken cluster.
> 
> > I measure read performance using rados bench from a separate client node.
> Default parameters?
> 
> > The client has plenty of spare CPU power and the network and disk
> > utilization are not limiting factors. In all cases, the pool type is
> > replicated so we're just reading from the primary.
> >
> Replicated as in size 2?
> We can guess/assume that from your cluster size, but w/o you telling us or
> giving us all the various config/crush outputs that is only a guess.
> 
> > With Configuration 1, I see approximately 70% more bandwidth than with
> > configuration 2.
> 
> Never mind that bandwidth is mostly irrelevant in real life, which bandwidth,
> read or write?
> 
> > In general, any configuration where the osds span 2 nodes gets poorer
> > performance but in particular when the 2 nodes have equal amounts of
> > traffic.
> >
> 
> Again, guessing from what you're actually doing, this isn't particularly
> surprising.
> Because with a single node, default rules and replication of 2 your OSDs
> never have to replicate anything when it comes to writes.
> Whereas with 2 nodes replication happens and takes more time (latency) and
> might also saturate your network (we have of course no idea how your cluster
> looks like).
> 
> Christian
> 
> > Is there any ceph parameter that might be throttling the cases where
> > osds span 2 nodes?
> >
> > -- Tom Deneau, AMD
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> 
> 
> --
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com Global OnLine Japan/Fusion Communications
> http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osds on 2 nodes vs. on one node

2015-09-03 Thread Mark Nelson

On 09/03/2015 10:39 AM, Deneau, Tom wrote:

Rewording to remove confusion...

Config 1: set up a cluster with 1 node with 6 OSDs
Config 2: identical hardware, set up a cluster with 2 nodes with 3 OSDs each

In each case I do the following:
1) rados bench write --no-cleanup the same number of 4M size objects
2) drop caches on all osd nodes
3) rados bench seq  -t 4 to sequentially read the objects
   and record the read bandwidth

Rados bench is running on a separate client, not on an OSD node.
The client has plenty of spare CPU power and the network and disk
utilization are not limiting factors.

With Config 1, I see approximately 70% more sequential read bandwidth than with 
Config 2.


Out of curiosity, have you tried 6 OSDs just on the 2nd node?



In both cases the primary OSDs of the objects appear evenly distributed across 
OSDs.

Yes, replication factor is 2 but since we are only measuring read performance,
I don't think that matters.

Question is whether there is a ceph parameter that might be throttling the
2 node configuration?


It sounds like some kind of network wonkiness, but who knows. 
Maybe try some concurrent network communication tests from the OSD nodes to 
the client just to make sure there isn't something strange going on 
with both OSD nodes sending data to the client concurrently.
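(For example, something like iperf3 could exercise both OSD nodes sending to
the client at once; hostnames and ports here are placeholders:)

    # on the client: one server instance per sender
    iperf3 -s -p 5201 &
    iperf3 -s -p 5202 &

    # started at the same time, one from each OSD node
    iperf3 -c client-host -p 5201 -P 4 -t 30    # from OSD node 1
    iperf3 -c client-host -p 5202 -P 4 -t 30    # from OSD node 2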


What's the behavior like over time?  Is throughput on the fast setup 
stable?  Is the slow setup spikey?  Consistently low?  How's the latency 
spread in each case?




-- Tom


-Original Message-
From: Christian Balzer [mailto:ch...@gol.com]
Sent: Wednesday, September 02, 2015 7:29 PM
To: ceph-users
Cc: Deneau, Tom
Subject: Re: [ceph-users] osds on 2 nodes vs. on one node


Hello,

On Wed, 2 Sep 2015 22:38:12 + Deneau, Tom wrote:


In a small cluster I have 2 OSD nodes with identical hardware, each
with
6 osds.

* Configuration 1:  I shut down the osds on one node so I am using 6
OSDS on a single node


Shut down how?
Just a "service blah stop" or actually removing them from the cluster aka
CRUSH map?


* Configuration 2:  I shut down 3 osds on each node so now I have 6
total OSDS but 3 on each node.


Same as above.
And in this case even more relevant, because just shutting down random OSDs
on both nodes would result in massive recovery action at best and more likely
a broken cluster.


I measure read performance using rados bench from a separate client node.

Default parameters?


The client has plenty of spare CPU power and the network and disk
utilization are not limiting factors. In all cases, the pool type is
replicated so we're just reading from the primary.


Replicated as in size 2?
We can guess/assume that from your cluster size, but w/o you telling us or
giving us all the various config/crush outputs that is only a guess.


With Configuration 1, I see approximately 70% more bandwidth than with
configuration 2.


Never mind that bandwidth is mostly irrelevant in real life, which bandwidth,
read or write?


In general, any configuration where the osds span 2 nodes gets poorer
performance but in particular when the 2 nodes have equal amounts of
traffic.



Again, guessing from what you're actually doing, this isn't particularly
surprising.
Because with a single node, default rules and replication of 2 your OSDs
never have to replicate anything when it comes to writes.
Whereas with 2 nodes replication happens and takes more time (latency) and
might also saturate your network (we have of course no idea how your cluster
looks like).

Christian


Is there any ceph parameter that might be throttling the cases where
osds span 2 nodes?

-- Tom Deneau, AMD
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osds on 2 nodes vs. on one node

2015-09-03 Thread Deneau, Tom
After running some other experiments, I see now that the high single-node
bandwidth only occurs when ceph-mon is also running on that same node.
(In these small clusters I only had one ceph-mon running).
If I compare to a single-node where ceph-mon is not running, I see
basically identical performance to the two-node arrangement.

So now my question is:  Is it expected that there would be such
a large performance difference between using osds on a single node
where ceph-mon is running vs. using osds on a single node where
ceph-mon is not running?

-- Tom

> -Original Message-
> From: Deneau, Tom
> Sent: Thursday, September 03, 2015 10:39 AM
> To: 'Christian Balzer'; ceph-users
> Subject: RE: [ceph-users] osds on 2 nodes vs. on one node
> 
> Rewording to remove confusion...
> 
> Config 1: set up a cluster with 1 node with 6 OSDs Config 2: identical
> hardware, set up a cluster with 2 nodes with 3 OSDs each
> 
> In each case I do the following:
>1) rados bench write --no-cleanup the same number of 4M size objects
>2) drop caches on all osd nodes
>3) rados bench seq  -t 4 to sequentially read the objects
>   and record the read bandwidth
> 
> Rados bench is running on a separate client, not on an OSD node.
> The client has plenty of spare CPU power and the network and disk utilization
> are not limiting factors.
> 
> With Config 1, I see approximately 70% more sequential read bandwidth than
> with Config 2.
> 
> In both cases the primary OSDs of the objects appear evenly distributed
> across OSDs.
> 
> Yes, replication factor is 2 but since we are only measuring read
> performance, I don't think that matters.
> 
> Question is whether there is a ceph parameter that might be throttling the
> 2 node configuration?
> 
> -- Tom
> 
> > -Original Message-
> > From: Christian Balzer [mailto:ch...@gol.com]
> > Sent: Wednesday, September 02, 2015 7:29 PM
> > To: ceph-users
> > Cc: Deneau, Tom
> > Subject: Re: [ceph-users] osds on 2 nodes vs. on one node
> >
> >
> > Hello,
> >
> > On Wed, 2 Sep 2015 22:38:12 + Deneau, Tom wrote:
> >
> > > In a small cluster I have 2 OSD nodes with identical hardware, each
> > > with
> > > 6 osds.
> > >
> > > * Configuration 1:  I shut down the osds on one node so I am using 6
> > > OSDS on a single node
> > >
> > Shut down how?
> > Just a "service blah stop" or actually removing them from the cluster
> > aka CRUSH map?
> >
> > > * Configuration 2:  I shut down 3 osds on each node so now I have 6
> > > total OSDS but 3 on each node.
> > >
> > Same as above.
> > And in this case even more relevant, because just shutting down random
> > OSDs on both nodes would result in massive recovery action at best and
> > more likely a broken cluster.
> >
> > > I measure read performance using rados bench from a separate client node.
> > Default parameters?
> >
> > > The client has plenty of spare CPU power and the network and disk
> > > utilization are not limiting factors. In all cases, the pool type is
> > > replicated so we're just reading from the primary.
> > >
> > Replicated as in size 2?
> > We can guess/assume that from your cluster size, but w/o you telling
> > us or giving us all the various config/crush outputs that is only a guess.
> >
> > > With Configuration 1, I see approximately 70% more bandwidth than
> > > with configuration 2.
> >
> > Never mind that bandwidth is mostly irrelevant in real life, which
> > bandwidth, read or write?
> >
> > > In general, any configuration where the osds span 2 nodes gets
> > > poorer performance but in particular when the 2 nodes have equal
> > > amounts of traffic.
> > >
> >
> > Again, guessing from what you're actually doing, this isn't particularly
> > surprising.
> > Because with a single node, default rules and replication of 2 your
> > OSDs never have to replicate anything when it comes to writes.
> > Whereas with 2 nodes replication happens and takes more time (latency)
> > and might also saturate your network (we have of course no idea how
> > your cluster looks like).
> >
> > Christian
> >
> > > Is there any ceph parameter that might be throttling the cases where
> > > osds span 2 nodes?
> > >
> > > -- Tom Deneau, AMD
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >
> >
> >
> > --
> > Christian Balzer        Network/Systems Engineer
> > ch...@gol.com   Global OnLine Japan/Fusion Communications
> > http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osds on 2 nodes vs. on one node

2015-09-08 Thread Gregory Farnum
On Fri, Sep 4, 2015 at 12:24 AM, Deneau, Tom  wrote:
> After running some other experiments, I see now that the high single-node
> bandwidth only occurs when ceph-mon is also running on that same node.
> (In these small clusters I only had one ceph-mon running).
> If I compare to a single-node where ceph-mon is not running, I see
> basically identical performance to the two-node arrangement.
>
> So now my question is:  Is it expected that there would be such
> a large performance difference between using osds on a single node
> where ceph-mon is running vs. using osds on a single node where
> ceph-mon is not running?

No. There's clearly some kind of weird confound going on here.
Honestly my first thought (I haven't heard of anything like this
before) is that you might want to look at the power-saving profile of
your nodes. Maybe the extra load of the monitor is keeping the CPU
awake or something...
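(A quick way to check that on a typical Linux node, as a rough sketch:)

    # current frequency-scaling governor on every core
    cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

    # or, if the cpupower tool is installed
    cpupower frequency-info

    # force the performance governor for a test run
    sudo cpupower frequency-set -g performance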
-Greg

>
> -- Tom
>
>> -Original Message-
>> From: Deneau, Tom
>> Sent: Thursday, September 03, 2015 10:39 AM
>> To: 'Christian Balzer'; ceph-users
>> Subject: RE: [ceph-users] osds on 2 nodes vs. on one node
>>
>> Rewording to remove confusion...
>>
>> Config 1: set up a cluster with 1 node with 6 OSDs Config 2: identical
>> hardware, set up a cluster with 2 nodes with 3 OSDs each
>>
>> In each case I do the following:
>>1) rados bench write --no-cleanup the same number of 4M size objects
>>2) drop caches on all osd nodes
>>3) rados bench seq  -t 4 to sequentially read the objects
>>   and record the read bandwidth
>>
>> Rados bench is running on a separate client, not on an OSD node.
>> The client has plenty of spare CPU power and the network and disk utilization
>> are not limiting factors.
>>
>> With Config 1, I see approximately 70% more sequential read bandwidth than
>> with Config 2.
>>
>> In both cases the primary OSDs of the objects appear evenly distributed
>> across OSDs.
>>
>> Yes, replication factor is 2 but since we are only measuring read
>> performance, I don't think that matters.
>>
>> Question is whether there is a ceph parameter that might be throttling the
>> 2 node configuration?
>>
>> -- Tom
>>
>> > -Original Message-
>> > From: Christian Balzer [mailto:ch...@gol.com]
>> > Sent: Wednesday, September 02, 2015 7:29 PM
>> > To: ceph-users
>> > Cc: Deneau, Tom
>> > Subject: Re: [ceph-users] osds on 2 nodes vs. on one node
>> >
>> >
>> > Hello,
>> >
>> > On Wed, 2 Sep 2015 22:38:12 + Deneau, Tom wrote:
>> >
>> > > In a small cluster I have 2 OSD nodes with identical hardware, each
>> > > with
>> > > 6 osds.
>> > >
>> > > * Configuration 1:  I shut down the osds on one node so I am using 6
>> > > OSDS on a single node
>> > >
>> > Shut down how?
>> > Just a "service blah stop" or actually removing them from the cluster
>> > aka CRUSH map?
>> >
>> > > * Configuration 2:  I shut down 3 osds on each node so now I have 6
>> > > total OSDS but 3 on each node.
>> > >
>> > Same as above.
>> > And in this case even more relevant, because just shutting down random
>> > OSDs on both nodes would result in massive recovery action at best and
>> > more likely a broken cluster.
>> >
>> > > I measure read performance using rados bench from a separate client node.
>> > Default parameters?
>> >
>> > > The client has plenty of spare CPU power and the network and disk
>> > > utilization are not limiting factors. In all cases, the pool type is
>> > > replicated so we're just reading from the primary.
>> > >
>> > Replicated as in size 2?
>> > We can guess/assume that from your cluster size, but w/o you telling
>> > us or giving us all the various config/crush outputs that is only a guess.
>> >
>> > > With Configuration 1, I see approximately 70% more bandwidth than
>> > > with configuration 2.
>> >
>> > Never mind that bandwidth is mostly irrelevant in real life, which
>> > bandwidth, read or write?
>> >
>> > > In general, any configuration where the osds span 2 nodes gets
>> > > poorer performance but in particular when the 2 nodes have equal
>> > > amounts of traffic.
>> > >
>> >
>> > Again, guessing from what you're actually doing this isn'