I would say you're a little light on RAM.  With 4TB disks 70% full, I've
seen some ceph-osd processes using 3.5GB of RAM during recovery.  You'll be
fine during normal operation, but you might run into issues at the worst
possible time.

I have 8 OSDs per node, and 32G of RAM.  I've had ceph-osd processes start
swapping, and that's a great way to get them kicked out for being
unresponsive.


I'm not a dev, but I can make some wild and uninformed guesses :-).  The
primary OSD for a PG uses more CPU than the replicas, and I suspect that you
have more primaries on the hot nodes.

Since you're testing, try repeating the test on 3 OSD nodes instead of 4.
If you don't want to run that test, you can generate a histogram from ceph
pg dump data and see whether there are more primary OSDs (the first OSD in
each acting set) on the hot nodes; a rough sketch of that is below.
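
Something along these lines should do it (a rough, untested sketch; I'm
assuming the JSON layout of "ceph pg dump --format json", so the key names
may need adjusting on your release):

ceph pg dump --format json 2>/dev/null | python -c '
import json, sys, collections
dump = json.load(sys.stdin)
# some releases nest the per-PG stats under "pg_map"
stats = dump.get("pg_stats") or dump.get("pg_map", {}).get("pg_stats", [])
counts = collections.Counter()
for pg in stats:
    # the first OSD in the acting set is the primary for that PG
    counts[pg["acting"][0]] += 1
for osd, n in sorted(counts.items()):
    print("osd.%d: %d primaries" % (osd, n))
'

Then map the osd ids back to hosts with "ceph osd tree" and sum per host;
if the hot nodes hold noticeably more primaries, that would be your answer.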



On Wed, Mar 18, 2015 at 7:18 AM, f...@univ-lr.fr <f...@univ-lr.fr> wrote:

> Hi to the ceph-users list!
>
> We're setting up a new Ceph infrastructure:
> - 1 MDS admin node
> - 4 OSD storage nodes (60 OSDs)
>   each of them running a monitor
> - 1 client
>
> Each 32GB RAM / 16-core OSD node supports 15 x 4TB SAS OSDs (XFS) and 1 SSD
> with 5GB journal partitions, all in JBOD attachment.
> Every node has 2x10Gb LACP attachment.
> The OSD nodes are freshly installed with Puppet, then set up from the admin node
> Default OSD weight in the OSD tree
> 1 test pool with 4096 PGs
>
> During the setup phase, we're trying to qualify the performance
> characteristics of our setup.
> Rados benchmarks are run from a client with these commands:
> rados -p pool -b 4194304 bench 60 write -t 32 --no-cleanup
> rados -p pool -b 4194304 bench 60 seq -t 32 --no-cleanup
>
> Each time we observed a recurring phenomenon: 2 of the 4 OSD nodes have
> twice the CPU load:
> http://www.4shared.com/photo/Ua0umPVbba/UnevenLoad.html
> (What to look at is the real-time %CPU and the cumulative CPU time per
> ceph-osd process.)
>
> And after a complete fresh reinstall to be sure, this twice-as-high CPU
> load is observed again, but not on the same 2 nodes:
> http://www.4shared.com/photo/2AJfd1B_ba/UnevenLoad-v2.html
>
> Nothing obvious about the installation seems able to explain that.
>
> The CRUSH distribution shows no more than 4.5% inequality between the
> 4 OSD nodes for the primary OSDs of the objects, and less than 3% between
> the hosts if we consider the whole acting sets of the objects used during
> the benchmark. These differences are not of a comparable scale to the CPU
> loads, so the cause has to be elsewhere.
>
> I cannot be sure it has no impact on performance. Even if we have enough
> CPU headroom, logic would say it must have some consequences on latency
> and therefore on performance.
>
> Would someone have any idea, or be able to reproduce the test on their own
> setup to see whether this is a common behaviour?
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
