[ceph-users] librados pthread_create failure

2013-08-26 Greg Poirier
So, in doing some testing last week, I believe I managed to exhaust the
number of threads available to nova-compute. After some investigation,
I found the pthread_create failure and increased nproc for our Nova
user to what I considered a ridiculous 120,000 threads, after reading
that librados requires a thread per OSD, plus a few for overhead, per
VM on our compute nodes.
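
(For anyone hitting the same wall, one way to raise the per-user limit
is /etc/security/limits.conf; a minimal sketch, assuming the Nova
processes run as a user named "nova":

    # /etc/security/limits.conf
    # On Linux, nproc counts threads as well as processes.
    nova  soft  nproc  120000
    nova  hard  nproc  120000

The new limits only apply to sessions started after the change, so
nova-compute needs a restart from a fresh login session.)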

This made me wonder: how many threads could Ceph possibly need on one
of our compute nodes?

32 cores * an overcommit ratio of 16 = 512 VMs per node (assuming each
one is booted from a Ceph volume), * 300 OSDs (the approximate number
of disks in our soon-to-go-live Ceph cluster) = 153,600 threads.
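
Spelled out as a quick back-of-the-envelope script (Python; the handful
of per-VM overhead threads are ignored):

    cores = 32
    overcommit = 16
    osds = 300                        # approximate disk count at go-live

    vms_per_node = cores * overcommit        # 512 VMs per compute node
    threads_per_vm = osds                    # ~one librados thread per OSD
    print(vms_per_node * threads_per_vm)     # 153600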

So this is where I started to put the truck in reverse. Am I right?
What about when we triple the size of our Ceph cluster? I could easily
see a future where we have 1,000 disks, if not many more, in our
cluster. How do people scale this? Do you RAID to increase the density
of your Ceph cluster? I can only imagine that this will also
drastically increase the amount of resources required on my data nodes.

So... suggestions? Reading?


Re: [ceph-users] librados pthread_create failure

2013-08-26 Gregory Farnum
On Mon, Aug 26, 2013 at 9:24 AM, Greg Poirier greg.poir...@opower.com wrote:
> So, in doing some testing last week, I believe I managed to exhaust the
> number of threads available to nova-compute. After some investigation,
> I found the pthread_create failure and increased nproc for our Nova
> user to what I considered a ridiculous 120,000 threads, after reading
> that librados requires a thread per OSD, plus a few for overhead, per
> VM on our compute nodes.
>
> This made me wonder: how many threads could Ceph possibly need on one
> of our compute nodes?
>
> 32 cores * an overcommit ratio of 16 = 512 VMs per node (assuming each
> one is booted from a Ceph volume), * 300 OSDs (the approximate number
> of disks in our soon-to-go-live Ceph cluster) = 153,600 threads.
>
> So this is where I started to put the truck in reverse. Am I right?
> What about when we triple the size of our Ceph cluster? I could easily
> see a future where we have 1,000 disks, if not many more, in our
> cluster. How do people scale this? Do you RAID to increase the density
> of your Ceph cluster? I can only imagine that this will also
> drastically increase the amount of resources required on my data nodes.
>
> So... suggestions? Reading?

Your math looks right to me. So far, though, it hasn't caused anybody
any trouble — Linux threads are much cheaper than people imagine when
they're inactive. At some point we will certainly need to reduce the
thread count of our messenger (using epoll on a bunch of sockets
instead of two threads per socket), but that hasn't happened yet.
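
(For illustration only, not our actual messenger code: a single thread
can service many sockets through an epoll-backed event loop. A minimal
sketch using Python's selectors module, which picks epoll on Linux:

    import selectors
    import socket

    sel = selectors.DefaultSelector()       # epoll-backed on Linux

    def accept(server):
        conn, _ = server.accept()
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ, read)

    def read(conn):
        data = conn.recv(4096)
        if data:
            conn.sendall(data)              # echo, standing in for real work
        else:
            sel.unregister(conn)
            conn.close()

    server = socket.socket()
    server.setblocking(False)
    server.bind(("localhost", 0))
    server.listen()
    sel.register(server, selectors.EVENT_READ, accept)

    while True:                             # one thread, many sockets
        for key, _ in sel.select():
            key.data(key.fileobj)

One loop like this replaces the two dedicated threads per socket.)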
In terms of things you can do if this does become a problem, the most
prominent is probably to (sigh) partition your cluster into pods on a
per-rack basis or something. This is actually not as bad as it sounds,
since your network design probably would prefer not to send all writes
through your core router: if you create a pool for each rack and
replicate to something like "this rack, next rack, next row", you
get better network traffic patterns.
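
(To make that concrete, a hypothetical CRUSH rule; the bucket names
rack1, rack2, and row2 are invented, and the sizes would have to match
your pool:

    rule rack1_local {
        ruleset 3
        type replicated
        min_size 3
        max_size 3
        # first replica in this rack, second in the next rack,
        # third somewhere in the next row
        step take rack1
        step chooseleaf firstn 1 type host
        step emit
        step take rack2
        step chooseleaf firstn 1 type host
        step emit
        step take row2
        step chooseleaf firstn 1 type host
        step emit
    }

You would then create one such rule and pool per rack.)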
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com