On May 28, 2014, at 5:31 AM, Gregory Farnum <g...@inktank.com> wrote:

> On Sun, May 25, 2014 at 6:24 PM, Guang Yang <yguan...@yahoo.com> wrote:
>> On May 21, 2014, at 1:33 AM, Gregory Farnum <g...@inktank.com> wrote:
>> 
>>> This failure means the messenger subsystem is trying to create a
>>> thread and is getting an error code back — probably due to a process
>>> or system thread limit that you can turn up with ulimit.
>>> 
>>> This is happening because a replicated PG primary needs a connection
>>> to only its replicas (generally 1 or 2 connections), but with an
>>> erasure-coded PG the primary requires a connection to m+n-1 replicas
>>> (everybody who's in the erasure-coding set, including itself). Right
>>> now our messenger requires a thread for each connection, so kerblam.
>>> (And it actually requires a couple such connections because we have
>>> separate heartbeat, cluster data, and client data systems.)
>> Hi Greg,
>> Is there any plan to refactor the messenger component to reduce the number of
>> threads? For example, by using an event-driven model.
> 
> We've discussed it in very broad terms, but there are no concrete
> designs and it's not on the schedule yet. If anybody has conclusive
> evidence that it's causing them trouble they can't work around, that
> would be good to know…
Thanks for the response!

We used to run a cluster in which each OSD host had 11 disks (11 OSD daemons), 
which worked out to around 15K threads per host. The system was stable, but 
whenever there was a cluster-wide change (e.g. an OSD going down / out, or 
recovery) we saw the system load increase, though there was no cascading failure.

More recently we have been evaluating Ceph on high-density hardware, with each 
OSD host having 33 disks (33 daemons), which puts around 40K-50K threads on each 
host. With some OSD hosts going down / out, we started seeing the load climb 
sharply along with a large volume of thread creation and joining.

We don’t have strong evidence that the messenger threading model is the problem, 
or that an event-driven approach would help, but I think that as we move to 
high-density hardware (for cost-saving purposes) the issue could be amplified.
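
To make the event-driven suggestion concrete, what I have in mind is roughly the 
sketch below: a single thread (or a small pool of threads) multiplexing many peer 
sockets with epoll, so the thread count stays flat as the number of peer 
connections grows. This is not Ceph code -- the port and the echo handling are 
made up purely for illustration:

// One thread servicing every connection via epoll, instead of one or more
// threads per connection. Error handling omitted for brevity.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

int main() {
  int listener = socket(AF_INET, SOCK_STREAM, 0);
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_ANY);
  addr.sin_port = htons(7777);            // arbitrary port for the sketch
  bind(listener, (sockaddr*)&addr, sizeof(addr));
  listen(listener, SOMAXCONN);

  int ep = epoll_create1(0);              // one epoll instance...
  epoll_event ev{};
  ev.events = EPOLLIN;
  ev.data.fd = listener;
  epoll_ctl(ep, EPOLL_CTL_ADD, listener, &ev);

  epoll_event ready[128];
  char buf[4096];
  while (true) {                          // ...and one loop for all peers
    int n = epoll_wait(ep, ready, 128, -1);
    for (int i = 0; i < n; ++i) {
      int fd = ready[i].data.fd;
      if (fd == listener) {               // new peer: register it, no new thread
        int conn = accept(listener, nullptr, nullptr);
        epoll_event cev{};
        cev.events = EPOLLIN;
        cev.data.fd = conn;
        epoll_ctl(ep, EPOLL_CTL_ADD, conn, &cev);
      } else {                            // traffic on an existing peer socket
        ssize_t len = read(fd, buf, sizeof(buf));
        if (len <= 0) { close(fd); continue; }
        write(fd, buf, len);              // placeholder for real message handling
      }
    }
  }
}

The per-connection protocol handling would obviously be far more involved than an 
echo, but the threading picture is the part that matters for the thread counts 
above.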

If there is any plan, it would be good to know, and we would be very interested 
in getting involved.

Thanks,
Guang

> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
