Thanks guys, kernel.pid_max=4194303 did the trick.

- Karan -
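
For anyone hitting the same assert: on Linux every thread takes an ID from the same pool that kernel.pid_max limits, so a host with many OSDs can run out of IDs long before it runs out of memory. Below is a minimal sketch, not an official Ceph tool, of how one might compare the current thread count with the limit. The function names are made up for illustration, and it assumes the standard /proc layout (/proc/sys/kernel/pid_max and /proc/<pid>/task/<tid>).

```python
import os


def read_pid_max(path="/proc/sys/kernel/pid_max"):
    """Return the current kernel.pid_max value as an integer."""
    with open(path) as f:
        return int(f.read().strip())


def count_threads(proc_root="/proc"):
    """Count all threads by summing the entries in each /proc/<pid>/task."""
    total = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "sys" or "meminfo"
        try:
            total += len(os.listdir(os.path.join(proc_root, entry, "task")))
        except OSError:
            continue  # the process exited while we were scanning
    return total


def thread_headroom(threads, pid_max):
    """Return how many more thread IDs fit below pid_max (negative if over)."""
    return pid_max - threads
```

On an OSD host, thread_headroom(count_threads(), read_pid_max()) gives a rough idea of how close the node is to the limit. With the figures from the quoted reply below, about 66,000 idle threads against a pid_max of 65536 leaves no headroom at all, which fits the failed Thread::create() assert.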

> On 09 Mar 2015, at 14:48, Christian Eichelmann 
> <christian.eichelm...@1und1.de> wrote:
> 
> Hi Karan,
> 
> as you write in your own book, the problem is the sysctl setting
> "kernel.pid_max". I saw in your bug report that you had set it to
> 65536, which is still too low for high-density hardware.
> 
> In our cluster, one OSD server runs about 66,000 threads when idle
> (60 OSDs per server). The number of threads increases when you
> increase the number of placement groups in the cluster, which I think
> is what triggered your problem.
> 
> Set "kernel.pid_max" to 4194303 (the maximum), as Azad Aliyar
> suggested, and the problem should go away.
> 
> Regards,
> Christian
> 
> On 09.03.2015 at 11:41, Karan Singh wrote:
>> Hello community, I need help fixing a long-running Ceph problem.
>> 
>> The cluster is unhealthy and multiple OSDs are down. When I try to
>> restart the OSDs, I get this error:
>> 
>> 
>> 2015-03-09 12:22:16.312774 7f760dac9700 -1 common/Thread.cc: In
>> function 'void Thread::create(size_t)' thread 7f760dac9700 time
>> 2015-03-09 12:22:16.311970
>> common/Thread.cc: 129: FAILED assert(ret == 0)
>> 
>> 
>> *Environment*: 4 nodes, OSD+Monitor, latest Firefly, CentOS 6.5,
>> kernel 3.17.2-1.el6.elrepo.x86_64
>> 
>> Tried upgrading from 0.80.7 to 0.80.8, but no luck.
>> 
>> Tried the CentOS stock kernel 2.6.32, but no luck.
>> 
>> Memory is not a problem; more than 150 GB is free.
>> 
>> 
>> Has anyone ever faced this problem?
>> 
>> *Cluster status*
>> 
>>     cluster 2bd3283d-67ef-4316-8b7e-d8f4747eae33
>>      health HEALTH_WARN 7334 pgs degraded; 1185 pgs down; 1 pgs
>>             incomplete; 1735 pgs peering; 8938 pgs stale; 1736 pgs
>>             stuck inactive; 8938 pgs stuck stale; 10320 pgs stuck
>>             unclean; recovery 6061/31080 objects degraded (19.501%);
>>             111/196 in osds are down; clock skew detected on
>>             mon.pouta-s02, mon.pouta-s03
>>      monmap e3: 3 mons at
>>             {pouta-s01=10.XXX.50.1:6789/0,pouta-s02=10.XXX.50.2:6789/0,pouta-s03=10.XXX.50.3:6789/0},
>>             election epoch 1312, quorum 0,1,2
>>             pouta-s01,pouta-s02,pouta-s03
>>      osdmap e26633: 239 osds: 85 up, 196 in
>>       pgmap v60389: 17408 pgs, 13 pools, 42345 MB data, 10360 objects
>>             4699 GB used, 707 TB / 711 TB avail
>>             6061/31080 objects degraded (19.501%)
>>                   14 down+remapped+peering
>>                   39 active
>>                 3289 active+clean
>>                  547 peering
>>                  663 stale+down+peering
>>                  705 stale+active+remapped
>>                    1 active+degraded+remapped
>>                    1 stale+down+incomplete
>>                  484 down+peering
>>                  455 active+remapped
>>                 3696 stale+active+degraded
>>                    4 remapped+peering
>>                   23 stale+down+remapped+peering
>>                   51 stale+active
>>                 3637 active+degraded
>>                 3799 stale+active+clean
>> 
>> *OSD logs*
>> 
>> 2015-03-09 12:22:16.312774 7f760dac9700 -1 common/Thread.cc: In
>> function 'void Thread::create(size_t)' thread 7f760dac9700 time
>> 2015-03-09 12:22:16.311970
>> common/Thread.cc: 129: FAILED assert(ret == 0)
>> 
>>  ceph version 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7)
>>  1: (Thread::create(unsigned long)+0x8a) [0xaf41da]
>>  2: (SimpleMessenger::add_accept_pipe(int)+0x6a) [0xae84fa]
>>  3: (Accepter::entry()+0x265) [0xb5c635]
>>  4: /lib64/libpthread.so.0() [0x3c8a6079d1]
>>  5: (clone()+0x6d) [0x3c8a2e89dd]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>  needed to interpret this.
>> 
>> 
>> More information in the Ceph tracker issue:
>> http://tracker.ceph.com/issues/10988#change-49018
>> 
>> 
>> ****************************************************************
>> Karan Singh 
>> Systems Specialist , Storage Platforms
>> CSC - IT Center for Science,
>> Keilaranta 14, P. O. Box 405, FIN-02101 Espoo, Finland
>> mobile: +358 503 812758
>> tel. +358 9 4572001
>> fax +358 9 4572302
>> http://www.csc.fi/
>> ****************************************************************
>> 
>> 
>> 
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
> 
> 
> -- 
> Christian Eichelmann
> System Administrator
> 
> 1&1 Internet AG - IT Operations Mail & Media Advertising & Targeting
> Brauerstraße 48 · DE-76135 Karlsruhe
> Phone: +49 721 91374-8026
> christian.eichelm...@1und1.de
> 
> Montabaur District Court / HRB 6484
> Executive Board: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich,
> Robert Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr. Oliver Mauss,
> Jan Oetjen
> Chairman of the Supervisory Board: Michael Scheeren

