Thank you for your reply.
  Our cluster has been running in production for two years without problems, so 
we have not upgraded.
  I checked the memory on the host, and very little is free. Could the thread 
creation failure be related to this?
  In addition to the KVM virtual machines, there are 22 OSDs on the host.

free -m
              total        used        free      shared  buff/cache   available
Mem:         515420      178212        4323         729      332884      335360
Swap:          8191        8145          46
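
To put the figures above next to the thread limits quoted below, a rough sketch like the following could be run on the node. It assumes a Linux host with procps `ps` and `/proc` available; the helper names (`pct`, `host_threads`, `osd_threads`) are made up for illustration and are not part of Ceph.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not Ceph tooling.

# Integer percentage of "part" relative to "total".
pct() {
    echo $(( $1 * 100 / $2 ))
}

# Total number of threads (LWPs) currently alive on the host.
host_threads() {
    ps -eL -o lwp= | wc -l
}

# One line per ceph-osd process: "<pid> <thread count>".
osd_threads() {
    ps -C ceph-osd -o pid=,nlwp=
}

# Figures copied from the free -m output above.
echo "swap used:     $(pct 8145 8191)%"
echo "mem available: $(pct 335360 515420)%"

# Current usage compared with the kernel-wide limits.
echo "host threads:  $(host_threads) of $(cat /proc/sys/kernel/threads-max)"
echo "pid_max:       $(cat /proc/sys/kernel/pid_max)"

osd_threads || echo "no ceph-osd processes found"
```

With the numbers from this host, swap is about 99% used while about 65% of memory is still reported as available (mostly buff/cache). Whether either figure explains the failed thread creation is not something this sketch can decide; it only collects the values in one place.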

> sysctl:
> kernel.pid_max=4194303
> kernel.threads-max=2097152
> vm.max_map_count=524288
> 
> But really, why are you still running Hammer?  Later releases handle a large 
> number of OSDs *much* better.
> 
> > On Jun 1, 2020, at 7:08 PM, 展荣臻(信泰) <zhanrzh...@teamsun.com.cn> wrote:
> > 
> > Hi all,
> >   We have a Hammer Ceph cluster with 3 monitors and 324 OSDs. The OSD daemons 
> > and KVM are colocated on the same nodes.
> > The cluster has been running for 2 years. Recently we added ~700 OSDs to the 
> > cluster, using this process:
> > 1. ceph osd create
> > 2. mkdir -p /var/lib/ceph/osd/ceph-$osd
> > 3. mkfs.xfs -f /dev/$disk
> > 4. mount -o inode64,noatime /dev/$disk /var/lib/ceph/osd/ceph-$osd
> > 5. ceph-osd -i 0 --mkfs --mkkey
> > 6. ceph auth add osd.$osd osd 'allow *' mon 'allow profile osd' -i 
> > /var/lib/ceph/osd/ceph-$osd/keyring
> > 7. ceph osd crush create-or-move $osd host=kvm101 root=default
> > Maybe we did this too frequently. After adding 122 OSDs, osd.1 through osd.8 failed:
> > 
> > 2020-05-14 16:48:29.881021 7f6727fb9700 -1 common/Thread.cc: In function 
> > 'void Thread::create(size_t)' thread 7f6727fb9700 time 2020-05-14 
> > 16:48:29.870051
> > common/Thread.cc: 129: FAILED assert(ret == 0)
> > 
> > ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
> > const*)+0x85) [0xbc8b55]
> > 2: (Thread::create(unsigned long)+0x8a) [0xbac50a]
> > 3: (Pipe::accept()+0x37fb) [0xca6c3b]
> > 4: (Pipe::reader()+0x1a0f) [0xcaa75f]
> > 5: (Pipe::Reader::entry()+0xd) [0xcb351d]
> > 6: (()+0x7dc5) [0x7f67a45ebdc5]
> > 7: (clone()+0x6d) [0x7f67a30cc1cd]
> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to 
> > interpret this.
> > 
> > ulimit -u
> > 2061600
> > open files  32768
> > 
> > 
> > Does anyone know what's going on? Why did thread creation fail?
> > 
> > 
> > 
> > 
> > 
> > 
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io