Thanks! I increased the max-processes limit for all daemons quite a lot (up to ulimit -u 3802720)

These are the limits for the daemons now:
[root@ ~]# cat /proc/17006/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            10485760             unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             3802720              3802720              processes
Max open files            32768                32768                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       95068                95068                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
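
For reference, this is roughly how I raised it; where exactly the daemons pick the limit up depends on how they are started, so take the limits.conf part as a sketch of my setup rather than a recommendation:

[root@ ~]# ulimit -u 3802720          # in the shell the daemons are started from
[root@ ~]# tail -n 2 /etc/security/limits.conf
root  soft  nproc  3802720
root  hard  nproc  3802720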

But this didn't help. Are there other parameters I should change?

I also got a 'bash: fork: Cannot allocate memory' error once when running a command after starting the Ceph services. It shouldn't be an actual memory shortage, because while monitoring during the failure there was still enough (cached) memory available.
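
To rule out a system-wide limit rather than real memory pressure, I can compare the kernel-wide knobs against the number of live threads (these are just the ones I know of, there may be others):

[root@ ~]# sysctl kernel.pid_max kernel.threads-max vm.max_map_count
[root@ ~]# ps -eLf | wc -l        # rough count of threads currently alive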


----- Message from Gregory Farnum <g...@inktank.com> ---------
   Date: Tue, 20 May 2014 10:33:30 -0700
   From: Gregory Farnum <g...@inktank.com>
Subject: Re: [ceph-users] Expanding pg's of an erasure coded pool
     To: Kenneth Waegeman <kenneth.waege...@ugent.be>
     Cc: ceph-users <ceph-users@lists.ceph.com>


This failure means the messenger subsystem is trying to create a
thread and is getting an error code back — probably due to a process
or system thread limit that you can turn up with ulimit.

This is happening because a replicated PG primary needs a connection
to only its replicas (generally 1 or 2 connections), but with an
erasure-coded PG the primary requires a connection to m+n-1 replicas
(everybody who's in the erasure-coding set, including itself). Right
now our messenger requires a thread for each connection, so kerblam.
(And it actually requires a couple such connections because we have
separate heartbeat, cluster data, and client data systems.)
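
(A rough back-of-envelope, with an invented EC profile purely for illustration, so none of the numbers below come from your actual configuration:

  # 4096 PGs at EC width 11 (k=8, m=3) spread over 400 OSDs:
  #   PG shards per OSD  ~  4096 * 11 / 400  ~  113
  # with that width each OSD soon shares at least one PG with most of
  # the other 399 OSDs; SimpleMessenger keeps a pipe per peer, per
  # messenger (client, cluster, heartbeat), at roughly 2 threads per pipe:
  #   threads per OSD    ~  399 * 3 * 2      ~  2400
  #   threads per node   ~  2400 * 20 OSDs   ~  48000

which is the kind of number that runs into per-user and kernel-wide thread limits.)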
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Tue, May 20, 2014 at 3:43 AM, Kenneth Waegeman
<kenneth.waege...@ugent.be> wrote:
Hi,

On a setup of 400 OSDs (20 nodes, with 20 OSDs per node), I first tried to
create an erasure-coded pool with 4096 PGs, but this crashed the cluster.
I then started over with 1024 PGs and expanded to 2048 (pg_num and pgp_num); when
I then try to expand to 4096 (which is still not quite enough for this cluster size),
the cluster crashes again. (Do we need fewer PGs with erasure coding?)
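
For completeness, roughly the commands used, with pool and profile names left out, so treat this as a sketch:

ceph osd pool create <poolname> 1024 1024 erasure <profile>
ceph osd pool set <poolname> pg_num 2048
ceph osd pool set <poolname> pgp_num 2048
ceph osd pool set <poolname> pg_num 4096     # this is where it falls over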

The crash starts with individual OSDs going down, eventually bringing down the
mons as well (until there is no quorum left or too few OSDs are up).

From the logs:


   -16> 2014-05-20 10:31:55.545590 7fd42f34d700  5 -- op tracker -- , seq:
14301, time: 2014-05-20 10:31:55.545590, event: started, request:
pg_query(0.974 epoch 3315) v3
   -15> 2014-05-20 10:31:55.545776 7fd42f34d700  1 --
130.246.178.141:6836/10446 --> 130.246.179.191:6826/21854 -- pg_notify(0.974
epoch 3326) v5 -- ?+0 0xc8b4ec0 con 0x9026b40
   -14> 2014-05-20 10:31:55.545807 7fd42f34d700  5 -- op tracker -- , seq:
14301, time: 2014-05-20 10:31:55.545807, event: done, request:
pg_query(0.974 epoch 3315) v3
   -13> 2014-05-20 10:31:55.559661 7fd3fdb0f700  1 --
130.246.178.141:6837/10446 >> :/0 pipe(0xce0c380 sd=468 :6837 s=0 pgs=0 cs=0
l=0 c=0x1255f0c0).accept sd=468 130.246.179.191:60618/0
   -12> 2014-05-20 10:31:55.564034 7fd3bf72f700  1 --
130.246.178.141:6838/10446 >> :/0 pipe(0xe3f2300 sd=596 :6838 s=0 pgs=0 cs=0
l=0 c=0x129b5ee0).accept sd=596 130.246.179.191:43913/0
   -11> 2014-05-20 10:31:55.627776 7fd42df4b700  1 --
130.246.178.141:0/10446 <== osd.170 130.246.179.191:6827/21854 3 ====
osd_ping(ping_reply e3316 stamp 2014-05-20 10:31:52.994368) v2 ==== 47+0+0
(855262282 0 0) 0xb6863c0 con 0x1255b9c0
   -10> 2014-05-20 10:31:55.629425 7fd42df4b700  1 --
130.246.178.141:0/10446 <== osd.170 130.246.179.191:6827/21854 4 ====
osd_ping(ping_reply e3316 stamp 2014-05-20 10:31:53.509621) v2 ==== 47+0+0
(2581193378 0 0) 0x93d6c80 con 0x1255b9c0
    -9> 2014-05-20 10:31:55.631270 7fd42f34d700  1 --
130.246.178.141:6836/10446 <== osd.169 130.246.179.191:6841/25473 2 ====
pg_query(7.3ffs6 epoch 3326) v3 ==== 144+0+0 (221596234 0 0) 0x10b994a0 con
0x9383860
    -8> 2014-05-20 10:31:55.631308 7fd42f34d700  5 -- op tracker -- , seq:
14302, time: 2014-05-20 10:31:55.631130, event: header_read, request:
pg_query(7.3ffs6 epoch 3326) v3
    -7> 2014-05-20 10:31:55.631315 7fd42f34d700  5 -- op tracker -- , seq:
14302, time: 2014-05-20 10:31:55.631133, event: throttled, request:
pg_query(7.3ffs6 epoch 3326) v3
    -6> 2014-05-20 10:31:55.631339 7fd42f34d700  5 -- op tracker -- , seq:
14302, time: 2014-05-20 10:31:55.631207, event: all_read, request:
pg_query(7.3ffs6 epoch 3326) v3
    -5> 2014-05-20 10:31:55.631343 7fd42f34d700  5 -- op tracker -- , seq:
14302, time: 2014-05-20 10:31:55.631303, event: dispatched, request:
pg_query(7.3ffs6 epoch 3326) v3
    -4> 2014-05-20 10:31:55.631349 7fd42f34d700  5 -- op tracker -- , seq:
14302, time: 2014-05-20 10:31:55.631349, event: waiting_for_osdmap, request:
pg_query(7.3ffs6 epoch 3326) v3
    -3> 2014-05-20 10:31:55.631363 7fd42f34d700  5 -- op tracker -- , seq:
14302, time: 2014-05-20 10:31:55.631363, event: started, request:
pg_query(7.3ffs6 epoch 3326) v3
    -2> 2014-05-20 10:31:55.631402 7fd42f34d700  5 -- op tracker -- , seq:
14302, time: 2014-05-20 10:31:55.631402, event: done, request:
pg_query(7.3ffs6 epoch 3326) v3
    -1> 2014-05-20 10:31:55.631488 7fd427b41700  1 --
130.246.178.141:6836/10446 --> 130.246.179.191:6841/25473 --
pg_notify(7.3ffs6(14) epoch 3326) v5 -- ?+0 0xcc7b9c0 con 0x9383860
     0> 2014-05-20 10:31:55.632127 7fd42cb49700 -1 common/Thread.cc: In
function 'void Thread::create(size_t)' thread 7fd42cb49700 time 2014-05-20
10:31:55.630937
common/Thread.cc: 110: FAILED assert(ret == 0)

 ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
 1: (Thread::create(unsigned long)+0x8a) [0xa83f8a]
 2: (SimpleMessenger::add_accept_pipe(int)+0x6a) [0xa2a6aa]
 3: (Accepter::entry()+0x265) [0xb3ca45]
 4: (()+0x79d1) [0x7fd4436b19d1]
 5: (clone()+0x6d) [0x7fd4423ecb6d]

--- begin dump of recent events ---
     0> 2014-05-20 10:31:56.622247 7fd3bc5fe700 -1 *** Caught signal
(Aborted) **
 in thread 7fd3bc5fe700

 ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
 1: /usr/bin/ceph-osd() [0x9ab3b1]
 2: (()+0xf710) [0x7fd4436b9710]
 3: (gsignal()+0x35) [0x7fd442336925]
 4: (abort()+0x175) [0x7fd442338105]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x7fd442bf0a5d]
 6: (()+0xbcbe6) [0x7fd442beebe6]
 7: (()+0xbcc13) [0x7fd442beec13]
 8: (()+0xbcd0e) [0x7fd442beed0e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x7f2) [0xaec612]
 10: (Thread::create(unsigned long)+0x8a) [0xa83f8a]
 11: (Pipe::connect()+0x2efb) [0xb2850b]
 12: (Pipe::writer()+0x9f3) [0xb2a063]
 13: (Pipe::Writer::entry()+0xd) [0xb359cd]
 14: (()+0x79d1) [0x7fd4436b19d1]
 15: (clone()+0x6d) [0x7fd4423ecb6d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.


In the mon:

--- begin dump of recent events ---
     0> 2014-05-20 10:37:50.378377 7ff018059700 -1 *** Caught signal
(Aborted) **
 in thread 7ff018059700

 ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
 1: /usr/bin/ceph-mon() [0x86b991]
 2: (()+0xf710) [0x7ff01ee5b710]
 3: (gsignal()+0x35) [0x7ff01dad8925]
 4: (abort()+0x175) [0x7ff01dada105]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x7ff01e392a5d]
 6: (()+0xbcbe6) [0x7ff01e390be6]
 7: (()+0xbcc13) [0x7ff01e390c13]
 8: (()+0xbcd0e) [0x7ff01e390d0e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x7f2) [0x7a5472]
 10: (Thread::create(unsigned long)+0x8a) [0x748c9a]
 11: (SimpleMessenger::add_accept_pipe(int)+0x6a) [0x8351ba]
 12: (Accepter::entry()+0x265) [0x863295]
 13: (()+0x79d1) [0x7ff01ee539d1]
 14: (clone()+0x6d) [0x7ff01db8eb6d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.

With a replicated pool, I can already go to 8192 PGs without problems.

Thanks in advance!!

Kind regards,
Kenneth



----- End message from Gregory Farnum <g...@inktank.com> -----

--

Kind regards,
Kenneth Waegeman


_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
