Hi, sir. I'm sorry, I made a mistake; the fix that you provided should be the one we need. Is it safe for us to simply "git cherry-pick" that commit into our 0.94.5 version?
So sorry for my mistake. Thank you.

On Wed, Jan 11, 2017 at 3:59 PM, 许雪寒 <xuxue...@360.cn> wrote:
> In our test, when one machine is under heavy packet loss, OSDs on
> other machines can be brought down, and sometimes more than one OSD goes
> down, because they may try to read a message that comes from that
> machine. So we are really concerned; please help us. Thanks
>
>
> -----Original Message-----
> From: 许雪寒
> Sent: January 11, 2017, 15:13
> To: 'Sage Weil'
> Cc: ceph-de...@vger.kernel.org
> Subject: Re: Pipe "deadlock" in Hammer, 0.94.5
>
> Thanks for your reply, sir :-)
>
> Actually, this case is not very rare in our test. When iptables drops IP
> packets with a probability of around 98%~99%, this case occurs roughly
> once in every three runs of our test.
> I checked #14120 as you recommended; however, it doesn't seem to be our
> problem, because, as http://tracker.ceph.com/issues/18184 says, #14120 is
> caused by a commit that changed Pipe::tcp_read_wait() to return -errno
> instead of "-1", and that commit is not applied in the Hammer version we
> tested (0.94.5).
> And I truly agree with you that the invocation of the "recv" function
> passes the MSG_DONTWAIT flag, so it shouldn't block; however, every time
> we encounter this problem, it is always the Pipe::reader_thread calling
> recv that holds the lock when the suicide happens, which is really
> confusing.
>
> Please help us, thank you :-)
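(To make the #14120 failure mode described above concrete: when a helper's error convention changes from "return -1 on failure" to "return -errno", any caller that compares the result against -1 exactly stops detecting errors. Below is a minimal, hypothetical sketch of that class of bug; the function name is invented and this is not the actual Pipe::tcp_read_wait() code.)

    #include <cerrno>
    #include <cstdio>

    // Hypothetical stand-in for a read helper whose error convention
    // changed: it used to return -1 on any failure, but now returns
    // -errno (e.g. -EAGAIN).
    static int tcp_read_wait_sketch() {
      return -EAGAIN;  // simulate "no data yet" under the new convention
    }

    int main() {
      int r = tcp_read_wait_sketch();

      // Old-style check: matches only the old "-1" convention, so -EAGAIN
      // (-11 on Linux) slips through as if the call had succeeded.
      if (r == -1)
        printf("old check: error detected\n");
      else
        printf("old check: treated %d as success (the bug)\n", r);

      // Convention-agnostic check: any negative return is an error.
      if (r < 0)
        printf("new check: error detected (r = %d)\n", r);
      return 0;
    }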
>
>
> -----Original Message-----
> From: Sage Weil [mailto:s...@newdream.net]
> Sent: January 10, 2017, 21:12
> To: 许雪寒
> Cc: ceph-de...@vger.kernel.org
> Subject: Re: Pipe "deadlock" in Hammer, 0.94.5
>
> On Tue, 10 Jan 2017, 许雪寒 wrote:
>> Hi, everyone.
>>
>> Recently, we did some experiments to test the stability of the Ceph
>> cluster. We used the Hammer version, which is the one most widely used
>> in our online clusters. One of the scenarios that we simulated is poor
>> network connectivity, in which we used iptables to drop TCP/IP packets
>> with some probability. And sometimes, we can see some OSDs suicide.
>>
>> We used gdb to debug the core dumped by Linux. We found that the thread
>> that hit the suicide time threshold is a peering thread that is trying
>> to send a pg_notify message. The ceph-osd log file and gdb output are as
>> follows:
>>
>> Log file:
>> -3> 2017-01-10 17:02:13.469949 7fd446ff7700 1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7fd440bed700' had timed out after 15
>> -2> 2017-01-10 17:02:13.469952 7fd446ff7700 1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7fd440bed700' had suicide timed out after 150
>> -1> 2017-01-10 17:02:13.469954 7fd4451f4700 1 -- 10.160.132.157:6818/10014122 <== osd.20 10.160.132.156:0/24908 163 ==== osd_ping(ping e4030 stamp 2017-01-10 17:02:13.450374) v2 ==== 47+0+0 (3247646131 0 0) 0x7fd418ca8600 con 0x7fd413c89700
>> 0> 2017-01-10 17:02:13.496895 7fd446ff7700 -1 error_msg common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' thread 7fd446ff7700 time 2017-01-10 17:02:13.469969
>> common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
>>
>> GDB OUTPUT:
>> (gdb) thread 8
>> [Switching to thread 8 (Thread 0x7fd440bed700 (LWP 15302))]#0 0x0000003c5d80e334 in __lll_lock_wait () from /lib64/libpthread.so.0
>> (gdb) bt
>> #0 0x0000003c5d80e334 in __lll_lock_wait () from /lib64/libpthread.so.0
>> #1 0x0000003c5d8095d8 in _L_lock_854 () from /lib64/libpthread.so.0
>> #2 0x0000003c5d8094a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
>> #3 0x0000000001a54ae4 in Mutex::Lock (this=0x7fd426453598, no_lockdep=false) at common/Mutex.cc:96
>> #4 0x0000000001409285 in Mutex::Locker::Locker (this=0x7fd440beb6c0, m=...) at common/Mutex.h:115
>> #5 0x0000000001c46446 in PipeConnection::try_get_pipe (this=0x7fd426453580, p=0x7fd440beb908) at msg/simple/PipeConnection.cc:38
>> #6 0x0000000001c05809 in SimpleMessenger::submit_message (this=0x7fd482029400, m=0x7fd425538d00, con=0x7fd426453580, dest_addr=..., dest_type=4, already_locked=false) at msg/simple/SimpleMessenger.cc:443
>> #7 0x0000000001c033fa in SimpleMessenger::_send_message (this=0x7fd482029400, m=0x7fd425538d00, con=0x7fd426453580) at msg/simple/SimpleMessenger.cc:136
>> #8 0x0000000001c467c7 in SimpleMessenger::send_message (this=0x7fd482029400, m=0x7fd425538d00, con=0x7fd426453580) at msg/simple/SimpleMessenger.h:139
>> #9 0x0000000001c466a1 in PipeConnection::send_message (this=0x7fd426453580, m=0x7fd425538d00) at msg/simple/PipeConnection.cc:78
>> #10 0x00000000013b3ff2 in OSDService::send_map (this=0x7fd4821e76c8, m=0x7fd425538d00, con=0x7fd426453580) at osd/OSD.cc:1054
>> #11 0x00000000013b45e7 in OSDService::send_incremental_map (this=0x7fd4821e76c8, since=4028, con=0x7fd426453580, osdmap=std::tr1::shared_ptr (count 49) 0x7fd426c0f480) at osd/OSD.cc:1087
>> #12 0x00000000013b215f in OSDService::share_map_peer (this=0x7fd4821e76c8, peer=9, con=0x7fd426453580, map=std::tr1::shared_ptr (count 49) 0x7fd426c0f480) at osd/OSD.cc:887
>> #13 0x00000000013f43cc in OSD::do_notifies (this=0x7fd4821e6000, notify_list=std::map with 7 elements = {...}, curmap=std::tr1::shared_ptr (count 49) 0x7fd426c0f480) at osd/OSD.cc:7246
>> #14 0x00000000013f3c99 in OSD::dispatch_context (this=0x7fd4821e6000, ctx=..., pg=0x0, curmap=std::tr1::shared_ptr (count 49) 0x7fd426c0f480, handle=0x7fd440becb40) at osd/OSD.cc:7198
>> #15 0x000000000140043e in OSD::process_peering_events (this=0x7fd4821e6000, pgs=std::list = {...}, handle=...) at osd/OSD.cc:8539
>> #16 0x000000000141e094 in OSD::PeeringWQ::_process (this=0x7fd4821e7070, pgs=std::list = {...}, handle=...) at osd/OSD.h:1601
>> #17 0x00000000014b94bf in ThreadPool::BatchWorkQueue<PG>::_void_process (this=0x7fd4821e7070, p=0x7fd425419040, handle=...) at common/WorkQueue.h:107
>> #18 0x0000000001b2d2e8 in ThreadPool::worker (this=0x7fd4821e64b0, wt=0x7fd4761db430) at common/WorkQueue.cc:128
>> #19 0x0000000001b313f7 in ThreadPool::WorkThread::entry (this=0x7fd4761db430) at common/WorkQueue.h:318
>> #20 0x0000000001b33d40 in Thread::entry_wrapper (this=0x7fd4761db430) at common/Thread.cc:61
>> #21 0x0000000001b33cb2 in Thread::_entry_func (arg=0x7fd4761db430) at common/Thread.cc:45
>> #22 0x0000003c5d807aa1 in start_thread () from /lib64/libpthread.so.0
>> #23 0x0000003c5d4e8aad in clone () from /lib64/libc.so.6
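(Context for the assert in the log above: the heartbeat map is essentially a watchdog. Each worker thread periodically stamps a per-thread timestamp, and a checker thread warns when a stamp exceeds the grace period and aborts the process when it exceeds the suicide grace period. The sketch below illustrates the idea with invented names; it is not the actual ceph::HeartbeatMap implementation.)

    #include <cassert>
    #include <cstdio>
    #include <ctime>

    // Hypothetical sketch of the heartbeat-map watchdog idea; not the
    // real ceph::HeartbeatMap.
    struct heartbeat_handle_sketch {
      time_t last_stamp = 0;
      time_t grace = 15;           // "had timed out after 15"
      time_t suicide_grace = 150;  // "had suicide timed out after 150"
    };

    // Called by the worker thread around each unit of work.
    void stamp(heartbeat_handle_sketch* h) { h->last_stamp = time(nullptr); }

    // Called periodically by a dedicated checker thread. If a worker has
    // not stamped within the suicide grace period (e.g. because it is
    // stuck waiting on a mutex), the whole process is deliberately aborted.
    bool check(const heartbeat_handle_sketch* h, time_t now) {
      bool healthy = true;
      if (now - h->last_stamp > h->grace)
        healthy = false;                     // warn: thread timed out
      if (now - h->last_stamp > h->suicide_grace)
        assert(0 == "hit suicide timeout");  // matches the log above
      return healthy;
    }

    int main() {
      heartbeat_handle_sketch h;
      stamp(&h);
      printf("healthy now: %d\n", check(&h, time(nullptr)));
      printf("healthy after 20s of silence: %d\n",
             check(&h, time(nullptr) + 20));
      // check(&h, time(nullptr) + 151) would abort via the suicide assert.
      return 0;
    }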
>>
>> And the thread that is holding the mutex lock which this thread is
>> trying to get is a pipe reader_thread that is trying to read a full
>> message sent from another OSD:
>>
>> (gdb) frame 2
>> #2 0x0000003c5d8094a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
>> (gdb) info reg
>> rax    0xfffffffffffffe00  -512
>> rbx    0x7fd425538d00      140549136026880
>> rcx    0xffffffffffffffff  -1
>> rdx    0x10                16
>> rsi    0x80                128
>> rdi    0x7fd4264535a8      140549151864232
>> rbp    0x7fd440beb680      0x7fd440beb680
>> rsp    0x7fd440beb648      0x7fd440beb648
>> r8     0x7fd4264535a8      140549151864232
>> r9     0x3bc6              15302
>> r10    0x1                 1
>> r11    0x246               582
>> r12    0x4                 4
>> r13    0x7fd440bed9c0      140549596043712
>> r14    0x0                 0
>> r15    0x3                 3
>> rip    0x3c5d8094a7        0x3c5d8094a7 <pthread_mutex_lock+55>
>> eflags 0x246               [ PF ZF IF ]
>> cs     0x33                51
>> ss     0x2b                43
>> ds     0x0                 0
>> es     0x0                 0
>> fs     0x0                 0
>> gs     0x0                 0
>> (gdb) p *(pthread_mutex_t*)0x7fd4264535a8
>> $4 = {__data = {__lock = 2, __count = 0, __owner = 5008, __nusers = 1, __kind = 2, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
>>   __size = "\002\000\000\000\000\000\000\000\220\023\000\000\001\000\000\000\002", '\000' <repeats 22 times>, __align = 2}
>> (gdb) thread 50
>> [Switching to thread 50 (Thread 0x7fd4001f1700 (LWP 5008))]#0 0x0000003c5d80ec2c in recv () from /lib64/libpthread.so.0
>> (gdb) bt
>> #0 0x0000003c5d80ec2c in recv () from /lib64/libpthread.so.0
>> #1 0x0000000001c3a6f8 in Pipe::do_recv (this=0x7fd426561000, buf=0x7fd42649c000 "\a\220\006", len=4096, flags=64) at msg/simple/Pipe.cc:2428
>> #2 0x0000000001c3aa4d in Pipe::buffered_recv (this=0x7fd426561000, buf=0x7fd4224c01d3 "", len=4096, flags=64) at msg/simple/Pipe.cc:2474
>> #3 0x0000000001c3ab0d in Pipe::tcp_read_nonblocking (this=0x7fd426561000, buf=0x7fd4224c0000 "\t\to\030", len=4096) at msg/simple/Pipe.cc:2492
>
> It's supposed to be a non-blocking read, so in principle this shouldn't
> block and hold the lock.
>
> I suspect what you're seeing is a variation of #14120, fixed by
> 63e44e32974c9bae17bb1bfd4261dcb024ad845c.
>
> How reproducible is this? We can push a build that includes the fix, but
> it would be nice to have some confirmation that it is the right one,
> since so far this has been a very rare case.
>
> Thanks!
> sage
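(The non-blocking behavior referred to here can be checked in isolation: flags=64 in the backtrace above is MSG_DONTWAIT, which makes a single recv() call fail immediately with EAGAIN or EWOULDBLOCK when no data is pending, rather than sleep. A minimal standalone POSIX sketch, unrelated to the Ceph Pipe code:)

    #include <cerrno>
    #include <cstdio>
    #include <cstring>
    #include <sys/socket.h>
    #include <unistd.h>

    int main() {
      // A connected socket pair with no data pending on either end.
      int sv[2];
      if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        perror("socketpair");
        return 1;
      }

      char buf[4096];
      // MSG_DONTWAIT (0x40 == 64) makes this single call non-blocking
      // regardless of the socket's O_NONBLOCK state: with nothing to
      // read it should return -1 immediately with errno set to EAGAIN
      // or EWOULDBLOCK.
      ssize_t n = recv(sv[0], buf, sizeof(buf), MSG_DONTWAIT);
      if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        printf("recv returned immediately: %s\n", strerror(errno));
      else
        printf("recv returned %zd\n", n);

      close(sv[0]);
      close(sv[1]);
      return 0;
    }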
>> #4 0x0000000001c37a25 in Pipe::read_message (this=0x7fd426561000, pm=0x7fd4001f0b80, auth_handler=0x7fd422481080) at msg/simple/Pipe.cc:2032
>> #5 0x0000000001c33146 in Pipe::reader (this=0x7fd426561000) at msg/simple/Pipe.cc:1581
>> #6 0x0000000001c3b0c8 in Pipe::Reader::entry (this=0x7fd426561018) at msg/simple/Pipe.h:50
>> #7 0x0000000001b33d40 in Thread::entry_wrapper (this=0x7fd426561018) at common/Thread.cc:61
>> #8 0x0000000001b33cb2 in Thread::_entry_func (arg=0x7fd426561018) at common/Thread.cc:45
>> #9 0x0000003c5d807aa1 in start_thread () from /lib64/libpthread.so.0
>> #10 0x0000003c5d4e8aad in clone () from /lib64/libc.so.6
>>
>> Because we are dropping a large number of IP packets, we think the
>> "recv" call is blocked for a long time, which led to the suicide
>> timeout.
>>
>> We read the source code and found that the lock that directly made the
>> thread hit the suicide timeout is Connection::lock. Why is this lock
>> used in both submit_message and read_message? Is it supposed to be this
>> way?
>>
>> Please help me, thank you☺
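(Putting the two backtraces together: the reader thread holds the per-connection lock across a recv() that stalls under heavy packet loss, so the peering thread that only wants to queue a message blocks on that same lock until the heartbeat watchdog fires. The sketch below reproduces that contention pattern in miniature with a plain std::mutex; it is an illustration of the reported behavior, not the actual SimpleMessenger/PipeConnection code.)

    #include <chrono>
    #include <iostream>
    #include <mutex>
    #include <thread>

    // Hypothetical stand-in for a per-connection lock that is taken both
    // by the reader path (read_message) and the sender path
    // (submit_message).
    std::mutex connection_lock;

    void reader_thread() {
      std::lock_guard<std::mutex> l(connection_lock);
      // Simulate a read that stalls under heavy packet loss while the
      // connection lock is held.
      std::this_thread::sleep_for(std::chrono::seconds(3));
    }

    void sender_thread() {
      auto start = std::chrono::steady_clock::now();
      std::lock_guard<std::mutex> l(connection_lock);  // blocks on reader
      auto waited = std::chrono::duration_cast<std::chrono::seconds>(
          std::chrono::steady_clock::now() - start);
      // In the OSD, a peering thread stalled here long enough would stop
      // stamping its heartbeat and eventually hit the suicide timeout.
      std::cout << "sender waited " << waited.count()
                << "s for the connection lock\n";
    }

    int main() {
      std::thread r(reader_thread);
      // Give the reader a moment to grab the lock first.
      std::this_thread::sleep_for(std::chrono::milliseconds(100));
      std::thread s(sender_thread);
      r.join();
      s.join();
      return 0;
    }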