[ https://issues.apache.org/jira/browse/TS-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
mohan_zl updated TS-971:
------------------------

    Attachment: TS-evacuate-fix.patch

I now understand the cause of this bug, and the attached patch fixes it. With TS-970 and TS-971 both fixed, the cache feature works well.

The bug arises as follows. In Vol::aggWrite, the line "io.thread = AIO_CALLBACK_THREAD_AIO" makes the AIO thread call the continuation handler "Vol::aggWriteDone" directly, and the MUTEX_LOCK there sets vol->mutex->thread_holding to the current AIO thread, which is a DEDICATED thread.

If cache evacuation is enabled, this goes wrong: CacheVC::evacuateDocDone calls do_read_call, which calls CacheVC::handleRead. In CacheVC::handleRead, the line "io.thread = mutex->thread_holding" selects the current AIO thread to receive the completion of the asynchronous I/O. But the AIO thread is a DEDICATED thread and can never be scheduled on this way, so the ink_assert macro fires and ATS crashes.

I think the division of labor between threads is not very clear. For example, the AIO thread is a DEDICATED thread, so if you use it to call a continuation handler, which should be done by a REGULAR thread, bugs like this will happen. Besides, shouldn't epoll_wait be done by a special thread, such as a POLL thread, rather than by an ET_NET thread, which is actually a worker thread?

> Thread error in the cache evacuation feature
> --------------------------------------------
>
>                 Key: TS-971
>                 URL: https://issues.apache.org/jira/browse/TS-971
>             Project: Traffic Server
>          Issue Type: Bug
>            Reporter: mohan_zl
>         Attachments: TS-evacuate-fix.patch
>
>
> After fixing bug TS-970, I went on testing the cache evacuate feature with
> the same environment and test methods. This time, Traffic Server crashes in
> a different part of the code, which is somewhat strange.
> {code}
> (gdb) bt
> #0 0x0000003639c30265 in raise () from /lib64/libc.so.6
> #1 0x0000003639c31d10 in abort () from /lib64/libc.so.6
> #2 0x00002b9258e7e6fa in ink_die_die_die (retval=Could not find the frame base for "ink_die_die_die".
> ) at ink_error.cc:43
> #3 0x00002b9258e7e979 in ink_fatal_va (return_code=Could not find the frame base for "ink_fatal_va".
> ) at ink_error.cc:65
> #4 0x00002b9258e7eb46 in ink_fatal (return_code=Could not find the frame base for "ink_fatal".
> ) at ink_error.cc:73
> #5 0x00002b9258e7c97a in _ink_assert (a=Could not find the frame base for "_ink_assert".
> ) at ink_assert.cc:44
> #6 0x00000000004f45df in EThread::schedule (this=0x2aaaabe9c010, e=0x2aaab4325e00, fast_signal=true)
>     at ../../iocore/eventsystem/P_UnixEThread.h:96
> #7 0x00000000006496db in EThread::schedule_imm_signal (this=0x2aaaabe9c010, cont=0x302b948, callback_event=1, cookie=0x0)
>     at ../../iocore/eventsystem/P_UnixEThread.h:62
> #8 0x00000000006c4427 in aio_thread_main (arg=0x2aaaac0d1820) at AIO.cc:528
> #9 0x00000000006c4afa in AIOThreadInfo::start (this=0x2aaaac0d1820, event=1, e=0x2a05650) at AIO.cc:188
> #10 0x00000000004d3789 in Continuation::handleEvent (this=0x2aaaac0d1820, event=1, data=0x2a05650) at I_Continuation.h:146
> #11 0x00000000006f705b in EThread::execute (this=0x2aaaabe9c010) at UnixEThread.cc:289
> #12 0x00000000006f6307 in spawn_thread_internal (a=0x2aaaac0d1870) at Thread.cc:88
> #13 0x000000363a8064a7 in start_thread () from /lib64/libpthread.so.0
> #14 0x0000003639cd3c2d in clone () from /lib64/libc.so.6
> (gdb) f 8
> #8 0x00000000006c4427 in aio_thread_main (arg=0x2aaaac0d1820) at AIO.cc:528
> 528 op->thread->schedule_imm_signal(op);
> (gdb) p *op
> $1 = {<Continuation> = {<force_VFPT_to_top> = {_vptr.force_VFPT_to_top = 0x760710},
>     handler = 0x68a4f6 <AIOCallbackInternal::io_complete(int, void*)>,
>     handler_name = 0x75dfb0 "&AIOCallbackInternal::io_complete", mutex = {m_ptr = 0x2aaaac261f70},
>     link = {<SLink<Continuation>> = {next = 0x0}, prev = 0x0}},
>   aiocb = {aio_fildes = 38, aio_buf = 0x2aabd80db000, aio_nbytes = 3072, aio_offset = 3359854592,
>     aio_reqprio = 0, aio_lio_opcode = 1, aio_state = 0, aio__pad = {0}},
>   action = {_vptr.Action
>     = 0x0, continuation = 0x302b7c0, mutex = {m_ptr = 0x2aaaac261f70}, cancelled = 0},
>   thread = 0x2aaaabe9c010, then = 0x0, aio_result = 3072}
> (gdb) p ((CacheVC *)op->action->continuation)->io->aiocb
> $1 = {aio_fildes = 38, aio_buf = 0x2aabd80db000, aio_nbytes = 3072, aio_offset = 3359854592,
>   aio_reqprio = 0, aio_lio_opcode = 1, aio_state = 0, aio__pad = {0}}
> (gdb) p ((CacheVC *)op->action->continuation)->handler_name
> $2 = 0x75f372 "&CacheVC::handleReadDone"
> (gdb) p ((CacheVC *)op->action->continuation)->f.evacuator
> $3 = 1
> (gdb) p ((CacheVC *)op->action->continuation)->save_handler
> $4 = 0x6afe9e <CacheVC::evacuateReadHead(int, Event*)>
> (gdb) f 6
> #6 0x00000000004f45df in EThread::schedule (this=0x2aaaabe9c010, e=0x2aaab4325e00, fast_signal=true)
>     at ../../iocore/eventsystem/P_UnixEThread.h:96
> 96 ink_assert(tt == REGULAR);
> (gdb) p tt
> $2 = DEDICATED
> {code}
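
To make the failure described in the comment easier to follow, here is a minimal standalone model of it. This is only a sketch: ModelThread, ModelMutex, issue_read_naive and issue_read_guarded are hypothetical names invented for illustration, not Traffic Server classes or functions. The guarded variant is one possible way to avoid the problem and is not necessarily what TS-evacuate-fix.patch does. The model's schedule() returns false where the real code would hit ink_assert(tt == REGULAR), so the outcome can be inspected instead of aborting.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; not the real Traffic Server types.
enum class ThreadType { REGULAR, DEDICATED };

struct ModelThread {
  ThreadType type;
  std::vector<std::string> queue; // events accepted for later dispatch

  // Models the check seen in frame #6 of the backtrace. The real code
  // asserts; this model reports failure so callers can observe it.
  bool schedule(const std::string &event) {
    if (type != ThreadType::REGULAR) {
      return false; // a DEDICATED thread has no event queue to schedule on
    }
    queue.push_back(event);
    return true;
  }
};

struct ModelMutex {
  ModelThread *thread_holding = nullptr; // thread that last took the lock
};

// Models the pattern "io.thread = mutex->thread_holding": the completion is
// routed back to whichever thread currently holds the lock.
bool issue_read_naive(ModelMutex &m) {
  ModelThread *target = m.thread_holding;
  if (target == nullptr) {
    return false;
  }
  return target->schedule("read done");
}

// One possible guard (an assumption, not the attached patch): if the lock
// holder is not a REGULAR thread, route the completion to a REGULAR one.
bool issue_read_guarded(ModelMutex &m, ModelThread &fallback_regular) {
  ModelThread *target = m.thread_holding;
  if (target == nullptr || target->type != ThreadType::REGULAR) {
    target = &fallback_regular;
  }
  return target->schedule("read done");
}
```

In this model, when the lock holder is a DEDICATED thread (as the AIO thread is after it runs the write-done handler), issue_read_naive fails, while issue_read_guarded hands the completion to the REGULAR fallback thread.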