[ 
https://issues.apache.org/jira/browse/TS-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mohan_zl updated TS-971:
------------------------

    Attachment: TS-evacuate-fix.patch

I now understand the cause of this bug, and the attached patch fixes it. With 
TS-970 and TS-971 both fixed, the cache feature works well. 
The bug arises as follows. In Vol::aggWrite, the line "io.thread = 
AIO_CALLBACK_THREAD_AIO" makes the AIO thread call the continuation handler 
"Vol::aggWriteDone" directly, and the MUTEX_LOCK sets 
vol->mutex->thread_holding to the current AIO thread, which is a DEDICATED 
thread. If cache evacuation is enabled, this goes wrong: 
CacheVC::evacuateDocDone calls do_read_call, which calls CacheVC::handleRead, 
and there the line "io.thread = mutex->thread_holding" assigns the 
asynchronous I/O to the current AIO thread. Because that thread is DEDICATED, 
it can never run scheduled events, so the ink_assert(tt == REGULAR) check in 
EThread::schedule fails and ATS crashes.
I think the division of labor between threads is not very clear. For example, 
the AIO thread is a DEDICATED thread, so using it to call a continuation 
handler, which should be done by a REGULAR thread, leads to bugs. Besides, 
shouldn't epoll_wait be done by a special thread such as a POLL thread rather 
than by an ET_NET thread, which is actually a worker thread? 
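
The failure mode described above can be sketched in isolation. This is not 
the attached patch and does not use the real Traffic Server classes: 
FakeThread, ThreadType and pick_callback_thread are hypothetical names made 
up for illustration, and the guard shown is only one possible way to avoid 
scheduling on a DEDICATED thread.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins; these are NOT the real ATS types.
enum class ThreadType { REGULAR, DEDICATED };

struct FakeThread {
  ThreadType tt;
  int scheduled; // number of events queued on this thread

  explicit FakeThread(ThreadType t) : tt(t), scheduled(0) {}

  // Models the check seen at P_UnixEThread.h:96 in the backtrace:
  // only a REGULAR thread may have events scheduled on it.
  // The real code asserts; this sketch returns false instead.
  bool schedule() {
    if (tt != ThreadType::REGULAR) {
      return false;
    }
    ++scheduled;
    return true;
  }
};

// Assumed guard: use the mutex holder for the I/O callback only when it
// is a REGULAR thread; otherwise fall back to a known REGULAR thread.
FakeThread *pick_callback_thread(FakeThread *holder, FakeThread *fallback_regular) {
  if (holder != nullptr && holder->tt == ThreadType::REGULAR) {
    return holder;
  }
  return fallback_regular;
}
```

With a DEDICATED holder, calling schedule() on it directly is rejected, 
which corresponds to the assert in the backtrace, while 
pick_callback_thread returns the fallback REGULAR thread instead.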
                
> Thread error in the cache evacuation feature
> --------------------------------------------
>
>                 Key: TS-971
>                 URL: https://issues.apache.org/jira/browse/TS-971
>             Project: Traffic Server
>          Issue Type: Bug
>            Reporter: mohan_zl
>         Attachments: TS-evacuate-fix.patch
>
>
> After fixing bug TS-970, I went on testing the cache evacuation feature 
> with the same environment and test methods. This time, Traffic Server 
> crashed in a different, somewhat strange place.
> {code}
> (gdb) bt
> #0  0x0000003639c30265 in raise () from /lib64/libc.so.6
> #1  0x0000003639c31d10 in abort () from /lib64/libc.so.6
> #2  0x00002b9258e7e6fa in ink_die_die_die (retval=Could not find the frame 
> base for "ink_die_die_die".
> ) at ink_error.cc:43
> #3  0x00002b9258e7e979 in ink_fatal_va (return_code=Could not find the frame 
> base for "ink_fatal_va".
> ) at ink_error.cc:65
> #4  0x00002b9258e7eb46 in ink_fatal (return_code=Could not find the frame 
> base for "ink_fatal".
> ) at ink_error.cc:73
> #5  0x00002b9258e7c97a in _ink_assert (a=Could not find the frame base for 
> "_ink_assert".
> ) at ink_assert.cc:44
> #6  0x00000000004f45df in EThread::schedule (this=0x2aaaabe9c010, 
> e=0x2aaab4325e00, fast_signal=true)
>     at ../../iocore/eventsystem/P_UnixEThread.h:96
> #7  0x00000000006496db in EThread::schedule_imm_signal (this=0x2aaaabe9c010, 
> cont=0x302b948, callback_event=1, cookie=0x0)
>     at ../../iocore/eventsystem/P_UnixEThread.h:62
> #8  0x00000000006c4427 in aio_thread_main (arg=0x2aaaac0d1820) at AIO.cc:528
> #9  0x00000000006c4afa in AIOThreadInfo::start (this=0x2aaaac0d1820, event=1, 
> e=0x2a05650) at AIO.cc:188
> #10 0x00000000004d3789 in Continuation::handleEvent (this=0x2aaaac0d1820, 
> event=1, data=0x2a05650) at I_Continuation.h:146
> #11 0x00000000006f705b in EThread::execute (this=0x2aaaabe9c010) at 
> UnixEThread.cc:289
> #12 0x00000000006f6307 in spawn_thread_internal (a=0x2aaaac0d1870) at 
> Thread.cc:88
> #13 0x000000363a8064a7 in start_thread () from /lib64/libpthread.so.0
> #14 0x0000003639cd3c2d in clone () from /lib64/libc.so.6
> (gdb) f 8
> #8  0x00000000006c4427 in aio_thread_main (arg=0x2aaaac0d1820) at AIO.cc:528
> 528             op->thread->schedule_imm_signal(op);
> (gdb) p *op
> $1 = {<Continuation> = {<force_VFPT_to_top> = {_vptr.force_VFPT_to_top = 
> 0x760710}, 
>     handler = 0x68a4f6 <AIOCallbackInternal::io_complete(int, void*)>, 
> handler_name = 0x75dfb0 "&AIOCallbackInternal::io_complete", mutex = {
>       m_ptr = 0x2aaaac261f70}, link = {<SLink<Continuation>> = {next = 0x0}, 
> prev = 0x0}}, aiocb = {aio_fildes = 38, aio_buf = 0x2aabd80db000, 
>     aio_nbytes = 3072, aio_offset = 3359854592, aio_reqprio = 0, 
> aio_lio_opcode = 1, aio_state = 0, aio__pad = {0}}, action = {_vptr.Action = 
> 0x0, 
>     continuation = 0x302b7c0, mutex = {m_ptr = 0x2aaaac261f70}, cancelled = 
> 0}, thread = 0x2aaaabe9c010, then = 0x0, aio_result = 3072}
> (gdb) p ((CacheVC *)op->action->continuation)->io->aiocb
> $1 = {aio_fildes = 38, aio_buf = 0x2aabd80db000, aio_nbytes = 3072, 
> aio_offset = 3359854592, aio_reqprio = 0, aio_lio_opcode = 1, aio_state = 0, 
>   aio__pad = {0}}
> (gdb) p ((CacheVC *)op->action->continuation)->handler_name
> $2 = 0x75f372 "&CacheVC::handleReadDone"
> (gdb) p ((CacheVC *)op->action->continuation)->f.evacuator
> $3 = 1
> (gdb) p ((CacheVC *)op->action->continuation)->save_handler
> $4 = 0x6afe9e <CacheVC::evacuateReadHead(int, Event*)>
> (gdb) f 6
> #6  0x00000000004f45df in EThread::schedule (this=0x2aaaabe9c010, 
> e=0x2aaab4325e00, fast_signal=true)
>     at ../../iocore/eventsystem/P_UnixEThread.h:96
> 96        ink_assert(tt == REGULAR);
> (gdb) p tt
> $2 = DEDICATED
> {code}


        
