[ 
https://issues.apache.org/jira/browse/TS-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13727276#comment-13727276
 ] 

ASF subversion and git services commented on TS-1330:
-----------------------------------------------------

Commit c1c3db9d7b524176e611a6aba3066d1a732b4bd2 in branch refs/heads/t from 
[~yunkai]
[ https://git-wip-us.apache.org/repos/asf?p=trafficserver.git;h=c1c3db9 ]

TS-1330: Fix logging core at checkout_write()

== Analysis ==
1. One thread(A) got an LogBuffer object in LogObject::_checkout_write() and ran
   into LogBuffer::checkout_write(), before it updated the LogBuffer::m_state,
   it was scheduled out by the kernel.

2. Other threads got the same LogBuffer object, but they all finished
   _checkout_write() codes and made the LogBuffer object become full, finally
   one of these thread flushed the LogBuffer object into write_list queue.

3. The LogBuffer object's m_references == 0 at this time.

4. The flushing thread would call LogBufferManager::flush_buffers() to loop
   write_list. As the LogBuffer's m_references == 0, so it would be flushed
   and deleted.

5. Now, thread(A) would be scheduled in by the kernel, but the LogBuffer object
   has been deleted, so the m_buffer == NULL, that is why entry_header is 0x0.

== Solution ==
1. Remove flushing code from LogObject::log(), we should flush LogBuffer into
   queue only when new_buffer is created, so that we can promise that LogBuffer
   must be flushed only once.

2. The flushing thread should both check m_references and m_state.s.num_writers.

3. When allocate a new_buffer, should avoid memory leak.

Signed-off-by: Yunkai Zhang <qiushu....@taobao.com>

                
> Logging related segfault in 3.2.0
> ---------------------------------
>
>                 Key: TS-1330
>                 URL: https://issues.apache.org/jira/browse/TS-1330
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Logging
>    Affects Versions: 3.2.0
>         Environment: ATS 3.2.0 on RHEL 6.2 64-bit
>            Reporter: David Carlin
>            Assignee: Yunkai Zhang
>            Priority: Critical
>              Labels: A, crash
>             Fix For: 3.3.5
>
>         Attachments: 0001-A-testing-patch-to-verify-my-analysis.patch, 
> 0001-TS-1330-Fix-logging-core-at-checkout_write.patch, 
> 0001-TS-1330-Fix-logging-core-at-checkout_write.patch.V2
>
>
> I observed the following crash once on one of our ATS boxes - possibly 
> related to TS-1240
> Jul  2 13:56:56 l2 traffic_server[25853]: {0x2b0a391e1700} ERROR: 
> [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO 
> error: Connection reset by peer
> Jul  2 13:59:56 l2 kernel: [ET_NET 1][25855]: segfault at c ip 
> 000000000058e083 sp 00002b0a2982b740 error 6
> Jul  2 13:59:56 l2 kernel: [ET_NET 3][25857]: segfault at 84 ip 
> 000000000058e083 sp 00002b0a29a31740 error 6 in traffic_server[400000+340000]
> Jul  2 13:59:56 l2 kernel: in traffic_server[400000+340000]
> Jul  2 14:02:59 l2 traffic_cop[25901]: (test) read timeout [180000 ]
> Jul  2 14:02:59 l2 traffic_cop[25901]: server heartbeat failed [1]
> Jul  2 14:03:08 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL: 
> [LocalManager::pollMgmtProcessServer] Error in read (errno: 104)
> Jul  2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL:  (last 
> system error 104: Connection reset by peer)
> Jul  2 14:03:09 l2 traffic_cop[25901]: cannot find traffic_server [1]
> Jul  2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} ERROR: 
> [LocalManager::sendMgmtMsgToProcesses] Error writing message
> Jul  2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} ERROR:  (last 
> system error 32: Broken pipe)
> Jul  2 14:03:12 l2 traffic_cop[25901]: cop received child status signal 
> [25826 35584]
> Jul  2 14:03:12 l2 traffic_cop[25901]: traffic_manager not running, making 
> sure traffic_server is dead
> Jul  2 14:03:12 l2 traffic_cop[25901]: spawning traffic_manager
> Jul  2 14:03:13 l2 traffic_manager[18267]: NOTE: --- Manager Starting ---
> Jul  2 14:03:13 l2 traffic_manager[18267]: NOTE: Manager Version: Apache 
> Traffic Server - traffic_manager - 3.2.0 - (build # 52518 on Jun 25 2012 at 
> 18:22:12)
> Jul  2 14:03:13 l2 traffic_manager[18267]: {0x7fe63de3f7e0} STATUS: opened 
> /home/y/logs/trafficserver/manager.log
> Jul  2 14:03:15 l2 traffic_server[18322]: NOTE: --- Server Starting ---
> Jul  2 14:03:15 l2 traffic_server[18322]: NOTE: Server Version: Apache 
> Traffic Server - traffic_server - 3.2.0 - (build # 52518 on Jun 25 2012 at 
> 18:22:31)
> Jul  2 14:03:15 l2 traffic_server[18322]: {0x2b77573ab860} STATUS: opened 
> /home/y/logs/trafficserver/diags.log
> Jul  2 14:03:15 l2 traffic_server[18322]: {0x2b77573ab860} ERROR: Cannot 
> insert duplicate!
> Jul  2 14:03:22 l2 traffic_cop[25901]: server heartbeat succeeded
> [Jul  2 13:56:56.304] Server {0x2b0a391e1700} ERROR: 
> [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO 
> error: Connection reset by peer
> NOTE: Traffic Server received Sig 11: Segmentation fault
> NOTE: Traffic Server received Sig 11: Segmentation fault
> /home/y/bin/traffic_server - STACK TRACE: 
> /home/y/bin/traffic_server - STACK TRACE: 
> /lib64/libpthread.so.0[0x3b54e0f4a0]
> /lib64/libpthread.so.0[0x3b54e0f4a0]
> /home/y/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e083]
> /home/y/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a64c8]
> /home/y/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a7e30]
> /home/y/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e083]
> /home/y/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a64c8]
> /home/y/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a7e30]
> /home/y/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f506]
> /home/y/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526c50]
> /home/y/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f506]
> /home/y/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b548]
> /home/y/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526c50]
> /home/y/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b548]
> /home/y/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x198)[0x52b868]
> /home/y/bin/traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0xde)[0x56c3ee]
> /home/y/bin/traffic_server[0x6736a1]
> /home/y/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x198)[0x52b868]
> /home/y/bin/traffic_server(_Z15write_to_net_ioP10NetHandlerP18UnixNetVConnectionP7EThread+0x847)[0x675517]
> /home/y/bin/traffic_server[0x672f81]
> /home/y/bin/traffic_server(_ZN10NetHandler12mainNetEventEiP5Event+0x286)[0x66df96]
> /home/y/bin/traffic_server[0x676ad5]
> /home/y/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0xb4)[0x696b14]
> /home/y/bin/traffic_server(_ZN10NetHandler12mainNetEventEiP5Event+0x1f2)[0x66df02]
> /home/y/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0xb4)[0x696b14]
> /home/y/bin/traffic_server(_ZN7EThread7executeEv+0x4c3)[0x6974a3]
> /home/y/bin/traffic_server[0x695ae2]
> /lib64/libpthread.so.0[0x3b54e077f1]
> /lib64/libc.so.6(clone+0x6d)[0x3b54ae5ccd]
> /home/y/bin/traffic_server(_ZN7EThread7executeEv+0x4c3)[0x6974a3]
> /home/y/bin/traffic_server[0x695ae2]
> /lib64/libpthread.so.0[0x3b54e077f1]
> /lib64/libc.so.6(clone+0x6d)[0x3b54ae5ccd]
> [Jul  2 14:03:08.684] Manager {0x7f3f088607e0} FATAL: 
> [LocalManager::pollMgmtProcessServer] Error in read (errno: 104)
> [Jul  2 14:03:09.213] Manager {0x7f3f088607e0} FATAL:  (last system error 
> 104: Connection reset by peer)
> [Jul  2 14:03:09.213] Manager {0x7f3f088607e0} NOTE: 
> [LocalManager::mgmtShutdown] Executing shutdown request.
> [Jul  2 14:03:09.213] Manager {0x7f3f088607e0} NOTE: 
> [LocalManager::processShutdown] Executing process shutdown request.
> [Jul  2 14:03:09.496] Manager {0x7f3f088607e0} ERROR: 
> [LocalManager::sendMgmtMsgToProcesses] Error writing message
> [Jul  2 14:03:09.496] Manager {0x7f3f088607e0} ERROR:  (last system error 32: 
> Broken pipe)
> [example_alarm_bin.sh] sent alarm: l2.ycs.uls.yahoo.com [Mon Jul  2 14:03:09 
> 2012] The TS-TM connection is broken for some reason. Either restart TS and 
> TM or correct this error for TM to display TS statistics correctly

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to