[jira] [Commented] (TS-1330) Logging related segfault in 3.2.0
[ https://issues.apache.org/jira/browse/TS-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13728651#comment-13728651 ] ASF subversion and git services commented on TS-1330: - Commit 1bb12edf9cd9b0076eaf39b8665909846743ee73 in branch refs/heads/3.3.x from [~zwoop] [ https://git-wip-us.apache.org/repos/asf?p=trafficserver.git;h=1bb12ed ] Merge branch 'master' into 3.3.x * master: (295 commits) TS-2081 Make the WCCP addr configuration LOCAL TS-2093: Check bounds on plugin stat creation. TS-2092 Use of uninitialized member in HdrHeap. [TS-2052] ET_SSL thread spinning TS-2090 Make proxy.config.allocator.enable_reclaim default based on build instructions TS-1006: adjust some reclaimable-freelist's default configuration Updated newish tests to the gitignore Fix autoconf checks for mcheck_pedantic() Fix configure check for eventfd TS-1330: Fix logging core at checkout_write() add some standard extensions to ignore list TS-1953: remove version check from stable plugins TS-1953: Remove check_ts_version() from experimental plugins TS-1953: remove check_ts_version() from examples Added TS-2086. TS-2086 Remove a few more unused configs Added TS-1685 TS-1685 Remove TS_MICRO and fellas Added Ts-1255 TS-1255 Fix the types for all overridable configs. This was actually a real bug in the code, in that all float configurations were actually treated as integer, rendering them useless ... Logging related segfault in 3.2.0 - Key: TS-1330 URL: https://issues.apache.org/jira/browse/TS-1330 Project: Traffic Server Issue Type: Bug Components: Logging Affects Versions: 3.2.0 Environment: ATS 3.2.0 on RHEL 6.2 64-bit Reporter: David Carlin Assignee: Yunkai Zhang Priority: Critical Labels: A, crash Fix For: 3.3.5 Attachments: 0001-A-testing-patch-to-verify-my-analysis.patch, 0001-TS-1330-Fix-logging-core-at-checkout_write.patch, 0001-TS-1330-Fix-logging-core-at-checkout_write.patch.V2 I observed the following crash once on one of our ATS boxes - possibly related to TS-1240 Jul 2 13:56:56 l2 traffic_server[25853]: {0x2b0a391e1700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer Jul 2 13:59:56 l2 kernel: [ET_NET 1][25855]: segfault at c ip 0058e083 sp 2b0a2982b740 error 6 Jul 2 13:59:56 l2 kernel: [ET_NET 3][25857]: segfault at 84 ip 0058e083 sp 2b0a29a31740 error 6 in traffic_server[40+34] Jul 2 13:59:56 l2 kernel: in traffic_server[40+34] Jul 2 14:02:59 l2 traffic_cop[25901]: (test) read timeout [18 ] Jul 2 14:02:59 l2 traffic_cop[25901]: server heartbeat failed [1] Jul 2 14:03:08 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL: (last system error 104: Connection reset by peer) Jul 2 14:03:09 l2 traffic_cop[25901]: cannot find traffic_server [1] Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} ERROR: [LocalManager::sendMgmtMsgToProcesses] Error writing message Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} ERROR: (last system error 32: Broken pipe) Jul 2 14:03:12 l2 traffic_cop[25901]: cop received child status signal [25826 35584] Jul 2 14:03:12 l2 traffic_cop[25901]: traffic_manager not running, making sure traffic_server is dead Jul 2 14:03:12 l2 traffic_cop[25901]: spawning traffic_manager Jul 2 14:03:13 l2 traffic_manager[18267]: NOTE: --- Manager Starting --- Jul 2 14:03:13 l2 traffic_manager[18267]: NOTE: Manager Version: Apache Traffic Server - traffic_manager - 3.2.0 - (build # 52518 on Jun 25 2012 at 18:22:12) Jul 2 14:03:13 l2 traffic_manager[18267]: {0x7fe63de3f7e0} STATUS: opened /home/y/logs/trafficserver/manager.log Jul 2 14:03:15 l2 traffic_server[18322]: NOTE: --- Server Starting --- Jul 2 14:03:15 l2 traffic_server[18322]: NOTE: Server Version: Apache Traffic Server - traffic_server - 3.2.0 - (build # 52518 on Jun 25 2012 at 18:22:31) Jul 2 14:03:15 l2 traffic_server[18322]: {0x2b77573ab860} STATUS: opened /home/y/logs/trafficserver/diags.log Jul 2 14:03:15 l2 traffic_server[18322]: {0x2b77573ab860} ERROR: Cannot insert duplicate! Jul 2 14:03:22 l2 traffic_cop[25901]: server heartbeat succeeded [Jul 2 13:56:56.304] Server {0x2b0a391e1700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer NOTE: Traffic Server received Sig 11: Segmentation fault NOTE: Traffic Server received Sig 11: Segmentation fault /home/y/bin/traffic_server - STACK TRACE:
[jira] [Commented] (TS-1330) Logging related segfault in 3.2.0
[ https://issues.apache.org/jira/browse/TS-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13727276#comment-13727276 ] ASF subversion and git services commented on TS-1330: - Commit c1c3db9d7b524176e611a6aba3066d1a732b4bd2 in branch refs/heads/t from [~yunkai] [ https://git-wip-us.apache.org/repos/asf?p=trafficserver.git;h=c1c3db9 ] TS-1330: Fix logging core at checkout_write() == Analysis == 1. One thread(A) got an LogBuffer object in LogObject::_checkout_write() and ran into LogBuffer::checkout_write(), before it updated the LogBuffer::m_state, it was scheduled out by the kernel. 2. Other threads got the same LogBuffer object, but they all finished _checkout_write() codes and made the LogBuffer object become full, finally one of these thread flushed the LogBuffer object into write_list queue. 3. The LogBuffer object's m_references == 0 at this time. 4. The flushing thread would call LogBufferManager::flush_buffers() to loop write_list. As the LogBuffer's m_references == 0, so it would be flushed and deleted. 5. Now, thread(A) would be scheduled in by the kernel, but the LogBuffer object has been deleted, so the m_buffer == NULL, that is why entry_header is 0x0. == Solution == 1. Remove flushing code from LogObject::log(), we should flush LogBuffer into queue only when new_buffer is created, so that we can promise that LogBuffer must be flushed only once. 2. The flushing thread should both check m_references and m_state.s.num_writers. 3. When allocate a new_buffer, should avoid memory leak. Signed-off-by: Yunkai Zhang qiushu@taobao.com Logging related segfault in 3.2.0 - Key: TS-1330 URL: https://issues.apache.org/jira/browse/TS-1330 Project: Traffic Server Issue Type: Bug Components: Logging Affects Versions: 3.2.0 Environment: ATS 3.2.0 on RHEL 6.2 64-bit Reporter: David Carlin Assignee: Yunkai Zhang Priority: Critical Labels: A, crash Fix For: 3.3.5 Attachments: 0001-A-testing-patch-to-verify-my-analysis.patch, 0001-TS-1330-Fix-logging-core-at-checkout_write.patch, 0001-TS-1330-Fix-logging-core-at-checkout_write.patch.V2 I observed the following crash once on one of our ATS boxes - possibly related to TS-1240 Jul 2 13:56:56 l2 traffic_server[25853]: {0x2b0a391e1700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer Jul 2 13:59:56 l2 kernel: [ET_NET 1][25855]: segfault at c ip 0058e083 sp 2b0a2982b740 error 6 Jul 2 13:59:56 l2 kernel: [ET_NET 3][25857]: segfault at 84 ip 0058e083 sp 2b0a29a31740 error 6 in traffic_server[40+34] Jul 2 13:59:56 l2 kernel: in traffic_server[40+34] Jul 2 14:02:59 l2 traffic_cop[25901]: (test) read timeout [18 ] Jul 2 14:02:59 l2 traffic_cop[25901]: server heartbeat failed [1] Jul 2 14:03:08 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL: (last system error 104: Connection reset by peer) Jul 2 14:03:09 l2 traffic_cop[25901]: cannot find traffic_server [1] Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} ERROR: [LocalManager::sendMgmtMsgToProcesses] Error writing message Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} ERROR: (last system error 32: Broken pipe) Jul 2 14:03:12 l2 traffic_cop[25901]: cop received child status signal [25826 35584] Jul 2 14:03:12 l2 traffic_cop[25901]: traffic_manager not running, making sure traffic_server is dead Jul 2 14:03:12 l2 traffic_cop[25901]: spawning traffic_manager Jul 2 14:03:13 l2 traffic_manager[18267]: NOTE: --- Manager Starting --- Jul 2 14:03:13 l2 traffic_manager[18267]: NOTE: Manager Version: Apache Traffic Server - traffic_manager - 3.2.0 - (build # 52518 on Jun 25 2012 at 18:22:12) Jul 2 14:03:13 l2 traffic_manager[18267]: {0x7fe63de3f7e0} STATUS: opened /home/y/logs/trafficserver/manager.log Jul 2 14:03:15 l2 traffic_server[18322]: NOTE: --- Server Starting --- Jul 2 14:03:15 l2 traffic_server[18322]: NOTE: Server Version: Apache Traffic Server - traffic_server - 3.2.0 - (build # 52518 on Jun 25 2012 at 18:22:31) Jul 2 14:03:15 l2 traffic_server[18322]: {0x2b77573ab860} STATUS: opened /home/y/logs/trafficserver/diags.log Jul 2 14:03:15 l2 traffic_server[18322]: {0x2b77573ab860} ERROR: Cannot insert duplicate! Jul 2 14:03:22 l2 traffic_cop[25901]: server heartbeat succeeded [Jul 2 13:56:56.304] Server {0x2b0a391e1700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer
[jira] [Commented] (TS-1330) Logging related segfault in 3.2.0
[ https://issues.apache.org/jira/browse/TS-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13727281#comment-13727281 ] ASF subversion and git services commented on TS-1330: - Commit 3e05261c675b15a76f178e1d3e579c6941f42d20 in branch refs/heads/master from [~yunkai] [ https://git-wip-us.apache.org/repos/asf?p=trafficserver.git;h=3e05261 ] TS-1330: Fix logging core at checkout_write() == Analysis == 1. One thread(A) got an LogBuffer object in LogObject::_checkout_write() and ran into LogBuffer::checkout_write(), before it updated the LogBuffer::m_state, it was scheduled out by the kernel. 2. Other threads got the same LogBuffer object, but they all finished _checkout_write() codes and made the LogBuffer object become full, finally one of these thread flushed the LogBuffer object into write_list queue. 3. The LogBuffer object's m_references == 0 at this time. 4. The flushing thread would call LogBufferManager::flush_buffers() to loop write_list. As the LogBuffer's m_references == 0, so it would be flushed and deleted. 5. Now, thread(A) would be scheduled in by the kernel, but the LogBuffer object has been deleted, so the m_buffer == NULL, that is why entry_header is 0x0. == Solution == 1. Remove flushing code from LogObject::log(), we should flush LogBuffer into queue only when new_buffer is created, so that we can promise that LogBuffer must be flushed only once. 2. The flushing thread should both check m_references and m_state.s.num_writers. 3. When allocate a new_buffer, should avoid memory leak. Signed-off-by: Yunkai Zhang qiushu@taobao.com Logging related segfault in 3.2.0 - Key: TS-1330 URL: https://issues.apache.org/jira/browse/TS-1330 Project: Traffic Server Issue Type: Bug Components: Logging Affects Versions: 3.2.0 Environment: ATS 3.2.0 on RHEL 6.2 64-bit Reporter: David Carlin Assignee: Yunkai Zhang Priority: Critical Labels: A, crash Fix For: 3.3.5 Attachments: 0001-A-testing-patch-to-verify-my-analysis.patch, 0001-TS-1330-Fix-logging-core-at-checkout_write.patch, 0001-TS-1330-Fix-logging-core-at-checkout_write.patch.V2 I observed the following crash once on one of our ATS boxes - possibly related to TS-1240 Jul 2 13:56:56 l2 traffic_server[25853]: {0x2b0a391e1700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer Jul 2 13:59:56 l2 kernel: [ET_NET 1][25855]: segfault at c ip 0058e083 sp 2b0a2982b740 error 6 Jul 2 13:59:56 l2 kernel: [ET_NET 3][25857]: segfault at 84 ip 0058e083 sp 2b0a29a31740 error 6 in traffic_server[40+34] Jul 2 13:59:56 l2 kernel: in traffic_server[40+34] Jul 2 14:02:59 l2 traffic_cop[25901]: (test) read timeout [18 ] Jul 2 14:02:59 l2 traffic_cop[25901]: server heartbeat failed [1] Jul 2 14:03:08 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL: (last system error 104: Connection reset by peer) Jul 2 14:03:09 l2 traffic_cop[25901]: cannot find traffic_server [1] Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} ERROR: [LocalManager::sendMgmtMsgToProcesses] Error writing message Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} ERROR: (last system error 32: Broken pipe) Jul 2 14:03:12 l2 traffic_cop[25901]: cop received child status signal [25826 35584] Jul 2 14:03:12 l2 traffic_cop[25901]: traffic_manager not running, making sure traffic_server is dead Jul 2 14:03:12 l2 traffic_cop[25901]: spawning traffic_manager Jul 2 14:03:13 l2 traffic_manager[18267]: NOTE: --- Manager Starting --- Jul 2 14:03:13 l2 traffic_manager[18267]: NOTE: Manager Version: Apache Traffic Server - traffic_manager - 3.2.0 - (build # 52518 on Jun 25 2012 at 18:22:12) Jul 2 14:03:13 l2 traffic_manager[18267]: {0x7fe63de3f7e0} STATUS: opened /home/y/logs/trafficserver/manager.log Jul 2 14:03:15 l2 traffic_server[18322]: NOTE: --- Server Starting --- Jul 2 14:03:15 l2 traffic_server[18322]: NOTE: Server Version: Apache Traffic Server - traffic_server - 3.2.0 - (build # 52518 on Jun 25 2012 at 18:22:31) Jul 2 14:03:15 l2 traffic_server[18322]: {0x2b77573ab860} STATUS: opened /home/y/logs/trafficserver/diags.log Jul 2 14:03:15 l2 traffic_server[18322]: {0x2b77573ab860} ERROR: Cannot insert duplicate! Jul 2 14:03:22 l2 traffic_cop[25901]: server heartbeat succeeded [Jul 2 13:56:56.304] Server {0x2b0a391e1700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by
[jira] [Commented] (TS-1330) Logging related segfault in 3.2.0
[ https://issues.apache.org/jira/browse/TS-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13726007#comment-13726007 ] Yunkai Zhang commented on TS-1330: -- I have tested V2 patch, it fixed this issue completely, it's time to merge it:). Logging related segfault in 3.2.0 - Key: TS-1330 URL: https://issues.apache.org/jira/browse/TS-1330 Project: Traffic Server Issue Type: Bug Components: Logging Affects Versions: 3.2.0 Environment: ATS 3.2.0 on RHEL 6.2 64-bit Reporter: David Carlin Assignee: Yunkai Zhang Priority: Critical Labels: A, crash Fix For: 3.3.5 Attachments: 0001-A-testing-patch-to-verify-my-analysis.patch, 0001-TS-1330-Fix-logging-core-at-checkout_write.patch, 0001-TS-1330-Fix-logging-core-at-checkout_write.patch.V2 I observed the following crash once on one of our ATS boxes - possibly related to TS-1240 Jul 2 13:56:56 l2 traffic_server[25853]: {0x2b0a391e1700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer Jul 2 13:59:56 l2 kernel: [ET_NET 1][25855]: segfault at c ip 0058e083 sp 2b0a2982b740 error 6 Jul 2 13:59:56 l2 kernel: [ET_NET 3][25857]: segfault at 84 ip 0058e083 sp 2b0a29a31740 error 6 in traffic_server[40+34] Jul 2 13:59:56 l2 kernel: in traffic_server[40+34] Jul 2 14:02:59 l2 traffic_cop[25901]: (test) read timeout [18 ] Jul 2 14:02:59 l2 traffic_cop[25901]: server heartbeat failed [1] Jul 2 14:03:08 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL: (last system error 104: Connection reset by peer) Jul 2 14:03:09 l2 traffic_cop[25901]: cannot find traffic_server [1] Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} ERROR: [LocalManager::sendMgmtMsgToProcesses] Error writing message Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} ERROR: (last system error 32: Broken pipe) Jul 2 14:03:12 l2 traffic_cop[25901]: cop received child status signal [25826 35584] Jul 2 14:03:12 l2 traffic_cop[25901]: traffic_manager not running, making sure traffic_server is dead Jul 2 14:03:12 l2 traffic_cop[25901]: spawning traffic_manager Jul 2 14:03:13 l2 traffic_manager[18267]: NOTE: --- Manager Starting --- Jul 2 14:03:13 l2 traffic_manager[18267]: NOTE: Manager Version: Apache Traffic Server - traffic_manager - 3.2.0 - (build # 52518 on Jun 25 2012 at 18:22:12) Jul 2 14:03:13 l2 traffic_manager[18267]: {0x7fe63de3f7e0} STATUS: opened /home/y/logs/trafficserver/manager.log Jul 2 14:03:15 l2 traffic_server[18322]: NOTE: --- Server Starting --- Jul 2 14:03:15 l2 traffic_server[18322]: NOTE: Server Version: Apache Traffic Server - traffic_server - 3.2.0 - (build # 52518 on Jun 25 2012 at 18:22:31) Jul 2 14:03:15 l2 traffic_server[18322]: {0x2b77573ab860} STATUS: opened /home/y/logs/trafficserver/diags.log Jul 2 14:03:15 l2 traffic_server[18322]: {0x2b77573ab860} ERROR: Cannot insert duplicate! Jul 2 14:03:22 l2 traffic_cop[25901]: server heartbeat succeeded [Jul 2 13:56:56.304] Server {0x2b0a391e1700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer NOTE: Traffic Server received Sig 11: Segmentation fault NOTE: Traffic Server received Sig 11: Segmentation fault /home/y/bin/traffic_server - STACK TRACE: /home/y/bin/traffic_server - STACK TRACE: /lib64/libpthread.so.0[0x3b54e0f4a0] /lib64/libpthread.so.0[0x3b54e0f4a0] /home/y/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e083] /home/y/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a64c8] /home/y/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a7e30] /home/y/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e083] /home/y/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a64c8] /home/y/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a7e30] /home/y/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f506] /home/y/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526c50] /home/y/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f506] /home/y/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b548] /home/y/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526c50] /home/y/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b548] /home/y/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x198)[0x52b868] /home/y/bin/traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0xde)[0x56c3ee] /home/y/bin/traffic_server[0x6736a1]
[jira] [Commented] (TS-1330) Logging related segfault in 3.2.0
[ https://issues.apache.org/jira/browse/TS-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723768#comment-13723768 ] Leif Hedstrom commented on TS-1330: --- Yunkai: lets move the discussion here. Can we expect a patch before friday ? Logging related segfault in 3.2.0 - Key: TS-1330 URL: https://issues.apache.org/jira/browse/TS-1330 Project: Traffic Server Issue Type: Bug Components: Logging Affects Versions: 3.2.0 Environment: ATS 3.2.0 on RHEL 6.2 64-bit Reporter: David Carlin Assignee: Bryan Call Priority: Critical Labels: A, crash Fix For: 3.5.0 I observed the following crash once on one of our ATS boxes - possibly related to TS-1240 Jul 2 13:56:56 l2 traffic_server[25853]: {0x2b0a391e1700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer Jul 2 13:59:56 l2 kernel: [ET_NET 1][25855]: segfault at c ip 0058e083 sp 2b0a2982b740 error 6 Jul 2 13:59:56 l2 kernel: [ET_NET 3][25857]: segfault at 84 ip 0058e083 sp 2b0a29a31740 error 6 in traffic_server[40+34] Jul 2 13:59:56 l2 kernel: in traffic_server[40+34] Jul 2 14:02:59 l2 traffic_cop[25901]: (test) read timeout [18 ] Jul 2 14:02:59 l2 traffic_cop[25901]: server heartbeat failed [1] Jul 2 14:03:08 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL: (last system error 104: Connection reset by peer) Jul 2 14:03:09 l2 traffic_cop[25901]: cannot find traffic_server [1] Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} ERROR: [LocalManager::sendMgmtMsgToProcesses] Error writing message Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} ERROR: (last system error 32: Broken pipe) Jul 2 14:03:12 l2 traffic_cop[25901]: cop received child status signal [25826 35584] Jul 2 14:03:12 l2 traffic_cop[25901]: traffic_manager not running, making sure traffic_server is dead Jul 2 14:03:12 l2 traffic_cop[25901]: spawning traffic_manager Jul 2 14:03:13 l2 traffic_manager[18267]: NOTE: --- Manager Starting --- Jul 2 14:03:13 l2 traffic_manager[18267]: NOTE: Manager Version: Apache Traffic Server - traffic_manager - 3.2.0 - (build # 52518 on Jun 25 2012 at 18:22:12) Jul 2 14:03:13 l2 traffic_manager[18267]: {0x7fe63de3f7e0} STATUS: opened /home/y/logs/trafficserver/manager.log Jul 2 14:03:15 l2 traffic_server[18322]: NOTE: --- Server Starting --- Jul 2 14:03:15 l2 traffic_server[18322]: NOTE: Server Version: Apache Traffic Server - traffic_server - 3.2.0 - (build # 52518 on Jun 25 2012 at 18:22:31) Jul 2 14:03:15 l2 traffic_server[18322]: {0x2b77573ab860} STATUS: opened /home/y/logs/trafficserver/diags.log Jul 2 14:03:15 l2 traffic_server[18322]: {0x2b77573ab860} ERROR: Cannot insert duplicate! Jul 2 14:03:22 l2 traffic_cop[25901]: server heartbeat succeeded [Jul 2 13:56:56.304] Server {0x2b0a391e1700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer NOTE: Traffic Server received Sig 11: Segmentation fault NOTE: Traffic Server received Sig 11: Segmentation fault /home/y/bin/traffic_server - STACK TRACE: /home/y/bin/traffic_server - STACK TRACE: /lib64/libpthread.so.0[0x3b54e0f4a0] /lib64/libpthread.so.0[0x3b54e0f4a0] /home/y/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e083] /home/y/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a64c8] /home/y/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a7e30] /home/y/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e083] /home/y/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a64c8] /home/y/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a7e30] /home/y/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f506] /home/y/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526c50] /home/y/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f506] /home/y/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b548] /home/y/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526c50] /home/y/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b548] /home/y/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x198)[0x52b868] /home/y/bin/traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0xde)[0x56c3ee] /home/y/bin/traffic_server[0x6736a1] /home/y/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x198)[0x52b868] /home/y/bin/traffic_server(_Z15write_to_net_ioP10NetHandlerP18UnixNetVConnectionP7EThread+0x847)[0x675517]
[jira] [Commented] (TS-1330) Logging related segfault in 3.2.0
[ https://issues.apache.org/jira/browse/TS-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723775#comment-13723775 ] Yunkai Zhang commented on TS-1330: -- [~zwoop] Yes, maybe tonight:) Logging related segfault in 3.2.0 - Key: TS-1330 URL: https://issues.apache.org/jira/browse/TS-1330 Project: Traffic Server Issue Type: Bug Components: Logging Affects Versions: 3.2.0 Environment: ATS 3.2.0 on RHEL 6.2 64-bit Reporter: David Carlin Assignee: Bryan Call Priority: Critical Labels: A, crash Fix For: 3.5.0 I observed the following crash once on one of our ATS boxes - possibly related to TS-1240 Jul 2 13:56:56 l2 traffic_server[25853]: {0x2b0a391e1700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer Jul 2 13:59:56 l2 kernel: [ET_NET 1][25855]: segfault at c ip 0058e083 sp 2b0a2982b740 error 6 Jul 2 13:59:56 l2 kernel: [ET_NET 3][25857]: segfault at 84 ip 0058e083 sp 2b0a29a31740 error 6 in traffic_server[40+34] Jul 2 13:59:56 l2 kernel: in traffic_server[40+34] Jul 2 14:02:59 l2 traffic_cop[25901]: (test) read timeout [18 ] Jul 2 14:02:59 l2 traffic_cop[25901]: server heartbeat failed [1] Jul 2 14:03:08 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL: (last system error 104: Connection reset by peer) Jul 2 14:03:09 l2 traffic_cop[25901]: cannot find traffic_server [1] Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} ERROR: [LocalManager::sendMgmtMsgToProcesses] Error writing message Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} ERROR: (last system error 32: Broken pipe) Jul 2 14:03:12 l2 traffic_cop[25901]: cop received child status signal [25826 35584] Jul 2 14:03:12 l2 traffic_cop[25901]: traffic_manager not running, making sure traffic_server is dead Jul 2 14:03:12 l2 traffic_cop[25901]: spawning traffic_manager Jul 2 14:03:13 l2 traffic_manager[18267]: NOTE: --- Manager Starting --- Jul 2 14:03:13 l2 traffic_manager[18267]: NOTE: Manager Version: Apache Traffic Server - traffic_manager - 3.2.0 - (build # 52518 on Jun 25 2012 at 18:22:12) Jul 2 14:03:13 l2 traffic_manager[18267]: {0x7fe63de3f7e0} STATUS: opened /home/y/logs/trafficserver/manager.log Jul 2 14:03:15 l2 traffic_server[18322]: NOTE: --- Server Starting --- Jul 2 14:03:15 l2 traffic_server[18322]: NOTE: Server Version: Apache Traffic Server - traffic_server - 3.2.0 - (build # 52518 on Jun 25 2012 at 18:22:31) Jul 2 14:03:15 l2 traffic_server[18322]: {0x2b77573ab860} STATUS: opened /home/y/logs/trafficserver/diags.log Jul 2 14:03:15 l2 traffic_server[18322]: {0x2b77573ab860} ERROR: Cannot insert duplicate! Jul 2 14:03:22 l2 traffic_cop[25901]: server heartbeat succeeded [Jul 2 13:56:56.304] Server {0x2b0a391e1700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer NOTE: Traffic Server received Sig 11: Segmentation fault NOTE: Traffic Server received Sig 11: Segmentation fault /home/y/bin/traffic_server - STACK TRACE: /home/y/bin/traffic_server - STACK TRACE: /lib64/libpthread.so.0[0x3b54e0f4a0] /lib64/libpthread.so.0[0x3b54e0f4a0] /home/y/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e083] /home/y/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a64c8] /home/y/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a7e30] /home/y/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e083] /home/y/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a64c8] /home/y/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a7e30] /home/y/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f506] /home/y/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526c50] /home/y/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f506] /home/y/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b548] /home/y/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526c50] /home/y/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b548] /home/y/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x198)[0x52b868] /home/y/bin/traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0xde)[0x56c3ee] /home/y/bin/traffic_server[0x6736a1] /home/y/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x198)[0x52b868] /home/y/bin/traffic_server(_Z15write_to_net_ioP10NetHandlerP18UnixNetVConnectionP7EThread+0x847)[0x675517] /home/y/bin/traffic_server[0x672f81]
[jira] [Commented] (TS-1330) Logging related segfault in 3.2.0
[ https://issues.apache.org/jira/browse/TS-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723962#comment-13723962 ] Yunkai Zhang commented on TS-1330: -- I have found the root cause of this issue, here is my analysis: 1. One thread(A) got an LogBuffer object in LogObject::_checkout_write() and ran into LogBuffer::checkout_write(), before it updated the LogBuffer::m_state, it was scheduled out by the kernel. 2. Other threads got the same LogBuffer object, but they *all* finished _checkout_write() codes and made the LogBuffer object become *full*, finally one of these thread flushed the LogBuffer object into write_list queue. 3. The LogBuffer object's m_references == 0 at this time. 4. The flushing thread would call LogBufferManager::flush_buffers() to loop write_list. As the LogBuffer's m_references == 0, so it would be flushed and *deleted*. 5. Now, thread(A) would be scheduled in by the kernel, but the LogBuffer object has been *deleted*, so the m_buffer == NULL, that is why entry_header is 0x0. Here is a coredump generated by the my testing patch - 0001-A-testing-patch-to-verify-my-analysis.patch: {code} (gdb) bt #0 0x003c75232885 in raise () from /lib64/libc.so.6 #1 0x003c75234065 in abort () from /lib64/libc.so.6 #2 0x2b02d24bdaf8 in ink_die_die_die (retval=1) at ink_error.cc:43 #3 0x2b02d24bdbca in ink_fatal_va(int, const char *, typedef __va_list_tag __va_list_tag *) (return_code=1, message_format=0x2b02d24dbe40 %s:%d: failed assert `%s`, ap=0x2b0304211a90) at ink_error.cc:65 #4 0x2b02d24bdc95 in ink_fatal (return_code=1, message_format=0x2b02d24dbe40 %s:%d: failed assert `%s`) at ink_error.cc:73 #5 0x2b02d24bcbc4 in _ink_assert (expression=0x74e021 current_ref == 0, file=0x74e014 LogObject.cc, line=60) at ink_assert.cc:38 #6 0x0060a9ab in LogBufferManager::flush_buffers (this=0x3219a08, sink=0x32199a0) at LogObject.cc:60 #7 0x005f755e in LogObject::flush_buffers (this=0x3219970) at LogObject.h:126 #8 0x0060de55 in LogObjectManager::flush_buffers (this=0x31e84f0) at LogObject.cc:1185 #9 0x005f6627 in Log::flush_thread_main (args=0x0) at Log.cc:1193 #10 0x005f779b in LoggingFlushContinuation::mainEvent (this=0x31e8fb0, event=1, data=0x2b02d8e13c60) at Log.cc:297 #11 0x004f1f1e in Continuation::handleEvent (this=0x31e8fb0, event=1, data=0x2b02d8e13c60) at ../iocore/eventsystem/I_Continuation.h:146 #12 0x0070fb18 in EThread::execute (this=0x2b0304013010) at UnixEThread.cc:291 #13 0x0070e6df in spawn_thread_internal (a=0x31e83d0) at Thread.cc:88 #14 0x003c756077f1 in start_thread () from /lib64/libpthread.so.0 #15 0x003c752e570d in clone () from /lib64/libc.so.6 (gdb) f 6 #6 0x0060a9ab in LogBufferManager::flush_buffers (this=0x3219a08, sink=0x32199a0) at LogObject.cc:60 60ink_release_assert(current_ref == 0); (gdb) l 55 * m_ref_debug could be changed by other threads 56 * currently, let's store it into local variable 57 * which is helpful for debugging. 58 */ 59int current_ref = b-m_ref_debug; 60ink_release_assert(current_ref == 0); 61new_q.push(b); 62 } 63} 64 (gdb) p current_ref $2 = 2 (gdb) p b-m_references $3 = 0 (gdb) {code} Once we found out the root cause, fix is simple. We should refine the LogBuffer::m_references variaable, so that it can *really* prevent outstanding references. Logging related segfault in 3.2.0 - Key: TS-1330 URL: https://issues.apache.org/jira/browse/TS-1330 Project: Traffic Server Issue Type: Bug Components: Logging Affects Versions: 3.2.0 Environment: ATS 3.2.0 on RHEL 6.2 64-bit Reporter: David Carlin Assignee: Bryan Call Priority: Critical Labels: A, crash Fix For: 3.5.0 I observed the following crash once on one of our ATS boxes - possibly related to TS-1240 Jul 2 13:56:56 l2 traffic_server[25853]: {0x2b0a391e1700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer Jul 2 13:59:56 l2 kernel: [ET_NET 1][25855]: segfault at c ip 0058e083 sp 2b0a2982b740 error 6 Jul 2 13:59:56 l2 kernel: [ET_NET 3][25857]: segfault at 84 ip 0058e083 sp 2b0a29a31740 error 6 in traffic_server[40+34] Jul 2 13:59:56 l2 kernel: in traffic_server[40+34] Jul 2 14:02:59 l2 traffic_cop[25901]: (test) read timeout [18 ] Jul 2 14:02:59 l2 traffic_cop[25901]: server heartbeat failed [1] Jul 2 14:03:08 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) Jul 2
[jira] [Commented] (TS-1330) Logging related segfault in 3.2.0
[ https://issues.apache.org/jira/browse/TS-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724132#comment-13724132 ] Yunkai Zhang commented on TS-1330: -- I have submitted a patch to fix this issue: https://issues.apache.org/jira/secure/attachment/12594993/0001-TS-1330-Fix-logging-core-at-checkout_write.patch Please test it, any feedback is welcome. Logging related segfault in 3.2.0 - Key: TS-1330 URL: https://issues.apache.org/jira/browse/TS-1330 Project: Traffic Server Issue Type: Bug Components: Logging Affects Versions: 3.2.0 Environment: ATS 3.2.0 on RHEL 6.2 64-bit Reporter: David Carlin Assignee: Bryan Call Priority: Critical Labels: A, crash Fix For: 3.5.0 Attachments: 0001-A-testing-patch-to-verify-my-analysis.patch, 0001-TS-1330-Fix-logging-core-at-checkout_write.patch I observed the following crash once on one of our ATS boxes - possibly related to TS-1240 Jul 2 13:56:56 l2 traffic_server[25853]: {0x2b0a391e1700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer Jul 2 13:59:56 l2 kernel: [ET_NET 1][25855]: segfault at c ip 0058e083 sp 2b0a2982b740 error 6 Jul 2 13:59:56 l2 kernel: [ET_NET 3][25857]: segfault at 84 ip 0058e083 sp 2b0a29a31740 error 6 in traffic_server[40+34] Jul 2 13:59:56 l2 kernel: in traffic_server[40+34] Jul 2 14:02:59 l2 traffic_cop[25901]: (test) read timeout [18 ] Jul 2 14:02:59 l2 traffic_cop[25901]: server heartbeat failed [1] Jul 2 14:03:08 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL: (last system error 104: Connection reset by peer) Jul 2 14:03:09 l2 traffic_cop[25901]: cannot find traffic_server [1] Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} ERROR: [LocalManager::sendMgmtMsgToProcesses] Error writing message Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} ERROR: (last system error 32: Broken pipe) Jul 2 14:03:12 l2 traffic_cop[25901]: cop received child status signal [25826 35584] Jul 2 14:03:12 l2 traffic_cop[25901]: traffic_manager not running, making sure traffic_server is dead Jul 2 14:03:12 l2 traffic_cop[25901]: spawning traffic_manager Jul 2 14:03:13 l2 traffic_manager[18267]: NOTE: --- Manager Starting --- Jul 2 14:03:13 l2 traffic_manager[18267]: NOTE: Manager Version: Apache Traffic Server - traffic_manager - 3.2.0 - (build # 52518 on Jun 25 2012 at 18:22:12) Jul 2 14:03:13 l2 traffic_manager[18267]: {0x7fe63de3f7e0} STATUS: opened /home/y/logs/trafficserver/manager.log Jul 2 14:03:15 l2 traffic_server[18322]: NOTE: --- Server Starting --- Jul 2 14:03:15 l2 traffic_server[18322]: NOTE: Server Version: Apache Traffic Server - traffic_server - 3.2.0 - (build # 52518 on Jun 25 2012 at 18:22:31) Jul 2 14:03:15 l2 traffic_server[18322]: {0x2b77573ab860} STATUS: opened /home/y/logs/trafficserver/diags.log Jul 2 14:03:15 l2 traffic_server[18322]: {0x2b77573ab860} ERROR: Cannot insert duplicate! Jul 2 14:03:22 l2 traffic_cop[25901]: server heartbeat succeeded [Jul 2 13:56:56.304] Server {0x2b0a391e1700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer NOTE: Traffic Server received Sig 11: Segmentation fault NOTE: Traffic Server received Sig 11: Segmentation fault /home/y/bin/traffic_server - STACK TRACE: /home/y/bin/traffic_server - STACK TRACE: /lib64/libpthread.so.0[0x3b54e0f4a0] /lib64/libpthread.so.0[0x3b54e0f4a0] /home/y/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e083] /home/y/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a64c8] /home/y/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a7e30] /home/y/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e083] /home/y/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a64c8] /home/y/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a7e30] /home/y/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f506] /home/y/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526c50] /home/y/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f506] /home/y/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b548] /home/y/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526c50] /home/y/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b548] /home/y/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x198)[0x52b868] /home/y/bin/traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0xde)[0x56c3ee]
[jira] [Commented] (TS-1330) Logging related segfault in 3.2.0
[ https://issues.apache.org/jira/browse/TS-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724730#comment-13724730 ] Yunkai Zhang commented on TS-1330: -- I'm cooking V2 of this patch as I found other problems about logging. Please wait ... Logging related segfault in 3.2.0 - Key: TS-1330 URL: https://issues.apache.org/jira/browse/TS-1330 Project: Traffic Server Issue Type: Bug Components: Logging Affects Versions: 3.2.0 Environment: ATS 3.2.0 on RHEL 6.2 64-bit Reporter: David Carlin Assignee: Bryan Call Priority: Critical Labels: A, crash Fix For: 3.5.0 Attachments: 0001-A-testing-patch-to-verify-my-analysis.patch, 0001-TS-1330-Fix-logging-core-at-checkout_write.patch I observed the following crash once on one of our ATS boxes - possibly related to TS-1240 Jul 2 13:56:56 l2 traffic_server[25853]: {0x2b0a391e1700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer Jul 2 13:59:56 l2 kernel: [ET_NET 1][25855]: segfault at c ip 0058e083 sp 2b0a2982b740 error 6 Jul 2 13:59:56 l2 kernel: [ET_NET 3][25857]: segfault at 84 ip 0058e083 sp 2b0a29a31740 error 6 in traffic_server[40+34] Jul 2 13:59:56 l2 kernel: in traffic_server[40+34] Jul 2 14:02:59 l2 traffic_cop[25901]: (test) read timeout [18 ] Jul 2 14:02:59 l2 traffic_cop[25901]: server heartbeat failed [1] Jul 2 14:03:08 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL: (last system error 104: Connection reset by peer) Jul 2 14:03:09 l2 traffic_cop[25901]: cannot find traffic_server [1] Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} ERROR: [LocalManager::sendMgmtMsgToProcesses] Error writing message Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} ERROR: (last system error 32: Broken pipe) Jul 2 14:03:12 l2 traffic_cop[25901]: cop received child status signal [25826 35584] Jul 2 14:03:12 l2 traffic_cop[25901]: traffic_manager not running, making sure traffic_server is dead Jul 2 14:03:12 l2 traffic_cop[25901]: spawning traffic_manager Jul 2 14:03:13 l2 traffic_manager[18267]: NOTE: --- Manager Starting --- Jul 2 14:03:13 l2 traffic_manager[18267]: NOTE: Manager Version: Apache Traffic Server - traffic_manager - 3.2.0 - (build # 52518 on Jun 25 2012 at 18:22:12) Jul 2 14:03:13 l2 traffic_manager[18267]: {0x7fe63de3f7e0} STATUS: opened /home/y/logs/trafficserver/manager.log Jul 2 14:03:15 l2 traffic_server[18322]: NOTE: --- Server Starting --- Jul 2 14:03:15 l2 traffic_server[18322]: NOTE: Server Version: Apache Traffic Server - traffic_server - 3.2.0 - (build # 52518 on Jun 25 2012 at 18:22:31) Jul 2 14:03:15 l2 traffic_server[18322]: {0x2b77573ab860} STATUS: opened /home/y/logs/trafficserver/diags.log Jul 2 14:03:15 l2 traffic_server[18322]: {0x2b77573ab860} ERROR: Cannot insert duplicate! Jul 2 14:03:22 l2 traffic_cop[25901]: server heartbeat succeeded [Jul 2 13:56:56.304] Server {0x2b0a391e1700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer NOTE: Traffic Server received Sig 11: Segmentation fault NOTE: Traffic Server received Sig 11: Segmentation fault /home/y/bin/traffic_server - STACK TRACE: /home/y/bin/traffic_server - STACK TRACE: /lib64/libpthread.so.0[0x3b54e0f4a0] /lib64/libpthread.so.0[0x3b54e0f4a0] /home/y/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e083] /home/y/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a64c8] /home/y/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a7e30] /home/y/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e083] /home/y/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a64c8] /home/y/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a7e30] /home/y/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f506] /home/y/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526c50] /home/y/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f506] /home/y/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b548] /home/y/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526c50] /home/y/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b548] /home/y/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x198)[0x52b868] /home/y/bin/traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0xde)[0x56c3ee] /home/y/bin/traffic_server[0x6736a1] /home/y/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x198)[0x52b868]
[jira] [Commented] (TS-1330) Logging related segfault in 3.2.0
[ https://issues.apache.org/jira/browse/TS-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724770#comment-13724770 ] James Peach commented on TS-1330: - Is m_references enforcing the lifetime of the LogBuffer object? If so, can we use Ptr for that? Logging related segfault in 3.2.0 - Key: TS-1330 URL: https://issues.apache.org/jira/browse/TS-1330 Project: Traffic Server Issue Type: Bug Components: Logging Affects Versions: 3.2.0 Environment: ATS 3.2.0 on RHEL 6.2 64-bit Reporter: David Carlin Assignee: Bryan Call Priority: Critical Labels: A, crash Fix For: 3.5.0 Attachments: 0001-A-testing-patch-to-verify-my-analysis.patch, 0001-TS-1330-Fix-logging-core-at-checkout_write.patch I observed the following crash once on one of our ATS boxes - possibly related to TS-1240 Jul 2 13:56:56 l2 traffic_server[25853]: {0x2b0a391e1700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer Jul 2 13:59:56 l2 kernel: [ET_NET 1][25855]: segfault at c ip 0058e083 sp 2b0a2982b740 error 6 Jul 2 13:59:56 l2 kernel: [ET_NET 3][25857]: segfault at 84 ip 0058e083 sp 2b0a29a31740 error 6 in traffic_server[40+34] Jul 2 13:59:56 l2 kernel: in traffic_server[40+34] Jul 2 14:02:59 l2 traffic_cop[25901]: (test) read timeout [18 ] Jul 2 14:02:59 l2 traffic_cop[25901]: server heartbeat failed [1] Jul 2 14:03:08 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL: (last system error 104: Connection reset by peer) Jul 2 14:03:09 l2 traffic_cop[25901]: cannot find traffic_server [1] Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} ERROR: [LocalManager::sendMgmtMsgToProcesses] Error writing message Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} ERROR: (last system error 32: Broken pipe) Jul 2 14:03:12 l2 traffic_cop[25901]: cop received child status signal [25826 35584] Jul 2 14:03:12 l2 traffic_cop[25901]: traffic_manager not running, making sure traffic_server is dead Jul 2 14:03:12 l2 traffic_cop[25901]: spawning traffic_manager Jul 2 14:03:13 l2 traffic_manager[18267]: NOTE: --- Manager Starting --- Jul 2 14:03:13 l2 traffic_manager[18267]: NOTE: Manager Version: Apache Traffic Server - traffic_manager - 3.2.0 - (build # 52518 on Jun 25 2012 at 18:22:12) Jul 2 14:03:13 l2 traffic_manager[18267]: {0x7fe63de3f7e0} STATUS: opened /home/y/logs/trafficserver/manager.log Jul 2 14:03:15 l2 traffic_server[18322]: NOTE: --- Server Starting --- Jul 2 14:03:15 l2 traffic_server[18322]: NOTE: Server Version: Apache Traffic Server - traffic_server - 3.2.0 - (build # 52518 on Jun 25 2012 at 18:22:31) Jul 2 14:03:15 l2 traffic_server[18322]: {0x2b77573ab860} STATUS: opened /home/y/logs/trafficserver/diags.log Jul 2 14:03:15 l2 traffic_server[18322]: {0x2b77573ab860} ERROR: Cannot insert duplicate! Jul 2 14:03:22 l2 traffic_cop[25901]: server heartbeat succeeded [Jul 2 13:56:56.304] Server {0x2b0a391e1700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer NOTE: Traffic Server received Sig 11: Segmentation fault NOTE: Traffic Server received Sig 11: Segmentation fault /home/y/bin/traffic_server - STACK TRACE: /home/y/bin/traffic_server - STACK TRACE: /lib64/libpthread.so.0[0x3b54e0f4a0] /lib64/libpthread.so.0[0x3b54e0f4a0] /home/y/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e083] /home/y/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a64c8] /home/y/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a7e30] /home/y/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e083] /home/y/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a64c8] /home/y/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a7e30] /home/y/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f506] /home/y/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526c50] /home/y/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f506] /home/y/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b548] /home/y/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526c50] /home/y/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b548] /home/y/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x198)[0x52b868] /home/y/bin/traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0xde)[0x56c3ee] /home/y/bin/traffic_server[0x6736a1] /home/y/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x198)[0x52b868]
[jira] [Commented] (TS-1330) Logging related segfault in 3.2.0
[ https://issues.apache.org/jira/browse/TS-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724782#comment-13724782 ] Yunkai Zhang commented on TS-1330: -- No, m_references only enforces the lifetime of getting offset from LogBuffer, that is what checkout_write() do. Logging related segfault in 3.2.0 - Key: TS-1330 URL: https://issues.apache.org/jira/browse/TS-1330 Project: Traffic Server Issue Type: Bug Components: Logging Affects Versions: 3.2.0 Environment: ATS 3.2.0 on RHEL 6.2 64-bit Reporter: David Carlin Assignee: Bryan Call Priority: Critical Labels: A, crash Fix For: 3.5.0 Attachments: 0001-A-testing-patch-to-verify-my-analysis.patch, 0001-TS-1330-Fix-logging-core-at-checkout_write.patch I observed the following crash once on one of our ATS boxes - possibly related to TS-1240 Jul 2 13:56:56 l2 traffic_server[25853]: {0x2b0a391e1700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer Jul 2 13:59:56 l2 kernel: [ET_NET 1][25855]: segfault at c ip 0058e083 sp 2b0a2982b740 error 6 Jul 2 13:59:56 l2 kernel: [ET_NET 3][25857]: segfault at 84 ip 0058e083 sp 2b0a29a31740 error 6 in traffic_server[40+34] Jul 2 13:59:56 l2 kernel: in traffic_server[40+34] Jul 2 14:02:59 l2 traffic_cop[25901]: (test) read timeout [18 ] Jul 2 14:02:59 l2 traffic_cop[25901]: server heartbeat failed [1] Jul 2 14:03:08 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL: (last system error 104: Connection reset by peer) Jul 2 14:03:09 l2 traffic_cop[25901]: cannot find traffic_server [1] Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} ERROR: [LocalManager::sendMgmtMsgToProcesses] Error writing message Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} ERROR: (last system error 32: Broken pipe) Jul 2 14:03:12 l2 traffic_cop[25901]: cop received child status signal [25826 35584] Jul 2 14:03:12 l2 traffic_cop[25901]: traffic_manager not running, making sure traffic_server is dead Jul 2 14:03:12 l2 traffic_cop[25901]: spawning traffic_manager Jul 2 14:03:13 l2 traffic_manager[18267]: NOTE: --- Manager Starting --- Jul 2 14:03:13 l2 traffic_manager[18267]: NOTE: Manager Version: Apache Traffic Server - traffic_manager - 3.2.0 - (build # 52518 on Jun 25 2012 at 18:22:12) Jul 2 14:03:13 l2 traffic_manager[18267]: {0x7fe63de3f7e0} STATUS: opened /home/y/logs/trafficserver/manager.log Jul 2 14:03:15 l2 traffic_server[18322]: NOTE: --- Server Starting --- Jul 2 14:03:15 l2 traffic_server[18322]: NOTE: Server Version: Apache Traffic Server - traffic_server - 3.2.0 - (build # 52518 on Jun 25 2012 at 18:22:31) Jul 2 14:03:15 l2 traffic_server[18322]: {0x2b77573ab860} STATUS: opened /home/y/logs/trafficserver/diags.log Jul 2 14:03:15 l2 traffic_server[18322]: {0x2b77573ab860} ERROR: Cannot insert duplicate! Jul 2 14:03:22 l2 traffic_cop[25901]: server heartbeat succeeded [Jul 2 13:56:56.304] Server {0x2b0a391e1700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer NOTE: Traffic Server received Sig 11: Segmentation fault NOTE: Traffic Server received Sig 11: Segmentation fault /home/y/bin/traffic_server - STACK TRACE: /home/y/bin/traffic_server - STACK TRACE: /lib64/libpthread.so.0[0x3b54e0f4a0] /lib64/libpthread.so.0[0x3b54e0f4a0] /home/y/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e083] /home/y/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a64c8] /home/y/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a7e30] /home/y/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e083] /home/y/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a64c8] /home/y/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a7e30] /home/y/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f506] /home/y/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526c50] /home/y/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f506] /home/y/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b548] /home/y/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526c50] /home/y/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b548] /home/y/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x198)[0x52b868] /home/y/bin/traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0xde)[0x56c3ee] /home/y/bin/traffic_server[0x6736a1]
[jira] [Commented] (TS-1330) Logging related segfault in 3.2.0
[ https://issues.apache.org/jira/browse/TS-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720380#comment-13720380 ] Leif Hedstrom commented on TS-1330: --- Yunkai: Can you comment your findings on this ticket as well? Particularly if you have any details on how to reproduce it... Logging related segfault in 3.2.0 - Key: TS-1330 URL: https://issues.apache.org/jira/browse/TS-1330 Project: Traffic Server Issue Type: Bug Components: Logging Affects Versions: 3.2.0 Environment: ATS 3.2.0 on RHEL 6.2 64-bit Reporter: David Carlin Assignee: Bryan Call Priority: Critical Labels: A, crash Fix For: 3.5.0 I observed the following crash once on one of our ATS boxes - possibly related to TS-1240 Jul 2 13:56:56 l2 traffic_server[25853]: {0x2b0a391e1700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer Jul 2 13:59:56 l2 kernel: [ET_NET 1][25855]: segfault at c ip 0058e083 sp 2b0a2982b740 error 6 Jul 2 13:59:56 l2 kernel: [ET_NET 3][25857]: segfault at 84 ip 0058e083 sp 2b0a29a31740 error 6 in traffic_server[40+34] Jul 2 13:59:56 l2 kernel: in traffic_server[40+34] Jul 2 14:02:59 l2 traffic_cop[25901]: (test) read timeout [18 ] Jul 2 14:02:59 l2 traffic_cop[25901]: server heartbeat failed [1] Jul 2 14:03:08 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL: (last system error 104: Connection reset by peer) Jul 2 14:03:09 l2 traffic_cop[25901]: cannot find traffic_server [1] Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} ERROR: [LocalManager::sendMgmtMsgToProcesses] Error writing message Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} ERROR: (last system error 32: Broken pipe) Jul 2 14:03:12 l2 traffic_cop[25901]: cop received child status signal [25826 35584] Jul 2 14:03:12 l2 traffic_cop[25901]: traffic_manager not running, making sure traffic_server is dead Jul 2 14:03:12 l2 traffic_cop[25901]: spawning traffic_manager Jul 2 14:03:13 l2 traffic_manager[18267]: NOTE: --- Manager Starting --- Jul 2 14:03:13 l2 traffic_manager[18267]: NOTE: Manager Version: Apache Traffic Server - traffic_manager - 3.2.0 - (build # 52518 on Jun 25 2012 at 18:22:12) Jul 2 14:03:13 l2 traffic_manager[18267]: {0x7fe63de3f7e0} STATUS: opened /home/y/logs/trafficserver/manager.log Jul 2 14:03:15 l2 traffic_server[18322]: NOTE: --- Server Starting --- Jul 2 14:03:15 l2 traffic_server[18322]: NOTE: Server Version: Apache Traffic Server - traffic_server - 3.2.0 - (build # 52518 on Jun 25 2012 at 18:22:31) Jul 2 14:03:15 l2 traffic_server[18322]: {0x2b77573ab860} STATUS: opened /home/y/logs/trafficserver/diags.log Jul 2 14:03:15 l2 traffic_server[18322]: {0x2b77573ab860} ERROR: Cannot insert duplicate! Jul 2 14:03:22 l2 traffic_cop[25901]: server heartbeat succeeded [Jul 2 13:56:56.304] Server {0x2b0a391e1700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer NOTE: Traffic Server received Sig 11: Segmentation fault NOTE: Traffic Server received Sig 11: Segmentation fault /home/y/bin/traffic_server - STACK TRACE: /home/y/bin/traffic_server - STACK TRACE: /lib64/libpthread.so.0[0x3b54e0f4a0] /lib64/libpthread.so.0[0x3b54e0f4a0] /home/y/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e083] /home/y/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a64c8] /home/y/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a7e30] /home/y/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e083] /home/y/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a64c8] /home/y/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a7e30] /home/y/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f506] /home/y/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526c50] /home/y/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f506] /home/y/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b548] /home/y/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526c50] /home/y/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b548] /home/y/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x198)[0x52b868] /home/y/bin/traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0xde)[0x56c3ee] /home/y/bin/traffic_server[0x6736a1] /home/y/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x198)[0x52b868]
[jira] [Commented] (TS-1330) Logging related segfault in 3.2.0
[ https://issues.apache.org/jira/browse/TS-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720381#comment-13720381 ] Leif Hedstrom commented on TS-1330: --- From the dupe bug (Yunkai's comment) {code} (gdb) bt #0 0x005f0384 in LogBuffer::checkout_write (this=0x2aeb30002ff0, write_offset=0x2aeb09f5f4e8, write_size=152) at LogBuffer.cc:287 #1 0x0060bc3f in LogObject::_checkout_write (this=0x1ed8800, write_offset=0x2aeb09f5f4e8, bytes_needed=152) at LogObject.cc:424 #2 0x0060c43e in LogObject::log (this=0x1ed8800, lad=0x2aeb09f5f7a0, text_entry=0x0) at LogObject.cc:579 #3 0x005f761a in LogObjectManager::log (this=0x1eaf400, lad=0x2aeb09f5f7a0) at LogObject.h:395 #4 0x005f6379 in Log::access (lad=0x2aeb09f5f7a0) at Log.cc:1117 #5 0x0058506e in HttpSM::update_stats (this=0x2aeb41379f10) at HttpSM.cc:6312 #6 0x00584d21 in HttpSM::kill_this (this=0x2aeb41379f10) at HttpSM.cc:6273 #7 0x005798ea in HttpSM::main_handler (this=0x2aeb41379f10, event=2301, data=0x2aeb4137ba70) at HttpSM.cc:2593 #8 0x004f1f1e in Continuation::handleEvent (this=0x2aeb41379f10, event=2301, data=0x2aeb4137ba70) at ../iocore/eventsystem/I_Continuation.h:146 #9 0x005bc819 in HttpTunnel::main_handler (this=0x2aeb4137ba70, event=103, data=0x2aebb000f050) at HttpTunnel.cc:1476 #10 0x004f1f1e in Continuation::handleEvent (this=0x2aeb4137ba70, event=103, data=0x2aebb000f050) at ../iocore/eventsystem/I_Continuation.h:146 #11 0x006ee000 in write_signal_and_update (event=103, vc=0x2aebb000eee0) at UnixNetVConnection.cc:153 #12 0x006ee0f9 in write_signal_done (event=103, nh=0x2aeb08d542a8, vc=0x2aebb000eee0) at UnixNetVConnection.cc:180 #13 0x006eef0a in write_to_net_io (nh=0x2aeb08d542a8, vc=0x2aebb000eee0, thread=0x2aeb08d51010) at UnixNetVConnection.cc:479 #14 0x006ee8c9 in write_to_net (nh=0x2aeb08d542a8, vc=0x2aebb000eee0, pd=0x2aeb1466b010, thread=0x2aeb08d51010) at UnixNetVConnection.cc:352 #15 0x006eae36 in NetHandler::mainNetEvent (this=0x2aeb08d542a8, event=5, e=0x2aeb0a6746e0) at UnixNet.cc:394 #16 0x004f1f1e in Continuation::handleEvent (this=0x2aeb08d542a8, event=5, data=0x2aeb0a6746e0) at ../iocore/eventsystem/I_Continuation.h:146 #17 0x0070f3d2 in EThread::process_event (this=0x2aeb08d51010, e=0x2aeb0a6746e0, calling_code=5) at UnixEThread.cc:142 #18 0x0070f8ef in EThread::execute (this=0x2aeb08d51010) at UnixEThread.cc:266 #19 0x0070e62f in spawn_thread_internal (a=0x1b3f520) at Thread.cc:88 #20 0x003c756077f1 in start_thread () from /lib64/libpthread.so.0 #21 0x003c752e570d in clone () from /lib64/libc.so.6 (gdb) f 0 #0 0x005f0384 in LogBuffer::checkout_write (this=0x2aeb30002ff0, write_offset=0x2aeb09f5f4e8, write_size=152) at LogBuffer.cc:287 287 entry_header-timestamp = tp.tv_sec; (gdb) p entry_header $7 = (LogEntryHeader *) 0x0 (gdb) p old_s $8 = {ival = 0, s = {offset = 0, num_entries = 0, byte_count = 0, full = 0, num_writers = 0}} (gdb) p new_s $9 = {ival = 563671507992744, s = {offset = 168, num_entries = 1, byte_count = 168, full = 0, num_writers = 1}} (gdb) p m_buffer $10 = 0x2aeb30005000 \316\372\316\n\002 (gdb) p *this $11 = {write_link = {next = 0x0}, link = {SLinkLogBuffer = {next = 0x0}, prev = 0x0}, static M_ID = 57301638, m_unaligned_buffer = 0x2aeb30004fa0 \350\a, m_buffer = 0x2aeb30005000 \316\372\316\n\002, m_size = 18432, m_buf_align = 512, m_write_align = 8, m_state = {ival = 923143098288024, s = {offset = 18328, num_entries = 115, byte_count = 18328, full = 1, num_writers = 1}}, m_max_entries = 200, m_expiration_time = 1374692424, m_owner = 0x1ed8800, m_header = 0x2aeb30005000, m_id = 57301630, m_references = 1} {code} Logging related segfault in 3.2.0 - Key: TS-1330 URL: https://issues.apache.org/jira/browse/TS-1330 Project: Traffic Server Issue Type: Bug Components: Logging Affects Versions: 3.2.0 Environment: ATS 3.2.0 on RHEL 6.2 64-bit Reporter: David Carlin Assignee: Bryan Call Priority: Critical Labels: A, crash Fix For: 3.5.0 I observed the following crash once on one of our ATS boxes - possibly related to TS-1240 Jul 2 13:56:56 l2 traffic_server[25853]: {0x2b0a391e1700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer Jul 2 13:59:56 l2 kernel: [ET_NET 1][25855]: segfault at c ip 0058e083 sp 2b0a2982b740 error 6 Jul 2 13:59:56 l2 kernel: [ET_NET 3][25857]: segfault at 84 ip 0058e083 sp 2b0a29a31740 error 6 in traffic_server[40+34] Jul 2
[jira] [Commented] (TS-1330) Logging related segfault in 3.2.0
[ https://issues.apache.org/jira/browse/TS-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13711821#comment-13711821 ] Leif Hedstrom commented on TS-1330: --- Bryan: Any updates? Should we just move this out to v3.3.6 (and hence, v3.5.0) ? Logging related segfault in 3.2.0 - Key: TS-1330 URL: https://issues.apache.org/jira/browse/TS-1330 Project: Traffic Server Issue Type: Bug Components: Logging Affects Versions: 3.2.0 Environment: ATS 3.2.0 on RHEL 6.2 64-bit Reporter: David Carlin Assignee: Bryan Call Priority: Critical Labels: A, crash Fix For: 3.3.5 I observed the following crash once on one of our ATS boxes - possibly related to TS-1240 Jul 2 13:56:56 l2 traffic_server[25853]: {0x2b0a391e1700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer Jul 2 13:59:56 l2 kernel: [ET_NET 1][25855]: segfault at c ip 0058e083 sp 2b0a2982b740 error 6 Jul 2 13:59:56 l2 kernel: [ET_NET 3][25857]: segfault at 84 ip 0058e083 sp 2b0a29a31740 error 6 in traffic_server[40+34] Jul 2 13:59:56 l2 kernel: in traffic_server[40+34] Jul 2 14:02:59 l2 traffic_cop[25901]: (test) read timeout [18 ] Jul 2 14:02:59 l2 traffic_cop[25901]: server heartbeat failed [1] Jul 2 14:03:08 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL: (last system error 104: Connection reset by peer) Jul 2 14:03:09 l2 traffic_cop[25901]: cannot find traffic_server [1] Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} ERROR: [LocalManager::sendMgmtMsgToProcesses] Error writing message Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} ERROR: (last system error 32: Broken pipe) Jul 2 14:03:12 l2 traffic_cop[25901]: cop received child status signal [25826 35584] Jul 2 14:03:12 l2 traffic_cop[25901]: traffic_manager not running, making sure traffic_server is dead Jul 2 14:03:12 l2 traffic_cop[25901]: spawning traffic_manager Jul 2 14:03:13 l2 traffic_manager[18267]: NOTE: --- Manager Starting --- Jul 2 14:03:13 l2 traffic_manager[18267]: NOTE: Manager Version: Apache Traffic Server - traffic_manager - 3.2.0 - (build # 52518 on Jun 25 2012 at 18:22:12) Jul 2 14:03:13 l2 traffic_manager[18267]: {0x7fe63de3f7e0} STATUS: opened /home/y/logs/trafficserver/manager.log Jul 2 14:03:15 l2 traffic_server[18322]: NOTE: --- Server Starting --- Jul 2 14:03:15 l2 traffic_server[18322]: NOTE: Server Version: Apache Traffic Server - traffic_server - 3.2.0 - (build # 52518 on Jun 25 2012 at 18:22:31) Jul 2 14:03:15 l2 traffic_server[18322]: {0x2b77573ab860} STATUS: opened /home/y/logs/trafficserver/diags.log Jul 2 14:03:15 l2 traffic_server[18322]: {0x2b77573ab860} ERROR: Cannot insert duplicate! Jul 2 14:03:22 l2 traffic_cop[25901]: server heartbeat succeeded [Jul 2 13:56:56.304] Server {0x2b0a391e1700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer NOTE: Traffic Server received Sig 11: Segmentation fault NOTE: Traffic Server received Sig 11: Segmentation fault /home/y/bin/traffic_server - STACK TRACE: /home/y/bin/traffic_server - STACK TRACE: /lib64/libpthread.so.0[0x3b54e0f4a0] /lib64/libpthread.so.0[0x3b54e0f4a0] /home/y/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e083] /home/y/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a64c8] /home/y/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a7e30] /home/y/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e083] /home/y/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a64c8] /home/y/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a7e30] /home/y/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f506] /home/y/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526c50] /home/y/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f506] /home/y/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b548] /home/y/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526c50] /home/y/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b548] /home/y/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x198)[0x52b868] /home/y/bin/traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0xde)[0x56c3ee] /home/y/bin/traffic_server[0x6736a1] /home/y/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x198)[0x52b868] /home/y/bin/traffic_server(_Z15write_to_net_ioP10NetHandlerP18UnixNetVConnectionP7EThread+0x847)[0x675517]
[jira] [Commented] (TS-1330) Logging related segfault in 3.2.0
[ https://issues.apache.org/jira/browse/TS-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13703409#comment-13703409 ] Leif Hedstrom commented on TS-1330: --- Assigning to Bryan to make a decision to either fix it or move it out to v3.5.0. Logging related segfault in 3.2.0 - Key: TS-1330 URL: https://issues.apache.org/jira/browse/TS-1330 Project: Traffic Server Issue Type: Bug Components: Logging Affects Versions: 3.2.0 Environment: ATS 3.2.0 on RHEL 6.2 64-bit Reporter: David Carlin Assignee: Leif Hedstrom Priority: Critical Labels: A, crash Fix For: 3.3.5 I observed the following crash once on one of our ATS boxes - possibly related to TS-1240 Jul 2 13:56:56 l2 traffic_server[25853]: {0x2b0a391e1700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer Jul 2 13:59:56 l2 kernel: [ET_NET 1][25855]: segfault at c ip 0058e083 sp 2b0a2982b740 error 6 Jul 2 13:59:56 l2 kernel: [ET_NET 3][25857]: segfault at 84 ip 0058e083 sp 2b0a29a31740 error 6 in traffic_server[40+34] Jul 2 13:59:56 l2 kernel: in traffic_server[40+34] Jul 2 14:02:59 l2 traffic_cop[25901]: (test) read timeout [18 ] Jul 2 14:02:59 l2 traffic_cop[25901]: server heartbeat failed [1] Jul 2 14:03:08 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL: (last system error 104: Connection reset by peer) Jul 2 14:03:09 l2 traffic_cop[25901]: cannot find traffic_server [1] Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} ERROR: [LocalManager::sendMgmtMsgToProcesses] Error writing message Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} ERROR: (last system error 32: Broken pipe) Jul 2 14:03:12 l2 traffic_cop[25901]: cop received child status signal [25826 35584] Jul 2 14:03:12 l2 traffic_cop[25901]: traffic_manager not running, making sure traffic_server is dead Jul 2 14:03:12 l2 traffic_cop[25901]: spawning traffic_manager Jul 2 14:03:13 l2 traffic_manager[18267]: NOTE: --- Manager Starting --- Jul 2 14:03:13 l2 traffic_manager[18267]: NOTE: Manager Version: Apache Traffic Server - traffic_manager - 3.2.0 - (build # 52518 on Jun 25 2012 at 18:22:12) Jul 2 14:03:13 l2 traffic_manager[18267]: {0x7fe63de3f7e0} STATUS: opened /home/y/logs/trafficserver/manager.log Jul 2 14:03:15 l2 traffic_server[18322]: NOTE: --- Server Starting --- Jul 2 14:03:15 l2 traffic_server[18322]: NOTE: Server Version: Apache Traffic Server - traffic_server - 3.2.0 - (build # 52518 on Jun 25 2012 at 18:22:31) Jul 2 14:03:15 l2 traffic_server[18322]: {0x2b77573ab860} STATUS: opened /home/y/logs/trafficserver/diags.log Jul 2 14:03:15 l2 traffic_server[18322]: {0x2b77573ab860} ERROR: Cannot insert duplicate! Jul 2 14:03:22 l2 traffic_cop[25901]: server heartbeat succeeded [Jul 2 13:56:56.304] Server {0x2b0a391e1700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer NOTE: Traffic Server received Sig 11: Segmentation fault NOTE: Traffic Server received Sig 11: Segmentation fault /home/y/bin/traffic_server - STACK TRACE: /home/y/bin/traffic_server - STACK TRACE: /lib64/libpthread.so.0[0x3b54e0f4a0] /lib64/libpthread.so.0[0x3b54e0f4a0] /home/y/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e083] /home/y/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a64c8] /home/y/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a7e30] /home/y/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e083] /home/y/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a64c8] /home/y/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a7e30] /home/y/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f506] /home/y/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526c50] /home/y/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f506] /home/y/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b548] /home/y/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526c50] /home/y/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b548] /home/y/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x198)[0x52b868] /home/y/bin/traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0xde)[0x56c3ee] /home/y/bin/traffic_server[0x6736a1] /home/y/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x198)[0x52b868] /home/y/bin/traffic_server(_Z15write_to_net_ioP10NetHandlerP18UnixNetVConnectionP7EThread+0x847)[0x675517]
[jira] [Commented] (TS-1330) Logging related segfault in 3.2.0
[ https://issues.apache.org/jira/browse/TS-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13695914#comment-13695914 ] Bryan Call commented on TS-1330: Looking at the core dumps on a production box: {code} [bcall@l2 ~]$ sudo grep -A 10 'STACK TRACE' /usr/local/logs/trafficserver/traffic.out | grep -v ^\\[ /usr/local/bin/traffic_server - STACK TRACE: -- /usr/local/bin/traffic_server - STACK TRACE: /lib64/libpthread.so.0[0x342f00f500] /usr/local/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e3e3] /usr/local/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a6828] /usr/local/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a8190] /usr/local/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f866] /usr/local/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526fa0] /usr/local/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b898] /usr/local/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x198)[0x52bbb8] /usr/local/bin/traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0xde)[0x56c74e] /usr/local/bin/traffic_server[0x6743c1] -- /usr/local/bin/traffic_server/usr/local/bin/traffic_server - STACK TRACE: /usr/local/bin/traffic_server - STACK TRACE: - STACK TRACE: /usr/local/bin/traffic_server/usr/local/bin/traffic_server - STACK TRACE: - STACK TRACE: /lib64/libpthread.so.0[0x342f00f500] /usr/local/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e3e3] /usr/local/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a6828] /usr/local/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a8190] /lib64/libpthread.so.0[0x342f00f500] /usr/local/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e3e3] /lib64/libpthread.so.0[0x342f00f500] /usr/local/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e3e3] /usr/local/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a6828] /usr/local/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a8190] -- /usr/local/bin/traffic_server - STACK TRACE: /lib64/libpthread.so.0[0x342f00f500] /usr/local/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e3e3] /usr/local/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a6828] /usr/local/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a8190] /usr/local/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f866] /usr/local/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526fa0] /usr/local/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b898] /usr/local/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x198)[0x52bbb8] /usr/local/bin/traffic_server[0x6735eb] /usr/local/bin/traffic_server(_ZN18UnixNetVConnection9mainEventEiP5Event+0x5df)[0x67577f] -- /usr/local/bin/traffic_server - STACK TRACE: /lib64/libpthread.so.0[0x342f00f500] /usr/local/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e3e3] /usr/local/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a6828] /usr/local/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a8190] /usr/local/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f866] /usr/local/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526fa0] /usr/local/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b898] /usr/local/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x198)[0x52bbb8] /usr/local/bin/traffic_server[0x673ca1] /usr/local/bin/traffic_server[0x6777f5] -- /usr/local/bin/traffic_server - STACK TRACE: -- /usr/local/bin/traffic_server - STACK TRACE: /lib64/libpthread.so.0[0x342f00f500] /usr/local/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e3e3] /usr/local/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a6828] /usr/local/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a8190] /usr/local/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f866] /usr/local/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526fa0] /usr/local/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b898] /usr/local/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x198)[0x52bbb8] /usr/local/bin/traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0xde)[0x56c74e] /usr/local/bin/traffic_server[0x6743c1] -- /usr/local/bin/traffic_server - STACK TRACE: /lib64/libpthread.so.0[0x342f00f500] /usr/local/lib64/libtsutil.so.3(ink_atomiclist_popall+0x64)[0x2ad678deedc4] /usr/local/bin/traffic_server(_ZN16LogBufferManager13flush_buffersEP13LogBufferSink+0x1f)[0x5a734f] /usr/local/bin/traffic_server(_ZN16LogObjectManager13flush_buffersEv+0x28)[0x5a7e28] /usr/local/bin/traffic_server(_ZN3Log17flush_thread_mainEPv+0xaa)[0x59012a] /usr/local/bin/traffic_server(_ZN24LoggingFlushContinuation9mainEventEiPv+0xb)[0x59361b] /usr/local/bin/traffic_server(_ZN7EThread7executeEv+0xb86)[0x698886] /usr/local/bin/traffic_server[0x696802] /lib64/libpthread.so.0[0x342f007851]
[jira] [Commented] (TS-1330) Logging related segfault in 3.2.0
[ https://issues.apache.org/jira/browse/TS-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482935#comment-13482935 ] Peter commented on TS-1330: --- I'm seeing something similar in ATS/3.0.5 [2012-10-23 18:10:41] [ET_SSL 4][24739]: segfault at 8a ip 00639afd sp 2ab043ca6d90 error 6 in traffic_server[40+31d000] [2012-10-23 18:21:06] [ET_SSL 1][7225]: segfault at 8a ip 00639afd sp 2b8def9a3d90 error 6 in traffic_server[40+31d000] [2012-10-23 19:02:02] [ET_SSL 0][7333]: segfault at 8a ip 00639afd sp 2b23e40ffd90 error 6 in traffic_server[40+31d000] [2012-10-23 19:02:11] [ET_SSL 3][7677]: segfault at 8a ip 00639afd sp 2b4ab3ba5d90 error 6 in traffic_server[40+31d000] I'll try to get a core asap Logging related segfault in 3.2.0 - Key: TS-1330 URL: https://issues.apache.org/jira/browse/TS-1330 Project: Traffic Server Issue Type: Bug Components: Logging Affects Versions: 3.2.0 Environment: ATS 3.2.0 on RHEL 6.2 64-bit Reporter: David Carlin Fix For: 3.3.1 I observed the following crash once on one of our ATS boxes - possibly related to TS-1240 Jul 2 13:56:56 l2 traffic_server[25853]: {0x2b0a391e1700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer Jul 2 13:59:56 l2 kernel: [ET_NET 1][25855]: segfault at c ip 0058e083 sp 2b0a2982b740 error 6 Jul 2 13:59:56 l2 kernel: [ET_NET 3][25857]: segfault at 84 ip 0058e083 sp 2b0a29a31740 error 6 in traffic_server[40+34] Jul 2 13:59:56 l2 kernel: in traffic_server[40+34] Jul 2 14:02:59 l2 traffic_cop[25901]: (test) read timeout [18 ] Jul 2 14:02:59 l2 traffic_cop[25901]: server heartbeat failed [1] Jul 2 14:03:08 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL: (last system error 104: Connection reset by peer) Jul 2 14:03:09 l2 traffic_cop[25901]: cannot find traffic_server [1] Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} ERROR: [LocalManager::sendMgmtMsgToProcesses] Error writing message Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} ERROR: (last system error 32: Broken pipe) Jul 2 14:03:12 l2 traffic_cop[25901]: cop received child status signal [25826 35584] Jul 2 14:03:12 l2 traffic_cop[25901]: traffic_manager not running, making sure traffic_server is dead Jul 2 14:03:12 l2 traffic_cop[25901]: spawning traffic_manager Jul 2 14:03:13 l2 traffic_manager[18267]: NOTE: --- Manager Starting --- Jul 2 14:03:13 l2 traffic_manager[18267]: NOTE: Manager Version: Apache Traffic Server - traffic_manager - 3.2.0 - (build # 52518 on Jun 25 2012 at 18:22:12) Jul 2 14:03:13 l2 traffic_manager[18267]: {0x7fe63de3f7e0} STATUS: opened /home/y/logs/trafficserver/manager.log Jul 2 14:03:15 l2 traffic_server[18322]: NOTE: --- Server Starting --- Jul 2 14:03:15 l2 traffic_server[18322]: NOTE: Server Version: Apache Traffic Server - traffic_server - 3.2.0 - (build # 52518 on Jun 25 2012 at 18:22:31) Jul 2 14:03:15 l2 traffic_server[18322]: {0x2b77573ab860} STATUS: opened /home/y/logs/trafficserver/diags.log Jul 2 14:03:15 l2 traffic_server[18322]: {0x2b77573ab860} ERROR: Cannot insert duplicate! Jul 2 14:03:22 l2 traffic_cop[25901]: server heartbeat succeeded [Jul 2 13:56:56.304] Server {0x2b0a391e1700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer NOTE: Traffic Server received Sig 11: Segmentation fault NOTE: Traffic Server received Sig 11: Segmentation fault /home/y/bin/traffic_server - STACK TRACE: /home/y/bin/traffic_server - STACK TRACE: /lib64/libpthread.so.0[0x3b54e0f4a0] /lib64/libpthread.so.0[0x3b54e0f4a0] /home/y/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e083] /home/y/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a64c8] /home/y/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a7e30] /home/y/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e083] /home/y/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a64c8] /home/y/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a7e30] /home/y/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f506] /home/y/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526c50] /home/y/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f506] /home/y/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b548] /home/y/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526c50]
[jira] [Commented] (TS-1330) Logging related segfault in 3.2.0
[ https://issues.apache.org/jira/browse/TS-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13436984#comment-13436984 ] David Carlin commented on TS-1330: -- Still seeing this on a few different boxes occasionally: NOTE: Traffic Server received Sig 11: Segmentation fault /home/y/bin/traffic_server - STACK TRACE: /lib64/libpthread.so.0[0x35ea80f500] /home/y/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e083] /home/y/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a64c8] /home/y/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a7e30] /home/y/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f506] /home/y/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526c50] /home/y/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b548] /home/y/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x198)[0x52b868] /home/y/bin/traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0xde)[0x56c3ee] /home/y/bin/traffic_server[0x673871] /home/y/bin/traffic_server(_Z15write_to_net_ioP10NetHandlerP18UnixNetVConnectionP7EThread+0x847)[0x6756e7] /home/y/bin/traffic_server(_ZN10NetHandler12mainNetEventEiP5Event+0x286)[0x66e076] /home/y/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0xb4)[0x696ce4] /home/y/bin/traffic_server(_ZN7EThread7executeEv+0x4c3)[0x697673] /home/y/bin/traffic_server[0x695cb2] /lib64/libpthread.so.0[0x35ea807851] /lib64/libc.so.6(clone+0x6d)[0x35ea4e767d] NOTE: Traffic Server received Sig 11: Segmentation fault /home/y/bin/traffic_server - STACK TRACE: /lib64/libpthread.so.0[0x37d080f500] /home/y/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e083] /home/y/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a64c8] /home/y/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a7e30] /home/y/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f506] /home/y/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526c50] /home/y/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b548] /home/y/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x198)[0x52b868] /home/y/bin/traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0xde)[0x56c3ee] /home/y/bin/traffic_server[0x673871] /home/y/bin/traffic_server(_Z15write_to_net_ioP10NetHandlerP18UnixNetVConnectionP7EThread+0x847)[0x6756e7] /home/y/bin/traffic_server(_ZN10NetHandler12mainNetEventEiP5Event+0x286)[0x66e076] /home/y/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0xb4)[0x696ce4] /home/y/bin/traffic_server(_ZN7EThread7executeEv+0x4c3)[0x697673] /home/y/bin/traffic_server[0x695cb2] /lib64/libpthread.so.0[0x37d0807851] /lib64/libc.so.6(clone+0x6d)[0x37d04e767d] NOTE: Traffic Server received Sig 11: Segmentation fault NOTE: Traffic Server received Sig 11: Segmentation fault /home/y/bin/traffic_server - STACK TRACE: /home/y/bin/traffic_server - STACK TRACE: /lib64/libpthread.so.0[0x3ecd40f500] /lib64/libpthread.so.0[0x3ecd40f500] /home/y/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e083] /home/y/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a64c8] /home/y/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a7e30] /home/y/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e083] /home/y/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a64c8] /home/y/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f506] /home/y/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a7e30] /home/y/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526c50] /home/y/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f506] /home/y/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b548] /home/y/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x198)[0x52b868] /home/y/bin/traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0xde)[0x56c3ee] /home/y/bin/traffic_server[0x673871] /home/y/bin/traffic_server(_Z15write_to_net_ioP10NetHandlerP18UnixNetVConnectionP7EThread+0x847)[0x6756e7] /home/y/bin/traffic_server(_ZN10NetHandler12mainNetEventEiP5Event+0x286)[0x66e076] /home/y/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0xb4)[0x696ce4] /home/y/bin/traffic_server(_ZN7EThread7executeEv+0x4c3)[0x697673] /home/y/bin/traffic_server[0x695cb2] /lib64/libpthread.so.0[0x3ecd407851] /lib64/libc.so.6(clone+0x6d)[0x3ecd0e767d] /home/y/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526c50] /home/y/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b548] /home/y/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x198)[0x52b868] /home/y/bin/traffic_server[0x673151] /home/y/bin/traffic_server[0x676ca5] /home/y/bin/traffic_server(_ZN10NetHandler12mainNetEventEiP5Event+0x1f2)[0x66dfe2] /home/y/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0xb4)[0x696ce4] /home/y/bin/traffic_server(_ZN7EThread7executeEv+0x4c3)[0x697673] /home/y/bin/traffic_server(main+0xea4)[0x4c26a4]
[jira] [Commented] (TS-1330) Logging related segfault in 3.2.0
[ https://issues.apache.org/jira/browse/TS-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13437009#comment-13437009 ] Leif Hedstrom commented on TS-1330: --- Any chance you can produce some more detailed backtraces and information, from the core file? Bryan can probably help you if you need. The stacktraces from the logs are much less useful than the data from the core files. Logging related segfault in 3.2.0 - Key: TS-1330 URL: https://issues.apache.org/jira/browse/TS-1330 Project: Traffic Server Issue Type: Bug Components: Logging Affects Versions: 3.2.0 Environment: ATS 3.2.0 on RHEL 6.2 64-bit Reporter: David Carlin Fix For: 3.3.1 I observed the following crash once on one of our ATS boxes - possibly related to TS-1240 Jul 2 13:56:56 l2 traffic_server[25853]: {0x2b0a391e1700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer Jul 2 13:59:56 l2 kernel: [ET_NET 1][25855]: segfault at c ip 0058e083 sp 2b0a2982b740 error 6 Jul 2 13:59:56 l2 kernel: [ET_NET 3][25857]: segfault at 84 ip 0058e083 sp 2b0a29a31740 error 6 in traffic_server[40+34] Jul 2 13:59:56 l2 kernel: in traffic_server[40+34] Jul 2 14:02:59 l2 traffic_cop[25901]: (test) read timeout [18 ] Jul 2 14:02:59 l2 traffic_cop[25901]: server heartbeat failed [1] Jul 2 14:03:08 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL: (last system error 104: Connection reset by peer) Jul 2 14:03:09 l2 traffic_cop[25901]: cannot find traffic_server [1] Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} ERROR: [LocalManager::sendMgmtMsgToProcesses] Error writing message Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} ERROR: (last system error 32: Broken pipe) Jul 2 14:03:12 l2 traffic_cop[25901]: cop received child status signal [25826 35584] Jul 2 14:03:12 l2 traffic_cop[25901]: traffic_manager not running, making sure traffic_server is dead Jul 2 14:03:12 l2 traffic_cop[25901]: spawning traffic_manager Jul 2 14:03:13 l2 traffic_manager[18267]: NOTE: --- Manager Starting --- Jul 2 14:03:13 l2 traffic_manager[18267]: NOTE: Manager Version: Apache Traffic Server - traffic_manager - 3.2.0 - (build # 52518 on Jun 25 2012 at 18:22:12) Jul 2 14:03:13 l2 traffic_manager[18267]: {0x7fe63de3f7e0} STATUS: opened /home/y/logs/trafficserver/manager.log Jul 2 14:03:15 l2 traffic_server[18322]: NOTE: --- Server Starting --- Jul 2 14:03:15 l2 traffic_server[18322]: NOTE: Server Version: Apache Traffic Server - traffic_server - 3.2.0 - (build # 52518 on Jun 25 2012 at 18:22:31) Jul 2 14:03:15 l2 traffic_server[18322]: {0x2b77573ab860} STATUS: opened /home/y/logs/trafficserver/diags.log Jul 2 14:03:15 l2 traffic_server[18322]: {0x2b77573ab860} ERROR: Cannot insert duplicate! Jul 2 14:03:22 l2 traffic_cop[25901]: server heartbeat succeeded [Jul 2 13:56:56.304] Server {0x2b0a391e1700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer NOTE: Traffic Server received Sig 11: Segmentation fault NOTE: Traffic Server received Sig 11: Segmentation fault /home/y/bin/traffic_server - STACK TRACE: /home/y/bin/traffic_server - STACK TRACE: /lib64/libpthread.so.0[0x3b54e0f4a0] /lib64/libpthread.so.0[0x3b54e0f4a0] /home/y/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e083] /home/y/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a64c8] /home/y/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a7e30] /home/y/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e083] /home/y/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a64c8] /home/y/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a7e30] /home/y/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f506] /home/y/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526c50] /home/y/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f506] /home/y/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b548] /home/y/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526c50] /home/y/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b548] /home/y/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x198)[0x52b868] /home/y/bin/traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0xde)[0x56c3ee] /home/y/bin/traffic_server[0x6736a1] /home/y/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x198)[0x52b868]
[jira] [Commented] (TS-1330) Logging related segfault in 3.2.0
[ https://issues.apache.org/jira/browse/TS-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13437112#comment-13437112 ] David Carlin commented on TS-1330: -- I'm not able to find core dumps for this issue. Whats in traffic.out is all I have. There are core dumps on these boxes for other crashes, but for some reason these events don't generate one. Logging related segfault in 3.2.0 - Key: TS-1330 URL: https://issues.apache.org/jira/browse/TS-1330 Project: Traffic Server Issue Type: Bug Components: Logging Affects Versions: 3.2.0 Environment: ATS 3.2.0 on RHEL 6.2 64-bit Reporter: David Carlin Fix For: 3.3.1 I observed the following crash once on one of our ATS boxes - possibly related to TS-1240 Jul 2 13:56:56 l2 traffic_server[25853]: {0x2b0a391e1700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer Jul 2 13:59:56 l2 kernel: [ET_NET 1][25855]: segfault at c ip 0058e083 sp 2b0a2982b740 error 6 Jul 2 13:59:56 l2 kernel: [ET_NET 3][25857]: segfault at 84 ip 0058e083 sp 2b0a29a31740 error 6 in traffic_server[40+34] Jul 2 13:59:56 l2 kernel: in traffic_server[40+34] Jul 2 14:02:59 l2 traffic_cop[25901]: (test) read timeout [18 ] Jul 2 14:02:59 l2 traffic_cop[25901]: server heartbeat failed [1] Jul 2 14:03:08 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} FATAL: (last system error 104: Connection reset by peer) Jul 2 14:03:09 l2 traffic_cop[25901]: cannot find traffic_server [1] Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} ERROR: [LocalManager::sendMgmtMsgToProcesses] Error writing message Jul 2 14:03:09 l2 traffic_manager[25826]: {0x7f3f088607e0} ERROR: (last system error 32: Broken pipe) Jul 2 14:03:12 l2 traffic_cop[25901]: cop received child status signal [25826 35584] Jul 2 14:03:12 l2 traffic_cop[25901]: traffic_manager not running, making sure traffic_server is dead Jul 2 14:03:12 l2 traffic_cop[25901]: spawning traffic_manager Jul 2 14:03:13 l2 traffic_manager[18267]: NOTE: --- Manager Starting --- Jul 2 14:03:13 l2 traffic_manager[18267]: NOTE: Manager Version: Apache Traffic Server - traffic_manager - 3.2.0 - (build # 52518 on Jun 25 2012 at 18:22:12) Jul 2 14:03:13 l2 traffic_manager[18267]: {0x7fe63de3f7e0} STATUS: opened /home/y/logs/trafficserver/manager.log Jul 2 14:03:15 l2 traffic_server[18322]: NOTE: --- Server Starting --- Jul 2 14:03:15 l2 traffic_server[18322]: NOTE: Server Version: Apache Traffic Server - traffic_server - 3.2.0 - (build # 52518 on Jun 25 2012 at 18:22:31) Jul 2 14:03:15 l2 traffic_server[18322]: {0x2b77573ab860} STATUS: opened /home/y/logs/trafficserver/diags.log Jul 2 14:03:15 l2 traffic_server[18322]: {0x2b77573ab860} ERROR: Cannot insert duplicate! Jul 2 14:03:22 l2 traffic_cop[25901]: server heartbeat succeeded [Jul 2 13:56:56.304] Server {0x2b0a391e1700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer NOTE: Traffic Server received Sig 11: Segmentation fault NOTE: Traffic Server received Sig 11: Segmentation fault /home/y/bin/traffic_server - STACK TRACE: /home/y/bin/traffic_server - STACK TRACE: /lib64/libpthread.so.0[0x3b54e0f4a0] /lib64/libpthread.so.0[0x3b54e0f4a0] /home/y/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e083] /home/y/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a64c8] /home/y/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a7e30] /home/y/bin/traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x153)[0x58e083] /home/y/bin/traffic_server(_ZN9LogObject15_checkout_writeEPmm+0xa8)[0x5a64c8] /home/y/bin/traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x2f0)[0x5a7e30] /home/y/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f506] /home/y/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526c50] /home/y/bin/traffic_server(_ZN3Log6accessEP9LogAccess+0x146)[0x58f506] /home/y/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b548] /home/y/bin/traffic_server(_ZN6HttpSM12update_statsEv+0x630)[0x526c50] /home/y/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x928)[0x52b548] /home/y/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x198)[0x52b868] /home/y/bin/traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0xde)[0x56c3ee] /home/y/bin/traffic_server[0x6736a1] /home/y/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x198)[0x52b868] /home/y/bin/traffic_server(_Z15write_to_net_ioP10NetHandlerP18UnixNetVConnectionP7EThread+0x847)[0x675517]