[GitHub] trafficserver issue #1401: Segfault in write_to_net_io with 7.1.x
Github user oknet commented on the issue: https://github.com/apache/trafficserver/issues/1401 @zwoop @bryancall I think we can close this issue since #1522 and #1559 have merged. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] trafficserver issue #1401: Segfault in write_to_net_io with 7.1.x
Github user scw00 commented on the issue: https://github.com/apache/trafficserver/issues/1401 If epoll_wait triggers EVENTIO_ERROR (on read or write), it causes a segfault, because we have not set any state in the vc (by calling do_io_xxx); we just receive sockets and register them with epoll_wait. In 6.x.x we do not handle EVENTIO_ERROR, which may leak the vc but avoids the core dump. Calling do_io_xx before registering, or closing the vc directly (so it does not enter the transaction), may fix it, but this is still being tested.
[GitHub] trafficserver issue #1401: Segfault in write_to_net_io with 7.1.x
Github user scw00 commented on the issue: https://github.com/apache/trafficserver/issues/1401 An EVENTIO_ERROR triggered by epoll_wait after accept (do_blocking_accept) will cause a core dump, because we have not called any do_io_xx. In 6.x.x we do not handle the EVENTIO_ERROR event; this may leak the vc, but it avoids the core dump.

```
while ((vc = write_ready_list.dequeue())) {
  set_cont_flags(vc->control_flags);
  if (vc->closed)
    close_UnixNetVConnection(vc, trigger_event->ethread);
  else if ((vc->write.enabled || vc->write.error) && vc->write.triggered)
    write_to_net(this, vc, trigger_event->ethread);
  else if (!vc->write.enabled) {
    write_ready_list.remove(vc);
#if defined(solaris)
    if (vc->write.triggered && vc->read.enabled) {
      vc->ep.modify(-EVENTIO_WRITE);
      vc->ep.refresh(EVENTIO_READ);
      vc->readReschedule(this);
    }
#endif
  }
}
```
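To make the point about the branch condition concrete, here is a minimal sketch of why an error event on a freshly accepted connection reaches write_to_net even though writing was never enabled. The `Sketch*` types and both helper functions are hypothetical stand-ins invented for this illustration, not the real Traffic Server definitions; only the boolean condition is taken from the loop quoted above.

```cpp
#include <cassert>

// Hypothetical stand-ins for the real types; field names are chosen to
// mirror the ones visible in the thread, nothing more.
struct SketchMutex {};

struct SketchVIO {
  SketchMutex *mutex = nullptr; // assumed to stay null until a do_io_* call runs
};

struct SketchNetState {
  bool enabled   = false;
  bool error     = false;
  bool triggered = false;
  SketchVIO vio;
};

// Same shape as the quoted condition:
// (vc->write.enabled || vc->write.error) && vc->write.triggered
inline bool would_call_write_to_net(const SketchNetState &s)
{
  return (s.enabled || s.error) && s.triggered;
}

// Models an error event arriving right after accept, before any do_io_*.
inline SketchNetState accepted_then_error()
{
  SketchNetState s;   // no do_io_* yet, so vio.mutex is still null
  s.error     = true; // set by the error event
  s.triggered = true;
  return s;
}
```

With these assumptions, `would_call_write_to_net(accepted_then_error())` is true while `vio.mutex` is still null, which matches the `enabled = 0, error = 1, triggered = 1` state reported from the core dumps elsewhere in this thread.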
[GitHub] trafficserver issue #1401: Segfault in write_to_net_io with 7.1.x
Github user shinrich commented on the issue: https://github.com/apache/trafficserver/issues/1401 Finally get a use-after-free ASAN stack in this area. Anyone else having problems with ASAN in newer builds? Looks like it is showing a use after free in the case of the error bubbling.

```
==30868==ERROR: AddressSanitizer: heap-use-after-free on address 0x624001933448 at pc 0x5afa20 bp 0x7fffeaefe7e0 sp 0x7fffeaefe7d8
READ of size 8 at 0x624001933448 thread T17 ([ET_NET 15])
    #0 0x5afa1f in Continuation::handleEvent(int, void*) ../../../../trafficserver/iocore/eventsystem/I_Continuation.h:153
    #1 0xae0c33 in write_signal_and_update ../../../../trafficserver/iocore/net/UnixNetVConnection.cc:176
    #2 0xae10ac in write_signal_done ../../../../trafficserver/iocore/net/UnixNetVConnection.cc:218
    #3 0xae11b2 in write_signal_error ../../../../trafficserver/iocore/net/UnixNetVConnection.cc:237
    #4 0xae2a1e in write_to_net_io(NetHandler*, UnixNetVConnection*, EThread*) ../../../../trafficserver/iocore/net/UnixNetVConnection.cc:458
    #5 0xae25e5 in write_to_net(NetHandler*, UnixNetVConnection*, EThread*) ../../../../trafficserver/iocore/net/UnixNetVConnection.cc:430
    #6 0xace638 in NetHandler::mainNetEvent(int, Event*) ../../../../trafficserver/iocore/net/UnixNet.cc:526
    #7 0x5afb30 in Continuation::handleEvent(int, void*) ../../../../trafficserver/iocore/eventsystem/I_Continuation.h:153
    #8 0xb32866 in EThread::process_event(Event*, int) ../../../../trafficserver/iocore/eventsystem/UnixEThread.cc:143
    #9 0xb33487 in EThread::execute() ../../../../trafficserver/iocore/eventsystem/UnixEThread.cc:270
    #10 0xb3101b in spawn_thread_internal ../../../../trafficserver/iocore/eventsystem/Thread.cc:84
    #11 0x7568aaa0 in start_thread (/lib64/libpthread.so.0+0x7aa0)
    #12 0x74fbd93c in clone (/lib64/libc.so.6+0xe893c)

0x624001933448 is located 4936 bytes inside of 7728-byte region [0x624001932100,0x624001933f30)
freed by thread T17 ([ET_NET 15]) here:
    #0 0x549cb7 in free (/home/y/bin64/traffic_server+0x549cb7)
    #1 0x77b96c79 in ats_memalign_free ../../../../trafficserver/lib/ts/ink_memory.cc:141
    #2 0x77b989be in malloc_free ../../../../trafficserver/lib/ts/ink_queue.cc:322
    #3 0x77b986e8 in ink_freelist_free ../../../../trafficserver/lib/ts/ink_queue.cc:276
    #4 0x75bc20 in ClassAllocator::free(HttpSM*) /var/builds/workspace/163866-v3-component/BUILD_CONTAINER/rhel6-gcc5_5/label/DOCKER-HIGH/app_root/_build/asan_build/../../trafficserver/lib/ts/Allocator.h:135
    #5 0x708afe in HttpSM::destroy() ../../../../trafficserver/proxy/http/HttpSM.cc:365
    #6 0x7459ad in HttpSM::kill_this() ../../../../trafficserver/proxy/http/HttpSM.cc:6951
    #7 0x71dcb9 in HttpSM::main_handler(int, void*) ../../../../trafficserver/proxy/http/HttpSM.cc:2678
    #8 0x5afb30 in Continuation::handleEvent(int, void*) ../../../../trafficserver/iocore/eventsystem/I_Continuation.h:153
    #9 0x7f50f6 in HttpTunnel::main_handler(int, void*) ../../../../trafficserver/proxy/http/HttpTunnel.cc:1662
    #10 0x5afb30 in Continuation::handleEvent(int, void*) ../../../../trafficserver/iocore/eventsystem/I_Continuation.h:153
    #11 0xae0c33 in write_signal_and_update ../../../../trafficserver/iocore/net/UnixNetVConnection.cc:176
    #12 0xae10ac in write_signal_done ../../../../trafficserver/iocore/net/UnixNetVConnection.cc:218
    #13 0xae3588 in write_to_net_io(NetHandler*, UnixNetVConnection*, EThread*) ../../../../trafficserver/iocore/net/UnixNetVConnection.cc:596
    #14 0xae25e5 in write_to_net(NetHandler*, UnixNetVConnection*, EThread*) ../../../../trafficserver/iocore/net/UnixNetVConnection.cc:430
    #15 0xace638 in NetHandler::mainNetEvent(int, Event*) ../../../../trafficserver/iocore/net/UnixNet.cc:526
    #16 0x5afb30 in Continuation::handleEvent(int, void*) ../../../../trafficserver/iocore/eventsystem/I_Continuation.h:153
    #17 0xb32866 in EThread::process_event(Event*, int) ../../../../trafficserver/iocore/eventsystem/UnixEThread.cc:143
    #18 0xb33487 in EThread::execute() ../../../../trafficserver/iocore/eventsystem/UnixEThread.cc:270
    #19 0xb3101b in spawn_thread_internal ../../../../trafficserver/iocore/eventsystem/Thread.cc:84
    #20 0x7568aaa0 in start_thread (/lib64/libpthread.so.0+0x7aa0)

previously allocated by thread T17 ([ET_NET 15]) here:
    #0 0x54a42b in posix_memalign (/home/y/bin64/traffic_server+0x54a42b)
    #1 0x77b96afa in ats_memalign ../../../../trafficserver/lib/ts/ink_memory.cc:102
    #2 0x77b984a5 in malloc_new ../../../../trafficserver/lib/ts/ink_queue.cc:260
    #3 0x77b97e57 in ink_freelist_new ../../../../trafficserver/lib/ts/ink_queue.cc:183
    #4 0x648f31 in
```
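The trace above shows a handler (the HttpSM) being freed from inside one write signal and then being signaled again on the error path. As a rough illustration of that failure mode and of one generic way to detect it, the sketch below uses `std::weak_ptr` so the signaling side can tell that its handler is gone. This is a swapped-in standard C++ technique, not how Traffic Server manages continuations, and `SketchHandler`, `SketchWriteSide`, and `signal_write` are hypothetical names.

```cpp
#include <cassert>
#include <memory>

// Hypothetical handler standing in for the state machine that receives events.
struct SketchHandler {
  int events_seen = 0;
  void handle_event(int /*event*/) { ++events_seen; }
};

// Hypothetical write side that holds a non-owning reference to its handler.
struct SketchWriteSide {
  std::weak_ptr<SketchHandler> cont;
};

// Delivers the event only if the handler is still alive.
// Returns true when the event was delivered, false when the handler is gone.
inline bool signal_write(SketchWriteSide &w, int event)
{
  if (std::shared_ptr<SketchHandler> h = w.cont.lock()) {
    h->handle_event(event);
    return true;
  }
  return false; // a raw pointer here would be the use-after-free
}
```

In this model, a second `signal_write` after the handler has been destroyed returns false instead of touching freed memory; with a raw pointer in place of the `weak_ptr`, the same sequence corresponds to the read that ASAN flags in frame #0.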
[GitHub] trafficserver issue #1401: Segfault in write_to_net_io with 7.1.x
Github user PSUdaemon commented on the issue: https://github.com/apache/trafficserver/issues/1401 I'm not sure there is a core to look at in the case I pasted. It was on a build host.
[GitHub] trafficserver issue #1401: Segfault in write_to_net_io with 7.1.x
Github user shinrich commented on the issue: https://github.com/apache/trafficserver/issues/1401 BTW I'm currently testing without HTTP/2.
[GitHub] trafficserver issue #1401: Segfault in write_to_net_io with 7.1.x
Github user shinrich commented on the issue: https://github.com/apache/trafficserver/issues/1401 @PSUdaemon would be cool to see symbols with the stack to verify that is the same thing. My hacky fix made it go away only to be immediately replaced by a new crash. (New issue to be posted).
[GitHub] trafficserver issue #1401: Segfault in write_to_net_io with 7.1.x
Github user PSUdaemon commented on the issue: https://github.com/apache/trafficserver/issues/1401 I think I am [seeing this](https://ci.trafficserver.apache.org/job/ubuntu_14_04-6.2.x/compiler=clang,label=ubuntu_14_04,type=release/143/console) in 6.2.x as well:

```
traffic_server: Segmentation fault (Address not mapped to object [0x18])
traffic_server - STACK TRACE:
/var/jenkins/workspace/ubuntu_14_04-6.2.x/compiler/clang/label/ubuntu_14_04/type/release/143/install/bin/traffic_server(_Z19crash_logger_invokeiP9siginfo_tPv+0x8e)[0x4a234e]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7fb6fa205330]
/var/jenkins/workspace/ubuntu_14_04-6.2.x/compiler/clang/label/ubuntu_14_04/type/release/143/install/bin/traffic_server(_Z15write_to_net_ioP10NetHandlerP18UnixNetVConnectionP7EThread+0xa1e)[0x6bb3ae]
/var/jenkins/workspace/ubuntu_14_04-6.2.x/compiler/clang/label/ubuntu_14_04/type/release/143/install/bin/traffic_server(_ZN10NetHandler12mainNetEventEiP5Event+0x686)[0x6b4606]
/var/jenkins/workspace/ubuntu_14_04-6.2.x/compiler/clang/label/ubuntu_14_04/type/release/143/install/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0x123)[0x6d7af3]
/var/jenkins/workspace/ubuntu_14_04-6.2.x/compiler/clang/label/ubuntu_14_04/type/release/143/install/bin/traffic_server(_ZN7EThread7executeEv+0x560)[0x6d8210]
/var/jenkins/workspace/ubuntu_14_04-6.2.x/compiler/clang/label/ubuntu_14_04/type/release/143/install/bin/traffic_server(main+0x1de0)[0x4c8bd0]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fb6f93ccf45]
/var/jenkins/workspace/ubuntu_14_04-6.2.x/compiler/clang/label/ubuntu_14_04/type/release/143/install/bin/traffic_server[0x49218d]
/home/jenkins/bin/regression.sh: line 24: 21351 Segmentation fault (core dumped) "${WORKSPACE}/${BUILD_NUMBER}/install/bin/traffic_server" -k -K -R 1
```
[GitHub] trafficserver issue #1401: Segfault in write_to_net_io with 7.1.x
Github user shinrich commented on the issue: https://github.com/apache/trafficserver/issues/1401 I've finally got my environment running and I see the same stack very quickly as well. In the cases I've seen it looks like there was a write error, but for some reason the write vio has been cleared out (or was never set?)

```
(gdb) frame 2
#2 0x00788350 in write_to_net_io (nh=0x2af588003e60, vc=0x2aad1401b800, thread=0x2af58810) at UnixNetVConnection.cc:440
440 UnixNetVConnection.cc: No such file or directory.
        in UnixNetVConnection.cc
(gdb) print *s
$1 = {enabled = 0, error = 1, vio = {_cont = 0x0, nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, vc_server = 0x0, mutex = {m_ptr = 0x0}}, ready_link = {= {next = 0x0}, prev = 0x0}, enable_link = {next = 0x0}, in_enabled_list = 0, triggered = 1}
```

The write.error stuff was added by Thomas in TS-4796, but if this just showed up between 7.0 and 7.1, it is unlikely that this was the culprit. I think it has been a while (since 9/3/2016). Seems more likely that someone has cleared the vio, or we are bouncing an error.

In the short term I'm adding a NULL check at the beginning of write_to_net_io, but that just seems to be masking the failure case rather than identifying the root cause.

```
+  if (!s->vio.mutex) {
+    ink_release_assert(s->vio._cont == NULL && vc->write.error);
+    return;
+  }
```
[GitHub] trafficserver issue #1401: Segfault in write_to_net_io with 7.1.x
Github user bryancall commented on the issue: https://github.com/apache/trafficserver/issues/1401

```
(gdb) bt full
#0 0x005150b0 in Mutex_trylock (m=0x0, t=0x2b44f3a6d010) at /home/bcall/dev/yahoo/build/_build/ats_build/../../trafficserver/iocore/eventsystem/I_Lock.h:289
No locals.
#1 0x0051526f in MutexTryLock::MutexTryLock (this=0x2b44f9c35be0, am=..., t=0x2b44f3a6d010) at /home/bcall/dev/yahoo/build/_build/ats_build/../../trafficserver/iocore/eventsystem/I_Lock.h:555
No locals.
#2 0x00787888 in write_to_net_io (nh=0x2b44f3a70e60, vc=0x2aac5002bf10, thread=0x2b44f3a6d010) at ../../../../trafficserver/iocore/net/UnixNetVConnection.cc:439
        s = 0x2aac5002c098
        mutex = 0x2b44f40035d0
        lock = {m = {m_ptr = 0x0}, lock_acquired = 16}
        ntodo = 47575248100432
        buf = @0x2b9de60
        towrite = 47575145566224
        signalled = 10
        needs = 2147483647
        total_written = 3807
        r = 44090535680
#3 0x0078781e in write_to_net (nh=0x2b44f3a70e60, vc=0x2aac5002bf10, thread=0x2b44f3a6d010) at ../../../../trafficserver/iocore/net/UnixNetVConnection.cc:430
        mutex = 0x2b44f40035d0
#4 0x0077f443 in NetHandler::mainNetEvent (this=0x2b44f3a70e60, event=5, e=0x2cb5860) at ../../../../trafficserver/iocore/net/UnixNet.cc:526
        epd = 0x2aac5c02dd70
        poll_timeout = 0
        pd = 0x2b45a010
        vc = 0x2aac5002bf10
        __func__ = "mainNetEvent"
#5 0x00515354 in Continuation::handleEvent (this=0x2b44f3a70e60, event=5, data=0x2cb5860) at /home/bcall/dev/yahoo/build/_build/ats_build/../../trafficserver/iocore/eventsystem/I_Continuation.h:153
No locals.
#6 0x007a8bc3 in EThread::process_event (this=0x2b44f3a6d010, e=0x2cb5860, calling_code=5) at ../../../../trafficserver/iocore/eventsystem/UnixEThread.cc:143
        c_temp = 0x2b44f3a70e60
        lock = {m = {m_ptr = 0x2b44f4001f50}, lock_acquired = true}
#7 0x007a90c4 in EThread::execute (this=0x2b44f3a6d010) at ../../../../trafficserver/iocore/eventsystem/UnixEThread.cc:270
        done_one = false
        e = 0x2cb5860
        NegativeQueue = {> = {head = 0x0}, tail = 0x0}
        next_time = 1486425567363334599
#8 0x007a8279 in spawn_thread_internal (a=0x2ba5fd0) at ../../../../trafficserver/iocore/eventsystem/Thread.cc:84
        p = 0x2ba5fd0
#9 0x2b44f0c14aa1 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#10 0x2b44f0960aad in clone () from /lib64/libc.so.6
```
[GitHub] trafficserver issue #1401: Segfault in write_to_net_io with 7.1.x
Github user zwoop commented on the issue: https://github.com/apache/trafficserver/issues/1401 Are you getting a core file / backtrace?
[GitHub] trafficserver issue #1401: Segfault in write_to_net_io with 7.1.x
Github user biilmann commented on the issue: https://github.com/apache/trafficserver/issues/1401 Not this one no. We've seen a crash that looks related in the crash logs from 7.0.x but it is much rarer and I don't currently have a core-dump of that.
[GitHub] trafficserver issue #1401: Segfault in write_to_net_io with 7.1.x
Github user zwoop commented on the issue: https://github.com/apache/trafficserver/issues/1401 You didn't see this crasher in 7.0.0, right?
[GitHub] trafficserver issue #1401: Segfault in write_to_net_io with 7.1.x
Github user biilmann commented on the issue: https://github.com/apache/trafficserver/issues/1401 Some info on the vc:

```
p vc
$1 = (UnixNetVConnection *) 0x2b9a7802ff10
p vc.closed
$2 = 0
p vc.action_
$3 = {_vptr.Action = 0x788a90 , continuation = 0x23dcdf0, mutex = {m_ptr = 0x0}, cancelled = 0}
p vc.inactivity_timeout_in
$4 = 864000
p vc.active_timeout_in
$5 = 9000
```

Let me know if there's anything specific on the vc that would help...
[GitHub] trafficserver issue #1401: Segfault in write_to_net_io with 7.1.x
Github user bryancall commented on the issue: https://github.com/apache/trafficserver/issues/1401 What does the vc look like?
[GitHub] trafficserver issue #1401: Segfault in write_to_net_io with 7.1.x
GitHub user biilmann opened an issue: https://github.com/apache/trafficserver/issues/1401 Segfault in write_to_net_io with 7.1.x

Seeing frequent segfaults when trying 7.1.x on some production traffic. What happens is that the `MUTEX_TRY_LOCK_FOR` segfaults since `s->vio.mutex` is NULL: https://github.com/apache/trafficserver/blob/master/iocore/net/UnixNetVConnection.cc#L439

Digging into a core dump shows that this happens when `s->enabled == 0` and `s->error == 1`. Inspecting `s->vio` shows that it's an empty vio instance with a null pointer mutex.
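As a rough model of the crash described in this report, the sketch below shows a try-lock that checks for a missing mutex before dereferencing it. It is written against `std::mutex` rather than the Traffic Server locking macros, and `SketchVIO`, `LockResult`, and `try_lock_vio` are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <mutex>
#include <utility>

// Hypothetical stand-in for a VIO whose mutex may never have been set.
struct SketchVIO {
  std::mutex *mutex = nullptr; // null models the "empty vio instance" in the report
  void *cont        = nullptr;
};

enum class LockResult { NoMutex, Busy, Acquired };

// Try-lock that reports a missing mutex instead of dereferencing it.
// On success the acquired lock is moved into `out` so the caller holds it.
inline LockResult try_lock_vio(SketchVIO &vio, std::unique_lock<std::mutex> &out)
{
  if (vio.mutex == nullptr) {
    return LockResult::NoMutex; // the case that segfaults without a check
  }
  std::unique_lock<std::mutex> lk(*vio.mutex, std::try_to_lock);
  if (!lk.owns_lock()) {
    return LockResult::Busy;
  }
  out = std::move(lk);
  return LockResult::Acquired;
}
```

Under these assumptions, a default-constructed `SketchVIO` yields `LockResult::NoMutex`, which is the point at which a caller could bail out early. Such a check only avoids the null dereference; it does not explain why the vio was empty in the first place.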