[GitHub] trafficserver issue #1401: Segfault in write_to_net_io with 7.1.x

2017-03-13 Thread oknet
Github user oknet commented on the issue:

https://github.com/apache/trafficserver/issues/1401
  
@zwoop @bryancall 
I think we can close this issue since #1522 and #1559 have merged.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] trafficserver issue #1401: Segfault in write_to_net_io with 7.1.x

2017-03-01 Thread scw00
Github user scw00 commented on the issue:

https://github.com/apache/trafficserver/issues/1401
  
An EVENTIO_ERROR from epoll_wait (on read or write) can cause the segfault, 
because no state has been set on the vc yet (no do_io_xxx has been called). We 
just receive the socket and register it with epoll.
In 6.x.x, we do not handle EVENTIO_ERROR. That may leak the vc, but it 
avoids the coredump.
Calling do_io_xxx before registering, or closing the vc directly (so it never 
enters the transaction), may fix it, but this is still being tested.
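
A rough sketch of that ordering problem, in case it helps the discussion. This is a simplified model and not the ATS code: `ToyVC`, `toy_do_io_write`, `toy_handle_error`, and `ErrorAction` are hypothetical names standing in for the vc, do_io_write, and the error path. The assumption is that the error path can only signal a continuation once some do_io call has attached its mutex, and otherwise has to close the vc directly.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical stand-in for a net vc; not the real UnixNetVConnection.
struct ToyVC {
  std::shared_ptr<std::mutex> write_mutex; // stays null until toy_do_io_write runs
  bool write_error = false;
  bool closed      = false;
};

// Models do_io_write: attaches the mutex of the continuation that will handle I/O.
inline void toy_do_io_write(ToyVC &vc, std::shared_ptr<std::mutex> cont_mutex) {
  vc.write_mutex = std::move(cont_mutex);
}

enum class ErrorAction { Signalled, ClosedDirectly };

// Models the error path taken when the poller reports an error on the socket.
inline ErrorAction toy_handle_error(ToyVC &vc) {
  vc.write_error = true;
  if (!vc.write_mutex) {
    // No do_io call has happened yet, so there is nobody to signal
    // and no mutex to lock: close the vc instead of dereferencing null.
    vc.closed = true;
    return ErrorAction::ClosedDirectly;
  }
  // Simplification: the real code uses a try-lock and reschedules on failure.
  std::lock_guard<std::mutex> lock(*vc.write_mutex);
  return ErrorAction::Signalled;
}
```

With this model, an error on a freshly accepted `ToyVC` closes it, while an error after `toy_do_io_write` takes the normal signalling path.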




[GitHub] trafficserver issue #1401: Segfault in write_to_net_io with 7.1.x

2017-03-01 Thread scw00
Github user scw00 commented on the issue:

https://github.com/apache/trafficserver/issues/1401
  
An EVENTIO_ERROR triggered by epoll_wait after accept (do_blocking_accept) will 
cause a coredump, because no do_io_xxx has been called yet. In 6.x.x we do not 
handle the EVENTIO_ERROR event; that may leak the vc, but it avoids the coredump.
```
  while ((vc = write_ready_list.dequeue())) {
    set_cont_flags(vc->control_flags);
    if (vc->closed)
      close_UnixNetVConnection(vc, trigger_event->ethread);
    else if ((vc->write.enabled || vc->write.error) && vc->write.triggered)
      write_to_net(this, vc, trigger_event->ethread);
    else if (!vc->write.enabled) {
      write_ready_list.remove(vc);
#if defined(solaris)
      if (vc->write.triggered && vc->read.enabled) {
        vc->ep.modify(-EVENTIO_WRITE);
        vc->ep.refresh(EVENTIO_READ);
        vc->readReschedule(this);
      }
#endif
    }
  }
```
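
To make the dispatch condition in that loop easier to reason about, here is a minimal sketch of just the predicate. `ToyNetState`, `dispatches_to_write`, and `reaches_write_without_mutex` are hypothetical names, the `has_vio_mutex` flag is an assumption standing in for "some do_io call has set up the VIO", and the `vc->closed` branch is ignored.

```cpp
#include <cassert>

// Hypothetical mirror of the per-direction flags on the vc.
struct ToyNetState {
  bool enabled       = false;
  bool error         = false;
  bool triggered     = false;
  bool has_vio_mutex = false; // assumption: set once a do_io call has run
};

// Same condition as the `else if` branch that calls write_to_net.
inline bool dispatches_to_write(const ToyNetState &w) {
  return (w.enabled || w.error) && w.triggered;
}

// True when the vc would reach the write path with no VIO mutex to lock.
inline bool reaches_write_without_mutex(const ToyNetState &w) {
  return dispatches_to_write(w) && !w.has_vio_mutex;
}
```

A state with `enabled = 0`, `error = 1`, `triggered = 1` and no VIO set up passes the first predicate, which matches the values reported from the core dumps in this issue.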





[GitHub] trafficserver issue #1401: Segfault in write_to_net_io with 7.1.x

2017-02-25 Thread shinrich
Github user shinrich commented on the issue:

https://github.com/apache/trafficserver/issues/1401
  
Finally got a use-after-free ASAN stack in this area.  Anyone else having 
problems with ASAN in newer builds?

It looks like it is showing a use-after-free in the case of the error 
bubbling.  

{code}
==30868==ERROR: AddressSanitizer: heap-use-after-free on address 
0x624001933448 at pc 0x5afa20 bp 0x7fffeaefe7e0 sp 0x7fffeaefe7d8
READ of size 8 at 0x624001933448 thread T17 ([ET_NET 15])
#0 0x5afa1f in Continuation::handleEvent(int, void*) 
../../../../trafficserver/iocore/eventsystem/I_Continuation.h:153
#1 0xae0c33 in write_signal_and_update 
../../../../trafficserver/iocore/net/UnixNetVConnection.cc:176
#2 0xae10ac in write_signal_done 
../../../../trafficserver/iocore/net/UnixNetVConnection.cc:218
#3 0xae11b2 in write_signal_error 
../../../../trafficserver/iocore/net/UnixNetVConnection.cc:237
#4 0xae2a1e in write_to_net_io(NetHandler*, UnixNetVConnection*, 
EThread*) ../../../../trafficserver/iocore/net/UnixNetVConnection.cc:458
#5 0xae25e5 in write_to_net(NetHandler*, UnixNetVConnection*, EThread*) 
../../../../trafficserver/iocore/net/UnixNetVConnection.cc:430
#6 0xace638 in NetHandler::mainNetEvent(int, Event*) 
../../../../trafficserver/iocore/net/UnixNet.cc:526
#7 0x5afb30 in Continuation::handleEvent(int, void*) 
../../../../trafficserver/iocore/eventsystem/I_Continuation.h:153
#8 0xb32866 in EThread::process_event(Event*, int) 
../../../../trafficserver/iocore/eventsystem/UnixEThread.cc:143
#9 0xb33487 in EThread::execute() 
../../../../trafficserver/iocore/eventsystem/UnixEThread.cc:270
#10 0xb3101b in spawn_thread_internal 
../../../../trafficserver/iocore/eventsystem/Thread.cc:84
#11 0x7568aaa0 in start_thread (/lib64/libpthread.so.0+0x7aa0)
#12 0x74fbd93c in clone (/lib64/libc.so.6+0xe893c)

0x624001933448 is located 4936 bytes inside of 7728-byte region 
[0x624001932100,0x624001933f30)
freed by thread T17 ([ET_NET 15]) here:
#0 0x549cb7 in free (/home/y/bin64/traffic_server+0x549cb7)
#1 0x77b96c79 in ats_memalign_free 
../../../../trafficserver/lib/ts/ink_memory.cc:141
#2 0x77b989be in malloc_free 
../../../../trafficserver/lib/ts/ink_queue.cc:322
#3 0x77b986e8 in ink_freelist_free 
../../../../trafficserver/lib/ts/ink_queue.cc:276
#4 0x75bc20 in ClassAllocator::free(HttpSM*) 
/var/builds/workspace/163866-v3-component/BUILD_CONTAINER/rhel6-gcc5_5/label/DOCKER-HIGH/app_root/_build/asan_build/../../trafficserver/lib/ts/Allocator.h:135
#5 0x708afe in HttpSM::destroy() 
../../../../trafficserver/proxy/http/HttpSM.cc:365
#6 0x7459ad in HttpSM::kill_this() 
../../../../trafficserver/proxy/http/HttpSM.cc:6951
#7 0x71dcb9 in HttpSM::main_handler(int, void*) 
../../../../trafficserver/proxy/http/HttpSM.cc:2678
#8 0x5afb30 in Continuation::handleEvent(int, void*) 
../../../../trafficserver/iocore/eventsystem/I_Continuation.h:153
#9 0x7f50f6 in HttpTunnel::main_handler(int, void*) 
../../../../trafficserver/proxy/http/HttpTunnel.cc:1662
#10 0x5afb30 in Continuation::handleEvent(int, void*) 
../../../../trafficserver/iocore/eventsystem/I_Continuation.h:153
#11 0xae0c33 in write_signal_and_update 
../../../../trafficserver/iocore/net/UnixNetVConnection.cc:176
#12 0xae10ac in write_signal_done 
../../../../trafficserver/iocore/net/UnixNetVConnection.cc:218
#13 0xae3588 in write_to_net_io(NetHandler*, UnixNetVConnection*, 
EThread*) ../../../../trafficserver/iocore/net/UnixNetVConnection.cc:596
#14 0xae25e5 in write_to_net(NetHandler*, UnixNetVConnection*, 
EThread*) ../../../../trafficserver/iocore/net/UnixNetVConnection.cc:430
#15 0xace638 in NetHandler::mainNetEvent(int, Event*) 
../../../../trafficserver/iocore/net/UnixNet.cc:526
#16 0x5afb30 in Continuation::handleEvent(int, void*) 
../../../../trafficserver/iocore/eventsystem/I_Continuation.h:153
#17 0xb32866 in EThread::process_event(Event*, int) 
../../../../trafficserver/iocore/eventsystem/UnixEThread.cc:143
#18 0xb33487 in EThread::execute() 
../../../../trafficserver/iocore/eventsystem/UnixEThread.cc:270
#19 0xb3101b in spawn_thread_internal 
../../../../trafficserver/iocore/eventsystem/Thread.cc:84
#20 0x7568aaa0 in start_thread (/lib64/libpthread.so.0+0x7aa0)

previously allocated by thread T17 ([ET_NET 15]) here:
#0 0x54a42b in posix_memalign (/home/y/bin64/traffic_server+0x54a42b)
#1 0x77b96afa in ats_memalign 
../../../../trafficserver/lib/ts/ink_memory.cc:102
#2 0x77b984a5 in malloc_new 
../../../../trafficserver/lib/ts/ink_queue.cc:260
#3 0x77b97e57 in ink_freelist_new 
../../../../trafficserver/lib/ts/ink_queue.cc:183
#4 0x648f31 in 

[GitHub] trafficserver issue #1401: Segfault in write_to_net_io with 7.1.x

2017-02-14 Thread PSUdaemon
Github user PSUdaemon commented on the issue:

https://github.com/apache/trafficserver/issues/1401
  
I'm not sure there is a core to look at in the case I pasted. It was on a 
build host.




[GitHub] trafficserver issue #1401: Segfault in write_to_net_io with 7.1.x

2017-02-14 Thread shinrich
Github user shinrich commented on the issue:

https://github.com/apache/trafficserver/issues/1401
  
BTW I'm currently testing without HTTP/2.




[GitHub] trafficserver issue #1401: Segfault in write_to_net_io with 7.1.x

2017-02-14 Thread shinrich
Github user shinrich commented on the issue:

https://github.com/apache/trafficserver/issues/1401
  
@PSUdaemon would be cool to see symbols with the stack to verify that is 
the same thing.  My hacky fix made it go away only to be immediately replaced 
by a new crash.  (New issue to be posted).




[GitHub] trafficserver issue #1401: Segfault in write_to_net_io with 7.1.x

2017-02-13 Thread PSUdaemon
Github user PSUdaemon commented on the issue:

https://github.com/apache/trafficserver/issues/1401
  
I think I am [seeing 
this](https://ci.trafficserver.apache.org/job/ubuntu_14_04-6.2.x/compiler=clang,label=ubuntu_14_04,type=release/143/console)
 in 6.2.x as well:
```
traffic_server: Segmentation fault (Address not mapped to object [0x18])
traffic_server - STACK TRACE: 

/var/jenkins/workspace/ubuntu_14_04-6.2.x/compiler/clang/label/ubuntu_14_04/type/release/143/install/bin/traffic_server(_Z19crash_logger_invokeiP9siginfo_tPv+0x8e)[0x4a234e]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7fb6fa205330]

/var/jenkins/workspace/ubuntu_14_04-6.2.x/compiler/clang/label/ubuntu_14_04/type/release/143/install/bin/traffic_server(_Z15write_to_net_ioP10NetHandlerP18UnixNetVConnectionP7EThread+0xa1e)[0x6bb3ae]

/var/jenkins/workspace/ubuntu_14_04-6.2.x/compiler/clang/label/ubuntu_14_04/type/release/143/install/bin/traffic_server(_ZN10NetHandler12mainNetEventEiP5Event+0x686)[0x6b4606]

/var/jenkins/workspace/ubuntu_14_04-6.2.x/compiler/clang/label/ubuntu_14_04/type/release/143/install/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0x123)[0x6d7af3]

/var/jenkins/workspace/ubuntu_14_04-6.2.x/compiler/clang/label/ubuntu_14_04/type/release/143/install/bin/traffic_server(_ZN7EThread7executeEv+0x560)[0x6d8210]

/var/jenkins/workspace/ubuntu_14_04-6.2.x/compiler/clang/label/ubuntu_14_04/type/release/143/install/bin/traffic_server(main+0x1de0)[0x4c8bd0]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fb6f93ccf45]

/var/jenkins/workspace/ubuntu_14_04-6.2.x/compiler/clang/label/ubuntu_14_04/type/release/143/install/bin/traffic_server[0x49218d]
/home/jenkins/bin/regression.sh: line 24: 21351 Segmentation fault  
(core dumped) "${WORKSPACE}/${BUILD_NUMBER}/install/bin/traffic_server" -k -K 
-R 1
```




[GitHub] trafficserver issue #1401: Segfault in write_to_net_io with 7.1.x

2017-02-13 Thread shinrich
Github user shinrich commented on the issue:

https://github.com/apache/trafficserver/issues/1401
  
I've finally got my environment running and I see the same stack very 
quickly as well.  In the cases I've seen it looks like there was a write error, 
but for some reason the write vio has been cleared out (or was never set?)

```
(gdb) frame 2
#2  0x00788350 in write_to_net_io (nh=0x2af588003e60, 
vc=0x2aad1401b800, thread=0x2af58810) at UnixNetVConnection.cc:440
440 UnixNetVConnection.cc: No such file or directory.
in UnixNetVConnection.cc
(gdb) print *s
$1 = {enabled = 0, error = 1, vio = {_cont = 0x0, nbytes = 0, ndone = 0, op 
= 0, buffer = {mbuf = 0x0, entry = 0x0}, vc_server = 0x0, mutex = {m_ptr = 
0x0}}, 
  ready_link = { = {next = 0x0}, prev = 0x0}, 
enable_link = {next = 0x0}, in_enabled_list = 0, triggered = 1}
```
The write.error stuff was added by Thomas in TS-4796, but if this just 
showed up between 7.0 and 7.1, it is unlikely that this was the culprit.  I 
think it has been a while (since 9/3/2016).  

Seems more likely that someone has cleared the vio, or we are bouncing an 
error.

In the short term I'm adding a NULL check at the beginning of 
write_to_net_io, but that just seems to be masking the failure case rather than 
identifying the root cause. 

```
+  if (!s->vio.mutex) {
+    ink_release_assert(s->vio._cont == NULL && vc->write.error);
+    return;
+  }
```
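
For anyone who wants to play with the guard outside of ATS, here is a self-contained sketch of the same idea. It is not the real write_to_net_io: `ToyVIO`, `ToyWriteState`, `WriteResult`, and `toy_write_to_net_io` are hypothetical, `std::mutex` stands in for the ATS mutex, and the `skipped_count` counter is an addition of mine so that the skipped case stays visible instead of being silently masked.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical stand-ins for the write state and its VIO.
struct ToyVIO {
  std::shared_ptr<std::mutex> mutex; // null for an empty VIO
};

struct ToyWriteState {
  bool enabled = false;
  bool error   = false;
  ToyVIO vio;
};

enum class WriteResult { SkippedNoVio, LockMissed, Proceeded };

// Guarded version of the try-lock at the top of the write path.
inline WriteResult toy_write_to_net_io(ToyWriteState &s, unsigned &skipped_count) {
  if (!s.vio.mutex) {
    // Empty VIO: bail out before touching the mutex, but count it
    // so the underlying problem can still be tracked down.
    ++skipped_count;
    return WriteResult::SkippedNoVio;
  }
  std::unique_lock<std::mutex> lock(*s.vio.mutex, std::try_to_lock);
  if (!lock.owns_lock()) {
    return WriteResult::LockMissed; // the caller would reschedule
  }
  return WriteResult::Proceeded; // the actual write would happen here
}
```

The counter is only one way to keep the early return from hiding the root cause; a debug log line would serve the same purpose.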





[GitHub] trafficserver issue #1401: Segfault in write_to_net_io with 7.1.x

2017-02-06 Thread bryancall
Github user bryancall commented on the issue:

https://github.com/apache/trafficserver/issues/1401
  
(gdb) bt full
#0  0x005150b0 in Mutex_trylock (m=0x0, t=0x2b44f3a6d010)
at 
/home/bcall/dev/yahoo/build/_build/ats_build/../../trafficserver/iocore/eventsystem/I_Lock.h:289
No locals.
#1  0x0051526f in MutexTryLock::MutexTryLock (this=0x2b44f9c35be0, 
am=..., t=0x2b44f3a6d010)
at 
/home/bcall/dev/yahoo/build/_build/ats_build/../../trafficserver/iocore/eventsystem/I_Lock.h:555
No locals.
#2  0x00787888 in write_to_net_io (nh=0x2b44f3a70e60, 
vc=0x2aac5002bf10, thread=0x2b44f3a6d010)
at ../../../../trafficserver/iocore/net/UnixNetVConnection.cc:439
s = 0x2aac5002c098
mutex = 0x2b44f40035d0
lock = {m = {m_ptr = 0x0}, lock_acquired = 16}
ntodo = 47575248100432
buf = @0x2b9de60
towrite = 47575145566224
signalled = 10
needs = 2147483647
total_written = 3807
r = 44090535680
#3  0x0078781e in write_to_net (nh=0x2b44f3a70e60, 
vc=0x2aac5002bf10, thread=0x2b44f3a6d010)
at ../../../../trafficserver/iocore/net/UnixNetVConnection.cc:430
mutex = 0x2b44f40035d0
#4  0x0077f443 in NetHandler::mainNetEvent (this=0x2b44f3a70e60, 
event=5, e=0x2cb5860)
at ../../../../trafficserver/iocore/net/UnixNet.cc:526
epd = 0x2aac5c02dd70
poll_timeout = 0
pd = 0x2b45a010
vc = 0x2aac5002bf10
__func__ = "mainNetEvent"
#5  0x00515354 in Continuation::handleEvent (this=0x2b44f3a70e60, 
event=5, data=0x2cb5860)
at 
/home/bcall/dev/yahoo/build/_build/ats_build/../../trafficserver/iocore/eventsystem/I_Continuation.h:153
No locals.
#6  0x007a8bc3 in EThread::process_event (this=0x2b44f3a6d010, 
e=0x2cb5860, calling_code=5)
at ../../../../trafficserver/iocore/eventsystem/UnixEThread.cc:143
c_temp = 0x2b44f3a70e60
lock = {m = {m_ptr = 0x2b44f4001f50}, lock_acquired = true}
#7  0x007a90c4 in EThread::execute (this=0x2b44f3a6d010)
at ../../../../trafficserver/iocore/eventsystem/UnixEThread.cc:270
done_one = false
e = 0x2cb5860
NegativeQueue = {> = {head = 0x0}, 
tail = 0x0}
next_time = 1486425567363334599
#8  0x007a8279 in spawn_thread_internal (a=0x2ba5fd0)
at ../../../../trafficserver/iocore/eventsystem/Thread.cc:84
p = 0x2ba5fd0
#9  0x2b44f0c14aa1 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#10 0x2b44f0960aad in clone () from /lib64/libc.so.6




[GitHub] trafficserver issue #1401: Segfault in write_to_net_io with 7.1.x

2017-02-02 Thread zwoop
Github user zwoop commented on the issue:

https://github.com/apache/trafficserver/issues/1401
  
Are you getting a core file / backtrace?




[GitHub] trafficserver issue #1401: Segfault in write_to_net_io with 7.1.x

2017-01-31 Thread biilmann
Github user biilmann commented on the issue:

https://github.com/apache/trafficserver/issues/1401
  
Not this one no. We've seen a crash that looks related in the crash logs 
from 7.0.x but it is much rarer and I don't currently have a core-dump of that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] trafficserver issue #1401: Segfault in write_to_net_io with 7.1.x

2017-01-31 Thread zwoop
Github user zwoop commented on the issue:

https://github.com/apache/trafficserver/issues/1401
  
You didn't see this crasher in 7.0.0, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] trafficserver issue #1401: Segfault in write_to_net_io with 7.1.x

2017-01-31 Thread biilmann
Github user biilmann commented on the issue:

https://github.com/apache/trafficserver/issues/1401
  
Some info on the vc:

```
p vc
$1 = (UnixNetVConnection *) 0x2b9a7802ff10
p vc.closed
$2 = 0
p vc.action_
$3 = {_vptr.Action = 0x788a90 , continuation = 
0x23dcdf0, mutex = {m_ptr = 0x0}, cancelled = 0}
p vc.inactivity_timeout_in
$4 = 864000
p vc.active_timeout_in
$5 = 9000
```

Let me know if there's anything specific on the vc that would help...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] trafficserver issue #1401: Segfault in write_to_net_io with 7.1.x

2017-01-31 Thread bryancall
Github user bryancall commented on the issue:

https://github.com/apache/trafficserver/issues/1401
  
What does the vc look like?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] trafficserver issue #1401: Segfault in write_to_net_io with 7.1.x

2017-01-31 Thread biilmann
GitHub user biilmann opened an issue:

https://github.com/apache/trafficserver/issues/1401

Segfault in write_to_net_io with 7.1.x

Seeing frequent segfaults when trying 7.1.x on some production traffic.

What happens is that the `MUTEX_TRY_LOCK_FOR` segfaults since 
`s->vio.mutex` is NULL


https://github.com/apache/trafficserver/blob/master/iocore/net/UnixNetVConnection.cc#L439

Digging into a core dump shows that this happens when `s->enabled == 0` and 
`s->error == 1`. Inspecting `s->vio` shows that it's an empty vio instance with 
a null pointer mutex.







