[jira] [Comment Edited] (TS-3083) crash

2014-12-24 Thread Qiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258149#comment-14258149
 ] 

Qiang Li edited comment on TS-3083 at 12/24/14 9:58 AM:


[~yunkai]
I added {{#define DEBUG}} in ink_queue_ext.h and caught a core:
traffic.out
{code}
FATAL: ink_queue_ext.cc:263: failed assert `pChunk->item_magic[idx] == ITEM_MAGIC`
traffic_server - STACK TRACE:
/usr/lib64/trafficserver/libtsutil.so.5(ink_fatal_die+0x0)[0x2b7214d96c23]
/usr/lib64/trafficserver/libtsutil.so.5(_Z12ink_get_randv+0x0)[0x2b7214d9462c]
/usr/lib64/trafficserver/libtsutil.so.5(+0x408e3)[0x2b7214d9c8e3]
/usr/lib64/trafficserver/libtsutil.so.5(reclaimable_freelist_free+0x49)[0x2b7214d9d8a1]
/usr/lib64/trafficserver/libtsutil.so.5(ink_freelist_free+0x23)[0x2b7214d9b966]
/usr/bin/traffic_server(_ZN14ClassAllocatorI9MIOBufferE4freeEPS0_+0x26)[0x502a22]
/usr/bin/traffic_server(_Z11thread_freeI9MIOBufferEvR14ClassAllocatorIT_EPS2_+0x23)[0x502754]
/usr/bin/traffic_server(_Z14free_MIOBufferP9MIOBuffer+0x43)[0x501fee]
/usr/bin/traffic_server(_ZN7FetchSM7cleanUpEv+0xae)[0x4ff1c8]
/usr/bin/traffic_server(_ZN7FetchSM15InvokePluginExtEi+0x4b5)[0x4ffeeb]
/usr/bin/traffic_server(_ZN7FetchSM18process_fetch_readEi+0x35d)[0x5005db]
/usr/bin/traffic_server(_ZN7FetchSM13fetch_handlerEiPv+0x93)[0x5008f5]
/usr/bin/traffic_server(_ZN12Continuation11handleEventEiPv+0x6c)[0x501886]
/usr/bin/traffic_server(_ZN8PluginVC17process_read_sideEb+0x67b)[0x53db65]
/usr/bin/traffic_server(_ZN8PluginVC18process_write_sideEb+0x6ca)[0x53d496]
/usr/bin/traffic_server(_ZN8PluginVC12main_handlerEiPv+0x5a9)[0x53c359]
/usr/bin/traffic_server(_ZN12Continuation11handleEventEiPv+0x6c)[0x501886]
/usr/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0xc6)[0x7bb60a]
/usr/bin/traffic_server(_ZN7EThread7executeEv+0xa0)[0x7bb7d8]
{code}

core dump
{code}
(gdb) bt
#0  0x2b7217b79625 in raise () from /lib64/libc.so.6
#1  0x2b7217b7ad8d in abort () from /lib64/libc.so.6
#2  0x2b7214d96a90 in ink_die_die_die (retval=1) at ink_error.cc:43
#3  0x2b7214d96b5a in ink_fatal_va(int, const char *, typedef __va_list_tag __va_list_tag *) (return_code=1, 
    fmt=0x2b7214da7d20 "%s:%d: failed assert `%s`", ap=0x2b721ac2b6e0) at ink_error.cc:67
#4  0x2b7214d96c23 in ink_fatal (return_code=1, 
    message_format=0x2b7214da7d20 "%s:%d: failed assert `%s`") at ink_error.cc:75
#5  0x2b7214d9462c in _ink_assert (expression=0x2b7214da8c88 "pChunk->item_magic[idx] == ITEM_MAGIC", 
    file=0x2b7214da8b32 "ink_queue_ext.cc", line=263) at ink_assert.cc:37
#6  0x2b7214d9c8e3 in clear_chunk_item_magic (f=0x11255e0, 
pChunk=0x2aabd49cfed0, item=0x2aabd49cc960) at ink_queue_ext.cc:263
#7  0x2b7214d9d8a1 in reclaimable_freelist_free (f=0x11255e0, 
item=0x2aabd49cc960) at ink_queue_ext.cc:675
#8  0x2b7214d9b966 in ink_freelist_free (f=0x11255e0, item=0x2aabd49cc960) 
at ink_queue.cc:248
#9  0x00502a22 in ClassAllocator<MIOBuffer>::free (this=0x1071ac0, 
    ptr=0x2aabd49cc960) at ../lib/ts/Allocator.h:138
#10 0x00502754 in thread_free<MIOBuffer> (a=..., p=0x2aabd49cc960) at 
    ../iocore/eventsystem/I_ProxyAllocator.h:84
#11 0x00501fee in free_MIOBuffer (mio=0x2aabd49cc960) at 
../iocore/eventsystem/P_IOBuffer.h:780
#12 0x004ff1c8 in FetchSM::cleanUp (this=0x2aaae948f000) at 
FetchSM.cc:43
#13 0x004ffeeb in FetchSM::InvokePluginExt (this=0x2aaae948f000, 
fetch_event=0) at FetchSM.cc:329
#14 0x005005db in FetchSM::process_fetch_read (this=0x2aaae948f000, 
event=100) at FetchSM.cc:453
#15 0x005008f5 in FetchSM::fetch_handler (this=0x2aaae948f000, 
event=100, edata=0x2aaac4013af8) at FetchSM.cc:509
#16 0x00501886 in Continuation::handleEvent (this=0x2aaae948f000, 
event=100, data=0x2aaac4013af8) at ../iocore/eventsystem/I_Continuation.h:146
#17 0x0053db65 in PluginVC::process_read_side (this=0x2aaac40139f0, 
other_side_call=true) at PluginVC.cc:669
#18 0x0053d496 in PluginVC::process_write_side (this=0x2aaac4013be0, 
other_side_call=false) at PluginVC.cc:565
#19 0x0053c359 in PluginVC::main_handler (this=0x2aaac4013be0, event=1, 
data=0x2b7244002a60) at PluginVC.cc:210
#20 0x00501886 in Continuation::handleEvent (this=0x2aaac4013be0, 
event=1, data=0x2b7244002a60) at ../iocore/eventsystem/I_Continuation.h:146
#21 0x007bb60a in EThread::process_event (this=0x2b721a023010, 
e=0x2b7244002a60, calling_code=1) at UnixEThread.cc:144
#22 0x007bb7d8 in EThread::execute (this=0x2b721a023010) at 
UnixEThread.cc:195
#23 0x007babc5 in spawn_thread_internal (a=0x118c910) at Thread.cc:88
#24 0x2b7216c389d1 in start_thread () from /lib64/libpthread.so.0
#25 0x2b7217c2f9dd in clone () from /lib64/libc.so.6
(gdb) f 6
#6  0x2b7214d9c8e3 in clear_chunk_item_magic (f=0x11255e0, 
pChunk=0x2aabd49cfed0, item=0x2aabd49cc960) at ink_queue_ext.cc:263
263   

[jira] [Comment Edited] (TS-3083) crash

2014-12-24 Thread Sudheer Vinukonda (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258469#comment-14258469
 ] 

Sudheer Vinukonda edited comment on TS-3083 at 12/24/14 7:23 PM:
-

[~tufang14]: FetchSM::cleanUp() may be called twice (even from two different 
threads) via {{SpdyRequest::clear()}} and {{FetchSM::InvokePluginExt()}}. We 
don't enable reclaimable freelists in production, but a few months ago we did 
see some cores in this area caused by a double delete.

You may try the patch below (we tested it briefly, though we never rolled it 
out to production since we stopped seeing these cores some time back) to see 
if it resolves the issue you are noticing.

{code}
diff --git a/proxy/FetchSM.cc b/proxy/FetchSM.cc
index c5fc04b..34f3dfc 100644
--- a/proxy/FetchSM.cc
+++ b/proxy/FetchSM.cc
@@ -30,12 +30,23 @@
 #define DEBUG_TAG "FetchSM"
 #define FETCH_LOCK_RETRY_TIME HRTIME_MSECONDS(10)
 
+static ink_mutex ink_mutex_fetch_sm_initializer = INK_MUTEX_INIT;
+
 ClassAllocator<FetchSM> FetchSMAllocator("FetchSMAllocator");
 void
 FetchSM::cleanUp()
 {
   Debug(DEBUG_TAG, "[%s] calling cleanup", __FUNCTION__);
 
+  ink_scoped_mutex fetch_sm_mutex(ink_mutex_fetch_sm_initializer);
+  if (!destroyed) {
+    destroyed = true;
+  } else {
+    Debug(DEBUG_TAG, "Error: Double delete: FetchSM has already been destroyed. this:%p", this);
+    return;
+  }
+
+
   if (resp_is_chunked > 0 && (fetch_flags & TS_FETCH_FLAGS_DECHUNK)) {
     chunked_handler.clear();
   }
diff --git a/proxy/FetchSM.h b/proxy/FetchSM.h
index c48bb4f..076e092 100644
--- a/proxy/FetchSM.h
+++ b/proxy/FetchSM.h
@@ -52,6 +52,7 @@ public:
 header_done = 0;
 user_data = NULL;
 has_sent_header = false;
+destroyed = false;
 req_content_length = 0;
 resp_is_chunked = -1;
 resp_content_length = -1;
@@ -167,6 +168,7 @@ private:
   bool header_done;
   bool is_method_head;
   bool is_internal_request;
+  bool destroyed;
   IpEndpoint _addr;
   int resp_is_chunked;
   int resp_received_close;
{code}



 crash
 -

 Key: TS-3083
 URL: https://issues.apache.org/jira/browse/TS-3083
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Affects Versions: 5.2.0
Reporter: bettydramit
Assignee: Zhao Yongming
Priority: Blocker
  Labels: crash
 Fix For: sometime


 c++filt a.txt 
 {code}
 /lib64/libpthread.so.0(+0xf710)[0x2b4c37949710]
 /usr/lib64/trafficserver/libtsutil.so.5(ink_atomiclist_pop+0x3e)[0x2b4c35abb64e]
 /usr/lib64/trafficserver/libtsutil.so.5(reclaimable_freelist_new+0x65)[0x2b4c35abc065]
 /usr/bin/traffic_server(MIOBuffer_tracker::operator()(long)+0x2b)[0x4a33db]
 /usr/bin/traffic_server(PluginVCCore::init()+0x2e3)[0x4d9903]
 /usr/bin/traffic_server(PluginVCCore::alloc()+0x11d)[0x4dcf4d]
 /usr/bin/traffic_server(TSHttpConnectWithPluginId+0x5d)[0x4b9e9d]
 /usr/bin/traffic_server(FetchSM::httpConnect()+0x74)[0x4a0224]
 /usr/bin/traffic_server(PluginVC::process_read_side(bool)+0x375)[0x4da675]
 /usr/bin/traffic_server(PluginVC::process_write_side(bool)+0x57a)[0x4dafca]
 /usr/bin/traffic_server(PluginVC::main_handler(int, void*)+0x315)[0x4dc9a5]
 /usr/bin/traffic_server(EThread::process_event(Event*, int)+0x8f)[0x73788f]
 /usr/bin/traffic_server(EThread::execute()+0x57b)[0x7381fb]
 {code}

[jira] [Comment Edited] (TS-3083) crash

2014-12-24 Thread Yunkai Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258672#comment-14258672
 ] 

Yunkai Zhang edited comment on TS-3083 at 12/25/14 7:51 AM:


Yes, the core dump after enabling DEBUG clearly shows that 
{{FetchSM::InvokePluginExt}} -> {{FetchSM::cleanUp}} is causing the 
double-free, which is why {{pChunk->item_magic}} was corrupted.

I had a look at the code; [~sudheerv]'s patch should work for this issue.

But I think the lock is too coarse; we could use CAS() to modify the 
{{destroyed}} variable instead :)






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)