[ https://issues.apache.org/jira/browse/TS-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15541238#comment-15541238 ]
Susan Hinrichs commented on TS-4915:
------------------------------------

Yes, I assume this is a 7.0.0 issue, though I cannot say for certain since I'm working on a branch to fix TS-4813 for now. With that branch, it looks like this core appears every couple of hours on a light-traffic, predominantly caching production server.

[~jacksontj] since hostdb shows up in the stack, does this look like anything from the recent hostdb changes? I have another, less frequent stack that shows up in hostdb land; I have seen it 2-3 times since Friday:

{code}
#0 0x00000000007b7c3f in ink_atomic_increment<long, long> (mem=0xff4e45fa6000802c, count=1) at ../../lib/ts/ink_atomic.h:95
#1 0x00000000007b7703 in RecIncrGlobalRawStatCount (rsb=0x2b1a60037f90, id=4, incr=1) at RecRawStats.cc:467
#2 0x000000000054915d in RefCountCachePartition<HostDBInfo>::metric_inc (this=0x1477c50, metric_enum=refcountcache_total_lookups_stat, data=1) at ../iocore/hostdb/P_RefCountCache.h:327
#3 0x00000000006ebfca in RefCountCachePartition<HostDBInfo>::get (this=0x1477c50, key=18396718469509840932) at ./P_RefCountCache.h:174
#4 0x00000000006eb38f in RefCountCache<HostDBInfo>::get (this=0x1484f00, key=18396718469509840932) at ./P_RefCountCache.h:455
#5 0x00000000006de4bf in probe (mutex=0x1457a80, md5=..., ignore_timeout=false) at HostDB.cc:527
#6 0x00000000006dfbfe in HostDBProcessor::getbyname_imm (this=0xc173c0, cont=0x2b19fc388440, process_hostdb_info=(void (Continuation::*)(Continuation *, HostDBInfo *)) 0x5e8ece <HttpSM::process_hostdb_info(HostDBInfo*)>, hostname=0x2b1a2d0c1419 "lib.lumcs.com", len=0, opt=...) at HostDB.cc:818
#7 0x00000000005f1102 in HttpSM::do_hostdb_lookup (this=0x2b19fc388440) at HttpSM.cc:4130
#8 0x00000000005fd691 in HttpSM::set_next_state (this=0x2b19fc388440) at HttpSM.cc:7256
#9 0x00000000005fca85 in HttpSM::call_transact_and_set_next_state (this=0x2b19fc388440, f=0) at HttpSM.cc:7122
#10 0x00000000005e6eab in HttpSM::handle_api_return (this=0x2b19fc388440) at HttpSM.cc:1604
#11 0x000000000060414c in HttpSM::do_api_callout (this=0x2b19fc388440) at HttpSM.cc:438
#12 0x00000000005fcaf2 in HttpSM::set_next_state (this=0x2b19fc388440) at HttpSM.cc:7155
#13 0x00000000005fca85 in HttpSM::call_transact_and_set_next_state (this=0x2b19fc388440, f=0) at HttpSM.cc:7122
#14 0x00000000005e6eab in HttpSM::handle_api_return (this=0x2b19fc388440) at HttpSM.cc:1604
#15 0x000000000060414c in HttpSM::do_api_callout (this=0x2b19fc388440) at HttpSM.cc:438
#16 0x00000000005fcaf2 in HttpSM::set_next_state (this=0x2b19fc388440) at HttpSM.cc:7155
#17 0x00000000005fca85 in HttpSM::call_transact_and_set_next_state (this=0x2b19fc388440, f=0x616a42 <HttpTransact::HandleCacheOpenRead(HttpTransact::State*)>) at HttpSM.cc:7122
#18 0x00000000005eaf09 in HttpSM::state_cache_open_read (this=0x2b19fc388440, event=1102, data=0x2b1a3041e740) at HttpSM.cc:2596
#19 0x00000000005eb4b5 in HttpSM::main_handler (this=0x2b19fc388440, event=1102, data=0x2b1a3041e740) at HttpSM.cc:2658
#20 0x00000000005145dc in Continuation::handleEvent (this=0x2b19fc388440, event=1102, data=0x2b1a3041e740) at ../iocore/eventsystem/I_Continuation.h:153
#21 0x00000000005d3818 in HttpCacheSM::state_cache_open_read (this=0x2b19fc389d60, event=1102, data=0x2b1a3041e740) at HttpCacheSM.cc:132
#22 0x00000000005145dc in Continuation::handleEvent (this=0x2b19fc389d60, event=1102, data=0x2b1a3041e740) at ../iocore/eventsystem/I_Continuation.h:153
#23 0x0000000000756997 in CacheVC::callcont (this=0x2b1a3041e740, event=1102) at P_CacheInternal.h:643
#24 0x0000000000754aa6 in CacheVC::openReadStartEarliest (this=0x2b1a3041e740) at CacheRead.cc:914
#25 0x00000000005145dc in Continuation::handleEvent (this=0x2b1a3041e740, event=3900, data=0x0) at ../iocore/eventsystem/I_Continuation.h:153
#26 0x00000000007318f5 in CacheVC::handleReadDone (this=0x2b1a3041e740, event=3900, e=0x2b1a3041e8c8) at Cache.cc:2445
#27 0x00000000005145dc in Continuation::handleEvent (this=0x2b1a3041e740, event=3900, data=0x2b1a3041e8c8) at ../iocore/eventsystem/I_Continuation.h:153
#28 0x0000000000736ebd in AIOCallbackInternal::io_complete (this=0x2b1a3041e8c8, event=1, data=0x2b1a0c0041e0) at ../../iocore/aio/P_AIO.h:117
#29 0x00000000005145dc in Continuation::handleEvent (this=0x2b1a3041e8c8, event=1, data=0x2b1a0c0041e0) at ../iocore/eventsystem/I_Continuation.h:153
#30 0x00000000007bca24 in EThread::process_event (this=0x2b19e1824010, e=0x2b1a0c0041e0, calling_code=1) at UnixEThread.cc:143
#31 0x00000000007bcc93 in EThread::execute (this=0x2b19e1824010) at UnixEThread.cc:197
#32 0x00000000007bc038 in spawn_thread_internal (a=0x147beb0) at Thread.cc:84
#33 0x00002b19db4a9aa1 in start_thread () from /lib64/libpthread.so.0
#34 0x00000032310e893d in clone () from /lib64/libc.so.6
{code}

> Crash from hostdb in PriorityQueueLess
> --------------------------------------
>
> Key: TS-4915
> URL: https://issues.apache.org/jira/browse/TS-4915
> Project: Traffic Server
> Issue Type: Bug
> Components: HostDB
> Reporter: Susan Hinrichs
> Priority: Blocker
> Fix For: 7.1.0
>
>
> Saw this while testing the fix for TS-4813 with debug enabled.
> {code}
> (gdb) bt full
> #0 0x0000000000547bfe in RefCountCacheHashEntry::operator< (this=0x1cc0880, v2=...) at ../iocore/hostdb/P_RefCountCache.h:94
>         No locals.
> #1 0x000000000054988d in PriorityQueueLess<RefCountCacheHashEntry*>::operator() (this=0x2b78a9a2587b, a=@0x2b78f402af68, b=@0x2b78f402aa28) at ../lib/ts/PriorityQueue.h:41
>         No locals.
> #2 0x0000000000549785 in PriorityQueue<RefCountCacheHashEntry*, PriorityQueueLess<RefCountCacheHashEntry*> >::_bubble_up (this=0x1cb2990, index=2) at ../lib/ts/PriorityQueue.h:191
>         comp = {<No data fields>}
>         parent = 0
> #3 0x00000000006ecfcc in PriorityQueue<RefCountCacheHashEntry*, PriorityQueueLess<RefCountCacheHashEntry*> >::push (this=0x1cb2990, entry=0x2b78f402af60) at ../../lib/ts/PriorityQueue.h:91
>         len = 2
> #4 0x00000000006ec206 in RefCountCachePartition<HostDBInfo>::put (this=0x1cb2900, key=6912554662447498853, item=0x2b78aee04f00, size=96, expire_time=1475202356) at ./P_RefCountCache.h:210
>         expiry_entry = 0x2b78f402af60
>         __func__ = "put"
>         val = 0x1cc0880
> #5 0x00000000006eb3de in RefCountCache<HostDBInfo>::put (this=0x18051e0, key=6912554662447498853, item=0x2b78aee04f00, size=16, expiry_time=1475202356) at ./P_RefCountCache.h:462
>         No locals.
> #6 0x00000000006e2d8e in HostDBContinuation::dnsEvent (this=0x2b7938020f00, event=600, e=0x2b78ac009440) at HostDB.cc:1422
>         is_rr = false
>         old_rr_data = 0x0
>         first_record = 0x2b78ac0094f8
>         m = 0x1
>         failed = false
>         old_r = {m_ptr = 0x0}
>         af = 2 '\002'
>         s_size = 16
>         rrsize = 0
>         allocSize = 16
>         r = 0x2b78aee04f00
>         old_info = {<RefCountObj> = {<ForceVFPTToTop> = {_vptr.ForceVFPTToTop = 0x7f3630}, m_refcount = 0}, iobuffer_index = 0,
>           key = 47797242059264, app = {allotment = {application1 = 5326300, application2 = 0}, http_data = {http_version = 4,
>               pipeline_max = 59, keepalive_timeout = 17, fail_count = 81, unused1 = 0, last_failure = 0}, rr = {offset = 5326300}}, data = {
>             ip = {sa = {sa_family = 54488, sa_data = "^\000\000\000\000\000\020\034$\274x+\000"}, sin = {sin_family = 54488, sin_port = 94,
>                 sin_addr = {s_addr = 0}, sin_zero = "\020\034$\274x+\000"}, sin6 = {sin6_family = 54488, sin6_port = 94, sin6_flowinfo = 0,
>                 sin6_addr = {__in6_u = {__u6_addr8 = "\020\034$\274x+\000\000\030\036$\274\375\b\000", __u6_addr16 = {7184, 48164, 11128, 0, 7704, 48164, 2301, 0},
>                     __u6_addr32 = {3156483088, 11128, 3156483608, 2301}}}, sin6_scope_id = 3156478176}},
>             hostname_offset = 6214872, srv = {srv_offset = 54488, srv_weight = 94, srv_priority = 0, srv_port = 0, key = 3156483088}},
>           hostname_offset = 11128, ip_timestamp = 2845989456, ip_timeout_interval = 11128, is_srv = 0, reverse_dns = 0, round_robin = 1,
>           round_robin_elt = 0}
>         valid_records = 0
>         tip = {_family = 2, _addr = {_ip4 = 540420056, _ip6 = {__in6_u = {__u6_addr8 = "\330'6 x+\000\000\360L\020\250x+\000",
>                 __u6_addr16 = {10200, 8246, 11128, 0, 19696, 43024, 11128, 0}, __u6_addr32 = {540420056, 11128, 2819640560, 11128}}},
>             _byte = "\330'6 x+\000\000\360L\020\250x+\000", _u32 = {540420056, 11128, 2819640560, 11128}, _u64 = {47794936489944, 47797215710448}}}
>         ttl_seconds = 132
>         aname = 0x2b7938021000 "fbmm1.zenfs.com"
>         offset = 96
>         thread = 0x2b78a8101010
>         __func__ = "dnsEvent"
> #7 0x00000000005145dc in Continuation::handleEvent (this=0x2b7938020f00, event=600, data=0x2b78ac009440) at ../iocore/eventsystem/I_Continuation.h:153
>         No locals.
> #8 0x00000000006f681e in DNSEntry::postEvent (this=0x2b78f4028600) at DNS.cc:1269
>         __func__ = "postEvent"
> #9 0x00000000005145dc in Continuation::handleEvent (this=0x2b78f4028600, event=1, data=0x2aac954db040) at ../iocore/eventsystem/I_Continuation.h:153
>         No locals.
> #10 0x00000000007bc9be in EThread::process_event (this=0x2b78a8101010, e=0x2aac954db040, calling_code=1) at UnixEThread.cc:143
>         c_temp = 0x2b78f4028600
>         lock = {m = {m_ptr = 0x17dea10}, lock_acquired = true}
>         __func__ = "process_event"
> #11 0x00000000007bcc2d in EThread::execute (this=0x2b78a8101010) at UnixEThread.cc:197
>         done_one = false
>         e = 0x2aac954db040
>         NegativeQueue = {<DLL<Event, Event::Link_link>> = {head = 0x18ce400}, tail = 0x18ce400}
>         next_time = 1475191803711988905
>         __func__ = "execute"
> #12 0x00000000007bbfd2 in spawn_thread_internal (a=0x17fb9a0) at Thread.cc:84
>         p = 0x17fb9a0
> #13 0x00002b78a2555aa1 in start_thread () from /lib64/libpthread.so.0
>         No symbol table info available.
> #14 0x00000032310e893d in clone () from /lib64/libc.so.6
>         No symbol table info available.
> core == ET_NET 13 and core == ET_NET 20
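The issue title points at `PriorityQueueLess`, and frames #0 to #3 of the quoted trace show the path `PriorityQueue::push` -> `_bubble_up` -> `PriorityQueueLess::operator()` -> `RefCountCacheHashEntry::operator<`. The sketch below models that shape; it is not the actual code in `lib/ts/PriorityQueue.h`. `Entry`, `PtrLess` and `MinHeap` are hypothetical names, and the sketch assumes the comparator dereferences both stored pointers (`*a < *b`) and that entries are ordered by expiry time. Under those assumptions, every push compares the new entry against each parent on its way to the root, so a single stale or corrupted pointer already in the queue is enough to fault inside `operator<`, far from wherever that entry was released. The locals in frame #2 (`index=2`, `parent = 0`) are consistent with the `(index - 1) / 2` parent rule used here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for RefCountCacheHashEntry: only the field the
// comparison needs. The real struct has more members.
struct Entry {
  int64_t expire_time;
  // Assumed ordering: earlier expiry sorts first (min-heap on expire_time).
  bool operator<(const Entry &other) const { return expire_time < other.expire_time; }
};

// Pointer comparator in the spirit of PriorityQueueLess<T> (assumed, not
// copied): it dereferences BOTH stored pointers, so a dangling or corrupt
// pointer in the heap faults here, inside Entry::operator<.
template <typename T> struct PtrLess {
  bool operator()(const T &a, const T &b) const { return *a < *b; }
};

// Minimal binary min-heap whose push/bubble_up mirror the shape of
// PriorityQueue::push -> _bubble_up -> PriorityQueueLess::operator().
template <typename T, typename Comp = PtrLess<T>> class MinHeap {
public:
  void push(T entry) {
    v_.push_back(entry);
    bubble_up(v_.size() - 1);
  }
  T top() const { return v_.front(); }
  std::size_t size() const { return v_.size(); }

private:
  void bubble_up(std::size_t index) {
    assert(index < v_.size());
    Comp comp;
    while (index > 0) {
      std::size_t parent = (index - 1) / 2;
      // Each step dereferences the new entry AND an existing parent entry,
      // so every entry on the path to the root gets read.
      if (!comp(v_[index], v_[parent])) {
        break;
      }
      std::swap(v_[index], v_[parent]);
      index = parent;
    }
  }

  std::vector<T> v_;
};
```

For example, pushing pointers to entries with expiry times 30, 10 and 20 leaves the entry with expiry 10 on top. A freed entry left in the queue is one possible explanation for the crash, not a confirmed diagnosis.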