[ 
https://issues.apache.org/jira/browse/TS-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15541238#comment-15541238
 ] 

Susan Hinrichs commented on TS-4915:
------------------------------------

Yes, I assume this is a 7.0.0 issue, though I cannot say for certain since I'm 
currently working on a branch to fix TS-4813.  With that branch, this core 
appears every couple of hours on a light-traffic, predominantly caching 
production server.

[~jacksontj] since hostdb shows up in the stack, does this look like anything 
from the recent hostdb changes?  I have another, less frequent stack that also 
shows up in hostdb land.  I have seen the following stack 2-3 times since Friday.

{code}
#0  0x00000000007b7c3f in ink_atomic_increment<long, long> 
(mem=0xff4e45fa6000802c, count=1) at ../../lib/ts/ink_atomic.h:95
#1  0x00000000007b7703 in RecIncrGlobalRawStatCount (rsb=0x2b1a60037f90, id=4, 
incr=1) at RecRawStats.cc:467
#2  0x000000000054915d in RefCountCachePartition<HostDBInfo>::metric_inc 
(this=0x1477c50, metric_enum=refcountcache_total_lookups_stat, data=1)
    at ../iocore/hostdb/P_RefCountCache.h:327
#3  0x00000000006ebfca in RefCountCachePartition<HostDBInfo>::get 
(this=0x1477c50, key=18396718469509840932) at ./P_RefCountCache.h:174
#4  0x00000000006eb38f in RefCountCache<HostDBInfo>::get (this=0x1484f00, 
key=18396718469509840932) at ./P_RefCountCache.h:455
#5  0x00000000006de4bf in probe (mutex=0x1457a80, md5=..., 
ignore_timeout=false) at HostDB.cc:527
#6  0x00000000006dfbfe in HostDBProcessor::getbyname_imm (this=0xc173c0, 
cont=0x2b19fc388440, process_hostdb_info=
    (void (Continuation::*)(Continuation *, HostDBInfo *)) 0x5e8ece 
<HttpSM::process_hostdb_info(HostDBInfo*)>, 
    hostname=0x2b1a2d0c1419 "lib.lumcs.com", len=0, opt=...) at HostDB.cc:818
#7  0x00000000005f1102 in HttpSM::do_hostdb_lookup (this=0x2b19fc388440) at 
HttpSM.cc:4130
#8  0x00000000005fd691 in HttpSM::set_next_state (this=0x2b19fc388440) at 
HttpSM.cc:7256
#9  0x00000000005fca85 in HttpSM::call_transact_and_set_next_state 
(this=0x2b19fc388440, f=0) at HttpSM.cc:7122
#10 0x00000000005e6eab in HttpSM::handle_api_return (this=0x2b19fc388440) at 
HttpSM.cc:1604
#11 0x000000000060414c in HttpSM::do_api_callout (this=0x2b19fc388440) at 
HttpSM.cc:438
#12 0x00000000005fcaf2 in HttpSM::set_next_state (this=0x2b19fc388440) at 
HttpSM.cc:7155
#13 0x00000000005fca85 in HttpSM::call_transact_and_set_next_state 
(this=0x2b19fc388440, f=0) at HttpSM.cc:7122
#14 0x00000000005e6eab in HttpSM::handle_api_return (this=0x2b19fc388440) at 
HttpSM.cc:1604
#15 0x000000000060414c in HttpSM::do_api_callout (this=0x2b19fc388440) at 
HttpSM.cc:438
#16 0x00000000005fcaf2 in HttpSM::set_next_state (this=0x2b19fc388440) at 
HttpSM.cc:7155
#17 0x00000000005fca85 in HttpSM::call_transact_and_set_next_state 
(this=0x2b19fc388440, f=
    0x616a42 <HttpTransact::HandleCacheOpenRead(HttpTransact::State*)>) at 
HttpSM.cc:7122
#18 0x00000000005eaf09 in HttpSM::state_cache_open_read (this=0x2b19fc388440, 
event=1102, data=0x2b1a3041e740) at HttpSM.cc:2596
#19 0x00000000005eb4b5 in HttpSM::main_handler (this=0x2b19fc388440, 
event=1102, data=0x2b1a3041e740) at HttpSM.cc:2658
#20 0x00000000005145dc in Continuation::handleEvent (this=0x2b19fc388440, 
event=1102, data=0x2b1a3041e740)
    at ../iocore/eventsystem/I_Continuation.h:153
#21 0x00000000005d3818 in HttpCacheSM::state_cache_open_read 
(this=0x2b19fc389d60, event=1102, data=0x2b1a3041e740) at HttpCacheSM.cc:132
#22 0x00000000005145dc in Continuation::handleEvent (this=0x2b19fc389d60, 
event=1102, data=0x2b1a3041e740)
    at ../iocore/eventsystem/I_Continuation.h:153
#23 0x0000000000756997 in CacheVC::callcont (this=0x2b1a3041e740, event=1102) 
at P_CacheInternal.h:643
#24 0x0000000000754aa6 in CacheVC::openReadStartEarliest (this=0x2b1a3041e740) 
at CacheRead.cc:914
#25 0x00000000005145dc in Continuation::handleEvent (this=0x2b1a3041e740, 
event=3900, data=0x0) at ../iocore/eventsystem/I_Continuation.h:153
#26 0x00000000007318f5 in CacheVC::handleReadDone (this=0x2b1a3041e740, 
event=3900, e=0x2b1a3041e8c8) at Cache.cc:2445
#27 0x00000000005145dc in Continuation::handleEvent (this=0x2b1a3041e740, 
event=3900, data=0x2b1a3041e8c8)
    at ../iocore/eventsystem/I_Continuation.h:153
#28 0x0000000000736ebd in AIOCallbackInternal::io_complete 
(this=0x2b1a3041e8c8, event=1, data=0x2b1a0c0041e0) at 
../../iocore/aio/P_AIO.h:117
#29 0x00000000005145dc in Continuation::handleEvent (this=0x2b1a3041e8c8, 
event=1, data=0x2b1a0c0041e0) at ../iocore/eventsystem/I_Continuation.h:153
#30 0x00000000007bca24 in EThread::process_event (this=0x2b19e1824010, 
e=0x2b1a0c0041e0, calling_code=1) at UnixEThread.cc:143
#31 0x00000000007bcc93 in EThread::execute (this=0x2b19e1824010) at 
UnixEThread.cc:197
#32 0x00000000007bc038 in spawn_thread_internal (a=0x147beb0) at Thread.cc:84
#33 0x00002b19db4a9aa1 in start_thread () from /lib64/libpthread.so.0
#34 0x00000032310e893d in clone () from /lib64/libc.so.6
{code}

> Crash from hostdb in PriorityQueueLess
> --------------------------------------
>
>                 Key: TS-4915
>                 URL: https://issues.apache.org/jira/browse/TS-4915
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: HostDB
>            Reporter: Susan Hinrichs
>            Priority: Blocker
>             Fix For: 7.1.0
>
>
> Saw this while testing fix for TS-4813 with debug enabled.
> {code}
> (gdb) bt full
> #0  0x0000000000547bfe in RefCountCacheHashEntry::operator< (this=0x1cc0880, 
> v2=...) at ../iocore/hostdb/P_RefCountCache.h:94
> No locals.
> #1  0x000000000054988d in 
> PriorityQueueLess<RefCountCacheHashEntry*>::operator() (this=0x2b78a9a2587b, 
> a=@0x2b78f402af68, b=@0x2b78f402aa28)
>     at ../lib/ts/PriorityQueue.h:41
> No locals.
> #2  0x0000000000549785 in PriorityQueue<RefCountCacheHashEntry*, 
> PriorityQueueLess<RefCountCacheHashEntry*> >::_bubble_up (this=0x1cb2990, 
>     index=2) at ../lib/ts/PriorityQueue.h:191
>         comp = {<No data fields>}
>         parent = 0
> #3  0x00000000006ecfcc in PriorityQueue<RefCountCacheHashEntry*, 
> PriorityQueueLess<RefCountCacheHashEntry*> >::push (this=0x1cb2990, 
>     entry=0x2b78f402af60) at ../../lib/ts/PriorityQueue.h:91
>         len = 2
> #4  0x00000000006ec206 in RefCountCachePartition<HostDBInfo>::put 
> (this=0x1cb2900, key=6912554662447498853, item=0x2b78aee04f00, size=96, 
>     expire_time=1475202356) at ./P_RefCountCache.h:210
>         expiry_entry = 0x2b78f402af60
>         __func__ = "put"
>         val = 0x1cc0880
> #5  0x00000000006eb3de in RefCountCache<HostDBInfo>::put (this=0x18051e0, 
> key=6912554662447498853, item=0x2b78aee04f00, size=16, 
>     expiry_time=1475202356) at ./P_RefCountCache.h:462
> No locals.
> #6  0x00000000006e2d8e in HostDBContinuation::dnsEvent (this=0x2b7938020f00, 
> event=600, e=0x2b78ac009440) at HostDB.cc:1422
>         is_rr = false
>         old_rr_data = 0x0
>         first_record = 0x2b78ac0094f8
>         m = 0x1
>         failed = false
>         old_r = {m_ptr = 0x0}
>         af = 2 '\002'
>         s_size = 16
>         rrsize = 0
>         allocSize = 16
>         r = 0x2b78aee04f00
>         old_info = {<RefCountObj> = {<ForceVFPTToTop> = {_vptr.ForceVFPTToTop 
> = 0x7f3630}, m_refcount = 0}, iobuffer_index = 0, 
>           key = 47797242059264, app = {allotment = {application1 = 5326300, 
> application2 = 0}, http_data = {http_version = 4, 
>               pipeline_max = 59, keepalive_timeout = 17, fail_count = 81, 
> unused1 = 0, last_failure = 0}, rr = {offset = 5326300}}, data = {
>             ip = {sa = {sa_family = 54488, sa_data = 
> "^\000\000\000\000\000\020\034$\274x+\000"}, sin = {sin_family = 54488, 
> sin_port = 94, 
>                 sin_addr = {s_addr = 0}, sin_zero = "\020\034$\274x+\000"}, 
> sin6 = {sin6_family = 54488, sin6_port = 94, sin6_flowinfo = 0, 
>                 sin6_addr = {__in6_u = {__u6_addr8 = 
> "\020\034$\274x+\000\000\030\036$\274\375\b\000", __u6_addr16 = {7184, 48164, 
> 11128, 
>                       0, 7704, 48164, 2301, 0}, __u6_addr32 = {3156483088, 
> 11128, 3156483608, 2301}}}, sin6_scope_id = 3156478176}}, 
>             hostname_offset = 6214872, srv = {srv_offset = 54488, srv_weight 
> = 94, srv_priority = 0, srv_port = 0, key = 3156483088}}, 
>           hostname_offset = 11128, ip_timestamp = 2845989456, 
> ip_timeout_interval = 11128, is_srv = 0, reverse_dns = 0, round_robin = 1, 
>           round_robin_elt = 0}
>         valid_records = 0
>         tip = {_family = 2, _addr = {_ip4 = 540420056, _ip6 = {__in6_u = 
> {__u6_addr8 = "\330'6 x+\000\000\360L\020\250x+\000", 
>                 __u6_addr16 = {10200, 8246, 11128, 0, 19696, 43024, 11128, 
> 0}, __u6_addr32 = {540420056, 11128, 2819640560, 11128}}}, 
>             _byte = "\330'6 x+\000\000\360L\020\250x+\000", _u32 = 
> {540420056, 11128, 2819640560, 11128}, _u64 = {47794936489944, 
>               47797215710448}}}
>         ttl_seconds = 132
>         aname = 0x2b7938021000 "fbmm1.zenfs.com"
>         offset = 96
>         thread = 0x2b78a8101010
>         __func__ = "dnsEvent"
> #7  0x00000000005145dc in Continuation::handleEvent (this=0x2b7938020f00, 
> event=600, data=0x2b78ac009440)
>     at ../iocore/eventsystem/I_Continuation.h:153
> No locals.
> #8  0x00000000006f681e in DNSEntry::postEvent (this=0x2b78f4028600) at 
> DNS.cc:1269
>         __func__ = "postEvent"
> #9  0x00000000005145dc in Continuation::handleEvent (this=0x2b78f4028600, 
> event=1, data=0x2aac954db040)
>     at ../iocore/eventsystem/I_Continuation.h:153
> No locals.
> #10 0x00000000007bc9be in EThread::process_event (this=0x2b78a8101010, 
> e=0x2aac954db040, calling_code=1) at UnixEThread.cc:143
>         c_temp = 0x2b78f4028600
>         lock = {m = {m_ptr = 0x17dea10}, lock_acquired = true}
>         __func__ = "process_event"
> #11 0x00000000007bcc2d in EThread::execute (this=0x2b78a8101010) at 
> UnixEThread.cc:197
>         done_one = false
>         e = 0x2aac954db040
>         NegativeQueue = {<DLL<Event, Event::Link_link>> = {head = 0x18ce400}, 
> tail = 0x18ce400}
>         next_time = 1475191803711988905
>         __func__ = "execute"
> #12 0x00000000007bbfd2 in spawn_thread_internal (a=0x17fb9a0) at Thread.cc:84
>         p = 0x17fb9a0
> #13 0x00002b78a2555aa1 in start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #14 0x00000032310e893d in clone () from /lib64/libc.so.6
> No symbol table info available.
> core == ET_NET 13 and core == ET_NET 20



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
