[jira] [Commented] (TS-4915) Crash from hostdb in PriorityQueueLess

2016-10-10 Thread Bryan Call (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563661#comment-15563661
 ] 

Bryan Call commented on TS-4915:


In 7.0.0 the DNS processing does appear to stay on a single thread (`ET_NET 0`), 
so this doesn't look like the issue.

> Crash from hostdb in PriorityQueueLess
> --
>
> Key: TS-4915
> URL: https://issues.apache.org/jira/browse/TS-4915
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HostDB
>Reporter: Susan Hinrichs
>Priority: Blocker
> Fix For: 7.1.0
>
> Attachments: ts-4915.diff
>
>
> Saw this while testing fix for TS-4813 with debug enabled.
> {code}
> (gdb) bt full
> #0  0x00547bfe in RefCountCacheHashEntry::operator< (this=0x1cc0880, 
> v2=...) at ../iocore/hostdb/P_RefCountCache.h:94
> No locals.
> #1  0x0054988d in 
> PriorityQueueLess::operator() (this=0x2b78a9a2587b, 
> a=@0x2b78f402af68, b=@0x2b78f402aa28)
> at ../lib/ts/PriorityQueue.h:41
> No locals.
> #2  0x00549785 in PriorityQueue PriorityQueueLess >::_bubble_up (this=0x1cb2990, 
> index=2) at ../lib/ts/PriorityQueue.h:191
> comp = {}
> parent = 0
> #3  0x006ecfcc in PriorityQueue PriorityQueueLess >::push (this=0x1cb2990, 
> entry=0x2b78f402af60) at ../../lib/ts/PriorityQueue.h:91
> len = 2
> #4  0x006ec206 in RefCountCachePartition::put 
> (this=0x1cb2900, key=6912554662447498853, item=0x2b78aee04f00, size=96, 
> expire_time=1475202356) at ./P_RefCountCache.h:210
> expiry_entry = 0x2b78f402af60
> __func__ = "put"
> val = 0x1cc0880
> #5  0x006eb3de in RefCountCache::put (this=0x18051e0, 
> key=6912554662447498853, item=0x2b78aee04f00, size=16, 
> expiry_time=1475202356) at ./P_RefCountCache.h:462
> No locals.
> #6  0x006e2d8e in HostDBContinuation::dnsEvent (this=0x2b7938020f00, 
> event=600, e=0x2b78ac009440) at HostDB.cc:1422
> is_rr = false
> old_rr_data = 0x0
> first_record = 0x2b78ac0094f8
> m = 0x1
> failed = false
> old_r = {m_ptr = 0x0}
> af = 2 '\002'
> s_size = 16
> rrsize = 0
> allocSize = 16
> r = 0x2b78aee04f00
> old_info = { = { = {_vptr.ForceVFPTToTop 
> = 0x7f3630}, m_refcount = 0}, iobuffer_index = 0, 
>   key = 47797242059264, app = {allotment = {application1 = 5326300, 
> application2 = 0}, http_data = {http_version = 4, 
>   pipeline_max = 59, keepalive_timeout = 17, fail_count = 81, 
> unused1 = 0, last_failure = 0}, rr = {offset = 5326300}}, data = {
> ip = {sa = {sa_family = 54488, sa_data = 
> "^\000\000\000\000\000\020\034$\274x+\000"}, sin = {sin_family = 54488, 
> sin_port = 94, 
> sin_addr = {s_addr = 0}, sin_zero = "\020\034$\274x+\000"}, 
> sin6 = {sin6_family = 54488, sin6_port = 94, sin6_flowinfo = 0, 
> sin6_addr = {__in6_u = {__u6_addr8 = 
> "\020\034$\274x+\000\000\030\036$\274\375\b\000", __u6_addr16 = {7184, 48164, 
> 11128, 
>   0, 7704, 48164, 2301, 0}, __u6_addr32 = {3156483088, 
> 11128, 3156483608, 2301}}}, sin6_scope_id = 3156478176}}, 
> hostname_offset = 6214872, srv = {srv_offset = 54488, srv_weight 
> = 94, srv_priority = 0, srv_port = 0, key = 3156483088}}, 
>   hostname_offset = 11128, ip_timestamp = 2845989456, 
> ip_timeout_interval = 11128, is_srv = 0, reverse_dns = 0, round_robin = 1, 
>   round_robin_elt = 0}
> valid_records = 0
> tip = {_family = 2, _addr = {_ip4 = 540420056, _ip6 = {__in6_u = 
> {__u6_addr8 = "\330'6 x+\000\000\360L\020\250x+\000", 
> __u6_addr16 = {10200, 8246, 11128, 0, 19696, 43024, 11128, 
> 0}, __u6_addr32 = {540420056, 11128, 2819640560, 11128}}}, 
> _byte = "\330'6 x+\000\000\360L\020\250x+\000", _u32 = 
> {540420056, 11128, 2819640560, 11128}, _u64 = {47794936489944, 
>   47797215710448}}}
> ttl_seconds = 132
> aname = 0x2b7938021000 "fbmm1.zenfs.com"
> offset = 96
> thread = 0x2b78a8101010
> __func__ = "dnsEvent"
> #7  0x005145dc in Continuation::handleEvent (this=0x2b7938020f00, 
> event=600, data=0x2b78ac009440)
> at ../iocore/eventsystem/I_Continuation.h:153
> No locals.
> #8  0x006f681e in DNSEntry::postEvent (this=0x2b78f4028600) at 
> DNS.cc:1269
> __func__ = "postEvent"
> #9  0x005145dc in Continuation::handleEvent (this=0x2b78f4028600, 
> event=1, data=0x2aac954db040)
> at ../iocore/eventsystem/I_Continuation.h:153
> No locals.
> #10 0x007bc9be in EThread::process_event 

[jira] [Commented] (TS-4915) Crash from hostdb in PriorityQueueLess

2016-10-10 Thread Bryan Call (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563160#comment-15563160
 ] 

Bryan Call commented on TS-4915:


DNS is now running on the task thread. That is going to be an issue:

{noformat}
ATS 7.0.0
[Oct 10 11:56:15.457] Server {0x7f1f9a8a9700} DEBUG:  (http_seq) [HttpSM::do_hostdb_lookup] Doing DNS Lookup
[Oct 10 11:56:16.458] Server {0x7f1f9a8a9700} DEBUG:  (dns) received query www.yahoo.com type = 1, timeout = 0
[Oct 10 11:56:16.458] Server {0x7f1f9a8a9700} DEBUG:  (dns) enqueing query www.yahoo.com
[Oct 10 11:56:16.458] Server {0x7f1f9a8a9700} DEBUG:  (dns) adding first to collapsing queue
[Oct 10 11:56:16.458] Server {0x7f1f9a8a9700} DEBUG:  (dns) send query (qtype=1) for www.yahoo.com to fd 27
[Oct 10 11:56:16.458] Server {0x7f1f9a8a9700} DEBUG:  (dns) sent qname = www.yahoo.com, id = 60486, nameserver = 1
[Oct 10 11:56:16.458] Server {0x7f1f9a8a9700} DEBUG:  (dns) sent_one: failover_number for resolver 1 is 1
[Oct 10 11:56:16.580] Server {0x7f1f9a8a9700} DEBUG:  (dns) received packet size = 90
[Oct 10 11:56:16.580] Server {0x7f1f9a8a9700} DEBUG:  (dns) round-robin: nameserver 1 DNS response code = 0
[Oct 10 11:56:16.580] Server {0x7f1f9a8a9700} DEBUG:  (dns) received cname = fd-fp3.wg1.b.yahoo.com
[Oct 10 11:56:16.581] Server {0x7f1f9a8a9700} DEBUG:  (dns) received A name = fd-fp3.wg1.b.yahoo.com
[Oct 10 11:56:16.581] Server {0x7f1f9a8a9700} DEBUG:  (dns) received A = 98.139.180.149
[Oct 10 11:56:16.581] Server {0x7f1f9a8a9700} DEBUG:  (dns) received A = 98.139.183.24
[Oct 10 11:56:16.581] Server {0x7f1f9a8a9700} DEBUG:  (dns) SUCCESS result for www.yahoo.com = 98.139.180.149 retry 0
[Oct 10 11:56:16.581] Server {0x7f1f9a8a9700} DEBUG:  (dns) called back continuation for www.yahoo.com
[Oct 10 11:56:16.581] Server {0x7f1f9a8a9700} DEBUG:  (http) [0] [HttpSM::main_handler, EVENT_HOST_DB_LOOKUP]t.cc:752 
(StartRemapRequest)> (http_trans) [0] START HttpTransact::StartRemapRequest

ATS 6.2.0
[Oct 10 18:58:14.637] Server {0x7f2752e92700} DEBUG:  (http_seq) [HttpSM::do_hostdb_lookup] Doing DNS Lookup
[Oct 10 18:58:14.658] Server {0x7f2752e92700} DEBUG:  (dns) received query www.yahoo.com type = 1, timeout = 0
[Oct 10 18:58:14.658] Server {0x7f2758757740} DEBUG:  (dns) enqueing query www.yahoo.com
[Oct 10 18:58:14.658] Server {0x7f2758757740} DEBUG:  (dns) adding first to collapsing queue
[Oct 10 18:58:14.658] Server {0x7f2758757740} DEBUG:  (dns) send query (qtype=1) for www.yahoo.com to fd 46
[Oct 10 18:58:14.658] Server {0x7f2758757740} DEBUG:  (dns) sent qname = www.yahoo.com, id = 28251, nameserver = 1
[Oct 10 18:58:14.658] Server {0x7f2758757740} DEBUG:  (dns) sent_one: failover_number for resolver 1 is 1
[Oct 10 18:58:14.695] Server {0x7f2758757740} DEBUG:  (dns) received packet size = 224
[Oct 10 18:58:14.695] Server {0x7f2758757740} DEBUG:  (dns) round-robin: nameserver 1 DNS response code = 0
[Oct 10 18:58:14.695] Server {0x7f2758757740} DEBUG:  (dns) received cname = fd-fp3.wg1.b.yahoo.com
[Oct 10 18:58:14.695] Server {0x7f2758757740} DEBUG:  (dns) received A name = fd-fp3.wg1.b.yahoo.com
[Oct 10 18:58:14.695] Server {0x7f2758757740} DEBUG:  (dns) received A = 72.30.202.251
[Oct 10 18:58:14.695] Server {0x7f2758757740} DEBUG:  (dns) SUCCESS result for www.yahoo.com = 72.30.202.251 retry 0
[Oct 10 18:58:14.695] Server {0x7f2752e92700} DEBUG:  (dns) called back continuation for www.yahoo.com
[Oct 10 18:58:14.695] Server {0x7f2752e92700} DEBUG:  (http) [0] [HttpSM::main_handler, EVENT_HOST_DB_LOOKUP]
{noformat}
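
One way to compare the two traces is to pull the thread id out of each debug line and count the distinct ids per run. This is a minimal sketch, assuming the thread id is whatever sits between the first `{` and the following `}` on a line; `thread_id_of` and `distinct_threads` are hypothetical helper names and are not part of Traffic Server.

```cpp
#include <set>
#include <string>
#include <vector>

// Returns the text between the first '{' and the next '}' on the line.
// In the debug output above that field is the thread id.
// Returns an empty string when the line has no such field.
std::string thread_id_of(const std::string &line) {
  std::string::size_type open = line.find('{');
  if (open == std::string::npos) {
    return "";
  }
  std::string::size_type close = line.find('}', open + 1);
  if (close == std::string::npos) {
    return "";
  }
  return line.substr(open + 1, close - open - 1);
}

// Collects the distinct thread ids seen across a set of log lines.
std::set<std::string> distinct_threads(const std::vector<std::string> &lines) {
  std::set<std::string> ids;
  for (const std::string &line : lines) {
    std::string id = thread_id_of(line);
    if (!id.empty()) {
      ids.insert(id);
    }
  }
  return ids;
}
```

Fed the 7.0.0 lines above, this should yield a single id; the 6.2.0 lines contain two different ids.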



[jira] [Commented] (TS-4915) Crash from hostdb in PriorityQueueLess

2016-10-10 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563113#comment-15563113
 ] 

Susan Hinrichs commented on TS-4915:


Now seeing a slightly different stack:

{code}
(gdb) bt
#0  0x00547b2a in RefCountCacheHashEntry::operator< (this=0x2b1dd00d8a80, v2=...) at ../iocore/hostdb/P_RefCountCache.h:94
#1  0x005497ed in PriorityQueueLess::operator() (this=0x2b1d5cc1487b, a=@0x2b1dd00dae38, b=@0x2b1dd00daea8) at ../lib/ts/PriorityQueue.h:41
#2  0x005496e5 in PriorityQueue >::_bubble_up (this=0x2155670, index=3) at ../lib/ts/PriorityQueue.h:192
#3  0x006eccdc in PriorityQueue >::push (this=0x2155670, entry=0x2b1dd00dae30) at ../../lib/ts/PriorityQueue.h:91
#4  0x006ebf28 in RefCountCachePartition::put (this=0x21555e0, key=18396718469509840932, item=0x2b1d7abadf80, size=94, expire_time=1476124925) at ./P_RefCountCache.h:210
#5  0x006eb100 in RefCountCache::put (this=0x1ca8220, key=18396718469509840932, item=0x2b1d7abadf80, size=14, expiry_time=1476124925) at ./P_RefCountCache.h:462
#6  0x006e2ab0 in HostDBContinuation::dnsEvent (this=0x2b1dec047b80, event=600, e=0x2b1d622e8000) at HostDB.cc:1424
#7  0x0051453e in Continuation::handleEvent (this=0x2b1dec047b80, event=600, data=0x2b1d622e8000) at ../iocore/eventsystem/I_Continuation.h:153
#8  0x006f64b2 in DNSEntry::postEvent (this=0x2b1dd00cf200) at DNS.cc:1269
#9  0x0051453e in Continuation::handleEvent (this=0x2b1dd00cf200, event=1, data=0x2b1d6ec25220) at ../iocore/eventsystem/I_Continuation.h:153
#10 0x007bc572 in EThread::process_event (this=0x2b1d57270010, e=0x2b1d6ec25220, calling_code=1) at UnixEThread.cc:143
#11 0x007bc7e1 in EThread::execute (this=0x2b1d57270010) at UnixEThread.cc:197
#12 0x007bbb86 in spawn_thread_internal (a=0x1c9df40) at Thread.cc:84
#13 0x2b1d55a88aa1 in start_thread () from /lib64/libpthread.so.0
#14 0x0032310e893d in clone () from /lib64/libc.so.6
{code}


[jira] [Commented] (TS-4915) Crash from hostdb in PriorityQueueLess

2016-10-10 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562927#comment-15562927
 ] 

Susan Hinrichs commented on TS-4915:


The assert triggered with entry->index == 5 and _v.length() == 4.  I am digging 
through the logic to see whether it is reasonable to get into this state, in 
which case a check here should suffice, or whether there is a broader race 
condition we should be concerned about.
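
The kind of check being considered could look roughly like the following. This is only a sketch of an indexed min-heap whose `erase` refuses an entry whose stored index does not match the heap's contents; `Entry`, `IndexedHeap`, and every member name here are hypothetical and are not the code in `lib/ts/PriorityQueue.h`.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical heap entry that remembers its own slot in the heap.
struct Entry {
  std::size_t index; // position in the heap's backing vector
  long expiry;       // sort key: smaller values sit closer to the top
};

// Minimal indexed min-heap, for illustration only.
class IndexedHeap {
public:
  void push(Entry *e) {
    e->index = v_.size();
    v_.push_back(e);
    bubble_up(e->index);
  }

  // Returns false, without touching memory, when the entry's stored index
  // is out of range or the slot it names holds a different entry.
  bool erase(Entry *e) {
    if (e->index >= v_.size() || v_[e->index] != e) {
      return false;
    }
    std::size_t i    = e->index;
    std::size_t last = v_.size() - 1;
    if (i != last) {
      swap_slots(i, last);
    }
    v_.pop_back();
    if (i < v_.size()) {
      // The element moved into slot i may need to go up or down.
      bubble_up(i);
      bubble_down(i);
    }
    return true;
  }

  std::size_t size() const { return v_.size(); }
  Entry *top() const { return v_.empty() ? nullptr : v_.front(); }

private:
  void swap_slots(std::size_t a, std::size_t b) {
    std::swap(v_[a], v_[b]);
    v_[a]->index = a;
    v_[b]->index = b;
  }

  void bubble_up(std::size_t i) {
    while (i > 0) {
      std::size_t parent = (i - 1) / 2;
      if (!(v_[i]->expiry < v_[parent]->expiry)) {
        break;
      }
      swap_slots(i, parent);
      i = parent;
    }
  }

  void bubble_down(std::size_t i) {
    for (;;) {
      std::size_t left = 2 * i + 1, right = 2 * i + 2, smallest = i;
      if (left < v_.size() && v_[left]->expiry < v_[smallest]->expiry) {
        smallest = left;
      }
      if (right < v_.size() && v_[right]->expiry < v_[smallest]->expiry) {
        smallest = right;
      }
      if (smallest == i) {
        break;
      }
      swap_slots(i, smallest);
      i = smallest;
    }
  }

  std::vector<Entry *> v_;
};
```

With the values seen in the assert, an entry carrying index 5 is rejected by a heap of length 4 and the heap is left unchanged. A guard like this would only hide the symptom; if two threads can modify the same queue, the underlying race would still need fixing.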


[jira] [Commented] (TS-4915) Crash from hostdb in PriorityQueueLess

2016-10-10 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562520#comment-15562520
 ] 

Susan Hinrichs commented on TS-4915:


Interesting.  I assume that entry->index is invalid.  I put in an assert that 
entry->index < _v.length().  Hopefully that gives us a good core dump.


[jira] [Commented] (TS-4915) Crash from hostdb in PriorityQueueLess

2016-10-08 Thread Bryan Call (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15558199#comment-15558199
 ] 

Bryan Call commented on TS-4915:


{noformat}
=
==8079==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6060002792a0 
at pc 0x00655099 bp 0x2b95e2972550 sp 0x2b95e2972548
WRITE of size 8 at 0x6060002792a0 thread T31 ([ET_NET 29])
#0 0x655098 in PriorityQueue 
>::erase(PriorityQueueEntry*) 
../../../trafficserver/lib/ts/PriorityQueue.h:126
#1 0x654965 in RefCountCachePartition::erase(unsigned long, 
long) ../../../trafficserver/iocore/hostdb/P_RefCountCache.h:246
#2 0x9772d2 in RefCountCachePartition::put(unsigned long, 
HostDBInfo*, int, int) 
../../../trafficserver/iocore/hostdb/P_RefCountCache.h:192
#3 0x975b31 in RefCountCache::put(unsigned long, HostDBInfo*, 
int, long) ../../../trafficserver/iocore/hostdb/P_RefCountCache.h:462
#4 0x964ef6 in HostDBContinuation::dnsEvent(int, HostEnt*) 
../../../trafficserver/iocore/hostdb/HostDB.cc:1422
#5 0x5ef3c4 in Continuation::handleEvent(int, void*) 
../../../trafficserver/iocore/eventsystem/I_Continuation.h:153
#6 0x98d024 in DNSEntry::postEvent(int, Event*) 
../../../trafficserver/iocore/dns/DNS.cc:1269
#7 0x5ef3c4 in Continuation::handleEvent(int, void*) 
../../../trafficserver/iocore/eventsystem/I_Continuation.h:153
#8 0xb30fb8 in EThread::process_event(Event*, int) 
../../../trafficserver/iocore/eventsystem/UnixEThread.cc:146
#9 0xb314f4 in EThread::execute() 
../../../trafficserver/iocore/eventsystem/UnixEThread.cc:200
#10 0xb2f963 in spawn_thread_internal 
../../../trafficserver/iocore/eventsystem/Thread.cc:84
#11 0x2b95d7633aa0 in start_thread (/lib64/libpthread.so.0+0x3b88c07aa0)
#12 0x3b880e893c in clone (/lib64/libc.so.6+0x3b880e893c)

0x6060002792a0 is located 0 bytes to the right of 64-byte region 
[0x606000279260,0x6060002792a0)
allocated by thread T28 ([ET_NET 26]) here:
#0 0x58399a in __interceptor_malloc (/home/y/bin64/traffic_server+0x58399a)
#1 0x2b95d69dae16 in ats_malloc 
../../../trafficserver/lib/ts/ink_memory.cc:59
#2 0x5c317c in DefaultAlloc::alloc(int) 
../../../trafficserver/lib/ts/defalloc.h:34
#3 0x97e5d9 in Vec*, 
DefaultAlloc, 2>::addx() ../../../trafficserver/lib/ts/Vec.h:826
#4 0x97dca1 in Vec*, 
DefaultAlloc, 2>::add_internal(PriorityQueueEntry*) 
../../../trafficserver/lib/ts/Vec.h:496
#5 0x97d8e3 in Vec*, 
DefaultAlloc, 2>::add(PriorityQueueEntry*) 
../../../trafficserver/lib/ts/Vec.h:272
#6 0x97b584 in Vec*, 
DefaultAlloc, 2>::push_back(PriorityQueueEntry*) 
../../../trafficserver/lib/ts/Vec.h:65
#7 0x979518 in PriorityQueue 
>::push(PriorityQueueEntry*) 
../../../trafficserver/lib/ts/PriorityQueue.h:88
#8 0x9775d9 in RefCountCachePartition::put(unsigned long, 
HostDBInfo*, int, int) 
../../../trafficserver/iocore/hostdb/P_RefCountCache.h:210
#9 0x975b31 in RefCountCache::put(unsigned long, HostDBInfo*, 
int, long) ../../../trafficserver/iocore/hostdb/P_RefCountCache.h:462
#10 0x964ef6 in HostDBContinuation::dnsEvent(int, HostEnt*) 
../../../trafficserver/iocore/hostdb/HostDB.cc:1422
#11 0x5ef3c4 in Continuation::handleEvent(int, void*) 
../../../trafficserver/iocore/eventsystem/I_Continuation.h:153
#12 0x98d024 in DNSEntry::postEvent(int, Event*) 
../../../trafficserver/iocore/dns/DNS.cc:1269
#13 0x5ef3c4 in Continuation::handleEvent(int, void*) 
../../../trafficserver/iocore/eventsystem/I_Continuation.h:153
#14 0xb30fb8 in EThread::process_event(Event*, int) 
../../../trafficserver/iocore/eventsystem/UnixEThread.cc:146
#15 0xb314f4 in EThread::execute() 
../../../trafficserver/iocore/eventsystem/UnixEThread.cc:200
#16 0xb2f963 in spawn_thread_internal 
../../../trafficserver/iocore/eventsystem/Thread.cc:84
#17 0x2b95d7633aa0 in start_thread (/lib64/libpthread.so.0+0x3b88c07aa0)

Thread T31 ([ET_NET 29]) created by T0 ([TS_MAIN]) here:
#0 0x525904 in pthread_create (/home/y/bin64/traffic_server+0x525904)
#1 0xb2f4ee in ink_thread_create 
../../../trafficserver/lib/ts/ink_thread.h:152
#2 0xb2fa8d in Thread::start(char const*, unsigned long, void* (*)(void*), 
void*, void*) ../../../trafficserver/iocore/eventsystem/Thread.cc:99
#3 0xb353db in EventProcessor::start(int, unsigned long) 
../../../trafficserver/iocore/eventsystem/UnixEventProcessor.cc:240
#4 0x650302 in main 

[jira] [Commented] (TS-4915) Crash from hostdb in PriorityQueueLess

2016-10-08 Thread Bryan Call (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15558197#comment-15558197
 ] 

Bryan Call commented on TS-4915:


I am seeing them too.

> Crash from hostdb in PriorityQueueLess
> --
>
> Key: TS-4915
> URL: https://issues.apache.org/jira/browse/TS-4915
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HostDB
>Reporter: Susan Hinrichs
>Priority: Blocker
> Fix For: 7.1.0
>
>
> Saw this while testing fix for TS-4813 with debug enabled.
> {code}
> (gdb) bt full
> #0  0x00547bfe in RefCountCacheHashEntry::operator< (this=0x1cc0880, 
> v2=...) at ../iocore/hostdb/P_RefCountCache.h:94
> No locals.
> #1  0x0054988d in 
> PriorityQueueLess::operator() (this=0x2b78a9a2587b, 
> a=@0x2b78f402af68, b=@0x2b78f402aa28)
> at ../lib/ts/PriorityQueue.h:41
> No locals.
> #2  0x00549785 in PriorityQueue PriorityQueueLess >::_bubble_up (this=0x1cb2990, 
> index=2) at ../lib/ts/PriorityQueue.h:191
> comp = {}
> parent = 0
> #3  0x006ecfcc in PriorityQueue PriorityQueueLess >::push (this=0x1cb2990, 
> entry=0x2b78f402af60) at ../../lib/ts/PriorityQueue.h:91
> len = 2
> #4  0x006ec206 in RefCountCachePartition::put 
> (this=0x1cb2900, key=6912554662447498853, item=0x2b78aee04f00, size=96, 
> expire_time=1475202356) at ./P_RefCountCache.h:210
> expiry_entry = 0x2b78f402af60
> __func__ = "put"
> val = 0x1cc0880
> #5  0x006eb3de in RefCountCache::put (this=0x18051e0, 
> key=6912554662447498853, item=0x2b78aee04f00, size=16, 
> expiry_time=1475202356) at ./P_RefCountCache.h:462
> No locals.
> #6  0x006e2d8e in HostDBContinuation::dnsEvent (this=0x2b7938020f00, 
> event=600, e=0x2b78ac009440) at HostDB.cc:1422
> is_rr = false
> old_rr_data = 0x0
> first_record = 0x2b78ac0094f8
> m = 0x1
> failed = false
> old_r = {m_ptr = 0x0}
> af = 2 '\002'
> s_size = 16
> rrsize = 0
> allocSize = 16
> r = 0x2b78aee04f00
> old_info = { = { = {_vptr.ForceVFPTToTop 
> = 0x7f3630}, m_refcount = 0}, iobuffer_index = 0, 
>   key = 47797242059264, app = {allotment = {application1 = 5326300, 
> application2 = 0}, http_data = {http_version = 4, 
>   pipeline_max = 59, keepalive_timeout = 17, fail_count = 81, 
> unused1 = 0, last_failure = 0}, rr = {offset = 5326300}}, data = {
> ip = {sa = {sa_family = 54488, sa_data = 
> "^\000\000\000\000\000\020\034$\274x+\000"}, sin = {sin_family = 54488, 
> sin_port = 94, 
> sin_addr = {s_addr = 0}, sin_zero = "\020\034$\274x+\000"}, 
> sin6 = {sin6_family = 54488, sin6_port = 94, sin6_flowinfo = 0, 
> sin6_addr = {__in6_u = {__u6_addr8 = 
> "\020\034$\274x+\000\000\030\036$\274\375\b\000", __u6_addr16 = {7184, 48164, 
> 11128, 
>   0, 7704, 48164, 2301, 0}, __u6_addr32 = {3156483088, 
> 11128, 3156483608, 2301}}}, sin6_scope_id = 3156478176}}, 
> hostname_offset = 6214872, srv = {srv_offset = 54488, srv_weight 
> = 94, srv_priority = 0, srv_port = 0, key = 3156483088}}, 
>   hostname_offset = 11128, ip_timestamp = 2845989456, 
> ip_timeout_interval = 11128, is_srv = 0, reverse_dns = 0, round_robin = 1, 
>   round_robin_elt = 0}
> valid_records = 0
> tip = {_family = 2, _addr = {_ip4 = 540420056, _ip6 = {__in6_u = 
> {__u6_addr8 = "\330'6 x+\000\000\360L\020\250x+\000", 
> __u6_addr16 = {10200, 8246, 11128, 0, 19696, 43024, 11128, 
> 0}, __u6_addr32 = {540420056, 11128, 2819640560, 11128}}}, 
> _byte = "\330'6 x+\000\000\360L\020\250x+\000", _u32 = 
> {540420056, 11128, 2819640560, 11128}, _u64 = {47794936489944, 
>   47797215710448}}}
> ttl_seconds = 132
> aname = 0x2b7938021000 "fbmm1.zenfs.com"
> offset = 96
> thread = 0x2b78a8101010
> __func__ = "dnsEvent"
> #7  0x005145dc in Continuation::handleEvent (this=0x2b7938020f00, 
> event=600, data=0x2b78ac009440)
> at ../iocore/eventsystem/I_Continuation.h:153
> No locals.
> #8  0x006f681e in DNSEntry::postEvent (this=0x2b78f4028600) at 
> DNS.cc:1269
> __func__ = "postEvent"
> #9  0x005145dc in Continuation::handleEvent (this=0x2b78f4028600, 
> event=1, data=0x2aac954db040)
> at ../iocore/eventsystem/I_Continuation.h:153
> No locals.
> #10 0x007bc9be in EThread::process_event (this=0x2b78a8101010, 
> e=0x2aac954db040, calling_code=1) at UnixEThread.cc:143
> c_temp = 0x2b78f4028600
> lock = {m = {m_ptr = 

[jira] [Commented] (TS-4915) Crash from hostdb in PriorityQueueLess

2016-10-06 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552510#comment-15552510
 ] 

Susan Hinrichs commented on TS-4915:


Still getting these crashes once every couple of hours in production traffic.


[jira] [Commented] (TS-4915) Crash from hostdb in PriorityQueueLess

2016-10-02 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15541238#comment-15541238
 ] 

Susan Hinrichs commented on TS-4915:


Yes, I assume this is a 7.0.0 issue, though I cannot say for certain since I'm 
working on a branch to fix TS-4813 for now.  With that branch, this core 
appears every couple of hours on a lightly loaded, predominantly caching 
production server.

[~jacksontj] since hostdb shows up in the stack, does this look like anything 
from the recent hostdb changes?  I have another, less frequent stack that also 
shows up in hostdb land.  I have seen the following stack 2-3 times since Friday.

{code}
#0  0x007b7c3f in ink_atomic_increment 
(mem=0xff4e45fa6000802c, count=1) at ../../lib/ts/ink_atomic.h:95
#1  0x007b7703 in RecIncrGlobalRawStatCount (rsb=0x2b1a60037f90, id=4, 
incr=1) at RecRawStats.cc:467
#2  0x0054915d in RefCountCachePartition::metric_inc 
(this=0x1477c50, metric_enum=refcountcache_total_lookups_stat, data=1)
at ../iocore/hostdb/P_RefCountCache.h:327
#3  0x006ebfca in RefCountCachePartition::get 
(this=0x1477c50, key=18396718469509840932) at ./P_RefCountCache.h:174
#4  0x006eb38f in RefCountCache::get (this=0x1484f00, 
key=18396718469509840932) at ./P_RefCountCache.h:455
#5  0x006de4bf in probe (mutex=0x1457a80, md5=..., 
ignore_timeout=false) at HostDB.cc:527
#6  0x006dfbfe in HostDBProcessor::getbyname_imm (this=0xc173c0, 
cont=0x2b19fc388440, process_hostdb_info=
(void (Continuation::*)(Continuation *, HostDBInfo *)) 0x5e8ece 
, 
hostname=0x2b1a2d0c1419 "lib.lumcs.com", len=0, opt=...) at HostDB.cc:818
#7  0x005f1102 in HttpSM::do_hostdb_lookup (this=0x2b19fc388440) at 
HttpSM.cc:4130
#8  0x005fd691 in HttpSM::set_next_state (this=0x2b19fc388440) at 
HttpSM.cc:7256
#9  0x005fca85 in HttpSM::call_transact_and_set_next_state 
(this=0x2b19fc388440, f=0) at HttpSM.cc:7122
#10 0x005e6eab in HttpSM::handle_api_return (this=0x2b19fc388440) at 
HttpSM.cc:1604
#11 0x0060414c in HttpSM::do_api_callout (this=0x2b19fc388440) at 
HttpSM.cc:438
#12 0x005fcaf2 in HttpSM::set_next_state (this=0x2b19fc388440) at 
HttpSM.cc:7155
#13 0x005fca85 in HttpSM::call_transact_and_set_next_state 
(this=0x2b19fc388440, f=0) at HttpSM.cc:7122
#14 0x005e6eab in HttpSM::handle_api_return (this=0x2b19fc388440) at 
HttpSM.cc:1604
#15 0x0060414c in HttpSM::do_api_callout (this=0x2b19fc388440) at 
HttpSM.cc:438
#16 0x005fcaf2 in HttpSM::set_next_state (this=0x2b19fc388440) at 
HttpSM.cc:7155
#17 0x005fca85 in HttpSM::call_transact_and_set_next_state 
(this=0x2b19fc388440, f=
0x616a42 ) at 
HttpSM.cc:7122
#18 0x005eaf09 in HttpSM::state_cache_open_read (this=0x2b19fc388440, 
event=1102, data=0x2b1a3041e740) at HttpSM.cc:2596
#19 0x005eb4b5 in HttpSM::main_handler (this=0x2b19fc388440, 
event=1102, data=0x2b1a3041e740) at HttpSM.cc:2658
#20 0x005145dc in Continuation::handleEvent (this=0x2b19fc388440, 
event=1102, data=0x2b1a3041e740)
at ../iocore/eventsystem/I_Continuation.h:153
#21 0x005d3818 in HttpCacheSM::state_cache_open_read 
(this=0x2b19fc389d60, event=1102, data=0x2b1a3041e740) at HttpCacheSM.cc:132
#22 0x005145dc in Continuation::handleEvent (this=0x2b19fc389d60, 
event=1102, data=0x2b1a3041e740)
at ../iocore/eventsystem/I_Continuation.h:153
#23 0x00756997 in CacheVC::callcont (this=0x2b1a3041e740, event=1102) 
at P_CacheInternal.h:643
#24 0x00754aa6 in CacheVC::openReadStartEarliest (this=0x2b1a3041e740) 
at CacheRead.cc:914
#25 0x005145dc in Continuation::handleEvent (this=0x2b1a3041e740, 
event=3900, data=0x0) at ../iocore/eventsystem/I_Continuation.h:153
#26 0x007318f5 in CacheVC::handleReadDone (this=0x2b1a3041e740, 
event=3900, e=0x2b1a3041e8c8) at Cache.cc:2445
#27 0x005145dc in Continuation::handleEvent (this=0x2b1a3041e740, 
event=3900, data=0x2b1a3041e8c8)
at ../iocore/eventsystem/I_Continuation.h:153
#28 0x00736ebd in AIOCallbackInternal::io_complete 
(this=0x2b1a3041e8c8, event=1, data=0x2b1a0c0041e0) at 
../../iocore/aio/P_AIO.h:117
#29 0x005145dc in Continuation::handleEvent (this=0x2b1a3041e8c8, 
event=1, data=0x2b1a0c0041e0) at ../iocore/eventsystem/I_Continuation.h:153
#30 0x007bca24 in EThread::process_event (this=0x2b19e1824010, 
e=0x2b1a0c0041e0, calling_code=1) at UnixEThread.cc:143
#31 0x007bcc93 in EThread::execute (this=0x2b19e1824010) at 
UnixEThread.cc:197
#32 0x007bc038 in spawn_thread_internal (a=0x147beb0) at Thread.cc:84
#33 0x2b19db4a9aa1 in start_thread () from /lib64/libpthread.so.0
#34 0x0032310e893d in clone () from /lib64/libc.so.6
{code}


[jira] [Commented] (TS-4915) Crash from hostdb in PriorityQueueLess

2016-09-30 Thread Leif Hedstrom (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15536942#comment-15536942
 ] 

Leif Hedstrom commented on TS-4915:
---

[~shinrich] I assume this is something that affects 7.0.0? Marking it as a 
blocker for now.

> Crash from hostdb in PriorityQueueLess
> --
>
> Key: TS-4915
> URL: https://issues.apache.org/jira/browse/TS-4915
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HostDB
>Reporter: Susan Hinrichs
>Priority: Blocker
> Fix For: 7.1.0
>
>
> Saw this while testing fix for TS-4813 with debug enabled.
> {code}
> (gdb) bt full
> #0  0x00547bfe in RefCountCacheHashEntry::operator< (this=0x1cc0880, 
> v2=...) at ../iocore/hostdb/P_RefCountCache.h:94
> No locals.
> #1  0x0054988d in 
> PriorityQueueLess::operator() (this=0x2b78a9a2587b, 
> a=@0x2b78f402af68, b=@0x2b78f402aa28)
> at ../lib/ts/PriorityQueue.h:41
> No locals.
> #2  0x00549785 in PriorityQueue PriorityQueueLess >::_bubble_up (this=0x1cb2990, 
> index=2) at ../lib/ts/PriorityQueue.h:191
> comp = {}
> parent = 0
> #3  0x006ecfcc in PriorityQueue PriorityQueueLess >::push (this=0x1cb2990, 
> entry=0x2b78f402af60) at ../../lib/ts/PriorityQueue.h:91
> len = 2
> #4  0x006ec206 in RefCountCachePartition::put 
> (this=0x1cb2900, key=6912554662447498853, item=0x2b78aee04f00, size=96, 
> expire_time=1475202356) at ./P_RefCountCache.h:210
> expiry_entry = 0x2b78f402af60
> __func__ = "put"
> val = 0x1cc0880
> #5  0x006eb3de in RefCountCache::put (this=0x18051e0, 
> key=6912554662447498853, item=0x2b78aee04f00, size=16, 
> expiry_time=1475202356) at ./P_RefCountCache.h:462
> No locals.
> #6  0x006e2d8e in HostDBContinuation::dnsEvent (this=0x2b7938020f00, 
> event=600, e=0x2b78ac009440) at HostDB.cc:1422
> is_rr = false
> old_rr_data = 0x0
> first_record = 0x2b78ac0094f8
> m = 0x1
> failed = false
> old_r = {m_ptr = 0x0}
> af = 2 '\002'
> s_size = 16
> rrsize = 0
> allocSize = 16
> r = 0x2b78aee04f00
> old_info = { = { = {_vptr.ForceVFPTToTop 
> = 0x7f3630}, m_refcount = 0}, iobuffer_index = 0, 
>   key = 47797242059264, app = {allotment = {application1 = 5326300, 
> application2 = 0}, http_data = {http_version = 4, 
>   pipeline_max = 59, keepalive_timeout = 17, fail_count = 81, 
> unused1 = 0, last_failure = 0}, rr = {offset = 5326300}}, data = {
> ip = {sa = {sa_family = 54488, sa_data = 
> "^\000\000\000\000\000\020\034$\274x+\000"}, sin = {sin_family = 54488, 
> sin_port = 94, 
> sin_addr = {s_addr = 0}, sin_zero = "\020\034$\274x+\000"}, 
> sin6 = {sin6_family = 54488, sin6_port = 94, sin6_flowinfo = 0, 
> sin6_addr = {__in6_u = {__u6_addr8 = 
> "\020\034$\274x+\000\000\030\036$\274\375\b\000", __u6_addr16 = {7184, 48164, 
> 11128, 
>   0, 7704, 48164, 2301, 0}, __u6_addr32 = {3156483088, 
> 11128, 3156483608, 2301}}}, sin6_scope_id = 3156478176}}, 
> hostname_offset = 6214872, srv = {srv_offset = 54488, srv_weight 
> = 94, srv_priority = 0, srv_port = 0, key = 3156483088}}, 
>   hostname_offset = 11128, ip_timestamp = 2845989456, 
> ip_timeout_interval = 11128, is_srv = 0, reverse_dns = 0, round_robin = 1, 
>   round_robin_elt = 0}
> valid_records = 0
> tip = {_family = 2, _addr = {_ip4 = 540420056, _ip6 = {__in6_u = 
> {__u6_addr8 = "\330'6 x+\000\000\360L\020\250x+\000", 
> __u6_addr16 = {10200, 8246, 11128, 0, 19696, 43024, 11128, 
> 0}, __u6_addr32 = {540420056, 11128, 2819640560, 11128}}}, 
> _byte = "\330'6 x+\000\000\360L\020\250x+\000", _u32 = 
> {540420056, 11128, 2819640560, 11128}, _u64 = {47794936489944, 
>   47797215710448}}}
> ttl_seconds = 132
> aname = 0x2b7938021000 "fbmm1.zenfs.com"
> offset = 96
> thread = 0x2b78a8101010
> __func__ = "dnsEvent"
> #7  0x005145dc in Continuation::handleEvent (this=0x2b7938020f00, 
> event=600, data=0x2b78ac009440)
> at ../iocore/eventsystem/I_Continuation.h:153
> No locals.
> #8  0x006f681e in DNSEntry::postEvent (this=0x2b78f4028600) at 
> DNS.cc:1269
> __func__ = "postEvent"
> #9  0x005145dc in Continuation::handleEvent (this=0x2b78f4028600, 
> event=1, data=0x2aac954db040)
> at ../iocore/eventsystem/I_Continuation.h:153
> No locals.
> #10 0x007bc9be in EThread::process_event (this=0x2b78a8101010, 
> e=0x2aac954db040, calling_code=1) at