[jira] [Commented] (TS-852) hostdb and dns system hangup
[ https://issues.apache.org/jira/browse/TS-852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582154#comment-14582154 ] Phil Sorber commented on TS-852: I think the patch I pasted fixes this, but I'd like [~amc] to confirm. > hostdb and dns system hangup > > > Key: TS-852 > URL: https://issues.apache.org/jira/browse/TS-852 > Project: Traffic Server > Issue Type: Bug > Components: Core, DNS >Affects Versions: 3.1.0 >Reporter: Zhao Yongming >Assignee: Alan M. Carroll >Priority: Minor > Labels: dns, hostdb > Fix For: 3.1.0, 6.1.0 > > Attachments: TS-852.patch > > > in my testing, all request that need go OS, fails with 30s timeout, and the > slow query shows data from dns missed: > {code} > [Jun 22 15:33:47.183] Server {1106880832} ERROR: [8122411] Slow Request: url: > http://download.im.alisoft.com/aliim/AliIM6/update/packages//2011-5-4-10-10-40/0/modulesproxy/8203.xml.wwzip > status: 0 unique id: bytes: 0 fd: 0 client state: 6 server state: 0 > ua_begin: 0.000 ua_read_header_done: 0.000 cache_open_read_begin: 0.000 > cache_open_read_end: 0.000 dns_lookup_begin: 0.000 dns_lookup_end: -1.000 > server_connect: -1.000 server_first_read: -1.000 server_read_header_done: > -1.000 server_close: -1.000 ua_close: 30.667 sm_finish: 30.667 > [Jun 22 15:33:47.220] Server {1099663680} ERROR: [8122422] Slow Request: url: > http://img.uu1001.cn/materials/original/2010-11-22/22-48/a302876929a9c40a8272ac439a16ad25e74edf71.png > status: 0 unique id: bytes: 0 fd: 0 client state: 6 server state: 0 > ua_begin: 0.000 ua_read_header_done: 0.000 cache_open_read_begin: 0.000 > cache_open_read_end: 0.000 dns_lookup_begin: 0.000 dns_lookup_end: -1.000 > server_connect: -1.000 server_first_read: -1.000 server_read_header_done: > -1.000 server_close: -1.000 ua_close: 30.318 sm_finish: 30.318 > [Jun 22 15:33:47.441] Server {1098611008} ERROR: [8122421] Slow Request: url: > http://img02.taobaocdn.com/bao/uploaded/i2/T1mp8QXopdXXblNIZ6_061203.jpg_b.jpg > status: 0 unique id: bytes: 0 fd: 0 client state: 6 server state: 0 > ua_begin: 0.000 ua_read_header_done: 0.000 cache_open_read_begin: 0.000 > cache_open_read_end: 0.000 dns_lookup_begin: 0.000 dns_lookup_end: -1.000 > server_connect: -1.000 server_first_read: -1.000 server_read_header_done: > -1.000 server_close: -1.000 ua_close: 30.597 sm_finish: 30.597 > [Jun 22 15:33:47.441] Server {1098611008} ERROR: [8122409] Slow Request: url: > http://img04.taobaocdn.com/tps/i4/T1EM9gXltt-440-135.jpg status: 0 > unique id: bytes: 0 fd: 0 client state: 6 server state: 0 ua_begin: 0.000 > ua_read_header_done: 0.000 cache_open_read_begin: 0.000 cache_open_read_end: > 0.000 dns_lookup_begin: 0.000 dns_lookup_end: -1.000 server_connect: -1.000 > server_first_read: -1.000 server_read_header_done: -1.000 server_close: > -1.000 ua_close: 30.948 sm_finish: 30.948 > {code} > all http SM show from {http} in DNS_LOOKUP > and traffic.out show nothing with error, when I enable debug on > hostdb.*|dns.*: > {code} > [Jun 22 15:27:42.391] Server {1108986176} DEBUG: (hostdb) probe > img03.taobaocdn.com DB357D60B8247DC7 1 [ignore_timeout = 0] > [Jun 22 15:27:42.391] Server {1108986176} DEBUG: (hostdb) timeout 64204 > 1308663404 300 > [Jun 22 15:27:42.391] Server {1108986176} DEBUG: (hostdb) delaying force 0 > answer for img03.taobaocdn.com [timeout 0] > [Jun 22 15:27:42.411] Server {1108986176} DEBUG: (hostdb) probe > img03.taobaocdn.com DB357D60B8247DC7 1 [ignore_timeout = 0] > [Jun 22 15:27:42.411] Server {1108986176} DEBUG: (hostdb) timeout 64204 > 1308663404 300 > [Jun 22 15:27:42.411] Server {1108986176} DEBUG: (hostdb) DNS > img03.taobaocdn.com > [Jun 22 15:27:42.411] Server {1108986176} DEBUG: (hostdb) enqueuing > additional request > [Jun 22 15:27:42.416] Server {47177168876656} DEBUG: (hostdb) probe 127.0.0.1 > E9B7563422C93608 1 [ignore_timeout = 0] > [Jun 22 15:27:42.416] Server {47177168876656} DEBUG: (hostdb) immediate > answer for 127.0.0.1 > [Jun 22 15:27:42.422] Server {1098611008} DEBUG: (hostdb) probe > img08.taobaocdn.com 945198AE9AE37481 1 [ignore_timeout = 0] > [Jun 22 15:27:42.422] Server {1098611008} DEBUG: (hostdb) timeout 64281 > 1308663327 300 > [Jun 22 15:27:42.422] Server {1098611008} DEBUG: (hostdb) delaying force 0 > answer for img08.taobaocdn.com [timeout 0] > [Jun 22 15:27:42.441] Server {1098611008} DEBUG: (hostdb) probe > img08.taobaocdn.com 945198AE9AE37481 1 [ignore_timeout = 0] > [Jun 22 15:27:42.441] Server {1098611008} DEBUG: (hostdb) timeout 64281 > 1308663327 300 > [Jun 22 15:27:42.441] Server {1098611008} DEBUG: (hostdb) DNS > img08.taobaocdn.com > [Jun 22 15:27:42.441] Server {1098611008} DEBUG: (hostdb) enqueuing > additional request > [Jun 22 15:27:42.470] Server {10996636
[jira] [Commented] (TS-852) hostdb and dns system hangup
[ https://issues.apache.org/jira/browse/TS-852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196472#comment-14196472 ] Susan Hinrichs commented on TS-852: --- [~psudaemon] any additional thoughts on this. > hostdb and dns system hangup > > > Key: TS-852 > URL: https://issues.apache.org/jira/browse/TS-852 > Project: Traffic Server > Issue Type: Bug > Components: Core, DNS >Affects Versions: 3.1.0 >Reporter: Zhao Yongming >Assignee: Phil Sorber >Priority: Minor > Labels: dns, hostdb > Fix For: 3.1.0, sometime > > Attachments: TS-852.patch > > > in my testing, all request that need go OS, fails with 30s timeout, and the > slow query shows data from dns missed: > {code} > [Jun 22 15:33:47.183] Server {1106880832} ERROR: [8122411] Slow Request: url: > http://download.im.alisoft.com/aliim/AliIM6/update/packages//2011-5-4-10-10-40/0/modulesproxy/8203.xml.wwzip > status: 0 unique id: bytes: 0 fd: 0 client state: 6 server state: 0 > ua_begin: 0.000 ua_read_header_done: 0.000 cache_open_read_begin: 0.000 > cache_open_read_end: 0.000 dns_lookup_begin: 0.000 dns_lookup_end: -1.000 > server_connect: -1.000 server_first_read: -1.000 server_read_header_done: > -1.000 server_close: -1.000 ua_close: 30.667 sm_finish: 30.667 > [Jun 22 15:33:47.220] Server {1099663680} ERROR: [8122422] Slow Request: url: > http://img.uu1001.cn/materials/original/2010-11-22/22-48/a302876929a9c40a8272ac439a16ad25e74edf71.png > status: 0 unique id: bytes: 0 fd: 0 client state: 6 server state: 0 > ua_begin: 0.000 ua_read_header_done: 0.000 cache_open_read_begin: 0.000 > cache_open_read_end: 0.000 dns_lookup_begin: 0.000 dns_lookup_end: -1.000 > server_connect: -1.000 server_first_read: -1.000 server_read_header_done: > -1.000 server_close: -1.000 ua_close: 30.318 sm_finish: 30.318 > [Jun 22 15:33:47.441] Server {1098611008} ERROR: [8122421] Slow Request: url: > http://img02.taobaocdn.com/bao/uploaded/i2/T1mp8QXopdXXblNIZ6_061203.jpg_b.jpg > status: 0 unique id: bytes: 0 fd: 0 client state: 6 server state: 0 > ua_begin: 0.000 ua_read_header_done: 0.000 cache_open_read_begin: 0.000 > cache_open_read_end: 0.000 dns_lookup_begin: 0.000 dns_lookup_end: -1.000 > server_connect: -1.000 server_first_read: -1.000 server_read_header_done: > -1.000 server_close: -1.000 ua_close: 30.597 sm_finish: 30.597 > [Jun 22 15:33:47.441] Server {1098611008} ERROR: [8122409] Slow Request: url: > http://img04.taobaocdn.com/tps/i4/T1EM9gXltt-440-135.jpg status: 0 > unique id: bytes: 0 fd: 0 client state: 6 server state: 0 ua_begin: 0.000 > ua_read_header_done: 0.000 cache_open_read_begin: 0.000 cache_open_read_end: > 0.000 dns_lookup_begin: 0.000 dns_lookup_end: -1.000 server_connect: -1.000 > server_first_read: -1.000 server_read_header_done: -1.000 server_close: > -1.000 ua_close: 30.948 sm_finish: 30.948 > {code} > all http SM show from {http} in DNS_LOOKUP > and traffic.out show nothing with error, when I enable debug on > hostdb.*|dns.*: > {code} > [Jun 22 15:27:42.391] Server {1108986176} DEBUG: (hostdb) probe > img03.taobaocdn.com DB357D60B8247DC7 1 [ignore_timeout = 0] > [Jun 22 15:27:42.391] Server {1108986176} DEBUG: (hostdb) timeout 64204 > 1308663404 300 > [Jun 22 15:27:42.391] Server {1108986176} DEBUG: (hostdb) delaying force 0 > answer for img03.taobaocdn.com [timeout 0] > [Jun 22 15:27:42.411] Server {1108986176} DEBUG: (hostdb) probe > img03.taobaocdn.com DB357D60B8247DC7 1 [ignore_timeout = 0] > [Jun 22 15:27:42.411] Server {1108986176} DEBUG: (hostdb) timeout 64204 > 1308663404 300 > [Jun 22 15:27:42.411] Server {1108986176} DEBUG: (hostdb) DNS > img03.taobaocdn.com > [Jun 22 15:27:42.411] Server {1108986176} DEBUG: (hostdb) enqueuing > additional request > [Jun 22 15:27:42.416] Server {47177168876656} DEBUG: (hostdb) probe 127.0.0.1 > E9B7563422C93608 1 [ignore_timeout = 0] > [Jun 22 15:27:42.416] Server {47177168876656} DEBUG: (hostdb) immediate > answer for 127.0.0.1 > [Jun 22 15:27:42.422] Server {1098611008} DEBUG: (hostdb) probe > img08.taobaocdn.com 945198AE9AE37481 1 [ignore_timeout = 0] > [Jun 22 15:27:42.422] Server {1098611008} DEBUG: (hostdb) timeout 64281 > 1308663327 300 > [Jun 22 15:27:42.422] Server {1098611008} DEBUG: (hostdb) delaying force 0 > answer for img08.taobaocdn.com [timeout 0] > [Jun 22 15:27:42.441] Server {1098611008} DEBUG: (hostdb) probe > img08.taobaocdn.com 945198AE9AE37481 1 [ignore_timeout = 0] > [Jun 22 15:27:42.441] Server {1098611008} DEBUG: (hostdb) timeout 64281 > 1308663327 300 > [Jun 22 15:27:42.441] Server {1098611008} DEBUG: (hostdb) DNS > img08.taobaocdn.com > [Jun 22 15:27:42.441] Server {1098611008} DEBUG: (hostdb) enqueuing > additional request > [Jun 22 15:27:42.470] Server {1099663680} DEBUG: (hostdb)
[jira] [Commented] (TS-852) hostdb and dns system hangup
[ https://issues.apache.org/jira/browse/TS-852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006284#comment-14006284 ] Phil Sorber commented on TS-852: After looking at this more, I feel that the only change that should be made here that makes sense to me is: {noformat} diff --git a/iocore/dns/DNS.cc b/iocore/dns/DNS.cc index a93b6c9..7fda955 100644 --- a/iocore/dns/DNS.cc +++ b/iocore/dns/DNS.cc @@ -975,6 +975,10 @@ write_dns_event(DNSHandler *h, DNSEntry *e) h->release_query_id(e->id[dns_retries - e->retries]); } e->id[dns_retries - e->retries] = i; + + if (e->timeout) +e->timeout->cancel(); + Debug("dns", "send query (qtype=%d) for %s to fd %d", e->qtype, e->qname, h->con[h->name_server].fd); int s = socketManager.send(h->con[h->name_server].fd, blob._b, r, 0); @@ -998,9 +1002,6 @@ write_dns_event(DNSHandler *h, DNSEntry *e) e->send_time = ink_get_hrtime(); - if (e->timeout) -e->timeout->cancel(); - if (h->txn_lookup_timeout) { e->timeout = h->mutex->thread_holding->schedule_in(e, HRTIME_MSECONDS(h->txn_lookup_timeout)); //this is in msec } else { {noformat} Basically make sure we remove the timeout. But I am not even convinced of this. I feel there is still a real bug here, but not sure if the fix is in this portion of the code. Input from [~zym], [~wahu0315210], or [~adungaos] would be appreciated. Going to change this to fix sometime. > hostdb and dns system hangup > > > Key: TS-852 > URL: https://issues.apache.org/jira/browse/TS-852 > Project: Traffic Server > Issue Type: Bug > Components: Core, DNS >Affects Versions: 3.1.0 >Reporter: Zhao Yongming >Assignee: Phil Sorber >Priority: Minor > Labels: dns, hostdb > Fix For: 3.1.0, sometime > > Attachments: TS-852.patch > > > in my testing, all request that need go OS, fails with 30s timeout, and the > slow query shows data from dns missed: > {code} > [Jun 22 15:33:47.183] Server {1106880832} ERROR: [8122411] Slow Request: url: > http://download.im.alisoft.com/aliim/AliIM6/update/packages//2011-5-4-10-10-40/0/modulesproxy/8203.xml.wwzip > status: 0 unique id: bytes: 0 fd: 0 client state: 6 server state: 0 > ua_begin: 0.000 ua_read_header_done: 0.000 cache_open_read_begin: 0.000 > cache_open_read_end: 0.000 dns_lookup_begin: 0.000 dns_lookup_end: -1.000 > server_connect: -1.000 server_first_read: -1.000 server_read_header_done: > -1.000 server_close: -1.000 ua_close: 30.667 sm_finish: 30.667 > [Jun 22 15:33:47.220] Server {1099663680} ERROR: [8122422] Slow Request: url: > http://img.uu1001.cn/materials/original/2010-11-22/22-48/a302876929a9c40a8272ac439a16ad25e74edf71.png > status: 0 unique id: bytes: 0 fd: 0 client state: 6 server state: 0 > ua_begin: 0.000 ua_read_header_done: 0.000 cache_open_read_begin: 0.000 > cache_open_read_end: 0.000 dns_lookup_begin: 0.000 dns_lookup_end: -1.000 > server_connect: -1.000 server_first_read: -1.000 server_read_header_done: > -1.000 server_close: -1.000 ua_close: 30.318 sm_finish: 30.318 > [Jun 22 15:33:47.441] Server {1098611008} ERROR: [8122421] Slow Request: url: > http://img02.taobaocdn.com/bao/uploaded/i2/T1mp8QXopdXXblNIZ6_061203.jpg_b.jpg > status: 0 unique id: bytes: 0 fd: 0 client state: 6 server state: 0 > ua_begin: 0.000 ua_read_header_done: 0.000 cache_open_read_begin: 0.000 > cache_open_read_end: 0.000 dns_lookup_begin: 0.000 dns_lookup_end: -1.000 > server_connect: -1.000 server_first_read: -1.000 server_read_header_done: > -1.000 server_close: -1.000 ua_close: 30.597 sm_finish: 30.597 > [Jun 22 15:33:47.441] Server {1098611008} ERROR: [8122409] Slow Request: url: > http://img04.taobaocdn.com/tps/i4/T1EM9gXltt-440-135.jpg status: 0 > unique id: bytes: 0 fd: 0 client state: 6 server state: 0 ua_begin: 0.000 > ua_read_header_done: 0.000 cache_open_read_begin: 0.000 cache_open_read_end: > 0.000 dns_lookup_begin: 0.000 dns_lookup_end: -1.000 server_connect: -1.000 > server_first_read: -1.000 server_read_header_done: -1.000 server_close: > -1.000 ua_close: 30.948 sm_finish: 30.948 > {code} > all http SM show from {http} in DNS_LOOKUP > and traffic.out show nothing with error, when I enable debug on > hostdb.*|dns.*: > {code} > [Jun 22 15:27:42.391] Server {1108986176} DEBUG: (hostdb) probe > img03.taobaocdn.com DB357D60B8247DC7 1 [ignore_timeout = 0] > [Jun 22 15:27:42.391] Server {1108986176} DEBUG: (hostdb) timeout 64204 > 1308663404 300 > [Jun 22 15:27:42.391] Server {1108986176} DEBUG: (hostdb) delaying force 0 > answer for img03.taobaocdn.com [timeout 0] > [Jun 22 15:27:42.411] Server {1108986176} DEBUG: (hostdb) probe > img03.taobaocdn.com DB357D60B8247DC7 1 [ignore_timeout = 0] > [Jun 22 15:27:42.411] Server {1108986176} DEBUG: (hostdb) timeout 64204 > 130
[jira] [Commented] (TS-852) hostdb and dns system hangup
[ https://issues.apache.org/jira/browse/TS-852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001879#comment-14001879 ] Bryan Call commented on TS-852: --- [~psudaemon] Can you decide what to do on this one? Close it for 5.0 or move it to the 5.1 release. > hostdb and dns system hangup > > > Key: TS-852 > URL: https://issues.apache.org/jira/browse/TS-852 > Project: Traffic Server > Issue Type: Bug > Components: Core, DNS >Affects Versions: 3.1.0 >Reporter: Zhao Yongming >Assignee: Phil Sorber >Priority: Minor > Labels: dns, hostdb > Fix For: 3.1.0, 5.0.0 > > Attachments: TS-852.patch > > > in my testing, all request that need go OS, fails with 30s timeout, and the > slow query shows data from dns missed: > {code} > [Jun 22 15:33:47.183] Server {1106880832} ERROR: [8122411] Slow Request: url: > http://download.im.alisoft.com/aliim/AliIM6/update/packages//2011-5-4-10-10-40/0/modulesproxy/8203.xml.wwzip > status: 0 unique id: bytes: 0 fd: 0 client state: 6 server state: 0 > ua_begin: 0.000 ua_read_header_done: 0.000 cache_open_read_begin: 0.000 > cache_open_read_end: 0.000 dns_lookup_begin: 0.000 dns_lookup_end: -1.000 > server_connect: -1.000 server_first_read: -1.000 server_read_header_done: > -1.000 server_close: -1.000 ua_close: 30.667 sm_finish: 30.667 > [Jun 22 15:33:47.220] Server {1099663680} ERROR: [8122422] Slow Request: url: > http://img.uu1001.cn/materials/original/2010-11-22/22-48/a302876929a9c40a8272ac439a16ad25e74edf71.png > status: 0 unique id: bytes: 0 fd: 0 client state: 6 server state: 0 > ua_begin: 0.000 ua_read_header_done: 0.000 cache_open_read_begin: 0.000 > cache_open_read_end: 0.000 dns_lookup_begin: 0.000 dns_lookup_end: -1.000 > server_connect: -1.000 server_first_read: -1.000 server_read_header_done: > -1.000 server_close: -1.000 ua_close: 30.318 sm_finish: 30.318 > [Jun 22 15:33:47.441] Server {1098611008} ERROR: [8122421] Slow Request: url: > http://img02.taobaocdn.com/bao/uploaded/i2/T1mp8QXopdXXblNIZ6_061203.jpg_b.jpg > status: 0 unique id: bytes: 0 fd: 0 client state: 6 server state: 0 > ua_begin: 0.000 ua_read_header_done: 0.000 cache_open_read_begin: 0.000 > cache_open_read_end: 0.000 dns_lookup_begin: 0.000 dns_lookup_end: -1.000 > server_connect: -1.000 server_first_read: -1.000 server_read_header_done: > -1.000 server_close: -1.000 ua_close: 30.597 sm_finish: 30.597 > [Jun 22 15:33:47.441] Server {1098611008} ERROR: [8122409] Slow Request: url: > http://img04.taobaocdn.com/tps/i4/T1EM9gXltt-440-135.jpg status: 0 > unique id: bytes: 0 fd: 0 client state: 6 server state: 0 ua_begin: 0.000 > ua_read_header_done: 0.000 cache_open_read_begin: 0.000 cache_open_read_end: > 0.000 dns_lookup_begin: 0.000 dns_lookup_end: -1.000 server_connect: -1.000 > server_first_read: -1.000 server_read_header_done: -1.000 server_close: > -1.000 ua_close: 30.948 sm_finish: 30.948 > {code} > all http SM show from {http} in DNS_LOOKUP > and traffic.out show nothing with error, when I enable debug on > hostdb.*|dns.*: > {code} > [Jun 22 15:27:42.391] Server {1108986176} DEBUG: (hostdb) probe > img03.taobaocdn.com DB357D60B8247DC7 1 [ignore_timeout = 0] > [Jun 22 15:27:42.391] Server {1108986176} DEBUG: (hostdb) timeout 64204 > 1308663404 300 > [Jun 22 15:27:42.391] Server {1108986176} DEBUG: (hostdb) delaying force 0 > answer for img03.taobaocdn.com [timeout 0] > [Jun 22 15:27:42.411] Server {1108986176} DEBUG: (hostdb) probe > img03.taobaocdn.com DB357D60B8247DC7 1 [ignore_timeout = 0] > [Jun 22 15:27:42.411] Server {1108986176} DEBUG: (hostdb) timeout 64204 > 1308663404 300 > [Jun 22 15:27:42.411] Server {1108986176} DEBUG: (hostdb) DNS > img03.taobaocdn.com > [Jun 22 15:27:42.411] Server {1108986176} DEBUG: (hostdb) enqueuing > additional request > [Jun 22 15:27:42.416] Server {47177168876656} DEBUG: (hostdb) probe 127.0.0.1 > E9B7563422C93608 1 [ignore_timeout = 0] > [Jun 22 15:27:42.416] Server {47177168876656} DEBUG: (hostdb) immediate > answer for 127.0.0.1 > [Jun 22 15:27:42.422] Server {1098611008} DEBUG: (hostdb) probe > img08.taobaocdn.com 945198AE9AE37481 1 [ignore_timeout = 0] > [Jun 22 15:27:42.422] Server {1098611008} DEBUG: (hostdb) timeout 64281 > 1308663327 300 > [Jun 22 15:27:42.422] Server {1098611008} DEBUG: (hostdb) delaying force 0 > answer for img08.taobaocdn.com [timeout 0] > [Jun 22 15:27:42.441] Server {1098611008} DEBUG: (hostdb) probe > img08.taobaocdn.com 945198AE9AE37481 1 [ignore_timeout = 0] > [Jun 22 15:27:42.441] Server {1098611008} DEBUG: (hostdb) timeout 64281 > 1308663327 300 > [Jun 22 15:27:42.441] Server {1098611008} DEBUG: (hostdb) DNS > img08.taobaocdn.com > [Jun 22 15:27:42.441] Server {1098611008} DEBUG: (hostdb) enqueuing > additional request > [Jun 22 15:2
[jira] [Commented] (TS-852) hostdb and dns system hangup
[ https://issues.apache.org/jira/browse/TS-852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13912177#comment-13912177 ] Leif Hedstrom commented on TS-852: -- Phil? > hostdb and dns system hangup > > > Key: TS-852 > URL: https://issues.apache.org/jira/browse/TS-852 > Project: Traffic Server > Issue Type: Bug > Components: Core, DNS >Affects Versions: 3.1.0 >Reporter: Zhao Yongming >Assignee: Phil Sorber >Priority: Minor > Labels: dns, hostdb > Fix For: 3.1.0, 5.0.0 > > Attachments: TS-852.patch > > > in my testing, all request that need go OS, fails with 30s timeout, and the > slow query shows data from dns missed: > {code} > [Jun 22 15:33:47.183] Server {1106880832} ERROR: [8122411] Slow Request: url: > http://download.im.alisoft.com/aliim/AliIM6/update/packages//2011-5-4-10-10-40/0/modulesproxy/8203.xml.wwzip > status: 0 unique id: bytes: 0 fd: 0 client state: 6 server state: 0 > ua_begin: 0.000 ua_read_header_done: 0.000 cache_open_read_begin: 0.000 > cache_open_read_end: 0.000 dns_lookup_begin: 0.000 dns_lookup_end: -1.000 > server_connect: -1.000 server_first_read: -1.000 server_read_header_done: > -1.000 server_close: -1.000 ua_close: 30.667 sm_finish: 30.667 > [Jun 22 15:33:47.220] Server {1099663680} ERROR: [8122422] Slow Request: url: > http://img.uu1001.cn/materials/original/2010-11-22/22-48/a302876929a9c40a8272ac439a16ad25e74edf71.png > status: 0 unique id: bytes: 0 fd: 0 client state: 6 server state: 0 > ua_begin: 0.000 ua_read_header_done: 0.000 cache_open_read_begin: 0.000 > cache_open_read_end: 0.000 dns_lookup_begin: 0.000 dns_lookup_end: -1.000 > server_connect: -1.000 server_first_read: -1.000 server_read_header_done: > -1.000 server_close: -1.000 ua_close: 30.318 sm_finish: 30.318 > [Jun 22 15:33:47.441] Server {1098611008} ERROR: [8122421] Slow Request: url: > http://img02.taobaocdn.com/bao/uploaded/i2/T1mp8QXopdXXblNIZ6_061203.jpg_b.jpg > status: 0 unique id: bytes: 0 fd: 0 client state: 6 server state: 0 > ua_begin: 0.000 ua_read_header_done: 0.000 cache_open_read_begin: 0.000 > cache_open_read_end: 0.000 dns_lookup_begin: 0.000 dns_lookup_end: -1.000 > server_connect: -1.000 server_first_read: -1.000 server_read_header_done: > -1.000 server_close: -1.000 ua_close: 30.597 sm_finish: 30.597 > [Jun 22 15:33:47.441] Server {1098611008} ERROR: [8122409] Slow Request: url: > http://img04.taobaocdn.com/tps/i4/T1EM9gXltt-440-135.jpg status: 0 > unique id: bytes: 0 fd: 0 client state: 6 server state: 0 ua_begin: 0.000 > ua_read_header_done: 0.000 cache_open_read_begin: 0.000 cache_open_read_end: > 0.000 dns_lookup_begin: 0.000 dns_lookup_end: -1.000 server_connect: -1.000 > server_first_read: -1.000 server_read_header_done: -1.000 server_close: > -1.000 ua_close: 30.948 sm_finish: 30.948 > {code} > all http SM show from {http} in DNS_LOOKUP > and traffic.out show nothing with error, when I enable debug on > hostdb.*|dns.*: > {code} > [Jun 22 15:27:42.391] Server {1108986176} DEBUG: (hostdb) probe > img03.taobaocdn.com DB357D60B8247DC7 1 [ignore_timeout = 0] > [Jun 22 15:27:42.391] Server {1108986176} DEBUG: (hostdb) timeout 64204 > 1308663404 300 > [Jun 22 15:27:42.391] Server {1108986176} DEBUG: (hostdb) delaying force 0 > answer for img03.taobaocdn.com [timeout 0] > [Jun 22 15:27:42.411] Server {1108986176} DEBUG: (hostdb) probe > img03.taobaocdn.com DB357D60B8247DC7 1 [ignore_timeout = 0] > [Jun 22 15:27:42.411] Server {1108986176} DEBUG: (hostdb) timeout 64204 > 1308663404 300 > [Jun 22 15:27:42.411] Server {1108986176} DEBUG: (hostdb) DNS > img03.taobaocdn.com > [Jun 22 15:27:42.411] Server {1108986176} DEBUG: (hostdb) enqueuing > additional request > [Jun 22 15:27:42.416] Server {47177168876656} DEBUG: (hostdb) probe 127.0.0.1 > E9B7563422C93608 1 [ignore_timeout = 0] > [Jun 22 15:27:42.416] Server {47177168876656} DEBUG: (hostdb) immediate > answer for 127.0.0.1 > [Jun 22 15:27:42.422] Server {1098611008} DEBUG: (hostdb) probe > img08.taobaocdn.com 945198AE9AE37481 1 [ignore_timeout = 0] > [Jun 22 15:27:42.422] Server {1098611008} DEBUG: (hostdb) timeout 64281 > 1308663327 300 > [Jun 22 15:27:42.422] Server {1098611008} DEBUG: (hostdb) delaying force 0 > answer for img08.taobaocdn.com [timeout 0] > [Jun 22 15:27:42.441] Server {1098611008} DEBUG: (hostdb) probe > img08.taobaocdn.com 945198AE9AE37481 1 [ignore_timeout = 0] > [Jun 22 15:27:42.441] Server {1098611008} DEBUG: (hostdb) timeout 64281 > 1308663327 300 > [Jun 22 15:27:42.441] Server {1098611008} DEBUG: (hostdb) DNS > img08.taobaocdn.com > [Jun 22 15:27:42.441] Server {1098611008} DEBUG: (hostdb) enqueuing > additional request > [Jun 22 15:27:42.470] Server {1099663680} DEBUG: (hostdb) probe 127.0.0.1 > E9B7563422C93608 1 [ignore
[jira] [Commented] (TS-852) hostdb and dns system hangup
[ https://issues.apache.org/jira/browse/TS-852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877949#comment-13877949 ] Phil Sorber commented on TS-852: This has a resolution of fixed, but I don't think this patch was ever applied. I think this is still a legitimate race condition, but unsure if this patch is exactly right. I think we need to reset some variables inside the error conditional. > hostdb and dns system hangup > > > Key: TS-852 > URL: https://issues.apache.org/jira/browse/TS-852 > Project: Traffic Server > Issue Type: Bug > Components: Core, DNS >Affects Versions: 3.1.0 >Reporter: Zhao Yongming >Priority: Critical > Labels: dns, hostdb > Fix For: 3.1.0 > > Attachments: TS-852.patch > > > in my testing, all request that need go OS, fails with 30s timeout, and the > slow query shows data from dns missed: > {code} > [Jun 22 15:33:47.183] Server {1106880832} ERROR: [8122411] Slow Request: url: > http://download.im.alisoft.com/aliim/AliIM6/update/packages//2011-5-4-10-10-40/0/modulesproxy/8203.xml.wwzip > status: 0 unique id: bytes: 0 fd: 0 client state: 6 server state: 0 > ua_begin: 0.000 ua_read_header_done: 0.000 cache_open_read_begin: 0.000 > cache_open_read_end: 0.000 dns_lookup_begin: 0.000 dns_lookup_end: -1.000 > server_connect: -1.000 server_first_read: -1.000 server_read_header_done: > -1.000 server_close: -1.000 ua_close: 30.667 sm_finish: 30.667 > [Jun 22 15:33:47.220] Server {1099663680} ERROR: [8122422] Slow Request: url: > http://img.uu1001.cn/materials/original/2010-11-22/22-48/a302876929a9c40a8272ac439a16ad25e74edf71.png > status: 0 unique id: bytes: 0 fd: 0 client state: 6 server state: 0 > ua_begin: 0.000 ua_read_header_done: 0.000 cache_open_read_begin: 0.000 > cache_open_read_end: 0.000 dns_lookup_begin: 0.000 dns_lookup_end: -1.000 > server_connect: -1.000 server_first_read: -1.000 server_read_header_done: > -1.000 server_close: -1.000 ua_close: 30.318 sm_finish: 30.318 > [Jun 22 15:33:47.441] Server {1098611008} ERROR: [8122421] Slow Request: url: > http://img02.taobaocdn.com/bao/uploaded/i2/T1mp8QXopdXXblNIZ6_061203.jpg_b.jpg > status: 0 unique id: bytes: 0 fd: 0 client state: 6 server state: 0 > ua_begin: 0.000 ua_read_header_done: 0.000 cache_open_read_begin: 0.000 > cache_open_read_end: 0.000 dns_lookup_begin: 0.000 dns_lookup_end: -1.000 > server_connect: -1.000 server_first_read: -1.000 server_read_header_done: > -1.000 server_close: -1.000 ua_close: 30.597 sm_finish: 30.597 > [Jun 22 15:33:47.441] Server {1098611008} ERROR: [8122409] Slow Request: url: > http://img04.taobaocdn.com/tps/i4/T1EM9gXltt-440-135.jpg status: 0 > unique id: bytes: 0 fd: 0 client state: 6 server state: 0 ua_begin: 0.000 > ua_read_header_done: 0.000 cache_open_read_begin: 0.000 cache_open_read_end: > 0.000 dns_lookup_begin: 0.000 dns_lookup_end: -1.000 server_connect: -1.000 > server_first_read: -1.000 server_read_header_done: -1.000 server_close: > -1.000 ua_close: 30.948 sm_finish: 30.948 > {code} > all http SM show from {http} in DNS_LOOKUP > and traffic.out show nothing with error, when I enable debug on > hostdb.*|dns.*: > {code} > [Jun 22 15:27:42.391] Server {1108986176} DEBUG: (hostdb) probe > img03.taobaocdn.com DB357D60B8247DC7 1 [ignore_timeout = 0] > [Jun 22 15:27:42.391] Server {1108986176} DEBUG: (hostdb) timeout 64204 > 1308663404 300 > [Jun 22 15:27:42.391] Server {1108986176} DEBUG: (hostdb) delaying force 0 > answer for img03.taobaocdn.com [timeout 0] > [Jun 22 15:27:42.411] Server {1108986176} DEBUG: (hostdb) probe > img03.taobaocdn.com DB357D60B8247DC7 1 [ignore_timeout = 0] > [Jun 22 15:27:42.411] Server {1108986176} DEBUG: (hostdb) timeout 64204 > 1308663404 300 > [Jun 22 15:27:42.411] Server {1108986176} DEBUG: (hostdb) DNS > img03.taobaocdn.com > [Jun 22 15:27:42.411] Server {1108986176} DEBUG: (hostdb) enqueuing > additional request > [Jun 22 15:27:42.416] Server {47177168876656} DEBUG: (hostdb) probe 127.0.0.1 > E9B7563422C93608 1 [ignore_timeout = 0] > [Jun 22 15:27:42.416] Server {47177168876656} DEBUG: (hostdb) immediate > answer for 127.0.0.1 > [Jun 22 15:27:42.422] Server {1098611008} DEBUG: (hostdb) probe > img08.taobaocdn.com 945198AE9AE37481 1 [ignore_timeout = 0] > [Jun 22 15:27:42.422] Server {1098611008} DEBUG: (hostdb) timeout 64281 > 1308663327 300 > [Jun 22 15:27:42.422] Server {1098611008} DEBUG: (hostdb) delaying force 0 > answer for img08.taobaocdn.com [timeout 0] > [Jun 22 15:27:42.441] Server {1098611008} DEBUG: (hostdb) probe > img08.taobaocdn.com 945198AE9AE37481 1 [ignore_timeout = 0] > [Jun 22 15:27:42.441] Server {1098611008} DEBUG: (hostdb) timeout 64281 > 1308663327 300 > [Jun 22 15:27:42.441] Server {1098611008} DEBUG: (hostdb) DNS > img08.taobaocdn.com
[jira] [Commented] (TS-852) hostdb and dns system hangup
[ https://issues.apache.org/jira/browse/TS-852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059025#comment-13059025 ] AdunGaos commented on TS-852: - I test with patches from TS-833 applied as well. use TS-833-3.diff and TS-833.diff(which should for TS-834). > hostdb and dns system hangup > > > Key: TS-852 > URL: https://issues.apache.org/jira/browse/TS-852 > Project: Traffic Server > Issue Type: Bug > Components: Core, DNS >Affects Versions: 3.1.0 >Reporter: Zhao Yongming >Priority: Critical > Labels: dns, hostdb > Fix For: 3.1.0 > > Attachments: TS-852.patch > > > in my testing, all request that need go OS, fails with 30s timeout, and the > slow query shows data from dns missed: > {code} > [Jun 22 15:33:47.183] Server {1106880832} ERROR: [8122411] Slow Request: url: > http://download.im.alisoft.com/aliim/AliIM6/update/packages//2011-5-4-10-10-40/0/modulesproxy/8203.xml.wwzip > status: 0 unique id: bytes: 0 fd: 0 client state: 6 server state: 0 > ua_begin: 0.000 ua_read_header_done: 0.000 cache_open_read_begin: 0.000 > cache_open_read_end: 0.000 dns_lookup_begin: 0.000 dns_lookup_end: -1.000 > server_connect: -1.000 server_first_read: -1.000 server_read_header_done: > -1.000 server_close: -1.000 ua_close: 30.667 sm_finish: 30.667 > [Jun 22 15:33:47.220] Server {1099663680} ERROR: [8122422] Slow Request: url: > http://img.uu1001.cn/materials/original/2010-11-22/22-48/a302876929a9c40a8272ac439a16ad25e74edf71.png > status: 0 unique id: bytes: 0 fd: 0 client state: 6 server state: 0 > ua_begin: 0.000 ua_read_header_done: 0.000 cache_open_read_begin: 0.000 > cache_open_read_end: 0.000 dns_lookup_begin: 0.000 dns_lookup_end: -1.000 > server_connect: -1.000 server_first_read: -1.000 server_read_header_done: > -1.000 server_close: -1.000 ua_close: 30.318 sm_finish: 30.318 > [Jun 22 15:33:47.441] Server {1098611008} ERROR: [8122421] Slow Request: url: > http://img02.taobaocdn.com/bao/uploaded/i2/T1mp8QXopdXXblNIZ6_061203.jpg_b.jpg > status: 0 unique id: bytes: 0 fd: 0 client state: 6 server state: 0 > ua_begin: 0.000 ua_read_header_done: 0.000 cache_open_read_begin: 0.000 > cache_open_read_end: 0.000 dns_lookup_begin: 0.000 dns_lookup_end: -1.000 > server_connect: -1.000 server_first_read: -1.000 server_read_header_done: > -1.000 server_close: -1.000 ua_close: 30.597 sm_finish: 30.597 > [Jun 22 15:33:47.441] Server {1098611008} ERROR: [8122409] Slow Request: url: > http://img04.taobaocdn.com/tps/i4/T1EM9gXltt-440-135.jpg status: 0 > unique id: bytes: 0 fd: 0 client state: 6 server state: 0 ua_begin: 0.000 > ua_read_header_done: 0.000 cache_open_read_begin: 0.000 cache_open_read_end: > 0.000 dns_lookup_begin: 0.000 dns_lookup_end: -1.000 server_connect: -1.000 > server_first_read: -1.000 server_read_header_done: -1.000 server_close: > -1.000 ua_close: 30.948 sm_finish: 30.948 > {code} > all http SM show from {http} in DNS_LOOKUP > and traffic.out show nothing with error, when I enable debug on > hostdb.*|dns.*: > {code} > [Jun 22 15:27:42.391] Server {1108986176} DEBUG: (hostdb) probe > img03.taobaocdn.com DB357D60B8247DC7 1 [ignore_timeout = 0] > [Jun 22 15:27:42.391] Server {1108986176} DEBUG: (hostdb) timeout 64204 > 1308663404 300 > [Jun 22 15:27:42.391] Server {1108986176} DEBUG: (hostdb) delaying force 0 > answer for img03.taobaocdn.com [timeout 0] > [Jun 22 15:27:42.411] Server {1108986176} DEBUG: (hostdb) probe > img03.taobaocdn.com DB357D60B8247DC7 1 [ignore_timeout = 0] > [Jun 22 15:27:42.411] Server {1108986176} DEBUG: (hostdb) timeout 64204 > 1308663404 300 > [Jun 22 15:27:42.411] Server {1108986176} DEBUG: (hostdb) DNS > img03.taobaocdn.com > [Jun 22 15:27:42.411] Server {1108986176} DEBUG: (hostdb) enqueuing > additional request > [Jun 22 15:27:42.416] Server {47177168876656} DEBUG: (hostdb) probe 127.0.0.1 > E9B7563422C93608 1 [ignore_timeout = 0] > [Jun 22 15:27:42.416] Server {47177168876656} DEBUG: (hostdb) immediate > answer for 127.0.0.1 > [Jun 22 15:27:42.422] Server {1098611008} DEBUG: (hostdb) probe > img08.taobaocdn.com 945198AE9AE37481 1 [ignore_timeout = 0] > [Jun 22 15:27:42.422] Server {1098611008} DEBUG: (hostdb) timeout 64281 > 1308663327 300 > [Jun 22 15:27:42.422] Server {1098611008} DEBUG: (hostdb) delaying force 0 > answer for img08.taobaocdn.com [timeout 0] > [Jun 22 15:27:42.441] Server {1098611008} DEBUG: (hostdb) probe > img08.taobaocdn.com 945198AE9AE37481 1 [ignore_timeout = 0] > [Jun 22 15:27:42.441] Server {1098611008} DEBUG: (hostdb) timeout 64281 > 1308663327 300 > [Jun 22 15:27:42.441] Server {1098611008} DEBUG: (hostdb) DNS > img08.taobaocdn.com > [Jun 22 15:27:42.441] Server {1098611008} DEBUG: (hostdb) enqueuing > additional request > [Jun 22 15:27:42.470] Server {1099663680} DEBUG
[jira] [Commented] (TS-852) hostdb and dns system hangup
[ https://issues.apache.org/jira/browse/TS-852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059010#comment-13059010 ] Igor Galić commented on TS-852: --- Is this with this patch only, or with the patches in TS-833 applied as well? > hostdb and dns system hangup > > > Key: TS-852 > URL: https://issues.apache.org/jira/browse/TS-852 > Project: Traffic Server > Issue Type: Bug > Components: Core, DNS >Affects Versions: 3.1.0 >Reporter: Zhao Yongming >Priority: Critical > Labels: dns, hostdb > Fix For: 3.1.0 > > Attachments: TS-852.patch > > > in my testing, all request that need go OS, fails with 30s timeout, and the > slow query shows data from dns missed: > {code} > [Jun 22 15:33:47.183] Server {1106880832} ERROR: [8122411] Slow Request: url: > http://download.im.alisoft.com/aliim/AliIM6/update/packages//2011-5-4-10-10-40/0/modulesproxy/8203.xml.wwzip > status: 0 unique id: bytes: 0 fd: 0 client state: 6 server state: 0 > ua_begin: 0.000 ua_read_header_done: 0.000 cache_open_read_begin: 0.000 > cache_open_read_end: 0.000 dns_lookup_begin: 0.000 dns_lookup_end: -1.000 > server_connect: -1.000 server_first_read: -1.000 server_read_header_done: > -1.000 server_close: -1.000 ua_close: 30.667 sm_finish: 30.667 > [Jun 22 15:33:47.220] Server {1099663680} ERROR: [8122422] Slow Request: url: > http://img.uu1001.cn/materials/original/2010-11-22/22-48/a302876929a9c40a8272ac439a16ad25e74edf71.png > status: 0 unique id: bytes: 0 fd: 0 client state: 6 server state: 0 > ua_begin: 0.000 ua_read_header_done: 0.000 cache_open_read_begin: 0.000 > cache_open_read_end: 0.000 dns_lookup_begin: 0.000 dns_lookup_end: -1.000 > server_connect: -1.000 server_first_read: -1.000 server_read_header_done: > -1.000 server_close: -1.000 ua_close: 30.318 sm_finish: 30.318 > [Jun 22 15:33:47.441] Server {1098611008} ERROR: [8122421] Slow Request: url: > http://img02.taobaocdn.com/bao/uploaded/i2/T1mp8QXopdXXblNIZ6_061203.jpg_b.jpg > status: 0 unique id: bytes: 0 fd: 0 client state: 6 server state: 0 > ua_begin: 0.000 ua_read_header_done: 0.000 cache_open_read_begin: 0.000 > cache_open_read_end: 0.000 dns_lookup_begin: 0.000 dns_lookup_end: -1.000 > server_connect: -1.000 server_first_read: -1.000 server_read_header_done: > -1.000 server_close: -1.000 ua_close: 30.597 sm_finish: 30.597 > [Jun 22 15:33:47.441] Server {1098611008} ERROR: [8122409] Slow Request: url: > http://img04.taobaocdn.com/tps/i4/T1EM9gXltt-440-135.jpg status: 0 > unique id: bytes: 0 fd: 0 client state: 6 server state: 0 ua_begin: 0.000 > ua_read_header_done: 0.000 cache_open_read_begin: 0.000 cache_open_read_end: > 0.000 dns_lookup_begin: 0.000 dns_lookup_end: -1.000 server_connect: -1.000 > server_first_read: -1.000 server_read_header_done: -1.000 server_close: > -1.000 ua_close: 30.948 sm_finish: 30.948 > {code} > all http SM show from {http} in DNS_LOOKUP > and traffic.out show nothing with error, when I enable debug on > hostdb.*|dns.*: > {code} > [Jun 22 15:27:42.391] Server {1108986176} DEBUG: (hostdb) probe > img03.taobaocdn.com DB357D60B8247DC7 1 [ignore_timeout = 0] > [Jun 22 15:27:42.391] Server {1108986176} DEBUG: (hostdb) timeout 64204 > 1308663404 300 > [Jun 22 15:27:42.391] Server {1108986176} DEBUG: (hostdb) delaying force 0 > answer for img03.taobaocdn.com [timeout 0] > [Jun 22 15:27:42.411] Server {1108986176} DEBUG: (hostdb) probe > img03.taobaocdn.com DB357D60B8247DC7 1 [ignore_timeout = 0] > [Jun 22 15:27:42.411] Server {1108986176} DEBUG: (hostdb) timeout 64204 > 1308663404 300 > [Jun 22 15:27:42.411] Server {1108986176} DEBUG: (hostdb) DNS > img03.taobaocdn.com > [Jun 22 15:27:42.411] Server {1108986176} DEBUG: (hostdb) enqueuing > additional request > [Jun 22 15:27:42.416] Server {47177168876656} DEBUG: (hostdb) probe 127.0.0.1 > E9B7563422C93608 1 [ignore_timeout = 0] > [Jun 22 15:27:42.416] Server {47177168876656} DEBUG: (hostdb) immediate > answer for 127.0.0.1 > [Jun 22 15:27:42.422] Server {1098611008} DEBUG: (hostdb) probe > img08.taobaocdn.com 945198AE9AE37481 1 [ignore_timeout = 0] > [Jun 22 15:27:42.422] Server {1098611008} DEBUG: (hostdb) timeout 64281 > 1308663327 300 > [Jun 22 15:27:42.422] Server {1098611008} DEBUG: (hostdb) delaying force 0 > answer for img08.taobaocdn.com [timeout 0] > [Jun 22 15:27:42.441] Server {1098611008} DEBUG: (hostdb) probe > img08.taobaocdn.com 945198AE9AE37481 1 [ignore_timeout = 0] > [Jun 22 15:27:42.441] Server {1098611008} DEBUG: (hostdb) timeout 64281 > 1308663327 300 > [Jun 22 15:27:42.441] Server {1098611008} DEBUG: (hostdb) DNS > img08.taobaocdn.com > [Jun 22 15:27:42.441] Server {1098611008} DEBUG: (hostdb) enqueuing > additional request > [Jun 22 15:27:42.470] Server {1099663680} DEBUG: (hostdb) probe 127.0.0.1 >
[jira] [Commented] (TS-852) hostdb and dns system hangup
[ https://issues.apache.org/jira/browse/TS-852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058974#comment-13058974 ] AdunGaos commented on TS-852: - We have do lots of work at this issue, found some information about it: Default: {code} proxy.config.dns.lookup_timeout=20 proxy.config.hostdb.ttl_mode=0 proxy.config.hostdb.verify_after=720 {code} Our dns ttl is 300 seconds, this cause when a hostdb entry timeout, it will send a dns requst to dns system, ATS CAN NOT DO ANYTHING about this domain's request until dns system return an answer. BUT a dns request may be timeout in some status. this cause ATS DON'T WORK a long time(proxy.config.dns.lookup_timeout=20). In our test, about 3000qps, when this issue appear, will cause ATS stack so many request, even after dns timeout, a dns retry answer right, OLD requst take a long time to dequeue from hostdb. so it seems like hostdb hungup. DNS requst may timeout because next issue: Our nameserver is 127.0.0.1, it will answer requst so quickly less than 0.4ms. When a thread send dns requst, and it not ready for receive answer, the answer received by anthoer thread, It NOT FOUND correct DNS id from dnsProcessor.handler, cause this answer droped, until dns timeout, it do a retry. Next log is our set proxy.config.dns.lookup_timeout=2: step 1. send request: {code} [Jun 26 03:29:12.714] Server {1103427904} DEBUG: (dns) received query img.taobaocdn.com type = 1, timeout = ,0 [Jun 26 03:29:12.714] Server {1103427904} DEBUG: (dns) enqueing query img.taobaocdn.com [Jun 26 03:29:12.715] Server {1103427904} DEBUG: (dns) adding first to collapsing queue [Jun 26 03:29:12.715] Server {1103427904} DEBUG: (dns) send query for img.taobaocdn.com to fd 51 {code} step 2. answer received by an other thread but drop it. {code} [Jun 26 03:29:12.715] Server {47576670086256} DEBUG: (dns) received packet size = 157 [Jun 26 03:29:12.715] Server {47576670086256} DEBUG: (dns) primary DNS response code = 0 [Jun 26 03:29:12.715] Server {47576670086256} DEBUG: (dns) unknown DNS id = 62816 {code} step 3. thread 1 sent finished after answer has been droped. {code} [Jun 26 03:29:12.715] Server {1103427904} DEBUG: (dns) sent qname = img.taobaocdn.com, id = 62816, nameserver = 0 [Jun 26 03:29:12.715] Server {1103427904} DEBUG: (dns) sent_one: failover_number for resolver 0 is 1 {code} step 4. timeout for query, do a retry and get right answer this time. {code} [Jun 26 03:29:14.710] Server {47576670086256} DEBUG: (dns) timeout for query img.taobaocdn.com [Jun 26 03:29:14.710] Server {47576670086256} DEBUG: (dns) marking img.taobaocdn.com as not-written [Jun 26 03:29:14.710] Server {47576670086256} DEBUG: (dns) doing retry for img.taobaocdn.com [Jun 26 03:29:14.710] Server {47576670086256} DEBUG: (dns) send query for img.taobaocdn.com to fd 51 [Jun 26 03:29:14.710] Server {47576670086256} DEBUG: (dns) sent qname = img.taobaocdn.com, id = 39679, nameserver = 0 [Jun 26 03:29:14.710] Server {47576670086256} DEBUG: (dns) sent_one: failover_number for resolver 0 is 2 [Jun 26 03:29:14.710] Server {47576670086256} DEBUG: (dns) received packet size = 157 [Jun 26 03:29:14.710] Server {47576670086256} DEBUG: (dns) primary DNS response code = 0 [Jun 26 03:29:14.710] Server {47576670086256} DEBUG: (dns) received cname = taobao [Jun 26 03:29:14.710] Server {47576670086256} DEBUG: (dns) received A name = taobao [Jun 26 03:29:14.710] Server {47576670086256} DEBUG: (dns) received A = 122.224.194.211 [Jun 26 03:29:14.710] Server {47576670086256} DEBUG: (dns) received A = 122.224.194.212 [Jun 26 03:29:14.710] Server {47576670086256} DEBUG: (dns) received A = 110.75.3.141 [Jun 26 03:29:14.710] Server {47576670086256} DEBUG: (dns) received A = 110.75.3.142 [Jun 26 03:29:14.710] Server {47576670086256} DEBUG: (dns) SUCCESS result for img.taobaocdn.com = 122.224.194.211 retry 0 [Jun 26 03:29:14.710] Server {1103427904} DEBUG: (dns) called back continuation for img.taobaocdn.com {code} We do a small patch(change some code's order) about this issue, it seems this behavior not appeared again. Any one can take a look at this patch, declare this patch is right or no, or it may has other issue? P.S. after this patch, TS-833 seems gone. I don't clear how this two issue seems is related. > hostdb and dns system hangup > > > Key: TS-852 > URL: https://issues.apache.org/jira/browse/TS-852 > Project: Traffic Server > Issue Type: Bug > Components: Core, DNS >Affects Versions: 3.1.0 >Reporter: Zhao Yongming >Priority: Critical > Labels: dns, hostdb > Fix For: 3.1.0 > > > in my testing, all request that need go OS, fails with 30s timeout, and the > slow query shows data from dns missed: > {code} > [Jun 22 15:33:47.183] Server {1106880832} ERROR: [8122411] Slow Request: u