smalenfant opened a new issue #7710: URL: https://github.com/apache/trafficserver/issues/7710
After we upgraded from ATS 7.1.x to ATS 8.1.x, our backbone team went into several maintenances and noted impact on our CDN delivery. After the 3rd night of it, we finally found that something was wrong with the new installed code. ``` # cat /opt/trafficserver/etc/trafficserver/records.config | grep connect_attempts_timeout CONFIG proxy.config.http.connect_attempts_timeout INT 2 CONFIG proxy.config.http.parent_proxy.connect_attempts_timeout INT 2 ``` To make it short, the connect_attempts_timeout (at least on parent selection) or not observed under 2 scenarios: - If ICMP comes back with "No Route to Host". This might impact the timeouts. We've seen under ARP failures that the timeout could be from 3 to 6 seconds (single to dual stack). - If a host can't connect (Example: iptables DROP on port 80), the timeouts observed were 60 seconds. Here's the results for ATS7 (7.1.1.10 to be specific) - Expected results within 4 seconds (OK): ``` $ curl -I 'http://cdn1cdedge0002.hidden.cdn1.coxlab.net/test.mpg' -s -o /dev/null -w "Connect: %{time_connect} DNS: %{time_namelookup} TTFB: %{time_starttransfer} Total time: %{time_total}" Connect: 1.005 DNS: 0.004 TTFB: 3.984 Total time: 3.984 [Apr 16 13:46:46.661] Server {0x2b28de030700} DEBUG: (parent_select) In ParentConfigParams::findParent(): parent_table: 0x1ac6050. [Apr 16 13:46:46.661] Server {0x2b28de030700} DEBUG: (parent_select) policy.ParentEnable: 1 [Apr 16 13:46:46.661] Server {0x2b28de030700} DEBUG: (parent_select) Matched with 0x1be6c08 parent node from line 67 [Apr 16 13:46:46.661] Server {0x2b28de030700} DEBUG: (parent_select) Matched with 0x1be6158 parent node from line 29 [Apr 16 13:46:46.662] Server {0x2b28de030700} DEBUG: (parent_select) ParentConsistentHash::selectParent(): Using a consistent hash parent selection strategy. [Apr 16 13:46:46.662] Server {0x2b28de030700} DEBUG: (parent_select) Chosen parent: cdn1cdmid0003.coxlab.net.80 [Apr 16 13:46:46.662] Server {0x2b28de030700} DEBUG: (parent_select) PARENT_SPECIFIED [Apr 16 13:46:46.662] Server {0x2b28de030700} DEBUG: (parent_select) Result for origin.rd.at.cox.net was parent cdn1cdmid0003.coxlab.net:80 [Apr 16 13:46:49.629] Server {0x2b28de030700} DEBUG: (parent_select) Starting ParentConsistentHash::markParentDown() [Apr 16 13:46:49.629] Server {0x2b28de030700} DEBUG: (parent_select) Parent fail count increased to 1 for cdn1cdmid0003.coxlab.net:80 [Apr 16 13:46:49.629] Server {0x2b28de030700} DEBUG: (parent_select) ParentConfigParams::nextParent(): parent_table: 0x1ac6050, result->rec: 0x1be6158 [Apr 16 13:46:49.629] Server {0x2b28de030700} DEBUG: (parent_select) ParentConfigParams::nextParent(): result->r: 2, tablePtr: 0x1ac6050 [Apr 16 13:46:49.629] Server {0x2b28de030700} DEBUG: (parent_select) Calling selectParent() from nextParent [Apr 16 13:46:49.629] Server {0x2b28de030700} DEBUG: (parent_select) ParentConsistentHash::selectParent(): Using a consistent hash parent selection strategy. [Apr 16 13:46:49.629] Server {0x2b28de030700} DEBUG: (parent_select) Chosen parent: cdn1cdmid0004.coxlab.net.80 [Apr 16 13:46:49.629] Server {0x2b28de030700} DEBUG: (parent_select) Retry result for origin.rd.at.cox.net was parent cdn1cdmid0004.coxlab.net:80 ``` Here's the results for ATS8 (8.1.2 to be specific) - Expected results within 4 seconds (NOT OK - took 60 seconds): ``` $ curl -I 'http://cdn1cdedge0004.hidden.cdn1.coxlab.net/test.mpg' -s -o /dev/null -w "Connect: %{time_connect} DNS: %{time_namelookup} TTFB: %{time_starttransfer} Total time: %{time_total}" Connect: 0.005 DNS: 0.004 TTFB: 60.567 Total time: 60.567 Apr 16 13:44:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:44:37.160] [ET_NET 19] DEBUG: (parent_select) In ParentConfigParams::findParent(): parent_table: 0x1934050. Apr 16 13:44:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:44:37.160] [ET_NET 19] DEBUG: (parent_select) policy.ParentEnable: 1 Apr 16 13:44:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:44:37.160] [ET_NET 19] DEBUG: (parent_select) Matched with 0x1a7e140 parent node from line 45 Apr 16 13:44:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:44:37.160] [ET_NET 19] DEBUG: (parent_select) Matched with 0x1a7d498 parent node from line 18 Apr 16 13:44:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:44:37.160] [ET_NET 19] DEBUG: (parent_select) ParentConsistentHash::selectParent(): Using a consistent hash parent selection strategy. Apr 16 13:44:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:44:37.160] [ET_NET 19] DEBUG: (parent_select) Initial parent lookups: 1 Apr 16 13:44:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:44:37.160] [ET_NET 19] DEBUG: (parent_select) Additional parent lookups: 1 Apr 16 13:44:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:44:37.160] [ET_NET 19] DEBUG: (parent_select) Chosen parent: cdn1cdmid0003.coxlab.net.80 Apr 16 13:44:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:44:37.160] [ET_NET 19] DEBUG: (parent_select) PARENT_SPECIFIED Apr 16 13:44:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:44:37.160] [ET_NET 19] DEBUG: (parent_select) Result for origin.rd.at.cox.net was parent cdn1cdmid0003.coxlab.net:80 Apr 16 13:45:28 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:28.719] [ET_NET 0] DEBUG: (parent_select) Parent fail count increased to 1 for cdn1cdmid0003.coxlab.net:80 Apr 16 13:45:28 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:28.719] [ET_NET 0] DEBUG: (parent_select) ParentConfigParams::nextParent(): parent_table: 0x1934050, result->rec: 0x1a7d498 Apr 16 13:45:28 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:28.719] [ET_NET 0] DEBUG: (parent_select) ParentConfigParams::nextParent(): result->r: 2, tablePtr: 0x1934050 Apr 16 13:45:28 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:28.719] [ET_NET 0] DEBUG: (parent_select) Calling selectParent() from nextParent Apr 16 13:45:28 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:28.719] [ET_NET 0] DEBUG: (parent_select) ParentConsistentHash::selectParent(): Using a consistent hash parent selection strategy. Apr 16 13:45:28 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:28.719] [ET_NET 0] DEBUG: (parent_select) Initial parent lookups: 1 Apr 16 13:45:28 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:28.719] [ET_NET 0] DEBUG: (parent_select) Additional parent lookups: 1 Apr 16 13:45:28 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:28.719] [ET_NET 0] DEBUG: (parent_select) Chosen parent: cdn1cdmid0004.coxlab.net.80 Apr 16 13:45:28 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:28.719] [ET_NET 0] DEBUG: (parent_select) Retry result for origin.rd.at.cox.net was parent cdn1cdmid0004.coxlab.net:80 Apr 16 13:45:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:37.714] [ET_NET 19] DEBUG: (parent_select) Parent fail count increased to 2 for cdn1cdmid0003.coxlab.net:80 Apr 16 13:45:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:37.714] [ET_NET 19] DEBUG: (parent_select) ParentConfigParams::nextParent(): parent_table: 0x1934050, result->rec: 0x1a7d498 Apr 16 13:45:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:37.714] [ET_NET 19] DEBUG: (parent_select) ParentConfigParams::nextParent(): result->r: 2, tablePtr: 0x1934050 Apr 16 13:45:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:37.714] [ET_NET 19] DEBUG: (parent_select) Calling selectParent() from nextParent Apr 16 13:45:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:37.714] [ET_NET 19] DEBUG: (parent_select) ParentConsistentHash::selectParent(): Using a consistent hash parent selection strategy. Apr 16 13:45:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:37.714] [ET_NET 19] DEBUG: (parent_select) Initial parent lookups: 1 Apr 16 13:45:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:37.714] [ET_NET 19] DEBUG: (parent_select) Additional parent lookups: 1 Apr 16 13:45:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:37.714] [ET_NET 19] DEBUG: (parent_select) Chosen parent: cdn1cdmid0004.coxlab.net.80 Apr 16 13:45:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:37.714] [ET_NET 19] DEBUG: (parent_select) Retry result for origin.rd.at.cox.net was parent cdn1cdmid0004.coxlab.net:80 ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
