[ https://issues.apache.org/jira/browse/TEZ-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877526#comment-15877526 ]
Rajesh Balamohan edited comment on TEZ-3633 at 2/22/17 5:46 AM: ---------------------------------------------------------------- I enabled keep-alive in tez and mapreduce shuffle handler. (i.e tez.runtime.shuffle.keep-alive.enabled=true and mapreduce.shuffle.connection-keep-alive.enable=true) To verify keep-alive connections are working without connection breaks, I enabled wire logging in HttpUrlConnection as follows Created the following file {{http.logger.properties}} and placed in {{/tmp/}} folder in test cluster {noformat} handlers=java.util.logging.ConsoleHandler java.util.logging.ConsoleHandler.level = ALL java.util.logging.ConsoleHandler.formatter = sun.net.www.protocol.http.HttpLogFormatter sun.net.www.protocol.http.HttpURLConnection.level = ALL sun.net.www.protocol.http.level=ALL {noformat} Tried out query-17 from TPC-DS with hive/tez which is relatively shuffle intensive in hive-1. {noformat} hive --hiveconf tez.task.launch.cmd-opts="-Djava.util.logging.config.file=/tmp/http.logger.properties" {noformat} >From the logs, I see {{KeepAlive stream used}} which indicates that keepAlive >connections being used from sun's internal implementation. Hadoop version used >is 2.7.3 from HDP. {noformat} FINE: sun.net.www.MessageHeader@4f81a00c8 pairs: {GET /mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_00_000000_0_10003&keepAlive=true HTTP/1.1: null}{UrlHash: rjk3o9nG2khcu5d+QGw1Gk1nhBY=}{name: mapreduce}{version: 1.0.0}{User-Agent: Java/1.8.0_121}{Host: ip-10-0-1-66.ec2.internal:13562}{Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2}{Connection: keep-alive} Feb 22, 2017 5:16:37 AM sun.net.www.http.HttpClient logFinest FINEST: KeepAlive stream used: http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_00_000000_0_10003&keepAlive=true Feb 22, 2017 5:16:37 AM sun.net.www.protocol.http.HttpURLConnection getInputStream0 FINE: sun.net.www.MessageHeader@4146578c7 pairs: {null: HTTP/1.1 200 OK}{ReplyHash: wox3/ak3+1T3PwXc8eCk9C4wtDs=}{name: mapreduce}{version: 1.0.0}{Content-Length: 1156}{Connection: Keep-Alive}{Keep-Alive: timeout=5} Feb 22, 2017 5:16:38 AM sun.net.www.protocol.http.HttpURLConnection plainConnect0 ... FINEST: ProxySelector Request for http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_06_000002_0_10006,attempt_1487556355205_0099_1_06_000003_0_10007,attempt_1487556355205_0099_1_06_000004_0_10006,attempt_1487556355205_0099_1_06_000001_0_10005,attempt_1487556355205_0099_1_06_000000_0_10005,attempt_1487556355205_0099_1_06_000007_0_10003,attempt_1487556355205_0099_1_06_000005_0_10002,attempt_1487556355205_0099_1_06_000006_0_10002,attempt_1487556355205_0099_1_06_000027_0_10023,attempt_1487556355205_0099_1_06_000033_0_10023,attempt_1487556355205_0099_1_06_000039_0_10026,attempt_1487556355205_0099_1_06_000044_0_10026,attempt_1487556355205_0099_1_06_000049_0_10026,attempt_1487556355205_0099_1_06_000038_0_10017,attempt_1487556355205_0099_1_06_000046_0_10017,attempt_1487556355205_0099_1_06_000048_0_10026&keepAlive=true ... FINEST: KeepAlive stream retrieved from the cache, sun.net.www.http.HttpClient(http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_00_000000_0_10003&keepAlive=true) Feb 22, 2017 5:16:38 AM sun.net.www.protocol.http.HttpURLConnection plainConnect0 ........ printing more KeepAlive log lines alone from here onwards for reference.. ...... FINEST: KeepAlive stream used: http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_00_000000_0_10003&keepAlive=true FINEST: KeepAlive stream used: http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_06_000003_0_10007,attempt_1487556355205_0099_1_06_000004_0_10006,attempt_1487556355205_0099_1_06_000001_0_10005,attempt_1487556355205_0099_1_06_000000_0_10005&keepAlive=true FINEST: KeepAlive stream used: http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_06_000007_0_10003,attempt_1487556355205_0099_1_06_000005_0_10002&keepAlive=true FINEST: KeepAlive stream used: http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_04_000001_0_10010&keepAlive=true FINEST: KeepAlive stream used: http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_04_000000_0_10010&keepAlive=true FINEST: KeepAlive stream used: http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_06_000002_0_10006&keepAlive=true FINEST: KeepAlive stream used: http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_06_000006_0_10002,attempt_1487556355205_0099_1_06_000027_0_10023,attempt_1487556355205_0099_1_06_000033_0_10023,attempt_1487556355205_0099_1_06_000039_0_10026&keepAlive=true FINEST: KeepAlive stream used: http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_03_000000_0_10003&keepAlive=true FINEST: KeepAlive stream retrieved from the cache, sun.net.www.http.HttpClient(http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_04_000001_0_10010&keepAlive=true) FINEST: KeepAlive stream used: http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_06_000044_0_10026&keepAlive=true FINEST: KeepAlive stream retrieved from the cache, sun.net.www.http.HttpClient(http://ip-10-0-1-228.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_04_000021_0_10007,attempt_1487556355205_0099_1_04_000034_0_10007,attempt_1487556355205_0099_1_04_000037_0_10006,attempt_1487556355205_0099_1_04_000027_0_10006,attempt_1487556355205_0099_1_04_000033_0_10006,attempt_1487556355205_0099_1_04_000036_0_10007,attempt_1487556355205_0099_1_04_000031_0_10006,attempt_1487556355205_0099_1_04_000040_0_10006,attempt_1487556355205_0099_1_04_000030_0_10006&keepAlive=true) FINEST: KeepAlive stream used: http://ip-10-0-1-228.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_06_000034_0_10011&keepAlive=true FINEST: KeepAlive stream used: http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_02_000000_0_10003&keepAlive=true FINEST: KeepAlive stream used: http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_04_000043_0_10013,attempt_1487556355205_0099_1_04_000046_0_10022,attempt_1487556355205_0099_1_04_000049_0_10022,attempt_1487556355205_0099_1_04_000047_0_10013,attempt_1487556355205_0099_1_04_000048_0_10021&keepAlive=true FINEST: KeepAlive stream retrieved from the cache, sun.net.www.http.HttpClient(http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_04_000000_0_10010&keepAlive=true) FINEST: KeepAlive stream used: http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_06_000049_0_10026&keepAlive=true FINEST: KeepAlive stream retrieved from the cache, sun.net.www.http.HttpClient(http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_06_000044_0_10026&keepAlive=true) {noformat} was (Author: rajesh.balamohan): I enabled keep-alive in tez and mapreduce shuffle handler. (i.e tez.runtime.shuffle.keep-alive.enabled=true and mapreduce.shuffle.connection-keep-alive.enable=true) To verify keep-alive connections are working without connection breaks, I enabled wire logging in HttpUrlConnection as follows Created the following file {{http.logger.properties}} and placed in {{/tmp/}} folder in test cluster {noformat} handlers=java.util.logging.ConsoleHandler java.util.logging.ConsoleHandler.level = ALL java.util.logging.ConsoleHandler.formatter = sun.net.www.protocol.http.HttpLogFormatter sun.net.www.protocol.http.HttpURLConnection.level = ALL sun.net.www.protocol.http.level=ALL {noformat} Tried out query-17 from TPC-DS with hive/tez which is relatively shuffle intensive in hive-1. {noformat} hive --hiveconf tez.task.launch.cmd-opts="-Djava.util.logging.config.file=/tmp/http.logger.properties" {noformat} >From the logs, I see logs of {{KeepAlive stream used}} which does indicate >that keepAlive connections being used from sun's internal implementation. >Hadoop version used is 2.7.3 from HDP. {noformat} FINE: sun.net.www.MessageHeader@4f81a00c8 pairs: {GET /mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_00_000000_0_10003&keepAlive=true HTTP/1.1: null}{UrlHash: rjk3o9nG2khcu5d+QGw1Gk1nhBY=}{name: mapreduce}{version: 1.0.0}{User-Agent: Java/1.8.0_121}{Host: ip-10-0-1-66.ec2.internal:13562}{Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2}{Connection: keep-alive} Feb 22, 2017 5:16:37 AM sun.net.www.http.HttpClient logFinest FINEST: KeepAlive stream used: http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_00_000000_0_10003&keepAlive=true Feb 22, 2017 5:16:37 AM sun.net.www.protocol.http.HttpURLConnection getInputStream0 FINE: sun.net.www.MessageHeader@4146578c7 pairs: {null: HTTP/1.1 200 OK}{ReplyHash: wox3/ak3+1T3PwXc8eCk9C4wtDs=}{name: mapreduce}{version: 1.0.0}{Content-Length: 1156}{Connection: Keep-Alive}{Keep-Alive: timeout=5} Feb 22, 2017 5:16:38 AM sun.net.www.protocol.http.HttpURLConnection plainConnect0 ... FINEST: ProxySelector Request for http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_06_000002_0_10006,attempt_1487556355205_0099_1_06_000003_0_10007,attempt_1487556355205_0099_1_06_000004_0_10006,attempt_1487556355205_0099_1_06_000001_0_10005,attempt_1487556355205_0099_1_06_000000_0_10005,attempt_1487556355205_0099_1_06_000007_0_10003,attempt_1487556355205_0099_1_06_000005_0_10002,attempt_1487556355205_0099_1_06_000006_0_10002,attempt_1487556355205_0099_1_06_000027_0_10023,attempt_1487556355205_0099_1_06_000033_0_10023,attempt_1487556355205_0099_1_06_000039_0_10026,attempt_1487556355205_0099_1_06_000044_0_10026,attempt_1487556355205_0099_1_06_000049_0_10026,attempt_1487556355205_0099_1_06_000038_0_10017,attempt_1487556355205_0099_1_06_000046_0_10017,attempt_1487556355205_0099_1_06_000048_0_10026&keepAlive=true ... FINEST: KeepAlive stream retrieved from the cache, sun.net.www.http.HttpClient(http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_00_000000_0_10003&keepAlive=true) Feb 22, 2017 5:16:38 AM sun.net.www.protocol.http.HttpURLConnection plainConnect0 ........ printing more KeepAlive log lines alone from here onwards for reference.. ...... FINEST: KeepAlive stream used: http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_00_000000_0_10003&keepAlive=true FINEST: KeepAlive stream used: http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_06_000003_0_10007,attempt_1487556355205_0099_1_06_000004_0_10006,attempt_1487556355205_0099_1_06_000001_0_10005,attempt_1487556355205_0099_1_06_000000_0_10005&keepAlive=true FINEST: KeepAlive stream used: http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_06_000007_0_10003,attempt_1487556355205_0099_1_06_000005_0_10002&keepAlive=true FINEST: KeepAlive stream used: http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_04_000001_0_10010&keepAlive=true FINEST: KeepAlive stream used: http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_04_000000_0_10010&keepAlive=true FINEST: KeepAlive stream used: http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_06_000002_0_10006&keepAlive=true FINEST: KeepAlive stream used: http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_06_000006_0_10002,attempt_1487556355205_0099_1_06_000027_0_10023,attempt_1487556355205_0099_1_06_000033_0_10023,attempt_1487556355205_0099_1_06_000039_0_10026&keepAlive=true FINEST: KeepAlive stream used: http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_03_000000_0_10003&keepAlive=true FINEST: KeepAlive stream retrieved from the cache, sun.net.www.http.HttpClient(http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_04_000001_0_10010&keepAlive=true) FINEST: KeepAlive stream used: http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_06_000044_0_10026&keepAlive=true FINEST: KeepAlive stream retrieved from the cache, sun.net.www.http.HttpClient(http://ip-10-0-1-228.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_04_000021_0_10007,attempt_1487556355205_0099_1_04_000034_0_10007,attempt_1487556355205_0099_1_04_000037_0_10006,attempt_1487556355205_0099_1_04_000027_0_10006,attempt_1487556355205_0099_1_04_000033_0_10006,attempt_1487556355205_0099_1_04_000036_0_10007,attempt_1487556355205_0099_1_04_000031_0_10006,attempt_1487556355205_0099_1_04_000040_0_10006,attempt_1487556355205_0099_1_04_000030_0_10006&keepAlive=true) FINEST: KeepAlive stream used: http://ip-10-0-1-228.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_06_000034_0_10011&keepAlive=true FINEST: KeepAlive stream used: http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_02_000000_0_10003&keepAlive=true FINEST: KeepAlive stream used: http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_04_000043_0_10013,attempt_1487556355205_0099_1_04_000046_0_10022,attempt_1487556355205_0099_1_04_000049_0_10022,attempt_1487556355205_0099_1_04_000047_0_10013,attempt_1487556355205_0099_1_04_000048_0_10021&keepAlive=true FINEST: KeepAlive stream retrieved from the cache, sun.net.www.http.HttpClient(http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_04_000000_0_10010&keepAlive=true) FINEST: KeepAlive stream used: http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_06_000049_0_10026&keepAlive=true FINEST: KeepAlive stream retrieved from the cache, sun.net.www.http.HttpClient(http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_06_000044_0_10026&keepAlive=true) {noformat} > Implement keep-alive timeout in tez shuffle handler > --------------------------------------------------- > > Key: TEZ-3633 > URL: https://issues.apache.org/jira/browse/TEZ-3633 > Project: Apache Tez > Issue Type: Sub-task > Reporter: Jonathan Eagles > Assignee: Jonathan Eagles > Attachments: TEZ-3633.1.patch > > > MAPREDUCE-5787 which added keep-alive to mapreduce shuffle handler was not > fully functional as despite advertising keep-alive option and adding the > header to the response, all connections were closed immediately after write. > This reduced the performance of certain fetches as now time is spent > requesting a second get to the same serve, only for that server to reset the > connection forcing the client to reestablish the connection on another port. > The details of this is hidden behind HttpURLConnection and doesn't show in > any log file at default logging level. However TCP sniffing does show errant > behavior. -- This message was sent by Atlassian JIRA (v6.3.15#6346)