[ https://issues.apache.org/jira/browse/MAPREDUCE-6684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15276619#comment-15276619 ]
Haibo Chen commented on MAPREDUCE-6684:
---------------------------------------

Thanks a lot for your review, Jason! Patch updated according to your comments.

> High contention on scanning of user directory under immediate_done in Job History Server
> ----------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6684
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6684
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 2.7.0
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>            Priority: Critical
>         Attachments: jhs-jstacks-service-monitor-running.tar.gz,
> jhs-jstacks-service-monitor-stopped.tar.gz, mapreduce6684.001.patch,
> mapreduce6684.002.patch
>
>
> HistoryFileManager.scanIntermediateDirectory() in JHS acquires a lock on each
> user directory it scans (moving or deleting files under that directory as
> necessary). This method is called by a JobHistory thread that periodically
> scans the intermediate directory, and it can also be called by web server
> threads for each Web API call made by a JHS client. When there are many
> concurrent Web API calls/connections to JHS, all but one of these threads are
> blocked on the user directory lock. Eventually the client connections time
> out, but the JHS threads are not killed, leaving many TCP connections in the
> CLOSE_WAIT state.
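To make the contention described above concrete, here is a minimal, hypothetical Java sketch. It is not HistoryFileManager code and not the attached patch; the class `UserDirScanner` and its methods are invented for illustration, and it assumes a single `ReentrantLock` per user directory. The periodic scanner may block on the lock, but a web-API thread waits only up to a deadline via `ReentrantLock.tryLock(long, TimeUnit)` and then gives up, so it cannot queue indefinitely behind another thread's scan.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical sketch, NOT HistoryFileManager code and NOT the attached patch:
 * one lock per user directory, where the periodic scanner may block but a
 * web-API thread waits only up to a deadline before giving up.
 */
final class UserDirScanner {
    private final ReentrantLock scanLock = new ReentrantLock();
    private final Runnable scanAction; // stands in for moving/deleting files
    private int scansPerformed = 0;    // guarded by scanLock

    UserDirScanner(Runnable scanAction) {
        this.scanAction = scanAction;
    }

    /** Periodic background scan: blocking here is acceptable. */
    void scanBlocking() {
        scanLock.lock();
        try {
            doScan();
        } finally {
            scanLock.unlock();
        }
    }

    /**
     * Web API path: returns true if this thread ran the scan, or false if the
     * lock could not be acquired within the timeout (the caller would then
     * answer from whatever state it already has).
     */
    boolean tryScan(long timeout, TimeUnit unit) throws InterruptedException {
        if (!scanLock.tryLock(timeout, unit)) {
            return false;
        }
        try {
            doScan();
            return true;
        } finally {
            scanLock.unlock();
        }
    }

    private void doScan() {
        scanAction.run();
        scansPerformed++;
    }

    int scansPerformed() {
        scanLock.lock();
        try {
            return scansPerformed;
        } finally {
            scanLock.unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch scanStarted = new CountDownLatch(1);
        CountDownLatch finishScan = new CountDownLatch(1);
        // Simulated slow scan: keeps holding the lock until main() releases it.
        UserDirScanner scanner = new UserDirScanner(() -> {
            scanStarted.countDown();
            try {
                finishScan.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread background = new Thread(scanner::scanBlocking);
        background.start();
        scanStarted.await(); // the background thread now holds the scan lock

        // A web thread with a 50 ms budget gives up instead of queueing forever.
        boolean webScanned = scanner.tryScan(50, TimeUnit.MILLISECONDS);

        finishScan.countDown();
        background.join();
        System.out.println("web thread scanned: " + webScanned);
        System.out.println("scans performed: " + scanner.scansPerformed());
    }
}
```

Running `main` prints `web thread scanned: false` followed by `scans performed: 1`; the latches make this deterministic because the background thread holds the lock for the whole 50 ms wait. A bounded wait is only one possible mitigation: it trades blocked threads for requests that may see slightly stale data, and a real fix would still have to decide what a timed-out request returns.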
> {noformat}
> [systest@vb1120 ~]$ sudo netstat -nap | grep 63729 | sort -k 4
> tcp    0 0 10.17.202.19:10020 0.0.0.0:*          LISTEN      63729/java
> tcp    0 0 10.17.202.19:10020 10.17.198.30:33010 ESTABLISHED 63729/java
> tcp    0 0 10.17.202.19:10020 10.17.200.30:33980 ESTABLISHED 63729/java
> tcp    0 0 10.17.202.19:10020 10.17.202.10:59625 ESTABLISHED 63729/java
> tcp    0 0 10.17.202.19:10020 10.17.202.13:35765 ESTABLISHED 63729/java
> tcp    0 0 10.17.202.19:10033 0.0.0.0:*          LISTEN      63729/java
> tcp    0 0 10.17.202.19:19888 0.0.0.0:*          LISTEN      63729/java
> tcp    0 0 10.17.202.19:19888 10.17.198.30:35103 ESTABLISHED 63729/java
> tcp  277 0 10.17.202.19:19888 10.17.198.30:43670 ESTABLISHED 63729/java
> tcp    0 0 10.17.202.19:19888 10.17.198.30:45453 ESTABLISHED 63729/java
> tcp  277 0 10.17.202.19:19888 10.17.198.30:49184 ESTABLISHED 63729/java
> tcp    1 0 10.17.202.19:19888 10.17.202.13:49992 CLOSE_WAIT  63729/java
> tcp  261 0 10.17.202.19:19888 10.17.202.13:52703 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52707 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52708 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52710 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52714 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52723 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52726 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52727 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52739 CLOSE_WAIT  63729/java
> tcp  261 0 10.17.202.19:19888 10.17.202.13:52749 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52753 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52757 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52760 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52820 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52827 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52829 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52831 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52833 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52836 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52839 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52841 CLOSE_WAIT  63729/java
> tcp  261 0 10.17.202.19:19888 10.17.202.13:52843 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52850 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52860 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52876 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52879 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52881 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52884 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52886 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52888 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52891 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52893 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52896 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52898 CLOSE_WAIT  63729/java
> tcp  261 0 10.17.202.19:19888 10.17.202.13:52899 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52902 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52909 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52910 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52912 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52923 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52925 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52927 CLOSE_WAIT  63729/java
> tcp  261 0 10.17.202.19:19888 10.17.202.13:52930 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52937 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52939 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52945 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52947 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52969 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:52972 CLOSE_WAIT  63729/java
> tcp  261 0 10.17.202.19:19888 10.17.202.13:52975 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53004 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53007 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53009 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53011 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53052 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53058 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53059 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53063 CLOSE_WAIT  63729/java
> tcp  261 0 10.17.202.19:19888 10.17.202.13:53071 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53084 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53093 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53095 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53097 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53101 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53104 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53106 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53108 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53110 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53112 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53114 CLOSE_WAIT  63729/java
> tcp  261 0 10.17.202.19:19888 10.17.202.13:53115 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53117 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53121 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53123 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53125 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53127 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53129 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53131 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53134 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53138 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53140 CLOSE_WAIT  63729/java
> tcp  261 0 10.17.202.19:19888 10.17.202.13:53153 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53155 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53157 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53159 CLOSE_WAIT  63729/java
> tcp  261 0 10.17.202.19:19888 10.17.202.13:53173 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53176 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53177 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53178 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53179 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53181 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53183 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53201 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53204 CLOSE_WAIT  63729/java
> tcp  261 0 10.17.202.19:19888 10.17.202.13:53218 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53267 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53270 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53275 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53278 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53280 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53283 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53293 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53296 CLOSE_WAIT  63729/java
> tcp  261 0 10.17.202.19:19888 10.17.202.13:53299 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53309 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53312 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53314 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53317 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53320 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53322 CLOSE_WAIT  63729/java
> tcp  256 0 10.17.202.19:19888 10.17.202.13:53338 CLOSE_WAIT  63729/java
> tcp  261 0 10.17.202.19:19888 10.17.202.13:53340 CLOSE_WAIT  63729/java
> tcp  255 0 10.17.202.19:19888 10.17.202.13:53364 ESTABLISHED 63729/java
> tcp  255 0 10.17.202.19:19888 10.17.202.13:53366 ESTABLISHED 63729/java
> tcp  260 0 10.17.202.19:19888 10.17.202.13:53367 ESTABLISHED 63729/java
> tcp  255 0 10.17.202.19:19888 10.17.202.13:53380 ESTABLISHED 63729/java
> tcp  255 0 10.17.202.19:19888 10.17.202.13:53382 ESTABLISHED 63729/java
> tcp  255 0 10.17.202.19:19888 10.17.202.13:53386 ESTABLISHED 63729/java
> tcp  255 0 10.17.202.19:19888 10.17.202.13:53390 ESTABLISHED 63729/java
> tcp  255 0 10.17.202.19:19888 10.17.202.13:53392 ESTABLISHED 63729/java
> tcp 1278 0 10.17.202.19:19888 10.17.202.18:45301 CLOSE_WAIT  63729/java
> tcp 1278 0 10.17.202.19:19888 10.17.202.18:45303 CLOSE_WAIT  63729/java
> tcp 1277 0 10.17.202.19:19888 10.17.202.18:45306 ESTABLISHED 63729/java
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)