[ https://issues.apache.org/jira/browse/HADOOP-19218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867232#comment-17867232 ]
Viraj Jasani edited comment on HADOOP-19218 at 7/19/24 8:10 AM:
----------------------------------------------------------------

Yeah, I am also a bit confused. I can create an addendum PR based on your decision [~hexiaoqiao] [~ayushtkn]
{quote}Is this issue only in 3.4.0 and trunk?
{quote}
That is correct, because the test and the improvement to log the longest lock holder (HDFS-15217) are available only since 3.4.0, whereas HADOOP-18628 has been present since 3.3.6/3.4.0.


was (Author: vjasani):
Yeah, I am also a bit confused. I can create an addendum PR based on your decision [~hexiaoqiao] [~ayushtkn]
{quote}Is this issue only in 3.4.0 and trunk?
{quote}
That is correct, because the test and the improvement to log the longest lock holder (HDFS-15217) are available only since 3.4.0.

> Avoid DNS lookup while creating IPC Connection object
> -----------------------------------------------------
>
>                 Key: HADOOP-19218
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19218
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ipc
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.3.9, 3.5.0, 3.4.1
>
>
> We have been running HADOOP-18628 in production for quite some time, and
> everything works fine as long as the DNS servers in HA are available.
> Upgrading a single NS server at a time is also a common case and is not
> problematic. Every DNS lookup generally takes about 1ms.
> However, we recently encountered a case where 2 out of 4 NS servers went down
> (temporarily, but it's a rare case). With a short-lived DNS cache and a 2s
> NS fallback timeout configured in resolv.conf, any client performing a DNS
> lookup can now encounter a 4s+ delay. This caused a namenode outage, as the
> listener thread is single threaded and was not able to keep up with the large
> number of unique clients (in direct proportion to the number of DNS resolutions
> every few seconds) initiating connections on the listener port.
> While having 2 out of 4 DNS servers offline is a rare case and the NS fallback
> settings could also be improved, it is important to note that we don't need
> to perform DNS resolution for every new connection if the intention is to
> improve the insights into VersionMismatch errors thrown by the server.
> The proposal is to delay the DNS resolution until the server throws the
> error for an incompatible header or version mismatch. This would also save the
> ~1ms of extra time spent even on a healthy DNS lookup.
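
The proposal in the description can be sketched roughly as follows. This is only an illustration of deferring the lookup to the error path, not the actual Hadoop patch: `LazyPeer`, its injected `resolver`, and `describeVersionMismatch` are hypothetical names, and a real server would probably pass something like `InetAddress::getCanonicalHostName` as the resolver.

```java
import java.net.InetAddress;
import java.util.function.Function;

/**
 * Hypothetical stand-in for per-connection state on an IPC server.
 * The host name is resolved lazily, so the accept path never touches DNS.
 */
final class LazyPeer {
    private final InetAddress address;
    // Assumption: the reverse lookup is injected so it can be swapped or counted.
    private final Function<InetAddress, String> resolver;
    private String hostName; // null until the first lookup

    LazyPeer(InetAddress address, Function<InetAddress, String> resolver) {
        this.address = address;
        this.resolver = resolver;
    }

    /** Cheap: formats the already-known IP address, no DNS involved. */
    String getHostAddress() {
        return address.getHostAddress();
    }

    /** Runs the (possibly slow) lookup on first use only, then caches it. */
    synchronized String getHostName() {
        if (hostName == null) {
            hostName = resolver.apply(address);
        }
        return hostName;
    }

    /** Error path: the only place in this sketch that needs the host name. */
    String describeVersionMismatch(int clientVersion, int serverVersion) {
        return "Version mismatch from " + getHostName()
            + " (" + getHostAddress() + "): client=" + clientVersion
            + ", server=" + serverVersion;
    }
}
```

With this shape, constructing a `LazyPeer` for every accepted connection costs no lookup at all; only a connection that actually fails the header or version check pays for the resolution, and it pays at most once.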