[ https://issues.apache.org/jira/browse/HDFS-13828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591139#comment-16591139 ]
Amithsha commented on HDFS-13828:
---------------------------------

We found that a Hive job was referring to one particular block, which caused the spike in the Xceiver count.

> DataNode breaching Xceiver Count
> --------------------------------
>
>                 Key: HDFS-13828
>                 URL: https://issues.apache.org/jira/browse/HDFS-13828
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.7.1
>            Reporter: Amithsha
>            Priority: Critical
>
> We observed the xceiver count limit of 4096 being breached on a particular
> set of 5 - 8 nodes in a 900-node cluster.
> We stopped the datanode services on those nodes and let their data
> re-replicate across the cluster. Even after that, we observed the same issue
> on a new set of nodes.
> Q1: Why does this happen only on a particular set of nodes? After
> decommissioning those nodes, the data should have been replicated across the
> cluster, so why does it happen again on a different set of nodes?
> Assumptions:
> Reading a particular block or piece of data on that node might be the cause,
> but that should have been mitigated by the decommission, so why was it not?
> Since those MR jobs are triggered from Hive, we suspect that the query might
> be referring to the same block multiple times in different stages and
> creating this issue.
> From Thread Dump:
> The thread dump of the datanode shows that, out of the 4090+ xceiver threads
> created on that node, nearly 4000 belonged to the same AppId, from multiple
> mappers, in the state "no operation".
>
> Any suggestions on this?
>
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
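
The thread-dump check described in the report (counting xceiver threads per AppId) could be sketched roughly as below. This is only an illustration, not the procedure the reporter used. It assumes the dump is plain jstack output, that xceiver thread names start with "DataXceiver for client <client name>", and that MR task clients embed an "attempt_<clusterTs>_<jobSeq>_m_..." id in the client name; the function name and the "other" bucket are hypothetical.

```python
import re
from collections import Counter

# Assumed thread-name shape (an assumption, not taken from the report):
#   "DataXceiver for client DFSClient_attempt_<clusterTs>_<jobSeq>_m_... at /ip:port [status]"
XCEIVER = re.compile(r'^"DataXceiver for client (?P<client>\S+)')
ATTEMPT = re.compile(r"attempt_(\d+_\d+)_[mr]_")


def count_xceivers_by_app(dump_text):
    """Count DataXceiver threads in a jstack dump, grouped by application id."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = XCEIVER.match(line)
        if not match:
            continue  # not an xceiver thread header line
        job = ATTEMPT.search(match.group("client"))
        # Threads whose client name carries no task attempt id go to "other".
        key = "application_" + job.group(1) if job else "other"
        counts[key] += 1
    return counts
```

Calling `count_xceivers_by_app(text).most_common(5)` on the contents of a saved dump would list the applications holding the most xceiver threads, which is the kind of skew the report describes.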