[ https://issues.apache.org/jira/browse/HDFS-9412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000242#comment-15000242 ]
He Tianyi commented on HDFS-9412:
---------------------------------

Sorry for the misuse of the word "starvation". It's not exactly starvation; the server just hangs for a while.

> getBlocks occupies FSLock and takes too long to complete
> --------------------------------------------------------
>
>                 Key: HDFS-9412
>                 URL: https://issues.apache.org/jira/browse/HDFS-9412
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: He Tianyi
>            Assignee: He Tianyi
>
> {{getBlocks}} in {{NameNodeRpcServer}} acquires a read lock and may then take a
> long time to complete (probably several seconds, if there are too many
> blocks).
> During this period, other threads attempting to acquire the write lock will wait.
> In an extreme case, the RPC handlers are all occupied by one reader thread calling
> {{getBlocks}} and by other threads waiting for the write lock, so the RPC server
> appears hung. Unfortunately, this tends to happen in heavily loaded clusters, since
> read operations come and go quickly (they do not need to wait), leaving write
> operations waiting.
> It looks like we can optimize this the way the DN block report was optimized in
> the past, by splitting the operation into smaller sub-operations and letting other
> threads do their work between sub-operations. The whole result is still returned
> at once, though (one difference from the DN block report).
> I am not sure whether this will work. Any better ideas?
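
The splitting idea in the description could be sketched roughly as below. This is only an illustration and not the actual NameNode code: the class ChunkedGetBlocks, the in-memory blockIds list, the chunkSize parameter and the addBlock method are all hypothetical stand-ins, and a plain java.util.concurrent ReentrantReadWriteLock is assumed in place of the real FSLock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the namesystem; not the real HDFS classes.
public class ChunkedGetBlocks {
    // Assumption: a fair read/write lock plays the role of FSLock, so a
    // waiting writer gets its turn once the current readers release.
    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);
    private final List<Long> blockIds;   // hypothetical block list
    private final int chunkSize;         // blocks copied per lock hold

    public ChunkedGetBlocks(List<Long> initialBlockIds, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.blockIds = new ArrayList<>(initialBlockIds);
        this.chunkSize = chunkSize;
    }

    // Collects all blocks, but holds the read lock for one chunk at a time.
    public List<Long> getBlocks() {
        List<Long> result = new ArrayList<>();
        int next = 0;
        boolean done = false;
        while (!done) {
            fsLock.readLock().lock();
            try {
                int size = blockIds.size();
                int end = Math.min(next + chunkSize, size);
                if (next < end) {
                    result.addAll(blockIds.subList(next, end));
                    next = end;
                }
                // Also terminates if the list shrank between chunks.
                done = next >= size;
            } finally {
                // Released between chunks so waiting writers can proceed.
                fsLock.readLock().unlock();
            }
        }
        return result; // the whole result is still returned at once
    }

    // Example write operation that would otherwise wait for the full scan.
    public void addBlock(long blockId) {
        fsLock.writeLock().lock();
        try {
            blockIds.add(blockId);
        } finally {
            fsLock.writeLock().unlock();
        }
    }
}
```

One consequence of this design is that writers may change the block list between chunks, so the returned list is no longer a snapshot taken under a single lock hold. Whether that is acceptable for getBlocks callers is a separate question that this sketch does not answer.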