[ https://issues.apache.org/jira/browse/HADOOP-17346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226884#comment-17226884 ]
Ahmed Hussein commented on HADOOP-17346:
----------------------------------------

We have deployed that change on our internal cluster and it is working great.
[~epayne] can you please take a look at the [GitHub Pull Request #2431|https://github.com/apache/hadoop/pull/2431]?

> Fair call queue is defeated by abusive service principals
> ---------------------------------------------------------
>
>                 Key: HADOOP-17346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17346
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: common, ipc
>            Reporter: Ahmed Hussein
>            Assignee: Ahmed Hussein
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> [~daryn] reported that the FCQ prioritizes based on the full Kerberos
> principal (i.e. "user/host@realm") rather than short name (i.e. "user") to
> prevent service principals like the DNs and NMs from being de-prioritized,
> since service principals are expected to be well behaved. Notably the DNs
> contribute a significant but important load, so the intent is not to
> de-prioritize all DNs because their sum total load is high relative to users.
> This has the unfortunate side effect of allowing misbehaving & non-critical
> service principals to abuse the FCQ. The gstorm/* principals are a prime
> example. Each server is spamming opens as fast as possible, which ensures
> that none of the gstorm servers can be de-prioritized because each principal
> is only a fraction of the total load from all principals.
> The secondary and more devastating problem is that other abusive non-service
> principals cannot be effectively de-prioritized. The sum total of all gstorm
> load prevents other principals from surpassing the priority thresholds.
> Principals stay in the highest priority queues, which allows the abusive
> principals to overflow the entire call queue for extended periods of time.
> Notably it prevents the FCQ from moderating the heavy create loads from p_gup
> @ DB which cause significant performance degradation.
> Prioritization should be based on short name with configurable exemptions for
> services like the DN/NM.
> [~daryn] suggested a solution that we applied on our clusters.
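
To illustrate the last point of the description (prioritize by short name, with configurable exemptions for services like the DN/NM), here is a minimal sketch. It is not the patch from the pull request and it does not use Hadoop's classes: the class name, the method names and the exempt set are all hypothetical, and the short-name parsing is a simplification (a real cluster would map principals through its Kerberos name rules). It also reflects only one possible reading of "exemption", namely that exempt services keep being accounted per full principal.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/**
 * Sketch only: derives the identity a call-queue scheduler could account
 * load against. All names here are hypothetical, not Hadoop APIs.
 */
final class CallerIdentitySketch {

    /** Short names that keep per-principal accounting (assumed to come from config). */
    private final Set<String> exemptShortNames;

    CallerIdentitySketch(Set<String> exemptShortNames) {
        this.exemptShortNames = new HashSet<>(exemptShortNames);
    }

    /** Returns the text before the first '/' or '@', e.g. "user" for "user/host@REALM". */
    static String shortName(String principal) {
        int end = principal.length();
        int slash = principal.indexOf('/');
        int at = principal.indexOf('@');
        if (slash >= 0) {
            end = Math.min(end, slash);
        }
        if (at >= 0) {
            end = Math.min(end, at);
        }
        return principal.substring(0, end);
    }

    /**
     * Exempt services are accounted per full principal, so no single host is
     * penalized for the combined load of its peers. Everyone else is
     * accounted by short name, so load spread over many hosts adds up.
     */
    String schedulingIdentity(String principal) {
        String name = shortName(principal);
        return exemptShortNames.contains(name) ? principal : name;
    }

    public static void main(String[] args) {
        CallerIdentitySketch ids =
            new CallerIdentitySketch(new HashSet<>(Arrays.asList("dn", "nm")));
        // Both gstorm hosts collapse into one identity: prints "gstorm" twice.
        System.out.println(ids.schedulingIdentity("gstorm/host1@EXAMPLE.COM"));
        System.out.println(ids.schedulingIdentity("gstorm/host2@EXAMPLE.COM"));
        // An exempt service keeps its full principal: prints "dn/host7@EXAMPLE.COM".
        System.out.println(ids.schedulingIdentity("dn/host7@EXAMPLE.COM"));
    }
}
```

With accounting keyed this way, the combined load of all gstorm/* hosts counts against a single "gstorm" identity and can cross a priority threshold, while each exempt DN or NM is still measured on its own.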