[jira] [Assigned] (HADOOP-12382) Add Documentation for FairCallQueue
[ https://issues.apache.org/jira/browse/HADOOP-12382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Li reassigned HADOOP-12382:
---------------------------------

    Assignee: Chris Li  (was: Ajith S)

> Add Documentation for FairCallQueue
> -----------------------------------
>
>                 Key: HADOOP-12382
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12382
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Ajith S
>            Assignee: Chris Li
>
> Add supporting documentation explaining the FairCallQueue, and mention all
> the properties introduced by its subtasks accordingly.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HADOOP-12382) Add Documentation for FairCallQueue
[ https://issues.apache.org/jira/browse/HADOOP-12382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735624#comment-14735624 ]

Chris Li commented on HADOOP-12382:
-----------------------------------

I can do this sometime later this week. Over time I've assembled various slides and materials that I can draw on.
[jira] [Commented] (HADOOP-9640) RPC Congestion Control with FairCallQueue
[ https://issues.apache.org/jira/browse/HADOOP-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731185#comment-14731185 ]

Chris Li commented on HADOOP-9640:
----------------------------------

[~ajithshetty] Sure, what did you have in mind?

> RPC Congestion Control with FairCallQueue
> -----------------------------------------
>
>                 Key: HADOOP-9640
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9640
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 2.2.0, 3.0.0
>            Reporter: Xiaobo Peng
>            Assignee: Chris Li
>              Labels: hdfs, qos, rpc
>         Attachments: FairCallQueue-PerformanceOnCluster.pdf, MinorityMajorityPerformance.pdf, NN-denial-of-service-updated-plan.pdf, faircallqueue.patch, faircallqueue2.patch, faircallqueue3.patch, faircallqueue4.patch, faircallqueue5.patch, faircallqueue6.patch, faircallqueue7_with_runtime_swapping.patch, rpc-congestion-control-draft-plan.pdf
>
> For an easy-to-read summary see:
> http://www.ebaytechblog.com/2014/08/21/quality-of-service-in-hadoop/
>
> Several production Hadoop cluster incidents have occurred in which the Namenode was overloaded and failed to respond.
>
> We can improve quality of service for users during namenode peak loads by replacing the FIFO call queue with a [Fair Call Queue|https://issues.apache.org/jira/secure/attachment/12616864/NN-denial-of-service-updated-plan.pdf]. (This plan supersedes rpc-congestion-control-draft-plan.)
>
> Excerpted from the communication of one incident: “The map task of a user was creating a huge number of small files in the user directory. Due to the heavy load on the NN, the JT was also unable to communicate with the NN... The cluster became responsive only once the job was killed.”
>
> Excerpted from the communication of another incident: “Namenode was overloaded by GetBlockLocation requests (Correction: should be getFileInfo requests; the job had a bug that called getFileInfo for a nonexistent file in an endless loop). All other requests to the namenode were also affected by this, and hence all jobs slowed down. The cluster almost came to a grinding halt… Eventually killed the jobtracker to kill all running jobs.”
>
> Excerpted from HDFS-945: “We've seen defective applications cause havoc on the NameNode, e.g. by doing 100k+ 'listStatus' on very large directories (60k files) etc.”
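The fairness idea behind the plan above is that incoming RPC calls are partitioned into priority sub-queues based on how heavily each caller has recently used the service, so one runaway job cannot starve everyone else. A deliberately simplified, single-threaded sketch of that idea (not Hadoop's actual FairCallQueue implementation; class and method names are hypothetical, and the real scheduler uses a decaying usage estimate and weighted multiplexing):

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Toy sketch of fair call queueing: callers who have issued many recent
// requests are scheduled into lower-priority sub-queues, so light users
// are not starved by a single heavy user. Names are illustrative only.
public class FairCallQueueSketch {
    private final Queue<String>[] queues;          // index 0 = highest priority
    private final Map<String, Integer> callCounts = new HashMap<>();

    @SuppressWarnings("unchecked")
    public FairCallQueueSketch(int levels) {
        queues = new Queue[levels];
        for (int i = 0; i < levels; i++) {
            queues[i] = new ArrayDeque<>();
        }
    }

    // Crude "scheduler": priority degrades as a caller's call count grows.
    public int priorityOf(String caller) {
        int count = callCounts.merge(caller, 1, Integer::sum);
        return Math.min(queues.length - 1, count / 10);
    }

    public void put(String caller, String call) {
        queues[priorityOf(caller)].add(call);
    }

    // Serve from the highest-priority non-empty sub-queue.
    public String take() {
        for (Queue<String> q : queues) {
            if (!q.isEmpty()) {
                return q.poll();
            }
        }
        return null;
    }
}
```

With this toy scheduler, a caller's eleventh call already lands in a lower-priority sub-queue than a new caller's first call, which is the starvation-avoidance property the incidents above motivate.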
[jira] [Commented] (HADOOP-12189) CallQueueManager may drop elements from the queue sometimes when calling swapQueue
[ https://issues.apache.org/jira/browse/HADOOP-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621482#comment-14621482 ]

Chris Li commented on HADOOP-12189:
-----------------------------------

Looks good to me. +1

> CallQueueManager may drop elements from the queue sometimes when calling swapQueue
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-12189
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12189
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc, test
>    Affects Versions: 2.7.1
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: HADOOP-12189.000.patch, HADOOP-12189.001.patch, HADOOP-12189.none_guarantee.000.patch, HADOOP-12189.none_guarantee.001.patch, HADOOP-12189.none_guarantee.002.patch
>
> CallQueueManager may sometimes drop elements from the queue when {{swapQueue}} is called.
> The following test failure from TestCallQueueManager shows that some elements in the queue were dropped:
> https://builds.apache.org/job/PreCommit-HADOOP-Build/7150/testReport/org.apache.hadoop.ipc/TestCallQueueManager/testSwapUnderContention/
> {code}
> java.lang.AssertionError: expected:<27241> but was:<27245>
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:743)
> 	at org.junit.Assert.assertEquals(Assert.java:118)
> 	at org.junit.Assert.assertEquals(Assert.java:555)
> 	at org.junit.Assert.assertEquals(Assert.java:542)
> 	at org.apache.hadoop.ipc.TestCallQueueManager.testSwapUnderContention(TestCallQueueManager.java:220)
> {code}
> It looks like the elements are dropped by {{CallQueueManager#swapQueue}}. Looking at its implementation, there is a window in which elements can be dropped: if the queue is full, the thread calling {{CallQueueManager#put}} blocks for a long time. It may put its element into the old queue after the queue in {{takeRef}} has been changed by swapQueue, and that element in the old queue will then be dropped.
[jira] [Commented] (HADOOP-12189) CallQueueManager may drop elements from the queue sometimes when calling swapQueue
[ https://issues.apache.org/jira/browse/HADOOP-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621175#comment-14621175 ]

Chris Li commented on HADOOP-12189:
-----------------------------------

Great, a couple more minor things: can we update the comment for queueIsReallyEmpty, and also describe how the constants CHECKPOINT_INTERVAL_MS and CHECKPOINT_NUM come into play?

What do you think about expressing the code as a for loop? I think it makes the code more readable:

{code}
private boolean queueIsReallyEmpty(BlockingQueue<?> q) {
  for (int i = 0; i < CHECKPOINT_NUM; i++) {
    if (!q.isEmpty()) {
      return false;
    }
    try {
      Thread.sleep(CHECKPOINT_INTERVAL_MS);
    } catch (InterruptedException ie) {
      return false;
    }
  }
  return true;
}
{code}
[jira] [Commented] (HADOOP-12189) CallQueueManager may drop elements from the queue sometimes when calling swapQueue
[ https://issues.apache.org/jira/browse/HADOOP-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619255#comment-14619255 ]

Chris Li commented on HADOOP-12189:
-----------------------------------

[~arpitagarwal] I think [~zxu] encountered unit test failures, which brought his attention here. If increasing the number of checkpoints works, then we should do that. I don't think we should introduce a new config parameter, though: this is something nobody will ever modify, and it controls such a low-level detail. I'd suggest experimenting with what passes and then increasing the tolerance by an order of magnitude to be safe. So if you can get it to pass with 10 checks at a 2ms pause, then we can do 20-100 checks at a 2ms pause (as long as the total wait time is under 1 second). This is mostly for developers' sake since, as Arpit mentioned, dropping is a rarity even during queue swaps in real life.
[jira] [Commented] (HADOOP-12189) CallQueueManager may drop elements from the queue sometimes when calling swapQueue
[ https://issues.apache.org/jira/browse/HADOOP-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617345#comment-14617345 ]

Chris Li commented on HADOOP-12189:
-----------------------------------

That's a significant performance hit! I think we should aim for a lock-free solution in the <1% penalty range, and hopefully nothing too complex either, for code maintainability.
[jira] [Commented] (HADOOP-12189) CallQueueManager may drop elements from the queue sometimes when calling swapQueue
[ https://issues.apache.org/jira/browse/HADOOP-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616102#comment-14616102 ]

Chris Li commented on HADOOP-12189:
-----------------------------------

Wait, did the throughput drop from 437k/300ms to 260k/300ms? That's a pretty big drop. Even if it's not the bottleneck in the server, it's still extra overhead. Let's wait for Daryn's thoughts, because he guided the original design.
[jira] [Commented] (HADOOP-12189) CallQueueManager may drop elements from the queue sometimes when calling swapQueue
[ https://issues.apache.org/jira/browse/HADOOP-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615900#comment-14615900 ]

Chris Li commented on HADOOP-12189:
-----------------------------------

Hi [~zxu], I see: it's a synchronized map, so its methods acquire the queue's intrinsic lock. Thanks for clarifying.

I have concerns about performance. Have you measured the impact? Last I checked, acquiring a lock on every call resulted in a measurable performance penalty. It's not just the time spent under contention; synchronized also introduces a memory barrier, and having 120 threads contending for yet another lock while processing 60k requests per second sounds questionable.

I think it all comes down to performance. The unsafe "sleep 10ms and try again" scheme was born from that (since queue swapping is a relatively infrequent event compared to {{put()}}). If the performance situation has changed since we last discussed this, then we can come up with better solutions that guarantee consistency.

So the options are:
1. Start locking and provide strict consistency (and then we don't need to Thread.sleep)
2. Increase the sleep timeout or the number of checks to make call loss practically impossible
3. Relax the tests and accept the loss of calls during queue swaps

[~daryn] might have some thoughts.
[jira] [Commented] (HADOOP-12189) CallQueueManager may drop elements from the queue sometimes when calling swapQueue
[ https://issues.apache.org/jira/browse/HADOOP-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1461#comment-1461 ]

Chris Li commented on HADOOP-12189:
-----------------------------------

[~zxu] One of the design decisions of queue swapping is that it must not require locking, for performance reasons. A tradeoff was made in which calls could be dropped on rare occasions, but this should be super rare, a theoretical concern more than a practical one, so if you're seeing dropped calls we should fix that.

However, last we discussed, this code should be lock-free, so we can't depend on using synchronized() blocks. Also, it seems this code doesn't provide a 100% guarantee either, since the queue can still be swapped in between bq = putRef.get() and num.incrementAndGet().

See https://issues.apache.org/jira/browse/HADOOP-10278
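The race described in this thread can be reproduced deterministically in a toy model of the swap: a producer captures the old queue reference, the swap happens in between, and the element then lands in a queue no consumer will ever read. A simplified sketch (the class is hypothetical; CallQueueManager's real code involves blocking puts and the queueIsReallyEmpty drain check):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

// Minimal model of CallQueueManager's lock-free queue swap. A put() that
// reads putRef before the swap but inserts after it strands its element
// in the retired queue, which is exactly how calls can get dropped.
public class SwapRaceSketch {
    final AtomicReference<BlockingQueue<Integer>> putRef;
    final AtomicReference<BlockingQueue<Integer>> takeRef;

    public SwapRaceSketch() {
        BlockingQueue<Integer> q = new LinkedBlockingQueue<>();
        putRef = new AtomicReference<>(q);
        takeRef = new AtomicReference<>(q);
    }

    public void swapQueue() {
        BlockingQueue<Integer> newQ = new LinkedBlockingQueue<>();
        putRef.set(newQ);   // new puts go to the new queue
        takeRef.set(newQ);  // consumers now read only the new queue
    }

    // Simulates the unlucky interleaving in a single thread.
    public boolean elementSurvivesRacySwap() {
        BlockingQueue<Integer> seen = putRef.get(); // producer reads the ref...
        swapQueue();                                // ...the swap happens here...
        seen.add(42);                               // ...element goes to the old queue
        return takeRef.get().poll() != null;        // consumer never sees it
    }
}
```

This is why the fix discussed here either has to drain the old queue until it is "really" empty (the checkpoint/sleep scheme) or take a lock to make the ref read and the insert atomic.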
[jira] [Updated] (HADOOP-11238) Update the NameNode's Group Cache in the background when possible
[ https://issues.apache.org/jira/browse/HADOOP-11238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Li updated HADOOP-11238:
------------------------------

    Attachment: HADOOP-11238.003.patch

Re-attaching patch to re-trigger build (no changes were made)

> Update the NameNode's Group Cache in the background when possible
> -----------------------------------------------------------------
>
>                 Key: HADOOP-11238
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11238
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.5.1
>            Reporter: Chris Li
>            Assignee: Chris Li
>            Priority: Minor
>         Attachments: HADOOP-11238.003.patch, HADOOP-11238.003.patch, HADOOP-11238.patch, HADOOP-11238.patch
>
> This patch addresses an issue where the namenode pauses during group resolution, by allowing only a single group resolution query on expiry. There are two scenarios:
> 1. When there is not yet a value in the cache, all threads that make a request block while a single thread fetches the value.
> 2. When there is already a value in the cache and it has expired, the new value is fetched in the background while the old value is used by other threads.
> This is handled by guava's cache.
> Negative caching is a feature built into the groups cache, and since guava's caches don't support different expiration times, we have a separate negative cache which masks the guava cache: if an element exists in the negative cache and isn't expired, we return it.
> In total, the logic for fetching a group is:
> 1. If the username exists in the static cache, return the value (this was already present)
> 2. If the username exists in the negative cache and the negative cache entry is not expired, raise an exception as usual
> 3. Otherwise, defer to the guava cache (see the two scenarios above)
>
> Original Issue Below:
>
> Our namenode pauses for 12-60 seconds several times every hour. During these pauses, no new requests can come in.
> Around the time of the pauses, we have log messages such as:
> 2014-10-22 13:24:22,688 WARN org.apache.hadoop.security.Groups: Potential performance problem: getGroups(user=x) took 34507 milliseconds.
> The current theory is:
> 1. Groups has a cache that is refreshed periodically. Each entry has a cache expiry.
> 2. When a cache entry expires, multiple threads can see this expiration, and then we have a thundering-herd effect where all these threads hit the wire and overwhelm our LDAP servers (we are using ShellBasedUnixGroupsMapping with sssd; how this happens has yet to be established)
> 3. Group resolution queries begin to take longer; I've observed them taking 1.2 seconds instead of the usual 0.01-0.03 seconds when measuring in the shell with `time groups myself`
> 4. If there is mutual exclusion somewhere along this path, a 1-second pause could lead to a 60-second pause as all the threads compete for the resource. The exact cause hasn't been established.
> Potential solutions include:
> 1. Increasing the group cache time, which will make the issue less frequent
> 2. Rolling evictions of the cache, so we prevent the large spike in LDAP queries
> 3. Gating the cache refresh so that only one thread is responsible for refreshing the cache
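Scenario 1 above (all threads block while a single thread fetches the missing value) is what Guava's loading cache provides. A minimal stdlib sketch of that single-fetch idea, with hypothetical class and field names, could look like this; note it deliberately omits scenario 2 (background refresh of an expired entry, which Guava's refreshAfterWrite handles) and expiry entirely:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Sketch of "only one thread resolves groups for a given user":
// computeIfAbsent runs the loader at most once per key, while other
// callers for the same key block until the value is available.
public class SingleFlightGroupCache {
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    public final AtomicInteger loads = new AtomicInteger(); // exposed for the demo
    private final Function<String, String> resolver;

    public SingleFlightGroupCache(Function<String, String> resolver) {
        this.resolver = resolver;
    }

    public String getGroups(String user) {
        return cache.computeIfAbsent(user, u -> {
            loads.incrementAndGet(); // stands in for the expensive LDAP/shell lookup
            return resolver.apply(u);
        });
    }
}
```

Gating the fetch this way is what prevents the thundering herd described below: a second concurrent caller for the same user waits for the in-flight lookup instead of issuing another one.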
[jira] [Updated] (HADOOP-11238) Update the NameNode's Group Cache in the background when possible
[ https://issues.apache.org/jira/browse/HADOOP-11238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Li updated HADOOP-11238:
------------------------------

    Attachment: HADOOP-11238.003.patch
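The three-step group-lookup order described for HADOOP-11238 (static cache, then negative cache with its own expiry, then the main Guava-backed cache) might be sketched as follows. This is an illustration only: names, the plain maps, and the expiry representation are hypothetical stand-ins for the patch's actual structures.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the group-lookup order: static entries win, an unexpired
// negative-cache entry short-circuits with a failure, and everything
// else falls through to the main cache (a Guava cache in the patch).
public class GroupLookupOrderSketch {
    public final Map<String, String> staticCache = new HashMap<>();
    public final Map<String, Long> negativeCache = new HashMap<>(); // user -> expiry millis
    public final Map<String, String> mainCache = new HashMap<>();

    public String getGroups(String user, long nowMillis) {
        String fixed = staticCache.get(user);
        if (fixed != null) {
            return fixed;                                   // 1. static cache
        }
        Long negExpiry = negativeCache.get(user);
        if (negExpiry != null && negExpiry > nowMillis) {   // 2. negative cache
            throw new IllegalStateException("no groups for " + user);
        }
        return mainCache.get(user);                         // 3. main cache
    }
}
```

Keeping the negative cache separate is the workaround for Guava caches not supporting per-entry (or per-outcome) expiration times: failures can expire on a shorter schedule than successful lookups.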
[jira] [Commented] (HADOOP-11238) Update the NameNode's Group Cache in the background when possible
[ https://issues.apache.org/jira/browse/HADOOP-11238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240225#comment-14240225 ]

Chris Li commented on HADOOP-11238:
-----------------------------------

[~benoyantony] Good catch, I fixed that in this next patch.

[~cmccabe] Updated the comment in this next patch; the issue rename looks good too. Also, thanks for letting me know about the patch numbering. I'll do that from now on!
[jira] [Updated] (HADOOP-11238) Group Cache should not cause namenode pause
[ https://issues.apache.org/jira/browse/HADOOP-11238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Li updated HADOOP-11238:
------------------------------
    Attachment: HADOOP-11238.patch

> Group Cache should not cause namenode pause
> -------------------------------------------
>
>                 Key: HADOOP-11238
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11238
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.5.1
>            Reporter: Chris Li
>            Assignee: Chris Li
>            Priority: Minor
>         Attachments: HADOOP-11238.patch, HADOOP-11238.patch
>
> This patch addresses an issue where the namenode pauses during group
> resolution by only allowing a single group resolution query on expiry. There
> are two scenarios:
> 1. When there is not yet a value in the cache, all threads which make a
> request will block while a single thread fetches the value.
> 2. When there is already a value in the cache and it is expired, the new
> value will be fetched in the background while the old value is used by other
> threads.
> This is handled by Guava's cache.
> Negative caching is a feature built into the groups cache, and since Guava's
> caches don't support different expiration times, we have a separate negative
> cache which masks the Guava cache: if an element exists in the negative cache
> and isn't expired, we return it.
> In total, the logic for fetching a group is:
> 1. If the username exists in the static cache, return the value (this was
> already present).
> 2. If the username exists in the negative cache and the negative cache is not
> expired, raise an exception as usual.
> 3. Otherwise, defer to the Guava cache (see the two scenarios above).
> Original Issue Below:
>
> Our namenode pauses for 12-60 seconds several times every hour. During these
> pauses, no new requests can come in.
> Around the time of pauses, we have log messages such as:
> 2014-10-22 13:24:22,688 WARN org.apache.hadoop.security.Groups: Potential
> performance problem: getGroups(user=x) took 34507 milliseconds.
> The current theory is:
> 1. Groups has a cache that is refreshed periodically. Each entry has a cache
> expiry.
> 2. When a cache entry expires, multiple threads can see this expiration, and
> then we have a thundering-herd effect where all these threads hit the wire
> and overwhelm our LDAP servers (we are using ShellBasedUnixGroupsMapping with
> sssd; how this happens has yet to be established).
> 3. Group resolution queries begin to take longer; I've observed them taking
> 1.2 seconds instead of the usual 0.01-0.03 seconds when measuring in the
> shell with `time groups myself`.
> 4. If there is mutual exclusion somewhere along this path, a 1-second pause
> could lead to a 60-second pause as all the threads compete for the resource.
> The exact cause hasn't been established.
> Potential solutions include:
> 1. Increasing the group cache time, which will make the issue less frequent.
> 2. Rolling evictions of the cache so we prevent the large spike in LDAP
> queries.
> 3. Gating the cache refresh so that only one thread is responsible for
> refreshing the cache.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
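The single-flight behavior in the patch summary above can be sketched with plain JDK primitives. This is a hypothetical simplification: the actual patch relies on Guava's LoadingCache rather than a hand-rolled class, and the class, method, and parameter names here are illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Function;

// Sketch of the two scenarios described above: one thread loads a missing
// value while others block, and one thread refreshes an expired value while
// others keep using the stale one.
class SingleFlightGroupCache {
    static final class Entry {
        final List<String> groups;
        final long loadedAtMillis;
        final AtomicBoolean refreshing = new AtomicBoolean(false);
        Entry(List<String> groups, long loadedAtMillis) {
            this.groups = groups;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final long expiryMillis;
    private final Function<String, List<String>> loader;  // e.g. the LDAP/shell lookup

    SingleFlightGroupCache(long expiryMillis, Function<String, List<String>> loader) {
        this.expiryMillis = expiryMillis;
        this.loader = loader;
    }

    List<String> getGroups(String user, long nowMillis) {
        // Scenario 1: no value yet -- computeIfAbsent blocks concurrent callers
        // for the same key while a single thread fetches the value.
        Entry e = cache.computeIfAbsent(user, u -> new Entry(loader.apply(u), nowMillis));
        if (nowMillis - e.loadedAtMillis >= expiryMillis
                && e.refreshing.compareAndSet(false, true)) {
            // Scenario 2: expired -- exactly one caller wins the CAS and reloads
            // (synchronously here; Guava's refreshAfterWrite does it in the
            // background), while everyone else keeps the old value.
            cache.put(user, new Entry(loader.apply(user), nowMillis));
        }
        return e.groups;  // possibly stale, but no thundering herd
    }
}
```

Either way, an expired entry costs one load rather than one load per waiting thread, which is exactly what prevents the LDAP spike described below.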
[jira] [Commented] (HADOOP-11238) Group Cache should not cause namenode pause
[ https://issues.apache.org/jira/browse/HADOOP-11238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236035#comment-14236035 ]

Chris Li commented on HADOOP-11238:
-----------------------------------

[~benoyantony] I trimmed the length on line 63.

Hi [~cmccabe], I made those changes to the JIRA ticket; thanks for the advice.

bq. Do we need to set the ticker here? It seems that the default ticker uses System.nanoTime, which is the same monotonic time source that Hadoop's Timer uses.

It's currently like this because the test case uses dependency injection to test timing. We could use a fake Guava timer, but I wanted to avoid tight coupling between Hadoop and the Guava library. Let me know what you think.

bq. Why don't we use Guava's expireAfterWrite option to remove entries from the cache after a certain timeout? I guess this could be a separate configuration option, or we could just use 10 * cacheTimeout.

Good idea. I added this using 10 * cacheTimeout, and added some comments.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
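The refresh-versus-eviction interplay agreed on in the comment above (refresh after cacheTimeout, evict outright after 10 * cacheTimeout) amounts to a three-state policy. In Guava terms this is refreshAfterWrite(timeout) combined with expireAfterWrite(10 * timeout); the enum and method names below are illustrative, not Hadoop's.

```java
// Classify a cache entry by age, assuming the 10x multiplier adopted in the
// comment above. Between timeout and 10 * timeout the stale value is still
// served while a single thread refreshes it; past 10x, callers must block.
enum EntryState { FRESH, STALE_BUT_SERVABLE, EVICTED }

final class ExpiryPolicy {
    static EntryState classify(long ageMillis, long cacheTimeoutMillis) {
        if (ageMillis < cacheTimeoutMillis) {
            return EntryState.FRESH;               // served directly
        } else if (ageMillis < 10 * cacheTimeoutMillis) {
            return EntryState.STALE_BUT_SERVABLE;  // served stale while one thread refreshes
        } else {
            return EntryState.EVICTED;             // next caller must block on a fresh load
        }
    }
}
```

The 10x window bounds how stale a served value can get while still avoiding the blocking herd for users that are looked up regularly.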
[jira] [Updated] (HADOOP-11238) Group Cache should not cause namenode pause
[ https://issues.apache.org/jira/browse/HADOOP-11238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Li updated HADOOP-11238:
------------------------------
    Description: This patch addresses an issue where the namenode pauses during group resolution by only allowing a single group resolution query on expiry. (The full revised text matches the issue description quoted above.)

    was: This patch prevents the namenode from pausing during group cache expiry when getGroups takes a long time by returning the previous value. (The remainder of the earlier text repeated the original issue report.)

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11238) Group Cache should not cause namenode pause
[ https://issues.apache.org/jira/browse/HADOOP-11238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Li updated HADOOP-11238:
------------------------------
    Description: This patch prevents the namenode from pausing during group cache expiry when getGroups takes a long time by returning the previous value. (Followed by the original issue report, quoted in full above.)

    was: (the original issue report alone)

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11238) Group Cache should not cause namenode pause
[ https://issues.apache.org/jira/browse/HADOOP-11238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Li updated HADOOP-11238:
------------------------------
    Summary: Group Cache should not cause namenode pause (was: Group cache expiry causes namenode slowdown)

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-10597) Evaluate if we can have RPC client back off when server is under heavy load
[ https://issues.apache.org/jira/browse/HADOOP-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218307#comment-14218307 ]

Chris Li commented on HADOOP-10597:
-----------------------------------

Ah okay, thanks for clarifying; it makes sense now.

> Evaluate if we can have RPC client back off when server is under heavy load
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-10597
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10597
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>         Attachments: HADOOP-10597-2.patch, HADOOP-10597.patch,
> MoreRPCClientBackoffEvaluation.pdf, RPCClientBackoffDesignAndEvaluation.pdf
>
> Currently, if an application hits the NN too hard, RPC requests will sit in a
> blocking state, assuming OS connections don't run out. Alternatively, RPC or
> the NN can throw some well-defined exception back to the client, based on
> certain policies, when it is under heavy load; the client will understand
> such an exception and do exponential back-off, as another implementation of
> RetryInvocationHandler.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
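The client-side exponential back-off the ticket describes can be sketched as follows. The class and method names are illustrative, not Hadoop's RetryInvocationHandler API: on a "server overloaded" signal, the client waits baseMillis * 2^attempt (capped) before retrying.

```java
// Compute the wait before the (attempt+1)-th retry. Doubling the delay per
// attempt spreads retries out so a recovering server isn't immediately
// re-swamped; the cap keeps the worst-case wait bounded.
final class ExponentialBackoff {
    static long delayMillis(int attempt, long baseMillis, long capMillis) {
        // Shift instead of Math.pow to stay in long arithmetic; clamp the
        // exponent so the shift cannot overflow.
        long delay = baseMillis << Math.min(attempt, 20);
        return Math.min(delay, capMillis);
    }
}
```

Production implementations usually add random jitter on top of this so that clients rejected at the same moment do not retry in lockstep.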
[jira] [Created] (HADOOP-11299) Selective Group Cache Refresh
Chris Li created HADOOP-11299:
------------------------------
             Summary: Selective Group Cache Refresh
                 Key: HADOOP-11299
                 URL: https://issues.apache.org/jira/browse/HADOOP-11299
             Project: Hadoop Common
          Issue Type: New Feature
            Reporter: Chris Li
            Assignee: Chris Li
            Priority: Minor

Currently groups can be refreshed via dfsadmin -refreshUserToGroupsMappings, but this places undue stress on LDAP if we clear everything at once when we may only want to refresh a single user. This feature would allow the admin to specify a list of users to refresh. Passing arguments is a feature of HADOOP-10376.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
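The selective refresh proposed above amounts to evicting named keys instead of clearing the whole user-to-groups cache. A minimal sketch with hypothetical class and method names (the real cache lives behind Hadoop's Groups class):

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Full refresh evicts every user at once, so the next access for each user
// triggers an LDAP round trip; selective refresh evicts only the named users.
final class GroupCacheAdmin {
    final Map<String, List<String>> userToGroups = new ConcurrentHashMap<>();

    void refreshAll() {
        userToGroups.clear();                    // today's behavior: full eviction
    }

    void refreshUsers(Set<String> users) {
        userToGroups.keySet().removeAll(users);  // proposed: targeted eviction
    }
}
```

With targeted eviction, the post-refresh LDAP load is proportional to the number of users named rather than to the whole cached population.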
[jira] [Commented] (HADOOP-11238) Group cache expiry causes namenode slowdown
[ https://issues.apache.org/jira/browse/HADOOP-11238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207110#comment-14207110 ]

Chris Li commented on HADOOP-11238:
-----------------------------------

Hmm, that test passes on my local machine, so it may be another patch causing that failure. I can re-upload the patch later to trigger another run.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11238) Group cache expiry causes namenode slowdown
[ https://issues.apache.org/jira/browse/HADOOP-11238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207107#comment-14207107 ]

Chris Li commented on HADOOP-11238:
-----------------------------------

Sure: This patch addresses the issue by only allowing a single group resolution query on expiry. There are two scenarios:

1. When there is not yet a value in the cache, all threads which make a request will block while a single thread fetches the value.
2. When there is already a value in the cache and it is expired, the new value will be fetched in the background while the old value is used by other threads.

This is handled by Guava's cache.

Negative caching is a feature built into the groups cache, and since Guava's caches don't support different expiration times, we have a separate negative cache which masks the Guava cache: if an element exists in the negative cache and isn't expired, we return it.

In total, the logic for fetching a group is:

1. If the username exists in the static cache, return the value (this was already present).
2. If the username exists in the negative cache and the negative cache is not expired, raise an exception.
3. Otherwise, defer to the Guava cache (see the two scenarios above).

Let me know if you'd like additional details. Also, I will investigate that failing test.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
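The three-step lookup in the comment above can be sketched as follows. This is a simplified, hypothetical model: the real main cache is Guava-backed, and the exception type and field names here are illustrative.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Lookup order: static cache first, then the negative cache (with its own,
// shorter expiry, since Guava caches can't mix expiration times), and only
// then the main cache.
final class GroupLookup {
    final Map<String, List<String>> staticCache = new HashMap<>();
    final Map<String, Long> negativeCache = new HashMap<>();   // user -> time of negative hit
    final Map<String, List<String>> mainCache = new HashMap<>();
    final long negativeExpiryMillis;

    GroupLookup(long negativeExpiryMillis) {
        this.negativeExpiryMillis = negativeExpiryMillis;
    }

    List<String> getGroups(String user, long nowMillis) {
        List<String> fixed = staticCache.get(user);            // 1. static cache wins
        if (fixed != null) return fixed;
        Long negAt = negativeCache.get(user);                  // 2. unexpired negative hit
        if (negAt != null && nowMillis - negAt < negativeExpiryMillis) {
            throw new IllegalStateException("no groups for " + user);
        }
        return mainCache.get(user);                            // 3. defer to the main cache
    }
}
```

Masking the main cache with the negative cache keeps repeated lookups of nonexistent users (the getFileInfo-in-a-loop pattern from HADOOP-9640) from hammering LDAP.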
[jira] [Updated] (HADOOP-11238) Group cache expiry causes namenode slowdown
[ https://issues.apache.org/jira/browse/HADOOP-11238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Li updated HADOOP-11238:
------------------------------
    Attachment: HADOOP-11238.patch

Uploading patch.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11238) Group cache expiry causes namenode slowdown
[ https://issues.apache.org/jira/browse/HADOOP-11238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Li updated HADOOP-11238:
------------------------------
    Status: Patch Available (was: Open)

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HADOOP-10302) Allow CallQueue impls to be swapped at runtime (part 1: internals) Depends on: subtask1
[ https://issues.apache.org/jira/browse/HADOOP-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li reassigned HADOOP-10302: - Assignee: Chris Li > Allow CallQueue impls to be swapped at runtime (part 1: internals) Depends > on: subtask1 > --- > > Key: HADOOP-10302 > URL: https://issues.apache.org/jira/browse/HADOOP-10302 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Chris Li >Assignee: Chris Li > Attachments: HADOOP-10302.patch > > > We wish to swap the active call queue during runtime in order to do > performance tuning without restarting the namenode. > This patch adds only the internals necessary to swap. Part 2 will add a user > interface so that it can be used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HADOOP-10283) Make Scheduler and Multiplexer swappable
[ https://issues.apache.org/jira/browse/HADOOP-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li resolved HADOOP-10283. --- Resolution: Not a Problem Resolved in HADOOP-10282 > Make Scheduler and Multiplexer swappable > > > Key: HADOOP-10283 > URL: https://issues.apache.org/jira/browse/HADOOP-10283 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Chris Li >Assignee: Chris Li >Priority: Minor > > Currently the FairCallQueue uses the DecayRpcScheduler & > RoundRobinMultiplexer, this task is to allow the user to configure the > scheduler and mux in config settings -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HADOOP-10284) Add metrics to the HistoryRpcScheduler
[ https://issues.apache.org/jira/browse/HADOOP-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li resolved HADOOP-10284. --- Resolution: Not a Problem Assignee: Chris Li Resolved in HADOOP-10281 > Add metrics to the HistoryRpcScheduler > -- > > Key: HADOOP-10284 > URL: https://issues.apache.org/jira/browse/HADOOP-10284 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Chris Li >Assignee: Chris Li > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-10597) Evaluate if we can have RPC client back off when server is under heavy load
[ https://issues.apache.org/jira/browse/HADOOP-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201000#comment-14201000 ]

Chris Li commented on HADOOP-10597:
-----------------------------------

Cool. I think this is a good feature to have. One small question about the code:

+LOG.warn("Element " + e + " was queued properly." +
+    "But client is asked to retry.");

From my brief study of the code, it seems like isCallQueued is passed pretty deep in order to maintain some sort of reference count of how many pending requests each handler has waiting client-side to retry. Does this count always balance to zero? What if a client makes a request, is denied, and then terminates before it can make a request that successfully queues?

Also, under what conditions will the element be queued correctly but the client get a retry?

Also, kind of a small thing, but instead of recentBackOffCount.set(oldValue) it would be clearer to create a new variable newValue and call recentBackOffCount.set(newValue) instead of mutating oldValue, or perhaps just rename the oldValue variable to something which doesn't imply immutability.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11238) Group cache expiry causes namenode slowdown
[ https://issues.apache.org/jira/browse/HADOOP-11238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-11238: -- Description: Our namenode pauses for 12-60 seconds several times every hour. During these pauses, no new requests can come in. Around the time of pauses, we have log messages such as: 2014-10-22 13:24:22,688 WARN org.apache.hadoop.security.Groups: Potential performance problem: getGroups(user=x) took 34507 milliseconds. The current theory is: 1. Groups has a cache that is refreshed periodically. Each entry has a cache expiry. 2. When a cache entry expires, multiple threads can see this expiration and then we have a thundering herd effect where all these threads hit the wire and overwhelm our LDAP servers (we are using ShellBasedUnixGroupsMapping with sssd, how this happens has yet to be established) 3. group resolution queries begin to take longer, I've observed it taking 1.2 seconds instead of the usual 0.01-0.03 seconds when measuring in the shell `time groups myself` 4. If there is mutual exclusion somewhere along this path, a 1 second pause could lead to a 60 second pause as all the threads compete for the resource. The exact cause hasn't been established Potential solutions include: 1. Increasing group cache time, which will make the issue less frequent 2. Rolling evictions of the cache so we prevent the large spike in LDAP queries 3. Gate the cache refresh so that only one thread is responsible for refreshing the cache was: Our namenode pauses for 12-60 seconds several times every hour or so. During these pauses, no new requests can come in. Around the time of pauses, we have log messages such as: 2014-10-22 13:24:22,688 WARN org.apache.hadoop.security.Groups: Potential performance problem: getGroups(user=x) took 34507 milliseconds. The current theory is: 1. Groups has a cache that is refreshed periodically. Each entry has a cache expiry. 2. 
When a cache entry expires, multiple threads can see this expiration and then we have a thundering herd effect where all these threads hit the wire and overwhelm our LDAP servers (we are using ShellBasedUnixGroupsMapping with sssd, how this happens has yet to be established) 3. group resolution queries begin to take longer, I've observed it taking 1.2 seconds instead of the usual 0.01-0.03 seconds when measuring in the shell `time groups myself` 4. If there is mutual exclusion somewhere along this path, a 1 second pause could lead to a 60 second pause as all the threads compete for the resource. The exact cause hasn't been established Potential solutions include: 1. Increasing group cache time, which will make the issue less frequent 2. Rolling evictions of the cache so we prevent the large spike in LDAP queries 3. Gate the cache refresh so that only one thread is responsible for refreshing the cache > Group cache expiry causes namenode slowdown > --- > > Key: HADOOP-11238 > URL: https://issues.apache.org/jira/browse/HADOOP-11238 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.5.1 >Reporter: Chris Li >Assignee: Chris Li >Priority: Minor > > Our namenode pauses for 12-60 seconds several times every hour. During these > pauses, no new requests can come in. > Around the time of pauses, we have log messages such as: > 2014-10-22 13:24:22,688 WARN org.apache.hadoop.security.Groups: Potential > performance problem: getGroups(user=x) took 34507 milliseconds. > The current theory is: > 1. Groups has a cache that is refreshed periodically. Each entry has a cache > expiry. > 2. When a cache entry expires, multiple threads can see this expiration and > then we have a thundering herd effect where all these threads hit the wire > and overwhelm our LDAP servers (we are using ShellBasedUnixGroupsMapping with > sssd, how this happens has yet to be established) > 3. 
group resolution queries begin to take longer, I've observed it taking 1.2 > seconds instead of the usual 0.01-0.03 seconds when measuring in the shell > `time groups myself` > 4. If there is mutual exclusion somewhere along this path, a 1 second pause > could lead to a 60 second pause as all the threads compete for the resource. > The exact cause hasn't been established > Potential solutions include: > 1. Increasing group cache time, which will make the issue less frequent > 2. Rolling evictions of the cache so we prevent the large spike in LDAP > queries > 3. Gate the cache refresh so that only one thread is responsible for > refreshing the cache -- This message was sent by Atlassian JIRA (v6.3.4#6332)
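Solution 3 above (gating the refresh) can be sketched as follows. This is an illustrative design, not Hadoop's actual Groups implementation: the class and method names (GroupCache, resolve) are hypothetical, and the expired flag stands in for a real per-entry expiry check. Only the thread that wins the gate pays the LDAP round-trip; the losers keep serving the stale entry, avoiding the thundering herd.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of a gated cache refresh: at most one thread re-resolves a
// user's groups at a time; concurrent callers get the stale value.
public class GroupCache {
    private final Map<String, String[]> cache = new ConcurrentHashMap<>();
    private final Map<String, AtomicBoolean> refreshing = new ConcurrentHashMap<>();

    public String[] getGroups(String user, boolean expired) {
        String[] cached = cache.get(user);
        if (cached != null && !expired) {
            return cached;
        }
        AtomicBoolean gate =
            refreshing.computeIfAbsent(user, u -> new AtomicBoolean(false));
        if (gate.compareAndSet(false, true)) {
            try {
                cached = resolve(user);       // only the gate winner hits LDAP
                cache.put(user, cached);
            } finally {
                gate.set(false);
            }
        }
        // Gate losers serve the (possibly stale) cached value.
        return cached != null ? cached : cache.get(user);
    }

    /** Stand-in for the expensive LDAP/shell group resolution. */
    protected String[] resolve(String user) {
        return new String[] { user + "-group" };
    }
}
```

A production version would also need to handle the cold-miss case where a gate loser finds nothing cached at all (block, or return an empty result), which is elided here.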
[jira] [Updated] (HADOOP-11238) Group cache expiry causes namenode slowdown
[ https://issues.apache.org/jira/browse/HADOOP-11238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-11238: -- Description: Our namenode pauses for 12-60 seconds several times every hour or so. During these pauses, no new requests can come in. Around the time of pauses, we have log messages such as: 2014-10-22 13:24:22,688 WARN org.apache.hadoop.security.Groups: Potential performance problem: getGroups(user=x) took 34507 milliseconds. The current theory is: 1. Groups has a cache that is refreshed periodically. Each entry has a cache expiry. 2. When a cache entry expires, multiple threads can see this expiration and then we have a thundering herd effect where all these threads hit the wire and overwhelm our LDAP servers (we are using ShellBasedUnixGroupsMapping with sssd, how this happens has yet to be established) 3. group resolution queries begin to take longer, I've observed it taking 1.2 seconds instead of the usual 0.01-0.03 seconds when measuring in the shell `time groups myself` 4. If there is mutual exclusion somewhere along this path, a 1 second pause could lead to a 60 second pause as all the threads compete for the resource. The exact cause hasn't been established Potential solutions include: 1. Increasing group cache time, which will make the issue less frequent 2. Rolling evictions of the cache so we prevent the large spike in LDAP queries 3. Gate the cache refresh so that only one thread is responsible for refreshing the cache was: Our namenode pauses for 12-60 seconds several times every hour or so. During these pauses, no new requests can come in. Around the time of pauses, we have log messages such as: 2014-10-22 13:24:22,688 WARN org.apache.hadoop.security.Groups: Potential performance problem: getGroups(user=x) took 34507 milliseconds. The current theory is: 1. Groups has a cache that is refreshed periodically. 2. 
When the cache is cleared, we have a thundering herd effect which overwhelms our LDAP servers (we are using ShellBasedUnixGroupsMapping with sssd, how this happens has yet to be established) 3. group resolution queries begin to take longer, I've observed it taking 1.2 seconds instead of the usual 0.01-0.03 seconds when measuring in the shell `time groups myself` 4. If there is mutual exclusion somewhere along this path, a 1 second pause could lead to a 60 second pause as all the threads compete for the resource. The exact cause hasn't been established Potential solutions include: 1. Increasing group cache time, which will make the issue less frequent 2. Rolling evictions of the cache so we prevent the large spike in LDAP queries > Group cache expiry causes namenode slowdown > --- > > Key: HADOOP-11238 > URL: https://issues.apache.org/jira/browse/HADOOP-11238 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.5.1 >Reporter: Chris Li >Assignee: Chris Li >Priority: Minor > > Our namenode pauses for 12-60 seconds several times every hour or so. During > these pauses, no new requests can come in. > Around the time of pauses, we have log messages such as: > 2014-10-22 13:24:22,688 WARN org.apache.hadoop.security.Groups: Potential > performance problem: getGroups(user=x) took 34507 milliseconds. > The current theory is: > 1. Groups has a cache that is refreshed periodically. Each entry has a cache > expiry. > 2. When a cache entry expires, multiple threads can see this expiration and > then we have a thundering herd effect where all these threads hit the wire > and overwhelm our LDAP servers (we are using ShellBasedUnixGroupsMapping with > sssd, how this happens has yet to be established) > 3. group resolution queries begin to take longer, I've observed it taking 1.2 > seconds instead of the usual 0.01-0.03 seconds when measuring in the shell > `time groups myself` > 4. 
If there is mutual exclusion somewhere along this path, a 1 second pause > could lead to a 60 second pause as all the threads compete for the resource. > The exact cause hasn't been established > Potential solutions include: > 1. Increasing group cache time, which will make the issue less frequent > 2. Rolling evictions of the cache so we prevent the large spike in LDAP > queries > 3. Gate the cache refresh so that only one thread is responsible for > refreshing the cache -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HADOOP-11238) Group cache expiry causes namenode slowdown
[ https://issues.apache.org/jira/browse/HADOOP-11238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li reassigned HADOOP-11238: - Assignee: Chris Li > Group cache expiry causes namenode slowdown > --- > > Key: HADOOP-11238 > URL: https://issues.apache.org/jira/browse/HADOOP-11238 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.5.1 >Reporter: Chris Li >Assignee: Chris Li >Priority: Minor > > Our namenode pauses for 12-60 seconds several times every hour or so. During > these pauses, no new requests can come in. > Around the time of pauses, we have log messages such as: > 2014-10-22 13:24:22,688 WARN org.apache.hadoop.security.Groups: Potential > performance problem: getGroups(user=x) took 34507 milliseconds. > The current theory is: > 1. Groups has a cache that is refreshed periodically. > 2. When the cache is cleared, we have a thundering herd effect which > overwhelms our LDAP servers (we are using ShellBasedUnixGroupsMapping with > sssd, how this happens has yet to be established) > 3. group resolution queries begin to take longer, I've observed it taking 1.2 > seconds instead of the usual 0.01-0.03 seconds when measuring in the shell > `time groups myself` > 4. If there is mutual exclusion somewhere along this path, a 1 second pause > could lead to a 60 second pause as all the threads compete for the resource. > The exact cause hasn't been established > Potential solutions include: > 1. Increasing group cache time, which will make the issue less frequent > 2. Rolling evictions of the cache so we prevent the large spike in LDAP > queries -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11238) Group cache expiry causes namenode slowdown
[ https://issues.apache.org/jira/browse/HADOOP-11238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-11238: -- Description: Our namenode pauses for 12-60 seconds several times every hour or so. During these pauses, no new requests can come in. Around the time of pauses, we have log messages such as: 2014-10-22 13:24:22,688 WARN org.apache.hadoop.security.Groups: Potential performance problem: getGroups(user=x) took 34507 milliseconds. The current theory is: 1. Groups has a cache that is refreshed periodically. 2. When the cache is cleared, we have a thundering herd effect which overwhelms our LDAP servers (we are using ShellBasedUnixGroupsMapping with sssd, how this happens has yet to be established) 3. group resolution queries begin to take longer, I've observed it taking 1.2 seconds instead of the usual 0.01-0.03 seconds when measuring in the shell `time groups myself` 4. If there is mutual exclusion somewhere along this path, a 1 second pause could lead to a 60 second pause as all the threads compete for the resource. The exact cause hasn't been established Potential solutions include: 1. Increasing group cache time, which will make the issue less frequent 2. Rolling evictions of the cache so we prevent the large spike in LDAP queries was: Our namenode pauses for 12-60 seconds every hour or so. During these pauses, no new requests can come in. Around the time of pauses, we have log messages such as: 2014-10-22 13:24:22,688 WARN org.apache.hadoop.security.Groups: Potential performance problem: getGroups(user=x) took 34507 milliseconds. The current theory is: 1. Groups has a cache that is refreshed periodically. 2. When the cache is cleared, we have a thundering herd effect which overwhelms our LDAP servers (we are using ShellBasedUnixGroupsMapping with sssd, how this happens has yet to be established) 3. 
group resolution queries begin to take longer, I've observed it taking 1.2 seconds instead of the usual 0.01-0.03 seconds when measuring in the shell `time groups myself` 4. If there is mutual exclusion somewhere along this path, a 1 second pause could lead to a 60 second pause as all the threads compete for the resource. The exact cause hasn't been established Potential solutions include: 1. Increasing group cache time, which will make the issue less frequent 2. Rolling evictions of the cache so we prevent the large spike in LDAP queries > Group cache expiry causes namenode slowdown > --- > > Key: HADOOP-11238 > URL: https://issues.apache.org/jira/browse/HADOOP-11238 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.5.1 >Reporter: Chris Li >Priority: Minor > > Our namenode pauses for 12-60 seconds several times every hour or so. During > these pauses, no new requests can come in. > Around the time of pauses, we have log messages such as: > 2014-10-22 13:24:22,688 WARN org.apache.hadoop.security.Groups: Potential > performance problem: getGroups(user=x) took 34507 milliseconds. > The current theory is: > 1. Groups has a cache that is refreshed periodically. > 2. When the cache is cleared, we have a thundering herd effect which > overwhelms our LDAP servers (we are using ShellBasedUnixGroupsMapping with > sssd, how this happens has yet to be established) > 3. group resolution queries begin to take longer, I've observed it taking 1.2 > seconds instead of the usual 0.01-0.03 seconds when measuring in the shell > `time groups myself` > 4. If there is mutual exclusion somewhere along this path, a 1 second pause > could lead to a 60 second pause as all the threads compete for the resource. > The exact cause hasn't been established > Potential solutions include: > 1. Increasing group cache time, which will make the issue less frequent > 2. 
Rolling evictions of the cache so we prevent the large spike in LDAP > queries -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-11238) Group cache expiry causes namenode slowdown
Chris Li created HADOOP-11238: - Summary: Group cache expiry causes namenode slowdown Key: HADOOP-11238 URL: https://issues.apache.org/jira/browse/HADOOP-11238 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.5.1 Reporter: Chris Li Priority: Minor Our namenode pauses for 12-60 seconds every hour or so. During these pauses, no new requests can come in. Around the time of pauses, we have log messages such as: 2014-10-22 13:24:22,688 WARN org.apache.hadoop.security.Groups: Potential performance problem: getGroups(user=x) took 34507 milliseconds. The current theory is: 1. Groups has a cache that is refreshed periodically. 2. When the cache is cleared, we have a thundering herd effect which overwhelms our LDAP servers (we are using ShellBasedUnixGroupsMapping with sssd, how this happens has yet to be established) 3. group resolution queries begin to take longer, I've observed it taking 1.2 seconds instead of the usual 0.01-0.03 seconds when measuring in the shell `time groups myself` 4. If there is mutual exclusion somewhere along this path, a 1 second pause could lead to a 60 second pause as all the threads compete for the resource. The exact cause hasn't been established Potential solutions include: 1. Increasing group cache time, which will make the issue less frequent 2. Rolling evictions of the cache so we prevent the large spike in LDAP queries -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-9640) RPC Congestion Control with FairCallQueue
[ https://issues.apache.org/jira/browse/HADOOP-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-9640: - Description: For an easy-to-read summary see: http://www.ebaytechblog.com/2014/08/21/quality-of-service-in-hadoop/ Several production Hadoop cluster incidents occurred where the Namenode was overloaded and failed to respond. We can improve quality of service for users during namenode peak loads by replacing the FIFO call queue with a [Fair Call Queue|https://issues.apache.org/jira/secure/attachment/12616864/NN-denial-of-service-updated-plan.pdf]. (this plan supersedes rpc-congestion-control-draft-plan). Excerpted from the communication of one incident, “The map task of a user was creating huge number of small files in the user directory. Due to the heavy load on NN, the JT also was unable to communicate with NN...The cluster became responsive only once the job was killed.” Excerpted from the communication of another incident, “Namenode was overloaded by GetBlockLocation requests (Correction: should be getFileInfo requests. the job had a bug that called getFileInfo for a nonexistent file in an endless loop). All other requests to namenode were also affected by this and hence all jobs slowed down. Cluster almost came to a grinding halt…Eventually killed jobtracker to kill all jobs that are running.” Excerpted from HDFS-945, “We've seen defective applications cause havoc on the NameNode, for e.g. by doing 100k+ 'listStatus' on very large directories (60k files) etc.” was: Several production Hadoop cluster incidents occurred where the Namenode was overloaded and failed to respond. We can improve quality of service for users during namenode peak loads by replacing the FIFO call queue with a [Fair Call Queue|https://issues.apache.org/jira/secure/attachment/12616864/NN-denial-of-service-updated-plan.pdf]. (this plan supersedes rpc-congestion-control-draft-plan). 
Excerpted from the communication of one incident, “The map task of a user was creating huge number of small files in the user directory. Due to the heavy load on NN, the JT also was unable to communicate with NN...The cluster became responsive only once the job was killed.” Excerpted from the communication of another incident, “Namenode was overloaded by GetBlockLocation requests (Correction: should be getFileInfo requests. the job had a bug that called getFileInfo for a nonexistent file in an endless loop). All other requests to namenode were also affected by this and hence all jobs slowed down. Cluster almost came to a grinding halt…Eventually killed jobtracker to kill all jobs that are running.” Excerpted from HDFS-945, “We've seen defective applications cause havoc on the NameNode, for e.g. by doing 100k+ 'listStatus' on very large directories (60k files) etc.” > RPC Congestion Control with FairCallQueue > - > > Key: HADOOP-9640 > URL: https://issues.apache.org/jira/browse/HADOOP-9640 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 3.0.0, 2.2.0 >Reporter: Xiaobo Peng >Assignee: Chris Li > Labels: hdfs, qos, rpc > Attachments: FairCallQueue-PerformanceOnCluster.pdf, > MinorityMajorityPerformance.pdf, NN-denial-of-service-updated-plan.pdf, > faircallqueue.patch, faircallqueue2.patch, faircallqueue3.patch, > faircallqueue4.patch, faircallqueue5.patch, faircallqueue6.patch, > faircallqueue7_with_runtime_swapping.patch, > rpc-congestion-control-draft-plan.pdf > > > For an easy-to-read summary see: > http://www.ebaytechblog.com/2014/08/21/quality-of-service-in-hadoop/ > Several production Hadoop cluster incidents occurred where the Namenode was > overloaded and failed to respond. > We can improve quality of service for users during namenode peak loads by > replacing the FIFO call queue with a [Fair Call > Queue|https://issues.apache.org/jira/secure/attachment/12616864/NN-denial-of-service-updated-plan.pdf]. 
> (this plan supersedes rpc-congestion-control-draft-plan). > Excerpted from the communication of one incident, “The map task of a user was > creating huge number of small files in the user directory. Due to the heavy > load on NN, the JT also was unable to communicate with NN...The cluster > became responsive only once the job was killed.” > Excerpted from the communication of another incident, “Namenode was > overloaded by GetBlockLocation requests (Correction: should be getFileInfo > requests. the job had a bug that called getFileInfo for a nonexistent file in > an endless loop). All other requests to namenode were also affected by this > and hence all jobs slowed down. Cluster almost came to a grinding > halt…Eventually killed jobtracker to kill all jobs that are running.” > Excerpted from HDFS-945, “We've seen defective applications cause havoc on the > NameNode, for e.g. by doing 100k+ 'listStatus' on very large directories (60k > files) etc.”
[jira] [Commented] (HADOOP-10597) Evaluate if we can have RPC client back off when server is under heavy load
[ https://issues.apache.org/jira/browse/HADOOP-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169689#comment-14169689 ] Chris Li commented on HADOOP-10597: --- Hi [~mingma], thanks for adding some numbers. If I understand correctly from the graph, the latency spike is a result of maxing out the call queue's capacity, which FairCallQueue will not solve since FCQ has no choice but to enqueue a call somewhere. Just to double check, were all these calls made under the same user? I'd guess that RPC client backoff would work just as well when FairCallQueue is disabled too, since it solves the different problem of alleviating a full queue. I do agree with Steve that we'll want some fuzz on the retry method, since linear could cause load to be periodic over time > Evaluate if we can have RPC client back off when server is under heavy load > --- > > Key: HADOOP-10597 > URL: https://issues.apache.org/jira/browse/HADOOP-10597 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Ming Ma >Assignee: Ming Ma > Attachments: HADOOP-10597-2.patch, HADOOP-10597.patch, > MoreRPCClientBackoffEvaluation.pdf, RPCClientBackoffDesignAndEvaluation.pdf > > > Currently if an application hits NN too hard, RPC requests be in blocking > state, assuming OS connection doesn't run out. Alternatively RPC or NN can > throw some well defined exception back to the client based on certain > policies when it is under heavy load; client will understand such exception > and do exponential back off, as another implementation of > RetryInvocationHandler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
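The "fuzz on the retry method" point above can be sketched as full-jitter exponential backoff: without randomization, many backed-off clients retry in lockstep and the load becomes periodic. All names and parameters here are illustrative assumptions, not the HADOOP-10597 patch's actual API.

```java
import java.util.Random;

// Sketch of full-jitter exponential backoff: each retry waits a uniformly
// random delay in [0, min(max, base * 2^attempt)), decorrelating clients.
public class JitteredBackoff {
    private final long baseMillis;
    private final long maxMillis;
    private final Random random;

    public JitteredBackoff(long baseMillis, long maxMillis, long seed) {
        this.baseMillis = baseMillis;
        this.maxMillis = maxMillis;
        this.random = new Random(seed);   // seeded here only for demonstration
    }

    /** Delay in milliseconds for the given retry attempt (0-based). */
    public long delayFor(int attempt) {
        // Cap the shift so the exponential ceiling cannot overflow.
        long ceiling = Math.min(maxMillis, baseMillis << Math.min(attempt, 30));
        return (long) (random.nextDouble() * ceiling);
    }
}
```

The design choice worth noting: jitter spreads retries across the whole window rather than merely offsetting them, so even clients rejected in the same instant stop arriving back in the same instant.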
[jira] [Updated] (HADOOP-10286) Allow RPCCallBenchmark to benchmark calls by different users
[ https://issues.apache.org/jira/browse/HADOOP-10286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10286: -- Status: Open (was: Patch Available) Canceling patch so I can add better metrics: 50th percentile, 90th percentile, 99th, etc > Allow RPCCallBenchmark to benchmark calls by different users > > > Key: HADOOP-10286 > URL: https://issues.apache.org/jira/browse/HADOOP-10286 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Chris Li >Assignee: Chris Li > Attachments: HADOOP-10286.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10282) Create a FairCallQueue: a multi-level call queue which schedules incoming calls and multiplexes outgoing calls
[ https://issues.apache.org/jira/browse/HADOOP-10282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107703#comment-14107703 ] Chris Li commented on HADOOP-10282: --- Thanks! Up next some minor patches: HADOOP-10283 makes the FairCallQueue more customizable HADOOP-10286 allows for instrumentation of IPC performance for multiple users > Create a FairCallQueue: a multi-level call queue which schedules incoming > calls and multiplexes outgoing calls > -- > > Key: HADOOP-10282 > URL: https://issues.apache.org/jira/browse/HADOOP-10282 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Chris Li >Assignee: Chris Li > Attachments: HADOOP-10282.patch, HADOOP-10282.patch, > HADOOP-10282.patch, HADOOP-10282.patch > > > The FairCallQueue ensures quality of service by altering the order of RPC > calls internally. > It consists of three parts: > 1. a Scheduler (`HistoryRpcScheduler` is provided) which provides a priority > number from 0 to N (0 being highest priority) > 2. a Multi-level queue (residing in `FairCallQueue`) which provides a way to > keep calls in priority order internally > 3. a Multiplexer (`WeightedRoundRobinMultiplexer` is provided) which provides > logic to control which queue to take from > Currently the Mux and Scheduler are not pluggable, but they probably should > be (up for discussion). > This is how it is used: > // Production > 1. Call is created and given to the CallQueueManager > 2. CallQueueManager requests a `put(T call)` into the `FairCallQueue` which > implements `BlockingQueue` > 3. `FairCallQueue` asks its scheduler for a scheduling decision, which is an > integer e.g. 12 > 4. `FairCallQueue` inserts Call into the 12th queue: > `queues.get(12).put(call)` > // Consumption > 1. CallQueueManager requests `take()` or `poll()` on FairCallQueue > 2. `FairCallQueue` asks its multiplexer for which queue to draw from, which > will also be an integer e.g. 2 > 3. 
`FairCallQueue` draws from this queue if it has an available call (or > tries other queues if it is empty) > Additional information is available in the linked JIRAs regarding the > Scheduler and Multiplexer's roles. -- This message was sent by Atlassian JIRA (v6.2#6252)
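The production/consumption flow described above can be sketched as a toy composite queue: a scheduler decision selects the sub-queue on put, and a weighted round-robin multiplexer selects the sub-queue on take. This is a simplified illustration, not the actual FairCallQueue code; it omits blocking take(), locking, and the pluggable Scheduler/Multiplexer interfaces, and the caller stands in for the scheduler by supplying the priority directly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Toy multi-level call queue: put() uses a priority decision to pick a
// sub-queue; poll() uses weighted round-robin to pick which level to drain.
public class MiniFairCallQueue<E> {
    private final List<BlockingQueue<E>> queues = new ArrayList<>();
    private final int[] weights;     // consecutive draws allowed per level
    private int currentLevel = 0;
    private int drawsLeft;

    /** weights.length must equal levels; weights[i] > 0. */
    public MiniFairCallQueue(int levels, int capacity, int[] weights) {
        for (int i = 0; i < levels; i++) {
            queues.add(new ArrayBlockingQueue<>(capacity));
        }
        this.weights = weights;
        this.drawsLeft = weights[0];
    }

    /** Scheduler stand-in: caller supplies the priority (0 = highest). */
    public void put(E call, int priority) {
        queues.get(priority).add(call);  // real FCQ blocks when full; toy throws
    }

    /** Multiplexer stand-in: weighted round-robin over the levels. */
    public E poll() {
        for (int scanned = 0; scanned < queues.size(); scanned++) {
            if (drawsLeft == 0) {
                advance();
            }
            E call = queues.get(currentLevel).poll();
            if (call != null) {
                drawsLeft--;
                return call;
            }
            advance();                   // this level is empty: try the next
        }
        return null;                     // every level was empty
    }

    private void advance() {
        currentLevel = (currentLevel + 1) % queues.size();
        drawsLeft = weights[currentLevel];
    }
}
```

With weights {2, 1}, two calls are drawn from level 0 for every one from level 1 while both levels are non-empty, which is the basic starvation-avoidance property the multiplexer provides.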
[jira] [Updated] (HADOOP-10286) Allow RPCCallBenchmark to benchmark calls by different users
[ https://issues.apache.org/jira/browse/HADOOP-10286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10286: -- Priority: Minor (was: Major) > Allow RPCCallBenchmark to benchmark calls by different users > > > Key: HADOOP-10286 > URL: https://issues.apache.org/jira/browse/HADOOP-10286 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Chris Li >Assignee: Chris Li >Priority: Minor > Attachments: HADOOP-10286.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10283) Make Scheduler and Multiplexer swappable
[ https://issues.apache.org/jira/browse/HADOOP-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10283: -- Description: Currently the FairCallQueue uses the DecayRpcScheduler & RoundRobinMultiplexer, this task is to allow the user to configure the scheduler and mux in config settings Priority: Minor (was: Major) Assignee: Chris Li Summary: Make Scheduler and Multiplexer swappable (was: Add metrics to the FairCallQueue) > Make Scheduler and Multiplexer swappable > > > Key: HADOOP-10283 > URL: https://issues.apache.org/jira/browse/HADOOP-10283 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Chris Li >Assignee: Chris Li >Priority: Minor > > Currently the FairCallQueue uses the DecayRpcScheduler & > RoundRobinMultiplexer, this task is to allow the user to configure the > scheduler and mux in config settings -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10282) Create a FairCallQueue: a multi-level call queue which schedules incoming calls and multiplexes outgoing calls
[ https://issues.apache.org/jira/browse/HADOOP-10282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10282: -- Attachment: HADOOP-10282.patch Hopefully this fixes the unchecked assignment warnings > Create a FairCallQueue: a multi-level call queue which schedules incoming > calls and multiplexes outgoing calls > -- > > Key: HADOOP-10282 > URL: https://issues.apache.org/jira/browse/HADOOP-10282 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Chris Li >Assignee: Chris Li > Attachments: HADOOP-10282.patch, HADOOP-10282.patch, > HADOOP-10282.patch, HADOOP-10282.patch > > > The FairCallQueue ensures quality of service by altering the order of RPC > calls internally. > It consists of three parts: > 1. a Scheduler (`HistoryRpcScheduler` is provided) which provides a priority > number from 0 to N (0 being highest priority) > 2. a Multi-level queue (residing in `FairCallQueue`) which provides a way to > keep calls in priority order internally > 3. a Multiplexer (`WeightedRoundRobinMultiplexer` is provided) which provides > logic to control which queue to take from > Currently the Mux and Scheduler are not pluggable, but they probably should > be (up for discussion). > This is how it is used: > // Production > 1. Call is created and given to the CallQueueManager > 2. CallQueueManager requests a `put(T call)` into the `FairCallQueue` which > implements `BlockingQueue` > 3. `FairCallQueue` asks its scheduler for a scheduling decision, which is an > integer e.g. 12 > 4. `FairCallQueue` inserts Call into the 12th queue: > `queues.get(12).put(call)` > // Consumption > 1. CallQueueManager requests `take()` or `poll()` on FairCallQueue > 2. `FairCallQueue` asks its multiplexer for which queue to draw from, which > will also be an integer e.g. 2 > 3. 
`FairCallQueue` draws from this queue if it has an available call (or > tries other queues if it is empty) > Additional information is available in the linked JIRAs regarding the > Scheduler and Multiplexer's roles. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10282) Create a FairCallQueue: a multi-level call queue which schedules incoming calls and multiplexes outgoing calls
[ https://issues.apache.org/jira/browse/HADOOP-10282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10282: -- Attachment: HADOOP-10282.patch Hi [~arpitagarwal] sorry for the delay, I wanted to fix up the tests for more isolation from the mux and scheduler. I remember why I made that comment now, it's because poll() without timeunit and peek() don't acquire the takeLock, so it is possible for poll()/peek() to return null even though something is in the queue. This doesn't affect hadoop since hadoop uses poll(long, TimeUnit) exclusively. I wanted to get this patch out asap, but I think it would also be good to have pluggable schedulers and multiplexers in the FCQ itself, as it enables some cool stuff like HADOOP-10598. I will open another issue for making pluggable schedulers/muxes. > Create a FairCallQueue: a multi-level call queue which schedules incoming > calls and multiplexes outgoing calls > -- > > Key: HADOOP-10282 > URL: https://issues.apache.org/jira/browse/HADOOP-10282 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Chris Li >Assignee: Chris Li > Attachments: HADOOP-10282.patch, HADOOP-10282.patch, > HADOOP-10282.patch > > > The FairCallQueue ensures quality of service by altering the order of RPC > calls internally. > It consists of three parts: > 1. a Scheduler (`HistoryRpcScheduler` is provided) which provides a priority > number from 0 to N (0 being highest priority) > 2. a Multi-level queue (residing in `FairCallQueue`) which provides a way to > keep calls in priority order internally > 3. a Multiplexer (`WeightedRoundRobinMultiplexer` is provided) which provides > logic to control which queue to take from > Currently the Mux and Scheduler are not pluggable, but they probably should > be (up for discussion). > This is how it is used: > // Production > 1. Call is created and given to the CallQueueManager > 2. 
CallQueueManager requests a `put(T call)` into the `FairCallQueue` which > implements `BlockingQueue` > 3. `FairCallQueue` asks its scheduler for a scheduling decision, which is an > integer e.g. 12 > 4. `FairCallQueue` inserts Call into the 12th queue: > `queues.get(12).put(call)` > // Consumption > 1. CallQueueManager requests `take()` or `poll()` on FairCallQueue > 2. `FairCallQueue` asks its multiplexer for which queue to draw from, which > will also be an integer e.g. 2 > 3. `FairCallQueue` draws from this queue if it has an available call (or > tries other queues if it is empty) > Additional information is available in the linked JIRAs regarding the > Scheduler and Multiplexer's roles. -- This message was sent by Atlassian JIRA (v6.2#6252)
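The takeLock observation above can be illustrated deterministically. The sketch below simulates the interleaving by hand: a lockless poll() that probes sub-queues one at a time can miss an element that moves between sub-queues mid-scan (for instance under runtime queue swapping or reprioritization), returning null from a composite queue that was never empty. The scenario and names are hypothetical and simplified.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Demonstrates how a lockless sub-queue scan can "miss" an element:
// the only call moves from q1 to q0 after the scan has already passed q0.
public class ScanRaceDemo {
    public static String run() {
        BlockingQueue<String> q0 = new ArrayBlockingQueue<>(1);
        BlockingQueue<String> q1 = new ArrayBlockingQueue<>(1);
        q1.add("call");                  // the element starts in q1

        // Reader scans q0 first and finds it empty...
        String fromQ0 = q0.poll();       // null

        // ...meanwhile another thread moves the call from q1 to q0
        // (e.g. a scheduler reprioritizing it between sub-queues):
        q0.add(q1.poll());

        // ...then the reader's scan reaches q1, which is now empty.
        String fromQ1 = q1.poll();       // null

        // Both probes missed, yet the element is still queued in q0.
        return (fromQ0 == null && fromQ1 == null) ? q0.peek() : "no-race";
    }
}
```

As the comment notes, Hadoop avoids exposure to this because it calls poll(long, TimeUnit) exclusively; the timed variant retries rather than trusting a single lockless scan.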
[jira] [Updated] (HADOOP-10282) Create a FairCallQueue: a multi-level call queue which schedules incoming calls and multiplexes outgoing calls
[ https://issues.apache.org/jira/browse/HADOOP-10282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10282: -- Status: Patch Available (was: Open)
-- This message was sent by Atlassian JIRA (v6.2#6252)
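The Multiplexer named in the issue description above can be sketched as weighted round-robin. This is a hypothetical simplification with illustrative names; the provided implementation is WeightedRoundRobinMultiplexer, whose actual weights and API may differ.

```java
// Hypothetical sketch of a weighted round-robin multiplexer: queue 0
// (highest priority) is drained more often than lower-priority queues,
// according to per-queue weights.
class WrrMuxSketch {
    private final int[] weights;        // e.g. {8, 4, 2, 1} for 4 levels
    private int currentQueue = 0;
    private int drainedFromCurrent = 0;

    WrrMuxSketch(int[] weights) {
        this.weights = weights;
    }

    // Returns the queue index the consumer should try first, advancing to
    // the next queue once the current one has been handed out
    // weights[currentQueue] times.
    synchronized int getAndAdvanceCurrentIndex() {
        int idx = currentQueue;
        if (++drainedFromCurrent >= weights[currentQueue]) {
            drainedFromCurrent = 0;
            currentQueue = (currentQueue + 1) % weights.length;
        }
        return idx;
    }
}
```

With weights {2, 1}, for example, the mux hands out indices in the repeating pattern 0, 0, 1, so high-priority calls get twice the service without starving the low-priority queue.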
[jira] [Commented] (HADOOP-10282) Create a FairCallQueue: a multi-level call queue which schedules incoming calls and multiplexes outgoing calls
[ https://issues.apache.org/jira/browse/HADOOP-10282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101320#comment-14101320 ] Chris Li commented on HADOOP-10282: --- I think you're right, that doc was from an older version without that quirk. Attaching patch with revised doc shortly.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10281) Create a scheduler, which assigns schedulables a priority level
[ https://issues.apache.org/jira/browse/HADOOP-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094983#comment-14094983 ] Chris Li commented on HADOOP-10281: --- Awesome, thanks Arpit! The next stage of this patch is the multilevel queue that ties the scheduler and mux together. This will enable QoS to be actually used: https://issues.apache.org/jira/browse/HADOOP-10282 > Create a scheduler, which assigns schedulables a priority level > --- > > Key: HADOOP-10281 > URL: https://issues.apache.org/jira/browse/HADOOP-10281 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Chris Li >Assignee: Chris Li > Fix For: 3.0.0, 2.6.0 > > Attachments: HADOOP-10281-preview.patch, HADOOP-10281.patch, > HADOOP-10281.patch, HADOOP-10281.patch, HADOOP-10281.patch, HADOOP-10281.patch > > > The Scheduler decides which sub-queue to assign a given Call. It implements a > single method getPriorityLevel(Schedulable call) which returns an integer > corresponding to the subqueue the FairCallQueue should place the call in. > The HistoryRpcScheduler is one such implementation which uses the username of > each call and determines what % of calls in recent history were made by this > user. > It is configured with a historyLength (how many calls to track) and a list of > integer thresholds which determine the boundaries between priority levels. > For instance, if the scheduler has a historyLength of 8; and priority > thresholds of 4,2,1; and saw calls made by these users in order: > Alice, Bob, Alice, Alice, Bob, Jerry, Alice, Alice > * Another call by Alice would be placed in queue 3, since she has already > made >= 4 calls > * Another call by Bob would be placed in queue 2, since he has >= 2 but less > than 4 calls > * A call by Carlos would be placed in queue 0, since he has no calls in the > history > Also, some versions of this patch include the concept of a 'service user', > which is a user that is always scheduled high-priority. 
Currently this seems > redundant and will probably be removed in later patches, since it's not too > useful. > > As of now, the current scheduler is the DecayRpcScheduler, which only keeps > track of the number of each type of call and decays these counts periodically. -- This message was sent by Atlassian JIRA (v6.2#6252)
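The Alice/Bob/Carlos thresholding example in the description above can be sketched as follows. This is an illustrative simplification, not the actual HistoryRpcScheduler: class name and internals are assumptions based on the description (count a user's calls within the last historyLength calls, then map the count onto queue levels via descending thresholds).

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified sketch of the history-based priority decision described above.
class HistorySchedulerSketch {
    private final int historyLength;
    private final int[] thresholds;              // descending, e.g. {4, 2, 1}
    private final Deque<String> history = new ArrayDeque<>();

    HistorySchedulerSketch(int historyLength, int[] thresholds) {
        this.historyLength = historyLength;
        this.thresholds = thresholds;
    }

    int getPriorityLevel(String user) {
        // Count this user's calls in the tracked history, then record the call.
        long count = history.stream().filter(user::equals).count();
        history.addLast(user);
        if (history.size() > historyLength) {
            history.removeFirst();
        }
        // numLevels = thresholds.length + 1; a count >= thresholds[0] lands
        // in the lowest-priority queue (highest index).
        for (int i = 0; i < thresholds.length; i++) {
            if (count >= thresholds[i]) {
                return thresholds.length - i;    // e.g. count >= 4 -> queue 3
            }
        }
        return 0;                                // no recent calls -> queue 0
    }
}
```

With historyLength 8 and thresholds 4,2,1 this reproduces the example: Alice (5 recent calls) lands in queue 3, Bob (2 calls) in queue 2, and Carlos (no calls) in queue 0.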
[jira] [Updated] (HADOOP-10281) Create a scheduler, which assigns schedulables a priority level
[ https://issues.apache.org/jira/browse/HADOOP-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10281: -- Attachment: HADOOP-10281.patch Removed the HistoryRpcSchedulerMXBean
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10281) Create a scheduler, which assigns schedulables a priority level
[ https://issues.apache.org/jira/browse/HADOOP-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10281: -- Attachment: HADOOP-10281-preview.patch My bad, I didn't want CI picking this file up; here it is again
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10281) Create a scheduler, which assigns schedulables a priority level
[ https://issues.apache.org/jira/browse/HADOOP-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10281: -- Attachment: (was: HADOOP-10281-preview.patch) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10281) Create a scheduler, which assigns schedulables a priority level
[ https://issues.apache.org/jira/browse/HADOOP-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10281: -- Description: The Scheduler decides which sub-queue to assign a given Call. It implements a single method getPriorityLevel(Schedulable call) which returns an integer corresponding to the subqueue the FairCallQueue should place the call in. The HistoryRpcScheduler is one such implementation which uses the username of each call and determines what % of calls in recent history were made by this user. It is configured with a historyLength (how many calls to track) and a list of integer thresholds which determine the boundaries between priority levels. For instance, if the scheduler has a historyLength of 8; and priority thresholds of 4,2,1; and saw calls made by these users in order: Alice, Bob, Alice, Alice, Bob, Jerry, Alice, Alice * Another call by Alice would be placed in queue 3, since she has already made >= 4 calls * Another call by Bob would be placed in queue 2, since he has >= 2 but less than 4 calls * A call by Carlos would be placed in queue 0, since he has no calls in the history Also, some versions of this patch include the concept of a 'service user', which is a user that is always scheduled high-priority. Currently this seems redundant and will probably be removed in later patches, since its not too useful. As of now, the current scheduler is the DecayRpcScheduler, which only keeps track of the number of each type of call and decays these counts periodically. was: The Scheduler decides which sub-queue to assign a given Call. It implements a single method getPriorityLevel(Schedulable call) which returns an integer corresponding to the subqueue the FairCallQueue should place the call in. The HistoryRpcScheduler is one such implementation which uses the username of each call and determines what % of calls in recent history were made by this user. 
It is configured with a historyLength (how many calls to track) and a list of integer thresholds which determine the boundaries between priority levels. For instance, if the scheduler has a historyLength of 8; and priority thresholds of 4,2,1; and saw calls made by these users in order: Alice, Bob, Alice, Alice, Bob, Jerry, Alice, Alice * Another call by Alice would be placed in queue 3, since she has already made >= 4 calls * Another call by Bob would be placed in queue 2, since he has >= 2 but less than 4 calls * A call by Carlos would be placed in queue 0, since he has no calls in the history Also, some versions of this patch include the concept of a 'service user', which is a user that is always scheduled high-priority. Currently this seems redundant and will probably be removed in later patches, since it's not too useful.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10281) Create a scheduler, which assigns schedulables a priority level
[ https://issues.apache.org/jira/browse/HADOOP-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10281: -- Attachment: HADOOP-10281.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10281) Create a scheduler, which assigns schedulables a priority level
[ https://issues.apache.org/jira/browse/HADOOP-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10281: -- Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10281) Create a scheduler, which assigns schedulables a priority level
[ https://issues.apache.org/jira/browse/HADOOP-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093300#comment-14093300 ] Chris Li commented on HADOOP-10281: --- I didn't record it, but I can if you're interested. I suspect it'll be slightly worse than the minority user's latency in the LinkedBlockingQueue (since the resources have to come from somewhere).
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10281) Create a scheduler, which assigns schedulables a priority level
[ https://issues.apache.org/jira/browse/HADOOP-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093229#comment-14093229 ] Chris Li commented on HADOOP-10281: --- Hi [~arpitagarwal] that's correct. It's not very scientific, but it's a sanity check to make sure that the scheduler performs under various loads. The workloads are mapreduce jobs that coordinate to perform a DDoS attack on the namenode. Each job runs under 10 users, each job maps to 20 nodes, and spams the namenode using a varying number of threads. Rest: no load. Equal: 100 threads each. Balanced: 10, 20, 30..., 80, 90, 100 threads respectively. Majority: 100, then 1-2 for the rest. I think this is ready. I will post a patch shortly for CI
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10281) Create a scheduler, which assigns schedulables a priority level
[ https://issues.apache.org/jira/browse/HADOOP-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085690#comment-14085690 ] Chris Li commented on HADOOP-10281: --- Also, regarding the spikes in the above data: they are due to GC pauses on the namenode -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10281) Create a scheduler, which assigns schedulables a priority level
[ https://issues.apache.org/jira/browse/HADOOP-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085687#comment-14085687 ] Chris Li commented on HADOOP-10281: --- Sorry for the delay, I completed tests at scale and so far have found the decay scheduler to perform as expected. Attached is the minority user latency for three different load profiles + at rest (no load). default LinkedBlockingQueue in gray, FairCallQueue in color. !http://i.imgur.com/YzXB4Qp.png! !http://i.imgur.com/McnN1xf.png! !http://i.imgur.com/Kfs2Q8A.png! (At rest) !http://i.imgur.com/dEasold.png!
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10811) Allow classes to be reloaded at runtime
[ https://issues.apache.org/jira/browse/HADOOP-10811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070602#comment-14070602 ] Chris Li commented on HADOOP-10811: --- One thing worth discussing is also whether or not this is a useful feature, now that HA allows for rolling restarts. Not everyone is running HA today, but it may be encouraged to do so in the future for this ability > Allow classes to be reloaded at runtime > --- > > Key: HADOOP-10811 > URL: https://issues.apache.org/jira/browse/HADOOP-10811 > Project: Hadoop Common > Issue Type: New Feature > Components: conf >Affects Versions: 3.0.0 >Reporter: Chris Li >Assignee: Chris Li >Priority: Minor > > Currently hadoop loads its classes and caches them in the Configuration > class. Even if the user swaps a class's jar at runtime, hadoop will continue > to use the cached classes when using reflection to instantiate objects. This > limits the usefulness of things like HADOOP-10285, because the admin would > need to restart each time they wanted to change their queue class. > This patch is to add a way to refresh the class cache, by creating a new > refresh handler to do so (using HADOOP-10376) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10825) Refactor class creation logic in Configuration into nested class
[ https://issues.apache.org/jira/browse/HADOOP-10825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061995#comment-14061995 ] Chris Li commented on HADOOP-10825: --- Build failures should be unrelated > Refactor class creation logic in Configuration into nested class > > > Key: HADOOP-10825 > URL: https://issues.apache.org/jira/browse/HADOOP-10825 > Project: Hadoop Common > Issue Type: Sub-task > Components: conf >Reporter: Chris Li >Assignee: Chris Li >Priority: Minor > Attachments: HADOOP-10811-10825.patch > > > This first patch refactors class creation inside Configuration into a nested > class called ClassCreator (a pretty uninspired name; I’m up for better naming > suggestions). > Since this is a refactor that adds no new features, I have not included a > test. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10828) Allow user to reload classes
[ https://issues.apache.org/jira/browse/HADOOP-10828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10828: -- Attachment: HADOOP-10811-10828-preview.patch First, a disclaimer: I'm not knowledgeable enough in Java metaprogramming to know whether I've done something horribly wrong; this is an attempt at adding the feature. This patch adds runtime class reloading. The user submits a file URL (I haven't added support for network URLs yet) as an argument in a refresh request: hadoop dfsadmin -refresh nn ClassLoadPathManager.addPath file:/usr/lib/jars/new-jar-dir This adds the given URL to a set of URLs and causes all new Configuration objects to use a new URLClassLoader with the paths in the set of URLs the manager maintains. This works in my unit tests and in testing on my VM cluster. We instantiate a new SelectiveClassLoader each time to get around the limitation of classloaders caching their classes (and to bust the class cache in the ClassCreator). We have our subclass only load classes it knows in advance are reloadable, so that normal classes can be loaded by parent class loaders. This is a limitation of this approach that I'm not yet sure how to work around, short of reimplementing a ton of stuff in the native class loaders. For this patch preview, I've hardcoded some reloadable classes (but this would be a configurable property). Looking for feedback on this approach; thanks. > Allow user to reload classes > > > Key: HADOOP-10828 > URL: https://issues.apache.org/jira/browse/HADOOP-10828 > Project: Hadoop Common > Issue Type: Sub-task > Components: conf >Reporter: Chris Li >Assignee: Chris Li >Priority: Minor > Attachments: HADOOP-10811-10828-preview.patch > > > This patch should allow reloading classes at runtime. The software will > maintain a set of extra jars to load classes from. The interface for adding > and removing from this set will be done using the refresh protocol and a > RefreshHandler. 
> Considerations include: > - Which classes are eligible for reloading? > - Does reloading change existing classes? > - Does reloading affect classes created from existing Configuration objects? > Details will be added in comments for each patch preview. -- This message was sent by Atlassian JIRA (v6.2#6252)
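The "selective" reloading idea from the patch preview above can be sketched like this: a fresh URLClassLoader is created per reload so previously cached Class objects are bypassed, and only a known whitelist of class names is ever loaded child-first, with everything else delegated to the parent loader. The class name SelectiveClassLoader appears in the JIRA comment, but this body is an illustrative assumption, not the attached patch.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Set;

// Sketch of a child-first classloader restricted to a whitelist of
// "reloadable" classes. Non-whitelisted classes follow the normal
// parent-first delegation, so core classes stay shared and compatible.
public class SelectiveClassLoader extends URLClassLoader {
    private final Set<String> reloadable; // class names eligible for reloading

    public SelectiveClassLoader(URL[] paths, ClassLoader parent, Set<String> reloadable) {
        super(paths, parent);
        this.reloadable = reloadable;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            if (reloadable.contains(name)) {
                // Child-first: define the class from our own URLs so a swapped
                // jar takes effect; fall back to the parent if it is missing.
                try {
                    Class<?> c = findLoadedClass(name);
                    if (c == null) {
                        c = findClass(name);
                    }
                    if (resolve) {
                        resolveClass(c);
                    }
                    return c;
                } catch (ClassNotFoundException e) {
                    // not on the extra paths; delegate below
                }
            }
            return super.loadClass(name, resolve); // normal parent-first path
        }
    }
}
```

A whitelisted class that is absent from the extra paths still resolves through the parent, which matches the fallback behavior a preview patch would want.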
[jira] [Updated] (HADOOP-10825) Refactor class creation logic in Configuration into nested class
[ https://issues.apache.org/jira/browse/HADOOP-10825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10825: -- Attachment: HADOOP-10811-10825.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10825) Refactor class creation logic in Configuration into nested class
[ https://issues.apache.org/jira/browse/HADOOP-10825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10825: -- Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10828) Allow user to reload classes
[ https://issues.apache.org/jira/browse/HADOOP-10828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10828: -- Description: This patch should allow reloading classes at runtime. The software will maintain a set of extra jars to load classes from. The interface for adding and removing from this set will be done using the refresh protocol and a RefreshHandler. Considerations include: - Which classes are eligible for reloading? - Does reloading change existing classes? - Does reloading affect classes created from existing Configuration objects? Details will be added in comments for each patch preview. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HADOOP-10828) Allow user to reload classes
Chris Li created HADOOP-10828: - Summary: Allow user to reload classes Key: HADOOP-10828 URL: https://issues.apache.org/jira/browse/HADOOP-10828 Project: Hadoop Common Issue Type: Sub-task Reporter: Chris Li Assignee: Chris Li Priority: Minor -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HADOOP-10825) Refactor class creation logic in Configuration into nested class
Chris Li created HADOOP-10825: - Summary: Refactor class creation logic in Configuration into nested class Key: HADOOP-10825 URL: https://issues.apache.org/jira/browse/HADOOP-10825 Project: Hadoop Common Issue Type: Sub-task Reporter: Chris Li Assignee: Chris Li Priority: Minor This first patch refactors class creation inside Configuration into a nested class called ClassCreator (a pretty uninspired name; I’m up for better naming suggestions). Since this is a refactor that adds no new features, I have not included a test. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10281) Create a scheduler, which assigns schedulables a priority level
[ https://issues.apache.org/jira/browse/HADOOP-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061779#comment-14061779 ] Chris Li commented on HADOOP-10281: --- Hey guys, sorry for the delay; I was on vacation and have some stuff to catch up on before I can complete the new benchmarks. I don't think it should be committed until I have benchmarks at scale. To answer your questions, Eddy: 1. Yes, the latest design abandons tracking a fixed number of calls and instead tracks all calls over a fixed time period. 2. Larger decay periods will indeed make it less responsive to burst traffic. totalCount will increase over time, then get decayed along with the counts on each decay sweep. More aggressive decays (longer periods, greater factors) will make recent calls less affected by older calls. 3. Same, caching will reduce responsiveness to burst traffic. The change Arpit suggested would increase responsiveness, as would eliminating the schedule cache. This might not even make a big difference since it's done on different threads. I hope to have benchmarks up next week. > Create a scheduler, which assigns schedulables a priority level > --- > > Key: HADOOP-10281 > URL: https://issues.apache.org/jira/browse/HADOOP-10281 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Chris Li >Assignee: Chris Li > Attachments: HADOOP-10281-preview.patch, HADOOP-10281.patch, > HADOOP-10281.patch, HADOOP-10281.patch > > > The Scheduler decides which sub-queue to assign a given Call. It implements a > single method getPriorityLevel(Schedulable call) which returns an integer > corresponding to the subqueue the FairCallQueue should place the call in. > The HistoryRpcScheduler is one such implementation which uses the username of > each call and determines what % of calls in recent history were made by this > user. 
> It is configured with a historyLength (how many calls to track) and a list of > integer thresholds which determine the boundaries between priority levels. > For instance, if the scheduler has a historyLength of 8; and priority > thresholds of 4,2,1; and saw calls made by these users in order: > Alice, Bob, Alice, Alice, Bob, Jerry, Alice, Alice > * Another call by Alice would be placed in queue 3, since she has already > made >= 4 calls > * Another call by Bob would be placed in queue 2, since he has >= 2 but less > than 4 calls > * A call by Carlos would be placed in queue 0, since he has no calls in the > history > Also, some versions of this patch include the concept of a 'service user', > which is a user that is always scheduled high-priority. Currently this seems > redundant and will probably be removed in later patches, since its not too > useful. -- This message was sent by Atlassian JIRA (v6.2#6252)
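The counting-plus-decay scheme discussed in the comment above can be sketched as follows: per-user counts grow as calls arrive, and a periodic sweep multiplies every count (and the total) by a decay factor. Names and structure here are a hypothetical sketch of the idea, not the actual DecayRpcScheduler code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the decay idea: instead of a fixed-length call history, keep
// per-user counters and periodically decay them, so recent traffic dominates
// and idle users are eventually forgotten.
public class DecayCounterSketch {
    private final Map<String, Long> counts = new HashMap<>();
    private long totalCount = 0;
    private final double decayFactor; // e.g. 0.5 halves every count per sweep

    public DecayCounterSketch(double decayFactor) {
        this.decayFactor = decayFactor;
    }

    public void recordCall(String user) {
        counts.merge(user, 1L, Long::sum);
        totalCount++;
    }

    // Run periodically; the real scheduler would do this on a timer thread.
    public void decaySweep() {
        long newTotal = 0;
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            long decayed = (long) (e.getValue() * decayFactor);
            e.setValue(decayed);
            newTotal += decayed;
        }
        totalCount = newTotal;
        counts.values().removeIf(v -> v == 0); // forget idle users entirely
    }

    public long getCount(String user) {
        return counts.getOrDefault(user, 0L);
    }

    public long getTotal() {
        return totalCount;
    }
}
```

This also illustrates the responsiveness point above: a larger decay factor or a longer sweep period retains old counts longer, so the scheduler reacts more slowly to bursts.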
[jira] [Created] (HADOOP-10811) Allow classes to be reloaded at runtime
Chris Li created HADOOP-10811: - Summary: Allow classes to be reloaded at runtime Key: HADOOP-10811 URL: https://issues.apache.org/jira/browse/HADOOP-10811 Project: Hadoop Common Issue Type: New Feature Components: conf Affects Versions: 3.0.0 Reporter: Chris Li Assignee: Chris Li Priority: Minor Currently hadoop loads its classes and caches them in the Configuration class. Even if the user swaps a class's jar at runtime, hadoop will continue to use the cached classes when using reflection to instantiate objects. This limits the usefulness of things like HADOOP-10285, because the admin would need to restart each time they wanted to change their queue class. This patch is to add a way to refresh the class cache, by creating a new refresh handler to do so (using HADOOP-10376) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10281) Create a scheduler, which assigns schedulables a priority level
[ https://issues.apache.org/jira/browse/HADOOP-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10281: -- Attachment: HADOOP-10281-preview.patch Hi [~arpitagarwal], I've attached a preview of a new scheduler, the DecayRpcScheduler. This patch contains both schedulers, but presumably the DecayRpcScheduler would supplant the HistoryRpcScheduler in the final patch. Performance tests show that the HistoryRpcScheduler doesn't add statistically significant overhead (on my laptop) compared to both the DecayRpcScheduler and no scheduler. The next step is to measure performance on a real cluster. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10282) Create a FairCallQueue: a multi-level call queue which schedules incoming calls and multiplexes outgoing calls
[ https://issues.apache.org/jira/browse/HADOOP-10282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10282: -- Attachment: HADOOP-10282.patch Hi Arpit, 1. LinkedBlockingQueue will be on by default; this can be enabled in the config. 2. Because there can be multiple RPC servers, we need to have the configs specify which server. It was [~sureshms]'s idea to use the port number to do this. 3. I'm not sure where the right place will be to document this. 4. Whoops, fixed in this new patch. Turns out that dropping calls is a terrible thing for the client. 5. Do you mean to .put() on the high priority queue? That would allow the current user to be serviced the fastest, but it might cause starvation of other requests if the queue is constantly waiting to put on queue 0. 6. Thanks. 7. Suppose so... I think prior versions of this did use that lock recursively. Any preferences? 8, 9. Actually take() will never be used in Hadoop; maybe it shouldn't even be implemented? The CallQueueManager uses poll() with a timeout. I don't remember why take was so bad; it might be possible to take(), block, and never get a result even though something is in the queue... 10. Sounds good. 11. As you know I can't spell :) Uploaded a new patch to fix some of these issues. > Create a FairCallQueue: a multi-level call queue which schedules incoming > calls and multiplexes outgoing calls > -- > > Key: HADOOP-10282 > URL: https://issues.apache.org/jira/browse/HADOOP-10282 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Chris Li >Assignee: Chris Li > Attachments: HADOOP-10282.patch, HADOOP-10282.patch > > > The FairCallQueue ensures quality of service by altering the order of RPC > calls internally. > It consists of three parts: > 1. a Scheduler (`HistoryRpcScheduler` is provided) which provides a priority > number from 0 to N (0 being highest priority) > 2. 
a Multi-level queue (residing in `FairCallQueue`) which provides a way to > keep calls in priority order internally > 3. a Multiplexer (`WeightedRoundRobinMultiplexer` is provided) which provides > logic to control which queue to take from > Currently the Mux and Scheduler are not pluggable, but they probably should > be (up for discussion). > This is how it is used: > // Production > 1. Call is created and given to the CallQueueManager > 2. CallQueueManager requests a `put(T call)` into the `FairCallQueue` which > implements `BlockingQueue` > 3. `FairCallQueue` asks its scheduler for a scheduling decision, which is an > integer e.g. 12 > 4. `FairCallQueue` inserts Call into the 12th queue: > `queues.get(12).put(call)` > // Consumption > 1. CallQueueManager requests `take()` or `poll()` on FairCallQueue > 2. `FairCallQueue` asks its multiplexer for which queue to draw from, which > will also be an integer e.g. 2 > 3. `FairCallQueue` draws from this queue if it has an available call (or > tries other queues if it is empty) > Additional information is available in the linked JIRAs regarding the > Scheduler and Multiplexer's roles. -- This message was sent by Atlassian JIRA (v6.2#6252)
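The production/consumption flow listed above can be sketched as follows: put() asks a scheduler for a priority level and offers into that sub-queue, while poll() asks a multiplexer which sub-queue to draw from first, falling back to other queues when that one is empty. The interfaces here are simplified stand-ins; the real FairCallQueue implements BlockingQueue and blocks when full or empty.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch of the FairCallQueue flow: scheduler decides where a call goes in,
// multiplexer decides where the next call comes out.
public class FairCallQueueSketch<T> {
    interface Scheduler<T> { int getPriorityLevel(T call); }
    interface Multiplexer { int getAndAdvanceCurrentIndex(); }

    private final List<ConcurrentLinkedQueue<T>> queues = new ArrayList<>();
    private final Scheduler<T> scheduler;
    private final Multiplexer mux;

    public FairCallQueueSketch(int levels, Scheduler<T> scheduler, Multiplexer mux) {
        for (int i = 0; i < levels; i++) {
            queues.add(new ConcurrentLinkedQueue<T>());
        }
        this.scheduler = scheduler;
        this.mux = mux;
    }

    public void put(T call) {
        // Scheduling decision: which priority sub-queue receives this call.
        queues.get(scheduler.getPriorityLevel(call)).offer(call);
    }

    public T poll() {
        // Mux decision: which sub-queue to try first this round; if it is
        // empty, try the remaining queues in order.
        int start = mux.getAndAdvanceCurrentIndex();
        for (int i = 0; i < queues.size(); i++) {
            T call = queues.get((start + i) % queues.size()).poll();
            if (call != null) {
                return call;
            }
        }
        return null; // all sub-queues empty
    }
}
```

A tiny usage example: with a scheduler that sends "hi-" calls to queue 0 and everything else to queue 1, and a mux that always picks queue 0 first, high-priority calls come out ahead of earlier low-priority ones.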
[jira] [Updated] (HADOOP-10281) Create a scheduler, which assigns schedulables a priority level
[ https://issues.apache.org/jira/browse/HADOOP-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10281: -- Attachment: (was: HADOOP-10281.patch) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10281) Create a scheduler, which assigns schedulables a priority level
[ https://issues.apache.org/jira/browse/HADOOP-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10281: -- Attachment: HADOOP-10281.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10281) Create a scheduler, which assigns schedulables a priority level
[ https://issues.apache.org/jira/browse/HADOOP-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10281: -- Attachment: HADOOP-10281.patch Hi Arpit, I agree on #1; I've had this idea sitting around in my head for a while too, of having just a map of counts that decays every few seconds. I will try this out and see how performance compares. HADOOP-10286 allows performance to be measured for different user loads in the RPCCallBenchmark, so I will use that to benchmark the performance hit. Other: {quote} If there is not enough traffic to flush the callHistory, sporadic users could end up with counts greater than the heartbeats {quote} What can happen is: if I'm the only user on the cluster and I make a call every second, I shouldn't be punished because I'm not hitting the NN hard. However, over 1000 seconds I will fill up the callHistory, thus being placed in low priority. Heartbeats from datanodes will be placed in a higher queue than me, and now I'm being punished. This issue is fixed if we add a decay constant and make it dependent on time. {quote} I also think we are providing too many configuration knobs with this feature. Hadoop performance tuning is quite complex already and additional settings add administrator and debugging overhead {quote} Agreed that this is hard to tune; it's still pretty experimental, so we want to be able to adjust these things. From what I understand, [~mingma] has modified the scheduler to make configuration extremely easy for the administrator, who specifies a target queue utilization. The scheduler and mux work together to hit this target. {quote} Is there any analysis behind choosing this bisection approach? {quote} Sampling from real-world performance in our clusters shows that the usage tiers between different users is exponential, and so the log(usage) is linear. 
Some of this data is on the 5th slide here: https://issues.apache.org/jira/secure/attachment/12616864/NN-denial-of-service-updated-plan.pdf (may be hard to see in pie graph form). However, we've left this configurable since different clusters may have different usage. {quote} "Why do we support multiple identity providers if we discard all but the first?" {quote} I found a method `conf.getInstances()` but no `conf.getInstance()`; not sure if it warrants patching Configuration.java. I've uploaded a new version of the patch with code style fixes, but I'm going to work on using a counting+decay approach for the next patch. -- This message was sent by Atlassian JIRA (v6.2#6252)
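The "bisection" idea discussed in the comment above — since log(usage) is roughly linear across users, place priority boundaries at halving shares of total traffic (50%, 25%, 12.5%, ...) — can be sketched as follows. The method name and structure are hypothetical illustrations, not the shipped scheduler code.

```java
// Sketch of bisection thresholds: a user's share of total recent traffic is
// compared against halving boundaries; heavier users land in higher-numbered
// (lower-priority) queues, light users stay in queue 0.
public class BisectionThresholds {
    // numLevels = 4 gives boundaries 0.5, 0.25, 0.125 separating levels 3/2/1/0.
    public static int computePriorityLevel(double usageShare, int numLevels) {
        double threshold = 0.5;
        for (int level = numLevels - 1; level > 0; level--) {
            if (usageShare >= threshold) {
                return level; // heavy user -> low-priority queue
            }
            threshold /= 2;
        }
        return 0; // light user -> highest priority
    }
}
```

With four levels, a user generating 60% of traffic maps to queue 3, 30% to queue 2, 20% to queue 1, and 5% to queue 0, which is why an exponential usage distribution spreads users evenly across queues.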
[jira] [Updated] (HADOOP-10279) Create multiplexer, a requirement for the fair queue
[ https://issues.apache.org/jira/browse/HADOOP-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10279: -- Attachment: (was: HADOOP-10279.patch) > Create multiplexer, a requirement for the fair queue > > > Key: HADOOP-10279 > URL: https://issues.apache.org/jira/browse/HADOOP-10279 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Chris Li >Assignee: Chris Li > Attachments: HADOOP-10279.patch, HADOOP-10279.patch, > WeightedRoundRobinMultiplexer.java, subtask2_add_mux.patch > > > The Multiplexer helps the FairCallQueue decide which of its internal > sub-queues to read from during a poll() or take(). It controls the penalty of > being in a lower queue. Without the mux, the FairCallQueue would have issues > with starvation of low-priority requests. > The WeightedRoundRobinMultiplexer is an implementation which uses a weighted > round robin approach to muxing the sub-queues. It is configured with an > integer list pattern. > For example: 10, 5, 5, 2 means: > * Read queue 0 10 times > * Read queue 1 5 times > * Read queue 2 5 times > * Read queue 3 2 times > * Repeat -- This message was sent by Atlassian JIRA (v6.2#6252)
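The weighted round-robin pattern described above can be sketched as follows: with weights 10, 5, 5, 2, queue 0 is read 10 times, then queue 1 five times, and so on before the cycle repeats. The class mirrors the WeightedRoundRobinMultiplexer named in the JIRA, but this body is a simplified illustration, not the patch itself.

```java
// Sketch of a weighted round-robin multiplexer: each queue index is returned
// weights[i] times per cycle before moving on to the next queue.
public class WrrMuxSketch {
    private final int[] weights;   // reads granted per queue per cycle
    private int currentQueue = 0;  // queue index currently being drained
    private int drawsLeft;         // reads remaining for the current queue

    public WrrMuxSketch(int[] weights) {
        this.weights = weights;
        this.drawsLeft = weights[0];
    }

    // Returns the queue index to read from next, then advances the schedule.
    public int getAndAdvanceCurrentIndex() {
        int index = currentQueue;
        if (--drawsLeft == 0) {
            currentQueue = (currentQueue + 1) % weights.length;
            drawsLeft = weights[currentQueue];
        }
        return index;
    }
}
```

Because every queue gets a guaranteed share of draws per cycle, low-priority queues are penalized but never starved, which is the property the description above calls out. (This sketch assumes all weights are positive.)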
[jira] [Updated] (HADOOP-10279) Create multiplexer, a requirement for the fair queue
[ https://issues.apache.org/jira/browse/HADOOP-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10279: -- Attachment: HADOOP-10279.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10279) Create multiplexer, a requirement for the fair queue
[ https://issues.apache.org/jira/browse/HADOOP-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10279: -- Attachment: HADOOP-10279.patch Thanks for the feedback, Arpit. Uploaded a new patch with the changes. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10282) Create a FairCallQueue: a multi-level call queue which schedules incoming calls and multiplexes outgoing calls
[ https://issues.apache.org/jira/browse/HADOOP-10282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032938#comment-14032938 ] Chris Li commented on HADOOP-10282: --- Hi Arpit, Check out HADOOP-10278, which allows the default LinkedBlockingQueue to be swapped with any implementor of BlockingQueue. The FairCallQueue is one such implementor of BlockingQueue. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10279) Create multiplexer, a requirement for the fair queue
[ https://issues.apache.org/jira/browse/HADOOP-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030257#comment-14030257 ] Chris Li commented on HADOOP-10279: --- Added comments to and linked all 3 parts of the FairCallQueue. They are intended to be pluggable, but I haven't gotten around to that yet since we're still experimenting. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HADOOP-10281) Create a scheduler, which assigns schedulables a priority level
[ https://issues.apache.org/jira/browse/HADOOP-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li reassigned HADOOP-10281: - Assignee: Chris Li > Create a scheduler, which assigns schedulables a priority level > --- > > Key: HADOOP-10281 > URL: https://issues.apache.org/jira/browse/HADOOP-10281 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Chris Li >Assignee: Chris Li > Attachments: HADOOP-10281.patch, HADOOP-10281.patch > > > The Scheduler decides which sub-queue to assign a given Call. It implements a > single method getPriorityLevel(Schedulable call) which returns an integer > corresponding to the subqueue the FairCallQueue should place the call in. > The HistoryRpcScheduler is one such implementation which uses the username of > each call and determines what % of calls in recent history were made by this > user. > It is configured with a historyLength (how many calls to track) and a list of > integer thresholds which determine the boundaries between priority levels. > For instance, if the scheduler has a historyLength of 8; and priority > thresholds of 4,2,1; and saw calls made by these users in order: > Alice, Bob, Alice, Alice, Bob, Jerry, Alice, Alice > * Another call by Alice would be placed in queue 3, since she has already > made >= 4 calls > * Another call by Bob would be placed in queue 2, since he has >= 2 but less > than 4 calls > * A call by Carlos would be placed in queue 0, since he has no calls in the > history > Also, some versions of this patch include the concept of a 'service user', > which is a user that is always scheduled high-priority. Currently this seems > redundant and will probably be removed in later patches, since its not too > useful. -- This message was sent by Atlassian JIRA (v6.2#6252)
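The Alice/Bob/Carlos example above can be sketched as follows. This is an illustrative re-implementation of the thresholding logic, not the `HistoryRpcScheduler` itself; the class name and the choice to record the caller on each scheduling decision are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch of history-based scheduling: track the last
// `historyLength` callers; a user's priority level is the number of
// thresholds their recent call count meets or exceeds (0 = highest
// priority, i.e. the least-active users).
class MiniHistoryScheduler {
    private final int historyLength;
    private final int[] thresholds;  // descending, e.g. {4, 2, 1}
    private final Deque<String> history = new ArrayDeque<>();

    MiniHistoryScheduler(int historyLength, int... thresholds) {
        this.historyLength = historyLength;
        this.thresholds = thresholds.clone();
    }

    // Count the user's calls in recent history, map that count to a
    // priority level, then record this call in the history.
    int getPriorityLevel(String user) {
        long count = history.stream().filter(user::equals).count();
        int level = 0;
        for (int t : thresholds) {
            if (count >= t) {
                level++;
            }
        }
        record(user);
        return level;
    }

    private void record(String user) {
        history.addLast(user);
        if (history.size() > historyLength) {
            history.removeFirst();  // evict the oldest call
        }
    }
}
```

With historyLength 8 and thresholds 4, 2, 1 this reproduces the example: Alice (5 recent calls) lands in queue 3, Bob (2 calls) in queue 2, and Carlos (no history) in queue 0.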
[jira] [Assigned] (HADOOP-10282) Create a FairCallQueue: a multi-level call queue which schedules incoming calls and multiplexes outgoing calls
[ https://issues.apache.org/jira/browse/HADOOP-10282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li reassigned HADOOP-10282: - Assignee: Chris Li > Create a FairCallQueue: a multi-level call queue which schedules incoming > calls and multiplexes outgoing calls > -- > > Key: HADOOP-10282 > URL: https://issues.apache.org/jira/browse/HADOOP-10282 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Chris Li >Assignee: Chris Li > Attachments: HADOOP-10282.patch > > > The FairCallQueue ensures quality of service by altering the order of RPC > calls internally. > It consists of three parts: > 1. a Scheduler (`HistoryRpcScheduler` is provided) which provides a priority > number from 0 to N (0 being highest priority) > 2. a Multi-level queue (residing in `FairCallQueue`) which provides a way to > keep calls in priority order internally > 3. a Multiplexer (`WeightedRoundRobinMultiplexer` is provided) which provides > logic to control which queue to take from > Currently the Mux and Scheduler are not pluggable, but they probably should > be (up for discussion). > This is how it is used: > // Production > 1. Call is created and given to the CallQueueManager > 2. CallQueueManager requests a `put(T call)` into the `FairCallQueue` which > implements `BlockingQueue` > 3. `FairCallQueue` asks its scheduler for a scheduling decision, which is an > integer e.g. 12 > 4. `FairCallQueue` inserts Call into the 12th queue: > `queues.get(12).put(call)` > // Consumption > 1. CallQueueManager requests `take()` or `poll()` on FairCallQueue > 2. `FairCallQueue` asks its multiplexer for which queue to draw from, which > will also be an integer e.g. 2 > 3. `FairCallQueue` draws from this queue if it has an available call (or > tries other queues if it is empty) > Additional information is available in the linked JIRAs regarding the > Scheduler and Multiplexer's roles. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10281) Create a scheduler, which assigns schedulables a priority level
[ https://issues.apache.org/jira/browse/HADOOP-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10281: -- Description: The Scheduler decides which sub-queue to assign a given Call. It implements a single method getPriorityLevel(Schedulable call) which returns an integer corresponding to the subqueue the FairCallQueue should place the call in. The HistoryRpcScheduler is one such implementation which uses the username of each call and determines what % of calls in recent history were made by this user. It is configured with a historyLength (how many calls to track) and a list of integer thresholds which determine the boundaries between priority levels. For instance, if the scheduler has a historyLength of 8; and priority thresholds of 4,2,1; and saw calls made by these users in order: Alice, Bob, Alice, Alice, Bob, Jerry, Alice, Alice * Another call by Alice would be placed in queue 3, since she has already made >= 4 calls * Another call by Bob would be placed in queue 2, since he has >= 2 but less than 4 calls * A call by Carlos would be placed in queue 0, since he has no calls in the history Also, some versions of this patch include the concept of a 'service user', which is a user that is always scheduled high-priority. Currently this seems redundant and will probably be removed in later patches, since its not too useful. was: The Scheduler decides which sub-queue to assign a given Call. It implements a single method getPriorityLevel(Schedulable call) which returns an integer corresponding to the subqueue the FairCallQueue should place the call in. The HistoryRpcScheduler is one such implementation which uses the username of each call and determines what % of calls in recent history were made by this user. It is configured with a historyLength (how many calls to track) and a list of integer thresholds which determine the boundaries between priority levels. 
For instance, if the scheduler has a historyLength of 8; and priority thresholds of 4,2,1; and saw calls made by these users in order: Alice, Bob, Alice, Alice, Bob, Jerry, Alice, Alice * Another call by Alice would be placed in queue 3, since she has already made >= 4 calls * Another call by Bob would be placed in queue 2, since he has >= 2 but less than 4 calls * A call by Carlos would be placed in queue 0, since he has no calls in the history > Create a scheduler, which assigns schedulables a priority level > --- > > Key: HADOOP-10281 > URL: https://issues.apache.org/jira/browse/HADOOP-10281 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Chris Li > Attachments: HADOOP-10281.patch, HADOOP-10281.patch > > > The Scheduler decides which sub-queue to assign a given Call. It implements a > single method getPriorityLevel(Schedulable call) which returns an integer > corresponding to the subqueue the FairCallQueue should place the call in. > The HistoryRpcScheduler is one such implementation which uses the username of > each call and determines what % of calls in recent history were made by this > user. > It is configured with a historyLength (how many calls to track) and a list of > integer thresholds which determine the boundaries between priority levels. > For instance, if the scheduler has a historyLength of 8; and priority > thresholds of 4,2,1; and saw calls made by these users in order: > Alice, Bob, Alice, Alice, Bob, Jerry, Alice, Alice > * Another call by Alice would be placed in queue 3, since she has already > made >= 4 calls > * Another call by Bob would be placed in queue 2, since he has >= 2 but less > than 4 calls > * A call by Carlos would be placed in queue 0, since he has no calls in the > history > Also, some versions of this patch include the concept of a 'service user', > which is a user that is always scheduled high-priority. Currently this seems > redundant and will probably be removed in later patches, since its not too > useful. 
[jira] [Updated] (HADOOP-10281) Create a scheduler, which assigns schedulables a priority level
[ https://issues.apache.org/jira/browse/HADOOP-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10281: -- Description: The Scheduler decides which sub-queue to assign a given Call. It implements a single method getPriorityLevel(Schedulable call) which returns an integer corresponding to the subqueue the FairCallQueue should place the call in. The HistoryRpcScheduler is one such implementation which uses the username of each call and determines what % of calls in recent history were made by this user. It is configured with a historyLength (how many calls to track) and a list of integer thresholds which determine the boundaries between priority levels. For instance, if the scheduler has a historyLength of 8; and priority thresholds of 4,2,1; and saw calls made by these users in order: Alice, Bob, Alice, Alice, Bob, Jerry, Alice, Alice * Another call by Alice would be placed in queue 3, since she has already made >= 4 calls * Another call by Bob would be placed in queue 2, since he has >= 2 but less than 4 calls * A call by Carlos would be placed in queue 0, since he has no calls in the history was:The Scheduler decides which sub-queue to assign a given Call. > Create a scheduler, which assigns schedulables a priority level > --- > > Key: HADOOP-10281 > URL: https://issues.apache.org/jira/browse/HADOOP-10281 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Chris Li > Attachments: HADOOP-10281.patch, HADOOP-10281.patch > > > The Scheduler decides which sub-queue to assign a given Call. It implements a > single method getPriorityLevel(Schedulable call) which returns an integer > corresponding to the subqueue the FairCallQueue should place the call in. > The HistoryRpcScheduler is one such implementation which uses the username of > each call and determines what % of calls in recent history were made by this > user. 
> It is configured with a historyLength (how many calls to track) and a list of > integer thresholds which determine the boundaries between priority levels. > For instance, if the scheduler has a historyLength of 8; and priority > thresholds of 4,2,1; and saw calls made by these users in order: > Alice, Bob, Alice, Alice, Bob, Jerry, Alice, Alice > * Another call by Alice would be placed in queue 3, since she has already > made >= 4 calls > * Another call by Bob would be placed in queue 2, since he has >= 2 but less > than 4 calls > * A call by Carlos would be placed in queue 0, since he has no calls in the > history -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10279) Create multiplexer, a requirement for the fair queue
[ https://issues.apache.org/jira/browse/HADOOP-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10279: -- Description: The Multiplexer helps the FairCallQueue decide which of its internal sub-queues to read from during a poll() or take(). It controls the penalty of being in a lower queue. Without the mux, the FairCallQueue would have issues with starvation of low-priority requests. The WeightedRoundRobinMultiplexer is an implementation which uses a weighted round robin approach to muxing the sub-queues. It is configured with an integer list pattern. For example: 10, 5, 5, 2 means: * Read queue 0 10 times * Read queue 1 5 times * Read queue 2 5 times * Read queue 3 2 times * Repeat was: The Multiplexer helps the FairCallQueue decide which of its internal sub-queues to read from during a `poll()` or `take()`. It controls the "penalty" of being in a lower queue. Without the mux, the FairCallQueue would have issues with starvation of low-priority requests. The WeightedRoundRobinMultiplexer is an implementation which uses a weighted round robin approach to muxing the sub-queues. It is configured with an integer list pattern. For example: 10, 5, 5, 2 means: * Read queue 0 10 times * Read queue 1 5 times * Read queue 2 5 times * Read queue 3 2 times * Repeat > Create multiplexer, a requirement for the fair queue > > > Key: HADOOP-10279 > URL: https://issues.apache.org/jira/browse/HADOOP-10279 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Chris Li >Assignee: Chris Li > Attachments: HADOOP-10279.patch, WeightedRoundRobinMultiplexer.java, > subtask2_add_mux.patch > > > The Multiplexer helps the FairCallQueue decide which of its internal > sub-queues to read from during a poll() or take(). It controls the penalty of > being in a lower queue. Without the mux, the FairCallQueue would have issues > with starvation of low-priority requests. 
> The WeightedRoundRobinMultiplexer is an implementation which uses a weighted > round robin approach to muxing the sub-queues. It is configured with an > integer list pattern. > For example: 10, 5, 5, 2 means: > * Read queue 0 10 times > * Read queue 1 5 times > * Read queue 2 5 times > * Read queue 3 2 times > * Repeat -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10281) Create a scheduler, which assigns schedulables a priority level
[ https://issues.apache.org/jira/browse/HADOOP-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10281: -- Description: The Scheduler decides which sub-queue to assign a given Call. > Create a scheduler, which assigns schedulables a priority level > --- > > Key: HADOOP-10281 > URL: https://issues.apache.org/jira/browse/HADOOP-10281 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Chris Li > Attachments: HADOOP-10281.patch, HADOOP-10281.patch > > > The Scheduler decides which sub-queue to assign a given Call. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10279) Create multiplexer, a requirement for the fair queue
[ https://issues.apache.org/jira/browse/HADOOP-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10279: -- Description: The Multiplexer helps the FairCallQueue decide which of its internal sub-queues to read from during a `poll()` or `take()`. It controls the "penalty" of being in a lower queue. Without the mux, the FairCallQueue would have issues with starvation of low-priority requests. The WeightedRoundRobinMultiplexer is an implementation which uses a weighted round robin approach to muxing the sub-queues. It is configured with an integer list pattern. For example: 10, 5, 5, 2 means: * Read queue 0 10 times * Read queue 1 5 times * Read queue 2 5 times * Read queue 3 2 times * Repeat > Create multiplexer, a requirement for the fair queue > > > Key: HADOOP-10279 > URL: https://issues.apache.org/jira/browse/HADOOP-10279 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Chris Li >Assignee: Chris Li > Attachments: HADOOP-10279.patch, WeightedRoundRobinMultiplexer.java, > subtask2_add_mux.patch > > > The Multiplexer helps the FairCallQueue decide which of its internal > sub-queues to read from during a `poll()` or `take()`. It controls the > "penalty" of being in a lower queue. Without the mux, the FairCallQueue would > have issues with starvation of low-priority requests. > The WeightedRoundRobinMultiplexer is an implementation which uses a weighted > round robin approach to muxing the sub-queues. It is configured with an > integer list pattern. > For example: 10, 5, 5, 2 means: > * Read queue 0 10 times > * Read queue 1 5 times > * Read queue 2 5 times > * Read queue 3 2 times > * Repeat -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10282) Create a FairCallQueue: a multi-level call queue which schedules incoming calls and multiplexes outgoing calls
[ https://issues.apache.org/jira/browse/HADOOP-10282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10282: -- Description: The FairCallQueue ensures quality of service by altering the order of RPC calls internally. It consists of three parts: 1. a Scheduler (`HistoryRpcScheduler` is provided) which provides a priority number from 0 to N (0 being highest priority) 2. a Multi-level queue (residing in `FairCallQueue`) which provides a way to keep calls in priority order internally 3. a Multiplexer (`WeightedRoundRobinMultiplexer` is provided) which provides logic to control which queue to take from Currently the Mux and Scheduler are not pluggable, but they probably should be (up for discussion). This is how it is used: // Production 1. Call is created and given to the CallQueueManager 2. CallQueueManager requests a `put(T call)` into the `FairCallQueue` which implements `BlockingQueue` 3. `FairCallQueue` asks its scheduler for a scheduling decision, which is an integer e.g. 12 4. `FairCallQueue` inserts Call into the 12th queue: `queues.get(12).put(call)` // Consumption 1. CallQueueManager requests `take()` or `poll()` on FairCallQueue 2. `FairCallQueue` asks its multiplexer for which queue to draw from, which will also be an integer e.g. 2 3. `FairCallQueue` draws from this queue if it has an available call (or tries other queues if it is empty) Additional information is available in the linked JIRAs regarding the Scheduler and Multiplexer's roles. > Create a FairCallQueue: a multi-level call queue which schedules incoming > calls and multiplexes outgoing calls > -- > > Key: HADOOP-10282 > URL: https://issues.apache.org/jira/browse/HADOOP-10282 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Chris Li > Attachments: HADOOP-10282.patch > > > The FairCallQueue ensures quality of service by altering the order of RPC > calls internally. > It consists of three parts: > 1. 
a Scheduler (`HistoryRpcScheduler` is provided) which provides a priority > number from 0 to N (0 being highest priority) > 2. a Multi-level queue (residing in `FairCallQueue`) which provides a way to > keep calls in priority order internally > 3. a Multiplexer (`WeightedRoundRobinMultiplexer` is provided) which provides > logic to control which queue to take from > Currently the Mux and Scheduler are not pluggable, but they probably should > be (up for discussion). > This is how it is used: > // Production > 1. Call is created and given to the CallQueueManager > 2. CallQueueManager requests a `put(T call)` into the `FairCallQueue` which > implements `BlockingQueue` > 3. `FairCallQueue` asks its scheduler for a scheduling decision, which is an > integer e.g. 12 > 4. `FairCallQueue` inserts Call into the 12th queue: > `queues.get(12).put(call)` > // Consumption > 1. CallQueueManager requests `take()` or `poll()` on FairCallQueue > 2. `FairCallQueue` asks its multiplexer for which queue to draw from, which > will also be an integer e.g. 2 > 3. `FairCallQueue` draws from this queue if it has an available call (or > tries other queues if it is empty) > Additional information is available in the linked JIRAs regarding the > Scheduler and Multiplexer's roles. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10279) Create multiplexer, a requirement for the fair queue
[ https://issues.apache.org/jira/browse/HADOOP-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030242#comment-14030242 ] Chris Li commented on HADOOP-10279: --- I'll put a writeup in HADOOP-10282 and let you know when it's done > Create multiplexer, a requirement for the fair queue > > > Key: HADOOP-10279 > URL: https://issues.apache.org/jira/browse/HADOOP-10279 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Chris Li >Assignee: Chris Li > Attachments: HADOOP-10279.patch, WeightedRoundRobinMultiplexer.java, > subtask2_add_mux.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10279) Create multiplexer, a requirement for the fair queue
[ https://issues.apache.org/jira/browse/HADOOP-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030236#comment-14030236 ] Chris Li commented on HADOOP-10279: --- Hi Arpit, sorry for the lack of context; the mux is a small component of the FairCallQueue here: https://issues.apache.org/jira/browse/HADOOP-10282 It could even exist as a nested class, but it might be useful to make it swappable to enable different behaviors. > Create multiplexer, a requirement for the fair queue > > > Key: HADOOP-10279 > URL: https://issues.apache.org/jira/browse/HADOOP-10279 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Chris Li >Assignee: Chris Li > Attachments: HADOOP-10279.patch, WeightedRoundRobinMultiplexer.java, > subtask2_add_mux.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10376) Refactor refresh*Protocols into a single generic refreshConfigProtocol
[ https://issues.apache.org/jira/browse/HADOOP-10376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028751#comment-14028751 ] Chris Li commented on HADOOP-10376: --- Hmm, how about duplicating the functionality, then? So that users can begin to use the new command (which may uncover shortcomings and needed improvements). > Refactor refresh*Protocols into a single generic refreshConfigProtocol > -- > > Key: HADOOP-10376 > URL: https://issues.apache.org/jira/browse/HADOOP-10376 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Chris Li >Assignee: Chris Li >Priority: Minor > Fix For: 3.0.0, 2.5.0 > > Attachments: HADOOP-10376.patch, HADOOP-10376.patch, > HADOOP-10376.patch, HADOOP-10376.patch, HADOOP-10376.patch, > RefreshFrameworkProposal.pdf > > > See https://issues.apache.org/jira/browse/HADOOP-10285 > There are starting to be too many refresh*Protocols We can refactor them to > use a single protocol with a variable payload to choose what to do. > Thereafter, we can return an indication of success or failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10685) Migrate Standalone Refresh Protocols to the GenericRefreshProto
[ https://issues.apache.org/jira/browse/HADOOP-10685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10685: -- Description: Now that we have a GenericRefreshProtocol, we should migrate existing protocols towards it. First, we will duplicate the functionality. If all goes well we can mark the old methods as deprecated, and remove them later. > Migrate Standalone Refresh Protocols to the GenericRefreshProto > --- > > Key: HADOOP-10685 > URL: https://issues.apache.org/jira/browse/HADOOP-10685 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Chris Li >Assignee: Chris Li >Priority: Minor > > Now that we have a GenericRefreshProtocol, we should migrate existing > protocols towards it. > First, we will duplicate the functionality. > If all goes well we can mark the old methods as deprecated, and remove them > later. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10685) Migrate Standalone Refresh Protocols to the GenericRefreshProto
[ https://issues.apache.org/jira/browse/HADOOP-10685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10685: -- Summary: Migrate Standalone Refresh Protocols to the GenericRefreshProto (was: Move RefreshCallQueue from its own protocol to the GenericRefreshProto) > Migrate Standalone Refresh Protocols to the GenericRefreshProto > --- > > Key: HADOOP-10685 > URL: https://issues.apache.org/jira/browse/HADOOP-10685 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Chris Li >Assignee: Chris Li >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10376) Refactor refresh*Protocols into a single generic refreshConfigProtocol
[ https://issues.apache.org/jira/browse/HADOOP-10376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028731#comment-14028731 ] Chris Li commented on HADOOP-10376: --- Thanks for reviewing! I think the next step is to start moving other refreshProtos towards the generic proto, marking things as deprecated along the way. I'll start with RefreshCallQueue since it's the newest. HADOOP-10685 to track > Refactor refresh*Protocols into a single generic refreshConfigProtocol > -- > > Key: HADOOP-10376 > URL: https://issues.apache.org/jira/browse/HADOOP-10376 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Chris Li >Assignee: Chris Li >Priority: Minor > Fix For: 3.0.0, 2.5.0 > > Attachments: HADOOP-10376.patch, HADOOP-10376.patch, > HADOOP-10376.patch, HADOOP-10376.patch, HADOOP-10376.patch, > RefreshFrameworkProposal.pdf > > > See https://issues.apache.org/jira/browse/HADOOP-10285 > There are starting to be too many refresh*Protocols We can refactor them to > use a single protocol with a variable payload to choose what to do. > Thereafter, we can return an indication of success or failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HADOOP-10685) Move RefreshCallQueue from its own protocol to the GenericRefreshProto
Chris Li created HADOOP-10685: - Summary: Move RefreshCallQueue from its own protocol to the GenericRefreshProto Key: HADOOP-10685 URL: https://issues.apache.org/jira/browse/HADOOP-10685 Project: Hadoop Common Issue Type: Improvement Reporter: Chris Li Assignee: Chris Li Priority: Minor -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10376) Refactor refresh*Protocols into a single generic refreshConfigProtocol
[ https://issues.apache.org/jira/browse/HADOOP-10376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10376: -- Attachment: HADOOP-10376.patch Ah oops, I accidentally reverted that change with git on my end. Thanks for catching that. `identifier` should now be optional > Refactor refresh*Protocols into a single generic refreshConfigProtocol > -- > > Key: HADOOP-10376 > URL: https://issues.apache.org/jira/browse/HADOOP-10376 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Chris Li >Assignee: Chris Li >Priority: Minor > Attachments: HADOOP-10376.patch, HADOOP-10376.patch, > HADOOP-10376.patch, HADOOP-10376.patch, HADOOP-10376.patch, > RefreshFrameworkProposal.pdf > > > See https://issues.apache.org/jira/browse/HADOOP-10285 > There are starting to be too many refresh*Protocols We can refactor them to > use a single protocol with a variable payload to choose what to do. > Thereafter, we can return an indication of success or failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10376) Refactor refresh*Protocols into a single generic refreshConfigProtocol
[ https://issues.apache.org/jira/browse/HADOOP-10376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10376: -- Attachment: HADOOP-10376.patch [~benoyantony] good catch, made mutators synchronized. Also on name, I'm okay with RefreshHandlerRegistry if people think it's more clear. I do like that RefreshRegistry is concise though [~wuzesheng] sounds like a good idea when we replace old refreshprotos in later patches with rewritten ones. > Refactor refresh*Protocols into a single generic refreshConfigProtocol > -- > > Key: HADOOP-10376 > URL: https://issues.apache.org/jira/browse/HADOOP-10376 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Chris Li >Assignee: Chris Li >Priority: Minor > Attachments: HADOOP-10376.patch, HADOOP-10376.patch, > HADOOP-10376.patch, HADOOP-10376.patch, RefreshFrameworkProposal.pdf > > > See https://issues.apache.org/jira/browse/HADOOP-10285 > There are starting to be too many refresh*Protocols We can refactor them to > use a single protocol with a variable payload to choose what to do. > Thereafter, we can return an indication of success or failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10376) Refactor refresh*Protocols into a single generic refreshConfigProtocol
[ https://issues.apache.org/jira/browse/HADOOP-10376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10376: -- Attachment: HADOOP-10376.patch Uploaded new patch. I decided to add support for multi registration again. The reasoning is that this feature simplifies things by allowing any class to register as a refresh handler, even if it doesn't know about its server's port (eg what ID to register itself as), and the whole benefit to having a registry like this is to allow for a more decentralized approach. > Refactor refresh*Protocols into a single generic refreshConfigProtocol > -- > > Key: HADOOP-10376 > URL: https://issues.apache.org/jira/browse/HADOOP-10376 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Chris Li >Assignee: Chris Li >Priority: Minor > Attachments: HADOOP-10376.patch, HADOOP-10376.patch, > HADOOP-10376.patch, RefreshFrameworkProposal.pdf > > > See https://issues.apache.org/jira/browse/HADOOP-10285 > There are starting to be too many refresh*Protocols We can refactor them to > use a single protocol with a variable payload to choose what to do. > Thereafter, we can return an indication of success or failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10376) Refactor refresh*Protocols into a single generic refreshConfigProtocol
[ https://issues.apache.org/jira/browse/HADOOP-10376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019478#comment-14019478 ] Chris Li commented on HADOOP-10376: --- Hi Arapit, If we make this feature namenode only, it would remove the requirement to specify the target, but it would leave certain situations ambiguous, such as what should happen in the case of HA. I suppose it's a matter of philosophy, but I wanted to make this intentionally agnostic, so it would work regardless of what service is targeted, whether HA is enabled for RMs or NNs or other things are developed in the future. Perhaps it can offer both? It would ask the user to provide the service to refresh as host:port, or Namenode/ResourceManager which would refresh on all the machines in HA or any future configurations such as a quorum. Maybe that's also another patch. I'll also switch it back to 1-to-1 mappings, since the added complexity turns out to be too much after I tried it. > Refactor refresh*Protocols into a single generic refreshConfigProtocol > -- > > Key: HADOOP-10376 > URL: https://issues.apache.org/jira/browse/HADOOP-10376 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Chris Li >Assignee: Chris Li >Priority: Minor > Attachments: HADOOP-10376.patch, HADOOP-10376.patch, > RefreshFrameworkProposal.pdf > > > See https://issues.apache.org/jira/browse/HADOOP-10285 > There are starting to be too many refresh*Protocols We can refactor them to > use a single protocol with a variable payload to choose what to do. > Thereafter, we can return an indication of success or failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10376) Refactor refresh*Protocols into a single generic refreshConfigProtocol
[ https://issues.apache.org/jira/browse/HADOOP-10376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015884#comment-14015884 ] Chris Li commented on HADOOP-10376: --- Hi Arapit, thanks for the suggestions. Optional identifier sounds good. RefreshResponse: One use case of multiple handlers is for refreshing something on two servers running on two ports. We would ideally like to return a repeated custom message type which includes returnCode, userMessage, and senderName (which becomes important when you have multiple handlers). And then that opens another issue with how to have a user-readable senderName. Perhaps supporting multiple handlers isn't worth the hassle? RefreshRegistry: Changes sound good if we need to support multiple handlers. DFSAdmin: Args are sent as an array of strings, no post-processing done. Not sure if it needs to be there, but other refresh calls are able to infer the host:port based on the protocol used (and the refresh protocols they use are specific to NN or DN) whereas a generic protocol would not. --- It seems like a handful of these changes depend on whether or not it supports one identifier to many handler mappings. In the process the code gets more complex since we're now dealing with collections of responses. Let me know if the feature sounds useful, otherwise I can remove it to maintain simplicity. > Refactor refresh*Protocols into a single generic refreshConfigProtocol > -- > > Key: HADOOP-10376 > URL: https://issues.apache.org/jira/browse/HADOOP-10376 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Chris Li >Assignee: Chris Li >Priority: Minor > Attachments: HADOOP-10376.patch, HADOOP-10376.patch, > RefreshFrameworkProposal.pdf > > > See https://issues.apache.org/jira/browse/HADOOP-10285 > There are starting to be too many refresh*Protocols We can refactor them to > use a single protocol with a variable payload to choose what to do. 
> Thereafter, we can return an indication of success or failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
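The repeated-response idea discussed above can be sketched as follows. The names RefreshResponse, returnCode, userMessage, and senderName come from the comment itself; the exact field types, the RefreshHandler interface shape, and the dispatch helper are assumptions for illustration, not the committed API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: one refresh identifier fanning out to several handlers, each of
// which reports back a per-handler response. senderName disambiguates which
// handler (e.g. which server/port) produced a given message.
public class RefreshSketch {
  public static final class RefreshResponse {
    public final int returnCode;       // 0 on success
    public final String userMessage;   // text shown to the user
    public final String senderName;    // which handler answered
    public RefreshResponse(int returnCode, String userMessage, String senderName) {
      this.returnCode = returnCode;
      this.userMessage = userMessage;
      this.senderName = senderName;
    }
  }

  public interface RefreshHandler {
    String getName();
    RefreshResponse handleRefresh(String identifier, String[] args);
  }

  // With many handlers per identifier, the caller gets one response per
  // handler rather than a single return code.
  public static List<RefreshResponse> dispatch(List<RefreshHandler> handlers,
                                               String identifier, String[] args) {
    List<RefreshResponse> responses = new ArrayList<>();
    for (RefreshHandler h : handlers) {
      responses.add(h.handleRefresh(identifier, args));
    }
    return responses;
  }
}
```

This is where the complexity mentioned above shows up: every caller now has to iterate a collection of responses instead of checking one return code.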
[jira] [Updated] (HADOOP-10376) Refactor refresh*Protocols into a single generic refreshConfigProtocol
[ https://issues.apache.org/jira/browse/HADOOP-10376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10376: -- Attachment: HADOOP-10376.patch Updated patch with support for many handlers mapping to a single identifier. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10376) Refactor refresh*Protocols into a single generic refreshConfigProtocol
[ https://issues.apache.org/jira/browse/HADOOP-10376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10376: -- Attachment: HADOOP-10376.patch Hi [~arpitagarwal], sounds good. I went ahead and uploaded a patch. Most of it is the typical boilerplate for adding a new protocol (which shows how painful that is today); the interesting parts are the three new files: RefreshRegistry, RefreshHandler, and RefreshResponse. A useful new capability is being able to return both text and an exit status to the user on success (today you can either return 0 with no text, or throw an exception with a message and return -1). Authorization is coarse in this patch: users can be opted in or out of refreshing any of the registered refresh handlers. Future versions would allow finer-grained permissions. -- This message was sent by Atlassian JIRA (v6.2#6252)
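The registry side of the design above can be sketched as follows. RefreshRegistry is named in the patch description, but the method signatures here (register/dispatch, a string result per handler) are assumptions for illustration only:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a registry mapping one refresh identifier to possibly many
// handlers. Handlers return user-visible text, matching the "text plus
// exit status" capability described in the comment.
public class RefreshRegistrySketch {
  public interface Handler {
    String refresh(String identifier, String[] args);
  }

  private final Map<String, List<Handler>> handlers = new HashMap<>();

  public synchronized void register(String identifier, Handler h) {
    handlers.computeIfAbsent(identifier, k -> new ArrayList<>()).add(h);
  }

  // One message per registered handler: this is what makes the
  // multi-handler case produce a collection of responses.
  public synchronized List<String> dispatch(String identifier, String[] args) {
    List<Handler> hs = handlers.getOrDefault(identifier, Collections.emptyList());
    List<String> out = new ArrayList<>();
    for (Handler h : hs) {
      out.add(h.refresh(identifier, args));
    }
    return out;
  }
}
```

An unknown identifier simply dispatches to zero handlers, so the admin command can report "no such refresh target" rather than throwing.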
[jira] [Commented] (HADOOP-9640) RPC Congestion Control with FairCallQueue
[ https://issues.apache.org/jira/browse/HADOOP-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988113#comment-13988113 ] Chris Li commented on HADOOP-9640: -- Uploaded patches to HADOOP-10279, HADOOP-10281, and HADOOP-10282 for feedback. The new scheduler fixes the performance issues identified in the earlier PDF too. > RPC Congestion Control with FairCallQueue > - > > Key: HADOOP-9640 > URL: https://issues.apache.org/jira/browse/HADOOP-9640 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 3.0.0, 2.2.0 >Reporter: Xiaobo Peng >Assignee: Chris Li > Labels: hdfs, qos, rpc > Attachments: FairCallQueue-PerformanceOnCluster.pdf, > MinorityMajorityPerformance.pdf, NN-denial-of-service-updated-plan.pdf, > faircallqueue.patch, faircallqueue2.patch, faircallqueue3.patch, > faircallqueue4.patch, faircallqueue5.patch, faircallqueue6.patch, > faircallqueue7_with_runtime_swapping.patch, > rpc-congestion-control-draft-plan.pdf > > > Several production Hadoop cluster incidents occurred where the Namenode was > overloaded and failed to respond. > We can improve quality of service for users during namenode peak loads by > replacing the FIFO call queue with a [Fair Call > Queue|https://issues.apache.org/jira/secure/attachment/12616864/NN-denial-of-service-updated-plan.pdf]. > (this plan supersedes rpc-congestion-control-draft-plan). > Excerpted from the communication of one incident, “The map task of a user was > creating huge number of small files in the user directory. Due to the heavy > load on NN, the JT also was unable to communicate with NN...The cluster > became responsive only once the job was killed.” > Excerpted from the communication of another incident, “Namenode was > overloaded by GetBlockLocation requests (Correction: should be getFileInfo > requests. the job had a bug that called getFileInfo for a nonexistent file in > an endless loop). 
All other requests to namenode were also affected by this > and hence all jobs slowed down. Cluster almost came to a grinding > halt…Eventually killed jobtracker to kill all jobs that are running.” > Excerpted from HDFS-945, “We've seen defective applications cause havoc on > the NameNode, for e.g. by doing 100k+ 'listStatus' on very large directories > (60k files) etc.” -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10281) Create a scheduler, which assigns schedulables a priority level
[ https://issues.apache.org/jira/browse/HADOOP-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10281: -- Attachment: HADOOP-10281.patch The previous version had scaling issues due to its use of ConcurrentLinkedQueue, whose size() call is not constant-time. > Create a scheduler, which assigns schedulables a priority level > --- > > Key: HADOOP-10281 > URL: https://issues.apache.org/jira/browse/HADOOP-10281 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Chris Li > Attachments: HADOOP-10281.patch, HADOOP-10281.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
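For context on the size() remark above: ConcurrentLinkedQueue.size() traverses the whole linked list (its Javadoc warns it is not a constant-time operation), while LinkedBlockingQueue maintains an internal count and answers size() in O(1). The comparison below is only a demonstration of this documented behavior, not a claim about what the patch itself switched to:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Both queues report the same size, but ConcurrentLinkedQueue.size() walks
// all N nodes on every call, which hurts badly on a hot scheduling path;
// LinkedBlockingQueue.size() just reads an internal counter.
public class QueueSizeDemo {
  public static void main(String[] args) {
    ConcurrentLinkedQueue<Integer> clq = new ConcurrentLinkedQueue<>();
    LinkedBlockingQueue<Integer> lbq = new LinkedBlockingQueue<>();
    for (int i = 0; i < 10_000; i++) {
      clq.add(i);
      lbq.add(i);
    }
    // Same answer either way; only the cost per call differs.
    System.out.println(clq.size() + " " + lbq.size());
  }
}
```

If a lock-free queue is still desired, the usual workaround is to track the count separately in an AtomicInteger updated on add/remove.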
[jira] [Updated] (HADOOP-10286) Allow RPCCallBenchmark to benchmark calls by different users
[ https://issues.apache.org/jira/browse/HADOOP-10286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10286: -- Attachment: (was: HADOOP-10286.patch) > Allow RPCCallBenchmark to benchmark calls by different users > > > Key: HADOOP-10286 > URL: https://issues.apache.org/jira/browse/HADOOP-10286 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Chris Li > Attachments: HADOOP-10286.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10286) Allow RPCCallBenchmark to benchmark calls by different users
[ https://issues.apache.org/jira/browse/HADOOP-10286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10286: -- Assignee: Chris Li Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.2#6252)