[ https://issues.apache.org/jira/browse/CASSANDRA-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895434#action_12895434 ]
Mike Malone commented on CASSANDRA-1358:
----------------------------------------

I'm continuing to dig deeper into this code while simultaneously nursing one of our clusters back to health, so I apologize for the sort of stream-of-consciousness here...

I noticed that several of the executor queues are bounded at 4096 tasks. Has much thought been put into that choice, or is it an arbitrary round number that someone picked? It seems to me that bumping that number up a couple of orders of magnitude, or making the queues unbounded, might ameliorate the situation. Instead of the stage executors filling up and pushing task execution back onto the calling thread (which is a single thread in the case of MDP), more messages would stack up in the callee queues. That should give the various stages a fair chance to process the messages they're interested in without being blocked by MDP (which is itself blocked by some other stage). There may be some slight memory overhead, since deserialized objects will be in memory instead of serialized ones, but that's a price I'd be willing to pay.

I did find one possible reason to have an executor with a core pool size of 1, an unbounded queue, and a maximumPoolSize > 1: the default RejectedExecutionHandler appears to be affected by maximumPoolSize. If it's > 1, the default constructor assumes that tasks can safely be scheduled in parallel and falls back to a "caller runs" policy when the queue is full. But if the maximumPoolSize is 1, the rejected execution handler spins, offering the task to the queue with a one-second timeout. So if your maximum pool size is greater than one, you can basically use the calling threads for spare capacity. Still, if that's the goal, it should be made more explicit (a rough sketch of this handler behavior is appended at the end of this message).

I'm guessing the intent was to give the MDP one thread per core under the assumption that it would be completely CPU bound, but the implementation is borked for a number of reasons (the short demo appended below shows why the pool never actually grows past one thread with an unbounded queue). And if the MDP can block on other stuff, the CPU-bound assumption is wrong; if MDP can block, it should probably have a lot more threads.

> Clogged RRS/RMS stages can hold up processing of gossip messages and request acks
> ----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1358
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1358
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.5
>        Environment: All.
>           Reporter: Mike Malone
>            Fix For: 0.6.5
>
>
> The message deserialization process can become a bottleneck that prevents efficient resource utilization, because the executor that manages the deserialization process will never grow beyond a single thread. The message deserializer executor is instantiated in the MessagingService constructor as a JMXEnabledThreadPoolExecutor, which extends java.util.concurrent.ThreadPoolExecutor. The thread pool is instantiated with a corePoolSize of 1 and a maximumPoolSize of Runtime.getRuntime().availableProcessors(). But, according to the ThreadPoolExecutor documentation, "using an unbounded queue (for example a LinkedBlockingQueue without a predefined capacity) will cause new tasks to be queued in cases where all corePoolSize threads are busy. Thus, no more than corePoolSize threads will ever be created. (And the value of the maximumPoolSize therefore doesn't have any effect.)"
> The message deserializer pool uses a LinkedBlockingQueue, so there will never be more than one deserialization thread.
> This issue became a problem in our production cluster when the MESSAGE-DESERIALIZER-POOL began to back up on a node that was only lightly loaded. We increased the core pool size to 4 and the situation improved, but the deserializer pool was still backing up while the machine was not fully utilized (less than 100% CPU utilization). This leads me to think that the deserializer thread is blocking on some sort of I/O, which seems like it shouldn't happen.
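For reference, here is a minimal, self-contained demo of the behavior described in the quoted report. The pool parameters mirror the ones described (corePoolSize of 1, maximumPoolSize of availableProcessors, unbounded LinkedBlockingQueue); this is a standalone sketch, not Cassandra code, and the class and variable names are made up for illustration.

{code}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class DeserializerPoolDemo
{
    public static void main(String[] args) throws InterruptedException
    {
        int procs = Runtime.getRuntime().availableProcessors();

        // same shape as the pool described above: corePoolSize = 1,
        // maximumPoolSize = availableProcessors, unbounded LinkedBlockingQueue
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, procs, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());

        // submit many slow tasks -- far more than there are processors
        for (int i = 0; i < procs * 8; i++)
        {
            pool.execute(new Runnable()
            {
                public void run()
                {
                    try { Thread.sleep(500); } catch (InterruptedException ignored) {}
                }
            });
        }

        Thread.sleep(200);
        // prints 1: threads beyond corePoolSize are only created when the
        // queue rejects a task, and an unbounded queue never rejects anything
        System.out.println("threads in pool: " + pool.getPoolSize());
        pool.shutdownNow();
    }
}
{code}

ThreadPoolExecutor only creates threads beyond corePoolSize when the work queue refuses a task, and an unbounded queue never refuses anything, so the pool stays at one thread no matter how deep the backlog gets.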
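And a rough sketch of the rejected-execution behavior described in the comment above: run the task on the calling thread when maximumPoolSize > 1, otherwise keep re-offering it to the queue with a one-second timeout. This is an approximation of the behavior as described, not the actual Cassandra handler, and the class name is hypothetical.

{code}
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical handler approximating the default behavior described above.
public class CallerRunsOrBlockHandler implements RejectedExecutionHandler
{
    public void rejectedExecution(Runnable task, ThreadPoolExecutor executor)
    {
        if (executor.getMaximumPoolSize() > 1)
        {
            // parallel execution assumed safe: run the task on the calling thread
            task.run();
        }
        else
        {
            // single-threaded pool: spin, re-offering the task to the queue
            // with a one-second timeout until it is accepted
            boolean accepted = false;
            while (!accepted)
            {
                try
                {
                    accepted = executor.getQueue().offer(task, 1, TimeUnit.SECONDS);
                }
                catch (InterruptedException e)
                {
                    throw new RuntimeException(e);
                }
            }
        }
    }
}
{code}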