[jira] [Commented] (ZOOKEEPER-1620) NIOServerCnxnFactory (new code introduced in ZK-1504) opens selectors but never closes them
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557405#comment-13557405 ]

Jay Shrauner commented on ZOOKEEPER-1620:
-----------------------------------------

You shouldn't need to do that (explicitly call closeSelector inside factory.stop()), and it's preferable if you don't. It's preferable because it allows the system to do a graceful shutdown. It's unnecessary because the shutdown call joins on the accept and selector threads, so it's not going to return until those threads exit.

The unit tests create and destroy the factories/threads/selectors in a pretty tight loop. I'm not sure how long it takes the system to close the fds associated with the selector, but maybe this is something like how closed sockets can linger. I might try putting a lengthier sleep after the factory.shutdown() call.

There have been known issues with file descriptor leaks when calling selector.close(), so we might check which JDK versions we're each running. I found the following bug affecting JDK5u28/6u30/7u5:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7118373

NIOServerCnxnFactory (new code introduced in ZK-1504) opens selectors but never closes them
-------------------------------------------------------------------------------------------

Key: ZOOKEEPER-1620
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1620
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.5.0
Reporter: Alexander Shraer
Assignee: Thawan Kooburat
Fix For: 3.5.0
Attachments: ZOOKEEPER-1620.patch, ZOOKEEPER-1620.patch

New code (committed in ZK-1504) opens selectors but doesn't close them. Specifically, AbstractSelectThread in its constructor does

    this.selector = Selector.open();

but possibly this also happens elsewhere.

Tests fail for me with the following message:

java.io.IOException: Too many open files
    at sun.nio.ch.EPollArrayWrapper.epollCreate(Native Method)
    at sun.nio.ch.EPollArrayWrapper.init(EPollArrayWrapper.java:69)
    at sun.nio.ch.EPollSelectorImpl.init(EPollSelectorImpl.java:52)
    at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
    at java.nio.channels.Selector.open(Selector.java:209)
    at org.apache.zookeeper.server.NIOServerCnxnFactory$AbstractSelectThread.init(NIOServerCnxnFactory.java:128)
    at org.apache.zookeeper.server.NIOServerCnxnFactory$AcceptThread.init(NIOServerCnxnFactory.java:177)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:663)
    at org.apache.zookeeper.server.ServerCnxnFactory.createFactory(ServerCnxnFactory.java:127)
    at org.apache.zookeeper.server.quorum.QuorumPeer.init(QuorumPeer.java:709)
    at org.apache.zookeeper.test.QuorumBase.startServers(QuorumBase.java:177)
    at org.apache.zookeeper.test.QuorumBase.setUp(QuorumBase.java:113)
    at org.apache.zookeeper.test.QuorumBase.setUp(QuorumBase.java:71)
    at org.apache.zookeeper.test.ReconfigTest.setUp(ReconfigTest.java:56)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
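
For illustration, here is a minimal sketch of the lifecycle the comment above describes: a thread opens its selector in the constructor, closes it itself when its run loop exits, and the shutdown call joins on the thread so it cannot return before the selector is released. The class and method names (SelectThreadSketch, shutdownAndJoin, selectorOpen) are hypothetical and are not taken from the ZooKeeper patch; only java.nio.channels.Selector is a real API here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Selector;

// Hypothetical stand-in for a select thread; NOT the ZooKeeper class.
class SelectThreadSketch extends Thread {
    private final Selector selector;
    private volatile boolean stopped = false;

    SelectThreadSketch(String name) {
        super(name);
        setDaemon(true);
        try {
            selector = Selector.open(); // allocates OS file descriptors
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void run() {
        try {
            while (!stopped) {
                selector.select(); // blocks until wakeup() or a ready channel
            }
        } catch (IOException e) {
            // fall through to cleanup
        } finally {
            closeSelector(); // the thread releases its own selector on exit
        }
    }

    private void closeSelector() {
        try {
            selector.close();
        } catch (IOException e) {
            // nothing more can be done during shutdown
        }
    }

    // Asks the thread to stop, then waits for it to exit.
    // Assumes start() has already been called.
    void shutdownAndJoin() {
        stopped = true;
        selector.wakeup(); // unblocks a pending or upcoming select()
        try {
            join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    boolean selectorOpen() {
        return selector.isOpen();
    }
}
```

With this shape, a caller does not close the selector explicitly; once shutdownAndJoin() returns, the thread has exited and its finally block has already run. Whether the operating system has released the underlying descriptors by then is a separate question, which is the lingering effect the comment speculates about.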
[jira] [Commented] (ZOOKEEPER-1505) Multi-thread CommitProcessor
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530222#comment-13530222 ]

Jay Shrauner commented on ZOOKEEPER-1505:
-----------------------------------------

Alex,

The race condition is within FinalRequestProcessor on any node: Leader, Follower, or Observer. This has nothing to do with the serialization order of the leader. Watch setting/firing is a write operation only on the local node, on locally maintained state.

Say client A is toggling the value of node /X from 1 to 2, and client B is reading and setting a watch on node /X. Client B will always see a consistent view; it may, however, not receive a watch firing, so it may never know to read value 2. If client B is relying on timely watch firing to keep its data fresh, this is a problem.

1. It is possible for thread C1 to process client B reading value 1 and setting the watch; thread C2 to process client A writing 2 to /X, firing the watch, and writing this out to client B's network stack (the watch firing); and finally thread C1 to push the read of value 1 onto client B's network stack. Because the return value of a getData-and-setWatch call came after the watch fired, the client will possibly ignore the watch firing. So, for example, say client B had originally responded to a watch firing on /X. In its view, it sees the /X watch fire, it sends a getData request, it sees the /X watch fire again (which it ignores, because it already has a getData outstanding), and finally it gets the response to its getData request.

2. It is also possible for client B to read value 1, client A to write value 2 and check for watch firing, and then for client B to reset the watch. There is no locking guarding the atomicity of client B reading /X and setting the watch on /X.

It is relatively straightforward to add locking preventing case (2), but for case (1) I think we need to restrict parallelism in FinalRequestProcessor. We can improve the parallelism here, but it hit the point where I wanted to leave that for a future Jira. If we could identify which read requests set watches, and treat those as a third type, we could then allow pure read requests from client B to process simultaneously with write requests from client A. Current code only fully parses getData and other read request blocks in FinalRequestProcessor, so we would need to move this up earlier, which might however have performance implications.

Multi-thread CommitProcessor
----------------------------

Key: ZOOKEEPER-1505
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1505
Project: ZooKeeper
Issue Type: Improvement
Components: server
Affects Versions: 3.4.3, 3.4.4, 3.5.0
Reporter: Jay Shrauner
Assignee: Jay Shrauner
Labels: performance, scaling
Fix For: 3.5.0
Attachments: ZOOKEEPER-1505.patch, ZOOKEEPER-1505.patch, ZOOKEEPER-1505.patch, ZOOKEEPER-1505.patch

CommitProcessor has a single thread that both pulls requests off its queues and runs all downstream processors. This is noticeably inefficient for read-intensive workloads, which could be run concurrently. The trick is handling write transactions. I propose multi-threading this code according to the following two constraints:
- each session must see its requests responded to in order
- all committed transactions must be handled in zxid order, across all sessions

I believe these cover the only constraints we need to honor. In particular, I believe we can relax the following:
- it does not matter if the read request in one session happens before or after the write request in another session

With these constraints, I propose the following threads:
- 1 primary queue servicing/work dispatching thread
- 0-N assignable worker threads, where a given session is always assigned to the same worker thread

By assigning sessions always to the same worker thread (using a simple sessionId mod number of worker threads), we guarantee the first constraint: requests we push onto the thread queue are processed in order. The way we guarantee the second constraint is that we only allow a single commit transaction to be in flight at a time: the queue servicing thread blocks while a commit transaction is in flight, and when the transaction completes it clears the flag.

On a 32 core machine running Linux 2.6.38, achieved best performance with 32 worker threads for a 56% +/- 5% improvement in throughput (this improvement was measured on top of that for ZOOKEEPER-1504, not in isolation).

New classes introduced in this patch are:
- WorkerService (also in ZOOKEEPER-1504): ExecutorService wrapper that makes worker threads daemon threads and names them in an easily debuggable manner. Supports assignable threads (as used here) and non-assignable
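
Case (2) in the comment above (no atomicity between reading /X and setting the watch on /X) is the one described as straightforward to guard with locking. A minimal sketch of one way to do that follows, assuming a single integer value and one-shot watches. The class WatchedValueSketch and its methods are hypothetical, and the read-write lock is an illustrative choice rather than the mechanism used in the patch. It does nothing for case (1), which concerns the order in which responses reach the client's network stack.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.IntConsumer;

// Hypothetical model of one node's value plus its one-shot watches.
class WatchedValueSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Readers may register watches concurrently, so the queue is thread-safe.
    private final Queue<IntConsumer> watches = new ConcurrentLinkedQueue<>();
    private int value; // guarded by lock

    WatchedValueSketch(int initial) {
        value = initial;
    }

    // Read and watch registration happen under the read lock, so no write
    // can slip in between them.
    int getDataAndSetWatch(IntConsumer watch) {
        lock.readLock().lock();
        try {
            watches.add(watch);
            return value;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Update and watch firing happen under the write lock, which excludes
    // all readers for the duration.
    void setData(int newValue) {
        lock.writeLock().lock();
        try {
            value = newValue;
            // Drain first so a callback that re-registers is not fired twice.
            List<IntConsumer> toFire = new ArrayList<>();
            IntConsumer w;
            while ((w = watches.poll()) != null) {
                toFire.add(w);
            }
            for (IntConsumer fired : toFire) {
                fired.accept(newValue);
            }
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Because a watch registered under the read lock is always visible to the next writer, client B in the scenario above either reads the new value or has its watch fired by the write that produces it.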
[jira] [Updated] (ZOOKEEPER-1504) Multi-thread NIOServerCnxn
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Shrauner updated ZOOKEEPER-1504:
-----------------------------------

Attachment: ZOOKEEPER-1504.patch

Rebase

Multi-thread NIOServerCnxn
--------------------------

Key: ZOOKEEPER-1504
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1504
Project: ZooKeeper
Issue Type: Improvement
Components: server
Affects Versions: 3.4.3, 3.4.4, 3.5.0
Reporter: Jay Shrauner
Assignee: Jay Shrauner
Labels: performance, scaling
Fix For: 3.5.0
Attachments: ZOOKEEPER-1504.patch, ZOOKEEPER-1504.patch, ZOOKEEPER-1504.patch, ZOOKEEPER-1504.patch

NIOServerCnxnFactory is single threaded, which doesn't scale well to large numbers of clients. This is particularly noticeable when thousands of clients connect. I propose multi-threading this code as follows:
- 1 acceptor thread, for accepting new connections
- 1-N selector threads
- 0-M I/O worker threads

Numbers of threads are configurable, with defaults scaling according to number of cores. Communication with the selector threads is handled via LinkedBlockingQueues, and connections are permanently assigned to a particular selector thread so that all potentially blocking SelectionKey operations can be performed solely by the selector thread. An ExecutorService is used for the worker threads.

On a 32 core machine running Linux 2.6.38, achieved best performance with 4 selector threads and 64 worker threads for a 70% +/- 5% improvement in throughput.

This patch incorporates and supersedes the patches for
https://issues.apache.org/jira/browse/ZOOKEEPER-517
https://issues.apache.org/jira/browse/ZOOKEEPER-1444

New classes introduced in this patch are:
- ExpiryQueue (from ZOOKEEPER-1444): factors out the logic from SessionTrackerImpl used to expire sessions so that the same logic can be used to expire connections
- RateLogger (from ZOOKEEPER-517): rate-limits error message logging; currently only used to throttle the rate of logging "out of file descriptors" errors
- WorkerService (also in ZOOKEEPER-1505): ExecutorService wrapper that makes worker threads daemon threads and names them in an easily debuggable manner. Supports assignable threads (as used by CommitProcessor) and non-assignable threads (as used here).
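
As a rough illustration of what the description means by an ExecutorService wrapper that makes worker threads daemon threads and gives them debuggable names, here is a minimal sketch built on the standard java.util.concurrent.ThreadFactory hook. The class name DaemonThreadFactorySketch, the newPool helper, and the "prefix-N" naming scheme are assumptions made for this example; the actual WorkerService in the patch may be structured differently.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical factory: daemon threads named "<prefix>-<n>".
class DaemonThreadFactorySketch implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger next = new AtomicInteger(1);

    DaemonThreadFactorySketch(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r, prefix + "-" + next.getAndIncrement());
        t.setDaemon(true); // daemon threads do not keep the JVM alive
        return t;
    }

    // Non-assignable pool: any idle worker may pick up any task.
    static ExecutorService newPool(String prefix, int numThreads) {
        return Executors.newFixedThreadPool(
                numThreads, new DaemonThreadFactorySketch(prefix));
    }
}
```

Named threads show up as, for example, "NIOWorker-3" in a thread dump, which is the debuggability benefit the description refers to.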
[jira] [Updated] (ZOOKEEPER-1505) Multi-thread CommitProcessor
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Shrauner updated ZOOKEEPER-1505:
-----------------------------------

Attachment: ZOOKEEPER-1505.patch

Address feedback from review: shut down CommitProcessor if a downstream processor throws an exception (preserves previous behavior)

Multi-thread CommitProcessor
----------------------------

Key: ZOOKEEPER-1505
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1505
Project: ZooKeeper
Issue Type: Improvement
Components: server
Affects Versions: 3.4.3, 3.4.4, 3.5.0
Reporter: Jay Shrauner
Assignee: Jay Shrauner
Labels: performance, scaling
Fix For: 3.5.0
Attachments: ZOOKEEPER-1505.patch, ZOOKEEPER-1505.patch, ZOOKEEPER-1505.patch

CommitProcessor has a single thread that both pulls requests off its queues and runs all downstream processors. This is noticeably inefficient for read-intensive workloads, which could be run concurrently. The trick is handling write transactions. I propose multi-threading this code according to the following two constraints:
- each session must see its requests responded to in order
- all committed transactions must be handled in zxid order, across all sessions

I believe these cover the only constraints we need to honor. In particular, I believe we can relax the following:
- it does not matter if the read request in one session happens before or after the write request in another session

With these constraints, I propose the following threads:
- 1 primary queue servicing/work dispatching thread
- 0-N assignable worker threads, where a given session is always assigned to the same worker thread

By assigning sessions always to the same worker thread (using a simple sessionId mod number of worker threads), we guarantee the first constraint: requests we push onto the thread queue are processed in order. The way we guarantee the second constraint is that we only allow a single commit transaction to be in flight at a time: the queue servicing thread blocks while a commit transaction is in flight, and when the transaction completes it clears the flag.

On a 32 core machine running Linux 2.6.38, achieved best performance with 32 worker threads for a 56% +/- 5% improvement in throughput (this improvement was measured on top of that for ZOOKEEPER-1504, not in isolation).

New classes introduced in this patch are:
- WorkerService (also in ZOOKEEPER-1504): ExecutorService wrapper that makes worker threads daemon threads and names them in an easily debuggable manner. Supports assignable threads (as used here) and non-assignable threads (as used by NIOServerCnxnFactory).
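
The "sessionId mod number of worker threads" assignment from the description can be sketched as follows, assuming each assignable worker is modeled as its own single-threaded executor so that tasks for one session run in submission order. The class AssignableWorkersSketch and its methods are hypothetical. Math.floorMod is used here as a precaution so that a negative sessionId still maps to a valid index; the description only says "mod".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model of "assignable" workers: one FIFO queue per worker.
class AssignableWorkersSketch {
    private final List<ExecutorService> workers = new ArrayList<>();

    AssignableWorkersSketch(int numWorkers) {
        for (int i = 0; i < numWorkers; i++) {
            workers.add(Executors.newSingleThreadExecutor());
        }
    }

    // The same session always maps to the same worker index.
    static int workerFor(long sessionId, int numWorkers) {
        return (int) Math.floorMod(sessionId, (long) numWorkers);
    }

    // Tasks for one session share one single-threaded queue, so they run
    // in the order they were scheduled.
    void schedule(long sessionId, Runnable task) {
        workers.get(workerFor(sessionId, workers.size())).execute(task);
    }

    // Stops accepting work and waits (up to 10 seconds per worker) for
    // queued tasks to finish.
    void shutdownAndWait() {
        for (ExecutorService w : workers) {
            w.shutdown();
        }
        try {
            for (ExecutorService w : workers) {
                w.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Requests from different sessions may land on different workers and run concurrently, which is the relaxation the description allows; the zxid ordering of commits is a separate constraint that this sketch does not address.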
[jira] [Assigned] (ZOOKEEPER-1147) Add support for local sessions
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Shrauner reassigned ZOOKEEPER-1147:
--------------------------------------

Assignee: Thawan Kooburat (was: Jay Shrauner)

Add support for local sessions
------------------------------

Key: ZOOKEEPER-1147
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1147
Project: ZooKeeper
Issue Type: Improvement
Components: server
Affects Versions: 3.3.3
Reporter: Vishal Kathuria
Assignee: Thawan Kooburat
Labels: api-change, scaling
Fix For: 3.5.0
Original Estimate: 840h
Remaining Estimate: 840h

This improvement is in the bucket of making ZooKeeper work at a large scale. We are planning on having about 1 million clients connect to a ZooKeeper ensemble through a set of 50-100 observers. The majority of these clients are read only, i.e. they do not do any updates or create ephemeral nodes. In ZooKeeper today, the client creates a session and the session creation is handled like any other update. In the above use case, the session create/drop workload can easily overwhelm an ensemble.

The following is a proposal for a local session, to support a larger number of connections.
1. The idea is to introduce a new type of session, the local session. A local session doesn't have the full functionality of a normal session.
2. Local sessions cannot create ephemeral nodes.
3. Once a local session is lost, you cannot re-establish it using the session-id/password. The session and its watches are gone for good.
4. When a local session connects, the session info is only maintained on the ZooKeeper server (in this case, an observer) that it is connected to. The leader is not aware of the creation of such a session and there is no state written to disk.
5. The pings and expiration are handled by the server that the session is connected to.

With the above changes, we can make ZooKeeper scale to a much larger number of clients without making the core ensemble a bottleneck.

In terms of API, there are two options that are being considered:
1. Let the client specify at connect time which kind of session it wants.
2. All sessions connect as local sessions and automatically get promoted to global sessions when they do an operation that requires a global session (e.g. creating an ephemeral node).

Chubby took the approach of lazily promoting all sessions to global, but I don't think that would work in our case, where we want to keep sessions which never create ephemeral nodes as always local. Option 2 would make it more broadly usable, but option 1 would be easier to implement. We are thinking of implementing option 1 as the first cut. There would be a client flag, IsLocalSession (much like the current readOnly flag), that would be used to determine whether to create a local session or a global session.
[jira] [Commented] (ZOOKEEPER-1505) Multi-thread CommitProcessor
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475444#comment-13475444 ]

Jay Shrauner commented on ZOOKEEPER-1505:
-----------------------------------------

Findbug warning (naked notify) is bogus; this is a helper routine to wakeup the main thread with the state change happening in the routines that call it. From the blurb in findbug:

    This bug does not necessarily indicate an error, since the change to mutable object state may have taken place in a method which then called the method containing the notification.

Multi-thread CommitProcessor
----------------------------

Key: ZOOKEEPER-1505
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1505
Project: ZooKeeper
Issue Type: Improvement
Components: server
Affects Versions: 3.4.3, 3.4.4, 3.5.0
Reporter: Jay Shrauner
Assignee: Jay Shrauner
Labels: performance, scaling
Fix For: 3.5.0
Attachments: ZOOKEEPER-1505.patch, ZOOKEEPER-1505.patch, ZOOKEEPER-1505.patch
[jira] [Updated] (ZOOKEEPER-1505) Multi-thread CommitProcessor
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Shrauner updated ZOOKEEPER-1505:
-----------------------------------

Attachment: ZOOKEEPER-1505.patch

- Addressed reviewboard comments.
- Added unit test.
- Bugfix for issue Thawan found with watch resets on read requests in one session racing a write request affecting that watch in another session. Solution taken here is to prevent any read requests at all from running concurrently with a write request. There is room for further improvement, by parsing the request earlier in the pipeline and identifying read requests with watch resets.

Multi-thread CommitProcessor
----------------------------

Key: ZOOKEEPER-1505
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1505
Project: ZooKeeper
Issue Type: Improvement
Components: server
Affects Versions: 3.4.3, 3.4.4, 3.5.0
Reporter: Jay Shrauner
Assignee: Jay Shrauner
Labels: performance, scaling
Fix For: 3.5.0
Attachments: ZOOKEEPER-1505.patch, ZOOKEEPER-1505.patch
[jira] [Updated] (ZOOKEEPER-1504) Multi-thread NIOServerCnxn
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Shrauner updated ZOOKEEPER-1504:
-----------------------------------

Attachment: ZOOKEEPER-1504.patch

Split connection expiration out into separate thread.

Multi-thread NIOServerCnxn
--------------------------

Key: ZOOKEEPER-1504
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1504
Project: ZooKeeper
Issue Type: Improvement
Components: server
Affects Versions: 3.4.3, 3.4.4, 3.5.0
Reporter: Jay Shrauner
Assignee: Jay Shrauner
Labels: performance, scaling
Fix For: 3.5.0
Attachments: ZOOKEEPER-1504.patch, ZOOKEEPER-1504.patch, ZOOKEEPER-1504.patch
[jira] [Commented] (ZOOKEEPER-1505) Multi-thread CommitProcessor
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426170#comment-13426170 ]

Jay Shrauner commented on ZOOKEEPER-1505:
-----------------------------------------

Posted to reviewboard https://reviews.apache.org/r/6260/

Multi-thread CommitProcessor
----------------------------

Key: ZOOKEEPER-1505
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1505
Project: ZooKeeper
Issue Type: Improvement
Components: server
Affects Versions: 3.4.3, 3.4.4, 3.5.0
Reporter: Jay Shrauner
Assignee: Jay Shrauner
Labels: performance, scaling
Fix For: 3.5.0
Attachments: ZOOKEEPER-1505.patch
[jira] [Commented] (ZOOKEEPER-1505) Multi-thread CommitProcessor
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410458#comment-13410458 ]

Jay Shrauner commented on ZOOKEEPER-1505:
-----------------------------------------

FindBug warning is Naked notify in org.apache.zookeeper.server.quorum.CommitProcessor.wakeup(). Explanation of warning states

    This bug does not necessarily indicate an error, since the change to mutable object state may have taken place in a method which then called the method containing the notification.

which is exactly the situation here.

Testing: I haven't found the unit tests always to be the best way to find multi-threading issues (even the hammer ones, although they're helpful). Tested and debugged by running on an ensemble and driving test load, and then by running on our production system.

Multi-thread CommitProcessor
----------------------------

Key: ZOOKEEPER-1505
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1505
Project: ZooKeeper
Issue Type: Improvement
Components: server
Affects Versions: 3.4.3, 3.4.4, 3.5.0
Reporter: Jay Shrauner
Assignee: Jay Shrauner
Labels: performance, scaling
Fix For: 3.5.0
Attachments: ZOOKEEPER-1505.patch
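
The situation behind the "naked notify" warning can be sketched as below: a wakeup() helper that only notifies, with the state change made by the method that calls it. This is a minimal illustration under assumed names (CommitGateSketch, beginCommit, commitFinished, awaitIdle); it is not the CommitProcessor code, and it models only the single "commit in flight" flag mentioned in the issue description.

```java
// Hypothetical gate: a dispatcher waits while a commit is in flight.
class CommitGateSketch {
    private boolean commitInFlight = false; // guarded by "this"

    // Notifies without touching state itself; this is the shape FindBugs
    // reports as a naked notify.
    private synchronized void wakeup() {
        notifyAll();
    }

    synchronized void beginCommit() {
        commitInFlight = true;
    }

    // The state change happens here, in the caller of wakeup().
    void commitFinished() {
        synchronized (this) {
            commitInFlight = false;
        }
        wakeup();
    }

    // Waits until no commit is in flight. Returns false on timeout or
    // interruption. The condition is re-checked in a loop under the lock,
    // so a wakeup between the state change and the notify is not lost.
    synchronized boolean awaitIdle(long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        try {
            while (commitInFlight) {
                long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMs <= 0) {
                    return false;
                }
                wait(remainingMs);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The analysis tool sees only that wakeup() notifies without modifying any field, which is why it flags the method even though the waiter's condition was changed one call up the stack.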
[jira] [Updated] (ZOOKEEPER-1504) Multi-thread NIOServerCnxn
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Shrauner updated ZOOKEEPER-1504:
    Attachment: ZOOKEEPER-1504.patch

Address findbugs warnings

Multi-thread NIOServerCnxn
    Key: ZOOKEEPER-1504
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1504
    Project: ZooKeeper
    Issue Type: Improvement
    Components: server
    Affects Versions: 3.4.3, 3.4.4, 3.5.0
    Reporter: Jay Shrauner
    Assignee: Jay Shrauner
    Labels: performance, scaling
    Fix For: 3.5.0
    Attachments: ZOOKEEPER-1504.patch, ZOOKEEPER-1504.patch

NIOServerCnxnFactory is single threaded, which doesn't scale well to large numbers of clients. This is particularly noticeable when thousands of clients connect. I propose multi-threading this code as follows:
- 1 acceptor thread, for accepting new connections
- 1-N selector threads
- 0-M I/O worker threads

Numbers of threads are configurable, with defaults scaling according to number of cores. Communication with the selector threads is handled via LinkedBlockingQueues, and connections are permanently assigned to a particular selector thread so that all potentially blocking SelectionKey operations can be performed solely by the selector thread. An ExecutorService is used for the worker threads.

On a 32 core machine running Linux 2.6.38, I achieved the best performance with 4 selector threads and 64 worker threads, for a 70% +/- 5% improvement in throughput.

This patch incorporates and supersedes the patches for
https://issues.apache.org/jira/browse/ZOOKEEPER-517
https://issues.apache.org/jira/browse/ZOOKEEPER-1444

New classes introduced in this patch are:
- ExpiryQueue (from ZOOKEEPER-1444): factors out the logic from SessionTrackerImpl used to expire sessions so that the same logic can be used to expire connections
- RateLogger (from ZOOKEEPER-517): rate limits error message logging; currently only used to throttle logging of out-of-file-descriptors errors
- WorkerService (also in ZOOKEEPER-1505): ExecutorService wrapper that makes worker threads daemon threads and names them in an easily debuggable manner. Supports assignable threads (as used by CommitProcessor) and non-assignable threads (as used here).
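The hand-off from the acceptor thread to the selector threads described above might be sketched roughly as follows. This is not code from the patch: the class name SelectorQueues and its methods are hypothetical, and the round-robin choice of selector is an assumption (the description only says that a connection stays with one selector thread once assigned).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch, not the patch's code: one queue per selector thread.
class SelectorQueues<C> {
    private final List<LinkedBlockingQueue<C>> queues = new ArrayList<>();
    // Only the single acceptor thread touches this counter, so no locking.
    private int next = 0;

    SelectorQueues(int numSelectors) {
        if (numSelectors < 1) {
            throw new IllegalArgumentException("need at least one selector");
        }
        for (int i = 0; i < numSelectors; i++) {
            queues.add(new LinkedBlockingQueue<>());
        }
    }

    // Called by the acceptor thread. Round-robin is an assumption made for
    // this sketch. Returns the index of the selector that now owns the
    // connection for the rest of its lifetime.
    int assign(C connection) {
        int idx = next;
        next = (next + 1) % queues.size();
        queues.get(idx).add(connection);
        return idx;
    }

    // Called by selector thread idx to pick up newly accepted connections,
    // which it would then register with its own Selector.
    List<C> drain(int idx) {
        List<C> accepted = new ArrayList<>();
        queues.get(idx).drainTo(accepted);
        return accepted;
    }
}
```

Because only the owning selector thread ever drains its queue and registers the connection, SelectionKey operations for a given connection stay on a single thread.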
[jira] [Updated] (ZOOKEEPER-1504) Multi-thread NIOServerCnxn
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Shrauner updated ZOOKEEPER-1504:
    Attachment: (was: ZOOKEEPER-1504.patch)

Multi-thread NIOServerCnxn
    Key: ZOOKEEPER-1504
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1504
    Project: ZooKeeper
    Issue Type: Improvement
    Components: server
    Affects Versions: 3.4.3, 3.4.4, 3.5.0
    Reporter: Jay Shrauner
    Assignee: Jay Shrauner
    Labels: performance, scaling
    Fix For: 3.5.0
[jira] [Updated] (ZOOKEEPER-1504) Multi-thread NIOServerCnxn
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Shrauner updated ZOOKEEPER-1504:
    Attachment: ZOOKEEPER-1504.patch

Multi-thread NIOServerCnxn
    Key: ZOOKEEPER-1504
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1504
    Project: ZooKeeper
    Issue Type: Improvement
    Components: server
    Affects Versions: 3.4.3, 3.4.4, 3.5.0
    Reporter: Jay Shrauner
    Assignee: Jay Shrauner
    Labels: performance, scaling
    Fix For: 3.5.0
    Attachments: ZOOKEEPER-1504.patch
[jira] [Updated] (ZOOKEEPER-1505) Multi-thread CommitProcessor
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Shrauner updated ZOOKEEPER-1505:
    Attachment: (was: ZOOKEEPER-1505.patch)

Multi-thread CommitProcessor
    Key: ZOOKEEPER-1505
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1505
    Project: ZooKeeper
    Issue Type: Improvement
    Components: server
    Affects Versions: 3.4.3, 3.4.4, 3.5.0
    Reporter: Jay Shrauner
    Assignee: Jay Shrauner
    Labels: performance, scaling
    Fix For: 3.5.0
    Attachments: ZOOKEEPER-1505.patch

CommitProcessor has a single thread that both pulls requests off its queues and runs all downstream processors. This is noticeably inefficient for read-intensive workloads, which could be run concurrently. The trick is handling write transactions. I propose multi-threading this code according to the following two constraints:
- each session must see its requests responded to in order
- all committed transactions must be handled in zxid order, across all sessions

I believe these cover the only constraints we need to honor. In particular, I believe we can relax the following:
- it does not matter if the read request in one session happens before or after the write request in another session

With these constraints, I propose the following threads:
- 1 primary queue servicing/work dispatching thread
- 0-N assignable worker threads, where a given session is always assigned to the same worker thread

By always assigning a session to the same worker thread (using a simple sessionId mod number of worker threads), we guarantee the first constraint: requests we push onto the thread queue are processed in order. We guarantee the second constraint by allowing only a single commit transaction to be in flight at a time: the queue servicing thread blocks while a commit transaction is in flight, and when the transaction completes it clears the flag.

On a 32 core machine running Linux 2.6.38, I achieved the best performance with 32 worker threads, for a 56% +/- 5% improvement in throughput (this improvement was measured on top of that for ZOOKEEPER-1504, not in isolation).

New classes introduced in this patch are:
- WorkerService (also in ZOOKEEPER-1504): ExecutorService wrapper that makes worker threads daemon threads and names them in an easily debuggable manner. Supports assignable threads (as used here) and non-assignable threads (as used by NIOServerCnxnFactory).
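As a rough illustration of the "sessionId mod number of worker threads" assignment described above, here is a minimal sketch. It is not the WorkerService class from the patch: the class name AssignableWorkers and its methods are hypothetical, and using one single-threaded executor per worker is an assumption chosen because it gives per-worker FIFO ordering with little code. The single-commit-in-flight rule for the second constraint is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not the patch's WorkerService.
class AssignableWorkers {
    private final List<ExecutorService> workers = new ArrayList<>();

    AssignableWorkers(int numWorkers) {
        for (int i = 0; i < numWorkers; i++) {
            // A single-threaded executor runs its tasks in submission order,
            // so all requests routed to one worker are processed in order.
            workers.add(Executors.newSingleThreadExecutor());
        }
    }

    // Map a session to a fixed worker. floorMod keeps the index
    // non-negative even if the session id is negative.
    int workerFor(long sessionId) {
        return (int) Math.floorMod(sessionId, (long) workers.size());
    }

    // Every request of a session goes to the same worker, which is what
    // preserves per-session response order.
    void schedule(long sessionId, Runnable request) {
        workers.get(workerFor(sessionId)).execute(request);
    }

    // Stop accepting work and wait briefly for queued requests to finish.
    void shutdown() {
        for (ExecutorService w : workers) {
            w.shutdown();
        }
        try {
            for (ExecutorService w : workers) {
                w.awaitTermination(5, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Requests from different sessions that map to different workers can run concurrently, which is the relaxation the proposal relies on for read-heavy workloads.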
[jira] [Updated] (ZOOKEEPER-1505) Multi-thread CommitProcessor
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Shrauner updated ZOOKEEPER-1505:
    Attachment: ZOOKEEPER-1505.patch

Multi-thread CommitProcessor
    Key: ZOOKEEPER-1505
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1505
    Project: ZooKeeper
    Issue Type: Improvement
    Components: server
    Affects Versions: 3.4.3, 3.4.4, 3.5.0
    Reporter: Jay Shrauner
    Assignee: Jay Shrauner
    Labels: performance, scaling
    Fix For: 3.5.0
    Attachments: ZOOKEEPER-1505.patch
[jira] [Updated] (ZOOKEEPER-1504) Multi-thread NIOServerCnxn
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Shrauner updated ZOOKEEPER-1504:
    Attachment: ZOOKEEPER-1504.patch

Multi-thread NIOServerCnxn
    Key: ZOOKEEPER-1504
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1504
    Project: ZooKeeper
    Issue Type: Improvement
    Components: server
    Affects Versions: 3.4.3, 3.4.4, 3.5.0
    Reporter: Jay Shrauner
    Assignee: Jay Shrauner
    Labels: performance, scaling
    Fix For: 3.5.0
    Attachments: ZOOKEEPER-1504.patch, ZOOKEEPER-1504.patch
[jira] [Updated] (ZOOKEEPER-1504) Multi-thread NIOServerCnxn
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Shrauner updated ZOOKEEPER-1504:
    Attachment: (was: ZOOKEEPER-1504.patch)

Multi-thread NIOServerCnxn
    Key: ZOOKEEPER-1504
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1504
    Project: ZooKeeper
    Issue Type: Improvement
    Components: server
    Affects Versions: 3.4.3, 3.4.4, 3.5.0
    Reporter: Jay Shrauner
    Assignee: Jay Shrauner
    Labels: performance, scaling
    Fix For: 3.5.0
    Attachments: ZOOKEEPER-1504.patch
[jira] [Updated] (ZOOKEEPER-1505) Multi-thread CommitProcessor
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Shrauner updated ZOOKEEPER-1505:
    Description:

CommitProcessor has a single thread that both pulls requests off its queues and runs all downstream processors. This is noticeably inefficient for read-intensive workloads, which could be run concurrently. The trick is handling write transactions. I propose multi-threading this code according to the following two constraints:
- each session must see its requests responded to in order
- all committed transactions must be handled in zxid order, across all sessions

I believe these cover the only constraints we need to honor. In particular, I believe we can relax the following:
- it does not matter if the read request in one session happens before or after the write request in another session

With these constraints, I propose the following threads:
- 1 primary queue servicing/work dispatching thread
- 0-N assignable worker threads, where a given session is always assigned to the same worker thread

By always assigning a session to the same worker thread (using a simple sessionId mod number of worker threads), we guarantee the first constraint: requests we push onto the thread queue are processed in order. We guarantee the second constraint by allowing only a single commit transaction to be in flight at a time: the queue servicing thread blocks while a commit transaction is in flight, and when the transaction completes it clears the flag.

On a 32 core machine running Linux 2.6.38, I achieved the best performance with 32 worker threads, for a 56% +/- 5% improvement in throughput (this improvement was measured on top of that for ZOOKEEPER-1504, not in isolation).

New classes introduced in this patch are:
- WorkerService (also in ZOOKEEPER-1504): ExecutorService wrapper that makes worker threads daemon threads and names them in an easily debuggable manner. Supports assignable threads (as used here) and non-assignable threads (as used by NIOServerCnxnFactory).

Multi-thread CommitProcessor
    Key: ZOOKEEPER-1505
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1505
    Project: ZooKeeper
    Issue Type: Improvement
    Components: server
    Reporter: Jay Shrauner
    Assignee: Jay Shrauner
    Labels: performance, scaling
    Attachments: ZOOKEEPER-1505.patch
[jira] [Updated] (ZOOKEEPER-1504) Multi-thread NIOServerCnxn
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Shrauner updated ZOOKEEPER-1504:
    Fix Version/s: 3.5.0

Multi-thread NIOServerCnxn
    Key: ZOOKEEPER-1504
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1504
    Project: ZooKeeper
    Issue Type: Improvement
    Components: server
    Affects Versions: 3.4.3, 3.4.4, 3.5.0
    Reporter: Jay Shrauner
    Assignee: Jay Shrauner
    Labels: performance, scaling
    Fix For: 3.5.0
    Attachments: ZOOKEEPER-1504.patch
[jira] [Updated] (ZOOKEEPER-1505) Multi-thread CommitProcessor
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Shrauner updated ZOOKEEPER-1505:
    Fix Version/s: 3.5.0

Multi-thread CommitProcessor
    Key: ZOOKEEPER-1505
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1505
    Project: ZooKeeper
    Issue Type: Improvement
    Components: server
    Affects Versions: 3.4.3, 3.4.4, 3.5.0
    Reporter: Jay Shrauner
    Assignee: Jay Shrauner
    Labels: performance, scaling
    Fix For: 3.5.0
    Attachments: ZOOKEEPER-1505.patch
[jira] [Created] (ZOOKEEPER-1504) Multi-thread NIOServerCnxn
Jay Shrauner created ZOOKEEPER-1504:

    Summary: Multi-thread NIOServerCnxn
    Key: ZOOKEEPER-1504
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1504
    Project: ZooKeeper
    Issue Type: Improvement
    Components: server
    Reporter: Jay Shrauner
    Assignee: Jay Shrauner

NIOServerCnxnFactory is single threaded, which doesn't scale well to large numbers of clients. This is particularly noticeable when thousands of clients connect. I propose multi-threading this code as follows:
- 1 acceptor thread, for accepting new connections
- 1-N selector threads
- 0-M I/O worker threads

Numbers of threads are configurable, with defaults scaling according to number of cores. Communication with the selector threads is handled via LinkedBlockingQueues, and connections are permanently assigned to a particular selector thread so that all potentially blocking SelectionKey operations can be performed solely by the selector thread. An ExecutorService is used for the worker threads.

On a 32 core machine running Linux 2.6.38, I achieved the best performance with 4 selector threads and 64 worker threads, for a 70% +/- 5% improvement in throughput.

This patch incorporates and supersedes the patches for
https://issues.apache.org/jira/browse/ZOOKEEPER-517
https://issues.apache.org/jira/browse/ZOOKEEPER-1444

New classes introduced in this patch are:
- ExpiryQueue (from ZOOKEEPER-1444): factors out the logic from SessionTrackerImpl used to expire sessions so that the same logic can be used to expire connections
- RateLogger (from ZOOKEEPER-517): rate limits error message logging; currently only used to throttle logging of out-of-file-descriptors errors
- WorkerService: ExecutorService wrapper that makes worker threads daemon threads and names them in an easily debuggable manner
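The WorkerService behaviour described above (daemon worker threads with easily debuggable names) can be approximated with a plain ThreadFactory. This is only a sketch under assumptions: the class name NamedDaemonThreadFactory and the "prefix-N" naming scheme are hypothetical and are not taken from the patch.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the daemon-and-naming part only.
class NamedDaemonThreadFactory implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger counter = new AtomicInteger(1);

    NamedDaemonThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable r) {
        // A name such as "NIOWorker-3" shows up in thread dumps and logs,
        // which makes it clear which pool a thread belongs to.
        Thread t = new Thread(r, prefix + "-" + counter.getAndIncrement());
        // Daemon threads do not keep the JVM alive at shutdown.
        t.setDaemon(true);
        return t;
    }
}
```

Such a factory could be passed to Executors.newFixedThreadPool(64, new NamedDaemonThreadFactory("NIOWorker")) to get a pool of the size used in the measurement above.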
[jira] [Updated] (ZOOKEEPER-1504) Multi-thread NIOServerCnxn
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Shrauner updated ZOOKEEPER-1504:
    Attachment: ZOOKEEPER-1504.patch

Multi-thread NIOServerCnxn
    Key: ZOOKEEPER-1504
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1504
    Project: ZooKeeper
    Issue Type: Improvement
    Components: server
    Reporter: Jay Shrauner
    Assignee: Jay Shrauner
    Labels: perfomance
    Attachments: ZOOKEEPER-1504.patch
[jira] [Commented] (ZOOKEEPER-1444) Idle session-less connections never time out
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407572#comment-13407572 ] Jay Shrauner commented on ZOOKEEPER-1444: - Superseded and made obsolete by ZOOKEEPER-1504 Idle session-less connections never time out Key: ZOOKEEPER-1444 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1444 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.3.2, 3.4.3, 3.5.0 Reporter: Jay Shrauner Assignee: Jay Shrauner Priority: Critical Fix For: 3.5.0 Attachments: ZOOKEEPER-1444.patch, ZOOKEEPER-1444.patch A socket connection to the server on which a session is not created will never time out. A misbehaving client that opens and leaks connections without creating sessions will hold open file descriptors on the server. The existing timeout code is implemented at the session level, but the servers also should track and expire connections at the connection level. Proposed solution is to pull the timeout data structure handling code (hashmap of expiry time to sets of objects, simple monotonically incrementing nextExpirationTime) from SessionTrackerImpl into its own class in order to share it with connection level timeouts to be implemented in NIOServerCnxnFactory. Connections can be assigned a small initial timeout (proposing something small, like 3s) until a session is created, at which point the ServerCnxn session timeout can be used instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
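The timeout structure described above (a map from expiry time to sets of objects, plus a monotonically incrementing nextExpirationTime) might look roughly like this. It is a simplified sketch, not the ExpiryQueue class from the patch: the names ExpiryBuckets, touch and poll are hypothetical, and the current time is passed in explicitly (assumed to be non-decreasing) so the behaviour is deterministic.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of bucketed expiry, not the patch's ExpiryQueue.
class ExpiryBuckets<E> {
    private final long interval;
    private final Map<E, Long> expiryOf = new HashMap<>();
    private final Map<Long, Set<E>> buckets = new HashMap<>();
    private long nextExpirationTime;

    ExpiryBuckets(long intervalMs, long nowMs) {
        this.interval = intervalMs;
        this.nextExpirationTime = roundUp(nowMs);
    }

    // Round up to the next interval boundary so that elements with nearby
    // deadlines share a bucket.
    private long roundUp(long timeMs) {
        return (timeMs / interval + 1) * interval;
    }

    // Add an element or refresh its deadline, e.g. a short timeout for a
    // new connection and the session timeout once a session exists.
    synchronized void touch(E elem, long timeoutMs, long nowMs) {
        long newExpiry = roundUp(nowMs + timeoutMs);
        Long old = expiryOf.put(elem, newExpiry);
        if (old != null && old != newExpiry) {
            Set<E> oldBucket = buckets.get(old);
            if (oldBucket != null) {
                oldBucket.remove(elem);
                if (oldBucket.isEmpty()) {
                    buckets.remove(old);
                }
            }
        }
        buckets.computeIfAbsent(newExpiry, k -> new HashSet<>()).add(elem);
    }

    // Return the elements of the next bucket if its time has come and
    // advance nextExpirationTime by exactly one interval.
    synchronized Set<E> poll(long nowMs) {
        if (nowMs < nextExpirationTime) {
            return Collections.emptySet();
        }
        Set<E> expired = buckets.remove(nextExpirationTime);
        nextExpirationTime += interval;
        if (expired == null) {
            return Collections.emptySet();
        }
        for (E e : expired) {
            expiryOf.remove(e);
        }
        return expired;
    }
}
```

Because both sessions and connections only need "touch" and "poll", the same structure can serve session-level and connection-level timeouts, which is the sharing the proposal describes.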
[jira] [Commented] (ZOOKEEPER-517) NIO factory fails to close connections when the number of file handles run out.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407571#comment-13407571 ]

Jay Shrauner commented on ZOOKEEPER-517:

Superseded and made obsolete by ZOOKEEPER-1504

NIO factory fails to close connections when the number of file handles run out.

Key: ZOOKEEPER-517
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-517
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.4.3, 3.5.0
Reporter: Mahadev konar
Assignee: Jay Shrauner
Priority: Critical
Fix For: 3.5.0
Attachments: ZOOKEEPER-517.patch

The code in the NIO factory is such that if we fail to accept a connection for some reason (running out of file handles may be one of them), we do not close the connections that are in CLOSE_WAIT. We need to call an explicit close on these sockets. One solution might be to move doIO before accept so that we can still close connections even if we cannot accept new ones.
[jira] [Assigned] (ZOOKEEPER-1147) Add support for local sessions
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Shrauner reassigned ZOOKEEPER-1147:

Assignee: Jay Shrauner

Add support for local sessions

Key: ZOOKEEPER-1147
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1147
Project: ZooKeeper
Issue Type: Improvement
Components: server
Affects Versions: 3.3.3
Reporter: Vishal Kathuria
Assignee: Jay Shrauner
Labels: api-change, scaling
Fix For: 3.5.0
Original Estimate: 840h
Remaining Estimate: 840h

This improvement is in the bucket of making ZooKeeper work at a large scale. We are planning on having about 1 million clients connect to a ZooKeeper ensemble through a set of 50-100 observers. The majority of these clients are read-only, i.e. they do not do any updates or create ephemeral nodes.

In ZooKeeper today, the client creates a session and the session creation is handled like any other update. In the above use case, the session create/drop workload can easily overwhelm an ensemble.

The following is a proposal for a local session, to support a larger number of connections.
1. The idea is to introduce a new type of session: the local session. A local session does not have the full functionality of a normal session.
2. Local sessions cannot create ephemeral nodes.
3. Once a local session is lost, you cannot re-establish it using the session-id/password. The session and its watches are gone for good.
4. When a local session connects, the session info is only maintained on the ZooKeeper server (in this case, an observer) that it is connected to. The leader is not aware of the creation of such a session and there is no state written to disk.
5. The pings and expiration are handled by the server that the session is connected to.

With the above changes, we can make ZooKeeper scale to a much larger number of clients without making the core ensemble a bottleneck.

In terms of API, there are two options that are being considered:
1. Let the client specify at connect time which kind of session it wants.
2. All sessions connect as local sessions and automatically get promoted to global sessions when they do an operation that requires a global session (e.g. creating an ephemeral node).

Chubby took the approach of lazily promoting all sessions to global, but I don't think that would work in our case, where we want to keep sessions which never create ephemeral nodes as always local. Option 2 would make it more broadly usable but option 1 would be easier to implement.

We are thinking of implementing option 1 as the first cut. There would be a client flag, IsLocalSession (much like the current readOnly flag), that would be used to determine whether to create a local session or a global session.
[jira] [Created] (ZOOKEEPER-1505) Multi-thread CommitProcessor
Jay Shrauner created ZOOKEEPER-1505:

Summary: Multi-thread CommitProcessor
Key: ZOOKEEPER-1505
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1505
Project: ZooKeeper
Issue Type: Improvement
Components: server
Reporter: Jay Shrauner
Assignee: Jay Shrauner
Attachments: ZOOKEEPER-1505.patch

CommitProcessor has a single thread that both pulls requests off its queues and runs all downstream processors. This is noticeably inefficient for read-intensive workloads, which could be run concurrently. The trick is handling write transactions.

I propose multi-threading this code according to the following two constraints:
- each session must see its requests responded to in order
- all committed transactions must be handled in zxid order, across all sessions

I believe these cover the only constraints we need to honor. In particular, I believe we can relax the following:
- it does not matter if the read request in one session happens before or after the write request in another session

With these constraints, I propose the following threads:
- 1 primary queue servicing/work dispatching thread
- 0-N assignable worker threads, where a given session is always assigned to the same worker thread

By assigning sessions always to the same worker thread (using a simple sessionId mod number of worker threads), we guarantee the first constraint--requests we push onto the thread queue are processed in order. The way we guarantee the second constraint is that we only allow a single commit transaction to be in flight at a time--the queue servicing thread blocks while a commit transaction is in flight, and when the transaction completes it clears the flag.

On a 32-core machine running Linux 2.6.38, I achieved the best performance with 32 worker threads, for a 56% +/- 5% improvement in throughput (this improvement was measured on top of that for ZOOKEEPER-1504, not in isolation).

New classes introduced in this patch are:
- WorkerService (also in ZOOKEEPER-1504): ExecutorService wrapper that makes worker threads daemon threads and names them in an easily debuggable manner. Supports assignable threads (as used here) and non-assignable threads (as used by NIOServerCnxnFactory).
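The session-to-worker assignment described above (sessionId mod number of worker threads, so that one session's requests stay in order) can be sketched with one single-threaded executor per worker. The class name SessionAffineWorkers, its method names and the thread naming are assumptions for illustration, not the WorkerService or CommitProcessor code from the patch. The sketch covers only the first constraint; the single-commit-in-flight rule is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only; not the WorkerService or CommitProcessor code from the patch.
final class SessionAffineWorkers {
    private final List<ExecutorService> workers = new ArrayList<>();

    SessionAffineWorkers(int numWorkers) {
        for (int i = 0; i < numWorkers; i++) {
            final String name = "CommitProcWorker-" + i; // hypothetical naming scheme
            // One thread per worker: tasks submitted to the same worker run in submission order.
            workers.add(Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r, name);
                t.setDaemon(true);
                return t;
            }));
        }
    }

    // sessionId mod number of workers; floorMod keeps the index non-negative for negative ids.
    int workerFor(long sessionId) {
        return (int) Math.floorMod(sessionId, (long) workers.size());
    }

    // Requests of one session always go to the same worker, preserving per-session order.
    Future<?> submit(long sessionId, Runnable request) {
        return workers.get(workerFor(sessionId)).submit(request);
    }

    // Stop accepting work and wait briefly for queued requests to finish.
    void shutdown() throws InterruptedException {
        for (ExecutorService w : workers) {
            w.shutdown();
        }
        for (ExecutorService w : workers) {
            w.awaitTermination(5, TimeUnit.SECONDS);
        }
    }
}
```

Requests from different sessions may still interleave freely across workers, which matches the relaxation in the proposal that a read in one session may happen before or after a write in another.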
[jira] [Updated] (ZOOKEEPER-1444) Idle session-less connections never time out
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Shrauner updated ZOOKEEPER-1444:

Attachment: ZOOKEEPER-1444.patch

Idle session-less connections never time out

Key: ZOOKEEPER-1444
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1444
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.3.2, 3.4.3, 3.5.0
Reporter: Jay Shrauner
Assignee: Jay Shrauner
Priority: Critical
Fix For: 3.5.0
Attachments: ZOOKEEPER-1444.patch
[jira] [Updated] (ZOOKEEPER-1444) Idle session-less connections never time out
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Shrauner updated ZOOKEEPER-1444:

Attachment: ZOOKEEPER-1444.patch

Idle session-less connections never time out

Key: ZOOKEEPER-1444
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1444
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.3.2, 3.4.3, 3.5.0
Reporter: Jay Shrauner
Assignee: Jay Shrauner
Priority: Critical
Fix For: 3.5.0
Attachments: ZOOKEEPER-1444.patch, ZOOKEEPER-1444.patch
[jira] [Commented] (ZOOKEEPER-1444) Idle session-less connections never time out
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267170#comment-13267170 ]

Jay Shrauner commented on ZOOKEEPER-1444:

The automated test runs are looking pretty flaky. Is this typical? They all pass in my client.

Idle session-less connections never time out

Key: ZOOKEEPER-1444
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1444
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.3.2, 3.4.3, 3.5.0
Reporter: Jay Shrauner
Assignee: Jay Shrauner
Priority: Critical
Fix For: 3.5.0
Attachments: ZOOKEEPER-1444.patch, ZOOKEEPER-1444.patch