Hi guys,

We spent a couple of days this week with Julien and Jeff during EclipseCon working on MINA 3. We experimented with a few things, ran some benchmarks, and studied them. This is a short summary of what we did and the results we got.
1) Performance

We ran some tests comparing MINA 3 and Netty 3 over TCP. Basically, we ran our benchmark code either locally (client and server on one machine) or across two machines (client and server on separate machines). What it shows is that the difference between MINA 3 (M3) and Netty 3 (N3) varies with the size of the exchanged messages: M3 is slightly faster up to 100 Kb messages, then N3 is faster up to 1 Gb messages, and beyond that N3 clearly runs into problems.

When we run the server on one machine and the client on another, we are CPU bound. On my machine, we can reach roughly 65,000 1 Kb messages per second (with either M3 or N3); there is no statistically relevant difference. The CPU is at 90%, with roughly 85% system time, which means the CPU is busy processing the sockets and the impact of our own code is insignificant. Note that we measured reads, not writes.

2) Analysis

One of the major differences between M3 and N3 is buffer usage. There are two kinds of buffers: direct and heap. Direct buffers are allocated outside the JVM heap; heap buffers are allocated within it. It's important to understand that only direct buffers can be written to a socket, so at some point we must move the data into a direct buffer. So basically, we would like to push the message into a direct buffer as soon as possible, e.g. in the encoder. That means we have to allocate a direct buffer to do the job. It seems like a smart idea at first, but... there is a bug in the JVM: http://bugs.sun.com/view_bug.do?bug_id=4469299. It says: "In some cases, particularly with applications with large heaps and light to moderate loads where collections happen infrequently, the Java process can consume memory to the point of process address space exhaustion." Bottom line: as soon as you allocate heavily, you might get an OOM, even with direct buffers.
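For those who haven't played with the two kinds of buffers, here is a trivial illustration (plain NIO, nothing MINA-specific; the class name is made up):

```java
import java.nio.ByteBuffer;

public class BufferKinds {
    public static void main(String[] args) {
        // Heap buffer: backed by a byte[] inside the JVM heap.
        ByteBuffer heap = ByteBuffer.allocate(1024);

        // Direct buffer: allocated outside the JVM heap; this is the kind
        // the OS can hand to a socket without an extra copy.
        ByteBuffer direct = ByteBuffer.allocateDirect(1024);

        System.out.println(heap.isDirect());    // false
        System.out.println(direct.isDirect());  // true
        System.out.println(heap.hasArray());    // true: accessible byte[] backing
        System.out.println(direct.hasArray());  // false: no accessible byte[]
    }
}
```

When a heap buffer is handed to a socket channel, the JDK copies its content into a direct buffer behind the scenes, which is exactly the copy we are discussing here.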
One more problem is that there is a hard limit on the amount of direct memory you can allocate, defined by the -XX:MaxDirectMemorySize=<size> parameter. It defaults to 64 MB in Java 6, or to the size you have set with the -Xmx parameter, depending on the JVM; you can't go any further. All in all, it's pretty much the same situation as with heap buffers. Assuming that allocating a direct buffer is about twice as expensive as allocating a heap buffer (again, it depends on the Java version you are using), it's quite important not to allocate too many direct buffers.

To work around the JVM bug, Sun suggests three possibilities:
1) Insert occasional explicit System.gc() invocations.
2) Reduce the size of the young generation to force more frequent GCs.
3) Explicitly pool direct buffers at the application level.

N3 implements the third approach, which is expensive and creates a problem as soon as you send big messages, leading to the poor performance we observed in that case. We have a possible different approach: never allocate a direct buffer, always use a heap buffer. This costs around 3% in performance, but it eliminates the problem. Calling the GC is simply not an option.

3) Write performance

Writing data into a socket is tricky: we never know in advance how many bytes we will be able to write, and the data must end up in a direct buffer before it can be written to the socket. There are a few possible strategies:
1) write the heap buffer into the channel;
2) write a direct buffer into the channel;
3) take a chunk of the heap buffer, copy it into a direct buffer, and write that into the channel.

In case (1), we delegate the copy of the buffer to the channel. If the heap buffer is huge, we might copy it many times, as channel.write(buffer) only returns the number of bytes actually written.
Hopefully, channel.write() will not copy the whole heap buffer into one huge direct buffer, but we have no way to control what it does.

In case (2), we allocate a huge direct buffer and put everything into it. The advantage is that the copy is done only once, and we don't have to care about what happens inside the write() method. But the main issue is that we will potentially hit the JVM bug.

In case (3), we can try an approach that deals with both issues: we allocate a direct buffer associated with each thread - so only a few of them are ever allocated - and we copy at most a number of bytes determined by the socket sendBufferSize (roughly 64 Kb). We then copy data from the heap buffer to the direct buffer on each round, and if everything goes well, we do the minimal number of copies. However, we may perfectly well have to copy the data many times, as the direct buffer might be shared with many other sessions.

All in all, there is no perfect strategy here. We can improve the third strategy with an adaptive copy: since we know how many bytes were written, we can limit the number of bytes we copy into the direct buffer to roughly the amount the socket was recently able to accept. The important thing to remember is that we *have* to keep the buffer to send until it has been fully written, which may cause problems when clients are slow readers and the server has many clients to serve.

4) Selectors

There is no measurable difference on the server whether we use one single selector or many. It seems that most of the time is consumed in the select() method, no matter what. The original design, where we created many selectors (as many as we have processors, plus one), seems to be based on some urban legend, or at least on Java 4 behavior. We have to reassess this design.

5) Conclusion

We have more tests to conduct.
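As a side note, the per-thread direct buffer idea from strategy (3) above could be sketched roughly like this (class and method names are made up, and a real implementation would re-register the session for OP_WRITE instead of just breaking out of the loop):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

/** Sketch of write strategy (3): copy through a per-thread direct buffer. */
public class ChunkedWriter {
    // One direct buffer per thread, sized near a typical socket send buffer,
    // so only a handful of direct buffers are ever allocated.
    private static final ThreadLocal<ByteBuffer> DIRECT =
        ThreadLocal.withInitial(() -> ByteBuffer.allocateDirect(64 * 1024));

    /** Drains the heap buffer into the channel, one chunk at a time. */
    public static long write(WritableByteChannel channel, ByteBuffer heapBuffer)
            throws IOException {
        ByteBuffer direct = DIRECT.get();
        long total = 0;

        while (heapBuffer.hasRemaining()) {
            direct.clear();
            // Copy at most one chunk, without overflowing the direct buffer.
            int chunk = Math.min(direct.remaining(), heapBuffer.remaining());
            ByteBuffer slice = heapBuffer.duplicate();
            slice.limit(slice.position() + chunk);
            direct.put(slice);
            direct.flip();

            int written = channel.write(direct);
            total += written;
            heapBuffer.position(heapBuffer.position() + written);

            if (written < chunk) {
                // Socket send buffer is full: stop here and let the caller
                // keep the remaining bytes queued until the socket is writable.
                break;
            }
        }
        return total;
    }
}
```

The adaptive variant would additionally shrink the chunk size toward the number of bytes the last write() actually accepted.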
This is not simple: it all depends on the JVM we run the server on, and many of these aspects can be configured. The next step would be to run the various scenarios on different JVMs, with different message sizes. We may need to design a pluggable system for handling the reads and the writes; we could use a factory for that. Bottom line, we would also like to compare an NIO-based server with a BIO-based server. I'm not sure the performance penalty is that big with Java 7. Java 7 also handles buffers much better than Java 6. There is no reason to use Java 6 these days, it's dead anyway. It would be interesting to benchmark Java 8 to see what it brings.

Thanks!

--
Regards, Cordialement,
Emmanuel Lécharny
www.iktek.com