Hi guys,

we spent a couple of days this week with Julien and Jeff during
EclipseCon working on MINA 3. We experimented with a few things, ran some
benchmarks, and studied the results. This is a short summary of what we
did and the results we got.

1) Performance

We ran some tests with MINA 3 and Netty 3 over TCP. Basically, we ran
the benchmark code we have either locally (the client and the server on
one machine) or with two machines (the server and the client on separate
machines). What it shows is that the difference between MINA 3 (M3) and
Netty 3 (N3) varies with the size of the exchanged messages. M3 is
slightly faster up to 100Kb messages, then N3 is faster up to 1Gb
messages, then N3 clearly has some problems.

When we conduct tests with the server on one machine and the client on
another machine, we are CPU bound. On my machine, we can reach roughly
65 000 1kb messages per second (either with M3 or N3). There is no
statistically relevant difference. The CPU is at 90%, with roughly 85%
system time, which means the CPU is busy processing the sockets; the
impact of our own code is insignificant. Note that we have measured
reads, not writes.

2) Analysis

One of the major differences between M3 and N3 is buffer usage. There
are two kinds of buffers: direct and heap. Direct buffers are allocated
outside the JVM heap, in native memory; heap buffers are allocated within
the JVM memory. It's important to understand that only direct buffers can
be written to a socket, so at some point we must move the data into a
direct buffer.
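
To make the distinction concrete, here is a minimal standalone
illustration of the two allocation calls (nothing MINA-specific, just
the plain NIO API):

    import java.nio.ByteBuffer;

    /** The two kinds of buffers. Only the direct one can be handed to
     *  the OS as-is; heap bytes must be copied to native memory first. */
    public class BufferKinds {
        public static void main(String[] args) {
            ByteBuffer heap = ByteBuffer.allocate(1024);         // backed by a byte[] on the heap
            ByteBuffer direct = ByteBuffer.allocateDirect(1024); // native memory, outside the heap

            System.out.println("heap:   isDirect=" + heap.isDirect()
                    + ", hasArray=" + heap.hasArray());   // false, true
            System.out.println("direct: isDirect=" + direct.isDirect()
                    + ", hasArray=" + direct.hasArray()); // true, false
        }
    }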

So basically, we would like to push the message into a direct buffer as
soon as possible, for instance in the encoder. That means we have to
allocate a direct buffer to do the job. It seems like a smart idea at
first, but...

There is a bug in the JVM : http://bugs.sun.com/view_bug.do?bug_id=4469299

It says "In some cases, particularly with applications with large heaps
and light to moderate loads where collections happen infrequently, the
Java process can consume memory to the point of process address space
exhaustion.". Bottom line, as soon as you have heavy allocations, you
might get OOM, even for Direct Buffers.

One more problem is that there is a physical limit on how much direct
memory you can allocate, and it's defined by a parameter:
-XX:MaxDirectMemorySize=<size>. It defaults to 64M in Java 6, and to the
size you have set with the -Xmx parameter in more recent versions. You
can't get any farther. All in all, it's pretty much the same situation as
with heap buffers. Assuming that allocating a direct buffer is twice as
expensive as allocating a heap buffer (again, it depends on the Java
version you are using), it's quite important not to allocate too many
direct buffers.
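
You can see the limit for yourself with a trivial loop (illustrative
code; run with e.g. -XX:MaxDirectMemorySize=64m):

    import java.nio.ByteBuffer;
    import java.util.ArrayList;
    import java.util.List;

    /** Allocates direct buffers until the native limit is hit. */
    public class DirectLimitDemo {
        public static void main(String[] args) {
            List<ByteBuffer> buffers = new ArrayList<ByteBuffer>();
            try {
                while (true) {
                    // Each call reserves native memory, counted against
                    // -XX:MaxDirectMemorySize, not against -Xmx
                    buffers.add(ByteBuffer.allocateDirect(1024 * 1024)); // 1Mb
                }
            } catch (OutOfMemoryError e) {
                System.out.println("Direct memory exhausted after "
                        + buffers.size() + "Mb: " + e.getMessage());
            }
        }
    }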

In order to work around the JVM bug, Sun suggests three possibilities:
1) Insert occasional explicit System.gc() invocations
2) Reduce the size of the young generation to force more frequent GCs.
3) Explicitly pool direct buffers at the application level.
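
For the record, a bare-bones version of option (3) could look like the
sketch below. This is just an illustration of the idea, not what N3
actually does; the class name and the 64kb size are mine:

    import java.nio.ByteBuffer;
    import java.util.concurrent.ConcurrentLinkedQueue;

    /** Minimal application-level direct buffer pool (option 3). */
    public class DirectBufferPool {
        private static final int BUFFER_SIZE = 64 * 1024;

        private final ConcurrentLinkedQueue<ByteBuffer> pool =
                new ConcurrentLinkedQueue<ByteBuffer>();

        public ByteBuffer acquire() {
            ByteBuffer buffer = pool.poll();
            if (buffer == null) {
                // Allocate lazily: allocation is the expensive part
                // we are trying to amortize
                buffer = ByteBuffer.allocateDirect(BUFFER_SIZE);
            }
            buffer.clear();
            return buffer;
        }

        public void release(ByteBuffer buffer) {
            // Buffers are never freed, only recycled, so the GC is
            // never asked to reclaim native memory under load
            pool.offer(buffer);
        }
    }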

N3 has implemented the third approach, which is expensive and creates a
problem as soon as you send big messages, thus leading to the bad
performance we get in this case in M3.
We have a possible different approach: never allocate a direct buffer,
always use a heap buffer. This leads to a penalty of around 3% in
performance, but it eliminates the problem.

Calling the GC is simply not an option.

3) Write performance

Writing data into a socket is tricky: we never know in advance how many
bytes we will be able to write, and the data must end up in a direct
buffer before it can be written to the socket. There are a few possible
strategies:

1) write the heap buffer into the channel
2) write a direct buffer into the channel
3) get a chunk of the heap buffer, copy it into a direct buffer, and
write it into the channel.

In case (1), we delegate the copy of the buffer to the channel. If the
heap buffer is huge, we might copy it many times, as
channel.write(buffer) only returns the number of bytes written.
Hopefully, channel.write() will not copy the whole heap buffer into
a huge direct buffer, but we have no way to control what it does.

In case (2), that means we allocate a huge direct buffer and put
everything into it. It has the advantage of being done only once, and we
don't have to care about what's going on in the write() method. But
the main issue is that we will potentially hit the JVM bug.

In case (3), we can have an approach that tries to deal with both issues:
we allocate one direct buffer per thread - so only a few will ever be
allocated - and we copy at most a number of bytes determined by the
socket sendBufferSize (roughly 64kb). We then copy the data from the heap
buffer to the direct buffer on each round, and if everything goes well,
we do the minimal number of copies. However, we may perfectly well have
to copy the same data many times, as the direct buffer might be shared
with many other sessions.

All in all, there is no perfect strategy here. We can improve the third
strategy by using an adaptive copy: since we know how many bytes were
written on the previous rounds, we can limit the number of bytes we copy
into the direct buffer to the size the socket was recently able to send.
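
Here is a sketch of what strategy (3) plus the adaptive copy could look
like. It's an illustration of the idea, not MINA code; the 64kb chunk
size and the class name are assumptions:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.SocketChannel;

    /** Strategy (3) with an adaptive copy: one direct buffer per thread,
     *  and a copy size capped by what the socket took last round. */
    public class AdaptiveChunkedWriter {
        private static final int CHUNK_SIZE = 64 * 1024; // ~ socket sendBufferSize

        private static final ThreadLocal<ByteBuffer> DIRECT =
                new ThreadLocal<ByteBuffer>() {
                    @Override
                    protected ByteBuffer initialValue() {
                        return ByteBuffer.allocateDirect(CHUNK_SIZE);
                    }
                };

        private int lastWritten = CHUNK_SIZE; // per-session adaptive cap

        /** Copies one chunk of the heap buffer into the thread's direct
         *  buffer, writes it, and returns how many bytes the socket took. */
        public int write(SocketChannel channel, ByteBuffer heapBuffer)
                throws IOException {
            ByteBuffer direct = DIRECT.get();
            direct.clear();

            int length = Math.min(heapBuffer.remaining(),
                    Math.min(lastWritten, CHUNK_SIZE));

            // Copy one chunk without disturbing the heap buffer's limit
            int savedLimit = heapBuffer.limit();
            heapBuffer.limit(heapBuffer.position() + length);
            direct.put(heapBuffer);
            heapBuffer.limit(savedLimit);
            direct.flip();

            int written = channel.write(direct);
            // Give back the bytes the socket refused; they will be
            // re-copied on the next round
            heapBuffer.position(heapBuffer.position() - (length - written));
            lastWritten = Math.max(written, 1); // adapt, never copy 0 bytes
            return written;
        }
    }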

The important thing to remember is that we *have* to keep the buffer to
send in a queue until it has been fully written, which may lead to
problems when the clients are slow readers and the server has many
clients to serve.
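
A sketch of such a pending-write queue (names are illustrative): buffers
stay queued until fully written, and the OP_WRITE interest is dropped
once the queue drains:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.SocketChannel;
    import java.util.LinkedList;
    import java.util.Queue;

    /** Per-session queue of buffers waiting to be fully written. */
    public class WriteQueue {
        private final Queue<ByteBuffer> pending = new LinkedList<ByteBuffer>();

        public void enqueue(ByteBuffer buffer) {
            pending.add(buffer);
        }

        /** Called when the selector reports the channel writable. */
        public void flush(SocketChannel channel, SelectionKey key)
                throws IOException {
            while (!pending.isEmpty()) {
                ByteBuffer head = pending.peek();
                channel.write(head);
                if (head.hasRemaining()) {
                    // Socket buffer full: keep the rest for the next
                    // OP_WRITE. Slow readers make this queue grow,
                    // hence the problem with many clients.
                    return;
                }
                pending.remove();
            }
            // Queue drained: stop asking for write readiness
            key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);
        }
    }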

4) Selectors

There is no measurable difference on the server whether we use one single
selector or many. It seems that most of the time is consumed in the
select() method, no matter what. The original design, where we created
many selectors (as many as we have processors, plus one), seems to be
based on some urban legend, or at least on Java 4 era behavior. We have
to reassess this design.
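
For reference, the single-selector design boils down to a loop roughly
like this (stripped-down sketch; port and handler details omitted):

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;
    import java.util.Iterator;

    /** One Selector servicing every channel, accepts included. */
    public class SingleSelectorServer {
        public static void main(String[] args) throws IOException {
            Selector selector = Selector.open();
            ServerSocketChannel server = ServerSocketChannel.open();
            server.socket().bind(new InetSocketAddress(8080));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            while (true) {
                selector.select(); // this is where the time goes
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    }
                    // ... OP_READ / OP_WRITE handled on the same selector ...
                }
            }
        }
    }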

5) Conclusion

We have more tests to conduct. This is not simple: it all depends on the
JVM we run the server on, and many of these aspects can be configured.

The next steps would be to conduct tests with the various scenarios, on
different JVMs, with different message sizes. We may need to design a
pluggable system for handling the reads and the writes; we could use a
factory for that.

Bottom line, we would also like to compare an NIO based server with a BIO
based server. I'm not sure that we have a big performance penalty with
Java 7.

Java 7 is also way better than Java 6 in the way it handles buffers.
There is no reason to use Java 6 these days; it's dead anyway. It would
be interesting to benchmark Java 8 to see what it brings.


Thanks!

-- 
Regards,
Emmanuel Lécharny
www.iktek.com 
