[ https://issues.apache.org/jira/browse/KAFKA-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384660#comment-14384660 ]
Rajiv Kurian commented on KAFKA-2045:
-------------------------------------

1. "We can actually make serious performance improvements by improving memory allocation patterns" - Yeah, this is definitely the crux of it. <rant>Any performance improvement should also be evaluated for long-term effects like GC activity and longest GC pause, in addition to raw throughput. Even the throughput and latency numbers have to be watched over a long run, especially in an application whose working set doesn't fit in the L1 or L2 caches. I have usually found that with Java most benchmarks (even ones conducted with JMH) lie because of how short in duration they are. Since Java has a Thread Local Allocation Buffer, objects allocated in quick succession also get allocated next to each other in memory. So even though an ArrayList of objects is an array of pointers to objects, the fact that the objects were allocated next to each other means they get 95% (hand wave hand wave) of the benefits of an equivalent std::vector of structs in C++. The nice memory-striding effects of sequential buffers hold even for a linked list of objects, again provided the objects themselves were allocated next to each other. But over time, even if not a single object in the ArrayList is deleted or shuffled, a garbage collection is very likely to move the objects around in memory, and when that happens they don't move as a unit but separately. What began as sequential access degenerates into an array of pointers to randomly laid out objects, and the performance of that is an order of magnitude lower than an array of sequentially laid out structs in C. A ByteBuffer/sun.misc.Unsafe based approach, on the other hand, never changes its memory layout, so the benefits continue to hold. This is why, in my experience, the 99.99th and higher percentiles of typical POJO based solutions tank and end up orders of magnitude worse than the 99th, whereas solutions based on ByteBuffers and sun.misc.Unsafe have 99.99s that are maybe 4-5 times worse than the 99th.</rant> But again, there might (will?) be other bottlenecks, like the network or CRC computation, that show up before one can get the maximum out of such a design.

2. "We don't mangle the code too badly in doing so" - I am planning to write a prototype from scratch that would include things like on-the-fly protocol parsing, buffer management and socket management. I'll keep looking at/copying the existing code to make sure I handle errors correctly. It is just easier to start fresh - that way I can work solely on getting this to work rather than worrying about how to fit the design into the current class hierarchy. A separate, no-strings-attached prototype will also probably provide the best platform for a performance demo, since I can use things like primitive-array based open hash maps and other non-allocating, primitives-based data structures for metadata management. It just gives me a lot of options without messing with trunk. I've put rough sketches of both ideas at the end of this comment. If this works out and we see an improvement in performance that seems interesting, we can work out how best to fit it in without mangling the code. Thoughts?
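To make the ByteBuffer idea from point 1 (and the on-the-fly parsing from point 2) a bit more concrete, here is the flavor of flyweight I have in mind. The record layout below (int64 offset, int32 key length, int32 value length, then the bytes) and all of the names are made up purely for illustration - this is not the actual Kafka fetch response format:

{code:java}
import java.nio.ByteBuffer;

/**
 * Sketch of a flyweight that parses records in place from a reused ByteBuffer
 * instead of materializing per-record POJOs. The header layout here
 * (int64 offset, int32 key length, int32 value length, then key/value bytes)
 * is hypothetical and only meant to show the access pattern.
 */
public class RecordFlyweight {
    private static final int OFFSET_POS = 0;
    private static final int KEY_LEN_POS = 8;
    private static final int VALUE_LEN_POS = 12;
    private static final int HEADER_SIZE = 16;

    private ByteBuffer buffer;   // reused across fetch responses
    private int recordStart;     // absolute position of the current record

    /** Point the flyweight at a record; no copying, no allocation. */
    public void wrap(ByteBuffer buffer, int recordStart) {
        this.buffer = buffer;
        this.recordStart = recordStart;
    }

    public long offset() {
        return buffer.getLong(recordStart + OFFSET_POS);
    }

    public int keyLength() {
        return buffer.getInt(recordStart + KEY_LEN_POS);
    }

    public int valueLength() {
        return buffer.getInt(recordStart + VALUE_LEN_POS);
    }

    /** Total size of this record, so the caller can step to the next one. */
    public int sizeInBytes() {
        return HEADER_SIZE + keyLength() + valueLength();
    }

    /** Copy the value out only when the application actually needs its own copy. */
    public void readValueInto(byte[] dst) {
        int valueStart = recordStart + HEADER_SIZE + keyLength();
        for (int i = 0; i < valueLength(); i++) {
            dst[i] = buffer.get(valueStart + i);
        }
    }
}
{code}

Iterating a fetched buffer is then just wrap(buffer, pos), read the fields you need, pos += sizeInBytes(). No object is created per record, the layout we iterate over is exactly the layout the bytes arrived in, a GC can never rearrange it, and the same buffer can go back to a pool and be reused for the next fetch.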
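And this is roughly what I mean by a primitive-array based open hash map for metadata - again just a sketch (linear probing, power-of-two capacity, no deletes or resizing), not something I'd propose verbatim:

{code:java}
/**
 * Sketch of a primitive-array based open hash map (long key -> int value) for
 * metadata such as a partition id -> buffer slot mapping. Linear probing,
 * power-of-two capacity, no deletes and no resizing, so the table must be
 * sized larger than the number of entries it will ever hold. All names here
 * are made up for this example.
 */
public class LongToIntOpenHashMap {
    public static final int NO_ENTRY = -1;

    private final long[] keys;
    private final int[] values;
    private final boolean[] used;
    private final int mask;

    public LongToIntOpenHashMap(int powerOfTwoCapacity) {
        keys = new long[powerOfTwoCapacity];
        values = new int[powerOfTwoCapacity];
        used = new boolean[powerOfTwoCapacity];
        mask = powerOfTwoCapacity - 1;
    }

    public void put(long key, int value) {
        int slot = slotFor(key);
        while (used[slot] && keys[slot] != key) {
            slot = (slot + 1) & mask;          // linear probing
        }
        used[slot] = true;
        keys[slot] = key;
        values[slot] = value;
    }

    public int get(long key) {
        int slot = slotFor(key);
        while (used[slot]) {
            if (keys[slot] == key) {
                return values[slot];
            }
            slot = (slot + 1) & mask;
        }
        return NO_ENTRY;
    }

    private int slotFor(long key) {
        long h = key;
        h ^= (h >>> 33);                       // cheap bit mixing
        h *= 0xff51afd7ed558ccdL;
        h ^= (h >>> 33);
        return (int) (h & mask);
    }
}
{code}

The point is simply that lookups and inserts touch only primitive arrays, so steady-state metadata management allocates nothing and adds no GC pressure.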
> Memory Management on the consumer
> ---------------------------------
>
>                 Key: KAFKA-2045
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2045
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Guozhang Wang
>
> We need to add the memory management on the new consumer like we did in the
> new producer. This would probably include:
> 1. byte buffer re-usage for fetch response partition data.
> 2. byte buffer re-usage for on-the-fly de-compression.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)