I have filed a bug report with Oracle on this issue today.  I have not yet been 
assigned a bug number.  I thought the issue might be of general interest to the 
Kafka community and I have a suggestion for a configuration change that works 
around the issue in case other people tun into it.

First the issue I am about to describe will be hard to believe, but, I do 
provide a short Java program that demonstrates the JVM GC issue that triggers 
the long ParNewGC pauses under certain conditions.  Before you dismiss the 
analysis out of hand I suggest you try try test program below.  I have tested 
it in multiple environments: linux dev box, linux laptop, and a linux EC2 
instance.  Other engineers have also done so in their own environments.  I 
would be interested to know what you see.

First the conditions:

1) More than a 1G heap (we ran into it when we increased the heap from 1G to 
2G).
2) Requires at least some topics defined.  Not sure the exact amount.  We don't 
see this issue in all our environments.
3) Log compaction enabled.
4) Using JVM other than Oracle 6.  I have reproduced the underlying issue (with 
the test program below) using OpenJDK 6, Oracle 7, OpenJDK 7, and Oracle 8 and 
OpenJDK 8.

Under these conditions we can see 9 second or longer pauses due to individual 
ParNew GC events (which is greater than the default 6 second ZooKeeper session 
timeout).  It tends to be very reproducible in a given environment (e.g. either 
always happens or it never happens).  We are using Kafka 0.8.1.1.

The long ParNewGC delays are triggered by the allocation of a large static object during 
startup.  Specially the SkimpyOffsetMap bytes buffer.  There is no reason why this should 
be the case and that is why I have opened a JVM bug with Oracle.  The even more 
unbelievable part is this bug only seems to trigger a problem for static objects in the 
480M to 512M range.  The default size of this byte buffer is 500M.  Our current 
workaround it to configure this buffer to use 400M instead.  We have verified that this 
makes the issue go away in multiple "real" Kafka environments.  Using the G1 GC 
is another option we are considering (and seems to work).

Here is the test program (note that the size of the static object matches the 
default size of the SkimpyOffsetMap bytes buffer):

import java.util.ArrayList;

public class LongParNewPause {

   static byte[] bigStaticObject;

   public static void main(String[] args) throws Exception {
       int bigObjSize    = args.length > 0 ? Integer.parseInt(args[0]) : 
524288000;
       int littleObjSize = args.length > 1 ? Integer.parseInt(args[1]) : 100;
       int saveFraction  = args.length > 2 ? Integer.parseInt(args[2]) : 15;

       bigStaticObject = new byte[bigObjSize];
       ArrayList<byte[]> holder = new ArrayList<byte[]>();

       int i = 0;
       while (true) {
           byte[] local = new byte[littleObjSize];
           if (i++ % saveFraction == 0) {
               holder.add(local);
           }
       }
   }
}


You should run it with the following JVM options:

-verbose:gc
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-Xmx2G
-Xms2G

Here is what I see using Oracle 8:

0.355: [GC (Allocation Failure) 0.355: [ParNew: 272640K->18193K(306688K), 
0.0265380 secs] 784640K->530193K(4160256K), 0.0266064 secs] [Times: user=0.07 
sys=0.00, real=0.03 secs]
0.420: [GC (Allocation Failure) 0.420: [ParNew: 290833K->23124K(306688K), 
11.4522887 secs] 802833K->552573K(4160256K), 11.4523296 secs] [Times: user=42.51 
sys=0.18, real=11.45 secs]
11.902: [GC (Allocation Failure) 11.902: [ParNew: 295764K->33361K(306688K), 
25.6937427 secs] 825213K->580830K(4160256K), 25.6937906 secs] [Times: user=100.63 
sys=0.11, real=25.70 secs]
37.627: [GC (Allocation Failure) 37.627: [ParNew: 306001K->25620K(306688K), 
0.0219097 secs] 853470K->591124K(4160256K), 0.0219448 secs] [Times: user=0.08 
sys=0.00, real=0.02 secs]
...

If I pass in 500,000,000 for the size:

0.344: [GC (Allocation Failure) 0.344: [ParNew: 272640K->18181K(306688K), 
0.0253515 secs] 760921K->506462K(4160256K), 0.0253957 secs] [Times: user=0.04 
sys=0.01, real=0.02 secs]
0.408: [GC (Allocation Failure) 0.408: [ParNew: 290821K->31804K(306688K), 
0.0499212 secs] 779102K->537534K(4160256K), 0.0499556 secs] [Times: user=0.08 
sys=0.01, real=0.05 secs]
0.491: [GC (Allocation Failure) 0.491: [ParNew: 304444K->27084K(306688K), 
0.0286135 secs] 810174K->550834K(4160256K), 0.0286590 secs] [Times: user=0.08 
sys=0.00, real=0.03 secs]
0.553: [GC (Allocation Failure) 0.554: [ParNew: 299724K->27617K(306688K), 
0.0387482 secs] 823474K->569401K(4160256K), 0.0387790 secs] [Times: user=0.07 
sys=0.01, real=0.04 secs]
0.625: [GC (Allocation Failure) 0.625: [ParNew: 300257K->21936K(306688K), 
0.0368052 secs] 842041K->584850K(4160256K), 0.0368367 secs] [Times: user=0.08 
sys=0.00, real=0.03 secs]
...

Playing with the other two parameters can vary the results, but, they are all bad if the 
size of the static object is in the "bad" range.


James Lent
Senior Software Engineer

Digitalsmiths
A TiVo Company

www.digitalsmiths.com<http://www.digitalsmiths.com/>
jl...@digitalsmiths.com<mailto:jl...@digitalsmiths.com>  | office 919.460.4747

________________________________

This email and any attachments may contain confidential and privileged material 
for the sole use of the intended recipient. Any review, copying, or 
distribution of this email (or any attachments) by others is prohibited. If you 
are not the intended recipient, please contact the sender immediately and 
permanently delete this email and any attachments. No employee or agent of TiVo 
Inc. is authorized to conclude any binding agreement on behalf of TiVo Inc. by 
email. Binding agreements with TiVo Inc. may only be made by a signed written 
agreement.

Reply via email to