Re: ZooKeeper Session Timeout During Startup Caused By Long ParNewGC Pauses

2015-05-04 Thread Jun Rao
James.

Thanks for reporting this. So the issue only shows up at Kafka startup
time? After that, the GC time is back to normal again?

Jun

On Tue, Apr 28, 2015 at 1:48 PM, James Lent jl...@tivo.com wrote:

 I have filed a bug report with Oracle on this issue today.  I have not yet
 been assigned a bug number.  I thought the issue might be of general
 interest to the Kafka community and I have a suggestion for a configuration
 change that works around the issue in case other people tun into it.

 First the issue I am about to describe will be hard to believe, but, I do
 provide a short Java program that demonstrates the JVM GC issue that
 triggers the long ParNewGC pauses under certain conditions.  Before you
 dismiss the analysis out of hand I suggest you try try test program below.
 I have tested it in multiple environments: linux dev box, linux laptop, and
 a linux EC2 instance.  Other engineers have also done so in their own
 environments.  I would be interested to know what you see.

 First the conditions:

 1) More than a 1G heap (we ran into it when we increased the heap from 1G
 to 2G).
 2) Requires at least some topics defined.  Not sure the exact amount.  We
 don't see this issue in all our environments.
 3) Log compaction enabled.
 4) Using JVM other than Oracle 6.  I have reproduced the underlying issue
 (with the test program below) using OpenJDK 6, Oracle 7, OpenJDK 7, and
 Oracle 8 and OpenJDK 8.

 Under these conditions we can see 9 second or longer pauses due to
 individual ParNew GC events (which is greater than the default 6 second
 ZooKeeper session timeout).  It tends to be very reproducible in a given
 environment (e.g. either always happens or it never happens).  We are using
 Kafka 0.8.1.1.

 The long ParNewGC delays are triggered by the allocation of a large static
 object during startup.  Specially the SkimpyOffsetMap bytes buffer.  There
 is no reason why this should be the case and that is why I have opened a
 JVM bug with Oracle.  The even more unbelievable part is this bug only
 seems to trigger a problem for static objects in the 480M to 512M range.
 The default size of this byte buffer is 500M.  Our current workaround it to
 configure this buffer to use 400M instead.  We have verified that this
 makes the issue go away in multiple real Kafka environments.  Using the
 G1 GC is another option we are considering (and seems to work).

 Here is the test program (note that the size of the static object matches
 the default size of the SkimpyOffsetMap bytes buffer):

 import java.util.ArrayList;

 public class LongParNewPause {

static byte[] bigStaticObject;

public static void main(String[] args) throws Exception {
int bigObjSize= args.length  0 ? Integer.parseInt(args[0]) :
 524288000;
int littleObjSize = args.length  1 ? Integer.parseInt(args[1]) :
 100;
int saveFraction  = args.length  2 ? Integer.parseInt(args[2]) :
 15;

bigStaticObject = new byte[bigObjSize];
ArrayListbyte[] holder = new ArrayListbyte[]();

int i = 0;
while (true) {
byte[] local = new byte[littleObjSize];
if (i++ % saveFraction == 0) {
holder.add(local);
}
}
}
 }


 You should run it with the following JVM options:

 -verbose:gc
 -XX:+PrintGCTimeStamps
 -XX:+PrintGCDetails
 -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC
 -Xmx2G
 -Xms2G

 Here is what I see using Oracle 8:

 0.355: [GC (Allocation Failure) 0.355: [ParNew: 272640K-18193K(306688K),
 0.0265380 secs] 784640K-530193K(4160256K), 0.0266064 secs] [Times:
 user=0.07 sys=0.00, real=0.03 secs]
 0.420: [GC (Allocation Failure) 0.420: [ParNew: 290833K-23124K(306688K),
 11.4522887 secs] 802833K-552573K(4160256K), 11.4523296 secs] [Times:
 user=42.51 sys=0.18, real=11.45 secs]
 11.902: [GC (Allocation Failure) 11.902: [ParNew:
 295764K-33361K(306688K), 25.6937427 secs] 825213K-580830K(4160256K),
 25.6937906 secs] [Times: user=100.63 sys=0.11, real=25.70 secs]
 37.627: [GC (Allocation Failure) 37.627: [ParNew:
 306001K-25620K(306688K), 0.0219097 secs] 853470K-591124K(4160256K),
 0.0219448 secs] [Times: user=0.08 sys=0.00, real=0.02 secs]
 ...

 If I pass in 500,000,000 for the size:

 0.344: [GC (Allocation Failure) 0.344: [ParNew: 272640K-18181K(306688K),
 0.0253515 secs] 760921K-506462K(4160256K), 0.0253957 secs] [Times:
 user=0.04 sys=0.01, real=0.02 secs]
 0.408: [GC (Allocation Failure) 0.408: [ParNew: 290821K-31804K(306688K),
 0.0499212 secs] 779102K-537534K(4160256K), 0.0499556 secs] [Times:
 user=0.08 sys=0.01, real=0.05 secs]
 0.491: [GC (Allocation Failure) 0.491: [ParNew: 30K-27084K(306688K),
 0.0286135 secs] 810174K-550834K(4160256K), 0.0286590 secs] [Times:
 user=0.08 sys=0.00, real=0.03 secs]
 0.553: [GC (Allocation Failure) 0.554: [ParNew: 299724K-27617K(306688K),
 0.0387482 secs] 823474K-569401K(4160256K), 0.0387790 secs] [Times:
 user=0.07 sys=0.01, real=0.04 secs]
 0.625: [GC (Allocation Failure) 0.625: 

Re: ZooKeeper Session Timeout During Startup Caused By Long ParNewGC Pauses

2015-05-04 Thread James Lent

This problem is so mysterious to me that I am hesitant to make any definitive 
statements regarding the scope of the issue.  In the staging environment where 
the problem was originally seen the GC logs were not enabled.  In that 
environment the session timeouts persisted for a long time (and operations 
tried several restarts to clear the issue).  Here are some sample logs:

$ grep -e starting -e session time 
uverse-staging-admin-0-20150423150103-kafka.log
[2015-04-23 14:23:53,837] INFO Client session timed out, have not heard from 
server in 10066ms for sessionid 0x14ce69c464f0009, closing socket connection 
and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2015-04-23 14:31:56,133] INFO [Kafka Server 0], starting 
(kafka.server.KafkaServer)
[2015-04-23 14:32:08,636] INFO Client session timed out, have not heard from 
server in 10422ms for sessionid 0x14ce6b1209f0002, closing socket connection 
and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2015-04-23 14:32:21,641] INFO Client session timed out, have not heard from 
server in 10643ms for sessionid 0x14ce6b1209f0003, closing socket connection 
and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2015-04-23 14:32:33,614] INFO Client session timed out, have not heard from 
server in 9842ms for sessionid 0x14ce6b1209f0004, closing socket connection and 
attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2015-04-23 14:32:45,589] INFO Client session timed out, have not heard from 
server in 9878ms for sessionid 0x14ce6b1209f0005, closing socket connection and 
attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2015-04-23 14:32:57,906] INFO Client session timed out, have not heard from 
server in 9823ms for sessionid 0x14ce6b1209f0006, closing socket connection and 
attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2015-04-23 14:33:10,701] INFO Client session timed out, have not heard from 
server in 10365ms for sessionid 0x14ce6b1209f0007, closing socket connection 
and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2015-04-23 14:33:22,475] INFO Client session timed out, have not heard from 
server in 10006ms for sessionid 0x14ce6b1209f0008, closing socket connection 
and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2015-04-23 14:33:46,372] INFO Client session timed out, have not heard from 
server in 11062ms for sessionid 0x14ce6b1209f0009, closing socket connection 
and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2015-04-23 14:42:11,942] INFO Client session timed out, have not heard from 
server in 12582ms for sessionid 0x14ce6b1209f000a, closing socket connection 
and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2015-04-23 14:55:47,273] INFO Client session timed out, have not heard from 
server in 12935ms for sessionid 0x14ce6b1209f000b, closing socket connection 
and attempting reconnect (org.apache.zookeeper.ClientCnxn)

The specific configuration of Kafka (heap size, number of topics, ...) seem to 
have a big influence on the behavior.

Using the test program I have generally seen the ParNew GCs times eventually 
recover, but, it can take a long time (and sometimes I kill the process before 
it recovers).  Here is an example of one that persists for 2 ParNew GC cycles:

[ec2-user@ip-10-136-75-2 ~]$ jdk1.8.0_45/bin/java -Xmx4G -Xms4G 
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCTimeStamps 
-XX:+PrintGCDetails LongParNewPause $((500*1024*1024)) 100 10
0.289: [GC (Allocation Failure) 0.289: [ParNew: 272640K-27707K(306688K), 
0.0322014 secs] 784640K-539707K(4160256K), 0.0322659 secs] [Times: user=0.10 
sys=0.03, real=0.04 secs]
0.360: [GC (Allocation Failure) 0.360: [ParNew: 300347K-34048K(306688K), 
3.5932171 secs] 812347K-572709K(4160256K), 3.5932638 secs] [Times: user=14.12 
sys=0.05, real=3.60 secs]
3.992: [GC (Allocation Failure) 3.992: [ParNew: 306688K-34048K(306688K), 
53.1317959 secs] 845349K-599722K(4160256K), 53.1318434 secs] [Times: user=212.40 
sys=0.00, real=53.13 secs]
57.162: [GC (Allocation Failure) 57.162: [ParNew: 306688K-34048K(306688K), 
0.0500041 secs] 872362K-626660K(4160256K), 0.0500528 secs] [Times: user=0.17 
sys=0.01, real=0.05 secs]
57.250: [GC (Allocation Failure) 57.250: [ParNew: 306688K-34048K(306688K), 
0.0503560 secs] 899300K-658186K(4160256K), 0.0504036 secs] [Times: user=0.17 
sys=0.01, real=0.05 secs]
57.339: [GC (Allocation Failure) 57.339: [ParNew: 306688K-34048K(306688K), 
0.0510655 secs] 930826K-687994K(4160256K), 0.0511131 secs] [Times: user=0.17 
sys=0.00, real=0.05 secs]
57.428: [GC (Allocation Failure) 57.428: [ParNew: 306688K-34048K(306688K), 
0.0516372 secs] 960634K-720213K(4160256K), 0.0516853 secs] [Times: user=0.17 
sys=0.01, real=0.05 secs]

Just changing the fraction of objects saved from 1 out of 10 to 1 out of 
100 can dramatically increase the recovery time to 11 ParNew GCs:

[ec2-user@ip-10-136-75-2 ~]$ jdk1.8.0_45/bin/java -Xmx4G -Xms4G 
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -verbose:gc