Duo Zhang created HBASE-21187:
---------------------------------

Summary: The HBase UTs are extremely slow on some jenkins node
Key: HBASE-21187
URL: https://issues.apache.org/jira/browse/HBASE-21187
Project: HBase
Issue Type: Bug
Components: test
Reporter: Duo Zhang

Looking at the flaky dashboard for the master branch, the top several UTs tend to fail at the same time. One thing the failed flaky-test jobs have in common is that their execution time is more than one hour, whereas successful executions usually take only about half an hour.

I compared the output of TestRestoreSnapshotFromClientWithRegionReplicas: in a successful run the DisableTableProcedure finishes within one second, while in a failed run it can take more than half a minute.

I am not sure what the real problem is, but the failed runs seem to have time holes in the output, i.e., there is no log output for several seconds. Like this:

{noformat}
2018-09-11 21:08:08,152 INFO [PEWorker-4] procedure2.ProcedureExecutor(1500): Finished pid=490, state=SUCCESS, hasLock=false; CreateTableProcedure table=testRestoreSnapshotAfterTruncate in 12.9380sec
2018-09-11 21:08:15,590 DEBUG [RpcServer.default.FPBQ.Fifo.handler=1,queue=0,port=33663] master.MasterRpcServices(1174): Checking to see if procedure is done pid=490
{noformat}

No log output for about 7 seconds. And for a successful run, at the same place:

{noformat}
2018-09-12 07:47:32,488 INFO [PEWorker-7] procedure2.ProcedureExecutor(1500): Finished pid=490, state=SUCCESS, hasLock=false; CreateTableProcedure table=testRestoreSnapshotAfterTruncate in 1.2220sec
2018-09-12 07:47:32,881 DEBUG [RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=59079] master.MasterRpcServices(1174): Checking to see if procedure is done pid=490
{noformat}

There is no such hole. Maybe there is a big GC?
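
The time holes described above can also be found mechanically instead of by eye. The following is only a rough sketch, not part of the HBase tooling: it assumes every log entry starts with a timestamp in the `yyyy-MM-dd HH:mm:ss,SSS` form shown in the excerpts, and the names `find_time_holes` and `threshold_seconds`, as well as the 5 second default, are made up for illustration.

```python
import re
from datetime import datetime

# Leading timestamp of a log line, e.g. "2018-09-11 21:08:08,152".
# Assumption: all entries use this layout, as in the excerpts above.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")


def parse_timestamp(line):
    """Return the datetime at the start of a log line, or None if absent."""
    match = TIMESTAMP.match(line)
    if match is None:
        return None
    base = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    return base.replace(microsecond=int(match.group(2)) * 1000)


def find_time_holes(lines, threshold_seconds=5.0):
    """Return (gap_seconds, line_before, line_after) for each silent period.

    threshold_seconds is a hypothetical cut-off; tune it per test suite.
    """
    holes = []
    prev_ts = None
    prev_line = None
    for line in lines:
        ts = parse_timestamp(line)
        if ts is None:
            continue  # skip continuation lines such as stack traces
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if gap >= threshold_seconds:
                holes.append((gap, prev_line.rstrip(), line.rstrip()))
        prev_ts, prev_line = ts, line
    return holes


# The two excerpts quoted in this issue, shortened to the relevant prefix.
FAILED_RUN = [
    "2018-09-11 21:08:08,152 INFO [PEWorker-4] procedure2.ProcedureExecutor(1500): Finished pid=490",
    "2018-09-11 21:08:15,590 DEBUG [RpcServer.default.FPBQ.Fifo.handler=1,queue=0,port=33663] master.MasterRpcServices(1174): Checking to see if procedure is done pid=490",
]
SUCCESSFUL_RUN = [
    "2018-09-12 07:47:32,488 INFO [PEWorker-7] procedure2.ProcedureExecutor(1500): Finished pid=490",
    "2018-09-12 07:47:32,881 DEBUG [RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=59079] master.MasterRpcServices(1174): Checking to see if procedure is done pid=490",
]

if __name__ == "__main__":
    for name, sample in (("failed", FAILED_RUN), ("successful", SUCCESSFUL_RUN)):
        for gap, before, after in find_time_holes(sample):
            print(f"{name}: {gap:.3f}s of silence after: {before}")
```

Running the same function over the full output of a slow job and a fast job, and comparing where the holes fall against the JVM's GC log (if one is enabled), might help confirm or rule out the GC guess.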