Hi Flavio,
As the test case name says, it tests multiple bookie failures.
On bookie failure, during ensemble reformation, the client unfortunately fails
to connect to Bookie-15039. It should have connected and continued the write
operation. This is why the test case fails.
Please see the following log pattern:
2013-12-30 12:29:36,731 - INFO - [BookKeeperClientWorker-0-0:PerChannelBookieClient@167] - Connecting to bookie: /67.195.138.30:15039
2013-12-30 12:29:36,732 - ERROR - [New I/O client boss #5-1:PerChannelBookieClient$1@203] - Could not connect to bookie: [id: 0x019a639b, /229.27.250.246:46509 :> /67.195.138.30:15039], current state CONNECTING
2013-12-30 12:29:36,732 - WARN - [BookKeeperClientWorker-0-0:PendingAddOp@158] - Write did not succeed: L0 E100 on /67.195.138.30:15039
2013-12-30 12:29:36,733 - INFO - [BookKeeperClientWorker-0-0:LedgerHandle@659] - Handling failure of bookie: /67.195.138.30:15039 index: 2
2013-12-30 12:29:36,733 - WARN - [BookKeeperClientWorker-0-0:RackawareEnsemblePlacementPolicy@491] - Failed to choose a bookie from /default-rack : excluded [<Bookie:67.195.138.30:15036>, <Bookie:67.195.138.30:15038>, <Bookie:67.195.138.30:15039>, <Bookie:67.195.138.30:15040>, <Bookie:67.195.138.30:15035>], fallback to choose bookie randomly from the cluster.
I suspect that a brief network fluctuation or a slow machine could be causing
the connection failure.
IMHO, to handle this we should configure a connection timeout on the Netty
client and retry the connection a few times before giving up. Let me try it
with:
bootstrap.setOption("connectTimeoutMillis", timeoutvalue);
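A minimal sketch of the timeout-plus-retry idea, assuming plain java.net.Socket instead of Netty so that it runs on the JDK alone. The class ConnectRetry, its methods and the parameters maxAttempts and backoffMillis are hypothetical names for illustration, not BookKeeper or Netty APIs; in the Netty 3 client the per-attempt timeout would come from the connectTimeoutMillis option instead.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.concurrent.Callable;

/** Hypothetical helper: bounded retries around a connect attempt with a per-attempt timeout. */
final class ConnectRetry {

    private ConnectRetry() {}

    /** Runs attempt up to maxAttempts times, sleeping backoffMillis between I/O failures. */
    static <T> T withRetry(Callable<T> attempt, int maxAttempts, long backoffMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return attempt.call();
            } catch (IOException e) {
                // Treat I/O errors (connection refused, connect timeout) as transient.
                last = e;
                if (i < maxAttempts) {
                    Thread.sleep(backoffMillis);
                }
            }
        }
        // Every attempt failed: surface the last I/O error to the caller.
        throw last;
    }

    /** One TCP connect; throws SocketTimeoutException if it exceeds connectTimeoutMillis. */
    static Socket connectOnce(String host, int port, int connectTimeoutMillis) throws IOException {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMillis);
            return socket;
        } catch (IOException e) {
            socket.close();
            throw e;
        }
    }

    /** Connect with a per-attempt timeout, retrying a bounded number of times. */
    static Socket connectWithRetry(String host, int port, int connectTimeoutMillis,
                                   int maxAttempts, long backoffMillis) throws Exception {
        return withRetry(() -> connectOnce(host, port, connectTimeoutMillis), maxAttempts, backoffMillis);
    }
}
```

For example, with connectTimeoutMillis = 1000, maxAttempts = 3 and backoffMillis = 200, an unreachable bookie would be reported as failed after at most about 3.4 seconds (3 x 1000 ms of connect timeouts plus 2 x 200 ms of backoff). Only IOException is retried here, so programming errors still fail immediately.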
Shall I raise a JIRA so we can discuss these concerns and reach a conclusion?
What's your opinion?
-Rakesh
-----Original Message-----
From: Flavio Junqueira [mailto:[email protected]]
Sent: 31 December 2013 01:51
To: [email protected]
Subject: Fwd: Build failed in Jenkins: bookkeeper-trunk #489
I was wondering whether there is a JIRA open for the test that failed below.
Does anyone know?
-Flavio
Begin forwarded message:
> Tests in error:
>
> testWithMultipleBookieFailuresInLastEnsemble[2](org.apache.bookkeeper.client.BookieWriteLedgerTest)