Hi Sijie,

But I don't understand why the connection failure is reported immediately, 
without waiting for the timeout. In general, the client should wait for the 
connection timeout (10 secs) and retry internally before throwing a failure 
message. Am I correct?

Do we need to have an explicit retry mechanism in netty?
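
To make the retry question concrete, here is a rough sketch of what an explicit retry with backoff could look like, assuming the retry would live in client code rather than inside netty. It uses only the JDK; the class name ConnectRetry, the retry helper, and the attempt/delay values are hypothetical and are not existing BookKeeper or netty APIs.

```java
import java.util.concurrent.Callable;

public class ConnectRetry {

    // Hypothetical helper: runs the attempt up to maxAttempts times, doubling
    // the delay between attempts, and gives up with the last failure as cause.
    static <T> T retry(Callable<T> attempt, int maxAttempts, long initialDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        Exception last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return attempt.call();
            } catch (Exception e) {
                last = e; // remember the failure and fall through to the backoff
            }
            if (i < maxAttempts) {
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // keep the interrupt visible
                    break;
                }
                delay *= 2;
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) {
        // Simulated connect that is refused twice and then succeeds.
        int[] calls = {0};
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new java.net.ConnectException("simulated refusal");
            }
            return "connected";
        }, 5, 100L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Whether something like this belongs in PerChannelBookieClient or higher up (the ensemble change already acts as a kind of retry) is a separate design question.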

-Rakesh
-----Original Message-----
From: Sijie Guo [mailto:[email protected]] 
Sent: 31 December 2013 11:59
To: [email protected]
Subject: Re: Build failed in Jenkins: bookkeeper-trunk #489

I don't think it's a connect timeout setting issue, as by default the netty 
channel connect timeout is 10 sec ( 
https://github.com/netty/netty/blob/3.2/src/main/java/org/jboss/netty/channel/DefaultChannelConfig.java#L38).
If you check the log, the log statements show that the connect attempt and 
its failure happen within the same second.

2013-12-30 12:29:36,731 - INFO  -
[BookKeeperClientWorker-0-0:PerChannelBookieClient@167] - Connecting to bookie: 
/67.195.138.30:15039
2013-12-30 12:29:36,732 - ERROR - [New I/O client boss 
#5-1:PerChannelBookieClient$1@203] - Could not connect to bookie: [id:
0x019a639b, /229.27.250.246:46509 :> /67.195.138.30:15039], current state 
CONNECTING
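
One possible explanation, and this is an assumption since the log does not show the underlying exception, is that nothing was listening on the bookie port at that moment. In that case the OS rejects the connect right away and the connect timeout never comes into play. The sketch below shows the effect with plain JDK sockets rather than netty; the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class RefusedVsTimeout {

    // Hypothetical helper: bind an ephemeral loopback port and close it again,
    // so that (very likely) nothing is listening on it afterwards.
    static int closedLocalPort() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            return server.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Tries to connect with the given timeout. Returns how many milliseconds
    // the failure took, or -1 if the connect unexpectedly succeeded.
    static long millisUntilConnectFails(int port, int timeoutMillis) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(
                new InetSocketAddress(InetAddress.getLoopbackAddress(), port),
                timeoutMillis);
            return -1;
        } catch (IOException e) {
            return (System.nanoTime() - start) / 1_000_000;
        }
    }

    public static void main(String[] args) {
        int port = closedLocalPort();
        // Same 10 sec budget as the netty default mentioned above.
        long elapsed = millisUntilConnectFails(port, 10_000);
        System.out.println("connect to closed port " + port
            + " failed after " + elapsed + " ms");
    }
}
```

The timeout only bounds how long a connect may stay pending; a refused connection completes (with an error) long before that.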




On Mon, Dec 30, 2013 at 9:31 PM, Rakesh R <[email protected]> wrote:

> Hi Flavio,
>
> As the test case name says, it is testing multiple bookie failures.
>
> On bookie failure, during the ensemble reformation, it unfortunately 
> fails to connect to Bookie-15039. It is supposed to connect and 
> continue the write operation. This is the reason for the test case 
> failure. Please see the following log pattern:
>
> 2013-12-30 12:29:36,731 - INFO  -
> [BookKeeperClientWorker-0-0:PerChannelBookieClient@167] - Connecting to
> bookie: /67.195.138.30:15039
> 2013-12-30 12:29:36,732 - ERROR - [New I/O client boss 
> #5-1:PerChannelBookieClient$1@203] - Could not connect to bookie: [id:
> 0x019a639b, /229.27.250.246:46509 :> /67.195.138.30:15039], current 
> state CONNECTING
> 2013-12-30 12:29:36,732 - WARN  -
> [BookKeeperClientWorker-0-0:PendingAddOp@158] - Write did not succeed: 
> L0 E100 on /67.195.138.30:15039
> 2013-12-30 12:29:36,733 - INFO  -
> [BookKeeperClientWorker-0-0:LedgerHandle@659] - Handling failure of
> bookie: /67.195.138.30:15039 index: 2
> 2013-12-30 12:29:36,733 - WARN  -
> [BookKeeperClientWorker-0-0:RackawareEnsemblePlacementPolicy@491] - 
> Failed to choose a bookie from /default-rack : excluded [<Bookie:
> 67.195.138.30:15036>, <Bookie:67.195.138.30:15038>, <Bookie:
> 67.195.138.30:15039>, <Bookie:67.195.138.30:15040>, <Bookie:
> 67.195.138.30:15035>], fallback to choose bookie randomly from the 
> cluster.
>
>
> I'm thinking there could be small network fluctuations or a slow 
> machine, resulting in the connection failure.
> To handle this, IMHO, we should have a netty client connection timeout 
> in place and should retry a few times. Let me try with 
> bootstrap.setOption("connectTimeoutMillis", timeoutvalue); Shall I 
> raise a JIRA to discuss these concerns so that we can reach a 
> conclusion? What's your opinion?
>
> -Rakesh
>
> -----Original Message-----
> From: Flavio Junqueira [mailto:[email protected]]
> Sent: 31 December 2013 01:51
> To: [email protected]
> Subject: Fwd: Build failed in Jenkins: bookkeeper-trunk #489
>
> I was wondering if there is a jira open for the test that failed 
> below, does anyone know?
>
> -Flavio
>
> Begin forwarded message:
>
> > Tests in error:
> >
> > testWithMultipleBookieFailuresInLastEnsemble[2](org.apache.bookkeeper.client.BookieWriteLedgerTest)
