On Mon, Dec 30, 2013 at 10:45 PM, Rakesh R <[email protected]> wrote:

> Hi Sijie,
>
> But I didn't understand why the connection failure is reported immediately,
> without waiting for the timeout.
>
> In general, the client should wait for the connection timeout (10 secs) and
> retry internally before throwing a failure message. Am I correct?
>

No idea. The log has too little information to tell what was going on at
that time. I think the better solution is to add more logging around the
failure so we can catch the details of what went wrong.
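
As a rough illustration of the kind of detail worth logging, here is a
minimal sketch that records the elapsed time and the exception cause of a
failed connect. It uses plain java.net sockets rather than netty so that it
is self-contained, and the class and method names (ConnectFailureLogging,
Attempt, tryConnect, closedLocalPort) are made up for this example. It also
shows one general property that may be relevant: the connect timeout is only
an upper bound, and a connect that is actively refused usually fails well
before it. The log above does not confirm that this is what happened here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class ConnectFailureLogging {

    /** Outcome of one connect attempt (hypothetical helper type for this sketch). */
    public static final class Attempt {
        public final boolean connected;
        public final long elapsedMillis;
        public final IOException cause; // null when the connect succeeded

        Attempt(boolean connected, long elapsedMillis, IOException cause) {
            this.connected = connected;
            this.elapsedMillis = elapsedMillis;
            this.cause = cause;
        }
    }

    /** Tries to connect once, recording how long it took and why it failed. */
    public static Attempt tryConnect(InetSocketAddress addr, int timeoutMillis) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(addr, timeoutMillis);
            return new Attempt(true, millisSince(start), null);
        } catch (IOException e) {
            return new Attempt(false, millisSince(start), e);
        }
    }

    private static long millisSince(long startNanos) {
        return (System.nanoTime() - startNanos) / 1_000_000L;
    }

    /** Returns a loopback port that was just released, so nothing should be listening on it. */
    public static int closedLocalPort() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            return server.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InetSocketAddress addr =
                new InetSocketAddress(InetAddress.getLoopbackAddress(), closedLocalPort());
        // 10 sec upper bound, matching the netty default mentioned in this thread.
        Attempt attempt = tryConnect(addr, 10_000);
        System.out.println("connected=" + attempt.connected
                + " elapsedMs=" + attempt.elapsedMillis
                + " cause=" + attempt.cause);
    }
}
```

Logging the exception class and message next to the elapsed time would make
it possible to tell a refused connect from a genuine timeout.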


>
> Do we need to have an explicit retry mechanism in netty?
>

I don't think we need to retry the connect in netty, because 1) we already
have a retry mechanism in the bookie client; 2) if the connect fails on any
bookie, we should let netty notify bookkeeper immediately. A connect failure
means the bookie is down in most cases, so we should switch to another
bookie immediately to avoid high latency.
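
To make the "change bookie immediately" idea concrete, here is a minimal
sketch, not BookKeeper's actual ensemble-change code. The class and method
names (EnsembleFailover, replaceFailedBookie) and the bookie labels are
hypothetical. It assumes that on a connect failure the failed slot in the
ensemble is replaced right away by a bookie that is not already in the
ensemble, with no reconnect attempts against the failed one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EnsembleFailover {

    /**
     * Returns a copy of the ensemble with the bookie at failedIndex replaced by
     * the first available bookie that is not already an ensemble member.
     * The failed bookie is itself a member, so it is never chosen again.
     */
    public static List<String> replaceFailedBookie(
            List<String> ensemble, int failedIndex, List<String> available) {
        String failed = ensemble.get(failedIndex);
        for (String candidate : available) {
            if (!ensemble.contains(candidate)) {
                List<String> updated = new ArrayList<>(ensemble);
                updated.set(failedIndex, candidate);
                return updated;
            }
        }
        // Nothing left to switch to; the caller has to surface the write failure.
        throw new IllegalStateException("No replacement available for " + failed);
    }

    public static void main(String[] args) {
        // Hypothetical bookie labels, used only for illustration.
        List<String> ensemble = Arrays.asList("bookie-A", "bookie-B", "bookie-C");
        List<String> available =
                Arrays.asList("bookie-A", "bookie-B", "bookie-C", "bookie-D");
        System.out.println(replaceFailedBookie(ensemble, 2, available));
    }
}
```

The point of the design is that the write path pays for at most one failed
connect per bookie, instead of waiting through several retries against a
bookie that is most likely down.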

- Sijie


>
> -Rakesh
> -----Original Message-----
> From: Sijie Guo [mailto:[email protected]]
> Sent: 31 December 2013 11:59
> To: [email protected]
> Subject: Re: Build failed in Jenkins: bookkeeper-trunk #489
>
> I don't think it's a connect timeout setting issue. By default, the netty
> channel connect timeout is 10 sec (
> https://github.com/netty/netty/blob/3.2/src/main/java/org/jboss/netty/channel/DefaultChannelConfig.java#L38
> ).
> If you check the log, the log statements show that the connect attempt and
> its failure happen within the same second.
>
> 2013-12-30 12:29:36,731 - INFO  -
> [BookKeeperClientWorker-0-0:PerChannelBookieClient@167] - Connecting to
> bookie: /67.195.138.30:15039
> 2013-12-30 12:29:36,732 - ERROR - [New I/O client boss
> #5-1:PerChannelBookieClient$1@203] - Could not connect to bookie: [id:
> 0x019a639b, /229.27.250.246:46509 :> /67.195.138.30:15039], current state
> CONNECTING
>
>
>
>
> On Mon, Dec 30, 2013 at 9:31 PM, Rakesh R <[email protected]> wrote:
>
> > Hi Flavio,
> >
> > As test case name says, it is testing multiple bookie failures.
> >
> > On bookie failure, during the ensemble reformation, it unfortunately
> > fails to connect to Bookie-15039. It is supposed to get connected and
> > continue the write operation. This is the reason for the test case
> > failure. Please see the following log pattern:
> >
> > 2013-12-30 12:29:36,731 - INFO  -
> > [BookKeeperClientWorker-0-0:PerChannelBookieClient@167] - Connecting
> > to
> > bookie: /67.195.138.30:15039
> > 2013-12-30 12:29:36,732 - ERROR - [New I/O client boss
> > #5-1:PerChannelBookieClient$1@203] - Could not connect to bookie: [id:
> > 0x019a639b, /229.27.250.246:46509 :> /67.195.138.30:15039], current
> > state CONNECTING
> > 2013-12-30 12:29:36,732 - WARN  -
> > [BookKeeperClientWorker-0-0:PendingAddOp@158] - Write did not succeed:
> > L0
> > E100 on /67.195.138.30:15039
> > 2013-12-30 12:29:36,733 - INFO  -
> > [BookKeeperClientWorker-0-0:LedgerHandle@659] - Handling failure of
> > bookie: /67.195.138.30:15039 index: 2
> > 2013-12-30 12:29:36,733 - WARN  -
> > [BookKeeperClientWorker-0-0:RackawareEnsemblePlacementPolicy@491] -
> > Failed to choose a bookie from /default-rack : excluded [<Bookie:
> > 67.195.138.30:15036>, <Bookie:67.195.138.30:15038>, <Bookie:
> > 67.195.138.30:15039>, <Bookie:67.195.138.30:15040>, <Bookie:
> > 67.195.138.30:15035>], fallback to choose bookie randomly from the
> > cluster.
> >
> >
> > I'm thinking there could be small network fluctuations or a slow
> > machine, resulting in the connection failure.
> > To handle this, IMHO, we should have a netty client connection timeout
> > in place and retry a few times. Let me try
> > bootstrap.setOption("connectTimeoutMillis", timeoutvalue); Shall I
> > raise a JIRA to discuss these concerns and reach a conclusion?
> > What's your opinion?
> >
> > -Rakesh
> >
> > -----Original Message-----
> > From: Flavio Junqueira [mailto:[email protected]]
> > Sent: 31 December 2013 01:51
> > To: [email protected]
> > Subject: Fwd: Build failed in Jenkins: bookkeeper-trunk #489
> >
> > I was wondering if there is a JIRA open for the test that failed
> > below. Does anyone know?
> >
> > -Flavio
> >
> > Begin forwarded message:
> >
> > > Tests in error:
> > >
> >
> > testWithMultipleBookieFailuresInLastEnsemble[2](org.apache.bookkeeper.
> > client.BookieWriteLedgerTest)
> >
> >
>
