oopz...you are right. -----Original Message----- From: Sijie Guo [mailto:[email protected]] Sent: 31 December 2013 12:55 To: [email protected] Subject: Re: Build failed in Jenkins: bookkeeper-trunk #489
On Mon, Dec 30, 2013 at 11:16 PM, Rakesh R <[email protected]> wrote: > I've tried one simple test case: > > Just before connecting 'bootstrap.connect(addr)', I have killed the > Bookieserver. What I have observed is immediately returning the call > with failure. > Isn't that expected? as there is not server listened on given port. this is how TCP works, no? > > Any thoughts? > > -----Original Message----- > From: Sijie Guo [mailto:[email protected]] > Sent: 31 December 2013 12:33 > To: [email protected] > Subject: Re: Build failed in Jenkins: bookkeeper-trunk #489 > > On Mon, Dec 30, 2013 at 10:45 PM, Rakesh R <[email protected]> wrote: > > > Hi Sijie, > > > > But I didn't understand why the connection failure is immediately > > sending without waiting for the timeout. > > In general, client should wait for the connection timeout(10secs) and > > internally retries before throwing failure message. Am I correct? > > > > No idea. from the log, there is less information to tell what was > going on at that time. I think the better solution is to add logs > about the failure so we could catch the details of what's wrong inside. > > > > > > Do we need to have an explicit retry mechanism in netty? > > > > I don't think we need retry connect in netty. As 1) we already have > retry mechanism in bookie client ; 2) if connect failed on any bookie, > we should let netty notify bookkeeper immediately. as connect failure > means bookie down in most of case, we should change bookie immediately > to avoid high latency. > > - Sijie > > > > > > -Rakesh > > -----Original Message----- > > From: Sijie Guo [mailto:[email protected]] > > Sent: 31 December 2013 11:59 > > To: [email protected] > > Subject: Re: Build failed in Jenkins: bookkeeper-trunk #489 > > > > I don't think its connect timeout setting issue. as by default, > > netty channel connect timeout is 10 sec ( > > https://github.com/netty/netty/blob/3.2/src/main/java/org/jboss/nett > > y/ > > channel/DefaultChannelConfig.java#L38 > > ). > > If you checked the log, the log statements show that the connect > > operation is in same second. > > > > 2013-12-30 12:29:36,731 - INFO - > > [BookKeeperClientWorker-0-0:PerChannelBookieClient@167] - Connecting > > to > > bookie: /67.195.138.30:15039 > > 2013-12-30 12:29:36,732 - ERROR - [New I/O client boss > > #5-1:PerChannelBookieClient$1@203] - Could not connect to bookie: [id: > > 0x019a639b, /229.27.250.246:46509 :> /67.195.138.30:15039], current > > state CONNECTING > > > > > > > > > > On Mon, Dec 30, 2013 at 9:31 PM, Rakesh R <[email protected]> wrote: > > > > > Hi Flavio, > > > > > > As test case name says, it is testing multiple bookie failures. > > > > > > On bookiefailure, when doing the ensemble reformation, > > > unfortunately it is failing to connect to the Bookie-15039. But it > > > should suppose to get connected and continue write operation. This > > > is the reason for the test case failure. Please see the following log > > > pattern: > > > > > > 2013-12-30 12:29:36,731 - INFO - > > > [BookKeeperClientWorker-0-0:PerChannelBookieClient@167] - > > > Connecting to > > > bookie: /67.195.138.30:15039 > > > 2013-12-30 12:29:36,732 - ERROR - [New I/O client boss > > > #5-1:PerChannelBookieClient$1@203] - Could not connect to bookie: [id: > > > 0x019a639b, /229.27.250.246:46509 :> /67.195.138.30:15039], > > > current state CONNECTING > > > 2013-12-30 12:29:36,732 - WARN - > > > [BookKeeperClientWorker-0-0:PendingAddOp@158] - Write did not succeed: > > > L0 > > > E100 on /67.195.138.30:15039 > > > 2013-12-30 12:29:36,733 - INFO - > > > [BookKeeperClientWorker-0-0:LedgerHandle@659] - Handling failure > > > of > > > bookie: /67.195.138.30:15039 index: 2 > > > 2013-12-30 12:29:36,733 - WARN - > > > [BookKeeperClientWorker-0-0:RackawareEnsemblePlacementPolicy@491] > > > - Failed to choose a bookie from /default-rack : excluded [<Bookie: > > > 67.195.138.30:15036>, <Bookie:67.195.138.30:15038>, <Bookie: > > > 67.195.138.30:15039>, <Bookie:67.195.138.30:15040>, <Bookie: > > > 67.195.138.30:15035>], fallback to choose bookie randomly from the > > > cluster. > > > > > > > > > I'm thinking, there could be chance of small network fluctuations > > > or slow machine and resulting in connection failure. > > > To handle this IMHO, we should have netty client connection > > > timeout in place and should retry for few intervals. Let me do a > > > try with bootstrap.setOption("connectTimeoutMillis", > > > timeoutvalue); Shall I raise a JIRA to discuss about these > > > concerns and will reach to a conclusion. Whats your opinion? > > > > > > -Rakesh > > > > > > -----Original Message----- > > > From: Flavio Junqueira [mailto:[email protected]] > > > Sent: 31 December 2013 01:51 > > > To: [email protected] > > > Subject: Fwd: Build failed in Jenkins: bookkeeper-trunk #489 > > > > > > I was wondering if there is a jira open for the test that failed > > > below, does anyone know? > > > > > > -Flavio > > > > > > Begin forwarded message: > > > > > > > Tests in error: > > > > > > > > > > testWithMultipleBookieFailuresInLastEnsemble[2](org.apache.bookkeeper. > > > client.BookieWriteLedgerTest) > > > > > > > > >
