On Mon, Dec 30, 2013 at 11:02 PM, Sijie Guo <[email protected]> wrote:
> > On Mon, Dec 30, 2013 at 10:45 PM, Rakesh R <[email protected]> wrote: > >> Hi Sijie, >> >> But I didn't understand why the connection failure is immediately sending >> without waiting for the timeout. > > In general, client should wait for the connection timeout(10secs) and >> internally retries before throwing failure message. Am I correct? >> > > No idea. from the log, there is less information to tell what was going on > at that time. I think the better solution is to add logs about the failure > so we could catch the details of what's wrong inside. > Created https://issues.apache.org/jira/browse/BOOKKEEPER-714 to add exception in logging, so we could catch more details when this test case failed. > > >> >> Do we need to have an explicit retry mechanism in netty? >> > > I don't think we need retry connect in netty. As 1) we already have retry > mechanism in bookie client ; 2) if connect failed on any bookie, we should > let netty notify bookkeeper immediately. as connect failure means bookie > down in most of case, we should change bookie immediately to avoid high > latency. > > - Sijie > > >> >> -Rakesh >> -----Original Message----- >> From: Sijie Guo [mailto:[email protected]] >> Sent: 31 December 2013 11:59 >> To: [email protected] >> Subject: Re: Build failed in Jenkins: bookkeeper-trunk #489 >> >> I don't think its connect timeout setting issue. as by default, netty >> channel connect timeout is 10 sec ( >> https://github.com/netty/netty/blob/3.2/src/main/java/org/jboss/netty/channel/DefaultChannelConfig.java#L38 >> ). >> If you checked the log, the log statements show that the connect >> operation is in same second. >> >> 2013-12-30 12:29:36,731 - INFO - >> [BookKeeperClientWorker-0-0:PerChannelBookieClient@167] - Connecting to >> bookie: /67.195.138.30:15039 >> 2013-12-30 12:29:36,732 - ERROR - [New I/O client boss >> #5-1:PerChannelBookieClient$1@203] - Could not connect to bookie: [id: >> 0x019a639b, /229.27.250.246:46509 :> /67.195.138.30:15039], current >> state CONNECTING >> >> >> >> >> On Mon, Dec 30, 2013 at 9:31 PM, Rakesh R <[email protected]> wrote: >> >> > Hi Flavio, >> > >> > As test case name says, it is testing multiple bookie failures. >> > >> > On bookiefailure, when doing the ensemble reformation, unfortunately >> > it is failing to connect to the Bookie-15039. But it should suppose to >> > get connected and continue write operation. This is the reason for the >> > test case failure. Please see the following log pattern: >> > >> > 2013-12-30 12:29:36,731 - INFO - >> > [BookKeeperClientWorker-0-0:PerChannelBookieClient@167] - Connecting >> > to >> > bookie: /67.195.138.30:15039 >> > 2013-12-30 12:29:36,732 - ERROR - [New I/O client boss >> > #5-1:PerChannelBookieClient$1@203] - Could not connect to bookie: [id: >> > 0x019a639b, /229.27.250.246:46509 :> /67.195.138.30:15039], current >> > state CONNECTING >> > 2013-12-30 12:29:36,732 - WARN - >> > [BookKeeperClientWorker-0-0:PendingAddOp@158] - Write did not succeed: >> > L0 >> > E100 on /67.195.138.30:15039 >> > 2013-12-30 12:29:36,733 - INFO - >> > [BookKeeperClientWorker-0-0:LedgerHandle@659] - Handling failure of >> > bookie: /67.195.138.30:15039 index: 2 >> > 2013-12-30 12:29:36,733 - WARN - >> > [BookKeeperClientWorker-0-0:RackawareEnsemblePlacementPolicy@491] - >> > Failed to choose a bookie from /default-rack : excluded [<Bookie: >> > 67.195.138.30:15036>, <Bookie:67.195.138.30:15038>, <Bookie: >> > 67.195.138.30:15039>, <Bookie:67.195.138.30:15040>, <Bookie: >> > 67.195.138.30:15035>], fallback to choose bookie randomly from the >> > cluster. >> > >> > >> > I'm thinking, there could be chance of small network fluctuations or >> > slow machine and resulting in connection failure. >> > To handle this IMHO, we should have netty client connection timeout in >> > place and should retry for few intervals. Let me do a try with >> > bootstrap.setOption("connectTimeoutMillis", timeoutvalue); Shall I >> > raise a JIRA to discuss about these concerns and will reach to a >> > conclusion. Whats your opinion? >> > >> > -Rakesh >> > >> > -----Original Message----- >> > From: Flavio Junqueira [mailto:[email protected]] >> > Sent: 31 December 2013 01:51 >> > To: [email protected] >> > Subject: Fwd: Build failed in Jenkins: bookkeeper-trunk #489 >> > >> > I was wondering if there is a jira open for the test that failed >> > below, does anyone know? >> > >> > -Flavio >> > >> > Begin forwarded message: >> > >> > > Tests in error: >> > > >> > >> > testWithMultipleBookieFailuresInLastEnsemble[2](org.apache.bookkeeper. >> > client.BookieWriteLedgerTest) >> > >> > >> > >
