Writable RPC had a lot of leftover TCP connections in CLOSE_WAIT after RPC_TIMEOUT is enabled

2014-06-10 Thread Anfernee Xu
Hi,

I'm using hadoop-2.2.0 and rely on Hadoop's WritableRpcEngine to build my
distributed application. The application has a 'heartbeat' interface that
checks availability periodically. To detect potential failures, I enabled
an RPC timeout when creating the proxy, as below:

 int rpcTimeout = 1000; // 1 second as RPC timeout

 RPC.waitForProxy(
     MyApplicationInterface.class, MyApplicationInterface.versionID,
     socAddr, conf, rpcTimeout, timeout);

Everything went fine initially, and the heartbeat detected failures as
expected. But after a period of time (2 days or so), I saw a lot of TCP
connections in the CLOSE_WAIT state on the server side, and the client was
no longer able to connect to it.
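
For context on the symptom: a socket sits in CLOSE_WAIT when the remote peer
has closed its end (sent FIN) and the local application has not yet closed its
own socket. The sketch below illustrates this with plain java.net sockets on
loopback. It is not Hadoop's RPC code, the class and method names are
hypothetical, and it shows only the general TCP behavior, not a diagnosis of
this particular problem.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical illustration class; not part of Hadoop.
public class CloseWaitSketch {

    /**
     * Opens a loopback connection, closes the client end, then has the
     * server end notice the EOF and close its own socket.
     * Returns true if the server side saw EOF and closed the socket.
     */
    static boolean serverClosesAfterPeerEof() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 50, loopback)) {
            Socket client = new Socket(loopback, listener.getLocalPort());
            Socket accepted = listener.accept();
            accepted.setSoTimeout(5000); // avoid blocking forever in read()

            // The client gives up (for example after a timeout) and closes
            // its end, which sends FIN to the server.
            client.close();

            // From here the accepted socket is in CLOSE_WAIT until the
            // server application closes it. read() returns -1 on EOF.
            InputStream in = accepted.getInputStream();
            boolean sawEof = in.read() == -1;
            if (sawEof) {
                accepted.close(); // this is what takes the socket out of CLOSE_WAIT
            }
            return sawEof && accepted.isClosed();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("server closed after EOF: " + serverClosesAfterPeerEof());
    }
}
```

If the `accepted.close()` call is skipped, the server-side socket stays in
CLOSE_WAIT for as long as the process holds it, which is the state that
accumulates in `netstat` output.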

Any clue about this?

Thanks

-- 
--Anfernee


Re: Writable RPC had a lot of leftover TCP connections in CLOSE_WAIT after RPC_TIMEOUT is enabled

2014-06-10 Thread Ted Yu
Why don't you base your application on ProtobufRpcEngine?

Cheers



Re: Writable RPC had a lot of leftover TCP connections in CLOSE_WAIT after RPC_TIMEOUT is enabled

2014-06-10 Thread Anfernee Xu
Because it's a legacy system that I built 4-5 years ago on a Hadoop 0.2.x
release, and we only recently moved to the 2.2.0 release. Moving to Protocol
Buffers is an option, but we first need to migrate our infrastructure (Hadoop
and so on) and get it working without regressions.

Is it a known issue?

Thanks



-- 
--Anfernee