[
https://issues.apache.org/jira/browse/TAJO-942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064488#comment-14064488
]
Jaehwa Jung edited comment on TAJO-942 at 7/17/14 2:22 AM:
-----------------------------------------------------------
Hi, guys,
I tried a few tests to resolve this bug, as follows:
* Case 1
*# TajoMaster::stop
{code:java}
//RpcChannelFactory.shutdown();
//super.stop();
{code}
*# TajoWorker::stop
{code:java}
//connPool.shutdown();
//RpcChannelFactory.shutdown();
{code}
In this case, the unit test finished successfully.
* Case 2
*# TajoMaster::stop
{code:java}
//super.stop();
{code}
*# RpcChannelFactory::shutdown
{code:java}
//factory.releaseExternalResources();
{code}
*# RpcConnectionPool::shutdown
{code:java}
//factory.releaseExternalResources();
{code}
In this case, the unit test finished successfully.
* Case 3
*# RpcChannelFactory::shutdown
{code:java}
//factory.releaseExternalResources();
{code}
*# RpcConnectionPool::shutdown
{code:java}
//factory.releaseExternalResources();
{code}
In this case, where TajoMaster::stop still calls super.stop(), NettyClientBase::connect goes into an infinite loop.
* Case 4
*# When all components call NettyClientBase and NettyServerBase, they set the
service name to a unique name that includes the host address. However,
NettyClientBase::connect still goes into an infinite loop.
* Case 5
*# RpcChannelFactory::getSharedClientChannelFactory
{code:java}
public static synchronized ClientSocketChannelFactory getSharedClientChannelFactory() {
  // shared worker and boss pool
  TajoConf conf = new TajoConf();
  int workerNum = conf.getIntVar(TajoConf.ConfVars.INTERNAL_RPC_CLIENT_WORKER_THREAD_NUM);
  return createClientChannelFactory("Internal-Client", workerNum);
}
{code}
*# WorkerHeartbeatThread::run causes a RejectedExecutionException.
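Cases 1 and 2 suggest that the failure depends on one component releasing shared, statically held client resources while another component is still using them. A minimal JDK-only sketch of that failure mode, assuming a plain ExecutorService can stand in for Netty's shared boss/worker pools (StaticPoolDemo and its methods are hypothetical, not Tajo or Netty APIs):

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical stand-in for a statically shared RPC channel factory:
// one JVM-wide pool that every component (master, worker) uses.
final class StaticPoolDemo {
    private static final ExecutorService SHARED_POOL = Executors.newSingleThreadExecutor();

    // Analogous to RpcChannelFactory.shutdown(): releases the pool for everyone.
    static void shutdownShared() {
        SHARED_POOL.shutdown();
    }

    // Analogous to a connect() issued by another component, such as the
    // worker heartbeat thread. Returns false if the pool rejects the task.
    static boolean trySubmit(Runnable task) {
        try {
            SHARED_POOL.execute(task);
            return true;
        } catch (RejectedExecutionException e) {
            // Same exception type as the root cause in the TAJO-942 trace.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("before shutdown accepted: " + trySubmit(() -> { }));
        shutdownShared(); // e.g. the master stops first
        System.out.println("after shutdown accepted: " + trySubmit(() -> { }));
    }
}
{code}

Running main prints true for the first submission and false after shutdownShared(), mirroring the "Worker has already been shutdown" rejection that the worker sees once the master has released the shared resources.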
I think Tajo should share RPC channels through a context member instead of a static member.
However, changing this architecture could affect other code and performance, so we need to
discuss this issue first. If you agree, we should handle it in a separate JIRA issue.
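One possible shape for sharing through a context member, sketched under assumptions: a hypothetical RpcContext owns the shared pool, each component receives it through its constructor rather than reaching for a static field, and the pool is shut down only when the last holder releases it (reference counting). RpcContext and RpcComponent are illustrative names that do not exist in Tajo.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical context object that owns a shared pool instead of a static field.
final class RpcContext {
    private final ExecutorService pool = Executors.newCachedThreadPool();
    // Number of components currently holding this context.
    private final AtomicInteger holders = new AtomicInteger();

    // Called by a component when it starts using the context.
    RpcContext retain() {
        holders.incrementAndGet();
        return this;
    }

    // Called from a component's stop(). The pool is shut down only when the
    // last holder releases it, so stopping the master does not pull the pool
    // out from under a worker that is still running.
    void release() {
        if (holders.decrementAndGet() == 0) {
            pool.shutdown();
        }
    }

    ExecutorService pool() {
        return pool;
    }

    boolean isShutdown() {
        return pool.isShutdown();
    }
}

// Hypothetical component (master or worker) that gets the context injected.
final class RpcComponent {
    private final RpcContext ctx;

    RpcComponent(RpcContext ctx) {
        this.ctx = ctx.retain();
    }

    void stop() {
        ctx.release();
    }
}
{code}

With a master and a worker holding the same context, stopping the master leaves the pool open, and only the worker's stop() shuts it down. Whether reference counting or explicit ownership by one top-level component (for example the mini cluster in tests) fits Tajo better is part of the discussion proposed above.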
> NettyClientBase throws RejectedExecutionException occasionally.
> ---------------------------------------------------------------
>
> Key: TAJO-942
> URL: https://issues.apache.org/jira/browse/TAJO-942
> Project: Tajo
> Issue Type: Bug
> Components: rpc
> Reporter: Jaehwa Jung
> Assignee: Jaehwa Jung
>
> NettyClientBase throws RejectedExecutionException occasionally.
> For example, add the following simple test to the unit test cases.
> {code:java}
> @Test
> public final void testShutdownCluster() throws Exception {
> TajoTestingCluster activeMaster = new TajoTestingCluster();
> activeMaster.startMiniCluster(1);
> activeMaster.shutdownMiniCluster();
> }
> {code}
> If you add the above code and run 'mvn clean install', you will find an
> infinite loop as follows:
> {code}
> 2014-07-15 10:36:12,217 ERROR: org.apache.tajo.rpc.AsyncRpcClient (exceptionCaught(235)) - RPC Exception:java.util.concurrent.RejectedExecutionException: Worker has already been shutdown
> 2014-07-15 10:36:12,218 ERROR: org.apache.tajo.worker.WorkerHeartbeatService (run(241)) - java.util.concurrent.RejectedExecutionException: Worker has already been shutdown
> java.lang.RuntimeException: java.util.concurrent.RejectedExecutionException: Worker has already been shutdown
>     at org.apache.tajo.rpc.NettyClientBase.connect(NettyClientBase.java:93)
>     at org.apache.tajo.rpc.RpcConnectionPool.getConnection(RpcConnectionPool.java:89)
>     at org.apache.tajo.worker.WorkerHeartbeatService$WorkerHeartbeatThread.run(WorkerHeartbeatService.java:220)
> Caused by: java.util.concurrent.RejectedExecutionException: Worker has already been shutdown
>     at org.jboss.netty.channel.socket.nio.AbstractNioSelector.registerTask(AbstractNioSelector.java:115)
>     at org.jboss.netty.channel.socket.nio.AbstractNioSelector.register(AbstractNioSelector.java:100)
>     at org.jboss.netty.channel.socket.nio.NioClientBoss.register(NioClientBoss.java:42)
>     at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink.connect(NioClientSocketPipelineSink.java:121)
>     at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:70)
>     at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:54)
>     at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:54)
>     at org.jboss.netty.channel.Channels.connect(Channels.java:634)
>     at org.jboss.netty.channel.AbstractChannel.connect(AbstractChannel.java:207)
>     at org.jboss.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:229)
>     at org.jboss.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:182)
>     at org.apache.tajo.rpc.NettyClientBase.connect(NettyClientBase.java:76)
> {code}
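In the trace, the root RejectedExecutionException comes from NioClientBoss.register, which tries to hand the connect task to a selector (the client boss) that has already been shut down. NettyClientBase.connect wraps it in a RuntimeException, and WorkerHeartbeatService logs it and appears to retry indefinitely. Independently of where the shared factory ends up living, the heartbeat loop could treat a rejected execution as terminal. A minimal sketch, assuming a hypothetical Connector interface in place of RpcConnectionPool.getConnection and a bounded loop in place of WorkerHeartbeatThread.run:

{code:java}
import java.util.concurrent.RejectedExecutionException;

// Hypothetical stand-in for RpcConnectionPool.getConnection(...).
interface Connector {
    void connect() throws Exception;
}

final class HeartbeatLoop {
    // True if any exception in the cause chain is a RejectedExecutionException,
    // i.e. the executor behind the connection has already been shut down.
    static boolean isRejected(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof RejectedExecutionException) {
                return true;
            }
        }
        return false;
    }

    // Makes up to maxAttempts heartbeat attempts and returns how many were made.
    // Stops early, instead of looping forever, once the pool rejects work.
    static int run(Connector connector, int maxAttempts) {
        int attempts = 0;
        while (attempts < maxAttempts) {
            attempts++;
            try {
                connector.connect();
            } catch (Exception e) {
                if (isRejected(e)) {
                    // The pool is gone: retrying can never succeed.
                    break;
                }
                // Other failures are retried (a real loop would back off here).
            }
        }
        return attempts;
    }
}
{code}

A connector whose failure wraps a RejectedExecutionException stops after one attempt, while ordinary connection errors are still retried up to the limit; a real heartbeat thread would also sleep between attempts.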
--
This message was sent by Atlassian JIRA
(v6.2#6252)