[
https://issues.apache.org/jira/browse/TAJO-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895585#comment-13895585
]
Hyunsik Choi commented on TAJO-587:
-----------------------------------
Is your point that a failed query should be stopped immediately instead of
hanging? I am not sure I understood, because this issue describes a situation
without a proposal.
Nevertheless, I still think the OOM caused by this situation is a very useful
report for us, and we have to resolve it. If the OOM still occurs, we still
cannot execute this kind of query, even if we fix the hanging problem. Of
course, we also have to fix the hang caused by the OOM.
Thank you for this report.
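To make the fail-fast half of this concrete, here is a minimal sketch of a
dispatcher loop that catches any Throwable (including OutOfMemoryError) raised
by an event handler and records a FAILED state instead of letting the thread
die silently. The class FailFastDispatcher and its State enum are hypothetical
names for illustration only; this is not the actual TajoAsyncDispatcher code,
and it assumes the JVM still has enough memory left to run the catch block.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch, not Tajo code: an event loop that fails fast.
final class FailFastDispatcher implements Runnable {
    enum State { RUNNING, FAILED, STOPPED }

    // Sentinel event used to ask the loop to exit normally.
    private static final Runnable STOP = () -> { };

    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private volatile State state = State.RUNNING;
    private volatile Throwable cause;

    void dispatch(Runnable event) { queue.add(event); }
    void stop() { queue.add(STOP); }
    State getState() { return state; }
    Throwable getCause() { return cause; }

    @Override
    public void run() {
        try {
            while (true) {
                Runnable event = queue.take();
                if (event == STOP) {
                    state = State.STOPPED;
                    return;
                }
                event.run();
            }
        } catch (InterruptedException e) {
            // Preserve the interrupt flag and treat it as a normal stop.
            Thread.currentThread().interrupt();
            state = State.STOPPED;
        } catch (Throwable t) {
            // Catch Error as well as Exception so that a failure in a
            // handler is recorded and can be reported to the client,
            // rather than leaving the query waiting for a timeout.
            cause = t;
            state = State.FAILED;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        FailFastDispatcher dispatcher = new FailFastDispatcher();
        Thread thread = new Thread(dispatcher, "dispatcher");
        thread.start();
        // Simulate the failure from the report with a thrown Error.
        dispatcher.dispatch(() -> { throw new OutOfMemoryError("simulated"); });
        thread.join();
        System.out.println(dispatcher.getState() + ": " + dispatcher.getCause());
    }
}
```

This only addresses the hang. It does nothing about the memory pressure
itself, so a query of this kind would still fail, just promptly.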
> Query is hanging when OutOfMemoryError occurs in the query master
> -----------------------------------------------------------------
>
> Key: TAJO-587
> URL: https://issues.apache.org/jira/browse/TAJO-587
> Project: Tajo
> Issue Type: Bug
> Components: tajo master
> Reporter: Jihoon Son
> Fix For: 0.8-incubating
>
>
> See the title. When I run a simple sort query against a 1TB table, the
> query hangs and never finishes.
> {noformat}
> tajo> select l_orderkey from lineitem order by l_orderkey
> 2014-02-05 17:20:52,339 FATAL master.TajoAsyncDispatcher (TajoAsyncDispatcher.java:dispatch(143)) - Error in dispatcher thread:SUBQUERY_COMPLETED
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> at java.net.URI.create(URI.java:857)
> at org.apache.tajo.master.querymaster.Repartitioner.scheduleRangeShuffledFetches(Repartitioner.java:342)
> at org.apache.tajo.master.querymaster.Repartitioner.scheduleFragmentsForNonLeafTasks(Repartitioner.java:261)
> at org.apache.tajo.master.querymaster.SubQuery$InitAndRequestContainer.schedule(SubQuery.java:680)
> at org.apache.tajo.master.querymaster.SubQuery$InitAndRequestContainer.transition(SubQuery.java:523)
> at org.apache.tajo.master.querymaster.SubQuery$InitAndRequestContainer.transition(SubQuery.java:504)
> at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at org.apache.tajo.master.querymaster.SubQuery.handle(SubQuery.java:481)
> at org.apache.tajo.master.querymaster.Query$SubQueryCompletedTransition.executeNextBlock(Query.java:311)
> at org.apache.tajo.master.querymaster.Query$SubQueryCompletedTransition.transition(Query.java:357)
> at org.apache.tajo.master.querymaster.Query$SubQueryCompletedTransition.transition(Query.java:297)
> at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at org.apache.tajo.master.querymaster.Query.handle(Query.java:584)
> at org.apache.tajo.master.querymaster.Query.handle(Query.java:58)
> at org.apache.tajo.master.TajoAsyncDispatcher.dispatch(TajoAsyncDispatcher.java:137)
> at org.apache.tajo.master.TajoAsyncDispatcher$1.run(TajoAsyncDispatcher.java:79)
> at java.lang.Thread.run(Thread.java:701)
> 2014-02-05 17:20:52,339 WARN querymaster.QueryMaster (QueryMaster.java:run(459)) - Query q_1391587770871_0001 stopped cause query sesstion timeout: 384113 ms
> 2014-02-05 17:20:52,339 INFO querymaster.QueryMasterTask (QueryMasterTask.java:stop(168)) - Stopping QueryMasterTask:q_1391587770871_0001
> 2014-02-05 17:20:52,346 INFO master.TajoAsyncDispatcher (TajoAsyncDispatcher.java:stop(122)) - AsyncDispatcher stopped:q_1391587770871_0001
> 2014-02-05 17:20:52,351 INFO querymaster.QueryMasterTask (QueryMasterTask.java:stop(198)) - Stopped QueryMasterTask:q_1391587770871_0001
> 2014-02-05 17:23:28,614 ERROR worker.TajoWorker (SignalLogger.java:handle(60)) - RECEIVED SIGNAL 15: SIGTERM
> {noformat}