[
https://issues.apache.org/jira/browse/TEZ-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15087106#comment-15087106
]
Jeff Zhang edited comment on TEZ-2669 at 1/7/16 9:38 AM:
---------------------------------------------------------
* Code duplication for handling InternalError and plugin error. Is the
InternalError still necessary ? It only happens in TaskSchedulerManager, we may
replace it as TASK_SCHEDULER_SERVICE_FATAL_ERROR
{code}
case TASK_COMMUNICATOR_SERVICE_FATAL_ERROR:
case CONTAINER_LAUNCHER_SERVICE_FATAL_ERROR:
case TASK_SCHEDULER_SERVICE_FATAL_ERROR:
// A fatal error from the pluggable services. The AM cannot continue
operation, and should
// be shutdown. The AM should not be restarted for recovery.
DAGAppMasterEventUserServiceFatalError usfe =
(DAGAppMasterEventUserServiceFatalError) event;
Throwable error = usfe.getError();
errDiagnostics = "Service Error: " + usfe.getDiagnosticInfo()
+ ", eventType=" + event.getType()
+ ", exception=" + ExceptionUtils.getStackTrace(usfe.getError());
LOG.error(errDiagnostics, error);
addDiagnostic(errDiagnostics);
state = DAGAppMasterState.ERROR;
if (currentDAG != null) {
_updateLoggers(currentDAG, "_post");
LOG.info("Service error: " + event.getType() + ". Aborting dag" +
currentDAG.getID());
// Inform the current DAG about the error
sendEvent(new DAGEventInternalError(currentDAG.getID(),
errDiagnostics));
} else {
LOG.info("Service error: " + event.getType() + ". AppMaster will exit
as no dag is active");
// This could be problematic if the scheduler generated the error,
// since un-registration may not be possible.
// For now - try setting this flag, but call the shutdownHandler
irrespective of
// how the flag is handled by user code.
try {
this.taskSchedulerManager.setShouldUnregisterFlag();
} catch (Exception e) {
// Ignore exception for now
LOG.error("Error when trying to set unregister flag for
TaskScheduler", e);
} finally {
shutdownHandler.shutdown();
}
}
break;
case INTERNAL_ERROR:
state = DAGAppMasterState.ERROR;
addDiagnostic("DAGAppMaster Internal Error occurred");
if(currentDAG != null) {
_updateLoggers(currentDAG, "_post");
// notify dag to finish which will send the DAG_FINISHED event
LOG.info("Internal Error. Notifying dags to finish.");
sendEvent(new DAGEventInternalError(currentDAG.getID(), "Internal
Error"));
} else {
LOG.info("Internal Error. Finishing directly as no dag is active.");
this.taskSchedulerManager.setShouldUnregisterFlag();
shutdownHandler.shutdown();
}
{code}
* What's the purpose of
TaskCommunicatorWrapper/ContainerLauncherWrapper/TaskSchedulerWrapper ? Add
exception on methods ? We can do it on
TaskCommunicator/ContainerLauncher/TaskScheduler directly since the plugin api
is not stabilized I guess. Or maybe you don't want to make impact on downstream
project (hive) ?
was (Author: zjffdu):
* Code duplicated for handling InternalError and plugin error. Is the
InternalError still necessary ? It only happens in TaskSchedulerManager, we may
replace it as TASK_SCHEDULER_SERVICE_FATAL_ERROR
{code}
case TASK_COMMUNICATOR_SERVICE_FATAL_ERROR:
case CONTAINER_LAUNCHER_SERVICE_FATAL_ERROR:
case TASK_SCHEDULER_SERVICE_FATAL_ERROR:
// A fatal error from the pluggable services. The AM cannot continue
operation, and should
// be shutdown. The AM should not be restarted for recovery.
DAGAppMasterEventUserServiceFatalError usfe =
(DAGAppMasterEventUserServiceFatalError) event;
Throwable error = usfe.getError();
errDiagnostics = "Service Error: " + usfe.getDiagnosticInfo()
+ ", eventType=" + event.getType()
+ ", exception=" + ExceptionUtils.getStackTrace(usfe.getError());
LOG.error(errDiagnostics, error);
addDiagnostic(errDiagnostics);
state = DAGAppMasterState.ERROR;
if (currentDAG != null) {
_updateLoggers(currentDAG, "_post");
LOG.info("Service error: " + event.getType() + ". Aborting dag" +
currentDAG.getID());
// Inform the current DAG about the error
sendEvent(new DAGEventInternalError(currentDAG.getID(),
errDiagnostics));
} else {
LOG.info("Service error: " + event.getType() + ". AppMaster will exit
as no dag is active");
// This could be problematic if the scheduler generated the error,
// since un-registration may not be possible.
// For now - try setting this flag, but call the shutdownHandler
irrespective of
// how the flag is handled by user code.
try {
this.taskSchedulerManager.setShouldUnregisterFlag();
} catch (Exception e) {
// Ignore exception for now
LOG.error("Error when trying to set unregister flag for
TaskScheduler", e);
} finally {
shutdownHandler.shutdown();
}
}
break;
case INTERNAL_ERROR:
state = DAGAppMasterState.ERROR;
addDiagnostic("DAGAppMaster Internal Error occurred");
if(currentDAG != null) {
_updateLoggers(currentDAG, "_post");
// notify dag to finish which will send the DAG_FINISHED event
LOG.info("Internal Error. Notifying dags to finish.");
sendEvent(new DAGEventInternalError(currentDAG.getID(), "Internal
Error"));
} else {
LOG.info("Internal Error. Finishing directly as no dag is active.");
this.taskSchedulerManager.setShouldUnregisterFlag();
shutdownHandler.shutdown();
}
{code}
* What's the purpose of
TaskCommunicatorWrapper/ContainerLauncherWrapper/TaskSchedulerWrapper ? Add
exception on methods ? We can do it on
TaskCommunicator/ContainerLauncher/TaskScheduler directly since the plugin api
is not stabilized I guess. Or maybe you don't want to make impact on downstream
project (hive) ?
> Propagation of errors from plugins to the AM for error reporting
> ----------------------------------------------------------------
>
> Key: TEZ-2669
> URL: https://issues.apache.org/jira/browse/TEZ-2669
> Project: Apache Tez
> Issue Type: Sub-task
> Affects Versions: 0.8.0-alpha
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Priority: Blocker
> Attachments: TEZ-2669.1.txt
>
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)