[ https://issues.apache.org/jira/browse/AMBARI-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sandor Magyari updated AMBARI-15393:
------------------------------------
    Attachment: AMBARI-15393.patch

> Add stderr output of Ambari auto-recovery commands in agent log
> ---------------------------------------------------------------
>
>                 Key: AMBARI-15393
>                 URL: https://issues.apache.org/jira/browse/AMBARI-15393
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-agent
>    Affects Versions: 2.2.1
>            Reporter: Sandor Magyari
>            Assignee: Sandor Magyari
>            Priority: Critical
>             Fix For: 2.2.2
>
>         Attachments: AMBARI-15393.patch, AMBARI-15393_branch-2.2.patch
>
>
> Users rely on Ambari auto-recovery logic to recover from component start
> failures during cluster creation. The idea is to improve reliability (through
> retries) at the cost of some added latency.
> In some cases cluster creation fails because a component fails to start and
> auto-recovery is unable to start that component for up to 2 hours, most
> often on headnodes for the HIVE_SERVER, OOZIE_SERVER, and NAMENODE components.
> The problem is that failures of this kind are hard to investigate later,
> because the auto-recovery output files are neither sent to the server nor
> saved in the Ambari agent logs; they are only stored on the agent.
> The solution is to add a new option, log_auto_execute_errors, to the logging
> section of ambari-agent.ini. When this option is enabled, the agent appends
> the stderr of auto-recovery commands to the agent log.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
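
The behaviour proposed in the description can be sketched roughly as follows. This is a minimal illustration and not the code from the attached patch: only the option name log_auto_execute_errors and the logging section come from the issue text. The function names, the logger name, the accepted value "1", and the use of Python 3's configparser are assumptions made for the example.

```python
import configparser
import logging

# Hypothetical logger name, chosen only for this sketch.
logger = logging.getLogger("ambari_agent_sketch")


def should_log_auto_execute_errors(ini_text):
    """Return True if [logging] log_auto_execute_errors is enabled.

    The option and section names come from the issue description; the
    default of False when the option is absent is an assumption.
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return parser.getboolean("logging", "log_auto_execute_errors", fallback=False)


def log_auto_recovery_stderr(ini_text, command_name, stderr_text, log=logger):
    """Append stderr of an auto-recovery command to the log when enabled.

    Returns True if something was written to the log, False otherwise.
    """
    if not should_log_auto_execute_errors(ini_text):
        return False
    if not stderr_text:
        # Nothing useful to record.
        return False
    log.error("Auto-recovery command %s wrote to stderr:\n%s",
              command_name, stderr_text)
    return True


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    sample_ini = "[logging]\nlog_auto_execute_errors = 1\n"
    log_auto_recovery_stderr(sample_ini, "HIVE_SERVER START",
                             "example stderr text")
```

With the option missing or disabled, the helper returns without logging, so agent behaviour stays unchanged unless the option is turned on.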