[ https://issues.apache.org/jira/browse/AMBARI-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandor Magyari updated AMBARI-15393:
------------------------------------
    Description: 
Users rely on Ambari auto-recovery logic to recover from component start 
failures during cluster create. The idea is to improve reliability (through 
retries) by sacrificing some of the latency.
In some cases we see that cluster creates fail because component start fails 
and auto-recovery is unable to start those components for up to 2 hrs, most 
often on headnodes for HIVE_SERVER, OOZIE_SERVER, and NAMENODE components.
The problem is that these kinds of failures are hard to investigate later, as 
auto-recovery files are neither sent to the server side nor saved in the 
Ambari agent logs; they are only stored on the agent. 
The solution is to add a new option, *log_command_executes*, to the logging 
section of ambari-agent.ini. When it is enabled, the agent appends the stderr 
of all commands (including auto_execute commands) to the agent log.

  was:
Users rely on Ambari auto-recovery logic to recover from component start 
failures during cluster create. The idea is to improve reliability (through 
retries) by sacrificing some of the latency.
In some cases we see that cluster creates fail because component start fails 
and auto-recovery is unable to start those components for up to 2 hrs, most 
often on headnodes for HIVE_SERVER, OOZIE_SERVER, and NAMENODE components.
The problem is that these kinds of failures are hard to investigate later, as 
auto-recovery files are neither sent to the server side nor saved in the 
Ambari agent logs; they are only stored on the agent. 
The solution is to add a new option, log_command_executes, to the logging 
section of ambari-agent.ini. When it is enabled, the agent appends the stderr 
of all commands (including auto_execute commands) to the agent log.


> Add stderr output of Ambari auto-recovery commands in agent log
> ---------------------------------------------------------------
>
>                 Key: AMBARI-15393
>                 URL: https://issues.apache.org/jira/browse/AMBARI-15393
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-agent
>    Affects Versions: 2.2.1
>            Reporter: Sandor Magyari
>            Assignee: Sandor Magyari
>            Priority: Critical
>             Fix For: 2.2.2
>
>         Attachments: AMBARI-15393.patch, AMBARI-15393_branch-2.2.patch
>
>
> Users rely on Ambari auto-recovery logic to recover from component start 
> failures during cluster create. The idea is to improve reliability (through 
> retries) by sacrificing some of the latency.
> In some cases we see that cluster creates fail because component start fails 
> and auto-recovery is unable to start those components for up to 2 hrs, most 
> often on headnodes for HIVE_SERVER, OOZIE_SERVER, and NAMENODE components.
> The problem is that these kinds of failures are hard to investigate later, as 
> auto-recovery files are neither sent to the server side nor saved in the 
> Ambari agent logs; they are only stored on the agent. 
> The solution is to add a new option, *log_command_executes*, to the logging 
> section of ambari-agent.ini. When it is enabled, the agent appends the stderr 
> of all commands (including auto_execute commands) to the agent log.
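
For illustration only, the proposed behaviour could be sketched roughly as 
below. This is not the attached patch. The section name (logging) and option 
name (log_command_executes) come from the description above; the 0/1 value 
format, the helper names load_log_command_executes and run_and_log, and the 
log message layout are assumptions made for the example.

```python
import configparser
import logging
import subprocess
import sys

# Hypothetical ambari-agent.ini fragment. The section and option names are
# taken from the issue text; the "1" value format is an assumption.
SAMPLE_INI = """
[logging]
log_command_executes = 1
"""


def load_log_command_executes(ini_text):
    """Return True when [logging] log_command_executes is enabled.

    A missing section or option is treated as disabled.
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return parser.getboolean("logging", "log_command_executes", fallback=False)


def run_and_log(argv, logger, log_stderr):
    """Run a command and, if log_stderr is set, append its stderr to the log.

    Returns the exit code of the command.
    """
    proc = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if log_stderr and proc.stderr:
        logger.info(
            "Command %s exited with code %d; stderr:\n%s",
            argv,
            proc.returncode,
            proc.stderr.rstrip(),
        )
    return proc.returncode


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    enabled = load_log_command_executes(SAMPLE_INI)
    # A stand-in for a failing component start command.
    failing = [sys.executable, "-c",
               "import sys; sys.stderr.write('start failed'); sys.exit(1)"]
    run_and_log(failing, logging.getLogger("agent"), enabled)
```

With the option left disabled, run_and_log in this sketch still returns the 
exit code but writes nothing to the log, which mirrors the opt-in behaviour 
described in the issue.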



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
