[ 
https://issues.apache.org/jira/browse/SAMZA-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15437610#comment-15437610
 ] 

Shanthoosh Venkataraman commented on SAMZA-1008:
------------------------------------------------

{quote}
  To enable work-preserving RM recovery, both the YARN cluster and the Samza 
jobs must use YARN 2.6.1 or higher. In Samza, this corresponds to version 0.10 
or higher. The relevant YARN configuration properties are the following: 
{code} 
yarn.resourcemanager.work-preserving-recovery.enabled: This must be set to true.

yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms: Total time in 
milliseconds that the resource manager waits before handling new container 
requests after failover. This allows the RM to re-sync with the NMs and get the 
latest resource availability information. This value must be greater than the 
NM-RM heartbeat interval.

yarn.resourcemanager.am.max-attempts: Controls the maximum number of times any 
YARN application can be recovered after an RM restart/failover. This must be 
set to a high value.
{code}
{quote}
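
As a rough illustration of the settings above, here is a minimal Python sketch that renders the three properties as a yarn-site.xml fragment and checks that the scheduling wait exceeds the NM-RM heartbeat interval. The helper name build_yarn_site and all numeric values are hypothetical; the heartbeat is assumed to come from yarn.resourcemanager.nodemanagers.heartbeat-interval-ms, so check it against your own cluster.

```python
import xml.etree.ElementTree as ET

# Assumed NM-RM heartbeat interval in milliseconds (illustrative value only;
# read the real one from yarn.resourcemanager.nodemanagers.heartbeat-interval-ms).
NM_HEARTBEAT_MS = 1000


def build_yarn_site(scheduling_wait_ms, am_max_attempts,
                    nm_heartbeat_ms=NM_HEARTBEAT_MS):
    """Render the work-preserving recovery properties as a
    yarn-site.xml <configuration> fragment (hypothetical helper)."""
    # The scheduling wait must be greater than the NM-RM heartbeat interval.
    if scheduling_wait_ms <= nm_heartbeat_ms:
        raise ValueError(
            "scheduling-wait-ms must be greater than the NM-RM heartbeat interval")

    props = {
        "yarn.resourcemanager.work-preserving-recovery.enabled": "true",
        "yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms":
            str(scheduling_wait_ms),
        "yarn.resourcemanager.am.max-attempts": str(am_max_attempts),
    }

    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

Calling build_yarn_site(10000, 10) would produce a fragment to merge into yarn-site.xml on the RM hosts; the values 10000 and 10 are placeholders, not recommendations.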
Ping [~jonbringhurst]: please add this as a section when you're writing 
documentation for YARN in Samza.

> Create documentation for YARN work preserving recovery
> ------------------------------------------------------
>
>                 Key: SAMZA-1008
>                 URL: https://issues.apache.org/jira/browse/SAMZA-1008
>             Project: Samza
>          Issue Type: Task
>          Components: yarn
>    Affects Versions: 0.10.0
>            Reporter: Shanthoosh Venkataraman
>            Assignee: Shanthoosh Venkataraman
>            Priority: Minor
>
> This is to create documentation for the work-preserving recovery feature of 
> YARN in Samza. 
> Please see SAMZA-750 for more details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
