[
https://issues.apache.org/jira/browse/SAMZA-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15437610#comment-15437610
]
Shanthoosh Venkataraman commented on SAMZA-1008:
------------------------------------------------
{quote}
To enable work-preserving RM recovery, both the YARN cluster and the Samza jobs
must use YARN 2.6.1 or higher. In Samza, this corresponds to version 0.10 or
higher. The related YARN configuration properties are:
{code}
yarn.resourcemanager.work-preserving-recovery.enabled: Must be set to true.
yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms: Time in
milliseconds that the resource manager waits before handling new container
requests after failover. This lets the RM re-sync with the NMs to get the
latest resource availability information. The value must be greater than the
NM-RM heartbeat interval.
yarn.resourcemanager.am.max-attempts: Controls the maximum number of times any
YARN application can be recovered after an RM restart/failover. Set this to a
high value.
{code}
{quote}
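As a rough sketch, the properties quoted above might look like this in
yarn-site.xml. The numeric values are illustrative assumptions, not
recommendations from this issue, and the yarn.resourcemanager.recovery.enabled
entry is not part of the quoted list; it is included on the assumption that RM
recovery itself (with a configured state store) is already enabled on the
cluster.
{code:xml}
<configuration>
  <!-- Assumption: basic RM recovery is enabled; not listed in the quote above. -->
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>

  <!-- Must be true for work-preserving recovery. -->
  <property>
    <name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
    <value>true</value>
  </property>

  <!-- Illustrative value: 10 s wait before the RM handles new container
       requests after failover. Keep it above the NM-RM heartbeat interval. -->
  <property>
    <name>yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms</name>
    <value>10000</value>
  </property>

  <!-- Illustrative "high" value; choose one that suits the cluster. -->
  <property>
    <name>yarn.resourcemanager.am.max-attempts</name>
    <value>10</value>
  </property>
</configuration>
{code}
The wait time is kept well above the heartbeat interval so that node managers
have a chance to report their running containers before the scheduler starts
allocating again.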
Ping [~jonbringhurst]: please add this as a section when you write the
YARN documentation for Samza.
> Create documentation for YARN work preserving recovery
> ------------------------------------------------------
>
> Key: SAMZA-1008
> URL: https://issues.apache.org/jira/browse/SAMZA-1008
> Project: Samza
> Issue Type: Task
> Components: yarn
> Affects Versions: 0.10.0
> Reporter: Shanthoosh Venkataraman
> Assignee: Shanthoosh Venkataraman
> Priority: Minor
>
> This is to create documentation for the work-preserving recovery feature of
> YARN in Samza.
> Please see SAMZA-750 for more details.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)