[ https://issues.apache.org/jira/browse/TWILL-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178681#comment-14178681 ]
Terence Yim commented on TWILL-103:
-----------------------------------
The delay in lookup occurs because all ZK interactions in Twill are
performed asynchronously. TwillRunnerService learns which applications are
running by fetching from and watching ZooKeeper, hence the delay. One way
to give a better experience is to have TwillRunner expose methods for
watching changes to the set of running applications.
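
Until such watch methods exist, a caller could poll lookup() with a bounded
timeout instead of relying on a fixed sleep. The following is only a minimal
sketch of that idea: LookupRetry and pollUntilFound are hypothetical names,
not part of Twill, and the helper takes a generic Supplier so that it does
not assume any particular Twill signature.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of the Twill API.
final class LookupRetry {

    private LookupRetry() { }

    /**
     * Calls the given lookup repeatedly until it returns a non-null value
     * or the timeout elapses. Returns null if nothing was found in time.
     */
    static <T> T pollUntilFound(Supplier<T> lookup, long timeoutMillis, long intervalMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            T result = lookup.get();
            if (result != null) {
                return result;
            }
            // Compare via subtraction so the check is safe against nanoTime overflow.
            if (System.nanoTime() - deadline >= 0) {
                return null;
            }
            Thread.sleep(intervalMillis);
        }
    }
}
```

Assuming the lookup(applicationName, runId) variant that returns a single
controller or null, usage might look like
pollUntilFound(() -> runner.lookup(appName, runId), 5000, 100), where
runner, appName and runId are placeholders for the caller's own values.
Polling only narrows the window; a watch-based API would avoid it entirely.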
> YarnTwillRunnerService lookup() fails to find application if called
> immediately after startAndWait()
> ----------------------------------------------------------------------------------------------------
>
> Key: TWILL-103
> URL: https://issues.apache.org/jira/browse/TWILL-103
> Project: Apache Twill
> Issue Type: Bug
> Components: yarn
> Affects Versions: 0.3.0-incubating, 0.4.0-incubating, 0.5.0-incubating
> Reporter: Mike Walch
>
> While TwillRunnerService.startAndWait() requests application/controller state
> from ZooKeeper, there may be a delay in retrieving this state. This can
> cause subsequent lookup() calls to return null even if there is a Twill
> application running.
> While this can be prevented by adding a one-second sleep after startAndWait()
> is called, it would be better if startAndWait() were modified to not return
> until all state has been retrieved from ZooKeeper.