[jira] [Created] (TWILL-221) TwillRunner should periodically cleanup files that no app is using
Terence Yim created TWILL-221:
---------------------------------

Summary: TwillRunner should periodically cleanup files that no app is using
Key: TWILL-221
URL: https://issues.apache.org/jira/browse/TWILL-221
Project: Apache Twill
Issue Type: Bug
Reporter: Terence Yim

Currently the app AM is responsible for cleaning up, during shutdown, the files on HDFS that belong to that run. However, if the app is KILLed, no one removes those files.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
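The periodic cleanup requested above could look roughly like the following. This is only a sketch under stated assumptions: it uses a local directory via `java.nio.file` rather than HDFS, it assumes one sub-directory per run named after the run ID, and `StaleRunCleaner`, `cleanup` and `liveRunIds` are hypothetical names that do not come from Twill.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Twill: removes per-run directories that no live run owns.
final class StaleRunCleaner {

  // Deletes every child directory of appDir whose name is not in liveRunIds
  // and returns the names of the directories that were removed.
  static List<String> cleanup(Path appDir, Set<String> liveRunIds) throws IOException {
    // Collect the stale directories first so nothing is deleted while listing.
    List<Path> stale = new ArrayList<>();
    try (DirectoryStream<Path> runDirs = Files.newDirectoryStream(appDir)) {
      for (Path runDir : runDirs) {
        String runId = runDir.getFileName().toString();
        if (Files.isDirectory(runDir) && !liveRunIds.contains(runId)) {
          stale.add(runDir);
        }
      }
    }
    List<String> removed = new ArrayList<>();
    for (Path runDir : stale) {
      deleteRecursively(runDir);
      removed.add(runDir.getFileName().toString());
    }
    return removed;
  }

  private static void deleteRecursively(Path dir) throws IOException {
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(dir)) {
      // Reverse order puts children before their parent directory.
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path path : paths) {
      Files.delete(path);
    }
  }
}
```

A single pass like this could be run on a timer, for example with `ScheduledExecutorService.scheduleWithFixedDelay`. How the set of live run IDs is obtained (for instance by asking YARN for running applications) is left open here.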
[jira] [Updated] (TWILL-221) TwillRunner should periodically cleanup files that no app is using
[ https://issues.apache.org/jira/browse/TWILL-221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Terence Yim updated TWILL-221:
--
Affects Version/s: 0.8.0
                   0.9.0
                   0.10.0

> TwillRunner should periodically cleanup files that no app is using
> --
>
> Key: TWILL-221
> URL: https://issues.apache.org/jira/browse/TWILL-221
> Project: Apache Twill
> Issue Type: Bug
> Affects Versions: 0.8.0, 0.9.0, 0.10.0
> Reporter: Terence Yim
> Fix For: 0.11.0
>
> Currently the app AM is responsible for cleaning up, during shutdown, the files
> on HDFS that belong to that run. However, if the app is KILLed, no one removes
> those files.
[jira] [Updated] (TWILL-221) TwillRunner should periodically cleanup files that no app is using
[ https://issues.apache.org/jira/browse/TWILL-221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Terence Yim updated TWILL-221:
--
Fix Version/s: 0.11.0

> TwillRunner should periodically cleanup files that no app is using
> --
>
> Key: TWILL-221
> URL: https://issues.apache.org/jira/browse/TWILL-221
> Project: Apache Twill
> Issue Type: Bug
> Affects Versions: 0.8.0, 0.9.0, 0.10.0
> Reporter: Terence Yim
> Fix For: 0.11.0
>
> Currently the app AM is responsible for cleaning up, during shutdown, the files
> on HDFS that belong to that run. However, if the app is KILLed, no one removes
> those files.
[jira] [Commented] (TWILL-186) ApplicationMaster keeps restarting with NPE in the log.
[ https://issues.apache.org/jira/browse/TWILL-186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889361#comment-15889361 ]

ASF GitHub Bot commented on TWILL-186:
--
Github user anwar6953 commented on the issue:

https://github.com/apache/twill/pull/34

LGTM

> ApplicationMaster keeps restarting with NPE in the log.
> ---
>
> Key: TWILL-186
> URL: https://issues.apache.org/jira/browse/TWILL-186
> Project: Apache Twill
> Issue Type: Bug
> Components: core, yarn
> Affects Versions: 0.7.0-incubating
> Reporter: Sagar Kapare
> Assignee: Terence Yim
> Fix For: 0.11.0
>
> It seems that a certain combination of container sizes launched by the AM
> causes the AM to keep restarting.
> The following exception is seen in the app master container log:
> {noformat}
> Aug 12, 2016 4:37:39 PM com.google.common.util.concurrent.AbstractExecutionThreadService$1$1 run
> WARNING: Error while attempting to shut down the service after failure.
> java.lang.NullPointerException
>   at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.decResourceRequest(AMRMClientImpl.java:687)
>   at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.removeContainerRequest(AMRMClientImpl.java:477)
>   at org.apache.twill.internal.yarn.Hadoop21YarnAMClient.removeContainerRequest(Hadoop21YarnAMClient.java:116)
>   at org.apache.twill.internal.yarn.Hadoop21YarnAMClient.removeContainerRequest(Hadoop21YarnAMClient.java:45)
>   at org.apache.twill.internal.yarn.AbstractYarnAMClient.allocate(AbstractYarnAMClient.java:119)
>   at org.apache.twill.internal.appmaster.ApplicationMasterService.doStop(ApplicationMasterService.java:281)
>   at org.apache.twill.internal.AbstractTwillService.shutDown(AbstractTwillService.java:186)
>   at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:55)
>   at java.lang.Thread.run(Thread.java:745)
> Exception in thread "ApplicationMasterService" java.lang.NullPointerException
>   at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.decResourceRequest(AMRMClientImpl.java:687)
>   at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.removeContainerRequest(AMRMClientImpl.java:477)
>   at org.apache.twill.internal.yarn.Hadoop21YarnAMClient.removeContainerRequest(Hadoop21YarnAMClient.java:116)
>   at org.apache.twill.internal.yarn.Hadoop21YarnAMClient.removeContainerRequest(Hadoop21YarnAMClient.java:45)
>   at org.apache.twill.internal.yarn.AbstractYarnAMClient.allocate(AbstractYarnAMClient.java:119)
>   at org.apache.twill.internal.appmaster.ApplicationMasterService.doRun(ApplicationMasterService.java:369)
>   at org.apache.twill.internal.AbstractTwillService.run(AbstractTwillService.java:179)
>   at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:52)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
[GitHub] twill issue #34: (TWILL-186) Fix NPE and container size mismatch
Github user anwar6953 commented on the issue: https://github.com/apache/twill/pull/34 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] twill issue #35: (TWILL-207) Only use list of class names as the cache name
Github user anwar6953 commented on the issue: https://github.com/apache/twill/pull/35 LGTM
[jira] [Commented] (TWILL-207) Better have the cache name purely based on class hash to encourage greater reuse.
[ https://issues.apache.org/jira/browse/TWILL-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889342#comment-15889342 ] ASF GitHub Bot commented on TWILL-207: -- Github user anwar6953 commented on the issue: https://github.com/apache/twill/pull/35 LGTM > Better have the cache name purely based on class hash to encourage greater > reuse. > - > > Key: TWILL-207 > URL: https://issues.apache.org/jira/browse/TWILL-207 > Project: Apache Twill > Issue Type: Improvement >Reporter: Terence Yim >Assignee: Terence Yim > Fix For: 0.11.0
[jira] [Commented] (TWILL-207) Better have the cache name purely based on class hash to encourage greater reuse.
[ https://issues.apache.org/jira/browse/TWILL-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889330#comment-15889330 ]

ASF GitHub Bot commented on TWILL-207:
--
GitHub user chtyim opened a pull request:

https://github.com/apache/twill/pull/35

(TWILL-207) Only use list of class names as the cache name

- Also some indentation changes.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chtyim/twill feature/twill-207

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/twill/pull/35.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #35

commit 542850875e0aecbe34d16bca962186b3d32bfb19
Author: Terence Yim
Date: 2017-03-01T02:03:45Z

(TWILL-207) Only use list of class names as the cache name

- Also some indentation changes.

> Better have the cache name purely based on class hash to encourage greater
> reuse.
> -
>
> Key: TWILL-207
> URL: https://issues.apache.org/jira/browse/TWILL-207
> Project: Apache Twill
> Issue Type: Improvement
> Reporter: Terence Yim
> Assignee: Terence Yim
> Fix For: 0.11.0
[GitHub] twill pull request #35: (TWILL-207) Only use list of class names as the cach...
GitHub user chtyim opened a pull request:

https://github.com/apache/twill/pull/35

(TWILL-207) Only use list of class names as the cache name

- Also some indentation changes.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chtyim/twill feature/twill-207

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/twill/pull/35.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #35

commit 542850875e0aecbe34d16bca962186b3d32bfb19
Author: Terence Yim
Date: 2017-03-01T02:03:45Z

(TWILL-207) Only use list of class names as the cache name

- Also some indentation changes.
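To illustrate the idea behind TWILL-207, here is a minimal sketch of deriving a cache name from nothing but a list of class names, so that two runs with the same classes map to the same cached file. It is not the code from the pull request: `ClassListCacheName`, `cacheName`, the choice of SHA-256 and the `.jar` suffix are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collection;
import java.util.TreeSet;

// Hypothetical helper, not taken from the Twill pull request.
final class ClassListCacheName {

  // Returns a name that depends only on the set of class names,
  // not on their order, duplicates, or anything specific to one run.
  static String cacheName(Collection<String> classNames) {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      // TreeSet sorts and de-duplicates, which makes the hash order-independent.
      for (String name : new TreeSet<>(classNames)) {
        digest.update(name.getBytes(StandardCharsets.UTF_8));
        digest.update((byte) 0); // separator so "ab","c" differs from "a","bc"
      }
      StringBuilder hex = new StringBuilder();
      for (byte b : digest.digest()) {
        hex.append(String.format("%02x", b));
      }
      return hex + ".jar"; // the suffix is an assumption for illustration
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 must be provided by every Java platform implementation.
      throw new IllegalStateException(e);
    }
  }
}
```

Because the name carries no application or run identifier, any run that needs the same set of classes can reuse the cached entry, which is the reuse the issue title asks for.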
[GitHub] twill pull request #34: (TWILL-186) Fix NPE and container size mismatch
Github user chtyim commented on a diff in the pull request:

https://github.com/apache/twill/pull/34#discussion_r103596214

--- Diff: twill-yarn/src/main/java/org/apache/twill/internal/yarn/AbstractYarnAMClient.java ---
@@ -50,12 +51,11 @@
   private static final Logger LOG = LoggerFactory.getLogger(AbstractYarnAMClient.class);
   // Map from a unique ID to inflight requests
-  private final Multimap containerRequests;
-
-  // List of requests pending to send through allocate call
-  private final List requests;
+  private final Multimap inflightRequests;
+  // Map from a unique ID to pending requests. It is for recording
--- End diff --

Oh. It is for recording the container requests that have yet to be sent to the RM. Will update the comment.
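The split between pending and inflight requests discussed in this diff can be sketched as follows. This is an illustration only, not the code of `AbstractYarnAMClient`: `RequestTracker`, `add`, `flush` and `complete` are hypothetical names, and plain `java.util` maps stand in for the Guava `Multimap` used in the pull request.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Twill's AbstractYarnAMClient.
final class RequestTracker<R> {
  // Requests recorded locally but not yet sent to the resource manager.
  private final Map<String, List<R>> pending = new LinkedHashMap<>();
  // Requests that have already been sent to the resource manager.
  private final Map<String, List<R>> inflight = new LinkedHashMap<>();

  // Records a request under a unique ID without sending it.
  void add(String id, R request) {
    pending.computeIfAbsent(id, k -> new ArrayList<>()).add(request);
  }

  // Called from the allocate/heartbeat path: returns the requests that must be
  // sent now and remembers them as inflight.
  List<R> flush() {
    List<R> toSend = new ArrayList<>();
    for (Map.Entry<String, List<R>> entry : pending.entrySet()) {
      inflight.computeIfAbsent(entry.getKey(), k -> new ArrayList<>()).addAll(entry.getValue());
      toSend.addAll(entry.getValue());
    }
    pending.clear();
    return toSend;
  }

  // Returns only the requests that were actually sent, so the caller never asks
  // the resource manager client to remove a request it was never given.
  List<R> complete(String id) {
    pending.remove(id);
    List<R> sent = inflight.remove(id);
    return sent == null ? new ArrayList<R>() : sent;
  }
}
```

With this shape, completing an ID that was added but never flushed yields an empty list, so nothing is removed from the underlying client without having been added first.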
[jira] [Commented] (TWILL-186) ApplicationMaster keeps restarting with NPE in the log.
[ https://issues.apache.org/jira/browse/TWILL-186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889314#comment-15889314 ]

ASF GitHub Bot commented on TWILL-186:
--
Github user chtyim commented on a diff in the pull request:

https://github.com/apache/twill/pull/34#discussion_r103596214

--- Diff: twill-yarn/src/main/java/org/apache/twill/internal/yarn/AbstractYarnAMClient.java ---
@@ -50,12 +51,11 @@
   private static final Logger LOG = LoggerFactory.getLogger(AbstractYarnAMClient.class);
   // Map from a unique ID to inflight requests
-  private final Multimap containerRequests;
-
-  // List of requests pending to send through allocate call
-  private final List requests;
+  private final Multimap inflightRequests;
+  // Map from a unique ID to pending requests. It is for recording
--- End diff --

Oh. It is for recording the container requests that have yet to be sent to the RM. Will update the comment.
[jira] [Commented] (TWILL-186) ApplicationMaster keeps restarting with NPE in the log.
[ https://issues.apache.org/jira/browse/TWILL-186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889299#comment-15889299 ]

ASF GitHub Bot commented on TWILL-186:
--
Github user anwar6953 commented on a diff in the pull request:

https://github.com/apache/twill/pull/34#discussion_r103595352

--- Diff: twill-yarn/src/main/java/org/apache/twill/internal/yarn/AbstractYarnAMClient.java ---
@@ -50,12 +51,11 @@
   private static final Logger LOG = LoggerFactory.getLogger(AbstractYarnAMClient.class);
   // Map from a unique ID to inflight requests
-  private final Multimap containerRequests;
-
-  // List of requests pending to send through allocate call
-  private final List requests;
+  private final Multimap inflightRequests;
+  // Map from a unique ID to pending requests. It is for recording
--- End diff --

It is for recording what? (incomplete sentence?)
[GitHub] twill pull request #34: (TWILL-186) Fix NPE and container size mismatch
Github user anwar6953 commented on a diff in the pull request:

https://github.com/apache/twill/pull/34#discussion_r103595352

--- Diff: twill-yarn/src/main/java/org/apache/twill/internal/yarn/AbstractYarnAMClient.java ---
@@ -50,12 +51,11 @@
   private static final Logger LOG = LoggerFactory.getLogger(AbstractYarnAMClient.class);
   // Map from a unique ID to inflight requests
-  private final Multimap containerRequests;
-
-  // List of requests pending to send through allocate call
-  private final List requests;
+  private final Multimap inflightRequests;
+  // Map from a unique ID to pending requests. It is for recording
--- End diff --

It is for recording what? (incomplete sentence?)
[jira] [Assigned] (TWILL-207) Better have the cache name purely based on class hash to encourage greater reuse.
[ https://issues.apache.org/jira/browse/TWILL-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Terence Yim reassigned TWILL-207: - Assignee: Terence Yim > Better have the cache name purely based on class hash to encourage greater > reuse. > - > > Key: TWILL-207 > URL: https://issues.apache.org/jira/browse/TWILL-207 > Project: Apache Twill > Issue Type: Improvement >Reporter: Terence Yim >Assignee: Terence Yim > Fix For: 0.11.0
[GitHub] twill pull request #34: Feature/twill 186
GitHub user chtyim opened a pull request:

https://github.com/apache/twill/pull/34

Feature/twill 186

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chtyim/twill feature/twill-186

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/twill/pull/34.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #34

commit b64c90b618f065ca01a8c5d36df2535b1344564e
Author: Terence Yim
Date: 2017-02-28T05:22:32Z

(TWILL-186) Code cleanup for ApplicationMasterService and related classes

- Get rid of the inner loop in the doRun method
  - The inner loop can block the heartbeat thread for too long if there are a lot of runnable instances to stop
- Remove unnecessary Throwables.propagate
- Remove unnecessary intermediate method
- Better logging
- Request multiple instances in the same request
- Refactor/simplify placement policy related code
- Expose container instanceId instead of parsing it from runId

commit fcfa5becd61ddc2513f29a2be700b3be166d4b0b
Author: Terence Yim
Date: 2017-02-28T22:43:37Z

(TWILL-186) Guard against YARN returning mismatch container size case.

- Also make sure we don't remove container request without adding it first
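The second commit guards against YARN handing back a container whose size differs from what was requested. One possible guard, shown here purely as an assumption and not as what the commit does, is to match an allocation to the largest outstanding request that fits inside it instead of requiring an exact size match. `ContainerSizeMatcher`, `request` and `match` are hypothetical names, and only memory in MB is modelled.

```java
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical sketch, not the logic of the Twill commit.
final class ContainerSizeMatcher {
  // Requested memory in MB mapped to the number of outstanding requests of that size.
  private final TreeMap<Integer, Integer> outstanding = new TreeMap<>();

  // Records one outstanding request of the given size.
  void request(int memoryMb) {
    outstanding.merge(memoryMb, 1, Integer::sum);
  }

  // Finds the largest outstanding request that fits into the allocated size,
  // consumes it, and returns its requested size. An exact match wins when present.
  // Returns empty when no outstanding request fits, so the caller can release
  // the container rather than fail on a missing entry.
  Optional<Integer> match(int allocatedMb) {
    Integer requested = outstanding.floorKey(allocatedMb);
    if (requested == null) {
      return Optional.empty();
    }
    // Returning null from the remapping function removes the entry.
    outstanding.computeIfPresent(requested, (size, count) -> count == 1 ? null : count - 1);
    return Optional.of(requested);
  }
}
```

For example, with outstanding requests of 1024 MB and 2048 MB, an allocation of 1100 MB would be matched to the 1024 MB request, while an allocation of 512 MB would match nothing.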