[jira] [Created] (TWILL-221) TwillRunner should periodically cleanup files that no app is using

2017-02-28 Thread Terence Yim (JIRA)
Terence Yim created TWILL-221:
-

 Summary: TwillRunner should periodically cleanup files that no app 
is using
 Key: TWILL-221
 URL: https://issues.apache.org/jira/browse/TWILL-221
 Project: Apache Twill
  Issue Type: Bug
Reporter: Terence Yim


Currently the app AM is responsible for cleaning files on HDFS belonging to 
that run during shutdown. However, if the app is KILLed, then no one is 
removing those files.
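
A minimal sketch of what such a periodic cleanup could look like, assuming each
run keeps its files in a directory named after its run ID under a common
application root. The class name StaleRunCleaner, the directory layout, and the
live-run-ID supplier are all hypothetical and are not taken from Twill; the
sketch uses java.nio.file on a local path as a stand-in for HDFS.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch: removes per-run directories that no live run owns. */
final class StaleRunCleaner {

  /**
   * Deletes every child directory of appRoot whose name is not a live run ID.
   * Returns the names of the directories that were removed.
   */
  static List<String> cleanup(Path appRoot, Set<String> liveRunIds) throws IOException {
    // Collect candidates first so the directory is not modified while it is being listed.
    List<Path> stale = new ArrayList<>();
    try (DirectoryStream<Path> children = Files.newDirectoryStream(appRoot)) {
      for (Path child : children) {
        if (Files.isDirectory(child) && !liveRunIds.contains(child.getFileName().toString())) {
          stale.add(child);
        }
      }
    }
    List<String> removed = new ArrayList<>();
    for (Path dir : stale) {
      deleteRecursively(dir);
      removed.add(dir.getFileName().toString());
    }
    return removed;
  }

  /** Runs cleanup repeatedly; a failed round is reported and retried on the next one. */
  static ScheduledFuture<?> schedule(ScheduledExecutorService executor, Path appRoot,
                                     Supplier<Set<String>> liveRunIds, long periodSeconds) {
    return executor.scheduleWithFixedDelay(() -> {
      try {
        cleanup(appRoot, liveRunIds.get());
      } catch (IOException | RuntimeException e) {
        System.err.println("Cleanup round failed: " + e);
      }
    }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
  }

  private static void deleteRecursively(Path dir) throws IOException {
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(dir)) {
      // Reverse order lists children before their parent directories.
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : paths) {
      Files.delete(p);
    }
  }
}
```

A real implementation would probably also need a grace period, so that files of
an app that is being launched but is not yet reported as live are not removed.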



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (TWILL-221) TwillRunner should periodically cleanup files that no app is using

2017-02-28 Thread Terence Yim (JIRA)

 [ 
https://issues.apache.org/jira/browse/TWILL-221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Terence Yim updated TWILL-221:
--
Affects Version/s: 0.8.0
   0.9.0
   0.10.0

> TwillRunner should periodically cleanup files that no app is using
> --
>
> Key: TWILL-221
> URL: https://issues.apache.org/jira/browse/TWILL-221
> Project: Apache Twill
>  Issue Type: Bug
>Affects Versions: 0.8.0, 0.9.0, 0.10.0
>Reporter: Terence Yim
> Fix For: 0.11.0
>
>
> Currently the app AM is responsible for cleaning files on HDFS belonging to 
> that run during shutdown. However, if the app is KILLed, then no one is 
> removing those files.





[jira] [Updated] (TWILL-221) TwillRunner should periodically cleanup files that no app is using

2017-02-28 Thread Terence Yim (JIRA)

 [ 
https://issues.apache.org/jira/browse/TWILL-221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Terence Yim updated TWILL-221:
--
Fix Version/s: 0.11.0

> TwillRunner should periodically cleanup files that no app is using
> --
>
> Key: TWILL-221
> URL: https://issues.apache.org/jira/browse/TWILL-221
> Project: Apache Twill
>  Issue Type: Bug
>Affects Versions: 0.8.0, 0.9.0, 0.10.0
>Reporter: Terence Yim
> Fix For: 0.11.0
>
>
> Currently the app AM is responsible for cleaning files on HDFS belonging to 
> that run during shutdown. However, if the app is KILLed, then no one is 
> removing those files.





[jira] [Commented] (TWILL-186) ApplicationMaster keeps restarting with NPE in the log.

2017-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TWILL-186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889361#comment-15889361
 ] 

ASF GitHub Bot commented on TWILL-186:
--

Github user anwar6953 commented on the issue:

https://github.com/apache/twill/pull/34
  
LGTM


> ApplicationMaster keeps restarting with NPE in the log.
> ---
>
> Key: TWILL-186
> URL: https://issues.apache.org/jira/browse/TWILL-186
> Project: Apache Twill
>  Issue Type: Bug
>  Components: core, yarn
>Affects Versions: 0.7.0-incubating
>Reporter: Sagar Kapare
>Assignee: Terence Yim
> Fix For: 0.11.0
>
>
> It seems that a certain combination of container sizes launched by the AM causes 
> the AM to keep restarting.
> The following exception is seen in the app master container log:
> {noformat}
> Aug 12, 2016 4:37:39 PM 
> com.google.common.util.concurrent.AbstractExecutionThreadService$1$1 run
> WARNING: Error while attempting to shut down the service after failure.
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.decResourceRequest(AMRMClientImpl.java:687)
> at 
> org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.removeContainerRequest(AMRMClientImpl.java:477)
> at 
> org.apache.twill.internal.yarn.Hadoop21YarnAMClient.removeContainerRequest(Hadoop21YarnAMClient.java:116)
> at 
> org.apache.twill.internal.yarn.Hadoop21YarnAMClient.removeContainerRequest(Hadoop21YarnAMClient.java:45)
> at 
> org.apache.twill.internal.yarn.AbstractYarnAMClient.allocate(AbstractYarnAMClient.java:119)
> at 
> org.apache.twill.internal.appmaster.ApplicationMasterService.doStop(ApplicationMasterService.java:281)
> at 
> org.apache.twill.internal.AbstractTwillService.shutDown(AbstractTwillService.java:186)
> at 
> com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:55)
> at java.lang.Thread.run(Thread.java:745)
> Exception in thread "ApplicationMasterService" java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.decResourceRequest(AMRMClientImpl.java:687)
> at 
> org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.removeContainerRequest(AMRMClientImpl.java:477)
> at 
> org.apache.twill.internal.yarn.Hadoop21YarnAMClient.removeContainerRequest(Hadoop21YarnAMClient.java:116)
> at 
> org.apache.twill.internal.yarn.Hadoop21YarnAMClient.removeContainerRequest(Hadoop21YarnAMClient.java:45)
> at 
> org.apache.twill.internal.yarn.AbstractYarnAMClient.allocate(AbstractYarnAMClient.java:119)
> at 
> org.apache.twill.internal.appmaster.ApplicationMasterService.doRun(ApplicationMasterService.java:369)
> at 
> org.apache.twill.internal.AbstractTwillService.run(AbstractTwillService.java:179)
> at 
> com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:52)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}





[GitHub] twill issue #34: (TWILL-186) Fix NPE and container size mismatch

2017-02-28 Thread anwar6953
Github user anwar6953 commented on the issue:

https://github.com/apache/twill/pull/34
  
LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] twill issue #35: (TWILL-207) Only use list of class names as the cache name

2017-02-28 Thread anwar6953
Github user anwar6953 commented on the issue:

https://github.com/apache/twill/pull/35
  
LGTM




[jira] [Commented] (TWILL-207) Better have the cache name purely based on class hash to encourage greater reuse.

2017-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TWILL-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889342#comment-15889342
 ] 

ASF GitHub Bot commented on TWILL-207:
--

Github user anwar6953 commented on the issue:

https://github.com/apache/twill/pull/35
  
LGTM


> Better have the cache name purely based on class hash to encourage greater 
> reuse.
> -
>
> Key: TWILL-207
> URL: https://issues.apache.org/jira/browse/TWILL-207
> Project: Apache Twill
>  Issue Type: Improvement
>Reporter: Terence Yim
>Assignee: Terence Yim
> Fix For: 0.11.0
>
>






[jira] [Commented] (TWILL-207) Better have the cache name purely based on class hash to encourage greater reuse.

2017-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TWILL-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889330#comment-15889330
 ] 

ASF GitHub Bot commented on TWILL-207:
--

GitHub user chtyim opened a pull request:

https://github.com/apache/twill/pull/35

(TWILL-207) Only use list of class names as the cache name

- Also some indentation changes.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chtyim/twill feature/twill-207

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/twill/pull/35.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #35


commit 542850875e0aecbe34d16bca962186b3d32bfb19
Author: Terence Yim 
Date:   2017-03-01T02:03:45Z

(TWILL-207) Only use list of class names as the cache name

- Also some indentation changes.
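
To illustrate the idea in the issue title, here is a minimal sketch of deriving
a cache name from nothing but the set of class names, so that two applications
with the same classes map to the same cache entry. The class CacheNames, the
choice of SHA-256, and the hex encoding are assumptions made for illustration;
the thread does not say which hash Twill uses.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collection;
import java.util.TreeSet;

/** Hypothetical sketch: a cache name that depends only on the class names. */
final class CacheNames {

  static String cacheName(Collection<String> classNames) {
    MessageDigest digest;
    try {
      digest = MessageDigest.getInstance("SHA-256");
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 is required to be available on every Java platform.
      throw new IllegalStateException(e);
    }
    // Sort and de-duplicate so the name does not depend on iteration order.
    for (String name : new TreeSet<>(classNames)) {
      digest.update(name.getBytes(StandardCharsets.UTF_8));
      // Separator byte, so that ["ab", "c"] and ["a", "bc"] hash differently.
      digest.update((byte) 0);
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : digest.digest()) {
      hex.append(String.format("%02x", b));
    }
    return hex.toString();
  }
}
```

Because nothing else is mixed into the name, any two callers that pass the same
class names in any order get the same result, which is what allows reuse.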




> Better have the cache name purely based on class hash to encourage greater 
> reuse.
> -
>
> Key: TWILL-207
> URL: https://issues.apache.org/jira/browse/TWILL-207
> Project: Apache Twill
>  Issue Type: Improvement
>Reporter: Terence Yim
>Assignee: Terence Yim
> Fix For: 0.11.0
>
>






[GitHub] twill pull request #35: (TWILL-207) Only use list of class names as the cach...

2017-02-28 Thread chtyim
GitHub user chtyim opened a pull request:

https://github.com/apache/twill/pull/35

(TWILL-207) Only use list of class names as the cache name

- Also some indentation changes.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chtyim/twill feature/twill-207

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/twill/pull/35.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #35


commit 542850875e0aecbe34d16bca962186b3d32bfb19
Author: Terence Yim 
Date:   2017-03-01T02:03:45Z

(TWILL-207) Only use list of class names as the cache name

- Also some indentation changes.






[GitHub] twill pull request #34: (TWILL-186) Fix NPE and container size mismatch

2017-02-28 Thread chtyim
Github user chtyim commented on a diff in the pull request:

https://github.com/apache/twill/pull/34#discussion_r103596214
  
--- Diff: 
twill-yarn/src/main/java/org/apache/twill/internal/yarn/AbstractYarnAMClient.java
 ---
@@ -50,12 +51,11 @@
   private static final Logger LOG = 
LoggerFactory.getLogger(AbstractYarnAMClient.class);
 
   // Map from a unique ID to inflight requests
-  private final Multimap containerRequests;
-
-  // List of requests pending to send through allocate call
-  private final List requests;
+  private final Multimap inflightRequests;
+  // Map from a unique ID to pending requests. It is for recording
--- End diff --

Oh. It is for recording the container requests that have yet to be sent to 
the RM. Will update the comment.
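
A minimal sketch of the pending/inflight split described in this comment,
assuming plain strings as stand-ins for container requests. The class
RequestTracker and its method names are hypothetical and do not mirror
AbstractYarnAMClient. The point it shows is that a request which was never sent
must not be reported for removal on the RM side.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: tracks requests not yet sent separately from sent ones. */
final class RequestTracker {
  // Requests recorded locally that have yet to be sent to the resource manager.
  private final Map<String, List<String>> pendingRequests = new HashMap<>();
  // Requests that have been sent and are waiting to be fulfilled.
  private final Map<String, List<String>> inflightRequests = new HashMap<>();

  /** Records a request under the given ID without sending it. */
  void addRequest(String id, String request) {
    pendingRequests.computeIfAbsent(id, k -> new ArrayList<>()).add(request);
  }

  /** Moves every pending request to inflight and returns the requests to send. */
  List<String> sendPending() {
    List<String> toSend = new ArrayList<>();
    for (Map.Entry<String, List<String>> entry : pendingRequests.entrySet()) {
      inflightRequests.computeIfAbsent(entry.getKey(), k -> new ArrayList<>())
          .addAll(entry.getValue());
      toSend.addAll(entry.getValue());
    }
    pendingRequests.clear();
    return toSend;
  }

  /**
   * Completes the given ID and returns only the requests that were actually
   * sent, which are the only ones that may be removed on the RM side.
   */
  List<String> complete(String id) {
    pendingRequests.remove(id); // never sent, so there is nothing to remove remotely
    List<String> sent = inflightRequests.remove(id);
    return sent == null ? new ArrayList<>() : sent;
  }
}
```

Calling complete for an ID that is still pending returns an empty list, so the
caller has no unsent request to hand to a remove call.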




[jira] [Commented] (TWILL-186) ApplicationMaster keeps restarting with NPE in the log.

2017-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TWILL-186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889314#comment-15889314
 ] 

ASF GitHub Bot commented on TWILL-186:
--

Github user chtyim commented on a diff in the pull request:

https://github.com/apache/twill/pull/34#discussion_r103596214
  
--- Diff: 
twill-yarn/src/main/java/org/apache/twill/internal/yarn/AbstractYarnAMClient.java
 ---
@@ -50,12 +51,11 @@
   private static final Logger LOG = 
LoggerFactory.getLogger(AbstractYarnAMClient.class);
 
   // Map from a unique ID to inflight requests
-  private final Multimap containerRequests;
-
-  // List of requests pending to send through allocate call
-  private final List requests;
+  private final Multimap inflightRequests;
+  // Map from a unique ID to pending requests. It is for recording
--- End diff --

Oh. It is for recording the container requests that have yet to be sent to 
the RM. Will update the comment.




[jira] [Commented] (TWILL-186) ApplicationMaster keeps restarting with NPE in the log.

2017-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TWILL-186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889299#comment-15889299
 ] 

ASF GitHub Bot commented on TWILL-186:
--

Github user anwar6953 commented on a diff in the pull request:

https://github.com/apache/twill/pull/34#discussion_r103595352
  
--- Diff: 
twill-yarn/src/main/java/org/apache/twill/internal/yarn/AbstractYarnAMClient.java
 ---
@@ -50,12 +51,11 @@
   private static final Logger LOG = 
LoggerFactory.getLogger(AbstractYarnAMClient.class);
 
   // Map from a unique ID to inflight requests
-  private final Multimap containerRequests;
-
-  // List of requests pending to send through allocate call
-  private final List requests;
+  private final Multimap inflightRequests;
+  // Map from a unique ID to pending requests. It is for recording
--- End diff --

It is for recording what?
(incomplete sentence?)




[GitHub] twill pull request #34: (TWILL-186) Fix NPE and container size mismatch

2017-02-28 Thread anwar6953
Github user anwar6953 commented on a diff in the pull request:

https://github.com/apache/twill/pull/34#discussion_r103595352
  
--- Diff: 
twill-yarn/src/main/java/org/apache/twill/internal/yarn/AbstractYarnAMClient.java
 ---
@@ -50,12 +51,11 @@
   private static final Logger LOG = 
LoggerFactory.getLogger(AbstractYarnAMClient.class);
 
   // Map from a unique ID to inflight requests
-  private final Multimap containerRequests;
-
-  // List of requests pending to send through allocate call
-  private final List requests;
+  private final Multimap inflightRequests;
+  // Map from a unique ID to pending requests. It is for recording
--- End diff --

It is for recording what?
(incomplete sentence?)




[jira] [Assigned] (TWILL-207) Better have the cache name purely based on class hash to encourage greater reuse.

2017-02-28 Thread Terence Yim (JIRA)

 [ 
https://issues.apache.org/jira/browse/TWILL-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Terence Yim reassigned TWILL-207:
-

Assignee: Terence Yim

> Better have the cache name purely based on class hash to encourage greater 
> reuse.
> -
>
> Key: TWILL-207
> URL: https://issues.apache.org/jira/browse/TWILL-207
> Project: Apache Twill
>  Issue Type: Improvement
>Reporter: Terence Yim
>Assignee: Terence Yim
> Fix For: 0.11.0
>
>






[GitHub] twill pull request #34: Feature/twill 186

2017-02-28 Thread chtyim
GitHub user chtyim opened a pull request:

https://github.com/apache/twill/pull/34

Feature/twill 186



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chtyim/twill feature/twill-186

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/twill/pull/34.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #34


commit b64c90b618f065ca01a8c5d36df2535b1344564e
Author: Terence Yim 
Date:   2017-02-28T05:22:32Z

(TWILL-186) Code cleanup for ApplicationMasterService and related classes

- Get rid of the inner loop in the doRun method
  - The inner loop can block the heartbeat thread for too long if there are 
a lot of runnable instances to stop
- Remove unnecessary throwables.propagate
- Remove unnecessary intermediate method
- Better logging
- Request multiple instances in the same request
- Refactor/simplify placement policy related code
- Expose container instanceId instead of parsing it from runId

commit fcfa5becd61ddc2513f29a2be700b3be166d4b0b
Author: Terence Yim 
Date:   2017-02-28T22:43:37Z

(TWILL-186) Guard against YARN returning a container of mismatched size.

- Also make sure we don't remove a container request without adding it
first



