[jira] [Resolved] (YARN-9183) TestAMRMTokens fails

2019-01-09 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe resolved YARN-9183. -- Resolution: Done HDFS-14084 was reverted so this should now be fixed. > TestAMRMTokens fails >

[jira] [Updated] (YARN-9183) TestAMRMTokens fails

2019-01-09 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-9183: - Priority: Blocker (was: Major) I think this is much worse than just a failed unit test. A simple

[jira] [Commented] (YARN-8498) Yarn NodeManager OOM Listener Fails Compilation on Ubuntu 18.04

2019-01-09 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16738362#comment-16738362 ] Jason Lowe commented on YARN-8498: -- Just coming up to speed on this, so apologies if I'm missing

[jira] [Commented] (YARN-6523) Newly retrieved security Tokens are sent as part of each heartbeat to each node from RM which is not desirable in large cluster

2019-01-08 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737610#comment-16737610 ] Jason Lowe commented on YARN-6523: -- Thanks for updating the patch! The unit test failure is unrelated

[jira] [Updated] (YARN-6523) Optimize system credentials sent in node heartbeat responses

2019-01-08 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-6523: - Summary: Optimize system credentials sent in node heartbeat responses (was: Newly retrieved security

[jira] [Updated] (YARN-9183) TestAMRMTokens fails

2019-01-08 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-9183: - Target Version/s: 3.0.4, 3.1.2, 3.3.0, 3.2.1 > TestAMRMTokens fails > > >

[jira] [Commented] (YARN-6523) Newly retrieved security Tokens are sent as part of each heartbeat to each node from RM which is not desirable in large cluster

2019-01-04 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16734513#comment-16734513 ] Jason Lowe commented on YARN-6523: -- The most recently posted patch is identical to patch version 11 which

[jira] [Commented] (YARN-6523) Newly retrieved security Tokens are sent as part of each heartbeat to each node from RM which is not desirable in large cluster

2018-12-20 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725927#comment-16725927 ] Jason Lowe commented on YARN-6523: -- Thanks for updating the patch! I think it is really close now.

[jira] [Commented] (YARN-9129) Ensure flush after printing to stderr plus additional cleanup

2018-12-17 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723124#comment-16723124 ] Jason Lowe commented on YARN-9129: -- bq. we added several fprintf(stderr calls, but the convention in

[jira] [Commented] (YARN-8937) Upgrade Curator version to 2.13.0 to fix ZK tests

2018-12-04 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16709293#comment-16709293 ] Jason Lowe commented on YARN-8937: -- Thanks for the excellent analysis! +1 lgtm. Committing this. >

[jira] [Commented] (YARN-6523) Newly retrieved security Tokens are sent as part of each heartbeat to each node from RM which is not desirable in large cluster

2018-12-04 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708868#comment-16708868 ] Jason Lowe commented on YARN-6523: -- Thanks for updating the patch! If a unit test just added in a patch

[jira] [Commented] (YARN-6523) Newly retrieved security Tokens are sent as part of each heartbeat to each node from RM which is not desirable in large cluster

2018-11-29 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703485#comment-16703485 ] Jason Lowe commented on YARN-6523: -- Thanks for updating the patch! NodeHeartbeatResponse should not take

[jira] [Commented] (YARN-7018) Interface for adding extra behavior to node heartbeats

2018-11-28 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16702036#comment-16702036 ] Jason Lowe commented on YARN-7018: -- Thanks for updating the patch! Overall looks OK for a POC. Would be

[jira] [Commented] (YARN-7086) Release all containers aynchronously

2018-11-28 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16702010#comment-16702010 ] Jason Lowe commented on YARN-7086: -- Sorry for the long delay. It's good to see the performance number

[jira] [Commented] (YARN-8812) Containers fail during creating a symlink which started with hyphen for a resource file

2018-11-28 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16701959#comment-16701959 ] Jason Lowe commented on YARN-8812: -- Thanks for the patch! +1 lgtm. Committing this. > Containers fail

[jira] [Commented] (YARN-6523) Newly retrieved security Tokens are sent as part of each heartbeat to each node from RM which is not desirable in large cluster

2018-11-26 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16699188#comment-16699188 ] Jason Lowe commented on YARN-6523: -- Thanks for updating the patch! The whitespace and ASF warnings are

[jira] [Commented] (YARN-6523) Newly retrieved security Tokens are sent as part of each heartbeat to each node from RM which is not desirable in large cluster

2018-11-13 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685353#comment-16685353 ] Jason Lowe commented on YARN-6523: -- Thanks for updating the patch! The unit test failure is unrelated

[jira] [Commented] (YARN-9014) OCI/squashfs container runtime

2018-11-12 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16684493#comment-16684493 ] Jason Lowe commented on YARN-9014: -- Attached a rough draft of the document. There's quite a bit of

[jira] [Updated] (YARN-9014) OCI/squashfs container runtime

2018-11-12 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-9014: - Attachment: OciSquashfsRuntime.v001.pdf > OCI/squashfs container runtime > --

[jira] [Created] (YARN-9014) OCI/squashfs container runtime

2018-11-12 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-9014: Summary: OCI/squashfs container runtime Key: YARN-9014 URL: https://issues.apache.org/jira/browse/YARN-9014 Project: Hadoop YARN Issue Type: New Feature

[jira] [Commented] (YARN-8951) Defining default queue placement rule in allocations file with create="false" throws an NPE

2018-10-29 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667162#comment-16667162 ] Jason Lowe commented on YARN-8951: -- I am not an expert in FairScheduler or its placement rule policies,

[jira] [Commented] (YARN-6523) Newly retrieved security Tokens are sent as part of each heartbeat to each node from RM which is not desirable in large cluster

2018-10-25 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664285#comment-16664285 ] Jason Lowe commented on YARN-6523: -- Thanks for updating the patch! All PBImpl set methods must call

[jira] [Commented] (YARN-8672) TestContainerManager#testLocalingResourceWhileContainerRunning occasionally times out

2018-10-25 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664229#comment-16664229 ] Jason Lowe commented on YARN-8672: -- Thanks for updating the patch! I'm still skeptical this is going to

[jira] [Commented] (YARN-8672) TestContainerManager#testLocalingResourceWhileContainerRunning occasionally times out

2018-10-23 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661198#comment-16661198 ] Jason Lowe commented on YARN-8672: -- Thanks for the analysis and patch! I believe the patch will fix the

[jira] [Commented] (YARN-8937) TestLeaderElectorService hangs

2018-10-23 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661044#comment-16661044 ] Jason Lowe commented on YARN-8937: -- git bisect shows this was caused by HADOOP-15816. Looks like the ZK

[jira] [Commented] (YARN-8904) TestRMDelegationTokens can fail in testRMDTMasterKeyStateOnRollingMasterKey

2018-10-23 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16660919#comment-16660919 ] Jason Lowe commented on YARN-8904: -- Whenever it says there was a timeout or other error in the fork then

[jira] [Created] (YARN-8937) TestLeaderElectorService hangs

2018-10-23 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-8937: Summary: TestLeaderElectorService hangs Key: YARN-8937 URL: https://issues.apache.org/jira/browse/YARN-8937 Project: Hadoop YARN Issue Type: Bug Affects

[jira] [Commented] (YARN-8928) TestRMAdminService is failing

2018-10-22 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659073#comment-16659073 ] Jason Lowe commented on YARN-8928: -- {noformat} [INFO] Running

[jira] [Created] (YARN-8928) TestRMAdminService is failing

2018-10-22 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-8928: Summary: TestRMAdminService is failing Key: YARN-8928 URL: https://issues.apache.org/jira/browse/YARN-8928 Project: Hadoop YARN Issue Type: Bug Affects

[jira] [Commented] (YARN-8865) RMStateStore contains large number of expired RMDelegationToken

2018-10-19 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657549#comment-16657549 ] Jason Lowe commented on YARN-8865: -- Thanks for updating the patch! The main change looks fine to me. It

[jira] [Assigned] (YARN-8587) Delays are noticed to launch docker container

2018-10-19 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe reassigned YARN-8587: Assignee: Charo Zhang [~Charo Zhang] I added you as a contributor to the YARN project in JIRA.

[jira] [Commented] (YARN-8448) AM HTTPS Support

2018-10-11 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647131#comment-16647131 ] Jason Lowe commented on YARN-8448: -- bq. The cc warning isn't a problem. IMHO the warning should be

[jira] [Commented] (YARN-8861) executorLock is misleading in ContainerLaunch

2018-10-11 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646664#comment-16646664 ] Jason Lowe commented on YARN-8861: -- Thanks for the patch! +1 lgtm. Committing this. > executorLock is

[jira] [Commented] (YARN-8865) RMStateStore contains large number of expired RMDelegationToken

2018-10-10 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645180#comment-16645180 ] Jason Lowe commented on YARN-8865: -- Thanks for the report and patch! Do we have any idea how these are

[jira] [Commented] (YARN-7086) Release all containers aynchronously

2018-10-10 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645159#comment-16645159 ] Jason Lowe commented on YARN-7086: -- Thanks for developing a perf test case! The huge variations in

[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers

2018-10-10 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645096#comment-16645096 ] Jason Lowe commented on YARN-7644: -- Thanks for updating the patch! +1 for patch v6 as well. Committing

[jira] [Commented] (YARN-7018) Interface for adding extra behavior to node heartbeats

2018-10-10 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644967#comment-16644967 ] Jason Lowe commented on YARN-7018: -- Thanks for updating the patch! Interface looks good for now. This

[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers

2018-10-09 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644125#comment-16644125 ] Jason Lowe commented on YARN-7644: -- Thanks for updating the patch! +1 for patch 5 pending Jenkins. > NM

[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers

2018-10-09 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644044#comment-16644044 ] Jason Lowe commented on YARN-7644: -- Thanks for updating the patch! compareAndSetAlreadyLaunched is too

[jira] [Updated] (YARN-8858) CapacityScheduler should respect maximum node resource when per-queue maximum-allocation is being used.

2018-10-09 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-8858: - Target Version/s: 2.10.0, 3.2.0, 2.9.2, 3.0.4, 3.1.2, 2.8.6 (was: 3.2.0, 3.1.2) > CapacityScheduler

[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers

2018-10-09 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643405#comment-16643405 ] Jason Lowe commented on YARN-7644: -- Ah yes, sorry I was confusing reaping a container with killing it.

[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers

2018-10-08 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16642576#comment-16642576 ] Jason Lowe commented on YARN-7644: -- Thanks for the patch! I'm a little concerned about the container

[jira] [Commented] (YARN-8856) TestTimelineReaderWebServicesHBaseStorage tests failing with NoClassDefFoundError

2018-10-08 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16642514#comment-16642514 ] Jason Lowe commented on YARN-8856: -- Sample test failure: {noformat} java.io.IOException: Incorrect

[jira] [Created] (YARN-8856) TestTimelineReaderWebServicesHBaseStorage tests failing with NoClassDefFoundError

2018-10-08 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-8856: Summary: TestTimelineReaderWebServicesHBaseStorage tests failing with NoClassDefFoundError Key: YARN-8856 URL: https://issues.apache.org/jira/browse/YARN-8856 Project:

[jira] [Updated] (YARN-4254) ApplicationAttempt stuck for ever due to UnknownHostException

2018-10-05 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-4254: - Summary: ApplicationAttempt stuck for ever due to UnknownHostException (was: ApplicationAttempt stuck

[jira] [Updated] (YARN-4254) ApplicationAttempt stuck for ever due to UnknownHostexception

2018-10-05 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-4254: - Summary: ApplicationAttempt stuck for ever due to UnknownHostexception (was: ApplicationAttempt stuck

[jira] [Commented] (YARN-4254) ApplicationAttempt stuck for ever due to UnknowHostexception

2018-10-03 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16637565#comment-16637565 ] Jason Lowe commented on YARN-4254: -- Sorry for the delay, was out of the office for a bit and very busy

[jira] [Resolved] (YARN-6091) the AppMaster register failed when use Docker on LinuxContainer

2018-10-03 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe resolved YARN-6091. -- Resolution: Implemented Fix Version/s: 3.1.1 3.2.0 Closing this as fixed by

[jira] [Commented] (YARN-8837) TestNMProxy.testNMProxyRPCRetry Improvement

2018-10-01 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634068#comment-16634068 ] Jason Lowe commented on YARN-8837: -- Thanks for the patch! Wouldn't it be much simpler to have the patch

[jira] [Commented] (YARN-8804) resourceLimits may be wrongly calculated when leaf-queue is blocked in cluster with 3+ level queues

2018-09-24 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626564#comment-16626564 ] Jason Lowe commented on YARN-8804: -- Thanks for updating the patch! +1 lgtm. I'll commit this by

[jira] [Updated] (YARN-6510) Fix profs stat file warning caused by process names that includes parenthesis

2018-09-21 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-6510: - Fix Version/s: 2.8.6 Thanks, [~wilfreds]! I committed this to branch-2.8 as well. > Fix profs stat file

[jira] [Updated] (YARN-8804) resourceLimits may be wrongly calculated when leaf-queue is blocked in cluster with 3+ level queues

2018-09-21 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-8804: - Target Version/s: 2.10.0, 3.2.0, 2.9.2, 3.0.4, 3.1.2, 2.8.6 Thanks for updating the patch! This is a

[jira] [Commented] (YARN-8804) resourceLimits may be wrongly calculated when leaf-queue is blocked in cluster with 3+ level queues

2018-09-20 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16622771#comment-16622771 ] Jason Lowe commented on YARN-8804: -- From the looks of YARN-8513 this appears to be a separate issue.

[jira] [Commented] (YARN-8804) resourceLimits may be wrongly calculated when leaf-queue is blocked in cluster with 3+ level queues

2018-09-20 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16622281#comment-16622281 ] Jason Lowe commented on YARN-8804: -- Thanks for the report and patch! Nice analysis. A naked volatile

[jira] [Commented] (YARN-8783) Improve the documentation for the docker.trusted.registries configuration

2018-09-19 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621225#comment-16621225 ] Jason Lowe commented on YARN-8783: -- [~simonprewo] I added you to the YARN contributor list. Feel free to

[jira] [Commented] (YARN-6456) Allow administrators to set a single ContainerRuntime for all containers

2018-09-19 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621224#comment-16621224 ] Jason Lowe commented on YARN-6456: -- My apologies for the delay -- I missed the last comment being posted.

[jira] [Commented] (YARN-8784) DockerLinuxContainerRuntime prevents access to distributed cache entries on a full disk

2018-09-19 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620747#comment-16620747 ] Jason Lowe commented on YARN-8784: -- Thanks for the patch! We should be fine bind-mounting the full and

[jira] [Commented] (YARN-8648) Container cgroups are leaked when using docker

2018-09-18 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619670#comment-16619670 ] Jason Lowe commented on YARN-8648: -- Thanks, [~billie.rinaldi]! Looks like this is good to go then.

[jira] [Reopened] (YARN-8786) LinuxContainerExecutor fails sporadically in create_local_dirs

2018-09-18 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe reopened YARN-8786: -- This should be left open to track the sporadic failure in creating directories. YARN-8751 may make this

[jira] [Commented] (YARN-8635) Container Resource localization fails if umask is 077

2018-09-18 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619284#comment-16619284 ] Jason Lowe commented on YARN-8635: -- Thanks for the patch! It would be nice to have a short comment

[jira] [Created] (YARN-8784) DockerLinuxContainerRuntime prevents access to distributed cache entries on a full disk

2018-09-17 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-8784: Summary: DockerLinuxContainerRuntime prevents access to distributed cache entries on a full disk Key: YARN-8784 URL: https://issues.apache.org/jira/browse/YARN-8784 Project:

[jira] [Commented] (YARN-8648) Container cgroups are leaked when using docker

2018-09-13 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614019#comment-16614019 ] Jason Lowe commented on YARN-8648: -- Thanks for updating the patch! +1 lgtm. Waiting to hear back from

[jira] [Updated] (YARN-8680) YARN NM: Implement Iterable Abstraction for LocalResourceTracker state

2018-09-13 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-8680: - Summary: YARN NM: Implement Iterable Abstraction for LocalResourceTracker state (was: YARN NM: Implement

[jira] [Commented] (YARN-8680) YARN NM: Implement Iterable Abstraction for LocalResourceTrackerstate

2018-09-13 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613878#comment-16613878 ] Jason Lowe commented on YARN-8680: -- Thanks for updating the patch! +1 lgtm. Committing this. > YARN

[jira] [Commented] (YARN-6456) Allow administrators to set a single ContainerRuntime for all containers

2018-09-13 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613676#comment-16613676 ] Jason Lowe commented on YARN-6456: -- Thanks for updating the patch! +1 lgtm. I don't see the allowed

[jira] [Commented] (YARN-8648) Container cgroups are leaked when using docker

2018-09-12 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612788#comment-16612788 ] Jason Lowe commented on YARN-8648: -- Thanks for updating the patch! It seems a bit awkward that

[jira] [Commented] (YARN-6456) Allow administrators to set a single ContainerRuntime for all containers

2018-09-12 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612220#comment-16612220 ] Jason Lowe commented on YARN-6456: -- bq. My initial thought was that we already validate this at runtime,

[jira] [Commented] (YARN-7086) Release all containers aynchronously

2018-09-11 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610823#comment-16610823 ] Jason Lowe commented on YARN-7086: -- I'm worried that we're delving into the classic pitfall of optimizing

[jira] [Commented] (YARN-6456) Allow administrators to set a single ContainerRuntime for all containers

2018-09-11 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610814#comment-16610814 ] Jason Lowe commented on YARN-6456: -- Thanks for the patch! Looks good overall, just some small

[jira] [Commented] (YARN-7018) Interface for adding extra behavior to node heartbeats

2018-09-10 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609832#comment-16609832 ] Jason Lowe commented on YARN-7018: -- Originally I was thinking this could be outside of the scheduler,

[jira] [Commented] (YARN-8680) YARN NM: Implement Iterable Abstraction for LocalResourceTrackerstate

2018-09-10 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609821#comment-16609821 ] Jason Lowe commented on YARN-8680: -- Thanks for updating the patch! In loadUserLocalizedResources for

[jira] [Commented] (YARN-8648) Container cgroups are leaked when using docker

2018-09-10 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609735#comment-16609735 ] Jason Lowe commented on YARN-8648: -- Thanks for updating the patch! Should DockerRmCommand take the

[jira] [Assigned] (YARN-4961) Wrapper for leveldb DB to aid in handling database exceptions

2018-09-07 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe reassigned YARN-4961: Assignee: Pradeep Ambati Yes, exactly. Thanks for picking this up! Currently it's very fragile

[jira] [Commented] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy

2018-09-06 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605803#comment-16605803 ] Jason Lowe commented on YARN-8751: -- A bad container executor or config file is pretty catastrophic since

[jira] [Commented] (YARN-8730) TestRMWebServiceAppsNodelabel#testAppsRunning fails

2018-09-05 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605013#comment-16605013 ] Jason Lowe commented on YARN-8730: -- Thanks for posting the test-patch results! I agree the test failures

[jira] [Commented] (YARN-8680) YARN NM: Implement Iterable Abstraction for LocalResourceTrackerstate

2018-09-04 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16603616#comment-16603616 ] Jason Lowe commented on YARN-8680: -- Thanks for updating the patch! seekPastPrefix needs to handle

[jira] [Commented] (YARN-8730) TestRMWebServiceAppsNodelabel#testAppsRunning fails

2018-09-04 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16603517#comment-16603517 ] Jason Lowe commented on YARN-8730: -- Thanks for the patch! The risk of creating a new resource on every

[jira] [Commented] (YARN-8051) TestRMEmbeddedElector#testCallbackSynchronization is flakey

2018-08-30 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597748#comment-16597748 ] Jason Lowe commented on YARN-8051: -- Thanks for the review and commit, [~eepayne]! >

[jira] [Commented] (YARN-8695) ERROR: Container complete event for unknown container id

2018-08-30 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597738#comment-16597738 ] Jason Lowe commented on YARN-8695: -- Here's the relevant portion of the log: {noformat} 2018-08-30

[jira] [Commented] (YARN-8680) YARN NM: Implement Iterable Abstraction for LocalResourceTrackerstate

2018-08-30 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597623#comment-16597623 ] Jason Lowe commented on YARN-8680: -- Thanks for the patch! It would be cleaner to have a

[jira] [Commented] (YARN-8703) Localized resource may leak on disk if container is killed while localizing

2018-08-30 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597493#comment-16597493 ] Jason Lowe commented on YARN-8703: -- Ah, sorry. I mistakenly used the wrong log message that I would

[jira] [Commented] (YARN-8730) TestRMWebServiceAppsNodelabel#testAppsRunning fails

2018-08-29 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596921#comment-16596921 ] Jason Lowe commented on YARN-8730: -- trunk and other releases ahead of 2.8 do not do this since they

[jira] [Commented] (YARN-7619) Max AM Resource value in Capacity Scheduler UI has to be refreshed for every user

2018-08-29 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596922#comment-16596922 ] Jason Lowe commented on YARN-7619: -- The branch-2.8 version of this patch unfortunately affected the 2.8

[jira] [Updated] (YARN-8730) TestRMWebServiceAppsNodelabel#testAppsRunning fails

2018-08-29 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-8730: - Affects Version/s: 2.8.4 > TestRMWebServiceAppsNodelabel#testAppsRunning fails >

[jira] [Commented] (YARN-8730) TestRMWebServiceAppsNodelabel#testAppsRunning fails

2018-08-29 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596911#comment-16596911 ] Jason Lowe commented on YARN-8730: -- git bisect narrows this down to YARN-7619. A "res" field was added

[jira] [Created] (YARN-8730) TestRMWebServiceAppsNodelabel#testAppsRunning fails

2018-08-29 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-8730: Summary: TestRMWebServiceAppsNodelabel#testAppsRunning fails Key: YARN-8730 URL: https://issues.apache.org/jira/browse/YARN-8730 Project: Hadoop YARN Issue Type:

[jira] [Commented] (YARN-8648) Container cgroups are leaked when using docker

2018-08-29 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596699#comment-16596699 ] Jason Lowe commented on YARN-8648: -- Thanks for the patch! Why was the postComplete call moved in

[jira] [Updated] (YARN-8051) TestRMEmbeddedElector#testCallbackSynchronization is flakey

2018-08-24 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-8051: - Attachment: YARN-8051-branch-2.002.patch > TestRMEmbeddedElector#testCallbackSynchronization is flakey >

[jira] [Reopened] (YARN-8051) TestRMEmbeddedElector#testCallbackSynchronization is flakey

2018-08-24 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe reopened YARN-8051: -- Reopening to get a Jenkins run on the branch-2 patch. > TestRMEmbeddedElector#testCallbackSynchronization

[jira] [Updated] (YARN-8051) TestRMEmbeddedElector#testCallbackSynchronization is flakey

2018-08-24 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-8051: - Affects Version/s: 2.10.0 2.9.1 2.8.4

[jira] [Commented] (YARN-8638) Allow linux container runtimes to be pluggable

2018-08-23 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590572#comment-16590572 ] Jason Lowe commented on YARN-8638: -- bq. However, it also can cause security issues, if configuration can

[jira] [Commented] (YARN-7086) Release all containers aynchronously

2018-08-23 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590556#comment-16590556 ] Jason Lowe commented on YARN-7086: -- bq. I assume you are referring the lock inside

[jira] [Comment Edited] (YARN-8638) Allow linux container runtimes to be pluggable

2018-08-23 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590416#comment-16590416 ] Jason Lowe edited comment on YARN-8638 at 8/23/18 4:01 PM: --- bq. It would be

[jira] [Commented] (YARN-8638) Allow linux container runtimes to be pluggable

2018-08-23 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590416#comment-16590416 ] Jason Lowe commented on YARN-8638: -- Unless I'm missing something, the whole point of a pluggable

[jira] [Commented] (YARN-8703) Localized resource may leak on disk if container is killed while localizing

2018-08-23 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590347#comment-16590347 ] Jason Lowe commented on YARN-8703: -- The ResourceLocalizedEvent has a local path, so it looks like we can

[jira] [Updated] (YARN-8649) NPE in localizer hearbeat processing if a container is killed while localizing

2018-08-23 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-8649: - Summary: NPE in localizer hearbeat processing if a container is killed while localizing (was: Similar as

[jira] [Commented] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat

2018-08-23 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590302#comment-16590302 ] Jason Lowe commented on YARN-8649: -- Thanks for updating the patch! +1 lgtm. While reviewing it looks

[jira] [Created] (YARN-8703) Localized resource may leak on disk if container is killed while localizing

2018-08-23 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-8703: Summary: Localized resource may leak on disk if container is killed while localizing Key: YARN-8703 URL: https://issues.apache.org/jira/browse/YARN-8703 Project: Hadoop YARN

[jira] [Commented] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat

2018-08-22 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589432#comment-16589432 ] Jason Lowe commented on YARN-8649: -- Thanks for updating the patch! Logic looks good overall, but I have

[jira] [Commented] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat

2018-08-21 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587944#comment-16587944 ] Jason Lowe commented on YARN-8649: -- Thanks for the analysis and patch, [~xiaoheipangzi]! Is ignoring the
