[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator

2014-08-04 Thread JiankunLiu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084438#comment-14084438
 ] 

JiankunLiu commented on YARN-1021:
--

Hi Wei Yan, I hit the following exception when I run this command:
bin/slsrun.sh --input-rumen=sample-data/2jobs2min-rumen-jh.json --output-dir=sample_output
Exception in thread "pool-3-thread-1" java.lang.NullPointerException
    at org.apache.hadoop.yarn.sls.appmaster.AMSimulator.registerAM(AMSimulator.java:300)
    at org.apache.hadoop.yarn.sls.appmaster.AMSimulator.firstStep(AMSimulator.java:141)
    at org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator.firstStep(MRAMSimulator.java:146)
    at org.apache.hadoop.yarn.sls.scheduler.TaskRunner$Task.run(TaskRunner.java:84)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Can you do me a favor? Thank you very much.

 Yarn Scheduler Load Simulator
 -

 Key: YARN-1021
 URL: https://issues.apache.org/jira/browse/YARN-1021
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Reporter: Wei Yan
Assignee: Wei Yan
 Fix For: 2.3.0

 Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, 
 YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
 YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
 YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
 YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf


 The Yarn Scheduler is a fertile area of interest, with different 
 implementations, e.g., the Fifo, Capacity, and Fair schedulers. Meanwhile, 
 several optimizations have been made to improve scheduler performance for 
 different scenarios and workloads. Each scheduler algorithm has its own set 
 of features and drives scheduling decisions by many factors, such as 
 fairness, capacity guarantees, resource availability, etc. It is very 
 important to evaluate a scheduler algorithm well before deploying it in a 
 production cluster. Unfortunately, it is currently non-trivial to evaluate a 
 scheduling algorithm: evaluating in a real cluster is time- and 
 cost-consuming, and it is also very hard to find a large-enough cluster. 
 Hence, a simulator which can predict how well a scheduler algorithm performs 
 for some specific workload would be quite useful.
 We want to build a Scheduler Load Simulator to simulate large-scale Yarn 
 clusters and application loads on a single machine. This would be invaluable 
 in furthering Yarn by providing a tool for researchers and developers to 
 prototype new scheduler features and predict their behavior and performance 
 with a reasonable amount of confidence, thereby aiding rapid innovation.
 The simulator will exercise the real Yarn ResourceManager, removing the 
 network factor by simulating NodeManagers and ApplicationMasters and by 
 handling and dispatching NM/AM heartbeat events from within the same JVM.
 To keep track of scheduler behavior and performance, a scheduler wrapper 
 will wrap the real scheduler.
 The simulator will produce real-time metrics while executing, including:
 * Resource usage for the whole cluster and for each queue, which can be 
 utilized to configure cluster and queue capacity.
 * The detailed application execution trace (recorded in relation to simulated 
 time), which can be analyzed to understand/validate the scheduler behavior 
 (individual jobs' turnaround time, throughput, fairness, capacity guarantees, 
 etc.).
 * Several key metrics of the scheduler algorithm, such as the time cost of 
 each scheduler operation (allocate, handle, etc.), which can be utilized by 
 Hadoop developers to find hot spots and scalability limits.
 The simulator will provide real-time charts showing the behavior of the 
 scheduler and its performance.
 A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, 
 showing how to use the simulator to simulate the Fair Scheduler and the 
 Capacity Scheduler.
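For illustration, a minimal, hypothetical sketch (not actual SLS code) of the single-JVM idea above, where each simulated NM is a periodic task in a shared pool and heartbeats are dispatched in-process rather than over the network:
{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: four simulated NMs "heartbeat" once a second
// from one thread pool inside a single JVM.
public class HeartbeatLoopSketch {
  public static void main(String[] args) throws InterruptedException {
    ScheduledExecutorService pool = Executors.newScheduledThreadPool(4);
    for (int i = 0; i < 4; i++) {
      final int nodeId = i;
      pool.scheduleAtFixedRate(
          () -> System.out.println("NM-" + nodeId + " heartbeat"),
          0, 1000, TimeUnit.MILLISECONDS);  // simulated heartbeat interval
    }
    Thread.sleep(5000);  // let a few rounds run, then stop the simulation
    pool.shutdownNow();
  }
}
{code}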





[jira] [Created] (YARN-2381) CLONE - Yarn Scheduler Load Simulator

2014-08-04 Thread JiankunLiu (JIRA)
JiankunLiu created YARN-2381:


 Summary: CLONE - Yarn Scheduler Load Simulator
 Key: YARN-2381
 URL: https://issues.apache.org/jira/browse/YARN-2381
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Reporter: JiankunLiu
Assignee: Wei Yan
 Fix For: 2.3.0


The Yarn Scheduler is a fertile area of interest, with different 
implementations, e.g., the Fifo, Capacity, and Fair schedulers. Meanwhile, 
several optimizations have been made to improve scheduler performance for 
different scenarios and workloads. Each scheduler algorithm has its own set of 
features and drives scheduling decisions by many factors, such as fairness, 
capacity guarantees, resource availability, etc. It is very important to 
evaluate a scheduler algorithm well before deploying it in a production 
cluster. Unfortunately, it is currently non-trivial to evaluate a scheduling 
algorithm: evaluating in a real cluster is time- and cost-consuming, and it is 
also very hard to find a large-enough cluster. Hence, a simulator which can 
predict how well a scheduler algorithm performs for some specific workload 
would be quite useful.

We want to build a Scheduler Load Simulator to simulate large-scale Yarn 
clusters and application loads on a single machine. This would be invaluable in 
furthering Yarn by providing a tool for researchers and developers to prototype 
new scheduler features and predict their behavior and performance with a 
reasonable amount of confidence, thereby aiding rapid innovation.

The simulator will exercise the real Yarn ResourceManager, removing the network 
factor by simulating NodeManagers and ApplicationMasters and by handling and 
dispatching NM/AM heartbeat events from within the same JVM.

To keep track of scheduler behavior and performance, a scheduler wrapper 
will wrap the real scheduler.

The simulator will produce real-time metrics while executing, including:

* Resource usage for the whole cluster and for each queue, which can be 
utilized to configure cluster and queue capacity.
* The detailed application execution trace (recorded in relation to simulated 
time), which can be analyzed to understand/validate the scheduler behavior 
(individual jobs' turnaround time, throughput, fairness, capacity guarantees, 
etc.).
* Several key metrics of the scheduler algorithm, such as the time cost of each 
scheduler operation (allocate, handle, etc.), which can be utilized by Hadoop 
developers to find hot spots and scalability limits.

The simulator will provide real-time charts showing the behavior of the 
scheduler and its performance.

A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, 
showing how to use the simulator to simulate the Fair Scheduler and the 
Capacity Scheduler.





[jira] [Commented] (YARN-2136) RMStateStore can explicitly handle store/update events when fenced

2014-08-04 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084473#comment-14084473
 ] 

Varun Saxena commented on YARN-2136:


Thanks [~jianhe]. I will work on it. 
Here is what I plan to do:
1. Add a FENCED state to the state machine.
2. When fencing happens, switch the state to FENCED. The RM will switch to 
standby as before.
3. Do nothing for Store and Update events while in the FENCED state.
4. Reset the state to DEFAULT when this RM switches back to Active.
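For illustration, a minimal sketch of steps 1-4 (the class, enum, and method names are placeholders, not the actual RMStateStore code):
{code}
// Illustrative sketch of the FENCED handling described above: store/update
// events become no-ops while fenced, and the state resets on re-activation.
public class FencedStoreSketch {
  enum StoreState { DEFAULT, FENCED }

  private volatile StoreState state = StoreState.DEFAULT;

  void onFenced()       { state = StoreState.FENCED;  }  // step 2
  void onBecomeActive() { state = StoreState.DEFAULT; }  // step 4

  void handleStoreEvent(Runnable zkOperation) {
    if (state == StoreState.FENCED) {
      return;           // step 3: skip further ZK operations while fenced
    }
    zkOperation.run();  // normal store/update path
  }
}
{code}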

 RMStateStore can explicitly handle store/update events when fenced
 --

 Key: YARN-2136
 URL: https://issues.apache.org/jira/browse/YARN-2136
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He

 RMStateStore can choose to handle/ignore store/update events upfront instead 
 of invoking more ZK operations if the state store is in the fenced state. 





[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor

2014-08-04 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084540#comment-14084540
 ] 

Remus Rusanu commented on YARN-2198:


 The NM never explicitly reads the stdout/stderr from the container; the 
 streams are redirected today to their own log files as the user's code 
 dictates (e.g., on Linux, bash -c "user-command.sh 1>stderr 2>stdout"). Do we 
 need to do this in the WintuilsProcessStubExecutor?

The existing org.apache.hadoop.util.Shell.runCommand() does read the stdout and 
stderr of the launched process. The WSCE should preserve this. For one, the 
container redirection occurs not in the launched process per se, but in the 
shell wrapper script, so any stdout/stderr output that occurs in the wrapper 
script *before* the redirect would be lost. But more importantly, the NM has 
another user of launchers: the localizer. This runs w/o a shell wrapper script 
and has no stdout/stderr redirect. So, all in all, I think there are valid 
reasons to capture the stdout/stderr. It is one of my goals to make 
WintuilsProcessStubExecutor reuse more of the existing 
Shell.ShellCommandExecutor implementation, rather than implement its own 
execute(). 

  (1) restricting users who can launch the special service and (2) restricting 
 callers who can invoke the RPCs. So, this is done by the combination of the 
 OS doing the authentication and the authorization being explicitly done by 
 the service using the allowed list. Right?

That is correct. There are 3 access checks that occur:

 - task.c: ValidateImpersonateAccessCheck checks that the desired runas account 
is allowed to be impersonated (this includes a check against the 
groups/accounts explicitly denied). This covers the "restricting users who can 
launch the special service" part.
 - service.c: RpcAuthorizeCallback checks that the caller (NM) has permission 
to invoke the RPC service. This covers the "restricting callers who can invoke 
the RPCs" part.
 - client.c: RpcCall_TaskCreateAsUser has an implicit access check in which 
the NM validates that the RPC call is serviced by an authorized service (by 
specifying RPC_C_QOS_CAPABILITIES_MUTUAL_AUTH and explicitly requesting the 
desired SID via pLocalSystemSid). This covers the 'mutual trust' part on the 
RPC call boundary (i.e., it is the complement of "restricting callers who can 
invoke the RPCs").

I will address the other comments with a newer patch, ETA Wednesday.
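For illustration, a rough sketch of the kind of reuse described above, built on the existing Shell.ShellCommandExecutor (the wrapper class below is a hypothetical stand-in, not the actual WintuilsProcessStubExecutor):
{code}
import org.apache.hadoop.util.Shell.ExitCodeException;
import org.apache.hadoop.util.Shell.ShellCommandExecutor;

// Hypothetical stand-in: run a launcher command through the existing
// ShellCommandExecutor, which reads the child's stdout/stderr for us,
// instead of re-implementing execute().
public class StubExecutorSketch {
  public int launch(String[] command) throws Exception {
    ShellCommandExecutor shexec = new ShellCommandExecutor(command);
    try {
      shexec.execute();  // runs the process; stdout is buffered internally
    } catch (ExitCodeException e) {
      // non-zero exit: captured output is still available for diagnosis
      System.err.println("launch failed, output: " + shexec.getOutput());
      throw e;
    }
    return shexec.getExitCode();
  }
}
{code}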

 Remove the need to run NodeManager as privileged account for Windows Secure 
 Container Executor
 --

 Key: YARN-2198
 URL: https://issues.apache.org/jira/browse/YARN-2198
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: YARN-2198.1.patch, YARN-2198.2.patch


 YARN-1972 introduces a Secure Windows Container Executor. However, this 
 executor requires the process launching the container to be LocalSystem or 
 a member of the local Administrators group. Since the process in question 
 is the NodeManager, the requirement translates to the entire NM running as a 
 privileged account, a very large surface area to review and protect.
 This proposal is to move the privileged operations into a dedicated NT 
 service. The NM can run as a low-privilege account and communicate with the 
 privileged NT service when it needs to launch a container. This would reduce 
 the surface exposed to the high privileges. 
 There has to exist a secure, authenticated and authorized channel of 
 communication between the NM and the privileged NT service. Possible 
 alternatives are a new TCP endpoint, Java RPC, etc. My proposal, though, 
 would be to use Windows LPC (Local Procedure Calls), a Windows 
 platform-specific inter-process communication channel that satisfies all 
 requirements and is easy to deploy. The privileged NT service would register 
 and listen on an LPC port (NtCreatePort, NtListenPort). The NM would use JNI 
 to interop with libwinutils, which would host the LPC client code. The client 
 would connect to the LPC port (NtConnectPort) and send a message requesting a 
 container launch (NtRequestWaitReplyPort). LPC provides authentication, and 
 the privileged NT service can use the authorization API (AuthZ) to validate 
 the caller.





[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor

2014-08-04 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084547#comment-14084547
 ] 

Remus Rusanu commented on YARN-2198:


 How is the service's port configured?

The LRPC protocol has no port in the sense of TCP/UDP ports; it uses shared 
memory as a transport. The equivalent is the LRPC endpoint name, which in the 
patch is declared in the IDL file as 
{code}endpoint("ncalrpc:[hdpwinutilsvc]"){code}. 
Using hardcoded (i.e., known by both client and server) endpoint names for 
local LRPC is common practice.

 Remove the need to run NodeManager as privileged account for Windows Secure 
 Container Executor
 --

 Key: YARN-2198
 URL: https://issues.apache.org/jira/browse/YARN-2198
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: YARN-2198.1.patch, YARN-2198.2.patch


 YARN-1972 introduces a Secure Windows Container Executor. However, this 
 executor requires the process launching the container to be LocalSystem or 
 a member of the local Administrators group. Since the process in question 
 is the NodeManager, the requirement translates to the entire NM running as a 
 privileged account, a very large surface area to review and protect.
 This proposal is to move the privileged operations into a dedicated NT 
 service. The NM can run as a low-privilege account and communicate with the 
 privileged NT service when it needs to launch a container. This would reduce 
 the surface exposed to the high privileges. 
 There has to exist a secure, authenticated and authorized channel of 
 communication between the NM and the privileged NT service. Possible 
 alternatives are a new TCP endpoint, Java RPC, etc. My proposal, though, 
 would be to use Windows LPC (Local Procedure Calls), a Windows 
 platform-specific inter-process communication channel that satisfies all 
 requirements and is easy to deploy. The privileged NT service would register 
 and listen on an LPC port (NtCreatePort, NtListenPort). The NM would use JNI 
 to interop with libwinutils, which would host the LPC client code. The client 
 would connect to the LPC port (NtConnectPort) and send a message requesting a 
 container launch (NtRequestWaitReplyPort). LPC provides authentication, and 
 the privileged NT service can use the authorization API (AuthZ) to validate 
 the caller.





[jira] [Commented] (YARN-2370) Fix comment in o.a.h.y.server.resourcemanager.schedulerAppSchedulingInfo

2014-08-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084550#comment-14084550
 ] 

Hudson commented on YARN-2370:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #633 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/633/])
YARN-2370. Fix comment in 
o.a.h.y.server.resourcemanager.schedulerAppSchedulingInfo (Contributed by Wenwu 
Peng) (junping_du: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1615469)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java


 Fix comment in o.a.h.y.server.resourcemanager.schedulerAppSchedulingInfo
 

 Key: YARN-2370
 URL: https://issues.apache.org/jira/browse/YARN-2370
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.0.0
Reporter: Wenwu Peng
Assignee: Wenwu Peng
Priority: Trivial
  Labels: newbie
 Fix For: 2.6.0

 Attachments: YARN-2370.0.patch


 In the method allocateOffSwitch of AppSchedulingInfo.java, only the update of 
 the OffRack request is invoked, so the comment should be "Update cloned 
 OffRack requests for recovery", not "Update cloned RackLocal and OffRack 
 requests for recovery":
 {code} 
 // Update cloned RackLocal and OffRack requests for recovery
  resourceRequests.add(cloneResourceRequest(offSwitchRequest));
 {code}





[jira] [Commented] (YARN-1354) Recover applications upon nodemanager restart

2014-08-04 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084554#comment-14084554
 ] 

Junping Du commented on YARN-1354:
--

bq. It's possible if they explicitly handle it themselves (e.g.: write out 
schema versions and switch on it during load), however they don't typically do 
that. I agree that YARN-668 is the proper place to discuss how best to migrate 
Credentials and Tokens to a more manageable infrastructure for supporting 
upgrades.
Agreed. We should have a built-in mechanism to handle serialization and 
deserialization across versions. Let's discuss more on YARN-668.
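As an aside, a minimal sketch of the "write out schema versions and switch on it during load" idea quoted above (the class and record layout here are hypothetical):
{code}
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical sketch: the saved blob starts with a schema version int;
// the loader switches on it so old state stays readable after an upgrade.
public class VersionedStateLoader {
  String load(DataInputStream in) throws IOException {
    int version = in.readInt();  // written first at save time
    switch (version) {
      case 1:  return upgradeFromV1(in.readUTF());  // migrate old layout
      case 2:  return in.readUTF();                 // current layout
      default: throw new IOException("Unknown schema version " + version);
    }
  }

  private String upgradeFromV1(String old) {  // stand-in migration step
    return old.trim();
  }
}
{code}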

bq. Therefore I updated the reconnected node path to forward the applications 
running on the node and have the RM inform the NM to finish the application if 
it is no longer active.
Sounds good. The newly added code and tests make sense to me.

+1. Will commit it shortly. 

 Recover applications upon nodemanager restart
 -

 Key: YARN-1354
 URL: https://issues.apache.org/jira/browse/YARN-1354
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-1354-v1.patch, 
 YARN-1354-v2-and-YARN-1987-and-YARN-1362.patch, YARN-1354-v3.patch, 
 YARN-1354-v4.patch, YARN-1354-v5.patch, YARN-1354-v6.patch


 The set of active applications in the nodemanager context needs to be 
 recovered for work-preserving nodemanager restart





[jira] [Commented] (YARN-1354) Recover applications upon nodemanager restart

2014-08-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084647#comment-14084647
 ] 

Hudson commented on YARN-1354:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6007 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6007/])
YARN-1354. Recover applications upon nodemanager restart. (Contributed by Jason 
Lowe) (junping_du: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1615550)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/proto/yarn_server_nodemanager_recovery.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/TestContainer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeReconnectEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationCleanup.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java


 Recover applications upon nodemanager restart
 -

 Key: YARN-1354
 URL: https://issues.apache.org/jira/browse/YARN-1354
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-1354-v1.patch, 
 YARN-1354-v2-and-YARN-1987-and-YARN-1362.patch, YARN-1354-v3.patch, 
 YARN-1354-v4.patch, YARN-1354-v5.patch, YARN-1354-v6.patch


 The set of active applications in the nodemanager context needs to be 
 recovered for work-preserving nodemanager restart





[jira] [Commented] (YARN-2370) Fix comment in o.a.h.y.server.resourcemanager.schedulerAppSchedulingInfo

2014-08-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084666#comment-14084666
 ] 

Hudson commented on YARN-2370:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1827 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1827/])
YARN-2370. Fix comment in 
o.a.h.y.server.resourcemanager.schedulerAppSchedulingInfo (Contributed by Wenwu 
Peng) (junping_du: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1615469)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java


 Fix comment in o.a.h.y.server.resourcemanager.schedulerAppSchedulingInfo
 

 Key: YARN-2370
 URL: https://issues.apache.org/jira/browse/YARN-2370
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.0.0
Reporter: Wenwu Peng
Assignee: Wenwu Peng
Priority: Trivial
  Labels: newbie
 Fix For: 2.6.0

 Attachments: YARN-2370.0.patch


 In the method allocateOffSwitch of AppSchedulingInfo.java, only the update of 
 the OffRack request is invoked, so the comment should be "Update cloned 
 OffRack requests for recovery", not "Update cloned RackLocal and OffRack 
 requests for recovery":
 {code} 
 // Update cloned RackLocal and OffRack requests for recovery
  resourceRequests.add(cloneResourceRequest(offSwitchRequest));
 {code}





[jira] [Commented] (YARN-2033) Investigate merging generic-history into the Timeline Store

2014-08-04 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084670#comment-14084670
 ] 

Junping Du commented on YARN-2033:
--

Thanks [~zjshen] for uploading a proposal and keeping the patches updated! 
The proposal sounds like a good direction. I just took a glance at the latest 
patch, which looks pretty big to me and mixes code refactoring (moving the 
location of TimelineClient, etc.) with side logic (e.g., adding originalURL to 
ApplicationAttemptReport). Can we break this patch into several pieces and 
create a sub-JIRA for each one? That would make it much easier to review.

 Investigate merging generic-history into the Timeline Store
 ---

 Key: YARN-2033
 URL: https://issues.apache.org/jira/browse/YARN-2033
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen
 Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, 
 YARN-2033.1.patch, YARN-2033.2.patch, YARN-2033.3.patch, YARN-2033.4.patch, 
 YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch, 
 YARN-2033_ALL.3.patch, YARN-2033_ALL.4.patch


 Having two different stores isn't amenable to generic insights on what's 
 happening with applications. This is to investigate porting generic-history 
 into the Timeline Store.
 One goal is to try and retain most of the client side interfaces as close to 
 what we have today.





[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator

2014-08-04 Thread JiankunLiu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084689#comment-14084689
 ] 

JiankunLiu commented on YARN-1021:
--

I found that there is no exception on hadoop-2.3.0, but I hit the following 
exception on hadoop-2.4.0:
Exception in thread "pool-3-thread-1" java.lang.NullPointerException
    at org.apache.hadoop.yarn.sls.appmaster.AMSimulator.registerAM(AMSimulator.java:300)
    at org.apache.hadoop.yarn.sls.appmaster.AMSimulator.firstStep(AMSimulator.java:141)
    at org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator.firstStep(MRAMSimulator.java:146)
    at org.apache.hadoop.yarn.sls.scheduler.TaskRunner$Task.run(TaskRunner.java:84)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
What should I do?

 Yarn Scheduler Load Simulator
 -

 Key: YARN-1021
 URL: https://issues.apache.org/jira/browse/YARN-1021
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Reporter: Wei Yan
Assignee: Wei Yan
 Fix For: 2.3.0

 Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, 
 YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
 YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
 YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
 YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf


 The Yarn Scheduler is a fertile area of interest, with different 
 implementations, e.g., the Fifo, Capacity, and Fair schedulers. Meanwhile, 
 several optimizations have been made to improve scheduler performance for 
 different scenarios and workloads. Each scheduler algorithm has its own set 
 of features and drives scheduling decisions by many factors, such as 
 fairness, capacity guarantees, resource availability, etc. It is very 
 important to evaluate a scheduler algorithm well before deploying it in a 
 production cluster. Unfortunately, it is currently non-trivial to evaluate a 
 scheduling algorithm: evaluating in a real cluster is time- and 
 cost-consuming, and it is also very hard to find a large-enough cluster. 
 Hence, a simulator which can predict how well a scheduler algorithm performs 
 for some specific workload would be quite useful.
 We want to build a Scheduler Load Simulator to simulate large-scale Yarn 
 clusters and application loads on a single machine. This would be invaluable 
 in furthering Yarn by providing a tool for researchers and developers to 
 prototype new scheduler features and predict their behavior and performance 
 with a reasonable amount of confidence, thereby aiding rapid innovation.
 The simulator will exercise the real Yarn ResourceManager, removing the 
 network factor by simulating NodeManagers and ApplicationMasters and by 
 handling and dispatching NM/AM heartbeat events from within the same JVM.
 To keep track of scheduler behavior and performance, a scheduler wrapper 
 will wrap the real scheduler.
 The simulator will produce real-time metrics while executing, including:
 * Resource usage for the whole cluster and for each queue, which can be 
 utilized to configure cluster and queue capacity.
 * The detailed application execution trace (recorded in relation to simulated 
 time), which can be analyzed to understand/validate the scheduler behavior 
 (individual jobs' turnaround time, throughput, fairness, capacity guarantees, 
 etc.).
 * Several key metrics of the scheduler algorithm, such as the time cost of 
 each scheduler operation (allocate, handle, etc.), which can be utilized by 
 Hadoop developers to find hot spots and scalability limits.
 The simulator will provide real-time charts showing the behavior of the 
 scheduler and its performance.
 A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, 
 showing how to use the simulator to simulate the Fair Scheduler and the 
 Capacity Scheduler.





[jira] [Commented] (YARN-2370) Fix comment in o.a.h.y.server.resourcemanager.schedulerAppSchedulingInfo

2014-08-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084740#comment-14084740
 ] 

Hudson commented on YARN-2370:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1852 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1852/])
YARN-2370. Fix comment in 
o.a.h.y.server.resourcemanager.schedulerAppSchedulingInfo (Contributed by Wenwu 
Peng) (junping_du: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1615469)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java


 Fix comment in o.a.h.y.server.resourcemanager.schedulerAppSchedulingInfo
 

 Key: YARN-2370
 URL: https://issues.apache.org/jira/browse/YARN-2370
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.0.0
Reporter: Wenwu Peng
Assignee: Wenwu Peng
Priority: Trivial
  Labels: newbie
 Fix For: 2.6.0

 Attachments: YARN-2370.0.patch


 In the method allocateOffSwitch of AppSchedulingInfo.java, only the update of 
 the OffRack request is invoked, so the comment should be "Update cloned 
 OffRack requests for recovery", not "Update cloned RackLocal and OffRack 
 requests for recovery":
 {code} 
 // Update cloned RackLocal and OffRack requests for recovery
  resourceRequests.add(cloneResourceRequest(offSwitchRequest));
 {code}





[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator

2014-08-04 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084797#comment-14084797
 ] 

Wei Yan commented on YARN-1021:
---

[~Jackliu91], yes, the simulator hits an exception on hadoop-2.4.0; the fix is 
provided in YARN-1726. The upcoming hadoop-2.5.0 release will include that 
patch, and the simulator should run well.

 Yarn Scheduler Load Simulator
 -

 Key: YARN-1021
 URL: https://issues.apache.org/jira/browse/YARN-1021
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Reporter: Wei Yan
Assignee: Wei Yan
 Fix For: 2.3.0

 Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, 
 YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
 YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
 YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
 YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf


 The Yarn Scheduler is a fertile area of interest, with different 
 implementations, e.g., the Fifo, Capacity, and Fair schedulers. Meanwhile, 
 several optimizations have been made to improve scheduler performance for 
 different scenarios and workloads. Each scheduler algorithm has its own set 
 of features and drives scheduling decisions by many factors, such as 
 fairness, capacity guarantees, resource availability, etc. It is very 
 important to evaluate a scheduler algorithm well before deploying it in a 
 production cluster. Unfortunately, it is currently non-trivial to evaluate a 
 scheduling algorithm: evaluating in a real cluster is time- and 
 cost-consuming, and it is also very hard to find a large-enough cluster. 
 Hence, a simulator which can predict how well a scheduler algorithm performs 
 for some specific workload would be quite useful.
 We want to build a Scheduler Load Simulator to simulate large-scale Yarn 
 clusters and application loads on a single machine. This would be invaluable 
 in furthering Yarn by providing a tool for researchers and developers to 
 prototype new scheduler features and predict their behavior and performance 
 with a reasonable amount of confidence, thereby aiding rapid innovation.
 The simulator will exercise the real Yarn ResourceManager, removing the 
 network factor by simulating NodeManagers and ApplicationMasters and by 
 handling and dispatching NM/AM heartbeat events from within the same JVM.
 To keep track of scheduler behavior and performance, a scheduler wrapper 
 will wrap the real scheduler.
 The simulator will produce real-time metrics while executing, including:
 * Resource usage for the whole cluster and for each queue, which can be 
 utilized to configure cluster and queue capacity.
 * The detailed application execution trace (recorded in relation to simulated 
 time), which can be analyzed to understand/validate the scheduler behavior 
 (individual jobs' turnaround time, throughput, fairness, capacity guarantees, 
 etc.).
 * Several key metrics of the scheduler algorithm, such as the time cost of 
 each scheduler operation (allocate, handle, etc.), which can be utilized by 
 Hadoop developers to find hot spots and scalability limits.
 The simulator will provide real-time charts showing the behavior of the 
 scheduler and its performance.
 A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, 
 showing how to use the simulator to simulate the Fair Scheduler and the 
 Capacity Scheduler.





[jira] [Updated] (YARN-2381) CLONE - Yarn Scheduler Load Simulator

2014-08-04 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2381:
--

Assignee: (was: Wei Yan)

 CLONE - Yarn Scheduler Load Simulator
 -

 Key: YARN-2381
 URL: https://issues.apache.org/jira/browse/YARN-2381
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Reporter: JiankunLiu
 Fix For: 2.3.0


 The Yarn Scheduler is a fertile area of interest, with different 
 implementations, e.g., the Fifo, Capacity, and Fair schedulers. Meanwhile, 
 several optimizations have been made to improve scheduler performance for 
 different scenarios and workloads. Each scheduler algorithm has its own set 
 of features and drives scheduling decisions by many factors, such as 
 fairness, capacity guarantees, resource availability, etc. It is very 
 important to evaluate a scheduler algorithm well before deploying it in a 
 production cluster. Unfortunately, it is currently non-trivial to evaluate a 
 scheduling algorithm: evaluating in a real cluster is time- and 
 cost-consuming, and it is also very hard to find a large-enough cluster. 
 Hence, a simulator which can predict how well a scheduler algorithm performs 
 for some specific workload would be quite useful.
 We want to build a Scheduler Load Simulator to simulate large-scale Yarn 
 clusters and application loads on a single machine. This would be invaluable 
 in furthering Yarn by providing a tool for researchers and developers to 
 prototype new scheduler features and predict their behavior and performance 
 with a reasonable amount of confidence, thereby aiding rapid innovation.
 The simulator will exercise the real Yarn ResourceManager, removing the 
 network factor by simulating NodeManagers and ApplicationMasters and by 
 handling and dispatching NM/AM heartbeat events from within the same JVM.
 To keep track of scheduler behavior and performance, a scheduler wrapper 
 will wrap the real scheduler.
 The simulator will produce real-time metrics while executing, including:
 * Resource usage for the whole cluster and for each queue, which can be 
 utilized to configure cluster and queue capacity.
 * The detailed application execution trace (recorded in relation to simulated 
 time), which can be analyzed to understand/validate the scheduler behavior 
 (individual jobs' turnaround time, throughput, fairness, capacity guarantees, 
 etc.).
 * Several key metrics of the scheduler algorithm, such as the time cost of 
 each scheduler operation (allocate, handle, etc.), which can be utilized by 
 Hadoop developers to find hot spots and scalability limits.
 The simulator will provide real-time charts showing the behavior of the 
 scheduler and its performance.
 A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, 
 showing how to use the simulator to simulate the Fair Scheduler and the 
 Capacity Scheduler.





[jira] [Commented] (YARN-2381) CLONE - Yarn Scheduler Load Simulator

2014-08-04 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084804#comment-14084804
 ] 

Wei Yan commented on YARN-2381:
---

[~Jackliu91], what is the purpose of this JIRA?

 CLONE - Yarn Scheduler Load Simulator
 -

 Key: YARN-2381
 URL: https://issues.apache.org/jira/browse/YARN-2381
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Reporter: JiankunLiu
 Fix For: 2.3.0


 The Yarn Scheduler is a fertile area of interest, with different 
 implementations, e.g., the Fifo, Capacity, and Fair schedulers. Meanwhile, 
 several optimizations have been made to improve scheduler performance for 
 different scenarios and workloads. Each scheduler algorithm has its own set 
 of features and drives scheduling decisions by many factors, such as 
 fairness, capacity guarantees, resource availability, etc. It is very 
 important to evaluate a scheduler algorithm well before deploying it in a 
 production cluster. Unfortunately, it is currently non-trivial to evaluate a 
 scheduling algorithm: evaluating in a real cluster is time- and 
 cost-consuming, and it is also very hard to find a large-enough cluster. 
 Hence, a simulator which can predict how well a scheduler algorithm performs 
 for some specific workload would be quite useful.
 We want to build a Scheduler Load Simulator to simulate large-scale Yarn 
 clusters and application loads on a single machine. This would be invaluable 
 in furthering Yarn by providing a tool for researchers and developers to 
 prototype new scheduler features and predict their behavior and performance 
 with a reasonable amount of confidence, thereby aiding rapid innovation.
 The simulator will exercise the real Yarn ResourceManager, removing the 
 network factor by simulating NodeManagers and ApplicationMasters and by 
 handling and dispatching NM/AM heartbeat events from within the same JVM.
 To keep track of scheduler behavior and performance, a scheduler wrapper 
 will wrap the real scheduler.
 The simulator will produce real-time metrics while executing, including:
 * Resource usage for the whole cluster and for each queue, which can be 
 utilized to configure cluster and queue capacity.
 * The detailed application execution trace (recorded in relation to simulated 
 time), which can be analyzed to understand/validate the scheduler behavior 
 (individual jobs' turnaround time, throughput, fairness, capacity guarantees, 
 etc.).
 * Several key metrics of the scheduler algorithm, such as the time cost of 
 each scheduler operation (allocate, handle, etc.), which can be utilized by 
 Hadoop developers to find hot spots and scalability limits.
 The simulator will provide real-time charts showing the behavior of the 
 scheduler and its performance.
 A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, 
 showing how to use the simulator to simulate the Fair Scheduler and the 
 Capacity Scheduler.





[jira] [Commented] (YARN-2033) Investigate merging generic-history into the Timeline Store

2014-08-04 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084869#comment-14084869
 ] 

Zhijie Shen commented on YARN-2033:
---

[~djp], thanks for volunteering to review this big patch.

bq. I just took a glance at the latest patch, which looks pretty big to me and 
mixes code refactoring (moving the location of TimelineClient, etc.)

For the big refactoring work, I already have two JIRAs: YARN-2298 and 
YARN-2302. Please note that in this JIRA I posted both the patch for 
generic-history in the timeline store only and the uber patch including all 
code changes (with the _ALL suffix).

bq. side logic (e.g., adding originalURL to ApplicationAttemptReport). Can we 
break this patch into several pieces and create a sub-JIRA for each one?

I prefer not to do it right now, because moving the minor change out is not 
going to shrink this patch significantly; instead, I'd have to maintain a 
longer dependency chain of outstanding patches. Do you agree?

 Investigate merging generic-history into the Timeline Store
 ---

 Key: YARN-2033
 URL: https://issues.apache.org/jira/browse/YARN-2033
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen
 Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, 
 YARN-2033.1.patch, YARN-2033.2.patch, YARN-2033.3.patch, YARN-2033.4.patch, 
 YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch, 
 YARN-2033_ALL.3.patch, YARN-2033_ALL.4.patch


 Having two different stores isn't amenable to generic insights on what's 
 happening with applications. This is to investigate porting generic-history 
 into the Timeline Store.
 One goal is to try and retain most of the client side interfaces as close to 
 what we have today.





[jira] [Commented] (YARN-2136) RMStateStore can explicitly handle store/update events when fenced

2014-08-04 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084971#comment-14084971
 ] 

Jian He commented on YARN-2136:
---

On a second check, the state-store is actually already stopped/closed when the 
RM gets fenced, which means no more store operations can be performed once the 
RM gets fenced. [~kkambatl], is this the case?

 RMStateStore can explicitly handle store/update events when fenced
 --

 Key: YARN-2136
 URL: https://issues.apache.org/jira/browse/YARN-2136
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He

 RMStateStore can choose to handle/ignore store/update events upfront instead 
 of invoking more ZK operations if the state store is in the fenced state. 





[jira] [Commented] (YARN-1954) Add waitFor to AMRMClient(Async)

2014-08-04 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084981#comment-14084981
 ] 

Zhijie Shen commented on YARN-1954:
---

My concern is that if users want an even larger checkEveryMillis, they may not 
get it, and the log is actually recorded at 6ms silently. If 6ms is the 
interval cap, we should somehow let users know.
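For illustration, a hypothetical sketch of a waitFor loop that warns instead of silently clamping (the cap constant and names are placeholders, not the patch's actual code):
{code}
import java.util.function.Supplier;

// Hypothetical sketch: block until check() is true, polling at most every
// INTERVAL_CAP_MS and telling the user when the requested interval is capped.
public class WaitForSketch {
  static final int INTERVAL_CAP_MS = 6;  // stand-in for the discussed cap

  public static void waitFor(Supplier<Boolean> check, int checkEveryMillis)
      throws InterruptedException {
    int interval = Math.min(checkEveryMillis, INTERVAL_CAP_MS);
    if (interval != checkEveryMillis) {
      // surface the clamping instead of logging at the capped rate silently
      System.err.println("checkEveryMillis capped to " + interval + " ms");
    }
    while (!check.get()) {
      Thread.sleep(interval);
    }
  }
}
{code}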

 Add waitFor to AMRMClient(Async)
 

 Key: YARN-1954
 URL: https://issues.apache.org/jira/browse/YARN-1954
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: client
Affects Versions: 3.0.0, 2.4.0
Reporter: Zhijie Shen
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1954.1.patch, YARN-1954.2.patch, YARN-1954.3.patch, 
 YARN-1954.4.patch, YARN-1954.4.patch


 Recently, I saw some use cases of AMRMClient(Async). The painful thing is 
 that the main non-daemon thread has to sit in a dummy loop to prevent the AM 
 process from exiting before all the tasks are done, while unregistration is 
 triggered on a separate daemon thread by callback methods (in particular when 
 using AMRMClientAsync). IMHO, it would be beneficial to add a waitFor method 
 to AMRMClient(Async) to block the AM until unregistration or a user-supplied 
 check point, so that users don't need to write the loop themselves.





[jira] [Commented] (YARN-2374) YARN trunk build failing TestDistributedShell.testDSShell

2014-08-04 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084986#comment-14084986
 ] 

Jian He commented on YARN-2374:
---

looks good. [~vvasudev], can you add some comments to the checkHostname method 
to explain what it's doing? thx

 YARN trunk build failing TestDistributedShell.testDSShell
 -

 Key: YARN-2374
 URL: https://issues.apache.org/jira/browse/YARN-2374
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2374.0.patch, apache-yarn-2374.1.patch, 
 apache-yarn-2374.2.patch, apache-yarn-2374.3.patch


 The YARN trunk build has been failing for the last few days in the 
 distributed shell module.
 {noformat}
 testDSShell(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)  Time elapsed: 27.269 sec  <<< FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:188)
 {noformat}





[jira] [Updated] (YARN-857) Errors when localizing end up with the localization failure not being seen by the NM

2014-08-04 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-857:
-

Assignee: Vinod Kumar Vavilapalli  (was: Hitesh Shah)

 Errors when localizing end up with the localization failure not being seen by 
 the NM
 

 Key: YARN-857
 URL: https://issues.apache.org/jira/browse/YARN-857
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Hitesh Shah
Assignee: Vinod Kumar Vavilapalli
 Attachments: YARN-857.1.patch, YARN-857.2.patch


 at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62)
 at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:235)
 at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
 at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:106)
 at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:978)
 Traced this down to DefaultContainerExecutor, which does not look at the exit 
 code of the localizer.
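For illustration, a simplified stand-in (not the actual DefaultContainerExecutor code) for the missing step: wait for the localizer process and propagate a non-zero exit code instead of dropping it:
{code}
import java.io.IOException;

// Simplified stand-in: launch the localizer, wait for it, and surface a
// non-zero exit code so the NM sees the localization failure.
public class LocalizerExitCheckSketch {
  public void runLocalizer(String... cmd)
      throws IOException, InterruptedException {
    Process p = new ProcessBuilder(cmd).inheritIO().start();
    int exit = p.waitFor();
    if (exit != 0) {
      throw new IOException("localizer exited with code " + exit);
    }
  }
}
{code}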





[jira] [Commented] (YARN-2301) Improve yarn container command

2014-08-04 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085040#comment-14085040
 ] 

Zhijie Shen commented on YARN-2301:
---

[~Naganarasimha], do you mean you want to move

bq. appId with -all: containers of all app attempts in RM/Timeline server

or

bq. May have an option to run as yarn container -list appId OR yarn 
application -list-containers appId also. 

later?

 Improve yarn container command
 --

 Key: YARN-2301
 URL: https://issues.apache.org/jira/browse/YARN-2301
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jian He
Assignee: Naganarasimha G R
  Labels: usability

 While running the yarn container -list <Application Attempt ID> command, some 
 observations:
 1) the scheme (e.g. http/https) before the LOG-URL is missing
 2) the start-time is printed as milliseconds (e.g. 1405540544844). Better to 
 print it in a time format.
 3) finish-time is 0 if the container is not yet finished. Maybe print "N/A"
 4) May have an option to run as yarn container -list <appId> OR yarn 
 application -list-containers <appId> also.  
 As the attempt Id is not shown on the console, this makes it easier for the 
 user to just copy the appId and run it; it may also be useful for 
 container-preserving AM restart. 





[jira] [Updated] (YARN-2375) Allow enabling/disabling timeline server per framework

2014-08-04 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2375:
--

Issue Type: Sub-task  (was: Improvement)
Parent: YARN-1530

 Allow enabling/disabling timeline server per framework
 --

 Key: YARN-2375
 URL: https://issues.apache.org/jira/browse/YARN-2375
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles







[jira] [Assigned] (YARN-2138) Cleanup notifyDone* methods in RMStateStore

2014-08-04 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena reassigned YARN-2138:
--

Assignee: Varun Saxena

 Cleanup notifyDone* methods in RMStateStore
 ---

 Key: YARN-2138
 URL: https://issues.apache.org/jira/browse/YARN-2138
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Varun Saxena
 Fix For: 2.5.0


 The storedException passed into notifyDoneStoringApplication is always null. 
 Similarly for other notifyDone* methods. We can clean up these methods as 
 this control flow path is not used anymore.





[jira] [Assigned] (YARN-72) NM should handle cleaning up containers when it shuts down

2014-08-04 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-72?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla reassigned YARN-72:


Assignee: Sandy Ryza  (was: Vinod Kumar Vavilapalli)

 NM should handle cleaning up containers when it shuts down
 --

 Key: YARN-72
 URL: https://issues.apache.org/jira/browse/YARN-72
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Hitesh Shah
Assignee: Sandy Ryza
 Fix For: 2.0.3-alpha, 0.23.6

 Attachments: YARN-72-1.patch, YARN-72-2.patch, YARN-72-2.patch, 
 YARN-72-2.patch, YARN-72.patch


 Ideally, the NM should wait for a limited amount of time when it gets a 
 shutdown signal for existing containers to complete, and kill the containers 
 (if we pick an aggressive approach) after this time interval. 
 For NMs which come up after an unclean shutdown, the NM should look through 
 its directories for existing container.pids and try to kill the existing 
 containers matching the pids found. 
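For illustration, a hypothetical sketch (not the actual NM code) of the unclean-shutdown recovery step described above:
{code}
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: after an unclean shutdown, scan the pid directory
// and kill any leftover container processes recorded there.
public class PidCleanupSketch {
  static void killLeftoverContainers(Path pidDir) throws IOException {
    try (DirectoryStream<Path> pidFiles =
             Files.newDirectoryStream(pidDir, "*.pid")) {
      for (Path pidFile : pidFiles) {
        String pid = new String(Files.readAllBytes(pidFile)).trim();
        // the aggressive approach: SIGKILL the recorded pid
        new ProcessBuilder("kill", "-9", pid).start();
      }
    }
  }
}
{code}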





[jira] [Commented] (YARN-2212) ApplicationMaster needs to find a way to update the AMRMToken periodically

2014-08-04 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085556#comment-14085556
 ] 

Xuan Gong commented on YARN-2212:
-

The new patch addresses all the latest comments.

bq. Please try on secure cluster too.

Tested on secure HA and non-HA clusters.

 ApplicationMaster needs to find a way to update the AMRMToken periodically
 --

 Key: YARN-2212
 URL: https://issues.apache.org/jira/browse/YARN-2212
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2212.1.patch, YARN-2212.2.patch, 
 YARN-2212.3.1.patch, YARN-2212.3.patch, YARN-2212.4.patch, YARN-2212.5.patch, 
 YARN-2212.5.patch, YARN-2212.5.rebase.patch, YARN-2212.6.patch, 
 YARN-2212.6.patch








[jira] [Updated] (YARN-2212) ApplicationMaster needs to find a way to update the AMRMToken periodically

2014-08-04 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2212:


Attachment: YARN-2212.6.patch

 ApplicationMaster needs to find a way to update the AMRMToken periodically
 --

 Key: YARN-2212
 URL: https://issues.apache.org/jira/browse/YARN-2212
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2212.1.patch, YARN-2212.2.patch, 
 YARN-2212.3.1.patch, YARN-2212.3.patch, YARN-2212.4.patch, YARN-2212.5.patch, 
 YARN-2212.5.patch, YARN-2212.5.rebase.patch, YARN-2212.6.patch, 
 YARN-2212.6.patch








[jira] [Commented] (YARN-2301) Improve yarn container command

2014-08-04 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085628#comment-14085628
 ] 

Naganarasimha G R commented on YARN-2301:
-

Hi [~zjshen],
   I believe the existing behavior of
bq. yarn container -list appAttemptID
is itself not proper (w.r.t. YARN-1794), and once all the data is taken from 
the timeline server it will make proper sense for the end user.
The end user should not have to care whether it comes from the 
AHS/Timeline/RM; the result should be the list of all containers for the given 
attempt or app ID.
I have the code for the command
bq. yarn container -list appAttemptID/appID
but I feel it would not be complete until the timeline integration is done, 
and users of this command will again get confused if given half of the 
feature.
So my pick would be your second option as a new JIRA, i.e.
bq. *yarn container -list appAttemptID/appID*
but if you expect the timeline server integration to be delayed, then we can 
discuss what part of the command could be taken up and what its behavior 
should be.
Also, I would like to fix YARN-1794 - ??show all containers when application 
is finished from AHS instead of no containers from RM??
(again, here too, I would like to fix the bug after the timeline integration, 
so that the list of all containers for the given attempt or app ID is returned 
irrespective of the app state).
Please provide your opinion.

 Improve yarn container command
 --

 Key: YARN-2301
 URL: https://issues.apache.org/jira/browse/YARN-2301
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jian He
Assignee: Naganarasimha G R
  Labels: usability

 While running the yarn container -list Application Attempt ID command, some 
 observations:
 1) the scheme (e.g. http/https) before LOG-URL is missing
 2) the start-time is printed as milliseconds (e.g. 1405540544844); better to 
 print it in a readable time format (see the sketch after this description).
 3) finish-time is 0 if the container is not yet finished; maybe it should be 
 N/A.
 4) there could also be an option to run it as yarn container -list appId OR 
 yarn application -list-containers appId.
 As the attempt ID is not shown on the console, it is easier for the user to 
 just copy the appId and run it; this may also be useful for 
 container-preserving AM restart.
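
As an illustration of observations 2 and 3, a small sketch of the kind of conversion the CLI could apply before printing; the formatter pattern and the N/A fallback are assumptions, not the committed fix.

{code}
// Hypothetical helper (not the committed fix): render epoch-millis timestamps
// like 1405540544844 in a readable form, and map 0 to N/A.
import java.text.SimpleDateFormat;
import java.util.Date;

public class ContainerReportTimeFormat {

  public static String formatTime(long epochMillis) {
    if (epochMillis == 0) {
      return "N/A"; // observation 3: containers that have not finished report 0
    }
    return new SimpleDateFormat("EEE MMM dd HH:mm:ss Z yyyy")
        .format(new Date(epochMillis));
  }

  public static void main(String[] args) {
    System.out.println(formatTime(1405540544844L)); // start-time from the report
    System.out.println(formatTime(0L));             // finish-time while running
  }
}
{code}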



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2352) FairScheduler: Collect metrics on duration of critical methods that affect performance

2014-08-04 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2352:
---

Attachment: fs-perf-metrics.png

Just got a working version of this, and here is a screenshot from a FileSink 
that I hooked up. Will post the patch momentarily.
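
For anyone wanting to reproduce a similar setup, a hadoop-metrics2.properties sketch along these lines should route RM metrics to a FileSink; the filename and period below are assumptions on my side.

{code}
# Sketch of wiring up a FileSink in hadoop-metrics2.properties (assumed paths).
*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
resourcemanager.sink.file.filename=/tmp/rm-metrics.out
# How often (seconds) metrics are pushed to sinks.
*.period=10
{code}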

 FairScheduler: Collect metrics on duration of critical methods that affect 
 performance
 --

 Key: YARN-2352
 URL: https://issues.apache.org/jira/browse/YARN-2352
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.4.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: fs-perf-metrics.png


 We need more metrics for better visibility into FairScheduler performance. At 
 the least, we need to do this for (1) handling node events, (2) update, (3) 
 computing fair shares, and (4) preemption.
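
As a rough illustration (not the attached patch), timing one of these critical sections and recording the duration could look like the sketch below; DurationRecorder is a stand-in for whatever metrics2-style mutable metric the patch actually uses.

{code}
// Illustrative sketch only (not yarn-2352-1.patch): time a critical section
// and record its duration. All names here are assumptions.
public class TimedSection {

  public interface DurationRecorder {
    void add(long durationMillis); // e.g. backed by a MutableRate-like metric
  }

  public static void runTimed(Runnable criticalSection, DurationRecorder recorder) {
    long start = System.currentTimeMillis();
    try {
      // e.g. node-update handling, update(), or fair-share computation
      criticalSection.run();
    } finally {
      recorder.add(System.currentTimeMillis() - start);
    }
  }

  public static void main(String[] args) {
    runTimed(() -> { /* critical scheduler work */ },
        millis -> System.out.println("took " + millis + " ms"));
  }
}
{code}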



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2352) FairScheduler: Collect metrics on duration of critical methods that affect performance

2014-08-04 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2352:
---

Attachment: yarn-2352-1.patch

 FairScheduler: Collect metrics on duration of critical methods that affect 
 performance
 --

 Key: YARN-2352
 URL: https://issues.apache.org/jira/browse/YARN-2352
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.4.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: fs-perf-metrics.png, yarn-2352-1.patch


 We need more metrics for better visibility into FairScheduler performance. At 
 the least, we need to do this for (1) handling node events, (2) update, (3) 
 computing fair shares, and (4) preemption.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2229) ContainerId can overflow with RM restart

2014-08-04 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2229:
-

Attachment: YARN-2229.11.patch

 ContainerId can overflow with RM restart
 

 Key: YARN-2229
 URL: https://issues.apache.org/jira/browse/YARN-2229
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2229.1.patch, YARN-2229.10.patch, 
 YARN-2229.10.patch, YARN-2229.11.patch, YARN-2229.2.patch, YARN-2229.2.patch, 
 YARN-2229.3.patch, YARN-2229.4.patch, YARN-2229.5.patch, YARN-2229.6.patch, 
 YARN-2229.7.patch, YARN-2229.8.patch, YARN-2229.9.patch


 In YARN-2052, we changed the containerId format: the upper 10 bits are for 
 the epoch, and the lower 22 bits are for the sequence number of the Ids. This 
 preserves the semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, 
 {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and 
 {{ConverterUtils#toContainerId}}. One concern is that the epoch can overflow 
 after the RM restarts 1024 times.
 To avoid the problem, it's better to make containerId a long. In this JIRA we 
 need to define the new container Id format while preserving backward 
 compatibility.
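
To make the overflow concrete, here is a small sketch of the int layout described above (not the actual YARN-2052 code); once the epoch needs more than 10 bits, packed ids collide.

{code}
// Sketch of the int layout: upper 10 bits = epoch, lower 22 bits = sequence.
// The 64-bit split at the end is an assumption, not the final format.
public class ContainerIdPacking {
  static final int SEQ_BITS = 22;
  static final int SEQ_MASK = (1 << SEQ_BITS) - 1;

  static int packInt(int epoch, int seq) {
    return (epoch << SEQ_BITS) | (seq & SEQ_MASK);
  }

  public static void main(String[] args) {
    // Epoch 1024 no longer fits in 10 bits: the packed id collides with epoch 0.
    System.out.println(packInt(1024, 5) == packInt(0, 5)); // prints true
    // A long layout (here: upper 24 bits epoch, lower 40 bits sequence)
    // leaves far more headroom for RM restarts.
    long packedLong = ((long) 1024 << 40) | 5L;
    System.out.println(Long.toHexString(packedLong)); // 4000000000005
  }
}
{code}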



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2212) ApplicationMaster needs to find a way to update the AMRMToken periodically

2014-08-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14085722#comment-14085722
 ] 

Hadoop QA commented on YARN-2212:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12659777/YARN-2212.6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4519//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4519//console

This message is automatically generated.

 ApplicationMaster needs to find a way to update the AMRMToken periodically
 --

 Key: YARN-2212
 URL: https://issues.apache.org/jira/browse/YARN-2212
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2212.1.patch, YARN-2212.2.patch, 
 YARN-2212.3.1.patch, YARN-2212.3.patch, YARN-2212.4.patch, YARN-2212.5.patch, 
 YARN-2212.5.patch, YARN-2212.5.rebase.patch, YARN-2212.6.patch, 
 YARN-2212.6.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2352) FairScheduler: Collect metrics on duration of critical methods that affect performance

2014-08-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14085725#comment-14085725
 ] 

Hadoop QA commented on YARN-2352:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12659794/yarn-2352-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.ha.TestZKFailoverControllerStress
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4520//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/4520//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4520//console

This message is automatically generated.

 FairScheduler: Collect metrics on duration of critical methods that affect 
 performance
 --

 Key: YARN-2352
 URL: https://issues.apache.org/jira/browse/YARN-2352
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.4.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: fs-perf-metrics.png, yarn-2352-1.patch


 We need more metrics for better visibility into FairScheduler performance. At 
 the least, we need to do this for (1) handling node events, (2) update, (3) 
 computing fair shares, and (4) preemption.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2382) Resource Manager throws InvalidStateTransitonException

2014-08-04 Thread Nishan Shetty (JIRA)
Nishan Shetty created YARN-2382:
---

 Summary: Resource Manager throws InvalidStateTransitonException
 Key: YARN-2382
 URL: https://issues.apache.org/jira/browse/YARN-2382
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.0.0
Reporter: Nishan Shetty


{code}
2014-08-05 03:44:47,882 INFO org.apache.zookeeper.ClientCnxn: Socket connection 
established to 10.18.40.26/10.18.40.26:11578, initiating session
2014-08-05 03:44:47,888 INFO org.apache.zookeeper.ClientCnxn: Session 
establishment complete on server 10.18.40.26/10.18.40.26:11578, sessionid = 
0x347a051fda60035, negotiated timeout = 1
2014-08-05 03:44:47,889 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
CONTAINER_ALLOCATED at LAUNCHED
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:664)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:104)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:764)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:745)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:662)
2014-08-05 03:44:47,890 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
STATUS_UPDATE at LAUNCHED
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:664)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:104)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:764)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:745)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:662)
2014-08-05 03:44:47,890 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
STATUS_UPDATE at LAUNCHED
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:664)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:104)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:764)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:745)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:662)
2014-08-05 03:44:47,890 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
Ignoring stale result from old client with sessionId 0x147a051fd93002e
{code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2136) RMStateStore can explicitly handle store/update events when fenced

2014-08-04 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14085826#comment-14085826
 ] 

Varun Saxena commented on YARN-2136:


Yes, no more store operations can be performed once the RM switches to standby. 
But there might still be threads processing store/update events while that 
transition is going on. I thought this JIRA was meant to handle that case.

 RMStateStore can explicitly handle store/update events when fenced
 --

 Key: YARN-2136
 URL: https://issues.apache.org/jira/browse/YARN-2136
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He

 RMStateStore can choose to handle/ignore store/update events upfront, instead 
 of invoking more ZK operations, if the state store is in the fenced state. 
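
A minimal sketch of the idea (the names and structure are assumptions, not the real RMStateStore): check a fenced flag upfront so store/update events are dropped before any further ZK operation is attempted.

{code}
// Illustrative only: short-circuit store/update events once fenced, instead of
// letting them reach the ZK-backed store operations. Names are assumptions.
public class FencedAwareStore {

  private volatile boolean fenced = false;

  public void fence() {
    fenced = true; // set when the RM loses leadership / transitions to standby
  }

  public void handleStoreEvent(Runnable zkOperation) {
    if (fenced) {
      return; // drop upfront: further ZK operations would only fail anyway
    }
    zkOperation.run();
  }
}
{code}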



--
This message was sent by Atlassian JIRA
(v6.2#6252)