Re: YarnException: Unauthorized request to start container. This token is expired.
Can you check a few things? What is the container expiry interval set to? How many containers are getting allocated? Is any reservation of containers happening? If yes, then that was a known problem (I don't remember the JIRA number, though). The underlying problem in the reservation case was that the token is created at the time of reservation, not when it is issued to the AM.

On Fri, Mar 28, 2014 at 6:03 AM, Leibnitz se3g2...@gmail.com wrote:

no doubt
Sent from my iPhone 6

On Mar 23, 2014, at 17:37, Fengyun RAO raofeng...@gmail.com wrote:

What does this exception mean? I googled a lot, and all the results tell me it's because the time is not synchronized between datanode and namenode. However, I checked all the servers: the ntpd service is on, and the time differences are less than 1 second. What's more, the tasks are not always failing on certain datanodes. A task fails, then restarts and succeeds. If it were a time problem, I guess it would always fail. My hadoop version is CDH5 beta. Below is the detailed log:

14/03/23 14:57:06 INFO mapreduce.Job: Running job: job_1394434496930_0032
14/03/23 14:57:17 INFO mapreduce.Job: Job job_1394434496930_0032 running in uber mode : false
14/03/23 14:57:17 INFO mapreduce.Job: map 0% reduce 0%
14/03/23 15:08:01 INFO mapreduce.Job: Task Id : attempt_1394434496930_0032_m_34_0, Status : FAILED
Container launch failed for container_1394434496930_0032_01_41 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container. This token is expired. current time is 1395558481146 found 1395558443384
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
    at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:370)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
14/03/23 15:08:02 INFO mapreduce.Job: map 1% reduce 0%
14/03/23 15:09:36 INFO mapreduce.Job: Task Id : attempt_1394434496930_0032_m_36_0, Status : FAILED
Container launch failed for container_1394434496930_0032_01_38 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container. This token is expired. current time is 1395558575889 found 1395558443245
    (stack trace identical to the one above)
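If no reservations are involved, the other thing worth ruling out is the RM-side allocation expiry Omkar asks about: a container token is minted when the container is allocated, and if the AM takes longer than the expiry interval to call startContainer, the NM rejects the launch with exactly this error. If I remember the property name correctly, it is set in yarn-site.xml as follows (600000 ms is the default; raise it only as a diagnostic):

<property>
  <name>yarn.resourcemanager.rm.container-allocation.expiry-interval-ms</name>
  <value>600000</value>
  <description>Interval after which an allocated but not-yet-launched container (and its token) expires.</description>
</property>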
Re: Command line tools for YARN
If it is not present in the CLI, then writing a simple wrapper around the YARN web services would be an option. I think all your listed requirements are covered by the web services, though.

On Thu, Dec 26, 2013 at 11:28 AM, Jian He j...@hortonworks.com wrote:

1) checking how many nodes are in my cluster
3) what are the names of nodes in my cluster
You can use "yarn node".

2) what are the cluster resources total or available at the time of running the command
Not quite sure; you can search the possible options in the yarn command menu. And you can always see the resource usage via the web UI, though.

On Mon, Dec 23, 2013 at 10:08 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote:

Hi,
Are there any command line tools for things like:
1) checking how many nodes are in my cluster
2) what are the cluster resources total or available at the time of running the command
3) what are the names of nodes in my cluster
Thanks, Kishore
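A concrete sketch of both routes (the RM host and the default port 8088 are assumptions; check yarn node -help for the exact flags in your version):

yarn node -list
# REST equivalents, returning JSON:
curl http://<rm-host>:8088/ws/v1/cluster/metrics   # total/available memory, node counts
curl http://<rm-host>:8088/ws/v1/cluster/nodes     # per-node details, including node ids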
Re: Container exit status for released container
I think this is tricky in the sense that if an application itself returns the same exit code, I am not sure we can differentiate between the two. But it can be used as a default.

On Thu, Dec 26, 2013 at 11:24 AM, Jian He j...@hortonworks.com wrote:

Not true; you can look at ContainerExitStatus.java, which includes all the possible exit codes.
Jian

On Mon, Dec 23, 2013 at 11:06 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote:

Hi,
I am seeing the exit status of a released container (released through a call to releaseAssignedContainer()) as -100. Can my code assume that -100 will always be given as the exit status for a released container by YARN?
Thanks, Kishore
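For reference, -100 is the value of ContainerExitStatus.ABORTED. A minimal sketch of checking it from inside an AMRMClientAsync.CallbackHandler (Hadoop 2.x client API; Omkar's caveat applies, since an application could in principle exit with -100 itself):

import java.util.List;
import org.apache.hadoop.yarn.api.records.ContainerExitStatus;
import org.apache.hadoop.yarn.api.records.ContainerStatus;

@Override
public void onContainersCompleted(List<ContainerStatus> statuses) {
  for (ContainerStatus status : statuses) {
    // Compare against the named constant instead of hard-coding -100.
    if (status.getExitStatus() == ContainerExitStatus.ABORTED) {
      System.out.println("Container " + status.getContainerId()
          + " was released or aborted, not a normal application exit");
    }
  }
}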
Re: YARN: LocalResources and file distribution
Add this file to the files to be localized (the LocalResource map) and then refer to it as ./list.ksh. While adding it to the LocalResource, specify the path which you have mentioned.

On Thu, Dec 5, 2013 at 10:40 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote:

Hi Arun,
I have copied a shell script to HDFS and am trying to execute it on containers. How do I specify my shell script PATH in the setCommands() call on ContainerLaunchContext? I am doing it this way:

String shellScriptPath = "hdfs://isredeng:8020/user/kbonagir/KKDummy/list.ksh";
commands.add(shellScriptPath);

But my container execution is failing, saying that there is no such file or directory!

org.apache.hadoop.util.Shell$ExitCodeException: /bin/bash: hdfs://isredeng:8020/user/kbonagir/KKDummy/list.ksh: No such file or directory
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
    at org.apache.hadoop.util.Shell.run(Shell.java:379)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)

I can see this file with the hadoop fs command, and I also saw messages in the Node Manager's log saying that the resource is downloaded and localized. So, how do I run the downloaded shell script on a container?
Thanks, Kishore

On Tue, Dec 3, 2013 at 4:57 AM, Arun C Murthy a...@hortonworks.com wrote:

Robert,
YARN, by default, will only download a *resource* from a shared namespace (e.g. HDFS). If /home/hadoop/robert/large_jar.jar is available on each node then you can specify the path as file:///home/hadoop/robert/large_jar.jar and it should work. Else, you'll need to copy /home/hadoop/robert/large_jar.jar to HDFS and then specify hdfs://host:port/path/to/large_jar.jar.
hth, Arun

On Dec 1, 2013, at 12:03 PM, Robert Metzger metrob...@gmail.com wrote:

Hello,
I'm currently writing code to run my application using YARN (Hadoop 2.2.0). I used this code as a skeleton: https://github.com/hortonworks/simple-yarn-app
Everything works fine on my local machine or on a cluster with shared directories, but when I want to access resources outside of commonly accessible locations, my application fails.
I have my application in a large jar file containing everything (submission client, application master, and workers). The submission client registers the large jar file as a local resource for the application master's context. In my understanding, YARN takes care of transferring the client-local resources to the application master's container. This is also stated here: http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html ("You can use the LocalResource to add resources to your application request. This will cause YARN to distribute the resource to the ApplicationMaster node.")
If I'm starting my jar from the dir /home/hadoop/robert/large_jar.jar, I get the following error from the nodemanager (another node in the cluster):

2013-12-01 20:13:00,810 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download rsrc { { file:/home/hadoop/robert/large_jar.jar, ..

So it seems this node tries to access the file from its local file system. Do I have to use another protocol for the file, something like file://host:port/home/blabla? Is it true that YARN is able to distribute files (not using HDFS, obviously)?
The distributedshell example suggests that I have to use HDFS: https://github.com/apache/hadoop-common/blob/50f0de14e377091c308c3a74ed089a7e4a7f0bfe/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java
Sincerely, Robert

--
Arun C. Murthy, Hortonworks Inc. http://hortonworks.com/
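A sketch that puts Omkar's advice into code: localize the HDFS script under the name list.ksh and invoke the localized copy relative to the container's working directory (Hadoop 2.x YARN API; the path is the one from the thread, and error handling is omitted):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.LocalResourceType;
import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
import org.apache.hadoop.yarn.util.ConverterUtils;
import org.apache.hadoop.yarn.util.Records;

ContainerLaunchContext buildContext(Configuration conf) throws Exception {
  Path script = new Path("hdfs://isredeng:8020/user/kbonagir/KKDummy/list.ksh");
  FileStatus stat = script.getFileSystem(conf).getFileStatus(script);

  LocalResource rsrc = Records.newRecord(LocalResource.class);
  rsrc.setResource(ConverterUtils.getYarnUrlFromPath(script));
  rsrc.setSize(stat.getLen());                   // must match the file on HDFS exactly
  rsrc.setTimestamp(stat.getModificationTime()); // must match the file on HDFS exactly
  rsrc.setType(LocalResourceType.FILE);
  rsrc.setVisibility(LocalResourceVisibility.APPLICATION);

  Map<String, LocalResource> localResources = new HashMap<String, LocalResource>();
  localResources.put("list.ksh", rsrc); // the key becomes the symlink name in the container cwd

  ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
  ctx.setLocalResources(localResources);
  // Run the *localized* copy, not the hdfs:// URI:
  ctx.setCommands(Collections.singletonList("sh ./list.ksh 1>stdout 2>stderr"));
  return ctx;
}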
Re: In YARN, how does a task tracker know the address of a job tracker?
Hi,
Starting with YARN there is no notion of job tracker and task tracker. Here is a quick summary.

JobTracker:
1) Resource management: now done by the Resource Manager (it does all the scheduling work).
2) Application state management, i.e. managing and launching new map/reduce tasks: done by the Application Master. It is per job, not one single entity in the cluster for all jobs as in MRv1.

TaskTracker: replaced by the Node Manager.

I would suggest you read the YARN blog post http://hortonworks.com/blog/resource-localization-in-yarn-deep-dive/ — this will answer most of your questions. Also read http://www.slideshare.net/ovjforu/yarn-way-to-share-cluster-beyond (slide 12) for how a job actually gets executed.

Thanks, Omkar Joshi
Hortonworks Inc. http://www.hortonworks.com

On Thu, Nov 21, 2013 at 7:52 AM, ricky l rickylee0...@gmail.com wrote:

Hi all,
I have a question about how a task tracker identifies the job tracker address when I submit an MR job through YARN. As far as I know, both the job tracker and the task trackers are launched through the application master, and I am curious about the details of the job and task tracker launch sequence. thanks.
Re: Limit on total jobs running using fair scheduler
MRv2 :)

Thanks, Omkar Joshi
Hortonworks Inc. http://www.hortonworks.com

On Thu, Nov 21, 2013 at 1:46 PM, Ivan Tretyakov itretya...@griddynamics.com wrote:

Thank you for your replies!
We are using MR version 1 and my question is regarding this version. Omkar, are you talking about MR1 or MR2? I didn't find a property to limit the number of running jobs per queue for the capacity scheduler using MR1. Did I go wrong somewhere? Which option exactly do you mean?
Sandy, thanks, I got it. But unfortunately we are using MR1 for now.

On Wed, Nov 20, 2013 at 2:12 AM, Sandy Ryza sandy.r...@cloudera.com wrote:

Unfortunately, this is not possible in the MR1 fair scheduler without setting the limits for individual pools. In MR2, fair scheduler hierarchical queues will allow setting maxRunningApps at the top of the hierarchy, which would have the effect you're looking for. (See the sketch after this thread.)
-Sandy

On Tue, Nov 19, 2013 at 2:01 PM, Omkar Joshi ojo...@hortonworks.com wrote:

Not sure about the fair scheduler, but in the capacity scheduler you can achieve this by controlling the number of jobs/applications per queue.

On Tue, Nov 19, 2013 at 3:26 AM, Ivan Tretyakov itretya...@griddynamics.com wrote:

Hello!
We are using CDH 4.1.1 (Version: 2.0.0-mr1-cdh4.1.1) and the fair scheduler. We need to limit the total number of jobs which can run at the same time on the cluster. I can see the maxRunningJobs option, but it sets a limit per pool or user. We wouldn't like to limit each pool or user; we just need a limit on the total number of running jobs. Is it possible to do this using the fair scheduler? Can the capacity scheduler help here? Maybe there are other options to achieve the goal?
Thanks in advance!
-- Best Regards, Ivan Tretyakov, Deployment Engineer, Grid Dynamics, www.griddynamics.com
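To make Sandy's MR2 suggestion concrete: a sketch of a YARN fair-scheduler allocation file (the one pointed to by yarn.scheduler.fair.allocation.file) that caps the whole cluster by putting maxRunningApps on the root queue; the value 20 is illustrative:

<?xml version="1.0"?>
<allocations>
  <queue name="root">
    <!-- Caps concurrently running applications across every child queue. -->
    <maxRunningApps>20</maxRunningApps>
  </queue>
</allocations>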
Re: DefaultResourceCalculator ClassNotFoundException
Set this inside capacity-scheduler.xml:

<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
  <description>
    The ResourceCalculator implementation to be used to compare Resources in the scheduler.
    The default, DefaultResourceCalculator, only uses Memory, while DominantResourceCalculator
    uses dominant-resource to compare multi-dimensional resources such as Memory, CPU etc.
  </description>
</property>

Thanks, Omkar Joshi
Hortonworks Inc. http://www.hortonworks.com

On Mon, Nov 18, 2013 at 5:34 PM, YouPeng Yang yypvsxf19870...@gmail.com wrote:

Hi,
It does not work. I do not find the yarn.scheduler.capacity.resource-calculator property in hadoop-2.2.0/share/doc/hadoop/hadoop-yarn/hadoop-yarn-common/yarn-default.xml. Is it the right property? Could anyone give me any suggestion about the exception?

2013/11/15 Rob Blah tmp5...@gmail.com:

Can you check the config entry for yarn.scheduler.capacity.resource-calculator? It should point to org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator. Answer provided by Ted Yu in the thread "DefaultResourceCalculator class not found, ResourceManager fails to start".
regards

2013/11/15 YouPeng Yang yypvsxf19870...@gmail.com:

Hi all,
It's weird that my YARN resourcemanager fails to start with the exception [1]. I also did some googling; someone else encountered this problem with no answer. I checked the src: there is indeed no DefaultResourceCalculator in the package org.apache.hadoop.yarn.server.resourcemanager.resource; however, I found DefaultResourceCalculator in org.apache.hadoop.yarn.util.resource. Why do we miss the class?

[1]
2013-11-15 17:41:46,755 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2013-11-15 17:41:46,876 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2013-11-15 17:41:46,877 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system started
2013-11-15 17:41:47,013 INFO org.apache.hadoop.service.AbstractService: Service ResourceManager failed in state INITED; cause: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.server.resourcemanager.resource.DefaultResourceCalculator not found
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.server.resourcemanager.resource.DefaultResourceCalculator not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1752)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.getResourceCalculator(CapacitySchedulerConfiguration.java:333)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:263)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:249)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:871)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.server.resourcemanager.resource.DefaultResourceCalculator not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1720)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1744)
    ... 5 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.server.resourcemanager.resource.DefaultResourceCalculator not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1626)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1718)
    ... 6 more
2013-11-15 17:41:47,030 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ResourceManager metrics system...
2013-11-15 17:41:47,032 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system stopped.
2013-11-15 17:41:47,032 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system shutdown complete.
2013-11-15 17:41:47,034 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.server.resourcemanager.resource.DefaultResourceCalculator not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1752
Re: Cannot start resourcemanager
Hi, in capacity-scheduler.xml, what value have you set for the property below?

<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
  <description>
    The ResourceCalculator implementation to be used to compare Resources in the scheduler.
    The default, DefaultResourceCalculator, only uses Memory, while DominantResourceCalculator
    uses dominant-resource to compare multi-dimensional resources such as Memory, CPU etc.
  </description>
</property>

Thanks, Omkar Joshi
Hortonworks Inc. http://www.hortonworks.com

On Thu, Oct 17, 2013 at 12:48 PM, Arun C Murthy a...@hortonworks.com wrote:

What command did you use to start the RM?

On Oct 17, 2013, at 10:18 AM, Siddharth Tiwari siddharth.tiw...@live.com wrote:

Hi Team,
I am trying to start the resourcemanager in the latest hadoop-2.2.0 stable release. It throws the following error. Please help.

2013-10-17 10:01:51,230 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ResourceManager metrics system...
2013-10-17 10:01:51,230 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system stopped.
2013-10-17 10:01:51,231 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system shutdown complete.
2013-10-17 10:01:51,232 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.server.resourcemanager.resource.DefaultResourceCalculator not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1752)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.getResourceCalculator(CapacitySchedulerConfiguration.java:333)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:263)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:249)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:871)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.server.resourcemanager.resource.DefaultResourceCalculator not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1720)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1744)
    ... 5 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.server.resourcemanager.resource.DefaultResourceCalculator not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1626)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1718)
    ... 6 more
2013-10-17 10:01:51,239 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: Shutting down ResourceManager at node1/192.168.147.101

Cheers !!!
Siddharth Tiwari
Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God."
"Maybe other people will try to limit me but I don't limit myself"

--
Arun C. Murthy, Hortonworks Inc. http://hortonworks.com/
Re: Problem: org.apache.hadoop.mapred.ReduceTask: java.net.SocketTimeoutException: connect timed out
What was the problem? An oozie problem, or was the machine where the map task ran before getting rebooted?

Thanks, Omkar Joshi
Hortonworks Inc. http://www.hortonworks.com

On Thu, Apr 18, 2013 at 12:47 PM, Som Satpathy somsatpa...@gmail.com wrote:

Never mind, got it fixed.
Thanks, Som

On Tue, Apr 16, 2013 at 6:18 PM, Som Satpathy somsatpa...@gmail.com wrote:

Hi All,
I have just set up a CDH cluster on EC2 using Cloudera Manager 4.5. I have been trying to run a couple of mapreduce jobs as part of an oozie workflow, but I have been blocked by the following exception (my reducer always hangs because of this):

2013-04-17 00:32:02,268 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201304170021_0003_r_00_0 copy failed: attempt_201304170021_0003_m_00_0 from ip-10-174-49-51.us-west-1.compute.internal
2013-04-17 00:32:02,269 WARN org.apache.hadoop.mapred.ReduceTask: java.net.SocketTimeoutException: connect timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
    at java.net.Socket.connect(Socket.java:529)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:158)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:395)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:530)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:234)
    at sun.net.www.http.HttpClient.New(HttpClient.java:307)
    at sun.net.www.http.HttpClient.New(HttpClient.java:324)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:970)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:911)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:836)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1573)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.setupSecureConnection(ReduceTask.java:1530)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1466)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1360)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1292)
2013-04-17 00:32:02,269 INFO org.apache.hadoop.mapred.ReduceTask: Task attempt_201304170021_0003_r_00_0: Failed fetch #1 from attempt_201304170021_0003_m_00_0
2013-04-17 00:32:02,269 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201304170021_0003_r_00_0 adding host ip-10-174-49-51.us-west-1.compute.internal to penalty box, next contact in 12 seconds

Any suggestions that can help me get around this? I really appreciate any help here.
Thanks, Som
Re: Can container requests be made parallelly from multiple threads
My point is: why would you want multiple threads within a single AM talking to the RM simultaneously? AMRMProtocol is supposed to be used only by the AM, and if the requirement is to have multiple requestors requesting resources, then the requests should be clubbed into one single request and sent to the RM. One more thing may be related to this: today, when the AM makes a request to the RM, it gets resources only if an earlier NM heartbeat resulted in the RM scheduling one for it. So multiplexing AM-RM requests won't help anyway; it will only complicate things on the RM side. The scheduler is not kicked in (synchronously) when the AM makes a request.

Thanks, Omkar Joshi
Hortonworks Inc. http://www.hortonworks.com

On Fri, Sep 27, 2013 at 11:14 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote:

Hi Omkar,
Thanks for the quick reply. I have a requirement for sets of containers depending on some of my business logic. I found that each of the request allocations takes around 2 seconds, so I am thinking of making the requests at the same time from multiple threads.
Kishore

On Fri, Sep 27, 2013 at 11:27 PM, Omkar Joshi ojo...@hortonworks.com wrote:

Hi,
I suggest you not do that. After YARN-744 goes in, this will be prevented on the RM side. May I know why you want to do this? Any advantage / use case?

On Fri, Sep 27, 2013 at 8:31 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote:

Hi,
Can we submit container requests from multiple threads in parallel to the Resource Manager?
Thanks, Kishore
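A sketch of the "club requests into one caller" pattern Omkar describes, using the AMRMClient from Hadoop 2.x (class and method names per org.apache.hadoop.yarn.client.api, worth verifying against your release; conf is assumed to be a YarnConfiguration, and the request sizes are made up):

import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.util.Records;

// One thread owns the AMRMClient; business-logic threads hand it their
// container needs (e.g. via a queue) instead of calling the RM directly.
AMRMClient<ContainerRequest> amrmClient = AMRMClient.createAMRMClient();
amrmClient.init(conf);
amrmClient.start();

Resource capability = Records.newRecord(Resource.class);
capability.setMemory(1024);
capability.setVirtualCores(1);
Priority priority = Records.newRecord(Priority.class);
priority.setPriority(0);

// Ask for a whole set at once: N identical asks batched into one allocate cycle.
for (int i = 0; i < 10; i++) {
  amrmClient.addContainerRequest(new ContainerRequest(capability, null, null, priority));
}
// A single heartbeat carries all pending asks; allocations arrive over later heartbeats.
AllocateResponse response = amrmClient.allocate(0.1f);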
Re: Distributed cache in command line
Hi,
I have no idea about RHadoop, but in general in YARN we do create symlinks for the files in the distributed cache in the current working directory of every container. You may be able to use that somehow.

Thanks, Omkar Joshi
Hortonworks Inc. http://www.hortonworks.com

On Mon, Sep 23, 2013 at 6:28 AM, Chandra Mohan, Ananda Vel Murugan ananda.muru...@honeywell.com wrote:

Hi,
Is it possible to access the distributed cache from the command line? I have written a custom InputFormat implementation which I want to add to the distributed cache. Using libjars is not an option for me, as I am not running the Hadoop job on the command line. I am running it using the RHadoop package in R, which internally uses Hadoop streaming. Please help. Thanks.
Regards, Anand.C
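For plain Hadoop streaming (which RHadoop uses underneath), the generic options can ship a jar or file to every container via the distributed cache; whether RHadoop exposes a way to pass these options through is an assumption to verify on your setup. A sketch with hypothetical paths:

hadoop jar hadoop-streaming.jar \
  -libjars /local/path/custom-inputformat.jar \
  -files /local/path/lookup.txt#lookup \
  -inputformat com.example.MyInputFormat \
  -input in -output out -mapper mymapper.sh -reducer myreducer.sh
# -libjars puts the jar on the job classpath via the distributed cache;
# -files symlinks lookup.txt as ./lookup in each container's working directory.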
Re: How to make hadoop use all nodes?
Hi,
Let me clarify a few things. Assume:
1) You are making container requests which are not explicitly looking for certain nodes (no whitelisting).
2) All nodes are identical in terms of resources (memory/cores), and every container requires the same amount of resources.
3) All nodes have the capacity to run, say, 2 containers.
4) You have 20 nodes.
Now if an application is running and requests 20 containers, you cannot assume that it will get them on 20 different nodes (uniformly distributed). It depends more on which node heartbeated to the Resource Manager at what time, how much memory is available on it, how many applications are present in the queue, and how much they are requesting at which request priorities. If a node has, say, sufficient memory to run 2 containers, then 2 will get allocated there (the actual allocation is quite complex; I am assuming a very simple request). So you may see a few nodes running 2 containers, a few running 1, and a few with 0. I hope this clarifies your doubt.

Thanks, Omkar Joshi
Hortonworks Inc. http://www.hortonworks.com

On Thu, Sep 19, 2013 at 7:19 AM, Vandecreme, Antoine antoine.vandecr...@nist.gov wrote:

Hi all,
I am working with Hadoop 2.0.5 (I plan to migrate to 2.1.0 soon). When I start a job, I notice that some nodes are not used or are only partially used. For example, if my nodes can hold 2 containers, I notice that some nodes are not running any, or just 1, while others are running 2. All my nodes are configured the same way.
Is this expected behavior (maybe in case other jobs are started)? Is there a configuration to change this behavior?
Thanks, Antoine
Re: ContainerLaunchContext in 2.1.x
Good question... There was a security problem earlier, and to address it we removed ContainerId from ContainerLaunchContext. Today, if you check the payload, we are sending a Container which contains a ContainerToken. The ContainerToken is the secured channel for the RM to tell the NM about: 1) ContainerId, 2) Resource, 3) User, 4) NodeId. It is present there by default (irrespective of security). I hope that answers your doubt.

Thanks, Omkar Joshi
Hortonworks Inc. http://www.hortonworks.com

On Wed, Sep 4, 2013 at 2:51 AM, Janne Valkealahti janne.valkeala...@gmail.com wrote:

With 2.0.x, ContainerId was part of the ContainerLaunchContext, and I assume the container id was then used to identify which container the node manager would actually start. With 2.1.x, ContainerId was removed from ContainerLaunchContext; ContainerManagementProtocol only uses a list of StartContainerRequest, each of which has a ContainerLaunchContext and a Token.
My first question: if you have different ContainerLaunchContexts (i.e. command, env variables, etc.), how do you know which container is launched with which launch context?
My second question: how does the node manager associate an allocated container (which you requested from the resource manager) with a ContainerLaunchContext?
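In other words, the pairing happens in the StartContainerRequest itself: each launch context travels with the token of the container it targets. A sketch against the 2.1.x API (signatures as I recall them, worth verifying; container and ctx are assumed to come from your allocation and setup code):

import java.util.Collections;
import org.apache.hadoop.yarn.api.protocolrecords.StartContainerRequest;
import org.apache.hadoop.yarn.api.protocolrecords.StartContainersRequest;

// 'container' came back from the RM in an AllocateResponse; 'ctx' is the
// ContainerLaunchContext you built for exactly this container.
StartContainerRequest one =
    StartContainerRequest.newInstance(ctx, container.getContainerToken());
// The NM decodes ContainerId/Resource/User/NodeId from the token, so the
// context-to-container association is carried by the token, not an id field.
StartContainersRequest req =
    StartContainersRequest.newInstance(Collections.singletonList(one));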
Re: Cache file conflict
You should check this: https://issues.apache.org/jira/browse/MAPREDUCE-4493?focusedCommentId=13713706&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13713706

Thanks, Omkar Joshi
Hortonworks Inc. http://www.hortonworks.com

On Thu, Aug 29, 2013 at 5:06 PM, Public Network Services publicnetworkservi...@gmail.com wrote:

Hi...
After updating the source JARs of an application that launches a second job while running an MR job, the following error keeps occurring:

org.apache.hadoop.mapred.InvalidJobConfException: cache file (mapreduce.job.cache.files) scheme: hdfs, host: server, port: 9000, file: /tmp/hadoop-yarn/staging/root/.staging/job_1367474197612_0887/libjars/Some.jar, conflicts with cache file (mapreduce.job.cache.files) hdfs://server:9000/tmp/hadoop-yarn/staging/root/.staging/job_1367474197612_0888/libjars/Some.jar
    at org.apache.hadoop.mapreduce.v2.util.MRApps.parseDistributedCacheArtifacts(MRApps.java:338)
    at org.apache.hadoop.mapreduce.v2.util.MRApps.setupDistributedCache(MRApps.java:273)
    at org.apache.hadoop.mapred.YARNRunner.createApplicationSubmissionContext(YARNRunner.java:419)
    at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:288)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1367)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215)

where job_1367474197612_0887 is the name of the initial job, job_1367474197612_0888 is the name of the subsequent job, and Some.jar is a JAR file specific to the application. Any ideas as to how the above error could be eliminated?
Thanks!
RE: Hadoop - impersonation doubts/issues while accessing from remote machine
Thanks :)
Regards, Omkar Joshi

-----Original Message-----
From: Harsh J [mailto:ha...@cloudera.com]
Sent: Friday, August 23, 2013 3:52 PM
To: user@hadoop.apache.org
Subject: Re: Hadoop - impersonation doubts/issues while accessing from remote machine

I've answered this on the stackoverflow link: http://stackoverflow.com/questions/18354664/spring-data-hadoop-connectivity

On Thu, Aug 22, 2013 at 1:29 PM, Omkar Joshi omkar.jo...@lntinfotech.com wrote:

For readability, I haven't posted the code, output etc. in this mail - please check the thread below:
http://stackoverflow.com/questions/18354664/spring-data-hadoop-connectivity

I'm trying to connect to a remote Hadoop (1.1.2) cluster from my local Windows machine via Spring Data (later, an Eclipse plug-in may also be used). In future, multiple such connections from several Windows machines are expected. On my remote (single-node) cluster, bigdata is the user for Hadoop etc.:

bigdata@cloudx-843-770:~$ groups bigdata
bigdata : bigdata

On my local Windows machine:

D:\>echo %username%
298790
D:\>hostname
INFVA03351

Now, referring to "Hadoop Secure Impersonation", does it mean I need to create a user 298790 on the cluster, add the hostname in core-site.xml, etc.? Are there any less cumbersome ways out? I tried that too on the cluster, but the (partially given) output error still persists:

Aug 22, 2013 12:29:20 PM org.springframework.context.support.AbstractApplicationContext prepareRefresh
INFO: Refreshing org.springframework.context.support.ClassPathXmlApplicationContext@1815338: startup date [Thu Aug 22 12:29:20 IST 2013]; root of context hierarchy
Aug 22, 2013 12:29:20 PM org.springframework.beans.factory.xml.XmlBeanDefinitionReader loadBeanDefinitions
INFO: Loading XML bean definitions from class path resource [com/hadoop/basics/applicationContext.xml]
Aug 22, 2013 12:29:20 PM org.springframework.core.io.support.PropertiesLoaderSupport loadProperties
INFO: Loading properties file from class path resource [resources/hadoop.properties]
Aug 22, 2013 12:29:20 PM org.springframework.beans.factory.support.DefaultListableBeanFactory preInstantiateSingletons
INFO: Pre-instantiating singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@7c197e: defining beans [org.springframework.context.support.PropertySourcesPlaceholderConfigurer#0,hadoopConfiguration,wc-job,myjobs-runner,resourceLoader]; root of factory hierarchy
Aug 22, 2013 12:29:21 PM org.springframework.data.hadoop.mapreduce.JobExecutor$2 run
INFO: Starting job [wc-job]
Aug 22, 2013 12:29:21 PM org.apache.hadoop.security.UserGroupInformation doAs
SEVERE: PriviledgedActionException as:bigdata via 298790 cause:org.apache.hadoop.ipc.RemoteException: User: 298790 is not allowed to impersonate bigdata
Aug 22, 2013 12:29:21 PM org.springframework.data.hadoop.mapreduce.JobExecutor$2 run
WARNING: Cannot start job [wc-job]
org.apache.hadoop.ipc.RemoteException: User: 298790 is not allowed to impersonate bigdata
    at org.apache.hadoop.ipc.Client.call(Client.java:1107)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
    at org.apache.hadoop.mapred.$Proxy2.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:411)
    at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:499)
    at org.apache.hadoop.mapred.JobClient.init(JobClient.java:490)
    at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:473)
    at org.apache.hadoop.mapreduce.Job$1.run(Job.java:513)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Unknown Source)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.mapreduce.Job.connect(Job.java:511)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:499)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
    at org.springframework.data.hadoop.mapreduce.JobExecutor$2.run(JobExecutor.java:197)
    at org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:49)
    at org.springframework.data.hadoop.mapreduce.JobExecutor.startJobs(JobExecutor.java:168)
    at org.springframework.data.hadoop.mapreduce.JobExecutor.startJobs(JobExecutor.java:160)
    at org.springframework.data.hadoop.mapreduce.JobRunner.call(JobRunner.java:52)
    at org.springframework.data.hadoop.mapreduce.JobRunner.afterPropertiesSet(JobRunner.java:44)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1541)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1479)
    at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:292)
    at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:194)
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:628)
    at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:932)
    at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:479)
    at org.springframework.context.support.ClassPathXmlApplicationContext.<init>(ClassPathXmlApplicationContext.java:197)
    at org.springframework.context.support.ClassPathXmlApplicationContext.<init>(ClassPathXmlApplicationContext.java:172)
    at org.springframework.context.support.ClassPathXmlApplicationContext.<init>(ClassPathXmlApplicationContext.java:158)
    at com.hadoop.basics.WordCounter.main(WordCounter.java:58)

Regards, Omkar Joshi
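For the record, the standard server-side fix for "User: X is not allowed to impersonate Y" is the proxyuser configuration in core-site.xml on the cluster, followed by a NameNode/JobTracker restart. A sketch using the names from this thread (the values would need widening for the "several Windows machines" case, e.g. a comma-separated host list or *):

<property>
  <name>hadoop.proxyuser.298790.hosts</name>
  <value>INFVA03351</value>
</property>
<property>
  <name>hadoop.proxyuser.298790.groups</name>
  <value>bigdata</value>
</property>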
Re: Method Calls
Hi,
I think you should take a look at the YARN architecture blog series; I found it useful when I started working on YARN: http://hortonworks.com/blog/introducing-apache-hadoop-yarn/ It will give you a very good picture of the communication between the different modules and the roles played by them.

Thanks, Omkar Joshi
Hortonworks Inc. http://www.hortonworks.com

On Tue, Aug 20, 2013 at 2:58 PM, Rajesh Jain rjai...@gmail.com wrote:

I am looking for the various method calls (entry points) in the inter-tier communication. When the client submits a job for map-reduce, what is the sequence of calls started, e.g. Node Manager to Resource Manager or Container to App Master? How is the App Master child JVM spawned, etc.? Does anyone have a high-level call flow / sequence diagram?
Thanks, Rajesh
Re: setLocalResources() on ContainerLaunchContext
Good that your timestamp worked... Now, for HDFS, try this: hdfs://<hdfs-host-name>:<hdfs-host-port>/<absolute-path>, and verify that your absolute path is correct:

bin/hadoop fs -ls <absolute-path>

hdfs://isredeng:8020//kishore/kk.ksh ... why "//"? Do you have the HDFS file at the absolute location /kishore/kk.ksh? Are /kishore and /kishore/kk.ksh accessible to the user who is making the startContainer call, or the one running the AM container?

Thanks, Omkar Joshi
Hortonworks Inc. http://www.hortonworks.com

On Tue, Aug 6, 2013 at 10:43 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote:

Hi Harsh, Hitesh & Omkar,
Thanks for the replies. I tried getting the last-modified timestamp like this, and it works. Is this the right thing to do?

File file = new File("/home_/dsadm/kishore/kk.ksh");
shellRsrc.setTimestamp(file.lastModified());

And when I tried using an HDFS file, qualifying it with both node name and port, it didn't work; I get a similar error as earlier.

String shellScriptPath = "hdfs://isredeng:8020//kishore/kk.ksh";

13/08/07 01:36:28 INFO ApplicationMaster: Got container status for containerID=container_1375853431091_0005_01_02, state=COMPLETE, exitStatus=-1000, diagnostics=File does not exist: hdfs://isredeng:8020/kishore/kk.ksh
13/08/07 01:36:28 INFO ApplicationMaster: Got failure status for a container : -1000

On Wed, Aug 7, 2013 at 7:45 AM, Harsh J ha...@cloudera.com wrote:

Thanks Hitesh!
P.s. Port isn't a requirement (and with HA URIs, you shouldn't add a port), but isredeng has to be the authority component.

On Wed, Aug 7, 2013 at 7:37 AM, Hitesh Shah hit...@apache.org wrote:

@Krishna, your logs showed the file error for hdfs://isredeng/kishore/kk.ksh. I am assuming you have tried dfs -ls /kishore/kk.ksh and confirmed that the file exists? Also, the qualified path seems to be missing the namenode port. I need to go back and check whether a path without the port works by assuming the default namenode port.
@Harsh, adding a helper function seems like a good idea. Let me file a jira to have the above added to one of the helper/client libraries.
thanks -- Hitesh

On Aug 6, 2013, at 6:47 PM, Harsh J wrote:

It is kinda unnecessary to be asking developers to load in timestamps and length themselves. Why not provide a java.io.File, or perhaps a Path-accepting API, that gets it automatically on their behalf using the FileSystem API internally?
P.s. A HDFS file gave him a FNF, while a local file gave him a proper TS/Len error. I'm guessing there's a bug here w.r.t. handling HDFS paths.

On Wed, Aug 7, 2013 at 12:35 AM, Hitesh Shah hit...@apache.org wrote:

Hi Krishna,
YARN downloads a specified local resource on the container's node from the url specified. In all situations, the remote url needs to be a fully qualified path. To verify that the file at the remote url is still valid, YARN expects you to provide the length and last-modified timestamp of that file. If you use an hdfs path such as hdfs://namenode:port/<absolute path to file>, you will need to get the length and timestamp from HDFS. If you use file:///, the file should exist on all nodes, and all nodes should have the file with the same length and timestamp for localization to work. (For a single-node setup this works, but it is tougher to get right on a multi-node setup - deploying the file via an rpm should likely work.)
-- Hitesh

On Aug 6, 2013, at 11:11 AM, Omkar Joshi wrote:

Hi,
You need to match the timestamp. Probably get the timestamp locally before adding it. This is explicitly done to ensure that the file is not updated after the user makes the call, to avoid possible errors.
Thanks, Omkar Joshi, Hortonworks Inc.

On Tue, Aug 6, 2013 at 5:25 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote:

I tried the following and it works!

String shellScriptPath = "file:///home_/dsadm/kishore/kk.ksh";

But now I am getting a timestamp error like below, when I passed 0 to setTimestamp():

13/08/06 08:23:48 INFO ApplicationMaster: Got container status for containerID=container_1375784329048_0017_01_02, state=COMPLETE, exitStatus=-1000, diagnostics=Resource file:/home_/dsadm/kishore/kk.ksh changed on src filesystem (expected 0, was 136758058

On Tue, Aug 6, 2013 at 5:24 PM, Harsh J ha...@cloudera.com wrote:

Can you try passing a fully qualified local path? That is, including the file:/ scheme.

On Aug 6, 2013 4:05 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote:

Hi Harsh,
The setResource() call on LocalResource is expecting an argument of type org.apache.hadoop.yarn.api.records.URL, which is converted from a string in the form of a URI. This happens in the following call of the Distributed Shell example:

shellRsrc.setResource(ConverterUtils.getYarnUrlFromURI(new URI(shellScriptPath)));

So, if I give a local file I get a parsing error like below, which
Re: setLocalResources() on ContainerLaunchContext
Hi, You need to match the timestamp. Probably get the timestamp locally before adding it. This is explicitly done to ensure that file is not updated after user makes the call to avoid possible errors. Thanks, Omkar Joshi *Hortonworks Inc.* http://www.hortonworks.com On Tue, Aug 6, 2013 at 5:25 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: I tried the following and it works! String shellScriptPath = file:///home_/dsadm/kishore/kk.ksh; But now getting a timestamp error like below, when I passed 0 to setTimestamp() 13/08/06 08:23:48 INFO ApplicationMaster: Got container status for containerID= container_1375784329048_0017_01_02, state=COMPLETE, exitStatus=-1000, diagnostics=Resource file:/home_/dsadm/kishore/kk.ksh changed on src filesystem (expected 0, was 136758058 On Tue, Aug 6, 2013 at 5:24 PM, Harsh J ha...@cloudera.com wrote: Can you try passing a fully qualified local path? That is, including the file:/ scheme On Aug 6, 2013 4:05 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi Harsh, The setResource() call on LocalResource() is expecting an argument of type org.apache.hadoop.yarn.api.records.URL which is converted from a string in the form of URI. This happens in the following call of Distributed Shell example, shellRsrc.setResource(ConverterUtils.getYarnUrlFromURI(new URI( shellScriptPath))); So, if I give a local file I get a parsing error like below, which is when I changed it to an HDFS file thinking that it should be given like that only. Could you please give an example of how else it could be used, using a local file as you are saying? 2013-08-06 06:23:12,942 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Failed to parse resource-request java.net.URISyntaxException: Expected scheme name at index 0: :///home_/dsadm/kishore/kk.ksh at java.net.URI$Parser.fail(URI.java:2820) at java.net.URI$Parser.failExpecting(URI.java:2826) at java.net.URI$Parser.parse(URI.java:3015) at java.net.URI.init(URI.java:747) at org.apache.hadoop.yarn.util.ConverterUtils.getPathFromYarnURL(ConverterUtils.java:77) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourceRequest.init(LocalResourceRequest.java:46) On Tue, Aug 6, 2013 at 3:36 PM, Harsh J ha...@cloudera.com wrote: To be honest, I've never tried loading a HDFS file onto the LocalResource this way. I usually just pass a local file and that works just fine. There may be something in the URI transformation possibly breaking a HDFS source, but try passing a local file - does that fail too? The Shell example uses a local file. 
On Tue, Aug 6, 2013 at 10:54 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi Harsh, Please see if this is useful, I got a stack trace after the error has occurred 2013-08-06 00:55:30,559 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: CWD set to /tmp/nm-local-dir/usercache/dsadm/appcache/application_1375716148174_0004 = file:/tmp/nm-local-dir/usercache/dsadm/appcache/application_1375716148174_0004 2013-08-06 00:55:31,017 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:dsadm (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: hdfs://isredeng/kishore/kk.ksh 2013-08-06 00:55:31,029 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: DEBUG: FAILED { hdfs://isredeng/kishore/kk.ksh, 0, FILE, null }, File does not exist: hdfs://isredeng/kishore/kk.ksh 2013-08-06 00:55:31,031 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://isredeng/kishore/kk.ksh transitioned from DOWNLOADING to FAILED 2013-08-06 00:55:31,034 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1375716148174_0004_01_02 transitioned from LOCALIZING to LOCALIZATION_FAILED 2013-08-06 00:55:31,035 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: Container container_1375716148174_0004_01_02 sent RELEASE event on a resource request { hdfs://isredeng/kishore/kk.ksh, 0, FILE, null } not present in cache. 2013-08-06 00:55:31,036 WARN org.apache.hadoop.ipc.Client: interrupted waiting to send rpc request to server java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1290) at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:229) at java.util.concurrent.FutureTask.get(FutureTask.java:94) at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:930) at org.apache.hadoop.ipc.Client.call(Client.java:1285
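The fix described at the top of this thread can be sketched as follows: read the file's modification time and length from the source filesystem and copy them onto the LocalResource, rather than passing 0. The script path and the ConverterUtils call are taken from the thread; the class wrapper and everything else is illustrative, not the poster's actual code.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.yarn.api.records.LocalResource;
    import org.apache.hadoop.yarn.api.records.LocalResourceType;
    import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
    import org.apache.hadoop.yarn.util.ConverterUtils;
    import org.apache.hadoop.yarn.util.Records;

    public class LocalResourceSetup {
      public static void main(String[] args) throws Exception {
        // Fully qualified URI, as Harsh suggests (file:// scheme included).
        URI scriptUri = new URI("file:///home_/dsadm/kishore/kk.ksh");

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(scriptUri, conf);
        FileStatus status = fs.getFileStatus(new Path(scriptUri));

        LocalResource shellRsrc = Records.newRecord(LocalResource.class);
        shellRsrc.setResource(ConverterUtils.getYarnUrlFromURI(scriptUri));
        shellRsrc.setType(LocalResourceType.FILE);
        shellRsrc.setVisibility(LocalResourceVisibility.APPLICATION);
        // The NodeManager checks these against the source file before
        // localizing; a mismatch is exactly the "changed on src filesystem"
        // diagnostic above, and a hardcoded 0 virtually never matches.
        shellRsrc.setTimestamp(status.getModificationTime());
        shellRsrc.setSize(status.getLen());
      }
    }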
Re: DistributedCache incompatibility issue between 1.0 and 2.0
Check https://issues.apache.org/jira/browse/MAPREDUCE-4493 and https://issues.apache.org/jira/browse/YARN-916 Thanks, Omkar Joshi *Hortonworks Inc.* http://www.hortonworks.com On Fri, Jul 19, 2013 at 8:12 AM, Ted Yu yuzhih...@gmail.com wrote: See this thread also: http://search-hadoop.com/m/3pgakkVpm71/Distributed+Cache+omkarsubj=Re+Distributed+Cache On Fri, Jul 19, 2013 at 6:20 AM, Botelho, Andrew andrew.bote...@emc.com wrote: I have been using Job.addCacheFile() to cache files in the distributed cache. It has been working for me on Hadoop 2.0.5: public void addCacheFile(URI uri) Add a file to be localized Parameters: uri - The uri of the cache to be localized -Original Message- From: Edward J. Yoon [mailto:edwardy...@apache.org] Sent: Friday, July 19, 2013 8:03 AM To: user@hadoop.apache.org Subject: DistributedCache incompatibility issue between 1.0 and 2.0 Hi, I wonder why the setLocalFiles and addLocalFiles methods have been removed, and what should I use instead of them? -- Best Regards, Edward J. Yoon @eddieyoon
Re: New Distributed Cache
Yeah Andrew.. there seems to be some problem with the context.getCacheFiles() api, which is returning null.. Path[] cachedFilePaths = context.getLocalCacheFiles(); // I am checking why it is deprecated... for (Path cachedFilePath : cachedFilePaths) { File cachedFile = new File(cachedFilePath.toUri().getRawPath()); System.out.println("cached file path " + cachedFile.getAbsolutePath()); } I hope this helps for the time being.. JobContext was supposed to replace the DistributedCache api (it will be deprecated), however there is some problem with that or I am missing something... Will reply if I find the solution to it. context.getCacheFiles() will give you the uri used for localizing files... (the original uri used for adding it to the cache)... However you can use the DistributedCache.getCacheFiles() api till the context api is fixed. context.getLocalCacheFiles() will give you the actual file path on the node manager... (after the file is localized). Thanks, Omkar Joshi *Hortonworks Inc.* http://www.hortonworks.com On Thu, Jul 11, 2013 at 8:19 AM, Botelho, Andrew andrew.bote...@emc.com wrote: So in my driver code, I try to store the file in the cache with this line of code: job.addCacheFile(new URI("file location")); Then in my Mapper code, I do this to try and access the cached file: URI[] localPaths = context.getCacheFiles(); File f = new File(localPaths[0]); However, I get a NullPointerException when I do that in the Mapper code. Any suggestions? Andrew *From:* Shahab Yunus [mailto:shahab.yu...@gmail.com] *Sent:* Wednesday, July 10, 2013 9:43 PM *To:* user@hadoop.apache.org *Subject:* Re: New Distributed Cache Also, once you have the array of URIs after calling getCacheFiles you can iterate over them using the File class or Path ( http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/Path.html#Path(java.net.URI) ) Regards, Shahab On Wed, Jul 10, 2013 at 5:08 PM, Omkar Joshi ojo...@hortonworks.com wrote: did you try JobContext.getCacheFiles() ? Thanks, Omkar Joshi *Hortonworks Inc.* http://www.hortonworks.com On Wed, Jul 10, 2013 at 10:15 AM, Botelho, Andrew andrew.bote...@emc.com wrote: Hi, I am trying to store a file in the Distributed Cache during my Hadoop job. In the driver class, I tell the job to store the file in the cache with this code: Job job = Job.getInstance(); job.addCacheFile(new URI("file name")); That all compiles fine. In the Mapper code, I try accessing the cached file with this method: Path[] localPaths = context.getLocalCacheFiles(); However, I am getting warnings that this method is deprecated. Does anyone know the newest way to access cached files in the Mapper code? (I am using Hadoop 2.0.5) Thanks in advance, Andrew
Re: New Distributed Cache
did you try JobContext.getCacheFiles() ? Thanks, Omkar Joshi *Hortonworks Inc.* http://www.hortonworks.com On Wed, Jul 10, 2013 at 10:15 AM, Botelho, Andrew andrew.bote...@emc.com wrote: Hi, I am trying to store a file in the Distributed Cache during my Hadoop job. In the driver class, I tell the job to store the file in the cache with this code: Job job = Job.getInstance(); job.addCacheFile(new URI("file name")); That all compiles fine. In the Mapper code, I try accessing the cached file with this method: Path[] localPaths = context.getLocalCacheFiles(); However, I am getting warnings that this method is deprecated. Does anyone know the newest way to access cached files in the Mapper code? (I am using Hadoop 2.0.5) Thanks in advance, Andrew
Re: Distributed Cache
try JobContext.getCacheFiles() Thanks, Omkar Joshi *Hortonworks Inc.* http://www.hortonworks.com On Wed, Jul 10, 2013 at 6:31 AM, Botelho, Andrew andrew.bote...@emc.com wrote: Ok, using job.addCacheFile() seems to compile correctly. However, how do I then access the cached file in my Mapper code? Is there a method that will look for any files in the cache? Thanks, Andrew *From:* Ted Yu [mailto:yuzhih...@gmail.com] *Sent:* Tuesday, July 09, 2013 6:08 PM *To:* user@hadoop.apache.org *Subject:* Re: Distributed Cache You should use Job#addCacheFile() Cheers On Tue, Jul 9, 2013 at 3:02 PM, Botelho, Andrew andrew.bote...@emc.com wrote: Hi, I was wondering if I can still use the DistributedCache class in the latest release of Hadoop (Version 2.0.5). In my driver class, I use this code to try and add a file to the distributed cache: import java.net.URI; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.filecache.DistributedCache; import org.apache.hadoop.fs.*; import org.apache.hadoop.io.*; import org.apache.hadoop.mapreduce.*; import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; Configuration conf = new Configuration(); DistributedCache.addCacheFile(new URI("file path in HDFS"), conf); Job job = Job.getInstance(); … However, I keep getting warnings that the method addCacheFile() is deprecated. Is there a more current way to add files to the distributed cache? Thanks in advance, Andrew
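For reference, a minimal driver-side sketch of the non-deprecated pattern discussed in this thread. addCacheFile() is called on the Job object itself, after the Job has been created; the job name and HDFS path below are placeholders.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class CacheDriver {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "cache-example"); // placeholder name
        // Replaces the deprecated DistributedCache.addCacheFile(uri, conf);
        // note it is called on the Job, after the Job exists.
        job.addCacheFile(new URI("hdfs:///path/in/hdfs/lookup.txt")); // placeholder
        // ... set mapper/reducer/input/output formats as usual, then:
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }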
Re: can not start yarn
probably you should run jps every time you start/stop NM/RM, just so you know whether the RM/NM started/stopped successfully or not. Devaraj is right.. try checking RM logs.. Thanks, Omkar Joshi *Hortonworks Inc.* http://www.hortonworks.com On Tue, Jul 9, 2013 at 8:20 PM, Devaraj k devara...@huawei.com wrote: Hi, Here NM is failing to connect to the Resource Manager. Have you started the Resource Manager successfully? Or do you see any problem while starting the Resource Manager in the RM log.. If you have started the Resource Manager on a different machine than the NM, you need to set the configuration “yarn.resourcemanager.resource-tracker.address” for the NM with the RM resource tracker address. Thanks Devaraj k *From:* ch huang [mailto:justlo...@gmail.com] *Sent:* 10 July 2013 08:36 *To:* user@hadoop.apache.org *Subject:* can not start yarn I am testing MapReduce v2, and I get an error when starting the NM. Here is the NM log content: 2013-07-10 11:02:35,909 INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer is started. 2013-07-10 11:02:35,909 INFO org.apache.hadoop.yarn.service.AbstractService: Service:Dispatcher is started. 2013-07-10 11:02:35,930 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Connecting to ResourceManager at /0.0.0.0:8031 2013-07-10 11:02:37,209 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS) 2013-07-10 11:02:38,210 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS) 2013-07-10 11:02:39,211 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS) 2013-07-10 11:02:40,212 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS) 2013-07-10 11:02:41,213 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS) 2013-07-10 11:02:42,215 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS) 2013-07-10 11:02:43,216 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS) 2013-07-10 11:02:44,217 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS) 2013-07-10 11:02:45,218 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS) 2013-07-10 11:02:46,219 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031.
Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS) 2013-07-10 11:02:46,226 ERROR org.apache.hadoop.yarn.service.CompositeService: Error starting services org.apache.hadoop.yarn.server.nodemanager.NodeManager org.apache.avro.AvroRuntimeException: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:141) at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:196) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:329) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:351) Caused by: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135) at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:61) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:190
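A sketch of the yarn-site.xml change Devaraj describes, placed on each NodeManager. Here rm-host is a placeholder for the machine actually running the ResourceManager; 8031 is the resource-tracker port shown in the log above.

    <!-- yarn-site.xml on each NodeManager; "rm-host" is a placeholder -->
    <property>
      <name>yarn.resourcemanager.resource-tracker.address</name>
      <value>rm-host:8031</value>
    </property>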
Re: ConnectionException in container, happens only sometimes
can you post RM/NM logs too? Thanks, Omkar Joshi *Hortonworks Inc.* http://www.hortonworks.com On Wed, Jul 10, 2013 at 6:42 AM, Andrei faithlessfri...@gmail.com wrote: If it helps, the full log of the AM can be found here: http://pastebin.com/zXTabyvv . On Wed, Jul 10, 2013 at 4:21 PM, Andrei faithlessfri...@gmail.com wrote: Hi Devaraj, thanks for your answer. Yes, I suspected it could be because of host mapping, so I have already checked (and have just re-checked) the settings in /etc/hosts of each machine, and they all are ok. I use both fully-qualified names (e.g. `master-host.company.com`) and their shortcuts (e.g. `master-host`), so it shouldn't depend on notation either. I have also checked the AM syslog. There's nothing about the network, but there are several messages like the following: ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container complete event for unknown container id container_1373460572360_0001_01_88 I understand the container just doesn't get registered in the AM (probably because of the same issue), is that correct? So I wonder: who sends the container complete event to the ApplicationMaster? On Wed, Jul 10, 2013 at 3:19 PM, Devaraj k devara...@huawei.com wrote: 1. I assume this is the task (container) that tries to establish connection, but what does it want to connect to? It is trying to connect to the MRAppMaster for executing the actual task. 1. I assume this is the task (container) that tries to establish connection, but what does it want to connect to? It seems the Container is not getting the correct MRAppMaster address for some reason, or the AM is crashing before giving the task to the Container. Probably it is coming due to invalid host mapping. Can you check that the host mapping is proper on both machines, and also check the AM log from that time for any clue. Thanks Devaraj k *From:* Andrei [mailto:faithlessfri...@gmail.com] *Sent:* 10 July 2013 17:32 *To:* user@hadoop.apache.org *Subject:* ConnectionException in container, happens only sometimes Hi, I'm running a CDH4.3 installation of Hadoop with the following simple setup: master-host runs the NameNode, ResourceManager and JobHistoryServer; slave-1-host and slave-2-host run DataNodes and NodeManagers. When I run a simple MapReduce job (either using the streaming API or the Pi example from the distribution) on the client, I see that some tasks fail: 13/07/10 14:40:10 INFO mapreduce.Job: map 60% reduce 0% 13/07/10 14:40:14 INFO mapreduce.Job: Task Id : attempt_1373454026937_0005_m_03_0, Status : FAILED 13/07/10 14:40:14 INFO mapreduce.Job: Task Id : attempt_1373454026937_0005_m_05_0, Status : FAILED ... 13/07/10 14:40:23 INFO mapreduce.Job: map 60% reduce 20% ... Every time a different set of tasks/attempts fails. In some cases the number of failed attempts becomes critical and the whole job fails; in other cases the job finishes successfully. I can't see any dependency, but I noticed the following. Let's say the ApplicationMaster runs on _slave-1-host_. In this case, on _slave-2-host_ there will be a corresponding syslog with the following contents: ... 2013-07-10 11:06:10,986 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: slave-2-host/127.0.0.1:11812. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS) 2013-07-10 11:06:11,989 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: slave-2-host/127.0.0.1:11812.
Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS) ... 2013-07-10 11:06:20,013 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: slave-2-host/127.0.0.1:11812. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS) 2013-07-10 11:06:20,019 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.net.ConnectException: Call From slave-2-host/127.0.0.1 to slave-2-host:11812 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:782) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:729
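One thing worth re-checking despite the earlier /etc/hosts pass: the trace shows slave-2-host resolving to 127.0.0.1, and on Debian/Ubuntu installs a 127.0.1.1 hostname line commonly causes exactly that. A sketch of a correct mapping, with placeholder addresses and domain:

    # /etc/hosts on slave-2-host (addresses and domain are placeholders).
    # A line like "127.0.1.1 slave-2-host", common on Debian/Ubuntu installs,
    # makes the hostname resolve to loopback and produces the log above.
    127.0.0.1     localhost
    192.168.1.12  slave-2-host.company.com  slave-2-host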
Re: Distributed Cache
Path[] cachedFilePaths = DistributedCache.getLocalCacheFiles(context.getConfiguration()); for (Path cachedFilePath : cachedFilePaths) { File cachedFile = new File(cachedFilePath.toUri().getRawPath()); System.out.println("cached file path " + cachedFile.getAbsolutePath()); } I hope this helps for the time being.. JobContext was supposed to replace the DistributedCache api (it will be deprecated), however there is some problem with that or I am missing something... Will reply if I find the solution to it. getCacheFiles() will give you the uri used for localizing files... (the original uri used for adding it to the cache). getLocalCacheFiles() will give you the actual file path on the node manager. Thanks, Omkar Joshi *Hortonworks Inc.* http://www.hortonworks.com On Wed, Jul 10, 2013 at 2:43 PM, Botelho, Andrew andrew.bote...@emc.com wrote: Ok, so JobContext.getCacheFiles() returns URI[]. Let's say I only stored one folder in the cache that has several .txt files within it. How do I use that returned URI to read each line of those .txt files? Basically, how do I read my cached file(s) after I call JobContext.getCacheFiles()? Thanks, Andrew *From:* Omkar Joshi [mailto:ojo...@hortonworks.com] *Sent:* Wednesday, July 10, 2013 5:15 PM *To:* user@hadoop.apache.org *Subject:* Re: Distributed Cache try JobContext.getCacheFiles() Thanks, Omkar Joshi *Hortonworks Inc.* http://www.hortonworks.com On Wed, Jul 10, 2013 at 6:31 AM, Botelho, Andrew andrew.bote...@emc.com wrote: Ok, using job.addCacheFile() seems to compile correctly. However, how do I then access the cached file in my Mapper code? Is there a method that will look for any files in the cache? Thanks, Andrew *From:* Ted Yu [mailto:yuzhih...@gmail.com] *Sent:* Tuesday, July 09, 2013 6:08 PM *To:* user@hadoop.apache.org *Subject:* Re: Distributed Cache You should use Job#addCacheFile() Cheers On Tue, Jul 9, 2013 at 3:02 PM, Botelho, Andrew andrew.bote...@emc.com wrote: Hi, I was wondering if I can still use the DistributedCache class in the latest release of Hadoop (Version 2.0.5). In my driver class, I use this code to try and add a file to the distributed cache: import java.net.URI; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.filecache.DistributedCache; import org.apache.hadoop.fs.*; import org.apache.hadoop.io.*; import org.apache.hadoop.mapreduce.*; import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; Configuration conf = new Configuration(); DistributedCache.addCacheFile(new URI("file path in HDFS"), conf); Job job = Job.getInstance(); … However, I keep getting warnings that the method addCacheFile() is deprecated. Is there a more current way to add files to the distributed cache? Thanks in advance, Andrew
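To flesh out the answer to Andrew's follow-up: once the paths are localized they are ordinary files on the NodeManager's disk, so plain java.io works for reading them line by line. A minimal mapper-side sketch using the interim DistributedCache workaround from the reply above; the key/value types and the per-line processing are placeholders.

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class CacheReadingMapper extends Mapper<LongWritable, Text, Text, Text> {

      @Override
      protected void setup(Context context) throws IOException, InterruptedException {
        // Localized paths on the NodeManager (interim workaround from the reply).
        Path[] cachedFilePaths =
            DistributedCache.getLocalCacheFiles(context.getConfiguration());
        for (Path cachedFilePath : cachedFilePaths) {
          File cachedFile = new File(cachedFilePath.toUri().getRawPath());
          BufferedReader reader = new BufferedReader(new FileReader(cachedFile));
          try {
            String line;
            while ((line = reader.readLine()) != null) {
              // process each line of the cached .txt file here
            }
          } finally {
            reader.close();
          }
        }
      }
    }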
Re: could only be replicated to 0 nodes instead of minReplication exception during job execution
Hi, I see there are 2 datanodes, and for some reason the namenode is not able to create even a single replica for the requested blocks. Are you sure the systems on which these datanodes are running have sufficient disk space? Do you see any other errors in the datanode/namenode logs? What must be happening is that, as file creation in hdfs is failing, it is marking that reduce attempt as a failure and restarting it. Keep checking the namenode state when it reaches 67%. Thanks, Omkar Joshi *Hortonworks Inc.* http://www.hortonworks.com On Mon, Jun 24, 2013 at 3:01 PM, Yuzhang Han yuzhanghan1...@gmail.com wrote: Hello, I am using YARN. I get some exceptions at my namenode and datanode. They are thrown when my Reduce progress reaches 67%. Then the reduce phase is restarted from 0% several times, but it always restarts at this point. Can someone tell me what I should do? Many thanks! Namenode log: 2013-06-24 19:08:50,345 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.224.2.190:50010 is added to blk_654446797771285606_5062{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[10.224.2.190:50010|RBW]]} size 0 2013-06-24 19:08:50,349 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Not able to place enough replicas, still in need of 1 to reach 1 For more information, please enable DEBUG log level on org.apache.commons.logging.impl.Log4JLogger 2013-06-24 19:08:50,350 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:ubuntu (auth:SIMPLE) cause:java.io.IOException: File /output/_temporary/1/_temporary/attempt_1372090853102_0001_r_02_0/part-2 could only be replicated to 0 nodes instead of minReplication (=1). There are 2 datanode(s) running and no node(s) are excluded in this operation. 2013-06-24 19:08:50,353 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 10.224.2.190:49375: error: java.io.IOException: File /output/_temporary/1/_temporary/attempt_1372090853102_0001_r_02_0/part-2 could only be replicated to 0 nodes instead of minReplication (=1). There are 2 datanode(s) running and no node(s) are excluded in this operation. java.io.IOException: File /output/_temporary/1/_temporary/attempt_1372090853102_0001_r_02_0/part-2 could only be replicated to 0 nodes instead of minReplication (=1). There are 2 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1339) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2155) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:491) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:351) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:40744) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:416) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735) 2013-06-24 19:08:50,413 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.224.2.190:50010 is added to blk_8924314838535676494_5063{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[10.224.2.190:50010|RBW]]} size 0 2013-06-24 19:08:50,418 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Not able to place enough replicas, still in need of 1 to reach 1 For more information, please enable DEBUG log level on org.apache.commons.logging.impl.Log4JLogger Datanode log: 2013-06-24 19:25:54,695 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-1724882733-10.10.79.145-1372090400593:blk_-2417373821601940925_6022, type=LAST_IN_PIPELINE, downstreams=0:[] terminating 2013-06-24 19:25:54,699 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1724882733-10.10.79.145-1372090400593:blk_3177955398059619584_6033 src: /10.35.99.108:59710 dest: /10.35.99.108:50010 2013-06-24 19:25:56,473 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-1724882733-10.10.79.145-1372090400593:blk_8751401862589207807_6026 java.io.IOException: Connection
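A couple of commands for the checks suggested at the top of this thread; the data-directory path is a placeholder for whatever dfs.datanode.data.dir points to.

    # Summarize configured/remaining HDFS capacity and live datanodes:
    hdfs dfsadmin -report

    # On each datanode, check free space under the data directories
    # (path is a placeholder for the configured dfs.datanode.data.dir):
    df -h /path/to/dfs/data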
Re: Compile Just a Subproject
Hi Curtis... I see a mismatch in the paths you have mentioned... are they the same, and was it a typo or a mistake? ...modify logging code in C:/l/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java and C:\l\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-applications\hadoop-yarn-applications-distributedshell> mvn clean install -DskipTests C:\l\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-applications\hadoop-yarn-applications-distributedshell> hadoop org.apache.hadoop.yarn.applications.distributedshell.Client -jar target\hadoop-yarn-applications-distributedshell-3.0.0-SNAPSHOT.jar -shell_command whoami Looks like you also have another directory named hadoop-yarn-applications-distributedshell copied inside c:\l\... To make it simpler, go to the C:\l\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-applications\hadoop-yarn-applications-distributedshell\target\ directory and clean everything, and then run the mvn clean install command from the C:\l\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-applications\hadoop-yarn-applications-distributedshell directory. Let me know if you can now see the jar file (with an updated time stamp) or not in the target directory. Thanks, Omkar Joshi *Hortonworks Inc.* http://www.hortonworks.com On Fri, Jun 21, 2013 at 3:12 PM, Curtis Ullerich curtuller...@gmail.com wrote: I've executed the commands as you've said, and the jar that is run is unchanged. Here's exactly what I did. C:\l\> start-dfs C:\l\> start-yarn C:\l\> cd hadoop-yarn-project\hadoop-yarn\hadoop-yarn-applications\hadoop-yarn-applications-distributedshell C:\l\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-applications\hadoop-yarn-applications-distributedshell> hadoop org.apache.hadoop.yarn.applications.distributedshell.Client -jar target\hadoop-yarn-applications-distributedshell-3.0.0-SNAPSHOT.jar -shell_command whoami ...output... ...modify logging code in C:/l/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java C:\l\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-applications\hadoop-yarn-applications-distributedshell> mvn clean install -DskipTests C:\l\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-applications\hadoop-yarn-applications-distributedshell> hadoop org.apache.hadoop.yarn.applications.distributedshell.Client -jar target\hadoop-yarn-applications-distributedshell-3.0.0-SNAPSHOT.jar -shell_command whoami ...identical output to before... Do you see anything wrong with what I've done? Thanks, Curtis On Thu, Jun 20, 2013 at 7:17 PM, Omkar Joshi ojo...@hortonworks.com wrote: Hi Curtis, where are you picking your jar file from? once you run the above command you will see the updated jar file in /hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/hadoop-yarn-applications-distributedshell-3.0.0-SNAPSHOT.jar I hope you are not using the jar file below: /hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.0.0-SNAPSHOT.jar What Chris has said is right. once you have taken the latest code, you should follow the below steps: 1) mvn clean install -DskipTests (clean to remove previously generated code) 2) now say you are updating the distributed shell client code. then go to /hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/ and then run mvn clean install and use the jar from the target sub folder.
Thanks, Omkar Joshi *Hortonworks Inc.* http://www.hortonworks.com On Thu, Jun 20, 2013 at 11:47 AM, Curtis Ullerich curtuller...@gmail.com wrote: Hi Chris, I really appreciate the response. What you described is what I initially tried. The changes never seem to take effect though. Here's what I've done (this is Windows): cd %hadoop_install_dir% mvn clean package -DskipTests mvn install -DskipTests --- modify the code in distributed shell's Client.java --- cd hadoop-yarn-project\hadoop-yarn\hadoop-yarn-applications\hadoop-yarn-applications-distributedshell mvn clean mvn package -DskipTests mvn install -DskipTests Then I run the jar, just as before. I've just been changing log statements to see if the changes worked. They don't--the output doesn't change after doing this rebuild. I've also tried clearing the files put on HDFS in my user directory in case that was the issue. I've taken a more thorough look at BUILDING.txt and I seemed to be consistent with the procedures described there. Am I missing anything else? I've tried restarting yarn and dfs, though I didn't think that would matter. Thanks, Curtis On Thu, Jun 20, 2013 at 11:17 AM, Chris Nauroth cnaur...@hortonworks.com wrote: Hi Curtis, I handle this by running mvn install -DskipTests once from the root of the whole hadoop project to install the sub-modules in my local Maven repository. Then, you
Reducer not getting called
(); Job job = new Job(configuration, "Image_Source"); job.setJarByClass(getClass()); job.setInputFormatClass(SequenceFileInputFormat.class); job.setOutputFormatClass(TextOutputFormat.class); job.setMapperClass(ImageSourceMapper.class); job.setCombinerClass(ImageSourceReducer.class); job.setReducerClass(ImageSourceReducer.class); job.setMapOutputKeyClass(Text.class); job.setMapOutputValueClass(IntWritable.class); job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); SequenceFileInputFormat.addInputPath(job, new Path(args[0])); TextOutputFormat.setOutputPath(job, new Path(args[1])); System.out.println("Submitting job"); job.waitForCompletion(true); int jobStatus = job.isSuccessful() ? 0 : -1; System.out.println("Returning jobStatus = " + jobStatus); return jobStatus; } } The command: hadoop jar /home/hduser/dumphere/codes/hadoop/imageops.jar com.hadoop.basics.ImageSummary /scratchpad/imageOps/WholeImageSeqFile /scratchpad/imageOps/cnt The part-file (/scratchpad/imageOps/cnt/part-r-00000): COOLPIX L120 1 COOLPIX L120 1 K750i 1 The mapper stdout logs: In the map method, image is It's a long road.JPG It's a long road.JPG is taken using COOLPIX L120 Returning from the map method In the map method, image is Every man is a mountainside.JPG Every man is a mountainside.JPG is taken using COOLPIX L120 Returning from the map method In the map method, image is mystic.JPG mystic.JPG is taken using K750i Returning from the map method But nothing is reflected in the stdout logs of the reducer. What have I missed? Regards, Omkar Joshi
RE: Reducer not getting called
Ok but that link is broken - can you provide a working one? Regards, Omkar Joshi -Original Message- From: Harsh J [mailto:ha...@cloudera.com] Sent: Thursday, June 13, 2013 11:01 AM To: user@hadoop.apache.org Subject: Re: Reducer not getting called You're not using the recommended @Override annotations, and are hitting a classic programming mistake. Your issue is the same as this earlier discussion: http://search-hadoop.com/m/gqA3rAaVQ7 (and the ones before it). On Thu, Jun 13, 2013 at 9:52 AM, Omkar Joshi omkar.jo...@lntinfotech.com wrote: Hi, I have a SequenceFile which contains several jpeg images with (image name, image bytes) as key-value pairs. My objective is to count the no. of images by grouping them by the source, something like this: Nikon Coolpix 100 Sony Cybershot 251 N82 100 The MR code is: package com.hadoop.basics; import java.io.BufferedInputStream; import java.io.ByteArrayInputStream; import java.io.IOException; import java.util.Iterator; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.conf.Configured; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.BytesWritable; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.Job; import org.apache.hadoop.mapreduce.Mapper; import org.apache.hadoop.mapreduce.Reducer; import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat; import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat; import org.apache.hadoop.util.Tool; import org.apache.hadoop.util.ToolRunner; import com.drew.imaging.ImageMetadataReader; import com.drew.imaging.ImageProcessingException; import com.drew.metadata.Directory; import com.drew.metadata.Metadata; import com.drew.metadata.exif.ExifIFD0Directory; public class ImageSummary extends Configured implements Tool { public static class ImageSourceMapper extends Mapper<Text, BytesWritable, Text, IntWritable> { private static int tagId = 272; private static final IntWritable one = new IntWritable(1); public void map(Text imageName, BytesWritable imageBytes, Context context) throws IOException, InterruptedException { // TODO Auto-generated method stub System.out.println("In the map method, image is " + imageName.toString()); byte[] imageInBytes = imageBytes.getBytes(); ByteArrayInputStream bais = new ByteArrayInputStream(imageInBytes); BufferedInputStream bis = new BufferedInputStream(bais); Metadata imageMD = null; try { imageMD = ImageMetadataReader.readMetadata(bis, true); } catch (ImageProcessingException e) { // TODO Auto-generated catch block System.out.println("Got an ImageProcessingException!"); e.printStackTrace(); } Directory exifIFD0Directory = imageMD.getDirectory(ExifIFD0Directory.class); String imageSource = exifIFD0Directory.getString(tagId); System.out.println(imageName.toString() + " is taken using " + imageSource); context.write(new Text(imageSource), one); System.out.println("Returning from the map method"); } } public static class ImageSourceReducer extends Reducer<Text, IntWritable, Text, IntWritable> { public void reduce(Text imageSource, Iterator<IntWritable> counts, Context context) throws IOException, InterruptedException { // TODO Auto-generated method stub System.out.println("In the reduce method"); int finalCount = 0; while (counts.hasNext()) { finalCount += counts.next().get
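The classic mistake Harsh points to: the reduce method above takes Iterator<IntWritable>, while Reducer#reduce in the new API takes Iterable<IntWritable>. The mismatched signature never overrides the real method, so the default identity reduce runs instead, which is why the part-file shows each (source, 1) pair passed straight through. A sketch of the corrected inner class (imports shown for completeness; the class stays nested in ImageSummary as before):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Drop-in replacement for the inner class in ImageSummary above.
    public static class ImageSourceReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

      @Override // with Iterator instead of Iterable, this annotation would
                // turn the silent signature mismatch into a compile error
      public void reduce(Text imageSource, Iterable<IntWritable> counts,
          Context context) throws IOException, InterruptedException {
        int finalCount = 0;
        for (IntWritable count : counts) {
          finalCount += count.get();
        }
        context.write(imageSource, new IntWritable(finalCount));
      }
    }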
Re: Warnings?
Hi, did you check in your ubuntu installation for the libhadoop binary.. it is present in my ubuntu installation at a relative path of (I used the apache installation) hadoop-common-project/hadoop-common/target/native/target/usr/local/lib. If present, add it to your LD_LIBRARY_PATH. If not present, then you can try rebuilding your hadoop installation: mvn clean install -Pnative -Pdist -Dtar -DskipTests Thanks, Omkar Joshi *Hortonworks Inc.* http://www.hortonworks.com On Mon, Apr 29, 2013 at 11:19 AM, Kevin Burton rkevinbur...@charter.net wrote: If it doesn't work what are my options? Is there source that I can download and compile? On Apr 29, 2013, at 10:31 AM, Ted Xu t...@gopivotal.com wrote: Hi Kevin, Native libraries are those implemented using C/C++, which only provide code level portability (instead of binary level portability, as Java does). That is to say, the binaries provided by the CDH4 distribution will in most cases be broken in your environment. To check if your native libraries are working or not, you can follow the instructions I sent previously, quoted as follows: During runtime, check the hadoop log files for your MapReduce tasks. - If everything is all right, then: DEBUG util.NativeCodeLoader - Trying to load the custom-built native-hadoop library... INFO util.NativeCodeLoader - Loaded the native-hadoop library - If something goes wrong, then: INFO util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable On Mon, Apr 29, 2013 at 10:21 AM, Kevin Burton rkevinbur...@charter.net wrote: I looked at the link you provided and found that Ubuntu is one of the “supported platforms” but it doesn’t give any information on how to obtain or build it. Any idea why it is not included as part of the Cloudera CDH4 distribution? I followed the installation instructions (mostly apt-get install . . . .) but I fail to see libhadoop.so. In order to avoid this warning do I need to download the Apache distribution? Which one? For the warnings about the configuration, I looked in my configuration and for this specific example I don’t see ‘session.id’ used anywhere. It must be used by default. If so, why is the deprecated default being used? As for the two warnings about counters: I know I have not implemented any code for counters, so again this must be something internal. Is there something I am doing to trigger this? So I can avoid them, what are “hadoop generic options”? Thanks again. Kevin *From:* Ted Xu [mailto:t...@gopivotal.com] *Sent:* Friday, April 26, 2013 10:49 PM *To:* user@hadoop.apache.org *Subject:* Re: Warnings? Hi Kevin, Please see my comments inline. On Sat, Apr 27, 2013 at 11:24 AM, Kevin Burton rkevinbur...@charter.net wrote: Is the native library not available for Ubuntu? If so, how do I load it? Native libraries usually require recompilation; for more information please refer to Native Libraries (http://hadoop.apache.org/docs/r2.0.4-alpha/hadoop-project-dist/hadoop-common/NativeLibraries.html). Can I tell which key is off? Since I am just starting I would want to be as up to date as possible. It is out of date probably because I copied my examples from books and tutorials. I think the warning messages are telling it already: "xxx is deprecated, use xxx instead". In fact, most of the configuration keys changed from hadoop 1.x to 2.x. The compatibility change may later be documented on http://wiki.apache.org/hadoop/Compatibility.
The main class does derive from Tool. Should I ignore this warning, as it seems to be in error? Of course you can ignore this warning as long as you don't use hadoop generic options. Thank you. On Apr 26, 2013, at 7:49 PM, Ted Xu t...@gopivotal.com wrote: Hi, The first warning is saying hadoop cannot load a native library, usually a compression codec. In that case, hadoop will use the java implementation instead, which is slower. The second is caused by the hadoop 1.x/2.x configuration key change. You're using a 1.x style key under 2.x, yet hadoop still guarantees backward compatibility. The third is saying that the main class of a hadoop application is recommended to implement org.apache.hadoop.util.Tool (http://hadoop.apache.org/docs/current/api/org/apache/hadoop/util/Tool.html), or else generic command line options (e.g., -D options) will not be supported. On Sat, Apr 27, 2013 at 5:51 AM, rkevinbur...@charter.net wrote: I am running a simple WordCount m/r job and I get output, but I get five warnings that I am not sure if I should pay attention to: 13/04/26 16:24:50 WARN util.NativeCodeLoader: Unable to load native-hadoop
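If a rebuilt libhadoop.so exists but still is not picked up, a common next step is to point the runtime at it explicitly. A sketch, with placeholder paths for your build or install tree:

    # After "mvn clean install -Pnative -Pdist -Dtar -DskipTests", point the
    # runtime at the rebuilt library (paths are placeholders for your tree):
    export LD_LIBRARY_PATH=/path/to/hadoop/lib/native:$LD_LIBRARY_PATH

    # or equivalently, in hadoop-env.sh:
    export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=/path/to/hadoop/lib/native"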
Re: M/R Statistics
Have you enabled security? Can you share the output for your hdfs? bin/hadoop fs -ls / And is the /tmp/hadoop-yarn/staging/history/done directory present in hdfs? If so, with what permissions? Also please share the exception stack trace... Thanks, Omkar Joshi Hortonworks Inc On Fri, Apr 26, 2013 at 3:05 PM, rkevinbur...@charter.net wrote: I was able to overcome the permission exception in the log by creating an HDFS tmp folder (hadoop fs -mkdir /tmp) and opening it up to the world (hadoop fs -chmod a+rwx /tmp). That got rid of the exception, but I still am unable to connect to port 50030 to see M/R status. More ideas? Even though the exception was missing from the logs of one server in the cluster, I looked on another server and found essentially the same permission problem: 2013-04-26 13:34:56,462 FATAL org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer: Error starting JobHistoryServer org.apache.hadoop.yarn.YarnException: Error creating done directory: [hdfs://devubuntu05:9000/tmp/hadoop-yarn/staging/history/done] at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.init(HistoryFileManager.java:424) at org.apache.hadoop.mapreduce.v2.hs.JobHistory.init(JobHistory.java:87) at org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58) . . . . . On Fri, Apr 26, 2013 at 10:37 AM, Rishi Yadav wrote: do you see retired jobs on the job tracker page? There is also job tracker history at the bottom of the page, something like this: http://nn.zettabyte.com:50030/jobtracker.jsp Thanks and Regards, Rishi Yadav On Fri, Apr 26, 2013 at 7:36 AM, rkevinbur...@charter.net wrote: When I submit a simple Hello World M/R job like WordCount it takes less than 5 seconds. The texts show numerous methods for monitoring M/R jobs as they are happening, but I have yet to see any that show statistics about a job after it has completed. Obviously simple jobs that take a short amount of time don't allow time to fire up any web page or monitoring tool to see how they progress through the JobTracker and TaskTracker, as well as which node they are processed on. Any suggestions on how I could see this kind of data *after* a job has completed?
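Two notes on this thread. First, under YARN the MR1 JobTracker UI on port 50030 no longer exists; completed jobs are served by the JobHistoryServer web UI, which listens on port 19888 by default. Second, the FATAL error above can usually be cleared by pre-creating the done directory the JobHistoryServer expects, mirroring the /tmp fix already applied (1777 is the permissive sticky-bit setting; tighten ownership and modes on a real cluster):

    hadoop fs -mkdir -p /tmp/hadoop-yarn/staging/history/done
    hadoop fs -chmod -R 1777 /tmp/hadoop-yarn/staging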