Running Distributed shell in Hadoop 0.23
Hi, can anyone give the procedure for running the distributed shell example in Hadoop YARN, so that I can try to understand how the ApplicationMaster really works?
running shell commands inside mapper
Hi, I am trying to run a shell command from within a mapper. The shell command is of the form: hadoop jar somejarfile arg1 arg2 ... Can I do this type of operation from within a mapper? Also, can I copy the jar file into the distributed cache and use it? Thanks, Souri
Overriding remote classes
Hi, there. I've run into an odd situation, and I'm wondering if there's a way around it; I'm trying to use Jackson for some JSON serialization in my program, and I wrote/unit-tested it to work with Jackson 1.9. Then, in integration testing, I started to see some weird version incompatibilities and AbstractMethodErrors. Indeed, some digging revealed that our Hadoop installation (CDH3b3, incidentally) has the Jackson 1.5.2 JARs in its $HADOOP_HOME/lib directory, which, as I understand it, forms the basis of the remote JVM classpath. So, for now I've rewritten our code to use the 1.5.2 libraries, but it's ugly and hacky in some places due to Jackson 1.5.2 not having a sensible TypeFactory or anything like that. I'm wondering, though, if there's a way to make the remote JVM use *our* versions of the Jackson libraries (packaged in the fat JAR) instead of the ones that come with Hadoop. And no, in deployment we will not be able to control the cluster ourselves and rip out the old JARs or replace them with updated ones.
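(Editorial aside, not from the thread: a quick way to confirm which copy of Jackson the JVM actually loaded is to print the class's code source. Here ObjectMapper is Jackson 1.x's org.codehaus.jackson.map.ObjectMapper; note getCodeSource() can return null for bootstrap-loaded classes.)

    // Diagnostic sketch: print the JAR that Jackson's ObjectMapper was loaded
    // from. Run the same two statements inside a task (e.g. a mapper's setup())
    // to see whether the task JVM picked up Hadoop's bundled 1.5.2 or the copy
    // in your fat JAR.
    public class WhichJackson {
      public static void main(String[] args) {
        Class<?> c = org.codehaus.jackson.map.ObjectMapper.class;
        System.out.println("Jackson loaded from: "
            + c.getProtectionDomain().getCodeSource().getLocation());
      }
    }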
Analysing Completed Job info programmatically apart from Jobtracker GUI
Hi guys! I want to analyse completed-job counters like FILE/HDFS BYTES READ/WRITTEN, along with other values like average map/reduce task run time. I see that the JobTracker GUI has this info, but I want to retrieve these values programmatically instead of manually noting them down for analysis. Can I do it in a simple/easy way? I also see that Cloudera's HUE is good for this, but is there anything equivalent in stock Hadoop? Can anyone guide me in this regard? Arun
Re: Overriding remote classes
On 12/14/2011 08:20 AM, John Armstrong wrote: I'm wondering, though, if there's a way to make the remote JVM use *our* versions of the Jackson libraries (packaged in the fat JAR) instead of the ones that come with Hadoop. I ran into the same (known) issue. (See: https://issues.apache.org/jira/browse/MAPREDUCE-1700) Doesn't look like there's a solution yet. DR
Re: Analysing Completed Job info programmatically apart from Jobtracker GUI
On 12/14/2011 09:39 AM, arun k wrote: I want to retrieve these values programmatically instead of manually noting them down. See: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/Job.html#getCounters%28%29 DR
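(Editorial sketch, not from the thread: with the stable mapred API of that era, pulling counters for a finished job might look like the following. The class name is made up, the job ID is passed as an argument, and getJob() returns null once the JobTracker has retired the job.)

    // Fetch FILE/HDFS byte counters for a completed job via JobClient.
    import org.apache.hadoop.mapred.Counters;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobID;
    import org.apache.hadoop.mapred.RunningJob;

    public class DumpJobCounters {
      public static void main(String[] args) throws Exception {
        JobClient client = new JobClient(new JobConf());
        // args[0] is a job ID such as job_201112140000_0001 (placeholder).
        RunningJob job = client.getJob(JobID.forName(args[0]));
        Counters counters = job.getCounters();
        // Group/counter names match what the JobTracker GUI displays.
        long hdfsRead = counters.getGroup("FileSystemCounters")
                                .getCounter("HDFS_BYTES_READ");
        long hdfsWritten = counters.getGroup("FileSystemCounters")
                                   .getCounter("HDFS_BYTES_WRITTEN");
        System.out.println("HDFS bytes read: " + hdfsRead
            + ", written: " + hdfsWritten);
      }
    }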
File Side-Effect files in Streaming
Hi, is it possible to use side-effect files with streaming (Python)? If it is, how can I do it? Thanks.
Re: Overriding remote classes
On Wed, 14 Dec 2011 11:04:37 -0500, David Rosenstrauch dar...@darose.net wrote: I ran into the same (known) issue. (See: https://issues.apache.org/jira/browse/MAPREDUCE-1700) Doesn't look like there's a solution yet. Thanks; good to know that I'm actually doing the best I can by writing everything to be compatible with 1.5.2.
Re: Where JobTracker stores Task's information
Take a look at JobInProgress.java. There is one object per job. Arun On Dec 14, 2011, at 1:14 AM, hadoop anis wrote: Hi friends, I want to know where the JobTracker stores a task's information, i.e. which task is being executed on which tasktracker, and how the JobTracker stores this information. If anyone knows this, please let me know. Regards, Anis M.Tech. Student
Re: Running Distributed shell in Hadoop 0.23
Assuming you have a non-secure cluster setup (the code does not handle security properly yet), the following command would run the ls command on 5 allocated containers:

    $HADOOP_COMMON_HOME/bin/hadoop jar <path to hadoop-yarn-applications-distributedshell-0.24.0-SNAPSHOT.jar> \
        org.apache.hadoop.yarn.applications.distributedshell.Client \
        --jar <path to hadoop-yarn-applications-distributedshell-0.24.0-SNAPSHOT.jar> \
        --shell_command ls --num_containers 5 --debug

What the above does is upload the jar that contains the AppMaster class to HDFS, then submit a new application request to launch the distributed shell app master on a container, which in turn runs the shell command on the number of containers specified. -- Hitesh On Dec 14, 2011, at 1:06 AM, sri ram wrote: Hi, can anyone give the procedure for running the distributed shell example in Hadoop YARN, so that I can try to understand how the ApplicationMaster really works?
Re: running shell commands inside mapper
Souri, Yes and no. Oozie does something like this, but not through the shell, and you can look at how they are doing it. I don't know the details, but you will probably need to get some delegation tokens to make it work properly if you have security enabled. You should be able to copy the jar file over in the distributed cache without any problems. The real question is why you are doing this. Oozie is doing it to avoid security problems with how it runs user code. Is this something to launch more jobs in some sort of a loop? If so, you need to be very careful that you don't end up doing a DDoS on the JobTracker by submitting way too many jobs. --Bobby Evans On 12/14/11 5:24 AM, souri datta souri.isthe...@gmail.com wrote: I am trying to run a shell command from within a mapper. The shell command is of the form: hadoop jar somejarfile arg1 arg2 ... Can I do this type of operation from within a mapper? Also, can I copy the jar file into the distributed cache and use it?
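(Editorial sketch, not from the thread, illustrating the mechanics Bobby describes: ship the jar through the distributed cache, then exec `hadoop jar` from the mapper. It assumes the `hadoop` launcher is on each task node's PATH; the paths, arguments, and class names are hypothetical, and the security/delegation-token caveats above still apply.)

    // Driver side (hypothetical path), before submitting the job:
    //   DistributedCache.addCacheFile(new URI("/user/souri/somejarfile.jar"), conf);
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class ShellOutMapper extends Mapper<LongWritable, Text, Text, Text> {
      @Override
      protected void setup(Context context) throws IOException, InterruptedException {
        // Locate the local copy of the cached jar on this task node.
        Path[] cached = DistributedCache.getLocalCacheFiles(context.getConfiguration());
        // Assumes the first cached file is our jar; real code should match by name.
        ProcessBuilder pb = new ProcessBuilder(
            "hadoop", "jar", cached[0].toString(), "arg1", "arg2");
        pb.redirectErrorStream(true);
        Process proc = pb.start();
        BufferedReader out = new BufferedReader(new InputStreamReader(proc.getInputStream()));
        for (String line; (line = out.readLine()) != null; ) {
          System.err.println(line); // child output ends up in the task's stderr log
        }
        if (proc.waitFor() != 0) {
          throw new IOException("child 'hadoop jar' exited with non-zero status");
        }
      }
    }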
registering service for use in input / output formats
Is there a way to pass a service to the output format? I have an object which I would like to initialize/configure outside and then pass in (since it must also be used elsewhere). So far I have been using a static singleton instance so the format can find it, but I'd prefer to avoid that if possible. Thanks, --AP
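(For concreteness, an editorial sketch of the static-singleton workaround described above, with hypothetical names. Its weakness is that register() only helps code running in the same JVM as the format, which is exactly why it breaks down once formats are instantiated inside remote task JVMs.)

    import java.io.IOException;
    import org.apache.hadoop.mapreduce.RecordWriter;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class ServiceOutputFormat<K, V> extends TextOutputFormat<K, V> {
      /** Hypothetical stand-in for the externally configured service. */
      public interface MyService { }

      private static volatile MyService service;

      // Called by driver code before the format is used, in the same JVM.
      public static void register(MyService s) { service = s; }

      @Override
      public RecordWriter<K, V> getRecordWriter(TaskAttemptContext job)
          throws IOException, InterruptedException {
        if (service == null) {
          throw new IllegalStateException("Service not registered in this JVM");
        }
        // ... configure the writer using `service` ...
        return super.getRecordWriter(job);
      }
    }

The usual alternative in Hadoop is to serialize whatever the service needs into the job's Configuration (e.g. a class name plus settings) and reconstruct it inside the format, since each remote task starts with a fresh JVM.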
Re: Running Distributed shell in Hadoop 0.23
I get the following error from the given command to run distributed shell:

    hadoop1@master:~/hadoop/bin$ ./hadoop jar ../modules/hadoop-yarn-applications-distributedshell-0.23.0.jar \
        org.apache.hadoop.yarn.applications.distributedshell.Client \
        --jar ../modules/hadoop-yarn-applications-distributedshell-0.23.0.jar \
        --shell_command ls --num_containers 5 --debug

    2011-12-15 10:04:41,605 FATAL distributedshell.Client (Client.java:main(190)) - Error running CLient
    java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/ipc/YarnRPC
        at org.apache.hadoop.yarn.applications.distributedshell.Client.<init>(Client.java:206)
        at org.apache.hadoop.yarn.applications.distributedshell.Client.main(Client.java:182)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:189)
    Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.yarn.ipc.YarnRPC
        at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
        ... 7 more

On Thu, Dec 15, 2011 at 12:09 AM, Hitesh Shah hit...@hortonworks.com wrote: Assuming you have a non-secure cluster setup (the code does not handle security properly yet), the following command would run the ls command on 5 allocated containers ...
Re: Running Distributed shell in Hadoop 0.23
The YARN jars are likely missing from the classpath. Could you try creating the symlinks as per step 11 from http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/INSTALL?revision=1166955 ? -- Hitesh On Dec 14, 2011, at 8:35 PM, raghavendhra rahul wrote: I get the following error from the given command to run distributed shell: java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/ipc/YarnRPC ...
Re: Where JobTracker stores Task's information
From where does the JobTracker take the information regarding which map/reduce tasks have failed, and where does it reschedule these tasks (i.e. which code does it execute to reschedule a failed map/reduce task)? On Thu, Dec 15, 2011 at 12:02 AM, Arun C Murthy a...@hortonworks.com wrote: Take a look at JobInProgress.java. There is one object per job. Arun
Re: Running Distributed shell in Hadoop 0.23
Should I link the hadoop-yarn-applications-distributedshell-0.23.0.jar as well? Without linking this jar it throws the same error. If linked, it shows:

        at org.apache.hadoop.util.RunJar.main(RunJar.java:130)
    Caused by: java.util.zip.ZipException: error in opening zip file
        at java.util.zip.ZipFile.open(Native Method)
        at java.util.zip.ZipFile.<init>(ZipFile.java:131)
        at java.util.jar.JarFile.<init>(JarFile.java:150)
        at java.util.jar.JarFile.<init>(JarFile.java:87)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:128)
Re: Running Distributed shell in Hadoop 0.23
Thanks for the help. I made the mistake of creating the symlinks within modules. Now everything is fine. On Thu, Dec 15, 2011 at 11:18 AM, raghavendhra rahul raghavendhrara...@gmail.com wrote: Should I link the hadoop-yarn-applications-distributedshell-0.23.0.jar as well? Without linking this jar it throws the same error ...
Re: Running Distributed shell in Hadoop 0.23
How do I run an arbitrary script using this? When I tried, it shows the final status as failed. On Thu, Dec 15, 2011 at 11:48 AM, raghavendhra rahul raghavendhrara...@gmail.com wrote: Thanks for the help. I made the mistake of creating the symlinks within modules. Now everything is fine.
Re: Running Distributed shell in Hadoop 0.23
When we create a directory using the distributed shell, any idea where it is created? On Thu, Dec 15, 2011 at 11:57 AM, raghavendhra rahul raghavendhrara...@gmail.com wrote: How do I run an arbitrary script using this? When I tried, it shows the final status as failed.