Running Distributed shell in Hadoop 0.23

2011-12-14 Thread sri ram
Hi,
 Can anyone give the procedure for how to run the Distributed shell
example in Hadoop YARN, so that I can try to understand how the application
master really works?


running shell commands inside mapper

2011-12-14 Thread souri datta
Hi,
 I am trying to run a shell command from within a mapper. The shell command
is of the form:
  hadoop jar somejarfile arg1 arg2 ...


Can I do this type of operation from within a mapper?

Also, can I copy the jar file into the distributed cache and use it?


Thanks,
Souri


Overriding remote classes

2011-12-14 Thread John Armstrong
Hi, there.

I've run into an odd situation, and I'm wondering if there's a way around
it; I'm trying to use Jackson for some JSON serialization in my program,
and I wrote/unit-tested it to work with Jackson 1.9.  Then, in integration
testing, I started to see some weird version incompatibilities and
AbstractMethodErrors.  Indeed, some digging revealed that our Hadoop
installation (CDH3b3, incidentally) has the Jackson 1.5.2 JARs in its
$HADOOP_HOME/lib directory which, as I understand it, forms the basis of
the remote JVM classpath.

So, for now I've rewritten our code to use the 1.5.2 libraries, but it's
ugly and hacky in some places due to Jackson 1.5.2 not having a sensible
TypeFactory or anything like that.  I'm wondering, though, if there's a way
to make the remote JVM use *our* versions of the Jackson libraries
(packaged in the fat JAR) instead of the ones that come with Hadoop.

And no, in deployment we will not be able to control the cluster ourselves
and rip out the old JARs or replace them with updated ones.


Analysing Completed Job info programmatically apart from Jobtracker GUI

2011-12-14 Thread arun k
Hi Guys!

I want to analyse completed-job counters like FILE/HDFS BYTES
READ/WRITTEN, along with other values like average map/reduce task run time.
I see that the JobTracker GUI has this info, but I want to retrieve these
values programmatically instead of manually noting them down and doing
some analysis. Can I do it in a simple/easy way?
I also see that Cloudera's HUE is good for this, but is there anything
equivalent in Hadoop?

Can anyone guide me in this regard?


Arun


Re: Overriding remote classes

2011-12-14 Thread David Rosenstrauch

On 12/14/2011 08:20 AM, John Armstrong wrote:

Hi, there.

I've run into an odd situation, and I'm wondering if there's a way around
it; I'm trying to use Jackson for some JSON serialization in my program,
and I wrote/unit-tested it to work with Jackson 1.9.  Then, in integration
testing, I started to see some weird version incompatibilities and
AbstractMethodErrors.  Indeed, some digging revealed that our Hadoop
installation (CDH3b3, incidentally) has the Jackson 1.5.2 JARs in its
$HADOOP_HOME/lib directory which, as I understand it, forms the basis of
the remote JVM classpath.

So, for now I've rewritten our code to use the 1.5.2 libraries, but it's
ugly and hacky in some places due to Jackson 1.5.2 not having a sensible
TypeFactory or anything like that.  I'm wondering, though, if there's a way
to make the remote JVM use *our* versions of the Jackson libraries
(packaged in the fat JAR) instead of the ones that come with Hadoop.

And no, in deployment we will not be able to control the cluster ourselves
and rip out the old JARs or replace them with updated ones.


I ran into the same (known) issue.  (See: 
https://issues.apache.org/jira/browse/MAPREDUCE-1700)


Doesn't look like there's a solution yet.

DR


Re: Analysing Completed Job info programmatically apart from Jobtracker GUI

2011-12-14 Thread David Rosenstrauch

On 12/14/2011 09:39 AM, arun k wrote:

Hi Guys!

I want to analyse completed-job counters like FILE/HDFS BYTES
READ/WRITTEN, along with other values like average map/reduce task run time.
I see that the JobTracker GUI has this info, but I want to retrieve these
values programmatically instead of manually noting them down and doing
some analysis. Can I do it in a simple/easy way?
I also see that Cloudera's HUE is good for this, but is there anything
equivalent in Hadoop?

Can anyone guide me in this regard?


Arun


See:

http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/Job.html#getCounters%28%29
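
For example, a minimal sketch of walking those counters with the new API
(a sketch only, assuming you still hold the Job handle after
waitForCompletion; CounterDump is a made-up name):

  import org.apache.hadoop.mapreduce.Counter;
  import org.apache.hadoop.mapreduce.CounterGroup;
  import org.apache.hadoop.mapreduce.Counters;
  import org.apache.hadoop.mapreduce.Job;

  public class CounterDump {
    // Prints every counter of a completed job, including the built-in
    // FILE/HDFS BYTES READ/WRITTEN counters that the JobTracker GUI shows.
    public static void dump(Job job) throws Exception {
      Counters counters = job.getCounters();  // valid once the job is done
      for (CounterGroup group : counters) {
        for (Counter counter : group) {
          System.out.println(group.getDisplayName() + "\t"
              + counter.getDisplayName() + "=" + counter.getValue());
        }
      }
    }
  }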

DR


File Side-Effect files in Streaming

2011-12-14 Thread Kadu canGica Eduardo
Hi,
is it possible to use side-effect files with streaming (Python)?
If so, how can I do it?

Thanks.


Re: Overriding remote classes

2011-12-14 Thread John Armstrong
On Wed, 14 Dec 2011 11:04:37 -0500, David Rosenstrauch dar...@darose.net
wrote:
 I ran into the same (known) issue.  (See: 
 https://issues.apache.org/jira/browse/MAPREDUCE-1700)
 
 Doesn't look like there's a solution yet.

Thanks; good to know that I'm actually doing the best I can by writing
everything to be compatible with 1.5.2.


Re: Where JobTracker stores Task's information

2011-12-14 Thread Arun C Murthy
Take a look at JobInProgress.java. There is one object per job.
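
Roughly, the bookkeeping looks like this (a sketch only; the field names are
from the 0.20-era JobTracker.java/JobInProgress.java sources and may differ
in your version, and TaskInProgress itself is internal to
org.apache.hadoop.mapred):

  import java.util.Map;
  import java.util.Set;
  import java.util.TreeMap;

  class JobTrackerSketch {
    // task attempt id -> live task state
    Map<TaskAttemptID, TaskInProgress> taskidToTIPMap;
    // task attempt id -> name of the TaskTracker currently running it
    TreeMap<TaskAttemptID, String> taskidToTrackerMap;
    // tracker name -> attempts it runs (used to fail tasks on a lost tracker)
    TreeMap<String, Set<TaskAttemptID>> trackerToTaskMap;
  }

  class JobInProgressSketch {
    TaskInProgress[] maps;     // one TaskInProgress per input split
    TaskInProgress[] reduces;  // one per reduce partition
  }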

Arun

On Dec 14, 2011, at 1:14 AM, hadoop anis wrote:

 
 
 
   Hi Friends,
   I want to know where the JobTracker stores tasks' information,
   i.e. which task is being executed on which tasktracker, and how the
 JobTracker stores this information.
   If anyone knows this, please let me know.
 
 
 
 Regards,
 Anis
 
 M.Tech. Student
 



Re: Running Distributed shell in Hadoop 0.23

2011-12-14 Thread Hitesh Shah
Assuming you have a non-secure cluster setup (the code does not handle
security properly yet), the following command would run the ls command on 5
allocated containers:

$HADOOP_COMMON_HOME/bin/hadoop jar <path to hadoop-yarn-applications-distributedshell-0.24.0-SNAPSHOT.jar> org.apache.hadoop.yarn.applications.distributedshell.Client --jar <path to hadoop-yarn-applications-distributedshell-0.24.0-SNAPSHOT.jar> --shell_command ls --num_containers 5 --debug

What the above does is upload the jar that contains the AppMaster class to
HDFS and submit a new application request that launches the distributed shell
application master on a container, which in turn runs the shell command on
the number of containers specified.

-- Hitesh

On Dec 14, 2011, at 1:06 AM, sri ram wrote:

 Hi,
  Can anyone give the procedure for how to run the Distributed shell example
 in Hadoop YARN, so that I can try to understand how the application master really works?



Re: running shell commands inside mapper

2011-12-14 Thread Robert Evans
Souri,

Yes and no.  Oozie does something like this, but not through the shell, and you 
can look at how they are doing it.  I don't know the details, but you will 
probably need to get some delegation tokens to make it work properly if you 
have security enabled.  You should be able to copy the jar file over in the 
distributed cache without any problems.  The real question is why are you doing 
this?  Oozie is doing it to avoid security problems with how it runs user code. 
 Is this something to launch more jobs in some sort of a loop?  If so, you need
to be very careful that you don't end up doing a DDoS on the JobTracker by
submitting way too many jobs.
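
If you do go the shell route, here is a rough sketch of the mapper-side call
(ChildJarRunner and somejarfile are made up; this assumes the hadoop launcher
script is on the task's PATH and the jar arrived via the distributed cache,
so it sits in the task's working directory):

  import java.io.BufferedReader;
  import java.io.IOException;
  import java.io.InputStreamReader;
  import java.util.ArrayList;
  import java.util.List;

  public class ChildJarRunner {
    // Runs "hadoop jar <jar> <args...>" as a child process, forwards its
    // output to the task logs, and fails if the child exits non-zero.
    public static void run(String jar, String... args)
        throws IOException, InterruptedException {
      List<String> cmd = new ArrayList<String>();
      cmd.add("hadoop"); cmd.add("jar"); cmd.add(jar);
      for (String a : args) cmd.add(a);
      ProcessBuilder pb = new ProcessBuilder(cmd);
      pb.redirectErrorStream(true);          // fold stderr into stdout
      Process p = pb.start();
      BufferedReader r = new BufferedReader(
          new InputStreamReader(p.getInputStream()));
      for (String line = r.readLine(); line != null; line = r.readLine()) {
        System.out.println(line);            // ends up in the task logs
      }
      if (p.waitFor() != 0) {
        throw new IOException("child 'hadoop jar' exited non-zero");
      }
    }
  }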

--Bobby Evans

On 12/14/11 5:24 AM, souri datta souri.isthe...@gmail.com wrote:

Hi,
 I am trying to run a shell command from within a mapper. The shell command is 
of the form:
  hadoop jar somejarfile arg1 arg2 ...


Can I do this type of operation from within a mapper?

Also, can I copy the jar file into the distributed cache and use it?


Thanks,
Souri



registering service for use in input/output formats

2011-12-14 Thread Adam Portley
Is there a way to pass a service to the output format?  I have an object 
which I would like to initialize/configure outside and then pass in 
(since it must also be used elsewhere).
So far I have been using a static singleton instance so the format can
find it, but I'd prefer to avoid that if possible.

Thanks,
--AP
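
One common workaround, sketched below with hypothetical names
(ServiceBackedOutputFormat, MyService, MyRecordWriter and the
"myapp.service.endpoint" key are all made up): since formats are
instantiated by reflection in the task JVMs, put the service's settings
into the job Configuration at submit time and rebuild the service inside
the format instead of passing the object itself.

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.RecordWriter;
  import org.apache.hadoop.mapreduce.TaskAttemptContext;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class ServiceBackedOutputFormat extends FileOutputFormat<Text, Text> {
    @Override
    public RecordWriter<Text, Text> getRecordWriter(TaskAttemptContext ctx)
        throws IOException, InterruptedException {
      Configuration conf = ctx.getConfiguration();
      // Set by the driver before submission, e.g.
      // job.getConfiguration().set("myapp.service.endpoint", ...);
      String endpoint = conf.get("myapp.service.endpoint");
      MyService service = new MyService(endpoint);  // hypothetical service
      return new MyRecordWriter(service, ctx);      // hypothetical writer
    }
  }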



Re: Running Distributed shell in Hadoop 0.23

2011-12-14 Thread raghavendhra rahul
I get the following error when running the given command for the distributed shell:
hadoop1@master:~/hadoop/bin$ ./hadoop jar ../modules/hadoop-yarn-applications-distributedshell-0.23.0.jar org.apache.hadoop.yarn.applications.distributedshell.Client --jar ../modules/hadoop-yarn-applications-distributedshell-0.23.0.jar --shell_command ls --num_containers 5 --debug
2011-12-15 10:04:41,605 FATAL distributedshell.Client (Client.java:main(190)) - Error running CLient
java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/ipc/YarnRPC
at org.apache.hadoop.yarn.applications.distributedshell.Client.<init>(Client.java:206)
at org.apache.hadoop.yarn.applications.distributedshell.Client.main(Client.java:182)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.RunJar.main(RunJar.java:189)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.yarn.ipc.YarnRPC
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
... 7 more


On Thu, Dec 15, 2011 at 12:09 AM, Hitesh Shah hit...@hortonworks.com wrote:

 Assuming you have a non-secure cluster setup (the code does not handle
 security properly yet), the following command would run the ls command on
 5 allocated containers:

 $HADOOP_COMMON_HOME/bin/hadoop jar <path to hadoop-yarn-applications-distributedshell-0.24.0-SNAPSHOT.jar> org.apache.hadoop.yarn.applications.distributedshell.Client --jar <path to hadoop-yarn-applications-distributedshell-0.24.0-SNAPSHOT.jar> --shell_command ls --num_containers 5 --debug

 What the above does is upload the jar that contains the AppMaster class to
 HDFS and submit a new application request that launches the distributed shell
 application master on a container, which in turn runs the shell command on
 the number of containers specified.

 -- Hitesh

 On Dec 14, 2011, at 1:06 AM, sri ram wrote:

  Hi,
   Can anyone give the procedure for how to run the Distributed shell
  example in Hadoop YARN, so that I can try to understand how the application
  master really works?




Re: Running Distributed shell in Hadoop 0.23

2011-12-14 Thread Hitesh Shah
The yarn jars are likely missing from the classpath. Could you try creating
the symlinks as per step 11 from
http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/INSTALL?revision=1166955 ?

-- Hitesh

On Dec 14, 2011, at 8:35 PM, raghavendhra rahul wrote:

 I get the following error when running the given command for the distributed shell:
 hadoop1@master:~/hadoop/bin$ ./hadoop jar ../modules/hadoop-yarn-applications-distributedshell-0.23.0.jar org.apache.hadoop.yarn.applications.distributedshell.Client --jar ../modules/hadoop-yarn-applications-distributedshell-0.23.0.jar --shell_command ls --num_containers 5 --debug
 2011-12-15 10:04:41,605 FATAL distributedshell.Client (Client.java:main(190)) - Error running CLient
 java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/ipc/YarnRPC
 at org.apache.hadoop.yarn.applications.distributedshell.Client.<init>(Client.java:206)
 at org.apache.hadoop.yarn.applications.distributedshell.Client.main(Client.java:182)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:616)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:189)
 Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.yarn.ipc.YarnRPC
 at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
 ... 7 more
 
 
 



Re: Where JobTracker stores Task's information

2011-12-14 Thread hadoop anis
 From where does the JobTracker take the information regarding which map &
reduce tasks fail, and where does it reschedule these map & reduce tasks
(i.e., which code does it execute to reschedule map & reduce tasks)?

On Thu, Dec 15, 2011 at 12:02 AM, Arun C Murthy a...@hortonworks.com wrote:

 Take a look at JobInProgress.java. There is one object per job.

 Arun

 On Dec 14, 2011, at 1:14 AM, hadoop anis wrote:

 
 
 
   Hi Friends,
   I want to know where the JobTracker stores tasks' information,
   i.e. which task is being executed on which tasktracker, and
how the JobTracker stores this information.
   If anyone knows this, please let me know.
 
 
 
  Regards,
  Anis
 
  M.Tech. Student
 




Re: Running Distributed shell in Hadoop 0.23

2011-12-14 Thread raghavendhra rahul
Should I link the hadoop-yarn-applications-distributedshell-0.23.0.jar as well?
Without linking this jar it throws the same error.
If linked, it shows:
at org.apache.hadoop.util.RunJar.main(RunJar.java:130)
Caused by: java.util.zip.ZipException: error in opening zip file
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:131)
at java.util.jar.JarFile.<init>(JarFile.java:150)
at java.util.jar.JarFile.<init>(JarFile.java:87)
at org.apache.hadoop.util.RunJar.main(RunJar.java:128)


Re: Running Distributed shell in Hadoop 0.23

2011-12-14 Thread raghavendhra rahul
Thanks for the help. I made the mistake of creating the symlinks within
modules. Now everything is fine.


On Thu, Dec 15, 2011 at 11:18 AM, raghavendhra rahul 
raghavendhrara...@gmail.com wrote:

 Should I link the hadoop-yarn-applications-distributedshell-0.23.0.jar
 as well?
 Without linking this jar it throws the same error.
 If linked, it shows:
 at org.apache.hadoop.util.RunJar.main(RunJar.java:130)
 Caused by: java.util.zip.ZipException: error in opening zip file
 at java.util.zip.ZipFile.open(Native Method)
 at java.util.zip.ZipFile.<init>(ZipFile.java:131)
 at java.util.jar.JarFile.<init>(JarFile.java:150)
 at java.util.jar.JarFile.<init>(JarFile.java:87)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:128)





Re: Running Distributed shell in Hadoop 0.23

2011-12-14 Thread raghavendhra rahul
How do I run a script using this? When I tried it, the final status shows as
failed.

On Thu, Dec 15, 2011 at 11:48 AM, raghavendhra rahul 
raghavendhrara...@gmail.com wrote:

 Thanks for the help. I made the mistake of creating the symlinks within
 modules. Now everything is fine.









Re: Running Distributed shell in Hadoop 0.23

2011-12-14 Thread raghavendhra rahul
When we create a directory using the distributed shell, any idea where it is
created?

On Thu, Dec 15, 2011 at 11:57 AM, raghavendhra rahul 
raghavendhrara...@gmail.com wrote:

 How do I run a script using this? When I tried it, the final status shows
 as failed.

