Help with Hadoop Eclipse Plugin on Mac OS X Lion

2011-12-02 Thread Will L





Hello,
I am having problems getting the Hadoop Eclipse plugin to work on Mac OS X Lion.

I have tried the following combinations:
Hadoop 0.20.203, Eclipse 3.6.2 (32-bit), 
hadoop-eclipse-plugin-0.20.203.0.jarHadoop 0.20.203, Eclipse 3.6.2 (32-bit), 
hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar (from JIRA)Hadoop 0.20.203, Eclipse 
3.7.1 (32-bit), hadoop-eclipse-plugin-0.20.203.0.jarHadoop 0.20.203, Eclipse 
3.7.1 (32-bit), hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar (from JIRA)Hadoop 
0.20.205, Eclipse 3.7.1 (32-bit), hadoop-eclipse-plugin-0.20.205.0.jar

Has anyone gotten the Hadoop Eclipse plugin to work on Mac OS X Lion?


Thank you for your time and help; I greatly appreciate it!


Sincerely,


Will

  

RE: Help with Hadoop Eclipse Plugin on Mac OS X Lion

2011-12-02 Thread Will L


Oops, I guess the formatting went away:
I have tried the following combinations:
* Hadoop 0.20.203, Eclipse 3.6.2 (32-bit), hadoop-eclipse-plugin-0.20.203.0.jar
* Hadoop 0.20.203, Eclipse 3.6.2 (32-bit), hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar (from JIRA)
* Hadoop 0.20.203, Eclipse 3.7.1 (32-bit), hadoop-eclipse-plugin-0.20.203.0.jar
* Hadoop 0.20.203, Eclipse 3.7.1 (32-bit), hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar (from JIRA)
* Hadoop 0.20.205, Eclipse 3.7.1 (32-bit), hadoop-eclipse-plugin-0.20.205.0.jar

> From: seventeen_reas...@hotmail.com
> To: common-user@hadoop.apache.org
> Subject: Help with Hadoop Eclipse Plugin on Mac OS X Lion
> Date: Fri, 2 Dec 2011 00:26:28 -0800
> 
> 
> 
> 
> 
> 
> Hello,
> I am having problems getting my hadoop eclipse plugin to work on Mac OS X 
> Lion.
> 
> I have tried the following combinations:
> Hadoop 0.20.203, Eclipse 3.6.2 (32-bit), 
> hadoop-eclipse-plugin-0.20.203.0.jarHadoop 0.20.203, Eclipse 3.6.2 (32-bit), 
> hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar (from JIRA)Hadoop 0.20.203, Eclipse 
> 3.7.1 (32-bit), hadoop-eclipse-plugin-0.20.203.0.jarHadoop 0.20.203, Eclipse 
> 3.7.1 (32-bit), hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar (from JIRA)Hadoop 
> 0.20.205, Eclipse 3.7.1 (32-bit), hadoop-eclipse-plugin-0.20.205.0.jar
> 
> Has anyone gotten the hadoop eclipse plugin to work on Mac OS X Lion?
> 
> 
> Thank you for your time and help I greatly appreciate it!
> 
> 
> Sincerely,
> 
> 
> Will
> 
> 
  

Are Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other?

2011-12-02 Thread praveenesh kumar
Or do I have to apply a Hadoop patch for this?

Thanks,
Praveenesh


Re: Help with Hadoop Eclipse Plugin on Mac OS X Lion

2011-12-02 Thread Prashant Sharma
Why do you need a plugin at all?

You can do away with it by using a Maven project, i.e. a pom.xml that lists
Hadoop as one of its dependencies. Then use the regular Maven commands to
build, etc. For example, mvn eclipse:eclipse generates the Eclipse project files.

On Fri, Dec 2, 2011 at 1:59 PM, Will L wrote:

>
>
> Oops guess the formatting went away:
> I have tried the following combinations:
> * Hadoop 0.20.203, Eclipse 3.6.2 (32-bit),
> hadoop-eclipse-plugin-0.20.203.0.jar
> * Hadoop 0.20.203, Eclipse 3.6.2 (32-bit),
> hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar (from JIRA)
> * Hadoop 0.20.203 Eclipse 3.7.1 (32-bit),
> hadoop-eclipse-plugin-0.20.203.0.jar
> * Hadoop 0.20.203, Eclipse 3.7.1 (32-bit),
> hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar (from JIRA)
> * Hadoop 0.20.205, Eclipse 3.7.1 (32-bit),
> hadoop-eclipse-plugin-0.20.205.0.jar


RE: Help with Hadoop Eclipse Plugin on Mac OS X Lion

2011-12-02 Thread Will L


I got the setup working on my laptop running OS X Snow Leopard without any
problems, and I would like to use my new laptop running OS X Lion.

The plugin is helpful in that I can see Hadoop output being dumped to the
Eclipse console, and it used to integrate well with the Eclipse IDE, making my
development life a little easier.

Thank you for your time and help.

Sincerely,

Will Lieu

> Date: Fri, 2 Dec 2011 21:44:36 +0530
> Subject: Re: Help with Hadoop Eclipse Plugin on Mac OS X Lion
> From: prashant.ii...@gmail.com
> To: common-user@hadoop.apache.org
> 
> Why do you need a plugin at all?
> 
> you can do away with it by having a maven project i.e. having a pom.xml and
> setting hadoop as one of the dependencies. Then use regular maven commands
> to build etc.. e.g. mvn eclipse:eclipse would be an interesting command.
  

Re: Help with Hadoop Eclipse Plugin on Mac OS X Lion

2011-12-02 Thread Prashant Sharma
Nice to know, Will. With the approach I described you get the same luxury, as
long as you are running in stand-alone mode, which is ideal for development.

On Fri, Dec 2, 2011 at 10:02 PM, Will L wrote:

>
>
> I got the setup working under my laptop running OS X Snow Leopard without
> any problems and I would like to use my new laptop running OS X Lion.
>
> The plugin is helpful in that I can see hadoop output being dumped to the
> eclipse console and it used to integrate well with the Eclipse IDE making my
> development life a little easier.
>
> Thank you for your time and help.
>
> Sincerely,
>
> Will Lieu


How do I programmatically get total job execution time?

2011-12-02 Thread W.P. McNeill
After my Hadoop job has successfully completed I'd like to log the total
amount of time it took. This is the "Finished in" statistic in the web UI.
How do I get this number programmatically? Is there some way I can query
the Job object? I didn't see anything in the API documentation.


Re: How do I programmatically get total job execution time?

2011-12-02 Thread Tom Melendez
On Fri, Dec 2, 2011 at 9:57 AM, W.P. McNeill  wrote:
> After my Hadoop job has successfully completed I'd like to log the total
> amount of time it took. This is the "Finished in" statistic in the web UI.
> How do I get this number programmatically? Is there some way I can query
> the Job object? I didn't see anything in the API documentation.

This probably *doesn't* help you, but if you're using (or planning on
using) Oozie, it has a RESTful API that can give you this information.

Thanks,

Tom


Re: Help with Hadoop Eclipse Plugin on Mac OS X Lion

2011-12-02 Thread Jignesh Patel
I am running the Eclipse plugin on OS X Lion with Eclipse 3.7.

Take the plugin from the contrib folder and drop it into your Eclipse plugins
folder. If that doesn't work, remove Eclipse and reinstall a fresh version.

-Jignesh

On Dec 2, 2011, at 11:59 AM, Prashant Sharma wrote:

> nice to know Will, well the way i said you have the same luxury as far as
> you are running in stand-alone mode which is ideal for development.



Re: How do I programmatically get total job execution time?

2011-12-02 Thread Harsh J
I remember hitting this once in 0.20; it seems like an API limitation. The 
resolution we took back then was to get a list of all tasks and use the 
completion time of the last task to finish as the end time (sort and pick). There may be 
other ways though - others can comment on that perhaps (metrics? job-history?)
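
The sort-and-pick idea above can be sketched roughly as follows. This is only an illustration in plain Java: TaskTiming and JobElapsed are hypothetical names, not Hadoop classes, and the timestamps are assumed to have been collected from the per-task reports beforehand. Using the earliest task start as the job's start is also an assumption; the job's own launch time could be substituted if it is available.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for one task's timing (epoch milliseconds).
// In a real job these values would have to be read from the task reports.
final class TaskTiming {
    final long startMillis;
    final long finishMillis;

    TaskTiming(long startMillis, long finishMillis) {
        this.startMillis = startMillis;
        this.finishMillis = finishMillis;
    }
}

final class JobElapsed {
    // Elapsed time = finish time of the last task to end
    // minus start time of the first task to begin.
    static long elapsedMillis(List<TaskTiming> tasks) {
        if (tasks.isEmpty()) {
            throw new IllegalArgumentException("no task timings given");
        }
        long firstStart = Long.MAX_VALUE;
        long lastFinish = Long.MIN_VALUE;
        for (TaskTiming t : tasks) {
            firstStart = Math.min(firstStart, t.startMillis);
            lastFinish = Math.max(lastFinish, t.finishMillis);
        }
        return lastFinish - firstStart;
    }

    public static void main(String[] args) {
        List<TaskTiming> tasks = Arrays.asList(
                new TaskTiming(1000L, 4000L),
                new TaskTiming(1500L, 9000L),
                new TaskTiming(1200L, 7000L));
        System.out.println("elapsed ms: " + elapsedMillis(tasks));
    }
}
```

A single pass for the minimum and maximum avoids the sort, which is only needed if the full ordering of tasks is of interest.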

On 02-Dec-2011, at 11:27 PM, W.P. McNeill wrote:

> After my Hadoop job has successfully completed I'd like to log the total
> amount of time it took. This is the "Finished in" statistic in the web UI.
> How do I get this number programmatically? Is there some way I can query
> the Job object? I didn't see anything in the API documentation.



Re: How do I programmatically get total job execution time?

2011-12-02 Thread Raj V
As Harsh said, I don't think there is a simple way to find when the job 
ended, especially after the job is completed. 

But can't you just wait for your job to complete and log the time when the job 
completed? 

Raj
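
A minimal sketch of the wait-and-log suggestion, assuming the job is started through some blocking call from the driver. TimedRun is a hypothetical helper, not part of Hadoop; the Callable stands in for whatever call blocks until the job is done.

```java
import java.util.concurrent.Callable;

// Hypothetical helper: runs a blocking call and records how long it took.
final class TimedRun {
    final boolean succeeded;
    final long elapsedMillis;

    private TimedRun(boolean succeeded, long elapsedMillis) {
        this.succeeded = succeeded;
        this.elapsedMillis = elapsedMillis;
    }

    static TimedRun time(Callable<Boolean> blockingJob) {
        // nanoTime is monotonic, so the result is not affected by clock changes.
        long start = System.nanoTime();
        boolean ok;
        try {
            ok = blockingJob.call();
        } catch (Exception e) {
            throw new RuntimeException("job run failed", e);
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
        return new TimedRun(ok, elapsedMillis);
    }

    public static void main(String[] args) {
        // Stand-in for a real job: sleep briefly and report success.
        TimedRun run = time(() -> {
            Thread.sleep(100);
            return true;
        });
        System.out.println("succeeded=" + run.succeeded
                + " elapsed ms=" + run.elapsedMillis);
    }
}
```

With the new MapReduce API the callable could be something like `() -> job.waitForCompletion(true)`. Note that this measures wall-clock time as seen by the client, including submission overhead, so it may differ slightly from the "Finished in" figure in the web UI.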



>
> From: Harsh J 
>To: common-user@hadoop.apache.org 
>Sent: Friday, December 2, 2011 12:53 PM
>Subject: Re: How do I programmatically get total job execution time?
> 
>I remember hitting this once in 0.20 - seems like an API limitation. The 
>resolution we took back then was to get a list of all tasks, and get the end 
>time with the last ended task's completion time (sort and pick). There may be 
>other ways though - others can comment on that perhaps (metrics? job-history?)
>
>On 02-Dec-2011, at 11:27 PM, W.P. McNeill wrote:
>
>> After my Hadoop job has successfully completed I'd like to log the total
>> amount of time it took. This is the "Finished in" statistic in the web UI.
>> How do I get this number programmatically? Is there some way I can query
>> the Job object? I didn't see anything in the API documentation.

RE: Hadoop-streaming using binary executable c program

2011-12-02 Thread Daniel Yehdego





Hi.

I was trying to run Hadoop streaming, and before that I checked the mapper 
with the following:

  bin/hadoop fs -cat \
    /user/yehdego/Hadoop-Data-New/RF00171_A.bpseqL3G1_seg_Optimized_Method.txt \
    | head -2 | ./HADOOP

where HADOOP is a shell script:

  #!/bin/sh
  rm -f temp.txt;
  while read line
  do
    echo $line >> temp.txt;
  done
  exec /data/yehdego/hadoop-0.20.2/PKNOTSRG/src/bin/pknotsRG -k o -F temp.txt;

That works, but when I try running it under streaming with the following:

  bin/hadoop jar /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar \
    -mapper ./HADOOP \
    -file /data/yehdego/hadoop-0.20.2/HADOOP \
    -file /data/yehdego/hadoop-0.20.2/PKNOTSRG/src/bin/pknotsRG \
    -reducer ./ReduceLatest.py \
    -file /data/yehdego/hadoop-0.20.2/ReduceLatest.py \
    -input /user/yehdego/Hadoop-Data-New/RF00171_A.bpseqL3G1_seg_Optimized_Method.txt \
    -output /user/yehdego/RF171_NEW/RF00171_A.bpseqL3G1_Optimized_Method40.txt \
    -verbose

it fails with the following error:

  PipeMapRed.waitOutputThreads(): subprocess failed with code 126
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

Any idea what causes this problem?
Regards, 
Regards, 

Daniel T. Yehdego
Computational Science Program 
University of Texas at El Paso, UTEP 
dtyehd...@miners.utep.edu

> From: ev...@yahoo-inc.com
> To: common-user@hadoop.apache.org
> Date: Mon, 25 Jul 2011 14:47:34 -0700
> Subject: Re: Hadoop-streaming using binary executable c program
> 
> This is likely to be slow and it is not ideal.  The ideal would be to modify 
> pknotsRG to be able to read from stdin, but that may not be possible.
> 
> The shell script would probably look something like the following
> 
> #!/bin/sh
> rm -f temp.txt;
> while read line
> do
>   echo $line >> temp.txt;
> done
> exec pknotsRG temp.txt;
> 
> Place it in a file say hadoopPknotsRG  Then you probably want to run
> 
> chmod +x hadoopPknotsRG
> 
> After that you want to test it with
> 
> hadoop fs -cat 
> /user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt | head -2 | 
> ./hadoopPknotsRG
> 
> If that works then you can try it with Hadoop streaming
> 
> HADOOP_HOME$ bin/hadoop jar 
> /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar -mapper 
> ./hadoopPknotsRG -file /data/yehdego/hadoop-0.20.2/pknotsRG -file 
> /data/yehdego/hadoop-0.20.2/hadoopPknotsRG -input 
> /user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output 
> /user/yehdego/RF-out -reducer NONE -verbose
> 
> --Bobby
> 
> On 7/25/11 3:37 PM, "Daniel Yehdego"  wrote:
> 
> 
> 
> Good afternoon Bobby,
> 
> Thanks, you gave me a great help in finding out what the problem was. After I 
> put the command line you suggested me, I found out that there was a 
> segmentation error.
> The binary executable program pknotsRG only reads a file with a sequence in 
> it. This means, there should be a shell script, as you have said, that will 
> take the data coming
> from stdin and write it to a temporary file. Any idea on how to do this job 
> in shell script. The thing is I am from a biology background and don't have 
> much experience in CS.
> looking forward to hear from you. Thanks so much.
> 
> Regards,
> 
> Daniel T. Yehdego
> Computational Science Program
> University of Texas at El Paso, UTEP
> dtyehd...@miners.utep.edu
> 
> > From: ev...@yahoo-inc.com
> > To: common-user@hadoop.apache.org
> > Date: Fri, 22 Jul 2011 12:39:08 -0700
> > Subject: Re: Hadoop-streaming using binary executable c program
> >
> > I would suggest that you do the following to help you debug.
> >
> > hadoop fs -cat 
> > /user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt | head -2 
> > | /data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG -
> >
> > This is simulating what hadoop streaming is doing.  Here we are taking the 
> > first 2 lines out of the input file and feeding them to the stdin of 
> > pknotsRG.  The first step is to make sure that you can get your program to 
> > run correctly with something like this.  You may need to change the command 
> > line to pknotsRG to get it to read the data it is processing from stdin, 
> > instead of from a file.  Alternatively you may need to write a shell script 
> > that will take the data coming from stdin.  Write it to a file and then 
> > call pknotsRG on that temporary file.  Once you have this working then you 
> > should try it again with streaming.
> >
> > --Bobby Evans
> >
> > On 7/22/11 12:31 PM, "Daniel Yehdego"  wrote:
> >
> >
> >
> > Hi Bobby, Thanks f

RE: Help with Hadoop Eclipse Plugin on Mac OS X Lion

2011-12-02 Thread Will L

What version of Hadoop are you running on OS X Lion, and are you running the 
32-bit or 64-bit version of Eclipse?

> Subject: Re: Help with Hadoop Eclipse Plugin on Mac OS X Lion
> From: jign...@websoft.com
> Date: Fri, 2 Dec 2011 14:37:28 -0500
> To: common-user@hadoop.apache.org
> 
> I am running eclipse plugin in Lion OS X on eclipse 3.7.
> 
> Take the plugin from contrib folder in dump to your eclipse plugin library. 
> If doesn't work remove eclipse and reinstall a fresh version.
> 
> -Jignesh
  

Re: How do I programmatically get total job execution time?

2011-12-02 Thread Praveen Sripati
Hi,

I ran a job using the new MR API in stand-alone mode on 0.21. Both
Job#getFinishTime and Job#getStartTime return 0. I am not sure whether this
is a bug.

Thanks,
Praveen

On Sat, Dec 3, 2011 at 6:14 AM, Raj V  wrote:

> As Harsh said, I don't think there is a simple way to way to find when the
> job ended, especially after the job is completed.
>
> But cant you just wait for your job to complete and log the time when the
> job completed?
>
> Raj


RE: Help with Hadoop Eclipse Plugin on Mac OS X Lion

2011-12-02 Thread Will L


I am using 64-Bit Eclipse 3.7.1 Cocoa with Hadoop 0.20.205.0. I get the 
following error message:
An internal error occurred during: "Connecting to DFS localhost".
org/apache/commons/configuration/Configuration 
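
The bare, slash-separated class name in that message is the form a NoClassDefFoundError usually takes, which would suggest that the class could not be loaded where the plugin runs. A small check like the one below can at least tell whether a given class is visible on an ordinary Java classpath. ClasspathCheck is a hypothetical helper, and it only inspects the JVM it runs in; it says nothing directly about the class loader Eclipse uses for the plugin.

```java
// Hypothetical helper: reports whether a class can be found by the
// class loader that loaded this helper.
final class ClasspathCheck {
    static boolean isLoadable(String className) {
        try {
            // initialize=false: look the class up without running its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The name from the error message, with slashes replaced by dots.
        String name = "org.apache.commons.configuration.Configuration";
        System.out.println(name + " loadable: " + isLoadable(name));
    }
}
```

Running it with the same jars that the plugin is expected to see gives a quick hint as to whether the jar providing that class is missing.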

> From: seventeen_reas...@hotmail.com
> To: common-user@hadoop.apache.org
> Subject: RE: Help with Hadoop Eclipse Plugin on Mac OS X Lion
> Date: Fri, 2 Dec 2011 20:51:02 -0800
> 
> 
> What version of Hadoop are you running on OS X Lion and are you running 
> 32-bit or 64-bit version of Eclipse?
  

Re: Availability of Job traces or logs

2011-12-02 Thread Amar Kamat
Arun,
You can very well run synthetic workloads like large scale sort, wordcount etc 
or more realistic workloads like PigMix 
(https://cwiki.apache.org/confluence/display/PIG/PigMix). On a decent enough 
cluster, these workloads work pretty well. Is there a specific reason why you 
want traces of varied sizes from various organizations?

> How can i make sure that the rumen generates only say 25 jobs,50 jobs or so
Do you want to get 25/50 jobs based on some filtering criterion? I recently 
faced a similar situation where I wanted to extract jobs from a Rumen trace 
based on job ids. I will be happy to share these filtering tools.

Amar


On 12/1/11 8:48 AM, "ArunKumar"  wrote:

Hi guys !

Apart from generating the job traces from RUMEN , can i get logs or job
traces of varied sizes from some organizations.

How can i make sure that the rumen generates only say 25 jobs,50 jobs or so
?


Thanks,
Arun

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Availability-of-Job-traces-or-logs-tp3550462p3550462.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.



Re: Capturing Map and Reduce I/O time

2011-12-02 Thread Amar Kamat
Arun,
> I see that hadoop doesn't capture the Map task I/O time and Reduce task I/O 
> time and captures only map runtime
> and reduce runtime. Am i right ?
For maps, the framework doesn't explicitly capture the read time. For reduce, 
maybe shuffle time is a good metric to start with.

> What does that runtime of Map and reduce tasks mean ?
Time to finish the entire map task (not the method). Includes data read, data 
processing, sort and spill.

> Which files do i need to look at and modify in Hadoop if i want to capture 
> the map and reduce I/O time's ?
For the old codebase (pre YARN), see MapTask.java and ReduceTask.java.

Roughly, the map task is divided into 2 phases, i.e. map and sort. In the map 
phase, reading and processing happen in parallel: while the user code 
processes the current key-value pair, the framework reads and caches the next 
key-value pair. Hence it's tough to distinguish between the read and process 
phases.

The reduce task is divided into 3 phases, i.e. shuffle, sort (final) and reduce. 
The shuffle phase has data copy (over the network) and sort (or rather merge) 
happening in parallel. Once all the data has been copied, a final merge happens; 
this gets captured under the sort phase. Still, the shuffle phase time 
(recorded in the job history) is a good indicator of the time it takes to read 
the data off the network.

Amar
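
The reduce-side breakdown described above can be turned into per-phase durations with simple arithmetic. The sketch below assumes that four timestamps are available for a reduce attempt (task start, shuffle finished, sort finished, task finish), for example as read from the job history. ReducePhases and its field names are hypothetical, not Hadoop classes.

```java
// Hypothetical holder for the duration of each reduce phase, in milliseconds.
final class ReducePhases {
    final long shuffleMillis;
    final long sortMillis;
    final long reduceMillis;

    private ReducePhases(long shuffleMillis, long sortMillis, long reduceMillis) {
        this.shuffleMillis = shuffleMillis;
        this.sortMillis = sortMillis;
        this.reduceMillis = reduceMillis;
    }

    // All arguments are epoch milliseconds and must be in non-decreasing order.
    static ReducePhases fromTimestamps(long start, long shuffleFinished,
                                       long sortFinished, long finish) {
        if (start > shuffleFinished || shuffleFinished > sortFinished
                || sortFinished > finish) {
            throw new IllegalArgumentException("timestamps out of order");
        }
        return new ReducePhases(
                shuffleFinished - start,         // copy plus in-flight merge
                sortFinished - shuffleFinished,  // final merge
                finish - sortFinished);          // user reduce code plus output write
    }

    public static void main(String[] args) {
        ReducePhases p = fromTimestamps(1000L, 4000L, 4500L, 9000L);
        System.out.println("shuffle=" + p.shuffleMillis
                + " sort=" + p.sortMillis
                + " reduce=" + p.reduceMillis);
    }
}
```

As noted above, the shuffle figure mixes network copy with merging, so it is an indicator of network read time rather than an exact measurement.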

On 11/29/11 7:56 PM, "ArunKumar"  wrote:

Hi guys !

I see that hadoop doesn't capture the Map task I/O time and Reduce task I/O
time and captures only map runtime  and reduce runtime. Am i right ?

By I/O time for map task i meant time taken by the map task to read the
input chunk allocated to it for processing and the time for it to write the
O/P data to the local disk.
By I/O time for Reduce task i meant time for reduce task to transfer map
O/Ps to reduce task(shuffle phase) and writing reduce O/Ps to DFS.

> What does that runtime of Map and reduce tasks mean ?
   Does it mean time taken to execute the Map method and reduce method
respectively ? (or)
   Does it mean time taken from the start of the Map/Reduce task to the
completion of the Map/Reduce task(i.e including time to read,sort ,compute
map or reduce ,merge,etc.) ?

> Which files do i need to look at and modify in Hadoop if i want to capture
> the map and reduce I/O time's ?

>  If i want to capture these values for few jobs of applications like
> wordcount,sort,etc. what is the best way to do ?

Can anyone guide me in this regard ?

Thanks,
Arun

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Capturing-Map-and-Reduce-I-O-time-tp3545298p3545298.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.