RE: Hadoop-streaming using binary executable c program

2011-12-02 Thread Daniel Yehdego





Hi.

Before running Hadoop streaming, I checked the mapper locally with the 
following:

bin/hadoop fs -cat /user/yehdego/Hadoop-Data-New/RF00171_A.bpseqL3G1_seg_Optimized_Method.txt |
  head -2 | ./HADOOP

where HADOOP is a shell script:

#!/bin/sh
rm -f temp.txt;
while read line
do
  echo $line >> temp.txt;
done
exec /data/yehdego/hadoop-0.20.2/PKNOTSRG/src/bin/pknotsRG -k o -F temp.txt;

That works, but when I try running it under streaming with the following:

bin/hadoop jar /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar \
  -mapper ./HADOOP \
  -file /data/yehdego/hadoop-0.20.2/HADOOP \
  -file /data/yehdego/hadoop-0.20.2/PKNOTSRG/src/bin/pknotsRG \
  -reducer ./ReduceLatest.py \
  -file /data/yehdego/hadoop-0.20.2/ReduceLatest.py \
  -input /user/yehdego/Hadoop-Data-New/RF00171_A.bpseqL3G1_seg_Optimized_Method.txt \
  -output /user/yehdego/RF171_NEW/RF00171_A.bpseqL3G1_Optimized_Method40.txt \
  -verbose

it fails with the following error:

PipeMapRed.waitOutputThreads(): subprocess failed with code 126
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

Any idea what causes this problem?
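
An editorial aid for reading the number in the error above, assuming the code that streaming reports is the plain exit status of the subprocess. By shell convention, 126 means the command was found but could not be executed, 127 means it was not found, and values above 128 mean the process was killed by a signal (status minus 128). The helper name explain_status is hypothetical and is not a Hadoop command.

```sh
#!/bin/sh
# explain_status: hypothetical helper that turns an exit status into a hint.
explain_status() {
  status=$1
  if [ "$status" -eq 0 ]; then
    echo "success"
  elif [ "$status" -eq 126 ]; then
    echo "found but not executable (check chmod +x and the #! line)"
  elif [ "$status" -eq 127 ]; then
    echo "command not found"
  elif [ "$status" -gt 128 ]; then
    # On Linux, signal 11 is SIGSEGV, so 139 points to a segmentation fault.
    echo "killed by signal $((status - 128))"
  else
    echo "program exited with error $status"
  fi
}

explain_status 126
explain_status 139
```

With status 126, things worth checking on the task node are the execute bit on the script and on the binary, and whether the absolute path in the exec line exists there. This is a suggestion of where to look, not a confirmed diagnosis.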
Regards, 

Daniel T. Yehdego
Computational Science Program 
University of Texas at El Paso, UTEP 
dtyehd...@miners.utep.edu


RE: Hadoop-streaming using binary executable c program

2011-08-01 Thread Daniel Yehdego

Hi Bobby, 

I have written a small Perl script that does the following job:

Assume we have an output from the mapper

MAP1
RNA-1
STRUCTURE-1

MAP2
RNA-2
STRUCTURE-2

MAP3
RNA-3
STRUCTURE-3

The script should reduce this in the following manner:
RNA-1RNA-2RNA-3\tSTRUCTURE-1STRUCTURE-2STRUCTURE-3\n
The script looks like this:

#!/usr/bin/perl
use strict;
use warnings;
use autodie;

my @handles = map { open my $h, '<', $_; $h } @ARGV;

while (@handles){
    @handles = grep { ! eof $_ } @handles;
    my @lines = map { my $v = <$_>; chomp $v; $v } @handles;
    print join(' ', @lines), "\n";
}

close $_ for @handles;

This should work for any input from the mapper. But when I ran Hadoop 
streaming with the above code as my reducer, the job was successful 
but the output files were empty, and I couldn't find out why.

bin/hadoop jar /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar \
  -mapper ./hadoopPknotsRG \
  -file /data/yehdego/hadoop-0.20.2/pknotsRG \
  -file /data/yehdego/hadoop-0.20.2/hadoopPknotsRG \
  -reducer ./reducer.pl \
  -file /data/yehdego/hadoop-0.20.2/reducer.pl \
  -input /user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt \
  -output /user/yehdego/RFR2-out -verbose
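
One possible explanation for the empty output, offered as an assumption rather than a confirmed diagnosis: Hadoop streaming hands the reducer its input on standard input, whereas the Perl script opens the files named in @ARGV. With no arguments, @handles is empty, the loop body never runs, and nothing is printed. The sketch below reads from stdin instead. It assumes the mapper emits one RNA\tSTRUCTURE record per line and that the records arrive in the wanted order; the function name join_records is hypothetical.

```sh
#!/bin/sh
# join_records: hypothetical stdin-based reducer.
# Reads tab-separated "key<TAB>value" lines and emits one line holding
# all keys concatenated, a tab, and all values concatenated.
join_records() {
  awk -F '\t' '
    { keys = keys $1; values = values $2 }
    END { if (NR > 0) printf "%s\t%s\n", keys, values }
  '
}

# Demo with made-up records, so the sketch runs without Hadoop.
printf 'RNA-1\tSTRUCTURE-1\nRNA-2\tSTRUCTURE-2\nRNA-3\tSTRUCTURE-3\n' |
  join_records
```

Saved as an executable script, something like this could be passed to -reducer in place of reducer.pl. The shuffle still sorts records by key first, so the concatenation order follows the sorted keys, not the original input order.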

Any help or suggestion is really appreciated. I am just stuck here for the 
weekend.
 
Regards, 

Daniel T. Yehdego
Computational Science Program 
University of Texas at El Paso, UTEP 
dtyehd...@miners.utep.edu


Re: Hadoop-streaming using binary executable c program

2011-07-28 Thread Robert Evans
I am not completely sure what you are getting at.  It looks like the output of 
your C program is as follows (and this is just a guess).  NOTE: \t stands for 
the tab character, which streaming uses to separate the key from the value; \n 
stands for the newline character, which separates individual records.
RNA-1\tSTRUCTURE-1\n
RNA-2\tSTRUCTURE-2\n
RNA-3\tSTRUCTURE-3\n
...


And you want the output to look like
RNA-1RNA-2RNA-3\tSTRUCTURE-1STRUCTURE-2STRUCTURE-3\n

You could use a reduce to do this, but the issue here is with the shuffle in 
between the maps and the reduces.  The Shuffle will group by the key to send to 
the reducers and then sort by the key.  So in reality your map output looks 
something like

FROM MAP 1:
RNA-1\tSTRUCTURE-1\n
RNA-2\tSTRUCTURE-2\n

FROM MAP 2:
RNA-3\tSTRUCTURE-3\n
RNA-4\tSTRUCTURE-4\n

FROM MAP 3:
RNA-5\tSTRUCTURE-5\n
RNA-6\tSTRUCTURE-6\n

If you send it to a single reducer (the only way to get a single file), then 
the input to the reducer will be sorted alphabetically by the RNA, and the 
order of the input will be lost.  You can work around this by giving each line 
a unique number that is in the order you want it to be output, but doing this 
would require you to write some code.  I would suggest that you do it with a 
small shell script after all the maps have completed to splice them together.
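
The numbering workaround can be sketched locally as follows. This is an illustration under assumptions, not a tested job: sort stands in for the shuffle, the records are made up, and the helper names add_order_key and strip_order_key are hypothetical. The number is zero-padded so that sorting it as text gives the same order as sorting it as a number.

```sh
#!/bin/sh
# add_order_key: prefix each record with a zero-padded sequence number.
add_order_key() {
  awk '{ printf "%010d\t%s\n", NR, $0 }'
}

# strip_order_key: drop the first tab-separated field again.
strip_order_key() {
  cut -f 2-
}

# Made-up records whose alphabetical order differs from their input order.
printf 'RNA-b\tS1\nRNA-c\tS2\nRNA-a\tS3\n' |
  add_order_key |
  LC_ALL=C sort |
  strip_order_key
```

The records come out in their original order even though RNA-a sorts first alphabetically. In a real job each mapper sees only its own part of the input, so the number would have to be globally ordered, for example by adding it to the input file before the job runs.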

--
Bobby


RE: Hadoop-streaming using binary executable c program

2011-07-27 Thread Daniel Yehdego

Hi Bobby, 

I just want to ask you if there is a way of using a reducer, or something like 
concatenation, to glue together the outputs from my mappers and output 
them as a single file and a single segment of the predicted RNA 2D structure?

FYI: I have used -reducer NONE before:

HADOOP_HOME$ bin/hadoop jar
/data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar -mapper
./hadoopPknotsRG -file /data/yehdego/hadoop-0.20.2/pknotsRG -file
/data/yehdego/hadoop-0.20.2/hadoopPknotsRG -input
/user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output
/user/yehdego/RF-out -reducer NONE -verbose

A sample of my output from the mapper on two different slave nodes looks 
like this:

AUACCCGCAAAUUCACUCAAAUCUGUAAUAGGUUUGUCAUUCAAAUCUAGUGCAAAUAUUACUUUCGCCAAUUAGGUAUAAUAAUGGUAAGC
and
[...(((...))).].
  (-13.46)

GGGACAAGACUCGACAUUUGAUACACUAUUUAUCAAUGGAUGUCUUCU
.(((.((......)..  (-11.00)

I want to concatenate them and output a single predicted RNA sequence 
structure:

AUACCCGCAAAUUCACUCAAAUCUGUAAUAGGUUUGUCAUUCAAAUCUAGUGCAAAUAUUACUUUCGCCAAUUAGGUAUAAUAAUGGUAAGCGGGACAAGACUCGACAUUUGAUACACUAUUUAUCAAUGGAUGUCUUCU
   

[...(((...))).]..(((.((......)..
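
The concatenation described above can be sketched as a small post-processing step. The sketch rests on assumptions about the mapper output that are taken only from the sample in this message: a sequence line holds nothing but A, C, G and U, a structure starts with a dot or a bracket and contains no digits, and the energy in parentheses can be dropped. The function name splice_predictions is hypothetical, and the demo input is shortened, made-up data.

```sh
#!/bin/sh
# splice_predictions: hypothetical helper that joins all sequences into
# one line and all structures into a second line, in input order.
splice_predictions() {
  awk '
    # A first field made only of A, C, G, U is taken to be a sequence.
    $1 ~ /^[ACGU]+$/ { seq = seq $1; next }
    # A first field that starts with a dot or bracket and has no digits
    # is taken to be a structure; energies such as (-13.46) are skipped.
    NF > 0 && index(".([{<", substr($1, 1, 1)) > 0 && $1 !~ /[0-9]/ {
      struct = struct $1
    }
    END { print seq; print struct }
  '
}

# Shortened, made-up input in the same layout as the sample output.
splice_predictions <<'EOF'
AUACC
[...(((...))).].
  (-13.46)

GGGAC
.(((.((......)..  (-11.00)
EOF
```

On the cluster the input could come from something like hadoop fs -cat /user/yehdego/RF-out/part-* piped into the function, assuming the part files list in the order the segments should be joined.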
  


Regards, 

Daniel T. Yehdego
Computational Science Program 
University of Texas at El Paso, UTEP 
dtyehd...@miners.utep.edu


RE: Hadoop-streaming using binary executable c program

2011-07-26 Thread Daniel Yehdego

Good afternoon Bobby, 

Thanks so much, now it's working excellently, and the speed is also reasonable. 
Once again, thank you.

Regards, 

Daniel T. Yehdego
Computational Science Program 
University of Texas at El Paso, UTEP 
dtyehd...@miners.utep.edu


RE: Hadoop-streaming using binary executable c program

2011-07-25 Thread Daniel Yehdego

Good afternoon Bobby, 

Thanks, you were a great help in finding out what the problem was. After I 
ran the command line you suggested, I found out that there was a 
segmentation error.
The binary executable program pknotsRG only reads a file with a sequence in it. 
This means there should be a shell script, as you have said, that will take 
the data coming from stdin and write it to a temporary file. Any idea how to 
do this in a shell script? The thing is, I am from a biology background and 
don't have much experience in CS.
Looking forward to hearing from you. Thanks so much.

Regards, 

Daniel T. Yehdego
Computational Science Program 
University of Texas at El Paso, UTEP 
dtyehd...@miners.utep.edu

 From: ev...@yahoo-inc.com
 To: common-user@hadoop.apache.org
 Date: Fri, 22 Jul 2011 12:39:08 -0700
 Subject: Re: Hadoop-streaming using binary executable c program
 
 I would suggest that you do the following to help you debug.
 
 hadoop fs -cat 
 /user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt | head -2 | 
 /data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG -
 
 This is simulating what hadoop streaming is doing.  Here we are taking the 
 first 2 lines out of the input file and feeding them to the stdin of 
 pknotsRG.  The first step is to make sure that you can get your program to 
 run correctly with something like this.  You may need to change the command 
 line to pknotsRG to get it to read the data it is processing from stdin, 
 instead of from a file.  Alternatively you may need to write a shell script 
 that will take the data coming from stdin, write it to a file, and then call 
 pknotsRG on that temporary file.  Once you have this working then you should 
 try it again with streaming.
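
The same local check can be extended to a whole map and reduce pipeline. This is a sketch under assumptions: sort is only a local stand-in for the shuffle, and simulate_streaming is a hypothetical helper, not a Hadoop command. The demo uses cat for both stages so that it runs without Hadoop or pknotsRG.

```sh
#!/bin/sh
# simulate_streaming: hypothetical helper.
# usage: simulate_streaming MAPPER REDUCER < input
simulate_streaming() {
  mapper=$1
  reducer=$2
  # sort approximates the shuffle, which orders map output by key.
  "$mapper" | LC_ALL=C sort | "$reducer"
}

# Stand-in mapper and reducer; the records are made up.
printf 'b\t2\na\t1\n' | simulate_streaming cat cat
```

On the real data the call might look like hadoop fs -cat on the input file, piped through head -2 and then into simulate_streaming with the actual mapper and reducer scripts as arguments. For a job run with -reducer NONE, only the mapper stage applies.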
 
 --Bobby Evans
 
 On 7/22/11 12:31 PM, Daniel Yehdego dtyehd...@miners.utep.edu wrote:
 
 
 
 Hi Bobby, Thanks for the response.
 
 After I tried the following command:
 
 bin/hadoop jar $HADOOP_HOME/hadoop-0.20.2-streaming.jar -mapper 
 /data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG -  -file 
 /data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG  -reducer NONE -input 
 /user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output 
 /user/yehdego/RF-out -verbose
 
 I got the following stderr log:
 
 java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed 
 with code 139
 at 
 org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
 at 
 org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
 at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
 at 
 org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
 at org.apache.hadoop.mapred.Child.main(Child.java:170)
 
 
 
 syslog logs
 
 2011-07-22 13:02:27,467 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
 Initializing JVM Metrics with processName=MAP, sessionId=
 2011-07-22 13:02:27,913 INFO org.apache.hadoop.mapred.MapTask: 
 numReduceTasks: 0
 2011-07-22 13:02:28,149 INFO org.apache.hadoop.streaming.PipeMapRed: 
 PipeMapRed exec 
 [/data/yehdego/hadoop_tmp/dfs/local/taskTracker/jobcache/job_201107181535_0079/attempt_201107181535_0079_m_00_0/work/./pknotsRG]
 2011-07-22 13:02:28,242 INFO org.apache.hadoop.streaming.PipeMapRed: 
 R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
 2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: 
 MROutputThread done
 2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: 
 MRErrorThread done
 2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: 
 PipeMapRed failed!
 2011-07-22 13:02:28,361 WARN org.apache.hadoop.mapred.TaskTracker: Error 
 running child
 java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed 
 with code 139
 at 
 org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
 at 
 org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
 at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
 at 
 org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
 at org.apache.hadoop.mapred.Child.main(Child.java:170)
 2011-07-22 13:02:28,395 INFO org.apache.hadoop.mapred.TaskRunner: Runnning 
 cleanup for the task
 
 
 
 Regards,
 
 Daniel T. Yehdego
 Computational Science Program
 University of Texas at El Paso, UTEP
 dtyehd...@miners.utep.edu
 
  From: ev...@yahoo-inc.com
  To: common-user@hadoop.apache.org; dtyehd...@miners.utep.edu
  Date: Fri, 22 Jul 2011 09:12:18

Re: Hadoop-streaming using binary executable c program

2011-07-25 Thread Robert Evans
This is likely to be slow and it is not ideal.  The ideal would be to modify 
pknotsRG to be able to read from stdin, but that may not be possible.

The shell script would probably look something like the following

#!/bin/sh
rm -f temp.txt;
while read line
do
  echo $line >> temp.txt;
done
exec pknotsRG temp.txt;

Place it in a file, say hadoopPknotsRG.  Then you probably want to run

chmod +x hadoopPknotsRG

After that you want to test it with

hadoop fs -cat 
/user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt | head -2 | 
./hadoopPknotsRG

If that works then you can try it with Hadoop streaming

HADOOP_HOME$ bin/hadoop jar 
/data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar -mapper 
./hadoopPknotsRG -file /data/yehdego/hadoop-0.20.2/pknotsRG -file 
/data/yehdego/hadoop-0.20.2/hadoopPknotsRG -input 
/user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output 
/user/yehdego/RF-out -reducer NONE -verbose
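
A variant of the wrapper idea, sketched under assumptions: it relies on mktemp, which is widely available but not guaranteed on every system, and run_on_tempfile is a hypothetical helper name. Copying stdin with cat avoids the per-line read and echo loop, which can alter backslashes and whitespace, and a unique temporary file name avoids clashes if several tasks share a working directory.

```sh
#!/bin/sh
# run_on_tempfile: hypothetical helper.
# Copies stdin to a private temporary file, runs the given command with
# that file name appended as the last argument, then removes the file
# and returns the command's exit status.
run_on_tempfile() {
  tmp=$(mktemp) || return 1
  cat > "$tmp"
  "$@" "$tmp"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# cat stands in for pknotsRG here so the sketch runs anywhere.
printf 'line one\nline two\n' | run_on_tempfile cat
```

In the mapper script the last line would name the real binary instead of cat, for example run_on_tempfile ./pknotsRG, assuming the binary accepts the file name as its final argument as in the script above.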

--Bobby


Re: Hadoop-streaming using binary executable c program

2011-07-22 Thread Robert Evans
It looks like it tried to run your program and the program exited with a 1, not 
a 0.  What are the stderr logs like for the mappers that were launched?  You 
should be able to access them through the Web GUI.  You might want to add 
some stderr log messages to your C program too, so that you can see how far 
along it gets before exiting.
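
A rough sketch of the same kind of stderr logging, done from a small wrapper 
script rather than inside the C program. The function name run_logged and the 
message wording are made up for illustration. Whatever the mapper writes to 
stderr should show up in the task's stderr log.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop or pknotsRG): run a command,
# and report on stderr when it starts and which exit status it returned.
run_logged() {
    echo "mapper: starting: $*" >&2
    "$@"
    status=$?
    echo "mapper: exit status $status" >&2
    return "$status"
}

# Example (assumed invocation): run_logged ./pknotsRG "$@"
```

The wrapper passes the exit status through unchanged, so streaming still sees 
the same failure code as before.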

--Bobby Evans

On 7/22/11 9:19 AM, Daniel Yehdego dtyehd...@miners.utep.edu wrote:

I am trying to parallelize some very long RNA sequences for the sake of
predicting their RNA 2D structures. I am using a binary executable C
program called pknotsRG as my mapper. I tried the following bin/hadoop
command:

HADOOP_HOME$ bin/hadoop
jar /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar
-mapper /data/yehdego/hadoop-0.20.2/pknotsRG
-file /data/yehdego/hadoop-0.20.2/pknotsRG
-input /user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt
-output /user/yehdego/RF-out -reducer NONE -verbose

But I keep getting the following error message:

java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess
failed with code 1
at
org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
at
org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)

FYI: my input file is RF00028_B.bpseqL3G5_seg_Centered_Method.txt, which
is a chunk of RNA sequences. The mapper is expected to read the input
file line by line and output the predicted structure for each line of
sequence, for a specified number of maps. Any help on this problem is
really appreciated. Thanks.
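
Since streaming hands the input to the mapper on stdin, one way to get this 
line-by-line behaviour is a small wrapper loop. This is only a sketch: the 
function name map_each_line is made up, and it assumes the program accepts the 
path of a sequence file as its last argument, which needs to be checked 
against pknotsRG's real options.

```shell
#!/bin/sh
# Hypothetical helper: run a command once per non-empty stdin line,
# passing a temp file that holds just that line as the last argument.
map_each_line() {
    tmp=$(mktemp) || return 1
    while IFS= read -r line || [ -n "$line" ]; do
        [ -n "$line" ] || continue            # skip blank lines
        printf '%s\n' "$line" > "$tmp"
        # </dev/null keeps the command from consuming the remaining input
        "$@" "$tmp" < /dev/null || { status=$?; rm -f "$tmp"; return "$status"; }
    done
    rm -f "$tmp"
}

# Example (assumed invocation): map_each_line ./pknotsRG
```

The loop stops at the first failing line and returns that exit status, so a 
crash on one sequence is not hidden by later successful ones.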




RE: Hadoop-streaming using binary executable c program

2011-07-22 Thread Daniel Yehdego

Hi Bobby, Thanks for the response.

After I tried the following command:

bin/hadoop jar $HADOOP_HOME/hadoop-0.20.2-streaming.jar -mapper 
/data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG -  -file 
/data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG  -reducer NONE -input 
/user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output 
/user/yehdego/RF-out - verbose

I got the following stderr logs:

java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed 
with code 139
at 
org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
at 
org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)



syslog logs

2011-07-22 13:02:27,467 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
Initializing JVM Metrics with processName=MAP, sessionId=
2011-07-22 13:02:27,913 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2011-07-22 13:02:28,149 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed 
exec 
[/data/yehdego/hadoop_tmp/dfs/local/taskTracker/jobcache/job_201107181535_0079/attempt_201107181535_0079_m_00_0/work/./pknotsRG]
2011-07-22 13:02:28,242 INFO org.apache.hadoop.streaming.PipeMapRed: 
R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: 
MROutputThread done
2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: 
MRErrorThread done
2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed 
failed!
2011-07-22 13:02:28,361 WARN org.apache.hadoop.mapred.TaskTracker: Error 
running child
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed 
with code 139
at 
org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
at 
org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
2011-07-22 13:02:28,395 INFO org.apache.hadoop.mapred.TaskRunner: Runnning 
cleanup for the task
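
A note on reading a code like 139: shells report a process killed by a signal 
as 128 plus the signal number, and the code reported here appears to follow 
the same convention, so 139 would correspond to signal 11 (SIGSEGV, a 
segmentation fault). A small sketch for decoding such codes, with a made-up 
function name:

```shell
#!/bin/sh
# Hypothetical helper: turn a subprocess exit code into a short hint,
# following the usual shell conventions for 126, 127 and 128+signal.
describe_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "success"
    elif [ "$status" -eq 126 ]; then
        echo "command found but not executable"
    elif [ "$status" -eq 127 ]; then
        echo "command not found"
    elif [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    else
        echo "program reported error $status"
    fi
}

# Example: describe_status 139   prints "killed by signal 11"
```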



Regards, 

Daniel T. Yehdego
Computational Science Program 
University of Texas at El Paso, UTEP 
dtyehd...@miners.utep.edu

 From: ev...@yahoo-inc.com
 To: common-user@hadoop.apache.org; dtyehd...@miners.utep.edu
 Date: Fri, 22 Jul 2011 09:12:18 -0700
 Subject: Re: Hadoop-streaming using binary executable c program
 
 It looks like it tried to run your program and the program exited with a 1, 
 not a 0.  What are the stderr logs like for the mappers that were launched?  
 You should be able to access them through the Web GUI.  You might want to add 
 some stderr log messages to your C program too, so that you can see how far 
 along it gets before exiting.
 
 --Bobby Evans
 
Re: Hadoop-streaming using binary executable c program

2011-07-22 Thread Robert Evans
I would suggest that you do the following to help you debug.

hadoop fs -cat 
/user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt | head -2 | 
/data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG -

This simulates what Hadoop streaming does.  Here we take the first 2 lines of 
the input file and feed them to the stdin of pknotsRG.  The first step is to 
make sure that your program runs correctly with something like this.  You may 
need to change the pknotsRG command line so that it reads its data from stdin 
instead of from a file.  Alternatively, you may need to write a shell script 
that takes the data coming from stdin, writes it to a file, and then calls 
pknotsRG on that temporary file.  Once you have this working, try it again 
with streaming.
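
A minimal sketch of such a wrapper, under the assumption that pknotsRG takes 
the path of the sequence file as its last argument (the function name 
run_on_stdin_file is made up, and any extra flags the program needs would have 
to be added to the call):

```shell
#!/bin/sh
# Hypothetical wrapper body: save all of stdin to a temp file, then run
# the given command with that file's path appended as the last argument.
run_on_stdin_file() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"                    # copy everything from stdin
    "$@" "$tmp" < /dev/null         # run the program on the temp file
    status=$?
    rm -f "$tmp"
    return "$status"                # pass the program's exit status through
}

# Example (assumed invocation): run_on_stdin_file ./pknotsRG
```

A script built around this function can be put at the end of the pipeline 
above in place of the direct pknotsRG call, to check it locally before 
running it under streaming.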

--Bobby Evans

On 7/22/11 12:31 PM, Daniel Yehdego dtyehd...@miners.utep.edu wrote:



Hi Bobby, Thanks for the response.

After I tried the following command:

bin/hadoop jar $HADOOP_HOME/hadoop-0.20.2-streaming.jar -mapper 
/data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG -  -file 
/data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG  -reducer NONE -input 
/user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output 
/user/yehdego/RF-out - verbose

I got the following stderr logs:

java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed 
with code 139
at 
org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
at 
org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)



syslog logs

2011-07-22 13:02:27,467 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
Initializing JVM Metrics with processName=MAP, sessionId=
2011-07-22 13:02:27,913 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2011-07-22 13:02:28,149 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed 
exec 
[/data/yehdego/hadoop_tmp/dfs/local/taskTracker/jobcache/job_201107181535_0079/attempt_201107181535_0079_m_00_0/work/./pknotsRG]
2011-07-22 13:02:28,242 INFO org.apache.hadoop.streaming.PipeMapRed: 
R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: 
MROutputThread done
2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: 
MRErrorThread done
2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed 
failed!
2011-07-22 13:02:28,361 WARN org.apache.hadoop.mapred.TaskTracker: Error 
running child
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed 
with code 139
at 
org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
at 
org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
2011-07-22 13:02:28,395 INFO org.apache.hadoop.mapred.TaskRunner: Runnning 
cleanup for the task



Regards,

Daniel T. Yehdego
Computational Science Program
University of Texas at El Paso, UTEP
dtyehd...@miners.utep.edu
