input file order

2012-01-19 Thread Daniel Yehdego

Hi, 
I have 100 .txt input files and I want my mapper output to be ordered, 
following the order of the input files. I am not using any reducer. Any idea? 

Regards, 


  

Hadoop Streaming

2011-12-03 Thread Daniel Yehdego

I get the following error when running Hadoop streaming:
PipeMapRed.waitOutputThreads(): subprocess failed with code 126
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
I couldn't find any other error information.
Any help?
  

RE: Hadoop Streaming

2011-12-03 Thread Daniel Yehdego

Thanks Tom for your reply, 
I think my code is reading from stdin, because I tried it locally using the 
following command and it runs:
 $ bin/hadoop fs -cat /user/yehdego/Hadoop-Data-New/RF00171_A.bpseqL3G1_seg_Optimized_Method.txt | head -2 | ./HADOOP

But when I tried streaming, it failed and gave me the error code 126.

 Date: Sat, 3 Dec 2011 19:14:20 -0800
 Subject: Re: Hadoop Streaming
 From: t...@supertom.com
 To: common-user@hadoop.apache.org
 
 So that code 126 should be kicked out by your program - do you know
 what that means?
 
 Your code can read from stdin?
 
 Thanks,
 
 Tom
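
A note on the exit status being discussed: in POSIX shells, 126 conventionally means the command was found but could not be executed (for example, the file lacks execute permission), while 127 means it was not found. The sketch below reproduces a 126 locally with a throwaway script. Whether a permission problem is the actual cause in this thread is only an assumption, and the script contents are hypothetical.

```python
import os
import shlex
import stat
import subprocess
import tempfile


def exit_code_for_non_executable():
    """Run a script that lacks execute permission and return the shell's exit code."""
    # Hypothetical stand-in for a mapper script without the execute bit.
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as script:
        script.write("#!/bin/sh\necho hello\n")
        path = script.name
    try:
        # Read/write for the owner only: no execute bit for anyone.
        os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
        # Ask the shell to run the file by its path.
        result = subprocess.run(
            ["sh", "-c", shlex.quote(path)], stderr=subprocess.DEVNULL
        )
        return result.returncode
    finally:
        os.remove(path)
```

If this is the situation on the task nodes, checking the permissions of the script and of the binary it calls (and that the binary's path exists there) would be a reasonable first step.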
 
  

RE: Hadoop Streaming

2011-12-03 Thread Daniel Yehdego

Tom, 
What the HADOOP script does is read each line from STDIN and then execute the 
program pknotsRG. temp.txt is a temporary file.
The script looks like this: 
#!/bin/sh
rm -f temp.txt;
while read line
do
  echo $line >> temp.txt;
done
exec /data/yehdego/hadoop-0.20.2/PKNOTSRG/src/pknotsRG -k 0 -F temp.txt;
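
For comparison, the same wrapper idea can be sketched in Python: collect stdin into a temporary file, then run a program that only accepts a file argument. This is a minimal sketch, not the script used in this thread; the function name `run_on_tempfile` is hypothetical, and the command is passed in by the caller so nothing about pknotsRG's options is assumed here.

```python
import os
import subprocess
import tempfile


def run_on_tempfile(lines, command):
    """Write `lines` to a temporary file, run `command` with that file's path
    appended as the last argument, and return the program's stdout."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        for line in lines:
            # Normalise line endings so every record ends with one newline.
            tmp.write(line.rstrip("\n") + "\n")
        path = tmp.name
    try:
        result = subprocess.run(
            command + [path],
            stdout=subprocess.PIPE,
            universal_newlines=True,
            check=True,  # raise if the wrapped program fails
        )
        return result.stdout
    finally:
        # Unlike a fixed temp.txt, a unique file avoids clashes between tasks.
        os.remove(path)
```

A mapper built on this would call something like `sys.stdout.write(run_on_tempfile(sys.stdin, ["./pknotsRG", "-k", "0", "-F"]))`, with the option list taken from the shell script above.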

 Date: Sat, 3 Dec 2011 19:49:46 -0800
 Subject: Re: Hadoop Streaming
 From: t...@supertom.com
 To: common-user@hadoop.apache.org
 
 Hi Daniel,
 
 I see from your other thread that your HADOOP script has a line like:
 
 #!/bin/shrm -f temp.txt
 
 I'm not sure what that is, exactly.  I suspect the -f is reading from
 some file and the while loop you had listed read from stdin it seems.
 
 What does your input look like?  I think what's happening is that you
 might be expecting lines of input and you're getting splits.  What
 does your input look like?
 
 You might want to try this:
 -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat
 
 Thanks,
 
 Tom
 
 
 
 
  

RE: Hadoop-streaming using binary executable c program

2011-12-02 Thread Daniel Yehdego





Hi.

I was trying to run Hadoop streaming, and before that I checked the script 
with the following:
bin/hadoop fs -cat /user/yehdego/Hadoop-Data-New/RF00171_A.bpseqL3G1_seg_Optimized_Method.txt | head -2 | ./HADOOP 
where HADOOP is a shell script:
#!/bin/sh
rm -f temp.txt;
while read line
do
  echo $line >> temp.txt;
done
exec /data/yehdego/hadoop-0.20.2/PKNOTSRG/src/bin/pknotsRG -k o -F temp.txt;
This works, but when I try running it with streaming using the following:
 bin/hadoop jar /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar -mapper 
./HADOOP  -file /data/yehdego/hadoop-0.20.2/HADOOP -file 
/data/yehdego/hadoop-0.20.2/PKNOTSRG/src/bin/pknotsRG -reducer 
./ReduceLatest.py -file /data/yehdego/hadoop-0.20.2/ReduceLatest.py -input 
/user/yehdego/Hadoop-Data-New/RF00171_A.bpseqL3G1_seg_Optimized_Method.txt  
-output /user/yehdego/RF171_NEW/RF00171_A.bpseqL3G1_Optimized_Method40.txt 
-verbose 
it fails with the following error:
PipeMapRed.waitOutputThreads(): subprocess failed with code 126
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
Any idea on this problem?
Regards, 

Daniel T. Yehdego
Computational Science Program 
University of Texas at El Paso, UTEP 
dtyehd...@miners.utep.edu

 From: ev...@yahoo-inc.com
 To: common-user@hadoop.apache.org
 Date: Mon, 25 Jul 2011 14:47:34 -0700
 Subject: Re: Hadoop-streaming using binary executable c program
 
 This is likely to be slow and it is not ideal.  The ideal would be to modify 
 pknotsRG to be able to read from stdin, but that may not be possible.
 
 The shell script would probably look something like the following
 
 #!/bin/sh
 rm -f temp.txt;
 while read line
 do
   echo $line >> temp.txt;
 done
 exec pknotsRG temp.txt;
 
 Place it in a file, say hadoopPknotsRG. Then you probably want to run
 
 chmod +x hadoopPknotsRG
 
 After that you want to test it with
 
 hadoop fs -cat 
 /user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt | head -2 | 
 ./hadoopPknotsRG
 
 If that works then you can try it with Hadoop streaming
 
 HADOOP_HOME$ bin/hadoop jar 
 /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar -mapper 
 ./hadoopPknotsRG -file /data/yehdego/hadoop-0.20.2/pknotsRG -file 
 /data/yehdego/hadoop-0.20.2/hadoopPknotsRG -input 
 /user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output 
 /user/yehdego/RF-out -reducer NONE -verbose
 
 --Bobby
 
 On 7/25/11 3:37 PM, Daniel Yehdego dtyehd...@miners.utep.edu wrote:
 
 
 
 Good afternoon Bobby,
 
 Thanks, you gave me a great help in finding out what the problem was. After I 
 put the command line you suggested me, I found out that there was a 
 segmentation error.
 The binary executable program pknotsRG only reads a file with a sequence in 
 it. This means, there should be a shell script, as you have said, that will 
 take the data coming
 from stdin and write it to a temporary file. Any idea on how to do this job 
 in shell script. The thing is I am from a biology background and don't have 
 much experience in CS.
 looking forward to hear from you. Thanks so much.
 
 Regards,
 
 Daniel T. Yehdego
 Computational Science Program
 University of Texas at El Paso, UTEP
 dtyehd...@miners.utep.edu
 
  From: ev...@yahoo-inc.com
  To: common-user@hadoop.apache.org
  Date: Fri, 22 Jul 2011 12:39:08 -0700
  Subject: Re: Hadoop-streaming using binary executable c program
 
  I would suggest that you do the following to help you debug.
 
  hadoop fs -cat 
  /user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt | head -2 
  | /data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG -
 
  This is simulating what hadoop streaming is doing.  Here we are taking the 
  first 2 lines out of the input file and feeding them to the stdin of 
  pknotsRG.  The first step is to make sure that you can get your program to 
  run correctly with something like this.  You may need to change the command 
  line to pknotsRG to get it to read the data it is processing from stdin, 
  instead of from a file.  Alternatively you may need to write a shell script 
  that will take the data coming from stdin.  Write it to a file and then 
  call pknotsRG on that temporary file.  Once you have this working then you 
  should try it again with streaming.
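
The local check described above (feed the first couple of input lines to the program's stdin, as streaming would) can also be sketched in Python. This is only an illustration under assumptions: the function name `simulate_streaming_mapper` is hypothetical, and the mapper command is whatever executable is under test.

```python
import itertools
import subprocess


def simulate_streaming_mapper(lines, mapper_command, limit=2):
    """Feed the first `limit` lines to `mapper_command` on stdin and
    return (exit_code, stdout), mimicking `... | head -2 | ./mapper`."""
    sample = "".join(itertools.islice(lines, limit))
    result = subprocess.run(
        mapper_command,
        input=sample,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.returncode, result.stdout
```

A non-zero exit code here points at the program or wrapper script itself rather than at Hadoop, which is the point of testing outside the cluster first.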
 
  --Bobby Evans
 
  On 7/22/11 12:31 PM, Daniel Yehdego dtyehd...@miners.utep.edu wrote:
 
 
 
  Hi Bobby, Thanks for the response.
 
  After I tried the following command:
 
  bin/hadoop jar

Job Submission schedule, one file at a time ?

2011-10-25 Thread Daniel Yehdego

Hi, 
I have a folder with 50 different files and I want to submit a Hadoop 
MapReduce job using each file as an input. My Map/Reduce programs basically do 
the same job for each of my files, but I want to schedule and submit the jobs 
one file at a time. That is, submit a job with one input file, wait until the 
job completes, and then submit the second job (second file) right after. I want 
to have 50 different MapReduce outputs for the 50 input files. 
Looking forward to your input. Thanks.
Regards, 


  

RE: Job Submission schedule, one file at a time ?

2011-10-25 Thread Daniel Yehdego

Hi Mike, Thanks for your quick response.
What I am looking for is this: I have an m/r job which accepts an input file, 
processes it, and outputs to a single reducer. But I have many files queued up 
waiting to be processed the same way, and I don't want to submit an m/r job 
for each file manually. Is there a way to submit these files one after another 
or concurrently? All the files are independent of each other. I think this 
needs some kind of job scheduler or something? I am not sure how to proceed.

Regards, 


 Subject: Re: Job Submission schedule, one file at a time ?
 From: michael_se...@hotmail.com
 Date: Tue, 25 Oct 2011 16:25:40 -0500
 To: common-user@hadoop.apache.org
 
 Not sure what you are attempting to do...
 If you submit the directory name... You get a single m/r job to process all. 
 ( but it doesn't sound like that is what you want...)
 
 You could use Oozie, or just a simple shell script that will walk down a list 
 of files in the directory and then launch a Hadoop task...
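
The "walk down a list of files and launch a job for each" idea can be sketched as follows. This is a minimal sketch under assumptions: the function names, the jar path, and the output layout (one output directory per input file) are hypothetical; only the `-mapper`, `-reducer`, `-input`, and `-output` options already shown in this archive are used, and any `-file` options would need to be added. It also assumes the `hadoop jar` command returns only once the job has finished.

```python
import os
import subprocess


def build_job_commands(input_files, output_dir, streaming_jar, mapper, reducer):
    """Return one streaming command (as an argv list) per input file."""
    commands = []
    for path in sorted(input_files):
        # Name each output directory after its input file, minus the extension.
        name = os.path.splitext(os.path.basename(path))[0]
        commands.append([
            "bin/hadoop", "jar", streaming_jar,
            "-mapper", mapper,
            "-reducer", reducer,
            "-input", path,
            "-output", output_dir.rstrip("/") + "/" + name,
        ])
    return commands


def run_sequentially(commands):
    """Run the jobs one after another, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Running the commands in a plain loop gives the "one file at a time" behaviour; a workflow tool such as Oozie would be the heavier alternative mentioned above.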
 
 Or did you want something else?
 
 
 Sent from a remote device. Please excuse any typos...
 
 Mike Segel
 

Hadoop Streaming

2011-10-01 Thread Daniel Yehdego

Hi all, 
I am using Hadoop streaming and I want to use a secondary sort so that my 
values are output in order. Can I use stream.num.map.input.key.fields instead of 
stream.num.map.output.key.fields? I am asking because the output from the 
mapper is just a string of letters and it is difficult to use the keys for 
comparison.

Regards, 


  

RE: Reducer to concatenate string values

2011-09-20 Thread Daniel Yehdego

Hi Kai, 
Many thanks for your response. I will look at the links you sent me and 
get back to you. 

Regards, 

Daniel T. Yehdego
Computational Science Program 
University of Texas at El Paso, UTEP 
dtyehd...@miners.utep.edu

 Subject: Re: Reducer to concatenate string values
 From: k...@123.org
 Date: Tue, 20 Sep 2011 07:53:56 +0200
 To: common-user@hadoop.apache.org
 
 Hi Daniel,
 
 the values for a single key will be passed to reduce() in a non-predictable 
 order. Actually, when running the same job on the same data again, the order 
 is most likely different every time.
 
 If you want the values to be in a sorted way, you need to apply a 'secondary 
 sort'. The basic idea is to attach your values to the key, and then benefit 
 from the sorting Hadoop does on the key.
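
The idea of attaching an ordering field to the key can be illustrated outside Hadoop. The sketch below only simulates the effect in plain Python (sort on a composite key, then group on the natural key); it is not Hadoop code, and the record layout `(key, order, value)` and the function name are assumptions made for illustration.

```python
from itertools import groupby


def secondary_sort(records):
    """records: iterable of (key, order, value) tuples.
    Returns {key: [values sorted by their order field]}."""
    # Sort on the composite (key, order), as a sort on a composite key would.
    composite = sorted(records, key=lambda r: (r[0], r[1]))
    # Group on the natural key only, so each group sees its values in order.
    return {
        key: [value for _, _, value in group]
        for key, group in groupby(composite, key=lambda r: r[0])
    }
```

In a real job the grouping step corresponds to partitioning and grouping on the natural key while sorting on the full composite key, which is the part that needs extra code or configuration.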
 
 However, you need to write some code to make that happen. Josh wrote a nice 
 series of articles on it, and you will find more if you google for secondary 
 sort.
 
 http://www.cloudera.com/blog/2011/03/simple-moving-average-secondary-sort-and-mapreduce-part-1/
 http://www.cloudera.com/blog/2011/03/simple-moving-average-secondary-sort-and-mapreduce-part-2/
 http://www.cloudera.com/blog/2011/04/simple-moving-average-secondary-sort-and-mapreduce-part-3/
 
 Kai
 
 -- 
 Kai Voigt
 k...@123.org
 
 
 
 
  

RE: Reducer to concatenate string values

2011-09-20 Thread Daniel Yehdego

Hi Ayon, 
I am using a C executable as my mapper (streaming), but I am not sure how to 
use a reducer that concatenates the values from a mapper in order. 

Regards, 

Daniel T. Yehdego
Computational Science Program 
University of Texas at El Paso, UTEP 
dtyehd...@miners.utep.edu

 Date: Mon, 19 Sep 2011 22:54:46 -0700
 From: ayonsi...@yahoo.com
 Subject: Re: Reducer to concatenate string values
 To: common-user@hadoop.apache.org
 
 What are you using for your map/reduce? Streaming/Java/Pig/Hive?
  
 -Ayon
 
 
 
 
  

RE: Reducer to concatenate string values

2011-09-20 Thread Daniel Yehdego

Hi Ayon, 
any idea on my previous question?

Regards, 

Daniel T. Yehdego
Computational Science Program 
University of Texas at El Paso, UTEP 
dtyehd...@miners.utep.edu

  

Reducer to concatenate string values

2011-09-19 Thread Daniel Yehdego

Good evening, 
I have values output from a mapper and I want to concatenate the 
string outputs using a reducer (one reducer). But the concatenated 
string values do not come out in order. How can I use a reducer that receives 
the values from the mapper output and concatenates the strings in order? 
Awaiting your response, and thanks in advance.  

Regards, 

Daniel T. Yehdego
Computational Science Program 
University of Texas at El Paso, UTEP 
dtyehd...@miners.utep.edu 

HADOOP MapReduce sorting

2011-09-08 Thread Daniel Yehdego

Hi, 
I want to use an input file which has lines of sequences, in which each line 
(RNA sequence) will be sent to the mapper (an executable program that 
determines the secondary structure of each line of sequence). I am also using a 
reducer which concatenates the output lines from the mapper. But I have the 
problem that the final output is not sorted in the same order as the input 
sequences (RNA-1, RNA-2, RNA-3). 

STDIN INPUT FILE:
RNA-1
RNA-2
RNA-3

MAPPER OUTPUT:
MAP1
RNA-2
STRUCTURE-2

MAP2
RNA-1
STRUCTURE-1

MAP3
RNA-3
STRUCTURE-3

REDUCER OUTPUT:
RNA-2RNA-1RNA-3\tSTRUCTURE-1STRUCTURE-2STRUCTURE-3\n
OR
RNA-3RNA-2RNA-1\tSTRUCTURE-1STRUCTURE-2STRUCTURE-3\n

What I am looking for is to reduce in the following ordered manner: 
RNA-1RNA-2RNA-3\tSTRUCTURE-1STRUCTURE-2STRUCTURE-3\n

Looking forward to your input. 

Regards, 

Daniel T. Yehdego
Computational Science Program 
University of Texas at El Paso, UTEP 
dtyehd...@miners.utep.edu 

Hadoop reducer according an inout

2011-09-04 Thread Daniel Yehdego


Hi, 
I am using Hadoop streaming to distribute some biological data strings. My 
mapper is an executable binary program that determines the structure of a 
given input. I am also using a reducer script to glue the output strings 
from the mapper together so that I have one long string. But I have the problem 
that the order of the output string is not the same as the input from the 
mapper. Is there a way that I can use Hadoop so that the output is in the same 
order as the input?
Assume we have an output from the mapper

MAP1
RNA-1
STRUCTURE-1

MAP2
RNA-2
STRUCTURE-2

MAP3
RNA-3
STRUCTURE-3

and what I am looking for is to reduce in the following manner: 
RNA-1RNA-2RNA-3\tSTRUCTURE-1STRUCTURE-2STRUCTURE-3\n
Your input is highly appreciated. Thanks in advance.
Regards, 

Daniel T. Yehdego
Computational Science Program 
University of Texas at El Paso, UTEP 
dtyehd...@miners.utep.edu 

Hadoop Reduction order

2011-09-04 Thread Daniel Yehdego

Hi, I am using Hadoop streaming to distribute some biological data strings. My 
mapper is an executable binary program that determines the structure of a 
given input. I am also using a reducer script to glue the output strings 
from the mapper together so that I have one long string. But I have the problem 
that the order of the output string is not the same as the input from the 
mapper. Is there a way that I can use Hadoop so that the output is in the same 
order as the input?
Assume we have an output from the mapper

MAP1
RNA-1
STRUCTURE-1

MAP2
RNA-2
STRUCTURE-2

MAP3
RNA-3
STRUCTURE-3

and what I am looking for is to reduce in the following manner: 
RNA-1RNA-2RNA-3\tSTRUCTURE-1STRUCTURE-2STRUCTURE-3\n
Your input is highly appreciated. Thanks in advance.
Regards, 

Daniel T. Yehdego
Computational Science Program 
University of Texas at El Paso, UTEP 
dtyehd...@miners.utep.edu 

RE: Hadoop-streaming using binary executable c program

2011-08-01 Thread Daniel Yehdego

Hi Bobby, 

I have written a small Perl script which does the following job:

Assume we have an output from the mapper

MAP1
RNA-1
STRUCTURE-1

MAP2
RNA-2
STRUCTURE-2

MAP3
RNA-3
STRUCTURE-3

and what the script does is reduce in the following manner : 
RNA-1RNA-2RNA-3\tSTRUCTURE-1STRUCTURE-2STRUCTURE-3\n
 and the script looks like this:

#!/usr/bin/perl
use strict;
use warnings;
use autodie;

my @handles = map { open my $h, '<', $_; $h } @ARGV;

while (@handles){
    @handles = grep { ! eof $_ } @handles;
    my @lines = map { my $v = <$_>; chomp $v; $v } @handles;
    print join(' ', @lines), "\n";
}

close $_ for @handles;

This should work for any input from the mapper. But after I used Hadoop 
streaming with the above code as my reducer, the job was successful 
but the output files were empty, and I couldn't find out why.

 bin/hadoop jar /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar 
-mapper ./hadoopPknotsRG 
-file /data/yehdego/hadoop-0.20.2/pknotsRG 
-file /data/yehdego/hadoop-0.20.2/hadoopPknotsRG 
-reducer ./reducer.pl 
-file /data/yehdego/hadoop-0.20.2/reducer.pl  
-input /user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt 
-output /user/yehdego/RFR2-out -verbose

Any help or suggestion is really appreciated. I am just stuck here for the 
weekend.
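
One thing worth noting about the script above: Hadoop streaming hands records to the reducer on standard input rather than as file arguments, so a reducer that only opens the files named in @ARGV has nothing to read, which may be why the output was empty. A minimal sketch of a stdin-based reducer follows, assuming the mapper emits one `RNA\tSTRUCTURE` record per line (the layout guessed elsewhere in this thread); the function name `join_records` is hypothetical.

```python
def join_records(lines):
    """Concatenate all keys, then all values, from `key<TAB>value` lines,
    in the order the lines arrive."""
    keys, values = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        # Split on the first tab only; a line without a tab is all key.
        key, _, value = line.partition("\t")
        keys.append(key)
        values.append(value)
    return "".join(keys) + "\t" + "".join(values) + "\n"
```

As a reducer script this would be driven by `sys.stdout.write(join_records(sys.stdin))`. It preserves arrival order only, so it does not by itself restore the original input order after the shuffle.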
 
Regards, 

Daniel T. Yehdego
Computational Science Program 
University of Texas at El Paso, UTEP 
dtyehd...@miners.utep.edu

 From: ev...@yahoo-inc.com
 To: common-user@hadoop.apache.org
 Date: Thu, 28 Jul 2011 07:12:11 -0700
 Subject: Re: Hadoop-streaming using binary executable c program
 
 I am not completely sure what you are getting at.  It looks like the output 
 of your C program is the following (and this is just a guess).  NOTE: \t 
 stands for the tab character, and in streaming it is used to separate the key 
 from the value; \n stands for newline and is used to separate individual 
 records.
 RNA-1\tSTRUCTURE-1\n
 RNA-2\tSTRUCTURE-2\n
 RNA-3\tSTRUCTURE-3\n
 ...
 
 
 And you want the output to look like
 RNA-1RNA-2RNA-3\tSTRUCTURE-1STRUCTURE-2STRUCTURE-3\n
 
 You could use a reduce to do this, but the issue here is with the shuffle in 
 between the maps and the reduces.  The Shuffle will group by the key to send 
 to the reducers and then sort by the key.  So in reality your map output 
 looks something like
 
 FROM MAP 1:
 RNA-1\tSTRUCTURE-1\n
 RNA-2\tSTRUCTURE-2\n
 
 FROM MAP 2:
 RNA-3\tSTRUCTURE-3\n
 RNA-4\tSTRUCTURE-4\n
 
 FROM MAP 3:
 RNA-5\tSTRUCTURE-5\n
 RNA-6\tSTRUCTURE-6\n
 
 If you send it to a single reducer (The only way to get a single file) Then 
 the input to the reducer will be sorted alphabetically by the RNA, and the 
 order of the input will be lost.  You can work around this by giving each 
 line a unique number that is in the order you want It to be output.  But 
 doing this would require you to write some code.  I would suggest that you do 
 it with a small shell script after all the maps have completed to splice them 
 together.
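
The workaround of numbering each line can be sketched as below. This is only an illustration under assumptions: it supposes each record has been given a leading sequence number, as `index<TAB>RNA<TAB>STRUCTURE`, by some earlier step not shown here, and the function name is hypothetical. Sorting is numeric, so record 10 comes after record 2.

```python
def concatenate_in_order(lines):
    """Rebuild the original order from `index<TAB>rna<TAB>structure` lines
    and return `rna1rna2...<TAB>structure1structure2...`."""
    records = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip malformed records
        index, rna, structure = fields
        records.append((int(index), rna, structure))
    # Numeric sort on the sequence number restores the input order.
    records.sort(key=lambda record: record[0])
    rnas = "".join(record[1] for record in records)
    structures = "".join(record[2] for record in records)
    return rnas + "\t" + structures + "\n"
```

The same function works either as a single reducer or as a small post-processing step run over the collected map outputs.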
 
 --
 Bobby
 

RE: Hadoop-streaming using binary executable c program

2011-07-27 Thread Daniel Yehdego

Hi Bobby, 

I just want to ask you if there is a way of using a reducer or something like 
concatenation to glue together my outputs from the mapper and output 
them as a single file and a single segment of the predicted RNA 2D structure?

FYI: I have used a reducer NONE before:

HADOOP_HOME$ bin/hadoop jar
/data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar -mapper
./hadoopPknotsRG -file /data/yehdego/hadoop-0.20.2/pknotsRG -file
/data/yehdego/hadoop-0.20.2/hadoopPknotsRG -input
/user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output
/user/yehdego/RF-out -reducer NONE -verbose

and a sample of my output using the mapper of two different slave nodes looks 
like this :

AUACCCGCAAAUUCACUCAAAUCUGUAAUAGGUUUGUCAUUCAAAUCUAGUGCAAAUAUUACUUUCGCCAAUUAGGUAUAAUAAUGGUAAGC
and
[...(((...))).].
  (-13.46)

GGGACAAGACUCGACAUUUGAUACACUAUUUAUCAAUGGAUGUCUUCU
.(((.((......)..  (-11.00)

and I want to concatenate and output them as a single predicted RNA sequence 
structure:

AUACCCGCAAAUUCACUCAAAUCUGUAAUAGGUUUGUCAUUCAAAUCUAGUGCAAAUAUUACUUUCGCCAAUUAGGUAUAAUAAUGGUAAGCGGGACAAGACUCGACAUUUGAUACACUAUUUAUCAAUGGAUGUCUUCU
   

[...(((...))).]..(((.((......)..
  


Regards, 

Daniel T. Yehdego
Computational Science Program 
University of Texas at El Paso, UTEP 
dtyehd...@miners.utep.edu


RE: Hadoop-streaming using binary executable c program

2011-07-26 Thread Daniel Yehdego

Good afternoon Bobby, 

Thanks so much, now it is working excellently. And the speed is also reasonable. 
Once again, thank you.  

Regards, 

Daniel T. Yehdego
Computational Science Program 
University of Texas at El Paso, UTEP 
dtyehd...@miners.utep.edu

 From: ev...@yahoo-inc.com
 To: common-user@hadoop.apache.org
 Date: Mon, 25 Jul 2011 14:47:34 -0700
 Subject: Re: Hadoop-streaming using binary executable c program
 
 This is likely to be slow and it is not ideal.  The ideal would be to modify 
 pknotsRG to be able to read from stdin, but that may not be possible.
 
 The shell script would probably look something like the following
 
 #!/bin/sh
 rm -f temp.txt;
 while read line
 do
   echo $line >> temp.txt;
 done
 exec pknotsRG temp.txt;
 
 Place it in a file, say hadoopPknotsRG.  Then you probably want to run
 
 chmod +x hadoopPknotsRG
 
 After that you want to test it with
 
 hadoop fs -cat 
 /user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt | head -2 | 
 ./hadoopPknotsRG
 
 If that works then you can try it with Hadoop streaming
 
 HADOOP_HOME$ bin/hadoop jar 
 /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar -mapper 
 ./hadoopPknotsRG -file /data/yehdego/hadoop-0.20.2/pknotsRG -file 
 /data/yehdego/hadoop-0.20.2/hadoopPknotsRG -input 
 /user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output 
 /user/yehdego/RF-out -reducer NONE -verbose
 
 --Bobby
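The wrapper idea above can also be sketched without the per-line loop. This is only a minimal sketch, not a tested replacement for the script in the message: the function name run_on_stdin is hypothetical, and it assumes that mktemp is available on the task nodes and that the wrapped program accepts the input file name as its last argument.

```shell
#!/bin/sh
# Hypothetical helper: save stdin to a temporary file, then run the given
# command with that file name appended as its last argument.
run_on_stdin() {
  tmp=$(mktemp) || return 1   # assumption: mktemp exists on the node
  cat > "$tmp"                # copy all of stdin in one step
  "$@" "$tmp"                 # for example: ./pknotsRG "$tmp"
  status=$?
  rm -f "$tmp"                # clean up, but keep the program's exit status
  return "$status"
}
```

In a wrapper script, the last line could then be something like `run_on_stdin ./pknotsRG`. A unique temporary name avoids clashes if two copies ever run in the same directory.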
 

RE: Hadoop-streaming using binary executable c program

2011-07-25 Thread Daniel Yehdego

Good afternoon Bobby, 

Thanks, you were a great help in finding out what the problem was. After I 
ran the command line you suggested, I found that there was a 
segmentation fault.
The binary executable pknotsRG only reads a file with a sequence in it. 
This means there should be a shell script, as you said, that takes 
the data coming
from stdin and writes it to a temporary file. Any idea how to do this in a 
shell script? I am from a biology background and don't have much 
experience in CS.
Looking forward to hearing from you. Thanks so much.

Regards, 

Daniel T. Yehdego
Computational Science Program 
University of Texas at El Paso, UTEP 
dtyehd...@miners.utep.edu

 From: ev...@yahoo-inc.com
 To: common-user@hadoop.apache.org
 Date: Fri, 22 Jul 2011 12:39:08 -0700
 Subject: Re: Hadoop-streaming using binary executable c program
 
 I would suggest that you do the following to help you debug.
 
 hadoop fs -cat 
 /user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt | head -2 | 
 /data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG -
 
 This is simulating what hadoop streaming is doing.  Here we are taking the 
 first 2 lines out of the input file and feeding them to the stdin of 
 pknotsRG.  The first step is to make sure that you can get your program to 
 run correctly with something like this.  You may need to change the command 
 line to pknotsRG to get it to read the data it is processing from stdin, 
 instead of from a file.  Alternatively you may need to write a shell script 
 that will take the data coming from stdin, write it to a file, and then call 
 pknotsRG on that temporary file.  Once you have this working then you should 
 try it again with streaming.
 
 --Bobby Evans
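The local check described above can be wrapped in a small helper. This is a sketch under stated assumptions: try_mapper is a hypothetical name, and it reads a local sample file instead of using `hadoop fs -cat`, so it can be tried without a cluster.

```shell
#!/bin/sh
# Hypothetical helper: pipe the first two lines of a local sample file into a
# mapper command and report the mapper's exit status.
try_mapper() {
  sample=$1
  shift
  head -n 2 "$sample" | "$@"      # the mapper reads the lines on its stdin
  echo "mapper exit status: $?"   # status of the last command in the pipeline
}
```

For example, `try_mapper sample.txt ./hadoopPknotsRG` (sample.txt being a hypothetical local copy of the input) should print the mapper's output followed by its exit status. A non-zero status here is likely to show up as a "subprocess failed with code N" error under streaming.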
 

Hadoop-streaming with a c binary executable as a mapper

2011-07-22 Thread Daniel Yehdego
Hi, 

I am using hadoop-streaming to parallelize the processing of a large RNA data
set. I am using a C binary executable called pknotsRG as my mapper. My command
to run the job looks like:

HADOOP_HOME$  bin/hadoop
jar /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar
-mapper /data/yehdego/hadoop-0.20.2/pknotsRG
-file /data/yehdego/hadoop-0.20.2/pknotsRG 
-input /user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt
-output /user/yehdego/RF-out 
-reducer NONE 
-verbose 

and I keep getting the following error messages:

java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess
failed with code 1
at 
org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
at 
org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)

FYI: my input is a file with one sequence per line, and the mapper is expected 
to take each line and predict its 2D secondary structure. I tried the 
executable locally and it worked.

[yehdego@bulgaria hadoop-0.20.2]$ ./pknotsRG
RF00028_B.bpseqL3G5_seg_Centered_Method.txt 

AUGACUCUCUAAAUUGCUUUACCUUUGGAGGGGUUAUCAGGCCUGCACCUGAUAGCUAGUCUUUAAACCAAUAGAUUGCAUCGGUUUAAUA
(..)...((..))[.{{...].}}...
  
GCAAGACCGUCAAAUUGCGGGGGGU
.....  
CAACAGCCGUUCAGUACCAAGUCUCAA
..((.((.(()).)).)).  
AACUUUGAGAUGGCCUUGCAAAGGAUAUGGUAAUAAGCUGACGGACAGGGUCCUAACCACGCAGCCAAGUCCUAAGUCAACAUUU
..[[[.]]](.......)).))..)....
  
CGGUGUUGAUAUGGAUGCAGUUCACAGACUAAAUGUCGGUCAAGAAUAGGUAUUCUUCUCAUAAGAUAUAGUCGGACCUCUCCUUAAUGGGAGCU
.(((...(...)..(...())..))).()..
  


Hadoop-streaming using binary executable c program

2011-07-22 Thread Daniel Yehdego
I am trying to parallelize the processing of some very long RNA sequences in
order to predict their 2D structures. I am using a binary executable C
program called pknotsRG as my mapper. I tried the following bin/hadoop
command:

HADOOP_HOME$ bin/hadoop
jar /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar
-mapper /data/yehdego/hadoop-0.20.2/pknotsRG
-file /data/yehdego/hadoop-0.20.2/pknotsRG
-input /user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt
-output /user/yehdego/RF-out -reducer NONE -verbose 

but I keep getting the following error message: 

java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess
failed with code 1
at
org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
at
org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)

FYI: my input file is RF00028_B.bpseqL3G5_seg_Centered_Method.txt, which
is a chunk of RNA sequences. The mapper is expected to read the input
line by line and output the predicted structure for each sequence,
across a specified number of maps. Any help on this problem is really
appreciated. Thanks.



RE: Hadoop-streaming with a c binary executable as a mapper

2011-07-22 Thread Daniel Yehdego

Thanks Joey for your quick response, 

I tried the suggestion you gave me and it's still not working. After I ran:

bin/hadoop jar $HADOOP_HOME/hadoop-0.20.2-streaming.jar -mapper 
/data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG -  -file 
/data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG  -reducer NONE -input 
/user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output 
/user/yehdego/RF-out -verbose

I  got the following task logs:

java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed 
with code 139
at 
org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
at 
org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)



syslog logs

2011-07-22 13:02:27,467 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
Initializing JVM Metrics with processName=MAP, sessionId=
2011-07-22 13:02:27,913 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2011-07-22 13:02:28,149 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed 
exec 
[/data/yehdego/hadoop_tmp/dfs/local/taskTracker/jobcache/job_201107181535_0079/attempt_201107181535_0079_m_00_0/work/./pknotsRG]
2011-07-22 13:02:28,242 INFO org.apache.hadoop.streaming.PipeMapRed: 
R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: 
MROutputThread done
2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: 
MRErrorThread done
2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed 
failed!
2011-07-22 13:02:28,361 WARN org.apache.hadoop.mapred.TaskTracker: Error 
running child
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed 
with code 139
at 
org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
at 
org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
2011-07-22 13:02:28,395 INFO org.apache.hadoop.mapred.TaskRunner: Runnning 
cleanup for the task


Regards, 

Daniel T. Yehdego
Computational Science Program 
University of Texas at El Paso, UTEP 
dtyehd...@miners.utep.edu

 CC: common-user@hadoop.apache.org
 From: j...@cloudera.com
 Subject: Re: Hadoop-streaming with a c binary executable as a mapper
 Date: Fri, 22 Jul 2011 11:34:08 -0400
 To: common-user@hadoop.apache.org
 
 Your executable needs to read lines from standard in. Try setting your mapper 
 like this:
 
  -mapper /data/yehdego/hadoop-0.20.2/pknotsRG -
 
 If that doesn't work, you may need to execute your C program from a shell 
 script. The - added to the command line says read from STDIN. 
 
 -Joey
 
 
RE: Hadoop-streaming using binary executable c program

2011-07-22 Thread Daniel Yehdego

Hi Bobby, Thanks for the response.

After I tried the following command:

bin/hadoop jar $HADOOP_HOME/hadoop-0.20.2-streaming.jar -mapper 
/data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG -  -file 
/data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG  -reducer NONE -input 
/user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output 
/user/yehdego/RF-out -verbose

I got the following stderr logs:

java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed 
with code 139
at 
org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
at 
org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)



syslog logs

2011-07-22 13:02:27,467 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
Initializing JVM Metrics with processName=MAP, sessionId=
2011-07-22 13:02:27,913 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2011-07-22 13:02:28,149 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed 
exec 
[/data/yehdego/hadoop_tmp/dfs/local/taskTracker/jobcache/job_201107181535_0079/attempt_201107181535_0079_m_00_0/work/./pknotsRG]
2011-07-22 13:02:28,242 INFO org.apache.hadoop.streaming.PipeMapRed: 
R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: 
MROutputThread done
2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: 
MRErrorThread done
2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed 
failed!
2011-07-22 13:02:28,361 WARN org.apache.hadoop.mapred.TaskTracker: Error 
running child
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed 
with code 139
at 
org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
at 
org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
2011-07-22 13:02:28,395 INFO org.apache.hadoop.mapred.TaskRunner: Runnning 
cleanup for the task



Regards, 

Daniel T. Yehdego
Computational Science Program 
University of Texas at El Paso, UTEP 
dtyehd...@miners.utep.edu

 From: ev...@yahoo-inc.com
 To: common-user@hadoop.apache.org; dtyehd...@miners.utep.edu
 Date: Fri, 22 Jul 2011 09:12:18 -0700
 Subject: Re: Hadoop-streaming using binary executable c program
 
 It looks like it tried to run your program and the program exited with a 1, 
 not a 0.  What are the stderr logs like for the mappers that were launched? 
 You should be able to access them through the Web GUI.  You might also want 
 to add some stderr log messages to your C program, so that you can see how 
 far it gets before exiting.
 
 --Bobby Evans
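The exit codes reported in this thread (1 and 139) can be read using the common shell convention, under which a status above 128 usually means the process was terminated by signal (status - 128). By that convention 139 corresponds to signal 11, which is SIGSEGV on Linux. The helper below is a small sketch of that convention; explain_status is a hypothetical name, and it assumes the code reported by streaming follows the same convention.

```shell
#!/bin/sh
# Hypothetical helper: describe an exit status using common shell conventions.
explain_status() {
  status=$1
  if [ "$status" -eq 0 ]; then
    echo "success"
  elif [ "$status" -eq 126 ]; then
    echo "command found but not executable"
  elif [ "$status" -eq 127 ]; then
    echo "command not found"
  elif [ "$status" -gt 128 ]; then
    echo "killed by signal $((status - 128))"
  else
    echo "program exited with error status $status"
  fi
}

explain_status 139   # prints: killed by signal 11
```

A status between 1 and 125 is chosen by the program itself, so its meaning has to come from the program's own stderr output or documentation.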
 
 On 7/22/11 9:19 AM, Daniel Yehdego dtyehd...@miners.utep.edu wrote:
 
 I am trying to parallelize the processing of some very long RNA sequences in
 order to predict their 2D structures. I am using a binary executable C
 program called pknotsRG as my mapper. I tried the following bin/hadoop
 command:
 
 HADOOP_HOME$ bin/hadoop
 jar /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar
 -mapper /data/yehdego/hadoop-0.20.2/pknotsRG
 -file /data/yehdego/hadoop-0.20.2/pknotsRG
 -input /user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt
 -output /user/yehdego/RF-out -reducer NONE -verbose
 
 but I keep getting the following error message:
 
 java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess
 failed with code 1
 at
 org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
 at
 org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
 at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
 at 
 org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
 at org.apache.hadoop.mapred.Child.main(Child.java:170)
 
 FYI: my input file is RF00028_B.bpseqL3G5_seg_Centered_Method.txt which
 is a chunk of RNA sequences and the mapper is expected