input file order
Hi, I have 100 .txt input files and I want my mapper output to be ordered. I am not using a reducer. Any ideas? Regards,
Hadoop Streaming
I get the following error when running Hadoop streaming:

    PipeMapRed.waitOutputThreads(): subprocess failed with code 126
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

I couldn't find any other error information. Any help?
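A note on the exit code: by shell convention, status 126 means the command was found but could not be executed (for example, the file has no execute permission), so the mapper script's permissions and interpreter line are worth checking first. The sketch below only reproduces that status locally; the function name and the temporary file are illustrative and do not come from this thread.

```shell
#!/bin/sh
# Print the exit status a shell reports when asked to run a file that
# exists but is not executable. (Illustrative helper, not from the thread.)
status_of_non_executable() {
    script=$(mktemp) || return 1
    printf '#!/bin/sh\necho hello\n' > "$script"
    chmod a-x "$script"                 # make sure no execute bit is set
    if "$script" 2>/dev/null; then
        status=0
    else
        status=$?                       # status of the failed execution
    fi
    rm -f "$script"
    echo "$status"
}

status_of_non_executable
```

If the same status shows up only under streaming, one thing to try is running chmod +x on the mapper script before shipping it with -file.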
RE: Hadoop Streaming
Thanks Tom for your reply. I think my code is reading from stdin, because I tried it locally with the following command and it runs:

    $ bin/hadoop fs -cat /user/yehdego/Hadoop-Data-New/RF00171_A.bpseqL3G1_seg_Optimized_Method.txt | head -2 | ./HADOOP

But when I tried streaming, it failed and gave me the error code 126.

Date: Sat, 3 Dec 2011 19:14:20 -0800
Subject: Re: Hadoop Streaming
From: t...@supertom.com
To: common-user@hadoop.apache.org

So that code 126 should be kicked out by your program - do you know what that means? Your code can read from stdin?

Thanks,
Tom
RE: Hadoop Streaming
Tom,

What the HADOOP script does is read each line from stdin and then execute the program pknotsRG. temp.txt is a temporary file. The script looks like this:

    #!/bin/sh
    rm -f temp.txt;
    while read line
    do
      echo $line >> temp.txt;
    done
    exec /data/yehdego/hadoop-0.20.2/PKNOTSRG/src/pknotsRG -k 0 -F temp.txt;

Date: Sat, 3 Dec 2011 19:49:46 -0800
Subject: Re: Hadoop Streaming
From: t...@supertom.com
To: common-user@hadoop.apache.org

Hi Daniel,

I see from your other thread that your HADOOP script has a line like:

    #!/bin/shrm -f temp.txt

I'm not sure what that is, exactly. I suspect the -f is reading from some file, and the while loop you had listed reads from stdin, it seems.

What does your input look like? I think what's happening is that you might be expecting lines of input and you're getting splits.

You might want to try this:

    -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat

Thanks,
Tom
RE: Hadoop-streaming using binary executable c program
Hi,

I was trying to run Hadoop streaming, and before that I checked with the following:

    bin/hadoop fs -cat /user/yehdego/Hadoop-Data-New/RF00171_A.bpseqL3G1_seg_Optimized_Method.txt | head -2 | ./HADOOP

where HADOOP is a shell script:

    #!/bin/sh
    rm -f temp.txt;
    while read line
    do
      echo $line >> temp.txt;
    done
    exec /data/yehdego/hadoop-0.20.2/PKNOTSRG/src/bin/pknotsRG -k o -F temp.txt;

and it works. But when I try running it with streaming using the following:

    bin/hadoop jar /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar \
        -mapper ./HADOOP \
        -file /data/yehdego/hadoop-0.20.2/HADOOP \
        -file /data/yehdego/hadoop-0.20.2/PKNOTSRG/src/bin/pknotsRG \
        -reducer ./ReduceLatest.py \
        -file /data/yehdego/hadoop-0.20.2/ReduceLatest.py \
        -input /user/yehdego/Hadoop-Data-New/RF00171_A.bpseqL3G1_seg_Optimized_Method.txt \
        -output /user/yehdego/RF171_NEW/RF00171_A.bpseqL3G1_Optimized_Method40.txt \
        -verbose

it fails with the following error:

    PipeMapRed.waitOutputThreads(): subprocess failed with code 126
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

Any idea on this problem?

Regards,
Daniel T. Yehdego
Computational Science Program
University of Texas at El Paso, UTEP
dtyehd...@miners.utep.edu
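The wrapper script discussed above could be made somewhat more robust. This is only a sketch under assumptions: run_on_stdin is a hypothetical helper name, mktemp replaces the fixed temp.txt so that concurrent tasks do not share one file, and cat copies stdin verbatim instead of the unquoted echo $line loop, which can alter whitespace and backslashes.

```shell
#!/bin/sh
# run_on_stdin PROGRAM [ARGS...]
# Copy everything on stdin into a private temporary file, then run
# PROGRAM ARGS... TMPFILE and return the program's exit status.
# (Hypothetical helper; the name is not from the thread.)
run_on_stdin() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"                        # stdin is copied byte for byte
    if "$@" "$tmp"; then
        status=0
    else
        status=$?
    fi
    rm -f "$tmp"
    return "$status"
}

# In a mapper script this might be used as (path is an assumption):
#   run_on_stdin ./pknotsRG -k 0 -F
```

Calling the binary as ./pknotsRG assumes that files shipped with -file end up in the task's working directory, which matches the exec path shown in the task log quoted elsewhere in this archive but should be verified on the cluster in question.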
Job Submission schedule, one file at a time ?
Hi,

I have a folder with 50 different files, and I want to submit a Hadoop MapReduce job using each file as an input. My Map/Reduce programs basically do the same job for each of my files, but I want to schedule and submit a job one file at a time. It's like submitting a job with one input file, waiting until the job completes, and submitting the second job (second file) right after. I want to have 50 different MapReduce outputs for the 50 input files.

Looking forward to your input. Thanks.

Regards,
RE: Job Submission schedule, one file at a time ?
Hi Mike,

Thanks for your quick response. What I am looking for is this: I have an m/r job which accepts an input file, processes it, and outputs to a single reducer. But I have many files queued, waiting to be processed the same way, and I don't want to submit an m/r job for each file manually. Is there a way to submit these files one after another, or concurrently? All the files are independent of each other. I think it's a kind of job scheduler or something? I am not sure how to proceed.

Regards,

Subject: Re: Job Submission schedule, one file at a time ?
From: michael_se...@hotmail.com
Date: Tue, 25 Oct 2011 16:25:40 -0500
To: common-user@hadoop.apache.org

Not sure what you are attempting to do... If you submit the directory name... You get a single m/r job to process all. (but it doesn't sound like that is what you want...)

You could use Oozie, or just a simple shell script that will walk down a list of files in the directory and then launch a Hadoop task...

Or did you want something else?

Sent from a remote device. Please excuse any typos...

Mike Segel
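The "simple shell script" option mentioned above might look roughly like the following. It is a sketch, not a tested deployment: submit_each and run_one_job are hypothetical names, the jar, mapper, and output paths are placeholders, and it assumes the hadoop jar command blocks until the job finishes so that jobs run strictly one after another.

```shell
#!/bin/sh
# submit_each CMD [ARGS...]
# Read one input path per line from stdin and run CMD ARGS... PATH for
# each, sequentially, stopping at the first failure.
submit_each() {
    while IFS= read -r path; do
        [ -n "$path" ] || continue
        # </dev/null keeps the command from consuming the list of paths
        "$@" "$path" </dev/null || return 1
    done
}

# Hypothetical per-file job; every path here is a placeholder.
run_one_job() {
    input=$1
    bin/hadoop jar hadoop-0.20.2-streaming.jar \
        -mapper ./mapper -file ./mapper \
        -input "$input" \
        -output "/user/yehdego/out/$(basename "$input")" \
        -reducer NONE
}

# Dry run with echo standing in for the real job:
printf 'a.txt\nb.txt\n' | submit_each echo would-submit
```

To drive it from HDFS, the list of paths could come from something like hadoop fs -ls on the input folder with the path column extracted; the exact listing format is an assumption to check against the installed version.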
Hadoop Streaming
Hi all,

I am using Hadoop streaming and I want to use a secondary sort so that my values are output in order. Can I use stream.num.map.input.key.fields instead of stream.num.map.output.key.fields? I am asking because the output from the mapper is just a string of letters, and it is difficult to use the keys for comparison.

Regards,
RE: Reducer to concatenate string values
Hi Kai,

Many thanks for your response. I will look at the links you sent me and get back to you.

Regards,
Daniel T. Yehdego
Computational Science Program
University of Texas at El Paso, UTEP
dtyehd...@miners.utep.edu

Subject: Re: Reducer to concatenate string values
From: k...@123.org
Date: Tue, 20 Sep 2011 07:53:56 +0200
To: common-user@hadoop.apache.org

Hi Daniel,

the values for a single key will be passed to reduce() in a non-predictable order. Actually, when running the same job on the same data again, the order is most likely different every time.

If you want the values to be sorted, you need to apply a 'secondary sort'. The basic idea is to attach your values to the key, and then benefit from the sorting Hadoop does on the key.

However, you need to write some code to make that happen. Josh wrote a nice series of articles on it, and you will find more if you google for secondary sort.

http://www.cloudera.com/blog/2011/03/simple-moving-average-secondary-sort-and-mapreduce-part-1/
http://www.cloudera.com/blog/2011/03/simple-moving-average-secondary-sort-and-mapreduce-part-2/
http://www.cloudera.com/blog/2011/04/simple-moving-average-secondary-sort-and-mapreduce-part-3/

Kai

--
Kai Voigt
k...@123.org
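The secondary-sort idea described above can be tried out locally before touching the cluster. The sketch below only imitates the ordering with the standard sort utility; the three-field record layout (key, position, value, separated by tabs) and the function name are assumptions made for illustration, not part of Hadoop.

```shell
#!/bin/sh
# secondary_sort: order tab-separated records by the first field (the
# grouping key) and, within one key, numerically by the second field.
# (Illustrative local stand-in for what the shuffle would do.)
secondary_sort() {
    tab=$(printf '\t')
    LC_ALL=C sort -t "$tab" -k1,1 -k2,2n
}

# Example: the position travels with the key, so values arrive in order.
printf 'seq1\t2\tGGG\nseq1\t10\tUUU\nseq0\t5\tCCC\nseq1\t1\tAAA\n' | secondary_sort
```

The numeric flag on the second field matters: without it, position 10 would sort before position 2.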
RE: Reducer to concatenate string values
Hi Ayon,

I am using a C executable as my mapper (streaming), but I am not sure how to write a reducer that concatenates the values from a mapper in order.

Regards,
Daniel T. Yehdego
Computational Science Program
University of Texas at El Paso, UTEP
dtyehd...@miners.utep.edu

Date: Mon, 19 Sep 2011 22:54:46 -0700
From: ayonsi...@yahoo.com
Subject: Re: Reducer to concatenate string values
To: common-user@hadoop.apache.org

What are you using for your map/reduce? Streaming/Java/Pig/Hive?

-Ayon
RE: Reducer to concatenate string values
Hi Ayon,

any ideas on my previous question?

Regards,
Daniel T. Yehdego
Computational Science Program
University of Texas at El Paso, UTEP
dtyehd...@miners.utep.edu
Reducer to concatenate string values
Good evening,

I have a certain value output from a mapper and I want to concatenate the string outputs using a reducer (one reducer). But the concatenated string values are not in order. How can I write a reducer that receives values from the mapper output and concatenates the strings in order?

Awaiting your response, and thanks in advance.

Regards,
Daniel T. Yehdego
Computational Science Program
University of Texas at El Paso, UTEP
dtyehd...@miners.utep.edu
HADOOP MapReduce sorting
Hi,

I want to use an input file which has lines of sequences, in which each line (an RNA sequence) is fed to the mapper (an executable program that determines the secondary structure of each sequence). I am also using a reducer which concatenates the output lines from the mapper. But my problem is that the final output is not in the same order as the input sequences (RNA-1, RNA-2, RNA-3).

STDIN input file:

    RNA-1
    RNA-2
    RNA-3

Mapper output:

    MAP1  RNA-2  STRUCTURE-2
    MAP2  RNA-1  STRUCTURE-1
    MAP3  RNA-3  STRUCTURE-3

Reducer output:

    RNA-2RNA-1RNA-3\tSTRUCTURE-1STRUCTURE-2STRUCTURE-3\n

or

    RNA-3RNA-2RNA-1\tSTRUCTURE-1STRUCTURE-2STRUCTURE-3\n

and what I am looking for is to reduce in the following ordered manner:

    RNA-1RNA-2RNA-3\tSTRUCTURE-1STRUCTURE-2STRUCTURE-3\n

Looking forward to your input.

Regards,
Daniel T. Yehdego
Computational Science Program
University of Texas at El Paso, UTEP
dtyehd...@miners.utep.edu
Hadoop reducer according to an input
Hi,

I am using Hadoop streaming to distribute some biological data strings. My mapper is an executable binary program that determines the structure of a given input. I am also using a reducer script to glue the output strings from the mapper together so that I have one long string. But I have a problem: the order of the output string is not the same as the order of the input. Is there a way to use Hadoop so that the output is in the same order as the input?

Assume we have this output from the mapper:

    MAP1  RNA-1  STRUCTURE-1
    MAP2  RNA-2  STRUCTURE-2
    MAP3  RNA-3  STRUCTURE-3

and what I am looking for is to reduce it in the following manner:

    RNA-1RNA-2RNA-3\tSTRUCTURE-1STRUCTURE-2STRUCTURE-3\n

Your input is highly appreciated. Thanks in advance.

Regards,
Daniel T. Yehdego
Computational Science Program
University of Texas at El Paso, UTEP
dtyehd...@miners.utep.edu
Hadoop Reduction order
Hi,

I am using Hadoop streaming to distribute some biological data strings. My mapper is an executable binary program that determines the structure of a given input. I am also using a reducer script to glue the output strings from the mapper together so that I have one long string. But I have a problem: the order of the output string is not the same as the order of the input. Is there a way to use Hadoop so that the output is in the same order as the input?

Assume we have this output from the mapper:

    MAP1  RNA-1  STRUCTURE-1
    MAP2  RNA-2  STRUCTURE-2
    MAP3  RNA-3  STRUCTURE-3

and what I am looking for is to reduce it in the following manner:

    RNA-1RNA-2RNA-3\tSTRUCTURE-1STRUCTURE-2STRUCTURE-3\n

Your input is highly appreciated. Thanks in advance.

Regards,
Daniel T. Yehdego
Computational Science Program
University of Texas at El Paso, UTEP
dtyehd...@miners.utep.edu
RE: Hadoop-streaming using binary executable c program
Hi Bobby,

I have written a small Perl script which does the following job. Assume we have this output from the mapper:

    MAP1  RNA-1  STRUCTURE-1
    MAP2  RNA-2  STRUCTURE-2
    MAP3  RNA-3  STRUCTURE-3

What the script does is reduce it in the following manner:

    RNA-1RNA-2RNA-3\tSTRUCTURE-1STRUCTURE-2STRUCTURE-3\n

and the script looks like this:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use autodie;

    # Open every file named on the command line.
    my @handles = map { open my $h, '<', $_; $h } @ARGV;
    while (@handles) {
        # Drop exhausted files, then read one line from each remaining file.
        @handles = grep { ! eof $_ } @handles;
        my @lines = map { my $v = <$_>; chomp $v; $v } @handles;
        print join(' ', @lines), "\n";
    }
    close $_ for @handles;

This should work for any inputs from the mapper. But after I used Hadoop streaming with the above code as my reducer, the job was successful but the output files were empty, and I couldn't find out why.

    bin/hadoop jar /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar \
        -mapper ./hadoopPknotsRG \
        -file /data/yehdego/hadoop-0.20.2/pknotsRG \
        -file /data/yehdego/hadoop-0.20.2/hadoopPknotsRG \
        -reducer ./reducer.pl \
        -file /data/yehdego/hadoop-0.20.2/reducer.pl \
        -input /user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt \
        -output /user/yehdego/RFR2-out \
        -verbose

Any help or suggestion is really appreciated. I am just stuck here for the weekend.

Regards,
Daniel T. Yehdego
Computational Science Program
University of Texas at El Paso, UTEP
dtyehd...@miners.utep.edu

From: ev...@yahoo-inc.com
To: common-user@hadoop.apache.org
Date: Thu, 28 Jul 2011 07:12:11 -0700
Subject: Re: Hadoop-streaming using binary executable c program

I am not completely sure what you are getting at. It looks like the output of your C program is (and this is just a guess):

NOTE: \t stands for the tab character, which streaming uses to separate the key from the value; \n stands for newline and is used to separate individual records.

    RNA-1\tSTRUCTURE-1\n
    RNA-2\tSTRUCTURE-2\n
    RNA-3\tSTRUCTURE-3\n
    ...

And you want the output to look like

    RNA-1RNA-2RNA-3\tSTRUCTURE-1STRUCTURE-2STRUCTURE-3\n

You could use a reduce to do this, but the issue here is with the shuffle in between the maps and the reduces. The shuffle will group by the key to send to the reducers and then sort by the key.

So in reality your map output looks something like

    FROM MAP 1:
    RNA-1\tSTRUCTURE-1\n
    RNA-2\tSTRUCTURE-2\n

    FROM MAP 2:
    RNA-3\tSTRUCTURE-3\n
    RNA-4\tSTRUCTURE-4\n

    FROM MAP 3:
    RNA-5\tSTRUCTURE-5\n
    RNA-6\tSTRUCTURE-6\n

If you send it to a single reducer (the only way to get a single file), then the input to the reducer will be sorted alphabetically by the RNA, and the order of the input will be lost. You can work around this by giving each line a unique number that is in the order you want it to be output. But doing this would require you to write some code. I would suggest that you do it with a small shell script after all the maps have completed to splice them together.

--
Bobby
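One possible reason the Perl reducer above produced empty files: it opens the files named in @ARGV, whereas a streaming reducer receives its records on stdin, so with no arguments its loop never runs. The numbering workaround suggested above could be sketched as follows. The record layout (number, sequence, structure, separated by tabs) and the function name are assumptions for illustration; how the number gets attached before the map step is left open here.

```shell
#!/bin/sh
# concat_in_order: read "NUMBER<TAB>SEQUENCE<TAB>STRUCTURE" records on
# stdin, sort them numerically by NUMBER, and print a single line with
# all sequences glued together, a tab, then all structures glued together.
# (Hypothetical record layout; adjust the field numbers to the real data.)
concat_in_order() {
    tab=$(printf '\t')
    LC_ALL=C sort -t "$tab" -k1,1n |
    awk -F '\t' '
        { seq = seq $2; str = str $3 }
        END { printf "%s\t%s\n", seq, str }
    '
}

# Records arriving out of order are glued back in numeric order:
printf '2\tRNA-2\tSTRUCTURE-2\n1\tRNA-1\tSTRUCTURE-1\n3\tRNA-3\tSTRUCTURE-3\n' | concat_in_order
```

The same function could serve either as a single streaming reducer or as the post-processing script run over the collected map outputs.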
RE: Hadoop-streaming using binary executable c program
Hi Bobby,

I just want to ask you if there is a way of using a reducer, or something like concatenation, to glue together the outputs from my mapper and output them as a single file and a single segment of the predicted RNA 2D structure.

FYI: I have used -reducer NONE before:

    HADOOP_HOME$ bin/hadoop jar /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar \
        -mapper ./hadoopPknotsRG \
        -file /data/yehdego/hadoop-0.20.2/pknotsRG \
        -file /data/yehdego/hadoop-0.20.2/hadoopPknotsRG \
        -input /user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt \
        -output /user/yehdego/RF-out \
        -reducer NONE \
        -verbose

and a sample of my mapper output from two different slave nodes looks like this:

    AUACCCGCAAAUUCACUCAAAUCUGUAAUAGGUUUGUCAUUCAAAUCUAGUGCAAAUAUUACUUUCGCCAAUUAGGUAUAAUAAUGGUAAGC
    [...(((...))).]. (-13.46)

and

    GGGACAAGACUCGACAUUUGAUACACUAUUUAUCAAUGGAUGUCUUCU
    .(((.((......).. (-11.00)

and I want to concatenate them and output them as a single predicted RNA sequence structure:

    AUACCCGCAAAUUCACUCAAAUCUGUAAUAGGUUUGUCAUUCAAAUCUAGUGCAAAUAUUACUUUCGCCAAUUAGGUAUAAUAAUGGUAAGCGGGACAAGACUCGACAUUUGAUACACUAUUUAUCAAUGGAUGUCUUCU
    [...(((...))).]..(((.((......)..

Regards,
Daniel T. Yehdego
Computational Science Program
University of Texas at El Paso, UTEP
dtyehd...@miners.utep.edu
RE: Hadoop-streaming using binary executable c program
Good afternoon Bobby, Thanks so much, now its working excellent. And the speed is also reasonable. Once again thanks u. Regards, Daniel T. Yehdego Computational Science Program University of Texas at El Paso, UTEP dtyehd...@miners.utep.edu From: ev...@yahoo-inc.com To: common-user@hadoop.apache.org Date: Mon, 25 Jul 2011 14:47:34 -0700 Subject: Re: Hadoop-streaming using binary executable c program This is likely to be slow and it is not ideal. The ideal would be to modify pknotsRG to be able to read from stdin, but that may not be possible. The shell script would probably look something like the following #!/bin/sh rm -f temp.txt; while read line do echo $line temp.txt; done exec pknotsRG temp.txt; Place it in a file say hadoopPknotsRG Then you probably want to run chmod +x hadoopPknotsRG After that you want to test it with hadoop fs -cat /user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt | head -2 | ./hadoopPknotsRG If that works then you can try it with Hadoop streaming HADOOP_HOME$ bin/hadoop jar /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar -mapper ./hadoopPknotsRG -file /data/yehdego/hadoop-0.20.2/pknotsRG -file /data/yehdego/hadoop-0.20.2/hadoopPknotsRG -input /user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output /user/yehdego/RF-out -reducer NONE -verbose --Bobby On 7/25/11 3:37 PM, Daniel Yehdego dtyehd...@miners.utep.edu wrote: Good afternoon Bobby, Thanks, you gave me a great help in finding out what the problem was. After I put the command line you suggested me, I found out that there was a segmentation error. The binary executable program pknotsRG only reads a file with a sequence in it. This means, there should be a shell script, as you have said, that will take the data coming from stdin and write it to a temporary file. Any idea on how to do this job in shell script. The thing is I am from a biology background and don't have much experience in CS. looking forward to hear from you. Thanks so much. 
Regards, Daniel T. Yehdego Computational Science Program University of Texas at El Paso, UTEP dtyehd...@miners.utep.edu From: ev...@yahoo-inc.com To: common-user@hadoop.apache.org Date: Fri, 22 Jul 2011 12:39:08 -0700 Subject: Re: Hadoop-streaming using binary executable c program I would suggest that you do the following to help you debug. hadoop fs -cat /user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt | head -2 | /data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG - This is simulating what hadoop streaming is doing. Here we are taking the first 2 lines out of the input file and feeding them to the stdin of pknotsRG. The first step is to make sure that you can get your program to run correctly with something like this. You may need to change the command line to pknotsRG to get it to read the data it is processing from stdin, instead of from a file. Alternatively you may need to write a shell script that will take the data coming from stdin. Write it to a file and then call pknotsRG on that temporary file. Once you have this working then you should try it again with streaming. --Bobby Evans On 7/22/11 12:31 PM, Daniel Yehdego dtyehd...@miners.utep.edu wrote: Hi Bobby, Thanks for the response. 
After I tried the following command:

    bin/hadoop jar $HADOOP_HOME/hadoop-0.20.2-streaming.jar -mapper /data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG - -file /data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG -reducer NONE -input /user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output /user/yehdego/RF-out -verbose

I got the following stderr logs:

    java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 139
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

syslog logs:

    2011-07-22 13:02:27,467 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
    2011-07-22 13:02:27,913 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
    2011-07-22 13:02:28,149 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed exec [/data/yehdego/hadoop_tmp/dfs/local/taskTracker/jobcache/job_201107181535_0079/attempt_201107181535_0079_m_00_0/work/./pknotsRG]
    2011-07-22
RE: Hadoop-streaming using binary executable c program
Good afternoon Bobby,

Thanks, you gave me great help in finding out what the problem was. After I ran the command line you suggested, I found out that there was a segmentation fault. The binary executable program pknotsRG only reads a file with a sequence in it. This means there should be a shell script, as you have said, that will take the data coming from stdin and write it to a temporary file. Any idea on how to do this job in a shell script? The thing is, I am from a biology background and don't have much experience in CS. Looking forward to hearing from you. Thanks so much.

Regards,
Daniel T. Yehdego
Computational Science Program
University of Texas at El Paso, UTEP
dtyehd...@miners.utep.edu

From: ev...@yahoo-inc.com
To: common-user@hadoop.apache.org
Date: Fri, 22 Jul 2011 12:39:08 -0700
Subject: Re: Hadoop-streaming using binary executable c program

I would suggest that you do the following to help you debug:

    hadoop fs -cat /user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt | head -2 | /data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG -

This is simulating what hadoop streaming is doing. Here we are taking the first 2 lines out of the input file and feeding them to the stdin of pknotsRG. The first step is to make sure that you can get your program to run correctly with something like this. You may need to change the command line to pknotsRG to get it to read the data it is processing from stdin, instead of from a file. Alternatively, you may need to write a shell script that will take the data coming from stdin, write it to a file, and then call pknotsRG on that temporary file. Once you have this working, you should try it again with streaming.

--Bobby Evans

On 7/22/11 12:31 PM, Daniel Yehdego dtyehd...@miners.utep.edu wrote:

Hi Bobby, Thanks for the response.
After I tried the following command:

    bin/hadoop jar $HADOOP_HOME/hadoop-0.20.2-streaming.jar -mapper /data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG - -file /data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG -reducer NONE -input /user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output /user/yehdego/RF-out -verbose

I got the following stderr logs:

    java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 139
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

syslog logs:

    2011-07-22 13:02:27,467 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
    2011-07-22 13:02:27,913 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
    2011-07-22 13:02:28,149 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed exec [/data/yehdego/hadoop_tmp/dfs/local/taskTracker/jobcache/job_201107181535_0079/attempt_201107181535_0079_m_00_0/work/./pknotsRG]
    2011-07-22 13:02:28,242 INFO org.apache.hadoop.streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
    2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: MROutputThread done
    2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: MRErrorThread done
    2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed failed!
2011-07-22 13:02:28,361 WARN org.apache.hadoop.mapred.TaskTracker: Error running child java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 139 at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311) at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545) at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57) at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170) 2011-07-22 13:02:28,395 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task Regards, Daniel T. Yehdego Computational Science Program University of Texas at El Paso, UTEP dtyehd...@miners.utep.edu From: ev...@yahoo-inc.com To: common-user@hadoop.apache.org; dtyehd...@miners.utep.edu Date: Fri, 22 Jul 2011 09:12:18
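The wrapper idea raised in this thread (take whatever arrives on stdin, write it to a temporary file, then run the file-only program on that file) might be sketched roughly as below. This is only an illustration under assumptions: `run_on_tempfile` is a made-up helper name, `mktemp` is assumed to be available on the task nodes, and the tool is assumed to accept a single file path as its argument.

```shell
#!/bin/sh
# Sketch of a stdin-to-file wrapper for a mapper that only accepts a file name.
# run_on_tempfile is a hypothetical helper, not part of Hadoop or pknotsRG.
run_on_tempfile() {
    tool="$1"                      # program to run, e.g. ./pknotsRG (assumed path)
    tmp=$(mktemp) || return 1      # unique temp file, so concurrent tasks do not collide
    cat > "$tmp"                   # copy everything arriving on stdin into the file
    "$tool" "$tmp" < /dev/null     # run the tool on the file; its stdout is the mapper output
    status=$?
    rm -f "$tmp"                   # clean up before returning
    return "$status"               # propagate the tool's exit status to the caller
}
```

In a script shipped with -file, the last line would presumably be something like `run_on_tempfile ./pknotsRG`. Using a unique temporary file rather than a fixed name is a design choice here, since several map tasks may run on the same node.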
Hadoop-streaming with a c binary executable as a mapper
Hi,

I am using hadoop-streaming to parallelize a large RNA data set. I am using a C binary executable program called pknotsRG as my mapper. My command to run the job looks like:

    HADOOP_HOME$ bin/hadoop jar /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar -mapper /data/yehdego/hadoop-0.20.2/pknotsRG -file /data/yehdego/hadoop-0.20.2/pknotsRG -input /user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output /user/yehdego/RF-out -reducer NONE -verbose

and I keep getting the following error messages:

    java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

FYI: I am inputting a file with lines of sequences, and the mapper is expected to take each line, run on it, and predict its 2D secondary structure. I tried the executable locally and it worked:

    [yehdego@bulgaria hadoop-0.20.2]$ ./pknotsRG RF00028_B.bpseqL3G5_seg_Centered_Method.txt
    AUGACUCUCUAAAUUGCUUUACCUUUGGAGGGGUUAUCAGGCCUGCACCUGAUAGCUAGUCUUUAAACCAAUAGAUUGCAUCGGUUUAAUA
    (..)...((..))[.{{...].}}...
    GCAAGACCGUCAAAUUGCGGGGGGU
    .....
    CAACAGCCGUUCAGUACCAAGUCUCAA
    ..((.((.(()).)).)).
    AACUUUGAGAUGGCCUUGCAAAGGAUAUGGUAAUAAGCUGACGGACAGGGUCCUAACCACGCAGCCAAGUCCUAAGUCAACAUUU
    ..[[[.]]](.......)).))..)....
    CGGUGUUGAUAUGGAUGCAGUUCACAGACUAAAUGUCGGUCAAGAAUAGGUAUUCUUCUCAUAAGAUAUAGUCGGACCUCUCCUUAAUGGGAGCU
    .(((...(...)..(...())..))).()..
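Running the binary with a file name as its argument, as in the local test above, does not exercise the way Hadoop streaming hands data to a mapper, which is via stdin. A rough local check might look like the sketch below. It assumes a local copy of the input file (for a file on HDFS, `hadoop fs -cat` would play the role of the file read), and `feed_first_lines` is a made-up helper name.

```shell
#!/bin/sh
# Sketch: pipe the first few lines of a local file into a candidate mapper,
# roughly imitating how streaming feeds records on stdin.
# feed_first_lines is a hypothetical helper, not a Hadoop command.
feed_first_lines() {
    count="$1"                     # how many input lines to send
    input="$2"                     # local input file (assumed to exist)
    shift 2                        # the remaining arguments are the mapper command
    head -n "$count" "$input" | "$@"
}
```

For example, `feed_first_lines 2 sequences.txt ./pknotsRG -` (file name hypothetical) would send two lines to the program on stdin. The function's exit status is that of the mapper command, so a non-zero status here would likely show up as a "subprocess failed with code N" error under streaming as well.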
Hadoop-streaming using binary executable c program
I am trying to parallelize some very long RNA sequences in order to predict their 2D structures. I am using a binary executable C program called pknotsRG as my mapper. I tried the following bin/hadoop command:

    HADOOP_HOME$ bin/hadoop jar /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar -mapper /data/yehdego/hadoop-0.20.2/pknotsRG -file /data/yehdego/hadoop-0.20.2/pknotsRG -input /user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output /user/yehdego/RF-out -reducer NONE -verbose

but I keep getting the following error message:

    java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

FYI: my input file is RF00028_B.bpseqL3G5_seg_Centered_Method.txt, which is a chunk of RNA sequences. The mapper is expected to take the input, process the file line by line, and output the predicted structure for each line of sequence, for a specified number of maps.

Any help on this problem is really appreciated. Thanks.
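If each input line really should be processed on its own, one possible variation is a wrapper that handles one record at a time and keeps going when a single sequence fails. The sketch below is an assumption-laden illustration: `map_each_line` is a made-up name, the tool is assumed to take one file path as its only argument, and whether per-line invocation is fast enough for a given data set is untested here.

```shell
#!/bin/sh
# Sketch: run a file-only tool once per input line, reporting failing lines on stderr.
# map_each_line is a hypothetical helper, not part of Hadoop or pknotsRG.
map_each_line() {
    tool="$1"                                  # e.g. ./pknotsRG (assumed path)
    tmp=$(mktemp) || return 1
    failures=0
    # The "|| [ -n ... ]" part also handles a final line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        [ -n "$line" ] || continue             # skip blank lines
        printf '%s\n' "$line" > "$tmp"         # one record per temp file
        # Redirect the tool's stdin so it cannot consume the remaining records.
        if ! "$tool" "$tmp" < /dev/null; then
            echo "mapper failed on: $line" >&2
            failures=$((failures + 1))
        fi
    done
    rm -f "$tmp"
    [ "$failures" -eq 0 ]                      # non-zero status if any record failed
}
```

Messages written to stderr should end up in the task's stderr log, which makes it easier to see which sequence triggered a failure.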
RE: Hadoop-streaming with a c binary executable as a mapper
Thanks Joey for your quick response.

I have tried the suggestion you gave me and it's still not working. After I run:

    bin/hadoop jar $HADOOP_HOME/hadoop-0.20.2-streaming.jar -mapper /data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG - -file /data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG -reducer NONE -input /user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output /user/yehdego/RF-out -verbose

I got the following task logs:

    java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 139
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

syslog logs:

    2011-07-22 13:02:27,467 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
    2011-07-22 13:02:27,913 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
    2011-07-22 13:02:28,149 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed exec [/data/yehdego/hadoop_tmp/dfs/local/taskTracker/jobcache/job_201107181535_0079/attempt_201107181535_0079_m_00_0/work/./pknotsRG]
    2011-07-22 13:02:28,242 INFO org.apache.hadoop.streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
    2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: MROutputThread done
    2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: MRErrorThread done
    2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed failed!
    2011-07-22 13:02:28,361 WARN org.apache.hadoop.mapred.TaskTracker: Error running child java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 139
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
    2011-07-22 13:02:28,395 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task

Regards,
Daniel T. Yehdego
Computational Science Program
University of Texas at El Paso, UTEP
dtyehd...@miners.utep.edu

CC: common-user@hadoop.apache.org
From: j...@cloudera.com
Subject: Re: Hadoop-streaming with a c binary executable as a mapper
Date: Fri, 22 Jul 2011 11:34:08 -0400
To: common-user@hadoop.apache.org

Your executable needs to read lines from standard input. Try setting your mapper like this:

    -mapper /data/yehdego/hadoop-0.20.2/pknotsRG -

If that doesn't work, you may need to execute your C program from a shell script. The - added to the command line says to read from STDIN.

-Joey

On Jul 22, 2011, at 10:41, Daniel Yehdego dtyehd...@miners.utep.edu wrote:

Hi, I am using hadoop-streaming to parallelize a large RNA data set. I am using a C binary executable program called pknotsRG as my mapper.
My command to run the job looks like: HADOOP_HOME$ bin/hadoop jar /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar -mapper /data/yehdego/hadoop-0.20.2/pknotsRG -file /data/yehdego/hadoop-0.20.2/pknotsRG -input /user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output /user/yehdego/RF-out -reducer NONE -verbose and I keep getting the following error messages: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1 at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311) at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545) at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57) at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170) FYI: I am inputing a file with lines of sequences and the mapper is expected to take each line and execute and predict their 2D secondary structure. I tried
RE: Hadoop-streaming using binary executable c program
Hi Bobby,

Thanks for the response. After I tried the following command:

    bin/hadoop jar $HADOOP_HOME/hadoop-0.20.2-streaming.jar -mapper /data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG - -file /data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG -reducer NONE -input /user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output /user/yehdego/RF-out -verbose

I got the following stderr logs:

    java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 139
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

syslog logs:

    2011-07-22 13:02:27,467 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
    2011-07-22 13:02:27,913 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
    2011-07-22 13:02:28,149 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed exec [/data/yehdego/hadoop_tmp/dfs/local/taskTracker/jobcache/job_201107181535_0079/attempt_201107181535_0079_m_00_0/work/./pknotsRG]
    2011-07-22 13:02:28,242 INFO org.apache.hadoop.streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
    2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: MROutputThread done
    2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: MRErrorThread done
    2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed failed!
    2011-07-22 13:02:28,361 WARN org.apache.hadoop.mapred.TaskTracker: Error running child java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 139
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
    2011-07-22 13:02:28,395 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task

Regards,
Daniel T. Yehdego
Computational Science Program
University of Texas at El Paso, UTEP
dtyehd...@miners.utep.edu

From: ev...@yahoo-inc.com
To: common-user@hadoop.apache.org; dtyehd...@miners.utep.edu
Date: Fri, 22 Jul 2011 09:12:18 -0700
Subject: Re: Hadoop-streaming using binary executable c program

It looks like it tried to run your program and the program exited with a 1, not a 0. What are the stderr logs like for the mappers that were launched? You should be able to access them through the Web GUI. You might also want to add some stderr log messages to your C program, so that you can see how far it gets before exiting.

--Bobby Evans

On 7/22/11 9:19 AM, Daniel Yehdego dtyehd...@miners.utep.edu wrote:

I am trying to parallelize some very long RNA sequences in order to predict their 2D structures. I am using a binary executable C program called pknotsRG as my mapper.
I tried the following bin/hadoop command: HADOOP_HOME$ bin/hadoop jar /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar -mapper /data/yehdego/hadoop-0.20.2/pknotsRG -file /data/yehdego/hadoop-0.20.2/pknotsRG -input /user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output /user/yehdego/RF-out -reducer NONE -verbose but i keep getting the following error message: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1 at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311) at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545) at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57) at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170) FYI: my input file is RF00028_B.bpseqL3G5_seg_Centered_Method.txt which is a chunk of RNA sequences and the mapper is expected
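The "subprocess failed with code N" values seen in this thread can be read using the usual Unix shell conventions: values above 128 normally mean the process was terminated by signal N minus 128 (so 139 corresponds to signal 11, SIGSEGV on Linux, which matches the segmentation fault reported above), 126 means the command was found but could not be executed, and small values such as 1 are whatever the program itself chose to return. A small decoding sketch follows; `describe_exit_code` is a made-up helper name, and the mapping assumes Hadoop passes through the exit value as the operating system reports it.

```shell
#!/bin/sh
# Sketch: translate a subprocess exit code into a short description,
# following common shell conventions (not a Hadoop API).
describe_exit_code() {
    code="$1"
    if [ "$code" -eq 0 ]; then
        echo "success"
    elif [ "$code" -eq 126 ]; then
        echo "command found but could not be executed"
    elif [ "$code" -eq 127 ]; then
        echo "command not found"
    elif [ "$code" -gt 128 ]; then
        echo "terminated by signal $((code - 128))"
    else
        echo "program-specific failure"
    fi
}
```

For instance, `describe_exit_code 139` prints "terminated by signal 11". A 126 is often a hint to check the execute permission (`chmod +x`) on the shipped file or the interpreter line of a wrapper script.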