Re: reading input for a map function from 2 different files?
Since you need to pass only one number (the average) to all mappers, you can pass it through the jobconf with a config variable that you define, say my.average.

- milind

On 11/11/08 8:25 PM, some speed [EMAIL PROTECTED] wrote:

Thanks for the response. What I am trying to do is find the average and then the standard deviation of a very large set (say a million) of numbers; the result would be used in further calculations. I have got the average from the first map-reduce chain. Now I need to read this average as well as the set of numbers to calculate the standard deviation. So one file would have the input set, and the other, resultant file would have just the average. Please do tell me if there is a better way of doing things than what I am doing. Any input/suggestion is appreciated. :)

On Mon, Nov 10, 2008 at 4:22 AM, Amar Kamat [EMAIL PROTECTED] wrote:

Amar Kamat wrote:

some speed wrote:
I was wondering if it was possible to read the input for a map function from 2 different files:
1st file --- a user-input file from a particular location (path)

Is the input/user file sorted? If yes, then you can use a map-side join for performance reasons. See org.apache.hadoop.mapred.join for more details.

2nd file --- a resultant file (has just one key,value pair) from a previous MapReduce job. (I am implementing a chained MapReduce function.)

Can you explain the contents of the 2nd file in more detail?

Now, for every key,value pair in the user-input file, I would like to use the same key,value pair from the 2nd file for some calculations.

Can you explain this in more detail? Can you give some abstracted example of what file1 and file2 look like and what operation/processing you want to do? I guess you might need to do some kind of join on the 2 files. Look at contrib/data_join for more details.

Amar

Is it possible for me to do so? Can someone guide me in the right direction please? Thanks!

--
Milind Bhandarkar
Y!IM: GridSolutions
408-349-2136
([EMAIL PROTECTED])
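In code, Milind's suggestion amounts to something like the following sketch against the old (pre-0.20) mapred API. The "my.average" key comes from the thread; the class name, the default value, and the squared-deviation output scheme are illustrative assumptions, not part of the original exchange:

    import java.io.IOException;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class StdDevMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, DoubleWritable> {

      private double average;

      @Override
      public void configure(JobConf job) {
        // The driver sets this after reading the first job's single output record:
        //   conf.set("my.average", Double.toString(average));
        average = Double.parseDouble(job.get("my.average", "0.0"));
      }

      @Override
      public void map(LongWritable key, Text value,
                      OutputCollector<Text, DoubleWritable> out, Reporter reporter)
          throws IOException {
        double x = Double.parseDouble(value.toString().trim());
        double d = x - average;
        // Emit squared deviations under a single key.
        out.collect(new Text("sqdev"), new DoubleWritable(d * d));
      }
    }

A single reducer can then average the squared deviations and take the square root to obtain the standard deviation.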
Re: Hadoop Streaming - running a jar file
You should specify A.jar on the bin/hadoop command line with -file A.jar, so that streaming knows to copy that file to the tasktracker node.

- milind

On 11/11/08 10:50 AM, Amit_Gupta [EMAIL PROTECTED] wrote:

Hi,

I have a jar file which takes input from stdin and writes something to stdout, i.e. when I run

    java -jar A.jar < input

it prints the required output. However, when I run it as a mapper in Hadoop streaming using the command

    $HADOOP_HOME/bin/hadoop jar streaming.jar -input ... -output ... -mapper 'java -jar A.jar' -reducer NONE

I get the broken pipe exception. The error message is:

    additionalConfSpec_:null
    null=@@@userJobConfProps_.get(stream.shipped.hadoopstreaming
    packageJobJar: [/mnt/hadoop/HADOOP/hadoop-0.16.3/tmp/dir/hadoop-hadoop/hadoop-unjar45410/] [] /tmp/streamjob45411.jar tmpDir=null
    08/11/11 23:20:14 INFO mapred.FileInputFormat: Total input paths to process : 1
    08/11/11 23:20:14 INFO streaming.StreamJob: getLocalDirs(): [/mnt/hadoop/HADOOP/hadoop-0.16.3/tmp/mapred]
    08/11/11 23:20:14 INFO streaming.StreamJob: Running job: job_20081724_0014
    08/11/11 23:20:14 INFO streaming.StreamJob: To kill this job, run:
    08/11/11 23:20:14 INFO streaming.StreamJob: /mnt/hadoop/HADOOP/hadoop-0.16.3/bin/../bin/hadoop job -Dmapred.job.tracker=10.105.41.25:54311 -kill job_20081724_0014
    08/11/11 23:20:15 INFO streaming.StreamJob: Tracking URL: http://sayali:50030/jobdetails.jsp?jobid=job_20081724_0014
    08/11/11 23:20:16 INFO streaming.StreamJob: map 0% reduce 0%
    08/11/11 23:21:00 INFO streaming.StreamJob: map 100% reduce 100%
    08/11/11 23:21:00 INFO streaming.StreamJob: To kill this job, run:
    08/11/11 23:21:00 INFO streaming.StreamJob: /mnt/hadoop/HADOOP/hadoop-0.16.3/bin/../bin/hadoop job -Dmapred.job.tracker=10.105.41.25:54311 -kill job_20081724_0014
    08/11/11 23:21:00 INFO streaming.StreamJob: Tracking URL: http://sayali:50030/jobdetails.jsp?jobid=job_20081724_0014
    08/11/11 23:21:00 ERROR streaming.StreamJob: Job not Successful!
    08/11/11 23:21:00 INFO streaming.StreamJob: killJob...
    Streaming Job Failed!

Could someone please help me with any ideas or pointers?

regards
Amit

--
View this message in context: http://www.nabble.com/Hadoop-Streamingrunning-a-jar-file-tp20445877p20445877.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.

--
Milind Bhandarkar
Y!IM: GridSolutions
408-349-2136
([EMAIL PROTECTED])
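Putting Milind's fix together with Amit's original command line, the invocation would look something like this (the input/output paths are placeholders, and "streaming.jar" stands for the actual streaming jar shipped under contrib/streaming in the Hadoop distribution):

    $HADOOP_HOME/bin/hadoop jar streaming.jar \
        -input myInput \
        -output myOutput \
        -mapper 'java -jar A.jar' \
        -reducer NONE \
        -file A.jar

The -file option ships A.jar into each task's working directory, so the mapper command 'java -jar A.jar' can find it on the tasktracker node.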
Re: Seeking Someone to Review Hadoop Article
Tom,

Please consider adding it to: http://wiki.apache.org/hadoop/HadoopArticles

Thanks,

- milind

On 11/2/08 5:57 PM, Tom Wheeler [EMAIL PROTECTED] wrote:

The article I've written about Hadoop has just been published: http://www.ociweb.com/jnb/jnbNov2008.html

I'd like to again thank Mafish Liu and Amit Kumar Saha for reviewing my draft and offering suggestions for helping me improve it. I hope the article is compelling, clear and technically accurate. However, if you notice anything in need of correction, please contact me offlist and I will address it ASAP.

Tom Wheeler

On Thu, Oct 23, 2008 at 5:31 PM, Tom Wheeler [EMAIL PROTECTED] wrote:

Each month the developers at my company write a short article about a Java technology we find exciting. I've just finished one about Hadoop for November and am seeking a volunteer knowledgeable about Hadoop to look it over to help ensure it's both clear and technically accurate. If you're interested in helping me, please contact me offlist and I will send you the draft. Meanwhile, you can get a feel for the length and general style of the articles from our archives: http://www.ociweb.com/articles/publications/jnb.html

Thanks in advance,

Tom Wheeler

--
Milind Bhandarkar
Y!IM: GridSolutions
408-349-2136
([EMAIL PROTECTED])
Re: Hadoop Camp next month
Hi,

I just received an email from the ApacheCon US organizers saying that they are giving a 50% discount on the Hadoop training (http://us.apachecon.com/c/acus2008/sessions/93):

"We just created a discount code for you to give people 50% off of the cost of the training. The code is Hadoop50. The discount would apply only to the Training."

Hope to see you there!

- Milind

On 10/2/08 9:03 AM, Owen O'Malley [EMAIL PROTECTED] wrote:

Hi all,

I'd like to remind everyone that the Hadoop Camp at ApacheCon US is coming up in New Orleans next month: http://tinyurl.com/hadoop-camp

It will be the largest gathering of Hadoop developers outside of California. We'll have:

Core: Doug Cutting, Dhruba Borthakur, Arun Murthy, Owen O'Malley, Sameer Paranjpye, Sanjay Radia, Tom White
ZooKeeper: Ben Reed

There will also be a training session on Practical Problem Solving with Hadoop by Milind Bhandarkar on Monday. So if you'd like to meet the developers or find out more about Hadoop, come join us!

-- Owen

--
Milind Bhandarkar
Y!IM: GridSolutions
408-349-2136
([EMAIL PROTECTED])
Re: Backup Tasks in Hadoop MapReduce.
Yes. In Hadoop, you can enable backup tasks by setting mapred.speculative.execution to true.

- milind

On 4/16/08 8:07 AM, Chaman Singh Verma [EMAIL PROTECTED] wrote:

Hello,

I am curious to know whether Hadoop MapReduce has the Backup Tasks feature described in the seminal paper "MapReduce: Simplified Data Processing on Large Clusters" (Dean and Ghemawat). Any implementation detail would be extremely valuable. Thanks.

With Regards,
Chaman Singh Verma
Poona, India

--
View this message in context: http://www.nabble.com/Backup-Tasks-in-Hadoop-MapReduce.-tp16722539p16722539.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.

- Milind

--
Milind Bhandarkar, Chief Spammer, Grid Team
Y!IM: GridSolutions
408-349-2136
([EMAIL PROTECTED])
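For reference, the property can also be set per job in the driver. A minimal sketch against the old (pre-0.20) mapred API; the class and job names are placeholders, and the job runs with the default identity mapper and reducer:

    import java.io.IOException;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class BackupTasksDemo {
      public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(BackupTasksDemo.class);
        conf.setJobName("backup-tasks-demo");

        // Enable speculative ("backup") execution of straggler tasks.
        conf.setBoolean("mapred.speculative.execution", true);
        // Equivalent helper on JobConf:
        // conf.setSpeculativeExecution(true);

        // Input and output paths come from the command line.
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
      }
    }

The same property can also be set cluster-wide in hadoop-site.xml.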
Re: Aborting Map Function
If you want to kill the whole job (I assume that's what you mean by aborting all map tasks) from a mapper, you can use:

    new JobClient(job).getJob(job.get("mapred.job.id")).killJob();

- milind

On 4/16/08 10:25 AM, Owen O'Malley [EMAIL PROTECTED] wrote:

On Apr 16, 2008, at 8:28 AM, Chaman Singh Verma wrote:

I am developing an application with MapReduce, and whenever some MapTask condition is met, I would like to broadcast to all other MapTasks to abort their work. I am not quite sure whether such broadcast functionality currently exists in Hadoop MapReduce. Could someone give some hints?

This is pretty atypical behavior, but you could have each map look for the existence of an HDFS file every minute or so. When the condition is true, create the file and your maps will exit within the next minute. Except on very large clusters, that wouldn't be too expensive...

-- Owen

- Milind

--
Milind Bhandarkar, Chief Spammer, Grid Team
Y!IM: GridSolutions
408-349-2136
([EMAIL PROTECTED])
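Fleshed out, a mapper using Milind's one-liner might look like the sketch below (old pre-0.20 mapred API; the sentinel test, class name, and output key are illustrative assumptions, not from the thread):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.RunningJob;

    public class AbortingMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

      private JobConf conf;

      @Override
      public void configure(JobConf job) {
        this.conf = job;  // keep a handle so map() can look up mapred.job.id
      }

      @Override
      public void map(LongWritable key, Text value,
                      OutputCollector<Text, Text> out, Reporter reporter)
          throws IOException {
        if (value.toString().contains("ABORT")) {  // illustrative condition
          RunningJob self = new JobClient(conf).getJob(conf.get("mapred.job.id"));
          self.killJob();  // asks the JobTracker to kill every task of this job
          return;
        }
        out.collect(new Text("ok"), value);
      }
    }

Note that killing the job discards its output; Owen's flag-file approach is gentler if the maps should finish cleanly once the condition is met.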