Hi Aaron,

I guess it can be done using counters. You can define a counter for each node in your cluster and then, in the map method, increment a node-specific counter by checking the hostname or IP address. It's not a very good solution, as you will need to modify your code whenever a node is added to or removed from the cluster, and there will be as many if conditions in the code as there are nodes. You can try this out if you do not find a cleaner solution. I wish this counter were part of the predefined counters.
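A minimal sketch of this counter idea against the old mapred API. The counter group name ("BytesPerNode") and the hostname lookup are my own illustration, not from the thread, and it assumes a Hadoop version where `Reporter.incrCounter(String, String, long)` is available. Using the hostname itself as the counter name sidesteps the per-node if chain:

```java
// Sketch: track how much input data each node processes, via a
// user-defined counter whose name is the task's own hostname.
import java.io.IOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class NodeCountingMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();
  private String hostname;

  @Override
  public void configure(JobConf job) {
    try {
      // Each map task runs on some node; its hostname names the counter.
      hostname = InetAddress.getLocalHost().getHostName();
    } catch (UnknownHostException e) {
      hostname = "unknown";
    }
  }

  public void map(LongWritable key, Text value,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    // One counter per node, grouped under "BytesPerNode" in the
    // job's counter listing; counts bytes of input seen on this node.
    reporter.incrCounter("BytesPerNode", hostname, value.getLength());

    StringTokenizer tok = new StringTokenizer(value.toString());
    while (tok.hasMoreTokens()) {
      word.set(tok.nextToken());
      output.collect(word, ONE);
    }
  }
}
```

After the job finishes, the per-node totals show up alongside the built-in counters in the JobClient output, with no code change needed when nodes join or leave.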
Regards,
Ajay Srivastava

On 30-Mar-2012, at 12:49 AM, aaron_v wrote:
>
> Hi people, I am new to Nabble and Hadoop. I was having a look at the wordcount
> program. Can someone please let me know how to find which data gets mapped
> to which node? In the sense, I have a master node 0 and 4 other nodes 1-4,
> and I ran the wordcount successfully. But I would like to print, for each
> node, how much data it got from the input data file. Any suggestions?
>
> us latha wrote:
>>
>> Hi,
>>
>> Inside the Map method, performed the following change for Example: WordCount v1.0
>> at http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#Example%3A+WordCount+v1.0
>> ------------------
>> String filename = new String();
>> ...
>> filename = ((FileSplit) reporter.getInputSplit()).getPath().toString();
>> while (tokenizer.hasMoreTokens()) {
>>     word.set(tokenizer.nextToken() + " " + filename);
>> --------------------
>>
>> Worked great!! Thanks to everyone!
>>
>> Regards,
>> Srilatha
>>
>>
>> On Sat, Oct 18, 2008 at 6:24 PM, Latha <usla...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> Thank you for your valuable inputs in suggesting the possible solutions
>>> for creating an index file with the following format:
>>> word1 filename count
>>> word2 filename count
>>>
>>> However, the following is not working for me. Please help me to resolve
>>> the same.
>>>
>>> --------------------------
>>> public static class Map extends MapReduceBase
>>>     implements Mapper<LongWritable, Text, Text, Text> {
>>>   private Text word = new Text();
>>>   private Text filename = new Text();
>>>
>>>   public void map(LongWritable key, Text value,
>>>       OutputCollector<Text, Text> output, Reporter reporter)
>>>       throws IOException {
>>>     filename.set(((FileSplit) reporter.getInputSplit()).getPath().toString());
>>>     String line = value.toString();
>>>     StringTokenizer tokenizer = new StringTokenizer(line);
>>>     while (tokenizer.hasMoreTokens()) {
>>>       word.set(tokenizer.nextToken());
>>>       output.collect(word, filename);
>>>     }
>>>   }
>>> }
>>>
>>> public static class Reduce extends MapReduceBase
>>>     implements Reducer<Text, Text, Text, Text> {
>>>   public void reduce(Text key, Iterator<Text> values,
>>>       OutputCollector<Text, Text> output, Reporter reporter)
>>>       throws IOException {
>>>     int sum = 0;
>>>     Text filename;
>>>     while (values.hasNext()) {
>>>       sum++;
>>>       filename.set(values.next().toString());
>>>     }
>>>     String file = filename.toString() + " " + (new IntWritable(sum)).toString();
>>>     filename = new Text(file);
>>>     output.collect(key, filename);
>>>   }
>>> }
>>>
>>> --------------------------
>>> 08/10/18 05:38:25 INFO mapred.JobClient: Task Id :
>>> task_200810170342_0010_m_000000_2, Status : FAILED
>>> java.io.IOException: Type mismatch in value from map: expected
>>> org.apache.hadoop.io.IntWritable, recieved org.apache.hadoop.io.Text
>>>   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:427)
>>>   at org.myorg.WordCount$Map.map(WordCount.java:23)
>>>   at org.myorg.WordCount$Map.map(WordCount.java:13)
>>>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>>>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
>>>   at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)
>>>
>>> Thanks
>>> Srilatha
>>>
>>>
>>> On Mon, Oct 6, 2008 at 11:38 AM, Owen O'Malley
>>> <omal...@apache.org> wrote:
>>>
>>>> On Sun, Oct 5, 2008 at 12:46 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>>
>>>>> What you need to do is snag access to the filename in the configure
>>>>> method of the mapper.
>>>>
>>>> You can also do it in the map method with:
>>>>
>>>>     ((FileSplit) reporter.getInputSplit()).getPath()
>>>>
>>>>> Then instead of outputting just the word as the key, output a pair
>>>>> containing the word and the file name as the key. Everything
>>>>> downstream should remain the same.
>>>>
>>>> If you want to have each file handled by a single reduce, I'd suggest:
>>>>
>>>>     class FileWordPair implements Writable {
>>>>       private Text fileName;
>>>>       private Text word;
>>>>       ...
>>>>       public int hashCode() {
>>>>         return fileName.hashCode();
>>>>       }
>>>>     }
>>>>
>>>> so that the HashPartitioner will send the records for file Foo to a
>>>> single reducer. It would make sense to use this as an example for when
>>>> to use grouping comparators (for getting a single call to reduce for
>>>> each file) too...
>>>>
>>>> -- Owen
>
> --
> View this message in context:
> http://old.nabble.com/How-to-modify-hadoop-wordcount-example-to-display-File-wise-results.-tp19826857p33544888.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
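For reference, Owen's FileWordPair sketch could be filled in along these lines. This is a minimal sketch, not his actual code: the serialization methods, constructor, and compareTo are my additions (map keys must also be comparable, so it implements WritableComparable here rather than plain Writable), and the generic WritableComparable<T> form assumes a reasonably recent Hadoop release:

```java
// Hypothetical completion of the FileWordPair sketch from the thread.
// Hashing on fileName alone means HashPartitioner routes every record
// for a given file to the same reducer, as Owen describes.
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

public class FileWordPair implements WritableComparable<FileWordPair> {
  private Text fileName = new Text();
  private Text word = new Text();

  public FileWordPair() {}                       // required for Hadoop reflection

  public FileWordPair(String fileName, String word) {
    this.fileName.set(fileName);
    this.word.set(word);
  }

  public void write(DataOutput out) throws IOException {
    fileName.write(out);
    word.write(out);
  }

  public void readFields(DataInput in) throws IOException {
    fileName.readFields(in);
    word.readFields(in);
  }

  // Partition by file name only, not by (file, word).
  @Override
  public int hashCode() {
    return fileName.hashCode();
  }

  // Sort by file first, then word; a grouping comparator that compares
  // only fileName would then give one reduce call per file.
  public int compareTo(FileWordPair o) {
    int cmp = fileName.compareTo(o.fileName);
    return cmp != 0 ? cmp : word.compareTo(o.word);
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof FileWordPair && compareTo((FileWordPair) o) == 0;
  }
}
```

Incidentally, the earlier "Type mismatch in value from map: expected IntWritable, recieved Text" error typically means the driver still declares IntWritable as the map output value class; updating the job configuration (e.g. setOutputValueClass(Text.class), or setMapOutputValueClass when map and reduce output types differ) usually resolves it.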