I understand now. Per my test, though, the job prints the min value rather than the max value. In the stdout below, 3 is the year (I faked the data myself), 99 is the max and 0 is the min, and year 3 has 100 records. So inside a group the key can change as the values are iterated, and context.write(key, NullWritable.get()), which I call after the loop, writes the LAST key to the output. Since the temperatures are ordered descending, the last key holds the min temperature.
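A minimal sketch of the effect described above, in plain Java with no Hadoop classes. It assumes the framework hands the reducer a single key object and overwrites its fields each time the value iterator advances; Pair, group, and the two helper methods are hypothetical names that only simulate that behaviour.

```java
import java.util.Iterator;

public class KeyReuseDemo {
    // Hypothetical stand-in for the IntPair key: a mutable (year, temperature) pair.
    static final class Pair {
        int first, second;
        void set(int f, int s) { first = f; second = s; }
    }

    // Simulates one reduce group. The shared key starts as the first record,
    // and every call to next() overwrites it with the record being consumed.
    static Iterable<Object> group(Pair key, int[][] records) {
        key.set(records[0][0], records[0][1]);
        return () -> new Iterator<Object>() {
            int i = 0;
            public boolean hasNext() { return i < records.length; }
            public Object next() {
                key.set(records[i][0], records[i][1]);
                i++;
                return null; // stands in for NullWritable
            }
        };
    }

    // Reads the key after the loop, as the reducer in this message does.
    static int temperatureAfterLoop(int[][] records) {
        Pair key = new Pair();
        for (Object ignored : group(key, records)) { /* just consume the values */ }
        return key.second;
    }

    // Copies the temperature before the iterator is touched, then consumes the values.
    static int temperatureBeforeLoop(int[][] records) {
        Pair key = new Pair();
        Iterable<Object> values = group(key, records);
        int captured = key.second;
        for (Object ignored : values) { /* key keeps changing, captured does not */ }
        return captured;
    }

    // Builds n records for one year with temperatures n-1 down to 0 (faked data).
    static int[][] descending(int year, int n) {
        int[][] records = new int[n][];
        for (int i = 0; i < n; i++) records[i] = new int[] {year, n - 1 - i};
        return records;
    }

    public static void main(String[] args) {
        int[][] records = descending(3, 100);
        System.out.println("after loop:  " + temperatureAfterLoop(records));  // 0
        System.out.println("before loop: " + temperatureBeforeLoop(records)); // 99
    }
}
```

If that assumption holds, calling context.write(key, NullWritable.get()) before iterating the values, or copying key.getSecond() first, should emit the max instead of the min.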
3 99
........
3 0
number of records for this group 100
-----------------biggest key is--------------------------
3 0

    public void reduce(IntPair key, Iterable<NullWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int count = 0;
      for (NullWritable iw : values) {
        count++;
        System.out.print(key.getFirst());
        System.out.print(' ');
        System.out.println(key.getSecond());
      }
      System.out.println("number of records for this group " + Integer.toString(count));
      System.out.println("-----------------biggest key is--------------------------");
      System.out.print(key.getFirst());
      System.out.print(' ');
      System.out.println(key.getSecond());
      context.write(key, NullWritable.get());
    }

At 2011-08-03 11:41:23,"Daniel,Wu" <hadoop...@163.com> wrote:
>or I should ask, should the input of the reducer for the group of year 1900 be
>like
>key, value pair
>(1900,35), null
>(1900,34), null
>(1900,33), null
>
>
>or like
>(1900,35), null
>(1900,35), null ==> since (1900,34) is for the same group as (1900,35), so
>it uses (1900,35) as the key.
>(1900,35), null
>
>
>At 2011-08-03 10:35:51,"Daniel,Wu" <hadoop...@163.com> wrote:
>>
>>So the key of a group is determined by the first coming record in the group,
>>if we have 3 records in a group
>>1: (1900,35)
>>2: (1900,34)
>>3: (1900,33)
>>
>>if (1900,35) comes in as the first row, then the result key will be
>>(1900,35), when the second row (1900,34) comes in, it won't impact the
>>key of the group, meaning it will not overwrite the key (1900,35) to
>>(1900,34), correct?
>>
>>>in the KeyComparator, these are guaranteed to come in reverse order in the
>>>second slot. That is, if 35 is the maximum temperature then (1900,35) will
>>>come before ANY other (1900,t). Then as the GroupComparator does its
>>>thing, any time (1900,t) comes up it gets compared AND FOUND EQUAL TO
>>>(1900,35), and thus its (null) value is added to the (1900,35) group.
>>>
>>>The reducer then gets a (1900,35) key with an Iterable of null values,
>>>which it pretty much discards and just emits the key, which contains the
>>>maximum value.
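
The ordering described in the quoted explanation can be sketched in plain Java, without Hadoop. This is only an illustration under assumptions: KEY_ORDER and GROUP_ORDER are hypothetical stand-ins for the KeyComparator and GroupComparator, each record is an int[] {year, temperature}, and the 1901 records are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SecondarySortSketch {
    // Stand-in for the KeyComparator: year ascending, then temperature descending.
    static final Comparator<int[]> KEY_ORDER = (a, b) -> {
        int byYear = Integer.compare(a[0], b[0]);
        return byYear != 0 ? byYear : Integer.compare(b[1], a[1]);
    };

    // Stand-in for the GroupComparator: only the year decides group membership.
    static final Comparator<int[]> GROUP_ORDER = (a, b) -> Integer.compare(a[0], b[0]);

    // Sorts with KEY_ORDER, then returns the first record of every group.
    static List<int[]> firstKeyPerGroup(List<int[]> records) {
        List<int[]> sorted = new ArrayList<>(records);
        sorted.sort(KEY_ORDER);
        List<int[]> firsts = new ArrayList<>();
        int[] groupHead = null;
        for (int[] r : sorted) {
            // A new group starts whenever the grouping comparator reports a difference.
            if (groupHead == null || GROUP_ORDER.compare(groupHead, r) != 0) {
                groupHead = r;
                firsts.add(r);
            }
        }
        return firsts;
    }

    public static void main(String[] args) {
        List<int[]> records = Arrays.asList(
            new int[] {1900, 33}, new int[] {1901, 12}, new int[] {1900, 35},
            new int[] {1900, 34}, new int[] {1901, 20});
        for (int[] p : firstKeyPerGroup(records)) {
            System.out.println(p[0] + " " + p[1]); // prints "1900 35" then "1901 20"
        }
    }
}
```

The first record of each group carries the maximum temperature only because the sort puts it first; a reducer that reads the key after consuming all values sees whatever record came last.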