Re:Re:Re:Re:Re: one question in the book of "hadoop:definitive guide 2 edition"

Daniel,Wu Wed, 03 Aug 2011 01:07:37 -0700

I understand now. My test also shows that the job prints the min value instead
of the max value. In stdout I can see the data below: 3 is the year (I faked
the data myself), 99 is the max, and 0 is the min. For year 3 there are 100
records. So inside a group the key can differ from record to record, and
context.write(key, NullWritable.get()) writes the LAST key to the output.
Since the temperatures are sorted in descending order, the last key has the
min temperature.


3 99
........
3 0
number of records for this group 100
-----------------biggest key is--------------------------
3 0


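    @Override
    public void reduce(IntPair key, Iterable<NullWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int count = 0;
      for (NullWritable ignored : values) {
        count++;
        // The key object is reused: advancing the values iterator overwrites
        // it with the next record's key, so this prints every (year, temp)
        // pair in the group.
        System.out.print(key.getFirst());
        System.out.print(' ');
        System.out.println(key.getSecond());
      }
      System.out.println("number of records for this group " + count);
      // Despite the label, after the loop the key holds the LAST key of the
      // group, i.e. the minimum temperature (temperatures are sorted descending).
      System.out.println("-----------------biggest key is--------------------------");
      System.out.print(key.getFirst());
      System.out.print(' ');
      System.out.println(key.getSecond());
      context.write(key, NullWritable.get());
    }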
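A minimal plain-Java simulation of one reduce() call (no Hadoop classes involved) shows why the loop above ends on the min. It assumes, as described in this thread, that one mutable key object is reused and overwritten with each record's key as the values are iterated; `MutablePair` and `reduceGroup` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class KeyReuseDemo {

    // Hypothetical stand-in for the book's IntPair: a mutable (year, temperature) key.
    static final class MutablePair {
        int first;
        int second;

        void set(int first, int second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public String toString() {
            return first + " " + second;
        }
    }

    // Simulates one reduce() call for a single group whose records are already
    // sorted with temperature descending. Returns {key before loop, key after loop}.
    public static String[] reduceGroup(List<int[]> sortedRecords) {
        MutablePair key = new MutablePair();
        // When reduce() starts, the reused key holds the group's first record.
        key.set(sortedRecords.get(0)[0], sortedRecords.get(0)[1]);
        String before = key.toString();

        // Iterating the values deserializes each record's key into the SAME
        // object (a no-op change for the first record), so the key moves as we loop.
        for (int[] record : sortedRecords) {
            key.set(record[0], record[1]);
        }
        String after = key.toString();
        return new String[] {before, after};
    }

    public static void main(String[] args) {
        // Fake data like the test above: year 3, temperatures 99 down to 0.
        List<int[]> group = new ArrayList<>();
        for (int t = 99; t >= 0; t--) {
            group.add(new int[] {3, t});
        }
        String[] result = reduceGroup(group);
        System.out.println("key before loop: " + result[0]); // 3 99 (the max)
        System.out.println("key after loop:  " + result[1]); // 3 0 (the min)
    }
}
```

Under the same assumption, the fix is either to write the key before touching the values (the book's reducer never iterates them; it just emits the key), or to copy the key's fields into local variables before the loop if the first key is needed afterwards.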




At 2011-08-03 11:41:23,"Daniel,Wu" <hadoop...@163.com> wrote:
>Or, to put it another way: should the input to the reducer for the group of
>year 1900 look like
>key, value pair
>(1900,35), null
>(1900,34), null
>(1900,33), null
>
>
>or like
>(1900,35), null
>(1900,35), null    ==> since (1900,34) is in the same group as (1900,35),
>it uses (1900,35) as the key.
>(1900,35), null
>
>
>At 2011-08-03 10:35:51,"Daniel,Wu" <hadoop...@163.com> wrote:
>>
>>So the key of a group is determined by the first record that arrives in the
>>group. Say we have 3 records in a group:
>>1: (1900,35)
>>2: (1900,34)
>>3: (1900,33)
>>
>>If (1900,35) comes in as the first row, then the result key will be
>>(1900,35). When the second row (1900,34) comes in, it won't impact the
>>key of the group, meaning it will not overwrite the key (1900,35) with
>>(1900,34). Correct?
>>
>>>in the KeyComparator, these are guaranteed to come in reverse order in the
>>>second slot.  That is, if 35 is the maximum temperature then (1900,35) will
>>>come before ANY other (1900,t).  Then as the GroupComparator does its
>>>thing, any time (1900,t) comes up it gets compared AND FOUND EQUAL TO
>>>(1900,35), and thus its (null) value is added to the (1900,35) group.
>>>The reducer then gets a (1900,35) key with an Iterable of null values,
>>>which it pretty much discards and just emits the key, which contains the
>>>maximum value.
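The ordering described in the quote can be sketched with plain `java.util.Comparator` objects. This swaps in ordinary comparators for the book's KeyComparator and GroupComparator (which are Hadoop comparator classes working on IntPair keys), so it only illustrates the ordering, not the Hadoop API. `YearTemp` and `firstKeyPerGroup` are hypothetical names. Sorting puts year ascending and temperature descending; grouping compares the year only, so the first key of each group carries the maximum.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SecondarySortDemo {

    // Hypothetical (year, temperature) key, standing in for the book's IntPair.
    static final class YearTemp {
        final int year;
        final int temp;

        YearTemp(int year, int temp) {
            this.year = year;
            this.temp = temp;
        }

        @Override
        public String toString() {
            return "(" + year + "," + temp + ")";
        }
    }

    // Sort order (the KeyComparator role): year ascending, then temperature descending.
    static final Comparator<YearTemp> SORT =
        Comparator.comparingInt((YearTemp k) -> k.year)
                  .thenComparing((YearTemp a, YearTemp b) -> Integer.compare(b.temp, a.temp));

    // Grouping (the GroupComparator role): keys with the same year are "equal".
    static final Comparator<YearTemp> GROUP = Comparator.comparingInt(k -> k.year);

    // Sorts the keys, splits them into runs the grouping comparator treats as
    // equal, and returns the first key of each run.
    public static List<String> firstKeyPerGroup(List<YearTemp> keys) {
        List<YearTemp> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        List<String> firsts = new ArrayList<>();
        YearTemp groupStart = null;
        for (YearTemp k : sorted) {
            if (groupStart == null || GROUP.compare(groupStart, k) != 0) {
                groupStart = k; // a new group starts here
                firsts.add(k.toString());
            }
        }
        return firsts;
    }

    public static void main(String[] args) {
        List<YearTemp> keys = Arrays.asList(
            new YearTemp(1900, 34), new YearTemp(1901, 20),
            new YearTemp(1900, 35), new YearTemp(1900, 33),
            new YearTemp(1901, 25));
        System.out.println(firstKeyPerGroup(keys)); // [(1900,35), (1901,25)]
    }
}
```

This reproduces only the "first key of the group is the max" part; it says nothing about what the key object holds later in reduce().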
