Mapper2 doesn't wait for Mapper1; they start at the same time. Mapper2 finds
its first "real" record on its own by looking at the bytes it reads: the first
newline it sees ends a partial record, so the next byte starts a "real" record.
It discards everything up to and including that newline. Mapper1, in turn,
reads past the end of its split to finish its last record. Check the source
code of LineRecordReader for the details.
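
To make the rule concrete, here is a minimal sketch in Python. It is not the
actual Hadoop code: the function name read_split and the sample data are made
up for illustration, and backing up one byte before skipping is a
simplification I am assuming in order to keep a record that begins exactly on
a split boundary.

```python
def read_split(data: bytes, start: int, end: int) -> list:
    """Return the records owned by the split [start, end) of data.

    Hypothetical helper for illustration only, not a Hadoop API.
    """
    pos = start
    if start != 0:
        # Back up one byte so a record that begins exactly at `start` is kept,
        # then discard everything up to and including the first newline.
        newline = data.find(b"\n", start - 1)
        if newline == -1:
            return []  # no record begins inside this split
        pos = newline + 1

    records = []
    # Read every record that begins before `end`; the last one may run past it.
    while pos < end and pos < len(data):
        newline = data.find(b"\n", pos)
        stop = len(data) if newline == -1 else newline
        records.append(data[pos:stop])
        pos = stop + 1
    return records


if __name__ == "__main__":
    data = b"alpha\nbravo\ncharlie\ndelta\n"
    splits = [(0, 10), (10, 20), (20, 26)]
    # Order does not matter: each split is resolved without the others.
    for start, end in reversed(splits):
        print((start, end), read_split(data, start, end))
```

Each call only looks at the bytes from its own start offset onward, so no
mapper needs to know how far another mapper has read, and every record is
still processed exactly once.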

________________________________
From: Zhong Wang <wangzhong....@gmail.com>
To: core-user@hadoop.apache.org
Sent: Thursday, June 11, 2009 10:47:48 AM
Subject: Re: Large size Text file split

> Mapper 2 starts reading at byte 10000. It finds the first newline at byte
> 10020, so the first "real" record it processes starts at byte 10021.
>

There's one problem: how does Mapper2 know the "real" record starts at
10021 before Mapper1 reaches the end of Split1 (9999)? Mappers start at
the same time.


-- 
Zhong Wang



      
