I guess this would be a better answer

A FileSplit is merely a description of the boundaries, e.g., "bytes 0 to
9999" and "bytes 10000 to 19999". The Mapper (more precisely, the
RecordReader supplied by the InputFormat) then interprets the boundaries
described by a FileSplit in a way that makes sense at the data level. The
FileSplit does not actually contain the data to be mapped over.
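To make the "description of the boundaries" point concrete, here is a minimal Python sketch. The helper name `compute_splits` is hypothetical, and this is a simplification: Hadoop's real FileInputFormat also takes the block size and the configured minimum/maximum split sizes into account. Each split is only a `(start, length)` pair, roughly what a FileSplit reports through `getStart()` and `getLength()`, and the boundaries are chosen purely by byte count, with no regard for where lines begin or end.

```python
from __future__ import annotations


def compute_splits(file_size: int, split_size: int) -> list[tuple[int, int]]:
    """Cut a file into fixed-size (start, length) byte ranges.

    Hypothetical helper: boundaries depend only on byte counts, never on
    where lines or records begin or end, so a boundary can land mid-line.
    No file data is read or copied; a split is just a pair of numbers.
    """
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    splits = []
    start = 0
    while start < file_size:
        # The last split may be shorter than split_size.
        length = min(split_size, file_size - start)
        splits.append((start, length))
        start += length
    return splits
```

For example, `compute_splits(25000, 10000)` returns `[(0, 10000), (10000, 10000), (20000, 5000)]`, matching the "bytes 0 to 9999" / "bytes 10000 to 19999" ranges above.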

So mapper 1 will open the file via the InputFormat, start reading at byte
0, and stop reading when it gets to its "final record," which is defined as
the first record that ends after byte 9999. If it has to read through byte
10020 to finish that record, that's OK: the stream used to read the bytes
from the file will not "cut off" at 9999.

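Mapper 2 starts reading at byte 10000. It scans forward to the first newline,
at byte 10020, and discards everything up to it, because mapper 1 has already
processed that partial line. So the first "real" record it processes starts
at byte 10021.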
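The reading rule described above can be simulated with a short Python sketch. It is a simplified model in the spirit of Hadoop's LineRecordReader, not its actual code, and it assumes records are `\n`-terminated lines; `read_split` and `read_all` are hypothetical names. A reader whose split does not start at byte 0 throws away everything up to and including the first newline, and every reader keeps reading whole lines as long as the next line starts at or before its split's end, even if that line runs past the end.

```python
from __future__ import annotations


def read_split(data: bytes, start: int, length: int) -> list[bytes]:
    """Return the lines owned by the split [start, start + length).

    Hypothetical simplified line reader (assumes '\\n'-terminated records).
    """
    end = start + length
    pos = start
    if start != 0:
        # Skip the (possibly partial) first line: the previous split's
        # reader has already consumed it by reading past its own end.
        nl = data.find(b"\n", pos)
        pos = len(data) if nl == -1 else nl + 1
    records = []
    # Read whole lines while the next one starts at or before `end`;
    # the last line may extend past `end` (no "cut off" at the boundary).
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl + 1
        records.append(data[pos:stop].rstrip(b"\n"))
        pos = stop
    return records


def read_all(data: bytes, split_size: int) -> list[bytes]:
    """Run read_split over fixed-size splits and concatenate the results."""
    records = []
    for start in range(0, len(data), split_size):
        length = min(split_size, len(data) - start)
        records.extend(read_split(data, start, length))
    return records
```

With `data = b"alpha\nbravo\ncharlie\n"` and 8-byte splits, `read_split(data, 0, 8)` returns `[b"alpha", b"bravo"]` ("bravo" straddles byte 8, so the first reader finishes it), and `read_split(data, 8, 8)` returns `[b"charlie"]` after skipping the tail of "bravo". Reading a line that starts exactly at the split end (the `<=` test) keeps a line that begins right on a boundary from being lost: the earlier split reads it, and the later split skips it as its "partial" first line.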


http://mail-archives.apache.org/mod_mbox/hadoop-common-user/200906.mbox/%3cd6d7c4410906110012l3629748agf064176b224c8...@mail.gmail.com%3e

On Fri, Jan 29, 2010 at 1:55 AM, Prabhu Hari Dhanapal <dragonzsn...@gmail.com> wrote:

> The splitting does not know anything about the input file's internal
> logical structure, for example line-oriented text files are split on
> arbitrary byte boundaries.
>
>
> On Fri, Jan 29, 2010 at 1:49 AM, .ke. sivakumar <kesivaku...@gmail.com> wrote:
>
>> Hadoop will take care of it. If a split boundary falls in the middle of
>> a line, the split will effectively be extended to the end of that line,
>> though this means the split limit is exceeded by a few bytes.
>>
>> On Thu, Jan 28, 2010 at 7:34 PM, Udaya Lakshmi <udaya...@gmail.com> wrote:
>>
>> > Hi,
>> >   When the framework splits a file, can part of a line end up in one
>> > split and the rest in another? Or does the framework make sure it
>> > always splits at the end of a line?
>> >
>> > Thanks,
>> > Udaya.
>> >
>>
>
> --
> Hari
>



-- 
Hari
