Tim,

It's pretty interesting to read; I once dug into this for another user
around here. Check out this archive post:
http://search-hadoop.com/m/cRmJ3gTtN32 - Make sure to also read the
LineReader sources (a layer under the LineRecordReader explained
above), where you can also see the beyond-block-boundary fetch happen
at the byte level :)
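
To make that concrete, here's a rough, self-contained sketch of the
same idea for a plain, uncompressed local text file, using java.io
only. It is not the actual LineRecordReader/LineReader code (the class
and method names are made up for illustration), but it follows the
same two rules: every split's reader except the first discards the
partial first line in its byte range, and every reader finishes the
line it has started even if that takes it past its range's end, so
each line goes to exactly one mapper.

import java.io.IOException;
import java.io.RandomAccessFile;

// Rough sketch only -- not Hadoop's LineRecordReader/LineReader.
public class SplitLineReaderSketch {

    // Read one '\n'-terminated line from the current position;
    // returns null at end of file.
    static String readLine(RandomAccessFile file) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = file.read()) != -1) {
            if (b == '\n') {
                return sb.toString();
            }
            sb.append((char) b);
        }
        return sb.length() > 0 ? sb.toString() : null;
    }

    // Print the lines "owned" by the split [start, start + length).
    static void readSplit(String path, long start, long length)
            throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            long end = start + length;
            file.seek(start);
            if (start != 0) {
                // Not the first split: the line straddling 'start'
                // belongs to the previous reader, so skip up to and
                // including its newline.
                readLine(file);
            }
            String line;
            // Read while the line *starts* inside the split; the last
            // line is read through to its newline even if that lies
            // beyond 'end' -- the beyond-boundary fetch.
            while (file.getFilePointer() <= end
                    && (line = readLine(file)) != null) {
                System.out.println("split@" + start + " -> " + line);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String path = args[0];              // some local text file
        long splitSize = 64L * 1024 * 1024; // e.g. 64MB splits
        long fileLen = new java.io.File(path).length();
        for (long off = 0; off < fileLen; off += splitSize) {
            readSplit(path, off, Math.min(splitSize, fileLen - off));
        }
    }
}

Run that over a 640MB file with 64MB splits and you get 10 byte
ranges, each producing whole lines even though the split offsets
themselves usually fall mid-line.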

On Wed, Sep 19, 2012 at 10:03 PM, Tim Robertson
<timrobertson...@gmail.com> wrote:
> Thanks for the explanation, HJ - I always meant to look into that bit of
> code to work out how it did it.
>
> Tim
>
> On Wed, Sep 19, 2012 at 6:24 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>> Hi Tim,
>>
>> Splits don't look at newlines, in the TextInputFormat at least. So
>> since the number of computed splits exceeds the default number of
>> maps, I think a perfectly block-aligned file of 10 blocks will spawn
>> exactly 10 mappers. The mapper's record reader is the one that reads
>> until a newline (even past the end of its block-length bytes).
>>
>> On Wed, Sep 19, 2012 at 9:16 PM, Tim Robertson
>> <timrobertson...@gmail.com> wrote:
>> > I think the splitting recognises the end of line, so you might get
>> > 11, but otherwise that looks correct.
>> >
>> >
>> >
>> > On Wed, Sep 19, 2012 at 5:42 PM, Pedro Sá da Costa <psdc1...@gmail.com>
>> > wrote:
>> >>
>> >> If I have an input file of 640MB in size and a split size of 64MB,
>> >> this file will be partitioned into 10 splits, and each split will
>> >> be processed by a map task, right?
>> >>
>> >> --
>> >> Best regards,
>> >>
>> >
>>
>>
>>
>> --
>> Harsh J
>
>



-- 
Harsh J
