Mohit,

On Fri, May 27, 2011 at 10:44 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
> Actually this link confused me:
>
> http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Job+Input
>
> "Clearly, logical splits based on input-size is insufficient for many
> applications since record boundaries must be respected. In such cases,
> the application should implement a RecordReader, who is responsible
> for respecting record-boundaries and presents a record-oriented view
> of the logical InputSplit to the individual task."
>
> But it looks like the application doesn't need to do that since it's done
> by default? Or am I misinterpreting this entirely?
For any InputFormat that Hadoop ships with (text files, say \n-terminated; SequenceFiles; Avro data files), this is already handled for you. Only if you have a custom file format that defines its own record delimiter character(s) would you need to write your own InputFormat that splits properly across record boundaries (the wiki still helps on how to manage reads across the first split and the subsequent ones).

-- 
Harsh J
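To make the boundary handling concrete, here is a minimal, Hadoop-free sketch of the convention Hadoop's line-based readers follow; the `SplitReader`/`readSplit` names are made up for illustration, and this is not the real LineRecordReader code, just the same rule applied to an in-memory byte array:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Toy model of the boundary rule (illustration only, not Hadoop source):
// a record belongs to the split that contains its first byte, so each
// reader skips the partial record at the start of its split (the
// previous split's reader finishes it) and reads past its own end to
// complete a record that straddles the boundary.
public class SplitReader {

    // Returns the newline-delimited records of `data` that belong to
    // the byte-range split [start, start + length).
    public static List<String> readSplit(byte[] data, int start, int length) {
        List<String> records = new ArrayList<>();
        int pos = start;
        int end = start + length;

        // Unless the split begins at offset 0, skip forward to the first
        // record boundary; the straddling record was already consumed by
        // the previous split's reader.
        if (start != 0) {
            while (pos < data.length && data[pos - 1] != '\n') {
                pos++;
            }
        }

        // Start a record only if its first byte lies inside the split,
        // but keep reading past `end` until the record is complete.
        while (pos < data.length && pos < end) {
            int recStart = pos;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            records.add(new String(data, recStart, pos - recStart,
                    StandardCharsets.UTF_8));
            pos++; // step over the newline
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbeta\ngamma\ndelta\n"
                .getBytes(StandardCharsets.UTF_8);
        // Chop the input into arbitrary 7-byte splits; every record still
        // comes out exactly once, from the split that owns its first byte.
        for (int s = 0; s < data.length; s += 7) {
            int len = Math.min(7, data.length - s);
            System.out.println("split [" + s + ", " + (s + len) + "): "
                    + readSplit(data, s, len));
        }
    }
}
```

The point of the sketch: split boundaries can fall mid-record, but because every reader applies the same skip-then-overread rule, no record is lost or duplicated, which is exactly what the built-in InputFormats take care of for you.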