Just to clarify: the query fits into mapreduce-user, since it primarily deals with how MapReduce operates over data :)
On Fri, May 27, 2011 at 10:38 PM, Mohit Anchlia <[email protected]> wrote:
> Thanks! I just thought it was better to post to multiple groups at once,
> since I didn't know where the question belonged :)
>
> On Fri, May 27, 2011 at 10:04 AM, Harsh J <[email protected]> wrote:
>> Mohit,
>>
>> Please do not cross-post a question to multiple lists unless you're
>> announcing something.
>>
>> What you describe does not happen; the way splitting is done for text
>> files is explained in good detail here:
>> http://wiki.apache.org/hadoop/HadoopMapReduce
>>
>> Hope this resolves your doubt :)
>>
>> On Fri, May 27, 2011 at 10:25 PM, Mohit Anchlia <[email protected]>
>> wrote:
>>> I am new to Hadoop, and from what I understand, by default Hadoop
>>> splits the input into blocks. This might result in a line of a record
>>> being split into two pieces and spread across two maps. For example,
>>> the line "abcd" might get split into "ab" and "cd". How can one
>>> prevent this in Hadoop and Pig? I am looking for examples showing how
>>> I can specify my own split so that it splits logically, based on the
>>> record delimiter rather than the block size. For some reason I am not
>>> able to find the right examples online.
>>>
>>
>> --
>> Harsh J
>

--
Harsh J
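To illustrate why a record never ends up split between two mappers: as I understand it, `TextInputFormat`'s `LineRecordReader` applies two rules at split boundaries. A reader whose split does not start at byte 0 skips the first (possibly partial) line, because the previous reader is responsible for it; and every reader keeps reading past its split's end until it finishes the line it is in the middle of. Below is a minimal Python simulation of that rule (illustrative only, not Hadoop's actual Java code; `read_split` is a hypothetical helper invented for this sketch):

```python
def read_split(data: bytes, start: int, end: int) -> list:
    """Simulate which complete lines one split's record reader emits.

    Mirrors the boundary convention used by Hadoop's LineRecordReader
    (as a sketch, not the real implementation):
      - if the split does not begin at byte 0, skip the first line;
        the previous split's reader has already consumed it;
      - keep emitting lines as long as the line *starts* at or before
        `end`, even when the line's bytes run past `end`.
    """
    pos = start
    if start != 0:
        # Skip the (possibly partial) first line; it belongs to the
        # previous split's reader.
        nl = data.find(b"\n", start)
        if nl == -1:
            return []
        pos = nl + 1
    lines = []
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        if nl == -1:
            lines.append(data[pos:])  # last line, no trailing newline
            break
        lines.append(data[pos:nl])
        pos = nl + 1
    return lines

data = b"abcd\nefgh\nijkl\n"
# A block boundary at byte 6 falls in the middle of "efgh", yet the
# first reader emits the whole record and the second reader skips it.
first = read_split(data, 0, 6)
second = read_split(data, 6, len(data))
```

Running the two readers over the artificial boundary yields `[b'abcd', b'efgh']` and `[b'ijkl']`: every record is seen exactly once, whole, regardless of where the block boundary lands.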
