Just to clarify: the query fits into mapreduce-user, since it primarily deals with how MapReduce operates over data :)
On Fri, May 27, 2011 at 10:38 PM, Mohit Anchlia <[email protected]> wrote:
> Thanks! I just thought it was better to post to multiple groups at once,
> since I didn't know where the question belonged :)
>
> On Fri, May 27, 2011 at 10:04 AM, Harsh J <[email protected]> wrote:
>> Mohit,
>>
>> Please do not cross-post a question to multiple lists unless you're
>> announcing something.
>>
>> What you describe does not happen; the way splitting is done for text
>> files is explained in good detail here:
>> http://wiki.apache.org/hadoop/HadoopMapReduce
>>
>> Hope this resolves your doubt :)
>>
>> On Fri, May 27, 2011 at 10:25 PM, Mohit Anchlia <[email protected]>
>> wrote:
>>> I am new to Hadoop, and from what I understand, by default Hadoop
>>> splits the input into blocks. This might result in a line of a record
>>> being split into two pieces and spread across two maps. For example,
>>> the line "abcd" might get split into "ab" and "cd". How can one
>>> prevent this in Hadoop and Pig? I am looking for examples showing how
>>> I can specify my own split so that it splits logically, based on the
>>> record delimiter rather than the block size. For some reason I am not
>>> able to find the right examples online.
>>>
>>
>> --
>> Harsh J
>

--
Harsh J
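To illustrate why a record never ends up split between two mappers: as I understand it, `TextInputFormat`'s `LineRecordReader` applies two rules at split boundaries. A reader whose split does not start at byte 0 skips the first (possibly partial) line, because the previous reader is responsible for it; and every reader keeps reading past its split's end until it finishes the line it is in the middle of. Below is a minimal Python simulation of that rule (illustrative only, not Hadoop's actual Java code; `read_split` is a hypothetical helper invented for this sketch):

```python
def read_split(data: bytes, start: int, end: int) -> list:
    """Simulate which complete lines one split's record reader emits.

    Mirrors the boundary convention used by Hadoop's LineRecordReader
    (as a sketch, not the real implementation):
      - if the split does not begin at byte 0, skip the first line;
        the previous split's reader has already consumed it;
      - keep emitting lines as long as the line *starts* at or before
        `end`, even when the line's bytes run past `end`.
    """
    pos = start
    if start != 0:
        # Skip the (possibly partial) first line; it belongs to the
        # previous split's reader.
        nl = data.find(b"\n", start)
        if nl == -1:
            return []
        pos = nl + 1
    lines = []
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        if nl == -1:
            lines.append(data[pos:])  # last line, no trailing newline
            break
        lines.append(data[pos:nl])
        pos = nl + 1
    return lines

data = b"abcd\nefgh\nijkl\n"
# A block boundary at byte 6 falls in the middle of "efgh", yet the
# first reader emits the whole record and the second reader skips it.
first = read_split(data, 0, 6)
second = read_split(data, 6, len(data))
```

Running the two readers over the artificial boundary yields `[b'abcd', b'efgh']` and `[b'ijkl']`: every record is seen exactly once, whole, regardless of where the block boundary lands.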
