Re: How to maintain record boundaries

Ankur C. Goel Fri, 11 May 2012 15:35:16 -0700

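Record reader implementations are typically written to honor record
boundaries. This means that when reading a split, the reader keeps
reading past the end of the split if the end of the current record has
not yet been encountered. Correspondingly, the reader for the next split
typically skips the partial record at its start, so no record is read
twice. How HDFS blocks the file does not matter here: the bytes of the
trailing record are simply read from the next block.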
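
To make the idea concrete, here is a rough sketch in plain Python. It is
not Hadoop code: read_split is a hypothetical helper that only imitates
the convention used by line-oriented record readers, under the assumption
that records are newline-terminated and that a split is just a
(start, length) byte range over the file.

```python
def read_split(data: bytes, start: int, length: int) -> list[bytes]:
    """Return the newline-terminated records owned by one split.

    Hypothetical helper imitating a line-oriented record reader:
    - a reader whose split does not start at offset 0 discards everything
      up to and including the first newline, because that (possibly
      partial) record belongs to the previous split's reader;
    - a reader keeps reading as long as a record *starts* at or before
      the end of its split, even if the record ends beyond it.
    """
    end = start + length
    pos = start

    if start != 0:
        # Skip the leading record: the previous reader consumes it.
        newline = data.find(b"\n", start)
        pos = len(data) if newline == -1 else newline + 1

    records = []
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        next_pos = len(data) if newline == -1 else newline + 1
        # The slice may extend past `end`; that is the whole point.
        records.append(data[pos:next_pos].rstrip(b"\n"))
        pos = next_pos
    return records


if __name__ == "__main__":
    data = b"alpha\nbravo\ncharlie\ndelta\n"
    split_size = 7  # deliberately cuts through the middle of records
    for start in range(0, len(data), split_size):
        print(start, read_split(data, start, split_size))
```

With a split size of 7 bytes, the first split yields alpha and bravo
(bravo starts inside the split and is read past its end), the second
yields charlie, the third yields delta, and the last one yields nothing.
Every record comes out whole and exactly once, no matter where the cuts
fall.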


-@nkur

On 5/11/12 5:15 AM, "shreya....@cognizant.com" <shreya....@cognizant.com>
wrote:

>Hi
>
>When we store data into HDFS, it gets broken into small pieces and
>distributed across the cluster based on the block size for the file.
>While processing the data with an MR program, I want a particular record
>as a whole, without it being split across nodes, but the data has already
>been split and stored in HDFS when I loaded it.
>How do I make sure that my record doesn't get split, and how would my
>InputFormat make a difference now?
>
>Regards
>Shreya
