Hi

When we store a file in HDFS, it is broken into blocks that are distributed 
across the cluster, based on the block size configured for the file.
When I process the data with a MapReduce (MR) program, I want each record to 
be read as a whole, not split across nodes. But the data was already split 
into blocks when I loaded it into HDFS.
How can I make sure that a record doesn't get split, and what difference 
would my InputFormat make at this point?
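
To illustrate the concern, here is a minimal sketch in plain Python (not Hadoop code). It assumes newline-terminated records and a toy block size of 16 bytes; the function name split_into_blocks and the sample data are made up for illustration. It only shows how cutting a file at fixed byte offsets can leave one record straddling two blocks.

```python
def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut data into fixed-size chunks, ignoring record boundaries."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


# Hypothetical sample: three newline-terminated records of 12 bytes each.
data = b"id=1,name=a\nid=2,name=b\nid=3,name=c\n"

# Toy block size chosen so the cuts fall mid-record; real HDFS blocks are far larger.
blocks = split_into_blocks(data, block_size=16)

for n, block in enumerate(blocks):
    print(n, block)
```

With this sample, the second record starts in the first chunk and ends in the second, which is the situation I am asking about.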

Regards
Shreya
