I'm not sure I follow your question.

The Java child heap size is independent of how files are split into blocks on HDFS.
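
They are two separate knobs, one per task and one per file. Something like 
this (these are the 1.x property names; the class name and the values are 
only illustrative):

    import org.apache.hadoop.conf.Configuration;

    // Illustrative only: the two settings live in different configs
    // and changing one has no effect on the other.
    public class IndependentKnobs {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Heap for each child task JVM (mapred-site.xml territory).
            conf.set("mapred.child.java.opts", "-Xmx1024m");
            // HDFS block size for new files (hdfs-site.xml territory), 64 MB.
            conf.setLong("dfs.block.size", 64L * 1024 * 1024);
        }
    }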

I suggest you look at Tom White's book (Hadoop: The Definitive Guide), which 
covers HDFS and how files are split into blocks. 

Files are split into blocks of a fixed size, 64 MB by default. 
Your record boundaries are not necessarily on block boundaries, so the task 
reading block A will continue past the end of its block into block B to 
finish the last record it started. A different task may start with block B 
and skip the first n bytes until it hits the start of a record. 
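
Concretely, the rule the record reader follows (this is what LineRecordReader 
does for text input) is: unless your split starts at byte 0, discard 
everything up to the next newline, and always read past the end of your split 
to finish the last record you started. Here is a rough sketch of that logic 
in plain Java I/O; the class name is made up and a local file stands in for 
an HDFS block, so this shows the boundary rule only, not the actual Hadoop 
API:

    import java.io.IOException;
    import java.io.RandomAccessFile;

    // Sketch of the split-boundary rules, with newline-terminated
    // records and a local file standing in for an HDFS block.
    public class SplitReaderSketch {

        public static void readSplit(String path, long start, long length)
                throws IOException {
            long end = start + length;
            try (RandomAccessFile in = new RandomAccessFile(path, "r")) {
                if (start == 0) {
                    in.seek(0);
                } else {
                    // Back up one byte and throw away everything up to
                    // the next newline. If start-1 is itself a newline we
                    // lose nothing and land exactly on the record at
                    // 'start'; otherwise we skip the partial record that
                    // the task reading the previous split finishes for us.
                    in.seek(start - 1);
                    in.readLine();
                }
                long pos = in.getFilePointer();
                // A record is ours if it STARTS inside our split, even if
                // it finishes inside the next block, so we may read a few
                // bytes past 'end'.
                while (pos < end) {
                    String record = in.readLine();
                    if (record == null) break;     // end of file
                    System.out.println(record);    // stand-in for processing
                    pos = in.getFilePointer();
                }
            }
        }
    }

Run readSplit once per block offset and each record comes out exactly once, 
because the task that owns the byte where a record starts is the one that 
processes it.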

HTH

-Mike

On Apr 26, 2012, at 3:46 PM, Barry, Sean F wrote:

> Within my small two-node cluster I set up my 4-core slave node to have 4 
> task trackers, and I also limited my Java heap size to -Xmx1024m.
> 
> Is there a possibility that when the data gets broken up, it will break at 
> a place in the file that is not whitespace? Or is that already handled 
> when the data on HDFS is broken up into blocks?
> 
> -SB
