Thanks Bejoy! It's better to process the data blocks locally and separately. I just want to know how to deal with a structure (e.g. a word or a line) that is split across two blocks.
Cheers,
Donal

On Nov 11, 2011 at 7:01 PM, Bejoy KS <bejoy.had...@gmail.com> wrote:
> Hi Donal
> You can configure your map tasks the way you like to process your input.
> If you have a file of size 100 MB, it would be divided into two input
> blocks and stored in HDFS (if your dfs.block.size is the default 64 MB).
> It is your choice how you process it with MapReduce:
> - With the default TextInputFormat the two blocks would be processed by
> two different mappers (under default split settings). If the blocks are on
> two different data nodes then, in the best case, a mapper would be spawned
> on each of those data nodes, i.e. they are data-local map tasks.
> - If you want one mapper to process the whole file, change your input
> format to WholeFileInputFormat. Then a single map task would be triggered
> on one of the nodes where the blocks are located (best case). If both
> blocks are not on the same node, one of the blocks would be transferred to
> the map task's location for processing.
>
> Hope it helps!...
>
> Thank You
> Bejoy.K.S
>
>
> 2011/11/11 臧冬松 <donal0...@gmail.com>:
>> Thanks Denny!
>> So that means each map task will have to read from another DataNode in
>> order to read the end of the line that starts in the previous block?
>>
>> Cheers,
>> Donal
>>
>>
>> 2011/11/11 Denny Ye <denny...@gmail.com>:
>>> hi
>>> Structured data, such as a word or a line, can always end up split
>>> across different blocks.
>>> A MapReduce task reads HDFS data in units of *lines*: it reads the whole
>>> line, from the end of the previous block into the start of the
>>> subsequent one, to obtain the part of the record that spilled over. So
>>> you do not need to worry about incomplete structured data. HDFS itself
>>> does nothing for this mechanism.
>>>
>>> -Regards
>>> Denny Ye
>>>
>>>
>>> On Fri, Nov 11, 2011 at 3:43 PM, 臧冬松 <donal0...@gmail.com> wrote:
>>>> Usually a large file in HDFS is split into blocks and stored on
>>>> different DataNodes.
>>>> A map task is assigned to deal with one block; I wonder what happens if
>>>> structured data (e.g. a word) is split across two blocks?
>>>> How do MapReduce and HDFS deal with this?
>>>>
>>>> Thanks!
>>>> Donal
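
For anyone following the thread, below is a standalone sketch in plain Java (no Hadoop classes, and not Hadoop's actual LineRecordReader code) of the convention Denny describes: a reader assigned the byte range [start, end) of a file skips the first, possibly partial, line unless start == 0, and keeps reading until it finishes the line that crosses its end. Under that convention every line lands in exactly one split even when it straddles a block boundary, and only the straddling line has to be fetched from the next block's DataNode. The class name and the path/offset arguments in main are hypothetical, just for illustration.

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

public class SplitLineReader {

    // Return the lines that "belong" to the byte range [start, end) of a file.
    static List<String> readSplit(String path, long start, long end) throws IOException {
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile in = new RandomAccessFile(path, "r")) {
            in.seek(start);
            if (start != 0) {
                // Not the first split: the previous reader owns the line that
                // crosses `start`, so discard everything up to the next newline.
                in.readLine();
            }
            String line;
            // Keep reading whole lines while the current position is still at or
            // before `end`; the last line read may extend past `end`, which is how
            // a record that straddles the boundary is recovered in one piece.
            while (in.getFilePointer() <= end && (line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: cut a local text file at an arbitrary byte offset
        // and check that every line shows up in exactly one "split".
        String path = args[0];              // e.g. some local text file
        long cut = Long.parseLong(args[1]); // e.g. a 64 MB block boundary
        long length = new File(path).length();
        System.out.println(readSplit(path, 0, cut));
        System.out.println(readSplit(path, cut, length));
    }
}

The point of the sketch is the pairing of the two rules: because the earlier reader always finishes the line it has started, the later reader can safely throw away whatever sits before its first newline, so no line is lost or read twice, regardless of where the block boundary falls.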