Hi,

Please read the Map section of http://wiki.apache.org/hadoop/HadoopMapReduce to understand how Hadoop still respects record boundaries even though the block splits themselves ignore them. I hope it helps clear things up for you.
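A rough way to see the convention the wiki page describes: a split that does not start at offset 0 discards everything up to the first line that begins at or after its start, and every split keeps reading past its own end to finish the last line it started. As I understand the 1.0 code, LineRecordReader opens the whole file through the FileSystem and seeks to the split's start, so reading past a block boundary simply continues into the next block. The Java below is a standalone simulation under those assumptions, not Hadoop's LineRecordReader: the class and method names are hypothetical, a byte array stands in for the whole file, and only '\n' line endings are handled.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Standalone simulation of the "skip the first partial line, finish the last line"
// convention; names are hypothetical and this is not Hadoop's LineRecordReader.
public class SplitLineReaderDemo {

    // Returns the offset just past the '\n' that ends the line containing pos,
    // or data.length if the file ends without a trailing newline.
    static int lineEnd(byte[] data, int pos) {
        int i = pos;
        while (i < data.length && data[i] != '\n') {
            i++;
        }
        return i < data.length ? i + 1 : i;
    }

    // Returns the records owned by the split [start, start + length).
    // 'data' stands in for the whole file, so reads may run past the split end.
    static List<String> readSplit(byte[] data, int start, int length) {
        int end = start + length;
        int pos = start;
        if (start != 0) {
            // Back up one byte and discard through the next newline: if the split
            // starts exactly at a line start this consumes only the previous '\n';
            // otherwise it skips the partial line, which the previous split reads.
            pos = lineEnd(data, start - 1);
        }
        List<String> records = new ArrayList<>();
        // Start a new line only while its first byte lies inside this split,
        // but always read that line to completion, even past 'end'.
        while (pos < end && pos < data.length) {
            int next = lineEnd(data, pos);
            int textEnd = data[next - 1] == '\n' ? next - 1 : next;
            records.add(new String(data, pos, textEnd - pos, StandardCharsets.UTF_8));
            pos = next;
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbravo\ncharlie\ndelta\n".getBytes(StandardCharsets.UTF_8);
        int splitSize = 8; // hypothetical block size, deliberately not line-aligned
        List<String> all = new ArrayList<>();
        for (int start = 0; start < data.length; start += splitSize) {
            int length = Math.min(splitSize, data.length - start);
            List<String> records = readSplit(data, start, length);
            System.out.println("split @" + start + " -> " + records);
            all.addAll(records);
        }
        System.out.println("all records: " + all);
    }
}
```

Running it prints split @0 -> [alpha, bravo], split @8 -> [charlie], split @16 -> [delta] and split @24 -> [], so each line is read exactly once even though no split boundary falls at the start of a line. A custom FileInputFormat for a multi-line record format would need an equivalent rule for finding the first record boundary after an arbitrary offset.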
On Fri, Feb 10, 2012 at 10:26 PM, GUOJUN Zhu <guojun_...@freddiemac.com> wrote:
>
> Hi,
>
> I am learning Hadoop. We have some specially formatted text files for input, so
> we need to write a customized InputFormat, probably based on
> FileInputFormat. Does FileInputFormat respect record boundaries
> (every line, or maybe every other line)? I am reading the source code
> (1.0.0). For example, in LineRecordReader, is the "in" field (InputStream)
> of LineReader(in, ...) the full HDFS file (of many blocks) or just the
> real local file of one block? All the books I have read give very few details
> about it. Can any expert point me to a reference about it, or tell me
> which part of the source code I should concentrate on? Thanks.
>
> Zhu, Guojun
> Modeling Sr Graduate
> 571-3824370
> guojun_...@freddiemac.com
> Financial Engineering
> Freddie Mac

--
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about