Yes, I have looked at the block files and they match what you said. I am just wondering whether there is some property or flag that would turn on line preservation at the block level, if such a feature exists.
-Kevin

On Wed, Aug 6, 2008 at 8:01 PM, Taeho Kang <[EMAIL PROTECTED]> wrote:
> I guess a quick way to find an answer to your question is to look at the
> size of the data block files stored in the datanodes.
>
> If they are all the same (e.g. 64MB), then you could say lines are NOT
> preserved at the block level, as DFS simply cuts the original file into
> exact 64MB pieces.
>
> They are almost all the same, by the way, except for a few blocks which may
> represent files smaller than 64MB or some blocks that may represent the end
> blocks of a file.
>
> /Taeho
>
>
> On Thu, Aug 7, 2008 at 9:23 AM, Kevin <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> I guess this thread is old. But I eventually need to raise the
>> question again as I am more into dfs now. Would a line be broken
>> between adjacent blocks in dfs? Can lines be preserved at the block level?
>>
>> -Kevin
>>
>>
>>
>> On Wed, Jul 16, 2008 at 4:57 PM, Chris Douglas <[EMAIL PROTECTED]>
>> wrote:
>> > InputFormats don't have a concept of "blocks"; each FileSplit contains a
>> > list of locations that advise the framework where it should prefer to
>> > schedule the map (i.e. on the node that contains most of the data (in
>> > practice, IIRC this is the location of the first byte of the block,
>> > which may not actually contain the bulk of the data)). For
>> > LineRecordReader, this means that it will open a stream, seek to its
>> > start position, read (opening up a connection to the node that contains
>> > that block, with luck a local read) to the first record delimiter, then
>> > return lines as Text records to the map until the end of that split
>> > precedes the start offset at the beginning of a read (i.e. the end of
>> > split A and the start of split B will likely be in the middle of a
>> > record, so A will emit that record and B will start from the end of
>> > that record).
>> >
>> > I think it's fair to say that blocks and records are orthogonal
>> > abstractions to HDFS and map/reduce.
>> >
>> > -C
>> >
>> > On Jul 15, 2008, at 5:07 PM, Kevin wrote:
>> >
>> >> Hi,
>> >>
>> >> I was trying to parse text input with line-based information in the
>> >> mapper and this problem becomes an issue. I wonder if lines are
>> >> preserved or broken when a file is cut into blocks by dfs. Also, it
>> >> looks like although TextInputFormat breaks a file into line records,
>> >> the InputSplit passed to InputFormat may not preserve lines. If this
>> >> is the case, is it possible to restore the lines for mapper input, or
>> >> do I have to drop broken lines? Thank you.
>> >>
>> >> Best,
>> >> -Kevin
>> >
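
For anyone finding this thread later, here is a rough sketch of the split-boundary behaviour Chris describes for LineRecordReader. It is a simplified Python simulation, not Hadoop's actual code: the names read_split and split_ranges are made up for illustration, the 8-byte "block size" is a stand-in for 64MB, and the exact boundary rule (back up one byte, discard through the first newline, then keep reading while the position is before the split end) is my assumption about how the skipping works.

```python
import io


def read_split(data, start, end):
    """Return the lines owned by the byte range [start, end) of data.

    Hypothetical helper: a split owns every line whose first byte falls
    inside it, even when the line itself runs past `end`.
    """
    stream = io.BytesIO(data)
    if start > 0:
        # Back up one byte and discard through the next newline. If a line
        # begins exactly at `start`, only the previous newline is consumed,
        # so that line is kept; otherwise the partial line is skipped
        # because the previous split already emitted it.
        stream.seek(start - 1)
        stream.readline()
    lines = []
    # Keep reading whole lines as long as the next line starts before `end`.
    while stream.tell() < end:
        line = stream.readline()
        if not line:
            break  # end of file
        lines.append(line.rstrip(b"\n"))
    return lines


def split_ranges(size, block_size):
    """Cut [0, size) into fixed-size byte ranges, ignoring line boundaries."""
    return [(off, min(off + block_size, size)) for off in range(0, size, block_size)]


data = b"alpha\nbravo charlie\ndelta\necho foxtrot golf\n"
block_size = 8  # assumed tiny value so that lines straddle boundaries

records = []
for start, end in split_ranges(len(data), block_size):
    records.extend(read_split(data, start, end))

# Every line comes back exactly once, although the cuts fall mid-line.
assert records == data.splitlines()
```

The point of the sketch is that the byte ranges are cut without regard to lines, yet no line is dropped or duplicated, because each reader skips the partial line at its start and reads past its end to finish the last line it began. Some splits come back empty when a single line covers them entirely.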