I guess a quick way to answer your question is to look at the sizes of the
block files stored on the datanodes.

If they are all the same (e.g. 64MB), then you could say lines are NOT
preserved at the block level, as DFS simply cuts the original file into
exact 64MB pieces.

They are almost all the same, by the way, except for a few blocks that
belong to files smaller than 64MB or that are the last block of a file.
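
To make the "exact 64MB pieces" point concrete, here is a minimal sketch in
plain Java (no Hadoop classes; the class name BlockCutDemo and the 16-byte
block size are made-up stand-ins for the real 64MB default). It cuts a byte
array purely by offset and never looks at newlines, so a line can start in
one block and end in the next.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class BlockCutDemo {
    // Cut data into fixed-size pieces purely by byte offset, ignoring content.
    // Only the last piece may be shorter than blockSize.
    static List<byte[]> cutIntoBlocks(byte[] data, int blockSize) {
        List<byte[]> blocks = new ArrayList<>();
        for (int off = 0; off < data.length; off += blockSize) {
            int len = Math.min(blockSize, data.length - off);
            byte[] block = new byte[len];
            System.arraycopy(data, off, block, 0, len);
            blocks.add(block);
        }
        return blocks;
    }

    public static void main(String[] args) {
        byte[] file = "first line\nsecond line\nthird line\n".getBytes(StandardCharsets.UTF_8);
        // 16 bytes is a hypothetical stand-in for the 64MB default block size.
        List<byte[]> blocks = cutIntoBlocks(file, 16);
        for (byte[] b : blocks) {
            // Show newlines as a visible "\n" so block boundaries are easy to see.
            String text = new String(b, StandardCharsets.UTF_8).replace("\n", "\\n");
            System.out.println(b.length + " bytes: [" + text + "]");
        }
    }
}
```

With the 34-byte sample it prints blocks of 16, 16 and 2 bytes
("first line\nsecon", "d line\nthird lin", "e\n"), so both "second line"
and "third line" straddle a block boundary.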

/Taeho


On Thu, Aug 7, 2008 at 9:23 AM, Kevin <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I guess this thread is old, but I need to raise the question again now
> that I am more into DFS. Would a line be broken between adjacent blocks
> in DFS? Can lines be preserved at the block level?
>
> -Kevin
>
>
>
> On Wed, Jul 16, 2008 at 4:57 PM, Chris Douglas <[EMAIL PROTECTED]> wrote:
> > InputFormats don't have a concept of "blocks"; each FileSplit contains a
> > list of locations that advise the framework where it should prefer to
> > schedule the map (i.e. on the node that contains most of the data; in
> > practice, IIRC, this is the location of the first byte of the block,
> > which may not actually contain the bulk of the data). For
> > LineRecordReader, this means that it will open a stream, seek to its
> > start position, read (opening up a connection to the node that contains
> > that block, with luck a local read) to the first record delimiter, then
> > return lines as Text records to the map until the offset at which the
> > next read would begin is past the end of that split (i.e. the end of
> > split A and the start of split B will likely be in the middle of a
> > record, so A will emit that record and B will start from the end of that
> > record).
> >
> > I think it's fair to say that blocks and records are orthogonal
> > abstractions to HDFS and map/reduce. -C
> >
> > On Jul 15, 2008, at 5:07 PM, Kevin wrote:
> >
> >> Hi,
> >>
> >> I was trying to parse text input with line-based information in the
> >> mapper, and this became an issue. I wonder whether lines are preserved
> >> or broken when a file is cut into blocks by DFS. Also, it looks like,
> >> although TextInputFormat breaks a file into line records, the
> >> InputSplit passed to InputFormat may not preserve lines. If this is
> >> the case, is it possible to restore the lines for the mapper input, or
> >> do I have to drop broken lines? Thank you.
> >>
> >> Best,
> >> -Kevin
> >
> >
>
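
Chris's description of how LineRecordReader handles split boundaries can be
modeled with the simplified sketch below. It is not Hadoop's actual code:
SplitReaderDemo and readSplit are hypothetical names, the data is an
in-memory byte array, '\n' is the only delimiter, and it assumes the
convention that a line belongs to the split containing its first byte (the
exact boundary handling in LineRecordReader may differ between Hadoop
versions). A reader that does not start at offset 0 backs up one byte and
discards everything through the next newline, since that partial line was
emitted by the previous split; it then emits every line that starts before
its end offset, reading past the end if needed to finish the last one.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SplitReaderDemo {
    // Returns the lines "owned" by the split [start, end): every line whose
    // first byte lies in that range, read to completion even past `end`.
    static List<String> readSplit(byte[] data, int start, int end) {
        List<String> lines = new ArrayList<>();
        int pos = start;
        if (start != 0) {
            // Back up one byte and discard everything through the next '\n'.
            // If data[start - 1] is already '\n', only that delimiter is
            // skipped, so a line beginning exactly at `start` stays here.
            pos = start - 1;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step past the delimiter
        }
        // Emit each line that starts before `end`; the last one may run past it.
        while (pos < end && pos < data.length) {
            int lineEnd = pos;
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            lines.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = lineEnd + 1; // skip the '\n' (or move past EOF)
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] file = "first line\nsecond line\nthird line\n".getBytes(StandardCharsets.UTF_8);
        int splitSize = 16; // hypothetical split size, far smaller than a real block
        for (int start = 0; start < file.length; start += splitSize) {
            int end = Math.min(start + splitSize, file.length);
            System.out.println("[" + start + ", " + end + ") -> " + readSplit(file, start, end));
        }
    }
}
```

On the same 34-byte sample it prints "[0, 16) -> [first line, second line]",
"[16, 32) -> [third line]" and "[32, 34) -> []": every line is emitted
exactly once even though the split boundaries fall mid-line, which is why
the mapper never sees a broken line and nothing has to be dropped.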
