Yes, I have looked at the block files, and they match what you said. I
am just wondering whether there is some property or flag that would
turn this feature on, if it exists.

-Kevin
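
For what it's worth, the fixed-size cutting Taeho describes below can be
mimicked with a toy sketch. Everything in it is an assumption made for
illustration: BLOCK_SIZE is 16 bytes rather than 64MB, and cut_into_blocks
and blocks_ending_mid_line are hypothetical helpers, not HDFS APIs.

```python
# Toy model of fixed-size block cutting. This is NOT HDFS code; the
# block size is tiny (assumption) so the effect is easy to see.
BLOCK_SIZE = 16


def cut_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut data into consecutive pieces of block_size bytes.

    Only the last piece may be shorter, which mirrors the observation
    that almost all block files have the same size.
    """
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def blocks_ending_mid_line(blocks):
    """Return indices of non-final blocks whose last byte is not a newline."""
    return [i for i, block in enumerate(blocks[:-1])
            if not block.endswith(b"\n")]


data = b"first line\nsecond line\nthird\n"
blocks = cut_into_blocks(data)
block_sizes = [len(block) for block in blocks]
broken = blocks_ending_mid_line(blocks)
```

Here the first block ends in the middle of "second line", so a cut that
ignores content does not keep lines whole, while concatenating the blocks
still gives back the original bytes.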



On Wed, Aug 6, 2008 at 8:01 PM, Taeho Kang <[EMAIL PROTECTED]> wrote:
> I guess a quick way to answer your question is to look at the sizes of the
> data block files stored on the datanodes.
>
> If they are all the same (e.g. 64MB), then you could say lines are NOT
> preserved at the block level, as DFS simply cuts the original file into
> exact 64MB pieces.
>
> They are almost all the same, by the way, except for a few blocks that
> belong to files smaller than 64MB or that are the end blocks of a file.
>
> /Taeho
>
>
> On Thu, Aug 7, 2008 at 9:23 AM, Kevin <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> I guess this thread is old, but I need to raise the question again
>> now that I am working more with dfs. Would a line be broken between
>> adjacent blocks in dfs? Can a line be preserved at the block level?
>>
>> -Kevin
>>
>>
>>
>> On Wed, Jul 16, 2008 at 4:57 PM, Chris Douglas <[EMAIL PROTECTED]>
>> wrote:
>> > InputFormats don't have a concept of "blocks"; each FileSplit contains a
>> > list of locations that advise the framework where it should prefer to
>> > schedule the map, i.e. on the node that contains most of the data (in
>> > practice, IIRC, this is the location of the first byte of the block,
>> > which may not actually contain the bulk of the data). For
>> > LineRecordReader, this means that it will open a stream, seek to its
>> > start position, read to the first record delimiter (opening up a
>> > connection to the node that contains that block, with luck a local
>> > read), then return lines as Text records to the map until the end of
>> > that split precedes the start offset at the beginning of a read. That
>> > is, the end of split A and the start of split B will likely be in the
>> > middle of a record, so A will emit that record and B will start from
>> > the end of that record.
>> >
>> > I think it's fair to say that blocks and records are orthogonal
>> > abstractions to HDFS and map/reduce. -C
>> >
>> > On Jul 15, 2008, at 5:07 PM, Kevin wrote:
>> >
>> >> Hi,
>> >>
>> >> I was trying to parse text input with line-based information in a
>> >> mapper, and this problem became an issue. I wonder whether lines are
>> >> preserved or broken when a file is cut into blocks by dfs. Also, it
>> >> looks like, although TextInputFormat breaks a file into line records,
>> >> the InputSplit passed to InputFormat may not preserve lines. If this is
>> >> the case, is it possible to restore the lines for mapper input, or do I
>> >> have to drop broken lines? Thank you.
>> >>
>> >> Best,
>> >> -Kevin
>> >
>> >
>>
>

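The split-boundary handling Chris describes for LineRecordReader can be
sketched roughly as follows. This is a simplified model under stated
assumptions, not Hadoop's actual code: read_split is a hypothetical
function, the "file" is an in-memory byte string, the delimiter is "\n",
and backing up one byte before skipping is this sketch's own way of not
losing a line that starts exactly at a split boundary.

```python
import io


def read_split(data, start, length):
    """Return the lines a reader for the split [start, start + length) emits.

    A line belongs to the split in which its first byte lies, even if
    the line runs past the end of that split.
    """
    end = start + length
    stream = io.BytesIO(data)
    pos = start
    if start != 0:
        # Back up one byte and discard everything up to the next newline.
        # If the previous byte is a newline, only that byte is discarded,
        # so a line beginning exactly at `start` is kept.
        pos = start - 1
        stream.seek(pos)
        pos += len(stream.readline())
    records = []
    # Keep reading while the next line begins before the end of the split.
    while pos < end:
        line = stream.readline()
        if not line:  # end of file
            break
        pos += len(line)
        records.append(line.rstrip(b"\n"))
    return records


data = b"first line\nsecond line\nthird\n"
# Two splits of 16 and 13 bytes; the boundary falls inside "second line".
split_a = read_split(data, 0, 16)
split_b = read_split(data, 16, 13)
```

With these inputs split A emits "first line" and "second line" in full,
and split B emits only "third", so no line is dropped or duplicated even
though the boundary cuts a line in half.
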