Re: Splitting SequenceFile in controlled manner

Harsh J Tue, 06 Dec 2011 12:03:22 -0800

Majid,

Sync markers are already written into sequence files; they are part of the 
format. This is nothing to worry about, and it is simple enough to test and be 
confident about. The mechanism is the same as reading a text file with newlines: 
the reader will read past the split boundary in order to complete a record if 
it has to.
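
To make the boundary handling concrete, here is a minimal sketch in plain Python. It is a toy model, not Hadoop's actual on-disk format or reader code: the marker bytes, the record layout, and the names write_records and read_split are all assumptions made up for illustration. The idea it shows is the one described above: a reader for a split skips forward to the first sync marker at or after its start offset, then keeps reading whole records, even past its end offset, until it meets a sync marker that lies at or beyond the end.

```python
import struct

# Toy sync entry: a length of -1 followed by a fixed marker.
# (Hypothetical bytes; a real file would use its own per-file marker.)
ESCAPE = struct.pack(">i", -1)
SYNC = b"\x00SYNC-MARKER-16\xff"
MARKER = ESCAPE + SYNC


def write_records(records, sync_every=3):
    """Serialize records as <4-byte length><payload>, with a sync entry
    before every `sync_every`-th record. The writer does this on its own;
    the caller never writes markers by hand."""
    out = bytearray()
    for i, rec in enumerate(records):
        if i % sync_every == 0:
            out += MARKER
        out += struct.pack(">i", len(rec)) + rec
    return bytes(out)


def read_split(data, start, end):
    """Return the records owned by the byte range [start, end).

    A split owns every record that follows a sync marker located inside
    the range, even if those records run past `end`."""
    # Skip ahead to the first sync marker at or after `start`.
    pos = data.find(MARKER, start)
    if pos == -1 or pos >= end:
        return []  # no marker in this range, so nothing starts here
    records = []
    while pos < len(data):
        (length,) = struct.unpack_from(">i", data, pos)
        if length == -1:
            if pos >= end:
                break  # this marker belongs to the next split
            pos += len(MARKER)
            continue
        pos += 4
        records.append(data[pos:pos + length])
        pos += length
    return records


records = [f"key{i}\tvalue{i}".encode() for i in range(10)]
data = write_records(records)
cut = len(data) // 2  # arbitrary boundary, likely in the middle of a record
assert read_split(data, 0, cut) + read_split(data, cut, len(data)) == records
```

Wherever the cut falls, the two readers together return every record exactly once and never a partial one. The toy assumes payloads never contain the marker bytes, which holds for the ASCII records used here.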


On 07-Dec-2011, at 1:25 AM, Majid Azimi wrote:

> Hadoop writes to a SequenceFile in key-value pair (record) format.
> Consider we have a large unbounded log file. Hadoop will split the file
> based on block size and save the blocks on multiple data nodes. Is it
> guaranteed that each key-value pair will reside on a single block? Or may
> we have a case where the key is in one block on node 1 and the value (or
> parts of it) is in a second block on node 2? If we may have meaningless
> splits, then what is the solution? Sync markers?
> 
> Another question: does Hadoop write sync markers automatically, or
> should we write them manually?
