Re: SequenceFile Header
Hi Edward,

Thanks for your reply. My aim is not to generate a SequenceFile as such; it is to take a file (of a certain format) and sort it. So I guess I should create an input SequenceFile from the original file and feed it to Sort as input. The output will again be in SequenceFile format, and I will have to convert it back to my original file format.

So right now I am more concerned about step 1 (converting the original file to an input SequenceFile) and step 3 (converting the output SequenceFile back to the original file format). It would be great if you could suggest some ways of doing that. Also, please correct me if my approach is wrong.

Thanks,
Matthew
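For steps 1 and 3, the part that does not depend on Hadoop is splitting the headerless file into fixed-size records and writing them back out. Below is a minimal sketch in plain Java, assuming the layout described in the original question (an 8-byte key immediately followed by a 32-byte value, with no header or padding). The class FixedRecords and its methods are hypothetical names used only for illustration; they are not part of Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of Hadoop.
final class FixedRecords {
    static final int KEY_LEN = 8;    // key<8 bytes>, as stated in the thread
    static final int VALUE_LEN = 32; // value<32 bytes>, as stated in the thread

    /** One raw record: the key bytes and the value bytes, unparsed. */
    static final class Rec {
        final byte[] key;
        final byte[] value;
        Rec(byte[] key, byte[] value) { this.key = key; this.value = value; }
    }

    /** Reads headerless 40-byte records until a clean end of stream. */
    static List<Rec> readAll(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        List<Rec> records = new ArrayList<>();
        int first;
        // Probe one byte so EOF on a record boundary ends the loop cleanly.
        while ((first = data.read()) >= 0) {
            byte[] key = new byte[KEY_LEN];
            key[0] = (byte) first;
            // readFully throws EOFException if the record is truncated.
            data.readFully(key, 1, KEY_LEN - 1);
            byte[] value = new byte[VALUE_LEN];
            data.readFully(value);
            records.add(new Rec(key, value));
        }
        return records;
    }

    /** Writes records back in the original headerless layout. */
    static void writeAll(List<Rec> records, OutputStream out) throws IOException {
        for (Rec r : records) {
            out.write(r.key);
            out.write(r.value);
        }
    }

    public static void main(String[] args) throws IOException {
        // Two synthetic records held in memory instead of a real file.
        byte[] raw = new byte[2 * (KEY_LEN + VALUE_LEN)];
        for (int i = 0; i < raw.length; i++) raw[i] = (byte) i;

        List<Rec> records = readAll(new ByteArrayInputStream(raw));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeAll(records, out);

        System.out.println("records read: " + records.size());
        System.out.println("round trip identical: " + Arrays.equals(raw, out.toByteArray()));
    }
}
```

In a conversion job, each key and value read this way could presumably be wrapped in a Writable type such as BytesWritable and passed to a SequenceFile writer's append call (step 1), and the reverse done when reading the sorted output (step 3). That wiring is an assumption and is not shown here.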
Re: SequenceFile Header
On Wed, Sep 8, 2010 at 1:06 PM, Matthew John wrote:
> Hi guys,
>
> I'm trying to run a sort on a metafile whose records each consist of a
> key<8 bytes> and a value<32 bytes>. The sort will be with respect to the key.
> But my input file does not have a header. So, in order to make use of
> SequenceFile, I thought I would write a new file with the SequenceFile header
> and my records. I have some doubts here:
>
> Q1) What exactly is the sync (bytes[])? I don't even have a clue about it.
> While trying to read some SequenceFile files (generated by randomwriter), I
> am not able to figure out what the sync is.
>
> Q2) Do I have to provide the sync wherever I think a file split is required?
>
> It would be great if someone could clarify these doubts.
>
> Thanks,
> Matthew John
>

Are you trying to write sequence files by yourself? That is not the suggested way. You should write your sequence files like this:

    // The writer emits the SequenceFile header and sync markers itself.
    SequenceFile.Writer writer = null;
    writer = SequenceFile.createWriter(fs, jobConf, outPath, keyClass, valueClass,
            this.compressionType, this.codec);
    writer.append(k, v);
    writer.close();  // close once all records have been appended

Or, in a MapReduce job, set the output format to SequenceFileOutputFormat; then you do not have to worry about the internals.
SequenceFile Header
Hi guys,

I'm trying to run a sort on a metafile whose records each consist of a key<8 bytes> and a value<32 bytes>. The sort will be with respect to the key. But my input file does not have a header. So, in order to make use of SequenceFile, I thought I would write a new file with the SequenceFile header and my records. I have some doubts here:

Q1) What exactly is the sync (bytes[])? I don't even have a clue about it. While trying to read some SequenceFile files (generated by randomwriter), I am not able to figure out what the sync is.

Q2) Do I have to provide the sync wherever I think a file split is required?

It would be great if someone could clarify these doubts.

Thanks,
Matthew John