Re: SequenceFile Header

2010-09-08 Thread Matthew John
Hi Edward,

Thanks for your reply.

My aim is not to generate a SequenceFile as such. It is to take a file (of a
certain format) and sort it. So I guess I should create an input SequenceFile
from the original file and feed it to the Sort as input. The output will
again be in SequenceFile format, and I will have to convert it back to my
original file format.

So right now I am more concerned about step 1 (conversion of the original
file to an input sequence file) and step 3 (conversion of the output sequence
file back to the original file format). It would be great if you could suggest
some ways of doing that. Also, please correct me if my approach is wrong.
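
For steps 1 and 3, here is a minimal sketch in plain Java (no Hadoop dependency), assuming the file is nothing but back-to-back 40-byte records (an 8-byte key followed by a 32-byte value, as described in this thread). The names FixedRecords, KeyValue, decode, encode and compareKeys are hypothetical, and ordering keys as unsigned bytes is an assumption about the format, not something stated here:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: splits a headerless flat file into fixed-width records and back. */
final class FixedRecords {
    static final int KEY_LEN = 8;     // key width given in the thread
    static final int VALUE_LEN = 32;  // value width given in the thread
    static final int RECORD_LEN = KEY_LEN + VALUE_LEN;

    /** One record: raw key bytes and raw value bytes. */
    static final class KeyValue {
        final byte[] key;
        final byte[] value;

        KeyValue(byte[] key, byte[] value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Step 1: cut the raw file contents into 40-byte records. */
    static List<KeyValue> decode(byte[] raw) {
        if (raw.length % RECORD_LEN != 0) {
            throw new IllegalArgumentException(
                "input length is not a multiple of " + RECORD_LEN);
        }
        List<KeyValue> out = new ArrayList<>();
        for (int off = 0; off < raw.length; off += RECORD_LEN) {
            byte[] k = Arrays.copyOfRange(raw, off, off + KEY_LEN);
            byte[] v = Arrays.copyOfRange(raw, off + KEY_LEN, off + RECORD_LEN);
            out.add(new KeyValue(k, v));
        }
        return out;
    }

    /** Assumption: keys compare as unsigned bytes, left to right. */
    static int compareKeys(byte[] a, byte[] b) {
        for (int i = 0; i < KEY_LEN; i++) {
            int c = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    /** Step 3: write records back with no header and no separators. */
    static byte[] encode(List<KeyValue> records) {
        ByteBuffer buf = ByteBuffer.allocate(records.size() * RECORD_LEN);
        for (KeyValue r : records) {
            buf.put(r.key).put(r.value);
        }
        return buf.array();
    }
}
```

In a Hadoop job, each decoded key and value could presumably be wrapped in a BytesWritable and handed to a SequenceFile.Writer, with the reverse done on the sorted output. For a local check, records.sort((x, y) -> FixedRecords.compareKeys(x.key, y.key)) can stand in for the sort step.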

Thanks,

Matthew


Re: SequenceFile Header

2010-09-08 Thread Edward Capriolo
On Wed, Sep 8, 2010 at 1:06 PM, Matthew John wrote:
> Hi guys,
>
> I'm trying to run a sort on a metafile whose records each consist of a
> key<8 bytes> and a value<32 bytes>. The sort will be with respect to the key.
> But my input file does not have a header. So in order to make use of
> SequenceFile, I thought I'd write a new file with the SequenceFile header
> and my records. I have some doubts here:
>
> Q1) What exactly is the sync (byte[])? I don't even have a clue about it.
> While trying to read some SequenceFile files (generated by randomwriter), I
> am not able to figure out what the sync is.
>
> Q2) Do I have to provide the sync wherever I think a file split is
> required?
>
> It would be great if someone can clarify these doubts.
>
> Thanks,
> Matthew John
>

Are you trying to write sequence files by yourself? That is not the
suggested way.

You should write your sequence files like this:

  // fs is a FileSystem and jobConf a JobConf; the writer emits the
  // header and the sync markers itself.
  SequenceFile.Writer writer = SequenceFile.createWriter(
      fs, jobConf, outPath, keyClass, valueClass,
      this.compressionType, this.codec);

  writer.append(k, v);
  writer.close();  // close once all records have been appended

Or, in a MapReduce job, set the output format to
SequenceFileOutputFormat; then you do not have to worry about the
internals.


SequenceFile Header

2010-09-08 Thread Matthew John
Hi guys,

I'm trying to run a sort on a metafile whose records each consist of a
key<8 bytes> and a value<32 bytes>. The sort will be with respect to the key.
But my input file does not have a header. So in order to make use of
SequenceFile, I thought I'd write a new file with the SequenceFile header
and my records. I have some doubts here:

Q1) What exactly is the sync (byte[])? I don't even have a clue about it.
While trying to read some SequenceFile files (generated by randomwriter), I
am not able to figure out what the sync is.

Q2) Do I have to provide the sync wherever I think a file split is
required?

It would be great if someone can clarify these doubts.

Thanks,
Matthew John