Re: Merge sequence files

Doug Cutting Tue, 15 May 2007 11:50:42 -0700

Johan Oskarsson wrote:

I'm considering using the sequence file output of hadoop jobs to serve data from, as it would mean I could skip the conversion step from sequencefile -> other file format.
To do this efficiently I would need the data to be in one file.

I think it should be more efficient to keep things in separate files. If you use MapFileOutputFormat, there are methods to randomly access entries from job output:


http://lucene.apache.org/hadoop/api/org/apache/hadoop/mapred/MapFileOutputFormat.html
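To illustrate why separate files need not hurt lookups, here is a minimal in-memory sketch in Java. It is not Hadoop code. Each TreeMap stands in for one sorted, indexed part file, and partitionFor mirrors the rule used by Hadoop's default HashPartitioner. The class and method names are hypothetical. Because every key is written to the partition its hash selects, a reader that recomputes that partition only has to consult one file. The getReaders/getEntry helpers documented at the link above do roughly this over real MapFile readers, which assumes the reader uses the same partitioner as the job that wrote the output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory model: each TreeMap stands in for one part file written by
// MapFileOutputFormat, i.e. sorted keys plus an index, so lookups need no full scan.
public class PartitionedLookup {

    // Same rule as Hadoop's default HashPartitioner: non-negative hash mod partition count.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Simulates the job writing its output: every record lands in exactly one partition.
    static List<TreeMap<String, String>> writeOutput(String[][] records, int numPartitions) {
        List<TreeMap<String, String>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new TreeMap<>());
        }
        for (String[] kv : records) {
            parts.get(partitionFor(kv[0], numPartitions)).put(kv[0], kv[1]);
        }
        return parts;
    }

    // Random access: recompute the partition, then do an indexed lookup in that one part only.
    static String getEntry(List<TreeMap<String, String>> parts, String key) {
        return parts.get(partitionFor(key, parts.size())).get(key);
    }

    public static void main(String[] args) {
        String[][] records = {
            {"apple", "1"}, {"banana", "2"}, {"cherry", "3"}, {"date", "4"}
        };
        List<TreeMap<String, String>> parts = writeOutput(records, 3);
        System.out.println(getEntry(parts, "cherry")); // prints "3"
        System.out.println(getEntry(parts, "fig"));    // prints "null"
    }
}
```

The number of files does not change the cost of a lookup here: one hash, one index probe, regardless of how many parts the job produced.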

SequenceFileOutputFormat will also let you open all readers, but there's no random access, since a SequenceFile has no index.


http://lucene.apache.org/hadoop/api/org/apache/hadoop/mapred/SequenceFileOutputFormat.html
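For contrast, a similar hypothetical in-memory model of unindexed output, again not Hadoop code: each inner list stands in for one SequenceFile part, a plain run of key/value records. With no index, finding a key means reading records one by one, across every part, until it turns up, so the cost grows with the total size of the output.

```java
import java.util.List;

// Hypothetical in-memory model: each inner list stands in for one SequenceFile part,
// a plain sequence of key/value records with no index.
public class SequentialScan {

    record Entry(String key, String value) {}

    // Without an index, a lookup must read every record of every part
    // until a match is found or all parts are exhausted.
    static String find(List<List<Entry>> files, String key) {
        for (List<Entry> file : files) {
            for (Entry e : file) {
                if (e.key().equals(key)) {
                    return e.value();
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<List<Entry>> files = List.of(
            List.of(new Entry("banana", "2"), new Entry("apple", "1")),
            List.of(new Entry("date", "4"), new Entry("cherry", "3")));
        System.out.println(find(files, "cherry")); // prints "3"
    }
}
```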

Will these suffice?

Doug
